Mark LastEpilogIdx as maybe_unused (#204857)
#203108 added a variable that is read only in debug builds, so we are
seeing a warning in release builds without asserts.
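As an illustration of the pattern (a minimal sketch, not the code from the patch): the function and variable names below are hypothetical. The variable is written on every iteration but read only inside assert(), so a build with NDEBUG defined would otherwise flag it as set but unused.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the offsets and, in builds with asserts
// enabled, checks that they are sorted in ascending order.
int sumSortedOffsets(const std::vector<int> &Offsets) {
  // LastIdx is read only inside assert(), which expands to nothing when
  // NDEBUG is defined. [[maybe_unused]] silences the resulting warning.
  [[maybe_unused]] std::size_t LastIdx = 0;
  int Sum = 0;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    assert((I == 0 || Offsets[LastIdx] <= Offsets[I]) &&
           "offsets must be sorted");
    LastIdx = I;
    Sum += Offsets[I];
  }
  return Sum;
}
```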
[flang][OpenMP] Scope-qualify user-defined reduction names in lowering (#202474)
A named !$omp declare reduction was lowered to an omp.declare_reduction
operation whose symbol name was just the bare reduction name (e.g.
`@a`), without any scope qualification. Semantic name resolution was
correct and gave each scope its own reduction symbol, but lowering
deduplicates the declare reduction op by name, so two subroutines that
declared a reduction with the same name collapsed onto a single op.
As a result, a reduction(name:var) clause could bind to a declaration
that leaked in from a different scope.
Per OpenMP 6.0 7.6.14, a user-defined reduction has the same visibility
and accessibility as a variable declared at the same location.
Qualify the generated op name with the scope in which the reduction is
declared using mangleName, the same approach already used for
omp.private and declare mapper. This is applied consistently when the op
is created, when a clause references it, and when its existence is
[2 lines not shown]
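The collision described above can be modelled with a toy symbol table. This is only a sketch: OpTable, qualify, and getOrCreate are hypothetical names, and the qualified-name format is invented for illustration and is not meant to match what mangleName produces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the module-level symbol table: maps an op name to the
// scope whose declaration created the op.
using OpTable = std::map<std::string, std::string>;

// Hypothetical qualification scheme, for illustration only.
std::string qualify(const std::string &Scope, const std::string &Name) {
  return Scope + "." + Name;
}

// Creates the op if no op with this name exists yet (deduplication by
// name) and returns the scope the name ends up bound to.
std::string getOrCreate(OpTable &Table, const std::string &OpName,
                        const std::string &Scope) {
  return Table.try_emplace(OpName, Scope).first->second;
}
```

With bare names, a second getOrCreate(Table, "a", "sub2") returns "sub1", which is the cross-scope leak. With qualify("sub1", "a") and qualify("sub2", "a") as keys, each scope keeps its own entry.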
[flang][OpenMP] Emit warning that REVERSE_OFFLOAD is not supported (#204647)
Right now we quietly ignore it, whereas the OpenMP spec mandates a
compilation error for requirements that the implementation does not
support.
REVERSE_OFFLOAD was deliberately not made a compilation error, to allow
testing of incremental implementation improvements, but we should at
least warn that it is not supported.
clang/AMDGPU: Fix double linking opencl libs with --libclc-lib
Noticed by inspection. If using an explicit --libclc-lib flag,
do not attempt to also link the rocm device libs which will contain
different implementations of the same opencl symbols.
Co-Authored-By: Claude <noreply at anthropic.com>
clang/AMDGPU: Merge toolchain subclasses
Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.
That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.
There is additional mess in the handling of spirv, which this
[9 lines not shown]
clang/AMDGPU: Remove artificial restriction on --gpu-max-threads-per-block
Previously this flag was only handled for HIP, and would otherwise produce an
unused argument warning. Also use a simpler method for forwarding the flag to cc1.
Revert "[libc] Implement basename and dirname in libgen.h (#204554)" (#204856)
Reverted due to death tests failing with ASan on buildbots. Reverts
commit 29692c150f86d76cfb58e8bf2c0e97dc6afd2088.
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
[RFC][IR] Extract AMDGPU-specific verification logic into `VerifierAMDGPU.cpp`
`Verifier.cpp` is large and already mixes generic IR verification with
target-specific checks. We also have a growing amount of AMDGPU verifier logic
downstream, which would all end up in the same file if we don't address this,
and that is not ideal.
This patch extracts AMDGPU-specific verification logic into a separate
`VerifierAMDGPU.cpp` file, with shared infrastructure (`VerifierSupport`) moved
into `VerifierInternal.h`.
This is purely a code organization change, not a target-dependent IR verifier.
All checks remain compiled and linked into `LLVMCore` regardless of the target
triple. The extracted functions are called unconditionally at well-defined
extension points in `Verifier.cpp`, and each function internally gates on
target-specific conditions (for example, triple checks or intrinsic IDs) as
needed. The file is strictly limited to AMDGPU-specific IR constructs (amdgcn
intrinsics, AMDGPU module flags, etc.), and does not contain generic IR rules
that vary by target.
[10 lines not shown]
[x64][win] Windows x64 unwind v3: Use tail-relative epilog offsets and add size-based splitting (#203108)
Win64 Unwind v3 encodes each epilog's EpilogOffset as a signed 16-bit
field. The encoder previously measured the first epilog offset from the
fragment start, which overflowed for large functions and produced a
cryptic "<unknown>:0: value too large for field" error (and, on the
early .seh_handlerdata path, an assertion failure).
Two changes:
- MCWin64EH.cpp: Always emit epilog offsets tail-relative. The first
epilog descriptor is measured from the fragment end and subsequent ones
as deltas from the previous epilog, so descriptors are emitted in
descending address order (all non-positive, per spec). A new lazy
MCUnwindV3EpilogOffsetTargetExpr resolves the fragment-end-relative
value at layout time (it may not have a symbol yet when emitted via
.seh_handlerdata) and reports a clean, function-named diagnostic on
genuine overflow.
[11 lines not shown]
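The tail-relative scheme from the first change can be sketched roughly as follows. This is not the MCWin64EH.cpp encoder: tailRelativeOffsets is a hypothetical function, offsets are assumed to be plain byte distances with no scaling, and all addresses are assumed to be known up front rather than resolved lazily at layout time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

// Computes tail-relative offsets for the epilogs of one fragment.
// Assumes every epilog starts at or before FragmentEnd (byte addresses).
// Returns std::nullopt if a value does not fit a signed 16-bit field.
std::optional<std::vector<int16_t>>
tailRelativeOffsets(std::vector<uint32_t> EpilogStarts, uint32_t FragmentEnd) {
  // Descriptors are produced in descending address order.
  std::sort(EpilogStarts.begin(), EpilogStarts.end(),
            std::greater<uint32_t>());
  std::vector<int16_t> Offsets;
  int64_t Prev = FragmentEnd; // the first epilog is measured from the end
  for (uint32_t Start : EpilogStarts) {
    // Non-positive: distance back from the fragment end or previous epilog.
    int64_t Delta = static_cast<int64_t>(Start) - Prev;
    if (Delta < std::numeric_limits<int16_t>::min() ||
        Delta > std::numeric_limits<int16_t>::max())
      return std::nullopt; // genuine overflow
    Offsets.push_back(static_cast<int16_t>(Delta));
    Prev = Start;
  }
  return Offsets;
}
```

For epilogs at 39000 and 40000 in a fragment ending at 40010, the sketch yields -10 and -1000, whereas a start-relative offset of 40000 would not fit in 16 signed bits. An epilog at 100 in the same fragment still overflows, which is the kind of case the size-based splitting in the title addresses.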
[MemorySanitizer] Merge x86 BMI and PackedBits handlers into handleGenericBitManipulation (#204786)
As discussed on #204144 - it's not necessary to have separate handlers just because some are target intrinsics.
[SLP] Fix reduction cost crash for reduced values replaced by extractelement
A reduced value may be replaced by an extractelement while vectorizing a
previous subvector, so it is no longer a key in ReducedValsToOps.
Look through replaced values to the reduction operation among their users.
Fixes #204814
Pull Request: https://github.com/llvm/llvm-project/pull/204847
[AMDGPU][NFC] Templatise and roundtrip gfx13_asm_vop3_dpp16.s
Again, this is based on the templatised version of
gfx12_asm_vop3_dpp16.s with the GFX13-specific changes re-applied
on top of it.
gfx13_dasm_vop3_dpp16.txt was never upstreamed, so no changes for
the disassembler side.
[libcxx] Make std::pair pretty-printer ABI-independent (#201768)
std::pair is printed explicitly instead of relying on GDB's default
struct formatting to keep output stable across ABI configurations.
With _LIBCPP_DEPRECATED_ABI_DISABLE_PAIR_TRIVIAL_COPY_CTOR (default on
some platforms, e.g. FreeBSD), std::pair gains an empty
__non_trivially_copyable_base base class. GDB would otherwise render
this as <...__non_trivially_copyable_base<...>> = {<No data fields>},
which makes output ABI-dependent.
Only first and second are meaningful, so print them directly.
Fix __transform_primary in FreeBSD
FreeBSD's strxfrm() encodes collation weights one level at a time,
separating the primary, secondary, and tertiary with '.' bytes. Since
primary equivalence only depends on the primary collation weight, ignore
everything after the first separator when constructing the transformed
key.
This patch implements the intended behavior of primary equivalence and avoids
relying on glibc's fixed-size collation-key representation.
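A minimal sketch of the truncation step, assuming the key has already been obtained from strxfrm() and that levels are separated by '.' bytes as described above. primaryKey is a hypothetical name, not the libc++ function.

```cpp
#include <cassert>
#include <string>

// Keeps only the primary-level portion of a transformed collation key.
// If no separator is present, find() returns npos and substr() yields the
// whole key unchanged.
std::string primaryKey(const std::string &Key) {
  return Key.substr(0, Key.find('.'));
}
```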
[libc] Implement basename and dirname in libgen.h (#204554)
Added the POSIX standard functions basename and dirname under a new
libgen.h header. The implementations modify the input path in-place
using cpp::string_view to determine boundaries safely.
Added find_last_not_of to cpp::string_view to support trailing slash
removal.
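To show how find_last_not_of supports trailing slash removal, here is a rough sketch of an in-place basename following POSIX semantics. It uses std::string_view rather than cpp::string_view, and sketchBasename is a hypothetical name; this is not the LLVM libc implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// In-place basename sketch: may write a terminator into Path.
char *sketchBasename(char *Path) {
  static char Dot[] = ".";
  if (!Path || *Path == '\0')
    return Dot; // null or empty path yields "."
  std::string_view View(Path);
  // Index of the last character that is not a trailing slash.
  std::size_t End = View.find_last_not_of('/');
  if (End == std::string_view::npos) {
    Path[1] = '\0'; // path consists only of slashes: result is "/"
    return Path;
  }
  Path[End + 1] = '\0'; // drop trailing slashes in place
  // The final component starts after the last remaining slash, if any.
  std::size_t Slash = View.substr(0, End + 1).find_last_of('/');
  return Slash == std::string_view::npos ? Path : Path + Slash + 1;
}
```

Because the function writes into its argument, callers need a mutable buffer such as `char Buf[] = "/usr/lib/";` rather than a string literal.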
Implemented:
* libc/include/libgen.yaml, libgen.h.def: Public API definitions.
* libc/src/libgen/basename.cpp, dirname.cpp: Generic implementations.
* libc/test/src/libgen/: Unit and hermetic tests.
Registered the new entrypoints for all active Linux targets (x86_64,
aarch64, arm, riscv) and added docgen configuration.
Assisted-by: Automated tooling, human reviewed.
[clang] Avoid premature Twine .str() materialization (#204830)
Several call sites pass `expr.str()` to parameters of type `const
llvm::Twine &`, forcing a throwaway heap std::string that is immediately
rewrapped into a Twine. Drop the `.str()` and let Twine accept the
StringRef/concatenation directly.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
AMDGPU/GlobalISel: Remove -new-reg-bank-select option
AMDGPU's -global-isel pipeline that uses AMDGPURegBankSelect and
AMDGPURegBankLegalize, previously -global-isel -new-reg-bank-select,
is now the default -global-isel pipeline.
Remove -new-reg-bank-select option from the compiler.
Remove -new-reg-bank-select from all llvm regression tests.
Edit a couple of comments to reference RegBankLegalize instead of
-new-reg-bank-select.
[llvm][Target] Avoid premature Twine .str() materialization (#204836)
Call sites in the AMDGPU and SPIRV parsers and the SystemZ AsmPrinter /
InstrInfo pass `expr.str()` (or `.str().c_str()`) to parameters of type
`const llvm::Twine &`, forcing a throwaway heap std::string that is
immediately rewrapped into a Twine. Drop the materialization and let
Twine accept the concatenation directly.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>