[DAG] canCreateUndefOrPoison - out of range vector insert/extract element indices only generate poison (#196720)
Matches the ValueTracking / GISel implementations, although testing options are limited until DAG has actual uses of UndefPoisonKind::UndefOnly.
clang: Add BoundArch/OffloadKind argument to getSupportedSanitizers
Currently the AMDGPU HIP and OpenMP toolchains falsely report
that all host sanitizers are supported, and then go out of their way
to skip forwarding those to the device compiles. Add an offloading
kind argument so that in the future this can be handled in one
place in the base toolchain.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
clang: Refactor handling of offload sanitizer arguments
Previously the AMDGPU toolchains hackily handled -fsanitize arguments.
They would lie and report that all host side sanitizers are available,
then TranslateArgs would filter out the device side cases that do not
work, providing diagnostics for the skipped cases. Move that logic
into the base sanitizer argument parsing.
This makes the produced diagnostics more consistent. Previously we
would get repeated warnings when a sanitizer is fully unsupported
by amdgpu; the warning should now be emitted once for the toolchain. These could
be further improved; we're printing the specific field of -fsanitize
in more cases where it could be skipped. In other cases we have the
opposite problem, where we aren't reporting the exact sanitizer
from the -f flag in the case that depends on a subtarget feature.
This will help fix other broken target specific flag forwarding bugs
in the future.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[MCParser] .incbin: Don't retain the buffer, don't require NUL termination (#196696)
processIncbinFile uses SourceMgr::AddIncludeFile, which
* sets `RequiresNullTerminator=true` and disables `mmap` when the file
size is a multiple of the page size,
* and unnecessarily retains the throwaway buffer in `Buffers`.
Switch to OpenIncludeFile so the buffer is freed when processIncbinFile
returns, and pass RequiresNullTerminator=false. The buffer is consumed
only by emitBytes; the lexer never scans it, so it does not need a
trailing '\0' (different from #154972). Without that requirement,
MemoryBuffer mmaps the file and RSS tracks only the touched pages.
Stress test (1000 .incbin "blob.bin", 0, 16 against a 1 MiB blob):
```
Maximum RSS
Before 1042944 KiB
[3 lines not shown]
[X86] Hoist ReservedIdentifiers to MCAsmInfo and shrink setup cost. NFC (#196699)
PR #186570 added a per-MCAsmInfo `StringSet<>` populated with X86
register names plus Intel-syntax keywords, which caused a minor
instructions:u increase.
Avoid the heap allocation and hoist `ReservedIdentifiers` to MCAsmInfo so
that other targets can use it.
For the register-name source, prefer
`X86IntelInstPrinter::getRegisterName` over `MCRegisterInfo::getName`.
The former is a TableGen-emitted accessor into a `static const char
AsmStrs[]` pool in `X86GenAsmWriter1.inc`, populated from the lowercase
asm-name argument of each `def XX : X86Reg<"xx", ...>;` in
`X86RegisterInfo.td`.
[SelectionDAG] Split vector types for atomic load
Vector types that aren't widened are split
so that a single ATOMIC_LOAD is issued for the entire vector at once.
This change utilizes the load vectorization infrastructure in
SelectionDAG in order to group the vectors. This enables SelectionDAG
to translate vectors with element type bfloat or half.
[libc++] Avoid non-trivial assignment in `__uninitialized_allocator_copy_impl`
__uninitialized_allocator_copy_impl has an optimization that replaces allocator_traits::construct with std::copy for raw pointer ranges when the element type is trivially copy constructible and trivially copy assignable.
The copy-assignment trait only checks whether assignment from const T& is trivial. That is weaker than the expression used by std::copy, which evaluates *out = *in. If overload resolution selects a different non-trivial assignment operator for that expression, std::copy can call that operator on uninitialized storage.
Const-qualify the input pointers in the optimized overload instead. This makes the std::copy expression assign from const T&, matching the existing is_trivially_copy_assignable check, preserving the optimized path when that assignment is trivial, and falling back to placement construction otherwise.
Add a vector copy-constructor regression test with a type whose defaulted copy assignment is trivial but whose templated assignment operator is selected for non-const lvalue sources.
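The gap between the trait and the `*out = *in` expression can be reproduced in isolation. The following is a minimal sketch, not the regression test itself: the type `Tricky`, its counter, and the helper `run_demo` are hypothetical names invented for this illustration. It shows that the trait reports trivial copy assignment while a non-const lvalue source still selects the templated operator.

```cpp
#include <type_traits>

// Hypothetical type for illustration only; not the type used in the libc++ test.
struct Tricky {
  int value = 0;
  static int template_assign_calls;

  Tricky() = default;
  Tricky(const Tricky&) = default;
  Tricky& operator=(const Tricky&) = default; // trivial copy assignment

  // For a non-const lvalue source, U deduces to Tricky, and binding to
  // Tricky& beats binding to const Tricky&, so this overload is selected.
  template <class U>
  Tricky& operator=(U& other) {
    ++template_assign_calls;
    value = other.value;
    return *this;
  }
};
int Tricky::template_assign_calls = 0;

// Both conditions of the optimized path hold for this type.
static_assert(std::is_trivially_copy_constructible<Tricky>::value, "");
static_assert(std::is_trivially_copy_assignable<Tricky>::value, "");

struct Counts {
  int from_mutable; // templated operator calls when the source is Tricky&
  int from_const;   // templated operator calls when the source is const Tricky&
};

// Hypothetical helper: counts templated-operator calls for each source kind.
Counts run_demo() {
  Tricky dst, src;
  src.value = 7;

  Tricky::template_assign_calls = 0;
  dst = src; // non-const lvalue source
  int from_mutable = Tricky::template_assign_calls;

  Tricky::template_assign_calls = 0;
  const Tricky& csrc = src;
  dst = csrc; // const lvalue source, as after const-qualifying the input
  int from_const = Tricky::template_assign_calls;

  return Counts{from_mutable, from_const};
}
```

In this sketch, assigning from a const source never reaches the templated operator, which is the property the const-qualified input pointers rely on.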
Tested with:
build/bin/llvm-lit -q build/runtimes/runtimes-bins/libcxx/test --filter='(vector.cons/copy.pass|uninitialized_allocator_copy\\.pass)'
build/bin/llvm-lit -q build/runtimes/runtimes-bins/libcxx/test --param std=c++20 --filter='vector.cons/copy.pass'
build/bin/llvm-lit -q build/runtimes/runtimes-bins/libcxx/test --param std=c++11 --filter='vector.cons/copy.pass'
[libc++] Avoid non-trivial assignment in `__uninitialized_allocator_copy_impl`
__uninitialized_allocator_copy_impl has an optimization that replaces allocator_traits::construct with std::copy for raw pointer ranges when the element type is trivially copy constructible and trivially copy assignable.
The copy-assignment trait only checks whether assignment from const T& is trivial. That is weaker than the expression used by std::copy, which evaluates *out = *in. If overload resolution selects a different non-trivial assignment operator for that expression, std::copy can call that operator on uninitialized storage.
Const-qualify the input pointers in the optimized overload instead. This makes the std::copy expression assign from const T&, matching the existing is_trivially_copy_assignable check, preserving the optimized path when that assignment is trivial, and falling back to placement construction otherwise.
Add a regression test with a type whose defaulted copy assignment is trivial but whose templated assignment operator is selected for non-const lvalue sources.
Tested with:
build/bin/llvm-lit -q build/runtimes/runtimes-bins/libcxx/test --filter='uninitialized_allocator_copy(\\.pass|_template_op_assign)'
[RFC][NFCI][Constants] Add `Constant::isZeroValue`
The old `isZeroValue` was removed because it was functionally identical to
`Constant::isNullValue`. Currently, a "null value" in LLVM means a zero value.
We are moving toward changing the semantics of `ConstantPointerNull` to
represent a semantic null pointer instead of a zero-valued pointer. As a result,
the meaning of "null value" will also change in the future.
This PR series is the first step toward renaming the two widely used "null
value" interfaces to "zero value". As the first PR in the series, this change
adds a "new" `isZeroValue` alongside `isNullValue`, and makes `isNullValue` call
`isZeroValue` directly. Then, all uses of `isNullValue` in LLVM are replaced
with `isZeroValue`. Uses in other projects will be updated in separate PRs.
The plan is to eventually remove `isNullValue` after all uses have been
migrated.
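The forwarding step can be pictured with a toy class. This is only a sketch of the migration pattern under simplifying assumptions: `ToyConstant` and its `Bits` member are hypothetical and do not model `llvm::Constant`.

```cpp
// Hypothetical stand-in for a constant; not llvm::Constant.
class ToyConstant {
  long long Bits;

public:
  explicit ToyConstant(long long B) : Bits(B) {}

  // New name: true when the value is all zero bits.
  bool isZeroValue() const { return Bits == 0; }

  // Old name kept during the migration; it simply forwards, so existing
  // callers keep their behavior until they are moved to isZeroValue().
  bool isNullValue() const { return isZeroValue(); }
};
```

Because the old name only forwards, call sites can be migrated one project at a time without a behavior change.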
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
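As a rough illustration of how a consumer might walk such a stream, here is a minimal sketch of a reader for the `[kind][len][payload]` layout. The names `InfoEntry` and `parseInfoEntries` are hypothetical, the kind values are treated as opaque bytes, and symbol references and relocations are not modeled; this is not the linker's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of one entry: [kind: u8] [len: u8] [payload].
struct InfoEntry {
  std::uint8_t kind;
  std::vector<std::uint8_t> payload;
};

// Hypothetical reader: splits a byte stream into entries.
// Returns std::nullopt if a header or payload is truncated.
std::optional<std::vector<InfoEntry>>
parseInfoEntries(const std::vector<std::uint8_t>& data) {
  std::vector<InfoEntry> entries;
  std::size_t pos = 0;
  while (pos < data.size()) {
    // Each entry needs at least the kind byte and the length byte.
    if (data.size() - pos < 2)
      return std::nullopt;
    std::uint8_t kind = data[pos];
    std::size_t len = data[pos + 1];
    pos += 2;
    // The payload must fit in the remaining bytes.
    if (data.size() - pos < len)
      return std::nullopt;
    auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
    auto last = first + static_cast<std::ptrdiff_t>(len);
    entries.push_back(InfoEntry{kind, std::vector<std::uint8_t>(first, last)});
    pos += len;
  }
  return entries;
}
```

A one-byte length keeps each payload at 255 bytes or fewer, which is consistent with storing longer string data in the companion string table.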
[4 lines not shown]
[AMDGPU] Add VOP1 DPP8 pseudo infrastructure
Add VOP_DPP8_Pseudo/VOP1_DPP8_Pseudo classes for DPP8 instructions, similar to
the existing VOP_DPP_Pseudo/VOP1_DPP_Pseudo pattern.
[VPlan] Lift isUsedByLoadStoreAddr into vputils, operate on VPValue (NFC) (#196415)
Extract the helper previously scoped to VPReplicateRecipe::computeCost
and make it available from VPlanUtils so other transforms can query
whether a VPValue is used as part of another load or store's address.
Also relax the input type from VPUser * to VPValue *: the worklist now
tracks VPValues directly, and traversal is gated on the user being a
VPSingleDefRecipe before walking its own users. This is NFC for the
existing caller.
[clang-format] Add BreakFunctionDeclarationParameters option. (#196567)
Adds an option to break function declaration parameters, always putting
them on the next line after the function's opening parenthesis.
This is an equivalent of `BreakFunctionDefinitionParameters`, but for
function declarations.
---------
Co-authored-by: Lukas Jirkovsky <lukas.jirkovsky at aveco.com>