[SelectionDAG] Fold subvector inserts into concat operands
Push insert_subvector into the containing CONCAT_VECTORS operand when the inserted subvector lies entirely within a single operand.
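A minimal sketch of the containment test, assuming a fixed-length CONCAT_VECTORS whose `NumOps` operands each have `OpElts` lanes. It models only the index arithmetic on plain integers; `findContainingConcatOperand` is a hypothetical helper, not a SelectionDAG function.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical model of the fold: insert_subvector(concat(Ops...), Sub, Idx)
// can become concat(..., insert_subvector(Ops[K], Sub, LocalIdx), ...)
// when the inserted lanes [Idx, Idx + SubElts) lie inside a single operand.
// Returns {K, LocalIdx} in that case, std::nullopt otherwise.
std::optional<std::pair<unsigned, unsigned>>
findContainingConcatOperand(unsigned NumOps, unsigned OpElts,
                            unsigned SubElts, unsigned Idx) {
  assert(OpElts > 0 && SubElts > 0);
  // Reject insertions that run past the end of the concatenated vector.
  if (Idx + SubElts > NumOps * OpElts)
    return std::nullopt;
  unsigned FirstOp = Idx / OpElts;
  unsigned LastOp = (Idx + SubElts - 1) / OpElts;
  if (FirstOp != LastOp)
    return std::nullopt; // The insertion straddles an operand boundary.
  return std::make_pair(FirstOp, Idx % OpElts);
}
```

For example, inserting a 2-lane subvector at index 6 of a concat of two 4-lane operands touches only lanes 6 and 7, so in this model it becomes an insert at index 2 into operand 1.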
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold extracts spanning concat operands
Factor out the extract_subvector-of-CONCAT_VECTORS logic and handle
extracts that cover multiple whole concat operands by rebuilding a
smaller concat directly.
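The whole-operand case can be sketched the same way. This assumes fixed-length vectors with equally sized operands; `extractAsSubConcat` is hypothetical, and in this model the single-operand case is left to the pre-existing fold.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical model: an extract_subvector of ExtElts lanes at Idx from
// concat(Ops...), where every operand has OpElts lanes. When the extracted
// range starts on an operand boundary and covers a whole number of operands,
// the result is just a smaller concat of Ops[First, First + Count).
// Returns {First, Count} in that case, std::nullopt otherwise.
std::optional<std::pair<unsigned, unsigned>>
extractAsSubConcat(unsigned NumOps, unsigned OpElts, unsigned ExtElts,
                   unsigned Idx) {
  assert(OpElts > 0);
  if (Idx % OpElts != 0 || ExtElts % OpElts != 0)
    return std::nullopt; // Not aligned to whole operands.
  if (Idx + ExtElts > NumOps * OpElts)
    return std::nullopt; // Out of range.
  unsigned Count = ExtElts / OpElts;
  if (Count < 2)
    return std::nullopt; // Zero or one operand: not this fold's job here.
  return std::make_pair(Idx / OpElts, Count);
}
```

Extracting 4 lanes at index 2 from a concat of four 2-lane operands selects operands 1 and 2, i.e. `concat(Op1, Op2)`.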
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold nonzero extract-of-extract indices
Generalize the extract_subvector-of-extract_subvector fold to compose
nonzero indices instead of only handling an outer index of zero.
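The index composition behind this fold is plain addition: lane `I` of the outer result is lane `OuterIdx + I` of the inner extract, which is lane `InnerIdx + OuterIdx + I` of the original vector. The sketch below uses a hypothetical `composeExtractIndices` and additionally assumes the rule that an extract_subvector index must be a multiple of the result's element count; that alignment check is an assumption of this sketch, not taken from the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of extract_subvector(extract_subvector(X, InnerIdx),
// OuterIdx) with ResultElts lanes in the final result. The two extracts
// compose into one extract from X at InnerIdx + OuterIdx, provided the
// composed index is still a multiple of ResultElts (assumed requirement).
std::optional<unsigned> composeExtractIndices(unsigned InnerIdx,
                                              unsigned OuterIdx,
                                              unsigned ResultElts) {
  assert(ResultElts > 0);
  unsigned Composed = InnerIdx + OuterIdx;
  if (Composed % ResultElts != 0)
    return std::nullopt; // Composed index would be misaligned.
  return Composed;
}
```

For example, an outer extract at index 2 of an inner extract taken at index 4 reads the same lanes as a single extract at index 6.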
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
clang/AMDGPU: Pass BoundArch through device libs handling
Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
clang: Add BoundArch argument to addClangTargetOptions
addClangTargetOptions already has an OffloadKind argument,
but it does not make sense for a function to know the
OffloadKind without also knowing the associated BoundArch.
The current process is convoluted: TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument every
time it needs the architecture. Add this argument so that
this can be cleaned up in a future change.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[SelectionDAG] Track demanded select elements in noundef checks
Propagate demanded elements through to the two arms of a select, and
check the condition with or without demanded elements depending on
whether it is a vector.
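A toy model of the select rule, assuming each value is summarized by a bitmask of lanes that may be undef or poison (at most 64 lanes). The names `SelectOperands` and `selectIsNoundef` are hypothetical; the point is only which mask the condition and the arms are checked against.

```cpp
#include <cassert>
#include <cstdint>

// Each mask has bit I set when lane I *may* be undef or poison.
struct SelectOperands {
  bool CondIsVector;        // vector condition vs. a single scalar condition
  uint64_t CondMaybePoison; // for a scalar condition only bit 0 is used
  uint64_t TrueMaybePoison;
  uint64_t FalseMaybePoison;
};

// Returns true if select(Cond, T, F) is known noundef on the Demanded lanes.
bool selectIsNoundef(const SelectOperands &S, uint64_t Demanded) {
  // Both arms only need to be well defined on the demanded lanes.
  if ((S.TrueMaybePoison | S.FalseMaybePoison) & Demanded)
    return false;
  // A vector condition is checked lane by lane with the same demanded mask;
  // a scalar condition feeds every lane, so it must be noundef outright.
  if (S.CondIsVector)
    return (S.CondMaybePoison & Demanded) == 0;
  return (S.CondMaybePoison & 1) == 0;
}
```

With a vector condition whose lane 2 may be poison, demanding only lanes 0 and 1 still succeeds, while demanding lane 2 does not.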
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Track bitcast demanded elements in noundef tests
Bitcasts preserve undef/poison status, but vector bitcasts can change
which source lanes cover a demanded result lane. Map the demanded
element mask through fixed-length vector bitcasts before checking the
source where possible.
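The mask mapping for fixed-length bitcasts can be sketched as follows, assuming at most 64 lanes per vector. When the source has more (narrower) lanes, each demanded result lane demands a contiguous group of source lanes; when it has fewer (wider) lanes, a source lane is demanded if any result lane carved from it is. LLVM's `APIntOps::ScaleBitMask` performs a similar scaling on `APInt` masks; `mapDemandedThroughBitcast` itself is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mask with the low N bits set (N may be 64).
static uint64_t lowBits(unsigned N) {
  return N >= 64 ? ~0ULL : ((1ULL << N) - 1);
}

// Map a demanded-lane mask on the result of a fixed-length vector bitcast
// (DstElts lanes) to a demanded-lane mask on its source (SrcElts lanes).
// The mapping is only done when one lane count divides the other.
std::optional<uint64_t> mapDemandedThroughBitcast(uint64_t DemandedDst,
                                                  unsigned DstElts,
                                                  unsigned SrcElts) {
  assert(DstElts > 0 && SrcElts > 0 && DstElts <= 64 && SrcElts <= 64);
  uint64_t Result = 0;
  if (SrcElts % DstElts == 0) {
    // Wider result lanes: each demanded result lane covers Scale source lanes.
    unsigned Scale = SrcElts / DstElts;
    for (unsigned I = 0; I < DstElts; ++I)
      if (DemandedDst & (1ULL << I))
        Result |= lowBits(Scale) << (I * Scale);
    return Result;
  }
  if (DstElts % SrcElts == 0) {
    // Narrower result lanes: a source lane is demanded if any result lane
    // carved out of it is demanded.
    unsigned Scale = DstElts / SrcElts;
    for (unsigned J = 0; J < SrcElts; ++J)
      if ((DemandedDst >> (J * Scale)) & lowBits(Scale))
        Result |= 1ULL << J;
    return Result;
  }
  return std::nullopt; // Lane counts do not divide; caller demands all lanes.
}
```

For a `<2 x i64>` result bitcast from `<4 x i32>`, demanding result lane 1 demands source lanes 2 and 3. A bitcast such as `<3 x i32>` to `<2 x i48>` returns `std::nullopt`, one case where the mapping is not possible.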
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
[clang][AST] Hash `AttributedType`'s `Attr` by Arguments (#200961)
https://github.com/llvm/llvm-project/pull/108631 added
`ID.AddPointer(attr)` to `AttributedType::Profile`, which turned the
`ID` into a pointer-identity key. This inhibits deduplication of
attributed types (such as types with `_Nonnull/_Nullable` attributes).
The resulting duplicates can significantly increase pcm/pch sizes.
This PR adds the arguments of the attributes to the folding set ID, so
that the content of the argument is taken into account when computing
the ID in addition to the existing inputs. The implementation teaches
tablegen to generate the `profile` method for each attribute, similar to
how we generate methods to check equivalence. This way, the argument
contents are handled automatically. Additionally, an attribute can have
an escape hatch to add its own customized profile method, through the
`profileFn` tablegen field, in case something special is needed.
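To illustrate the difference between pointer-identity and content-based profiling, here is a toy model. `ToyNodeID` stands in for `llvm::FoldingSetNodeID` and `ToyAttr` for a tablegen-generated attribute with integer arguments; neither is the real API, and the generated `profile` methods presumably handle many more argument kinds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a folding-set ID: just the sequence of values added.
struct ToyNodeID {
  std::vector<uint64_t> Bits;
  void addInteger(uint64_t V) { Bits.push_back(V); }
  bool operator==(const ToyNodeID &O) const { return Bits == O.Bits; }
};

// Hypothetical attribute with a kind and integer arguments.
struct ToyAttr {
  unsigned Kind;
  std::vector<int64_t> Args;
  // Content-based profile: equal kind and arguments give equal IDs,
  // regardless of where each ToyAttr object lives in memory.
  void profile(ToyNodeID &ID) const {
    ID.addInteger(Kind);
    ID.addInteger(Args.size());
    for (int64_t A : Args)
      ID.addInteger(static_cast<uint64_t>(A));
  }
};

// Pointer-identity profile: every distinct object gets a distinct ID.
void profileByAddress(const ToyAttr &A, ToyNodeID &ID) {
  ID.addInteger(reinterpret_cast<uintptr_t>(&A));
}
```

Two separately allocated attributes with the same kind and arguments profile identically under the content-based scheme, so the attributed types built on them can be uniqued.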
Assisted-by: claude-opus-4.7
Fixes rdar://170586474.
[SelectionDAG] Track demanded concat elements in noundef checks
Teach isGuaranteedNotToBeUndefOrPoison to distribute fixed-length
demanded element masks across CONCAT_VECTORS operands. This is part of
the series of fixes needed to resolve a SelectionDAG hang by making it
possible to prove certain values don't need to be frozen.
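The per-operand split of the demanded mask can be sketched as below, assuming equally sized fixed-length operands and at most 64 lanes in total. `splitDemandedAcrossConcat` is hypothetical; a real check would then query each operand with its own mask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split a demanded-lane mask over concat(Ops...) into one mask per operand,
// where each of the NumOps operands has OpElts lanes. Bit I of the K-th
// mask is bit K * OpElts + I of Demanded.
std::vector<uint64_t> splitDemandedAcrossConcat(uint64_t Demanded,
                                                unsigned NumOps,
                                                unsigned OpElts) {
  assert(OpElts > 0 && NumOps * OpElts <= 64);
  uint64_t OpMask = OpElts >= 64 ? ~0ULL : ((1ULL << OpElts) - 1);
  std::vector<uint64_t> PerOp;
  for (unsigned K = 0; K < NumOps; ++K)
    PerOp.push_back((Demanded >> (K * OpElts)) & OpMask);
  return PerOp;
}
```

An operand whose mask comes out as zero has no demanded lanes, so in this model it can be skipped, which lets the check succeed even when that operand may be poison.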
AI note: an LLM generated the code and the test; I've read them.
Co-Authored-By: OpenAI Codex <codex at openai.com>
[Flang][OpenMP] Heap-allocate GPU dynamic private arrays in distribute parallel do (#200841)
Fixes GPU offload crashes for Fortran automatic arrays privatised in
target teams distribute parallel do.
For delayed privatisation on GPU, dynamically sized boxed array privates
are now routed through the existing heap-allocation path, with matching
cleanup emitted in the privatiser dealloc region. This avoids lowering
such arrays to runtime-sized scratch allocas whose descriptors can be
captured across the distribute callback boundary.
Fixes [#2419](https://github.com/ROCm/llvm-project/issues/2419).
Co-authored-by: Codex <codex at openai.com>
[mlir][ROCDL] Move ROCDL intrinsic enum immargs to enums (#198875)
In many cases, an "i32" `immarg` argument to an intrinsic in the AMDGPU
backend actually corresponds directly to some enumerated set of values
in the backend, which we have to smuggle through an I32. This makes the
MLIR forms of intrinsics less readable and means that people either have
to use the `amdgpu` dialect to get these enums or have to roll their own
enums if they want to know what's going on.
This PR rips the band-aid off and breaks the world by swapping out those
integer attributes for enum attributes.
Of special note is the handling of the aux/cachepolicy field on various
intrinsics; in the backend, all the architectures share an enum and
you've just got to use the right names in the right spots. Here, we've
separated out the cases for pre-gfx942, gfx942+, and gfx12 enums as
separate attributes (including separate casing for gfx12 atomics) and
allowed any of them to be used. We also allow an I32Attr in those
arguments for easy importing and to make the common case of "0" portably
[18 lines not shown]
Revert "[clang] Reland: fix getTemplateInstantiationArgs" (#201864)
Reverts llvm/llvm-project#201373
This caused compilation errors. See comment on the original PR.
[lld][WebAssembly] Add missing space in unmodeled diagnostic (#201764)
This is just a nit change: I hit this fatal error while trying to use a GC
object and noticed that the diagnostic read `foo.ofile has unmodeled
reference or GC types`.
[Driver][test] Use -### for non-ObjC constant-literal RUN lines (#201877)
The RUN lines added in 3b100666a70f did a real compile for
arm64-apple-macosx11, which fails on builders that don't register the
AArch64 backend (e.g. llvm-clang-x86_64-sie-ubuntu-fast). The
NoArgumentUnused behavior under test is driver-side, so switch to -###
and avoid the backend dependency.
[flang][OpenMP] Adding support for weak extended-atomic clause (#201823)
Add support for `!$omp atomic compare weak`, for example:
!$omp atomic compare weak
if (var1 == num1) var1 = num2
!$omp end atomic
This also fixes
[#201812](https://github.com/llvm/llvm-project/issues/201812)
---------
Co-authored-by: Sunil Kuravinakop <kuravina at pe31.hpc.amslabs.hpecorp.net>
[LLVM] Precise error message for intrinsic signature verification (3/n) (#200493)
Print precise error message for dependent types when an intrinsic's type
signature verification fails.
clang/HIP: Remove __ockl_fdot2 declaration (#201878)
The builtin headers should not be in the business of exporting
ockl functions; they should declare only the minimum set of
functions that they actively use.
[Test] Fix loop exit conditions to prevent trivial optimizations (#201867)
Several tests had 'br i1 %ec, label %loop, label %exit', which exits on
the first iteration instead of looping, so I swapped the branch targets. I
also changed predicates so that the loops are kept; otherwise they would be
eliminated by https://github.com/llvm/llvm-project/pull/201839.
[InstCombine] Use copyMetadata in PointerReplacer::replace (#201827)
PointerReplacer::replace creates a new load that differs from the
original only in its pointer operand; the loaded type is unchanged. It
was using copyMetadataForLoad(), which is meant for the case where the
load's *type* changes. Since the type is the same here, plain
copyMetadata() is correct and preserves all metadata directly.