[SampleProfile] Switch getNameTable() to return iterator_range (NFC) (#200995)
This patch teaches SampleProfileReader::getNameTable() to return an
iterator_range instead of a pointer to std::vector<FunctionId>.
This patch is meant to be a preparation patch for the following
speed-up opportunity. I'm planning to lazy-load SecNameTable in a
subsequent patch for performance reasons. We have SecNameTable that
takes up about 90MB on disk. We eager-load this section into
std::vector<FunctionId> on the heap. This ends up taking about 180MB
on the heap because the element type of the section is 8-byte MD5 hash
value while FunctionId takes up 16 bytes. This eager loading shows up
on the execution profile -- about 1%. Since we do have a few places
where we scan the entire NameTable, we should accommodate those places
with iterators that lazy-load SecNameTable.
See the RFC at:
https://discourse.llvm.org/t/rfc-faster-sample-profile-loading/90957
[LoopFusion] Simplify the logic of checking trip count equality (NFCI). (#201446)
Currently `haveIdenticalTripCounts` has a clunky return value, which
makes it very easy to make a mistake. The returned pair doesn't provide
much value and can be replaced with an optional integer. Also the
function `haveIdenticalTripCounts` does more than what its name
suggests. It checks whether peeling is supported for the pair of loops
or not. Interestingly this is not the only place where we check whether
peeling for this pair is supported!
This patch changes the function and renames it to
`calculateTripCountDiff`. It does exactly what the names says. It tries
to calculate the difference of the trip counts of the two loops and if
it fails it returns an empty optional. It is up to the caller to decide
whether it wants to do fusion/peeling based on this result. The patch
changes some debug output but no functional change is intended.
Datatypes has been modified with explicit specification of size and
signedness to avoid any bug due to overflow in subtraction or comparison
of different integer types.
[RISCV][TargetLowering][P-ext] Support sext_inreg or v2i32/v4i16 vectors on RV32. (#201752)
Update sext_vector_inreg expansion to use sext_inreg. Previously it
emitted 2 shifts that wouldn't be combined.
[SelectionDAG] Fold extracts of subvector inserts
Fold extract_subvector(insert_subvector(...)) when the extraction is
outside the inserted subvector or the inserted subvector only amends
the extracted
In particular,
1. vA extract_subvector (vB insert_subvector(vB X, vC Y, C1), C2) =>
vA extract_subvector(X, C2) when [C2, C2 + A) intersect [C1, C1 + C)
is the empty set
2. ... => extract_subvector(Y, C2 - C1) if [C2, C2 + Y) is a subset of
[C1, C1 + C) - an existing simplification
3. ... => vA insert_subvector(vA extract_subvector(vB X, C2), vC Y, C1 - C2)
if [C1, C1 + C) is a subset of [C2, C2 + A) - that is, if you're only
updating the extracted sub-part.
Adds a regresssion tests for an infinite SelectionDAG cycle that is
fixed by a stack of commits that ends with this one.
[3 lines not shown]
[SelectionDAG] Fold subvector inserts into concat operands
Push insert_subvector into the containing CONCAT_VECTORS operand when the insertion is wholly contained there.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold extracts spanning concat operands
Factor the extract_subvector-of-CONCAT_VECTORS logic and handle
extracts that cover multiple whole concat operands by rebuilding a
smaller concat directly.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold nonzero extract-of-extract indices
Generalize the extract_subvector-of-extract_subvector fold to compose
nonzero indices instead of only handling an outer index of zero.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
clang/AMDGPU: Pass BoundArch through device libs handling
Pre-work to consolidate target identification for future target
option bug fixes. Also requires updating flang to match recent
clang changes.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
clang: Add BoundArch argument to addClangTargetOptions
addClangTargetOptions already has an OffloadKind argument,
but it kind of doesn't make sense for any function to know the
OffloadKind, but not the associated BoundArch.
The current process is kind of convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[SelectionDAG] Track demanded select elements in noundef checks
Propagate demanded elements through to the two arms of a select, and
check the condition with or without demanded elements depending on if
it's a vector or not.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Track bitcast demanded elements in noundef tests
Bitcasts preserve undef/poison status, but vector bitcasts can change
which source lanes cover a demanded result lane. Map the demanded
element mask through fixed-length vector bitcasts before checking the
source where possible.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[clang][AST] Hash `AttributedType`'s `Attr` by Arguments (#200961)
https://github.com/llvm/llvm-project/pull/108631 added
`ID.AddPointer(attr)` to `AttributedType::Profile`, which turned the
`ID` into a pointer-identity key. This inhibits deduplication of
attributed types (such as types with `_Nonnull/_Nullable` attributes).
Such duplications can lead to significant increases in pcm/pch sizes.
This PR adds the arguments of the attributes to the folding set ID, so
that the content of the argument is taken into account when computing
the ID in addition to the existing inputs. The implementation teaches
tablegen to generate the `profile` method for each attribute, similar to
how we generate methods to check equivalence. This way, the argument
contents are handled automatically. Additionally, an attribute can have
an escape hatch to add its own customized profile method, through the
`profileFn` tablegen field, in case something special is needed.
Assisted-by: claude-opus-4.7
Fixes rdar://170586474.
[SelectionDAG] Track demanded concat elements in noundef checks
Teach isGuaranteedNotToBeUndefOrPoison to distribute fixed-length
demanded element masks across CONCAT_VECTORS operands. This is part of
the series of fixes needed to resolve a SelectionDAG hang by making it
possible to prove certain values don't need to be frozen.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[Flang][OpenMP] Heap-allocate GPU dynamic private arrays in distribute parallel do (#200841)
Fixes GPU offload crashes for Fortran automatic arrays privatised in
target teams distribute parallel do.
For delayed privatisation on GPU, dynamically sized boxed array privates
are now routed through the existing heap-allocation path, with matching
cleanup emitted in the privatiser dealloc region. This avoids lowering
such arrays to runtime-sized scratch allocas whose descriptors can be
captured across the distribute callback boundary.
Fixes [#2419](https://github.com/ROCm/llvm-project/issues/2419).
Co-authored-by: Codex <codex at openai.com>
[mlir][ROCDL] Move ROCDL intrinsic enum immargs to enums (#198875)
In many cases, a "i32" `immarg` arguhment to an intrinsic in the AMDGPU
backend actually corresponds directly to some enumerated set of values
in the backend, which we have to smuggle through an I32. This makes the
MLIR forms of intrinsics less readable and means that people either have
to use the `amdgpu` dialect to get these enums or have to roll their own
enums if they want to know what's going on.
This PR rips the band-aid off and breaks the world by swapping out those
integer attributes for enum attributes.
Of special note is the handling of the aux/cachepolicy field on various
intrinsics; in the backend, all the architectures share an enum and
you've just got to use the right names in the right spots. Here, we've
separated out the cases for pre-gfx942, gfx942+, and gfx12 enums as
separate attributes (including separate casing for gfx12 atomics) and
allowed any of them to be used. We also allow an I32Attr in those
arguments for easy importing and to make the common case of "0" portably
[18 lines not shown]
Revert "[clang] Reland: fix getTemplateInstantiationArgs" (#201864)
Reverts llvm/llvm-project#201373
This caused compilation errors. See comment on the original PR.