[VPlan] Thread scalar type through VPBlend, VPExpression recipes. (NFC) (#200255)
Set the scalar type for VPBlendRecipe and VPExpressionRecipe at
construction time, instead of inferring it on demand via VPTypeAnalysis.
With this change, all VPValues have their scalar type set at
construction, so VPTypeAnalysis::inferScalarType becomes a thin wrapper
around VPValue::getScalarType.
To be removed in a follow-up:
https://github.com/llvm/llvm-project/pull/200256.
PR: https://github.com/llvm/llvm-project/pull/200255
[SLP] Fix extract-cost scale using NCD of all external-user sites
ExtractCostCalculated deduplicates by scalar so only the first
ExternalUser determines the scale, making the cost depend on IR block
ordering via LLVM's reverse-insertion use-list order.
Add a pre-pass computing ScalarToExtractBlock - the nearest common
dominator of all effective extract sites per scalar. For PHI users inside
a loop the effective site is the incoming block; for PHI users outside
all loops it is the PHI's own block (scale = 1). The extract cost is
then scaled by getLoopNestScale of the NCD block, which is fully
order-independent.
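As a rough illustration of the pre-pass idea, the sketch below folds a nearest-common-dominator query over a list of extract sites on a toy dominator tree. The `idom`/`depth` arrays and both function names are assumptions made for this example; it does not use the SLP vectorizer's or `DominatorTree`'s actual API. Because the fold runs over the whole set of sites, the result does not depend on the order in which users are visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dominator tree (hypothetical layout): idom[b] is the immediate
// dominator of block b, the entry block is its own idom, and depth[b]
// is b's depth in the tree (the entry block has depth 0).
int nearest_common_dominator(const std::vector<int>& idom,
                             const std::vector<int>& depth, int a, int b) {
  // Move the deeper block (or `a` on a tie) upward until both meet.
  while (a != b) {
    if (depth[a] < depth[b])
      b = idom[b];
    else
      a = idom[a];
  }
  return a;
}

// Fold the pairwise query over every effective extract site of one scalar.
int common_extract_block(const std::vector<int>& idom,
                         const std::vector<int>& depth,
                         const std::vector<int>& sites) {
  assert(!sites.empty() && "a scalar needs at least one extract site");
  int ncd = sites[0];
  for (std::size_t i = 1; i < sites.size(); ++i)
    ncd = nearest_common_dominator(idom, depth, ncd, sites[i]);
  return ncd;
}
```

With `idom = {0, 0, 0, 1, 1}` and `depth = {0, 1, 1, 2, 2}`, the sites `{3, 4}` meet at block 1, and adding site 2 moves the result up to the entry block 0 regardless of visiting order.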
Fixes #199548
Reviewers: bababuck, RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/199962
[MLIR][GPU][NFC] Reformat GPU target attachment tests (#199339)
Reformat attach-targets.mlir so each GPU module has a labeled check
block, split target-attachment RUN lines, and keep comments tied to the
expected target-specific matches.
[SLP] Fix extract-cost scale for LCSSA-phi external users in nested loops
getScaleToLoopIterations() used U->getParent() for all PHI-node external
users. For an LCSSA phi at an inner-loop exit still inside an outer loop,
this gave outer-loop scale instead of inner*outer scale. Because
ExtractCostCalculated deduplicates by scalar, only the first ExternalUser
determines the scale, making the cost order-dependent on use-list ordering
(and thus on .ll block ordering).
Reviewers: hiraditya, RKSimon, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/199954
[SLP] Recompute copyable operand deps for duplicate copyable nodes
A bundle may duplicate a previously built node that has copyable elements
(same schedulable instructions, different copyable lane) while the parent
node also has copyable elements. An operand modeled as a copyable element
in the previous node is then used directly by the new node, which is not
registered in the tree yet. Recomputing that operand's direct
dependencies at this point misses the direct use, so the scheduler
decrements the operand more times than its dependency count and trips the
unscheduled-deps assertion.
Defer recomputation of such operand dependencies via
RecalcCopyableOperandDeps and redo it at the next bundle scheduling, when
the duplicate node is part of the tree. Also clear and recompute the
direct dependencies of bundles whose user is a gather node referenced
through EdgeIdx == UINT_MAX in scheduleBlock, so combined gather
sub-entries get correct dependencies against the full tree.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/200564
[bazel] Fix compiler-rt:interception (#200561)
- Add `-DCOMPILER_RT_BUILD_PROFILE_ROCM=1`
- Prune `"lib/sanitizer_common/*.S"`, it means `*.inc.S`
- Add `-fvisibility=hidden`
Bazel doesn't add `-gline-tables-only` by default. Add flags to CMake
side to align the build to Bazel.
- `-DCOMPILER_RT_HAS_G_FLAG=OFF`
- `-DCOMPILER_RT_HAS_GLINE_TABLES_ONLY_FLAG=OFF`
[offload][OpenMP] Add strict flag for blocks and threads in kernel arguments (#199483)
Until now, strict behavior in the number of threads and blocks has been
applied only when the kernel is in bare mode. When this mode is enabled,
the values passed in UserNumBlocks and UserThreadLimit are not adjusted
and are the definitive values used to launch the kernel. This commit
detaches the strictness from the kernel mode.
This is going to be used by the kernel replay tool. Additionally, it
starts clearing the path for the upcoming OpenMP dims modifier, used to
configure multidimensional teams and leagues, which will include
strictness choices for teams and threads.
All the bare kernels must indicate strict behavior. Asserts are added to
check this condition.
[libc++] std::byteswap support for _BitInt(N) (#196512)
Add a byte-reversal loop fallback for `std::byteswap` when `sizeof(T) > 16`,
so the function works for `_BitInt(N)` with `N > 128` and any future wider
integer type. Without it, those calls hit `static_assert(sizeof(_Tp) == 0)`
and fail to compile.
Reject `_BitInt(N)` where `N` is not a multiple of `CHAR_BIT`. The existing
`__builtin_bswap{16,32,64,128}` paths swap the storage representation
including padding bits, and the resulting value's meaning is unspecified.
A new `static_assert` catches that case and reports it. Size-1 types are
exempt from the check, since no bytes move there.
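A minimal sketch of what a byte-reversal loop fallback can look like, written against standard fixed-width integers so it compiles without `_BitInt` support. The name `byteswap_loop` is hypothetical and this is not the libc++ implementation; it only shows the technique of reversing the object representation one byte at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fallback: reverse the bytes of T's object representation.
template <class T>
T byteswap_loop(T value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "the byte copy below requires a trivially copyable type");
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  // Swap from both ends toward the middle; a size-1 type never enters the loop.
  for (std::size_t i = 0, j = sizeof(T) - 1; i < j; ++i, --j) {
    unsigned char tmp = bytes[i];
    bytes[i] = bytes[j];
    bytes[j] = tmp;
  }
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}

// Usage checks: reversing the representation matches the usual bswap result.
inline void byteswap_loop_checks() {
  assert(byteswap_loop<std::uint32_t>(0x11223344u) == 0x44332211u);
  assert(byteswap_loop<std::uint8_t>(0xAB) == 0xAB);
}
```

The loop works for any size, which is the property the fallback relies on for types wider than the largest builtin swap.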
Part of the [_BitInt(N) libc++
[6 lines not shown]
[Clang][test] Fix leading slash (#200549)
A reviewer in #200012 asked for the test to check for a leading
(back-)slash, even though none of the other tests do so. It turns out
the slash isn't there if the driver is unable to resolve the full path
to the linker. Remove the leading slash from the test.
Fixes reported buildbot failures:
* clang-solaris11-sparcv9
* clang-solaris11-amd64
[MergeICmps] Don't merge comparisons whose width isn't a byte multiple (#200346)
MergeICmps looks for cases like
    struct S { char x; char y; }
    A.x == B.x && A.y == B.y
If `x` and `y` are stored adjacent to one another, we can convert the
above into a memcmp, which can then be converted into a single 16-bit
compare.
This pass currently does the wrong thing if the struct members' sizes
are not multiples of 8 bits. To fix this, we simply bail if the sizes of
the elements in question are not multiples of one byte.
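To make the transformation concrete, here is a small C++ sketch of the source-level pattern and the `memcmp` form it can be merged into, plus a hypothetical `is_byte_multiple` guard mirroring the bail-out described above. The helper names are illustrative and are not taken from the pass. Since `memcmp` compares whole bytes, the merged form only matches the original comparison when each compared element covers a whole number of bytes.

```cpp
#include <cassert>
#include <cstring>

struct S { char x; char y; };
static_assert(sizeof(S) == 2, "example assumes x and y are adjacent bytes");

// Field-by-field comparison, the shape the pass looks for.
bool fieldwise_equal(const S& a, const S& b) {
  return a.x == b.x && a.y == b.y;
}

// Equivalent single comparison over the two adjacent bytes.
bool merged_equal(const S& a, const S& b) {
  return std::memcmp(&a, &b, sizeof(S)) == 0;
}

// Hypothetical guard: only widths that are a whole number of bytes qualify.
bool is_byte_multiple(unsigned bit_width) {
  return bit_width % 8 == 0;
}

// Usage checks on a pair of equal and a pair of unequal structs.
inline void merge_icmps_example() {
  const S a = {1, 2}, b = {1, 2}, c = {1, 3};
  assert(fieldwise_equal(a, b) && merged_equal(a, b));
  assert(!fieldwise_equal(a, c) && !merged_equal(a, c));
  assert(is_byte_multiple(8) && !is_byte_multiple(4));
}
```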
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
[LICM] Drop poison-generating flags when reassociating an icmp (#200344)
`hoistAdd`/`hoistSub` turn `LV + C1 <pred> C2` into `LV <pred> C2 - C1`,
changing the icmp's LHS. A `samesign` flag asserted about the old
operands need not hold for the new LHS, so keeping it can turn a defined
comparison into poison (e.g. for `%iv = -3`, `samesign slt(2, 100)` is
true but the reassociated `samesign slt(-3, 95)` has
opposite-sign operands → poison). Drop the icmp's poison-generating
flags after the rewrite, as `hoistMulAddAssociation` already does.
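The example in the message can be checked with a toy model of `samesign` semantics, in which a comparison yields poison whenever its operands differ in sign. The struct and function names below are hypothetical, and the constants `C1 = 5` and `C2 = 100` are inferred from the numbers quoted above (`-3 + 5 = 2`, `100 - 5 = 95`).

```cpp
#include <cassert>
#include <cstdint>

// Toy result type: either a boolean value or poison.
struct ICmpResult {
  bool is_poison;
  bool value;  // meaningful only when !is_poison
};

// Model of `icmp samesign slt`: poison if the operand signs differ.
ICmpResult samesign_slt(std::int64_t lhs, std::int64_t rhs) {
  if ((lhs < 0) != (rhs < 0))
    return {true, false};
  return {false, lhs < rhs};
}

// Model of plain `icmp slt` with the flag dropped: always defined.
ICmpResult plain_slt(std::int64_t lhs, std::int64_t rhs) {
  return {false, lhs < rhs};
}

// Walk through the example with iv = -3, C1 = 5, C2 = 100.
inline void licm_samesign_example() {
  const std::int64_t iv = -3, c1 = 5, c2 = 100;
  assert(!samesign_slt(iv + c1, c2).is_poison);  // samesign slt(2, 100)
  assert(samesign_slt(iv + c1, c2).value);
  assert(samesign_slt(iv, c2 - c1).is_poison);   // samesign slt(-3, 95)
  assert(!plain_slt(iv, c2 - c1).is_poison);     // flag dropped: defined
  assert(plain_slt(iv, c2 - c1).value);
}
```

In this model, dropping the flag after the rewrite keeps the reassociated comparison defined and preserves the original result.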
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
[SelectionDAG] Preserve IR alignment on atomicrmw/cmpxchg MMOs (#200332)
Previously SelectionDAG used the natural alignment of the value type,
even if the instruction specified a different alignment.
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
Revert "[X86] matchBinaryPermuteShuffle - match to X86ISD::SHLD funnel shift patterns" (#200546)
Reverts llvm/llvm-project#200136 while I investigate a miscompilation report.
[Flang][OpenMP] Reject array sections and subobjects in LINEAR clause (#197430)
Array sections like a(:,1,1) and array elements like a(1) in a LINEAR
clause cause a crash during MLIR-to-LLVM IR translation because the
semantic checker doesn't catch them.
This adds a call to CheckVarIsNotPartOfAnotherVar for the LINEAR clause,
which is the same check used by PRIVATE and FIRSTPRIVATE to reject
subobject designators.
Fixes: https://github.com/llvm/llvm-project/issues/196068
Co-authored-by: Chandra Ghale <ghale at pe34genoa.hpc.amslabs.hpecorp.net>
[Flang][Parser] Handle compiler directives inside INTERFACE blocks (#198516)
Unrecognized !DIR$ directives between interface specifications currently
cause cascading parse errors because the grammar for
InterfaceSpecification has no path to consume them. This patch adds
CompilerDirective as a valid alternative — matching how
InternalSubprogram and ModuleSubprogram already handle this — so that
unrecognized directives produce the expected warning instead of a fatal
parse failure.
Fixes: https://github.com/llvm/llvm-project/issues/198289
---------
Co-authored-by: Chandra Ghale <ghale at pe34genoa.hpc.amslabs.hpecorp.net>
[SelectionDAG] Remove redundant asserts in WidenVecRes_ATOMIC_LOAD (#200159)
These asserts duplicate guarantees already provided elsewhere:
- `isVector()` checks are redundant because `findMemType()` calls
`WidenVT.getVectorElementType()` and `WidenVT.isScalableVector()`
internally, and `WidenVecRes_ATOMIC_LOAD` is only reached from the
`ATOMIC_LOAD` case in `WidenVectorResult`, which is the vector path.
- The element-type and scalability consistency between `LdVT` and
`WidenVT` is a property of `GetWidenedVector` / `getTypeToTransformTo`.
Follow-up to feedback on #197618.