[RISCV] Remove manual compression of SSPUSH in RISCVFrameLowering.cpp. NFC (#203635)
We used to emit a Zcmop instruction here, which required manual
compression. Since we now emit a Zicfiss instruction, we can rely on
CompressPat to do the right thing.
[X86] Fold XOR of two VGF2P8AFFINEQB instructions with same matrix (#199146)
Adds an optimization to fold a XOR between two `vgf2p8affineqb`
instructions that share the same matrix by XORing their sources
beforehand. This patch:
- Can eliminate one `vgf2p8affineqb` instruction.
- Doesn't occur if either affine is multi use, preventing an increase in code size.
- Includes test coverage for both positive and negative cases.
Fixes #196879
[llubi] Add support for constant expressions (#203746)
This patch adds support for most kinds of constant expressions, except
for ptrtoint/inttoptr. Casting between pointers and integers is
stateful, so they cannot be cached. I plan to implement them in
subsequent patches. ptrtoaddr is also supported in this patch to block
constant folding.
The logic in `evaluateConstantExpression` duplicates the interpreter's
code in `visit*` methods. But I think it is acceptable. Only the GEP
computation is reused.
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path and the remaining unsupported cases.
clang/AMDGPU: Split out target ID flags in TranslateArgs.
Change how xnack and sramecc are processed. Introduce
-mxnack/-mno-xnack and -msramecc/-mno-sramecc flags.
When the target is first parsed in TranslateArgs, synthesize
the appropriate flag for the toolchain. This avoids
special case feature string fixups in getAMDGPUTargetFeatures,
and also avoids an extra parse of the target ID.
In the future this will also simplify tracking these ABI
modifiers in a module flag.
As a side-effect, you can use these flags to override the
no specifier case with the flags. These do not fully replace
the target ID syntax, as there's no way to represent compiling
both modes for the same subtarget.
I didn't bother trying to forward these flags on the main command
line without being specified to the offload device, but I suppose
[3 lines not shown]
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
[libc++][NFC] Simplify `optional<T>` and `optional<T&>` a bit (#203665)
- Make `optional<T&>`'s iterator base directly from the storage base
instead of inheriting the empty bases, allowing us to remove the
`is_lvalue_reference_v` conditions in the empty bases
- Move the `__is_constructible_for_optional_{meow}` variables closer to
`make_optional` since that's the only place they're really useful for
now
- Change the SFINAE for the iterator availability to use concepts
instead
The above should make it easier to split up in an upcoming patch.
[mlir][OpenMP] Translate task_reduction on taskgroup
Add LLVM IR translation support for the task_reduction clause on
omp.taskgroup.
The translation builds task-reduction descriptors for the listed reduction
variables and emits the runtime initialization before the taskgroup body.
The reducer init and combiner callbacks are generated from the corresponding
omp.declare_reduction regions.
This patch keeps taskloop reduction and in_reduction translation unsupported;
those remain follow-up work. Unsupported task_reduction forms are diagnosed
instead of being lowered incorrectly.
Add MLIR translation tests for taskgroup task_reduction, multiple reducers,
plain taskgroup translation, and remaining unsupported cases.
[llvm-cov] Fix undercounting lines wrapped by gap regions
Lines with no region entry that are wrapped by a gap region were
reported with the gap's count (often 0), even when non-gap segments
on the line indicated the line was actually executed. This caused
llvm-cov to undercount coverage for lines that continue a covered
region after a gap (e.g., closing braces, simple statements following
an if/else).
Check for non-gap segments with HasCount on such lines and use their
max count instead of the gap region's count.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llubi] Add support for exposed provenance (#200596)
This patch implements the semantics of exposed provenance, as described
in [nikic's RFC draft](https://hackmd.io/@nikic/SJBt4mFCll) and
[Miri](https://doc.rust-lang.org/beta/nightly-rustc/miri/enum.Provenance.html).
The provenance of an inttoptr is marked as "wildcard", which picks one
from previously exposed provenances each time a memory access is
performed. For angelic non-determinism, a snapshot of the exposed
provenance set is recorded when inttoptr executes. When a memory access
is performed, all invalid provenances are masked out. If we fail to pick
one, it is UB.
Since all memory objects in llubi are non-overlapping (i.e., there is at
most one memory object satisfying `Obj->inBounds(Addr)` for each
address), we can determine a unique memory object for a wildcard
provenance when the first memory access is performed.
This matches Miri's behavior. Another variant is to resolve the memory
object when inttoptr executes, which gives a limited provenance set
[14 lines not shown]
[llvm-cov] Fix undercounting lines wrapped by gap regions
Lines with no region entry that are wrapped by a gap region were
reported with the gap's count (often 0), even when non-gap segments
on the line indicated the line was actually executed. This caused
llvm-cov to undercount coverage for lines that continue a covered
region after a gap (e.g., closing braces, simple statements following
an if/else).
Check for non-gap segments with HasCount on such lines and use their
max count instead of the gap region's count.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test uses a minimal reproducer: a scoped block with a never-taken
early return followed by a statement. The closing "}" produces a gap
region that wraps to the next line, suppressing its execution count.
The extra statement after the if-block is required — without it,
clang emits a region entry (MinRegionCount > 0) and the bug doesn't
trigger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test uses a minimal reproducer: a scoped block with a never-taken
early return followed by a statement. The closing "}" produces a gap
region that wraps to the next line, suppressing its execution count.
The extra statement after the if-block is required — without it,
clang emits a region entry (MinRegionCount > 0) and the bug doesn't
trigger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test uses a minimal reproducer: a scoped block with a never-taken
early return followed by a statement. The closing "}" produces a gap
region that wraps to the next line, suppressing its execution count.
The extra statement after the if-block is required — without it,
clang emits a region entry (MinRegionCount > 0) and the bug doesn't
trigger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test uses a minimal reproducer: a scoped block with a never-taken
early return followed by a statement. The closing "}" produces a gap
region that wraps to the next line, suppressing its execution count.
The extra statement after the if-block is required — without it,
clang emits a region entry (MinRegionCount > 0) and the bug doesn't
trigger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test uses a minimal reproducer: a scoped block with a never-taken
early return followed by a statement. The closing "}" produces a gap
region that wraps to the next line, suppressing its execution count.
The extra statement after the if-block is required — without it,
clang emits a region entry (MinRegionCount > 0) and the bug doesn't
trigger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[SLP]Keep ordered fadd reductions sequential
tryToReduceOrdered vectorizes reductions that are not associative (e.g. an
fadd with reassoc but without nsz). The accumulation order must be
preserved, but the reduction was costed and emitted with reassoc, and the
whole tree was rotated to memory order by reorderTopToBottom, which can
change the result of the sequential reduction.
Drop reassoc from the flags used to cost and emit the reduction so the
generated llvm.vector.reduce.fadd stays ordered, and drop the top-to-bottom
reorder so the reduced values keep their original accumulation order.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/203741
[llvm-cov] Add failing test for gap region line coverage bug
LineCoverageStats incorrectly reports lines as uncovered when a gap
region with count=0 wraps into a line that has non-entry segments
with count > 0.
The test demonstrates this with two scoped blocks containing
never-taken early returns. The statement between them ("int result =
42;") and after them ("return true;") are genuinely executed but
reported with count=0 because the gap region from each block's
closing "}" incorrectly suppresses the line's execution count.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>