[LV] Use ResumeForEpilogue for header phi resume in epilogue plan (NFC) (#203786)
Pass the ResumeForEpilogue VPInstructions created by
preparePlanForMainVectorLoop into preparePlanForEpilogueVectorLoop and
get the resume IR from ResumeForEpilogue::getUnderlyingValue().
[LV] Drop the mask of a predicated store masked by the header mask. (#201676)
Drop the mask of a predicated store when it is masked by the header mask
(which is guaranteed to be true at least for the first lane) and both the
stored value and the address are uniform across VF and UF.
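A scalar simulation illustrates why the mask can be dropped. It assumes the header mask has the usual `IV + Lane < TripCount` form; `headerMaskActive`, `maskedUniformStore` and `sameResult` are hypothetical names rather than VPlan APIs, and UF is not modeled separately (extra parts would only repeat the same uniform store). The vector body runs only while `IV < TripCount`, so lane 0 is active in every vector iteration. A uniform value stored to a uniform address therefore leaves memory in the same state whether one active lane writes it or several do.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the header mask: in the vector iteration starting at
// scalar index IV, lane Lane is active only while IV + Lane < TripCount.
static bool headerMaskActive(std::size_t IV, std::size_t Lane,
                             std::size_t TripCount) {
  return IV + Lane < TripCount;
}

// Predicated store: every active lane writes the same (uniform) value to the
// same (uniform) address.
static void maskedUniformStore(int &Addr, int Value, std::size_t IV,
                               std::size_t VF, std::size_t TripCount) {
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    if (headerMaskActive(IV, Lane, TripCount))
      Addr = Value;
}

// Runs the vectorized loop once with the masked store and once with the mask
// dropped (one unconditional store per vector iteration), then compares the
// final memory contents.
bool sameResult(std::size_t TripCount, std::size_t VF) {
  assert(VF > 0 && "VF must be positive");
  int Masked = -1, Unmasked = -1;
  for (std::size_t IV = 0; IV < TripCount; IV += VF) {
    // Uniform across lanes, but allowed to change between vector iterations.
    int Value = static_cast<int>(IV) * 3 + 1;
    maskedUniformStore(Masked, Value, IV, VF, TripCount);
    Unmasked = Value; // mask dropped
  }
  return Masked == Unmasked;
}
```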
A similar version for loads was included in
https://github.com/llvm/llvm-project/pull/196630, but, restricted to the
uniform-across-VFs-and-UFs case, it did not have an impact in practice.
For stores, this results in some improvements after
https://github.com/llvm/llvm-project/pull/196632.
PR: https://github.com/llvm/llvm-project/pull/201676
[BOLT] Zero alignment padding when reusing old text section (#202375)
With --use-old-text, the output starts as a byte-for-byte copy of the
input. Alignment padding between sections could retain stale data from
the original binary. Zero the padding so the result matches the output
produced when sections are written to new file offsets.
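The idea can be sketched roughly as follows. This is not BOLT's rewriting code, and `SectionRange` and `zeroPaddingBetweenSections` are hypothetical names. Once the sections are sorted by file offset, each padding gap runs from the end of one section to the start of the next. Zeroing those gaps removes whatever bytes the copied input happened to have there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical file range of an output section.
struct SectionRange {
  uint64_t Offset;
  uint64_t Size;
};

// Zero the bytes between consecutive sections in an output buffer that
// started as a copy of the input file. Sections are assumed not to overlap.
void zeroPaddingBetweenSections(std::vector<uint8_t> &Buf,
                                std::vector<SectionRange> Sections) {
  std::sort(Sections.begin(), Sections.end(),
            [](const SectionRange &A, const SectionRange &B) {
              return A.Offset < B.Offset;
            });
  for (std::size_t I = 1; I < Sections.size(); ++I) {
    uint64_t GapBegin = Sections[I - 1].Offset + Sections[I - 1].Size;
    uint64_t GapEnd = Sections[I].Offset;
    assert(GapEnd <= Buf.size() && "section lies outside the buffer");
    if (GapBegin < GapEnd)
      std::fill(Buf.begin() + static_cast<std::ptrdiff_t>(GapBegin),
                Buf.begin() + static_cast<std::ptrdiff_t>(GapEnd), uint8_t{0});
  }
}
```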
[Github] Remove unnecessary packages from github-automation container (#203358)
This cuts the container size from 654 MB to 229 MB, mainly by removing
the python3-pip package, which was pulling in some big dependencies like
gcc.
A smaller container is faster to download, which speeds up the workflow
runs. Having fewer packages also means a smaller attack surface for the
container.
[llvm-cov] Replace binary test blobs with text formats
Replace .covmapping and .profdata binary blobs with .yaml (obj2yaml)
and .proftext respectively. The test now uses yaml2obj and
llvm-profdata merge to produce inputs at test time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[flang] Add support for the IARGC and GETARG legacy intrinsics (#196425)
Adds semantic checking and lowering, along with semantic and lowering
tests for the legacy GNU intrinsics 'IARGC()' and 'GETARG(POS, VALUE)'.
Although these could just be added as aliases to the standard
COMMAND_ARGUMENT_COUNT and GET_COMMAND_ARGUMENT intrinsics, they were
implemented as separate intrinsics because of some semantic differences
between them:
* IARGC always returns INTEGER(4), whereas COMMAND_ARGUMENT_COUNT
returns a default INTEGER, which could have a different kind.
* GETARG has only two arguments, both of which are required.
* GETARG's POS argument accepts any integer type of width less than or
equal to the default integer kind, while GET_COMMAND_ARGUMENT only
accepts default integers.
Fixes #158438
[RISCV] Remove manual compression of SSPUSH in RISCVFrameLowering.cpp. NFC (#203635)
We used to emit a Zcmop instruction here, which required manual
compression. Since we now emit a Zicfiss instruction, we can rely on
CompressPat to do the right thing.
[X86] Fold XOR of two VGF2P8AFFINEQB instructions with same matrix (#199146)
Adds an optimization to fold a XOR between two `vgf2p8affineqb`
instructions that share the same matrix by XORing their sources
beforehand. This patch:
- Can eliminate one `vgf2p8affineqb` instruction.
- Is not applied if either affine has multiple uses, preventing an increase in code size.
- Includes test coverage for both positive and negative cases.
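The fold relies on the affine transform being linear over GF(2). With the same matrix `A`, `(A*x ^ b1) ^ (A*y ^ b2) == A*(x ^ y) ^ (b1 ^ b2)`. The scalar model below follows the per-byte pseudocode for GF2P8AFFINEQB in Intel's SDM and checks the identity exhaustively for one byte lane. `affineByte` and `xorFoldHolds` are illustrative names. How the actual combine handles the two immediates is not stated above, so the sketch simply XORs them together.

```cpp
#include <cassert>
#include <cstdint>

// One byte of GF2P8AFFINEQB, per the Intel SDM pseudocode:
// result bit I = parity(Matrix.byte[7 - I] & X) XOR Imm.bit[I].
uint8_t affineByte(uint64_t Matrix, uint8_t X, uint8_t Imm) {
  uint8_t Result = 0;
  for (int I = 0; I < 8; ++I) {
    uint8_t Row = static_cast<uint8_t>(Matrix >> (8 * (7 - I)));
    unsigned Bits = Row & X;
    unsigned Parity = 0;
    for (; Bits != 0; Bits &= Bits - 1) // clear the lowest set bit
      Parity ^= 1u;
    unsigned Bit = Parity ^ ((Imm >> I) & 1u);
    Result = static_cast<uint8_t>(Result | (Bit << I));
  }
  return Result;
}

// Checks, for every pair of input bytes, that XORing two affine results that
// share a matrix equals one affine applied to the XOR of the inputs, with the
// immediates XORed together.
bool xorFoldHolds(uint64_t Matrix, uint8_t Imm1, uint8_t Imm2) {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      unsigned Separate = affineByte(Matrix, static_cast<uint8_t>(X), Imm1) ^
                          affineByte(Matrix, static_cast<uint8_t>(Y), Imm2);
      unsigned Folded = affineByte(Matrix, static_cast<uint8_t>(X ^ Y),
                                   static_cast<uint8_t>(Imm1 ^ Imm2));
      if (Separate != Folded)
        return false;
    }
  return true;
}
```

With the identity matrix `0x0102040810204080`, `affineByte` returns its input unchanged when the immediate is zero. This makes a quick sanity check of the bit ordering.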
Fixes #196879
[llubi] Add support for constant expressions (#203746)
This patch adds support for most kinds of constant expressions, except
for ptrtoint/inttoptr. Casting between pointers and integers is
stateful, so they cannot be cached. I plan to implement them in
subsequent patches. ptrtoaddr is also supported in this patch to block
constant folding.
The logic in `evaluateConstantExpression` duplicates the interpreter's
code in the `visit*` methods, but I think that is acceptable. Only the GEP
computation is reused.
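A toy evaluator can sketch the caching constraint. Everything here is hypothetical and not llubi's code: `ConstExpr`, `ConstEvaluator`, and `Kind::Stateful`, which stands in for ptrtoint/inttoptr. The evaluator also treats any expression with a stateful operand as uncacheable. That rule is an assumption about what "cannot be cached" implies, not something the patch states.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified constant-expression node.
enum class Kind { Literal, Add, Mul, Stateful };

struct ConstExpr {
  Kind K;
  int64_t IntValue;                   // used by Kind::Literal
  std::vector<const ConstExpr *> Ops; // operands of Add, Mul, Stateful
};

class ConstEvaluator {
public:
  int64_t State = 0; // mutable interpreter state read by Kind::Stateful

  int64_t evaluate(const ConstExpr *E) { return eval(E).Value; }

private:
  struct Result {
    int64_t Value;
    bool Cacheable;
  };
  std::unordered_map<const ConstExpr *, int64_t> Cache;

  Result eval(const ConstExpr *E) {
    auto It = Cache.find(E);
    if (It != Cache.end())
      return {It->second, true};
    Result R{0, true};
    switch (E->K) {
    case Kind::Literal:
      R.Value = E->IntValue;
      break;
    case Kind::Add:
    case Kind::Mul: {
      Result L = eval(E->Ops[0]);
      Result Rhs = eval(E->Ops[1]);
      R.Value = E->K == Kind::Add ? L.Value + Rhs.Value : L.Value * Rhs.Value;
      // A pure operation is only cacheable if all of its operands are.
      R.Cacheable = L.Cacheable && Rhs.Cacheable;
      break;
    }
    case Kind::Stateful:
      // Stand-in for a pointer/integer cast: depends on state at evaluation.
      R.Value = eval(E->Ops[0]).Value + State;
      R.Cacheable = false;
      break;
    }
    if (R.Cacheable)
      Cache.emplace(E, R.Value);
    return R;
  }
};
```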
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path and the remaining unsupported cases.
clang/AMDGPU: Split out target ID flags in TranslateArgs.
Change how xnack and sramecc are processed. Introduce
-mxnack/-mno-xnack and -msramecc/-mno-sramecc flags.
When the target is first parsed in TranslateArgs, synthesize
the appropriate flag for the toolchain. This avoids
special case feature string fixups in getAMDGPUTargetFeatures,
and also avoids an extra parse of the target ID.
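The synthesis step can be sketched minimally, assuming a target ID of the form `processor:feature+` or `processor:feature-` (possibly with several features), where the only features are `xnack` and `sramecc`. `splitTargetID` and `ParsedTargetID` are hypothetical and much looser than clang's real target ID handling, since they do no duplicate or per-processor validity checks. The sketch only shows how each `+`/`-` specifier maps onto the new `-mxnack`/`-mno-xnack` and `-msramecc`/`-mno-sramecc` flags.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of splitting a target ID such as "gfx90a:sramecc+:xnack-".
struct ParsedTargetID {
  std::string Processor;
  std::vector<std::string> Flags; // synthesized driver flags
};

// Maps each "feature+" / "feature-" specifier to -mfeature / -mno-feature.
// Returns std::nullopt for any specifier other than xnack or sramecc.
std::optional<ParsedTargetID> splitTargetID(const std::string &ID) {
  std::stringstream SS(ID);
  ParsedTargetID Result;
  std::getline(SS, Result.Processor, ':');
  std::string Part;
  while (std::getline(SS, Part, ':')) {
    if (Part.size() < 2)
      return std::nullopt;
    char Sign = Part.back();
    std::string Feature = Part.substr(0, Part.size() - 1);
    if ((Feature != "xnack" && Feature != "sramecc") ||
        (Sign != '+' && Sign != '-'))
      return std::nullopt;
    Result.Flags.push_back(Sign == '+' ? "-m" + Feature : "-mno-" + Feature);
  }
  return Result;
}
```

A target ID with no specifier, such as `gfx1030`, yields no synthesized flags.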
In the future this will also simplify tracking these ABI
modifiers in a module flag.
As a side effect, these flags can be used to override the case where
the target ID has no xnack or sramecc specifier. These do not fully
replace the target ID syntax, as there's no way to represent compiling
both modes for the same subtarget.
I didn't bother trying to forward these flags on the main command
line without being specified to the offload device, but I suppose
[3 lines not shown]
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
[libc++][NFC] Simplify `optional<T>` and `optional<T&>` a bit (#203665)
- Make `optional<T&>`'s iterator base directly from the storage base
instead of inheriting the empty bases, allowing us to remove the
`is_lvalue_reference_v` conditions in the empty bases
- Move the `__is_constructible_for_optional_{meow}` variables closer to
`make_optional` since that's the only place they're really useful for
now
- Change the SFINAE for the iterator availability to use concepts
instead
The above should make it easier to split up in an upcoming patch.
[mlir][OpenMP] Translate task_reduction on taskgroup
Add LLVM IR translation support for the task_reduction clause on
omp.taskgroup.
The translation builds task-reduction descriptors for the listed reduction
variables and emits the runtime initialization before the taskgroup body.
The reducer init and combiner callbacks are generated from the corresponding
omp.declare_reduction regions.
This patch keeps taskloop reduction and in_reduction translation unsupported;
those remain follow-up work. Unsupported task_reduction forms are diagnosed
instead of being lowered incorrectly.
Add MLIR translation tests for taskgroup task_reduction, multiple reducers,
plain taskgroup translation, and remaining unsupported cases.
[llvm-cov] Fix undercounting lines wrapped by gap regions
Lines with no region entry that are wrapped by a gap region were
reported with the gap's count (often 0), even when non-gap segments
on the line indicated the line was actually executed. This caused
llvm-cov to undercount coverage for lines that continue a covered
region after a gap (e.g., closing braces, simple statements following
an if/else).
Check for non-gap segments with HasCount on such lines and use their
max count instead of the gap region's count.
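The adjusted rule can be sketched as follows. `Segment` and `wrappedLineCount` are simplified, hypothetical stand-ins loosely modeled on llvm-cov's coverage segments: each has a count, a flag for whether that count is valid, and a flag for whether it is a gap region. They are not the actual llvm-cov code. The gap's own count is still used when nothing else on the line carries a count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a coverage segment.
struct Segment {
  uint64_t Count;
  bool HasCount;
  bool IsGapRegion;
};

// Count reported for a line with no region entry. `Wrapped` is the segment in
// effect at the start of the line; `LineSegments` start on the line itself.
uint64_t wrappedLineCount(const Segment &Wrapped,
                          const std::vector<Segment> &LineSegments) {
  if (!Wrapped.IsGapRegion)
    return Wrapped.Count;
  bool Found = false;
  uint64_t MaxCount = 0;
  for (const Segment &S : LineSegments)
    if (S.HasCount && !S.IsGapRegion) {
      Found = true;
      MaxCount = std::max(MaxCount, S.Count);
    }
  // Fall back to the gap's count only when no non-gap segment has a count.
  return Found ? MaxCount : Wrapped.Count;
}
```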
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[llubi] Add support for exposed provenance (#200596)
This patch implements the semantics of exposed provenance, as described
in [nikic's RFC draft](https://hackmd.io/@nikic/SJBt4mFCll) and
[Miri](https://doc.rust-lang.org/beta/nightly-rustc/miri/enum.Provenance.html).
The provenance of an inttoptr is marked as "wildcard", which picks one
from previously exposed provenances each time a memory access is
performed. For angelic non-determinism, a snapshot of the exposed
provenance set is recorded when inttoptr executes. When a memory access
is performed, all invalid provenances are masked out. If we fail to pick
one, it is UB.
Since all memory objects in llubi are non-overlapping (i.e., there is at
most one memory object satisfying `Obj->inBounds(Addr)` for each
address), we can determine a unique memory object for a wildcard
provenance when the first memory access is performed.
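The resolution step can be modeled under the assumptions described above: objects do not overlap, a snapshot of the exposed set is taken at inttoptr, and an access is UB when no exposed provenance fits. `MemObject`, `WildcardProvenance`, `intToPtr` and `resolveAccess` are hypothetical names, not llubi's API. Keeping the provenance bound to one object after the first access is one reading of "determine a unique memory object ... when the first memory access is performed".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical live memory object occupying [Base, Base + Size).
struct MemObject {
  int ID;
  uint64_t Base;
  uint64_t Size;
  bool inBounds(uint64_t Addr) const {
    return Addr >= Base && Addr < Base + Size;
  }
};

// Wildcard provenance: the exposed set captured when inttoptr executed, plus
// the object it is bound to after the first memory access.
struct WildcardProvenance {
  std::set<int> Snapshot;
  std::optional<int> Resolved;
};

WildcardProvenance intToPtr(const std::set<int> &ExposedNow) {
  return {ExposedNow, std::nullopt};
}

// Returns the object accessed through P at Addr, or std::nullopt for UB.
std::optional<int> resolveAccess(WildcardProvenance &P, uint64_t Addr,
                                 const std::vector<MemObject> &LiveObjects) {
  for (const MemObject &O : LiveObjects) {
    if (!O.inBounds(Addr))
      continue;
    // Objects never overlap, so O is the only candidate for Addr.
    bool Allowed =
        P.Resolved ? *P.Resolved == O.ID : P.Snapshot.count(O.ID) != 0;
    if (!Allowed)
      return std::nullopt;
    P.Resolved = O.ID;
    return P.Resolved;
  }
  return std::nullopt; // no live object at Addr
}
```

Objects exposed after the inttoptr executes are not part of the snapshot and so cannot justify an access through that pointer.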
This matches Miri's behavior. Another variant is to resolve the memory
object when inttoptr executes, which gives a limited provenance set
[14 lines not shown]