[mlir][LLVM] Add the `byte` type to the LLVM dialect
This PR ports the newly added `byte` type from LLVM IR to mlir's LLVM dialect.
The simplest motivation for the byte type is being able to implement `memcpy` in LLVM IR. This was previously not possible: due to the rules around conversions between integers and pointers (which happen implicitly, e.g. during loads), partial poison and pointer provenance were not preserved.
No alternative to integers existed that could be used for poison- and provenance-preserving SSA values. The byte type solves exactly this issue.
Frontends are encouraged to use it when needed for better optimization capabilities.
Currently, the only operation whose semantics change with `byte` is `bitcast`. It now allows casting between `byte` and `ptr` (which is not allowed between integers and pointers).
Corresponding LLVM commit: https://github.com/llvm/llvm-project/commit/80f2ef70f592
Assisted by Claude & Gemini
[X86] Remove shouldCastAtomicLoadInIR; use DAG combine instead (#199520)
Remove X86's shouldCastAtomicLoadInIR override that cast FP atomic loads
to integer at the IR level. Instead, handle this in a pre-legalize DAG
combine (combineAtomicLoad) that rewrites FP/FP-vector atomic loads to
integer atomic loads plus a bitcast.
This and #199310, which adds the necessary cmpxchg support for
non-integer atomic loads in AtomicExpand, are a response to
https://github.com/llvm/llvm-project/pull/148899 regarding
`atomic_vec4_float` in `atomic-load-store.ll`.
Stacked above #201303.
[LV] Use ResumeForEpilogue for header phi resume in epilogue plan (NFC) (#203786)
Pass the ResumeForEpilogue VPInstructions created by
preparePlanForMainVectorLoop into preparePlanForEpilogueVectorLoop and
obtain the resume IR value from ResumeForEpilogue::getUnderlyingValue().
[LV] Drop the mask of a predicated store masked by the header mask. (#201676)
Drop the mask of a predicated store if it is masked by the header mask
(which is guaranteed to be true for at least the first lane) and both the
stored value and the address are uniform across VF and UF.
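A toy model of why dropping the mask is safe (hypothetical names, not VPlan code): a header mask is a prefix of active lanes, so lane 0 always stores, and when every lane writes the same value to the same address the masked store has the same effect as a single unconditional store. The sketch models one unrolled part only; uniformity across UF is assumed rather than modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4;

// Masked form: each active lane performs the (uniform) store.
void maskedUniformStore(int* addr, int value, const std::array<bool, VF>& mask) {
  for (std::size_t lane = 0; lane < VF; ++lane)
    if (mask[lane])
      *addr = value; // same address and same value in every lane
}

// Unmasked form: a single unconditional store.
void unmaskedUniformStore(int* addr, int value) { *addr = value; }

// Header mask for a vector iteration with `remainingIters` (>= 1) scalar
// iterations left: lanes [0, remainingIters) are active, so lane 0 always is.
std::array<bool, VF> headerMask(std::size_t remainingIters) {
  std::array<bool, VF> m{};
  for (std::size_t lane = 0; lane < VF; ++lane)
    m[lane] = lane < remainingIters;
  return m;
}
```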
A similar version for loads was included in
https://github.com/llvm/llvm-project/pull/196630, but the restriction to
values uniform across VFs and UFs meant it had no impact in practice.
For stores, this results in some improvements after
https://github.com/llvm/llvm-project/pull/196632.
PR: https://github.com/llvm/llvm-project/pull/201676
[BOLT] Zero alignment padding when reusing old text section (#202375)
With --use-old-text, the output starts as a byte-for-byte copy of the
input. Alignment padding between sections could retain stale data from
the original binary. Zero the padding so that the result matches what
writing the sections to new file offsets would produce.
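A minimal sketch of the idea, assuming the output image is a byte buffer and each section is described by a hypothetical `SectionRange` (offset and size) sorted by offset; this is not BOLT's code. Only gaps between consecutive sections are cleared; section contents are left untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: a section occupies [Offset, Offset + Size) in the output file.
struct SectionRange {
  std::size_t Offset;
  std::size_t Size;
};

// Zero the alignment padding between consecutive sections.
// Assumes Sections is sorted by Offset, non-overlapping, and within File.
void zeroPaddingBetween(std::vector<std::uint8_t>& File,
                        const std::vector<SectionRange>& Sections) {
  for (std::size_t I = 0; I + 1 < Sections.size(); ++I) {
    std::size_t GapStart = Sections[I].Offset + Sections[I].Size;
    std::size_t GapEnd = Sections[I + 1].Offset;
    if (GapEnd > GapStart)
      std::fill(File.data() + GapStart, File.data() + GapEnd, std::uint8_t{0});
  }
}
```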
[Github] Remove unnecessary packages from github-automation container (#203358)
This cuts the container size from 654 MB to 229 MB. This is mainly due
to removing the python3-pip package, which was pulling in some big
dependencies like gcc.
A smaller container will be faster to download, which will speed up the
workflow runs; also, having fewer packages means a smaller attack
surface for the container.
[llvm-cov] Replace binary test blobs with text formats
Replace .covmapping and .profdata binary blobs with .yaml (obj2yaml)
and .proftext respectively. The test now uses yaml2obj and
llvm-profdata merge to produce inputs at test time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
[flang] Add support for the IARGC and GETARG legacy intrinsics (#196425)
Adds semantic checking and lowering, along with semantic and lowering
tests for the legacy GNU intrinsics 'IARGC()' and 'GETARG(POS, VALUE)'.
Although these could just be added as aliases to the standard
COMMAND_ARGUMENT_COUNT and GET_COMMAND_ARGUMENT intrinsics, they were
implemented as separate intrinsics because of some semantic differences
between them:
* IARGC always returns INTEGER(4), whereas COMMAND_ARGUMENT_COUNT
returns a default INTEGER, which could have a different kind.
* GETARG has only two arguments, both of which are required.
* GETARG's POS argument accepts any integer type of width less than or
equal to the default integer kind, while GET_COMMAND_ARGUMENT only
accepts default integers.
Fixes #158438
[RISCV] Remove manual compression of SSPUSH in RISCVFrameLowering.cpp. NFC (#203635)
We used to emit a Zcmop instruction here, which required manual
compression. Since we now emit a Zicfiss instruction, we can rely on
CompressPat to do the right thing.
[X86] Fold XOR of two VGF2P8AFFINEQB instructions with same matrix (#199146)
Adds an optimization to fold a XOR between two `vgf2p8affineqb`
instructions that share the same matrix by XORing their sources
beforehand. This patch:
- Can eliminate one `vgf2p8affineqb` instruction.
- Does not fire if either affine has multiple uses, which prevents an increase in code size.
- Includes test coverage for both positive and negative cases.
Fixes #196879
[llubi] Add support for constant expressions (#203746)
This patch adds support for most kinds of constant expressions, except
for ptrtoint/inttoptr. Casting between pointers and integers is
stateful, so they cannot be cached. I plan to implement them in
subsequent patches. ptrtoaddr is also supported in this patch to block
constant folding.
The logic in `evaluateConstantExpression` duplicates the interpreter's
code in the `visit*` methods, but I think that is acceptable. Only the GEP
computation is reused.
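A toy illustration (hypothetical types and names, not llubi's code) of why stateful casts are excluded from caching: stateless expressions can be evaluated once and memoized, but an expression whose result depends on interpreter state, modeled here as a counter, may produce a different value on each evaluation, so a cached result could be wrong.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical constant expression: an ID, whether its evaluation depends on
// interpreter state (standing in for ptrtoint/inttoptr), and one operand.
struct ConstExpr {
  int Id;
  bool IsStateful;
  std::int64_t Operand;
};

class ConstantEvaluator {
public:
  int Evaluations = 0; // number of times compute() actually ran

  std::int64_t evaluate(const ConstExpr& E) {
    if (!E.IsStateful)
      if (auto It = Cache.find(E.Id); It != Cache.end())
        return It->second; // reuse the memoized stateless result
    ++Evaluations;
    std::int64_t V = compute(E);
    if (!E.IsStateful)
      Cache.emplace(E.Id, V); // only stateless results are cached
    return V;
  }

private:
  std::int64_t compute(const ConstExpr& E) {
    // Stateful expressions read and advance interpreter state.
    return E.IsStateful ? E.Operand + State++ : E.Operand * 2;
  }

  std::map<int, std::int64_t> Cache;
  std::int64_t State = 0;
};
```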
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path and the remaining unsupported cases.
clang/AMDGPU: Split out target ID flags in TranslateArgs.
Change how xnack and sramecc are processed. Introduce
-mxnack/-mno-xnack and -msramecc/-mno-sramecc flags.
When the target is first parsed in TranslateArgs, synthesize
the appropriate flag for the toolchain. This avoids
special case feature string fixups in getAMDGPUTargetFeatures,
and also avoids an extra parse of the target ID.
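A hypothetical sketch (not clang's implementation) of synthesizing flags from a target ID string such as `gfx90a:sramecc+:xnack-`: a trailing `+` or `-` on `xnack` or `sramecc` maps to the corresponding `-m`/`-mno-` flag named above. Other features and malformed entries are simply skipped in this sketch.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct ParsedTargetID {
  std::string Processor;
  std::vector<std::string> Flags;
};

// Split "processor:feature+:feature-" and synthesize -mX / -mno-X flags.
ParsedTargetID synthesizeTargetIDFlags(const std::string& TargetID) {
  ParsedTargetID Result;
  std::stringstream SS(TargetID);
  std::string Part;
  bool First = true;
  while (std::getline(SS, Part, ':')) {
    if (First) { // the first component is the processor name
      Result.Processor = Part;
      First = false;
      continue;
    }
    if (Part.size() < 2)
      continue;
    char Sign = Part.back();
    std::string Feature = Part.substr(0, Part.size() - 1);
    if (Feature != "xnack" && Feature != "sramecc")
      continue; // only the two ABI modifiers are handled here
    if (Sign == '+')
      Result.Flags.push_back("-m" + Feature);
    else if (Sign == '-')
      Result.Flags.push_back("-mno-" + Feature);
  }
  return Result;
}
```

With no `+`/`-` specifier for a feature, no flag is synthesized, which is the case the explicit flags can override.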
In the future this will also simplify tracking these ABI
modifiers in a module flag.
As a side effect, these flags can be used to override the case where
the target ID has no specifier. They do not fully replace the target ID
syntax, as there's no way to represent compiling both modes for the
same subtarget.
I didn't bother trying to forward these flags on the main command
line without being specified to the offload device, but I suppose
[3 lines not shown]
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
[libc++][NFC] Simplify `optional<T>` and `optional<T&>` a bit (#203665)
- Make `optional<T&>`'s iterator base derive directly from the storage
  base instead of inheriting the empty bases, which allows removing the
  `is_lvalue_reference_v` conditions in the empty bases
- Move the `__is_constructible_for_optional_{meow}` variables closer to
  `make_optional`, since that's the only place they're currently useful
- Change the SFINAE for the iterator availability to use concepts
instead
The above should make it easier to split things up in an upcoming patch.