[AArch64][SelectionDAG] Generate subs+csel for usub.sat (#193203)
Fixes https://github.com/llvm/llvm-project/issues/191488
This is a regression from
https://github.com/llvm/llvm-project/pull/170076. The patch avoids the
generic lowering of usub.sat to X - zext(X != 0) on AArch64 by making
the constraint of this transformation stricter via an extra
isOperationLegalOrCustom guard on USUBO_CARRY. All other backends still
receive the generic lowering implemented in the original patch.
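The identity behind the generic lowering can be checked numerically. The sketch below assumes the transformation targets `usub.sat(X, 1)` (the message does not spell out the second operand) and models 8-bit unsigned values. `usub_sat` and `generic_lowering` are illustrative names, not LLVM APIs.

```python
BITS = 8  # assumed width for the demonstration
MASK = (1 << BITS) - 1


def usub_sat(x: int, y: int) -> int:
    """Unsigned saturating subtraction: clamps at 0 instead of wrapping."""
    return x - y if x >= y else 0


def generic_lowering(x: int) -> int:
    """X - zext(X != 0), computed modulo 2**BITS."""
    return (x - int(x != 0)) & MASK


# Compare both forms over every 8-bit input.
mismatches = [x for x in range(1 << BITS) if usub_sat(x, 1) != generic_lowering(x)]
print(mismatches)
```

The two forms agree on every input, so the choice between them is purely about which instruction sequence a backend prefers.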
[InstCombine] Reland #165975: Fix #163110: Support peeling off matching shifts from icmp operands via canEvaluateShifted (#190918)
This relanding of #165975 fixes the bug that caused the bootstrap-asan
buildbot failure
(https://lab.llvm.org/buildbot/#/builders/52/builds/16329).
## Original optimization
Consider a pattern like: `icmp (shl nsw/nuw X, L), (add nsw/nuw (shl
nsw/nuw Y, L), K)`
When K is a multiple of 2^L, this can be simplified to: `icmp X, (add
nsw/nuw Y, K >> L)`
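As a sanity check of the fold described above, the following sketch exhaustively tests the unsigned (nuw) case at a small bit width. It is not the InstCombine implementation: the width `W`, the shift `L`, and the function name are assumptions chosen for illustration, and only unsigned `<` and `==` are covered.

```python
from itertools import product

W = 6  # bit width, kept small so the check is exhaustive (assumed)
L = 2  # shift amount (assumed)


def fold_holds_for_all_inputs() -> bool:
    limit = 1 << W
    for x, y, k in product(range(limit), repeat=3):
        if k % (1 << L) != 0:
            continue  # K must be a multiple of 2^L
        if (x << L) >= limit or (y << L) >= limit:
            continue  # shl nuw: no set bits may be shifted out
        rhs = (y << L) + k
        if rhs >= limit:
            continue  # add nuw: no unsigned wrap
        lhs = x << L
        folded_rhs = y + (k >> L)
        # The original and folded comparisons must agree.
        if (lhs < rhs) != (x < folded_rhs):
            return False
        if (lhs == rhs) != (x == folded_rhs):
            return False
    return True


print(fold_holds_for_all_inputs())
```

The check passes because, under these flags, both sides of the original compare are exact multiples of 2^L, so dividing them by 2^L preserves their order.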
This patch extends `canEvaluateShifted` to support `Instruction::Add`
and refactors its signature to accept a `ShiftSemantics` enum (`Lossy` /
`Unsigned` / `Signed`) instead of a bare opcode. This allows the
function to enforce losslessness requirements according to the overflow
flags (nsw/nuw) of the operands. The logic is wired into
[14 lines not shown]
[clang][bytecode] Rework APValue visiting (#194408)
First, we can't just ignore the LValuePath of an lvalue APValue. Add
code to handle that and a test case exercising the newly added code.
We also didn't look at APValue bases when initializing from an APValue.
[SystemZ] Enable -fpatchable-function-entry=M,N (#178191)
This PR enables the option `-fpatchable-function-entry` for SystemZ. It
utilizes existing common code and just adds the emission of nops after
the function label in the backend.
SystemZ provides multiple nop options of varying length, making the
semantics of this option somewhat ambiguous. In order to align with what
`gcc` does with that same option, we're choosing `nopr` as the
canonical nop for this purpose.
For testing, this adapts an existing test file from aarch64.
(cherry picked from commit 355898a6ce901bf9285a428888068e008b5557e9)
[X86] lowerV4F32Shuffle - don't use INSERTPS if SHUFPS will suffice (#186468)
If we have 2 or more undef/undemanded elements, the INSERTPS replaces
those with explicit zero'd elements which can cause infinite loops later
on in shuffle combining depending on whether we demand those elements or
not.
I'll try to improve the (minor) v2f32 regressions in a follow up, but I
need to fix the infinite loop first.
Fixes #186403
[C++20] [Modules] Add VisiblePromoted module ownership kind (#189903)
This patch adds a new ModuleOwnershipKind::VisiblePromoted to handle
declarations that are not visible to the current TU but are promoted to
be visible to avoid re-parsing.
Originally we set the visibility to visible directly in such cases. But
https://github.com/llvm/llvm-project/issues/188853 shows such decls may
be excluded later if we import #include and then import. So we have to
introduce a new visibility to express the intention that the visibility
of the decl is intentionally promoted.
Close https://github.com/llvm/llvm-project/issues/188853
(cherry picked from commit c97e08e331736ae8c7d17bf1f24954570f564ad0)
[RISCV][NFC] Turn lowerVECTOR_SHUFFLE into a member function of RISCVTargetLowering (#194299)
Convert lowerVECTOR_SHUFFLE into a member function of
RISCVTargetLowering, aligning it with other lowerXXX member functions in
RISCVTargetLowering and matching other targets like AArch64.
security/dehydrated: Ensure the periodic script exits with the proper error code
PR: 294021
Reported by: Henrik <henrik at eyetea.se>
Reviewed by: linimon
MFH: 2026Q2
[MLIR][XeGPU] XeGPU DpasMx Op Definition adds Layout Support (#194117)
This PR extends the DpasMx operation to support MXFP (microscaling
floating point) matrix multiply with separate scale factor layouts.
1. Op Definition
Added layout_a_scale and layout_b_scale attributes to DpasMx op
Removed AllElementTypesMatch<["a", "b"]> trait to allow different types
for A/B with scales
2. Layout Infrastructure
setupDpasMxLayout(): Creates anchor layouts for all 5 operands (A, B,
C/D, scale_a, scale_b)
Derives scale layouts from parent matrix layouts by dividing innermost
dimension
Supports all layout kinds: Subgroup, InstData, Lane
Fix a bug in getupDpasSubgroupLayouts(): sg_data of A/B matrix should
keep the full K dimension.
3. Layout Propagation
[7 lines not shown]
Make xarray cyclic start looking for a free id at the position specified
by the next argument and stop after wrapping back to that position.
Previously looking for a free id started at the beginning of the
allocation range and stopped at the end, ignoring the next argument.
Currently xarray cyclic id allocations are only used by the GuC code in
inteldrm. In 6.18.25 drm, the amdgpu PASID allocation changes from
using cyclic idr to cyclic xarray.
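The search order described above can be sketched as follows. This is a simplified model rather than the kernel or compat code: the set-based storage, the function name `alloc_cyclic`, and the choice to fall back to the start of the range when `next_id` lies outside it are all assumptions.

```python
from typing import Optional, Set


def alloc_cyclic(used: Set[int], lo: int, hi: int, next_id: int) -> Optional[int]:
    """Return the first free id in [lo, hi], searching from next_id and wrapping once."""
    span = hi - lo + 1
    # Assumption: an out-of-range hint falls back to the start of the range.
    start = next_id if lo <= next_id <= hi else lo
    for offset in range(span):
        candidate = lo + (start - lo + offset) % span
        if candidate not in used:
            used.add(candidate)
            return candidate
    return None  # wrapped back to the start position without finding a free id


used = {0, 1, 2, 5}
first = alloc_cyclic(used, 0, 7, next_id=5)   # 5 is taken, so 6 is returned
second = alloc_cyclic(used, 0, 7, next_id=7)  # 7 is free
third = alloc_cyclic(used, 0, 7, next_id=6)   # 6 and 7 taken, wraps past 0..2 to 3
```

With the previous behaviour described in the message, the first call would have ignored the hint and returned 3, the lowest free id in the range.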
[CIR] Implement PredefinedExpr in aggregate emitter and add consteval aggregate test (#194484)
Handle PredefinedExpr by delegating to emitAggLoadOfLValue, removing the
NYI fallback. Also add a test for ConstantExpr aggregate emission
(consteval functions returning structs), which was already implemented
but lacked test coverage.
This unblocks ~206 libcxx test failures that involve aggregate
ConstantExpr and PredefinedExpr.
Note on LLVM IR divergence (will be addressed in follow-up PRs): For
consteval functions returning aggregates, CIR currently emits a global
constant + cir.copy that lowers to llvm.memcpy from the global, while
OGCG decomposes the constant into per-field stores. The added CIR / LLVM
/ OGCG CHECK lines in consteval-aggregate.cpp document this difference.
Convergence will come from a follow-up that decomposes the consteval
aggregate stores into per-field stores in LoweringPrepare (and related
GEP-index handling for padded structs).
[RISCV] Improve getInterleavedMemoryOpCost for interleave groups with tail gaps. (#192074)
For interleaved access groups where gaps are only at the tail (i.e.
members are contiguous starting from index 0 but do not fill the entire
factor), the interleaved memory access pass can lower them to
vlsseg/vssseg intrinsics with NF equal to the number of group members
rather than the factor after #151612 and #154647.
Previously these groups fell through to the generic fixed-vector shuffle
cost model. This patch adds a dedicated cost path that checks legality
and estimates an appropriate cost for them.
TODO: Support scalable vector type.
Fix #151497
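The "gaps only at the tail" condition can be illustrated with a small predicate. This is a hypothetical model of the check, not the code in getInterleavedMemoryOpCost: the function name and the representation of a group as a list of member indices are assumptions.

```python
from typing import Optional, Sequence


def tail_gap_segment_count(member_indices: Sequence[int], factor: int) -> Optional[int]:
    """Return NF (the member count) if the group's only gaps are at the tail, else None."""
    members = sorted(set(member_indices))
    nf = len(members)
    if nf == 0 or nf >= factor:
        return None  # empty, or a full group with no tail gap
    if members != list(range(nf)):
        return None  # members are not contiguous starting from index 0
    return nf


print(tail_gap_segment_count([0, 1, 2], factor=4))  # members 0..2 of a factor-4 group
print(tail_gap_segment_count([0, 2], factor=4))     # gap in the middle
```

Under this model, a factor-4 group with members 0, 1, and 2 qualifies with NF = 3, while a group with members 0 and 2 does not because its gap is not at the tail.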
[CIR] Lower constant NTTP objects (#194496)
Like my previous patch, this just stores an NTTP object as a global
(using the same code, with 1 level of indirection stripped off), and
initializes it as a const. This patch also fleshes out the
CIRGenExprConstant.cpp area, leaving just 2 'NYI's in the area, 1 of
which is the MSGuidAttr again.
[clang][modules-driver] Fix failing import-std regression test (#194502)
See
https://github.com/llvm/llvm-project/pull/194475#issuecomment-4331347690.
This constrains the test to not run on aarch64, where it fails on
`clang-aarch64-quick` and `llvm-clang-aarch64-darwin` builders.
The failing builders don't show any output, and the test will be
re-enabled for aarch64 in a later follow-up.
Co-authored-by: Naveen Seth Hanig <naveen.hanig at oulook.com>
[DataLayout] Add null pointer value infrastructure
Add support for specifying the null pointer bit representation per address space
in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones
When neither flag is present, the address space inherits the default set by the
new 'N<null-value>' top-level specifier ('Nz' or 'No'). If that is also absent,
the null pointer value is zero.
No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to support
targets with non-zero null pointers (e.g. AMDGPU).
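The fallback order described above (per-address-space flag, then the top-level default, then zero) can be sketched as below. This does not parse real DataLayout strings and is not the LLVM API: the function name and its parameters are hypothetical, and the flags are passed in already extracted.

```python
from typing import Optional


def null_pointer_bits(pointer_flag: Optional[str],
                      default_flag: Optional[str],
                      width: int) -> int:
    """Resolve the null pointer bit pattern for one address space.

    pointer_flag: 'z', 'o', or None if the pointer spec carries no flag.
    default_flag: 'z', 'o', or None if no top-level 'N' specifier is present.
    width: pointer width in bits.
    """
    flag = pointer_flag or default_flag or "z"  # final fallback: all-zeros
    if flag == "z":
        return 0
    if flag == "o":
        return (1 << width) - 1  # all-ones pattern of the given width
    raise ValueError(f"unknown null pointer flag: {flag!r}")


print(hex(null_pointer_bits(None, "o", 32)))  # inherits the top-level default
print(hex(null_pointer_bits("z", "o", 32)))   # per-address-space flag wins
```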