Reapply "[VPlan] Handle calls in VPInstruction:opcodeMayReadOrWriteFromMemory." (#191886)
This reverts commit
https://github.com/llvm/llvm-project/commit/3bf9639ec04544902670ab4199401ac470c1fcca.
The reapply adds trivial support for ExtractValue and InsertValue to fix
the crash that caused the revert.
Original message:
Retrieve the called function and check its memory attributes, to
determine if a VPInstruction calling a function reads or writes memory.
Use it to strengthen the assert in areAllLoadsDereferenceable.
PR: https://github.com/llvm/llvm-project/pull/190681
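The idea of deriving a call's memory behavior from its callee's attributes can be sketched as a toy model. This is an illustration only, not the VPlan code: ToyMemoryEffects, ToyCall, and the toy* functions are hypothetical names, and treating a call without a known callee conservatively is an assumption of this sketch.

```cpp
#include <cassert>

// Toy model only: hypothetical names, not LLVM's MemoryEffects or
// VPInstruction classes.
enum class ToyMemoryEffects { None, ReadOnly, WriteOnly, ReadWrite };

struct ToyCall {
  bool HasKnownCallee;      // false models a call whose callee is unknown
  ToyMemoryEffects Effects; // memory attributes of the callee, if known
};

bool toyMayReadFromMemory(const ToyCall &C) {
  // Assumption: without a known callee, answer conservatively.
  if (!C.HasKnownCallee)
    return true;
  return C.Effects == ToyMemoryEffects::ReadOnly ||
         C.Effects == ToyMemoryEffects::ReadWrite;
}

bool toyMayWriteToMemory(const ToyCall &C) {
  if (!C.HasKnownCallee)
    return true;
  return C.Effects == ToyMemoryEffects::WriteOnly ||
         C.Effects == ToyMemoryEffects::ReadWrite;
}

bool toyMayReadOrWriteMemory(const ToyCall &C) {
  return toyMayReadFromMemory(C) || toyMayWriteToMemory(C);
}

// Small self-check showing how the helpers are meant to be used.
void toyMemorySelfCheck() {
  assert(!toyMayReadOrWriteMemory(ToyCall{true, ToyMemoryEffects::None}));
  assert(toyMayReadOrWriteMemory(ToyCall{false, ToyMemoryEffects::None}));
}
```

In this model, a call to a callee with no memory effects is the only case that can be treated as not touching memory; everything else stays conservative.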
[AMDGPU] Add object linking support for LDS and named barrier lowering in the middle end
This is the first patch in a series introducing object linking support for
AMDGPU.
This PR adds the -amdgpu-enable-object-linking flag to enable object linking in
the backend. It also updates the AMDGPULowerModuleLDSPass and
AMDGPULowerExecSync passes to support lowering LDS and named barrier globals
when object linking is enabled.
[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates
Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the generating instruction we can save calculating
the inverse.
Teach the NVPTX commuteInstructionImpl that SETP instructions can be
inverted, allowing CSE with a previous SETP that matches the inverted
form. This also inverts the branch users of the predicate to maintain
correctness.
Currently, SETP inversion is only allowed if all users are branches.
Future work can extend this to sel and not instructions.
Made-with: Cursor
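A toy model of the inversion check described in the commit above, as a sketch only. ToyCmp, ToySetp, and the toy* functions are hypothetical names, not NVPTX backend types. The model assumes integer comparisons; floating-point comparisons would also need to account for unordered (NaN) results, which this sketch ignores.

```cpp
#include <cassert>

// Toy model only: hypothetical names, integer comparisons only.
enum class ToyCmp { EQ, NE, LT, GE, GT, LE };

// Returns the comparison that yields the negated result on the same operands.
ToyCmp toyInvert(ToyCmp Op) {
  switch (Op) {
  case ToyCmp::EQ: return ToyCmp::NE;
  case ToyCmp::NE: return ToyCmp::EQ;
  case ToyCmp::LT: return ToyCmp::GE;
  case ToyCmp::GE: return ToyCmp::LT;
  case ToyCmp::GT: return ToyCmp::LE;
  case ToyCmp::LE: return ToyCmp::GT;
  }
  return Op; // not reached for valid enumerators
}

bool toyEvaluate(ToyCmp Op, long A, long B) {
  switch (Op) {
  case ToyCmp::EQ: return A == B;
  case ToyCmp::NE: return A != B;
  case ToyCmp::LT: return A < B;
  case ToyCmp::GE: return A >= B;
  case ToyCmp::GT: return A > B;
  case ToyCmp::LE: return A <= B;
  }
  return false; // not reached for valid enumerators
}

struct ToySetp {
  ToyCmp Op;
  int LhsReg; // register ids stand in for operands
  int RhsReg;
};

// True if New computes the negation of Prev on the same operands. In that
// case New could reuse Prev's predicate, provided every user of New (here
// assumed to be a branch) is inverted as well.
bool toyIsInverseOf(const ToySetp &New, const ToySetp &Prev) {
  return New.LhsReg == Prev.LhsReg && New.RhsReg == Prev.RhsReg &&
         New.Op == toyInvert(Prev.Op);
}
```

For example, under this model a setp with GE on registers 1 and 2 is the inverse of an earlier setp with LT on the same registers, so the second compare is redundant once its branch users test the negated predicate.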
[NVPTX] Add reverseBranchCondition and CBranchOther
Add CBranchOther instruction for inverted predicate branches (@!p bra)
and implement reverseBranchCondition to support branch condition
inversion. Update analyzeBranch, insertBranch, and removeBranch to
handle both CBranch and CBranchOther.
This enables passes like branch folding to properly reverse branch
conditions, and is a prerequisite for SETP predicate inversion CSE.
Made-with: Cursor
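Why reversing a branch condition is safe can be shown with a small toy model. This is a sketch under stated assumptions, not the NVPTX implementation: ToyCondBranch and the toy* functions are hypothetical, and the toy reversal both flips the predicate sense and swaps the two successors so that control flow is unchanged.

```cpp
#include <cassert>

// Toy model only: hypothetical names, not NVPTX backend types.
struct ToyCondBranch {
  bool Negated;         // false models "@p bra", true models "@!p bra"
  int TakenBlock;       // block reached when the condition holds
  int FallThroughBlock; // block reached otherwise
};

// Which block executes next for a given value of predicate p.
int toyNextBlock(const ToyCondBranch &B, bool P) {
  bool Cond = B.Negated ? !P : P;
  return Cond ? B.TakenBlock : B.FallThroughBlock;
}

// Flip the predicate sense and swap the successors. The resulting branch
// reaches the same block as the original for every value of p.
ToyCondBranch toyReverse(const ToyCondBranch &B) {
  return ToyCondBranch{!B.Negated, B.FallThroughBlock, B.TakenBlock};
}
```

Because the reversed branch is equivalent, a pass is free to pick whichever form gives the better block layout.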
[mlir][SPIR-V] Add support for SPV_INTEL_masked_gather_scatter extension (#189099)
Add MaskedGather/MaskedScatter ops and VectorOfPointerType for the
SPV_INTEL_masked_gather_scatter extension implemented in #185418.
[MLIR][XeGPU] Adding Layout Utility inferMaskOffsetLayoutForScatterIO (#191573)
This PR adds a new layout utility function, named
inferMaskOffsetLayoutForScatterIO(), to support the propagation and
lowering of XeGPU scatter IO operations.
sysutils/nut*: Fix configure when user uses the uucp user account
Users wanting to use the uucp user account will experience configure
failures because uucp is baked into the configure script. We make the
configure script default to "nothing" to address this edge case. The
default user (nut) is already set by the port and ports plumbing.
PR: 294350
[CIR] Implement 'to-union' cast. (#191485)
This ends up being pretty trivial. The cast can only really happen in
two ways, and the only useful one is via an extension. This patch
implements it.
This doesn't really affect anything, as it is a rarely used feature and
thus doesn't appear in the test suites I've seen, but I noticed it while
investigating something else.
[X86][regcall] Rework struct classification for non-Windows x86-64 targets (#187134)
Currently, when `X86_64ABIInfo::classifyRegCallStructTypeImpl`
classifies a struct argument or return value as direct, it leaves the
LLVM IR coerce type unspecified, implicitly relying on
`CodeGenTypes::ConvertType` to eventually construct a default IR type
based on the struct's layout. This conversion is neither stable nor
guaranteed to adhere to the ABI's classification rules.
Instead, rewrite `classifyRegCallStructTypeImpl` to construct an
explicit sequence of coerce types, using the existing field
classification to obtain a coerce type for each member of the struct.
Also, rename the function to `passRegCallStructTypeDirectly` and return
a boolean instead, so that now `classifyRegCallStructType` is the only
place that computes `ABIArgInfo`.
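The shape of this change, one explicit coerce type per struct member plus a boolean result, can be sketched as a toy model. Everything here is hypothetical and heavily simplified: ToyFieldClass, ToyField, and toyPassStructDirectly are illustration-only names, the type strings are merely LLVM-IR-like labels, and the rule for which fields the sketch accepts is an assumption, not the ABI's actual classification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not Clang's X86_64ABIInfo.
enum class ToyFieldClass { Integer, SSE };

struct ToyField {
  ToyFieldClass Class;
  unsigned SizeInBytes;
};

// Builds one coerce type label per member from its classification.
// Returns false (leaving Out empty) if any member is not handled by this
// sketch, which models "do not pass this struct directly".
bool toyPassStructDirectly(const std::vector<ToyField> &Fields,
                           std::vector<std::string> &Out) {
  Out.clear();
  for (const ToyField &F : Fields) {
    if (F.Class == ToyFieldClass::Integer) {
      Out.push_back("i" + std::to_string(F.SizeInBytes * 8));
    } else if (F.SizeInBytes == 4) {
      Out.push_back("float");
    } else if (F.SizeInBytes == 8) {
      Out.push_back("double");
    } else {
      Out.clear(); // assumption: other SSE sizes are out of scope here
      return false;
    }
  }
  return true;
}
```

The point of the explicit sequence is that the caller no longer depends on whatever default IR type a layout-based conversion would happen to produce.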
This rewrite also fixes several other issues with the `X86_64ABIInfo`
implementation of `__regcall`:
[17 lines not shown]