[DA] Move no-wrap flag check into checkSubscript (#190770)
Recent patches added no-wrap flag checks to each dependence test (except
for the Banerjee MIV test) to make them sound. These fixes have been
applied one by one to ensure that each dependence test was correctly
updated and the defects were properly addressed. However, ideally, these
functions should not be called at all when the required no-wrap flags
are not set. Specifically, `classifyPair` should tag pairs as
`NonLinear` when either addrec lacks the no-wrap flag, since an addrec
that may wrap is literally non-linear.
This patch moves the existing no-wrap flag checks from each dependence
test to `checkSubscript`, which is called by `classifyPair`. With this
change, if the addrec doesn't have the no-wrap flag, the pair will be
classified as `NonLinear` and the dependence test will not be invoked at
all. I believe this change makes the code cleaner and more consistent
with the meaning of the `NonLinear` classification.
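As a rough illustration of the intended control flow, here is a minimal,
self-contained sketch. The `Subscript` struct, its fields, and the
simplified classification rules are hypothetical stand-ins, not the real
DependenceAnalysis types or logic; only the names `checkSubscript`,
`classifyPair`, and `NonLinear` come from the patch.

```cpp
#include <cassert>

// Hypothetical model for illustration; not the real DependenceAnalysis API.
enum class Classification { ZIV, SIV, MIV, NonLinear };

struct Subscript {
  unsigned NumLoops; // number of loops the subscript's addrec chain varies in
  bool HasNoWrap;    // true if every addrec in the chain has the no-wrap flag
};

// Models the check that now lives in checkSubscript: a subscript containing
// an addrec is only accepted if that addrec is known not to wrap.
bool checkSubscript(const Subscript &S) {
  return S.NumLoops == 0 || S.HasNoWrap;
}

// Simplified classification: the pair is rejected as NonLinear up front, so
// no dependence test ever sees an addrec that may wrap.
Classification classifyPair(const Subscript &Src, const Subscript &Dst) {
  if (!checkSubscript(Src) || !checkSubscript(Dst))
    return Classification::NonLinear;
  unsigned N = Src.NumLoops > Dst.NumLoops ? Src.NumLoops : Dst.NumLoops;
  if (N == 0)
    return Classification::ZIV;
  return N == 1 ? Classification::SIV : Classification::MIV;
}
```

In this model, a pair such as `{1, false}` and `{1, true}` is classified as
`NonLinear` before any SIV test could run.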
Note that this patch doesn't take care of the behavioral change caused
by the Banerjee MIV test, as the test is still not sound and there are
no plans to fix it in the near future.
[RISCV] Add isCommutable for VDOTA4 and VDOTA4U (#190090)
Mark PseudoVDOTA4_VV and PseudoVDOTA4U_VV as commutable since both
source operands have the same signedness. VDOTA4SU is left
non-commutable because its operands differ in signedness (signed x
unsigned).
Add findCommutedOpIndices cases for the new commutable pseudos and
a test covering commutable and non-commutable dot product variants.
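
The signedness argument can be checked with a small scalar model. The
sketch below assumes each result element accumulates four 8-bit products
into a 32-bit value; the helper names `dotSS`, `dotUU`, and `dotSU` are
hypothetical and only model the arithmetic, not the pseudos themselves.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes4 = std::array<uint8_t, 4>; // raw 8-bit lanes of one 32-bit element

// Signed x signed: both operands are reinterpreted as int8_t.
int32_t dotSS(const Bytes4 &A, const Bytes4 &B, int32_t Acc) {
  for (int I = 0; I < 4; ++I)
    Acc += int32_t(int8_t(A[I])) * int32_t(int8_t(B[I]));
  return Acc;
}

// Unsigned x unsigned: both operands are used as uint8_t.
uint32_t dotUU(const Bytes4 &A, const Bytes4 &B, uint32_t Acc) {
  for (int I = 0; I < 4; ++I)
    Acc += uint32_t(A[I]) * uint32_t(B[I]);
  return Acc;
}

// Signed x unsigned: A is reinterpreted as int8_t, B stays uint8_t, so
// swapping the operands changes which one is sign-extended.
int32_t dotSU(const Bytes4 &A, const Bytes4 &B, int32_t Acc) {
  for (int I = 0; I < 4; ++I)
    Acc += int32_t(int8_t(A[I])) * int32_t(B[I]);
  return Acc;
}
```

With `A = {0xFF, 1, 2, 3}` and `B = {2, 0x80, 4, 5}`, `dotSS` and `dotUU`
give the same result in either operand order, while `dotSU(A, B, 0)` and
`dotSU(B, A, 0)` differ.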
---------
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[AMDGPU] Add a sched group mask for LDSDMA instructions
The existing VMEM masks are not fine-grained enough for some use cases. For
example, if users want to control async loads, using VMEM may cause the compiler
to pick instructions it shouldn't.
This PR adds a new sched group mask for LDSDMA instructions. It is a subclass of
VMEM, but only targets isLDSDMA instructions.
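
A minimal sketch of the subclass relationship, under stated assumptions:
the bit values, the `Instr` struct, and `canAddToGroup` are hypothetical
and chosen only for illustration; they are not the mask encoding or the
matching code used by the backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for illustration only.
enum SchedGroupMask : uint32_t {
  VMEM = 1u << 0,   // any VMEM instruction
  LDSDMA = 1u << 1, // only VMEM instructions that are LDSDMA
};

// Hypothetical instruction model: every LDSDMA instruction is also VMEM.
struct Instr {
  bool IsVMEM;
  bool IsLDSDMA;
};

// Returns true if an instruction may be placed in a group with this mask.
bool canAddToGroup(uint32_t Mask, const Instr &MI) {
  if ((Mask & VMEM) && MI.IsVMEM)
    return true;
  if ((Mask & LDSDMA) && MI.IsLDSDMA)
    return true;
  return false;
}
```

In this model a group built with the `LDSDMA` bit accepts async loads but
rejects ordinary VMEM loads, whereas a `VMEM` group accepts both.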
[CodeGen] relax kill copy hoist restriction for vreg to phys reg copies with folded loads (#190304)
Resolves: https://github.com/llvm/llvm-project/issues/62452
Currently, `TwoAddressInstructionPass` has a blanket rule against moving
kill copies, since many copies are better handled later by coalescing.
However, that rule is too strict when the kill is a virtual register to
physical register copy and the current two-address instruction has a
folded load. In that case, keeping the copy in place can force the pass
to break the folded rm form into a mov rm + op rr, even though the
physical register copy itself cannot be coalesced away in the usual
sense.
This fixes a missed optimization where a folded IMUL64rm was rewritten
into MOV64rm + IMUL64rr because a later $rax = COPY %src was kept in
place for mul.
[AMDGPU] Fix incorrect MachineMemOperand offsets and sizes in wide s_buffer_load splits (#189890)
When G_AMDGPU_S_BUFFER_LOAD (or its SelectionDAG equivalent) falls back
to MUBUF due to a divergent offset, wide loads (256-bit, 512-bit) are
split into multiple 128-bit chunks. Both code paths that perform this
split had bugs in how they annotated MachineMemOperand (MMO) metadata on
each chunk instruction, reporting wrong offsets and wrong sizes. This
does not affect the correctness of the generated assembly, but it
degrades any analysis that relies on the MMO metadata.
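
For reference, the annotation one would expect on each chunk can be
sketched as follows. `ChunkMMO` and `splitWideLoad` are hypothetical
names used only to model the offset and size arithmetic; this is not the
code from either legalization path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the per-chunk memory operand: byte offset and size.
struct ChunkMMO {
  uint64_t Offset;
  uint64_t Size;
};

// Splits a wide load into fixed-size chunks (16 bytes = 128 bits by default).
// Chunk I should cover bytes [Base + I * ChunkBytes, Base + (I + 1) * ChunkBytes).
std::vector<ChunkMMO> splitWideLoad(uint64_t BaseOffset, uint64_t TotalBytes,
                                    uint64_t ChunkBytes = 16) {
  assert(ChunkBytes != 0 && TotalBytes % ChunkBytes == 0 &&
         "wide load must be a whole number of chunks");
  std::vector<ChunkMMO> Chunks;
  for (uint64_t Off = 0; Off < TotalBytes; Off += ChunkBytes)
    Chunks.push_back({BaseOffset + Off, ChunkBytes});
  return Chunks;
}
```

Under this model a 512-bit (64-byte) load yields four chunks at offsets 0,
16, 32, and 48, each of size 16.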
---------
Co-authored-by: Abhinav Garg <abhigarg at amd.com>
Co-authored-by: Jay Foad <jay.foad at gmail.com>
[X86][APX] Fix segfault in foldMemoryOperandImpl for two-address NDD fold (#190562)
The NoNDDM code path in foldMemoryOperandImpl assumed
NewMI->getOperand(1) is always a register. When IsTwoAddr is true,
fuseTwoAddrInst replaces operands 0-4 with memory address components, so
getOperand(1) is the immediate, not a register. Calling setReg() causes
a segfault in removeOperandFromUseList.
Skip the NoNDDM COPY block when IsTwoAddr is true, since the two-address
fold already correctly handles the dest==src1 constraint.
I believe the issue was introduced by #189222: the 'NoNDDM' block
calls 'NewMI->getOperand(1).setReg()', but after 'fuseTwoAddrInst',
operand 1 is an immediate, not a register.
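
The shape of the fix can be sketched with a toy operand model. `Operand`
and `rewriteSrc1` below are hypothetical stand-ins, not the
MachineOperand API; the sketch only shows why the register rewrite must
be skipped once operand 1 is no longer a register.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand model: a tagged value that is either a register or an
// immediate. Calling setReg on a non-register operand is a programming error.
struct Operand {
  enum Kind { Register, Immediate } K;
  long Value;
  bool isReg() const { return K == Register; }
  void setReg(long R) {
    assert(isReg() && "setReg on a non-register operand");
    Value = R;
  }
};

// Models the NoNDDM rewrite of operand 1. Returns true if it was rewritten.
bool rewriteSrc1(std::vector<Operand> &Ops, bool IsTwoAddr, long NewReg) {
  // After a two-address fold, operand 1 is part of the memory reference and
  // is not a register, so the rewrite is skipped.
  if (IsTwoAddr)
    return false;
  Ops[1].setReg(NewReg);
  return true;
}
```

With `IsTwoAddr` set, the operand list is left untouched; otherwise operand
1 is rewritten to the new register.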
Passes all APX regression tests. Unit test included in commit. Fixes
issue #190557.
First time submitting a PR to the LLVM project, please let me know if I
need to fix something! Tagging @phoebewang and @RKSimon as potential
review candidates.