[RISCV] Use a vector MemVT when converting store+extractelt into a vector store. (#190107)
This is needed so that `allowsMemoryAccessForAlignment` checks for
unaligned vector memory support instead of unaligned scalar memory
support when called from `RISCVTargetLowering::expandUnalignedVPStore`.
While there, remove the incorrect setting of the truncating store flag
on the vector instruction, and restrict the transform to simple stores,
since we don't have tests for volatile or atomic stores.
Fixes #189037
[RISCV][P-ext] Add isel patterns for macc*.h00/macc*.w00. (#190444)
The RV32 macc*.h00 instructions take the lower half words from rs1 and
rs2, compute the full word product by extending the inputs, and
add to rd. The RV64 macc*.w00 is similar but operates on words
and produces a double word result.
I've restricted this to the case where the multiply has a single use.
We don't have a general macc that multiplies the full xlen bits
of rs1 and rs2, so I'm allowing the input to be sext_inreg/and or
have sufficient sign/zero bits according to
ComputeNumSignBits/computeKnownBits.
We should also add mul*.h00/mul*.w00 patterns, but those should be
restricted to cases where at least one input is sext_inreg/and, and
should prefer regular mul when there is no sext_inreg/and.
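The accumulate semantics described above can be modeled in plain C++. This is only a sketch based on the wording of this commit message, not LLVM code or text from the ISA specification: the function names are hypothetical, and the split into a signed and an unsigned flavor is an assumption read from the `macc*` wildcard.

```cpp
#include <cstdint>

// Hypothetical reference model of the RV32 signed flavor: sign-extend the low
// 16 bits of rs1 and rs2, form the full 32-bit product, and add it to rd
// (wrapping modulo 2^32).
constexpr uint32_t maccH00Signed(uint32_t rd, uint32_t rs1, uint32_t rs2) {
  return rd + static_cast<uint32_t>(
                  static_cast<int32_t>(static_cast<int16_t>(rs1 & 0xFFFF)) *
                  static_cast<int32_t>(static_cast<int16_t>(rs2 & 0xFFFF)));
}

// Hypothetical model of the unsigned flavor: zero-extend the low halves.
constexpr uint32_t maccH00Unsigned(uint32_t rd, uint32_t rs1, uint32_t rs2) {
  return rd + (rs1 & 0xFFFF) * (rs2 & 0xFFFF);
}

// Models the "sufficient sign bits" condition: the 32-bit value already equals
// the sign extension of its own low half, so no explicit sext_inreg is needed.
constexpr bool fitsInSignedHalf(uint32_t x) {
  return static_cast<int32_t>(x) ==
         static_cast<int32_t>(static_cast<int16_t>(x & 0xFFFF));
}
```

When `fitsInSignedHalf` holds for both inputs, a full-width multiply-add and `maccH00Signed` produce the same value, which is the situation the sign-bit check is meant to detect.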
[AMDGPU] Add v2i32 and/or patterns for VOP3 AND_OR and OR3 operations (#188375)
Add ThreeOp_v2i32_Pats pattern class to support v2i32 vector operations
for AND_OR_B32 and OR3_B32 instructions. The new patterns match the
v2i32 and-or or or-or instruction sequence, extract the individual
32-bit elements from the v2i32 operands, and apply the and_or or or3
VOP3 operations.
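As a rough illustration of what the per-element split computes, the sketch below models the two three-operand operations on scalars and applies them lane by lane to a two-element vector. The type and function names are hypothetical and exist only for this example; the actual patterns are written in TableGen, not C++.

```cpp
#include <cstdint>

// Hypothetical stand-in for a v2i32 value: two independent 32-bit lanes.
struct V2I32 {
  uint32_t x;
  uint32_t y;
};

// Scalar models of the three-operand operations: (a & b) | c and a | b | c.
constexpr uint32_t andOrB32(uint32_t a, uint32_t b, uint32_t c) {
  return (a & b) | c;
}
constexpr uint32_t or3B32(uint32_t a, uint32_t b, uint32_t c) {
  return a | b | c;
}

// Vector form: extract each 32-bit element and apply the scalar operation,
// mirroring the element-wise expansion described above.
constexpr V2I32 andOrV2(V2I32 a, V2I32 b, V2I32 c) {
  return V2I32{andOrB32(a.x, b.x, c.x), andOrB32(a.y, b.y, c.y)};
}
constexpr V2I32 or3V2(V2I32 a, V2I32 b, V2I32 c) {
  return V2I32{or3B32(a.x, b.x, c.x), or3B32(a.y, b.y, c.y)};
}
```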
[AMDGPU] Change isSingleLaneExecution to account for WWM enabling lanes even if there's only one workitem (#188316)
This issue was discovered during downstream work on Vulkan CTS tests,
specifically `dEQP-VK.subgroups.arithmetic.compute.subgroupadd_float`.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[mlir][ABI] Add writable, dead_on_unwind, dead_on_return, nofpclass param attrs to LLVM dialect (#188374)
The MLIR LLVM dialect is missing support for several parameter
attributes that exist in LLVM IR: `writable`, `dead_on_unwind`,
`dead_on_return`, and `nofpclass`. This adds them to the kind-to-name
mapping in `AttrKindDetail.h` and the corresponding name accessors in
`LLVMDialect.td`.
The existing generic conversion infrastructure in `ModuleTranslation`
and `ModuleImport` picks them up automatically: `writable` and
`dead_on_unwind` round-trip as `UnitAttr`, while `dead_on_return` and
`nofpclass` round-trip as `IntegerAttr`.
CIR needs these to match classic codegen's ABI output (sret gets
`writable
[2 lines not shown]
[CIR] Use data size in emitAggregateCopy for overlapping copies (#186702)
Add skip_tail_padding property to cir.copy to handle
potentially-overlapping subobject copies directly, instead of falling
back to cir.libc.memcpy. When set, the lowering uses the record's data
size (excluding tail padding) for the memcpy length. This keeps typed
semantics and promotability of cir.copy.
Also fix CXXABILowering to preserve op properties when recreating
operations, and expose RecordType::computeStructDataSize() for
computing data size of padded record types.
[DA] Use SmallVector instead of raw new/delete (NFC) (#190586)
Some functions used `new`/`delete` to allocate/free arrays. To avoid
memory leaks, it would be better to avoid using raw pointers. This patch
replaces the use of them with `SmallVector`.
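The shape of this kind of cleanup can be sketched as follows. Note the substitution: `std::vector` stands in for `llvm::SmallVector` so that the example compiles without LLVM headers, and both functions are hypothetical illustrations rather than code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: a raw array that must be freed by hand on every exit path.
int sumWithRawArray(std::size_t n) {
  int *values = new int[n];
  for (std::size_t i = 0; i < n; ++i)
    values[i] = static_cast<int>(i);
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += values[i];
  delete[] values; // forgetting this, or returning early, leaks the array
  return total;
}

// After: the container owns the storage and releases it automatically.
int sumWithContainer(std::size_t n) {
  std::vector<int> values(n); // stand-in for llvm::SmallVector<int>
  for (std::size_t i = 0; i < n; ++i)
    values[i] = static_cast<int>(i);
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}

// Both versions compute the same result; only the ownership model changes.
void checkEquivalence() {
  assert(sumWithRawArray(4) == sumWithContainer(4));
}
```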
[flang][OpenMP] Remove namespace qualification from GetUpperName, NFC (#190619)
This applies to flang/lib/Semantics/openmp-utils.cpp, since it contains
`using namespace Fortran::parser::omp`.
AMDGPU: Add range attribute to mbcnt intrinsic callsites (#189191)
It seems the known bits handling added in
686987a540bc176bceaad43ffe530cb3e88796d5 is insufficient to perform
many range-based optimizations. For some reason computeConstantRange
doesn't fall back on KnownBits, and has a separate, less used form
which tries to use computeKnownBits.
[CIR] Implement global decomposition declarations (#190364)
There is no real challenge to these: it is effectively a copy/paste of
the classic codegen, as it just requires that we properly emit the
holding variable. Everything else falls out of our existing handling of
variables.
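For readers unfamiliar with the construct, a global decomposition declaration is a structured binding at namespace scope. The sketch below (C++17) is a generic illustration, not a test from the patch, and the type and variable names are made up for the example.

```cpp
#include <cassert>

struct Point {
  int x;
  int y;
};

// A decomposition declaration at namespace scope. The compiler creates one
// unnamed holding variable initialized from the expression on the right, and
// the names originX and originY refer to that variable's members.
auto [originX, originY] = Point{3, 4};

int sumOfBindings() { return originX + originY; }

// The bindings behave like ordinary globals once the holding variable exists.
void checkBindings() {
  assert(originX == 3 && originY == 4);
}
```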
[clang][bytecode] Don't unref constexpr-unknown references (#190177)
If the pointer for a reference is constexpr-unknown, use the pointer
itself instead of dereferencing it. Unfortunately, that means
constexpr-unknown pointers now reach a lot more places than before.
Split DWARF v2 tests to exclude 64-bit AIX targets (#189077)
64-bit AIX requires DWARF64 format, which was only introduced in DWARF
v3. DWARF v2 only supports 32-bit DWARF format, making it incompatible
with 64-bit AIX (the compiler throws a fatal error). These changes split
DWARF v2 tests into separate files that exclude 64-bit AIX targets while
still running on 32-bit AIX and other 64-bit platforms where DWARF v2 is
supported.