[M68k] Fix MOVEM collapse pass for 2 instances of same register (#174349)
Add test case for MOVEM collapse opt pass failure and fix pass handling
of 2 appearances of the same register in a MOVEM block.
[Sanitizers] Remove unused variable (#177061)
Must've remained from debugging the test case.
rdar://119958411
Co-authored-by: Mariusz Borsa <m_borsa at apple.com>
[TableGen] Prefer base class on tied RC sizes
When searching for a matching subclass, TableGen's behavior is
non-deterministic if several classes have the same size.
Break the tie by choosing the class with the smaller BaseClassOrder.
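The tie-break can be sketched as a deterministic selection over candidate register classes. The struct and function below are illustrative stand-ins, not TableGen's actual code: only the two fields relevant to the rule are modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a register class record; only the two
// fields the tie-break looks at are modeled here.
struct RCInfo {
  unsigned Size;      // register class size
  int BaseClassOrder; // smaller value = preferred base class
};

// Pick the matching subclass deterministically: smallest size first,
// and among classes of equal size, the one with the smaller
// BaseClassOrder. A sketch of the rule described above, not the actual
// TableGen implementation.
int pickSubclass(const std::vector<RCInfo> &Candidates) {
  int Best = -1;
  for (int I = 0, E = (int)Candidates.size(); I != E; ++I) {
    if (Best == -1 || Candidates[I].Size < Candidates[Best].Size ||
        (Candidates[I].Size == Candidates[Best].Size &&
         Candidates[I].BaseClassOrder < Candidates[Best].BaseClassOrder))
      Best = I;
  }
  return Best;
}
```

With this rule, two same-size classes no longer depend on iteration order: the one with the smaller BaseClassOrder always wins.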
[clang][test] Specify value of `-fopenmp=libomp` for tests. (#177239)
`libomp` is the default value when unconfigured in cmake, but llvm can
be configured to have `libgomp` be the default instead. Explicitly
specify this value so the test does not fail when it assumes libomp is
always the default.
Fix for f369d23ceaa49ffa9e6ef9673851749d66b35b3f (#150580)
[LifetimeSafety] Remove "experimental-" prefix from flags and diagnostics (#176821)
Remove the "experimental-" prefix from lifetime safety diagnostic groups
and command-line options. This enables the analysis in `-Wall`.
We are now in a pretty stable state with no crashes. This change
indicates that lifetime safety analysis is no longer considered
experimental and is now a stable feature. By removing the
"experimental-" prefix, we're signaling to users that this functionality
is ready for use.
- Renamed diagnostic groups from `experimental-lifetime-safety*` to
`lifetime-safety*`
- Updated command-line options from `-fexperimental-lifetime-safety*` to
`-flifetime-safety*` and this is now ON by default.
- Added a check to only enable lifetime safety analysis when relevant
diagnostics are enabled
- Updated test files to use the new flag names
[CIR][X86] Add support for shuff32x4/shufi32x4 builtins (#172960)
This implementation is adapted from the existing code for
`X86::BI__builtin_ia32_shuf_i*` and `X86::BI__builtin_ia32_shuf_f*` from
`/llvm-project/clang/lib/CodeGen/TargetBuiltins/X86.cpp`.
It adds support for the following X86 builtins:
- __builtin_ia32_shuf_f32x4
- __builtin_ia32_shuf_f64x2
- __builtin_ia32_shuf_i32x4
- __builtin_ia32_shuf_i64x2
- __builtin_ia32_shuf_f32x4_256
- __builtin_ia32_shuf_f64x2_256
- __builtin_ia32_shuf_i32x4_256
- __builtin_ia32_shuf_i64x2_256
Part of https://github.com/llvm/llvm-project/issues/167765
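A scalar model of what these builtins compute may help: `shuf_f32x4`-style shuffles operate on 128-bit lanes (4 floats each), with the low half of the result selected from the first operand and the high half from the second, each lane chosen by a 2-bit field of the immediate. This is an illustrative model of the documented AVX-512 semantics, not the CIR lowering itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of the 512-bit variant (_mm512_shuffle_f32x4 /
// __builtin_ia32_shuf_f32x4): result lanes 0-1 come from `a`, lanes
// 2-3 from `b`; each lane index is a 2-bit field of `imm`.
std::array<float, 16> shuf_f32x4(const std::array<float, 16> &a,
                                 const std::array<float, 16> &b,
                                 uint8_t imm) {
  std::array<float, 16> dst{};
  for (int lane = 0; lane < 4; ++lane) {
    const std::array<float, 16> &src = lane < 2 ? a : b;
    int sel = (imm >> (2 * lane)) & 3; // which 128-bit source lane
    for (int e = 0; e < 4; ++e)
      dst[4 * lane + e] = src[4 * sel + e];
  }
  return dst;
}
```

The `_256` variants are the same idea with two lanes per operand and a 2-bit immediate per selected lane; the `i`/`f` and `x4`/`x2` spellings only change the element type, not the lane-granularity shuffle.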
[clang] Fix lifetime extension of temporaries in for-range-initializers in templates (#177191)
Fixes https://github.com/llvm/llvm-project/issues/165182.
This patch fixes the lifetime extension of temporaries in
for-range-initializers in templates. The issue occurred when the
for-range statement appeared in a dependent context but was not itself
type/value dependent.
---------
Signed-off-by: Wang, Yihan <yronglin777 at gmail.com>
[NFCI][AMDGPU] Convert more `SubtargetFeatures` to use `AMDGPUSubtargetFeature` and X-macros (#177256)
Extend the X-macro pattern to eliminate boilerplate for additional
subtarget features.
This reduces ~50 lines of repetitive member declarations and getter
definitions.
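The X-macro pattern being extended here can be sketched in a few lines. The feature names below are made up for illustration, not the actual AMDGPU list: each entry in the list macro expands once into a member and once into a getter, so adding a feature becomes a one-line change.

```cpp
#include <cassert>

// Illustrative feature list (hypothetical names, not the real ones).
#define HYPOTHETICAL_FEATURES(X)                                            \
  X(FastFMAF32)                                                             \
  X(MadMacF32Insts)                                                         \
  X(FlatForGlobal)

class SubtargetSketch {
  // Expands into `bool FastFMAF32 = false;` etc.
#define DECLARE_MEMBER(Name) bool Name = false;
  HYPOTHETICAL_FEATURES(DECLARE_MEMBER)
#undef DECLARE_MEMBER

public:
  void enableAll() { FastFMAF32 = MadMacF32Insts = FlatForGlobal = true; }

  // Expands into `bool hasFastFMAF32() const { return FastFMAF32; }` etc.
#define DECLARE_GETTER(Name)                                                \
  bool has##Name() const { return Name; }
  HYPOTHETICAL_FEATURES(DECLARE_GETTER)
#undef DECLARE_GETTER
};
```

One list macro replaces a member declaration plus a getter definition per feature, which is where the ~50-line reduction comes from.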
Revert "[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)"
This reverts commit 5c29b64fda6a5a66e09378eec9f28a42066a7c6a.
This was causing failures at HEAD on x86-64 Linux.
[msan] Handle aarch64_neon_vcvt* (#177243)
This fills in missing gaps in MSan's AArch64 NEON vector conversion
intrinsic handling (intrinsics named aarch64_neon_vcvt* instead of
aarch64_neon_fcvt*). SVE support sold separately.
It also generalizes handleNEONVectorConvertIntrinsic to handle
conversions to/from fixed-point.
[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)
It is supported in GlobalISel there. On X86, we always kick to
SelectionDAG anyway, so there is no point in not doing it for X86 too.
I do not have merge permissions.
[VPlan] Support VPWidenPointerInduction in getSCEVExprForVPValue (NFCI)
Support VPWidenPointerInductionRecipe in getSCEVExprForVPValue.
This is used in code paths when computing SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliasOps with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2, 3]] : memref<8x8x3x3xf32> to memref<8x8x9xf32>`,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
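The unsoundness in point 1 can be demonstrated with plain offset arithmetic. A 2x3 vector load reads six elements whose addresses are determined by the strides of the memref it loads from; rewriting only the indices when folding a reshape changes those strides, so a different set of elements is read. The shapes below match the example above (9 = 3*3 on the innermost group); the helper names are illustrative, not MLIR API.

```cpp
#include <cassert>
#include <vector>

// Offsets of a 2x3 vector load at (i, j, k) from a row-major 8x8x9
// view: the vector's two dims map to the innermost memref dims, with
// strides 9 and 1.
std::vector<int> offsetsBeforeFold(int i, int j, int k) {
  std::vector<int> out;
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 3; ++c)
      out.push_back(i * 72 + (j + r) * 9 + (k + c));
  return out;
}

// A naive fold rewrites the load onto the 8x8x3x3 view at
// (i, j, k/3, k%3); the vector dims now map to strides 3 and 1, so the
// second row of the vector lands on different elements.
std::vector<int> offsetsAfterNaiveFold(int i, int j, int k) {
  std::vector<int> out;
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 3; ++c)
      out.push_back(i * 72 + j * 9 + (k / 3 + r) * 3 + (k % 3 + c));
  return out;
}
```

At (0, 0, 0) the first row matches (offsets 0, 1, 2) but the second row reads offsets 3-5 instead of 9-11, which is exactly the stride change between dimensions that the new patterns refuse to fold.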
[mlir] Add [may]updateStartingPosition to VectorTransferOpInterface
This commit adds methods to VectorTransferOpInterface that allow
transfer operations to be queried for whether their base memref (or
tensor) and permutation map can be updated in some particular way and
then for performing this update. This is part of a series of changes
designed to make passes like fold-memref-alias-ops more generic,
allowing downstream operations, like IREE's transfer_gather, to
participate in them without needing to duplicate patterns.
[mlir] Implement indexed access op interfaces for memref, vector, gpu, nvgpu
This commit implements the IndexedAccessOpInterface and
IndexedMemCopyInterface for all operations in the memref and vector
dialects that they appear to apply to. It follows the code in
FoldMemRefAliasOps and ExtractAddressComputations to define the
interface implementations. This commit also adds the interfaces to the
GPU subgroup MMA load and store operations and to any NVGPU operations
currently handled by the memref transformations (there may be more
suitable operations in the NVGPU dialect, but I haven't gone looking
systematically).
This code will be tested by a later commit that updates
fold-memref-alias-ops.
Assisted-by: Claude Code, Cursor (interface boilerplate, sketching out
implementations)
[mlir][memref] Define interfaces for ops that access memrefs at an index
This commit defines interfaces for operations that perform certain
kinds of indexed access on a memref. These interfaces are defined so
that passes like fold-memref-alias-ops and the memref flattener can be
made generic over operations that, informally, have the form
`op ... %m[%i0, %i1, ...] ...` (an IndexedAccessOpInterface) or the
form `op %src[%s0, %s1, ...], %dst[%d0, %d1, ...] size ...` (an
IndexedMemCopyOpInterface).
These interfaces have been designed such that all the passes under
MemRef/Transforms that currently have a big switch-case on
memref.load, vector.load, nvgpu.ldmatrix, etc. can be migrated to use
them.
(This'll also let us get rid of the awkward fact that we have memref
transforms depending on the GPU and NVGPU dialects.)
While the interface doesn't currently contemplate changing element
[6 lines not shown]