[NFCI][AMDGPU] Convert more `SubtargetFeatures` to use `AMDGPUSubtargetFeature` and X-macros (#177256)
Extend the X-macro pattern to eliminate boilerplate for additional
subtarget features.
This reduces ~50 lines of repetitive member declarations and getter
definitions.
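For readers unfamiliar with the pattern, here is a minimal sketch (the feature names and class are illustrative, not the actual AMDGPU definitions): a single list macro is expanded twice, once for member declarations and once for getters, so each feature is spelled in exactly one place.
```
// Illustrative X-macro sketch; feature names are examples only.
#define AMDGPU_SUBTARGET_FEATURES(X)                                           \
  X(FlatForGlobal)                                                             \
  X(UnalignedScratchAccess)                                                    \
  X(AutoWaitcntBeforeBarrier)

class GCNSubtargetSketch {
  // Expands to: bool FlatForGlobal = false; bool UnalignedScratchAccess = ...
#define DECLARE_FEATURE(NAME) bool NAME = false;
  AMDGPU_SUBTARGET_FEATURES(DECLARE_FEATURE)
#undef DECLARE_FEATURE

public:
  // Expands to: bool hasFlatForGlobal() const { return FlatForGlobal; } ...
#define DEFINE_GETTER(NAME)                                                    \
  bool has##NAME() const { return NAME; }
  AMDGPU_SUBTARGET_FEATURES(DEFINE_GETTER)
#undef DEFINE_GETTER
};
```
Adding a feature then means adding one `X(...)` line instead of a member declaration plus a getter definition.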
Revert "[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)"
This reverts commit 5c29b64fda6a5a66e09378eec9f28a42066a7c6a.
This was causing failures at HEAD on x86-64 Linux.
[msan] Handle aarch64_neon_vcvt* (#177243)
This fills in missing gaps in MSan's AArch64 NEON vector conversion
intrinsic handling (intrinsics named aarch64_neon_vcvt* instead of
aarch64_neon_fcvt*). SVE support sold separately.
It also generalizes handleNEONVectorConvertIntrinsic to handle
conversions to/from fixed-point.
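As a rough illustration of the shadow rule such handlers implement (a sketch, not the actual MemorySanitizer.cpp code; getShadow, setShadow, and getShadowTy stand in for the instrumenter's bookkeeping and their exact signatures may differ):
```
// Per-element rule for a vector conversion intrinsic, under the usual MSan
// convention that any poisoned bit of a source element fully poisons the
// corresponding result element.
void handleVectorConvert(IntrinsicInst &I, IRBuilder<> &IRB) {
  Value *SrcShadow = getShadow(I.getArgOperand(0));
  // Per element: (shadow != 0) ? all-ones : all-zeros.
  Value *Poisoned = IRB.CreateICmpNE(
      SrcShadow, Constant::getNullValue(SrcShadow->getType()));
  // Sign-extend the i1 vector into the result's shadow type.
  Value *ResShadow = IRB.CreateSExt(Poisoned, getShadowTy(&I));
  setShadow(&I, ResShadow);
}
```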
[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)
clang.arc.attachedcall is supported by GlobalISel at -O0. On X86, we always
fall back to SelectionDAG anyway, so there is no point in not doing it for
X86 too.
[VPlan] Support VPWidenPointerInduction in getSCEVExprForVPValue (NFCI)
Support VPWidenPointerInductionRecipe in getSCEVExprForVPValue.
This is used on code paths that compute SCEV expressions in the
VPlan-based cost model, which should produce costs matching the legacy
cost model.
In standby_after_start order service reload ACTIVE/STANDBY
This has the added benefit that the reload on STANDBY will still
complete if the ACTIVE one is skipped for any reason.
(cherry picked from commit 0f66d934be3932c1b9b10700a517f84d3682ac48)
NAS-139417 / 26.04 / Robustize test alua config (#18082)
- In standby_after_start order service reload ACTIVE/STANDBY.
- Add some additional calls to _wait_for_alua_settle.
[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliasOps with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2, 3]] : memref<8x8x3x3xf32> into memref<8x8x9xf32>`,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
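A hedged sketch of what the interface-based replacement for those switch-cases might look like; getAccessedMemref, getIndices, setIndices, setAccessedMemref, and resolveIndicesIntoSource are illustrative names, not necessarily the actual API from this series:
```
// One pattern over the interface replaces per-op patterns for memref.load,
// vector.load, etc.
struct FoldExpandShapeIntoIndexedAccess final
    : OpInterfaceRewritePattern<IndexedAccessOpInterface> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(IndexedAccessOpInterface op,
                                PatternRewriter &rewriter) const override {
    auto expand =
        op.getAccessedMemref().getDefiningOp<memref::ExpandShapeOp>();
    if (!expand)
      return failure();
    // Resolve the access indices into the expand_shape's source space,
    // bailing out when the fold would change the strides the op observes
    // (the multi-dimensional vector.load case above).
    SmallVector<Value> srcIndices;
    if (failed(resolveIndicesIntoSource(rewriter, expand, op.getIndices(),
                                        srcIndices)))
      return failure();
    rewriter.modifyOpInPlace(op, [&] {
      op.setAccessedMemref(expand.getSrc());
      op.setIndices(srcIndices);
    });
    return success();
  }
};
```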
[mlir] Add [may]updateStartingPosition to VectorTransferOpInterface
This commit adds methods to VectorTransferOpInterface that allow
transfer operations to be queried for whether their base memref (or
tensor) and permutation map can be updated in some particular way and
then for performing this update. This is part of a series of changes
designed to make passes like fold-memref-alias-ops more generic,
allowing downstream operations, like IREE's transfer_gather, to
participate in them without needing to duplicate patterns.
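A sketch of the intended usage, with the method names taken from the commit title (exact signatures are assumptions here): a folding pattern first asks whether the transfer can be retargeted, then performs the update in place.
```
LogicalResult tryRetargetTransfer(RewriterBase &rewriter, Operation *op,
                                  Value newBase, AffineMap newMap) {
  auto xfer = dyn_cast<VectorTransferOpInterface>(op);
  // Query first: the op may reject bases or maps it cannot represent.
  if (!xfer || !xfer.mayUpdateStartingPosition(newBase, newMap))
    return failure();
  rewriter.modifyOpInPlace(op, [&] {
    xfer.updateStartingPosition(newBase, newMap);
  });
  return success();
}
```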
[mlir] Implement indexed access op interfaces for memref, vector, gpu, nvgpu
This commit implements the IndexedAccessOpInterface and
IndexedMemCopyOpInterface for all operations in the memref and vector
dialects that they appear to apply to. It follows the code in
FoldMemRefAliasOps and ExtractAddressComputations to define the
interface implementations. This commit also adds the interface to the
GPU subgroup MMA load and store operations and to any NVGPU operations
currently handled by the memref transformations (there may be
more suitable operations in the NVGPU dialect, but I haven't gone
looking systematically).
This code will be tested by a later commit that updates
fold-memref-alias-ops.
Assisted-by: Claude Code, Cursor (interface boilerplate, sketching out
implementations)
[mlir][memref] Define interfaces for ops that access memrefs at an index
This commit defines interfaces for operations that perform certain
kinds of indexed access on a memref. These interfaces are defined so
that passes like fold-memref-alias-ops and the memref flattener can be
made generic over operations that, informally, have the form
`op ... %m[%i0, %i1, ...] ...` (an IndexedAccessOpInterface) or the
form `op %src[%s0, %s1, ...], %dst[%d0, %d1, ...] size ...` (an
IndexedMemCopyOpInterface).
These interfaces have been designed such that all the passes under
MemRef/Transforms that currently have a big switch-case on
memref.load, vector.load, nvgpu.ldmatrix, etc. can be migrated to use
them.
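A tiny sketch of what being generic over the interface buys: one helper works uniformly for memref.load, vector.load, nvgpu.ldmatrix, and any downstream op implementing it (getIndices is an assumed accessor name for illustration).
```
// No switch on concrete op types; any implementer, in-tree or downstream,
// is handled.
SmallVector<Value> accessedIndices(Operation *op) {
  if (auto access = dyn_cast<IndexedAccessOpInterface>(op))
    return llvm::to_vector(access.getIndices());
  return {};
}
```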
(This'll also let us get rid of the awkward fact that we have memref
transforms depending on the GPU and NVGPU dialects)
While the interface doesn't currently contemplate changing element
[6 lines not shown]
[TableGen] Prefer base class on tied RC sizes
When searching for a matching subclass, TableGen's behavior is
nondeterministic if there are several classes with the same size.
Break the tie by choosing the class with the smaller BaseClassOrder.
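A simplified illustration of the tie-break (plain data types, not the actual TableGen structures): on equal sizes, the lower BaseClassOrder wins, making the result independent of iteration order.
```
#include <vector>

struct Candidate {
  unsigned Size;
  int BaseClassOrder;
};

// Pick the smallest class; break size ties deterministically.
const Candidate *pickBaseClass(const std::vector<Candidate> &Classes) {
  const Candidate *Best = nullptr;
  for (const Candidate &C : Classes)
    if (!Best || C.Size < Best->Size ||
        (C.Size == Best->Size && C.BaseClassOrder < Best->BaseClassOrder))
      Best = &C;
  return Best;
}
```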
[bazel] Suppress `-Wunused-command-line-argument` for header parsing (#177246)
Bazel CI runs are full of warnings like this:
```
INFO: From Compiling libc/hdr/types/clock_t.h:
clang-21: warning: argument unused during compilation: '-c' [-Wunused-command-line-argument]
```
https://github.com/bazelbuild/rules_cc/pull/573 is a possible fix in
bazel itself. Until then, just use a copt to ignore it.
[AArch64] Handle all NZCV clobbers in AArch64ConditionOptimizer (#177034)
This pass was special-casing some instructions that could clobber NZCV between
a CMP and a Bcc. This patch generalizes it to consider all instructions that
might modify NZCV, making sure we handle all cases.
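A hedged sketch of the generalized check (the loop shape is illustrative; MachineInstr::modifiesRegister is the standard query):
```
// Instead of special-casing opcodes, ask every instruction between the CMP
// and the Bcc whether it may write NZCV.
bool nzcvClobberedBetween(MachineBasicBlock::iterator Cmp,
                          MachineBasicBlock::iterator Bcc,
                          const TargetRegisterInfo *TRI) {
  for (auto I = std::next(Cmp); I != Bcc; ++I)
    if (I->modifiesRegister(AArch64::NZCV, TRI))
      return true;
  return false;
}
```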
Add PIPE to subprocess.run in link-up.py for stdout capture
- Updated subprocess.run with `stdout=PIPE` for capturing output
- Removed unnecessary `close_fds` parameter (default True in Python 3)