[clang-tidy][NFC] Fix incorrect `list.rst` modification by `add_new_check.py` (#179297)
We have observed unexpected and extensive modifications to `list.rst` in
a few Pull Requests. After some investigation I found that
`add_new_check.py` was misclassifying existing checks, leading to
instability in the generated documentation list.
More specifically:
- The script relies on `http-equiv=refresh` meta tags to identify alias
checks; these tags are missing from several existing alias checks, causing
them to be incorrectly listed as regular checks.
- The script fails to detect fix-its in checks that use CamelCase helper
methods.
With this patch, running `add_new_check.py` now generates a stable and
correct `list.rst` consistent with the actual codebase state.
[BOLT] Get symbol for const island referenced across func by relocation (#178988)
When handling a relocation in one function that references code or data
defined in another function, we should check whether the relocation target
is a constant island or not, and get the referenced symbol accordingly in
both cases.
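A minimal sketch of the dispatch this implies, assuming helpers along the lines of BOLT's BinaryFunction island accessors (exact names and call sites in the actual patch may differ):

```cpp
// Hedged sketch: pick the referenced symbol for a cross-function relocation,
// depending on whether the target lands in the other function's constant
// island or in its code. Helper names approximate BOLT's BinaryFunction API;
// this is not a verbatim excerpt of the patch.
MCSymbol *getReferencedSymbol(BinaryFunction &TargetBF, uint64_t TargetAddress) {
  if (TargetBF.isInConstantIsland(TargetAddress))
    // Data in the island: create or reuse an island-access symbol at the offset.
    return TargetBF.getOrCreateIslandAccess(TargetAddress);
  // Otherwise the target is code: use a label inside the function.
  return TargetBF.getOrCreateLocalLabel(TargetAddress);
}
```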
[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliasOps with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
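For orientation, one generic interface-driven pattern roughly takes the following shape. IndexedAccessOpInterface is the interface named above; the accessor names and the index-remapping helper are illustrative guesses rather than the actual API:

```cpp
// Rough sketch only: a single pattern over the new interface replaces the
// per-op switch-cases. Method names (getBase, getIndices, setBaseAndIndices)
// and remapIndicesThroughSubView are stand-ins for illustration.
struct FoldSubViewIntoIndexedAccess
    : public OpInterfaceRewritePattern<IndexedAccessOpInterface> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(IndexedAccessOpInterface accessOp,
                                PatternRewriter &rewriter) const override {
    auto subView = accessOp.getBase().getDefiningOp<memref::SubViewOp>();
    if (!subView)
      return failure();
    // Remap the access indices through the subview's offsets and strides so
    // that the access can address the subview's source directly.
    SmallVector<Value> newIndices = remapIndicesThroughSubView(
        rewriter, accessOp.getLoc(), subView, accessOp.getIndices());
    rewriter.modifyOpInPlace(accessOp, [&] {
      accessOp.setBaseAndIndices(subView.getSource(), newIndices);
    });
    return success();
  }
};
```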
This also resolves some bugs and potential soundness issues:
1. We will no longer fold expand_shape into vector.load or
vector.transfer_read in cases where doing so would alter the strides
between dimensions of a multi-dimensional load. For example, given
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2, 3]] : memref<8x8x3x3xf32> to memref<8x8x9xf32>`,
we will no longer fold in that reshape, since doing so would change
which values are read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
[mlir] Add [may]updateStartingPosition to VectorTransferOpInterface
This commit adds methods to VectorTransferOpInterface that allow
transfer operations to be queried as to whether their base memref (or
tensor) and permutation map can be updated in some particular way, and
then asked to perform that update. This is part of a series of changes
designed to make passes like fold-memref-alias-ops more generic,
allowing downstream operations, like IREE's transfer_gather, to
participate in them without needing to duplicate patterns.
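A small sketch of how a folding pattern might consult the new hooks; the method spellings follow the commit title, while the parameter lists are assumptions:

```cpp
// Hedged sketch: query-then-update flow for a transfer op. Only the method
// names come from the title above; the arguments are guesses.
auto xferOp = cast<VectorTransferOpInterface>(op);
if (!xferOp.mayUpdateStartingPosition(newBase, newIndices, newPermutationMap))
  return failure(); // the op cannot absorb this base/permutation change
rewriter.modifyOpInPlace(op, [&] {
  xferOp.updateStartingPosition(newBase, newIndices, newPermutationMap);
});
```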
[mlir] Implement indexed access op interfaces for memref, vector, gpu, nvgpu
This commit implements the IndexedAccessOpInterface and
IndexedMemCopyOpInterface for all operations in the memref and vector
dialects that they appear to apply to. It follows the code in
FoldMemRefAliasOps and ExtractAddressComputations to define the
interface implementations. This commit also adds the interfaces to the
GPU subgroup MMA load and store operations and to the NVGPU operations
currently handled by the memref transforms (there may be
more suitable operations in the NVGPU dialect, but I haven't gone
looking systematically).
This code will be tested by a later commit that updates
fold-memref-alias-ops.
Assisted-by: Claude Code, Cursor (interface boilerplate, sketching out
implementations)
Thread Safety Analysis: Add literal-based alias test (#179041)
Add a simple literal-based alias test that shows that the recently fixed
value-based literal comparison works when resolving aliases.
NFC.
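One plausible shape for such a test, assuming "literal-based" refers to an alias formed through a literal index that the analysis must compare by value; the actual test in the patch may look different:

```cpp
// Guessed sketch of a literal-based alias test, in the style of the
// thread-safety analysis tests (the real tests define Mutex and the
// annotation macros in a shared header).
class __attribute__((capability("mutex"))) Mutex {
public:
  void Lock() __attribute__((acquire_capability()));
  void Unlock() __attribute__((release_capability()));
};
#define GUARDED_BY(x) __attribute__((guarded_by(x)))

Mutex mus[2];
int data GUARDED_BY(mus[0]);

void lockViaAlias() {
  Mutex *m = &mus[0]; // alias formed through the integer literal 0
  m->Lock();          // resolving m to mus[0] requires comparing literals by value
  data = 1;           // expected: no -Wthread-safety warning here
  m->Unlock();
}
```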
[AMDGPU] Allow hoisting of V_READFIRSTLANE_B32 for uniform operand
readfirstlane can be moved across control flow for uniform inputs.
The MachineInstr::NoConvergent attribute allows hoisting,
which is otherwise prohibited for a convergent instruction.
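A minimal sketch of the idea, with the uniformity check as a hypothetical stand-in for whatever analysis the patch actually consults:

```cpp
// Hedged sketch: if the source operand of a V_READFIRSTLANE_B32 is known to
// be uniform, drop the convergent restriction so passes such as MachineLICM
// may move it across control flow. isOperandUniform is a stand-in helper.
if (MI.getOpcode() == AMDGPU::V_READFIRSTLANE_B32 &&
    isOperandUniform(MI.getOperand(1)))
  MI.setFlag(MachineInstr::NoConvergent); // isConvergent() now returns false
```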
[scudo] Add resident pages info to getStats (#178969)
Adding a resident pages field to the primary allocator's getStats function
makes it consistent with the secondary allocator's getStats function.
Attributor: Add -light options to -attributor-enable flag
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
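For context, a hedged sketch of how the existing -attributor-enable option might grow the new values; the all/module/cgscc/none enumerators exist today, while the three *-light enumerators and their exact spellings are guesses based on this description:

```cpp
// Sketch only: hypothetical *_LIGHT enumerators added to the existing option.
static cl::opt<AttributorRunOption> AttributorRun(
    "attributor-enable", cl::Hidden, cl::init(AttributorRunOption::NONE),
    cl::desc("Enable the attributor inter-procedural deduction pass"),
    cl::values(
        clEnumValN(AttributorRunOption::ALL, "all", "enable all attributor runs"),
        clEnumValN(AttributorRunOption::MODULE, "module", "enable module-wide attributor runs"),
        clEnumValN(AttributorRunOption::CGSCC, "cgscc", "enable call graph SCC attributor runs"),
        // New in this change (hypothetical enumerator names):
        clEnumValN(AttributorRunOption::LIGHT, "light", "use the light variants in both positions"),
        clEnumValN(AttributorRunOption::MODULE_LIGHT, "module-light", "use AttributorLight for the module run"),
        clEnumValN(AttributorRunOption::CGSCC_LIGHT, "cgscc-light", "use AttributorLightCGSCC for the CGSCC run"),
        clEnumValN(AttributorRunOption::NONE, "none", "disable attributor runs")));
```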
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
[AMDGPU][GlobalISel] Add COPY_SCC_VCC combine for VCC-SGPR-VGPR pattern
Eliminate the VCC->SGPR->VGPR bounce created by UniInVcc when the uniform boolean
result is consumed by a VALU instruction that requires the input in VGPRs.
[RISC-V][Mach-O] Add codegen support for Mach-O object format. (#178263)
This commit enables code generation for RISC-V targeting Mach-O:
- Implement RISCVMachOTargetObjectFile::getNameWithPrefix method to
handle Mach-O symbol naming requirements.
- Use shouldAssumeDSOLocal() in RISCVTargetLowering::lowerGlobalAddress
instead of isDSOLocal() for proper Mach-O semantics in global address
lowering (see the sketch at the end of this entry). Note that this is an
NFC for RISC-V when targeting ELF.
- Add comprehensive tests for various relocation types (direct globals,
GOT-based addressing, static vs PIC models).
- Test function calls, tail calls, and various symbol reference patterns
including addends and subtractions.
This patch is based on code originally written by Tim Northover.
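As a rough illustration of the second bullet above (the surrounding code in RISCVTargetLowering is simplified away, and the exact shouldAssumeDSOLocal call shape is an assumption):

```cpp
// Hedged sketch: consult the target machine, which knows the object-format
// rules, rather than the IR-level dso_local bit when lowering a global.
const GlobalValue *GV = N->getGlobal();
bool IsLocal = getTargetMachine().shouldAssumeDSOLocal(GV);
// On ELF this matches GV->isDSOLocal(), so the change is NFC there; on
// Mach-O it correctly routes non-local symbols through GOT-based addressing.
```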
[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843)
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.
Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.
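A hedged sketch of the relationship between the two overloads; the parameter order and exact spelling of the vector form are assumptions, while the single-MMO hook it forwards to is the existing TargetLowering interface:

```cpp
// Sketch only: a default implementation of the new vector-returning overload,
// forwarding to the pre-existing single-MMO hook (which this change makes
// protected). Targets that need several MMOs, e.g. a global->LDS copy with
// one load and one store, override the vector form instead.
virtual bool getTgtMemIntrinsic(SmallVectorImpl<IntrinsicInfo> &Infos,
                                const CallInst &I, MachineFunction &MF,
                                unsigned Intrinsic) const {
  IntrinsicInfo Info;
  if (!getTgtMemIntrinsic(Info, I, MF, Intrinsic))
    return false;
  Infos.push_back(Info);
  return true;
}
```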
GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.
Converting the backends to the new overload is a fairly mechanical step
done in a separate change, in the hope that this reduces merge pain during
review and for downstreams. A later change will then enable using multiple
MMOs in AMDGPU.
[mlir][tensor] Emit diagnostics for unranked tensor reshape ops instead of asserting (#179005)
This PR updates tensor.expand_shape and tensor.collapse_shape ODS
definitions to require ranked tensor operands/results by switching from
AnyTensor to AnyRankedTensor.
Fixes https://github.com/llvm/llvm-project/issues/178228