[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliashops with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2. 3]] : memref<8x8x3x3xf32> to 8x8x9xf32,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
[libc] Disable strong stack protector for baremetal (#179559)
Strong stack protector introduces references to __stack_chk_guard
symbols with GOT relocation in ARM 32 bit targets which is not supported
in typical baremetal environments. Turning this off for baremetal.
Reapply "[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)"
This includes a fix to use size_t instead of uint64_t in one place.
[RISCV] Print MIR comments for AVL and VEC_RM operands (#179542)
Such that we can now have something like:
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, %21 /* vl */, 6 /* e64 */, 0 /* tu, mu */
```
or
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, -1 /* vl=VLMAX */, 6 /* e64 */, 0 /* tu, mu */
```
Hopefully this could make reading RISC-V MIR (a little) less painful.
[BPF] Replace copy-assign by move-assign in llvm/lib/Target/BPF/ (#179462)
An SDLoc transitively contains a TrackingMDRef which have a specialized
move constructor. It's more efficient to move element to it instead of
copying them.
FileContent contains std::vector<...> values. It's more efficient to
move then to copy the whole vector.
[mlir][tblgen] Add PredTypeTrait/PredAttrTrait support (#169153)
This patch adds support for `PredTypeTrait` and `PredAttrTrait` in type
and attribute definitions, enabling declarative predicate-based
verification similar to how `PredOpTrait` works for operations.
## Motivation
In 802bf02 (from 2021), `PredTypeTrait`/`PredAttrTrait` were defined in
TableGen but not implemented in the code generator. Using them causes
mlir-tblgen to crash with an assertion failure when trying to cast
`PredTrait` to `InterfaceTrait`. This patch fixes the crash and
implements the actual verification code generation.
## Usage
Use `$paramName` syntax in predicates to reference type/attribute
parameters:
[15 lines not shown]
[ArgPromotion] Add DW_CC_nocall to DISubprogram (#178973)
ArgumentPromotion pass may change function signatures. If this happens
and debuginfo is enabled, adding DW_CC_nocall allows dwarf to generate
DW_AT_calling_convention (DW_CC_nocall)
for DW_TAG_subprogram.
DeadArgumentElimination ([1]) already has similar implementation.
The pahole tool ([2]) is used in linux kernel build to generate vmlinux
BTF. One of its input is linux kernel dwarf. Currently, pahole
checks *all* DW_TAG_subprogram functions and find whether the source
signature matches the architecture ABI or not. If mismatch, pahole will
try to do some adjustment for those parameters. See [3]
and function parameter__new().
The linux kernel typically has ~65K functions and roughly 1100 functions
may have signature changed due to compile optimization. Without
DW_CC_nocall,
signatures of all of 64K functions will be checked in parameter__new().
[34 lines not shown]
[mlir] GPUToROCDL: lower `gpu.subgroup_id` to the intrinsic where possible (#179422)
Lower `gpu.subgroup_id` to `wave.id` intrinsic on gfx12+, lower to
`linearized_thread_id / subgroup_size` on older.
Revert "[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)"
This reverts commit caab98284166784459a2fb76df7bca3f1d35e41e.
This is failing some build bots.
[Thread Safety Analysis] Fix a bug of context saving in alias-analysis (#178825)
The commit b4c98fcbe1504841203e610c351a3227f36c92a4 introduces
alias-analysis and conservatively invalidates variable definitions at
function calls. For each invalidated argument, it creates and pushes a
context. So if there are multiple arguments being invalidated, there are
more than one context being pushed. However, the analysis expects one
context at the program point of a call, causing context mismatch. This
issue could lead to false negatives.
For example,
```
MyLock->Lock(); // 'MyLock' holds the lock
Lock_t *Ptr = MyLock; // 'Ptr' aliases with 'MyLock'
// Before the fix, two contexts are saved and pushed at the call below, causing context mismatch later.
escapeAliasMultiple(&Irrelevant, &Ptr);
Ptr->Unlock(); // 'Ptr' may no longer hold the lock but the analyzer missed it due to context mismatch
```
This commit fixes the issue.
[2 lines not shown]
[NFC][TableGen] Adopt CodeGenHelpers in X86MnemonicEmitter (#179324)
Additionally, cleanup the code a bit to use nested namespace definition
and emit it per code section, and emit spaces instead of tabs.
[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)
The operand lists for these opcode require 1 byte per operand and are
usually small values that fit in 3-4 bits. This makes their storage
inefficient. In addition, many EmitNode/MorphNodeTo in the isel table
will use the same list of operand numbers.
This patch proposes to separate the operand lists into their own table
where they can be de-duplicated. The OPC_EmitNode/MorphNodeTo in the
main table will only store an index into this smaller table.
This is a reduced version of a suggestion from this very old FIXME.
https://github.com/llvm/llvm-project/blob/d8d4096c0be0a6a3248c8deae96608913a85debf/llvm/utils/TableGen/DAGISelMatcherGen.cpp#L1070
For RISC-V this reduces the main table from 1437353 bytes to 1276015
bytes plus a 929 byte operand list table. A savings of about 11%.
For X86 this reduces the main table from 719237 bytes to 623612 bytes
plus a 1042 byte operand list table. A savings of about 11%.
I expect further savings could be had by moving more bytes over.
[VPlan] Refine exit select check in transformtoPartialReduction.
Make sure we find the actual select for the exit users and only use it
for the final link in the chain. This fixes a miscompile after
90b3712d8a20efa2cbaadc177da576e485dce038.
[VPlan] Generalize `VPAllSuccessorsIterator` to support predecessors (#178724)
To be used in Mel's https://github.com/llvm/llvm-project/pull/173265.
---------
Co-authored-by: Florian Hahn <flo at fhahn.com>
Co-authored-by: Luke Lau <luke_lau at icloud.com>
[CIR] Implement 'allocsize' function/call attribute lowering (#179342)
The alloc_size attribute takes the argument number(normalized to the
index!) of the element size and count, for things like 'malloc' or
'calloc'.
This ends up being slightly more complicated than others, as this has
data that we have to decide on a format for. LLVM chooses to pack both
of these 32 bit values into a single i64, but unpacks it for the purpose
of input/output. The second value, the number of elements, is optional.
This patch uses a DenseI32ArrayAttr to store them for the LLVMIR
dialect, which gets us the packed nature, but doesn't require us doing
any work to unpack it.