[MLIR][Arith] Fix crash in `arith.select` verification with mixed types (#178840)
The `BooleanConditionOrMatchingShape` trait was assuming that if the
condition was not i1, both condition and result must be `ShapedTypes`.
It would then call `AllShapesMatch` which performs a blind cast to
`ShapedType`, causing a crash when one of the operands was a scalar.
This patch fixes the crash.
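The failure mode can be modeled outside MLIR. The sketch below is not the patch's code: `ScalarType`, `ShapedType`, and `verify_select` are hypothetical stand-ins, and the diagnostics are invented for illustration. It only shows the idea of checking that both types are shaped before comparing shapes, rather than casting blindly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class ScalarType:
    """Hypothetical stand-in for a non-shaped type such as i1 or f32."""
    name: str


@dataclass(frozen=True)
class ShapedType:
    """Hypothetical stand-in for a vector or tensor type."""
    shape: Tuple[int, ...]
    element: ScalarType


Type = Union[ScalarType, ShapedType]
I1 = ScalarType("i1")


def verify_select(condition: Type, result: Type) -> Optional[str]:
    """Return None if the pair is accepted, otherwise a diagnostic string."""
    # A scalar i1 condition is accepted for any result type.
    if condition == I1:
        return None
    # Check that both sides are shaped *before* reading .shape. Skipping
    # this check is the analogue of the blind cast described above.
    if not isinstance(condition, ShapedType) or not isinstance(result, ShapedType):
        return "condition must be i1 or match the shape of the result"
    if condition.shape != result.shape:
        return "condition and result shapes differ"
    return None
```

With this ordering, a shaped condition paired with a scalar result produces a diagnostic instead of an invalid access.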
Closes [#178230](https://github.com/llvm/llvm-project/issues/178230)
[TargetParser][cmake] Recurse for TableGen deps (#177274)
In the dependency tracking for TableGen-generated files, globbing was
previously limited to the root of include directories. This missed
transitive dependencies in subdirectories, such as the target-specific
intrinsic definitions located in llvm/IR/.
Modifying these untracked files could cause global state (like the
intrinsic enum) to shift without triggering a rebuild of downstream
instruction selectors. This resulted in "Cannot select: intrinsic"
errors during incremental builds. Using a recursive glob ensures all
relevant TableGen files are correctly tracked regardless of their
directory depth.
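The difference between the two globbing modes can be illustrated with a small Python model. This is not the CMake code from the change: `tablegen_deps`, `demo`, and the file names `Top.td` and `IntrinsicsFoo.td` are hypothetical and exist only to show which files each mode sees.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def tablegen_deps(include_dir: Path, recursive: bool) -> List[str]:
    """List .td files under include_dir as sorted, POSIX-style relative paths."""
    # "*.td" only sees the root; "**/*.td" also descends into subdirectories.
    pattern = "**/*.td" if recursive else "*.td"
    return sorted(p.relative_to(include_dir).as_posix()
                  for p in include_dir.glob(pattern))


def demo() -> Tuple[List[str], List[str]]:
    """Build a throwaway include tree and list it with both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "llvm" / "IR").mkdir(parents=True)
        (root / "Top.td").write_text("")
        (root / "llvm" / "IR" / "IntrinsicsFoo.td").write_text("")
        return tablegen_deps(root, False), tablegen_deps(root, True)


if __name__ == "__main__":
    flat, deep = demo()
    print("root-only glob:", flat)
    print("recursive glob:", deep)
```

In this model the root-only glob misses the file under `llvm/IR/`, so a change to it would not be seen as a dependency change.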
Fixes #156744
[libclc] Only use software fma for r600 target (#179428)
Implement generic __clc_fma with __builtin_elementwise_fma for all
targets except for r600.
Add the --spirv-ext=+SPV_KHR_fma flag to SPIR-V generation. The SPIR-V
target supports @llvm.fma since SPV_KHR_fma was implemented in llvm-spirv
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/3467) and
the SPIR-V backend (8f8dfbf8c9f0).
This PR assumes that a SPIR-V consumer on modern hardware supports fma.
Reland "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" (#165032)
This is an attempt to merge https://reviews.llvm.org/D144006 with an LTO
fix.
The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
[39 lines not shown]
[RISCV] Sink some encoding related lets into class/def bodies. NFC (#179544)
Rather than using lets around classes/defs, override them in the
class/def body.
Some of these lets were around a single class/def, where I thought it was
better to move them inside. Some were around multiple unrelated classes,
where it seemed better not to link their encodings like that.
For vmv, I added a multiclass to better encapsulate them but still kept
the let scope to avoid repetition. The encodings are closely related
enough that I thought this was ok.
[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliasOps with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This also resolves some bugs and potential unsoundness:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2, 3]] : memref<8x8x3x3xf32> to 8x8x9xf32`,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
[libc] Disable strong stack protector for baremetal (#179559)
The strong stack protector introduces references to the __stack_chk_guard
symbol through GOT relocations on 32-bit ARM targets, which typical
baremetal environments do not support. Turn it off for baremetal.
Reapply "[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)"
This includes a fix to use size_t instead of uint64_t in one place.
[RISCV] Print MIR comments for AVL and VEC_RM operands (#179542)
With this change, MIR output can look like:
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, %21 /* vl */, 6 /* e64 */, 0 /* tu, mu */
```
or
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, -1 /* vl=VLMAX */, 6 /* e64 */, 0 /* tu, mu */
```
Hopefully this makes reading RISC-V MIR (a little) less painful.
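For reference, the comments in the examples above can be reproduced by hand with a few lines of Python. This is a sketch, not the printer added by the commit: the helper names are hypothetical, and the encodings are assumptions that match the two examples (log2 SEW operand, bit 0 for tail agnostic, bit 1 for mask agnostic, -1 as the VLMAX sentinel, and the standard `frm` field values).

```python
from typing import Union

# Standard RISC-V frm field encodings; 5 and 6 are reserved.
FRM_NAMES = {0: "rne", 1: "rtz", 2: "rdn", 3: "rup", 4: "rmm", 7: "dyn"}


def describe_frm(value: int) -> str:
    """Render a rounding-mode operand, e.g. 7 -> 'frm=dyn'."""
    return f"frm={FRM_NAMES.get(value, 'reserved')}"


def describe_avl(operand: Union[int, str]) -> str:
    """Render an AVL operand; -1 is assumed to be the VLMAX sentinel."""
    return "vl=VLMAX" if operand == -1 else "vl"


def describe_sew(log2_sew: int) -> str:
    """Render a log2(SEW) operand, e.g. 6 -> 'e64'."""
    return f"e{1 << log2_sew}"


def describe_policy(bits: int) -> str:
    """Render policy bits; bit 0 = tail agnostic, bit 1 = mask agnostic (assumed)."""
    tail = "ta" if bits & 1 else "tu"
    mask = "ma" if bits & 2 else "mu"
    return f"{tail}, {mask}"
```

For the first example line, `describe_frm(7)`, `describe_avl("%21")`, `describe_sew(6)`, and `describe_policy(0)` give the four comment strings shown there.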
[BPF] Replace copy-assign by move-assign in llvm/lib/Target/BPF/ (#179462)
An SDLoc transitively contains a TrackingMDRef, which has a specialized
move constructor. It's more efficient to move an SDLoc than to copy
it.
FileContent contains std::vector<...> values. It's more efficient to
move them than to copy the whole vector.
[mlir][tblgen] Add PredTypeTrait/PredAttrTrait support (#169153)
This patch adds support for `PredTypeTrait` and `PredAttrTrait` in type
and attribute definitions, enabling declarative predicate-based
verification similar to how `PredOpTrait` works for operations.
## Motivation
In 802bf02 (from 2021), `PredTypeTrait`/`PredAttrTrait` were defined in
TableGen but not implemented in the code generator. Using them causes
mlir-tblgen to crash with an assertion failure when trying to cast
`PredTrait` to `InterfaceTrait`. This patch fixes the crash and
implements the actual verification code generation.
## Usage
Use `$paramName` syntax in predicates to reference type/attribute
parameters:
[15 lines not shown]
[ArgPromotion] Add DW_CC_nocall to DISubprogram (#178973)
The ArgumentPromotion pass may change function signatures. If this happens
and debug info is enabled, adding DW_CC_nocall lets the DWARF emitter generate
DW_AT_calling_convention (DW_CC_nocall)
for the DW_TAG_subprogram.
DeadArgumentElimination ([1]) already has a similar implementation.
The pahole tool ([2]) is used in the Linux kernel build to generate vmlinux
BTF. One of its inputs is the Linux kernel DWARF. Currently, pahole
checks *all* DW_TAG_subprogram functions to determine whether the source
signature matches the architecture ABI. On a mismatch, pahole
tries to adjust those parameters. See [3]
and the function parameter__new().
The Linux kernel typically has ~65K functions, and roughly 1100 of them
may have their signature changed by compiler optimization. Without
DW_CC_nocall,
signatures of all of 64K functions will be checked in parameter__new().
[34 lines not shown]
[mlir] GPUToROCDL: lower `gpu.subgroup_id` to the intrinsic where possible (#179422)
Lower `gpu.subgroup_id` to the `wave.id` intrinsic on gfx12+, and to
`linearized_thread_id / subgroup_size` on older targets.
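The fallback arithmetic can be sketched in Python. This is an illustration, not the lowering itself: the function names are hypothetical, and the linearization order (x fastest, then y, then z) is an assumption about how the thread id is flattened.

```python
from typing import Tuple

Dim3 = Tuple[int, int, int]


def linearized_thread_id(tid: Dim3, block_dim: Dim3) -> int:
    """Flatten a 3-D thread id, assuming x varies fastest, then y, then z."""
    x, y, z = tid
    dim_x, dim_y, _ = block_dim
    return x + y * dim_x + z * dim_x * dim_y


def subgroup_id(tid: Dim3, block_dim: Dim3, subgroup_size: int) -> int:
    """Fallback computation: integer-divide the flat id by the subgroup size."""
    return linearized_thread_id(tid, block_dim) // subgroup_size
```

For a 64x2x1 block with a subgroup size of 64, thread (63, 0, 0) falls in subgroup 0 and thread (0, 1, 0) in subgroup 1.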