[mlir][MemRef] Make fold-memref-alias-ops use memref interfaces
This replaces the large switch-cases and operation-specific patterns
in FoldMemRefAliashops with patterns that use the new
IndexedAccessOpInterface and IndexedMemCopyOpInterface, which will
allow us to remove the memref transforms' dependency on the NVGPU
dialect.
This does also resolve some bugs and potential unsoundnesses:
1. We will no longer fold in expand_shape into vector.load or
vector.transfer_read in cases where that would alter the strides
between dimensions in multi-dimensional loads. For example, if we have
a `vector.load %e[%i, %j, %k] : memref<8x8x9xf32>, vector<2x3xf32>`
where %e is
`expand_shape %m [[0], [1], [2. 3]] : memref<8x8x3x3xf32> to 8x8x9xf32,
we will no longer fold in that shape, since that would change which
value would be read (the previous patterns tried to account for this
but failed).
2. Subviews that have non-unit strides in positions that aren't being
[15 lines not shown]
pf: fix use of uninitialised variable
In pf_match_rule() we attempt to append matching rules to the end of
'match_rules'. We want to preserve the order to make the multiple
pflog entries easier to understand. So we keep track of the last added
rule item in 'rt'. However, that assumed that 'match_rules' was only
ever added to in that one call to pf_match_rules(). This isn't always
the case, for example if we have match rules in different anchors.
In that case we'd end up using the uninitialised 'rt' variable in the
SLIST_INSERT_AFTER call.
Instead track the match rules and the last matching rule (to enable
easy appending) in the struct pf_test_ctx.
This also allows us to reduce the number of arguments for some
functions, because we passed a ctx to most functions that needed
'match_rules'.
While here also make pf_match_rules() static, because it's only ever
used in pf.c
[5 lines not shown]
sysctl(9): Booleans: Fix old value length discovery
When calling sysctl(3) with a null 'oldp', i.e., length discovery mode,
'oldix' can be equal to 'oldlen', and we should not fail.
More generally, let SYSCTL_OUT() and SYSCTL_IN() handle corner cases,
simply removing the comparisons between 'oldidx' and 'oldlen' and
'newidx' and 'newlen' done by hand as the test just after is an equality
that does not require to know if 'idx' is smaller than 'len'.
PR: 292917
Reported by: cy
Fixes: 406da392ef8d ("sysctl(9): Booleans: Accept integers to ease knob conversion")
Sponsored by: The FreeBSD Foundation
[libc] Disable strong stack protector for baremetal (#179559)
Strong stack protector introduces references to __stack_chk_guard
symbols with GOT relocation in ARM 32 bit targets which is not supported
in typical baremetal environments. Turning this off for baremetal.
Reapply "[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)"
This includes a fix to use size_t instead of uint64_t in one place.
[RISCV] Print MIR comments for AVL and VEC_RM operands (#179542)
Such that we can now have something like:
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, %21 /* vl */, 6 /* e64 */, 0 /* tu, mu */
```
or
```
PseudoVFMACC_VV_M2_E64 %1, %28, %28, 7 /* frm=dyn */, -1 /* vl=VLMAX */, 6 /* e64 */, 0 /* tu, mu */
```
Hopefully this could make reading RISC-V MIR (a little) less painful.
[BPF] Replace copy-assign by move-assign in llvm/lib/Target/BPF/ (#179462)
An SDLoc transitively contains a TrackingMDRef which have a specialized
move constructor. It's more efficient to move element to it instead of
copying them.
FileContent contains std::vector<...> values. It's more efficient to
move then to copy the whole vector.
lang/python3{12,13}: limit parallel .pyc compilation to MAKE_JOBS
This option is available since python312[0]. This fixes
python312 and python313 builds with qemu-user-static
emulating riscv64.
[0] https://github.com/python/cpython/commit/9a7e9f9921804f3f90151ca42703e612697dd430
Approved by: vishwin (#python), lwhsu (mentor)
Signed-off-by: Siva Mahadevan <siva at FreeBSD.org>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D54906
[mlir][tblgen] Add PredTypeTrait/PredAttrTrait support (#169153)
This patch adds support for `PredTypeTrait` and `PredAttrTrait` in type
and attribute definitions, enabling declarative predicate-based
verification similar to how `PredOpTrait` works for operations.
## Motivation
In 802bf02 (from 2021), `PredTypeTrait`/`PredAttrTrait` were defined in
TableGen but not implemented in the code generator. Using them causes
mlir-tblgen to crash with an assertion failure when trying to cast
`PredTrait` to `InterfaceTrait`. This patch fixes the crash and
implements the actual verification code generation.
## Usage
Use `$paramName` syntax in predicates to reference type/attribute
parameters:
[15 lines not shown]
[ArgPromotion] Add DW_CC_nocall to DISubprogram (#178973)
ArgumentPromotion pass may change function signatures. If this happens
and debuginfo is enabled, adding DW_CC_nocall allows dwarf to generate
DW_AT_calling_convention (DW_CC_nocall)
for DW_TAG_subprogram.
DeadArgumentElimination ([1]) already has similar implementation.
The pahole tool ([2]) is used in linux kernel build to generate vmlinux
BTF. One of its input is linux kernel dwarf. Currently, pahole
checks *all* DW_TAG_subprogram functions and find whether the source
signature matches the architecture ABI or not. If mismatch, pahole will
try to do some adjustment for those parameters. See [3]
and function parameter__new().
The linux kernel typically has ~65K functions and roughly 1100 functions
may have signature changed due to compile optimization. Without
DW_CC_nocall,
signatures of all of 64K functions will be checked in parameter__new().
[34 lines not shown]
[mlir] GPUToROCDL: lower `gpu.subgroup_id` to the intrinsic where possible (#179422)
Lower `gpu.subgroup_id` to `wave.id` intrinsic on gfx12+, lower to
`linearized_thread_id / subgroup_size` on older.
Revert "[SelectionDAGISel] Separate the operand numbers in OPC_EmitNode/MorphNodeTo into their own table. (#178722)"
This reverts commit caab98284166784459a2fb76df7bca3f1d35e41e.
This is failing some build bots.
[Thread Safety Analysis] Fix a bug of context saving in alias-analysis (#178825)
The commit b4c98fcbe1504841203e610c351a3227f36c92a4 introduces
alias-analysis and conservatively invalidates variable definitions at
function calls. For each invalidated argument, it creates and pushes a
context. So if there are multiple arguments being invalidated, there are
more than one context being pushed. However, the analysis expects one
context at the program point of a call, causing context mismatch. This
issue could lead to false negatives.
For example,
```
MyLock->Lock(); // 'MyLock' holds the lock
Lock_t *Ptr = MyLock; // 'Ptr' aliases with 'MyLock'
// Before the fix, two contexts are saved and pushed at the call below, causing context mismatch later.
escapeAliasMultiple(&Irrelevant, &Ptr);
Ptr->Unlock(); // 'Ptr' may no longer hold the lock but the analyzer missed it due to context mismatch
```
This commit fixes the issue.
[2 lines not shown]