[VPlan] Split out optimizeEVLMasks. NFC (#174925)
Addresses part of #153144 and splits off part of #166164
There are two parts to the EVL transform:
1) Convert the loop so the number of elements processed each iteration
is EVL, not VF. The IV and header mask are replaced with EVL-based
variants.
2) Optimize users of the EVL-based header mask into VP-intrinsic-based
recipes.
(1) changes the semantics of the vector loop region, whereas (2) needs
to preserve them. This splits (2) out so we don't mix the two up, and
allows us to move (1) earlier in the pipeline in a future PR.
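
As a rough illustration of part (1), the sketch below contrasts a VF-based loop that uses a header mask with an EVL-based loop in plain scalar C++. It is not VPlan code: `VF`, `addOneMasked`, and `addOneEVL` are hypothetical names, and computing EVL as `min(VF, remaining)` is a simplifying assumption rather than what any particular target guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed vectorization factor for this sketch.
constexpr std::size_t VF = 4;

// VF-based loop with a header mask: every iteration covers VF lanes, and
// lanes past the end of the input are masked off.
std::vector<int> addOneMasked(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  for (std::size_t iv = 0; iv < in.size(); iv += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (iv + lane < in.size()) // header mask: is this lane active?
        out[iv + lane] = in[iv + lane] + 1;
  return out;
}

// EVL-based loop: each iteration processes exactly EVL elements, and the
// induction variable advances by EVL instead of VF.
std::vector<int> addOneEVL(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  std::size_t iv = 0;
  while (iv < in.size()) {
    // Assumption for this sketch: EVL = min(VF, remaining elements).
    std::size_t evl = std::min(VF, in.size() - iv);
    assert(evl > 0 && evl <= VF);
    for (std::size_t lane = 0; lane < evl; ++lane)
      out[iv + lane] = in[iv + lane] + 1;
    iv += evl;
  }
  return out;
}
```

Both functions compute the same result; the difference is that the second one no longer needs a per-lane mask, which is the property part (2) then exploits.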
[NFC][Linalg] Add `matchConvolutionOpOfType` API and make `isaConvolutionOpOfType` API a wrapper (#174722)
-- This commit makes the following updates to the
`isaConvolutionOpOfType` API:
1. We don't want the dilations/strides of a convolution op to be returned
through pointer arguments to the API function. To address this, we add a
new API, `matchConvolutionOpOfType`, which returns an optional struct of
dilations/strides.
2. To avoid breaking the original API's use as a simple true/false
query, we keep `isaConvolutionOpOfType` as a wrapper that invokes
`matchConvolutionOpOfType` and returns true or false depending on
whether it returned a value.
3. Dilations/strides of named convolution ops are now populated as well
(they were missed in the previous PRs that added `isaConvolutionOpOfType`).
4. [Max/Min]UnsignedPool ops' body matcher now only matches unsigned int
ops (refer: https://github.com/llvm/llvm-project/pull/166070)
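
The matcher-plus-wrapper shape described in points 1 and 2 could look roughly like the following. This is a minimal sketch using toy types, not the actual MLIR signatures: `ToyOp`, `ToyConvParams`, `matchToyConv`, and `isaToyConv` are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an op; this is not an MLIR type.
struct ToyOp {
  std::string name;
  std::vector<int64_t> dilations;
  std::vector<int64_t> strides;
};

// Hypothetical result struct carrying what the matcher recovered.
struct ToyConvParams {
  std::vector<int64_t> dilations;
  std::vector<int64_t> strides;
};

// Returns the parameters on a match and std::nullopt otherwise, instead of
// writing them through pointer out-arguments.
std::optional<ToyConvParams> matchToyConv(const ToyOp &op) {
  if (op.name != "conv")
    return std::nullopt;
  // Sketch-only invariant: one dilation and one stride per spatial dim.
  assert(op.dilations.size() == op.strides.size());
  return ToyConvParams{op.dilations, op.strides};
}

// The boolean query is kept as a thin wrapper over the matcher.
bool isaToyConv(const ToyOp &op) { return matchToyConv(op).has_value(); }
```

Callers that only need a yes/no answer keep using the wrapper, while callers that need dilations/strides read them from the returned optional.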
-- No tests are being added as all the above are NFC changes around the
[2 lines not shown]
[MLIR][Bufferization] Fold LoadOp only when the buffer is read only (#172595)
A `memref.load` from a buffer was folded to `tensor.extract` even when
the buffer was writable, causing unexpected results. For example:
```mlir
func.func @load_after_write_from_buffer_cast(%arg0: index, %arg1: index,
%arg2: tensor<?x?xf32>) -> f32 {
%0 = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32>
linalg.ceil ins(%0 : memref<?x?xf32>) outs(%0 : memref<?x?xf32>)
%1 = memref.load %0[%arg0, %arg1] : memref<?x?xf32>
return %1 : f32
}
```
would fold into
```mlir
module {
func.func @load_after_write_from_buffer_cast(%arg0: index, %arg1: index, %arg2: tensor<?x?xf32>) -> f32 {
%0 = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32>
[5 lines not shown]
[clang-repl] Use more precise search to find the orc runtime. (#175805)
The new mechanism relies on the path in the toolchain, which should be
the authoritative answer. This patch tweaks the discovery of the orc
runtime from unittests, where the resource directory is hard to deduce.
Should address the issues raised in #175435 and #175322.
[RISCV][llvm] Support select codegen for P extension (#175741)
This is a scalar condition with fixed-vector true/false values, so we can
handle it the same way as scalars.
[RISCV][llvm] Support vselect codegen for P extension (#175744)
The only difference between vselect and select is the condition value
(a.k.a. mask), so we can select using bitwise operations:
vselect(mask, true, false) = (mask & true) | (~mask & false)
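
The identity above can be checked with a small scalar sketch. It assumes four 8-bit lanes packed into a 32-bit value and that every mask lane is either all-ones or all-zeros; the function names are hypothetical and this is not the RISC-V backend lowering itself.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select over a packed value (assumed layout: four 8-bit lanes in
// one 32-bit scalar). Each mask lane is assumed to be 0xFF or 0x00.
uint32_t vselectBitwise(uint32_t mask, uint32_t t, uint32_t f) {
  return (mask & t) | (~mask & f);
}

// Lane-by-lane reference used to cross-check the bitwise form.
uint32_t vselectReference(uint32_t mask, uint32_t t, uint32_t f) {
  uint32_t result = 0;
  for (int lane = 0; lane < 4; ++lane) {
    uint32_t laneBits = 0xFFu << (8 * lane);
    uint32_t m = mask & laneBits;
    assert(m == 0 || m == laneBits); // mask lanes must be all-ones or all-zeros
    result |= (m ? t : f) & laneBits;
  }
  return result;
}
```

With `mask = 0xFF00FF00`, `t = 0x11223344`, and `f = 0xAABBCCDD`, both functions return `0x11BB33DD`: lanes 3 and 1 come from `t`, lanes 2 and 0 from `f`.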
[CodeGen][NPM] Add support for -print-regusage in New Pass Manager (#169761)
Support the `-print-regusage` flag in NPM for printing register usage information.
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
Dong)
- Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)
- Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)
- Check that BPF insn array is not allowed as a map for const strings
(Deepanshu Kartikey)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix reference count leak in bpf_prog_test_run_xdp()
bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
bpf, test_run: Subtract size of xdp_frame from allowed metadata size
riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
security/bitwarden-cli: Update to 2025.12.1
While here, convert to use USES=electron for fetching and archiving
node modules, and adjust Makefile accordingly.
Changelog: https://github.com/bitwarden/clients/releases/tag/cli-v2025.12.1
Reported by: GitHub (watch releases)
[AMDGPU][GlobalISel] Add RegBankLegalize support for G_AMDGPU_S_MUL_*
Patch 3 of 4 patches to implement full G_MUL support in regbanklegalize.
Current mul.ll test is only partially updated and expected to fail.
It will be updated in the fourth patch.
[AMDGPU] Fix the encoding of VOP3PX2 instructions
The ISA spec says `SCALE_OPSEL[0:1]` determines which parts of S3 and S4 are used, and `SCALE_OPSEL_HI[0:1]` should be zero.
[AMDGPU][GlobalISel] Add RegBankLegalize support for G_AMDGPU_MAD_*
Patch 2 of 4 patches to implement full G_MUL support in regbanklegalize.
Current mul.ll test is only partially updated and expected to fail.
It will be updated in the fourth patch.