[Tablegen] Fix condition to report when lanemask overflows (#181810)
This PR:
Fixes a slight off-by-one error in the check for how many bits are
allocated for subreg lane masks. If 65 subreg lanes are used, it fails
later, but the error message is not clear as to what has occurred.
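As an illustration only (this is not the TableGen source), the kind of off-by-one described above can be sketched as follows. The 64-bit mask width, the zero-based lane numbering, and both helper names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed width of a lane mask in bits (hypothetical constant for this sketch).
constexpr unsigned LaneMaskBits = 64;

// Hypothetical helper: lanes are numbered from 0 and each lane owns one bit,
// so the last valid index is LaneMaskBits - 1.
constexpr bool laneIndexFits(unsigned LaneIndex) {
  return LaneIndex < LaneMaskBits;
}

// Hypothetical off-by-one variant: it also accepts index 64, i.e. a 65th
// lane, which has no bit left in a 64-bit mask.
constexpr bool laneIndexFitsOffByOne(unsigned LaneIndex) {
  return LaneIndex <= LaneMaskBits;
}
```

With the strict comparison, the 65th lane (index 64) is rejected at the point of allocation instead of failing later with a less specific message.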
[mlir] [linalg] fix failure on specializing matmul with permuted loops (#184294)
This patch fixes generic specialization when the loop dimensions are
permuted in the generic w.r.t. the canonical iterator order of the named
ops by not forwarding the maps of the original generic and instead
recreating them ensuring they always follow the canonical order.
For example, the generic which is to be specialized to a matmul could
have `[parallel, reduction, parallel]` loops. Specializing this as is
and just copying the indexing maps, as we do now, will lead to a
verification error since the dimensions will not match the canonical form
the matmul named op expects.
E.g. the maps could be:
```
(m, k, n) -> (m,k)
...
```
So we would have to recreate the maps to be:
[5 lines not shown]
[CodeGen] Use data layout aware constant folder in CGBuilder (#184819)
Use the DataLayout-aware TargetFolder instead of ConstantFolder in
Clang's CGBuilder. The primary impact of this change is that GEP
constant expressions are now emitted in canonical `getelementptr i8`
form. This is in preparation for the migration to ptradd, which requires
this form.
Part of the test updates were performed by Claude Code and reviewed by
me.
[AMDGPU] Add missing -wwm-regalloc=fast to 4 more tests (NFC)
Adding the missing wwm-regalloc=fast option in 4 more tests
that already specify -sgpr-regalloc=fast and -vgpr-regalloc=fast.
For consistency, the same preference should be applied to the
wwm-regalloc pipeline as well.
This is a follow-up to #184190 which addressed the same issue in
attr-amdgpu-flat-work-group-size-vgpr-limit.ll.
[DA] Remove isPeelFirst and isPeelLast (#183737)
`isPeelFirst` and `isPeelLast` are updated only in the Weak Zero SIV
tests, and no clients actually use them. Keeping these features while
fixing the existing defects in DA would add unnecessary complexity. If
they are unnecessary in the first place, it would be better to delete
them to mitigate maintenance burden.
[ORC] Make ElementSet, ContainerElementsMap inner classes. (#184955)
ElementSet and ContainerElementsMap were type aliases inside
WaitingOnGraph.
This commit replaces the aliases with classes deriving from DenseSet and
DenseMap, with convenience operations added for WaitingOnGraph (merge,
remove, remove_if, and visit). These convenience functions are used to
simplify the implementation of various parts of WaitingOnGraph.
Unit tests are added for the convenience operations to improve test
coverage.
In addition to improving readability of the main WaitingOnGraph
operations, this will make it easier to experiment with other underlying
representations for these types (e.g. sorted vectors).
[WebAssembly] Do not form minnum/maxnum (#184796)
For wasm, forming minnum/maxnum style ISD nodes is non-profitable,
because (in cases where any float min/max support exists at all), it has
pmin/pmax instructions that correspond to the fcmp+select semantics, or
relaxed_fmin/relaxed_fmax (for the nnan+nsz case) with even looser
semantics.
As such, return false from isProfitableToCombineMinNumMaxNum(), and also
respect that hook in the SDAGBuilder.
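A small sketch of why the fcmp+select form and the minnum form are not interchangeable. It assumes the pmin definition `b < a ? b : a` and uses `std::fmin` as a stand-in for minnum-style behaviour (return the non-NaN operand). The function name `pmin` here is a plain C++ helper for illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cmath>

// fcmp+select style minimum, assumed to match wasm pmin: b < a ? b : a.
// Any comparison with NaN is false, so the first operand is returned
// whenever either operand is NaN.
inline float pmin(float a, float b) {
  return b < a ? b : a;
}

// minnum-style minimum for comparison: std::fmin returns the non-NaN
// operand when exactly one operand is NaN.
inline float minnumLike(float a, float b) {
  return std::fmin(a, b);
}
```

For `a = NaN, b = 1.0f`, `pmin` yields NaN while `minnumLike` yields `1.0f`, so the select pattern maps directly onto pmin but not onto a minnum node.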
[RISCV][P-ext] Recognize vector shifts with splat build_vector shift amount. (#184909)
If the shift is created during LegalizeVectorOps, the shift amount
will be created as a build_vector. Splat_vector is formed by a later
DAGCombine. LegalizeVectorOps will visit the new shift before the
splat_vector can be created. Handle this case too.
[DA] Fix the Weak Zero SIV tests when the coeff may be zero (#183736)
In the Weak Zero SIV tests, given two subscripts `{c0,+,a}` and `c1`,
when `c0 == c1`, the tests conclude that a dependency exists from the
former subscript at the first iteration to the latter subscript at every
iteration. However, this conclusion is correct only when `a` is not
zero, which was not being checked.
This patch adds non-zero checks for `a` in the Weak Zero SIV tests.
Fix the test cases added in #183735.
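To see why the non-zero check matters, here is a brute-force sketch (not the DA implementation) that lists the iterations at which `c0 + a*i` equals `c1`. The function name and the explicit trip count are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns every iteration i in [0, tripCount) at which the subscript
// {c0,+,a}, i.e. c0 + a*i, takes the same value as the invariant c1.
std::vector<int64_t> matchingIterations(int64_t c0, int64_t a, int64_t c1,
                                        int64_t tripCount) {
  std::vector<int64_t> result;
  for (int64_t i = 0; i < tripCount; ++i)
    if (c0 + a * i == c1)
      result.push_back(i);
  return result;
}
```

With `c0 == c1` and `a != 0`, only iteration 0 matches, which is the case the tests assumed. With `a == 0`, every iteration matches, so a conclusion restricted to the first iteration would be wrong.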
[MemProf] Enhance thin link optimization remarks (#184829)
Don't require -memprof-report-hinted-sizes for emitting opt remarks
during the thin link step. Invoke the handling also when opt remarks are
enabled for MemProf per OptimizationRemarkEmitter::allowExtraAnalysis.
Also, add a fallback message if we don't have the context size
information, adding tests for those new messages.
I also realized we don't currently emit these messages for MemProf with
regular LTO, and added a TODO.
[llvm-ir2vec] Adding getFuncNames API to ir2vec python bindings (#180473)
This is more a user convenience thing. But I thought it helpful.
Otherwise, at the moment, the user has to fetch the entire embeddings
dict, just to see which functions a module has.
[LV] Remove branch on false in blend-costs.ll test. NFC (#184816)
I have a patch I want to post that improves blend masks, but it ends up
with a weird diff in this test stemming from the branch on false.
This replaces it with an external boolean. This should still test
scalarizing a blend which I believe is the original intent.
[ELF] Remove unused handleTlsRelocation (#184951)
Now that all targets use target-specific relocation scanning for TLS
(#181332 RISC-V being the last), handleTlsRelocation is unused.
[RISCV][P-ext] Select plui.h/w and improve usage of pli.b/h/w. (#184937)
This patch adds custom instruction selection of splat_vector of
constants. Rather than using the element size from the VT, find
the smallest splat size in the constant. This allows us to use
pli.b for i16 or i32 elements that contain a byte splat.
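The idea of finding the smallest splat size can be sketched on a 32-bit constant as below. This is a standalone illustration, not the RISC-V backend code; the helper names and the restriction to widths 8, 16, and 32 are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// True if the 32-bit Value is a repetition of its low Width bits.
// Width must be 8, 16, or 32.
bool isSplatOf(uint32_t Value, unsigned Width) {
  assert(Width == 8 || Width == 16 || Width == 32);
  uint32_t Mask = (Width == 32) ? ~0u : ((1u << Width) - 1);
  uint32_t Chunk = Value & Mask;
  for (unsigned Shift = 0; Shift < 32; Shift += Width)
    if (((Value >> Shift) & Mask) != Chunk)
      return false;
  return true;
}

// Returns the smallest chunk width (8, 16, or 32) that Value is a splat of.
unsigned smallestSplatBits(uint32_t Value) {
  if (isSplatOf(Value, 8))
    return 8;
  if (isSplatOf(Value, 16))
    return 16;
  return 32;
}
```

For example, `0x01010101` is a byte splat, so a byte-wide immediate load could materialize it even when the element type is i16 or i32, while `0x12341234` needs at least a 16-bit splat.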
[mlir] Improve dialect conversion failure diagnostics (#182729)
This PR improves MLIR dialect conversion failure diagnostics when
legalization fails.
Previously, the diagnostic mostly included the operation name (and in
partial conversion, whether it was explicitly marked illegal). This
change keeps that prefix and appends the printed failing operation. This
provides immediate operand/result/type context directly in the same
error line.
### Example
Before:
```
failed to legalize operation 'test.type_consumer' that was explicitly marked illegal
```
After:
[6 lines not shown]