[JTS] Add statistics (#183431)
This patch adds some statistics to the jump-table-to-switch pass. This
will make it easier to see in aggregate how changing the profitability
heuristics impacts how often the optimization fires.
[ValueTracking] Extend computeConstantRange for add/sub, sext/zext/trunc
Recursively compute operand ranges for add/sub and propagate ranges
through sext/zext/trunc.
For add/sub, the computed range is intersected with any existing range
from setLimitsForBinOp, and NSW/NUW flags are used via addWithNoWrap/
subWithNoWrap to tighten bounds.
The motivation is to enable further folding of reduce.add expressions
in comparisons, where the result range can be bounded by the input
element ranges.
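The range arithmetic described above can be illustrated with a minimal sketch. It models unsigned 8-bit intervals only; `URange`, `add_ranges`, and the bit width are hypothetical names chosen for illustration and are not LLVM's `ConstantRange` API.

```python
from dataclasses import dataclass

BITS = 8                     # illustrative bit width (assumption)
MAX = (1 << BITS) - 1


@dataclass(frozen=True)
class URange:
    """Inclusive unsigned interval [lo, hi]; hypothetical helper, not LLVM's ConstantRange."""
    lo: int
    hi: int

    def intersect(self, other):
        # Assumes the intervals overlap, which holds when both are sound
        # over-approximations of the same value.
        return URange(max(self.lo, other.lo), min(self.hi, other.hi))


FULL = URange(0, MAX)


def add_ranges(a, b, nuw=False):
    """Range of a + b for unsigned BITS-wide values."""
    lo, hi = a.lo + b.lo, a.hi + b.hi
    if hi <= MAX:
        return URange(lo, hi)      # the sum can never wrap
    if nuw and lo <= MAX:
        return URange(lo, MAX)     # wrapped results are poison, so clamp
    return FULL                    # the sum may wrap: fall back to the full range


# Without the flag the sum may wrap, so nothing is learned.
print(add_ranges(URange(100, 200), URange(100, 200)))
# With the no-unsigned-wrap flag the bound tightens, and it can be
# intersected with a range known from elsewhere.
print(add_ranges(URange(100, 200), URange(100, 200), nuw=True).intersect(URange(0, 220)))
```

The sketch falls back to the full range whenever it cannot prove a tighter bound, which keeps the result conservative.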
[Driver][SPIRV] Fix SPIR-V build for AMD.
The AMD path doesn't use spirv-link, and the driver was incorrectly adding flags for it, which broke the build.
[clang][bytecode] Attach block scope variables to the root scope (#183279)
... if we don't have a block scope available. This can happen in
`EvalEmitter` scenarios and can cause local variable blocks to be
prematurely converted to dead blocks. Attach `ScopeKind::Block` variables
to the root scope instead.
[mlir][arith] Add `nneg` to index_castui. (#183383)
Follow up to #183165
`nneg` is added to `arith.index_castui`.
> When the `nneg` flag is present, the operand is assumed to be non-negative.
> In this case, zero extension is equivalent to sign extension. When this
> assumption is violated, the result is poison.
* Updates op definition to add assembly format and `nneg` flag.
* Updates canonicalization patterns to take into account `nneg` in
`arith.index_castui`.
* Updates arith-to-llvm lowering to preserve `nneg` when lowering
`arith.index_castui` to `zext`.
* Adds roundtrip, canonicalization, and lowering tests.
[4 lines not shown]
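The equivalence that `nneg` relies on can be checked with a small sketch. The helpers below (`zext`, `sext`, `as_unsigned`) are hypothetical illustrations on plain integers, not MLIR or LLVM APIs, and the 32-to-64-bit widths are an assumption.

```python
def zext(value, from_bits):
    """Zero-extend: keep the low from_bits bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits):
    """Sign-extend: interpret the low from_bits bits as two's complement."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def as_unsigned(value, bits):
    """Bit pattern of value in a bits-wide register."""
    return value & ((1 << bits) - 1)


non_negative = 5            # sign bit clear in 32 bits
negative = 0xFFFFFFFF       # sign bit set in 32 bits (-1 as signed)

# Sign bit clear: both extensions produce the same 64-bit pattern.
print(zext(non_negative, 32) == as_unsigned(sext(non_negative, 32), 64))
# Sign bit set: the extensions differ, which is why violating the
# non-negativity assumption yields poison.
print(zext(negative, 32) == as_unsigned(sext(negative, 32), 64))
```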
[MLIR] Fix mlir-doc build failures by adding -dialect to add_mlir_doc calls
Add -dialect=<name> to all add_mlir_doc() calls that were missing it, fixing
failures after a8f2e80d5fe3 made findDialectToGenerate() require -dialect when
multiple dialects are present in a .td file.
Fix CI failure on Windows
The new test was failing on Windows, because it tries to call
`__cmpsf2`, which the generic builtins/comparesf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
Fix CI failure on Windows
The new test was failing on Windows, because it tries to call
`__cmpdf2`, which the generic builtins/comparedf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
[LLVM][SimplifyCFG] Allow switch-to-table for some vector constants. (#183057)
Only applies to fixed length vector constants that are made up of either
ConstantInt or ConstantFP elements.
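As a rough illustration of the switch-to-table idea with vector-valued results, the sketch below models fixed-length vector constants as tuples. The function names and case values are hypothetical, and the real transform operates on LLVM IR rather than Python.

```python
DEFAULT = (0, 0)


def switch_version(x):
    """Each case yields a constant two-element vector."""
    if x == 0:
        return (1, 2)
    if x == 1:
        return (3, 4)
    if x == 2:
        return (5, 6)
    return DEFAULT


# The same results stored as a constant table indexed by the case value.
TABLE = [(1, 2), (3, 4), (5, 6)]


def table_version(x):
    """Range check followed by a single table load."""
    return TABLE[x] if 0 <= x < len(TABLE) else DEFAULT


print(all(switch_version(x) == table_version(x) for x in range(-2, 6)))
```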
[VPlan] Limit interleave group narrowing to consecutive wide loads.
Tighten check in canNarrowLoad to require consecutive wide loads; we
cannot properly narrow gathers at the moment.
Fixes https://github.com/llvm/llvm-project/issues/183345.
[clang][bytecode] Optimize `interp::Record` a bit (#183494)
And things around it.
Remove the `FieldMap`, since we can use the field's index instead and
only keep an array around. `reserve()` the sizes and use
`emplace_back()`.
[TableGen] Complete the support for artificial registers
Artificial registers were added in eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than that
of their subregisters, even when they only contain a single
physical subregister.
Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.
This patch completes the support for artificial registers to:
- Ignore artificial registers when joining register unit uber
sets. Artificial registers may be members of classes that
together include registers and their sub-registers, making it
impossible to compute normalised weights for uber sets they
belong to.
[28 lines not shown]
[flang][OpenMP] Inline CheckNestedBlock, NFC (#181732)
CheckNestedBlock no longer calls itself, which was the primary reason
for the code to be in a separate function.
[AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895)
On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at
the use-site. When the hazard-consuming instruction is inside a loop and
the WMMA is outside, these NOPs execute every iteration even though the
hazard only needs to be covered once.
This patch hoists the V_NOPs to the loop preheader, reducing executions
from N iterations to 1.
```
Example (assuming a hazard requiring K V_NOPs):
Before:
bb.0 (preheader): WMMA writes vgpr0
bb.1 (loop): V_NOP xK, VALU reads vgpr0, branch bb.1
-> K NOPs executed per iteration
After:
bb.0 (preheader): WMMA writes vgpr0, V_NOP xK
[12 lines not shown]
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.
This allows removing the induction increment operand only used when
tail-folding.
PR: https://github.com/llvm/llvm-project/pull/182507
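The difference between the two trip counts can be sketched with a simplified model. It assumes that a tail-folded loop's vector trip count is the original count rounded up to a multiple of the vectorization factor, and that the induction variable is affine; `vector_trip_count` and `iv_end_value` are hypothetical helpers, not VPlan APIs.

```python
def vector_trip_count(trip_count, vf, tail_folded):
    """Scalar iterations covered by the vector loop (simplified model)."""
    if tail_folded:
        # Masked final iteration: rounded up to a multiple of vf.
        return -(-trip_count // vf) * vf
    # A scalar remainder loop handles the rest: rounded down.
    return trip_count - trip_count % vf


def iv_end_value(start, step, count):
    """Value of an affine induction variable after count iterations."""
    return start + step * count


tc, vf = 10, 4
rounded_up = vector_trip_count(tc, vf, tail_folded=True)

# Using the rounded-up vector trip count would overshoot the exit value.
print(iv_end_value(0, 1, rounded_up))
# Using the original trip count gives the value the scalar loop would exit with.
print(iv_end_value(0, 1, tc))
```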