[clang][bytecode] Attach block scope variables to the root scope (#183279)
... if we don't have a block scope available. This can happen in
`EvalEmitter` scenarios and can cause local variable blocks to be
prematurely converted to dead blocks. Attach `ScopeKind::Block` variables
to the root scope instead.
[mlir][arith] Add `nneg` to index_castui. (#183383)
Follow up to #183165
`nneg` is added to `arith.index_castui`.
> When the `nneg` flag is present, the operand is assumed to be
non-negative.
> In this case, zero extension is equivalent to sign extension. When this
> assumption is violated, the result is poison.
* Updates op definition to add assembly format and `nneg` flag.
* Updates canonicalization patterns to take into account `nneg` in
`arith.index_castui`.
* Updates arith-to-llvm lowering to preserve `nneg` when lowering
`arith.index_castui` to `zext`
* Adds roundtrip, canonicalization, and lowering tests
[4 lines not shown]
[MLIR] Fix mlir-doc build failures by adding -dialect to add_mlir_doc calls
Add -dialect=<name> to all add_mlir_doc() calls that were missing it, fixing
failures after a8f2e80d5fe3 made findDialectToGenerate() require -dialect when
multiple dialects are present in a .td file.
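The fixed call shape looks roughly like the following (target and dialect names are placeholders, not from the commit):

```cmake
# Before a8f2e80d5fe3 this could omit -dialect; with several dialects in one
# .td file, findDialectToGenerate() now requires an explicit selection.
add_mlir_doc(MyDialectOps MyDialectOps Dialects/ -gen-op-doc -dialect=mydialect)
```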
Fix CI failure on Windows
The new test was failing on Windows, because it tries to call
`__cmpsf2`, which the generic builtins/comparesf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
Fix CI failure on Windows
The new test was failing on Windows, because it tries to call
`__cmpdf2`, which the generic builtins/comparedf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
[LLVM][SimplifyCFG] Allow switch-to-table for some vector constants. (#183057)
Only applies to fixed length vector constants that are made up of either
ConstantInt or ConstantFP elements.
[VPlan] Limit interleave group narrowing to consecutive wide loads.
Tighten check in canNarrowLoad to require consecutive wide loads; we
cannot properly narrow gathers at the moment.
Fixes https://github.com/llvm/llvm-project/issues/183345.
[clang][bytecode] Optimize `interp::Record` a bit (#183494)
And things around it.
Remove the `FieldMap`, since we can use the field's index instead and
only keep an array around. `reserve()` the expected sizes up front and
use `emplace_back()`.
[TableGen] Complete the support for artificial registers
Artificial registers were added in eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than those
of their subregisters, even when they only contain a single
physical subregister.
Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.
This patch completes the support for artificial registers to:
- Ignore artificial registers when joining register unit uber
sets. Artificial registers may be members of classes that
together include registers and their sub-registers, making it
impossible to compute normalised weights for uber sets they
belong to.
[28 lines not shown]
[flang][OpenMP] Inline CheckNestedBlock, NFC (#181732)
CheckNestedBlock no longer calls itself, which was the primary reason
for the code to be in a separate function.
[AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895)
On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at
the use-site. When the hazard-consuming instruction is inside a loop and
the WMMA is outside, these NOPs execute every iteration even though the
hazard only needs to be covered once.
This patch hoists the V_NOPs to the loop preheader, reducing executions
from N iterations to 1.
```
Example (assuming a hazard requiring K V_NOPs):
Before:
bb.0 (preheader): WMMA writes vgpr0
bb.1 (loop): V_NOP xK, VALU reads vgpr0, branch bb.1
-> K NOPs executed per iteration
After:
bb.0 (preheader): WMMA writes vgpr0, V_NOP xK
[12 lines not shown]
[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)
Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.
This allows removing the induction increment operand only used when
tail-folding.
PR: https://github.com/llvm/llvm-project/pull/182507
AMDGPU: Stop adding uniform-work-group-size=false
This is one of the string attributes that takes a boolean
value for no reason. There is no point in ever writing this
with an explicit false. Stop adding the noise and reporting
an unnecessary change.
[SCEV] Introduce SCEVUse wrapper type (NFC)
Add SCEVUse as a PointerIntPair wrapper around const SCEV * to prepare
for storing additional per-use information.
This commit contains the mechanical changes of adding an initial SCEVUse
wrapper and updating all relevant interfaces to take SCEVUse. Note that
currently the integer part is never set, and all SCEVUses are
considered canonical.