[MLIR][Arith] Add canonicalization rules for int-to-float of integer extension (#185386)
Three patterns are valid but were missing:
1. `sitofp(extsi(x)) → sitofp(x)`: extsi preserves the sign and value,
so it represents the same signed integer as x.
2. `uitofp(extui(x)) → uitofp(x)`: same reasoning as above, but for
unsigned extension.
3. `sitofp(extui(x)) → uitofp(x)`: extui zero-extends, so the extended
value is always non-negative. For non-negative integers, sitofp and
uitofp produce the same result, so the left-hand expression can be
rewritten as `uitofp(extui(x))`. Rule 2 above then simplifies this
further to `uitofp(x)`.
All three rewrites have been verified with Alive2.
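As an informal illustration only (not the Alive2 proof and not the MLIR pattern code), the three rules can be checked exhaustively for one concrete case, i8 extended to i32 and converted to f32, using plain C++ casts as stand-ins for the arith ops. The function names are hypothetical.

```cpp
#include <cstdint>

// Exhaustively checks the three rewrites for i8 -> i32 -> f32.
// C++ integer casts stand in for extsi/extui; float casts for sitofp/uitofp.
bool checkIntToFloatRewrites() {
  for (int v = -128; v <= 127; ++v) {
    auto x = static_cast<std::int8_t>(v);
    auto extsi = static_cast<std::int32_t>(x); // sign-extend
    // Rule 1: sitofp(extsi(x)) == sitofp(x)
    if (static_cast<float>(extsi) != static_cast<float>(x))
      return false;
  }
  for (unsigned v = 0; v <= 255; ++v) {
    auto x = static_cast<std::uint8_t>(v);
    auto extui = static_cast<std::uint32_t>(x); // zero-extend
    // Rule 2: uitofp(extui(x)) == uitofp(x)
    if (static_cast<float>(extui) != static_cast<float>(x))
      return false;
    // Rule 3: sitofp(extui(x)) == uitofp(x). The zero-extended value has a
    // clear sign bit, so reading it as signed does not change it.
    auto asSigned = static_cast<std::int32_t>(extui);
    if (static_cast<float>(asSigned) != static_cast<float>(x))
      return false;
  }
  return true;
}

// Shows why rule 3 must produce uitofp rather than sitofp: the i8 bit
// pattern 0xFF is 255 when unsigned but -1 when read as signed.
bool sitofpOfNarrowValueDiffers() {
  std::uint8_t x = 0xFF;
  float viaExtui = static_cast<float>(
      static_cast<std::int32_t>(static_cast<std::uint32_t>(x)));
  float naive = static_cast<float>(static_cast<std::int8_t>(x));
  return viaExtui == 255.0f && naive == -1.0f;
}
```

Both helpers are expected to return true on mainstream compilers; the second relies on the usual two's-complement narrowing conversion.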
[flang][OpenMP][DoConcurrent] Emit declare mapper for records (#179936)
Extends `do concurrent` device support by emitting compiler-generated
declare mapper ops for live-ins whose types are record types and have
allocatable members.
[mlir][affine] Fix crash in affine-super-vectorize for index constants inside loops (#184614)
When an arith.constant of index type is defined inside the loop body
being vectorized, vectorizeConstant creates a vector<Nxindex> constant
and registers it as the vector replacement. However,
getScalarValueReplacementsFor (used by vectorizeAffineStore to compute
indices for vector.transfer_write) looks only in the scalar replacement
map. With no scalar replacement registered for the index constant, it
falls back to the original scalar value, which is erased when the scalar
loop is cleaned up. This results in an "operation destroyed but still has
uses" crash.
Fix: when vectorizeConstant processes an index-typed constant, also
create a new scalar constant in the vector loop body and register it as
the scalar replacement. This ensures that memory operation index
computation can find a live value in the vectorized IR.
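The failure mode and the fix can be modeled with a toy sketch. This is not the pass's actual code: values are plain strings, and `ToyVectorizationState`, `vectorizeIndexConstant`, and `storeIndexSurvivesCleanup` are hypothetical names that only mimic the two replacement maps described above.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Toy stand-in for the vectorizer's bookkeeping (hypothetical, simplified).
struct ToyVectorizationState {
  std::unordered_map<std::string, std::string> vectorReplacement;
  std::unordered_map<std::string, std::string> scalarReplacement;
  std::unordered_set<std::string> erased;

  // Mimics the described lookup: consult only the scalar map and fall back
  // to the original scalar value when nothing is registered.
  std::string scalarValueFor(const std::string &v) const {
    auto it = scalarReplacement.find(v);
    return it == scalarReplacement.end() ? v : it->second;
  }

  bool isLive(const std::string &v) const { return erased.count(v) == 0; }
};

// Models vectorizing an index-typed constant. With the fix, a fresh scalar
// constant is also registered as the scalar replacement.
void vectorizeIndexConstant(ToyVectorizationState &st, const std::string &c,
                            bool withFix) {
  st.vectorReplacement[c] = c + ".vec";
  if (withFix)
    st.scalarReplacement[c] = c + ".scalar_clone";
}

// Returns true if the index used by the store is still live after the
// original scalar loop body has been erased.
bool storeIndexSurvivesCleanup(bool withFix) {
  ToyVectorizationState st;
  vectorizeIndexConstant(st, "%c0", withFix);
  std::string index = st.scalarValueFor("%c0"); // index for the store
  st.erased.insert("%c0");                      // scalar loop cleaned up
  return st.isLive(index);
}
```

In this model, `storeIndexSurvivesCleanup(false)` returns false (the store still refers to the erased constant), while `storeIndexSurvivesCleanup(true)` returns true.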
Fixes #122213
Assisted-by: Claude Code
[IR] Split Br into UncondBr and CondBr (#184027)
BranchInst currently represents both unconditional and conditional
branches. However, these are quite different operations that are often
handled separately. Therefore, split them into separate opcodes and
classes to allow distinguishing these operations in the type system.
This also slightly improves compile-time performance.
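A rough sketch of what "distinguishing these operations in the type system" can buy, using toy types rather than the real LLVM classes: `ToyUncondBr`, `ToyCondBr`, and `numSuccessors` are hypothetical names, and the actual class layout in the patch may differ.

```cpp
#include <string>

// Toy stand-ins: one type per branch kind instead of a single type with an
// "is conditional" query.
struct ToyUncondBr {
  std::string dest;
};

struct ToyCondBr {
  std::string cond;
  std::string ifTrue;
  std::string ifFalse;
};

// Overloads are selected at compile time, so code that only makes sense for
// conditional branches cannot be handed an unconditional one by accident.
inline int numSuccessors(const ToyUncondBr &) { return 1; }
inline int numSuccessors(const ToyCondBr &) { return 2; }

// Only defined for conditional branches; there is no condition to ask for
// on a ToyUncondBr.
inline const std::string &conditionOf(const ToyCondBr &br) { return br.cond; }
```

For example, `numSuccessors(ToyCondBr{"%c", "then", "else"})` yields 2, and calling `conditionOf` on a `ToyUncondBr` fails to compile rather than failing at run time.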
[AMDGPU] Set preferred function alignment based on icache geometry (#183064)
Non-entry functions were unconditionally aligned to 4 bytes with no
architecture-specific preferred alignment, and setAlignment() was used
instead of ensureAlignment(), overwriting any explicit IR attributes.
Add instruction cache line size and fetch alignment data to GCNSubtarget
for each generation (GFX9: 64B/32B, GFX10: 64B/4B, GFX11+: 128B/4B). Use
this to call setPrefFunctionAlignment() in SITargetLowering, aligning
non-entry functions to the cache line size by default. Change
setAlignment to ensureAlignment in AMDGPUAsmPrinter so explicit IR align
attributes are respected.
Empirical thread trace analysis on gfx942, gfx1030, gfx1100, and gfx1200
showed that only GFX9 exhibits measurable fetch stalls when functions
cross the 32-byte fetch window boundary. GFX10+ showed no alignment
sensitivity. A hidden option -amdgpu-align-functions-for-fetch-only is
provided to use the fetch granularity instead of cache line size.
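The selection logic described above can be sketched as follows. This is a standalone model, not the GCNSubtarget or AMDGPUAsmPrinter code: the enum, struct, and function names are hypothetical, and only the per-generation numbers and the set-versus-ensure behavior come from the description.

```cpp
#include <algorithm>
#include <cstdint>

enum class ToyGeneration { GFX9, GFX10, GFX11Plus };

struct ToyICacheGeometry {
  std::uint32_t cacheLineBytes;
  std::uint32_t fetchAlignBytes;
};

// Values as listed in the commit message.
inline ToyICacheGeometry geometryFor(ToyGeneration gen) {
  switch (gen) {
  case ToyGeneration::GFX9:
    return {64, 32};
  case ToyGeneration::GFX10:
    return {64, 4};
  case ToyGeneration::GFX11Plus:
    return {128, 4};
  }
  return {64, 4}; // not reached for valid enumerators
}

// Default: align to the cache line. With the fetch-only option modeled by
// `fetchOnly`, use the fetch granularity instead.
inline std::uint32_t preferredFunctionAlign(ToyGeneration gen, bool fetchOnly) {
  ToyICacheGeometry geo = geometryFor(gen);
  return fetchOnly ? geo.fetchAlignBytes : geo.cacheLineBytes;
}

// "set" overwrites whatever alignment was already requested...
inline std::uint32_t setAlignment(std::uint32_t /*current*/,
                                  std::uint32_t requested) {
  return requested;
}

// ...while "ensure" only ever raises it, so a larger explicit IR alignment
// is kept.
inline std::uint32_t ensureAlignment(std::uint32_t current,
                                     std::uint32_t requested) {
  return std::max(current, requested);
}
```

With an explicit 256-byte IR alignment and a 4-byte request, the "set" model drops to 4 while the "ensure" model keeps 256, which is the behavior change the patch describes.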
Assisted-by: Claude Opus
[X86] LowerINTRINSIC_W_CHAIN - ensure the X86ISD::CMPCCXADD X86CondCode is an i8 target constant (#185856)
Fixes verification failure in X86SelectionDAGInfo::verifyTargetNode (#185649)
[Clang] Fix ICE in constraint normalization when substituting concept template parameters (#184406)
23341c3d139b889e8c46867f8d704ab3c22b51f8 introduced
`SubstituteConceptsInConstraintExpression` to substitute non-dependent
concept template arguments into a concept's constraint expression during
normalization, as part of the P2841R7 implementation
([temp.constr.normal]/1.4).
The `ConstraintExprTransformer` added in that commit overrides
`TransformTemplateArgument` to only transform concept-related arguments
and preserve all others. However, `TransformUnresolvedLookupExpr` called
`Sema::SubstExpr`, which creates a separate `TemplateInstantiator` that
performs full substitution, bypassing the selective override entirely.
This caused all template parameters in the constraint expression to be
substituted using the concept's MLTAL. For example, given:
```cpp
template <class A, template <typename...> concept C>
```
[22 lines not shown]
[LLVM][CodeGen][SVE] Refactor isel of 128-bit constant splats. (#185652)
Rather than lowering constant splats that only SVE supports to scalable
vectors, this patch keeps them as fixed-length vectors and adds isel
patterns to select the necessary SVE instructions.
Doing this means we can extend coverage to include SVE operations that
take an immediate operand without needing to convert more of the DAG to
scalable vectors, which can potentially prevent larger NEON patterns
from matching.