[SLP] Fix misvectorization in commutative to non-commutative conversion (#185230)
**Summary**
Fixes a miscompilation where commutative operations (e.g., or, and, mul)
with a left-hand side constant were incorrectly transformed into
non-commutative operations (e.g., shl, sub).
**The Problem**
In `BinOpSameOpcodeHelper::getOperand`, when a constant is at `Pos ==
0`, the helper was failing to swap operand order for new non-commutative
target opcodes. This resulted in inverted logic, such as transforming
`or 0, %x` into `shl 0, %x` (resulting in 0) instead of the correct `%x
<< 0`.
**The Fix**
The existing logic only protected the Sub opcode. This patch generalizes
the fix to all non-commutative instructions by using
`!Instruction::isCommutative(ToOpcode)`. This ensures that for any
directional operation, the variable is correctly placed on the LHS and
[9 lines not shown]
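The operand-ordering rule described in this commit can be illustrated with a small standalone model. This is only a sketch: the `Opcode` enum and the `evaluate` and `orderOperands` helpers are hypothetical names invented for illustration and are not the code in `BinOpSameOpcodeHelper::getOperand`. Keeping the constant first for commutative targets is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical miniature opcode set; this is not LLVM's Instruction enum.
enum class Opcode { Or, And, Mul, Shl, Sub };

// Commutative opcodes give the same result when their operands are swapped.
bool isCommutative(Opcode Op) {
  return Op == Opcode::Or || Op == Opcode::And || Op == Opcode::Mul;
}

// Evaluate "LHS <op> RHS" on 32-bit unsigned values.
std::uint32_t evaluate(Opcode Op, std::uint32_t LHS, std::uint32_t RHS) {
  switch (Op) {
  case Opcode::Or:  return LHS | RHS;
  case Opcode::And: return LHS & RHS;
  case Opcode::Mul: return LHS * RHS;
  case Opcode::Shl: return LHS << (RHS & 31u);
  case Opcode::Sub: return LHS - RHS;
  }
  return 0;
}

// Choose (LHS, RHS) for the converted instruction. ConstPos is the index of
// the constant in the original commutative instruction. For a non-commutative
// target the variable must end up on the LHS, wherever the constant was.
std::pair<std::uint32_t, std::uint32_t>
orderOperands(Opcode ToOpcode, unsigned ConstPos, std::uint32_t Variable,
              std::uint32_t Constant) {
  if (ConstPos == 0 && isCommutative(ToOpcode))
    return {Constant, Variable}; // Assumption: original order can be kept.
  return {Variable, Constant};
}
```

With `ConstPos == 0` and `ToOpcode == Opcode::Shl`, the helper returns the variable first, so `or 0, %x` maps to `%x << 0` rather than `0 << %x`.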
[lldb] Temporarily remove the assert to unblock the bots (#185505)
The assert is triggering from Dexter in the cross-project-test. This
temporarily removes the assert while I address the issue.
[RFC][mlir] Resource hierarchy for MLIR Side Effects. (#181229)
This patch allows creating a hierarchy of `SideEffects::Resource`s by adding
a virtual `getParent()` method, so that effects on *disjoint* resources
can be proven non-conflicting. It also adds a virtual `isAddressable()` method
that reports whether a resource can be addressed through a pointer
value. Non-addressable resources cannot be affected through any pointer.
This unblocks CSE, LICM and alias analysis without per-pass
special-casing.
RFC:
https://discourse.llvm.org/t/rfc-mlir-memory-region-hierarchy-for-mlir-side-effects/89811
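One way to picture the hierarchy is the sketch below. It is not the MLIR implementation: the `Resource` and `SimpleResource` classes and the `mayOverlap` and `pointerMayAffect` helpers are hypothetical stand-ins, and treating two resources as disjoint when neither is an ancestor of the other is an assumption made for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for SideEffects::Resource with the two virtual hooks
// the commit message describes.
class Resource {
public:
  virtual ~Resource() = default;
  virtual const Resource *getParent() const { return nullptr; }
  virtual bool isAddressable() const { return true; }
};

// Simple concrete resource with a fixed parent and addressability flag.
class SimpleResource : public Resource {
  const Resource *Parent;
  bool Addressable;

public:
  SimpleResource(const Resource *Parent, bool Addressable)
      : Parent(Parent), Addressable(Addressable) {}
  const Resource *getParent() const override { return Parent; }
  bool isAddressable() const override { return Addressable; }
};

// True if Ancestor is R itself or appears on R's parent chain.
bool isAncestorOrSelf(const Resource *Ancestor, const Resource *R) {
  for (; R != nullptr; R = R->getParent())
    if (R == Ancestor)
      return true;
  return false;
}

// Assumption: effects can conflict only if one resource contains the other.
bool mayOverlap(const Resource *A, const Resource *B) {
  return isAncestorOrSelf(A, B) || isAncestorOrSelf(B, A);
}

// A pointer-based access can only touch addressable resources.
bool pointerMayAffect(const Resource *R) { return R->isAddressable(); }
```

Under these assumptions, two siblings below a common root never overlap, while the root overlaps with every descendant.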
[lldb] Assert & fix missing calls to UnregisterPlugin (#185162)
Fix missing calls to UnregisterPlugin and add an assert in the
PluginManager that ensures all plugins have been unregistered by the
time the plugin manager is destroyed.
[WebAssembly] Clang support for acquire-release atomics (#184901)
Add the feature to the clang target, including driver flags and
preprocessor defines.
Centralize prefetch target storage in MachineFunction. (#184194)
### Prefetch Symbol Resolution
Based on this
[suggestion](https://discourse.llvm.org/t/rfc-code-prefetch-insertion/88668/29?u=rlavaee),
we must determine whether a prefetch target is defined in the current module
to avoid **undefined symbol errors**. Since this occurs during
sequential **CodeGen**, we must rely on function names rather than IR
Module APIs.
**Key Changes:**
* **`MachineFunction` Integration:** Added a `PrefetchTargets` field
(with serialization) to track all targets associated with a function.
* **Guaranteed Emission:** All prefetch targets are now emitted
regardless of basic block or callsite index matches to ensure the symbol
exists.
* **Fallback Placement:** Targets with non-matching callsite indices are
emitted at the end of the block to resolve the reference.
[AMDGPU] Cgscc amdgpu attributor boilerplate NFC (#179719)
This PR adds boilerplate for a CGSCC AMDGPUAttributor pass
(amdgpu-attributor-cgscc) by refactoring the existing Module
AMDGPUAttributor pass (amdgpu-attributor).
The CGSCC AMDGPUAttributor pass sets `AttributorConfig.IsModulePass =
false` and makes the Attributor's `Functions` set contain only the functions in
an SCC.
The main implementations of the abstract attributes have not changed - NFC.
In future work, some of the AMDGPU abstract attributes
might move to the CGSCC pass.
---------
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
[flang][cuda] Relax host intrinsic semantic check in acc routine (#185483)
The semantic check that tests whether any actual argument is on the device
does not need to be active inside an acc routine function or subroutine.
[CIR] Change CmpOp assembly format to use bare keyword style (#185114)
Update the assembly format of cir.cmp from the parenthesized style
`cir.cmp(gt, %a, %b) : !s32i, !cir.bool`
to the bare keyword style used by other CIR ops like cir.cast:
`cir.cmp gt %a, %b : !s32i`
The result type (!cir.bool) is now automatically inferred as it is
always cir::BoolType.
[AArch64] Add partial reduce patterns for new fdot instructions (#184659)
This patch enables generation of the new dot instructions added under
FEAT_F16F32DOT from partial reduce nodes.
[HLSL] Implement Texture2D default template (#184207)
The Texture2D type has a default template of float4. This can be written
in a couple of ways: `Texture2D<>` or `Texture2D`. This must be implemented
for consistency with DXC in HLSL202x.
To implement `Texture2D<>` we simply add a default type for the template
parameter.
To implement `Texture2D`, we have to add a special case for a template
type without a template instantiation. For HLSL, we check if it is a
texture type. If so, the default type is filled in.
Note that HLSL202x does not support C++17 Class Template Argument
Deduction, so we cannot use that feature to give us `Texture2D`.
See https://github.com/llvm/wg-hlsl/pull/386 for alternatives that were
considered.
[13 lines not shown]
[clang][ssaf] Implement Entity Linker CLI and patching for JSON Format
This PR implements Entity ID patching for the JSON serialization format
and introduces `ssaf-linker`, a command-line tool that drives the
`EntityLinker`.
1. Entity ID references inside summary blobs use the sentinel
representation `{"@": <uint64>}`. Patching walks the JSON value tree
recursively, recognizes sentinels, and rewrites their indices using the
`EntityResolutionTable` provided by the linker.
2. Patching reports an error for an object that has an `@` key plus extra
keys `(size != 1)`, for an `@` value that is not a valid `uint64`, and for
an entity ID that is not present in the resolution table.
3. `JSONFormat::EntityIdConverter` is replaced with two `function_ref`
typedefs to eliminate the wrapper class.
4. `ssaf-linker` is implemented in `clang/tools/ssaf-linker/` and gets
built at `bin/ssaf-linker`.
5. lit tests check CLI, verbose output, timing output, validation
errors, I/O errors, linking errors, and successful linking.
rdar://162570931
[mlir][OpenACC] Normalize loop bounds in convertACCLoopToSCFFor for negative steps (#184935)
`convertACCLoopToSCFFor` was passing `acc.loop` bounds directly to
`scf.for`, which produces an `scf.for` with a negative step when the
source is a Fortran DO loop counting down (e.g. `DO k = n, 1, -1`).
Since `scf.for` requires a positive step, this generated invalid IR that
caused downstream crashes during LLVM lowering.
`convertACCLoopToSCFParallel` already normalizes all loops
unconditionally to `lb=0, step=1, ub=tripCount`, but
`convertACCLoopToSCFFor` did not. This patch applies the same
normalization to `convertACCLoopToSCFFor`, with IV denormalization in
the loop body (`original_iv = normalized_iv * orig_step + orig_lb`), and
lets later passes fold away constants.
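The normalization arithmetic can be sketched in plain C++ as follows. This is an illustration only, not the code in `convertACCLoopToSCFFor`: the `tripCount` and `runNormalized` helpers are hypothetical, the upper bound is assumed to be inclusive as in a Fortran DO loop, and the step is assumed to be nonzero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Trip count of a loop from Lb to Ub (inclusive) with a nonzero Step.
// Integer division truncates toward zero, so a loop that cannot run yields
// a value <= 0, which is clamped to 0.
std::int64_t tripCount(std::int64_t Lb, std::int64_t Ub, std::int64_t Step) {
  return std::max<std::int64_t>(0, (Ub - Lb + Step) / Step);
}

// Run the normalized loop (lb = 0, step = 1, ub = tripCount) and recover the
// original induction variable in the body, as the commit message describes.
std::vector<std::int64_t> runNormalized(std::int64_t Lb, std::int64_t Ub,
                                        std::int64_t Step) {
  std::vector<std::int64_t> Visited;
  const std::int64_t Count = tripCount(Lb, Ub, Step);
  for (std::int64_t NormalizedIv = 0; NormalizedIv < Count; ++NormalizedIv) {
    // original_iv = normalized_iv * orig_step + orig_lb
    Visited.push_back(NormalizedIv * Step + Lb);
  }
  return Visited;
}
```

For a countdown such as `DO k = 5, 1, -1`, the normalized loop always has a positive unit step, and the denormalized values come out as 5, 4, 3, 2, 1.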
[libc] Use unsigned char in strcmp, strncmp, and strcoll comparisons (#185393)
According to section 7.24.1 of the C standard, character comparison in
string functions must be performed as if the characters had the type
`unsigned char`.
The previous implementations of `strcmp`, `strncmp`, and `strcoll` were
doing a direct subtraction of `char` values. On platforms where `char`
is signed, this resulted in incorrect negative values being returned
when characters exceeding 127 were being compared.
This patch fixes the comparison functions to explicitly cast the
character values to `unsigned char` prior to computing their difference.
It also adds regression tests to ensure the comparison behaves correctly
for character values greater than 127.
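The difference between the two approaches can be sketched as below. This is not the LLVM libc source: `compareStrings` and `compareAsSignedChar` are hypothetical names, and the second function forces `signed char` explicitly so that the faulty behavior is reproducible regardless of the platform's default `char` signedness.

```cpp
#include <cassert>

// Compare two NUL-terminated strings, interpreting each character as
// unsigned char before taking the difference.
int compareStrings(const char *Left, const char *Right) {
  for (; *Left != '\0' && *Left == *Right; ++Left, ++Right) {
  }
  return static_cast<int>(static_cast<unsigned char>(*Left)) -
         static_cast<int>(static_cast<unsigned char>(*Right));
}

// Same loop, but the difference is taken on signed char values. This mimics
// the old behavior on platforms where plain char is signed.
int compareAsSignedChar(const char *Left, const char *Right) {
  for (; *Left != '\0' && *Left == *Right; ++Left, ++Right) {
  }
  return static_cast<int>(static_cast<signed char>(*Left)) -
         static_cast<int>(static_cast<signed char>(*Right));
}
```

For the byte `0x80` compared against `'a'`, the unsigned version yields a positive result (128 is greater than 97), whereas the signed version yields a negative one because `0x80` is read as -128.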
AMDGPU: Annotate group size ABI loads with range metadata (#185420)
We previously did the same for the grid size when annotated.
The group size is easier, so it's weird that this wasn't implemented
first.
[HLSL][Matrix] Make HLSLElementwiseCast respect matrix memory layout (#184429)
Fixes #184379
Changes the implementation of HLSLElementwiseCast to respect matrix
memory layout.
The new implementation reads from the `LoadList` array in row-major
order as opposed to column-major in the old implementation, which makes
more sense because `LoadList` is always interpreted in row-major order
when read as a matrix.
The writes to the allocation `V` for the destination matrix now respect
the default matrix memory layout.
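The indexing involved can be sketched as follows. This is a simplified model and not the Clang codegen: `MatrixLayout`, `flatIndex`, and `castElements` are hypothetical names, and a float-to-int conversion stands in for whatever elementwise cast is being performed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tag for the destination allocation.
enum class MatrixLayout { RowMajor, ColumnMajor };

// Flat index of element (Row, Col) in a Rows x Cols matrix for a layout.
std::size_t flatIndex(std::size_t Row, std::size_t Col, std::size_t Rows,
                      std::size_t Cols, MatrixLayout Layout) {
  return Layout == MatrixLayout::RowMajor ? Row * Cols + Col
                                          : Col * Rows + Row;
}

// Read LoadList in row-major order and write each converted element to the
// position that the destination layout assigns to (Row, Col).
std::vector<int> castElements(const std::vector<float> &LoadList,
                              std::size_t Rows, std::size_t Cols,
                              MatrixLayout DestLayout) {
  std::vector<int> V(Rows * Cols);
  for (std::size_t Row = 0; Row < Rows; ++Row) {
    for (std::size_t Col = 0; Col < Cols; ++Col) {
      const float Element = LoadList[Row * Cols + Col];
      V[flatIndex(Row, Col, Rows, Cols, DestLayout)] =
          static_cast<int>(Element);
    }
  }
  return V;
}
```

For a 2x3 source `{1, 2, 3, 4, 5, 6}`, a column-major destination ends up as `{1, 4, 2, 5, 3, 6}`, while a row-major destination keeps the original order.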
Assisted-by: claude-opus-4.6
[CIR] Fix try_call replacement for indirect calls (#185095)
We had a bug in the FlattenCFG pass where if an indirect call occurred
within a cleanup scope that required exception handling, the indirect
callee was not being preserved in the cir.try_call. This fixes that.