[SLP] Normalize copyable operand order to group loads for better vectorization
When building operands for entries with copyable elements, non-copyable
lanes may have inconsistent operand order (e.g., some lanes have
load,add while others have add,load for commutative ops). This prevents
VLOperands::reorder() from grouping consecutive loads on one side,
degrading downstream vectorization.
Normalize in two steps during buildOperands:
1) Majority voting: swap lanes that are the exact inverse of the
majority operand-type pattern.
2) Load preference: if the majority pattern has loads at OpIdx 1
(strict majority), swap to put loads at OpIdx 0, enabling
vector load + copyable patterns.
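As a rough illustration of the two steps above (a toy model, not the SLP vectorizer code), each lane can be represented as a pair of operand kinds. The function name, the string kinds, and the exact tie-breaking are assumptions made for this sketch; it also assumes every lane is a commutative operation.

```python
from collections import Counter

def normalize_operand_order(lanes):
    """Toy model: lanes is a list of (op0_kind, op1_kind) pairs, one per lane.

    Assumes every lane is a commutative op, so swapping operands is legal.
    """
    lanes = list(lanes)

    # Step 1 (majority voting): find the most common operand-kind pattern
    # and swap lanes that are its exact inverse.
    majority, _ = Counter(lanes).most_common(1)[0]
    inverse = (majority[1], majority[0])
    if inverse != majority:
        lanes = [majority if lane == inverse else lane for lane in lanes]

    # Step 2 (load preference): if a strict majority of lanes has a load
    # only at operand index 1, swap those lanes so loads sit at index 0.
    loads_on_rhs = sum(1 for a, b in lanes if b == "load" and a != "load")
    if 2 * loads_on_rhs > len(lanes):
        lanes = [(b, a) if b == "load" and a != "load" else (a, b)
                 for a, b in lanes]
    return lanes
```

With three `("add", "load")` lanes and one `("load", "add")` lane, step 1 makes all four lanes `("add", "load")`, and step 2 then flips them all so the loads end up on operand 0.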
Recommit after the revert in c2139f13606f0be8d09fa82a28e85dba4c3478dd, with
an added check that operations are commutative before reordering.
Original Pull Request: https://github.com/llvm/llvm-project/pull/189181
[2 lines not shown]
[RISCV] Further improved exact VLEN lowering for mul reductions (#192688)
This is a follow up to 973a05ed. When we have a horizontal multiply
reduction at high LMUL and we have exact knowledge of VLEN, we can
extract the individual m1 sub-vectors and perform the entire reduction
tree at m1. This reduces the work performed (by not performing high LMUL
operations on vectors with empty tails), and decreases register
pressure. Interestingly, we don't even increase dynamic instruction
count as the register alignment of the original LMUL forced the use of
whole register moves in the tree reduction anyway. (In the non-exact
case, these are vslidedown instructions, and are required.)
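The shape of the reduction can be modeled in a few lines. This is a sketch of the idea only, not the RISC-V lowering: `elems_per_m1` stands in for VLEN / SEW, the list slicing stands in for extracting m1 sub-vectors, and the function name is made up for illustration.

```python
from functools import reduce
from operator import mul

def mul_reduce_at_m1(elems, elems_per_m1):
    """Toy model of a multiply reduction done entirely at m1 width."""
    assert len(elems) % elems_per_m1 == 0

    # Split the high-LMUL vector into its individual m1 sub-vectors.
    regs = [elems[i:i + elems_per_m1]
            for i in range(0, len(elems), elems_per_m1)]

    # Tree step: multiply pairs of m1 registers elementwise until one is left.
    while len(regs) > 1:
        nxt = [[x * y for x, y in zip(regs[i], regs[i + 1])]
               for i in range(0, len(regs) - 1, 2)]
        if len(regs) % 2:
            nxt.append(regs[-1])  # odd register out carries to the next level
        regs = nxt

    # Final step: reduce the elements of the single remaining m1 register.
    return reduce(mul, regs[0], 1)
```

For example, `mul_reduce_at_m1(list(range(1, 9)), 2)` models four m1 registers of two elements each and returns 40320, the product of 1 through 8.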
Originally written by Claude Code, heavily revised by me.
[SPIR-V] Deduce argument types before doing GEP (#193046)
In SPIRVEmitIntrinsics, we try to get the type for a GEP by looking at
the type of the input pointer and deducing the output pointer type from
it. If the input pointer is a function parameter, the type is not
available yet because we deduce the parameter types after processing
the GEPs.
There is no reason for that order. Move the parameter passes earlier so
that the GEP type deduction works.
The same test exposed a problem with function parameter attributes. They
require Kernel, so we no longer emit them when creating a shader.
[SPIR-V] Handle constant expression uses of PushConstant globals (#193005)
When lowering globals, users that are constant expressions cannot be
rewritten by the push constant access path, because replacing them with
the result of an intrinsic call would no longer yield a constant. Uses
of the GV that are constant expressions must be replaced with
instructions before trying to rewrite them.
[LIT] Add -nostdinc so system headers aren't searched with implicit module maps (#192125)
These lit tests implicitly load the module maps found in the
directories on the search path. On z/OS this ends up trying to load an
invalid module map that is never used by the actual clang compilation.
Add `-nostdinc` to avoid searching directories outside of the ones
being tested.
[mlir][arith] Add support for `arith.flush_denormals` emulation (#192660)
Add a lowering pattern and a new pass `arith-expand-flush-denormals` that
rewrites `arith.flush_denormals` ops with integer arithmetic. This
lowering is useful for target architectures that cannot pattern-match
`arith.flush_denormals` + other FP arithmetic into special instructions
with FTZ semantics.
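To make "flushing with integer arithmetic" concrete, here is a minimal f32 sketch that works on the IEEE 754 bit pattern. It is not the pass's actual lowering; the function name is hypothetical, and preserving the sign of the flushed zero is an assumption rather than something this commit message states.

```python
import struct

def flush_denormal_f32(x: float) -> float:
    """Toy model: flush an f32 denormal to zero using only integer bit tests."""
    # Reinterpret the value as its 32-bit IEEE 754 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = bits & 0x7F800000
    mantissa = bits & 0x007FFFFF

    # A denormal has an all-zero exponent field and a nonzero mantissa.
    if exponent == 0 and mantissa != 0:
        bits &= 0x80000000  # assumption: keep only the sign bit (signed zero)

    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Normal values, zeros, infinities, and NaNs have either a nonzero exponent field or a zero mantissa, so they pass through unchanged.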
Assisted-by: claude-opus-4.7-thinking-high
Depends on #192641.
[X86][clang-cl] Make AVX10.2 map to the same target-cpu as AVX10.1 (#193147)
Diamondrapids includes a large feature set, APX, which should not be
enabled by AVX10.2.
[DAG] Reassociate (add (add X, Y), X) --> add(add(X, X), Y) (#162242)
Attempt to bring self-additions together, to help with folding to shift/mul/address patterns.
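The rewrite can be illustrated on a tiny expression tree. This is a toy model, not DAGCombiner code: expressions are `("add", lhs, rhs)` tuples, the function name is made up, and matching X on either side of the inner add is an assumption of the sketch.

```python
def reassociate_self_add(expr):
    """Toy model: rewrite (add (add X, Y), X) into (add (add X, X), Y)."""
    if isinstance(expr, tuple) and expr[0] == "add":
        _, lhs, rhs = expr
        if isinstance(lhs, tuple) and lhs[0] == "add":
            _, a, b = lhs
            if a == rhs:  # (add (add X, Y), X)
                return ("add", ("add", rhs, rhs), b)
            if b == rhs:  # (add (add Y, X), X), assumed commuted form
                return ("add", ("add", rhs, rhs), a)
    return expr  # no self-addition to bring together
```

After the rewrite, the inner `("add", "x", "x")` is the kind of node that can later fold to a shift or a multiply by 2.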
[runtimes] Protect use of undefined CMAKE_Fortran_COMPILER (#193210)
Unlike everything else in CMake, cmake_path does not assume a default
value for undefined variables, but instead throws an error:
```
CMake Error at cmake/config-Fortran.cmake:77 (cmake_path):
cmake_path undefined variable for input path.
Call Stack (most recent call first):
CMakeLists.txt:284 (include)
```
Protect the use of cmake_path to not trigger this error when
CMAKE_Fortran_COMPILER is undefined.
Fixes the flang-aarch64-out-of-tree buildbot after #171610.
[Polly] Disable PCH reuse for unit tests (#193209)
Polly library targets already disable PCH reuse because Polly
unconditionally builds with -fno-rtti and -fno-exceptions. Reusing LLVM
PCHs that were built with RTTI or exceptions enabled is incompatible
with Clang when compiling Polly targets under those flags.
After 47eb8b43c990 enabled PCH reuse for unit tests, Polly unit tests
can hit the same mismatch as the library targets. Pass DISABLE_PCH_REUSE
through the shared add_polly_unittest wrapper so all Polly unit tests
follow the existing Polly target policy.
cc @aengelke -- a minor fix for polly.
[CIR][NFCI] Remove 'isConstant' from getCIRLinkageForX (#193100)
This variable has since disappeared from the classic compiler, and we
weren't using it anywhere anyway. This patch gets us back in sync with
the classic codegen for these interfaces.
[AMDGPU] Multi dword spilling for unaligned tuples (#183701)
While spilling unaligned tuples, rather than breaking the
spill into 32-bit accesses, spill the first register as a single
32-bit spill, and spill the remainder of the tuple as an aligned tuple.
Some additional bookkeeping is required in the spilling
loop to manage the state.
References: https://github.com/llvm/llvm-project/pull/177317
[llvm-cov] Fix error propagation in CoverageMapping::load() (#193197)
Fix a subtle issue on the error path: if loadFromFile() fails there is no error to consume.
[InstCombine] fold fabs(uitofp(i16 a) - uitofp(i16 b)) < 1.0 to a == b (#191378)
Fixes: https://github.com/llvm/llvm-project/issues/187088
When a and b have a bitwidth (16 bits) smaller than the float32
significand precision (24 bits), the conversions are exact, and the
absolute difference is an integer of at least 1 if a != b. Conversely,
if the difference is < 1.0, then a == b.
This patch exploits this fact to fold the expression to just a single
icmp.
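The equivalence can be spot-checked outside the compiler. The sketch below models float32 by round-tripping through `struct`; the helper names are hypothetical, and it only samples 16-bit inputs rather than proving the fold.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def original(a: int, b: int) -> bool:
    """fabs(uitofp(a) - uitofp(b)) < 1.0, evaluated in float32."""
    diff = to_f32(to_f32(float(a)) - to_f32(float(b)))
    return abs(diff) < 1.0

def folded(a: int, b: int) -> bool:
    """The single integer comparison the expression folds to."""
    return a == b

def agree_on(values) -> bool:
    """Check that both forms agree on every pair drawn from values."""
    return all(original(a, b) == folded(a, b) for a in values for b in values)
```

Calling `agree_on` on edge values such as 0, 1, 32767, 32768, 65534, and 65535 exercises the cases where the difference is exactly 0 or exactly 1.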
Revert "[clang-tidy][NFC] add numeric include for transform_reduce" (#193200)
After experimenting, this did not fix the build failure, so revert it
to keep trunk clean.
Reverts llvm/llvm-project#193165
[mlir][arith] Add `arith.flush_denormals` operation (#192641)
Add a new `arith.flush_denormals` operation. The operation takes a
floating-point value as input and returns zero if the value is denormal.
If the input is not denormal, the operation passes through the input.
This commit also adds support to the `ArithToAPFloat` infrastructure.
Running example:
```mlir
%flush_a = arith.flush_denormals %a : f32
%flush_b = arith.flush_denormals %b : f32
%res = arith.addf %flush_a, %flush_b : f32
%flush_res = arith.flush_denormals %res : f32
```
The exact lowering path depends on the backend and is not implemented as
part of this PR:
- Per-instruction mode. E.g., on NVIDIA architectures, the above example
can lower to `add.ftz.f32 dest, a, b`.
[11 lines not shown]