[mlir][arith] Add `arith.flush_denormals` operation (#192641)
Add a new `arith.flush_denormals` operation. The operation takes a
floating-point value as input and returns zero if the value is denormal;
otherwise, the input is passed through unchanged.
This commit also adds support for the new operation to the
`ArithToAPFloat` infrastructure.
Running example:
```mlir
%flush_a = arith.flush_denormals %a : f32
%flush_b = arith.flush_denormals %b : f32
%res = arith.addf %flush_a, %flush_b : f32
%flush_res = arith.flush_denormals %res : f32
```
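The flush-to-zero semantics above can be sketched in plain Python. This is a minimal model, not the actual lowering: it assumes f32 (smallest positive normal value 2^-126) and sign-preserving flushing.

```python
# Smallest positive normal f32 value.
F32_MIN_NORMAL = 2.0 ** -126

def flush_denormals_f32(x: float) -> float:
    """Sketch of `arith.flush_denormals` for f32: any denormal
    (0 < |x| < smallest normal) flushes to zero, preserving the
    sign; every other value passes through unchanged."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return -0.0 if x < 0.0 else 0.0
    return x
```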
The exact lowering path depends on the backend and is not implemented as
part of this PR:
- Per-instruction mode. E.g., on NVIDIA architectures, the above example
can lower to `add.ftz.f32 dest, a, b`.
[11 lines not shown]
[mlir] Add option to run CSE between greedy rewriter iterations (#193081)
The greedy pattern rewrite driver previously only deduplicated constant
ops between iterations (via the operation folder). Structurally
identical non-constant subexpressions remained distinct SSA values,
blocking fold patterns that only fire when operands match. Reaching the
true fixpoint required chaining an external `cse,canonicalize,...`
pipeline.
Add an opt-in `cseBetweenIterations` flag on `GreedyRewriteConfig` that
runs full CSE on the scoped region after each pattern-application
iteration, and surface it as a `cse-between-iterations` option on the
canonicalizer pass. Off by default to preserve existing performance
characteristics.
Assisted-by: Claude Code
[AMDGPU] Prefer mul24 over mad24 on SDWA targets (#193033)
If either of a mul24's operands can potentially fold into an SDWA
pattern, then don't fold into a mad24 node (which doesn't have SDWA
variants).
Fixes regressions I first noticed in #162242, though it turns out this
is an older problem.
[DAG] Add Srl combine for extracting last element of BUILD_VECTOR (#181412)
While working on another combine, I noticed some redundant zext shift
pairs `v_lshrrev_b32 + v_lshlrev_b32` coming from a `build_vector(undef,
x)` created by `TargetLowering::SimplifyDemandedBits` and a `srl`
created by `lowerEXTRACT_VECTOR_ELT`.
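The target form can be modeled in Python, assuming a v2i16 vector packed into an i32 (hypothetical helpers for illustration): extracting the last element of `build_vector(undef, x)` is a single `srl`, which round-trips `x` and shows the shl/srl pair is redundant when only the last element is demanded.

```python
def build_vector_v2i16(lo: int, hi: int) -> int:
    """Pack two i16 elements into an i32 word (element 1 in
    the high half, like a shl by 16)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def extract_elem_v2i16(word: int, idx: int) -> int:
    """Extract element `idx` with a single logical shift right
    and truncate, the combine's target form."""
    return (word >> (idx * 16)) & 0xFFFF
```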
[SystemZ] Fix wrong mask for float vec_insert (#192967)
This commit fixes an error in vec_insert, where the index masking
effectively made the last two float elements of a vector non-insertable.
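The failure mode can be illustrated with hypothetical mask values (not the actual SystemZ encoding): for a 4 x f32 vector the index mask must cover all four elements; a mask one bit too narrow aliases indices 2 and 3 onto 0 and 1, so the last two elements can never be written.

```python
def vec_insert_f32(vec, idx, val, mask):
    """Illustration only: vec_insert clamps the element index
    with a bitmask before writing the lane."""
    out = list(vec)
    out[idx & mask] = val
    return out
```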
Co-authored-by: @Andreas-Krebbel
[runtimes][CMake] Move Fortran support code from flang-rt (#171610)
Add common CMake code to be used by flang-rt and openmp to emit Flang
module files. Most of the code is not yet used within this PR.
Extracted out of #171515 for review by @petrhosek.
[AArch64][llvm] Remove support for FEAT_MPAMv2_VID
The `FEAT_MPAMv2_VID` instructions and system registers introduced in
change d30f18d2c are being removed, as they have been removed from the
latest Arm ARM. This doesn't preclude them returning in some form in
the future.
Other system registers introduced with `FEAT_MPAMv2` are unaffected and
continue to be ungated, but since the `+mpamv2` gating is now empty,
I'm removing this superfluous gating code.
[Attributor] Clarify volatile null pointer behavior (NFCI) (#193190)
The comment was referring to volatile stores in particular, which
are specified as non-willreturn. However, allowing volatile accesses
on null (independently of null_pointer_is_valid) is a general
provision that is independent of the access kind.
The actual behavior was still correct, because volatile loads are
considered as writing inaccessible memory, so the mayWriteToMemory()
check was ultimately redundant.
Add a test to make sure volatile load is handled correctly.
[CIR] Make array decay and get_element op preserve address spaces (#192361)
This patch makes sure that the maybeBuildArrayDecay function takes
address spaces into account and makes the get_element op preserve the
address space of the base pointer.
Assisted-by: Cursor / claude-4.6-opus-high
[AMDGPU] Unmark wave reduce intrinsics for constant folding (#193142)
The `add`, `sub`, and `xor` wave reduction intrinsics cannot be
constant folded: `add` and `sub` need to be multiplied by the number of
active lanes, and `xor` depends on the parity of the number of active
lanes.
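Why folding is impossible even with a known uniform input can be sketched as follows (Python, uniform-value case; `sub` behaves like `add` up to sign): the result depends on the runtime count of active lanes.

```python
def wave_reduce(op: str, value: int, active_lanes: int) -> int:
    """Uniform-input wave reduction: every active lane contributes
    the same `value`, so the result varies with the lane count."""
    if op == "add":
        return value * active_lanes
    if op == "xor":
        # Pairs of identical values cancel; only parity matters.
        return value if active_lanes % 2 else 0
    raise ValueError(f"unsupported op: {op}")
```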
AMDGPU/GlobalISel: RegbankLegalize rules for merge-like opcodes (#193026)
Move RegbankLegalize handling for G_BUILD_VECTOR, G_MERGE_VALUES and
G_CONCAT_VECTORS from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules
by implementing rules for all supported types.
[libc++] Fix any.cpp not compiling with the minimum header version >= 7 (#193183)
The namespace was accidentally closed outside the header version check
while it was opened inside the check. This moves the closing code into
the check.
[LICM] Remove unnecessary check during store hoisting (#187529)
When hoisting stores, we check for interfering uses. This is done
by getting the clobbering def for the use and checking whether it
is outside the loop, which implies that no store in the loop can
interfere with it.
However, in addition to that, we check that the memory use does
not occur before the store. I believe that this additional check is
unnecessary: if the use could be affected by the store, the clobber
walk would have pointed to the memory phi rather than outside the
loop.
I think this check was added because MemorySSA had trouble with
loop-carried dependencies in the past (like in #54682), but this
should no longer be a problem.
This allows store hoisting in cases where there are unrelated
loads before the store.
[llvm] Errorize DebuginfodFetcher for inspection at call-sites (#191191)
Failure to fetch from debuginfod is rarely an error, but there are
cases where we want to distinguish error reasons down the line, for
example in order to test connection timeouts.