[CIR] Make the -save-temps flag emit .cir and .mlir files (#186814)
This patch makes ClangIR emit .cir and .mlir files when the -save-temps
flag is specified. Having these files emitted is useful, e.g., when
inspecting the generated code for OpenMP offloading.
Co-authored-by: Claude Opus 4.6 noreply at anthropic.com
[Flang] - Fix AliasAnalysis to preserve Allocate source kind through box loads (#187152)
When a boxed array is privatized via `omp.private`, the `SourceKind` of
the loaded box data was being misclassified as `SourceKind::Indirect` by
the alias analyzer. Instead, its `SourceKind::Allocate` should be
preserved. This caused AliasAnalysis to conservatively return `MayAlias`
for accesses to privatized arrays vs. dummy arguments, which in turn
prevented InlineHLFIRAssign from inlining array section assignments.
Propagate the Allocate source kind when the box source is classified as
`Allocate`, so that alias analysis correctly returns `NoAlias`.
Uses/cabal.mk: cd into WRKDIR before calling 'cabal update' during cabal-extract
This prevents cabal from picking up the Makefile.cabal file
Reported by: alven
[AArch64][PAC] Reset `killed` operand flag in fixupPtrauthDiscriminator
Conservatively reset the `killed` flag on the AddrDisc operand when it is
updated by the fixupPtrauthDiscriminator function.
Fix select-best-vf-tripcount.ll buildbot failure
This test failed on the llvm-clang-win-x-aarch64 buildbot.
It seems the rounding is different, leading to a different output.
Instead of:
Cost for VF 4: 9 (Estimated cost per lane: 2.2)
the Windows buildbot fails because the test output is:
Cost for VF 4: 9 (Estimated cost per lane: 2.3)
[CFG] Support CycleInfo in isPotentiallyReachable() (#187681)
Essentially do the same thing as for LoopInfo. Anything inside a cycle
is mutually reachable, and the cycle can be replaced by its exit blocks
in the walk.
An interesting additional thing we could do for CycleInfo (but not
LoopInfo) is to early exit the walk if the stop block is not in a cycle
and dominates the start block. I've not included this in this patch to
keep the implementation the same as for LoopInfo to start with.
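The walk described above can be sketched on a toy graph. This is a minimal
model, not the LLVM implementation: `ToyCFG`, `CycleOf`, `CycleExits` and
`isPotentiallyReachableToy` are hypothetical names, blocks are plain
integers, and the cycle data is supplied by hand instead of being computed
by CycleInfo.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy CFG: blocks are integers; Succs[B] lists the successors of block B.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  // Hypothetical stand-in for CycleInfo: id of the outermost cycle that
  // contains each block, or -1 if the block is not in any cycle.
  std::vector<int> CycleOf;
  // Exit blocks of each cycle: blocks outside the cycle that have a
  // predecessor inside it.
  std::map<int, std::vector<int>> CycleExits;
};

// Returns true if To may be reachable from From (From == To counts as
// reachable in this toy model).
bool isPotentiallyReachableToy(const ToyCFG &G, int From, int To) {
  assert(From >= 0 && To >= 0 && "blocks are non-negative indices");
  std::vector<int> Worklist{From};
  std::set<int> Visited;
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(BB).second)
      continue; // Already processed.
    if (BB == To)
      return true;
    int C = G.CycleOf[BB];
    if (C != -1) {
      // Everything inside a cycle is mutually reachable.
      if (G.CycleOf[To] == C)
        return true;
      // Otherwise skip the cycle body and continue from its exit blocks.
      auto It = G.CycleExits.find(C);
      if (It != G.CycleExits.end())
        for (int Exit : It->second)
          Worklist.push_back(Exit);
      continue;
    }
    for (int S : G.Succs[BB])
      Worklist.push_back(S);
  }
  return false;
}
```

For a graph 0 -> 1 -> 2 -> {1, 3}, 3 -> 4 with the cycle {1, 2} and exit
block 3, a query from block 0 to block 4 visits 0, 1, 3 and 4 and never
walks the cycle body.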
[AMDGPU] Shrink S_MOV_B64 to S_MOV_B32 during rematerialization (#184333)
When rematerializing S_MOV_B64 or S_MOV_B64_IMM_PSEUDO and only a single
32-bit lane of the result is used at the remat point, emit S_MOV_B32
with the appropriate half of the 64-bit immediate instead.
This reduces register pressure by defining a 32-bit register instead of
a 64-bit pair when the other half is unused.
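The immediate selection can be illustrated with a small standalone helper.
This is a sketch under assumptions, not code from the patch:
`immForShrunkMov` is a hypothetical name, and `SubIdx` is assumed to be 0
for the low 32-bit half and 1 for the high half.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given the 64-bit immediate of the original move and
// the index of the single 32-bit half that is used (0 = low, 1 = high),
// return the immediate for the shrunk 32-bit move.
uint32_t immForShrunkMov(uint64_t Imm64, unsigned SubIdx) {
  assert(SubIdx < 2 && "a 64-bit value has exactly two 32-bit halves");
  return SubIdx == 0 ? static_cast<uint32_t>(Imm64)
                     : static_cast<uint32_t>(Imm64 >> 32);
}
```

For example, with the immediate 0x1234567890ABCDEF the low half yields
0x90ABCDEF and the high half yields 0x12345678.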
Fix a brain-f*rt in the special mac68k "nofault" bus error handling.
The information we need to pass along is packaged up for us neatly
in the stack frame and arguments being passed to trap(), so use those
to extract the %a2 value and faulting address.
Issue raised and fix tested by nat@
[X86] Use GFNI for vXi8 per-element shifts (#89644)
As detailed here:
https://github.com/InstLatx64/InstLatX64_Demo/blob/master/GFNI_Demo.h
These are a bit more complicated than gf2p8affine lookups, requiring us
to convert a SHL shift value / amount into a GF(2^8) element so we can
perform a multiplication. SRL/SRA need to be converted to SHL via
bitreverse/variable-sign-extension.
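The SHL case can be modelled per byte in scalar code. This is a sketch of
the idea only, not the lowering in the patch: `gf2p8mul` is a software
model of one byte lane of the GF(2^8) multiply (reduction polynomial
x^8 + x^4 + x^3 + x + 1), and `shlViaGF` is a hypothetical helper. The value
is masked first so the product never needs reduction, and the shift amount
n becomes the multiplier 1 << n.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a single-byte GF(2^8) multiply with reduction
// polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
uint8_t gf2p8mul(uint8_t A, uint8_t B) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I) {
    if (B & 1)
      R ^= A;                  // Add A * x^I when bit I of B is set.
    bool Carry = (A & 0x80) != 0;
    A = static_cast<uint8_t>(A << 1);
    if (Carry)
      A ^= 0x1B;               // Reduce modulo the polynomial.
    B >>= 1;
  }
  return R;
}

// Hypothetical helper: 8-bit left shift expressed as a GF multiply.
// Masking off the bits that would be shifted out guarantees that no
// reduction happens before the product is accumulated.
uint8_t shlViaGF(uint8_t X, unsigned N) {
  assert(N < 8 && "shift amount must be in [0, 7]");
  uint8_t Masked = X & static_cast<uint8_t>(0xFFu >> N);
  return gf2p8mul(Masked, static_cast<uint8_t>(1u << N));
}
```

Under these assumptions, `shlViaGF(X, N)` matches `static_cast<uint8_t>(X << N)`
for every byte value and every shift amount from 0 to 7.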
Followup to #89115
[mlir][spirv] Add reduction ops in TOSA Ext Inst Set (#187278)
This patch introduces the following reduction operators:
spirv.Tosa.ReduceAll
spirv.Tosa.ReduceAny
spirv.Tosa.ReduceMax
spirv.Tosa.ReduceMin
spirv.Tosa.ReduceProduct
spirv.Tosa.ReduceSum
Dialect and serialization round-trip tests have also been added.
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>