LLVM/project db9132e lldb/include/lldb/Interpreter OptionGroupVariable.h, lldb/source/Interpreter OptionGroupVariable.cpp

[lldb] Reformat OptionGroupVariable.{h,cpp}, NFC. (#192395)


This patch runs clang-format on OptionGroupVariable.{h,cpp}.
Delta File
+84 -21 lldb/source/Interpreter/OptionGroupVariable.cpp
+3 -3 lldb/include/lldb/Interpreter/OptionGroupVariable.h
+87 -24 2 files

LLVM/project 53368bf clang/lib/CIR/CodeGen CIRGenAsm.cpp, clang/lib/CIR/Dialect/IR CIRDialect.cpp

[CIR] Fix InlineAsmOp roundtrip parse crash on cir.asm (#186588)

Fix InlineAsmOp parser/printer roundtrip for cir.asm and avoid null
operand_attrs entries that crash alias printing during
--verify-roundtrip.

- Parse attr-dict before optional result arrow to match print order.

- Use non-null sentinel attributes for non-maybe_memory operands and
check UnitAttr explicitly.

- Keep lowering semantics by treating only UnitAttr as maybe_memory
marker.

- Update inline-asm CIR IR test to run with --verify-roundtrip and add
an attr+result coverage case.

Fix https://github.com/llvm/llvm-project/issues/161441
Delta File
+15 -1 clang/test/CIR/IR/inline-asm.cir
+5 -5 clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+1 -1 clang/lib/CIR/CodeGen/CIRGenAsm.cpp
+1 -1 clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+22 -8 4 files

LLVM/project a109303 compiler-rt/lib/tysan tysan.cpp

[TySan] Set and cache tool name. (#192410)

Partial reland of [sanitizer common
support](https://github.com/llvm/llvm-project/pull/183310)
Delta File
+3 -0 compiler-rt/lib/tysan/tysan.cpp
+3 -0 1 files

LLVM/project 3091b98 clang/lib/CIR/Dialect/Transforms LoweringPrepare.cpp, clang/test/CIR/CodeGen global-array-dtor.cpp global-init.cpp

[CIR] Add noundef to __cxx_global_array_dtor parameter (#191529)

The synthetic __cxx_global_array_dtor helper created by
LoweringPrepare was missing noundef on its ptr parameter,
causing a mismatch with classic codegen.
Delta File
+10 -0 clang/lib/CIR/Dialect/Transforms/LoweringPrepare.cpp
+1 -1 clang/test/CIR/CodeGen/global-array-dtor.cpp
+1 -1 clang/test/CIR/CodeGen/global-init.cpp
+12 -2 3 files

LLVM/project 796302a clang/test/CIR global-var-simple.cpp

[CIR][NFC] Remove redundant global-var-simple.cpp test (#192354)

This early smoke test is fully covered by
`clang/test/CIR/CodeGen/globals.cpp` and is no longer needed.

Per @andykaylor's feedback on #191521.

Made with [Cursor](https://cursor.com)
Delta File
+0 -101 clang/test/CIR/global-var-simple.cpp
+0 -101 1 files

LLVM/project fd8b58c mlir/test/Dialect/SPIRV/IR group-ops.mlir non-uniform-ops.mlir

[mlir][spirv][nfc] Move GroupNonUniformBallotBitCount tests to `non-uniform-ops.mlir` (#192115)

The tests were misplaced in `group-ops.mlir`; the op is defined in
`SPIRVNonUniformOps.td`, so they belong in `non-uniform-ops.mlir`.
Delta File
+0 -60 mlir/test/Dialect/SPIRV/IR/group-ops.mlir
+60 -0 mlir/test/Dialect/SPIRV/IR/non-uniform-ops.mlir
+60 -60 2 files

LLVM/project b9ae015 llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

AMDGPU: Add NextUseAnalysis Pass (#178873)

Based on
- https://github.com/llvm/llvm-project/pull/156079 and
- https://github.com/llvm/llvm-project/pull/171520

See those PRs for background.

Provides a compatibility mode option
`--amdgpu-next-use-analysis-compatibility-mode` that produces results
that match either PR #156079 (`compute`) or PR #171520 (`graphics`).

Co-authored-by: alex-t <atimofee at amd.com>
Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou at amd.com>
Delta File
+275,101 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209 -0 llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128 -0 35 files not shown
+800,864 -0 41 files

LLVM/project e90f463 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 copyable_reorder.ll operand-reorder-with-copyables.ll

[SLP] Normalize copyable operand order via majority voting

When building operands for entries with copyable elements, non-copyable
lanes of commutative ops may have inconsistent operand order (e.g. some
lanes have load,add while others have add,load). This prevents
VLOperands::reorder() from grouping consecutive loads on one side,
degrading downstream vectorization.
Add majority-voting normalization during buildOperands: track the
(ValueID, ValueID) pair frequency across non-copyable lanes and swap
any lane whose operand types are the exact inverse of the most common
pattern. This makes operand order consistent, enabling better load
grouping.
This is part 1 of #189181.
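
The majority-voting step described above can be sketched in isolation. This is a simplified model, not the code added to SLPVectorizer.cpp: `Kind`, `Lane`, and `normalizeOperandOrder` are hypothetical names, `Kind` stands in for the ValueID of an operand, and the tie-breaking rule (first pair in map order wins) is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operand's ValueID.
enum class Kind { Load, Add, Mul };

// One lane of a commutative binary op, described by its operand kinds.
struct Lane {
  Kind LHS;
  Kind RHS;
  bool IsCopyable = false; // copyable lanes neither vote nor get swapped
};

// Count (LHS, RHS) kind pairs across non-copyable lanes, pick the most
// frequent pair, and swap every non-copyable lane that is its exact inverse.
inline void normalizeOperandOrder(std::vector<Lane> &Lanes) {
  std::map<std::pair<Kind, Kind>, std::size_t> Votes;
  for (const Lane &L : Lanes)
    if (!L.IsCopyable)
      ++Votes[{L.LHS, L.RHS}];
  if (Votes.empty())
    return;

  // Assumption: on a tie, the first pair in map order wins.
  auto Best = Votes.begin();
  for (auto It = Votes.begin(); It != Votes.end(); ++It)
    if (It->second > Best->second)
      Best = It;

  const Kind MajorLHS = Best->first.first;
  const Kind MajorRHS = Best->first.second;
  if (MajorLHS == MajorRHS)
    return; // a symmetric pattern has no inverse to normalize

  for (Lane &L : Lanes)
    if (!L.IsCopyable && L.LHS == MajorRHS && L.RHS == MajorLHS)
      std::swap(L.LHS, L.RHS);
}
```

With lanes (load, add), (add, load), (load, add), the middle lane is swapped so that all loads end up on the left-hand side.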

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/191631
Delta File
+58 -0 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+9 -18 llvm/test/Transforms/SLPVectorizer/X86/copyable_reorder.ll
+11 -9 llvm/test/Transforms/SLPVectorizer/X86/operand-reorder-with-copyables.ll
+1 -1 llvm/test/Transforms/SLPVectorizer/X86/reused-last-instruction-in-split-node.ll
+1 -1 llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll
+80 -29 5 files

LLVM/project 7328b74 flang/lib/Optimizer/CodeGen CodeGen.cpp, flang/test/Lower/PowerPC ppc-vec-load-elem-order.f90 ppc-vec-load.f90

[flang] Handle ub.poison in lowering (#192454)

This patch adds the UB dialect registration and the UBToLLVM conversion
interface to lowering.
Delta File
+21 -21 flang/test/Lower/PowerPC/ppc-vec-load-elem-order.f90
+16 -16 flang/test/Lower/PowerPC/ppc-vec-load.f90
+6 -6 flang/test/Lower/PowerPC/ppc-vec-store-elem-order.f90
+5 -5 flang/test/Lower/PowerPC/ppc-vec-store.f90
+4 -4 flang/test/Lower/PowerPC/ppc-vec-convert.f90
+2 -0 flang/lib/Optimizer/CodeGen/CodeGen.cpp
+54 -52 1 files not shown
+56 -52 7 files

LLVM/project 9908a04 llvm/test/CodeGen/AMDGPU maximumnum.bf16.ll minimumnum.bf16.ll, llvm/test/CodeGen/RISCV/rvv vfma-vp.ll

Rebase, simplify

Created using spr 1.3.7
Delta File
+18,621 -0 llvm/test/CodeGen/Thumb2/mve-clmul.ll
+4,582 -5,914 llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
+6,877 -0 llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-sve-instructions.s
+3,326 -2,794 llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll
+3,326 -2,794 llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll
+5,336 -0 llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-writeback.s
+42,068 -11,502 2,068 files not shown
+204,878 -67,693 2,074 files

LLVM/project 9e45a7a clang/lib/CodeGen CGExpr.cpp, clang/test/OpenMP target_indirect_codegen.cpp

[clang][OpenMP] Fix __llvm_omp_indirect_call_lookup signature for targets with non-default program AS (#192470)

The argument and return value of `__llvm_omp_indirect_call_lookup` are
function pointers, so make sure they use the correct address space.

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
Delta File
+5 -4 clang/lib/CodeGen/CGExpr.cpp
+3 -2 clang/test/OpenMP/target_indirect_codegen.cpp
+8 -6 2 files

LLVM/project ffde06f llvm/include/llvm/Transforms Utils.h, llvm/include/llvm/Transforms/Utils StripConvergenceIntrinsics.h

[NFC][SPIRV] Move `SPIRVStripConvergenceIntrinsics` to Utils (#188537)

The `SPIRVStripConvergenceIntrinsic` pass was written as a SPIR-V pass
because SPIR-V is currently the only target that emits convergence tokens
during codegen. Nothing in the pass is target-specific, and we plan to
emit convergence tokens when targeting DirectX (and all targets in
general), so move the pass to a common place.

The previous pass used temporary `Undef`s. As part of moving the pass, we
can simply reverse the traversal order to remove the use of `Undef`, which
is deprecated.

Enables the pass for targeting DirectX and is a pre-req for:
https://github.com/llvm/llvm-project/pull/188792.

Assisted by: Github Copilot
Delta File
+99 -0 llvm/lib/Transforms/Utils/StripConvergenceIntrinsics.cpp
+0 -86 llvm/lib/Target/SPIRV/SPIRVStripConvergentIntrinsics.cpp
+79 -0 llvm/test/Transforms/StripConvergenceIntrinsics/basic.ll
+38 -0 llvm/test/CodeGen/DirectX/strip-convergence-intrinsics.ll
+29 -0 llvm/include/llvm/Transforms/Utils/StripConvergenceIntrinsics.h
+7 -0 llvm/include/llvm/Transforms/Utils.h
+252 -86 10 files not shown
+265 -93 16 files

LLVM/project 4dd1632

[MCP][NFC] Opinionated refactoring

There are a few minor inconsistencies across the pass which I found mildly
distracting:

* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
  is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
  the `optional`.

Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.

Assume `std::optional::operator*` will assert in any reasonable
implementation, even though dereferencing an empty `optional` is
technically undefined behavior. With asserts disabled, an explicit
`assert` would not prevent that anyway.


    [11 lines not shown]
Delta File
+0 -0 0 files

LLVM/project 92faf52 llvm/lib/CodeGen MachineCopyPropagation.cpp, llvm/test/CodeGen/X86 machine-copy-prop.mir

[MCP] Never eliminate frame-setup/destroy instructions

Presumably targets only insert frame instructions which are significant,
and there may be effects MCP doesn't model. Similar to reserved registers this
is probably overly conservative, but as this causes no codegen change in
any lit test I think it is benign.

The motivation is just to clean up #183149 for AMDGPU, as we can spill
to physical registers, and currently have to spill the EXEC mask purely
to enable debug-info.

Change-Id: I9ea4a09b34464c43322edd2900361bf635efd9f7
Delta File
+22 -0 llvm/test/CodeGen/X86/machine-copy-prop.mir
+11 -5 llvm/lib/CodeGen/MachineCopyPropagation.cpp
+33 -5 2 files

LLVM/project 2f8d3ea

[MCP][NFC] Cleanup and prepare to preserve frame-setup/destroy

This mixes renames, removing redundant code, avoiding
`else`-after-`return`, etc. with factoring out the `isNeverRedundant`
concept.

Change-Id: I43a62a9415019cdd63c68fd3b915ebb7505d317a
Delta File
+0 -0 0 files

LLVM/project dcfe195 llvm/lib/CodeGen MachineCopyPropagation.cpp

[MCP][NFC] Opinionated refactoring (#186239)

There are a few minor inconsistencies across the pass which I found mildly distracting:

* The use of `Def`/`Dest`/`Dst` to refer to the same thing
* Inconsistent declaration order of `Dst`/`Src` vs `Src`/`Dst`
* Lots of `->getReg()->asMCReg()`, and uses of `Register` when the pass
  is always running after RA anyway.
* Some places explicitly `assert(isCopyInstr)` while others just deref
  the `optional`.

Standardize on `Dst`/`Src` to match the metaphor and ordering of
`DestSourcePair`.

Assume `std::optional::operator*` will assert in any reasonable
implementation, even though dereferencing an empty `optional` is
technically undefined behavior. With asserts disabled, an explicit
`assert` would not prevent that anyway.

The refactor uses structured bindings for a couple reasons:

    [9 lines not shown]
Delta File
+163 -194 llvm/lib/CodeGen/MachineCopyPropagation.cpp
+163 -194 1 files

LLVM/project 7aa2b04 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 stack-clash-dynamic-alloca.ll stack-clash-small-alloc-medium-align.ll

[X86] Use unsigned comparison for stack clash probing loop (#192355)

The stack clash probing loop generated in `EmitLoweredProbedAlloca` used
a signed comparison (`X86::COND_GE`) to determine when the allocation
target had been reached.

In 32-bit mode, memory addresses above `0x80000000` have the sign bit
set. If the stack pointer lands in this region, treating the addresses
as signed integers causes the comparison logic to fail. This leads to
incorrect loop execution, resulting in an infinite loop and a crash
(segmentation fault) when setting up custom stacks for pthreads mapped
above `0x80000000` in a 32-bit process.

This patch changes the condition code to `X86::COND_AE` (Above or
Equal), which generates an unsigned comparison. This ensures that
addresses are treated correctly as unsigned quantities on all targets.
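
The difference between the two condition codes can be illustrated outside the backend. The sketch below is not the code emitted by `EmitLoweredProbedAlloca`: `geSigned` and `aeUnsigned` are hypothetical helpers that model `COND_GE` and `COND_AE` on 32-bit values, and the signed cast assumes the usual two's complement wraparound (guaranteed since C++20).

```cpp
#include <cstdint>

// Models a signed "greater or equal" test (the COND_GE flavor):
// both addresses are reinterpreted as signed 32-bit integers.
inline bool geSigned(std::uint32_t A, std::uint32_t B) {
  return static_cast<std::int32_t>(A) >= static_cast<std::int32_t>(B);
}

// Models an unsigned "above or equal" test (the COND_AE flavor):
// addresses are compared as plain unsigned quantities.
inline bool aeUnsigned(std::uint32_t A, std::uint32_t B) {
  return A >= B;
}
```

For two addresses on either side of the boundary, such as `0x80001000` and `0x7FFFF000`, `aeUnsigned` reports the first as the larger one, while `geSigned` sees a negative number and reports the opposite. Below `0x80000000` the two helpers agree.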

On 64-bit systems, this change has no practical effect on valid
user-space addresses because they do not use the sign bit (being

    [4 lines not shown]
Delta File
+4 -4 llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll
+2 -2 llvm/test/CodeGen/X86/stack-clash-small-alloc-medium-align.ll
+1 -1 llvm/lib/Target/X86/X86ISelLowering.cpp
+7 -7 3 files

LLVM/project 1317890 clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-intrinsics.c

[CIR][AArch64] Lower NEON vrsra_n intrinsics (#191129)

### Summary
Implement CIR lowering for all intrinsics in
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#vector-rounding-shift-right-and-accumulate

This PR references the implementation from the ClangIR incubator:
https://github.com/llvm/clangir/blob/main/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp#L4854

AArch64 does not provide a dedicated "rounding shift right by immediate"
instruction. Instead, the `SRSHL` / `URSHL` intrinsics take a signed
per-lane shift amount where a negative value means right shift, so an
immediate right shift by `n` is encoded as a signed vector splat of
`-n`. The three infrastructure changes below exist to support this
encoding at the call site:

- extends `emitNeonShiftVector` with a `neg` parameter so the
right-shift-as-negative-left-shift encoding is handled inside the
helper;

    [8 lines not shown]
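
The right-shift-as-negative-left-shift encoding described above can be modeled on a single 32-bit lane. This is a scalar sketch, not the CIR lowering: `roundingShiftLeft` and `roundingShiftRightAccumulate` are hypothetical names, saturation is not modeled, shift amounts are assumed to lie in the range -32..31, and the right shift of a negative value assumes an arithmetic shift (guaranteed since C++20).

```cpp
#include <cstdint>

// Scalar model of one signed rounding-shift-left lane: a non-negative
// Shift shifts left; a negative Shift is a rounding right shift by -Shift.
inline std::int32_t roundingShiftLeft(std::int32_t X, int Shift) {
  if (Shift >= 0)
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(X) << Shift);
  const int N = -Shift;
  // Widen to 64 bits so adding the rounding constant cannot overflow.
  const std::int64_t Rounded =
      static_cast<std::int64_t>(X) + (std::int64_t{1} << (N - 1));
  return static_cast<std::int32_t>(Rounded >> N);
}

// vrsra_n-style operation on one lane: Acc + rounding_shift_right(X, N),
// where the immediate right shift by N is passed as a shift amount of -N.
inline std::int32_t roundingShiftRightAccumulate(std::int32_t Acc,
                                                 std::int32_t X, int N) {
  const std::uint32_t Sum = static_cast<std::uint32_t>(Acc) +
                            static_cast<std::uint32_t>(roundingShiftLeft(X, -N));
  return static_cast<std::int32_t>(Sum); // wraps like the vector add
}
```

In the vector form, the `-N` argument becomes a splat of `-n` across all lanes, which is the part the message says is handled inside `emitNeonShiftVector`.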
Delta File
+317 -0 clang/test/CodeGen/AArch64/neon/intrinsics.c
+0 -245 clang/test/CodeGen/AArch64/neon-intrinsics.c
+71 -21 clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+388 -266 3 files

LLVM/project a81621a llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.perm.pk.ll

AMDGPU/GlobalISel: RegBankLegalize rules for perm_pk16_b{4,6,8}_u4 (#192368)
Delta File
+95 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.perm.pk.ll
+12 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+107 -1 2 files

LLVM/project fbbb8c1 flang/test/Evaluate rewrite09.f90

Remove empty trailing lines
Delta File
+0 -2 flang/test/Evaluate/rewrite09.f90
+0 -2 1 files

LLVM/project 1b433e9 mlir/include/mlir/Dialect/OpenACC OpenACCCGOps.td, mlir/lib/Dialect/OpenACC/IR OpenACCCG.cpp

[mlir][acc] Add canonicalization patterns for compute_region (#192376)

This PR improves the APIs for navigating through acc.compute_region
block arguments and also adds canonicalization patterns for those
arguments to remove unused ones and merge duplicates.
Delta File
+111 -2 mlir/lib/Dialect/OpenACC/IR/OpenACCCG.cpp
+78 -0 mlir/test/Dialect/OpenACC/compute-region-canonicalize.mlir
+17 -2 mlir/include/mlir/Dialect/OpenACC/OpenACCCGOps.td
+5 -0 mlir/unittests/Dialect/OpenACC/OpenACCUtilsCGTest.cpp
+2 -0 mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp
+213 -4 5 files

LLVM/project 5938bc1 flang/test/Evaluate rewrite09.f90

Add test file
Delta File
+51 -0 flang/test/Evaluate/rewrite09.f90
+51 -0 1 files

LLVM/project f2ad59e flang/lib/Evaluate fold-implementation.h, flang/test/Evaluate rewrite01.f90

[flang] Fold x + 0, 0 + x and x - 0 for INTEGER and UNSIGNED
Delta File
+30 -0 flang/lib/Evaluate/fold-implementation.h
+4 -12 flang/test/Lower/HLFIR/array-ctor-as-inlined-temp.f90
+1 -3 flang/test/Lower/HLFIR/array-ctor-as-elemental.f90
+1 -3 flang/test/Lower/HLFIR/array-ctor-as-runtime-temp.f90
+1 -1 flang/test/Evaluate/rewrite01.f90
+37 -19 5 files

LLVM/project 547197d llvm/lib/Target/AMDGPU AMDGPUUnifyDivergentExitNodes.cpp AMDGPUTargetMachine.cpp

[NFC][AMDGPU] Rename AMDGPUUnifyDivergentExitNodes to AMDGPUUnifyDivergentExitNodesLegacy (#192399)

### Summary
This NFC patch renames the legacy pass wrapper class for
`AMDGPUUnifyDivergentExitNodes` to
`AMDGPUUnifyDivergentExitNodesLegacy`. This makes the old pass manager
wrapper explicit and avoids ambiguity. No behavior change is intended.
Delta File
+10 -8 llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
+1 -1 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1 -1 llvm/lib/Target/AMDGPU/AMDGPU.h
+12 -10 3 files

LLVM/project 04cae92 clang/lib/Basic/Targets SPIR.cpp, clang/test/Preprocessor predefined-macros.c

[SPIRV] Conditionally define `__AMDGCN_UNSAFE_FP_ATOMICS__` for AMDGCN flavoured SPIR-V (#192136)

Client apps rely on the `__AMDGCN_UNSAFE_FP_ATOMICS__` macro to guide
optimised execution pathways. We were not defining it for AMDGCN
flavoured SPIR-V, which led to pessimisation.
Delta File
+4 -0 clang/test/Preprocessor/predefined-macros.c
+3 -0 clang/lib/Basic/Targets/SPIR.cpp
+7 -0 2 files

LLVM/project 8246715 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.add.min.max.ll

AMDGPU/GlobalISel: RegBankLegalize rules for add_min/max intrinsics (#192356)
Delta File
+12 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.add.min.max.ll
+13 -1 2 files

LLVM/project 911f03e llvm/lib/Target/AArch64 AArch64AsmPrinter.cpp, llvm/test/CodeGen/AArch64 ptrauth-intrinsic-auth-resign-with-blend.ll

[AArch64][PAC] Rework the expansion of AUT/AUTPAC pseudos

Refactor `AArch64AsmPrinter::emitPtrauthAuthResign` to improve
readability and fix the conditions when `emitPtrauthDiscriminator` is
allowed to clobber AddrDisc.

* do not clobber `AUTAddrDisc` when computing `AUTDiscReg` on resigning
  if `AUTAddrDisc == PACAddrDisc`, as it would prevent passing a raw
  64-bit value as the new discriminator
* move the code computing `ShouldCheck` and `ShouldTrap` conditions to a
  separate function
Delta File
+66 -41 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+67 -10 llvm/test/CodeGen/AArch64/ptrauth-intrinsic-auth-resign-with-blend.ll
+133 -51 2 files

LLVM/project 4ee68c9 llvm/lib/Target/AArch64 AArch64AsmPrinter.cpp

Improve readability
Delta File
+20 -19 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+20 -19 1 files

LLVM/project 9d923ec clang/lib/Headers altivec.h, clang/test/CodeGen/PowerPC builtins-ppc-deeply-compressed-weights.c

[PowerPC] Implement Deeply Compressed Weights Builtins (#184666)

Add support for the following deeply compressed weights builtins for ISA
Future.
- vec_uncompresshn(vector unsigned char, vector unsigned char)
- vec_uncompressln(vector unsigned char, vector unsigned char)
- vec_uncompresshb(vector unsigned char, vector unsigned char)
- vec_uncompresslb(vector unsigned char, vector unsigned char)
- vec_uncompresshh(vector unsigned char, vector unsigned char)
- vec_uncompresslh(vector unsigned char, vector unsigned char)
- vec_unpack_hsn_to_byte(vector unsigned char)
- vec_unpack_lsn_to_byte(vector unsigned char)
- vec_unpack_int4_to_bf16(vector unsigned char, uint2)
- vec_unpack_int8_to_bf16(vector unsigned char, uint1)
- vec_unpack_int4_to_fp32(vector unsigned char, uint3)
- vec_unpack_int8_to_fp32(vector unsigned char, uint2)

Assisted by AI.
Delta File
+244 -0 llvm/test/CodeGen/PowerPC/deeply-compressed-weights.ll
+194 -0 clang/test/CodeGen/PowerPC/builtins-ppc-deeply-compressed-weights.c
+58 -0 clang/lib/Headers/altivec.h
+54 -0 clang/test/Sema/builtins-ppc-deeply-compressed-weights-error.c
+36 -12 llvm/lib/Target/PowerPC/PPCInstrFuture.td
+30 -0 llvm/include/llvm/IR/IntrinsicsPowerPC.td
+616 -12 2 files not shown
+650 -12 8 files

LLVM/project f4e43c4 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize select-cmp-multiuse.ll epilog-vectorization-any-of-reductions.ll

[VPlan] Remove ComputeAnyOfResult, use ComputeReductionResult. (#190039)

ComputeAnyOfResult is simply a boolean OR reduction. Remove the
dedicated opcode and model directly via ComputeReductionResult.

This simplifies and unifies the code, as well as enabling trivial
constant folding.
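
As a rough illustration of why no dedicated opcode is needed, an any-of result over boolean lanes is the same as an OR reduction with `false` as the start value. The helper `anyOf` below is a hypothetical stand-alone function, not VPlan code.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Any-of over a set of boolean lanes, written as a generic OR reduction
// whose identity (start) value is false.
inline bool anyOf(const std::vector<bool> &Lanes) {
  return std::accumulate(Lanes.begin(), Lanes.end(), false,
                         std::logical_or<bool>());
}
```

Because the operation is an ordinary reduction, a generic folder can simplify it when the lanes are known constants, for example an all-false input reduces to `false`.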

PR: https://github.com/llvm/llvm-project/pull/190039
Delta File
+76 -90 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+36 -48 llvm/test/Transforms/LoopVectorize/select-cmp-multiuse.ll
+7 -9 llvm/test/Transforms/LoopVectorize/epilog-vectorization-any-of-reductions.ll
+0 -16 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+15 -0 llvm/lib/Transforms/Vectorize/VPlan.cpp
+10 -4 llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+144 -167 8 files not shown
+158 -206 14 files