LLVM/project 59b7821llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp

[NFC][LowerMemIntrinsics] Consistent parameter name comments in function calls

The `/*ParamName=*/foo` syntax is prescribed by the coding standards:
https://llvm.org/docs/CodingStandards.html#comment-formatting
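
As a small illustration of that comment style, the sketch below annotates literal arguments with the name of the parameter they bind to. The helper `makeFilledBuffer` and its parameters are hypothetical and are not taken from LowerMemIntrinsics.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper used only to demonstrate the comment style.
std::vector<std::uint8_t> makeFilledBuffer(std::size_t Size, std::uint8_t Value,
                                           bool AppendTerminator) {
  assert(Size > 0 && "illustrative precondition");
  std::vector<std::uint8_t> Buf(Size, Value);
  if (AppendTerminator)
    Buf.push_back(0);
  return Buf;
}

std::vector<std::uint8_t> example() {
  // Each literal is preceded by /*ParamName=*/ with no space after the '='.
  return makeFilledBuffer(/*Size=*/4, /*Value=*/0xAB,
                          /*AppendTerminator=*/true);
}
```

Writing the name exactly as it appears in the declaration lets tools such as clang-tidy's `bugprone-argument-comment` check flag comments that drift from the parameter names.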
DeltaFile
+53-53llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+53-531 files

LLVM/project d24a675llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-param-combinations.ll memintrinsic-unroll.ll

[LowerMemIntrinsics] Optimize memset lowering (#169040)

This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
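
The two-loop shape described above can be sketched on plain memory as follows. This is only an illustration of the strategy, not the IR the pass emits: `kWideBytes` is a hypothetical stand-in for the width of the access type a target would report, and `memsetTwoLoops` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: a 16-byte preferred access width, chosen for illustration.
constexpr std::size_t kWideBytes = 16;

void memsetTwoLoops(std::uint8_t *Dst, std::uint8_t SetValue, std::size_t Len) {
  assert((Dst != nullptr || Len == 0) && "null destination with nonzero size");

  // Build the splat: SetValue repeated across one wide access.
  std::uint8_t Splat[kWideBytes];
  for (std::size_t I = 0; I < kWideBytes; ++I)
    Splat[I] = SetValue;

  // Main loop: one wide store per iteration.
  const std::size_t MainBytes = Len - (Len % kWideBytes);
  std::size_t Off = 0;
  for (; Off < MainBytes; Off += kWideBytes)
    std::memcpy(Dst + Off, Splat, kWideBytes);

  // Residual loop: the remaining bytes are stored one at a time.
  for (; Off < Len; ++Off)
    Dst[Off] = SetValue;
}
```

When the size is a compile-time constant, the residual loop in this sketch would correspond to a short, fixed sequence of stores rather than a loop.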

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
DeltaFile
+1,896-0llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
+1,616-0llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+686-90llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+218-116llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+220-9llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+103-11llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
+4,739-22612 files not shown
+4,898-32518 files

LLVM/project a13c6eallvm/lib/CodeGen ExpandPostRAPseudos.cpp

[CodeGen] Simplify ExpandPostRA::LowerSubregToReg. NFC. (#179634)

SUBREG_TO_REG always has a non-zero subreg index so DstSubReg can never
be the same as DstReg.
DeltaFile
+12-24llvm/lib/CodeGen/ExpandPostRAPseudos.cpp
+12-241 files

LLVM/project 8c3d22allvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Remove `+xs` gating for `tlbip *nxs` instructions

A recent specification update has removed FEAT_XS gating for `tlbip *nxs`
instructions. The `tlbi *nxs` instructions remain gated on FEAT_XS.
DeltaFile
+6-16llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+8-9llvm/test/MC/AArch64/armv9a-sysp.s
+0-8llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+2-2llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+1-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+17-375 files

LLVM/project e40b430llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128

Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:

```
  All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
  that are currently dependent on FEAT_D128 are updated to be dependent
  on FEAT_D128 or FEAT_TLBID
```
DeltaFile
+259-0llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+66-66llvm/test/MC/AArch64/armv9a-sysp.s
+14-6llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+20-0llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+11-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+370-745 files

LLVM/project 26b9cbellvm/test/MC/AArch64 directive-arch_extension-negative.s

fixup!

Adjust directive-arch_extension-negative.s
DeltaFile
+0-2llvm/test/MC/AArch64/directive-arch_extension-negative.s
+0-21 files

LLVM/project 761f1e2llvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64SystemOperands.td, llvm/test/MC/AArch64 armv9a-sysp.s armv9-mrrs.s

[AArch64][llvm] Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions

Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions.

We previously removed gating for the `sys`, `mrs` and `msr` instructions,
on the basis that it doesn't add value, as it doesn't indicate that
any particular system registers or system instructions are available.

Therefore, remove `+d128` gating for these too.

(In an upcoming change, some `tlbip` instructions, which are `sysp` aliases,
are allowed to be used with either `+d128` or `+tlbid`. If we don't remove
this gating, then it would require some ugly work-arounds in the code to
support the relaxation mandated by the 2025 MemSys specification.

In this change, retain `+d128` gating for all `tlbip` instructions, which
will then be loosened to either `+d128` or `+tlbid` in a subsequent change.)
DeltaFile
+122-196llvm/test/MC/AArch64/armv9a-sysp.s
+7-97llvm/test/MC/AArch64/armv9-mrrs.s
+42-46llvm/lib/Target/AArch64/AArch64InstrInfo.td
+7-53llvm/test/MC/AArch64/armv9-msrr.s
+4-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+2-3llvm/test/MC/AArch64/directive-arch_extension-negative.s
+184-3973 files not shown
+190-3989 files

LLVM/project 0fe9454llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll legalize-vec-assertzext.ll

[X86] Fold EXPAND(X,Y,M) -> SELECT(M,X,Y) when M is a lowest bit mask (#179630)

If an EXPAND node's mask consists of only the lowest bits, then we can
replace it with a more general SELECT node, which can be cheaper and
potentially allow predication.

Fixes #179008
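
A scalar model can show why the fold is sound. The sketch below assumes the usual expand semantics (each set mask bit takes the next consecutive element of X, each clear bit keeps the corresponding lane of Y). The 8-lane width and the function names are illustrative choices, not the code in X86ISelLowering.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8; // illustrative vector width
using Vec = std::array<int, kLanes>;

// Model of EXPAND(X, Y, M): set lanes consume X's elements in order.
Vec expand(const Vec &X, const Vec &Y, std::uint8_t M) {
  Vec R{};
  int Next = 0;
  for (int I = 0; I < kLanes; ++I)
    R[I] = ((M >> I) & 1) ? X[Next++] : Y[I];
  return R;
}

// Model of SELECT(M, X, Y): a plain lane-wise choice.
Vec select(std::uint8_t M, const Vec &X, const Vec &Y) {
  Vec R{};
  for (int I = 0; I < kLanes; ++I)
    R[I] = ((M >> I) & 1) ? X[I] : Y[I];
  return R;
}

// True for masks of the form 0b0...01...1 (including 0 and all-ones).
bool isLowestBitMask(std::uint8_t M) { return (M & (M + 1)) == 0; }
```

With a lowest-bit mask, the N-th set lane is also lane N, so `Next` always equals `I` in `expand` and the two models agree. For any other mask they can differ.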
DeltaFile
+9-0llvm/lib/Target/X86/X86ISelLowering.cpp
+4-4llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+2-2llvm/test/CodeGen/X86/legalize-vec-assertzext.ll
+1-1llvm/test/CodeGen/X86/avx512bwvl-arith.ll
+16-74 files

LLVM/project a8af090clang/lib/AST/ByteCode Interp.h, clang/test/AST/ByteCode c.c

[clang][bytecode] Don't call getOffset on non-block pointers (#179628)

Fixes https://github.com/llvm/llvm-project/issues/177587
DeltaFile
+8-0clang/test/AST/ByteCode/c.c
+2-2clang/lib/AST/ByteCode/Interp.h
+10-22 files

LLVM/project 7171d6cmlir/include/mlir/Dialect/SPIRV/IR SPIRVCastOps.td SPIRVNonUniformOps.td

[mlir][spirv] Update op examples that diverged from assemblyFormat. NFC. (#179594)

DeltaFile
+11-11mlir/include/mlir/Dialect/SPIRV/IR/SPIRVCastOps.td
+6-6mlir/include/mlir/Dialect/SPIRV/IR/SPIRVNonUniformOps.td
+4-4mlir/include/mlir/Dialect/SPIRV/IR/SPIRVArithmeticOps.td
+2-2mlir/include/mlir/Dialect/SPIRV/IR/SPIRVGroupOps.td
+2-2mlir/include/mlir/Dialect/SPIRV/IR/SPIRVAtomicOps.td
+1-1mlir/include/mlir/Dialect/SPIRV/IR/SPIRVCompositeOps.td
+26-261 files not shown
+27-277 files

LLVM/project ea7f8c8llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

[AArch64] Fix a couple of typos (NFC) (#179639)

Fixes some comments I forgot to correct/update.
DeltaFile
+7-7llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+7-71 files

LLVM/project 8a2b41bclang/lib/Driver/ToolChains/Arch X86.cpp

[X86][NFC] Split mapxf options from mapx_features (#179638)

So that we don't need to check `Name == "apxf"` again.
DeltaFile
+19-20clang/lib/Driver/ToolChains/Arch/X86.cpp
+19-201 files

LLVM/project 49bf907llvm/lib/Target/Mips MipsLegalizerInfo.cpp MipsInstructionSelector.cpp, llvm/lib/Target/PowerPC/GISel PPCInstructionSelector.cpp

[NFC][LLVM] Make `MachineInstrBuilder::constrainAllUses` return `void` (#179632)

This function always returns `true`; so we can transform it to return
`void` and simplify the code.

Follow up of https://github.com/llvm/llvm-project/pull/179501 .
DeltaFile
+487-482llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+91-94llvm/lib/Target/PowerPC/GISel/PPCInstructionSelector.cpp
+11-22llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+8-9llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+6-7llvm/lib/Target/Mips/MipsLegalizerInfo.cpp
+3-6llvm/lib/Target/Mips/MipsInstructionSelector.cpp
+606-6202 files not shown
+613-6258 files

LLVM/project 071bb46llvm/lib/Target/AMDGPU SIRegisterInfo.cpp, llvm/test/CodeGen/AMDGPU pei-build-spill-offset-overflow-gfx950.mir

[AMDGPU][SIRegisterInfo] Fix maxoffset calculation in buildSpillLoadStore (#179182)

This PR fixes a bug in the maxoffset calculation in SIRegisterInfo. When
RemSize is non-zero, the maxoffset that needs to be encoded in the offset
field is equal to "Offset + Size".

---------

Co-authored-by: Abhinav Garg <abhigarg at amd.com>
DeltaFile
+32-0llvm/test/CodeGen/AMDGPU/pei-build-spill-offset-overflow-gfx950.mir
+4-1llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+36-12 files

LLVM/project 05db0c4llvm/lib/Target/AMDGPU AMDGPULaneMaskUtils.h

[AMDGPU] Add CmpLG and OrN2 operators to LaneMaskConstants (#179493)

Add CmpLG and OrN2 operators so that LaneMaskConstants can be used in
PhiLoweringHelper from SILowerI1Copies.
DeltaFile
+4-0llvm/lib/Target/AMDGPU/AMDGPULaneMaskUtils.h
+4-01 files

LLVM/project 2808332llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 combine-bzhi.ll

[X86] computeKnownBitsForTargetNode - extend X86ISD::BZHI handling. Fixes 177364. (#179444)

Fixes #177364
DeltaFile
+41-0llvm/test/CodeGen/X86/combine-bzhi.ll
+21-1llvm/lib/Target/X86/X86ISelLowering.cpp
+62-12 files

LLVM/project 046413fllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: Fix sgpr s16 unmerge lowering in regbanklegalize (#179441)

Used to fail EXPENSIVE_CHECKS because of type mismatch.
DeltaFile
+5-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+9-72 files

LLVM/project 3462c2bclang/test/CodeGenCXX exceptions-seh.cpp

[NFC] Add redirect the output (#179623)

DeltaFile
+3-3clang/test/CodeGenCXX/exceptions-seh.cpp
+3-31 files

LLVM/project 111bef2llvm/include/llvm/ADT Uniformity.h, llvm/lib/Analysis UniformityAnalysis.cpp

update uniformity per val instead inst
DeltaFile
+110-4llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+34-33llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+13-13llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+6-7llvm/lib/Analysis/UniformityAnalysis.cpp
+6-6llvm/include/llvm/ADT/Uniformity.h
+4-5llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+173-689 files not shown
+197-9315 files

LLVM/project a631f3ellvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 gfni-xor-fold.ll gfni-xor-fold-avx512.ll

[X86] Fold vgf2p8affineqb XOR with splat constant into immediate (#179103)

The vgf2p8affineqb instruction performs an affine transformation on each
byte and then XORs the result with an 8-bit immediate operand. When this
instruction is followed by a standalone XOR with a splatted constant,
LLVM currently generates extra instructions instead of folding the
constant into the instruction's immediate.
This PR adds a DAG combine optimization that detects the pattern
vgf2p8affineqb(x, m, imm8) ^ C where C is a splatted 8-bit constant and
transforms it to vgf2p8affineqb(x, m, imm8 ^ C), eliminating the
unnecessary XOR instruction.
- The optimization runs during the combine phase after type legalization
- Handles XOR with the constant on either side (commutative)
- Only applies when the GFNI instruction has a single use to avoid
de-optimization
- Validates that the XOR operand is a splatted 8-bit constant before
folding
- Includes test coverage for positive cases and negative cases
(multi-use, non-splat constant, variable XOR)
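
The identity behind this fold can be checked with a scalar model of one byte lane. The sketch below follows the per-byte definition in Intel's documentation for GF2P8AFFINEQB as I understand it (result bit i is the parity of matrix byte 7-i ANDed with the input byte, XORed with bit i of the immediate). `affineByte` is an illustrative name, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one byte lane of the affine transform.
// Assumption: result bit I uses matrix byte (7 - I) as its row.
std::uint8_t affineByte(std::uint64_t Matrix, std::uint8_t X,
                        std::uint8_t Imm) {
  unsigned R = 0;
  for (int I = 0; I < 8; ++I) {
    unsigned Row = static_cast<unsigned>(Matrix >> (8 * (7 - I))) & 0xFFu;
    // Parity of (Row & X), computed by XOR-folding the byte.
    unsigned P = Row & X;
    P ^= P >> 4;
    P ^= P >> 2;
    P ^= P >> 1;
    P &= 1u;
    R |= (P ^ ((Imm >> I) & 1u)) << I;
  }
  return static_cast<std::uint8_t>(R);
}

// The two sides of the fold for a single byte.
bool foldHolds(std::uint64_t Matrix, std::uint8_t X, std::uint8_t Imm,
               std::uint8_t C) {
  const std::uint8_t Unfolded = affineByte(Matrix, X, Imm) ^ C;
  const std::uint8_t Folded =
      affineByte(Matrix, X, static_cast<std::uint8_t>(Imm ^ C));
  assert(Unfolded == Folded && "XOR with the immediate is the final step");
  return Unfolded == Folded;
}
```

Because the immediate is XORed in last, a further XOR with a constant simply combines with it. The constant must be a splat because the instruction has a single imm8 that applies to every byte lane.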
DeltaFile
+144-0llvm/test/CodeGen/X86/gfni-xor-fold.ll
+63-0llvm/test/CodeGen/X86/gfni-xor-fold-avx512.ll
+28-0llvm/lib/Target/X86/X86ISelLowering.cpp
+235-03 files

LLVM/project 7b600e6llvm/lib/Target/Mips MipsLegalizerInfo.cpp MipsInstructionSelector.cpp, llvm/lib/Target/PowerPC/GISel PPCInstructionSelector.cpp

[NFC][LLVM] Make `MachineInstrBuilder::constrainAllUses` return `void`

This function always returns `true`; so we can transform it to return
`void` and simplify the code.

Follow up of https://github.com/llvm/llvm-project/pull/179501 .
DeltaFile
+487-482llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+91-94llvm/lib/Target/PowerPC/GISel/PPCInstructionSelector.cpp
+11-22llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+8-9llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+6-7llvm/lib/Target/Mips/MipsLegalizerInfo.cpp
+3-6llvm/lib/Target/Mips/MipsInstructionSelector.cpp
+606-6202 files not shown
+613-6258 files

LLVM/project 275eea2llvm/lib/Target/Hexagon HexagonISelLowering.cpp, llvm/test/CodeGen/Hexagon no-invalid-node-v4i16.ll

[HEXAGON] Extend/Truncate the shift amount into i32 (#179499)

Fixes a backend error.
DeltaFile
+24-0llvm/test/CodeGen/Hexagon/no-invalid-node-v4i16.ll
+11-2llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
+35-22 files

LLVM/project 319dcc1llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/LoongArch/lsx issue177155.ll

[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)

When creating new nodes after type legalization, we should try to use
the promoted type to avoid creating nodes with illegal types.

Fixes: https://github.com/llvm/llvm-project/issues/177155
(cherry picked from commit 38e280d8a405bb442d176b8dab18da63d3fc2810)
DeltaFile
+26-0llvm/test/CodeGen/LoongArch/lsx/issue177155.ll
+7-0llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+33-02 files

LLVM/project cb4a27fllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: Fix sgpr s16 unmerge lowering in regbanklegalize

Used to fail EXPENSIVE_CHECKS because of type mismatch.
DeltaFile
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+5-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+9-72 files

LLVM/project 974b768llvm/test/CodeGen/AArch64 clmul-scalable.ll clmul.ll

[AArch64] Add clmul AArch64 lowering tests (#179495)

DeltaFile
+1,172-0llvm/test/CodeGen/AArch64/clmul-scalable.ll
+454-16llvm/test/CodeGen/AArch64/clmul.ll
+458-0llvm/test/CodeGen/AArch64/clmul-fixed.ll
+2,084-163 files

LLVM/project 6640cabllvm/include/llvm/ADT Uniformity.h, llvm/lib/Analysis UniformityAnalysis.cpp

update uniformity per val instead inst
DeltaFile
+110-4llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+34-33llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+13-13llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+6-7llvm/lib/Analysis/UniformityAnalysis.cpp
+6-6llvm/include/llvm/ADT/Uniformity.h
+4-5llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+173-688 files not shown
+196-9214 files

LLVM/project aa7e5e8llvm/lib/Target/Mips MipsLegalizerInfo.cpp MipsInstructionSelector.cpp, llvm/lib/Target/PowerPC/GISel PPCInstructionSelector.cpp

[NFC][LLVM] Make `MachineInstrBuilder::constrainAllUses` return `void`

This function always returns `true`; so we can transform it to return
`void` and simplify the code.

Follow up of https://github.com/llvm/llvm-project/pull/179501 .
DeltaFile
+487-482llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+91-94llvm/lib/Target/PowerPC/GISel/PPCInstructionSelector.cpp
+12-23llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+8-9llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+6-7llvm/lib/Target/Mips/MipsLegalizerInfo.cpp
+3-6llvm/lib/Target/Mips/MipsInstructionSelector.cpp
+607-6212 files not shown
+614-6268 files

LLVM/project a1fd097llvm/lib/Transforms/Scalar SeparateConstOffsetFromGEP.cpp, llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU xor-decompose.ll

Revert "[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (#150438)"

Cherry-pick of #179339 (a2c7c6032f27c4f8d6f7327a7ca15705d3081c3e).
DeltaFile
+0-435llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/xor-decompose.ll
+4-81llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+4-5162 files

LLVM/project 1a318e7libc/cmake/modules prepare_libc_gpu_build.cmake, libc/docs/gpu building.rst

[libc] Tweak the runtimes cross-build for GPU (#178548)

Summary:
We should likely use `-DLLVM_DEFAULT_TARGET_TRIPLE` as the general
source of truth, so make the handling work with that, since we use it for
the output directories. Fix the creation of startup files in this mode
and make sure it can detect the GPU properly.

Fixes: https://github.com/llvm/llvm-project/issues/179375
(cherry picked from commit e07a1182fd58a5b48a2c78bc3ae03872186d4ae0)
DeltaFile
+6-34libc/docs/gpu/building.rst
+2-2libc/cmake/modules/prepare_libc_gpu_build.cmake
+1-0libc/startup/gpu/CMakeLists.txt
+9-363 files

LLVM/project 3a653aflibcxx/include/__algorithm unwrap_range.h

[libc++] Simplify the implementation of __{un,re}wrap_range (#178381)

We can use a relatively simple `if constexpr` chain instead of SFINAE
and class template specialization, making the functions much simpler to
understand.
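
To illustrate the general technique (not the actual libc++ code), the sketch below unwraps and rewraps a hypothetical `Wrapped<T>` iterator using `if constexpr` branches, where an older design would need one class template specialization or SFINAE overload per case. All names here are invented for the example, and it assumes C++17.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical checked-iterator wrapper around a raw pointer.
template <class T> struct Wrapped {
  T *Ptr;
  T *base() const { return Ptr; }
};

// Trait that recognises the wrapper.
template <class T> struct IsWrapped : std::false_type {};
template <class T> struct IsWrapped<Wrapped<T>> : std::true_type {};

// Strip the wrapper when present; pass anything else through unchanged.
template <class Iter> auto unwrapIter(Iter It) {
  if constexpr (IsWrapped<Iter>::value)
    return It.base();
  else
    return It;
}

// Restore the original iterator type from an unwrapped position.
// The first parameter only carries the type to rewrap into.
template <class Iter, class Raw> auto rewrapIter(Iter, Raw Unwrapped) {
  if constexpr (IsWrapped<Iter>::value)
    return Iter{Unwrapped};
  else
    return Unwrapped;
}

int thirdViaWrapper(int *Data) {
  assert(Data != nullptr);
  Wrapped<int> W{Data};
  auto Raw = unwrapIter(W);             // int*
  auto Back = rewrapIter(W, Raw + 2);   // Wrapped<int>
  return *Back.base();
}
```

Each branch is discarded at compile time when its condition is false, so the return type can differ per branch and all the cases sit together in one function body.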
DeltaFile
+14-44libcxx/include/__algorithm/unwrap_range.h
+14-441 files