LLVM/project 23f9e42llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 aarch64-dup-ext.ll

[AArch64] Support SHUFFLE of ANY_EXTEND in performBuildShuffleExtendCombine (#178408)

Currently performBuildShuffleExtendCombine only supports ANY_EXTEND
operands for BUILD_VECTOR inputs, and will bail if it encounters a
VECTOR_SHUFFLE with ANY_EXTEND operands. Update the logic so that we
support shuffles with ANY_EXTEND operands, which brings the code in line
with the comment.
DeltaFile
+119-0llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+4-3llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+123-32 files

LLVM/project 3377756llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 is_fpclass-fp80.ll

[X86] checkSignTestSetCCCombine - handle SIGN_EXTEND_INREG/SHL patterns inside CMP(X,0) (#178710)

Handle SIGN_EXTEND_INREG and SHL patterns inside CMP(X,0) cases in checkSignTestSetCCCombine.
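
As a rough illustration of why these two patterns can be read as sign tests, the sketch below checks the underlying bit-level identities in plain C++. It is not the code from X86ISelLowering.cpp: the helper names are hypothetical, 32-bit values and an i8 in-register extension are assumed, and the narrowing casts rely on the usual two's-complement behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; none of these are LLVM APIs.

// True if bit `bit` (0 = least significant) of x is set.
constexpr bool testBit(uint32_t x, unsigned bit) {
  return ((x >> bit) & 1u) != 0;
}

// Models SIGN_EXTEND_INREG from i8 followed by a "< 0" test.
constexpr bool sextInRegI8IsNegative(uint32_t x) {
  return static_cast<int32_t>(static_cast<int8_t>(x & 0xFFu)) < 0;
}

// Models SHL by k (k < 32) followed by a "< 0" test.
constexpr bool shlIsNegative(uint32_t x, unsigned k) {
  return static_cast<int32_t>(x << k) < 0;
}

// Checks both identities on a handful of sample values:
//   sext_inreg(x, i8) < 0  <=>  bit 7 of x is set
//   (x << k) < 0           <=>  bit (31 - k) of x is set
inline void checkSignTestIdentities() {
  const uint32_t samples[] = {0u,    1u,          0x7Fu,       0x80u,
                              0xFFu, 0x12345678u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples) {
    assert(sextInRegI8IsNegative(x) == testBit(x, 7));
    for (unsigned k = 0; k < 32; ++k)
      assert(shlIsNegative(x, k) == testBit(x, 31 - k));
  }
}
```

In both cases the comparison against zero depends on a single bit of the original value, which is what makes a direct bit test possible.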

Fixes #178246
DeltaFile
+41-45llvm/test/CodeGen/X86/is_fpclass-fp80.ll
+8-4llvm/lib/Target/X86/X86ISelLowering.cpp
+49-492 files

LLVM/project 5a221c3mlir/lib/Dialect/MemRef/Transforms FoldMemRefAliasOps.cpp, mlir/test/Dialect/MemRef fold-memref-alias-ops.mlir

[mlir][memref]: Fold ExpandShape into TransferRead (#176786)

Add support for folding `memref.expand_shape` ops into
`vector.transfer_read` ops when the permutation map is a
non-minor-identity.

In the case that the permutation map indexes into expanded dimensions
that would be contiguous within the original source shape then it is
safe to make this transformation.

Signed-off-by: Jack Frankland <jack.frankland at arm.com>
DeltaFile
+27-10mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp
+36-0mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
+63-102 files

LLVM/project a372152llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp DAGCombiner.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining.ll

[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w)) (#179124)

TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.

In most cases we currently just bail if a binop has multiple results,
but shuffle combining was missing the check and it's pretty trivial to
add handling in this case.

I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.

Fixes #179112
DeltaFile
+54-2llvm/test/CodeGen/X86/vector-shuffle-combining.ll
+13-0llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+71-43 files

LLVM/project f9423edmlir/lib/Interfaces DataLayoutInterfaces.cpp, mlir/test/Interfaces/DataLayoutInterfaces query.mlir

[mlir] Fix alignment for predicate (i1) vectors (#175975)

Legal scalable predicate vectors (legal in the LLVM sense), e.g.
`vector<[16]xi1>` (or `<vscale x 16 x i1>`, using LLVM syntax) ought to
have alignment **2** rather than **16**, see e.g. [1].

MLIR currently computes the vector “size in bits” as:

```cpp
vecType.getNumElements()
  * dataLayout.getTypeSize(vecType.getElementType()) * 8
```

but `getTypeSize()` returns a size in *bytes* (rounded up from bits), so
for `i1` it returns 1. Multiplying by 8 converts that storage byte back to 8
bits per element, which overestimates predicate vector sizes.
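
The overestimate can be worked through with a small stand-alone model. The functions below are illustrative and are not MLIR APIs; the sketch assumes the minimum `vscale` of 1 and does not reproduce the replacement computation from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; these functions are not MLIR APIs.

// Size of one element in bytes, rounded up from its bit width.
constexpr uint64_t elementSizeInBytes(uint64_t elementBits) {
  return (elementBits + 7) / 8;
}

// The computation quoted above: numElements * sizeInBytes * 8.
constexpr uint64_t byteRoundedVectorBits(uint64_t numElements,
                                         uint64_t elementBits) {
  return numElements * elementSizeInBytes(elementBits) * 8;
}

// Number of bits the elements occupy when tightly packed.
constexpr uint64_t packedVectorBits(uint64_t numElements,
                                    uint64_t elementBits) {
  return numElements * elementBits;
}

// vector<[16]xi1> with vscale = 1: sixteen one-bit elements.
static_assert(byteRoundedVectorBits(16, 1) == 128, "16 bytes: overestimate");
static_assert(packedVectorBits(16, 1) == 16, "2 bytes: packed size");
// For element types that are a whole number of bytes, both agree.
static_assert(byteRoundedVectorBits(4, 32) == packedVectorBits(4, 32),
              "no difference for i32 elements");
```

The 128-bit versus 16-bit results correspond to the 16-byte and 2-byte alignments mentioned above.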

Instead, use:


    [18 lines not shown]
DeltaFile
+2-4mlir/lib/Interfaces/DataLayoutInterfaces.cpp
+6-0mlir/test/Interfaces/DataLayoutInterfaces/query.mlir
+8-42 files

LLVM/project 0321f3ellvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp, llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)

When attempting to lower to tbz, we can skip a zext that is then not
accounted for elsewhere. In the attached test this produces a tbz on an
extract whose value, taken from the vector, is never properly
zero-extended. This patch fixes that by only looking through a G_ZEXT if
the bit being checked is in the low part of the value, which brings the
code in line with the comment.
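
The condition can be sketched in plain C++. This is a hypothetical model rather than the selector code: `raw` stands for a 32-bit register whose low 8 bits hold the value and whose upper bits are unspecified, and `canLookThroughZExt` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the AArch64InstructionSelector code.

// Zero-extend an 8-bit value held in the low byte of a 32-bit register.
constexpr uint32_t zext8To32(uint32_t raw) { return raw & 0xFFu; }

// True if bit `bit` (0 = least significant) of v is set.
constexpr bool testBit(uint32_t v, unsigned bit) {
  return ((v >> bit) & 1u) != 0;
}

// Testing the un-extended register is only equivalent when the tested bit
// lies inside the narrow source type.
constexpr bool canLookThroughZExt(unsigned bit, unsigned srcBits) {
  return bit < srcBits;
}

inline void checkTestBitModel() {
  const uint32_t raw = 0xFFFFFF05u;  // low byte 0x05, upper bits are garbage
  for (unsigned bit = 0; bit < 32; ++bit) {
    const bool same = testBit(zext8To32(raw), bit) == testBit(raw, bit);
    if (canLookThroughZExt(bit, 8))
      assert(same);   // low bits: skipping the zext is harmless
    else
      assert(!same);  // high bits: zext gives 0, this raw register has 1
  }
}
```

For bits at or above the source width the zero-extended value always reads 0, so a test on the raw register can give a different answer.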

Fixes #173895
DeltaFile
+7-7llvm/test/CodeGen/AArch64/GlobalISel/widen-narrow-tbz-tbnz.mir
+5-4llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+4-3llvm/test/CodeGen/AArch64/GlobalISel/opt-fold-xor-tbz-tbnz.mir
+5-1llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+21-154 files

LLVM/project f3cc908clang/lib/CodeGen TargetInfo.h, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Address comments
DeltaFile
+64-22clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-cooperative-atomics-templated.hip
+33-51clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+28-36clang/lib/CodeGen/Targets/SPIR.cpp
+34-9clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-load-monitor-templated.hip
+6-10clang/lib/CodeGen/Targets/AMDGPU.cpp
+10-5clang/lib/CodeGen/TargetInfo.h
+175-1331 files not shown
+184-1347 files

LLVM/project c0be2cdllvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR per-output-uniformity.mir

add divergent input test for amdgcn_else intrinsic
DeltaFile
+19-0llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+19-01 files

LLVM/project 38b58a2llvm/lib/Target/SystemZ SystemZScheduleZ13.td SystemZInstrInfo.cpp, llvm/test/CodeGen/SystemZ copy-physreg-vr16.ll

[SystemZ] Bugfix: Add VLR16 to SystemZInstrInfo::copyPhysReg(). (#178932)

Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.

This is needed with -O0/regalloc=fast, and probably in more cases as
well.

Fixes #178788.

(cherry picked from commit 09f9a2892a412a73d42942e78eed9cde61c7a9e7)
DeltaFile
+35-0llvm/test/CodeGen/SystemZ/copy-physreg-vr16.ll
+1-1llvm/lib/Target/SystemZ/SystemZScheduleZ13.td
+2-0llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+1-1llvm/lib/Target/SystemZ/SystemZScheduleZ14.td
+1-1llvm/lib/Target/SystemZ/SystemZScheduleZ15.td
+1-1llvm/lib/Target/SystemZ/SystemZScheduleZ16.td
+41-43 files not shown
+44-59 files

LLVM/project 467b3bbllvm/lib/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.cpp, llvm/test/ExecutionEngine/JITLink/x86-64 ELF_no_debug_info.s

[ELFDebugObjectPlugin] Do not wait for std::future in post-fixup phase in the absence of debug info (#178541)

If there is no debug information, `DebugObject::collectTargetAlloc` is
never called in the post-allocation phase. As a result,
`DebugObject::awaitTargetMem` fails in the post-fixup phase with
_"std::future_error: No associated state"_ because the std::future was
never populated.
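
The failure mode can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in, not the plugin's code: `PendingResult`, `populate` and `await` are hypothetical names, and guarding with `std::future::valid()` is one possible approach rather than necessarily what the patch does.

```cpp
#include <cassert>
#include <future>
#include <optional>

// Hypothetical stand-in for an object whose future is only populated when
// an earlier phase ran; the names are illustrative, not the ORC API.
struct PendingResult {
  std::future<int> Fut;  // default-constructed: no associated shared state

  // Earlier phase: only called when there is something to wait for.
  void populate(int Value) {
    std::promise<int> P;
    Fut = P.get_future();
    P.set_value(Value);
  }

  // Later phase: report "nothing to wait for" instead of touching a future
  // that has no shared state.
  std::optional<int> await() {
    if (!Fut.valid())
      return std::nullopt;
    return Fut.get();
  }
};
```

Calling `get()` on a future for which `valid()` is false is not permitted by the standard; implementations commonly report it as a `std::future_error`, as in the message quoted above.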

(cherry picked from commit 696ea11b94d119416c9618b5add09d5ac09428aa)
DeltaFile
+20-0llvm/test/ExecutionEngine/JITLink/x86-64/ELF_no_debug_info.s
+14-1llvm/lib/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.cpp
+34-12 files

LLVM/project 5bebd32llvm/docs DTLTO.rst

[DOC][DTLTO] Update DTLTO documentation for the LLVM 22 release (#177368)

This change updates the documentation to reflect work completed during
the LLVM 22 timeframe, including support for the ThinLTO cache and
static libraries/archives.

It also clarifies that the goal of DTLTO is to support distribution of
ThinLTO backend compilations for any in-process ThinLTO invocation.

SIE Internal Tracker: TOOLCHAIN-21016

(cherry picked from commit 88478ab495f27f2cb798d4bf6912fe7cf4872997)
DeltaFile
+15-11llvm/docs/DTLTO.rst
+15-111 files

LLVM/project 279f407cross-project-tests/dtlto fat-lto-objects.test, lld/ELF Driver.cpp

[DTLTO] support distributing bitcode from FatLTO objects (#176928)

We already have code to extract bitcode files from archives so they can
be distributed. Extend this code to extract bitcode from FatLTO objects
too, which otherwise cannot be used with DTLTO.

(cherry picked from commit e45ea95dbe236e233ad978067688789e7478541a)
DeltaFile
+55-0cross-project-tests/dtlto/fat-lto-objects.test
+16-14llvm/lib/DTLTO/DTLTO.cpp
+16-5llvm/include/llvm/LTO/LTO.h
+4-2lld/ELF/Driver.cpp
+2-2lld/test/ELF/dtlto/timetrace.test
+93-235 files

LLVM/project ba53f94llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG jump-threading.ll

[SimplifyCFG] Fix null pointer dereference in foldCondBranchOnValueKnownInPredecessorImpl (#178835)

(cherry picked from commit 956770a9cb27d56cd04432be90f1241d3e932019)
DeltaFile
+41-0llvm/test/Transforms/SimplifyCFG/jump-threading.ll
+2-0llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+43-02 files

LLVM/project 82de343llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 get-active-lane-mask-extract.ll

release/22.x: [AArch64][SME2] Allow lowering to whilelo.x2 in non-streaming mode (#178399)

Backport: https://github.com/llvm/llvm-project/commit/162267ee90019c6b8241dcf470a2d3fae2b306a7
DeltaFile
+26-25llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
+6-5llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+32-302 files

LLVM/project e4cd2e2llvm/lib/Support/Windows Process.inc Threading.inc

[Support] Move loadSystemModuleSecure into Process.inc. NFC. (#177598)

Move Windows-specific function
`llvm::sys::windows::loadSystemModuleSecure` from
`lib/Support/Windows/Threading.inc` into
`lib/Support/Windows/Process.inc`.

This is to fix link problems on Windows, see
https://github.com/llvm/llvm-project/pull/169224#issuecomment-3790350128

(cherry picked from commit 70ee6e4427c8f55a910193bbda2eadf75e8a75f2)
DeltaFile
+29-0llvm/lib/Support/Windows/Process.inc
+0-29llvm/lib/Support/Windows/Threading.inc
+29-292 files

LLVM/project f84d0aellvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop

Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
DeltaFile
+12-0llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+4-0llvm/lib/Analysis/ValueTracking.cpp
+16-02 files

LLVM/project b9a10f4llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop

We were folding undef inputs to qnan which is incorrect. The instruction
never returns nan. Out of bounds segment select will return 0, so fold
undef segment to 0.
DeltaFile
+29-28llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+18-18llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+47-462 files

LLVM/project 80662c1llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)

Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.

The device libs f32 and s64 sin implementations have a range check,
and inside the large path this pattern appears. After a small patch
to invert this check to send nans down the small path, this will
enable the fold unconditionally on the large path.
DeltaFile
+79-0llvm/test/CodeGen/AMDGPU/fract-match.ll
+27-22llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+106-222 files

LLVM/project 8d0830ellvm/include/llvm/Analysis LoopCacheAnalysis.h, llvm/lib/Analysis LoopCacheAnalysis.cpp

[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)

LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper of Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
DeltaFile
+4-33llvm/lib/Analysis/LoopCacheAnalysis.cpp
+0-5llvm/include/llvm/Analysis/LoopCacheAnalysis.h
+4-382 files

LLVM/project a667526llvm/lib/CodeGen MachineFunctionPass.cpp, llvm/test/CodeGen/AArch64 O3-pipeline.ll

[MachineFunctionPass] Preserve more IR analyses (#178871)

Preserve PDT, BPI, LazyBPI and LazyBFI. These are all IR analyses that
are not invalidated by machine passes.

This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.
DeltaFile
+8-0llvm/lib/CodeGen/MachineFunctionPass.cpp
+0-4llvm/test/CodeGen/AArch64/O3-pipeline.ll
+8-42 files

LLVM/project 6d83b16llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/include/llvm/CodeGen TargetInstrInfo.h

Implement per-output machine uniformity analysis
DeltaFile
+76-14llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+27-11llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+16-5llvm/include/llvm/ADT/GenericUniformityImpl.h
+8-9llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+4-4llvm/lib/Target/AMDGPU/SIInstrInfo.h
+4-3llvm/include/llvm/CodeGen/TargetInstrInfo.h
+135-462 files not shown
+140-498 files

LLVM/project b4797d4llvm/lib/Target/PowerPC PPCISelLowering.cpp, llvm/test/CodeGen/PowerPC ucmp.ll

[PowerPC] Fix miscompilation when using 32-bit ucmp on 64-bit PowerPC (#178979)

I forgot that you need to clear the upper 32 bits for the carry flag to
work properly on ppc64 or else there will be garbage and possibly
incorrect results.
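
A small model of the problem, using ordinary comparisons on 64-bit values in place of the carry-based sequence the backend emits. The function names are hypothetical and the register contents are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model; not the PPCISelLowering.cpp lowering.

// Three-way unsigned compare on full 64-bit registers: -1, 0 or 1.
constexpr int ucmp64(uint64_t A, uint64_t B) {
  return static_cast<int>(A > B) - static_cast<int>(A < B);
}

// Keep the low 32 bits and clear whatever is in the upper half.
constexpr uint64_t clearUpper32(uint64_t Reg) {
  return Reg & 0xFFFFFFFFull;
}

// 32-bit ucmp on values that live in 64-bit registers.
constexpr int ucmp32(uint64_t RegA, uint64_t RegB) {
  return ucmp64(clearUpper32(RegA), clearUpper32(RegB));
}

// Low halves are 1 and 2, so the 32-bit answer must be -1.
constexpr uint64_t RegA = 0xDEADBEEF00000001ull;  // garbage in upper half
constexpr uint64_t RegB = 0x0000000000000002ull;

static_assert(ucmp64(RegA, RegB) == 1, "garbage upper bits flip the result");
static_assert(ucmp32(RegA, RegB) == -1, "clearing them restores it");
```

Without the masking step the comparison effectively sees the full 64-bit register, so stale upper bits decide the outcome.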

Fixes: https://github.com/llvm/llvm-project/issues/179119

I do not have merge permissions.
DeltaFile
+13-4llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+12-4llvm/test/CodeGen/PowerPC/ucmp.ll
+25-82 files

LLVM/project 136bbdeclang/lib/AST ExprConstant.cpp, clang/lib/AST/ByteCode State.cpp State.h

[clang][ExprConst] Move shared `EvalInfo` state into `interp::State` (#177738)

Instead of having `InterpState` call into its parent `EvalInfo`, just
save the state in `interp::State`, where both subclasses can access it.
DeltaFile
+15-130clang/lib/AST/ExprConstant.cpp
+74-13clang/lib/AST/ByteCode/State.cpp
+58-15clang/lib/AST/ByteCode/State.h
+4-33clang/lib/AST/ByteCode/InterpState.h
+7-6clang/lib/AST/ByteCode/InterpState.cpp
+158-1975 files

LLVM/project b83160bllvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp

AMDGPU: Use extractBitsAsZExtValue to get exponent in trig_preop folding (#179024)

DeltaFile
+1-1llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+1-11 files

LLVM/project 48c664bmlir/lib/Dialect/UB/IR CMakeLists.txt

[mlir] Fix build after #179039 (#179180)

Fix build after #179039.
DeltaFile
+2-0mlir/lib/Dialect/UB/IR/CMakeLists.txt
+2-01 files

LLVM/project 6c4f95bmlir/lib/Dialect/UB/IR CMakeLists.txt

[mlir] Fix build after #179039
DeltaFile
+2-0mlir/lib/Dialect/UB/IR/CMakeLists.txt
+2-01 files

LLVM/project a220502mlir/include/mlir/Dialect/UB/IR UBOps.td, mlir/lib/Dialect/UB/IR UBOps.cpp

[mlir][UB] Erase ops that precede `ub.unreachable`
DeltaFile
+37-0mlir/lib/Dialect/UB/IR/UBOps.cpp
+25-0mlir/test/Dialect/UB/canonicalize.mlir
+1-0mlir/include/mlir/Dialect/UB/IR/UBOps.td
+63-03 files

LLVM/project 1e33b73mlir/include/mlir/Interfaces ExecutionProgressOpInterface.td ExecutionProgressOpInterface.h, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039)

Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").

`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).

Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.

RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
DeltaFile
+73-30mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+51-0mlir/test/Dialect/SCF/canonicalize.mlir
+45-3mlir/lib/Dialect/SCF/IR/SCF.cpp
+48-0mlir/include/mlir/Interfaces/ExecutionProgressOpInterface.td
+39-0mlir/lib/Dialect/UB/IR/UBOps.cpp
+25-0mlir/include/mlir/Interfaces/ExecutionProgressOpInterface.h
+281-3313 files not shown
+356-4419 files

LLVM/project 013b345clang/lib/Sema SemaType.cpp, clang/lib/Serialization ASTReaderDecl.cpp

[Serialization] Stop demoting var definitions to declarations (#172430) (#177117)

Close https://github.com/llvm/llvm-project/issues/172241
Close https://github.com/llvm/llvm-project/issues/64034
Close https://github.com/llvm/llvm-project/issues/149404
Close https://github.com/llvm/llvm-project/issues/174858

After this patch, we (the clang developers) no longer assume there is at
most one definition in a redeclaration chain.

See
https://discourse.llvm.org/t/rfc-clang-not-assuming-there-is-at-most-one-definition-in-a-redeclaration-chain/89360
for details.

---

Update since last commit:

    [2 lines not shown]
DeltaFile
+110-0clang/test/Modules/var-inst-def.cppm
+104-0clang/test/Modules/pr149404-02.cppm
+94-0clang/test/Modules/demote-var-def.cpp
+52-24clang/lib/Sema/SemaType.cpp
+47-0clang/test/Modules/pr172241.cppm
+0-14clang/lib/Serialization/ASTReaderDecl.cpp
+407-386 files

LLVM/project 7675549llvm/lib/Target/AMDGPU SOPInstructions.td AMDGPU.td, llvm/test/MC/AMDGPU gfx13_asm_sopc.s gfx13_asm_sopp.s

[AMDGPU] Add SOPK, SOPC and SOPP encoding support for gfx13
DeltaFile
+2,360-0llvm/test/MC/AMDGPU/gfx13_asm_sopc.s
+448-263llvm/lib/Target/AMDGPU/SOPInstructions.td
+276-0llvm/test/MC/AMDGPU/gfx13_asm_sopp.s
+215-0llvm/test/MC/AMDGPU/gfx13_asm_sopk.s
+20-3llvm/lib/Target/AMDGPU/AMDGPU.td
+18-0llvm/test/MC/AMDGPU/gfx13_asm_sopp_alias.s
+3,337-2661 files not shown
+3,346-2667 files