LLVM/project aee5fa1 clang-tools-extra/clang-tidy/modernize UseStructuredBindingCheck.cpp

UseStructuredBindingCheck.cpp - fix MSVC "not all control paths return a value" warning. NFC. (#179206)

Delta File
+1 -0 clang-tools-extra/clang-tidy/modernize/UseStructuredBindingCheck.cpp
+1 -0 1 files

LLVM/project fd1e37b llvm/include/llvm/IR BasicBlock.h, llvm/include/llvm/Transforms/Utils BasicBlockUtils.h

[IR] Remove Before argument from splitBlock APIs (NFC) (#179195)

We never need to use this conditionally (and it doesn't really make
sense, as the behavior is substantially different). Force the use of
separate APIs instead of a boolean argument.
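
The design point can be illustrated with a toy model. This is only a sketch: `Block`, `splitBlock` and `splitBlockBefore` below are hypothetical stand-ins operating on a vector of strings, not the LLVM functions or their signatures. It shows why the two operations are different enough that a boolean switch between them hides what a call site does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block: an ordered list of instruction names.
using Block = std::vector<std::string>;

// Moves instructions [at, end) into a new block, which is returned and acts
// as the successor. The original block keeps the leading instructions.
inline Block splitBlock(Block &bb, std::size_t at) {
  assert(at <= bb.size() && "split point out of range");
  Block tail(bb.begin() + at, bb.end());
  bb.erase(bb.begin() + at, bb.end());
  return tail;
}

// Moves instructions [0, at) into a new block, which is returned and acts
// as the predecessor. The original block keeps the trailing instructions.
inline Block splitBlockBefore(Block &bb, std::size_t at) {
  assert(at <= bb.size() && "split point out of range");
  Block head(bb.begin(), bb.begin() + at);
  bb.erase(bb.begin(), bb.begin() + at);
  return head;
}
```

With a single function taking a `bool Before`, both behaviours would share one name even though the returned block and the block that keeps its identity differ in each case.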
Delta File
+12 -17 llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
+9 -20 llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+3 -8 llvm/include/llvm/IR/BasicBlock.h
+4 -4 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+1 -5 llvm/lib/IR/BasicBlock.cpp
+1 -2 llvm/lib/CodeGen/CodeGenPrepare.cpp
+30 -56 2 files not shown
+32 -58 8 files

LLVM/project 9d5a42c llvm/lib/Target/X86/GISel X86PostLegalizerCombiner.cpp, llvm/test/CodeGen/X86 isel-icmp.ll isel-fpclass.ll

[X86][GISEL] Enable PostLegalize Combiner (#174696)

This patch adds a post-legalize combiner for the X86 target.

Open question on the OptNone combine: what is its use case, specifically
when an optimization level (-On) is given on the command line but
optimization is disabled for a specific function?
Delta File
+187 -0 llvm/lib/Target/X86/GISel/X86PostLegalizerCombiner.cpp
+41 -46 llvm/test/CodeGen/X86/isel-icmp.ll
+14 -39 llvm/test/CodeGen/X86/isel-fpclass.ll
+14 -18 llvm/test/CodeGen/X86/GlobalISel/add-ext.ll
+5 -20 llvm/test/CodeGen/X86/isel-llvm.set.rounding.ll
+6 -18 llvm/test/CodeGen/X86/finite-libcalls.ll
+267 -141 25 files not shown
+315 -222 31 files

LLVM/project f4a8f30 llvm/include/llvm/Transforms/Utils CodeExtractor.h

[CodeExtractor] Format CodeExtractor header, NFC (#178662)

This patch applies clang-format to the CodeExtractor header and updates
usage of the LLVM_ABI macro to prevent unrelated patches touching this
file from having to make these changes in order to pass pre-merge
checks.
Delta File
+217 -227 llvm/include/llvm/Transforms/Utils/CodeExtractor.h
+217 -227 1 files

LLVM/project f84c367 mlir/include/mlir/Transforms RegionUtils.h, mlir/lib/Transforms/Utils RegionUtils.cpp

[mlir] Extend moveValueDefinitions/moveOperationDependencies with cross-region support (#176343)

Extends `moveValueDefinitions` and `moveOperationDependencies` to
support moving operations across basic blocks and out of nested regions.
Delta File
+239 -20 mlir/test/Transforms/move-operation-deps.mlir
+126 -18 mlir/lib/Transforms/Utils/RegionUtils.cpp
+25 -10 mlir/include/mlir/Transforms/RegionUtils.h
+390 -48 3 files

LLVM/project f288f46 llvm/lib/CodeGen/GlobalISel InstructionSelect.cpp, llvm/test/CodeGen/AArch64/GlobalISel 166563.mir

[AArch64][GlobalISel] Constrain G_CONSTANT_FOLD_BARRIER operand register classes (#177997)

Instruction selection is lowering:

  bb.1:
    %6:gpr(s64) = G_CONSTANT i64 457873110
    ...
  bb.2:
    %12:gpr(s64) = G_CONSTANT_FOLD_BARRIER %6
    %24:gpr(s64) = G_CONSTANT i64 0
    %13:gpr(s64) = G_AND %24, %12
    ...

to:

  %13:gpr64 = ANDXrr %24:gpr64, %6:gpr64sp'


    [35 lines not shown]
Delta File
+237 -0 llvm/test/CodeGen/AArch64/GlobalISel/166563.mir
+4 -3 llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp
+241 -3 2 files

LLVM/project 2f3935b clang/include/clang/Options Options.td, clang/lib/Basic/Targets X86.cpp

[X86][APX] Disable PP2/PPX generation on Windows (#178122)

The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.

This PR disables that support by default for robustness. Workloads that
want it, e.g. for experimentation or for code paths that do not need
unwinder support, can override the default by explicitly passing the
flags that enable it.
Delta File
+25 -5 clang/lib/Driver/ToolChains/Arch/X86.cpp
+6 -5 clang/test/Driver/cl-x86-flags.c
+6 -5 clang/test/Driver/x86-target-features.c
+8 -2 clang/lib/Basic/Targets/X86.cpp
+2 -6 clang/include/clang/Options/Options.td
+4 -0 llvm/lib/TargetParser/Host.cpp
+51 -23 2 files not shown
+56 -23 8 files

LLVM/project 485e69b clang/docs ReleaseNotes.rst, clang/include/clang/Lex Preprocessor.h

[clang] Fix dependency output for #embed (#178001)

When requesting a FileEntryRef for an embedded file, make sure not to use
an absolute path. Instead, create a proper relative path if the file is
being looked up relative to the current file.

Fixes https://github.com/llvm/llvm-project/issues/161950
Delta File
+77 -0 clang/unittests/Lex/PPDependencyDirectivesTest.cpp
+14 -16 clang/lib/Lex/PPDirectives.cpp
+4 -7 clang/include/clang/Lex/Preprocessor.h
+4 -3 clang/test/Preprocessor/embed_dependencies.c
+1 -4 clang/lib/Lex/PPMacroExpansion.cpp
+2 -0 clang/docs/ReleaseNotes.rst
+102 -30 6 files

LLVM/project 137cde1 mlir/include/mlir/Target/LLVM/ROCDL Utils.h, mlir/lib/Target/LLVM/ROCDL Target.cpp

[mlir][ROCDL] do not hardcode partial lld path in utilities

`ROCDL::linkObjectCode` was unconditionally appending llvm/bin/ld.lld to the
path it was passed in order to locate lld, which isn't desirable for a utility
function and makes it unusable with, e.g., a system lld or one from LLVM's
own build directory. Move this logic to the caller and let the utility take a
full path.
Delta File
+4 -4 mlir/lib/Target/LLVM/ROCDL/Target.cpp
+1 -1 mlir/include/mlir/Target/LLVM/ROCDL/Utils.h
+5 -5 2 files

LLVM/project 23f9e42 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 aarch64-dup-ext.ll

[AArch64] Support SHUFFLE of ANY_EXTEND in performBuildShuffleExtendCombine (#178408)

Currently performBuildShuffleExtendCombine only supports ANY_EXTEND
operands for BUILD_VECTOR inputs, and will bail if it encounters a
VECTOR_SHUFFLE with ANY_EXTEND operands. Update the logic so that we
support shuffles with ANY_EXTEND operands, which brings the code in line
with the comment.
Delta File
+119 -0 llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+4 -3 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+123 -3 2 files

LLVM/project 3377756 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 is_fpclass-fp80.ll

[X86] checkSignTestSetCCCombine - handle SIGN_EXTEND_INREG/SHL patterns inside CMP(X,0) (#178710)

Handle SIGN_EXTEND_INREG and SHL patterns inside CMP(X,0) cases in checkSignTestSetCCCombine.

Fixes #178246
Delta File
+41 -45 llvm/test/CodeGen/X86/is_fpclass-fp80.ll
+8 -4 llvm/lib/Target/X86/X86ISelLowering.cpp
+49 -49 2 files

LLVM/project 5a221c3 mlir/lib/Dialect/MemRef/Transforms FoldMemRefAliasOps.cpp, mlir/test/Dialect/MemRef fold-memref-alias-ops.mlir

[mlir][memref]: Fold ExpandShape into TransferRead (#176786)

Add support for folding `memref.expand_shape` ops into
`vector.transfer_read` ops when the permutation map is a
non-minor-identity.

This transformation is safe when the permutation map indexes into
expanded dimensions that are contiguous within the original source
shape.

Signed-off-by: Jack Frankland <jack.frankland at arm.com>
Delta File
+27 -10 mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp
+36 -0 mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir
+63 -10 2 files

LLVM/project a372152 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp DAGCombiner.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining.ll

[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w))) (#179124)

TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.

In most cases we currently just bail if a binop has multiple results,
but shuffle combining was missing the check, and it's pretty trivial to
add handling in this case.

I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.

Fixes #179112
Delta File
+54 -2 llvm/test/CodeGen/X86/vector-shuffle-combining.ll
+13 -0 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4 -2 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+71 -4 3 files

LLVM/project f9423ed mlir/lib/Interfaces DataLayoutInterfaces.cpp, mlir/test/Interfaces/DataLayoutInterfaces query.mlir

[mlir] Fix alignment for predicate (i1) vectors (#175975)

Legal scalable predicate vectors (legal in the LLVM sense), e.g.
`vector<[16]xi1>` (or `<vscale x 16 x i1>`, using LLVM syntax) ought to
have alignment **2** rather than **16**, see e.g. [1].

MLIR currently computes the vector “size in bits” as:

```cpp
vecType.getNumElements()
  * dataLayout.getTypeSize(vecType.getElementType()) * 8
```

but `getTypeSize()` returns a size in *bytes* (rounded up from bits), so
for `i1` it returns 1. Multiplying by 8 converts that storage byte back to 8
bits per element, which overestimates predicate vector sizes.
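
To make the overestimate concrete, here is a stand-alone sketch of the arithmetic. It does not use MLIR: `typeSizeInBytes`, `sizeInBitsViaBytes` and `packedSizeInBits` are hypothetical helpers, and the element count 16 is taken from the `vector<[16]xi1>` example. The packed count is shown only for comparison and is not claimed to be the replacement used by the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a byte-granular size query: rounds a bit width
// up to whole bytes, as the message describes for getTypeSize().
constexpr uint64_t typeSizeInBytes(uint64_t bitWidth) {
  return (bitWidth + 7) / 8;
}

// The computation quoted above: elements * size-in-bytes * 8.
constexpr uint64_t sizeInBitsViaBytes(uint64_t numElements, uint64_t elemBits) {
  return numElements * typeSizeInBytes(elemBits) * 8;
}

// For comparison: elements * element width in bits, with no byte rounding.
constexpr uint64_t packedSizeInBits(uint64_t numElements, uint64_t elemBits) {
  return numElements * elemBits;
}

// 16 x i1 via bytes: 16 * 1 * 8 = 128 bits, i.e. 16 bytes.
static_assert(sizeInBitsViaBytes(16, 1) == 128, "byte-rounded size");
// 16 x i1 packed: 16 bits, i.e. 2 bytes.
static_assert(packedSizeInBits(16, 1) == 16, "packed size");
// For byte-sized elements the two computations agree.
static_assert(sizeInBitsViaBytes(4, 32) == packedSizeInBits(4, 32), "i32 case");
```

The two results, 16 bytes versus 2 bytes, line up with the alignment values 16 and 2 mentioned at the start of the message.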

Instead, use:


    [18 lines not shown]
Delta File
+2 -4 mlir/lib/Interfaces/DataLayoutInterfaces.cpp
+6 -0 mlir/test/Interfaces/DataLayoutInterfaces/query.mlir
+8 -4 2 files

LLVM/project 0321f3e llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp, llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)

We can, when attempting to lower to tbz, skip a zext that is then not
accounted for elsewhere. The attached test ends up with a tbz from an
extract that then does not properly zext the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.
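
The condition can be pictured with plain integers. This is a sketch under assumptions: `testBit`, `zext8` and `canLookThroughZExt` are hypothetical helpers, not code from the selector, and "low part" is taken to mean a bit index below the width of the un-extended source.

```cpp
#include <cstdint>

// Returns whether bit `bit` (0..63) of `v` is set.
constexpr bool testBit(uint64_t v, unsigned bit) { return (v >> bit) & 1; }

// Models zero-extending an 8-bit value held in a wider register: everything
// above bit 7 becomes zero.
constexpr uint64_t zext8(uint64_t reg) { return reg & 0xFF; }

// Assumed rule: testing a bit of zext(x) may be done on x directly only if
// the bit lies inside x's original width.
constexpr bool canLookThroughZExt(unsigned bit, unsigned srcBits) {
  return bit < srcBits;
}

// A wider register whose low 8 bits hold the value and whose bit 8 is stale.
constexpr uint64_t reg = 0x1FF;

// Bit 3 is inside the source width: skipping the zext gives the same answer.
static_assert(canLookThroughZExt(3, 8), "low bit");
static_assert(testBit(zext8(reg), 3) == testBit(reg, 3), "same result");

// Bit 8 is outside the source width: skipping the zext changes the answer.
static_assert(!canLookThroughZExt(8, 8), "high bit");
static_assert(testBit(zext8(reg), 8) != testBit(reg, 8), "different result");
```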

Fixes #173895
Delta File
+7 -7 llvm/test/CodeGen/AArch64/GlobalISel/widen-narrow-tbz-tbnz.mir
+5 -4 llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+4 -3 llvm/test/CodeGen/AArch64/GlobalISel/opt-fold-xor-tbz-tbnz.mir
+5 -1 llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+21 -15 4 files

LLVM/project f3cc908 clang/lib/CodeGen TargetInfo.h, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

Address comments
Delta File
+64 -22 clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-cooperative-atomics-templated.hip
+33 -51 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+28 -36 clang/lib/CodeGen/Targets/SPIR.cpp
+34 -9 clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-load-monitor-templated.hip
+6 -10 clang/lib/CodeGen/Targets/AMDGPU.cpp
+10 -5 clang/lib/CodeGen/TargetInfo.h
+175 -133 1 files not shown
+184 -134 7 files

LLVM/project c0be2cd llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR per-output-uniformity.mir

Add divergent input test for amdgcn_else intrinsic
Delta File
+19 -0 llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+19 -0 1 files

LLVM/project 38b58a2 llvm/lib/Target/SystemZ SystemZScheduleZ13.td SystemZInstrInfo.cpp, llvm/test/CodeGen/SystemZ copy-physreg-vr16.ll

[SystemZ] Bugfix: Add VLR16 to SystemZInstrInfo::copyPhysReg(). (#178932)

Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.

This is needed with -O0/regalloc=fast, and probably in more cases as
well.

Fixes #178788.

(cherry picked from commit 09f9a2892a412a73d42942e78eed9cde61c7a9e7)
Delta File
+35 -0 llvm/test/CodeGen/SystemZ/copy-physreg-vr16.ll
+1 -1 llvm/lib/Target/SystemZ/SystemZScheduleZ13.td
+2 -0 llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+1 -1 llvm/lib/Target/SystemZ/SystemZScheduleZ14.td
+1 -1 llvm/lib/Target/SystemZ/SystemZScheduleZ15.td
+1 -1 llvm/lib/Target/SystemZ/SystemZScheduleZ16.td
+41 -4 3 files not shown
+44 -5 9 files

LLVM/project 467b3bb llvm/lib/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.cpp, llvm/test/ExecutionEngine/JITLink/x86-64 ELF_no_debug_info.s

[ELFDebugObjectPlugin] Do not wait for std::future in post-fixup phase in the absence of debug info (#178541)

If there is no debug information, we wouldn't call
`DebugObject::collectTargetAlloc` in the post-allocation phase.
Therefore, when it's in the post-fixup phase,
`DebugObject::awaitTargetMem` will fail with _"std::future_error: No
associated state"_ because the std::future was not even populated.
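
The failure mode is a general property of `std::future` and can be reproduced without any of the plugin code. The sketch below is illustrative only: `awaitIfPopulated` and `demo` are hypothetical helpers, and guarding with `valid()` is one possible way to avoid the problem, not necessarily what the patch does.

```cpp
#include <cassert>
#include <future>
#include <optional>

// Hypothetical helper: waits on the future only if it was ever populated.
// A default-constructed std::future has no shared state, so valid() is false.
// Calling get() in that state is undefined behaviour per the standard;
// implementations are encouraged to throw std::future_error with
// std::future_errc::no_state, which matches the message quoted above.
inline std::optional<int> awaitIfPopulated(std::future<int> &f) {
  if (!f.valid())
    return std::nullopt; // nothing was ever scheduled, so nothing to wait for
  return f.get();
}

// Usage: a future that was never assigned is skipped instead of awaited.
inline void demo() {
  std::future<int> neverPopulated;
  assert(!neverPopulated.valid());
  assert(!awaitIfPopulated(neverPopulated).has_value());
}
```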

(cherry picked from commit 696ea11b94d119416c9618b5add09d5ac09428aa)
Delta File
+20 -0 llvm/test/ExecutionEngine/JITLink/x86-64/ELF_no_debug_info.s
+14 -1 llvm/lib/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.cpp
+34 -1 2 files

LLVM/project 5bebd32 llvm/docs DTLTO.rst

[DOC][DTLTO] Update DTLTO documentation for the LLVM 22 release (#177368)

This change updates the documentation to reflect work completed during
the LLVM 22 timeframe, including support for the ThinLTO cache and
static libraries/archives.

It also clarifies that the goal of DTLTO is to support distribution of
ThinLTO backend compilations for any in-process ThinLTO invocation.

SIE Internal Tracker: TOOLCHAIN-21016

(cherry picked from commit 88478ab495f27f2cb798d4bf6912fe7cf4872997)
Delta File
+15 -11 llvm/docs/DTLTO.rst
+15 -11 1 files

LLVM/project 279f407 cross-project-tests/dtlto fat-lto-objects.test, lld/ELF Driver.cpp

[DTLTO] support distributing bitcode from FatLTO objects (#176928)

We already have code to extract bitcode files from archives so they can
be distributed. Extend this code to extract bitcode from FatLTO objects
too, which otherwise cannot be used with DTLTO.

(cherry picked from commit e45ea95dbe236e233ad978067688789e7478541a)
Delta File
+55 -0 cross-project-tests/dtlto/fat-lto-objects.test
+16 -14 llvm/lib/DTLTO/DTLTO.cpp
+16 -5 llvm/include/llvm/LTO/LTO.h
+4 -2 lld/ELF/Driver.cpp
+2 -2 lld/test/ELF/dtlto/timetrace.test
+93 -23 5 files

LLVM/project ba53f94 llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG jump-threading.ll

[SimplifyCFG] Fix null pointer dereference in foldCondBranchOnValueKnownInPredecessorImpl (#178835)

(cherry picked from commit 956770a9cb27d56cd04432be90f1241d3e932019)
Delta File
+41 -0 llvm/test/Transforms/SimplifyCFG/jump-threading.ll
+2 -0 llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+43 -0 2 files

LLVM/project 82de343 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 get-active-lane-mask-extract.ll

release/22.x: [AArch64][SME2] Allow lowering to whilelo.x2 in non-streaming mode (#178399)

Backport: https://github.com/llvm/llvm-project/commit/162267ee90019c6b8241dcf470a2d3fae2b306a7
Delta File
+26 -25 llvm/test/CodeGen/AArch64/get-active-lane-mask-extract.ll
+6 -5 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+32 -30 2 files

LLVM/project e4cd2e2 llvm/lib/Support/Windows Process.inc Threading.inc

[Support] Move loadSystemModuleSecure into Process.inc. NFC. (#177598)

Move Windows-specific function
`llvm::sys::windows::loadSystemModuleSecure` from
`lib/Support/Windows/Threading.inc` into
`lib/Support/Windows/Process.inc`.

This is to fix link problems on Windows, see
https://github.com/llvm/llvm-project/pull/169224#issuecomment-3790350128

(cherry picked from commit 70ee6e4427c8f55a910193bbda2eadf75e8a75f2)
Delta File
+29 -0 llvm/lib/Support/Windows/Process.inc
+0 -29 llvm/lib/Support/Windows/Threading.inc
+29 -29 2 files

LLVM/project f84d0ae llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop

Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
Delta File
+12 -0 llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+4 -0 llvm/lib/Analysis/ValueTracking.cpp
+16 -0 2 files

LLVM/project b9a10f4 llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop

We were folding undef inputs to qnan which is incorrect. The instruction
never returns nan. Out of bounds segment select will return 0, so fold
undef segment to 0.
Delta File
+29 -28 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+18 -18 llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+47 -46 2 files

LLVM/project 80662c1 llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)

Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.

The device libs f32 and s64 sin implementations have a range check,
and inside the large path this pattern appears. After a small patch
to invert this check to send nans down the small path, this will
enable the fold unconditionally on the large path.
Delta File
+79 -0 llvm/test/CodeGen/AMDGPU/fract-match.ll
+27 -22 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+106 -22 2 files

LLVM/project 8d0830e llvm/include/llvm/Analysis LoopCacheAnalysis.h, llvm/lib/Analysis LoopCacheAnalysis.cpp

[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)

LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper of Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
Delta File
+4 -33 llvm/lib/Analysis/LoopCacheAnalysis.cpp
+0 -5 llvm/include/llvm/Analysis/LoopCacheAnalysis.h
+4 -38 2 files

LLVM/project a667526 llvm/lib/CodeGen MachineFunctionPass.cpp, llvm/test/CodeGen/AArch64 O3-pipeline.ll

[MachineFunctionPass] Preserve more IR analyses (#178871)

Preserve PDT, BPI, LazyBPI and LazyBFI. These are all IR analyses that
are not invalidated by machine passes.

This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.
Delta File
+8 -0 llvm/lib/CodeGen/MachineFunctionPass.cpp
+0 -4 llvm/test/CodeGen/AArch64/O3-pipeline.ll
+8 -4 2 files

LLVM/project 6d83b16 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/include/llvm/CodeGen TargetInstrInfo.h

Implement per-output machine uniformity analysis
Delta File
+76 -14 llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+27 -11 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+16 -5 llvm/include/llvm/ADT/GenericUniformityImpl.h
+8 -9 llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+4 -4 llvm/lib/Target/AMDGPU/SIInstrInfo.h
+4 -3 llvm/include/llvm/CodeGen/TargetInstrInfo.h
+135 -46 2 files not shown
+140 -49 8 files