LLVM/project d673532llvm/test/Analysis/UniformityAnalysis/AMDGPU workitem-intrinsics.ll, llvm/test/CodeGen/AMDGPU global_atomics_iterative_scan.ll global_atomics_iterative_scan_fp.ll

AMDGPU: Remove unnecessary target-cpu attributes from tests
DeltaFile
+7-9llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan.ll
+7-8llvm/test/CodeGen/AMDGPU/global_atomics_iterative_scan_fp.ll
+5-5llvm/test/CodeGen/AMDGPU/sroa-phi-nodes.ll
+2-3llvm/test/CodeGen/AMDGPU/inlineasm-sgmask.ll
+1-1llvm/test/Analysis/UniformityAnalysis/AMDGPU/workitem-intrinsics.ll
+1-1llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
+23-276 files

LLVM/project 33ac0aallvm/unittests/IR VerifierTest.cpp

Add verifier test
DeltaFile
+71-0llvm/unittests/IR/VerifierTest.cpp
+71-01 files

LLVM/project 770a169llvm/test/CodeGen/AMDGPU target-cpu.ll

AMDGPU: Remove leftover test for old promote-alloca subtarget feature

This feature was removed in a56993a694ed02775285b9fe0e23fce8346491c9.
The test used to have a pair testing the enabled and disabled case,
and there's no point leaving the enabled partner.
DeltaFile
+0-12llvm/test/CodeGen/AMDGPU/target-cpu.ll
+0-121 files

LLVM/project 359b475mlir/include/mlir/Dialect/MemRef/IR MemRefOps.td, mlir/lib/Dialect/MemRef/Transforms ExpandOps.cpp

[mlir][memref] Remove unsafe `getType()` from ReshapeOp (#205105)

Remove the unsafe `getType` method from ReshapeOp. It unconditionally
casts the result to `MemRefType`, but `memref.reshape` may return an
`UnrankedMemRefType`, leading to an assertion failure. The redundant
build method is also removed alongside this change. Fixes #203812.
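
A minimal standalone sketch of the hazard, assuming a result type that is either ranked or unranked. It does not use MLIR; `RankedMemRef`, `UnrankedMemRef`, `ResultType`, and `rankOf` are hypothetical names chosen for illustration. A checked accessor reports the unranked case instead of asserting:

```cpp
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Hypothetical stand-ins for a ranked and an unranked memref type.
struct RankedMemRef {
  std::vector<long> Shape;
};
struct UnrankedMemRef {};
using ResultType = std::variant<RankedMemRef, UnrankedMemRef>;

// Checked accessor: yields the rank only when the type is ranked,
// rather than unconditionally casting to the ranked alternative.
std::optional<std::size_t> rankOf(const ResultType &T) {
  if (const auto *R = std::get_if<RankedMemRef>(&T))
    return R->Shape.size();
  return std::nullopt;
}
```

Callers then have to handle the empty result explicitly, which is roughly the obligation that removing an unconditional accessor places on them.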
DeltaFile
+13-0mlir/test/Transforms/test-bubble-down-memory-space-casts.mlir
+6-5mlir/lib/Dialect/MemRef/Transforms/ExpandOps.cpp
+0-8mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
+19-133 files

LLVM/project f0bbae6clang-tools-extra/clangd XRefs.cpp AST.h, clang-tools-extra/clangd/index SymbolCollector.cpp SymbolCollector.h

[clangd] Navigate go-to-definition through forwarding wrappers to the constructor (#199480)

When the user invokes **Go to Definition** on a call like
`std::make_unique<T>(args...)` or `std::make_shared<T>(args...)`,
surface the constructor of `T` that is actually invoked inside the
wrapper, alongside the wrapper itself. The constructor is added before
the wrapper so LSP clients that auto-jump to the first target land on
it; clients that present a menu still let the user reach the wrapper.

This is the forward-direction counterpart to the find-references work in
#169742 (clangd/clangd#716): the same `isLikelyForwardingFunction` +
`searchConstructorsInForwardingFunction` machinery, applied to
`locateASTReferent`.
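
For illustration, a small sketch of the kind of call site this affects. `Widget` and `kindViaMakeUnique` are hypothetical names, not taken from the patch; the point is only that the wrapper forwards to one specific constructor:

```cpp
#include <memory>
#include <string>

// Hypothetical type with two constructors; Kind records which one ran.
struct Widget {
  int Kind;
  explicit Widget(int Id) : Kind(1) { (void)Id; }
  Widget(int Id, std::string Name) : Kind(2) {
    (void)Id;
    (void)Name;
  }
};

// std::make_unique forwards its arguments to Widget(int, std::string).
// That constructor is the target a user most likely wants to reach when
// navigating from this call.
int kindViaMakeUnique() {
  auto W = std::make_unique<Widget>(7, "gear");
  return W->Kind;
}
```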
DeltaFile
+73-0clang-tools-extra/clangd/unittests/XRefsTests.cpp
+46-20clang-tools-extra/clangd/XRefs.cpp
+21-0clang-tools-extra/clangd/AST.h
+4-15clang-tools-extra/clangd/index/SymbolCollector.cpp
+16-0clang-tools-extra/clangd/AST.cpp
+3-3clang-tools-extra/clangd/index/SymbolCollector.h
+163-386 files

LLVM/project 10755f4llvm/test/Transforms/LoopVectorize/RISCV interleaved-masked-access.ll, llvm/test/Transforms/LoopVectorize/SystemZ addressing.ll

[LV][NFC] Remove instcombine pass from RUN lines in target tests (#205848)

There is still one test remaining:

  LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll

but this looks more like a phase-ordering test and should probably be
handled separately.
DeltaFile
+328-307llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
+122-119llvm/test/Transforms/LoopVectorize/RISCV/interleaved-masked-access.ll
+75-75llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll
+47-47llvm/test/Transforms/LoopVectorize/X86/interleaving.ll
+38-38llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll
+7-6llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
+617-5921 files not shown
+623-5977 files

LLVM/project b2e7255llvm/lib/Object DXContainer.cpp, llvm/lib/ObjectYAML DXContainerYAML.cpp

[DirectX][ObjectYAML] Add PRIV part support (#204899)

Add support for DXContainer PRIV in the ObjectYAML pipeline so it can be
represented in structured YAML and round-tripped through
yaml2obj/obj2yaml.

The PRIV part can store arbitrary user-provided binary blobs in a
DXContainer. Unlike other DXContainer parts, the PRIV part does not need
a 4-byte aligned size. Therefore, if it is present, it is always the last
section in a DXContainer.

llvm-objcopy is already able to extract the PRIV section. A test that
verifies extraction of the binary from PRIV is added.
DeltaFile
+42-0llvm/test/tools/obj2yaml/DXContainer/PRIVPart.yaml
+35-0llvm/test/tools/llvm-objcopy/DXContainer/dump-section-priv.yaml
+30-0llvm/unittests/Object/DXContainerTest.cpp
+29-0llvm/unittests/ObjectYAML/DXContainerYAMLTest.cpp
+11-0llvm/lib/Object/DXContainer.cpp
+9-0llvm/lib/ObjectYAML/DXContainerYAML.cpp
+156-04 files not shown
+169-010 files

LLVM/project 9e9de1fllvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel legalize-saddsat.mir legalize-ssubsat.mir

GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen merge (#205816)

In widenScalarMergeValues, WideTy is input given by target. Use same LLT
kind for other types of different sizes instead of LLT::scalar.
Makes a difference with extendedLLTs.
DeltaFile
+2-2llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+1-2llvm/test/CodeGen/AArch64/GlobalISel/legalize-saddsat.mir
+1-2llvm/test/CodeGen/AArch64/GlobalISel/legalize-ssubsat.mir
+4-63 files

LLVM/project e1cbf0fllvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel legalize-and.mir

GlobalISel/LegalizerHelper: Use type of input load dst for LowerLoad (#205815)

Deduce the dst type for the new instructions that perform the load
lowering from the destination type of the original load instead of from
the MMO.
Makes a difference with extendedLLTs.
DeltaFile
+24-27llvm/test/CodeGen/AArch64/GlobalISel/legalize-and.mir
+2-2llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+26-292 files

LLVM/project a0248a2llvm/lib/Target/AArch64 AArch64SVEInstrInfo.td AArch64TargetTransformInfo.cpp, llvm/test/CodeGen/AArch64 sve-mul-imm-add-adr.ll

[AArch64][SVE] add missing MLA commute instcombine (#205526)

Remove the MLA commuted patterns added in #198566 and canonicalise
those operations in instcombine instead.
DeltaFile
+0-29llvm/test/CodeGen/AArch64/sve-mul-imm-add-adr.ll
+24-0llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-fma-binops.ll
+0-11llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+6-0llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+30-404 files

LLVM/project a91a221llvm/docs LangRef.rst, llvm/include/llvm/IR Instructions.h

Update for comments.
DeltaFile
+14-11llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+6-6llvm/docs/LangRef.rst
+3-9llvm/lib/IR/Instructions.cpp
+2-5llvm/include/llvm/IR/Instructions.h
+6-0llvm/test/Bitcode/compatibility.ll
+5-1llvm/lib/AsmParser/LLParser.cpp
+36-321 files not shown
+37-337 files

LLVM/project bda6db4mlir/include/mlir/Dialect/XeGPU/uArch IntelGpuXe2.h uArchBase.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp XeGPULayoutImpl.cpp

[MLIR][XeGPU] Enable `isa<>` check for uarch (#204577)
DeltaFile
+29-647mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
+427-141mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
+68-0mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe3.h
+28-23mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+19-27mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+38-0mlir/include/mlir/Dialect/XeGPU/uArch/uArchCommon.h
+609-8387 files not shown
+639-86113 files

LLVM/project 6ab3433llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp

[NFC][AMDGPU][SIMemoryLegalizer] Use BitMaskUtils Helpers

We already used BitMaskUtils but did not use any of the helpers.
Fix it so the pass is a bit less verbose.

One unfortunate problem with BitMaskUtils is the lack of a bool operator,
so we need to use `any` instead. This is because C++ doesn't allow
conversion operators as free functions.
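
A minimal sketch of the language constraint, assuming a plain scoped enum used as a flag set. `MemFlag` and `touchesMemory` are hypothetical names and this is not the pass's actual code. A conversion function such as `operator bool` must be a member, and an enum has no members, so a named free function takes its place:

```cpp
#include <cstdint>

// Hypothetical flag enum; the enumerators are illustrative only.
enum class MemFlag : std::uint8_t { None = 0, Load = 1, Store = 2, Atomic = 4 };

constexpr MemFlag operator|(MemFlag A, MemFlag B) {
  return static_cast<MemFlag>(static_cast<std::uint8_t>(A) |
                              static_cast<std::uint8_t>(B));
}

constexpr MemFlag operator&(MemFlag A, MemFlag B) {
  return static_cast<MemFlag>(static_cast<std::uint8_t>(A) &
                              static_cast<std::uint8_t>(B));
}

// Free-function replacement for the missing `operator bool`.
constexpr bool any(MemFlag F) { return static_cast<std::uint8_t>(F) != 0; }

// True when F has the Load or the Store bit set.
constexpr bool touchesMemory(MemFlag F) {
  return any(F & (MemFlag::Load | MemFlag::Store));
}
```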
DeltaFile
+45-47llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+45-471 files

LLVM/project e354cd0llvm/include/llvm/Analysis VectorUtils.h, llvm/lib/Analysis VectorUtils.cpp

[LV] Only collect strides without predicates under OptForSize during interleaved access analysis (#205793)

During interleaved access analysis, certain addresses require a no-wrap
predicate to form an add recurrence and obtain the stride. However, when
optimizing for size, generating SCEV runtime checks is disallowed.

This patch modifies the constant stride collection when optimizing for
size to only collect strides that do not require predicates. This
ensures that vectorization will not be blocked by disallowed predicates.
DeltaFile
+31-18llvm/test/Transforms/LoopVectorize/AArch64/discarded-interleave-group.ll
+6-4llvm/include/llvm/Analysis/VectorUtils.h
+4-3llvm/lib/Analysis/VectorUtils.cpp
+1-1llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+42-264 files

LLVM/project 87a2c4cflang/include/flang/Optimizer/Analysis AliasAnalysis.h, flang/lib/Optimizer/Analysis AliasAnalysis.cpp

[fir][aa] Add opt-in cache for use by fir AliasAnalysis clients
DeltaFile
+222-0flang/lib/Optimizer/Analysis/AliasAnalysis.cpp
+77-0flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
+13-1flang/lib/Optimizer/Transforms/LoopInvariantCodeMotion.cpp
+312-13 files

LLVM/project 6d48d45llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUHWEvents.cpp

[AMDGPU][HWEvents] Refactor VMEM_ACCESS as VMEM_READ_ACCESS (#204545)

Instead of having an HWEvent that can be either a read or a write
depending on the target, keep the events as straightforward as
possible and let InsertWaitCnt interpret it. Rename VMEM_ACCESS
to VMEM_READ_ACCESS and set VMEM_WRITE_ACCESS & similar events
even if the target does not have a VSCnt.

I think this conceptually makes more sense.
This separates concerns better so that HWEvents models events
objectively, and InsertWaitCnt handles them as necessary for the task
it is trying to achieve (insert wait instructions).

My end goal with this series of changes is to de-tangle InsertWaitCnt so
we can divide it into layers, and each layer worries about its own thing.
This is only possible with proper separation of concerns.
DeltaFile
+23-13llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5-4llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+1-3llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+29-203 files

LLVM/project 938ee65llvm/lib/Target/AMDGPU AMDGPUHWEvents.cpp SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Move TENSOR/ASYNC event detection to separate header (#204544)

I forgot to move those out of the way as they were not grouped with the
others.
Now `getEventsFor` does all the work.
DeltaFile
+7-0llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+0-5llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7-52 files

LLVM/project 78fab20llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Adjust comment
DeltaFile
+5-2llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5-21 files

LLVM/project 134210dllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUHWEvents.cpp

[AMDGPU][HWEvents] Refactor VMEM_ACCESS as VMEM_READ_ACCESS

Instead of having an HWEvent that can be either a read or a write
depending on the target, keep the events as straightforward as
possible and let InsertWaitCnt interpret it. Rename VMEM_ACCESS
to VMEM_READ_ACCESS and set VMEM_STORE_ACCESS & similar events
even if the target does not have a VSCnt.

I think this conceptually makes more sense.
This separates concerns better so that HWEvents models events
objectively, and InsertWaitCnt handles them as necessary for the task
it is trying to achieve (insert wait instructions).
DeltaFile
+18-11llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5-4llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+1-3llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+24-183 files

LLVM/project 97da529llvm/lib/Target/AMDGPU AMDGPUHWEvents.cpp SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Move TENSOR/ASYNC event detection to separate header

I forgot to move those out of the way as they were not grouped with the others.
Now `getEventsFor` does all the work.
DeltaFile
+7-0llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+0-5llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7-52 files

LLVM/project 4bf16dcllvm/lib/Target/AMDGPU AMDGPUHWEvents.h SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask (#203864)

Follow up from comments on
https://github.com/llvm/llvm-project/pull/202886

Make HWEvent a bitmask by default instead of having both the enum, and a
separate HWEventSet. This has the advantage of streamlining the code a
bit and opening the possibility of adding "modifiers" to events, e.g. I
imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked
into the design.

I opted for a bit more verbosity by taking inspiration from
FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a
class w/ helper function. The downside is having to reimplement all the
little bitwise ops, but the result is a cleaner, simpler interface than
a raw enum (class) w/ many helper functions. I initially tried that but
I recoiled at the sight of things like `contains(A, B)` which isn't very
clear, while `A.contains(B)` is self explanatory.

    [3 lines not shown]
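
A rough sketch of the wrapper-class design described above, written as a standalone example. `EventMask` and its three event bits are hypothetical and do not reflect the real HWEvent list; the sketch only shows why a member `contains` reads more clearly than a free `contains(A, B)`:

```cpp
#include <cstdint>

// Hypothetical bitmask wrapper in the spirit of FastMathFlags.
class EventMask {
  std::uint32_t Bits = 0;
  constexpr explicit EventMask(std::uint32_t B) : Bits(B) {}

public:
  constexpr EventMask() = default;

  // Illustrative event bits; the names are assumptions.
  static constexpr EventMask vmemRead() { return EventMask(1u << 0); }
  static constexpr EventMask vmemWrite() { return EventMask(1u << 1); }
  static constexpr EventMask smem() { return EventMask(1u << 2); }

  constexpr EventMask operator|(EventMask O) const {
    return EventMask(Bits | O.Bits);
  }
  constexpr EventMask operator&(EventMask O) const {
    return EventMask(Bits & O.Bits);
  }
  constexpr bool operator==(EventMask O) const { return Bits == O.Bits; }

  // True when at least one bit is set.
  constexpr bool any() const { return Bits != 0; }
  // True when every bit of O is also set in *this.
  constexpr bool contains(EventMask O) const {
    return (Bits & O.Bits) == O.Bits;
  }
};
```

With this shape, `A.contains(B)` makes the direction of the test explicit at the call site, at the cost of spelling out each bitwise operator by hand.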
DeltaFile
+137-89llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+99-105llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+61-59llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+28-34llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+325-2874 files

LLVM/project a42540bllvm/test/CodeGen/ARM bf16-instructions.ll

[ARM] Add basic bf16 instructions tests. NFC (#206003)

Many of these are disabled as they do not yet lower successfully.
DeltaFile
+714-0llvm/test/CodeGen/ARM/bf16-instructions.ll
+714-01 files

LLVM/project 254df87llvm/lib/Transforms/InstCombine InstructionCombining.cpp, llvm/test/Transforms/InstCombine unshuffle-constant-poison-mask.ll

[InstCombine] Handle shuffle masks selecting poison in unshuffleConstant (#205870)

A shuffle mask can select from the second operand even when that operand
is poison. This caused unshuffleConstant to assert while trying to map
those mask elements into the first operand's constant vector.

Fix this by ignoring mask elements that select the poison operand.

Fixes https://github.com/llvm/llvm-project/issues/205769
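
A standalone model of the fix, not the actual InstCombine code. `unshuffle` and its parameters are hypothetical. Mask values in `[0, NumSrcElts)` select the first operand, values at or above `NumSrcElts` select the second (poison) operand, and `-1` marks a poison mask element. Only the first kind can be mapped back into the first operand's constant:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Maps each element of an already-shuffled constant back to the lane of
// the first operand it came from. Lanes never selected stay empty.
std::vector<std::optional<int>>
unshuffle(const std::vector<int> &Mask, const std::vector<int> &ShuffledConst,
          std::size_t NumSrcElts) {
  std::vector<std::optional<int>> Src(NumSrcElts);
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    int M = Mask[I];
    // Skip poison mask elements and elements selecting the poison operand;
    // indexing Src with them would be out of range.
    if (M < 0 || static_cast<std::size_t>(M) >= NumSrcElts)
      continue;
    Src[static_cast<std::size_t>(M)] = ShuffledConst[I];
  }
  return Src;
}
```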
DeltaFile
+14-0llvm/test/Transforms/InstCombine/unshuffle-constant-poison-mask.ll
+9-4llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+23-42 files

LLVM/project 1c6dc31llvm/lib/Target/Lanai LanaiCodeGenPassBuilder.cpp

formatting

Created using spr 1.3.7
DeltaFile
+7-3llvm/lib/Target/Lanai/LanaiCodeGenPassBuilder.cpp
+7-31 files

LLVM/project 9d6e0ddclang/lib/AST/ByteCode InterpHelpers.h, clang/test/AST/ByteCode new-delete.cpp

[clang][bytecode] Fix division by zero in CXXNewExpr handling (#205800)
DeltaFile
+11-0clang/test/AST/ByteCode/new-delete.cpp
+4-0clang/lib/AST/ByteCode/InterpHelpers.h
+15-02 files

LLVM/project 3ca4981llvm/lib/Target/Lanai LanaiISelDAGToDAG.h LanaiISelDAGToDAG.cpp

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
DeltaFile
+25-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.h
+4-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.cpp
+29-02 files

LLVM/project d3df383llvm/lib/Target/Lanai LanaiCodeGenPassBuilder.cpp LanaiPassRegistry.def

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+65-0llvm/lib/Target/Lanai/LanaiCodeGenPassBuilder.cpp
+27-0llvm/lib/Target/Lanai/LanaiPassRegistry.def
+25-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.h
+8-0llvm/lib/Target/Lanai/LanaiTargetMachine.h
+4-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.cpp
+1-0llvm/lib/Target/Lanai/CMakeLists.txt
+130-06 files

LLVM/project 53783ebllvm/lib/Target/Lanai LanaiISelDAGToDAG.h LanaiISelDAGToDAG.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+25-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.h
+4-0llvm/lib/Target/Lanai/LanaiISelDAGToDAG.cpp
+29-02 files

LLVM/project 28f6605clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

Reapply "[Clang] Optionally use NewPM to run CodeGen Pipeline" (#205943)

This reverts commit 0c4cc9f8adc5acda1aa49b8a8704433e237848ee.

This patch also fixes the dependency issue by making the clang CodeGen
library depend on the LLVM CodeGen library which is needed by the NewPM
for CodeGen.

Reviewers: oontvoo

Pull Request: https://github.com/llvm/llvm-project/pull/205986
DeltaFile
+77-17clang/lib/CodeGen/BackendUtil.cpp
+9-0clang/test/CodeGen/X86/newpm.c
+8-0clang/include/clang/Options/Options.td
+1-0clang/include/clang/Basic/CodeGenOptions.def
+1-0clang/lib/CodeGen/CMakeLists.txt
+96-175 files

LLVM/project a736e61compiler-rt/lib/builtins/arm addsf3.S, compiler-rt/lib/builtins/arm/thumb1 addsf3fast.S addsf3.S

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+670-230compiler-rt/lib/builtins/arm/addsf3.S
+890-0compiler-rt/lib/builtins/arm/thumb1/addsf3fast.S
+385-0compiler-rt/test/builtins/Unit/addsf3new_test.c
+383-0compiler-rt/test/builtins/Unit/subsf3_test.c
+285-0compiler-rt/lib/builtins/arm/thumb1/addsf3.S
+142-89llvm/test/CodeGen/X86/apx/push2-pop2.ll
+2,755-319123 files not shown
+5,466-1,000129 files