LLVM/project ccc007a llvm/lib/Target/AMDGPU AMDGPUAttributor.cpp, llvm/test/Transforms/PhaseOrdering/AMDGPU infer-address-space.ll

fixes
Delta File
+3 -3 llvm/test/Transforms/PhaseOrdering/AMDGPU/infer-address-space.ll
+1 -2 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+4 -5 2 files

LLVM/project a9f0b67 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 faddv-fp16.ll

Address comments 2
Delta File
+51 -37 llvm/test/CodeGen/AArch64/faddv-fp16.ll
+15 -4 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+66 -41 2 files

LLVM/project 9ec7b02 llvm/lib/Transforms/Scalar JumpTableToSwitch.cpp, llvm/test/Transforms/JumpTableToSwitch stats.ll

[JTS] Add statistics (#183431)

This patch adds some statistics to the jump-table-to-switch pass. This
will make it easier to see in aggregate how changing the profitability
heuristics impacts how often the optimization fires.
Delta File
+39 -0 llvm/test/Transforms/JumpTableToSwitch/stats.ll
+8 -0 llvm/lib/Transforms/Scalar/JumpTableToSwitch.cpp
+47 -0 2 files

LLVM/project 5e66b8c llvm/lib/Analysis ValueTracking.cpp, llvm/test/Analysis/BasicAA range.ll

[ValueTracking] Extend computeConstantRange for add/sub, sext/zext/trunc

Recursively compute operand ranges for add/sub and propagate ranges
through sext/zext/trunc.
For add/sub, the computed range is intersected with any existing range
from setLimitsForBinOp, and NSW/NUW flags are used via addWithNoWrap/
subWithNoWrap to tighten bounds.

The motivation is to enable further folding of reduce.add expressions
in comparisons, where the result range can be bounded by the input
element ranges.
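
A minimal sketch of the range arithmetic described above, assuming a simplified model of inclusive, non-wrapping unsigned 8-bit ranges. `URange`, `addNoUnsignedWrap`, and `intersect` are hypothetical names for illustration only; they are not LLVM's `ConstantRange` API, which also handles wrapped and signed ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical inclusive unsigned range [Lo, Hi] over 8-bit values (Lo <= Hi).
struct URange {
  uint8_t Lo;
  uint8_t Hi;
};

// Range of A + B for an add that is known not to wrap (the NUW case).
// Returns std::nullopt when every possible sum would overflow 8 bits.
std::optional<URange> addNoUnsignedWrap(URange A, URange B) {
  assert(A.Lo <= A.Hi && B.Lo <= B.Hi && "ranges must be non-empty");
  unsigned Lo = unsigned(A.Lo) + unsigned(B.Lo);
  unsigned Hi = unsigned(A.Hi) + unsigned(B.Hi);
  if (Lo > 255u)
    return std::nullopt;
  // Sums above 255 would wrap, which the no-wrap flag rules out.
  Hi = std::min(Hi, 255u);
  return URange{static_cast<uint8_t>(Lo), static_cast<uint8_t>(Hi)};
}

// Intersect with a range already known from elsewhere; nullopt if disjoint.
std::optional<URange> intersect(URange A, URange B) {
  uint8_t Lo = std::max(A.Lo, B.Lo);
  uint8_t Hi = std::min(A.Hi, B.Hi);
  if (Lo > Hi)
    return std::nullopt;
  return URange{Lo, Hi};
}
```

In this model, adding [200, 250] and [10, 100] without unsigned wrap gives [210, 255] rather than the full 8-bit range, and intersecting that with a previously known [0, 240] narrows it to [210, 240].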
Delta File
+1,231 -69 llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+107 -0 llvm/unittests/Analysis/ValueTrackingTest.cpp
+44 -51 llvm/test/CodeGen/AMDGPU/srem64.ll
+58 -9 llvm/lib/Analysis/ValueTracking.cpp
+66 -0 llvm/test/Analysis/BasicAA/range.ll
+22 -28 llvm/test/CodeGen/AMDGPU/urem64.ll
+1,528 -157 11 files not shown
+1,610 -256 17 files

LLVM/project 2ed1940 clang/lib/Driver/ToolChains Clang.cpp

[Driver][SPIRV] Fix SPIR-V build for AMD.

The AMD path doesn't use spirv-link, and the driver was incorrectly adding flags for it, which broke the build.
Delta File
+5 -2 clang/lib/Driver/ToolChains/Clang.cpp
+5 -2 1 files

LLVM/project aa3d6b3 clang/lib/AST/ByteCode Compiler.h, clang/test/AST/ByteCode codegen-constexpr-unknown.cpp

[clang][bytecode] Attach block scope variables to the root scope (#183279)

... if we don't have a block scope available. This can happen in
`EvalEmitter` scenarios and can cause local variable blocks to be
prematurely converted to dead blocks. Attach `ScopeKind::Block` variables
to the root scope instead.
Delta File
+28 -3 clang/test/AST/ByteCode/codegen-constexpr-unknown.cpp
+7 -0 clang/lib/AST/ByteCode/Compiler.h
+35 -3 2 files

LLVM/project 58e3eae utils/bazel/llvm-project-overlay/mlir BUILD.bazel, utils/bazel/llvm-project-overlay/mlir/python BUILD.bazel

[bazel] Fix build for 67ac275 (#183510)

Delta File
+57 -57 utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+15 -15 utils/bazel/llvm-project-overlay/mlir/python/BUILD.bazel
+3 -3 utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel
+75 -75 3 files

LLVM/project ff3e4a6 llvm/lib/Transforms/InstCombine InstCombineCompares.cpp, llvm/test/Transforms/InstCombine icmp-binop.ll

[InstCombine] Fold shift of boolean zext to logic sequence (#180596)

Alive2 proofs:
- `eq` case: 
  https://alive2.llvm.org/ce/z/09hPk-
- `ne` case:
  https://alive2.llvm.org/ce/z/zrof4X

Resolves llvm/llvm-project#180492
Delta File
+143 -0 llvm/test/Transforms/InstCombine/icmp-binop.ll
+27 -32 llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+170 -32 2 files

LLVM/project d3f6902 mlir/include/mlir/Dialect/Arith/IR ArithOps.td, mlir/lib/Conversion/ArithToLLVM ArithToLLVM.cpp

[mlir][arith] Add `nneg` to index_castui. (#183383)

Follow up to #183165

`nneg` is added to `arith.index_castui`. 

> When the `nneg` flag is present, the operand is assumed to be
> non-negative. In this case, zero extension is equivalent to sign
> extension. When this assumption is violated, the result is poison.

* Updates op definition to add assembly format and `nneg` flag.
* Updates canonicalization patterns to take into account `nneg` in
`arith.index_castui`.
* Updates arith-to-llvm lowering to preserve `nneg` when lowering
`arith.index_castui` to `zext`
* Adds roundtrip, canonicalization, and lowering tests
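
The equivalence the `nneg` flag relies on can be illustrated with plain integer casts. This is only a sketch of the arithmetic, not MLIR or LLVM code; `zeroExtend`, `signExtend`, and `zeroExtendNonNeg` are hypothetical helper names, and the 32-to-64-bit widths are an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a 32-bit value to 64 bits: the upper bits become 0.
uint64_t zeroExtend(int32_t V) {
  return static_cast<uint64_t>(static_cast<uint32_t>(V));
}

// Sign-extend a 32-bit value to 64 bits: the upper bits copy the sign bit.
uint64_t signExtend(int32_t V) {
  return static_cast<uint64_t>(static_cast<int64_t>(V));
}

// Models the `nneg` contract: the caller promises V >= 0. In IR a violation
// yields poison; this sketch uses an assert instead.
uint64_t zeroExtendNonNeg(int32_t V) {
  assert(V >= 0 && "nneg precondition violated");
  return zeroExtend(V);
}
```

For any non-negative input the two extensions return the same value, so a consumer may treat one as the other. For a negative input such as -1 they differ (`0x00000000FFFFFFFF` versus `0xFFFFFFFFFFFFFFFF`), which is why the flag must only be set when the operand is known to be non-negative.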


    [4 lines not shown]
Delta File
+18 -5 mlir/lib/Conversion/ArithToLLVM/ArithToLLVM.cpp
+20 -1 mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
+19 -0 mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir
+13 -3 mlir/test/Dialect/Arith/canonicalize.mlir
+14 -1 mlir/test/Dialect/Arith/ops.mlir
+3 -3 mlir/lib/Dialect/Arith/IR/ArithCanonicalization.td
+87 -13 6 files

LLVM/project 327f060 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 known-pow2.ll

[DAG] Fix OrZero in isKnownToBeAPowerOfTwo ISD::AND (#182934)

Fixes #181653
Delta File
+50 -0 llvm/test/CodeGen/X86/known-pow2.ll
+1 -2 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+51 -2 2 files

LLVM/project ffc8780 mlir/include/mlir/Dialect/ArmSME/IR CMakeLists.txt, mlir/include/mlir/Dialect/ArmSVE/TransformOps CMakeLists.txt

[MLIR] Fix mlir-doc build failures by adding -dialect to add_mlir_doc calls

Add -dialect=<name> to all add_mlir_doc() calls that were missing it, fixing
failures after a8f2e80d5fe3 made findDialectToGenerate() require -dialect when
multiple dialects are present in a .td file.
Delta File
+3 -3 mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt
+2 -2 mlir/include/mlir/Dialect/Linalg/TransformOps/CMakeLists.txt
+2 -2 mlir/include/mlir/Dialect/ArmSME/IR/CMakeLists.txt
+1 -1 mlir/include/mlir/Dialect/ArmSVE/TransformOps/CMakeLists.txt
+1 -1 mlir/include/mlir/Dialect/Bufferization/TransformOps/CMakeLists.txt
+1 -1 mlir/include/mlir/Dialect/DLTI/TransformOps/CMakeLists.txt
+10 -10 23 files not shown
+33 -33 29 files

LLVM/project 5c8b812 compiler-rt/lib/builtins/arm/thumb1 dcmp.h gedf2.S, compiler-rt/test/builtins/Unit comparedf2new_test.c

Merge branch 'arm-fp-dcmp' into arm-fp-fcmp

Pulls in the CI fix from that branch too.
Delta File
+13 -20 compiler-rt/lib/builtins/arm/thumb1/dcmp.h
+12 -0 compiler-rt/test/builtins/Unit/comparedf2new_test.c
+1 -1 compiler-rt/lib/builtins/arm/thumb1/gedf2.S
+1 -1 compiler-rt/lib/builtins/arm/thumb1/cmpdf2.S
+27 -22 4 files

LLVM/project be05b10 compiler-rt/test/builtins/Unit comparesf2new_test.c

Fix CI failure on Windows

The new test was failing on Windows, because it tries to call
`__cmpsf2`, which the generic builtins/comparesf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
Delta File
+12 -0 compiler-rt/test/builtins/Unit/comparesf2new_test.c
+12 -0 1 files

LLVM/project c64753e compiler-rt/test/builtins/Unit comparedf2new_test.c

Fix CI failure on Windows

The new test was failing on Windows, because it tries to call
`__cmpdf2`, which the generic builtins/comparedf2.c only defines
conditionally on `__ELF__`. Do the same in the test.
Delta File
+12 -0 compiler-rt/test/builtins/Unit/comparedf2new_test.c
+12 -0 1 files

LLVM/project 841d511 llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG/AArch64 switch-to-lookup-table-vector-constants.ll switch-to-lookup-table-vector-splat.ll

[LLVM][SimplifyCFG] Allow switch-to-table for some vector constants. (#183057)

Only applies to fixed length vector constants that are made up of either
ConstantInt or ConstantFP elements.
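
As a source-level analogy for the transformation, the sketch below shows a switch that returns fixed-length vector constants and an equivalent lookup-table form. It assumes `std::array<int, 4>` as a stand-in for a fixed-length vector of integer constants; `Vec4`, `selectWithSwitch`, `selectWithTable`, and `checkEquivalent` are hypothetical names, and the actual pass operates on LLVM IR rather than C++.

```cpp
#include <array>
#include <cassert>

// Stand-in for a fixed-length vector constant such as <4 x i32>.
using Vec4 = std::array<int, 4>;

// Original form: every case yields a constant vector.
Vec4 selectWithSwitch(unsigned I) {
  switch (I) {
  case 0:
    return Vec4{{1, 2, 3, 4}};
  case 1:
    return Vec4{{5, 6, 7, 8}};
  case 2:
    return Vec4{{9, 10, 11, 12}};
  default:
    return Vec4{{0, 0, 0, 0}};
  }
}

// Table form: the case constants become entries of a constant array, and the
// switch becomes a bounds check plus an indexed load.
Vec4 selectWithTable(unsigned I) {
  static const Vec4 Table[] = {
      Vec4{{1, 2, 3, 4}}, Vec4{{5, 6, 7, 8}}, Vec4{{9, 10, 11, 12}}};
  if (I < 3)
    return Table[I];
  return Vec4{{0, 0, 0, 0}};
}

// Both forms must agree on every input, including out-of-range ones.
void checkEquivalent() {
  for (unsigned I = 0; I < 8; ++I)
    assert(selectWithSwitch(I) == selectWithTable(I));
}
```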
Delta File
+114 -0 llvm/test/Transforms/SimplifyCFG/AArch64/switch-to-lookup-table-vector-constants.ll
+0 -47 llvm/test/Transforms/SimplifyCFG/AArch64/switch-to-lookup-table-vector-splat.ll
+4 -4 llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+118 -51 3 files

LLVM/project d57fdac llvm/lib/Transforms/Scalar NaryReassociate.cpp

review: address suggestions
Delta File
+27 -36 llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+27 -36 1 files

LLVM/project b04179a clang/lib/Driver/ToolChains FreeBSD.cpp

fixup! [Toolchains][FreeBSD] Honor system libgcc
Delta File
+6 -0 clang/lib/Driver/ToolChains/FreeBSD.cpp
+6 -0 1 files

LLVM/project b7989de libc/test/shared CMakeLists.txt

[libc][math] Disable shared math tests on AArch64

Delta File
+4 -0 libc/test/shared/CMakeLists.txt
+4 -0 1 files

LLVM/project 014f73c llvm/lib/Target/AMDGPU AMDGPU.td

Update llvm/lib/Target/AMDGPU/AMDGPU.td

Co-authored-by: Matt Arsenault <Matthew.Arsenault at amd.com>
Delta File
+1 -1 llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -1 1 files

LLVM/project c5d6feb llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 transform-narrow-interleave-to-widen-memory-scalable.ll

[VPlan] Limit interleave group narrowing to consecutive wide loads.

Tighten check in canNarrowLoad to require consecutive wide loads; we
cannot properly narrow gathers at the moment.

Fixes https://github.com/llvm/llvm-project/issues/183345.
Delta File
+55 -0 llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-scalable.ll
+1 -1 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+56 -1 2 files

LLVM/project 369cff2 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Address suggestions (one-use mul, reduce indentation, use isa)
Delta File
+26 -31 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+26 -31 1 files

LLVM/project bfa3da8 clang/lib/AST/ByteCode Record.h Record.cpp

[clang][bytecode] Optimize `interp::Record` a bit (#183494)

And things around it.

Remove the `FieldMap`, since we can use the field's index instead and
only keep an array around. `reserve()` the sizes and use
`emplace_back()`.
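
A minimal sketch of the pattern described above: fields are kept in one array and looked up by index, with `reserve()` and `emplace_back()` used while building it. The `Field` and `Record` types here are simplified, hypothetical stand-ins and do not reproduce the actual `interp::Record` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical field descriptor; the real class stores more information.
struct Field {
  std::string Name;
  std::size_t Index;
  Field(std::string N, std::size_t I) : Name(std::move(N)), Index(I) {}
};

class Record {
public:
  explicit Record(const std::vector<std::string> &Names) {
    Fields.reserve(Names.size()); // One allocation up front.
    for (const std::string &Name : Names)
      Fields.emplace_back(Name, Fields.size()); // Construct in place.
  }

  // Lookup by index is a plain array access; no side map is needed.
  const Field &getField(std::size_t Index) const {
    assert(Index < Fields.size() && "field index out of range");
    return Fields[Index];
  }

  std::size_t getNumFields() const { return Fields.size(); }

private:
  std::vector<Field> Fields;
};
```

Because each field already carries a stable index, an index-keyed array gives constant-time lookup without the extra memory and hashing of a separate map.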
Delta File
+15 -7 clang/lib/AST/ByteCode/Record.h
+1 -9 clang/lib/AST/ByteCode/Record.cpp
+5 -3 clang/lib/AST/ByteCode/Program.cpp
+21 -19 3 files

LLVM/project 4619f2b libc/test/shared CMakeLists.txt

[libc][math] Disable shared math tests on AArch64
Delta File
+4 -0 libc/test/shared/CMakeLists.txt
+4 -0 1 files

LLVM/project bb30e28 llvm/lib/Target/AArch64 AArch64InstrInfo.cpp, llvm/unittests/Target/AArch64 InstSizes.cpp

[AArch64] Report accurate sizes for MOVaddr and MOVimm pseudos
Delta File
+89 -0 llvm/unittests/Target/AArch64/InstSizes.cpp
+28 -16 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+117 -16 2 files

LLVM/project 9e48c00 llvm/test/CodeGen/AMDGPU local-stack-alloc-add-references.gfx8.mir coalesce-copy-to-agpr-to-av-registers.mir, llvm/test/TableGen ArtificialRegs.td

[TableGen] Complete the support for artificial registers

Artificial registers were added in eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than that
of their subregisters, even when they only contain a single
physical subregister.

Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.

This patch completes the support for artificial registers to:

- Ignore artificial registers when joining register unit uber
  sets. Artificial registers may be members of classes that
  together include registers and their sub-registers, making it
  impossible to compute normalised weights for uber sets they
  belong to.


    [28 lines not shown]
Delta File
+180 -180 llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx8.mir
+120 -120 llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+90 -90 llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx9.mir
+60 -7 llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+56 -0 llvm/test/TableGen/ArtificialRegs.td
+18 -18 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir
+524 -415 25 files not shown
+675 -562 31 files

LLVM/project d8ce0e7 flang/lib/Semantics check-omp-loop.cpp check-omp-structure.h

[flang][OpenMP] Inline CheckNestedBlock, NFC (#181732)

CheckNestedBlock no longer calls itself, which was the primary reason
for the code to be in a separate function.
Delta File
+21 -26 flang/lib/Semantics/check-omp-loop.cpp
+0 -2 flang/lib/Semantics/check-omp-structure.h
+21 -28 2 files

LLVM/project d3f76b3 llvm/lib/Target/AArch64 AArch64InstrInfo.cpp

[AArch64] Report accurate sizes for MOVaddr and MOVimm pseudos
Delta File
+28 -16 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+28 -16 1 files

LLVM/project 1ec3b86 llvm/lib/Target/AArch64 AArch64ExpandPseudo.cpp AArch64ExpandImm.cpp

[NFC][AArch64] Extract MOVaddr* expansion model into common header

This makes the expansion logic reusable by getInstSizeInBytes in a
follow-up patch.
Delta File
+742 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.cpp
+0 -722 llvm/lib/Target/AArch64/AArch64ExpandImm.cpp
+75 -56 llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+42 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.h
+0 -35 llvm/lib/Target/AArch64/AArch64ExpandImm.h
+10 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+869 -822 5 files not shown
+886 -839 11 files

LLVM/project 254cb2a llvm/include/llvm/CodeGen TargetInstrInfo.h, llvm/lib/CodeGen PostRAHazardRecognizer.cpp

[AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895)

On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at
the use-site. When the hazard-consuming instruction is inside a loop and
the WMMA is outside, these NOPs execute every iteration even though the
hazard only needs to be covered once.

This patch hoists the V_NOPs to the loop preheader, reducing executions
from N iterations to 1.

```
Example (assuming a hazard requiring K V_NOPs):
  Before:
    bb.0 (preheader): WMMA writes vgpr0
    bb.1 (loop):      V_NOP xK, VALU reads vgpr0, branch bb.1
                      -> K NOPs executed per iteration

  After:
    bb.0 (preheader): WMMA writes vgpr0, V_NOP xK

    [12 lines not shown]
Delta File
+516 -30 llvm/test/CodeGen/AMDGPU/wmma-nop-hoisting.mir
+163 -62 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+21 -4 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+14 -7 llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
+3 -2 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+3 -1 llvm/include/llvm/CodeGen/TargetInstrInfo.h
+720 -106 1 files not shown
+722 -107 7 files

LLVM/project 32b8b9b llvm/lib/Transforms/Vectorize VPlanConstruction.cpp VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize use-scalar-epilogue-if-tp-fails.ll

[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)

Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
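
One reading of the description, sketched as plain arithmetic: the exiting value of an induction variable is its start plus the step times the number of iterations the vector loop covers. Without tail folding that count is the vector trip count; with tail folding it is the original trip count. The function names and the `VFxUF` parameter are hypothetical, and the sketch ignores cases such as a required scalar epilogue; it is not the VPlan implementation.

```cpp
#include <cassert>
#include <cstdint>

// Iterations covered by the vector loop when a scalar remainder loop handles
// the leftover iterations (trip count rounded down to a multiple of VFxUF).
uint64_t vectorTripCount(uint64_t TripCount, uint64_t VFxUF) {
  assert(VFxUF != 0 && "vector step must be non-zero");
  return TripCount - TripCount % VFxUF;
}

// Value of the induction variable on exit from the vector loop.
// With tail folding, the masked vector loop covers all original iterations,
// so the original trip count is used instead of the vector trip count.
int64_t exitingIVValue(int64_t Start, int64_t Step, uint64_t TripCount,
                       uint64_t VFxUF, bool FoldTail) {
  uint64_t Iterations =
      FoldTail ? TripCount : vectorTripCount(TripCount, VFxUF);
  return Start + Step * static_cast<int64_t>(Iterations);
}
```

For a loop with trip count 10 and a vector step of 4, an induction starting at 0 with step 1 exits the vector loop at 8 without tail folding and at 10 with it.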
Delta File
+66 -11 llvm/test/Transforms/LoopVectorize/X86/fold-tail-low-trip-count.ll
+48 -8 llvm/test/Transforms/LoopVectorize/AArch64/fold-tail-low-trip-count.ll
+12 -26 llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll
+10 -8 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+12 -6 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4 -9 llvm/test/Transforms/LoopVectorize/X86/small-size.ll
+152 -68 8 files not shown
+167 -96 14 files