LLVM/project ca8419dmlir/include/mlir/Dialect/AMDGPU/IR AMDGPU.td, mlir/lib/Dialect/AMDGPU/IR AMDGPUDialect.cpp

[mlir][amdgpu] Fuse adjacent `MemoryCounterWaitOp` (#171148)

Adjacent ops are fused into a single op, taking the minimum value of each counter.
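
A minimal sketch of the fusion rule, not the code from this commit. It assumes each counter on the op is optional and that a smaller count is the stricter wait, so the minimum of two adjacent waits covers both. `CounterWait`, `mergeCounter`, and `fuse` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for a wait op: each counter is optional, and an
// absent counter means that no wait was requested for it.
struct CounterWait {
  std::optional<int> load, store, ds;
};

// Merge one counter. If both waits specify it, keep the smaller value
// (assumed to be the stricter wait); otherwise keep whichever is present.
static std::optional<int> mergeCounter(std::optional<int> A,
                                       std::optional<int> B) {
  if (A && B)
    return std::min(*A, *B);
  return A ? A : B;
}

// Fuse two adjacent waits into one wait that satisfies both.
CounterWait fuse(const CounterWait &First, const CounterWait &Second) {
  return {mergeCounter(First.load, Second.load),
          mergeCounter(First.store, Second.store),
          mergeCounter(First.ds, Second.ds)};
}
```

Under these assumptions, fusing a wait on `load = 3, ds = 1` with a wait on `load = 1, store = 2` yields `load = 1, store = 2, ds = 1`.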
DeltaFile
+45-0mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
+36-0mlir/test/Dialect/AMDGPU/canonicalize.mlir
+2-0mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+83-03 files

LLVM/project 0a9455aclang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Clean up visibility conversion (NFC)
DeltaFile
+5-7clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5-71 files

LLVM/project ebdb903llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll compress-undef-float-passthrough.ll

[X86] Handle X86ISD::EXPAND/COMPRESS nodes as target shuffles (#171119)

This allows shuffle simplification.

This required a minor fix to the overly reduced compress-undef-float-passthrough.ll regression test.
DeltaFile
+11-56llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+46-0llvm/lib/Target/X86/X86ISelLowering.cpp
+2-2llvm/test/CodeGen/X86/compress-undef-float-passthrough.ll
+59-583 files

LLVM/project b08c72bflang/lib/Parser openmp-parsers.cpp unparse.cpp, flang/test/Lower/OpenMP/Todo threadset.f90

[Flang][OpenMP] Enables parsing of threadset clause (#169856)

DeltaFile
+79-0flang/test/Parser/OpenMP/threadset-clause.f90
+10-0flang/test/Lower/OpenMP/Todo/threadset.f90
+9-0flang/test/Semantics/OpenMP/threadset-clause.f90
+7-1flang/lib/Parser/openmp-parsers.cpp
+1-0flang/lib/Parser/unparse.cpp
+106-15 files

LLVM/project c5b9010llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 pr60831-sve-inv-store-crash.ll sve-vector-reverse.ll

[VPlan] Use nuw when computing {VF,VScale}xUF (#170710)

These quantities should never unsigned-wrap. This matches the behavior
if only VFxUF is used (and not VF): when computing both VF and VFxUF,
nuw should hold for each step separately.
DeltaFile
+12-12llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+7-7llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
+3-3llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
+4-2llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2-2llvm/test/Transforms/LoopVectorize/AArch64/sve-vector-reverse.ll
+2-2llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
+30-284 files not shown
+35-3310 files

LLVM/project fb3eff2bolt/include/bolt/Passes LivenessAnalysis.h, bolt/lib/Passes ShrinkWrapping.cpp TailDuplication.cpp

[ADT] Make use of subsetOf and anyCommon methods of BitVector (NFC)

Replace the code along these lines

    BitVector Tmp = LHS;
    Tmp &= RHS;
    return Tmp.any();

and

    BitVector Tmp = LHS;
    Tmp.reset(RHS);
    return Tmp.none();

with `LHS.anyCommon(RHS)` and `LHS.subsetOf(RHS)`, respectively, which
do not require creating a temporary BitVector and can return early.
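
To illustrate why these queries need no temporary and can return early, here is a self-contained sketch over a plain vector of 64-bit words. It is a hypothetical stand-in, not `llvm::BitVector` itself; the free functions only mirror the method names mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical word-based bit set standing in for llvm::BitVector.
using Words = std::vector<std::uint64_t>;

// True if LHS and RHS share at least one set bit. Stops at the first
// overlapping word instead of materialising LHS & RHS.
bool anyCommon(const Words &LHS, const Words &RHS) {
  std::size_t N = std::min(LHS.size(), RHS.size());
  for (std::size_t I = 0; I < N; ++I)
    if (LHS[I] & RHS[I])
      return true;
  return false;
}

// True if every bit set in LHS is also set in RHS. Words beyond the end
// of RHS are treated as all zeros. Stops at the first counterexample.
bool subsetOf(const Words &LHS, const Words &RHS) {
  for (std::size_t I = 0; I < LHS.size(); ++I) {
    std::uint64_t R = I < RHS.size() ? RHS[I] : 0;
    if (LHS[I] & ~R)
      return false;
  }
  return true;
}
```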
DeltaFile
+4-6bolt/lib/Passes/ShrinkWrapping.cpp
+4-4bolt/lib/Passes/TailDuplication.cpp
+2-6bolt/lib/Passes/RegReAssign.cpp
+2-4llvm/tools/llvm-exegesis/lib/SnippetGenerator.cpp
+2-4llvm/lib/CodeGen/RDFRegisters.cpp
+2-3bolt/include/bolt/Passes/LivenessAnalysis.h
+16-272 files not shown
+18-318 files

LLVM/project b125532bolt/lib/Passes LongJmp.cpp, bolt/lib/Rewrite GNUPropertyRewriter.cpp

[BOLT][BTI] Add needed BTIs in LongJmp or refuse to optimize binary

This patch adds BTI landing pads to ShortJmp/LongJmp targets in the
LongJmp pass when optimizing BTI binaries.

BOLT does not have the ability to add BTI to all types of functions.
This patch aims to insert the landing pad where possible, and emit an
error where it currently is not.

BOLT cannot insert BTIs into several function "types", including:
- ignored functions,
- PLT functions,
- other functions without a CFG.

Additional context:

In #161206, BOLT gained the ability to decode the .note.gnu.property
section, and warn about lack of BTI support for BOLT. However, this
warning is misleading: the emitted binary may not need extra BTI landing

    [3 lines not shown]
DeltaFile
+50-3bolt/lib/Passes/LongJmp.cpp
+46-0bolt/test/AArch64/long-jmp-bti.s
+35-0bolt/test/AArch64/long-jmp-bti-ignored.s
+2-2bolt/test/AArch64/bti-note.test
+2-2bolt/test/AArch64/no-bti-note.test
+1-2bolt/lib/Rewrite/GNUPropertyRewriter.cpp
+136-91 files not shown
+138-97 files

LLVM/project dd0f874clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Clean up visibility conversion (NFC)
DeltaFile
+5-7clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5-71 files

LLVM/project 9a5fa30llvm/include/llvm/ADT STLExtras.h, llvm/unittests/ADT STLExtrasTest.cpp

[ADT] Add `llvm::reverse_conditionally()` iterator (#171040)

This patch adds a simple iterator range that allows conditionally
iterating a collection in reverse. It works with any collection
supported by `llvm::reverse(Collection)`.

```
void foo(bool Reverse, std::vector<int>& C) {
  for (int I : reverse_conditionally(C, Reverse)) {
    // ...
  }
}
```
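
One way such a range could work is sketched below. This is a simplified, hypothetical stand-in limited to `std::vector`, not the implementation added to `STLExtras.h`; `ConditionalRange` and `reverseConditionallySketch` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index-based range over a std::vector that walks forward or
// backward depending on a runtime flag.
template <typename T> class ConditionalRange {
public:
  class iterator {
  public:
    iterator(std::vector<T> &Vec, std::size_t Step, bool Reverse)
        : V(&Vec), Step(Step), Reverse(Reverse) {}
    // Map the step count onto an index from the front or from the back.
    T &operator*() const {
      return (*V)[Reverse ? V->size() - 1 - Step : Step];
    }
    iterator &operator++() {
      ++Step;
      return *this;
    }
    bool operator!=(const iterator &Other) const { return Step != Other.Step; }

  private:
    std::vector<T> *V;
    std::size_t Step;
    bool Reverse;
  };

  ConditionalRange(std::vector<T> &Vec, bool Reverse)
      : V(Vec), Reverse(Reverse) {}
  iterator begin() const { return iterator(V, 0, Reverse); }
  iterator end() const { return iterator(V, V.size(), Reverse); }

private:
  std::vector<T> &V;
  bool Reverse;
};

template <typename T>
ConditionalRange<T> reverseConditionallySketch(std::vector<T> &Vec,
                                               bool Reverse) {
  return ConditionalRange<T>(Vec, Reverse);
}
```

The direction is chosen once at construction, so the loop body is written only once and does not branch on `Reverse` itself.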
DeltaFile
+24-0llvm/unittests/ADT/STLExtrasTest.cpp
+12-0llvm/include/llvm/ADT/STLExtras.h
+36-02 files

LLVM/project 2a389fdllvm/test/CodeGen/AMDGPU llvm.exp10.ll, llvm/test/CodeGen/RISCV rv32p.ll rv64p.ll

rebase for ReleaseNotes conflict

Created using spr 1.3.5-bogner
DeltaFile
+2,027-185llvm/test/CodeGen/X86/shift-i512.ll
+1,563-413llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+2-668llvm/test/CodeGen/RISCV/rv32p.ll
+323-320llvm/test/CodeGen/AMDGPU/llvm.exp10.ll
+0-629llvm/test/CodeGen/RISCV/rv64p.ll
+353-237mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+4,268-2,452370 files not shown
+10,662-5,329376 files

LLVM/project e86dd12llvm/unittests/ADT BitVectorTest.cpp

Address comments
DeltaFile
+8-7llvm/unittests/ADT/BitVectorTest.cpp
+8-71 files

LLVM/project 886f54allvm/lib/CodeGen/SelectionDAG LegalizeDAG.cpp

DAG: Set MachinePointerInfo for stack when expanding divrem libcall (#170537)

DeltaFile
+5-2llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+5-21 files

LLVM/project 8c772e0clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Clean up visibility conversion (NFC)
DeltaFile
+5-7clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5-71 files

LLVM/project 1ae9575llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU][NFC] Update a comment about FLAT v/s LDSDMA

The change in #170263 does not do justice to common knowledge in the backend.
Fix the comment to reflect the relation between FLAT encoding, flat pointer
access, and LDSDMA operations.
DeltaFile
+5-6llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5-61 files

LLVM/project a6fc5a1clang-tools-extra/clang-tidy/fuchsia MultipleInheritanceCheck.cpp MultipleInheritanceCheck.h

[clang-tidy][NFC] Refactor `fuchsia-multiple-inheritance` (#171059)

DeltaFile
+36-81clang-tools-extra/clang-tidy/fuchsia/MultipleInheritanceCheck.cpp
+1-4clang-tools-extra/clang-tidy/fuchsia/MultipleInheritanceCheck.h
+37-852 files

LLVM/project ce73cbbclang/lib/Headers __clang_cuda_complex_builtins.h, clang/test/Headers amdgcn-openmp-device-math-complex.cpp amdgcn-openmp-device-math-complex.c

clang: Use generic builtins in cuda complex builtins header (#171106)

There's no reason to use the ocml or nv prefixed functions and
maintain this list of alias macros. I left these macros in for
NVPTX in the scalbn and logb case, since those have a special
case hack in the AMDGPU codegen and probably do not work on ptx.
DeltaFile
+87-143clang/lib/Headers/__clang_cuda_complex_builtins.h
+19-19clang/test/Headers/amdgcn-openmp-device-math-complex.cpp
+18-18clang/test/Headers/amdgcn-openmp-device-math-complex.c
+16-16clang/test/Headers/nvptx_device_math_complex.cpp
+16-16clang/test/Headers/nvptx_device_math_complex.c
+156-2125 files

LLVM/project cc19f42llvm/lib/Target/AMDGPU AMDGPUArgumentUsageInfo.h SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886)

Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code
generation when NPM is enabled by default.

Previously, DAG.getPass() returned nullptr when using NPM, making the
argument usage info unavailable during ISel. This resulted in a
fallback to FixedABIFunctionInfo, which assumes all implicit arguments
are needed, generating unnecessary register setup code for entry
functions.

Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll

Changes:
- Split AMDGPUArgumentUsageInfo into a data class and NPM analysis
wrapper
- Update SIISelLowering to use DAG.getMFAM() for NPM path
- Add RequireAnalysisPass in addPreISel() to ensure analysis
availability

This follows the same pattern used for PhysicalRegisterUsageInfo.
DeltaFile
+54-12llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
+22-8llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+15-11llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
+3-3llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+6-0llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+5-1llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+105-352 files not shown
+108-388 files

LLVM/project 0487154mlir/include/mlir/Dialect/AMDGPU/IR AMDGPU.td, mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp

[mlir][amdgpu] Add workgroup_mask to MakeDmaDescriptorOp (#171103)

- add `workgroup_mask` and `early_timeout`
DeltaFile
+85-2mlir/test/Conversion/AMDGPUToROCDL/gfx1250.mlir
+28-3mlir/test/Dialect/AMDGPU/ops.mlir
+25-1mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+9-0mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+3-0mlir/lib/Dialect/AMDGPU/IR/AMDGPUDialect.cpp
+150-65 files

LLVM/project 6e34fecllvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU inline-asm.ll inlineasm-mismatched-size-error.ll

review comments
DeltaFile
+0-12llvm/test/CodeGen/AMDGPU/inline-asm.ll
+6-0llvm/test/CodeGen/AMDGPU/inlineasm-mismatched-size-error.ll
+1-4llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-163 files

LLVM/project 5be82f0clang/include/clang/Analysis/Analyses/LifetimeSafety LifetimeAnnotations.h, clang/lib/Analysis/LifetimeSafety LifetimeAnnotations.cpp FactsGenerator.cpp

Implicit lifetimebound for std namespace
DeltaFile
+180-0clang/unittests/Analysis/LifetimeSafetyTest.cpp
+82-0clang/lib/Analysis/LifetimeSafety/LifetimeAnnotations.cpp
+2-62clang/lib/Sema/CheckExprLifetime.cpp
+14-0clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeAnnotations.h
+5-1clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+4-0clang/lib/Analysis/LifetimeSafety/Origins.cpp
+287-636 files

LLVM/project 444baf2clang/include/clang/Analysis/Analyses/LifetimeSafety FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

std_move false positive
DeltaFile
+23-0clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+18-0clang/test/Sema/warn-lifetime-safety.cpp
+5-0clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+46-03 files

LLVM/project 9308d55clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety.cpp warn-lifetime-safety-suggestions.cpp

dereference_operator
DeltaFile
+11-11clang/test/Sema/warn-lifetime-safety.cpp
+6-3clang/test/Sema/warn-lifetime-safety-suggestions.cpp
+6-0clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+4-0clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+27-144 files

LLVM/project db307d4clang/include/clang/Analysis/Analyses/LifetimeSafety Origins.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp Origins.cpp

Tree -> List
DeltaFile
+153-371clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+384-30clang/test/Sema/warn-lifetime-safety.cpp
+250-96clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+131-64clang/lib/Analysis/LifetimeSafety/Origins.cpp
+102-22clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+55-30clang/unittests/Analysis/LifetimeSafetyTest.cpp
+1,075-6139 files not shown
+1,190-65915 files

LLVM/project deac264mlir/include/mlir/Bindings/Python NanobindAdaptors.h, mlir/lib/Bindings/Python IRTypes.cpp MainModule.cpp

[mlir][py] partially use mlir_type_subclass for IRTypes.cpp

Port the bindings for non-shaped builtin types in IRTypes.cpp to use the
`mlir_type_subclass` mechanism used by non-builtin types. This is part of a
longer-term cleanup to only support one subclassing mechanism. Eventually, the
`PyConcreteType` mechanism will be removed.

This required surgery in the type casters and the `mlir_type_subclass` logic
to avoid circular imports of the `_mlir.ir` module that would otherwise occur when
using `mlir_type_subclass` to define classes in the `_mlir.ir` module.

Tests are updated to use the `.get_static_typeid()` function instead of the
`.static_typeid` property that was specific to builtin types due to the
`PyConcreteType` mechanism. The change should be NFC otherwise.
DeltaFile
+342-639mlir/lib/Bindings/Python/IRTypes.cpp
+30-11mlir/include/mlir/Bindings/Python/NanobindAdaptors.h
+15-0mlir/lib/Bindings/Python/MainModule.cpp
+7-4mlir/test/python/ir/builtin_types.py
+4-4mlir/test/python/dialects/arith_dialect.py
+3-3mlir/test/python/ir/value.py
+401-6616 files

LLVM/project a022605llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU inline-asm-use-bool.ll

[AMDGPU] Fix a crash when a bool variable is used in inline asm
DeltaFile
+15-0llvm/test/CodeGen/AMDGPU/inline-asm-use-bool.ll
+5-0llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20-02 files

LLVM/project e3a4f8cllvm/test/CodeGen/AMDGPU inline-asm-use-bool.ll inline-asm.ll

move test
DeltaFile
+0-15llvm/test/CodeGen/AMDGPU/inline-asm-use-bool.ll
+12-0llvm/test/CodeGen/AMDGPU/inline-asm.ll
+12-152 files

LLVM/project e6047d9mlir/include/mlir/Bindings/Python NanobindAdaptors.h, mlir/test/python/dialects pdl_types.py

[mlir][py] avoid crashing on None contexts in custom `get`s

Following a series of refactorings, MLIR Python bindings would crash if a
dialect object requiring a context defined using mlir_attribute/type_subclass
was constructed outside of the `ir.Context` context manager. The type caster
for `MlirContext` would try using `ir.Context.current` when the default `None`
value was provided to the `get`, which would also just return `None`. The
caster would then attempt to obtain the MLIR capsule for that `None`, fail,
but access it anyway without checking, leading to a C++ assertion failure or
segfault.

Guard against this case in nanobind adaptors. Also emit a warning to the user
to clarify expectations, as the default message confusingly says that `None` is
accepted as context and then fails with a type error. Using the Python C API is
currently recommended by nanobind in this case since the surrounding function
must be marked `noexcept`.

The corresponding test is in the PDL dialect since it is where I first observed
the behavior. Core types are not using the `mlir_type_subclass` mechanism and
are immune to the problem, so cannot be used for checking.
DeltaFile
+14-6mlir/include/mlir/Bindings/Python/NanobindAdaptors.h
+13-0mlir/test/python/dialects/pdl_types.py
+27-62 files

LLVM/project e8219e5llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 sve-predicated-costs.ll predicated-costs.ll

[VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690)

In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we
unprofitably loop vectorize on RISC-V.

The loop looks something like:

```c
  for (int i = 0; i < n; i++) {
    if (x0[i] == a)
      if (x1[i] == b)
        if (x2[i] == c)
          // do stuff...
  }
```

Because it's so deeply nested, the innermost level of the loop rarely
gets executed. However, we still deem it profitable to vectorize, which,
due to the if-conversion, means we now always execute the body.

    [19 lines not shown]
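
As a rough illustration of the idea in the title (not the code from this commit), a cost divisor for a predicated block can be derived from how often the block runs relative to the loop header. `predBlockCostDivisor` and its parameters are hypothetical; the frequencies stand in for values a block-frequency analysis would provide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many times less often a predicated block runs
// than the loop header. A scalar cost for the block would be divided by
// this value. The result is clamped to at least 1, and a zero block
// frequency is treated as 1 to avoid dividing by zero.
std::uint64_t predBlockCostDivisor(std::uint64_t HeaderFreq,
                                   std::uint64_t BlockFreq) {
  std::uint64_t Safe = std::max<std::uint64_t>(1, BlockFreq);
  return std::max<std::uint64_t>(1, HeaderFreq / Safe);
}
```

For the triply nested `if` above, if each condition were taken half the time, the innermost body would run on one in eight iterations, and a header frequency of 800 against a block frequency of 100 gives a divisor of 8.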
DeltaFile
+161-0llvm/test/Transforms/LoopVectorize/AArch64/sve-predicated-costs.ll
+115-0llvm/test/Transforms/LoopVectorize/RISCV/predicated-costs.ll
+62-34llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+61-0llvm/test/Transforms/PhaseOrdering/loop-vectorize-bfi.ll
+53-0llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll
+5-3llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll
+457-375 files not shown
+463-4211 files

LLVM/project dd06214clang/lib/CIR/CodeGen CIRGenDeclOpenACC.cpp, clang/test/CIR/CodeGenOpenACC routine-bind.c routine-bind.cpp

[OpenACC][CIR] Implement routine 'bind'-with-a-string lowering (#170916)

The 'bind' clause emits an attribute on the RoutineOp that states which
function it should call on the device side. When provided in
double-quotes, the function on the device side should be the exact name
given. This patch emits the IR to do that.

As a part of that, we add a helper function to the OpenACC dialect to do
so, as well as a version that adds the ID version (though we don't
exercise that yet).

The 'bind' with an ID should do the MANGLED name, but it isn't quite
clear what that name SHOULD be yet. Since the signature of a function is
included in its mangling, and we're not providing said signature, we
have to come up with something. This is left as an exercise for a future
patch.
DeltaFile
+39-0clang/test/CIR/CodeGenOpenACC/routine-bind.c
+39-0clang/test/CIR/CodeGenOpenACC/routine-bind.cpp
+39-0mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
+14-0clang/lib/CIR/CodeGen/CIRGenDeclOpenACC.cpp
+8-0mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
+139-05 files

LLVM/project cda2ea3llvm/test/CodeGen/AMDGPU maximumnum.bf16.ll minimumnum.bf16.ll, llvm/test/CodeGen/X86 wide-scalar-shift-by-byte-multiple-legalization.ll shift-i512.ll

Merge branch 'main' into users/rovka/relax-callers-for-chain-funcs
DeltaFile
+17,522-20,773llvm/test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll
+8,857-10,952llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll
+8,840-10,957llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll
+4,725-0llvm/test/tools/llvm-mca/RISCV/SpacemitX60/vlseg-vsseg.s
+4,091-0llvm/test/CodeGen/AMDGPU/atomicrmw_usub_sat.ll
+2,027-185llvm/test/CodeGen/X86/shift-i512.ll
+46,062-42,8673,206 files not shown
+192,075-102,9463,212 files