LLVM/project 2a3e745 llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-scale-to-agpr.mir

Fix test from #168609 (#169041)

Delta  File
+1 -1  llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-scale-to-agpr.mir
+1 -1  1 file

LLVM/project b98f6a5 llvm/lib/Transforms/Vectorize VPlanUtils.cpp

[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028)

This allows us to strip an unnecessary TypeSwitch.
Delta  File
+10 -13  llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+10 -13  1 file

LLVM/project 31711c9 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize/AArch64 force-target-instruction-cost.ll

[VPlan] Only apply forced cost to recipes with underlying values. (#168372)

Only apply forced instruction costs to recipes with underlying values to
match the legacy cost model. A VPlan may have a number of additional
VPInstructions without underlying values that are not considered for its
cost, and assigning forced costs to them would incorrectly inflate its
cost.

This fixes a cost divergence between legacy and VPlan-based cost models
with forced instruction costs.

PR: https://github.com/llvm/llvm-project/pull/168372
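A minimal sketch of the rule, using a made-up `RecipeSketch` type rather than the real VPlan recipe classes: the forced cost replaces the computed cost only for recipes that carry an underlying IR value, so VPlan-only instructions keep their normal cost instead of inflating the total.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a VPlan recipe; not the real VPRecipeBase API.
struct RecipeSketch {
  bool hasUnderlyingValue; // created from an IR instruction?
  unsigned computedCost;   // cost the target would report without forcing
};

// Sums the cost of a plan. A forced per-instruction cost replaces the
// computed cost only for recipes with an underlying IR value; VPlan-only
// recipes keep their computed cost, so forcing does not inflate the total.
unsigned planCostSketch(const std::vector<RecipeSketch> &recipes,
                        std::optional<unsigned> forcedCost) {
  unsigned total = 0;
  for (const RecipeSketch &r : recipes)
    total += (forcedCost && r.hasUnderlyingValue) ? *forcedCost : r.computedCost;
  return total;
}
```

With recipes {IR, 4}, {VPlan-only, 1}, {IR, 2} and a forced cost of 10, the sketch yields 10 + 1 + 10 = 21 rather than 30.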
Delta  File
+53 -0  llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+8 -3  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+61 -3  2 files

LLVM/project 3005886 flang/test/Semantics/OpenMP loop-transformation-construct01.f90

[flang][OpenMP] Fix some typo-like things in test case (#168582)

Delta  File
+12 -12  flang/test/Semantics/OpenMP/loop-transformation-construct01.f90
+12 -12  1 file

LLVM/project f4ebee0 flang/lib/Semantics resolve-directives.cpp check-omp-loop.cpp, flang/test/Parser/OpenMP fuse02.f90 loop-transformation-construct05.f90

[Flang][OpenMP] Add semantic support for Loop Sequences and OpenMP loop fuse (#161213)

This patch adds semantics for the `omp fuse` directive in flang, as
specified in OpenMP 6.0. This patch also enables semantic support for
loop sequences which are needed for the fuse directive along with
semantics for the `looprange` clause. These changes are only semantic.
Relevant tests have been added, and previous behavior is retained with
no changes.

---------

Co-authored-by: Ferran Toda <ferran.todacasaban at bsc.es>
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek at amd.com>
Delta  File
+114 -106  flang/lib/Semantics/resolve-directives.cpp
+106 -33  flang/lib/Semantics/check-omp-loop.cpp
+72 -45  flang/lib/Semantics/canonicalize-omp.cpp
+97 -0  flang/test/Parser/OpenMP/fuse02.f90
+93 -0  flang/test/Semantics/OpenMP/loop-transformation-construct02.f90
+90 -0  flang/test/Parser/OpenMP/loop-transformation-construct05.f90
+572 -184  17 files not shown
+968 -221  23 files

LLVM/project a2dc4e0 llvm/test/CodeGen/AMDGPU whole-wave-functions.ll llvm.amdgcn.raw.buffer.store.ll

[AMDGPU] Enable multi-group xnack replay in hardware (GFX1250) (#169016)

This patch enables the multi-group xnack replay mode by
configuring the hardware MODE register at kernel entry.
This aligns the hardware behavior with the compiler's
existing multi-group s_wait_xcnt insertion logic.
Delta  File
+282 -281  llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+377 -12  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.store.ll
+230 -106  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.tfe.ll
+230 -106  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.load.tfe.ll
+213 -49  llvm/test/CodeGen/AMDGPU/llvm.prefetch.ll
+246 -0  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma.gfx1250.w32.ll
+1,578 -554  143 files not shown
+7,119 -715  149 files

LLVM/project 49995b2 mlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/lib/Conversion/GPUToNVVM WmmaOpsToNvvm.cpp

[MLIR][GPU] subgroup_mma fp64 extension (#165873)

This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type.
The extension requires special handling during the lowering to `nvvm`
due to the return type for load ops for fragment a and b (they return a
scalar instead of a struct).
Delta  File
+72 -0  mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f64.mlir
+42 -10  mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
+22 -0  mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
+4 -4  mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+2 -2  mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+2 -2  mlir/test/Dialect/GPU/invalid.mlir
+144 -18  2 files not shown
+146 -20  8 files

LLVM/project 31a552d llvm/include/llvm/Analysis LoopCacheAnalysis.h, llvm/lib/Analysis LoopCacheAnalysis.cpp

[LoopCacheAnalysis] Replace delinearization for fixed size array (#164798)

This patch replaces the delinearization function used in
LoopCacheAnalysis, switching from one that depends on type information
in GEPs to one that does not. Once this patch and
https://github.com/llvm/llvm-project/pull/161822 are landed, we can
delete `tryDelinearizeFixedSize` from Delinearization, which is an
optimization heuristic guided by GEP type information. After Polly
eliminates its use of `getIndexExpressionsFromGEP`, we will be able to
completely delete GEP-driven heuristics from Delinearization.
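For readers unfamiliar with the term, delinearization recovers per-dimension subscripts from a flattened address computation. The sketch below illustrates the idea on concrete integers for a row-major array whose dimension sizes are passed in explicitly rather than read from GEP types. LLVM's Delinearization operates on symbolic SCEV expressions instead, and `delinearizeFixedSketch` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Illustrative only: works on a concrete element offset, not on SCEVs.
// Recovers the subscripts of a row-major array with fixed dimension sizes
// `dims` from a flattened element offset. dims[0] (the outermost size) is
// not needed: whatever remains after peeling the inner dimensions is the
// outermost subscript.
template <std::size_t N>
std::array<long, N> delinearizeFixedSketch(long offset,
                                           const std::array<long, N> &dims) {
  std::array<long, N> subs{};
  // Peel dimensions from the innermost (fastest-varying) outward.
  for (std::size_t k = N; k-- > 1;) {
    subs[k] = offset % dims[k];
    offset /= dims[k];
  }
  subs[0] = offset;
  return subs;
}
```

For `A[10][20][30]`, the element `A[2][3][4]` sits at offset 2*600 + 3*30 + 4 = 1294, and `delinearizeFixedSketch<3>(1294, {10, 20, 30})` returns {2, 3, 4}.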
Delta  File
+20 -10  llvm/test/Analysis/LoopCacheAnalysis/interchange-refcost-overflow.ll
+15 -15  llvm/lib/Analysis/LoopCacheAnalysis.cpp
+2 -1  llvm/include/llvm/Analysis/LoopCacheAnalysis.h
+1 -1  llvm/test/Transforms/LoopInterchange/pr43326.ll
+38 -27  4 files

LLVM/project db5eedd llvm/include/llvm/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.h, llvm/lib/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.cpp

[ORC] Tailor ELF debugger support plugin to load-address patching only (#168518)

In 4 years, the ELF debugger support plugin was never adapted to other
object formats or debugging approaches. After the renaming NFC in
https://github.com/llvm/llvm-project/pull/168343, this patch tailors the
plugin to ELF and section load-address patching. This allows us to remove
abstractions and to consolidate processing steps with the newly enabled
AllocActions from https://github.com/llvm/llvm-project/pull/168343.

The key change is to process debug sections in one place in a
post-allocation pass. Since we can now handle the endianness of the ELF file
in the single `visitSectionLoadAddresses()` visitor function, we no longer
need to track debug objects and sections in template classes. We
keep using the `DebugObject` class and drop `DebugObjectSection`,
`ELFDebugObjectSection<ELFT>` and `ELFDebugObject`.
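The endianness point can be illustrated with a plain function that takes the byte order as a runtime flag instead of an `ELFT` template parameter. This is a hypothetical sketch, not the plugin's code: `writeU64` and `patchSectionLoadAddress` are made-up names, it patches the `sh_addr` field of a 64-bit ELF section header (byte offset 16 in `Elf64_Shdr`), and it ignores 32-bit ELF, where `sh_addr` is a 4-byte field at offset 12.

```cpp
#include <cstddef>
#include <cstdint>

// Writes a 64-bit value in the byte order chosen at runtime, so one
// non-templated function can serve little- and big-endian ELF files.
void writeU64(std::uint8_t *dst, std::uint64_t value, bool littleEndian) {
  for (int i = 0; i < 8; ++i) {
    const int shift = littleEndian ? 8 * i : 8 * (7 - i);
    dst[i] = static_cast<std::uint8_t>(value >> shift);
  }
}

// In Elf64_Shdr, sh_addr follows sh_name (4 bytes), sh_type (4 bytes) and
// sh_flags (8 bytes), so it starts at byte offset 16.
constexpr std::size_t kElf64ShAddrOffset = 16;

// Hypothetical helper: record the target load address of a section by
// patching its header in place (e.g. in an allocation's working memory).
void patchSectionLoadAddress(std::uint8_t *sectionHeader,
                             std::uint64_t loadAddress, bool littleEndian) {
  writeU64(sectionHeader + kElf64ShAddrOffset, loadAddress, littleEndian);
}
```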

Furthermore, we now use the allocation's working memory for load-address
fixups directly. We can drop the `WritableMemoryBuffer` from the debug
object and most of the `finalizeWorkingMemory()` step, which saves one

    [5 lines not shown]
Delta  File
+220 -388  llvm/lib/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.cpp
+12 -22  llvm/include/llvm/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.h
+232 -410  2 files

LLVM/project 6a5231e clang/lib/CodeGen CGStmtOpenMP.cpp

[clang][OpenMP][CodeGen] Use an else if instead of checking twice (#168776)

These two classes are mutually exclusive so avoid doing the two checks
when the first succeeded.
Delta  File
+3 -3  clang/lib/CodeGen/CGStmtOpenMP.cpp
+3 -3  1 file

LLVM/project e6f3cca llvm/lib/Target/RISCV RISCVSchedSpacemitX60.td, llvm/test/tools/llvm-mca/RISCV/SpacemitX60 rvv-mask.s rvv-permutation.s

[RISCV] Update SpacemiT-X60 vector mask instructions latencies (#150644)

This PR adds hardware-measured latencies for all instructions defined in
Section 15 ("Vector Mask Instructions") of the RVV specification to the
SpacemiT-X60 scheduling model.
Delta  File
+379 -379  llvm/test/tools/llvm-mca/RISCV/SpacemitX60/rvv-mask.s
+39 -39  llvm/test/tools/llvm-mca/RISCV/SpacemitX60/rvv-permutation.s
+17 -6  llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
+435 -424  3 files

LLVM/project 3422b79 llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-param-combinations.ll memintrinsic-unroll.ll

[LowerMemIntrinsics] Optimize memset lowering

This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
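A C++ analogue of the lowered control flow may help. This is only a sketch: it assumes a preferred access type of 16 bytes (standing in for whatever `TTI.getMemcpyLoopLoweringType` returns), models the wide store as a `std::memcpy` of a splatted block, and takes a runtime length, so the residual part stays a byte loop rather than the straight-line stores emitted when the size is a constant. `memsetSketch` and `Wide16` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed preferred access type: a 16-byte block (e.g. a <4 x i32> vector).
struct Wide16 {
  std::uint8_t bytes[16];
};

void memsetSketch(std::uint8_t *dst, std::uint8_t setValue, std::size_t n) {
  // Splat the byte value across one wide element.
  Wide16 splat;
  for (std::uint8_t &b : splat.bytes)
    b = setValue;

  // Main loop: one wide store per iteration over the largest multiple of 16.
  const std::size_t mainBytes = n - n % sizeof(Wide16);
  for (std::size_t i = 0; i < mainBytes; i += sizeof(Wide16))
    std::memcpy(dst + i, &splat, sizeof(Wide16));

  // Residual loop: the remaining 0..15 bytes, one at a time. For a
  // compile-time-constant n this would become straight-line stores instead.
  for (std::size_t i = mainBytes; i < n; ++i)
    dst[i] = setValue;
}
```

For n = 37, the main loop performs two 16-byte stores and the residual loop writes the last 5 bytes.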
Delta  File
+1,900 -0  llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
+1,616 -0  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+686 -90  llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+218 -116  llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+197 -7  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+103 -11  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
+4,720 -224  11 files not shown
+4,826 -301  17 files

LLVM/project d69320e llvm/include/llvm/Frontend/Directive DirectiveBase.td, llvm/include/llvm/Frontend/OpenMP OMP.td

[OpenMP] Introduce "loop sequence" as directive association (#168934)

OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence"
as the associated code. What used to be "loop association" has become
"loop-nest association".

Rename Association::Loop to LoopNest, add Association::LoopSeq to
represent the "loop sequence" association.

Change the association of fuse from "block" to "loop sequence".
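To make the new association concrete, here is a C++ sketch of a loop sequence: `fuse` applies to a block of consecutive canonical loops rather than to a single loop nest, and `looprange(1, 2)` selects the first two loops of the sequence. Compiled without OpenMP, the pragma is ignored and the loops run one after another; compilers that predate `fuse` support may reject it when OpenMP is enabled. The function name is made up.

```cpp
#include <cstddef>

// The block after the directive is the associated loop sequence. Fusing is
// legal here because iteration i of the second loop only reads a[i], which
// iteration i of the first loop has already written.
void scaleThenShift(float *a, float *b, std::size_t n) {
#pragma omp fuse looprange(1, 2)
  {
    for (std::size_t i = 0; i < n; ++i)
      a[i] *= 2.0f;
    for (std::size_t i = 0; i < n; ++i)
      b[i] = a[i] + 1.0f;
  }
}
```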
Delta  File
+12 -12  llvm/include/llvm/Frontend/OpenMP/OMP.td
+10 -9  llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
+3 -2  llvm/test/TableGen/directive1.td
+3 -2  llvm/test/TableGen/directive2.td
+3 -2  llvm/include/llvm/Frontend/Directive/DirectiveBase.td
+2 -2  llvm/lib/Frontend/OpenMP/OMP.cpp
+33 -29  3 files not shown
+38 -34  9 files

LLVM/project fe74323 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

[AArch64] Avoid introducing illegal types in LowerVECTOR_COMPRESS (NFC) (#168520)

This does not seem to be an issue currently, but when using
VECTOR_COMPRESS as part of another lowering, I found these BITCASTs
would result in "Unexpected illegal type!" errors.

For example, this would convert the legal nxv2f32 type into the illegal
nxv2i32 type. This patch avoids this by using no-op casts for unpacked
types.
Delta  File
+45 -16  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+45 -16  1 file

LLVM/project 5ab49ed mlir/include/mlir-c IR.h, mlir/lib/Bindings/Python IRCore.cpp

[mlir][py][c] Enable setting block arg locations. (#169033)

This enables changing the location of a block argument. It follows the
approach used for updating the type of a block argument.
Delta  File
+15 -0  mlir/test/python/ir/blocks.py
+6 -0  mlir/lib/Bindings/Python/IRCore.cpp
+5 -0  mlir/lib/CAPI/IR/IR.cpp
+4 -0  mlir/include/mlir-c/IR.h
+30 -0  4 files

LLVM/project 1763830 llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-param-combinations.ll memintrinsic-unroll.ll

[LowerMemIntrinsics] Optimize memset lowering

This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
Delta  File
+1,900 -0  llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
+1,616 -0  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+686 -90  llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+218 -116  llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+197 -7  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+103 -11  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
+4,720 -224  11 files not shown
+4,826 -301  17 files

LLVM/project 4fca7b0 clang/include/clang/Analysis/Analyses/LifetimeSafety FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

[LifetimeSafety] Detect expiry of loans to trivially destructed types (#168855)

Handling Trivially Destructed Types

This PR uses `AddLifetime` to handle expiry of loans to trivially
destructed types.

Example:
```cpp
int * trivial_uar(){
    int *ptr;
    int x = 1;
    ptr = &x;
    return ptr;
}
```

The CFG created now has an Expire Fact for trivially destructed types:
```

    [19 lines not shown]
```
Delta  File
+64 -4  clang/test/Sema/warn-lifetime-safety.cpp
+66 -0  clang/unittests/Analysis/LifetimeSafetyTest.cpp
+10 -16  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+21 -3  clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+3 -0  clang/lib/Sema/AnalysisBasedWarnings.cpp
+1 -1  clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+165 -24  6 files

LLVM/project d36e2b6 openmp/runtime/src kmp.h

[OpenMP][libomp] Add transparent task flag bit to kmp_tasking_flags (#168873)

Clang is adding support for the new OpenMP `transparent` clause on
`task` and `taskloop` directives.
The parsing and semantic handling for this clause are introduced in
https://github.com/llvm/llvm-project/pull/166810.
To allow the compiler to communicate this clause to the OpenMP
runtime, a dedicated bit in `kmp_tasking_flags` is required.
This patch adds a new compiler-reserved bit, `transparent`, to the
`kmp_tasking_flags` structure.
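A hypothetical sketch of what adding a compiler-reserved bit typically means for a flags bitfield: one bit is carved out of a reserved range so the structure keeps its size. The field names and widths below are invented for illustration and do not match `kmp_tasking_flags` in `kmp.h`.

```cpp
// Hypothetical "before" layout: 16 compiler-owned bits, 16 runtime-owned bits.
struct FlagsBefore {
  unsigned tiedness : 1;
  unsigned final_flag : 1;
  unsigned compiler_reserved : 14;
  unsigned runtime_bits : 16;
};

// Hypothetical "after" layout: one reserved bit becomes `transparent`.
struct FlagsAfter {
  unsigned tiedness : 1;
  unsigned final_flag : 1;
  unsigned transparent : 1;        // new compiler-set bit
  unsigned compiler_reserved : 13; // reserved range shrinks by one bit
  unsigned runtime_bits : 16;
};

// The total width is unchanged, so on common ABIs the size stays the same
// and the bit positions of fields outside the reserved range do not move.
static_assert(sizeof(FlagsBefore) == sizeof(FlagsAfter),
              "carving out a reserved bit should not change the size");
```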
Delta  File
+4 -2  openmp/runtime/src/kmp.h
+4 -2  1 file

LLVM/project 845da6f llvm/test/CodeGen/AArch64/GlobalISel legalize-min-max-crash.mir

Update MIR test
Delta  File
+2 -1  llvm/test/CodeGen/AArch64/GlobalISel/legalize-min-max-crash.mir
+2 -1  1 file

LLVM/project 347512f libcxx/docs/ReleaseNotes 22.rst, libcxx/include fstream

[libc++] Revert fstream::read optimizations (#168894)

This causes various runtime failures, as reported in #168628.

This reverts both #165223 and #167779.
Delta  File
+0 -43  libcxx/test/benchmarks/streams/fstream.bench.cpp
+25 -0  libcxx/test/benchmarks/streams/ofstream.bench.cpp
+0 -19  libcxx/include/fstream
+2 -2  libcxx/docs/ReleaseNotes/22.rst
+1 -1  libcxx/test/libcxx/input.output/file.streams/fstreams/filebuf/traits_mismatch.verify.cpp
+1 -1  libcxx/test/libcxx/input.output/file.streams/fstreams/traits_mismatch.verify.cpp
+29 -66  6 files

LLVM/project 1bcb775 llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp GCNSubtarget.h, llvm/test/CodeGen/AMDGPU memory-legalizer-buffer-atomics.ll spillv16.ll

[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated

This applies to all volatile accesses, as well as to buffer operations.
Delta  File
+435 -0  llvm/test/CodeGen/AMDGPU/memory-legalizer-buffer-atomics.ll
+16 -3  llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+14 -0  llvm/test/CodeGen/AMDGPU/spillv16.ll
+6 -3  llvm/lib/Target/AMDGPU/GCNSubtarget.h
+6 -0  llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll
+5 -0  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+482 -6  13 files not shown
+508 -6  19 files

LLVM/project 274a4c0 libcxx/include/__bit has_single_bit.h

[libc++] Optimize std::has_single_bit (#133063)

Clang translates most implementations of has_single_bit to
`(v ^ (v-1)) > v-1`, except the one definition libc++ actually uses.

Proof of correctness: https://godbolt.org/z/d61bxW4r1

(Could also be fixed by teaching Clang to optimize better, but making
source match output feels clearer to me. And it improves unoptimized
performance.)
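A small check of the identity, written as a sketch rather than libc++'s code: for an unsigned `v`, `v ^ (v - 1)` sets every bit up to and including the lowest set bit of `v`. If `v` is a power of two, that value is `2v - 1`, which exceeds `v - 1`; if `v` has another, higher set bit, `v - 1` keeps that bit while the XOR does not, so the comparison fails. For `v == 0`, both sides are all ones. `hasSingleBitXor` and `popcountRef` are illustrative names.

```cpp
#include <cstdint>

// Reference: count set bits by repeatedly clearing the lowest one.
constexpr int popcountRef(std::uint32_t v) {
  int count = 0;
  while (v != 0) {
    v &= v - 1;
    ++count;
  }
  return count;
}

// The formulation from the commit message. The casts keep the comparison
// unsigned even on platforms where uint32_t would be promoted to int.
constexpr bool hasSingleBitXor(std::uint32_t v) {
  return static_cast<std::uint32_t>(v ^ (v - 1)) >
         static_cast<std::uint32_t>(v - 1);
}

static_assert(!hasSingleBitXor(0u), "zero has no set bit");
static_assert(hasSingleBitXor(1u) && hasSingleBitXor(64u), "powers of two");
static_assert(!hasSingleBitXor(6u), "6 = 0b110 has two set bits");
static_assert(hasSingleBitXor(0x80000000u), "top bit only");
```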
Delta  File
+1 -1  libcxx/include/__bit/has_single_bit.h
+1 -1  1 file

LLVM/project 622f72f clang/lib/CodeGen CGOpenMPRuntime.cpp, clang/test/OpenMP target_firstprivate_pointer_codegen.cpp target_map_codegen_01.cpp

[OpenMP] Fix firstprivate pointer handling in target regions (#167879)

Firstprivate pointers in OpenMP target regions were not being lowered
correctly, causing the runtime to perform unnecessary present table
lookups instead of passing pointer values directly.

This patch adds the OMP_MAP_LITERAL flag for firstprivate pointers,
enabling the runtime to pass pointer values directly without lookups.
The fix handles both explicit firstprivate clauses and implicit
firstprivate semantics from defaultmap clauses.

Key changes:
- Track defaultmap(firstprivate:...) clauses in MappableExprsHandler
- Add isEffectivelyFirstprivate() to check both explicit and implicit
firstprivate semantics
- Apply OMP_MAP_LITERAL flag to firstprivate pointers in
generateDefaultMapInfo()

Map type values:

    [21 lines not shown]
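For orientation, the two source forms the fix covers look like this. A usage sketch with made-up function names: with this patch, clang is expected to pass `p` to the target region by value (with the OMP_MAP_LITERAL flag) in both cases instead of looking it up in the present table. Without `-fopenmp` the pragmas are ignored and both functions simply return the pointer value; with OpenMP but no offload device, the regions typically fall back to the host.

```cpp
#include <cstdint>

// Explicit firstprivate clause on the pointer.
std::uintptr_t explicitFirstprivate(int *p) {
  std::uintptr_t out = 0;
#pragma omp target firstprivate(p) map(from : out)
  { out = reinterpret_cast<std::uintptr_t>(p); }
  return out;
}

// Implicit firstprivate: the defaultmap clause makes pointers referenced
// in the region firstprivate instead of mapping what they point to.
std::uintptr_t implicitFirstprivate(int *p) {
  std::uintptr_t out = 0;
#pragma omp target defaultmap(firstprivate : pointer) map(from : out)
  { out = reinterpret_cast<std::uintptr_t>(p); }
  return out;
}
```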
Delta  File
+169 -0  clang/test/OpenMP/target_firstprivate_pointer_codegen.cpp
+64 -12  clang/lib/CodeGen/CGOpenMPRuntime.cpp
+3 -3  clang/test/OpenMP/target_map_codegen_01.cpp
+2 -2  clang/test/OpenMP/target_defaultmap_codegen_01.cpp
+2 -2  clang/test/OpenMP/target_teams_distribute_simd_depend_codegen.cpp
+2 -2  clang/test/OpenMP/target_map_codegen_09.cpp
+242 -21  12 files not shown
+256 -34  18 files

LLVM/project 41a9df2 clang/docs ReleaseNotes.rst, clang/include/clang/AST JSONNodeDumper.h

[Clang] Fix handling of explicit parameters in `SemaLambda` (#168558)

Previously, the presence of an explicit parameter list was detected by
querying `getNumTypeObjects()` from the `Declarator` block of the lambda
definition. This breaks for lambdas which do not have a parameter list
but _do_ have a trailing return type; that is, both of

```
[]() -> int { return 0; };
[] -> int { return 0; };
```

would return `true` when inspecting
`LambdaExpr::hasExplicitParameters()`.

Fix this by instead querying the `LParenLoc()` from the `Declarator`'s
`FunctionTypeInfo`. If `isValid() == true`, then an explicit parameter
list must be present, and if it is `false`, then it is not.


    [6 lines not shown]
Delta  File
+3,387 -0  clang/test/AST/ast-dump-lambda-json.cpp
+9 -9  clang/lib/Sema/SemaLambda.cpp
+15 -0  clang/test/AST/ast-dump-expr-json.cpp
+4 -0  clang/lib/AST/JSONNodeDumper.cpp
+3 -0  clang/docs/ReleaseNotes.rst
+1 -0  clang/include/clang/AST/JSONNodeDumper.h
+3,419 -9  6 files

LLVM/project a8058c1 mlir/lib/IR AsmPrinter.cpp

[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in AsmPrinter.cpp (NFC)
Delta  File
+1 -1  mlir/lib/IR/AsmPrinter.cpp
+1 -1  1 file

LLVM/project 764c1d4 mlir/lib/Dialect/Affine/IR AffineOps.cpp

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in AffineOps.cpp (NFC)
Delta  File
+1 -1  mlir/lib/Dialect/Affine/IR/AffineOps.cpp
+1 -1  1 file

LLVM/project 2f627c1 llvm/include/llvm/IR IntrinsicsNVVM.td, llvm/lib/Target/NVPTX NVPTXIntrinsics.td

[NVPTX] Support for dense and sparse MMA intrinsics with block scaling. (#163561)

This change adds dense and sparse MMA intrinsics with block scaling. The
implementation is based on [PTX ISA version
9.0](https://docs.nvidia.com/cuda/parallel-thread-execution/). Tests for
new intrinsics are added for PTX 8.7 and SM 120a and are generated by
`llvm/test/CodeGen/NVPTX/wmma-ptx87-sm120a.py`. The tests have been
verified with ptxas from CUDA-13.0 release.
Dense MMA intrinsics with block scaling were supported by
@schwarzschild-radius.
Delta  File
+334 -2  llvm/test/CodeGen/NVPTX/wmma.py
+190 -1  llvm/include/llvm/IR/IntrinsicsNVVM.td
+136 -4  llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+12 -0  llvm/test/CodeGen/NVPTX/wmma-ptx88-sm120a.py
+672 -7  4 files

LLVM/project 18d3db4 mlir/lib/Dialect/Shard/IR ShardOps.cpp

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in ShardOps.cpp (NFC)
Delta  File
+1 -1  mlir/lib/Dialect/Shard/IR/ShardOps.cpp
+1 -1  1 file

LLVM/project 1bc592e llvm/test/MC/AMDGPU gfx11_asm_vop1.s, llvm/utils update_mc_test_checks.py

[Utils][update_mc_test_checks] Support generating asm tests from templates.

Reduces the pain of manually editing tests to apply the same
changes across multiple instructions and of keeping them consistent.
Delta  File
+472 -150  llvm/test/MC/AMDGPU/gfx11_asm_vop1.s
+94 -8  llvm/utils/update_mc_test_checks.py
+566 -158  2 files

LLVM/project 3406163 llvm/test/DebugInfo/RISCV relax_dwo_ranges.ll

add gen
Delta  File
+13 -15  llvm/test/DebugInfo/RISCV/relax_dwo_ranges.ll
+13 -15  1 file