LLVM/project 8557b57 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 sve-frintz.ll sve-fixed-length-frintz.ll

[DAG] Fold INT_TO_FP(FP_TO_INT(X)) to FTRUNC(X) (#198477)

Extends the `foldFPToIntToFP` DAG Combine so that it can now be applied
when `FTRUNC` has a custom lowering, provided that
`INT_TO_FP(FP_TO_INT(X))` is not already legal.

On AArch64 targets with SVE, this change simplifies the codegen of
`INT_TO_FP (FP_TO_INT (X))` conversions by making use of the `frintz`
instruction.
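As a rough illustration of why the fold is sound, the plain C++ sketch below (not LLVM code; the helper names are hypothetical) compares a round trip through a signed 64-bit integer with `std::trunc`. It assumes the input is finite and that its truncated value fits in `int64_t`; outside that range the C++ cast is undefined, so the comparison says nothing about such inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models INT_TO_FP(FP_TO_INT(x)) with a signed 64-bit intermediate.
// Assumption: x is finite and |x| < 2^63, so the cast is well defined.
double roundTripThroughInt64(double x) {
  assert(std::isfinite(x) && std::fabs(x) < 9.2e18 && "x must fit in int64_t");
  return static_cast<double>(static_cast<std::int64_t>(x));
}

// Models FTRUNC(x): round toward zero while staying in floating point.
double truncTowardZero(double x) { return std::trunc(x); }
```

For in-range inputs such as 3.75, -3.75 or 1e15 + 0.5, both functions return the same value (3.0, -3.0 and 1e15), which is why the conversion pair can be replaced by a truncation instruction such as SVE's `frintz`.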
Delta File
+128 -0 llvm/test/CodeGen/AArch64/sve-frintz.ll
+84 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-frintz.ll
+16 -35 llvm/test/CodeGen/AMDGPU/fptoui_uitofp.ll
+12 -2 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+240 -37 total, 4 files

LLVM/project 1960c7a clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode cxx23.cpp

[clang][bytecode] Reject erroneous vector conversions (#201368)
Delta File
+14 -0 clang/test/AST/ByteCode/cxx23.cpp
+3 -0 clang/lib/AST/ByteCode/Compiler.cpp
+17 -0 total, 2 files

LLVM/project 7da29bc clang/test/Preprocessor riscv-target-features.c, llvm/lib/Target/RISCV RISCVInstrInfoZvdota.td RISCVFeatures.td

[RISCV][MC] Support experimental Zvdota Family instructions (#195069)

Spec:
https://github.com/aswaterman/riscv-misc/blob/main/isa/ldot-bdot/ldot-bdot.adoc

---------

Co-authored-by: Brandon Wu <songwu0813 at gmail.com>
Co-authored-by: Craig Topper <craig.topper at sifive.com>
Delta File
+42 -0 llvm/lib/Target/RISCV/RISCVInstrInfoZvdota.td
+36 -0 clang/test/Preprocessor/riscv-target-features.c
+33 -0 llvm/test/MC/RISCV/rvv/zvfqwdota8f.s
+26 -0 llvm/lib/Target/RISCV/RISCVFeatures.td
+22 -0 llvm/test/MC/RISCV/rvv/zvqwdotai8i16.s
+22 -0 llvm/test/MC/RISCV/rvv/zvfwdota16bf.s
+181 -0 (listed), 9 files not shown
+215 -1 total, 15 files

LLVM/project b66b10e clang/test/CodeGenHLSL/builtins lerp-overloads.hlsl atan2-overloads.hlsl, llvm/test/Bitcode compatibility.ll

[IR] Add fast-math support to {u,s}itofp (#198470)

- `{u,s}itofp` produce floating-point values.
- The CodeGen part (foldFPToIntToFP in DAGCombiner) needs `nsz` to fold
the pattern (uitofp (fptoui x)) -> (ftrunc x).
- LLVM has intrinsic variants of `{u,s}itofp`, which already support
fast-math flags.
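To see why `nsz` is needed, take x = -0.5: converting it to an unsigned integer rounds toward zero and gives 0, so the round trip produces +0.0, whereas truncating toward zero in floating point gives -0.0. The two results compare equal but differ in their sign bit. The plain C++ sketch below models this; the helper names are hypothetical, and it assumes the truncated input fits in a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models (uitofp (fptoui x)) with a 32-bit unsigned intermediate.
// Assumption: -1 < x < 2^32, so the truncated value is representable.
double roundTripThroughUInt32(double x) {
  assert(x > -1.0 && x < 4294967296.0 && "truncated x must fit in uint32_t");
  return static_cast<double>(static_cast<std::uint32_t>(x));
}

// True only if a and b are equal *and* have the same sign bit, i.e. the
// stricter equality a transform must preserve when `nsz` is absent.
bool identicalIncludingSignOfZero(double a, double b) {
  return a == b && std::signbit(a) == std::signbit(b);
}
```

With `nsz`, the sign of a zero result may be ignored, so the round trip and the truncation can be treated as interchangeable.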

Optimization flags now require 9 bits in bitcode; the fast-math flags of
`uitofp` are stored in the high 8 bits.
The VPlan part may need some extra work, since it assumes optimization flags from
different categories are disjoint.
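A minimal sketch of packing a 9-bit flags field of this kind is shown below. The layout is an assumption made for illustration: bit 0 holds a non-fast-math flag (here `nneg`, which `uitofp` can carry) and bits 1 to 8 hold an 8-bit fast-math-flag field. The constants and function names are hypothetical, not LLVM's bitcode reader/writer API.

```cpp
#include <cstdint>

// Hypothetical layout: bit 0 = nneg, bits 1..8 = fast-math flags.
constexpr unsigned kNNegBit = 0;
constexpr unsigned kFMFShift = 1;
constexpr std::uint32_t kFMFMask = 0xFF;
static_assert(((kFMFMask << kFMFShift) | (1u << kNNegBit)) < (1u << 9),
              "the combined flags must fit in 9 bits");

// Packs the two flag groups into disjoint bit ranges.
constexpr std::uint32_t packCastFlags(bool nneg, std::uint8_t fmf) {
  return (nneg ? (1u << kNNegBit) : 0u) |
         (static_cast<std::uint32_t>(fmf) << kFMFShift);
}

constexpr bool unpackNNeg(std::uint32_t flags) {
  return ((flags >> kNNegBit) & 1u) != 0;
}

constexpr std::uint8_t unpackFMF(std::uint32_t flags) {
  return static_cast<std::uint8_t>((flags >> kFMFShift) & kFMFMask);
}
```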
Delta File
+132 -132 llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+48 -48 clang/test/CodeGenHLSL/builtins/lerp-overloads.hlsl
+32 -32 clang/test/CodeGenHLSL/builtins/atan2-overloads.hlsl
+32 -32 clang/test/CodeGenHLSL/builtins/step-overloads.hlsl
+32 -32 clang/test/CodeGenHLSL/builtins/pow-overloads.hlsl
+42 -0 llvm/test/Bitcode/compatibility.ll
+318 -276 (listed), 59 files not shown
+900 -794 total, 65 files

LLVM/project b09fceb llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion peel-preserve-lcssa.ll

[LoopFusion] Reform LCSSA after peelFusionCandidate's peelLoop (#200442)

peelLoop's internal simplifyLoop call requires LCSSA to be preserved
across it, but the cloned exit edges and cloned defs that peelLoop
introduces are not reflected in the existing LCSSA phis, so the contract
cannot be honoured. Pass PreserveLCSSA=false to peelLoop here and reform
LCSSA on the affected nest immediately afterward. LCSSA is expected
before and after peel+fuse, just not during it.

Caught by yarpgen fuzzing of clang -O3 -fexperimental-loop-fusion -mllvm
-loop-fusion-peel-max-count=8 on AArch64.

Fixes #199418
Delta File
+45 -0 llvm/test/Transforms/LoopFusion/peel-preserve-lcssa.ll
+5 -1 llvm/lib/Transforms/Scalar/LoopFuse.cpp
+50 -1 total, 2 files

LLVM/project c88cefb libcxx/include/__memory unique_ptr.h

[libc++] Simplify unique_ptr constructor SFINAE (#201305)

This patch does a couple of things:
- inline aliases to `__enable_if_t`s, making it easier to understand
what's actually going on
- make the `enable_if`s dependent via a `class _Deleter = deleter_type`
instead of a `bool` and `__dependent_type`, reducing the number of
instantiated classes
- remove `__unique_ptr_deleter_sfinae`
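The "dependent `enable_if`" technique can be sketched with a toy class (hypothetical names; this is not libc++'s `unique_ptr`). A constructor constrained directly on the class parameter `D` would be a hard error as soon as the class is instantiated with a deleter that fails the check; giving the constructor its own defaulted template parameter, analogous to `class _Deleter = deleter_type`, defers the check to overload resolution, where a failure merely removes that overload.

```cpp
#include <cassert>
#include <type_traits>

template <class T, class D>
class Owner {
public:
  // `Del = D` makes the condition depend on this constructor's own template
  // parameter, so for a deleter that is not default-constructible the
  // overload is silently dropped (SFINAE) rather than breaking Owner<T, D>.
  template <class Del = D,
            std::enable_if_t<std::is_default_constructible<Del>::value, int> = 0>
  Owner() : ptr_(nullptr), deleter_() {}

  Owner(T* p, D d) : ptr_(p), deleter_(d) {}

  Owner(const Owner&) = delete;
  Owner& operator=(const Owner&) = delete;

  ~Owner() {
    if (ptr_)
      deleter_(ptr_);
  }

  T* get() const { return ptr_; }

private:
  T* ptr_;
  D deleter_;
};

// A default-constructible deleter.
struct PlainDelete {
  void operator()(int* p) const { delete p; }
};

// A deleter that must be given an argument, so it has no default constructor.
struct TaggedDelete {
  explicit TaggedDelete(int t) : tag(t) {}
  void operator()(int* p) const { delete p; }
  int tag;
};
```

Because `Owner<int, TaggedDelete>` remains a valid type, it can still be built with the two-argument constructor even though its default constructor is unavailable.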
Delta File
+46 -109 libcxx/include/__memory/unique_ptr.h
+46 -109 total, 1 file

LLVM/project 3112581 mlir/lib/Dialect/Tosa/IR TosaCanonicalizations.cpp, mlir/test/Dialect/Tosa constant_folding.mlir

[mlir][tosa] Improve folder conformance to TOSA specification (#200223)

This commit fixes some bugs in TOSA folders that cause non-conformant
results. The fixes include:
- tosa.intdiv - Folding when the lhs and rhs are zero. In the TOSA
specification this is undefined behaviour.
- tosa.div_ceil_shape/tosa.div_floor_shape - Folding when the lhs is
negative or the rhs is non-positive. In the TOSA specification this is
undefined behaviour.

In addition, some test cases have been added for non-exercised code
paths, including:
- tosa.intdiv - Rejects overflow cases
- tosa.greater/tosa.greater_equal/tosa.equal - Correctly evaluates NaN
cases to False.
- tosa.cast - Saturating rounding when input is out of range of the
output type.
- tosa.mod_shape - Rejects cases where lhs is negative or rhs is
non-positive.
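The "decline to fold" pattern behind these fixes can be sketched in plain C++ as folders that return `std::nullopt` instead of inventing a constant. The function names are hypothetical and this is not the MLIR folder code; the sketch assumes `intdiv` truncates toward zero (as C++ integer division does) and that the shape divisions only need to handle non-negative dividends and positive divisors.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Integer division on int32 values. Declines to fold division by zero
// (including 0 / 0) and the one overflowing case, INT32_MIN / -1.
std::optional<std::int32_t> foldIntDiv(std::int32_t lhs, std::int32_t rhs) {
  if (rhs == 0)
    return std::nullopt;
  if (lhs == std::numeric_limits<std::int32_t>::min() && rhs == -1)
    return std::nullopt;
  return lhs / rhs;  // C++ integer division truncates toward zero
}

// Floor division on shape values; declines if lhs < 0 or rhs <= 0.
std::optional<std::int64_t> foldDivFloorShape(std::int64_t lhs, std::int64_t rhs) {
  if (lhs < 0 || rhs <= 0)
    return std::nullopt;
  return lhs / rhs;  // for lhs >= 0 and rhs > 0, truncation equals floor
}

// Ceiling division on shape values; same guard as above.
std::optional<std::int64_t> foldDivCeilShape(std::int64_t lhs, std::int64_t rhs) {
  if (lhs < 0 || rhs <= 0)
    return std::nullopt;
  // Adding the remainder check avoids overflowing lhs + rhs - 1.
  return lhs / rhs + (lhs % rhs != 0 ? 1 : 0);
}
```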
Delta File
+173 -7 mlir/test/Dialect/Tosa/constant_folding.mlir
+24 -4 mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp
+197 -11 total, 2 files

LLVM/project d226dcc llvm/lib/Target/NVPTX NVPTXISelLowering.h, llvm/test/CodeGen/NVPTX insert-vector-elt-bitcast-legalize.ll

[NVPTX] Fix illegal combineInsertEltToShuffle pattern (#198259)

Adds a condition to the `isShuffleMaskLegal` method to prevent
`combineInsertEltToShuffle` from creating an illegal `vector_shuffle`
after type legalization, which leads to a crash.

Context:
This is triggered when bitcasting a `v2f16` into a vector that type
legalizes to a `v2i32`. This happens on architectures supporting packed
`f32` operations (>= `sm_100`). In certain cases, this leads to
`combineInsertEltToShuffle` creating a vector shuffle with `v4f16` for
simplification. Since this happens after type legalization and `v4f16`
is an illegal type, it leads to a crash.
Delta File
+24 -0 llvm/test/CodeGen/NVPTX/insert-vector-elt-bitcast-legalize.ll
+4 -0 llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+28 -0 total, 2 files

LLVM/project 808b9e5 llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel prelegalizer-combiner-load-and-mask.mir prelegalizercombiner-sextload-from-sextinreg.mir

[GlobalISel] Combine sext(load), zext(load) patterns when the load has multiple uses (#182831)

Extend the existing combiners for sext(load), zext(load) patterns to also work when the load has multiple uses.
Delta File
+122 -0 llvm/test/CodeGen/AArch64/GlobalISel/prelegalizer-combiner-load-and-mask.mir
+59 -1 llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-sextload-from-sextinreg.mir
+17 -8 llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+7 -7 llvm/test/CodeGen/AMDGPU/ctlz_zero_poison.ll
+4 -4 llvm/test/CodeGen/AMDGPU/cttz.ll
+1 -2 llvm/test/CodeGen/AMDGPU/cttz_zero_poison.ll
+210 -22 (listed), 1 file not shown
+211 -23 total, 7 files

LLVM/project 6fd4b82 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-chained.ll partial-reduce-usabs.ll

[LV] Optimize partial reduction extends before handling inloop subs (#199665)

The crash avoided in #194660 was caused by the extend optimizations
failing to match due to the extra sub/negation added to the
"ExtendedOp".

A similar crash exists for [us]abs partial reductions (see
https://godbolt.org/z/MerMon5rE), which is fixed with this patch.

This patch solves the underlying issue by running the extend
optimizations before any inloop sub/fsub handling.

Fixes #194000
Delta File
+70 -66 llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+60 -0 llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-usabs.ll
+3 -6 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+133 -72 total, 3 files

LLVM/project b0cb331 clang-tools-extra/clang-tidy/readability BracesAroundStatementsCheck.cpp, clang-tools-extra/clang-tidy/utils BracesAroundStatement.cpp

[clang-tidy] Avoid brace fix-it crash in macro body expansion (#198788)

`readability-braces-around-statements` could assert when diagnosing an
unbraced statement that ends in the middle of a macro body expansion. It
would be hard/unsafe to give fix-its for such cases, so treat them as
diagnostic-only.
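A minimal example of the problematic shape, using a hypothetical macro: the macro expands to two statements, so the unbraced `if` body ends after the first one, inside the expansion. Wrapping the whole macro call in braces would change behaviour, and the brace that preserves behaviour would have to go inside the macro definition, which is why a fix-it is not safe here.

```cpp
// Hypothetical macro that expands to two statements.
#define INCREMENT_TWICE(counter) ++(counter); ++(counter)

// Only the first `++(counter);` of the expansion is controlled by the `if`;
// the second one always runs.
constexpr int incrementIf(bool cond) {
  int counter = 0;
  if (cond)
    INCREMENT_TWICE(counter);
  return counter;
}
```

`incrementIf(false)` returns 1 and `incrementIf(true)` returns 2, showing that the unbraced body covers only part of the expansion.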

Closes https://github.com/llvm/llvm-project/issues/198711
Delta File
+9 -1 clang-tools-extra/clang-tidy/utils/BracesAroundStatement.cpp
+8 -0 clang-tools-extra/test/clang-tidy/checkers/readability/braces-around-statements.cpp
+8 -0 clang-tools-extra/test/clang-tidy/checkers/readability/braces-around-statements-same-line.cpp
+5 -0 clang-tools-extra/docs/ReleaseNotes.rst
+2 -1 clang-tools-extra/clang-tidy/readability/BracesAroundStatementsCheck.cpp
+32 -2 total, 5 files

LLVM/project f4db038 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU dynamic_stackalloc.ll amdgpu-cs-chain-fp-nosave.ll

[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the readfirstlane up one instruction

Instead of:

```
$max_size_vgpr = wave_reduction_umax($vgpr_alloca_size)
$sgpr_newsp = readfirstlane($max_size_vgpr + $sgpr_sp)
```

Hoist the readfirstlane up to perform the addition using scalar
registers:

```
$max_size_sgpr = readfirstlane(wave_reduction_umax($vgpr_alloca_size))
$sgpr_newsp = $max_size_sgpr + $sgpr_sp
```
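The two orderings agree because the reduction leaves the same maximum in every lane and the stack pointer is already uniform, so reading one lane before or after the add gives the same value; hoisting only moves the add onto scalar registers. The simplified C++ model below illustrates this; it treats a VGPR as one value per lane, assumes all lanes are active, and uses hypothetical helper names rather than AMDGPU APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One 32-bit value per lane; all lanes are assumed active.
using Vgpr = std::vector<std::uint32_t>;

// wave_reduction_umax: every lane receives the maximum across the wave.
Vgpr waveReductionUMax(const Vgpr& v) {
  assert(!v.empty());
  const std::uint32_t m = *std::max_element(v.begin(), v.end());
  return Vgpr(v.size(), m);
}

// readfirstlane: copy the first active lane's value (lane 0 here, since all
// lanes are assumed active) into a scalar.
std::uint32_t readFirstLane(const Vgpr& v) {
  assert(!v.empty());
  return v.front();
}

// Before: per-lane (vector) add of the scalar SP, then readfirstlane.
std::uint32_t newSpBefore(const Vgpr& allocaSize, std::uint32_t sp) {
  Vgpr sum = waveReductionUMax(allocaSize);
  for (std::uint32_t& lane : sum)
    lane += sp;
  return readFirstLane(sum);
}

// After: readfirstlane of the reduction, then one scalar add.
std::uint32_t newSpAfter(const Vgpr& allocaSize, std::uint32_t sp) {
  const std::uint32_t maxSize = readFirstLane(waveReductionUMax(allocaSize));
  return maxSize + sp;
}
```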
Delta File
+180 -210 llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+36 -49 llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
+5 -7 llvm/test/CodeGen/AMDGPU/llvm.sponentry.ll
+5 -6 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+226 -272 total, 4 files

LLVM/project ca085af llvm/test/CodeGen/X86 freeze-vector.ll vector-shuffle-combining-avx512bwvl.ll

[SelectionDAG] Pre-commit tests for dagcombine improvements (#201270)

I've got a stack of dagcombine improvements that together make an
infinite cycle relating to freeze insertion in vector-manipulation IR.
Here we have

- Handling freeze(undef) in demanded-elts for shufflevector
- Improvements to noundef checks for bitcast, concat, and select
- Improvements to extract(concat), extract(extract), and extract(insert)
handling

Even though the regression I'm fixing is an AMDGPU one, these tests are
mainly X86 because the AMDGPU calling convention makes it hard to
demonstrate the folds I'm adding.

AI note: I got an LLM to find most of these tests, especially some of
the fiddly ones that needed control flow.
Delta File
+230 -0 llvm/test/CodeGen/X86/freeze-vector.ll
+73 -0 llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
+64 -0 llvm/test/CodeGen/X86/madd.ll
+36 -0 llvm/test/CodeGen/X86/freeze-fp.ll
+403 -0 total, 4 files

LLVM/project b12cd6a lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime AppleObjCClassDescriptorV2.cpp AppleObjCClassDescriptorV2.h

[lldb] Use batched memory reads in ClassDescriptorV2::relative_list_entry_t (#201284)

This reduces the number of memory reads performed when reading Objective-C
class metadata.
Note: these addresses are indeed sequential (with a small offset between
them), but there are so many of them that they would not fit into a
single Process::ReadMemory cache line, so this is still a win. It
also puts the code into the right shape for vectorizing the next read in
the same loop, which will see the biggest savings.
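The idea of batching can be sketched with a hypothetical in-memory "process" that counts how many read requests it serves; none of these names are LLDB APIs. Reading N entries one at a time costs N requests, while handing all N address ranges over at once costs one, even though the bytes returned are identical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Range = std::pair<std::size_t, std::size_t>;  // (address, size)

// Hypothetical memory source that counts the requests it serves.
class FakeProcess {
public:
  explicit FakeProcess(Bytes memory) : memory_(std::move(memory)) {}

  // One request per contiguous range.
  Bytes read(std::size_t addr, std::size_t size) {
    ++requests_;
    return slice(addr, size);
  }

  // One request that services many ranges.
  std::vector<Bytes> readBatch(const std::vector<Range>& ranges) {
    ++requests_;
    std::vector<Bytes> out;
    out.reserve(ranges.size());
    for (const Range& r : ranges)
      out.push_back(slice(r.first, r.second));
    return out;
  }

  std::size_t requests() const { return requests_; }

private:
  Bytes slice(std::size_t addr, std::size_t size) const {
    assert(addr + size <= memory_.size());
    auto first = memory_.begin() + static_cast<std::ptrdiff_t>(addr);
    return Bytes(first, first + static_cast<std::ptrdiff_t>(size));
  }

  Bytes memory_;
  std::size_t requests_ = 0;
};

// Reads `count` entries of `size` bytes spaced `stride` bytes apart.
std::vector<Bytes> readEntriesOneByOne(FakeProcess& p, std::size_t base,
                                       std::size_t stride, std::size_t size,
                                       std::size_t count) {
  std::vector<Bytes> out;
  for (std::size_t i = 0; i < count; ++i)
    out.push_back(p.read(base + i * stride, size));
  return out;
}

// Same entries, but all ranges are collected first and read in one request.
std::vector<Bytes> readEntriesBatched(FakeProcess& p, std::size_t base,
                                      std::size_t stride, std::size_t size,
                                      std::size_t count) {
  std::vector<Range> ranges;
  for (std::size_t i = 0; i < count; ++i)
    ranges.emplace_back(base + i * stride, size);
  return p.readBatch(ranges);
}
```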
Delta File
+33 -30 lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.cpp
+4 -3 lldb/source/Plugins/LanguageRuntime/ObjC/AppleObjCRuntime/AppleObjCClassDescriptorV2.h
+37 -33 total, 2 files

LLVM/project 79cf6b6 llvm/lib/Target/AArch64 AArch64RegisterInfo.td

fixup! Address CR comments
Delta File
+10 -19 llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+10 -19 total, 1 file

LLVM/project bb138cb llvm/lib/Target/AArch64 AArch64RegisterInfo.td SMEInstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Address more CR comments
Delta File
+6 -8 llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+10 -1 llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+2 -2 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+1 -1 llvm/lib/Target/AArch64/SMEInstrFormats.td
+19 -12 total, 4 files

LLVM/project c411947 llvm/lib/Target/AArch64 AArch64RegisterInfo.td SMEInstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Restrict luti6 (4 regs, 8-bit) to 0 <= Zn <= 7

The `luti6` instruction (table, four registers, 8-bit) should only
allow `0 <= Zn <= 7`, since the Zn field is only 3 bits wide. However, it
currently accepts:
```
   luti6 { z0.b - z3.b }, zt0, { z8 - z10 }
```
which produces a duplicate encoding to the following:
```
   luti6 { z0.b - z3.b }, zt0, { z0 - z2 }
```
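The duplicate encoding follows from the field width: three bits can only represent 0 to 7, so if the register number is reduced to fit, z8 lands on the same bits as z0. A minimal sketch with hypothetical names, assuming the Zn register number is stored directly in the 3-bit field:

```cpp
#include <cassert>
#include <optional>

constexpr unsigned kZnFieldBits = 3;
constexpr unsigned kZnMax = (1u << kZnFieldBits) - 1;  // 7

// What a 3-bit field ends up holding if the register number is simply
// masked: z8 and z0 both become 0b000, i.e. the duplicate encoding.
constexpr unsigned maskToField(unsigned zn) { return zn & kZnMax; }

// Checked encoder: accepts only z0..z7, so no two registers share an encoding.
std::optional<unsigned> encodeZn(unsigned zn) {
  if (zn > kZnMax)
    return std::nullopt;
  return zn;
}
```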

Fix TableGen so that Zn is only accepted in the correct range of 0 to 7.
Delta File
+15 -0 llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+5 -0 llvm/test/MC/AArch64/SME2p3/luti6-diagnostics.s
+4 -0 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+1 -1 llvm/lib/Target/AArch64/SMEInstrFormats.td
+25 -1 total, 4 files

LLVM/project b659948 clang/lib/CIR/CodeGen CIRGenExprComplex.cpp, clang/test/CIR/CodeGen complex-atomic-cast.c

[CIR] Fix Complex Casting with Atomic qualifier (#201424)

Apply Complex Casting with Atomic qualifier fixes from PRs #172210 and
#172163

Issue https://github.com/llvm/llvm-project/issues/192331
Delta File
+137 -0 clang/test/CIR/CodeGen/complex-atomic-cast.c
+13 -8 clang/lib/CIR/CodeGen/CIRGenExprComplex.cpp
+150 -8 total, 2 files

LLVM/project fbd4509 lldb/docs/resources build.rst build.md, llvm/test/CodeGen/AMDGPU splitkit-getsubrangeformask-phi-extend.ll

Merge branch 'main' into users/krzysz00/pre-commit-rocm-llvm-2616-fix-tests
Delta File
+1,730 -0 llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask-phi-extend.ll
+0 -755 lldb/docs/resources/build.rst
+722 -0 lldb/docs/resources/build.md
+0 -721 lldb/docs/resources/test.rst
+698 -0 lldb/docs/resources/test.md
+0 -602 lldb/docs/resources/debugging.rst
+3,150 -2,078 (listed), 242 files not shown
+11,665 -6,097 total, 248 files

LLVM/project 6019583 llvm/utils/TableGen GlobalISelCombinerEmitter.cpp GlobalISelEmitter.cpp, llvm/utils/TableGen/Common/GlobalISel/MatchTable Matchers.cpp Matchers.h

[NFC][GlobalISel] Refactor ownership of InstructionMatchers (#200798)

- Clarify that the array of InstructionMatchers in the RuleMatcher is
for the roots only.
- Let RuleMatcher own all of the InstructionMatchers used for/by
predicates. They are all kept in an array in which the index of each
InstructionMatcher is equal to its InsnID, which eliminates some
redundant tracking.
- Remove duplicate tracking of InsnID from RuleMatcher;
InstructionMatcher does it on its own already.
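The "index equals InsnID" arrangement can be illustrated with toy classes (hypothetical names and members, not the actual TableGen `RuleMatcher`/`InstructionMatcher` interface): the rule owns every matcher in one array, and a matcher's ID is its position in that array, so looking one up is plain indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-ins, not the TableGen classes.
struct ToyInstructionMatcher {
  explicit ToyInstructionMatcher(unsigned id) : InsnID(id) {}
  unsigned InsnID;
};

class ToyRuleMatcher {
public:
  // Creates a matcher whose ID is its position in the owning array.
  ToyInstructionMatcher& addInstructionMatcher() {
    const unsigned id = static_cast<unsigned>(Matchers.size());
    Matchers.push_back(std::make_unique<ToyInstructionMatcher>(id));
    return *Matchers.back();
  }

  // Lookup by ID is plain indexing; no separate ID map is needed.
  ToyInstructionMatcher& getInstructionMatcher(unsigned insnID) {
    assert(insnID < Matchers.size());
    return *Matchers[insnID];
  }

  std::size_t size() const { return Matchers.size(); }

private:
  // Owns every matcher; unique_ptr keeps references stable as it grows.
  std::vector<std::unique_ptr<ToyInstructionMatcher>> Matchers;
};
```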

Co-authored-by: Pierre-vh <29600849+Pierre-vh at users.noreply.github.com>
Delta File
+48 -64 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.cpp
+29 -43 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.h
+3 -3 llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+2 -2 llvm/utils/TableGen/GlobalISelEmitter.cpp
+82 -112 total, 4 files

LLVM/project 06a0c06 llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp, llvm/test/CodeGen/AMDGPU memory-legalizer-non-volatile.mir

Comments
Delta File
+3 -3 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+1 -1 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+4 -4 total, 2 files

LLVM/project 8455183 llvm/utils/TableGen GlobalISelEmitter.cpp GlobalISelCombinerEmitter.cpp, llvm/utils/TableGen/Common/GlobalISel/MatchTable Matchers.cpp Matchers.h

[GlobalISel] Do not depend on the RuleMatcher at MatchTable emission (#200799)

Some PredicateMatchers/MatchAction/OperandRenderers relied on accessing
RuleMatcher at emission as a crutch.
Instead, make these classes collect all necessary information in the
constructor so the `emit` methods don't depend on RuleMatcher anymore.

The primary motivation for this is that I've been looking at ways to optimize the MatchTable better,
and the fact that Predicates/Actions/Renderers are not "pure" objects, in the sense that they keep
accessing a bunch of data all over the place even as late as emission, was a consistent pain.

This is NFCI. There are no changes to any of the match tables for AMDGPU/AArch64 in this patch.

This patch has a bunch of noise due to function signature changes so I'll highlight the following interesting changes:
- `SameOperandMatcher` needed a bit of an update in its `canHoistOutsideOf` function. I had to rewrite it
  but I think the end result is the same.
- `EraseInstAction` has been updated as well, and its users in both Combiner/ISel backends have been updated too.
  Instead of ignoring this action if the Inst was already erased, it's now the responsibility of the
  builder to never insert it in the first place. `BuildMIAction` had a small update because of that too.

    [4 lines not shown]
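The shape of the change can be sketched with a hypothetical predicate class (not one of the actual MatchTable classes, and the emitted text is made up): the constructor captures everything `emit` needs, so emission is a pure function of the object's own fields and does not consult any rule-level state.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical predicate: all data used by emit() is captured up front.
class CheckOpcodePredicate {
public:
  CheckOpcodePredicate(unsigned insnVarID, std::string opcode)
      : InsnVarID(insnVarID), Opcode(std::move(opcode)) {}

  // Depends only on members, so it can run without any rule in scope.
  void emit(std::ostream& os) const {
    os << "CheckOpcode(MI" << InsnVarID << ", " << Opcode << ")\n";
  }

  std::string emitToString() const {
    std::ostringstream os;
    emit(os);
    return os.str();
  }

  // Being a plain value also makes predicates easy to compare, e.g. when
  // looking for duplicates while optimizing a table.
  bool operator==(const CheckOpcodePredicate& other) const {
    return InsnVarID == other.InsnVarID && Opcode == other.Opcode;
  }

private:
  unsigned InsnVarID;
  std::string Opcode;
};
```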
Delta File
+109 -194 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.cpp
+161 -134 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.h
+21 -19 llvm/utils/TableGen/GlobalISelEmitter.cpp
+7 -5 llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+298 -352 total, 4 files

LLVM/project a4d1cb4 llvm/utils/TableGen GlobalISelCombinerEmitter.cpp GlobalISelEmitter.cpp, llvm/utils/TableGen/Common/GlobalISel/MatchTable Matchers.cpp Matchers.h

[NFC][GlobalISel] Refactor ownership of InstructionMatchers (#200798)

- Clarify that the array of InstructionMatchers in the RuleMatcher is for the roots only.
- Let RuleMatcher own all of the InstructionMatcher used for/by predicates.
They are all kept in an array in which the index of the InstructionMatcher is equal to its
InsnID, which eliminates some redundant tracking.
- Remove duplicate tracking of InsnID from RuleMatcher; InstructionMatcher does it on its own already.
Delta File
+48 -64 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.cpp
+29 -43 llvm/utils/TableGen/Common/GlobalISel/MatchTable/Matchers.h
+3 -3 llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+2 -2 llvm/utils/TableGen/GlobalISelEmitter.cpp
+82 -112 total, 4 files

LLVM/project b47d4bc llvm/docs AMDGPUUsage.rst, llvm/test/CodeGen/AMDGPU memory-legalizer-non-volatile.ll memory-legalizer-non-volatile.mir

Restack + comments
Delta File
+2 -14 llvm/docs/AMDGPUUsage.rst
+4 -4 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.ll
+1 -1 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+7 -19 total, 3 files

LLVM/project 2f1c759 llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp SIInstrInfo.h

[AMDGPU][SIMemoryLegalizer] Consider scratch operations as NV=1 if GAS is disabled

- Clarify that `thread-private` MMO flag is still useful.
- If GAS is not enabled (which is the default as of the previous patch), consider an op as `NV=1` if it's a `scratch_` opcode, or if the MMO is in the private AS.
- Add tests for the new cases.
- Update AMDGPUUsage GFX12.5 memory model
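The new rule can be read as a small predicate. The sketch below is a hypothetical model, not `SIMemoryLegalizer` code: it assumes the AMDGPU private (scratch) address space is number 5, and it covers only this rule, not other ways an access may be marked `NV=1`, such as the `thread-private` MMO flag.

```cpp
// Assumption: AMDGPU's private (scratch) address space number.
constexpr unsigned kPrivateAddressSpace = 5;

struct MemOpInfo {
  bool isScratchOpcode;      // e.g. a scratch_ load or store
  unsigned mmoAddressSpace;  // address space recorded on the memory operand
};

// Whether this rule lets the access be treated as NV=1. With globally
// addressable scratch (GAS) disabled, scratch memory can be assumed to be
// thread-private.
constexpr bool scratchRuleAllowsNV(const MemOpInfo& op, bool gasEnabled) {
  return !gasEnabled &&
         (op.isScratchOpcode || op.mmoAddressSpace == kPrivateAddressSpace);
}
```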
Delta File
+181 -0 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+75 -36 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.ll
+13 -6 llvm/docs/AMDGPUUsage.rst
+14 -3 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+3 -1 llvm/lib/Target/AMDGPU/SIInstrInfo.h
+286 -46 total, 5 files

LLVM/project b44c2a4 llvm/test/CodeGen/AMDGPU memory-legalizer-non-volatile.mir

Fix MIR test
Delta File
+3 -3 llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+3 -3 total, 1 file

LLVM/project 1d88b83 llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU GCNSubtarget.cpp AMDGPU.td

Comments
Delta File
+74 -64 llvm/docs/AMDGPUUsage.rst
+9 -0 llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+1 -7 llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -4 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+1 -1 llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll
+1 -1 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+87 -77 (listed), 8 files not shown
+95 -85 total, 14 files

LLVM/project 6c33fec llvm/test/CodeGen/AMDGPU memory-legalizer-private-singlethread.ll memory-legalizer-private-wavefront.ll

Rebase
Delta File
+1,994 -950 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+1,994 -950 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+1,994 -950 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+1,971 -939 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+1,971 -939 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+1,879 -899 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+11,803 -5,627 total, 6 files

LLVM/project a0fe215 llvm/test/CodeGen/AMDGPU memory-legalizer-private-agent.ll memory-legalizer-private-system.ll

[AMDGPU] Make globally-addressable-scratch opt-in

This feature is meant to be opt-in for more advanced users, not default-enabled.
It may reduce performance otherwise as we can't assume private AS is thread-local
when it is enabled.

- Add `HasGloballyAddressableScratchSupport` feature to check if a target's scratch
  addressing is changed due to support for globally addressable scratch.
- Use `EnableGloballyAddressableScratch` to check whether the user opted into
  globally addressable scratch. This affects whether to lower scratch atomics as flat,
  and in the future will affect whether NV=1 can be set on scratch accesses.
Delta File
+4,816 -4,142 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+4,584 -3,938 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+4,595 -3,921 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+4,564 -3,881 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+4,412 -3,729 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+4,412 -3,729 llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+27,383 -23,340 (listed), 13 files not shown
+27,647 -23,497 total, 19 files

LLVM/project 73802c2 offload/test/offloading array_reductions.cpp multiple_reductions.cpp

[OpenMP][offload] Add enhanced array-reduction tests (#201040)
Delta File
+172 -0 offload/test/offloading/array_reductions.cpp
+2 -0 offload/test/offloading/multiple_reductions.cpp
+174 -0 total, 2 files