LLVM/project f56ddde llvm/lib/Target/ARM ARMInstrThumb2.td, llvm/test/CodeGen/Thumb2/LowOverheadLoops pr168209.ll

[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)

ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).

Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.

After the patch, TableGen guesses it has no side effects because the
added pattern uses only `arm_wlssetup` node, which has no side effects.

Add `SDNPSideEffect` to the node so that TableGen guesses the property
right, and also `hasSideEffects = 1` to the instruction in case ARM ever
sets `guessInstructionProperties` to false.
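
The inference described in this message can be modeled roughly as follows. This is a simplified Python sketch based only on the description above, not TableGen's actual implementation; the function name `guess_has_side_effects` and the dictionary of node flags are hypothetical.

```python
def guess_has_side_effects(pattern_nodes, node_has_side_effect):
    """Rough model of guessing hasSideEffects for one instruction.

    pattern_nodes: names of the nodes used by the instruction's patterns
        (empty if the instruction is not matched by any pattern).
    node_has_side_effect: maps a node name to True if the node carries a
        side-effect property such as SDNPSideEffect.
    """
    if not pattern_nodes:
        # Nothing to inspect, so guess conservatively.
        return True
    return any(node_has_side_effect.get(n, False) for n in pattern_nodes)


# Before #168209: no pattern, conservative guess.
before = guess_has_side_effects([], {})
# After #168209: the pattern uses only arm_wlssetup, which has no side effects.
after = guess_has_side_effects(["arm_wlssetup"], {"arm_wlssetup": False})
# With SDNPSideEffect added to the node.
fixed = guess_has_side_effects(["arm_wlssetup"], {"arm_wlssetup": True})
```

Setting `hasSideEffects = 1` on the instruction as well sidesteps the guess entirely, which is why the message adds both.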
Delta  File
+45 -0  llvm/test/CodeGen/Thumb2/LowOverheadLoops/pr168209.ll
+3 -1  llvm/lib/Target/ARM/ARMInstrThumb2.td
+48 -1  2 files

LLVM/project 1367515 llvm/test/CodeGen/X86/apx no-rex2-general.ll no-rex2-special.ll

test-changes
Delta  File
+8 -18  llvm/test/CodeGen/X86/apx/no-rex2-general.ll
+8 -16  llvm/test/CodeGen/X86/apx/no-rex2-special.ll
+4 -8  llvm/test/CodeGen/X86/apx/no-rex2-pseudo-x87.ll
+2 -4  llvm/test/CodeGen/X86/apx/no-rex2-pseudo-amx.ll
+22 -46  4 files

LLVM/project 16266e1 llvm/lib/Target/X86 X86InstrInfo.cpp X86InstrInfo.h

X86: Stop overriding getRegClass

This function should not be virtual; making this virtual was
an AMDGPU hack that should be removed, not spread to other
backends.

This does not need to be overridden to reserve registers. The
register reservation mechanism is orthogonal to the register
class constraints of the instruction; this should be reporting
the underlying instruction constraint. The registers are separately
reserved, so they will be removed from the allocation order anyway.
If the actual class needs to change based on the subtarget,
it should probably generalize the LookupPtrRegClass mechanism.
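
The separation argued for above can be illustrated with a toy model. This is a minimal Python sketch, not LLVM code: `allocation_order` is a hypothetical helper and the register names are placeholders rather than real X86 registers.

```python
def allocation_order(class_registers, reserved):
    """Registers of the class that the allocator may actually pick."""
    return [reg for reg in class_registers if reg not in reserved]


# The instruction constraint stays the same regardless of the subtarget.
instr_constraint = ["r0", "r1", "r2", "r3"]
# Reservation is tracked separately and filtered out afterwards.
reserved = {"r2", "r3"}
order = allocation_order(instr_constraint, reserved)
```

The class reported for the instruction is unchanged; only the derived allocation order shrinks, so no override of the class query is needed to keep reserved registers out.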

This was added by #70958. The new tests there for the class are
probably not useful anymore. These instead should compile to the
end and try to stress the allocation behavior.
Delta  File
+0 -15  llvm/lib/Target/X86/X86InstrInfo.cpp
+0 -9  llvm/lib/Target/X86/X86InstrInfo.h
+0 -24  2 files

LLVM/project 76a6816 flang/lib/Lower Runtime.cpp

[flang][NFC] replace std::exit by fir::emitFatalError in Lower/Runtime.cpp (#169050)

Delta  File
+1 -2  flang/lib/Lower/Runtime.cpp
+1 -2  1 file

LLVM/project bb2e468 compiler-rt/lib/tsan/rtl tsan_platform_mac.cpp

[TSan] [Darwin] Fix off by one in TSAN init due to MemoryRangeIsAvailable (#169008)

Delta  File
+1 -1  compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp
+1 -1  1 file

LLVM/project 4538818 clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, clang/test/OpenMP spirv_target_codegen_basic.cpp

[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)

Some targets have a specific calling convention that should be used for
generated calls to runtime functions.

Pass that down and use it.

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
Delta  File
+120 -109  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+12 -0  mlir/test/Target/LLVMIR/omptarget-runtimecc.mlir
+10 -0  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+4 -1  mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
+2 -1  clang/test/OpenMP/spirv_target_codegen_basic.cpp
+2 -0  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+150 -111  1 file not shown
+151 -111  7 files

LLVM/project 89bb99d flang/include/flang/Optimizer/OpenACC/Support FIROpenACCOpsInterfaces.h, flang/lib/Optimizer/OpenACC/Support FIROpenACCOpsInterfaces.cpp RegisterOpenACCExtensions.cpp

[acc][flang] Implement acc interface for tracking type descriptors (#168982)

FIR operations that use derived types need to have type descriptor
globals available on device when offloading. Examples of this can be
seen in `CUFDeviceGlobal` which ensures that such type descriptor uses
work on device for CUF.

Similarly, this is needed for OpenACC. This change introduces a new
interface to the OpenACC dialect named
`IndirectGlobalAccessOpInterface` which can be attached to operations
that may result in generation of accesses that use type descriptor
globals. This functionality is needed for the `ACCImplicitDeclare` pass
that is coming in a follow-up change which implicitly ensures that all
referenced globals are available in OpenACC compute contexts.

The interface provides a `getReferencedSymbols` method that collects all
global symbols referenced by an operation. When a symbol table is
provided, the implementation for FIR recursively walks type descriptor
globals to find all transitively referenced symbols.
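
The transitive collection described above amounts to a worklist traversal. The following is a minimal Python sketch of that idea only; it is not the FIR implementation, and the function signature, the dictionary-based symbol table, and the symbol names are assumptions made for illustration.

```python
def get_referenced_symbols(direct_refs, symbol_table=None):
    """Collect symbols referenced by an operation.

    direct_refs: symbols the operation names directly.
    symbol_table: optional dict mapping a global's name to the symbols
        that global references (for example, nested type descriptors).
        Without it, only the direct references are returned.
    """
    collected = []
    visited = set()
    worklist = list(direct_refs)
    while worklist:
        sym = worklist.pop()
        if sym in visited:
            continue  # Already handled; also guards against cycles.
        visited.add(sym)
        collected.append(sym)
        if symbol_table is not None:
            worklist.extend(symbol_table.get(sym, []))
    return collected


# Hypothetical type descriptor globals: the outer type refers to the inner one.
table = {"dt.outer": ["dt.inner", "dt.outer"], "dt.inner": []}
all_syms = get_referenced_symbols(["dt.outer"], table)
direct_only = get_referenced_symbols(["dt.outer"])
```

The visited set keeps the walk finite even when type descriptors refer back to themselves, as the self-reference in the example table shows.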

    [13 lines not shown]
Delta  File
+96 -0  flang/lib/Optimizer/OpenACC/Support/FIROpenACCOpsInterfaces.cpp
+23 -0  mlir/include/mlir/Dialect/OpenACC/OpenACCOpsInterfaces.td
+9 -0  flang/include/flang/Optimizer/OpenACC/Support/FIROpenACCOpsInterfaces.h
+9 -0  flang/lib/Optimizer/OpenACC/Support/RegisterOpenACCExtensions.cpp
+137 -0  4 files

LLVM/project 0b6db77 llvm/test/CodeGen/AMDGPU mfma-loop.ll a-v-flat-atomicrmw.ll

[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)

Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.
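
The difference between the two predicates can be shown with a toy classification. This is a simplified Python sketch based only on the two sentences above; the real functions operate on LLVM register and register-class objects, and the `RegClass` enum here is a hypothetical stand-in.

```python
from enum import Enum


class RegClass(Enum):
    SGPR = "sgpr"
    VGPR = "vgpr"
    AGPR = "agpr"
    AV = "av"  # Combined class covering both VGPRs and AGPRs.


def is_vector_register(rc):
    """Model of isVectorRegister: VGPR or AGPR classes only."""
    return rc in (RegClass.VGPR, RegClass.AGPR)


def has_vector_registers(rc):
    """Model of hasVectorRegisters: also accepts the combined AV classes."""
    return rc in (RegClass.VGPR, RegClass.AGPR, RegClass.AV)
```

A check written with the first predicate treats an AV-class register as non-vector, which is the kind of mismatch the fix addresses.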

Fixes: #168761
Delta  File
+2,016 -814  llvm/test/CodeGen/AMDGPU/mfma-loop.ll
+315 -327  llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll
+40 -37  llvm/test/CodeGen/AMDGPU/av-split-dead-valno-crash.ll
+34 -38  llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
+24 -24  llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
+37 -0  llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-phi-regression-av-classes.ll
+2,466 -1,240  4 files not shown
+2,472 -1,245  10 files

LLVM/project bc323b6 llvm/test/CodeGen/AMDGPU shufflevector.v4p0.v4p0.ll shufflevector.v4i64.v4i64.ll

AMDGPU: Stop implementing shouldCoalesce (#168988)

Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
Delta  File
+5,975 -8,879  llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,975 -8,879  llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,880 -6,644  llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+3,880 -6,644  llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+2,266 -3,675  llvm/test/CodeGen/AMDGPU/shufflevector.v3p0.v4p0.ll
+2,266 -3,675  llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll
+24,242 -38,396  61 files not shown
+56,404 -78,619  67 files

LLVM/project 77c329f mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td, mlir/test/Dialect/LLVMIR rocdl.mlir

[mlir][ROCDL] Adds wmma scaled intrinsics for gfx1250 (#165915)

Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
Delta  File
+152 -1  mlir/test/Target/LLVMIR/rocdl.mlir
+84 -28  mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+20 -0  mlir/test/Dialect/LLVMIR/rocdl.mlir
+256 -29  3 files

LLVM/project 8c3f59f mlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/lib/Conversion/GPUToNVVM WmmaOpsToNvvm.cpp

Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049)

Reverts llvm/llvm-project#165873

The revert is triggered by a failing integration test on a couple of
buildbots.
Delta  File
+0 -72  mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f64.mlir
+10 -42  mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
+0 -22  mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
+4 -4  mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+2 -2  mlir/test/Dialect/GPU/invalid.mlir
+2 -2  mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+18 -144  2 files not shown
+20 -146  8 files

LLVM/project 560b83c llvm/test/Instrumentation/TypeSanitizer anon.ll basic.ll

[TySan][Clang] Add clang flag to use tysan outlined instrumentation and update docs (#166170)
Delta  File
+316 -4  llvm/test/Instrumentation/TypeSanitizer/anon.ll
+231 -4  llvm/test/Instrumentation/TypeSanitizer/basic.ll
+180 -1  llvm/test/Instrumentation/TypeSanitizer/sanitize-no-tbaa.ll
+127 -10  llvm/test/Instrumentation/TypeSanitizer/basic-nosan.ll
+132 -1  llvm/test/Instrumentation/TypeSanitizer/alloca.ll
+111 -10  llvm/test/Instrumentation/TypeSanitizer/byval.ll
+1,097 -30  17 files not shown
+1,565 -159  23 files

LLVM/project 2a3e745 llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-scale-to-agpr.mir

Fix test from #168609 (#169041)

Delta  File
+1 -1  llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-scale-to-agpr.mir
+1 -1  1 file

LLVM/project b98f6a5 llvm/lib/Transforms/Vectorize VPlanUtils.cpp

[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028)

This allows us to strip an unnecessary TypeSwitch.
Delta  File
+10 -13  llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+10 -13  1 file

LLVM/project 1ced6d1 llvm/test/Analysis/Delinearization validation_parametric_sizes.ll

[Delinearization] Add test for inferred array size exceeds integer range
Delta  File
+87 -0  llvm/test/Analysis/Delinearization/validation_parametric_sizes.ll
+87 -0  1 file

LLVM/project aa02f85 llvm/include/llvm/Analysis Delinearization.h, llvm/lib/Analysis DependenceAnalysis.cpp Delinearization.cpp

[DA][Delinearization] Move validation logic into Delinearization
Delta  File
+4 -130  llvm/lib/Analysis/DependenceAnalysis.cpp
+107 -0  llvm/lib/Analysis/Delinearization.cpp
+10 -0  llvm/include/llvm/Analysis/Delinearization.h
+8 -0  llvm/test/Analysis/Delinearization/fixed_size_array.ll
+2 -0  llvm/test/Analysis/Delinearization/terms_with_identity_factor.ll
+2 -0  llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
+133 -130  13 files not shown
+150 -130  19 files

LLVM/project 31711c9 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize/AArch64 force-target-instruction-cost.ll

[VPlan] Only apply forced cost to recipes with underlying values. (#168372)

Only apply forced instruction costs to recipes with underlying values to
match the legacy cost model. A VPlan may have a number of additional
VPInstructions without underlying values that are not considered for its
cost, and assigning forced costs to them would incorrectly inflate its
cost.

This fixes a cost divergence between legacy and VPlan-based cost models
with forced instruction costs.
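
The rule can be sketched as a small cost function. This is an illustrative Python model only, not the code in VPlanRecipes.cpp; the `Recipe` class, its fields, and `plan_cost` are hypothetical, and the sketch assumes that recipes without an underlying value keep their normally computed cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipe:
    name: str
    computed_cost: int
    # None models a recipe that has no underlying IR value.
    underlying_value: Optional[str] = None


def plan_cost(recipes, forced_cost=None):
    """Sum recipe costs, forcing the cost only where an underlying value exists."""
    total = 0
    for recipe in recipes:
        if forced_cost is not None and recipe.underlying_value is not None:
            total += forced_cost
        else:
            total += recipe.computed_cost
    return total


recipes = [Recipe("load", 4, underlying_value="%l"), Recipe("extra", 1)]
unforced = plan_cost(recipes)
forced = plan_cost(recipes, forced_cost=10)
```

In this model, forcing the cost on the second recipe as well would give 20 instead of 11, which is the kind of inflation the message describes.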

PR: https://github.com/llvm/llvm-project/pull/168372
Delta  File
+53 -0  llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+8 -3  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+61 -3  2 files

LLVM/project 3005886 flang/test/Semantics/OpenMP loop-transformation-construct01.f90

[flang][OpenMP] Fix some typo-like things in test case (#168582)

Delta  File
+12 -12  flang/test/Semantics/OpenMP/loop-transformation-construct01.f90
+12 -12  1 file

LLVM/project f4ebee0 flang/lib/Semantics resolve-directives.cpp check-omp-loop.cpp, flang/test/Parser/OpenMP fuse02.f90 loop-transformation-construct05.f90

[Flang][OpenMP] Add semantic support for Loop Sequences and OpenMP loop fuse (#161213)

This patch adds semantics for the `omp fuse` directive in flang, as
specified in OpenMP 6.0. This patch also enables semantic support for
loop sequences which are needed for the fuse directive along with
semantics for the `looprange` clause. These changes are only semantic.
Relevant tests have been added, and previous behavior is retained with
no changes.

---------

Co-authored-by: Ferran Toda <ferran.todacasaban at bsc.es>
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek at amd.com>
Delta  File
+114 -106  flang/lib/Semantics/resolve-directives.cpp
+106 -33  flang/lib/Semantics/check-omp-loop.cpp
+72 -45  flang/lib/Semantics/canonicalize-omp.cpp
+97 -0  flang/test/Parser/OpenMP/fuse02.f90
+93 -0  flang/test/Semantics/OpenMP/loop-transformation-construct02.f90
+90 -0  flang/test/Parser/OpenMP/loop-transformation-construct05.f90
+572 -184  17 files not shown
+968 -221  23 files

LLVM/project a2dc4e0 llvm/test/CodeGen/AMDGPU whole-wave-functions.ll llvm.amdgcn.raw.buffer.store.ll

[AMDGPU] Enable multi-group xnack replay in hardware (GFX1250) (#169016)

This patch enables the multi-group xnack replay mode by
configuring the hardware MODE register at kernel entry.
This aligns the hardware behavior with the compiler's
existing multi-group s_wait_xcnt insertion logic.
Delta  File
+282 -281  llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+377 -12  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.store.ll
+230 -106  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.tfe.ll
+230 -106  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.load.tfe.ll
+213 -49  llvm/test/CodeGen/AMDGPU/llvm.prefetch.ll
+246 -0  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wmma.gfx1250.w32.ll
+1,578 -554  143 files not shown
+7,119 -715  149 files

LLVM/project 49995b2 mlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/lib/Conversion/GPUToNVVM WmmaOpsToNvvm.cpp

[MLIR][GPU] subgroup_mma fp64 extension (#165873)

This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type.
The extension requires special handling during the lowering to `nvvm`
due to the return type for load ops for fragment a and b (they return a
scalar instead of a struct).
Delta  File
+72 -0  mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f64.mlir
+42 -10  mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
+22 -0  mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
+4 -4  mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+2 -2  mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+2 -2  mlir/test/Dialect/GPU/invalid.mlir
+144 -18  2 files not shown
+146 -20  8 files

LLVM/project 31a552d llvm/include/llvm/Analysis LoopCacheAnalysis.h, llvm/lib/Analysis LoopCacheAnalysis.cpp

[LoopCacheAnalysis] Replace delinearization for fixed size array (#164798)

This patch replaces the delinearization function used in
LoopCacheAnalysis, switching from one that depends on type information
in GEPs to one that does not. Once this patch and
https://github.com/llvm/llvm-project/pull/161822 are landed, we can
delete `tryDelinearizeFixedSize` from Delinearization, which is an
optimization heuristic guided by GEP type information. After Polly
eliminates its use of `getIndexExpressionsFromGEP`, we will be able to
completely delete GEP-driven heuristics from Delinearization.
Delta  File
+20 -10  llvm/test/Analysis/LoopCacheAnalysis/interchange-refcost-overflow.ll
+15 -15  llvm/lib/Analysis/LoopCacheAnalysis.cpp
+2 -1  llvm/include/llvm/Analysis/LoopCacheAnalysis.h
+1 -1  llvm/test/Transforms/LoopInterchange/pr43326.ll
+38 -27  4 files

LLVM/project db5eedd llvm/include/llvm/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.h, llvm/lib/ExecutionEngine/Orc/Debugging ELFDebugObjectPlugin.cpp

[ORC] Tailor ELF debugger support plugin to load-address patching only (#168518)

In 4 years the ELF debugger support plugin wasn't adapted to other
object formats or debugging approaches. After the renaming NFC in
https://github.com/llvm/llvm-project/pull/168343, this patch tailors the
plugin to ELF and section load-address patching. It allows removing
abstractions and consolidating processing steps with the newly enabled
AllocActions from https://github.com/llvm/llvm-project/pull/168343.

The key change is to process debug sections in one place in a
post-allocation pass. Since we can now handle the endianness of the ELF file
in the single `visitSectionLoadAddresses()` visitor function, we don't
need to track debug objects and sections in template classes anymore. We
keep using the `DebugObject` class and drop `DebugObjectSection`,
`ELFDebugObjectSection<ELFT>` and `ELFDebugObject`.

Furthermore, we now use the allocation's working memory for load-address
fixups directly. We can drop the `WritableMemoryBuffer` from the debug
object and most of the `finalizeWorkingMemory()` step, which saves one

    [5 lines not shown]
Delta  File
+220 -388  llvm/lib/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.cpp
+12 -22  llvm/include/llvm/ExecutionEngine/Orc/Debugging/ELFDebugObjectPlugin.h
+232 -410  2 files

LLVM/project 6a5231e clang/lib/CodeGen CGStmtOpenMP.cpp

[clang][OpenMP][CodeGen] Use an else if instead of checking twice (#168776)

These two classes are mutually exclusive so avoid doing the two checks
when the first succeeded.
Delta  File
+3 -3  clang/lib/CodeGen/CGStmtOpenMP.cpp
+3 -3  1 file

LLVM/project e6f3cca llvm/lib/Target/RISCV RISCVSchedSpacemitX60.td, llvm/test/tools/llvm-mca/RISCV/SpacemitX60 rvv-mask.s rvv-permutation.s

[RISCV] Update SpacemiT-X60 vector mask instructions latencies (#150644)

This PR adds hardware-measured latencies for all instructions defined in
Section 15 of the RVV specification: "Vector Mask Instructions" to the
SpacemiT-X60 scheduling model.
Delta  File
+379 -379  llvm/test/tools/llvm-mca/RISCV/SpacemitX60/rvv-mask.s
+39 -39  llvm/test/tools/llvm-mca/RISCV/SpacemitX60/rvv-permutation.s
+17 -6  llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
+435 -424  3 files

LLVM/project 3422b79 llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-param-combinations.ll memintrinsic-unroll.ll

[LowerMemIntrinsics] Optimize memset lowering

This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.
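
The two-loop structure described above can be sketched on a plain byte buffer. This is only an illustrative Python model, not the IR that LowerMemIntrinsics emits; the function name `lowered_memset` and the `access_bytes` parameter (standing in for the width of the type returned by `TTI.getMemcpyLoopLoweringType`) are assumptions made for the example.

```python
def lowered_memset(buf, value, size, access_bytes=16):
    """Fill buf[0:size] with value; return the number of wide stores issued."""
    byte = value & 0xFF
    # Splat of the set value, as wide as the preferred access type.
    splat = bytes([byte]) * access_bytes

    wide_stores = 0
    main_end = size - (size % access_bytes)

    # Main loop: one wide store per iteration.
    offset = 0
    while offset < main_end:
        buf[offset:offset + access_bytes] = splat
        offset += access_bytes
        wide_stores += 1

    # Residual loop: the remaining bytes are written individually.
    while offset < size:
        buf[offset] = byte
        offset += 1

    return wide_stores


buf = bytearray(40)
stores = lowered_memset(buf, 0xAB, 37, access_bytes=16)
```

With a 16-byte access type, a 37-byte memset takes two wide stores and five byte stores in this model, rather than 37 byte stores. When the size is a compile-time constant, the residual loop could be unrolled into that fixed sequence of stores, as the message notes.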

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
Delta  File
+1,900 -0  llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
+1,616 -0  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+686 -90  llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+218 -116  llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+197 -7  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+103 -11  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
+4,720 -224  11 files not shown
+4,826 -301  17 files

LLVM/project d69320e llvm/include/llvm/Frontend/Directive DirectiveBase.td, llvm/include/llvm/Frontend/OpenMP OMP.td

[OpenMP] Introduce "loop sequence" as directive association (#168934)

OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence"
as the associated code. What used to be "loop association" has become
"loop-nest association".

Rename Association::Loop to LoopNest, add Association::LoopSeq to
represent the "loop sequence" association.

Change the association of fuse from "block" to "loop sequence".
Delta  File
+12 -12  llvm/include/llvm/Frontend/OpenMP/OMP.td
+10 -9  llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
+3 -2  llvm/test/TableGen/directive1.td
+3 -2  llvm/test/TableGen/directive2.td
+3 -2  llvm/include/llvm/Frontend/Directive/DirectiveBase.td
+2 -2  llvm/lib/Frontend/OpenMP/OMP.cpp
+33 -29  3 files not shown
+38 -34  9 files

LLVM/project fe74323 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

[AArch64] Avoid introducing illegal types in LowerVECTOR_COMPRESS (NFC) (#168520)

This does not seem to be an issue currently, but when using
VECTOR_COMPRESS as part of another lowering, I found these BITCASTs
would result in "Unexpected illegal type!" errors.

For example, this would convert the legal nxv2f32 type into the illegal
nxv2i32 type. This patch avoids this by using no-op casts for unpacked
types.
Delta  File
+45 -16  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+45 -16  1 file

LLVM/project 5ab49ed mlir/include/mlir-c IR.h, mlir/lib/Bindings/Python IRCore.cpp

[mlir][py][c] Enable setting block arg locations. (#169033)

This enables changing the location of a block argument. Follows the
approach for updating type of block arg.
Delta  File
+15 -0  mlir/test/python/ir/blocks.py
+6 -0  mlir/lib/Bindings/Python/IRCore.cpp
+5 -0  mlir/lib/CAPI/IR/IR.cpp
+4 -0  mlir/include/mlir-c/IR.h
+30 -0  4 files

LLVM/project 1763830 llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-param-combinations.ll memintrinsic-unroll.ll

[LowerMemIntrinsics] Optimize memset lowering

This patch changes the memset lowering to match the optimized memcpy lowering.
The memset lowering now queries TTI.getMemcpyLoopLoweringType for a preferred
memory access type. If that type is larger than a byte, the memset is lowered
into two loops: a main loop that stores a sufficiently wide vector splat of the
SetValue with the preferred memory access type and a residual loop that covers
the remaining bytes individually. If the memset size is statically known, the
residual loop is replaced by a sequence of stores.

This improves memset performance on gfx1030 (AMDGPU) in microbenchmarks by
around 7-20x.

I'm planning similar treatment for memset.pattern as a follow-up PR.

For SWDEV-543208.
Delta  File
+1,900 -0  llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll
+1,616 -0  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+686 -90  llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+218 -116  llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+197 -7  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+103 -11  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll
+4,720 -224  11 files not shown
+4,826 -301  17 files