LLVM/project 39d4dfb llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVISelLowering.h, llvm/test/CodeGen/RISCV vmadd-reassociate.ll

[RISCV] Incorporate scalar addends to extend vector multiply accumulate chains (#168660)

Previously, the following:
      %mul0 = mul nsw <8 x i32> %m00, %m01
      %mul1 = mul nsw <8 x i32> %m10, %m11
      %add0 = add <8 x i32> %mul0, splat (i32 32)
      %add1 = add <8 x i32> %add0, %mul1

    lowered to:
      vsetivli zero, 8, e32, m2, ta, ma
      vmul.vv v8, v8, v9
      vmacc.vv v8, v11, v10
      li a0, 32
      vadd.vx v8, v8, a0

    After this patch, it lowers to:
      li a0, 32
      vsetivli zero, 8, e32, m2, ta, ma
      vmv.v.x v12, a0

    [12 lines not shown]
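The reassociation can be checked with a small lane-wise model. This C++ sketch is an illustration, not the combine in RISCVISelLowering.cpp. It assumes 32-bit lanes that wrap modulo 2^32 (as `e32` integer lanes do), and it assumes the elided part of the new sequence chains both products into the splatted accumulator with `vmacc.vv`-style multiply-accumulates; the real output may use `vmadd` or other registers. `Vec8`, `vmacc`, `lowerBefore`, and `lowerAfter` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-lane model of <8 x i32>; uint32_t arithmetic wraps modulo
// 2^32, like e32 integer lanes.
using Vec8 = std::array<uint32_t, 8>;

// vmacc.vv-style multiply-accumulate: Acc[i] += A[i] * B[i].
void vmacc(Vec8 &Acc, const Vec8 &A, const Vec8 &B) {
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += A[I] * B[I];
}

// Old sequence: vmul.vv, vmacc.vv, then vadd.vx with the scalar addend.
Vec8 lowerBefore(const Vec8 &M00, const Vec8 &M01, const Vec8 &M10,
                 const Vec8 &M11, uint32_t Addend) {
  Vec8 Acc{};
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] = M00[I] * M01[I]; // vmul.vv
  vmacc(Acc, M10, M11);       // vmacc.vv
  for (uint32_t &Lane : Acc)
    Lane += Addend;           // vadd.vx
  return Acc;
}

// Assumed new sequence: splat the addend (vmv.v.x), then chain two
// multiply-accumulates into it.
Vec8 lowerAfter(const Vec8 &M00, const Vec8 &M01, const Vec8 &M10,
                const Vec8 &M11, uint32_t Addend) {
  Vec8 Acc;
  Acc.fill(Addend); // vmv.v.x
  vmacc(Acc, M00, M01);
  vmacc(Acc, M10, M11);
  return Acc;
}
```

Because wrapping integer addition is associative and commutative, seeding the accumulator with the splat yields the same lanes as adding the scalar at the end, which is what lets the addend join the multiply-accumulate chain.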
DeltaFile
+143 -0 llvm/test/CodeGen/RISCV/vmadd-reassociate.ll
+14 -0 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+5 -0 llvm/lib/Target/RISCV/RISCVISelLowering.h
+162 -0 3 files

LLVM/project f8a8039 llvm/include/llvm/Support ThreadPool.h

llvm: Disable copy for SingleThreadExecutor (#168782)

This is a workaround for the MSVC compiler, which attempts to generate a
copy assignment operator implementation for classes marked as
`__declspec(dllexport)`. Explicitly marking the copy assignment operator
as deleted works around the problem.

DevCom ticket:
https://developercommunity.microsoft.com/t/Classes-marked-with-__declspecdllexport/11003192
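As a generic illustration (not the actual ThreadPool.h change, whose exact lines are not shown here), explicitly deleting the copy operations looks like the following. `ExampleExecutor` is a hypothetical class, and `__declspec(dllexport)` is left out so the sketch compiles on any compiler.

```cpp
#include <type_traits>

// Hypothetical stand-in for an exported, non-copyable class. A Windows DLL
// build would also mark it __declspec(dllexport), typically via an export
// macro; that is omitted here.
class ExampleExecutor {
public:
  ExampleExecutor() = default;

  // Deleting the copy operations explicitly leaves no copy assignment
  // operator for the compiler to try to define when the class is exported.
  ExampleExecutor(const ExampleExecutor &) = delete;
  ExampleExecutor &operator=(const ExampleExecutor &) = delete;
};

static_assert(!std::is_copy_constructible<ExampleExecutor>::value,
              "copy construction is disabled");
static_assert(!std::is_copy_assignable<ExampleExecutor>::value,
              "copy assignment is disabled");
```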
DeltaFile
+6 -0 llvm/include/llvm/Support/ThreadPool.h
+6 -0 1 file

LLVM/project 55d8b63 lldb/cmake/modules LLDBConfig.cmake

[lldb] Don't enable the Limited C API with Python 3.13 and SWIG 4.4.0 (#169065)

Don't automatically enable the Limited C API when we're targeting Python
3.13 or later in combination with SWIG 4.4.0 due to a bug in the latter.

SWIG Issue: https://github.com/swig/swig/issues/3283
SWIG PR: https://github.com/swig/swig/pull/3285
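The decision can be summarized as a predicate. This C++ sketch models only the exclusion stated above; it is not the CMake logic. `Version`, `atLeast`, `sameVersion`, `autoEnableLimitedCAPI`, and the `OtherwiseEnabled` input (a stand-in for whatever other conditions LLDBConfig.cmake checks) are hypothetical, and the exact-match test on SWIG 4.4.0 is an assumption.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical version triple; the real check compares CMake version strings.
struct Version {
  int Major, Minor, Patch;
};

bool atLeast(const Version &V, const Version &Min) {
  return std::tie(V.Major, V.Minor, V.Patch) >=
         std::tie(Min.Major, Min.Minor, Min.Patch);
}

bool sameVersion(const Version &A, const Version &B) {
  return std::tie(A.Major, A.Minor, A.Patch) ==
         std::tie(B.Major, B.Minor, B.Patch);
}

// Models only the Python >= 3.13 plus SWIG 4.4.0 exclusion.
// OtherwiseEnabled stands in for the remaining, unmodeled conditions.
bool autoEnableLimitedCAPI(const Version &Python, const Version &Swig,
                           bool OtherwiseEnabled) {
  bool HitsSwigBug =
      atLeast(Python, {3, 13, 0}) && sameVersion(Swig, {4, 4, 0});
  return OtherwiseEnabled && !HitsSwigBug;
}
```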
DeltaFile
+6 -1 lldb/cmake/modules/LLDBConfig.cmake
+6 -1 1 file

LLVM/project 28c048e compiler-rt/test/profile/Linux instrprof-debug-info-correlate-warnings.ll instrprof-debug-info-correlate-warnings.c, llvm/lib/ProfileData InstrProfCorrelator.cpp

[profdata] Skip probes with missing counter and function pointers (#163254)

DeltaFile
+183 -0 compiler-rt/test/profile/Linux/instrprof-debug-info-correlate-warnings.ll
+13 -11 llvm/lib/ProfileData/InstrProfCorrelator.cpp
+0 -13 compiler-rt/test/profile/Linux/instrprof-debug-info-correlate-warnings.c
+196 -24 3 files

LLVM/project f23f78c llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll

Revert "[AMDGPU] Remove leftover implicit operands from SI_SPILL/SI_RESTORE. …"

This reverts commit b79a665f7170fbb631b13175ec747ccfd779bf9e.
DeltaFile
+1,005 -1,005 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+31 -31 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+22 -22 llvm/test/CodeGen/AMDGPU/scc-clobbered-sgpr-to-vmem-spill.ll
+21 -21 llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+20 -20 llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll
+8 -8 llvm/test/CodeGen/AMDGPU/fold-reload-into-exec.mir
+1,107 -1,107 7 files not shown
+1,121 -1,118 13 files

LLVM/project b6c2c10 clang/include/clang/AST OpenMPClause.h OpenACCClause.h, clang/lib/AST ExprObjC.cpp

[AST] Construct iterator_range with the conversion constructor (NFC) (#169004)

This patch simplifies iterator_range construction with the conversion
constructor.
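A minimal sketch of the idea, using a simplified stand-in for `llvm::iterator_range`. `MiniRange` and `Clause` are hypothetical, and LLVM's real class constrains its container constructor differently. Instead of spelling out a begin/end pair, a function can return the container and let the converting constructor pick up `begin()` and `end()`.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-in for llvm::iterator_range (hypothetical, not LLVM's code).
template <typename IteratorT> class MiniRange {
  IteratorT Begin, End;

public:
  MiniRange(IteratorT B, IteratorT E) : Begin(std::move(B)), End(std::move(E)) {}

  // Converting constructor: take begin/end straight from a container.
  // The constraint keeps it from hijacking ordinary copies of MiniRange.
  template <typename Container,
            typename = std::enable_if_t<
                !std::is_same_v<std::decay_t<Container>, MiniRange>>>
  MiniRange(Container &&C) : Begin(std::begin(C)), End(std::end(C)) {}

  IteratorT begin() const { return Begin; }
  IteratorT end() const { return End; }
};

// Hypothetical clause type holding a list of variable IDs.
struct Clause {
  std::vector<int> Vars;
  using VarRange = MiniRange<std::vector<int>::const_iterator>;

  // Before: construct the range from an explicit begin/end pair.
  VarRange varsExplicit() const { return VarRange(Vars.begin(), Vars.end()); }

  // After: the converting constructor does the work.
  VarRange varsConverted() const { return Vars; }
};
```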
DeltaFile
+48 -96 clang/include/clang/AST/OpenMPClause.h
+3 -7 clang/include/clang/AST/OpenACCClause.h
+3 -6 clang/include/clang/AST/ExprObjC.h
+2 -4 clang/include/clang/AST/ExprCXX.h
+1 -2 clang/include/clang/AST/Stmt.h
+1 -2 clang/lib/AST/ExprObjC.cpp
+58 -117 6 files

LLVM/project 2185379 llvm/include/llvm/CAS OnDiskTrieRawHashMap.h

[CAS] Remove redundant casts (NFC) (#169002)

FileOffset::get already returns uint64_t.

Identified with readability-redundant-casting.
DeltaFile
+2 -2 llvm/include/llvm/CAS/OnDiskTrieRawHashMap.h
+2 -2 1 file

LLVM/project f7e0432 llvm/lib/LTO LTOModule.cpp

[LTO] Use a range-based for loop (NFC) (#169000)

Identified with modernize-loop-convert.
DeltaFile
+3 -5 llvm/lib/LTO/LTOModule.cpp
+3 -5 1 file

LLVM/project b27749d compiler-rt/lib/scudo/standalone combined.h memtag.h, compiler-rt/lib/scudo/standalone/tests secondary_test.cpp memtag_test.cpp

[scudo] Small cleanup of memory tagging code part 2. (#168807)

Make the systemSupportsMemoryTagging() function return a result even on
systems that don't support memory tagging. This avoids the need to always
check whether memory tagging is supported before calling the function.

Modify iterateOverChunks() to call useMemoryTagging<>(Options) to
determine whether MTE is supported. That helper already uses the cached
result of systemSupportsMemoryTagging() rather than calling that
function directly.

Update the code that calls systemSupportsMemoryTagging().
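The pattern the description relies on (probe once, then read a cached flag on hot paths) can be sketched as follows. Everything here is hypothetical: `systemSupportsTaggingProbe`, `Options`, `useTagging`, and `countTaggedChunks` are illustrative names, the probe is assumed to simply return false when tagging is unavailable, and scudo's real `useMemoryTagging<>(Options)` also consults the allocator configuration, which this sketch ignores.

```cpp
#include <cassert>

// Counts how often the (hypothetical) system probe runs.
static int ProbeCalls = 0;

// Hypothetical probe: assumed to report false on systems without memory
// tagging, so callers need no separate support check first.
bool systemSupportsTaggingProbe(bool HasHardwareSupport) {
  ++ProbeCalls;
  return HasHardwareSupport;
}

// Hypothetical options word that caches the probe result at init time.
struct Options {
  bool UseMemoryTagging = false;
};

Options initOptions(bool HasHardwareSupport) {
  Options O;
  O.UseMemoryTagging = systemSupportsTaggingProbe(HasHardwareSupport);
  return O;
}

// Hot paths read the cached bit instead of probing again.
bool useTagging(const Options &O) { return O.UseMemoryTagging; }

// Stand-in for iterating over chunks: counts the chunks whose memory tag
// would be inspected.
int countTaggedChunks(const Options &O, int NumChunks) {
  int Tagged = 0;
  for (int I = 0; I < NumChunks; ++I)
    if (useTagging(O))
      ++Tagged;
  return Tagged;
}
```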
DeltaFile
+4 -6 compiler-rt/lib/scudo/standalone/combined.h
+2 -4 compiler-rt/lib/scudo/standalone/memtag.h
+1 -3 compiler-rt/lib/scudo/standalone/tests/secondary_test.cpp
+0 -1 compiler-rt/lib/scudo/standalone/tests/memtag_test.cpp
+7 -14 4 files

LLVM/project 8be4641 flang/lib/Lower Bridge.cpp, flang/test/Lower select-case-statement.f90

[flang] Use hlfir.cmpchar for SELECT CASE of chars (#168476)

For SELECT CASE with a character selector, instead of always calling the
runtime comparison function, emit hlfir.cmpchar. This behaves differently
at different optimization levels: at -O0, it still emits a flang-rt call,
but at higher optimization levels the comparison is done inline. Modify
test/Lower/select-case-statement.f90 to test both comparison cases.
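For reference, Fortran relational operators on CHARACTER values compare as if the shorter operand were padded on the right with blanks, and a `CASE (lo:hi)` range matches when `lo <= selector <= hi`. The C++ sketch below models those semantics only; it is not flang's runtime or the `hlfir.cmpchar` lowering. It assumes a byte-wise, ASCII-like collating sequence for default-kind characters, and `compareFortranChars` and `inCaseRange` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Compares two CHARACTER values as Fortran relational operators do: the
// shorter operand behaves as if blank-padded on the right.
// Returns -1, 0, or 1. Byte-wise unsigned comparison is assumed.
int compareFortranChars(std::string_view A, std::string_view B) {
  std::size_t N = A.size() > B.size() ? A.size() : B.size();
  for (std::size_t I = 0; I < N; ++I) {
    unsigned char CA = I < A.size() ? static_cast<unsigned char>(A[I]) : ' ';
    unsigned char CB = I < B.size() ? static_cast<unsigned char>(B[I]) : ' ';
    if (CA != CB)
      return CA < CB ? -1 : 1;
  }
  return 0;
}

// A CASE (Lo:Hi) range test for a character selector (hypothetical helper).
bool inCaseRange(std::string_view Sel, std::string_view Lo,
                 std::string_view Hi) {
  return compareFortranChars(Lo, Sel) <= 0 && compareFortranChars(Sel, Hi) <= 0;
}
```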
DeltaFile
+94 -48 flang/test/Lower/select-case-statement.f90
+2 -8 flang/lib/Lower/Bridge.cpp
+96 -56 2 files

LLVM/project 9d2b7ec llvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] remove getSplitIteration (#167698)

Remove getSplitIteration.
A follow-up patch will also remove DVEntry::Splitable and Dependence::isSplitable.
DeltaFile
+5 -166 llvm/lib/Analysis/DependenceAnalysis.cpp
+4 -60 llvm/include/llvm/Analysis/DependenceAnalysis.h
+2 -4 llvm/test/Analysis/DependenceAnalysis/run-specific-dependence-test.ll
+2 -4 llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll
+1 -2 llvm/test/Analysis/DependenceAnalysis/Propagating.ll
+1 -2 llvm/test/Analysis/DependenceAnalysis/SymbolicSIV.ll
+15 -238 6 files

LLVM/project e724009 lldb/unittests/Expression DWARFExpressionTest.cpp

[lldb] Add MockMemory class for dwarf expression testing (#168467)

This change unifies the way that we specify mocked memory to make it
easy to control the process and target memory contents for unit tests.
We add a MockMemory class that can be used in dwarf expression testing
to specify the output of the `ReadMemory` function.

The MockMemory class is built on a map that maps a `(address, size)`
pair to a vector of bytes that is `size` bytes long and contains the
memory contents for that `address`.

The MockProcessWithMemRead and MockTarget classes are updated to use the
new MockMemory interface. The MockProcessWithMemRead class was renamed
to MockProcess and the old MockProcess was deleted. The old MockProcess had
a ReadMemory implementation that returned the value `i & 0xff` for reading the
address `i`, and it was easily replaced with the MockMemory object.

The CreateTestContext function now takes optional values for process memory and
target memory and uses those to create the mock objects.
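A minimal sketch of a map-backed mock along the lines described above. The class shape, the `std::optional` return, and the exact-match lookup on `(address, size)` are assumptions for illustration; the real MockMemory in DWARFExpressionTest.cpp may differ in naming and error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of a map-backed memory mock for unit tests.
class MockMemory {
public:
  using Key = std::pair<uint64_t, std::size_t>; // (address, size)
  using Map = std::map<Key, std::vector<uint8_t>>;

  explicit MockMemory(Map Init) : Contents(std::move(Init)) {
    // Each entry must hold exactly `size` bytes.
    for (const auto &Entry : Contents)
      assert(Entry.second.size() == Entry.first.second);
  }

  // Returns the bytes registered for exactly (Addr, Size), or std::nullopt
  // when the test did not describe that read.
  std::optional<std::vector<uint8_t>> ReadMemory(uint64_t Addr,
                                                 std::size_t Size) const {
    auto It = Contents.find({Addr, Size});
    if (It == Contents.end())
      return std::nullopt;
    return It->second;
  }

private:
  Map Contents;
};
```

Keying on the full `(address, size)` pair keeps each test's expectations explicit: a read that the test did not describe is reported as missing rather than synthesized, unlike the deleted `i & 0xff` mock.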
DeltaFile
+122 -94 lldb/unittests/Expression/DWARFExpressionTest.cpp
+122 -94 1 file

LLVM/project ba36c48 llvm/test/CodeGen/AMDGPU shufflevector.v4p0.v4p0.ll shufflevector.v4i64.v4i64.ll, llvm/test/tools/llvm-ir2vec/output reference_x86_entities.txt

Merge branch 'main' into users/kparzysz/n02-loop-nest-parser
DeltaFile
+5,975 -8,879 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,975 -8,879 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,880 -6,644 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+3,880 -6,644 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+0 -7,157 llvm/test/tools/llvm-ir2vec/output/reference_x86_entities.txt
+2,266 -3,675 llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll
+21,976 -41,878 763 files not shown
+89,961 -99,668 769 files

LLVM/project 01227ab utils/bazel/llvm-project-overlay/llvm BUILD.bazel

[bazel][ORC] Port #168518: orc deps (#169059)

DeltaFile
+1 -0 utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+1 -0 1 file

LLVM/project f56ddde llvm/lib/Target/ARM ARMInstrThumb2.td, llvm/test/CodeGen/Thumb2/LowOverheadLoops pr168209.ll

[ARM] Restore hasSideEffects flag on t2WhileLoopSetup (#168948)

ARM relies on deprecated TableGen behavior of guessing instruction
properties from patterns (`def ARM : Target` doesn't have
`guessInstructionProperties` set to false).

Before #168209, TableGen conservatively guessed that `t2WhileLoopSetup`
has side effects because the instruction wasn't matched by any pattern.

After the patch, TableGen guesses it has no side effects because the
added pattern uses only the `arm_wlssetup` node, which has no side effects.

Add `SDNPSideEffect` to the node so that TableGen guesses the property
correctly, and also add `hasSideEffects = 1` to the instruction in case ARM
ever sets `guessInstructionProperties` to false.
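The inference described above can be modeled in a few lines. This C++ toy assumes property guessing is enabled and that only two inputs matter (whether any selection pattern exists, and whether any node in those patterns carries `SDNPSideEffect`), plus an optional explicit `hasSideEffects`. Real TableGen inference considers more properties, and `PatternNode`, `guessHasSideEffects`, and `effectiveHasSideEffects` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy model of a selection-pattern node.
struct PatternNode {
  bool HasSDNPSideEffect;
};
using Pattern = std::vector<PatternNode>;

// Guess used when the instruction leaves hasSideEffects unset.
bool guessHasSideEffects(const std::vector<Pattern> &Patterns) {
  if (Patterns.empty())
    return true; // No pattern to learn from: conservatively assume effects.
  for (const Pattern &P : Patterns)
    for (const PatternNode &N : P)
      if (N.HasSDNPSideEffect)
        return true;
  return false;
}

// An explicit hasSideEffects setting wins over any guess.
bool effectiveHasSideEffects(std::optional<bool> Explicit,
                             const std::vector<Pattern> &Patterns) {
  if (Explicit)
    return *Explicit;
  return guessHasSideEffects(Patterns);
}
```

In this model, an empty pattern list corresponds to the state before #168209, a pattern whose only node lacks `SDNPSideEffect` to the state after it, and either adding the node flag or setting the explicit property restores `true`.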
DeltaFile
+45 -0 llvm/test/CodeGen/Thumb2/LowOverheadLoops/pr168209.ll
+3 -1 llvm/lib/Target/ARM/ARMInstrThumb2.td
+48 -1 2 files

LLVM/project 1367515 llvm/test/CodeGen/X86/apx no-rex2-general.ll no-rex2-special.ll

test-changes
DeltaFile
+8 -18 llvm/test/CodeGen/X86/apx/no-rex2-general.ll
+8 -16 llvm/test/CodeGen/X86/apx/no-rex2-special.ll
+4 -8 llvm/test/CodeGen/X86/apx/no-rex2-pseudo-x87.ll
+2 -4 llvm/test/CodeGen/X86/apx/no-rex2-pseudo-amx.ll
+22 -46 4 files

LLVM/project 16266e1 llvm/lib/Target/X86 X86InstrInfo.cpp X86InstrInfo.h

X86: Stop overriding getRegClass

This function should not be virtual; making it virtual was
an AMDGPU hack that should be removed, not spread to other
backends.

This does not need to be overridden to reserve registers. The
register reservation mechanism is orthogonal to the register
class constraints of the instruction; this function should report
the underlying instruction constraint. The registers are separately
reserved, so they will be removed from the allocation order anyway.
If the actual class needs to change based on the subtarget,
the LookupPtrRegClass mechanism should probably be generalized.

This was added by #70958. The new tests added there for the class are
probably not useful anymore. Instead, tests should compile to the
end and try to stress the allocation behavior.
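The point that reservation is orthogonal to the class can be illustrated with a toy model: the class lists every register the instruction can encode, and reserved registers are filtered out only when the allocation order is built. `allocationOrder` and the register names are hypothetical; this is not LLVM's RegisterClassInfo.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model: RegClass is the full set of encodable registers (the
// instruction's constraint); Reserved is tracked separately and applied
// only when building the order the allocator will try.
std::vector<std::string> allocationOrder(const std::vector<std::string> &RegClass,
                                         const std::set<std::string> &Reserved) {
  std::vector<std::string> Order;
  for (const std::string &R : RegClass)
    if (!Reserved.count(R))
      Order.push_back(R);
  return Order;
}
```

Because the filtering happens in one place, the class reported for an instruction can stay the true encoding constraint without risking allocation of a reserved register.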
DeltaFile
+0 -15 llvm/lib/Target/X86/X86InstrInfo.cpp
+0 -9 llvm/lib/Target/X86/X86InstrInfo.h
+0 -24 2 files

LLVM/project 76a6816 flang/lib/Lower Runtime.cpp

[flang][NFC] replace std::exit by fir::emitFatalError in Lower/Runtime.cpp (#169050)

DeltaFile
+1 -2 flang/lib/Lower/Runtime.cpp
+1 -2 1 file

LLVM/project bb2e468 compiler-rt/lib/tsan/rtl tsan_platform_mac.cpp

[TSan] [Darwin] Fix off by one in TSAN init due to MemoryRangeIsAvailable (#169008)

DeltaFile
+1 -1 compiler-rt/lib/tsan/rtl/tsan_platform_mac.cpp
+1 -1 1 file

LLVM/project 4538818 clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, clang/test/OpenMP spirv_target_codegen_basic.cpp

[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)

Some targets have a specific calling convention that should be used for
generated calls to runtime functions.

Pass that down and use it.

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
DeltaFile
+120 -109 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+12 -0 mlir/test/Target/LLVMIR/omptarget-runtimecc.mlir
+10 -0 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+4 -1 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
+2 -1 clang/test/OpenMP/spirv_target_codegen_basic.cpp
+2 -0 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+150 -111 1 file not shown
+151 -111 7 files

LLVM/project 89bb99d flang/include/flang/Optimizer/OpenACC/Support FIROpenACCOpsInterfaces.h, flang/lib/Optimizer/OpenACC/Support FIROpenACCOpsInterfaces.cpp RegisterOpenACCExtensions.cpp

[acc][flang] Implement acc interface for tracking type descriptors (#168982)

FIR operations that use derived types need to have type descriptor
globals available on device when offloading. Examples of this can be
seen in `CUFDeviceGlobal` which ensures that such type descriptor uses
work on device for CUF.

Similarly, this is needed for OpenACC. This change introduces a new
interface to the OpenACC dialect named
`IndirectGlobalAccessOpInterface` which can be attached to operations
that may result in generation of accesses that use type descriptor
globals. This functionality is needed for the `ACCImplicitDeclare` pass
that is coming in a follow-up change which implicitly ensures that all
referenced globals are available in OpenACC compute contexts.

The interface provides a `getReferencedSymbols` method that collects all
global symbols referenced by an operation. When a symbol table is
provided, the implementation for FIR recursively walks type descriptor
globals to find all transitively referenced symbols.

    [13 lines not shown]
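The transitive walk can be sketched as a worklist over a symbol graph. In this hypothetical C++ model, `SymbolGraph` maps each global to the symbols its initializer references and stands in for the symbol table; passing no table returns only the direct references. `collectReferencedSymbols` and the symbol names are illustrative, not the MLIR or FIR API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: global symbol -> symbols its initializer references.
using SymbolGraph = std::map<std::string, std::vector<std::string>>;

// Starts from the symbols an operation references directly. With a table,
// follows references through globals transitively; without one, returns
// only the direct references.
std::set<std::string>
collectReferencedSymbols(const std::vector<std::string> &Direct,
                         const SymbolGraph *Table) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist(Direct.begin(), Direct.end());
  while (!Worklist.empty()) {
    std::string Sym = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(Sym).second)
      continue; // Already visited; this also stops cycles.
    if (!Table)
      continue; // No table: do not look through globals.
    auto It = Table->find(Sym);
    if (It == Table->end())
      continue; // Not a global with further references.
    for (const std::string &Next : It->second)
      Worklist.push_back(Next);
  }
  return Seen;
}
```

The visited set matters because type descriptor globals can refer back to each other, so a naive recursion could loop forever.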
DeltaFile
+96 -0 flang/lib/Optimizer/OpenACC/Support/FIROpenACCOpsInterfaces.cpp
+23 -0 mlir/include/mlir/Dialect/OpenACC/OpenACCOpsInterfaces.td
+9 -0 flang/include/flang/Optimizer/OpenACC/Support/FIROpenACCOpsInterfaces.h
+9 -0 flang/lib/Optimizer/OpenACC/Support/RegisterOpenACCExtensions.cpp
+137 -0 4 files

LLVM/project 0b6db77 llvm/test/CodeGen/AMDGPU mfma-loop.ll a-v-flat-atomicrmw.ll

[AMDGPU] Handle AV classes in SIFixSGPRCopies::processPHINode (#169038)

Fix a problem exposed by #166483 using AV classes in more places.
`isVectorRegister` only accepts registers of VGPR or AGPR classes.
`hasVectorRegisters` additionally accepts the combined AV classes.

Fixes: #168761
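A toy model of the difference between the two predicates. The `RegClassKind` enum and both functions are hypothetical stand-ins, not SIRegisterInfo's API. A check written with the narrower predicate treats an AV-class operand as non-vector, which is presumably the case the fix addresses.

```cpp
#include <cassert>

// Hypothetical register-class categories for the toy model.
enum class RegClassKind { SGPR, VGPR, AGPR, AV };

// Narrow check: only pure VGPR or pure AGPR classes count.
bool isVectorRegisterModel(RegClassKind K) {
  return K == RegClassKind::VGPR || K == RegClassKind::AGPR;
}

// Wider check: also accepts the combined AV classes, whose registers
// may end up in either VGPRs or AGPRs.
bool hasVectorRegistersModel(RegClassKind K) {
  return isVectorRegisterModel(K) || K == RegClassKind::AV;
}
```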
DeltaFile
+2,016 -814 llvm/test/CodeGen/AMDGPU/mfma-loop.ll
+315 -327 llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll
+40 -37 llvm/test/CodeGen/AMDGPU/av-split-dead-valno-crash.ll
+34 -38 llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll
+24 -24 llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
+37 -0 llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-phi-regression-av-classes.ll
+2,466 -1,240 4 files not shown
+2,472 -1,245 10 files

LLVM/project bc323b6 llvm/test/CodeGen/AMDGPU shufflevector.v4p0.v4p0.ll shufflevector.v4i64.v4i64.ll

AMDGPU: Stop implementing shouldCoalesce (#168988)

Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
DeltaFile
+5,975 -8,879 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,975 -8,879 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,880 -6,644 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+3,880 -6,644 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+2,266 -3,675 llvm/test/CodeGen/AMDGPU/shufflevector.v3p0.v4p0.ll
+2,266 -3,675 llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll
+24,242 -38,396 61 files not shown
+56,404 -78,619 67 files

LLVM/project 77c329f mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td, mlir/test/Dialect/LLVMIR rocdl.mlir

[mlir][ROCDL] Adds wmma scaled intrinsics for gfx1250 (#165915)

Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
DeltaFile
+152 -1 mlir/test/Target/LLVMIR/rocdl.mlir
+84 -28 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+20 -0 mlir/test/Dialect/LLVMIR/rocdl.mlir
+256 -29 3 files

LLVM/project 8c3f59f mlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/lib/Conversion/GPUToNVVM WmmaOpsToNvvm.cpp

Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049)

Reverts llvm/llvm-project#165873

The revert is triggered by a failing integration test on a couple of
buildbots.
DeltaFile
+0 -72 mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f64.mlir
+10 -42 mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
+0 -22 mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
+4 -4 mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+2 -2 mlir/test/Dialect/GPU/invalid.mlir
+2 -2 mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+18 -144 2 files not shown
+20 -146 8 files

LLVM/project 560b83c llvm/test/Instrumentation/TypeSanitizer anon.ll basic.ll

[TySan][Clang] Add clang flag to use tysan outlined instrumentation and update docs (#166170)
DeltaFile
+316 -4 llvm/test/Instrumentation/TypeSanitizer/anon.ll
+231 -4 llvm/test/Instrumentation/TypeSanitizer/basic.ll
+180 -1 llvm/test/Instrumentation/TypeSanitizer/sanitize-no-tbaa.ll
+127 -10 llvm/test/Instrumentation/TypeSanitizer/basic-nosan.ll
+132 -1 llvm/test/Instrumentation/TypeSanitizer/alloca.ll
+111 -10 llvm/test/Instrumentation/TypeSanitizer/byval.ll
+1,097 -30 17 files not shown
+1,565 -159 23 files

LLVM/project 2a3e745 llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-scale-to-agpr.mir

Fix test from #168609 (#169041)

DeltaFile
+1 -1 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-scale-to-agpr.mir
+1 -1 1 file

LLVM/project b98f6a5 llvm/lib/Transforms/Vectorize VPlanUtils.cpp

[VPlan] Cast to VPIRMetadata in getMemoryLocation (NFC) (#169028)

This allows us to strip an unnecessary TypeSwitch.
DeltaFile
+10 -13 llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+10 -13 1 file

LLVM/project 1ced6d1 llvm/test/Analysis/Delinearization validation_parametric_sizes.ll

[Delinearization] Add test for inferred array size exceeding integer range
DeltaFile
+87 -0 llvm/test/Analysis/Delinearization/validation_parametric_sizes.ll
+87 -0 1 file

LLVM/project aa02f85 llvm/include/llvm/Analysis Delinearization.h, llvm/lib/Analysis DependenceAnalysis.cpp Delinearization.cpp

[DA][Delinearization] Move validation logic into Delinearization
DeltaFile
+4 -130 llvm/lib/Analysis/DependenceAnalysis.cpp
+107 -0 llvm/lib/Analysis/Delinearization.cpp
+10 -0 llvm/include/llvm/Analysis/Delinearization.h
+8 -0 llvm/test/Analysis/Delinearization/fixed_size_array.ll
+2 -0 llvm/test/Analysis/Delinearization/terms_with_identity_factor.ll
+2 -0 llvm/test/Analysis/Delinearization/constant_functions_multi_dim.ll
+133 -130 13 files not shown
+150 -130 19 files