LLVM/project 9e611e8clang/include/clang/CIR/Dialect/IR CIROps.td CIRAttrs.td, clang/include/clang/CIR/Interfaces ASTAttrInterfaces.h

[CIR] Add abstract delete operation without AST attribute (#185538)

This introduces the cir.delete_array operation, adds code to emit that
operation during CIR codegen, and adds lowering of the operation to the
CXXABILowering pass.

In order to handle possible variations in the delete representation, we
add the name of the delete function, the usual delete parameters, and,
optionally, the name of the element destructor function.

During the CXXABILoweringPass, the cir.delete_array operation is
expanded to call the delete function. This will be extended in a future
change to handle reading the array cookie, if required, and calling
element destructors.
DeltaFile
+64-0clang/test/CIR/CodeGen/delete-array.cpp
+41-0clang/include/clang/CIR/Dialect/IR/CIROps.td
+28-1clang/lib/CIR/Dialect/Transforms/CXXABILowering.cpp
+24-0clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+17-3clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+1-0clang/include/clang/CIR/Interfaces/ASTAttrInterfaces.h
+175-46 files

LLVM/project 7a10495mlir/lib/Dialect/OpenACC/IR OpenACCCG.cpp

[MLIR] Fix -Wunused-variable in 41c0b19d878f2bb9b2c0a4ccb08f81da992e4fef
DeltaFile
+1-2mlir/lib/Dialect/OpenACC/IR/OpenACCCG.cpp
+1-21 files

LLVM/project 8e24cb4clang-tools-extra/clang-doc/assets comment-template.mustache

[clang-doc][NFC] Remove outdated tag in comment template (#185704)
DeltaFile
+0-5clang-tools-extra/clang-doc/assets/comment-template.mustache
+0-51 files

LLVM/project b50cf35llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU fp-min-max-num-flat-atomics.ll fp-min-max-num-global-atomics.ll

[AMDGPU][GlobalIsel] Add register bank legalization rules for amdgcn atomic fminmax num (#184564)

This patch adds register bank legalization rules for amdgcn global/flat
atomic fmin/fmax num operations in the AMDGPU GlobalISel pipeline.
DeltaFile
+145-30llvm/test/CodeGen/AMDGPU/fp-min-max-num-flat-atomics.ll
+68-26llvm/test/CodeGen/AMDGPU/fp-min-max-num-global-atomics.ll
+8-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+221-563 files

LLVM/project ab048acclang/lib/Headers __clang_cuda_runtime_wrapper.h

[clang][CUDA] Define _NV_RSQRT_SPECIFIER for glibc-2.42/cuda-13.2 compatibility (#185701)

CUDA-13.2 defines _NV_RSQRT_SPECIFIER to make its headers compilable
with glibc 2.42+. However, clang does not include the header that
defines the macro, so it has to define the macro itself.
DeltaFile
+12-0clang/lib/Headers/__clang_cuda_runtime_wrapper.h
+12-01 files

LLVM/project e94c21autils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Port 41c0b19d878f2bb9b2c0a4ccb08f81da992e4fef
DeltaFile
+1-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1-01 files

LLVM/project 1bebdfbflang/lib/Parser openmp-parsers.cpp, flang/test/Parser/OpenMP no-commas-in-ods-list-item.f90

[flang][OpenMP] Allow parsing ODS as directive-specification list item

Normally a directive specification may use commas between the directive
name and the clauses, and between the clauses. There are some instances,
however, when a directive-specification is treated as a list item.
Specifically in arguments to the APPLY clause and as an argument to WHEN,
OTHERWISE, and the now-deprecated DEFAULT when used on a METADIRECTIVE.
In those cases, use of commas is prohibited to avoid confusion between
commas being part of the directive-specification, and the argument list
separators.
DeltaFile
+122-61flang/lib/Parser/openmp-parsers.cpp
+16-0flang/test/Parser/OpenMP/no-commas-in-ods-list-item.f90
+138-612 files

LLVM/project 79c9dadllvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp, llvm/test/CodeGen/SPIRV/llvm-intrinsics powi-glsl.ll

[SPIR-V] Add lowering for G_FPOWI (#185454)

This fixes an assertion I was hitting in the fragment density map sample
in Vulkan Samples. In starfield.frag.hlsl, we have

float starCol = pow((rnd - threshhold) / (1.0 - threshhold), 16.0);

The optimizer recognizes 16.0 as a whole number and converts the call to
`llvm.powi`. The backend goes on to fail with:

fatal error: error in backend: cannot select: %46:fid(s64) = nnan ninf
nsz arcp afn reassoc G_FPOWI %44:fid, %45:iid(s64) (in function:
_Z9starFieldDv3_f)

On Vulkan, there is no integer-exponent for pow. This patch lowers it by
converting the exponent to float and calling GLSL.std.450's Pow.
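
The lowering idea above can be sketched on the host side. This is only an illustration under stated assumptions, not the backend's code: `powi_via_pow`, `powi_reference`, and `agrees_on_example` are hypothetical names, and `std::pow` stands in for GLSL.std.450's Pow, whose edge-case behavior may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not the backend's code): emulate an integer-exponent
// power by converting the exponent to float and calling a floating-point pow.
// std::pow is an assumed stand-in for GLSL.std.450's Pow.
float powi_via_pow(float base, int exponent) {
  return std::pow(base, static_cast<float>(exponent));
}

// Reference: repeated multiplication, for non-negative exponents only.
float powi_reference(float base, unsigned exponent) {
  float result = 1.0f;
  for (unsigned i = 0; i < exponent; ++i)
    result *= base;
  return result;
}

// Compare both forms for the exponent 16 used in the shader line above,
// allowing a small relative error for floating-point rounding.
bool agrees_on_example(float base) {
  float a = powi_via_pow(base, 16);
  float b = powi_reference(base, 16);
  return std::fabs(a - b) <= 1e-5f * std::fabs(b);
}

// Small self-check that can be called from any test harness.
void self_check_powi() {
  assert(agrees_on_example(0.5f));
  assert(agrees_on_example(0.75f));
}
```

Calling `self_check_powi()` runs the comparison for two sample bases.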

---------

Co-authored-by: Steven Perron <stevenperron at google.com>
DeltaFile
+43-0llvm/test/CodeGen/SPIRV/llvm-intrinsics/powi-glsl.ll
+27-1llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+70-12 files

LLVM/project 4398b5flldb/include/lldb/Utility DataExtractor.h VirtualDataExtractor.h, lldb/source/Symbol ObjectFile.cpp

[lldb] Have ObjectFile::FindPlugin send a copy of the DE (#185727)

ObjectFile::FindPlugin iterates over plugins to find one that can handle
the binary provided. It is currently sending the one DataExtractorSP to
each subclass, but some subclasses may modify this DataExtractor during
their processing, e.g. calling DataExtractor::SetData on it, and I think
it is safer to isolate these with a copy of the DataExtractor so the
order the plugins are tried cannot possibly change behavior.
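
A minimal sketch of the isolation idea, assuming simplified stand-in types: `Extractor`, `Plugin`, `find_plugin`, and `demo` are hypothetical and are not lldb's classes or the code in this change. The sketch deep-copies the data for simplicity, which the real copy may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for a data extractor; not lldb's DataExtractor.
struct Extractor {
  std::vector<unsigned char> bytes;
  void SetData(std::vector<unsigned char> b) { bytes = std::move(b); }
};
using ExtractorSP = std::shared_ptr<Extractor>;

// A plugin inspects (and may modify) the extractor it is given and
// returns true if it can handle the data.
using Plugin = std::function<bool(const ExtractorSP &)>;

// Try each plugin with its own copy, so a plugin that mutates its
// extractor cannot affect the plugins tried after it.
int find_plugin(const std::vector<Plugin> &plugins,
                const ExtractorSP &original) {
  for (std::size_t i = 0; i < plugins.size(); ++i) {
    ExtractorSP copy = std::make_shared<Extractor>(*original);
    if (plugins[i](copy))
      return static_cast<int>(i);
  }
  return -1;
}

// Example: the first plugin clobbers its data and rejects; the second
// accepts only if it still sees the original first byte.
int demo() {
  auto original = std::make_shared<Extractor>();
  original->bytes = {0x7f, 'E', 'L', 'F'};
  std::vector<Plugin> plugins = {
      [](const ExtractorSP &de) { de->SetData({}); return false; },
      [](const ExtractorSP &de) {
        return !de->bytes.empty() && de->bytes[0] == 0x7f;
      },
  };
  int index = find_plugin(plugins, original);
  assert(original->bytes.size() == 4); // the original is untouched
  return index;
}
```

In this sketch `demo()` selects the second plugin even though the first one called `SetData`, which is the order-independence the commit message is after.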
DeltaFile
+6-2lldb/source/Symbol/ObjectFile.cpp
+6-0lldb/include/lldb/Utility/DataExtractor.h
+4-0lldb/include/lldb/Utility/VirtualDataExtractor.h
+16-23 files

LLVM/project 9472490clang/docs ReleaseNotes.rst, clang/lib/Sema SemaOverload.cpp

[clang] fix explicit incomplete enum (#184210)

Stop BuildConvertedConstantExpression early for already-broken
expressions to prevent crashes in the constant conversion.
Fixes #183887.
DeltaFile
+8-0clang/test/SemaCXX/gh183887.cpp
+8-0clang/lib/Sema/SemaOverload.cpp
+1-0clang/docs/ReleaseNotes.rst
+17-03 files

LLVM/project 41c0b19mlir/include/mlir/Dialect/OpenACC/Transforms Passes.td, mlir/lib/Dialect/OpenACC/Transforms ACCComputeLowering.cpp

[mlir][acc] Add ACCComputeLowering pass (#185501)

Introduce a pass that lowers OpenACC compute constructs to a
representation that separates the data environment from the compute body
and prepares for parallelism assignment and privatization at the right
granularity.

- Decompose acc.parallel, acc.serial, and acc.kernels into
acc.kernel_environment and acc.compute_region. Launch arguments
(num_gangs, num_workers, vector_length) are turned into acc.par_width
and passed as compute_region launch operands.
- Convert acc.loop to SCF based on context: unstructured loops to
scf.execute_region; sequential (serial or seq) to scf.parallel with
par_dims=sequential; auto loops to scf.for (with collapse when
multi-dimensional); orphan loops to scf.for; independent loops in
parallel/kernels to scf.parallel with par_dims from the GPU mapping.

---------

Co-authored-by: Scott Manley <rscottmanley at gmail.com>
DeltaFile
+372-0mlir/lib/Dialect/OpenACC/Transforms/ACCComputeLowering.cpp
+176-0mlir/test/Dialect/OpenACC/acc-compute-lowering-loop.mlir
+107-0mlir/test/Dialect/OpenACC/acc-compute-lowering-compute.mlir
+75-1mlir/unittests/Dialect/OpenACC/OpenACCUtilsCGTest.cpp
+46-25mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
+53-0mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsCG.cpp
+829-269 files not shown
+936-4615 files

LLVM/project 4131535flang/test/Driver fsafe-trampoline.f90

[flang] Add REQUIRES for the trampoline test (#185699)

Instead of maintaining an UNSUPPORTED list, it makes sense to use
REQUIRES.

Fix build failure in https://lab.llvm.org/buildbot/#/builders/157/builds/45154.
DeltaFile
+1-1flang/test/Driver/fsafe-trampoline.f90
+1-11 files

LLVM/project c3155felibclc/clc/include/clc/math clc_div_cr.h, libclc/clc/lib/generic CMakeLists.txt

libclc: Add div_cr utility function

This is a workaround for the modal div operator precision. The
OpenCL default is not correctly rounded, so this provides a backdoor
to get a correctly rounded fdiv. Ideally clang would have a builtin
or some other mechanism to control the precision.
DeltaFile
+26-0libclc/clc/include/clc/math/clc_div_cr.h
+12-0libclc/clc/lib/generic/math/clc_div_cr.inc
+11-0libclc/clc/lib/generic/math/clc_div_cr.cl
+4-0libclc/clc/lib/generic/CMakeLists.txt
+53-04 files

LLVM/project 859313fclang/include/clang/Analysis/Scalable/Analyses/UnsafeBufferUsage UnsafeBufferUsage.h, clang/lib/Analysis/Scalable CMakeLists.txt

[ssaf][UnsafeBufferUsage] Add JSON serialization for UnsafeBufferUsage

Implemented and registered a JSONFormat::FormatInfo for
UnsafeBufferUsage analysis

rdar://171920065
DeltaFile
+126-5clang/unittests/Analysis/Scalable/Analyses/UnsafeBufferUsage/UnsafeBufferUsageTest.cpp
+85-0clang/lib/Analysis/Scalable/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp
+26-3clang/include/clang/Analysis/Scalable/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.h
+1-0clang/lib/Analysis/Scalable/CMakeLists.txt
+238-84 files

LLVM/project 30f13b1llvm/test/CodeGen/AMDGPU vgpr-mark-last-scratch-load.ll

[AMDGPU] New test for untested line in AMDGPUMarkLastScratchLoad (#185430)

[This
line](https://github.com/llvm/llvm-project/blob/af15474262100ade9a8fcfd05f9e05c7ba23ff8c/llvm/lib/Target/AMDGPU/AMDGPUMarkLastScratchLoad.cpp#L121)
in the AMDGPU backend is uncovered by the existing test suite (checked
using coverage, and by asserting that no tests in the existing test
suite fail if we insert an `abort()` at this line).

We propose a test that covers this line. We demonstrate the test by
inserting an `abort()` at that line in commit
[#3cb65cf](https://github.com/llvm/llvm-project/pull/185430/changes/3cb65cf4451b5e728fb1e4968ba78b8e83d74220).
Running all tests shows that only our proposed test fails in the
presence of the abort. We'll remove the abort before merging.

This is the only test that fails in the presence of the abort (our new
test) -- it will pass once we remove the abort:
`CodeGen/AMDGPU/mark-last-scratch-load.ll`
DeltaFile
+305-0llvm/test/CodeGen/AMDGPU/vgpr-mark-last-scratch-load.ll
+305-01 files

LLVM/project 122cffallvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUCoExecSchedStrategy.h, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll

[AMDGPU] Add stalls for DS FIFO buffer

Change-Id: I73e56da97a931349e0655e4e20b24aeb97920647
DeltaFile
+56-53llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+58-25llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+41-6llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+155-843 files

LLVM/project d8d9df5libc/test/integration/src/__support/GPU scan_reduce.cpp, libclc/clc/lib/amdgpu/subgroup clc_sub_group_reduce.cl sub_group_reduce.cl

Address comments

Created using spr 1.3.7
DeltaFile
+289-52llvm/test/Analysis/CostModel/AMDGPU/exp10.ll
+289-52llvm/test/Analysis/CostModel/AMDGPU/exp.ll
+153-48llvm/test/Analysis/CostModel/AMDGPU/exp2.ll
+145-0libclc/clc/lib/amdgpu/subgroup/clc_sub_group_reduce.cl
+0-145libclc/clc/lib/amdgpu/subgroup/sub_group_reduce.cl
+107-0libc/test/integration/src/__support/GPU/scan_reduce.cpp
+983-297127 files not shown
+2,697-962133 files

LLVM/project 5e2d990llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Use findAllocaInsertPoint when possible and move the affinity packing logic to OpenMPToLLVMIRTranslation

- Move the omp.affinity_list packing logic from OMPIRBuilder to
  OpenMPToLLVMIRTranslation so that we have all the omp.affinity_list
  allocating logic inside the lambda defined in buildAffinityData
  - all the allocation logic for affinity list is now using
    findAllocaInsertPoint when possible (static count)
  - `task_affinity_iterator_dynamic_tripcount` in
    openmp-iterator.mlir is a regression test added previously for
    dynamic tripcount
DeltaFile
+67-7mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+3-49llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+6-6mlir/test/Target/LLVMIR/openmp-iterator.mlir
+5-1llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+1-3llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+82-665 files

LLVM/project 55e4326llvm/lib/Target/SystemZ SystemZAsmPrinter.cpp SystemZAsmPrinter.h

Reorder code to avoid globals
DeltaFile
+6-8llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+2-3llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+8-112 files

LLVM/project 40cd48flld/test/wasm relocatable.ll

[lld][WebAssembly] Restore inactive checks in relocatable.ll test. NFC (#185569)

Back in 6474d1b20 this test was updated, removing the NORMAL vs SHARED
distinction in the output checking. However many of the NORMAL-NEXT
lines were left unmodified, making them effectively disabled.

This restores and updates the expectations.
DeltaFile
+201-173lld/test/wasm/relocatable.ll
+201-1731 files

LLVM/project 5c4856cllvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine trunc.ll

[InstCombine] Fold trunc (usub.sat 1, x) to i1 -> icmp eq x, 0 (#185524)

Regression noticed in https://github.com/llvm/llvm-project/pull/184182

Proof: https://alive2.llvm.org/ce/z/hsyFbC
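
As a complement to the Alive2 proof, the fold can be checked exhaustively for one bit width. This is a sketch that models the operations on 8-bit values in plain C++; the helper names are hypothetical and the code is not taken from InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction, modeling llvm.usub.sat on i8.
std::uint8_t usub_sat(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : 0;
}

// Truncation to i1 keeps only the lowest bit.
bool trunc_to_i1(std::uint8_t v) { return (v & 1u) != 0; }

// Returns true if trunc(usub.sat(1, x)) to i1 equals (x == 0)
// for every 8-bit value of x.
bool fold_holds_for_all_i8() {
  for (unsigned x = 0; x <= 0xFFu; ++x) {
    bool before = trunc_to_i1(usub_sat(1, static_cast<std::uint8_t>(x)));
    bool after = (x == 0);
    if (before != after)
      return false;
  }
  return true;
}

// Small self-check that can be called from any test harness.
void self_check_fold() { assert(fold_holds_for_all_i8()); }
```

The reasoning is short: `usub.sat(1, x)` is 1 when x is 0 and 0 otherwise, so its lowest bit is set exactly when x is 0.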
DeltaFile
+55-0llvm/test/Transforms/InstCombine/trunc.ll
+5-0llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+60-02 files

LLVM/project fd069a4compiler-rt/cmake base-config-ix.cmake, compiler-rt/cmake/Modules AllSupportedArchDefs.cmake

[compiler-rt] Initial support for building profile library on the GPU (#185552)

Summary:
As suggested in https://github.com/llvm/llvm-project/pull/177665, we
should build a GPU version of the compiler-rt profile library instead of
writing it in-line in the lowering. This PR does not define anything
GPU-specific; it simply re-uses the baremetal handling. Later PRs will
present the GPU-specific handling we would want to do to optimize
counter handling on the GPU.

Note that this will require using the cache file, or setting these
options manually for existing users. Hopefully if people are using the
cache file as they should it won't break anything.
DeltaFile
+10-0compiler-rt/lib/profile/CMakeLists.txt
+4-2compiler-rt/cmake/caches/GPU.cmake
+2-4compiler-rt/cmake/base-config-ix.cmake
+2-2offload/cmake/caches/Offload.cmake
+3-1compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+2-2offload/cmake/caches/FlangOffload.cmake
+23-114 files not shown
+27-1510 files

LLVM/project bc3bcd0llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.ll llvm.amdgcn.sched.group.barrier.ll

[AMDGPU] Adds AGPR pressure during candidate init in GCN scheduler.

Scheduling heuristics will automatically consider AGPR pressure.
AGPRExcessLimit and AGPRCriticalLimit are added. Some of the VGPR
bias and error limits are reused. Helpers added mostly mirror the
existing VGPR logic. A ConsiderAGPR boolean controls whether AGPRs
should at all be factored in during candidate initialization, e.g.
on targets with allocatable AGPRs.

Verified that updated LIT tests use AGPRs.

Originally Authored-by: Nicholas Baron
(https://github.com/llvm/llvm-project/pull/150288)

Modified-by: Dhruva Chakrabarti

Assisted-by: Cursor
DeltaFile
+546-687llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
+390-374llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll
+337-388llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll
+181-181llvm/test/CodeGen/AMDGPU/agpr-csr.ll
+120-112llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll
+69-72llvm/test/CodeGen/AMDGPU/spill-agpr.ll
+1,643-1,8149 files not shown
+1,850-1,95915 files

LLVM/project 0872043llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll

Update for rebase

Change-Id: If807373eb8553665b4a49e076fb155d261d8347d
DeltaFile
+92-91llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+1-4llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+93-952 files

LLVM/project 34c9c6bllvm/lib/Target/SystemZ SystemZAsmPrinter.cpp SystemZAsmPrinter.h, llvm/test/CodeGen/SystemZ zos-section-1.ll zos-prologue-epilog.ll

[SystemZ][z/OS] Remove use of subsections.

HLASM has no notion of subsections. There are several possible ways
to deal with this. However,

- using a different section introduces a lot of relocations, which slows
  down the binder later
- emitting the PPA1 after the code changes the location which may break
  existing tools

The chosen solution is to record the PPA1 data, and emit them at the
end of the assembly into the code section. This solves both issues,
at the expense of having to do some bookkeeping.

This change moves the position of the PPA2, too, but this is less
critical.
DeltaFile
+127-105llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+24-24llvm/test/CodeGen/SystemZ/zos-section-1.ll
+26-20llvm/test/CodeGen/SystemZ/zos-prologue-epilog.ll
+22-17llvm/test/CodeGen/SystemZ/zos-ppa1-argarea.ll
+24-3llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+10-12llvm/test/CodeGen/SystemZ/zos-hlasm-out.ll
+233-1815 files not shown
+246-20211 files

LLVM/project 072e869llvm/tools/sancov sancov.cpp

Add sancov support for large AArch64 binaries. (#185374)

On AArch64, calls have a +/-128MB range
(https://developer.arm.com/documentation/ddi0602/2025-12/Base-Instructions/BL--Branch-with-link-).
In cases where the .text is larger than that, the linker adds functions
that just jump to the sanitizer functions and places them at a code
location where the rest of the binary can call them. These functions have
the prefix __AArch64ADRPThunk__.
This commit marks calls to these functions as coverage points.
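
The +/-128MB figure follows from the BL encoding, which holds a signed 26-bit immediate scaled by the 4-byte instruction size. The sketch below works through that arithmetic and adds a name check for the prefix quoted above. `reachable_by_bl` and `has_thunk_prefix` are hypothetical helpers for illustration, not sancov's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// BL encodes a signed 26-bit immediate, scaled by the instruction size.
constexpr int kImmBits = 26;
constexpr std::int64_t kInsnBytes = 4;
constexpr std::int64_t kMinOffset =
    -(std::int64_t{1} << (kImmBits - 1)) * kInsnBytes;
constexpr std::int64_t kMaxOffset =
    ((std::int64_t{1} << (kImmBits - 1)) - 1) * kInsnBytes;
static_assert(kMinOffset == -128 * 1024 * 1024, "backward range is 128 MiB");

// True if a BL at address `from` can branch directly to address `to`.
bool reachable_by_bl(std::int64_t from, std::int64_t to) {
  std::int64_t delta = to - from;
  return delta % kInsnBytes == 0 && delta >= kMinOffset && delta <= kMaxOffset;
}

// Hypothetical name check: does the symbol start with the thunk prefix
// mentioned in the commit message?
bool has_thunk_prefix(const std::string &name) {
  const std::string prefix = "__AArch64ADRPThunk__";
  return name.compare(0, prefix.size(), prefix) == 0;
}

// Small self-check that can be called from any test harness.
void self_check_bl_range() {
  assert(reachable_by_bl(0, kMaxOffset));
  assert(!reachable_by_bl(0, kMaxOffset + kInsnBytes));
}
```

A call target farther away than `kMaxOffset` cannot be reached by a single BL, which is why the linker routes such calls through a nearby thunk.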
DeltaFile
+6-1llvm/tools/sancov/sancov.cpp
+6-11 files

LLVM/project 4d49a1ellvm/lib/Target/AMDGPU SIInstrInfo.cpp

[AMDGPU] Move constraining of the reg class during SGPR to VGPR copy to existing loop (#182104)
DeltaFile
+12-23llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+12-231 files

LLVM/project df8f645llvm/lib/Target/AArch64 AArch64InstrGISel.td, llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp AArch64LegalizerInfo.cpp

[AArch64][GlobalISel] Add G_SQDMULL node

Previously, GISel was failing to lower the sqdmulls.scalar intrinsic. This is just a variation of sqdmull, but on two 32-bit S registers.
To fix this, create a G_SQDMULL node, and lower sqdmulls.scalar to that. This node is linked to the SD patterns for sqdmull, which allow this version of the intrinsic to lower.
DeltaFile
+99-62llvm/test/CodeGen/AArch64/arm64-vmul.ll
+7-0llvm/lib/Target/AArch64/AArch64InstrGISel.td
+2-0llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+0-2llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+2-0llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+110-645 files

LLVM/project 52be4b6llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 ptrauth-tail-call-global.ll

[AArch64][PAC] Don't skip global legalization for AUTH_TCRETURN (#182513)

Commit 77bcab835aca1 folds the llvm.ptrauth.resign intrinsic when the
intrinsic's discriminant and key match those in the call's ptrauth
bundle. However, an assertion now fires in AArch64AsmPrinter when PAC is
enabled and we're tail calling a global, because AUTH_TCRETURN expects
the address to be stored in a register.
DeltaFile
+16-0llvm/test/CodeGen/AArch64/ptrauth-tail-call-global.ll
+1-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+17-12 files

LLVM/project 13bd037offload/test/offloading dyn_groupprivate.cpp

Fix test
DeltaFile
+0-24offload/test/offloading/dyn_groupprivate.cpp
+0-241 files