LLVM/project 7f77ca0 llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU wmma-coexecution-valu-hazards.mir

[AMDGPU] Include TRANS instructions in WMMA coexecution hazard checking (#186269)
Delta  File
+26 -0  llvm/test/CodeGen/AMDGPU/wmma-coexecution-valu-hazards.mir
+2 -2  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+28 -2  2 files

LLVM/project 4c63b28 clang/lib/CodeGen CGOpenMPRuntime.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

[Clang][OpenMP] Move declare simd codegen into OMPIRBuilder (#186030)

Refactor declare simd codegen by moving logic that does not depend on
Clang declarations into OpenMPIRBuilder.
Delta  File
+78 -284  clang/lib/CodeGen/CGOpenMPRuntime.cpp
+214 -0  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+71 -0  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+363 -284  3 files

LLVM/project 53a5c83 llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp AMDGPUISelDAGToDAG.cpp, llvm/test/CodeGen/AMDGPU spill-scavenge-offset.ll promote-constOffset-to-imm.ll

[AMDGPU] Support for nested add in GVS pattern matching

Fixes ROCM-20181.
Delta  File
+1,463 -3,005  llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
+520 -496  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
+129 -0  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+108 -0  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+41 -15  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.load.async.to.lds.ll
+16 -21  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.ll
+2,277 -3,537  1 file not shown
+2,279 -3,539  7 files

LLVM/project 53cae83 utils/bazel/llvm-project-overlay/clang BUILD.bazel, utils/bazel/llvm-project-overlay/clang/unittests BUILD.bazel

[bazel] Add libraries, binaries, and tests for ScalableStaticAnalysisFramework. (#186905)
Delta  File
+59 -0  utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+27 -0  utils/bazel/llvm-project-overlay/clang/unittests/BUILD.bazel
+86 -0  2 files

LLVM/project 106c22e llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp AMDGPUISelDAGToDAG.cpp, llvm/test/CodeGen/AMDGPU spill-scavenge-offset.ll promote-constOffset-to-imm.ll

[AMDGPU] Support for nested add in GVS pattern matching

Fixes ROCM-20181.
Delta  File
+1,463 -3,005  llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
+520 -496  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
+135 -0  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+108 -0  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+41 -15  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.load.async.to.lds.ll
+16 -21  llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.ll
+2,283 -3,537  1 file not shown
+2,285 -3,539  7 files

LLVM/project c7c7afd mlir/lib/Dialect/Linalg/IR LinalgInterfaces.cpp, mlir/lib/Dialect/Linalg/Transforms Specialize.cpp

[MLIR][Linalg] Add matchers to specialize more unary ops (#183259)

Add missing matchers to the `linalg.generic` specialization patterns to
handle the remaining named unary elementwise ops.
Delta  File
+127 -6  mlir/test/Dialect/Linalg/specialize-generic-ops.mlir
+101 -4  mlir/test/Dialect/Linalg/transform-op-specialize-elemwise-unary.mlir
+50 -2  mlir/test/Dialect/Linalg/roundtrip-morphism-linalg-named-ops.mlir
+39 -3  mlir/test/Dialect/Linalg/linalg-morph-multi-step.mlir
+33 -6  mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+5 -1  mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
+355 -22  6 files

LLVM/project 59e01a1 llvm/utils/TableGen/Common CodeGenDAGPatterns.cpp

[TableGen] Add new line to end of TreePatternNode::dump. (#186865)
Delta  File
+4 -1  llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
+4 -1  1 file

LLVM/project 509f8f5 mlir/include/mlir/Dialect/LLVMIR LLVMOps.td, mlir/test/Dialect/LLVMIR roundtrip.mlir

add roundtrip and doc
Delta  File
+36 -0  mlir/test/Dialect/LLVMIR/roundtrip.mlir
+6 -0  mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
+42 -0  2 files

LLVM/project 253a33a llvm/lib/Target/AMDGPU AMDGPULowerKernelAttributes.cpp, llvm/test/CodeGen/AMDGPU implicit-arg-v5-opt.ll

AMDGPU: Annotate grid_dims ABI load with range metadata (#185610)

Also substitute with a constant for the reqd_work_group_size case.
Delta  File
+183 -0  llvm/test/CodeGen/AMDGPU/implicit-arg-v5-opt.ll
+48 -0  llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+231 -0  2 files

LLVM/project d2f89e2 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaOverload.cpp

[clang] Fixed 'implicitly deleted' diagnostic for explicitly deleted candidate function (#186634)

When an explicit function template specialization is deleted, the
overload candidate `Fn` may be a non-canonical `FunctionDecl` where
`IsDeleted` is not set, even though the canonical decl has it set.
`isDeletedAsWritten()` reads `this` while `isDeleted()` reads
`getCanonicalDecl()`, causing the mismatch. Fix by using
`getCanonicalDecl()` consistently in the diagnostic.
Fixes #185693
Delta  File
+12 -0  clang/test/SemaCXX/deleted-template-spec-diag.cpp
+3 -1  clang/lib/Sema/SemaOverload.cpp
+2 -2  clang/test/CXX/drs/cwg8xx.cpp
+3 -0  clang/docs/ReleaseNotes.rst
+20 -3  4 files
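
The mismatch described in the #186634 message can be pictured with a toy model. The sketch below is not Clang code: `ToyDecl`, `DeletedKind`, `classifyBefore`, and `classifyAfter` are hypothetical names, and the two accessors only mimic the behaviour the message attributes to `isDeleted()` and `isDeletedAsWritten()`.

```cpp
#include <cassert>

// Toy model of a redeclaration chain; none of these types are Clang's.
struct ToyDecl {
  const ToyDecl *Canonical = nullptr; // nullptr: this node is the canonical one
  bool IsDeletedBit = false;          // stored separately on each declaration

  const ToyDecl *getCanonicalDecl() const {
    return Canonical ? Canonical : this;
  }
  // Reads the bit on the canonical declaration, as the message describes.
  bool isDeleted() const { return getCanonicalDecl()->IsDeletedBit; }
  // Reads the bit on *this* declaration, as the message describes.
  bool isDeletedAsWritten() const { return IsDeletedBit; }
};

enum class DeletedKind { NotDeleted, Explicit, Implicit };

// Pattern before the fix (assumed): a canonical read mixed with a local read.
DeletedKind classifyBefore(const ToyDecl &Fn) {
  if (!Fn.isDeleted())
    return DeletedKind::NotDeleted;
  return Fn.isDeletedAsWritten() ? DeletedKind::Explicit
                                 : DeletedKind::Implicit;
}

// Pattern after the fix (assumed): both reads go through the canonical decl.
DeletedKind classifyAfter(const ToyDecl &Fn) {
  const ToyDecl *Canon = Fn.getCanonicalDecl();
  assert(Canon && "every declaration has a canonical declaration");
  if (!Canon->isDeleted())
    return DeletedKind::NotDeleted;
  return Canon->isDeletedAsWritten() ? DeletedKind::Explicit
                                     : DeletedKind::Implicit;
}
```

With a canonical node whose bit is set and a second node that only points at it, `classifyBefore` reports the second node as implicitly deleted, while `classifyAfter` reports it as explicitly deleted.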

LLVM/project 6a785bf llvm/include/llvm/CodeGen RegisterScavenging.h, llvm/lib/Target/AArch64 AArch64FrameLowering.cpp

Revert "[AArch64] Allocate two emergency spill slots for MTE to fix register …" (#186900)

Reverts llvm/llvm-project#186505

Breaks buildbot
Delta  File
+0 -190  llvm/test/CodeGen/AArch64/memtag-stg-loop-reg-pressure.mir
+0 -28  llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+0 -2  llvm/include/llvm/CodeGen/RegisterScavenging.h
+0 -220  3 files

LLVM/project e363764 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaTemplate.cpp

[Clang][Sema] Fix crash in CheckNonTypeTemplateParameterType with invalid type (#186200)

When a non-type template parameter has a type containing an undeduced
placeholder type that is invalid (e.g., a function returning a
function), `SubstAutoTypeSourceInfoDependent` can return null if the
type is invalid. `CheckNonTypeTemplateParameterType` was not handling
this case and would dereference the null pointer.

Fixes #177545
Delta  File
+4 -1  clang/lib/Sema/SemaTemplate.cpp
+5 -0  clang/test/SemaTemplate/deduction-crash.cpp
+1 -0  clang/docs/ReleaseNotes.rst
+10 -1  3 files

LLVM/project 2bf97ca llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU wmma-coexecution-valu-hazards.mir

[AMDGPU] Include TRANS instructions in WMMA coexecution hazard checking
Delta  File
+26 -0  llvm/test/CodeGen/AMDGPU/wmma-coexecution-valu-hazards.mir
+2 -2  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+28 -2  2 files

LLVM/project e5012fa .ci all_requirements.txt, mlir/python requirements.txt

test fp8
Delta  File
+55 -47  .ci/all_requirements.txt
+85 -1  mlir/test/python/execution_engine.py
+24 -5  mlir/python/mlir/runtime/np_to_memref.py
+2 -3  mlir/python/requirements.txt
+166 -56  4 files

LLVM/project cb8c65a clang/include/clang/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage UnsafeBufferUsage.h UnsafeBufferUsageExtractor.h, clang/include/clang/ScalableStaticAnalysisFramework/Core/Analyses/UnsafeBufferUsage UnsafeBufferUsage.h UnsafeBufferUsageBuilder.h

Reapply "[clang][ssaf] Add UnsafeBufferUsage summary extractor for functions (#182941)"

This reverts commit 53739c75a8720aaef8032628267ed4fd050af038.

Reapply after module dependency issues are resolved.

(rdar://169191570)
Delta  File
+384 -39  clang/unittests/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageTest.cpp
+281 -0  clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageExtractor.cpp
+0 -120  clang/include/clang/ScalableStaticAnalysisFramework/Core/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.h
+104 -0  clang/include/clang/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.h
+40 -0  clang/include/clang/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageExtractor.h
+0 -32  clang/include/clang/ScalableStaticAnalysisFramework/Core/Analyses/UnsafeBufferUsage/UnsafeBufferUsageBuilder.h
+809 -191  3 files not shown
+827 -191  9 files

LLVM/project 9e4dddc llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/PhaseOrdering/X86 vector-reductions.ll

[SimplifyCFG] Allow phi folding for boolean logic over non-equality (#185124)

Phi folding is suppressed over binary operation inputs in order to avoid
interfering with switch formation.

After #183692, code (for example, Rust's ASCII character classification)
may get an `or` hoisted up into it, which suppresses
`foldTwoEntryPHINode`. This then produces branching code where
previously we generated straightline code.

To maintain switch formation while preventing any binops from breaking
phi folding, restrict the scenario in which Phi folding is suppressed to
binops of *equality* ops. This should mesh with switch statements, which
require an explicit list of values, while not breaking optimization over
> / < etc. which would never have been promoted to switches in the first
place.

Fixes: rust-lang/rust#153504
Delta  File
+15 -19  llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+4 -5  llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll
+1 -5  llvm/test/Transforms/SimplifyCFG/switch-transformations-no-lut.ll
+1 -5  llvm/test/Transforms/SimplifyCFG/extract-cost.ll
+21 -34  4 files
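
To make the distinction in the #185124 message concrete, here is a small C++ sketch of the two source shapes it contrasts: boolean logic over equality tests, and boolean logic over ordering comparisons. The function names are made up for illustration, and nothing here shows what LLVM actually emits for either function.

```cpp
#include <cassert>

// Equality chain: every test names one explicit value, which is the shape
// the message says should keep meshing with switch formation.
bool isLowerVowel(char C) {
  return C == 'a' || C == 'e' || C == 'i' || C == 'o' || C == 'u';
}

// Boolean logic over non-equality comparisons (range checks), the shape the
// message says would never have been promoted to a switch in the first place.
bool isAsciiAlnum(char C) {
  return (C >= '0' && C <= '9') || (C >= 'a' && C <= 'z') ||
         (C >= 'A' && C <= 'Z');
}

// Small self-check so the sketch can be exercised from any caller.
void checkClassifiers() {
  assert(isLowerVowel('e') && !isLowerVowel('z'));
  assert(isAsciiAlnum('7') && !isAsciiAlnum('-'));
}
```

The second function resembles the ASCII character classification the message mentions as an example of code that regressed to branching.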

LLVM/project 51937fc flang/include/flang/Lower CUDA.h, flang/lib/Lower ConvertVariable.cpp CUDA.cpp

Revert "[flang][OpenMP] Use cuf.alloc for privatization of CUDA Fortr… (#186891)

…an device arrays (#185984)"

This reverts commit fb18d570b0466ca2a401aba11d6e58b206aebc1a.

This PR caused compilation failures with allocatable arrays, reverting
now for more investigation.
Delta  File
+8 -41  flang/lib/Lower/Support/PrivateReductionUtils.cpp
+0 -31  flang/test/Lower/OpenMP/delayed-privatization-cuda-device-array.cuf
+11 -10  flang/lib/Lower/ConvertVariable.cpp
+0 -18  flang/lib/Lower/CUDA.cpp
+0 -8  flang/include/flang/Lower/CUDA.h
+19 -108  5 files

LLVM/project 347cb74 llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU wmma-coexecution-valu-hazards.mir

[AMDGPU] Include TRANS instructions in WMMA coexecution hazard checking
Delta  File
+26 -0  llvm/test/CodeGen/AMDGPU/wmma-coexecution-valu-hazards.mir
+2 -2  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+28 -2  2 files

LLVM/project 538fcbc llvm/cmake/modules AddLLVM.cmake

[clang][CMake] Fix ODR violation with LLVM_LINK_LLVM_DYLIB (#186689)

After 42b638c6b40d ("Propagate dependencies to OBJECT libraries in
add_llvm_library"), obj.clangSupport now inherits clangSupport's
LINK_LIBRARIES via target_link_libraries, which includes libLLVM.so when
LLVM_LINK_LLVM_DYLIB is enabled.

Previously the obj.clangSupport alias path was harmless because the
OBJECT library carried no link dependencies. Now, aliasing
clangSupport_tablegen to obj.clangSupport in DYLIB mode causes
clang-tblgen to transitively link libLLVM.so, while also having LLVM
symbols compiled in statically — triggering an ASan ODR violation on
globals like llvm::vfs::FileSystem::ID.

Fix by only propagating parts of the compile interface instead of the
full link interface - INTERFACE_INCLUDE_DIRECTORIES and
INTERFACE_SYSTEM_INCLUDE_DIRECTORIES. Also add a TODO to consider
replacing with target_link_libraries($<COMPILE_ONLY:tgt>) once minimum
CMake version is 3.27 or higher.
Delta  File
+10 -7  llvm/cmake/modules/AddLLVM.cmake
+10 -7  1 file

LLVM/project 9da068b lldb/cmake/modules LLDBConfig.cmake

[lldb] Default LLDB_ENABLE_MTE to OFF when Sanitizers are enabled. (#186884)

The MTE launcher complicates injecting the sanitizer runtime libraries.
Delta  File
+13 -9  lldb/cmake/modules/LLDBConfig.cmake
+13 -9  1 file

LLVM/project 013f254 llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize epilog-vectorization-reductions.ll

[LV] Simplify and unify resume value handling for epilogue vec. (#185969)

This patch tries to drastically simplify resume value handling for the
scalar loop when vectorizing the epilogue.

It uses a simpler, uniform approach for updating all resume values in
the scalar loop:

1. Create ResumeForEpilogue recipes for all scalar resume phis in the
main loop (the epilogue plan will have exactly the same scalar resume
phis, in exactly the same order)
2. Update ::execute for ResumeForEpilogue to set the underlying value
when executing. This is not super clean, but allows easy lookup of the
generated IR value when we update the resume phis in the epilogue. Once
we connect the 2 plans together explicitly, this can be removed.
3. Use the list of ResumeForEpilogue VPInstructions from the main loop
to update the resume/bypass values from the epilogue.

This simplifies the code quite a bit, makes it more robust (should fix

    [11 lines not shown]
Delta  File
+62 -196  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+102 -4  llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
+95 -4  llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
+14 -14  llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll
+14 -14  llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll
+4 -10  llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll
+291 -242  28 files not shown
+362 -330  34 files

LLVM/project 0fa9a77 clang/test/CodeGen/AArch64/neon bf16-getset.c

[Clang][AArch64] Update comments in tests (nfc) (#186885)
Delta  File
+10 -2  clang/test/CodeGen/AArch64/neon/bf16-getset.c
+10 -2  1 file

LLVM/project b503da8 clang/lib/CodeGen CodeGenModule.cpp

use parseTargetAttr on x86 and AIX to parse target_clones string
Delta  File
+5 -5  clang/lib/CodeGen/CodeGenModule.cpp
+5 -5  1 file

LLVM/project ddaaf4f llvm/lib/Transforms/Utils InlineFunction.cpp, llvm/test/Transforms/Inline ret_attr_nofpclass.ll ret_attr_align_and_noundef.ll

[Inliner] Fix return attribute propagation across multiple return sites (#186076)

Fixes #185159 

This patch fixes a bug in `AddReturnAttributes()` where propagated
return attributes could incorrectly leak across multiple return sites in
the callee being inlined.

`AddReturnAttributes()` walks the callee's return instructions and tries
to backward-propagate return attributes from the callsite to the
returned call when the callee directly returns a call result. However,
the propagated attribute builders were updated in-place while iterating
over return sites. As a result, attributes refined for one return site
could be reused when processing a later return site. This is incorrect
because each return site should be handled independently, starting from
the original callsite attributes.

This patch ensures that propagated return attributes are reinitialized
for each return site, so propagation is computed independently per
returned call.
Delta  File
+39 -0  llvm/test/Transforms/Inline/ret_attr_nofpclass.ll
+37 -0  llvm/test/Transforms/Inline/ret_attr_align_and_noundef.ll
+5 -3  llvm/lib/Transforms/Utils/InlineFunction.cpp
+81 -3  3 files
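
The bug pattern in the #186076 message (state mutated in place across loop iterations) can be sketched with a toy model. This is not the LLVM implementation: `ToyRetAttrs`, `ToyReturnSite`, `refineForSite`, and both `propagate*` functions are hypothetical, and modelling "refinement" as taking the larger alignment is an assumption made only to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-ins; the real code works on attribute builders, not on these.
struct ToyRetAttrs {
  unsigned Align; // alignment requested for the returned value
};
struct ToyReturnSite {
  unsigned ExistingAlign; // alignment already known at this returned call
};

// Assumed refinement step: keep the stronger of the two alignments.
static void refineForSite(ToyRetAttrs &Attrs, const ToyReturnSite &Site) {
  assert(Site.ExistingAlign >= 1 && "alignment 0 is not used in this toy");
  Attrs.Align = std::max(Attrs.Align, Site.ExistingAlign);
}

// Bug pattern: one working copy is created once and mutated at every site,
// so a refinement made for an earlier site leaks into later ones.
std::vector<unsigned>
propagateSharedBuilder(ToyRetAttrs CallSite,
                       const std::vector<ToyReturnSite> &Sites) {
  std::vector<unsigned> Result;
  ToyRetAttrs Working = CallSite;
  for (const ToyReturnSite &Site : Sites) {
    refineForSite(Working, Site);
    Result.push_back(Working.Align);
  }
  return Result;
}

// Fixed pattern: the working copy is reinitialized from the original
// callsite attributes for each return site.
std::vector<unsigned>
propagatePerSite(ToyRetAttrs CallSite,
                 const std::vector<ToyReturnSite> &Sites) {
  std::vector<unsigned> Result;
  for (const ToyReturnSite &Site : Sites) {
    ToyRetAttrs Working = CallSite;
    refineForSite(Working, Site);
    Result.push_back(Working.Align);
  }
  return Result;
}
```

For a callsite alignment of 4 and two return sites with existing alignments 16 and 1, the shared-builder version yields 16 for both sites, while the per-site version yields 16 and then 4.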

LLVM/project ffbdec6 libclc/clc/lib/generic/math clc_pow_base.inc clc_pow.inc

libclc: Update pow functions

The four flavors of pow were originally ported from the ROCm
device libs between c45ec604f593fcb03d770f4398142d2446017f68,
cc5c65b2c25e0a82fbad95f0ce3bb5262e29eeee, and
fe8e00bc3c65115b2e3d2a43cf3d0d756a934a52. Update to a newer
version. Additionally, expose fast variants for use by the
libcall optimizer (e.g., __pow_fast) for float types.
Delta  File
+542 -0  libclc/clc/lib/generic/math/clc_pow_base.inc
+0 -438  libclc/clc/lib/generic/math/clc_pow.inc
+0 -414  libclc/clc/lib/generic/math/clc_powr.inc
+0 -405  libclc/clc/lib/generic/math/clc_rootn.inc
+0 -402  libclc/clc/lib/generic/math/clc_pown.inc
+78 -0  libclc/clc/lib/generic/math/clc_ep.inc
+620 -1,659  23 files not shown
+990 -1,726  29 files

LLVM/project 45e80d9 clang/lib/Sema SemaConcept.cpp, clang/test/SemaTemplate concepts.cpp

add test case and explanation for regression introduced by #183010
Delta  File
+9 -0  clang/test/SemaTemplate/concepts.cpp
+6 -1  clang/lib/Sema/SemaConcept.cpp
+15 -1  2 files

LLVM/project ed76cbc llvm/include/llvm/CodeGen RegisterScavenging.h, llvm/lib/Target/AArch64 AArch64FrameLowering.cpp

[AArch64] Allocate two emergency spill slots for MTE to fix register … (#186505)

…scavenger crash

When `-fsanitize=memtag-stack` is enabled and the compiler optimizes
contiguous ST2Gi instructions into an MTE loop (via
`TagStoreEdit::emitLoop`), it spawns two new post-RA virtual registers
simultaneously:
1. `BaseReg`
2. `SizeReg`

Under extremely high register pressure (such as in Swift async
continuation thunks, where almost all registers are kept live), the
Register Scavenger must fall back to using emergency spill slots to
assign physical registers to `BaseReg` and `SizeReg`.

Prior to this patch, `determineCalleeSaves` assumed that a maximum of
one register would ever need to be scavenged at a time. It either
allocated a single emergency spill slot, or bypassed the allocation

    [20 lines not shown]
Delta  File
+190 -0  llvm/test/CodeGen/AArch64/memtag-stg-loop-reg-pressure.mir
+28 -0  llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+2 -0  llvm/include/llvm/CodeGen/RegisterScavenging.h
+220 -0  3 files

LLVM/project 1a85eb9 clang/test/SemaTemplate concepts.cpp

Address feedback
Delta  File
+16 -2  clang/test/SemaTemplate/concepts.cpp
+16 -2  1 file

LLVM/project a12b612 clang/test/CodeGenHLSL/resources res-array-global-unbounded.hlsl, llvm/include/llvm/Analysis DXILResource.h

[HLSL] Use 0 to represent unbounded resources (#186022)

SPIRV backend uses 0 to represent unbounded arrays. This patch makes
unbounded resources be represented with 0 when binding them, as well as
makes sure the backend uses OpTypeRuntimeArray to represent such cases.
Fix: https://github.com/llvm/llvm-project/issues/183367
Delta  File
+29 -0  llvm/test/CodeGen/SPIRV/hlsl-resources/unbounded-arr.ll
+5 -7  llvm/lib/Analysis/DXILResource.cpp
+6 -6  clang/test/CodeGenHLSL/resources/res-array-global-unbounded.hlsl
+8 -3  llvm/include/llvm/Analysis/DXILResource.h
+2 -3  llvm/lib/Target/DirectX/DXILOpLowering.cpp
+4 -0  llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+54 -19  12 files not shown
+70 -35  18 files

LLVM/project 58ffc93 llvm/lib/Transforms/InstCombine InstCombinePHI.cpp, llvm/test/Transforms/PhaseOrdering phi-protected-field-ptr.ll

Address review comments

Created using spr 1.3.6-beta.1
Delta  File
+42 -0  llvm/test/Transforms/PhaseOrdering/phi-protected-field-ptr.ll
+2 -4  llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
+44 -4  2 files