LLVM/project 38555dbllvm/lib/Target/RISCV RISCVISelDAGToDAG.cpp RISCVInstrInfoP.td

[RISCV][P-ext] Replace some custom isel code with tablegen patterns. NFC (#199881)
DeltaFile
+0-51llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+17-1llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+17-522 files

LLVM/project 358f5e7llvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV][P-ext] Add missing let Inst{31} = 0b0 to RVPPairShift_rr. (#199885)

This bit was accidentally left unset. I think this means the
disassembler may have treated this bit as a don't-care, so it could
disassemble some invalid encodings to these instructions. I didn't check
the opcode map closely enough to confirm this.
DeltaFile
+1-0llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+1-01 files
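
To illustrate why an unconstrained encoding bit matters to a mask-and-match style decoder, here is a minimal sketch. The `DecodeEntry` type, the `matches` helper, and every constant below are hypothetical and invented for illustration; they are not the real RVPPairShift_rr encoding or LLVM's generated decoder tables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoder entry: a word matches when (word & mask) == match.
// Any bit that is clear in `mask` is effectively a "don't care" bit.
struct DecodeEntry {
  uint32_t mask;
  uint32_t match;
};

constexpr bool matches(DecodeEntry e, uint32_t word) {
  return (word & e.mask) == e.match;
}

// Invented values for illustration only.
constexpr uint32_t kOpcodeMask = 0x7Fu; // bits 6..0
constexpr uint32_t kOpcode = 0x1Bu;
constexpr uint32_t kBit31 = 1u << 31;

// Entry that forgot to constrain bit 31.
constexpr DecodeEntry kLoose{kOpcodeMask, kOpcode};
// Entry that requires bit 31 to be zero.
constexpr DecodeEntry kStrict{kOpcodeMask | kBit31, kOpcode};

constexpr uint32_t kValidWord = kOpcode;            // bit 31 clear
constexpr uint32_t kInvalidWord = kOpcode | kBit31; // bit 31 set

static_assert(matches(kLoose, kValidWord), "valid word decodes");
static_assert(matches(kLoose, kInvalidWord), "loose entry accepts invalid word");
static_assert(matches(kStrict, kValidWord), "valid word still decodes");
static_assert(!matches(kStrict, kInvalidWord), "strict entry rejects invalid word");
```

Under these assumptions, pinning the bit to zero only narrows what the decoder accepts; valid encodings are unaffected.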

LLVM/project a3acd80mlir/include/mlir/Dialect/XeGPU/Transforms XeGPULayoutImpl.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp XeGPULayoutImpl.cpp

[MLIR][XeGPU] Propagate layout onto loop-carried iter_arg entry edges (#198862)

as per title
DeltaFile
+14-0mlir/include/mlir/Dialect/XeGPU/Transforms/XeGPULayoutImpl.h
+9-4mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+8-4mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+2-2mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
+2-2mlir/test/Dialect/XeGPU/propagate-layout.mlir
+35-125 files

LLVM/project c3fc506llvm/lib/Target/AMDGPU AMDGPUTargetTransformInfo.cpp

[AMDGPU] Remove explicit PartialThreshold setting in loop unrolling (#198901)

Remove UP.PartialThreshold = UP.Threshold / 4 from AMDGPU TTI, restoring
the default PartialThreshold of 150.

This was introduced in #194924 to limit code-size growth from runtime
unrolling, but PartialThreshold also gates compile-time partial
unrolling of constant-trip-count loops. This change restores
PartialThreshold to its default value for both compile-time partial
unrolling and runtime partial unrolling.

Benchmarked across CK, llama.cpp, and xpu-perf — no performance impact
from restoring the default.

Fixes #196372, replaces #196818.

Assisted-by: Claude Code
DeltaFile
+1-2llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+1-21 files
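
The effect of the removed override can be shown with a little arithmetic. This is a simplified stand-in, not the real TTI code: the `UnrollPrefs` struct and both helpers are hypothetical, and the threshold of 300 is only an example input. The default of 150 and the `Threshold / 4` formula are the only parts taken from the commit message.

```cpp
#include <cassert>

// Simplified stand-in for the unrolling preferences discussed above.
struct UnrollPrefs {
  unsigned Threshold;
  unsigned PartialThreshold = 150; // default quoted in the commit message
};

// Behaviour before this commit: the partial threshold is tied to Threshold.
constexpr UnrollPrefs withOldOverride(unsigned Threshold) {
  UnrollPrefs P{Threshold};
  P.PartialThreshold = P.Threshold / 4;
  return P;
}

// Behaviour after this commit: the default partial threshold is left alone.
constexpr UnrollPrefs withDefault(unsigned Threshold) {
  return UnrollPrefs{Threshold};
}

// With an example threshold of 300, the old override halves the budget
// available to partial unrolling compared with the default.
static_assert(withOldOverride(300).PartialThreshold == 75, "old override");
static_assert(withDefault(300).PartialThreshold == 150, "restored default");
```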

LLVM/project 22ced91flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90 target-inreduction-unused.f90

[flang][OpenMP] Lower target in_reduction for host fallback

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.

The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.

The device/offload-entry path remains diagnosed as not yet implemented.
DeltaFile
+90-1mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83-3mlir/test/Target/LLVMIR/openmp-todo.mlir
+59-4flang/lib/Lower/OpenMP/OpenMP.cpp
+50-0mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28-0flang/test/Lower/OpenMP/target-inreduction.f90
+27-0flang/test/Lower/OpenMP/target-inreduction-unused.f90
+337-82 files not shown
+342-288 files

LLVM/project 97b9c24libcxx/include/__functional function.h, libcxx/test/libcxx/utilities/function.objects block.func.compile.pass.cpp

Add missing annotations for Apple platforms (#198864)

These seem to have been missed in #193045.
DeltaFile
+31-0libcxx/test/libcxx/utilities/function.objects/block.func.compile.pass.cpp
+4-0libcxx/include/__functional/function.h
+35-02 files

LLVM/project dd2ce3dllvm/cmake/modules AddLLVM.cmake

[LLVM] Add per-target runtime directory to rpath (#199755)

Summary:
This simply adds the LLVM_DEFAULT_TARGET_TRIPLE to the LLVM build's
rpath if present. This keeps things hermetic for the library (offload)
that depends on it.

This is required because `llvm-gpu-loader` calls `DynamicLibrary` on the
Offload runtime. However, in a shared library build the actual call is
in libLLVMSupport.so, which does not have this RPath, so `dlopen`
searches on behalf of that library, which does not know how to find the
runtime. The only options to fix this are to use `dlopen` directly in
the loader, or add the rpath to the LLVM binaries.

I think this makes sense for LLVM, because the target-specific directory
can contain LLVM-related libraries.
DeltaFile
+4-0llvm/cmake/modules/AddLLVM.cmake
+4-01 files

LLVM/project b65a71dlibc/test/src/__support/threads/linux raw_mutex_test.cpp, utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][bazel] Add rules for __support/threads tests. (#199871)

* Add Bazel BUILD rules for three `__support/threads` unit tests.
* Fix/expand BUILD rules for the support libraries they depend on
(clock_gettime and vdso) that were previously incorrectly missing `.cpp`
files with implementations.
* Minor fix to use `internal::exit` in `raw_mutex_test` to avoid adding
a dependency on the `exit` entrypoint, which doesn't yet exist in Bazel.

Assisted by: Gemini
DeltaFile
+52-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+49-0utils/bazel/llvm-project-overlay/libc/test/src/__support/threads/BUILD.bazel
+2-2libc/test/src/__support/threads/linux/raw_mutex_test.cpp
+103-23 files

LLVM/project 468951bllvm/lib/Target/AMDGPU SIFrameLowering.cpp

Fix for noassert buildbot break in #183153

Change-Id: I285adf09ac2df239d0ab05459f7388b6970247ad
DeltaFile
+1-2llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+1-21 files

LLVM/project 06a6cbbllvm/lib/Target/AMDGPU SIFrameLowering.cpp

[AMDGPU] Fix SuperReg to MCRegister conversion (#199993)

This is a fix for "[AMDGPU] Implement CFI for non-kernel functions
(#183153)" f78a233ac89dc0f9f0f26dfe051874013ae6e242 to use
"SuperReg.asMCReg()" instead of "MCRegister(SuperReg)", which leads to
"ambiguous call" when using the MSVC compiler.
DeltaFile
+2-2llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+2-21 files

LLVM/project e5ad7f7flang/lib/Semantics resolve-directives.cpp, flang/test/Semantics/OpenMP declare-simd-uniform.f90

[flang][OpenMP] Remove ompFlagsRequireMark from symbol resolution (#198591)

The `ompFlagsRequireMark` set was there to make sure that we put the
flags from it on symbols even when no new symbols needed to be created.

Instead of doing that, we can just put the flag on the symbol every
time. There is no harm in having these flags; they are just extra
information.
DeltaFile
+2-12flang/lib/Semantics/resolve-directives.cpp
+1-1flang/test/Semantics/OpenMP/declare-simd-uniform.f90
+3-132 files

LLVM/project 976469cflang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90 target-inreduction-unused.f90

[flang][OpenMP] Support in_reduction on target

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target.

The translation looks up the task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and makes sure target in_reduction list items are mapped into the
target region when needed.
DeltaFile
+90-1mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83-3mlir/test/Target/LLVMIR/openmp-todo.mlir
+59-4flang/lib/Lower/OpenMP/OpenMP.cpp
+50-0mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28-0flang/test/Lower/OpenMP/target-inreduction.f90
+27-0flang/test/Lower/OpenMP/target-inreduction-unused.f90
+337-82 files not shown
+342-288 files

LLVM/project bf8dae1llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine trunc-abs-intrinsics.ll

[InstCombine] Narrow llvm.abs through trunc. (#199643)

Update EvaluateInDifferentType / canEvaluateTruncated to narrow abs
intrinsics when the operand has at least OrigBitWidth - BitWidth + 1
sign bits. The transform always emits the narrow abs with
IsIntMinPoison=false, as the narrowed value may be INT_MIN in the narrow
type, while not in the original width.

Alive2 Proof with weaker precondition (top and truncated sign bits must
match): https://alive2.llvm.org/ce/z/AMQRmi

End-to-end C pixel math example: https://clang.godbolt.org/z/Ma8bsTGTY

PR: https://github.com/llvm/llvm-project/pull/199643
DeltaFile
+174-0llvm/test/Transforms/InstCombine/trunc-abs-intrinsics.ll
+15-0llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+189-02 files
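
The sign-bit precondition can be checked by brute force for one concrete pair of widths. The sketch below assumes a 32-bit to 8-bit truncation, so the operand needs at least 32 - 8 + 1 = 25 sign bits, which means it lies in [-128, 127]. The helper names are hypothetical, and both `abs` helpers wrap INT_MIN to itself, which is how `llvm.abs` behaves when `is_int_min_poison` is false.

```cpp
#include <cassert>
#include <cstdint>

// abs on 32 bits in wrapping arithmetic: INT32_MIN maps to itself.
constexpr uint32_t abs32(uint32_t x) {
  return (x & 0x80000000u) ? (0u - x) : x;
}

// abs on 8 bits in wrapping arithmetic: INT8_MIN (0x80) maps to itself.
constexpr uint8_t abs8(uint8_t x) {
  return (x & 0x80u) ? static_cast<uint8_t>(0u - x) : x;
}

// Returns true when trunc(abs32(x)) == abs8(trunc(x)) for every x in [lo, hi].
constexpr bool narrowingHolds(int32_t lo, int32_t hi) {
  for (int32_t x = lo; x <= hi; ++x) {
    uint8_t wide = static_cast<uint8_t>(abs32(static_cast<uint32_t>(x)));
    uint8_t narrow = abs8(static_cast<uint8_t>(x));
    if (wide != narrow)
      return false;
  }
  return true;
}

// Every value with at least 25 sign bits narrows correctly.
static_assert(narrowingHolds(-128, 127), "holds under the precondition");
// Values with too few sign bits can give a different result.
static_assert(!narrowingHolds(129, 129), "129 is a counterexample");
static_assert(!narrowingHolds(-129, -129), "-129 is a counterexample");
// -128 truncates to INT8_MIN, so the narrow abs must not treat it as poison.
static_assert(abs8(0x80) == 0x80, "INT8_MIN wraps to itself");
```

The last assertion is the reason for emitting the narrow abs with IsIntMinPoison=false: -128 is an ordinary 32-bit value but becomes INT_MIN after truncation.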

LLVM/project 3e8c217llvm/lib/Target/AMDGPU SIInstrInfo.td, llvm/test/CodeGen/AMDGPU fshl-scalar-shift-zero.ll

[AMDGPU] Fix ShiftAmt32Imm to use unsigned comparison (#199052)

ShiftAmt32Imm used a signed 'Imm < 32' predicate, which incorrectly
matched negative immediates such as -1. The scalar fshr fast path:
    
      def : GCNPat<(UniformTernaryFrag<fshr> i32:$src0, i32:$src1,
                                             (i32 ShiftAmt32Imm:$src2)),
        (i32 (EXTRACT_SUBREG (S_LSHR_B64 ..., $src2), sub0))>;
    
When fshl(scalar, X, Z) is lowered via expandFunnelShift for any
constant Z in [0, 31], the generic code converts it to fshr(..., ~Z) or
fshr(..., -Z), producing a negative shift amount. Because all such
values satisfy Imm < 32 in a signed comparison, ShiftAmt32Imm matched
and the pattern passed the negative immediate directly to S_LSHR_B64
without the S_AND_B32 masking. S_LSHR_B64 then shifted by the wrong
amount, producing an incorrect result.
    
Fix by changing the predicate to an unsigned comparison so that only
values in [0, 31] match, and negative values fall through to the general

    [8 lines not shown]
DeltaFile
+139-0llvm/test/CodeGen/AMDGPU/fshl-scalar-shift-zero.ll
+1-1llvm/lib/Target/AMDGPU/SIInstrInfo.td
+140-12 files
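
The difference between the two predicates is easy to reproduce outside TableGen. In this minimal sketch the function names are hypothetical. The `shiftPairAndTakeLow` helper models the "64-bit shift, then take the low half" pattern under the assumption that an unmasked 64-bit shift uses the low 6 bits of its amount; that detail is not stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Signed form of the predicate: every negative immediate also passes.
constexpr bool isShiftAmt32Signed(int64_t Imm) { return Imm < 32; }

// Unsigned form: only immediates in [0, 31] pass.
constexpr bool isShiftAmt32Unsigned(int64_t Imm) {
  return static_cast<uint64_t>(Imm) < 32;
}

// Shift the 64-bit pair {Hi, Lo} right and keep the low 32 bits.
// AmtMask selects how many low bits of the amount are honoured:
// 31 models a masked 32-bit funnel shift, 63 an unmasked 64-bit shift.
constexpr uint32_t shiftPairAndTakeLow(uint32_t Hi, uint32_t Lo, int64_t Amt,
                                       uint64_t AmtMask) {
  return static_cast<uint32_t>(
      ((static_cast<uint64_t>(Hi) << 32) | Lo) >>
      (static_cast<uint64_t>(Amt) & AmtMask));
}

static_assert(isShiftAmt32Signed(-1), "signed compare accepts -1");
static_assert(!isShiftAmt32Unsigned(-1), "unsigned compare rejects -1");
static_assert(isShiftAmt32Unsigned(0) && isShiftAmt32Unsigned(31), "in range");
static_assert(!isShiftAmt32Unsigned(32), "out of range");

// With an amount of -1, masking to 5 bits shifts by 31 (the funnel-shift
// result), while honouring 6 bits shifts by 63 and loses the data.
static_assert(shiftPairAndTakeLow(1, 0x80000000u, -1, 31) == 3, "masked");
static_assert(shiftPairAndTakeLow(1, 0x80000000u, -1, 63) == 0, "unmasked");
```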

LLVM/project dcf50fellvm/lib/Target/SystemZ SystemZInstrInfo.cpp, llvm/test/CodeGen/SystemZ foldmem-regalloc.mir foldmemop-global.mir

[SystemZ] Don't fold memops after SSA if tied regs don't match. (#197475)

When foldMemoryOperandImpl() is called during register allocation,
folding into a reg/mem opcode mustn't be done if the tied def and use
operands do not end up referencing the same register.

Fixes #197414
DeltaFile
+123-0llvm/test/CodeGen/SystemZ/foldmem-regalloc.mir
+4-2llvm/test/CodeGen/SystemZ/foldmemop-global.mir
+2-0llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+129-23 files

LLVM/project 6cb00b6llvm/lib/Target/Hexagon HexagonISelLoweringHVX.cpp, llvm/test/CodeGen/Hexagon isel-hvx-pred-bitcast-order.ll inst_masked_store_bug1.ll

[Hexagon] Fix up vector predicate before compressing it for bitcast (#199283)

In a v64i1 vector predicate, each i1 is represented by 2 bits of the
predicate register. The predicate register needs to be fixed up before
we compress it.

Signed-off-by: Alexey Karyakin <akaryaki at qti.qualcomm.com>
Co-authored-by: Ikhlas Ajbar <iajbar at quicinc.com>
DeltaFile
+47-2llvm/test/CodeGen/Hexagon/isel-hvx-pred-bitcast-order.ll
+21-0llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
+1-1llvm/test/CodeGen/Hexagon/inst_masked_store_bug1.ll
+69-33 files

LLVM/project c0a302dllvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp

[AMDGPU] Refactor insertRelease into insertWriteback + insertWait (NFC) (#199486)

A release consists of two actions: write back the current cache, and
wait for "relevant" outstanding operations to complete. With the new
memory model, it is possible to disable the cache write-back using
"non-av". This patch cleanly separates the existing implementation so
that the write-backs can be selectively applied after checking for
non-av semantics.

Part of a stack:

- #199486
- #199621
- #199489 
- #199622

Assisted-By: Claude Opus 4.6

---------

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
DeltaFile
+123-137llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+123-1371 files

LLVM/project 6170eebflang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP copyprivate6.f90

[flang][OpenMP] Fix copyprivate crash with unlimited polymorphic pointer (#199768)

Lowering a copyprivate clause whose list item is an unlimited
polymorphic pointer (class(*), pointer) crashed in TypeInfo::typeScan.
The scan descends through the fir.class box and the fir.ptr, reaching a
`none` element type, which the terminal assertion did not allow.

Fixes #198770
DeltaFile
+26-0flang/test/Lower/OpenMP/copyprivate6.f90
+5-1flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+31-12 files

LLVM/project 9757718utils/bazel/llvm-project-overlay/clang BUILD.bazel

[Bazel] Fixes 81c523c (#199988)

This fixes 81c523c71c92bea4c5548b5f84288acd0f05db42.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+4-1utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+4-11 files

LLVM/project d25ec4bclang/test/Preprocessor riscv-target-features.c, llvm/lib/Target/RISCV RISCVInstrInfoZvbdota.td RISCVFeatures.td

[llvm][RISCV] Support batched dot product extensions MC layer (#196467)

spec:

https://github.com/aswaterman/riscv-misc/blob/main/isa/ldot-bdot/ldot-bdot.adoc#zvdota-and-zvbdota-families-of-dot-product-extensions-version-02
DeltaFile
+65-0llvm/lib/Target/RISCV/RISCVInstrInfoZvbdota.td
+43-0llvm/lib/Target/RISCV/RISCVFeatures.td
+40-0clang/test/Preprocessor/riscv-target-features.c
+38-0llvm/test/MC/RISCV/rvv/zvqwbdota.s
+31-0llvm/test/MC/RISCV/rvv/zvfqwbdota8f.s
+21-0llvm/test/MC/RISCV/rvv/zvfwbdota16bf.s
+238-015 files not shown
+393-221 files

LLVM/project 9e279e4llvm/test/CodeGen/X86 vector-shuffle-combining-avx512vbmi2.ll

[X86] vector-shuffle-combining-avx512vbmi2.ll - add VLX/NOVLX test coverage (#199984)
DeltaFile
+35-13llvm/test/CodeGen/X86/vector-shuffle-combining-avx512vbmi2.ll
+35-131 files

LLVM/project 1a86fdcclang/include/clang/AST DeclTemplate.h, clang/lib/AST DeclTemplate.cpp

[clang] fix getTemplateInstantiationArgs

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.

This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.

Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list in a
dedicated field, similar to partial specializations.
DeltaFile
+194-429clang/lib/Sema/SemaTemplateInstantiate.cpp
+257-164clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
+154-150clang/lib/Sema/SemaTemplate.cpp
+96-95clang/include/clang/AST/DeclTemplate.h
+59-129clang/lib/Sema/SemaConcept.cpp
+60-92clang/lib/AST/DeclTemplate.cpp
+820-1,05949 files not shown
+1,451-1,70755 files

LLVM/project f55acfflld/ELF Writer.cpp, lld/test/ELF/linkerscript overlap-nobits.s

[lld][ELF] Exclude SHT_NOBITS sections from LMA overlap checks (#196423)

In embedded applications it's sometimes useful to load a section at the
same virtual address as the .bss section. For example, one possible use
case is for temporary code/data that is only needed for a short time
when the program is starting up:

    REGIONS {
        RAM  : ORIGIN = 0x100000, LENGTH = 1M
        INIT : ORIGIN = 0x200000, LENGTH = 1M
    }

    .text { *(.text); } > RAM
    .bss (NOLOAD) : { *(.bss); } > RAM
    .init : AT(LOADADDR(.bss)) { *(.init); } > INIT

The .init section gets placed in the file immediately after the .text
section. At startup the .init section contents are copied to the INIT
region before zeroing .bss. Once the .init section is no longer needed

    [14 lines not shown]
DeltaFile
+91-0lld/test/ELF/linkerscript/overlap-nobits.s
+10-4lld/ELF/Writer.cpp
+101-42 files
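
The idea of skipping SHT_NOBITS sections in a load-address overlap check can be sketched as follows. This is a simplified model and not the code in lld's Writer.cpp: the `Section` struct, the helper names, and the concrete addresses and sizes are hypothetical, chosen to mirror the linker script above, where `.init` is loaded at `LOADADDR(.bss)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal model of an output section for the purpose of an LMA check.
struct Section {
  std::string Name;
  uint64_t LMA;  // load address
  uint64_t Size; // size in memory
  bool IsNoBits; // true for SHT_NOBITS sections such as .bss
};

// Half-open ranges [LMA, LMA + Size) overlap when each starts before the
// other ends.
bool rangesOverlap(const Section &A, const Section &B) {
  return A.LMA < B.LMA + B.Size && B.LMA < A.LMA + A.Size;
}

// Counts overlapping pairs; quadratic on purpose, for clarity.
unsigned countLmaOverlaps(const std::vector<Section> &Secs, bool SkipNoBits) {
  unsigned Count = 0;
  for (std::size_t I = 0; I < Secs.size(); ++I) {
    for (std::size_t J = I + 1; J < Secs.size(); ++J) {
      const Section &A = Secs[I];
      const Section &B = Secs[J];
      if (A.Size == 0 || B.Size == 0)
        continue;
      // NOBITS sections have no contents to load, so they cannot clash
      // with another section's load image.
      if (SkipNoBits && (A.IsNoBits || B.IsNoBits))
        continue;
      if (rangesOverlap(A, B))
        ++Count;
    }
  }
  return Count;
}

// Hypothetical layout: .init shares its load address with .bss.
std::vector<Section> exampleLayout() {
  return {
      {".text", 0x100000, 0x1000, false},
      {".bss", 0x101000, 0x2000, true},
      {".init", 0x101000, 0x800, false},
  };
}
```

With this layout, `countLmaOverlaps(exampleLayout(), false)` reports the `.bss`/`.init` pair, while passing `true` for `SkipNoBits` reports no overlap.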

LLVM/project 7664fc7offload/liboffload CMakeLists.txt, offload/libomptarget CMakeLists.txt

[Offload] Fix missing SONAME version on offload libraries (#199975)
DeltaFile
+5-0offload/liboffload/CMakeLists.txt
+5-0offload/libomptarget/CMakeLists.txt
+10-02 files

LLVM/project f567742clang/docs ReleaseNotes.rst, clang/lib/AST DeclCXX.cpp

[clang] fix finding class template instantiation pattern for member specializations (#199979)

Stop treating the member which a member specialization specializes as
the pattern of the former.

Split off from https://github.com/llvm/llvm-project/pull/199528
DeltaFile
+9-6clang/lib/AST/DeclCXX.cpp
+0-6clang/test/CXX/temp/temp.spec/temp.expl.spec/p7.cpp
+1-2clang/test/Modules/cxx-templates.cpp
+1-0clang/docs/ReleaseNotes.rst
+11-144 files

LLVM/project 58156c2lldb/tools/lldb-dap/Protocol ProtocolTypes.cpp, lldb/unittests/DAP ProtocolTypesTest.cpp

[lldb-dap] Fix typo in StepInTarget serialization (#199907)

Fixed a small typo and added a test for it.
DeltaFile
+12-0lldb/unittests/DAP/ProtocolTypesTest.cpp
+1-1lldb/tools/lldb-dap/Protocol/ProtocolTypes.cpp
+13-12 files

LLVM/project 3c77752clang/include/clang/ScalableStaticAnalysisFramework SSAFBuiltinForceLinker.h, llvm/include/llvm/IR FunctionProperties.def

Merge remote-tracking branch 'upstream/main' into users/ssahasra/refactor-acq-rel
DeltaFile
+100-0llvm/test/CodeGen/PowerPC/ppc-i64-to-fp.ll
+0-70llvm/test/Analysis/FunctionPropertiesAnalysis/properties-stats.ll
+69-0llvm/test/Transforms/InstSimplify/call.ll
+15-50clang/include/clang/ScalableStaticAnalysisFramework/SSAFBuiltinForceLinker.h
+64-0llvm/test/Analysis/FunctionPropertiesAnalysis/func-properties-analysis.ll
+35-26llvm/include/llvm/IR/FunctionProperties.def
+283-14635 files not shown
+506-31741 files

LLVM/project d7684c1flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP target-inreduction.f90 target-inreduction-unused.f90

[flang][OpenMP] Support in_reduction on target

Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target.

The translation looks up the task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.

The patch also preserves in_reduction operands in the TargetOp builder
path and makes sure target in_reduction list items are mapped into the
target region when needed.
DeltaFile
+90-1mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+83-3mlir/test/Target/LLVMIR/openmp-todo.mlir
+50-0mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+28-0flang/test/Lower/OpenMP/target-inreduction.f90
+27-0flang/test/Lower/OpenMP/target-inreduction-unused.f90
+17-4flang/lib/Lower/OpenMP/OpenMP.cpp
+295-82 files not shown
+300-288 files

LLVM/project 81c523cclang/docs/ScalableStaticAnalysisFramework/developer-docs ForceLinkerHeaders.rst HowToExtend.rst, clang/include/clang/ScalableStaticAnalysisFramework SSAFBuiltinForceLinker.h BuiltinAnchorSources.def

Reapply "[clang][ssaf][NFC] Rework how the Force linker anchors are defined and used" (#194693)

This reverts commit 582958c4337f539e650096c0257a322315298e1a.

Drop "const" from these anchor variables, as is done in clang-tidy.

It turns out that MSVC likely doesn't conform to the C++ standard and
gives `const volatile` global variables *internal* linkage, while they
should have *external* linkage.

https://eel.is/c++draft/basic.link#3.2
```
(3) The name of an entity that belongs to a namespace scope has internal linkage if it is the name of
(3.1) a variable, variable template, function, or function template that is explicitly declared static; or
(3.2) a non-template variable of non-volatile const-qualified type, unless
(3.2.1) it is declared in the purview of a module interface unit (outside the private-module-fragment, if any) or module partition, or
(3.2.2) it is explicitly declared extern, or
(3.2.3) it is inline, or
(3.2.4) it was previously declared and the prior declaration did not have internal linkage; or

    [3 lines not shown]
DeltaFile
+15-50clang/include/clang/ScalableStaticAnalysisFramework/SSAFBuiltinForceLinker.h
+0-51clang/unittests/ScalableStaticAnalysisFramework/SSAFBuiltinTestForceLinker.h
+23-13clang/docs/ScalableStaticAnalysisFramework/developer-docs/ForceLinkerHeaders.rst
+30-0clang/include/clang/ScalableStaticAnalysisFramework/BuiltinAnchorSources.def
+17-10clang/docs/ScalableStaticAnalysisFramework/developer-docs/HowToExtend.rst
+0-23clang/unittests/ScalableStaticAnalysisFramework/SSAFTestForceLinker.h
+85-14720 files not shown
+135-20326 files

LLVM/project 1e1a1ff.github/workflows release-binaries-all.yml

workflows/release-binaries-all: Validate input and remove template expansion (#199427)

https://github.com/llvm/llvm-project/security/code-scanning/1695
DeltaFile
+8-1.github/workflows/release-binaries-all.yml
+8-11 files