LLVM/project 60a2d43 llvm/lib/Target/AArch64 SVEShuffleOpts.cpp AArch64TargetMachine.cpp, llvm/test/CodeGen/AArch64 sve-tbl-folding-opts.ll sve-tbl-folding-new-pm.ll

[AArch64] Add SVE shuffle optimization pass (#193951)

Add a pass to perform VLA shuffle optimizations for SVE.

First up is using tbl to replace deinterleave4+uunpk+zext/uitofp
by generating shuffle masks with index, exploiting the fact that
out-of-range indices in the mask produce zeroes in the result
vector. That way, we can easily zero-extend smaller elements
by using the destination type when generating the mask, and
having one index in range with several out-of-range for each
destination element.
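
The masking idea can be illustrated with a toy model. This is only a sketch, not the pass's code: `tbl_bytes` and `zext_mask` are hypothetical names, the vector is fixed at 16 bytes, and 0xFF is used as the out-of-range filler (which is out of range only because this toy vector has fewer than 256 byte elements).

```python
import struct


def tbl_bytes(table, indices):
    # Toy byte-wise table lookup: an index beyond the table yields 0.
    return bytes(table[i] if i < len(table) else 0 for i in indices)


def zext_mask(num_lanes, first, step, lane_bytes=4):
    # Build the mask at the destination element width. Each lane holds
    # 0xFF..FF00 | source_index, so in little-endian byte order the low
    # byte is an in-range index and the remaining bytes are 0xFF.
    high = (1 << (8 * lane_bytes)) - 256
    lanes = [high | (first + step * i) for i in range(num_lanes)]
    return b"".join(lane.to_bytes(lane_bytes, "little") for lane in lanes)


src = bytes(range(100, 116))                    # 16 source bytes: 100..115
mask = zext_mask(num_lanes=4, first=1, step=4)  # select bytes 1, 5, 9, 13
out = tbl_bytes(src, mask)
lanes = list(struct.unpack("<4I", out))         # view the result as 4 x u32
print(lanes)  # [101, 105, 109, 113]
```

One lookup both selects every fourth byte (one field of a 4-way deinterleave) and zero-extends it to 32 bits, because the three filler indices per lane read as zero.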
Delta File
+642-0 llvm/test/CodeGen/AArch64/sve-tbl-folding-opts.ll
+293-0 llvm/lib/Target/AArch64/SVEShuffleOpts.cpp
+210-0 llvm/test/CodeGen/AArch64/sve-tbl-folding-new-pm.ll
+14-0 llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+14-0 llvm/lib/Target/AArch64/AArch64.h
+6-0 llvm/lib/Target/AArch64/AArch64PassRegistry.def
+1,179-0 2 files not shown
+1,183-0 8 files

LLVM/project 3cc9463 llvm/lib/Analysis Delinearization.cpp, llvm/test/Analysis/Delinearization inconsistent-types.ll

[Delinearization] Narrow the scope of the term collection (#204145)

Parametric delinearization collects subexpressions whose SCEV type is
`SCEVUnknown` and uses them as candidates for the array dimensions. When
traversing to find these subexpressions, it may follow any kind of
expression. For example, if it follows a `sext` expression, this can
lead to type inconsistencies among the collected terms.

This patch fixes the issue by preventing traversal into subexpressions
other than `SCEVAddExpr` or `SCEVAddRecExpr`.

Note: I tried to minimize the test case, but this seems to be as far as
it can go.

Fix #204066.
Delta File
+44-0 llvm/test/Analysis/Delinearization/inconsistent-types.ll
+5-11 llvm/lib/Analysis/Delinearization.cpp
+49-11 2 files

LLVM/project bc70d29 clang/docs LanguageExtensions.rst, clang/lib/CodeGen CodeGenModule.cpp CodeGenModule.h

[Clang][AIX] Add -mloadtime-comment-vars support to preserve variables in the final object file.
Delta File
+119-0 clang/lib/CodeGen/CodeGenModule.cpp
+66-0 clang/docs/LanguageExtensions.rst
+61-0 clang/test/CodeGen/loadtime-comment-vars.c
+13-8 llvm/test/Transforms/LowerCommentString/lower-comment-string.ll
+18-0 clang/lib/CodeGen/CodeGenModule.h
+12-0 clang/test/CodeGen/PowerPC/loadtime-comment-mixed.c
+289-8 4 files not shown
+319-8 10 files

LLVM/project f6fd6ea mlir/lib/ExecutionEngine CMakeLists.txt

[mlir][ExecutionEngine] Fix dead -Wno-c++98-compat-extra-semi guard (#204524)

`check_cxx_compiler_flag` stores its result in
`CXX_SUPPORTS_NO_CXX98_COMPAT_EXTRA_SEMI_FLAG`, but the guarding `if()`
checked `CXX_SUPPORTS_CXX98_COMPAT_EXTRA_SEMI_FLAG` (without `_NO_`),
which is never set. The condition was therefore always false and the
`-Wno-c++98-compat-extra-semi` suppression for `mlir_rocm_runtime` was
never applied.

The sibling flag checks in the same block (`-Wno-return-type-c-linkage`,
`-Wno-nested-anon-types`, `-Wno-gnu-anonymous-struct`) already use
matching variable names, so this aligns the typo'd guard with the
established pattern.

No test is included: this is a build-system-only (CMake) change to a
warning-suppression guard and is not unit-testable.

Signed-off-by: bogdan-petkovic <bpetkovi at amd.com>
Delta File
+1-1 mlir/lib/ExecutionEngine/CMakeLists.txt
+1-1 1 file

LLVM/project b90ec9c llvm/lib/CodeGen StackColoring.cpp

[StackColoring] Remove unused BB numbering state (#204414)
Delta File
+8-17 llvm/lib/CodeGen/StackColoring.cpp
+8-17 1 file

LLVM/project 500d1f8 llvm/lib/Target/SPIRV SPIRVUtils.cpp SPIRVPrepareFunctions.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_function_pointers fun-ptr-void-call-aggregate-arg.ll

[SPIR-V] Fix crash on void indirect call with aggregate argument (#204388)

removeAggregateTypesFromCalls named the call to key the type-restoration
metadata, which asserts for void-returning calls. Key the metadata via
instruction metadata on the call instead, which works for void results.
Delta File
+42-0 llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_function_pointers/fun-ptr-void-call-aggregate-arg.ll
+20-4 llvm/lib/Target/SPIRV/SPIRVUtils.cpp
+9-6 llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+71-10 3 files

LLVM/project e6daa68 compiler-rt/test/builtins/Unit lit.cfg.py

Revert "Revert "[Compiler-rt][test] Fix circular link dependency between builtins and libc"" (#204728)

Reverts llvm/llvm-project#203152
Delta File
+3-1 compiler-rt/test/builtins/Unit/lit.cfg.py
+3-1 1 file

LLVM/project fdf3d44 llvm/test/Transforms/InstCombine pext.ll pdep.ll

[InstCombine] Add tests showing failure to fold pdep(0,x) and pext(0,x) to 0 (#204783)

As noted on #204144
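
For reference, a bit-by-bit model shows why both expressions are always zero. This is a sketch assuming x86 BMI2 semantics with the first operand as the source value and the second as the mask; the function names and the `width` parameter are illustrative, not LLVM API.

```python
def pdep(src, mask, width=32):
    # Deposit successive low bits of src at the positions of mask's set bits.
    result, k = 0, 0
    for bit in range(width):
        if (mask >> bit) & 1:
            if (src >> k) & 1:
                result |= 1 << bit
            k += 1
    return result


def pext(src, mask, width=32):
    # Gather the bits of src selected by mask into the low bits of the result.
    result, k = 0, 0
    for bit in range(width):
        if (mask >> bit) & 1:
            if (src >> bit) & 1:
                result |= 1 << k
            k += 1
    return result


samples = (0, 1, 0b11010, 0xFFFFFFFF)
# With a zero source there is no set bit to deposit or extract.
print(all(pdep(0, x) == 0 and pext(0, x) == 0 for x in samples))  # True
```

In this model a zero mask also yields zero, so the result does not depend on which operand is the constant.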
Delta File
+18-0 llvm/test/Transforms/InstCombine/pext.ll
+18-0 llvm/test/Transforms/InstCombine/pdep.ll
+36-0 2 files

LLVM/project 4549680 clang/test/SemaCXX enable_if.cpp, llvm/examples/OrcV2Examples/LLJITWithSymbolAliases LLJITWithSymbolAliases.cpp

Merge branch 'main' into users/kasuga-fj/delin-fix-param-types
Delta File
+97-114 llvm/include/llvm/Support/LSP/Protocol.h
+61-27 clang/test/SemaCXX/enable_if.cpp
+85-0 llvm/examples/OrcV2Examples/LLJITWithSymbolAliases/LLJITWithSymbolAliases.cpp
+73-0 llvm/test/CodeGen/SPIRV/instructions/phi-large-vector-shader.ll
+48-21 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+65-0 llvm/test/CodeGen/AArch64/sve-masked-gather-64b-unscaled.ll
+429-162 96 files not shown
+1,430-596 102 files

LLVM/project a5e83b9 clang/include/clang/Basic arm_neon.td, clang/lib/CodeGen/TargetBuiltins ARM.cpp

[Clang][NEON ACLE] Remove +bf16 requirement from opaque bfloat builtins. (#204201)

Builtins that only care about the size of the element type but not its
format (e.g. loads, stores and shuffles) do not require any special
instructions to code generate beyond those already available to +neon.

Fixes https://github.com/llvm/llvm-project/issues/203159
Delta File
+0-56 clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+18-16 clang/include/clang/Basic/arm_neon.td
+2-2 clang/test/Sema/aarch64-neon-without-target-feature.cpp
+2-2 clang/test/CodeGen/AArch64/neon-luti.c
+2-2 clang/test/CodeGen/AArch64/bf16-lane-intrinsics.c
+2-2 clang/test/CodeGen/AArch64/bf16-ldst-intrinsics.c
+26-80 6 files not shown
+30-90 12 files

LLVM/project 39a8be5 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sve-masked-gather-64b-unscaled.ll sve-masked-gather.ll

[AArch64] Combine undef UZP and NVCAST away.

These are used to lower insert_subvec nodes quite early in SDAG. After
DAG combines run, it's possible that the inputs to these AArch64 nodes
become UNDEF.
Delta File
+17-5 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+3-6 llvm/test/CodeGen/AArch64/sve-masked-gather-64b-unscaled.ll
+3-6 llvm/test/CodeGen/AArch64/sve-masked-gather.ll
+1-2 llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
+24-19 4 files

LLVM/project 40cbc98 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/AArch64 sve-masked-gather-64b-unscaled.ll sve-masked-scatter-64b-unscaled.ll

[AArch64][SDAG] Legalise nxv1 gather/scatter nodes (#204620)

This updates WidenVecRes_MGATHER and WidenVecOp_MSCATTER to support
scalable vector types.
Delta File
+65-0 llvm/test/CodeGen/AArch64/sve-masked-gather-64b-unscaled.ll
+62-0 llvm/test/CodeGen/AArch64/sve-masked-scatter-64b-unscaled.ll
+61-0 llvm/test/CodeGen/AArch64/sve-masked-gather.ll
+58-0 llvm/test/CodeGen/AArch64/sve-masked-scatter.ll
+18-14 llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
+9-11 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+273-25 2 files not shown
+301-25 8 files

LLVM/project 47b29c2 llvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp, llvm/test/CodeGen/SPIRV/instructions phi-large-vector-shader.ll phi-large-vector.ll

[SPIR-V] Legalize G_PHI of oversized vectors via fewer-elements (#203993)

`G_PHI` on vectors wider than the SPIR-V max vector size previously
failed legalization. This PR adds a `fewerElementsIf` rule that splits
them down to `MaxVectorSize`, matching how other vector ops are handled
in `SPIRVLegalizerInfo.cpp`.

Adds the test
`llvm/test/CodeGen/SPIRV/instructions/phi-large-vector.ll`, covering
spirv32 and spirv64.
Delta File
+73-0 llvm/test/CodeGen/SPIRV/instructions/phi-large-vector-shader.ll
+44-0 llvm/test/CodeGen/SPIRV/instructions/phi-large-vector.ll
+5-1 llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+122-1 3 files

LLVM/project 06137a5 llvm/include/llvm/CAS OnDiskGraphDB.h OnDiskTrieRawHashMap.h, llvm/include/llvm/CodeGen MIR2Vec.h

[llvm] Remove LLVM_ABI_FOR_TEST in public headers (#204627)

These annotations were mistakenly set up as LLVM_ABI_FOR_TEST. Since
these are public headers, they should be using LLVM_ABI.

The effort to build LLVM as a dylib is tracked in #109483.
Delta File
+97-114 llvm/include/llvm/Support/LSP/Protocol.h
+15-17 llvm/include/llvm/CAS/OnDiskGraphDB.h
+10-10 llvm/include/llvm/CAS/OnDiskTrieRawHashMap.h
+10-10 llvm/include/llvm/CAS/UnifiedOnDiskCache.h
+8-9 llvm/include/llvm/CAS/OnDiskDataAllocator.h
+5-7 llvm/include/llvm/CodeGen/MIR2Vec.h
+145-167 22 files not shown
+192-214 28 files

LLVM/project c428e87 flang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[flang][OpenMP] Lower target in_reduction for host fallback

Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.

Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.

Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
Delta File
+145-5 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+110-3 mlir/test/Target/LLVMIR/openmp-todo.mlir
+77-36 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+107-0 mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+72-19 flang/lib/Lower/OpenMP/OpenMP.cpp
+75-0 mlir/test/Target/LLVMIR/openmp-target-in-reduction-multi.mlir
+586-63 7 files not shown
+756-80 13 files

LLVM/project f96bd14 clang/lib/CIR/CodeGen CIRGenTypes.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn-buffer-rsrc-type.hip

[CIR][AMDGPU] Adds __amdgpu_buffer_rsrc_t in the buffer-resource address space
Delta File
+80-0 clang/test/CIR/CodeGenHIP/builtins-amdgcn-buffer-rsrc-type.hip
+3-1 clang/lib/CIR/CodeGen/CIRGenTypes.cpp
+83-1 2 files

LLVM/project 6528388 llvm/test/CodeGen/AArch64 sve-masked-gather-legalize.ll

Replace old masked.gather syntax
Delta File
+1-15 llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
+1-15 1 file

LLVM/project 8970d84 llvm/lib/Target/AMDGPU SIDefines.h AMDGPULegalizerInfo.cpp

Comments
Delta File
+4-3 llvm/lib/Target/AMDGPU/SIDefines.h
+1-1 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+1-1 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+1-1 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+1-1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+8-7 5 files

LLVM/project c327ab3 llvm/lib/Target/AArch64 AArch64FrameLowering.cpp, llvm/test/CodeGen/AArch64 windows-elf-frame-record-pairing.ll

[AArch64] Fix Windows target detection in FrameLowering (#204347)

In #156467, we switched to using `getMCAsmInfo()->usesWindowsCFI()` to
recognize "Windows". This does not include Windows triples with ELF
binary formats.

So, for aarch64-pc-windows-msvc-elf we would use the Windows callee-save
list in `AArch64RegisterInfo::getCalleeSavedRegs()`, but FrameLowering
would handle this like Linux, and fail to invalidate the (x29, x28)
pairing.

This patch switches back to using AArch64Subtarget::isTargetWindows(),
which aligns with getCalleeSavedRegs().

Note: We were using `usesWindowsCFI()` to include UEFI targets; however,
there do not seem to be tests or support for UEFI triples on AArch64
(basic examples that compile for x86 fail: https://godbolt.org/z/dPWdTrEG7).
So, this has been moved to a TODO.

Fixes #204060
Delta File
+36-0 llvm/test/CodeGen/AArch64/windows-elf-frame-record-pairing.ll
+6-1 llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+42-1 2 files

LLVM/project 0e9b39b flang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[flang][OpenMP] Lower target in_reduction for host fallback

Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.

Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.

Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
Delta File
+145-5 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+110-3 mlir/test/Target/LLVMIR/openmp-todo.mlir
+77-36 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+107-0 mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+72-19 flang/lib/Lower/OpenMP/OpenMP.cpp
+75-0 mlir/test/Target/LLVMIR/openmp-target-in-reduction-multi.mlir
+586-63 7 files not shown
+756-80 13 files

LLVM/project 04e094b clang/lib/CIR/CodeGen CIRGenTypes.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn-buffer-rsrc-type.hip

[CIR][AMDGPU] Adds __amdgpu_buffer_rsrc_t in the buffer-resource address space
Delta File
+81-0 clang/test/CIR/CodeGenHIP/builtins-amdgcn-buffer-rsrc-type.hip
+3-1 clang/lib/CIR/CodeGen/CIRGenTypes.cpp
+84-1 2 files

LLVM/project 3c7a8df llvm/test/CodeGen/AArch64 sve-masked-scatter-64b-unscaled.ll sve-masked-gather.ll

Tidy up tests
Delta File
+23-15 llvm/test/CodeGen/AArch64/sve-masked-scatter-64b-unscaled.ll
+12-24 llvm/test/CodeGen/AArch64/sve-masked-gather.ll
+12-24 llvm/test/CodeGen/AArch64/sve-masked-gather-64b-unscaled.ll
+12-20 llvm/test/CodeGen/AArch64/sve-masked-scatter.ll
+14-0 llvm/test/CodeGen/AArch64/sve-masked-scatter-64b-scaled.ll
+3-6 llvm/test/CodeGen/AArch64/sve-masked-gather-64b-scaled.ll
+76-89 6 files

LLVM/project 06aa0d9 llvm/lib/Analysis Delinearization.cpp

delete unnecessary return
Delta File
+1-7 llvm/lib/Analysis/Delinearization.cpp
+1-7 1 file

LLVM/project b496d06 clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp Origins.cpp, clang/test/Sema/LifetimeSafety safety.cpp safety-c.c

[LifetimeSafety] Model bit_cast and atomic casts in the fact generator (#204591)

VisitCastExpr dropped several borrow-carrying cast kinds into its
default case. Propagate the borrow through
`__builtin_bit_cast`/`std::bit_cast` of a pointer and through
wrapping/unwrapping `_Atomic(T*)`, so a stack address laundered through
either is caught (matching reinterpret_cast). hasOrigins and
buildListForType now see through AtomicType, which is transparent for
lifetimes.

Assisted-by: Claude Opus 4.8

Co-authored-by: Gabor Horvath <gaborh at apple.com>
Delta File
+28-0 clang/test/Sema/LifetimeSafety/safety.cpp
+15-2 clang/test/Sema/LifetimeSafety/safety-c.c
+14-0 clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+7-0 clang/lib/Analysis/LifetimeSafety/Origins.cpp
+64-2 4 files

LLVM/project b9587a7 lld/ELF/Arch AArch64.cpp, lld/test/ELF aarch64-tls-le.s aarch64-tlsld-ldst.s

[ELF][AArch64] Relax zero TLSLE add to nop (#204286)

Optimize AArch64 local-exec TLS relocation handling by replacing a
self-add R_AARCH64_TLSLE_ADD_TPREL_HI12 instruction with nop when the
high 12 bits are zero.

The optimization is disabled by --no-relax and avoids non-equivalent
forms such as non-self-adds and 32-bit destination registers.
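
The conditions described above can be sketched as a check on the instruction word. This is an illustration only, not lld's implementation: `relax_tlsle_add_hi12` and its parameters are hypothetical names. The bit positions follow the A64 ADD (immediate) encoding, and 0xD503201F is the A64 NOP.

```python
NOP = 0xD503201F  # A64 NOP encoding


def relax_tlsle_add_hi12(insn, tprel, relax=True):
    """Apply a TPREL_HI12-style value to an ADD (immediate) word.

    Returns NOP when the add would be a 64-bit self-add of zero,
    otherwise the word with its imm12 field (bits 21:10) patched.
    """
    hi12 = (tprel >> 12) & 0xFFF
    is_64bit = (insn >> 31) & 1       # sf bit: 1 selects X registers
    rd = insn & 0x1F                  # destination register
    rn = (insn >> 5) & 0x1F           # source register
    if relax and hi12 == 0 and is_64bit and rd == rn:
        return NOP
    return (insn & ~(0xFFF << 10)) | (hi12 << 10)


ADD_X0_X0_LSL12 = 0x91400000  # add x0, x0, #0, lsl #12
print(hex(relax_tlsle_add_hi12(ADD_X0_X0_LSL12, 0x10)))    # 0xd503201f
print(hex(relax_tlsle_add_hi12(ADD_X0_X0_LSL12, 0x1010)))  # 0x91400400
```

A non-self-add (`rd != rn`) still copies a register, and a 32-bit add zeroes the upper half of the destination, which is why neither is equivalent to a nop.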
Delta File
+15-2 lld/test/ELF/aarch64-tls-le.s
+8-0 lld/ELF/Arch/AArch64.cpp
+1-1 lld/test/ELF/aarch64-tlsld-ldst.s
+24-3 3 files

LLVM/project 9361b3d llvm/test/Transforms/LoopVectorize widen-call-op-scalar-vector.ll

[LV] Add test for WidenCall with mixed scalar-vector operands (#203092)
Delta File
+48-0 llvm/test/Transforms/LoopVectorize/widen-call-op-scalar-vector.ll
+48-0 1 file

LLVM/project f8bd135 llvm/utils/lit/lit TestRunner.py

[lit] Make RecursionError less likely in internal shell (#204573)

The lit internal shell chains together the contents of multiple RUN:
lines by connecting them with implicit && nodes, forming a binary tree
structure which is then executed recursively by `_executeShCommand`.
However the tree structure is constructed in a very simple way which
makes it effectively just a linked list, so `_executeShCommand` must
recurse to a depth equal to the number of commands.

If a test file contains more than 1000 RUN: lines (e.g. running the
clang driver only, with lots of different options), then this causes a
RecursionError exception, which did not happen using the external shell.
Failures of this kind can be avoided by instead connecting the commands
together in a _balanced_ binary tree, which has equivalent behaviour,
since the && shell operator is associative.
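
The difference in tree depth can be sketched as follows. This is a standalone illustration, not the code in `TestRunner.py`: `AndNode`, `chain_linear`, `chain_balanced`, `depth` and `leaves` are hypothetical names, and commands are plain strings.

```python
from dataclasses import dataclass


@dataclass
class AndNode:
    lhs: object  # a command string or another AndNode
    rhs: object


def chain_linear(cmds):
    # Left-leaning chain: effectively a linked list, depth grows with len(cmds).
    tree = cmds[0]
    for cmd in cmds[1:]:
        tree = AndNode(tree, cmd)
    return tree


def chain_balanced(cmds):
    # Split in the middle at every level: depth grows with log2(len(cmds)).
    if len(cmds) == 1:
        return cmds[0]
    mid = len(cmds) // 2
    return AndNode(chain_balanced(cmds[:mid]), chain_balanced(cmds[mid:]))


def depth(node):
    # Iterative walk, so measuring a deep tree cannot itself overflow the stack.
    best, stack = 0, [(node, 1)]
    while stack:
        n, d = stack.pop()
        best = max(best, d)
        if isinstance(n, AndNode):
            stack.extend([(n.lhs, d + 1), (n.rhs, d + 1)])
    return best


def leaves(node):
    # Left-to-right order of the commands, i.e. the order they would run in.
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, AndNode):
            stack.extend([n.rhs, n.lhs])
        else:
            out.append(n)
    return out


cmds = [f"cmd{i}" for i in range(2000)]
linear, balanced = chain_linear(cmds), chain_balanced(cmds)
print(depth(linear), depth(balanced))
print(leaves(linear) == leaves(balanced) == cmds)  # True
```

Both trees list the commands in the same left-to-right order, so a recursive evaluator with short-circuit `&&` stops at the same first failing command either way; only the recursion depth differs.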
Delta File
+14-3 llvm/utils/lit/lit/TestRunner.py
+14-3 1 file

LLVM/project 4fe695e llvm/include/llvm/Analysis BlockFrequencyInfoImpl.h ProfileSummaryInfo.h, llvm/include/llvm/IR Function.h

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+13-31 llvm/lib/IR/Function.cpp
+4-30 llvm/include/llvm/IR/Function.h
+14-18 llvm/lib/Transforms/Utils/InlineFunction.cpp
+10-16 llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
+11-13 llvm/include/llvm/Analysis/ProfileSummaryInfo.h
+3-14 llvm/unittests/IR/MetadataTest.cpp
+55-122 20 files not shown
+108-193 26 files

LLVM/project 085e13b llvm/include/llvm/Analysis BlockFrequencyInfoImpl.h ProfileSummaryInfo.h, llvm/include/llvm/IR Function.h

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
Delta File
+10-16 llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
+12-14 llvm/include/llvm/Analysis/ProfileSummaryInfo.h
+10-16 llvm/lib/IR/Function.cpp
+6-6 llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
+0-10 llvm/unittests/IR/MetadataTest.cpp
+3-7 llvm/include/llvm/IR/Function.h
+41-69 5 files not shown
+55-80 11 files

LLVM/project 880a7be llvm/include/llvm/Analysis BlockFrequencyInfoImpl.h, llvm/include/llvm/IR Function.h

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
Delta File
+10-16 llvm/lib/IR/Function.cpp
+10-16 llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
+6-6 llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
+0-10 llvm/unittests/IR/MetadataTest.cpp
+3-7 llvm/include/llvm/IR/Function.h
+4-3 llvm/lib/Analysis/ProfileSummaryInfo.cpp
+33-58 4 files not shown
+40-66 10 files