LLVM/project 573138d llvm/include/llvm/IR DebugInfo.h, llvm/lib/IR DebugInfo.cpp

Handle more cases in DebugInfoFinder (#194684)

In #181028 we discovered that DebugInfoFinder is missing some cases.
This corrects several of these. It is hard to know if I found them all.
Delta    File
+83 -0   llvm/test/DebugInfo/Generic/debuginfofinder-composite-type.ll
+48 -0   llvm/lib/IR/DebugInfo.cpp
+1 -0    llvm/include/llvm/IR/DebugInfo.h
+132 -0  3 files

LLVM/project f72be67 llvm/utils profcheck-xfail.txt

[ProfCheck] Add test from #197698 to xfail list (#199650)

All other coroutines tests are already on the list, but the new test has
not been added.
Delta  File
+1 -0  llvm/utils/profcheck-xfail.txt
+1 -0  1 files

LLVM/project 9fb53a7 llvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.av.load.b128.ll

Merge from main
Delta              File
+23,873 -20,923    llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+8,633 -8,584      llvm/test/CodeGen/Thumb2/mve-clmul.ll
+12,365 -0         llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+1,243 -8,768      llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+0 -4,752          llvm/test/tools/llvm-mca/RISCV/SiFiveP800/vlseg-vsseg.s
+4,549 -0          llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/arithmetic.test
+50,663 -43,027    4,957 files not shown
+261,514 -149,573  4,963 files

LLVM/project 22d81f2 compiler-rt/lib/builtins/arm extendsfdf2.S

Review nits (NFC)
Delta  File
+5 -4  compiler-rt/lib/builtins/arm/extendsfdf2.S
+5 -4  1 files

LLVM/project 70321b9 llvm/lib/Support Path.cpp, llvm/unittests/Support Path.cpp CommandLineTest.cpp

[Support] Support runtime override for LLVM_WINDOWS_PREFER_FORWARD_SLASH (#199210)

Allow overriding the compile-time LLVM_WINDOWS_PREFER_FORWARD_SLASH
setting at runtime using an environment variable of the same name.

This enables testing both path separator behaviors (forward slash vs.
backslash on Windows) using a single build, which is useful for
CI/Buildbots.

The environment variable is checked once and cached in a static variable
for performance.

Also updated relevant tests in SupportTests (Path.cpp and
CommandLineTest.cpp) to dynamically detect the preferred separator style
at runtime instead of relying on the compile-time macro, making them
compatible with the override.
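
The check-once-and-cache behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in Path.cpp: `resolvePreference`, `preferForwardSlash`, and `kCompileTimeDefault` are hypothetical names, and treating the value "1" as on and any other set value as off is an assumption about how the variable is parsed.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the compile-time setting.
constexpr bool kCompileTimeDefault = false;

// Map the raw environment value (nullptr when unset) to the effective
// setting. Assumption: "1" enables forward slashes, any other value disables.
inline bool resolvePreference(const char *EnvValue, bool Default) {
  if (EnvValue == nullptr)
    return Default;
  return std::strcmp(EnvValue, "1") == 0;
}

// Read the environment once; later calls reuse the cached result.
inline bool preferForwardSlash() {
  static const bool Cached = resolvePreference(
      std::getenv("LLVM_WINDOWS_PREFER_FORWARD_SLASH"), kCompileTimeDefault);
  return Cached;
}
```

Because the result is cached in a function-local static, changing the environment variable after the first call has no effect in this sketch, which is why tests would need to query the effective style rather than assume it.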
Delta   File
+12 -2  llvm/lib/Support/Path.cpp
+6 -5   llvm/unittests/Support/Path.cpp
+6 -2   llvm/unittests/Support/CommandLineTest.cpp
+24 -9  3 files

LLVM/project ded7aa0 llvm/test/tools/llvm-objcopy/DXContainer dump-section.yaml

[z/OS] Add --ignore-case to FileCheck on output from od. (#196396)

This test fails on z/OS because `od` outputs upper case letters on z/OS.
Delta  File
+1 -1  llvm/test/tools/llvm-objcopy/DXContainer/dump-section.yaml
+1 -1  1 files

LLVM/project 05a2372 llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVInstrGISel.td, llvm/lib/Target/RISCV/GISel RISCVLegalizerInfo.cpp RISCVLegalizerInfo.h

[RISCV][GlobalISel] Lower i8 bitreverse using brev8 with Zbkb (#199469)

This teaches RISC-V GlobalISel to custom-lower scalar i8 G_BITREVERSE
using brev8 when Zbkb is available.

The i8 source is zero-extended to XLEN before applying the riscv_brev8
intrinsic. Since brev8 reverses bits independently within each byte, the
high zero bytes remain zero, so the result can be truncated back to i8.
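
The zero-extend / brev8 / truncate argument can be checked with a small software model. This is only a sketch: `brev8` here is a plain C++ emulation of the per-byte bit reversal (not the intrinsic itself), `bitreverse8` is a hypothetical helper name, and XLEN is assumed to be 64.

```cpp
#include <cassert>
#include <cstdint>

// Software model of brev8: reverse the bits within each byte of X.
inline uint64_t brev8(uint64_t X) {
  // Swap adjacent bits, then adjacent bit pairs, then nibbles. No step
  // moves a bit across a byte boundary.
  X = ((X & 0x5555555555555555ULL) << 1) | ((X >> 1) & 0x5555555555555555ULL);
  X = ((X & 0x3333333333333333ULL) << 2) | ((X >> 2) & 0x3333333333333333ULL);
  X = ((X & 0x0F0F0F0F0F0F0F0FULL) << 4) | ((X >> 4) & 0x0F0F0F0F0F0F0F0FULL);
  return X;
}

// i8 bitreverse: zero-extend, apply brev8, truncate back to 8 bits.
inline uint8_t bitreverse8(uint8_t V) {
  uint64_t Wide = V;            // zero-extension to the assumed XLEN of 64
  uint64_t Reversed = brev8(Wide);
  assert((Reversed >> 8) == 0); // the high zero bytes stay zero
  return static_cast<uint8_t>(Reversed);
}
```

For example, `bitreverse8(0xB4)` yields `0x2D` (binary 10110100 reversed is 00101101).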
Delta    File
+136 -0  llvm/test/CodeGen/RISCV/GlobalISel/bitreverse-zbkb.ll
+29 -1   llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
+24 -0   llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+9 -0    llvm/lib/Target/RISCV/RISCVInstrGISel.td
+6 -0    llvm/lib/Target/RISCV/RISCVISelLowering.h
+1 -0    llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
+205 -1  6 files

LLVM/project a903b95 llvm/lib/Transforms/Utils CloneModule.cpp SplitModule.cpp, llvm/test/tools/llvm-split ifunc.ll alias-to-ifunc.ll

[CloneModule] Clone undefined ifuncs (#197353)

Fix IFunc handling when cloning modules so that the result satisfies the
verifier rule "IFunc resolver must be a definition".
When cloning a module, if an IFunc has no definition
(ShouldCloneDefinition returns false), directly create an external
GlobalValue (Function or GlobalVariable) instead of trying to clone the
ifunc.
Add a test case for llvm-split to verify that the ifunc cloning/splitting
behavior works correctly.
Delta   File
+33 -0  llvm/test/tools/llvm-split/ifunc.ll
+29 -0  llvm/test/tools/llvm-split/alias-to-ifunc.ll
+15 -0  llvm/lib/Transforms/Utils/CloneModule.cpp
+1 -0   llvm/lib/Transforms/Utils/SplitModule.cpp
+78 -0  4 files

LLVM/project 028153a llvm/lib/Transforms/Vectorize VPlan.cpp, llvm/test/Transforms/LoopVectorize pointer-induction.ll scev-predicate-reasoning.ll

[VPlan] Make TransformState::get BCast-logic robust (#197589)

The logic for inserting Broadcasts in a more optimal location in
VPTransformState::get is quite fragile, especially around scalable VFs.
Fix it, resulting in minor improvements.
Delta    File
+9 -31   llvm/lib/Transforms/Vectorize/VPlan.cpp
+2 -2    llvm/test/Transforms/LoopVectorize/pointer-induction.ll
+2 -2    llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll
+13 -35  3 files

LLVM/project 7933783 clang/lib/Sema OpenCLBuiltins.td, clang/test/SemaOpenCL intel-split-work-group-barrier-builtins.cl

[OpenCL] Add cl_intel_split_work_group_barrier builtins (#199424)

Add cl_intel_split_work_group_barrier declarations to OpenCLBuiltins.td
and cover the extension with a dedicated header-free SPIR test.

Specification:

https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_split_work_group_barrier.html

Co-authored-by: Copilot
Delta    File
+26 -13  clang/lib/Sema/OpenCLBuiltins.td
+22 -0   clang/test/SemaOpenCL/intel-split-work-group-barrier-builtins.cl
+48 -13  2 files

LLVM/project 10c0de8 clang/tools/clang-linker-wrapper ClangLinkerWrapper.cpp

[LinkerWrapper] Fix temps being dumped to CWD instead of output path (#198679)

Summary:
Offloading save temps is a complex dance where we have clang,
linker-wrapper, and lld all making their own temp files. The ones in the
linker wrapper were not respecting the output directory because we
stripped everything with filename. Just get rid of this so it uses the
output file's directory properly in this mode.
Delta    File
+10 -16  clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+10 -16  1 files

LLVM/project 3bf5972 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-chained.ll partial-reduce-usabs.ll

[LV] Optimize partial reduction extends before handling inloop subs

The crash avoided in #194660 was caused by the extend optimizations
failing to match due to the extra sub/negation added to the
"ExtendedOp".

A similar crash exists for [us]abs partial reductions
(see https://godbolt.org/z/MerMon5rE), which is fixed with this patch.

This patch solves the underlying issue by running the extend optimizations
before any inloop sub/fsub handling.

Fixes #194000
Delta     File
+70 -66   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+57 -0    llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-usabs.ll
+3 -6     llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+130 -72  3 files

LLVM/project 5d7c117 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-chained.ll partial-reduce.ll

[LV] Support partial reduce subs/fsubs without a mul operand

This allows the `UpdateR(PrevValue, ext(...))` form for fsub/sub
updates (i.e., AddWithSub or Sub reductions). For Sub reductions the
codegen/handling is identical to add reductions (with the sub handled
out of loop). For AddWithSub reductions, the sub is handled in-loop
with a NegatedExtendedReduction VP expression, which encapsulates
`reduce.[f]add(neg(ext(op)))`.
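
As a scalar sanity check of the `reduce.[f]add(neg(ext(op)))` form, the sketch below compares a plain subtracting loop with an add of negated, extended operands. It is an illustrative integer model only, not VPlan code: the function names are hypothetical, and `int8_t` to `int32_t` is an assumed extension.

```cpp
#include <cstdint>
#include <vector>

// Reference: acc -= ext(op) for every operand.
inline int32_t subReductionReference(int32_t Acc,
                                     const std::vector<int8_t> &Ops) {
  for (int8_t Op : Ops)
    Acc -= static_cast<int32_t>(Op);
  return Acc;
}

// Model of reduce.add(neg(ext(op))): extend first, then negate, then sum.
inline int32_t reduceAddNegExt(const std::vector<int8_t> &Ops) {
  int32_t Sum = 0;
  for (int8_t Op : Ops)
    Sum += -static_cast<int32_t>(Op);
  return Sum;
}

// The sub update expressed as an add of the negated extended reduction.
inline int32_t subReductionViaNegatedExtend(int32_t Acc,
                                            const std::vector<int8_t> &Ops) {
  return Acc + reduceAddNegExt(Ops);
}
```

Negating after the extension matters in this model: negating the narrow value -128 would overflow `int8_t`, whereas the extended value negates cleanly to 128.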
Delta     File
+136 -0   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+51 -66   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll
+117 -0   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-costs.ll
+103 -0   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll
+12 -7    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+12 -3    llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+431 -76  2 files not shown
+442 -76  8 files

LLVM/project 05cf561 llvm/lib/Analysis GlobalsModRef.cpp

[GlobalsModRef] Don't erase while iterating

The loop erases from AllocsForIndirectGlobals while walking it, which
now hits the iterator invalidation assert in DenseMap::erase. Use
remove_if instead.
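
The hazard and its fix can be illustrated with a standard-library analogue. This sketch does not use DenseMap or its `remove_if`; `removeIf`, `withoutGlobal`, and the `AllocMap` alias are hypothetical names, and the map contents are made up for illustration.

```cpp
#include <unordered_map>

// Hypothetical stand-in: allocation id -> id of the global that owns it.
using AllocMap = std::unordered_map<int, int>;

// Erase every entry matching Pred without touching an invalidated
// iterator: erase() returns the iterator following the erased element.
template <typename MapT, typename PredT>
void removeIf(MapT &M, PredT Pred) {
  for (auto It = M.begin(); It != M.end();) {
    if (Pred(*It))
      It = M.erase(It);
    else
      ++It;
  }
}

// Return a copy of Allocs with all entries owned by Global dropped.
inline AllocMap withoutGlobal(AllocMap Allocs, int Global) {
  removeIf(Allocs, [Global](const AllocMap::value_type &KV) {
    return KV.second == Global;
  });
  return Allocs;
}
```

Keeping the erase logic inside one helper means callers never hold an iterator across an erase, which is the property the commit restores.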
Delta  File
+2 -5  llvm/lib/Analysis/GlobalsModRef.cpp
+2 -5  1 files

LLVM/project 9939d61 llvm/lib/ExecutionEngine/Orc MachOPlatform.cpp

[ORC] Avoid iterator invalidation when erasing image info symbols

processObjCImageInfo iterated the section's DenseSet of symbols while
calling removeDefinedSymbol, which erases from that same set. Re-fetch
begin() each iteration so the iterator is always fresh.
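
A minimal sketch of the re-fetch-`begin()` pattern, using `std::set` in place of DenseSet. `Section`, `removeSymbol`, and `removeAllSymbols` are hypothetical stand-ins, and the assumption that the loop removes every symbol in the section is mine, not stated in the commit.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for a section that owns a set of symbol names.
struct Section {
  std::set<std::string> Symbols;
  // Erases from the same set the caller may be walking.
  void removeSymbol(const std::string &Name) { Symbols.erase(Name); }
};

// Remove every symbol without keeping an iterator alive across an erase:
// each pass takes a fresh begin() and copies the name before removing it.
inline std::size_t removeAllSymbols(Section S) {
  std::size_t Removed = 0;
  while (!S.Symbols.empty()) {
    const std::string Name = *S.Symbols.begin();
    S.removeSymbol(Name);
    ++Removed;
  }
  return Removed;
}
```

A range-based `for` over `S.Symbols` calling `removeSymbol` would instead advance an iterator that the erase has already invalidated.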
Delta  File
+2 -2  llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp
+2 -2  1 files

LLVM/project c437052 llvm CMakeLists.txt, llvm/include/llvm/ADT APFloat.h

Reland "[APFloat] Add exp functions for single and double using exp/expf implementations from LLVM libc." (#197440) (#199570)

This reverts commit 1565f096d868f479f075fce3792db7b908cab9aa.

**Fixes applied on LLVM libc side:**
- gcc 7, 8, 9 compatibility:
  - https://github.com/llvm/llvm-project/pull/197476
  - https://github.com/llvm/llvm-project/pull/197868
- Add gcc's versions to LLVM libc-shared-tests precommit CI:
https://github.com/llvm/llvm-project/pull/199300

**Original commit messages:**

Discourse RFC:
https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450

- The implementation in LLVM libc is free-standing header-only:
https://github.com/llvm/llvm-project/issues/147386
- expf / exp implementation in LLVM libc is correctly rounded for all

    [4 lines not shown]
Delta    File
+58 -0   llvm/unittests/ADT/APFloatTest.cpp
+31 -0   llvm/lib/Support/APFloat.cpp
+5 -0    llvm/lib/Support/CMakeLists.txt
+4 -0    llvm/include/llvm/ADT/APFloat.h
+4 -0    llvm/CMakeLists.txt
+1 -1    llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp
+103 -1  6 files

LLVM/project 0f1d083 clang/test CMakeLists.txt

[Clang][test] check-clang-format not created with LLVM_ENABLE_IDE (#199638)

add_lit_testsuites skips creating targets for each subdirectory when
LLVM_ENABLE_IDE is set. Only create the dependency (introduced in #199169)
when the check-clang-format target actually exists.

Fixes the LLVM build when using an IDE.
Delta  File
+3 -1  clang/test/CMakeLists.txt
+3 -1  1 files

LLVM/project adcad45 clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-intrinsics.c

[CIR] Vector saturating rounding shift right and narrow intrinsics (#198947)

This PR ignores all SISD variants that we had in #198216 

Part of https://github.com/llvm/llvm-project/issues/185382

Move the test cases to
[intrinsics.c](https://github.com/llvm/llvmproject/pull/clang/test/CodeGen/AArch64/neon/intrinsics.c)
Removed the test cases from
[neon-intrinsics.c](https://github.com/llvm/llvmproject/pull/clang/test/CodeGen/AArch64/neon/intrinsics.c)


Variants that are skipped / not covered in this PR (SISD):

```

1.  vqrshrunh_n_s16 (uint8_t, _h scalar)
2.  vqrshruns_n_s32 (uint16_t, _s scalar)
3.  vqrshrnh_n_s16 (int8_t, _h scalar)

    [5 lines not shown]
Delta      File
+260 -0    clang/test/CodeGen/AArch64/neon/intrinsics.c
+0 -255    clang/test/CodeGen/AArch64/neon-intrinsics.c
+29 -2     clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+289 -257  3 files

LLVM/project 495e6c5 llvm/lib/Transforms/Coroutines MaterializationUtils.cpp, llvm/test/Transforms/Coroutines coro-materialize-intrinsics.ll coro-materialize.ll

[Coroutines] Allow rematerialization of unary operators and selected intrinsics (#197698)

All of those can be cheaply recomputed when the coroutine has resumed.

Before this change, results of unary operators and intrinsics were
spilled into the coroutine frame and reloaded on resume:

```
  %neg = fneg float %n
  store float %neg, ptr %neg.spill.addr

  ; In resume:
  %neg.reload = load float, ptr %neg.reload.addr
  ; ... use %neg.reload
```

After this change, only the operand is spilled and the operation is
rematerialized on each resume, avoiding the frame store:


    [9 lines not shown]
Delta    File
+537 -0  llvm/test/Transforms/Coroutines/coro-materialize-intrinsics.ll
+74 -0   llvm/test/Transforms/Coroutines/coro-materialize.ll
+56 -2   llvm/lib/Transforms/Coroutines/MaterializationUtils.cpp
+667 -2  3 files

LLVM/project 3c8341a clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-validate-sanitize.cl

clang/AMDGPU: Report all runtimeless sanitizers as available
Delta   File
+18 -0  clang/test/Driver/amdgpu-validate-sanitize.cl
+1 -1   clang/lib/Driver/ToolChains/AMDGPU.cpp
+19 -1  2 files

LLVM/project b2634fc clang/lib/AST/ByteCode InterpBuiltin.cpp, clang/test/AST/ByteCode builtin-functions.cpp

[clang][bytecode] Fix a crash in __builtin_subcb (#199400)

Don't try to initialize pointers that can't be initialized
Delta  File
+7 -0  clang/test/AST/ByteCode/builtin-functions.cpp
+2 -1  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+9 -1  2 files

LLVM/project 3398f4e mlir/lib/Transforms Mem2Reg.cpp, mlir/test/Dialect/LLVMIR mem2reg.mlir

[mlir][mem2reg] fix assert for indirect blocking uses inside regions (#199193)

When adding new blocking uses created by the interface of a previous
blocking use (typically forwarding the blocking uses to the op result
users), the mem2reg framework assumed that the new blocking uses
are in the same region as the original blocking use, which is not true
in general and led to the assertion failure:

`Transforms/Mem2Reg.cpp:743: void
{anonymous}::MemorySlotPromoter::removeBlockingUses(mlir::Region*):
Assertion `op->getParentRegion() == region && "all operations must still
be in the same region"' failed.`

This patch fixes this by adding the new uses into the userToBlockingUses
for the region of the new blocking uses.
Delta   File
+14 -0  mlir/test/Dialect/LLVMIR/mem2reg.mlir
+2 -1   mlir/lib/Transforms/Mem2Reg.cpp
+16 -1  2 files

LLVM/project 5ab7435 llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlan.h

[LV] Add support for partial reduction chains with fsubs. (#197114)

The cost-model prevented this from happening, but the LV would otherwise
generate incorrect code (i.e. without the fneg).
Delta     File
+193 -1   llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-costs.ll
+66 -0    llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fdot-product.ll
+0 -40    llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fsub-chained.ll
+26 -9    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+13 -7    llvm/lib/Transforms/Vectorize/VPlan.h
+15 -4    llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+313 -61  2 files not shown
+329 -63  8 files

LLVM/project 322ff9f llvm/lib/Target/RISCV RISCVISelLowering.cpp

[RISCV] Remove TargetLowering arg from getContainerForFixedLengthVector. NFC (#199629)

Unless I'm missing something, we can just fetch the TLI from
RISCVSubtarget.
Delta    File
+52 -65  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+52 -65  1 files

LLVM/project 48967cc clang/tools/libclang CMakeLists.txt, lldb/source/API CMakeLists.txt

build: adjust LLDB and clang library naming on Windows (#185084)

Ensure that use of the GNU driver does not change the library name on
Windows. Previously, the output name was selected by checking whether
the build tools were MSVC rather than whether the target was Windows.

(cherry picked from commit 687e66c989887542b1702a7a99eeaa4e25edd12e)
Delta  File
+1 -1  clang/tools/libclang/CMakeLists.txt
+1 -1  lldb/source/API/CMakeLists.txt
+2 -2  2 files

LLVM/project 8798085 libc/cmake/modules prepare_libc_gpu_build.cmake, openmp/device CMakeLists.txt

[libc] Demote compiler check error to a warning (#198033)

Summary:
This check exists to encode the policy that this is only intended to be
built with a just-built compiler. In practice it's a little too strict
and breaks pretty much every six months when the version bumps or when
people try to build a separate patch. Just demote to a warning.

(cherry picked from commit 13da33e922fe43cd97246f5e33320acc4f5ea186)
Delta  File
+1 -1  libc/cmake/modules/prepare_libc_gpu_build.cmake
+1 -1  openmp/device/CMakeLists.txt
+2 -2  2 files

LLVM/project 73f141f llvm/lib/DebugInfo/CodeView CodeViewRecordIO.cpp

[NFC] Add null terminator assert to CodeViewRecordIO::mapStringZ (#199624)

mapStringZ assumes that there's a null terminator past the end of Value
(I suppose the name hints at this too). This doesn't seem very nice to
me, but at least we can add an assert to check that the assumption
holds.
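
The kind of check described can be sketched as below. This is not the CodeViewRecordIO code: `StrRef`, `hasTerminatorPastEnd`, and `bytesToWriteZ` are hypothetical names, and the idea that the writer emits `Size + 1` bytes is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a (pointer, length) string reference.
struct StrRef {
  const char *Data;
  std::size_t Size;
};

// True when the byte just past the referenced range is a null terminator.
// Only meaningful when the underlying buffer extends at least one byte
// past Size, which is exactly the assumption being checked.
inline bool hasTerminatorPastEnd(StrRef S) { return S.Data[S.Size] == '\0'; }

// Hypothetical writer that relies on the terminator being present.
inline std::size_t bytesToWriteZ(StrRef S) {
  assert(hasTerminatorPastEnd(S) && "expected null terminator past the end");
  return S.Size + 1; // string bytes plus the terminator
}
```

A reference that points into the middle of a longer buffer, such as the first three bytes of "abcdef", fails the check because the byte past the end is 'd'.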
Delta  File
+1 -0  llvm/lib/DebugInfo/CodeView/CodeViewRecordIO.cpp
+1 -0  1 files

LLVM/project 6e5effc llvm/lib/Target/LoongArch LoongArchLSXInstrInfo.td LoongArchLASXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction avg.ll

[LoongArch] Revert "Add patterns to support vector type average instructions generation" (#198306)

Fixes #198254
Delta    File
+0 -321  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/avg.ll
+0 -321  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/avg.ll
+0 -30   llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+0 -18   llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+0 -690  4 files

LLVM/project 19e915f llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchMachineFunctionInfo.h, llvm/test/CodeGen/LoongArch musttail-indirect-args.ll musttail-call.ll

[LoongArch] Fix musttail with indirect arguments by forwarding incoming pointers (#198965)

When a `musttail` call passes arguments indirectly (fp128 on LA32, i128
on LA32), the backend allocates a stack temporary and hands the callee a
pointer. The tail call deallocates the caller's frame, and the pointer
dangles.

Fix by forwarding the incoming indirect pointers instead. They point to
the caller's caller's frame, which stays valid after the tail call.
Forwarded formal parameters reuse the pointer directly; computed values
get stored into the incoming buffer first.

The pointers are saved in virtual registers (`CopyToReg`/`CopyFromReg`)
rather than SDValues. The SelectionDAG is cleared between basic blocks
and musttail calls can appear in non-entry blocks, so storing raw
SDValues across BBs is unsound (this was the bug that led to the revert
in 501417baa60f). The vreg save only fires when the function has
musttail calls; other functions see no codegen change.


    [2 lines not shown]
Delta       File
+907 -0     llvm/test/CodeGen/LoongArch/musttail-indirect-args.ll
+183 -44    llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+20 -0      llvm/test/CodeGen/LoongArch/musttail-call.ll
+17 -0      llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
+1,127 -44  4 files

LLVM/project 4448a1a llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-mask-op.ll avx512-ext.ll

[X86] LowerBUILD_VECTORvXi1 - scalarize the bool masks if we insert a single non-const value (#199523)

Minor generalization of the existing fold for splat bool masks: if only
a single value is used in the insertion(s) (alongside any immediate/undef
values), then fold to a scalar select (val, insert|immediate, immediate).

Yak shaving for #198162
Delta     File
+38 -26   llvm/lib/Target/X86/X86ISelLowering.cpp
+25 -31   llvm/test/CodeGen/X86/avx512-mask-op.ll
+8 -32    llvm/test/CodeGen/X86/avx512-ext.ll
+7 -25    llvm/test/CodeGen/X86/avx512-insert-extract.ll
+78 -114  4 files