LLVM/project 1139f85clang/docs ReleaseNotes.rst, llvm/docs ReleaseNotes.md

release/22.x: Add release notes for the PowerPC target
DeltaFile
+30-0llvm/docs/ReleaseNotes.md
+7-0clang/docs/ReleaseNotes.rst
+37-02 files

LLVM/project 8555a6ellvm/include/llvm/IR RuntimeLibcalls.td, llvm/test/Transforms/SafeStack/AArch64 safestack.ll

Reapply "RuntimeLibcalls: Fix adding __safestack_pointer_address by default" (#182949)

This reverts commit 6d37110e091569509f54e2b1f3ef35e8a50e5b70.

Now with aarch64 test.
DeltaFile
+39-0llvm/test/Transforms/SafeStack/AArch64/safestack.ll
+38-0llvm/test/Transforms/SafeStack/SPARC/safestack.ll
+6-4llvm/include/llvm/IR/RuntimeLibcalls.td
+83-43 files

LLVM/project 6b65bbcclang-tools-extra/clang-tidy/readability RedundantTypenameCheck.cpp, clang-tools-extra/test/clang-tidy/checkers/readability redundant-typename.cpp

[clang-tidy] Fix false positive from `readability-redundant-typename` on partially specialized variables (#175473)

Fixes #174827.

(cherry picked from commit 6ddab42952eeccf7aedefade42611a272ba72745)
DeltaFile
+15-12clang-tools-extra/clang-tidy/readability/RedundantTypenameCheck.cpp
+10-0clang-tools-extra/test/clang-tidy/checkers/readability/redundant-typename.cpp
+25-122 files

LLVM/project 9b25011clang/include/clang/Sema Sema.h, clang/lib/Sema SemaConcept.cpp SemaTemplateInstantiate.cpp

[Clang] Fix the normalization of fold constraints (#177531)

Fold constraints can contain packs expanded from different locations.
For `C<Ps...>`, where the ellipsis immediately follows the argument, the
pack should be expanded in place regardless of the fold expression. For
`C<Ps> && ...`, the fold expression itself is responsible for expanding
Ps.

Previously, both kinds of packs were expanded by the fold expression,
which broke assumptions within concept caching. This patch fixes that by
preserving PackExpansionTypes for the first kind of pack while rewriting
them to non-packs for the second kind.
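
To make the two expansion sites concrete, here is a small sketch. It uses plain `constexpr bool` variable templates rather than concepts so that it compiles as C++17, which is an assumption made for illustration; `SameAsAny` and `Outer` are hypothetical names, not taken from the patch. `Ps...` is expanded in place inside the template argument list, while `Ts` is expanded by the enclosing fold.

```cpp
#include <cassert>
#include <type_traits>

// True if T is the same type as at least one of Us.
template <class T, class... Us>
constexpr bool SameAsAny = (std::is_same_v<T, Us> || ...);

template <class... Ps> struct Outer {
  // `Ps...` is expanded in place (the ellipsis directly follows the
  // argument). `Ts` has no ellipsis of its own, so the surrounding fold
  // expression `(... && ...)` is what expands it.
  template <class... Ts> static constexpr bool check() {
    return (SameAsAny<Ts, Ps...> && ...);
  }
};

static_assert(Outer<int, long>::check<int, long, int>(),
              "every Ts is one of Ps");
static_assert(!Outer<int, long>::check<int, char>(),
              "char is not one of Ps");
```

In the concept form the same shape would read `(C<Ts, Ps...> && ...)`, with the two packs expanded at different locations.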

This patch also removes an unused function and performs some cleanup of
the evaluation contexts. Hopefully it is viable for backporting.

No release note, as this issue was a regression.

Fixes https://github.com/llvm/llvm-project/issues/177245

    [2 lines not shown]
DeltaFile
+89-53clang/lib/Sema/SemaConcept.cpp
+12-30clang/lib/Sema/SemaTemplateInstantiate.cpp
+16-0clang/test/SemaCXX/cxx2c-fold-exprs.cpp
+1-11clang/include/clang/Sema/Sema.h
+1-1clang/lib/Sema/TreeTransform.h
+119-955 files

LLVM/project 0b8bb80llvm/lib/MC MCContext.cpp, llvm/test/MC/ELF section-sym-err.s

[MC] Fix crash in x=0; .section x (#183001)

When an equated symbol (e.g. `x=0`) is followed by `.section x`,
getOrCreateSectionSymbol reports an "invalid symbol redefinition"
error but continues to reuse the equated symbol as a section symbol.
This causes an assertion failure in MCObjectStreamer::changeSection
when `setFragment` is called on the equated symbol.

Fix this by clearing `Sym`.
DeltaFile
+8-1llvm/test/MC/ELF/section-sym-err.s
+5-1llvm/lib/MC/MCContext.cpp
+13-22 files

LLVM/project 3f024d0lldb/source/Core Module.cpp, lldb/source/Plugins/ObjectContainer/BSD-Archive ObjectContainerBSDArchive.cpp ObjectContainerBSDArchive.h

[lldb] A few small code modernizations and cleanups [NFC] (#182656)

I was reading through ObjectContainerBSDArchive and came across some
dead method decls and a `shared_ptr` typedef in
`ObjectContainerBSDArchive::Archive` for `shared_ptr<Archive>`, which was
a little unclear when reading a decl like `shared_ptr archive_sp;` for a
local variable.
DeltaFile
+19-19lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
+12-15lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h
+3-2lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
+2-2lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
+1-3lldb/source/Core/Module.cpp
+1-1lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
+38-426 files

LLVM/project 6654737llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 fcvt_combine.ll shuffle-tbl34.ll

[AArch64] Optimize 64-bit constant vector builds (#177076)

This patch optimizes the creation of constant 64-bit vectors (e.g.,
v2i32, v4i16) by avoiding expensive loads from the constant pool. The
optimization works by packing the constant vector elements into a single
i64 immediate and bitcasting the result to the target vector type. This
replaces a memory access with more efficient immediate materialization.
To ensure this transformation is efficient, a check is performed to
verify that the immediate can be generated in two or fewer mov
instructions. If it requires more, the compiler falls back to using the
constant pool.
The optimization is disabled for big-endian targets for now.
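
A rough sketch of the packing step and the cost check, under stated assumptions: `packV4I16`, `packV2I32`, `movCount`, and `useImmediate` are hypothetical helper names, lanes are packed in little-endian order, and the cost model simply counts non-zero 16-bit chunks (one MOVZ/MOVK each). The actual backend check may also account for other encodings, so treat this as an approximation of the idea rather than the patch's logic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack four 16-bit lanes into one 64-bit value, lane 0 in the low bits.
inline uint64_t packV4I16(const std::array<uint16_t, 4> &Lanes) {
  uint64_t Imm = 0;
  for (unsigned I = 0; I < 4; ++I)
    Imm |= static_cast<uint64_t>(Lanes[I]) << (16 * I);
  return Imm;
}

// Pack two 32-bit lanes into one 64-bit value, lane 0 in the low bits.
inline uint64_t packV2I32(const std::array<uint32_t, 2> &Lanes) {
  return static_cast<uint64_t>(Lanes[0]) |
         (static_cast<uint64_t>(Lanes[1]) << 32);
}

// Simplified cost: one move per non-zero 16-bit chunk, and at least one
// move even for an all-zero immediate.
inline unsigned movCount(uint64_t Imm) {
  unsigned N = 0;
  for (unsigned I = 0; I < 4; ++I)
    if ((Imm >> (16 * I)) & 0xFFFF)
      ++N;
  return N == 0 ? 1 : N;
}

// Use the immediate only if it needs two or fewer moves; otherwise the
// caller would fall back to the constant pool.
inline bool useImmediate(uint64_t Imm) { return movCount(Imm) <= 2; }
```

Under this model, a vector such as `<1, 0, 2, 0>` (v4i16) packs to an immediate with two non-zero chunks and qualifies, while `<1, 2, 3, 4>` needs four moves and would stay in the constant pool.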
DeltaFile
+66-14llvm/test/CodeGen/AArch64/fcvt_combine.ll
+18-42llvm/test/CodeGen/AArch64/shuffle-tbl34.ll
+39-0llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+20-18llvm/test/CodeGen/AArch64/srem-vector-lkk.ll
+20-9llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll
+14-14llvm/test/CodeGen/AArch64/constant-pool-partition.ll
+177-9715 files not shown
+280-17121 files

LLVM/project 863813clldb/source/Plugins/ScriptInterpreter/Python CMakeLists.txt ScriptInterpreterPython.cpp, lldb/source/Plugins/ScriptInterpreter/Python/Interfaces CMakeLists.txt ScriptInterpreterPythonInterfaces.cpp

[lldb] Merge interfaces into lldbPluginScriptInterpreterPython (NFC) (#182962)

Make the interfaces part of lldbPluginScriptInterpreterPython instead of
putting them into their own static library. This avoids the need for an
extra static archive and more importantly a bunch of code duplication
between the two CMakeLists.txt.
DeltaFile
+0-46lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/CMakeLists.txt
+12-4lldb/source/Plugins/ScriptInterpreter/Python/CMakeLists.txt
+4-1lldb/source/Plugins/ScriptInterpreter/Python/ScriptInterpreterPython.cpp
+0-2lldb/source/Plugins/ScriptInterpreter/Python/Interfaces/ScriptInterpreterPythonInterfaces.cpp
+16-534 files

LLVM/project bf3ac05clang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/CodeGenHIP builtins-amdgcn-gfx1250-cvt-f16.hip

[Clang][AMDGPU] Change __fp16 to _Float16 in GFX1250 CVT builtin definitions (#182893)

Change the type signature of the `gfx1250 cvt` builtins from `__fp16` to
`_Float16` in the tablegen builtin definitions.
DeltaFile
+609-0clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-cvt-f16.hip
+24-24clang/include/clang/Basic/BuiltinsAMDGPU.td
+633-242 files

LLVM/project 08e0b56llvm/test/Transforms/ThinLTOBitcodeWriter split-internal2.ll

[NFC][ThinLTO] Check that refs between split modules have the same GUID
DeltaFile
+9-0llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
+9-01 files

LLVM/project a96daballvm/include/llvm/DWARFLinker/Classic DWARFLinkerCompileUnit.h

[DWARFLinker] Fix buildbot crash: NewUnit can be null during garbage (#182993)

The assert added in
[0ab1d23fbfa2ae0ba14315cb11678d2289510f66](https://github.com/llvm/llvm-project/commit/0ab1d23fbfa2ae0ba14315cb11678d2289510f66)
is incorrect: NewUnit is legitimately null for compile units that are
skipped during garbage collection (e.g. dwarf5-macro.test). Revert to
the original null check.
DeltaFile
+2-2llvm/include/llvm/DWARFLinker/Classic/DWARFLinkerCompileUnit.h
+2-21 files

LLVM/project 762ad00mlir/lib/Dialect/GPU/Transforms ModuleToBinary.cpp, mlir/test/Dialect/GPU module-to-binary-invalid-format.mlir

[mlir][gpu] Validate `gpu-module-to-binary` format (#182842)

`GpuModuleToBinaryPass::runOnOperation` now treats an unsupported
`format` value as a pass failure after emitting `"Invalid format
specified."`.

Add a regression test in
`mlir/test/Dialect/GPU/module-to-binary-invalid-format.mlir`.

Fix: https://github.com/llvm/llvm-project/issues/77052
Fix: https://github.com/llvm/llvm-project/issues/116344
Fix: https://github.com/llvm/llvm-project/issues/116346
Fix: https://github.com/llvm/llvm-project/issues/116352
DeltaFile
+5-0mlir/test/Dialect/GPU/module-to-binary-invalid-format.mlir
+3-1mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+8-12 files

LLVM/project 03bb370llvm/test/CodeGen/RISCV/rvv fixed-vectors-zvqdotq.ll fixed-vectors-zvdot4a8i.ll, llvm/test/Transforms/LoopVectorize/RISCV partial-reduce-dot-product.ll

[RISCV][llvm] Rename zvqdotq to zvdot4a8i (#179393)

The renaming PR is here:
https://github.com/riscv/riscv-isa-manual/pull/2576
Note that this also updates the version number.
DeltaFile
+0-1,684llvm/test/CodeGen/RISCV/rvv/fixed-vectors-zvqdotq.ll
+1,684-0llvm/test/CodeGen/RISCV/rvv/fixed-vectors-zvdot4a8i.ll
+0-1,035llvm/test/CodeGen/RISCV/rvv/zvqdotq-sdnode.ll
+1,035-0llvm/test/CodeGen/RISCV/rvv/zvdot4a8i-sdnode.ll
+287-287llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll
+0-337llvm/test/CodeGen/RISCV/rvv/vqdotu.ll
+3,006-3,34385 files not shown
+9,392-9,38991 files

LLVM/project d44e957clang/include/clang/AST ASTContext.h, clang/lib/AST ASTContext.cpp ExprConstant.cpp

[clang][AST] Make ASTContext::InterpContext mutable (#182884)

Do the `const_cast` only once, in `ASTContext::getInterpContext()`.
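
The general pattern can be sketched as follows. `Context` and `Interp` are hypothetical stand-ins, not the real `ASTContext` or interpreter classes: a lazily created member is declared `mutable` so that a `const` accessor can initialize it, and the single `const_cast` lives in that accessor instead of at every call site.

```cpp
#include <cassert>
#include <memory>

class Context;

// Hypothetical helper that needs a non-const reference to its owner.
struct Interp {
  explicit Interp(Context &C) : Ctx(C) {}
  Context &Ctx;
};

class Context {
  // `mutable` allows the const accessor below to create the object lazily.
  mutable std::unique_ptr<Interp> InterpCtx;

public:
  Interp &getInterp() const {
    if (!InterpCtx)
      // The only const_cast: callers holding a `const Context &` no
      // longer need to cast before asking for the interpreter.
      InterpCtx = std::make_unique<Interp>(const_cast<Context &>(*this));
    return *InterpCtx;
  }
};
```

Callers holding a `const Context &` can then call `getInterp()` directly, and repeated calls return the same object.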
DeltaFile
+2-2clang/include/clang/AST/ASTContext.h
+2-2clang/lib/AST/ASTContext.cpp
+1-1clang/lib/AST/ExprConstant.cpp
+5-53 files

LLVM/project 0ab1d23llvm/include/llvm/DWARFLinker/Classic DWARFLinkerCompileUnit.h, llvm/lib/DWARFLinker/Classic DWARFLinker.cpp

[DWARFLinker] Use DIEEntry for backward ref_addr references (#181881)

The classic DWARF linker avoids `DIEEntry` for `DW_FORM_ref_addr`
references, using raw `DIEInteger` values with manual offset computation
instead. A stale FIXME explains this was because "the implementation
calls back to DwarfDebug to find the unit offset", but this is no longer
true. `DIEEntry` resolves offsets via
`DIEUnit::getDebugSectionOffset()`, which has no `DwarfDebug`
dependency.


The real constraint is that forward references may point to
placeholder `DIEs` that never get adopted into a unit tree (due to ODR
pruning), so `DIEEntry` cannot resolve them (a test failed while
refactoring this). However, backward references are safe: the target DIE
is already cloned and parented in a unit tree.
DeltaFile
+198-0llvm/test/tools/llvm-dwarfutil/ELF/X86/odr-backward-ref-addr.test
+9-14llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
+12-2llvm/include/llvm/DWARFLinker/Classic/DWARFLinkerCompileUnit.h
+219-163 files

LLVM/project 3de9828mlir/cmake/modules AddMLIR.cmake

[MLIR][CMake] Disable PCH reuse for C API libraries (#182862)

C API libraries override the symbol visibility default, which is incompatible with PCH.
DeltaFile
+3-0mlir/cmake/modules/AddMLIR.cmake
+3-01 files

LLVM/project 09ab9a1mlir/lib/Conversion/MemRefToEmitC MemRefToEmitC.cpp

[MemRefToEmitC] fix typo (#182991)

DeltaFile
+1-1mlir/lib/Conversion/MemRefToEmitC/MemRefToEmitC.cpp
+1-11 files

LLVM/project d5bf514llvm/lib/Target/NVPTX NVPTXISelLowering.cpp, llvm/test/CodeGen/NVPTX scalarize-non-coalescable-v2f32.ll

[NVPTX] Scalarize v2f32 instructions if input operand guarantees need for register coalescing (#180113)

The support of f32 packed instructions in #126337 revealed performance
regressions on certain kernels. In one case, the cause comes from
loading a v4f32 from shared memory but then accessing them as {r0, r2}
and {r1, r3} from the full load of {r0, r1, r2, r3}.

This access pattern guarantees the registers require a coalescing
operation, which increases register pressure and degrades performance.
The fix here is to identify whether we can prove that a v2f32 operand
comes from non-contiguous vector extracts and, if so, scalarize the
operation so the coalescing operation is no longer needed.
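
A toy model of the decision, assuming (this is not stated in the message) that a pair counts as contiguous when it covers lanes `{2k, 2k+1}` of the wider load. `ExtractPair`, `isContiguousPair`, and `shouldScalarize` are hypothetical names; the real lowering works on SelectionDAG nodes rather than bare indices.

```cpp
#include <cassert>

// Lane indices of the two extracts that feed one v2f32 operand.
struct ExtractPair {
  unsigned Lo;
  unsigned Hi;
};

// Assumed rule: {0,1} and {2,3} can live in one 64-bit register pair
// without extra moves; {0,2} or {1,3} cannot.
inline bool isContiguousPair(ExtractPair P) {
  return P.Lo % 2 == 0 && P.Hi == P.Lo + 1;
}

// Non-contiguous inputs would force a coalescing operation, so the
// packed operation is split into scalar ones instead.
inline bool shouldScalarize(ExtractPair P) { return !isContiguousPair(P); }
```

With the access pattern from the message, `{r0, r2}` and `{r1, r3}` are both flagged for scalarization, while `{r0, r1}` and `{r2, r3}` stay packed.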

I've found that ptxas can see through the extra unpacks/repacks of
contiguous registers this causes in MIR. However in the full test case
the packing of the final scalar->vector results does generate additional
costs especially since the only users unpack them. An additional MIR
pass is possible to catch the case


    [4 lines not shown]
DeltaFile
+356-0llvm/test/CodeGen/NVPTX/scalarize-non-coalescable-v2f32.ll
+122-2llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+478-22 files

LLVM/project 9b4c99allvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp

[AsmPrinter] Use default capture for assertion only lambda (#182986)

Otherwise we get an unused variable warning/error in non-assertion
builds.
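
A minimal illustration of the situation, not the code from the patch: `sumChecked` and `InRange` are hypothetical. The lambda reads `Limit` only inside `assert`, so when `NDEBUG` is defined the body no longer mentions it. An explicit capture list such as `[&Limit]` would then capture something unused (Clang reports this under `-Wunused-lambda-capture`), whereas a default capture only captures what the body actually uses.

```cpp
#include <cassert>
#include <vector>

inline int sumChecked(const std::vector<int> &Values, int Limit) {
  (void)Limit; // Limit is otherwise unused when NDEBUG is defined.
  // Default capture: nothing is captured once the assert compiles away.
  auto InRange = [&](int V) {
    assert(V <= Limit && "value exceeds limit");
    return V;
  };
  int Sum = 0;
  for (int V : Values)
    Sum += InRange(V);
  return Sum;
}
```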
DeltaFile
+1-1llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+1-11 files

LLVM/project 1b47242llvm/lib/Transforms/Instrumentation ControlHeightReduction.cpp, llvm/test/Transforms/PGOProfile chr-convergent.ll

[CHR] Skip regions containing convergent calls (#180882)

CHR (Control Height Reduction) merges multiple biased branches into a
single speculative check, cloning the region into hot/cold paths. On
GPU targets, the merged branch may be divergent (evaluated per-thread),
splitting the wavefront: some threads take the hot path, others the
cold path.

A convergent call like ds_bpermute (a cross-lane operation on AMDGPU)
requires a specific set of threads to be active — when thread X reads
from thread Y, thread Y must be active and participating in the same
call. After CHR cloning, thread Y may have gone to the cold path while
thread X is on the hot path, so the hot-path ds_bpermute reads a stale
register value from thread Y instead of the intended value.

This caused a miscompilation in rocPRIM's lookback scan: CHR duplicated
a region containing ds_bpermute, and the hot-path copy executed with a
different set of active threads, reading incorrect cross-lane data and
causing a memory access fault.
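
The resulting rule can be sketched on a toy representation. `Inst`, `Region`, and both functions are hypothetical; the actual pass inspects LLVM IR call instructions rather than a flag on a struct.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: a name plus whether it is a convergent call.
struct Inst {
  std::string Name;
  bool IsConvergent;
};
using Region = std::vector<Inst>;

inline bool regionHasConvergentCall(const Region &R) {
  return std::any_of(R.begin(), R.end(),
                     [](const Inst &I) { return I.IsConvergent; });
}

// A region is only considered for cloning into hot/cold paths if no
// instruction in it depends on the set of active threads.
inline bool isCandidateForCHR(const Region &R) {
  return !regionHasConvergentCall(R);
}
```

A region containing a `ds_bpermute`-style call is rejected up front, so it is never duplicated across a potentially divergent branch.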

    [2 lines not shown]
DeltaFile
+137-0llvm/test/Transforms/PGOProfile/chr-convergent.ll
+20-1llvm/lib/Transforms/Instrumentation/ControlHeightReduction.cpp
+157-12 files

LLVM/project 3b2c1dbmlir/python/mlir/dialects ext.py, mlir/test/python/dialects ext.py

[MLIR][Python] Support type definitions in Python-defined dialects (#182805)

In this PR, we added basic support of type definitions in Python-defined
dialects, including:
- IRDL codegen for type definitions
- Type builders like `MyType.get(..)` and type parameter accessors (e.g.
`my_type.param1`)
- Use Python-defined types in Python-defined operations

```python
class TestType(Dialect, name="ext_type"):
    pass

class Array(TestType.Type, name="array"):
    elem_type: IntegerType[32] | IntegerType[64]
    length: IntegerAttr

class MakeArrayOp(TestType.Operation, name="make_array"):
    arr: Result[Array]

    [3 lines not shown]
DeltaFile
+127-7mlir/python/mlir/dialects/ext.py
+73-0mlir/test/python/dialects/ext.py
+200-72 files

LLVM/project ccfd59allvm/test/CodeGen/WebAssembly load-ext.ll

[NFC][WebAssembly] Expanding load-ext testcases for the MVP CPU target (#182864)

Some features tested in load-ext require sign-ext. 
To test this, add tests targeting the MVP CPU.
DeltaFile
+576-72llvm/test/CodeGen/WebAssembly/load-ext.ll
+576-721 files

LLVM/project 13838eflld/ELF/Arch X86_64.cpp, lld/test/ELF ztext.s x86-x32-abs.s

[ELF] Adjust allowed dynamic relocation types for x86-64 (#182905)

First, disallow R_X86_64_PC64 - generally only absolute relocations are
allowed in getDynRel. glibc and musl don't support R_X86_64_PC64 as
dynamic relocations.

Second, support R_X86_64_32 as dynamic relocation for the ILP32 ABI
(x32). GNU ld's behavior looks like:

- R_X86_64_32 => R_X86_64_RELATIVE
- R_X86_64_64 with addend 0 => R_X86_64_RELATIVE
- R_X86_64_64 with non-zero addend => R_X86_64_RELATIVE64 (unsupported
  by musl; compilers do not generate such constructs to the best of my
  knowledge)

For now we require R_X86_64_64 to be resolved at link-time for x32.
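
The policy described above, reduced to a toy function. `dynRelFor` is a hypothetical name (the message refers to `getDynRel`), and only the relocation types mentioned in the message are modelled; the real function may accept others. The constants are the relocation numbers from the x86-64 psABI.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t R_X86_64_NONE = 0;
constexpr uint32_t R_X86_64_64 = 1;
constexpr uint32_t R_X86_64_32 = 10;
constexpr uint32_t R_X86_64_PC64 = 24;

// Returns Type if it may be emitted as a dynamic relocation, otherwise
// R_X86_64_NONE (meaning it must be resolved at link time or rejected).
inline uint32_t dynRelFor(uint32_t Type, bool IsX32) {
  if (IsX32) // ILP32 ABI: only the 32-bit absolute relocation qualifies.
    return Type == R_X86_64_32 ? Type : R_X86_64_NONE;
  // LP64: the 64-bit absolute relocation qualifies; PC-relative does not.
  return Type == R_X86_64_64 ? Type : R_X86_64_NONE;
}
```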

Fix #140465
DeltaFile
+25-13lld/test/ELF/ztext.s
+34-0lld/test/ELF/x86-x32-abs.s
+0-10lld/test/ELF/Inputs/ztext.s
+2-3lld/ELF/Arch/X86_64.cpp
+61-264 files

LLVM/project 05a0394libc/shared/math bf16mulf128.h, libc/src/__support/math CMakeLists.txt bf16mulf128.h

[libc][math] Refactor bf16mul family to header-only (#182018)

Refactors the bf16mul math family to be header-only.

Closes https://github.com/llvm/llvm-project/issues/182017

Target Functions:
  - bf16mul
  - bf16mulf
  - bf16mulf128
  - bf16mull
DeltaFile
+69-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+41-0libc/src/__support/math/CMakeLists.txt
+32-0libc/src/__support/math/bf16mulf128.h
+28-0libc/shared/math/bf16mulf128.h
+26-0libc/src/__support/math/bf16mul.h
+26-0libc/src/__support/math/bf16mulf.h
+222-012 files not shown
+339-4018 files

LLVM/project b311c02mlir/lib/Dialect/Affine/Utils LoopFusionUtils.cpp, mlir/test/Dialect/Affine loop-fusion-4.mlir

[MLIR][Affine] Fix assert in slice compute cost (#182712)

Fixes https://github.com/llvm/llvm-project/issues/180029.
DeltaFile
+54-0mlir/test/Dialect/Affine/loop-fusion-4.mlir
+5-2mlir/lib/Dialect/Affine/Utils/LoopFusionUtils.cpp
+59-22 files

LLVM/project 79b2ed3llvm/test/Transforms/ThinLTOBitcodeWriter split-internal2.ll

[NFC][ThinLTO] Check that refs between split modules have the same GUID
DeltaFile
+9-0llvm/test/Transforms/ThinLTOBitcodeWriter/split-internal2.ll
+9-01 files

LLVM/project a17a305llvm/lib/Analysis InstCount.cpp, llvm/test/Analysis/InstCount instcount.ll

[LLVM] Metric added - largest number of basic blocks in a single func… (#182970)

This metric records the largest number of basic blocks found in any
single function.
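
As a sketch of what the statistic computes, assuming the per-function block counts are already available (`largestFunctionBBCount` is a hypothetical name; the real pass updates an LLVM statistic while visiting functions):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the maximum basic-block count over all functions, or 0 if
// there are no functions.
inline std::size_t
largestFunctionBBCount(const std::vector<std::size_t> &BlocksPerFunction) {
  std::size_t Max = 0;
  for (std::size_t N : BlocksPerFunction)
    Max = std::max(Max, N);
  return Max;
}
```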
DeltaFile
+3-0llvm/lib/Analysis/InstCount.cpp
+1-0llvm/test/Analysis/InstCount/instcount.ll
+4-02 files

LLVM/project 6b63c59llvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp, llvm/lib/Target/X86 X86AsmPrinter.cpp X86AsmPrinter.h

[NewPM][X86] Port AsmPrinter to NewPM

This patch makes AsmPrinter work with the NewPM. We essentially create
three new passes that wrap different parts of AsmPrinter so that we can
separate out doInitialization/doFinalization without needing to
materialize all MachineFunctions at the same time. This has three main
drawbacks for now:

1. We do not transfer any state between the three new AsmPrinter passes.
   This means that debuginfo/CFI currently does not work. This will be
   fixed in future passes by moving this state to MachineModuleInfo.
2. We probably incur some overhead by needing to set up analysis
   callbacks for every MF rather than just per module. This should not
   be large, and can be optimized in the future on top of this if
   needed.
3. This solution is not really clean. However, a lot of cleanup is going
   to be difficult to do while supporting two pass managers. Once we
   remove LegacyPM support, we can make the code much cleaner and better
   enforce invariants like a lack of state between

    [5 lines not shown]
DeltaFile
+65-0llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+46-0llvm/lib/Target/X86/X86AsmPrinter.cpp
+38-0llvm/lib/Target/X86/X86AsmPrinter.h
+17-6llvm/lib/Target/X86/X86CodeGenPassBuilder.cpp
+16-4llvm/test/CodeGen/X86/llc-pipeline-npm.ll
+9-0llvm/test/CodeGen/X86/npm-asmprint.ll
+191-101 files not shown
+198-107 files

LLVM/project 25f69d7llvm/lib/Target/X86 X86AsmPrinter.cpp X86AsmPrinter.h

[NFCi][NewPM][x86] Use callbacks to get analyses in AsmPrinter

This allows for overriding these callbacks when using the NewPM which
has different methods for obtaining analysis results.

Reviewers: RKSimon, arsenm, phoebewang, mingmingl-llvm, aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/182796
DeltaFile
+15-5llvm/lib/Target/X86/X86AsmPrinter.cpp
+3-0llvm/lib/Target/X86/X86AsmPrinter.h
+18-52 files

LLVM/project abc443bllvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/lib/Target/AMDGPU R600TargetMachine.cpp AMDGPUTargetMachine.cpp

[CodeGen][NewPM] Adjust pipeline for AsmPrinter

AsmPrinter needs to be split into three passes (begin, per MF, end) to
avoid the need to materialize all machine functions at the same time.
Update the CodeGenPassBuilder hooks for this.
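
The begin / per-function / end shape can be illustrated with a toy printer. Everything here is hypothetical (`ToyPrinter`, `printModule`, and the output format), and unlike the real passes this sketch keeps its state in one object for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyPrinter {
  std::string Out;
  void begin() { Out += "<module>\n"; }           // module prologue
  void emitFunction(const std::string &Name) {    // one function at a time
    Out += "  func " + Name + "\n";
  }
  void end() { Out += "</module>\n"; }            // module epilogue
};

inline std::string printModule(const std::vector<std::string> &Functions) {
  ToyPrinter P;
  P.begin();
  // Each function is emitted and can then be discarded; the full set of
  // functions never has to be held at once.
  for (const std::string &F : Functions)
    P.emitFunction(F);
  P.end();
  return P.Out;
}
```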

Reviewers: aeubanks, paperchalice, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/182795
DeltaFile
+26-10llvm/include/llvm/Passes/CodeGenPassBuilder.h
+18-3llvm/lib/Target/AMDGPU/R600TargetMachine.cpp
+17-2llvm/lib/Target/X86/X86CodeGenPassBuilder.cpp
+14-2llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+75-174 files