LLVM/project 8624316 llvm/test/DebugInfo/RISCV relax_dwo_ranges.ll

fix
Delta  File
+3 -6  llvm/test/DebugInfo/RISCV/relax_dwo_ranges.ll
+3 -6  1 files

LLVM/project 86303fb llvm/test/DebugInfo/RISCV relax_dwo_ranges.ll

fix
Delta  File
+1 -1  llvm/test/DebugInfo/RISCV/relax_dwo_ranges.ll
+1 -1  1 files

LLVM/project 03ddd5c llvm/lib/CodeGen/AsmPrinter DwarfCompileUnit.cpp, llvm/test/DebugInfo/RISCV relax_dwo_ranges.ll

[dwarf] make dwarf fission compatible with RISCV relaxations 2/2

This patch makes DWARF fission compatible with RISC-V relaxations by
using indirect addressing for the DW_AT_high_pc attribute. This
eliminates the remaining relocations in .dwo files.
Delta    File
+30 -14  llvm/test/DebugInfo/RISCV/relax_dwo_ranges.ll
+5 -3    llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
+35 -17  2 files

LLVM/project 453d7b6 llvm/lib/CodeGen/AsmPrinter DwarfDebug.cpp, llvm/lib/MC MCSymbol.cpp

add comments
Delta   File
+6 -0   llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+4 -0   llvm/lib/MC/MCSymbol.cpp
+10 -0  2 files

LLVM/project 9436a86 llvm/lib/MC MCSymbol.cpp

fix
Delta  File
+4 -9  llvm/lib/MC/MCSymbol.cpp
+4 -9  1 files

LLVM/project f01e48b llvm/include/llvm/MC MCSymbol.h, llvm/lib/CodeGen/AsmPrinter DwarfDebug.cpp

[dwarf] make dwarf fission compatible with RISCV relaxations 1/2

Currently, -gsplit-dwarf and -mrelax are incompatible options in
Clang. The issue is that .dwo files should not contain any
relocations, as they are not processed by the linker. However,
relaxable code emits relocations in DWARF for debug ranges that
reside in the .dwo file when DWARF fission is enabled.

This patch makes DWARF fission compatible with RISC-V relaxations.
It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which
allow referencing addresses from .debug_addr instead of using
absolute addresses. This approach eliminates relocations from .dwo
files.
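
The indexed-address idea can be illustrated with a small toy encoder. This is only a sketch, not the code from the patch: `AddrTable` and `encode_rnglist` are hypothetical names, and the table merely stands in for .debug_addr. The opcode values follow the DWARF 5 range-list entry encodings.

```python
# Toy model of a DWARF 5 range list that refers to addresses by index.
# Only the address table would need relocations; the range list itself
# contains plain ULEB128 indices. Names here are illustrative assumptions.

DW_RLE_end_of_list = 0x00
DW_RLE_startx_endx = 0x02


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


class AddrTable:
    """Hypothetical stand-in for .debug_addr: maps symbols to indices."""

    def __init__(self) -> None:
        self._index: dict[str, int] = {}

    def index_of(self, symbol: str) -> int:
        # Reuse the existing slot, or allocate the next free index.
        return self._index.setdefault(symbol, len(self._index))

    def symbols(self) -> list[str]:
        return sorted(self._index, key=self._index.get)


def encode_rnglist(ranges, table: AddrTable) -> bytes:
    """Emit one DW_RLE_startx_endx entry per (start, end) symbol pair."""
    out = bytearray()
    for start_sym, end_sym in ranges:
        out.append(DW_RLE_startx_endx)
        out += uleb128(table.index_of(start_sym))
        out += uleb128(table.index_of(end_sym))
    out.append(DW_RLE_end_of_list)
    return bytes(out)
```

In this model the encoded list holds no addresses at all, so nothing in it changes when relaxation moves the symbols; only the entries behind `AddrTable` would.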
Delta     File
+187 -0   llvm/test/DebugInfo/RISCV/relax_dwo_ranges.ll
+33 -25   llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+18 -3    llvm/lib/MC/MCSymbol.cpp
+2 -0     llvm/include/llvm/MC/MCSymbol.h
+240 -28  4 files

LLVM/project cf837e2 lldb/include/lldb/Utility NonNullSharedPtr.h

[lldb] Add assert to NonNullSharedPtr move constructor (#168979)

As suggested by Augusto, add an assert to the NonNullSharedPtr move
constructor.
Delta  File
+6 -4  lldb/include/lldb/Utility/NonNullSharedPtr.h
+6 -4  1 files

LLVM/project fb63260 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll

including v2i128
Delta     File
+58 -870  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+13 -4    llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+71 -874  2 files

LLVM/project 956f882 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

deal with lsx i128
Delta      File
+96 -356   llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+34 -22    llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+130 -378  2 files

LLVM/project 264d57a llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll

[LoongArch] Perform DAG combine for MUL to generate `[x]vmulw{ev/od}`
Delta        File
+124 -2,144  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+32 -154     llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+112 -0      llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+50 -0       llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+41 -0       llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+359 -2,298  5 files

LLVM/project decdcc9 llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

using poison
Delta      File
+64 -64    llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+48 -48    llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+112 -112  2 files

LLVM/project cbf5cc5 llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

[LoongArch][NFC] Pre-commit tests for `[x]vmulw{ev/od}` instructions
Delta      File
+3,475 -0  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+1,145 -0  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+4,620 -0  2 files

LLVM/project 2ab9492 llvm/tools/llc NewPMDriver.cpp

[llc][NPM] Use buffer_ostream support for non-seekable streams (#168842)

NPM was missing buffering for non-seekable output streams (stdout,
pipes), causing assertion failures when generating object files with `-o
-`.

Use buffer_ostream to provide seekable buffering, matching legacy PM
behavior.

Co-authored-by: vikhegde <vikram.hegde at amd.com>
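
The general pattern described above (buffer the output when the destination cannot seek, then flush it in one go) can be sketched as follows. This is an analogy in Python rather than the LLVM code: `seekable_output` and `write_object` are hypothetical helpers, and the header back-patching only stands in for whatever an object writer needs to seek for.

```python
import io
from contextlib import contextmanager


@contextmanager
def seekable_output(stream):
    """Yield a seekable binary stream that ends up written to `stream`.

    Seekable destinations are used directly. For pipes and similar
    streams, writes are collected in memory and copied out once the
    caller has finished without raising.
    """
    if stream.seekable():
        yield stream
        return
    buffer = io.BytesIO()
    yield buffer
    stream.write(buffer.getvalue())


def write_object(out):
    """Illustrative writer that back-patches a length header via seek()."""
    payload = b"payload"
    out.write(b"\x00\x00\x00\x00")  # placeholder for the length field
    out.write(payload)
    out.seek(0)                     # would fail on a raw pipe
    out.write(len(payload).to_bytes(4, "little"))
```

A caller would wrap the destination once, for example `with seekable_output(sys.stdout.buffer) as out: write_object(out)`, so the writer never has to know whether the real output can seek.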
Delta  File
+7 -0  llvm/tools/llc/NewPMDriver.cpp
+7 -0  1 files

LLVM/project 3d3307e clang/lib/CodeGen CGAtomic.cpp CGBuiltin.cpp, clang/lib/Frontend DependencyGraph.cpp HeaderIncludeGen.cpp

[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)

d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.

Doing this will help prevent future layering issues. See #162865.

Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
Delta    File
+2 -2    clang/lib/Frontend/DependencyGraph.cpp
+2 -2    clang/lib/Frontend/HeaderIncludeGen.cpp
+1 -1    clang/lib/CodeGen/CGAtomic.cpp
+1 -1    clang/lib/CodeGen/CGBuiltin.cpp
+1 -1    clang/lib/CodeGen/CGHLSLRuntime.cpp
+1 -1    clang/lib/CodeGen/CodeGenFunction.cpp
+8 -8    24 files not shown
+32 -31  30 files

LLVM/project db6a034 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchSelectionDAGInfo.cpp

[LoongArch] Fix for `VLDREPL` node validation
Delta   File
+7 -5   llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+0 -10  llvm/lib/Target/LoongArch/LoongArchSelectionDAGInfo.cpp
+0 -3   llvm/lib/Target/LoongArch/LoongArchSelectionDAGInfo.h
+7 -18  3 files

LLVM/project ddce26b llvm/test/CodeGen/AMDGPU load-constant-i1.ll lds-misaligned-bug.ll

regression
Delta      File
+235 -223  llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+49 -45    llvm/test/CodeGen/AMDGPU/lds-misaligned-bug.ll
+25 -16    llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
+309 -284  3 files

LLVM/project 75ac548 llvm/test/CodeGen/AMDGPU limit-coalesce.mir no-limit-coalesce.mir

Rename test
Delta    File
+0 -75   llvm/test/CodeGen/AMDGPU/limit-coalesce.mir
+71 -0   llvm/test/CodeGen/AMDGPU/no-limit-coalesce.mir
+71 -75  2 files

LLVM/project 1df694b llvm/test/CodeGen/AMDGPU shufflevector.v4p0.v4p0.ll shufflevector.v4i64.v4i64.ll

AMDGPU: Stop implementing shouldCoalesce

Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
Delta            File
+5,975 -8,879    llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,975 -8,879    llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,880 -6,644    llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+3,880 -6,644    llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+2,266 -3,675    llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll
+2,266 -3,675    llvm/test/CodeGen/AMDGPU/shufflevector.v3p0.v4p0.ll
+24,242 -38,396  56 files not shown
+56,024 -78,260  62 files

LLVM/project bf4dc96 mlir/include/mlir/Dialect/Linalg/IR LinalgStructuredOps.td, mlir/lib/Dialect/Linalg/IR LinalgOps.cpp

[mlir][linalg] Clean up op verifiers without custom checks(NFC) (#168712)

This PR removes op verifiers that do not implement any custom
verification logic.
Delta   File
+0 -9   mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+0 -2   mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
+0 -11  2 files

LLVM/project 1d73b68 llvm/lib/CodeGen TargetLoweringBase.cpp

TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)

Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
Delta    File
+12 -10  llvm/lib/CodeGen/TargetLoweringBase.cpp
+12 -10  1 files

LLVM/project c34f76d llvm/tools/dsymutil MachOUtils.cpp

[dsymutil] Add missing validation for zero alignment section (#168925)

Delta  File
+4 -4  llvm/tools/dsymutil/MachOUtils.cpp
+4 -4  1 files

LLVM/project cff4602 clang/lib/AST ASTContext.cpp, clang/test/SemaOpenCL builtins-extended-image-param-gfx1100-err.cl builtins-extended-image-param-gfx942-err.cl

[AMDGPU] Treating HIP/C++ _Float16 same as OpenCL's half
Delta   File
+15 -0  clang/lib/AST/ASTContext.cpp
+1 -1   clang/test/SemaOpenCL/builtins-extended-image-param-gfx1100-err.cl
+1 -1   clang/test/SemaOpenCL/builtins-extended-image-param-gfx942-err.cl
+17 -2  3 files

LLVM/project 423bdb2 clang/docs OpenCLSupport.rst, clang/include/clang/Basic OpenCLExtensions.def

[OpenCL] Add missing OpenCL 3.0 features to OpenCLExtensions.def; revert header-only macros (#168016)

Adds the remaining optional feature macros from the OpenCL C 3.0 spec
(section 6.2.1 table). Targets can now enable these via
OpenCLFeaturesMap returned by getSupportedOpenCLOpts().

Revert a84599f177a6 (header‑only feature macros).
Header‑only macros are difficult to disable on SPIR-V targets,
and the prior undef approach (a60b8f468119) does not scale.
After this PR, they can be disabled via `-cl-ext=-<feature>`.

https://github.com/KhronosGroup/OpenCL-Docs/issues/1328 also notes that
unconditional definition of the header‑only macros in opencl-c-base.h
should be removed.
Delta      File
+244 -0    clang/test/SemaOpenCL/extension-version.cl
+66 -34    clang/test/SemaOpenCL/features.cl
+0 -99     clang/lib/Headers/opencl-c-base.h
+46 -9     clang/include/clang/Basic/OpenCLExtensions.def
+5 -11     clang/docs/OpenCLSupport.rst
+3 -3      clang/test/Headers/opencl-c-header.cl
+364 -156  2 files not shown
+371 -156  8 files

LLVM/project 8439aeb clang/include/clang/Frontend CompilerInvocation.h, clang/lib/Frontend CompilerInvocation.cpp

[Clang] Refactor getOptimizationLevel and getOptimizationLevelSize to non-static. NFC. (#168839)

This lets us reuse these functions in a few places, such as
clang/lib/Driver/ToolChains/CommonArgs.cpp, where part of the code is
currently copied from getOptimizationLevel.
Delta    File
+56 -57  clang/lib/Frontend/CompilerInvocation.cpp
+5 -0    clang/include/clang/Frontend/CompilerInvocation.h
+61 -57  2 files

LLVM/project 8fdfe29 llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp AMDGPUInstrInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel extractelement.i128.ll implicit-kernarg-backend-usage-global-isel.ll

AMDGPU: Fix treating unknown mem operands as uniform

The test changes are mostly GlobalISel specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with new
regbankselect.

There is an additional regression for addrspacecast for cov4. We
probably ought to be using a separate PseudoSourceValue for the
access of the queue pointer.
Delta     File
+222 -52  llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
+43 -27   llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
+8 -10    llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+3 -5     llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
+1 -1     llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+277 -95  5 files

LLVM/project b43293a llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Improve the insert/extract fold in the narrowing case

Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
   allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
   compatible, which allows foldLengthChangingShuffles to successfully
   recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.

commit-id:c151bb04
Delta    File
+6 -16   llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+2 -15   llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+8 -4    llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
+4 -4    llvm/test/Transforms/VectorCombine/X86/extract-insert.ll
+2 -2    llvm/test/Transforms/VectorCombine/X86/pr126085.ll
+22 -41  5 files

LLVM/project aaee5f6 llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Fold chains of shuffles fed by length-changing shuffles

Such chains can arise from folding insert/extract chains.

commit-id:a960175d
Delta     File
+168 -0   llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+8 -33    llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+176 -33  2 files

LLVM/project aa6362b llvm/lib/Target/AMDGPU AMDGPUTargetTransformInfo.cpp, llvm/test/Analysis/CostModel/AMDGPU shufflevector.ll

AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles

These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.

commit-id:8b76e888
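
The "permutes per result register" view can be illustrated with a deliberately simplified toy model. It is not the cost function from the patch. It assumes, purely for illustration, that one permute can combine bytes from two 32-bit source registers, that a result register copied unchanged from one source register is free, and that a register drawing on n source registers needs n - 1 permutes; `perm_cost` is a hypothetical name.

```python
def perm_cost(mask, elt_bits=8):
    """Count permutes needed for a shuffle mask under the toy model.

    `mask` lists source element indices (negative means undefined).
    Elements are grouped into 32-bit registers of 32 // elt_bits lanes.
    """
    if elt_bits not in (8, 16):
        raise ValueError("toy model only covers 8- and 16-bit elements")
    per_reg = 32 // elt_bits
    total = 0
    for base in range(0, len(mask), per_reg):
        lanes = mask[base:base + per_reg]
        defined = [(pos, idx) for pos, idx in enumerate(lanes) if idx >= 0]
        if not defined:
            continue  # fully undefined register: nothing to assemble
        sources = {idx // per_reg for _, idx in defined}
        in_place = all(idx % per_reg == pos for pos, idx in defined)
        if len(sources) == 1 and in_place:
            continue  # plain copy of one source register
        # Assumption: each permute merges one more source register.
        total += max(1, len(sources) - 1)
    return total
```

Under these assumptions an identity mask costs nothing, a byte reversal within one register costs one permute, and a register that gathers one byte from each of four source registers costs three.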
Delta      File
+498 -488  llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll
+111 -34   llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+107 -20   llvm/test/Transforms/SLPVectorizer/AMDGPU/reduction.ll
+33 -64    llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+17 -34    llvm/test/Transforms/SLPVectorizer/AMDGPU/slp-v2f16.ll
+1 -31     llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-chain-to-shuffles.ll
+767 -671  6 files

LLVM/project c3fdba0 clang/include/clang/Basic Builtins.def, clang/lib/AST ASTContext.cpp

[AMDGPU] Removal of language sensitive option for _Float16 and half( 'e') handling (#168037)

Remove the 'e' handling for the amdgcn builtins, since we decided to use
_Float16 for both HIP/C++ and OpenCL.
Delta  File
+2 -6  clang/lib/AST/ASTContext.cpp
+0 -1  clang/include/clang/Basic/Builtins.def
+2 -7  2 files

LLVM/project d406c2c llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslator-amdgpu_kernel.ll regbankselect-widen-scalar-loads.mir

AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads

This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.
Delta      File
+216 -216  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
+76 -76    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
+73 -73    llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-load.mir
+22 -9     llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20 -7     llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+8 -8      llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-split-scalar-load-metadata.mir
+415 -389  4 files not shown
+433 -391  10 files