LLVM/project fb63260 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll

including v2i128
Delta  File
+58 -870  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+13 -4  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+71 -874  2 files

LLVM/project 956f882 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

deal with lsx i128
Delta  File
+96 -356  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+34 -22  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+130 -378  2 files

LLVM/project 264d57a llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll

[LoongArch] Perform DAG combine for MUL to generate `[x]vmulw{ev/od}`
Delta  File
+124 -2,144  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+32 -154  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+112 -0  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+50 -0  llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+41 -0  llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+359 -2,298  5 files
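
A rough model of what the `[x]vmulw{ev/od}` family computes may help when reading the 264d57a combine and its tests. The sketch below is plain Python, not LLVM code. It assumes the usual LSX/LASX reading of these instructions: take the even-indexed (`ev`) or odd-indexed (`od`) lanes of both inputs, sign- or zero-extend them to twice the width, and multiply. The helper names `sign_extend` and `mulw` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulw(a, b, bits, odd=False, signed=True):
    """Widening multiply of the even (or odd) lanes of a and b.

    a and b are lists of `bits`-wide lane values. The result has half as
    many lanes, each 2*bits wide, returned as unsigned bit patterns.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    narrow_mask = (1 << bits) - 1
    wide_mask = (1 << (2 * bits)) - 1

    def extend(v):
        # Sign-extend for the signed forms, zero-extend otherwise.
        return sign_extend(v, bits) if signed else v & narrow_mask

    start = 1 if odd else 0
    return [(extend(a[i]) * extend(b[i])) & wide_mask
            for i in range(start, len(a), 2)]


a = [0xFF, 2, 3, 4]
b = [2, 3, 0x80, 5]
print([hex(x) for x in mulw(a, b, 8)])            # even lanes, signed
print([hex(x) for x in mulw(a, b, 8, odd=True)])  # odd lanes, signed
```

With `bits=64` the same model produces 128-bit products, which is the case the i128 and v2i128 follow-up commits appear to address.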

LLVM/project decdcc9 llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

using poison
Delta  File
+64 -64  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+48 -48  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+112 -112  2 files

LLVM/project cbf5cc5 llvm/test/CodeGen/LoongArch/lasx/ir-instruction mulwev_od.ll, llvm/test/CodeGen/LoongArch/lsx/ir-instruction mulwev_od.ll

[LoongArch][NFC] Pre-commit tests for `[x]vmulw{ev/od}` instructions
Delta  File
+3,475 -0  llvm/test/CodeGen/LoongArch/lasx/ir-instruction/mulwev_od.ll
+1,145 -0  llvm/test/CodeGen/LoongArch/lsx/ir-instruction/mulwev_od.ll
+4,620 -0  2 files

LLVM/project 2ab9492 llvm/tools/llc NewPMDriver.cpp

[llc][NPM] Use buffer_ostream support for non-seekable streams (#168842)

NPM was missing buffering for non-seekable output streams (stdout,
pipes), causing assertion failures when generating object files with `-o
-`.

Use buffer_ostream to provide seekable buffering, matching legacy PM
behavior.

Co-authored-by: vikhegde <vikram.hegde at amd.com>
Delta  File
+7 -0  llvm/tools/llc/NewPMDriver.cpp
+7 -0  1 file
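
The idea behind the 2ab9492 fix can be illustrated outside LLVM. The sketch below is Python, not the `buffer_ostream` implementation. It assumes an emitter that must seek back and patch a header, which a pipe or stdout cannot support, so the output is staged in a seekable in-memory buffer and copied out once complete. `write_object`, `emit`, and `PipeLike` are hypothetical names.

```python
import io


def write_object(stream):
    """Hypothetical emitter that needs a seekable stream.

    It writes a placeholder size field, then the body, then seeks back
    to patch the size field with the real body length.
    """
    stream.write(b"\x00\x00\x00\x00")
    body = b"payload"
    stream.write(body)
    stream.seek(0)
    stream.write(len(body).to_bytes(4, "little"))
    stream.seek(0, io.SEEK_END)


def emit(out):
    """Write directly if `out` can seek, otherwise stage in memory first."""
    if out.seekable():
        write_object(out)
        return
    staged = io.BytesIO()          # seekable stand-in for the real sink
    write_object(staged)
    out.write(staged.getvalue())   # one forward-only copy at the end


class PipeLike(io.RawIOBase):
    """Write-only, non-seekable sink standing in for stdout or a pipe."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


pipe = PipeLike()
emit(pipe)
print(b"".join(pipe.chunks))
```

The cost of this approach is that the whole output is held in memory until emission finishes, which is the usual trade-off for supporting `-o -`.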

LLVM/project 3d3307e clang/lib/CodeGen CGAtomic.cpp CGBuiltin.cpp, clang/lib/Frontend DependencyGraph.cpp HeaderIncludeGen.cpp

[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)

d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.

Doing this will help prevent future layering issues. See #162865.

Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
Delta  File
+2 -2  clang/lib/Frontend/DependencyGraph.cpp
+2 -2  clang/lib/Frontend/HeaderIncludeGen.cpp
+1 -1  clang/lib/CodeGen/CGAtomic.cpp
+1 -1  clang/lib/CodeGen/CGBuiltin.cpp
+1 -1  clang/lib/CodeGen/CGHLSLRuntime.cpp
+1 -1  clang/lib/CodeGen/CodeGenFunction.cpp
+8 -8  24 files not shown
+32 -31  30 files

LLVM/project db6a034 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchSelectionDAGInfo.cpp

[LoongArch] Fix for `VLDREPL` node validation
Delta  File
+7 -5  llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+0 -10  llvm/lib/Target/LoongArch/LoongArchSelectionDAGInfo.cpp
+0 -3  llvm/lib/Target/LoongArch/LoongArchSelectionDAGInfo.h
+7 -18  3 files

LLVM/project ddce26b llvm/test/CodeGen/AMDGPU load-constant-i1.ll lds-misaligned-bug.ll

regression
Delta  File
+235 -223  llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+49 -45  llvm/test/CodeGen/AMDGPU/lds-misaligned-bug.ll
+25 -16  llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
+309 -284  3 files

LLVM/project 75ac548 llvm/test/CodeGen/AMDGPU limit-coalesce.mir no-limit-coalesce.mir

Rename test
Delta  File
+0 -75  llvm/test/CodeGen/AMDGPU/limit-coalesce.mir
+71 -0  llvm/test/CodeGen/AMDGPU/no-limit-coalesce.mir
+71 -75  2 files

LLVM/project 1df694b llvm/test/CodeGen/AMDGPU shufflevector.v4p0.v4p0.ll shufflevector.v4i64.v4i64.ll

AMDGPU: Stop implementing shouldCoalesce

Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
Delta  File
+5,975 -8,879  llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,975 -8,879  llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+3,880 -6,644  llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+3,880 -6,644  llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v3p0.ll
+2,266 -3,675  llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll
+2,266 -3,675  llvm/test/CodeGen/AMDGPU/shufflevector.v3p0.v4p0.ll
+24,242 -38,396  56 files not shown
+56,024 -78,260  62 files

LLVM/project bf4dc96 mlir/include/mlir/Dialect/Linalg/IR LinalgStructuredOps.td, mlir/lib/Dialect/Linalg/IR LinalgOps.cpp

[mlir][linalg] Clean up op verifiers without custom checks(NFC) (#168712)

This PR removes op verifiers that do not implement any custom
verification logic.
Delta  File
+0 -9  mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+0 -2  mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
+0 -11  2 files

LLVM/project 1d73b68 llvm/lib/CodeGen TargetLoweringBase.cpp

TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)

Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
Delta  File
+12 -10  llvm/lib/CodeGen/TargetLoweringBase.cpp
+12 -10  1 file

LLVM/project c34f76d llvm/tools/dsymutil MachOUtils.cpp

[dsymutil] Add missing validation for zero alignment section (#168925)

Delta  File
+4 -4  llvm/tools/dsymutil/MachOUtils.cpp
+4 -4  1 file

LLVM/project cff4602 clang/lib/AST ASTContext.cpp, clang/test/SemaOpenCL builtins-extended-image-param-gfx1100-err.cl builtins-extended-image-param-gfx942-err.cl

[AMDGPU] Treating HIP/C++ _Float16 same as OpenCL's half
Delta  File
+15 -0  clang/lib/AST/ASTContext.cpp
+1 -1  clang/test/SemaOpenCL/builtins-extended-image-param-gfx1100-err.cl
+1 -1  clang/test/SemaOpenCL/builtins-extended-image-param-gfx942-err.cl
+17 -2  3 files

LLVM/project 423bdb2 clang/docs OpenCLSupport.rst, clang/include/clang/Basic OpenCLExtensions.def

[OpenCL] Add missing OpenCL 3.0 features to OpenCLExtensions.def; revert header-only macros (#168016)

Adds the remaining optional feature macros from the OpenCL C 3.0 spec
(section 6.2.1 table). Targets can now enable these via
OpenCLFeaturesMap returned by getSupportedOpenCLOpts().

Revert a84599f177a6 (header‑only feature macros).
Header‑only macros are difficult to disable on SPIR-V targets,
and the prior undef approach (a60b8f468119) does not scale.
After this PR, they can be disabled via `-cl-ext=-<feature>`.

https://github.com/KhronosGroup/OpenCL-Docs/issues/1328 also notes that
unconditional definition of the header‑only macros in opencl-c-base.h
should be removed.
Delta  File
+244 -0  clang/test/SemaOpenCL/extension-version.cl
+66 -34  clang/test/SemaOpenCL/features.cl
+0 -99  clang/lib/Headers/opencl-c-base.h
+46 -9  clang/include/clang/Basic/OpenCLExtensions.def
+5 -11  clang/docs/OpenCLSupport.rst
+3 -3  clang/test/Headers/opencl-c-header.cl
+364 -156  2 files not shown
+371 -156  8 files

LLVM/project 8439aeb clang/include/clang/Frontend CompilerInvocation.h, clang/lib/Frontend CompilerInvocation.cpp

[Clang] Refactor getOptimizationLevel and getOptimizationLevelSize to non-static. NFC. (#168839)

So that we can reuse these functions in a few places, such as in
clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is
currently copied from getOptimizationLevel.
Delta  File
+56 -57  clang/lib/Frontend/CompilerInvocation.cpp
+5 -0  clang/include/clang/Frontend/CompilerInvocation.h
+61 -57  2 files

LLVM/project 8fdfe29 llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp AMDGPUInstrInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel extractelement.i128.ll implicit-kernarg-backend-usage-global-isel.ll

AMDGPU: Fix treating unknown mem operands as uniform

The test changes are mostly GlobalISel specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with new
regbankselect.

There is an additional regression for addrspacecast for cov4. We
probably ought to be using a separate PseudoSourceValue for the
access of the queue pointer.
Delta  File
+222 -52  llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
+43 -27  llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
+8 -10  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+3 -5  llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
+1 -1  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+277 -95  5 files

LLVM/project b43293a llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Improve the insert/extract fold in the narrowing case

Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:

1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
   allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
   compatible, which allows foldLengthChangingShuffles to successfully
   recognize a chain that can be folded.

There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.

commit-id:c151bb04
Delta  File
+6 -16  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+2 -15  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+8 -4  llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
+4 -4  llvm/test/Transforms/VectorCombine/X86/extract-insert.ll
+2 -2  llvm/test/Transforms/VectorCombine/X86/pr126085.ll
+22 -41  5 files

LLVM/project aaee5f6 llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll

VectorCombine: Fold chains of shuffles fed by length-changing shuffles

Such chains can arise from folding insert/extract chains.

commit-id:a960175d
Delta  File
+168 -0  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+8 -33  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+176 -33  2 files

LLVM/project aa6362b llvm/lib/Target/AMDGPU AMDGPUTargetTransformInfo.cpp, llvm/test/Analysis/CostModel/AMDGPU shufflevector.ll

AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles

These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"

The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.

commit-id:8b76e888
Delta  File
+498 -488  llvm/test/Analysis/CostModel/AMDGPU/shufflevector.ll
+111 -34  llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+107 -20  llvm/test/Transforms/SLPVectorizer/AMDGPU/reduction.ll
+33 -64  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+17 -34  llvm/test/Transforms/SLPVectorizer/AMDGPU/slp-v2f16.ll
+1 -31  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-chain-to-shuffles.ll
+767 -671  6 files
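
The question posed in aa6362b, "how many v_perm_b32s does it take to assemble each register of the result?", can be made concrete with a toy counting rule. This is not the code in AMDGPUTargetTransformInfo.cpp. It is a simplified Python sketch that assumes one `v_perm_b32` can pick any four bytes out of two 32-bit source registers, that a whole-register copy costs nothing, and that a result register fed by k distinct source registers therefore needs at most k - 1 permutes. The function name and the cost rule are assumptions made for illustration.

```python
def perms_per_result_register(mask, elt_bytes):
    """Estimate v_perm_b32 counts for a shuffle of 8- or 16-bit elements.

    mask[i] is the source element index for result element i, in the
    style of shufflevector with both sources concatenated; a negative
    index marks an undefined lane. Returns one count per 32-bit result
    register.
    """
    assert elt_bytes in (1, 2)
    per_reg = 4 // elt_bytes  # elements held by one 32-bit register
    costs = []
    for base in range(0, len(mask), per_reg):
        lanes = mask[base:base + per_reg]
        src_regs = {m // per_reg for m in lanes if m >= 0}
        # A plain copy keeps every defined lane at its original position
        # within a single source register.
        is_copy = len(src_regs) == 1 and all(
            m < 0 or m % per_reg == pos for pos, m in enumerate(lanes))
        if not src_regs or is_copy:
            costs.append(0)
        else:
            # Each permute merges bytes from two registers, so k sources
            # need k - 1 permutes, and a single reordered source needs 1.
            costs.append(max(1, len(src_regs) - 1))
    return costs


print(perms_per_result_register([3, 2, 1, 0], 1))   # byte reverse
print(perms_per_result_register([0, 4, 8, 12], 1))  # gather from 4 regs
print(perms_per_result_register([1, 0, 2, 3], 2))   # 16-bit swap, copy
```

A real cost model would also account for subtarget differences and cheaper special cases, which this sketch ignores.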

LLVM/project c3fdba0 clang/include/clang/Basic Builtins.def, clang/lib/AST ASTContext.cpp

[AMDGPU] Removal of language-sensitive option for _Float16 and half ('e') handling (#168037)

Removing the 'e' handling for the amdgcn builtins as we decided to use
_Float16 for both HIP/C++ and OpenCL
Delta  File
+2 -6  clang/lib/AST/ASTContext.cpp
+0 -1  clang/include/clang/Basic/Builtins.def
+2 -7  2 files

LLVM/project d406c2c llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslator-amdgpu_kernel.ll regbankselect-widen-scalar-loads.mir

AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads

This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.
Delta  File
+216 -216  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
+76 -76  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
+73 -73  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-load.mir
+22 -9  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20 -7  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+8 -8  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-split-scalar-load-metadata.mir
+415 -389  4 files not shown
+433 -391  10 files

LLVM/project 4be9e5b llvm/lib/Target/AMDGPU SIISelLowering.cpp

AMDGPU: Handle invariant when lowering global loads

Global with invariant should be treated identically to
constant.
Delta  File
+1 -1  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1 -1  1 file

LLVM/project a8b806c llvm/test/CodeGen/AMDGPU load-global-invariant.ll

AMDGPU: Add baseline test for split/widen invariant loads
Delta  File
+77 -0  llvm/test/CodeGen/AMDGPU/load-global-invariant.ll
+77 -0  1 file

LLVM/project 3954df9 llvm/test/CodeGen/AMDGPU constant-address-space-32bit.ll

AMDGPU: Convert constant-address-space-32bit test to generated checks (#168975)

Delta  File
+824 -144  llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+824 -144  1 file

LLVM/project 06eac9f lldb/include/lldb/Core SourceManager.h, lldb/include/lldb/Symbol CompileUnit.h

[lldb] Eliminate SupportFileSP nullptr derefs (#168624)

This patch fixes and eliminates the possibility of SupportFileSP ever
being nullptr. The support file was originally treated like a value
type, but became a polymorphic type and therefore has to be stored and
passed around as a pointer.

To avoid having all the callers check the validity of the pointer, I
introduced the invariant that SupportFileSP is never null and always
default constructed. However, without enforcement at the type level,
that's fragile and indeed, we already identified two crashes where
someone accidentally broke that invariant.

This PR introduces a NonNullSharedPtr to prevent that. NonNullSharedPtr
is a smart pointer wrapper around std::shared_ptr that guarantees the
pointer is never null. If default-constructed, it creates a
default-constructed instance of the contained type. Note that I'm using
private inheritance because you shouldn't inherit from standard library
classes due to the lack of virtual destructor. So while the new

    [4 lines not shown]
Delta  File
+40 -41  lldb/source/Core/SourceManager.cpp
+80 -0  lldb/include/lldb/Utility/NonNullSharedPtr.h
+20 -20  lldb/include/lldb/Core/SourceManager.h
+6 -6  lldb/include/lldb/Symbol/CompileUnit.h
+5 -7  lldb/source/Utility/FileSpecList.cpp
+6 -4  lldb/unittests/Symbol/LineTableTest.cpp
+157 -78  16 files not shown
+184 -102  22 files

LLVM/project ac55d78 llvm/lib/Target/AMDGPU SIInstrInfo.cpp, llvm/test/CodeGen/AMDGPU twoaddr-wmma.mir

AMDGPU: Don't duplicate implicit operands in 3-address conversion (#168426)

We previously got a duplicate implicit $exec operand. It didn't really
hurt anything (other than being a slight drag on compile-time
performance). Still, let's keep things clean.
Delta  File
+12 -12  llvm/test/CodeGen/AMDGPU/twoaddr-wmma.mir
+2 -2  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+14 -14  2 files

LLVM/project 754ff45 llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU codegen-prepare-addrspacecast-non-null.ll

AMDGPU: Try to use zext to implement constant-32-bit addrspacecast

If the high bits are assumed 0 for the cast, use zext. Previously
we would emit a build_vector and a bitcast with the high element
as 0. The zext is more easily optimized. I'm less convinced this is
good for globalisel, since you still need to have the inttoptr back
to the original pointer type.

The default value is 0, though I'm not sure if this is meaningful
in the real world. The real uses might always override the high
bit value with the attribute.
Delta  File
+24 -24  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-zextload-constant-32bit.mir
+18 -18  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-constant-32bit.mir
+16 -16  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant-32bit.mir
+18 -9  llvm/test/CodeGen/AMDGPU/codegen-prepare-addrspacecast-non-null.ll
+6 -6  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir
+8 -2  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+90 -75  2 files not shown
+95 -77  8 files
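
The equivalence that 754ff45 relies on is simple bit arithmetic. The Python sketch below models it with integers rather than DAG or MIR nodes: building a 64-bit value from a low 32-bit element and a high element of 0 yields the same bits as zero-extending the low element. The function names are hypothetical, and `high_bits` stands in for whatever value the attribute mentioned in the commit message supplies.

```python
MASK32 = 0xFFFFFFFF


def cast_via_build_vector(ptr32, high_bits=0):
    """Model of the previous lowering.

    Builds a two-element vector (low, high) and reinterprets it as a
    single 64-bit value, with the high element holding the assumed bits.
    """
    lo = ptr32 & MASK32
    hi = high_bits & MASK32
    return (hi << 32) | lo


def cast_via_zext(ptr32):
    """Model of the new lowering, valid only when the high bits are 0."""
    return ptr32 & MASK32


p = 0xDEADBEEF
print(hex(cast_via_build_vector(p)), hex(cast_via_zext(p)))
print(hex(cast_via_build_vector(0x10, high_bits=1)))  # zext not applicable
```

When `high_bits` is nonzero the two forms differ, which matches the commit's condition that zext is used only if the high bits are assumed to be 0.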

LLVM/project 5946d5b llvm/test/CodeGen/AMDGPU constant-address-space-32bit.ll

AMDGPU: Add more tests for 32-bit constant address space

The sub-dword cases just assert now, so comment those out.
Delta  File
+1,560 -19  llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+1,560 -19  1 file