LLVM/project ea8f3df llvm/test/Transforms/LoopVectorize cast-costs.ll vscale-cost.ll

[LV][NFC] Add cost model tests for VPInstructionWithType (#200135)
Delta File
+80 -0 llvm/test/Transforms/LoopVectorize/cast-costs.ll
+36 -0 llvm/test/Transforms/LoopVectorize/vscale-cost.ll
+116 -0 2 files

LLVM/project f2f9eae llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512vbmi2.ll

[X86] matchShuffleAsVSHLD - fix incorrect shift factor (#200754)

#200604 left the non-commuted case still scaling by 8 bits instead of by the source scalar bit size.
Delta File
+17 -0 llvm/test/CodeGen/X86/vector-shuffle-combining-avx512vbmi2.ll
+1 -1 llvm/lib/Target/X86/X86ISelLowering.cpp
+18 -1 2 files

LLVM/project 581c37a utils/bazel/llvm-project-overlay/libc BUILD.bazel

[bazel] Port ae1d75e (#200758)
Delta File
+1 -0 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -0 1 file

LLVM/project 63c29df clang/lib/Serialization ASTReaderDecl.cpp, clang/test/PCH friend-template-spec-redecl.cpp

[Serialization] Fix assertion on re-deserialized friend template specialization in PCH (#198133) (#200566)

A friend function-template specialization declared inside a class
template is serialized into a PCH. When the class template is later
instantiated while loading the PCH, the friend specialization can be
deserialized re-entrantly (VisitFriendDecl -> VisitFunctionDecl -> ...
-> VisitFunctionDecl for the same specialization) at the same time as
the canonical copy, producing two redeclarations of the same
specialization in the template's specialization set.

ASTDeclReader::VisitFunctionDecl asserted that this collision could only
happen when merging declarations from different modules. Since
38b3d87bd384, friend functions defined inside dependent class templates
are loaded eagerly, so the collision can now also occur within a single
PCH/AST file (non-modules build), tripping the assertion:

  Assertion failed: (Reader.getContext().getLangOpts().Modules &&

    [7 lines not shown]
Delta File
+34 -0 clang/test/PCH/friend-template-spec-redecl.cpp
+0 -2 clang/lib/Serialization/ASTReaderDecl.cpp
+34 -2 2 files

LLVM/project ae1d75e libc/src/__support/math hypotf16.h expxf16_utils.h

[libc][math] Guard f16 math headers to fix certain 32-bit ARM builds (#200715)

Wrap hypotf16.h and expxf16_utils.h in LIBC_TYPES_HAS_FLOAT16 guards,
like the other float16 math headers. This fixes build breaks on systems
where float16 is unsupported (such as some 32-bit ARM targets).
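A minimal sketch of the guard pattern described above. `DEMO_TYPES_HAS_FLOAT16` is a hypothetical stand-in for `LIBC_TYPES_HAS_FLOAT16`, derived here from the compiler's `__FLT16_MAX__` macro rather than from LLVM libc's own type detection, and `demo_hypotf16` is illustrative only:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for LIBC_TYPES_HAS_FLOAT16 (assumption: derived
// from the compiler-provided __FLT16_MAX__, not LLVM libc's detection).
#if defined(__FLT16_MAX__)
#define DEMO_TYPES_HAS_FLOAT16
#endif

#ifdef DEMO_TYPES_HAS_FLOAT16
// Only compiled when the target provides _Float16; elsewhere the header
// contributes nothing instead of breaking the build.
inline _Float16 demo_hypotf16(_Float16 x, _Float16 y) {
  return static_cast<_Float16>(
      std::hypot(static_cast<float>(x), static_cast<float>(y)));
}
#endif

// Reports which side of the guard was compiled.
constexpr bool demo_has_float16() {
#ifdef DEMO_TYPES_HAS_FLOAT16
  return true;
#else
  return false;
#endif
}
```

On a target without `_Float16`, the guarded definition simply disappears, so code that includes the header still compiles.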
Delta File
+6 -0 libc/src/__support/math/hypotf16.h
+6 -0 libc/src/__support/math/expxf16_utils.h
+12 -0 2 files

LLVM/project e9556fc mlir/lib/Conversion/MathToSPIRV MathToSPIRV.cpp, mlir/test/Conversion/MathToSPIRV math-to-gl-spirv.mlir math-to-opencl-spirv.mlir

[mlir][SPIR-V] Convert math.trunc to GL Trunc and CL trunc (#200739)
Delta File
+4 -0 mlir/test/Conversion/MathToSPIRV/math-to-gl-spirv.mlir
+4 -0 mlir/test/Conversion/MathToSPIRV/math-to-opencl-spirv.mlir
+2 -0 mlir/lib/Conversion/MathToSPIRV/MathToSPIRV.cpp
+10 -0 3 files

LLVM/project 52e2280 clang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
Delta File
+10 -9 clang/lib/CodeGen/Targets/AMDGPU.cpp
+12 -6 clang/lib/CodeGen/TargetInfo.h
+7 -8 clang/lib/CodeGen/Targets/SPIR.cpp
+11 -2 clang/lib/CodeGen/CodeGenModule.cpp
+5 -6 clang/lib/CodeGen/TargetInfo.cpp
+6 -3 clang/lib/CodeGen/Targets/AVR.cpp
+51 -34 6 files

LLVM/project c9f6a05 llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
Delta File
+21 -21 llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+8 -6 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4 -4 mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2 -2 mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1 -1 mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+36 -34 5 files

LLVM/project 005e564 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space for the global variables that represent the barrier IDs in GFX12.5.

These barrier addresses have values that correspond one-to-one to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
Delta File
+442 -0 llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62 -45 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+54 -31 llvm/test/CodeGen/AMDGPU/s-barrier.ll
+52 -14 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+35 -31 llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+32 -32 llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+677 -153 42 files not shown
+1,108 -441 48 files

LLVM/project 7e5a386 llvm/lib/Target/AMDGPU AMDGPULowerExecSync.cpp

clang-format
Delta File
+1 -2 llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+1 -2 1 file

LLVM/project 62b7cf9 llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU buffer-store-dwordx4-vpk-mul-war-hazard-gfx942.mir

[AMDGPU] Widen MUBUF/MTBUF source-vgpr WAR hazard on gfx940-family to SGPR soffset (#197267)

createsVALUHazard previously gated the MUBUF/MTBUF source-vgpr WAR
hazard to fire only when SOFFSET was a literal or absent. On
gfx940-family subtargets that gate is too narrow: the hazard also fires
when SOFFSET is sourced from an SGPR.

Concretely, on gfx950 a sequence of the form

```
  buffer_store_dwordx4 v[X:X+3], voff, descr, sN offen
  v_pk_mul_f32 v[X:X+1], <src>, <src>           # next VALU cycle
```

deterministically commits the post-pk_mul value of v[X+1] to memory for
the second dword of the store; the other three dwords store correctly.

The wait-state window depends on the SOFFSET shape:


    [20 lines not shown]
Delta File
+122 -0 llvm/test/CodeGen/AMDGPU/buffer-store-dwordx4-vpk-mul-war-hazard-gfx942.mir
+58 -21 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+180 -21 2 files

LLVM/project dc01d53 llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
Delta File
+21 -21 llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+8 -6 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4 -4 mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2 -2 mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1 -1 mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+36 -34 5 files

LLVM/project 683c367 lldb/test/Shell/lldb-server TestGdbserverErrorDarwin.test TestGdbserverErrorMessages.test, lldb/tools/lldb-server lldb-server.cpp

[lldb][lldb-server][Darwin] Error when gdbserver mode is requested (#199654)

Fixes #199035

lldb-server's platform mode works on Apple platforms but the gdbserver
mode does not. Users should use debugserver instead, and platform mode
knows to spawn debugserver instead of lldb-server.

I'm adding an error to state this, because until now it might appear to
work, or crash in strange ways, none of which a user can fix, and the
resulting bug reports waste our time.
Delta File
+6 -0 lldb/tools/lldb-server/lldb-server.cpp
+4 -0 lldb/test/Shell/lldb-server/TestGdbserverErrorDarwin.test
+3 -0 lldb/test/Shell/lldb-server/TestGdbserverErrorMessages.test
+13 -0 3 files

LLVM/project 51b54ff clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode cxx23.cpp

[clang][bytecode] Fix discarded dynamic casts (#200723)

This applies to discarded dynamic casts that are checked.
Delta File
+4 -0 clang/test/AST/ByteCode/cxx23.cpp
+2 -0 clang/lib/AST/ByteCode/Compiler.cpp
+6 -0 2 files

LLVM/project 40b722d mlir/include/mlir/Dialect/SPIRV/IR SPIRVGLOps.td SPIRVCLOps.td, mlir/test/Dialect/SPIRV/IR ocl-ops.mlir gl-ops.mlir

[mlir][SPIR-V] Add GL Trunc and CL trunc ops (#200738)
Delta File
+24 -0 mlir/include/mlir/Dialect/SPIRV/IR/SPIRVGLOps.td
+21 -0 mlir/include/mlir/Dialect/SPIRV/IR/SPIRVCLOps.td
+20 -0 mlir/test/Dialect/SPIRV/IR/ocl-ops.mlir
+18 -0 mlir/test/Dialect/SPIRV/IR/gl-ops.mlir
+2 -0 mlir/test/Target/SPIRV/gl-ops.mlir
+2 -0 mlir/test/Target/SPIRV/ocl-ops.mlir
+87 -0 6 files

LLVM/project abdab06 mlir/lib/Conversion/MathToSPIRV MathToSPIRV.cpp, mlir/test/Conversion/MathToSPIRV math-to-gl-spirv.mlir

[mlir][SPIR-V] Convert math.fpowi to spirv.GL.Pow (#200563)
Delta File
+44 -1 mlir/lib/Conversion/MathToSPIRV/MathToSPIRV.cpp
+34 -0 mlir/test/Conversion/MathToSPIRV/math-to-gl-spirv.mlir
+78 -1 2 files

LLVM/project 24b8cfa mlir/test/Target/SPIRV selection.mlir

[mlir][SPIR-V] Add roundtrip and validation test for spirv.Switch (NFC) (#200572)

Add missing `spirv-val` tests for the spirv.Switch operation.
Delta File
+29 -0 mlir/test/Target/SPIRV/selection.mlir
+29 -0 1 file

LLVM/project 2b95597 mlir/test/Conversion/MemRefToSPIRV bitwidth-emulation.mlir

[mlir][SPIR-V] Add i64 tests for MemRef bitwidth emulation (NFC) (#200724)
Delta File
+40 -1 mlir/test/Conversion/MemRefToSPIRV/bitwidth-emulation.mlir
+40 -1 1 file

LLVM/project 0cb280f llvm/lib/Target/AArch64 AArch64RegisterInfo.td SMEInstrFormats.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Restrict luti6 (4 regs, 8-bit) to 0 <= Zn <= 7

The `luti6` instruction (table, four registers, 8-bit) should only
allow `0 <= Zn <= 7`, since the field has only 3 bits. It actually allows:
```
   luti6 { z0.b - z3.b }, zt0, { z8 - z10 }
```
which produces a duplicate encoding to the following:
```
   luti6 { z0.b - z3.b }, zt0, { z0 - z2 }
```

Fix TableGen to ensure Zn is only allowed in the correct range of 0 to 7.
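The duplicate encoding follows directly from the 3-bit field: truncating z8 to 3 bits yields the same bits as z0. A hypothetical C++ model of the problem and the fix (the names are illustrative; this is not LLVM's encoder):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a 3-bit register field such as luti6's Zn operand
// (table, four registers, 8-bit).
constexpr unsigned kZnFieldBits = 3;
constexpr unsigned kZnMax = (1u << kZnFieldBits) - 1; // 7

// Naive encoder: silently truncates to the field width, so z8 and z0
// produce identical bits.
constexpr uint32_t encodeZnTruncating(unsigned zn) { return zn & kZnMax; }

// Checked encoder: rejects registers the field cannot represent, which is
// what restricting the operand to z0-z7 achieves when parsing.
constexpr std::optional<uint32_t> encodeZnChecked(unsigned zn) {
  if (zn > kZnMax)
    return std::nullopt;
  return zn;
}
```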
Delta File
+15 -0 llvm/lib/Target/AArch64/AArch64RegisterInfo.td
+5 -0 llvm/test/MC/AArch64/SME2p3/luti6-diagnostics.s
+4 -0 llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+1 -1 llvm/lib/Target/AArch64/SMEInstrFormats.td
+25 -1 4 files

LLVM/project fb8cb1b lldb/include/lldb/Target Process.h, lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.cpp ProcessGDBRemote.h

[lldb] Strip pointer metadata in ReadMemoryRanges (#200398)

The Process base class is generally responsible for fixing pointer
metadata before delegating memory reads to concrete Process
specializations. However, ReadMemoryRanges was a direct path into the
derived classes, which made it so that pointer metadata was never
stripped.

This commit creates a non-virtual ReadMemoryRanges in Process, which
clears pointer metadata before delegating to the new virtual method
DoReadMemoryRanges. It also makes it possible, in the future, to plug
into the memory cache system.
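The split described above is the non-virtual interface pattern. A simplified, hypothetical model: the signatures differ from lldb's, and `StripMetadata` with its top-byte mask is an assumption standing in for the target-specific metadata removal:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (address, size) pair; a simplified stand-in for lldb's range types.
using AddrRange = std::pair<uint64_t, uint64_t>;

class ProcessModel {
public:
  virtual ~ProcessModel() = default;

  // Non-virtual entry point: metadata is stripped here, once, so no
  // subclass can forget to do it.
  std::vector<AddrRange> ReadMemoryRanges(std::vector<AddrRange> ranges) {
    for (auto &range : ranges)
      range.first = StripMetadata(range.first);
    return DoReadMemoryRanges(ranges);
  }

protected:
  // Hypothetical: treat the top byte as metadata (e.g. a pointer tag).
  virtual uint64_t StripMetadata(uint64_t addr) const {
    return addr & 0x00FFFFFFFFFFFFFFULL;
  }

  // Subclasses implement only the raw read.
  virtual std::vector<AddrRange>
  DoReadMemoryRanges(const std::vector<AddrRange> &ranges) = 0;
};

// Test double that records which addresses reached the subclass and
// echoes the ranges back.
class RecordingProcess : public ProcessModel {
public:
  std::vector<uint64_t> seen;

protected:
  std::vector<AddrRange>
  DoReadMemoryRanges(const std::vector<AddrRange> &ranges) override {
    for (const auto &range : ranges)
      seen.push_back(range.first);
    return ranges;
  }
};
```

Because the derived class only overrides `DoReadMemoryRanges`, a tagged address passed to `ReadMemoryRanges` arrives at the subclass already cleaned.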
Delta File
+86 -0 lldb/unittests/Target/MemoryTest.cpp
+8 -4 lldb/include/lldb/Target/Process.h
+11 -0 lldb/source/Target/Process.cpp
+5 -5 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+4 -4 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+114 -13 5 files

LLVM/project fe4b56c utils/bazel/llvm-project-overlay/llvm BUILD.bazel

[bazel] Port 7a435ca (#200749)
Delta File
+2 -2 utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+2 -2 1 file

LLVM/project ce9de13 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space for the global variables that represent the barrier IDs in GFX12.5.

These barrier addresses have values that correspond one-to-one to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
Delta File
+442 -0 llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62 -45 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+54 -31 llvm/test/CodeGen/AMDGPU/s-barrier.ll
+52 -14 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+35 -31 llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+32 -32 llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+677 -153 42 files not shown
+1,108 -441 48 files

LLVM/project 261e0f4 libc/config/linux/aarch64 headers.txt, libc/config/linux/riscv headers.txt

[libc] Add netinet/tcp.h header (#200356)

This patch adds the netinet/tcp.h header definition. For now I'm only
adding TCP_NODELAY to it, as that's the only constant specified by
POSIX.

I also include the header in the public headers list for linux targets
and hook it up to the implementation status docs.

I don't add a test as this is just a constant definition, and it would
be very hard to devise (if even possible over a loopback interface) a
test to check that the option has the desired effect (in fact, POSIX
says that an implementation doesn't even have to let you set the
option).
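For reference, a typical use of the constant through the standard POSIX socket API. This sketch assumes a POSIX host and compiles against the system's `<netinet/tcp.h>`, not necessarily LLVM libc's; since POSIX lets an implementation refuse the option, failure is reported rather than treated as fatal:

```cpp
#include <cassert>
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>
#include <unistd.h>

// Tries to disable Nagle's algorithm on a fresh TCP socket.
// Returns 1 if the option reads back as enabled, 0 if it reads back as
// disabled, and -1 if the socket could not be created or the option was
// rejected.
inline int enableNoDelay() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
  int on = 1;
  int result = -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) == 0) {
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) == 0)
      result = value != 0 ? 1 : 0;
  }
  close(fd);
  return result;
}
```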

Assisted by Gemini.
Delta File
+10 -0 libc/include/netinet/tcp.yaml
+8 -0 libc/include/CMakeLists.txt
+3 -0 libc/utils/docgen/netinet/tcp.yaml
+1 -0 libc/config/linux/aarch64/headers.txt
+1 -0 libc/config/linux/riscv/headers.txt
+1 -0 libc/config/linux/x86_64/headers.txt
+24 -0 2 files not shown
+26 -0 8 files

LLVM/project 4c23489 llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/Hexagon/autohvx cttz-elts-split.ll

[TargetLowering] Split CTTZ_ELTS when its step vector requires splitting (#197623)

Follow-up to #190914: getLegalMaskAndStepVector() returns an empty
StepVec to signal that the operation should be split. Only
expandVectorFindLastActive handled this; expandCttzElts crashed by
dereferencing the null SDValue during vector op legalization. Apply the
same split-and-recurse strategy, preferring the low half since CTTZ_ELTS
finds the first active lane.
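The split-and-recurse strategy can be modeled on a plain boolean mask. In this hypothetical scalar sketch, `legalLanes` stands in for the widest mask the target can handle directly; it illustrates why the low half is tried first, not how SelectionDAG implements it:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of CTTZ_ELTS on mask[begin, end): the index of the first
// active lane, or the lane count if no lane is active.
inline std::size_t cttzEltsDirect(const std::vector<bool> &mask,
                                  std::size_t begin, std::size_t end) {
  for (std::size_t i = begin; i < end; ++i)
    if (mask[i])
      return i - begin;
  return end - begin;
}

// Split-and-recurse: if the range is too wide, solve the low half first.
inline std::size_t cttzEltsSplit(const std::vector<bool> &mask,
                                 std::size_t begin, std::size_t end,
                                 std::size_t legalLanes) {
  std::size_t n = end - begin;
  if (n <= legalLanes)
    return cttzEltsDirect(mask, begin, end);
  std::size_t half = n / 2;
  std::size_t lo = cttzEltsSplit(mask, begin, begin + half, legalLanes);
  // The low half holds the lowest lane indices, so any active lane found
  // there is the answer; the high half is only consulted if it had none.
  if (lo < half)
    return lo;
  return half + cttzEltsSplit(mask, begin + half, end, legalLanes);
}
```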

Assisted-by: Anthropic::claude-4.6
Delta File
+354 -0 llvm/test/CodeGen/Hexagon/autohvx/cttz-elts-split.ll
+23 -0 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+377 -0 2 files

LLVM/project 78f429c llvm/lib/Target/AArch64 AArch64SVEInstrInfo.td, llvm/test/CodeGen/AArch64 sve2-sra.ll

[AArch64][SVE] Match (add_like x (lsr y, c)) -> usra x, y, c

Modify the SVE USRA pattern to accept add_like, so that both the add and
the or disjoint forms can select usra.

Add known-bits support for predicated SVE logical shifts, allowing
or_disjoint matching to prove disjointness for plain ORs where possible.
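Why an `or disjoint` can select usra: when the operands share no set bits, no carries occur, so `a | b == a + b`. A scalar sketch of one unsigned 8-bit lane (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// One unsigned 8-bit lane of USRA: acc + (y >> shift), wrapping mod 2^8.
constexpr uint8_t usraLane(uint8_t acc, uint8_t y, unsigned shift) {
  return static_cast<uint8_t>(acc + (y >> shift));
}

// The OR form, which is only equivalent when it is add-like.
constexpr uint8_t orLane(uint8_t acc, uint8_t y, unsigned shift) {
  return static_cast<uint8_t>(acc | (y >> shift));
}

// Disjoint operands share no set bits, so adding them produces no carries
// and OR gives the same result as ADD.
constexpr bool isDisjoint(uint8_t a, uint8_t b) { return (a & b) == 0; }
```

Here `y >> 4` always has its top 4 bits clear, so any `acc` whose low 4 bits are clear (such as 0xF0) is provably disjoint from it; deriving that kind of fact from a predicated SVE shift is what the known-bits support above is for.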
Delta File
+80 -0 llvm/test/CodeGen/AArch64/sve2-sra.ll
+2 -2 llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+82 -2 2 files

LLVM/project 91b77dc llvm/include/llvm/IR DebugLoc.h, llvm/lib/AsmParser LLParser.cpp

[IR] Don't use TrackingMDNodeRef for DebugLoc (#200649)

TrackingMDNodeRef is expensive and the tracking functionality is only
used when parsing textual LLVM IR. Therefore, store a plain DILocation
pointer in DebugLoc and update the debug locs explicitly when parsing
finishes.

Invalid debug metadata now fails directly at parsing and not (just)
later when verifying. A consequence is that old-style DILocations cannot
be parsed from textual IR anymore.

As related cleanup, remove the now-unused hasTrivialDestructor() on
TrackingMDRef.

Work on changing DILocation to no longer be metadata is under way, but
it is going to take a while to finish; this change gets the immediate
performance and max-rss improvement in earlier.
Delta File
+25 -38 llvm/include/llvm/IR/DebugLoc.h
+0 -42 llvm/test/Verifier/dbg-declare-invalid-debug-loc.ll
+38 -0 llvm/test/Assembler/dbg-declare-invalid-debug-loc.ll
+0 -36 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+26 -2 llvm/lib/AsmParser/LLParser.cpp
+6 -14 llvm/lib/IR/AutoUpgrade.cpp
+95 -132 29 files not shown
+175 -241 35 files

LLVM/project fa02a6e llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG rangereduce.ll

[SimplifyCFG] Permit less dense lookup tables (#200664)

It should most often be beneficial to generate a lookup table instead of
a jump table: the lookup table is rarely larger, and it saves
instructions and an indirect branch. Therefore, adjust the lookup table
threshold to match the jump table threshold.

The motivation is clang::Decl::castToDeclContext, which is a rather hot
function when parsing C++ programs, but the switch density is just 38%.
This improves stage2-O0g by 0.17% (7zip/kimwitu++ >0.5%).
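Conceptually, the transformation replaces a switch that only selects values with a range check and an indexed load, filling holes with the default result. A hypothetical sketch with three cases over a range of 8 values (density 3/8, about 37%, close to the 38% quoted above); SimplifyCFG's actual thresholds are not reproduced here:

```cpp
#include <array>
#include <cassert>

// Switch form: three cases spread over the values 0..7.
inline int viaSwitch(unsigned k) {
  switch (k) {
  case 0: return 10;
  case 3: return 13;
  case 7: return 17;
  default: return -1;
  }
}

// Lookup-table form: one range check plus an indexed load, with the
// holes (1, 2, 4, 5, 6) filled by the default result.
inline int viaTable(unsigned k) {
  static constexpr std::array<int, 8> kTable = {10, -1, -1, 13,
                                                -1, -1, -1, 17};
  return k < kTable.size() ? kTable[k] : -1;
}

// Switch density as an integer percentage: number of cases over the
// span of case values (max - min + 1).
constexpr unsigned densityPercent(unsigned numCases, unsigned caseRange) {
  return numCases * 100 / caseRange;
}
```

The table costs one entry per value in the range whether or not it is a case, which is why density decides when the trade is worthwhile.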
Delta File
+27 -26 llvm/test/Transforms/SimplifyCFG/rangereduce.ll
+15 -11 llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+4 -9 llvm/test/Transforms/SimplifyCFG/RISCV/switch-of-powers-of-two.ll
+46 -46 3 files

LLVM/project db091a6 clang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
Delta File
+10 -9 clang/lib/CodeGen/Targets/AMDGPU.cpp
+12 -6 clang/lib/CodeGen/TargetInfo.h
+7 -8 clang/lib/CodeGen/Targets/SPIR.cpp
+11 -2 clang/lib/CodeGen/CodeGenModule.cpp
+5 -6 clang/lib/CodeGen/TargetInfo.cpp
+6 -3 clang/lib/CodeGen/Targets/AVR.cpp
+51 -34 6 files

LLVM/project ddd0616 llvm/lib/Target/AMDGPU AMDGPUMemoryUtils.cpp AMDGPUMemoryUtils.h

[NFC][AMDGPU] Generalize some LDS MemoryUtils

In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.

I also cleaned up a few things that weren't up to code, such as lowercase variable names.
Delta File
+30 -36 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+37 -9 llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+20 -17 llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+21 -7 llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+7 -6 llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+115 -75 5 files

LLVM/project 13e62f0 llvm/lib/Passes PassBuilderPipelines.cpp, llvm/test/Other new-pm-defaults.ll new-pm-thinlto-postlink-defaults.ll

[LoopInterchange] Enable it by default (#124911)

This enables loop-interchange, which was first discussed here:
https://discourse.llvm.org/t/enabling-loop-interchange/82589

All bugs have been fixed, including those in DependenceAnalysis, and all
components have at least one maintainer; default enablement now meets
the requirements of the Developer Policy.

This has been a major effort by different people, many thanks to:
- Ryotaro Kasuga,
- Madhur Amilkanthwar,
- Sebastian Pop,
- Ehsan Amiri,
- Michael Kruse,
- Nikita Popov,
- Sjoerd Meijer.
Delta File
+1 -1 llvm/lib/Passes/PassBuilderPipelines.cpp
+1 -0 llvm/test/Other/new-pm-defaults.ll
+1 -0 llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+1 -0 llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+1 -0 llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+5 -1 5 files