LLVM/project 8fb1715llvm/test/Transforms/LoopVectorize/X86 masked_load_store.ll predicate-switch.ll

[LV][NFC] Regenerate CHECK lines for some tests (#197939)
DeltaFile
+897-897llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll
+302-339llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll
+239-239llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+133-145llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll
+91-114llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll
+95-93llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll
+1,757-1,8272 files not shown
+1,860-1,9308 files

LLVM/project f5f7f8dllvm/docs AMDGPUUsage.rst, llvm/test/CodeGen/AMDGPU memory-legalizer-non-volatile.ll memory-legalizer-non-volatile.mir

Restack + comments
DeltaFile
+2-14llvm/docs/AMDGPUUsage.rst
+4-4llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.ll
+1-1llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+7-193 files

LLVM/project a7758a9llvm/test/CodeGen/AMDGPU memory-legalizer-non-volatile.mir

Fix MIR test
DeltaFile
+3-3llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+3-31 files

LLVM/project 1db57b7llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp SIInstrInfo.h

[AMDGPU][SIMemoryLegalizer] Consider scratch operations as NV=1 if GAS is disabled

- Clarify that `thread-private` MMO flag is still useful.
- If GAS is not enabled (which is the default as of last patch), consider an op as `NV=1` if it's a `scratch_` opcode, or if the MMO is in the private AS.
- Add tests for the new cases.
- Update AMDGPUUsage GFX12.5 memory model.
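
The rule in the list above can be sketched as a small predicate. This is only an illustration: `MemOpInfo`, `isNonVolatile`, and the field names are hypothetical and are not the actual SIMemoryLegalizer code. The sketch also assumes that an access carrying the `thread-private` MMO flag is treated as `NV=1` whether or not GAS is enabled, which is one reading of the first bullet.

```cpp
// Hypothetical stand-in for what the legalizer knows about a memory
// operation; this is not an actual LLVM type.
struct MemOpInfo {
  bool IsScratchOpcode;      // instruction is a `scratch_` opcode
  bool IsPrivateAS;          // MMO is in the private address space
  bool HasThreadPrivateFlag; // MMO carries the `thread-private` flag
};

// Returns true if the access may be treated as NV=1.
constexpr bool isNonVolatile(const MemOpInfo &Op, bool GASEnabled) {
  // Assumption: the thread-private flag is honored regardless of GAS.
  // With GAS disabled, scratch opcodes and private-AS MMOs also qualify.
  return Op.HasThreadPrivateFlag ||
         (!GASEnabled && (Op.IsScratchOpcode || Op.IsPrivateAS));
}
```

With GAS enabled, a plain `scratch_` access without the flag is not treated as `NV=1` in this sketch, which matches the performance concern raised for the opt-in feature.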
DeltaFile
+181-0llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.mir
+75-36llvm/test/CodeGen/AMDGPU/memory-legalizer-non-volatile.ll
+13-6llvm/docs/AMDGPUUsage.rst
+14-3llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+3-1llvm/lib/Target/AMDGPU/SIInstrInfo.h
+286-465 files

LLVM/project 466eda6lld/test/ELF/lto amdgcn-oses.ll amdgcn.ll

Fix LLD tests
DeltaFile
+3-3lld/test/ELF/lto/amdgcn-oses.ll
+1-1lld/test/ELF/lto/amdgcn.ll
+4-42 files

LLVM/project 2f966a9clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp, clang/test/CodeGenOpenCL builtins-amdgcn-gfx1250-load-monitor.cl

[AMDGPU][Clang] use a ScopeModel when emitting load_monitor

Assisted-By: Claude Opus 4.6
DeltaFile
+17-9clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+20-0clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-load-monitor.cl
+37-92 files

LLVM/project f020095llvm/test/Transforms/LoopVectorize/RISCV early-exit-live-out.ll

[LV][RISCV] Add strided search test for early-exit vectorization. nfc (#198080)

Co-authored-by: Florian Hahn <flo at fhahn.com>
DeltaFile
+163-0llvm/test/Transforms/LoopVectorize/RISCV/early-exit-live-out.ll
+163-01 files

LLVM/project 0d0f441clang/docs ReleaseNotes.rst, clang/www c_status.html

add exposure
DeltaFile
+27-5clang/docs/ReleaseNotes.rst
+1-1clang/www/c_status.html
+28-62 files

LLVM/project 751f96eclang/test/CodeGenHIP amdgpu-barrier-type.hip, llvm/lib/Target/AMDGPU AMDGPU.h

Address comments
DeltaFile
+25-9clang/test/CodeGenHIP/amdgpu-barrier-type.hip
+16-0llvm/test/CodeGen/AMDGPU/barrier-addrspace-dereference.ll
+2-2llvm/lib/Target/AMDGPU/AMDGPU.h
+2-2llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+0-3llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
+45-165 files

LLVM/project 76d2e53llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULowerExecSync.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space that is used for global variables that are used to represent the barrier IDs in GFX12.5.

These barrier addresses just have values corresponding 1-1 to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
DeltaFile
+442-0llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62-45llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+34-54llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+54-31llvm/test/CodeGen/AMDGPU/s-barrier.ll
+36-31llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+33-33llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+661-19436 files not shown
+1,076-45842 files

LLVM/project 276a3adclang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
DeltaFile
+10-9clang/lib/CodeGen/Targets/AMDGPU.cpp
+12-6clang/lib/CodeGen/TargetInfo.h
+7-8clang/lib/CodeGen/Targets/SPIR.cpp
+11-2clang/lib/CodeGen/CodeGenModule.cpp
+5-6clang/lib/CodeGen/TargetInfo.cpp
+6-3clang/lib/CodeGen/Targets/AVR.cpp
+51-346 files

LLVM/project 5a47757.github CODEOWNERS

[AMDGPU] Add @Pierre-vh and @ritter-x2a as memory model code owners

Covers both SIMemoryLegalizer (code sequence lowering) and InsertWaitcnt.
DeltaFile
+7-3.github/CODEOWNERS
+7-31 files

LLVM/project 3bdd54ebolt/lib/Rewrite RewriteInstance.cpp, bolt/test/AArch64 crel-relocs.s

[BOLT] Add support for CREL code relocations (#196383)

BOLT only checked for .rela sections when deciding whether relocation
mode could be enabled. This caused binaries with SHT_CREL code
relocations (such as .crel.text) to be rejected as missing relocations.

No support for SHT_REL has been added in this patch.
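
As a rough sketch of the check described above (not BOLT's actual code), relocation mode can be thought of as enabled when at least one section has a supported relocation section type. The helper names are hypothetical, and the numeric value used for CREL is the one recalled from LLVM's ELF definitions, so treat it as an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Section type values. Rela and Rel follow the ELF gABI; the Crel value is
// an assumption based on LLVM's ELF definitions.
namespace sectype {
constexpr std::uint32_t Progbits = 1;
constexpr std::uint32_t Rela = 4;
constexpr std::uint32_t Rel = 9;
constexpr std::uint32_t Crel = 0x40000014;
} // namespace sectype

// True if a section of this type counts as carrying code relocations.
// SHT_REL is deliberately left out, matching the note above.
constexpr bool isSupportedRelocSection(std::uint32_t Type) {
  return Type == sectype::Rela || Type == sectype::Crel;
}

// True if any of the N section types is a supported relocation section.
constexpr bool canEnableRelocationMode(const std::uint32_t *Types,
                                       std::size_t N) {
  for (std::size_t I = 0; I < N; ++I)
    if (isSupportedRelocSection(Types[I]))
      return true;
  return false;
}
```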

Reference:
https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600

Based on https://github.com/llvm/llvm-project/pull/119150 to address
issue https://github.com/llvm/llvm-project/issues/110407.

Co-Author: @0xfk0
DeltaFile
+34-0bolt/test/AArch64/crel-relocs.s
+11-3bolt/lib/Rewrite/RewriteInstance.cpp
+45-32 files

LLVM/project 17bc1caclang/test/Sema attr-counted-by-late-parsed-struct-ptrs-anon.c, llvm/test/TableGen/GlobalISelEmitter MatchTableOptimizerRecursion.td

Merge branch 'main' into users/kasuga-fj/da-consolidate-acc-gcd
DeltaFile
+204-0llvm/test/TableGen/GlobalISelEmitter/MatchTableOptimizerRecursion.td
+84-0clang/test/Sema/attr-counted-by-late-parsed-struct-ptrs-anon.c
+64-19llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+46-1mlir/lib/Conversion/GPUToSPIRV/WmmaOpsToSPIRV.cpp
+3-34llvm/utils/TableGen/GlobalISelEmitter.cpp
+1-34llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+402-886 files not shown
+459-11412 files

LLVM/project 0f6f00aclang/lib/Parse ParseDecl.cpp, clang/lib/Sema SemaBoundsSafety.cpp

[Sema] Call ActOnFields before late parsing in ParseStructUnionBody (#187166)

Implements #186914.

Move the call to `ActOnFields()` before `ParseLexedCAttributeList()` in
`ParseStructUnionBody` so that the struct type is complete
when late-parsed attributes like counted_by get evaluated. This is a
prerequisite for supporting sizeof/offsetof expressions in counted_by
evaluation.

Update the heuristic for `GetEnclosingNamedOrTopAnonRecord`. Remove the
`isCompleteDefinition()` condition since it will always return true
under the new ordering. `GetEnclosingNamedOrTopAnonRecord` is intended to
treat unnamed and anonymous structs permissively.

Add one test to verify that unnamed and anonymous structs still work
correctly under the new ordering.
DeltaFile
+84-0clang/test/Sema/attr-counted-by-late-parsed-struct-ptrs-anon.c
+3-4clang/lib/Sema/SemaBoundsSafety.cpp
+3-3clang/lib/Parse/ParseDecl.cpp
+90-73 files

LLVM/project a435216llvm/test/CodeGen/AMDGPU memory-legalizer-private-wavefront.ll memory-legalizer-private-singlethread.ll

Rebase
DeltaFile
+1,994-950llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+1,994-950llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+1,994-950llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+1,971-939llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+1,971-939llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+1,879-899llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+11,803-5,6276 files

LLVM/project 61bf9b8llvm/lib/Target/AMDGPU AMDGPUMemoryUtils.cpp AMDGPUMemoryUtils.h

[NFC][AMDGPU] Generalize some LDS MemoryUtils

In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.
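
The general shape of this refactoring can be sketched as follows. None of these names come from AMDGPUMemoryUtils; `GlobalVar`, `InAddressSpace`, and `countMatchingGlobals` are hypothetical, and the address space numbers are only illustrative. The point is that the helper no longer hard-codes an LDS query: the caller supplies the predicate.

```cpp
#include <cstddef>

// Hypothetical, simplified model of a global variable.
struct GlobalVar {
  const char *Name;
  unsigned AddressSpace;
};

// Example functor a caller could pass: matches one address space.
struct InAddressSpace {
  unsigned AS;
  constexpr bool operator()(const GlobalVar &GV) const {
    return GV.AddressSpace == AS;
  }
};

// Generic helper: counts the globals accepted by the caller's predicate
// instead of testing for one specific address space internally.
template <typename Pred>
constexpr std::size_t countMatchingGlobals(const GlobalVar *GVs, std::size_t N,
                                           Pred Matches) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Matches(GVs[I]))
      ++Count;
  return Count;
}
```

A caller interested in a different kind of global only needs a different functor; the helper itself stays unchanged.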

I also cleaned up a few things that weren't up to code, such as lowercase variable names.
DeltaFile
+30-36llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+37-9llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+20-17llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+24-10llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+7-6llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+118-785 files

LLVM/project ba70c52llvm/test/CodeGen/AMDGPU memory-legalizer-private-agent.ll memory-legalizer-private-system.ll

[AMDGPU] Make globally-addressable-scratch opt-in

This feature is meant to be opt-in for more advanced users, not default-enabled.
It may reduce performance otherwise as we can't assume private AS is thread-local
when it is enabled.

- Add `HasGloballyAddressableScratchSupport` feature to check if a target's scratch
  addressing is changed due to support for globally addressable scratch.
- Use `EnableGloballyAddressableScratch` to check whether the user opted into
  globally addressable scratch. This affects whether to lower scratch atomics as flat,
  and in the future will affect whether NV=1 can be set on scratch accesses.
DeltaFile
+4,816-4,142llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+4,584-3,938llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll
+4,595-3,921llvm/test/CodeGen/AMDGPU/memory-legalizer-private-cluster.ll
+4,564-3,881llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll
+4,412-3,729llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll
+4,412-3,729llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll
+27,383-23,34013 files not shown
+27,647-23,49719 files

LLVM/project a314c8fllvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU GCNSubtarget.cpp AMDGPU.td

Comments
DeltaFile
+74-64llvm/docs/AMDGPUUsage.rst
+9-0llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+1-7llvm/lib/Target/AMDGPU/AMDGPU.td
+1-4llvm/lib/Target/AMDGPU/GCNSubtarget.h
+1-1llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll
+1-1llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
+87-778 files not shown
+95-8514 files

LLVM/project 06d50acllvm/test/CodeGen/AArch64/GlobalISel select-intrinsic-aarch64-sdiv.mir, llvm/test/TableGen/GlobalISelEmitter MatchTableOptimizerRecursion.td

[GlobalISel] Recursively Optimise MatchTable Matchers (#197143)

The core of this change is the additional call to `Matcher::optimize()`
in the `optimizeRules` function,
which enables the match table optimization logic to recurse on the
children of every GroupMatcher, forming
additional groups (which hoist more common predicates into a shared
group).

To enable that, I had to update the `getFirstConditionAsRootType`
implementation to support `GroupMatcher`.
I also included a small refactoring of the match table optimization
pipeline that was identical between the
GlobalISel and GlobalISelCombiner emitters.

The results of this change are up to a 25% size reduction for GlobalISel
match tables.
There is a tiny increase (a few bytes) in a combiner table because we
now create new groups

    [16 lines not shown]
DeltaFile
+204-0llvm/test/TableGen/GlobalISelEmitter/MatchTableOptimizerRecursion.td
+64-19llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+3-34llvm/utils/TableGen/GlobalISelEmitter.cpp
+1-34llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+12-7llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h
+4-4llvm/test/CodeGen/AArch64/GlobalISel/select-intrinsic-aarch64-sdiv.mir
+288-981 files not shown
+291-1007 files

LLVM/project 992df0allvm/utils/TableGen/Common/GlobalISel GlobalISelMatchTable.h

[GlobalISel][MatchTable] Fix RTTI of Imm/ImmPredicate classes (#197142)
DeltaFile
+7-5llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h
+7-51 files

LLVM/project 114870fllvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp

[AMDGPU] Clamp load_monitor scope to minimum SCOPE_SE

The load_monitor instructions monitor L2 cache lines and therefore require at
least SCOPE_SE to ensure the L2 cache is hit. The current memory model requires
the user to ensure that the specified scope is such that it results in at least
SCOPE_SE, otherwise the behaviour is undefined. Instead, we now clamp the
emitted scope at a minimum of SCOPE_SE, so that the undefined behaviour is
converted into a performance loss instead.
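
The clamping itself amounts to taking the maximum of the requested scope and SCOPE_SE. The sketch below is not the SIMemoryLegalizer implementation; the `Scope` enum and `clampLoadMonitorScope` are hypothetical, and the ordering CU < SE < DEV < SYS is assumed for illustration.

```cpp
// Hypothetical scope encoding, ordered from narrowest to widest.
enum class Scope : unsigned { CU = 0, SE = 1, DEV = 2, SYS = 3 };

// Raise any scope narrower than SE to SE; leave wider scopes untouched.
constexpr Scope clampLoadMonitorScope(Scope Requested) {
  return Requested < Scope::SE ? Scope::SE : Requested;
}
```

A request for the narrowest scope is widened to SE, which costs performance but avoids the undefined behaviour; wider requests pass through unchanged.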

Assisted-By: Claude Opus 4.6
DeltaFile
+37-3llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+25-0llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+9-6llvm/docs/AMDGPUUsage.rst
+71-93 files

LLVM/project 4b23504clang/include/clang/Parse Parser.h, clang/lib/Parse ParseExprCXX.cpp ParseDecl.cpp

[Clang][C2y] Add support for if declaration
DeltaFile
+59-0clang/test/C/C2y/n3267.c
+36-11clang/lib/Parse/ParseExprCXX.cpp
+4-6clang/include/clang/Parse/Parser.h
+3-2clang/lib/Parse/ParseDecl.cpp
+2-2clang/lib/Parse/ParseTentative.cpp
+1-1clang/lib/Parse/ParseStmt.cpp
+105-221 files not shown
+107-227 files

LLVM/project 66f30c1clang/include/clang/Parse Parser.h, clang/lib/Parse ParseExprCXX.cpp ParseDecl.cpp

[Clang][C2y] Add support for if declaration
DeltaFile
+59-0clang/test/C/C2y/n3267.c
+38-12clang/lib/Parse/ParseExprCXX.cpp
+4-6clang/include/clang/Parse/Parser.h
+3-2clang/lib/Parse/ParseDecl.cpp
+2-2clang/lib/Parse/ParseTentative.cpp
+1-1clang/lib/Parse/ParseStmt.cpp
+107-231 files not shown
+109-237 files

LLVM/project ee54401mlir/lib/Conversion/GPUToSPIRV WmmaOpsToSPIRV.cpp, mlir/test/Conversion/GPUToSPIRV wmma-ops-to-spirv-khr-coop-matrix.mlir

[mlir][spirv] Set signed coop matrix operands (#197932)

Populate CooperativeMatrixOperandsKHR on KHR cooperative matrix
multiply-add based on the cooperative matrix element types. Signed
integer A, B, C and result matrices require their corresponding signed
component bits; otherwise SPIR-V treats those integer components as
unsigned.
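
To illustrate the idea (this is not the code in WmmaOpsToSPIRV.cpp), the operand mask can be built from the signedness of each matrix's element type. The function name is hypothetical, and the bit values are the ones recalled from the SPV_KHR_cooperative_matrix extension, so check them against the specification before relying on them.

```cpp
#include <cstdint>

// Cooperative matrix operand bits (assumed values, see the note above).
constexpr std::uint32_t kMatrixASigned = 0x1;
constexpr std::uint32_t kMatrixBSigned = 0x2;
constexpr std::uint32_t kMatrixCSigned = 0x4;
constexpr std::uint32_t kMatrixResultSigned = 0x8;

// Build the operand mask: one bit per matrix whose integer element type
// is signed. A zero mask means every integer component is read as unsigned.
constexpr std::uint32_t signedOperandMask(bool ASigned, bool BSigned,
                                          bool CSigned, bool ResultSigned) {
  return (ASigned ? kMatrixASigned : 0u) | (BSigned ? kMatrixBSigned : 0u) |
         (CSigned ? kMatrixCSigned : 0u) |
         (ResultSigned ? kMatrixResultSigned : 0u);
}
```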

Added lit test

Co-authored-by: Hsiangkai Wang <hsiangkai.wang at arm.com>
DeltaFile
+46-1mlir/lib/Conversion/GPUToSPIRV/WmmaOpsToSPIRV.cpp
+25-1mlir/test/Conversion/GPUToSPIRV/wmma-ops-to-spirv-khr-coop-matrix.mlir
+71-22 files

LLVM/project aca9f83clang/test/CIR/CodeGenBuiltins/RISCV riscv-zbb.c, llvm/unittests/DebugInfo/LogicalView DWARFReaderTest.cpp

Merge branch 'main' into users/kasuga-fj/da-consolidate-acc-gcd
DeltaFile
+57-21clang/test/CIR/CodeGenBuiltins/RISCV/riscv-zbb.c
+75-1llvm/unittests/DebugInfo/LogicalView/DWARFReaderTest.cpp
+58-0mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+51-0mlir/test/Dialect/Linalg/canonicalize.mlir
+47-0mlir/test/Target/LLVMIR/nvvm/sqrt/sqrt.mlir
+38-1mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+326-2328 files not shown
+538-16434 files

LLVM/project aa1081flibc/src/__support/OSUtil/linux syscall.h auxv.h, libc/src/__support/threads/linux thread.cpp

[libc] Introduce a typed syscall wrapper and use it in mmap (#197459)

Linux reserves a range of values (everything above -4096u, aka
MAX_ERRNO) as an error value, so the check can be performed without
knowing the details of the specific syscall. libc functions where these
values would be a valid result (e.g. PTRACE_PEEKDATA) are implemented
differently at the kernel level (e.g. returning the result through a
pointer argument). The only exceptions are a handful of syscalls (getpid,
getuid, ...) which can never fail, and where this could be an actual
user/group ID (particularly on 32-bit systems).

Specifically, for mmap, this lets us remove the is_valid_mmap helper and
SYS_mmap2 ifdefs in various places.

More generally, this can simplify many syscall wrappers as often the
only thing they are doing is converting the return value into an
ErrorOr.
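
A minimal sketch of the generic check, assuming a raw syscall return value held in a `long`: `SyscallResult`, `isErrorValue`, and `decodeSyscallReturn` are hypothetical names, not the wrapper added to `syscall.h`, and the real code returns an `ErrorOr` rather than this struct.

```cpp
// Hypothetical result type: either an errno value or the raw result.
struct SyscallResult {
  bool IsError;
  long Value; // errno if IsError, otherwise the raw return value
};

// Everything above -4096 when viewed as unsigned, i.e. the raw values
// -4095 through -1, is treated as an error.
constexpr bool isErrorValue(long Raw) {
  return static_cast<unsigned long>(Raw) > static_cast<unsigned long>(-4096L);
}

// Convert a raw return value without knowing which syscall produced it.
constexpr SyscallResult decodeSyscallReturn(long Raw) {
  return isErrorValue(Raw) ? SyscallResult{true, -Raw}
                           : SyscallResult{false, Raw};
}
```

Because the check does not depend on the specific syscall, a large unsigned-looking result such as a high mmap address is still classified as success as long as it falls outside the reserved range.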
DeltaFile
+14-23libc/src/__support/threads/linux/thread.cpp
+18-10libc/src/__support/OSUtil/linux/syscall.h
+7-20libc/startup/linux/x86_64/tls.cpp
+7-20libc/startup/linux/riscv/tls.cpp
+7-19libc/startup/linux/aarch64/tls.cpp
+7-11libc/src/__support/OSUtil/linux/auxv.h
+60-1037 files not shown
+76-12313 files

LLVM/project 85db723mlir/include/mlir/Dialect/LLVMIR NVVMOps.td, mlir/lib/Dialect/LLVMIR/IR NVVMDialect.cpp

[MLIR][NVVM] Add sqrt Ops (#197422)

Adds two NVVM dialect ops covering all 14 floating-point `sqrt` forms:

- `nvvm.sqrt` -- IEEE-compliant sqrt with explicit rounding mode
  (`sqrt.<RM>[.ftz].{f32,f64}`), 12 forms.
- `nvvm.sqrt.approx` -- fast approximate sqrt (`sqrt.approx[.ftz].f32`),
  2 forms; uses the `NVVM_F32UnaryApproxOp` base class.

The two ops are split because the rounded forms require an explicit rounding mode and support both f32 and f64, while the approx forms have no rounding mode and are f32-only.
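
The form counts quoted above can be checked with a little arithmetic. The sketch assumes four rounding modes (rn, rz, rm, rp) and that `.ftz` applies only to the f32 variants of the rounded forms; both assumptions are consistent with the stated totals but are not spelled out in the commit message. The function names are hypothetical.

```cpp
// Assumed rounding modes for the rounded forms: rn, rz, rm, rp.
constexpr int kRoundingModes = 4;

// Rounded f32 forms: each rounding mode with and without .ftz.
constexpr int roundedF32Forms() { return kRoundingModes * 2; }

// Rounded f64 forms: one per rounding mode, assumed to have no .ftz variant.
constexpr int roundedF64Forms() { return kRoundingModes; }

// Approximate forms: f32 only, with and without .ftz.
constexpr int approxForms() { return 2; }

// All sqrt forms covered by the two ops.
constexpr int totalForms() {
  return roundedF32Forms() + roundedF64Forms() + approxForms();
}
```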
DeltaFile
+58-0mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+47-0mlir/test/Target/LLVMIR/nvvm/sqrt/sqrt.mlir
+35-0mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+17-0mlir/test/Target/LLVMIR/nvvm/sqrt/sqrt_invalid.mlir
+157-04 files

LLVM/project c9a79d5llvm/lib/MC MCAsmStreamer.cpp MCTargetOptionsCommandFlags.cpp, llvm/test/tools/llvm-mc show-source-loc.s

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
DeltaFile
+51-0llvm/test/tools/llvm-mc/show-source-loc.s
+40-0llvm/lib/MC/MCAsmStreamer.cpp
+7-0llvm/lib/MC/MCTargetOptionsCommandFlags.cpp
+6-0llvm/tools/llvm-mc/llvm-mc.cpp
+3-3llvm/lib/MC/MCTargetOptions.cpp
+5-0llvm/test/tools/llvm-mc/Inputs/show-source-loc.inc
+112-33 files not shown
+116-59 files

LLVM/project 8553a27flang-rt/test/Driver write01.f90

[flang-rt][test] Fix write01.f90 missing LD_LIBRARY_PATH (introduced in #187662)

The test binary was run without setting LD_LIBRARY_PATH, causing
libflang_rt.runtime.so to not be found at runtime. Match the pattern
used by exec.f90 and ctofortran.f90.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
DeltaFile
+2-1flang-rt/test/Driver/write01.f90
+2-11 files