LLVM/project aa1f119clang/test lit.cfg.py

[clang][lit] Add option to skip clang-repl checks (#199255)

Whenever lit or llvm-lit is invoked to run clang tests, clang-repl is
run at least once to check for host jit capabilities, and possibly
several more times to probe related capabilities. This adds a noticeable
delay before testing starts, especially for debug builds.

This change adds a lit parameter (clang_skip_clang_repl_checks) and an
environment variable check (CLANG_LIT_SKIP_CLANG_REPL_CHECKS) to allow
the clang-repl probes to be skipped. When this option is used, any tests
that rely on jit execution will be reported as unsupported.
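
A minimal sketch of how such a dual check might look, assuming any non-empty value enables the option. The helper name `should_skip_clang_repl_checks` is hypothetical, and this is not the code added to lit.cfg.py:

```python
import os


def should_skip_clang_repl_checks(lit_params, environ=os.environ):
    """Return True if the lit parameter or the environment variable is set.

    Assumption: any non-empty value counts as "enabled"; the actual patch
    may interpret values differently.
    """
    if lit_params.get("clang_skip_clang_repl_checks"):
        return True
    return bool(environ.get("CLANG_LIT_SKIP_CLANG_REPL_CHECKS"))
```

In a real lit configuration the first argument would presumably be the parameter dictionary that lit fills in from its `--param NAME=VALUE` command-line option.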

This option is intended only to allow quicker targeted testing during
development. It should not be used for comprehensive verification before
submitting a patch.

On my local test system, executing `ninja check-clang-cir-codegen` with
a previously completed debug build took 18 seconds to run 354 tests with
this option and 53 seconds without it. This is the sort of use case I am
targeting -- lit test runs where the clang-repl overhead constitutes
a significant portion of the total time to execute the tests.
DeltaFile
+10-2clang/test/lit.cfg.py
+10-21 files

LLVM/project 8a55acallvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes SeedCollection.cpp, llvm/test/Transforms/SandboxVectorizer seed_collection_loads.ll

[SandboxVec][SeedCollection] Iterate over all seeds (#195964)

Even though load seeds can already be collected by the seed collector,
the seed collection pass was not iterating over them. This patch fixes
that: the pass now iterates over both store and load seeds.
DeltaFile
+44-42llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/SeedCollection.cpp
+15-0llvm/test/Transforms/SandboxVectorizer/seed_collection_loads.ll
+59-422 files

LLVM/project 3d9d776utils/bazel/llvm-project-overlay/libc BUILD.bazel, utils/bazel/llvm-project-overlay/llvm BUILD.bazel

[bzl] Reduce the `deps` size of libc's shared_math_header library. (#200006)

There were ~500 of them, which can cause build analysis/metric issues.
Glob the private headers in use, retaining only the support libraries
that have source code.
Make it a cc_library instead of a libc_header_library. Rename it
"apfloat_shared_math_headers" to clarify its limited use case.
DeltaFile
+57-495utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1-1utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+58-4962 files

LLVM/project 0e84bfaclang/lib/CIR/CodeGen CIRGenStmt.cpp, clang/lib/CIR/Dialect/Transforms FlattenCFG.cpp

[CIR] Handle the 'before case' block of a switch statement. (#199752)

Before this patch, we would fail with a verification error any time
there was a block with entry/exit (in this case, one with successors
thanks to a label). This patch adds special handling for that first
block.

However, this patch deliberately does NOT trim such blocks. Unless there
is a label inside of the block, there isn't any way to get there, and it
is dead code. I've opted NOT to do that optimization, as I suspect the
block might be valuable to future passes, or something we may wish to
warn about in some sort of CFG analysis.

Additionally, there are some minor changes to FlattenCFG: first, to make
sure we skip the switch ONLY if it is truly empty, and second, to make
sure we transform any 'break' in the pre-case region.
DeltaFile
+364-0clang/test/CIR/CodeGen/switch-pre-case-stmts.cpp
+29-4clang/lib/CIR/CodeGen/CIRGenStmt.cpp
+17-14clang/lib/CIR/Dialect/Transforms/FlattenCFG.cpp
+410-183 files

LLVM/project 2b3bc03llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt gfx11_dasm_vop3.txt

[AMDGPU] Use shorter form for i16 operands (#198005)

For 16-bit operands, an inline constant is zero-extended,
which in particular allows FP constants to be used. These
have 16 zero bits in the high half and the FP16
value in the low 16 bits.

The patch changes the semantics of an FP literal argument
used in an i16 context in the asm parser to fp16.
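
As a worked illustration of the bit layout described above (not the asm parser's code), the following sketch encodes a few values as IEEE 754 binary16 and zero-extends them to 32 bits. The helper names are hypothetical:

```python
import struct


def fp16_bits(value):
    """Return the IEEE 754 binary16 encoding of `value` as an integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def zext_to_32(bits16):
    """Zero-extend a 16-bit pattern: the high 16 bits stay zero."""
    return bits16 & 0xFFFF


for v in (1.0, 0.5, -2.0):
    print(f"{v}: 0x{zext_to_32(fp16_bits(v)):08X}")
# prints:
# 1.0: 0x00003C00
# 0.5: 0x00003800
# -2.0: 0x0000C000
```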
DeltaFile
+228-228llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3.txt
+200-200llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3.txt
+194-194llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+144-144llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+128-128llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3cx.txt
+1,094-1,09479 files not shown
+3,554-3,75385 files

LLVM/project 6c9225fclang/include/clang/Basic FileManager.h, clang/lib/Basic FileManager.cpp

Reapply "[clang] Use FileError in FileManager::getFileRef, getDirectoryRef" (#199759)

Most callers are unchanged, since they either ignore the specific error
or have their own formatting of the error that includes both the path
and the errorToErrorCode-unwrapped value. However, for clients that just
forward the error it's helpful to ensure we do not lose track of the
filename that the error is associated with, so use FileError.

To reduce the overhead of using FileError, keep it only in the public
getFileRef and getDirectoryRef themselves, and use ErrorOr in the
implementation. For callers of getOptional* this should avoid
allocations for errors entirely, and for callers of getFileRef and
getDirectoryRef it makes the error allocation inlinable, which may (in
theory, not tested) let us optimize it away if the Error is immediately
unwrapped back to an error code, for example.

Incidentally clean up some callers to use getOptionalFileRef where they
throw away the error immediately.
DeltaFile
+69-0clang/unittests/Basic/FileManagerTest.cpp
+22-26clang/lib/Basic/FileManager.cpp
+34-5clang/include/clang/Basic/FileManager.h
+9-13clang/lib/Lex/PPDirectives.cpp
+2-6clang/lib/Lex/HeaderSearch.cpp
+1-2clang/lib/Lex/ModuleMap.cpp
+137-526 files

LLVM/project 34d5523clang/test/Driver msvc-link.c

[Driver][MSVC] Loosen regex for binary name in test (#200015)

`lld-link` will often be installed as `lld-link.exe` on Windows. We need
to make sure this test passes either way.
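
The exact pattern used in the test is not shown here. As a rough sketch of the idea, an optional `.exe` suffix can be expressed like this (the `is_lld_link` helper is hypothetical):

```python
import re

# Accept the linker name with or without the Windows ".exe" suffix.
LINKER_NAME = re.compile(r"lld-link(\.exe)?")


def is_lld_link(path):
    """Return True if the final path component names lld-link."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return LINKER_NAME.fullmatch(name) is not None
```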
DeltaFile
+1-1clang/test/Driver/msvc-link.c
+1-11 files

LLVM/project 8f03570llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] Add regression test for wave.reduce constant folding. (#198673)

Ensure wave.reduce.xor, wave.reduce.add, and wave.reduce.sub with
constant operands are not folded to the input value, since their results
depend on the number of active lanes at runtime.
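
A toy model of why folding to the input value would be wrong, assuming every active lane contributes the same constant and ignoring bit-width wrapping. The `wave_reduce` helper is hypothetical and covers only xor and add:

```python
from functools import reduce
import operator


def wave_reduce(op, value, active_lanes):
    """Reduce a uniform `value` across `active_lanes` lanes (at least 1)."""
    return reduce(op, [value] * active_lanes)


# XOR of a constant cancels out for an even number of active lanes.
print(wave_reduce(operator.xor, 5, 1))  # 5
print(wave_reduce(operator.xor, 5, 2))  # 0
# ADD of a constant scales with the number of active lanes.
print(wave_reduce(operator.add, 5, 4))  # 20
```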

Assisted-by: Cursor (Claude)
DeltaFile
+171-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+169-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+340-02 files

LLVM/project c8efcf5flang/include/flang/Support Fortran.h, flang/lib/Evaluate formatting.cpp

[flang][PPC] Improve vector type names in expression diagnostics (NFC) (#199383)

Continuation of #197821.
The change only affects vector types; all other types preserve their
existing formatting behavior.

Co-authored-by: virsworld <virpatel at mac.home>
DeltaFile
+6-40flang/lib/Semantics/type.cpp
+36-0flang/lib/Support/Fortran.cpp
+25-0flang/lib/Evaluate/formatting.cpp
+15-0flang/test/Semantics/PowerPC/ppc-vector-diagnostics.f90
+4-0flang/include/flang/Support/Fortran.h
+3-1flang/lib/Semantics/expression.cpp
+89-411 files not shown
+93-417 files

LLVM/project f828c0eclang/lib/CIR/CodeGen CIRGenFunction.cpp

[CIR] Report NYI for STDC FENV_ACCESS under -fclangir

Functions that enable IEEE floating-point intrinsics via `#pragma STDC
FENV_ACCESS ON` or `__attribute__((strict_fp))` set `UsesFPIntrin()` /
`StrictFPAttr` on the `FunctionDecl`. CIR doesn't support the
FP-constrained dialect ops needed to lower such functions correctly, so
`generateCode` now reports `errorNYI` and returns early when either flag
is set, preventing silent miscompilation.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
DeltaFile
+4-0clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+4-01 files

LLVM/project a3d9354llvm/include/llvm/CodeGen MachineMemOperand.h, llvm/test/CodeGen/X86 branchfolding-atomic-mmo.ll

[CodeGen] Compare MMO atomic ordering and syncscope. (#199892)

MachineMemOperand::operator== compared the address, flags, AA metadata,
range, alignment, and address space, but not atomic success ordering,
failure ordering, or syncscope. Users such as
MachineInstr::cloneMergedMemRefs could therefore treat atomic and
non-atomic MMOs, or atomics with different syncscopes, as identical.
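
The effect of leaving fields out of an equality check can be modelled in a few lines. This is only an illustration: the `MemOperand` class and its field names are hypothetical stand-ins, not the MachineMemOperand API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOperand:
    """Hypothetical stand-in for a memory operand."""
    address: str
    flags: int
    success_ordering: str = "not_atomic"
    failure_ordering: str = "not_atomic"
    sync_scope: str = "system"


def equal_ignoring_atomics(a, b):
    """Model of the old comparison: atomic properties are not consulted."""
    return (a.address, a.flags) == (b.address, b.flags)


plain = MemOperand("%p", 0)
atomic = MemOperand("%p", 0, success_ordering="seq_cst")
```

Here `equal_ignoring_atomics(plain, atomic)` is true, while the dataclass-generated `==`, which compares every field, tells the two apart.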

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
DeltaFile
+42-0llvm/test/CodeGen/X86/branchfolding-atomic-mmo.ll
+4-1llvm/include/llvm/CodeGen/MachineMemOperand.h
+46-12 files

LLVM/project 0d8ba69utils/bazel MODULE.bazel.lock .bazelrc

[bazel] Add config for hermetic clang toolchain (#192528)

This config uses the https://github.com/hermeticbuild/hermetic-llvm
toolchain to avoid any dependency on the host compiler. This makes it
trivial to test with remote execution and also supports cross
compilation.
DeltaFile
+33-3utils/bazel/MODULE.bazel.lock
+22-0utils/bazel/.bazelrc
+2-0utils/bazel/MODULE.bazel
+57-33 files

LLVM/project b48d66allvm/include/llvm/CodeGen MachineFunction.h, llvm/lib/CodeGen MachineFunction.cpp

[AMDGPU][MC] Replace shifted registers in CFI instructions

Change-Id: I0d99e9fe43ec3b6fecac20531119956dca2e4e5c
DeltaFile
+67-67llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir
+33-0llvm/lib/MC/MCDwarf.cpp
+15-15llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll
+10-0llvm/lib/CodeGen/MachineFunction.cpp
+4-4llvm/test/CodeGen/AMDGPU/debug-frame.ll
+4-0llvm/include/llvm/CodeGen/MachineFunction.h
+133-865 files not shown
+143-9011 files

LLVM/project 4e99a0fllvm/lib/Target/AMDGPU SIFrameLowering.cpp SIMachineFunctionInfo.h, llvm/test/CodeGen/AMDGPU amdgpu-spill-cfi-saved-regs.ll

[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder scott.linder at amd.com
Co-authored-by: Venkata Ramanaiah Nalamothu VenkataRamanaiah.Nalamothu at amd.com
DeltaFile
+2,926-0llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
+12-0llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+10-0llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+9-0llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2-0llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+2,959-05 files

LLVM/project 74b5fc0llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll gfx-callable-argument-types.ll

[AMDGPU] Implement CFI for CSR spills

Introduce new SPILL pseudos to allow CFI to be generated for only CSR
spills, and to make the information accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.

Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+3,568-2,598llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,912-1,913llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+2,700-12llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+631-631llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+505-510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+394-399llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+9,710-6,063108 files not shown
+14,819-9,521114 files

LLVM/project 9df27a0llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

[AMDGPU] Use register pair for PC spill

Change-Id: Ibedeef926f7ff235a06de65a83087c151f66a416
DeltaFile
+4,331-4,331llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,742-1,740llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+1,562-1,560llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+1,462-1,460llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+1,238-1,236llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+1,030-1,028llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+11,365-11,35589 files not shown
+18,153-18,04495 files

LLVM/project b29c1f7

[Clang] Default to async unwind tables for amdgcn

To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.

There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.

Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
DeltaFile
+0-00 files

LLVM/project a8f3ad8

[AMDGPU] Implement CFI for non-kernel functions

This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.

Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+0-00 files

LLVM/project d85a6f0

[AMDGPU] Emit entry function Dwarf CFI

Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.

Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+0-00 files

LLVM/project e384b89

[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU

While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).

Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
DeltaFile
+0-00 files

LLVM/project ec9a1c5

[MIR] Error on signed integer in getUnsigned

Previously we effectively took the absolute value of the APSInt; instead,
diagnose the unexpected negative value.
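
The behavioural change can be sketched as follows. The function name and error text are hypothetical, and the MIR parser itself is C++:

```python
def get_unsigned(token):
    """Parse an integer token, rejecting negative values instead of
    silently dropping the sign."""
    value = int(token, 0)
    if value < 0:
        raise ValueError(f"expected unsigned integer, got {token}")
    return value
```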

Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
DeltaFile
+0-00 files

LLVM/project bf3622ellvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch musttail-tailcc.ll

[LoongArch] Support `tail` calling convention
DeltaFile
+163-0llvm/test/CodeGen/LoongArch/musttail-tailcc.ll
+1-0llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+164-02 files

LLVM/project 3a8b5e3lldb/tools/lldb-dap OutputRedirector.cpp OutputRedirector.h

[lldb-dap] Use MainLoop instead of a background thread in OutputRedirector. (#199970)

Replace the background thread in OutputRedirector with LLDB's MainLoop
event loop. This reduces the number of threads created and ensures file
descriptors are properly closed when no longer needed.

Since the debugger's output is not I/O intensive, there is no risk of
hitting the pipe buffer limit with this approach.
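
The general pattern of reading a pipe from an event loop rather than a dedicated thread can be sketched with Python's `selectors` module. This is an analogy only: it assumes a POSIX system and does not use LLDB's MainLoop, and `drain_ready` and `on_readable` are hypothetical names:

```python
import os
import selectors


def drain_ready(sel, timeout=0.1):
    """Run one event-loop iteration, calling the callback registered for
    each descriptor that is ready to read."""
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)


read_fd, write_fd = os.pipe()
captured = []


def on_readable(fd):
    # Called from the loop itself, so no background thread is needed.
    captured.append(os.read(fd, 4096))


sel = selectors.DefaultSelector()
sel.register(read_fd, selectors.EVENT_READ, on_readable)

os.write(write_fd, b"hello from the debuggee\n")
drain_ready(sel)

# Unregister and close explicitly so no descriptor outlives its use.
sel.unregister(read_fd)
os.close(read_fd)
os.close(write_fd)
sel.close()
```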
DeltaFile
+39-34lldb/tools/lldb-dap/OutputRedirector.cpp
+23-12lldb/tools/lldb-dap/OutputRedirector.h
+14-11lldb/tools/lldb-dap/DAP.cpp
+76-573 files

LLVM/project d627924mlir/lib/Analysis SliceAnalysis.cpp, mlir/test/Dialect/Affine slicing-utils.mlir

[mlir][SliceAnalysis] Fix visited set to avoid infinite recursion (#200008)

Fixes #139694, which introduced use-def cycle detection during slice
analysis, but some cycles were still not detected, potentially leading
to infinite recursion.

This PR fixes the handling of the visited set, which tracks the current
DFS path during recursion. Previously, the set could fail to detect
double cycles because entries were erased even when no recursive call
was made. The insert/erase operations are now only performed when
recursion actually occurs, ensuring that cycle detection correctly
reflects the active DFS path.
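
The invariant described above can be sketched on a plain operand graph. This is not the MLIR implementation: `backward_slice` here is a hypothetical helper over a dictionary that maps each node to its operands:

```python
def backward_slice(node, operands, on_path=None, result=None):
    """Collect every transitive operand of `node`, tolerating use-def cycles.

    `on_path` holds exactly the nodes on the current DFS path.
    """
    if on_path is None:
        on_path, result = {node}, []
    for operand in operands.get(node, ()):
        if operand in on_path:
            # Edge back to an ancestor: skip it and leave the path set
            # untouched, since no recursive call is made.
            continue
        on_path.add(operand)  # insert only when we actually recurse
        backward_slice(operand, operands, on_path, result)
        on_path.remove(operand)  # erase only what this frame inserted
        if operand not in result:
            result.append(operand)
    return result


# Three nodes that all use each other: several overlapping cycles.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(sorted(backward_slice("a", graph)))  # ['b', 'c']
```

Because entries are removed only by the frame that added them, a node already on the path stays there until its own recursion unwinds, so every back edge is recognized.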
DeltaFile
+23-0mlir/test/Dialect/Affine/slicing-utils.mlir
+12-8mlir/lib/Analysis/SliceAnalysis.cpp
+35-82 files

LLVM/project f8bf8afclang/lib/Headers wasm_simd128.h, cross-project-tests/intrinsic-header-tests wasm_simd128.c

[WebAssembly] Add f16x8.demote_f32x4_zero to wasm_simd128.h. (#199795)

Missing header intrinsic.
DeltaFile
+8-0clang/lib/Headers/wasm_simd128.h
+6-0cross-project-tests/intrinsic-header-tests/wasm_simd128.c
+14-02 files

LLVM/project 5dc633bllvm/lib/Target/AArch64/GISel AArch64LegalizerInfo.cpp, llvm/test/CodeGen/AArch64 fabs.ll bf16-instructions.ll

[AArch64][GlobalISel] Add BF16 fabs and fneg (#198655)

These should be very simple as they are just legal or expanded based on
whether fullfp16 is available, as the FP16 FNEG and FABS instructions can
be used equally for BF16.
DeltaFile
+42-17llvm/test/CodeGen/AArch64/fabs.ll
+35-12llvm/test/CodeGen/AArch64/bf16-instructions.ll
+21-8llvm/test/CodeGen/AArch64/bf16-v4-instructions.ll
+17-8llvm/test/CodeGen/AArch64/fneg.ll
+12-9llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+3-2llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+130-566 files

LLVM/project f16c0ec.github/workflows issue-release-workflow.yml

workflows/issue-release-workflow: Remove template expansion of login names (#199772)

https://github.com/llvm/llvm-project/security/code-scanning/1609
https://github.com/llvm/llvm-project/security/code-scanning/1610
DeltaFile
+2-1.github/workflows/issue-release-workflow.yml
+2-11 files

LLVM/project a8e1f5cflang-rt/cmake/modules AddFlangRTOffload.cmake, flang-rt/include/flang-rt/runtime io-stmt.h

[flang-rt][cuda] Use a thinner I/O in CUDA build (#199769)

Reduce the footprint of I/O in the CUDA build. This helps with including
I/O when using non-relocatable device code mode.
DeltaFile
+194-0flang-rt/lib/runtime/io-stmt-minimal.cpp
+36-0flang-rt/lib/runtime/io-api-common.h
+9-0flang-rt/include/flang-rt/runtime/io-stmt.h
+4-1flang-rt/lib/runtime/CMakeLists.txt
+3-0flang-rt/cmake/modules/AddFlangRTOffload.cmake
+246-15 files

LLVM/project 5b38edd.github/workflows pr-code-lint.yml

workflows/pr-code-lint: Pin container image (#199767)

https://github.com/llvm/llvm-project/security/code-scanning/1678
DeltaFile
+1-1.github/workflows/pr-code-lint.yml
+1-11 files

LLVM/project ed918c1llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/Transforms/AtomicExpand/RISCV atomicrmw-widen-volatile.ll

[AtomicExpand] Preserve volatile in widenPartwordAtomicRMW. (#199722)

widenPartwordAtomicRMW widens a sub-word atomicrmw to the target's
minimum cmpxchg size by calling CreateAtomicRMW, which has no
IsVolatile parameter, and it didn't copy isVolatile() from the original.
Every other expansion path in this file already does. This affects targets
whose MinCmpXchgSizeInBits exceeds the value width (RISC-V without
Zabha, LoongArch base, SPARC, AMDGPU, etc.).

This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
DeltaFile
+41-0llvm/test/Transforms/AtomicExpand/RISCV/atomicrmw-widen-volatile.ll
+1-0llvm/lib/CodeGen/AtomicExpandPass.cpp
+42-02 files