LLVM/project c162731compiler-rt/lib/builtins/arm dcmp.h cmpdf2.S, compiler-rt/lib/builtins/arm/thumb1 dcmp.h gedf2.S

Update for rename of endian.h in a previous patch
DeltaFile
+2-2compiler-rt/lib/builtins/arm/dcmp.h
+2-2compiler-rt/lib/builtins/arm/thumb1/dcmp.h
+1-1compiler-rt/lib/builtins/arm/cmpdf2.S
+1-1compiler-rt/lib/builtins/arm/gedf2.S
+1-1compiler-rt/lib/builtins/arm/thumb1/gedf2.S
+1-1compiler-rt/lib/builtins/arm/thumb1/unorddf2.S
+8-82 files not shown
+10-108 files

LLVM/project 3174273llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,504 files not shown
+2,315,743-579,95619,510 files

LLVM/project 0c539fccompiler-rt/lib/builtins CMakeLists.txt, compiler-rt/lib/builtins/arm divdf3.S muldf3.S

[compiler-rt][ARM] Optimized double-precision FP mul/div (#179923)

Optimized AArch32 implementations of `muldf3` and `divdf3` are provided.
The division function is particularly tricky because its Newton-Raphson
approximation strategy requires a rigorous error bound. In this version
of the commit I've left out the full supporting machinery that validates
the error bound via Gappa and Rocq, but full details are provided via
links to the upstream version of this code in the Arm Optimized Routines
repository, and to a pair of Arm Community blog posts.
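
As a rough illustration of why the error bound matters, the Newton-Raphson reciprocal step can be sketched in C++ as below. This is not the code from `divdf3.S`, which is AArch32 assembly; the function name `newton_reciprocal`, the use of `double`, and the linear initial estimate are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only: refine an estimate x of 1/d with x' = x * (2 - d * x).
// If e = 1 - d * x is the relative error, one step yields e' = e * e.
double newton_reciprocal(double d, int iterations) {
  // The sketch assumes the divisor was already normalised into [0.5, 1).
  assert(d >= 0.5 && d < 1.0);
  // Textbook linear initial estimate; its relative error is at most 1/17.
  double x = 48.0 / 17.0 - (32.0 / 17.0) * d;
  for (int i = 0; i < iterations; ++i)
    x = x * (2.0 - d * x);
  return x;
}
```

In exact arithmetic the error squares on every step, so four steps from an error of at most 1/17 go well below double precision. A real implementation also has to bound the rounding error introduced by each step, which is the part the Gappa and Rocq machinery mentioned above is meant to validate.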
DeltaFile
+862-0compiler-rt/test/builtins/Unit/divdf3new_test.c
+832-0compiler-rt/test/builtins/Unit/muldf3new_test.c
+646-0compiler-rt/lib/builtins/arm/divdf3.S
+404-0compiler-rt/lib/builtins/arm/muldf3.S
+2-0compiler-rt/lib/builtins/CMakeLists.txt
+2,746-05 files

LLVM/project aa20895llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+8-15llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+15-162 files

LLVM/project e1135dcoffload/plugins-nextgen/level_zero/include L0Kernel.h, offload/plugins-nextgen/level_zero/src L0Kernel.cpp

[OFFLOAD][L0] Simplify kernel setGroups logic (#197411)

This code path is not really used with upstream code generation.
DeltaFile
+12-220offload/plugins-nextgen/level_zero/src/L0Kernel.cpp
+0-51offload/plugins-nextgen/level_zero/include/L0Kernel.h
+12-2712 files

LLVM/project 2045ee5.github CODEOWNERS

Add new libc GH team to CODEOWNERS (#197630)

This auto-assigns PR reviewers, per the GitHub documentation.
DeltaFile
+1-0.github/CODEOWNERS
+1-01 files

LLVM/project 6990b14llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+2-5llvm/test/MC/AMDGPU/literals.s
+9-62 files

LLVM/project f098a22clang/docs ClangFormatStyleOptions.rst, clang/include/clang/Format Format.h

[clang-format] Add BreakBeforeReturnType option (#197268)

In certain codebases (e.g. embedded), function declarations can
accumulate a long prefix of specifiers and attributes (`static`,
`inline`, `__attribute__((...))`, project-specific `AttributeMacros`,
etc.) before the return type, which buries the core prototype and pushes
parameters past the column limit.

This patch adds a `BreakBeforeReturnType` style option that places that
prefix on its own line(s):

```cpp
__attribute__((always_inline)) static inline
int do_thing(int a, int b, int c);
```

The recognized prefix tokens are function/storage specifiers (`static`,
`extern`, `inline`, `virtual`, `constexpr`, `consteval`, `friend`,
`export`, `_Noreturn`, `__forceinline`), C++11 attribute groups

    [16 lines not shown]
DeltaFile
+166-0clang/unittests/Format/FormatTest.cpp
+72-0clang/lib/Format/TokenAnnotator.cpp
+32-0clang/docs/ClangFormatStyleOptions.rst
+26-0clang/include/clang/Format/Format.h
+15-0clang/lib/Format/Format.cpp
+12-0clang/unittests/Format/ConfigParseTest.cpp
+323-04 files not shown
+339-110 files

LLVM/project 3e96e2bllvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main in the hope of fixing the unrelated CI failure
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-018,620 files not shown
+2,257,597-555,50718,626 files

LLVM/project dfb23a8compiler-rt/lib/builtins/arm divdf3.S

Correct symbol name in comment
DeltaFile
+2-2compiler-rt/lib/builtins/arm/divdf3.S
+2-21 files

LLVM/project a41d58eclang/test/CodeGen/AArch64 neon-perm.c poly64.c, clang/test/CodeGen/AArch64/fp8-intrinsics acle_neon_fp8_untyped.c

[CIR][AArch64] Lower NEON vtrn1/2 intrinsics (#197112)

### Summary

Part of: https://github.com/llvm/llvm-project/issues/185382

Lower the `vtrn1` and `vtrn2` intrinsics described in
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#transpose-elements

All the intrinsics are handled inline in
llvm-project/build/lib/clang/23/include/arm_neon.h like:

```
#ifdef __LITTLE_ENDIAN__
__ai __attribute__((target("neon"))) int8x8x2_t vtrn_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8x2_t __ret;
  __builtin_neon_vtrn_v(&__ret, __builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 0);
  return __ret;
}
```

    [14 lines not shown]
DeltaFile
+533-0clang/test/CodeGen/AArch64/neon/perm.c
+1-421clang/test/CodeGen/AArch64/neon-perm.c
+0-40clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c
+0-20clang/test/CodeGen/AArch64/poly64.c
+534-4814 files

LLVM/project fc4aad7clang/lib/CodeGen CGDecl.cpp, clang/test/CodeGenCoroutines coro-param-fake-use.cpp

[Clang][Coroutines] Don't emit fake uses for coroutine parameters (#194690)

Fixes issue: https://github.com/llvm/llvm-project/issues/192351

The combination of coroutines with -fextend-variable-liveness has
resulted in use-after-free, caused by the fact that we insert fake uses
of coroutine parameters at the end of the coroutine. While this is fine
for normal functions, in coroutines these variables are stored in the
coroutine frame, which is freed before the end of the function; this
results in us loading from the deleted frame.

This patch fixes this by no longer emitting fake uses for most coroutine
parameters. Since coroutine parameters will be saved back to the frame
when we suspend, and currently may not be optimized out, fake uses are
not needed in this case, and so by not emitting them we avoid dealing
with the complexity of updating fake uses in the CoroSplit pass. The
exception to this is 'this', which is not saved to the frame.

(cherry picked from commit efb01c1bf558eaaf8ec64e1a54110584e827f21b)
DeltaFile
+42-0clang/test/CodeGenCoroutines/coro-param-fake-use.cpp
+7-2clang/lib/CodeGen/CGDecl.cpp
+49-22 files

LLVM/project 7625a2flldb/source/Plugins/LanguageRuntime/ObjC ObjCLanguageRuntime.h

fixup! change small size to 2
DeltaFile
+1-1lldb/source/Plugins/LanguageRuntime/ObjC/ObjCLanguageRuntime.h
+1-11 files

LLVM/project 5fb52fcclang/lib/CodeGen CoverageMappingGen.cpp, clang/test/CoverageMapping system_macro_switch.cpp

[Coverage] Fix assertion failure when a -isystem header invokes a user macro (#195427)

```
  // a.cc
  static void foo(int x) {
    switch (x) {
  #define GENERIC(n) case n:
  #include "types.def"   // -isystem header invokes a user macro
      break;
    }
  }

  // sys/types.def
  #define MID(name) GENERIC(name)
  MID(0)
  MID(1)
  MID(2)
```


    [20 lines not shown]
DeltaFile
+42-0clang/test/CoverageMapping/system_macro_switch.cpp
+16-11clang/lib/CodeGen/CoverageMappingGen.cpp
+58-112 files

LLVM/project 13a6287llvm/unittests/Target/AArch64 InstSizes.cpp

[AArch64][test] Fix use-after-scope in createInstrInfo (#197622)

https://github.com/llvm/llvm-project/pull/183506 revealed a pre-existing
use-after-scope in createInstrInfo (MSan bot:
https://lab.llvm.org/buildbot/#/builders/164/builds/21562 [*]).

This patch fixes the issue by changing the stack-allocated
AArch64Subtarget (which goes out of scope once createInstrInfo()
returns) into heap-allocated, allowing it to be safely stored in the
returned AArch64InstrInfo.
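
The shape of the bug and of the fix can be sketched as follows. The types `Subtarget`, `InstrInfo`, and `Context`, and the function `createInstrInfoSketch`, are hypothetical stand-ins and not the actual code in `InstSizes.cpp`; the sketch only assumes that the returned object keeps a reference to the subtarget.

```cpp
#include <cassert>
#include <memory>

struct Subtarget { int min_inst_size = 4; };  // hypothetical stand-in

struct InstrInfo {                            // hypothetical stand-in
  const Subtarget &st;                        // keeps a reference, not a copy
  explicit InstrInfo(const Subtarget &s) : st(s) {}
  int instSize() const { return st.min_inst_size; }
};

// Buggy shape, shown only as a comment: the local dies when the function
// returns, so the returned InstrInfo holds a dangling reference.
//   std::unique_ptr<InstrInfo> createBroken() {
//     Subtarget st;
//     return std::make_unique<InstrInfo>(st);
//   }

// Fixed shape: the subtarget lives on the heap and is owned alongside the
// object that refers to it. Members are destroyed in reverse order, so `ii`
// goes away before `st`.
struct Context {
  std::unique_ptr<Subtarget> st;
  std::unique_ptr<InstrInfo> ii;
};

Context createInstrInfoSketch() {
  Context c;
  c.st = std::make_unique<Subtarget>();
  c.ii = std::make_unique<InstrInfo>(*c.st);
  return c;  // moving the unique_ptrs does not move the heap objects
}
```

Because the heap object's address is stable across moves of the owning `unique_ptr`, the reference stored inside `InstrInfo` stays valid for as long as the `Context` is alive.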

-----

[*] WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55555666fabd in
llvm::AArch64InstrInfo::getInstSizeInBytes(llvm::MachineInstr const&)
const
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:247:5
...

    [19 lines not shown]
DeltaFile
+27-21llvm/unittests/Target/AArch64/InstSizes.cpp
+27-211 files

LLVM/project a8ff36allvm/test/CodeGen/AMDGPU srem.ll load-global-i8.ll, llvm/test/CodeGen/AMDGPU/GlobalISel sdivrem.ll udivrem.ll

Merge upstream/main into users/mariusz-sikora-at-amd/gfx13/add-more-types-to-permlane
DeltaFile
+6,862-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Nano-sve-instructions.s
+3,436-2,769llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
+4,686-918llvm/test/CodeGen/X86/vector-reduce-ctpop.ll
+2,801-2,109llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
+2,144-2,147llvm/test/CodeGen/AMDGPU/srem.ll
+1,647-1,991llvm/test/CodeGen/AMDGPU/load-global-i8.ll
+21,576-9,9342,028 files not shown
+107,220-47,0012,034 files

LLVM/project 7ae25fbllvm/test/CodeGen/AArch64 neon-dotreduce.ll fsh.ll

[AArch64] Keep MMO when converting gather lane to LDRSui. (#197522)

We were losing the MMO when converting the load. Make sure we copy it
over, which apparently alters codegen more than I expected and helps
keep postinc generation after #196305.
DeltaFile
+183-183llvm/test/CodeGen/AArch64/neon-dotreduce.ll
+70-70llvm/test/CodeGen/AArch64/fsh.ll
+58-58llvm/test/CodeGen/AArch64/complex-deinterleaving-uniform-cases.ll
+32-32llvm/test/CodeGen/AArch64/fp-maximumnum-minimumnum.ll
+26-26llvm/test/CodeGen/AArch64/nontemporal-store.ll
+17-8llvm/test/CodeGen/AArch64/concat-vector.ll
+386-3772 files not shown
+401-3908 files

LLVM/project e5b06d8llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp

Use StandardB
DeltaFile
+4-7llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+4-71 files

LLVM/project 3272c56llvm/test/CodeGen/AMDGPU amdgpu-codegenprepare-idiv.ll udiv.ll

[AMDGPU] Remove RCP_IFLAG combine (#197426)

The combine was added in D48569 8 years ago with the aim of preserving
flags, but the current LangRef says the status flags are not observable
in the default FP environment.

The main motivation for this change is to enable scalar float reciprocal
generation v_s_rcp_f32 on newer hardware. There is no v_s_rcp_iflag_f32,
so the combine effectively blocks the selection.
See: pseudo-scalar-transcendental.ll.
DeltaFile
+160-160llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
+52-52llvm/test/CodeGen/AMDGPU/udiv.ll
+40-46llvm/test/CodeGen/AMDGPU/udivrem24.ll
+77-8llvm/test/CodeGen/AMDGPU/rcp_iflag.ll
+33-33llvm/test/CodeGen/AMDGPU/sdiv.ll
+32-32llvm/test/CodeGen/AMDGPU/permute_i8.ll
+394-33119 files not shown
+577-53825 files

LLVM/project 1d93fc4libc CMakeLists.txt, libc/config/linux/aarch64 entrypoints.txt

[libc] Add LLVM_LIBC_ENABLE_EXPERIMENTAL_ENTRYPOINTS CMake flag (#197537)

Adds a new CMake option, OFF by default, to gate entrypoints with
known-incomplete implementations. This lets developers build and test
partially-implemented functions without exposing them to production
users.

The motivating case is `sysconf`, which only handles three of the
required `_SC_*` constants (`_SC_PAGESIZE`, `_SC_NPROCESSORS_CONF`,
`_SC_NPROCESSORS_ONLN`) and returns `EINVAL` for everything else.
Functions like this are useful to have in a build for testing progress,
but shouldn't be part of a default full build until the implementation
is complete.
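
To make "known-incomplete" concrete, here is a minimal sketch of a function with the behaviour described above. The name `partial_sysconf` is hypothetical and the body delegates to the host's `sysconf`; it is not the LLVM libc implementation, and it assumes a POSIX host that defines the two `_SC_NPROCESSORS_*` constants.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical stand-in for a partially implemented entrypoint.
long partial_sysconf(int name) {
  switch (name) {
  case _SC_PAGESIZE:
  case _SC_NPROCESSORS_CONF:
  case _SC_NPROCESSORS_ONLN:
    return ::sysconf(name);  // the three handled cases
  default:
    errno = EINVAL;          // everything else is reported as invalid
    return -1;
  }
}
```

A caller that asks for any other constant, such as `_SC_ARG_MAX`, gets `-1` with `errno` set to `EINVAL`, which is why such an entrypoint is better kept out of a default full build.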

Changes:
- `libc/CMakeLists.txt`: adds
`option(LLVM_LIBC_ENABLE_EXPERIMENTAL_ENTRYPOINTS ... OFF)`
- `libc/cmake/modules/LLVMLibCCompileOptionRules.cmake`: propagates
`-DLIBC_EXPERIMENTAL_ENTRYPOINTS` when ON

    [6 lines not shown]
DeltaFile
+6-1libc/config/linux/aarch64/entrypoints.txt
+6-1libc/config/linux/riscv/entrypoints.txt
+6-1libc/config/linux/x86_64/entrypoints.txt
+2-0libc/CMakeLists.txt
+20-34 files

LLVM/project 29206d7clang/include/clang/Sema Sema.h, clang/lib/Parse ParseOpenMP.cpp

[OpenMP] Fix launch_bounds for OpenMP ompx_attribute (#195665)

This commit fixes the handling of `launch_bounds` within OpenMP's
`ompx_attribute`. The third attribute value, the maximum blocks, was not
parsed correctly.
DeltaFile
+16-9clang/lib/Sema/SemaDeclAttr.cpp
+10-4clang/test/OpenMP/thread_limit_gpu.c
+5-3clang/include/clang/Sema/Sema.h
+3-2clang/lib/Parse/ParseOpenMP.cpp
+34-184 files

LLVM/project 35f5d7ellvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp

[AArch64][GlobalISel] Fast-path common G_CONSTANT/G_BRCOND/G_FRAME_INDEX regbank mappings (#197383)

Returning the default register-bank mapping directly for these opcodes
is a -0.17% compile-time improvement on aarch64-O0-g.

https://llvm-compile-time-tracker.com/compare.php?from=b4aa4d4dcb6f1c8a00d1d1e53d2b353c97ec98b7&to=0779891fc6bf6a01e4f14d3f359e212c6ec52c0d&stat=instructions%3Au

Assisted-by: codex
DeltaFile
+20-0llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+20-01 files

LLVM/project 131d66cbolt/lib/Core DIEBuilder.cpp, bolt/test/X86 dwarf5-locexpr-regval-type.s dwarf5-form-ref-udata.s

[BOLT][DWARF] Support DW_FORM_ref_udata and DW_OP_regval_type (#197565)

Add support for DWARF opcodes seen in GCC-generated binaries:

- DW_FORM_ref_udata: ULEB128-encoded CU-relative DIE reference.

- DW_OP_regval_type (0xa5): DWARF5 expression opcode with operands
(SizeLEB, BaseTypeRef). The BaseTypeRef was not being updated when DIEs
were relocated because cloneExpression only handled (Size1, BaseTypeRef)
patterns. Generalized the first-operand copying to use raw bytes from
the data stream instead of assuming a single byte.
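
Both items depend on ULEB128, whose encoded length varies with the value. The decoder below is a minimal sketch of that encoding, not BOLT's code; the `Uleb` struct and the `decodeULEB128` signature are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Uleb {
  uint64_t value;  // decoded value
  size_t size;     // number of bytes the encoding occupied
};

// Decode one ULEB128 value starting at `offset`. Each byte contributes its
// low 7 bits, least significant group first; a set high bit means "more".
Uleb decodeULEB128(const std::vector<uint8_t> &bytes, size_t offset) {
  uint64_t value = 0;
  unsigned shift = 0;
  size_t i = offset;
  while (true) {
    assert(i < bytes.size() && "truncated ULEB128");
    assert(shift < 64 && "ULEB128 too large for 64 bits");
    uint8_t b = bytes[i++];
    value |= static_cast<uint64_t>(b & 0x7f) << shift;
    if ((b & 0x80) == 0)
      break;
    shift += 7;
  }
  return {value, i - offset};
}
```

The returned `size` is the point of the sketch: an operand such as the first one of `DW_OP_regval_type` can span more than one byte, so code that copies it has to use the decoded length instead of assuming a single byte.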

Fixes #188250

Assisted-by: Claude Opus 4.6/4.7
DeltaFile
+83-0bolt/test/X86/dwarf5-locexpr-regval-type.s
+70-0bolt/test/X86/dwarf5-form-ref-udata.s
+8-5bolt/lib/Core/DIEBuilder.cpp
+161-53 files

LLVM/project ac8361dllvm/lib/Target/X86 X86InstrCompiler.td, llvm/test/CodeGen/X86 atomic-load-store.ll

[X86] Remove extra MOV after widening atomic store

This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
DeltaFile
+47-64llvm/test/CodeGen/X86/atomic-load-store.ll
+99-0llvm/lib/Target/X86/X86InstrCompiler.td
+146-642 files

LLVM/project 6cdd328llvm/lib/AsmParser LLParser.cpp, llvm/test/Assembler thinlto-vtable-skip.ll thinlto-bad-summary1.ll

Handle typeidCompatibleVTable in skipModuleSummaryEntry (#196849)

This method needs to match the set of cases handled in parseSummaryEntry.
DeltaFile
+11-0llvm/test/Assembler/thinlto-vtable-skip.ll
+6-5llvm/lib/AsmParser/LLParser.cpp
+1-1llvm/test/Assembler/thinlto-bad-summary1.ll
+18-63 files

LLVM/project 12e06d7mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

Remove unrelated empty line
DeltaFile
+0-1mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+0-11 files

LLVM/project 83ae5ccflang/include/flang/Semantics semantics.h, flang/lib/Semantics resolve-directives.cpp rewrite-parse-tree.cpp

[flang][openacc] allow duplicate data sharing clauses (#197019)

This PR allows duplicate OpenACC `private` and `firstprivate` clauses
while maintaining the restriction on `reduction` clauses.
DeltaFile
+122-0flang/test/Semantics/OpenACC/acc-dataclause-dedup.f90
+63-0flang/test/Lower/OpenACC/acc-dedup-private.f90
+27-16flang/lib/Semantics/resolve-directives.cpp
+28-0flang/test/Parser/acc-dedup-unparse.f90
+11-0flang/include/flang/Semantics/semantics.h
+10-0flang/lib/Semantics/rewrite-parse-tree.cpp
+261-161 files not shown
+262-177 files

LLVM/project 4f60fb9flang/docs Directives.md, flang/lib/Semantics expression.cpp

[flang][cuda] Honor !dir$ ignore_tkr(m) under -gpu=mem:{unified,managed} (#197518)

A device-typed dummy with `!dir$ ignore_tkr(m)` is meant to be an
overload discriminator (only selected for actuals with an explicit
`device/managed/unified` attribute). Skip the host->device relaxation in
AreCompatibleCUDADataAttrs when `IgnoreTKR::Managed` is set so
unattributed host actuals no longer bind to such a dummy.

Also document the §3.2.3 matching distance table next to
GetMatchingDistance and add LIT tests for the full Table 2 grid
and the ignore_tkr(m) carve-out.
DeltaFile
+90-0flang/test/Semantics/cuf-matching-distance.cuf
+56-0flang/test/Semantics/cuf-ignore-tkr-m-generic.cuf
+36-0flang/docs/Directives.md
+32-0flang/test/Semantics/cuf-ignore-tkr-m-error.cuf
+23-2flang/lib/Semantics/expression.cpp
+13-5flang/lib/Support/Fortran.cpp
+250-76 files

LLVM/project e2b5048llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Validate forced lit() immediate (#196623)

Currently a forced literal takes the inline-constant validation path if
the value fits, even though it is forced to literal encoding.
DeltaFile
+7-8llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+14-92 files

LLVM/project 1ee6e9cllvm/lib/ProfileData InstrProf.cpp, llvm/lib/Transforms/Instrumentation PGOMemOPSizeOpt.cpp

fix

Created using spr 1.3.7
DeltaFile
+0-58llvm/test/Transforms/PGOProfile/consecutive-zeros.ll
+33-16llvm/lib/ProfileData/InstrProf.cpp
+0-47llvm/test/Transforms/PGOProfile/Inputs/consecutive-zeros.proftext
+4-38llvm/test/Transforms/JumpTableToSwitch/profile-no-guid-metadata.ll
+0-7llvm/lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp
+37-1665 files