LLVM/project 652a978clang/test/AST ast-dump-templates.cpp, llvm/test/CodeGen/AArch64 bf16-v8-instructions.ll

Merge remote-tracking branch 'origin/main' into xteam-red-runtime
DeltaFile
+652-9,343clang/test/AST/ast-dump-templates.cpp
+7,584-740llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+8,195-0llvm/test/MC/AMDGPU/gfx13_asm_vop3.s
+8,182-0llvm/test/MC/AMDGPU/gfx13_asm_vop3-fake16.s
+6,873-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Premium-sve-instructions.s
+6,862-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Nano-sve-instructions.s
+38,348-10,0838,257 files not shown
+437,282-165,1218,263 files

LLVM/project 0b53f85clang/lib/Driver/ToolChains/Arch AArch64.cpp, clang/test/Driver aarch64-march.c

[AArch64][clang] Improve -mcpu= and -mtune= error messages too

Similar to my previous change improving the error message for
`-march=` in #197441, this changes `-mcpu=` and `-mtune=` arguments
to only report the invalid feature flag, rather than the entire
string.

This is a much clearer error message for the user.
DeltaFile
+34-26clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+13-0clang/test/Driver/aarch64-march.c
+47-262 files
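
A minimal sketch of the idea behind this change, not the driver's actual code: given an argument such as `cortex-a53+crc+bogus`, report only the offending modifier instead of the whole string. The helper name `findInvalidFeature` and the `known` set are hypothetical, introduced purely for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not clang's code): scans the "+feature" modifiers that
// follow the CPU name and stores the first unknown one in `invalid`.
// Returns true if an invalid modifier was found.
bool findInvalidFeature(const std::string &arg,
                        const std::set<std::string> &known,
                        std::string &invalid) {
  std::size_t pos = arg.find('+'); // text before the first '+' is the CPU name
  while (pos != std::string::npos) {
    std::size_t next = arg.find('+', pos + 1);
    std::size_t len =
        next == std::string::npos ? std::string::npos : next - pos - 1;
    std::string feature = arg.substr(pos + 1, len);
    if (known.count(feature) == 0) {
      invalid = feature; // report only this token, not all of `arg`
      return true;
    }
    pos = next;
  }
  return false;
}
```

A caller would put `invalid` into the diagnostic text, so the user sees the single bad flag rather than the full `-mcpu=` value.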

LLVM/project 5a0e401clang/test/Driver aarch64-march.c

fixup! Adjust test comment
DeltaFile
+1-1clang/test/Driver/aarch64-march.c
+1-11 files

LLVM/project ccd7f4cllvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+33-48llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+40-492 files

LLVM/project a4acc5cllvm/test/CodeGen/X86 vector-reduce-umin.ll vector-reduce-umax.ll

[X86] Improve lowering of i32/i64 minmax reductions (#197578)

Allow 32-bit targets to correctly lower i64 ISD::VECREDUCE min/max nodes
via ReplaceNodeResults. This is necessary once we're finally ready for
#194473 and can remove combineMinMaxReduction entirely.

Improve handling of v2iXX reduction stages by consistently preferring
binop(extract(),extract()) scalarisation on SSE targets (if the vector
binop isn't legal).
DeltaFile
+592-703llvm/test/CodeGen/X86/vector-reduce-umin.ll
+573-658llvm/test/CodeGen/X86/vector-reduce-umax.ll
+460-549llvm/test/CodeGen/X86/vector-reduce-smin.ll
+464-543llvm/test/CodeGen/X86/vector-reduce-smax.ll
+227-276llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
+210-244llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
+2,526-2,9735 files not shown
+2,886-3,39711 files
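
As a rough illustration of the `binop(extract(),extract())` shape mentioned above, the following sketch models an unsigned-min reduction on a plain `std::vector`. It is a scalar model under stated assumptions (power-of-two length, hypothetical function name `reduceUMin`), not the X86 lowering code itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Models a umin reduction: halve the vector with element-wise umin until two
// lanes remain, then finish with a scalar umin of the two extracted lanes.
std::uint64_t reduceUMin(std::vector<std::uint64_t> v) {
  // Assumption for this sketch: at least two lanes, power-of-two lane count.
  assert(v.size() >= 2 && (v.size() & (v.size() - 1)) == 0);
  while (v.size() > 2) {
    std::size_t half = v.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      v[i] = std::min(v[i], v[i + half]); // vector umin of low and high halves
    v.resize(half);
  }
  // Final v2 stage: umin(extract(v, 0), extract(v, 1)) on scalars.
  return std::min(v[0], v[1]);
}
```

The last line is the stage the commit message refers to: when a two-element vector min/max is not legal, doing the final operation on two extracted scalars avoids another vector op.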

LLVM/project b3c0ae4openmp/device/src Reduction.cpp

add a comment about the self-reset
DeltaFile
+2-0openmp/device/src/Reduction.cpp
+2-01 files

LLVM/project 0521762compiler-rt/lib/builtins/arm extendsfdf2.S truncdfsf2.S

Update for rename of endian.h in a previous patch
DeltaFile
+1-1compiler-rt/lib/builtins/arm/extendsfdf2.S
+1-1compiler-rt/lib/builtins/arm/truncdfsf2.S
+2-22 files

LLVM/project e1b0f56llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,520 files not shown
+2,324,406-580,58319,526 files

LLVM/project 1cb92d8compiler-rt/lib/builtins/arm fcmp.h cmpsf2.S, compiler-rt/lib/builtins/arm/thumb1 fcmp.h cmpsf2.S

[compiler-rt][ARM] Optimized single-precision FP comparisons (#179925)

These comparison functions follow the same structure as the
double-precision ones in a prior commit: a header file containing the
main logic, plus several entry points that vary the construction of the
return value.

In this case, we have provided versions for Thumb1 as well as
Arm/Thumb2.
DeltaFile
+443-0compiler-rt/test/builtins/Unit/comparesf2new_test.c
+189-0compiler-rt/lib/builtins/arm/thumb1/fcmp.h
+176-0compiler-rt/lib/builtins/arm/fcmp.h
+56-0compiler-rt/lib/builtins/arm/cmpsf2.S
+56-0compiler-rt/lib/builtins/arm/unordsf2.S
+55-0compiler-rt/lib/builtins/arm/thumb1/cmpsf2.S
+975-04 files not shown
+1,147-110 files

LLVM/project b783371llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+33-48llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+40-492 files

LLVM/project d566c8cclang/lib/Format UnwrappedLineParser.cpp, clang/unittests/Format FormatTest.cpp

[clang-format] Fix parsing of goto labels (#197538)

Fixes #196662.

---------

Co-authored-by: owenca <owenpiano at gmail.com>
DeltaFile
+41-18clang/unittests/Format/FormatTest.cpp
+3-3clang/lib/Format/UnwrappedLineParser.cpp
+44-212 files

LLVM/project 6293f16llvm/lib/TableGen Record.cpp

[TableGen] Simplify Record type checks. NFC. (#197450)
DeltaFile
+6-6llvm/lib/TableGen/Record.cpp
+6-61 files

LLVM/project 3fda43dllvm/test/CodeGen/AMDGPU llvm.amdgcn.permlane.ll llvm.amdgcn.permlane.bcast.ll

[AMDGPU] Update permlane_bcast/down/up/xor intrinsic to support more types (#197141)

Co-authored-by: Acim Maravic <Acim.Maravic at amd.com>
DeltaFile
+3,435-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.bcast.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.down.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.up.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.xor.ll
+0-440llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.gfx1250.ll
+7,855-4409 files not shown
+8,043-48015 files

LLVM/project 151cc5dllvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,514 files not shown
+2,316,298-580,03119,520 files

LLVM/project ad6050ellvm/test/Transforms/LoopVectorize gather-scatter.ll if-conversion-scalable.ll

Nits to harmonise tests
DeltaFile
+159-31llvm/test/Transforms/LoopVectorize/gather-scatter.ll
+65-53llvm/test/Transforms/LoopVectorize/if-conversion-scalable.ll
+224-842 files

LLVM/project 7ae1962clang-tools-extra/clangd CodeComplete.cpp, clang-tools-extra/clangd/unittests CodeCompleteTests.cpp

[clangd] Fix parens suppression in mid-identifier code-completion (#197249)

When completing in the middle of an existing identifier (e.g.
`fo^o<int>(42)`), the next-token check lexes the character immediately
after the cursor, which prevents parens suppression from kicking in.

After the fix, we go to the end of the current identifier first and only
then start lexing for the next token, which handles redundant parens
even when the cursor is mid-identifier.

This also fixes parens suppression in replace mode, which by design is
used mid-identifier.

Fixes https://github.com/clangd/clangd/issues/387
DeltaFile
+18-57clang-tools-extra/clangd/CodeComplete.cpp
+13-4clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
+31-612 files
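
The fix described above can be sketched on a plain string, assuming ASCII identifiers and whitespace-only trivia. The names `isIdentChar` and `nextTokenStart` are hypothetical; clangd's real implementation works on lexer tokens rather than raw characters.

```cpp
#include <cctype>
#include <string>

// True for characters that may appear inside a C/C++ identifier (ASCII only).
bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the first character of the token that follows the identifier the
// cursor sits in, or '\0' if the text ends first.
char nextTokenStart(const std::string &text, std::size_t cursor) {
  std::size_t i = cursor;
  // Step 1: move to the end of the current identifier.
  while (i < text.size() && isIdentChar(text[i]))
    ++i;
  // Step 2: only then look for the next token, skipping whitespace.
  while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i])))
    ++i;
  return i < text.size() ? text[i] : '\0';
}
```

With the cursor at offset 2 in `foo(42)`, looking at the character right after the cursor yields `o`, whereas skipping to the end of the identifier first yields `(`, which is what a parens-suppression check needs to see.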

LLVM/project c0ed919llvm/lib/Target/AMDGPU GCNPreRAOptimizations.cpp, llvm/test/CodeGen/AMDGPU optimize-ds-bvh-stack-pre-ra.ll

[AMDGPU][GCNPreRAOptimizations] Reduce BVH premature reuse (#197386)

Add implicit uses to ds_bvh_stack instructions to avoid reuse of VGPRs
allocated to bvh_intersect_ray results prior to ds_bvh_stack. This
reduces the likelihood of a premature s_wait_bvhcnt occurring due to
partial reallocation of unused bvh_intersect_ray result registers.
DeltaFile
+300-0llvm/test/CodeGen/AMDGPU/optimize-ds-bvh-stack-pre-ra.ll
+78-24llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
+378-242 files

LLVM/project d2af73ccompiler-rt/lib/builtins/arm dcmp.h unorddf2.S, compiler-rt/lib/builtins/arm/thumb1 dcmp.h cmpdf2.S

[compiler-rt][ARM] Optimized double-precision FP comparisons (#179924)

The structure of these comparison functions consists of a header file
containing the main code, and several `.S` files that include that
header with different macro definitions, so that they can use the same
procedure to determine the logical comparison result and then just
translate it into a return value in different ways.
DeltaFile
+619-0compiler-rt/test/builtins/Unit/comparedf2new_test.c
+231-0compiler-rt/lib/builtins/arm/thumb1/dcmp.h
+212-0compiler-rt/lib/builtins/arm/dcmp.h
+71-0compiler-rt/lib/builtins/arm/unorddf2.S
+64-0compiler-rt/lib/builtins/arm/cmpdf2.S
+61-0compiler-rt/lib/builtins/arm/thumb1/cmpdf2.S
+1,258-04 files not shown
+1,447-010 files
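
The structure described above (one shared routine that determines the logical result, and thin entry points that only differ in how they encode it) can be sketched in C++ as follows. This is an illustration only: the real code is assembly driven by macro definitions, and the `sketch_*` names are hypothetical. The return values for unordered inputs follow the documented libgcc soft-float conventions.

```cpp
#include <cmath>

enum class Ordering { Less, Equal, Greater, Unordered };

// Shared logic, standing in for the common header.
Ordering classify(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return Ordering::Unordered;
  if (a < b)
    return Ordering::Less;
  if (a > b)
    return Ordering::Greater;
  return Ordering::Equal;
}

// Translates the logical result into -1/0/1, with a caller-chosen value for
// the unordered case.
int encode(Ordering o, int unorderedValue) {
  switch (o) {
  case Ordering::Less:
    return -1;
  case Ordering::Equal:
    return 0;
  case Ordering::Greater:
    return 1;
  case Ordering::Unordered:
    break;
  }
  return unorderedValue;
}

// Entry points: same procedure, different return-value construction.
int sketch_ledf2(double a, double b) { return encode(classify(a, b), 1); }
int sketch_gedf2(double a, double b) { return encode(classify(a, b), -1); }
int sketch_unorddf2(double a, double b) {
  return classify(a, b) == Ordering::Unordered ? 1 : 0;
}
```

The "less or equal" flavour returns 1 for NaN inputs so that a caller's `<= 0` test fails, while the "greater or equal" flavour returns -1 so that `>= 0` fails.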

LLVM/project 4c88347clang/lib/Driver/ToolChains/Arch AArch64.cpp, clang/test/Driver aarch64-march.c

[AArch64][clang] Improve -mcpu= and -mtune= error messages too

Similar to my previous change improving the error message for
`-march=` in #197441, this changes `-mcpu=` and `-mtune=` arguments
to only report the invalid feature flag, rather than the entire
string.

This is a much clearer error message for the user.
DeltaFile
+34-26clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+13-0clang/test/Driver/aarch64-march.c
+47-262 files

LLVM/project 7b09a4ellvm/lib/Transforms/Vectorize VPlanRecipes.cpp LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 reduction-cost.ll

[LV] Fix the cost model for freeze instructions (#197188)

While working on a PR to add a cost model for VPDerivedIV recipes I
noticed that a loop in or_reduction_with_freeze:

test/Transforms/LoopVectorize/AArch64/reduction-cost.ll

stopped vectorising because the cost model decided it was no longer
worth it. However, the main cause of this was the incredibly high cost
(14) of freeze for VF=2. We were using the cost of a vector mul
instruction as a proxy for the freeze cost, which is incredibly bad for
an AArch64 target without SVE since the operation needs scalarising.

As far as I understand, the freeze instruction does not lead to any
actual code being generated and acts merely as a barrier to potentially
unsafe optimisations. As such, I've updated the cost model to return 0
instead.
DeltaFile
+6-3llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+3-3llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
+1-3llvm/test/Transforms/LoopVectorize/AArch64/reduction-cost.ll
+2-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+12-94 files

LLVM/project 20f4289llvm/test/Transforms/LoopVectorize/AArch64 reduction-small-size.ll

[LV][NFC] Generate full CHECK lines for reduction-small-size.ll (#197632)
DeltaFile
+161-38llvm/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll
+161-381 files

LLVM/project c162731compiler-rt/lib/builtins/arm dcmp.h cmpdf2.S, compiler-rt/lib/builtins/arm/thumb1 dcmp.h gedf2.S

Update for rename of endian.h in a previous patch
DeltaFile
+2-2compiler-rt/lib/builtins/arm/dcmp.h
+2-2compiler-rt/lib/builtins/arm/thumb1/dcmp.h
+1-1compiler-rt/lib/builtins/arm/cmpdf2.S
+1-1compiler-rt/lib/builtins/arm/gedf2.S
+1-1compiler-rt/lib/builtins/arm/thumb1/gedf2.S
+1-1compiler-rt/lib/builtins/arm/thumb1/unorddf2.S
+8-82 files not shown
+10-108 files

LLVM/project 3174273llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,504 files not shown
+2,315,743-579,95619,510 files

LLVM/project 0c539fccompiler-rt/lib/builtins CMakeLists.txt, compiler-rt/lib/builtins/arm divdf3.S muldf3.S

[compiler-rt][ARM] Optimized double-precision FP mul/div (#179923)

Optimized AArch32 implementations of `muldf3` and `divdf3` are provided.
The division function is particularly tricky because its Newton-Raphson
approximation strategy requires a rigorous error bound. In this version
of the commit I've left out the full supporting machinery that validates
the error bound via Gappa and Rocq, but full details are provided via
links to the upstream version of this code in the Arm Optimized Routines
repository, and to a pair of Arm Community blog posts.
DeltaFile
+862-0compiler-rt/test/builtins/Unit/divdf3new_test.c
+832-0compiler-rt/test/builtins/Unit/muldf3new_test.c
+646-0compiler-rt/lib/builtins/arm/divdf3.S
+404-0compiler-rt/lib/builtins/arm/muldf3.S
+2-0compiler-rt/lib/builtins/CMakeLists.txt
+2,746-05 files
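
For readers unfamiliar with the approach, here is a small floating-point sketch of a Newton-Raphson reciprocal. It is not the algorithm in `divdf3.S`, whose arithmetic and error analysis are not shown in this digest, and it carries no rigorous error bound. The function name `reciprocalNR`, the linear first guess and the iteration count are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Approximates 1/d for finite d > 0 using Newton-Raphson on f(x) = 1/x - m.
double reciprocalNR(double d) {
  assert(d > 0.0 && std::isfinite(d)); // sketch handles positive finite input only
  int exp = 0;
  double m = std::frexp(d, &exp); // d = m * 2^exp with m in [0.5, 1)
  // Linear first guess for 1/m; its relative error is at most 1/17.
  double x = 48.0 / 17.0 - (32.0 / 17.0) * m;
  // Each step squares the relative error: 1 - m*x' = (1 - m*x)^2.
  for (int i = 0; i < 4; ++i)
    x = x * (2.0 - m * x);
  return std::ldexp(x, -exp); // undo the scaling: 1/d = (1/m) * 2^-exp
}
```

Because the error squares on every step, four iterations take the initial 1/17 error below double-precision rounding. Showing that the final rounded quotient is always correct is the harder part, and that is what the Gappa and Rocq machinery mentioned above is for.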

LLVM/project aa20895llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+8-15llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+15-162 files

LLVM/project e1135dcoffload/plugins-nextgen/level_zero/include L0Kernel.h, offload/plugins-nextgen/level_zero/src L0Kernel.cpp

[OFFLOAD][L0] Simplify kernel setGroups logic (#197411)

This code path is not really used with upstream code generation.
DeltaFile
+12-220offload/plugins-nextgen/level_zero/src/L0Kernel.cpp
+0-51offload/plugins-nextgen/level_zero/include/L0Kernel.h
+12-2712 files

LLVM/project 2045ee5.github CODEOWNERS

Add new libc GH team to CODEOWNERS (#197630)

This auto-assigns PR reviewers, per the GitHub documentation.
DeltaFile
+1-0.github/CODEOWNERS
+1-01 files

LLVM/project 6990b14llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+2-5llvm/test/MC/AMDGPU/literals.s
+9-62 files

LLVM/project f098a22clang/docs ClangFormatStyleOptions.rst, clang/include/clang/Format Format.h

[clang-format] Add BreakBeforeReturnType option (#197268)

In certain codebases (e.g. embedded), function declarations can
accumulate a long prefix of specifiers and attributes (`static`,
`inline`, `__attribute__((...))`, project-specific `AttributeMacros`,
etc.) before the return type, which buries the core prototype and pushes
parameters past the column limit.

This patch adds a `BreakBeforeReturnType` style option that places that
prefix on its own line(s):

```cpp
__attribute__((always_inline)) static inline
int do_thing(int a, int b, int c);
```

The recognized prefix tokens are function/storage specifiers (`static`,
`extern`, `inline`, `virtual`, `constexpr`, `consteval`, `friend`,
`export`, `_Noreturn`, `__forceinline`), C++11 attribute groups

    [16 lines not shown]
DeltaFile
+166-0clang/unittests/Format/FormatTest.cpp
+72-0clang/lib/Format/TokenAnnotator.cpp
+32-0clang/docs/ClangFormatStyleOptions.rst
+26-0clang/include/clang/Format/Format.h
+15-0clang/lib/Format/Format.cpp
+12-0clang/unittests/Format/ConfigParseTest.cpp
+323-04 files not shown
+339-110 files

LLVM/project 3e96e2bllvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main in the hope of fixing the unrelated CI failure
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-018,620 files not shown
+2,257,597-555,50718,626 files