LLVM/project f2ed002 llvm/lib/Target/RISCV RISCVInsertVSETVLI.cpp, llvm/test/CodeGen/RISCV/rvv vsetvli-insert.ll vsetvli-insert.mir

[RISCV] Fix RISCVInsertVSETVLI coalescing clobbering VL def segment (#167712)

This fixes an assert when compiling llvm-test-suite with -march=rva23u64
-O3 that started appearing sometime this week.

We get "Cannot overlap two segments with differing ValID's" because we
try to coalesce these two vsetvlis:

    %x:gprnox0 = COPY $x8
    dead $x0 = PseudoVSETIVLI 1, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x
    %v:vr = COPY $v8, implicit $vtype
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype

    -->

    %x:gprnox0 = COPY $x8
    %x = PseudoVSETVLI %x, 208, implicit-def $vl, implicit-def $vtype
    %y:gprnox0 = COPY %x

DeltaFile
+43 -0 llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+28 -1 llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.mir
+8 -0 llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+79 -1 3 files

LLVM/project 4bcde03 mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp, mlir/lib/Dialect/Func/Utils Utils.cpp

Reland yet again: [mlir] Add FP software implementation lowering pass: `arith-to-apfloat` (#167608)

Fix both the symbol visibility issue in the mlir_apfloat_wrappers library and the linkage issue in ArithToAPFloat.
DeltaFile
+163 -0 mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+128 -0 mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+89 -0 mlir/lib/ExecutionEngine/APFloatWrappers.cpp
+40 -0 mlir/test/Integration/Dialect/Arith/CPU/test-apfloat-emulation.mlir
+25 -0 mlir/lib/Dialect/Func/Utils/Utils.cpp
+23 -0 mlir/lib/ExecutionEngine/CMakeLists.txt
+468 -0 11 files not shown
+562 -0 17 files

LLVM/project 70eb4b0 clang/lib/AST/ByteCode Interp.h Compiler.cpp, clang/test/AST/ByteCode new-delete.cpp arrays.cpp

[clang][bytecode] Fix diagnosing subtraction of zero-size pointers (#167839)

We need to get the element type size at bytecode generation time to
perform the check. We also need to diagnose this in the LHS == RHS case.
DeltaFile
+15 -17 clang/lib/AST/ByteCode/Interp.h
+10 -0 clang/test/AST/ByteCode/new-delete.cpp
+8 -1 clang/lib/AST/ByteCode/Compiler.cpp
+4 -0 clang/test/AST/ByteCode/arrays.cpp
+1 -0 clang/lib/AST/ByteCode/Opcodes.td
+38 -18 5 files

LLVM/project d2a2b16 libunwind/src Registers.hpp

[libunwind] Ensure zaDisable() is called in jumpto/returnto (NFC) (#167674)

This is NFC for now: the SME checks for macOS platforms are not
implemented, so zaDisable() is a no-op. However, both paths for resuming
from an exception should disable ZA.

This is a fixup for a recent change in #165066.
DeltaFile
+5 -8 libunwind/src/Registers.hpp
+5 -8 1 file

LLVM/project f038dfd libcxx/include module.modulemap.in, libcxx/include/__memory unique_ptr.h shared_ptr.h

[libc++] Merge is_{,un}bounded_array.h into is_array.h (#167479)

These headers are incredibly simple and closely related, so this merges
them into a single one.
DeltaFile
+0 -38 libcxx/include/__type_traits/is_unbounded_array.h
+0 -36 libcxx/include/__type_traits/is_bounded_array.h
+26 -0 libcxx/include/__type_traits/is_array.h
+0 -8 libcxx/include/module.modulemap.in
+0 -2 libcxx/include/__memory/unique_ptr.h
+0 -2 libcxx/include/__memory/shared_ptr.h
+26 -88 4 files not shown
+26 -92 10 files

LLVM/project 189d185 libcxx/include module.modulemap.in, libcxx/test/std/re/re.results/re.results.const move.pass.cpp

[libc++] Add an initial modulemap for the test support headers (#162800)

This should reduce the time it takes to run the test suite a bit. Right
now there are only a handful of headers in the modulemap because we're
missing a lot of includes in the tests. New headers should be added
there from the start, and we should fill up the modulemap over time
until it contains all the test support headers.
DeltaFile
+8 -7 libcxx/test/std/strings/basic.string/string.cons/string_view_deduction.pass.cpp
+6 -5 libcxx/test/std/strings/basic.string/string.cons/string_view_size_size_deduction.pass.cpp
+10 -0 libcxx/test/support/module.modulemap
+4 -1 libcxx/include/module.modulemap.in
+3 -2 libcxx/test/std/strings/basic.string/string.cons/dtor.pass.cpp
+3 -1 libcxx/test/std/re/re.results/re.results.const/move.pass.cpp
+34 -16 47 files not shown
+103 -32 53 files

LLVM/project 478e45f libcxx/src atomic.cpp

[libc++] Improve performance of std::atomic_flag on Windows (#163524)

On Windows 8 and above, the WaitOnAddress, WakeByAddressSingle and
WakeByAddressAll functions allow an efficient implementation of the C++20
wait and notify features of std::atomic_flag. libc++ has never used these
Windows functions, which leads to very poor performance of these features
on Windows platforms: they are implemented using a spin loop with backoff
rather than any OS thread signalling at all. This change uses these OS
functions where available and falls back to the original implementation
on Windows versions prior to 8.

Relevant API docs from Microsoft:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddresssingle

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-wakebyaddressall

DeltaFile
+69 -0 libcxx/src/atomic.cpp
+69 -0 1 file

LLVM/project 2ac9e59 libcxx/include/__memory shared_ptr.h, libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.const unique_ptr.pass.cpp

[libc++] Simplify the implementation of the unique_ptr -> shared_ptr converting constructor (#165619)

This also backports LWG2415 as a drive-by.
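Below is a small sketch of the converting constructor in question, assuming C++17 or later. `Base`, `Derived` and both helper functions are hypothetical.

The second helper shows the behaviour specified by the LWG2415 resolution as commonly implemented: a `shared_ptr` built from an empty `unique_ptr` is itself empty, with `use_count() == 0`, rather than owning a control block.

```cpp
#include <memory>
#include <utility>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int value = 42; };

// Converting constructor: shared_ptr<Base> takes ownership from a
// unique_ptr<Derived>; the unique_ptr is left empty afterwards.
std::shared_ptr<Base> convert_non_null() {
    std::unique_ptr<Derived> up = std::make_unique<Derived>();
    std::shared_ptr<Base> sp(std::move(up));
    return sp;
}

// Per LWG2415, converting an empty unique_ptr behaves like shared_ptr():
// no control block is allocated, so use_count() is 0.
std::shared_ptr<int> convert_null() {
    std::unique_ptr<int> up;  // holds nullptr
    return std::shared_ptr<int>(std::move(up));
}
```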
DeltaFile
+6 -35 libcxx/include/__memory/shared_ptr.h
+2 -2 libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.const/unique_ptr.pass.cpp
+8 -37 2 files

LLVM/project 693f700 libcxx/include/__locale_dir num.h locale_base_api.h, libcxx/include/__locale_dir/support bsd_like.h linux.h

[libc++] Implement our own is{,x}digit functions for the C locale (#165467)

The C locale is defined by the C standard, so we know exactly which
characters classify as (x)digits. Instead of going through the locale
base API, we can implement the classification functions ourselves, which
will probably also improve codegen significantly.
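As a rough illustration of what C-locale digit classification reduces to, here are two hypothetical helpers. These are not libc++'s internal names.

They assume an ASCII-compatible execution character set for the `a`-`f` and `A`-`F` ranges; the C standard itself only guarantees that `0`-`9` are contiguous. `matches_c_library` cross-checks them against `<cctype>`, relying on the program still being in the default "C" locale.

```cpp
#include <cctype>

// Decimal digits: exactly '0'..'9' in the C locale.
constexpr bool c_locale_isdigit(char c) { return c >= '0' && c <= '9'; }

// Hex digits: decimal digits plus 'a'..'f' and 'A'..'F' (ASCII assumed).
constexpr bool c_locale_isxdigit(char c) {
    return c_locale_isdigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Compare against the C library for every unsigned char value.
// A program starts in the "C" locale unless setlocale() is called.
bool matches_c_library() {
    for (int i = 0; i < 256; ++i) {
        char c = static_cast<char>(i);
        if (c_locale_isdigit(c) != (std::isdigit(i) != 0)) return false;
        if (c_locale_isxdigit(c) != (std::isxdigit(i) != 0)) return false;
    }
    return true;
}
```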
DeltaFile
+9 -2 libcxx/include/__locale_dir/num.h
+0 -5 libcxx/include/__locale_dir/locale_base_api.h
+0 -4 libcxx/include/__locale_dir/support/no_locale/characters.h
+0 -4 libcxx/include/__locale_dir/support/bsd_like.h
+0 -4 libcxx/include/__locale_dir/support/linux.h
+0 -4 libcxx/include/__locale_dir/support/windows.h
+9 -23 6 files

LLVM/project 825706b compiler-rt/lib/builtins CMakeLists.txt, compiler-rt/lib/builtins/i386 chkstk.S chkstk2.S

Revert "[compiler-rt] [builtins] Remove unused/misnamed x86 chkstk functions"

This reverts parts of commit 885d7b759b5c166c07c07f4c58c6e0ba110fb0c2,
and adds verbose comments explaining all the variants of this
function, for clarity for future readers.

It turns out that those functions actually weren't misnamed or
unused after all: apparently Clang and GCC reference different stack
probe functions on i386 mingw. GCC < 4.6 references a symbol named
"___chkstk", with three leading underscores, and GCC >= 4.6 references
"___chkstk_ms".

Restore these functions, to allow linking object files built with
GCC with compiler-rt.
DeltaFile
+40 -0 compiler-rt/lib/builtins/i386/chkstk.S
+18 -0 compiler-rt/lib/builtins/i386/chkstk2.S
+1 -0 compiler-rt/lib/builtins/CMakeLists.txt
+59 -0 3 files

LLVM/project d2f0b27 compiler-rt/lib/builtins CMakeLists.txt, compiler-rt/lib/builtins/i386 chkstk2.S chkstk.S

Revert "[compiler-rt] Rename the now lone i386/chkstk2.S to i386/chkstk.S"

This reverts commit 1f9eff100ce8faea1284d68b779d844c6e019b77.

This is done in preparation for reverting parts of
885d7b759b5c166c07c07f4c58c6e0ba110fb0c2.
DeltaFile
+39 -0 compiler-rt/lib/builtins/i386/chkstk2.S
+0 -39 compiler-rt/lib/builtins/i386/chkstk.S
+1 -1 compiler-rt/lib/builtins/CMakeLists.txt
+40 -40 3 files

LLVM/project 5edf70c clang-tools-extra/docs/clang-tidy/checks/bugprone tagged-union-member-count.rst use-after-move.rst

[clang-tidy][docs][NFC] Enforce 80 characters limit (1/N) (#167492)

Fix documentation in `abseil`, `android`, `altera`, `boost` and
`bugprone`.

This is part of the codebase cleanup described in
[#167098](https://github.com/llvm/llvm-project/issues/167098)
DeltaFile
+23 -21 clang-tools-extra/docs/clang-tidy/checks/bugprone/tagged-union-member-count.rst
+19 -19 clang-tools-extra/docs/clang-tidy/checks/bugprone/use-after-move.rst
+18 -14 clang-tools-extra/docs/clang-tidy/checks/bugprone/signed-char-misuse.rst
+15 -14 clang-tools-extra/docs/clang-tidy/checks/bugprone/unintended-char-ostream-output.rst
+14 -13 clang-tools-extra/docs/clang-tidy/checks/bugprone/narrowing-conversions.rst
+13 -12 clang-tools-extra/docs/clang-tidy/checks/bugprone/inc-dec-in-conditions.rst
+102 -93 57 files not shown
+380 -334 63 files

LLVM/project 147e615 .ci monolithic-windows.sh

[CI] Fix misspelled runtimes_targets variable (#167696)

This was preventing check-compiler-rt from actually running when we
touched a project that was supposed to cause compiler-rt to be tested.
DeltaFile
+1 -1 .ci/monolithic-windows.sh
+1 -1 1 file

LLVM/project 140e07c mlir/include/mlir/Conversion/ArithToAPFloat ArithToAPFloat.h, mlir/lib/Conversion/ArithToAPFloat ArithToAPFloat.cpp

Revert "Reland yet again: [mlir] Add FP software implementation lowering pass: `arith-to-apfloat`" (#167834)

Reverts llvm/llvm-project#167608

Broken builder https://lab.llvm.org/buildbot/#/builders/52/builds/12781
DeltaFile
+0 -163 mlir/lib/Conversion/ArithToAPFloat/ArithToAPFloat.cpp
+0 -128 mlir/test/Conversion/ArithToApfloat/arith-to-apfloat.mlir
+0 -89 mlir/lib/ExecutionEngine/APFloatWrappers.cpp
+0 -36 mlir/test/Integration/Dialect/Arith/CPU/test-apfloat-emulation.mlir
+0 -25 mlir/lib/Dialect/Func/Utils/Utils.cpp
+0 -21 mlir/include/mlir/Conversion/ArithToAPFloat/ArithToAPFloat.h
+0 -462 11 files not shown
+0 -552 17 files

LLVM/project 99a726e llvm/lib/CodeGen/SelectionDAG SelectionDAGISel.cpp

[SelectionDAGISel] Const correct ChainNodesMatched argument to HandleMergeInputChains. NFC (#167807)

DeltaFile
+1 -1 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+1 -1 1 file

LLVM/project de3dcc8 llvm/test/CodeGen/LoongArch expandmemcmp.ll expandmemcmp-optsize.ll

update tests
DeltaFile
+670 -297 llvm/test/CodeGen/LoongArch/expandmemcmp.ll
+612 -155 llvm/test/CodeGen/LoongArch/expandmemcmp-optsize.ll
+1,282 -452 2 files

LLVM/project cd7be8c llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchTargetTransformInfo.cpp

[LoongArch] Support memcmp expansion for vectors and combine for i128/i256 setcc

This commit enables memcmp expansion for lsx/lasx. As a result, i128
and i256 loads, which are illegal types on LoongArch, will be
generated. Without further handling, they would be split into legal
scalar types.

So this commit also adds a `setcc` combine that bitcasts i128/i256
operands to vector types before type legalization, so that vector
instructions are generated.

Inspired by x86 and riscv.
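To make the equality-only case concrete, here is a scalar C++ sketch of what `memcmp(a, b, 16) == 0` is equivalent to once expanded. The steps are: load both 16-byte blocks, XOR them, OR the two halves together, and compare against zero.

`equal16` is a hypothetical name. The LoongArch change instead bitcasts the i128/i256 values to vector types so that lsx/lasx instructions perform the comparison; this sketch does not model that lowering.

```cpp
#include <cstdint>
#include <cstring>

// Equality-only expansion of a 16-byte memcmp: true iff all bytes match.
bool equal16(const void* a, const void* b) {
    std::uint64_t x[2], y[2];
    std::memcpy(x, a, 16);  // memcpy avoids alignment/aliasing issues
    std::memcpy(y, b, 16);
    // Any differing bit leaves a nonzero XOR; OR folds both halves into one test.
    return ((x[0] ^ y[0]) | (x[1] ^ y[1])) == 0;
}
```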
DeltaFile
+114 -8 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+8 -3 llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
+122 -11 2 files

LLVM/project 13251f5 llvm/lib/DWARFCFIChecker Registers.h

[DWARFCFIChecker] Use MCRegister instead of MCPhysReg. NFC (#167823)

DeltaFile
+4 -4 llvm/lib/DWARFCFIChecker/Registers.h
+4 -4 1 file

LLVM/project 925fb05 llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/GlobalISel/irtranslator vec-vleff.ll

[RISCV][GISel] Fallback to SelectionDAG for vleff intrinsics. (#167776)

Supporting this in GISel requires multiple changes to IRTranslator to
support aggregate returns containing scalable vectors and non-scalable
types. Falling back is the quickest way to fix the crash.

Fixes #167618
DeltaFile
+14 -0 llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/vec-vleff.ll
+4 -2 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+18 -2 2 files

LLVM/project 5feddcf llvm/test/CodeGen/AMDGPU a-v-flat-atomicrmw.ll a-v-global-atomicrmw.ll

Update regressed tests
DeltaFile
+1,570 -1,557 llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll
+1,136 -1,130 llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll
+186 -171 llvm/test/CodeGen/AMDGPU/vni8-across-blocks.ll
+9 -8 llvm/test/CodeGen/AMDGPU/copy-to-reg-frameindex.ll
+2,901 -2,866 4 files

LLVM/project 9c3e46b llvm/test/CodeGen/AMDGPU global-atomicrmw-fmin.ll global-atomicrmw-fmax.ll

AMDGPU: Really use AV classes by default for vector classes

Update getRegClassFor to use AV classes in place of VGPRs for
gfx90a-gfx950. There are a handful of regressions. Most of them enable
unprofitable rematerialization, which reduces the register count by 1
but adds an unnecessary instruction.
DeltaFile
+524 -524 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll
+524 -524 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll
+520 -524 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
+520 -524 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
+436 -440 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll
+432 -432 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll
+2,956 -2,968 24 files not shown
+4,692 -4,713 30 files

LLVM/project 7f9c3bb llvm/test/CodeGen/AMDGPU a-v-global-atomicrmw.ll global-atomicrmw-fmin.ll

32-bit case

Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
DeltaFile
+190 -190 llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll
+140 -140 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll
+140 -140 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll
+136 -140 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
+136 -140 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
+78 -82 llvm/test/CodeGen/AMDGPU/mfma-loop.ll
+820 -832 10 files not shown
+997 -1,006 16 files

LLVM/project f111afa llvm/test/CodeGen/AMDGPU a-v-flat-atomicrmw.ll no-fold-accvgpr-mov.ll

Regression with 32-bit case
DeltaFile
+246 -238 llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll
+7 -5 llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.ll
+253 -243 2 files

LLVM/project ccde342 llvm/lib/Target/AMDGPU AMDGPU.td GCNSubtarget.h, llvm/test/CodeGen/AMDGPU flat-saddr-atomics.ll global-atomicrmw-fadd.ll

[AMDGPU] Insert `s_wait_xcnt(0)` before atomics to work around write-combining miss hazard

This patch adds a workaround for a hazard on GFX1250 by inserting an `s_wait_xcnt(0)` instruction before any atomic operation that might write to memory.

Fixes SWDEV-543703.
DeltaFile
+188 -0 llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
+72 -0 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
+56 -0 llvm/test/CodeGen/AMDGPU/atomics-system-scope.ll
+9 -0 llvm/lib/Target/AMDGPU/AMDGPU.td
+6 -0 llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll
+5 -1 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+336 -1 7 files not shown
+356 -2 13 files

LLVM/project 4d47649 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

remove unnecessary `mayStore` check
DeltaFile
+1 -1 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -1 1 file

LLVM/project 1337723 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU a-v-flat-atomicrmw.ll a-v-global-atomicrmw.ll

AMDGPU: Start to use AV classes for unknown vector class

Use AGPR+VGPR superclasses for gfx90a+. The register class used
for a type should be the broadest possible class, to be contextually
restricted later. InstrEmitter clamps these to the common subclass
required by the instructions that use them, so we're best off using
the broadest possible class for all types.

Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
DeltaFile
+280 -280 llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll
+140 -140 llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll
+70 -74 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fsub.ll
+30 -34 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll
+30 -34 llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll
+24 -17 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+574 -579 10 files not shown
+662 -655 16 files

LLVM/project dfe9838 clang/lib/Format QualifierAlignmentFixer.cpp, clang/unittests/Format QualifierFixerTest.cpp

[clang-format] Don't swap `(const override)` with QAS_Right (#167191)

Fixes #154846
DeltaFile
+13 -4 clang/lib/Format/QualifierAlignmentFixer.cpp
+2 -0 clang/unittests/Format/QualifierFixerTest.cpp
+15 -4 2 files

LLVM/project e58e799 llvm/lib/Target/AMDGPU AMDGPU.td, llvm/test/CodeGen/AMDGPU flat-saddr-atomics.ll global-atomicrmw-fadd.ll

[AMDGPU] Insert `s_wait_xcnt(0)` before atomics to work around write-combining miss hazard

This patch adds a workaround for a hazard on GFX1250 by inserting an `s_wait_xcnt(0)` instruction before any atomic operation that might write to memory.

Fixes SWDEV-543703.
DeltaFile
+188 -0 llvm/test/CodeGen/AMDGPU/flat-saddr-atomics.ll
+72 -0 llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
+56 -0 llvm/test/CodeGen/AMDGPU/atomics-system-scope.ll
+9 -0 llvm/lib/Target/AMDGPU/AMDGPU.td
+6 -0 llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll
+6 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
+337 -0 7 files not shown
+356 -2 13 files

LLVM/project 7aaa6b7 clang-tools-extra/clang-doc HTMLMustacheGenerator.cpp Generators.cpp

[clang-doc] lift Mustache template generation from HTML

To prepare for more backends to use Mustache templates, this patch lifts
the Mustache functionality from HTMLMustacheGenerator.cpp to
Generators.h. A MustacheGenerator interface is created to share code for
template creation.
DeltaFile
+28 -174 clang-tools-extra/clang-doc/HTMLMustacheGenerator.cpp
+130 -0 clang-tools-extra/clang-doc/Generators.cpp
+83 -0 clang-tools-extra/clang-doc/Generators.h
+241 -174 3 files

LLVM/project 73e70e0 mlir/test/Integration/Dialect/Linalg/CPU runtime-verification.mlir

[mlir][linalg] Fix Linalg runtime verification test (#167814)

This integration test has been broken for a while. This commit partially
fixes it.

- Use `CHECK` + `CHECK-NEXT` to ensure that the correct error lines are
matched together.
- Move all `CHECK-NOT` to the end. Having a `CHECK` with the same string
does not make sense after a `CHECK-NOT`.
- Add a missing `CHECK: ERROR` for one of the test cases.
- Deactivate `reverse_from_3`, which is broken, and put a TODO.
DeltaFile
+71 -64 mlir/test/Integration/Dialect/Linalg/CPU/runtime-verification.mlir
+71 -64 1 file