LLVM/project 85e45cd llvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64SVEInstrInfo.td, llvm/test/MC/AArch64/SVE2p2 fmmla.s

fixup! Add `HasSVE_F16MM` and use `Has_F16MM` just for Neon.
Delta   File
+5 -5   llvm/test/MC/AArch64/SVE2p2/fmmla.s
+2 -0   llvm/lib/Target/AArch64/AArch64InstrInfo.td
+1 -1   llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+8 -6   3 files

LLVM/project c0ab6e4 llvm/test/CodeGen/AArch64 sve-intrinsics-matmul-bf16.ll sve-intrinsics-matmul-f16.ll

fixup! More small test fixes
Delta    File
+1 -3    llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-bf16.ll
+1 -3    llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-f16.ll
+0 -2    llvm/test/CodeGen/AArch64/neon-matmul-f16.ll
+0 -2    llvm/test/CodeGen/AArch64/neon-matmul-f16f32mm.ll
+2 -10   4 files

LLVM/project a616be5 llvm/test/CodeGen/AArch64 aarch64-matmul-f16f32mm.ll aarch64-matmul-f16mm.ll

fixup! Rename files to be more logical
Delta     File
+0 -15    llvm/test/CodeGen/AArch64/aarch64-matmul-f16f32mm.ll
+0 -15    llvm/test/CodeGen/AArch64/aarch64-matmul-f16mm.ll
+15 -0    llvm/test/CodeGen/AArch64/neon-matmul-f16.ll
+15 -0    llvm/test/CodeGen/AArch64/neon-matmul-f16f32mm.ll
+14 -0    llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-bf16.ll
+14 -0    llvm/test/CodeGen/AArch64/sve-intrinsics-matmul-f16.ll
+58 -30   shown above; 2 files not shown
+58 -58   8 files

LLVM/project 50f2639 clang/include/clang/Basic arm_neon.td, clang/lib/CodeGen/TargetBuiltins ARM.cpp

fixup! Address Kerry's PR comments
Delta    File
+2 -2    clang/include/clang/Basic/arm_neon.td
+1 -3    clang/test/CodeGen/AArch64/v9.7a-neon-mmla-intrinsics.c
+1 -2    clang/utils/TableGen/NeonEmitter.cpp
+0 -2    llvm/test/CodeGen/AArch64/sve-bfmmla-bf16.ll
+0 -2    clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+1 -1    clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_mmla-bf16.c
+5 -12   shown above; 2 files not shown
+6 -15   8 files

LLVM/project c887f7a clang/test/CodeGen/AArch64 v9.7a-neon-mmla-intrinsics.c, clang/test/CodeGen/AArch64/sve-intrinsics acle_sve_mmla-bf16.c acle_sve_mmla-f16.c

[AArch64][clang][llvm] Add ACLE Armv9.7 MMLA intrinsics

Implement new ACLE matrix multiply-accumulate intrinsics for Armv9.7:

```c
  // 16-bit floating-point matrix multiply-accumulate.
  // Only if __ARM_FEATURE_SVE_B16MM
  // Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
  svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);

  // Half-precision matrix multiply accumulating to single-precision
  // instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
  float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b);

  // Non-widening half-precision matrix multiply instruction. Requires the
  // +f16mm architecture extension.
  float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b);
```
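A scalar model of what the widening `vmmlaq_f32_f16` computes may help when reading the tests. This sketch assumes the same layout as the existing Neon MMLA instructions such as BFMMLA: the first source holds a 2x4 row-major matrix, the second holds a 4x2 matrix stored as its 2x4 transpose, and the accumulator is a 2x2 row-major matrix. `mmla_2x4x2_ref` is a hypothetical name, half-precision inputs are modelled as `float`, and the hardware's rounding behaviour is not modelled.

```c
#include <stddef.h>

/* Hypothetical scalar model of a widening 2x4 * 4x2 matrix multiply-accumulate
 * (assumed layout, see lead-in). r is 2x2 row-major, a is 2x4 row-major, and
 * b holds the 4x2 right-hand matrix as its 2x4 transpose. Half-precision
 * inputs are modelled as float; intermediate rounding is not modelled. */
static void mmla_2x4x2_ref(float r[4], const float a[8], const float b[8]) {
    for (size_t i = 0; i < 2; ++i) {        /* row of A and of R */
        for (size_t j = 0; j < 2; ++j) {    /* row of B^T, i.e. column of R */
            float acc = r[i * 2 + j];
            for (size_t k = 0; k < 4; ++k)
                acc += a[i * 4 + k] * b[j * 4 + k];
            r[i * 2 + j] = acc;
        }
    }
}
```

With `b` holding the first two rows of the 4x4 identity, each result element simply picks one element of `a`, which makes the assumed layout easy to check.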
Delta      File
+47 -0     clang/test/CodeGen/AArch64/v9.7a-neon-mmla-intrinsics.c
+32 -0     clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_mmla-bf16.c
+32 -0     clang/test/Sema/AArch64/arm_sve_non_streaming_only_sve_AND_sve-b16mm.c
+32 -0     clang/test/Sema/AArch64/arm_sve_non_streaming_only_sve_AND_sve2p2_AND_f16mm.c
+32 -0     clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_mmla-f16.c
+16 -0     llvm/test/CodeGen/AArch64/sve-bfmmla-bf16.ll
+191 -0    shown above; 14 files not shown
+297 -12   20 files

LLVM/project ca13692 clang/lib/Serialization ASTReader.cpp, clang/test/Sema redefine_extname.cpp

Fix formatting of changes in recent redefine_extname changes. (#189938)

My recent commits fd7388d14083bb5094bce6a75444a37e424689d7 and
37888541a96e4f10bf1b71b869145f0d31a9d580 had minor formatting issues,
introduced while I was editing the PR, because I had forgotten to rerun
clang-format. Sorry for that. Here is the update.

I did not fix line lengths in the FileCheck tests, since exceeding the line
length there seems more consistent.
Delta   File
+2 -2   clang/lib/Serialization/ASTReader.cpp
+1 -1   clang/test/Sema/redefine_extname.cpp
+3 -3   2 files

LLVM/project f780e46 llvm/lib/Transforms/Scalar ExpandMemCmp.cpp

[llvm][ExpandMemCmp] Avoid making copy of loop value (#193915)

This fixes a compiler warning.
Delta   File
+1 -1   llvm/lib/Transforms/Scalar/ExpandMemCmp.cpp
+1 -1   1 file

LLVM/project 8685fb1 mlir/include/mlir/Dialect CommonFolders.h, mlir/include/mlir/Dialect/Math/IR MathOps.td

[mlir][math] Add constant folding for `math.fpowi` (#193761)

Adds a constant folder for `math.fpowi` when both operands are constant
and the integer exponent is exactly representable in the floating-point
type of the base.
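The folding condition can be sketched in C. This is only an illustration under assumptions: it works on `double` and `int64_t` rather than MLIR attributes and `APFloat`, `fold_fpowi_f64` is a hypothetical name, and it evaluates the power by binary exponentiation, so its rounding need not match the real folder bit for bit. The gate rejects any exponent that does not survive a round trip through the base's floating-point type.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the integer exponent converts to double and back without loss. */
static bool exponent_exact_in_double(int64_t n) {
    double d = (double)n;
    /* Values near INT64_MAX round up to 2^63, which is outside int64_t;
       converting that back would be undefined behaviour, so reject it. */
    if (d >= 9223372036854775808.0)
        return false;
    return (int64_t)d == n;
}

/* Hypothetical folder: returns false (do not fold) unless the exponent is
   exactly representable; otherwise stores base^n in *out. */
static bool fold_fpowi_f64(double base, int64_t n, double *out) {
    if (!exponent_exact_in_double(n))
        return false;
    /* |n| computed in unsigned arithmetic so that INT64_MIN is safe. */
    uint64_t m = n < 0 ? (uint64_t)0 - (uint64_t)n : (uint64_t)n;
    double result = 1.0, x = base;
    while (m != 0) {                 /* binary exponentiation */
        if (m & 1)
            result *= x;
        x *= x;
        m >>= 1;
    }
    *out = n < 0 ? 1.0 / result : result;
    return true;
}
```

For example, an exponent of 2^53 + 1 rounds to 2^53 in `double`, so the sketch declines to fold it.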
Delta      File
+52 -42    mlir/include/mlir/Dialect/CommonFolders.h
+33 -0     mlir/test/Dialect/Math/canonicalize.mlir
+28 -0     mlir/lib/Dialect/Math/IR/MathOps.cpp
+2 -4      mlir/include/mlir/Dialect/Math/IR/MathOps.td
+2 -1      mlir/test/lib/Dialect/Test/TestPatterns.cpp
+117 -47   5 files

LLVM/project 7765252 llvm/docs AMDGPUUsage.rst LangRef.rst

[LangRef][AMDGPU] Specify that syncscope can cause atomic operations to race

Targets should be able to specify that the syncscope of atomic operations
influences whether they participate in data races with each other.

For example, in AMDGPU, we want (and already implement) the load in the
following case to be in a data race (i.e., return `undef` according to the
current definition), because there is an atomic store with workgroup syncscope
executing in a different workgroup:

```
; workgroup 0:
  store atomic i32 1, ptr %p syncscope("workgroup") monotonic, align 4

; workgroup 1:
  store atomic i32 2, ptr %p syncscope("workgroup") monotonic, align 4
  load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4
```


    [3 lines not shown]
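The race in this example can be illustrated with a toy model of inclusive scopes: two atomic accesses to the same location, at least one of them a store, race unless each access's syncscope includes the thread executing the other. Everything here is a hypothetical simplification for illustration: only workgroup, agent, and system scopes are modelled, and the names are not taken from LangRef or AMDGPUUsage.

```c
#include <stdbool.h>

/* Hypothetical, simplified scopes, narrowest to widest. */
enum scope { SCOPE_WORKGROUP, SCOPE_AGENT, SCOPE_SYSTEM };

struct atomic_op {
    enum scope scope;
    int workgroup;   /* id of the executing workgroup within its agent */
    int agent;       /* id of the executing agent (device) */
    bool is_store;
};

/* Does op a's scope include the thread that executes op b? */
static bool scope_includes(const struct atomic_op *a, const struct atomic_op *b) {
    switch (a->scope) {
    case SCOPE_WORKGROUP:
        return a->agent == b->agent && a->workgroup == b->workgroup;
    case SCOPE_AGENT:
        return a->agent == b->agent;
    case SCOPE_SYSTEM:
        return true;
    }
    return false;
}

/* Conflicting atomics (same location, at least one store) race unless the
   scopes are inclusive, i.e. each one's scope includes the other's thread. */
static bool atomics_race(const struct atomic_op *a, const struct atomic_op *b) {
    if (!a->is_store && !b->is_store)
        return false;
    return !(scope_includes(a, b) && scope_includes(b, a));
}
```

In this model, the workgroup-scope store from workgroup 0 and the workgroup-scope load in workgroup 1 race, while the same pair at agent scope does not.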
Delta   File
+4 -1   llvm/docs/AMDGPUUsage.rst
+2 -1   llvm/docs/LangRef.rst
+6 -2   2 files

LLVM/project 82c052e llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta    File
+13 -9   llvm/docs/LangRef.rst
+13 -9   1 file

LLVM/project 08b22cc llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta     File
+89 -77   llvm/docs/AMDGPUUsage.rst
+89 -77   1 file

LLVM/project f0b5410 llvm/docs LangRef.rst

Add an "(or stronger)" for clarity, improve wrapping.
Delta     File
+10 -10   llvm/docs/LangRef.rst
+10 -10   1 file

LLVM/project 510cc14 llvm/docs LangRef.rst

[LangRef] Allow monotonic & seq_cst accesses to inter-operate with other accesses

Currently, the LangRef says that atomic operations (which include `unordered`
operations, which don't participate in the monotonic modification order) must
read a value from the modification order of monotonic operations.

In the following example, this means that the load has no store it could read
from, because none of the stores it may see participates in the monotonic
modification order:

```
; thread 0:
  store atomic i32 1, ptr %p unordered, align 4

; thread 1:
  store atomic i32 2, ptr %p unordered, align 4
  load atomic i32, ptr %p unordered, align 4
```


    [18 lines not shown]
Delta     File
+17 -15   llvm/docs/LangRef.rst
+17 -15   1 file

LLVM/project 839a22f flang/docs Directives.md, flang/lib/Lower Bridge.cpp

[Flang] Add `INLINEALWAYS` Compiler Directive (#192674)

Adds support for the INLINEALWAYS compiler directive to Flang. The directive
was previously supported in Classic Flang and works the same way as
FORCEINLINE.

It can be placed either at the call site or within the function the user
wishes to inline.

The missing support was noticed while building an open-source benchmark,
whose build warnings indicated that this compiler directive was being
ignored.
Delta     File
+139 -0   flang/test/Parser/inlinealways-directive.f90
+44 -0    flang/test/Lower/inlinealways-directive.f90
+25 -0    flang/lib/Lower/Bridge.cpp
+17 -0    flang/test/Semantics/inlinealways-directive01.f90
+15 -0    flang/lib/Semantics/resolve-names.cpp
+10 -0    flang/docs/Directives.md
+250 -0   shown above; 5 files not shown
+267 -1   11 files

LLVM/project 170f030 mlir/lib/Dialect/Math/IR MathOps.cpp

[mlir][math] Use APFloat::SemanticsToEnum in constant folding (#193914)

Refactor constant folding in the Math dialect to use
APFloat::SemanticsToEnum() instead of getSizeInBits() when checking
floating-point semantics. Inferring semantics from the bit width is fragile:
different formats can share a bit width but have distinct semantics (for
example, IEEE half and bfloat are both 16 bits wide), which leads to incorrect
dispatch. SemanticsToEnum() matches on the exact semantics descriptor, which
makes the intent explicit and ensures correct dispatch.
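A small C sketch shows why the bit width alone cannot select the right semantics: the same 16-bit pattern `0x3F80` is 1.0 as a bfloat but 1.875 as an IEEE half. The `enum fsem` tag is a hypothetical stand-in for the exact-semantics enum that the patch obtains from `APFloat::SemanticsToEnum()`; the decoders are plain C, not APFloat code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical exact-semantics tag; both formats are 16 bits wide. */
enum fsem { SEM_IEEE_HALF, SEM_BFLOAT };

static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* bfloat16 is the upper half of an IEEE single-precision value. */
static float decode_bf16(uint16_t h) {
    return bits_to_float((uint32_t)h << 16);
}

/* IEEE half: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits. */
static float decode_f16(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    if (exp == 0) {                      /* zero or subnormal: mant * 2^-24 */
        float mag = (float)mant / 16777216.0f;
        return sign ? -mag : mag;
    }
    if (exp == 0x1F)                     /* infinity or NaN */
        return bits_to_float(sign | 0x7F800000u | (mant << 13));
    /* Normal: rebias the exponent from 15 to 127 and widen the mantissa. */
    return bits_to_float(sign | ((exp + (127 - 15)) << 23) | (mant << 13));
}

/* Dispatching on the semantics tag, not the width, picks the right decoder. */
static float decode16(enum fsem sem, uint16_t bits) {
    return sem == SEM_BFLOAT ? decode_bf16(bits) : decode_f16(bits);
}
```

A dispatch keyed only on "16 bits" would have to pick one of these two decoders for both formats and would be wrong for the other.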
Delta     File
+90 -90   mlir/lib/Dialect/Math/IR/MathOps.cpp
+90 -90   1 file

LLVM/project 3ec9bbc llvm/test/Transforms/DeadStoreElimination fence.ll fence-todo.ll

[DSE] Merge two test files and generate checks (NFC) (#193922)

Merge the todo file into the main test, showing current codegen.
Delta     File
+81 -17   llvm/test/Transforms/DeadStoreElimination/fence.ll
+0 -50    llvm/test/Transforms/DeadStoreElimination/fence-todo.ll
+81 -67   2 files

LLVM/project 297fb93 llvm/test/Transforms/LICM atomics.ll

[LICM] Generate test checks (NFC) (#193921)
Delta      File
+149 -45   llvm/test/Transforms/LICM/atomics.ll
+149 -45   1 file

LLVM/project 4852c5b llvm/lib/Target/ARM ARMInstrThumb2.td, llvm/test/MC/ARM thumb-hvc-virt-diagnostics.s

[ARM][MC] Gate Thumb hvc alias on virtualization (#193532)
Delta    File
+19 -0   llvm/test/MC/ARM/thumb-hvc-virt-diagnostics.s
+3 -1    llvm/lib/Target/ARM/ARMInstrThumb2.td
+22 -1   2 files

LLVM/project c55a73c lldb/source/Commands CommandObjectMemory.cpp CommandObjectCommands.cpp

[lldb] Remove full stop from AppendErrorWithFormat format strings (part 1) (#193750)

To fit the style guide:
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages

I found these with:
* Find `(\.AppendErrorWithFormat\(([\s\r\n]+)?"(?:(?:\\.|[^"\\])*))\."`
  and replace with `$1"`, using Visual Studio Code.
* Temporarily putting a call to `validate_diagnostic` in `AppendErrorWithFormat`.
* Manual inspection.

Note that this change *does not* include a call to `validate_diagnostic`
because I do not know what's going to crash on platforms that I haven't
tested on.
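The effect of the regex replacement on a single message can be sketched as a small C helper. `strip_trailing_full_stop` is hypothetical and is not lldb's `validate_diagnostic`, whose exact checks are not described here; it only removes one full stop that ends the string, which is what the replacement `$1"` does to a matched format string.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: drop a single full stop that ends the message, as
   the style guide asks. Returns true if the message was changed. */
static bool strip_trailing_full_stop(char *msg) {
    size_t len = strlen(msg);
    if (len == 0 || msg[len - 1] != '.')
        return false;
    msg[len - 1] = '\0';
    return true;
}
```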
Delta     File
+26 -27   lldb/source/Commands/CommandObjectMemory.cpp
+16 -17   lldb/source/Commands/CommandObjectCommands.cpp
+15 -15   lldb/source/Commands/CommandObjectFrame.cpp
+5 -5     lldb/source/Commands/CommandObjectDisassemble.cpp
+3 -3     lldb/source/Commands/CommandObjectBreakpointCommand.cpp
+3 -3     lldb/source/Commands/CommandObjectBreakpoint.cpp
+68 -70   shown above; 4 files not shown
+74 -77   10 files

LLVM/project 340ba11 llvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_memory_access_aliasing alias-load-store-atomic.ll

[SPIRV] Do not add aliasing decorations to OpAtomicStore/OpAtomicLoad (#193779)

Do not attach aliasing decorations to atomic store or atomic load
intrinsics/instructions, since the extension is inconsistent at the moment:
the decoration cannot be added to atomic stores because they do not have a
result id.
Delta     File
+6 -19    llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+4 -9     llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_memory_access_aliasing/alias-load-store-atomic.ll
+10 -28   2 files

LLVM/project bd469e8 llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[SDAG] Minor cleanup to TargetLowering::expandFP_ROUND. NFC (#193793)

Noticed while porting this to GISel: constants might as well be placed on the
RHS of an add, and a bitcast from a type to the same type can be removed.
Delta   File
+1 -2   llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+1 -2   1 file

LLVM/project aa0db4f llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch ctpop-with-lsx.ll sextw-removal.ll

[LoongArch] Optimize for scalar type `ctpop` when lsx enabled
DeltaFile
+32-55llvm/test/CodeGen/LoongArch/ctpop-with-lsx.ll
+38-22llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+11-17llvm/test/CodeGen/LoongArch/sextw-removal.ll
+10-17llvm/test/CodeGen/LoongArch/ctlz-cttz-ctpop.ll
+21-3llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+112-1145 files

LLVM/project 6769876 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp

Address wanglei's comments
Delta   File
+2 -6   llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+2 -6   1 file

LLVM/project 7d3e13c llvm/lib/Target/LoongArch LoongArchInstrInfo.td, llvm/test/CodeGen/LoongArch ctlz-cttz-ctpop.ll

[LoongArch] Add patterns to match `cto.w/d` when meeting i8/i16 types `not+cttz`
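Why an i8/i16 `not` + `cttz` can map onto a 32-bit count-trailing-ones instruction can be checked exhaustively in C. For a zero-extended value, the run of trailing ones stops at the first zero bit, which is at the latest the bit just above the original width, so the count equals `cttz` of the complement taken in the original width. This identity is our reading of the change, not code from the patch; `cttz_width` and `cto32` are hypothetical helpers, and `cto32` only models what `cto.w` is assumed to compute.

```c
#include <stdint.h>

/* Count trailing zeros of the low `width` bits of x (width <= 32);
   returns `width` when those bits are all zero. */
static unsigned cttz_width(uint32_t x, unsigned width) {
    unsigned n = 0;
    while (n < width && !(x & (UINT32_C(1) << n)))
        ++n;
    return n;
}

/* Model of a 32-bit count-trailing-ones operation such as cto.w. */
static unsigned cto32(uint32_t x) {
    unsigned n = 0;
    while (n < 32 && (x & (UINT32_C(1) << n)))
        ++n;
    return n;
}
```

For every 8-bit `x`, `cttz_width(~x & 0xFF, 8) == cto32(x)`, and likewise for 16-bit values with the mask `0xFFFF`, including the all-ones case where both sides equal the width.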
Delta     File
+8 -12    llvm/test/CodeGen/LoongArch/ctlz-cttz-ctpop.ll
+4 -0     llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
+12 -12   2 files

LLVM/project 1634916 llvm/lib/Target/LoongArch LoongArchLASXInstrInfo.td LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx ctpop-ctlz.ll

[LoongArch] Add patterns for `[x]vclo.{b/h/w/d}` instructions
DeltaFile
+4-8llvm/test/CodeGen/LoongArch/lsx/ctpop-ctlz.ll
+4-8llvm/test/CodeGen/LoongArch/lasx/ctpop-ctlz.ll
+6-0llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+6-0llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+20-164 files

LLVM/project 347aa3f llvm/test/CodeGen/AArch64/Atomics aarch64-atomicrmw-lse2.ll aarch64-atomicrmw-v8a.ll, llvm/test/CodeGen/AArch64/GlobalISel arm64-atomic.ll

[GISel] Disable opt_brcond_by_inverting_cond combine at O0 (#193417)

This combine should not be necessary at O0, and disabling it reduces geomean
compile time by 0.39% on stage1-aarch64-O0-g CTMark [1] with no change to
code size [2].

Since code size without -g (cmake/caches/O0.cmake) is not tracked on
llvm-compile-time-tracker, I also measured it locally and found no change.

[1] https://llvm-compile-time-tracker.com/compare.php?from=f2efeabe314bb0a9b1ef46c07b11f605ed351b9c&to=3d28cf0bb7fb2fc5baebc4b5de09b735c1115e7f&stat=instructions%3Au
[2] https://llvm-compile-time-tracker.com/compare.php?from=f2efeabe314bb0a9b1ef46c07b11f605ed351b9c&to=3d28cf0bb7fb2fc5baebc4b5de09b735c1115e7f&stat=size-total
Delta           File
+1,250 -1,305   llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lse2.ll
+1,250 -1,305   llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-v8a.ll
+1,250 -1,305   llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-rcpc3.ll
+1,250 -1,305   llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-rcpc.ll
+615 -615       llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+360 -360       llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-v8a_fp.ll
+5,975 -6,195   shown above; 15 files not shown
+6,359 -6,750   21 files

LLVM/project efffb04 llvm/lib/Target/DirectX/DXILWriter DXILBitcodeWriter.cpp, llvm/test/tools/dxil-dis debug-info.ll

[DirectX] Fix DILocalVariable (#192573)

LLVM 3.7 did not allow the DW_TAG_variable tag for local variables; it used
two custom tags instead.
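A sketch of the tag choice, under stated assumptions: current LLVM distinguishes a parameter from other locals only through the variable's argument number (0 for a non-parameter), while LLVM 3.x used the LLVM-specific tags `DW_TAG_auto_variable` and `DW_TAG_arg_variable`. The numeric values below are as we recall them from pre-3.8 `Dwarf.h` and should be checked against the patch; `legacy_local_variable_tag` is a hypothetical name.

```c
#include <stdint.h>

/* Tag values: 0x34 is the standard DWARF tag; the two LLVM-specific values
   are recalled from pre-3.8 LLVM headers (assumption, verify). */
enum {
    TAG_VARIABLE      = 0x34,   /* DW_TAG_variable */
    TAG_AUTO_VARIABLE = 0x100,  /* DW_TAG_auto_variable (LLVM 3.x) */
    TAG_ARG_VARIABLE  = 0x101   /* DW_TAG_arg_variable (LLVM 3.x) */
};

/* Hypothetical mapping from a modern local variable's argument number to
   the tag a 3.7-era reader expects instead of DW_TAG_variable. */
static uint32_t legacy_local_variable_tag(uint32_t arg_no) {
    return arg_no != 0 ? TAG_ARG_VARIABLE : TAG_AUTO_VARIABLE;
}
```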
Delta   File
+3 -3   llvm/test/tools/dxil-dis/debug-info.ll
+4 -1   llvm/lib/Target/DirectX/DXILWriter/DXILBitcodeWriter.cpp
+7 -4   2 files

LLVM/project 3de3198 llvm/lib/Target/DirectX/DXILWriter DXILBitcodeWriter.cpp, llvm/test/tools/dxil-dis vla.ll

[DirectX] Replace non-const count of DISubrange with -1 (#192576)

A non-constant count is only emitted for C99 VLAs, which are not supported.

Co-authored-by: Andrew Savonichev <andrew.savonichev at gmail.com>
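The rule in this change amounts to a one-line mapping, sketched here with a hypothetical C struct in place of LLVM's `DISubrange`, whose count can be either a constant or a non-constant value (for example a variable, as for a C99 VLA).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a subrange count: a constant, or something else. */
struct subrange_count {
    bool is_constant;
    int64_t value;   /* meaningful only when is_constant is true */
};

/* Count written to the legacy record: non-constant counts become -1. */
static int64_t legacy_subrange_count(struct subrange_count c) {
    return c.is_constant ? c.value : -1;
}
```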
Delta     File
+38 -0    llvm/test/tools/dxil-dis/vla.ll
+14 -10   llvm/lib/Target/DirectX/DXILWriter/DXILBitcodeWriter.cpp
+52 -10   2 files

LLVM/project ee88788 llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLASXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx/ir-instruction avgfloor-ceil.ll

[LoongArch] Set `avg{floor/ceil}{s/u}` as legal for lsx and lasx

Suggested-by: tangaac <tangyan01 at loongson.cn>
Link: https://github.com/llvm/llvm-project/pull/161079#issuecomment-3420763377
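For reference, the generic average operations being made legal compute the average of two integers as if in wider precision, rounding down (`avgfloor`) or up (`avgceil`). The unsigned 8-bit forms can be written without widening using the identities a + b = 2(a & b) + (a ^ b) and a + b = 2(a | b) - (a ^ b). The helper names are hypothetical, and the signed variants (which use an arithmetic rather than logical shift) are omitted.

```c
#include <stdint.h>

/* floor((a + b) / 2) without overflowing 8 bits. */
static uint8_t avgflooru8(uint8_t a, uint8_t b) {
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* ceil((a + b) / 2) without overflowing 8 bits. */
static uint8_t avgceilu8(uint8_t a, uint8_t b) {
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}
```

Comparing both helpers against plain `int` arithmetic over all 65,536 input pairs confirms the identities.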
Delta      File
+16 -64    llvm/test/CodeGen/LoongArch/lasx/ir-instruction/avgfloor-ceil.ll
+16 -64    llvm/test/CodeGen/LoongArch/lsx/ir-instruction/avgfloor-ceil.ll
+8 -0      llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+4 -0      llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+4 -0      llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+48 -128   5 files

LLVM/project 8b3eac0 llvm/lib/Target/DirectX/DXILWriter DXILBitcodeWriter.cpp, llvm/test/tools/dxil-dis di-compile-unit-versioned-language.ll

[DirectX] Convert DICompileUnit versioned language (#192574)

Versioned languages did not exist in LLVM 3.7.
Delta    File
+13 -0   llvm/test/tools/dxil-dis/di-compile-unit-versioned-language.ll
+8 -1    llvm/lib/Target/DirectX/DXILWriter/DXILBitcodeWriter.cpp
+21 -1   2 files