LLVM/project a3deda2llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp

declare and assign immediately if possible

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
DeltaFile
+1-2llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+1-21 files

LLVM/project 73de4c7llvm/lib/Target/AMDGPU VOP3PInstructions.td

[NFC][AMDGPU] Improve the predicate uses for WMMAs (#199807)
DeltaFile
+71-67llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+71-671 files

LLVM/project af8f3d5llvm/include/llvm/IR MemoryModelRelaxationAnnotations.h

fix the argument reference in doxygen

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve at amd.com>
DeltaFile
+1-1llvm/include/llvm/IR/MemoryModelRelaxationAnnotations.h
+1-11 files

LLVM/project 90fbd19llvm/utils/lit/lit TestRunner.py cl_arguments.py, llvm/utils/lit/tests fn-selection.py

[lit] Add --fn to prepend llvm-extract for function-level test narrowing

Add a --fn=name1,name2 flag to llvm-lit that prepends
llvm-extract --func=<name> to the first pipeline stage of each
RUN line whose first stage references %s. This lets users narrow
IR tests to specific functions and their dependencies without
modifying test files.
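
A minimal sketch of the rewriting described above, not the code from the patch. The helper name `prepend_llvm_extract`, the naive split on `|`, and the choice to feed the extracted module to the original tool on stdin (replacing `%s` with `-`) are all assumptions made for illustration.

```python
import shlex


def prepend_llvm_extract(run_line, functions):
    """Hypothetical helper: narrow one RUN line to the named functions.

    The line is rewritten only when its first pipeline stage mentions %s;
    otherwise it is returned unchanged.
    """
    # Naive split: does not handle '||' or quoted '|' characters.
    stages = run_line.split("|")
    first = stages[0].strip()
    if not functions or "%s" not in first:
        return run_line
    func_args = " ".join("--func=" + shlex.quote(name) for name in functions)
    # Assumption: llvm-extract writes textual IR to stdout and the original
    # first stage reads it from stdin, so its %s becomes "-".
    extract = f"llvm-extract {func_args} -S %s -o -"
    narrowed_first = first.replace("%s", "-")
    rest = [stage.strip() for stage in stages[1:]]
    return " | ".join([extract, narrowed_first] + rest)


if __name__ == "__main__":
    print(prepend_llvm_extract(
        "opt -passes=instcombine -S %s | FileCheck %s", ["foo", "bar"]))
```

Later stages such as `FileCheck %s` keep their `%s`, since only the first stage is narrowed.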
DeltaFile
+30-0llvm/utils/lit/lit/TestRunner.py
+25-0llvm/utils/lit/tests/fn-selection.py
+17-0llvm/utils/lit/lit/cl_arguments.py
+11-0llvm/utils/lit/tests/Inputs/fn-selection/lit.cfg
+2-0llvm/utils/lit/tests/Inputs/fn-selection/sample.ll
+2-0llvm/utils/lit/tests/Inputs/fn-selection/mock-bin/llvm-extract
+87-02 files not shown
+90-08 files

LLVM/project 08341d6llvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, llvm/test/CodeGen/X86 horizontal-reduce-umax.ll

Merge branch 'main' into users/chenshanzhi/AArch64-TTI-getTgtMemIntrinsic
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+0-2,353llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+13,851-4,8012,885 files not shown
+96,729-52,0652,891 files

LLVM/project 0e3a415llvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

rebase

Created using spr 1.3.4
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,673 files not shown
+66,236-28,8131,679 files

LLVM/project a5a009fllvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir, mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.4

[skip ci]
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+3,075-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+0-2,280mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+16,047-2,4481,673 files not shown
+66,236-28,8131,679 files

LLVM/project 1f63ebcllvm/lib/Target/AMDGPU VOP3PInstructions.td

[NFC][AMDGPU] Improve the predicate uses for WMMAs
DeltaFile
+71-67llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+71-671 files

LLVM/project 1b19eccbolt/include/bolt/Profile DataAggregator.h, bolt/lib/Profile DataAggregator.cpp

[BOLT][NFC] Split out function marking from profile parsing

Move out `setHasProfileAvailable` into `markFunctionsWithProfile`.
This also allows extracting per-pre-aggregated type handling in
`parseAggregatedLBREntry` into a switch statement.

Test Plan:
NFC

Processing time change (wall time):
* 10MB pre-aggregated profile:
  - Parsing aggregated branch events: 0.16s -> 0.05s
  - Pre-process profile data (parsing+marking): 0.18s -> 0.16s

* 6GB perf.data file:
  - Parsing branch events: 29.06s -> 28.55s
  - Pre-process profile data (excluding perf script): 29.47s -> 29.13s

Reviewers:

    [2 lines not shown]
DeltaFile
+55-36bolt/lib/Profile/DataAggregator.cpp
+3-0bolt/include/bolt/Profile/DataAggregator.h
+58-362 files

LLVM/project 40602b6lld/COFF Writer.cpp, lld/test/COFF ctors_dtors_priority.s

[LLD][COFF] Gate second-dot section-name stripping on MinGW (#199625)

The comment in getOutputSectionName has always called the second-dot
stripping "for MinGW" (e.g. .ctors.NNNN), but the code applied it on
every target. This hides a split-dwarf bug #199616.

Take an isMinGW gate and skip the stripping when it is false.
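
A rough Python model of the gated behaviour, for illustration only. The function name `output_section_name` and the `.foo.bar` input are made up; the `$` handling is an assumption about what precedes the second-dot stripping.

```python
def output_section_name(name, is_mingw):
    """Hypothetical model of mapping an input section name to an output one."""
    # Assumption: anything after '$' only orders contributions and is dropped.
    name = name.split("$", 1)[0]
    if not is_mingw:
        # Non-MinGW targets keep names with a second dot intact.
        return name
    # MinGW only: ".ctors.NNNN" -> ".ctors" (cut at the second dot).
    second_dot = name.find(".", 1)
    return name if second_dot == -1 else name[:second_dot]


if __name__ == "__main__":
    for mingw in (True, False):
        print(mingw, output_section_name(".ctors.01234", mingw),
              output_section_name(".foo.bar", mingw))
```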
DeltaFile
+11-0lld/test/COFF/ctors_dtors_priority.s
+4-2lld/COFF/Writer.cpp
+15-22 files

LLVM/project c12dc7dllvm/test/Transforms/EarlyCSE/AArch64 intrinsics-1xN.ll

fixup! [AArch64][TTI][EarlyCSE] Add support for ld1xN and st1xN intrinsics
DeltaFile
+18-0llvm/test/Transforms/EarlyCSE/AArch64/intrinsics-1xN.ll
+18-01 files

LLVM/project 7e6f337llvm/test/tools/llvm-mca/AArch64/Cortex A510-writeback.s A53-writeback.s

[llvm-mca] Fix total execution count in Average Wait times (#199500)

Fix the column `0` for the `<total>` row in llvm-mca's `Average Wait times` report. The `total`
row now represents the total dynamic execution count used to normalize the averages, 
instead of the per-instruction iteration count. Update the timeline view docs and autogenerated
test expectations accordingly.

Co-authored-by: liuxiaodong <liuxiaodong at sunmmio.com>
DeltaFile
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/A510-writeback.s
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/A53-writeback.s
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/A55-writeback.s
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/A57-writeback.s
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/C1Premium-writeback.s
+91-91llvm/test/tools/llvm-mca/AArch64/Cortex/C1Ultra-writeback.s
+546-546289 files not shown
+2,323-2,323295 files

LLVM/project d139f65llvm/test/Transforms/SLPVectorizer/AArch64 lcssa-phi-extract-scale.ll

[SLP][NFC]Add another test for external phi user, NFC



Pull Request: https://github.com/llvm/llvm-project/pull/199804
DeltaFile
+217-1llvm/test/Transforms/SLPVectorizer/AArch64/lcssa-phi-extract-scale.ll
+217-11 files

LLVM/project 0331d5eflang/lib/Lower/OpenMP Utils.cpp OpenMP.cpp, flang/test/Lower/OpenMP/Todo metadirective-structured-trait-property.f90

Guard unsupported metadirective trait matching
DeltaFile
+20-0flang/test/Lower/OpenMP/Todo/metadirective-structured-trait-property.f90
+6-3flang/lib/Lower/OpenMP/Utils.cpp
+1-0flang/lib/Lower/OpenMP/OpenMP.cpp
+27-33 files

LLVM/project 7938535llvm/include/llvm/IR GlobalValue.h, llvm/include/llvm/Transforms/Utils AssignGUID.h

Compute GUIDs once and store in metadata (#184065)

This allows us to keep GUIDs consistent across compilation phases which
may change the name or linkage type.

See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801

This is a large change since the addition of metadata breaks many tests.
The test changes are mostly just trivial changes to checks to get them
passing.
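
To illustrate why storing the GUID matters, here is a small sketch under stated assumptions: the `GlobalValue` class, the `"guid"` metadata key, and the name-based hash are stand-ins for illustration, not the representation used by the patch.

```python
import hashlib


def name_guid(name):
    """Stand-in hash: 64 bits taken from the MD5 digest of the name."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")


class GlobalValue:
    """Toy global with a mutable name and a metadata dictionary."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}


def assign_guid(gv):
    """Compute the GUID on first use and reuse the stored value afterwards."""
    if "guid" not in gv.metadata:  # hypothetical metadata key
        gv.metadata["guid"] = name_guid(gv.name)
    return gv.metadata["guid"]


if __name__ == "__main__":
    f = GlobalValue("foo")
    before = assign_guid(f)
    f.name = "foo.renamed"  # a later phase renames the symbol
    print(assign_guid(f) == before)    # stored GUID survives the rename
    print(name_guid(f.name) == before)  # recomputing from the name would not
```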
DeltaFile
+61-17llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+45-30llvm/lib/LTO/LTO.cpp
+55-0llvm/lib/IR/Globals.cpp
+49-3llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+42-5llvm/include/llvm/IR/GlobalValue.h
+42-0llvm/include/llvm/Transforms/Utils/AssignGUID.h
+294-55110 files not shown
+735-367116 files

LLVM/project e0ef143lld/test/wasm map-file.s global-base.test

[lld][WebAssembly] Only include __stack_pointer when needed (#199739)
DeltaFile
+18-19lld/test/wasm/map-file.s
+6-18lld/test/wasm/global-base.test
+5-17lld/test/wasm/stack-first.test
+2-14lld/test/wasm/table-base.s
+4-10lld/test/wasm/globals.s
+7-7lld/test/wasm/merge-string-debug.s
+42-8524 files not shown
+63-19730 files

LLVM/project e6d8a8fclang/docs ReleaseNotes.rst, clang/lib/CodeGen CodeGenModule.cpp

[Clang] Emit prefix map normalization before generating hashes for the unique linkage names. (#198667)

Use the normalized path from the macro prefix map to generate the unique
IDs for internal linkage names. This makes the hash reproducible on any
build system. Normally, the macro prefix map is normalized for the
target system before the path substitution.
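
The idea can be sketched as follows. This is an illustration under assumptions, not Clang's code: the helper names, the `(old, new)` list shape of the prefix map, and the use of an MD5 hex digest as the unique suffix are all hypothetical.

```python
import hashlib


def apply_prefix_map(path, prefix_map):
    """Replace the first matching prefix, in the order the entries are given."""
    for old, new in prefix_map:
        if path.startswith(old):
            return new + path[len(old):]
    return path


def unique_suffix(path, prefix_map):
    """Hash the remapped path so the result does not depend on the build root."""
    normalized = apply_prefix_map(path, prefix_map)
    return hashlib.md5(normalized.encode()).hexdigest()


if __name__ == "__main__":
    a = unique_suffix("/home/alice/src/a.cpp", [("/home/alice/src", ".")])
    b = unique_suffix("/tmp/ci/a.cpp", [("/tmp/ci", ".")])
    print(a == b)  # same hash from two different checkout locations
```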
DeltaFile
+33-0clang/test/CodeGen/unique-internal-linkage-names.cpp
+4-6clang/lib/CodeGen/CodeGenModule.cpp
+5-0clang/docs/ReleaseNotes.rst
+42-63 files

LLVM/project 2bd872bllvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize alias-mask.ll alias-mask-negative-tests.ll

[LV] Add support for partial alias masking with tail folding (#182457)

This patch adds basic support for partial alias masking, which allows
entering the vector loop even when there is aliasing within a single
vector iteration. It does this by clamping the VF to the safe distance
between pointers. This allows the runtime VF to be anywhere from 2 to
the "static" VF.

Conceptually, this transform looks like:

```
  // `c` and `b` may alias.
  for (int i = 0; i < n; i++) {
    c[i] = a[i] + b[i];
  }
```

->


    [33 lines not shown]
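
As a rough model of the clamping idea (not the elided transform from the commit message), the sketch below places `b` and `c` in one buffer a fixed number of elements apart and shows that a vector loop stays equivalent to the scalar loop once the VF is clamped to that distance. The function names are hypothetical, the distance is assumed to be at least 1, and the lower bound of 2 on the runtime VF is not modeled.

```python
def scalar_loop(mem, a, distance, n):
    """Reference: c[i] = a[i] + b[i], with b = mem[0:] and c = mem[distance:]."""
    mem = list(mem)
    for i in range(n):
        mem[distance + i] = a[i] + mem[i]
    return mem


def vector_loop(mem, a, distance, n, vf):
    """Model of a tail-folded vector loop with the given VF."""
    mem = list(mem)
    for i in range(0, n, vf):
        lanes = min(vf, n - i)  # tail folding: lanes past n are masked off
        # All loads of one vector iteration happen before any of its stores.
        sums = [a[i + k] + mem[i + k] for k in range(lanes)]
        for k in range(lanes):
            mem[distance + i + k] = sums[k]
    return mem


def clamped_vf(static_vf, distance):
    """Clamp the VF to the safe distance between the two pointers."""
    return min(static_vf, distance)


if __name__ == "__main__":
    a = list(range(1, 11))
    mem = [1, 1, 1] + [0] * 10  # c starts 3 elements after b
    expected = scalar_loop(mem, a, 3, 10)
    print(vector_loop(mem, a, 3, 10, clamped_vf(8, 3)) == expected)
    print(vector_loop(mem, a, 3, 10, 8) == expected)
```

With the clamped VF, every load in a vector iteration reads an element that earlier iterations have already finished writing, which is why the result matches the scalar loop.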
DeltaFile
+356-0llvm/test/Transforms/LoopVectorize/alias-mask.ll
+270-0llvm/test/Transforms/LoopVectorize/AArch64/alias-mask-uniforms.ll
+264-0llvm/test/Transforms/LoopVectorize/AArch64/alias-mask.ll
+200-0llvm/test/Transforms/LoopVectorize/alias-mask-negative-tests.ll
+159-10llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+129-4llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+1,378-1425 files not shown
+2,127-7831 files

LLVM/project f4352c9clang/lib/Parse ParseExprCXX.cpp ParseStmt.cpp

reorder parsing logic
DeltaFile
+9-11clang/lib/Parse/ParseExprCXX.cpp
+3-1clang/lib/Parse/ParseStmt.cpp
+12-122 files

LLVM/project 9eb0d42llvm/utils/lit/lit TestRunner.py util.py, llvm/utils/lit/lit/formats shtest.py

[lit][NFC] remove future statements for mandatory features in Python 3 (#199786)

This patch removes future statements from lit for features that are
mandatory in Python 3.0 and later.

Specifically, it removes future statements for
[`absolute_import`](https://docs.python.org/3/library/__future__.html#future__.absolute_import)
and
[`print_function`](https://docs.python.org/3/library/__future__.html#future__.print_function),
since both became mandatory in Python 3.0.
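
A quick way to confirm this from Python itself is to query the `__future__` module, which records the release in which each feature became mandatory. The helper name `mandatory_releases` is only for this illustration.

```python
import __future__


def mandatory_releases(feature_names):
    """Map each __future__ feature name to the (major, minor) it became mandatory in."""
    result = {}
    for name in feature_names:
        feature = getattr(__future__, name)
        result[name] = feature.getMandatoryRelease()[:2]
    return result


if __name__ == "__main__":
    for name, release in mandatory_releases(
            ["absolute_import", "print_function"]).items():
        print(name, "mandatory since", release)
```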
DeltaFile
+1-1llvm/utils/lit/lit/TestRunner.py
+0-2llvm/utils/lit/lit/formats/shtest.py
+0-2llvm/utils/lit/lit/util.py
+0-2llvm/utils/lit/tests/Inputs/check_path.py
+0-1llvm/utils/lit/tests/Inputs/shtest-timeout/short.py
+0-1llvm/utils/lit/lit/LitConfig.py
+1-95 files not shown
+1-1411 files

LLVM/project e918a5allvm/utils/lit/lit TestRunner.py Test.py, llvm/utils/lit/lit/llvm subst.py

[lit][NFC] remove new-style class opt-ins (#199784)

In Python 3.0 and later it is no longer necessary to explicitly derive
from `object` to opt into "new-style" classes; they are the default.

Since the current minimum Python version is 3.8, this is no longer
required. This patch removes `object` from the base class lists of all
affected classes in lit.
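
The equivalence is easy to check in Python 3. The class names below are illustrative and are not taken from lit.

```python
class Implicit:
    """No explicit base: still a new-style class in Python 3."""


class Explicit(object):
    """Explicit base, as the old lit code spelled it."""


def bases_match():
    """Both spellings end up with `object` as their only direct base."""
    return Implicit.__bases__ == Explicit.__bases__ == (object,)


if __name__ == "__main__":
    print(bases_match())
    print(Implicit.__mro__, Explicit.__mro__)
```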
DeltaFile
+4-4llvm/utils/lit/lit/TestRunner.py
+3-3llvm/utils/lit/lit/Test.py
+3-3llvm/utils/lit/lit/display.py
+2-2llvm/utils/lit/lit/ShellEnvironment.py
+2-2llvm/utils/lit/lit/llvm/subst.py
+1-1llvm/utils/lit/lit/run.py
+15-155 files not shown
+20-2011 files

LLVM/project 577e9a7lld/test/wasm tls-libcall.s, lld/wasm Driver.cpp Writer.cpp

[WebAssembly] WASIP3 Library Call Thread Context Support (#175800)

The [WebAssembly Component
Model](https://component-model.bytecodealliance.org/) has added support
for [cooperative
multithreading](https://github.com/WebAssembly/component-model/pull/557).
This has been implemented in the [Wasmtime
engine](https://github.com/bytecodealliance/wasmtime/pull/11751) and is
part of the wider project of [WASI preview
3](https://wasi.dev/roadmap#upcoming-wasi-03-releases), which is
currently tracked
[here](https://github.com/orgs/bytecodealliance/projects/16).

These changes require updating the way that `__stack_pointer` and
`__tls_base` work purely for a new `wasm32-wasip3` target; other targets
will not be touched. Specifically, rather than using a Wasm global for
tracking the stack pointer and TLS base, the new
[`context.get/set`](https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md#-canon-contextget)
component model builtin functions will be used (the intention being that

    [15 lines not shown]
DeltaFile
+71-0lld/test/wasm/tls-libcall.s
+47-20llvm/lib/Target/WebAssembly/WebAssemblyFrameLowering.cpp
+41-6lld/wasm/Driver.cpp
+39-0llvm/test/DebugInfo/WebAssembly/thread-context-abi.ll
+24-7lld/wasm/Writer.cpp
+31-0llvm/test/CodeGen/WebAssembly/tls-local-exec.ll
+253-3324 files not shown
+472-8430 files

LLVM/project fc60e08.github/workflows libc-overlay-tests.yml

[libc] Use containers for overlay precommit CIs. (#199294)
DeltaFile
+12-23.github/workflows/libc-overlay-tests.yml
+12-231 files

LLVM/project e16c5f1llvm/lib/Target/AMDGPU SIFrameLowering.cpp

Fixes for buildbot breaks in #183153

Attempt at fixing issues in #183153 caught by buildbots, specifically
no-assert and windows builds.

Not sure how to run those bots ahead of landing this?

Change-Id: I285adf09ac2df239d0ab05459f7388b6970247ad
DeltaFile
+7-8llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+7-81 files

LLVM/project bf420f0llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 interleaved-store-noninbounds-gep.ll

[AArch64] Fix hasNearbyPairedStore to handle non-inbounds GEPs (#199137)

Problem: `hasNearbyPairedStore` uses
`stripAndAccumulateInBoundsConstantOffsets` to decompose store pointers
into (base, offset) pairs and check whether two stores are 16 bytes
apart. This fails when LSR has rewritten pointer arithmetic into
non-inbounds GEPs because the function refuses to look through them. The
two stores then appear to have different base pointers and the check
returns false. When this happens, `lowerInterleavedStore` proceeds to
emit `ST2` for a pattern that would be more profitable as `zip+stp`,
since the load-store optimizer can pair adjacent stores into `STP` but
cannot merge `ST2` with anything. On a bf16-to-fp32 NEON conversion loop
this causes a regression from 11 to 17 instructions per iteration.
Note: Interleaved stores support was added for RISCV in
https://github.com/llvm/llvm-project/pull/115354. Turning this off
produces the desired STP instructions.

https://godbolt.org/z/1afsjPd3e


    [7 lines not shown]
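
The failure mode in the Problem paragraph can be modeled with a toy pointer chain. This is a sketch, not the LLVM implementation: `Ptr`, `strip_offsets`, and `stores_are_16_bytes_apart` are hypothetical names, and looking through non-inbounds GEPs is shown only as one plausible shape for the fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    """Toy pointer: either a root (base is None) or a constant-offset GEP."""
    name: str
    base: Optional["Ptr"] = None
    offset: int = 0          # constant byte offset added by this GEP
    inbounds: bool = False


def strip_offsets(ptr, inbounds_only):
    """Walk up the GEP chain, accumulating offsets, and return (base, total)."""
    total = 0
    while ptr.base is not None and (ptr.inbounds or not inbounds_only):
        total += ptr.offset
        ptr = ptr.base
    return ptr, total


def stores_are_16_bytes_apart(p, q, inbounds_only):
    """True when both pointers share a base and their offsets differ by 16."""
    base_p, off_p = strip_offsets(p, inbounds_only)
    base_q, off_q = strip_offsets(q, inbounds_only)
    return base_p is base_q and abs(off_p - off_q) == 16


if __name__ == "__main__":
    dst = Ptr("dst")
    lo = Ptr("gep.lo", base=dst, offset=32)  # non-inbounds, e.g. after LSR
    hi = Ptr("gep.hi", base=dst, offset=48)
    print(stores_are_16_bytes_apart(lo, hi, inbounds_only=True))   # bases differ
    print(stores_are_16_bytes_apart(lo, hi, inbounds_only=False))  # same base
```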
DeltaFile
+44-0llvm/test/CodeGen/AArch64/interleaved-store-noninbounds-gep.ll
+4-3llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+48-32 files

LLVM/project 187dfefflang/include/flang/Optimizer/Builder FIRBuilder.h, flang/include/flang/Optimizer/Dialect FIRBoxUtils.h

[flang] Enabled pulling of rebox into array_coor. (#199161)

This patch enables pulling slicing `fir.rebox` operations
into `fir.array_coor`. This helps preserve information about
the original rank of the array being accessed.
`FIRToMemRef` and later passes may benefit from this.

Assisted by: Claude
DeltaFile
+42-13flang/test/Fir/array-coor-canonicalization.fir
+42-0flang/lib/Optimizer/Dialect/FIRBoxUtils.cpp
+29-2flang/lib/Optimizer/Dialect/FIROps.cpp
+29-0flang/include/flang/Optimizer/Dialect/FIRBoxUtils.h
+0-26flang/lib/Optimizer/Builder/FIRBuilder.cpp
+3-10flang/include/flang/Optimizer/Builder/FIRBuilder.h
+145-511 files not shown
+146-517 files

LLVM/project f3c0f26flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef array-coor-rebox-slice-shape.mlir array-coor-slice-shift.mlir

[flang][FIRToMemRef] Get strides from descriptor for some array_coor cases. (#199158)

This is a follow-up on Jean's comment
https://github.com/llvm/llvm-project/pull/198933#discussion_r3279535746

This patch makes use of the descriptor strides when `fir.array_coor`'s
memref is a `fir.box` that is not a fir.embox result.
DeltaFile
+28-14flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+30-0flang/test/Transforms/FIRToMemRef/array-coor-rebox-slice-shape.mlir
+11-8flang/test/Transforms/FIRToMemRef/array-coor-slice-shift.mlir
+69-223 files

LLVM/project f263446clang/lib/Driver/ToolChains AMDGPU.cpp, clang/test/Driver amdgpu-validate-sanitize.cl

clang/AMDGPU: Report all runtimeless sanitizers as available (#199642)
DeltaFile
+18-0clang/test/Driver/amdgpu-validate-sanitize.cl
+1-1clang/lib/Driver/ToolChains/AMDGPU.cpp
+19-12 files

LLVM/project e9e5d4eclang/lib/Driver/ToolChains AMDGPU.cpp

clang/AMDGPU: Remove unnecessary fallback to check -march (#199780)

-march is now rewritten to -mcpu.
DeltaFile
+1-4clang/lib/Driver/ToolChains/AMDGPU.cpp
+1-41 files

LLVM/project 7e98d19llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion different_guards.ll

[LoopFusion] Do not fuse loops with different guards (#199724)

The testcase that was originally contributed to #193641 exposed a
functional issue in which loop fusion can fuse loops with different
loop guards. There seem to be two distinct bugs, and each of them alone
is enough to let this happen.

- The condition that checks that the loop guards are identical is
intended to exclude loops that require peeling. But the condition is not
correct, and it allows some loops that do not require peeling to pass.

- The condition that checks that two guards are identical implicitly
assumes that the conditions of the guard branches are instructions, but
this is not always the case.

This patch fixes the problem for the loops that do not require peeling.
The issue still exists for loops that require peeling and will be fixed
separately.
DeltaFile
+45-0llvm/test/Transforms/LoopFusion/different_guards.ll
+20-15llvm/lib/Transforms/Scalar/LoopFuse.cpp
+65-152 files