LLVM/project 0677ebe llvm/lib/Transforms/Vectorize VPlanRecipes.cpp

[VPlan] Use getSingleUser to improve code (NFC) (#203882)
Delta  File
+3 -7  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+3 -7  1 file

LLVM/project 94f6b80 llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp SIISelLowering.cpp

[AMDGPU] Guard more intrinsics with target features
Delta  File
+1 -51  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+0 -42  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+0 -24  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+15 -2  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+4 -4  llvm/test/CodeGen/AMDGPU/unsupported-av-store.ll
+4 -4  llvm/test/CodeGen/AMDGPU/unsupported-av-load.ll
+24 -127  12 files not shown
+45 -143  18 files

LLVM/project 53695bb llvm/lib/Target/RISCV RISCVInstrInfoC.td RISCVInstrInfoXqci.td

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
Delta  File
+30 -30  llvm/lib/Target/RISCV/RISCVInstrInfoC.td
+25 -25  llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
+18 -18  llvm/lib/Target/RISCV/RISCVInstrInfoZc.td
+16 -16  llvm/lib/Target/RISCV/RISCVInstrInfoXwch.td
+6 -6  llvm/lib/Target/RISCV/RISCVInstrInfoZclsd.td
+1 -1  llvm/lib/Target/RISCV/RISCVInstrInfo.td
+96 -96  6 files

LLVM/project e9faee6 lldb/packages/Python/lldbsuite/test dotest.py

[lldb][test] Skip watchpoint and expression tests on WebAssembly (#204235)

WebAssembly has no watchpoint support (Process/wasm reports no
watchpoints; the stop reason comes back as a plain signal) and cannot
JIT or interpret expressions (ProcessWasm sets CanJIT to false). Teach
the existing per-platform category checks about wasm so the whole
"watchpoint" and "expression" categories are skipped, rather than
decorating each test individually.
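
The idea of skipping whole categories per platform, instead of decorating each test, can be sketched as follows. The table and function names are hypothetical and are not taken from dotest.py; only the two wasm categories come from the commit message.

```python
# Hypothetical sketch of per-platform category skipping; names are
# illustrative and are not taken from dotest.py.
UNSUPPORTED_CATEGORIES = {
    # Per the commit: no watchpoint support, and no JIT or interpreter
    # for expression evaluation on WebAssembly.
    "wasm": frozenset({"watchpoint", "expression"}),
}


def skipped_categories(platform):
    """Return the categories that are skipped wholesale on `platform`."""
    return UNSUPPORTED_CATEGORIES.get(platform, frozenset())


def should_skip(test_categories, platform):
    """A test is skipped if any of its categories is unsupported."""
    return not skipped_categories(platform).isdisjoint(test_categories)


print(should_skip(["watchpoint"], "wasm"))  # True
print(should_skip(["expression"], "linux"))  # False
```

Keeping the decision in one table means a new platform needs one entry, not an edit to every affected test.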
Delta  File
+15 -0  lldb/packages/Python/lldbsuite/test/dotest.py
+15 -0  1 file

LLVM/project 1190787 clang/lib/CodeGen CodeGenAction.cpp, llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp

[RFC][CodeGen] Add generic target feature checks for intrinsics

This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.

It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.

Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.

This PR uses one AMDGPU intrinsic as an example.
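
The check described above can be sketched as a table lookup performed before a target intrinsic is lowered. The commit does not show the TargetFeatures expression syntax. This sketch therefore assumes a requirement is a list of alternatives, each a set of features that must all be present. The intrinsic name and feature names are hypothetical.

```python
# Hypothetical model of an intrinsic-to-feature table emitted by TableGen.
# Assumption: a requirement is an OR of alternatives, each an AND of features.
INTRINSIC_FEATURES = {
    "llvm.example.needs.ab.or.c": [frozenset({"feat-a", "feat-b"}),
                                   frozenset({"feat-c"})],
}


def check_intrinsic(intrinsic, subtarget_features):
    """Return None if lowering may proceed, else a diagnostic message."""
    alternatives = INTRINSIC_FEATURES.get(intrinsic)
    if alternatives is None:
        return None  # not annotated: no generic check is performed
    if any(alt <= subtarget_features for alt in alternatives):
        return None  # at least one alternative is fully satisfied
    return f"'{intrinsic}' requires target features not enabled on this subtarget"


print(check_intrinsic("llvm.example.needs.ab.or.c", {"feat-c"}))  # None
print(check_intrinsic("llvm.example.needs.ab.or.c", {"feat-a"}) is None)  # False
```

Because the lookup is keyed only by intrinsic, both SelectionDAG and GlobalISel could share it without target-specific code.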
Delta  File
+96 -3  llvm/lib/MC/MCSubtargetInfo.cpp
+37 -0  clang/lib/CodeGen/CodeGenAction.cpp
+36 -0  llvm/lib/IR/DiagnosticInfo.cpp
+33 -1  llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp
+28 -0  llvm/test/TableGen/intrinsic-target-features.td
+25 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+255 -4  14 files not shown
+391 -9  20 files

LLVM/project 7e26ecb llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp SIISelLowering.cpp

[AMDGPU] Guard more intrinsics with target features
Delta  File
+1 -51  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+0 -42  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+0 -24  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+14 -0  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+4 -4  llvm/test/CodeGen/AMDGPU/unsupported-av-load.ll
+4 -4  llvm/test/CodeGen/AMDGPU/unsupported-av-store.ll
+23 -125  12 files not shown
+44 -141  18 files

LLVM/project b2c9be7 clang/lib/CodeGen CodeGenAction.cpp, llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp

[RFC][CodeGen] Add generic target feature checks for intrinsics

This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.

It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.

Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.

This PR uses one AMDGPU intrinsic as an example.
Delta  File
+96 -3  llvm/lib/MC/MCSubtargetInfo.cpp
+37 -0  clang/lib/CodeGen/CodeGenAction.cpp
+36 -0  llvm/lib/IR/DiagnosticInfo.cpp
+33 -1  llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp
+32 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+28 -0  llvm/test/TableGen/intrinsic-target-features.td
+262 -4  14 files not shown
+398 -9  20 files

LLVM/project cab48fa llvm/lib/Target/RISCV RISCVRegisterInfo.td, llvm/test/MC/RISCV rv32c-invalid.s rv64c-invalid.s

[𝘀𝗽𝗿] initial version

Created using spr 1.3.8-beta.1
Delta  File
+12 -12  llvm/test/MC/RISCV/rv32c-invalid.s
+4 -4  llvm/test/MC/RISCV/rv64c-invalid.s
+2 -2  llvm/test/MC/RISCV/rvc-hints-invalid.s
+2 -2  llvm/test/MC/RISCV/xqcibm-invalid.s
+1 -0  llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+21 -20  5 files

LLVM/project 4033370 clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat LUSummaryEncoding.cpp TUSummary.cpp, clang/unittests/ScalableStaticAnalysisFramework EntityLinkerTest.cpp

Revert "Reland "[clang][ssaf] Track target triple in TU and LU summaries" (#2…"

This reverts commit 9434d4ab865319c443826c2eb408329d0011dc71.
Delta  File
+1 -24  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/LUSummaryEncoding.cpp
+1 -24  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/TUSummary.cpp
+1 -24  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/LUSummary.cpp
+1 -24  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/TUSummaryEncoding.cpp
+9 -12  clang/unittests/ScalableStaticAnalysisFramework/EntityLinkerTest.cpp
+0 -16  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/JSONFormatImpl.cpp
+13 -124  170 files not shown
+175 -566  176 files

LLVM/project fee56f1 llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo

Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
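
The visible effect is that an empty triple component (the environment, in AMDGPU target IDs) is spelled out. The function below is a deliberately simplified stand-in. It only fills empty components with "unknown". The real `llvm::Triple::normalize` also recognizes and reorders components.

```python
def fill_empty_components(triple):
    """Replace empty dash-separated components with "unknown".

    Simplified stand-in for triple normalization: llvm::Triple::normalize
    also recognizes and reorders components, which this does not attempt.
    """
    return "-".join(part or "unknown" for part in triple.split("-"))


# "--" marks an empty environment component in an AMDGPU target ID.
print(fill_empty_components("amdgcn-amd-amdhsa--gfx90a"))
# amdgcn-amd-amdhsa-unknown-gfx90a
```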

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Delta  File
+79 -36  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+36 -10  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+27 -1  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+16 -0  llvm/test/MC/AMDGPU/amdgcn-target-directive-triple-env.s
+5 -5  llvm/test/MC/AMDGPU/hsa-diag-v4.s
+4 -4  llvm/test/MC/AMDGPU/isa-version-pal.s
+167 -56  15 files not shown
+197 -76  21 files

LLVM/project 4e5d348 llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU amdgcn_target_directive_from_eflags.s

AMDGPU: Teach disassembler to produce target id directives

Inspect the binary's e_flags to reproduce the .amdgcn_target directive.
This is a step towards round-trip disassembly without depending
on command line state specifying the subtarget. I wasn't sure
where to put the emission to ensure it is always emitted. I
also do not know why it's OK to just write to outs(), but that's
what the other directives here were doing.
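
Rebuilding the directive from e_flags amounts to decoding the machine field and the xnack/sramecc settings. The Python sketch below illustrates that decoding; it is not the C++ in AMDGPUDisassembler.cpp. The mask values follow the code object V4+ definitions in llvm/BinaryFormat/ELF.h and should be checked against your LLVM version. The machine-to-processor mapping is passed in, and the single entry used below is an assumed value.

```python
# e_flags fields for code object V4 and later (per llvm/BinaryFormat/ELF.h;
# verify against the LLVM version in use).
EF_AMDGPU_MACH = 0x0FF
EF_AMDGPU_FEATURE_XNACK_V4 = 0x300
EF_AMDGPU_FEATURE_SRAMECC_V4 = 0xC00

# Setting -> target-ID suffix; "unsupported" and "any" add no suffix.
SRAMECC_SUFFIX = {0x000: "", 0x400: "", 0x800: ":sramecc-", 0xC00: ":sramecc+"}
XNACK_SUFFIX = {0x000: "", 0x100: "", 0x200: ":xnack-", 0x300: ":xnack+"}


def amdgcn_target_directive(triple, e_flags, mach_names):
    """Rebuild an .amdgcn_target directive from ELF e_flags (sketch)."""
    processor = mach_names[e_flags & EF_AMDGPU_MACH]
    target_id = f"{triple}-{processor}"
    # Target IDs list sramecc before xnack.
    target_id += SRAMECC_SUFFIX[e_flags & EF_AMDGPU_FEATURE_SRAMECC_V4]
    target_id += XNACK_SUFFIX[e_flags & EF_AMDGPU_FEATURE_XNACK_V4]
    return f'.amdgcn_target "{target_id}"'


MACH_NAMES = {0x03F: "gfx90a"}  # assumed mapping; take real values from ELF.h
print(amdgcn_target_directive("amdgcn-amd-amdhsa-unknown",
                              0x03F | 0x800 | 0x300, MACH_NAMES))
# .amdgcn_target "amdgcn-amd-amdhsa-unknown-gfx90a:sramecc-:xnack+"
```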

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Delta  File
+72 -0  llvm/test/MC/AMDGPU/amdgcn_target_directive_from_eflags.s
+53 -0  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+4 -4  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s
+4 -4  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s
+3 -3  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s
+3 -3  llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s
+139 -14  10 files not shown
+161 -26  16 files

LLVM/project e8b2205 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU vgpr-excess-threshold-percent.ll vgpr-excess-threshold-percent-invalid.ll

[AMDGPU] Add flag to control VGPR pressure limits (#203797)

The RP trackers don't accurately measure the RA problem, and can
underestimate the number of registers required. Currently, for VGPR
pressure, we account for these inaccuracies using VGPRLimitBias and
ErrorMargin. These are used to reduce the VGPRCriticalLimit /
VGPRExcessLimit. During scheduling, we check RP against these limits,
and if we start to see RP exceeding these limits, we will trigger RP
reduction heuristics (when deciding which instructions to schedule
next). Thus VGPRLimitBias + ErrorMargin effectively reduce the amount of
allowable RP during scheduling, as a means to compensate for RP tracker
inaccuracies. Currently, ErrorMargin is set to 3, and VGPRLimitBias is
set to 0.

However, the degree of inaccuracy tends to scale with the number of
registers we have available for allocation. In other words, the RP
trackers' inaccuracy is better expressed as a percentage of the register
budget than as a literal value. This PR adds functionality to express
this inaccuracy compensation as a percentage, and exposes a

    [7 lines not shown]
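
The difference between a literal margin and a percentage can be shown with a little arithmetic. This sketch models the limit as "budget minus margin". The function names and the ceiling rounding are assumptions, and the flag this commit actually adds is in the elided part of the message.

```python
def limit_fixed_margin(budget, error_margin=3, limit_bias=0):
    """Literal compensation as described (ErrorMargin = 3, VGPRLimitBias = 0)."""
    return budget - error_margin - limit_bias


def limit_percent_margin(budget, percent):
    """Hypothetical percent-based compensation: reserve
    ceil(budget * percent / 100) registers, so the margin scales with budget."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    reserved = (budget * percent + 99) // 100  # integer ceiling
    return budget - reserved


for budget in (64, 256):
    print(budget, limit_fixed_margin(budget), limit_percent_margin(budget, 3))
# 64 61 62
# 256 253 248
```

With a fixed margin, both budgets lose 3 registers. With 3%, the small budget loses 2 and the large one loses 8.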
Delta  File
+194 -0  llvm/test/CodeGen/AMDGPU/vgpr-excess-threshold-percent.ll
+44 -2  llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+13 -0  llvm/test/CodeGen/AMDGPU/vgpr-excess-threshold-percent-invalid.ll
+251 -2  3 files

LLVM/project b3d0487 llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp SIInstrInfo.h

[AMDGPU] NFC: Obviously show isVALU includes LDSDMA instructions (#203548)

In https://reviews.llvm.org/D124472 we started labelling LDSDMA as VALU
-- this was due to SPG stating that these instructions act as both
memory + VALU instructions.

This is buried in the isVALU methods - I'd argue that most users without
knowledge of this characteristic would not expect this behavior, and
looking at the implementation of these methods, there is nothing that
would suggest this behavior. This PR forces users to confront this
characteristic and decide whether that is what they want for their
use case.

I've personally seen at least two bugs in upstream code caused by this,
and have seen it cause problems more than a dozen times in downstream
code and in WIP changes.
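
One way an API can force callers to "confront this characteristic" is to make the choice a required argument with no default. The Python analogy below is hypothetical. The `Inst` fields and the `include_ldsdma` parameter are illustrative, and the actual C++ change in SIInstrInfo.h may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    name: str
    is_plain_valu: bool      # ordinary vector ALU instruction
    is_ldsdma: bool = False  # LDS DMA: a memory op that is also labelled VALU


def is_valu(inst, *, include_ldsdma):
    """Keyword-only with no default, so every caller must state explicitly
    whether LDSDMA instructions count as VALU."""
    if inst.is_ldsdma:
        return include_ldsdma
    return inst.is_plain_valu


lds = Inst("hypothetical_lds_dma", is_plain_valu=False, is_ldsdma=True)
print(is_valu(lds, include_ldsdma=False))  # False
# is_valu(lds) raises TypeError: the caller has to make the decision.
```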
Delta  File
+61 -49  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+19 -7  llvm/lib/Target/AMDGPU/SIInstrInfo.h
+11 -8  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+3 -5  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+4 -4  llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+3 -3  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+101 -76  7 files not shown
+114 -87  13 files

LLVM/project 698648f llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer DependencyGraph.h Scheduler.h, llvm/lib/Transforms/Vectorize/SandboxVectorizer DependencyGraph.cpp Scheduler.cpp

[SandboxVec][DAG] Implement UnscheduledPreds API (#201240)

Mirroring UnscheduledSuccs, this patch adds an UnscheduledPreds DAG node
counter that counts how many predecessors are not scheduled yet.

It also renames the existing ready() to readyBottomUp() to help us
differentiate between the two variants that are now available.
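
The pair of counters can be modeled with a toy DAG node. Scheduling a node decrements the successor counter of its predecessors and the predecessor counter of its successors. The commit names only `readyBottomUp()`, so `ready_top_down` and the other names here are hypothetical.

```python
class DAGNode:
    """Toy DAG node with both counters (names are illustrative)."""

    def __init__(self, name):
        self.name = name
        self.preds, self.succs = [], []
        self.unscheduled_preds = 0
        self.unscheduled_succs = 0
        self.scheduled = False

    def ready_bottom_up(self):
        # Bottom-up: every successor has already been scheduled.
        return not self.scheduled and self.unscheduled_succs == 0

    def ready_top_down(self):
        # Top-down counterpart made possible by the predecessor counter.
        return not self.scheduled and self.unscheduled_preds == 0


def add_edge(src, dst):
    src.succs.append(dst)
    dst.preds.append(src)
    src.unscheduled_succs += 1
    dst.unscheduled_preds += 1


def schedule(node):
    node.scheduled = True
    for pred in node.preds:
        pred.unscheduled_succs -= 1
    for succ in node.succs:
        succ.unscheduled_preds -= 1


a, b = DAGNode("a"), DAGNode("b")
add_edge(a, b)
print(a.ready_top_down(), b.ready_bottom_up())  # True True
```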
Delta  File
+145 -9  llvm/unittests/Transforms/Vectorize/SandboxVectorizer/DependencyGraphTest.cpp
+26 -4  llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/DependencyGraph.h
+20 -4  llvm/lib/Transforms/Vectorize/SandboxVectorizer/DependencyGraph.cpp
+7 -0  llvm/unittests/Transforms/Vectorize/SandboxVectorizer/SchedulerTest.cpp
+3 -3  llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
+1 -1  llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Scheduler.h
+202 -21  6 files

LLVM/project 1b4d463 llvm/test/ExecutionEngine/MCJIT frem.ll, llvm/tools/lli CMakeLists.txt

[MCJIT] Fix frem.ll test failure with LLVM_ENABLE_RPMALLOC on Windows (#200319)

When compiled with `LLVM_ENABLE_RPMALLOC`, `lli.exe` links statically to
the runtime. With `LLVM_EXPORT_SYMBOLS_FOR_PLUGINS` enabled, `lli.exe`
exports a subset of symbols from the runtime library, but not all. In
particular, `printf()` is exported from the application binary, but
`fflush()` and `exit()` are not. For a JITted module, unresolved
external symbols are resolved either from the application or from
dynamic libraries; in this case, from `msvcrt.dll`. The `MCJIT/frem.ll`
test attempts to flush the output, but because the functions resolve to
different CRT instances, the output data is lost.

The patch avoids the test failure by disabling exporting symbols from
`lli.exe` when it is linked with the static runtime library.
Delta  File
+15 -1  llvm/tools/lli/CMakeLists.txt
+0 -2  llvm/test/ExecutionEngine/MCJIT/frem.ll
+15 -3  2 files

LLVM/project dcd41b4 llvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrFragmentsSIMD.td X86InstrAVX512.td

[X86] Extend alignedstore PatFrag to cover atomic_store (#197861)

Smaller FP vectors (`<N x half>`, `<N x bfloat>`) are left to the DAG
widen path on subtargets without native FP16/BF16 support; the
v8f16/v8bf16 bitconvert variants added to the Atomic Store Split
commit's patterns let the widened path collapse to a single instruction
on AVX+ targets.

Store-side counterpart to #148899 (and now
https://github.com/llvm/llvm-project/pull/199520). Stacked on top of
https://github.com/llvm/llvm-project/pull/201980 and below
https://github.com/llvm/llvm-project/pull/201566.
Delta  File
+86 -0  llvm/test/CodeGen/X86/atomic-load-store.ll
+4 -2  llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+3 -2  llvm/lib/Target/X86/X86InstrAVX512.td
+1 -1  llvm/include/llvm/Target/TargetSelectionDAG.td
+94 -5  4 files

LLVM/project 3a7f3fd llvm/test/Transforms/SLPVectorizer/RISCV rotated-strided-loads.ll

[SLP] Add test demonstrating bug in widened strided store logic (#204012)

See #204011
Delta  File
+418 -0  llvm/test/Transforms/SLPVectorizer/RISCV/rotated-strided-loads.ll
+418 -0  1 file

LLVM/project 77fe8dc flang/docs Extensions.md, flang/lib/Semantics resolve-names.cpp

[flang][semantics] Allow forward-typed PARAMETER constants under IMPLICIT NONE (#203398)

Under IMPLICIT NONE, flang rejected a named constant defined by a
PARAMETER statement whose explicit type declaration appears later in the
same specification part:

    implicit none
    parameter(n=4096)
    integer n          ! error: No explicit type declared for 'n'
    end

Accept it as an extension, reusing the existing ForwardRefImplicitNone
language feature that already permits forward references to dummy
arguments and COMMON variables under IMPLICIT NONE(TYPE). The behavior
is accepted silently by default and emits a portability warning under
-pedantic.

Assisted-by: AI
Delta  File
+76 -0  flang/test/Semantics/resolve130.f90
+30 -10  flang/lib/Semantics/resolve-names.cpp
+34 -0  flang/test/Semantics/resolve131.f90
+4 -0  flang/docs/Extensions.md
+144 -10  4 files

LLVM/project 2dfdd09 clang/lib/StaticAnalyzer/Checkers MallocSizeofChecker.cpp, clang/test/Analysis malloc-sizeof-fp.cpp

[clang][StaticAnalyzer] Reduce MallocSizeofChecker false positives for layout-compatible types (#200253)

When one operand is a record type and the other is a non-record type,
treat them as compatible if they share the same size and the record's
alignment satisfies the scalar's alignment. This suppresses warnings for
patterns like `malloc(sizeof(std::atomic<int32_t>))` assigned to an
`int32_t *` (or a wrapper struct with an identical layout), while still
flagging genuinely mismatched types such as `long` vs `double` or
unrelated struct pairs.
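
The relaxed rule for a record/non-record pair can be sketched with `ctypes`, which exposes C sizes and alignments. This is an illustration, not the checker's code. It simplifies by treating every other pair as compatible only when the types are identical.

```python
import ctypes


def is_record(t):
    return isinstance(t, type) and issubclass(t, ctypes.Structure)


def layout_compatible(a, b):
    """Record vs. non-record: same size, and the record's alignment
    satisfies the scalar's (is a multiple of it)."""
    if is_record(a) == is_record(b):
        return a is b  # simplification: other pairs need an exact match here
    record, scalar = (a, b) if is_record(a) else (b, a)
    return (ctypes.sizeof(record) == ctypes.sizeof(scalar)
            and ctypes.alignment(record) % ctypes.alignment(scalar) == 0)


class Wrapper(ctypes.Structure):  # same layout as a single int32_t
    _fields_ = [("value", ctypes.c_int32)]


class TwoShorts(ctypes.Structure):  # 4 bytes, but only 2-byte aligned
    _fields_ = [("lo", ctypes.c_int16), ("hi", ctypes.c_int16)]


print(layout_compatible(Wrapper, ctypes.c_int32))    # True
print(layout_compatible(TwoShorts, ctypes.c_int32))  # False
```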

rdar://177553628

---------

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
Delta  File
+61 -0  clang/test/Analysis/malloc-sizeof-fp.cpp
+32 -5  clang/lib/StaticAnalyzer/Checkers/MallocSizeofChecker.cpp
+93 -5  2 files

LLVM/project 930a46d clang/include/clang/Basic BuiltinsAVR.def TargetBuiltins.h, clang/lib/Basic/Targets AVR.cpp

[clang][AVR] Add basic AVR builtin functions (#203214)

Adds support for AVR specific builtin functions as defined in:
https://gcc.gnu.org/onlinedocs/gcc/AVR-Built-in-Functions.html

The simpler builtins have been implemented: nop, sei, cli, sleep, wdr,
and swap. They are lowered to their corresponding llvm.avr.* intrinsics.
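
Of these builtins, only swap computes a value: GCC documents `__builtin_avr_swap` as swapping the nibbles of its 8-bit argument. The Python model below shows those semantics only. It makes no claim about how Clang lowers the call.

```python
def avr_swap(b):
    """Model of __builtin_avr_swap: exchange the high and low nibbles of a byte."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return ((b << 4) | (b >> 4)) & 0xFF


print(hex(avr_swap(0x12)))  # 0x21
```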

---------

Signed-off-by: Dakkshesh <beakthoven at gmail.com>
Delta  File
+267 -36  clang/test/CodeGen/avr/avr-builtins.c
+47 -0  clang/include/clang/Basic/BuiltinsAVR.def
+42 -0  clang/lib/CodeGen/TargetBuiltins/AVR.cpp
+32 -3  clang/lib/Basic/Targets/AVR.cpp
+11 -1  clang/include/clang/Basic/TargetBuiltins.h
+8 -0  clang/test/Preprocessor/avr-builtins.c
+407 -40  4 files not shown
+412 -43  10 files

LLVM/project 9434d4a clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat TUSummaryEncoding.cpp TUSummary.cpp, clang/unittests/ScalableStaticAnalysisFramework EntityLinkerTest.cpp

Reland "[clang][ssaf] Track target triple in TU and LU summaries" (#204218)

This commit introduces the following changes:
  
- Add `TargetTriple` field to `TUSummary`, `LUSummary`, and their encodings.
- Frontend captures the triple from `CompilerInstance::getTarget()` when extracting a TU summary.
- JSON format reads/writes a `target_triple` field at the root of each summary; reader rejects strings not in `llvm::Triple::normalize` form.
- All TU/LU JSON test inputs/outputs and unit tests updated to include the new field.
- `TargetParser` is added to `LLVM_LINK_COMPONENTS` for `clangScalableStaticAnalysisFrameworkCore`; it provides `Triple::normalize` and the `Triple(string&&)` constructor that the `JSONFormat` sources reference.

`clang-ssaf-linker` uses a hardcoded triple for the link unit; surfacing the triple through the tool will be handled in a follow-up PR.
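
The reader-side rule (accept `target_triple` only if it is already in normalized form) can be sketched like this. The `normalize` function here is a hypothetical stand-in that only fills empty components with "unknown". It is not a reimplementation of `llvm::Triple::normalize`.

```python
import json


def normalize(triple):
    """Hypothetical stand-in for llvm::Triple::normalize that only fills
    empty components with "unknown"; the real function does much more."""
    return "-".join(part or "unknown" for part in triple.split("-"))


def read_target_triple(text):
    """Read the root-level "target_triple" field and reject unnormalized values."""
    root = json.loads(text)
    triple = root.get("target_triple") if isinstance(root, dict) else None
    if not isinstance(triple, str):
        raise ValueError("missing or non-string 'target_triple'")
    if normalize(triple) != triple:
        raise ValueError(f"target triple '{triple}' is not normalized")
    return triple


print(read_target_triple('{"target_triple": "x86_64-apple-darwin"}'))
# x86_64-apple-darwin
```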

rdar://179403011
Delta  File
+24 -1  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/TUSummaryEncoding.cpp
+24 -1  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/TUSummary.cpp
+24 -1  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/LUSummaryEncoding.cpp
+24 -1  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/LUSummary.cpp
+12 -9  clang/unittests/ScalableStaticAnalysisFramework/EntityLinkerTest.cpp
+16 -0  clang/lib/ScalableStaticAnalysisFramework/Core/Serialization/JSONFormat/JSONFormatImpl.cpp
+124 -13  170 files not shown
+566 -175  176 files

LLVM/project 2916c77 clang/docs SanitizerSpecialCaseList.rst ReleaseNotes.rst, clang/unittests/Basic DiagnosticTest.cpp

Make sanitizer special case list slash-agnostic (#149886)

This changes the glob matcher for the sanitizer special case format so
that it treats `/` as matching both forward and back slashes.

When dealing with cross-compiles or build systems that don't normalize
slashes, it's possible to run into file paths with inconsistent
slashiness, e.g. `../..\v8/include\v8-internal.h` when [building
chromium](https://g-issues.chromium.org/issues/425364464).

We can match this with the current syntax only by using this ugly kludge:
`src:*{/,\\}v8{/,\\}*`. However, since the format is explicitly for
listing file paths, it makes sense to treat `/` as denoting a path
separator rather than a literal forward slash. This allows us to write
the much more natural form `src:*/v8/*` and have it work on any
platform.
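
The slash-agnostic behavior can be illustrated with a tiny glob-to-regex translator in which `/` becomes a character class matching either separator. This is a sketch, not the SpecialCaseList implementation. It handles only `*` and `?`, and does not model the brace and bracket syntax used in the kludge above.

```python
import re


def glob_to_regex(pattern):
    """Translate *, ? and / so that '/' matches either path separator."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "/":
            out.append(r"[/\\]")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out) + r"\Z", re.DOTALL)


def matches(pattern, path):
    return glob_to_regex(pattern).match(path) is not None


print(matches("*/v8/*", r"../..\v8/include\v8-internal.h"))  # True
print(matches("*/v8/*", "src/v9/x.h"))                       # False
```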

This is technically a behavior change, but it seems very unlikely to
come up in practice. It will only make a difference if a user has a

    [9 lines not shown]
Delta  File
+35 -0  clang/unittests/Basic/DiagnosticTest.cpp
+25 -6  llvm/lib/Support/SpecialCaseList.cpp
+20 -0  llvm/unittests/Support/SpecialCaseListTest.cpp
+12 -0  clang/docs/SanitizerSpecialCaseList.rst
+5 -0  clang/docs/ReleaseNotes.rst
+97 -6  5 files

LLVM/project b2cb999 compiler-rt/lib/scudo/standalone secondary.h

[scudo] Use the unmap function on MemMap object. (#204001)

The current call does an unmap(MemMap), but the rest of the code does
MemMap.unmap(XXX), so follow that pattern.
Delta  File
+1 -1  compiler-rt/lib/scudo/standalone/secondary.h
+1 -1  1 file

LLVM/project 50e13e9 flang/lib/Optimizer/Transforms/CUDA CUFOpConversion.cpp, flang/test/Fir/CUDA cuda-global-addr.mlir

[flang][cuda] Avoid runtime copies for scalar constant host reads (#204193)

Fix CUDA Fortran lowering for host reads from scalar module variables
with the `constant` attribute.

Host code can read and write CUDA constants, while kernels read the
device constant symbol. Flang keeps a host-visible value for scalar
constant host accesses and uses a device symbol for kernels.

After preserving the host declaration, scalar read-backs such as `x = c`
could still be lowered as device-to-host runtime copies, passing a host
pointer as the CUDA source. This change lowers those read-backs as
regular host load/store operations, while keeping the runtime update for
host-to-device assignments.
Delta  File
+16 -2  flang/lib/Optimizer/Transforms/CUDA/CUFOpConversion.cpp
+15 -0  flang/test/Fir/CUDA/cuda-global-addr.mlir
+31 -2  2 files

LLVM/project e421148 llvm/test/Analysis/CostModel/AMDGPU rem.ll div.ll, llvm/test/Transforms/VectorCombine/AMDGPU extract-insert-i8.ll no-scalarize-vector-extract.ll

[AMDGPU] Refine i8 extractelement cost model (#203932)

Expand the cases in which i8 extract elements are free. The extract
elements should be free when they are part of a sequence that extracts
multiple consecutive elements the size of a register. This change enables
the SLPVectorizer to keep extract elements instead of more costly
shufflevectors.

This PR also undoes a previous change that made insert elements free;
those require sequences of shift/or instructions, so they shouldn't be free.
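
One way to read "a sequence that extracts consecutive elements the size of a register" is sketched below. It assumes 32-bit registers (four i8 lanes) and register-aligned groups. These are simplifying assumptions for illustration, not the cost model's actual rule.

```python
LANES_PER_REG = 32 // 8  # assumed: four i8 lanes per 32-bit register


def free_extracts(indices):
    """Return the extract indices this simplified model treats as free:
    those whose whole register-aligned group of lanes is extracted."""
    extracted = set(indices)
    free = set()
    for i in extracted:
        base = i - i % LANES_PER_REG
        if set(range(base, base + LANES_PER_REG)) <= extracted:
            free.add(i)
    return free


print(sorted(free_extracts([0, 1, 2, 3, 5])))  # [0, 1, 2, 3]
```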
Delta  File
+42 -42  llvm/test/Analysis/CostModel/AMDGPU/rem.ll
+42 -42  llvm/test/Analysis/CostModel/AMDGPU/div.ll
+26 -26  llvm/test/Analysis/CostModel/AMDGPU/insertelement.ll
+6 -40  llvm/test/Transforms/VectorCombine/AMDGPU/extract-insert-i8.ll
+14 -20  llvm/test/Transforms/VectorCombine/AMDGPU/no-scalarize-vector-extract.ll
+16 -16  llvm/test/Analysis/CostModel/AMDGPU/extractelement.ll
+146 -186  3 files not shown
+164 -216  9 files

LLVM/project a07f90b llvm/utils/lit/lit TestRunner.py, llvm/utils/lit/tests per-test-coverage-by-lit-cfg.py per-test-coverage.py

[lit] Avoid profraw filename collisions with --per-test-coverage (#203998)

Per-test-coverage derived the `LLVM_PROFILE_FILE` name from the test's
basename with its extension removed, so siblings that share a basename
but differ by directory or extension (e.g. foo.c and foo.cpp in one
directory) wrote into the same profraw file and raced on it.

This PR builds the name from the full path in the suite and adds the
`%p` and `%m` placeholders so a test that runs several instrumented
binaries gets a distinct file per process and per binary, even across
exec chains or recycled process ids.
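
The naming problem can be illustrated with a small helper. This is a hypothetical alternative scheme (basename plus a short hash of the full path in the suite), not the scheme lit adopted. It shows why keying on the full path keeps `a/test.py`, `b/test.py`, `foo.c` and `foo.cpp` apart. The `%p` and `%m` placeholders are left for the LLVM profile runtime to expand.

```python
import hashlib
import os


def profile_file_pattern(profile_dir, path_in_suite):
    """Hypothetical per-test LLVM_PROFILE_FILE value.

    path_in_suite is the tuple of path components below the suite root, so
    same-named tests in different directories, or with different
    extensions, get different names.
    """
    full_path = "/".join(path_in_suite)
    digest = hashlib.sha256(full_path.encode("utf-8")).hexdigest()[:12]
    # %p and %m are expanded by the profile runtime; per the commit they keep
    # files distinct per process and per instrumented binary.
    return os.path.join(profile_dir, f"{path_in_suite[-1]}-{digest}-%p-%m.profraw")


print(profile_file_pattern("prof", ("a", "test.py")) !=
      profile_file_pattern("prof", ("b", "test.py")))  # True
```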
Delta  File
+11 -3  llvm/utils/lit/tests/per-test-coverage-by-lit-cfg.py
+5 -5  llvm/utils/lit/lit/TestRunner.py
+8 -0  llvm/utils/lit/tests/Inputs/per-test-coverage-by-lit-cfg/name-collision/a/test.py
+8 -0  llvm/utils/lit/tests/Inputs/per-test-coverage-by-lit-cfg/name-collision/b/test.py
+3 -3  llvm/utils/lit/tests/per-test-coverage.py
+1 -1  llvm/utils/lit/tests/Inputs/per-test-coverage/per-test-coverage.py
+36 -12  1 file not shown
+37 -13  7 files

LLVM/project 5dce540 flang/lib/Lower OpenACC.cpp Bridge.cpp, flang/lib/Semantics resolve-directives.cpp

[flang][OpenACC] Support COLLAPSE on DO CONCURRENT (#203085)

Lower a COLLAPSE clause on a DO CONCURRENT when the collapse value
equals the number of concurrent controls, matching the equivalent
nested-DO collapse form, and route the loop body into the collapsed
acc.loop. Emit specific not-yet-implemented diagnostics for the
collapse-less-than and collapse-greater-than control-count cases, and a
-Wportability warning for this non-standard extension.
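
The three cases above reduce to comparing the COLLAPSE value with the number of concurrent controls. For example, `do concurrent (i=1:n, j=1:m)` has two controls, so `collapse(2)` is lowered. The enum and function names are hypothetical, and the portability warning is not modeled.

```python
from enum import Enum


class CollapseLowering(Enum):
    LOWER = "lower into one collapsed acc.loop"
    TODO_LESS = "not yet implemented: collapse < number of concurrent controls"
    TODO_GREATER = "not yet implemented: collapse > number of concurrent controls"


def classify_collapse(collapse, num_controls):
    """Outcome for COLLAPSE(collapse) on a DO CONCURRENT with num_controls."""
    if collapse == num_controls:
        return CollapseLowering.LOWER
    if collapse < num_controls:
        return CollapseLowering.TODO_LESS
    return CollapseLowering.TODO_GREATER


print(classify_collapse(2, 2).name)  # LOWER
```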

Collapse of mismatched control cases will require a somewhat more
invasive change, so I will submit that as a follow-up PR if that is okay;
if desired, I could fix the lowering for those two cases now.
Delta  File
+44 -0  flang/test/Lower/OpenACC/do-concurrent-collapse.f90
+23 -7  flang/test/Lower/OpenACC/Todo/do-loops-to-acc-loops-todo.f90
+26 -3  flang/lib/Lower/OpenACC.cpp
+24 -0  flang/test/Semantics/OpenACC/acc-collapse-do-concurrent.f90
+16 -5  flang/lib/Lower/Bridge.cpp
+7 -0  flang/lib/Semantics/resolve-directives.cpp
+140 -15  1 file not shown
+146 -15  7 files

LLVM/project b87fe38 mlir/docs/Tutorials ExternalTutorials.md

[mlir][docs] Add page for third-party tutorials (#188080)

Add a new page to the MLIR documentation that links to the
upstream Lighthouse project as well as additional third-party tutorials.
The goal is to make it easier for newcomers to discover MLIR learning
resources beyond the Toy tutorial.

The underlying discussion/RFC can be found
[here](https://discourse.llvm.org/t/rfc-tutorial-a-beginner-friendly-end-to-end-mlir-compiler-pipeline/89788).
Delta  File
+31 -0  mlir/docs/Tutorials/ExternalTutorials.md
+31 -0  1 file

LLVM/project 04d9c4b llvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrFragmentsSIMD.td X86InstrAVX512.td

[X86] Extend alignedstore PatFrag to cover atomic_store
Delta  File
+86 -0  llvm/test/CodeGen/X86/atomic-load-store.ll
+4 -2  llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+3 -2  llvm/lib/Target/X86/X86InstrAVX512.td
+1 -1  llvm/include/llvm/Target/TargetSelectionDAG.td
+94 -5  4 files

LLVM/project a23bf51 llvm/lib/Transforms/Scalar SROA.cpp, llvm/test/Transforms/SROA vector-promotion-rmw-tree-merge.ll vector-promotion-rmw-cannot-tree-merge.ll

[SROA] Extend tree-structured merge to handle init + RMW pattern (#194441)

## Problem

When SROA rewrites an alloca used as a read-modify-write accumulator, it
emits a linear chain of `shufflevector + select` per partial store.
`InstCombine`'s `SimplifyDemandedVectorElts` walks this chain
recursively per element, scaling quadratically with chain length; in
practice this costs tens of seconds of compile time on some matmul kernels.
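
The depth difference between a linear chain and a tree-structured merge can be sketched with plain lists standing in for vector values. This is a simplified model. Chunk values are taken as given, whereas in the init + RMW pattern each chunk is computed from the initial value. "Depth" is only a proxy for the length of the shufflevector/select chain.

```python
def linear_merge(init, chunks):
    """Linear chain: each partial store yields a new full vector derived
    from the previous one, so depth equals the number of chunks."""
    value, depth = list(init), 0
    for offset, data in chunks:
        value = value[:offset] + list(data) + value[offset + len(data):]
        depth += 1
    return value, depth


def tree_merge(width, chunks):
    """Pairwise merge of disjoint chunks that exactly cover the vector;
    depth is ceil(log2(number of chunks))."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    expected = 0
    for offset, data in ordered:
        if offset != expected:
            raise ValueError("chunks must be contiguous and non-overlapping")
        expected += len(data)
    if not ordered or expected != width:
        raise ValueError("chunks must cover the whole vector")
    level, depth = [list(data) for _, data in ordered], 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


chunks = [(0, [1, 2]), (2, [3, 4]), (4, [5, 6]), (6, [7, 8])]
print(linear_merge([0] * 8, chunks))  # ([1, 2, 3, 4, 5, 6, 7, 8], 4)
print(tree_merge(8, chunks))          # ([1, 2, 3, 4, 5, 6, 7, 8], 2)
```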

## Example

Take an `<8 x float>` alloca initialized once and then updated in 4
chunks of 2 elements each:

```llvm
%alloca = alloca <8 x float>
store <8 x float> %init, ptr %alloca                       ; full init


    [104 lines not shown]
Delta  File
+355 -139  llvm/lib/Transforms/Scalar/SROA.cpp
+326 -0  llvm/test/Transforms/SROA/vector-promotion-rmw-tree-merge.ll
+153 -0  llvm/test/Transforms/SROA/vector-promotion-rmw-cannot-tree-merge.ll
+834 -139  3 files