LLVM/project 0ca87a1 clang/lib/UnifiedSymbolResolution USRGeneration.cpp, clang/unittests/Index IndexTests.cpp

[clang-index][USR] GenLoc prints file entry at most once, allow repeated offsets

GenLoc previously printed the source location at most once per USR,
gated by a member flag toggled on the first call.  During the
recursive visit, if both an outer and an inner decl needed to print
the location, only the outer one was printed.  When the outer decl did
not need the offset, no offset was ever printed.  For example, the USR
of `Holder<decltype([]{})>::method` depends on the location of the
lambda's closure type, but the outer decl prints only the file entry,
which disables offset printing for the inner decl.

Change the logic so the file-entry part of the location is printed at
most once (it must be identical), while offsets of sub-decl locations
may be printed multiple times.
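A minimal sketch of the new rule, under stated assumptions: `Loc`, `USRBuilder`, `genLoc` and the `@file@offset` spelling are hypothetical stand-ins, not the real generator code in USRGeneration.cpp. A one-time flag guards only the file entry, while the offset is appended whenever a caller asks for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of the new GenLoc rule. Names and the
// "@file@offset" spelling are illustrative, not clang's actual USR code.
struct Loc {
  std::string File;
  unsigned Offset;
};

class USRBuilder {
public:
  void append(const std::string &S) { Out += S; }

  // The file entry is identical for every decl in one USR, so it is printed
  // at most once. Offsets are printed on every call that asks for one, so an
  // inner decl still gets its offset after an outer decl printed the file.
  void genLoc(const Loc &L, bool IncludeOffset) {
    if (!PrintedFileEntry) {
      Out += "@" + L.File;
      PrintedFileEntry = true;
    }
    if (IncludeOffset)
      Out += "@" + std::to_string(L.Offset);
  }

  const std::string &str() const { return Out; }

private:
  std::string Out;
  bool PrintedFileEntry = false;
};
```

For example, starting from `Holder`, an outer call with `IncludeOffset` false at offset 10 followed by an inner call with `IncludeOffset` true at offset 42 yields `Holder@a.cpp@42`; with a single flag gating the whole location, the second call would have printed nothing.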
DeltaFile
+35 -10 clang/lib/UnifiedSymbolResolution/USRGeneration.cpp
+34 -0 clang/unittests/Index/IndexTests.cpp
+69 -10; 2 files

LLVM/project 418c3f8 llvm/docs/CommandGuide index.md, llvm/docs/GlobalISel index.rst Pipeline.rst

[docs] Enforce unambiguous toctree in llvm/docs

It seems like using a non-`hidden` `toctree` for page navigation is a
bit of a trap, in that every doc must have a single unique path through
the global toctree to the root doc, and it is very easy to end up with
multiple.

This patch tries to address the warnings (actually infos, which is why
they do not fail the build) in llvm/docs/, namely:

  $ sphinx-build -b html -jauto llvm/docs/ /tmp/sphinx-out
  checking consistency...
  llvm/docs/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack.md: document is referenced in multiple toctrees: ['UserGuides', 'AMDGPUUsage'], selecting: UserGuides <- AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack
  llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst: document is referenced in multiple toctrees: ['UserGuides', 'AMDGPUUsage'], selecting: UserGuides <- AMDGPUDwarfExtensionsForHeterogeneousDebugging
  llvm/docs/CommandGuide/llvm-reduce.rst: document is referenced in multiple toctrees: ['CommandGuide/index', 'CommandGuide/index', 'Reference'], selecting: Reference <- CommandGuide/llvm-reduce
  llvm/docs/GitHub.rst: document is referenced in multiple toctrees: ['GettingInvolved', 'UserGuides'], selecting: UserGuides <- GitHub
  llvm/docs/GlobalISel/IRTranslator.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/IRTranslator
  llvm/docs/GlobalISel/InstructionSelect.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/InstructionSelect
  llvm/docs/GlobalISel/Legalizer.rst: document is referenced in multiple toctrees: ['GlobalISel/index', 'GlobalISel/Pipeline'], selecting: GlobalISel/index <- GlobalISel/Legalizer

    [35 lines not shown]
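The condition Sphinx reports can be sketched as a simple graph check: record, for every document, each toctree entry that references it, and flag any document with more than one reference. This is a hypothetical C++ illustration of the idea, not the contents of utils/docs/llvm_sphinx/ext/checks.py; the `Toctree` input model is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One toctree directive: the document that contains it and the documents it
// lists. This input model is an assumption made for the sketch.
struct Toctree {
  std::string Parent;
  std::vector<std::string> Entries;
};

// Return every document referenced by more than one toctree entry, mapped to
// the referencing parents. A parent appears once per reference, mirroring the
// repeated 'CommandGuide/index' in the llvm-reduce message above.
std::map<std::string, std::vector<std::string>>
findAmbiguousDocs(const std::vector<Toctree> &Trees) {
  std::map<std::string, std::vector<std::string>> Refs;
  for (const Toctree &T : Trees)
    for (const std::string &Doc : T.Entries)
      Refs[Doc].push_back(T.Parent);

  std::map<std::string, std::vector<std::string>> Ambiguous;
  for (const auto &[Doc, Parents] : Refs)
    if (Parents.size() > 1)
      Ambiguous.emplace(Doc, Parents);
  return Ambiguous;
}
```

Fed the GlobalISel case from the log (IRTranslator listed by both `GlobalISel/index` and `GlobalISel/Pipeline`), it reports `GlobalISel/IRTranslator` with both parents.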
DeltaFile
+126 -81 llvm/docs/CommandGuide/index.md
+30 -21 llvm/docs/GlobalISel/index.rst
+50 -0 utils/docs/llvm_sphinx/ext/checks.py
+21 -21 llvm/tools/llvm-debuginfo-analyzer/README.md
+0 -14 llvm/docs/tutorial/MyFirstLanguageFrontend/index.rst
+0 -8 llvm/docs/GlobalISel/Pipeline.rst
+227 -145; 5 files not shown
+229 -154; 11 files

LLVM/project ed6f75d clang/docs ghlinks.py conf.py, lldb/docs conf.py

[docs] Create utils/docs

llvm-project is home to many sphinx documentation sites, each with
configuration quirks and bespoke extensions.

The sphinx config model makes sharing code somewhat difficult. There
are options like sphinx-multiproject, but some of our docs builds are
out of the source tree while some are done out of the binary tree, so
the multiproject configuration itself would need to be generated. It
also would impose more uniformity around extensions than required.

This change instead creates a python package at utils/docs/llvm_sphinx
and makes it available to all sphinx-build processes via PYTHONPATH.
No conf.py modifies its own sys.path: not all builds are out of the
source tree, so there is no stable relative path to use to refer to the
utils/docs/ directory.

Type checking via pyright in the new package is pinned to Python 3.8
compatibility.

    [29 lines not shown]
DeltaFile
+0 -273 clang/docs/ghlinks.py
+151 -0 utils/docs/llvm_sphinx/ext/ghlinks/__init__.py
+71 -0 utils/docs/llvm_sphinx/__init__.py
+12 -44 llvm/docs/conf.py
+6 -39 lldb/docs/conf.py
+9 -30 clang/docs/conf.py
+249 -386; 23 files not shown
+423 -649; 29 files

LLVM/project c4bd250

Bump minimum required sphinx Python to 3.8

There seems to be de facto use of at least Python 3.6 in the docs, namely:

* Use of pathlib (3.4) in various places
* Use of f-strings (3.6), e.g. in clang/docs/ghlinks.py

I don't see a strong reason to maintain the divide in minimum version
between test/docs, especially considering the "FIXME" indicating
the 3.0 lower bound was just a guess to begin with.

Change-Id: I11e00295ae0a13ec0f1c5cefbb2fdd2db272b152
DeltaFile
+0 -0; 0 files

LLVM/project c290057

[docs] Add BOLTAArch64OptimizationStatus to toctree

Building docs-bolt-html fails with:

  Warning, treated as error:
  /home/slinder1/llvm-project/scratch/bolt/docs/BOLTAArch64OptimizationStatus.rst:document isn't included in any toctree

Just add the orphan document to the toctree in the index to silence
this. If there is a better parent it can be moved somewhere else in the
tree.

Change-Id: I1d26d96d5485d97d29231da89f8c8408b375c41f
DeltaFile
+0 -0; 0 files

LLVM/project 1a997e2 llvm/include/llvm/DWARFLinker DWARFLinkerBase.h, llvm/lib/DWARFLinker/Parallel DWARFLinkerImpl.cpp DWARFLinkerImpl.h

[dsymutil] Reuse a single thread pool across architectures (#204691)

dsymutil links the architectures of a universal binary on a thread pool,
and the parallel linker's DWARFLinkerImpl::link() then created a second
pool to link each architecture's object files. With one such inner pool
per architecture, dsymutil spun up more worker threads than the machine
has cores.

Add DWARFLinkerBase::setThreadPool() so the caller provides the pool.
The parallel linker schedules the object files on it as a
ThreadPoolTaskGroup. dsymutil hands over the pool it already uses to
schedule the architectures, llvm-dwarfutil passes one sized by
--num-threads, and the classic linker ignores it and manages its own
threads (always 2 for the lockstep algorithm).

The per-compile-unit cloning still runs on the global llvm::parallel
executor, whose per-thread allocators are indexed by getThreadIndex(),
so it can't move onto this pool.
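The nesting can be illustrated with a standard C++ sketch under stated assumptions. `SharedPool` and `TaskGroup` are hypothetical stand-ins, not LLVM's `ThreadPoolTaskGroup` or the new `DWARFLinkerBase::setThreadPool()`: outer (per-architecture) and inner (per-object-file) tasks share one fixed-size pool, and a group's `wait()` runs that group's queued tasks on the calling thread. That helping step is this sketch's own way of avoiding deadlock when a worker waits on an inner group; it is not a claim about how LLVM's pool is implemented.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool shared by outer (per-architecture) and inner
// (per-object-file) tasks. Illustrative only; not LLVM's ThreadPool API.
class SharedPool {
public:
  explicit SharedPool(unsigned NumThreads) {
    for (unsigned I = 0; I < NumThreads; ++I)
      Workers.emplace_back([this] { workerLoop(); });
  }
  ~SharedPool() {
    {
      std::lock_guard<std::mutex> Lock(M);
      Stop = true;
    }
    CV.notify_all();
    for (std::thread &T : Workers)
      T.join();
  }

private:
  friend class TaskGroup;
  struct Task {
    unsigned *GroupPending; // Pending counter of the owning group.
    std::function<void()> Fn;
  };
  void enqueue(unsigned *GroupPending, std::function<void()> Fn) {
    {
      std::lock_guard<std::mutex> Lock(M);
      ++*GroupPending;
      Queue.push_back({GroupPending, std::move(Fn)});
    }
    CV.notify_all();
  }
  // Run T with M released, then mark it done. Lock must own M on entry.
  void run(Task T, std::unique_lock<std::mutex> &Lock) {
    Lock.unlock();
    T.Fn();
    T.Fn = nullptr; // Drop captures before re-taking the lock.
    Lock.lock();
    --*T.GroupPending;
    CV.notify_all();
  }
  void workerLoop() {
    std::unique_lock<std::mutex> Lock(M);
    for (;;) {
      CV.wait(Lock, [this] { return Stop || !Queue.empty(); });
      if (Queue.empty())
        return; // Stop was requested and the queue is drained.
      Task T = std::move(Queue.front());
      Queue.pop_front();
      run(std::move(T), Lock);
    }
  }

  std::mutex M;
  std::condition_variable CV;
  std::deque<Task> Queue;
  std::vector<std::thread> Workers;
  bool Stop = false;
};

// A subset of the pool's tasks that can be waited on by itself.
class TaskGroup {
public:
  explicit TaskGroup(SharedPool &P) : Pool(P) {}
  ~TaskGroup() { wait(); }
  void async(std::function<void()> Fn) { Pool.enqueue(&Pending, std::move(Fn)); }
  // Block until this group's tasks finish. Meanwhile, run the group's queued
  // tasks on the calling thread, so a pool worker can wait here without
  // deadlocking the fixed-size pool.
  void wait() {
    std::unique_lock<std::mutex> Lock(Pool.M);
    while (Pending > 0) {
      auto It = std::find_if(Pool.Queue.begin(), Pool.Queue.end(),
                             [this](const auto &T) { return T.GroupPending == &Pending; });
      if (It == Pool.Queue.end()) {
        Pool.CV.wait(Lock); // Remaining tasks are running on other threads.
        continue;
      }
      auto T = std::move(*It);
      Pool.Queue.erase(It);
      Pool.run(std::move(T), Lock);
    }
  }
private:
  SharedPool &Pool;
  unsigned Pending = 0; // Guarded by SharedPool::M.
};
```

A caller would create one `SharedPool`, schedule each architecture through an outer `TaskGroup`, and inside each architecture task schedule the object files through an inner `TaskGroup` on the same pool, so the number of worker threads never exceeds the pool size.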
DeltaFile
+13 -24 llvm/tools/dsymutil/dsymutil.cpp
+10 -1 llvm/tools/llvm-dwarfutil/DebugInfoLinker.cpp
+3 -4 llvm/lib/DWARFLinker/Parallel/DWARFLinkerImpl.cpp
+5 -2 llvm/tools/dsymutil/DwarfLinkerForBinary.h
+6 -0 llvm/lib/DWARFLinker/Parallel/DWARFLinkerImpl.h
+3 -0 llvm/include/llvm/DWARFLinker/DWARFLinkerBase.h
+40 -31; 2 files not shown
+44 -31; 8 files

LLVM/project 5fb3d28 llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

rebase

Created using spr 1.3.7
DeltaFile
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0 llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257 llvm/docs/ProgrammersManual.rst
+35,645 -19,592; 3,005 files not shown
+127,989 -79,380; 3,011 files

LLVM/project c1895bf llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0 llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257 llvm/docs/ProgrammersManual.rst
+35,645 -19,592; 3,005 files not shown
+127,989 -79,380; 3,011 files

LLVM/project 37a71b6 llvm/include/llvm/IR Function.h, llvm/lib/Analysis InlineCost.cpp

[IR] Remove ProfileCount Abstraction

This only exists to differentiate between real and synthetic profiles.
Remove the abstraction now that we plan to fully remove synthetic
profiles.

Reviewers: mtrofin, david-xl

Reviewed By: mtrofin

Pull Request: https://github.com/llvm/llvm-project/pull/204770
DeltaFile
+14 -18 llvm/lib/Transforms/Utils/InlineFunction.cpp
+2 -24 llvm/include/llvm/IR/Function.h
+5 -17 llvm/lib/IR/Function.cpp
+5 -7 llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+4 -5 llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+4 -5 llvm/lib/Analysis/InlineCost.cpp
+34 -76; 18 files not shown
+63 -125; 24 files

LLVM/project 65a7ccf mlir/include/mlir/Dialect/OpenACC/Transforms Passes.td, mlir/lib/Dialect/OpenACC/Transforms ACCEmitRemarksLoop.cpp CMakeLists.txt

[MLIR][OpenACC] Add acc-emit-remarks-loop pass (#205203)

Add a function-level pass that emits optimization remarks for loops in
`acc.compute_region`, describing their mapping to OpenACC parallel
levels (gang, worker, vector, sequential) and GPU dimensions (blockIdx,
threadIdx).
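As a rough illustration of the kind of mapping such remarks describe, the sketch below assigns each OpenACC level to a GPU index and formats a remark-style message. The specific axes (gang to `blockIdx.x`, worker to `threadIdx.y`, vector to `threadIdx.x`) are an assumption based on a common OpenACC-to-GPU convention, and the message wording is hypothetical; neither is taken from ACCEmitRemarksLoop.cpp.

```cpp
#include <cassert>
#include <string>

// OpenACC loop parallelism levels named in the pass description.
enum class ParLevel { Gang, Worker, Vector, Seq };

// Assumed mapping from level to GPU index (a common convention, not taken
// from ACCEmitRemarksLoop.cpp). Sequential loops get no GPU dimension.
std::string gpuDimension(ParLevel L) {
  switch (L) {
  case ParLevel::Gang:
    return "blockIdx.x";
  case ParLevel::Worker:
    return "threadIdx.y";
  case ParLevel::Vector:
    return "threadIdx.x";
  case ParLevel::Seq:
    return "";
  }
  return "";
}

// Build a remark-style message for one loop; the wording is hypothetical.
std::string loopRemark(ParLevel L) {
  static const char *const Names[] = {"gang", "worker", "vector", "seq"};
  std::string Name = Names[static_cast<int>(L)];
  if (L == ParLevel::Seq)
    return "loop executes sequentially (" + Name + ")";
  return "loop mapped to " + Name + " parallelism (" + gpuDimension(L) + ")";
}
```

For a gang loop this produces `loop mapped to gang parallelism (blockIdx.x)`.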
DeltaFile
+163 -0 mlir/lib/Dialect/OpenACC/Transforms/ACCEmitRemarksLoop.cpp
+155 -0 mlir/test/Dialect/OpenACC/acc-emit-remarks-loop.mlir
+40 -0 mlir/test/Dialect/OpenACC/acc-emit-remarks-loop-pipeline.mlir
+17 -0 mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
+1 -0 mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt
+376 -0; 5 files

LLVM/project d6c0393 lldb/cmake/modules LLDBConfig.cmake

[lldb] Disable dynamic script interpreters by default under Xcode (#205423)

When LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS is set, liblldb's export
list is built by merging the undefined LLDB symbols extracted from each
script interpreter plugin's objects (119e57630281). Because the plugins
link liblldb, the generated file is wired into liblldb's link via
LINK_DEPENDS, a file-level dependency with no target-level edge.

The Xcode generator only has coarse target-level dependencies, so the
generated liblldb-script-interpreter.exports ends up attached to two
targets with no common dependency, which the Xcode "new build system"
rejects at generation time:

```
  CMake Error in source/API/CMakeLists.txt:
    .../source/API/liblldb-script-interpreter.exports
    is attached to multiple targets ... but none of these is a common
    dependency of the other(s). This is not allowed by the Xcode "new
    build system".
```

    [6 lines not shown]
DeltaFile
+1 -1 lldb/cmake/modules/LLDBConfig.cmake
+1 -1; 1 file

LLVM/project 7e6d700 llvm/include/llvm/Analysis ProfileSummaryInfo.h, llvm/lib/CodeGen MachineFunction.cpp

[PSI] Return raw entry count values

Now that synthetic entry counts are being removed, stop using the
ProfileCount wrapper around entry counts, since it only exists to
distinguish between synthetic and real profile counts.

Reviewers: teresajohnson, david-xl, mtrofin

Pull Request: https://github.com/llvm/llvm-project/pull/204769
DeltaFile
+11 -12 llvm/include/llvm/Analysis/ProfileSummaryInfo.h
+4 -2 llvm/lib/CodeGen/MachineFunction.cpp
+15 -14; 2 files

LLVM/project 05ce3a0 llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0 llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257 llvm/docs/ProgrammersManual.rst
+35,645 -19,592; 2,982 files not shown
+127,381 -79,313; 2,988 files

LLVM/project b74154c clang/lib/Driver/ToolChains WebAssembly.cpp, lld/test/wasm cooperative-threading.s

[WebAssembly] Cooperative threading for WASIP3 (#200855)

This PR builds on the changes to allow libcall thread context from
https://github.com/llvm/llvm-project/pull/175800/changes and adds the
necessary changes to support cooperative multithreading in the WASIP3
target:

- Not marking memory as shared 
- Allowing thread local accesses without atomics
- Only using passive segments for TLS segments

The linker changes are supported by a new flag called
`--cooperative-multithreading`. We talked about having two flags, one
for the `--libcall-thread-context` part and one for the cooperative
multithreading part. For now, I've simply replaced the
`--libcall-thread-context` flag with the `--cooperative-multithreading`
one and kept the internal configuration intact for simplicity.
DeltaFile
+85 -0 lld/test/wasm/cooperative-threading.s
+36 -22 lld/wasm/Writer.cpp
+23 -9 clang/lib/Driver/ToolChains/WebAssembly.cpp
+15 -12 lld/wasm/Driver.cpp
+25 -0 llvm/test/CodeGen/WebAssembly/cooperative-strip-tls.ll
+7 -6 lld/wasm/SyntheticSections.cpp
+191 -49; 12 files not shown
+231 -64; 18 files

LLVM/project bed4738 flang/lib/Semantics check-omp-structure.cpp

Add comment
DeltaFile
+1 -0 flang/lib/Semantics/check-omp-structure.cpp
+1 -0; 1 file

LLVM/project e584691 llvm/include/llvm/IR Function.h, llvm/lib/IR Function.cpp

[IR] Remove Synthetic Profile Support from Function

Synthetic profiles are not generated anywhere and support is very
sporadic across the code base. They are slated to be removed, so remove
support for them from Function member functions.

A future PR will clean up the ProfileCount abstraction that is now no
longer necessary.

Reviewers: teresajohnson, david-xl, mtrofin

Pull Request: https://github.com/llvm/llvm-project/pull/204768
DeltaFile
+10 -16 llvm/lib/IR/Function.cpp
+3 -7 llvm/include/llvm/IR/Function.h
+0 -10 llvm/unittests/IR/MetadataTest.cpp
+3 -3 llvm/lib/Transforms/Utils/ProfileVerify.cpp
+2 -2 llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+18 -38; 5 files

LLVM/project c0ddb50 llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0 llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257 llvm/docs/ProgrammersManual.rst
+35,645 -19,592; 2,982 files not shown
+127,381 -79,313; 2,988 files

LLVM/project 51e07db flang/test/Semantics/OpenMP if-clause-60.f90

Modify test
DeltaFile
+2 -2 flang/test/Semantics/OpenMP/if-clause-60.f90
+2 -2; 1 file

LLVM/project ee682b8 llvm/include/llvm/Analysis BlockFrequencyInfoImpl.h, llvm/lib/Analysis BlockFrequencyInfoImpl.cpp BlockFrequencyInfo.cpp

[BFI] Drop AllowSynthetic Parameter

Outside of the implementation, this was never set to anything other
than the default, and synthetic profile counts are slated for removal.

Reviewers: teresajohnson, mtrofin, david-xl

Pull Request: https://github.com/llvm/llvm-project/pull/204767
DeltaFile
+10 -16 llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
+6 -6 llvm/lib/Analysis/BlockFrequencyInfoImpl.cpp
+1 -1 llvm/lib/Analysis/BlockFrequencyInfo.cpp
+17 -23; 3 files

LLVM/project 70253b1 llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0 llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0 llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257 llvm/docs/ProgrammersManual.rst
+35,645 -19,592; 2,982 files not shown
+127,381 -79,313; 2,988 files

LLVM/project db320ab flang/lib/Semantics check-omp-structure.cpp check-omp-structure.h

Rename ifLeafs to ifLeafs_
DeltaFile
+2 -2 flang/lib/Semantics/check-omp-structure.cpp
+1 -1 flang/lib/Semantics/check-omp-structure.h
+3 -3; 2 files

LLVM/project 5794f4e llvm/include/llvm/Analysis ProfileSummaryInfo.h, llvm/lib/Analysis ProfileSummaryInfo.cpp

[PSI] Drop AllowSynthetic parameter to getProfileCount

This was not set anywhere and synthetic profile counts are not
emitted/used anywhere, so remove it.

Reviewers: david-xl, mtrofin, teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/204765
DeltaFile
+4 -3 llvm/lib/Analysis/ProfileSummaryInfo.cpp
+1 -2 llvm/include/llvm/Analysis/ProfileSummaryInfo.h
+5 -5; 2 files

LLVM/project 7e510f6 llvm/include/llvm/Transforms/Vectorize LoopVectorizationLegality.h, llvm/lib/Transforms/Vectorize LoopVectorizationLegality.cpp

[LV] Remove unused getInt/Fp/PointerInductionDescriptor accessors (NFC) (#205414)

getIntOrFpInductionDescriptor and getPointerInductionDescriptor are
unused, remove them.
DeltaFile
+0 -21 llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+0 -10 llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+0 -31; 2 files

LLVM/project 0fdfa2b llvm/lib/Target/RISCV RISCVInstrInfo.td

[RISCV] Remove assembly string from PseudoLA_TLSDESC. NFC (#205406)
DeltaFile
+1 -2 llvm/lib/Target/RISCV/RISCVInstrInfo.td
+1 -2; 1 file

LLVM/project cf78c54 llvm/include/llvm/Analysis MemoryBuiltins.h, llvm/lib/Analysis MemoryBuiltins.cpp

[MemoryBuiltins][NFC] Clang format and fixed coding style (#205205)
DeltaFile
+75 -75 llvm/lib/Analysis/MemoryBuiltins.cpp
+1 -1 llvm/include/llvm/Analysis/MemoryBuiltins.h
+76 -76; 2 files

LLVM/project ebf518b mlir/include/mlir/Dialect/XeGPU/Transforms XeGPULayoutImpl.h, mlir/include/mlir/Dialect/XeGPU/uArch IntelGpuXe2.h

[MLIR][XeGPU] Refactor XeGPU layout propagation: passing lane_layout/lane_data with inst_data (#203156)

**Motivation**

Enhance the setup* rules in layout propagation to pass lane_layout and
lane_data information during inst_data propagation, so that the
propagation has lane-level information when choosing an optimal
inst_data. This branch makes that relationship explicit and uniform
across all setup rules.

**Invariant**

All setup rules now produce layouts that satisfy:

- Nd ops + dpas/dpas_mx: inst_data = k * (lane_layout * lane_data), k ≥ 1
- Scatter/matrix ops + non-anchor ops: inst_data = lane_layout * lane_data
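The two invariants can be checked mechanically. The sketch below assumes, for illustration, that inst_data, lane_layout and lane_data are per-dimension shapes of equal rank, that the products are element-wise, and that the factor k may differ per dimension; these are assumptions, not the representation used in XeGPULayoutImpl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Nd ops and dpas/dpas_mx: every inst_data[i] is a positive multiple
// (k_i >= 1) of the lane tile lane_layout[i] * lane_data[i].
bool isMultipleOfLaneTile(const Shape &Inst, const Shape &LaneLayout,
                          const Shape &LaneData) {
  if (Inst.size() != LaneLayout.size() || Inst.size() != LaneData.size())
    return false;
  for (std::size_t I = 0; I < Inst.size(); ++I) {
    int64_t Tile = LaneLayout[I] * LaneData[I];
    if (Tile <= 0 || Inst[I] < Tile || Inst[I] % Tile != 0)
      return false;
  }
  return true;
}

// Scatter/matrix ops and non-anchor ops: inst_data equals the lane tile,
// i.e. k == 1 in every dimension.
bool equalsLaneTile(const Shape &Inst, const Shape &LaneLayout,
                    const Shape &LaneData) {
  if (Inst.size() != LaneLayout.size() || Inst.size() != LaneData.size())
    return false;
  for (std::size_t I = 0; I < Inst.size(); ++I)
    if (Inst[I] != LaneLayout[I] * LaneData[I])
      return false;
  return true;
}
```

For example, inst_data [8, 16] with lane_layout [1, 16] and lane_data [1, 1] satisfies the first rule (k = 8 and 1 per dimension) but not the second.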

**Key changes in XeGPULayoutImpl**
- New per-op anchor setup rules: setupStoreNdAnchorLayout,

    [38 lines not shown]
DeltaFile
+1,554 -816 mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+388 -369 mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+286 -240 mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+100 -106 mlir/test/Dialect/XeGPU/propagate-layout.mlir
+117 -9 mlir/include/mlir/Dialect/XeGPU/Transforms/XeGPULayoutImpl.h
+45 -22 mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
+2,490 -1,562; 4 files not shown
+2,514 -1,573; 10 files

LLVM/project 9757708 llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize pred-inst-discount-vector-library-call.ll

[LoopVectorize] Don't assert in getVectorCallCost for vector library variants (#202085)

During loop vectorization, `computePredInstDiscount` queries the cost of
instructions at vector VF using `getInstructionCost`. A `CallInst` with
a vector library variant delegates to `getVectorCallCost`, which
asserted that such variants should not reach it.

However, a predicated call can reach `getVectorCallCost` via
`computePredInstDiscount`, before its widening decision is made, when
a predicated user (e.g. a scatter store) is being considered for
scalarization. Remove the assert and fall through to the existing
scalarization cost, which is the cost relevant to that analysis.

Adds a regression test exercising that path.
DeltaFile
+221 -0 llvm/test/Transforms/LoopVectorize/pred-inst-discount-vector-library-call.ll
+30 -20 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+251 -20; 2 files

LLVM/project f5d3175 llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-flavor-classification.mir coexec-scheduler.ll

[AMDGPU] Use instrLatency for memory HWUI cycle accounting

Change-Id: I7a9e2deb5db7638d6f735570e65d6ca988a7477f
DeltaFile
+3 -3 llvm/test/CodeGen/AMDGPU/coexec-sched-flavor-classification.mir
+5 -0 llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+2 -2 llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+2 -2 llvm/test/CodeGen/AMDGPU/ldsdmacnt_sched.mir
+12 -7; 4 files

LLVM/project c29679a llvm/test/CodeGen/AMDGPU coexec-sched-flavor-classification.mir

Precommit test change

Change-Id: Id3546e1492a335f48cf70d9b0e93afe22b31ff7a
DeltaFile
+5 -5 llvm/test/CodeGen/AMDGPU/coexec-sched-flavor-classification.mir
+5 -5; 1 file

LLVM/project 594cce3 llvm/lib/Target/AMDGPU SIWholeQuadMode.cpp, llvm/test/CodeGen/AMDGPU uniform-intrin-combine-wqm-demote.ll

[AMDGPU] Change static NOP last terminator SI_DEMOTE_I1 to be replaced by S_BRANCH instead of assert (#204649)

This issue was first discovered in downstream testing. A specific
chain of transformations on a ballot instruction with a constant
argument followed by an llvm.amdgcn.wqm.demote call leads to an
instruction of `SI_DEMOTE_I1 -1, 0` being the last terminator of a block
with a single successor. This instruction is a NOP and can safely be
replaced with an S_BRANCH to the block's successor instead of asserting
failure.

The test added in this change is a very simplified recreation of the
pattern seen in the downstream shader compilation that led to the
assertion failure.
DeltaFile
+17 -0 llvm/test/CodeGen/AMDGPU/uniform-intrin-combine-wqm-demote.ll
+1 -1 llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+18 -1; 2 files