LLVM/project 21f3248 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging (#202937)

It's 8 years old, only used by a handful of tests, and has not been
updated in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt
debugging, we should have a modern, generic system that works on any
waitcnt, not something specific to 3 GFX9 counters.
Delta    File
+1 -50   llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -44   llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1 -94   2 files

LLVM/project 68c947f llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUWaitcntUtils.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils (#202936)

Move really trivial functions into helpers to declutter InsertWaitCnt a
bit more.
I had to move HardwareLimits into a different header but it's only used
in InsertWaitCnt so it doesn't matter.
Delta       File
+26 -90     llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+75 -0      llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.cpp
+32 -0      llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.h
+0 -20      llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+0 -20      llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+133 -130   5 files

LLVM/project 1bea228 libc/src/unistd/linux ftruncate.cpp link.cpp

[libc][NFC] Migrate unistd entrypoints to syscall wrappers (#204176)

Migrated link, ftruncate, and getentropy entrypoints to use their
corresponding syscall wrappers instead of direct syscall_impl calls.
Updated CMake dependencies accordingly.

Assisted-by: Automated tooling, human reviewed.
Delta     File
+4 -18    libc/src/unistd/linux/ftruncate.cpp
+5 -15    libc/src/unistd/linux/link.cpp
+6 -8     libc/src/unistd/linux/getentropy.cpp
+3 -11    libc/src/unistd/linux/CMakeLists.txt
+18 -52   4 files

LLVM/project ec7235e clang/lib/CodeGen CGStmt.cpp, clang/test/CodeGenCXX noreturn-init-stmt.cpp

[clang][CodeGen] Fix crash on if/switch init-statement ending in noreturn (#201047)

EmitStmt may call `ClearInsertionPoint()` to mark code as dead, but
EmitDecl is not prepared to handle a cleared insertion point. Fix this by
calling `EnsureInsertPoint()` in the transition from EmitStmt to
EmitDecl. The if/switch body may contain a label, which makes it
reachable (not dead).

Fixes #115514.
Delta     File
+98 -0    clang/test/CodeGenCXX/noreturn-init-stmt.cpp
+12 -2    clang/lib/CodeGen/CGStmt.cpp
+110 -2   2 files

LLVM/project 4995c6e clang/include/clang/Basic LangOptions.def

[LifetimeSafety] Mark lifetime safety LangOptions as `Benign` (#204316)

Without this, we cannot load modules built without lifetime safety.
Analysis options are in general benign and do not affect AST
construction.
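
To make the effect concrete, here is a simplified, hypothetical model of the three `CompatibilityKind` levels quoted below. The function `canUseModule` and its `IsExplicitModule` parameter are illustrative assumptions, not clang's module-loading code; the sketch assumes a `Compatible` mismatch is tolerated only for explicit modules (as the doc comment phrases it), while a `Benign` mismatch is always tolerated.

```cpp
// Simplified, hypothetical model of the documented compatibility levels;
// it is not clang's module-loading logic.
enum class CompatibilityKind { NotCompatible, Compatible, Benign };

// Can a module built with ModuleValue for one language option be used by a
// compilation that has UserValue for that option?
bool canUseModule(CompatibilityKind Kind, int ModuleValue, int UserValue,
                  bool IsExplicitModule) {
  if (ModuleValue == UserValue)
    return true; // identical settings never conflict
  switch (Kind) {
  case CompatibilityKind::NotCompatible:
    return false; // the AST differs in a way that breaks interoperability
  case CompatibilityKind::Compatible:
    return IsExplicitModule; // assumption: tolerated only for explicit modules
  case CompatibilityKind::Benign:
    return true; // the option never influences AST construction
  }
  return false;
}
```

In this model, only the `Benign` level lets a module built without an analysis option be used by any build that enables it.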

See doc:
```cpp
  /// For ASTs produced with different option value, signifies their level of
  /// compatibility.
  enum class CompatibilityKind {
    /// Does affect the construction of the AST in a way that does prevent
    /// module interoperability.
    NotCompatible,
    /// Does affect the construction of the AST in a way that doesn't prevent
    /// interoperability (that is, the value can be different between an
    /// explicit module and the user of that module).
    Compatible,
    /// Does not affect the construction of the AST in any way (that is, the
```

[4 lines not shown]
Delta    File
+4 -4    clang/include/clang/Basic/LangOptions.def
+4 -4    1 file

LLVM/project 0dda20c llvm/test/Transforms/LoopVectorize/AArch64 replicating-load-store-costs-apple.ll induction-costs.ll, llvm/test/Transforms/LoopVectorize/WebAssembly memory-interleave.ll

[LV] Add initial cost model for VPScalarIVSteps (#203347)

This PR currently only adds a cost model for integer types in
non-replicating regions in order to limit the scope of impact.
We can also support replicating regions, but that requires
looking for a recipe with an underlying value in the same
region in order to get a BasicBlock to pass in to the
getPredBlockCostDivisor function. This can be done in a future
PR.
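
As a rough illustration of what is being costed, the sketch below computes the per-lane scalar induction values that a scalar-IV-steps recipe conceptually produces, `BaseIV + (Part * VF + Lane) * Step`. The function names and the unit-cost tally are hypothetical; the real cost model queries the target's TTI hooks and, per the description above, does not yet cover replicating regions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: the scalar value of an integer induction for each lane
// of one unrolled part, i.e. BaseIV + (Part * VF + Lane) * Step.
std::vector<int64_t> scalarIVSteps(int64_t BaseIV, int64_t Step, unsigned VF,
                                   unsigned Part) {
  std::vector<int64_t> Values;
  Values.reserve(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane) {
    int64_t Index = int64_t(Part) * VF + Lane; // position within the vector loop
    Values.push_back(BaseIV + Index * Step);
  }
  return Values;
}

// Toy cost tally (an assumption, not LLVM's model): one multiply and one add
// per lane, each counted as unit cost.
unsigned toyScalarIVStepsCost(unsigned VF) { return 2 * VF; }
```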
Delta       File
+18 -130    llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll
+54 -73     llvm/test/Transforms/LoopVectorize/AArch64/replicating-load-store-costs-apple.ll
+16 -67     llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+30 -43     llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
+46 -17     llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+31 -31     llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
+195 -361   7 files not shown
+299 -409   13 files

LLVM/project b9f8eee llvm/include/llvm/BinaryFormat DXContainer.h

[DirectX][ObjectYAML][NFC] Remove unused function (#204019)

A small follow-up for #202761.
The `updateSize()` function added there is a rebase artifact and is
never actually used. This change removes it.
Delta    File
+0 -5    llvm/include/llvm/BinaryFormat/DXContainer.h
+0 -5    1 file

LLVM/project 55ea182 llvm/lib/Transforms/IPO FunctionSpecialization.cpp, llvm/test/Transforms/FunctionSpecialization interposable.ll

[FuncSpec] Do not specialize interposable functions (#204314)

We cannot specialize interposable functions, because the definition we
see may not be the prevailing one. The prevailing definition can have
arbitrarily different behavior.

We *can* still specialize inexact definitions like linkonce_odr, similar
to inlining.
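
The distinction can be sketched with a small, self-contained model of linkage kinds. The enum and helper names are hypothetical; the set of interposable kinds follows the idea behind LLVM's `GlobalValue::isInterposableLinkage` but is simplified (for example, it ignores semantic interposition of plain external definitions).

```cpp
// Simplified, hypothetical model of LLVM linkage kinds (not LLVM's API).
enum class Linkage {
  External, Internal, Private, AvailableExternally, LinkOnceAny,
  LinkOnceODR, WeakAny, WeakODR, Common, ExternalWeak
};

// Interposable: the linker or loader may pick a different definition with
// arbitrarily different behavior, so the body we see cannot be trusted.
bool isInterposable(Linkage L) {
  switch (L) {
  case Linkage::LinkOnceAny:
  case Linkage::WeakAny:
  case Linkage::Common:
  case Linkage::ExternalWeak:
    return true;
  default:
    // Either the visible definition prevails, or any replacement must be
    // semantically equivalent (e.g. *_odr, available_externally).
    return false;
  }
}

// Skip interposable functions, but allow inexact definitions such as
// linkonce_odr.
bool mayBeSpecialized(Linkage L) { return !isInterposable(L); }
```

`linkonce_odr` stays specializable because any replacement definition must be equivalent, which mirrors the inlining rule mentioned above.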
Delta    File
+40 -0   llvm/test/Transforms/FunctionSpecialization/interposable.ll
+3 -0    llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+43 -0   2 files

LLVM/project 6f73bc2 llvm/lib/Support Parallel.cpp

[llvm] Fix unused function warning in Parallel (#204114)

When llvm is built without threading support:
<...>/llvm-project/llvm/lib/Support/Parallel.cpp:230:13: warning: unused function 'isNested' [-Wunused-function]
  230 | static bool isNested() {
      |             ^~~~~~~~

The function is only used once, so I've put the code into the caller,
which is itself guarded with `#if LLVM_ENABLE_THREADS`.

Function added in 8daaa26efdda3802f73367d844b267bda3f84cbe / #189293.
Delta    File
+2 -9    llvm/lib/Support/Parallel.cpp
+2 -9    1 file

LLVM/project 8c3d2e9 llvm/docs Passes.rst, llvm/include/llvm InitializePasses.h

[Passes] Remove deadarghaX0r pass (#204310)

This was a pass internally used by bugpoint. Bugpoint has been removed,
so remove the pass as well.
Delta    File
+7 -30   llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
+0 -7    llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h
+0 -6    llvm/docs/Passes.rst
+0 -1    llvm/lib/Transforms/IPO/IPO.cpp
+0 -1    llvm/include/llvm/InitializePasses.h
+7 -45   5 files

LLVM/project 7d92d40 mlir/include/mlir/Dialect/Tosa/IR TosaComplianceData.h.inc TosaOps.td, mlir/lib/Dialect/Tosa/IR TosaOps.cpp

[mlir][tosa] Add row_gather operator (#202895)

Adds support for the row_gather operator defined by the TOSA
specification, see https://github.com/arm/tosa-specification/pull/60.

This includes:
- Operator definition
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification
including profile compliance and level checks.
- Canonicalization to replace row_gather with gather when row_count is
statically known to be 1.

It does not yet cover support for MXFP types. This will be added once
block scaled types are supported.
Delta      File
+88 -16    mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+63 -0     mlir/test/Dialect/Tosa/verifier.mlir
+59 -0     mlir/include/mlir/Dialect/Tosa/IR/TosaComplianceData.h.inc
+49 -0     mlir/test/Dialect/Tosa/tosa-infer-shapes.mlir
+47 -0     mlir/test/Dialect/Tosa/ops.mlir
+36 -0     mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td
+342 -16   12 files not shown
+506 -16   18 files

LLVM/project b59f965 lldb/test/API/lang/objc/hidden-ivars TestHiddenIvars.py

[lldb][test] Cleanup and modernize TestHiddenIvars.py (#202023)

This is a simple rewrite of the test. The patch improves three things:

* It replaces old expect tests with the new expect_* variants that no
longer rely on substring matching.

* It unifies the stripped/non-stripped checks as we actually produce
identical SBValues in both cases (by fetching data from the Objective-C
runtime).

* It builds this test with a shared build directory. Our stripping logic
generates a new stripped binary in a subdirectory and doesn't touch the
shared build files. This also halves the test runtime to 6s.
Delta      File
+79 -187   lldb/test/API/lang/objc/hidden-ivars/TestHiddenIvars.py
+79 -187   1 file

LLVM/project 045ec52 lldb/packages/Python/lldbsuite/test dotest.py dotest_args.py, lldb/test/API lit.cfg.py

[lldb][test] Only calculate LLDB python path once (#201327)

We spend about 70ms in each dotest invocation recalculating the path where
the LLDB module is. This patch changes this so that dotest calculates
this path once and passes it to every dotest invocation.

As a fallback, we still support inferring the location from LLDB as
before, but I would propose we drop this support in the future.
Delta     File
+29 -11   lldb/packages/Python/lldbsuite/test/dotest.py
+22 -0    lldb/test/API/lit.cfg.py
+6 -0     lldb/packages/Python/lldbsuite/test/dotest_args.py
+5 -0     lldb/packages/Python/lldbsuite/test/configuration.py
+62 -11   4 files

LLVM/project de84e7c llvm/lib/Target/Mips MipsSEISelLowering.cpp MipsMSAInstrInfo.td, llvm/test/CodeGen/Mips/msa f16-llvm-ir.ll

[MIPS] soft-promote `f16` also when using `+msa` (#204158)

Fixes https://github.com/llvm/llvm-project/issues/202808
Re-lands https://github.com/llvm/llvm-project/pull/203065

Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.

In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.

I've now tested that, in combination with
https://github.com/llvm/llvm-project/pull/203390, this PR is able to
build and run the rust `std` test suite, which exercises both `f16` and
vectors a bunch. The tests all pass under `qemu` as well.

The last commit fixes an integer overflow bug that triggered UBSan and
led to an earlier revert of these changes.
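
Soft-promotion keeps each `f16` value as its raw 16-bit pattern and wraps every operation in a conversion to `f32` and back. The sketch below emulates that in portable C++: `halfToFloat` and `floatToHalf` are hypothetical software stand-ins for what the `FP16_TO_FP` and `FP_TO_FP16` conversions compute (assuming round-to-nearest-even), and `addF16` shows the promote, operate, round-back pattern. It is not the MIPS lowering itself.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for FP16_TO_FP: decode a binary16 bit pattern into a
// float (exact, since every f16 value is representable in f32).
float halfToFloat(uint16_t H) {
  uint32_t Sign = uint32_t(H & 0x8000u) << 16;
  uint32_t Exp = (H >> 10) & 0x1Fu;
  uint32_t Mant = H & 0x3FFu;
  if (Exp == 0) { // zero or subnormal: value is Mant * 2^-24
    float V = std::ldexp(float(Mant), -24);
    return Sign ? -V : V;
  }
  uint32_t Bits = (Exp == 0x1F)
                      ? (Sign | 0x7F800000u | (Mant << 13))          // inf or NaN
                      : (Sign | ((Exp + 112) << 23) | (Mant << 13)); // rebias 15 -> 127
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

// Hypothetical stand-in for FP_TO_FP16: round a float to the nearest binary16
// value (ties to even) and return its bit pattern.
uint16_t floatToHalf(float F) {
  uint32_t X;
  std::memcpy(&X, &F, sizeof X);
  uint32_t Sign = (X >> 16) & 0x8000u;
  uint32_t Exp = (X >> 23) & 0xFFu;
  uint32_t Mant = X & 0x7FFFFFu;
  if (Exp == 0xFF) // inf stays inf, NaN becomes a quiet NaN
    return uint16_t(Sign | 0x7C00u | (Mant ? 0x200u : 0u));
  int E = int(Exp) - 127 + 15; // rebias 127 -> 15
  if (E >= 31)
    return uint16_t(Sign | 0x7C00u); // too large: overflow to infinity
  if (E <= 0) {                      // result is subnormal or zero
    if (E < -10)
      return uint16_t(Sign); // below half the smallest subnormal: rounds to zero
    uint32_t Full = Mant | 0x800000u; // restore the implicit leading bit
    unsigned Shift = unsigned(14 - E);
    uint32_t M = Full >> Shift;
    uint32_t Rem = Full & ((1u << Shift) - 1);
    uint32_t Halfway = 1u << (Shift - 1);
    if (Rem > Halfway || (Rem == Halfway && (M & 1u)))
      ++M;
    return uint16_t(Sign | M);
  }
  uint32_t H = Sign | (uint32_t(E) << 10) | (Mant >> 13);
  uint32_t Rem = Mant & 0x1FFFu;
  if (Rem > 0x1000u || (Rem == 0x1000u && (H & 1u)))
    ++H; // a carry into the exponent is correct and may yield infinity
  return uint16_t(H);
}

// What soft-promotion does for an f16 add: extend, add in f32, round back.
uint16_t addF16(uint16_t A, uint16_t B) {
  return floatToHalf(halfToFloat(A) + halfToFloat(B));
}
```

For add, subtract, multiply and divide this single rounding back to `f16` gives the correctly rounded result, because `f32` carries at least 2p+2 significand bits for `f16`'s p = 11.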
Delta           File
+966 -1,105     llvm/test/CodeGen/Mips/msa/f16-llvm-ir.ll
+101 -411       llvm/lib/Target/Mips/MipsSEISelLowering.cpp
+0 -37          llvm/lib/Target/Mips/MipsMSAInstrInfo.td
+2 -14          llvm/lib/Target/Mips/MipsSEISelLowering.h
+0 -6           llvm/lib/Target/Mips/MipsRegisterInfo.td
+3 -2           llvm/lib/Target/Mips/MipsISelLowering.cpp
+1,072 -1,575   3 files not shown
+1,073 -1,580   9 files

LLVM/project 7e69b16 llvm/include/llvm/IR IntrinsicInst.h, llvm/lib/Analysis InstructionSimplify.cpp

[InstCombine] Fold X == Identity ? Y : min/max(X, Y) (#202748)

Fixes #202576

Fold:

```llvm
select (X == -1), Y, umin(X, Y) -> umin(X, Y)
select (X == 0), Y, umax(X, Y) -> umax(X, Y)
select (X == SignedMax), Y, smin(X, Y) -> smin(X, Y)
select (X == SignedMin), Y, smax(Y, X) -> smax(X, Y)
```

And the inverted/commuted forms.

AI note: I used AI to help me read through the codebase and write the
tests.
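
The folds rely on each min/max having an identity element, a value `I` with `minmax(I, Y) == Y` for every `Y`: all-ones (`-1`) for `umin`, `0` for `umax`, the signed maximum for `smin`, and the signed minimum for `smax`. A brute-force check over every 8-bit input pair confirms this; the helper name `foldHolds` is hypothetical and the check uses plain C++ rather than LLVM IR.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Exhaustively checks, over every pair of values of an 8-bit type T, that
//   select (X == Identity), Y, Op(X, Y)   equals   Op(X, Y)
// which holds exactly when Op(Identity, Y) == Y for all Y.
template <typename T, typename MinMaxOp>
bool foldHolds(T Identity, MinMaxOp Op) {
  const int Lo = std::numeric_limits<T>::min();
  const int Hi = std::numeric_limits<T>::max();
  for (int I = Lo; I <= Hi; ++I) {
    for (int J = Lo; J <= Hi; ++J) {
      T X = static_cast<T>(I), Y = static_cast<T>(J);
      T Selected = (X == Identity) ? Y : Op(X, Y);
      if (Selected != Op(X, Y))
        return false;
    }
  }
  return true;
}
```

Checking `smax` against the signed maximum fails (for example `X = 127, Y = 0` selects `0`, while `smax` yields `127`), so the signed minimum is the identity to test for `smax`.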
Delta     File
+57 -0    llvm/test/Transforms/InstCombine/umin-icmp.ll
+23 -0    llvm/include/llvm/IR/IntrinsicInst.h
+21 -0    llvm/lib/Analysis/InstructionSimplify.cpp
+11 -0    llvm/test/Transforms/InstCombine/smin-icmp.ll
+11 -0    llvm/test/Transforms/InstCombine/umax-icmp.ll
+10 -0    llvm/test/Transforms/InstCombine/smax-icmp.ll
+133 -0   6 files

LLVM/project ec6e35f clang/include/clang/Basic LangOptions.def

lifetime-safety-is-benign
Delta    File
+4 -4    clang/include/clang/Basic/LangOptions.def
+4 -4    1 file

LLVM/project d5b7e08 clang/include/clang/Basic LangOptions.def DiagnosticOptions.def, clang/include/clang/Options Options.td

frontend-opt-to-diags
Delta     File
+4 -4     clang/include/clang/Options/Options.td
+0 -8     clang/include/clang/Basic/LangOptions.def
+6 -0     clang/include/clang/Basic/DiagnosticOptions.def
+4 -1     clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+2 -2     clang/lib/Sema/SemaLifetimeSafety.h
+3 -1     clang/lib/Analysis/LifetimeSafety/Checker.cpp
+19 -16   1 file not shown
+21 -18   7 files

LLVM/project 285ed05 llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo

Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
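
For the simplest case described here, an empty triple component, the effect can be illustrated with a small string helper. This is a hypothetical sketch that only fills empty dash-separated components with `unknown`; LLVM's `Triple::normalize` does considerably more, such as recognizing and reordering components.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a triple on '-' and replace every empty
// component with "unknown". It does not recognize or reorder components the
// way llvm::Triple::normalize does.
std::string fillEmptyTripleComponents(const std::string &Triple) {
  std::vector<std::string> Parts(1);
  for (char C : Triple) {
    if (C == '-')
      Parts.emplace_back(); // start a new component
    else
      Parts.back() += C;
  }
  std::string Out;
  for (std::size_t I = 0; I < Parts.size(); ++I) {
    if (I != 0)
      Out += '-';
    Out += Parts[I].empty() ? std::string("unknown") : Parts[I];
  }
  return Out;
}
```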

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Delta      File
+79 -36    llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+36 -10    llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+27 -1     llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+17 -0     llvm/test/MC/AMDGPU/amdgcn-target-directive-triple-env.s
+5 -5      llvm/test/MC/AMDGPU/hsa-diag-v4.s
+4 -4      llvm/test/MC/AMDGPU/isa-version-pal.s
+168 -56   16 files not shown
+198 -77   22 files

LLVM/project cb81b5d llvm/lib/Target/AMDGPU AMDGPUHWEvents.h SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask

Follow up from comments on https://github.com/llvm/llvm-project/pull/202886

Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.

I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.

Considering HWEvent is a bitmask, I also implemented a simple iterator to iterate over all set bits of the mask, which is a useful thing to have as some APIs in InsertWaitCnt rely on treating one event at a time.
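
A minimal sketch of the design described above: an enum of event kinds wrapped in a FastMathFlags-style class, with bitwise operators, a member `contains()`, and an iterator that yields one set bit at a time. The names `EventKind`, `EventMask` and the event list are hypothetical and do not reproduce `AMDGPUHWEvents.h`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event kinds; not the real list from AMDGPUHWEvents.def.
enum class EventKind : unsigned { VMemRead, VMemWrite, SMemAccess, LDSAccess };

// FastMathFlags-style wrapper: the raw bits stay private and callers use
// members such as A.contains(B) instead of free helper functions.
class EventMask {
  uint32_t Bits = 0;

  static EventMask fromRaw(uint32_t B) {
    EventMask M;
    M.Bits = B;
    return M;
  }

public:
  EventMask() = default;
  // Implicit on purpose so a single kind can be passed where a mask is expected.
  EventMask(EventKind K) : Bits(1u << static_cast<unsigned>(K)) {}

  bool empty() const { return Bits == 0; }
  // True if every event in Other is also set in *this.
  bool contains(EventMask Other) const { return (Bits & Other.Bits) == Other.Bits; }

  EventMask &operator|=(EventMask O) {
    Bits |= O.Bits;
    return *this;
  }
  friend EventMask operator|(EventMask A, EventMask B) { return fromRaw(A.Bits | B.Bits); }
  friend EventMask operator&(EventMask A, EventMask B) { return fromRaw(A.Bits & B.Bits); }
  friend bool operator==(EventMask A, EventMask B) { return A.Bits == B.Bits; }

  // Iterator over the set bits, yielding one EventKind at a time.
  class iterator {
    uint32_t Remaining;

  public:
    explicit iterator(uint32_t R) : Remaining(R) {}
    EventKind operator*() const {
      assert(Remaining != 0 && "dereferencing end iterator");
      unsigned Idx = 0;
      while (((Remaining >> Idx) & 1u) == 0)
        ++Idx; // find the lowest set bit
      return static_cast<EventKind>(Idx);
    }
    iterator &operator++() {
      Remaining &= Remaining - 1; // clear the lowest set bit
      return *this;
    }
    bool operator!=(const iterator &O) const { return Remaining != O.Remaining; }
  };

  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }
};
```

The implicit single-kind constructor lets `M.contains(EventKind::VMemRead)` read naturally, and because each increment clears the lowest set bit, a range-for loop runs once per set event.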
Delta       File
+96 -94     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+73 -79     llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+28 -34     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+30 -32     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+227 -239   4 files

LLVM/project 415c765 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging

It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
Delta    File
+1 -50   llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -44   llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1 -94   2 files

LLVM/project 4c01416 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Add helper for getLimit
Delta   File
+9 -8   llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+9 -8   1 file

LLVM/project 3d923fc llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Comment
Delta   File
+5 -5   llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5 -5   1 file

LLVM/project 97b49ca llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUWaitcntUtils.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils

Move really trivial functions into helpers to declutter InsertWaitCnt a bit more.
I had to move HardwareLimits into a different header but it's only used in InsertWaitCnt so it doesn't matter.
Delta       File
+21 -86     llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+75 -0      llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.cpp
+32 -0      llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.h
+0 -20      llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+0 -20      llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+128 -126   5 files

LLVM/project d9dcc11 flang/include/flang/Lower CustomIntrinsicCall.h, flang/include/flang/Optimizer/Builder IntrinsicCall.h

[flang][NFC] remove libFortranEvaluate from Optimizer libraries (#204222)

Replace usages of `AbstractConverter` inside IntrinsicCall.cpp with a
structure that propagates the required option. This avoids bringing in
libFortranEvaluate as a dependency of libFortranOptimizer, since the
Optimizer does not use evaluate::Expr or any other front-end data
structure at all.

Also remove header includes that had crept in and were never removed,
even though they are not required.
Delta     File
+22 -13   flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+14 -16   flang/include/flang/Optimizer/Builder/IntrinsicCall.h
+16 -1    flang/lib/Lower/CustomIntrinsicCall.cpp
+3 -8     flang/lib/Optimizer/Builder/PPCIntrinsicCall.cpp
+3 -2     flang/lib/Lower/ConvertCall.cpp
+3 -0     flang/include/flang/Lower/CustomIntrinsicCall.h
+61 -40   6 files not shown
+61 -46   12 files

LLVM/project 38f121e lldb/packages/Python/lldbsuite/test lldbtest.py

[lldb][test] Fix duplicate error messages in expect_expr/var_path (#202310)

The error message field for expect_* methods always prints the value
object, so there is no need to specify a custom error message that then
just prints the object too.

This fixes the duplicate value object printout on test failures.
Delta    File
+4 -4    lldb/packages/Python/lldbsuite/test/lldbtest.py
+4 -4    1 file

LLVM/project c872ac1 llvm/lib/Target/AMDGPU AMDGPUHWEvents.cpp SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Move HWEvent analysis code (#202887)

Building on the previous RFC (if it is accepted), move the code that
maps a MachineInstr to an HWEventSet into a separate file.

This should be NFC.
Delta       File
+177 -0     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+3 -156     llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+7 -1       llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+187 -157   3 files

LLVM/project 09a51b2 offload CMakeLists.txt, offload/liboffload CMakeLists.txt

[offload] Add Windows offload support (#187006)
Delta      File
+53 -13    offload/CMakeLists.txt
+15 -1     offload/liboffload/CMakeLists.txt
+13 -0     offload/plugins-nextgen/common/src/GlobalHandler.cpp
+8 -1      offload/plugins-nextgen/common/src/Utils/ELF.cpp
+5 -1      offload/tools/CMakeLists.txt
+3 -3      offload/plugins-nextgen/common/src/RecordReplay.cpp
+97 -19    5 files not shown
+108 -24   11 files

LLVM/project d74caa4 mlir/lib/Dialect/SPIRV/Transforms SPIRVConversion.cpp, mlir/test/Conversion/ConvertToSPIRV vector-unroll.mlir

[mlir][spirv] Fix crash on 0-D vectors in vector unrolling (#203291)

`getTargetShape` and `getNativeVectorShape` called `getShape().back()`
without checking for rank-0 vectors, whose shape is empty. This crashed
when the SPIR-V vector unrolling pass processed a function returning a
0-D vector (e.g. `vector<f32>`) or a 0-D elementwise op.

0-D vectors have no dimension to unroll along and are not SPIR-V vector
types, so bail out and leave them unchanged in both paths.
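
The guard amounts to checking for an empty shape before reading its last dimension. The sketch below uses `std::vector<int64_t>` as a stand-in for an MLIR shape, and `innermostDim` is a hypothetical helper rather than the actual `getTargetShape` or `getNativeVectorShape` code.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper mirroring the fix: a rank-0 vector such as vector<f32>
// has an empty shape, so calling back() on it would be undefined behavior.
// Returning std::nullopt tells the caller to leave the value unchanged.
std::optional<int64_t> innermostDim(const std::vector<int64_t> &Shape) {
  if (Shape.empty())
    return std::nullopt; // 0-D: no dimension to unroll along
  return Shape.back();
}
```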

Fixes https://github.com/llvm/llvm-project/issues/203220
Delta    File
+22 -0   mlir/test/Conversion/ConvertToSPIRV/vector-unroll.mlir
+6 -0    mlir/lib/Dialect/SPIRV/Transforms/SPIRVConversion.cpp
+28 -0   2 files

LLVM/project 2cd778c clang-tools-extra/clangd ParsedAST.cpp, clang-tools-extra/clangd/unittests ReplayPeambleTests.cpp

[clangd] Replay macro definitions from preamble for clang-tidy checks (#202495)

Clang-tidy checkers observe preprocessor events via PPCallbacks. When
using a preamble, macro definitions in the preamble region of the main
file are not replayed during the main-file build, causing checkers like
bugprone-reserved-identifier to miss them.

This patch extends ReplayPreamble::replay() to also replay MacroDefined
events for macros defined directly in the preamble region of the open
file, similar to how InclusionDirective events are already replayed.

Fixes: https://github.com/clangd/clangd/issues/2501
Delta    File
+52 -0   clang-tools-extra/clangd/unittests/ReplayPeambleTests.cpp
+31 -9   clang-tools-extra/clangd/ParsedAST.cpp
+83 -9   2 files

LLVM/project 45304e1 llvm/lib/Target/AMDGPU AMDGPUHWEvents.h SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask

Follow up from comments on https://github.com/llvm/llvm-project/pull/202886

Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.

I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.

Considering HWEvent is a bitmask, I also implemented a simple iterator to iterate over all set bits of the mask, which is a useful thing to have as some APIs in InsertWaitCnt rely on treating one event at a time.
Delta       File
+96 -94     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+73 -79     llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+30 -32     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+28 -34     llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+227 -239   4 files