LLVM/project f84d0ae llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop

Surprisingly, this doesn't consider the special cases; it literally just
extracts the exponent and proceeds as normal.
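A standalone illustration (not the ValueTracking change itself) of what "just extracts the exponent" means for an IEEE-754 double: the 11-bit field at bits [62:52] is read as-is, so inf and nan inputs are handled like any other value with an all-ones exponent field.

```cpp
#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Raw 11-bit biased exponent field of a double, with no special-casing.
static unsigned rawExponent(double x) {
  uint64_t bits = std::bit_cast<uint64_t>(x);
  return static_cast<unsigned>((bits >> 52) & 0x7ff);
}

int main() {
  std::printf("%u %u %u\n",
              rawExponent(1.0),       // 1023
              rawExponent(INFINITY),  // 2047
              rawExponent(NAN));      // 2047, same field as inf
}
```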
DeltaFile
+12 -0 llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+4 -0 llvm/lib/Analysis/ValueTracking.cpp
+16 -0, 2 files

LLVM/project b9a10f4 llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop

We were folding undef inputs to qnan, which is incorrect: the instruction
never returns nan. An out-of-bounds segment select returns 0, so fold an
undef segment select to 0 instead.
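A minimal sketch of the corrected fold, assuming LLVM's InstCombiner API and that the segment select is the intrinsic's second operand (both are assumptions here, not text taken from the patch):

```cpp
// Hypothetical fragment in the style of AMDGPUInstCombineIntrinsic.cpp:
// fold an undef segment select to 0 rather than folding the call to qnan.
Value *SegSel = II.getArgOperand(1); // assumed operand index of the segment
if (isa<UndefValue>(SegSel))
  return IC.replaceOperand(II, 1, ConstantInt::get(SegSel->getType(), 0));
```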
DeltaFile
+29 -28 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+18 -18 llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+47 -46, 2 files

LLVM/project 80662c1 llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)

Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.

The device libs f32 and s64 sin implementations have a range check, and
this pattern appears inside the large path. After a small patch that
inverts the check to send nans down the small path, this will enable the
fold unconditionally on the large path.
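A standalone sketch of the kind of source-level pattern involved; the exact IR shape matched by AMDGPUCodeGenPrepare is an assumption here, not taken from the patch. If an assume (now visible through SimplifyQuery) proves x is never nan, the guard is dead and the remaining arithmetic becomes a candidate for a single fract operation.

```cpp
#include <cmath>

// OpenCL-style fract idiom with a nan guard; the subtraction path is the
// candidate for folding once x is known not to be nan.
float fractPattern(float x) {
  float f = std::fmin(x - std::floor(x), 0x1.fffffep-1f); // largest float < 1.0
  return std::isnan(x) ? x : f;                           // nan check on input
}
```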
DeltaFile
+79 -0 llvm/test/CodeGen/AMDGPU/fract-match.ll
+27 -22 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+106 -22, 2 files

LLVM/project 8d0830e llvm/include/llvm/Analysis LoopCacheAnalysis.h, llvm/lib/Analysis LoopCacheAnalysis.cpp

[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)

LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper of Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
DeltaFile
+4 -33 llvm/lib/Analysis/LoopCacheAnalysis.cpp
+0 -5 llvm/include/llvm/Analysis/LoopCacheAnalysis.h
+4 -38, 2 files

LLVM/project a667526 llvm/lib/CodeGen MachineFunctionPass.cpp, llvm/test/CodeGen/AArch64 O3-pipeline.ll

[MachineFunctionPass] Preserve more IR analyses (#178871)

Preserve PDT, BPI, LazyBPI, and LazyBFI. These are all IR analyses that
are not invalidated by machine passes.

This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.
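A sketch of the shape of the change, assuming it lands in the legacy-PM MachineFunctionPass::getAnalysisUsage hook; the surrounding code and the exact set of existing calls are assumptions here, not the verbatim patch.

```cpp
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/LazyBranchProbabilityInfo.h"
#include "llvm/Analysis/PostDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
using namespace llvm;

// llvm/lib/CodeGen/MachineFunctionPass.cpp (sketch, not the verbatim patch)
void MachineFunctionPass::getAnalysisUsage(AnalysisUsage &AU) const {
  // ... existing required/preserved analyses elided ...
  // Machine passes only touch MachineFunctions, so these IR-level analyses
  // stay valid and do not need to be recomputed afterwards.
  AU.addPreserved<PostDominatorTreeWrapperPass>();     // PDT
  AU.addPreserved<BranchProbabilityInfoWrapperPass>(); // BPI
  AU.addPreserved<LazyBranchProbabilityInfoPass>();    // LazyBPI
  AU.addPreserved<LazyBlockFrequencyInfoPass>();       // LazyBFI
  FunctionPass::getAnalysisUsage(AU);
}
```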
DeltaFile
+8 -0 llvm/lib/CodeGen/MachineFunctionPass.cpp
+0 -4 llvm/test/CodeGen/AArch64/O3-pipeline.ll
+8 -4, 2 files

LLVM/project 6d83b16 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/include/llvm/CodeGen TargetInstrInfo.h

Implement per-output machine uniformity analysis
DeltaFile
+76 -14 llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+27 -11 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+16 -5 llvm/include/llvm/ADT/GenericUniformityImpl.h
+8 -9 llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/per-output-uniformity.mir
+4 -4 llvm/lib/Target/AMDGPU/SIInstrInfo.h
+4 -3 llvm/include/llvm/CodeGen/TargetInstrInfo.h
+135 -46, 2 files not shown
+140 -49, 8 files

LLVM/project b4797d4 llvm/lib/Target/PowerPC PPCISelLowering.cpp, llvm/test/CodeGen/PowerPC ucmp.ll

[PowerPC] Fix miscompilation when using 32-bit ucmp on 64-bit PowerPC (#178979)

I forgot that you need to clear the upper 32 bits for the carry flag to
work properly on ppc64; otherwise there will be garbage and possibly
incorrect results.
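A standalone illustration of the failure mode described above: when a 32-bit operand sits in a 64-bit register with stale upper bits, a 64-bit borrow computation can disagree with the intended 32-bit one.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t A32 = 1, B32 = 2;
  // Pretend the upper halves of the 64-bit registers still hold garbage.
  uint64_t A64 = 0xdeadbeef00000000ULL | A32;
  uint64_t B64 = B32;

  bool Borrow32 = A32 < B32; // true: 1 - 2 borrows at 32 bits
  bool Borrow64 = A64 < B64; // false: the garbage upper bits dominate
  std::printf("32-bit borrow: %d, 64-bit borrow: %d\n", Borrow32, Borrow64);
  // Zero-extending (clearing the upper 32 bits) before the subtraction
  // makes both computations agree.
}
```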

Fixes: https://github.com/llvm/llvm-project/issues/179119

I do not have merge permissions.
DeltaFile
+13 -4 llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+12 -4 llvm/test/CodeGen/PowerPC/ucmp.ll
+25 -8, 2 files

LLVM/project 136bbde clang/lib/AST ExprConstant.cpp, clang/lib/AST/ByteCode State.cpp State.h

[clang][ExprConst] Move shared `EvalInfo` state into `interp::State` (#177738)

Instead of having `InterpState` call into its parent `EvalInfo`, just
save the state in `interp::State`, where both subclasses can access it.
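A self-contained sketch of the refactoring pattern being described; the field names are generic placeholders, not the actual clang members.

```cpp
// Shared evaluation state lives in the common base class, so the bytecode
// interpreter's state no longer has to forward reads to its sibling.
struct State {            // plays the role of interp::State
  bool InConstantContext = false;
  unsigned StepsLeft = 1u << 20;
};

struct EvalInfo : State {};    // tree-walking constant evaluator
struct InterpState : State {}; // bytecode interpreter; previously it would
                               // have held a reference to the EvalInfo and
                               // delegated these accesses to it
```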
DeltaFile
+15 -130 clang/lib/AST/ExprConstant.cpp
+74 -13 clang/lib/AST/ByteCode/State.cpp
+58 -15 clang/lib/AST/ByteCode/State.h
+4 -33 clang/lib/AST/ByteCode/InterpState.h
+7 -6 clang/lib/AST/ByteCode/InterpState.cpp
+158 -197, 5 files

LLVM/project b83160b llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp

AMDGPU: Use extractBitsAsZExtValue to get exponent in trig_preop folding (#179024)

DeltaFile
+1 -1 llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+1 -1, 1 file

LLVM/project 48c664b mlir/lib/Dialect/UB/IR CMakeLists.txt

[mlir] Fix build after #179039 (#179180)

Fix build after #179039.
DeltaFile
+2 -0 mlir/lib/Dialect/UB/IR/CMakeLists.txt
+2 -0, 1 file

LLVM/project 6c4f95b mlir/lib/Dialect/UB/IR CMakeLists.txt

[mlir] Fix build after #179039
DeltaFile
+2 -0 mlir/lib/Dialect/UB/IR/CMakeLists.txt
+2 -0, 1 file

LLVM/project a220502 mlir/include/mlir/Dialect/UB/IR UBOps.td, mlir/lib/Dialect/UB/IR UBOps.cpp

[mlir][UB] Erase ops that precede `ub.unreachable`
DeltaFile
+37 -0 mlir/lib/Dialect/UB/IR/UBOps.cpp
+25 -0 mlir/test/Dialect/UB/canonicalize.mlir
+1 -0 mlir/include/mlir/Dialect/UB/IR/UBOps.td
+63 -0, 3 files

LLVM/project 1e33b73 mlir/include/mlir/Interfaces ExecutionProgressOpInterface.td ExecutionProgressOpInterface.h, mlir/lib/Dialect/SCF/IR SCF.cpp

[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039)

Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").

`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).

Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.

RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
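A heavily hedged sketch of the shape such a canonicalization could take; the accessor name (getMustProgress) and the loopsForever() helper are hypothetical placeholders, not names taken from the patch, and the real pattern is generic over RegionBranchOpInterface rather than written per op.

```cpp
using namespace mlir;

// Hypothetical analysis helper: true if the loop provably never terminates.
bool loopsForever(scf::WhileOp op);

// Hypothetical pattern: a side-effect-free, resultless loop that must
// progress but never does is UB, so it can simply be erased.
struct EraseMustProgressInfiniteLoop : OpRewritePattern<scf::WhileOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(scf::WhileOp op,
                                PatternRewriter &rewriter) const override {
    if (!op.getMustProgress()) // hypothetical generated accessor
      return failure();
    if (!isMemoryEffectFree(op) || op->getNumResults() != 0)
      return failure();
    if (!loopsForever(op))     // hypothetical analysis
      return failure();
    rewriter.eraseOp(op);
    return success();
  }
};
```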
DeltaFile
+73 -30 mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+51 -0 mlir/test/Dialect/SCF/canonicalize.mlir
+45 -3 mlir/lib/Dialect/SCF/IR/SCF.cpp
+48 -0 mlir/include/mlir/Interfaces/ExecutionProgressOpInterface.td
+39 -0 mlir/lib/Dialect/UB/IR/UBOps.cpp
+25 -0 mlir/include/mlir/Interfaces/ExecutionProgressOpInterface.h
+281 -33, 13 files not shown
+356 -44, 19 files

LLVM/project 013b345 clang/lib/Sema SemaType.cpp, clang/lib/Serialization ASTReaderDecl.cpp

[Serialization] Stop demote var definition as declaration (#172430) (#177117)

Close https://github.com/llvm/llvm-project/issues/172241
Close https://github.com/llvm/llvm-project/issues/64034
Close https://github.com/llvm/llvm-project/issues/149404
Close https://github.com/llvm/llvm-project/issues/174858

After this patch, we (the clang devs) no longer assume there is at most
one definition in a redeclaration chain.

See
https://discourse.llvm.org/t/rfc-clang-not-assuming-there-is-at-most-one-definition-in-a-redeclaration-chain/89360
for details.

---

Update since last commit:

    [2 lines not shown]
DeltaFile
+110 -0 clang/test/Modules/var-inst-def.cppm
+104 -0 clang/test/Modules/pr149404-02.cppm
+94 -0 clang/test/Modules/demote-var-def.cpp
+52 -24 clang/lib/Sema/SemaType.cpp
+47 -0 clang/test/Modules/pr172241.cppm
+0 -14 clang/lib/Serialization/ASTReaderDecl.cpp
+407 -38, 6 files

LLVM/project 7675549 llvm/lib/Target/AMDGPU SOPInstructions.td AMDGPU.td, llvm/test/MC/AMDGPU gfx13_asm_sopc.s gfx13_asm_sopp.s

[AMDGPU] Add SOPK, SOPC and SOPP encoding support for gfx13
DeltaFile
+2,360 -0 llvm/test/MC/AMDGPU/gfx13_asm_sopc.s
+448 -263 llvm/lib/Target/AMDGPU/SOPInstructions.td
+276 -0 llvm/test/MC/AMDGPU/gfx13_asm_sopp.s
+215 -0 llvm/test/MC/AMDGPU/gfx13_asm_sopk.s
+20 -3 llvm/lib/Target/AMDGPU/AMDGPU.td
+18 -0 llvm/test/MC/AMDGPU/gfx13_asm_sopp_alias.s
+3,337 -266, 1 file not shown
+3,346 -266, 7 files

LLVM/project 2d3ff80 llvm/utils/gn/secondary/llvm/lib/Target/NVPTX BUILD.gn

[gn build] Port 1a23bca645dc
DeltaFile
+1 -0 llvm/utils/gn/secondary/llvm/lib/Target/NVPTX/BUILD.gn
+1 -0, 1 file

LLVM/project 1a23bca llvm/lib/Target/NVPTX NVPTXDwarfDebug.cpp, llvm/test/DebugInfo/NVPTX inlinedAt_6.ll inlinedAt_3.ll

[DebugInfo][NVPTX] Adding support for `inlined_at` debug directive in NVPTX backend (#170239)

This change adds support for emitting the enhanced PTX debugging
directives `function_name` and `inlined_at` as part of the `.loc`
directive in the NVPTX backend.

`.loc` syntax:
> .loc file_index line_number column_position

`.loc` syntax with `inlined_at` attribute:
> .loc file_index line_number column_position, function_name label {+ immediate }, inlined_at file_index2 line_number2 column_position2

The `inlined_at` attribute specified as part of the `.loc` directive marks
PTX instructions that were generated from an inlined function.
`file_index2`, `line_number2`, and `column_position2` specify the source
location at which that function was inlined.

    [27 lines not shown]
DeltaFile
+334 -0 llvm/test/DebugInfo/NVPTX/inlinedAt_6.ll
+302 -0 llvm/test/DebugInfo/NVPTX/inlinedAt_3.ll
+222 -0 llvm/test/DebugInfo/NVPTX/inlinedAt_4.ll
+205 -0 llvm/test/DebugInfo/NVPTX/inlinedAt_5.ll
+177 -0 llvm/lib/Target/NVPTX/NVPTXDwarfDebug.cpp
+141 -0 llvm/test/DebugInfo/NVPTX/inlinedAt_1.ll
+1,381 -0, 13 files not shown
+1,822 -45, 19 files

LLVM/project 448595d orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Use future rather than condition_variable for shutdown wait. (#179169)

Session::waitForShutdown is a convenience wrapper around the
asynchronous Session::shutdown call. The previous
Session::waitForShutdown call waited on a std::condition_variable to
signal the end of shutdown, but it's easier to just embed a std::promise
in a callback to the asynchronous shutdown method.
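A self-contained sketch of the idea (names are illustrative, not orc-rt's API): the waiting side hands the asynchronous shutdown a callback that fulfills a std::promise and then blocks on the matching future, with no mutex or condition_variable to manage.

```cpp
#include <functional>
#include <future>
#include <thread>

// Stand-in for an asynchronous shutdown entry point.
std::thread shutdownAsync(std::function<void()> OnComplete) {
  return std::thread([Cb = std::move(OnComplete)] { Cb(); });
}

void waitForShutdown() {
  std::promise<void> Done;
  std::future<void> F = Done.get_future();
  std::thread Worker = shutdownAsync([&Done] { Done.set_value(); });
  F.wait();      // returns as soon as the shutdown callback fires
  Worker.join(); // tidy up the worker thread (not part of the idea shown)
}

int main() { waitForShutdown(); }
```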
DeltaFile
+4 -5 orc-rt/lib/executor/Session.cpp
+1 -2 orc-rt/include/orc-rt/Session.h
+5 -7, 2 files

LLVM/project 85545d4 llvm/include/llvm/CodeGen MachineDominanceFrontier.h, llvm/lib/CodeGen MachineDominanceFrontier.cpp

[NewPM] Port MachineDominanceFrontierAnalysis (#177709)

DeltaFile
+32 -43 llvm/include/llvm/CodeGen/MachineDominanceFrontier.h
+38 -14 llvm/lib/CodeGen/MachineDominanceFrontier.cpp
+3 -3 llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
+3 -3 llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
+3 -3 llvm/lib/Target/Hexagon/HexagonRDFOpt.cpp
+3 -3 llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp
+82 -69, 6 files not shown
+91 -78, 12 files

LLVM/project ec6a219 orc-rt/lib/executor Session.cpp

[orc-rt] Prefer std::scoped_lock to std::lock_guard. NFCI. (#179165)

DeltaFile
+2 -2 orc-rt/lib/executor/Session.cpp
+2 -2, 1 file

LLVM/project 496d871 mlir/lib/Bindings/Python IRCore.cpp

[mlir][Python] fix liveContextMap under free-threading after #178529 (#179163)

#178529 introduced a small bug under free-threading by bumping a
reference count (or something like that) when accessing the operand list
passed to `build_generic`. This PR fixes that.
DeltaFile
+1 -1 mlir/lib/Bindings/Python/IRCore.cpp
+1 -1, 1 file

LLVM/project c01828c mlir/lib/Bindings/Python IRCore.cpp

[mlir][Python] fix liveContextMap under free-threading after 178529
DeltaFile
+1 -1 mlir/lib/Bindings/Python/IRCore.cpp
+1 -1, 1 file

LLVM/project bb14eab llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h

[VPlan] Split out EVL exit cond transform from canonicalizeEVLLoops. NFC (#178181)

This is split out from #177114.

In order to make canonicalizeEVLLoops a generic "convert to variable
stepping" transform, move the code that changes the exit condition into a
separate transform, since not all variable-stepping loops will want to
transform the exit condition. Run it before canonicalizeEVLLoops, before
VPEVLBasedIVPHIRecipe is expanded.

Also relax the assertion for VPInstruction::ExplicitVectorLength to just
bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by
other loops that aren't EVL tail folded.
DeltaFile
+46 -32 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6 -4 llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+6 -0 llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+2 -0 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+60 -36, 4 files

LLVM/project dc152f0 clang/test/Headers __clang_hip_math.hip, llvm/lib/IR Instruction.cpp

[IR] Add `fpmath` to keep list of dropUBImplyingAttrsAndMetadata (#179019)

`fpmath` is precision metadata rather than UB-implying metadata. This
prevents `fpmath` from being dropped by InstCombine's FoldOpIntoSelect.
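A sketch of the shape of the change, assuming the keep-list style of Instruction::dropUBImplyingAttrsAndMetadata in llvm/lib/IR/Instruction.cpp; the other entries in the list are elided rather than guessed.

```cpp
// llvm/lib/IR/Instruction.cpp (sketch, not the verbatim patch)
void Instruction::dropUBImplyingAttrsAndMetadata() {
  // fpmath only describes the precision the result is allowed to have, so
  // keeping it cannot introduce immediate UB when an instruction is
  // speculated, hoisted, or folded into a select.
  static const unsigned KnownIDs[] = {/* ...existing kept kinds..., */
                                      LLVMContext::MD_fpmath};
  dropUBImplyingAttrsAndUnknownMetadata(KnownIDs);
}
```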
DeltaFile
+7 -7 clang/test/Headers/__clang_hip_math.hip
+9 -0 llvm/test/Transforms/InstCombine/fold-fops-into-selects.ll
+3 -1 llvm/lib/IR/Instruction.cpp
+19 -8, 3 files

LLVM/project acf2bbd llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU sched_mfma_rewrite_copies.mir sched_mfma_rewrite_cost.mir

Update new tests and format
DeltaFile
+949 -949 llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir
+77 -77 llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir
+1 -2 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+1,027 -1,028, 3 files

LLVM/project 6a7a832 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp GCNSchedStrategy.h, llvm/test/CodeGen/AMDGPU machine-scheduler-sink-trivial-remats-debug.mir

Set rematerialized MIs' reg operands to sentinel reg

Also removes a bunch of `const` specifiers on class members that prevent
std::sort from compiling on some configs.
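A standalone illustration of the compile failure being removed: a const data member deletes the element type's assignment operators, and std::sort requires assignable elements.

```cpp
#include <algorithm>
#include <vector>

struct Entry {
  const int Key; // 'const' here makes Entry non-assignable
  int Value;
};

int main() {
  std::vector<Entry> Entries = {{2, 0}, {1, 1}};
  // std::sort(Entries.begin(), Entries.end(),
  //           [](const Entry &A, const Entry &B) { return A.Key < B.Key; });
  // ^ does not compile while Key is const; dropping the const fixes it.
}
```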
DeltaFile
+20 -8 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+6 -4 llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+2 -2 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
+28 -14, 3 files

LLVM/project d739081 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp

Format
DeltaFile
+1 -0 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+1 -0, 1 file

LLVM/project ba1fb49 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp GCNSchedStrategy.h, llvm/test/CodeGen/AMDGPU machine-scheduler-rematerialization-scoring.mir machine-scheduler-sink-trivial-remats-attr.mir

Re-apply "[AMDGPU][Scheduler] Scoring system for rematerializations (#175050)"

This re-applies commit f21e3593371c049380f056a539a1601a843df558 along
with the compile-failure fix introduced in
8ab79377740789f6a34fc6f04ee321a39ab73724 before the initial patch was
reverted, and adds fixes for the previously observed assert failure.

We were hitting the assert in the HIP Blender due to a combination of
two issues that could happen when rematerializations are being rolled
back.

1. Small changes in slot indices (while preserving instruction order)
   compared to the pre-re-scheduling state mean that we have to
   re-compute live ranges for all register operands of rolled-back
   rematerializations. This was not being done before.
2. Re-scheduling can move rematerialized instructions (whose opcode is
   temporarily set to DBG_VALUE) to arbitrary positions in their
   respective regions, even before their read operands are defined.
   This makes re-scheduling reverts mandatory before rolling back

    [4 lines not shown]
DeltaFile
+507 -291 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+523 -0 llvm/test/CodeGen/AMDGPU/machine-scheduler-rematerialization-scoring.mir
+194 -194 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
+238 -31 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
+208 -51 llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+5 -5 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
+1,675 -572, 1 file not shown
+1,676 -573, 7 files

LLVM/project cf60af8 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp GCNSchedStrategy.h, llvm/test/CodeGen/AMDGPU machine-scheduler-sink-trivial-remats-debug.mir machine-scheduler-sink-trivial-remats.mir

[AMDGPU][Scheduler] Revert all regions when remat fails to increase occ. (#177205)

When the rematerialization stage fails to increase occupancy in all
regions, the current implementation only reverts the effect of
re-scheduling in regions in which the increased occupancy target could
not be achieved. However, given that re-scheduling with a higher
occupancy target puts more pressure on the scheduler to achieve lower
maximum RP at the cost of potentially lower ILP as well, region
schedules made with higher occupancy targets are generally less
desirable if the whole function is not able to meet that target.
Therefore, if at least one region cannot reach its target, it makes
sense to revert re-scheduling in all affected regions to go back to a
schedule that was made with a lower occupancy target.

This implements such logic for the rematerialization stage, and adds a
test to showcase that re-scheduling is indeed interrupted/reverted as
soon as a re-scheduled region that does not meet the increased target
occupancy is encountered.


    [4 lines not shown]
DeltaFile
+118 -0 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
+58 -17 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+15 -15 llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
+27 -1 llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+218 -33, 4 files

LLVM/project 9e5deb9 clang-tools-extra/clang-tidy/modernize UseNullptrCheck.cpp

[clang-tidy] Speed up `modernize-use-nullptr` (#178829)

As noted in [this
comment](https://github.com/llvm/llvm-project/pull/178149#discussion_r2732896149),
it appears that registering one `anyOf(a, b, ...)` matcher is generally
slower than registering `a, b, ...` all individually. Applying that
knowledge to this check gives us an easy 3x speedup:
```txt
                    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
Status quo:         0.3281 (  6.1%)   0.0469 (  5.2%)   0.3750 (  6.0%)   0.3491 (  5.5%)  modernize-use-nullptr
With this change:   0.0938 (  1.8%)   0.0156 (  1.8%)   0.1094 (  1.8%)   0.1260 (  2.1%)  modernize-use-nullptr
```
I'm not exactly sure *why* this works, but it seems pretty consistent.
I've seen a similar result trying this with `bugprone-infinite-loop`.
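A schematic sketch of the registration change using the public MatchFinder API; the matchers and binding name below are arbitrary placeholders, not the ones modernize-use-nullptr actually uses, and the comment on the mechanism is only a plausible explanation.

```cpp
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang::ast_matchers;

void registerMatchers(MatchFinder *Finder, MatchFinder::MatchCallback *CB) {
  // Before: one combined matcher; every candidate node is tested against
  // the whole anyOf(...).
  // Finder->addMatcher(
  //     stmt(anyOf(implicitCastExpr(), cxxStaticCastExpr())).bind("match"),
  //     CB);

  // After: register each alternative separately, which plausibly lets
  // MatchFinder filter by top-level node kind before running a matcher.
  Finder->addMatcher(implicitCastExpr().bind("match"), CB);
  Finder->addMatcher(cxxStaticCastExpr().bind("match"), CB);
}
```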
DeltaFile
+24 -27 clang-tools-extra/clang-tidy/modernize/UseNullptrCheck.cpp
+24 -27, 1 file