AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop
Surprisingly, this doesn't consider the special cases; it literally
just extracts the exponent and proceeds as normal.
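For reference, a minimal C++ sketch of what "just extracts the exponent" means for an f64 input; this illustrates the hardware behavior and is not the committed value-tracking code:
```cpp
#include <bit>
#include <cstdint>

// v_trig_preop_f64 derives its 2/pi segment shift from the raw 11-bit
// exponent field of the source, so inf/nan inputs flow through the same
// path as ordinary values instead of being special-cased.
static unsigned rawExponent(double X) {
  uint64_t Bits = std::bit_cast<uint64_t>(X); // C++20
  return static_cast<unsigned>((Bits >> 52) & 0x7ff);
}
```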
AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop
We were folding undef inputs to qnan, which is incorrect: the
instruction never returns nan. An out-of-bounds segment select returns
0, so fold an undef segment to 0 instead.
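A hedged sketch of the fold's shape; the actual change lives in the AMDGPU instcombine hooks, and the helper name here is illustrative:
```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Fold an undef segment operand of llvm.amdgcn.trig.preop to 0 rather than
// folding the whole call to qnan: the instruction never produces nan, and
// an out-of-bounds segment select already returns 0, so 0 is safe.
static bool foldUndefSegment(IntrinsicInst &II) {
  Value *Seg = II.getArgOperand(1);
  if (!isa<UndefValue>(Seg))
    return false;
  II.setArgOperand(1, Constant::getNullValue(Seg->getType()));
  return true;
}
```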
AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)
Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.
The device libs f32 and f64 sin implementations have a range check,
and this pattern appears inside the large path. After a small patch
that inverts the check to send nans down the small path, this will
enable the fold unconditionally on the large path.
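For context, a small sketch of the fract pattern in question, assuming the usual OpenCL-style spelling (illustrative, not the actual device-libs source):
```cpp
#include <cmath>

// fract(x) written as x - floor(x), clamped just below 1.0, with a nan
// guard. Proving x is never nan (e.g. via llvm.assume reached through
// SimplifyQuery) lets the compiler collapse this to a single fract op.
static double fractPattern(double X) {
  double F = std::fmin(X - std::floor(X), 0x1.fffffffffffffp-1);
  return std::isnan(X) ? X : F;
}
```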
[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)
LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper around Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
[MachineFunctionPass] Preserve more IR analyses (#178871)
Preserve PDT, BPI, LazyBPI, and LazyBFI. These are all IR analyses that
are not invalidated by machine passes.
This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.
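A minimal sketch of the mechanism under the legacy pass manager; this shows the shape of the change, not the exact diff:
```cpp
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/LazyBranchProbabilityInfo.h"
#include "llvm/Analysis/PostDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
using namespace llvm;

// Machine passes run after instruction selection and do not mutate the IR,
// so these IR-level analyses can be declared preserved.
struct ExampleMachinePass : MachineFunctionPass {
  static char ID;
  ExampleMachinePass() : MachineFunctionPass(ID) {}
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addPreserved<PostDominatorTreeWrapperPass>();
    AU.addPreserved<BranchProbabilityInfoWrapperPass>();
    AU.addPreserved<LazyBranchProbabilityInfoPass>();
    AU.addPreserved<LazyBlockFrequencyInfoPass>();
    MachineFunctionPass::getAnalysisUsage(AU);
  }
  bool runOnMachineFunction(MachineFunction &) override { return false; }
};
char ExampleMachinePass::ID = 0;
```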
[PowerPC] Fix miscompilation when using 32-bit ucmp on 64-bit PowerPC (#178979)
I forgot that you need to clear the upper 32 bits for the carry flag to
work properly on ppc64; otherwise those bits hold garbage and can
produce incorrect results.
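In scalar terms the requirement looks like this; a hedged illustration of the bug class, not the SelectionDAG lowering itself:
```cpp
#include <cstdint>

// On ppc64 a 32-bit value sits in a 64-bit register, and the borrow from a
// doubleword subtraction sees all 64 bits. Masking off the upper 32 bits
// (i.e. zero-extending) keeps garbage there from flipping the carry and
// corrupting the ucmp result.
static int ucmp32(uint64_t RegA, uint64_t RegB) {
  uint64_t A = RegA & 0xffffffffu;
  uint64_t B = RegB & 0xffffffffu;
  return (A < B) ? -1 : (A > B) ? 1 : 0;
}
```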
Fixes: https://github.com/llvm/llvm-project/issues/179119
I do not have merge permissions.
[clang][ExprConst] Move shared `EvalInfo` state into `interp::State` (#177738)
Instead of having `InterpState` call into its parent `EvalInfo`, just
save the state in `interp::State`, where both subclasses can access it.
[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039)
Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").
`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).
Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.
RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
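Roughly, the pattern has this shape. The interface name comes from the patch; the `mustProgress()` accessor and the `loopsForever()` helper are assumptions for illustration:
```cpp
#include "mlir/IR/PatternMatch.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"
// Plus the generated ExecutionProgressOpInterface header.
using namespace mlir;

// Sketch: erase a side-effect-free op that promises forward progress but
// provably never terminates; such code is UB, so removing it is sound.
struct EraseStuckMustProgressOp : RewritePattern {
  using RewritePattern::RewritePattern;
  LogicalResult matchAndRewrite(Operation *op,
                                PatternRewriter &rewriter) const override {
    auto progressOp = dyn_cast<ExecutionProgressOpInterface>(op);
    if (!progressOp || !progressOp.mustProgress()) // assumed accessor name
      return failure();
    if (!loopsForever(op) || !isMemoryEffectFree(op)) // hypothetical helper
      return failure();
    rewriter.eraseOp(op);
    return success();
  }
};
```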
[DebugInfo][NVPTX] Adding support for `inlined_at` debug directive in NVPTX backend (#170239)
This change adds support for emitting the enhanced PTX debugging
directives `function_name` and `inlined_at` as part of the `.loc`
directive in the NVPTX backend.
`.loc` syntax:
```txt
.loc file_index line_number column_position
```
`.loc` syntax with `inlined_at` attribute:
```txt
.loc file_index line_number column_position, function_name label {+ immediate }, inlined_at file_index2 line_number2 column_position2
```
An `inlined_at` attribute on a `.loc` directive marks PTX instructions
that were generated from an inlined function; `file_index2`,
`line_number2`, and `column_position2` give the source location at
which that function was inlined.
[27 lines not shown]
[orc-rt] Use future rather than condition_variable for shutdown wait. (#179169)
Session::waitForShutdown is a convenience wrapper around the
asynchronous Session::shutdown call. The previous
Session::waitForShutdown call waited on a std::condition_variable to
signal the end of shutdown, but it's easier to just embed a std::promise
in a callback to the asynchronous shutdown method.
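A minimal sketch of the pattern, not the orc-rt code itself:
```cpp
#include <future>

// waitForShutdown-style wrapper: hand the asynchronous shutdown a callback
// that fulfills a std::promise, then block on the matching future. This
// replaces a hand-rolled mutex + condition_variable + flag trio.
template <typename AsyncShutdownFn>
void waitForShutdownSketch(AsyncShutdownFn ShutdownAsync) {
  std::promise<void> Done;
  std::future<void> F = Done.get_future();
  ShutdownAsync([&Done] { Done.set_value(); }); // runs when shutdown ends
  F.wait();                                     // block until then
}
```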
[mlir][Python] fix liveContextMap under free-threading after #178529 (#179163)
#178529 introduced a small bug under free-threading, apparently by
bumping a reference count when accessing the operand list passed to
`build_generic`. This PR fixes that.
[VPlan] Split out EVL exit cond transform from canonicalizeEVLLoops. NFC (#178181)
This is split out from #177114.
In order to make canonicalizeEVLLoops a generic "convert to variable
stepping" transform, move the code that changes the exit condition to a
separate transform since not all variable stepping loops will want to
transform the exit condition. Run it before canonicalizeEVLLoops,
before VPEVLBasedIVPHIRecipe is expanded.
Also relax the assertion for VPInstruction::ExplicitVectorLength to just
bail instead, since eventually VPEVLBasedIVPHIRecipe will be used by
other loops that aren't EVL tail folded.
[IR] Add `fpmath` to keep list of dropUBImplyingAttrsAndMetadata (#179019)
`fpmath` is precision metadata rather than UB-implying metadata. This
prevents `fpmath` from being dropped in InstCombine's FoldOpIntoSelect.
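A sketch of the change's shape, assuming the keep-list style used by `Instruction::dropUBImplyingAttrsAndMetadata` (approximate, not the exact diff):
```cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// !fpmath only licenses reduced precision; it never makes an operation UB,
// so it is safe to keep when stripping UB-implying attributes/metadata.
static void dropUBImplyingSketch(Instruction *I) {
  unsigned KnownIDs[] = {LLVMContext::MD_annotation, LLVMContext::MD_fpmath};
  I->dropUBImplyingAttrsAndUnknownMetadata(KnownIDs);
}
```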
Set rematerialized MIs' reg operands to sentinel reg
Also removes a number of `const` qualifiers on class members that
prevent `std::sort` from compiling on some configs.
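For reference, a minimal illustration of that compile failure, using a hypothetical type rather than the scheduler's classes:
```cpp
#include <algorithm>
#include <vector>

// A const data member deletes the implicit assignment operators, and
// std::sort requires its element type to be MoveAssignable, so sorting
// fails to compile until the const qualifier is removed.
struct RematEntry {
  /*const*/ unsigned Score; // with 'const' here, std::sort is ill-formed
  unsigned Reg;
};

static void sortByScore(std::vector<RematEntry> &Entries) {
  std::sort(Entries.begin(), Entries.end(),
            [](const RematEntry &A, const RematEntry &B) {
              return A.Score < B.Score;
            });
}
```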
Re-apply "[AMDGPU][Scheduler] Scoring system for rematerializations (#175050)"
This re-applies commit f21e3593371c049380f056a539a1601a843df558 along
with the compile-failure fix introduced in
8ab79377740789f6a34fc6f04ee321a39ab73724 before the initial patch was
reverted, and adds fixes for the previously observed assert failure.
We were hitting the assert in HIP Blender due to a combination of two
issues that could occur when rematerializations are rolled back.
1. Small changes in slot indices (while preserving instruction order)
compared to the pre-re-scheduling state meant that we have to
re-compute live ranges for all register operands of rolled-back
rematerializations. This was not being done before.
2. Re-scheduling can move registers that were rematerialized to
arbitrary positions in their respective regions while their opcode
is set to DBG_VALUE, even before their read operands are defined.
This makes re-scheduling reverts mandatory before rolling back
[4 lines not shown]
[AMDGPU][Scheduler] Revert all regions when remat fails to increase occ. (#177205)
When the rematerialization stage fails to increase occupancy in all
regions, the current implementation only reverts the effect of
re-scheduling in regions in which the increased occupancy target could
not be achieved. However, given that re-scheduling with a higher
occupancy target puts more pressure on the scheduler to achieve lower
maximum RP at the cost of potentially lower ILP as well, region
schedules made with higher occupancy targets are generally less
desirable if the whole function is not able to meet that target.
Therefore, if at least one region cannot reach its target, it makes
sense to revert re-scheduling in all affected regions to go back to a
schedule that was made with a lower occupancy target.
This implements such logic for the rematerialization stage, and adds a
test to showcase that re-scheduling is indeed interrupted/reverted as
soon as a re-scheduled region that does not meet the increased target
occupancy is encountered.
[4 lines not shown]
[clang-tidy] Speed up `modernize-use-nullptr` (#178829)
As noted in [this
comment](https://github.com/llvm/llvm-project/pull/178149#discussion_r2732896149),
it appears that registering one `anyOf(a, b, ...)` matcher is generally
slower than registering `a, b, ...` all individually. Applying that
knowledge to this check gives us an easy 3x speedup:
```txt
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
Status quo: 0.3281 ( 6.1%) 0.0469 ( 5.2%) 0.3750 ( 6.0%) 0.3491 ( 5.5%) modernize-use-nullptr
With this change: 0.0938 ( 1.8%) 0.0156 ( 1.8%) 0.1094 ( 1.8%) 0.1260 ( 2.1%) modernize-use-nullptr
```
I'm not exactly sure *why* this works, but it seems pretty consistent.
I've seen a similar result trying this with `bugprone-infinite-loop`.
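For illustration, the registration pattern looks roughly like this; the matcher contents are placeholders, not the check's real matchers:
```cpp
#include "clang/ASTMatchers/ASTMatchFinder.h"
using namespace clang::ast_matchers;

// Register each alternative as its own top-level matcher instead of one
// combined anyOf(); the callback fires on the same nodes either way.
void registerSplit(MatchFinder &Finder,
                   MatchFinder::MatchCallback *Callback) {
  // Before: Finder.addMatcher(
  //     castExpr(anyOf(hasCastKind(clang::CK_NullToPointer),
  //                    hasCastKind(clang::CK_NullToMemberPointer)))
  //         .bind("cast"), Callback);
  Finder.addMatcher(
      castExpr(hasCastKind(clang::CK_NullToPointer)).bind("cast"), Callback);
  Finder.addMatcher(
      castExpr(hasCastKind(clang::CK_NullToMemberPointer)).bind("cast"),
      Callback);
}
```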