[DebugInfo][NVPTX] Adding support for `inlined_at` debug directive in NVPTX backend (#170239)
This change adds support for emitting the enhanced PTX debugging
directives `function_name` and `inlined_at` as part of the `.loc`
directive in the NVPTX backend.
`.loc` syntax:
>.loc file_index line_number column_position
`.loc` syntax with the `inlined_at` attribute:
>.loc file_index line_number column_position,function_name label {+ immediate }, inlined_at file_index2 line_number2 column_position2
The `inlined_at` attribute, specified as part of the `.loc` directive,
marks PTX instructions that were generated from an inlined function. It
specifies the source location at which that function was inlined:
`file_index2`, `line_number2`, and `column_position2` give the file,
line, and column of the inlining site.
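To make the two `.loc` forms above concrete, here is a small C++ sketch that builds the directive text with and without the `function_name`/`inlined_at` attributes. The `SourceLoc`/`InlineInfo` structs and the `formatLoc` helper are hypothetical and are not the NVPTX backend's actual emission code; the label `$L__info_string0` is likewise only an example. For instance, `formatLoc({1, 10, 5})` yields `.loc 1 10 5`.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical types for illustration only; not NVPTX backend classes.
struct SourceLoc {
  unsigned FileIndex;
  unsigned Line;
  unsigned Column;
};

struct InlineInfo {
  std::string FunctionLabel;          // label naming the inlined function
  std::optional<int64_t> LabelOffset; // optional "+ immediate" offset
  SourceLoc InlinedAt;                // where the function was inlined
};

// Builds ".loc file line col" and, when Inline is set, appends the
// function_name and inlined_at attributes.
std::string formatLoc(const SourceLoc &Loc,
                      const std::optional<InlineInfo> &Inline = std::nullopt) {
  std::string Out = ".loc " + std::to_string(Loc.FileIndex) + " " +
                    std::to_string(Loc.Line) + " " +
                    std::to_string(Loc.Column);
  if (!Inline)
    return Out;
  Out += ", function_name " + Inline->FunctionLabel;
  if (Inline->LabelOffset)
    Out += "+" + std::to_string(*Inline->LabelOffset);
  const SourceLoc &At = Inline->InlinedAt;
  Out += ", inlined_at " + std::to_string(At.FileIndex) + " " +
         std::to_string(At.Line) + " " + std::to_string(At.Column);
  return Out;
}
```

The inlined form reuses the plain form as a prefix, mirroring how `inlined_at` is an optional extension of the ordinary `.loc` directive.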
[27 lines not shown]
[orc-rt] Use future rather than condition_variable for shutdown wait. (#179169)
Session::waitForShutdown is a convenience wrapper around the
asynchronous Session::shutdown call. The previous
Session::waitForShutdown call waited on a std::condition_variable to
signal the end of shutdown, but it's easier to just embed a std::promise
in a callback to the asynchronous shutdown method.
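A minimal sketch of the pattern, assuming a toy class whose asynchronous `shutdown` runs its completion callback on a worker thread. `ToySession` and its members are hypothetical stand-ins, not the orc-rt `Session` API. The blocking wrapper fulfils a `std::promise` from the callback and waits on the matching `std::future`.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for a session with an asynchronous shutdown call;
// this is not the orc-rt Session class.
class ToySession {
public:
  ~ToySession() {
    if (Worker.joinable())
      Worker.join();
  }

  // Asynchronous shutdown: tears down on a worker thread, then runs
  // OnComplete. This sketch assumes it is called at most once.
  void shutdown(std::function<void()> OnComplete) {
    Worker = std::thread([this, OnComplete = std::move(OnComplete)]() {
      ShutDown = true; // placeholder for the real teardown work
      OnComplete();
    });
  }

  // Blocking wrapper: the completion callback fulfils a promise and the
  // caller waits on the matching future, so no mutex/condition_variable
  // pair is needed. The promise is held by shared_ptr because
  // std::function requires a copyable callable.
  void waitForShutdown() {
    auto Done = std::make_shared<std::promise<void>>();
    std::future<void> F = Done->get_future();
    shutdown([Done]() { Done->set_value(); });
    F.wait();
  }

  bool isShutDown() const { return ShutDown; }

private:
  std::atomic<bool> ShutDown{false};
  std::thread Worker;
};
```

Compared with a condition variable, the future carries the "done" state itself, so there is no separate flag, mutex, or spurious-wakeup loop to get right.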
[mlir][Python] fix liveContextMap under free-threading after #178529 (#179163)
#178529 introduced a small bug under free-threading by bumping a
reference count (or something like that) when accessing the operand list
passed to `build_generic`. This PR fixes that.
[VPlan] Split out EVL exit cond transform from canonicalizeEVLLoops. NFC (#178181)
This is split out from #177114.
In order to make canonicalizeEVLLoops a generic "convert to variable
stepping" transform, move the code that changes the exit condition to a
separate transform, since not all variable-stepping loops will want to
transform the exit condition. Run it before canonicalizeEVLLoops, while
VPEVLBasedIVPHIRecipe has not yet been expanded.
Also relax the assertion for VPInstruction::ExplicitVectorLength so that
it bails out instead, since VPEVLBasedIVPHIRecipe will eventually be used
by other loops that aren't EVL tail-folded.
www/elinks: Update to 0.19.0
Use gettext-runtime and localbase:ldflags.
Add TEST_TARGET and remove unneeded TEST_USES.
Add LD_FLAGS.
Pet portlint/portfmt.
Changelog: https://github.com/rkd77/elinks/releases/tag/v0.19.0
PR: 291966
Approved by: submitter is maintainer
[IR] Add `fpmath` to keep list of dropUBImplyingAttrsAndMetadata (#179019)
`fpmath` is precision metadata rather than UB-implying metadata. This
prevents `fpmath` from being dropped in InstCombine's FoldOpIntoSelect.
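The keep-list idea can be modeled with a small C++ sketch. It is a deliberate simplification: metadata is a hypothetical map from kind name to payload string (LLVM actually uses numeric kind IDs and `MDNode` payloads), and the helper `dropMetadataExcept` is invented for illustration. It drops every kind not on the keep list, whereas the real `dropUBImplyingAttrsAndMetadata` only targets UB-implying metadata. Under this model, listing `fpmath` is what lets it survive.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: an instruction's metadata as kind-name -> payload.
// LLVM itself uses numeric metadata kind IDs and MDNode payloads.
using MetadataMap = std::map<std::string, std::string>;

// Drops every attached metadata kind that is not on the keep list.
void dropMetadataExcept(MetadataMap &MD, const std::vector<std::string> &Keep) {
  for (auto It = MD.begin(); It != MD.end();) {
    bool Kept = std::find(Keep.begin(), Keep.end(), It->first) != Keep.end();
    It = Kept ? std::next(It) : MD.erase(It);
  }
}
```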
Set rematerialized MIs' reg operands to sentinel reg
Also removes a number of `const` qualifiers on class members that
prevented std::sort from compiling on some configurations.
Re-apply "[AMDGPU][Scheduler] Scoring system for rematerializations (#175050)"
This re-applies commit f21e3593371c049380f056a539a1601a843df558, along
with the compile fix introduced in
8ab79377740789f6a34fc6f04ee321a39ab73724 before the initial patch was
reverted, and with fixes for the previously observed assert failure.
We were hitting the assert in the HIP Blender due to a combination of
two issues that could happen when rematerializations are being rolled
back.
1. Small changes in slot indices (while preserving instruction order)
compared to the pre-re-scheduling state mean that we have to
re-compute live ranges for all register operands of rolled-back
rematerializations. This was not being done before.
2. Re-scheduling can move registers that were rematerialized at
arbitrary positions in their respective regions while their opcode
is set to DBG_VALUE, even before their read operands are defined.
This makes re-scheduling reverts mandatory before rolling back
[4 lines not shown]
[AMDGPU][Scheduler] Revert all regions when remat fails to increase occ. (#177205)
When the rematerialization stage fails to increase occupancy in all
regions, the current implementation only reverts the effect of
re-scheduling in regions in which the increased occupancy target could
not be achieved. However, given that re-scheduling with a higher
occupancy target puts more pressure on the scheduler to achieve lower
maximum RP at the cost of potentially lower ILP as well, region
schedules made with higher occupancy targets are generally less
desirable if the whole function is not able to meet that target.
Therefore, if at least one region cannot reach its target, it makes
sense to revert re-scheduling in all affected regions to go back to a
schedule that was made with a lower occupancy target.
This implements such logic for the rematerialization stage, and adds a
test to showcase that re-scheduling is indeed interrupted/reverted as
soon as a re-scheduled region that does not meet the increased target
occupancy is encountered.
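The revert policy described above (stop at the first re-scheduled region that misses the raised occupancy target, then revert every region re-scheduled so far) can be sketched in C++ as follows. `Region`, `rescheduleAllOrRevert`, and the `Reschedule` callback are hypothetical simplifications, not the AMDGPU scheduler's actual classes, and "reverting" is reduced to setting a flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-region record; not the AMDGPU scheduler's data structures.
struct Region {
  unsigned AchievedOccupancy = 0; // occupancy reached after re-scheduling
  bool Rescheduled = false;       // whether the region was re-scheduled
  bool Reverted = false;          // whether its original schedule was restored
};

// Re-schedules regions in order toward TargetOcc. As soon as one region
// misses the target, every region re-scheduled so far (including the
// failing one) is reverted and the remaining regions are left untouched.
// Returns true if all regions met the target.
template <typename RescheduleFn>
bool rescheduleAllOrRevert(std::vector<Region> &Regions, unsigned TargetOcc,
                           RescheduleFn Reschedule) {
  for (std::size_t I = 0; I < Regions.size(); ++I) {
    Regions[I].AchievedOccupancy = Reschedule(I);
    Regions[I].Rescheduled = true;
    if (Regions[I].AchievedOccupancy < TargetOcc) {
      for (std::size_t J = 0; J <= I; ++J)
        Regions[J].Reverted = true;
      return false;
    }
  }
  return true;
}
```

Stopping early avoids spending compile time re-scheduling later regions whose higher-occupancy schedules would be thrown away anyway.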
[4 lines not shown]
[clang-tidy] Speed up `modernize-use-nullptr` (#178829)
As noted in [this
comment](https://github.com/llvm/llvm-project/pull/178149#discussion_r2732896149),
it appears that registering one `anyOf(a, b, ...)` matcher is generally
slower than registering `a, b, ...` all individually. Applying that
knowledge to this check gives us an easy 3x speedup:
```txt
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
Status quo: 0.3281 ( 6.1%) 0.0469 ( 5.2%) 0.3750 ( 6.0%) 0.3491 ( 5.5%) modernize-use-nullptr
With this change: 0.0938 ( 1.8%) 0.0156 ( 1.8%) 0.1094 ( 1.8%) 0.1260 ( 2.1%) modernize-use-nullptr
```
I'm not exactly sure *why* this works, but it seems pretty consistent.
I've seen a similar result trying this with `bugprone-infinite-loop`.