[NVPTX] Add commutativity to SETP instructions to enable MachineCSE of inverted predicates
Inverted predicates can be used freely in PTX. If we can invert a
predicate and CSE the generating instruction, we can avoid computing
the inverse.
Teach the NVPTX commuteInstructionImpl that SETP instructions can be
inverted, allowing CSE with a previous SETP that matches the inverted
form. This also inverts the branch users of the predicate to maintain
correctness.
Currently, SETP inversion is only allowed if all users are branches.
Future work can extend this to sel and not instructions.
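The idea above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the NVPTX implementation: the tuple-based instruction encoding, the `INVERSE` table, and the function name `cse_inverted_setp` are all hypothetical, the code is straight-line and SSA-like, and only integer comparisons are modelled (so that `lt` and `ge` are exact inverses).

```python
# Toy model (hypothetical encoding, not LLVM MIR):
#   ("setp", dest_pred, cmp, lhs, rhs)
#   ("bra", pred, negated, target)   # negated=True models "@!%p bra target"
#   any other tuple is treated as a non-branch instruction
# Assumption: integer comparisons only, so each comparison has an exact inverse.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def cse_inverted_setp(instrs):
    """Drop a setp whose inverse was already computed, if all its users are branches."""
    seen = {}     # (cmp, lhs, rhs) -> predicate register that holds the result
    rewrite = {}  # eliminated predicate -> earlier predicate holding its inverse
    out = []
    for ins in instrs:
        if ins[0] == "setp":
            _, dest, cmp, lhs, rhs = ins
            earlier = seen.get((INVERSE[cmp], lhs, rhs))
            users = [u for u in instrs if u[0] != "setp" and dest in u[1:]]
            if earlier is not None and all(u[0] == "bra" for u in users):
                rewrite[dest] = earlier  # reuse the earlier predicate, inverted
                continue
            seen[(cmp, lhs, rhs)] = dest
            out.append(ins)
        elif ins[0] == "bra" and ins[1] in rewrite:
            _, pred, negated, target = ins
            # Flip the branch's negation so behavior is unchanged.
            out.append(("bra", rewrite[pred], not negated, target))
        else:
            out.append(ins)
    return out
```

In this model, a second `setp` that computes `ge a, b` after an earlier `lt a, b` is removed and its branch is rewritten to test the negated earlier predicate. If the second predicate also feeds a non-branch user, the `setp` is kept, mirroring the current restriction.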
Made-with: Cursor
[clang-tidy][NFC] Fix list.rst and improve alias detection of `add_new_check.py` (#192228)
Follow up of https://github.com/llvm/llvm-project/pull/192224.
This commit does two things:
- Replace the original alias detection based on `:http-equiv` (we may
remove these completely in the future) with a method of directly
matching the documentation section.
- Update list.rst.
---------
Co-authored-by: Victor Chernyakin <chernyakin.victor.j at outlook.com>
[NFC] [clangd] [C++20] [Modules] Rename and move scanningProjectModules (#193128)
I am going to add more functionality to ProjectModules, and the current
structure and the file name scanningProjectModules may become confusing.
This NFC patch renames and moves it.
[AMDGPU] Fixed verifier crash because of multiple live range components. (#190719)
In the Rewrite AGPR-Copy-MFMA pass, after replacing spill instructions, the
replacement register may have multiple live range components when the
spill slot was stored to more than once. The verifier crashes with a bad
machine code error. This patch fixes the problem by splitting a live
range but assigning the same physical register in this scenario. A new
test has been added that verifies the absence of this verifier error.
Assisted-by: Claude Opus
[BOLT] Fix stream position before appendPadding in writeEHFrameHeader
When writeEHFrameHeader needs to allocate new space for .eh_frame_hdr
(because the old section is too small), it calls appendPadding to align
NextAvailableAddress. appendPadding writes zero bytes at the current
stream position, but after the section write loop in rewriteFile the
stream is positioned at the end of the last section written in
BinarySection::operator< order — not at the file offset corresponding
to NextAvailableAddress.
In the common case (single loadObject call) the write order matches file
offset order, so the stream happens to be in the right place. But when
a runtime library adds sections via additional loadObject calls, the
operator< iteration order (code-before-data) can diverge from file
offset order: a runtime library code section may have a higher file
offset than a runtime library data section that comes after it in the
write loop. The stream then ends at a lower offset than expected, and
appendPadding's zeros overwrite the beginning of the code section.
Fix by seeking to the correct file offset before calling appendPadding.
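The failure mode above can be reproduced in a small toy model. This is a sketch only: it uses Python's `io.BytesIO` in place of BOLT's output stream, treats file offsets and addresses as the same number for simplicity, and the names `write_sections`, `append_padding`, and `emit` are hypothetical stand-ins rather than BOLT's API.

```python
import io

DATA = b"DATA!!"    # toy data section: file offsets [0, 6)
CODE = b"CODECODE"  # toy code section: file offsets [6, 14)


def write_sections(stream):
    # Write order is code-before-data, which differs from file offset order here.
    for offset, payload in [(6, CODE), (0, DATA)]:
        stream.seek(offset)
        stream.write(payload)
    # The stream is now at the end of the data section (offset 6), not at 14.


def append_padding(stream, next_available, alignment):
    # Writes zero bytes at the CURRENT stream position, as the commit describes.
    padding = (-next_available) % alignment
    stream.write(b"\x00" * padding)
    return next_available + padding


def emit(seek_first):
    stream = io.BytesIO()
    write_sections(stream)
    next_available = 14  # first free offset after both sections
    if seek_first:
        stream.seek(next_available)  # the fix: position the stream explicitly
    aligned = append_padding(stream, next_available, 8)
    return stream.getvalue(), aligned
```

Without the seek, the two padding bytes land at offset 6 and clobber the start of the toy code section. With the seek, they are appended at offset 14 and the sections stay intact.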
[test][LowerTypeTests] Re-generate jump table tests with --check-globals (#192734)
Debug information will be updated in
https://github.com/llvm/llvm-project/pull/192736,
so we want to track the difference.
Prevent undefined behavior caused by combination of branch and load delay slots on MIPS1 (#185427)
Under certain conditions, the LLVM `MipsDelaySlotFiller` fills a branch
delay slot with an instruction requiring a load delay slot. However, the
`MipsDelaySlotFiller` does not check the filled instruction for hazards,
which leads to code like this:
```asm
beqz $1, $BB0_5
lbu $2, %lo(_RNvCs5jWYnRsDZoD_3app13CONTROLLERS_A)($2)
# --- Some other instructions
$BB0_5:
andi $1, $2, 1
```
`lbu` got moved into the branch delay slot but has a load delay slot -
so when jumping to `$BB0_5` the value for `$2` will not be ready, which
leads to undefined behavior.
This PR proposes declaring instructions with a load delay slot as
hazardous for the branch delay slot, but only on `MIPS1`. This will prevent
[21 lines not shown]
[ObjC] Fix missing ptrauth signing of isa in constant ObjC literals (#191091)
154d2267b897 added support for emitting ObjC number, array, and
dictionary literals as constants, but did not sign the class pointer
fields in NSConstantIntegerNumber, NSConstantFloatNumber,
NSConstantDoubleNumber, NSConstantArray, and NSConstantDictionary
structs with the ObjCIsaPointers ptrauth schema on arm64e. Fix this by
using addSignedPointer instead of add when emitting those fields.
rdar://174359070