[RISCV] Use NoV0 register classes for masked `VPseudoBinaryM` (#175706)
There are two constraints:
1. The same register can't have two EEWs. `V0` is already the mask
register, so other register source operands can't be `V0`.
2. The destination and source registers can't overlap. We have added an
`@earlyclobber` constraint, so we won't allocate `V0` to the destination.
[BOLT][BTI] Patch LLD-generated PLTs to contain BTI landing pad
This patch adds the patchPLTEntryForBTI helper to enable patching PLT
entries generated by LLD.
Context:
To keep BTI consistent, targets of stubs inserted in LongJmp need to be
patched. As PLTs are neither optimized nor emitted by BOLT, this patch adds
a helper for patching them in their original location.
For PLTs generated by LLD, this is safe because LLD inserts extra nops into
PLTs which don't already contain a BTI.
PLT entry before patching:
adrp x16, Page(&(.got.plt[n]))
ldr x17, [x16, Offset(&(.got.plt[n]))]
add x16, x16, Offset(&(.got.plt[n]))
[24 lines not shown]
[BOLT][BTI] Disassemble PLT entries when processing BTI binaries (#169663)
PLT entries are PseudoFunctions, and are not disassembled or emitted.
For BTI, we need to check the first MCInst of each PLT entry to see
whether calling it indirectly is safe.
This patch disassembles PLTs for binaries using BTI, while not changing
the behaviour for binaries without BTI.
The PLTs are only disassembled, not emitted.
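The first-instruction check described above can be sketched as follows. This is a minimal illustration, not BOLT's code: `acceptsIndirectCall` and `demoBtiCheck` are hypothetical names, the check works on raw A64 instruction words rather than MCInsts, and it assumes that only `bti c` and `bti jc` count as valid landing pads for an indirect call (other implicit landing pads such as PACIASP are ignored here).

```cpp
#include <cassert>
#include <cstdint>

// A64 HINT-space encodings: 0xd503201f | (hint_number << 5).
constexpr uint32_t kNop   = 0xd503201f; // hint #0
constexpr uint32_t kBti   = 0xd503241f; // hint #32, "bti" with no target
constexpr uint32_t kBtiC  = 0xd503245f; // hint #34, "bti c"
constexpr uint32_t kBtiJ  = 0xd503249f; // hint #36, "bti j"
constexpr uint32_t kBtiJC = 0xd50324df; // hint #38, "bti jc"

// Hypothetical helper (an assumption, not a BOLT API): returns true if an
// entry whose first instruction word is FirstInsn accepts an indirect call.
constexpr bool acceptsIndirectCall(uint32_t FirstInsn) {
  return FirstInsn == kBtiC || FirstInsn == kBtiJC;
}

// Small usage example.
inline void demoBtiCheck() {
  assert(acceptsIndirectCall(kBtiC));
  assert(!acceptsIndirectCall(kNop));  // a nop is not a landing pad
  assert(!acceptsIndirectCall(kBtiJ)); // "bti j" only accepts jumps
}
```

An entry that starts directly with `adrp x16, ...` fails this check, which is the case where patching in a landing pad would be needed.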
---------
Co-authored-by: Paschalis Mpeis <paschalis.mpeis at arm.com>
powerpc: fix release image building for Apple partitions
awk changed somewhere between 14 and 15 and stopped accepting
hexadecimal numbers as input; it now always returns 0.
This results in a very badly written Apple boot block.
So just remove the awk call and do the math in shell.
PR: kern/292341
Differential Revision: https://reviews.freebsd.org/D54639
Reviewed by: imp
MFC after: 1 week
(cherry picked from commit 7afa03963c448a14b1735a10eaf84941b0b74862)
[JITLink][CompactUnwind] Expand CompactUnwindTraits struct comment. (#176315)
Adds notes on the properties and methods that must be implemented by
traits classes derived from CompactUnwindTraits.
[JITLink][CompactUnwind] Express mergeability via +ve predicate. NFCI. (#176313)
Compact unwind record merging is an optimization. Using a can-be-merged
predicate is preferable to a "cannot-be-merged" predicate as the former
encourages conservatively correct implementations: "what is safe to
merge" is easier to reason about than "what is not safe to merge".
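A minimal sketch of the idea, under stated assumptions: `Mode`, `Record`, `canBeMerged`, and `mergeAdjacent` are hypothetical names for illustration and are not JITLink's types or the CompactUnwindTraits interface. Which modes are treated as mergeable is also an assumption. The point is only that the positive predicate whitelists known-safe cases and defaults to "do not merge".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding kinds; the real set is target-specific.
enum class Mode { Frame, Frameless, Dwarf, Unknown };

// Hypothetical record type for illustration only.
struct Record {
  uint64_t Start;    // start address of the covered range
  uint64_t Size;     // size of the covered range in bytes
  uint32_t Encoding; // opaque unwind encoding
  Mode Kind;         // how the encoding is interpreted
};

// Positive predicate: only explicitly listed cases are mergeable, so a
// new or unknown mode is conservatively left unmerged.
inline bool canBeMerged(const Record &R) {
  switch (R.Kind) {
  case Mode::Frame:
  case Mode::Frameless:
    return true; // assumed safe for this sketch
  default:
    return false;
  }
}

// Merge contiguous records with identical encodings when both are mergeable.
inline std::vector<Record> mergeAdjacent(const std::vector<Record> &In) {
  std::vector<Record> Out;
  for (const Record &R : In) {
    if (!Out.empty()) {
      Record &Prev = Out.back();
      bool Contiguous = Prev.Start + Prev.Size == R.Start;
      if (Contiguous && Prev.Encoding == R.Encoding && canBeMerged(Prev) &&
          canBeMerged(R)) {
        Prev.Size += R.Size; // extend the previous record over R
        continue;
      }
    }
    Out.push_back(R);
  }
  return Out;
}

// Small usage example.
inline void demoMerge() {
  std::vector<Record> In = {{0, 4, 1, Mode::Frame}, {4, 4, 1, Mode::Frame}};
  assert(mergeAdjacent(In).size() == 1);
}
```

With a negative predicate, forgetting to list a mode would silently allow an unsafe merge; here the same omission only costs a missed optimization.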
[LV] Prevent `extract-lane` generate unused IRs with single vector operand. (#172798)
When `extract-lane` has only a single vector operand, we can simplify
it to `extractelement`.
This patch makes `extract-lane` generate a simple `extractelement` when it
has only a single vector operand, which prevents unused IR from being generated.
This patch is mostly NFC; the unused IR should be removed by subsequent
IR passes.
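A toy model of the simplification, assuming the lane indexes the concatenation of equally sized vector operands. The function names are hypothetical and this is not the VPlan code generator; it only shows why the operand-selection step is redundant with one operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// General form: locate the operand first, then the element within it.
// Assumes Ops is non-empty, all operands share one length, and Lane is
// in range.
inline int extractLaneGeneral(std::size_t Lane, const std::vector<Vec> &Ops) {
  std::size_t VF = Ops.front().size();
  return Ops[Lane / VF][Lane % VF];
}

// Single-operand form: a plain element extract, with no operand selection.
inline int extractLaneSingle(std::size_t Lane, const Vec &Op) {
  return Op[Lane];
}

// Small usage example: both forms agree when there is one operand.
inline void demoExtractLane() {
  Vec A = {10, 11, 12, 13};
  assert(extractLaneGeneral(2, std::vector<Vec>{A}) == extractLaneSingle(2, A));
}
```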
[MLIR][NVVM][Tests] Re-enable matmul.py tests (#175728)
This patch re-enables the matmul.py tests:
* Fix gpu.wait usages
* Fix gpu.launchOp usage
* Fix format-string for gpu.printf
* Fix verification failure by removing the block[0] append.
This is now done by the Python script's init.
* Fix the runtime error by adding the missing initialize() call during
JIT.
* Add the missing waitGroup(0) for _ws implementation.
This was mistakenly removed in PR #113713. Without this fix,
I see timing issues and the _ws tests with stage>1 randomly show output
mismatch.
With all these fixes, the test compiles and
executes successfully on an sm90a machine.
(locally verified for 1K iterations)
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
[RISCV] Store original LocVT/LocInfo in PendingLocs instead of XLenVT/Indirect. NFC (#176193)
Convert to XLenVT/Indirect when we use the PendingLocs. This allows the
2*XLen case to use the original LocVT and not the overridden XLenVT.
Hoping this reduces some of the changes from #176093.
[libc][math] Refactor dfmal to Header Only. (#175359)
Builds correctly with both Clang and GCC 12.2.
Since `fma` is not `constexpr`, `dfmal` cannot be declared `constexpr`
either.
Closes #175316.