[BOLT][Perf2bolt] Add support to generate pre-parsed perf data (#171144)
Adding a generator to Perf2bolt is the initial step toward supporting
large end-to-end tests for Arm SPE. This functionality provides a unified format for
pre-parsed profiles that Perf2bolt is able to consume.
Why does the test need a textual-format SPE profile?
* Collecting an Arm SPE profile with Linux Perf requires
an Arm developer device with SPE support.
* Decoding SPE data also requires a suitable version of
Linux Perf.
* The minimum required version of Linux Perf is v6.15.
To bypass these technical difficulties, it is easier to provide
a pre-generated textual profile.
The generator relies on the aggregator work to spawn the required
perf-script jobs based on the aggregation type, and merges the
[12 lines not shown]
[AMDGPU] Use s_cvt_i32/u32_f32 instructions for saturated uniform conversions (#187711)
We attempt to select `s_cvt_i32/u32_f32` where possible, with some
considerations:
* For `f64` default to `v_` instructions as there is no support for
`f64` in SALU.
* For `f16` to `i16`, select `v_cvt_i16/u16_f16`, which is consistent with
the behavior of non-saturating conversions. However, we could emit
`s_cvt_f32_f16` followed by `s_cvt_i32/u32_f32` to keep the computation
in SALU, as SALU does not have `s_cvt_i16_f16`. Happy to look into it if
beneficial.
* When it comes to clamping, ISel turns the min and max sequence into
`v_med3` with a `v0` destination, whereas GlobalISel keeps min and max as
`s_min` and `s_max` and then moves the result into `v0`, as lit tests
expect the return value to be in `v0` in both cases. This is unrelated
to this change, but I thought it was worth highlighting.
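For readers unfamiliar with saturating conversions, here is a minimal reference sketch of the semantics. It assumes that "saturated conversion" refers to the `llvm.fptosi.sat` / `llvm.fptoui.sat` behavior (NaN maps to 0, out-of-range values clamp, everything else truncates toward zero). The helper names are hypothetical and this is not the AMDGPU lowering itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference model of a saturating f32 -> i32 conversion.
inline int32_t fptosi_sat_i32(float x) {
  if (std::isnan(x))
    return 0; // NaN saturates to zero
  // 2^31 is exactly representable as a float; INT32_MAX is not.
  if (x >= 2147483648.0f)
    return std::numeric_limits<int32_t>::max();
  if (x <= -2147483648.0f)
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(x); // in range: truncate toward zero
}

// Hypothetical reference model of a saturating f32 -> u32 conversion.
inline uint32_t fptoui_sat_u32(float x) {
  if (std::isnan(x) || x <= 0.0f)
    return 0; // NaN and negative inputs clamp to zero
  // 2^32 is exactly representable as a float; UINT32_MAX is not.
  if (x >= 4294967296.0f)
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x); // in range: truncate toward zero
}
```

The explicit range checks matter because a plain `static_cast` from an out-of-range float to an integer is undefined behavior in C++.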
[clang][bytecode] Add source info to jump ops (#188003)
The attached test case otherwise results in a function with one jump op
but no source info at all.
[GlobalISel] Add `widenScalarFor()` function (#187731)
The function is mentioned in `Legalizer.rst` but has been missing. This
also fixes the asymmetry between `narrowScalarXXX()`, which has both
`narrowScalarFor()` and `narrowScalarIf()`, and `widenScalarXXX()`, which
only had `widenScalarIf()`.
[AArch64] Combine cases with the same code in `expandMOVImm` (NFC) (#187843)
Combine cases for `ORRWri`, `ORRXri`, `ANDXri` and `EORXri` in
`AArch64ExpandPseudoImpl::expandMOVImm`, because these cases are handled
with exactly the same code.
[AArch64] Fix _sys implementation and MRS/MSR Sema checks (#187290)
This patch fixes lowering of the _sys builtin, which used to lower into an
invalid MSR S1... instruction. This was fixed by adding a new sys LLVM
intrinsic and proper lowering into the sys instruction and its aliases.
I also fixed the Sema check for the _sys, _ReadStatusRegister and
_WriteStatusRegister builtins so they correctly catch invalid
use cases.
libclc: Implement remainder with remquo (#187999)
This fixes conformance failures for double, and for float
without -cl-denorms-are-zero. Optimizations are
able to eliminate the unused quo handling, which avoids
duplicating most of the code.
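The idea of expressing remainder through remquo can be sketched with the standard C++ math library. This is only an illustration using `std::remquo` from `<cmath>`, not the libclc implementation; the helper name `remainder_via_remquo` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: compute the IEEE remainder by calling remquo and
// discarding the quotient bits. Once inlined, an optimizer is free to drop
// the unused quotient computation.
inline double remainder_via_remquo(double x, double y) {
  int quo = 0; // receives the low bits of the quotient; unused here
  double rem = std::remquo(x, y, &quo);
  (void)quo;
  return rem;
}
```

For example, `remainder_via_remquo(5.0, 3.0)` yields `-1.0`, because 5/3 rounds to the nearest integer 2 and 5 - 2*3 = -1, the same result `std::remainder` gives.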
libclc: Update remquo (#187998)
This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.
This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:
- Templatification, which almost, but not quite, enables
vectorization due to the outer branch and loop.
- Merging of the 3 types into one shared code path, instead of
duplicating per type with 3 different functions implemented together.
There are only some slight differences for the half case, which mostly
evaluates as float.
- Splitting out of the is_odd tracking, instead of deriving it from the
accumulated quotient. This costs an extra register, but saves several
[6 lines not shown]
[mlir][LLVM] Add more `llvm.intr.experimental.constrained.*` ops (#187948)
Add additional "constrained" intrinsic ops. A rounding mode can be
specified for these ops.
Assisted by: claude-4.6-opus-high
[clang][bytecode] Create fewer pointers in __builtin_nan() (#187990)
Check the elements directly for initialization state and keep track of
whether we found a NUL byte.
libclc: Update remainder
Previously this was failing conformance without -cl-denorms-are-zero
in the float case, and always failing in the double case.
[AsmPrinter] Add generic support for verifying instruction sizes (#187703)
Many backends rely on TII reporting correct instruction sizes for MIR
level branch relaxation passes. Reporting too small a size can result in
MC fixup failures (or silent miscompiles for unvalidated fixups).
Some time ago I added validation to the PPC asm printer to verify that
the TII instruction size matches the actually emitted size. This was
very helpful to systematically fix all incorrectly reported instruction
sizes.
However, the same problem also exists in lots of other backends, so this
moves the validation into AsmPrinter, controlled by a new
getInstSizeVerifyMode() hook in TII, which is disabled by default.
The intention here is to gradually enable this validation for more
backends (which requires fixing them first).
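The kind of check described above can be illustrated with a toy model: compare the size reported up front with the number of bytes actually emitted. `ToyInst` and `findSizeMismatches` are hypothetical names and do not correspond to LLVM's AsmPrinter or TII interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct ToyInst {
  std::string Name;
  std::size_t ReportedSize;         // size claimed up front (the "TII" answer)
  std::vector<unsigned char> Bytes; // bytes actually produced at emission
};

// Returns the names of instructions whose reported size differs from the
// number of bytes emitted. A real verifier would report an error instead.
inline std::vector<std::string>
findSizeMismatches(const std::vector<ToyInst> &Insts) {
  std::vector<std::string> Bad;
  for (const ToyInst &I : Insts)
    if (I.ReportedSize != I.Bytes.size())
      Bad.push_back(I.Name);
  return Bad;
}
```

In this model, an instruction that claims 4 bytes but emits 8 would be flagged. That under-reporting case is the one that can break branch relaxation.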
[AMDGPU] Update test to match comment. NFC (#187273)
The comment says there shouldn't be any free registers, so update the
inline assembly to clobber all non-preserved SGPRs.
[clang-tidy] Correctly ignore function templates in derived-method-shadowing-base-method (#185741) (#185875)
This commit fixes a false positive in the
derived-method-shadowing-base-method clang-tidy check, as described in
[ticket 185741](https://github.com/llvm/llvm-project/issues/185741).
Fixes #185741
---------
Co-authored-by: Tom James <tom.james at siemens.com>
Co-authored-by: Zeyi Xu <mitchell.xu2 at gmail.com>
[BOLT] Remove some unused code (NFC) (#183880)
Remove some unused code in BOLT:
- `RewriteInstance::linkRuntime` is declared but not defined
- `BranchContext` typedef is never used
- `FuncBranchData::getBranch` is defined but never used
- `FuncBranchData::getDirectCallBranch` is defined but never used
[X86] Emit user-friendly error for x86_fp80 with x87 disabled on x86_64 (#183932)
When compiling a function that uses `x86_fp80` on x86_64 with x87 disabled (`-mattr=-x87`), LLVM crashes with a cryptic internal error.
Fixes #182450