Enable pass instrumentation to signal failures. (#163126)
Enables adding instrumentation to the pass manager that can track and flag
invariants. This is useful where one has tighter requirements than the
general dialects impose, or requirements that apply to one phase of
conversion but not elsewhere.
It would also allow verification to become a regular instrumentation, I
believe, but that is a non-goal, as verification is a first-class concept
and the baseline for ops and passes.
This would have enabled some of the requirements discussed in
https://discourse.llvm.org/t/pre-verification-logic-before-running-conversion-pass-in-mlir/88318/10
[clang][Driver] SPIRVAMDToolChain must not require device libs.
Prior to this change, the toolchain looked for device libs and failed.
This is fixed by not looking for device libs for SPIR-V.
[sanitizer_common][test-only] Remove xfail for darwin ubsan on dedup_token_length_test (#171812)
This test is currently XPASSing on the iossim CI.
rdar://166219043
InstCombine: Fold ldexp with constant exponent to fmul
If the operation can be represented as an fmul, prefer that as the
canonical form. More optimizations understand fmul, and it allows
contraction to fma.
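As a rough illustration of why the fold is value-preserving, the sketch below checks with Python doubles that `ldexp(x, n)` matches `x * 2**n` whenever `2**n` is itself a normal double. The helper names `fold_ldexp_to_fmul` and `ldexp_via_fmul`, and the exact range check, are assumptions made for illustration; they are not the InstCombine implementation.

```python
import math
import sys

def fold_ldexp_to_fmul(n):
    """Return the multiplier 2**n, or None if 2**n is not a normal double.

    Hypothetical helper: the legality check in the real transform may differ.
    """
    # Normal doubles cover exponents min_exp - 1 .. max_exp - 1 (-1022 .. 1023).
    if sys.float_info.min_exp - 1 <= n <= sys.float_info.max_exp - 1:
        return math.ldexp(1.0, n)
    return None

def ldexp_via_fmul(x, n):
    """Compute ldexp(x, n) with a multiply when an exact constant exists."""
    scale = fold_ldexp_to_fmul(n)
    if scale is None:
        return math.ldexp(x, n)  # no exact constant: keep the ldexp
    return x * scale
```

Both forms round the exact value of `x * 2**n` once, which is why they agree when the constant is exact.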
ValueTracking: Teach computeKnownFPClass that multiply can avoid denormals
Multiplying by a large constant can be used to scale denormal inputs into
the normal range. This pattern appears frequently in math function library
implementations to make use of hardware instructions that do not support
denormals. We already handle this case for ldexp, but ldexp by a constant
is now canonicalized to an fmul.
The test cases are mostly the existing nofpclass test for ldexp,
run through the new instcombine to replace ldexp with fmul.
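The scaling pattern described above can be sketched with Python doubles. The constant `2**52` and the helper names are assumptions chosen for illustration; math libraries may use other scale factors, and this is not the computeKnownFPClass logic itself.

```python
import sys

# Smallest positive normal double, 2**-1022.
MIN_NORMAL = sys.float_info.min

# Assumed scale factor: 2**52 lifts the smallest positive subnormal
# (2**-1074) exactly to the smallest normal (2**-1022).
SCALE = 2.0 ** 52

def is_subnormal(x):
    """True for nonzero values below the normal range in magnitude."""
    return x != 0.0 and abs(x) < MIN_NORMAL

def scale_out_of_denormal_range(x):
    """Multiply by a large power of two so the result is never subnormal."""
    return x * SCALE
```

Because every nonzero subnormal has magnitude of at least `2**-1074`, the product with `2**52` is at least `2**-1022` in magnitude, so the result can be treated as not denormal.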
[X86] Allow handling of i128/256/512 AND/OR/XOR bitlogic on the FPU (#171616)
If the scalar integer sources are freely transferable to the FPU, then
perform the bitlogic op as an SSE/AVX operation.
Uses the mayFoldIntoVector helper added in #171589.
[MLIR][SCF] Verify number of regions in scf.reduce (#171450)
This patch adds `ReduceOp::verifyRegions` to ensure that the number of
reduction regions equals the number of operands (`getReductions().size()
== getOperands().size()`).
Additionally, `ParallelOp::verify` is updated to gracefully handle cases
where the number of reduce operands differs from the initial values,
preventing verification logic crashes and relying on `ReduceOp` to
report structural inconsistencies.
Fixes: #118768
[AtomicExpand] Add bitcasts when expanding load atomic vector
AtomicExpand fails for aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
bitcasts so that the call can be lowered. It also adds support for
128-bit lowering in tablegen to support SSE/AVX.
[lldb][test] Fix toolchain-msvc.test for native ARM64 MSVC environment (#171797)
This patch fixes toolchain-msvc.test on Windows ARM64 hosts running
in a native ARM64 environment set up via vcvarsarm64.bat. Our lab buildbot
recently switched from the cross vcvarsamd64_arm64.bat environment to the
native vcvarsarm64.bat environment. This patch updates the FileCheck
patterns to also allow HostARM64 and arm64 PATH entries.
Changes:
-> Extend host regex to match HostARM64 (case-insensitive)
-> Allow arm64 in PATH tail.
-> Apply same fix in both 32-bit and 64-bit sections.
[JITLink] Add TLS support for SystemZ (#171559)
This patch adds TLS support for SystemZ on top of the orc-runtime support.
The orc-runtime support was split out into a separate PR, #171062, from the
earlier TLS support PR
[#170706](https://github.com/llvm/llvm-project/pull/170706).
See the conversation in
[#170706](https://github.com/llvm/llvm-project/pull/170706)
---------
Co-authored-by: anoopkg6 <anoopkg6 at github.com>
IR: Stop requiring nsz to reassociate fmul (#171726)
nsz can only change the behavior of the sign bit.
The sign bit of an fmul result can be computed as the xor of the
operand sign bits, and xor is associative. DAGCombiner already
reassociates multiplies by two constants without nsz.
Fixes #64967
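The sign-bit argument can be checked numerically. The sketch below, with hypothetical helper names, confirms for a handful of doubles (signed zeros included) that both association orders of a three-way multiply produce a sign bit equal to the xor of the operand sign bits. The sample values are chosen so that every product stays finite and no NaN appears; the sketch says nothing about rounding differences between the two orders.

```python
import itertools
import math

def sign_bit(x):
    """Return 1 if the sign bit of x is set (including -0.0), else 0."""
    return 1 if math.copysign(1.0, x) < 0 else 0

def signs_agree(values):
    """Check both association orders yield the xor of the operand sign bits."""
    for a, b, c in itertools.product(values, repeat=3):
        expected = sign_bit(a) ^ sign_bit(b) ^ sign_bit(c)
        if sign_bit((a * b) * c) != expected:
            return False
        if sign_bit(a * (b * c)) != expected:
            return False
    return True

# Signed zeros are included; all products stay finite.
SAMPLES = [0.0, -0.0, 1.5, -2.0, 3.0, -0.25]
```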
[BOLT][BTI] Add needed BTIs in LongJmp or refuse to optimize binary
This patch adds BTI landing pads to ShortJmp/LongJmp targets in the
LongJmp pass when optimizing BTI binaries.
BOLT does not have the ability to add BTI to all types of functions.
This patch aims to insert the landing pad where possible, and to emit an
error where that is not currently possible.
BOLT cannot insert BTIs into several function "types", including:
- ignored functions,
- PLT functions,
- other functions without a CFG.
Additional context:
In #161206, BOLT gained the ability to decode the .note.gnu.property
section, and warn about lack of BTI support for BOLT. However, this
warning is misleading: the emitted binary may not need extra BTI landing
[3 lines not shown]
[BOLT][BTI] Add MCPlusBuilder::insertBTI (#167329)
This function contains most of the logic for BTI:
- It takes the BasicBlock and the instruction used to jump to it.
- It then checks whether the first non-pseudo instruction is a sufficient
landing pad for the call used.
- If not, it generates the correct BTI instruction.
Also introduces the isCallCoveredByBTI helper to simplify the logic.
[MLIR][NVVM] Update PMEvent lowering to intrinsics (#171649)
This patch updates the lowering of `id`-based pmevent to
use intrinsics as well. The mask is simply (1 << event-id).
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
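The id-to-mask relationship mentioned in the PMEvent commit is a single shift. A minimal sketch follows; the helper name `pmevent_mask` is hypothetical, and the 0..15 id range is an assumption based on a 16-bit mask operand, not something stated in the commit message.

```python
def pmevent_mask(event_id):
    """Return the one-hot mask for a performance-monitor event id."""
    # Assumption: ids run from 0 to 15 so the result fits a 16-bit mask.
    if not 0 <= event_id <= 15:
        raise ValueError("event id out of assumed range 0..15")
    return 1 << event_id
```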
[mlir][scf] Add value bound for computed upper bound of forall loop (#171158)
Add an additional bound for the induction variable of scf.forall such
that:
%iv <= %lower_bound + (%trip_count - 1) * %step
Same as https://github.com/llvm/llvm-project/pull/126426, but for the
scf.forall loop.
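A small sketch of the bound, assuming a positive step and a trip count computed as the ceiling of (upper bound - lower bound) / step. The function names are illustrative only and do not correspond to the MLIR value-bounds API.

```python
def trip_count(lb, ub, step):
    """Number of iterations for a loop from lb to ub (exclusive), step > 0."""
    return max(0, -(-(ub - lb) // step))  # ceiling division

def iv_upper_bound(lb, ub, step):
    """Largest value the induction variable can take: lb + (trip_count - 1) * step."""
    return lb + (trip_count(lb, ub, step) - 1) * step

def induction_values(lb, ub, step):
    """All values the induction variable takes."""
    return list(range(lb, ub, step))
```

For `lb = 2`, `ub = 11`, `step = 4`, the trip count is 3 and the bound evaluates to 10, which is tighter than the loop's own upper bound of 11.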