[NVPTX][clang] Ensure CLZ(0) is defined on NVPTX (#185630)
CUDA semantics specify that clz(0) = bitwidth, so clang should emit clz
/ ctz intrinsics for NVPTX with zero-is-poison = false.
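
For illustration only, the intended semantics can be sketched in plain
C++ as a reference count-leading-zeros that is defined for every input.
The function name `clz32` is hypothetical and is not part of the patch;
it simply models the "zero-is-poison = false" behavior, where an input
of 0 yields the bit width.

```cpp
#include <cstdint>

// Hypothetical reference implementation (not code from the patch):
// counts leading zero bits of a 32-bit value and is defined at zero,
// where it returns the bit width (32).
inline int clz32(std::uint32_t x) {
  int n = 0;
  // Walk from the most significant bit down until a set bit is found
  // or the mask has been shifted out entirely.
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}
```

By contrast, `__builtin_clz(0)` in ordinary C/C++ has an undefined
result, which is why the zero-is-poison flag matters here.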
[AArch64] Adding FeatureFuseFCmpFCSel (#184881)
This adds a new AArch64 feature, FeatureFuseFCmpFCSel, for fusing FP
compare and FP select instructions, and enables it for recent Apple
CPUs. Instruction scheduling then makes such pairs adjacent.
[lldb][PlatformDarwin][NFC] Factor sanitization of Python module names into helper function (#185627)
I'm planning on re-using this logic for another API. This patch creates
a `SanitizedScriptingModuleName` that encapsulates the logic that checks
whether a file name would fail to be loaded by a `ScriptInterpreter`. I
called it something more generic despite it being `Python` specific at
the moment, in case the FIXME is eventually going to be addressed.
We have existing unit-tests that check this logic, so I'm relying on
that test coverage to give us confidence that this still works as
expected.
[AArch64][GlobalISel] Add G_SQDMULL node
Previously, GISel was failing to lower the sqdmulls.scalar intrinsic. This is just a variation of sqdmull, but on two 32-bit S registers.
To fix this, create a G_SQDMULL node, and lower sqdmulls.scalar to that. This node is linked to the SD patterns for sqdmull, which allow this version of the intrinsic to lower.
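
As a rough model of what the scalar form computes (a sketch of the
architectural semantics, not code from this patch), a signed saturating
doubling multiply long on two 32-bit inputs produces a 64-bit result.
The helper name `sqdmull_s32` is hypothetical, and the sketch does not
model the saturation flag that the hardware also sets.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar model: result = saturate_to_int64(2 * a * b).
inline std::int64_t sqdmull_s32(std::int32_t a, std::int32_t b) {
  constexpr std::int32_t kMin32 = std::numeric_limits<std::int32_t>::min();
  // The only overflowing case is INT32_MIN * INT32_MIN: the doubled
  // product would be 2^63, which saturates to INT64_MAX.
  if (a == kMin32 && b == kMin32)
    return std::numeric_limits<std::int64_t>::max();
  // Every other doubled product fits in 64 bits.
  return 2 * (static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b));
}
```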
[LoongArch] Try to avoid casts around logical vector ops on lasx (#163523)
On LASX, the types v4i1/v8i1/v16i1 may be legalized to
v4i32/v8i16/v16i8, which are LSX-sized registers. In most cases we
actually compare or select LASX-sized registers, and mixing the two
sizes creates horrible code.
libclc: Add fast version utility functions for div, sqrt and reciprocal
These are subtly different from the native versions, and should have
tighter requirements. They should handle the special cases correctly,
unlike the native functions from the standard.
[SPIRV] Add tests documenting incorrect lowering of load/store atomic (#185628)
This patch only adds the tests documenting the broken behavior, but does
not fix them.
[libunwind][PAC] Defang ptrauth's PC in valid CFI range abort
It turns out making the CFI check a release mode abort causes many,
if not the majority, of JITs to fail during unwinding as they do not
set up CFI sections for their generated code. As a result any JITs
that do nominally support unwinding (and catching) through their JIT
or assembly frames trip this abort.
rdar://170862047
[DA] refactor bounds inference in exactSIVtest and exactRDIVtest (NFC) (#185719)
Replaces the `SmallVector`-based approach for computing the min/max of
affine domain bounds with a `GetMaxOrMin` lambda that returns
`std::optional`, for better readability.
Previously, the code allocated a `SmallVector` to collect the valid
bounds and relied on `smax(front(), back())` to handle the
single-element case, which was easy to misread.
---------
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
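
A minimal sketch of the optional-based pattern described in the DA
refactor above, assuming plain `std::int64_t` bounds rather than the
types the pass actually uses. The alias `Bound` and the lambda
`getMaxOrMin` below are illustrative stand-ins, not the code from the
patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a bound that may be unknown.
using Bound = std::optional<std::int64_t>;

// Combines two possibly-missing bounds. If only one is known, it is
// returned unchanged; if neither is known, the result is empty.
const auto getMaxOrMin = [](Bound a, Bound b, bool isMax) -> Bound {
  if (!a)
    return b;
  if (!b)
    return a;
  return isMax ? std::max(*a, *b) : std::min(*a, *b);
};
```

Compared with collecting bounds in a container, the single-bound case
is handled explicitly instead of falling out of comparing the first
and last element of a one-element vector.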
[RISCV] Disable use of scalable vectors for VLEN=32 (#185553)
This patch prevents the loop vectorizer from choosing a scalable vector
type when the target VLEN is less than RVVBitsPerBlock.
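
The reasoning can be sketched as follows, assuming RVVBitsPerBlock is
64 and that vscale is derived as VLEN / RVVBitsPerBlock. The helper
names `vscaleFor` and `canUseScalableVectors` are hypothetical and only
illustrate the condition; they are not the functions touched by the
patch.

```cpp
// Assumed value of the block size used for RVV scalable types.
constexpr unsigned RVVBitsPerBlock = 64;

// Hypothetical helper: vscale implied by a given VLEN.
constexpr unsigned vscaleFor(unsigned vlen) { return vlen / RVVBitsPerBlock; }

// Hypothetical helper: scalable vectors only make sense when at least
// one full block fits in a vector register, i.e. vscale >= 1.
constexpr bool canUseScalableVectors(unsigned vlen) {
  return vlen >= RVVBitsPerBlock;
}

static_assert(vscaleFor(32) == 0, "VLEN=32 would give vscale 0");
static_assert(!canUseScalableVectors(32), "scalable vectors disabled");
static_assert(canUseScalableVectors(128), "VLEN=128 is fine");
```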