[Hexagon] Fix SplitVectors crash in HVX type legalization (#181377)
When LegalizeHvxResize splits a multi-step TL_EXTEND (e.g., v128i32 from
v128i8, which is i8->i32), SplitVectorOp halves both input and output
types. This creates operand types that are half the HVX vector width
(e.g., v64i8 = 512 bits on 128-byte HVX), which are not legal HVX types.
These sub-HVX intermediate types confuse the DAG type legalizer's map
tracking, causing "Unprocessed value in a map! SplitVectors" assertions
with EXPENSIVE_CHECKS or -enable-legalize-types-checking.
Fix by first expanding multi-step TL_EXTEND/TL_TRUNCATE operations into
a chain of single-step operations via ExpandHvxResizeIntoSteps before
splitting. Each single-step operation (e.g., i16->i32) can be safely
split because halving its operand type produces a legal HVX type (e.g.,
v64i16 = HVX single vector).
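The width arithmetic behind this fix can be sketched in standalone C++. This is only an illustration, not the code in ExpandHvxResizeIntoSteps: `extendSteps` and `vectorBits` are hypothetical helpers, and the sketch assumes element widths are non-zero powers of two.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not the LLVM implementation): break a widening from
// FromBits to ToBits into steps that each double the element width.
// Assumes both widths are non-zero powers of two.
std::vector<std::pair<unsigned, unsigned>> extendSteps(unsigned FromBits,
                                                       unsigned ToBits) {
  std::vector<std::pair<unsigned, unsigned>> Steps;
  if (FromBits == 0)
    return Steps; // guard against a loop that would never advance
  for (unsigned W = FromBits; W < ToBits; W *= 2)
    Steps.emplace_back(W, W * 2);
  return Steps;
}

// Total size in bits of a vector with NumElts elements of EltBits each.
unsigned vectorBits(unsigned NumElts, unsigned EltBits) {
  return NumElts * EltBits;
}
```

With these helpers, `extendSteps(8, 32)` yields the two steps i8->i16 and i16->i32. `vectorBits(64, 8)` is 512, half of a 128-byte (1024-bit) HVX vector, while `vectorBits(64, 16)` is 1024, a full single vector.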
[LangRef][ConstantTime] Add documentation for llvm.ct.select.* constant-time intrinsics (#181042)
This PR introduces and documents the llvm.ct.select.* constant-time
intrinsics, providing timing-independent selection operations for
security-sensitive code. The LangRef is updated with syntax, semantics,
supported types, and usage guidance.
Additionally, test coverage is extended with a new <8 x float> variant
(llvm.ct.select.v8f32) and corresponding X86 codegen tests to ensure
correct lowering on both x64 and x32 targets.
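For readers unfamiliar with the idea, a timing-independent selection is commonly written as a mask-based, branchless select. The sketch below is plain C++ for illustration only: `ctSelect` is a hypothetical name, the operand order (first value returned when the condition is true) is an assumption, and nothing here is taken from the LangRef text or the X86 lowering.

```cpp
#include <cstdint>

// Illustrative branchless select; not the LangRef definition of the intrinsic.
// Assumed convention: returns A when Cond is true, B otherwise.
std::uint32_t ctSelect(bool Cond, std::uint32_t A, std::uint32_t B) {
  // Mask is all ones when Cond is true and all zeros otherwise.
  std::uint32_t Mask = 0u - static_cast<std::uint32_t>(Cond);
  // Both operands are always read; the mask picks which bits survive.
  return (A & Mask) | (B & ~Mask);
}
```

A C++ compiler is free to turn this source pattern back into a branch, so the sketch carries no timing guarantee of its own.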
[AIX] Include system library paths in -print-search-dirs output (#182292)
Add `/usr/lib` and `/lib` to `-print-search-dirs` output to match GCC
behaviour and fix Meson/CMake build failures. Override
`AddFilePathLibArgs()` to prevent duplicate `-L` flags in linker
commands. This should allow build tools to construct correct `blibpath`.
---------
Co-authored-by: Tony Varghese <tony.varghese at ibm.com>
Co-authored-by: David Tenty <daltenty.dev at gmail.com>
[lldb][Darwin] Change HostInfoMacOSX mutex to a r/w mutex [NFC] (#182411)
I only need an exclusive lock when scanning a new shared cache; after
that, simultaneous read-only locks are sufficient and allow
multi-threaded access to the data store.
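The locking pattern described here can be sketched with the C++17 standard library. `CacheStore` and its members are hypothetical stand-ins, not the HostInfoMacOSX code, and the actual mutex type used in lldb may differ.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for a data store guarded by a reader/writer mutex.
class CacheStore {
public:
  // Writers (e.g. scanning a new shared cache) take the lock exclusively.
  void scan(const std::string &Name, int Value) {
    std::unique_lock<std::shared_mutex> Lock(Mutex);
    Entries[Name] = Value;
  }

  // Readers take a shared lock, so several threads can look up at once.
  std::optional<int> lookup(const std::string &Name) const {
    std::shared_lock<std::shared_mutex> Lock(Mutex);
    auto It = Entries.find(Name);
    if (It == Entries.end())
      return std::nullopt;
    return It->second;
  }

private:
  mutable std::shared_mutex Mutex;
  std::map<std::string, int> Entries;
};
```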
[lldb] Go through the plugin interface to obtain the scripting path (#182400)
Avoid directly including `ScriptInterpreterPython.h` in `SBHostOS`, and
instead go through the plugin interface to obtain the scripting path.
This also deprecates the `SBHostOS::GetLLDBPythonPath` method in favor
of the more generic `GetScriptPath` variant.
[mlir] Fix segfault in gpu.launch verifier when body region is empty (#182086)
Fixes #181971
Some methods in LaunchOp::verifyRegion() access the body block without
checking whether the region is empty, resulting in a segfault. Fix by
adding an early return when the body is empty.
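The shape of such a guard can be sketched outside MLIR. `Block`, `Region`, and this `verifyRegion` are hypothetical types and a hypothetical function, not the MLIR API. Returning success for an empty body and the "entry block has at least one op" check are both assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature types; these are not the MLIR classes.
struct Block {
  std::vector<std::string> Ops;
};

struct Region {
  std::vector<Block> Blocks;
  bool empty() const { return Blocks.empty(); }
};

// Returns true when verification succeeds.
bool verifyRegion(const Region &Body) {
  // Early return: with no blocks there is no entry block to inspect, and
  // calling front() on an empty vector would be undefined behavior.
  if (Body.empty())
    return true; // assumption: an empty body is left to other checks
  const Block &Entry = Body.Blocks.front();
  return !Entry.Ops.empty(); // hypothetical check on the entry block
}
```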
[GVN] Restore the NumGVNInstr metric (#182380)
The NumGVNInstr metric counts the number of instructions deleted by the
GVN pass. #131753 changed where the instructions were deleted in the
pass, but that PR did not update the NumGVNInstr metric whenever
instructions were deleted. This PR fixes that bug by updating
NumGVNInstr whenever `removeInstruction(...)` is called.
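The idea of tying the counter to the single deletion entry point can be sketched as follows. `MiniGVN` is a hypothetical miniature with integers standing in for instructions. It is not the GVN pass, and it does not use LLVM's statistic machinery.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical miniature of a pass that tracks how many instructions it deletes.
struct MiniGVN {
  std::vector<int> Insts;   // stand-in for the instructions in a function
  unsigned NumGVNInstr = 0; // number of instructions deleted so far

  // Every deletion goes through here, so the counter cannot drift.
  void removeInstruction(int Id) {
    auto It = std::find(Insts.begin(), Insts.end(), Id);
    if (It == Insts.end())
      return; // nothing deleted, so nothing is counted
    Insts.erase(It);
    ++NumGVNInstr;
  }
};
```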
[AArch64][llvm] Tighten SYSP; don't disassemble invalid encodings
Tighten SYSP aliases, so that invalid encodings are disassembled
to `<unknown>`. This is because:
```
Cn is a 4-bit unsigned immediate, in the range 8 to 9
Cm is a 4-bit unsigned immediate, in the range 0 to 7
op1 is a 3-bit unsigned immediate, in the range 0 to 6
op2 is a 3-bit unsigned immediate, in the range 0 to 7
```
Ensure we check this when disassembling, and also constrain
tablegen for compile-time errors of invalid encodings.
Also adjust the testcases in `armv9-sysp-diagnostics.s` and
`llvm/test/MC/AArch64/armv9a-sysp.s` as they were invalid,
and add a few invalid (out-of-range) SYSP-alikes to
test that `<unknown>` is printed.
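The operand ranges quoted above amount to a simple predicate. The function below is a hypothetical illustration of that check, not the decoder or tablegen code from this change.

```cpp
// Hypothetical validity check mirroring the SYSP operand ranges quoted above:
// op1 in [0, 6], Cn in [8, 9], Cm in [0, 7], op2 in [0, 7].
bool isValidSyspEncoding(unsigned Op1, unsigned Cn, unsigned Cm, unsigned Op2) {
  return Op1 <= 6 && Cn >= 8 && Cn <= 9 && Cm <= 7 && Op2 <= 7;
}
```

Under this sketch, an encoding with Cn equal to 10 or op1 equal to 7 is rejected, which corresponds to the cases that should now disassemble as `<unknown>`.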
[clang][deps] Don't treat the 'P1689' output format specially (#182069)
This patch essentially reverts 62fec3d2 which was an NFC, and replaces
the remaining check for the output format with a more generic scanning
service option that controls whether we report absolute or relative file
paths. Along with #182063 this makes the scanner implementation entirely
independent of the desired output format.
[clang] Add missing support for traversal kind in addMatcher overloads (#170953)
This was noted in #170540, and seems to simply be an oversight. This
patch just adds the same logic already used in other addMatcher()
implementations that honor the traversal kind.
Fixes #179386.
[NFC] [HWAsan] Run UTC on hwasan tests (#181437)
```
for x in $(grep -l 'UTC' llvm/test/Instrumentation/HWAddressSanitizer/**/*.ll); do
llvm/utils/update_test_checks.py --opt-binary build/bin/opt $x; done
```
[libc][math] Refactor fmax family to header-only (#182165)
Refactors the fmax math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182164
Target Functions:
- fmax
- fmaxbf16
- fmaxf
- fmaxf128
- fmaxf16
- fmaxl
[libc] Fix RPC server with independent thread scheduling (#182211)
Summary:
The NVIDIA ITS protocol allows lanes to diverge inside of a warp. We
previously had contingencies around this, but there were cases where
issues would still show up under highly stressed usage.
The rules state that as long as the PC is the same, threads can
reconverge. This means that we can see a 'convergent' warp even when
they took completely divergent paths to get there. This resulted in the
'index' value in the RPC port lookup loop thinking we were in a
convergent group while all the indices were different. Fix this with a
broadcast to force the expected behavior.
Additionally, we did not force that the threads were actually done with
their 'work_fn'. If the work included something that caused divergence
the other threads could continue and toggle the mailbox, resulting in
the server seeing unfinished work. Fix this with an explicit sync and
have one thread do it.
Add a test to make sure this actually works.
[libc] Improve GPU allocator lane usage and fences (#182388)
Summary:
Improves performance on the GPU allocator. First, we can use `uniform`
as our mask value when we obtain a slab. Because this is guaranteed
uniform we can safely treat it as our mask. This also improves the
behavior on NVIDIA's ITS.
Secondly, we do not actually need acquire / release fences on the
bitfield. These are one-to-one interfaces and the malloc / free
interface provides the necessary happens-before context. The only fences
that matter are the lifetime management for the guard pointer.