[lldb] Go through the plugin interface to obtain the scripting path (#182400)
Avoid directly including `ScriptInterpreterPython.h` in `SBHostOS`, and
instead go through the plugin interface to obtain the scripting path.
This also deprecates the `SBHostOS::GetLLDBPythonPath` method in favor
of the more generic `GetScriptPath` variant.
[mlir] Fix segfault in gpu.launch verifier when body region is empty (#182086)
Fixes #181971
Some methods in LaunchOp::verifyRegion() access the body block without
checking whether the region is empty, resulting in a segfault. Fix by
adding an early return when the body is empty.
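The guard can be sketched as follows. This is a minimal illustration, not the MLIR code: `Block`, `Region`, and `verifyBodyRegion` are hypothetical stand-ins, and it assumes the early return reports success.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for MLIR's Block and Region; not the real classes.
struct Block {
  std::size_t numArguments;
};

struct Region {
  std::vector<Block> blocks;
  bool empty() const { return blocks.empty(); }
  const Block &front() const { return blocks.front(); }
};

// Returns true when the region passes this (illustrative) check.
// `expectedArgs` is a made-up requirement on the entry block.
bool verifyBodyRegion(const Region &body, std::size_t expectedArgs) {
  // Early return: front() on an empty region would be invalid, so bail
  // out before any check touches the entry block.
  if (body.empty())
    return true;
  return body.front().numArguments >= expectedArgs;
}
```

Every check that dereferences the entry block sits after the guard, so an empty body never reaches `front()`.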
[GVN] Restore the NumGVNInstr metric (#182380)
The NumGVNInstr metric counts the number of instructions deleted by the
GVN pass. #131753 changed where the instructions were deleted in the
pass, but that PR did not update the NumGVNInstr metric whenever
instructions were deleted. This PR fixes that bug by updating
NumGVNInstr whenever `removeInstruction(...)` is called.
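A minimal sketch of the idea, assuming a single deletion helper owns the counter update. `GVNSketch` and its integer instruction ids are hypothetical; the real pass operates on LLVM instructions and its own statistic.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for the pass state; not the actual GVN implementation.
struct GVNSketch {
  std::size_t numGVNInstr = 0;     // number of instructions deleted so far
  std::set<int> liveInstructions;  // illustrative instruction ids

  // Single deletion point: the counter is bumped exactly when something
  // is actually removed, so the metric cannot drift from the deletions.
  void removeInstruction(int id) {
    if (liveInstructions.erase(id) > 0)
      ++numGVNInstr;
  }
};
```

Tying the counter to the deletion helper, rather than to each call site, keeps the metric correct if the call sites move again.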
[AArch64][llvm] Tighten SYSP; don't disassemble invalid encodings
Tighten SYSP aliases, so that invalid encodings are disassembled
to `<unknown>`. This is because:
```
Cn is a 4-bit unsigned immediate, in the range 8 to 9
Cm is a 4-bit unsigned immediate, in the range 0 to 7
op1 is a 3-bit unsigned immediate, in the range 0 to 6
op2 is a 3-bit unsigned immediate, in the range 0 to 7
```
Ensure we check this when disassembling, and also constrain
tablegen for compile-time errors of invalid encodings.
Also adjust the testcases in `armv9-sysp-diagnostics.s` and
`llvm/test/MC/AArch64/armv9a-sysp.s`, as they were invalid,
and add a few invalid (out-of-range) SYSP-alikes to
test that `<unknown>` is printed.
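The operand ranges listed in this commit can be expressed as a single predicate. This is only a sketch: `isValidSyspOperands` is a hypothetical helper, not the function used by the disassembler or by tablegen.

```cpp
#include <cstdint>

// Hypothetical helper: checks the SYSP operand ranges from the commit text.
//   op1 in [0, 6], Cn in [8, 9], Cm in [0, 7], op2 in [0, 7]
bool isValidSyspOperands(std::uint8_t op1, std::uint8_t cn,
                         std::uint8_t cm, std::uint8_t op2) {
  return op1 <= 6 && cn >= 8 && cn <= 9 && cm <= 7 && op2 <= 7;
}
```

Under this sketch, an encoding that fails the predicate would be printed as `<unknown>` instead of as a SYSP alias.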
janet: update to 1.41.2
There were some last-minute issues in the release of 1.41.1, namely a
regression in out-of-bounds buffer and array indexing. This release
fixes those issues.
- Use snprintf instead of sprintf
- Initialize memory allocated by `put`
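For context on the first item, a bounded format call looks roughly like this. `formatIndex` is a hypothetical helper written for illustration, not code from janet.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats an index into a caller-provided buffer.
// Returns true only if the whole string (plus terminator) fit.
bool formatIndex(char *buf, std::size_t size, int index) {
  // snprintf writes at most size - 1 characters plus a terminating NUL,
  // and returns the length the full output would have had.
  int n = std::snprintf(buf, size, "index %d", index);
  return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

Unlike `sprintf`, the call cannot write past the end of `buf`, and the return value lets the caller detect truncation.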
[clang][deps] Don't treat the 'P1689' output format specially (#182069)
This patch essentially reverts 62fec3d2 which was an NFC, and replaces
the remaining check for the output format with a more generic scanning
service option that controls whether we report absolute or relative file
paths. Along with #182063 this makes the scanner implementation entirely
independent of the desired output format.
[clang] Add missing support for traversal kind in addMatcher overloads (#170953)
This was noted in #170540, and seems to simply be an oversight. This
patch just adds the same logic already used in other addMatcher()
implementations that honor the traversal kind.
Fixes #179386.
[NFC] [HWAsan] Run UTC on hwasan tests (#181437)
```
for x in $(grep -l 'UTC' llvm/test/Instrumentation/HWAddressSanitizer/**/*.ll); do
llvm/utils/update_test_checks.py --opt-binary build/bin/opt $x; done
```
[libc][math] Refactor fmax family to header-only (#182165)
Refactors the fmax math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182164
Target Functions:
- fmax
- fmaxbf16
- fmaxf
- fmaxf128
- fmaxf16
- fmaxl
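As a rough illustration of what "header-only" means here, the function body lives in a header as an inline template instead of a separate translation unit. The namespace and function below are hypothetical and do not mirror the libc sources; in particular, this sketch does not distinguish signed zeros.

```cpp
#include <limits>

// Hypothetical header-only sketch; names are not the libc ones.
namespace sketch {

template <typename T>
inline T fmax(T x, T y) {
  static_assert(std::numeric_limits<T>::has_quiet_NaN,
                "T must be a floating-point type with a quiet NaN");
  if (x != x) return y;  // x is NaN: return the other operand
  if (y != y) return x;  // y is NaN: return the other operand
  return x < y ? y : x;  // note: +0.0 and -0.0 compare equal here
}

}  // namespace sketch
```

Because the definition is visible at every call site, callers can inline it without linking against a separate object file.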
[libc] Fix RPC server with independent thread scheduling (#182211)
Summary:
The NVIDIA ITS protocol allows lanes to diverge inside of a warp. We
previously had contingencies around this, but there were cases where
issues would still show up under highly stressed usage.
The rules state that as long as the PC is the same, threads can
reconverge. This means that we can see a 'convergent' warp even when
they took completely divergent paths to get there. This resulted in the
'index' value in the RPC port lookup loop thinking we were in a
convergent group while all the indices were different. Fix this with a
broadcast to force the expected behavior.
Additionally, we did not force that the threads were actually done with
their 'work_fn'. If the work included something that caused divergence,
the other threads could continue and toggle the mailbox, resulting in
the server seeing unfinished work. Fix this with an explicit sync and
have a single thread toggle the mailbox.
Add a test to make sure this actually works.
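The effect of the broadcast can be illustrated with a host-side simulation. This is not GPU code and not the RPC implementation: `broadcastFirst`, the 32-lane width, and the choice of the lowest active lane as the source are all assumptions made for the sketch.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 32;  // assumed warp width

// Simulated broadcast: every active lane adopts the value held by the
// lowest active lane, so the group agrees on one index even if the lanes
// computed different ones before reconverging.
std::array<std::uint32_t, kLanes>
broadcastFirst(std::array<std::uint32_t, kLanes> values,
               std::uint32_t activeMask) {
  int first = -1;
  for (int lane = 0; lane < kLanes; ++lane) {
    if (activeMask & (1u << lane)) {
      first = lane;
      break;
    }
  }
  if (first < 0)
    return values;  // no active lanes, nothing to do
  for (int lane = 0; lane < kLanes; ++lane)
    if (activeMask & (1u << lane))
      values[lane] = values[first];
  return values;
}
```

After the broadcast, lanes that appear convergent also hold the same index, which is the property the lookup loop relies on.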
[libc] Improve GPU allocator lane usage and fences (#182388)
Summary:
Improves performance on the GPU allocator. First, we can use `uniform`
as our mask value when we obtain a slab. Because this is guaranteed
uniform we can safely treat it as our mask. This also improves the
behavior on NVIDIA's ITS.
Secondly, we do not actually need acquire / release fences on the
bitfield. These are one-to-one interfaces and the malloc / free
interface provides the necessary happens-before context. The only fences
that matter are the lifetime management for the guard pointer.
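For reference, relaxed atomic updates on a bitfield word look roughly like the following. `claimBit` and `releaseBit` are hypothetical helpers, not the allocator's functions; the sketch only shows the memory ordering and assumes, as the commit argues, that ordering between threads is provided elsewhere.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: claims bit `bit` in a bitfield word.
// Returns true if this caller set the bit (it was previously clear).
bool claimBit(std::atomic<std::uint32_t> &word, unsigned bit) {
  std::uint32_t mask = 1u << bit;
  // Relaxed: the read-modify-write is still atomic, but it imposes no
  // ordering on surrounding memory accesses.
  std::uint32_t before = word.fetch_or(mask, std::memory_order_relaxed);
  return (before & mask) == 0;
}

// Hypothetical sketch: clears bit `bit` so the slot can be reused.
void releaseBit(std::atomic<std::uint32_t> &word, unsigned bit) {
  word.fetch_and(~(1u << bit), std::memory_order_relaxed);
}
```

Atomicity alone guarantees that two callers never both claim the same bit; the acquire / release semantics are left to the code that actually needs them.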