Revert "[IRTranslator] Precommit tests for bitcasts of the byte type #203638" (#204378)
This reverts commit 7e5bc4c7bd23e390cdb0b08f807968ea256b0df2 as the
MachineVerifier identifies 'bad machine code'.
[libc++] Document post-WG21-meeting conformance update procedure (#204357)
This patch adds a section in the documentation to explain the procedure
to follow after a WG21 meeting to properly track papers. This should
clear out some confusion about how this process happens and who should
be responsible for doing it.
[lldb] Fix race in macOS's FindProcessesImpl (#204109)
Our current FindProcessesImpl has a TOCTOU bug where we first query the
buffer size we should provide via sysctl and then later pass a buffer
with that size to be filled. If the list of processes grows larger than
the buffer we pass, then our current implementation fails by returning
an empty list of processes. This race only happens rarely in practice as
we pad the buffer size with 10 additional entries to account for some
process growth.
This patch replaces this logic with a backoff loop that retries fetching
the process list if our buffer is too small (sysctl tells us if this is
the case by setting ENOMEM). This new implementation can only fail if
the system consistently spawns thousands of new processes between
each retry.
This should fix the actual root cause of the random failures in
TestSimulator.py.
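The retry logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `QueryFn` is a hypothetical stand-in for the sysctl query, and the doubling growth factor, the padding of 10 entries, and the attempt limit are assumptions made for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the sysctl process query. It fills `buf` with at
// most `capacity` entries and stores the number written in `count`. When the
// buffer is too small it returns -1 and sets errno to ENOMEM.
using QueryFn =
    std::function<int(int *buf, std::size_t capacity, std::size_t &count)>;

// Retry the query with a growing buffer until it succeeds or the attempt
// budget is exhausted. Returns an empty list on failure.
std::vector<int> FetchWithBackoff(const QueryFn &query, std::size_t size_hint,
                                  int max_attempts = 8) {
  assert(max_attempts > 0);
  std::size_t capacity = size_hint + 10; // small padding, as in the old code
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::vector<int> buf(capacity);
    std::size_t count = 0;
    if (query(buf.data(), buf.size(), count) == 0) {
      buf.resize(count); // keep only the entries actually written
      return buf;
    }
    if (errno != ENOMEM)
      break;       // unrelated failure: retrying will not help
    capacity *= 2; // assumed growth policy for this sketch
  }
  return {};
}
```

With this shape, a list that grows between the size query and the fetch costs one more iteration instead of producing an empty result.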
[NVPTX] Fix build break from #201217 (#204380)
#201217 added a third `SymbolSize` argument to `AggBuffer::addSymbol()`
but missed one call site, which was added by 98160521cb72 after the PR
branch was cut. Pass `AllocSize` like the sibling calls do.
NAS-141417 / 27.0.0-BETA.1 / Convert hardware plugin to the typesafe pattern (#19145)
## Context
The hardware plugin is a directory of four mostly-private legacy
services (mseries.bios, mseries.nvdimm, hardware.memory, plus
hardware.virtualization). Only hardware.virtualization.variant is public
over the wire; the rest return plain dicts/bools consumed internally by
alert sources and usage reporting, so Pydantic models would be pure
overhead.
## Solution
Applied the port-plugin pattern: lean Service shims in __init__.py that
delegate to plain, fully type-annotated module functions, keeping the
existing dict/primitive return shapes so no consumer changes are needed.
The one public method gets check_annotations=True against the existing
HardwareVirtualizationVariant models. Registered the services in
main.py's ServiceContainer via nested hardware/mseries containers and
added the plugin to mypy.yml.
[AMDGPU] Remove unnecessary and broken sign/zero-extension (#203436)
When expanding div/rem by using floating-point operations,
sign/zero-extending the result from the calculated DivBits input width
to 32-bits is unnecessary. CreateFPToSI or CreateFPToUI is called with a
32-bit int type so the conversion instruction will already produce a
result with the desired width.
It is also incorrect. For the signed division `DIVBITS_MAX_NEG / -1`, the
result should be `-DIVBITS_MAX_NEG`, a positive value. Sign-extension
will incorrectly return a negative result. For example, for DivBits=4,
`-8/-1 = 8`, but adding code to do a 28-bit sign-extension will
incorrectly return `-8`.
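The DivBits=4 case can be reproduced with a small sketch. `SignExtendFromBits` is a hypothetical helper written for this illustration; it is not a function from the AMDGPU backend.

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low `bits` bits of `value` as a signed number and widen it
// to 32 bits. Hypothetical helper, for illustration only.
int32_t SignExtendFromBits(int32_t value, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  const uint64_t sign_bit = uint64_t{1} << (bits - 1);
  int64_t v = static_cast<int64_t>(static_cast<uint32_t>(value) & mask);
  if (static_cast<uint64_t>(v) & sign_bit)
    v -= int64_t{1} << bits; // the top bit is set, so the value is negative
  return static_cast<int32_t>(v);
}
```

Here `-8 / -1` evaluates to `8` as a 32-bit integer, while `SignExtendFromBits(8, 4)` yields `-8`, because bit 3 of the quotient is read as a sign bit.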
Tested in https://github.com/llvm/llvm-test-suite/pull/423.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[M68k] Do not allow addressing modes k and q as MOVE targets (llvm#200826) (#201653)
This is intended as a fix for #200826 by removing PC-relative address
modes from the m68k MOVE patterns. It also affects MOVEM, and the
"atomic store" pattern that maps to the respective MOVE instruction as
well. This patch is based on the large patch authored by Gemini at
https://github.com/llvm/llvm-project/issues/181481#issuecomment-4476933700,
but it has been carefully trimmed to address a single issue, and every
change has been verified to make sense. Gemini also restricted
the list of source addressing modes for MOVEM to the valid destination
addressing modes, which is not required according to the Motorola
specification.
[llvm-exegesis] Add did-you-mean hint for unknown opcodes (#203463)
Fixes #203199
When skipping a benchmark entry with an unknown opcode name, suggest the
nearest matching opcode if the edit distance is <= 1 (similar to
OptTable::findNearest).
Example:
```text
warning: skipping benchmark entry: No opcode with name 'VADDPDYrrr' - did you mean 'VADDPDYrr' ?
```
Tested with:
- `llvm-lit llvm/test/tools/llvm-exegesis/X86/analysis-unknown-opcode-did-you-mean.test`
- `llvm-lit llvm/test/tools/llvm-exegesis/X86/analysis-skip-unknown-opcode.test`
[2 lines not shown]
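The lookup described above amounts to a nearest-name search with a distance cutoff. The following is a minimal sketch under that reading; `EditDistance` and `SuggestOpcode` are hypothetical names, and the actual patch may compute the distance differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Classic Levenshtein distance using a single rolling row.
std::size_t EditDistance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> row(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    row[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    std::size_t diag = row[0]; // value from the previous row, column j - 1
    row[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t up = row[j]; // value from the previous row, column j
      const std::size_t subst = diag + (a[i - 1] != b[j - 1] ? 1 : 0);
      row[j] = std::min({up + 1, row[j - 1] + 1, subst});
      diag = up;
    }
  }
  return row[b.size()];
}

// Return the closest known name if it is within `max_distance` edits.
std::optional<std::string> SuggestOpcode(const std::string &unknown,
                                         const std::vector<std::string> &known,
                                         std::size_t max_distance = 1) {
  std::optional<std::string> best;
  std::size_t best_distance = max_distance + 1;
  for (const std::string &candidate : known) {
    const std::size_t d = EditDistance(unknown, candidate);
    if (d < best_distance) { // strict comparison keeps the first best match
      best_distance = d;
      best = candidate;
    }
  }
  return best;
}
```

For the name in the warning above, `VADDPDYrrr` is one deletion away from `VADDPDYrr`, so that name is suggested; a name with no candidate within one edit produces no hint.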
[NVPTX] Properly emit narrow ptrtoint in aggregate initializers. (#201217)
If you have a 64-bit pointer and use ptrtoint to convert it to an i32,
that's supposed to return the low 32 bits of the pointer.
But if you use a narrowing ptrtoint inside a global aggregate
initializer, we currently don't mask the bytes. For this IR:
@g = addrspace(1) global i32 0
@s = addrspace(1) global { i32, i32 }
{ i32 ptrtoint (ptr addrspace(1) @g to i32), i32 0xdeadbeef }
on nvptx64 we emitted
.u64 s[1] = {g};
i.e. we emitted the full 64-bit address of `@g`, and entirely dropped
the trailing i32 (0xdeadbeef).
[2 lines not shown]
Revert "[IRTranslator] Precommit tests for bitcasts of the byte type (#203638)"
This reverts commit 7e5bc4c7bd23e390cdb0b08f807968ea256b0df2
as the MachineVerifier identifies 'bad machine code'.
[SimpleLoopUnswitch] Generalize the notion of trivial unswitching (#193989)
For a loop like this
```
for (int j = 0; j < M; j++) {
if (N <= 0) continue; // invariant guard branches to latch
for (int i = 0; i < N; i++)
A[i] = B[i] + 1;
}
```
Since none of the successors of the guard branch of the inner loop are
outside the loop `j`, unswitching treats this as a non-trivial branch.
In reality, this is a perfect loop nest. If the condition of the guard
of the `i` loop is false, there is nothing to do, so if we unswitch this
branch, the loop does not need to be versioned. This matches the
requirements of a trivial unswitch. This patch extends trivial loop
unswitching to catch cases like this.
[3 lines not shown]
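At the source level, the effect on the example loop can be pictured as below. This is only an illustration of the idea: the pass works on LLVM IR, and the function names and the use of `std::vector` are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The loop nest as written: the invariant guard is re-tested on every
// iteration of the outer loop.
void Original(std::vector<int> &A, const std::vector<int> &B, int M, int N) {
  assert(N <= 0 || (static_cast<std::size_t>(N) <= A.size() &&
                    static_cast<std::size_t>(N) <= B.size()));
  for (int j = 0; j < M; j++) {
    if (N <= 0)
      continue; // invariant guard branches to the latch
    for (int i = 0; i < N; i++)
      A[i] = B[i] + 1;
  }
}

// After unswitching: the guard is tested once. The false side has no work
// to do, so no second copy of the loop nest is required.
void Unswitched(std::vector<int> &A, const std::vector<int> &B, int M, int N) {
  assert(N <= 0 || (static_cast<std::size_t>(N) <= A.size() &&
                    static_cast<std::size_t>(N) <= B.size()));
  if (N <= 0)
    return;
  for (int j = 0; j < M; j++)
    for (int i = 0; i < N; i++)
      A[i] = B[i] + 1;
}
```

Both functions leave `A` in the same state for any `M` and `N`, which is why the branch qualifies as trivial even though neither successor leaves the `j` loop.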
[Offload] Fix pessimistic max block count sizing on AMDGPU (#204242)
Summary:
For whatever reason, HSA copied the questionable choices that OpenCL
made and represents its launch parameters in a threads * blocks grid.
The problem is that you then combine this with an `int32_t` interface,
so you have 31 bits to represent your launch. We were then
pessimistically stating that your launch always had 1024 threads, which
left us with 2^21 blocks. That is only about two million, a count that
people exceed all the time, and it caused us to perform weird clamping in
OpenMP. The effect was that tests like ompx_saxpy_mixed.c were hitting
that clamp and returning wrong results.
Also fix the sanitizer tests failing because of HSA core dumps.
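The arithmetic behind the 2^21 figure can be checked with a short sketch. `MaxBlocksFor` is a hypothetical helper written for this illustration; the exact limit the patch computes is not stated in the message.

```cpp
#include <cassert>
#include <cstdint>

// Largest block count for which threads_per_block * blocks still fits in a
// signed 32-bit value. Hypothetical helper, for illustration only.
int64_t MaxBlocksFor(int64_t threads_per_block) {
  assert(threads_per_block > 0);
  return INT32_MAX / threads_per_block;
}
```

Assuming 1024 threads per block gives `MaxBlocksFor(1024) == 2097151`, which is 2^21 - 1. With a real block size of 256 the same 31-bit budget allows `8388607` blocks, four times as many.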
[MLIR][GPUToXeVMPipeline] Expand MX scaling ops before XeVM conversion (#203632)
arith.scaling_extf/scaling_truncf were never lowered by the
gpu-lower-to-xevm pipeline, so micro-scaling (MX) GEMM kernels kept
these ops (and their narrow-float operands) live all the way to LLVM
translation.
Run arith-expand before the XeVM/LLVM conversions to break
scaling_extf/scaling_truncf into extf/truncf + mulf and to expand
f8E8M0FNU casts into integer bit manipulation. f4E2M1FN expansion is
intentionally left disabled: its casts are lowered by the XeVM
conversions (xevm.extf), whereas f8E8M0FNU is not handled there and must
be expanded here. The generic f4E2M1FN expansion would otherwise emit i4
vector arithmetic that the XeVM backend cannot legalize.
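To show what "integer bit manipulation" can mean for an f8E8M0FNU cast, here is a minimal sketch of widening one E8M0 value to a 32-bit float. It assumes the OCP MX encoding (8 exponent bits, bias 127, no sign or mantissa, 0xFF reserved for NaN). `E8M0ToFloat` is a hypothetical name, and the sketch is not the sequence arith-expand emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen an E8M0 scale (value 2^(e - 127), 0xFF meaning NaN) to float using
// integer operations only. Hypothetical helper, for illustration only.
float E8M0ToFloat(uint8_t e) {
  uint32_t bits;
  if (e == 0xFF)
    bits = 0x7FC00000u; // a quiet NaN bit pattern
  else if (e == 0)
    bits = 0x00400000u; // 2^-127 is subnormal in binary32
  else
    bits = static_cast<uint32_t>(e) << 23; // same bias: move into exponent field
  float result;
  std::memcpy(&result, &bits, sizeof(result)); // bit cast without aliasing UB
  return result;
}
```

Because E8M0 and binary32 share the exponent bias of 127, the common case is a single shift. A scaled extension then multiplies the widened operand by this value, which matches the extf + mulf decomposition described above.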
[MLIR][XeGPU] Fix load_matrix lowering for non-LLVM element types (#203629)
LoadStoreMatrixToXeVMPattern built the llvm.load result from the raw op
result element type. For element types without a direct LLVM
representation (e.g. f8E8M0FNU) this produced an illegal op: 'llvm.load'
op result #0 must be LLVM type with size, but got 'f8E8M0FNU'.
Derive the load result type from the type converter instead. This maps
such element types to an integer storage type of the same bit width,
collapses single-element vectors to a scalar, and flattens multi-element
vectors. The store path already used the converted operand and is
unchanged; the XeVM type converter's materialization casts bridge the
loaded value back to the original vector type for downstream consumers.
Add load_matrix regression tests for f8E8M0FNU (scalar and vector).
[lldb] Fix LLDB_BUILD_FRAMEWORK with the dynamic script interpreters (#204265)
When using LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS (the default on Darwin
as of #204015), the PluginManager loads the script interpreter plugins at
runtime by scanning the directory that holds liblldb. A framework build
moves liblldb into
LLDB.framework, but the plugins were only emitted into lib/ and never
copied into the bundle, so they were never found.
Add lldb_add_scriptinterpreter_plugin_to_framework(), called from the
Python and Lua plugin CMakeLists, which copies the plugin next to the
framework binary and appends an rpath so it can resolve liblldb from
inside the bundle. The copy uses the plugin's unversioned name because
PluginManager derives the initializer symbol from it (a versioned copy
would load but never register). Non-framework builds are unaffected.
[SimplifyCFG] Use context instruction in foldBranchToCommonDest() (#203516)
When determining whether instructions can be speculated, pass the
terminator of the predecessor as the context instruction.
Only do this for the single predecessor case, otherwise we'd have to
perform one query per predecessor. The multi-predecessor case is
excluded by the default cost model anyway.
[lldb][unittests] Add LLDB_UNITTEST_STRIP_DEBUG_INFO to cut unittest link time (#203274)
Add an opt-in cache option (default OFF). When ON, every target declared
via `add_lldb_unittest` links without per-target debug info and
dead-strips:
- **MSVC**: `/DEBUG:NONE + /INCREMENTAL:NO + /OPT:REF + /OPT:ICF`
- **clang/gcc**: `-g0 + LINKER:--strip-debug` (or `LINKER:-S` on macOS).
This drastically speeds up linking the unittests executable when
building with debug info on Windows.