Remove magic constants set in the cx56 driver's descriptor
in favour of the hpcreg.h definitions.
Confusingly, the MI driver apparently swaps the meanings of DO
and DI relative to the chip's datasheet.
Fix EEPROM reading on IP12: DELAY isn't available as early as
needed, so roll our own using the most pessimistic timings
(that is, busy loop enough for the fastest pre-ARCS CPU).
Clean up the EEPROM twiddling code and remove magic constants
while here.
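A minimal sketch of the kind of pessimistic busy-wait described above,
written in C++ for illustration only. The names eeprom_delay_loops and
eeprom_busy_wait and the loops-per-microsecond parameter are hypothetical
and are not taken from the driver; the point is only that the loop count is
sized for the fastest CPU, so slower machines wait longer than needed
rather than too briefly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of loop iterations needed to burn at least
// `usec` microseconds on a CPU that can execute at most
// `max_loops_per_usec` iterations per microsecond. Sizing for the fastest
// CPU is the pessimistic choice: slower CPUs simply wait longer.
inline std::uint32_t eeprom_delay_loops(std::uint32_t usec,
                                        std::uint32_t max_loops_per_usec) {
    assert(max_loops_per_usec > 0);
    return usec * max_loops_per_usec;
}

// Spin for `loops` iterations. The volatile counter keeps the compiler from
// optimizing the otherwise empty loop away. Returns the iterations executed.
inline std::uint32_t eeprom_busy_wait(std::uint32_t loops) {
    volatile std::uint32_t spin = 0;
    for (std::uint32_t i = 0; i < loops; ++i)
        spin = spin + 1;
    return spin;
}
```

A caller would do something like eeprom_busy_wait(eeprom_delay_loops(4, 40))
between EEPROM clock edges; the figures 4 and 40 are placeholders, not
measured values.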
[RISC-V] Add --implicit-check-not="error:" to a few tests
Ensures that the test checks for every error emitted by llvm-mc. To do this
we have to move the CHECK lines to the next line rather than the same line
since otherwise we get a false-positive match.
This adds a few missing CHECK lines in the xqcibm-invalid test and is needed
to minimize the diff in one of my subsequent commits.
Pull Request: https://github.com/llvm/llvm-project/pull/203091
[MLIR][XeGPU] Support partial subgroup lane distribution for convert_layout (#201667)
Add lowering support in XeGPUSgToLaneDistribute for values that are
distributed across only a fraction of the subgroup.
- SgToLaneConvertLayout now lowers a rank-2 xegpu.convert_layout that
shrinks the lane layout along the outer (distributed) dimension while
keeping lane_data unchanged (e.g. [16, 1] -> [8, 1]). The partial-subgroup
case is detected directly in the pattern: equal order, rank 2, unit inner
lane layout, and a genuinely distributed outer lane layout (> 1, which also
rules out the degenerate [1, 1] layout). Because the data is no longer
replicated in every lane, it is gathered across lanes and the distributed
outer dimension is doubled when the lane count is halved.
- The cross-lane gather is factored into a dedicated helper,
shuffleDataAsLaneLayoutChange(): it bitcasts the source to i32, issues
gpu.shuffle up to fetch the values from the dropped lanes, and concatenates
[9 lines not shown]
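The partial-subgroup detection described in the first bullet above can be
sketched as a plain predicate. This is a hedged illustration and not the
pattern's actual code: LaneLayoutSketch, isPartialSubgroupShrink and
outerGrowthFactor are hypothetical names, the "> 1" condition is applied to
the target layout as one possible reading of the description, and the
growth factor assumes the source lane count is a multiple of the target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the layout attributes the pattern inspects.
struct LaneLayoutSketch {
    std::vector<std::int64_t> laneLayout;
    std::vector<std::int64_t> laneData;
    std::vector<std::int64_t> order;
};

// True when `dst` shrinks the outer lane dimension of `src` under the
// conditions listed in the commit message.
inline bool isPartialSubgroupShrink(const LaneLayoutSketch &src,
                                    const LaneLayoutSketch &dst) {
    if (src.laneLayout.size() != 2 || dst.laneLayout.size() != 2)
        return false;                          // rank 2 only
    if (src.order != dst.order)
        return false;                          // equal order
    if (src.laneData != dst.laneData)
        return false;                          // lane_data unchanged
    if (src.laneLayout[1] != 1 || dst.laneLayout[1] != 1)
        return false;                          // unit inner lane layout
    if (dst.laneLayout[0] <= 1)
        return false;                          // rules out degenerate [1, 1]
    return dst.laneLayout[0] < src.laneLayout[0];  // outer dimension shrinks
}

// Factor by which the per-lane outer dimension grows, e.g. 2 for
// [16, 1] -> [8, 1].
inline std::int64_t outerGrowthFactor(const LaneLayoutSketch &src,
                                      const LaneLayoutSketch &dst) {
    assert(isPartialSubgroupShrink(src, dst));
    return src.laneLayout[0] / dst.laneLayout[0];
}
```

For the [16, 1] -> [8, 1] example in the message, the predicate holds and
the growth factor is 2, matching "doubled when the lane count is halved".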
Ensure that curlwp is saved and restored across calls into
pre-ARCS PROM routines. This only matters for console output,
but do the same for shutdown/restart entrypoints as well.
While here, preformat strings before invoking the PROM's
printf routine, since we don't know what its formatting and
argument limitations are.
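A rough sketch of the two ideas above, in C++ for illustration only. All
names here (curlwp_sketch, prom_printf_fn, prom_printf_safe,
fake_prom_printf) are hypothetical stand-ins and not the kernel's real
interfaces: the string is formatted locally and handed to the PROM as a
single "%s" argument, and the current-lwp value is saved before the call
and restored afterwards.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

struct lwp { int id; };                 // stand-in for the kernel's struct lwp
static lwp *curlwp_sketch = nullptr;    // hypothetical stand-in for curlwp

// Hypothetical type of a PROM printf-style entry point.
using prom_printf_fn = void (*)(const char *fmt, ...);

// Format into our own buffer first, then pass the PROM one "%s" argument,
// so nothing depends on which conversions or how many arguments it accepts.
// Returns what vsnprintf returns (the untruncated length).
inline int prom_printf_safe(prom_printf_fn prom_printf, const char *fmt, ...) {
    assert(prom_printf != nullptr);
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    int n = std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    lwp *saved = curlwp_sketch;         // the PROM call may clobber this
    prom_printf("%s", buf);
    curlwp_sketch = saved;              // restore before returning
    return n;
}

// Fake PROM routine used only to exercise the sketch: it records the
// output and deliberately clobbers curlwp_sketch.
static char last_prom_output[256];
static void fake_prom_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(last_prom_output, sizeof(last_prom_output), fmt, ap);
    va_end(ap);
    curlwp_sketch = nullptr;
}
```

Calling prom_printf_safe(fake_prom_printf, "cpu%d: %s", 0, "ok") leaves
curlwp_sketch unchanged even though the fake routine overwrote it.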
[2/2][AMDGPU] Insert IMPLICIT_DEF to provide a reaching def for unspilled reloads
Depends on https://github.com/llvm/llvm-project/pull/198472
PR #198472 skips unspilling a slot if a spill reload is reachable from
entry along a path that does not contain a spill store. This patch builds
on it by finding a basic block where an IMPLICIT_DEF can be inserted to
provide a reaching definition on all paths to such reloads, allowing the
unspill to proceed. This new def may extend the rewritten vreg's live
range, so extra interference checks are performed over the extended region
to pick an appropriate physical register.
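The precondition inherited from PR #198472 can be illustrated with a small
block-level model. This is only a sketch under simplifying assumptions, not
the pass's code: Block, SlotAccess and reloadReachableWithoutStore are
hypothetical names, each block is treated as containing at most one kind of
access to the slot, and neither the choice of insertion block nor the
interference checks are modelled.

```cpp
#include <cassert>
#include <queue>
#include <vector>

enum class SlotAccess { None, Store, Reload };

// Hypothetical block-level CFG node: what the block does to the spill slot
// and the indices of its successor blocks.
struct Block {
    SlotAccess access;
    std::vector<int> succs;
};

// Returns true if some reload is reachable from `entry` along a path that
// contains no store, i.e. the reload lacks a reaching definition on that
// path. Blocks that store to the slot end the search along their paths.
inline bool reloadReachableWithoutStore(const std::vector<Block> &cfg,
                                        int entry = 0) {
    assert(entry >= 0 && entry < static_cast<int>(cfg.size()));
    std::vector<bool> seen(cfg.size(), false);
    std::queue<int> work;
    work.push(entry);
    seen[entry] = true;
    while (!work.empty()) {
        int b = work.front();
        work.pop();
        if (cfg[b].access == SlotAccess::Reload)
            return true;                // reached a reload with no store
        if (cfg[b].access == SlotAccess::Store)
            continue;                   // everything past here has a def
        for (int s : cfg[b].succs) {
            if (!seen[s]) {
                seen[s] = true;
                work.push(s);
            }
        }
    }
    return false;
}
```

In this model an inserted IMPLICIT_DEF behaves like a Store: for a diamond
where only one arm stores before the reload, marking the other arm (or the
entry block) as a def makes the function return false.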
For the joint-dominance tests, an IMPLICIT_DEF insertion block is found,
but no physical register is interference-free over the extended range,
so the unspill is conservatively skipped.
Assisted-by: Cursor/Claude Opus
Revert "[lldb][test] Increase polling in TestInterruptThreadNames.py (#201554)" (#203126)
This reverts commit fdfd1c1344187d64b63504ea8e3662ae4936503a.
The Intel mac CI bot is timing out often with these new timeouts and
we're getting failing runs. Raphael will adjust and re-land.
[MLIR][XeGPU] Extend 8-bit load_nd support in XeVM lowering (#201645)
The 2D block load for 8-bit element types has a 32x16 shape that is
supported by the OpenCL API:
```
void intel_sub_group_2d_block_read_transform_8b_32r16x1c( // reads eight uints
global void* base_address,
int width, int height, int pitch, int2 coord, private uint* destination);
```
This API performs the load with a transform (VNNI) request.
OpenCL does not provide a load API for the same vector type without a
transform request, but the returned value is identical for this special
vector type, <32x16x"8b">.
The PR adds support for this vector type with no transform request.
[MLIR][XeGPU] Enable peephole optimization for the CRI target (#201655)
Enable the XeGPU transpose peephole and array-length optimizations for
the Crescent Island (cri) target alongside pvc and bmg. Skip sub-byte (<
8-bit) element types in array-length optimizations, which are not yet
supported.
Add tests in peephole-optimize.mlir covering the cri target and the
array-length optimization rejecting sub-byte element types.
[Passes] Enhance `--print-pipeline-passes` (#202892)
Allow users to specify the output format and make the pipeline output more
palatable to FileCheck. Currently, only the `text` and `tree` formats are
supported.
Fixes #200926.
[SpecialCaseList] Add backward compatible dot-slash handling
This PR is preparation for:
* https://github.com/llvm/llvm-project/pull/167283
The new behavior is controlled by the `Version` field in the special
case list file.
- Versions 1 and 2: Paths are matched as-is, regardless of the presence of "./".
- Versions 3, 5 and higher: Paths with a leading dot-slash are canonicalized
to paths without dot-slash before matching. This means that a rule
like `src=./foo` will never match, and `src=foo` will match both
`foo` and `./foo`. (Version 3 never became default but has this behavior).
- Version 4: Transitional version. Paths are matched both ways
(canonicalized and non-canonicalized) to maintain backward compatibility.
If a match only works with the old behavior (non-canonicalized), a warning
is emitted.
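The per-version behavior listed above can be sketched as follows. This is
an illustration under stated assumptions, not the SpecialCaseList
implementation: MatchResult, stripDotSlash and matchPath are hypothetical
names, rules are compared by exact string equality instead of glob
matching, and only a single leading "./" is stripped.

```cpp
#include <cassert>
#include <string>

struct MatchResult {
    bool matched;  // does the rule apply to the path?
    bool warn;     // version 4 only: matched solely via the old behavior
};

// Canonicalize by dropping one leading "./" (an assumption of this sketch).
inline std::string stripDotSlash(const std::string &path) {
    return path.compare(0, 2, "./") == 0 ? path.substr(2) : path;
}

inline MatchResult matchPath(unsigned version, const std::string &rule,
                             const std::string &path) {
    assert(version >= 1);
    const bool asIs = (rule == path);                      // old behavior
    const bool canonical = (rule == stripDotSlash(path));  // new behavior
    if (version <= 2)
        return {asIs, false};
    if (version == 4)
        return {asIs || canonical, asIs && !canonical};
    return {canonical, false};  // versions 3, 5 and higher
}
```

With these definitions, matchPath(3, "./foo", "./foo") does not match,
matchPath(5, "foo", "./foo") matches, and matchPath(4, "./foo", "./foo")
matches with the warning flag set.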
This change allows for a gradual transition to the new behavior, while
[6 lines not shown]