[AArch64] Support SHUFFLE of ANY_EXTEND in performBuildShuffleExtendCombine (#178408)
Currently performBuildShuffleExtendCombine only supports ANY_EXTEND
operands for BUILD_VECTOR inputs, and will bail if it encounters a
VECTOR_SHUFFLE with ANY_EXTEND operands. Update the logic so that we
support shuffles with ANY_EXTEND operands, which brings the code in line
with the comment.
[mlir][memref]: Fold ExpandShape into TransferRead (#176786)
Add support for folding `memref.expand_shape` ops into
`vector.transfer_read` ops when the permutation map is a
non-minor-identity.
In the case that the permutation map indexes into expanded dimensions
that would be contiguous within the original source shape then it is
safe to make this transformation.
Signed-off-by: Jack Frankland <jack.frankland at arm.com>
[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w)) (#179124)
TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.
In most cases we currently just bail if a binop has multiple results,
but shuffle combining was missing the check and it's pretty trivial to
add handling in this case.
I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.
Fixes #179112
[mlir] Fix alignment for predicate (i1) vectors (#175975)
Legal scalable predicate vectors (legal in the LLVM sense), e.g.
`vector<[16]xi1>` (or `<vscale x 16 x i1>`, using LLVM syntax) ought to
have alignment **2** rather than **16**, see e.g. [1].
MLIR currently computes the vector “size in bits” as:
```cpp
vecType.getNumElements()
* dataLayout.getTypeSize(vecType.getElementType()) * 8
```
but `getTypeSize()` returns a size in *bytes* (rounded up from bits), so
for `i1` it returns 1. Multiplying by 8 converts that storage byte back to 8
bits per element, which overestimates predicate vector sizes.
Instead, use:
[18 lines not shown]
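As a worked illustration of the overestimate described above (a minimal sketch, not the code from this patch): the helper names below are hypothetical and are not MLIR APIs, and the numbers assume the minimum vector size (vscale = 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not MLIR APIs.

// Models the computation quoted above: the per-element size is first
// rounded up to whole bytes, then multiplied back to bits.
inline uint64_t sizeInBitsViaBytes(uint64_t NumElements, uint64_t ElemBits) {
  uint64_t ElemBytes = (ElemBits + 7) / 8; // i1 rounds up to 1 byte
  return NumElements * ElemBytes * 8;
}

// Counts element bits directly, with no rounding per element.
inline uint64_t sizeInBitsViaBits(uint64_t NumElements, uint64_t ElemBits) {
  return NumElements * ElemBits;
}

// 16 x i1: the byte-based route yields 128 bits (16 bytes), while
// counting bits directly yields 16 bits (2 bytes).
inline void demoPredicateVectorSize() {
  assert(sizeInBitsViaBytes(16, 1) == 128);
  assert(sizeInBitsViaBits(16, 1) == 16);
}
```

For element types that are already a whole number of bytes (e.g. 16 x i8) both routes agree, so only sub-byte element types such as `i1` are affected.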
[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)
When attempting to lower to tbz, we can skip a zext that is then not
accounted for elsewhere. In the attached test this produces a tbz on an
extract that does not properly zero-extend the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.
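The condition can be sketched on plain integers as follows. This is a simplified model, not the GlobalISel code: `zextFrom`, `canLookThroughZext` and `testBit` are hypothetical names, and a 64-bit value stands in for a register whose bits above the source width may hold garbage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a zero-extend from SrcBits to 64 bits: everything
// above the source width is forced to zero.
inline uint64_t zextFrom(uint64_t Raw, unsigned SrcBits) {
  assert(SrcBits >= 1 && SrcBits <= 64);
  return SrcBits == 64 ? Raw : (Raw & ((uint64_t{1} << SrcBits) - 1));
}

// Looking through the extend is only safe when the tested bit lies inside
// the source value; above that, the extend guarantees a zero.
inline bool canLookThroughZext(unsigned Bit, unsigned SrcBits) {
  return Bit < SrcBits;
}

// Returns whether bit number Bit of V is set.
inline bool testBit(uint64_t V, unsigned Bit) {
  assert(Bit < 64);
  return (V >> Bit) & 1;
}
```

With a raw value of `0xFFFF` and an 8-bit source, testing bit 3 gives the same answer with or without the extend, while testing bit 12 does not: the extended value has a zero there but the raw value has a one.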
Fixes #173895
[SystemZ] Bugfix: Add VLR16 to SystemZInstrInfo::copyPhysReg(). (#178932)
Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.
This is needed with -O0/regalloc=fast, and probably in more cases as
well.
Fixes #178788.
(cherry picked from commit 09f9a2892a412a73d42942e78eed9cde61c7a9e7)
[ELFDebugObjectPlugin] Do not wait for std::future in post-fixup phase in the absence of debug info (#178541)
If there is no debug information, we never call
`DebugObject::collectTargetAlloc` in the post-allocation phase.
As a result, in the post-fixup phase,
`DebugObject::awaitTargetMem` fails with _"std::future_error: No
associated state"_ because the std::future was never populated.
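The failure mode can be reproduced with the standard library alone. The sketch below is not the plugin's code: `awaitIfPopulated` is a hypothetical helper that uses `std::future::valid()` to skip the wait when the future was never given a shared state.

```cpp
#include <cassert>
#include <future>

// Hypothetical helper: only wait on a future that actually has shared
// state. Returns false (leaving Out untouched) if it was never populated.
template <typename T>
bool awaitIfPopulated(std::future<T> &Fut, T &Out) {
  if (!Fut.valid())
    return false; // nothing to wait for
  Out = Fut.get();
  return true;
}

// A default-constructed future has no associated state, so the helper
// reports "nothing to wait for" without touching the future.
inline void demoEmptyFuture() {
  std::future<int> NeverPopulated;
  int Out = -1;
  assert(!awaitIfPopulated(NeverPopulated, Out));
  assert(Out == -1);
}
```

A future obtained from a `std::promise` whose value has been set takes the other branch and yields that value.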
(cherry picked from commit 696ea11b94d119416c9618b5add09d5ac09428aa)
[DOC][DTLTO] Update DTLTO documentation for the LLVM 22 release (#177368)
This change updates the documentation to reflect work completed during
the LLVM 22 timeframe, including support for the ThinLTO cache and
static libraries/archives.
It also clarifies that the goal of DTLTO is to support distribution of
ThinLTO backend compilations for any in-process ThinLTO invocation.
SIE Internal Tracker: TOOLCHAIN-21016
(cherry picked from commit 88478ab495f27f2cb798d4bf6912fe7cf4872997)
[DTLTO] support distributing bitcode from FatLTO objects (#176928)
We already have code to extract bitcode files from archives so they can
be distributed. Extend this code to extract bitcode from FatLTO objects
too, which otherwise cannot be used with DTLTO.
(cherry picked from commit e45ea95dbe236e233ad978067688789e7478541a)
AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop
Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop
We were folding undef inputs to qnan which is incorrect. The instruction
never returns nan. Out of bounds segment select will return 0, so fold
undef segment to 0.
AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)
Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.
The device libs f32 and s64 sin implementations have a range check,
and inside the large path this pattern appears. After a small patch
to invert this check to send nans down the small path, this will
enable the fold unconditionally on the large path.
[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)
LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper of Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
[MachineFunctionPass] Preserve more IR analyses (#178871)
Preserve PDT, BPI, LazyBPI and LazyBFI. These are all IR analyses that
are not invalidated by machine passes.
This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.
bootgrid: introduce toggle-selected command (fixes https://github.com/opnsense/core/issues/9678)
This will only render if selection && multiSelect are true, and
stickySelect is disabled.
SDL3_image: updated to 3.4.0
3.4.0
This is a major release, adding support for animated cursors, clipboard images,
SDL GPU textures, saving more image formats, and loading and saving animated
image sequences.
SDL3: updated to 3.4.0
3.4.0
In addition to lots of bug fixes and general system improvements, this release
has some major themes of improved interoperability between the 3D GPU API and
the 2D rendering API, improved Emscripten support, improved pen handling, and
native support for PNG images.
[PowerPC] Fix miscompilation when using 32-bit ucmp on 64-bit PowerPC (#178979)
I forgot that you need to clear the upper 32 bits for the carry flag to
work properly on ppc64 or else there will be garbage and possibly
incorrect results.
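A value-level sketch of the problem, assuming a 32-bit operand held in a 64-bit register whose upper half is garbage. It models only the effect of the stale upper bits, not the carry-based PowerPC sequence itself, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Three-way unsigned compare on full 64-bit register values: -1, 0 or 1.
inline int ucmp64(uint64_t A, uint64_t B) { return (A > B) - (A < B); }

// Clears the upper 32 bits, keeping only the 32-bit operand.
inline uint64_t clearUpper32(uint64_t Reg) { return Reg & 0xFFFFFFFFull; }

// 32-bit three-way compare done in 64-bit registers: both operands must
// have their upper halves cleared before the 64-bit comparison.
inline int ucmp32(uint64_t RegA, uint64_t RegB) {
  return ucmp64(clearUpper32(RegA), clearUpper32(RegB));
}

// Low halves are 1 and 2, so the 32-bit answer is -1; comparing the raw
// registers instead lets the garbage upper half decide and yields 1.
inline void demoUcmpGarbage() {
  uint64_t RegA = 0xDEADBEEF00000001ull;
  uint64_t RegB = 0x0000000000000002ull;
  assert(ucmp64(RegA, RegB) == 1);
  assert(ucmp32(RegA, RegB) == -1);
}
```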
Fixes: https://github.com/llvm/llvm-project/issues/179119
I do not have merge permissions.