[IR] Remove Before argument from splitBlock APIs (NFC) (#179195)
We never need to use this conditionally (and it doesn't really make
sense, as the behavior is substantially different). Force the use of
separate APIs instead of a boolean argument.
[X86][GISEL] Enable PostLegalize Combiner (#174696)
This patch adds a post-legalize combiner for the X86 target.
Use case for the OptNone combine: this applies when we use -O{n} on the
command line but optimization is disabled on a specific function (e.g. via
`optnone`).
[CodeExtractor] Format CodeExtractor header, NFC (#178662)
This patch applies clang-format to the CodeExtractor header and updates
usage of the LLVM_ABI macro to prevent unrelated patches touching this
file from having to make these changes in order to pass pre-merge
checks.
[mlir] Extend moveValueDefinitions/moveOperationDependencies with cross-region support (#176343)
Extends `moveValueDefinitions` and `moveOperationDependencies` to
support moving operations across basic blocks and out of nested regions.
[X86][APX] Disable PUSH2/POP2/PPX generation on Windows (#178122)
The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.
This PR disables the support by default for code robustness; workloads that
choose to enable it can change the default behavior by explicitly specifying
the flags that enable this support, e.g. for experimentation or code paths
that do not need unwinder support.
[clang] Fix dependency output for #embed (#178001)
When requesting a FileEntryRef for an embedded file, make sure not to use an
absolute path. Instead, create a proper relative path when the file is looked
up relative to the current file.
Fixes https://github.com/llvm/llvm-project/issues/161950
[mlir][ROCDL] Do not hardcode partial lld path in utilities
`ROCDL::linkObjectCode` was unconditionally appending `llvm/bin/ld.lld` to the
path it was passed when looking for lld, which is undesirable for a utility
function and makes it unusable with, e.g., a system lld or one from LLVM's
own build directory. Move this logic to the caller and let the utility take a
full path.
[AArch64] Support SHUFFLE of ANY_EXTEND in performBuildShuffleExtendCombine (#178408)
Currently performBuildShuffleExtendCombine only supports ANY_EXTEND
operands for BUILD_VECTOR inputs, and will bail if it encounters a
VECTOR_SHUFFLE with ANY_EXTEND operands. Update the logic so that we
support shuffles with ANY_EXTEND operands, which brings the code in line
with the comment.
[mlir][memref] Fold ExpandShape into TransferRead (#176786)
Add support for folding `memref.expand_shape` ops into
`vector.transfer_read` ops when the permutation map is a
non-minor-identity.
If the permutation map indexes into expanded dimensions that would be
contiguous within the original source shape, it is safe to make this
transformation.
Signed-off-by: Jack Frankland <jack.frankland at arm.com>
[DAG] visitVECTOR_SHUFFLE - ensure correct resno when folding shuffle(bop(shuffle(x,y),shuffle(z,w))) (#179124)
TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.
In most cases we currently just bail if a binop has multiple results, but
shuffle combining was missing the check, and it's pretty trivial to add
handling for this case.
I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.
Fixes #179112
[mlir] Fix alignment for predicate (i1) vectors (#175975)
Legal scalable predicate vectors (legal in the LLVM sense), e.g.
`vector<[16]xi1>` (or `<vscale x 16 x i1>`, using LLVM syntax) ought to
have alignment **2** rather than **16**, see e.g. [1].
MLIR currently computes the vector “size in bits” as:
```cpp
vecType.getNumElements()
* dataLayout.getTypeSize(vecType.getElementType()) * 8
```
but `getTypeSize()` returns a size in *bytes* (rounded up from bits), so
for `i1` it returns 1. Multiplying by 8 converts that storage byte back to 8
bits per element, which overestimates predicate vector sizes.
Instead, use:
[18 lines not shown]
[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)
When attempting to lower to a tbz, we could skip a zext that was then not
accounted for elsewhere. The attached test ends up with a tbz fed by an
extract that then does not properly zero-extend the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the bit
being checked is in the low part of the value, lining up the code with the
comment.
Fixes #173895
[SystemZ] Bugfix: Add VLR16 to SystemZInstrInfo::copyPhysReg(). (#178932)
Support COPYs involving higher FP16 regs (like F24H) with a new pseudo
instruction 'VLR16'.
This is needed with -O0/regalloc=fast, and probably in more cases as
well.
Fixes #178788.
(cherry picked from commit 09f9a2892a412a73d42942e78eed9cde61c7a9e7)
[ELFDebugObjectPlugin] Do not wait for std::future in post-fixup phase in the absence of debug info (#178541)
If there is no debug information, we don't call
`DebugObject::collectTargetAlloc` in the post-allocation phase.
Consequently, in the post-fixup phase, `DebugObject::awaitTargetMem` fails
with _"std::future_error: No associated state"_ because the std::future was
never populated.
(cherry picked from commit 696ea11b94d119416c9618b5add09d5ac09428aa)
[DOC][DTLTO] Update DTLTO documentation for the LLVM 22 release (#177368)
This change updates the documentation to reflect work completed during
the LLVM 22 timeframe, including support for the ThinLTO cache and
static libraries/archives.
It also clarifies that the goal of DTLTO is to support distribution of
ThinLTO backend compilations for any in-process ThinLTO invocation.
SIE Internal Tracker: TOOLCHAIN-21016
(cherry picked from commit 88478ab495f27f2cb798d4bf6912fe7cf4872997)
[DTLTO] support distributing bitcode from FatLTO objects (#176928)
We already have code to extract bitcode files from archives so they can
be distributed. Extend this code to extract bitcode from FatLTO objects
too, which otherwise cannot be used with DTLTO.
(cherry picked from commit e45ea95dbe236e233ad978067688789e7478541a)
AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop
Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop
We were folding undef inputs to qnan, which is incorrect; the instruction
never returns nan. An out-of-bounds segment select returns 0, so fold an
undef segment to 0.
AMDGPU: Use SimplifyQuery in AMDGPUCodeGenPrepare (#179133)
Enables assumes in more contexts. Of particular interest is the
nan check for the fract pattern.
The device libs f32 and s64 sin implementations have a range check,
and inside the large path this pattern appears. After a small patch
to invert this check to send nans down the small path, this will
enable the fold unconditionally on the large path.
[LoopCacheAnalysis] Remove tryDelinearizeFixedSize (NFCI) (#177552)
LoopCacheAnalysis has its own function `tryDelinearizeFixedSize`, which
is a wrapper of Delinearization. Due to recent changes in
Delinearization, this function has become almost equivalent to
`delinearizeFixedSizeArray` and is no longer necessary. This patch
removes it.
[MachineFunctionPass] Preserve more IR analyses (#178871)
Preserve PDT, BPI, LazyBPI, and LazyBFI. These are all IR analyses that are
not invalidated by machine passes.
This partially mitigates the compile-time regression from
https://github.com/llvm/llvm-project/pull/174746.