[VPlan] Move getCanonicalIV to VPRegionBlock (NFC). (#163020)
The canonical IV is tied to region blocks; move getCanonicalIV there and
update all users.
PR: https://github.com/llvm/llvm-project/pull/163020
[LLDB][NativePDB] Consolidate simple types (#163209)
This aligns the simple types created by the native plugin with the ones
from DIA as well as LLVM and the original cvdump.
- A few type names weren't handled when creating the LLDB `Type` name
(e.g. `short`)
- 64-bit integers were created as `(u)int64_t` and are now created as
`(unsigned) long long` (matches DIA)
- 128-bit integers (only supported by clang-cl) weren't created as types
(they have `SimpleTypeKind::(U)Int128Oct`)
- All complex types had the same name - now they have `_Complex
<float-type>`
Some types like `SimpleTypeKind::Float48` can't be tested because they
can't be created in C++.
[Analysis][AArch64][NFC] Change undef to poison in most tests (#163532)
Whenever someone modifies an existing test that contains `undef`, the
GitHub code formatter complains, and it's not easy to tell whether the
complaint is due to a new or an old use. I figured I may as well do a
simple sed replacement of undef with poison in all the tests to clean
them up. Hopefully it makes the contribution process a bit easier.
[X86] Add baseline test for X86 conditional load/store optimization bug (#163354)
This PR adds a baseline test that exposes a bug in the current
`combineX86CloadCstore` optimization. The generated assembly
demonstrates incorrect behavior when the optimization is applied without
proper constraints.
Without any assumptions about `X`, this transformation is only valid when
`Y` is a non-zero power of two (i.e. a single-bit mask).
```cpp
// res, flags2 = sub 0, (and (xor X, -1), Y)
// cload/cstore ..., cond_ne, flag2
// ->
// res, flags2 = sub 0, (and X, Y)
// cload/cstore ..., cond_e, flag2
```
In the provided test case, the value in `%al` is unknown at compile
time. If `%al` contains `0`, the optimization cannot be applied, because
[2 lines not shown]
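The validity condition stated above can be checked exhaustively on 8-bit values. The following is only a sketch, not code from the patch: it models just the branch condition derived from the flags (the `res` value is ignored), and all function names are hypothetical.

```cpp
#include <cstdint>

// Before the rewrite: cond_ne on the flags of `sub 0, (and (xor X, -1), Y)`,
// i.e. the conditional op fires when (~X & Y) is non-zero.
inline bool beforeTaken(std::uint8_t X, std::uint8_t Y) {
  return static_cast<std::uint8_t>(~X & Y) != 0;
}

// After the rewrite: cond_e on the flags of `sub 0, (and X, Y)`,
// i.e. the conditional op fires when (X & Y) is zero.
inline bool afterTaken(std::uint8_t X, std::uint8_t Y) {
  return static_cast<std::uint8_t>(X & Y) == 0;
}

// The rewrite is sound for a given Y only if both forms agree for every X.
inline bool rewriteIsSoundFor(std::uint8_t Y) {
  for (unsigned X = 0; X < 256; ++X)
    if (beforeTaken(static_cast<std::uint8_t>(X), Y) !=
        afterTaken(static_cast<std::uint8_t>(X), Y))
      return false;
  return true;
}

// Non-zero power of two, i.e. exactly one bit set.
inline bool isSingleBitMask(std::uint8_t Y) {
  return Y != 0 && (Y & (Y - 1)) == 0;
}

// Checks that, over all 8-bit Y, soundness coincides with Y being a
// single-bit mask.
inline bool claimHoldsForAllMasks() {
  for (unsigned Y = 0; Y < 256; ++Y)
    if (rewriteIsSoundFor(static_cast<std::uint8_t>(Y)) !=
        isSingleBitMask(static_cast<std::uint8_t>(Y)))
      return false;
  return true;
}
```

For example, `Y = 0` breaks the rewrite because the original condition is never taken while the rewritten one always is, and a two-bit mask such as `Y = 6` breaks it for `X = 2`.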
[lldb] Only get child if m_storage and m_element_type is valid (#163077)
Getting a child when either is invalid causes a crash, because lldb-dap
checks the first child to see whether the value is array-like so that it
can lazy-load the children.
[lldb-dap][test] create temp source file in test directory. (#163383)
Fixes #163288
---------
Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>
[mlir][bufferization] Test tensor encoding -> memref layout conversion (#161166)
Support custom types (4/N): test that it is possible to customize memref
layout specification for custom operations and function boundaries.
This is purely a test setup (no API modifications) to ensure users are
able to pass information from tensors to memrefs within bufferization
process. To achieve this, a test pass is required (since bufferization
options have to be set manually). As there is already a
--test-one-shot-module-bufferize pass present, it is extended for the
purpose.
[mlir][linalg] Update vectorization of linalg.pack
This patch changes `vectorizeAsTensorPackOp` to require users to specify
all write-side vector sizes for `linalg.pack` (not just the outer
dimensions). This makes `linalg.pack` vectorization consistent with
`linalg.unpack` (see #149293 for a similar change).
Conceptually, `linalg.pack` consists of these high-level steps:
* **Read** from the source tensor using `vector.transfer_read`.
* **Re-associate** dimensions of the read value, as specified by
the op (via `vector.shape_cast`)
* **Transpose** the re-associated value according to the permutation
in the `linalg.pack` op (via `vector.transpose`).
* **Write** the result into the destination tensor via
`vector.transfer_write`.
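To make the four steps above concrete, here is a plain C++ sketch that packs a row-major 2-D matrix into tiles by re-associating and then transposing. It does not use any MLIR API. The function name is hypothetical, and it assumes the simplest case: no padding, no outer-dimension permutation, and inner tiles applied to dimensions 0 and 1 in that order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs a row-major [rows][cols] matrix into [rows/t0][cols/t1][t0][t1].
std::vector<int> packTilesSketch(const std::vector<int> &src, std::size_t rows,
                                 std::size_t cols, std::size_t t0,
                                 std::size_t t1) {
  assert(rows % t0 == 0 && cols % t1 == 0 && src.size() == rows * cols);
  const std::size_t o0 = rows / t0; // outer size along dim 0
  const std::size_t o1 = cols / t1; // outer size along dim 1

  // Step 1 (read): `src` is the flat row-major value that was read.
  // Step 2 (re-associate): view it as [o0][t0][o1][t1]. The linear order
  // is unchanged, so this step moves no data (like a shape cast).
  // Step 3 (transpose): permute [o0][t0][o1][t1] -> [o0][o1][t0][t1].
  std::vector<int> dst(src.size());
  for (std::size_t i = 0; i < o0; ++i)
    for (std::size_t ii = 0; ii < t0; ++ii)
      for (std::size_t j = 0; j < o1; ++j)
        for (std::size_t jj = 0; jj < t1; ++jj) {
          const std::size_t from = ((i * t0 + ii) * o1 + j) * t1 + jj;
          const std::size_t to = ((i * o1 + j) * t0 + ii) * t1 + jj;
          dst[to] = src[from];
        }
  // Step 4 (write): the transposed value is the packed result.
  return dst;
}
```

In this sketch the shape after the transpose, `[o0][o1][t0][t1]`, is the full write-side shape: both the outer and the inner (tile) dimensions.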
Previously, the vector sizes provided by the user were interpreted as
write-vector-sizes for PackOp _outer_ dims (i.e. the final step above).
These were used to:
[27 lines not shown]
[AArch64][SME] Propagate desired ZA states in the MachineSMEABIPass
This patch adds a propagation step to the MachineSMEABIPass that
propagates desired ZA states forwards/backwards (from predecessors to
successors, or vice versa).
The aim of this is to pick better ZA states for edge bundles, as when
many (or all) blocks in a bundle do not have a preferred ZA state, the
ZA state assigned to a bundle can be less than ideal.
An important case is nested loops, where only the inner loop has a
preferred ZA state. Here we'd like to propagate the ZA state up from the
inner loop to the outer loops (to avoid saves/restores in any loop).
Change-Id: I39f9c7d7608e2fa070be2fb88351b4d1d0079041
[AArch64][SME] Fixup ABI routine insertion points to avoid clobbering NZCV (#161353)
This updates the `MachineSMEABIPass` to find insertion points for state
changes (i.e., calls to ABI routines) where the NZCV register (status
flags) is not live.
It works by stepping backwards from where the state change is needed
until we find an instruction where NZCV is not live, a previous state
change, or a call sequence. We conservatively don't move into/over
calls, as they may require a different state before the start of the
call sequence.
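The backwards search described above might look roughly like the following. This is a simplified model, not the pass's actual code: `InstSketch` and `findStateChangeInsertPoint` are hypothetical names, liveness is assumed to be precomputed per instruction, and falling back to the original position when no flag-free point is found is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one instruction in a block.
struct InstSketch {
  bool nzcvLiveBefore; // status flags are live just before this instruction
  bool isStateChange;  // this instruction is an earlier state change
  bool inCallSequence; // this instruction belongs to a call sequence
};

// Returns the index to insert *before*. Steps backwards from `needed`
// until NZCV is not live, and stops without moving over a previous state
// change or a call sequence.
std::size_t findStateChangeInsertPoint(const std::vector<InstSketch> &insts,
                                       std::size_t needed) {
  assert(needed < insts.size());
  for (std::size_t pos = needed;; --pos) {
    if (!insts[pos].nzcvLiveBefore)
      return pos; // flags are dead here, so a call cannot clobber them
    if (pos == 0)
      break; // reached the start of the block
    const InstSketch &prev = insts[pos - 1];
    if (prev.isStateChange || prev.inCallSequence)
      break; // conservatively do not move into or over these
  }
  return needed; // assumption: keep the original point if nothing better
}
```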
[Clang] Fix a regression introduced by #161163. (#162612)
Classes with a user-provided constructor are still implicit-lifetime if
they have an implicit, trivial copy constructor.
[MLIR][shard] Fix tblgen description of `shard.neighbors_linear_indices` (#163409)
This PR fixed an issue where inline code blocks in the ODS description of `shard.neighbors_linear_indices` were not properly closed.
[LLD] [COFF] Fix aarch64 delayimport of sret arguments (#163096)
For sret arguments on aarch64, the x8 register is used as an input
parameter to functions, even though x8 normally isn't an input parameter
register.
When delayloading a DLL, the first call of a delayloaded function ends
up calling a helper which resolves the function. Therefore, any input
arguments to the actual function to be called need to be backed up and
restored - this also includes x8.
This matches how MS link.exe also changed its delayloading trampoline,
between MSVC 2019 16.7 and 16.8 (between link.exe 14.27.29110.0 and
14.28.29333.0).
This fixes running LLDB on aarch64 mingw, after
ec28b95b7491bc2fbb6ec66cdbfd939e71255c42 and
93d326038959fd87fb666a8bf97d774d0abb3591. Those commits make LLDB load
liblldb.dll with delayloading, and the first function to be called,
[4 lines not shown]
[LLD][COFF] Fix tailMergeARM64 delayload thunk 128 MB range limitation (#161844)
lld would fail with "error: relocation out of range" if the thunk was
laid out more than 128 MB away from __delayLoadHelper2.
This patch changes the call sequence to load the offset into a register
and call through that, allowing for 32-bit offsets.
Fixes #161812
(cherry picked from commit 69b8d6d4ead01b88fb8d6642914ca7492e32fdb6)
[AArch64] Add intrinsics for multi-vector FEAT_SVE_BFSCALE instructions (#163346)
This patch adds intrinsic support for the multi-vector BFMUL and BFSCALE
instructions, based on
[this](https://github.com/ARM-software/acle/pull/410) ACLE specification
proposal.
[X86][ByteCode] Allow PSHUFB intrinsics to be used in constexpr #156612 (#163148)
The PSHUFB instruction shuffles bytes within each 128-bit lane: for each
control byte, if bit 7 is set, the output byte is zeroed; otherwise, the
low 4 bits select a source byte (0–15) from the same lane.
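The byte-selection rule described above can be modelled in scalar C++ as follows. This is an illustrative sketch only, not the constant evaluator added by the patch. `pshufbModel` is a hypothetical name, and it covers vectors made of whole 128-bit lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a per-lane byte shuffle over N bytes (N = 16, 32, 64).
template <std::size_t N>
std::array<std::uint8_t, N>
pshufbModel(const std::array<std::uint8_t, N> &src,
            const std::array<std::uint8_t, N> &ctrl) {
  static_assert(N % 16 == 0, "model assumes whole 128-bit lanes");
  std::array<std::uint8_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t laneBase = (i / 16) * 16; // first byte of this lane
    const std::uint8_t c = ctrl[i];
    // Bit 7 set: zero the output byte. Otherwise the low 4 bits pick a
    // source byte from the same lane.
    out[i] = (c & 0x80) ? std::uint8_t{0} : src[laneBase + (c & 0x0F)];
  }
  return out;
}
```

Bits 4 to 6 of a control byte are ignored in this model, so a control byte of `0x13` selects byte 3 of its lane.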
Note: the _mm_shuffle_pi8 function had to change because __anyext128 used
negative indices, which are invalid in a constant-expression context.
Fixes #156612
[NFC] [clang] Add comments for a defect
See the patch for details.
I tried to fix the defect left by previous refactorings,
but found it was more complex than expected. This adds a
comment that describes the defect more clearly.
[AArch64PostCoalescer] Propagate undef flag after replacing (#163119)
I encountered a compilation crash that, after analysis, turned out to be
caused by the AArch64PostCoalescerPass; see https://godbolt.org/z/vPeqeo5Pa.
When replacing the register, if the source register has the undef flag, we
should propagate the flag to all uses of the destination register.
[OpenMP] Fix preprocessor mismatches between include and usages of hwloc (#158349)
Fix https://github.com/llvm/llvm-project/issues/156679
There is a mismatch between the preprocessor guards around the include
of `hwloc.h` and those protecting its usages, leading to build failures
on Darwin: https://github.com/spack/spack-packages/pull/1212
This change introduces `KMP_HWLOC_ENABLED` that reflects
whether hwloc is actually used.
[LLDB, FreeBSD, x86] Fix empty register set when trying to get size of register (#162890)
The register set information is stored as a singleton in
GetRegisterInfo_i386. However, other functions later access this
information assuming it is stored in GetSharedRegisterInfoVector. To
resolve this inconsistency, we remove the original construction logic
and instead initialize the singleton using llvm::call_once within the
appropriate function (GetSharedRegisterInfoVector_i386).
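The pattern of initializing shared data exactly once inside its accessor can be sketched as below. This is not the LLDB code: it uses `std::call_once` as a stand-in for `llvm::call_once`, and the type, function name, and register entries are made up for illustration.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a register description.
struct RegisterInfoSketch {
  std::string name;
  unsigned byteSize;
};

// The accessor owns both the storage and its one-time initialization, so
// every caller sees the same populated vector, whichever caller runs first.
const std::vector<RegisterInfoSketch> &getSharedRegisterInfoVectorSketch() {
  static std::once_flag initFlag;
  static std::vector<RegisterInfoSketch> registers;
  std::call_once(initFlag, [] {
    // Illustrative entries only; the real table is far larger.
    registers = {{"eax", 4}, {"ebx", 4}};
  });
  return registers;
}
```

Keeping the initialization next to the storage avoids the situation the commit describes, where one function fills a singleton and other functions read a different, still-empty container.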
[X86] Add support for Wildcat Lake (#163214)
Add support for Wildcat Lake, per Intel Architecture Instruction Set
Extensions Programming Reference rev. 59
(https://cdrdv2.intel.com/v1/dl/getContent/671368)