[LLVM][LICM] Skip unrelated accesses when looking for hoist/sink conflicting instructions. (#195132)
Essentially uses ModRef analysis in place of getClobberingMemoryAccess()
because the former has more accurate information about how in-loop
accesses relate to the hoist/sink target.
[AMDGPU] Increment VA_VDST twice for each VOP3PX2 instruction (#196353)
In expert scheduling mode, change the VA_VDST counts to match the
hardware implementation. The inserted waits were conservatively correct
before. This just makes them more precise in some cases.
[libc++] Introduce implicit and explicit ABI annotations (#193045)
This patch introduces `_LIBCPP_{BEGIN/END}_EXPLICIT_ABI_ANNOTATIONS` and
marks everything within an
`_LIBCPP_{BEGIN,END}_UNVERSIONED_NAMESPACE_STD` (and any derivatives
like `_LIBCPP_{BEGIN,END}_NAMESPACE_STD`) implicitly by default. This
allows us to drop `_LIBCPP_HIDE_FROM_ABI` in most of the code base,
except for functions which shouldn't be `_LIBCPP_HIDE_FROM_ABI`.
This patch doesn't remove any `_LIBCPP_HIDE_FROM_ABI`s, since we have
over 13k of them in the code base. Actually dropping them will happen
over some time to avoid too many merge conflicts.
Revert "[lldb] Real-time console pane for output in lldb tui" (#196507)
Reverts llvm/llvm-project#177160
The new test is timing out on the AArch64 Linux buildbot
(https://lab.llvm.org/buildbot/#/builders/59/builds/34166) and on my own
machine.
I suspect something to do with the requested terminal size. If what we
get is smaller than requested, it could time out waiting for expected
program output.
[llvm][OpenMP][SPIRV] Fix assertion for GPU reductions (#194879)
Currently, compiling a `target reduction` results in the following
assertion failure for the spirv64-intel target:
> Assertion `New->getType() == getType() && "replaceUses of value with
new value of different type!"' failed.
This patch fixes it by adding an address space cast where necessary so
that the types of the expressions match.
Assisted-by: claude-sonnet-4-5
clang: Add BoundArch argument to addClangTargetOptions
addClangTargetOptions already has an OffloadKind argument,
but it kind of doesn't make sense for any function to know the
OffloadKind, but not the associated BoundArch.
The current process is kind of convoluted. TranslateArgs
synthesizes a -mcpu argument from BoundArch, and later
addClangTargetOptions re-parses that -mcpu argument each
time it wants the architecture. Add this argument so this
can be cleaned up in a future change.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[AArch64] Use EXT for byte shuffles with leading zeros (#193466)
Fixes: https://github.com/llvm/llvm-project/issues/191735
Teach AArch64 LowerVECTOR_SHUFFLE to recognize byte shuffles that are a
zero-fill right shift and lower them to EXT with a zero vector. Also adds
a regression test.
Change-Id: Iffe97ff7e35cfaff790f537b4f1f5ba9aded4f92
[lldb][test] Move DAP processes to own group to avoid random SIGHUPs (#195816)
On macOS, LLDB's test suite randomly receives SIGHUP signals that stop
the test suite early. The source of these SIGHUPs seems to be a bug in
the kernel (most likely job control).
The exact steps for reproducing this are not clear, but I have a set of
three tests of which two need to run concurrently for this to trigger:
* TestDAP_runInTerminal
* TestDAP_launch_io_integratedTerminal
* TestDAP_launch_stdio_redirection_and_console
I was also running UBSan on this build which may or may not be necessary
to make this random failure more persistent.
When these tests run, macOS job control will send SIGHUP to the process
group of the spawned subprocesses in that test. As LIT is in the same
process group, it also receives the SIGHUP and shuts down.
[8 lines not shown]
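A minimal sketch of the general technique, assuming a POSIX host: start the child in its own session and process group so that a signal delivered to the child's group cannot reach the parent. The helper name `spawn_in_own_group` is hypothetical, and this is not necessarily how the patch itself does it.

```python
import os
import subprocess
import sys


def spawn_in_own_group(argv):
    """Start argv as a child that leads its own session and process group.

    Hypothetical helper for illustration; POSIX only.
    """
    # start_new_session=True makes the child call setsid() before exec.
    # The child becomes the leader of a new process group, so a SIGHUP sent
    # to that group is not delivered to the parent (for example the test
    # runner), which stays in its original group.
    return subprocess.Popen(argv, start_new_session=True)


if __name__ == "__main__":
    child = spawn_in_own_group(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # The child's process group differs from the parent's.
        print("parent pgid:", os.getpgrp())
        print("child pgid: ", os.getpgid(child.pid))
    finally:
        child.kill()
        child.wait()
```

On Python 3.11 and later, `subprocess.Popen(..., process_group=0)` is an alternative that creates a new process group without creating a new session.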
[llvm][tools] Extend llvm-objdump to support nested OffloadBinaries (#185425)
Extends llvm-objdump to print the information of images contained in
nested OffloadBinaries. For example, for a binary compiled with #185413
it shows
```
$llvm-objdump --offloading ./a.out
./a.out: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind elf
arch
triple spirv64-intel
producer openmp
image size 43104 bytes
[Nested OffloadBinary format detected]
Number of inner images: 1
kind spir-v
[13 lines not shown]
[X86][GlobalISel] Support globals in pic mode (#170038)
Introduce G_WRAPPER_RIP, which is the same node as in the DAG. It is
required to make legalization possible when a load from a stub is needed
to obtain a pointer to a global value. It also avoids manual selection in
X86InstructionSelector.
Also added a missing check on X86SelectAddress failure.
[DAG] canCreateUndefOrPoison - poison generating flags / out of range shift amounts only generate poison (#196489)
Matches ValueTracking / GISel implementations - although testing options are limited until DAG has actual uses of UndefPoisonKind::UndefOnly
[lldb-dap] Fix build when using precompiled header and Xcode generator. (#196366)
When building with precompiled headers and Xcode as the generator, it adds
`obj.lldbDAP.dir/${BUILD_TYPE}/cmake_pch.xxx` but does not generate the
file, causing the build to fail.
This might have to do with `add_llvm_library` adding a source file
`Dummy.c` to any object it creates when Xcode is the generator, and with
the `lldbDAP` object not declaring its LINK_LIBS and LINK_COMPONENTS.
[libclc] Use spirv[64]-mesa-mesa3d triple in README.md (#196483)
Now that #194607 has landed, we use a normalized triple in the README for
the SPIRV targets. Previously, `spirv-mesa3d-` and `spirv64-mesa3d-` were
used, and those are normalized to `spirv-unknown-mesa3d` and
`spirv64-unknown-mesa3d` by the following command in
`runtimes/CMakeLists.txt`:
```console
$ clang --target=spirv-mesa3d- -print-target-triple
spirv-unknown-mesa3d
```
This is because in `llvm/lib/TargetParser/Triple.cpp` the term `mesa3d`
is recognized as an OS and placed in third position. The install path
for `libclc.spv` then ends up in `spirv-unknown-mesa3d/libclc.spv`.
With this change we suggest using triples that "survive" the
normalization:
[7 lines not shown]
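The slot-filling behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the algorithm in `Triple.cpp`: the function `normalize_triple` and the three lookup sets are hypothetical, and treating `mesa` as a known vendor is an assumption made for the example.

```python
# Toy model of triple normalization: each recognized component is moved to
# its canonical slot (arch-vendor-os) and empty slots become "unknown".
# The sets below are assumptions for illustration, not LLVM's real tables.
KNOWN_ARCHS = {"spirv", "spirv64"}
KNOWN_VENDORS = {"mesa"}
KNOWN_OSES = {"mesa3d"}


def normalize_triple(triple):
    """Return a three-component arch-vendor-os string for a loose triple."""
    arch = vendor = os_name = "unknown"
    for part in triple.split("-"):
        if part in KNOWN_ARCHS:
            arch = part
        elif part in KNOWN_VENDORS:
            vendor = part
        elif part in KNOWN_OSES:
            os_name = part
        # Empty or unrecognized components are ignored in this toy model.
    return "-".join([arch, vendor, os_name])


if __name__ == "__main__":
    # "mesa3d" is classified as an OS, so it moves to the third slot.
    print(normalize_triple("spirv-mesa3d-"))
    # A triple that already has every slot filled is returned unchanged.
    print(normalize_triple("spirv64-mesa-mesa3d"))
```

In this model a triple "survives" normalization when every component is already in its canonical slot, so the output equals the input.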
[Dexter][NFC] Add split step-data collection methods for DAP (#196350)
This patch adds some extra state collection methods to DebuggerBase and
implements them for DAP only. These methods are used to fetch a
stacktrace without variable information, and to populate variable
information into a StepIR containing only a stacktrace. These methods
are currently unused, making this patch NFC, but this is a necessary
precursor to the new script model, where we examine the stacktrace to
determine what variable info we will collect.
As part of the stacktrace-collection function, we also fetch the
instruction address for each stack frame, if it is made available by the
debugger; to enable this, this patch adds a new value with default
`None` to `FrameIR`.
clang: Add BoundArch/OffloadKind argument to getSupportedSanitizers
Currently the AMDGPU HIP and OpenMP toolchains falsely report
all host sanitizers are supported, and then go out of their way
to skip forwarding those to the device compiles. Add an offloading
kind argument so that in the future this can be handled in one
place in the base toolchain.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[AArch64][SME] Elide private ZA setup when possible (#196090)
In private ZA functions without any instructions that require "active"
ZA we can omit all ZA setup (and saves/restores). This is equivalent to
removing the `__arm_new("za/zt0")` attribute when ZA state is unused.
[AArch64] Reflect cost of integer sub-reductions. (#194594)
The cost of sub-reductions is either the cost of *mlslb + *mlslt, or the
cost of a dot operation with 2 negations:
```
partial_reduce_umls acc, lhs, rhs
<=> -partial_reduce_umla -acc, lhs, rhs
```
(codegen for this was added by #186809)
The cost-model was previously a bit of a hack, since sub-reductions were
expanded and therefore expensive, although we made the expansion cost
artificially cheaper so that it would still be a candidate for cdot
instructions.
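The equivalence quoted above can be checked with a scalar model. This sketch assumes a single accumulator lane and unbounded Python integers; the real operations work on vector lanes of fixed width, where the same identity holds modulo the lane width. The function names mirror the pseudocode in the message and are not LLVM APIs.

```python
def partial_reduce_umla(acc, lhs, rhs):
    """Scalar model of a multiply-add reduction: acc + sum(lhs[i] * rhs[i])."""
    return acc + sum(a * b for a, b in zip(lhs, rhs))


def partial_reduce_umls(acc, lhs, rhs):
    """Scalar model of a multiply-subtract reduction: acc - sum(lhs[i] * rhs[i])."""
    return acc - sum(a * b for a, b in zip(lhs, rhs))


def umls_via_umla(acc, lhs, rhs):
    """Express the subtract form as a multiply-add with two negations."""
    # -( (-acc) + sum ) == acc - sum
    return -partial_reduce_umla(-acc, lhs, rhs)


if __name__ == "__main__":
    acc, lhs, rhs = 10, [1, 2, 3], [4, 5, 6]
    direct = partial_reduce_umls(acc, lhs, rhs)
    rewritten = umls_via_umla(acc, lhs, rhs)
    print(direct, rewritten, direct == rewritten)
```

The two negations are the extra work that the message attributes to the dot-operation lowering.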