[Clang] Add __scoped_atomic_uinc_wrap and __scoped_atomic_udec_wrap builtins (#168666)
This PR extends the __scoped_atomic builtins with wrapping increment and
decrement operations.
They map to LLVM IR `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`.
These enable implementation of OpenCL-style atomic_inc / atomic_dec with
wrap semantics on targets supporting scoped atomics (e.g. GPUs).
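
As a rough illustration of the wrap semantics, the sketch below models what
`atomicrmw uinc_wrap` and `atomicrmw udec_wrap` compute, using a portable
compare-exchange loop. The names `uinc_wrap_model` and `udec_wrap_model` are
hypothetical helpers made up for this example; they are not the new builtins,
and the builtins' exact signatures and scope arguments are not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference model of `atomicrmw uinc_wrap`:
// stores (old >= limit) ? 0 : old + 1 and returns the old value.
inline std::uint32_t uinc_wrap_model(std::atomic<std::uint32_t> &obj,
                                     std::uint32_t limit) {
  std::uint32_t old = obj.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old >= limit) ? 0u : old + 1u;
    // On failure, compare_exchange_weak reloads `old`, so `next` is recomputed.
  } while (!obj.compare_exchange_weak(old, next, std::memory_order_seq_cst,
                                      std::memory_order_relaxed));
  return old;
}

// Hypothetical reference model of `atomicrmw udec_wrap`:
// stores (old == 0 || old > limit) ? limit : old - 1 and returns the old value.
inline std::uint32_t udec_wrap_model(std::atomic<std::uint32_t> &obj,
                                     std::uint32_t limit) {
  std::uint32_t old = obj.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old == 0u || old > limit) ? limit : old - 1u;
  } while (!obj.compare_exchange_weak(old, next, std::memory_order_seq_cst,
                                      std::memory_order_relaxed));
  return old;
}
```

With a limit of 2, repeated increments cycle the stored value through
0, 1, 2, 0, and a decrement from 0 stores the limit.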
---------
Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
[OFFLOAD] Add support for indexed per-thread containers (#164263)
Split from #158900, this adds a PerThreadContainer that can use STL-like
indexed containers, based on a slightly refactored PerThreadTable.
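
To make the idea of an indexed per-thread container concrete, here is a
minimal sketch. It is not the offload runtime's PerThreadContainer or
PerThreadTable: the class name `PerThreadIndexed`, its methods, and the
mutex-plus-map storage are all assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration: each thread gets its own STL-like indexed
// container (e.g. std::vector), looked up by the calling thread's id.
template <typename ContainerTy> class PerThreadIndexed {
  std::mutex Mtx;
  // std::map keeps references to its elements valid across insertions.
  std::map<std::thread::id, ContainerTy> Data;

  // Return the calling thread's container, creating it on first use.
  ContainerTy &local() {
    std::lock_guard<std::mutex> Lock(Mtx);
    return Data[std::this_thread::get_id()];
  }

public:
  using value_type = typename ContainerTy::value_type;

  // Indexed access into the calling thread's container, growing on demand.
  value_type &get(std::size_t Idx) {
    ContainerTy &C = local();
    if (Idx >= C.size())
      C.resize(Idx + 1);
    return C[Idx];
  }

  // Number of elements in the calling thread's container.
  std::size_t size() { return local().size(); }
};
```

In this sketch the lock guards only the lookup of a thread's container; the
container itself is touched solely by its owning thread.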
---------
Co-authored-by: Joseph Huber <huberjn at outlook.com>
[clangd] Enable lit internal shell by default
Enable it now that all of the tests pass under the internal shell. The
internal shell is slightly faster (10-15%) and also provides a better
debugging experience.
Reviewers: petrhosek, ilovepi, kadircet, HighCommander4
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/169540
[clangd] Make lit tests work with the internal shell
This makes all of the clangd tests work with the internal shell.
Modifications needed for each test are as follows:
1. system-include-extractor.test was using variable expansion which is
not supported in the internal shell. This patch rewrites it to use
the readfile mechanism along with python. This isn't super pretty but
is readily understandable and there are only two tests across the
monorepo that use this construction, so making it prettier is hard to
justify.
2. include-cleaner-batch-fix.test was using the $'' construction to
create newlines in a string. This patch replaces it with multiple echo
commands to be consistent with the rest of the repository.
3. index-tools.test - Just add IndexBenchmark to the clangd test
depends, so the test now just works unconditionally. This should
significantly increase test coverage at little cost.
Reviewers: ilovepi, HighCommander4, petrhosek, kadircet
[3 lines not shown]
opt: Stop creating TargetMachine to infer the datalayout
The Triple directly has the datalayout string in it, so just
use that.
The logical flow here is kind of a mess. We were constructing
a temporary target machine in the asm parser to infer the datalayout,
throwing it away, and then creating another target machine for the
actual compilation. The flow of the Triple construction is still
convoluted, but we can at least drop the TargetMachine.
[libc] Modular printf option (float only)
This adds LIBC_CONF_PRINTF_MODULAR, which causes floating point support
(later, others) to be weakly linked into the implementation.
__printf_modular becomes the main entry point of the implementation, and
printf itself wraps __printf_modular. printf also contains a
BFD_RELOC_NONE relocation to bring in the float aspect.
See issue #146159 for context.
[flang][cuda] Use PTX instruction for atomicAdd with 4xf32 (#169581)
The implementation is similar to the clang one in
`clang/lib/Headers/__clang_cuda_intrinsics.h`