libclc: Unify fast FMA controls (#188244)
This was defined in multiple places with different names. Consolidate
on one, with a gentype wrapper for it. Also set the value based on the
standard FP_FAST_FMA* macros.
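
As an illustration of keying a single control off the standard macros, here is a minimal C sketch. It is not the libclc code: `HAVE_FAST_FMA` and `mul_add` are hypothetical names, and only `FP_FAST_FMA` and `fma` come from `<math.h>`.

```c
#include <math.h>

/* Hypothetical single control, derived from the standard macro.
   FP_FAST_FMA is defined by <math.h> when fma() on double is generally
   about as fast as a separate multiply and add. */
#ifdef FP_FAST_FMA
#define HAVE_FAST_FMA 1
#else
#define HAVE_FAST_FMA 0
#endif

/* Computes a * b + c, using the fused operation only when it is fast.
   mul_add is an illustrative name, not a libclc function. */
double mul_add(double a, double b, double c) {
#if HAVE_FAST_FMA
  return fma(a, b, c);
#else
  return a * b + c;
#endif
}
```

The same pattern would apply to `FP_FAST_FMAF` and `FP_FAST_FMAL` for the `float` and `long double` variants.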
[MLIR][XeGPU][TransformOps] Remove obsolete transform ops (#187561)
Cleaning up XeGPU transform ops. Now that XeGPU layout propagation
works, it is sufficient to set the layouts for anchor ops (e.g.
load/store/dpas ops) only.
Changes:
* Remove `xegpu.get_desc_op` and `xegpu.set_desc_layout`. Users should
not change the layout of descriptor op's return value anymore.
* Add `xegpu.get_load_op(value)` that finds either `xegpu.load_nd` or
`xegpu.load` op in the value's producer chain. This is a useful utility
as load ops often need to be annotated with a layout.
* The generic `xegpu.set_op_layout_attr(op, ...)` is replaced by
`xegpu.set_anchor_layout(op, ...)`, which sets the layout attribute of
anchor ops only. It raises an error if the given op does not support
anchor layouts.
* `xegpu.insert_prefetch` takes a load op handle instead of a value.
AMDGPU: Implement memsize forms of isLoadFromStackSlot/isStoreToStackSlot (#188264)
Requested in #182673, though I'm not sure why this needs to be pushed
into targets. The size can be taken from the machine mem operand
generically.
[libc] Add utimensat syscall wrapper and entrypoint (#188347)
Implemented the utimensat syscall for Linux and added the entrypoint to
sys/stat.h.
* Added utimensat syscall wrapper to OSUtil
* Updated utimes to use the utimensat wrapper
* Added utimensat unit tests to sys/stat
* Configured entrypoints for x86_64, riscv, and aarch64
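
For readers unfamiliar with the call, here is a minimal sketch of how the POSIX `utimensat` function is used from application code. It does not show the libc-internal wrapper; `set_mtime_only` and `mtime_roundtrip_ok` are hypothetical helper names, and the `/tmp` scratch path is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* AT_FDCWD */
#include <stdlib.h>    /* mkstemp */
#include <sys/stat.h>  /* utimensat, UTIME_OMIT, stat */
#include <time.h>      /* struct timespec, time_t */
#include <unistd.h>    /* close, unlink */

/* Hypothetical helper: set only the modification time of `path`,
   leaving the access time untouched. Returns 0 on success, -1 on error. */
int set_mtime_only(const char *path, time_t seconds) {
  struct timespec times[2];
  times[0].tv_sec = 0;
  times[0].tv_nsec = UTIME_OMIT; /* times[0] is atime: leave unchanged */
  times[1].tv_sec = seconds;     /* times[1] is mtime */
  times[1].tv_nsec = 0;
  /* AT_FDCWD resolves a relative path against the current directory. */
  return utimensat(AT_FDCWD, path, times, 0);
}

/* Hypothetical self-check: creates a scratch file (assumes /tmp is
   writable), stamps it, and reads the mtime back.
   Returns 1 if the stored mtime matches, 0 otherwise. */
int mtime_roundtrip_ok(time_t seconds) {
  char path[] = "/tmp/utimensat_demo_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0)
    return 0;
  close(fd);
  struct stat st;
  int ok = set_mtime_only(path, seconds) == 0 &&
           stat(path, &st) == 0 && st.st_mtime == seconds;
  unlink(path);
  return ok;
}
```

`utimes` takes microsecond-resolution `struct timeval` values, so it can be expressed in terms of `utimensat` by converting each value to a `struct timespec`.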
[OFFLOAD][L0] Add support to run ctor/dtor code (#187510)
This PR adds support in the Level Zero plugin to execute
constructors/destructors on the device code. As spirv-link has some
limitations, it mimics the CUDA plugin behavior where the RTL constructs
the device side tables before invoking the kernel that will execute
them.
The kernel and other necessary symbols to create the device tables are
created by the SPIRVCtorDtorLowering pass to be added in #187509.
[ORC] Move DylibManager impl out of SelfExecutorProcessControl. (#188417)
SelfExecutorProcessControl no longer implements DylibManager. Instead a
private inner class, InProcessDylibManager, is used to implement this
interface. This change should not affect the behavior of
SelfExecutorProcessControl from the perspective of API clients.
This is a step towards decoupling ExecutorProcessControl implementations
from other interfaces.
AMDGPU: Simplify placeholder replacement in AMDGPUPromoteAlloca (#188202)
If `promoteAllocaUserToVector` returns the placeholder, it means the
instruction does not actually modify the alloca. We don't need to add
the placeholder as the block's available value for correctness.
Instructions that appear afterwards in the same block can still get the
placeholder as their source value through the GetCurVal() call.
Instructions in other blocks that access the alloca will be set up later
when we actually do the placeholder replacement.
This helps simplify the placeholder replacement logic.
[libc][docs][NFC] Expand entrypoints technical reference (#4) (#188255)
Expanded entrypoints.rst with details about definitions, registration
rules, and the lifecycle of an entrypoint.
Updated multiple documents to remove redundant technical details and
link to the centralized entrypoints reference:
- libc/docs/dev/cmake_build_rules.rst
- libc/docs/dev/implementation_standard.rst
- libc/docs/porting.rst
- libc/docs/dev/source_tree_layout.rst
[compiler-rt] CRT builtins tests should not run on mac/windows under LLVM_ENABLE_RUNTIMES (#187835)
#171941 got the builtins tests running under LLVM_ENABLE_RUNTIMES by
testing the builtins as part of the runtimes build.
As a consequence, CMake in `lib/builtins/` is no longer visible when
configuring the tests (but `test/builtins/` is). This means that the
`cmake_dependent_option` from `lib/builtins/` is not accounted for by
the tests, allowing COMPILER_RT_BUILD_CRT to be YES when
COMPILER_RT_HAS_CRT is NO. As a result, the CRT tests run
on platforms where COMPILER_RT_HAS_CRT is false (#176892).
https://github.com/llvm/llvm-project/blob/367da15a11c52886c50e7f020cb4de59fe6d07ca/compiler-rt/lib/builtins/CMakeLists.txt#L1106-L1108
Although the long-term solution could be to split both the builtins (and
their tests) out of compiler-rt into a top-level directory with shared
options, this works around the issue for the moment by checking both
COMPILER_RT_HAS_CRT and COMPILER_RT_BUILD_CRT before enabling the "crt"
[2 lines not shown]
[Reland] [ClangScanDeps] Do not emit warning for P1689 format (#186966) (#188401)
Close https://github.com/llvm/llvm-project/issues/185394
This is only for the P1689 format, as ClangScanDeps/optimize-vfs-pch.m
checks for the warning message. I'll leave this to people who want to
change that.
[asan] Convert __SANITIZER_DISABLE_CONTAINER_OVERFLOW__ tests to C (#188406)
As-is, the tests do not pass on Android with older C++ headers.
There is nothing C++-specific in the tests.
Followup to #181721.
[Clang] define memory scopes as a builtin enum
Clang currently represents memory scopes as pre-defined preprocessor macros that
evaluate to integers. But so far, there are three sets of conflicting scopes:
"common" clang scopes, HIP scopes and OpenCL scopes. These sets use the same
integers in different orders, making it impossible to validate their use. A
better approach is to represent these scopes as enum types, so that the integer
values become less significant. Sema can now validate the scope argument by its
type instead.
Both C and C++ define an enum for memory_order, but there is no standard enum
for memory_scope. This change introduces a Clang-specific enum "memory_scope".
The pre-defined macros are now mapped to this enum. Later changes can add
similar enums for other languages.
enum __memory_scope {
__memory_scope_system,
__memory_scope_device,
__memory_scope_workgroup,
[19 lines not shown]
[ELF] Guard relocation section handling behind copyRelocs in addOrphanSections. NFC (#188409)
In addOrphanSections, getRelocatedSection() only returns non-null for -r
or --emit-relocs links. Guard code blocks with `copyRelocs` to skip
unnecessary dyn_cast + getRelocatedSection calls per section in the
common case. Hoist copyRelocs and relocatable to local variables so the
compiler does not reload them through ctx on every loop iteration.
"Assign sections" decreases by 1ms.
[Flang][OpenMP] Remove dead restoreIP in OpenMP taskloop lowering (#187222)
This fixes an intermittent crash in `OpenMP` taskloop lowering.
In `OMPIRBuilder::createTaskloop`, the `restoreIP` in `PostOutlineCB`
was immediately overwritten by the following
`Builder.SetInsertPoint(StaleCI)` with no instructions created in
between, so it was effectively dead. This patch removes that dead
restore, which is the smallest change and preserves the intended IR
placement.
Adds a regression test that compiles a taskloop to LLVM IR and verifies
the bounds casts and __kmpc_taskloop call are present.
[flang-rt] Fix macOS build: define _DARWIN_C_SOURCE for mmap flags (#186142)
On Darwin, `sys/mman.h` hides `MAP_JIT` and `MAP_ANON(YMOUS)` when
`_POSIX_C_SOURCE` is defined unless `_DARWIN_C_SOURCE` is also defined.
`trampoline.cpp` uses those flags, so this change defines
`_DARWIN_C_SOURCE` before including `<sys/mman.h>` in this file.
Fixes build failure reported in #183108.
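
The ordering requirement can be sketched in C as follows. This is not the code from `trampoline.cpp`: `map_anon_page` and `unmap_page` are hypothetical helpers, `MAP_JIT` is left out because it exists only on Darwin, and the `_DEFAULT_SOURCE` line is an assumption added so the sketch also builds against glibc.

```c
/* Feature-test macros must be defined before the first system header. */
#define _DARWIN_C_SOURCE 1 /* Darwin: expose MAP_ANON (and MAP_JIT) */
#define _DEFAULT_SOURCE 1  /* assumption: glibc counterpart for Linux builds */

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of zero-filled anonymous memory.
   Returns NULL on failure. */
void *map_anon_page(size_t len) {
  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a mapping created by map_anon_page. */
int unmap_page(void *p, size_t len) {
  return munmap(p, len);
}
```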
Co-authored-by: Sairudra More <moresair at pe31.hpc.amslabs.hpecorp.net>