compiler-rt: Stop using APPLE cmake variable
Use a variable derived from the target triple instead.
This is a partial fix for cross-compiling the GPU runtimes
on macOS. Previously, on Mac hosts, the build system would go
down completely wrong paths. This improves the situation: the
remaining failures now occur at compilation, which pulls in host
flags that shouldn't be forwarded.
Despite the CMake documentation claiming the APPLE constant
is "Set to True when the target system is an Apple platform",
it appears to be true whenever the host is Apple. Not sure if this
is worth reporting as a CMake bug, or if it's an artifact of some
runtimes build specifics. Change to using a new variable computed
from the target triple. Presumably the same bug exists for the various
WIN32 and ANDROID checks; there are just fewer of them.
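To illustrate the idea of deriving the platform check from the target triple rather than from a host-influenced constant, here is a minimal C++ sketch. The actual change lives in CMake; the helper names `splitTriple` and `targetIsApple` are hypothetical and exist only for this illustration, which assumes the conventional `arch-vendor-os` triple layout.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split a target triple such as "arm64-apple-macosx14.0" on '-'.
std::vector<std::string> splitTriple(std::string_view triple) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t dash = triple.find('-', start);
    if (dash == std::string_view::npos) {
      parts.emplace_back(triple.substr(start));
      break;
    }
    parts.emplace_back(triple.substr(start, dash - start));
    start = dash + 1;
  }
  return parts;
}

// Hypothetical helper: true when the vendor component of the *target*
// triple is "apple", regardless of which host runs the build.
bool targetIsApple(std::string_view triple) {
  std::vector<std::string> parts = splitTriple(triple);
  return parts.size() >= 2 && parts[1] == "apple";
}
```

With this sketch, `targetIsApple("amdgcn-amd-amdhsa")` is false even when the build runs on a Mac, which is the behavior the commit is after.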
Forcefully require new attributes to be documented (#203296)
Several years ago we began to require all new attributes be documented,
but we never had anything enforcing the requirement. Although
reviewers request this documentation, it has been missed often enough
that enforcement makes sense in order to reduce maintenance burden.
This adds a new tablegen option to spit out the list of undocumented
attributes, and a test which lists all of the existing undocumented
ones. If a new attribute is added, this test should catch the failure.
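The enforcement idea can be sketched as a set comparison: anything currently undocumented that is not on the recorded list of existing undocumented attributes must be new. This is only an illustration in C++ under that assumption; the function name `newlyUndocumented` is hypothetical, and the real check is a tablegen option plus a test, whose mechanics are not shown here.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the attributes that are undocumented now but are absent from the
// recorded list of known-undocumented ones, i.e. attributes that were newly
// added without documentation. std::set keeps both inputs sorted, which
// std::set_difference requires.
std::vector<std::string>
newlyUndocumented(const std::set<std::string> &currentUndocumented,
                  const std::set<std::string> &knownUndocumented) {
  std::vector<std::string> result;
  std::set_difference(currentUndocumented.begin(), currentUndocumented.end(),
                      knownUndocumented.begin(), knownUndocumented.end(),
                      std::back_inserter(result));
  return result;
}
```

A non-empty result would correspond to the test failing because a new attribute lacks documentation.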
[RISCV][CHERIoT] Define ABI_CHERIOT. (#192929)
These correspond to the CHERIoT ABI, documented here:
https://github.com/CHERIoT-Platform/cheriot-sail/releases/download/v1.0/cheriot-architecture-v1.0.pdf
In particular, CHERIoT is an RV32E-based architecture extended with
CHERI support that is not binary compatible with the proposed RV Y base.
Amongst other changes, it has customized calling conventions, such as
passing f64 in capability registers.
[libc][math] Adding LIBC_MATH_ASSUME_ROUND_NEAREST_ONLY option (#201154)
This PR adds a new option, `LIBC_MATH_ASSUME_ROUND_NEAREST_ONLY`,
to LLVM libm.
There are some instances of UB that I kept as-is from the original code to
keep the changes non-disruptive (I've marked them with TODO comments).
Benchmarks (from `files.zip` in the comment
https://github.com/llvm/llvm-project/issues/198276#issue-4468816457):
- System libm:
```
overflow (>710) 1.55 ns/call (644M ops/sec)
underflow to 0 (<-746) 1.34 ns/call (747M ops/sec)
normal [-10,10] 4.41 ns/call (227M ops/sec)
denormals [-740,-735] 2.25 ns/call (444M ops/sec)
near-uflow [-700,-690] 2.25 ns/call (444M ops/sec)
```
- LLVM libm (without the option being set):
[20 lines not shown]
Refactor FreeTrie to be intrusive proxy, add pop_min and tests
- Make FreeTrie a proxy holding root by reference.
- Move root and range storage outside FreeTrie (into FreeStore).
- Add find_min and pop_min implementations.
- Update tests and add pop_min test.
TAG=agy
CONV=7d0c366e-7fef-4a10-adb5-c96b98f5f2e2
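The proxy arrangement described above can be sketched as follows. This is not the actual FreeTrie: for brevity it swaps in a plain unbalanced binary search tree keyed by block size, and the names `Node`, `FreeTrieSketch`, and `FreeStoreSketch` are hypothetical. It only shows a proxy that holds the root by reference while the storage lives in an outer object, together with `find_min` and `pop_min`.

```cpp
#include <cstddef>

// Intrusive node: in an allocator this would live inside the free block.
struct Node {
  std::size_t size;
  Node *lower = nullptr;
  Node *upper = nullptr;
  explicit Node(std::size_t s) : size(s) {}
};

// Proxy: owns nothing, only refers to a root pointer stored elsewhere.
class FreeTrieSketch {
public:
  explicit FreeTrieSketch(Node *&root) : root_(root) {}

  // Link a node into the tree, ordered by size.
  void push(Node *node) {
    Node **cur = &root_;
    while (*cur)
      cur = node->size < (*cur)->size ? &(*cur)->lower : &(*cur)->upper;
    node->lower = node->upper = nullptr;
    *cur = node;
  }

  // Smallest node, or nullptr when empty.
  Node *find_min() const {
    Node *cur = root_;
    while (cur && cur->lower)
      cur = cur->lower;
    return cur;
  }

  // Unlink and return the smallest node, or nullptr when empty.
  Node *pop_min() {
    Node **cur = &root_;
    if (!*cur)
      return nullptr;
    while ((*cur)->lower)
      cur = &(*cur)->lower;
    Node *min = *cur;
    *cur = min->upper; // the minimum has no lower child; splice in its upper
    min->upper = nullptr;
    return min;
  }

private:
  Node *&root_;
};

// Stand-in for the owner of the root storage.
struct FreeStoreSketch {
  Node *root = nullptr;
  FreeTrieSketch trie() { return FreeTrieSketch(root); }
};
```

Because the proxy holds only a reference, each call to `trie()` yields a cheap temporary view, and all state changes land in the store's `root`.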
[OFFLOAD][L0] Use counter-based events for inorder queues (#202301)
Inorder queues can use counter-based events which have better
performance and provide early-reused semantics.
Assisted by Claude.
Revert "[Flang][OpenMP] remove enable-delayed-privatization-staging to suppor…" (#203348)
Reverts llvm/llvm-project#200952
The test added in llvm/llvm-project#200952,
`/offload/test/offloading/fortran/target-firstprivate.f90`, fails on x86
offload. Will fix the test and open a new PR with the fixed changes.
[SLP] Cost struct-returning intrinsic calls with a vector library mapping
getVectorCallCosts queried the vector intrinsic cost with a type-based-only
IntrinsicCostAttributes. That path always scalarizes struct-returning
intrinsics (e.g. llvm.sincos), which have no VFDatabase name mapping.
Retry with an argument-aware query when the type-based cost looks scalarized.
Fixes #200644
Reviewers: hiraditya, bababuck, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/201389
[SLP] Recompute deps of copyable-modeled operands used directly
An instruction modeled as a copyable element elsewhere can also be used
directly by a later-built node sharing an instruction with the copyable
nodes. The direct use was not counted, so the scheduler over-decremented
the operand and tripped the unscheduled-deps assertion. Defer
recomputation whenever the operand is modeled as a copyable element
anywhere.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/203342
[CIR] Implement bind temporary lvalue (#202755)
This change implements the handling to emit a CXXBindTemporaryExpr
l-value. This is a very direct port from the classic codegen
implementation, leveraging existing functions in CIR.
[CIR] Force emission of static local enclosing functions (#201941)
When getOrCreateStaticVarDecl is called, we need to call
`getAddressOfGlobal` to trigger the emission of the enclosing function.
In most cases this has already happened, but there are cases where the
enclosing function would not otherwise have been emitted. See
https://bugs.llvm.org/show_bug.cgi?id=18020 for details.
It appears that this was mistakenly seen as OpenMP-specific behavior
because of an OpenMP RAII guard that surrounds it in classic codegen,
but that actually is there to skip the behavior when generating OpenMP
device code.
We also needed to insert the static local decl into CIRGenModule's map
by calling `setStaticLocalDeclAddress` to avoid a duplicate emission.
Assisted-by: Cursor / claude-opus-4.8
[OFFLOAD][L0] Add control for Copy Offload Hint (#203203)
In some cases setting ZE_COMMAND_QUEUE_FLAG_COPY_OFFLOAD_HINT reduces
performance. Here we introduce
LIBOFFLOAD_LEVEL_ZERO_USE_COPY_OFFLOAD_HINT env var to allow users to
control the hint (which continues to be on by default).
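A default-on environment control of this kind might be read roughly as below. This is a sketch, not the plugin's code: the helper names `parseBoolSetting` and `useCopyOffloadHint` are hypothetical, and the accepted spellings (`1`/`true`/`on`, `0`/`false`/`off`) are an assumption, since the commit message does not say which values the variable accepts.

```cpp
#include <cstdlib>
#include <string>

// Parse a boolean-like setting. Falls back to the default when the value
// is unset or not one of the (assumed) recognized spellings.
bool parseBoolSetting(const char *value, bool defaultValue) {
  if (!value)
    return defaultValue;
  std::string v(value);
  if (v == "1" || v == "true" || v == "on")
    return true;
  if (v == "0" || v == "false" || v == "off")
    return false;
  return defaultValue;
}

// The hint stays on unless the user explicitly turns it off.
bool useCopyOffloadHint() {
  return parseBoolSetting(
      std::getenv("LIBOFFLOAD_LEVEL_ZERO_USE_COPY_OFFLOAD_HINT"),
      /*defaultValue=*/true);
}
```

Keeping the parsing separate from the `std::getenv` call makes the default-on behavior easy to check without touching the process environment.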