[compiler-rt][ARM] Optimized double-precision FP mul/div (#179923)
Optimized AArch32 implementations of `muldf3` and `divdf3` are provided.
The division function is particularly tricky because its Newton-Raphson
approximation strategy requires a rigorous error bound. In this version
of the commit I've left out the full supporting machinery that validates
the error bound via Gappa and Rocq, but full details are provided via
links to the upstream version of this code in the Arm Optimized Routines
repository, and to a pair of Arm Community blog posts.
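For readers unfamiliar with why the error bound matters, the core Newton-Raphson reciprocal step can be sketched as follows. This is a generic floating-point illustration, not the integer arithmetic the actual routine uses; the name `approxReciprocal`, the linear initial estimate, and the iteration count are all assumptions made for this sketch.

```cpp
#include <cassert>

// Hypothetical helper (not from the patch): approximate 1/d for d in [1, 2),
// the range of a normalized mantissa.
double approxReciprocal(double d, int iterations) {
  assert(d >= 1.0 && d < 2.0 && "sketch only handles a normalized mantissa");
  // Assumed initial estimate: the line through (1, 1) and (2, 0.5).
  // Its relative error e0 = 1 - d*x0 satisfies |e0| <= 0.125 on [1, 2).
  double x = 1.5 - 0.5 * d;
  for (int i = 0; i < iterations; ++i) {
    // Newton-Raphson step for f(x) = 1/x - d. In exact arithmetic the
    // relative error is squared: 1 - d*x' = (1 - d*x)^2.
    x = x * (2.0 - d * x);
  }
  return x;
}
```

Because each step squares the error, a known bound on the initial estimate gives a bound on the final result; an implementation that must round correctly also has to account for the rounding inside every step, which is the part that needs formal validation.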
[clang-format] Add BreakBeforeReturnType option (#197268)
In certain codebases (e.g. embedded), function declarations can
accumulate a long prefix of specifiers and attributes (`static`,
`inline`, `__attribute__((...))`, project-specific `AttributeMacros`,
etc.) before the return type, which buries the core prototype and pushes
parameters past the column limit.
This patch adds a `BreakBeforeReturnType` style option that places that
prefix on its own line(s):
```cpp
__attribute__((always_inline)) static inline
int do_thing(int a, int b, int c);
```
The recognized prefix tokens are function/storage specifiers (`static`,
`extern`, `inline`, `virtual`, `constexpr`, `consteval`, `friend`,
`export`, `_Noreturn`, `__forceinline`), C++11 attribute groups
[16 lines not shown]
[Clang][Coroutines] Don't emit fake uses for coroutine parameters (#194690)
Fixes issue: https://github.com/llvm/llvm-project/issues/192351
The combination of coroutines with -fextend-variable-liveness has
resulted in use-after-free, caused by the fact that we insert fake uses
of coroutine parameters at the end of the coroutine. While this is fine
for normal functions, in coroutines these variables are stored in the
coroutine frame, which is freed before the end of the function; this
results in us loading from the deleted frame.
This patch fixes this by no longer emitting fake uses for most coroutine
parameters. Since coroutine parameters will be saved back to the frame
when we suspend, and currently may not be optimized out, fake uses are
not needed in this case, and so by not emitting them we avoid dealing
with the complexity of updating fake uses in the CoroSplit pass. The
exception to this is 'this', which is not saved to the frame.
(cherry picked from commit efb01c1bf558eaaf8ec64e1a54110584e827f21b)
[AArch64][test] Fix use-after-scope in createInstrInfo (#197622)
https://github.com/llvm/llvm-project/pull/183506 revealed a pre-existing
use-after-scope in createInstrInfo (MSan bot:
https://lab.llvm.org/buildbot/#/builders/164/builds/21562 [*]).
This patch fixes the issue by heap-allocating the AArch64Subtarget
instead of stack-allocating it (a stack object goes out of scope once
createInstrInfo() returns), so that it can be safely stored in the
returned AArch64InstrInfo.
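The general shape of this kind of fix can be sketched as below. The types `Subtarget`, `InstrInfo`, and `InstrInfoBundle` and the function `createInstrInfoSketch` are hypothetical stand-ins, and how the actual test helper manages ownership is not shown in this message; the sketch only illustrates why heap allocation with an owner that outlives the user removes the dangling reference.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for AArch64Subtarget / AArch64InstrInfo.
struct Subtarget {
  int Features = 42;
};

struct InstrInfo {
  explicit InstrInfo(const Subtarget &S) : ST(S) {}
  int features() const { return ST.Features; }
  const Subtarget &ST; // non-owning: the Subtarget must outlive this object
};

// Buggy shape, for comparison: a local Subtarget dies when the function
// returns, so the returned InstrInfo would hold a dangling reference.
//   std::unique_ptr<InstrInfo> broken() {
//     Subtarget ST;
//     return std::make_unique<InstrInfo>(ST);
//   }

// Assumed fix shape: keep the heap-allocated Subtarget alive next to its user.
// ST is declared first, so it is destroyed after II.
struct InstrInfoBundle {
  std::unique_ptr<Subtarget> ST;
  std::unique_ptr<InstrInfo> II;
};

InstrInfoBundle createInstrInfoSketch() {
  auto ST = std::make_unique<Subtarget>();
  auto II = std::make_unique<InstrInfo>(*ST);
  assert(II->features() == ST->Features); // reference is valid here
  return {std::move(ST), std::move(II)};
}
```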
-----
[*] WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55555666fabd in
llvm::AArch64InstrInfo::getInstSizeInBytes(llvm::MachineInstr const&)
const
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:247:5
...
[19 lines not shown]
[AArch64] Keep MMO when converting gather lane to LDRSui. (#197522)
We were losing the MMO when converting the load. Make sure we copy it
over, which apparently alters codegen more than I expected and helps
keep postinc generation after #196305.
[AMDGPU] Remove RCP_IFLAG combine (#197426)
The combine was added in D48569 8 years ago with the aim of preserving
flags, but the current LangRef says the status flags are not observable
in the default FP environment.
The main motivation for this change is to enable scalar float reciprocal
generation v_s_rcp_f32 on newer hardware. There is no v_s_rcp_iflag_f32,
so the combine effectively blocks the selection.
See: pseudo-scalar-transcendental.ll.
[libc] Add LLVM_LIBC_ENABLE_EXPERIMENTAL_ENTRYPOINTS CMake flag (#197537)
Adds a new CMake option, OFF by default, to gate entrypoints with
known-incomplete implementations. This lets developers build and test
partially-implemented functions without exposing them to production
users.
The motivating case is `sysconf`, which only handles three of the
required `_SC_*` constants (`_SC_PAGESIZE`, `_SC_NPROCESSORS_CONF`,
`_SC_NPROCESSORS_ONLN`) and returns `EINVAL` for everything else.
Functions like this are useful to have in a build for testing progress,
but shouldn't be part of a default full build until the implementation
is complete.
Changes:
- `libc/CMakeLists.txt`: adds
`option(LLVM_LIBC_ENABLE_EXPERIMENTAL_ENTRYPOINTS ... OFF)`
- `libc/cmake/modules/LLVMLibCCompileOptionRules.cmake`: propagates
`-DLIBC_EXPERIMENTAL_ENTRYPOINTS` when ON
[6 lines not shown]
[OpenMP] Fix launch_bounds for OpenMP ompx_attribute (#195665)
This commit fixes the handling of `launch_bounds` within OpenMP's
`ompx_attribute`. The third attribute value, the maximum blocks, was not
parsed correctly.
[BOLT][DWARF] Support DW_FORM_ref_udata and DW_OP_regval_type (#197565)
Add support for a DWARF form and an expression opcode seen in
GCC-generated binaries:
- DW_FORM_ref_udata: ULEB128-encoded CU-relative DIE reference.
- DW_OP_regval_type (0xa5): DWARF5 expression opcode with operands
(SizeLEB, BaseTypeRef). The BaseTypeRef was not being updated when DIEs
were relocated because cloneExpression only handled (Size1, BaseTypeRef)
patterns. Generalized the first-operand copying to use raw bytes from
the data stream instead of assuming a single byte.
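Both items rely on ULEB128, a variable-length encoding, which is why the first operand cannot be assumed to occupy a single byte. A minimal decoder sketch follows; `decodeULEB128Sketch` is a hypothetical name for illustration and is not the helper BOLT or LLVM uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode one ULEB128 value starting at Offset and
// advance Offset past every byte consumed.
std::uint64_t decodeULEB128Sketch(const std::vector<std::uint8_t> &Data,
                                  std::size_t &Offset) {
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Offset < Data.size() && "truncated ULEB128");
    std::uint8_t Byte = Data[Offset++];
    // The low 7 bits carry payload, least-significant group first.
    if (Shift < 64)
      Value |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    // A clear high bit marks the final byte of the value.
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```

The number of bytes consumed (the change in `Offset`) is what a cloning routine would need in order to copy the operand verbatim before patching the type reference that follows it.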
Fixes #188250
Assisted-by: Claude Opus 4.6/4.7
[X86] Remove extra MOV after widening atomic store
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
[flang][openacc] allow duplicate data sharing clauses (#197019)
This PR allows duplicate OpenACC `private` and `firstprivate` clauses
while maintaining the restriction on `reduction` clauses.
[flang][cuda] Honor !dir$ ignore_tkr(m) under -gpu=mem:{unified,managed} (#197518)
A device-typed dummy with `!dir$ ignore_tkr(m)` is meant to be an
overload discriminator (only selected for actuals with an explicit
`device/managed/unified` attribute). Skip the host->device relaxation in
AreCompatibleCUDADataAttrs when `IgnoreTKR::Managed` is set so
unattributed host actuals no longer bind to such a dummy.
Also document the §3.2.3 matching distance table next to
GetMatchingDistance and add LIT tests for the full Table 2 grid
and the ignore_tkr(m) carve-out.
[AMDGPU] Validate forced lit() immediate (#196623)
Right now it takes the validation path of an inline constant if the
value fits, even though it is forced to literal encoding.