[AArch64][clang] Improve -mcpu= and -mtune= error messages too
Similar to my previous change improving the error message for
`-march=` in #197441, this changes the diagnostics for invalid
`-mcpu=` and `-mtune=` arguments to report only the invalid feature
flag, rather than the entire string.
This is a much clearer error message for the user.
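The idea of naming only the offending flag can be sketched as follows (illustrative Python, not the actual clang driver code; the extension set and function name are hypothetical):

```python
# Hypothetical sketch: split a -mcpu= value such as "cortex-a57+crypto+bogus"
# into its CPU name and feature extensions, and return only the bad ones,
# so a diagnostic can name "bogus" instead of the whole argument string.
def invalid_extensions(mcpu_value, known_extensions):
    cpu, *extensions = mcpu_value.split("+")
    return [ext for ext in extensions if ext not in known_extensions]

bad = invalid_extensions("cortex-a57+crypto+bogus", {"crypto", "sve"})
print(bad)  # only the invalid feature flag is reported
```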
[X86] Improve lowering of i32/i64 minmax reductions (#197578)
Allow 32-bit targets to correctly lower i64 ISD::VECREDUCE min/max nodes
via ReplaceNodeResults - this is necessary once we're finally ready for
#194473 and remove combineMinMaxReduction entirely.
Improve handling of v2iXX reduction stages by consistently preferring
binop(extract(),extract()) scalarisation on SSE targets (if the vector
binop isn't legal).
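The final v2 stage described above amounts to extracting both lanes and applying the scalar binop; a minimal sketch (illustrative Python, not the actual X86 lowering):

```python
# Illustrative reduction: wide stages use a shuffle + vector binop, while
# the final two-element stage is scalarised as binop(extract(v,0), extract(v,1)).
def reduce_binop(vec, binop):
    while len(vec) > 2:
        half = len(vec) // 2
        # shuffle the upper half down and apply the vector binop lane-wise
        vec = [binop(a, b) for a, b in zip(vec[:half], vec[half:])]
    # final v2 stage: two extracts feeding one scalar binop
    return binop(vec[0], vec[1])

print(reduce_binop([7, 2, 9, 4], min))  # → 2
```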
[compiler-rt][ARM] Optimized single-precision FP comparisons (#179925)
These comparison functions follow the same structure as the
double-precision ones in a prior commit: a header file containing the
main logic, and several entry points varying the construction of the
return value.
In this case, we have provided versions for Thumb1 as well as
Arm/Thumb2.
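The split between one comparison core and several return-value conventions can be sketched as follows (illustrative Python; the entry-point names and NaN handling are a simplification, not the real compiler-rt code):

```python
import math

# One shared "core" computes a three-way result; thin entry points then
# translate it into their own return conventions, mirroring the
# header-plus-entry-points structure described above.
def compare_core(a, b):
    if math.isnan(a) or math.isnan(b):
        return "unordered"
    return -1 if a < b else (1 if a > b else 0)

def lt_entry(a, b):   # boolean "less than" flavour; unordered is false
    return compare_core(a, b) == -1

def eq_entry(a, b):   # boolean "equal" flavour; unordered compares unequal
    return compare_core(a, b) == 0
```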
[clangd] Fix parens suppression in mid-identifier code-completion (#197249)
When completing in the middle of an existing identifier (e.g.
`fo^o<int>(42)`), the next-token check lexes the character immediately
after the cursor, which prevents parens suppression from kicking in.
After the fix, we advance to the end of the current identifier first and
only then start lexing for the next token, so redundant parens are
handled even when the cursor is mid-identifier.
This also fixes the parens suppression in the replace mode which by
design is used mid-identifier.
Fixes https://github.com/clangd/clangd/issues/387
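The fix can be sketched as follows (illustrative Python; the `suppress_parens` helper is hypothetical and stands in for clangd's real next-token check):

```python
# Hypothetical sketch: decide whether to suppress inserted parens by looking
# at the token *after* the identifier under the cursor, rather than the
# character immediately after the cursor.
def suppress_parens(text, cursor):
    # Advance past the remainder of the identifier the cursor sits in.
    end = cursor
    while end < len(text) and (text[end].isalnum() or text[end] == "_"):
        end += 1
    # Skip whitespace, then suppress if a paren/template list already follows.
    while end < len(text) and text[end].isspace():
        end += 1
    return end < len(text) and text[end] in "(<"

# Cursor mid-identifier in "foo<int>(42)": the '<' after "foo" is found.
print(suppress_parens("foo<int>(42)", 2))  # → True
```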
[AMDGPU][GCNPreRAOptimizations] Reduce BVH premature reuse (#197386)
Add implicit uses to ds_bvh_stack instructions to avoid reuse of VGPRs
allocated to bvh_intersect_ray results prior to ds_bvh_stack. This
reduces the likelihood of a premature s_wait_bvhcnt occurring due to
partial reallocation of unused bvh_intersect_ray result registers.
[compiler-rt][ARM] Optimized double-precision FP comparisons (#179924)
The structure of these comparison functions consists of a header file
containing the main code, and several `.S` files that include that
header with different macro definitions, so that they can use the same
procedure to determine the logical comparison result and then just
translate it into a return value in different ways.
[LV] Fix the cost model for freeze instructions (#197188)
While working on a PR to add a cost model for VPDerivedIV recipes I
noticed that a loop in or_reduction_with_freeze
(test/Transforms/LoopVectorize/AArch64/reduction-cost.ll)
stopped vectorising because the cost model decided it was no longer
worth it. However, the main cause of this was the incredibly high cost
(14) of freeze for VF=2. We were using the cost of a vector mul
instruction as a proxy for the freeze cost, which is incredibly bad for
an AArch64 target without SVE since the operation needs scalarising.
As far as I understand, the freeze instruction does not lead to any
actual code being generated and acts merely as a barrier to potentially
unsafe optimisations. As such, I've updated the cost model to return 0
instead.
[compiler-rt][ARM] Optimized double-precision FP mul/div (#179923)
Optimized AArch32 implementations of `muldf3` and `divdf3` are provided.
The division function is particularly tricky because its Newton-Raphson
approximation strategy requires a rigorous error bound. In this version
of the commit I've left out the full supporting machinery that validates
the error bound via Gappa and Rocq, but full details are provided via
links to the upstream version of this code in the Arm Optimized Routines
repository, and to a pair of Arm Community blog posts.
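The Newton-Raphson strategy mentioned above refines a reciprocal estimate so that division becomes a multiply; a minimal numeric sketch (Python, with an arbitrary starting estimate rather than the rigorously bounded one the real routine depends on):

```python
# Newton-Raphson iteration for 1/d: x' = x * (2 - d * x).
# Each step roughly squares the relative error (quadratic convergence),
# which is why a rigorous bound on the initial estimate matters so much.
def nr_reciprocal(d, x0, steps):
    x = x0
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x

# Crude initial estimate 0.5 for d in [1, 2); a / d is then a * (1/d).
recip = nr_reciprocal(1.5, 0.5, steps=5)
print(abs(recip - 1.0 / 1.5) < 1e-12)  # → True
```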
[clang-format] Add BreakBeforeReturnType option (#197268)
In certain codebases (e.g. embedded), function declarations can
accumulate a long prefix of specifiers and attributes (`static`,
`inline`, `__attribute__((...))`, project-specific `AttributeMacros`,
etc.) before the return type, which buries the core prototype and pushes
parameters past the column limit.
This patch adds a `BreakBeforeReturnType` style option that places that
prefix on its own line(s):
```cpp
__attribute__((always_inline)) static inline
int do_thing(int a, int b, int c);
```
The recognized prefix tokens are function/storage specifiers (`static`,
`extern`, `inline`, `virtual`, `constexpr`, `consteval`, `friend`,
`export`, `_Noreturn`, `__forceinline`), C++11 attribute groups
[16 lines not shown]