[AArch64][clang] Improve -march= error message with many feature flags (#197441)
When calling `clang` with a large number of feature flags, the entire
argument is printed as an error message if one of the feature flags is
invalid.
For example, before this change, when providing a large number of features
to `-march=` with one of them invalid, an error message such as this is
printed:
```
clang: error: unsupported argument 'armv9.6a+sme2+sme2p1+sve2+sve2p1+profile
+crypto+aes+sha2+sha3+sm4+memtag+ssbs+bf16+i8mm+dotprod+ls64+rcpc3+brbe+gcs
+faminmax+fp8+fp8fma+fp8dot4+fp8dot2+sme-f8f32+the+lut+lsui+pops+occmo
+rme-gpc3+d128+invalidfeature'
```
and a user doesn't know which of the `+feature` flags is actually invalid.
After this change, the following error message is printed:
```
[2 lines not shown]
[LV] Avoid crashing for vector calls with scalar byte types (#197417)
If a parameter to a vector function variant is uniform or linear, check
whether the type is SCEVable first. Byte types aren't, so they would
cause an assert. We could improve this later if needed.
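A minimal sketch of the guard, as a hypothetical helper around ScalarEvolution rather than the actual LoopVectorize code:
```
// Hypothetical helper, not the actual LoopVectorize change: only query
// ScalarEvolution for SCEVable parameter types; byte types are not SCEVable,
// and calling getSCEV() on them asserts.
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Value.h"

static const llvm::SCEV *getParamSCEVOrNull(llvm::Value *Param,
                                            llvm::ScalarEvolution &SE) {
  if (!SE.isSCEVable(Param->getType()))
    return nullptr; // e.g. byte types: skip instead of asserting
  return SE.getSCEV(Param);
}
```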
[LLVM][Constants] Remove the option to disable vector ConstantFP support. (#197427)
Removes the command line options:
-use-constant-fp-for-fixed-length-splat
-use-constant-fp-for-scalable-splat
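For illustration (my assumption from the commit title, not code from the patch), vector floating-point splats of both flavours now go through the ConstantFP path unconditionally:
```
// Illustrative sketch (assumption from the commit title, not patch code):
// fixed-length and scalable FP splats are always represented directly as
// vector-typed ConstantFP constants.
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

llvm::Constant *makeFixedSplat(llvm::LLVMContext &Ctx) {
  auto *Ty = llvm::FixedVectorType::get(llvm::Type::getFloatTy(Ctx), 4);
  return llvm::ConstantFP::get(Ty, 1.0); // <4 x float> splat of 1.0
}

llvm::Constant *makeScalableSplat(llvm::LLVMContext &Ctx) {
  auto *Ty = llvm::ScalableVectorType::get(llvm::Type::getFloatTy(Ctx), 4);
  return llvm::ConstantFP::get(Ty, 1.0); // <vscale x 4 x float> splat of 1.0
}
```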
[BOLT][AArch64] Account for hugify alignment in AArch64 long jump layout (#195272)
When --hugify is used for a PIE, the final section allocation in
RewriteInstance::mapCodeSections aligns the address after the last
non-cold text section before laying out the following sections:
```
for (BinarySection *Section : CodeSections) {
  Address = alignTo(Address, Section->getAlignment());
  Section->setOutputAddress(Address);
  Address += Section->getOutputSize();
  if (opts::Hugify && !BC->HasFixedLoadAddress &&
      Section->getName() == LastNonColdSectionName)
    Address = alignTo(Address, Section->getAlignment());
}
```
The AArch64 long-jump pass doesn't model that gap in its tentative
layout, so a CBZ could be considered in range during stub insertion and
later become out of range when JITLink applies the final layout.
[5 lines not shown]
AMDGPU/GlobalISel: Legalize scalar extloads with large memory type
Add narrowScalar for scalar sext/zextload when the memory type is
larger than 32 bits. There is no narrow scalar implementation for
NarrowSize < MemSize (a split load), but we don't want that anyway.
Narrowing to MemSize creates a large normal load plus an extension to
the destination.
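A rough sketch of that shape with made-up helper names (not the committed AMDGPULegalizerInfo code):
```
// Rough sketch with made-up names, not the committed legalizer code: an
// extending load whose memory type is wider than 32 bits becomes a plain
// load of the memory type followed by an extension to the original dst.
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/Register.h"

static void narrowExtLoadToMemSize(llvm::MachineIRBuilder &B,
                                   llvm::Register Dst, llvm::Register Addr,
                                   llvm::MachineMemOperand &MMO,
                                   bool IsSigned) {
  llvm::LLT MemTy = MMO.getMemoryType();     // e.g. s64 memory type
  auto Load = B.buildLoad(MemTy, Addr, MMO); // normal load of MemSize
  if (IsSigned)
    B.buildSExt(Dst, Load);                  // extend the result to Dst
  else
    B.buildZExt(Dst, Load);
}
```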
[AArch64][clang] Improve -mcpu= and -mtune= error messages too
Similar to my previous change improving the error message for
`-march=` in #197441, this changes `-mcpu=` and `-mtune=` arguments
to only report the invalid feature flag, rather than the entire
string.
This is a much clearer error message for the user.
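A rough sketch of the idea shared with #197441, using a hypothetical lookup callback in place of the real AArch64 feature parsing (not the actual Driver code):
```
// Rough sketch, not the actual Driver code: split the -march=/-mcpu= value
// on '+' and report only the feature that fails the (hypothetical) lookup,
// instead of echoing the whole argument back to the user.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"

static llvm::StringRef
findInvalidFeature(llvm::StringRef ArgValue,
                   llvm::function_ref<bool(llvm::StringRef)> IsKnownFeature) {
  llvm::SmallVector<llvm::StringRef, 16> Parts;
  ArgValue.split(Parts, '+');
  // Parts[0] is the CPU or architecture name; the rest are +feature flags.
  for (llvm::StringRef Feature : llvm::drop_begin(Parts))
    if (!IsKnownFeature(Feature))
      return Feature; // e.g. "invalidfeature"
  return llvm::StringRef();
}
```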
[X86] Improve lowering of i32/i64 minmax reductions (#197578)
Allow 32-bit targets to correctly lower i64 ISD::VECREDUCE min/max nodes
via ReplaceNodeResults - this is necessary once we're finally ready for
#194473 and remove combineMinMaxReduction entirely.
Improve handling of v2iXX reduction stages by consistently preferring
binop(extract(),extract()) scalarisation on SSE targets (if the vector
binop isn't legal).
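A sketch of that scalarised stage as I read it (illustrative only, not the committed X86ISelLowering code):
```
// Illustrative sketch, not the committed X86 code: for a 2-element vector,
// extract both lanes and apply the scalar min/max binop instead of a shuffle
// plus vector binop when the latter isn't legal.
#include "llvm/CodeGen/SelectionDAG.h"

static llvm::SDValue reduceV2ByExtracts(llvm::SelectionDAG &DAG,
                                        const llvm::SDLoc &DL,
                                        llvm::SDValue Vec,
                                        unsigned MinMaxOpc /*e.g. ISD::SMAX*/) {
  llvm::EVT EltVT = Vec.getValueType().getVectorElementType();
  llvm::SDValue Lo = DAG.getNode(llvm::ISD::EXTRACT_VECTOR_ELT, DL, EltVT, Vec,
                                 DAG.getVectorIdxConstant(0, DL));
  llvm::SDValue Hi = DAG.getNode(llvm::ISD::EXTRACT_VECTOR_ELT, DL, EltVT, Vec,
                                 DAG.getVectorIdxConstant(1, DL));
  return DAG.getNode(MinMaxOpc, DL, EltVT, Lo, Hi); // binop(extract, extract)
}
```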
[compiler-rt][ARM] Optimized single-precision FP comparisons (#179925)
These comparison functions follow the same structure as the
double-precision ones in a prior commit: a header file containing the
main logic, and several entry points that vary the construction of the
return value.
In this case, we have provided versions for Thumb1 as well as
Arm/Thumb2.
[clangd] Fix parens suppression in mid-identifier code-completion (#197249)
When completing in the middle of an existing identifier (e.g.
`fo^o<int>(42)`), the next-token check lexes the character immediately
after the cursor, which prevents parens suppression from kicking in.
After the fix, we go to the end of the current identifier first and only
then start lexing for the next token, which handles redundant parens
even when the cursor is mid-identifier.
This also fixes parens suppression in replace mode, which by design is
used mid-identifier.
Fixes https://github.com/clangd/clangd/issues/387
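A simplified sketch of the new ordering, as a hypothetical helper rather than the actual clangd code:
```
// Hypothetical helper, not the actual clangd code: find the end of the token
// under the cursor first, and only then lex forward for the next token, so
// parens suppression also works when completing mid-identifier.
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Lexer.h"
#include <optional>

static std::optional<clang::Token>
nextTokenAfterIdentifier(clang::SourceLocation TokenBegin,
                         const clang::SourceManager &SM,
                         const clang::LangOptions &LangOpts) {
  // End of the partially typed identifier (the 'foo' in fo^o<int>(42)).
  clang::SourceLocation End =
      clang::Lexer::getLocForEndOfToken(TokenBegin, 0, SM, LangOpts);
  // Lex the token that follows the identifier, not the character at the cursor.
  return clang::Lexer::findNextToken(End, SM, LangOpts);
}
```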
[AMDGPU][GCNPreRAOptimizations] Reduce BVH premature reuse (#197386)
Add implicit uses to ds_bvh_stack instructions to avoid reuse of VGPRs
allocated to bvh_intersect_ray results prior to ds_bvh_stack. This
reduces the likelihood of a premature s_wait_bvhcnt occurring due to
partial reallocation of unused bvh_intersect_ray result registers.
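A minimal sketch of the mechanism with a hypothetical helper (not the committed pass code):
```
// Hypothetical helper, not the committed pass code: mark the ds_bvh_stack
// instruction as an implicit reader of a bvh_intersect_ray result register,
// so the allocator keeps that VGPR live and does not reuse it early.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

static void addImplicitResultUse(llvm::MachineInstr &BvhStackMI,
                                 llvm::Register RayResultReg) {
  BvhStackMI.addOperand(llvm::MachineOperand::CreateReg(RayResultReg,
                                                        /*isDef=*/false,
                                                        /*isImp=*/true));
}
```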
[compiler-rt][ARM] Optimized double-precision FP comparisons (#179924)
The structure of these comparison functions consists of a header file
containing the main code, and several `.S` files that include that
header with different macro definitions, so that they can use the same
procedure to determine the logical comparison result and then just
translate it into a return value in different ways.
[LV] Fix the cost model for freeze instructions (#197188)
While working on a PR to add a cost model for VPDerivedIV recipes I
noticed that a loop in or_reduction_with_freeze:
test/Transforms/LoopVectorize/AArch64/reduction-cost.ll
stopped vectorising because the cost model decided it was no longer
worth it. However, the main cause of this was the incredibly high cost
(14) of freeze for VF=2. We were using the cost of a vector mul
instruction as a proxy for the freeze cost, which is incredibly bad for
an AArch64 target without SVE since the operation needs scalarising.
As far as I understand, the freeze instruction does not lead to any
actual code being generated and acts merely as a barrier to potentially
unsafe optimisations. As such, I've updated the cost model to return 0
instead.
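A minimal sketch of the new behaviour; where exactly it lands in the cost model is my assumption, not the literal patch:
```
// Minimal sketch, not the literal patch: cost a vectorised freeze as free
// instead of using a vector mul as a proxy, since freeze emits no code and
// only acts as a barrier to unsafe optimisations.
#include "llvm/IR/Instruction.h"
#include "llvm/Support/InstructionCost.h"
#include <cassert>

static llvm::InstructionCost getFreezeVectorCost(const llvm::Instruction &I) {
  assert(I.getOpcode() == llvm::Instruction::Freeze && "expected freeze");
  return 0; // was: cost of a vector mul (14 for VF=2 on AArch64 without SVE)
}
```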