[AArch64] Fuse froundeven+convert into single instruction (#177800)
Stacked on https://github.com/llvm/llvm-project/pull/177799.
We're already able to fuse `fceil`, `ffloor`, `ftrunc`, and `fround`
followed by a float-to-int conversion into a single "rounded conversion"
instruction. However, we were not doing this for `froundeven`, even
though there's a "convert to integer, rounding to even" instruction
(`FCVTNS`/`FCVTNU`).
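As a rough illustration of the semantics being fused, the sketch below
contrasts "round to nearest, ties to even, then convert" with "round to
nearest, ties away from zero, then convert". The helper names are
hypothetical, and the first one relies on the default floating-point
rounding mode (FE_TONEAREST). It only shows the arithmetic behaviour; it
makes no claim about what any compiler emits for this exact source.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers; they only illustrate the rounding semantics.

// Round to nearest with ties to even, then convert to an integer
// (the froundeven + convert shape). std::nearbyint honours the current
// rounding mode, assumed here to be the default FE_TONEAREST.
inline std::int64_t roundEvenToInt(double x) {
  return static_cast<std::int64_t>(std::nearbyint(x));
}

// Round to nearest with ties away from zero, then convert
// (the fround + convert shape).
inline std::int64_t roundAwayToInt(double x) {
  return static_cast<std::int64_t>(std::round(x));
}
```

The two helpers only differ on exact halfway cases such as 2.5, which is
why each rounding flavour needs its own conversion instruction.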
[clang-format] Add ObjCSpaceBeforeMethodDeclColon option to control space before Objective-C method return type (#170579)
This patch introduces the ObjCSpaceBeforeMethodDeclColon style option,
allowing users to add or remove a space between the '-'/'+' and the
return type in Objective-C method declarations (e.g., '- (void)method'
vs '-(void)method').
Includes documentation and unit tests.
[libc] Modular printf option (float only)
This adds LIBC_CONF_PRINTF_MODULAR, which causes floating point support
(later, others) to be weakly linked into the implementation.
__printf_modular becomes the main entry point of the implementation, and
printf itself wraps __printf_modular. printf also contains a
BFD_RELOC_NONE relocation to bring in the float aspect.
See issue #146159 for context.
[Polly][DeLICM] Check for error state (#178281)
When the ISL max-operations limit is exceeded, `is_wrapping` returns an
error state. Propagate that error state to the caller.
Fixes #175953
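A minimal sketch of the propagation idea, using `std::optional<bool>` as
a stand-in for a three-state result (true, false, error). All names here
are hypothetical and do not reflect the ISL or Polly API.

```cpp
#include <optional>

// Hypothetical three-state result: std::nullopt models the error state
// (for example, the operation budget was exceeded).
using TriBool = std::optional<bool>;

// Hypothetical check that can fail; `quotaExceeded` simulates hitting
// the operation limit.
inline TriBool isWrapping(bool wraps, bool quotaExceeded) {
  if (quotaExceeded)
    return std::nullopt;
  return wraps;
}

// The caller forwards the error instead of collapsing it to true/false.
inline TriBool canTransform(bool wraps, bool quotaExceeded) {
  TriBool w = isWrapping(wraps, quotaExceeded);
  if (!w.has_value())
    return std::nullopt; // propagate the error state
  return !*w;
}
```

The point of the sketch is that the error case stays distinguishable all
the way up, so the caller can bail out rather than act on a bogus answer.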
[AArch64] Align nontemporal store/load little-endian checks (#177468)
This patch aligns all nontemporal store/load handling so that it
consistently requires a little-endian target. Little-endian is the only
configuration LLVM has effectively supported for NT store/load lowering
(big-endian was never effectively supported, despite the inconsistencies).
The change in `llvm/lib/Target/AArch64/AArch64InstrInfo.td` is
effectively NFC, because the only lowering of LDNP, in
`llvm/lib/Target/AArch64/AArch64ISelLowering.cpp`, already checks
for `isLittleEndian`. The change in
`llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h` affects its
single caller,
`llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp`. The
previous logic was wrong: it enabled vectorization of effectively
illegal nontemporal store/load instructions on big-endian.
[AMDGPU] Fix crash in SIWholeQuadMode with debug instructions.
The prepareInsertion function was crashing when debug instructions
appeared at positions being queried for slot indices. Debug instructions
don't have entries in the slot index map, so getInstructionIndex would
fail with an assertion.
Fixes SWDEV-480902.
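The commit message does not spell out the fix, so the following is only
a sketch of one plausible approach: skip past debug instructions before
looking up a slot index. The `Instr` type and both functions are
hypothetical simplifications, not the MachineInstr or SlotIndexes API.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for an instruction.
struct Instr {
  int id;
  bool isDebug;
};

// Only non-debug instructions receive a slot index.
inline std::map<int, unsigned>
buildSlotIndexes(const std::vector<Instr> &block) {
  std::map<int, unsigned> slots;
  unsigned next = 0;
  for (const Instr &i : block)
    if (!i.isDebug)
      slots[i.id] = next++;
  return slots;
}

// Skip forward over debug instructions before the lookup, so the map is
// never queried for an instruction that has no entry.
inline std::optional<unsigned>
slotIndexAtOrAfter(const std::vector<Instr> &block,
                   const std::map<int, unsigned> &slots, std::size_t pos) {
  while (pos < block.size() && block[pos].isDebug)
    ++pos;
  if (pos == block.size())
    return std::nullopt; // only debug instructions remained
  return slots.at(block[pos].id);
}
```

Returning an empty optional when only debug instructions remain keeps
the "no valid position" case explicit instead of asserting.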
[AArch64] Add missing GlobalISel patterns to round+convert multiclass (#177799)
This allows GlobalISel to fuse floating point round+convert operations
in the same way as SelectionDAG.
[InstCombine][profcheck] More fixes for missing branch data in InstCombineCompares.cpp (#178084)
Again, these fixes are trivial as we're creating new select instructions
with predicates from existing select instructions.
In this case, we create one select instruction from two existing select
instructions, but since both existing select instructions have the same
predicate, their profile data should be the same, so we can reuse the
profile data from either instruction. Therefore, we arbitrarily reuse
the profile data from the first select instruction.
Tracking issue: #147390
Use yet another allocator for LiveRanges
Not sure it's worth it for these; there should never be all that
many. We could pre-allocate the maximum size up front.
Use SpecificBumpPtrAllocator for LiveInterval
I didn't realize we used a singly linked list for storing subranges.
That seems bad and we should probably switch this to an array.
[RISCV] Replace VPatBinaryV_VX_VROTATE with VPatBinaryV_VX. NFC (#178254)
VPatBinaryV_VX_VROTATE appeared to be an almost exact copy and paste of
VPatBinaryV_VX except it used 'XLenVT' instead of 'vti.Scalar'.
'vti.Scalar' is 'XLenVT' for integer vectors so this wasn't a real
difference.
This change allows VV_VX or VV_VX_VI combination classes to be used,
further reducing the code.
No tablegen outputs change with this patch.
[RISCV] Set the reciprocal throughput cost for division to TTI::TCC_Expensive (#177516)
Fixes #176208. Scaled back version of #176515 that only affects the RISCV backend.
Only modifies the cost for cases when DIV is a legal operation.
Updates the cost for both Scalar and Vector types.
Used `TTI::TCC_Expensive` as suggested by
https://github.com/llvm/llvm-project/issues/176208#issuecomment-3760902537.
---------
Co-authored-by: Luke Lau <luke_lau at icloud.com>
[CIR] Implement 'noreturn' attribute for functions/calls. (#177978)
This mirrors what LLVM does, and requires propagating into the LLVM
dialect: When the user specifies 'noreturn' we propagate this down
throughout the stack.
Note that the similar 'willreturn' is too strong a guarantee (the two
are not opposites of each other, as an 'unknown' state is implied for
all other functions), so we cannot use it on non-noreturn functions.
[libc++] Rewrite the std::lower_bound benchmark to be more efficient and add an upper_bound benchmark (#177180)
The current benchmark is incredibly slow to run. This patch refactors
the benchmark to be faster and also adds an equivalent benchmark for
`std::upper_bound`.
Fixes #177026
[flang][acc] Use ReducibleType interface on LogicalType (#178253)
Introduce a new ReducibleType type interface in the OpenACC dialect that
provides a type-aware mechanism for translating OpenACC reduction
operators to arith::AtomicRMWKind values. This interface should be
attached to value types that can participate in OpenACC reductions.
For FIR, implement this interface on fir::LogicalType to handle the
AccLand and AccLor reduction operators, which map to
arith::AtomicRMWKind::andi and ori respectively.
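The mapping described above can be sketched as a small per-type hook.
The enums below are hypothetical mirrors of the names in the text (the
real ones live in the OpenACC and arith dialects and have more members),
and `logicalTypeAtomicKind` is an illustrative name, not the interface
method.

```cpp
#include <optional>

// Hypothetical mirrors of the enums named in the text.
enum class ReductionOp { AccAdd, AccLand, AccLor };
enum class AtomicRMWKind { andi, ori };

// Sketch of what a logical type's hook might return: only the logical
// reductions have a mapping, everything else reports "no mapping".
inline std::optional<AtomicRMWKind> logicalTypeAtomicKind(ReductionOp op) {
  switch (op) {
  case ReductionOp::AccLand:
    return AtomicRMWKind::andi;
  case ReductionOp::AccLor:
    return AtomicRMWKind::ori;
  default:
    return std::nullopt;
  }
}
```

Putting the mapping behind a per-type hook lets each value type decide
which reduction operators it supports.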
[libc++][pstl] Generic implementation of parallel std::is_sorted (#176129)
This PR implements a generic backend-agnostic parallel `std::is_sorted`
based on `std::transform_reduce`.
While this approach is suboptimal compared to a direct backend-specific
implementation, since it doesn't support early termination and requires
a reduction operation, it does show speedup when the dataset is large
enough and the comparator is not absolutely trivial.
Parent issue: #99938
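A minimal sequential sketch of the idea, assuming the reduction is a
logical AND over "adjacent pair is in order" checks. The function name
is hypothetical and this is not the libc++ implementation; a parallel
version would pass an execution policy to `std::transform_reduce`.

```cpp
#include <functional>
#include <iterator>
#include <numeric>

// is_sorted expressed as a transform_reduce over adjacent pairs.
template <class ForwardIt, class Comp = std::less<>>
bool isSortedViaTransformReduce(ForwardIt first, ForwardIt last,
                                Comp comp = {}) {
  if (first == last)
    return true;
  // The first range walks x[1..n), the second walks x[0..n-1), so the
  // transform sees each element together with its predecessor.
  return std::transform_reduce(
      std::next(first), last, first, true, std::logical_and<>{},
      [&comp](const auto &cur, const auto &prev) {
        return !comp(cur, prev); // in order if cur is not "less than" prev
      });
}
```

Every pair is always evaluated, which is the missing early termination
mentioned above.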