[SLP] Fix GEP cost computation for load vectorization cost estimates
Pass Instruction::Load instead of Instruction::GetElementPtr to
getGEPCosts in isMaskedLoadCompress and CheckForShuffledLoads.
These call sites estimate costs for wide contiguous loads and sub-vector
load patterns, not for masked gather pointer vector formation. Using
Instruction::GetElementPtr incorrectly triggered the gather-style cost
path, which computes vector GEP formation costs. Since the call sites
already add scalarization overhead for pointer vector building
separately, this led to double-counting of pointer costs and inaccurate
vectorization decisions.
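The double-counting described above can be illustrated with a toy cost model. This is only a sketch: the names (ToyCosts, toyGEPCost, toyLoadCost) and the numbers are hypothetical and are not taken from the SLP vectorizer or TargetTransformInfo.

```cpp
#include <cassert>

// Hypothetical toy cost model; names and numbers are illustrative only.
enum class CostKind { Load, GetElementPtr };

struct ToyCosts {
  int VectorGEPFormation = 4; // assumed cost of forming a vector of pointers
  int BasePointer = 1;        // assumed cost of one base address for a wide load
};

// Stand-in for a GEP cost query: the opcode selects which path is costed.
constexpr int toyGEPCost(CostKind Kind, const ToyCosts &C) {
  return Kind == CostKind::GetElementPtr ? C.VectorGEPFormation : C.BasePointer;
}

// Stand-in for a call site that adds pointer-building overhead on its own.
constexpr int toyLoadCost(CostKind Kind, int PointerScalarization,
                          const ToyCosts &C) {
  return toyGEPCost(Kind, C) + PointerScalarization;
}

inline void toyCostDemo() {
  // GetElementPtr path: pointer vector formation is charged twice (4 + 4).
  assert(toyLoadCost(CostKind::GetElementPtr, 4, ToyCosts{}) == 8);
  // Load path: only the call site's own overhead covers pointer building (1 + 4).
  assert(toyLoadCost(CostKind::Load, 4, ToyCosts{}) == 5);
}
```

In this sketch the only difference between the two calls is the opcode passed to the cost query, which mirrors the shape of the fix.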
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/191620
[libc++][numeric] P4052R0: Renaming saturation arithmetic functions (#189574)
Implements P4052R0.
Also renames:
- the internal names for consistency.
- test files (contents unchanged apart from the function names).
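For readers unfamiliar with saturation arithmetic, a minimal sketch of a saturating addition follows. The name toy_saturating_add is hypothetical and is not claimed to match the names introduced by P4052R0 or the libc++ internals; the sketch only shows the clamping behavior.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper; the name is illustrative, not the standard's spelling.
template <class T>
constexpr T toy_saturating_add(T x, T y) {
  static_assert(std::is_integral<T>::value, "integer types only");
  constexpr T Max = std::numeric_limits<T>::max();
  constexpr T Min = std::numeric_limits<T>::min();
  // Check before adding, so the addition itself never overflows.
  if (y > 0 && x > Max - y)
    return Max; // result would exceed the maximum: clamp
  if (y < 0 && x < Min - y)
    return Min; // result would fall below the minimum: clamp
  return static_cast<T>(x + y);
}

inline void toySaturationDemo() {
  assert(toy_saturating_add<int>(std::numeric_limits<int>::max(), 1) ==
         std::numeric_limits<int>::max());
  assert(toy_saturating_add<unsigned char>(200, 100) == 255);
  assert(toy_saturating_add<int>(1, 2) == 3);
}
```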
Fixes: #189589
---------
Co-authored-by: A. Jiang <de34 at live.cn>
[AMDGPU][Scheduler] Use MIR-level rematerializer in rematerialization stage (#189491)
This makes the scheduler's rematerialization stage use the
target-independent rematerializer. The previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the same
constraints as before apply on rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than what the stage can track at the moment). Consequently
it is not expected that this change improves performance overall, but it
is a first step toward being able to use the rematerializer's more
advanced capabilities during scheduling.
This is *not* an NFC change, for two reasons.
- Ties between two rematerialization candidates with otherwise equal
scores are broken by their corresponding register's index handle in the
rematerializer (previously by the value of the pointer to their state
object). This order is determined by the rematerializer's
register collection order, which is different from the stage's old
[12 lines not shown]
[JITLink] Use NonOwningSymbolStringPtrs in ExternalSymbolsMap. (#191634)
SymbolStringPtr comparisons should be more efficient than string
comparisons. Fixes a FIXME.
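The efficiency argument rests on string interning: equal strings share one pooled entry, so equality reduces to a pointer comparison. The sketch below is a toy pool written for illustration; ToyStringPool is hypothetical and is not the ORC SymbolStringPool API.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical toy pool; not the ORC SymbolStringPool API.
class ToyStringPool {
public:
  // Returns a stable pointer that is identical for equal strings.
  const std::string *intern(const std::string &S) {
    return &*Pool.insert(S).first;
  }

private:
  // Element addresses in std::unordered_set stay valid across rehashing.
  std::unordered_set<std::string> Pool;
};

inline void toyInternDemo() {
  ToyStringPool P;
  const std::string *A = P.intern("foo");
  const std::string *B = P.intern(std::string("f") + "oo");
  assert(A == B);               // equal contents: one pointer comparison
  assert(A != P.intern("bar")); // different contents: different entries
}
```

A map keyed on such pointers compares and hashes a single machine word per lookup instead of walking the characters of each string.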
[llvm-readobj][ELF] Use WrappedError to filter duplicates
Switch from StringError to WrappedError. Errors of the form "Prefix:
Error" can now be filtered out based on the underlying error while
preserving distinct prefixes, resulting in clearer llvm-readobj output.
[Object][ELF] Pass Error to WarningHandler
Warning consumers may need to handle errors based on their type. Pass
the Error object instead of a string representation to enable this. This
also brings WarningHandler in line with Support/WithColor.h.
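A rough sketch of why a handler benefits from receiving the error object rather than its text: it can branch on the dynamic type. Everything here (ToyError, its subclasses, ToyWarningHandler, and the policy of skipping truncation errors) is hypothetical and does not reproduce llvm::Error or the real WarningHandler signature.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical error hierarchy; these are not LLVM classes.
struct ToyError {
  virtual ~ToyError() = default;
  virtual std::string message() const = 0;
};
struct ToyTruncatedError : ToyError {
  std::string message() const override { return "section is truncated"; }
};
struct ToyBadIndexError : ToyError {
  std::string message() const override { return "invalid section index"; }
};

// The handler receives the error object, so it can inspect the dynamic type.
using ToyWarningHandler = std::function<void(const ToyError &)>;

// Builds a handler that records every warning except truncation errors.
inline ToyWarningHandler makeToyHandler(std::vector<std::string> &Log) {
  return [&Log](const ToyError &E) {
    if (dynamic_cast<const ToyTruncatedError *>(&E))
      return; // assumed policy: this consumer handles truncation elsewhere
    Log.push_back("warning: " + E.message());
  };
}

inline void toyHandlerDemo() {
  std::vector<std::string> Log;
  ToyWarningHandler H = makeToyHandler(Log);
  H(ToyTruncatedError{});
  H(ToyBadIndexError{});
  assert(Log.size() == 1);
}
```

With a string-only handler, the same decision would require matching on message text.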
[flang] clang-format flang/lib/Semantics/resolve-directives.cpp (#191660)
Only 5 lines change, but the entire file is now invariant under
clang-format.
[Support] Add WrappedError class
The error consumer filters duplicate errors based on a portion of the
error message. Introduce a new Error kind that carries a prefix string
to support this use case.
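One way to picture an error that carries its prefix separately is sketched below. ToyWrappedError and toyFilterDuplicates are hypothetical, and the filter is assumed to key on the underlying message only; the actual consumer's policy and the real WrappedError interface may differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: the prefix is stored apart from the underlying error.
struct ToyWrappedError {
  std::string Prefix;
  std::string Underlying;
  std::string message() const { return Prefix + ": " + Underlying; }
};

// Keeps the first error for each underlying message (assumed filter policy)
// and returns the full "Prefix: Error" text of the survivors.
inline std::vector<std::string>
toyFilterDuplicates(const std::vector<ToyWrappedError> &Errors) {
  std::set<std::string> Seen;
  std::vector<std::string> Out;
  for (const ToyWrappedError &E : Errors)
    if (Seen.insert(E.Underlying).second)
      Out.push_back(E.message());
  return Out;
}

inline void toyWrappedDemo() {
  std::vector<ToyWrappedError> In = {{"a.o", "bad header"},
                                     {"a.o", "bad header"},
                                     {"a.o", "bad section"}};
  std::vector<std::string> Out = toyFilterDuplicates(In);
  assert(Out.size() == 2);
}
```

Because the prefix is a separate field, the filter does not need to parse the formatted message to find the part it compares.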
[X86] Convert VPABSQ NonVLX patterns to use avx512_unary_lowering helper (#191648)
Move avx512_unary_lowering so we can avoid manually writing the
XMM/YMM->ZMM widening for NonVLX targets.
Also add some missing comments for instruction classes.