[ValueTracking] Add CharWidth argument to getConstantStringInfo (NFC)
The method assumes that host chars and target chars have the same width.
Add a CharWidth argument so that it can bail out if the requested char
width differs from the host char width.
Alternatively, the check could be done at call sites, but that would be
more error-prone.
In the future, this method will be replaced with a different one that
allows host/target chars to have different widths. The prototype will
be the same except that StringRef is replaced with something that is
byte width agnostic. Adding CharWidth argument now reduces the future
diff.
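
For illustration, the bail-out described above can be sketched outside of
LLVM as follows. This is a minimal sketch, not the actual ValueTracking
code: `getConstantStringSketch` and its signature are hypothetical, and
the host char width is taken from `CHAR_BIT`.

```cpp
#include <climits>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a constant-string query. It only knows how to
// describe strings whose characters are as wide as host chars, so it
// refuses any other requested width instead of returning wrong data.
std::optional<std::string_view>
getConstantStringSketch(std::string_view Data, unsigned CharWidth) {
  // Bail out if the requested char width differs from the host char width.
  if (CharWidth != CHAR_BIT)
    return std::nullopt;
  return Data;
}
```

Doing the check inside the query means a caller cannot forget it, which is
the error-proneness argument made above.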
[IR] Account for byte width in m_PtrAdd
The method has few uses so far, so just pass a DL argument to it. The
change follows m_PtrToIntSameSize, and I don't see a better way of
delivering the byte width to the method.
[IRBuilder] Add getByteTy and use it in CreatePtrAdd
The change requires a DataLayout instance to be available, which, in turn,
requires an insertion point to be set. In-tree tests detected only one case
where the function was called without setting an insertion point; it was
changed to create a constant expression directly.
[SimplifyLibCalls] Add initial support for non-8-bit bytes
The patch makes the CharWidth argument of `getStringLength` mandatory
and ensures the correct values are passed in most cases.
This is *not* complete support for unusual byte widths in
SimplifyLibCalls since `getConstantStringInfo` returns false for those.
The code guarded by `getConstantStringInfo` returning true is unchanged
because the changes are currently not testable.
[ValueTracking] Make isBytewiseValue byte width agnostic
This is a simple change to show how easy it can be to support unusual
byte widths in the middle end.
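
As a rough illustration of what "byte width agnostic" means here, the
sketch below checks whether an integer is a repetition of a single byte
for an arbitrary byte width. It is a standalone approximation, not the
code in ValueTracking: `splatByte` is a hypothetical helper, and values
are limited to 64 bits for simplicity.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: if the low BitWidth bits of Value consist of one
// ByteWidth-bit pattern repeated, return that pattern; otherwise nullopt.
std::optional<uint64_t> splatByte(uint64_t Value, unsigned BitWidth,
                                  unsigned ByteWidth) {
  if (ByteWidth == 0 || BitWidth == 0 || BitWidth > 64 ||
      BitWidth % ByteWidth != 0)
    return std::nullopt;
  // Reject values that do not fit in BitWidth bits.
  if (BitWidth < 64 && (Value >> BitWidth) != 0)
    return std::nullopt;
  uint64_t Mask = ByteWidth == 64 ? ~0ULL : ((1ULL << ByteWidth) - 1);
  uint64_t Byte = Value & Mask;
  // Compare every remaining byte-sized chunk against the lowest one.
  for (unsigned Shift = ByteWidth; Shift < BitWidth; Shift += ByteWidth)
    if (((Value >> Shift) & Mask) != Byte)
      return std::nullopt;
  return Byte;
}
```

The same 32-bit value can be bytewise for one byte width and not for
another, which is why the width has to come from the data layout rather
than be hard-coded to 8.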
[IR] Make @llvm.memset prototype byte width dependent
This patch changes the type of the value argument of @llvm.memset and
similar intrinsics from i8 to iN, where N is the byte width specified
in the data layout string.
Note that the argument still has a fixed type (not overloaded), but the
type checker will complain if the type does not match the byte width.
Ideally, the type of the argument would be dependent on the address
space of the pointer argument. It is easy to do this (and I did it
downstream as a PoC), but since data layout string doesn't currently
allow different byte widths for different address spaces, I refrained
from doing it now.
[DataLayout] Add byte specification
This patch adds a byte specification to the data layout string.
The specification is `b:<size>`, where `<size>` is the size of a byte
in bits (later referred to as "byte width").
Limitations:
* The only values allowed for byte width are 8, 16, and 32.
16-bit bytes are popular, and my downstream target has 32-bit bytes.
These are the widths I'm going to add tests for in follow-up patches,
so this restriction only exists because other widths are untested.
* It is assumed that bytes are the same in all address spaces.
Supporting different byte widths in different address spaces would
require adding an address space argument to all DataLayout methods
that query ABI / preferred alignments because they return *byte*
alignments, and those will be different for different address spaces.
This is too much effort, but it can be done in the future if the need
arises: the specification reserves the address space number before ':'.
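
A minimal sketch of parsing such a specification is shown below. It is
not the DataLayout parser: `parseByteSpec` is a hypothetical name, it
handles only the `b:<size>` form (not the reserved address space number),
and it hard-codes the 8/16/32 restriction listed above. It assumes C++17.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical parser for a single "b:<size>" data layout component.
// Returns the byte width in bits, or nullopt if the component is malformed
// or names a width other than 8, 16, or 32.
std::optional<unsigned> parseByteSpec(std::string_view Spec) {
  if (Spec.size() < 3 || Spec.substr(0, 2) != "b:")
    return std::nullopt;
  std::string_view Num = Spec.substr(2);
  unsigned Width = 0;
  const char *End = Num.data() + Num.size();
  auto [Ptr, Ec] = std::from_chars(Num.data(), End, Width);
  // Require the whole remainder to be a decimal number.
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  if (Width != 8 && Width != 16 && Width != 32)
    return std::nullopt;
  return Width;
}
```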
Reapply "[VPlan] Run removeDeadRecipes early." (#195325) (#195445)
This reverts commit 2a9699ccd128d7f94372d18c97229e1934b8506e.
Recommit contains a small fix for skipping dead recipes when finding
induction casts.
Original message:
The initial simplifyRecipes run can leave dead recipes, which
removeDeadRecipes can clean up; the same applies to dead instructions in
the input.
PR: https://github.com/llvm/llvm-project/pull/190191
[MLIR] Add HasAncestor op trait
Add HasAncestor/AncestorOneOf traits that verify an operation has a
specific ancestor anywhere in the parent chain, unlike HasParent which
only checks the immediate parent. This enables declarative verification
for ops that can be nested arbitrarily deep inside a required ancestor.
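
The difference between the two checks can be sketched with a toy
operation tree. This does not use the MLIR API: `Op`, `hasParent`, and
`hasAncestor` are hypothetical names chosen only to show the immediate
parent check versus the walk up the parent chain.

```cpp
#include <string_view>

// Toy operation: a name plus a link to the enclosing operation.
struct Op {
  std::string_view Name;
  const Op *Parent = nullptr;
};

// Immediate-parent check, in the spirit of HasParent.
bool hasParent(const Op &O, std::string_view Name) {
  return O.Parent && O.Parent->Name == Name;
}

// Ancestor check, in the spirit of HasAncestor: walk the whole parent
// chain and succeed if any enclosing operation matches.
bool hasAncestor(const Op &O, std::string_view Name) {
  for (const Op *P = O.Parent; P; P = P->Parent)
    if (P->Name == Name)
      return true;
  return false;
}
```

With a chain func -> loop -> add, `hasParent(add, "func")` fails while
`hasAncestor(add, "func")` succeeds.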
[NFC][LLVM] Simplify `PruningFunctionCloner::cloneInstruction` (#195389)
Add early returns and decrease indentation of the code that implements
calls to constrained intrinsics.
[VPlan] Simplify extract-lane of all single-scalars (#194838)
Checking against vputils::isSingleScalar is sufficient for both
correctness and profitability.
[Clang-Tidy] Skip `misc-unused-parameters` in macro. (#194999)
The new parameter allows skipping the check in cases where we need to
use a macro but there is no immediate way to fix the macro itself. This
recently came up during a gradual adoption of clang-tidy, where not all
parts of the code could be fixed at once.
Simply enabling the check using `clang-tidy-diff` is not enough:
`misc-unused-parameters` would report false positives, because the given
diff did not introduce the unused parameters and they might not be easy
to change.
The given parameters allow for better incremental adoption.
---------
Co-authored-by: Dmitrii Kuragin <dkuragin at adobe.com>
Co-authored-by: EugeneZelenko <eugene.zelenko at gmail.com>
[InstCombine] Add user-count bailout to isAllocSiteRemovable (#190347)
isAllocSiteRemovable() walks all transitive users of an alloc site, but
sites with many users are almost never removable. Profiling on
real-world codegen workloads (73,943 alloc sites) showed:
- 89 removable sites, max 1,392 users walked
- 73,854 non-removable sites, avg 31,305 users walked
- 2.31B total wasted user visits (~400s wall-clock on a 35-min build)
Skip the removability analysis when the direct user count exceeds a
configurable threshold (default 2048, tunable via the hidden cl::opt
-instcombine-max-allocsite-removable-users).
Also defer WeakTrackingVH conversion: collect into Instruction* first
and convert only when the site is actually removable.
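
The shape of the bailout can be sketched on a toy user graph. This is not
the InstCombine implementation: `Value`, `BlocksRemoval`, and
`isRemovableSketch` are hypothetical, and only the 2048 default comes
from the description above.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy value with a list of users; BlocksRemoval marks a user that would
// make the allocation site non-removable.
struct Value {
  std::vector<const Value *> Users;
  bool BlocksRemoval = false;
};

bool isRemovableSketch(const Value &Site, std::size_t MaxDirectUsers = 2048) {
  // Cheap bailout: sites with many direct users are almost never
  // removable, so skip the transitive walk entirely.
  if (Site.Users.size() > MaxDirectUsers)
    return false;
  std::vector<const Value *> Worklist(Site.Users.begin(), Site.Users.end());
  std::unordered_set<const Value *> Visited;
  while (!Worklist.empty()) {
    const Value *V = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(V).second)
      continue; // already walked
    if (V->BlocksRemoval)
      return false;
    Worklist.insert(Worklist.end(), V->Users.begin(), V->Users.end());
  }
  return true;
}
```

The check costs one size comparison per site, while the walk it avoids is
proportional to the number of transitive users.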
[libc][math] Refactor fmul-fsub-frexp family to header-only (#195431)
Refactors the fmul-fsub-frexp math family to be header-only.
part of: #147386
Target Functions:
- fmul
- fmulf128
- fmull
- fsub
- fsubf128
- fsubl
- frexp
- frexpbf16
- frexpl
Co-authored-by: Muhammad Bassiouni <60100307+bassiounix at users.noreply.github.com>
[llvm] Add support for atomicrmw and cmpxchg in AssumeBundleBuilder (#194630)
The assume builder currently only preserves dereferenceable, nonnull,
and alignment knowledge for regular load/store instructions and calls.
Atomic memory accessing instructions (atomicrmw and cmpxchg) also
dereference their pointer operands, but were previously skipped, causing
useful knowledge to be lost across these operations.
Add handling for AtomicRMWInst and AtomicCmpXchgInst in
AssumeBuilderState::addInstruction(), using the same addAccessedPtr()
path as loads and stores. The accessed type is taken from the value
operand (atomicrmw) or compare operand (cmpxchg), which corresponds to
the in-memory element type, and the alignment is taken from the
instruction's explicit alignment.
Add a test to verify that assume bundles are correctly generated before
atomicrmw and cmpxchg instructions.
---------
Co-authored-by: Nikita Popov <github at npopov.com>
[RISCV][CodeGen] Add initial vzip codegen support (#194548)
Add initial support for the vzip instruction, which is included in the
zvzip extension. It is used to lower VECTOR_SHUFFLE with an interleave
pattern and VECTOR_INTERLEAVE.
[CIR] Add RegionBranchOpInterface unit tests and fix control flow bugs
Add unit tests for RegionBranchOpInterface implementations across CIR
control flow operations: IfOp, ScopeOp, TernaryOp, SwitchOp, WhileOp,
ForOp, DoWhileOp, and TryOp. The tests verify successor regions,
terminator successors, loop detection, repetitive region marking, and
op/terminator successor consistency.
Fix a missing return in ConditionOp::getSuccessorRegions that caused
fallthrough from the loop case to an unconditional cast<AwaitOp>,
crashing when the parent is a loop operation.
Fix IfOp::getSuccessorRegions to report parent exit as a successor
when the else region is absent, correctly modeling the case where the
condition is false.
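
The IfOp fix can be pictured with a toy model of the successors reachable
when control enters the op. This is only a sketch, not the CIR or MLIR
interface: `Successor` and `ifEntrySuccessors` are hypothetical names.

```cpp
#include <vector>

enum class Successor { ThenRegion, ElseRegion, ParentExit };

// Toy model of the successors of an if-like op on entry. When the else
// region is absent, a false condition leaves the op, so the parent exit
// must be reported as a successor.
std::vector<Successor> ifEntrySuccessors(bool HasElse) {
  std::vector<Successor> Result{Successor::ThenRegion};
  Result.push_back(HasElse ? Successor::ElseRegion : Successor::ParentExit);
  return Result;
}
```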