Fix ownership based deallocation pass crash (#179357)
The `OwnershipBasedBufferDeallocation` pass crashes when the IR contains
memrefs that are live in the same Block but are defined in different
Blocks. During this pass, live memrefs in a given block are sorted
according to the comparison function `ValueComparator`. This causes an
assertion to be triggered when sorting memref values using
`ValueComparator` as the comparison function. The assertion triggered is
found in `Operation::isBeforeInBlock`, which requires `this` and `other`
to reside in the same block. (See the definition
[here](https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/Operation.cpp#L385-L386).)
The fix is to handle values from different blocks in the
`ValueComparator` by sorting based on Block number if the compared ops
aren't in the same block. While `computeBlockNumber` is intended for
debugging and error messages, it is a convenient utility that can
provide a sufficient weak ordering for `llvm::sort` while handling
operations from different parent blocks. I'm not aware of another
ordering relation for Blocks that would be appropriate as well as cheap
[8 lines not shown]
[CIR] Add cir.atomic.xchg to target lowering (#180744)
This patch adds the `cir.atomic.xchg` operation to the TargetLowering
pass. The synchronization scope attached to the operation will be
canonicalized there.
[AArch64][ISel] Lower fixed-width i64 vector CLMUL intrinsics (#178876)
NEON's PMULL/PMULL2 can be used and its lower bits taken to lower CLMUL
intrinsics, so long as +aes is present.
[BOLT][BTI] Patch ignored functions in place when targeting them with
indirect branches
When applying BTI fixups to indirect branch targets, ignored functions are
considered a special case:
- these hold no instructions,
- have no CFG,
- and are not emitted in the new text section.
The solution is to patch the entry points in the original location.
If such a situation occurs in a binary, recompilation using the
-fpatchable-function-entry flag is required. This will place a nop at all
function starts, which BOLT can use to patch the original section.
Without the extra nop, BOLT cannot safely patch the original .text section.
An alternative solution could be to also ignore the function from which
the stub starts. This has not been tried as LongJmp pass - where most
[3 lines not shown]
[MLIR][LLVMIR] Add support for importing ConstantInt/FP vector splats. (#180946)
Updates LLVM IR importing to remove the assumption that
ConstantInt/ConstantFP are always scalar.
[AArch64] Eliminate XTN/SSHLL for vector splats (#180913)
Combine:
sext(duplane(insert_subvector(undef, trunc(X), 0), idx))
Into:
duplane(X, idx)
This avoids XTN/SSHLL instruction sequences that occur when splatting
elements from boolean vectors after type legalization, which is common
when using shufflevector with comparison results.
[llvm-reduce] Add a pass to replace unconditinal branches with returns
Unconditional branches could end up in infinite loops in the reduced code,
while the code could have been reduce furter.
This patch implements a simple pass that replaces unconditional branches
with returns.
[lldb][doc] Improve documentation for `ScriptedFrameProvider` (#179996)
* Provide a minimal, working example
* Document the instance variables
* Remove mention of `thread.SetScriptedFrameProvider` (which doesn't exist)
* add missing `@staticmethod` annotation
* fix rendering of bullet-pointed lists
[libc] Add getc, ungetc, fflush to enable libc++ iostream on baremetal (#175530)
After https://github.com/llvm/llvm-project/pull/168931 landed getc,
ungetc and fflush are still missing at link time while trying to make
libc++ std::cout work with LLVM libc on baremetal.
ungetc implementation is very minimal only to cover the current standard
streams implementation from the patch above.
[libc++] Avoid including pair in <__functional/hash.h> (#179635)
We already have `_PairT`, which is just a pair of two `size_t`s, so we
might as well use that throughout the file. This avoids the `pair`
include altogether, reducing header parse times a bit in some cases.
[SelectionDAG] Make sure demanded lanes for AND/MUL-by-zero are frozen (#180727)
DAGCombiner can fold a chain of INSERT_VECTOR_ELT into a vector AND/OR
operation. This patch adds protection to avoid that we end up making the
vector more poisonous by freezing the source vector when the elements
that should be set to 0/-1 may be poison in the source vector.
The patch also fixes a bug in SimplifyDemandedVectorElts for
MUL/MULHU/MULHS/AND that could result in making the vector more
poisonous. Problem was that we skipped demanding elements from Op0 that
were known to be zero in Op1. But that could result in elements being
simplified into poison when simplifying Op0, and then the result would
be poison and not zero after the MUL/MULHU/MULHS/AND. The solution is to
defensively make sure that we demand all the elements originally
demanded also when simplifying Op0.
This bugs were found when analysing the miscompiles in
https://github.com/llvm/llvm-project/issues/179448
[6 lines not shown]
[MLIR][ODS] Make dialect attribute helper member functions const (NFC)
This commit marks member functions of dialect attribute helpers as
constant. This ensures that these helpers can be added as members of
rewrite patterns, whose `matchAndRewrite` functions are marked as const
as well.
[libc++][NFC] Use std::quoted in fs::path and remove the private __quoted (#181043)
We've provided `std::filesystem` before C++17 in the past, but we don't
anymore, so we can use `std::quoted`.
[clang][Builtins][ARM] NFC updates in ARM.cpp (#180966)
Updates the logic in `CodeGenFunction::EmitAArch64BuiltinExpr` so that
we always start with the general code and we only fall-back to
specialised cases (i.e. `switch` stmts) for intrinsics for which the
general code does no apply.
BEFORE (only high-level:
```cpp
Value *CodeGenFunction::EmitAArch64BuiltinExpr() {
(...)
/// 1. SWITCH STMT FOR NON-OVERLOADED INTRINSIS
switch (BuiltinID) {
default break:
case NEON::BI__builtin_neon_vabsh_f16:
(...)
}
/// 2. GENERAL CODE
[58 lines not shown]
[DAGCombiner] Fix subvector extraction index for big-endian STLF (#180795)
This PR fixes a big-endian regression in `ForwardStoreValueToDirectLoad`
where the wrong subvector was being extracted. In big-endian, memory
offset 0 corresponds to the high bits, so the extraction index needs to
be adjusted.
As suggested by @KennethHilmersson, calculate the extraction index as
the difference between the number of elements in the intermediate vector
and the load vector when in big-endian mode.
Special thanks to Kenneth Hilmersson for providing the fix logic and the
ARM regression test.
https://github.com/llvm/llvm-project/pull/172523#issuecomment-3878065191https://github.com/llvm/llvm-project/pull/172523#issuecomment-3879575092
[ValueTracking] Extend computeConstantRange for add/sub, sext/zext/trunc
Recursively compute operand ranges for add/sub and propagate ranges
through sext/zext/trunc.
For add/sub, the computed range is intersected with any existing range
from setLimitsForBinOp, and NSW/NUW flags are used via addWithNoWrap/
subWithNoWrap to tighten bounds.
The motivation is to enable further folding of reduce.add expressions
in comparisons, where the result range can be bounded by the input
element ranges.
Compile-time impact on llvm-test-suite is <0.1% mean.