[AArch64] Improve pow(x,y) cost model for some constant values of y (#185607)
Some optimisations of pow(x, y) calls only occur during codegen,
e.g. pow(x, 0.25) -> sqrt(sqrt(x)), and at the IR level the cost of
calls to the llvm.pow intrinsic does not currently reflect this.
This patch attempts to fix that in cases where we know the intrinsic
can in general be legally lowered to libcalls. For scalable vector
variants of llvm.pow we need to be cautious: without a math library
the call cannot be scalarised, and there is always a small risk that
the optimisation will not happen during codegen.
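As a rough illustration of why the rewritten form is cheaper than a generic pow libcall, the sketch below compares the two computations in plain C++. It is not the cost model or the codegen combine itself. The helper names are hypothetical, and it only covers finite positive inputs, whereas the real transformation has its own preconditions that are not modelled here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: computes x^(1/4) with two square roots instead of a
// call to pow. Assumption: only finite x >= 0 is considered; special values
// such as -0.0, infinities and NaN are deliberately out of scope.
double powQuarterViaSqrt(double X) {
  assert(X >= 0.0 && "sketch only covers non-negative inputs");
  return std::sqrt(std::sqrt(X));
}

// Relative difference between pow(x, 0.25) and the two-sqrt form, for x > 0.
double relativeDifference(double X) {
  assert(X > 0.0 && "relative difference needs a non-zero reference value");
  const double ViaPow = std::pow(X, 0.25);
  const double ViaSqrt = powQuarterViaSqrt(X);
  return std::fabs(ViaPow - ViaSqrt) / std::fabs(ViaPow);
}
```

For inputs such as 16.0 or 81.0 every intermediate square root is exactly representable, so the two-sqrt form returns exactly 2.0 and 3.0.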
[IVDescriptors] Remove single-use constraint from FindLast comparisons (#186096)
Relax a minor constraint in FindLast recurrence detection: the
comparison is no longer required to have a single use.
[AArch64][SVE2] Allow commuting two-input NBSL/BSL2N idioms. (#184847)
Specifically, EON, NAND and NOR are commutable operations that lack
dedicated SVE2 instructions, but we support them via NBSL/BSL2N.
However, as NBSL/BSL2N have tied operands, sometimes we generate a COPY
even if one of the operands could be clobbered.
This patch defines custom expansion for these operations to allow using
their commuted forms or, if still necessary, using MOVPRFX for the COPY.
This should help with
https://github.com/llvm/llvm-project/pull/176194#discussion_r2889564685.
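To make the commutation argument concrete, here is a scalar sketch of how NAND, NOR and EON can be expressed through bitwise-select forms. The per-lane formulas for NBSL and BSL2N follow my reading of the architecture description. The particular operand assignments are one possible encoding chosen for illustration and are an assumption, not necessarily the patterns LLVM selects.

```cpp
#include <cassert>
#include <cstdint>

// Scalar models of the per-lane behaviour (assumed from the architecture
// description): operands are (first, second, selector).
// NBSL: select from 'First' where Sel is 1, from 'Second' where Sel is 0,
// then invert the result.
constexpr std::uint64_t nbsl(std::uint64_t First, std::uint64_t Second,
                             std::uint64_t Sel) {
  return ~((First & Sel) | (Second & ~Sel));
}

// BSL2N: select from 'First' where Sel is 1, from inverted 'Second' where
// Sel is 0.
constexpr std::uint64_t bsl2n(std::uint64_t First, std::uint64_t Second,
                              std::uint64_t Sel) {
  return (First & Sel) | (~Second & ~Sel);
}

// Hypothetical encodings of the two-input idioms.
constexpr std::uint64_t nandViaNbsl(std::uint64_t X, std::uint64_t Y) {
  return nbsl(X, Y, Y); // ~((X & Y) | (Y & ~Y)) == ~(X & Y)
}
constexpr std::uint64_t norViaNbsl(std::uint64_t X, std::uint64_t Y) {
  return nbsl(X, Y, X); // ~(X | (Y & ~X)) == ~(X | Y)
}
constexpr std::uint64_t eonViaBsl2n(std::uint64_t X, std::uint64_t Y) {
  return bsl2n(X, X, Y); // (X & Y) | (~X & ~Y) == ~(X ^ Y)
}

// Swapping the inputs gives the same value, so either input could take the
// tied (destructive) operand slot.
static_assert(nandViaNbsl(0xF0F0, 0xFF00) == nandViaNbsl(0xFF00, 0xF0F0), "");
static_assert(norViaNbsl(0xF0F0, 0xFF00) == norViaNbsl(0xFF00, 0xF0F0), "");
static_assert(eonViaBsl2n(0xF0F0, 0xFF00) == eonViaBsl2n(0xFF00, 0xF0F0), "");
```

Because the results are symmetric in X and Y, a compiler is free to tie the destination to whichever input is dead after the instruction, which is the situation where a COPY can be avoided.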
[gn] port b80248a0ea35df more (clang-doc md templates) (#186401)
The previous version misspelled the name of comments-partial.mustache,
and it put the md files in the wrong output directory.
[libc] Add support for chown on platforms that don't define SYS_chown (#186167)
Some platforms (such as RISC-V) don't define SYS_chown, so this PR adds
a fallback that calls SYS_fchownat instead.
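The fallback can be sketched as a small Linux-only wrapper. This is a minimal illustration written against the public `syscall` interface, not the libc's actual implementation. The name `chownCompat` is hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>       // AT_FDCWD
#include <sys/syscall.h> // SYS_chown, SYS_fchownat (where defined)
#include <sys/types.h>   // uid_t, gid_t
#include <unistd.h>      // syscall, getuid, getgid

// Hypothetical wrapper: uses SYS_chown when the platform defines it and
// otherwise falls back to SYS_fchownat. Returns 0 on success, -1 on failure
// (with errno set by syscall()).
int chownCompat(const char *Path, uid_t Owner, gid_t Group) {
  assert(Path != nullptr && "path must not be null");
#if defined(SYS_chown)
  return static_cast<int>(syscall(SYS_chown, Path, Owner, Group));
#elif defined(SYS_fchownat)
  // AT_FDCWD with flags == 0 matches chown: the path is resolved relative
  // to the current working directory and symlinks are followed.
  return static_cast<int>(
      syscall(SYS_fchownat, AT_FDCWD, Path, Owner, Group, 0));
#else
#error "neither SYS_chown nor SYS_fchownat is available"
#endif
}
```

Passing `AT_FDCWD` and a zero flags argument is what makes the `fchownat` call behave like plain `chown`.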
[Offload][L0] clear completed events from a wait list (#186379)
The queue's WaitEvent collection wasn't being cleared after the events
were synchronized and reset. This led to hangs on subsequent host
synchronizations if they were not preceded by any other operation.
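The failure mode can be modelled with a toy queue. Every name below is hypothetical and none of it is the Level Zero plugin's code. A wait that would block forever is represented by `synchronize()` returning false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event: signalled when its work has completed.
struct Event {
  bool Signaled = false;
};

// Toy queue. ClearAfterSync toggles between the old and the fixed behaviour.
class Queue {
  std::vector<Event *> WaitEvents;
  bool ClearAfterSync;

public:
  explicit Queue(bool ClearAfterSync) : ClearAfterSync(ClearAfterSync) {}

  // Pretend the submitted work completes immediately and signals its event.
  void submit(Event &E) {
    E.Signaled = true;
    WaitEvents.push_back(&E);
  }

  // Returns false where a real host wait would never return.
  bool synchronize() {
    for (Event *E : WaitEvents)
      if (!E->Signaled)
        return false; // waiting on an already-reset event: models the hang
    for (Event *E : WaitEvents)
      E->Signaled = false; // reset events so they can be reused
    if (ClearAfterSync)
      WaitEvents.clear(); // the step the commit message describes as missing
    return true;
  }

  std::size_t pendingWaits() const { return WaitEvents.size(); }
};

// Synchronizes twice with no operation in between and reports whether the
// second synchronization completes.
bool secondSyncSucceeds(bool ClearAfterSync) {
  Queue Q(ClearAfterSync);
  Event E;
  Q.submit(E);
  const bool First = Q.synchronize();
  assert(First && "the first synchronization sees a signalled event");
  (void)First;
  return Q.synchronize();
}
```

With clearing enabled the second synchronization has nothing left to wait on. Without it, the queue waits on an event that was already reset.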
[MIR] Support symbolic inline asm operands (#185893)
Support parsing and printing inline assembly operands in MIR using the
symbolic form instead of numeric register class IDs, thus removing the
need to update tests when the numbers change.
The numeric form remains supported.
---------
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[libc++] Make the associative container query benchmarks more representative (#183036)
Currently the query benchmarks are training the branch predictor
incredibly well, which isn't representative of the real world. This
change causes the branch misses to go from <1% to ~50% with the current
implementation of `__tree::__find_end`.
This patch also removes the `non-existent` benchmarks, since it'd be
non-trivial to write a representative benchmark for that case, and the
benchmark would be relatively low value. We're already searching to leaf
nodes ~50% of the time (since half the nodes are leaves) with the
current benchmark. So we'd only additionally cover a relatively trivial
failure branch that is only taken once per function call. The loop is
already covered through benchmarking with keys existing in the
container.
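One common way to keep a lookup loop from training the branch predictor is to query existing keys in a shuffled order. The commit message does not say which technique the benchmarks use, so the sketch below is an assumption for illustration only, with hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <set>
#include <vector>

// Returns the keys 0..N-1 in a pseudo-random order (fixed seed so that runs
// are reproducible).
std::vector<int> shuffledKeys(int N, unsigned Seed) {
  assert(N >= 0 && "key count must be non-negative");
  std::vector<int> Keys(static_cast<std::size_t>(N));
  std::iota(Keys.begin(), Keys.end(), 0);
  std::mt19937 Rng(Seed);
  std::shuffle(Keys.begin(), Keys.end(), Rng);
  return Keys;
}

// Looks up every query and counts how many are found. With shuffled queries
// the left/right decisions taken inside the tree vary between iterations.
std::size_t countHits(const std::set<int> &Container,
                      const std::vector<int> &Queries) {
  std::size_t Hits = 0;
  for (int Key : Queries)
    Hits += Container.find(Key) != Container.end() ? 1 : 0;
  return Hits;
}

// Usage example: every query key exists in the container.
std::size_t demoHits(int N) {
  const std::vector<int> Queries = shuffledKeys(N, 42u);
  const std::set<int> Container(Queries.begin(), Queries.end());
  return countHits(Container, Queries);
}
```

All queried keys exist, matching the decision above to benchmark only lookups of keys present in the container.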
[mlir][tosa] Allow integer gather/scatter ops in fp profile (#183342)
This commit updates profile compliance to allow integer gather and
scatter operations to be used with the floating point profile. This
update aligns with the specification change:
https://github.com/arm/tosa-specification/pull/35.
[CIR] Implement zero-init-bases lowering (#186230)
This showed up in a test suite. A zero-initializer for a whole struct
seems completely sensible, as long as the type is zero-initializable.
This patch doesn't change the non-zero-init behavior (I am working on a
patch to do so, but its scope is massive), so this is limited to JUST
classes with bases.
[VectorCombine] Fix crash in foldShuffleOfSelects for single-element shuffle result (#185713)
In foldShuffleOfSelects, if the shuffle result has a single element, the
resulting type may be scalar rather than a vector. The later code in
foldShuffleOfSelects assumes the result is a vector and performs
cast<FixedVectorType>, which triggers an assertion.
Fixes #183625
[AMDGPU] Pass MF into the SIInsertWaitcnts constructor. NFC. (#186369)
Pass MF into the SIInsertWaitcnts constructor instead of the run method.
This is more natural now that SIInsertWaitcnts is constructed once per
MachineFunction and enables future cleanup by initializing more fields
in the constructor that depend on MF.
[mlir][Bytecode] Fix stale deferred worklist entries in attribute callback fallthrough (#186150)
When parseCustomEntry() calls a user attribute/type callback that
internally reads sub-attributes/types via the bytecode reader, the
reader may add entries to the deferredWorklist if the depth limit is
exceeded. If the callback then returns success with an empty entry
(falling through to the regular dialect reader), the reader position is
reset but deferredWorklist retains stale entries from the failed partial
read.
This causes an assert(deferredWorklist.empty()) failure in debug builds
when the fallback dialect reader successfully parses the attribute.
Fix by saving and restoring deferredWorklist.size() around each callback
invocation, discarding any stale entries added during a callback's
partial read when the reader position is rolled back.
Fixes #163337
Assisted-by: Claude Code
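The save-and-restore idea from the bytecode fix above can be sketched with a toy reader. The types and function names here are hypothetical stand-ins, not the MLIR bytecode reader's API. The worklist is reduced to a vector of integers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the reader state that a callback can mutate.
struct Reader {
  std::vector<int> DeferredWorklist;
  std::size_t Position = 0;
};

enum class CallbackResult { Parsed, FallThrough };
using Callback = std::function<CallbackResult(Reader &)>;

// Runs each callback in turn. If a callback falls through, both the reader
// position and the worklist size are rolled back, so entries queued during
// its partial read are discarded.
bool runCallbacks(Reader &R, const std::vector<Callback> &Callbacks) {
  for (const Callback &CB : Callbacks) {
    const std::size_t SavedPosition = R.Position;
    const std::size_t SavedWorklistSize = R.DeferredWorklist.size();
    if (CB(R) == CallbackResult::Parsed)
      return true;
    R.Position = SavedPosition;
    assert(R.DeferredWorklist.size() >= SavedWorklistSize);
    R.DeferredWorklist.resize(SavedWorklistSize); // drop stale entries
  }
  return false;
}

// Usage example: a callback queues two deferred entries, then falls through.
// Returns how many of its entries are still on the worklist afterwards.
std::size_t staleEntriesAfterFallThrough() {
  Reader R;
  R.DeferredWorklist.push_back(7); // pre-existing entry that must survive
  std::vector<Callback> Callbacks;
  Callbacks.push_back([](Reader &Rd) {
    Rd.Position += 4;
    Rd.DeferredWorklist.push_back(1);
    Rd.DeferredWorklist.push_back(2);
    return CallbackResult::FallThrough;
  });
  const bool Parsed = runCallbacks(R, Callbacks);
  assert(!Parsed && R.Position == 0);
  (void)Parsed;
  return R.DeferredWorklist.size() - 1;
}
```

Restoring the size rather than clearing the worklist keeps any entries that were legitimately queued before the callback ran.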
[mlir][shape] Fix crash when folding tensor.extract(shape_of(memref)) (#186270)
The `ExtractFromShapeOfExtentTensor` canonicalization pattern was
unconditionally rewriting:
tensor.extract(shape.shape_of(%arg), %idx) -> tensor.dim(%arg, %idx)
even when `%arg` is a memref. This produced an invalid `tensor.dim`
(whose source operand must be a tensor), which then caused an assertion
failure in `DimOp::getSource()` when subsequent canonicalization
patterns tried to match the op:
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"'
failed. [To = TypedValue<TensorType>, From = Value]
Fix: add an `IsTensorType` constraint to
`ExtractFromShapeOfExtentTensor` in `ShapeCanonicalization.td` so the
pattern only fires when `%arg` is a tensor type. The memref case is
intentionally left unfolded (the correct lowering to `memref.dim` would
[8 lines not shown]