[CIR] Implement non-odr use of reference type lowering (#185720)
This is used somewhat rarely, but is a pretty simple emission of
pointers, and ends up using infrastructure we already have.
Additionally, this is the first use of `getNaturalTypeAlignment` that
uses the `pointee` argument, so this patch adds that implementation.
It includes some alignment work for CXXRecordDecls, which is
implemented here as well.
[CIR] Implement 'builtin-addressof' for 'getPointerWithAlignment' (#185684)
The 'getPointerWithAlignment' is really only called when evaluating
arguments for builtins, so the test is a touch weird as it tests through
bcopy. However, this shows up in some headers, so it is important that
we support this.
This patch just adds the implementation, which mirrors classic-codegen,
except that we don't generate TBAA.
[CIR] Implement deferred V-Table emission (#185655)
We currently emit only vtables that have an 'immediate' need to be
emitted. The rest are supposed to be added to a list and emitted at the
end of the translation unit if necessary. This patch implements that
infrastructure.
The added test comes from classic-codegen, where it landed at the same
time as deferred vtable emission and only works with it. It does test
the deferred emission, but it also tests quite a bit more than that.
Since it came in with the same functionality in classic codegen, it
made sense for it to come in here too.
[mlir][dialect-conversion] Fix OOB crash in convertFuncOpTypes for funcs with extra block args (#185060)
Some function ops (e.g., gpu.func with workgroup memory arguments) have
more entry block arguments than their FunctionType has inputs. The
workgroup memory arguments are not part of the public function signature
but are present as additional block arguments.
`convertFuncOpTypes` previously created a `SignatureConversion` sized
only for `type.getNumInputs()`, then called `applySignatureConversion`
on the entry block. When the block had more arguments (e.g., workgroup
args), the loop in `applySignatureConversion` would call
`getInputMapping(i)` with out-of-bounds indices, causing an assertion
failure in `SmallVector::operator[]`.
Fix this by:
1. Sizing the `SignatureConversion` for all entry block arguments.
2. Adding identity mappings for extra block args beyond the function
type inputs.
3. Using only the converted function-type-input types when updating the
[5 lines not shown]
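The first two steps of the fix above can be modelled outside MLIR. The following is a minimal sketch only: `ArgMapping`, `SignatureModel` and `buildConversion` are hypothetical names for illustration and are not the MLIR `SignatureConversion` API. It assumes a 1:1 type conversion for the function-type inputs and does not attempt the third step, whose description is truncated in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one argument mapping (not the MLIR type).
struct ArgMapping {
  std::size_t newIndex; // position of the argument after conversion
  bool identity;        // true if the argument is passed through unchanged
};

// Hypothetical model of a signature conversion: one slot per ORIGINAL
// entry block argument, not one per function-type input.
struct SignatureModel {
  std::vector<std::optional<ArgMapping>> slots;
  explicit SignatureModel(std::size_t numArgs) : slots(numArgs) {}

  // Bounds-checked lookup; throws instead of reading out of bounds.
  const ArgMapping &getInputMapping(std::size_t i) const {
    return slots.at(i).value();
  }
};

SignatureModel buildConversion(std::size_t numInputs,
                               std::size_t numBlockArgs) {
  assert(numBlockArgs >= numInputs && "block has fewer args than the type");
  // Step 1: size the conversion for all entry block arguments.
  SignatureModel sig(numBlockArgs);
  // Function-type inputs get a converted mapping (assumed 1:1 here).
  for (std::size_t i = 0; i < numInputs; ++i)
    sig.slots[i] = ArgMapping{i, false};
  // Step 2: extra block arguments (e.g. workgroup memory) map to themselves.
  for (std::size_t i = numInputs; i < numBlockArgs; ++i)
    sig.slots[i] = ArgMapping{i, true};
  return sig;
}
```

With two function-type inputs and four entry block arguments, `buildConversion(2, 4)` yields four populated slots, so a loop over every block argument never indexes past the end.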
[mlir][scf] Fix crash in extractFixedOuterLoops with iter_args loops (#184106)
The stripmineSink helper splices loop body operations into a new inner
scf.for that has no iter_args. When the target loop carries iter_args,
values yielded by the spliced body are moved inside the inner loop, but
the outer loop's yield terminator still references those values,
creating an SSA invariant violation. In debug builds this triggers the
assertion
use_empty() && "Cannot destroy a value that still has uses!"
when the outer RewriterBase tries to erase the now-broken operations.
Fix: in extractFixedOuterLoops, skip the strip-mining transformation if
any of the collected perfectly-nested loops have iter_args.
Add a regression test to parametric-tiling.mlir.
Fixes #129044
Assisted-by: Claude Code
[MLIR] Fix crash in ValueBoundsConstraintSet for non-entry block args (#185048)
When two vector transfer ops share a non-entry block argument as an
index (e.g., in a loop with unstructured control flow), calling
`ValueBoundsConstraintSet::areEqual` on those values caused a crash.
The first `populateConstraints` call would insert the block argument
into the constraint set. The second call found it already mapped and
called `getPos`, which hit an assert requiring the value to be either an
OpResult or an entry-block argument.
Fix with two changes:
1. In `insert()`, suppress adding non-entry block arguments to the
worklist. `ValueBoundsOpInterface` cannot derive bounds for such values,
so the worklist push was a no-op and triggered the re-entrant `getPos`
call.
2. Remove the overly conservative assert in `getPos`. Looking up a
previously inserted non-entry block argument is valid; the assert was
preventing legitimate use after the value had already been inserted.
[3 lines not shown]
[X86] Add mayLoad/mayStore to legacy instructions CMPS/LODS/MOVS/SCAS/STOS (#185689)
When LLVM is used to disassemble instructions, the legacy X86 string
instructions don't report memory accesses via mayLoad and mayStore.
Note that INS and OUTS may also need such flags, but I'm not totally
sure which ones.
system(3): Address test robustness issue
Don't assume that SIGINT and SIGQUIT are set to SIG_DFL at the start
of the test. Instead, retrieve their current dispositions and verify
that they are restored at the end of the test.
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D55709
(cherry picked from commit 48368f702423742b2a7dff7ad3191625e8bf26f0)
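The approach described in the test-robustness commit above (record the current SIGINT and SIGQUIT dispositions, then verify they are restored) might look roughly like the following. This is a sketch, not the actual FreeBSD test: `SavedDispositions`, `saveDispositions`, `sameHandler` and `systemRestoresDispositions` are hypothetical names, and only the handler is compared, not the signal mask.

```cpp
#include <csignal>
#include <cstdlib>
#include <signal.h>

// Hypothetical holder for the two dispositions that system(3) touches.
struct SavedDispositions {
  struct sigaction intr; // SIGINT
  struct sigaction quit; // SIGQUIT
};

// Query the current dispositions without changing them
// (a null "new action" makes sigaction() read-only).
bool saveDispositions(SavedDispositions &out) {
  return sigaction(SIGINT, nullptr, &out.intr) == 0 &&
         sigaction(SIGQUIT, nullptr, &out.quit) == 0;
}

// Compare the handler that is actually in effect for each action.
bool sameHandler(const struct sigaction &a, const struct sigaction &b) {
  if ((a.sa_flags & SA_SIGINFO) != (b.sa_flags & SA_SIGINFO))
    return false;
  return (a.sa_flags & SA_SIGINFO) ? a.sa_sigaction == b.sa_sigaction
                                   : a.sa_handler == b.sa_handler;
}

// Returns true if both dispositions are unchanged after system() returns.
bool systemRestoresDispositions(const char *cmd) {
  SavedDispositions before{}, after{};
  if (!saveDispositions(before))
    return false;
  if (std::system(cmd) == -1)
    return false;
  if (!saveDispositions(after))
    return false;
  return sameHandler(before.intr, after.intr) &&
         sameHandler(before.quit, after.quit);
}
```

Because the "before" state is read rather than assumed, the check also works when the caller starts with SIGINT ignored, which is the situation the commit is guarding against.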
system(3): Fix brain glitch in previous commit
We were saving SIGINT twice instead of SIGINT and SIGQUIT.
Also restore original order of operations (SIGINT then SIGQUIT), which
matches the order in which they're discussed in the POSIX description
[7 lines not shown]
system(3): Unwrap execve()
There is no need to call execl(), which will allocate an array and copy
our arguments into it, when we can use a static array and call execve()
directly.
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D55648
(cherry picked from commit 40e52e0edd038460a2a2aca017b3ac5a513fe37b)
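To illustrate the execve() change above: execl() has to assemble an argument vector from its variadic arguments, while execve() accepts a vector the caller already has. The sketch below is an assumption-laden illustration, not the libc implementation: `runShell` is a hypothetical helper, it uses a fixed-size local array, and it omits the signal handling and EINTR retries that a real system(3) needs.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

// Hypothetical helper: run "sh -c command" and return its exit status,
// or -1 on failure or abnormal termination.
int runShell(const char *command) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    // Fixed-size argument vector: nothing is allocated or copied before
    // the exec, unlike the vector execl() builds from its varargs.
    const char *argv[] = {"sh", "-c", command, nullptr};
    execve("/bin/sh", const_cast<char *const *>(argv), environ);
    _exit(127); // only reached if execve() failed
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The `const_cast` is needed only because execve() takes `char *const argv[]`; the strings themselves are not modified.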
[VPlan] Handle FindLast in VPIRFlags::printFlags (#185857)
Noticed this when -vplan-print-after-all crashed on a find-last
reduction. We don't yet return an opcode for it because there's no
in-loop reduction.
py-mmh3: updated to 5.2.1
5.2.1
Added
Add support for the Android wheel for Python 3.14.
Removed
Drop support for Python 3.9, as it reached end of life on 2025-10-31.
py-lazy_loader: updated to 0.5
0.5
Enhancements
- Add `suppress_warning` parameter to the `load` function
Bug Fixes
- fix: Remove problematic try/finally block
- Make sure that `__dir__` returns new copies of `__all__`
- Allow disabled eager loading with EAGER_IMPORT=0
Documentation
- Update release process doc
[X86] Optimized ADD + ADC to ADC (#173543)
This patch folds an `adc` followed by an `add` into a single `adc` instruction when adding constants.
Fixes #173408
libclc: Add frexp_exp utility function
Many functions want to extract the exponent and
currently rely on bithacking to do it. These can be
better handled with frexp. AMDGPU has a dedicated
instruction for each of the frexp return values. Other
targets could override this to do the bithacking (though
they would be better off teaching codegen to optimize
frexp with a discarded output).
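For comparison, here is a host-side sketch of the two approaches the libclc message contrasts. It is written in C++ rather than OpenCL C, and `frexpExp` and `bithackExp` are hypothetical names, not the libclc functions. The bit-level version is only valid for normal, finite, non-zero floats, which is part of why frexp is the more robust choice.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Exponent via frexp: x == m * 2^e with 0.5 <= |m| < 1.
// The mantissa result is deliberately discarded.
int frexpExp(float x) {
  int e = 0;
  (void)std::frexp(x, &e);
  return e;
}

// Exponent via bit manipulation of the IEEE-754 binary32 encoding.
// Assumes x is a normal, finite, non-zero float.
int bithackExp(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  // Biased exponent is in bits 23..30; the bias is 127, and frexp's
  // convention (mantissa in [0.5, 1)) adds one, hence 126.
  return static_cast<int>((bits >> 23) & 0xFF) - 126;
}
```

For normal inputs the two agree, for example both return 4 for `8.0f`, since 8 = 0.5 * 2^4.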
[StackSlotColoring] Check for zero stack slot size in RemoveDeadStores (#182673)
The default implementations of the methods isLoadFromStackSlot() and
isStoreToStackSlot() used in StackSlotColoring::RemoveDeadStores() set
the number of bytes loaded from the stack (MemBytes) to zero to indicate
that the value is unknown. This means that
StackSlotColoring::RemoveDeadStores() must abort if the size is zero;
otherwise the stack slot size check doesn't mean anything.
As backends that use this are required to override the default
implementations, this should not cause any degradation of the generated
code. As the registers must also match in
StackSlotColoring::RemoveDeadStores() for the store to be optimized
away, there is only a small risk of this being a real bug.
---------
Co-authored-by: Karl-Johan Karlsson <karl-johan.karlsson at ericsson.com>
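The guard described in the StackSlotColoring message can be sketched as a standalone predicate. This is an illustrative model only: `SlotAccess` and `canRemoveDeadStore` are hypothetical, and the exact set of conditions checked in the real `RemoveDeadStores()` is assumed rather than copied from the pass.

```cpp
// Hypothetical description of one stack slot load or store.
struct SlotAccess {
  int reg;           // register loaded into or stored from
  int frameIndex;    // stack slot being accessed
  unsigned memBytes; // bytes accessed; 0 means "unknown"
};

// Returns true only if a load followed by a store of the same register
// to the same slot can be treated as a dead store.
bool canRemoveDeadStore(const SlotAccess &load, const SlotAccess &store) {
  // A size of zero carries no information, so any size comparison made
  // with it would be meaningless. Bail out first.
  if (load.memBytes == 0 || store.memBytes == 0)
    return false;
  // Slot and register must match (assumed conditions).
  if (load.frameIndex != store.frameIndex || load.reg != store.reg)
    return false;
  // Only now is comparing the sizes meaningful.
  return load.memBytes == store.memBytes;
}
```

Without the first check, two accesses with unknown size would compare equal (0 == 0) and pass the size test by accident.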