[orc-rt] Add managed-code-calls TaskGroup. (#190740)
Adds a ManagedCodeCallsGroup TaskGroup to Session, and updates the
shutdown sequence to wait until all calls into managed code have
completed before proceeding to shut down the Session's Services and the
Session itself.
To support safe calls into managed code, two new helper template methods
are added:
* callManagedCodeSync attempts to acquire a TaskGroup::Token for the
  ManagedCodeCallsGroup before calling the given function and returning
  its result.
* callManagedCodeAsync attempts to acquire a TaskGroup::Token for the
  ManagedCodeCallsGroup before calling the given async function. The
  wrapped Return call for the async function will carry the acquired
  Token, ensuring that shutdown waits for the async Return call to be
  destroyed (whether or not it's actually called).
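As a rough illustration of the token pattern described above, the
following standalone C++ sketch models a group whose outstanding tokens
keep shutdown waiting. MiniTaskGroup, its Token, tryAcquire,
closeAndWait and callSync are hypothetical names invented for this
example; they are not the orc-rt API, and the real TaskGroup and
callManagedCodeSync signatures may differ.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical, simplified stand-in for a task group (not the orc-rt class).
class MiniTaskGroup {
public:
  // Move-only handle: while any Token is alive, closeAndWait() blocks.
  class Token {
  public:
    Token() = default;
    explicit Token(MiniTaskGroup *Group) : G(Group) {}
    Token(Token &&O) noexcept : G(std::exchange(O.G, nullptr)) {}
    Token &operator=(Token &&O) noexcept {
      if (this != &O) {
        release();
        G = std::exchange(O.G, nullptr);
      }
      return *this;
    }
    Token(const Token &) = delete;
    Token &operator=(const Token &) = delete;
    ~Token() { release(); }
    explicit operator bool() const { return G != nullptr; }

  private:
    void release() {
      if (!G)
        return;
      std::lock_guard<std::mutex> L(G->M);
      if (--G->Outstanding == 0)
        G->CV.notify_all();
      G = nullptr;
    }
    MiniTaskGroup *G = nullptr;
  };

  // Returns an empty Token once the group has been closed.
  Token tryAcquire() {
    std::lock_guard<std::mutex> L(M);
    if (Closed)
      return Token();
    ++Outstanding;
    return Token(this);
  }

  // Rejects new tokens, then blocks until all existing ones are destroyed.
  void closeAndWait() {
    std::unique_lock<std::mutex> L(M);
    Closed = true;
    CV.wait(L, [this] { return Outstanding == 0; });
  }

private:
  std::mutex M;
  std::condition_variable CV;
  unsigned Outstanding = 0;
  bool Closed = false;
};

// Runs F only if a token could be acquired; the token is released when
// this function returns. Only non-void callables are handled here.
template <typename Fn>
auto callSync(MiniTaskGroup &G, Fn &&F) -> std::optional<decltype(F())> {
  MiniTaskGroup::Token T = G.tryAcquire();
  if (!T)
    return std::nullopt;
  return F();
}
```

In this sketch a call made after closeAndWait() has run yields an empty
optional instead of running the function, which mirrors the "attempts
to acquire" wording above. An async variant would move the Token into
the completion callback so that it is released when that callback is
destroyed.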
[mlir][BytecodeReader] Fix crash reading FusedLoc with empty locations (#189228)
FusedLoc::get(context, locs) may return UnknownLoc when locs is empty
and no metadata is provided. The bytecode reader's cBuilder used
cast<FusedLoc>() on this result, which crashes with an assertion
failure.
Fix by giving the FusedLoc DialectAttribute its own cBuilder that passes
Attribute() explicitly, causing getChecked<FusedLoc> to call the
two-parameter storage constructor directly and always produce a
FusedLoc.
Fixes #99626
Assisted-by: Claude Code
[clang][ExtractAPI] emit correct spelling for type aliases (#134007)
Previously, C++11 type aliases were serialized using "typedef"
regardless of the source spelling.
This checks if the TypedefNameDecl is actually a TypeAliasDecl and
corrects the spelling.
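For illustration, the two declarations below name the same type but are
spelled differently in source; per the description above, only the
first should be serialized with "typedef". The names LegacyHandle and
ModernHandle are made up for this example.

```cpp
#include <type_traits>

// C-style typedef: represented as a TypedefDecl in the Clang AST.
typedef unsigned long LegacyHandle;

// C++11 alias-declaration: represented as a TypeAliasDecl in the Clang AST.
using ModernHandle = unsigned long;

// Both declarations introduce a name for the same underlying type.
static_assert(std::is_same<LegacyHandle, ModernHandle>::value,
              "both spellings name the same type");
```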
[flang][OpenMP] Use OmpDirectiveSpecifications in helper functions (#190644)
This will make them more reusable, for example when processing the APPLY
clause in the future.
Issue: https://github.com/llvm/llvm-project/issues/185287
[VPlan] Optimize FindLast of (binop %IV, live-in) by sinking. (#183911)
When finding the last value of an expression that depends on an
induction variable, we can vectorize by selecting just the IV and
sinking the expression into the middle block.
This follows one of @ayalz's suggestions during earlier discussions for
adding support for CAS/FindLast patterns.
This patch starts with the simplest case, where the selected value is a
simple binary expression of a wide IV and a loop-invariant operand.
This should always be profitable: the current restriction to binary
operators ensures that the width of the wide IV matches the original
reduction width, so we do not introduce any new, wider reduction phi
recipes, and we remove the boolean reduction and the horizontal
reduction in the loop.
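A scalar C++ sketch of the idea, assuming a loop that records I + Inv
whenever a condition holds. The function names, the A[I] > 0 condition
and the -1 sentinel are hypothetical; the actual transform operates on
VPlan recipes and vector lanes, not on source code like this.

```cpp
#include <cstdint>
#include <vector>

// Original shape: keep the last value of (I + Inv) for which the
// condition held, or Start if it never held.
int64_t findLastDirect(const std::vector<int> &A, int64_t Inv,
                       int64_t Start) {
  int64_t Res = Start;
  const int64_t E = static_cast<int64_t>(A.size());
  for (int64_t I = 0; I != E; ++I)
    if (A[I] > 0)
      Res = I + Inv; // binop of the induction variable and a live-in
  return Res;
}

// Sunk shape: the loop tracks only the last matching IV; the binop is
// applied once, after the loop.
int64_t findLastSunk(const std::vector<int> &A, int64_t Inv,
                     int64_t Start) {
  int64_t LastIV = -1; // sentinel meaning "no iteration matched"
  const int64_t E = static_cast<int64_t>(A.size());
  for (int64_t I = 0; I != E; ++I)
    if (A[I] > 0)
      LastIV = I;
  return LastIV < 0 ? Start : LastIV + Inv;
}
```

Both functions return the same result for any input; the second keeps
the expression out of the loop body, which is the property the sinking
relies on.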
PR: https://github.com/llvm/llvm-project/pull/183911
[CodeGen] Fix incorrect rematerialization order in rematerializer (#189485)
When rematerializing DAGs of registers in which multiple paths exist
between some registers of the DAG, the rematerializer can determine an
incorrect rematerialization order, one that does not ensure that a
register's dependencies are rematerialized before the register itself.
That ordering is an invariant that is otherwise required.
This patch fixes the issue by using simpler recursive logic to determine
a correct rematerialization order that honors this invariant. A minimal
unit test is added that fails on the current implementation.
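One way to obtain an order with this property is a recursive post-order
walk, sketched below on a plain adjacency list. DepGraph, visit and
rematOrder are hypothetical names for this example and are not taken
from the rematerializer; the patch's actual logic may differ.

```cpp
#include <vector>

// Hypothetical dependency DAG: Deps[R] lists the registers R depends on.
using DepGraph = std::vector<std::vector<unsigned>>;

// Post-order walk: a register is appended only after all of its
// dependencies have been appended. Done guards against visiting a
// register twice when several paths lead to it.
void visit(unsigned Reg, const DepGraph &Deps, std::vector<bool> &Done,
           std::vector<unsigned> &Order) {
  if (Done[Reg])
    return;
  Done[Reg] = true;
  for (unsigned D : Deps[Reg])
    visit(D, Deps, Done, Order);
  Order.push_back(Reg);
}

// Returns Root and its transitive dependencies, dependencies first.
std::vector<unsigned> rematOrder(unsigned Root, const DepGraph &Deps) {
  std::vector<bool> Done(Deps.size(), false);
  std::vector<unsigned> Order;
  visit(Root, Deps, Done, Order);
  return Order;
}
```

For a graph where register 0 depends on 1 and 2, register 1 on 3, and
register 2 on both 1 and 3, the walk yields 3, 1, 2, 0: every register
appears after everything it depends on, even though 3 and 1 are each
reachable along more than one path.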
[llvm] Mark Darwin arm64 as UNSUPPORTED for 2010-11-04-BigByval.ll (#190594)
Update AArch64 UNSUPPORTED on CodeGen/Generic/2010-11-04-BigByval.ll to
include Darwin, where it is referred to as arm64 rather than aarch64.
[llvm][docs] Cleanup LLDB release notes (#190760)
* A few items were in the wrong place.
* FreeBSD batch mode check was removed in
d0f5df111865ea4bb9d7d6ff35b517ee1aa7402f.
* Mark some names as plaintext.
* Fix some spellings.
[AArch64][GlobalISel] Add patterns for scalar sqdmlal/sqdmlsl (#187246)
SQDMLAL's instruction selection patterns don't work for GlobalISel when
the intrinsic has scalar operands. This is because the intrinsic has a
slightly different name (int_aarch64_neon_sqdmulls_scalar). As a result,
this leads to sub-optimal code generation.
This patch allows sqdmulls_scalar to lower, and adds GlobalISel versions
of the TableGen patterns to provide this optimisation.
The pattern added performs this mapping:
`SQADD(a, SQDMULL(b, c)) -> SQDMLAL(a, b, c)`
(and the equivalent for subtraction).
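The mapping can be read against a scalar model of the operations,
sketched here for 16-bit inputs and a 32-bit accumulator. The helper
names sat32, sqdmull16, sqadd32 and sqdmlal16 are hypothetical and only
model the arithmetic; they are not intrinsics or LLVM APIs.

```cpp
#include <cstdint>
#include <limits>

// Clamp a 64-bit intermediate to the signed 32-bit range.
int32_t sat32(int64_t V) {
  const int64_t Max = std::numeric_limits<int32_t>::max();
  const int64_t Min = std::numeric_limits<int32_t>::min();
  if (V > Max)
    return static_cast<int32_t>(Max);
  if (V < Min)
    return static_cast<int32_t>(Min);
  return static_cast<int32_t>(V);
}

// Saturating doubling multiply long: sat32(2 * B * C).
int32_t sqdmull16(int16_t B, int16_t C) {
  return sat32(2 * static_cast<int64_t>(B) * static_cast<int64_t>(C));
}

// Saturating 32-bit add.
int32_t sqadd32(int32_t A, int32_t B) {
  return sat32(static_cast<int64_t>(A) + static_cast<int64_t>(B));
}

// Fused form: the doubled product saturates first, then the
// accumulation saturates, matching SQADD(a, SQDMULL(b, c)).
int32_t sqdmlal16(int32_t A, int16_t B, int16_t C) {
  return sqadd32(A, sqdmull16(B, C));
}
```

The only product that saturates in sqdmull16 is -32768 * -32768, whose
doubled value does not fit in 32 bits.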
[X86] Add DAG combine to fold promoted f32 sequences for f16 fneg and fabs (#189395)
This patch optimizes f16 fneg and fabs on X86 targets by introducing
a DAG combine to identify and collapse fpext -> fneg/fabs -> fptrunc.
Generally f16 operations are promoted to f32. For bitwise-equivalent
operations like fneg and fabs, this results in unnecessary and
expensive f32 library calls (__extendhfsf2 / __truncsfhf2) or
hardware conversions (vcvtph2ps / vcvtps2ph) at -O0.
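The reason these two operations need no conversion is that they only
touch the sign bit of the IEEE-754 binary16 encoding. A minimal sketch
on raw 16-bit patterns follows; the function names are hypothetical,
and this shows the bit-level equivalence only, not the code the DAG
combine emits.

```cpp
#include <cstdint>

// IEEE-754 binary16: bit 15 is the sign; bits 14..0 hold the exponent
// and the mantissa.
constexpr uint16_t SignMask = 0x8000;

// fneg flips the sign bit and leaves everything else untouched.
uint16_t fnegHalfBits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits ^ SignMask);
}

// fabs clears the sign bit and leaves everything else untouched.
uint16_t fabsHalfBits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits & ~SignMask);
}
```

For example, 1.0 is encoded as 0x3C00 and -1.0 as 0xBC00, so negating
one pattern yields the other without any round trip through f32.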
Fixes: https://github.com/llvm/llvm-project/issues/188201
---------
Co-authored-by: Phoebe Wang <phoebe.wang at intel.com>
AMDGPU: Use SmallSet for VOPD scalar reg tracking
Use SmallSet instead of SmallVector for UniqueScalarRegs.
VCC_LO was pushed without a uniqueness check, so when both
components used VCC implicitly it was counted twice,
which caused valid VOPD pairings to be rejected.
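The double count can be reproduced with standard containers, using
std::vector and std::set as stand-ins for SmallVector and SmallSet. The
Reg enum and both counting functions are hypothetical and exist only
for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical register ids, for illustration only.
enum Reg : unsigned { VCC_LO = 1, SGPR0 = 2 };

// Vector-based tracking: every push is counted, duplicates included.
std::size_t countWithVector(const std::vector<Reg> &UsesX,
                            const std::vector<Reg> &UsesY) {
  std::vector<Reg> Seen;
  for (Reg R : UsesX)
    Seen.push_back(R);
  for (Reg R : UsesY)
    Seen.push_back(R);
  return Seen.size();
}

// Set-based tracking: inserting an already-present register is a no-op.
std::size_t countWithSet(const std::vector<Reg> &UsesX,
                         const std::vector<Reg> &UsesY) {
  std::set<Reg> Seen;
  for (Reg R : UsesX)
    Seen.insert(R);
  for (Reg R : UsesY)
    Seen.insert(R);
  return Seen.size();
}
```

When both components use VCC_LO, the vector version reports two scalar
registers while the set version reports one, so a limit on unique
scalar registers is checked against the intended count.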
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>