LLVM/project 543ec35 orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Add managed-code-calls TaskGroup. (#190740)

Adds a ManagedCodeCallsGroup TaskGroup to Session, and updates the
shutdown sequence to wait until all calls into managed code have
completed before proceeding to shut down the Session's Services and the
Session itself.

To support safe calls into managed code, two new helper template methods
are added:

callManagedCodeSync attempts to acquire a TaskGroup::Token for the
ManagedCodeCallsGroup before calling the given function and returning
its result.

callManagedCodeAsync attempts to acquire a TaskGroup::Token for the
ManagedCodeCallsGroup before calling the given async function. The
wrapped Return call for the async function will carry the acquired
Token, ensuring that shutdown waits for the async Return call to be
destroyed (whether or not it's actually called).
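
The token scheme described above can be modelled in a few lines of standard C++. This is only a sketch under stated assumptions: `SketchTaskGroup`, `tryAcquire`, `closeAndWait`, and `callGuardedSync` are hypothetical names, not the orc-rt API, and the sync helper assumes the wrapped function returns a non-void value.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical stand-in for a task group; NOT the orc-rt TaskGroup API.
class SketchTaskGroup {
public:
  // Move-only handle: the group counts one outstanding call per live Token.
  class Token {
  public:
    Token() = default;
    explicit Token(SketchTaskGroup *G) : G(G) {}
    Token(Token &&Other) noexcept : G(std::exchange(Other.G, nullptr)) {}
    Token &operator=(Token &&Other) noexcept {
      if (this != &Other) {
        release();
        G = std::exchange(Other.G, nullptr);
      }
      return *this;
    }
    Token(const Token &) = delete;
    Token &operator=(const Token &) = delete;
    ~Token() { release(); }

  private:
    void release() {
      if (G) {
        G->done();
        G = nullptr;
      }
    }
    SketchTaskGroup *G = nullptr;
  };

  // Returns a Token, or std::nullopt once closeAndWait() has been called.
  std::optional<Token> tryAcquire() {
    std::lock_guard<std::mutex> Lock(M);
    if (Closed)
      return std::nullopt;
    ++Outstanding;
    return Token(this);
  }

  // Refuse new tokens, then block until every live Token is destroyed.
  void closeAndWait() {
    std::unique_lock<std::mutex> Lock(M);
    Closed = true;
    CV.wait(Lock, [this] { return Outstanding == 0; });
  }

  unsigned outstanding() {
    std::lock_guard<std::mutex> Lock(M);
    return Outstanding;
  }

private:
  void done() {
    std::lock_guard<std::mutex> Lock(M);
    if (--Outstanding == 0)
      CV.notify_all();
  }
  std::mutex M;
  std::condition_variable CV;
  unsigned Outstanding = 0;
  bool Closed = false;
};

// Sync helper: run F only if a Token could be acquired first.
template <typename Fn>
auto callGuardedSync(SketchTaskGroup &G, Fn &&F)
    -> std::optional<decltype(F())> {
  auto T = G.tryAcquire();
  if (!T)
    return std::nullopt;
  return F(); // T is released when this function returns.
}
```

An async variant would move the Token into the wrapped return callback instead of holding it on the stack, so the group stays open until that callback object is destroyed, whether or not it was ever invoked.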
Delta File
+218 -1 orc-rt/unittests/SessionTest.cpp
+126 -2 orc-rt/include/orc-rt/Session.h
+26 -17 orc-rt/lib/executor/Session.cpp
+370 -20, 3 files

LLVM/project 546787e mlir/include/mlir/Dialect/SPIRV/IR SPIRVTosaTypes.td, mlir/test/Dialect/SPIRV/IR tosa-ops-verification.mlir

[mlir][spirv] Fix SPIRV TOSA per-channel rescale length verification (#190748)

`TensorLengthMatchesPerChannel` was checking `rank(input) - 1` instead
of `input_shape[rank(input) - 1]`. Fix the predicate and update the
rescale verifier tests accordingly.
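
The difference between the two checks is easy to see in plain C++. The real predicate is written in TableGen; the function names below are hypothetical and only paraphrase the two conditions quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Buggy form: compares the length against the index of the last dimension,
// i.e. rank(input) - 1.
bool matchesPerChannelBuggy(const std::vector<int64_t> &inputShape,
                            int64_t length) {
  return length == static_cast<int64_t>(inputShape.size()) - 1;
}

// Fixed form: compares the length against the size of the last dimension,
// i.e. input_shape[rank(input) - 1].
bool matchesPerChannelFixed(const std::vector<int64_t> &inputShape,
                            int64_t length) {
  return !inputShape.empty() && length == inputShape.back();
}
```

For an input of shape 1x8x8x16, the buggy form accepts a per-channel tensor of length 3 and rejects one of length 16; the fixed form does the opposite.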

Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
Delta File
+8 -8 mlir/test/Dialect/SPIRV/IR/tosa-ops-verification.mlir
+2 -2 mlir/include/mlir/Dialect/SPIRV/IR/SPIRVTosaTypes.td
+10 -10, 2 files

LLVM/project aa4c76d mlir/include/mlir/IR BuiltinDialectBytecode.td, mlir/unittests/Bytecode BytecodeTest.cpp

[mlir][BytecodeReader] Fix crash reading FusedLoc with empty locations (#189228)

FusedLoc::get(context, locs) may return UnknownLoc when locs is empty
and no metadata is provided. The bytecode reader's cBuilder used
cast<FusedLoc>() on this result, which crashes with an assertion
failure.

Fix by giving the FusedLoc DialectAttribute its own cBuilder that passes
Attribute() explicitly, causing getChecked<FusedLoc> to call the
two-parameter storage constructor directly and always produce a
FusedLoc.

Fixes #99626

Assisted-by: Claude Code
Delta File
+26 -0 mlir/unittests/Bytecode/BytecodeTest.cpp
+6 -2 mlir/include/mlir/IR/BuiltinDialectBytecode.td
+32 -2, 2 files

LLVM/project fe8a597 clang/lib/ExtractAPI DeclarationFragments.cpp, clang/test/ExtractAPI type-alias.cpp

[clang][ExtractAPI] emit correct spelling for type aliases (#134007)

Previously, C++11 type aliases were serialized using "typedef"
regardless of the source spelling.
This checks if the TypedefNameDecl is actually a TypeAliasDecl and
corrects the spelling.
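
For reference, the two source spellings in question look like this. The declaration names are made up for illustration; both declare the same type, which is why the serialized spelling has to come from the declaration kind rather than from the type.

```cpp
#include <cassert>
#include <type_traits>

typedef int LegacyCounter;  // typedef spelling
using ModernCounter = int;  // C++11 alias-declaration spelling

// Both names denote the same type; only the source spelling differs.
static_assert(std::is_same<LegacyCounter, ModernCounter>::value,
              "typedef and alias-declaration name the same type");
```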
Delta File
+56 -0 clang/test/ExtractAPI/type-alias.cpp
+19 -7 clang/lib/ExtractAPI/DeclarationFragments.cpp
+1 -1 clang/test/Index/extract-api-cursor-cpp.cpp
+76 -8, 3 files

LLVM/project de6d86c flang/include/flang/Semantics openmp-utils.h, flang/lib/Semantics openmp-utils.cpp

[flang][OpenMP] Use OmpDirectiveSpecifications in helper functions (#190644)

This will make them more reusable, for example when processing the APPLY
clause in the future.

Issue: https://github.com/llvm/llvm-project/issues/185287
Delta File
+45 -54 flang/lib/Semantics/openmp-utils.cpp
+3 -3 flang/include/flang/Semantics/openmp-utils.h
+48 -57, 2 files

LLVM/project 3f583d4 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize find-last-iv-sinkable-expr.ll find-last-iv-sinkable-expr-epilogue.ll

[VPlan] Optimize FindLast of (binop %IV, live-in) by sinking. (#183911)

When finding the last occurrence of the value of an expression that
depends on an induction, we can vectorize this by just selecting the IV
and sinking the expression into the middle block.

This follows one of @ayalz's suggestions during earlier discussions for
adding support for CAS/FindLast patterns.

This patch starts with the simplest case, where the selected value is a
simple binary expression of a wide IV and a loop-invariant operand.

This should always be profitable: the current restriction to binary
operators ensures that the width of the wide IV matches the original
reduction width, so we don't introduce any new, wider reduction phi
recipes, and we remove the boolean reduction and the horizontal
reduction in the loop.
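
A scalar model of the idea, not the VPlan transform itself: the loop only needs to remember the last matching IV, and the binary expression can be applied once afterwards. The function names, the `a[i] > threshold` condition, and the `-1` sentinel are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original form: the selected value (i + c) is recomputed inside the loop
// every time the condition holds.
int64_t findLastInLoop(const std::vector<int> &a, int threshold, int64_t c,
                       int64_t start) {
  int64_t result = start;
  for (int64_t i = 0; i < static_cast<int64_t>(a.size()); ++i)
    if (a[i] > threshold)
      result = i + c;
  return result;
}

// Sunk form: the loop tracks only the last matching IV; (i + c) is computed
// once after the loop, which is where a middle block would sit.
int64_t findLastSunk(const std::vector<int> &a, int threshold, int64_t c,
                     int64_t start) {
  int64_t lastIV = -1; // sentinel meaning "no iteration matched"
  for (int64_t i = 0; i < static_cast<int64_t>(a.size()); ++i)
    if (a[i] > threshold)
      lastIV = i;
  return lastIV >= 0 ? lastIV + c : start;
}
```

Both functions return the same value for any input; the second keeps the loop body down to a compare and a select on the induction variable.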

PR: https://github.com/llvm/llvm-project/pull/183911
Delta File
+480 -32 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr.ll
+105 -11 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+61 -12 llvm/test/Transforms/LoopVectorize/AArch64/find-last-iv-sinkable-expr-epilogue.ll
+24 -30 llvm/test/Transforms/LoopVectorize/find-last-iv-sinkable-expr-epilogue.ll
+19 -28 llvm/test/Transforms/LoopVectorize/iv-select-cmp-decreasing.ll
+20 -3 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+709 -116, 6 files

LLVM/project db3302d llvm/lib/CodeGen Rematerializer.cpp, llvm/unittests/CodeGen RematerializerTest.cpp

[CodeGen] Fix incorrect rematerialization order in rematerializer (#189485)

When rematerializing DAGs of registers wherein multiple paths exist
between some registers of the DAG, it is possible that the
rematerializer determines an incorrect rematerialization order that
does not ensure that a register's dependencies are rematerialized before
the register itself, an invariant that is otherwise required.

This fixes that using a simpler recursive logic to determine a correct
rematerialization order that honors this invariant. A minimal unit test
is added that fails on the current implementation.
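
One common way to get such an order is a recursive post-order walk with a visited set, sketched below. This is an illustration of the invariant only; the commit does not spell out its exact algorithm, and `Reg`, `DepGraph`, and `rematOrder` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Reg = int;
// Deps[R] lists the registers that R depends on.
using DepGraph = std::map<Reg, std::vector<Reg>>;

// Post-order walk: a register is appended only after all of its
// dependencies, and the Seen set keeps it from being appended twice when
// it is reachable along several paths.
static void visit(Reg R, const DepGraph &Deps, std::set<Reg> &Seen,
                  std::vector<Reg> &Order) {
  if (!Seen.insert(R).second)
    return;
  auto It = Deps.find(R);
  if (It != Deps.end())
    for (Reg D : It->second)
      visit(D, Deps, Seen, Order);
  Order.push_back(R);
}

std::vector<Reg> rematOrder(Reg Root, const DepGraph &Deps) {
  std::set<Reg> Seen;
  std::vector<Reg> Order;
  visit(Root, Deps, Seen, Order);
  return Order;
}
```

With register 3 depending on 1 and 2, register 2 depending on 1, and register 1 depending on 0, there are two paths from 3 to 1; the walk still emits 1 before both 2 and 3.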
Delta File
+19 -33 llvm/lib/CodeGen/Rematerializer.cpp
+38 -0 llvm/unittests/CodeGen/RematerializerTest.cpp
+57 -33, 2 files

LLVM/project b79a6b5 llvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments:
use input wave instruction for checks
Delta File
+7 -7 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7 -7, 1 files

LLVM/project b09f286 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] DPP wave reduction for long types - 2

Supported Ops: `add`, `sub`
Delta File
+1,113 -146 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,079 -142 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+72 -20 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,264 -308, 3 files

LLVM/project bac21be llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fsub.ll llvm.amdgcn.reduce.fadd.ll

[AMDGPU] DPP wave reduction for double types - 2

Supported Ops: `fadd` and `fsub`
Delta File
+1,030 -130 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+1,008 -130 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+12 -10 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,050 -270, 3 files

LLVM/project 44172da llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] DPP wave reduction for double types - 1

Supported Ops: `fmin` and `fmax`
Delta File
+1,112 -234 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+1,112 -234 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+27 -13 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,251 -481, 3 files

LLVM/project 1639f1f llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.and.ll

[AMDGPU] DPP wave reduction for long types - 3

Supported Ops: `and`, `or`, `xor`
Delta File
+984 -132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+960 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+960 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+12 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,916 -349, 4 files

LLVM/project a8dff3e llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.max.ll llvm.amdgcn.reduce.min.ll

[AMDGPU] DPP wave reduction for long types - 1

Supported Ops: `min`, `max`, `umin`, `umax`
Delta File
+1,084 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+1,084 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+1,044 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
+1,044 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umin.ll
+185 -43 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+4,441 -475, 5 files

LLVM/project e129d09 llvm/lib/Target/AMDGPU SIISelLowering.cpp

Avoid capturing the structured binding.
Delta File
+3 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+3 -1, 1 files

LLVM/project cb1a912 llvm/test/CodeGen/Generic 2010-11-04-BigByval.ll

[llvm] Mark Darwin arm64 to UNSUPPORTED for 2010-11-04-BigByval.ll (#190594)

Update AArch64 UNSUPPORTED on CodeGen/Generic/2010-11-04-BigByval.ll to
include Darwin, where it is referred to as arm64 rather than aarch64.
Delta File
+1 -0 llvm/test/CodeGen/Generic/2010-11-04-BigByval.ll
+1 -0, 1 files

LLVM/project 668bf98 mlir/lib/Dialect/Vector/IR VectorOps.cpp, mlir/test/Dialect/Vector canonicalize.mlir

[mlir][vector] Replace unused shuffle operands / results with poison
Delta File
+70 -0 mlir/test/Dialect/Vector/canonicalize.mlir
+47 -2 mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+6 -4 mlir/test/Dialect/XeGPU/xegpu-vector-linearize.mlir
+123 -6, 3 files

LLVM/project c1d6eea llvm/docs ReleaseNotes.md

[llvm][docs] Cleanup LLDB release notes (#190760)

* A few items were in the wrong place.
* FreeBSD batch mode check was removed in
d0f5df111865ea4bb9d7d6ff35b517ee1aa7402f.
* Mark some names as plaintext.
* Fix some spellings.
Delta File
+7 -7 llvm/docs/ReleaseNotes.md
+7 -7, 1 files

LLVM/project 36fa27f llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp

[AArch64][GlobalISel] Add patterns for scalar sqdmlal/sqdmlsl (#187246)

SQDMLAL's instruction selection patterns don't work for GlobalISel when
the intrinsic has scalar operands. This is because the intrinsic has a
slightly different name (int_aarch64_neon_sqdmulls_scalar). As a result,
this leads to sub-optimal code generation.
This patch allows sqdmulls_scalar to lower, and adds GlobalISel versions
of the TableGen patterns to provide this optimisation.

The pattern added performs this mapping:
`SQADD(a, SQDMULL(b, c)) -> SQDMLAL(a, b, c)` (and the equivalent for
subtraction).
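
A scalar model of why the fold is value-preserving, assuming 16-bit inputs and a 32-bit accumulator. The helper names mirror the instruction mnemonics but are plain C++ functions written for this sketch, not LLVM or ACLE APIs.

```cpp
#include <cassert>
#include <cstdint>

// Clamp a 64-bit intermediate to the signed 32-bit range.
static int32_t saturate32(int64_t V) {
  if (V > INT32_MAX)
    return INT32_MAX;
  if (V < INT32_MIN)
    return INT32_MIN;
  return static_cast<int32_t>(V);
}

// Two-step form: saturating doubling multiply, then saturating add/sub.
int32_t sqdmull(int16_t B, int16_t C) {
  return saturate32(2 * static_cast<int64_t>(B) * C);
}
int32_t sqadd(int32_t A, int32_t X) {
  return saturate32(static_cast<int64_t>(A) + X);
}
int32_t sqsub(int32_t A, int32_t X) {
  return saturate32(static_cast<int64_t>(A) - X);
}

// Fused form: saturate the doubled product, then saturate the accumulate.
int32_t sqdmlal(int32_t A, int16_t B, int16_t C) {
  int64_t Product = saturate32(2 * static_cast<int64_t>(B) * C);
  return saturate32(static_cast<int64_t>(A) + Product);
}
int32_t sqdmlsl(int32_t A, int16_t B, int16_t C) {
  int64_t Product = saturate32(2 * static_cast<int64_t>(B) * C);
  return saturate32(static_cast<int64_t>(A) - Product);
}
```

The only input pair whose doubled product saturates is `-32768 * -32768`, so that case is the one worth checking alongside an ordinary value.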
Delta File
+115 -48 llvm/test/CodeGen/AArch64/arm64-vmul.ll
+24 -0 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+1 -2 llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+1 -0 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+141 -50, 4 files

LLVM/project 95f1730 libsycl/include/sycl/__impl event.hpp, libsycl/src event.cpp queue.cpp

apply comments

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta File
+10 -10 libsycl/src/detail/event_impl.hpp
+3 -6 libsycl/src/detail/event_impl.cpp
+1 -3 libsycl/src/detail/queue_impl.hpp
+1 -1 libsycl/src/event.cpp
+1 -1 libsycl/include/sycl/__impl/event.hpp
+1 -1 libsycl/src/queue.cpp
+17 -22, 1 files not shown
+17 -23, 7 files

LLVM/project aaee467 libsycl/src/detail program_manager.cpp device_binary_structures.hpp

fix comments

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta File
+3 -3 libsycl/src/detail/program_manager.cpp
+1 -3 libsycl/src/detail/device_binary_structures.hpp
+0 -1 libsycl/src/detail/queue_impl.cpp
+4 -7, 3 files

LLVM/project eb36fef libsycl/docs index.rst, libsycl/include/sycl/__impl event.hpp

[libsycl] add sycl::event and wait functionality to event & queue

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta File
+90 -0 libsycl/include/sycl/__impl/event.hpp
+68 -0 libsycl/src/detail/event_impl.hpp
+39 -0 libsycl/src/detail/event_impl.cpp
+25 -0 libsycl/src/event.cpp
+13 -1 libsycl/src/detail/queue_impl.cpp
+9 -2 libsycl/docs/index.rst
+244 -3, 5 files not shown
+264 -4, 11 files

LLVM/project 75a3379 llvm/lib/Target/AMDGPU SIISelLowering.cpp

Review comments:
use input wave instruction for checks
Delta File
+7 -7 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7 -7, 1 files

LLVM/project f0593ab cmake/Modules LLVMVersion.cmake, libcxx/include __config

Bump version to 22.1.4
Delta File
+1 -1 cmake/Modules/LLVMVersion.cmake
+1 -1 libcxx/include/__config
+1 -1 llvm/utils/gn/secondary/llvm/version.gni
+1 -1 llvm/utils/mlgo-utils/mlgo/__init__.py
+1 -1 llvm/utils/lit/lit/__init__.py
+5 -5, 5 files

LLVM/project b914982 llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv vfdiv-vp.ll fixed-vectors-vfdiv-vp.ll

[RISCV] Remove codegen for vp_fdiv (#190591)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off the vp.fdiv intrinsic from #179622.
Delta File
+344 -594 llvm/test/CodeGen/RISCV/rvv/vfdiv-vp.ll
+135 -143 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfdiv-vp.ll
+45 -45 llvm/test/CodeGen/RISCV/rvv/vfrdiv-vp.ll
+36 -36 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfrdiv-vp.ll
+8 -12 llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
+1 -4 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+569 -834, 2 files not shown
+573 -835, 8 files

LLVM/project 048a5e7 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fsub.ll llvm.amdgcn.reduce.fadd.ll

[AMDGPU] DPP wave reduction for double types - 2

Supported Ops: `fadd` and `fsub`
Delta File
+1,030 -130 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fsub.ll
+1,008 -130 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fadd.ll
+12 -10 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,050 -270, 3 files

LLVM/project c18c909 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.fmax.ll llvm.amdgcn.reduce.fmin.ll

[AMDGPU] DPP wave reduction for double types - 1

Supported Ops: `fmin` and `fmax`
Delta File
+1,112 -234 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmax.ll
+1,112 -234 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.fmin.ll
+27 -13 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,251 -481, 3 files

LLVM/project 55a3382 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 half-fneg-fabs.ll

[X86] Add DAG combine to fold promoted f32 sequences for f16 fneg and fabs (#189395)

This patch optimizes f16 fneg and fabs on X86 targets by introducing
a DAG combine to identify and collapse fpext -> fneg/fabs -> fptrunc.

Generally, f16 operations are promoted to f32. For bitwise-equivalent
operations like fneg and fabs, this results in unnecessary and
expensive f32 library calls (__extendhfsf2 / __truncsfhf2) or
hardware conversions (vcvtph2ps / vcvtps2ph) at -O0.
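
The reason no conversion is needed is that both operations touch only the sign bit of the 16-bit encoding. The sketch below shows that bit-level identity on raw IEEE 754 binary16 bit patterns; it is not the DAG combine, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16: bit 15 is the sign bit.
constexpr uint16_t kF16SignMask = 0x8000;

// fneg flips the sign bit and leaves exponent and mantissa untouched.
constexpr uint16_t fnegF16Bits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits ^ kF16SignMask);
}

// fabs clears the sign bit and leaves exponent and mantissa untouched.
constexpr uint16_t fabsF16Bits(uint16_t Bits) {
  return static_cast<uint16_t>(Bits & ~kF16SignMask);
}
```

In binary16, `0x3C00` encodes 1.0 and `0xBC00` encodes -1.0, so `fnegF16Bits(0x3C00)` yields `0xBC00` and `fabsF16Bits(0xBC00)` yields `0x3C00`.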

Fixes: https://github.com/llvm/llvm-project/issues/188201

---------

Co-authored-by: Phoebe Wang <phoebe.wang at intel.com>
Delta File
+79 -0 llvm/test/CodeGen/X86/half-fneg-fabs.ll
+14 -4 llvm/lib/Target/X86/X86ISelLowering.cpp
+93 -4, 2 files

LLVM/project 28faaff llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.xor.ll llvm.amdgcn.reduce.and.ll

[AMDGPU] DPP wave reduction for long types - 3

Supported Ops: `and`, `or`, `xor`
Delta File
+984 -132 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+960 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.and.ll
+960 -108 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.or.ll
+12 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,916 -349, 4 files

LLVM/project 00f35a1 llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

[AMDGPU] DPP wave reduction for long types - 2

Supported Ops: `add`, `sub`
Delta File
+1,113 -146 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+1,079 -142 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+72 -20 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2,264 -308, 3 files

LLVM/project a455799 llvm/test/CodeGen/AMDGPU frem.ll llvm.minimum.f16.ll

AMDGPU: Use SmallSet for VOPD scalar reg tracking

Use SmallSet instead of SmallVector for UniqueScalarRegs.
VCC_LO was pushed without a uniqueness check, so when both
components used VCC implicitly it was counted twice,
rejecting valid VOPD pairings.
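
The counting problem can be reproduced with standard containers. In this simplified sketch `std::vector` and `std::set` stand in for `SmallVector` and `SmallSet`, the register id for `VCC_LO` is made up, and the two operand lists represent the scalar registers used by each VOPD component.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Reg = unsigned;
constexpr Reg VCC_LO = 100; // hypothetical register id for illustration

// Vector-based tracking: a register appended by both components without a
// uniqueness check is counted once per append.
std::size_t countWithVector(const std::vector<Reg> &X,
                            const std::vector<Reg> &Y) {
  std::vector<Reg> Regs(X);
  Regs.insert(Regs.end(), Y.begin(), Y.end());
  return Regs.size();
}

// Set-based tracking: inserting the same register twice is a no-op, so the
// size is the number of distinct registers.
std::size_t countWithSet(const std::vector<Reg> &X,
                         const std::vector<Reg> &Y) {
  std::set<Reg> Regs(X.begin(), X.end());
  Regs.insert(Y.begin(), Y.end());
  return Regs.size();
}
```

With components using `{1, VCC_LO}` and `{2, VCC_LO}`, the vector reports four scalar registers where only three distinct ones are in use, which is the kind of overcount that can push a valid pairing past a register limit.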

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
Delta File
+38 -41 llvm/test/CodeGen/AMDGPU/frem.ll
+23 -25 llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll
+23 -25 llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll
+13 -14 llvm/test/CodeGen/AMDGPU/llvm.maximum.f32.ll
+13 -14 llvm/test/CodeGen/AMDGPU/llvm.minimum.f32.ll
+8 -11 llvm/test/CodeGen/AMDGPU/fmed3.ll
+118 -130, 8 files not shown
+153 -170, 14 files