LLVM/project 0b1fe74 mlir/include/mlir/Dialect/Bufferization/IR UnstructuredControlFlow.h, mlir/lib/Dialect/Bufferization/IR BufferizableOpInterface.cpp

[mlir][bufferization] Introduce reconcileBufferTypeMismatchFn hook (#202667)

This PR is the first part of the work that aims to allow customizations
in resolving mismatching buffer types.

Add a new bufferization hook that lets downstream bufferization
implementations define how to handle buffer mismatches that appear
during type inference in various upstream scenarios.

The hook is used as a fallback mechanism in several upstream operations,
for example when bufferizing block signatures (scf.execute_region) and
when resolving "branch" conflicts (scf.if, scf.index_switch, scf.for,
arith.select).

The hook returns a valid buffer type when reconciliation succeeds; a
failure result means reconciliation failed and should be treated as a
bufferization failure. The caller of the hook is expected to use the
returned buffer type. By default, a memref with a fully dynamic layout map
is returned (in the unranked case, buffers are assumed to match).
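
The contract described above can be modelled outside MLIR. The following is a minimal Python sketch, not the MLIR API: `BufferType`, `default_reconcile`, `resolve_branch_type` and `BufferizationError` are hypothetical names introduced only for illustration, and layouts are plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BufferType:
    """Toy stand-in for a memref type (hypothetical, not the MLIR class)."""
    shape: tuple
    element: str
    layout: str = "identity"
    ranked: bool = True


class BufferizationError(Exception):
    """Raised when the hook reports that reconciliation failed."""


# A hook receives the two mismatching buffer types and returns the
# reconciled type, or None to model a failure result.
ReconcileFn = Callable[[BufferType, BufferType], Optional[BufferType]]


def default_reconcile(a: BufferType, b: BufferType) -> Optional[BufferType]:
    # Unranked case: buffers are assumed to match, so keep the first type.
    if not a.ranked:
        return a
    # Ranked case: fall back to a fully dynamic layout map.
    return BufferType(a.shape, a.element, layout="fully-dynamic")


def resolve_branch_type(then_t: BufferType, else_t: BufferType,
                        reconcile: ReconcileFn = default_reconcile) -> BufferType:
    """Pick one buffer type for two branches, using the hook as a fallback."""
    if then_t == else_t:
        return then_t
    result = reconcile(then_t, else_t)
    if result is None:
        # Failure of the hook is treated as bufferization failure.
        raise BufferizationError(f"cannot reconcile {then_t} and {else_t}")
    # The caller must use the type the hook returned.
    return result
```

A downstream implementation would pass its own `reconcile` callable; returning `None` from it aborts bufferization in this model.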

    [3 lines not shown]
Delta File
+282-0 mlir/test/Dialect/Bufferization/Transforms/test-one-shot-module-bufferize.mlir
+100-3 mlir/test/lib/Dialect/Test/TestOps.td
+35-50 mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
+71-0 mlir/test/lib/Dialect/Test/TestOpDefs.cpp
+40-13 mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
+24-26 mlir/include/mlir/Dialect/Bufferization/IR/UnstructuredControlFlow.h
+552-92 6 files not shown
+613-118 12 files

LLVM/project b5dfea0 llvm/lib/Target/ARM ARMBaseInstrInfo.cpp ARMRegisterInfo.td, llvm/test/CodeGen/ARM machine-outliner-thunk-tcgpr.mir

[ARM] Fix Machine Outliner crash when tBLXr uses non-tcGPR register (#200684)

When the Machine Outliner selects MachineOutlinerThunk mode for a
sequence ending in tBLXr/tBLXr_noip, it converts the indirect call to
tTAILJMPr in buildOutlinedFrame. However, tTAILJMPr requires its operand
to be in tcGPR (R0-R3, R12), while tBLXr accepts any GPR.

If the register is callee-saved (e.g. r4), the Machine Verifier crashes
with 'Illegal physical register for instruction'.
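
The register-class constraint behind the crash can be illustrated with a small check. This is a hedged Python sketch of one plausible guard, not necessarily the fix the commit applies; `TC_GPR` and `tail_jump_operand_ok` are hypothetical names, and the register set is taken from the description above.

```python
# tcGPR as described in the commit message: R0-R3 and R12.
TC_GPR = {"r0", "r1", "r2", "r3", "r12"}


def tail_jump_operand_ok(reg: str) -> bool:
    """Return True if `reg` may be the operand of tTAILJMPr.

    tBLXr accepts any GPR, so converting it to tTAILJMPr is only
    legal when the call target already lives in a tcGPR register.
    """
    return reg.lower() in TC_GPR


if __name__ == "__main__":
    for reg in ("r0", "r4", "r12"):
        verdict = "legal" if tail_jump_operand_ok(reg) else "illegal"
        print(f"tTAILJMPr {reg}: {verdict}")
```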

Fixes #188076 
Delta File
+40-0 llvm/test/CodeGen/ARM/machine-outliner-thunk-tcgpr.mir
+5-3 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+3-2 llvm/lib/Target/ARM/ARMRegisterInfo.td
+48-5 3 files

LLVM/project 3aeb6c1 mlir/include/mlir/Dialect/SPIRV/IR SPIRVCompositeOps.td SPIRVBase.td, mlir/test/Dialect/SPIRV/IR composite-ops.mlir

[mlir][spirv] Re-enable bf16/fp8 for vector composite ops (#204848)

Allow bf16 and fp8 vector element types in VectorExtractDynamic,
VectorInsertDynamic, and VectorShuffle.
Delta File
+54-0 mlir/test/Dialect/SPIRV/IR/composite-ops.mlir
+8-8 mlir/include/mlir/Dialect/SPIRV/IR/SPIRVCompositeOps.td
+6-0 mlir/include/mlir/Dialect/SPIRV/IR/SPIRVBase.td
+68-8 3 files

LLVM/project 78fc168 clang/lib/Driver/ToolChains Clang.cpp

clang: Use the effective triple string for offload jobs

Track the future effective triple for the job, rather than
the toolchain's default triple. In the future this will
change the result when amdgpu starts adjusting the triples
to contain subarches.
Delta File
+12-7 clang/lib/Driver/ToolChains/Clang.cpp
+12-7 1 files

LLVM/project b0bf2c7 clang/docs ConstantInterpreter.rst

[clang][bytecode] Update high-level documentation (#202596)
Delta File
+85-53 clang/docs/ConstantInterpreter.rst
+85-53 1 files

LLVM/project 6daa021 mlir/lib/Dialect/Tosa/IR TosaOps.cpp, mlir/test/Dialect/Tosa verifier.mlir

[mlir][tosa] Check same input/output types in pooling ops verifier (#203565)

Adds a missing check to make sure the input and output types of pooling
ops have the same element type.
Delta File
+22-0 mlir/test/Dialect/Tosa/verifier.mlir
+8-5 mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+30-5 2 files

LLVM/project f782f54 llvm/include/llvm/Transforms/Scalar Reassociate.h, llvm/lib/Transforms/Scalar Reassociate.cpp

review
Delta File
+6-1 llvm/include/llvm/Transforms/Scalar/Reassociate.h
+2-3 llvm/lib/Transforms/Scalar/Reassociate.cpp
+8-4 2 files

LLVM/project 6be53ab llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp, llvm/test/CodeGen/AArch64 pr204118.ll

[AArch64][GISel] Remove hard-coded operand index from FCVT renderers (#204118)
Delta File
+28-0 llvm/test/CodeGen/AArch64/GlobalISel/pr204118.mir
+13-0 llvm/test/CodeGen/AArch64/pr204118.ll
+2-2 llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+43-2 3 files

LLVM/project 0972d5c clang/include/clang/AST Decl.h

[clang][NFC] Add LLVM_PREFERRED_TYPE to EvaluatedStmt bitfields (#205026)
Delta File
+16-8 clang/include/clang/AST/Decl.h
+16-8 1 files

LLVM/project bdde0e5. .gitignore

gitignore: Add emacs lock files
Delta File
+2-0 .gitignore
+2-0 1 files

LLVM/project e4df739 llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.h AMDGPUInstructionSelector.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h

[AMDGPU] Remove stale declarations. NFC. (#205047)

Remove declarations of functions that are never defined. Also remove
unused field AMDGPUInstructionSelector::TM.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Delta File
+0-21 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+1-5 llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
+2-4 llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+0-5 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
+0-2 llvm/lib/Target/AMDGPU/R600ISelLowering.h
+0-2 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+3-39 2 files not shown
+4-41 8 files

LLVM/project 1eb70f1 clang/lib/Driver Driver.cpp, clang/lib/Driver/ToolChains CommonArgs.cpp AMDGPU.cpp

clang/AMDGPU: Use effective triple instead of raw toolchain triple

Start using the effective triple instead of the raw toolchain triple.
For the moment this is NFC, but will change when new uses of the subarch
field are introduced.
Delta File
+3-2 clang/lib/Driver/ToolChains/CommonArgs.cpp
+2-2 clang/lib/Driver/ToolChains/AMDGPU.cpp
+1-1 clang/lib/Driver/Driver.cpp
+1-1 clang/lib/Driver/ToolChains/HIPAMD.cpp
+7-6 4 files

LLVM/project 5951daf llvm/lib/Analysis LoopAccessAnalysis.cpp, llvm/test/Transforms/LoopVectorize scalable-first-order-recurrence.ll scalable-lifetime.ll

[LV] Allow scalable VFs in `-force-vector-width` (and use in tests) (#204953)

This updates `-force-vector-width=VF` to accept scalable VFs. If a
scalable width is specified it is assumed the target supports scalable
vectors.

So for example, `-force-vector-width="vscale x 4"` works as a shorthand
for `-scalable-vectorization=always -force-target-supports-scalable-vectors=true -force-vector-width=4`.
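
The shorthand can be sketched as a small parser. This is an illustrative Python model only, not the LoopVectorize implementation: `parse_forced_vf` and `equivalent_flags` are hypothetical helpers, and the accepted spelling beyond the `vscale x 4` example in the message is an assumption.

```python
import re


def parse_forced_vf(value: str):
    """Parse "4" or "vscale x 4" into (width, is_scalable)."""
    m = re.fullmatch(r"\s*(?:(vscale)\s+x\s+)?(\d+)\s*", value)
    if m is None:
        raise ValueError(f"invalid vector width: {value!r}")
    return int(m.group(2)), m.group(1) is not None


def equivalent_flags(value: str):
    """Expand a forced VF into the longhand flags quoted in the message."""
    width, scalable = parse_forced_vf(value)
    flags = []
    if scalable:
        # A scalable width implies the target supports scalable vectors.
        flags.append("-scalable-vectorization=always")
        flags.append("-force-target-supports-scalable-vectors=true")
    flags.append(f"-force-vector-width={width}")
    return flags


if __name__ == "__main__":
    print(" ".join(equivalent_flags("vscale x 4")))
```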
Delta File
+8-11 llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll
+8-8 llvm/lib/Analysis/LoopAccessAnalysis.cpp
+3-8 llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll
+4-7 llvm/test/Transforms/LoopVectorize/scalable-assume.ll
+4-6 llvm/test/Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
+2-8 llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll
+29-48 10 files not shown
+49-67 16 files

LLVM/project 5492d06 llvm/include/llvm/Analysis TargetFolder.h, llvm/lib/Analysis ConstantFolding.cpp

[IRBuilder] (Target|InstSimplify)-fold intrinsics (#204967)

Target-fold or InstSimplify-fold in IRBuilderBase::CreateIntrinsic, in the
same way we fold in Create(Unary|Binary)Intrinsic. Includes changes to
guard against a nullptr TLI and Call.
Delta File
+95-105 llvm/test/CodeGen/VE/Scalar/atomic_cmp_swap.ll
+71-75 llvm/lib/Analysis/ConstantFolding.cpp
+15-18 llvm/test/Transforms/IRCE/correct-loop-info.ll
+13-6 llvm/lib/IR/IRBuilder.cpp
+9-10 llvm/test/Transforms/VectorCombine/X86/shuffle-of-intrinsics.ll
+6-13 llvm/include/llvm/Analysis/TargetFolder.h
+209-227 19 files not shown
+256-339 25 files

LLVM/project ff0e4f0 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

[AArch64][NFC] Refactor duplicate code into getCmpOrCmnOperandFoldingProfit (#198981)
Delta File
+9-7 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+9-7 1 files

LLVM/project fdd453d llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

fix debuginfo issue that led to failure of offloading/bug51781.c
Delta File
+9-1 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+9-1 1 files

LLVM/project 18bb247 clang/test/OpenMP nvptx_teams_reduction_codegen.cpp target_teams_reduction_codegen.cpp, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Reapply "[OpenMP][offload] Cross-team reductions with variable number of teams" (#204914)

This reverts commit 4c16440e1edc00cd1b5a64944fc651064fe6425b.
Delta File
+0-3,642 clang/test/OpenMP/nvptx_teams_reduction_codegen.cpp
+2,331-0 clang/test/OpenMP/target_teams_reduction_codegen.cpp
+155-169 openmp/device/src/Reduction.cpp
+144-73 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+60-60 clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
+60-60 clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp
+2,750-4,004 168 files not shown
+4,266-5,534 174 files

LLVM/project 64ad10f llvm/docs AMDGPUExecutionSynchronization.rst

[AMDGPU][doc] Refactor Barrier Execution Model (#204566)

Remove everything that has to do with named barriers and put it in a
series of model extensions specific to /sbarrier/named-barriers.

I had to change a few things to make it fit, in summary:

Base Model:

- (~) Stylistic changes that make it easier to refer to specific rules.
Each rule is in a rubric instead of a bullet point.
- (-) No longer defines `barrier-mutually-exclusive`
- (-) No longer defines barrier `join` and any associated rule.

New named barrier extensions

- (+) Define "named barrier" as a sub-type of barrier objects. This
makes barrier-mutually-exclusive redundant.
- (+) Define barrier join as an op that can exclusively be done on

    [17 lines not shown]
Delta File
+200-154 llvm/docs/AMDGPUExecutionSynchronization.rst
+200-154 1 files

LLVM/project 25e4057 clang/include/clang/Options Options.td, clang/lib/Driver/ToolChains Clang.cpp

[clang] Respect `CLANG_USE_EXPERIMENTAL_CONST_INTERP` (#200716)

Seems like https://github.com/llvm/llvm-project/pull/199396 had no
effect at all, even though the patch itself seems pretty obvious.

Change the semantics of the command-line option to support
`-fno-experimental-constant-interpreter` as well. This way, the cmake
option can be used to set the default and the `-f`/`-fno-` command-line
options can be used to override the default behavior.
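
The intended precedence (build-time default, overridden by the command line) can be sketched as follows. This is a hedged Python model, not the clang driver code: `resolve_bool_flag` is a hypothetical helper, the "last occurrence wins" rule is an assumption based on how `-f`/`-fno-` pairs usually behave, and the flag spellings follow the commit message and may differ from the actual driver options.

```python
def resolve_bool_flag(argv, positive, negative, default):
    """Return the effective value of a -f/-fno- style option pair.

    `default` models the value chosen at configure time (the cmake
    option); later command-line occurrences override earlier ones.
    """
    enabled = default
    for arg in argv:
        if arg == positive:
            enabled = True
        elif arg == negative:
            enabled = False
    return enabled


if __name__ == "__main__":
    POS = "-fexperimental-constant-interpreter"     # spelling assumed
    NEG = "-fno-experimental-constant-interpreter"  # as written in the message
    # Built with the cmake option on, but disabled for this invocation.
    print(resolve_bool_flag(["-c", NEG], POS, NEG, default=True))
```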
Delta File
+28-0 clang/test/AST/ByteCode/command-line-options.cpp
+9-2 clang/lib/Driver/ToolChains/Clang.cpp
+4-4 clang/include/clang/Options/Options.td
+41-6 3 files

LLVM/project 057c1ce flang/lib/Lower PFTBuilder.cpp, flang/test/Lower do_loop_unstructured.f90 do_loop_execute_region_wrap.f90

[flang][PFT-to-MLIR] Wrap unstructured Fortran constructs in scf.execute_region

Extend the PFT-to-MLIR (HLFIR/FIR) lowering so unstructured DO and IF
constructs are emitted inside scf.execute_region, hiding their multi-block
CFG behind a single op. OpenACC and OpenMP lowerings that reject
multi-block content (e.g. the "unstructured do loop in combined acc
construct" TODO in OpenACC.cpp) now see a structured op instead.

Flag: -mmlir --wrap-unstructured-constructs-in-execute-region (default on).

An evaluation is wrappable iff all of the following hold:

  * wrap flag on
  * eval is parser::DoConstruct or parser::IfConstruct
  * eval.isUnstructured
  * branchesAreInternal(eval) -- every controlSuccessor in the subtree
    targets a nested eval or the constructExit
  * !hasIncomingBranch(eval) -- no outside eval branches into the body
    (PFT's synthetic IfConstruct around `if(c) goto X` absorbs label

    [14 lines not shown]
Delta File
+103-102 flang/test/Lower/OpenMP/unstructured.f90
+199-2 flang/lib/Lower/PFTBuilder.cpp
+115-24 flang/test/Lower/OpenACC/acc-unstructured.f90
+38-87 flang/test/Lower/do_loop_unstructured.f90
+111-0 flang/test/Lower/do_loop_execute_region_wrap.f90
+47-61 flang/test/Lower/mixed_loops.f90
+613-276 21 files not shown
+958-436 27 files

LLVM/project 6d66cc1 orc-rt/include/orc-rt SimplePackedSerialization.h, orc-rt/unittests SimplePackedSerializationTest.cpp

[orc-rt] Add SPS serialization for ExecutorAddrRange. (#205041)

Allows SPS serialization to/from ExecutorAddrRange. This will be used in
upcoming patches for compact-unwind registration support.
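
As a rough picture of what serializing an address range involves, here is a Python sketch. It assumes the range is encoded as two 64-bit little-endian addresses (start, then end); that wire layout and the helper names are assumptions for illustration, not taken from the patch.

```python
import struct

# Assumed layout: start address followed by end address, each a
# little-endian unsigned 64-bit integer (16 bytes in total).
_RANGE_FORMAT = "<QQ"
RANGE_SIZE = struct.calcsize(_RANGE_FORMAT)


def serialize_addr_range(start: int, end: int) -> bytes:
    """Pack an address range into its assumed wire form."""
    return struct.pack(_RANGE_FORMAT, start, end)


def deserialize_addr_range(data: bytes):
    """Unpack (start, end) from the front of `data`."""
    if len(data) < RANGE_SIZE:
        raise ValueError("buffer too small for an address range")
    return struct.unpack_from(_RANGE_FORMAT, data)


if __name__ == "__main__":
    blob = serialize_addr_range(0x1000, 0x2000)
    print(blob.hex(), deserialize_addr_range(blob))
```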
Delta File
+20-0 orc-rt/include/orc-rt/SimplePackedSerialization.h
+6-0 orc-rt/unittests/SimplePackedSerializationTest.cpp
+26-0 2 files

LLVM/project 34a3377 flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef slice-projected.mlir

[FIR] Route embox + projected complex slice through shapeVec

When the array_coor base is a fir.embox with a projected complex %re/%im
slice, take the shapeVec path instead of the descriptor (fir.box_dims)
path. The descriptor path iterates source-rank dims while querying the
rank-reduced embox result box, which miscompiles slices that collapse
dims (e.g. complex(:,k)%re). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived layout is both correct and
the natural place to encode that static shape information is available. Non-embox
boxes (rebox, assumed-shape) still go through fir.box_dims.

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
Delta File
+19-51 flang/test/Transforms/FIRToMemRef/slice-projected.mlir
+2-13 flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+21-64 2 files

LLVM/project 15a3238 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 extend-bool-vector-load.ll

[AArch64] Lower extends of boolean vector loads via scalar load (#203394)

Replace a `load <N x i1>` under a sext/zext with a scalar load +
bitcast, so the `combineToExtendBoolVectorInReg` helper can apply,
avoiding scalarization.
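
The equivalence the rewrite relies on can be checked with a toy model: reading the N bits as one integer and reinterpreting them as lanes gives the same result as loading the boolean vector directly. This Python sketch assumes a little-endian target, where lane i of the bitcast corresponds to bit i of the scalar; the function names are hypothetical.

```python
def lanes_from_scalar(value: int, n: int = 8):
    """Model `bitcast iN -> <N x i1>`: lane i is bit i of the scalar."""
    return [(value >> i) & 1 for i in range(n)]


def sext_lanes(bits):
    """Sign-extend i1 lanes: a set bit becomes all-ones (-1)."""
    return [-b for b in bits]


def zext_lanes(bits):
    """Zero-extend i1 lanes: a set bit becomes 1."""
    return list(bits)


if __name__ == "__main__":
    byte = 0b00000101  # lanes 0 and 2 are true
    print(sext_lanes(lanes_from_scalar(byte)))
    print(zext_lanes(lanes_from_scalar(byte)))
```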

Optimisation for the SVE case with a predicate load to be added in a
follow up.

Fixes #200325
Delta File
+294-0 llvm/test/CodeGen/AArch64/extend-bool-vector-load.ll
+47-0 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+341-0 2 files

LLVM/project 7b0d45b clang/include/clang/Basic AttrDocs.td, clang/test/CodeGen/AMDGPU amdgcn-av-none-attr-c-atomic.c

fix attr doc; add test that got left behind
Delta File
+72-0 clang/test/CodeGen/AMDGPU/amdgcn-av-none-attr-c-atomic.c
+8-9 clang/include/clang/Basic/AttrDocs.td
+80-9 2 files

LLVM/project de045d5 orc-rt/include/orc-rt SimplePackedSerialization.h

[orc-rt] Tidy up some SPS tag types. NFC. (#205038)

Replaces class definitions with decls for tag types that don't need a
body, and moves the SPSError tag down to just above its
serialization-traits class.
Delta File
+5-5 orc-rt/include/orc-rt/SimplePackedSerialization.h
+5-5 1 files

LLVM/project 9f0b22c llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchISelLowering.h, llvm/test/CodeGen/LoongArch/ir-instruction double-convert.ll float-convert.ll

[LoongArch] Custom scalar UINT_TO_FP and FP_TO_UINT with LSX instructions (#200901)

Use `vftintrz.lu.d` to convert scalar double/float values to
unsigned 64-bit integers, and `vffint.d.lu` for the reverse conversion.
Delta File
+51-2 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+7-26 llvm/test/CodeGen/LoongArch/ir-instruction/double-convert.ll
+7-25 llvm/test/CodeGen/LoongArch/ir-instruction/float-convert.ll
+1-0 llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+66-53 4 files

LLVM/project 1092b2b llvm/docs AMDGPUAsyncOperations.rst

[AMDGPU] Improve the description of asyncmark semantics (#202579)

- The semantics of asyncmarks is now defined purely in terms of
sequences, without referring to the implementation.
- The examples incorrectly used (post)dominance. Fixed that with wording
in terms of asyncmark sequences.
Delta File
+122-71 llvm/docs/AMDGPUAsyncOperations.rst
+122-71 1 files

LLVM/project 6a2128a llvm/include/llvm/ProfileData SampleProfReader.h, llvm/lib/ProfileData SampleProfReader.cpp

[ProfileData] Lazy-load fixed-length MD5 name table (#202014)

When reading extensible binary format profiles with fixed-length MD5
name tables, the reader eagerly allocates and populates a
std::vector<FunctionId> to store the name table.  This eager loading
is particularly wasteful when ProfileIsCS is false, as we populate the
entire name table just to support lookups during profile ingestion,
even though we may only use a subset of the profile.  Since FunctionId
is 16 bytes on 64-bit systems, a name table containing 10 million MD5
hash values would consume 160MB of heap memory.

This patch implements lazy loading for the name table in extensible
binary format profiles when the fixed-length MD5 layout is used.
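
The memory estimate and the lazy-loading idea can be sketched in a few lines. This Python model is illustrative only and is not the SampleProfReader code: `LazyMD5NameTable` is a hypothetical class, and the 8-byte little-endian entry size is an assumption about the fixed-length MD5 layout. The 16-byte FunctionId size comes from the message above.

```python
import struct

FUNCTION_ID_BYTES = 16  # size quoted in the commit message (64-bit systems)


def eager_table_bytes(num_entries: int) -> int:
    """Heap needed to materialise every entry as a FunctionId."""
    return num_entries * FUNCTION_ID_BYTES


class LazyMD5NameTable:
    """Read entries on demand from the underlying buffer.

    Fixed-length entries allow O(1) random access: entry i starts at
    byte offset i * ENTRY_BYTES, so nothing is copied up front.
    """

    ENTRY_BYTES = 8  # assumed: one 64-bit MD5 hash value per entry

    def __init__(self, buffer: bytes):
        self._view = memoryview(buffer)

    def __len__(self) -> int:
        return len(self._view) // self.ENTRY_BYTES

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        (value,) = struct.unpack_from("<Q", self._view, index * self.ENTRY_BYTES)
        return value


if __name__ == "__main__":
    # 10 million entries at 16 bytes each is 160 MB of heap when loaded eagerly.
    print(eager_table_bytes(10_000_000))
    table = LazyMD5NameTable(struct.pack("<3Q", 10, 20, 30))
    print(len(table), table[1])
```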

Specifically, this patch introduces SampleProfileNameTable to
encapsulate the name table representation, supporting both lazy
loading (pointing directly to the memory-mapped buffer) and eager
loading (using a vector).  Eager loading is retained as a fallback for
layouts that do not support O(1) random access (such as

    [11 lines not shown]
Delta File
+106-24 llvm/include/llvm/ProfileData/SampleProfReader.h
+23-14 llvm/lib/ProfileData/SampleProfReader.cpp
+129-38 2 files

LLVM/project 776a626 flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef slice-projected.mlir

[FIR] Route embox+projected slice through shapeVec in FIRToMemRef

The descriptor-strides path iterates source-rank dims but queries the
rank-reduced embox result box, miscompiling slices that collapse dims
(e.g. complex %re/%im on b(:,k)). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived layout is both correct and
the natural place to encode "static shape information is available."

Drop the `|| hasProjectedSlice` carve-out from boxNeedsDescriptorStrides
so projection cases also take the shapeVec path. Non-embox boxes
(rebox, assumed-shape) still go through fir.box_dims because their
storage may be non-contiguous.

Fixes the SIGSEGV at -O0 -lro and miscompile at -O1 -lro on the Fujitsu
0086_0019 reproducer (complex(:,k)%re inside WHERE).

Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
Delta File
+19-51 flang/test/Transforms/FIRToMemRef/slice-projected.mlir
+2-13 flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+21-64 2 files

LLVM/project 03f9dce flang/test/Transforms/FIRToMemRef slice-projected.mlir

xxx
Delta File
+9-9 flang/test/Transforms/FIRToMemRef/slice-projected.mlir
+9-9 1 files