LLVM/project 0f5e9beflang/lib/Optimizer/OpenMP LowerWorkshare.cpp, flang/test/Integration/OpenMP workshare-forall-sliced-array.f90

[flang][OpenMP] Fix crash when a sliced array is specified in a forall within a workshare construct (#170913)

This is a fix for two problems that caused a crash:

1. Thread-local variables sometimes are required to be parallelized.
Added a special case to handle this in
`LowerWorkshare.cpp:isSafeToParallelize`.
2. Race condition caused by a `nowait` added to the `omp.workshare` if
it is the last operation in a block. This allowed multiple threads to
execute the `omp.workshare` region concurrently. Since
_FortranAPushValue modifies a shared stack, this concurrent access
causes a crash. Disable the addition of `nowait` and rely on the
implicit barrier at the end of the `omp.workshare` region.

Fixes #143330
DeltaFile
+294-0flang/test/Transforms/OpenMP/lower-workshare-thread-local.mlir
+125-2flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+57-0flang/test/Integration/OpenMP/workshare-forall-sliced-array.f90
+476-23 files

LLVM/project c7ddb30llvm/lib/Target/NVPTX NVPTXInstrInfo.td, llvm/test/CodeGen/NVPTX fma-relu-contract.ll

[NVPTX] Remove `NoNaNsFPMath` uses (#183447)

Remove `NoNaNsFPMath` uses, use only `nnan`.
DeltaFile
+36-38llvm/test/CodeGen/NVPTX/fma-relu-contract.ll
+2-3llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+38-412 files

LLVM/project 4875b06llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV clmul.ll

Merge branch 'main' into users/kosarev/vcc-tuples
DeltaFile
+84,809-78,780llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+92,827-0llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+25,757-24,820llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,597-20,267llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,856-18,561llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+25,051-14,920llvm/test/CodeGen/RISCV/clmul.ll
+273,897-157,34812,842 files not shown
+1,250,980-555,13312,848 files

LLVM/project d316fb0llvm/lib/Transforms/Vectorize VPlanUnroll.cpp, llvm/test/Transforms/LoopVectorize float-induction.ll hoist-predicated-loads-with-predicated-stores.ll

[VPlan] Replicate VPScalarIVStepsRecipe by VF outside replicate regions. (#170053)

Extend replicateByVF to also handle VPScalarIVStepsRecipe. To do so, the
patch adds a new lane operand to VPScalarIVStepsRecipe, which is only
added when replicating. This enables removing a number of lane 0
computations. The lane operand will also be used to explicitly replicate
replicate regions in a follow-up.

Depends on https://github.com/llvm/llvm-project/pull/169796
Depends on https://github.com/llvm/llvm-project/pull/170906

PR: https://github.com/llvm/llvm-project/pull/170053
DeltaFile
+706-0llvm/test/Transforms/LoopVectorize/float-induction.ll
+63-55llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+51-53llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll
+35-42llvm/test/Transforms/LoopVectorize/hoist-predicated-loads.ll
+24-48llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
+49-14llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+928-21281 files not shown
+1,334-74787 files

LLVM/project 839dc4fmlir/include/mlir/Dialect/Bufferization/IR BufferDeallocationOpInterface.h, mlir/lib/Dialect/Bufferization/IR BufferDeallocationOpInterface.cpp

[mlir][bufferization] Fix use-after-free in ownership-based buffer deallocation (#184118)

When `handleInterface(RegionBranchOpInterface)` processes an op such as
`scf.for`, it calls `appendOpResults` to clone the op with extra
ownership result types and erase the original. The `Liveness` analysis
is computed once before the transformation begins and may still
reference the old (now-freed) result values.

If the same block contains a `BranchOpInterface` terminator (e.g.,
`cf.br`) after the structured loop, `handleInterface(BranchOpInterface)`
calls `getMemrefsToRetain`, which iterates `liveness.getLiveOut()`. That
set may contain stale `Value` objects pointing to the erased op's
results. Calling `isMemref()` on such a value dereferences freed memory,
triggering a crash.

Fix by adding a `valueMapping` map to `DeallocationState`. Before
erasing the old op in `handleInterface(RegionBranchOpInterface)`, record
the old-to-new result mapping via `state.mapValue`. The
`getLiveMemrefsIn` and `getMemrefsToRetain` helpers translate stale

    [5 lines not shown]
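
The old-to-new result mapping described above might look roughly like the sketch below. It is an illustration only, not the patch's code: `ValueId`, `DeallocationStateSketch`, and `lookup` are hypothetical stand-ins for `mlir::Value` and the real `DeallocationState`, and following chains of replacements is an assumption.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for mlir::Value; a plain integer id keeps the sketch
// self-contained.
using ValueId = int;

struct DeallocationStateSketch {
  // Maps a value that was erased to the value that replaced it.
  std::unordered_map<ValueId, ValueId> valueMapping;

  // Record the replacement before the old op is erased.
  void mapValue(ValueId oldValue, ValueId newValue) {
    assert(oldValue != newValue && "a self-mapping would never terminate");
    valueMapping[oldValue] = newValue;
  }

  // Translate a possibly stale value to its latest replacement.
  // Assumption: mappings never form a cycle.
  ValueId lookup(ValueId value) const {
    auto it = valueMapping.find(value);
    while (it != valueMapping.end()) {
      value = it->second;
      it = valueMapping.find(value);
    }
    return value;
  }
};
```

A helper that walks a precomputed live-out set would call `lookup` on each entry before inspecting it, so that it never touches an erased value.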
DeltaFile
+35-5mlir/lib/Dialect/Bufferization/IR/BufferDeallocationOpInterface.cpp
+25-0mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir
+13-0mlir/include/mlir/Dialect/Bufferization/IR/BufferDeallocationOpInterface.h
+11-0mlir/lib/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation.cpp
+84-54 files

LLVM/project a115e6bmlir/lib/Dialect/Shape/IR Shape.cpp, mlir/test/Dialect/Shape canonicalize.mlir

[mlir][Shape] Fix crash in BroadcastOp::fold when operand is ub.poison (#183931)

BroadcastOp::fold used an unchecked llvm::cast<DenseIntElementsAttr> on
each operand's folded attribute. The existing null-check only guarded
against a missing (unset) attribute, not against a non-null attribute of
a different type such as PoisonAttr (produced when an operand is
ub.poison).

Replace the unchecked casts with dyn_cast_or_null, bailing out with
nullptr (i.e. no fold) when any operand does not provide a
DenseIntElementsAttr.

Add a regression test with a ub.poison operand.
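
The difference between an unchecked cast and a checked one can be sketched without MLIR. The types below (`Attr`, `DenseIntAttr`, `PoisonAttr`) and the helper `collectShapes` are hypothetical stand-ins, and `dynamic_cast` plays the role of `dyn_cast_or_null`; this is not the code in Shape.cpp.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for folded operand attributes.
struct Attr {
  virtual ~Attr() = default;
};
struct DenseIntAttr : Attr {
  std::vector<long> dims;
  explicit DenseIntAttr(std::vector<long> d) : dims(std::move(d)) {}
};
struct PoisonAttr : Attr {};

// Collects the shape of every operand, or returns std::nullopt ("no fold")
// when any operand is missing or is not a DenseIntAttr.
std::optional<std::vector<std::vector<long>>>
collectShapes(const std::vector<const Attr *> &operands) {
  std::vector<std::vector<long>> shapes;
  for (const Attr *attr : operands) {
    // dynamic_cast yields nullptr for a null input and for a non-matching
    // type, which covers both the "unset" and the "poison" case.
    const auto *dense = dynamic_cast<const DenseIntAttr *>(attr);
    if (!dense)
      return std::nullopt;
    shapes.push_back(dense->dims);
  }
  return shapes;
}
```

A null-only check would let the `PoisonAttr` case through to an unchecked cast; testing the result of the checked cast handles both cases in one place.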

Fixes #179679
DeltaFile
+18-0mlir/test/Dialect/Shape/canonicalize.mlir
+7-7mlir/lib/Dialect/Shape/IR/Shape.cpp
+25-72 files

LLVM/project f69aa4cllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll

Merge remote-tracking branch 'external-upstream/main' into users/mariusz-sikora-at-amd/gfx13/add-vflat
DeltaFile
+84,317-78,372llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+66,293-29,491llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+25,751-24,782llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663-20,281llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867-18,577llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+19,112-16,445llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+241,003-187,9485,372 files not shown
+662,779-394,6775,378 files

LLVM/project a631af3llvm/include/llvm/CodeGen ValueTypes.h, llvm/include/llvm/CodeGenTypes MachineValueType.h

Add EVT::changeVectorElementCount and MVT::changeVectorElementCount (#182266)

Fixes #174584.
DeltaFile
+12-0llvm/include/llvm/CodeGen/ValueTypes.h
+7-0llvm/include/llvm/CodeGenTypes/MachineValueType.h
+1-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+20-23 files

LLVM/project 24e9618llvm/test/MC/AArch64 tlbip-tlbid-or-d128.s armv9a-tlbip.s

fixup! Fix using Marian's suggestion
DeltaFile
+0-259llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+165-0llvm/test/MC/AArch64/armv9a-tlbip.s
+165-2592 files

LLVM/project 3ebb3efmlir/lib/IR BuiltinDialectBytecode.cpp, mlir/test/Bytecode/invalid invalid-dense-elem-type-interface.mlir

[mlir][bytecode] Fix crash when reading DenseIntOrFPElementsAttr with unsupported element type (#184773)

When a bytecode type callback substitutes a type that does not implement
DenseElementTypeInterface (e.g., !test.i32 replacing i32), the bytecode
reader attempted to reconstruct a DenseIntOrFPElementsAttr with that
type. This unconditionally called getDenseElementBitWidth() which hit an
llvm_unreachable on unsupported types.

Fix this by validating the element type implements
DenseElementTypeInterface in readDenseIntOrFPElementsAttr before
proceeding. If the check fails, a proper diagnostic is emitted and
reading fails gracefully instead of crashing.
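
The validate-before-use pattern can be sketched as below. `ElementType`, `ReadResult`, and `readDenseAttrSketch` are hypothetical names made up for illustration; the real check lives in `readDenseIntOrFPElementsAttr` and works on MLIR types.

```cpp
#include <optional>
#include <string>

// Hypothetical model of an element type. The bit width is present only for
// types that support dense element storage.
struct ElementType {
  std::string name;
  std::optional<unsigned> denseBitWidth;
};

struct ReadResult {
  bool ok;
  std::string diagnostic; // empty on success
  unsigned bitWidth;      // meaningful only when ok is true
};

// Check the precondition first and report a diagnostic instead of calling
// into code that assumes the width is always available.
ReadResult readDenseAttrSketch(const ElementType &type) {
  if (!type.denseBitWidth)
    return {false,
            "element type '" + type.name + "' does not support dense elements",
            0u};
  return {true, "", *type.denseBitWidth};
}
```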

Fixes #128317
DeltaFile
+15-0mlir/test/Bytecode/invalid/invalid-dense-elem-type-interface.mlir
+11-0mlir/lib/IR/BuiltinDialectBytecode.cpp
+26-02 files

LLVM/project 34d8d21libclc/opencl/lib/amdgcn/synchronization barrier.cl, libclc/opencl/lib/generic SOURCES

libclc: Define work_group_barrier

Previously only the old barrier name was implemented. Define this
as an indirection around the new name, and move it to common code.
The target implementations are already provided by __clc_work_group_barrier,
so targets were unnecessarily duplicating these.

This also fixes the default scope, which should be
memory_scope_work_group. Previously the code guessed that
flags including global memory implied device scope,
which is not the case.
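
The indirection can be modelled as below. This is a C++ sketch, not the OpenCL C in libclc: `MemoryScope`, `BarrierCall`, the fence constants, and the function names are hypothetical, and `clcWorkGroupBarrier` merely records its arguments where a real target would synchronize.

```cpp
// Hypothetical model of the memory scopes involved.
enum class MemoryScope { WorkItem, WorkGroup, Device };

constexpr unsigned kLocalFence = 1u;
constexpr unsigned kGlobalFence = 2u;

// Records what the target-specific barrier was asked to do.
struct BarrierCall {
  unsigned flags;
  MemoryScope scope;
};

// Stand-in for the per-target implementation.
BarrierCall clcWorkGroupBarrier(unsigned flags, MemoryScope scope) {
  return {flags, scope};
}

// New name with an explicit scope.
BarrierCall workGroupBarrier(unsigned flags, MemoryScope scope) {
  return clcWorkGroupBarrier(flags, scope);
}

// New name without a scope: always work-group scope, whatever the flags are.
BarrierCall workGroupBarrier(unsigned flags) {
  return workGroupBarrier(flags, MemoryScope::WorkGroup);
}

// Old name, kept as a thin wrapper around the new one.
BarrierCall barrier(unsigned flags) { return workGroupBarrier(flags); }
```

With this shape only `clcWorkGroupBarrier` needs a per-target definition; the wrappers can live in common code.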
DeltaFile
+25-0libclc/opencl/lib/generic/synchronization/barrier.cl
+0-17libclc/opencl/lib/ptx-nvidiacl/synchronization/barrier.cl
+0-17libclc/opencl/lib/amdgcn/synchronization/barrier.cl
+1-1libclc/opencl/lib/generic/async/wait_group_events.cl
+0-1libclc/opencl/lib/ptx-nvidiacl/SOURCES
+1-0libclc/opencl/lib/generic/SOURCES
+27-361 files not shown
+27-377 files

LLVM/project 46dbecdclang/lib/Sema SemaExpr.cpp, clang/test/ParserOpenACC parse-constructs.cpp

[clang] use typo-corrected name qualifier for expressions

Fixes #175783
DeltaFile
+12-0clang/test/SemaCXX/GH175783.cpp
+7-0clang/lib/Sema/SemaExpr.cpp
+2-2clang/test/ParserOpenACC/parse-constructs.cpp
+21-23 files

LLVM/project 9d1d80bllvm/lib/CodeGen/SelectionDAG InstrEmitter.cpp

DAG: Replace legal type check in EmitCopyFromReg (#177788)

It doesn't make sense that an illegal type would get here; a
CopyFromReg cannot be illegally typed. The only exception that
was hit here is in a handful of SystemZ inline assembly tests
for i128, which use untyped. They shouldn't; it should treat
v2i64 as legal instead. Just leave the untyped check for now.
DeltaFile
+5-4llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
+5-41 files

LLVM/project af388bbllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV clmul.ll

Rebase

Created using spr 1.3.7
DeltaFile
+84,317-78,372llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+66,293-29,491llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+25,751-24,782llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+24,655-20,149llvm/test/CodeGen/RISCV/clmul.ll
+23,663-20,281llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867-18,577llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+246,546-191,6525,461 files not shown
+722,116-447,1445,467 files

LLVM/project 65d378dlibcxx/docs/ReleaseNotes 23.rst, libcxx/include regex

[libc++] Remove `__wrap_iter::base()` (#179389)

Resolves #126442

- Converts all the relevant functions that used `.base()` into friends
- Fixed usage in `<regex>`

---------

Co-authored-by: A. Jiang <de34 at live.cn>
DeltaFile
+12-12libcxx/include/__iterator/wrap_iter.h
+3-0libcxx/docs/ReleaseNotes/23.rst
+2-1libcxx/include/regex
+17-133 files

LLVM/project 0f59753mlir/lib/Dialect/SparseTensor/Transforms Sparsification.cpp, mlir/test/Dialect/SparseTensor spy_sddmm.mlir

[mlir][sparse] Fix crash in sparsification when unary/binary present block captures sparse tensor argument (#184597)

`relinkBranch` in Sparsification.cpp assumed that any block argument
from the outer `linalg.generic` op encountered inside an inlined
semi-ring branch must be a dense tensor, and asserted accordingly.
However, the `present` block of a `sparse_tensor.unary` (or similar
semi-ring ops) is permitted to capture sparse tensor operands directly
via `isAdmissibleBranchExp`, which accepts any `BlockArgument` as
admissible.

The fix removes the incorrect assertion and extends the load generation
to handle sparse tensors using `genSubscript`, which already knows how
to return the value buffer and current value position via the loop
emitter. The `kSparseIterator` strategy (where `genSubscript` returns a
`TensorType`) is also handled by emitting a
`sparse_tensor.extract_value` op.

Fixes #91183
DeltaFile
+71-0mlir/test/Dialect/SparseTensor/spy_sddmm.mlir
+13-5mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
+84-52 files

LLVM/project 9c87724llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll

Merge remote-tracking branch 'external-upstream/main' into users/mariusz-sikora-at-amd/gfx13/hazard-getreg
DeltaFile
+84,317-78,372llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+66,293-29,491llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+25,751-24,782llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663-20,281llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867-18,577llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+19,112-16,445llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+241,003-187,9484,730 files not shown
+621,230-368,0104,736 files

LLVM/project 001c049llvm/test/CodeGen/X86 known-pow2.ll

[X86] known-pow2.ll - add zext vector test for #182226 (#184772)

DeltaFile
+30-0llvm/test/CodeGen/X86/known-pow2.ll
+30-01 files

LLVM/project e56b580llvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_float_controls2 exec_mode3.ll

Reapply "[SPIRV] Emit intrinsics for globals only in function that references them (#178143 (#179268)) (#182552)

This reverts commit 395858d9f172ff1c61c661aa7c2a18b449daffa6.

This PR had been reverted due to an unrelated address-sanitizer failure.
DeltaFile
+117-3llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+45-38llvm/test/CodeGen/SPIRV/pointers/fun-with-aggregate-arg-in-const-init.ll
+46-30llvm/test/CodeGen/SPIRV/extensions/SPV_KHR_float_controls2/exec_mode3.ll
+15-15llvm/test/CodeGen/SPIRV/extensions/SPV_NV_shader_atomic_fp16_vector/atomicrmw_fminfmax_vec_float16.ll
+15-15llvm/test/CodeGen/SPIRV/extensions/SPV_NV_shader_atomic_fp16_vector/atomicrmw_faddfsub_vec_float16.ll
+238-1015 files

LLVM/project 68c0afallvm/lib/Target/AMDGPU AMDGPU.td, llvm/lib/Target/AMDGPU/MCTargetDesc AMDGPUInstPrinter.cpp

AMDGPU: Add FlatSignedOffset feature and use it for flat offset printing (#183483)

Co-authored-by: Matt Arsenault <Matthew.Arsenault at amd.com>
DeltaFile
+5-1llvm/lib/Target/AMDGPU/AMDGPU.td
+1-1llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+6-22 files

LLVM/project e67360elibclc/clc/include/clc/address_space qualifier.h, libclc/clc/lib/amdgcn SOURCES

libclc: Implement address space qualifier functions for amdgpu (#184766)

DeltaFile
+36-0libclc/clc/lib/amdgcn/address_space/qualifier.cl
+35-0libclc/opencl/lib/generic/address_space/qualifier.cl
+33-0libclc/clc/include/clc/address_space/qualifier.h
+33-0libclc/clc/lib/generic/shared/clc_qualifier.cl
+1-0libclc/clc/lib/amdgcn/SOURCES
+1-0libclc/clc/lib/generic/SOURCES
+139-01 files not shown
+140-07 files

LLVM/project 4afd0cfmlir/lib/Dialect/SparseTensor/Transforms SparseAssembler.cpp, mlir/test/Dialect/SparseTensor external_after_codegen.mlir

[mlir][sparse] Fix crash in SparseAssembler when run after SparseTensorCodegen (#183896)

After --sparse-tensor-codegen, sparse tensor arguments are replaced by
memrefs and !sparse_tensor.storage_specifier types. The subsequent
--sparse-assembler pass calls getSparseTensorEncoding() to identify
sparse arguments to wrap/unwrap. However, getSparseTensorEncoding()
returns non-null for StorageSpecifierType as well as for sparse
RankedTensorType. Since StorageSpecifierType is not a RankedTensorType,
the subsequent cast<RankedTensorType> in convTypes() and convVals()
would crash with an assertion failure.

Fix by also checking isa<RankedTensorType>(type) in the passthrough
condition in both convTypes() and convVals(), so that
StorageSpecifierType arguments pass through unchanged.
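
The strengthened passthrough condition can be sketched with a small enum in place of MLIR types. `TypeKind` and the three helpers are hypothetical; they only model the behaviour described above, in which the encoding query succeeds for both sparse tensors and storage specifiers.

```cpp
// Hypothetical classification of argument types seen by the assembler pass.
enum class TypeKind { DenseTensor, SparseTensor, StorageSpecifier, MemRef };

// Models getSparseTensorEncoding() returning non-null.
bool hasSparseEncoding(TypeKind kind) {
  return kind == TypeKind::SparseTensor || kind == TypeKind::StorageSpecifier;
}

// Models isa<RankedTensorType>(type).
bool isRankedTensor(TypeKind kind) {
  return kind == TypeKind::DenseTensor || kind == TypeKind::SparseTensor;
}

// An argument is left unchanged unless it is both sparse-encoded and a ranked
// tensor. Without the second check, StorageSpecifier would reach a cast that
// only ranked tensors satisfy.
bool passesThrough(TypeKind kind) {
  return !hasSparseEncoding(kind) || !isRankedTensor(kind);
}
```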

Fixes #183776
DeltaFile
+30-0mlir/test/Dialect/SparseTensor/external_after_codegen.mlir
+8-4mlir/lib/Dialect/SparseTensor/Transforms/SparseAssembler.cpp
+38-42 files

LLVM/project 1bddfedclang/test/CodeGenHLSL/builtins f16tof32-builtin.hlsl f16tof32.hlsl

[HLSL] Amend f32tof16() and f16tof32() tests (#179261)

Amend the codegen tests for f32tof16() and f16tof32() to include SPIRV
as a target in addition to DXIL.

Fixes #179257

Co-authored-by: Tim Corringham <tcorring at amd.com>
DeltaFile
+27-6clang/test/CodeGenHLSL/builtins/f16tof32-builtin.hlsl
+27-6clang/test/CodeGenHLSL/builtins/f16tof32.hlsl
+27-3clang/test/CodeGenHLSL/builtins/f32tof16-builtin.hlsl
+27-3clang/test/CodeGenHLSL/builtins/f32tof16.hlsl
+108-184 files

LLVM/project b9237bfllvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Simplify logic after suggestions from Marian
DeltaFile
+13-10llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+1-0llvm/lib/Target/AArch64/AArch64SystemOperands.td
+14-102 files

LLVM/project da92cb5llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! Don't use ExtraRequires. Instead, set a boolean in TLBITableBase
DeltaFile
+27-22llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+26-12llvm/lib/Target/AArch64/AArch64SystemOperands.td
+7-7llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+60-413 files

LLVM/project e0a1fb9llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128

Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:

```
  All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
  that are currently dependent on FEAT_D128 are updated to be dependent
  on FEAT_D128 or FEAT_TLBID
```
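
The gating rule quoted above amounts to an either-or feature check. The sketch below is illustrative only: `Features`, `isSharedTlbipOp`, and `tlbipAllowed` are hypothetical names, matching by substring is a simplification, and it assumes the remaining `tlbip` operations keep depending on `+d128` alone.

```cpp
#include <initializer_list>
#include <string>

// Hypothetical feature set of the assembler target.
struct Features {
  bool d128 = false;
  bool tlbid = false;
};

// True for operation names containing one of the affected markers.
bool isSharedTlbipOp(const std::string &op) {
  for (const char *marker : {"E1IS", "E1OS", "E2IS", "E2OS"})
    if (op.find(marker) != std::string::npos)
      return true;
  return false;
}

// Affected operations accept either feature; all others are assumed to still
// require d128.
bool tlbipAllowed(const std::string &op, const Features &features) {
  if (isSharedTlbipOp(op))
    return features.d128 || features.tlbid;
  return features.d128;
}
```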
DeltaFile
+259-0llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+66-66llvm/test/MC/AArch64/armv9a-tlbip.s
+15-5llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+20-0llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+6-3llvm/lib/Target/AArch64/AArch64SystemOperands.td
+366-745 files

LLVM/project 72e68falldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/optional TestDataFormatterGenericOptional.py

[lldb][test] TestDataFormatterGenericOptional.py: remove obsolete skipIfs

Clang 7 and GCC 5 are pretty ancient. There are unlikely to be any bot configurations running these anymore. Let's remove the skipIfs to reduce test noise.
DeltaFile
+0-12lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/optional/TestDataFormatterGenericOptional.py
+0-121 files

LLVM/project fcf6bb8lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/bitset TestDataFormatterGenericBitset.py, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/coroutine_handle TestCoroutineHandle.py

[lldb][test] Clean up USE_LIBSTDCPP/USE_LIBCPP usage

This patch makes the two tests consistent with the rest of the formatter API tests (and is, in my opinion, easier to follow).
DeltaFile
+10-13lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/bitset/TestDataFormatterGenericBitset.py
+5-8lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/coroutine_handle/TestCoroutineHandle.py
+15-212 files

LLVM/project 2f90df0flang/lib/Parser io-parsers.cpp, flang/test/Semantics io17.f90

[Flang] Fix wrong compile-time error message, issue #178494. (#183878)

Fix the problem described in issue #178494. It will cover the failures
with S, SP, SS, BN, BZ, LZ, LZP, LZS, etc. It will resolve the test
failures in PR #183500.
DeltaFile
+56-0flang/test/Semantics/io17.f90
+5-0flang/lib/Parser/io-parsers.cpp
+61-02 files

LLVM/project da87137llvm/lib/Target/AArch64 AArch64InstrFormats.td

fixup! Add $policy to MIOperandInfo
DeltaFile
+1-1llvm/lib/Target/AArch64/AArch64InstrFormats.td
+1-11 files