LLVM/project 7f35a2a mlir/include/mlir/Dialect/EmitC/IR EmitC.td, mlir/lib/Dialect/EmitC/IR EmitC.cpp

[MLIR][EmitC] Add optional pure attribute to CastOp (#202749)

In general, C++ cast expressions cannot always be assumed to be pure: they may
invoke user-defined conversions or be affected by floating-point environment
settings. However, in many practical cases, such as integer casts without
operator overloading, the cast is pure and can be treated as speculatable and
side-effect-free. For such cases, the newly added `pure` attribute may be used.

When the `pure` attribute is set, `getSpeculatability()` returns `Speculatable`
and `getEffects()` reports no effects. It is UB to set the `pure` attribute when
the actual conversion is not pure, e.g. when a user-defined conversion has
memory effects.
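
As a rough illustration of the distinction drawn above, the following C++ sketch contrasts a plain integer cast with a cast that goes through a user-defined conversion operator. The names `Counted`, `conversionCount`, `narrow` and `convert` are hypothetical and are not part of the patch; the sketch only shows why the second kind of cast could not safely be marked `pure`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global, used only to make the side effect observable.
static int conversionCount = 0;

// A type whose conversion to int writes to memory: casting it is not pure.
struct Counted {
  int value;
  explicit operator int() const {
    ++conversionCount; // side effect: modifies global state
    return value;
  }
};

// Plain integer cast: no user-defined conversion and no side effects.
// This is the kind of cast for which a `pure` annotation would be valid.
std::int8_t narrow(std::int32_t x) { return static_cast<std::int8_t>(x); }

// Cast that invokes Counted::operator int(). Marking the corresponding
// EmitC cast as `pure` would be UB according to the commit message.
int convert(const Counted &c) { return static_cast<int>(c); }
```

Calling `convert` changes `conversionCount`, so hoisting or deleting that cast would change program behavior, whereas `narrow` can be speculated freely.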
Delta  File
+23 -3  mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
+13 -0  mlir/test/Dialect/EmitC/canonicalize.mlir
+13 -0  mlir/lib/Dialect/EmitC/IR/EmitC.cpp
+49 -3  3 files

LLVM/project 3b46665 lld/ELF ScriptParser.cpp, lld/test/ELF/linkerscript overlay-symbols.test

[LLD] Allow all output-section-commands in OVERLAYS. (#203524)

The GNU ld grammar for overlays is:
  secname1
    {
      output-section-command
      output-section-command
      ...
    }
  secname2
  ...

The output-section-commands are the same as in an OutputSection. At
present we have a stripped-down parser that only supports
InputSectionDescriptions; it does not permit other useful commands
such as defining symbols.

Due to recent refactoring it is now simple to reuse the parser for an
Output Section command rather than using a custom one.

    [5 lines not shown]
Delta  File
+49 -0  lld/test/ELF/linkerscript/overlay-symbols.test
+1 -1  lld/ELF/ScriptParser.cpp
+50 -1  2 files

LLVM/project 943ccc3 flang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC/Todo acc-unstructured-loop-construct.f90 acc-unstructured-combined-construct.f90

[OpenACC] Add emit-independent-loops-as-unstructured flag

Add a flag (default true) to bypass the TODOs in `loopWillBeIndependent`
that fired for unstructured do loops inside independent OpenACC loop and
combined constructs, lowering them as `acc.loop` instead. Existing TODO
tests are extended to exercise both the default (lowered) path and the
explicit `=false` path that still reports the TODO.

Co-Authored-By: Claude <noreply at anthropic.com>
Delta  File
+32 -5  flang/test/Lower/OpenACC/Todo/acc-unstructured-loop-construct.f90
+23 -3  flang/test/Lower/OpenACC/Todo/acc-unstructured-combined-construct.f90
+11 -2  flang/lib/Lower/OpenACC.cpp
+66 -10  3 files

LLVM/project ff5844c llvm/lib/Transforms/Scalar DropUnnecessaryAssumes.cpp, llvm/test/Transforms/DropUnnecessaryAssumes dereferenceable.ll

[DropUnnecessaryAssumes] Fix iterator invalidation. (#203765)

registerAssumption() below can append to (and reallocate) the cache's
assumption vector. Use integer index for indexing instead of using the
iterator. Stop at the original count, so we don't reprocess assumes
created during the loop.
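
The pattern described above can be sketched in standalone C++ as follows. This is a minimal illustration using a hypothetical `Cache` type and `registerItem` helper in place of the real assumption cache; it is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cache whose vector can grow while we iterate.
struct Cache {
  std::vector<int> items;
  // May append and reallocate, invalidating iterators and references.
  void registerItem(int v) { items.push_back(v); }
};

// Visit only the elements present on entry. Appending during the loop is
// safe because elements are accessed by index and the bound is fixed up front.
std::size_t processOriginal(Cache &cache) {
  const std::size_t originalCount = cache.items.size();
  std::size_t visited = 0;
  for (std::size_t i = 0; i != originalCount; ++i) {
    int current = cache.items[i]; // copy: a reference could dangle after push_back
    cache.registerItem(current * 10);
    ++visited;
  }
  // Every original element appended exactly one new element.
  assert(cache.items.size() == 2 * originalCount);
  return visited;
}
```

A range-based `for` over `cache.items` would hold iterators across the `push_back` and could read freed memory after a reallocation, which is the failure mode the index-based loop avoids.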

PR: https://github.com/llvm/llvm-project/pull/203765
Delta  File
+64 -0  llvm/test/Transforms/DropUnnecessaryAssumes/dereferenceable.ll
+8 -1  llvm/lib/Transforms/Scalar/DropUnnecessaryAssumes.cpp
+72 -1  2 files

LLVM/project 3503a1c clang/lib/AST/ByteCode Pointer.cpp, clang/test/AST/ByteCode cxx11.cpp

[clang][bytecode] Not all bases compare equal (#204052)

Add the base class offset so they don't all compare equal.
Delta  File
+2 -2  clang/test/AST/ByteCode/cxx11.cpp
+1 -2  clang/lib/AST/ByteCode/Pointer.cpp
+3 -4  2 files

LLVM/project 8be07cf llvm/lib/CodeGen MacroFusion.cpp, llvm/test/CodeGen/AArch64 misched-fusion-no-raw-dependency.mir

[MacroFusion] Restrict pairs to have SDep::Data dependency only (#203793)

This patch restricts target-independent macro fusion to SDep::Data
dependent pairs only. The test demonstrates the case that drove this
patch: two instructions are wrongly macro fused through an Artificial
edge without being RAW dependent. Currently macro fusion does not
really require a constraint as relaxed as the one it has today. If this
changes in the future, we can solve it later, e.g. by adding a hook.
Delta  File
+37 -0  llvm/test/CodeGen/AArch64/misched-fusion-no-raw-dependency.mir
+2 -2  llvm/lib/CodeGen/MacroFusion.cpp
+39 -2  2 files

LLVM/project 46af61a llvm/lib/Transforms/Scalar JumpThreading.cpp, llvm/test/Transforms/JumpThreading guards.ll

[JumpThreading] Use context when checking speculatability (#203912)

Pass the terminator of the predecessor as context instruction when
checking for load speculatability. This needs to be done per
(unavailable) predecessor now, because the context is different. Cache
the guaranteed-to-transfer walk between checks, as that part is always
the same.

JumpThreading doesn't use AssumptionCache currently, so I believe this
is only observable under -use-dereferenceable-at-point-semantics. Adjust
the tests to drop nofree attributes that currently hide this issue with
the option enabled.
Delta  File
+16 -6  llvm/lib/Transforms/Scalar/JumpThreading.cpp
+7 -7  llvm/test/Transforms/JumpThreading/guards.ll
+23 -13  2 files

LLVM/project a52195c llvm/include/llvm/MC MCExpr.h MCAsmInfo.h, llvm/lib/MC MCExpr.cpp MCDwarf.cpp

[MC] Remove unused MCAsmInfo::usesDwarfFileAndLocDirectives and parameter. NFC (#204071)
Delta  File
+2 -2  llvm/lib/MC/MCExpr.cpp
+1 -2  llvm/include/llvm/MC/MCExpr.h
+1 -2  llvm/lib/MC/MCDwarf.cpp
+0 -2  llvm/include/llvm/MC/MCAsmInfo.h
+4 -8  4 files

LLVM/project 5413442 llvm/lib/Transforms/Scalar DFAJumpThreading.cpp, llvm/test/Transforms/DFAJumpThreading single-block-defs.ll

[DFAJumpThreading] Do not thread over blocks with multiple phi definitions (#195512)

Fixes #195088
For the reduced case in the issue, there are 4 threading paths:
```
< then, case2, lbl_entry, switch_bb > [ 0, case2 ]
< case2, lbl_entry, switch_bb > [ 0, case2 ]
< then, case2, switch_bb > [ 1, case2 ]
< case2, switch_bb > [ 0, case2 ]
```
But the first path and the third path have a conflict: `then->case2`
cannot be diverged into `then->case2.0` and `then->case2.1` at the same
time, as jumping from `then` to `case2` does not really define a unique
exiting state. Multiple phi definition causes two exiting states (0 and
1) for `then->case2`.
The root cause is that the block with multiple definitions cannot be
regarded as a determinator.
Delta  File
+55 -0  llvm/test/Transforms/DFAJumpThreading/single-block-defs.ll
+18 -1  llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+73 -1  2 files

LLVM/project 0b1cb30 clang/lib/AST/ByteCode EvaluationResult.cpp

[clang][bytecode] Refactor collectBlocks() (#204062)

If the record has no pointer field, return from the function completely.
Delta  File
+17 -5  clang/lib/AST/ByteCode/EvaluationResult.cpp
+17 -5  1 files

LLVM/project 0560140 flang/include/flang/Optimizer/Dialect FIROps.td, flang/lib/Optimizer/Transforms MemoryUtils.cpp

[flang][MemoryAllocation] do not assume all blocks have terminators (#203902)

Update MemoryAllocation to cope with blocks without terminators.
`getTerminator` cannot be called when a block has no terminator and must
be guarded by `mightHaveTerminator`.
This case was hit, for instance, by an alloca inside `fir.do_concurrent.loop`.

Add handling for single block regions with no terminators (which is the
case of `fir.do_concurrent.loop` and most regions without terminators).
The deallocation point can simply be placed at the end of the block in
such cases. For regions with several blocks and no terminators, the pass
leaves the alloca in place (no operation used in flang is known to have
such regions).

Also add `AutomaticAllocationScope` to the `fir.do_concurrent.loop`
since each iteration is independent and owns its allocas (otherwise the
pass would create allocmem outside of the loops).
Delta  File
+71 -46  flang/lib/Optimizer/Transforms/MemoryUtils.cpp
+111 -0  flang/test/Fir/memory-allocation-opt-do-concurrent.fir
+2 -1  flang/include/flang/Optimizer/Dialect/FIROps.td
+184 -47  3 files

LLVM/project d873501 clang/test/Driver aarch64-hip12.c, clang/test/Driver/print-enabled-extensions aarch64-hip12.c

[AArch64] Add initial support for Hisilicon's hip12 core (#203446)

This patch adds initial support for Hisilicon's hip12 core (Kunpeng 950
processor).
For more information, see:
https://www.huawei.com/en/news/2025/9/hc-xu-keynote-speech

HIP12Model will come later.
Delta  File
+72 -0  clang/test/Driver/print-enabled-extensions/aarch64-hip12.c
+28 -1  llvm/lib/Target/AArch64/AArch64Processors.td
+13 -0  clang/test/Driver/aarch64-hip12.c
+6 -0  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+3 -2  llvm/lib/TargetParser/Host.cpp
+4 -0  llvm/unittests/TargetParser/Host.cpp
+126 -3  4 files not shown
+131 -4  10 files

LLVM/project 4f9adfd llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp AMDGPUCoExecInfo.h, llvm/test/CodeGen/AMDGPU coexec-hazardrec-preRA.mir wmma-trans-multi-shadow-hazard.mir

[AMDGPU] Model WMMA co-execution windows in the scheduler for gfx1250

WMMA instructions in gfx1250 expose an execution window during which
only certain other instruction classes may co-execute. Teach the hazard
recognizer about those windows so the scheduler can fill co-execution slots and
account for the resulting stalls. This adds a preRA hazard recognizer
mode.

Add AMDGPUCoExecInfo.h, a shared model of a co-execution window: the
per-stage capability bitmask, the stage types (CoExecStageType), and
CoExecInfo, which maps a multi-cycle instruction to its per-cycle slot
pattern via getCoExecInfo(). InstructionFlavor and its helpers move here
from AMDGPUCoExecSchedStrategy.h with no functional change so they can
be shared by the scheduler and the hazard recognizer.
Delta  File
+501 -0  llvm/test/CodeGen/AMDGPU/coexec-hazardrec-preRA.mir
+413 -12  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+406 -0  llvm/lib/Target/AMDGPU/AMDGPUCoExecInfo.h
+120 -2  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4 -91  llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+67 -0  llvm/test/CodeGen/AMDGPU/wmma-trans-multi-shadow-hazard.mir
+1,511 -105  7 files not shown
+1,562 -117  13 files

LLVM/project 587e029 clang/lib/AST/ByteCode Interp.cpp InterpState.h

[clang][bytecode] Remove InterpState::InitializingBlocks (#204054)

This was superseded by `InitializingPtrs` when implementing
`dynamic_cast`, so we can now remove `InitializingBlocks`.
Delta  File
+5 -10  clang/lib/AST/ByteCode/Interp.cpp
+7 -2  clang/lib/AST/ByteCode/InterpState.h
+2 -2  clang/lib/AST/ByteCode/Interp.h
+1 -1  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+15 -15  4 files

LLVM/project e675b9f llvm/lib/Analysis InstructionSimplify.cpp, llvm/test/Transforms/InstSimplify compare.ll

[InstSimplify] Consider `dereferenceable(N)` when simplifying pointer equalities (#203867)

Extend `computePointerICmp` to leverage `dereferenceable(N)` attribute
when simplifying pointer equality comparisons. Per attribute semantics,
an argument pointer marked as such cannot be a one-past-the-end pointer
to some object, thus it cannot equal the start of an adjacent object.
This lets us prove inequality between a `dereferenceable` argument and
storage allocated within the function.

Fixes: https://github.com/llvm/llvm-project/issues/200511.
Delta  File
+182 -0  llvm/test/Transforms/InstSimplify/compare.ll
+47 -28  llvm/lib/Analysis/InstructionSimplify.cpp
+229 -28  2 files

LLVM/project 2487cb0 clang/lib/AST/ByteCode EvaluationResult.cpp EvalEmitter.cpp

[clang][bytecode] Rename checkReturnValue to checkDynamicAllocations (#204064)

This is part of https://github.com/llvm/llvm-project/pull/186045, but
makes sense independently.
Delta  File
+4 -3  clang/lib/AST/ByteCode/EvaluationResult.cpp
+2 -2  clang/lib/AST/ByteCode/EvalEmitter.cpp
+2 -2  clang/lib/AST/ByteCode/EvaluationResult.h
+8 -7  3 files

LLVM/project fe51b83 llvm/include/llvm/Analysis AssumeBundleQueries.h, llvm/lib/Analysis AssumeBundleQueries.cpp

[Test] Remove test creating invalid assume operand bundles (#203945)

This was creating random assume operand bundles, using unsupported
attributes, and using invalid arguments for supported ones.

Rather than trying to salvage this test, delete it and the API it tests.
Delta  File
+0 -98  llvm/unittests/Analysis/AssumeBundleQueriesTest.cpp
+0 -12  llvm/include/llvm/Analysis/AssumeBundleQueries.h
+0 -6  llvm/lib/Analysis/AssumeBundleQueries.cpp
+0 -116  3 files

LLVM/project 4ae4a15 clang-tools-extra/test/clang-tidy/infrastructure cli-argument-errors.cpp config-option-errors.cpp

Revert "[clang-tidy][NFC] Add more test coverage for tidy errors" (#204073)

Reverts llvm/llvm-project#203987
Delta  File
+0 -13  clang-tools-extra/test/clang-tidy/infrastructure/cli-argument-errors.cpp
+0 -13  clang-tools-extra/test/clang-tidy/infrastructure/config-option-errors.cpp
+0 -11  clang-tools-extra/test/clang-tidy/infrastructure/vfsoverlay-errors.cpp
+0 -10  clang-tools-extra/test/clang-tidy/infrastructure/config-file-parse-errors.cpp
+0 -8  clang-tools-extra/test/clang-tidy/infrastructure/export-fixes-errors.cpp
+0 -7  clang-tools-extra/test/clang-tidy/infrastructure/list-checks-no-checks.cpp
+0 -62  6 files

LLVM/project 95cc633 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-taskloop-reduction.mlir openmp-todo.mlir

[mlir][OpenMP] Translate reductions on taskloop

Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.

For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.

Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.

Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
Delta  File
+373 -0  mlir/test/Target/LLVMIR/openmp-taskloop-reduction.mlir
+238 -27  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+92 -10  mlir/test/Target/LLVMIR/openmp-todo.mlir
+703 -37  3 files

LLVM/project 1338c5c llvm/test/CodeGen/AArch64/GlobalISel irtranslator-memset-inline.ll inline-memset-forced.mir, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-memsetinline.mir

[GlobalISel] Implement `llvm.memset.inline` (#203198)
Delta  File
+142 -0  llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-memset-inline.ll
+77 -0  llvm/test/CodeGen/AArch64/GlobalISel/inline-memset-forced.mir
+72 -0  llvm/test/CodeGen/AArch64/GlobalISel/inline-small-memset.mir
+69 -0  llvm/test/CodeGen/RISCV/GlobalISel/memset-inline.ll
+59 -0  llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-memsetinline.mir
+57 -0  llvm/test/CodeGen/Mips/GlobalISel/mips-prelegalizer-combiner/inline-memset.mir
+476 -0  21 files not shown
+614 -27  27 files

LLVM/project 161d8a7 clang-tools-extra/clangd HeaderSourceSwitch.cpp ClangdLSPServer.cpp, clang-tools-extra/clangd/refactor/tweaks ExtractVariable.cpp

[clangd][nfc] Avoid type erasure for local recursive callbacks (#203042)

Four local clangd callbacks use std::function only to call themselves.
Switch to local structs and static functions to avoid std::function
type-erasure and copy-support machinery.
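
The general shape of this change can be sketched as below. The `Node` type and both function names are hypothetical and chosen only for illustration; the actual clangd callbacks are different, and this is just one way to write a self-recursive local helper without `std::function`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical tree type used only for this illustration.
struct Node {
  int value;
  std::vector<Node> children;
};

// Before: a std::function that exists only so the lambda can call itself.
int sumWithStdFunction(const Node &root) {
  std::function<int(const Node &)> visit = [&](const Node &n) {
    int total = n.value;
    for (const Node &child : n.children)
      total += visit(child);
    return total;
  };
  return visit(root);
}

// After: a local struct with a static member function recurses directly,
// so no type-erased wrapper is constructed.
int sumWithLocalStruct(const Node &root) {
  struct Visitor {
    static int visit(const Node &n) {
      int total = n.value;
      for (const Node &child : n.children)
        total += visit(child);
      return total;
    }
  };
  return Visitor::visit(root);
}
```

Both functions compute the same result; the second simply gives the compiler a direct call target instead of routing each recursive call through the `std::function` wrapper.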

In matched Release AArch64 builds, the four object files shrink by 8,152
bytes and 131 relocations; linked clangd shrinks by 3,872 bytes
unstripped and 16 bytes stripped, with __text down 360 bytes,
__DATA_CONST,__const down 208 bytes, unwind data down 32 bytes, and 21
fewer dyld fixups.

Work towards #202616

AI tool disclosure: Co-authored with OpenAI Codex.
Delta  File
+22 -17  clang-tools-extra/clangd/HeaderSourceSwitch.cpp
+22 -17  clang-tools-extra/clangd/ClangdLSPServer.cpp
+13 -11  clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
+11 -10  clang-tools-extra/clangd/Protocol.cpp
+68 -55  4 files

LLVM/project e13bb91 llvm/lib/Target/AArch64 AArch64RegisterInfo.cpp, llvm/unittests/Target/AArch64 AArch64RegisterInfoTest.cpp

[AArch64] Reserve `W30_HI` and `[BHSDQ]31_HI` (#202929)
Delta  File
+38 -0  llvm/unittests/Target/AArch64/AArch64RegisterInfoTest.cpp
+6 -6  llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+44 -6  2 files

LLVM/project 9d7ca44 llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp GCNHazardRecognizer.h, llvm/test/CodeGen/AMDGPU misched-into-wmma-hazard-shadow.mir

[AMDGPU] Track VALU instructions separately for WMMA coexecution hazards (#202523)

WMMA coexecution hazards can only be resolved by VALU instructions, not
S_NOPs. Track VALU/WMMA instructions separately so the scheduler can
accurately determine stall cycles.
Delta  File
+59 -10  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+36 -0  llvm/test/CodeGen/AMDGPU/misched-into-wmma-hazard-shadow.mir
+16 -0  llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+111 -10  3 files

LLVM/project 2e36f06 clang-tools-extra/test/clang-tidy/infrastructure cli-argument-errors.cpp config-option-errors.cpp

[clang-tidy][NFC] Add more test coverage for tidy errors (#203987)
Delta  File
+13 -0  clang-tools-extra/test/clang-tidy/infrastructure/cli-argument-errors.cpp
+13 -0  clang-tools-extra/test/clang-tidy/infrastructure/config-option-errors.cpp
+11 -0  clang-tools-extra/test/clang-tidy/infrastructure/vfsoverlay-errors.cpp
+10 -0  clang-tools-extra/test/clang-tidy/infrastructure/config-file-parse-errors.cpp
+8 -0  clang-tools-extra/test/clang-tidy/infrastructure/export-fixes-errors.cpp
+7 -0  clang-tools-extra/test/clang-tidy/infrastructure/list-checks-no-checks.cpp
+62 -0  6 files

LLVM/project f88e9de libc/lib CMakeLists.txt

[libc] Generate a stub for libpthread.a (#200908)

Several build systems and existing scripts assume that pthread functions
are exposed through a separate library (`libpthread.so` / `libpthread.a`)
and thus pass the `-lpthread` flag explicitly. Since llvm-libc puts all the
pthread functions into the regular `libc`, teach the CMake build rules
to produce an empty static archive `libpthread.a` for compatibility
purposes.
Delta  File
+25 -0  libc/lib/CMakeLists.txt
+25 -0  1 files

LLVM/project ccef34d llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize runtime-check-known-true.ll

[VPlan] Simplify reverse(reverse(x)) -> x (#199057)

This is a version of #196900 that performs the simplification as a
separate transform.

We need to add an additional `vp.splice.right(vp.splice.left(poison, x,
evl), poison, evl) -> x` simplification to avoid leftover splices
whenever reverses are removed in an EVL tail-folded loop.
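
The core `reverse(reverse(x)) -> x` fold can be sketched on a toy expression tree as follows. The `Expr` type and the `makeLeaf`, `makeReverse` and `simplify` helpers are hypothetical and unrelated to the actual VPlan recipe classes; the sketch only illustrates the peephole rule itself, not the splice handling.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical miniature expression node; not the VPlan recipe classes.
struct Expr {
  enum Kind { Leaf, Reverse } kind;
  std::shared_ptr<Expr> operand; // set only for Reverse nodes
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf() { return std::make_shared<Expr>(Expr{Expr::Leaf, nullptr}); }

ExprPtr makeReverse(ExprPtr x) {
  return std::make_shared<Expr>(Expr{Expr::Reverse, std::move(x)});
}

// Fold reverse(reverse(x)) to x, repeating while the root is such a pair.
ExprPtr simplify(ExprPtr e) {
  while (e->kind == Expr::Reverse && e->operand->kind == Expr::Reverse) {
    ExprPtr inner = e->operand->operand; // keep x alive before dropping e
    e = std::move(inner);
  }
  return e;
}
```

An odd number of nested reverses leaves exactly one reverse in place, while an even number folds away completely.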

Co-authored-by: Madhur Amilkanthwar <madhura at nvidia.com>
Delta  File
+59 -0  llvm/test/Transforms/LoopVectorize/VPlan/simplify-reverse-reverse.ll
+26 -6  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4 -12  llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll
+3 -5  llvm/test/Transforms/LoopVectorize/runtime-check-known-true.ll
+3 -0  llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+1 -0  llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-after-all.ll
+96 -23  6 files

LLVM/project 0a04c14 llvm/tools/llvm-objdump llvm-objdump.cpp, llvm/tools/llvm-profdata llvm-profdata.cpp

[llvm] Replace unordered_{map,set} with Dense{Map,Set} in llvm tools (#204058)

std::unordered_map is slow. Switch the remaining local maps and sets in
the command-line tools (llvm-profgen, llvm-profdata, llvm-objdump,
llvm-exegesis, llvm-xray, llvm-remarkutil) to DenseMap/DenseSet.
Delta  File
+27 -28  llvm/tools/llvm-profgen/PerfReader.h
+9 -19  llvm/tools/llvm-profgen/MissingFrameInferrer.h
+12 -15  llvm/tools/llvm-objdump/llvm-objdump.cpp
+14 -10  llvm/tools/llvm-profgen/ProfiledBinary.h
+7 -10  llvm/tools/llvm-profgen/PerfReader.cpp
+7 -9  llvm/tools/llvm-profdata/llvm-profdata.cpp
+76 -91  9 files not shown
+101 -117  15 files

LLVM/project 639b1d9 lld/ELF/Arch LoongArch.cpp, lld/test/ELF loongarch-pcadd-hi20.s

[lld][LoongArch] Fix range checking of R_LARCH_*_PCADD_HI20 relocations on 64-bit (#183233)

According to the la-abi-specs, the `R_LARCH_*_PCADD_HI20` relocations
are also used on 64-bit LoongArch. Fix the range checking accordingly.
Delta  File
+32 -0  lld/test/ELF/loongarch-pcadd-hi20.s
+1 -1  lld/ELF/Arch/LoongArch.cpp
+33 -1  2 files

LLVM/project 4f8a1e9 lld/test/ELF loongarch-pcadd-hi20.s

Add test case
Delta  File
+32 -0  lld/test/ELF/loongarch-pcadd-hi20.s
+32 -0  1 files

LLVM/project d1539ed lld/ELF/Arch LoongArch.cpp

[lld][LoongArch] Fix range checking of R_LARCH_*_PCADD_HI20 relocations on 64-bit

According to the la-abi-specs, the R_LARCH_*_PCADD_HI20 relocations are
also used on 64-bit LoongArch. Fix the range checking accordingly.
Delta  File
+1 -1  lld/ELF/Arch/LoongArch.cpp
+1 -1  1 files