[MLIR][EmitC] Add optional pure attribute to CastOp (#202749)
In general, C++ cast expressions cannot always be assumed to be pure: they may
invoke user-defined conversions or be affected by floating-point environment
settings. However, in many practical cases, such as integer casts without
operator overloading, the cast is pure and can be treated as speculatable and
side-effect-free. For such cases, the newly added `pure` attribute may be used.
When the `pure` attribute is set, `getSpeculatability()` returns `Speculatable` and
`getEffects()` reports no effects. It is UB if the `pure` attribute is set and
the actual conversion is not pure, e.g. when the user-defined conversion has
memory effects.
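The following C++ sketch (hypothetical types, not EmitC output) shows why a cast is not pure in general. Converting a class type to `int` runs a user-defined conversion operator with an observable side effect, while a plain integer cast has none. Only casts like the latter are candidates for the `pure` attribute.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global counter that makes the conversion's side effect visible.
int conversionCount = 0;

// A class type whose conversion to int is user-defined and impure.
struct Tracked {
  int value;
  explicit operator int() const {
    ++conversionCount;  // memory effect: writes global state
    return value;
  }
};

// Impure: static_cast invokes Tracked::operator int().
int castTracked(const Tracked &t) { return static_cast<int>(t); }

// Pure: widening an int32_t to int64_t preserves the value and has no
// side effects.
std::int64_t widen(std::int32_t x) { return static_cast<std::int64_t>(x); }
```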
[LLD] Allow all output-section-commands in OVERLAYS. (#203524)
The GNU ld grammar for overlays is:
```
secname1
  {
    output-section-command
    output-section-command
    ...
  }
secname2
...
```
The output-section-commands are the same as in an OutputSection. At
present we have a stripped-down parser that only supports
InputSectionDescriptions, which does not permit other useful commands
such as defining symbols.
Due to recent refactoring, it is now simple to reuse the parser for
OutputSection commands rather than using a custom one.
[5 lines not shown]
[OpenACC] Add emit-independent-loops-as-unstructured flag
Add a flag (default true) to bypass the TODOs in `loopWillBeIndependent`
that fired for unstructured do loops inside independent OpenACC loop and
combined constructs, lowering them as `acc.loop` instead. Existing TODO
tests are extended to exercise both the default (lowered) path and the
explicit `=false` path that still reports the TODO.
Co-Authored-By: Claude <noreply at anthropic.com>
[DropUnnecessaryAssumes] Fix iterator invalidation. (#203765)
`registerAssumption()` can append to (and thus reallocate) the cache's
assumption vector, invalidating any live iterator. Index the vector with
an integer instead of an iterator, and stop at the original count so that
assumes created during the loop are not reprocessed.
PR: https://github.com/llvm/llvm-project/pull/203765
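A minimal sketch of the pattern, assuming a plain `std::vector<int>` in place of the cache and a hypothetical `registerAssumption()` that may append to it. The loop bound is snapshotted before iterating and each element is read by index, so a reallocation cannot invalidate anything the loop holds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for registerAssumption(): may append to the vector,
// which can reallocate its storage and invalidate outstanding iterators.
void registerAssumption(std::vector<int> &cache, int v) {
  if (v % 2 != 0)
    cache.push_back(v * 10);
}

// Visits only the entries present on entry; returns how many were visited.
std::size_t processAssumptions(std::vector<int> &cache) {
  const std::size_t originalCount = cache.size();  // snapshot the bound
  std::size_t visited = 0;
  for (std::size_t i = 0; i < originalCount; ++i) {
    int v = cache[i];  // copy the element by index; no iterator is held
    registerAssumption(cache, v);  // may reallocate `cache`
    ++visited;
  }
  return visited;
}
```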
[MacroFusion] Restrict pairs to have SDep::Data dependency only (#203793)
This patch restricts target-independent macro fusion to pairs with an
SDep::Data dependency only. The test demonstrates the case that drove
this patch: two instructions are wrongly macro-fused across an
Artificial edge without being RAW dependent. Currently, macro fusion
does not need the more relaxed constraint it has today. If this changes
in the future, it can be addressed then, e.g. by adding a
hook.
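The restriction amounts to a check like the one below. This is a simplified, hypothetical model rather than the MacroFusion code: `Dep` and `hasDataDepOn` are illustrative names, the four kinds mirror those of LLVM's `SDep`, and an artificial edge is represented as an `Order` edge.

```cpp
#include <cassert>
#include <vector>

// Simplified edge kinds mirroring SDep's Data, Anti, Output, and Order.
enum class DepKind { Data, Anti, Output, Order };

struct Dep {
  int predId;    // id of the predecessor scheduling unit
  DepKind kind;  // artificial edges are modeled as Order
};

// A pair (first, second) qualifies for fusion only if `second` has a true
// (RAW) data dependency on `first`.
bool hasDataDepOn(const std::vector<Dep> &predsOfSecond, int firstId) {
  for (const Dep &d : predsOfSecond)
    if (d.predId == firstId && d.kind == DepKind::Data)
      return true;
  return false;
}
```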
[JumpThreading] Use context when checking speculatability (#203912)
Pass the terminator of the predecessor as the context instruction when
checking for load speculatability. This now needs to be done per
(unavailable) predecessor, because the context differs for each. Cache
the guaranteed-to-transfer walk between checks, since that part is always
the same.
JumpThreading doesn't use AssumptionCache currently, so I believe this
is only observable under -use-dereferenceable-at-point-semantics. Adjust
the tests to drop nofree attributes that currently hide this issue with
the option enabled.
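The caching can be sketched as follows. This is a hypothetical model, not JumpThreading's code: `guaranteedToTransfer()` stands in for the guaranteed-to-transfer walk, whose result is computed once and reused, while the `derefAtContext` argument stands in for the per-predecessor query that uses each predecessor's terminator as context.

```cpp
#include <cassert>
#include <optional>

struct SpeculationChecker {
  int walkCount = 0;               // counts how often the walk actually runs
  std::optional<bool> cachedWalk;  // empty until the walk is first needed

  // Context-independent part: computed lazily, at most once.
  bool guaranteedToTransfer() {
    if (!cachedWalk) {
      ++walkCount;
      cachedWalk = true;  // placeholder for the walk's real result
    }
    return *cachedWalk;
  }

  // Context-dependent part is evaluated per predecessor; the cached walk
  // result is reused across all of them.
  bool canSpeculateLoadIn(bool derefAtContext) {
    return derefAtContext && guaranteedToTransfer();
  }
};
```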
Merge tag 'for-7.2/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- small cleanups in dm-vdo, dm-raid, dm-cache, dm-zoned-metadata
- rework of dm-ima
- introduce dm-inlinecrypt
- fix wrong return value in dm-ioctl
- fix rcu stall when polling
* tag 'for-7.2/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-zoned-metadata: Use strscpy() to copy device name
dm cache: make smq background work limit configurable
dm-inlinecrypt: add support for hardware-wrapped keys
dm: limit target bio polling to one shot
[16 lines not shown]
[DFAJumpThreading] Do not thread over blocks with multiple phi definitions (#195512)
Fixes #195088
For the reduced case in the issue, there are 4 threading paths:
```
< then, case2, lbl_entry, switch_bb > [ 0, case2 ]
< case2, lbl_entry, switch_bb > [ 0, case2 ]
< then, case2, switch_bb > [ 1, case2 ]
< case2, switch_bb > [ 0, case2 ]
```
However, the first and third paths conflict: `then->case2` cannot be
split into both `then->case2.0` and `then->case2.1`, because the jump
from `then` to `case2` does not define a unique exiting state. The
multiple phi definitions give `then->case2` two exiting states (0 and
1).
The root cause is that a block with multiple definitions cannot be
regarded as a determinator.
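To illustrate the conflict (not the fix, which instead stops treating such a block as a determinator), the hypothetical sketch below groups the four paths above by their leading edge and reports every edge that would need more than one exiting state. `ThreadingPath` and `conflictingEdges` are illustrative names, not DFAJumpThreading's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A threading path: the blocks visited and the exiting state it threads to.
struct ThreadingPath {
  std::vector<std::string> blocks;
  int exitState;
};

// An edge that maps to more than one exiting state would have to be
// duplicated into incompatible copies at the same time.
std::set<std::string> conflictingEdges(const std::vector<ThreadingPath> &paths) {
  std::map<std::string, std::set<int>> statesByEdge;
  for (const ThreadingPath &p : paths)
    if (p.blocks.size() >= 2)
      statesByEdge[p.blocks[0] + "->" + p.blocks[1]].insert(p.exitState);
  std::set<std::string> conflicts;
  for (const auto &[edge, states] : statesByEdge)
    if (states.size() > 1)
      conflicts.insert(edge);
  return conflicts;
}
```

For the four paths listed earlier, only `then->case2` is reported, since it maps to both state 0 and state 1.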
[flang][MemoryAllocation] do not assume all blocks have terminators (#203902)
Update MemoryAllocation to cope with blocks without terminators.
`getTerminator` cannot be called when a block has no terminator and must
be guarded by `mightHaveTerminator`.
This case was hit, for instance, for an alloca inside `fir.do_concurrent.loop`.
Add handling for single-block regions with no terminator (which is the
case for `fir.do_concurrent.loop` and most regions without terminators).
In such cases, the deallocation point can simply be placed at the end of
the block. For regions with several blocks and no terminators, the pass
leaves the alloca in place (no operation known to be used in flang has
such regions).
Also add `AutomaticAllocationScope` to the `fir.do_concurrent.loop`
since each iteration is independent and owns its allocas (otherwise the
pass would create allocmem outside of the loops).
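The placement rule can be sketched as a small decision function. `BlockInfo`, `RegionInfo`, and `deallocInsertionPoint` are hypothetical simplifications, not flang's types; the real pass works on MLIR blocks and guards `getTerminator` with `mightHaveTerminator`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Simplified views of an MLIR block and region for this sketch.
struct RegionInfo {
  std::size_t numBlocks;
};

struct BlockInfo {
  std::size_t numOps;  // operations in the block
  bool hasTerminator;  // whether the last operation is a terminator
};

// Returns the operation index before which the deallocation is inserted,
// or std::nullopt when the alloca is left in place.
std::optional<std::size_t> deallocInsertionPoint(const RegionInfo &region,
                                                 const BlockInfo &block) {
  if (block.hasTerminator)
    return block.numOps - 1;  // just before the terminator
  if (region.numBlocks == 1)
    return block.numOps;      // end of the single, terminator-less block
  return std::nullopt;        // multi-block region without terminators
}
```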
[AArch64] Add initial support for Hisilicon's hip12 core (#203446)
This patch adds initial support for Hisilicon's hip12 core (Kunpeng 950
processor).
For more information, see:
https://www.huawei.com/en/news/2025/9/hc-xu-keynote-speech
HIP12Model will come later.
[AMDGPU] Model WMMA co-execution windows in the scheduler for gfx1250
WMMA instructions in gfx1250 expose an execution window during which
only certain other instruction classes may co-execute. Teach the hazard
recognizer about those windows so the scheduler can fill co-execution slots and
account for the resulting stalls. This adds a preRA hazard recognizer
mode.
Add AMDGPUCoExecInfo.h, a shared model of a co-execution window: the
per-stage capability bitmask, the stage types (CoExecStageType), and
CoExecInfo, which maps a multi-cycle instruction to its per-cycle slot
pattern via getCoExecInfo(). InstructionFlavor and its helpers move here
from AMDGPUCoExecSchedStrategy.h with no functional change so they can
be shared by the scheduler and the hazard recognizer.
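A per-stage capability bitmask of the kind described above could look like the following sketch. The capability names, the window contents, and `CoExecWindow` are hypothetical; they do not reproduce `CoExecInfo`, `CoExecStageType`, or the actual gfx1250 windows.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction classes that may co-execute.
enum Capability : std::uint8_t {
  AllowVALU = 1u << 0,
  AllowSALU = 1u << 1,
  AllowVMEM = 1u << 2,
};

// One capability bitmask per cycle of a multi-cycle instruction's window.
struct CoExecWindow {
  std::vector<std::uint8_t> stageMasks;

  // True if an instruction of class `cls` may issue `cycle` cycles after the
  // windowed instruction started. Past the window there is no restriction.
  bool canCoExecute(unsigned cycle, Capability cls) const {
    if (cycle >= stageMasks.size())
      return true;
    return (stageMasks[cycle] & cls) != 0;
  }
};
```

A scheduler could consult such a mask per cycle to decide whether a candidate fills a co-execution slot or must stall.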
[clang][bytecode] Remove InterpState::InitializingBlocks (#204054)
This was superseded by `InitializingPtrs` when implementing
`dynamic_cast`, so we can now remove `InitializingBlocks`.
Merge tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Per-controller admin and IO timeout sysfs attributes, and
letting the block layer set request timeouts (Maurizio,
Maximilian)
- Multipath passthrough iostats, and PCI P2PDMA enablement for
multipath devices (Keith, Kiran)
- A new diag sysfs attribute group exporting per-controller
counters (retries, multipath failover, error counters, requeue
and failure counts, reset and reconnect events) (Nilay)
- FDP configuration validation and bounds check fixes (liuxixin)
- Various nvmet fixes, including a pre-auth out-of-bounds read in
the Discovery Get Log Page handler, auth payload bounds
validation, and tcp error-path leak fixes (Bryam, Tianchu,
Geliang)
- nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki,
[76 lines not shown]
[InstSimplify] Consider `dereferenceable(N)` when simplifying pointer equalities (#203867)
Extend `computePointerICmp` to leverage the `dereferenceable(N)` attribute
when simplifying pointer equality comparisons. Per the attribute's semantics,
an argument pointer marked as such cannot be a one-past-the-end pointer
to some object, so it cannot equal the start of an adjacent object.
This lets us prove inequality between a `dereferenceable` argument and
storage allocated within the function.
Fixes: https://github.com/llvm/llvm-project/issues/200511.
Merge tag 'for-7.2/io_uring-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Rework the task_work infrastructure.
Both the local (DEFER_TASKRUN) and the normal (tctx) task_work lists
were llist based, which is LIFO ordered, and hence each run had to do
an O(n) list reversal pass first to restore queue order.
Additionally, to cap the amount of task_work run, each method needed
a retry list as well.
Add a lockless MPSC FIFO queue (based on Dmitry Vyukov's intrusive
MPSC algorithm) and switch both task_work lists to it. It performs
better than llists, and the retry lists can then be dropped as well,
since entries are popped one at a time.
On top of those changes, run the tctx fallback task_work directly and
remove the now-unused per-ctx fallback machinery entirely.
[61 lines not shown]
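For reference, here is a C++ rendering of Dmitry Vyukov's intrusive MPSC queue, the algorithm the new io_uring queue is based on. It is a sketch using `std::atomic`, not the kernel's C implementation, and `MpscNode` and `MpscQueue` are illustrative names. Producers only swap the head pointer, and the single consumer pops from the tail in FIFO order, so no list reversal is needed.

```cpp
#include <atomic>
#include <cassert>

// Intrusive node: callers embed this in their own work items.
struct MpscNode {
  std::atomic<MpscNode *> next{nullptr};
  int payload = 0;  // illustrative payload
};

// push() may be called from any thread; pop() from a single consumer only.
class MpscQueue {
public:
  MpscQueue() : head_(&stub_), tail_(&stub_) {}

  void push(MpscNode *n) {
    n->next.store(nullptr, std::memory_order_relaxed);
    // Publish n as the new head, then link the previous head to it.
    MpscNode *prev = head_.exchange(n, std::memory_order_acq_rel);
    prev->next.store(n, std::memory_order_release);
  }

  // Returns the oldest node, or nullptr if the queue is empty or a producer
  // is between its exchange and its link store.
  MpscNode *pop() {
    MpscNode *tail = tail_;
    MpscNode *next = tail->next.load(std::memory_order_acquire);
    if (tail == &stub_) {  // skip over the stub node
      if (next == nullptr)
        return nullptr;
      tail_ = next;
      tail = next;
      next = next->next.load(std::memory_order_acquire);
    }
    if (next != nullptr) {
      tail_ = next;
      return tail;
    }
    if (tail != head_.load(std::memory_order_acquire))
      return nullptr;  // a push is in progress; retry later
    push(&stub_);      // re-insert the stub so `tail` can be detached
    next = tail->next.load(std::memory_order_acquire);
    if (next != nullptr) {
      tail_ = next;
      return tail;
    }
    return nullptr;
  }

private:
  MpscNode stub_;
  std::atomic<MpscNode *> head_;  // producers swap here
  MpscNode *tail_;                // touched by the consumer only
};
```

Because each `pop()` detaches exactly one node, a consumer can cap how much work it runs per pass without a separate retry list.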
[Test] Remove test creating invalid assume operand bundles (#203945)
This was creating random assume operand bundles, using unsupported
attributes, and using invalid arguments for supported ones.
Rather than trying to salvage this test, delete it and the API it tests.
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
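The reduction semantics being lowered can be modeled sequentially: each worker accumulates into private storage initialized to the identity, and the taskgroup combines the private copies with the original variable when it ends. `TaskReductionGroup`, `privateSlot()`, and `taskloopSum()` are hypothetical; `privateSlot()` only stands in for the lookup that the translation performs through `__kmpc_task_reduction_get_th_data`, whose actual keying is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of a task reduction on `+`.
class TaskReductionGroup {
public:
  explicit TaskReductionGroup(std::size_t numWorkers)
      : slots_(numWorkers, 0) {}  // 0 is the identity for +

  // Stand-in for the runtime lookup returning private reduction storage.
  long &privateSlot(std::size_t worker) { return slots_.at(worker); }

  // Combination step performed when the taskgroup ends.
  long finalize(long original) const {
    for (long v : slots_)
      original += v;
    return original;
  }

private:
  std::vector<long> slots_;
};

// Models `taskloop reduction(+: x)` over [lo, hi): iterations are spread
// round-robin over workers, each accumulating into its own private slot.
long taskloopSum(long x, long lo, long hi, std::size_t numWorkers) {
  TaskReductionGroup group(numWorkers);
  for (long i = lo; i < hi; ++i)
    group.privateSlot(static_cast<std::size_t>(i - lo) % numWorkers) += i;
  return group.finalize(x);
}
```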