[flang][cuda] Support non-allocatable module-level managed variables (#188526)
Add support for non-allocatable module-level CUDA managed variables
using pointer indirection through a companion global in
__nv_managed_data__. The CUDA runtime populates this pointer with the
unified memory address via __cudaRegisterManagedVar and
__cudaInitModule.
1. Create a .managed.ptr companion global in the __nv_managed_data__
section and register it with _FortranACUFRegisterManagedVariable
(CUFAddConstructor.cpp)
2. Call __cudaInitModule after registration to populate the managed
pointer (registration.cpp)
3. Annotate managed globals in gpu.module with nvvm.managed for PTX
.attribute(.managed) generation (cuda-code-gen.mlir)
4. Suppress cuf.data_transfer for assignments to/from non-allocatable
module managed variables, since cudaMemcpy would target the shadow
address rather than the actual unified memory (tools.h)
5. Preserve cuf.data_transfer for device_var = managed_var assignments
where explicit transfer is still required
[mlir][OpenMP][NFC] Refactor fillAffinityIteratorLoop (#189418)
Extract affinity-specific logic from fillAffinityIteratorLoop into a
callback so that the iterator loop codegen logic can be shared with
other clauses such as the depend and target clauses.
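The shape of this refactoring can be sketched as follows. This is a minimal Python model, not the MLIR translation code: `fill_iterator_loop`, `affinity_body`, and `depend_body` are hypothetical names, and the string "entries" stand in for whatever IR each clause would really emit.

```python
from typing import Callable, List


def fill_iterator_loop(trip_count: int, emit_body: Callable[[int], str]) -> List[str]:
    """Shared loop skeleton (hypothetical): walk every iteration and let the
    clause-specific callback produce the per-iteration result."""
    return [emit_body(i) for i in range(trip_count)]


def affinity_body(i: int) -> str:
    # Clause-specific logic that used to live inside the loop helper.
    return f"affinity entry {i}"


def depend_body(i: int) -> str:
    # A second clause reuses the same skeleton with its own callback.
    return f"depend entry {i}"


affinity_entries = fill_iterator_loop(2, affinity_body)
depend_entries = fill_iterator_loop(2, depend_body)
```

The loop skeleton no longer knows which clause it serves, so adding another clause only means supplying another callback.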
[Flang][OpenMP] Support iterator modifier in depend clause (#189412)
Lower the iterator modifier on the depend clause to omp.iterator. Iterated
depend objects emit `!omp.iterated<!llvm.ptr>` by using
`getDataOperandBaseAddr` to generate the base address and
`genIteratorCoordinate` to get the addr+offset. The non-iterated objects
in the depend clause still use the existing lowering path.
This patch is part of feature work for #188061.
Assisted with copilot.
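As a conceptual illustration of "base address plus offset" for an iterated depend object such as `a(i)`, the sketch below lists the addresses that one iterator range denotes. It is a Python model under stated assumptions (a one-dimensional contiguous array, a positive step, an inclusive upper bound); `iterated_addresses` and its parameters are hypothetical and do not correspond to the Flang helpers named above, which emit IR rather than computing addresses at compile time.

```python
from typing import List


def iterated_addresses(base_addr: int, elem_size: int, lower_bound: int,
                       begin: int, end: int, step: int = 1) -> List[int]:
    """Addresses of a(i) for i = begin, begin+step, ..., up to end inclusive.

    Assumes a contiguous 1-D array whose first element a(lower_bound)
    lives at base_addr, and a positive step.
    """
    return [base_addr + (i - lower_bound) * elem_size
            for i in range(begin, end + 1, step)]


# Four 4-byte elements starting at a made-up address 1000.
all_elements = iterated_addresses(1000, 4, lower_bound=1, begin=1, end=4)
every_other = iterated_addresses(1000, 4, lower_bound=1, begin=1, end=4, step=2)
```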
[mlir][OpenMP] Add iterator support to depend clause (#189090)
Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.
Assisted with copilot
This is part of feature work for
https://github.com/llvm/llvm-project/issues/188061
[flang] Preserve UseErrorDetails in module files (#189423)
When the same name is USE-associated with two or more distinct ultimate
symbols, and they are not both generic procedure interfaces, it's not an
error unless the name is actually referenced in the scope. But when the
scope is itself a module or submodule, our module files don't preserve
the error for later diagnosis -- instead, the UseErrorDetails symbol
that serves as a "poison pill" in case of later use is discarded when
the module file is generated. So emit additional USE statements to the
module file so that a UseErrorDetails symbol is created anew when the
module file is read.
[mlir][ArithToSPIRV] Fix invalid SPIRV and crashes when lowering integer ops on i1 (#189239)
Several arith integer operations on i1 / vector<Ni1> types were either
crashing or producing invalid SPIRV. The i1 type maps to spirv.bool in
SPIRV, not to a SPIRV integer — so standard integer SPIRV ops
(spirv.IAdd, spirv.UDiv, spirv.GLSMax, etc.) are illegal on it.
Add dedicated boolean patterns for all affected arith integer ops, each
with benefit=2 to take priority over the generic elementwise patterns.
The semantics for i1 follow from treating true = 1 / false = 0 with
two's complement wrapping:
- addi, subi → spirv.LogicalNotEqual (XOR on bits)
- muli, divui, divsi → spirv.LogicalAnd
- remui, remsi, shli, shrui → spirv.LogicalAnd(a, spirv.LogicalNot(b))
(a & ~b)
- shrsi → identity (arithmetic right shift of a 1-bit signed value is
always the input)
- maxui, minsi → spirv.LogicalOr (true is the larger value when read as
unsigned, 1, and the smaller value when read as signed, -1)
[4 lines not shown]
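The mappings listed above can be checked against a small model of 1-bit integer arithmetic. This is a Python sketch, not the MLIR patterns themselves: the names (`wrap`, `signed`, `REFERENCE`, `LOWERED`, `mismatches`) are hypothetical, and it assumes that division or remainder by zero and shifts by an amount of at least the bit width are undefined, so those inputs are skipped. Only the ops named in the visible list are covered.

```python
from itertools import product


def wrap(x: int) -> int:
    """Truncate to one bit (two's complement wrapping)."""
    return x & 1


def signed(bit: int) -> int:
    """Read a 1-bit pattern as signed: 1 -> -1, 0 -> 0."""
    return -bit


def trunc_div(x: int, y: int) -> int:
    """Signed division rounding toward zero."""
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q


def srem(a: int, b: int) -> int:
    sa, sb = signed(a), signed(b)
    return wrap(sa - trunc_div(sa, sb) * sb)


# Reference results on 1-bit integers; None marks inputs assumed undefined.
REFERENCE = {
    "addi": lambda a, b: wrap(a + b),
    "subi": lambda a, b: wrap(a - b),
    "muli": lambda a, b: wrap(a * b),
    "divui": lambda a, b: None if b == 0 else wrap(a // b),
    "divsi": lambda a, b: None if b == 0 else wrap(trunc_div(signed(a), signed(b))),
    "remui": lambda a, b: None if b == 0 else wrap(a % b),
    "remsi": lambda a, b: None if b == 0 else srem(a, b),
    "shli": lambda a, b: None if b >= 1 else a,
    "shrui": lambda a, b: None if b >= 1 else a,
    "shrsi": lambda a, b: None if b >= 1 else a,
    "maxui": lambda a, b: max(a, b),
    "minsi": lambda a, b: wrap(min(signed(a), signed(b))),
}

# Boolean forms taken from the list in the commit message.
LOWERED = {
    "addi": lambda a, b: a != b,
    "subi": lambda a, b: a != b,
    "muli": lambda a, b: a and b,
    "divui": lambda a, b: a and b,
    "divsi": lambda a, b: a and b,
    "remui": lambda a, b: a and not b,
    "remsi": lambda a, b: a and not b,
    "shli": lambda a, b: a and not b,
    "shrui": lambda a, b: a and not b,
    "shrsi": lambda a, b: a,
    "maxui": lambda a, b: a or b,
    "minsi": lambda a, b: a or b,
}


def mismatches():
    """Return every (op, a, b) where the boolean form disagrees with the
    1-bit reference on an input the reference defines."""
    bad = []
    for op, (a, b) in product(REFERENCE, product((0, 1), repeat=2)):
        expected = REFERENCE[op](a, b)
        if expected is None:
            continue
        if int(LOWERED[op](bool(a), bool(b))) != expected:
            bad.append((op, a, b))
    return bad
```

Calling `mismatches()` enumerates all four input pairs per op, so an empty result means the boolean forms agree with the 1-bit model wherever the model defines a result.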
[openmp] Add support for Arm64X to libomp (#176157)
This patch allows building libomp.dll and libomp.lib as Arm64X binaries
containing both arm64 and arm64ec code and usable from applications
compiled for both architectures.
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
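The resulting gating rule can be sketched as a predicate. This is a Python model rather than the TableGen change: `tlbip_allowed` and `RELAXED_MARKERS` are hypothetical names, the operation names passed in are illustrative inputs only, and the fallback (every other `tlbip` operation still requiring `+d128` alone) is an assumption inferred from the quoted text.

```python
# Substrings named in the commit message; operations containing one of
# these accept either feature.
RELAXED_MARKERS = ("E1IS", "E1OS", "E2IS", "E2OS")


def tlbip_allowed(op_name: str, features: set) -> bool:
    """Return True if a tlbip operation is accepted with the given features.

    Assumption: operations outside the relaxed set keep requiring d128.
    """
    if any(marker in op_name for marker in RELAXED_MARKERS):
        return "d128" in features or "tlbid" in features
    return "d128" in features


relaxed_with_tlbid = tlbip_allowed("VAE1IS", {"tlbid"})
other_with_tlbid = tlbip_allowed("VAE1", {"tlbid"})
```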
[mlir][docs] dialect interfaces and mlir reduce documentation fix (#189258)
Two modifications:
1. Reflect newly added dialect interface methods in the documentation
2. Fix the bug in the `MLIR Reduce` documentation
Revert "[VPlan] Extract reverse mask from reverse accesses" (#189637)
Reverts llvm/llvm-project#155579
The assertion added by that patch triggers on some buildbots:
clang:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:3840:
virtual InstructionCost
llvm::VPWidenMemoryRecipe::computeCost(ElementCount, VPCostContext &)
const: Assertion `!IsReverse() && "Inconsecutive memory access should
not have reverse order"' failed.
PLEASE submit a bug report to
https://github.com/llvm/llvm-project/issues/ and include the crash
backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments:
/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1.install/bin/clang
-DNDEBUG -mcpu=neoverse-v2 -mllvm -scalable-vectorization=preferred -O3
-std=gnu17 -fcommon -Wno-error=incompatible-pointer-types -MD -MT
[3 lines not shown]