if_awg: Add missing awg_poll() prototype
The function awg_poll() was missing a prototype, which causes the build
to fail if DEVICE_POLLING is enabled, which it is in the ARMADAXP config.
MFC after: 2 weeks
Reviewed by: tuexen, mmel, adrian
Sponsored by: https://www.patreon.com/bsdivy
Differential Revision: https://reviews.freebsd.org/D56651
[CIR][AMDGPU] Add lowering for amdgcn_div_scale builtins (#192931)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2050
This PR adds support for lowering of _builtin_amdgcn_div_scale* amdgpu
builtins to clangIR.
Followed similar lowering from reference clang->llvmir in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.
[flang][OpenMP] Support lowering of metadirective (part 1)
This patch implements following feature in metadirective:
- implementation={vendor(...)}
- device={kind(...), isa(...), arch(...)}
- user={condition(<constant-expr>)}
- construct={parallel, target, teams}
- default, nothing, and otherwise clause
Dynamic user conditions and loop-associated variants are deferred
to follow-up patches.
This patch is part of the feature work for #188820.
Assisted with copilot and GPT-5.4
[NVPTX] Improve error diagnostic when handling unknown intrinsics (#191194)
Following up on #146726, it may be desirable to gracefully fail the
compilation in the presence of unknown NVVM intrinsics, which
cannot be lowered by the NVPTX backend, rather than silently
emitting invalid PTX.
mvc: BaseField: extend count() for value-based field types
So getValues() already uses isSet() which makes the count of
set values in the field type correct. Add the tests to prove
it.
[LoongArch] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector (#164943)
On 64-bit targets the generic legalize will use an i64 load and a
scalar_to_vector for us. But on 32-bit targets, i64 isn't legal, and the
generic legalizer will end up emitting two 32-bit loads. This patch uses
f64 to avoid the splitting entirely and the redundant int->fp
conversion.
[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder
When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During the
`OpenMPIRBuilder::finalize` pass, run a postpass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).
This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.
Co-authored-by: Composer
Co-authored-by: Gemini Pro
[Flang][OpenMP] Clear close on descriptor members for box parents in USM
Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
Co-authored-by: Composer (Cursor) <ai at cursor.com>
[LoopFusion][NFC] UTC gen some tests (#193755)
Some variables need rename as UTC normalizes IR value names. Also,
remove dead variable `%M` and `%N` from
`double_loop_nest_inner_guard.ll`
[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder
When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During the
`OpenMPIRBuilder::finalize` pass, run a postpass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).
This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.
Co-authored-by: Composer
Co-authored-by: Gemini Pro
[Flang][OpenMP] Clear close on descriptor members for box parents in USM
Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
Co-authored-by: Composer (Cursor) <ai at cursor.com>
AMDGPU: Back-propagate wqm for sources of side-effect instruction (#193395)
For readfirstlane instruction, as it would get undefined value if exec
is zero. To handle the case that only helper lanes execute the parent
block, we let the readfirstlane to execute under wqm. But this is not
enough. If the parent block was also executed by non-helper lanes, we
also need to make sure its sources were calculated under wqm. Otherwise,
if the instruction that generate the source of readfirstlane was
executed under exact mode, the value would contain garbage data in help
lane. The garbage data in helper lane maybe returned by the
readfirstlane running under wqm.
To fix this issue, we need to enforce the back-propagation of wqm for
instructions like readfirstlane. This was only done if the instruction
was possibly in the middle of wqm region (by checking OutNeeds).