[LoongArch] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector (#164943)
On 64-bit targets, the generic legalizer uses an i64 load and a
scalar_to_vector for us. But on 32-bit targets, i64 isn't legal, and the
generic legalizer ends up emitting two 32-bit loads. This patch uses an
f64 load instead, which avoids both the split and the redundant int->fp
conversion.
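The idea can be illustrated at source level. The sketch below is an analogy, not the SelectionDAG code. It reads the 8 bytes of a <2 x float> with a single 64-bit floating-point access and then reinterprets the same bits as two lanes. The helper name `loadV2F32ViaF64` is hypothetical. Because the bits are only moved, no int->fp conversion is needed.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical helper: one 8-byte FP load, then reinterpret the bits as
// two float lanes (the role scalar_to_vector + bitcast play in the DAG).
std::array<float, 2> loadV2F32ViaF64(const void *Ptr) {
  double Scalar;                                  // single 64-bit FP load
  std::memcpy(&Scalar, Ptr, sizeof(Scalar));
  std::array<float, 2> Lanes;
  static_assert(sizeof(Lanes) == sizeof(Scalar), "sizes must match");
  std::memcpy(Lanes.data(), &Scalar, sizeof(Scalar)); // bits moved, not converted
  return Lanes;
}
```

Calling it on `float Src[2] = {1.5f, -2.0f}` yields the same two values, lane for lane.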
[LoopFusion][NFC] UTC gen some tests (#193755)
Some variables needed renaming because UTC normalizes IR value names.
Also remove the dead variables `%M` and `%N` from
`double_loop_nest_inner_guard.ll`.
[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder
When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During
`OpenMPIRBuilder::finalize`, run a post-pass that rewrites the remaining
uses of the original global to load from the ref global and adjust the
pointer (one shared path handles both `ConstantExpr` addrspace/bitcast
chains and direct instruction uses).
This follows what clang does in similar cases:
https://reviews.llvm.org/D63108.
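The effect of the rewrite can be pictured with a source-level analogy. This is not the OpenMPIRBuilder code. `DeviceGlobal` and `DeviceGlobalRef` are hypothetical names, and the sketch assumes that under unified shared memory the runtime may point the ref global at the host copy. Before the post-pass, code reads the device global directly. After it, each remaining use first loads the ref pointer and then dereferences it.

```cpp
#include <cassert>

// Hypothetical names; a source-level analogy of the IR rewrite only.
int DeviceGlobal = 0;                  // original declare-target device global
int *DeviceGlobalRef = &DeviceGlobal;  // "ref" pointer global (retargetable)

// Before the post-pass: direct use of the device global.
int readBefore() { return DeviceGlobal; }

// After the post-pass: load the ref pointer, then access through it.
int readAfter() { return *DeviceGlobalRef; }
```

If the ref global is retargeted to another object, only `readAfter()` observes that object. This is the indirection the post-pass introduces.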
Co-authored-by: Composer
Co-authored-by: Gemini Pro
[Flang][OpenMP] Clear close on descriptor members for box parents in USM
Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
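The consistency rule can be sketched on a toy map tree. `MapInfo`, `enforceCloseConsistency`, and the flag values below are hypothetical and chosen for illustration. The real pass operates on MLIR map operations. The key point is that the check depends only on whether unified_shared_memory is in effect, not on whether the parent is a record type or a box.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative flag values for this sketch only.
enum MapFlags : std::uint64_t {
  MapTo = 0x1,
  MapFrom = 0x2,
  MapClose = 0x400,
};

struct MapInfo {
  std::uint64_t Flags = 0;
  std::vector<MapInfo> Members; // e.g. implicit descriptor members of a box
};

// Under unified_shared_memory, if the parent map does not set close, clear
// close on its direct members, regardless of the parent's type.
void enforceCloseConsistency(MapInfo &Parent, bool UnifiedSharedMemory) {
  if (!UnifiedSharedMemory || (Parent.Flags & MapClose))
    return;
  for (MapInfo &Member : Parent.Members)
    Member.Flags &= ~static_cast<std::uint64_t>(MapClose);
}
```

Without unified_shared_memory, the sketch leaves members untouched.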
Co-authored-by: Composer (Cursor) <ai at cursor.com>
AMDGPU: Back-propagate wqm for sources of side-effect instruction (#193395)
A readfirstlane instruction produces an undefined value if exec is zero.
To handle the case where only helper lanes execute the parent block, we
let the readfirstlane execute under wqm. But this is not enough: if the
parent block was also executed by non-helper lanes, we also need to make
sure its sources were computed under wqm. Otherwise, if the instruction
that generates the source of the readfirstlane was executed in exact
mode, the helper lanes would hold garbage data, and the readfirstlane
running under wqm may return that garbage.
To fix this issue, enforce back-propagation of wqm for instructions like
readfirstlane. Previously this was only done if the instruction was
possibly in the middle of a wqm region (determined by checking OutNeeds).
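The back-propagation can be sketched as a worklist walk over a toy def-use graph. `Inst`, `NeedsWQMSources`, and `backPropagateWQM` are hypothetical simplifications. The real pass also tracks exec-mask state and per-block needs, which this model ignores. Every instruction that transitively feeds a readfirstlane-like instruction is marked to run under wqm, so helper lanes hold real values.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified instruction: indices of the instructions that
// produce its sources, plus flags for the wqm analysis.
struct Inst {
  std::vector<int> Srcs;
  bool NeedsWQMSources = false; // e.g. readfirstlane
  bool InWQM = false;           // computed: must execute under wqm
};

// Mark each readfirstlane-like instruction, and everything that
// transitively defines its sources, as executing under wqm.
void backPropagateWQM(std::vector<Inst> &Insts) {
  std::vector<int> Worklist;
  for (int I = 0; I < static_cast<int>(Insts.size()); ++I)
    if (Insts[I].NeedsWQMSources) {
      Insts[I].InWQM = true;
      Worklist.push_back(I);
    }
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    for (int S : Insts[I].Srcs)
      if (!Insts[S].InWQM) {
        Insts[S].InWQM = true;
        Worklist.push_back(S);
      }
  }
}
```

Instructions that do not feed such a consumer keep their original (exact) mode.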
[GVN] Propagate isMemorySSAEnabled() into ValueTable (#193938)
`GVNPass::runImpl()` calls `VN.setMemorySSA(MSSA)` with a single
argument. The second parameter of `ValueTable::setMemorySSA()`,
`MSSAEnabled`, defaults to `false`, so `ValueTable::IsMSSAEnabled`
remains false even when the pass is configured with
`-enable-gvn-memoryssa=1` or `-passes='gvn<memoryssa>'`.
The MemorySSA-backed value-numbering paths in
`ValueTable::lookupOrAddCall()` and `ValueTable::computeLoadStoreVN()`
are gated on `IsMSSAEnabled`, making them unreachable from runImpl() on
main today.
This patch forwards isMemorySSAEnabled() as the second argument to
setMemorySSA(), so selecting the MemorySSA backend actually enables
MemorySSA-aware value numbering.
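The default-argument pitfall can be reproduced in isolation. The classes below are hypothetical stand-ins that only mirror the names in this message, not the real GVN code. A one-argument call compiles cleanly but leaves the flag at its default of `false`. Forwarding the pass configuration fixes it.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the default-argument behaviour is modelled.
struct MemorySSA {};

class ValueTable {
public:
  // MSSAEnabled defaults to false, so a one-argument call disables it.
  void setMemorySSA(MemorySSA *M, bool MSSAEnabled = false) {
    MSSA = M;
    IsMSSAEnabled = MSSAEnabled;
  }
  bool isMSSAEnabled() const { return IsMSSAEnabled; }
  MemorySSA *getMemorySSA() const { return MSSA; }

private:
  MemorySSA *MSSA = nullptr;
  bool IsMSSAEnabled = false;
};

class GVNPassSketch {
public:
  explicit GVNPassSketch(bool Enable) : UseMSSA(Enable) {}
  bool isMemorySSAEnabled() const { return UseMSSA; }

  // Before the patch: the configuration is silently dropped.
  void runImplOld(MemorySSA *M) { VN.setMemorySSA(M); }
  // After the patch: the configuration is forwarded.
  void runImplNew(MemorySSA *M) { VN.setMemorySSA(M, isMemorySSAEnabled()); }

  ValueTable VN;

private:
  bool UseMSSA;
};
```

Even with MemorySSA requested, only `runImplNew` leaves `VN.isMSSAEnabled()` true.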
sysutils/nvtop: New port: GPU & Accelerator process monitoring
NVTOP stands for Neat Videocard TOP, an (h)top-like task monitor for
GPUs and accelerators. It can handle multiple GPUs and print information
about them in a way familiar to htop users.
Currently supported vendors are AMD (Linux amdgpu driver), Apple
(limited M1 & M2 support), Huawei (Ascend), Intel (Linux i915/Xe
drivers), NVIDIA (Linux proprietary drivers), Qualcomm Adreno (Linux MSM
driver), Broadcom VideoCore (Linux v3d driver), Rockchip, MetaX (MXSML
driver), Enflame (Linux EFML driver).
PR: 294825
Sponsored by: UNIS Labs
[X86] Mark machine-block-hash.mir as XFAIL on big-endian hosts (#194279)
The test introduced in #193107 assumes that `stable_hash_combine` yields
the same hashes on every host, but that turns out not to be true on
big-endian hosts.
print/pdf-tools: Add pkgconf build dependency and fix configure env
The recent import of pkgconf into the FreeBSD base system temporarily
caused a print/pdf-tools build failure and exposed two issues with the
port. First, pkgconf should be a direct build dependency. Second,
${CONFIGURE_ENV} should be passed to ./configure so that
PKG_CONFIG_LIBDIR is set correctly regardless of the pkgconf
implementation in the environment.
Sponsored by: The FreeBSD Foundation
amd64: ia32_fetch_syscall_args() does not need to check params != NULL
The value of the params pointer does not matter: copyin() handles any
value. In fact, params can never be NULL.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D56630
amd64 ia32_syscall(): only allow for ILP32 processes
64-bit processes can issue the INT $0x80 instruction and get the syscall
dispatched through ia32_syscall(). This works because syscall argument
fetch and result return are selected from the process sysent.
However, ia32_syscall() skips some checks and some actions that are
considered unnecessary because the caller is supposed to access only the
lower 4G. The INT $0x80 syscall path from a 64-bit process breaks this
assumption.
We never supported such a hack, so disable it. Send the offending thread
SIGBUS, as if #GP had been raised by the hardware because the DPL of IDT
vector 0x80 is not numerically high enough.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D56630