[Instrumentor] Move NumericFlags into InstrumentorRuntimeHelper.h (#204068)
This patch makes the `NumericFlags` enum visible to the end user by
moving it into `InstrumentorRuntimeHelper.h`.
nfs_clvnops.c: Fix the case where va_flags are being cleared
Commits c5d72d2 and 3b6d4c6 broke the case where the
archive/hidden/system attributes are being set false
(UF_ARCHIVE, UF_HIDDEN or UF_SYSTEM bits being cleared)
and the NFS server does not support those attributes.
These patches only checked for support if the
archive/hidden/system attributes were non-zero.
This patch fixes the problem.
PR: 296088
Tested by: Joshua Kinard <freebsd at kumba.dev>
MFC after: 1 week
Fixes: c5d72d29fe0e ("nfsv4: Add support for the NFSv4 hidden and system attributes")
[CHERI] Fix incorrect MAX_E for RV64Y capabilities. (#204487)
Add tests for all capability formats at the upper end of their ranges, which would have caught this oversight.
[DirectX] Add DXILRemoveUnusedResources pass (#200965)
Adds `DXILRemoveUnusedResources` pass that scans the module and removes
any resource that is not used. Specifically, it removes calls to
`dx_resource_handlefrom{implicit}binding` whose return value is either
not used at all or is stored to a global variable that does not have
external linkage and is not used anywhere else in the module.
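The removal criterion can be illustrated with a toy model. This is not the pass itself. `Global`, `HandleCall`, `isRemovable` and `removeUnusedResources` are hypothetical names, and plain use counts stand in for LLVM use lists. The sketch only decides which handle calls are dead. Erasing the now-unused global is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for IR objects (not LLVM classes).
struct Global {
  std::string name;
  bool hasExternalLinkage;
  int otherUses; // uses in the module besides the store of the handle
};

struct HandleCall {
  std::string resource;
  int directUses;         // uses of the returned handle, excluding a store
  const Global *storedTo; // global the handle is stored to, or nullptr
};

// Removable if the handle is never used, or is only stored to a global that
// has no external linkage and is not used anywhere else.
bool isRemovable(const HandleCall &call) {
  if (call.directUses > 0)
    return false;
  if (call.storedTo == nullptr)
    return true;
  return !call.storedTo->hasExternalLinkage && call.storedTo->otherUses == 0;
}

// Drops removable calls and keeps the remaining ones in their original order.
std::vector<HandleCall> removeUnusedResources(std::vector<HandleCall> calls) {
  calls.erase(std::remove_if(calls.begin(), calls.end(), isRemovable),
              calls.end());
  return calls;
}
```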
This pass needs to run before the implicit resource binding assignment pass.
The test `unused-resources-impl-binding.ll` makes sure the implicit
binding assignments are not affected by the unused resources.
Since we have many tests that are initializing resources without
actually using them, an internal option
`-disable-dxil-remove-unused-resource` has been added to `llc` so we can
keep these tests simple without adding extra code to artificially use
each resource.
Depends on #200312
Fixes #192524
[libc][math][c23] Improve rsqrtf16 function for targets without fp32 FPUs. (#160639)
Closes #159378
#### Changes
- This PR adds a math approximation for targets without hardware
support for single-precision floats, i.e. targets that don't have
`LIBC_TARGET_CPU_HAS_FPU_FLOAT`
- This PR also adds a Google Benchmark for rsqrtf16
- Fixed a typo in the `+inf` case, which should return +0 according to
[F.10.4.9](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf)
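The special cases behind the `+inf` fix can be summarized in a short C++ sketch. It is a stand-in built on several assumptions, not the LLVM libc code. `rsqrt_sketch` is a hypothetical name, `float` replaces `_Float16`, and the core value is computed with `1.0f / std::sqrt(x)` rather than the FPU-free approximation this PR adds. The special-case results follow our reading of C23 Annex F (F.10.4.9). Floating-point exception flags are not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch only: float instead of _Float16, and 1/sqrt as a placeholder for the
// real approximation. Exception flags (e.g. divide-by-zero for a zero
// argument) are not raised here.
float rsqrt_sketch(float x) {
  const float inf = std::numeric_limits<float>::infinity();
  if (std::isnan(x))
    return x;                     // NaN propagates
  if (x == 0.0f)
    return std::copysign(inf, x); // rsqrt(+0) = +inf, rsqrt(-0) = -inf
  if (x < 0.0f)
    return std::numeric_limits<float>::quiet_NaN(); // domain error, incl. -inf
  if (std::isinf(x))
    return 0.0f;                  // rsqrt(+inf) = +0 (the case fixed here)
  return 1.0f / std::sqrt(x);     // placeholder for the PR's approximation
}
```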
[lldb] Report a generic wasm32 architecture for Wasm object files (#204496)
ObjectFileWasm hardcoded the architecture of every Wasm module as
"wasm32-unknown-unknown-wasm". A Wasm binary does not actually encode a
vendor or OS; those are properties of the runtime executing it.
When debugging via a runtime whose gdb stub reports a more specific
triple (e.g. WAMR reports "wasm32-wamr-wasi-wasm"), lldb adopts that
triple and clears the module list. The dynamic loader then tries to
reload the main executable, but GetOrCreateModule rejects the on-disk
file because the triples are incompatible. This causes lldb to fall back to
reading from memory.
Fix all this by reporting a bare "wasm32"/"wasm64" architecture instead.
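A deliberately simplified compatibility rule shows why a bare "wasm32" avoids the mismatch. The rule is an assumption of this sketch, not lldb's actual `ArchSpec` matching: a triple component that is absent acts as a wildcard, while an explicit component, including "unknown", must match exactly. `splitTriple` and `isCompatible` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits "arch-vendor-os-env" into its dash-separated components.
std::vector<std::string> splitTriple(const std::string &triple) {
  std::vector<std::string> parts;
  std::stringstream ss(triple);
  std::string part;
  while (std::getline(ss, part, '-'))
    parts.push_back(part);
  return parts;
}

// Simplified rule (assumption): only components present in both triples are
// compared, so missing vendor/OS/environment components match anything.
bool isCompatible(const std::string &a, const std::string &b) {
  const std::vector<std::string> pa = splitTriple(a), pb = splitTriple(b);
  const std::size_t n = std::min(pa.size(), pb.size());
  for (std::size_t i = 0; i < n; ++i)
    if (pa[i] != pb[i])
      return false;
  return true;
}
```

Under this rule, "wasm32" is compatible with "wasm32-wamr-wasi-wasm". The fully spelled-out "wasm32-unknown-unknown-wasm" is not compatible with it, because "unknown" is compared against "wamr".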
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
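At the source level, the added bitcast reinterprets the vector's bits as an integer of the same width. That is the operand type the sized atomic libcalls (such as `__atomic_store_8`) expect. The C++ analogy below models `<4 x half>` as four raw 16-bit lanes. `Half4`, `atomicStoreVector` and `atomicLoadVector` are hypothetical names, and the `ptrtoint` step needed for pointer-element vectors is not modelled.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// <4 x half> modelled as four raw 16-bit lanes (hypothetical alias).
using Half4 = std::array<std::uint16_t, 4>;
static_assert(sizeof(Half4) == sizeof(std::uint64_t), "same bit width");

// The "bitcast": copy the vector's bits into an integer of the same width.
std::uint64_t bitcastToInt(const Half4 &v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// The atomic store itself only ever sees the integer.
void atomicStoreVector(std::atomic<std::uint64_t> &slot, const Half4 &v) {
  slot.store(bitcastToInt(v), std::memory_order_seq_cst);
}

// Mirror of the load side: atomic integer load, then bitcast back.
Half4 atomicLoadVector(const std::atomic<std::uint64_t> &slot) {
  const std::uint64_t bits = slot.load(std::memory_order_seq_cst);
  Half4 v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}
```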
[SelectionDAG] Keep split vector atomic store value in a vector register
When the value of an ATOMIC_STORE has a vector type whose legalization
action is split (e.g. <4 x half>/<4 x bfloat> on X86 without F16C),
SplitVecOp_ATOMIC_STORE bitcast the value straight to a scalar integer
spanning the memory width. For a split vector that bitcast is expanded
element by element, reassembling the value in GPRs (a long pextrw/shl/or
sequence) before the store.
Instead, keep the value in a vector register when a legal vector form
exists: reinterpret it as a same-shaped integer-element vector (an FP
element type may have no legal vector form, e.g. bfloat on SSE2, while
the integer-of-element-size form does), widen that to a legal vector,
and extract the low integer element of the memory width. This issues the
store directly from a vector register (a single MOVQ/MOVD on X86),
matching the widen-path codegen already produced on AVX targets. Falls
back to the scalar bitcast when no suitable legal vector type exists.
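The three steps can be traced on plain arrays: reinterpret as integer elements, widen to a legal vector, then extract the low element of the memory width. This is a toy model, not SelectionDAG code. It assumes `<4 x i16>` is illegal and `<8 x i16>` is legal on the target, and the aliases and helpers are hypothetical. The half-to-i16 reinterpretation is free here because lanes are already stored as raw bit patterns.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Lanes hold raw 16-bit patterns, so <4 x half> and <4 x i16> share bits.
using V4I16 = std::array<std::uint16_t, 4>; // assumed illegal
using V8I16 = std::array<std::uint16_t, 8>; // assumed legal 128-bit vector
using V2I64 = std::array<std::uint64_t, 2>;

// Widen to the legal type; the upper lanes are don't-care (zeroed here).
V8I16 widen(const V4I16 &v) {
  V8I16 w{};
  for (std::size_t i = 0; i < v.size(); ++i)
    w[i] = v[i];
  return w;
}

// View the 128-bit register as <2 x i64> and take element 0, which holds
// exactly the 64 bits of memory the atomic store must write.
std::uint64_t extractLowI64(const V8I16 &w) {
  V2I64 q;
  static_assert(sizeof q == sizeof w, "same register width");
  std::memcpy(&q, &w, sizeof q);
  return q[0];
}
```

The value never passes through per-element shifts and ORs. It stays in the (modelled) vector register until a single 64-bit element is extracted.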
[MLIR][XeGPU] Treat lane_data repacks as compatible layouts (#204016)
A subgroup-level convert_layout that only repacks lane_data while keeping
lane_layout unchanged (e.g. [N, 1] to [1, 1] with order = [1, 0]) is a no-op
after lane distribution: each lane owns the same elements in the same order.
Previously isCompatibleWith compared per-distribution-unit block starts, which
encode the lane_data blocking, so such layouts looked incompatible.
Handle this at the Lane level in isCompatibleWith by expanding the block
starts into per-element coordinates before comparing. The expansion only runs
when lane_data differ; otherwise the cheaper block-start comparison is exact.
The shared logic lives in a compareDistributedCoords helper used by both
LayoutAttr and SliceAttr. The Subgroup level is left for a follow-up (TODO).
Add a lit test covering the fold in sg-to-lane-distribute-unit.mlir.
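A toy 2-D model reproduces the difference between the two comparisons. It illustrates the idea under stated assumptions and is not the XeGPU code. `laneElements`, `blockStarts` and `sameDistribution` are hypothetical helpers, and the tile is assumed to split into whole distribution units. The per-lane element order is also assumed: unit by unit, then row-major inside the lane_data block (order = [1, 0]). With lane_layout = [1, 16] on a 2x16 tile, lane_data [2, 1] and [1, 1] have different block starts, yet every lane owns the same elements in the same order.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Toy 2-D model, not the XeGPU implementation. Shapes are {rows, cols}.
using Shape = std::array<int, 2>;
using Coord = std::pair<int, int>;

// Elements owned by `lane`: unit by unit (row-major), then row-major within
// the lane_data block (assumed order = [1, 0]).
std::vector<Coord> laneElements(Shape tile, Shape laneLayout, Shape laneData,
                                Shape lane) {
  const int unitR = laneLayout[0] * laneData[0];
  const int unitC = laneLayout[1] * laneData[1];
  std::vector<Coord> coords;
  for (int ur = 0; ur < tile[0]; ur += unitR)
    for (int uc = 0; uc < tile[1]; uc += unitC)
      for (int dr = 0; dr < laneData[0]; ++dr)
        for (int dc = 0; dc < laneData[1]; ++dc)
          coords.push_back({ur + lane[0] * laneData[0] + dr,
                            uc + lane[1] * laneData[1] + dc});
  return coords;
}

// First element of each lane_data block only: cheap, and exact when both
// layouts use the same lane_data.
std::vector<Coord> blockStarts(Shape tile, Shape laneLayout, Shape laneData,
                               Shape lane) {
  const int unitR = laneLayout[0] * laneData[0];
  const int unitC = laneLayout[1] * laneData[1];
  std::vector<Coord> starts;
  for (int ur = 0; ur < tile[0]; ur += unitR)
    for (int uc = 0; uc < tile[1]; uc += unitC)
      starts.push_back({ur + lane[0] * laneData[0], uc + lane[1] * laneData[1]});
  return starts;
}

// Same lane_layout, different lane_data: compatible if every lane owns the
// same elements in the same order after distribution.
bool sameDistribution(Shape tile, Shape laneLayout, Shape dataA, Shape dataB) {
  for (int lr = 0; lr < laneLayout[0]; ++lr)
    for (int lc = 0; lc < laneLayout[1]; ++lc) {
      const Shape lane{lr, lc};
      if (laneElements(tile, laneLayout, dataA, lane) !=
          laneElements(tile, laneLayout, dataB, lane))
        return false;
    }
  return true;
}
```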
[lldb] Skip the prologue when a function's entry has no line row (#204480)
Function::GetPrologueByteSize computed the prologue only when a line
table row contained the function's entry address (low_pc). When no row
covers low_pc it returned 0, leaving a name breakpoint sitting on the
function's entry address. For WebAssembly the entry address is the
function's locals-declaration byte rather than an instruction, so the
line table has no row there and the breakpoint is never hit.
When low_pc has no covering row, fall back to the first line row that
begins within the function's range and run the existing prologue logic
on it. For functions whose entry is already covered (all normally
compiled native code) this branch is not taken, so behavior remains
unchanged.
This PR adds a hand-crafted (by Claude) regression test with a function
whose entry address is not covered by a line row.
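The lookup order described above can be sketched against a simplified line table. The sketch makes several assumptions. `Row` and `prologueStartRow` are hypothetical, rows are sorted by address, and each row covers addresses up to the next row. The final row is treated as an end-of-sequence marker. The sketch only chooses the row; the existing prologue logic that would then run on it is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified line-table row (not lldb's LineTable API). A row covers
// [addr, next row's addr); the last row only marks the end of the sequence.
struct Row {
  std::uint64_t addr;
  unsigned line;
};

// Picks the row to run the prologue logic on for a function [lowPC, highPC).
std::optional<std::size_t> prologueStartRow(const std::vector<Row> &rows,
                                            std::uint64_t lowPC,
                                            std::uint64_t highPC) {
  // Existing behaviour: use the row that covers the entry address.
  for (std::size_t i = 0; i + 1 < rows.size(); ++i)
    if (rows[i].addr <= lowPC && lowPC < rows[i + 1].addr)
      return i;
  // Fallback: the first row that begins inside the function's range.
  for (std::size_t i = 0; i + 1 < rows.size(); ++i)
    if (rows[i].addr >= lowPC && rows[i].addr < highPC)
      return i;
  return std::nullopt;
}
```

In the Wasm-like case, the entry byte at `lowPC` holds the locals declaration and has no row, so the fallback selects the first row after it.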