[AMDGPU] Make VALU instructions defining SGPR non-ignorable (#195270)
This fixes an issue where CSE would incorrectly eliminate an instruction
that produces a lane mask. For example, the second V_CMP_GT in the code
below cannot be replaced with %3, even though both have the same operands,
because doing so would cause an incorrect exec mask to be calculated in %6:
```
bb.1:
%3:sreg_64 = V_CMP_GT_U32_e64 %0:vgpr_32, %1:sreg_32, implicit $exec
%4:sreg_64 = SI_IF_BREAK killed %3:sreg_64, %2:sreg_64, implicit-def dead $scc
SI_LOOP %4:sreg_64, %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.2
bb.2:
SI_END_CF %4:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
%5:sreg_64 = V_CMP_GT_U32_e64 %0:vgpr_32, %1:sreg_32, implicit $exec
%6:sreg_64 = S_AND_B64 %5:sreg_64, $exec, implicit-def $scc
```
[3 lines not shown]
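The reason the two compares in the MIR above are not interchangeable can be illustrated with a deliberately simplified model. This is only a sketch: it uses a 4-lane wave instead of 64, the helper name `cmpGtLaneMask` is hypothetical, and it assumes that inactive lanes contribute a zero bit to the result.

```cpp
#include <array>
#include <cstdint>

// Toy model of a 4-lane wave: bit i of `exec` says whether lane i is active.
// Assumption: an inactive lane contributes a 0 bit to the compare result.
constexpr unsigned kLanes = 4;

// Hypothetical helper: per-lane "v[lane] > s", restricted to active lanes.
std::uint64_t cmpGtLaneMask(const std::array<std::uint32_t, kLanes> &v,
                            std::uint32_t s, std::uint64_t exec) {
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < kLanes; ++lane) {
    const bool active = ((exec >> lane) & 1) != 0;
    if (active && v[lane] > s)
      mask |= std::uint64_t{1} << lane;
  }
  return mask;
}
```

With the same operands (`v = {5, 1, 7, 9}`, `s = 3`), an exec mask of `0b0011` yields `0b0001`, while an exec mask of `0b1111` yields `0b1101`. Identical operands are therefore not enough to treat the two results as the same value once exec has changed in between.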
Expand sharing protocol tests for NFS
This commit converts some NFS tests to use the lower-level
pynfs library so that they test server behavior explicitly, and
expands test coverage for readdir operations.
Originally, the tests were executed via the Linux NFS client, which
severely limited how finely we could exercise the server.
[clang] Deduce _BitInt(N) template parameter as size_t (#195534)
Update template argument deduction to deduce the `N` in `_BitInt(N)` as
`size_t` rather than `int`. This increases consistency with deduction of
array sizes, and matches the behavior proposed in P3666.
Fixes #195033
[llvm-otool] Add -m flag and archive(member) input syntax (#194234)
Support classic otool's archive(member) input syntax where a filename
like 'foo.a(bar.o)' extracts and processes only the named member from
the archive. The -m flag disables this syntax parsing, treating the
entire string as a literal filename.
Fixes #126272
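The input syntax described above could be parsed roughly as follows. This is a minimal sketch, not llvm-otool's actual implementation: the `ArchiveMember` struct, the `splitArchiveMember` function, and the choice to split at the last `(` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: archive path plus the member named in parentheses.
struct ArchiveMember {
  std::string archive;
  std::string member;
};

// Hypothetical helper. Returns the split form for inputs like "foo.a(bar.o)",
// or std::nullopt when the input should be treated as a plain filename.
// `literal` models the effect of -m: no syntax parsing at all.
std::optional<ArchiveMember> splitArchiveMember(std::string_view input,
                                                bool literal) {
  if (literal || input.empty() || input.back() != ')')
    return std::nullopt;
  // Assumption: the member starts after the last '(' in the string.
  const std::size_t open = input.rfind('(');
  if (open == std::string_view::npos || open == 0)
    return std::nullopt;
  const std::string_view member =
      input.substr(open + 1, input.size() - open - 2);
  if (member.empty())
    return std::nullopt;
  return ArchiveMember{std::string(input.substr(0, open)),
                       std::string(member)};
}
```

For example, `splitArchiveMember("foo.a(bar.o)", false)` yields archive `foo.a` and member `bar.o`, while passing `true` for `literal` leaves the whole string as a filename.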
AMDGPU/GlobalISel: Switch to extended LLTs
The switch is required to be able to translate bfloat.
After the switch, most of the codegen patterns now require an explicit
type on the register to match, instead of LLT::scalar.
So we can still use LLT::scalar for type checks, but new instructions
created during lowerings/combines need to use the proper extended LLT.
The inst-select test sources are fully switched to i32/f32 so patterns can
match; those for legalizer and regbanklegalize are left as is (they should
probably be switched as well).
New functionality worth noting is f16 and bitcast lowering to i32
f16 = g_bitcast i16
->
i32 = g_anyext i16
f16 = g_trunc i32
f16 = trunc i32 is legal
[clang][OpenMP][SPIRV] Use the right calling convention for reduction helpers (#195911)
This is a follow-up to #194879 to ensure that the helpers for reduction use the right calling convention (in particular that they are marked as spir_func for SPIRV).
Assisted by Claude Sonnet 4.5.
Syscall migrations of stdio and unistd (#196403)
Added ErrorOr-returning syscall wrappers for access, chdir, dup, dup2,
dup3, faccessat, fchdir, fsync, lseek, readlink, readlinkat, rename,
rmdir, and unlinkat.
Migrated the Linux entrypoint implementations in src/unistd/linux/ and
src/stdio/linux/rename.cpp to use them.
Replaced internal::lseekimpl() with linux_syscalls::lseek() in the
File infrastructure and deleted the now-unused lseekImpl.h.
Assisted-by: Automated tooling, human reviewed.
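The general shape of an ErrorOr-returning wrapper can be sketched as below. This is not the LLVM libc code: the `ErrorOr` class here is a hypothetical stand-in, `checkedAccess` is an invented name, and the sketch calls the host's POSIX `access()` instead of issuing a raw syscall as a libc would.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical stand-in for an ErrorOr-style result: either a value or an
// errno code, never both.
template <typename T>
class ErrorOr {
public:
  static ErrorOr ok(T value) { return ErrorOr(value, 0); }
  static ErrorOr fail(int err) { return ErrorOr(T{}, err); }
  bool has_value() const { return err_ == 0; }
  T value() const { return value_; }
  int error() const { return err_; }

private:
  ErrorOr(T value, int err) : value_(value), err_(err) {}
  T value_;
  int err_;
};

// Hypothetical wrapper: reports failure through the return value so callers
// do not have to read errno themselves.
ErrorOr<int> checkedAccess(const char *path, int mode) {
  if (::access(path, mode) == -1)
    return ErrorOr<int>::fail(errno);
  return ErrorOr<int>::ok(0);
}
```

A caller would check `has_value()` first and read `error()` only on failure, which keeps the errno handling in one place.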
Exclude unsupported compiler-rt tests on z/OS. (#194437)
This PR excludes the unsupported parts (NAN, -NAN, INFINITY, -INFINITY) of
the following two compiler-rt tests on z/OS:
```
compiler-rt/test/builtins/Unit/compiler_rt_scalbnl_test.c
compiler-rt/test/builtins/Unit/compiler_rt_logbl_test.c
```
[BOLT][AArch64] Add support for LDR relaxation on LDRSWl (#196051)
BOLT currently supports LDR relaxation for LDRXl and LDRWl. Add support
for LDR relaxation on LDRSWl.
[OpenMP] Fix set-but-unused-var warning in omptest (#196069)
This fixes a warning in omptest about a set but unused variable. The var
was intended to control whether colored logging output is created.
That logic has been moved into the `Logger` itself.
[libc][math] Fix -Wshadow warnings in FMod.h (#196346)
The using declaration inside the lambda duplicates the identical one four
lines above it.
No behavior change.
AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16, add their VOPDName and mark
them with usesCustomInserter, which is used to add pre-RA register
allocation hints that prefer assigning dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking whether it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
The original patch had a bug: it did not check whether physical src
registers match the register class of the corresponding operand in fullVOPD
instructions. That check is now done via isValidVOPDSrc.
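The VOP2-like conditions listed above amount to a simple predicate. The sketch below is illustrative only: the `Dot2Operands` struct and `looksVOP2Like` function are hypothetical, are not the code in canMapVOP3PToVOPD, and omit the register-class check done by isValidVOPDSrc.

```cpp
// Hypothetical, flattened view of the operands relevant to the check.
struct Dot2Operands {
  unsigned dstReg;          // physical register assigned to dst
  unsigned src2Reg;         // physical register assigned to src2
  bool hasSourceModifiers;  // any source modifier set
  bool hasClamp;            // clamp bit set
  bool src1IsRegister;      // src1 is a register, not a literal
};

// True when all four conditions named in the commit message hold:
// dst == src2, no source modifiers, no clamp, and src1 is a register.
constexpr bool looksVOP2Like(const Dot2Operands &op) {
  return op.dstReg == op.src2Reg && !op.hasSourceModifiers && !op.hasClamp &&
         op.src1IsRegister;
}
```

In this model, a literal in src1 fails the check until it is commuted into src0, which matches the stated reason for marking both instructions as commutable.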