[X86] Add target verifier with subtarget-dependent checks
Add an X86 TargetVerify, registered by triple, so the target-independent
TargetVerifierPass dispatches to it for X86 modules.
These checks depend on the features in a function's target-cpu /
target-features attributes, which the generic triple-only IR verifier
cannot see. The MCSubtargetInfo is built from those attributes, so no
TargetMachine is needed and the pass runs from generic pipelines:
- x86 instruction-set intrinsics (llvm.x86.avx/avx2/avx512.*) require
the matching AVX/AVX2/AVX-512 feature.
- 128/256-bit AVX-512 intrinsics additionally require AVX512VL.
- The x86_amx type requires AMX-TILE.
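As a rough illustration of the three rules listed above, the following Python sketch models them with plain sets of feature names. It is a toy model, not the verifier's code: the function name `missing_features`, the lowercase feature spellings (including `avx512f` as "the AVX-512 feature"), the explicit `vector_bits` argument, and the `*.example` intrinsic names are all assumptions made for this example.

```python
def missing_features(intrinsic, vector_bits, uses_amx_type, features):
    """Return the features the listed rules would require but that are absent.

    All names here are illustrative; the real verifier works on IR and an
    MCSubtargetInfo, not on strings and sets.
    """
    required = set()
    # Rule 1: llvm.x86.avx/avx2/avx512.* need the matching feature.
    if intrinsic.startswith("llvm.x86.avx512."):
        required.add("avx512f")  # assumed spelling of the AVX-512 base feature
        # Rule 2: 128/256-bit AVX-512 intrinsics additionally need AVX512VL.
        if vector_bits in (128, 256):
            required.add("avx512vl")
    elif intrinsic.startswith("llvm.x86.avx2."):
        required.add("avx2")
    elif intrinsic.startswith("llvm.x86.avx."):
        required.add("avx")
    # Rule 3: the x86_amx type needs AMX-TILE.
    if uses_amx_type:
        required.add("amx-tile")
    return required - set(features)


# A 256-bit AVX-512 intrinsic in a function that only enables the base feature.
print(missing_features("llvm.x86.avx512.example", 256, False, {"avx512f"}))
```

An empty result means the function's attributes satisfy every rule in this model.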
[DAGCombiner] Fold multiplication through vp_merge into `partial_reduce_*mla` (#205890)
DAGCombiner already performs this fold:
```
partial_reduce_*mla(acc, sel(p, mul(*ext(a), *ext(b)), splat(0)), splat(1))
-> partial_reduce_*mla(acc, sel(p, a, splat(0)), b)
```
vp_merge (but not vp_select) should work as a drop-in replacement for the
select in the pattern above. This patch adds that support.
The test checks that RISC-V's Zvdot4a8i instruction is generated, since
it depends on this pattern to fold away not just the multiplication but
also the sign / zero extensions.
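The fold can be sanity-checked numerically with a small Python model. This is only a sketch under stated assumptions: `partial_reduce_mla` uses one arbitrary lane mapping (the real node leaves the mapping unspecified), Python integers stand in for the extended element type so the extensions become no-ops, and `vp_merge` follows the LangRef description of `llvm.vp.merge`, where lanes at or beyond the pivot take the on-false operand.

```python
def partial_reduce_mla(acc, a, b):
    """Toy partial reduction: accumulate a[i] * b[i] into len(acc) lanes."""
    out = list(acc)
    for i, (x, y) in enumerate(zip(a, b)):
        out[i % len(out)] += x * y
    return out


def vp_merge(mask, on_true, on_false, evl):
    """Lanes below evl follow the mask; lanes at or beyond evl take on_false."""
    return [t if (m and i < evl) else f
            for i, (m, t, f) in enumerate(zip(mask, on_true, on_false))]


def before_fold(acc, p, a, b, evl):
    n = len(a)
    prod = [x * y for x, y in zip(a, b)]  # mul(*ext(a), *ext(b))
    return partial_reduce_mla(acc, vp_merge(p, prod, [0] * n, evl), [1] * n)


def after_fold(acc, p, a, b, evl):
    n = len(a)
    return partial_reduce_mla(acc, vp_merge(p, a, [0] * n, evl), b)


acc, p = [0, 0], [1, 0, 1, 1]
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(before_fold(acc, p, a, b, 3), after_fold(acc, p, a, b, 3))
```

Both forms agree because every lane that is not selected contributes a zero product either way.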
[Hexagon] Add ShadowCallStack support (#200508)
Implement the software ShadowCallStack for Hexagon.
On Hexagon, r19 is used as the shadow stack pointer (reserved via
-ffixed-r19). On function entry the LR (r31) is saved to the shadow
stack and the pointer is advanced; on exit the LR is restored from the
shadow stack before returning.
Prologue sequence:
r19 = add(r19, #4)
memw(r19+#-4) = r31
Epilogue sequence (between deallocframe/jumpr r31):
r31 = memw(r19+#-4)
r19 = add(r19, #-4)
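The two sequences above can be traced with a small Python simulation. This is a sketch for illustration only: the `Machine` class, the dictionary used as memory, and the addresses are assumptions of the example, not part of the patch.

```python
class Machine:
    """Minimal register/memory model for the shadow call stack sequences."""

    def __init__(self, scs_base, return_address):
        self.r19 = scs_base        # shadow stack pointer
        self.r31 = return_address  # link register (LR)
        self.mem = {}              # word-addressed shadow stack memory

    def prologue(self):
        self.r19 += 4                       # r19 = add(r19, #4)
        self.mem[self.r19 - 4] = self.r31   # memw(r19+#-4) = r31

    def epilogue(self):
        self.r31 = self.mem[self.r19 - 4]   # r31 = memw(r19+#-4)
        self.r19 -= 4                       # r19 = add(r19, #-4)


m = Machine(scs_base=0x1000, return_address=0xAAAA)
m.prologue()
m.r31 = 0xDEAD  # simulate LR being clobbered inside the function body
m.epilogue()
print(hex(m.r31), hex(m.r19))
```

After the epilogue the link register holds the value saved on entry and r19 is back at its starting address.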
[5 lines not shown]
Make getaddrinfo(3) check hnok_lenient() earlier.
r1.60 added special handling for localhost names, but did so before the
hnok_lenient() check. Move the check earlier so that this validation
applies to localhost names too.
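The ordering issue can be sketched in Python. Everything here is hypothetical and only illustrates "validate first, then special-case": `name_ok` is a stand-in validator and does not reproduce hnok_lenient(), and the localhost matching rule and returned addresses are assumptions of the example rather than the libc behaviour.

```python
import re

# Stand-in validator: NOT the rules hnok_lenient() implements.
_NAME_RE = re.compile(r"^[A-Za-z0-9_.-]+$")


def name_ok(name):
    return bool(_NAME_RE.match(name))


def is_localhost_name(name):
    # Assumed rule for this sketch: "localhost" or any "*.localhost" name.
    lowered = name.lower().rstrip(".")
    return lowered == "localhost" or lowered.endswith(".localhost")


def resolve(name, lookup):
    """Validate the name before any special handling, then resolve it."""
    if not name_ok(name):
        raise ValueError("invalid hostname")
    if is_localhost_name(name):
        return ["127.0.0.1", "::1"]
    return lookup(name)


print(resolve("foo.localhost", lookup=lambda n: []))
```

With the check placed first, a malformed name ending in ".localhost" is rejected instead of taking the shortcut.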
ok florian
[AMDGPU] Fix overlapping insert crash during rewrite-agpr-copy-mfma
Fixes https://github.com/llvm/llvm-project/issues/204224
Guard against a possibly wrong interference result for a discontiguous
stack slot interval by using the entire range.
A spilled stack slot can have a discontiguous live interval, e.g. a single
value live across several disjoint segments:
[a, b) [c, d) ........gap........ [e, f)
with gaps where the slot is dead. The interference check previously only
considered the covered segments, so it could pick a PhysReg that is free
within them but busy inside a gap. Unspilling replaces the slot with a vreg
whose recomputed interval is continuous over [a, f) (it fills the gaps),
so assigning that PhysReg could overlap the value live in the gap and trip
the "Overlapping insert" assertion in LiveRegMatrix::assign. Checking
interference over the whole [a, f) hull avoids this.
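The difference between checking the covered segments and checking the hull can be shown with half-open intervals in Python. This is a simplified sketch: the helper names and the slot indexes are made up for the example, and the real check operates on LiveInterval and LiveRegMatrix data rather than tuples.

```python
def overlaps(seg_a, seg_b):
    """True if two half-open segments [start, end) intersect."""
    return seg_a[0] < seg_b[1] and seg_b[0] < seg_a[1]


def interferes(segments, busy):
    """True if any segment overlaps any range where the PhysReg is busy."""
    return any(overlaps(s, b) for s in segments for b in busy)


def hull(segments):
    """Single segment spanning from the first start to the last end."""
    return [(min(s[0] for s in segments), max(s[1] for s in segments))]


slot = [(0, 2), (4, 6), (20, 24)]  # [a, b) [c, d) ...gap... [e, f)
physreg_busy = [(10, 12)]          # another value lives inside the gap

print(interferes(slot, physreg_busy))        # segment-only check
print(interferes(hull(slot), physreg_busy))  # hull check
```

The segment-only check reports the register as free, while the hull check sees the conflict that the continuous unspilled interval would run into.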
[2 lines not shown]
[llubi] Add support for byval pointer arguments (#201852)
This patch adds support for the byval attribute. The hidden copy is
implemented as memcpy with the allocation size of the specified type.
See https://github.com/llvm/llvm-project/pull/205576 for more
information.
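The observable effect of the hidden copy can be sketched in Python, with a bytearray slice standing in for the memcpy. The function name `call_with_byval` and the byte-buffer model of memory are assumptions of this example, not llubi's interfaces.

```python
def call_with_byval(callee, caller_memory, offset, alloc_size):
    """Pass a private copy of alloc_size bytes to the callee."""
    # Stand-in for the hidden memcpy of the byval type's allocation size.
    hidden_copy = bytearray(caller_memory[offset:offset + alloc_size])
    return callee(hidden_copy)


def callee(buf):
    buf[0] = ord("Z")  # writes land in the copy only
    return bytes(buf)


caller = bytearray(b"abcdef")
print(call_with_byval(callee, caller, 2, 3), bytes(caller))
```

The callee sees and may modify its own copy, while the caller's bytes are unchanged after the call.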
Revert "[Instrumentor] Add runtime examples: [1/N] A flop counter" (#205960)
This reverts commit 61cbfabb7ade682a64f516c871b2bacb1e3e324a.
Fails compiler-rt standalone builds, although it works fine locally :(
sched_ule: Fix off by one in preempt_thresh definition
Since 'preempt_thresh' is set to PRI_MIN_KERN by default, and comparison
of the considered thread's priority with that threshold is done with
'<=', PRI_MIN_KERN threads actually can preempt other threads, contrary
to other non-interrupt kernel ones (between PRI_MIN_KERN + 1 and
PRI_MAX_KERN).
So, replace the comparison operator '<=' with '<'. The alternative would
be to change the default value, but changing the comparison instead has
the benefit of being consistent with the 0 setting (which forbids
preemption entirely), since allowing only threads with priority 0 to
preempt becomes possible.
Consequently, we also change the default value for the FULL_PREEMPTION
option by adding 1 to PRI_MAX_IDLE (in practice, that does not make any
difference in the current setting, since no preemption will happen if
the new priority value is not strictly lower than the current one, and
PRI_MAX_IDLE is PRI_MAX, the highest possible priority).
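The effect of the operator change can be seen in a small Python model. It is a sketch, not the scheduler's code: `should_preempt` is a hypothetical helper, and the value given to `PRI_MIN_KERN` is a placeholder rather than the constant from the kernel headers. Lower numbers mean higher priority.

```python
PRI_MIN_KERN = 48  # placeholder value for this sketch only


def should_preempt(pri, cpri, preempt_thresh, strict):
    """Model of the threshold test; strict=True uses '<', else '<='."""
    if pri >= cpri:
        # No preemption unless the new priority is strictly better.
        return False
    if strict:
        return pri < preempt_thresh
    return pri <= preempt_thresh


cpri = 200  # priority of the currently running thread
print(should_preempt(PRI_MIN_KERN, cpri, PRI_MIN_KERN, strict=False))  # old
print(should_preempt(PRI_MIN_KERN, cpri, PRI_MIN_KERN, strict=True))   # new
```

With the strict comparison, a threshold of 0 admits no thread at all and a threshold of 1 admits only priority 0, which matches the consistency argument made above.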
[8 lines not shown]
nfs_commonkrpc.c: Improve handling of NFSv4.1/4.2 recovery
Commit 4d80d4913e79 fixed a long-standing bug in the recovery
code. However, glebius@ reported seeing multiple
recovery cycles with this patch during an NFSv4.1/4.2
server reboot.
This commit should minimize the risk of multiple
recovery cycles.
PR: 294925
(cherry picked from commit ea4886f2829bf33866c8c0c60b14a9641fc54b40)
[X86][APX] Implement push+push2+push pre-alignment strategy for PP2 (#205031)
Replace the dummy "push %rax" stack-alignment padding for APX push2/pop2
(PP2) with a push+push2+push strategy: when an even number of callee-saved
GPRs is involved, a single CSR push provides the 16-byte alignment instead
of a throwaway push %rax, and the remaining registers use push2/pop2. The
padForPush2Pop2 flag and its associated dummy push, SUB/LEA padding, and
SEH_StackAlloc emission in spill/restoreCalleeSavedRegisters are removed.
BuildStackAdjustment now uses NF (no-flags) variants of ADD/SUB, but
only as a smaller replacement for LEA, i.e. only when EFLAGS must be preserved
across the adjustment. When EFLAGS is dead the plain SUB/ADD is kept, which is
shorter than the EVEX-encoded NF form. The NF opcodes are 64-bit
(SUB64ri32_NF/ADD64ri32_NF), so they are not used for the x32 ABI, and
they are recognized in mergeSPUpdates and the epilogue backward scan.
Update LIT tests accordingly.
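The opcode choice described for BuildStackAdjustment can be summarized as a small decision function. This Python sketch only restates the prose: the function and parameter names are hypothetical, the plain and LEA results are labels rather than exact opcode names, and falling back to LEA for x32 or when NF is unavailable is an assumption of the example.

```python
def stack_adjust_opcode(is_sub, eflags_live, has_nf, is_x32):
    """Pick an instruction form for a stack pointer adjustment."""
    plain = "SUB" if is_sub else "ADD"
    if not eflags_live:
        # EFLAGS is dead: the plain form is shorter than the EVEX NF form.
        return plain
    if has_nf and not is_x32:
        # EFLAGS must survive: the 64-bit NF form replaces LEA.
        return "SUB64ri32_NF" if is_sub else "ADD64ri32_NF"
    # Assumed fallback when the NF form cannot be used.
    return "LEA"


print(stack_adjust_opcode(is_sub=True, eflags_live=True, has_nf=True, is_x32=False))
```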
Assisted-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
nfs: Fix argument typo to avoid a crash
A typo resulted in the wrong argument for a bytewise
comparison that could result in a crash if
the incorrect argument was not a valid pointer.
This patch fixes the argument.
While investigating this, I noticed that the
correct argument was not being filled in as
required, so this patch fixes that, as well.
Somehow, recovery from an NFSv4.1/4.2 server
crash worked during testing, so this was not
detected. The bug/patch only affects NFS
client mounts using NFSv4.1/4.2.
PR: 294925
(cherry picked from commit 4d80d4913e79c8b5918b1f04c1c7b38e6c76b9b4)
[VectorCombine] Fold zero tests of or/umax reductions (#205622)
Recognize equality and inequality tests against zero on vector.reduce.or
and vector.reduce.umax. When profitable, replace the scalar reduction and
compare with a lane-wise comparison followed by an i1 reduce.or or
reduce.and.
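The equivalence behind this rewrite can be checked exhaustively for small vectors in Python. This is a sketch that complements the Alive2 proof rather than replacing it: the helper names are made up, lanes are modelled as small unsigned integers, and `all` / `any` stand in for the i1 reduce.and / reduce.or.

```python
from functools import reduce
from itertools import product
import operator


def reduce_or(v):
    return reduce(operator.or_, v)


def reduce_umax(v):
    return max(v)  # lanes are non-negative, so max is an unsigned max


def fold_holds(lanes=3, bits=2):
    """Check both zero tests for every vector of the given shape."""
    for v in product(range(1 << bits), repeat=lanes):
        lane_is_zero = [x == 0 for x in v]
        lane_non_zero = [x != 0 for x in v]
        for red in (reduce_or, reduce_umax):
            # red(v) == 0  <=>  i1 reduce.and over (lane == 0)
            if (red(v) == 0) != all(lane_is_zero):
                return False
            # red(v) != 0  <=>  i1 reduce.or over (lane != 0)
            if (red(v) != 0) != any(lane_non_zero):
                return False
    return True


print(fold_holds())
```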
Run the existing zero-preserving reduction fold first to retain its more
specific canonicalization opportunities.
Proof: https://alive2.llvm.org/ce/z/pyoTwP
Fixed https://github.com/llvm/llvm-project/issues/205028
[Instrumentor] Add runtime examples: [1/N] A flop counter (#205698)
This adds an instrumentor-tools folder to compiler-rt to showcase use
cases of the instrumentor. The initial example is a program that, via
instrumentation, counts the number of flops performed. Call and
intrinsic support will follow after #198042.
This is the second try with more CMake magic after
https://github.com/llvm/llvm-project/pull/205221 failed on some
platforms.
Partially developed by Claude (AI), tested and verified by me.
build.7: explain how to build KBI-compatible standalone module
Reviewed by: imp, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D57859
vmd(8): prevent OOB reads in 32 and 64-bit ELF loaders.
Malformed ELF files could cause reads past the section headers.
For ELF64 files, malformed section metadata could cause out-of-bounds
reads of heap-allocated buffers.
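A generic form of the kind of bounds check that prevents such reads is sketched below in Python. It is not the vmd fix: the function name is hypothetical, and the parameters simply mirror the ELF header fields e_shoff, e_shnum and e_shentsize.

```python
def section_headers_in_bounds(file_size, shoff, shnum, shentsize):
    """True if the whole section header table lies inside the file."""
    if shoff < 0 or shnum < 0 or shentsize < 0:
        return False
    if shoff > file_size:
        return False
    # Compare against the remaining space so the check cannot be fooled
    # by an offset that is valid on its own but leaves too little room.
    return shnum * shentsize <= file_size - shoff


print(section_headers_in_bounds(1000, 900, 2, 40))
print(section_headers_in_bounds(1000, 900, 3, 40))
```

In C the multiplication would additionally need an overflow check, which Python's unbounded integers make unnecessary here.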
Reported by Frank Denis.
Discussed with and "go for it" from mlarkin@