[Hexagon] Add ShadowCallStack support (#200508)
Implement the software ShadowCallStack for Hexagon.
On Hexagon, r19 is used as the shadow stack pointer (reserved via
-ffixed-r19). On function entry the LR (r31) is saved to the shadow
stack and the pointer is advanced; on exit the LR is restored from the
shadow stack before returning.
Prologue sequence:
r19 = add(r19, #4)
memw(r19+#-4) = r31
Epilogue sequence (emitted between deallocframe and jumpr r31):
r31 = memw(r19+#-4)
r19 = add(r19, #-4)
[5 lines not shown]
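A toy model of the two sequences above may help show why the return address survives a clobbered LR. This is only a sketch: the `ToyMachine` class, the dictionary-backed memory, and the `0x1000` base address are illustrative assumptions, not part of the Hexagon backend.

```python
# Toy model of the prologue/epilogue sequences quoted in the commit message.
# ToyMachine and its fields are hypothetical names used for illustration only.
WORD = 4  # bytes per saved return address


class ToyMachine:
    def __init__(self, shadow_base=0x1000):
        self.r19 = shadow_base  # shadow stack pointer
        self.r31 = 0            # link register (LR)
        self.mem = {}           # sparse memory: address -> 32-bit word

    def prologue(self):
        # r19 = add(r19, #4)
        self.r19 += WORD
        # memw(r19+#-4) = r31
        self.mem[self.r19 - WORD] = self.r31

    def epilogue(self):
        # r31 = memw(r19+#-4)
        self.r31 = self.mem[self.r19 - WORD]
        # r19 = add(r19, #-4)
        self.r19 -= WORD


def demo():
    m = ToyMachine()
    m.r31 = 0xAAAA   # return address of the outer function
    m.prologue()
    m.r31 = 0xBBBB   # a nested call installs a new return address
    m.prologue()
    m.r31 = 0xDEAD   # pretend LR was corrupted before the return
    m.epilogue()     # inner return: LR is reloaded from the shadow stack
    inner = m.r31
    m.epilogue()     # outer return
    outer = m.r31
    return inner, outer, m.r19
```

In this model `demo()` recovers both return addresses in last-in, first-out order and leaves the shadow stack pointer back at its starting value.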
[AMDGPU] Fix overlapping insert crash during rewrite-agpr-copy-mfma
Fixes https://github.com/llvm/llvm-project/issues/204224
Guard against a possibly wrong interference result for a discontiguous
stack slot interval by using the entire range.
A spilled stack slot can have a discontiguous live interval, e.g. a single
value live across several disjoint segments:
[a, b) [c, d) ........gap........ [e, f)
with gaps where the slot is dead. The interference check previously only
considered the covered segments, so it could pick a PhysReg that is free
within them but busy inside a gap. Unspilling replaces the slot with a vreg
whose recomputed interval is continuous over [a, f) (it fills the gaps),
so assigning that PhysReg could overlap the value live in the gap and trip
the "Overlapping insert" assertion in LiveRegMatrix::assign. Checking
interference over the whole [a, f) hull avoids this.
[2 lines not shown]
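The difference between checking the covered segments and checking the whole hull can be sketched with plain half-open intervals. The function names and the sample numbers below are assumptions made for illustration; the real check operates on LLVM live intervals and LiveRegMatrix, not on Python tuples.

```python
# Illustrative sketch only: intervals are half-open (start, end) tuples.

def overlaps(a, b):
    """True when half-open intervals a and b share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


def interferes_per_segment(slot_segments, physreg_busy):
    # Old behaviour as described: only the covered segments are considered.
    return any(overlaps(s, u) for s in slot_segments for u in physreg_busy)


def interferes_over_hull(slot_segments, physreg_busy):
    # Described fix: test the single range from the first start to the last end.
    hull = (min(s[0] for s in slot_segments), max(s[1] for s in slot_segments))
    return any(overlaps(hull, u) for u in physreg_busy)


# A slot live in three disjoint segments, with a gap between 10 and 30.
slot = [(0, 4), (6, 10), (30, 34)]
# A physical register that is busy only inside that gap.
busy = [(15, 20)]
```

With these sample values the per-segment check reports no interference while the hull check does, which matches the situation the commit message describes: the register looks free for the slot but is not free for the continuous interval of the replacement vreg.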
[llubi] Add support for byval pointer arguments (#201852)
This patch adds support for the byval attribute. The hidden copy is
implemented as memcpy with the allocation size of the specified type.
See https://github.com/llvm/llvm-project/pull/205576 for more
information.
Revert "[Instrumentor] Add runtime examples: [1/N] A flop counter" (#205960)
This reverts commit 61cbfabb7ade682a64f516c871b2bacb1e3e324a.
This fails compiler-rt standalone builds, though it works fine locally :(
sched_ule: Fix off by one in preempt_thresh definition
Since 'preempt_thresh' is set to PRI_MIN_KERN by default, and comparison
of the considered thread's priority with that threshold is done with
'<=', PRI_MIN_KERN threads actually can preempt other threads, contrary
to other non-interrupt kernel ones (between PRI_MIN_KERN + 1 and
PRI_MAX_KERN).
So, replace the comparison operator '<=' by '<'. The alternative would
be to change the default value, but changing the comparison instead has
the benefit of being consistent with the 0 setting (which forbids
preemption entirely), since it becomes possible to allow only threads
with priority 0 to preempt.
Consequently, we also change the default value for the FULL_PREEMPTION
option by adding 1 to PRI_MAX_IDLE (in practice, that does not make any
difference in the current setting, since no preemption will happen if
the new priority value is not strictly lower than the current one, and
PRI_MAX_IDLE is PRI_MAX, the highest possible priority).
[8 lines not shown]
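The off-by-one can be seen by isolating the threshold comparison. The sketch below models only that one comparison, not the rest of the scheduler's preemption decision, and the numeric value given to `PRI_MIN_KERN` is a placeholder chosen for illustration (lower numbers mean higher priority).

```python
# Placeholder value for illustration; the real constant comes from the kernel headers.
PRI_MIN_KERN = 48


def passes_threshold_old(pri, preempt_thresh):
    # Previous comparison: a thread exactly at the threshold may preempt.
    return pri <= preempt_thresh


def passes_threshold_new(pri, preempt_thresh):
    # New comparison: only priorities strictly better than the threshold pass.
    return pri < preempt_thresh
```

With the default threshold of `PRI_MIN_KERN`, a `PRI_MIN_KERN` thread passes the old test but not the new one. Under the new test a threshold of 0 admits nothing, and a threshold of 1 admits only priority 0, which is the consistency argument made above.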
nfs_commonkrpc.c: Improve handling of NFSv4.1/4.2 recovery
Commit 4d80d4913e79 fixed a long-standing bug in the recovery
code. However, glebius@ reported seeing multiple
recovery cycles with this patch during an NFSv4.1/4.2
server reboot.
This commit should minimize the risk of multiple
recovery cycles.
PR: 294925
(cherry picked from commit ea4886f2829bf33866c8c0c60b14a9641fc54b40)
[X86][APX] Implement push+push2+push pre-alignment strategy for PP2 (#205031)
Replace the dummy "push %rax" stack-alignment padding for APX push2/pop2
(PP2) with a push+push2+push strategy: when an even number of callee-saved
GPRs is involved, a single CSR push provides the 16-byte alignment instead
of a throwaway push %rax, and the remaining registers use push2/pop2. The
padForPush2Pop2 flag and its associated dummy push, SUB/LEA padding, and
SEH_StackAlloc emission in spill/restoreCalleeSavedRegisters are removed.
BuildStackAdjustment now uses NF (no-flags) variants of ADD/SUB, but
only as a smaller replacement for LEA, i.e. only when EFLAGS must be preserved
across the adjustment. When EFLAGS is dead the plain SUB/ADD is kept, which is
shorter than the EVEX-encoded NF form. The NF opcodes are 64-bit
(SUB64ri32_NF/ADD64ri32_NF), so they are not used for the x32 ABI, and
they are recognized in mergeSPUpdates and the epilogue backward scan.
Update LIT tests accordingly.
Assisted-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
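The selection rule described for BuildStackAdjustment can be summarized as a small decision function. This is a reading of the commit message, not the LLVM code: the function name, the `has_nf` parameter (standing in for whether the NF variants are available on the target), and the returned labels are all assumptions.

```python
def pick_stack_adjust(eflags_live, is_x32, has_nf):
    """Return a label for the instruction form used to adjust the stack pointer.

    Hypothetical helper: models only the three conditions named in the text.
    """
    if not eflags_live:
        # EFLAGS is dead: plain SUB/ADD is kept because it is the shortest form.
        return "SUB/ADD"
    if has_nf and not is_x32:
        # EFLAGS must survive: the NF form is used as a smaller replacement for LEA.
        return "SUB/ADD (NF)"
    # NF opcodes are 64-bit only, so x32 (or a target without NF) falls back to LEA.
    return "LEA"
```

For example, `pick_stack_adjust(eflags_live=True, is_x32=True, has_nf=True)` falls through to the LEA case, reflecting the statement that the 64-bit NF opcodes are not used for the x32 ABI.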
nfs: Fix argument typo to avoid a crash
A typo resulted in the wrong argument for a bytewise
comparison that could result in a crash if
the incorrect argument was not a valid pointer.
This patch fixes the argument.
While investigating this, I noticed that the
correct argument was not being filled in as
required, so this patch fixes that, as well.
Somehow, recovery from a NFSv4.1/4.2 server
crash worked during testing, so this was not
detected. The bug/patch only affects NFS
client mounts using NFSv4.1/4.2.
PR: 294925
(cherry picked from commit 4d80d4913e79c8b5918b1f04c1c7b38e6c76b9b4)
[VectorCombine] Fold zero tests of or/umax reductions (#205622)
Recognize equality and inequality tests against zero on vector.reduce.or
and vector.reduce.umax. When profitable, replace the scalar reduction and
compare with a lane-wise comparison followed by an i1 reduce.or or
reduce.and.
Run the existing zero-preserving reduction fold first to retain its more
specific canonicalization opportunities.
Proof: https://alive2.llvm.org/ce/z/pyoTwP
Fixed https://github.com/llvm/llvm-project/issues/205028
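The equivalence behind this fold can be checked exhaustively on small unsigned vectors. The sketch below assumes the natural pairing, where an `== 0` test becomes a reduce.and over lane-wise `== 0` and a `!= 0` test becomes a reduce.or over lane-wise `!= 0`; the patch itself may pick a different but equivalent form, and the helper names here are illustrative.

```python
from functools import reduce
from itertools import product
import operator


def reduce_or(v):
    return reduce(operator.or_, v, 0)


def reduce_umax(v):
    return max(v)  # lanes are modelled as non-negative (unsigned) integers


def folded_eq_zero(v):
    # i1 reduce.and over the lane-wise comparison (lane == 0)
    return all(x == 0 for x in v)


def folded_ne_zero(v):
    # i1 reduce.or over the lane-wise comparison (lane != 0)
    return any(x != 0 for x in v)


def fold_is_sound(lanes=3, bits=2):
    """Compare original and folded forms for every vector of the given shape."""
    for v in product(range(1 << bits), repeat=lanes):
        for red in (reduce_or, reduce_umax):
            if (red(v) == 0) != folded_eq_zero(v):
                return False
            if (red(v) != 0) != folded_ne_zero(v):
                return False
    return True
```

Both reductions yield zero exactly when every lane is zero, which is why the same lane-wise rewrite applies to reduce.or and reduce.umax alike.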
[Instrumentor] Add runtime examples: [1/N] A flop counter (#205698)
This adds an instrumentor-tools folder to compiler-rt to showcase use
cases of the instrumentor. The initial example is a program that, via
instrumentation, counts the number of flops performed. Call and
intrinsic support will follow after #198042.
This is the second try with more CMake magic after
https://github.com/llvm/llvm-project/pull/205221 failed on some
platforms.
Partially developed by Claude (AI), tested and verified by me.
build.7: explain how to build KBI-compatible standalone module
Reviewed by: imp, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D57859
vmd(8): prevent OOB reads in 32 and 64-bit ELF loaders.
Malformed ELF files could cause reads past the section headers.
For ELF64 files, malformed section metadata could cause out-of-bounds
reads of heap-allocated buffers.
Reported by Frank Denis.
Discussed with and "go for it" from mlarkin@
[flang][openacc] add acc.routine op for external names added in bind clauses. (#205591)
This adds acc.routine ops for the func.func ops that declare the external
functions named in device-specific bind clauses. This is needed for the
ACCRoutineToGPUFunc pass to move the function declaration into the
correct region.
This is a follow-up from
[#203088](https://github.com/llvm/llvm-project/pull/203088) which
unblocked the original pass that was stalling bind clauses, but failed
further down the pipeline.
[CIR] Implement Direct+canFlatten in CallConvLowering
ArgKind::Direct with a multi-field coerced struct and the canFlatten flag
means the coerced struct is passed as one scalar wire argument per field.
CallConvLowering was passing it as a single aggregate, ignoring canFlatten.
A new getFlattenedCoercedType helper recognizes the Direct+canFlatten arg
shape. At the callee, insertArgCoercion replaces the single block argument
with N scalar block args, stores each into an alloca of the coerced struct
type, reloads it, and coerces back to the original argument type when the
coerced struct type differs from the original. The Ignore-drop loop and
updateArgAttrs account for the N block-argument slots a flattened arg
occupies; updateArgAttrs also shapes them on the sret return path.
At the call site, when the operand type differs from the coerced struct
type the operand is coerced through a memory slot and each field is read
from that slot with cir.get_member + cir.load (via a new emitCoercionToMemory
helper that returns the coerce-slot pointer without loading the whole
aggregate); when the types already match each field is extracted directly
[7 lines not shown]