[NFC][LLVM] Refactor IIT_ANY payload for vector/element constraint (#203506)
Change `IIT_ANY` payload from a single packed OverloadIndex + AnyKind
byte to 2 bytes:
- An 8 bit OverloadIndex
- An 8 pit packed vector + element type constraint.
This will enable `IIT_ANY` to express constraints on the overload type
is a more general fashion compared to a flat `AnyKind` enum.
Also fixed a latent bug in fixed encodings generated by the intrinsic
emitter (exposed by this change). Existing `encodePacked` packs the
type-signature as 8 nibbles into a 32-bit word and then checks if the
MSB bit position (i.e., bit 15) is 0 (to allow it's use in fixed
encoding). This effectively drop any 0 valued bytes in the encoding in
the upper 4 nibbles. Fix this by changing `encodePacked` to use the
actual fixed encoding type and its size.
[flang][CUDA] Keep host literals from using unified-memory generic distance (#201257)
Fix CUDA generic resolution under `-gpu=mem:unified` so unattributed
literals and expression temporaries are not treated as unified-memory
actuals.
Previously, a host scalar literal such as `1.0` could score as
compatible with a `DEVICE` dummy and incorrectly select the
device-scalar overload. This could pass a host stack address to a device
helper and fail at runtime. The fix applies the unified/managed memory
distance columns only to symbol-backed actuals.
[flang][cuda] Fix host loads from CUDA constant globals (#203064)
This fixes CUDA Fortran lowering for scalar module variables with the
constant attribute that are read from host code, such as launch
configuration expressions or CUF kernel loop bounds.
Previously, host-side declarations for these globals could be rewritten
to device constant-memory addresses, causing host loads to dereference
the result of _FortranACUFGetDeviceAddress. The fix preserves host reads
from the host-visible global while still using the device address for
host-to-device assignment updates.
A FIR regression test covers host reads and assignment updates for
scalar CUDA constant globals.
[BOLT] Change DataAggregator error types (#203651)
1. In `filterBinaryMMapInfo`, replace `incovertibleErrorCode` with errc
code as `parseMainEvents` converts returned Error to std::error_code.
2. In `parsePerfData`, pass through Error returned by `prepareToParse`
for memory events.
Test Plan: updated perf_test.test
[libc] fix EAGAIN being treated as timeout in mutex and rwlock (#203574)
fix #203411.
This PR addresses the problem that `EAGAIN` may be treated as timeout in
mutex and rwlock. Two changes are applied:
1. timeout sites always explicitly check for timeout now to make the
logic more robust;
2. the futex wait now discards the error of `EAGAIN/EWOULDBLOCK` and
returns 0;
We don't distinguish waking up from signal and waking up from mismatch
for the following 3 reasons:
- We have userspace guard to avoid futex syscall if we already know
value would match, it seems awkward to make that check returns error, as
we may wake up and loop back to the check, where signal is consumed but
we still return error....;
- futex syscall can spuriously wake up anyway, there is no way to tell
[3 lines not shown]
QuantileType bytecode patch (#203495)
Since the merge of this
PR(https://github.com/llvm/llvm-project/pull/190321) there were some
issues identified, such as QuantileType not being added in the ByteCode
files. This PR focuses on fixing these missing pieces which should make
QuantileType a complete and functional type.
[libc] implement mkstemp (#199220)
Fixes #191266
Implements `mkstemp` as specified in POSIX
Currently Linux-only since it relies on the Linux syscall wrappers for
`getrandom` and `open`
Revert "[AMDGPU] In `LowerDYNAMIC_STACKALLOC`, hoist the `readfirstlane` up one instruction" (#203645)
Reverts llvm/llvm-project#201528
Reverting due to change causing "illegal VGPR to SGPR copy"
[SPIR-V] Lower freeze instructions with aggregate operands (#203584)
An aggregate freeze takes its result type from its operand, like a PHI
or select, but was handled by neither the up-front value-id mutation nor
replaceMemInstrUses, so the pass aborted with "illegal aggregate
intrinsic user". Mutate aggregate freezes to the i32 value-id type and
replace their operands alongside PHIs and selects.
[AMDGPU][true16] extract 16bit for scratch_load_ubyte_st when spilling (#203589)
In sramecc mode scratch_load_ubyte_st is selected for 16bit spilling.
Need a tmp vgpr32 and extract lo16 from it
Merge branch 'main' of github.com:llvm/llvm-project into users/ziqingluo/PR-179173940
Conflicts:
clang/unittests/ScalableStaticAnalysisFramework/Analyses/PointerFlow/PointerFlowTest.cpp
[AArch64][PAuth] Fix return-address auth for swifttailcc with FPDiff > 0 (#203340)
When a swifttailcc tail call has FPDiff > 0 (the caller received more
stack argument space than the callee pops), the epilogue contains an SP
adjustment to discard the leftover argument space. The existing code
treated both FPDiff < 0 and FPDiff > 0 uniformly in a single 'FPDiff !=
0' block, using AUTI[AB]1716 with a reconstructed entry-SP in x16 for
both cases.
For FPDiff < 0 (callee pops more) that reconstruction is necessary and
correct. For FPDiff > 0 it is wrong: by the time we enter the block the
post-index LDP has already adjusted SP back to the frame base, but the
'add sp, sp, #N' argument pop has not yet run. Entry SP equals the
current SP at that point, so AUTI[AB]SP would work directly, but instead
the combined block bumped SP via StackOffset::getFixed(-FPDiff) which
overshoots, and then emits AUTIA1716 with a wrong discriminator. Worse
yet, the SP restore had already been emitted *before* the auth, leaving
the live argument stack below SP and outside the red-zone during the
authentication window.
[9 lines not shown]
[SSAF][PointerFlow] Recognize reference-to-pointer/array Decls
Decls of reference-to-pointer/array types are now treated the same as
those of pointer/array type.
rdar://179173940
[BOLT] Fix perf data return identification (#203628)
If perf data doesn't have branch type recorded, missing value would
incorrectly be interpreted as not-a-return. Only populate Returns map if
the branch type is available.
Fixes bug introduced in #202813.
[LoopInterchange] Mark getAddRecCoefficient with static (#203624)
As this function is a file-scope non-member function, it's better to
mark it with static.
[LoopInterchange] Fix crash when followLCSSA returns constant (#203515)
Similar as the case in ##201069, `followLCSSA` may return a constant
value, but it was cast to Instruction unconditionally. We need to
explicitly check whether the returned value is an Instruction or not.
Fix #203375.