[ARM][Libc] Fix ARM big-endian low-end inline_memset byte fills (#198777)
Fix inline_memset_arm_low_end to use the splatted 32-bit fill value for
all byte stores. This ensures that all stores, including unaligned and
trailing byte fills, write the requested repeated byte value on
big-endian ARM.
The low-end ARM memset path already builds value32 as value * 0x01010101U
for aligned word/block stores, but its byte-wise prefix and tail
handling still passed the raw 8-bit value widened to uint32_t. Use
value32 for the byte-wise alignment prefix and final tail loop as well.
This makes the low-end path endian-safe and consistent with
inline_memset_arm_mid_end, fixing incorrect NUL/truncated output in
printf-family formatting and direct memset tests on big-endian ARM
no-unaligned-access targets.
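A minimal sketch of why the splatted word is endian-safe. It assumes the byte store takes a 32-bit argument and writes its first byte in memory order; `splat_byte`, `store_first_byte`, and `fill_bytes` are hypothetical names for illustration, not the libc helpers themselves.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: replicate one byte into every byte of a 32-bit word.
inline uint32_t splat_byte(uint8_t value) {
  return static_cast<uint32_t>(value) * 0x01010101U;
}

// Hypothetical stand-in for a byte store that receives a 32-bit argument and
// writes its first byte in memory order. On big-endian that is the most
// significant byte, so a merely widened 8-bit value would store 0 here.
inline void store_first_byte(unsigned char *dst, uint32_t word) {
  std::memcpy(dst, &word, 1);
}

// Byte-wise fill that passes the splatted word, so the stored byte does not
// depend on which end of the word the store reads from.
inline void fill_bytes(unsigned char *dst, std::size_t count, uint8_t value) {
  const uint32_t value32 = splat_byte(value);
  for (std::size_t i = 0; i < count; ++i)
    store_first_byte(dst + i, value32);
}
```

Because every byte of value32 equals the fill byte, the result is the same on little-endian and big-endian hosts.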
[lldb][Windows] use pipes when no terminal dimensions are sent (#203562)
Plumb `eLaunchFlagUsePipes` from the lldb-dap client through the
gdb-remote protocol to lldb-server so the server's LaunchProcess can
choose between ConPTY and anonymous pipes for inferior stdio.
This is needed for LLDB DAP in `internalConsole` mode.
Fixes `TestDAP_launch_args.py`, `TestDAP_launch_basic.py`, and
`TestDAP_launch_shellExpandArguments_disabled.py` on Windows under
`LLDB_USE_LLDB_SERVER=1`.
rdar://178725958
[lldb-dap] subscribe to target events at the broadcaster-manager level (#201866)
https://github.com/llvm/llvm-project/pull/200133 added
`wait_for_module_events()` assertions to `TestDAP_attachCommands.py` and
`TestDAP_launch_extra_launch_commands.py`. However, `SetTarget` is
called after the `Run*Commands`, so the assertions fail on Windows:
modules are loaded early during process startup, before the listener is
attached, and the module-load events are missed.
This patch moves the event subscription to
`DAPSessionManager::GetEventThreadForDebugger` and uses
`StartListeningForEventClass` so any target created in this debugger
automatically gets the listener attached. This is the same pattern we
already use for `SBThread` events.
This is a follow up to https://github.com/llvm/llvm-project/pull/201796
which skipped the tests.
rdar://179923440
[libc++] Add WG21 issues and papers voted in Brno (#204191)
All papers were targeting C++29, so I created new CSV tracking files. A
few papers were forwarded by CWG in plenary but have library wording, so
they are included here. I believe at least some of them will be *nothing
to do*, but should still be included for completeness.
Minor drive-bys:
- rename Cxx2c to Cxx26
- synchronize RST notes from CSVs to GitHub issues
Assisted by Claude
[LLVM][Intrinsics] Add scalar-only and vector-only overload types (#204138)
Add integer/fp overload types that allow only scalar or only vector
types, and adopt them in some existing intrinsics.
Fixed the ValueTracking unit test so it no longer exercises invalid vp reduce
intrinsics, which now fail to parse (as opposed to failing to validate).
See
https://discourse.llvm.org/t/rfc-minor-change-in-overloaded-types-for-intrinsics/90880
[SelectionDAG] Add expansion for llvm.convert.to.arbitrary.fp (#193595)
The expansion converts a native IEEE float to an arbitrary-precision FP
format, returning the result as an integer, following this algorithm:
1. Bitcast the source float to an integer and extract sign, exponent,
and mantissa bit fields via masks and shifts.
2. Classify the input (zero/denormal/normal/Inf/NaN).
3. Normalize the source via FFREXP.
4. Normal path: adjust the exponent bias from source to destination
format and truncate the mantissa with rounding (supports
NearestTiesToEven, TowardZero, TowardPositive, TowardNegative,
NearestTiesToAway).
5. Denormal destination path: when the biased destination exponent is <=
0, shift the mantissa right to produce a denormalized result with
rounding.
6. Handle mantissa overflow from rounding and exponent overflow. Produce
Inf or saturate to max finite, depending on format and saturation flag.
7. Build special-value results (canonical qNaN, signed Inf, signed zero)
[13 lines not shown]
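Steps 1 and 4 of the algorithm listed above can be sketched for a 32-bit IEEE source. This is a scalar illustration only, not the SelectionDAG expansion; `split_float` and `round_mantissa_rne` are hypothetical names, and only the NearestTiesToEven rounding mode is shown.

```cpp
#include <cstdint>
#include <cstring>

struct FloatFields {
  uint32_t sign;      // 1 bit
  uint32_t exponent;  // 8 bits, biased by 127
  uint32_t mantissa;  // 23 bits, without the implicit leading one
};

// Step 1: bitcast the float to an integer and extract the fields with
// masks and shifts.
inline FloatFields split_float(float f) {
  static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Part of step 4: truncate a src_bits-wide mantissa to dst_bits using
// round-to-nearest, ties-to-even. Requires 0 < dst_bits < src_bits <= 31.
// The result may equal 1 << dst_bits (mantissa overflow); the caller is
// expected to handle that case, as step 6 describes.
inline uint32_t round_mantissa_rne(uint32_t mantissa, unsigned src_bits,
                                   unsigned dst_bits) {
  const unsigned shift = src_bits - dst_bits;
  const uint32_t kept = mantissa >> shift;
  const uint32_t dropped = mantissa & ((1u << shift) - 1u);
  const uint32_t half = 1u << (shift - 1);
  const bool round_up = dropped > half || (dropped == half && (kept & 1u));
  return kept + (round_up ? 1u : 0u);
}
```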
[AMDGPU][InstCombine] Fold identity wave shuffle to its source (#204121)
The constant-shuffle matcher emitted a self-targeting update_dpp
(quad_perm 0xe4) for an identity shuffle instead of eliminating it.
Return the source value when every lane reads itself so the intrinsic
folds away.
Add a separate wave64 test file that drives the matcher with the full
wave64 thread ID (mbcnt.hi(mbcnt.lo)), covering quad_perm and other DPP
forms plus identity folding.
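For illustration, the identity check for a quad_perm control can be sketched as follows. It assumes the usual DPP encoding in which lane i of each quad reads the lane selected by bits [2i+1:2i] of the control, so 0xe4 selects lanes 0, 1, 2, 3 in order. The function names are hypothetical and this is not the InstCombine matcher itself.

```cpp
#include <cstdint>

// Decode which lane of the quad (0..3) `lane` reads under a quad_perm control.
inline unsigned quad_perm_source(uint8_t quad_perm, unsigned lane) {
  return (quad_perm >> (2 * lane)) & 0x3u;
}

// True when every lane of the quad reads itself, i.e. the shuffle is a no-op
// and can be replaced by its source value.
inline bool is_identity_quad_perm(uint8_t quad_perm) {
  for (unsigned lane = 0; lane < 4; ++lane)
    if (quad_perm_source(quad_perm, lane) != lane)
      return false;
  return true;
}
```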
[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask
Follow up from comments on https://github.com/llvm/llvm-project/pull/202886
Make HWEvent a bitmask by default instead of having both the enum, and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.
I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class with helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) with many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.
Considering HWEvent is a bitmask, I also implemented a simple iterator to iterate over all set bits of the mask, which is a useful thing to have as some APIs in InsertWaitCnt rely on treating one event at a time.
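A minimal sketch of the design described above: a bitmask wrapped in a class with member helpers and an iterator over set bits. `EventMask` and the three event constants are hypothetical names chosen for illustration; the actual HWEvent interface may differ.

```cpp
#include <cstdint>

class EventMask {
  uint32_t Bits = 0;

public:
  constexpr EventMask() = default;
  constexpr explicit EventMask(uint32_t B) : Bits(B) {}

  constexpr EventMask operator|(EventMask O) const {
    return EventMask(Bits | O.Bits);
  }
  constexpr EventMask operator&(EventMask O) const {
    return EventMask(Bits & O.Bits);
  }
  constexpr bool operator==(EventMask O) const { return Bits == O.Bits; }
  constexpr bool operator!=(EventMask O) const { return Bits != O.Bits; }

  // Reads as "A contains B": every bit set in O is also set in this mask.
  constexpr bool contains(EventMask O) const {
    return (Bits & O.Bits) == O.Bits;
  }
  constexpr bool empty() const { return Bits == 0; }

  // Forward iterator that yields one single-bit mask per set bit,
  // from the lowest bit to the highest.
  class iterator {
    uint32_t Rest;

  public:
    explicit iterator(uint32_t R) : Rest(R) {}
    EventMask operator*() const { return EventMask(Rest & (~Rest + 1u)); }
    iterator &operator++() {
      Rest &= Rest - 1u; // clear the lowest set bit
      return *this;
    }
    bool operator!=(const iterator &O) const { return Rest != O.Rest; }
  };
  iterator begin() const { return iterator(Bits); }
  iterator end() const { return iterator(0); }
};

// Hypothetical events, one bit each.
constexpr EventMask LoadEvent(1u << 0);
constexpr EventMask StoreEvent(1u << 1);
constexpr EventMask ExportEvent(1u << 2);
```

With this shape, `for (EventMask E : Mask)` visits one event at a time, which suits APIs that handle a single event per call.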
[Loads][LAA] Properly handle multiple deref assumptions (#204083)
The current code uses a `DerefRK = std::max(DerefRK, RK)` pattern, which
mostly "works" by accident. The maximum here is based on RK.ArgValue,
which will be 0 on initialization and then (incorrectly) set to 1 for
variable dereferenceable assumptions, or N for constant ones. So if we
have a single variable dereferenceable assumption, this ends up working.
If we have multiple, or there are also constant assumptions, or we fix
the incorrect ArgValue initialization, this breaks down.
Fix this by individually inspecting the RK values. Do the checks for
each one until we have both align and dereferenceable proven. For the
LAA case, add all the applicable dereferenceable assumptions to the umax
expression.
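The per-assumption check can be sketched roughly as below. This is a simplified model with constant values only; `Assumption`, `AssumeKind`, and `provesAlignedAndDereferenceable` are hypothetical names, and variable-sized dereferenceable assumptions (the LAA umax case) are not modeled.

```cpp
#include <cstdint>
#include <vector>

enum class AssumeKind { Align, Dereferenceable };

struct Assumption {
  AssumeKind kind;
  uint64_t bytes; // alignment (a power of two) or dereferenceable size
};

// Inspect each assumption on its own instead of folding them with std::max,
// and stop as soon as both properties are proven.
inline bool provesAlignedAndDereferenceable(
    const std::vector<Assumption> &assumptions, uint64_t needAlign,
    uint64_t needSize) {
  bool alignProven = false;
  bool derefProven = false;
  for (const Assumption &a : assumptions) {
    if (a.kind == AssumeKind::Align && a.bytes >= needAlign)
      alignProven = true;
    if (a.kind == AssumeKind::Dereferenceable && a.bytes >= needSize)
      derefProven = true;
    if (alignProven && derefProven)
      return true;
  }
  return false;
}
```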
Reapply "[clang][SPIR-V] Implement -fspv-preserve-interface (#196404)" (#204249)
This reverts commit 6746898d2bfc086947d86715e065f8dbf74e9690.
[clang][SPIR-V] Re-land -fspv-preserve-interface (#196404)
This had been reverted in #202558 due to a missing symbol
(llvm::removeFromUsedLists). That symbol is now available in main.
@jmmartinez @jplehr could you please review?
I did not need to modify the original PR #196404. As I described in
https://github.com/llvm/llvm-project/pull/196404#issuecomment-4661219367,
the build failure on `main` was not due to it.
Tested locally. Will watch the bots once the PR is created.
[X86] Attempt to fold SHUF128(concat(x,y),concat(z,w),0x44) -> concat(x,z) (#204340)
Trying to yak shave the regressions on #201271 led me here - I really
want this to be done generically as a canonicalization inside
combineX86ShufflesRecursively, but hit some issues with the widening
code that is still causing trouble for #45319.
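To see why the fold holds, here is a small model of a 512-bit shuffle at 128-bit granularity. It assumes the VSHUFF64X2-style immediate encoding, where the two low result lanes are picked from the first operand and the two high result lanes from the second, two selector bits each. Each 128-bit lane is represented by an integer token, and all names are hypothetical.

```cpp
#include <array>
#include <cstdint>

using Lane128 = int;                    // stand-in token for one 128-bit lane
using Vec256 = std::array<Lane128, 2>;  // two lanes
using Vec512 = std::array<Lane128, 4>;  // four lanes

inline Vec512 concat(const Vec256 &lo, const Vec256 &hi) {
  return Vec512{{lo[0], lo[1], hi[0], hi[1]}};
}

// Result lanes 0 and 1 come from `a`, lanes 2 and 3 from `b`; each pair of
// immediate bits selects the source lane.
inline Vec512 shuf128(const Vec512 &a, const Vec512 &b, uint8_t imm) {
  return Vec512{{a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3],
                 b[(imm >> 6) & 3]}};
}
```

With imm = 0x44 the selectors are 0, 1, 0, 1, so the result takes the low 256 bits of each operand, which is concat(x, z) when the operands are concat(x, y) and concat(z, w).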
[AArch64][SVE] Enable known bits for predicated shifts (#200347)
Allow SelectionDAG to query target known-bits information for scalable
vector nodes, and add known-bits cases for SVE predicated SHL, SRL and SRA
nodes.
This enables DAG combines to prove disjointness for ORs involving
scalable vector shifts, enabling USRA/SSRA instruction selection.
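A scalar sketch of the reasoning, using 8-bit elements and ignoring predication: a constant shift leaves some bits known to be zero, and when the known-zero bits of the two OR operands cover every bit position, the OR is disjoint and behaves like an ADD, which is the shape an accumulate-shift instruction expects. The helper names are hypothetical.

```cpp
#include <cstdint>

// Known-zero mask of an 8-bit value after a left shift by amt (0..8):
// the low amt bits are zero.
inline uint8_t known_zero_after_shl(unsigned amt) {
  return static_cast<uint8_t>((1u << amt) - 1u);
}

// Known-zero mask after a logical right shift by amt (0..8):
// the high amt bits are zero.
inline uint8_t known_zero_after_srl(unsigned amt) {
  return static_cast<uint8_t>(~(0xFFu >> amt));
}

// Two values share no set bits if every bit is known zero in at least one.
inline bool no_common_bits(uint8_t zero_a, uint8_t zero_b) {
  return static_cast<uint8_t>(zero_a | zero_b) == 0xFF;
}
```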
Merge tag 'audit-pr-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Fix a recursive deadlock when duplicating executable file rules
Avoid multiple lookups and attempted I_MUTEX_PARENT locks when moving
watched files by passing the already resolved inodes through the
audit code.
- Fix removal of executable watch rules after the file is deleted
Prior to this fix we were unable to remove an executable file watch
where the file had been previously deleted due to a negative dentry
check in the code that performs the lookup on the file watches.
- Convert our basic "unsigned" type usage to "unsigned int".
* tag 'audit-pr-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
[3 lines not shown]