[X86] combineConcatVectorOps - IsConcatFree - detect splats that come from a common load/broadcastload (#174986)
Allows us to handle freely concatable cases after a broadcast load has
become shared by different vector width uses by peeking through
bitcasts/extract_subvector nodes
[RISC-V][Mach-O] Print immediate operands in hexadecimal format. (#174505)
This is done for logical operations and auipc/lui.
Patch based on code written by Tim Northover.
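For illustration only, the formatting change can be pictured with a small standalone helper. This is not the RISC-V instruction printer code: `formatImmHex` is a hypothetical name, and printing negative values as a sign followed by the magnitude is an assumption of this sketch.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not LLVM's printer): renders a signed immediate in
// hexadecimal, keeping the sign separate so that -16 becomes "-0x10".
std::string formatImmHex(std::int64_t Imm) {
  char Buf[32];
  if (Imm < 0) {
    // Negate in unsigned arithmetic so INT64_MIN does not overflow.
    std::uint64_t Mag = 0 - static_cast<std::uint64_t>(Imm);
    std::snprintf(Buf, sizeof(Buf), "-0x%llx",
                  static_cast<unsigned long long>(Mag));
  } else {
    std::snprintf(Buf, sizeof(Buf), "0x%llx",
                  static_cast<unsigned long long>(Imm));
  }
  return Buf;
}
```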
[SPIRV] Additional fixes for const init via `UtoPtr` (#172584)
#166494 added support for using `inttoptr` in global initialisation, and
lowering it into `OpSpecConstantOp OpConvertUToPtr`. Unfortunately, it
missed a slightly more subtle case and exposed an existing issue around the
`COPY` pseudo-op. This patch ensures that we look through a `COPY` when
figuring out whether an `OpConvertUToPtr` is actually operating on a
global. We also correctly handle the case where a `G_PTR_ADD` is used by
an `OpSpecConstantOp` in the context of global initialisation, which
would otherwise lead to broken SPIR-V wherein the latter would reference
a non-constant Op.
---------
Co-authored-by: Marcos Maronas <marcos.maronas at intel.com>
[InlineSpiller][AMDGPU] Implement subreg reload during RA spill
Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
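The idea of restoring only the needed sub-registers can be sketched as follows. This is a simplified model, not the InlineSpiller implementation: it assumes a tuple made of 32-bit lanes stored contiguously in the spill slot, and `PartialReload` and `planPartialReload` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// One reload to emit: which 32-bit lane of the tuple, and where it lives
// relative to the start of the spill slot (assumed contiguous layout).
struct PartialReload {
  unsigned SubRegIdx;
  unsigned ByteOffset;
};

// Given a bitmask of lanes that are actually used, plan one reload per used
// lane instead of restoring the whole tuple.
std::vector<PartialReload> planPartialReload(std::uint32_t UsedLanes,
                                             unsigned NumLanes) {
  std::vector<PartialReload> Plan;
  for (unsigned I = 0; I < NumLanes && I < 32; ++I)
    if (UsedLanes & (1u << I))
      Plan.push_back({I, I * 4}); // 4 bytes per 32-bit lane
  return Plan;
}
```

With a four-lane tuple where only lanes 1 and 3 are read, the plan contains two reloads at byte offsets 4 and 12 rather than a full 16-byte restore.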
[AMDGPU] Test precommit for subreg reload
This test currently fails due to insufficient
registers during allocation. Once the subreg
reload is implemented, it will begin to pass
as the partial reload helps mitigate register
pressure.
[AMDGPU] Put back ProperlyAlignedRC helper functions
Putting back the functions that were recently deleted
because they were unused. They are needed for
implementing subreg reload during RA.
[CodeGen] Enhance createFrom for sub-reg aware cloning
Instead of just cloning the virtual register, this
function now creates a new virtual register derived
from a subregister class of the original value.
[AMDGPU] Make AMDGPURewriteAGPRCopyMFMA aware of subreg reload
The AMDGPURewriteAGPRCopyMFMA pass is currently not subreg-aware.
In particular, the logic that optimizes spills into COPY
instructions assumes full register reloads. This becomes
problematic when the reload instruction partially restores
a tuple register. This patch introduces the necessary changes
to make this pass subreg-aware, for a future patch that
implements subreg reload during RA.
[AMDGPU] Introduce Offset field in SGPR spill Pseudos
Currently, SGPR spill pseudo-instructions lack
an offset field to represent non-zero stack offsets.
This patch introduces an additional offset field to
SGPR spill pseudo-instructions and updates all
relevant passes that handle spill lowering to support
this new field. This field is essential for a future
patch that implements subreg reload of tuple registers
from their stack location during RA.
[SLP]Do not generate extractelement subnodes with the same indices
The compiler should not generate subvectors with the same extractelement
instructions; doing so may cause a crash and lead to inefficient
vectorization.
Fixes #174773
py-gwcs: updated to 0.26.1
0.26.1 (2025-11-19)
- Fix an indexing bug in ``spectroscopy.SellmeierZemax`` where the output ``n`` for array-type wavelength
inputs had the correct shape, but had the same value for all elements.
- Deprecate the private ``_toindex`` function in favor of a public ``to_index`` function.
0.26.0 (2025-09-18)
- Fix the computation of ``lon_pole`` for Zenithal projections and declination of +/-90 deg.
- Enable ``inputs_mapping`` in ``selector.LabelMapperArray``.
- Deprecate ``with_units`` argument in favor of the high level Shared API.
[Headers][X86] __builtin_ia32_pmovwb128_mask is not constexpr (#174985)
Appears to be a copy+paste typo - most of the x86 masked truncation intrinsics still can't be made constexpr at this time.
Fixes #166814
[SDPatternMatch] Add m_FAbs matcher (#174975)
Adds a pattern matcher for floating-point absolute value (ISD::FABS),
following the same pattern as m_Abs for integer absolute value.
Fixes #174751
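To show the shape of such a matcher without depending on LLVM, here is a toy model. It is not the SDPatternMatch implementation: `Node`, `ValueBinder`, `UnaryMatcher` and `toy_match` are hypothetical stand-ins, and only the names `m_FAbs` and `m_Value` are borrowed from the real API.

```cpp
#include <vector>

enum Opcode { FABS, FNEG, CONSTANT };

// Minimal stand-in for a DAG node: an opcode plus operand pointers.
struct Node {
  Opcode Op;
  std::vector<const Node *> Operands;
};

// Matches any node and binds it to the caller's variable.
struct ValueBinder {
  const Node *&Out;
  bool match(const Node *N) const {
    Out = N;
    return true;
  }
};

// Matches a single-operand node with a fixed opcode, then matches the operand.
template <typename SubPattern> struct UnaryMatcher {
  Opcode Op;
  SubPattern Sub;
  bool match(const Node *N) const {
    return N && N->Op == Op && N->Operands.size() == 1 &&
           Sub.match(N->Operands[0]);
  }
};

inline ValueBinder m_Value(const Node *&Out) { return ValueBinder{Out}; }

// Toy counterpart of the new matcher: a unary matcher fixed to FABS.
template <typename SubPattern>
UnaryMatcher<SubPattern> m_FAbs(SubPattern Sub) {
  return UnaryMatcher<SubPattern>{FABS, Sub};
}

template <typename Pattern> bool toy_match(const Node *N, const Pattern &P) {
  return P.match(N);
}
```

In this model, `toy_match(&N, m_FAbs(m_Value(X)))` succeeds only when `N` is an `FABS` node, and on success `X` points at its operand.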
[compiler-rt][AArch64] Exit early from __arm_za_disable. (#174942)
Because `__arm_za_disable` is a private-ZA function, it's only ever
entered with ZA state `off` or `dormant`. If the state is `off` then we
can safely return and there is no need to call `__arm_tpidr2_save` or to
explicitly set PSTATE.ZA or TPIDR2_EL0 to zero.
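The control flow described above can be modelled roughly as below. This is a toy model in plain C++, not the compiler-rt routine: `ZAModel`, `modelTpidr2Save` and `modelZADisable` are hypothetical names, and the sketch assumes that the `off` state corresponds to PSTATE.ZA being clear.

```cpp
#include <cstdint>

// Toy model of the relevant per-thread state; these are not real registers.
struct ZAModel {
  bool PStateZA = false;    // models PSTATE.ZA
  std::uint64_t TPIDR2 = 0; // models TPIDR2_EL0
  int SaveCalls = 0;        // counts modelled calls to __arm_tpidr2_save
};

// Stand-in for __arm_tpidr2_save: only records that it was called.
inline void modelTpidr2Save(ZAModel &S) { ++S.SaveCalls; }

// Modelled disable routine with the early exit for the `off` state.
inline void modelZADisable(ZAModel &S) {
  if (!S.PStateZA)
    return;           // ZA is off: nothing to save or clear
  modelTpidr2Save(S); // dormant: commit the lazy save first
  S.TPIDR2 = 0;
  S.PStateZA = false;
}
```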
libsodium: updated to 1.0.21
* Version 1.0.21-stable
- Export missing crypto_ipcrypt_nd_keygen() helper function.
- Fixed compilation with GCC on aarch64.
* Version 1.0.21
This point release includes all the changes from 1.0.20-stable, which
include a security fix for the `crypto_core_ed25519_is_valid_point()`
function, as well as two new sets of functions:
- The new `crypto_ipcrypt_*` functions implement mechanisms for securely
encrypting and anonymizing IP addresses as specified in https://ipcrypt-std.github.io
- The `sodium_bin2ip` and `sodium_ip2bin` helper functions have been added
to complement the `crypto_ipcrypt_*` functions and easily convert addresses
between bytes and strings.
- XOF: the `crypto_xof_shake*` and `crypto_xof_turboshake*` functions
are standard extendable output functions. From input of any length, they can
derive output of any length with the same properties as hash functions. These
primitives are required by many post-quantum mechanisms, but can also be used
[2 lines not shown]
[libc++][NFC] Update <any> to a more modern code style (#174619)
This patch refactors `enable_if`s inside `<any>` to use the `..., int> =
0` variant that we try to use throughout the code base and inlines some
of the functions into the class body to avoid duplicating the
`enable_if`s.
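A minimal standalone example of the `..., int> = 0` form, unrelated to the actual `<any>` code (`describe` is a hypothetical function used only for illustration):

```cpp
#include <string>
#include <type_traits>

// The constraint is a defaulted non-type template parameter, so the two
// overloads have distinct signatures and can coexist.
template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
std::string describe(T) {
  return "integral";
}

template <class T, std::enable_if_t<std::is_floating_point<T>::value, int> = 0>
std::string describe(T) {
  return "floating";
}
```

Here `describe(1)` selects the first overload and `describe(1.5)` the second. Unlike a defaulted type parameter (`class = enable_if_t<...>`), this form does not make the two templates redeclarations of each other.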
amd64: Remove tpm(4) from GENERIC for now
It breaks suspend/resume and no one has had time to investigate and fix
it.
PR: 291067
Reviewed by: emaste
Fixes: 3deb21f1afd5 ("random: TPM_HARVEST should have been named RANDOM_ENABLE_TPM")
Differential Revision: https://reviews.freebsd.org/D54587
[Clang] expunge `trivially_relocate_if_eligible` (#174344)
In Kona, WG21 decided to revert trivial relocation (P2786).
Keep the notion of relocatability
(used in the wild and likely to come back),
but remove the keyword, which is no longer conforming.