[AMDGPU][PromoteAlloca] Set !amdgpu.non.volatile if promotion fails
I thought about doing this in a separate pass, but this pass already has all the necessary analysis for this to be a trivial addition.
We can simply set `!amdgpu.non.volatile` if all other attempts to promote the operation failed.
[AMDGPU] Set MONonVolatile on memory accesses for spills
Mark the memory operand of spill loads and stores as non-volatile, so that these
loads and stores are emitted with `nv` set.
The reason is that scratch memory used by spills will never be shared by
another thread. It's purely thread local and thus a good fit for the `nv` bit.
[AMDGPU][GFX12.5] Add support for emitting memory operations with nv bit set
- Add & document `!amdgpu.non.volatile` metadata and a corresponding `MONonVolatile` MachineMemOperand flag.
- Set nv=1 on memory operations on GFX12.5 if the operation accesses a constant address space,
is an invariant load, or has the `MONonVolatile` flag set.
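A minimal sketch of the rule stated above, for illustration only. `MemAccess` and `shouldSetNV` are hypothetical names that do not come from the patch; the struct stands in for whatever the backend actually inspects, and the real check may involve more conditions than the three listed here.

```cpp
// Hypothetical stand-in for the properties of one memory access.
// These are NOT LLVM's real types.
struct MemAccess {
  bool InConstantAddrSpace; // the access targets a constant address space
  bool IsInvariantLoad;     // the access is a load marked invariant
  bool HasMONonVolatile;    // the memory operand carries MONonVolatile
};

// Returns true when, under the rule described above, the operation would be
// emitted with nv=1 on GFX12.5. Any one of the three conditions is enough.
bool shouldSetNV(const MemAccess &A) {
  return A.InConstantAddrSpace || A.IsInvariantLoad || A.HasMONonVolatile;
}
```

Under this model a spill access qualifies through the `HasMONonVolatile` field alone, which matches the motivation given in the spill commit above.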
devel/libunicode-contour: Fix build when samurai is used instead of ninja
When samurai is used instead of ninja, python is not pulled in as its
dependency. The build process requires python's existence so the port
must explicitly depend on python at build time.
PR: 292683
Reported by: Eric Camachat <eric at camachat.org>
MFH: 2026Q1
(cherry picked from commit 460e62cc3548b28a331954257679793713631951)
[WebAssembly] Combine shuffle and signed extend to extend_high (#179166)
Fold shuffles and bitcasts feeding extend_low_s into extend_high_s.
This enables i32x4.dot_i16x8_s selection and removes redundant shuffles.
Fixed: https://github.com/llvm/llvm-project/issues/179145
[lldb][CompilerType] Add CompilerType::IsRealFloatingPointType (#178904)
This is part of a patch series to clean up the
`TypeSystemClang::IsFloatingPointType` API. Currently the API is a bit
of a foot-gun because it returns `true` for both Complex floats and
vector types whose element types are floats, but most call-sites
probably don't handle these correctly. The former aligns with the
`clang::Type::isFloatingType` API, but the latter doesn't. This specific
implementation choice will be addressed in a separate patch. This patch
adds a new `CompilerType::IsRealFloatingPointType` API which clients can
use to query about non-complex floats (named after the similarly named
`clang::Type::isRealFloatingType`).
This allows us to clean up some of the callers which only wanted to
handle non-complex floats. I cleaned those up as part of this patch.
Wherever we checked for `is_float && !is_complex && !is_vector_type` I
just replaced it with the new API.
On encountering complex/vector floats, some of the ABI plugins would set
[6 lines not shown]
devel/libunicode-contour: Fix build when samurai is used instead of ninja
When samurai is used instead of ninja, python is not pulled in as its
dependency. The build process requires python's existence so the port
must explicitly depend on python at build time.
PR: 292683
Reported by: Eric Camachat <eric at camachat.org>
MFH: 2026Q1
[HLSL][DXIL][SPIRV] WavePrefixSum intrinsic support (#167946)
Issue: https://github.com/llvm/llvm-project/issues/99172
- [x] Implement `WavePrefixSum` clang builtin
- [x] Link `WavePrefixSum` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WavePrefixSum` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WavePrefixSum` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WavePrefixSum.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WavePrefixSum-errors.hlsl`
- [x] Create the `int_dx_WavePrefixSum` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WavePrefixSum` to `121` in
`DXIL.td`
- [x] Create the `WavePrefixSum.ll` and `WavePrefixSum_errors.ll` tests
in `llvm/test/CodeGen/DirectX/`
[13 lines not shown]
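For readers unfamiliar with the intrinsic: HLSL's `WavePrefixSum` returns, for each active lane, the sum of the values held by active lanes with a smaller index (an exclusive prefix sum). The following is a rough CPU-side model of that behaviour, not the code added by this patch. `wavePrefixSumModel` is a hypothetical name, both vectors are assumed to have the same length, and inactive lanes are simply left at 0 here.

```cpp
#include <cstddef>
#include <vector>

// Models an exclusive prefix sum across the lanes of a single wave.
// Values[i] is the value held by lane i; Active[i] says whether lane i
// participates. Assumes Values.size() == Active.size().
std::vector<int> wavePrefixSumModel(const std::vector<int> &Values,
                                    const std::vector<bool> &Active) {
  std::vector<int> Result(Values.size(), 0);
  int Running = 0; // sum of the active lanes seen so far
  for (std::size_t Lane = 0; Lane < Values.size(); ++Lane) {
    if (!Active[Lane])
      continue; // inactive lanes neither contribute nor receive a result
    Result[Lane] = Running; // excludes this lane's own value
    Running += Values[Lane];
  }
  return Result;
}
```

With four active lanes holding 1, 2, 3, 4 the model yields 0, 1, 3, 6. If lane 1 is inactive it yields 0 for lane 0, 1 for lane 2, and 4 for lane 3.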
[VPlan] Sink recipes from the vector loop region in licm. (#168031)
When a recipe can be safely sunk and all of its users are outside the
vector loop region in the same dedicated exit block, the recipe does not
need to be executed on every iteration.
This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to
also sink such recipes from the vector loop region into the exit block.
This reduces redundant computation and improves cost model accuracy.
TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF`
fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved
loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths
[5 lines not shown]
OpenSSL: install .pc files from the exporters subdir
The .pc files generated in the root directory are used as part of the
build; they should never be installed. Use the versions from the
exporters subdirectory--which should be installed--as the .pc files
which are distributed with FreeBSD. This avoids the need for "fixing up"
these files after the fact (see `crypto/openssl/BSDmakefile` for more
details as part of this change).
Garbage collect `secure/lib/libcrypto/Makefile.version`, et al,
as they're orphaned files. They were technically unused prior to this
change as the vendor process properly embeds the version numbers in
various files, but this commit formalizes the removal.
This correction/clarification on the .pc files will be made in an
upcoming release of OpenSSL [1].
References:
1. https://github.com/openssl/openssl/issues/28803
[6 lines not shown]
[RISCV] Sink conversion from nfields/lmul to nf down one level in RISCVInstrInfoV.td. NFC (#179369)
The nf field is encoded as nfields/lmul minus one. Use asserts to
verify this doesn't lose any information.
The asserts increase the number of lines, but I think this puts the
class interfaces at a more logical level than the raw encoding.
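As an illustration of the "minus one" encoding and the kind of check the asserts perform, here is a small C++ sketch. It is not the TableGen code from the patch. `encodeNF` and `decodeNF` are hypothetical helpers, "nfields/lmul" is read here as "nfields or lmul" (an assumption), and the 3-bit width of `nf` is taken from the RISC-V vector specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes a count in the range [1, 8] into the 3-bit nf
// field as Count - 1. The assert mirrors the idea of verifying that the
// conversion loses no information.
std::uint8_t encodeNF(unsigned Count) {
  assert(Count >= 1 && Count <= 8 && "Count - 1 must fit in 3 bits");
  return static_cast<std::uint8_t>(Count - 1);
}

// Hypothetical inverse: recovers the count from the encoded nf field.
unsigned decodeNF(std::uint8_t NF) {
  assert(NF < 8 && "nf is a 3-bit field");
  return NF + 1u;
}
```

Passing the count around and converting to `nf` only at the last step keeps callers working in terms of the quantity they care about, which seems to be the point of sinking the conversion.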
OpenSSL: update build artifacts to match 3.0.16 release
The files committed match the output of the new vendor process. Much of
this involves regenerating manpages to catch up to content from the
initial 3.0 import.
This is a direct commit to stable/14.