[flang][OpenMP] Fix crash in declare reduction with intrinsic operators (#182978)
genOMP for OpenMPDeclareReductionConstruct unconditionally extracts
ProcedureDesignator from OmpReductionIdentifier, but when the reduction
identifier is an intrinsic operator like `+`, the parser produces a
DefinedOperator instead. This causes a std::get crash.
Visit both variants of OmpReductionIdentifier to extract the reduction
name string, handling DefinedOperator (with IntrinsicOperator and
DefinedOpName sub-variants) alongside the existing ProcedureDesignator
path.
This fixes the ICE; the underlying lack of derived-type reduction
support (TODO in ReductionProcessor::getReductionInitValue) remains
a separate issue.
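The two-level visit described above can be sketched as follows. This is a minimal illustration, not the flang code: the struct definitions are hypothetical stand-ins that only borrow the node names from this message, and `reductionName`, `intrinsicName`, and `Overloaded` are names invented for the sketch.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the parse-tree nodes named in the message.
struct ProcedureDesignator { std::string name; };
enum class IntrinsicOperator { Add, Multiply };
struct DefinedOpName { std::string name; };
struct DefinedOperator { std::variant<IntrinsicOperator, DefinedOpName> u; };
struct OmpReductionIdentifier {
  std::variant<DefinedOperator, ProcedureDesignator> u;
};

// Helper that merges several lambdas into one visitor (illustrative name).
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Spelling of an intrinsic operator; only two operators are modelled here.
std::string intrinsicName(IntrinsicOperator op) {
  switch (op) {
  case IntrinsicOperator::Add: return "+";
  case IntrinsicOperator::Multiply: return "*";
  }
  assert(false && "unhandled intrinsic operator");
  return "";
}

// Visit both alternatives instead of calling std::get on one of them,
// which throws when the other alternative is active.
std::string reductionName(const OmpReductionIdentifier &id) {
  return std::visit(
      Overloaded{
          [](const ProcedureDesignator &pd) { return pd.name; },
          [](const DefinedOperator &op) {
            return std::visit(
                Overloaded{
                    [](IntrinsicOperator i) { return intrinsicName(i); },
                    [](const DefinedOpName &n) { return n.name; }},
                op.u);
          }},
      id.u);
}
```

With these stand-ins, an identifier holding `IntrinsicOperator::Add` yields the name `+` rather than failing on a `std::get<ProcedureDesignator>`.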
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski at hpe.com>
[Clang] Add `__builtin_reduce_[in_order|assoc]_fadd` for floating-point reductions (#176160)
This adds `__builtin_reduce_[in_order|assoc]_fadd` to expose the
`llvm.vector.reduce.fadd.*` intrinsic directly in Clang, for the full
range of supported FP types.
Given a floating-point vector `vec` and a scalar floating-point value
`acc`:
- `__builtin_reduce_assoc_fadd(vec)` corresponds to a fast/associative
reduction
* i.e., the fadds can occur in any order
- `__builtin_reduce_in_order_fadd(vec, acc)` corresponds to an ordered
reduction
* i.e., the result is as if an accumulator was initialized with `acc`
and each lane was added to it in order, starting from lane 0
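The difference between the two orderings can be sketched with scalar loops. This is a plain C++ model of the semantics described above and does not call the builtins; the function names are made up for the sketch, and the pairwise tree is just one of the many orders an associative reduction is allowed to pick.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered model: start from acc and add lane 0, then lane 1, and so on.
float reduceInOrderFAdd(const std::vector<float> &vec, float acc) {
  for (float lane : vec)
    acc += lane;
  return acc;
}

// Associative model: any order is allowed; this sketch uses a pairwise tree.
float reduceAssocFAdd(std::vector<float> vec) {
  assert(!vec.empty() && (vec.size() & (vec.size() - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  for (std::size_t width = vec.size() / 2; width > 0; width /= 2)
    for (std::size_t i = 0; i < width; ++i)
      vec[i] += vec[i + width]; // combine lane i with its partner lane
  return vec[0];
}
```

For small exact inputs such as `{1, 2, 3, 4}` both models agree. Assuming strict IEEE single-precision evaluation, an input like `{1e8f, 1.0f, -1e8f, 1.0f}` gives 1 in order (with `acc` = 0) but 2 with the pairwise tree, which is why the two builtins are kept distinct.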
[mlir][gpu] Support arith.truncf in subgroup MMA elementwise ops (#182499)
This commit adds support for arith.truncf in the supported list of
elementwise ops for subgroup MMA ops, and enables lowering to SPIR-V.
[DAG] visitOR - attempt to fold (or buildvector(), buildvector()) -> buildvector() (#183032)
See if we can fold all elements of an OR of buildvectors: OR(-1,X) ->
-1, OR(0,X) -> X, etc.
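A minimal model of the per-element fold, assuming 32-bit lanes. The `Lane` struct and `foldOrOfBuildVectors` are hypothetical and do not use SelectionDAG types; the constant-OR-constant case is an assumption standing in for the "etc." above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical lane model: a known constant, or an opaque value with an id.
struct Lane {
  bool isConstant;
  std::uint32_t bits; // constant value, or an opaque id when !isConstant
};

inline bool operator==(const Lane &a, const Lane &b) {
  return a.isConstant == b.isConstant && a.bits == b.bits;
}

constexpr std::uint32_t kAllOnes = 0xFFFFFFFFu;

// Returns the folded vector only if every lane folds; otherwise nullopt.
std::optional<std::vector<Lane>>
foldOrOfBuildVectors(const std::vector<Lane> &lhs,
                     const std::vector<Lane> &rhs) {
  assert(lhs.size() == rhs.size() && "operands must have the same lane count");
  std::vector<Lane> folded;
  folded.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const Lane &a = lhs[i];
    const Lane &b = rhs[i];
    if (a.isConstant && b.isConstant)
      folded.push_back({true, a.bits | b.bits}); // assumed constant fold
    else if ((a.isConstant && a.bits == kAllOnes) ||
             (b.isConstant && b.bits == kAllOnes))
      folded.push_back({true, kAllOnes}); // OR(-1, X) -> -1
    else if (a.isConstant && a.bits == 0)
      folded.push_back(b); // OR(0, X) -> X
    else if (b.isConstant && b.bits == 0)
      folded.push_back(a); // OR(X, 0) -> X
    else
      return std::nullopt; // two opaque lanes: give up on the whole fold
  }
  return folded;
}
```

The all-or-nothing return mirrors the "fold all elements" wording: a single lane that cannot be simplified leaves the original OR in place.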
[clang] Define __PTRAUTH_INTRINSICS__ for arm64e-apple-* targets (#172944)
The macro is set by Xcode clang for the arm64e-apple-* targets, and
ifdefed in the macOS and iPhoneOS SDKs.
[AMDGPU] Fix compute num sign bits unsigned underflow (#182723)
Fixes #182677
The `BFE_I32` case in `ComputeNumSignBitsForTargetNode` was not masking
the width operand with `& 0x1f`, unlike other BFE operations in the same
file. Since the hardware instruction only uses the low 5 bits of the
width field, values >= 32 passed via `@llvm.amdgcn.sbfe.i32` caused
unsigned integer underflow in the calculation:
unsigned SignBits = 32 - Width->getZExtValue() + 1;
When width > 33, this underflows, producing incorrect SignBits values.
When width == 33, SignBits becomes 0, violating the expected return
range of [1, BitWidth]. This led to assertion failures and
miscompilation where subsequent BFE narrowing operations were
incorrectly eliminated.
This patch:
[2 lines not shown]
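The arithmetic behind the underflow can be reproduced in isolation. The sketch below only illustrates the quoted expression and the `& 0x1f` masking mentioned above; the function names are hypothetical, and it makes no claim about what else the patch changes.

```cpp
#include <cstdint>

// Mirrors the quoted expression: the width is a 64-bit unsigned value, so
// `32 - width` wraps around for width > 32 before being narrowed to unsigned.
constexpr unsigned signBitsUnmasked(std::uint64_t width) {
  return static_cast<unsigned>(32 - width + 1);
}

// Same expression with the width limited to its low 5 bits, matching the
// statement that the hardware only reads those bits.
constexpr unsigned signBitsMasked(std::uint64_t width) {
  return static_cast<unsigned>(32 - (width & 0x1f) + 1);
}

// width == 33: the unmasked form lands exactly on 0.
static_assert(signBitsUnmasked(33) == 0, "0 is outside [1, BitWidth]");
// width == 34: the unmasked form wraps to a huge value.
static_assert(signBitsUnmasked(34) == 0xFFFFFFFFu, "wraps around");
// With masking, 33 and 34 behave like widths 1 and 2.
static_assert(signBitsMasked(33) == 32, "33 & 0x1f == 1");
static_assert(signBitsMasked(34) == 31, "34 & 0x1f == 2");
```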
[llvm][release] Link to .jsonl signatures for Windows x86_64 and ARM64 (#183053)
Previously we linked to .sig files, which were created by the person who
built the release.
Now these are built in GitHub so they have .jsonl signature files
instead.
Add a temporary patch to remove tmppath from pledge in favour of
unveil(_PATH_TMP)+pledge("rpath wpath cpath").
This patch is to bridge the time until a new release of dkimsign can be
made.
OK op@ kirill@
[libc] Fix LIBC_INLINE build error in riscv/irelative.cpp (#183249)
LIBC_INLINE is defined in attributes.h, which was not included. Since
constexpr already implies inline, simply remove the LIBC_INLINE
qualifier from the static helper, matching the x86_64 and aarch64
irelative implementations.
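As a small illustration of why the qualifier is redundant: a `constexpr` function is implicitly inline in C++, so a static helper needs no extra inline macro. The helper below is hypothetical and is not taken from irelative.cpp.

```cpp
// Hypothetical static helper: `constexpr` already makes it implicitly
// inline, so no LIBC_INLINE-style qualifier (or its header) is required.
static constexpr unsigned long alignDown(unsigned long value,
                                         unsigned long align) {
  // Assumes `align` is a power of two.
  return value & ~(align - 1);
}

static_assert(alignDown(13, 8) == 8, "usable in constant expressions");
```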
[libclc] Compile with -fdenormal-fp-math=dynamic (#183262)
This PR is extracted from #157633.
`-fdenormal-fp-math=dynamic` is required to defer denormal handling and
should be used for libclc library compilation.
Additionally, if the default ieee value is incompatible with the user
code's denormal-fp-math setting, this mismatch prevents libclc functions
from being inlined.
[OpenCL] Set intel extensions minimum version to OpenCL 1.0 (#176854)
The motivation is similar to b12e070b9238. The following Intel
extensions are changed:
cl_intel_required_subgroup_size
cl_intel_subgroups
cl_intel_subgroups_char
cl_intel_subgroups_long
cl_intel_subgroups_short
cl_intel_subgroup_buffer_prefetch
cl_intel_subgroup_local_block_io
cl_intel_device_side_avc_motion_estimation
Relates to https://github.com/KhronosGroup/OpenCL-CTS/pull/2376.
Add support for scan command version 17 to iwx(4).
This will be needed to support BZ wifi-6e devices in the future.
Tested:
AX200: jmc, stsp
AX210: kettenis (MA device)
AX211: sthen (SO device), phessler
AX211: stsp (BZ device)
[AArch64][GISel] Fix computeKnownBits through a COPY with different fixed-width vector types (#179123)
Fix an assertion in known bits through a COPY by making computeKnownBits
length-aware for different fixed-width vectors. If the lengths of the
vectors are different, all lanes are demanded.
Fixes #178242
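One way to read the length-aware rule is sketched below, under loud assumptions: lanes are modelled as bits of a 64-bit mask rather than an APInt, the helper names are invented, and the mask is taken to describe the source of the COPY. It is an interpretation of the description above, not the GlobalISel code.

```cpp
#include <cstdint>

// Hypothetical helper: a mask with the low `lanes` bits set (up to 64 lanes).
constexpr std::uint64_t allLanes(unsigned lanes) {
  return lanes >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << lanes) - 1;
}

// If source and destination have the same lane count, the demanded-lane mask
// carries over unchanged; otherwise fall back to demanding every source lane.
constexpr std::uint64_t demandedSourceLanes(std::uint64_t demandedDst,
                                            unsigned dstLanes,
                                            unsigned srcLanes) {
  return dstLanes == srcLanes ? demandedDst : allLanes(srcLanes);
}

static_assert(demandedSourceLanes(0x5, 4, 4) == 0x5, "same length: reuse");
static_assert(demandedSourceLanes(0x5, 4, 8) == 0xFF, "differs: all lanes");
```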