[Clang][AArch64] Fix codegen for SVE vector compare operations (#194013)
Overloaded operators `<`, `>`, `<=`, `>=`, `==`, and `!=` with SVE
integer vector operands emitted LLVM IR with two issues:
* The `icmp` instruction always performed unsigned comparison, even for
signed operands.
* The result of the comparison was zero-extended, whereas the intent is
to follow established NEON conventions and sign-extend it.
This patch fixes both issues.
[clang][CIR] Add lowering for vrshr_ and vrshrq_ rounding intrinsics (#194229)
This PR adds lowering for the vector rounding shift right intrinsics,
i.e. `vrshr_*` and `vrshrq_*` [1]. It also moves the corresponding tests
from:
* clang/test/CodeGen/AArch64/neon_intrinsics.c
to:
* clang/test/CodeGen/AArch64/neon/intrinsics.c
The lowering follows the existing implementation in
CodeGen/TargetBuiltins/ARM.cpp.
Part of #185382.
Reference:
[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#vector-rounding-shift-right
Co-authored-by: Md Mouzam Arfi Hussain <arfihussain27 at gmail.com>
[InstCombine] Combine llvm.sin/llvm.cos libcall pairs into llvm.sincos (#184760)
Teach InstCombine to recognize pairs of `llvm.sin(x)` and `llvm.cos(x)`
intrinsic calls that share the same argument and replace them with a
single `llvm.sincos(x)` call, extracting the individual results.
The optimization works in two phases:
1. **SimplifyLibCalls**: Convert `sin`/`cos` C library calls (e.g.
`sinf`, `cosf`, `sin`, `cos`, `sinl`, `cosl`) into `llvm.sin` /
`llvm.cos` intrinsics when the call does not access memory (i.e. does
not set `errno`). This normalization step brings library calls into
the same form as compiler-generated intrinsics.
2. **InstCombineCalls**: When visiting an `llvm.sin` or `llvm.cos`
intrinsic, scan the users of the shared argument for a matching
counterpart. If found, emit a single `llvm.sincos` call placed right
after the argument definition, replace both original calls, and erase
the matched instruction.
Also remove the completed sincos TODO from Target/README.txt.
[OpenMP] Rename ompx_taskgraph->omp_taskgraph_experimental
This patch renames the option that enables taskgraph support in the
runtime from OMPX_TASKGRAPH to OMP_TASKGRAPH_EXPERIMENTAL, reflecting
both the feature's official status in OpenMP 6.0 and its current
work-in-progress nature.
commit-id:fa62775a
Reviewers: ro-i
Reviewed By: ro-i
Pull Request: https://github.com/llvm/llvm-project/pull/194045
[X86] Add constant pool comments for (V)GF2P8AFFINEQB instructions (#194572)
Still need to do predicate/broadcast handling, but that's true for most
instructions, and we need a decent general mechanism to handle them.
Revert "[Flang][OpenMP] Clear close on descriptor members for box parents in USM" (#194568)
Reverts llvm/llvm-project#194287
Buildbot errors in https://lab.llvm.org/buildbot/#/builders/67/builds/3464
A local revert fixed the issues.
[AArch64][ISel] Remove zero instruction for `rev` all true predicates (#192925)
This patch removes the redundant instruction that zeroes inactive lanes for
SVE2p1 `rev` intrinsics when all lanes are active.
[mlir][arith] Remove redundant lambdas (NFC) (#194376)
Replace trivial lambda wrappers with direct function references. The
lambdas simply forwarded their arguments to existing functions, so
passing the function directly is clearer and more concise.
[Clang][OpenMP] Validate omp_initial_device omp_invalid_device as device IDs (#193688)
The counterpart fix for clang (matching the flang fix here:
[flang-fix](https://github.com/llvm/llvm-project/pull/193669)).
Previously, these device values in the `target` directive were
incorrectly interpreted and rejected with:
```
error: argument to 'device' clause must be a non-negative integer value
#pragma omp target device(-1)
^~
error: argument to 'device' clause must be a non-negative integer value
#pragma omp target device(omp_invalid_device)
^~~~~~~~~~~~~~~~~~
```
[LoongArch] Support VBIT{CLR,SET,REV}I patterns for non-native element sizes
Extend vsplat_uimm_{pow2,inv_pow2} matching to allow specifying an explicit
element bit width, enabling recognition of splat patterns whose logical
element size differs from the vector's native element type.
Introduce templated selectVSplatUimm{Pow2,InvPow2} helpers with an optional
EltSize parameter, and add corresponding ComplexPattern definitions for
i8/i16/i32 element widths. This allows TableGen patterns to match cases such
as operating on v8i32/v4i64 vectors with masks derived from smaller element
sizes (e.g. i16).
With these changes, AND/OR/XOR operations using inverse power-of-two or
power-of-two splat masks are now correctly selected to VBITCLRI, VBITSETI,
and VBITREVI instructions instead of falling back to vector logical
operations with materialized constants.
[mlir][SPIR-V] Allow SpecConstantComposite constituents to reference other SpecConstantComposites (#193416)
The verifier for spirv.SpecConstantComposite previously assumed all
constituents were spirv.SpecConstant ops, which caused a crash when
referencing nested spirv.SpecConstantComposite ops.
Per the SPIR-V spec (s3.3.7, OpSpecConstantComposite), constituents
"must be the \<id\>s of other specialization constants, constant
declarations, or an OpUndef", which includes OpSpecConstantComposite.
[lldb] Remove full stop from AppendErrorWithFormat format strings (part 2) (#194352)
To fit the style guide:
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages
I found these with:
* Find `(\.AppendErrorWithFormat\(([\s\r\n]+)?"(?:(?:\\.|[^"\\])*))\."`
and replace with `$1"` using Visual Studio Code.
* Putting a call to `validate_diagnostic` in `AppendErrorWithFormat`.
* Manual inspection.
Note that this change *does not* include a call to `validate_diagnostic`
because I do not know what's going to crash on platforms that I haven't
tested on.
[AArch64][GlobalISel] Lower BF16 FPTRUNC (#193941)
When the +bf16 architecture feature is available, this is simple, as we
lower to a standard instruction. When it is not, we need to expand to a
series of instructions that perform the necessary rounding. The code
to do that is a port of TargetLowering::expandFP_ROUND to GISel, minus
the float64 odd rounding via expandRoundInexactToOdd. f64 will follow in
a followup patch.
uitofp and sitofp are currently disabled, so that we can take this one
step at a time and check each part in turn. The LLT floating-point type
queries attempt to return true for IEEE types of the correct size
(without UseExtended), always returning false for non-standard types.
[mlir][x86] Fix - Replace `load` with `transfer_read` to support tensor types. (#194543)
This patch replaces the `vector.load` operation with the
`vector.transfer_read` op, so that the rewrite can lower
`vector.contract` ops to `bf16_avx512_dp` on tensor types.
[flang] improve array section analysis for WHERE (#194399)
The array section analysis in the HLFIR pass in charge of WHERE lowering
was unable to tell that the LHS and RHS are the same array section when
the base is an assumed-shape array or when a variable is used as an
index.
This patch adds an optional callback to the array section analysis to
tell whether two SSA values have the same value. The callback reports
that two SSA values are the same only if they are the results of
equivalent operations with no memory effects (being non-speculatable is
OK) and with operands that have the same value (recursively), or if
they are loads from the same variable (which is OK in the context of
WHERE LHS/RHS thanks to Fortran 2023 10.1.4, which guarantees that a
variable referenced on both the LHS and RHS cannot be modified by side
effects in the LHS/RHS).
Assisted by: Claude