[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)
Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16
Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication),
if available, and makes it difficult to eliminate the packing of the
results by using SDWA form; previous attempts to deal with the latter
situation in the si-peephole-sdwa pass were unwieldly since it was
necessary to reconstruct the association between the source and target
vectors.
Change the legalization action for ISD::FSIN and ISD::FCOS of type
MTF::v2f16 to Custom and change the custom intrinsic lowering to deal
with the v2f16 for the intrinsics introduced in this way.
[AMDGPU] SIIselLowering: Use intrinsics in LowerTrig
This allows to apply further legalization actions to the
resulting nodes which is a preparatory step to extend the
custom lowering to vector types.
[MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg (#170936)
This PR adds support for cross-sg reduction whilst distributing from
workgroup to subgroup. It has following limitation
1. Cannot reduce to a scalar
2. For cross-sg, only 1:1 decomposition (each sg should be assigned only
one tile in the original WG tile) is supported for now. For example for
a WG tile of size 256x128, sg_layout = [8, 4], sg_data = [16, 16] wont
be supported.
[github] Fix release parameter to uncomment download links step (#176386)
I thought I could remove validate-tag from the "needs" because
release-binaries also "needs" validate-tag. Turns out that we get the
release version from an output of validate-tag and if it isn't in the
"needs" section we get an empty string when substitution happens.
Leading to this error:
./llvm/utils/release/./github-upload-release.py --token "$GITHUB_TOKEN"
--release uncomment_download_links
github-upload-release.py: error: the following arguments are required:
command
Put back validate-tag.
Fixes 822a45f4b4909289f84d119f1e5891b486d74f5e.
[mlir][tosa] Add support for CONV2D_BLOCK_SCALED operator (#172294)
This commit adds support for an MXFP CONV2D operation,
CONV2D_BLOCK_SCALED, added to the specification in
https://github.com/arm/tosa-specification/commit/408a5e53f5a7357adef7121ba3cc88e2225d4231.
This includes:
- Operator definition
- Addition of the EXT_MXFP_CONV extension
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification.
Add `-Xoffload-compiler` option (#170467)
... to forward input to clang-linker-wrapper's device compiler
invocation.
(a separate patch as requested in #168043)
AMDGPU/GlobalISel: Regbanklegalize rules for G_UNMERGE_VALUES
Move G_UNMERGE_VALUES handling to AMDGPURegBankLegalizeRules.cpp.
Fix sgpr S16 unmerge by lowering using shift and using S32.
Previously sgpr S16 unmerge was selected using _lo16 and _hi16 subreg
indexes which are exclusive to vgpr register classes.
For remaing cases we do trivial mapping, assigns same reg bank
to all operands, vgpr or sgpr.
Add an overload of `expandMemSetAsLoop` that takes an optional TTI pointer
This avoids breaking the API for out-of-tree tools like the
SPIRV-LLVM-Translator.
[llvm][utils][release] Remove mention of sub-project source archives (#176348)
These are no longer provided as of llvm 22:
https://discourse.llvm.org/t/llvm-22-1-0-rc1-released/89479
> Please note: since the last release the subproject tarballs have been
> removed and are no longer provided. See RFC: Do "something" with the
> subproject tarballs in the release page for more details.
There are now only llvm-project and llvm-test-suite archives.
[AMDGPU] si-peephole-sdwa: Handle V_PACK_B32_F16_e64 (WIP)
Change si-peephole-sdwa to eliminate V_PACK_B32_F16_e64 instructions
by changing the second operand to write to the upper word of the
destination directly.
[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16
Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication),
if available, and makes it difficult to eliminate the packing of the
results by using SDWA form; previous attempts to deal with the latter
situation in the si-peephole-sdwa pass were unwieldly since it was
necessary to reconstruct the association between the source and target
vectors.
Change the legalization action for ISD::FSIN and ISD::FCOS of type
MTF::v2f16 to Custom and change the custom intrinsic lowering to deal
with the v2f16 for the intrinsics introduced in this way.
[AMDGPU] Fix typo in `LowerVGPREncoding` to allow it to hoist past `waitcnt` instructions (#176355)
Fixes a typo which prevented `set_vgpr_msb` to be hoisted past `waitcnt`
instructions.