LLVM/project 4401468clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

Revert "[AMDGPU][SIInsertWaitcnt] Implement Waitcnt Expansion for Profiling (…"

This reverts commit 3dfb782333bf929945f63e5b0b1cad378b0bd87a.
DeltaFile
+0-944llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+92-203llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-20llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+0-19llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+0-7clang/include/clang/Options/Options.td
+0-4clang/include/clang/Basic/CodeGenOptions.def
+92-1,1971 files not shown
+92-1,1997 files

LLVM/project 1e96ee6llvm/test/Analysis/Delinearization global_array_bounds.ll

[Delinearization] Precommit global decl test. NFC. (#175173)

This precommits a test that should demonstrate that Delinearization can
succeed when we analyse the size of the global variable definition.
DeltaFile
+48-0llvm/test/Analysis/Delinearization/global_array_bounds.ll
+48-01 files

LLVM/project 075e467llvm/lib/Target/SPIRV SPIRVInstrInfo.td SPIRVModuleAnalysis.cpp, llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_arbitrary_precision_floating_point arbitrary_precision_floating_point_test.ll arbitrary_precision_floating_point_test_extended.ll

[SPIRV] Added Support for the SPV_ALTERA_arbitrary_precision_floating_point Extension (#160054)

Added support for the SPV_ALTERA_arbitrary_precision_floating_point
extension, enabling all the arbitrary precision floating-point
operations with instruction definitions and test files.
DeltaFile
+126-0llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_arbitrary_precision_floating_point/arbitrary_precision_floating_point_test.ll
+103-0llvm/lib/Target/SPIRV/SPIRVInstrInfo.td
+96-0llvm/test/CodeGen/SPIRV/extensions/SPV_INTEL_arbitrary_precision_floating_point/arbitrary_precision_floating_point_test_extended.ll
+53-0llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+45-0llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+44-0llvm/lib/Target/SPIRV/SPIRVBuiltins.td
+467-03 files not shown
+474-19 files

LLVM/project 3de24d2libcxx/include CMakeLists.txt, libcxx/include/__algorithm find_if.h

Revert "[libc++] Optimize std::find_if (#167697)"

This reverts commit 6189512f73a343b364f1907b742659bae3bd5b56.
DeltaFile
+0-74libcxx/include/__memory/valid_range.h
+37-0libcxx/include/__utility/is_valid_range.h
+16-4libcxx/test/benchmarks/algorithms/nonmodifying/find.bench.cpp
+0-3libcxx/include/__algorithm/find_if.h
+1-1libcxx/include/__utility/is_pointer_in_range.h
+1-1libcxx/include/CMakeLists.txt
+55-833 files not shown
+58-869 files

LLVM/project aaa99a3clang-tools-extra/clang-tidy/readability RedundantTypenameCheck.cpp, clang-tools-extra/test/clang-tidy/checkers/readability redundant-typename.cpp

[clang-tidy] Fix false negatives around static data members in `readability-redundant-typename` (#175477)

Fixes #175475.
DeltaFile
+30-0clang-tools-extra/test/clang-tidy/checkers/readability/redundant-typename.cpp
+2-1clang-tools-extra/clang-tidy/readability/RedundantTypenameCheck.cpp
+32-12 files

LLVM/project 17aa32cllvm/lib/Target/SPIRV SPIRVPrepareFunctions.cpp, llvm/test/CodeGen/SPIRV/llvm-intrinsics constrained-fmuladd.ll

[SPIRV] Added support for the constrained arithmetic (Fmuladd) intrinsic (#170270)

Added SPIR-V support for constrained arithmetic intrinsic fmuladd,
lowered as a sequence of OpFMul and OpFAdd with roundingmode, consistent
with the SPIR-V translator.
DeltaFile
+64-0llvm/test/CodeGen/SPIRV/llvm-intrinsics/constrained-fmuladd.ll
+22-0llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+86-02 files

LLVM/project b3b282ellvm/include/llvm/ExecutionEngine/JITLink loongarch.h, llvm/lib/ExecutionEngine/JITLink ELF_loongarch.cpp loongarch.cpp

[JITLink][LoongArch] Add reloc types for LA32R/LA32S
DeltaFile
+88-18llvm/test/ExecutionEngine/JITLink/LoongArch/ELF_loongarch32_relocations.s
+84-1llvm/lib/ExecutionEngine/JITLink/ELF_loongarch.cpp
+72-1llvm/include/llvm/ExecutionEngine/JITLink/loongarch.h
+4-0llvm/lib/ExecutionEngine/JITLink/loongarch.cpp
+248-204 files

LLVM/project c705003llvm/lib/CodeGen TargetLoweringObjectFileImpl.cpp, llvm/lib/Target/AMDGPU R600ISelLowering.cpp

[NFC] Exactly one kind typo fixes, change defintions to definitions. (#174333)

DeltaFile
+1-1llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+1-1llvm/lib/Target/AMDGPU/R600ISelLowering.cpp
+1-1llvm/lib/Target/M68k/MCTargetDesc/M68kMCCodeEmitter.cpp
+1-1llvm/tools/llvm-objdump/MachODump.cpp
+1-1llvm/utils/TableGen/Basic/TableGen.cpp
+1-1mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp
+6-64 files not shown
+10-1010 files

LLVM/project c9cc782llvm/lib/Target/LoongArch LoongArchExpandPseudoInsts.cpp LoongArchMergeBaseOffset.cpp, llvm/lib/Target/LoongArch/AsmParser LoongArchAsmParser.cpp

[llvm][LoongArch] Add PC-relative address materialization using pcadd instructions (#175358)

This patch adds support for PC-relative address materialization using
pcadd-class relocations, covering the HI20/LO12 pair and their GOT and
TLS variants (IE, LD, GD, and DESC).

Link:
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703312.html
DeltaFile
+149-90llvm/test/CodeGen/LoongArch/code-models.ll
+132-89llvm/test/CodeGen/LoongArch/merge-base-offset.ll
+153-54llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+114-30llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
+42-28llvm/test/CodeGen/LoongArch/double-imm.ll
+39-16llvm/lib/Target/LoongArch/LoongArchMergeBaseOffset.cpp
+629-30726 files not shown
+949-44732 files

LLVM/project 71cc387llvm/include/llvm/Analysis TargetTransformInfoImpl.h TargetTransformInfo.h, llvm/lib/Analysis TargetTransformInfo.cpp

[InferAddressSpaces] Handle unconverted ptrmask (#140802)

In case a ptrmask cannot be converted to the new address space due to an
unknown mask value, this needs to be detected and an addrspacecast is
needed to not hinder a future use of the unconverted return value of
ptrmask. Otherwise, users of this value will become invalid by receiving
a nullptr as an operand.

This LLVM defect was identified via the AMD Fuzzing project.

(See https://reviews.llvm.org/D80129 for an explanation of why some
ptrmasks are impossible to convert to other addrspaces.)
DeltaFile
+240-13llvm/test/Transforms/InferAddressSpaces/AMDGPU/ptrmask.ll
+72-15llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp
+42-0llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+0-35llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+12-0llvm/include/llvm/Analysis/TargetTransformInfo.h
+11-0llvm/lib/Analysis/TargetTransformInfo.cpp
+377-636 files

LLVM/project ef44d25clang/include/clang/Basic DiagnosticASTKinds.td, clang/lib/AST ExprConstant.cpp

[clang][ExprConst] Diagnose out-of-lifetime access consistently (#175562)

Previously, we had two very similar diagnostics, "read of object outside
its lifetime" and "read of variable whose lifetime has ended".
The difference, as far as I can tell, is that the latter was used when
the variable was created in a function frame that has since vanished,
i.e. in this case:
```c++
constexpr const int& return_local() { return 5; }
static_assert(return_local() == 5);
```
so the output used to be:
```console
array.cpp:602:15: error: static assertion expression is not an integral constant expression
  602 | static_assert(return_local() == 5);
      |               ^~~~~~~~~~~~~~~~~~~
array.cpp:602:15: note: read of temporary whose lifetime has ended
array.cpp:601:46: note: temporary created here
  601 | constexpr const int& return_local() { return 5; }
```

    [42 lines not shown]
DeltaFile
+5-6clang/test/AST/ByteCode/lifetimes.cpp
+2-2clang/lib/AST/ExprConstant.cpp
+2-2clang/test/SemaCXX/builtin-is-within-lifetime.cpp
+2-2clang/test/SemaCXX/constant-expression-cxx2a.cpp
+0-3clang/include/clang/Basic/DiagnosticASTKinds.td
+2-1clang/lib/AST/ByteCode/Interp.cpp
+13-164 files not shown
+17-2010 files

LLVM/project d5c11b9llvm/lib/Transforms/Vectorize VPlanRecipes.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-printing-reductions.ll

[VPlan] Replace PhiR operand of ComputeRdxResult with VPIRFlags. (#174026)

Remove the artificial PhiR operand of ComputeReductionResult, which was
only used to look up recurrence kind, in-loop and ordered properties.

Instead, encode them as VPIRFlags as suggested by @ayalz in
https://github.com/llvm/llvm-project/pull/170223.

This addresses a TODO to make codegen for ComputeReductionResult
independent of looking up information from other recipes.

This is NFC w.r.t. codegen; the printing has been improved to include
the reduction type and whether it is in-loop/ordered.

PR: https://github.com/llvm/llvm-project/pull/174026
DeltaFile
+42-22llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+56-4llvm/lib/Transforms/Vectorize/VPlan.h
+28-18llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+13-13llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+8-6llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+4-4llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
+151-673 files not shown
+157-739 files

LLVM/project d7483e2llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU bf16.ll llc-pipeline-npm.ll

Merge branch 'users/chapuni/mcdc/nest/covgen' into users/chapuni/mcdc/nest/trunk
DeltaFile
+600-600llvm/test/Transforms/Attributor/nofpclass-implied-by-fcmp.ll
+939-0llvm/test/MC/RISCV/xaifet-amo-valid.s
+465-462llvm/test/CodeGen/AMDGPU/bf16.ll
+418-418llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+806-0llvm/test/CodeGen/X86/avx10_2bf16-fma.ll
+290-503llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3,518-1,983587 files not shown
+16,568-11,996593 files

LLVM/project 48ce7bbllvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 multiple-result-intrinsics.ll

[LV] Fix bug in setVectorizedCallDecision (#175742)

There is a bug in this logic:

```
   InstructionCost Cost = ScalarCost;
   InstWidening Decision = CM_Scalarize;

   if (VectorCost <= Cost) {
     Cost = VectorCost;
     Decision = CM_VectorCall;
   }

   if (IntrinsicCost <= Cost) {
     Cost = IntrinsicCost;
     Decision = CM_IntrinsicCall;
   }
```


    [17 lines not shown]
DeltaFile
+29-29llvm/test/Transforms/LoopVectorize/AArch64/multiple-result-intrinsics.ll
+2-2llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+31-312 files

LLVM/project f51eca2llvm/include/llvm/ProfileData InstrProf.h, llvm/lib/ProfileData InstrProf.cpp

[IndirectCallPromotion] Proper canonicalize coroutine function (#175606)

Fix an issue where a coroutine function and its await-suspend wrappers
are all canonicalized to the same name. This creates duplicate entries in
`MD5FuncMap` (a sorted vector), so a lookup may return a GUID that does
not match the one in the prof metadata, and ICP is missed. For example,
coro function `foo` and its wrappers `foo.__await_suspend_wrapper__init`,
`foo.__await_suspend_wrapper__final` are all canonicalized to `foo`.
During GUID lookup, any of them can be returned because the sort is
unstable. This is more of a reliability issue (nondeterminism) than a
performance issue, because hot indirect calls should already have been
promoted in the sample loader pass.
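
To illustrate the collision described above, here is a minimal C++ sketch. It is not the LLVM implementation: `canonicalize`, `buildTable`, and `lookupAll` are hypothetical helpers, the canonical name stands in for the MD5 hash, and stripping everything after the first `.` is an assumed stand-in for the real canonicalization rule.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical canonicalization: drop everything from the first '.' onward.
std::string canonicalize(const std::string &name) {
  return name.substr(0, name.find('.'));
}

// (canonical key, original name); the key stands in for the hashed name.
using Entry = std::pair<std::string, std::string>;

// Build a key-sorted table, loosely modelled on a sorted-vector symbol map.
std::vector<Entry> buildTable(const std::vector<std::string> &names) {
  std::vector<Entry> table;
  for (const std::string &n : names)
    table.emplace_back(canonicalize(n), n);
  // Only the key is compared. std::sort is not stable, so entries that
  // share a key may end up in any relative order.
  std::sort(table.begin(), table.end(),
            [](const Entry &a, const Entry &b) { return a.first < b.first; });
  return table;
}

// Return every original name stored under `key`.
std::vector<std::string> lookupAll(const std::vector<Entry> &table,
                                   const std::string &key) {
  auto byKey = [](const Entry &a, const Entry &b) { return a.first < b.first; };
  auto range =
      std::equal_range(table.begin(), table.end(), Entry{key, ""}, byKey);
  std::vector<std::string> names;
  for (auto it = range.first; it != range.second; ++it)
    names.push_back(it->second);
  return names;
}
```

With `foo` and its two wrappers in the table, `lookupAll(table, "foo")` returns three candidates, so a lookup that takes only the first match depends on how the sort happened to order them.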

This also fixes the same naming issue in `CGProfile` where symtab is
created. By the time the pass is run, wrapper functions should already
be inlined but naming collision can happen to the coro function and its
post-split clones (`foo.resume`, `foo.cleanup`).

This change

    [9 lines not shown]
DeltaFile
+21-19llvm/lib/ProfileData/InstrProf.cpp
+24-0llvm/test/Transforms/PGOProfile/indirect_call_promotion_unique.ll
+4-4llvm/include/llvm/ProfileData/InstrProf.h
+49-233 files

LLVM/project 6ddab42clang-tools-extra/clang-tidy/readability RedundantTypenameCheck.cpp, clang-tools-extra/test/clang-tidy/checkers/readability redundant-typename.cpp

[clang-tidy] Fix false positive from `readability-redundant-typename` on partially specialized variables (#175473)

Fixes #174827.
DeltaFile
+15-12clang-tools-extra/clang-tidy/readability/RedundantTypenameCheck.cpp
+10-0clang-tools-extra/test/clang-tidy/checkers/readability/redundant-typename.cpp
+25-122 files

LLVM/project 0ae23callvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h

[VPlan] Split out optimizeEVLMasks. NFC (#174925)

Addresses part of #153144 and splits off part of #166164

There are two parts to the EVL transform:

1) Convert the loop so the number of elements processed each iteration
is EVL, not VF. The IV and header mask are replaced with EVL-based
variants.
2) Optimize users of the EVL based header mask to VP intrinsic based
recipes.

(1) changes the semantics of the vector loop region, whereas (2) needs
to preserve them. This splits (2) out so we don't mix the two up, and
allows us to move (1) earlier in the pipeline in a future PR.
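
Part (1) can be pictured with a scalar emulation. The sketch below is only an analogy, not VPlan code: `sumWithEVL` and `LoopStats` are hypothetical names, and computing EVL as `min(VF, remaining)` is an assumption made for illustration (a target may choose a different value).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LoopStats {
  long sum;               // reduction result
  std::size_t iterations; // number of "vector" iterations executed
};

// Emulates a loop that processes EVL elements per iteration instead of a
// fixed VF, so no separate scalar tail loop is needed.
LoopStats sumWithEVL(const std::vector<int> &data, std::size_t vf) {
  assert(vf > 0 && "vectorization factor must be positive");
  LoopStats stats{0, 0};
  std::size_t i = 0;
  while (i < data.size()) {
    // Assumed rule: the explicit vector length is capped by what remains.
    std::size_t evl = std::min(vf, data.size() - i);
    for (std::size_t lane = 0; lane < evl; ++lane)
      stats.sum += data[i + lane];
    i += evl; // the induction variable advances by EVL, not by VF
    ++stats.iterations;
  }
  return stats;
}
```

For ten elements and a VF of 4, the iterations process 4, 4, and 2 elements; the last iteration shortens EVL instead of relying on a header mask.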
DeltaFile
+58-49llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+9-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+4-2llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+5-0llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+76-514 files

LLVM/project 8eb23a6llvm/utils profcheck-xfail.txt

Update profcheck-xfail.txt
DeltaFile
+0-2llvm/utils/profcheck-xfail.txt
+0-21 files

LLVM/project d927768flang/lib/Lower/OpenMP OpenMP.cpp

fix adding numThreadsNumDims to ParallelOperands apply method
DeltaFile
+1-0flang/lib/Lower/OpenMP/OpenMP.cpp
+1-01 files

LLVM/project fc81a66mlir/include/mlir/Dialect/Linalg/Utils Utils.h, mlir/lib/Dialect/Linalg/Transforms Specialize.cpp

[NFC][Linalg] Add `matchConvolutionOpOfType` API and make `isaConvolutionOpOfType` API a wrapper (#174722)

-- This commit involves the following updates pertaining to
`isaConvolutionOpOfType` API :-
1. We don't want dilations/strides of convolution op to be returned as
pointer arguments to the API function - to tackle this we create a new
API `matchConvolutionOpOfType` which would return an optional struct of
dilations/stride.
2. To not break the original API's use case as a simple querying
functionality with true/false return - we keep `isaConvolutionOpOfType`
as a wrapper API which will invoke `matchConvolutionOpOfType` API and
return true/false depending on whether `matchConvolutionOpOfType` API
returned any value or not.
3. Dilations/strides of named convolution op are also populated now (it
was missed in the previous PRs while creating `isaConvolutionOpOfType`).
4. [Max/Min]UnsignedPool ops' body matcher now only matches unsigned int
ops (refer: https://github.com/llvm/llvm-project/pull/166070)
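
The shape described in points 1 and 2 can be sketched in plain C++17 as follows. This is not the MLIR API: `ToyOp`, `DilationsAndStrides`, `matchToyConvolution`, and `isaToyConvolution` are hypothetical stand-ins, and the actual signatures in `Utils.h` may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an operation; not an MLIR type.
struct ToyOp {
  bool isConv;
  std::vector<std::int64_t> dilations, strides;
};

// Result struct returned by value instead of through pointer arguments.
struct DilationsAndStrides {
  std::vector<std::int64_t> dilations, strides;
};

// Matcher: yields the parameters on success and std::nullopt otherwise.
std::optional<DilationsAndStrides> matchToyConvolution(const ToyOp &op) {
  if (!op.isConv)
    return std::nullopt;
  return DilationsAndStrides{op.dilations, op.strides};
}

// Query wrapper: keeps the simple true/false use case working.
bool isaToyConvolution(const ToyOp &op) {
  return matchToyConvolution(op).has_value();
}
```

Callers that only need a yes/no answer keep using the wrapper, while callers that need the dilations and strides read them from the returned optional.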

-- No tests are being added as all the above are NFC changes around the

    [2 lines not shown]
DeltaFile
+956-617mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+19-5mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
+4-4mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+979-6263 files

LLVM/project d186277mlir/lib/Dialect/Bufferization/IR BufferizationOps.cpp, mlir/test/Dialect/Bufferization canonicalize.mlir

[MLIR][Bufferization] Fold LoadOp only when the buffer is read only (#172595)

When we `memref.load` from a buffer, it folded to `tensor.extract` even
when the buffer was writable, causing unexpected results. For example:

```mlir
func.func @load_after_write_from_buffer_cast(%arg0: index, %arg1: index,
                            %arg2: tensor<?x?xf32>) -> f32 {
  %0 = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32>
  linalg.ceil ins(%0 : memref<?x?xf32>) outs(%0 : memref<?x?xf32>)
  %1 = memref.load %0[%arg0, %arg1] : memref<?x?xf32>
  return %1 : f32
}
```
would fold into
```mlir
module {
  func.func @load_after_write_from_buffer_cast(%arg0: index, %arg1: index, %arg2: tensor<?x?xf32>) -> f32 {
    %0 = bufferization.to_buffer %arg2 : tensor<?x?xf32> to memref<?x?xf32>
```

    [5 lines not shown]
DeltaFile
+19-1mlir/test/Dialect/Bufferization/canonicalize.mlir
+2-2mlir/test/Dialect/SparseTensor/sparse_perm_lower.mlir
+2-1mlir/test/Dialect/SparseTensor/fuse_sparse_pad_with_consumer.mlir
+1-1mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
+1-1mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir
+1-1mlir/test/Dialect/SparseTensor/sparse_pack.mlir
+26-76 files

LLVM/project 84c19e7clang/lib/Interpreter IncrementalExecutor.cpp

[clang-repl] Use more precise search to find the orc runtime. (#175805)

The new mechanism relies on the path in the toolchain which should be
the authoritative answer. This patch tweaks the discovery of the orc
runtime from unittests where the resource directory is hard to deduce.

Should address the issue raised in #175435 and #175322
DeltaFile
+76-71clang/lib/Interpreter/IncrementalExecutor.cpp
+76-711 files

LLVM/project c7e4350llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[RISCV][llvm] Support select codegen for P extension (#175741)

This is a scalar condition with fixed-vector true/false values, so we can
handle it the same way as scalars.
DeltaFile
+55-0llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+37-0llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+24-0llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+116-03 files

LLVM/project 2f2ec93llvm/lib/Target/RISCV RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rvp-ext-rv64.ll rvp-ext-rv32.ll

[RISCV][llvm] Support vselect codegen for P extension (#175744)

The only difference between vselect and select is the condition value
(a.k.a. mask); we can select using bitwise operations:
vselect(mask, true, false) = (mask & true) | (~mask & false)
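
A small sketch of this identity on a packed 32-bit value, assuming four 8-bit lanes and a mask whose lanes are each all-ones or all-zeros. `bitwiseSelect` is a hypothetical helper for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Each mask bit picks the corresponding bit of onTrue (1) or onFalse (0).
// When every mask lane is all-ones or all-zeros, whole lanes are selected.
constexpr std::uint32_t bitwiseSelect(std::uint32_t mask, std::uint32_t onTrue,
                                      std::uint32_t onFalse) {
  return (mask & onTrue) | (~mask & onFalse);
}

// Lanes 3 and 1 come from onTrue, lanes 2 and 0 from onFalse.
static_assert(bitwiseSelect(0xFF00FF00u, 0x11223344u, 0xAABBCCDDu) ==
                  0x11BB33DDu,
              "per-lane select via and/or/not");
```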
DeltaFile
+58-0llvm/test/CodeGen/RISCV/rvp-ext-rv64.ll
+39-0llvm/test/CodeGen/RISCV/rvp-ext-rv32.ll
+10-0llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+107-03 files

LLVM/project 0e6ef95llvm/lib/Target/RISCV RISCVInstrInfoVVLPatterns.td RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv vfclass-sdnode.ll fixed-vectors-vfclass.ll

[RISCV][llvm] Support IS_FPCLASS codegen for zvfbfa (#175758)

DeltaFile
+203-13llvm/test/CodeGen/RISCV/rvv/vfclass-sdnode.ll
+206-4llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfclass.ll
+6-0llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
+2-2llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+417-194 files

LLVM/project 9ced34dflang/lib/Lower/OpenMP ClauseProcessor.cpp Clauses.cpp, flang/test/Lower/OpenMP num-teams-dims.f90

[FLANG] Add flang to mlir lowering for num_teams
DeltaFile
+52-0flang/test/Lower/OpenMP/num-teams-dims.f90
+27-10flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+23-4flang/lib/Lower/OpenMP/Clauses.cpp
+15-3flang/lib/Lower/OpenMP/OpenMP.cpp
+117-174 files

LLVM/project 7cc013allvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/include/llvm/Target CGPassBuilderOption.h

[CodeGen][NPM] Add support for -print-regusage in New Pass Manager (#169761)

Support `-print-regusage` flag in NPM for printing register usage information
DeltaFile
+3-4llvm/lib/CodeGen/RegisterUsageInfo.cpp
+5-0llvm/include/llvm/Passes/CodeGenPassBuilder.h
+4-0llvm/lib/CodeGen/TargetPassConfig.cpp
+1-0llvm/test/CodeGen/AMDGPU/ipra-regmask.ll
+1-0llvm/include/llvm/Target/CGPassBuilderOption.h
+14-45 files

LLVM/project aefccacllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU vector-reduce-mul.ll integer-mad-patterns.ll

[AMDGPU][GlobalISel] Add RegBankLegalize support for S64 G_MUL

Patch 4 of 4 patches to implement full G_MUL support in regbanklegalize.
DeltaFile
+195-203llvm/test/CodeGen/AMDGPU/vector-reduce-mul.ll
+106-101llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll
+98-49llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
+21-21llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-mul.mir
+23-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+5-1llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+448-3752 files not shown
+450-3758 files

LLVM/project fcdae37llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel regbankselect-smul.mir

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_AMDGPU_S_MUL_*

Patch 3 of 4 patches to implement full G_MUL support in regbanklegalize.

The current mul.ll test is only partially updated and is expected to fail.
It will be updated in the fourth patch.
DeltaFile
+92-0llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smul.mir
+19-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+4-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+2-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.h
+117-04 files

LLVM/project de7c7f7llvm/lib/Target/AMDGPU VOP3PInstructions.td, llvm/test/MC/AMDGPU gfx1250_asm_wmma_w32.s

[AMDGPU] Fix the encoding of VOP3PX2 instructions

ISA spec says `SCALE_OPSEL[0:1]` determines which parts of S3 and S4 are used, and `SCALE_OPSEL_HI[0:1]` should be zero.
DeltaFile
+40-40llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_wmma_w32.txt
+20-20llvm/test/MC/AMDGPU/gfx1250_asm_wmma_w32.s
+2-2llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+62-623 files