AMDGPU: Select vector reg class for divergent build_vector (#168169)
The main improvement is to the mfma tests. There are some
mild regressions scattered around, and a few major ones.
The worst regressions are in some of the bitcast tests;
these are cases where the SGPR argument registers run out,
the remaining arguments are passed in VGPRs, and the copies
from VGPRs are misidentified as divergent. Most of the
shufflevector tests also regress: they end up with cleaner
MIR, but then suffer from poor regalloc decisions.
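A minimal sketch of the selection rule. It assumes the usual divergence propagation, where a build_vector is divergent if any of its elements is divergent. `RegBank`, `Element`, and `selectBuildVectorBank` are hypothetical names, not the actual SelectionDAG code. The sketch also shows the bitcast regression in miniature: a single element copied from a VGPR argument and flagged as divergent is enough to push the whole vector into VGPRs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register banks: SGPRs hold one value shared by the whole wave,
// VGPRs hold one value per lane.
enum class RegBank { SGPR, VGPR };

// Hypothetical build_vector operand: only its divergence bit matters here.
struct Element {
  bool isDivergent; // true if the value may differ between lanes
};

// A build_vector is divergent if any of its elements is divergent.
bool isDivergentBuildVector(const std::vector<Element> &elts) {
  assert(!elts.empty() && "build_vector needs at least one element");
  for (const Element &e : elts)
    if (e.isDivergent)
      return true;
  return false;
}

// Divergent results need a vector (VGPR) class; uniform ones can stay in SGPRs.
RegBank selectBuildVectorBank(const std::vector<Element> &elts) {
  return isDivergentBuildVector(elts) ? RegBank::VGPR : RegBank::SGPR;
}
```

If a copy from a VGPR argument is marked `isDivergent = true` even though its value is really uniform, `selectBuildVectorBank` returns `RegBank::VGPR` for the entire vector.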
CodeGen: Remove PointerLikeRegClass handling from codegen
All uses have been migrated to RegClassByHwMode. This is now
an implementation detail of InstrInfoEmitter for pseudoinstructions.
CodeGen: Make target overrides of PointerLikeRegClass mandatory
Most targets should now use the convenience multiclass to fix up
the operand definitions of pointer-using pseudoinstructions:
defm : RemapAllTargetPseudoPointerOperands<target_ptr_regclass>;
TableGen: Support target specialized pseudoinstructions
Allow a target to steal the definition of a generic pseudoinstruction
and remap the operands. This works by defining a new instruction, which
will simply swap out the emitted entry in the InstrInfo table.
This is intended to eliminate the C++ half of the implementation
of PointerLikeRegClass. With RegClassByHwMode, the remaining use case
for PointerLikeRegClass is the common codegen pseudoinstructions.
Every target maintains its own copy of the generic pseudo operand
definitions anyway, so we can stub out the register operands with
an appropriate class instead of waiting for runtime resolution.
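A conceptual C++ model of "swap out the emitted entry", assuming the InstrInfo table can be viewed as one descriptor per opcode. The target-specialized definition replaces the generic entry at the same opcode index rather than adding a new opcode. `InstrDesc`, `emitTable`, `GENERIC_PSEUDO`, `PTR_PLACEHOLDER`, and `GPR64` are all hypothetical; this is not the InstrInfoEmitter implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical emitted table entry: a name plus the register class of each operand.
struct InstrDesc {
  std::string name;
  std::vector<std::string> operandRegClasses;
};

// Builds the final table. A target-specialized definition overwrites the
// generic entry for the same opcode, so opcode numbering is unchanged and the
// operand classes are fixed at table-generation time, not resolved at runtime.
std::vector<InstrDesc> emitTable(std::vector<InstrDesc> generic,
                                 const std::map<std::size_t, InstrDesc> &targetOverrides) {
  for (const auto &[opcode, desc] : targetOverrides) {
    assert(opcode < generic.size() && "override must target an existing pseudo");
    generic[opcode] = desc; // swap out the emitted entry
  }
  return generic;
}
```

Keeping the opcode index stable is what lets generic codegen keep referring to the pseudo by its usual opcode while seeing the target's concrete register class.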
In the future we could probably take this a bit further. For example,
there is a similar problem for ADJCALLSTACKUP/DOWN since they depend
on target register definitions for the stack pointer register.
AMDGPU: Consider isVGPRImm when forming constant from build_vector (#168168)
This probably should have turned into a regular integer constant
earlier. This is to defend against future regressions.
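Forming an integer constant from a build_vector of constants amounts to packing the element bits into one integer. The sketch below assumes little-endian lane order (element 0 in the least significant bits) and 16-bit elements; `packBuildVectorConstant` is a hypothetical helper. It covers only the packing, not the isVGPRImm check that decides whether the resulting immediate is appropriate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs constant 16-bit build_vector elements into a single integer,
// with element 0 occupying the lowest 16 bits.
uint64_t packBuildVectorConstant(const std::vector<uint16_t> &elts) {
  assert(elts.size() <= 4 && "packed constant must fit in 64 bits");
  uint64_t packed = 0;
  for (std::size_t i = 0; i < elts.size(); ++i)
    packed |= static_cast<uint64_t>(elts[i]) << (16 * i);
  return packed;
}
```

For example, a two-element vector `{0x1234, 0xABCD}` packs to the 32-bit constant `0xABCD1234`.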
[AArch64][GlobalISel] Improve lowering of vector fp16 fpext (#165554)
This PR improves the lowering of vectors of fp16 when using fpext.
Previously, vectors of fp16 were scalarized, leading to lots of extra
instructions. Now, vectors of fp16 are lowered when extended to fp64
via the preexisting lowering logic for extends. To make use of the
existing logic, we pad the vector with elements until its element
count reaches the next power of 2.
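A minimal sketch of the padding step, assuming "next power of 2" means rounding the element count up (a 3-element vector becomes 4; a 4-element vector is left alone). `powerOf2Ceil` and `padToPowerOf2` are hypothetical helpers, `float` stands in for fp16 because standard C++ before C++23 has no half type, and the `filler` value stands in for whatever padding elements (likely undef) the real lowering inserts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of 2 that is >= n.
std::size_t powerOf2Ceil(std::size_t n) {
  assert(n > 0 && "vector must have at least one element");
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Appends filler elements until the element count is a power of 2, so the
// existing power-of-2 extend lowering can handle the vector.
template <typename T>
std::vector<T> padToPowerOf2(std::vector<T> elts, const T &filler) {
  elts.resize(powerOf2Ceil(elts.size()), filler);
  return elts;
}
```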
[SelectionDAGBuilder] Propagate fast-math flags to fpext (#167574)
As in the title. Without this, fpext nodes in SelectionDAG always
behave as if they had no fast-math flags.
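A simplified model of the propagation, assuming the builder creates a node with default (empty) flags unless it copies them from the IR instruction. `FastMathFlags` here is a hypothetical four-flag subset, and `IRFPExtInst`, `DAGNode`, and `visitFPExt` are illustrative stand-ins, not the real LLVM classes or SelectionDAGBuilder methods.

```cpp
#include <cassert>

// Hypothetical subset of fast-math flags; all default to "not set".
struct FastMathFlags {
  bool noNaNs = false;
  bool noInfs = false;
  bool noSignedZeros = false;
  bool allowContract = false;
};

// Hypothetical IR fpext: source/destination widths plus its flags.
struct IRFPExtInst {
  unsigned srcBits;
  unsigned dstBits;
  FastMathFlags flags;
};

// Hypothetical DAG node: flags default to empty unless the builder sets them.
struct DAGNode {
  unsigned resultBits = 0;
  FastMathFlags flags;
};

DAGNode visitFPExt(const IRFPExtInst &inst) {
  assert(inst.dstBits > inst.srcBits && "fpext must widen the type");
  DAGNode node;
  node.resultBits = inst.dstBits;
  node.flags = inst.flags; // the propagation step; without it the node has no flags
  return node;
}
```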
AMDGPU: Use v_mov_b32 to implement divergent zext i32->i64 (#168166)
Some cases rely on SIFixSGPRCopies to force a VALU
reg_sequence with SGPR inputs to use all VGPR inputs,
but this doesn't always happen if the reg_sequence isn't
invalid. Make sure we use a VGPR up front here so we don't
rely on a later pass to fix it.
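A conceptual model of the change, assuming a zext i32->i64 is built as a reg_sequence whose low half is the source and whose high half is zero. In the divergent case, the zero is placed in a VGPR (as a v_mov_b32 would produce) so the reg_sequence never mixes banks and nothing depends on SIFixSGPRCopies. `Bank`, `Half`, `RegSequence64`, `selectDivergentZext`, and `valueOf` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Bank { SGPR, VGPR };

// One 32-bit input of a reg_sequence and the bank it lives in.
struct Half {
  uint32_t value;
  Bank bank;
};

// A 64-bit value assembled from two 32-bit halves.
struct RegSequence64 {
  Half lo, hi;
};

// Divergent zext i32 -> i64: low half is the VGPR source, high half is a zero
// materialized directly in a VGPR, so both inputs share a bank up front.
RegSequence64 selectDivergentZext(Half src) {
  assert(src.bank == Bank::VGPR && "divergent values live in VGPRs");
  Half zero{0u, Bank::VGPR};
  return {src, zero};
}

// The 64-bit value the reg_sequence represents.
uint64_t valueOf(const RegSequence64 &rs) {
  return (static_cast<uint64_t>(rs.hi.value) << 32) | rs.lo.value;
}
```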
[NFC][SpecialCaseList] Convert `preprocess` into `LazyInit` (#167281)
Currently, SpecialCaseList is created at least twice:
once by `Driver`, for diagnostics only, and then
the real one by `ASTContext`.
Also, depending on which sanitizers are enabled, not all
sections will be used.
In both cases there is unnecessary RadixTree
construction.
This patch changes `GlobMatcher` to initialize
lazily, only when needed, and removes the empty one
from `RegexMatcher`.
This saves 0.5% of clang time when building a large project.
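A minimal sketch of lazy initialization: patterns are recorded cheaply, and the expensive lookup structure is built only on the first match, so a matcher that is never queried (for example, an unused section) never pays for it. `LazyGlobMatcher` is hypothetical, not LLVM's `GlobMatcher` API; it models the RadixTree with a sorted vector of exact patterns plus a list of prefixes, and supports only exact patterns and a trailing `*`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class LazyGlobMatcher {
public:
  // Cheap: only stores the pattern.
  void insert(std::string pattern) {
    assert(!index_ && "patterns must be added before the first match");
    patterns_.push_back(std::move(pattern));
  }

  // The first call builds the index; later calls reuse it.
  bool match(const std::string &query) {
    if (!index_)
      buildIndex();
    if (std::binary_search(index_->exact.begin(), index_->exact.end(), query))
      return true;
    for (const std::string &prefix : index_->prefixes)
      if (query.compare(0, prefix.size(), prefix) == 0)
        return true;
    return false;
  }

  bool isInitialized() const { return index_.has_value(); }

private:
  struct Index {
    std::vector<std::string> exact;    // sorted for binary search
    std::vector<std::string> prefixes; // patterns ending in '*', star removed
  };

  void buildIndex() {
    Index idx;
    for (const std::string &p : patterns_) {
      if (!p.empty() && p.back() == '*')
        idx.prefixes.push_back(p.substr(0, p.size() - 1));
      else
        idx.exact.push_back(p);
    }
    std::sort(idx.exact.begin(), idx.exact.end());
    index_ = std::move(idx);
  }

  std::vector<std::string> patterns_;
  std::optional<Index> index_; // empty until the first match
};
```

`match` is non-const because it may build the index. A const interface would need a `mutable` member, and, if the matcher were shared between threads, something like `std::call_once` to guard construction.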