[CI] Fix misspelled runtimes_targets variable (#167696)
This was preventing check-compiler-rt from actually running when we
touched a project that was supposed to cause compiler-rt to be tested.
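As a rough illustration of this class of bug (not the actual CI script; the function names and the `check-<runtime>` target format below are assumptions), a misspelled accumulator name in Python silently creates a second variable instead of raising an error, so the intended set stays empty:

```python
def collect_targets_buggy(runtimes):
    """Hypothetical helper showing the failure mode of a misspelled name."""
    runtimes_targets = set()
    for name in runtimes:
        # Typo: this binds a brand-new local, runtimes_targets is never updated.
        runtime_targets = runtimes_targets | {f"check-{name}"}
    return runtimes_targets


def collect_targets_fixed(runtimes):
    """Same helper with the name spelled consistently."""
    runtimes_targets = set()
    for name in runtimes:
        runtimes_targets = runtimes_targets | {f"check-{name}"}
    return runtimes_targets
```

The buggy variant returns an empty set for any input, which is how a test target can be dropped without any error being reported.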
[LoongArch] Support memcmp expansion for vectors and combine for i128/i256 setcc
This commit enables memcmp expansion for LSX/LASX. As a result,
i128 and i256 loads, which are illegal types on LoongArch, will
be generated. Without further handling, they are split into
legal scalar types.
So this commit also enables a combine for `setcc` that bitcasts
i128/i256 types to vector types before type legalization, so
that vector instructions are generated.
Inspired by x86 and RISC-V.
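A minimal sketch of the idea, in Python rather than LLVM IR and with hypothetical function names: an equality-only memcmp over 16 bytes can be evaluated either as two 64-bit scalar chunks (what splitting the i128 produces) or as one wide compare (the analogue of keeping the value in a single vector register). Both forms must agree:

```python
def memcmp_eq_split(a: bytes, b: bytes) -> bool:
    """Compare 16 bytes as two 64-bit chunks, OR-ing the XOR of each pair."""
    diff = 0
    for off in (0, 8):
        x = int.from_bytes(a[off:off + 8], "little")
        y = int.from_bytes(b[off:off + 8], "little")
        diff |= x ^ y
    return diff == 0


def memcmp_eq_wide(a: bytes, b: bytes, width: int = 16) -> bool:
    """Compare the same bytes as a single wide integer (one 128-bit compare)."""
    return int.from_bytes(a[:width], "little") == int.from_bytes(b[:width], "little")
```

This only models the `memcmp(...) == 0` case; it says nothing about which instructions the backend actually selects.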
[RISCV][GISel] Fallback to SelectionDAG for vleff intrinsics. (#167776)
Supporting this in GISel requires multiple changes to IRTranslator to
support aggregate returns containing scalable vectors and non-scalable
types. Falling back is the quickest way to fix the crash.
Fixes #167618
AMDGPU: Really use AV classes by default for vector classes
Update getRegClassFor to use AV classes in place of VGPRs for
gfx90a-gfx950. There are a handful of regressions. Most come from
enabling unprofitable rematerialization, which reduces the register
count by 1 but adds an unnecessary instruction.
Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
[AMDGPU] Insert `s_wait_xcnt(0)` before atomics to work around write-combining miss hazard
This patch adds a workaround for a hazard on GFX1250: it inserts an `s_wait_xcnt(0)` instruction before any atomic operation that might write to memory.
Fixes SWDEV-543703.
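The insertion rule can be sketched as a toy pass over a flat instruction list. This is a Python model under stated assumptions, not the backend code: the `Inst` record, its `is_atomic`/`may_store` flags, and the opcode strings are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction record; flags stand in for backend queries."""
    opcode: str
    is_atomic: bool = False
    may_store: bool = False


WAIT = Inst("s_wait_xcnt 0")


def insert_xcnt_waits(insts):
    """Emit a wait immediately before every atomic that may write memory."""
    out = []
    for inst in insts:
        if inst.is_atomic and inst.may_store:
            out.append(WAIT)
        out.append(inst)
    return out
```

Plain loads and non-atomic stores pass through unchanged in this model; only atomics that may write get a preceding wait.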
AMDGPU: Start to use AV classes for unknown vector class
Use AGPR+VGPR superclasses for gfx90a+. The type used
for the class should be the broadest possible class, to
be contextually restricted later. InstrEmitter clamps these
to the common subclass of the context use instructions, so we're
best off using the broadest possible class for all types.
Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
[clang-doc] lift Mustache template generation from HTML
To prepare for more backends to use Mustache templates, this patch lifts
the Mustache functionality from HTMLMustacheGenerator.cpp to
Generators.h. A MustacheGenerator interface is created to share code for
template creation.
[mlir][linalg] Fix Linalg runtime verification test (#167814)
This integration test has been broken for a while. This commit partially
fixes it.
- Use `CHECK` + `CHECK-NEXT` to ensure that the correct error lines are
matched together.
- Move all `CHECK-NOT` to the end. Having a `CHECK` with the same string
does not make sense after a `CHECK-NOT`.
- Add a missing `CHECK: ERROR` for one of the test cases.
- Deactivate `reverse_from_3`, which is broken, and put a TODO.
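To illustrate why pairing `CHECK` with `CHECK-NEXT` is stricter than two plain `CHECK` lines, here is a simplified Python model. It is a sketch, not FileCheck itself: it assumes plain substring matching at line granularity, and the function names and sample output are made up.

```python
def check_then_check(lines, first, second):
    """Two plain CHECKs: `second` may match on any later line."""
    for i, line in enumerate(lines):
        if first in line:
            return any(second in later for later in lines[i + 1:])
    return False


def check_then_next(lines, first, second):
    """CHECK + CHECK-NEXT: `second` must match on the line right after `first`."""
    for i, line in enumerate(lines):
        if first in line:
            return i + 1 < len(lines) and second in lines[i + 1]
    return False


# Illustrative output where an unrelated line separates the two messages.
output = [
    "ERROR: Runtime op verification failed",
    "some unrelated line",
    "Location: loc(...)",
]
```

With this sample, the plain pair accepts the output while the `CHECK-NEXT` pair rejects it, which is what ties each error line to its own location line.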
[AMDGPU] Document meaning of alignment of buffer fat pointers, intrinsics (#167553)
This commit adds documentation clarifying the meaning of `align` on ptr
addrspace(7) (buffer fat pointer) and ptr addrspace(9) (buffer
strided pointer) operations (specifying that both the base and the
offset need to be aligned) and documents the meaning of the `align`
attribute when used as an argument on *.buffer.ptr.* intrinsics.
[RISCV] Remove custom legalization of v2i16/v4i8 loads for P extension. (#167651)
We can use the default legalization which will create an i32 load
followed by a v2i32 scalar_to_vector followed by a bitcast. We can isel
the scalar_to_vector like a bitcast and not generate any instructions
for it.
[MLIR] Add reduction interface with tester to mlir-reduce (#166096)
Currently, we don't have support for patterns that need access to a
`Tester` instance in `mlir-reduce`. This PR adds
`DialectReductionPatternWithTesterInterface` to the set of supported
interfaces. Dialects can implement this interface to inject the tester
into their pattern classes.
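The injection idea can be sketched in Python. Everything here is a hypothetical stand-in (the `Tester`, `DropOpsPattern`, and `ToyDialectInterface` classes and their methods are not the MLIR C++ API): a dialect-side hook receives the tester and hands it to the patterns it registers, so a pattern can ask whether a reduced candidate is still interesting.

```python
class Tester:
    """Stand-in for an interestingness test wrapping a predicate."""

    def __init__(self, predicate):
        self._predicate = predicate

    def is_interesting(self, candidate):
        return self._predicate(candidate)


class DropOpsPattern:
    """Hypothetical pattern: drop each op if the result stays interesting."""

    def __init__(self, tester):
        self.tester = tester

    def apply(self, ops):
        current = list(ops)
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + 1:]
            if self.tester.is_interesting(candidate):
                current = candidate  # keep the removal, retry same index
            else:
                i += 1  # this op is needed, move on
        return current


class ToyDialectInterface:
    """Hypothetical dialect hook that injects the tester into its patterns."""

    def populate_reduction_patterns_with_tester(self, patterns, tester):
        patterns.append(DropOpsPattern(tester))
```

Passing the tester at registration time keeps the pattern free of global state, which is one plausible reason to route it through an interface.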
[ASan] Fix forward 141c2b
When landing 141c2b I didn't realize that none of these files actually
got built either locally or by premerge. I had some minor syntax
mistakes that caused the build to fail. This patch fixes those issues
and has been verified on a Windows machine.
Reland yet again: [mlir] Add FP software implementation lowering pass: `arith-to-apfloat` (#167608)
Fix both the symbol visibility issue in the mlir_apfloat_wrappers lib and the linkage issue in ArithToAPFloat.