[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343)
Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication
without additional synchronization.
The previous built-in made it work because one could just override the
CPol bits, but that bypasses the memory model and forces the user to learn
about the ISA bit encoding.
Making load monitor an atomic operation has a couple of advantages.
First, the memory model foundation for it is stronger. We just lean on the
existing rules for atomic operations. Second, the CPol bits are abstracted away
from the user, which avoids leaking ISA details into the API.
This patch also adds supporting memory model and intrinsics
documentation to AMDGPUUsage.
Solves SWDEV-516398.
[PhaseOrdering] Regenerate test checks (NFC)
The partial check lines here, while claiming to be UTC output, were
highly confusing. Regenerate the check lines. While here, use a
newer version and rename blocks to avoid anonymous block conflicts.
[InstCombine] Relax one-use check for min/max(fpext x, fpext y) to fpext(min/max(x, y)) fold (#180164)
If only one of the operands is one-use, the total number of fpexts stays
the same, but the min/max is performed on a narrowed type. Additionally,
the fpext may fold with a following fptrunc.
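As a rough sketch of why the fold above is sound (a Python model, not LLVM code: Python's `float` stands in for the wide type, and a 4-byte `struct` round-trip models the narrower float type; the helper names are illustrative): `fpext` is exact and monotone, so taking the min/max before or after extension yields the same value.

```python
import struct

def to_f32(x):
    # round-trip through a 4-byte float: models a value of the narrower type
    return struct.unpack('f', struct.pack('f', x))[0]

def min_widened(x32, y32):
    # min(fpext x, fpext y): extend first, compare in the wide type
    return min(float(x32), float(y32))

def min_narrow_then_extend(x32, y32):
    # fpext(min(x, y)): compare in the narrow type, then extend the result
    return float(min(x32, y32))

# fpext is exact and monotone, so both orders agree
for x, y in [(1.5, -2.25), (0.1, 0.2), (-7.0, -7.5)]:
    x32, y32 = to_f32(x), to_f32(y)
    assert min_widened(x32, y32) == min_narrow_then_extend(x32, y32)
```

The same argument applies symmetrically to max.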
[CIR][AArch64] Add lowering for predicated SVE svdup builtins (zeroing) (#175976)
This PR adds CIR lowering support for predicated SVE `svdup` builtins on
AArch64. The corresponding ACLE intrinsics are documented at:
https://developer.arm.com/architectures/instruction-sets/intrinsics
This change focuses on the zeroing-predicated variants (suffix `_z`,
e.g. `svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic
with a `zeroinitializer` passthrough operand.
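The zeroing-predicated semantics can be sketched as follows (a hypothetical Python model of the lane behaviour, not the actual lowering; `svdup_z` and its parameters are illustrative names): active lanes receive the duplicated value, and inactive lanes take the `zeroinitializer` passthrough.

```python
def svdup_z(pred, value, nlanes):
    # zeroing-predicated duplicate: lanes where the predicate is active
    # get `value`; inactive lanes get the zero passthrough
    return [value if pred[i] else 0.0 for i in range(nlanes)]

print(svdup_z([True, False, True, False], 3.5, 4))
# [3.5, 0.0, 3.5, 0.0]
```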
IMPLEMENTATION NOTES
--------------------
* The CIR type converter is extended to support `BuiltinType::SveBool`,
which is lowered to `cir.vector<[16] x i1>`, matching current Clang
behaviour and ensuring compatibility with existing LLVM SVE lowering.
* Added logic that converts `cir.vector<[16] x i1>` according to the
underlying element type. This is done by calling
`@llvm.aarch64.sve.convert.from.svbool`.
[58 lines not shown]
AMDGPU: Add syntax for s_wait_event values (#180272)
Previously this would just print hex values. Print names for the
recognized values, matching the sp3 syntax.
[RISCV] Add cost for @llvm.vector.splice.{left,right} (#179219)
Currently vector splice intrinsics are costed through getShuffleCost
when the offset is fixed. When the offset is variable though we can't
use a shuffle mask so it currently returns invalid.
This implements the cost in RISCVTTIImpl::getIntrinsicInstrCost as the
cost of a slideup and a slidedown, which matches the codegen.
It also implements the type-based cost whenever the offset argument
isn't available.
It may be possible to reduce the cost in future when one of the vector
operands is known to be poison, in which case we only generate a single
slideup or slidedown.
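One plausible reading of the splice semantics being costed can be modeled as below (a Python sketch under the assumption that a splice concatenates the two operands and extracts a full-width window at the runtime offset; `vector_splice` is an illustrative name). The two pieces correspond to the slidedown on the first operand and the slideup of the second, which is why the cost is modeled as one of each.

```python
def vector_splice(a, b, offset):
    # models a vector splice at a runtime offset: take the tail of `a`
    # starting at `offset`, then fill the remaining lanes from `b`
    # (codegen-wise: a slidedown on `a` plus a slideup of `b`)
    n = len(a)
    return (a[offset:] + b)[:n]

print(vector_splice([0, 1, 2, 3], [4, 5, 6, 7], 2))
# [2, 3, 4, 5]
```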
tests: fix tests broken by #9744
The static option list caching mechanism is now invoked over a
nonexistent config.xm, caching no interfaces for the "*" static
options key. To fix that, add a reset for the list.
It would be nicer to move the reset to BaseListField, since a number
of fields use the static option list for caching, but they all
define their own.
Remove assumption that known type UIDs have a local alias
Currently this changes nothing, as all known types have a known
name (aka description), which is unlikely to ever change, and have
been assigned an alias for use with gpt -- which might change in the
future if we learn about types which no one is ever likely to want
to use gpt(8) to create (but which might exist in tables being processed).
Avoiding the assumptions is cheap (for both alias and name), so just
do it; no one will ever need to care in the future.
[RISCV][TTI] Adjust the cost of `llvm.abs` intrinsic when `Zvabd` exists
When `Zvabd` exists, `llvm.abs` is lowered to `vabs.v` so the cost
is 1.
Reviewers: mshockwave, topperc, lukel97, skachkov-sc, preames
Reviewed By: topperc
Pull Request: https://github.com/llvm/llvm-project/pull/180146
AMDGPU: Add llvm.amdgcn.s.wait.event intrinsic (#180170)
Exactly match the s_wait_event instruction. For some reason we already
had this instruction used through llvm.amdgcn.s.wait.event.export.ready,
but that hardcodes a specific value. This should really be a bitmask
that can combine multiple wait types.
gfx11 -> gfx12 broke compatibility in a weird way, by inverting the
interpretation of the bit but also shifting the used bit by 1. Simplify
the selection of the old intrinsic by just using the magic number 2,
which should satisfy both cases.
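A sketch of why the magic number 2 works on both generations (the bit positions and polarity here are inferred from the description above and should be treated as assumptions, not ISA documentation):

```python
def waits_for_export_ready(imm, gfx12):
    # hypothetical model: gfx12 shifted the bit up by one and inverted
    # its interpretation relative to gfx11
    if gfx12:
        # gfx12: bit 1 set means "wait for the event"
        return bool(imm & 0b10)
    # gfx11: bit 0 clear means "wait for the event" (inverted polarity)
    return not (imm & 0b01)

# the immediate 2 selects "wait for export ready" either way
assert waits_for_export_ready(2, gfx12=False)
assert waits_for_export_ready(2, gfx12=True)
```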
tests: one more for protocol replacements; closes #9744
The tests aren't complete but they do cover parseReplaceSimple()
in its latest form so that's good enough.
Just as a note, the tests are designed to be render-agnostic, so
that we always start with our rule input and produce pf.conf
compatible rulesets. There are two purposes here:
1. Catch regressions when parsers are changed; that also includes
switching the parser implementation completely in the future.
2. Make sure that the files are actually compilable by pf.conf;
this should be covered later (the conf files are there on the
disk for that purpose).
This is the right type of testing for the purpose since the pf.conf
syntax is virtually static and will require little maintenance.
Just needs a lot more coverage for the missing features/rule types.
CONTRIBUTING.md: Fix links to section
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
Sponsored by: The FreeBSD Foundation
Pull Request: https://github.com/freebsd/freebsd-src/pull/2010
[ARM] Treat strictfp vector rounding operations as legal
Previously, the strictfp variants of rounding operations (FLOOR, ROUND,
etc) were handled in SelectionDAG via the default expansion, which
splits vector operations into scalar ones. This results in less efficient
code.
This change declares the strictfp counterparts of the vector rounding
operations as legal and modifies existing rules in tablegen descriptions
accordingly.