libclc: Use elementwise exp for exp functions (#185626)
For amdgpu use the exp intrinsic. Really, this should be
the default generic implementation. But we're stuck in a
mess where essentially nothing works. All of the exp
intrinsics work for AMDGPU, but aren't really implemented
for spirv or nvptx. Ideally the intrinsic and/or libm call
would be the default implementation.
[AMDGPU][Doc] GFX12.5 Barrier Execution Model
- Document GFX12.5-specific intrinsics.
- Rename signal -> arrive, leave -> drop to match C++ terminology.
- Update execution model to support GFX12.5 semantics (e.g. threads can arrive without waiting).
- Various clean-ups & wording updates on the model.
- Added "mutually exclusive" barrier objects.
- Added barrier-phase-with + related constraints.
- Document that barriers can exist at cluster scope too.
- Update GFX12 target semantics/code sequences to include GFX12.5.
The model is no longer marked as incomplete; it is now just experimental.
More updates are planned to support more features and to address
some known shortcomings of the model. For example, currently many relations
encode too much semantic information, which means the model doesn't build
when barriers aren't used correctly. I'd like the model to eventually represent
broken executions as well, just like a memory model can.
[DA] Remove outdated comments (NFC) (#185621)
Recently, the consistent flag and the peeling flags were removed from
the `Dependence` class (#181608, #183737). However, the related comments
were not deleted accordingly. This patch cleans them up.
Revert "[AMDGPU] Enable scheduler mfma rewrite stage by default" (#185604)
Reverts llvm/llvm-project#180751
Enabling this pass by default breaks a few tests / use cases downstream.
@frederik-h was also looking into the actual implementation of the pass.
For now, just revert the change that turned the pass on by default.
Also fix a typo in the process.
---------
Co-authored-by: Jay Foad <jay.foad at amd.com>
[CIR][AArch64] Add support for the remaining `vceqz` builtins
Implement the remaining CIR lowerings for the AdvSIMD (Neon)
`vceqz` intrinsic group (bitwise equal to zero).
Most `vceqz` variants were already supported; this patch
completes the rest of the group [1] that was left as a TODO.
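As a rough illustration of what the integer `vceqz` variants compute per lane
(a scalar model only, not the CIR lowering from this patch; the helper name
`vceqz_u8_sketch` is hypothetical and is not part of `arm_neon.h`): a lane that
equals zero becomes all ones, and any other lane becomes zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration: models an 8-lane, 8-bit
// compare-equal-to-zero. Real code would use the intrinsics from arm_neon.h.
inline std::array<std::uint8_t, 8>
vceqz_u8_sketch(const std::array<std::uint8_t, 8> &a) {
  std::array<std::uint8_t, 8> r{};
  for (std::size_t i = 0; i < a.size(); ++i) {
    // All bits set when the lane is zero, all bits clear otherwise.
    r[i] = (a[i] == 0) ? 0xFF : 0x00;
  }
  return r;
}
```

The result is a per-lane mask rather than a boolean, so it can feed directly
into bitwise select operations.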
Tests for these intrinsics are moved from:
* test/CodeGen/AArch64/neon_intrinsics.c
* test/CodeGen/AArch64/v8.2a-fp16-intrinsics.c
to:
* test/CodeGen/AArch64/neon/intrinsics.c
* test/CodeGen/AArch64/neon/fullfp16,
respectively.
The implementation largely mirrors the existing lowering in
[4 lines not shown]
[lldb][PlatformDarwin][NFC] Use formatv-style format string in LocateExecutableScriptingResourcesFromDSYM (#185622)
About to make changes in this area and using `formatv` instead of
`printf` style format specifiers makes those easier to follow.
libclc: Remove amdgpu sqrt override (#185620)
The generic intrinsic should be used. A very long time ago
the sqrt intrinsic did not work for f64, but it's implemented
essentially the same way as this.