[SPIR-V] initial support for @llvm.structured.gep (#178668)
This commit adds initial support to lower the intrinsinc
`@llvm.structured.gep` into proper SPIR-V.
For now, the backend continues to support both GEP formats. We might
want to revisit this at some point for the logical part.
CONTRIBUTING.md: Clarify GitHub pull requests
Make the guidelines more prescriptive (while remaining clear that Pull
Requests are merely one, not-preferred method for submitting changes).
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55089
[AMDGPU] Add legalization rules for G_ATOMICRMW_FADD (#175257)
G_ATOMICRMW_FADD is supported on flat, global and local address spaces
for S32, S64 and V2S16 values.
Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526)
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.
Fixes #174617
---
This patch is a reland of #174835.
Reverts #179819
[llvm][DebugInfo] Avoid attaching retained nodes to unrelated subprograms in DIBuilder (#180294)
Fix a regression introduced by
https://github.com/llvm/llvm-project/pull/165032, where DIBuilder could
attach local metadata nodes to the wrong subprogram during finalization.
DIBuilder records freshly created local variables, labels, and types in
`DIBuilder::SubprogramTrackedNodes`, and later attaches them to their
parent subprogram's retainedNodes in `finalizeSubprogram()`.
However, a temporary local type created via
`createReplaceableCompositeType()` may later be replaced by a type with
a different scope.
DIBuilder does not currently verify that the scopes of the original and
replacement types match.
As a result, local types can be incorrectly attached to the
retainedNodes of an unrelated subprogram. This issue is observable in
clang with limited debug info mode (see
[5 lines not shown]
ApiControllerBase->exportCsv: add $separator as parameter and swtich the default to a semicolon (;), importCsv() already understands both, but semicolon seems to be more commonly used, which helps tools like Excel to open the file instantly as table.
[IR] Update docstring for stripAndAccumulateConstantOffset (#180365)
Make it clear that the returned object in the case where a variable
offset is found is the first value to introduce a non-constant offset,
not necessarily the actual underlying object.
Found while investigating #180361.
[IVDesc] Check loop-preheader for loop-legality when pass-remarks enabled (#166310)
When `-pass-remarks=loop-vectorize` is specified, the subsequent logic
is executed to display detailed debug messages even if no PreHeader
exists in the loop.
Therefore, an assert occurs when the `getLoopPreHeader()` function is
called. This commit resolves that issue.
Fixed: #165377
[AMDGPU] Introduce asyncmark/wait intrinsics
Asynchronous operations are memory transfers (usually between the global memory
and LDS) that are completed independently at an unspecified scope. A thread that
requests one or more asynchronous transfers can use async marks to track their
completion. The thread waits for each mark to be completed, which indicates that
requests initiated in program order before this mark have also completed.
For now, we implement asyncmark/wait operations on pre-GFX12 architectures that
support "LDS DMA" operations. Future work will extend support to GFX12Plus
architectures that support "true" async operations.
Co-authored-by: Ryan Mitchell ryan.mitchell at amd.com
Fixes: SWDEV-521121
[libc++] Reduce the number of runs on the format_to{,n} and formatted_size benchmarks (#179922)
Testing a bunch of sizes has relatively little value. This reduces the
number of benchmarks so we can run them on a regular basis. This saves
~8 minutes when running the benchmarks.
[NFC][ELF] Fix overzealous find/replace affecting comments
Commit a94060ca0c87 ("[ELF] Pass Ctx & to Relocations") swapped the
InputSectionBase &c argument for an InputSectionBase *sec member, and so
"c." was replaced with "sec->". However, this must have done in such a
way that "Local-Exec." was transformed to "Local-Exesec->" and
"RISCV::relocateAlloc." to "RISCV::relocateAllosec->", i.e. without the
use of something like clangd, and without appropriate word boundaries in
a regex.
[AMDGPU] Fix instruction size for 64-bit literal constant operands (#180387)
`getLit64Encoding` uses a different approach to determine whether 64-bit
literal encoding is used, which caused a size mismatch between the
`MachineInstr` and the `MCInst`.
For `!isValid32BitLiteral`, it is effectively `!(isInt<32>(Val) ||
isUInt<32>(Val))`, which is `!isInt<32>(Val) && !isUInt<32>(Val)`, but
in `getLit64Encoding`, it is `!isInt<32>(Val) || !isUInt<32>(Val)`.