[AMDGPU] Pack overflow inreg args into VGPR lanes
When inreg function arguments overflow the available SGPRs, pack multiple values
into lanes of a single VGPR using writelane/readlane instead of consuming one
VGPR per overflow argument.
The feature is behind a flag (default off) and currently only supports the
SelectionDAG path.
Known issue: if the register allocator does not coalesce the COPY between the
writelane chain and the physical call argument register, the resulting v_mov_b32
is EXEC-dependent and will not transfer inactive lanes. This is correct when
EXEC is all-ones (the common case at call sites) but would be incorrect inside
divergent control flow.
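The packing idea and the known issue can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the wave size of 32, the `VGPR` alias, and the helpers `writeLane`, `readLane`, and `vectorMove` are hypothetical names that mimic the documented behavior of writelane, readlane, and an EXEC-masked v_mov_b32. None of it is code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of one 32-bit VGPR, assuming a wave32 target (an assumption):
// the register holds one 32-bit value per lane.
constexpr unsigned WaveSize = 32;
using VGPR = std::array<uint32_t, WaveSize>;

// Models writelane: stores a scalar into a single lane, regardless of EXEC.
inline void writeLane(VGPR &V, unsigned Lane, uint32_t Value) {
  V[Lane] = Value;
}

// Models readlane: loads a single lane into a scalar, regardless of EXEC.
inline uint32_t readLane(const VGPR &V, unsigned Lane) { return V[Lane]; }

// Models an EXEC-dependent vector move: only lanes whose EXEC bit is set
// are copied; inactive lanes keep their previous contents in Dst.
inline void vectorMove(VGPR &Dst, const VGPR &Src, uint32_t Exec) {
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (Exec & (1u << Lane))
      Dst[Lane] = Src[Lane];
}
```

In this model, packing three overflow arguments into lanes 0 to 2 and copying with an all-ones EXEC preserves every value, while copying with only lane 0 active leaves lanes 1 and 2 of the destination untouched, which is the failure mode described above.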
[InstCombine] Restrict foldICmpOfVectorReduce to one-use (#182833)
Follow up on 279b3dbe ([InstCombine] Fold icmp (vreduce_(or|and) %x),
(0|-1), #182684) to fix a regression by restricting the fold to one-use.
Regression: https://godbolt.org/z/f38b169MM
[libc][math] Refactor getpayload family functions to header-only (#181824)
Refactors the payload_functions math family to be header-only.
Part of: https://github.com/llvm/llvm-project/issues/181823
Target Functions:
- getpayload
- getpayloadbf16
- getpayloadf
- getpayloadf128
- getpayloadf16
[Clang][NFC] Don't redefine __trap macro in tests for PowerPC (#182898)
These `OverflowBehaviorType` tests were failing due to PowerPC already
defining a __trap macro.
We can just remove the __wrap and __trap macros as they are unused in these tests.
Signed-off-by: Justin Stitt <justinstitt at google.com>
Move the ObjC blocks layout bitmap to the cstring section (#182398)
This is a follow-up to https://github.com/llvm/llvm-project/pull/174705
There's one additional place in the ObjC code gen logic to make sure the
ObjC blocks layout is generated in the regular cstring section.
[Clang][AMDGPU][Docs] Add builtin documentation for AMDGPU builtins (#181574)
Use the documentation generation infrastructure to document the AMDGPU
builtins.
This PR starts with the ABI / Special Register builtins. Documentation
for the remaining builtin categories will be added incrementally in
follow-up patches.
[DAG] isKnownNeverZero - add DemandedElts argument (#182679)
The following changes were made to isKnownNeverZero:
- Added BUILD_VECTOR and SPLAT_VECTOR cases.
- Added support for the DemandedElts argument in the SELECT/VSELECT cases.
- Added tests for constants and SELECT/VSELECT.
Closes #181656
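As a rough illustration of what a demanded-elements mask buys for a constant vector, here is a minimal sketch. `ConstantVector` and `isKnownNeverZeroSketch` are hypothetical names, and a plain 64-bit mask stands in for the `APInt` that LLVM uses. This is not the SelectionDAG implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a BUILD_VECTOR of constants; not an LLVM type.
using ConstantVector = std::vector<int64_t>;

// Returns true if every element selected by DemandedElts is non-zero.
// Bit I of DemandedElts selects element I; elements that are not demanded
// are ignored, so a zero in an undemanded position does not matter.
inline bool isKnownNeverZeroSketch(const ConstantVector &Elts,
                                   uint64_t DemandedElts) {
  for (std::size_t I = 0; I < Elts.size() && I < 64; ++I)
    if (((DemandedElts >> I) & 1) && Elts[I] == 0)
      return false;
  return true;
}
```

For the vector {1, 0, 3, 4}, the query succeeds when only elements 0, 2, and 3 are demanded and fails when all four are demanded.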
[lldb][bazel] Add HighlighterDefault, rename ClangHighlighter targets (#182693)
Rename `PluginClangHighlighter` to `PluginHighlighterClang` for
consistency with the directory-based naming convention, add the new
`PluginHighlighterDefault` library, and register both `HighlighterClang`
and `HighlighterDefault` in `DEFAULT_PLUGINS`.
[AMDGPU][ISel] Reduce 64-bit `setcc` to upper 32 bits if lower 32 bits are known (#181238)
Truncate 64-bit integral `setcc`s to their upper 32-bit operands if
enough information is known about their lower 32-bit operands, subsuming
the special cases handled in #177662.
Alive2 verification for analogous IR transformations:
[xdATxK](https://alive2.llvm.org/ce/z/xdATxK)
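One instance of this kind of narrowing can be checked on the host. The sketch below assumes an unsigned less-than against a constant whose lower 32 bits are known to be zero. The commit message does not say which predicates and operand patterns the patch covers, so this is an illustrative case only, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Full-width unsigned comparison, used as the reference result.
inline bool lessThan64(uint64_t X, uint64_t C) { return X < C; }

// Narrowed comparison on the upper halves only. It is equivalent to
// lessThan64 only when the low 32 bits of C are zero, because then
// X < C holds exactly when the upper half of X is below the upper half of C.
inline bool lessThanUpper32(uint64_t X, uint64_t C) {
  assert((C & 0xFFFFFFFFull) == 0 && "low 32 bits of C must be zero");
  return static_cast<uint32_t>(X >> 32) < static_cast<uint32_t>(C >> 32);
}
```

With `C = 5 << 32`, both forms agree for values just below, equal to, and just above `C`.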
[AMDGPU] Fix caller/callee mismatch in SGPR assignment for inreg args (#182754)
On the callee side, `LowerFormalArguments` marks SGPR0-3 as allocated in
`CCState` before running the CC analysis. On the caller side,
`LowerCall` (and GlobalISel's `lowerCall`/`lowerTailCall`) added the
scratch resource to `RegsToPass` without marking it in `CCState`. This
caused `CC_AMDGPU_Func` to treat SGPR0-3 as available on the caller
side, assigning user inreg args there, while the callee skipped them.
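The mismatch can be reproduced with a toy allocator. `ToyCCState` and `assignFirstInregArg` are hypothetical stand-ins, and "lowest free register first" is an assumed policy chosen to keep the sketch short. Neither is taken from the AMDGPU calling convention code.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for CCState: tracks which SGPR indices are already taken.
struct ToyCCState {
  std::set<unsigned> Allocated;

  void markAllocated(unsigned Reg) { Allocated.insert(Reg); }

  // Returns the lowest-numbered free SGPR and marks it as allocated.
  unsigned allocateNextSGPR() {
    unsigned Reg = 0;
    while (Allocated.count(Reg))
      ++Reg;
    Allocated.insert(Reg);
    return Reg;
  }
};

// Assigns the first inreg argument. When ReserveScratchResource is true,
// SGPR0-3 are marked allocated before the assignment runs.
inline unsigned assignFirstInregArg(bool ReserveScratchResource) {
  ToyCCState State;
  if (ReserveScratchResource)
    for (unsigned Reg = 0; Reg < 4; ++Reg)
      State.markAllocated(Reg);
  return State.allocateNextSGPR();
}
```

In this model, a side that reserves SGPR0-3 places the first argument in SGPR4, while a side that does not reserve them places it in SGPR0. The two sides only agree when both reserve the same registers before the assignment runs.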
[Clang][AMDGPU][Docs] Add builtin documentation for AMDGPU builtins
Use the documentation generation infrastructure to document the AMDGPU builtins.
This PR starts with the ABI / Special Register builtins. Documentation for the
remaining builtin categories will be added incrementally in follow-up patches.
[VPlan] Start implementing VPlan-based stride multiversioning
This commit only implements the run-time guard without actually
optimizing the vector loop. That would come in a separate PR to ease
review.
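For readers unfamiliar with the general technique, stride multiversioning at the source level looks roughly like the sketch below: a run-time check on the stride selects between a unit-stride copy of the loop and a generic fallback. The names `LoopVersion`, `SumResult`, and `sumStrided` are hypothetical, and this is not how the VPlan implementation is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records which loop version the run-time guard selected.
enum class LoopVersion { UnitStride, Generic };

struct SumResult {
  long Sum;
  LoopVersion Version;
};

// Sums Count elements of Data, stepping by Stride. The caller must ensure
// that (Count - 1) * Stride is a valid index into Data.
inline SumResult sumStrided(const std::vector<int> &Data, std::size_t Count,
                            std::size_t Stride) {
  long Sum = 0;
  if (Stride == 1) {
    // Guard passed: accesses are contiguous, so this copy of the loop is
    // the one a vectorizer could specialize.
    for (std::size_t I = 0; I < Count; ++I)
      Sum += Data[I];
    return {Sum, LoopVersion::UnitStride};
  }
  // Guard failed: fall back to the generic strided loop.
  for (std::size_t I = 0; I < Count; ++I)
    Sum += Data[I * Stride];
  return {Sum, LoopVersion::Generic};
}
```

Both versions compute the same result for a stride of 1. The guard only decides which copy of the loop runs.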
[CIR] Fix HLSL test that crashes (#182894)
This was caused by #182609, which changed the way the AST stores
these, so we now hit an NYI in a way that doesn't recover nicely. In
the future, we could probably represent a 'no op' instead of an empty
op in the IR for these cases, but there isn't much use for it, since it
always comes after an NYI.
This patch changes the test to use float instead of float1 as suggested
in review, which avoids the problematic conversion.
[mlir][acc] Add parallelism mapping policy interface (#182890)
Add a header that defines the interface for mapping OpenACC parallelism
levels (gang, worker, vector) to target-specific parallel dimension
attributes. Alongside this, `DefaultACCToGPUMappingPolicy` is introduced
as an initial implementation of ACC parallelism to GPU mapping.
[AMDGPU] Fix caller/callee mismatch in SGPR assignment for inreg args
On the callee side, `LowerFormalArguments` marks SGPR0-3 as allocated in
`CCState` before running the CC analysis. On the caller side, `LowerCall` (and
GlobalISel's `lowerCall`/`lowerTailCall`) added the scratch resource to
`RegsToPass` without marking it in `CCState`. This caused `CC_AMDGPU_Func` to
treat SGPR0-3 as available on the caller side, assigning user inreg args there,
while the callee skipped them.
[SandboxIR][Region] Replace exit() with reportFatalUsageError() (#182134)
`Region::createRegionsFromMD()` parses the IR and the corresponding
metadata and forms one or more Regions. If an instruction is tagged as
being part of the "auxiliary" vector of the region, then a check
enforces that it should also be part of a region, i.e., it should have
both `!sandboxaux` and `!sandboxvec` metadata, not just `!sandboxaux`.
The check used to `exit(1)` after printing an error, but it's better to
abort using LLVM's error handling functions. Since the user can write
the IR by hand, I think it makes sense to report this as a usage error
with `reportFatalUsageError()`, and not as an internal error.