[libc] Support array tags in the RPC dispatch helpers (#181395)
Summary:
This PR adds support for tagging a pointer as an array when marshaling
between the CPU and GPU.
[AMDGPU] Add target features to guard DPP controls (#182391)
This patch adds target features:
- `+dpp-wavefront-shifts`, for DPP `wave_shl/rol/shr/ror`
- `+dpp-row-bcast`, for DPP `row_bcast15/31`
These DPP controls are not available in gfx10+, so these target features
enable `AMDGPURemoveIncompatibleFunctions` to remove functions that rely
on these controls when compiling for newer GPUs.
[Clang][AST][NFC] Correct Comment in GenericSelectionExpr (#180850)
Correct a misleading comment about the number/type of trailing objects
in the GenericSelectionExpr.
[lldb][test] delayed-definition-die-searching.test: compile without simple template names
Fails on Darwin after we made `-gsimple-template-names` the default (in https://github.com/llvm/llvm-project/pull/182297):
```
13:42:19 | # CHECK: DWARFASTParserClang::ParseTypeFromDWARF{{.*}}DW_TAG_structure_type (DW_TAG_structure_type) name = 't2<t1>'
13:42:19 | ^
13:42:19 | <stdin>:9:12: note: scanning from here
13:42:19 | (lldb) p v1
13:42:19 | ^
13:42:19 | <stdin>:10:278: note: possible intended match here
13:42:19 | (arm64) /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/tools/lldb/test/Shell/SymbolFile/DWARF/Output/delayed-definition-die-searching.test.tmp.out: DWARFASTParserClang::ParseTypeFromDWARF (die = 0x0000000000000037, decl_ctx = 0x0000000B723D2030 (die 0x000000000000000c)) DW_TAG_structure_type (DW_TAG_structure_type) name = 't2')
13:42:19 |
```
This test only checks the delayed definition search. It used to always run without `-gsimple-template-names`, so we're not losing coverage here. The failure is also expected with `-gsimple-template-names` because the DIE name no longer includes template parameters. I didn't want to make the `CHECK` less strict because it is useful to check that the types being resolved are the correct instantiations.
[DA] Add tests for dependencies are missed due to addrecs wrap (NFC) (#179683)
Add test cases where dependencies are missed since nowrap properties of
addrecs are not checked properly. This patch doesn't contain test cases
for the MIV tests.
[SLP]Do not convert inversed cmp nodes, if they reordered/reused
If a cmp node with inversed compares must be reordered/shuffled with
the reuses, disable the transformation for such nodes for now; they
require some special processing.
Fixes https://github.com/llvm/llvm-project/pull/181580#issuecomment-3933026221
[flang-rt] Implement basic support for I/O from OpenMP GPU Offloading (#181039)
Summary:
This PR provides the minimal support for Fortran I/O coming from a GPU
in OpenMP offloading. We use the same support the `libc` uses for its
printing through the RPC server. The helper functions `rpc::dispatch`
and `rpc::invoke` help make this mostly automatic.
Because Fortran I/O is not reentrant, the vast majority of complexity
comes from needing to stitch together calls from the GPU until they can
be executed all at once. This is needed not only because of the
limitations of recursive I/O, but also because the output would
otherwise be interleaved due to the GPU's lock-step execution.
As such, the return values from the intermediate functions are
meaningless, all returning true. The final value is correct however. For
cookies we create a context pointer on the server to chain these
together.
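
The stitching idea described above can be sketched roughly as follows.
This is not the flang-rt code: `DeferredIo`, `deferOutput`, and `finish`
are hypothetical names, and the sketch assumes each intermediate call
only appends text. It shows intermediate calls being recorded in a
server-side context (the thing a cookie would point at) and returning
true unconditionally, with the final call replaying them in order and
returning the real status.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical server-side context; a cookie handed back to the GPU
// would refer to one of these.
struct DeferredIo {
  std::vector<std::function<bool(std::string &)>> steps;
};

// Intermediate call: record the work and report success unconditionally.
bool deferOutput(DeferredIo &ctx, std::string text) {
  ctx.steps.push_back([t = std::move(text)](std::string &out) {
    out += t;
    return true;
  });
  return true; // placeholder value, not a real status
}

// Final call: replay every recorded step in order and return the
// combined status of the whole statement.
bool finish(DeferredIo &ctx, std::string &out) {
  bool ok = true;
  for (auto &step : ctx.steps)
    ok = step(out) && ok;
  ctx.steps.clear();
  return ok;
}
```

Because nothing is written until `finish` runs, the output of one
statement stays contiguous even if many threads record steps in
lock-step.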
[23 lines not shown]
[flang][NFC] Converted five tests from old lowering to new lowering (part 18) (#182439)
Tests converted from test/Lower/forall: forall-allocatable.f90,
forall-allocatable-2.f90, forall-array.f90, forall-construct-2.f90,
forall-construct-3.f90
[AMDGPU] Add GatherToLDS async flag back in FoldMemRefOpsIntoGatherToLDSOp (#182364)
The async flag on the GatherToLDS op was being dropped while going
through the lowering pipeline, so this patch adds it back.
[AMDGPU] Ensure all PERMLANE instructions are marked as convergent (#182162)
All PERMLANE instructions in AMDGPUGenInstrInfo.inc were verified to now
be marked as convergent. This is necessary to prevent PERMLANE
instructions from being incorrectly sunk by machine-sink.
---------
Signed-off-by: John Lu <John.Lu at amd.com>
[LoopIdiomVectorize] Bail when vectorization is disabled (#181142)
Bail on vectorizing a loop in LoopIdiomVectorize when the loop carries
hints that indicate vectorization is disabled.
This means that LoopIdiomVectorize will now respect vectorize(disable)
loop hints.
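
For illustration, a loop carrying such a hint might look like the
sketch below. The function name `firstMismatch` is hypothetical, and it
is an assumption that this exact loop shape is one the pass would
otherwise pick up; the pragma is Clang's ordinary loop-hint syntax.

```cpp
#include <cstddef>

// Hypothetical helper: returns the index of the first differing byte,
// or n if the first n bytes are equal.
std::size_t firstMismatch(const unsigned char *a, const unsigned char *b,
                          std::size_t n) {
  std::size_t i = 0;
  // Loop hint asking the compiler not to vectorize the following loop.
#pragma clang loop vectorize(disable)
  while (i < n && a[i] == b[i])
    ++i;
  return i;
}
```

The hint changes only how the loop may be compiled, not what it
computes.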
AMDGPU/GlobalISel: Regbanklegalize rules for INTRIN_IMAGE (#179810)
Regbanklegalize rules for INTRIN_IMAGE loads and stores.
Because of the very large number of different type signatures, the rule
specifies only a function for lowering (waterfall lowering of the RsrcIdx
operand if needed), and this function also applies register banks.
[clang][Driver][Darwin] Turn on -gsimple-template-names for Darwin by default (#182297)
Enables `-gsimple-template-names=simple` when targeting recent Apple
platforms (26 or later, except `DriverKit` which is at 25). Those are
platforms where the associated LLDB is capable of debugging
`simple-template-names` debug-info.
The two main effects on debug-info are:
1. forward declarations for structures now have
`DW_TAG_template_type_parameter`s (since this is required to reconstruct
the template names if just given a forward declaration)
2. `DW_AT_name` of templates will not include template parameters
anymore (except for a few cases where the name is not reconstitutible
from the template parameter DIE names)
While the `.debug_str` section is reduced in size (due to shorter
`DW_AT_name`s), this is somewhat offset by having to include template
parameter DIEs on forward declarations.
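
To make the name reconstruction concrete, here is a minimal sketch of
how a consumer could rebuild a full name from a simple `DW_AT_name` plus
the names of the template parameter DIEs. `reconstructName` is a
hypothetical helper, not LLDB's implementation, and it assumes type
parameters only (no non-type parameters, packs, or nested
reconstruction).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join a simple template name with the names of
// its type template parameters, e.g. "t2" + {"t1"} -> "t2<t1>".
std::string reconstructName(const std::string &simpleName,
                            const std::vector<std::string> &typeParams) {
  if (typeParams.empty())
    return simpleName; // not a template, nothing to append
  std::string full = simpleName + "<";
  for (std::size_t i = 0; i < typeParams.size(); ++i) {
    if (i != 0)
      full += ", ";
    full += typeParams[i];
  }
  full += ">";
  return full;
}
```

This is why a forward declaration needs its parameter DIEs: given only
the name `t2`, different instantiations would be indistinguishable.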