CodeGen/AMDGPU: Allow 3-address conversion of bundled instructions
This is in preparation for future changes in AMDGPU that will make more
substantial use of bundles pre-RA. For now, simply test this with
degenerate (single-instruction) bundles.
commit-id:4a30cb78
CodeGen: Handle bundled instructions in two-address-instructions pass
If the instruction with tied operands is a BUNDLE instruction and we
handle it by replacing an operand, then we need to update the
corresponding internal operands as well. Otherwise, the resulting MIR is
invalid.
The test case is degenerate in the sense that the bundle only contains a
single instruction, but it is sufficient to exercise this issue.
commit-id:6760a9b7
CodeGen: More accurate mayAlias for instructions with multiple MMOs (#166211)
There can only be meaningful aliasing between the memory accesses of
different instructions if at least one of the accesses modifies memory.
This check is already applied at the instruction level earlier in the method; this change extends it to individual MMOs.
This affects a SystemZ test: PFD instructions are marked both mayLoad
and mayStore, but they may carry a load-only MMO, which is no longer
treated as aliasing with plain loads. The PFD instructions come from
llvm.prefetch calls generated by loop-data-prefetch.
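The per-MMO rule can be modeled with a small Python sketch (a toy model, not the actual LLVM API; the `is_store`/`addr` fields and the address-equality overlap check are assumptions for illustration):

```python
from itertools import product

def mmos_may_alias(mmos_a, mmos_b):
    """Two memory accesses can only alias in a meaningful way if at
    least one of them modifies memory; load-vs-load pairs are skipped."""
    for a, b in product(mmos_a, mmos_b):
        if not (a["is_store"] or b["is_store"]):
            continue  # both sides only read: no meaningful aliasing
        if a["addr"] == b["addr"]:  # stand-in for a real overlap check
            return True
    return False

# A PFD-like instruction (mayLoad and mayStore) carrying a load-only MMO
pfd_mmos = [{"is_store": False, "addr": 0x100}]
load_mmos = [{"is_store": False, "addr": 0x100}]
store_mmos = [{"is_store": True, "addr": 0x100}]

print(mmos_may_alias(pfd_mmos, load_mmos))   # False: no side writes
print(mmos_may_alias(pfd_mmos, store_mmos))  # True: the store may conflict
```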
[flang][debug] Add debug type support for procedure pointers (#166764)
Fixes #161223
Procedure pointers in Fortran were generating incorrect debug type
information, showing as 'integer' in GDB instead of the actual procedure
signature.
[DirectX] Let data scalarizer pass account for sub-types when updating GEP type (#166200)
This PR lets the `dxil-data-scalarization` pass account for a GEP whose
source type is a sub-type of the pointer operand's type.
The pass is updated so that the replaced GEP introduces zero indices
such that the result type remains the same (with the vector -> array
transform).
Please see resolved issue for an annotated example.
Resolves: https://github.com/llvm/llvm-project/issues/165473
[CI] Ensure compatibility with Python 3.8
55436aeb2e8275d803a0e1bdff432717a1cf86b5 broke this on Windows, where
we only use Python 3.9, but the construct is only supported from 3.10
onwards. Use the old `Optional` type to ensure compatibility.
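The incompatibility can be shown with a minimal Python sketch (the function and its name are hypothetical, for illustration only):

```python
from typing import Optional

# On Python 3.10+ one could write `name: str | None` (PEP 604), but
# evaluating that annotation on 3.8/3.9 raises a TypeError at function
# definition time. `Optional[str]` works on all of these versions.
def find_artifact(name: Optional[str] = None) -> Optional[str]:
    # Hypothetical helper: return the given name, or None if absent.
    return name

print(find_artifact("report.json"))  # -> report.json
print(find_artifact())               # -> None
```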
[clang][AST] Do not try to handle irrelevant cases in writeBareSourceLocation (#166588)
`writeBareSourceLocation` is always called on either an `Expanded` or a
`Spelling` location; in either of those cases,
`SM.getSpellingLineNumber(Loc) == SM.getExpansionLineNumber(Loc) ==
SM.getLineNumber(Loc)`.
[IR] llvm.reloc.none intrinsic for no-op symbol references (#147427)
This intrinsic emits a BFD_RELOC_NONE relocation at the point of the
call, which lets optimizations and languages explicitly pull in symbols
from static libraries without emitting any code or data that carries an
effectual relocation against such a symbol.
See issue #146159 for context.
[AArch64][llvm] Add support for vmmlaq_[f16,f32]_mf8 intrinsics
Add support for the following new intrinsics:
```
float16x8_t vmmlaq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t);
float32x4_t vmmlaq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t);
```
[mlir][acc] Erase empty kernel_environment ops during canonicalization (#166633)
This change removes empty `acc.kernel_environment` operations during
canonicalization. These can arise when the acc compute construct inside
the `acc.kernel_environment` is optimized away, for example when only
private variables are written to in the loop.
In cases of empty `acc.kernel_environment` ops with waitOperands, we
still remove the empty `acc.kernel_environment`, but also create an
`acc.wait` operation to take those wait operands to preserve
synchronization behavior.
[AtomicExpand] Add bitcasts when expanding load atomic vector
AtomicExpand fails for an aligned `load atomic <n x T>` because it
does not find a compatible library call. This change adds the
appropriate bitcasts so that the call can be lowered. It also adds
support for 128-bit lowering in TableGen to support SSE/AVX.
[X86] Cast atomic vectors in IR to support floats
This commit casts floats to integers in atomic loads during AtomicExpand
in order to support floating-point types. The cast is also required to
support 128-bit vectors in SSE/AVX.
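The idea of reinterpreting a float's bits as an integer, so the atomic operation can be performed on an integer type, can be sketched in Python; this models only the bitcast, not the IR transformation itself:

```python
import struct

def double_to_bits(x: float) -> int:
    """Reinterpret an IEEE-754 double as a 64-bit integer, analogous to
    the bitcast inserted around the integer atomic load."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits

def bits_to_double(b: int) -> float:
    """Inverse reinterpretation: 64-bit integer back to a double."""
    (x,) = struct.unpack("<d", struct.pack("<Q", b))
    return x

# The round trip is lossless: the same bit pattern comes back out.
print(hex(double_to_bits(1.5)))             # 0x3ff8000000000000
print(bits_to_double(double_to_bits(1.5)))  # 1.5
```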
[Offload] Remove handling for device memory pool (#163629)
Summary:
This was a lot of code that was only used by upstream LLVM builds of
AMDGPU offloading. We have a generic and fast `malloc` in `libc` now,
so just use that. This simplifies the code; it can be added back if we
start providing alternate forms, but I don't think there is a single
use case that would justify it yet.
[CI][NFC] Refactor compute_platform_title into generate_test_report_lib
This enables reuse in other CI components, like
premerge_advisor_explain.py.
Reviewers: DavidSpickett, gburgessiv, Keenuts, dschuff, lnihlen
Reviewed By: Keenuts, DavidSpickett
Pull Request: https://github.com/llvm/llvm-project/pull/166604