[RISCV] tt-ascalon-d8 vector scheduling (#167066)
Add the vector scheduling model for tt-ascalon-d8 and corresponding
llvm-mca tests.
---------
Co-authored-by: Craig Topper <craig.topper at sifive.com>
[flang][cuda] Add support to allocate scalar character types (#169550)
Add support for character declared like:
```
subroutine sub1()
character*4, device :: b
end subroutine
```
[mlir][tblgen] Don't echo absolute paths into rewrite pattern source (#168984)
Currently, the declarative pattern rewrite generator will always print
the [source]:[line](s) from which a pattern came. This is a useful
debugging hint, but it causes problem when absolute paths are used as
arguments to mlir-tblgen (which LLVM's build rules automatically do).
Specifially, it causes the source to be tied to the build location,
harning reproducability and our collective ability to get ccache hits
from, say, separate worktrees.
This commit resolves the issue by replacing absolute paths in thes
"Generated from:" comments with their filenames. (The alternative would
have been to implement an entire file-prefix-map the way the C compilers
do, but since this is an isolated incident, I chose to resolve it
locally.)
[BOLT][Tests] Use AT&T assembler syntax only for X86 tests (#169541)
Enabling AT&T syntax for all tests is broken when X86 target is not
enabled as reported in #167225.
[mlir] Fix build failure with BUILD_SHARED_LIBS=ON
/usr/bin/ld: tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGP
UPipelines.dir/GPUToXeVMPipeline.cpp.o: in function `mlir::gpu::buildLo
werToXeVMPassPipeline(mlir::OpPassManager&, mlir::gpu::GPUToXeVMPipelin
eOptions const&)':
GPUToXeVMPipeline.cpp:(.text._ZN4mlir3gpu28buildLowerToXeVMPassPipeline
ERNS_13OpPassManagerERKNS0_24GPUToXeVMPipelineOptionsE+0x1293): undefin
ed reference to `mlir::createConvertVectorToLLVMPass()'
[AArch64][PAC] Factor out printing real AUT/PAC/BLRA encodings (NFC)
Separate the low-level emission of the appropriate variants of `AUT*`,
`PAC*` and `B(L)RA*` instructions from the high-level logic of pseudo
instruction expansion.
Introduce `getBranchOpcodeForKey` helper function by analogy to
`get(AUT|PAC)OpcodeForKey`.
AMDGPU: Use RegClassByHwMode to manage GWS operand special case (#169373)
On targets that require even aligned 64-bit VGPRs, GWS operands
require even alignment of a 32-bit operand. Previously we had a hacky
post-processing which added an implicit operand to try to manage
the constraint. This would require special casing in other passes
to avoid breaking the operand constraint. This moves the handling
into the instruction definition, so other passes no longer need
to consider this edge case. MC still does need to special case this,
to print/parse as a 32-bit register. This also still ends up net
less work than introducing even aligned 32-bit register classes.
This also should be applied to the image special case.
[lld] macho: Support section branch relocations, including the 1-byte form (#169062)
I noticed that we had a hardcoded value of 4 for the pcrel section
relocations, which seems like an issue given that we recently added
support for 1-byte branch relocations in
https://github.com/llvm/llvm-project/pull/164439. The code included an
assert that the relevant relocation had the BYTE4 attribute, but that is
actually not enough to use a hardcoded value of 4: we need to assert
that the *other* `BYTE<n>` attributes are not set either.
However, since we did not support local branch relocations, that doesn't
seem to have mattered in practice. That said, local branch relocations
can be emitted by compilers, and ld64 does handle the 4-byte version of
them, so I've added support for it here.
ld64 actually seems to reject 1-byte section relocations, so the
questionable code is actually probably fine (minus the incorrect
assert). So we have two options: add an equivalent check in LLD, or just
support 1-byte local branch relocations. Supporting it actually requires
less code, so I've gone with that option here.
[OpenMP][Clang] Parsing/Sema support for `need_device_ptr(fb_nullify/fb_preserve)`. (#168905)
This patch adds parsing, semantic handling, and diagnostics for the
`OpenMP 6.1 fb_nullify` and` fb_preserve` fallback modifiers used with
the `need_device_ptr` map modifier.
[OpenACC][CIR][NFC] Remove 'NYI' diagnostics, since we're done with t… (#169543)
…hese
We've finished all of the clauses/etc that we're going to use this
visitor for, so we can remove the SourceLocation we used just for that,
and replace all NYI with unreachables.
AMDGPU: Use RegClassByHwMode to manage GWS operand special case
On targets that require even aligned 64-bit VGPRs, GWS operands
require even alignment of a 32-bit operand. Previously we had a hacky
post-processing which added an implicit operand to try to manage
the constraint. This would require special casing in other passes
to avoid breaking the operand constraint. This moves the handling
into the instruction definition, so other passes no longer need
to consider this edge case. MC still does need to special case this,
to print/parse as a 32-bit register. This also still ends up net
less work than introducing even aligned 32-bit register classes.
This also should be applied to the image special case.
AMDGPU: Stop forcing RequiresCodeGenSCCOrder (#169522)
This hasn't been strictly necessary since c897c13dde.
Practically this makes little difference; we still enable IPRA
by default which implies this option. By removing this explicit
force, -enable-ipra=0 has the expected change in the pass pipeline
to remove the DummyCGSCC runs.
AMDGPU: Try to use zext to implement constant-32-bit addrspacecast
If the high bits are assumed 0 for the cast, use zext. Previously
we would emit a build_vector and a bitcast with the high element
as 0. The zext is more easily optimized. I'm less convinced this is
good for globalisel, since you still need to have the inttoptr back
to the original pointer type.
The default value is 0, though I'm not sure if this is meaningful
in the real world. The real uses might always override the high
bit value with the attribute.