[flang] Improve intrinsic error messages when multiple signatures exist (#172099)
When an intrinsic has multiple table entries (e.g., BESSEL_JN has both
elemental and transformational forms), the error message selection logic
now prefers specific argument errors over generic "too many actual
arguments" messages.
For example, given:
real, dimension(10) :: xarray
integer :: n1, n2
print *, bessel_jn(n1, n2, xarray)
The transformational form of BESSEL_JN(n1, n2, x) requires x to be
scalar. Previously, flang reported "too many actual arguments" from the
elemental form (which only accepts 2 args), even though the
transformational form matched the argument count but failed on rank.
Now flang correctly reports "x= argument has unacceptable rank 1", which
[4 lines not shown]
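The selection rule described above can be sketched roughly as follows. This is a minimal Python illustration, not flang's implementation: `Signature`, `check`, `diagnose`, the rank-only check, and the message wording are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    """Hypothetical, simplified stand-in for one intrinsic table entry."""
    name: str
    max_args: int
    # Maps a 0-based argument position to the rank that argument must have.
    required_rank: dict


def check(sig, arg_ranks):
    """Return None on a match, else a pair (is_generic, message)."""
    if len(arg_ranks) > sig.max_args:
        return (True, "too many actual arguments")
    for pos, rank in sig.required_rank.items():
        if pos < len(arg_ranks) and arg_ranks[pos] != rank:
            return (False, f"argument {pos + 1} has unacceptable rank {arg_ranks[pos]}")
    return None


def diagnose(signatures, arg_ranks):
    """Try every entry; if none matches, prefer a specific error over a generic one."""
    errors = []
    for sig in signatures:
        err = check(sig, arg_ranks)
        if err is None:
            return None  # some entry accepted the call
        errors.append(err)
    specific = [msg for is_generic, msg in errors if not is_generic]
    return specific[0] if specific else errors[0][1]


# Two entries modelled on the BESSEL_JN example: the elemental form takes
# two arguments, the transformational form takes three with a scalar third.
BESSEL_JN = [
    Signature("elemental", 2, {}),
    Signature("transformational", 3, {2: 0}),
]

# Ranks of (n1, n2, xarray) from the example: scalar, scalar, rank 1.
message = diagnose(BESSEL_JN, [0, 0, 1])
print(message)
```

With both entries failing, the rank error from the three-argument entry is reported instead of the argument-count error from the two-argument entry.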
[OpenMP][CIR] Implement basic 'parallel' lowering + some clause infra (#172308)
This patch adds basic lowering for the OMP 'parallel' construct: it
creates an omp.parallel operation and tries to insert into its region,
ending with an omp.terminator operation. However, this patch doesn't
implement CapturedStmt (and I don't intend to do so at all), so there is
an NYI error when emitting a parallel region (plus it shows up in IR
as 'empty').
This patch also adds some infrastructure to 'lower' clauses. However, no
clauses are emitted yet: any attempt to lower a clause simply produces a
'not yet implemented' warning. The OMP clause visitor seems to
have had a bug with how it 'degraded' when a clause wasn't handled (it
would result in an infinite recursion if it wasn't supplied), so this
fixes that as well.
A followup patch or two may use this infrastructure to demonstrate how
to use it.
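The fallback behaviour described above (an unhandled clause degrades to a single 'not yet implemented' diagnostic rather than recursing) can be sketched as follows. This is only an illustration in Python, not the CIR code; `Clause`, `ClauseVisitor`, `ClauseEmitter`, and `visit_unhandled` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """Hypothetical stand-in for an OpenMP clause node."""
    kind: str


class ClauseVisitor:
    """Dispatches on the clause kind; kinds without a handler share one fallback."""

    def visit(self, clause):
        handler = getattr(self, "visit_" + clause.kind, None)
        if handler is None:
            # The fallback never re-dispatches on the same clause, so it
            # terminates even when a subclass supplies no handlers at all.
            return self.visit_unhandled(clause)
        return handler(clause)

    def visit_unhandled(self, clause):
        return None


class ClauseEmitter(ClauseVisitor):
    """Lowers no clauses yet; records a warning for each one it is given."""

    def __init__(self):
        self.diagnostics = []

    def visit_unhandled(self, clause):
        self.diagnostics.append(
            f"not yet implemented: OpenMP clause '{clause.kind}'"
        )


emitter = ClauseEmitter()
for clause in [Clause("num_threads"), Clause("private")]:
    emitter.visit(clause)
print(emitter.diagnostics)
```

In this sketch, supporting a clause later would only require adding a `visit_<kind>` method to the emitter; everything else keeps reporting the warning.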
[OpenMP][CIR] Implement 'barrier' lowering (#172305)
As my next patch showing how OMP lowering should work, here is a simple
construct implementation. Best I can tell, the 'barrier' construct just
results in an omp.barrier being emitted into the IR. This is our first
use of the omp dialect, though the dialect was already added in my last
patch.
[llvm-objdump] Support --mcpu=help/--mattr=help without -d (#165661)
Currently `--mcpu=help` and `--mattr=help` only produce help output when
disassembling. This patch specialises these cases to always print the
requested help.
If `--triple` is specified, the help text will be derived from the
specified target. Otherwise, it will be derived from the target of the
first input file.
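The target-selection rule above amounts to a simple precedence, sketched here in Python. `help_target` and its parameters are hypothetical names, not part of llvm-objdump, and the case with neither a triple nor an input file is not covered by this message, so the sketch just returns None there.

```python
def help_target(triple_option, input_file_triples):
    """Pick the target whose help text would be printed.

    triple_option: value of --triple, or None if it was not given.
    input_file_triples: target triples of the input files, in command-line order.
    """
    if triple_option:
        return triple_option  # an explicit --triple takes precedence
    if input_file_triples:
        return input_file_triples[0]  # otherwise use the first input file
    return None  # assumption: behaviour here is not described in the message


chosen = help_target(None, ["aarch64-unknown-linux-gnu", "x86_64-unknown-linux-gnu"])
print(chosen)
```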
Fixes: #150567
---------
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
Co-authored-by: James Henderson <James.Henderson at sony.com>
AMDGPU: Stop requiring afn for f32 rsq formation
We were checking for afn or !fpmath attached to the sqrt. We
are not trying to replace a correctly rounded rsqrt; we're replacing
the two correctly rounded operations with the contracted operation.
The net result is better precision, so contract on both instructions
should be sufficient. Both the contracted and uncontracted sequences pass
the OpenCL conformance test, and the contracted sequence has the lower
maximum error.
[AArch64][SVE] Select non-temporal instructions for unpredicated loads/stores with the nontemporal flag (#171261)
Add patterns to select SVE non-temporal load/store instructions for
unpredicated vector loads/stores with the `nontemporal` flag.
Previously, regular instructions were used for these cases.
Fixes #169034
[lld][MachO] Add --lto-emit-llvm command line option
This option will cause the linker to emit LLVM bitcode instead of an
object file. The implementation is similar to that of the corresponding
option in the ELF backend. This only works with LLD and will not work
with the gold plugin.
[libc++] Store the premerge runner images in the monorepo (#171443)
We need one canonical place to store the image used by the various sets
of libc++ CI runners. This is needed so that our run-buildbot-container
script can stay up-to-date, and for the pre-merge infrastructure to stay
up-to-date.
Previously, the images used by the premerge infrastructure were stored
in llvm-zorg, which made them less discoverable and more complicated to
update and keep synchronized.
fixup! [AArch64][llvm] Add intrinsics for SVE BFSCALE
Also test llvm.aarch64.sve.fscale.nxv8bf16() intrinsic
in streaming mode as well as non-streaming
[CLANG] Fixes the crash on the use of nested requirements in require expressions (#169876)
Fixes #165386
Nested requirements in requires-expressions must be constant
expressions. Previously, using an invented parameter in a nested
requirement caused a crash. Clang now emits a clear diagnostic and
recovers.
fixup! [AArch64][llvm] Add intrinsics for SVE BFSCALE
Change `HasSVEBFSCALE` to be correct. This now requires
+sve-bfscale in sve-intrinsics-fp-arith.ll. Also ensure
it is lowered to `bfscale` correctly.
Also run in both streaming and non-streaming mode in acle_sve_bfscale.c
[flang][OpenMP] Make function name more accurate, NFC (#172328)
Change `CountGeneratedLoops` to `CountGeneratedNests`, since it's really
the nests that are counted.