[flang][openacc] add extension which accepts multiple names in an OpenACC routine directive (#200296)
This PR adds an extension which allows one or more function names in a
single named routine directive. This is treated as multiple named
routine directives with the same clauses. The bind clause is forbidden.
The empty list of names isn't accepted. Routine clauses are stable under
unparsing.
This PR tests Parsing, Unparsing, Semantics, and Lowering.
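The expansion described above can be modeled outside the compiler. The following is only an illustrative sketch, not flang code: the helper name `expand_routine` and the string form of the directives are hypothetical, and it assumes clauses are passed as plain strings.

```python
def expand_routine(names, clauses):
    """Model one multi-name routine directive as several single-name ones.

    Hypothetical helper for illustration; flang does this on its parse tree.
    """
    if not names:
        # The commit states that an empty list of names is not accepted.
        raise ValueError("routine directive needs at least one name")
    if any(clause.startswith("bind") for clause in clauses):
        # The commit states that the bind clause is forbidden here.
        raise ValueError("bind clause is not allowed with a name list")
    suffix = " ".join(clauses)
    # Every name receives the same clauses.
    return [f"!$acc routine({name}) {suffix}".rstrip() for name in names]
```

For example, `expand_routine(["foo", "bar"], ["seq"])` yields one `!$acc routine(...) seq` line per name.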
[LV] Vectorize early exit loops with stores using masking (#178454)
This is an alternative approach to vectorizing early exit loops with
stores that avoids needing to add an extra check block. This is a
fairly straightforward approach that should work on vector ISAs
supporting masked memory ops.
The basic approach is to create a mask covering all lanes _before_ any
exiting lane, using cttz.elts and active.lane.mask (which sets all lanes
to true if the uncountable exit wasn't taken). If the uncountable exit
was taken, then there will still be one scalar iteration left to perform
after the vector loop, which will also handle which exit block we should
branch to.
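A scalar model of the mask computation may make this concrete. This is a sketch, not the vectorizer's code: the function names are stand-ins for the intrinsics named above, vectors are modeled as Python lists with lane 0 first, and the lane mask base is assumed to be 0.

```python
def cttz_elts(exit_conds):
    """Number of false lanes before the first true lane.

    Returns the vector length when no lane is true.
    """
    for lane, taken in enumerate(exit_conds):
        if taken:
            return lane
    return len(exit_conds)


def active_lane_mask(base, limit, vf):
    """Lane i is active iff base + i < limit."""
    return [base + i < limit for i in range(vf)]


def store_mask(exit_conds):
    """Mask covering every lane strictly before the first exiting lane."""
    first_exit = cttz_elts(exit_conds)
    return active_lane_mask(0, first_exit, len(exit_conds))
```

With `exit_conds = [False, False, True, False]` only lanes 0 and 1 are enabled, so the exiting iteration itself is left for the scalar loop. With no exiting lane, every lane is enabled.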
We no longer need to advance exit conditions in the vector body to the
next iteration (compared to the other PR), though we still need to move
the recipes needed to generate the exit condition (depending on which
memory operations are first in the loop).
[56 lines not shown]
[AArch64][llvm] Restrict luti6 (4 regs, 8-bit) to 0 <= Zn <= 7
The `luti6` instruction (table, four registers, 8-bit) should only
allow `0 <= Zn <= 7`, since there's only 3 bits. It actually allows:
```
luti6 { z0.b - z3.b }, zt0, { z8 - z10 }
```
which produces a duplicate encoding to the following:
```
luti6 { z0.b - z3.b }, zt0, { z0 - z2 }
```
Fix tablegen to ensure Zn is only allowed in correct range of 0 to 7.
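The aliasing follows from the width of the register field. As a sketch that models only that field (the helper names are hypothetical, and storing the register number directly in the field is an assumption consistent with the duplicate encoding reported above):

```python
ZN_FIELD_BITS = 3  # from the commit message: "there's only 3 bits"


def encode_zn(reg_index):
    """Keep only the bits that fit in the Zn field."""
    return reg_index & ((1 << ZN_FIELD_BITS) - 1)


def is_valid_zn(reg_index):
    """Range check corresponding to the restriction 0 <= Zn <= 7."""
    return 0 <= reg_index < (1 << ZN_FIELD_BITS)
```

Here `encode_zn(8)` equals `encode_zn(0)`, which is why `z8` has to be rejected rather than silently truncated.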
[CIR][AArch64] Lower vfmaq_lane_v and vfma_laneq_v (#197084)
Lower BI__builtin_neon_vfmaq_lane_v and BI__builtin_neon_vfma_laneq_v in
CIR.
This handles the covered vfmaq_lane_* and vfma_laneq_* ACLE wrappers by
bitcasting operands to the expected types, selecting the requested lane
from the lane source operand, and emitting fma through
emitCallMaybeConstrainedBuiltin.
For vfmaq_lane_v, the selected lane is splatted with emitNeonSplat.
For vfma_laneq_v, the lane is selected from the wider lane source; the
f64 case extracts the scalar lane before emitting scalar fma.
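As a reference for the intended semantics, here is a scalar model of the two forms. It is a sketch rather than the CIR lowering: vectors are Python lists, the function names mirror the builtins but are plain helpers, and `a + b * s` is not a fused operation, so rounding can differ from a real fma.

```python
def vfmaq_lane(a, b, v, lane):
    """a[i] + b[i] * v[lane], where v is the narrower lane source."""
    splat = [v[lane]] * len(a)  # splat the selected lane across all lanes
    return [ai + bi * si for ai, bi, si in zip(a, b, splat)]


def vfma_laneq(a, b, v, lane):
    """a[i] + b[i] * v[lane], where v is the wider lane source."""
    selected = v[lane]  # lane picked from the wider vector
    return [ai + bi * selected for ai, bi in zip(a, b)]
```

For example, `vfma_laneq([1.0, 2.0], [3.0, 4.0], [0.5, 1.0, 2.0, 4.0], 3)` multiplies by lane 3 (4.0) and gives `[13.0, 18.0]`.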
Neighboring scalar lane/laneq wrappers and other out-of-scope forms
remain explicit NYI cases.
Tests are moved into the existing CIR-enabled fused multiply files under
clang/test/CodeGen/AArch64/neon/, reusing upstream LLVM checks where
[3 lines not shown]
[AMDGPU] Reject named single register inline asm constraints for wider types (#200771)
A named single register constraint like `={v0}` was silently accepted
for an i64 result, binding it to one 32-bit register.
Reject scalars larger than 32 bits as well.
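The rule can be sketched as a width check. This is an illustrative model only, not the backend's implementation: `accepts` is a hypothetical helper, and the regular expression covers just the simple `{v0}`-style names, treating every other constraint form as out of scope.

```python
import re

REG_BITS = 32  # one named register holds 32 bits, per the commit message

# Matches a single named register such as "{v0}" or "={s5}",
# but not a register range such as "{v[0:1]}".
_SINGLE_REG = re.compile(r"^=?\{[vsa]\d+\}$")


def accepts(constraint, type_bits):
    """Reject a single named register bound to a value wider than it."""
    if _SINGLE_REG.match(constraint):
        return type_bits <= REG_BITS
    return True  # other constraint forms are not modeled in this sketch
```

Under this model `accepts("={v0}", 64)` is false, while a 32-bit value with the same constraint is still accepted.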
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function, by removing a bunch
of workarounds, and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
Also makes the explicit specialization AST nodes stop abusing the template
parameter lists by storing their own template parameter list, creating a
dedicated field for them, similar to partial specializations.
Reapply "[clang] remove lots of "innocuous" addrspacecasts" (#200427)
Reapply #197745, with an additional commit to undo a small part of the
first commit, pending further analysis of alternatives to that part of it.
In particular, make the `agg.tmp` (CreateAggTemp) values keep using the
declared expression type of the RValue. This is indeed probably
sensible, since it lets Sema influence this via the expression type,
though it runs into some issues where some expression types (notably for
any load) haven't had one of the equivalent functions
getNonLValueExprType/DeduceAutoType/getUnqualifiedType called on them,
so they are bringing along additional annotation baggage which doesn't
apply to the rvalue when turned into a temporary (see comments in
getNonLValueExprType for relevant part of C++ standard). This in turn is
also rarely relevant, since inventing temporaries isn't often allowed
in this part of the pipeline (it'd require a move constructor) so the
LValue Dest already provides the type and the RValue type is ignored.
However, it does affect a single test (which loads a global but ignores
the result) and so this adds an extra `getUnqualifiedType` call to
[6 lines not shown]
[flang][OpenMP] Store DECLARE_TARGET information in WithOmpDeclarative (#201103)
This will be used to emit DECLARE_TARGET directives into module files.
When a symbol appears in DECLARE_TARGET, the OmpDeclareTarget flag will
be set on it (this includes procedures containing a DECLARE_TARGET
without arguments or clauses). The set of accompanying clauses will be
stored in the associated details, in the WithOmpDeclarative mixin. The
mixin was added to ObjectEntityDetails, ProcEntityDetails, and
CommonBlockDetails.
The design goal was to be able to reconstruct the appropriate
DECLARE_TARGET directive for individual symbols for the purpose of emitting it
in a module file. Simply storing and then unparsing the AST node may
include symbols that should not be emitted.
Additionally, refactor the WithOmpDeclarative printing code for reuse in
symbol dumping for debugging, and for printing clause sets.
[Matrix][HLSL] Add codegen support for Matrix Layout keywords (#198887)
Fixes #192262
- Wrap Matrix Type in a row or column major layout attribute
- Add Helper to know which Matrix Layout to apply in codegen or check
for in Sema
- Remove the Decl Attribute and only store on the type.
Assisted by Claude Opus 4.7
[Dexter] Add basic result evaluation for structured scripts
This patch adds evaluation for structured scripts, completing the features
required to run simple Dexter tests using structured scripts. The basic
output from these evaluations is a list of named metrics aggregating the
results of evaluating !value nodes. The verbose output gives a per-step
summary of the results for each expect node active at that step.
Most of the new functionality is in the evaluation/ dir, which has also
absorbed some functionality previously stored in the
ScriptDebuggerController for matching !where nodes to a debugger StepIR,
as this is logic which is common to both managing a debugger session and
evaluating the end result.
[Flang-RT] Disable tests by default without modules (#201311)
With #201297, flang-rt-mod is required for running tests. Disable tests
by default if module files are not built.
[Transforms] Delete identical poison tests (NFC) (#201349)
These are now bit-identical to the original tests:
- llvm/test/Transforms/InferAddressSpaces/AMDGPU/old-pass-regressions.ll
- llvm/test/Transforms/InterleavedAccess/AArch64/interleaved-accesses.ll