[flang][OpenMP] Move TargetOMPContext to shared FlangOMPContext (NFC) (#202677)
Moving the class to shared code makes it available for reuse by
forthcoming DECLARE VARIANT lowering without any functional change to
existing metadirective lowering.
[Flang] Reject keyword arguments in statement function calls (#198610)
**Problem**
Flang silently accepted keyword arguments in calls to statement
functions, violating F2018 C1535.
**Standard: F2018 §15.5.1 C1535**: In a reference to a procedure whose
interface is implicit at the point of the reference, the actual argument
shall not be a keyword argument.
Flang silently compiles the following code without giving an error such as
`Keyword argument 'x' at (1) is invalid in a statement function`:
```
program test
integer :: f1, x, c
f1(x) = x / 2
c = f1(x=10) ! Should be an error
[14 lines not shown]
[clang][PS5] Clang driver PS5 - pass the target CPU to lld. (#202924)
Forward the PS5 target CPU from the clang driver to lld as
`-plugin-opt=mcpu=znver2`, matching behavior of other platforms.
Most drivers call addLTOOptions to include LTO-related link options. That includes specifying mcpu. The PS5 driver doesn't yet call addLTOOptions. In time I hope we'll arrive at a point where we can refactor to use the same functionality. This is one step towards that.
---------
Co-authored-by: Edd Dawson <edd.dawson at sony.com>
[libc] Add the htons function family to netinet/in.h (#203028)
As required by POSIX.
I've used the merge_yaml_files functionality to avoid duplication.
Assisted by Gemini.
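As a rough illustration of what the 16-bit members of this family do, here is a minimal sketch. The helper names `host_to_network_16` and `network_to_host_16` are hypothetical and this is not the LLVM libc implementation; it only shows the POSIX-level behavior of converting between host byte order and network (big-endian) byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the libc code): returns a value whose in-memory
// representation is big-endian, i.e. network byte order, on any host.
inline uint16_t host_to_network_16(uint16_t host) {
  const unsigned char bytes[2] = {
      static_cast<unsigned char>(host >> 8),    // most significant byte first
      static_cast<unsigned char>(host & 0xFF)}; // least significant byte last
  uint16_t net;
  std::memcpy(&net, bytes, sizeof(net));
  return net;
}

// Hypothetical inverse: reads the two bytes in network order and rebuilds
// the host value.
inline uint16_t network_to_host_16(uint16_t net) {
  unsigned char bytes[2];
  std::memcpy(bytes, &net, sizeof(net));
  return static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

Going through the byte representation keeps the sketch independent of the host's endianness; a real implementation would more likely use a byte-swap builtin guarded by an endianness check.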
[OpenACC][flang] Emit NYI when unstructured loops are associated with OpenACC directives
When an unstructured loop is associated with a loop or a combined
directive, we emit an unstructured CFG for the loop's logic nested
within the OpenACC op. This effectively serializes the nested loop on
the device, which is not desirable. For now, emit NYIs while working on
a longer-term solution.
The NYI is restricted to the cases where the loop will be lowered with
`independent` parallelism semantics for the default device_type -- i.e.,
the user has explicitly promised the loop is parallel. This covers:
- combined `acc parallel loop`,
- standalone `acc loop` inside `acc parallel`,
- orphan `acc loop` inside a non-`seq` acc routine,
- explicit `independent` clause.
For `auto` (`acc kernels loop` and `acc loop` inside `acc kernels`) and
for `seq` (`acc serial loop`, `acc loop` inside `acc serial`, explicit
`seq`, or orphan inside a `seq` routine), the user has not made a
[4 lines not shown]
[flang][OpenACC] Don't hoist declare directive out of interface bodies (#202806)
Example:
```fortran
program main
real :: a(10, 60)
interface
subroutine compute(a)
real :: a(10, 60)
!$acc declare present(a)
end subroutine
end interface
call compute(a)
end program
```
In this code, the `!$acc declare` inside the interface body is hoisted
into the host program unit and lowered there, where its operand (the interface
[12 lines not shown]
[SPIR-V] Lower `select` instructions with aggregate operands (#201417)
Context: `SPIRVEmitIntrinsics` represents aggregate (array/struct) SSA
values as i32 value-ids, keeping the real type on the side for SPIR-V
emission. `preprocessCompositeConstants()` rewrites composite constant
operands into those value-ids.
A `select` takes its result type from its operands, so rewriting one arm
leaves the select with an aggregate result type but an i32 operand,
which is invalid. The exact failure mode depends: a composite-constant
arm tripped the verifier ("Select values must have same type as select
instruction"), while a non-constant arm (say a load) only became a
value-id later, in the visitor pass, at which point
`replaceMemInstrUses()` found a `select` among its users and hit an
unreachable.
I pushed two commits fixing this, one limited to my use case, another
more general:
[20 lines not shown]
[Demangle] Guard DEMANGLE_ABI and add missing annotation (#202920)
This updates the DEMANGLE_ABI annotation to only be defined if it is not
already defined. This is required to parse the Demangle headers with the
ids-check script.
In addition, this adds one missing DEMANGLE_ABI annotation.
This effort is tracked in #109483.
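The guard described above follows the usual "define only if not already defined" pattern. The sketch below is an assumption-laden illustration: the first `#define` simulates a tool that predefines the macro as empty, the visibility attribute is a placeholder rather than the actual expansion used by the Demangle headers, and `demoDemangleEntry` is a hypothetical function.

```cpp
#include <cassert>

// Simulates an external tool predefining the annotation (assumption: empty).
#define DEMANGLE_ABI

// Guarded definition: only takes effect when nothing defined the macro yet.
// The expansion here is a placeholder, not the one used in the real headers.
#ifndef DEMANGLE_ABI
#define DEMANGLE_ABI __attribute__((visibility("default")))
#endif

// Hypothetical annotated declaration and definition.
DEMANGLE_ABI int demoDemangleEntry();

int demoDemangleEntry() { return 42; }
```

With the guard in place, the predefined (empty) annotation is kept and the header still parses; without it, the second `#define` would redefine the macro with a different expansion.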
[flang][OpenMP] Model target in_reduction through map entries
Model omp.target in_reduction so the target body uses the mapped
map_entries block argument instead of a separate in_reduction entry
block argument.
The in_reduction operands remain on the op for host-side translation.
For the host-fallback path, the matching map block argument is redirected
to the pointer returned by __kmpc_task_reduction_get_th_data, so the
target body accumulates into the task reduction-private storage.
Flang lowering now relies on the implicit address-preserving map for the
target body binding, while task and taskloop keep their existing
in_reduction block-argument behavior.
Offload/device compilation is still diagnosed as not yet implemented, and
each target in_reduction variable must have a matching map_entries entry.
[flang][OpenMP] Lower target in_reduction for host fallback
Teach Flang lowering and MLIR OpenMP translation to carry
in_reduction through omp.target for the host-fallback path.
The translation looks up task reduction-private storage with
__kmpc_task_reduction_get_th_data and binds the target region's
in_reduction block argument to that private pointer, so uses inside the
region do not keep referring to the original variable.
The patch also preserves in_reduction operands in the TargetOp builder
path and ensures target in_reduction list items are mapped into the
target region when needed.
The device/offload-entry path remains diagnosed as not yet implemented.
[LoopFusion] Drop duplicate write-write dependence check (NFC) (#203173)
`dependencesAllowFusion()` re-tested every FC0-write vs FC1-write pair
in the second loop nest, duplicating the checks already done in the
first. Iterate only the remaining FC0-read vs FC1-write pairs; the set
of checked dependences (W0xW1, W0xR1, R0xW1) is unchanged.
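The loop restructuring can be sketched with a small model. This is not the LoopFusion code: `Candidate`, `pairsChecked`, and the string-named accesses are hypothetical stand-ins, and the flag `recheckWriteWrite` switches between the old shape (second nest re-tests write-write pairs) and the new one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Access = std::string;
using AccessPair = std::pair<Access, Access>;

// Hypothetical stand-in for a fusion candidate's memory accesses.
struct Candidate {
  std::vector<Access> reads;
  std::vector<Access> writes;
};

// Lists every (FC0 access, FC1 access) pair that would be tested.
std::vector<AccessPair> pairsChecked(const Candidate &FC0,
                                     const Candidate &FC1,
                                     bool recheckWriteWrite) {
  std::vector<AccessPair> checked;
  // First nest: each FC0 write against every FC1 write and every FC1 read.
  for (const Access &W0 : FC0.writes) {
    for (const Access &W1 : FC1.writes)
      checked.emplace_back(W0, W1);
    for (const Access &R1 : FC1.reads)
      checked.emplace_back(W0, R1);
  }
  // Second nest: each FC1 write against FC0 reads. The old shape also
  // re-tested FC0 writes here, repeating work from the first nest.
  for (const Access &W1 : FC1.writes) {
    if (recheckWriteWrite)
      for (const Access &W0 : FC0.writes)
        checked.emplace_back(W0, W1);
    for (const Access &R0 : FC0.reads)
      checked.emplace_back(R0, W1);
  }
  return checked;
}
```

Comparing the two shapes as sets of pairs shows the same coverage (W0xW1, W0xR1, R0xW1), with the new shape performing fewer individual checks.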
[libc++] Hoist <compare> outside the threads guard in <thread> (#202535)
The standard ([thread.syn]) mandates that <thread> include <compare> as
part of its synopsis. This is a standards-mandated dependency, not a
thread-feature dependency, so it should be visible regardless of
_LIBCPP_HAS_THREADS.
This matches how we handle standard-mandated includes elsewhere, see for
example #134877.
[mlir][OpenMP] Translate reductions on taskloop
Add LLVM IR translation for reduction and in_reduction clauses on omp.taskloop.context.
For taskloop reduction, emit the implicit taskgroup reduction setup and map each generated task to runtime-provided private reduction storage through __kmpc_task_reduction_get_th_data. For in_reduction, use the same runtime lookup path with a null descriptor to join an enclosing task reduction context.
Unsupported byref, cleanup, and two-argument initializer forms remain diagnosed.
Add MLIR translation tests for the supported taskloop reduction and in_reduction cases.
[RISCV] Set CostPerUse to 1 only when optimizing for size (#201501)
We saw some regressions caused by poor register allocation, because the
cost of registers beyond x8-x15 is higher. This is why `DisableCostPerUse`
was added in https://github.com/llvm/llvm-project/issues/83320.
In this PR, we change it to set `CostPerUse=1` only when optimizing
for size.
Code size increases by less than 0.1% in llvm-test-suite.
Reland emitc lower multi return functions (#203026)
Reland #200659 reverted by #202911.
Fixed GCC 7 func-to-emitc build: Use the adaptor operand types
when creating the multi-return struct type instead of relying on an
implicit conversion from ValueRange to TypeRange.
Failed buildbot:
https://lab.llvm.org/buildbot/#/builders/116/builds/29302
Assisted-by: Copilot
[LoongArch] Propagate demanded bits for CRC[C].W.{B,H}.W
CRC byte and halfword instructions only use the low 8 or 16 bits of
their data operand. Propagate these demanded-bit requirements through
SimplifyDemandedBitsForTargetNode() so redundant masking operations can
be removed during DAG combining.
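To see why a preceding mask becomes redundant, consider a software model of a byte-wise CRC step. This is only a sketch: `crcByteStep` is a hypothetical function using a generic reflected CRC-32 polynomial, and it mirrors the LoongArch instruction in one respect only, namely that just the low 8 bits of the data operand influence the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a byte-wise CRC update. Only the low
// 8 bits of `data` are consumed, which is the property the demanded-bits
// propagation relies on.
inline uint32_t crcByteStep(uint32_t data, uint32_t crc) {
  crc ^= data & 0xFFu; // bits 8..31 of `data` are never read
  for (int i = 0; i < 8; ++i)
    crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  return crc;
}
```

Because the upper 24 bits are ignored, `crcByteStep(x & 0xFF, c)` and `crcByteStep(x, c)` always agree, so an explicit `and` feeding the data operand can be dropped.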
[APInt] Provide sqrtFloor (floor of square root) instead of sqrt (rounded) (#197406)
This simplifies both the implementation and the only in-tree user.
I changed the name to avoid silently changing the behavior of an
existing function that might have out-of-tree users.
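The difference between a floored and a rounded square root can be illustrated on plain 64-bit integers. The sketch below is not the APInt implementation; `sqrtFloor64` is a hypothetical helper using the classic digit-by-digit method, chosen because it avoids floating point and overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest r such that r * r <= n, computed one
// result bit at a time (digit-by-digit method in base 4).
inline uint64_t sqrtFloor64(uint64_t n) {
  uint64_t result = 0;
  uint64_t bit = uint64_t{1} << 62; // highest power of four in 64 bits
  while (bit > n)
    bit >>= 2; // skip powers of four larger than n
  while (bit != 0) {
    if (n >= result + bit) {
      n -= result + bit;
      result = (result >> 1) + bit;
    } else {
      result >>= 1;
    }
    bit >>= 2;
  }
  return result;
}
```

For an input of 15 the floor is 3, whereas rounding the real square root (about 3.87) to the nearest integer would give 4, which is why a rename makes the change in semantics visible to callers.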