[FIR] Route embox + projected complex slice through shapeVec (#205042)
When the array_coor base is a fir.embox with a projected complex %re/%im
slice, take the shapeVec path instead of the descriptor (fir.box_dims)
path. The descriptor path iterates source-rank dims while querying the
rank-reduced embox result box, which miscompiles slices that collapse
dims (e.g. complex(:,k)%re). For embox-derived boxes the underlying
storage is contiguous, so the shape-derived path is both correct and
the natural place to encode that a static shape is available. Non-embox
boxes (rebox, assumed-shape) still go through fir.box_dims.
Co-Authored-By: Claude Sonnet 4.6 <noreply at anthropic.com>
[AArch64][SVE] Use ADD/ADR instead of MUL/MLA for x*N (#198566)
Avoid `MUL`/`MLA` for all-active multiplies by small constants when
cheaper `ADD`/`ADR` sequences are available.
Vector multiplication (int32_t/uint32_t base types) by 2, 3, 5, or 9 can
be done with ADD (for 2) or ADR (for 3, 5, 9).
Similarly, operations of the form a + x * {1,2,4,8} can use ADR.
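The arithmetic behind the rewrite can be checked with a small model. This is
only a sketch of the identities, not the AArch64 lowering itself: it assumes
ADR computes base + (offset << shift) with a shift of 0 to 3, models one
32-bit lane with wrapping arithmetic, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model a single 32-bit lane with wrapping arithmetic


def mul_via_shift_add(x: int, n: int) -> int:
    """Compute x * n for n in {2, 3, 5, 9} as x + (x << k), wrapped to 32 bits."""
    shift = {2: 0, 3: 1, 5: 2, 9: 3}[n]  # n == 1 + 2**shift
    return (x + (x << shift)) & MASK32


def mul_add_via_shift_add(a: int, x: int, n: int) -> int:
    """Compute a + x * n for n in {1, 2, 4, 8} as a + (x << k), wrapped to 32 bits."""
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[n]  # n == 2**shift
    return (a + (x << shift)) & MASK32


# Compare against a plain wrapping multiply for a few lane values.
for x in (0, 1, 7, 0x7FFFFFFF, 0xFFFFFFFF):
    for n in (2, 3, 5, 9):
        assert mul_via_shift_add(x, n) == (x * n) & MASK32
```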
[SCEV] Infer addrec nowrap flags during range analysis (#202964)
When we're computing the range of the addrec, we already have to reason
about whether it wraps, so we may as well determine the nowrap flags at
the same time.
This is more precise than the previous logic that took the addrec range
and checked whether adding a step to it does not wrap. For example, an
`{0,+,1}` addrec with a full range can still be non-wrapping.
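The full-range example can be illustrated with a toy 8-bit model. This is a
sketch under stated assumptions, not the SCEV implementation: the helper names
are hypothetical, only the unsigned case is modelled, and the backedge-taken
count is assumed to be known exactly.

```python
BITS = 8
MOD = 1 << BITS


def addrec_values(start: int, step: int, backedge_taken_count: int) -> list:
    """Values taken by {start,+,step} over backedge_taken_count + 1 iterations."""
    return [(start + i * step) % MOD for i in range(backedge_taken_count + 1)]


def range_plus_step_fits(values: list, step: int) -> bool:
    """Range-based check: does (max of the range) + step still fit?"""
    return max(values) + step < MOD


def last_value_fits(start: int, step: int, backedge_taken_count: int) -> bool:
    """Direct check: does the last value actually taken fit without wrapping?"""
    return start + backedge_taken_count * step < MOD


values = addrec_values(0, 1, 255)
# The addrec covers the full unsigned range 0..255, so the range-based check
# gives up, even though no value the addrec takes ever wraps.
assert sorted(values) == list(range(256))
assert not range_plus_step_fits(values, 1)
assert last_value_fits(0, 1, 255)
```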
Note that I removed some assertions in the SCEV printer that predicated
exit counts actually have predicates. Due to SCEV's query order
dependence, this can happen, and it could already happen prior to this
change; see for example https://llvm.godbolt.org/z/cWK1MMEqv. While this
indicates suboptimal results, it's not a bug, and we should not assert.
Fixes https://github.com/llvm/llvm-project/issues/200788.
[flang][OpenMP] Check that IF clause applies to at most one leaf (#205164)
This also allows placing the IF clause in the "allowedClauses" set for
all directives, instead of having it in "allowedOnceClauses" for some
directives and in "allowedClauses" for others.
The emitted diagnostic will show which constituent has multiple IF
clauses applying to it:
```
if.f90:4:35: error: At most one IF clause can apply to each directive constituent
!$omp & if(target teams: x > 0) if(teams distribute: y > 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
if.f90:4:11: Previous IF clause applying to the TEAMS constituent
!$omp & if(target teams: x > 0) if(teams distribute: y > 0)
^^^^^^^^^^^^^^^^^^^^^^^
```
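The check itself can be sketched as follows. This is an illustration only, not
the flang implementation: it assumes each IF clause has already been resolved
to the list of leaf constituents it applies to, and the function name and data
layout are hypothetical.

```python
from collections import defaultdict


def find_if_conflicts(if_clause_leaves):
    """Map each leaf that more than one IF clause applies to onto the
    indices of those clauses."""
    clauses_per_leaf = defaultdict(list)
    for index, leaves in enumerate(if_clause_leaves):
        for leaf in leaves:
            clauses_per_leaf[leaf].append(index)
    return {leaf: idxs for leaf, idxs in clauses_per_leaf.items() if len(idxs) > 1}


# The two clauses from the diagnostic above both apply to TEAMS.
clauses = [["target", "teams"], ["teams", "distribute"]]
print(find_if_conflicts(clauses))  # prints {'teams': [0, 1]}
```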
[AArch64][TableGen] Define ZA, ZT0 and FPMR memory defvars (#154144)
Introduce TableGen defvars for the inaccessible memory effects used to
model accesses to ZA, ZT0 and FPMR in IntrinsicsAArch64.td.
This is a preparatory cleanup for a follow-up patch that will replace
these uses of InaccessibleMem with target-specific memory locations.
Other uses of inaccessible memory in the file are left unchanged because
they are unrelated to ZA, ZT0 or FPMR.
This preserves the existing memory effects. In particular, intrinsics
that currently access both argument memory and inaccessible memory keep
the same ArgMem/InaccessibleMem read/write modelling.
---------
Co-authored-by: Paul Walker <paul.walker at arm.com>
Remove unused variables in the monorepo (#204994)
https://github.com/llvm/llvm-project/pull/203084 adds diagnostics about
unused variables to the libc++ containers. This patch is the fallout
from the projects I tried to build with it.
[runtimes][NFC] Re-indent shared library blocks
Re-indent the shared library target blocks that were wrapped in
if(<runtime>_SUPPORTS_SHARED_LIBRARY) in the previous commit. This is a
whitespace-only change split out from the functional change to keep that diff
minimal and reviewable.
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
Reapply "runtimes: Pass CMAKE_SYSTEM_NAME based on target triple" (#205133)
This reverts commit 08c728e8528c9584bc1fe0f46bbdd657e368be91.
Reapply after runtimes build fixes on platforms without shared libraries.
[flang][FIR] add canonicalization pattern for fir.if returning OPTIONAL (#205353)
When forwarding OPTIONAL in calls, lowering generates patterns that look
like:
```
%present = fir.is_present %var : (T) -> i1
%if_result = fir.if %present -> (T) {
fir.result %var : T
} else {
%absent = fir.absent T
fir.result %absent : T
}
```
This specific pattern is a no-op and `%var` can be used directly. The
lowering logic that generates such patterns lives in non-trivial
compiler code that also has to handle scenarios where the code inside
the fir.if is more complex. Add a FIR pattern to canonicalize such code
to help with later analysis (like aliasing).
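The matching condition can be sketched on a toy representation. This is not
the MLIR pattern: the `IfOp` class and its tuple encoding are hypothetical, and
the model assumes each region contains nothing but the op producing its result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IfOp:
    condition: tuple    # e.g. ("is_present", "%var")
    then_result: tuple  # e.g. ("value", "%var")
    else_result: tuple  # e.g. ("absent",)


def fold_optional_forwarding(op: IfOp) -> Optional[str]:
    """Return the value that can replace the op's result, or None if the
    op does not match the no-op forwarding pattern."""
    if len(op.condition) != 2 or op.condition[0] != "is_present":
        return None
    var = op.condition[1]
    if op.then_result == ("value", var) and op.else_result == ("absent",):
        return var
    return None


forwarding = IfOp(("is_present", "%var"), ("value", "%var"), ("absent",))
assert fold_optional_forwarding(forwarding) == "%var"
```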
[libc] Add IPv4 socket options and related structs (#204787)
This patch adds struct ip_mreq, ip_mreq_source, ip_mreqn, ip_opts, and
ip_msfilter to <netinet/in.h>, along with IP level socket option macros
(IP_TOS, IP_TTL, IP_ADD_MEMBERSHIP, etc.).
I add basic unit tests verifying the size and member offsets of the new
structures against standard layout expectations, mainly to make sure
that the files are used /somewhere/.
Assisted by Gemini.
[runtimes] Don't create shared library targets when unsupported
On platforms that don't support shared libraries (e.g. CMAKE_SYSTEM_NAME of
"Generic", used for GPU and other baremetal targets), CMake's
Platform/Generic.cmake sets the global TARGET_SUPPORTS_SHARED_LIBS property to
FALSE. Under CMP0164's OLD behavior (the default, since the runtimes set
cmake_minimum_required(3.20)), CMake silently demotes SHARED library targets to
STATIC archives. libcxx, libcxxabi and libunwind always create their shared
target, so after demotion both the shared and static targets emit e.g.
"libc++abi.a" and Ninja fails with "multiple rules generate ...".
Rather than papering over the collision with a distinct output name, skip
creating the shared library targets entirely when the platform does not support
them, gating on the TARGET_SUPPORTS_SHARED_LIBS property (left undefined on
platforms that do support shared libraries). The few consumers of the shared
targets are guarded with TARGET checks so they fall back to the static library
or are skipped.
Also set policy CMP0164 to NEW so that any future unguarded
[10 lines not shown]
[flang][OpenMP] Lower target in_reduction for host fallback
Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.
Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.
Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
[compiler-rt][ARM] Fix underflow handling in new divdf3.S (#204784)
The code which calculates the 'errsign' parameter to pass to
`__compiler_rt_dunder` was wrong in two ways. It calculated the value
with the wrong sign, and also in the wrong register, r12 rather than r2!
In this code's original context, both of those things made sense (the
'dunder' function had a nonstandard ABI). Somehow none of the existing
test cases detected the problem.
We found this bug in a test case downstream that only failed big-endian
(because that changes which half of the denominator mantissa is left in
r2 to be accidentally used as errsign). However, the new test cases here
are designed to detect the failure in both endiannesses.
[libc] Refactor qsort code (#198781)
This patch makes the following changes:
- Refactor the internal sorting functions to reduce code duplication.
- Move the test machinery written for `qsort_r` to a shared place.
These changes are made in anticipation of the introduction of Annex K's
`qsort_s`. This function shares most of its semantics with `qsort_r`,
so most of the testing logic can be shared between the two.
Besides, `qsort`, `qsort_r` and `qsort_s` are all very similar, so we
can attempt to reduce duplication a bit more.
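The kind of sharing described here can be sketched as one internal routine
with thin adapters for each entry point. This is an illustration only, not the
libc code: the internal helper name is hypothetical, and Python's built-in
sort stands in for the actual sorting algorithm.

```python
import functools


def _sort_in_place(items, compare):
    """Single internal routine; every public entry point funnels into it."""
    items.sort(key=functools.cmp_to_key(compare))


def qsort(items, cmp):
    """Two-argument comparator, as in qsort."""
    _sort_in_place(items, cmp)


def qsort_r(items, cmp, arg):
    """Comparator with an extra context argument, as in qsort_r."""
    _sort_in_place(items, lambda a, b: cmp(a, b, arg))


data = [3, 1, 2]
qsort_r(data, lambda a, b, sign: sign * (a - b), -1)
assert data == [3, 2, 1]
```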
[ObjectYAML][NFC] Derive BBAddrMap section size from the CBA offset (#204056)
Add the CBA offset delta to sh_size once at the end instead of after
each write.
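The idea can be sketched as follows. This is not the ObjectYAML code: an
`io.BytesIO` buffer stands in for the CBA, and the function name and entry
contents are hypothetical.

```python
import io


def write_entries(buf, entries) -> int:
    """Write all entries and return the section size as one offset delta."""
    start = buf.tell()
    for entry in entries:
        buf.write(entry)  # no per-write size bookkeeping
    return buf.tell() - start


buf = io.BytesIO()
buf.write(b"\x00" * 16)  # content of earlier sections
sh_size = write_entries(buf, [b"\x01\x02", b"\x03", b"\x04\x05\x06"])
assert sh_size == 6
```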
[AArch64] Run cleanup one final time after peephole (#199711)
The cleanup pass is lightweight, and it should always be the last SSA
pass because peephole can end up making some instructions dead.
[LoopFusion][NFC] Share fusion tail between guarded and unguarded paths (#205492)
`performFusion()` and `fuseGuardedLoops()` carried two
character-for-character identical tails: header-PHI migration plus latch
rewiring, and the SCEV-forget / block-merge / latch-merge finalization.
Extract them into `rewireFusedHeaderPHIsAndLatches()` and
`finalizeFusedLoop()` and call both from each path.
[flang][PFT-to-MLIR] Wrap unstructured Fortran constructs in scf.execute_region
Extend the PFT-to-MLIR (HLFIR/FIR) lowering so unstructured DO and IF
constructs are emitted inside scf.execute_region, hiding their multi-block
CFG behind a single op. OpenACC and OpenMP lowerings that reject
multi-block content (e.g. the "unstructured do loop in combined acc
construct" TODO in OpenACC.cpp) now see a structured op instead.
Flag: -mmlir --wrap-unstructured-constructs-in-execute-region (default on).
An evaluation is wrappable iff all of the following hold:
* wrap flag on
* eval is parser::DoConstruct or parser::IfConstruct
* eval.isUnstructured
* branchesAreInternal(eval) -- every controlSuccessor in the subtree
targets a nested eval or the constructExit
* !hasIncomingBranch(eval) -- no outside eval branches into the body
(PFT's synthetic IfConstruct around `if(c) goto X` absorbs label
[27 lines not shown]