[Flang][MIF] Adding support of intrinsics with coarray argument (#192944)
Add support for intrinsics that query a coarray argument:
- Add lowering and MIF dialect operations for UCOBOUND, LCOBOUND,
COSHAPE and IMAGE_INDEX
- Add support for a coarray argument to THIS_IMAGE in the MIF dialect (and
its lowering)
---------
Co-authored-by: Dan Bonachea <dobonachea at lbl.gov>
Co-authored-by: jeanPerier <jean.perier.polytechnique at gmail.com>
[clang-scan-deps] Add scan-deps-filter.py test helper to filter full output (#206758)
Add a helper script, which projects `clang-scan-deps` experimental-full
JSON down to a chosen set of fields, plus a `%scan-deps-filter` lit
substitution. A bare key (e.g. `file-deps`) matches that key at any
depth. A dotted path (e.g. `modules.command-line`) is anchored from the
document root to disambiguate keys when relevant.
This lets tests assert only on the fields they care about instead of
`CHECK`ing the whole object, which otherwise breaks whenever an
unrelated field is added/modified, and avoids gating emission behind
awkward per-field flags.
Migrate modules-full-by-mod-name.c as a first example.
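To illustrate the two selector forms described above, here is a minimal
Python sketch. It is not the actual `scan-deps-filter.py`; the function
name `project`, the treatment of lists as transparent when matching dotted
paths, and the dropping of empty containers are all assumptions made for
illustration.

```python
def _is_empty(value):
    return value is None or value == {} or value == []


def project(doc, selectors):
    """Keep only the selected fields of a JSON-like document (illustrative only)."""
    # A bare key matches at any depth; a dotted path is anchored at the root.
    bare = {s for s in selectors if "." not in s}
    dotted = {tuple(s.split(".")) for s in selectors if "." in s}

    def walk(node, path):
        if isinstance(node, list):
            # Assumption: list indices are not part of the path.
            kept = [walk(item, path) for item in node]
            return [k for k in kept if not _is_empty(k)]
        if not isinstance(node, dict):
            return None  # unselected scalars are dropped
        out = {}
        for key, value in node.items():
            here = path + (key,)
            if key in bare or here in dotted:
                out[key] = value  # selected: keep the whole subtree
                continue
            sub = walk(value, here)
            if not _is_empty(sub):
                out[key] = sub
        return out

    return walk(doc, ())
```

With a document that has `command-line` under both `modules` and
`translation-units`, the selector `modules.command-line` keeps only the
former, while the bare selector `file-deps` keeps every `file-deps` entry
wherever it appears.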
Assisted-by: Claude Opus 4.8
[libc++] Move threading and random device config into <__configuration/platform.h> (#206262)
These are platform-specific configuration options, so they should live
in `<__configuration/platform.h>`.
clang/AMDGPU: Fix double linking opencl libs with --libclc-lib
Noticed by inspection. If an explicit --libclc-lib flag is used,
do not also link the ROCm device libs, which contain
different implementations of the same OpenCL symbols.
Co-Authored-By: Claude <noreply at anthropic.com>
clang/AMDGPU: Remove driver restriction on --gpu-max-threads-per-block
Previously this flag was only handled for HIP, and otherwise produced an
unused argument warning. cc1 has a custom warning that the argument isn't
supported, but in practice it was unreachable because the argument was
never forwarded. Add a test for that previously untested warning, and use
a simpler method for forwarding the flag to cc1.
clang/AMDGPU: Merge toolchain subclasses
Simplify the toolchain implementations by collapsing
them into one. Previously we had a confusing split. The
AMDGPUToolChain base class implemented much of the base
support. It was subclassed by ROCMToolChain, which would
have been more accurately described as the offloading subclass.
That was further subclassed into HIP and OpenMP specific subclasses.
Deleting those two is the important part of this change. There was
code duplication, and features arbitrarily handled in one but not
the other. The offload kind is passed in almost everywhere if you
really need to know the original language. However, I consider
this an antifeature, and it is really poor QoI to have the HIP
and OpenMP toolchains behave differently in any way. The platform
should be consistent and the driver behaviors should not depend
on the language.
There is additional mess in the handling of spirv, which this
[9 lines not shown]
Revert "Disable RelLookupTableConverter on AArch64" (#207046)
Reverts llvm/llvm-project#204669
https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#code-models
says that text + rodata should be <2GB on AArch64 for the small code
model, so we should be able to enable RelLookupTableConverter on AArch64
small code model.
With #205963, we now properly diagnose overflows, rather than silently
truncating and miscompiling.
[lldb] Move RegisterFlags from Target to Utility (#206861)
I'm doing this so that I can move RegisterInfo from
`lldb-private-types.h` to lldbUtility. It currently has a `RegisterFlags
*` field, so having it sit in lldb-private-types.h masks the actual
layering of our data types.
I considered moving RegisterInfo into Target, but RegisterValue (in
lldbUtility) uses RegisterInfo directly. Because RegisterFlags has no
internal dependencies, it seemed better to sink that instead.
[AMDGPU] Fold constant offsets into named barrier addresses (#205216)
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. `s_barrier_signal_var` on a GEP'd
named barrier now selects the immediate form, matching a bare global and
GlobalISel.
[lit] Truncate process output to 10 kiB (#206355)
The output of processes is transformed multiple times:
1. All non-piped/redirected output of processes is collected
(communicate() for the last process of the pipe, read() for all
previous.)
- It's not good that we *collect* the entire output at all (frequent
realloc+memcpy for large buffers) and I'd rather not have a
possibly large output stored in Python at all.
2. The output is converted into strings (memcpy/utf-8 decode) and stored
in the results list of executeScriptInternal.
3. executeScriptInternal builds the debug output combining all these
stdout/stderr.
- It performs a lot of `out += ...`, which allocates (malloc+memcpy)
a new string every time. There are many of these concatenations.
4. The combined debug output is returned (together with other things) to
_runShTest, which determines whether the test passed, executing the
test multiple times if necessary. It also string-formats the output.
[23 lines not shown]
[NFC][AMDGPU] Use SIInstrFlags predicates in MCA (#206761)
Replace raw TSFlags accesses with SIInstrFlags predicate calls in
AMDGPUCustomBehaviour.
Part of a series following the introduction of SIInstrFlags predicates.
[mlir][sme] Update the multi-tile e2e example (#202979)
These changes enable hoisting of the accumulator load/store operations
out of the K loop.
Many thanks to @steplong for identifying the missing steps (see #201562)
[llvm][SplitModuleByCategory] Fix infinite loop on cyclic global uses (#206862)
The dependency graph construction in `addUserToGraphRecursively` walked
the users of globals without a visited set. A cycle in the use graph
caused it to push the same users forever, hanging the splitter.
Track visited users so each is processed once.
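The general technique can be sketched in Python as follows. This is an
illustration of a visited-set traversal, not the C++ code in
`addUserToGraphRecursively`; the function name `collect_users` and the
dictionary-based use graph are hypothetical.

```python
def collect_users(root, users_of):
    """Return every node reachable from root through users_of, each once."""
    visited = set()
    order = []
    worklist = [root]
    while worklist:
        node = worklist.pop()
        if node in visited:
            # Without this check, a cycle would re-queue the same users forever.
            continue
        visited.add(node)
        order.append(node)
        worklist.extend(users_of.get(node, ()))
    return order
```

For a cyclic graph such as `{"g1": ["f", "g2"], "g2": ["g1"]}`, the
traversal terminates after visiting `g1`, `g2` and `f` once each.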
Co-Authored-By: Claude
Handle the case where the ISA we find when looking up a method implementation has masked bits (#206864)
We need to canonicalize these since we look them up, and the PointerISA
is the right thing to use since it actually points at the class.
I can't write a test for this because ObjC mostly uses the masks for
Swift/ObjC interop. I will add a test on the Swift fork.
[lld] Make R_AARCH64_PREL32/16 only signed ints (#205963)
After https://github.com/ARM-software/abi-aa/pull/401, these are defined
to be only signed 32/16 bit ints rather than both signed and unsigned.
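As a rough illustration of what the narrower range means, the Python
sketch below compares a signed-only check with a check that accepts
either signed or unsigned values. The helper names are hypothetical, and
the exact bounds of the older, wider check are an assumption rather than
a quote from lld.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def fits_signed_or_unsigned(value, bits):
    """Assumed older behaviour: accept the union of signed and unsigned ranges."""
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1
```

Under these assumptions, a 32-bit value of `0x80000000` passes the wider
check but fails the signed-only one, so it would now be diagnosed as an
overflow.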
Assisted-by: Gemini
[flang][OpenMP] Semantic checks for metadirective loop nests
A loop-associated metadirective variant (`do`, `simd`, ...) is only
resolved during lowering, so it is never checked as a loop construct
during semantic analysis; a malformed or non-canonical associated nest
therefore reaches lowering, which assumes a canonical nest.
This patch validates the nest that follows such a variant (the next
executable construct) during semantics, reusing the diagnostics of a real
loop-associated construct. Each applicable variant is checked against it:
* Canonical loop: the affected loop must be a canonical DO loop, so a
`DO WHILE`, a pre-6.0 `DO CONCURRENT`, or a `DO` without loop control
is rejected.
* Nest depth: `collapse(n)` and `ordered(n)` must not exceed the depth
of the associated loop nest.
* Rectangularity: loops that must be rectangular (e.g. under `tile`) may
not have bounds that depend on an outer loop's variable.
[8 lines not shown]
[bazel] Fix remote exec with thin docker images (#205849)
rules_python is working on flipping this default, but without this
setting, /usr/bin/python3 must exist to run a py_binary. That might not
be the case in remote exec environments that use the smallest possible
image.
[TableGen] Fix SearchableTable lookup comparator w/ multiple string keys (#207021)
This change fixes a bug in `SearchableTableEmitter::emitLookupFunction`
where `emitComparator` redeclares `LHSStr`/`RHSStr` in the same scope.
The fix makes the emitted `LHSStr`/`RHSStr` variable names unique by
attaching `Field.Name` to them.
[CIR] Fix getNewInitValue on string-literal arrays
`getNewInitValue` in `CIRGenModule.cpp` rebuilds a global's initializer when
`replaceGlobal` fixes up references after a global's type changes -- for
example when an `extern` array is referenced while still incomplete and then
completed. Its `ConstArrayAttr` branch cast `getElts()` to an `mlir::ArrayAttr`,
but a `ConstArrayAttr` built from a string literal stores its elements as a
`StringAttr`. A struct global that both points at the replaced global and has a
`char` array member therefore aborted on a failed `cast<ArrayAttr>` during
CIRGen.
`ConstArrayAttr::verify` allows only two element kinds: an `ArrayAttr` or a
`StringAttr`. A `StringAttr` holds raw 8-bit bytes and references no globals, so
there is nothing to rewrite. The fix returns the initializer unchanged for the
`StringAttr` case and `cast`s on the `ArrayAttr` path, so a future third element
kind asserts rather than silently passing through.
This surfaced compiling CPython's deep-frozen module data (SPEC CPU 2026
714.cpython_r), where frozen objects cross-reference each other and carry string
payloads. The benchmark advances past this abort to a const-record type-identity
issue that is tracked separately.
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
The barrier ID is derived from the address via (addr >> 4) & 0x3F, so a
byte offset that does not land on a 16-byte barrier boundary is still
valid: it simply selects the containing barrier. No alignment assertion
is needed, and such offsets must not crash the compiler (see the
misaligned test).
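The arithmetic above can be checked with a small Python sketch. The
function name `barrier_id` and the example addresses are made up for
illustration; only the expression (addr >> 4) & 0x3F comes from the
description.

```python
def barrier_id(addr):
    """Derive the barrier ID from an LDS byte address: (addr >> 4) & 0x3F."""
    return (addr >> 4) & 0x3F


# Hypothetical base address of a named barrier, 16-byte aligned.
BASE = 0x20
```

With this hypothetical base, `barrier_id(BASE)` and `barrier_id(BASE + 4)`
give the same ID, because the low four bits are shifted out, while
`barrier_id(BASE + 16)` gives the next one.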
Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
[OpenACC] append triples when materializing reduction destroy recipes (#207034)
Append the triples if they exist when materializing the destroy region.