[Clang][CodeGen] Fix __builtin_counted_by_ref for nested struct FAMs (#182575) (#182590)
GetCountedByFieldExprGEP() used getOuterLexicalRecordContext() to find
the RecordDecl containing the counted_by count field. This walks up
through all lexically enclosing records to find the outermost one, which
is wrong when a struct with a counted_by FAM is defined nested inside
another named struct.
For example, when struct inner (containing the FAM) is defined inside
struct outer, getOuterLexicalRecordContext() resolves to struct outer
instead of struct inner. The StructAccessBase visitor then fails to
match the base expression type (struct inner *) against the expected
record (struct outer), returning nullptr. This nullptr propagates back
as the GEP result, and the subsequent dereference in
*__builtin_counted_by_ref() triggers an assertion failure in
Address::getBasePointer().
Replace getOuterLexicalRecordContext() with a walk that only traverses
anonymous structs and unions, which are transparent in C and must be
[13 lines not shown]
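
The difference between the two walks can be sketched with a toy model. This is not Clang's API: `Record`, `outermostRecord`, and `nearestNamedRecord` are hypothetical names introduced only for illustration, and the real patch may differ in detail.

```cpp
#include <string>

// Hypothetical stand-in for a record declaration (not Clang's RecordDecl).
struct Record {
  std::string name;      // empty for an anonymous struct/union
  const Record *parent;  // lexically enclosing record, or nullptr
  bool isAnonymous() const { return name.empty(); }
};

// Modelled old behaviour: climb to the outermost enclosing record,
// regardless of whether the records on the way are named.
const Record *outermostRecord(const Record *r) {
  while (r->parent)
    r = r->parent;
  return r;
}

// Modelled new behaviour: climb only through anonymous structs/unions
// and stop at the first named record.
const Record *nearestNamedRecord(const Record *r) {
  while (r->isAnonymous() && r->parent)
    r = r->parent;
  return r;
}
```

With a named `inner` nested in a named `outer`, the first walk returns `outer` while the second stays at `inner`, which is the record the base expression type is expected to match.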
[VPlan] Start implementing VPlan-based stride multiversioning
This commit only implements the run-time guard without actually
optimizing the vector loop. That would come in a separate PR to ease
review.
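
As a rough source-level picture of what a stride run-time guard means, consider the sketch below. It is an assumption-laden illustration, not the VPlan transformation itself: `sumStrided` is a hypothetical function, and the real guard is emitted on the vectorizer's internal representation.

```cpp
#include <cstddef>

// Sum n elements of a, stepping by a stride that is only known at run time.
long sumStrided(const int *a, std::size_t n, std::ptrdiff_t stride) {
  long sum = 0;
  if (stride == 1) {
    // Guarded version: accesses are consecutive, so a vectorizer could use
    // plain wide loads here instead of gathers.
    for (std::size_t i = 0; i < n; ++i)
      sum += a[i];
  } else {
    // Fallback version: arbitrary stride, kept as the general loop.
    for (std::size_t i = 0; i < n; ++i)
      sum += a[static_cast<std::ptrdiff_t>(i) * stride];
  }
  return sum;
}
```

Both branches compute the same result; the guard only selects which loop version runs.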
[flang][debug] Supply missing subprogram attributes (#181425)
Add DW_AT_elemental, DW_AT_pure, and DW_AT_recursive attributes to
subprograms and functions when they are specified in the source.
[NFC][VPlan] Add initial tests for future VPlan-based stride MV
I tried to include both the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) and cases where the current implementation behaves poorly,
e.g., https://godbolt.org/z/h31c3zKxK, as well as some other potentially
interesting scenarios I could imagine.
[NFC][VPlan] Split `makeMemOpWideningDecisions` into subpasses
The idea is to have handling of strided memory operations (either from
https://github.com/llvm/llvm-project/pull/147297 or for VPlan-based
multiversioning for unit-strided accesses) done after some mandatory
processing has been performed (e.g., some types **must** be scalarized)
but before legacy CM's decision to widen (gather/scatter) or scalarize
has been committed.
In the longer term, we can uplift all other memory widening decisions to
be made here directly at the VPlan level. I expect this structure would
also be beneficial for that.
[NFCI][VPlan] Split initial mem-widening into a separate transformation
Preparation change before implementing stride-multiversioning as a
VPlan-based transformation. Might help
https://github.com/llvm/llvm-project/pull/147297/ as well.
[MLIR][Complex] Check for FastMathFlag in DivOp folder (#176249)
- Fold a DivOp whose LHS has NaN as its real or imaginary part to a Complex of NaNs
- Fold `div(a, Complex<1, 0>) -> a` if the nnan fast-math flag is set
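
The two folds can be modelled outside MLIR roughly as follows. `FoldKind` and `foldDiv` are hypothetical names, and an empty `std::optional` stands for an operand that is not a known constant; the actual folder works on MLIR attributes and values.

```cpp
#include <cmath>
#include <complex>
#include <optional>

using C = std::complex<double>;

enum class FoldKind { None, ConstantNaN, ForwardLhs };

// Decide which fold, if any, applies to div(lhs, rhs).
FoldKind foldDiv(const std::optional<C> &lhs, const std::optional<C> &rhs,
                 bool nnan) {
  // A constant LHS with NaN in either component folds to a complex of NaNs.
  if (lhs && (std::isnan(lhs->real()) || std::isnan(lhs->imag())))
    return FoldKind::ConstantNaN;
  // div(a, 1 + 0i) -> a is only taken when NaNs may be assumed absent.
  if (nnan && rhs && *rhs == C(1.0, 0.0))
    return FoldKind::ForwardLhs;
  return FoldKind::None;
}
```

One plausible reading of the nnan requirement: without it, `a` could carry a NaN in one component, in which case the first rule says the result should be all NaNs rather than `a` itself.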
AMDGPU: Try to fix leak in AMDGPULibFunc
I don't know why this was trying to do placement new. I guess
this was overwriting the unique_ptr, bypassing its destructor.
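
The suspected mechanism can be reproduced in isolation. This is only a sketch of the guess stated above, not the AMDGPULibFunc code: `Tracked`, `destroyedCount`, and both helper functions are hypothetical.

```cpp
#include <memory>
#include <new>

int destroyedCount = 0;  // incremented by every Tracked destructor

struct Tracked {
  ~Tracked() { ++destroyedCount; }
};

// Constructs a fresh unique_ptr on top of an existing one. The old
// unique_ptr's destructor never runs, so its pointee is not deleted.
// Returns the orphaned raw pointer so the caller can clean it up.
Tracked *overwriteWithPlacementNew(std::unique_ptr<Tracked> &slot) {
  Tracked *old = slot.get();
  new (&slot) std::unique_ptr<Tracked>(new Tracked);
  return old;  // leaked unless the caller deletes it
}

// Ordinary assignment: the old pointee is deleted before being replaced.
void overwriteWithAssignment(std::unique_ptr<Tracked> &slot) {
  slot = std::make_unique<Tracked>();
}
```

Plain assignment (or `reset`) lets the smart pointer release the previous object, which is what a leak fix of this kind would typically rely on.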
[AMDGPU] Efficient way to get NumArchVGPRs. (#182537)
No functional change. Clean up how the number of VGPRs is computed for
different AMDGPU targets based on their features.
[libc] Properly handle null handles in rpc_dispatch.h
Summary:
We automatically dereference pointers, so we should check whether they
are null. The change is minimal: keep the value zero and handle zero.
[flang-rt] Add support for formatted I/O on the GPU (#182580)
Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null as this is a non-trivial class that cannot
easily be shepherded across the wire.
Reapply "[flang] Lowering a ArrayCoorOp to arithmetic computations" (#182585)
Reapplying the changes; they were reverted by mistake yesterday.
This reverts commit 3c6523dcb8ebc0396f69c578285599b66e16dce7.
[LoopFusion] Improve collectFusionCandidates() (#182571)
The order of visiting loops in collectFusionCandidates() guarantees that
a new member can only possibly be added to the end of a set.
Also, `NumFusionCandidates` currently counts every loop that is added to a
candidate set. Usually the large majority of candidate sets have a single
member, so they are not really candidates for fusion. Only the second
member of a candidate set and the ones that come after it can be
counted as fusion candidates.
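
The counting change can be sketched as follows, assuming candidate sets are modelled as plain vectors of loop ids. `CandidateSet`, `countAllMembers`, and `countFusableMembers` are hypothetical names, and the way the patch actually updates the statistic may differ.

```cpp
#include <cstddef>
#include <vector>

using CandidateSet = std::vector<int>;  // loop ids, for illustration only

// Modelled old statistic: every loop placed in any set is counted.
std::size_t countAllMembers(const std::vector<CandidateSet> &sets) {
  std::size_t count = 0;
  for (const CandidateSet &s : sets)
    count += s.size();
  return count;
}

// Modelled new statistic: the first member of each set is not counted,
// so singleton sets contribute nothing.
std::size_t countFusableMembers(const std::vector<CandidateSet> &sets) {
  std::size_t count = 0;
  for (const CandidateSet &s : sets)
    if (s.size() > 1)
      count += s.size() - 1;
  return count;
}
```

For sets of sizes 1, 3, 1, and 2, the first function reports 7 while the second reports 3.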
[CIR][CUDA] Add CUDAKernelNameAttr for device stubs (#180051)
Besides the attribute description, it is worth noting that this
attribute will later be consumed when handling runtime registration in
loweringPrepare.
[flang] OPTIONAL char dummy has no defining op; add null check (#182582)
size.getDefiningOp() returns nullptr for block arguments when an OPTIONAL
character length generated the conditional "fir.if". Check for a nullptr
before calling mlir::isa<> to avoid the crash.
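
The shape of the guard can be shown with a toy model. `Operation`, `Value`, and `isDefinedByConstant` below are hypothetical stand-ins rather than MLIR types, and the property being tested is an assumption made for the example.

```cpp
// Hypothetical stand-in for an operation with one queryable property.
struct Operation {
  bool isConstant;
};

// Hypothetical stand-in for an SSA value. Block arguments have no
// defining operation, so definingOp is nullptr for them.
struct Value {
  const Operation *definingOp;
  const Operation *getDefiningOp() const { return definingOp; }
};

// Checks the defining op only after confirming that one exists.
bool isDefinedByConstant(const Value &v) {
  const Operation *op = v.getDefiningOp();
  return op != nullptr && op->isConstant;
}
```

The null test has to come first because the type query itself dereferences the operation pointer.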
Addresses: https://github.com/llvm/llvm-project/issues/182436
Passes check-flang, check-flang-rt, and llvm-test-suite (x86_64)
---------
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval at gmail.com>
[clang][ssaf] Refactor `JSONFormatTest` into a directory with a shared fixture header (#182523)
This change converts `Serialization/JSONFormatTest.cpp` into a directory
to support reuse of the `JSONFormatTest` fixture by upcoming test files
for additional data structures with JSON serialization support. New test
files for other serializable data structures can now include
`JSONFormatTest.h`, inherit from `JSONFormatTest`, and add their own
fixture and tests without duplicating the filesystem scaffolding.
[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (2/2) (NFC) (#181974)
Refactor `EmitAArch64BuiltinExpr` so that all AArch64/NEON builtins
handled by this hook _and marked as overloaded_ share a common path
for generating LLVM IR arguments (collected into the `Ops`
`SmallVector<Value*>`) (*). This is a follow-up for #181794 - please
refer to that PR for more context.
As in the previous PR, the key change is implemented in
`HasExtraNeonArgument`, i.e. in the hook that identifies Builtins with
the extra argument. In this PR, I am replacing the ad-hoc switch
statement with a more principled approach borrowed from SemaARM.cpp,
namely:
```cpp
static bool HasExtraNeonArgument(unsigned BuiltinID) {
// (...)
uint64_t mask = 0;
switch (BuiltinID) {
#define GET_NEON_OVERLOAD_CHECK
[29 lines not shown]
[c-index-test] Avoid loading a module input file when we need a file name only. (#182426)
Loading a module input file triggers its validation. Avoid this process
when we need only a file name.
rdar://167647519