[SSAF] Increase Expr kind coverage in EntityPointerLevelTranslator
Add support for more kinds of Expr that can be translated to
EntityPointerLevel(s).
Additionally, fix bugs in PointerFlowExtractor discovered by tests
added for the new Expr kinds.
[CIR] Allow local goto within cleanup regions (#197539)
Until now, CIR's FlattenCFG pass reported an NYI error any time a goto
operation was found within a cleanup scope region. This change loosens
that restriction to allow goto operations that transfer to another block
within the same cleanup region. This case doesn't require any change in
the cleanup scope flattening. It just has to be detected and ignored.
The goto will be lowered as it is when no cleanup scope is present.
We are still reporting an NYI error in cases where a goto operation
branches out of the cleanup scope. That will be implemented in a
follow-up change.
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
Fix regressions in OpenMP V&V and Fujitsu testsuites
The users iterator apparently becomes invalid after one of its uses is
replaced. Fix this by making a copy of the list of users.
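The pattern behind this fix can be sketched with a toy use list. This is a minimal illustration, not the MLIR or flang API: `Value`, `User`, `setOperand`, and `replaceAllUses` are hypothetical names, and the use list is assumed to be a plain `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct User;

// Toy stand-in for an IR value (hypothetical, not an LLVM/MLIR class).
struct Value {
  std::vector<User *> users; // users that currently reference this value
};

// Toy stand-in for an operation with a single operand.
struct User {
  Value *operand = nullptr;
};

// Re-point `user` at `to`. This edits the use list of the old operand,
// which is what makes iterating that list during replacement unsafe.
void setOperand(User &user, Value &to) {
  if (Value *from = user.operand) {
    auto &list = from->users;
    list.erase(std::remove(list.begin(), list.end(), &user), list.end());
  }
  user.operand = &to;
  to.users.push_back(&user);
}

// Replace every use of `from` with `to`.
void replaceAllUses(Value &from, Value &to) {
  // Iterate over a copy: setOperand() erases from `from.users`, so
  // iterators into the live list would be invalidated mid-loop.
  std::vector<User *> snapshot(from.users.begin(), from.users.end());
  for (User *user : snapshot)
    setOperand(*user, to);
}
```

With two users attached to one value, calling `replaceAllUses(a, b)` leaves `a.users` empty and moves both users to `b`, because the loop never touches the list it is walking.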
Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#188851)"
Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.
The regressions in the OpenMP V&V and Fujitsu testsuites happened
because the users iterator was apparently becoming invalid after one of
its uses was replaced. This was fixed by making a copy of the list of
users.
[flang] Exclude procedure scope variables in isModuleScopeDataUniquedName (#192999)
In particular, for a saved local variable such as `x` in the sample below,
isModuleScopeDataUniquedName should return false.
```
module m
contains
subroutine foo()
integer, save :: x ! <-- SAVE
end subroutine
end
```
[CIR] Make collectUnreachable more robust (#197334)
The CXXABILowering and LowerToLLVM passes both use a
`collectUnreachable` function to find unreachable blocks and add them to
a SmallVector that will be passed to `applyPartialConversion` to avoid
leaving unconverted operations in the dead blocks. We can't simply call
`eraseUnreachableBlocks` because we don't yet protect indirect branch
and goto targets from being erased.
Application testing revealed that our `collectUnreachable` function was
missing blocks in SCCs, because they were connected to each other. This
change revises `collectUnreachable` to instead build a list of reachable
blocks and then walk all blocks to find the ones that aren't in the
reachable list (which is what `eraseUnreachableBlocks` does).
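The revised approach can be sketched on a toy control-flow graph. This is a minimal illustration under stated assumptions: blocks are plain indices, block 0 is the entry, and `collectUnreachableBlocks` is a hypothetical name, not the function in the CIR passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: cfg[i] lists the successor indices of block i. Block 0 is the entry.
using Cfg = std::vector<std::vector<std::size_t>>;

// Return the indices of all blocks that cannot be reached from the entry.
std::vector<std::size_t> collectUnreachableBlocks(const Cfg &cfg) {
  // Step 1: mark everything reachable from the entry with a worklist walk.
  std::vector<bool> reachable(cfg.size(), false);
  std::vector<std::size_t> worklist;
  if (!cfg.empty()) {
    reachable[0] = true;
    worklist.push_back(0);
  }
  while (!worklist.empty()) {
    std::size_t block = worklist.back();
    worklist.pop_back();
    for (std::size_t succ : cfg[block]) {
      assert(succ < cfg.size() && "successor index out of range");
      if (!reachable[succ]) {
        reachable[succ] = true;
        worklist.push_back(succ);
      }
    }
  }

  // Step 2: walk all blocks and keep the ones that were never marked.
  std::vector<std::size_t> dead;
  for (std::size_t i = 0; i < cfg.size(); ++i)
    if (!reachable[i])
      dead.push_back(i);
  return dead;
}
```

For the graph `{{1}, {}, {3}, {2}}`, blocks 2 and 3 only reach each other. Both are reported as unreachable because the decision is based on reachability from the entry rather than on how the dead blocks are connected.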
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
[LifetimeSafety] Fix a crash on calling a function with `lifetime_capture_by` with an invalid parameter (#197517)
When the attribute is created, all of its parameters start in an invalid
state. If a parameter is not later replaced with a valid argument (such as
`global`, `this`, or any variable), the code iterates over arguments it
assumes are all valid.
With this PR we skip the argument handling for the invalid ones.
Fixes #197508
Reapply "[clang-doc] Move Info types into arenas (#190054)" (#192495)
The base patch was reverted to unbreak the darwin-x86 bots. This version
of the patch avoids allocating any Info types locally and copying them
into the arenas afterwards. It also uses a dedicated InfoNode<T> to manage
data separately from the lists, since the list nodes themselves were
being corrupted. The perf numbers below are no longer quite accurate,
but are in the same ballpark.
Co-authored-by: Erick Velez <erickvelez7 at gmail.com>
---
Original PR text:
Info types used to own significant chunks of data. As we move these into
local arenas, these types must be trivially destructible, to avoid
leaking resources when the arena is reset. Unfortunately, there isn't a
good way to transition all the data types one at a time, since most of
them are tied together in some way. Further, as they're now allocated in
[66 lines not shown]
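The trivially-destructible requirement from the original PR text can be sketched with a generic bump arena. This is a minimal illustration, not clang-doc's allocator: `BumpArena`, `SamplePoint`, and the fixed-capacity buffer are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical payload type used only for illustration.
struct SamplePoint {
  int x;
  int y;
};

// Fixed-capacity bump arena. reset() never runs destructors, so only
// trivially destructible types may be created in it.
class BumpArena {
public:
  explicit BumpArena(std::size_t capacity) : storage(capacity), used(0) {}

  template <typename T, typename... Args> T *create(Args &&...args) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "arena objects are never destroyed, so T must not own "
                  "resources");
    void *ptr = storage.data() + used;
    std::size_t space = storage.size() - used;
    // std::align bumps ptr to a suitably aligned address and shrinks
    // space accordingly, or returns nullptr if T does not fit.
    if (!std::align(alignof(T), sizeof(T), ptr, space))
      return nullptr;
    used = storage.size() - space + sizeof(T);
    return new (ptr) T{std::forward<Args>(args)...};
  }

  // Drop everything at once; no per-object cleanup happens here.
  void reset() { used = 0; }

  std::size_t bytesUsed() const { return used; }

private:
  std::vector<unsigned char> storage;
  std::size_t used;
};
```

Calling `arena.create<SamplePoint>(1, 2)` compiles, while a type holding a `std::string` member would be rejected by the `static_assert`, which is the kind of leak the requirement is meant to rule out.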
[clang] fix crash on alias templates with extraneous template parameter lists (#197542)
Recovers by picking a non-empty template parameter list as the current
list.
This fixes a regression introduced by #195303, which has never been
released, so there are no release notes.
Fixes #197398
[VPlan] Remove unneeded setDebugLoc in ::execute (NFC) (#197552)
The debug location is already set before each recipe's ::execute is called.
Remove the redundant setDebugLoc calls from the ::execute implementations.
[AMDGPU] Fix forced lit64 encoding on lit() modifier
We were forcing lit64 encoding on a 64-bit operand with the lit()
modifier. This is not required, is not compatible with SP3, and
in the pathological case creates an invalid 4-dword encoding if
used with a VOP3* instruction.
That said, if lit() is used, the immediate is silently truncated
even before encoding, so the encoder only sees 32 bits of
relevant data and 32 bits of zeroes anyway. That is a separate
issue, but we never had a true 64-bit constant properly
encoded with the lit() modifier, only with lit64().
[LV] Add extra tests for epilogue vectorization. (NFC) (#197283)
Add additional tests with dead main epilogue loops and various
interesting combinations: SCEV and memory runtime checks, various
reductions etc.
[flang][cuda] Optimize CUDA descriptor transfers for regular strided layouts (#197532)
Add a conservative cudaMemcpy2D fast path for descriptor-to-descriptor
CUDA transfers when both descriptors have equal element counts and
regular positive-stride layouts. This speeds up cases such as
component-section transfers while preserving the existing runtime Assign
fallback for unsupported layouts, and adds CUDA runtime coverage for
strided host/device descriptor copies.
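The eligibility check and the pitched copy can be sketched on the host. This is a simplified model, not the flang runtime code: `StridedView` is a hypothetical rank-1 stand-in for a descriptor, `pitchedCopy` emulates with `memcpy` the parameter roles of `cudaMemcpy2D` (destination pitch, source pitch, row width in bytes, row count), and mapping each element to one "row" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical rank-1 view: `count` elements of `elemBytes` bytes each,
// spaced `strideBytes` apart. Not the Fortran runtime's Descriptor.
struct StridedView {
  unsigned char *base;
  std::size_t elemBytes;
  std::size_t count;
  std::ptrdiff_t strideBytes;
};

// Conservative test: equal element counts and sizes, and a positive
// stride that is at least one element wide on both sides.
bool canUsePitchedCopy(const StridedView &dst, const StridedView &src) {
  return dst.count == src.count && dst.elemBytes == src.elemBytes &&
         dst.strideBytes >= static_cast<std::ptrdiff_t>(dst.elemBytes) &&
         src.strideBytes >= static_cast<std::ptrdiff_t>(src.elemBytes);
}

// Host emulation of a pitched 2D copy: `height` rows of `width` bytes,
// with rows `dpitch` / `spitch` bytes apart in destination / source.
void pitchedCopy(unsigned char *dst, std::size_t dpitch,
                 const unsigned char *src, std::size_t spitch,
                 std::size_t width, std::size_t height) {
  for (std::size_t row = 0; row < height; ++row)
    std::memcpy(dst + row * dpitch, src + row * spitch, width);
}

// Returns true if the fast path was taken. On false, a caller would
// fall back to its general element-by-element assignment.
bool transferStrided(const StridedView &dst, const StridedView &src) {
  if (!canUsePitchedCopy(dst, src))
    return false;
  pitchedCopy(dst.base, static_cast<std::size_t>(dst.strideBytes), src.base,
              static_cast<std::size_t>(src.strideBytes), src.elemBytes,
              src.count);
  return true;
}
```

Copying every second `int` of an eight-element array into a contiguous four-element array takes the fast path in this model, while a view with a negative stride is rejected and left to the fallback.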
[libclc] Add optimized DPP scan functions for AMDGPU (#197543)
Summary:
This uses the update_dpp function to efficiently provide scans. DPP
allows for swizzling within a group of 16, so we do four of those,
followed by a permlane on GFX10, or a native DPP shift on GFX9.
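The shape of this scan can be modelled on the host. This is a sketch under stated assumptions, not the libclc code: it assumes a 32-lane wave split into two rows of 16, models the four row-local steps as shifts by 1, 2, 4, and 8, and stands in for the cross-row step with a plain add of the first row's total. `Wave` and `inclusiveScanAdd` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 32; // assumed wave size
constexpr std::size_t kRow = 16;   // lanes reachable by one row-local step

using Wave = std::array<int, kLanes>;

// Inclusive add-scan over all lanes of the modelled wave.
Wave inclusiveScanAdd(Wave v) {
  // Four row-local steps (shift = 1, 2, 4, 8). Each lane adds the value
  // `shift` lanes below it, but only if that lane is in the same row.
  for (std::size_t shift = 1; shift < kRow; shift *= 2) {
    Wave prev = v; // lanes run in lockstep, so all reads see old values
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      if (lane % kRow >= shift)
        v[lane] = prev[lane] + prev[lane - shift];
  }

  // Cross-row step: every lane in the second row adds the total of the
  // first row, which now sits in the first row's last lane.
  int firstRowTotal = v[kRow - 1];
  for (std::size_t lane = kRow; lane < kLanes; ++lane)
    v[lane] += firstRowTotal;
  return v;
}
```

Scanning a wave filled with ones yields lane `i` holding `i + 1`, which is a quick way to check that the row-local and cross-row steps compose correctly.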
Right now these are not actually used because we do not compile for
multiple targets yet, but I verified them by setting
`CMAKE_CLC_FLAGS=mcpu=gfx1030` or similar. The intention is to have
these waiting as a motivational implementation once we start doing
variable builds.
The output is optimal as far as I am aware. Here is the example for
gfx1030 generation:
```asm
_Z28sub_group_scan_inclusive_addi: ; @_Z28sub_group_scan_inclusive_addi
; %bb.0:
[17 lines not shown]
```
[VPlan] Remove VPTransformState::Lane (NFC). (#197545)
After efd429fdfb6f, all replicate regions are dissolved early. Remove
the now-unused VPTransformState::Lane and the corresponding dead assertions.
[clang] make evaluation of type constraint a SFINAE context
Otherwise, errors when substituting a type constraint could unintentionally make
the program ill-formed.
This also strengthens the assert that checks, while instantiating templates,
that we either have a code synthesis context or are in a SFINAE
context.
[NFC][SSAF] Rename PointerFlowReachableAnalysis to UnsafeBufferReachableAnalysis
The previously named PointerFlowReachableAnalysis essentially
propagates unsafe buffers across a pointer flow graph. The pointer
flow analysis is a dependency rather than the subject
itself. Therefore, this commit renames and moves the analysis to
better reflect its purpose.
[clang] Make `InitListExpr::isExplicit()` work (#195175)
The main goal of this change is to be able to use the `isExplicit()` check
in the IWYU tool. Consider the following:
```cpp
struct Inner {};
struct Outer {
Inner inner;
};
const Outer& refOuter = {};
```
Here, Clang generates an `InitListExpr` child node for the implicit `Inner`
initialization under the `InitListExpr` node corresponding to the
initializer of `refOuter`. IWYU should require the header containing
the `Outer` definition for the initializer, but not the header for `Inner`,
because that should already be provided by `Outer`.
[9 lines not shown]
[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs
These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.
Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
[lld][MachO] Fix symbol resolution order with --read-workers (#193239)
When read-workers is enabled, archives are deferred while
frameworks and dylibs are processed immediately. This can cause
incorrect resolution if a dylib's symbols are inserted into the
symbol table before an archive's lazy symbols.
This also fixes Chromium bug crbug.com/500256589, whose root cause
was this incorrect resolution: dylib symbols were being
inserted too soon.
This patch now defers all library and framework types.
`ArchiveFile::addLazySymbols()` is also called immediately after
each archive is created in the processing loop.
---------
Co-authored-by: Nico Weber <thakis at chromium.org>
[NFC][Analysis] Use `isa<ConstantPointerNull>` for null pointer checks
Make Analysis null pointer checks use `ConstantPointerNull` rather than generic
null value checks.