[Offload][OpenMP][Flang] Update no-loop test (#205803)
Updates to the kernel type detection logic now allow `target parallel
do` to be promoted to SPMD-No-Loop.
A currently broken offload test that was affected by this change is
updated here.
[clang][dataflow] Move expensive solver asserts under EXPENSIVE_CHECKS (#205715)
The watched-literal solver has a few invariant checks that run on every
solver iteration in assertion builds. Some of these checks rebuild and
iterate over the watched-literal state. This overhead is usually hidden,
but it becomes dominant for large flow-sensitive analyses.
While testing clang-tidy's `unchecked-optional-access` check on
real-world projects (in this case, LLVM itself), we found a few
extremely slow analyses caused by this overhead.
| Time | File |
|---------|-----------------------------------------------------|
| 8235.7s | llvm-project/clang/utils/TableGen/RISCVVEmitter.cpp |
| 8197.2s | llvm-project/clang/lib/Driver/Multilib.cpp |
(Measured on a machine with a 32-core Ice Lake CPU and 128 GB of memory)
After moving these asserts to `EXPENSIVE_CHECKS`, the same files
[13 lines not shown]
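The gating pattern described above can be sketched as follows. This is a minimal illustration, not the watched-literal solver's code: `ToySolver`, `Trail`, and `trailIsSorted` are hypothetical names, and the only thing taken from the commit is the idea of keeping cheap asserts in every assertion build while compiling the costly invariant scan only when `EXPENSIVE_CHECKS` is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a solver that maintains a costly invariant.
struct ToySolver {
  std::vector<int> Trail;

  // O(n) invariant check: the trail must be strictly increasing.
  bool trailIsSorted() const {
    for (std::size_t I = 1; I < Trail.size(); ++I)
      if (Trail[I - 1] >= Trail[I])
        return false;
    return true;
  }

  void push(int Lit) {
    // Cheap O(1) precondition: kept in every assertion build.
    assert((Trail.empty() || Trail.back() < Lit) && "literals must increase");
    Trail.push_back(Lit);
#ifdef EXPENSIVE_CHECKS
    // Full O(n) scan: compiled in only when EXPENSIVE_CHECKS is defined,
    // so ordinary assertion builds do not pay for it on every step.
    assert(trailIsSorted() && "trail invariant violated");
#endif
  }
};
```

With this shape, a per-iteration cost that grows with the solver state is paid only in builds that opt in to expensive checks.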
[NaryReassociate] Fix divide by zero crash in NaryReassociatePass (#202377)
Updates NaryReassociatePass with a safety check that guards against GEPs
into arrays with zero-sized element types (e.g. [0 x ptr]), preventing
division by zero.
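A guard of this kind might look like the sketch below. It is not the code from NaryReassociatePass: `elementsPerStride` and its parameters are hypothetical, and the example only shows the general shape of bailing out before dividing by an element size that can be zero.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: how many elements of ElementSize bytes fit exactly
// into StrideBytes. Returns std::nullopt when the question has no answer.
std::optional<uint64_t> elementsPerStride(uint64_t StrideBytes,
                                          uint64_t ElementSize) {
  // Zero-sized element types (such as the elements of [0 x ptr]) would make
  // the division below undefined, so give up early.
  if (ElementSize == 0)
    return std::nullopt;
  // Only exact multiples are meaningful for this toy computation.
  if (StrideBytes % ElementSize != 0)
    return std::nullopt;
  assert(ElementSize != 0 && "guarded above");
  return StrideBytes / ElementSize;
}
```

Returning an empty optional lets the caller skip the transformation instead of crashing.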
[analyzer] Fix unjustified early return in processCallExit (#205656)
In `ExprEngine::processCallExit`, step 3 may theoretically split the
state because it calls `removeDead`, which activates `LiveSymbols` and
`DeadSymbols` callbacks of various checkers. (However, in practice it is
likely that these checker callbacks never actually split the state -- at
least, no such state splits happen in the LIT tests.)
The nodes produced by `removeDead` are placed in the set `CleanedNodes`;
in theory the different execution paths should be handled in parallel,
independently of each other. However, the loop `for (ExplodedNode *N :
CleanedNodes)` contained an early return statement, which meant that if
the creation of `CEENode` failed for a node `N`, then the subsequent
iterations were skipped altogether.
This commit replaces the `return` with a `continue` to ensure that the
nodes in `CleanedNodes` are handled independently (if there are several
such nodes).
[6 lines not shown]
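The difference between the early `return` and the `continue` can be shown with a toy loop. The names below (`makeExitNode`, `processCleanedNodes`) are hypothetical and integers stand in for exploded nodes; this is a sketch of the control-flow change, not the analyzer's code.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for node creation that can fail for some inputs.
std::optional<int> makeExitNode(int N) {
  if (N < 0)
    return std::nullopt; // creation failed for this node
  return N * 10;
}

// Processes every node independently of the others.
std::vector<int> processCleanedNodes(const std::vector<int> &CleanedNodes) {
  std::vector<int> Out;
  for (int N : CleanedNodes) {
    std::optional<int> Exit = makeExitNode(N);
    if (!Exit)
      continue; // a `return Out;` here would silently drop the later nodes
    assert(Exit && "checked above");
    Out.push_back(*Exit);
  }
  return Out;
}
```

With the input {1, -1, 2}, the failing middle node is skipped and the third node is still processed, which is the behavior the commit restores.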
[LifetimeSafety] Gate annotation suggestions behind `SuggestAnnotations` opt (#205764)
Annotation suggestions are expected to fire very often, and they have
recently shown significant regressions after
https://github.com/llvm/llvm-project/pull/204045. This change gates the
suggestions behind a dedicated `SuggestAnnotations` option, preventing
unnecessary work when the relevant diagnostics are disabled.
[VPlan] Allow VPValue in match_fn without needing explicit template arguments. NFC (#205748)
Currently if you want to use match_fn over a range of VPValues, you have
to explicitly write `match_fn<VPValue>` otherwise it will resolve to the
VPUser overload.
This changes the functor to be a lambda with an auto argument so
match_fn(...) works for both VPValues and VPUsers without explicit
templates. The lambda is inlined so there's no indirect function call.
vputils::getGEPFlagsForPtr is updated to use the new form.
We can't use `bind_back` since it requires we bind to exactly one
function that's known at call time.
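The generic-lambda idea can be sketched outside of VPlan as follows. Everything here is hypothetical (`ToyValue`, `ToyUser`, `OpcodeIs`, `matchFnSketch`) and is not the actual `match_fn` implementation; the point is only that a lambda with an `auto` parameter defers overload resolution until the element type is known, so the caller needs no explicit template argument.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two unrelated toy types standing in for values and users.
struct ToyValue { int Opcode; };
struct ToyUser { int Opcode; };

// Hypothetical pattern with one match overload per type.
struct OpcodeIs {
  int Want;
  bool match(const ToyValue *V) const {
    assert(V && "null value");
    return V->Opcode == Want;
  }
  bool match(const ToyUser *U) const {
    assert(U && "null user");
    return U->Opcode == Want;
  }
};

// Wraps a pattern in a generic lambda. The overload of match() is chosen
// when the lambda is called, based on the argument type at that call.
template <typename Pattern> auto matchFnSketch(Pattern P) {
  return [P](const auto *X) { return P.match(X); };
}

// Usage over a range of values, with no explicit template argument.
inline bool allValuesMatch(const std::vector<const ToyValue *> &Vs,
                           int Opcode) {
  return std::all_of(Vs.begin(), Vs.end(), matchFnSketch(OpcodeIs{Opcode}));
}
```

The same `matchFnSketch(OpcodeIs{...})` object also accepts a `const ToyUser *`, since the lambda body is instantiated separately for each argument type.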
[Attr] Add `noipa` function attribute (#203304)
This adds a `noipa` function attribute to LLVM IR. This new attribute
disables any interprocedural analysis that inspects the definition of
the function. Setting this attribute is equivalent to moving the
function definition to a separate, optimizer-opaque, module.
The `noipa` attribute does *not* control inlining or outlining. Add the
`noinline` and `nooutline` attributes as well in cases where inlining
and outlining should additionally be disabled.
Revival of https://reviews.llvm.org/D101011
Discussed in https://discourse.llvm.org/t/noipa-continues/74411
LLVM portion of https://github.com/llvm/llvm-project/issues/40819
[clangd] Fix unknown doxygen command parsing in parameter documentation (#202121)
This patch mainly fixes a bug with parsing of unknown doxygen commands
in function parameter documentation.
To extract the parameter documentation from the function documentation,
the whole function documentation is parsed first.
Then the documentation paragraph for the requested parameter is
"converted" to a string and stored as the documentation for the
parameter. The string is converted by visiting and dumping all chunks of
the parsed paragraph.
When unknown doxygen commands are parsed (during the function
documentation parsing step), they are registered in a
`clang::comments::CommandTraits` object.
Visiting the unknown command requires querying the registered commands
through the `clang::comments::CommandTraits` object to get the command
name.
[18 lines not shown]
[MLIR][WASM] Re-Introduce the RaiseWasmMLIRPass to convert WasmSSA MLIR to core dialects (#205483)
See the previous PR here:
https://github.com/llvm/llvm-project/pull/164562
It was reverted by @lforg37 because of some build bot issue: see
https://github.com/llvm/llvm-project/pull/164562#issuecomment-4756828598.
However, after checking on my end, I could not reproduce the buildbot
issue. Seeing that the problem triggered in `flang` which is completely
unrelated to this work, I assume that it was a builder or a flaky test
problem so I'm re-opening this PR as it had been initially merged.
---------
Signed-off-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire at woven-planet.global>
Co-authored-by: Ferdinand Lemaire <flemairen6 at gmail.com>
utils/AMDGPU: Add scripts to update tests using default subtarget
Add vibe-coded scripts to migrate AMDGPU codegen tests that run llc
without a -mcpu argument. This should either stay uncommitted or be
deleted after the migration is completed.
amdgpu-pin-default-subtarget.py adds an explicit -mcpu matching the current
default for the triple's OS (amdhsa defaults to gfx700, all others to
gfx600). The blank default subtarget is a featureless generic that does not
match any explicit -mcpu, so pinning changes codegen; the batch driver
amdgpu-pin-default-subtarget-batch.sh regenerates the autogenerated CHECK
lines and reverts anything that still fails.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
clang: Move __builtin_amdgcn_processor_is diagnostic test to sema (#205734)
This wasn't checking the codegen result, so move it to the right place
and use -verify instead of FileChecking stderr.
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
[Support][CHERI] Refactor CHERICapabilityFormatBase to embrace CRTP (#205623)
Currently CHERICapabilityFormatBase does not provide a definition for
getAlignmentMask, but does provide a declaration, which leads to
warnings when building with MSVC. We want to have an abstract base here
without any dynamic dispatch, which is what CRTP is for, so use it for
getAlignmentMask such that the base can provide a definition that uses
each derived type's implementation, just as the two base wrappers were
already doing when calling getAlignmentMask. Whilst doing this we might
as well move the wrappers to the header so they can be inlined (and now
that getAlignmentMask is defined we can use it in the helpers rather
than needing each of them to explicitly use the derived type).
Fixes: 7dc09d0d3cf1 ("[CHERI] Add a Support utility for determining
alignment requirements of CHERI capabilities. (#197402)")
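The CRTP arrangement described above can be sketched as follows. This is not the code in Support: `FormatBaseSketch`, `ToyFormat`, `getAlignmentMaskImpl`, and the alignment rule are all invented for illustration. The sketch only shows a base class that defines `getAlignmentMask` by statically forwarding to the derived type, so the helpers in the base can call it without virtual dispatch.

```cpp
#include <cassert>
#include <cstdint>

// CRTP base: Derived is the concrete format type.
template <typename Derived> struct FormatBaseSketch {
  // Defined in the base, resolved at compile time to the derived type.
  uint64_t getAlignmentMask(uint64_t Length) const {
    return static_cast<const Derived *>(this)->getAlignmentMaskImpl(Length);
  }

  // Helper living in the base (and so inlinable from a header); it uses
  // getAlignmentMask directly instead of naming the derived type.
  uint64_t getRequiredAlignment(uint64_t Length) const {
    uint64_t Mask = getAlignmentMask(Length);
    assert(Mask != 0 && "mask must keep at least one bit set");
    return ~Mask + 1;
  }
};

// Hypothetical format with an invented rule: lengths of 4096 bytes or more
// need 16-byte alignment, shorter ones need none.
struct ToyFormat : FormatBaseSketch<ToyFormat> {
  uint64_t getAlignmentMaskImpl(uint64_t Length) const {
    return Length >= 4096 ? ~uint64_t{15} : ~uint64_t{0};
  }
};
```

Because the base member is defined rather than merely declared, there is no declared-but-undefined function left for a compiler to warn about, and no vtable is involved.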
AMDGPU: Avoid default subtarget in codegen tests (4/9)
Continue migrating targets away from codegenning the dummy target
by script.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in hand-written codegen tests (8/9)
Introduce the missing -mcpu argument to some tests which are not
autogenerated.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in hand-written codegen tests (6/9)
Introduce -mcpu arguments in tests which didn't require check line
updates.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in hand-written codegen tests (9/9)
Fix some manual test checks using amdgcn triples without -mcpu. These require the
most careful consideration. The highest-impact changes are the optimizations
removing execz branches now that there's a sched model.
AMDGPU: Avoid default subtarget in hand-written codegen tests (7/9)
Introduce an -mcpu argument to tests missing it to avoid codegenning
the default dummy target. These are cases that didn't require adjusting
the check lines.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in hand-written codegen tests (5/9)
Introduce -mcpu arguments in tests that did not need check line updates.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in generated codegen tests (3/9)
Another batch of tests updated by script.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid using default subtarget in generated codegen tests (1/9)
Fix codegen tests using amdgcn triples without a target-cpu. The dummy
default subtarget has always been an irritating edge case to deal with.
For unknown/mesa3d/amdpal triples, this has produced a gfx600-like result,
and a gfx700-like result for amdhsa. Convert tests to use the explicit
target. This was performed by a vibe-coded script, and covers tests
using update_{llc|mir}_test_checks. There are some minor codegen differences
to be expected, mostly due to now having a scheduling model.
In the future we should forbid trying to codegen the default target.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
AMDGPU: Avoid default subtarget in generated codegen tests (2/9)
Continue migrating away from testing the dummy target, and use
real targets approximating the old behavior. Performed by script.
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
[LifetimeSafety] Fix loop liveness leakage for conditional operator
Generate flow facts for conditional operators in their respective
predecessor blocks (branches) instead of the merge block, path-isolating
the flows and preventing liveness from leaking across loop backedges.
Also includes tests, formatting cleanups, and refactoring of the flow propagation.
TAG=agy
CONV=b4614911-a1e1-489f-a395-2f895c423788
Reapply "[InstCombine] Merge consecutive assumes", round 2 (#205773)
This patch was reverted due to triggering another bug. That bug has been
fixed by https://github.com/llvm/llvm-project/pull/205275, so this
should be ready to land now.
Original commit message:
This should make assumes a bit more efficient, since it removes a few
instructions. This should also help with optimizations that are limited
in how many instructions they step through.
This reverts commit 053d75c1d580e0c394f4cfb0688bafd05c187b0f.