[AMDGPU][NFC] Roundtrip gfx11_asm_vop3_from_vop2.s
Removes the need for gfx11_dasm_vop3_from_vop2_hi.txt sitting
downstream.
Catches a problem with printing op_sel for the tied operands in
v_fmac_f16_e64.
[libc++] Add missing attribute usages to `<__memory/shared_ptr.h>` (#205776)
Since 44546e0e32077241ca9a9a90ac57f2f086f9488a, missing
`_LIBCPP_NODEBUG` and `_LIBCPP_HIDE_FROM_ABI` annotations are caught by
clang-tidy.
This patch adds them wherever expected.
[libc][stdlib] Add unsetenv (#202422)
Added the POSIX unsetenv() function and its internal support.
Implemented EnvironmentManager::unset() to remove a variable by name,
free the string if allocated, and compact the array.
Updated EnvironmentManager to synchronize the public global environ
pointer when transitioning to managed storage.
Registered for x86_64, aarch64, and riscv. Integration tests cover basic
operations and edge cases.
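
The remove, free, and compact steps described above can be sketched as
follows. This is a minimal illustration, not the libc implementation:
`EnvSketch`, `make_env_sketch`, and `unset_sketch` are hypothetical names,
every entry is assumed to be heap-allocated, and `errno` handling is
omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <initializer_list>

// Hypothetical stand-in for managed environment storage: a null-terminated
// array of heap-allocated "NAME=value" strings, laid out like `environ`.
struct EnvSketch {
  char **vars;  // null-terminated array of entries
  size_t count; // number of entries, excluding the terminator
};

// Builds a managed environment from string literals (allocation failures are
// not handled in this sketch).
EnvSketch make_env_sketch(std::initializer_list<const char *> entries) {
  EnvSketch env;
  env.count = 0;
  env.vars = static_cast<char **>(
      std::malloc((entries.size() + 1) * sizeof(char *)));
  for (const char *e : entries) {
    char *copy = static_cast<char *>(std::malloc(std::strlen(e) + 1));
    std::strcpy(copy, e);
    env.vars[env.count++] = copy;
  }
  env.vars[env.count] = nullptr;
  return env;
}

// Removes every entry named `name`, frees its string, and compacts the array
// in place. Returns 0 on success and -1 for an invalid name (null, empty, or
// containing '='); a real implementation would also set errno.
int unset_sketch(EnvSketch &env, const char *name) {
  if (name == nullptr || *name == '\0' || std::strchr(name, '=') != nullptr)
    return -1;
  const size_t len = std::strlen(name);
  size_t out = 0;
  for (size_t in = 0; in < env.count; ++in) {
    char *entry = env.vars[in];
    const bool matches =
        std::strncmp(entry, name, len) == 0 && entry[len] == '=';
    if (matches) {
      std::free(entry); // safe only because this sketch owns every string
      continue;
    }
    env.vars[out++] = entry; // shift surviving entries down
  }
  env.vars[out] = nullptr;
  env.count = out;
  return 0;
}
```

Removing a name that is not present is treated as success here, which
matches the POSIX description of unsetenv().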
Assisted-by: Automated tooling, human reviewed.
[Bazel] Fixes 5314be5 (#205818)
This fixes 5314be5a740c9985b0b3ab958269b5f1824cce02.
Signed-off-by: Ingo Müller <ingomueller at google.com>
Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
[AArch64] Correct A510 scheduling information for LDn instructions (#205518)
The latency and throughput for these instructions don't match what's in
the A510 Software Optimization Guide, so adjust them so that they do
match. Also rearrange the definitions to match how they're structured in
the optimization guide and rename things in a similar manner to how the
C1 CPUs do things, as it's much clearer.
[mlir][vector] add consistent stride verification to `masked load/store` and `gather/scatter` ops (#204842)
Extend negative stride checks to MaskedLoadOp, MaskedStoreOp, GatherOp,
and ScatterOp to match LoadOp and StoreOp behavior.
Depends on: #204611.
AI Disclaimer: I used AI for the tests.
---------
Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Revert "Reapply "[InstCombine] Merge consecutive assumes", round 2" (#205805)
It looks like there is still a bug with removing assumes from the
assumption cache.
Reverts llvm/llvm-project#205773
[clang][bytecode] Fix `evaluateDestruction()` (#205778)
My previous testing of this seems to have been insufficient, or
this regressed somewhere along the way.
Now that `CLANG_USE_EXPERIMENTAL_CONST_INTERP` is used for testing I
noticed a few regressions.
We need to special-case the evaluating decl in a few places, since it's
a global variable that we're allowed to modify.
[libc] Add libgen.h to target public headers (#205804)
Ensure libgen.h is included in TARGET_PUBLIC_HEADERS for Linux targets
so that it gets generated and installed.
Assisted-by: Automated tooling, human reviewed.
[Offload][OpenMP][Flang] Update no-loop test (#205803)
Updates to the kernel type detection logic now allow `target parallel
do` to be promoted to SPMD-No-Loop.
A currently broken offload test that was affected by this change is
updated here.
[clang][dataflow] Move expensive solver asserts under EXPENSIVE_CHECKS (#205715)
The watched-literal solver has a few invariant checks that run on every
solver iteration in assertion builds. Some of these checks rebuild and
iterate over the watched-literal state. This overhead is usually hidden,
but it becomes dominant for large flow-sensitive analyses.
While testing clang-tidy's `unchecked-optional-access` check on
real-world projects (in this case, LLVM itself), we found a few
extremely slow analyses caused by this overhead.
| Time | File |
|---------|-----------------------------------------------------|
| 8235.7s | llvm-project/clang/utils/TableGen/RISCVVEmitter.cpp |
| 8197.2s | llvm-project/clang/lib/Driver/Multilib.cpp |
(Run on a machine with a 32-core Ice Lake CPU and 128 GB of memory)
After moving these asserts to `EXPENSIVE_CHECKS`, the same files
[13 lines not shown]
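
A minimal sketch of the gating pattern, assuming a per-iteration invariant
check whose cost grows with the solver state. `invariants_hold_sketch` and
`run_solver_sketch` are hypothetical names and are not taken from the
watched-literal solver; only the `EXPENSIVE_CHECKS` macro comes from the
commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical invariant check: it walks the whole state, so running it on
// every iteration makes the loop quadratic in the worst case.
bool invariants_hold_sketch(const std::vector<int> &watched) {
  for (int lit : watched)
    if (lit == 0)
      return false;
  return true;
}

// Runs a dummy solver loop and returns the number of iterations performed.
// The expensive check is compiled in only when EXPENSIVE_CHECKS is defined,
// so ordinary assertion builds no longer pay for it.
int run_solver_sketch(const std::vector<int> &watched, int iterations) {
  (void)watched; // unused when EXPENSIVE_CHECKS is not defined
  int done = 0;
  for (int i = 0; i < iterations; ++i) {
#ifdef EXPENSIVE_CHECKS
    assert(invariants_hold_sketch(watched) && "watched-literal invariant");
#endif
    ++done;
  }
  return done;
}
```

Cheap assertions can stay as plain `assert` calls; only checks that
re-traverse the state need the extra guard.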
[NaryReassociate] Fix divide by zero crash in NaryReassociatePass (#202377)
Updates NaryReassociatePass with a safety check to guard against GEPs
into arrays with zero-sized element types (e.g. [0 x ptr]) to prevent
division by zero.
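
The kind of guard involved can be sketched as below. This is an
illustration under assumptions, not the pass's code:
`scaled_index_sketch` is a hypothetical helper that turns a byte size into
an element count and bails out when the element size is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: converts a byte size into a number of elements, or
// returns std::nullopt when no exact conversion exists.
std::optional<uint64_t> scaled_index_sketch(uint64_t indexed_size,
                                            uint64_t element_size) {
  // A zero-sized element type such as [0 x ptr] would otherwise make both
  // the modulo and the division below divide by zero.
  if (element_size == 0)
    return std::nullopt;
  if (indexed_size % element_size != 0)
    return std::nullopt;
  return indexed_size / element_size;
}
```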
[analyzer] Fix unjustified early return in processCallExit (#205656)
In `ExprEngine::processCallExit` step 3 may theoretically split the
state because it calls `removeDead`, which activates `LiveSymbols` and
`DeadSymbols` callbacks of various checkers. (However, in practice it is
likely that these checker callbacks never actually split the state -- at
least, no such state splits happen in the LIT tests.)
The nodes produced by `removeDead` are placed in the set `CleanedNodes`;
in theory the different execution paths should be handled in parallel,
independently of each other. However, the loop `for (ExplodedNode *N :
CleanedNodes)` contained an early return statement, which meant that if
the creation of `CEENode` failed for a node `N`, then the subsequent
iterations were skipped altogether.
This commit replaces the `return` with a `continue` to ensure that the
nodes in `CleanedNodes` are handled independently (if there are several
such nodes).
[6 lines not shown]
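
The difference between the early `return` and the `continue` can be shown
with a small stand-alone sketch. The names below are hypothetical and
integers stand in for exploded nodes; "node creation" is assumed to fail
for negative inputs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for creating the call-exit node: fails for negative
// inputs and otherwise produces node * 10.
bool make_exit_node_sketch(int node, int &out) {
  if (node < 0)
    return false;
  out = node * 10;
  return true;
}

// Old shape: one failure abandons every node that follows it.
std::vector<int> process_with_return(const std::vector<int> &cleaned_nodes) {
  std::vector<int> handled;
  for (int n : cleaned_nodes) {
    int exit_node = 0;
    if (!make_exit_node_sketch(n, exit_node))
      return handled;
    handled.push_back(exit_node);
  }
  return handled;
}

// New shape: a failure skips only the node it belongs to.
std::vector<int> process_with_continue(const std::vector<int> &cleaned_nodes) {
  std::vector<int> handled;
  for (int n : cleaned_nodes) {
    int exit_node = 0;
    if (!make_exit_node_sketch(n, exit_node))
      continue;
    handled.push_back(exit_node);
  }
  return handled;
}
```

With the input `{1, -2, 3}`, the first version handles only the first node,
while the second also handles the third.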
GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen merge
In widenScalarMergeValues, WideTy is an input given by the target. Use the
same LLT kind for the other types of different sizes instead of LLT::scalar.
Makes a difference with extendedLLTs.
[LifetimeSafety] Gate annotation suggestions behind `SuggestAnnotations` opt (#205764)
Annotation suggestions are expected to fire very often, and they have
shown significant regressions since
https://github.com/llvm/llvm-project/pull/204045. This change gates the
suggestions behind a dedicated `SuggestAnnotations` option, preventing
unnecessary work when the relevant diagnostics are disabled.
[VPlan] Allow VPValue in match_fn without needing explicit template arguments. NFC (#205748)
Currently if you want to use match_fn over a range of VPValues, you have
to explicitly write `match_fn<VPValue>` otherwise it will resolve to the
VPUser overload.
This changes the functor to be a lambda with an auto argument so
match_fn(...) works for both VPValues and VPUsers without explicit
templates. The lambda is inlined so there's no indirect function call.
vputils::getGEPFlagsForPtr is updated to use the new form.
We can't use `bind_back` since it requires we bind to exactly one
function that's known at call time.
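
A minimal sketch of the generic-lambda technique, assuming two unrelated
types that a single pattern can match. `ValueSketch`, `UserSketch`,
`OpcodePattern`, `match_fn_sketch`, and `count_if_sketch` are hypothetical
stand-ins and do not reflect the VPlan pattern-matching API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for two unrelated types that can both be matched.
struct ValueSketch { int opcode; };
struct UserSketch { int opcode; };

// Hypothetical pattern: matches any object whose opcode equals `opcode`.
struct OpcodePattern {
  int opcode;
  template <typename T> bool match(const T &v) const {
    return v.opcode == opcode;
  }
};

// Returns a generic lambda. Its call operator is a template, so the argument
// type is deduced at each call and no explicit template argument is needed.
template <typename Pattern> auto match_fn_sketch(Pattern p) {
  return [p](const auto &v) { return p.match(v); };
}

// Small helper so the predicate can be applied to a whole range.
template <typename Range, typename Pred>
int count_if_sketch(const Range &range, Pred pred) {
  return static_cast<int>(std::count_if(range.begin(), range.end(), pred));
}
```

The same `match_fn_sketch(OpcodePattern{1})` expression can be passed to an
algorithm over either element type, which is the property the commit
message describes.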
GlobalISel/LegalizerHelper: Use type of input load dst for LowerLoad
Deduce the destination type for the new instructions that perform the load
lowering from the destination type of the original load instead of from the
MMO.
Makes a difference with extendedLLTs.
[Attr] Add `noipa` function attribute (#203304)
This adds a `noipa` function attribute to LLVM IR. This new attribute
disables any interprocedural analysis that inspects the definition of
the function. Setting this attribute is equivalent to moving the
function definition to a separate, optimizer-opaque, module.
The `noipa` attribute does *not* control inlining or outlining. Add the
`noinline` and `nooutline` attributes as well in cases where inlining
and outlining should additionally be disabled.
Revival of https://reviews.llvm.org/D101011
Discussed in https://discourse.llvm.org/t/noipa-continues/74411
LLVM portion of https://github.com/llvm/llvm-project/issues/40819
[clangd] Fix unknown doxygen command parsing in parameter documentation (#202121)
This patch mainly fixes a bug with parsing of unknown doxygen commands
in function parameter documentation.
To extract the parameter documentation from the function documentation,
the whole function documentation is parsed first.
Then the documentation paragraph for the requested parameter is
"converted" to a string and stored as the documentation for the
parameter. The string is converted by visiting and dumping all chunks of
the parsed paragraph.
When unknown doxygen commands are parsed (during the function
documentation parsing step), they are registered in a
`clang::comments::CommandTraits` object.
Visiting the unknown command requires querying the registered commands
through the `clang::comments::CommandTraits` object to get the command
name.
[18 lines not shown]
[MLIR][WASM] Re-Introduce the RaiseWasmMLIRPass to convert WasmSSA MLIR to core dialects (#205483)
See the previous PR here:
https://github.com/llvm/llvm-project/pull/164562
It was reverted by @lforg37 because of some build bot issue: see
https://github.com/llvm/llvm-project/pull/164562#issuecomment-4756828598.
However, after checking on my end, I could not reproduce the buildbot
issue. Since the problem was triggered in `flang`, which is completely
unrelated to this work, I assume it was a builder or flaky-test problem,
so I'm re-opening this PR as it was initially merged.
---------
Signed-off-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire at woven-planet.global>
Co-authored-by: Ferdinand Lemaire <flemairen6 at gmail.com>
utils/AMDGPU: Add scripts to update tests using default subtarget
Add vibe-coded scripts to migrate AMDGPU codegen tests that run llc
without a -mcpu argument. These should either remain uncommitted or be
deleted after the migration is completed.
amdgpu-pin-default-subtarget.py adds an explicit -mcpu matching the current
default for the triple's OS (amdhsa defaults to gfx700, all others to
gfx600). The blank default subtarget is a featureless generic that does not
match any explicit -mcpu, so pinning changes codegen; the batch driver
amdgpu-pin-default-subtarget-batch.sh regenerates the autogenerated CHECK
lines and reverts anything that still fails.
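
The pinning rule can be sketched as follows. The scripts themselves are
Python; this C++ fragment only illustrates the mapping stated above, with
hypothetical function names and deliberately simplistic string matching in
place of real triple and RUN-line parsing.

```cpp
#include <cassert>
#include <string>

// Returns the default CPU described above for a triple's OS: gfx700 for
// amdhsa, gfx600 for everything else.
std::string default_cpu_sketch(const std::string &triple) {
  return triple.find("amdhsa") != std::string::npos ? "gfx700" : "gfx600";
}

// Appends an explicit -mcpu to an llc command line that does not have one;
// command lines that already name a CPU are returned unchanged.
std::string pin_mcpu_sketch(const std::string &cmd,
                            const std::string &triple) {
  if (cmd.find("-mcpu") != std::string::npos)
    return cmd;
  return cmd + " -mcpu=" + default_cpu_sketch(triple);
}
```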
Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
clang: Move __builtin_amdgcn_processor_is diagnostic test to sema (#205734)
This wasn't checking the codegen result, so move it to the right place
and use -verify instead of FileChecking stderr.
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>