[CI] Remove unused env var from commit-access-greeter (#199024)
LABEL_NAME has been there since the workflow's introduction in
f8ef2699d860aea97750953f1b79db8ef7574e82, but has never been used.
[OpenMP][OMPIRBuilder] Refactor removeUnusedBlocksFromParent (#198938)
This is essentially a post-commit review for #198690, which was landed
quickly to fix the test nondeterminism introduced in #197637.
Change-Id: Ib3603ef3c70dde5bb22d0fc04d9249e62ecccf0c
Co-authored-by: @Meinersbur
Co-authored-by: @chichunchen
[RISCV] Ensure AVL dominates True in vmerge peephole (#199008)
When folding vmerge into its true operand, if vmerge has an AVL defined
by a register and true has VLMAX, then the minimum AVL will be the
register. In this case it's not guaranteed to dominate true, so we need
to potentially sink true so it does.
This teaches ensureDominates to check multiple definitions at the same
time, since we want the sinking to be atomic.
Fixes #198733
[Flang][OpenMP][NFC] Remove Fortran Evaluate Dependency from Support (#198742)
Following #197442, FortranEvaluate was implicitly included in
OpenMP-utils.h. This should be avoided so that the Optimizers do not
depend on front-end data structures and can stop and restart from pure
MLIR source without any side data structures.
To ensure this, EntryBlockArgs has been stripped back to track only
vars. Objects are now tracked within ObjectEntryBlockArgs in Lowering,
which is a more appropriate place for this information; the existing
symbol tracking in EntryBlockArgsEntry was only used here.
This ensures FortranEvaluate is not needed within the Optimizers, and
objects can still be maintained when lowering. This enables better
referencing in Reduction Clauses, where previously context was being
lost for expressions such as ArrayElements.
See more: #197442
Assisted-by: Codex
[LoopInterchange] Disable LoopCacheAnalysis-based heuristic by default (#193478)
LoopInterchange has three types of heuristics for profitability
decisions: `cache`, `instorder`, and `vectorize`. Currently, the
profitability check invokes these heuristics in this order. The
heuristic corresponding to `cache` is based on LoopCacheAnalysis.
However, LoopCacheAnalysis applies several aggressive heuristics, which
can sometimes lead to undesirable decisions. In contrast, the heuristic
corresponding to `instorder` is relatively simpler than `cache`, but its
behavior is clear and it is likely sufficient for practical cases.
In light of the default enablement, I believe it is better to use a
simpler, easier‑to‑reason‑about, and more stable heuristic rather than
an aggressive but complex one. Therefore, this patch disables the
LoopCacheAnalysis‑based profitability check by default.
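For context, the transformation whose profitability these heuristics judge can be illustrated with a small loop nest. This is a generic sketch, not code from the patch, and it does not model `cache`, `instorder`, or `vectorize`; `N`, `Matrix`, and both function names are made up for illustration.

```c++
#include <array>
#include <cstddef>

constexpr std::size_t N = 4;
using Matrix = std::array<std::array<int, N>, N>;

// Before interchange: the inner loop walks down a column, so consecutive
// iterations touch elements that are N ints apart in memory.
long sum_column_inner(const Matrix &a) {
  long sum = 0;
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < N; ++i)
      sum += a[i][j];
  return sum;
}

// After interchange: the inner loop walks along a row, so consecutive
// iterations touch adjacent elements. The result is unchanged.
long sum_row_inner(const Matrix &a) {
  long sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      sum += a[i][j];
  return sum;
}
```

Both functions compute the same sum; a profitability heuristic only has to decide which loop order is preferable.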
[InstCombine] Fold `X s<= Y ? 0 : X -nsw Y -> X - smin(X, Y)` (#187898)
This is part of #146131 and #182597
`func3` and `func4` are
[equivalent](https://alive2.llvm.org/ce/z/NNMTDa) but `func3` produces a
`sext` instead of `zext` when `b - a` is known non-negative.
[Proof of correctness](https://alive2.llvm.org/ce/z/ZthC9m)
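As an informal complement to the Alive2 proof, the fold in the title can be spot-checked on small inputs. This is only a sketch: `select_form`, `smin_form`, and `check_small_range` are hypothetical helper names, and the inputs are kept small so that `x - y` never overflows, which is what the `nsw` flag assumes.

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>

// Source pattern: X s<= Y ? 0 : X -nsw Y
int32_t select_form(int32_t x, int32_t y) { return x <= y ? 0 : x - y; }

// Folded pattern: X - smin(X, Y)
int32_t smin_form(int32_t x, int32_t y) { return x - std::min(x, y); }

// Compares both forms over a small range where x - y cannot overflow.
void check_small_range() {
  for (int32_t x = -8; x <= 8; ++x)
    for (int32_t y = -8; y <= 8; ++y)
      assert(select_form(x, y) == smin_form(x, y));
}
```

When `x <= y`, `smin(x, y)` is `x` and the subtraction yields 0; otherwise it is `y` and the subtraction yields `x - y`, matching the select.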
```c++
#include <stdint.h>
uint64_t func3(int32_t a, int32_t b) {
  return (b < a ? 0 : (int32_t)(b - a));
}
uint64_t func4(int32_t a, int32_t b) {
  return (b < a ? 0 : (uint32_t)(b - a));
// [10 lines not shown]
```
[AArch64] C1-Nano scheduling model refactor [NFC] (#198469)
Creates explicit definitions for each latency/throughput/resource
combination and uses them in the instruction rule definitions.
Although this change touches most lines in the model, there is no
functional change: no test cases are affected.
This makes the style of the C1-Nano scheduling model similar to that
used in the C1-Ultra / C1-Premium models, in preparation for
including the SME instruction support that is currently being
implemented in the C1-Ultra scheduling model.
[lld][PAC] Print full version and platform values on core mismatch (#198758)
`toHex()` only prints a single byte of the integer value, which can hide
the actual mismatch in AArch64 PAuth ABI core info diagnostics.
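The effect on a diagnostic can be sketched with plain standard-library formatting. This is an illustration only and does not use lld's or LLVM's actual helpers; `hex_low_byte` and `hex_full` are hypothetical names.

```c++
#include <cstdint>
#include <cstdio>
#include <string>

// Formats only the least significant byte of the value.
std::string hex_low_byte(uint64_t value) {
  char buf[3]; // two hex digits plus the terminating NUL
  std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(value & 0xff));
  return buf;
}

// Formats the whole 64-bit value.
std::string hex_full(uint64_t value) {
  char buf[19]; // "0x" plus up to 16 hex digits plus the terminating NUL
  std::snprintf(buf, sizeof buf, "0x%llx",
                static_cast<unsigned long long>(value));
  return buf;
}
```

Two values such as 0x10000002 and 0x2 look identical through `hex_low_byte` but differ through `hex_full`, which is the kind of mismatch a single-byte print would hide.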
clang/AMDGPU: Use TranslateArgs from the base toolchain instead of the host
This fixes -Xopenmp-target / -Xarch for arbitrary arguments. HIP and OpenMP
had broken, cargo-culted implementations of TranslateArgs, which called the host
toolchain's implementation and then special-cased transferring either -march
or -mcpu to the device argument list. The respective device forwarding flags
should work for any argument, not just these. The main feature that needs
to be preserved is the shared filtering of unsupported sanitizers to degrade
them into warnings.
Most of the changes here are dealing with fallout observed when
the host target is darwin. The darwin toolchain happens to have
some hacky statefulness tracking the compile target version, which
gets written and rewritten on argument parsing. To maintain this hack,
there are a few unused calls to getArgsForToolChain; start passing OFK_Host
to these so the offload toolchains don't get confused and think they're in
a non-offload context.
[Clang] Add warning for non-portable include paths with trailing whitespace or dots (#190610)
This patch extends -Wnonportable-include-path to detect and warn about
trailing whitespace and dots in #include directives. Such paths are
non-portable and can lead to build failures on different operating
systems.
The warning is triggered when an include filename ends with a space or a
dot, which is common when copy-pasting paths or due to typos.
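The condition described above can be sketched as a small predicate. This is not Clang's implementation; `has_nonportable_trailing_char` is a hypothetical name, and the sketch assumes only a trailing space or dot is of interest, ignoring special cases such as a path ending in `..`.

```c++
#include <string_view>

// Hypothetical helper: returns true if the include filename ends with a
// space or a dot, the two cases named in the commit message.
bool has_nonportable_trailing_char(std::string_view filename) {
  if (filename.empty())
    return false;
  const char last = filename.back();
  return last == ' ' || last == '.';
}
```

Under these assumptions, `"foo.h "` and `"foo.h."` would be flagged while `"foo.h"` would not.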
Fixes #96064
[Offload] Fix LLVM_LINK_LLVM_DYLIB linking liboffload (#198955)
Summary:
When this is set, you can only link against `LLVM`. The previous patch
did not respect this because I did not realize that `add_llvm_library`
requires it internally.
[VPlan] Run replaceSymbolicStrides on VPlan0 (NFCI). (#196840)
Running replaceSymbolicStrides on VPlan0 means we only need to run it
once, and also enables simplifications earlier on. It is also needed to
compute the costs of the scalar VPlan0 accurately early on, without
hacky manual folds like those in the legacy cost model.
PR: https://github.com/llvm/llvm-project/pull/196840
clang: Refactor handling of offload sanitizer arguments
Previously the AMDGPU toolchains hackily handled -fsanitize arguments.
They would lie and report that all host side sanitizers are available,
then TranslateArgs would filter out the device side cases that do not
work, providing diagnostics for the skipped cases. Move that logic
into the base sanitizer argument parsing.
This makes the produced diagnostics more consistent. Previously we
would get repeated warnings when a sanitizer is fully unsupported
by amdgpu, which should now be once for the toolchain. These could
be further improved; we're printing the specific field of -fsanitize
in more cases where it could be skipped. In other cases we have the
opposite problem, where we aren't reporting the exact sanitizer
from the -f flag in the case that depends on a subtarget feature.
This will help fix other broken target specific flag forwarding bugs
in the future.
Co-authored-by: Claude Sonnet 4 <noreply at anthropic.com>
[LoopInterchange] Take base pointer into account in profitability check (#193477)
Currently `getInstrOrderCost` doesn't check the base pointers of the
accesses, which can lead to undesirable profitability decisions. This
patch makes the function take the base pointers into account, which
fixes the test case added in #193476.