[mlir] Use add_tablegen() for mlir-src-sharder to fix aarch64 cross-compile (#196202)
`add_tablegen()` already sets `MLIR_SRC_SHARDER_TABLEGEN_EXE` to the
native host-tool path during cross-compilation (via
`build_native_tool`). The leftover manual
`set(MLIR_SRC_SHARDER_TABLEGEN_EXE mlir-src-sharder PARENT_SCOPE)`
clobbered that path with the bare binary name, causing aarch64
cross-builds to fail with:
```
/bin/sh: 1: mlir-src-sharder: not found
```
when sharding `TestOps`. Switching `mlir-src-sharder` from
`add_llvm_executable` to `add_tablegen` (and dropping the redundant
`set(... PARENT_SCOPE)`) lets the existing cross-compile machinery point
consumers at the host build of the tool.
[VectorCombine] Fold reduce.add == 0 into reduce.[or,umax] == 0
If every lane of a fixed-length vector is non-negative or every lane is
non-positive, and the lane count is small enough that summation cannot
wrap, then reduce.add(V) == 0 exactly when every lane is zero. In that
case the add reduction can be replaced by reduce.or or reduce.umax,
whichever is cheaper on the target.
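The equivalence can be illustrated with a small standalone model. This is only a sketch, not the VectorCombine code: it models lanes as unsigned 8-bit values, uses a hypothetical `lanesCannotWrap` precondition (every lane at most 63, so four lanes sum to at most 252), and all function names are made up for illustration.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<std::uint8_t, 4>;

// Wrapping 8-bit sum of all lanes, modelling an i8 reduce.add.
std::uint8_t reduceAdd(const Vec4 &v) {
  std::uint8_t sum = 0;
  for (std::uint8_t lane : v)
    sum = static_cast<std::uint8_t>(sum + lane);
  return sum;
}

// Bitwise OR of all lanes, modelling reduce.or.
std::uint8_t reduceOr(const Vec4 &v) {
  std::uint8_t acc = 0;
  for (std::uint8_t lane : v)
    acc = static_cast<std::uint8_t>(acc | lane);
  return acc;
}

// Unsigned maximum of all lanes, modelling reduce.umax.
std::uint8_t reduceUMax(const Vec4 &v) {
  std::uint8_t best = 0;
  for (std::uint8_t lane : v)
    if (lane > best)
      best = lane;
  return best;
}

// Hypothetical precondition: each lane is in [0, 63], so the sum of four
// lanes is at most 252 and cannot wrap in 8 bits.
bool lanesCannotWrap(const Vec4 &v) {
  for (std::uint8_t lane : v)
    if (lane > 63)
      return false;
  return true;
}
```

When `lanesCannotWrap` holds, `reduceAdd(v) == 0`, `reduceOr(v) == 0`, and `reduceUMax(v) == 0` all agree. An input such as `{128, 128, 0, 0}` violates the precondition: its sum wraps to zero while the OR is nonzero.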
[CIR] Fix function signature mismatch on redirected calls (#196665)
We were running into CIR verification errors ("error: 'cir.call' op
operand type mismatch") when compiling with some older versions of the
GLIBC headers that used a macro to redirect system library calls to a
function that used different, but compatible, arguments.
This change fixes the problem by detecting the mismatch at the callsite
and bitcasting the arguments.
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
[sanitizer_common] Implement address sanitizer on AIX: platform specific support (#131866)
Add recognition of AIX and some platform specific changes. This lays the
groundwork to implement AIX in sanitizer_common/asan.
Issue: https://github.com/llvm/llvm-project/issues/138916
[LifetimeSafety] Warn on implicit this lifetimebound violations (#196926)
With this change we report `[[clang::lifetimebound]]` violations on the
implicit `this` parameter.
It also adds a helper to retrieve the `[[clang::lifetimebound]]`
attribute on method declarations, so diagnostics can point directly at
the attribute location.
[CIR] Global-TLS variable 'call' rewriting- (#197026)
This is a followup to my previous patch to handle global/namespace
thread local variables. This patch handles the
re-writing/lowering-prepare of the `get-global` for these variables.
Each call to one of these is required to go to a 'wrapper' function,
which optionally calls the initializer. This patch does not handle the
initializer call (so each wrapper call is a very simple 'return the
variable'), as that will be handled in a followup.
Also, variables without initialization don't use a wrapper in Classic
Codegen, however this patch does. The followup patch that will call the
initializer will skip the call to the initializer, but leave the wrapper
in place. This is a necessity due to how we handle global ops/get-global
ops: we won't know whether there is a required ctor/dtor that needs an
initializer at the time of wrapper-write-replacement.
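As a rough source-level analogy of the rewrite (the patch itself operates on CIR ops, not C++), every access to the thread-local variable is routed through a wrapper that, for now, simply returns the variable. The names `counter` and `counterWrapper` are hypothetical and chosen only for this sketch.

```cpp
// A namespace-scope thread-local variable.
thread_local int counter = 0;

// Stand-in for the generated wrapper. In this sketch it only returns the
// variable; the optional initializer call is intentionally left out.
int &counterWrapper() { return counter; }

// Uses of the variable go through the wrapper instead of naming it directly.
int readCounter() { return counterWrapper(); }
void bumpCounter() { ++counterWrapper(); }
```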
Fix "author" handling in GitHub PR Greeter (#197140)
This is a follow-up to #194307 and fixes the issue reported in:
* https://github.com/llvm/llvm-project/pull/194307#issuecomment-4426270256
Use the same author-detection logic in `PRGreeter` as in
`PRBuildbotInformation`, so both components handle PR authors
consistently.
[libclc] Consolidate `amdgpu` and `amdgcn` architectures consistently (#197233)
Summary:
Currently, not all checks pass with the amdgpu triple as they do with
amdgcn. SPIR-V set this pattern, so let's make it consistent.
[llvm][tools] Use temp dir for offload-binary unbundling test (#197234)
Certain environments will leave some of the test dirs read-only for
immutability purposes. Create a new temporary directory so that
llvm-offload-binary has a writable directory to unbundle the image into.
With this method we can also delete the temporary directory afterwards,
so leftover files cannot let a test that should fail keep passing.
Delete top level mops-instructions.s file (#197244)
The top-level mops-instructions.s file was accidentally added with
the AArch64 C1-Nano scheduling model; this change deletes it.
The correct file is located in
llvm/test/tools/llvm-mca/AArch64/Inputs/mops-instructions.s
[AMDGPU][MIRFormatter] Human-readable mask for S_WAITCNT_soft (#197075)
This patch reuses the S_WAITCNT mask printer and parser for
S_WAITCNT_soft. It prints the mask in a human-readable format, showing
the counter values like `Vmcnt_<NUM>_Expcnt_<NUM>_Lgkmcnt_<NUM>`.
[clang-tidy] Add `llvm-formatv-string` (#195974)
Adds a clang-tidy check that performs some validation on `llvm::formatv`
calls, similar to the built-in support Clang has for checking printf
calls.
The validations are:
- The number of unique format indices matches the number of arguments.
- Every argument is used by the format string.
- Automatic and explicit indices are not mixed.
This includes a config option (`AdditionalFunctions`) to perform the
same validation checks on other functions which take formatv inputs.
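The three validations can be sketched on a plain string. This is a simplified illustration, not the check's implementation: `FormatvReport` and `checkFormatv` are hypothetical names, the parser assumes a field's index is the leading run of digits after `{`, treats `{{` as an escaped brace, and ignores alignment and style options.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

struct FormatvReport {
  bool countMatches;  // number of unique indices equals number of arguments
  bool allArgsUsed;   // every argument index appears in the format string
  bool mixedIndices;  // automatic "{}" and explicit "{N}" fields both appear
};

FormatvReport checkFormatv(const std::string &fmt, std::size_t numArgs) {
  std::set<std::size_t> used;
  std::size_t nextAuto = 0;
  bool sawAuto = false, sawExplicit = false;

  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '{')
      continue;
    if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
      ++i; // "{{" is a literal brace, not a replacement field
      continue;
    }
    std::size_t close = fmt.find('}', i);
    if (close == std::string::npos)
      break; // unterminated field: not diagnosed in this sketch

    // Read the optional explicit index at the start of the field.
    std::size_t j = i + 1;
    std::size_t index = 0;
    bool hasDigits = false;
    while (j < close && std::isdigit(static_cast<unsigned char>(fmt[j]))) {
      index = index * 10 + static_cast<std::size_t>(fmt[j] - '0');
      hasDigits = true;
      ++j;
    }
    if (hasDigits) {
      sawExplicit = true;
      used.insert(index);
    } else {
      sawAuto = true;
      used.insert(nextAuto++);
    }
    i = close;
  }

  bool allUsed = true;
  for (std::size_t arg = 0; arg < numArgs; ++arg)
    if (used.count(arg) == 0)
      allUsed = false;

  return {used.size() == numArgs, allUsed, sawAuto && sawExplicit};
}
```

Under these assumptions, `checkFormatv("{0} {0}", 2)` reports a count mismatch and an unused argument, and `checkFormatv("{} {1}", 2)` reports mixed indices.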
Assisted-by: claude
---------
Co-authored-by: EugeneZelenko <eugene.zelenko at gmail.com>
Co-authored-by: Victor Chernyakin <chernyakin.victor.j at outlook.com>
Co-authored-by: Zeyi Xu <zeyi2 at nekoarch.cc>
[AsmParser] Use cantFail for FloatLiteral string conversion (#197064)
With assertions disabled but `LLVM_ABI_BREAKING_CHECKS=FORCE_ON`, the
`assert` was elided, the Expected stayed unchecked, and the subsequent
`*Except` tripped `fatalUncheckedError`. Fix this by switching to
`cantFail`.
Assisted-by: Claude Opus
DAGCombiner: (srl/sra (add nuw/nsw X, c), d) --> (add nuw/nsw (srl/sra X, d), c >> d) (#196379)
Additional precondition:
* The LSBs of c are 0; equivalently: c >> d is exact
Alive2 for
* unsigned case: https://alive2.llvm.org/ce/z/YcJ8qA
* signed case: https://alive2.llvm.org/ce/z/fgpvyE
We already canonicalize (shl (add ...) ...) to (add (shl ...) ...).
Restrict this combine to the single-use case to minimize risk for now.
The main target of this combine is a fan-out tree of `add`s that all end
up being shifted by the same amount at the leaves. This change happens
to improve a bunch of existing CodeGen tests in AMDGPU.
v2:
- remove a redundant check on the shift amount -- large shift amounts
result in poison anyway
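For the unsigned (srl, nuw) case, the rewrite and its precondition can be checked exhaustively on 8-bit values. This is an illustrative brute-force model, not the DAGCombiner code and not a substitute for the Alive2 proofs; the function names are hypothetical.

```cpp
#include <cstdint>

// Precondition from the commit message: the low d bits of c are zero,
// i.e. c >> d is exact.
bool lowBitsClear(std::uint8_t c, unsigned d) {
  return d < 8 && (c & ((1u << d) - 1u)) == 0;
}

// Checks (x + c) >> d == (x >> d) + (c >> d) for every 8-bit x for which
// x + c does not wrap (the nuw condition). Out-of-range shift amounts are
// not modelled and simply report false.
bool foldMatches(std::uint8_t c, unsigned d) {
  if (d >= 8)
    return false;
  for (unsigned x = 0; x < 256; ++x) {
    unsigned sum = x + c;
    if (sum > 0xFFu)
      continue; // add would wrap: nuw does not hold, input is excluded
    unsigned before = sum >> d;
    unsigned after = (x >> d) + (static_cast<unsigned>(c) >> d);
    if (before != after)
      return false;
  }
  return true;
}
```

With `c = 8, d = 3` the precondition holds and the two forms agree for all inputs. With `c = 1, d = 1` the precondition fails, and `x = 1` is a counterexample: `(1 + 1) >> 1` is 1 while `(1 >> 1) + (1 >> 1)` is 0.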
[CIR][HIP] Handle HIP module constructor and destructor emission (#195391)
Related: https://github.com/llvm/llvm-project/issues/179278, https://github.com/llvm/llvm-project/issues/175871
Similar to https://github.com/llvm/llvm-project/pull/188673, this adds
the HIP host-side module registration path in CIR lowering for the
non-RDC, included-fatbin case.
Generated sequence for HIP, non-RDC, with `-fcuda-include-gpubinary`:
```c
void **__hip_gpubin_handle = nullptr;
void __hip_module_ctor() {
if (__hip_gpubin_handle == nullptr)
__hip_gpubin_handle = __hipRegisterFatBinary(&__hip_fatbin_wrapper);
__hip_register_globals(__hip_gpubin_handle); // we only register kernels
```
[12 lines not shown]
[lldb][windows] fix 4-byte error-code read (#197177)
Reading `word_size` (8) bytes here would include 4 bytes of stack
garbage past the struct and produce bogus error codes.
[clang] Add typed variants for C2y stdbit.h rotate builtins (#195299)
stdc_rotate_left_{uc,us,ui,ul,ull}
stdc_rotate_right_{uc,us,ui,ul,ull}
Lower type-specific <stdbit.h> rotate functions to LLVM intrinsics
(fshl/fshr). Includes constant expression support and tests for Sema,
CodeGen, and constant evaluation.
Followup: #160259
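The relationship between a rotate and a funnel shift can be sketched for 32-bit values as follows. This models the semantics only: `fshl32`, `fshr32`, `rotateLeft32`, and `rotateRight32` are hypothetical helper names, not the builtins or the `<stdbit.h>` functions themselves.

```cpp
#include <cstdint>

// Funnel shift left: concatenate hi:lo, shift left by amt modulo 32,
// and return the high 32 bits.
std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  amt %= 32;
  if (amt == 0)
    return hi; // avoid an out-of-range shift by 32
  return (hi << amt) | (lo >> (32 - amt));
}

// Funnel shift right: concatenate hi:lo, shift right by amt modulo 32,
// and return the low 32 bits.
std::uint32_t fshr32(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  amt %= 32;
  if (amt == 0)
    return lo; // avoid an out-of-range shift by 32
  return (lo >> amt) | (hi << (32 - amt));
}

// A rotate is a funnel shift whose two inputs are the same value.
std::uint32_t rotateLeft32(std::uint32_t v, std::uint32_t count) {
  return fshl32(v, v, count);
}

std::uint32_t rotateRight32(std::uint32_t v, std::uint32_t count) {
  return fshr32(v, v, count);
}
```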
[PAC][lld] Do not emit AUTH relocs against undef weak non-preemptible symbols
Undefined weak non-preemptible symbols should be statically resolved to
the addend value and not signed. Previously, a dynamic relocation
against such symbols was emitted, which is incorrect.
See also docs: https://github.com/ARM-software/abi-aa/pull/391
Resolves #173296
Fix assertion failure of `APInt::sqrt` on U64 MAX input (#197161)
Closes #197145
In https://github.com/llvm/llvm-project/blob/65a206f2ec552cccf7c96c5306147f0437832ec7/llvm/lib/Support/APInt.cpp#L1305-L1312:
Instead of computing `nextSquare` completely (which overflows), we only
need to compute the difference between `x_old^2` and `(x_old + 1)^2`
which is simply `2 * x_old + 1` since `(x_old + 1)^2 = x_old^2 + 2 *
x_old + 1`. We can use this difference for the following computation of
the midpoint.
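A minimal 64-bit sketch of the idea follows. It is not the `APInt::sqrt` code: `floorSqrt` and `roundedSqrt` are hypothetical helpers, and the tie-breaking comparison is a choice made for this sketch. The point it shows is that the gap `2 * x_old + 1` fits in 64 bits even when `(x_old + 1)^2` does not.

```cpp
#include <cstdint>

// Floor of sqrt(n) by binary search. Comparing mid against n / mid avoids
// ever forming mid * mid, so nothing here can overflow.
std::uint64_t floorSqrt(std::uint64_t n) {
  if (n < 2)
    return n;
  std::uint64_t lo = 1;                      // invariant: lo * lo <= n
  std::uint64_t hi = std::uint64_t{1} << 32; // invariant: hi * hi > n
  while (hi - lo > 1) {
    std::uint64_t mid = lo + (hi - lo) / 2;
    if (mid <= n / mid)
      lo = mid;
    else
      hi = mid;
  }
  return lo;
}

// sqrt(n) rounded to the nearest integer, without computing (x + 1)^2.
std::uint64_t roundedSqrt(std::uint64_t n) {
  std::uint64_t x = floorSqrt(n);  // x < 2^32, so x * x fits in 64 bits
  std::uint64_t square = x * x;
  std::uint64_t gap = 2 * x + 1;   // equals (x + 1)^2 - x^2
  std::uint64_t offset = n - square;
  // n lies in [x^2, x^2 + gap). Round up only when n is past the halfway
  // point between the two squares.
  return offset <= gap / 2 ? x : x + 1;
}
```

For the maximum 64-bit input, `floorSqrt` yields `2^32 - 1`, the gap is `2^33 - 1`, and the result rounds up to `2^32`, all without overflow.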
[ScheduleDAG] Add a reachability cache to amortize DFS calls (#195079)
ScheduleDAGTopologicalSort::IsReachable falls back to a DFS on its
slow path. For some connectivity patterns this can result in ~quadratic
behavior.
Add a cache of {A, B} -> Reachable(A, B). This is invalidated whenever
AddPred or InitDAGTopologicalSorting is called.
For an antagonistic testcase, SelectionDAG time went from 1300s to 250s.
No testcase as no functional change, performance only.
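The caching scheme can be sketched on a toy graph. This is an assumption-laden illustration rather than the ScheduleDAG code: `ReachabilityGraph` and its members are hypothetical, the cache is a plain `std::map`, and invalidation is modelled by clearing the whole cache whenever an edge is added.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

class ReachabilityGraph {
public:
  explicit ReachabilityGraph(std::size_t numNodes) : succs_(numNodes) {}

  // Adding an edge can change any reachability answer, so drop the cache.
  void addEdge(std::size_t from, std::size_t to) {
    succs_[from].push_back(to);
    cache_.clear();
  }

  // Answers from the cache when possible; otherwise runs a DFS and
  // remembers the result for the {from, to} pair.
  bool isReachable(std::size_t from, std::size_t to) {
    const auto key = std::make_pair(from, to);
    const auto it = cache_.find(key);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;
    }
    const bool result = dfs(from, to);
    cache_.emplace(key, result);
    return result;
  }

  std::size_t cacheHits() const { return hits_; }

private:
  // Iterative depth-first search from `from`, looking for `to`.
  bool dfs(std::size_t from, std::size_t to) const {
    std::vector<bool> seen(succs_.size(), false);
    std::vector<std::size_t> stack{from};
    while (!stack.empty()) {
      const std::size_t node = stack.back();
      stack.pop_back();
      if (node == to)
        return true;
      if (seen[node])
        continue;
      seen[node] = true;
      for (std::size_t next : succs_[node])
        stack.push_back(next);
    }
    return false;
  }

  std::vector<std::vector<std::size_t>> succs_;
  std::map<std::pair<std::size_t, std::size_t>, bool> cache_;
  std::size_t hits_ = 0;
};
```

Repeated queries for the same pair are served from the cache until the next `addEdge` call, which is where the amortization comes from in this model.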
---------
Co-authored-by: James Molloy <jmolloy at google.com>