[Xtensa] Call isUInt<8> in range-check asserts (#204731)
`printOffset8m8_AsmOperand` and `getSelect_256OpValue` assert on
`isUInt<8>` without calling it, so the expression takes the function's
address and the range check never runs. This also trips
`-Werror,-Wpointer-bool-conversion` in builds with assertions enabled.
Pass the operand value so the bound is actually checked.
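The difference between the two assert shapes can be sketched as follows. This is a minimal standalone illustration, not the patched Xtensa code: `isUInt` here is a local stand-in for the LLVM helper, and `checkedByteOperand` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for llvm::isUInt<N> (an assumption so the sketch compiles
// on its own; the real helper lives in llvm/Support/MathExtras.h).
template <unsigned N> constexpr bool isUInt(uint64_t X) {
  static_assert(N > 0 && N < 64, "sketch only handles widths 1..63");
  return X < (uint64_t(1) << N);
}

// Buggy shape: the name decays to a function pointer, which is never null,
// so the assertion always passes and the operand is never inspected.
//   assert(isUInt<8> && "Invalid argument");

// Fixed shape: call the predicate with the operand value.
// checkedByteOperand is a hypothetical helper, not a function from the patch.
inline uint64_t checkedByteOperand(uint64_t Value) {
  assert(isUInt<8>(Value) && "Invalid argument, value must be in [0, 255]");
  return Value;
}
```

With assertions enabled, `checkedByteOperand(256)` would abort, whereas the buggy shape would accept any value.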
[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
[RFC][IR] Extract AMDGPU-specific verification logic into `VerifierAMDGPU.cpp`
`Verifier.cpp` is large and already mixes generic IR verification with
target-specific checks. We also have a growing amount of AMDGPU verifier logic
downstream, which would all end up in the same file if we don't address this,
and that is not ideal.
This patch extracts AMDGPU-specific verification logic into a separate
`VerifierAMDGPU.cpp` file, with shared infrastructure (`VerifierSupport`) moved
into `VerifierInternal.h`.
This is purely a code organization change, not a target-dependent IR verifier.
All checks remain compiled and linked into `LLVMCore` regardless of the target
triple. The extracted functions are called unconditionally at well-defined
extension points in `Verifier.cpp`, and each function internally gates on
target-specific conditions (for example, triple checks or intrinsic IDs) as
needed. The file is strictly limited to AMDGPU-specific IR constructs (amdgcn
intrinsics, AMDGPU module flags, etc.), and does not contain generic IR rules
that vary by target.
[10 lines not shown]
[clang][x86] Add constexpr support for VNNI intrinsics (#190549)
Fixes #161340.
Adds constexpr support for VNNI intrinsics by modifying their header
files, their TableGen definitions, and how they're interpreted in
InterpBuiltin.cpp and ExprConstant.cpp, and adds unit tests to the
headers' corresponding unit test files.
[orc-rt] Replace TaskDispatcher with Session-supplied wrapper-runner. (#204965)
TaskDispatcher was only used to run wrapper-function calls that
originated from the controller. Replace it with a callable type:
Session::RunWrapperCall = move_only_function<void(
orc_rt_SessionRef, uint64_t, orc_rt_WrapperFunctionReturn,
orc_rt_WrapperFunction, WrapperFunctionBuffer)>
Each call carries an outstanding ManagedCodeTaskGroup token; the runner
must eventually invoke Fn (which calls Return) or call Return directly
to bail out, otherwise Session shutdown blocks indefinitely.
Clients can supply any callable that satisfies the contract above. The
new QueueingRunner and ThreadPoolRunner classes (replacing
QueueingTaskDispatcher and ThreadPoolTaskDispatcher, respectively) are
provided as off-the-shelf options.
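The runner contract can be illustrated with a heavily simplified sketch. Everything below is an assumption made for illustration: the real `Session::RunWrapperCall` takes five arguments and uses `move_only_function`, while this sketch uses `std::function`, an `int` result, and a hypothetical `SimpleQueueingRunner` that is not the actual `QueueingRunner`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Simplified stand-ins (assumptions, not the orc-rt types).
using ReturnFn = std::function<void(int)>;       // delivers the result
using WrapperFn = std::function<void(ReturnFn)>; // runs, then calls Return

// Hypothetical runner: queues calls and runs them when asked.
class SimpleQueueingRunner {
  std::deque<std::pair<WrapperFn, ReturnFn>> Pending;

public:
  // Accept a call. The runner now owes exactly one Return invocation.
  void operator()(WrapperFn Fn, ReturnFn Return) {
    assert(Fn && Return && "both callables must be non-empty");
    Pending.emplace_back(std::move(Fn), std::move(Return));
  }

  // Normal path: invoke Fn, which is responsible for calling Return.
  bool runNext() {
    if (Pending.empty())
      return false;
    auto Call = std::move(Pending.front());
    Pending.pop_front();
    Call.first(std::move(Call.second));
    return true;
  }

  // Bail-out path: call Return directly so no call is left outstanding.
  void cancelAll(int ErrorResult) {
    while (!Pending.empty()) {
      ReturnFn Return = std::move(Pending.front().second);
      Pending.pop_front();
      Return(ErrorResult);
    }
  }
};
```

Either path ends with `Return` being called once per accepted call, which is the property the shutdown logic described above depends on.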
[LV] Avoid zero-width VF in computeVPlanOuterloopVF. (#204918)
RegSize / WidestType may be 0 for types wider than the vector register
size. Clamp VF to at least 1 (scalar), to avoid a crash. This matches
inner loop behavior.
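A minimal sketch of the clamp, assuming both sizes are given in bits and `WidestTypeBits` is non-zero. `clampedOuterLoopVF` is a hypothetical name, not the function changed by the patch.

```cpp
#include <algorithm>

// Integer division yields 0 when the widest type is wider than the register;
// clamp to 1 so the result is at least a scalar VF.
constexpr unsigned clampedOuterLoopVF(unsigned RegSizeBits,
                                      unsigned WidestTypeBits) {
  return std::max(1u, RegSizeBits / WidestTypeBits);
}
```

For example, a 128-bit register with a 256-bit widest type gives 128 / 256 = 0, which the clamp turns into 1.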
Support for -fsplit-lto-unit option in flang driver (#204904)
Fix for buildbot failures in #202858
This commit fixes a regression introduced in commit
12aefe26cedd9a8f94546cc1f2be285cfddcc861 (Support for -fsplit-lto-unit
option in flang driver). When the compiler is built only for aarch64, one
of the test cases fails.
Add an explicit %if x86-registered-target check to this test case to
resolve the issue.
[tsan] fit Go/s390x mapping under QEMU (#204503)
QEMU linux-user first tries guest_base=0. In that identity-mapped mode,
fixed guest mappings use the same host addresses. On an x86-64 host
with four-level page tables, the Go/s390x meta shadow starts at
144 TiB, beyond the 128 TiB userspace limit, and its mmap fails with
ENOMEM during TSan initialization.
Move the meta shadow down by 32 TiB to
[0x700000000000, 0x780000000000), restoring the 16 TiB gap after the
shadow and placing all Go/s390x TSan regions below 2^47. Correct the
mapping comment's shadow size and ratio.
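The address arithmetic above can be checked with a few constants. The constant and function names are made up for this sketch and are not the identifiers used in the TSan sources.

```cpp
#include <cstdint>

constexpr uint64_t kTiB = uint64_t(1) << 40;
// Userspace limit on x86-64 with four-level page tables: 2^47 = 128 TiB.
constexpr uint64_t kUserLimit = uint64_t(1) << 47;

// Old and new meta shadow bounds as stated in the commit message.
constexpr uint64_t kOldMetaShadowBeg = 144 * kTiB;
constexpr uint64_t kNewMetaShadowBeg = 0x700000000000ull; // 112 TiB
constexpr uint64_t kNewMetaShadowEnd = 0x780000000000ull; // 120 TiB

// True when the half-open range [Beg, End) lies entirely below the limit.
constexpr bool fitsBelowUserLimit(uint64_t Beg, uint64_t End) {
  return Beg < End && End <= kUserLimit;
}
```

The old start address already lies above the limit, while the new range ends 8 TiB below it.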
Failure report and native s390x comparison:
https://github.com/golang/go/issues/67881
QEMU identity guest-base selection:
https://github.com/qemu/qemu/blob/v10.2.3/linux-user/elfload.c#L1036-L1042
[9 lines not shown]
[orc-rt] Sink Session::sendWrapperResult into Session.cpp. NFC. (#204956)
This function is never called inline (except by Session::wrapperReturn,
which is also in Session.cpp), so there's no need for it to be in the
header.
[SimplifyCFG] Avoid threading loop-header branches in convergent functions
SimplifyCFG can fold a conditional branch when the condition is known from
a predecessor. When the destination is a loop header in a convergent function,
this can change the dynamic convergence structure of the loop even though the
scalar CFG rewrite is otherwise valid.
Skip this fold for loop-header branches in convergent functions so convergent
control flow is preserved.
Fixes ROCM-26496.
[clang] Add clang-format-check-format instead to CLANG_TEST_DEPS (#204908)
Ensure that clang-format doesn't break the existing format of its own
source.
Reverts #199169 and #199638.
[AMDGPU][VOPD] Cache load reachability checks in VOPDpairing (#204854)
#201930 causes a significant compile-time regression when building
ROCm mathlibs.
Major regressions are caused by repeated queries to `DAG->IsReachable`
to detect possible scalarisation of loads when fusing a pair of
VOPD-capable instructions.
This patch caches the set of reachable loads for every potentially
hazardous load instruction to avoid the need to invoke
`DAG->IsReachable` at all.
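The caching idea can be sketched on a toy graph. This is not the VOPD pairing code: `ToyDAG` and `ReachableLoadCache` are hypothetical, and the sketch assumes nodes are small integer ids with explicit successor lists.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy dependency graph: Succs[N] lists the successors of node N.
struct ToyDAG {
  std::vector<std::vector<int>> Succs;
  std::vector<bool> IsLoad;
};

// Computes the set of loads reachable from a node once, then answers
// later pair queries with a hash lookup instead of a fresh graph walk.
class ReachableLoadCache {
  const ToyDAG &DAG;
  std::unordered_map<int, std::unordered_set<int>> Cache;

public:
  explicit ReachableLoadCache(const ToyDAG &D) : DAG(D) {}

  const std::unordered_set<int> &loadsReachableFrom(int From) {
    assert(From >= 0 && std::size_t(From) < DAG.Succs.size());
    auto It = Cache.find(From);
    if (It != Cache.end())
      return It->second;

    // Single depth-first walk that collects every reachable load.
    std::unordered_set<int> Loads;
    std::vector<bool> Seen(DAG.Succs.size(), false);
    std::vector<int> Work{From};
    Seen[From] = true;
    while (!Work.empty()) {
      int N = Work.back();
      Work.pop_back();
      for (int S : DAG.Succs[N]) {
        if (Seen[S])
          continue;
        Seen[S] = true;
        if (DAG.IsLoad[S])
          Loads.insert(S);
        Work.push_back(S);
      }
    }
    return Cache.emplace(From, std::move(Loads)).first->second;
  }

  bool reachesLoad(int From, int To) {
    return loadsReachableFrom(From).count(To) != 0;
  }
};
```

The trade-off is memory for time: each queried node pays for one walk, and every subsequent query against it is a set lookup.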
[SelectionDAG] Keep split vector atomic store value in a vector register (#201566)
When the value of an ATOMIC_STORE has a vector type whose legalization
action is split (e.g. <4 x half>/<4 x bfloat> on X86 without F16C),
SplitVecOp_ATOMIC_STORE bitcast the value straight to a scalar integer
spanning the memory width. For a split vector that bitcast is expanded
element by element, reassembling the value in GPRs (a long pextrw/shl/or
sequence) before the store.
Instead, keep the value in a vector register when a legal vector form
exists: reinterpret it as a same-shaped integer-element vector (an FP
element type may have no legal vector form, e.g. bfloat on SSE2, while
the integer-of-element-size form does), widen that to a legal vector,
and extract the low integer element of the memory width. This issues the
store directly from a vector register (a single MOVQ/MOVD on X86),
matching the widen-path codegen already produced on AVX targets. Falls
back to the scalar bitcast when no suitable legal vector type exists.
Stacked on top of https://github.com/llvm/llvm-project/pull/197861 and
below #197862.
[VPlan] Properly check predicates and types in canNarrowOps. (#204948)
Update canNarrowOps to properly check the types of all members match.
Similarly, for recipes with predicates, the predicates must match.
[llvm-objcopy][MachO] Use alignToPowerOf2 instead of alignTo (#204033)
During the review of #203680 I noticed that the Mach-O objcopy files
use `alignTo` and include `Alignment.h` to align some offsets to page
boundaries and similar requirements. However, the `alignTo` in
`Alignment.h`, while intended for powers of 2, takes an alignment of
type `llvm::Align` and needs an explicit conversion from `uint64_t`
and similar types. Since `Alignment.h` includes `MathExtras.h`, the
`alignTo` actually invoked is the generic one, which does not require
powers of 2 and performs divisions and multiplications.
While some of those might be optimized by the compiler into efficient
power of 2 operations, there's an explicit `alignToPowerOf2` version
that is optimized and asserts the alignment is a power of 2 (with
asserts enabled). Since all the alignments should be power of 2 for the
Mach-O binary format, change from `alignTo` to `alignToPowerOf2` to make
the fact more visible (and get the extra safety net of the assertions).
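The two rounding strategies can be compared with standalone sketches. The functions below are simplified stand-ins with hypothetical names, not the LLVM implementations, and they assume C++14 or later so that `assert` can appear in a `constexpr` function.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOf2Sketch(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Generic rounding: works for any non-zero alignment, at the cost of a
// division and a multiplication.
constexpr uint64_t alignToGenericSketch(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Power-of-2 rounding: a mask instead of a division, plus an assertion
// that catches a non-power-of-2 alignment in builds with asserts enabled.
constexpr uint64_t alignToPow2Sketch(uint64_t Value, uint64_t Align) {
  assert(isPowerOf2Sketch(Align) && "alignment must be a power of 2");
  return (Value + Align - 1) & ~(Align - 1);
}
```

Both forms agree whenever the alignment is a power of 2; only the generic form gives a meaningful answer for an alignment such as 6.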
As expected, the test suite of objcopy doesn't show any regressions, but
I have not done a performance benchmark around this either.
[llvm-objcopy][MachO] Align __LINKEDIT entries to pointer size (#203680)
Align Mach-O __LINKEDIT entries to the target pointer size when building
the tail layout. This matches the behavior of ld64 and lld-macho.
dyld on macOS 27 rejects loading dylibs with misaligned __LINKEDIT
entries.
See #203678 for details and the motivation of this fix.
AI Tool Use Disclosure:
Regarding the PR and the linked issue, I have personally written every
single part of the PR myself, and have run and verified every single
part of the issue report as well, without any AI tool usage.
I have used LLM-based coding agents only for debugging purposes, e.g. to
figure out why the dylib was not loading (from the original bug report),
and figuring out how to build, run, and test my local `llvm-objcopy`.
[VPlan] Skip shl->mul SCEV rewrite for out-of-range shift amounts. (#204921)
getSCEVExprForVPValue rewrites `shl x, c` as `x * (1 << c)` using
ScalarEvolution::getPowerOfTwo, which asserts that the power is less
than the type's bit width.
Only perform the rewrite when the shift amount is less than the
operand's bit width, to avoid the assertion.
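The guard can be sketched on plain integers. This is an illustration only, not the VPlan or ScalarEvolution code: `shlMultiplier` is a hypothetical helper, and the sketch assumes bit widths of at most 64 and C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Returns the multiplier (1 << ShiftAmt) that makes `shl x, ShiftAmt`
// equal to `x * multiplier`, or nullopt when the shift amount is out of
// range for the given bit width and the rewrite must be skipped.
constexpr std::optional<uint64_t> shlMultiplier(uint64_t ShiftAmt,
                                                unsigned BitWidth) {
  if (BitWidth > 64 || ShiftAmt >= BitWidth)
    return std::nullopt;
  return uint64_t(1) << ShiftAmt;
}
```

A caller would fall back to leaving the shift untouched whenever the helper returns no value.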
[FileCheck] Use default colors in input dumps
This patch makes two improvements to colors used in FileCheck input
dumps:
1. Without this patch, input line numbers and ellipses have a
foreground color of black, which is hard to see in a terminal with
a dark color theme. This patch changes that to the terminal's
default color.
2. Without this patch, the input text is accidentally set to bold when
neither `-v` nor `-vv` is specified. Perhaps I never noticed
because I tend to always use `-vv`. This patch changes that to use
the terminal's default color.
Case 2 exposes a problem with LLVM's color implementation. Without
this patch, the call to `WithColor`'s constructor actually specifies
bold as `false`, but `WithColor` ignores that when the color is
`SAVEDCOLOR`. While it seems like that should be fixed, I am
concerned about the impact of such a fix on other tools that might
[12 lines not shown]