[AArch64][AsmParser] Add MC support for %dtprel() relocation (#186599)
This patch adds support for the %dtprel relocation specifier in the
AArch64 assembler. This specifier is used to generate the
R_AARCH64_TLS_DTPREL64 relocation, which is used in .debug_info sections
to describe the location of thread-local variables.
Prerequisite for https://github.com/llvm/llvm-project/pull/146572
[flang] Fix extra "./" prefix in source file paths (#186212)
When a relative file search path in the current working directory was
passed to LocateSourceFile, an extra `./` was prepended, producing paths
like "././test.F90". This caused the `__FILE__` macro to expand
incorrectly and did not match the behavior of other Fortran compilers
such as gfortran. It also broke an external test suite that parses
`__FILE__` to generate test log filenames.
In LocateSourceFile and LocateSourceFileAll: do not prepend the search
path if it is the current working directory (`.`). In all other cases,
prepend the search path to the file name (original behavior).
The test expectation in getsymbols02.f90 is updated because the old
CHECK patterns matched the buggy "././" output.
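
The rule described above can be sketched in a few lines of Python. This is only an illustration of the path logic, not the flang implementation: `candidate_path` is a hypothetical helper, and POSIX-style separators are assumed.

```python
import posixpath


def candidate_path(search_dir, name):
    """Build the path to try for `name` under `search_dir` (hypothetical helper)."""
    # The current working directory adds nothing to a relative name,
    # so leave the name untouched instead of producing "././test.F90".
    if search_dir == ".":
        return name
    # Any other search directory is joined in front of the name as before.
    return posixpath.join(search_dir, name)
```

With an unconditional join, `posixpath.join(".", "./test.F90")` yields `"././test.F90"`, which is the shape of the output the old CHECK patterns matched.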
[VPlan] Add the cost of spills when considering register pressure (#179646)
Currently, when register pressure is taken into account, we reject any
VF whose pressure exceeds the number of available registers. However this
can result in failing to vectorize in cases where it's beneficial, as
the cost of the extra spills is less than the benefit we get from
vectorizing.
Deal with this by instead calculating the cost of spills and adding that
to the rest of the cost, so we can detect this kind of situation and
still vectorize while avoiding vectorizing in cases where the extra cost
makes it not worth it.
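
As a rough illustration of the idea (not the VPlan cost model itself), the sketch below adds a per-spill penalty to each candidate's cost instead of rejecting it outright. All names, the one-spill-per-excess-register assumption, and the numbers in the usage note are hypothetical.

```python
def cost_with_spills(base_cost, pressure, num_regs, spill_cost):
    """Add a penalty for every register of pressure beyond what is available."""
    # Assumption: each excess register costs one spill/reload pair.
    spills = max(0, pressure - num_regs)
    return base_cost + spills * spill_cost


def pick_vf(candidates, num_regs, spill_cost, scalar_cost):
    """Choose the VF with the lowest per-lane cost, spills included.

    `candidates` is a list of (vf, cost_per_vector_iteration, pressure).
    Returns 1 (stay scalar) if no vector candidate beats `scalar_cost`.
    """
    best_vf, best_cost = 1, scalar_cost
    for vf, cost, pressure in candidates:
        per_lane = cost_with_spills(cost, pressure, num_regs, spill_cost) / vf
        if per_lane < best_cost:
            best_vf, best_cost = vf, per_lane
    return best_vf
```

For example, with 32 registers, a spill cost of 2 and a scalar cost of 4, the candidates `[(4, 10, 34), (8, 12, 60)]` both exceed the register count. A hard limit would reject both, whereas this sketch still picks VF 4 (two spills, per-lane cost 3.5) and rejects VF 8 (28 spills, per-lane cost 8.5).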
[MLIR][XeGPU] Avoid crashing on `gpu.func` missing `gpu.return` (#186330)
Skip malformed `gpu.func` operations in `MoveFuncBodyToWarpOp`.
This prevents functions without a `gpu.return` terminator from
triggering an assertion.
Add a regression test covering a `gpu.func` missing a return terminator,
and switch the existing unknown-op test to `test.unknown` so the file no
longer needs `--allow-unregistered-dialect`.
While touching the test file, trim a few FileCheck lines that were
asserting printer details instead of the transform behavior.
Fix https://github.com/llvm/llvm-project/issues/186037
[flang][acc] Add missing dependency for checking CUF attributes (#187292)
PR https://github.com/llvm/llvm-project/pull/187161 introduced some
logic which checks CUF attributes. But this wasn't added properly to the
dependencies.
[flang][NFC] Converted five tests from old lowering to new lowering (part 34) (#187175)
Tests converted from test/Lower/Intrinsics: is_iostat_value.f90,
ishft.f90, ishftc.f90, lbound.f90, leadz.f90
[IR][NFCI] Remove *WithoutDebug (#187240)
The function instructionsWithoutDebug serves two uses: skipping debug
intrinsics and skipping pseudo instructions. Nonetheless, these
functions are expensive due to out-of-line filtering using
std::function. Ideally, the filter should be inlined, but that would
require including IntrinsicInst.h in BasicBlock.h.
We no longer use debug intrinsics, so the first use (parameter false) is
no longer needed. The second use is sometimes needed, but the
distinction between PseudoProbe instructions can be made at the call
sites more easily in many cases.
Therefore, remove instructionsWithoutDebug/sizeWithoutDebug.
c-t-t stage2-O3 -0.21%.
NAS-140346 / 26.0.0-BETA.2 / fix NoRowsWereUpdatedException in zettarepl (by yocalebo) (#18491)
Commit a08212fc46 (NAS-136213, June 2025) changed datastore.update from
raising RuntimeError('No rows were updated') to raising
`NoRowsWereUpdatedException()`.
The except RuntimeError in flush_state was correct before that commit —
it was specifically catching this "no rows updated" case. But when the
exception type was changed, nobody updated flush_state to match, so it
became a silently broken error handler.
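
The failure mode can be reproduced in isolation. The sketch below uses stand-in definitions rather than the middleware code: it assumes the new exception does not derive from RuntimeError (which the description implies), and `update`, `flush_state_old` and `flush_state_fixed` are hypothetical simplifications.

```python
class NoRowsWereUpdatedException(Exception):
    """Stand-in for the real exception; assumed not to subclass RuntimeError."""


def update(rows_matched):
    # Mimics the post-change behaviour: a dedicated exception type is raised.
    if rows_matched == 0:
        raise NoRowsWereUpdatedException()


def flush_state_old(rows_matched):
    try:
        update(rows_matched)
    except RuntimeError:
        # Never reached for the "no rows" case once the type changed.
        return "ignored"
    return "updated"


def flush_state_fixed(rows_matched):
    try:
        update(rows_matched)
    except NoRowsWereUpdatedException:
        # Catches the exception type that is actually raised.
        return "ignored"
    return "updated"
```

Calling `flush_state_old(0)` lets the exception propagate to the caller, while `flush_state_fixed(0)` handles it as originally intended.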
To make matters even more confusing, an unrelated change in master,
4b7769149f (NAS-140201), fixed this, but because that was a much more
involved change it was never backported.
Original PR: https://github.com/truenas/middleware/pull/18488
Co-authored-by: caleb <yocalebo at gmail.com>
[MLIR] Fix crash in FrozenRewritePatternSet when PDL lowering is skipped by debug counter (#186159)
When using --mlir-debug-counter=pass-execution-skip=N, the MLIR debug
counter can skip the internal PDL-to-PDLInterp lowering pass that runs
inside FrozenRewritePatternSet's constructor. This caused an assertion
failure in PDLByteCode::Generator::generate() because the PDL module
wasn't properly converted to the interpreter dialect.
The fix adds a check after the PDL lowering pipeline runs to verify that
the expected matcher function symbol was produced. If the symbol is
absent (e.g., because the lowering was skipped by a debug counter),
bytecode generation is skipped entirely and PDL patterns are not
applied. This allows debug counter bisection to work without crashing.
Fixes #131441
Fixes #128342
Assisted-by: Claude Code
[VPlan] Use target's index type for {First,Last}ActiveLane instead of i64 (#186361)
Fixes #186005
On RV32 with zve32x, i.e. no legal 64 bit types either scalar or vector,
@llvm.cttz.elts.i64 cannot be lowered and so returns an illegal cost for
scalable VFs. However VPInstruction::FirstActiveLane and
VPInstruction::LastActiveLane always use a hardcoded i64 type.
This causes a legacy/VPlan cost model mismatch in the live-out.ll test,
and in early-exit-live-out.ll prevents the scalable VF from being
chosen.
This PR teaches the two VPInstructions to use the target's index type,
i.e. the width of a pointer in the default address space, so it will
generate a 32 bit cttz.elts on RV32. This should be large enough to hold
the maximum number of elements in a vector, since any larger vector
could not be fully addressed in memory.
[2 lines not shown]
[libc++] Refactor __is_transparent_v to make it clear what it depends on (#186419)
__is_transparent_v used to accept an additional _Key template argument
whose sole purpose was to make the instantiation as a whole dependent.
It turns out that creates confusion around whether that trait takes into
account the key type (it does not). Instead, we can use our traditional
approach for making template params artificially dependent, which allows
removing the confusing parameter.
As a disclaimer, I authored this patch with Claude Code just to see if I
could get it to do the right thing. It works, but you have to steer it
right.
Fixes #186417
[InstCombine] RAUW for proven zero-indexed GEPs rather than cloning for a specific user (#185053)
When analyzing operands of loads/stores, if we can guarantee that a GEP
is always zero-indexed, it is better to modify the GEP such that other
users can take advantage of the simplification, rather than just cloning
it for one specific load/store user. Edit: implementation changed to
call replaceInstUsesWith instead of modifying in place.
Without this change, replaceGEPIdxWithZero clones the GEP for the
triggering load/store, leaving the original variable-indexed GEP in
place. Other users of that GEP (e.g., a constant-offset GEP feeding a
second load) miss the simplification. Testcase demonstrates this:
without the first load _modifying_ the gep, the _second_ load will still
be dependent on both GEPs, and thus unnecessarily dependent on the %idx.
This lack of simplification can cause issues with later passes such as
LICM.
Alternative approaches could be to add a version of this transform into
visitGEP, but there is precedent to doing so in visitLoad/visitStore,
[8 lines not shown]
PR bin/60099 fix a (harmless) c&p issue in previous
The test name used in failure error messages suffered from
a c&p problem (cut from the wrong place). No effect upon
the tests themselves, just the error message produced when
a test case fails.
[Offload] Add CMake alias for CI (#186099)
In the pre-merge CI we need a top-level visible target that can be used
to build offload, i.e., libomptarget and LLVMOffload.
The related PR to include offload into pre-merge CI is here:
https://github.com/llvm/llvm-project/pull/174955
[mlir][acc] Move acc routine functions into GPU module (#187161)
The OpenACC routine directive defines functions that may be called from
device code; those functions (and any device-required callees) must be
present in the device compilation unit. This PR introduces the
ACCRoutineToGPUFunc pass, which moves materialized acc routines into the
GPU module as gpu.func so they can be compiled for the device.
This adds tests exercising the pass on both MLIR and FIR. The FIR tests
required improvements to the OpenACCSupport implementation to ensure that
the CUF and Fortran runtimes are considered legal for GPU.