[LVI] Remove unused DL member (NFC)
This is never used (the data layout is taken later from the
module instead) and not even initialized in the legacy PM code path.
[LVI] Store function in LVI wrapper class
We know the function we're working on at construction, there is
no need to have code to fetch the module in every place that
fetches the Impl object.
I'm storing the function instead of the module to be able to get
the block number epoch in a future change.
[CIR][RISCV][NFC] Add CIRGenBuiltinRISCV file to support RISCV builtins codegen (#186050)
This PR adds CIRGenBuiltinRISCV.cpp file for RISCV specific builtins
codegen support.
It lists all RISCV builtins except the vector builtins, which need
TableGen, and marks them as "NYI".
[NFC][llvm-symbolizer] Replace makeStringError helper with createStringError (#188428)
The local `makeStringError` helper in `llvm-symbolizer.cpp` is
equivalent to `createStringError` from `llvm/Support/Error.h`. Remove it
and use `createStringError` directly at all call sites.
[DA] Hoist division check for early exit in weakCrossingSIVtest (NFC)
This patch moves the check that `Coeff` divides `Delta` earlier in the
function so that it can exit early, which may improve performance.
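A minimal sketch of the early exit, using built-in `int64_t` instead of APInt. The `weakCrossingSketch` function and `WeakCrossingResult` enum are hypothetical and are not DA's real API. The sketch assumes the weak-crossing form `src = c1 + Coeff*i`, `dst = c2 - Coeff*i'` with `Delta = c2 - c1` and a non-zero `Coeff`. Equal subscripts require `Coeff * (i + i') == Delta`, so if `Coeff` does not divide `Delta`, the function can return immediately.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch only.
enum class WeakCrossingResult { Independent, MaybeDependent };

// Assumes src = c1 + Coeff*i, dst = c2 - Coeff*i', Delta = c2 - c1, and
// that earlier checks already established Coeff != 0.
constexpr WeakCrossingResult weakCrossingSketch(std::int64_t Coeff,
                                                std::int64_t Delta) {
  // A dependence needs Coeff * (i + i') == Delta for integers i, i'.
  // If Coeff does not divide Delta that is impossible, so exit early
  // before doing any further work.
  if (Delta % Coeff != 0)
    return WeakCrossingResult::Independent;

  // The remaining checks (parity of Delta / Coeff, loop bounds, ...)
  // would go here; this sketch conservatively stops.
  return WeakCrossingResult::MaybeDependent;
}
```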
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16, add their VOPDName and mark
them with usesCustomInserter, which will be used to add pre-RA register
allocation hints that preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking whether it is VOP2-like:
dst == src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
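The VOP2-like check and the literal commute described above can be sketched as follows. `Operand`, `Dot2Inst`, `isVOP2Like` and `commuteLiteralToSrc0` are hypothetical simplifications for illustration; they are not the actual `canMapVOP3PToVOPD` code, which works on machine instructions and tracks per-operand modifiers.

```cpp
// Hypothetical, simplified operand model: either a register or a literal.
struct Operand {
  bool IsReg;
  unsigned Value; // register number or literal bits
  constexpr bool operator==(const Operand &O) const {
    return IsReg == O.IsReg && Value == O.Value;
  }
};

// Hypothetical view of V_DOT2_F32_{F16,BF16}: dst = dot2(src0, src1) + src2.
struct Dot2Inst {
  Operand Dst, Src0, Src1, Src2;
  bool HasSrcMods;
  bool Clamp;
};

// The VOP2-like conditions: dst == src2, no source modifiers, no clamp,
// and src1 is a register.
constexpr bool isVOP2Like(const Dot2Inst &MI) {
  return MI.Dst.IsReg && MI.Dst == MI.Src2 && !MI.HasSrcMods && !MI.Clamp &&
         MI.Src1.IsReg;
}

// dot2(a, b) == dot2(b, a), so a literal in src1 can be swapped into
// src0, the only slot where VOPD accepts a literal.
constexpr Dot2Inst commuteLiteralToSrc0(Dot2Inst MI) {
  if (!MI.Src1.IsReg && MI.Src0.IsReg) {
    Operand Tmp = MI.Src0;
    MI.Src0 = MI.Src1;
    MI.Src1 = Tmp;
  }
  return MI;
}
```

Here a literal in src1 initially blocks the VOP2-like form, and commuting it into src0 makes the instruction eligible again.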
Force to inline syscall_impl on all platforms (#186849)
With only LIBC_INLINE, we merely hint to the compiler that the function
should be inlined, which in practice does not always happen.
Since we added `[[gnu::always_inline]]` on linux/x86_64, it makes sense
to do the same on all platforms for consistency and to add a comment
explaining why we need it.
[DA] Optimize parity check in weakCrossingSIVtest (NFC)
This patch simplifies the logic used to determine if the `Distance`
is divisible by 2. Previously, this was done by allocating an APInt
and performing a signed remainder (`srem`) operation.
Since `Distance` is an APInt, we can more efficiently check if it
is odd by directly inspecting the least significant bit (`Distance[0]`).
This avoids an expensive division operation and APInt allocation
while making the code more concise.
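A small sketch comparing the two parity checks, with `int64_t` standing in for APInt (on an APInt the bit test is `Distance[0]`). The helper names are hypothetical. In two's complement a value is odd exactly when bit 0 is set, negative values included, so the bit test agrees with the signed remainder.

```cpp
#include <cstdint>

// Old approach, modeled on built-in integers: signed remainder by 2.
constexpr bool isOddViaSrem(std::int64_t Distance) {
  return Distance % 2 != 0;
}

// New approach: inspect the least significant bit. Converting to an
// unsigned type is well defined (modular), and bit 0 is preserved.
constexpr bool isOddViaLSB(std::int64_t Distance) {
  return (static_cast<std::uint64_t>(Distance) & 1u) != 0;
}
```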
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
[OFFLOAD] Improve resource management of the plugin (#187597)
This PR improves event management in the plugin by fixing potential
resource leaks and preventing a potential deadlock.
[mlir][arith] Add `arith.convertf` op (#188041)
There are multiple FP types with the same bitwidth. Neither `extf` nor
`truncf` can be used in that case. Add a new `arith.convertf` op that
can be used in such cases. The op is modeled after `arith.truncf`. Also
add a lowering to LLVM.
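To illustrate the gap that `arith.convertf` fills, here is a hypothetical helper that picks an op for a float-to-float conversion from bitwidth alone. `FloatTy` and `pickConversionOp` are illustrative only and are not MLIR APIs; they simply encode that `extf` must widen and `truncf` must narrow, leaving same-width pairs such as f16 and bf16 uncovered.

```cpp
#include <string_view>

// Hypothetical float-type descriptor; MLIR types carry far more detail.
struct FloatTy {
  std::string_view Name;
  unsigned Width;
};

// Sketch: choose an arith op for converting From -> To.
constexpr std::string_view pickConversionOp(FloatTy From, FloatTy To) {
  if (From.Name == To.Name)
    return "none"; // same type, nothing to convert
  if (To.Width > From.Width)
    return "arith.extf"; // strictly wider
  if (To.Width < From.Width)
    return "arith.truncf"; // strictly narrower
  return "arith.convertf"; // same width, different format
}
```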
Discussion:
https://discourse.llvm.org/t/arith-fptofp-vs-arith-extf-arith-truncf/90276
Assisted-by: claude-4.6-opus-high
libclc: Force assuming fast float fma for AMDGPU (#188245)
Currently the build uses the default dummy target, which assumes
FMA is slow. Force it to assume fast FMA, which is the case on
any remotely recent hardware. In the future, if we want better support
for older targets, there should be a separate build of the math
functions for the slow-FMA case.
[X86] Fix widening for strict_fmin/fmax (#188286)
I believe that widening these with undef is not correct, because the
undef values might be picked as sNaN and then trap.
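A sketch of widening a 3-lane min to a 4-lane one, modeled with `std::array`. Filling the extra lane with `0.0f` is an assumption made here for illustration and is not necessarily what the patch does; the point is only that the padding lane holds a known, non-signaling value instead of undef. The helper names are hypothetical, and the NaN semantics of fminnum/fmaxnum are not modeled.

```cpp
#include <array>
#include <cstddef>

// Lane-wise minimum on a "legal" 4-wide vector (NaN handling omitted).
constexpr std::array<float, 4> vecMin4(const std::array<float, 4> &A,
                                       const std::array<float, 4> &B) {
  std::array<float, 4> R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] < B[I] ? A[I] : B[I];
  return R;
}

// Widen a 3-wide min to 4 wide. The padding lane uses a defined value
// rather than undef, so it can never be an sNaN that traps.
constexpr std::array<float, 3> vecMin3(const std::array<float, 3> &A,
                                       const std::array<float, 3> &B) {
  constexpr float Pad = 0.0f; // assumed safe padding value
  std::array<float, 4> WA{A[0], A[1], A[2], Pad};
  std::array<float, 4> WB{B[0], B[1], B[2], Pad};
  std::array<float, 4> WR = vecMin4(WA, WB);
  return {WR[0], WR[1], WR[2]}; // drop the padding lane
}
```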
[DA] Fix the Weak Crossing SIV test when Coeff and Delta are zero (#188203)
The Weak Crossing SIV test concluded that there is a dependency only in the
`=`-direction when `Delta` is zero. This is incorrect, because the
coefficients of the addrecs might be zero, in which case the dependency
should have all directions. This patch adds a non-zero check for the
coefficient to address the issue.
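A sketch of the corrected reasoning for the `Delta == 0` case. It assumes the loop is normalized so that the iterations `i` and `i'` start at 0, and it uses a hypothetical direction bitmask rather than DA's `Dependence::DVEntry`. With `Coeff != 0`, `Coeff * (i + i') == 0` forces `i == i' == 0`, leaving only `=`; with `Coeff == 0`, every iteration accesses the same element.

```cpp
#include <cstdint>

// Hypothetical direction bitmask for one loop level: <, = and >.
enum Direction : unsigned { LT = 1, EQ = 2, GT = 4, ALL = LT | EQ | GT };

// Assumes src = c1 + Coeff*i, dst = c2 - Coeff*i', Delta = c2 - c1 == 0,
// and a loop normalized so that i, i' >= 0.
constexpr unsigned directionsWhenDeltaIsZero(std::int64_t Coeff) {
  // Coeff == 0: both subscripts are the same constant, so every pair of
  // iterations accesses the same element and all directions are possible.
  if (Coeff == 0)
    return ALL;
  // Coeff != 0: Coeff * (i + i') == 0 with i, i' >= 0 forces
  // i == i' == 0, so only the '=' direction remains.
  return EQ;
}
```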
[lldb] Print correct thread plan in logging code of Thread::ShouldReportRun (#188198)
This code accesses the completed thread plan (even if it is a private one).
However, the logging code does not pass `skip_private=false` and instead
accesses only the public completed thread plan. If there is no public
completed thread plan, the logging code could also crash.
This is just some minor refactoring that ensures we use the same thread
plan in the logging code.
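A simplified sketch of the distinction. `ThreadPlan`, `getCompletedPlan` and `planNameForLog` are hypothetical stand-ins, not lldb's classes. When private plans are skipped, the lookup can return null even though a private plan completed, so the logging path should query with the same flag as the decision and check for null.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified stand-in for an lldb thread plan.
struct ThreadPlan {
  const char *Name;
  bool IsPrivate;
};

// Returns the most recently completed plan from Plans[0..Count), where
// the last element is the most recent. With SkipPrivate == true, private
// plans are ignored, so the result may be null even though a private
// plan completed.
constexpr const ThreadPlan *getCompletedPlan(const ThreadPlan *Plans,
                                             std::size_t Count,
                                             bool SkipPrivate) {
  for (std::size_t I = Count; I > 0; --I)
    if (!SkipPrivate || !Plans[I - 1].IsPrivate)
      return &Plans[I - 1];
  return nullptr;
}

// Logging helper: describe the plan the decision was based on and never
// dereference a null plan.
constexpr const char *planNameForLog(const ThreadPlan *Plan) {
  return Plan ? Plan->Name : "<no completed plan>";
}
```

A caller would fetch the plan once with `SkipPrivate = false` and pass that same pointer to both the decision logic and `planNameForLog`.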