[NFC][AArch64] ConditionOptimizer: add CmpCondPair and tryOptimizePair (#187160)
Add CmpCondPair to bundle a compare/conditional instruction pair with
its condition code.
Update applyCmpAdjustment() to take CmpCondPair, and extract core
optimization logic into tryOptimizePair() to be used in both intra- and
cross-block paths.
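A minimal sketch of what bundling a compare/conditional pair with its condition code could look like. The commit message does not show the definition, so every type and field below is a hypothetical stand-in (the real pass works on MachineInstr and AArch64 condition codes), and `adjustedForm` only illustrates the general idea of an immediate/condition adjustment, not the actual logic of applyCmpAdjustment() or tryOptimizePair():

```cpp
#include <cassert>

// Hypothetical stand-ins: the real pass works on MachineInstr and
// AArch64CC::CondCode, not on these toy types.
enum class CondCode { GT, GE, LT, LE };

struct Instr {
  long Imm = 0; // immediate operand of a compare; unused for the consumer
};

// Bundles a compare, the conditional instruction that reads its flags,
// and the condition code that instruction tests.
struct CmpCondPair {
  Instr *Cmp;
  Instr *Cond;
  CondCode CC;
};

struct Adjustment {
  long Imm;
  CondCode CC;
};

// Returns an equivalent (immediate, condition) form for signed compares:
//   x >  C  <=>  x >= C + 1      x >= C  <=>  x >  C - 1
//   x <  C  <=>  x <= C - 1      x <= C  <=>  x <  C + 1
// Overflow at the ends of the immediate range is ignored in this sketch.
Adjustment adjustedForm(const CmpCondPair &P) {
  const long Imm = P.Cmp->Imm;
  switch (P.CC) {
  case CondCode::GT: return {Imm + 1, CondCode::GE};
  case CondCode::GE: return {Imm - 1, CondCode::GT};
  case CondCode::LT: return {Imm - 1, CondCode::LE};
  case CondCode::LE: return {Imm + 1, CondCode::LT};
  }
  assert(false && "unknown condition code");
  return {Imm, P.CC};
}
```

Passing one bundled value instead of three loose arguments is what would let the same helper serve both the intra-block and cross-block paths.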
[mlir][OpenMP] Move taskloop clauses to the context op
The clauses are implemented when lowering the context op (which
generates the runtime calls and handles the outlining of the task
function, including privatization etc.). Therefore I thought it made more
sense to put the clauses on this operation rather than on the wrapped
loop.
RFC: https://discourse.llvm.org/t/rfc-openmp-alloca-placement-for-openmp-loop-wrappers/89512/7
Patch 2/3
Address review comments: mark unused param and move var decl
- Mark the unused 'clauses' parameter in TaskloopOp::build with
[[maybe_unused]]
- Move the declaration of 'wrapperClauseOps' in genStandaloneTaskloop
to immediately before its first use
Assisted-by: Copilot, Claude Sonnet 4.6
[lldb][Process/FreeBSDKernelCore] Rework plugin destruction (#188426)
Destroy the plugin class the same way `ProcessElfCore` does; it is another
process plugin derived from the `PostMortemProcess` class. After clearing the
thread list, invoke `Finalize()` to clean up resources properly. `Finalize()`
calls `DoDestroy()`, which releases `m_kvm` via `kvm_close()`.
---------
Signed-off-by: Minsoo Choo <minsoochoo0122 at proton.me>
[mlir][OpenMP] Separate OutlinableInterface from taskloop LoopWrapper
Separate taskloop context and loop lowering into different operations.
This allows us to have separate operations representing the outlinable
interface and the loop wrapper interface so that there is somewhere
better than the loop body to put task-local allocations:
```
omp.taskloop.context {
  llvm.alloca ...
  omp.taskloop {
    omp.loop_nest ... {
      ...
    }
  }
  omp.terminator
}
```
[11 lines not shown]
[mlir][OpenMP] Rename taskLoopOp/taskloopOp variables to taskLoopWrapperOp/taskloopWrapperOp
Rename local variables for clarity to better reflect the type they hold.
Assisted-by: Copilot, Claude Sonnet 4.6
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (#179226)
For V_DOT2_F32_F16 and V_DOT2_F32_BF16, add their VOPDName and mark
them with usesCustomInserter, which is used to add pre-RA register
allocation hints that prefer assigning dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking that it is VOP2-like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
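The VOP2-like conditions listed above can be sketched as a single predicate. This is an illustration only: `Dot2Operands` and `isVOP2Like` are hypothetical names over a simplified operand view, not the actual canMapVOP3PToVOPD code, which inspects real machine operands.

```cpp
#include <cassert>

// Hypothetical, simplified view of the operands of a VOP3 dot2 instruction.
struct Dot2Operands {
  unsigned DstReg;      // physical register assigned to dst
  unsigned Src2Reg;     // physical register assigned to src2 (the accumulator)
  bool Src1IsReg;       // true if src1 is a register rather than a literal
  bool HasSrcModifiers; // true if any source modifier is set
  bool HasClamp;        // true if the clamp bit is set
};

// Mirrors the four conditions from the commit message:
// dst == src2, no source modifiers, no clamp, and src1 is a register.
bool isVOP2Like(const Dot2Operands &Op) {
  return Op.DstReg == Op.Src2Reg && !Op.HasSrcModifiers && !Op.HasClamp &&
         Op.Src1IsReg;
}
```

The dst == src2 condition can only hold after register allocation, which is why the commit adds an allocation hint rather than a hard constraint.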
[ARM] Consider register pressure when vectorizing with MVE (#188053)
MVE has only 8 vector registers, so the vectorizer can easily end up
using more than that, causing enough spilling that the result is
worse than not vectorizing. Enable
shouldConsiderVectorizationRegPressure for targets with MVE so the
vectorizer doesn't vectorize in those cases.
[mlir][OpenMP] Rename TaskloopOp/omp.taskloop to TaskloopWrapperOp/omp.taskloop.wrapper
Rename the loop wrapper operation to better distinguish it from the
context op (omp.taskloop.context), which handles outlining and runtime calls.
The new name makes the role of each operation clearer at a glance.
RFC: https://discourse.llvm.org/t/rfc-openmp-alloca-placement-for-openmp-loop-wrappers/89512/7
Patch 3/3
Assisted-by: Copilot, Claude Sonnet 4.6
[LVI] Use block numbers (#188270)
Store the cache as a vector indexed by block numbers instead of a map,
which results in a small compile-time improvement.
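A rough sketch of the general technique, assuming each block carries a dense, stable number. `Block` and `BlockCache` are hypothetical names for illustration and are not the LVI data structures:

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical block type: all the cache needs is a dense number.
struct Block {
  unsigned Number;
};

// Cache stored as a vector indexed by block number. A lookup is a bounds
// check plus an array access, with no hashing and no tree traversal.
template <typename T> class BlockCache {
  std::vector<std::optional<T>> Entries;

public:
  explicit BlockCache(unsigned NumBlocks) : Entries(NumBlocks) {}

  void insert(const Block &BB, T Value) {
    assert(BB.Number < Entries.size() && "block number out of range");
    Entries[BB.Number] = std::move(Value);
  }

  // Returns nullptr when nothing is cached for this block.
  const T *lookup(const Block &BB) const {
    if (BB.Number >= Entries.size() || !Entries[BB.Number])
      return nullptr;
    return &*Entries[BB.Number];
  }
};
```

The trade-off is memory proportional to the number of blocks even when few are cached, in exchange for cheaper lookups.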
[LIBM][AMDLIBM] - New vector calls for cdfnorm and round scalar calls (#187232)
In amdlibm, new vector calls:
cdfnorm
- amd_vrd2_cdfnorm
- amd_vrd4_cdfnorm
- amd_vrd8_cdfnorm
round
- amd_vrs16_roundf
- amd_vrs8_roundf
- amd_vrs4_roundf
- amd_vrd8_round
- amd_vrd4_round
- amd_vrd2_round
Link to aocl repo:
[aocl-libm-ose](https://github.com/amd/aocl-libm-ose)
[VPlan] Remove isVector guard in getCostForRecipeWithOpcode. (#188126)
The legacy cost model computes and passes RHSInfo both when widening and
replicating. Match behavior in VPlan-based cost model.
The added test shows that we now compute the same cost as the legacy
cost model.
Without this change, the test added in
llvm/test/Transforms/LoopVectorize/AArch64/predicated-costs.ll would
crash with https://github.com/llvm/llvm-project/pull/187056.
PR: https://github.com/llvm/llvm-project/pull/188126
[analyzer] Untangle subcheckers of CStringChecker (#186802)
It turns out that some checks for cstring functions happened as a side
effect of other checks. For example, checking whether the arguments to
memcpy were uninitialized happened during buffer overflow checking.
The way this was implemented is that if alpha.unix.cstring.OutOfBounds
was disabled, alpha.unix.cstring.UninitializedRead couldn't emit any
warnings. It turns out that major modeling steps are early-exited if a
certain checker is disabled!
This patch moved the early returns to the report emission parts --
modeling still happens, only the bug report construction is omitted.
This would mean that if we find a fatal error (like buffer overflow) we
_should_ stop analysis even if we don't emit a warning (that's part of
doing modeling), but I decided against implementing that.
One hurdle is that CStringChecker is a dependency of MallocChecker, and
the current tests rely on the CStringChecker _not_ terminating execution
[9 lines not shown]
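The separation of modeling from report emission described in this commit can be sketched roughly as follows. All names here are hypothetical and the logic is heavily simplified; this is not the CStringChecker code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical switches standing in for the two subcheckers.
struct Config {
  bool OutOfBoundsEnabled;
  bool UninitializedReadEnabled;
};

struct Result {
  bool Modeled = false;             // whether the call was modeled at all
  std::vector<std::string> Reports; // warnings that were actually emitted
};

// Modeling always runs. Each subchecker flag only gates its own report,
// so disabling one subchecker no longer silences the other.
Result checkMemcpy(const Config &C, bool Overflows, bool ReadsUninit) {
  Result R;
  R.Modeled = true;
  if (Overflows && C.OutOfBoundsEnabled)
    R.Reports.push_back("out-of-bounds");
  if (ReadsUninit && C.UninitializedReadEnabled)
    R.Reports.push_back("uninitialized-read");
  return R;
}
```

With OutOfBounds disabled and UninitializedRead enabled, an uninitialized read is still reported, which is the behavior the patch aims for.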
[libc] Fix unused variable warning in utimes.cpp (#188347) (#188448)
Moved the declaration of 'ret' inside the SYS_utimes block to prevent an
unused variable warning on the libc-riscv32-qemu-yocto-fullbuild-dbg
builder, which doesn't define SYS_utimes.
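A small sketch of the pattern, using a hypothetical macro `HAVE_LEGACY_CALL` in place of SYS_utimes and toy functions in place of the real syscall wrappers. Declaring the variable inside the conditional block means it does not exist on targets where the macro is undefined, so there is nothing to warn about:

```cpp
#include <cassert>

// Hypothetical stand-ins for two syscall wrappers.
inline int legacyCall() { return 0; }
inline int fallbackCall() { return 0; }

int doCall() {
#ifdef HAVE_LEGACY_CALL
  // 'ret' is declared only in the branch that uses it. Declaring it above
  // the #ifdef would leave it unused whenever the macro is not defined.
  int ret = legacyCall();
  return ret < 0 ? -1 : 0;
#else
  return fallbackCall() < 0 ? -1 : 0;
#endif
}
```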