[LoopPeel] Fix BFI when peeling last iteration without guard
LoopPeel sometimes proves that, when reached, the original loop always
executes at least two iterations. LoopPeel then unconditionally
executes both the remaining loop's initial iteration and the peeled
final iteration. But that increases the latter's frequency above its
frequency in the original loop. To maintain the total frequency, this
patch compensates by decreasing the remaining loop's latch
probability.
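The arithmetic behind that compensation can be sketched as follows. This is a simplified model, not the patch's actual code: it assumes a geometric loop in which a latch probability p gives an expected 1 / (1 - p) body executions per entry, and the helper names expectedIterations and adjustedLatchProb are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected body executions per loop entry when the latch branches back with
// probability latchProb (simplified geometric model): 1 / (1 - p).
double expectedIterations(double latchProb) {
  assert(latchProb >= 0.0 && latchProb < 1.0);
  return 1.0 / (1.0 - latchProb);
}

// Hypothetical helper: choose a latch probability for the remaining loop so
// that its expected iterations plus the one unconditionally executed peeled
// iteration equal the original expected iteration count.
double adjustedLatchProb(double origLatchProb) {
  double origIters = expectedIterations(origLatchProb);
  assert(origIters >= 2.0 && "model assumes at least two iterations");
  double remainingIters = origIters - 1.0; // one iteration moved to the peel
  return 1.0 - 1.0 / remainingIters;
}
```

Under this model, an original latch probability of 0.9 (ten expected iterations) leaves nine iterations for the remaining loop, so its latch probability drops to 8/9 and the total frequency is preserved.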
This is another step in issue #135812 and was discussed at
<https://github.com/llvm/llvm-project/pull/166858#discussion_r2528968542>.
[AArch64] Move AArch64SMEAttributes out of Utils library to fix layering. NFC (#168236)
The AArch64 MCTargetDesc library links the Utils library. The
AArch64SMEAttributes.cpp/h files require the Core library and include files
from AArch64's CodeGen library. These are layering violations.
The MCTargetDesc doesn't need anything from AArch64SMEAttributes.cpp/h
so the easiest fix is to move them to the CodeGen library.
We should probably merge the remaining files in Utils into MCTargetDesc.
[ADT] Group public functions in DenseMap.h (NFC) (#168239)
This patch groups public functions, including the constructors, the
destructor, and the copy/move assignment operators.
[LV] Use VPlan pattern matching in adjustRecipesForReductions (NFC)
Replace the assert checking if CurrentLinkI is a CmpInst with a pattern
matching check in the if condition. This uses VPlan-level pattern matching
instead of inspecting the underlying instruction type.
[SelectionDAG] Fix AArch64 machine verifier bug when expanding LOOP_DEPENDENCE_MASK (#168221)
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.
Fixes: #168227
[mlir][MemRef] Add UB as a dependent dialect and use `ub.poison` for Mem2Reg (#168066)
This patch adds `ub` as a dependent dialect to `memref`, and uses
`ub.poison` as the default value in `AllocaOp::getDefaultValue` for the
mem2reg pass.
This aligns the behavior of `mem2reg` with LLVM, where loading from a
slot before any value has been stored should yield poison.
---------
Signed-off-by: Fabian Mora <fabian.mora-cordero at amd.com>
[Clang] Add __builtin_bswapg (#162433)
Add a new builtin function __builtin_bswapg. It works on any integral
type whose width is a multiple of 16 bits, as well as on a single byte.
Closes #160266
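For illustration, the semantics described above can be modeled with a portable reference routine. This sketch does not call the builtin; byteSwapRef is a hypothetical name, and it only covers the standard unsigned fixed-width types.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical reference implementation: reverses the byte order of an
// unsigned integer. A single-byte value is returned unchanged.
template <typename T>
T byteSwapRef(T value) {
  static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                "byteSwapRef expects an unsigned integral type");
  T result = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    // Move the lowest remaining byte of value into the low end of result.
    result = static_cast<T>((result << 8) | (value & 0xFF));
    value = static_cast<T>(value >> 8);
  }
  return result;
}
```

With this model, a 16-bit 0x1234 becomes 0x3412, and an 8-bit input comes back as-is.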
Cleanups in AArch64 (#168025)
Forward declare a couple of classes for simplicity, remove some unused
headers, clean up a comment.
Tested with check-all.
[CI] Fix typo in CI Best Practices for the release branch names push filter (#168226)
The CIBestPractices.rst document uses `releases/*` as the branch name
filter for push events. The actual release branch names match the
pattern `release/*`.
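A quick way to see why the typo matters is to check both patterns against a release branch name. This sketch uses POSIX fnmatch as an approximation; GitHub Actions has its own glob matcher, and branchMatches is a hypothetical helper.

```cpp
#include <fnmatch.h>

// Hypothetical helper: true when branch matches the glob pattern.
// FNM_PATHNAME keeps '*' from matching across a '/'.
bool branchMatches(const char *pattern, const char *branch) {
  return fnmatch(pattern, branch, FNM_PATHNAME) == 0;
}
```

Under this approximation, `release/*` matches a branch such as release/21.x, while `releases/*` does not, so a push filter written with the extra "s" never fires for release branches.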
[CodeGen] add a command to force global merge
In some performance scenarios, such as under O2, this PR can help code that loads a series of global variables.
[ADT] Make DenseMapBase::moveFrom safer (NFC) (#168180)
Without this patch, DenseMapBase::moveFrom() moves buckets and leaves
the moved-from object in a zombie state. This patch teaches
moveFrom() to call kill() so that the moved-from object is in a known
good state. This brings moveFrom()'s behavior in line with standard
C++ move semantics.
kill() is implemented so that it takes the fast path in the destructor
-- both destroyAll() and deallocateBuckets().
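The idea can be sketched with a toy container. ToyBuckets is a hypothetical stand-in, not DenseMapBase, and its kill() is only an assumption about the general shape: after the buckets are transferred, the source is reset so its destructor has nothing to destroy or free.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical toy type illustrating a move that leaves the source in a
// known-good, empty state.
class ToyBuckets {
public:
  explicit ToyBuckets(std::size_t n)
      : Buckets(n ? new int[n]() : nullptr), NumBuckets(n) {}
  ~ToyBuckets() { delete[] Buckets; } // no-op when Buckets is null
  ToyBuckets(const ToyBuckets &) = delete;
  ToyBuckets &operator=(const ToyBuckets &) = delete;
  ToyBuckets(ToyBuckets &&Other) noexcept { moveFrom(Other); }

  std::size_t size() const { return NumBuckets; }
  bool empty() const { return Buckets == nullptr; }

private:
  // Take ownership of Other's storage, then reset Other.
  void moveFrom(ToyBuckets &Other) {
    Buckets = Other.Buckets;
    NumBuckets = Other.NumBuckets;
    Other.kill();
  }
  // Leave the object empty so destruction takes the trivial path.
  void kill() {
    Buckets = nullptr;
    NumBuckets = 0;
  }

  int *Buckets = nullptr;
  std::size_t NumBuckets = 0;
};
```

After `ToyBuckets b(std::move(a));`, the source `a` reports a size of zero and can be destroyed safely, rather than holding stale pointers.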
[MLIR][Transform][Python] Expose applying named_sequences as a method (#168223)
Makes it so that a NamedSequenceOp can be directly applied to a Module,
via a method `apply(...)`.