[RISCV] Add regalloc hints for BSETI/BEXTI (#173964)
This patch hints the register allocator to use the same source and
destination registers for the `BEXTI/BSETI` instructions when the
`Xqcibm` vendor extension is enabled. This enables the generation of the
compressed `QC_C_BEXTI/QC_C_BSETI` instructions when possible.
[mlir][Interfaces] Add `RegionBranchOpInterface::getSuccessorOperands` helper (#173971)
Add a helper for querying the successor operands for a region branch
`src -> dst`. Both `src` and `dst` may be the region branch op itself or
a terminator.
This helper allows users to query successor operands for the region
branch op and the terminators in a uniform way. This is similar to
`getSuccessorRegions(RegionBranchPoint)`, which works both for region
branch ops and terminators.
InstCombine: Introduce nsz flag on minimum/maximum in SimplifyDemandedFPClass
Alive isn't particularly happy with this in the case where
one of the inputs could be zero, but I think
it's wrong: https://alive2.llvm.org/ce/z/dF7V6k
nsz shouldn't permit introducing a -0 result where
there wasn't one in the input here.
[libc++][NFC] Simplify `gcd` a bit (#173570)
1. With `if constexpr` we can avoid partial specializations of
`__ct_gcd`. This patch changes it to a function template and renames it
to `__abs_in_type` to slightly improve readability.
2. `__gcd` was made non-recursive by
27a062e9ca7c92e89ed4084c3c3affb9fa39aabb, so this patch simply inlines
it into `gcd`.
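
For illustration only, here is a minimal sketch of the two ideas above, not the libc++ source. The names `abs_in_type` and `gcd_sketch` are hypothetical stand-ins; a single function template with `if constexpr` replaces what would otherwise be a primary template plus a partial specialization, and the Euclidean loop is written inline.

```cpp
#include <type_traits>

// Hypothetical stand-in for the renamed helper (not the libc++ code).
// Returns |value| converted to the unsigned type Result.
template <class Result, class Source>
constexpr Result abs_in_type(Source value) {
  static_assert(std::is_integral_v<Source> && std::is_unsigned_v<Result>,
                "sketch assumes an integral source and an unsigned result");
  if constexpr (std::is_signed_v<Source>) {
    // Negate in the unsigned domain so the most negative value cannot overflow.
    return value < 0 ? static_cast<Result>(0) - static_cast<Result>(value)
                     : static_cast<Result>(value);
  } else {
    // Unsigned sources need no sign handling; this branch is the only one
    // instantiated for them.
    return static_cast<Result>(value);
  }
}

// Hypothetical gcd with the iterative Euclidean loop inlined.
template <class T, class U>
constexpr std::common_type_t<T, U> gcd_sketch(T a, U b) {
  using R = std::common_type_t<T, U>;
  using UR = std::make_unsigned_t<R>;
  UR x = abs_in_type<UR>(a);
  UR y = abs_in_type<UR>(b);
  while (y != 0) {
    UR t = x % y;
    x = y;
    y = t;
  }
  return static_cast<R>(x);
}
```

With this sketch, `gcd_sketch(-12, 18)` evaluates to 6 and can be used in a constant expression.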
[MLIR] Fix mlir-opt crash in ReshapeOpsUtils.cpp when collapse_shape index is invalid (#173791)
This patch fixes a crash occurring in mlir-opt when running
collapse_shape with an invalid index configuration. Instead of crashing,
an error message is returned to the user.
Fixes: #173567
---------
Co-authored-by: Bazinga! <akparmar004>
[CloneFunction] Fix non-deterministic PHI cleanup using PHINode::removeIncomingValueIf()
Previously, we used `std::map<BasicBlock *, unsigned> PredCount` to track excess incoming blocks and removed them one by one using `removeIncomingValue`.
Since `PredCount` uses `BasicBlock *` as its key, the iteration order depends on the memory addresses of the blocks.
With `PHINode::removeIncomingValue()` changed to use the swapping strategy, the order in which operands are removed affects the final order of the remaining operands in the PHI node. This causes non-determinism across compiles.
This patch uses `PHINode::removeIncomingValueIf()` to remove invalid incoming blocks that no longer
go to the `NewBB` block, which fixes the non-determinism.
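
The order dependence can be reproduced outside LLVM. The following is a minimal sketch on a plain `std::vector<int>`, assuming "swapping strategy" means overwriting the removed slot with the last element; all function names are hypothetical, and `std::remove_if` only stands in for a single predicate-driven pass, not for the actual `removeIncomingValueIf()` implementation.

```cpp
#include <algorithm>
#include <vector>

// Swap-with-last removal: constant time per removal, but it reorders survivors.
void removeBySwap(std::vector<int> &ops, int victim) {
  auto it = std::find(ops.begin(), ops.end(), victim);
  if (it == ops.end())
    return;
  *it = ops.back(); // the last element moves into the freed slot
  ops.pop_back();
}

// Removes victims one at a time; the result depends on the order of `victims`,
// which plays the role of the pointer-keyed map's iteration order.
std::vector<int> removeOneByOne(std::vector<int> ops,
                                const std::vector<int> &victims) {
  for (int v : victims)
    removeBySwap(ops, v);
  return ops;
}

// Single pass driven by a predicate: the operands are visited in their own
// order, so no external iteration order can influence the result.
template <class Pred>
std::vector<int> removeInOnePass(std::vector<int> ops, Pred shouldRemove) {
  ops.erase(std::remove_if(ops.begin(), ops.end(), shouldRemove), ops.end());
  return ops;
}
```

Starting from `{1, 2, 3, 4, 5}`, removing 1 then 2 by swapping leaves `{5, 4, 3}`, while removing 2 then 1 leaves `{4, 5, 3}`. The single-pass variant gives the same result regardless of how the set of victims was discovered.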
[AArch64][SME] Vastly simplify and fix `sme-framelower-use-bp.ll` (NFC) (#172999)
This test was added in:
https://github.com/llvm/llvm-project/commit/d4c86e7f3ea298b259e673142470a7b838f5f302
However, over time this test has stopped testing that change. That
change ensures that LLVM sets up the base-pointer in functions with only
+sme (no sve) and dynamic allocas + SVE stack objects.
The original test did not intend to have dynamic allocas or SVE stack
objects though. They were introduced by the IR-based SME ABI pass
unintentionally pushing allocas outside the entry block and SVE spills.
Both of these have been resolved, so this test was no longer testing the
original change. This patch simplifies the test and corrects it so that
it tests the intended functionality.
[MCA][AArch64] Model single-register EXTR as ROR on Neoverse N2 (#172831)
As per the SWOG for [Neoverse
N2](https://developer.arm.com/documentation/109914/latest/), the latency
of a one register bitfield extract should be 1 and the throughput should
be 4. This patch models the single register EXTR (alias ROR) for the
Neoverse N2 model.
[MCA] Fix -mcpu=help flag (#173399)
Previously, using the `-mcpu=help` flag would require an empty stdin to
be passed to print the CPU/Features list.
- Moves the `MemoryBuffer::getFileOrSTDIN` call below an early return.
- Adds a test, mcpu-help.test, which tests the flag with a missing
file. Previously, this would have resulted in an error with no help
list printed; now the help list is printed and the missing input file
is ignored.
[LLVM][CMake][NFC] Use generator expression to separate CXXFLAGS (#173869)
This avoids looking at the individual sources for mixed C/C++ libraries.
The previous code was written ~2014. Generator expressions were added in
CMake 3.3 (2015). We currently require CMake 3.20 and therefore can rely
on more modern features.
Apart from simplifying the code, this is preliminary work to make more
use of pre-compiled headers (#173868).
[mlir][ods] Fix ODS bug for usePropertiesForAttributes = 0 (#173006)
This fixes invalid C++ generated in the `verifyInvariantsImpl` method
for operations generated from ODS when `usePropertiesForAttributes = 0`
is set on the Dialect.
Fixes the bug introduced in
- https://github.com/llvm/llvm-project/pull/153603
Closes #171217
Reland "[mlir][tensor] Add ValueBoundsOpInterface for ExpandShapeOp and CollapseShapeOp #173356" (#173857)
The original PR #173356 was reverted (commit 5d6c40b) due to an
AddressSanitizer failure
(https://lab.llvm.org/buildbot/#/builders/52/builds/13831).
The failure was caused by incorrect use of a const reference
(https://github.com/llvm/llvm-project/pull/173356#discussion_r2643027667),
which bound a reference to a temporary value returned by
`getReassociationIndices()`.
This reland drops the const reference and uses a copy instead.
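
As an illustration of this class of bug, the sketch below shows one way a reference can end up pointing into a destroyed temporary, and the copy-based alternative. It is not the code from the PR: `ReshapeLike` and `groupSize` are hypothetical, and the accessor is assumed to return its result by value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an op whose accessor returns by value.
struct ReshapeLike {
  std::vector<std::vector<long>> groups;

  // Returns a fresh container on every call (an assumption of this sketch).
  std::vector<std::vector<long>> getReassociationIndices() const {
    return groups;
  }
};

// Unsafe pattern, left commented out on purpose:
//   const std::vector<long> &group = op.getReassociationIndices()[i];
// operator[] returns a reference into the temporary container, which is
// destroyed at the end of the full expression, so `group` would dangle.

std::size_t groupSize(const ReshapeLike &op, std::size_t i) {
  // Copying the element keeps it alive independently of the temporary.
  std::vector<long> group = op.getReassociationIndices()[i];
  return group.size();
}
```

The copy costs an allocation, but it removes any dependence on the lifetime of the returned container.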
Signed-off-by: Yu-Zhewen <zhewenyu at amd.com>
[mlir][docs] Add more examples for the "canonical form" (#173667)
Mention that there is no formal definition of the canonical form. Also
add more examples for users to understand what kind of transformations
the community has agreed upon in the past.
---------
Co-authored-by: Mehdi Amini <joker.eph at gmail.com>
[mlir][Transforms][NFC] `remove-dead-values`: Simplify dropped value handling (#173540)
`RDVFinalCleanupList::values` is used only for function op handling. The
functionality for dropping function arg uses can be incorporated into
Step 5 (function op handling). There is no need for a separate step.
[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fmin/fmax_f64 (#173839)
Allows for type checking depending on the built-in signature.
There is no `f32` version of either builtin.
[clang-doc] Add friends to class template
This patch also allows comments to be associated with friend
declarations. Currently, it seems like the comments for friend `RecordDecl`
are taken from the actual class declaration, while a friend
function's comments are taken from the actual `friend` declaration.