[MLIR][Shard] Fix three bugs in ND mesh resharding in Partition pass (#189241)
A new MoveLastSplitAxisPattern class handles the case where the last
grid axis of one tensor dimension is moved to the front of another
tensor dimension's split axes, e.g. [[0, 1], [2]] -> [[0], [1, 2]].
The three bugs fixed are:
1. detectMoveLastSplitAxisInResharding: compared source.back() with
target.back() instead of target.front(), preventing the pattern from
being detected for resharding like [[0,1],[2]] -> [[0],[1,2]].
2. targetShardingInMoveLastAxis: axes were appended with push_back but
should be inserted at the front, producing wrong split_axes order.
3. handlePartialAxesDuringResharding: a copy_if wrote results into the
wrong output variable (addressed structurally by the clean
implementation).
[2 lines not shown]
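
The resharding case described above can be modelled outside MLIR as a minimal sketch. The function names here are hypothetical and are not the Partition pass API; split axes are plain Python lists. It shows why detection has to compare the last source axis with the front of the target axes, and why the moved axis has to be inserted at the front.

```python
def move_last_split_axis(split_axes, src_dim, tgt_dim):
    """Move the last grid axis of src_dim to the front of tgt_dim's axes."""
    result = [list(axes) for axes in split_axes]  # copy, leave the input intact
    axis = result[src_dim].pop()                  # last axis of the source dim
    result[tgt_dim].insert(0, axis)               # insert at the front, not push_back
    return result


def detect_move_last_split_axis(source, target):
    """Return (src_dim, tgt_dim, axis) if target is source with one move applied."""
    if len(source) != len(target):
        return None
    for src_dim, src_axes in enumerate(source):
        if not src_axes:
            continue
        for tgt_dim, tgt_axes in enumerate(target):
            if tgt_dim == src_dim or not tgt_axes:
                continue
            # Compare with the FRONT of the target axes; comparing with the
            # back would miss [[0, 1], [2]] -> [[0], [1, 2]].
            if src_axes[-1] != tgt_axes[0]:
                continue
            if move_last_split_axis(source, src_dim, tgt_dim) == target:
                return (src_dim, tgt_dim, src_axes[-1])
    return None
```

For the example in the commit message, `detect_move_last_split_axis([[0, 1], [2]], [[0], [1, 2]])` reports that grid axis 1 moves from tensor dimension 0 to tensor dimension 1.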
[flang][NFC] Converted five tests from old lowering to new lowering (part 42) (#191751)
Tests converted from test/Lower/Intrinsics: storage_size.f90, sum.f90,
system_clock.f90, trailz.f90, transfer.f90
[Hexagon] Fix inner CONCAT_VECTORS type in combineConcatOfScalarPreds (#191756)
The inner CONCAT_VECTORS result type was hardcoded to MVT::v8i1, which
is only correct when BitBytes == 1. Otherwise, the inner concat produces
fewer elements than 8, causing an assertion failure:
Assertion `(Ops[0].getValueType().getVectorElementCount() * Ops.size())
== VT.getVectorElementCount() && "Incorrect element count in vector
concatenation!"' failed.
Fix by computing the inner vector type dynamically based on BitBytes.
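
The invariant behind the assertion can be illustrated with a small model. This is only a sketch: the helper names and the element counts are hypothetical, and it does not reproduce how the Hexagon backend derives the count from BitBytes. It only shows that a concatenation's result type must be derived from its operands rather than hardcoded.

```python
def concat_element_count(operand_counts):
    """Element count produced by concatenating equally sized vectors."""
    if len(set(operand_counts)) != 1:
        raise ValueError("all operands must have the same element count")
    return operand_counts[0] * len(operand_counts)


def concat_type_is_valid(operand_counts, declared_count):
    """Mirror of the asserted condition: operands times count equals result."""
    return concat_element_count(operand_counts) == declared_count


# Hypothetical numbers: two operands of two elements each yield four
# elements, so a hardcoded eight-element result type trips the check.
hardcoded_ok = concat_type_is_valid([2, 2], 8)
computed_ok = concat_type_is_valid([2, 2], concat_element_count([2, 2]))
```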
Update clang/lib/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel/EntityPointerLevel.cpp
Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
stand: Force disable RETPOLINE for boot loaders
Boot loaders do not require speculative execution protection, and may be
too large if enabled.
Reported by: Shawn Webb
Reviewed by: dim, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D56068
(cherry picked from commit 61f78130c2f3a6abaa70bd66d6d6974060fb3d04)
fmax.3: Add caveat for going beyond C std requirements
libm's fmax and fmin family of functions treat +0.0 as greater than
-0.0. This is not required by the C standard, so the user may not see
this behaviour due to compiler optimization.
PR: 294214
Reviewed by: fuz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D56230
(cherry picked from commit 7764e9ca28a9702aed4ba7391e055ec2fcf35c41)
(cherry picked from commit 855507463e0d3903d31aa7c084efbf4f819b5d63)
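
Why the sign of the result can differ is easier to see in a small model. The sketch below is not libm's implementation; both functions are hypothetical and written only to contrast a comparison-based maximum, for which +0.0 and -0.0 compare equal, with one that orders the zeros by sign.

```python
import math


def comparison_fmax(x, y):
    """Maximum using only comparisons; +0.0 and -0.0 compare equal."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x >= y else y


def sign_aware_fmax(x, y):
    """Maximum that additionally treats +0.0 as greater than -0.0."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x == 0.0 and y == 0.0:
        # Both are zeros: prefer the one whose sign bit is clear.
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y
```

With the arguments `(-0.0, 0.0)` the comparison-based version returns its first argument, -0.0, while the sign-aware version returns +0.0. Both results are numerically equal, so only code that inspects the sign bit can tell them apart.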
kqueue_fork_copy_knote(): zero kn_knlist for the copy before calling knlist_add()
Reported by: pho, dhw
Sponsored by: The FreeBSD Foundation
(cherry picked from commit aab1ef4527f1b0935add3e8dba9e928e0623376f)
tunefs: Better fix for arm64 alignment issues
Rather than trust that the compiler will lay out the stack frame the
way we expect it to, use a union to force the correct alignment.
MFC after: 1 week
Fixes: 616f47f176c3 ("tunefs: Fix alignment warning on arm64")
Reviewed by: kevans, mckusick
Differential Revision: https://reviews.freebsd.org/D56245
(cherry picked from commit 8244dd326265867293b2286efc3d571f06ef0dab)
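
The effect of the union can be observed from Python's ctypes, which follows the platform C ABI. This is an illustration only: the type name, field names, and the 13-byte size are hypothetical, and this is not the tunefs code. A byte array alone has alignment 1; placing it in a union with a wider member raises the alignment of the whole object to that of the wider member.

```python
import ctypes


class AlignedBuffer(ctypes.Union):
    """Byte buffer that shares storage with a 64-bit integer."""
    _fields_ = [
        ("raw", ctypes.c_char * 13),       # the bytes actually used
        ("force_align", ctypes.c_uint64),  # present only to raise the alignment
    ]


raw_alignment = ctypes.alignment(ctypes.c_char * 13)
union_alignment = ctypes.alignment(AlignedBuffer)
```

The alignment here is guaranteed by the type itself, so it does not depend on where the compiler happens to place the object in the stack frame.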
tunefs: Fix pointer arithmetic
While here, remove a bogus const which has been there for years.
MFC after: 1 week
Reported by: ivy@
Fixes: 1b83e8a3f840 ("Constify string pointers.")
[5 lines not shown]
Fix default for .MAKE.SAVE_DOLLARS
NetBSD make defaults this to "yes";
bmake defaults it to "no" to retain the traditional behavior.
The default is dealt with in bmake's Makefile, but that does not
address bootstrap.
For now, just change the ifdef in main.
PR: 294436
[MLIR][Vector] Fix multi_reduction fold to handle empty reduction dims for any rank (#188983)
The fold for `vector.multi_reduction` only handled the rank-1 case with
no reduction dimensions. For higher-rank vectors (e.g.,
`vector<2x3xf32>`) with empty reduction dims `[]`, the fold returned
null, allowing `ElideUnitDimsInMultiDimReduction` to fire incorrectly.
That canonicalization pattern checks that all *reduced* dims have size
1, but with zero reduction dims the check trivially passes, and the
pattern then computes `acc op source` (e.g., `acc + source`) instead of
the correct no-op result (`source`).
This caused `--canonicalize` to produce a different value than
`--lower-vector-multi-reduction` for the same program:
vector.mask %m { vector.multi_reduction <add>, %src, %src [] :
vector<3x3xi32> to vector<3x3xi32> } : vector<3x3xi1> -> vector<3x3xi32>
* Without --lower-vector-multi-reduction: `src + src` (e.g., 2)
* With --lower-vector-multi-reduction: `src` (e.g., 1)
[8 lines not shown]
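
The interaction described above can be reproduced with a small model. This is a sketch, not the MLIR implementation: vectors are flattened Python lists, the reduction is fixed to addition, and all function names are hypothetical. It shows that the "all reduced dims are unit" check passes vacuously when there are no reduction dims, and that folding first avoids the wrong `acc + source` result.

```python
def reduced_dims_are_unit(shape, reduction_dims):
    """True if every reduced dim has size 1; vacuously true for no dims."""
    return all(shape[d] == 1 for d in reduction_dims)


def fold_multi_reduction(source, reduction_dims):
    """Fold for any rank: with no reduction dims the op is a no-op."""
    if len(reduction_dims) == 0:
        return source
    return None  # no fold applies


def elide_unit_dims(source, acc, shape, reduction_dims):
    """Model of the canonicalization: combine acc and source elementwise."""
    if not reduced_dims_are_unit(shape, reduction_dims):
        return None
    return [a + s for a, s in zip(acc, source)]


src = [1] * 9  # stands in for a vector<3x3xi32> filled with ones
wrong = elide_unit_dims(src, src, (3, 3), [])  # pattern fires on empty dims
right = fold_multi_reduction(src, [])          # fold returns the source
```

In this model `wrong` holds twos and `right` holds ones, matching the `src + src` versus `src` discrepancy listed above.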
[LoopUnroll] Fix freqs for unconditional latches: N<=2 (#179520)
As another step in issue #135812, this patch fixes block frequencies
when LoopUnroll converts a conditional latch in an unrolled loop
iteration to unconditional. It thus includes complete loop unrolling
(the conditional backedge becomes an unconditional loop exit), which
might be applied to the original loop or to its remainder loop.
As explained in detail in the header comments on the
fixProbContradiction function that this patch introduces, these
conversions mean LoopUnroll has proven that the original uniform latch
probability is incorrect for the original loop iterations associated
with the converted latches. However, LoopUnroll often is able to perform
these corrections for only some iterations, leaving other iterations
with the original latch probability, and thus corrupting the aggregate
effect on the total frequency of the original loop body.
This patch ensures that the total frequency of the original loop body,
summed across all its occurrences in the unrolled loop after the
[15 lines not shown]