[mlir][shard] Empowering resharding (#180962)
Enable many more resharding cases by processing one dimension at a time
and trying the various patterns on each single dimension.
tools: Remove unused PluginLoader includes
As far as I can tell there are 2 parallel plugin mechanisms.
opt -load=plugin does not work, and is ignored. opt -load-pass-plugin
does work. The only user of PluginLoader appears to be bugpoint.
[LV] Support argmin/argmax with strict predicates. (#170223)
Extend handleMultiUseReductions to support strict predicates (>, <),
which match the first index, whereas non-strict predicates match the
last index.
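As an illustration of the first-versus-last distinction, here is a
minimal scalar sketch. The function names are hypothetical and are not
part of LoopVectorize; the loops only model the source-level pattern
being vectorized.

```python
def argmin_strict(values):
    """Scalar argmin loop with a strict predicate (<).

    A later element only wins if it is strictly smaller, so ties keep
    the FIRST index of the minimum. Hypothetical helper for illustration.
    """
    best, idx = values[0], 0
    for i in range(1, len(values)):
        if values[i] < best:
            best, idx = values[i], i
    return idx


def argmin_non_strict(values):
    """Scalar argmin loop with a non-strict predicate (<=).

    A later element also wins on equality, so ties end up with the
    LAST index of the minimum. Hypothetical helper for illustration.
    """
    best, idx = values[0], 0
    for i in range(1, len(values)):
        if values[i] <= best:
            best, idx = values[i], i
    return idx


data = [3, 1, 4, 1, 5]
first = argmin_strict(data)      # index of the first 1
last = argmin_non_strict(data)   # index of the last 1
```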
Builds on top of https://github.com/llvm/llvm-project/pull/141431.
FindLast reductions with strict predicates are adjusted to compute the
correct result as follows:
1. Find the first canonical indices corresponding to partial min/max
values, using loop reductions.
2. Find which of the partial min/max values are equal to the overall
min/max value.
3. Select among the canonical indices those corresponding to the overall
min/max value.
4. Find the first canonical index of overall min/max and scale it back to
the original IV using VPDerivedIVRecipe.
5. If the overall min/max equals the starting min/max, the condition in
[2 lines not shown]
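Steps 1 to 4 can be sketched lane by lane as follows. This is only a
model of the described scheme, not the VPlan implementation: the
function name, the vf/iv_start/iv_step parameters and the sentinel
value are assumptions, the input length is assumed to be a multiple of
vf, and step 5 (the starting min/max case) is not modelled.

```python
def vectorized_argmin_first(values, vf=4, iv_start=0, iv_step=1):
    """Model of steps 1-4 for a min reduction with a strict predicate.

    Hypothetical sketch: assumes len(values) is a multiple of vf.
    """
    n = len(values)
    sentinel = n  # larger than any valid canonical index
    part_min = [float("inf")] * vf
    part_idx = [sentinel] * vf

    # Step 1: each lane tracks its partial min and the first canonical
    # index at which that partial min was seen.
    for base in range(0, n, vf):
        for lane in range(vf):
            i = base + lane
            if values[i] < part_min[lane]:
                part_min[lane] = values[i]
                part_idx[lane] = i

    # Step 2: find which partial mins equal the overall min.
    overall = min(part_min)
    mask = [m == overall for m in part_min]

    # Step 3: keep only the canonical indices of those lanes.
    candidates = [i if keep else sentinel for i, keep in zip(part_idx, mask)]

    # Step 4: take the first canonical index and scale it back to the
    # original IV (the role the message assigns to VPDerivedIVRecipe).
    canonical = min(candidates)
    return overall, iv_start + canonical * iv_step


result = vectorized_argmin_first([7, 2, 9, 2, 5, 2, 8, 2])
scaled = vectorized_argmin_first([7, 2, 9, 2, 5, 2, 8, 2], iv_start=10, iv_step=2)
```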
AMDGPU: Perform libcall recognition to replace fast OpenCL pow (#182135)
If a float-typed call site is marked with afn, replace the 4
flavors of pow with a faster variant.
This transforms pow, powr, pown, and rootn to __pow_fast,
__powr_fast, __pown_fast, and __rootn_fast if available. Also
attempts to handle all of the same basic folds on the new fast
variants that were already performed with the base forms. This
maintains optimizations with OpenCL when the device libs unsafe
math control library is deleted. This maintains the status quo
of how libcalls work, and only handles 4 new entry points. This
only helps with the elimination of the control library, and not
general libcall emission problems.
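The selection rule described above can be sketched roughly as follows.
This is not the AMDGPULibCalls code: the function, its parameters and
the `available` set are hypothetical, and real call sites use mangled
names rather than the plain names shown here.

```python
# Base entry point -> fast variant, as listed in the message.
FAST_VARIANTS = {
    "pow": "__pow_fast",
    "powr": "__powr_fast",
    "pown": "__pown_fast",
    "rootn": "__rootn_fast",
}


def pick_callee(base_name, is_float, has_afn, available):
    """Return the callee name to use for a call site (hypothetical helper).

    The fast variant is chosen only for float-typed calls marked afn,
    and only if that variant is actually available.
    """
    fast = FAST_VARIANTS.get(base_name)
    if fast is None or not is_float or not has_afn:
        return base_name
    return fast if fast in available else base_name


available = {"__pow_fast", "__powr_fast", "__pown_fast", "__rootn_fast"}
chosen = pick_callee("pow", is_float=True, has_afn=True, available=available)
```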
This makes no practical difference for HIP, which is the status
quo for libcall optimizations. AMDGPULibCalls recognizes the OpenCL
mangled names. e.g., OpenCL float "pow" is really _Z3powff but the
HIP provided function "powf" is really named _ZL4powfff, and std::pow
[5 lines not shown]
[flang][FIR] allow mem2reg over fir.declare (#181848)
This patch allows MLIR mem2reg to work over fir.declare.
Note that mem2reg is not part of the FIR pipeline; this is just part of
the work needed to be able to leverage it.
The patch:
- Adds a fir.declare_value operation
- Implements the PromotableOpInterface for fir.declare of simple scalars
and replaces it with fir.declare_value.
- Generates llvm.dbg.debug_value from it (when a FusedLoc with a
DILocalVariableAttr is created for it in AddDebugInfo, like for
fir.declare).
acpi: Factor out the power off code into acpi_poweroff()
While here, make it print upfront that we are trying to power off,
without distinguishing between power off preparation via
acpi_EnterSleepStatePrep() and the actual power off via
AcpiEnterSleepState(), a distinction the user does not care about.
While here, capitalize the messages.
Reviewed by: obiwac
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55226
acpi: Factor out message printing on failure of AcpiEnterSleepStatePrep()
To this end, create a small wrapper, acpi_EnterSleepStatePrep(), which
itself prints the failure message.
While here, when trying to power down (acpi_shutdown_final()) and
AcpiEnterSleepStatePrep() fails, print an additional message stating
more explicitly that the power down request has failed.
Reviewed by: obiwac
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D55225
[lldb][AArch64][Linux] Add support for the Permission Overlay Extension (POE) (#177145)
This change adds initial support for managing the Permission Overlay
Extension (POE). This extension allows userspace programs to change
memory permissions without making a syscall.
This is used to implement Linux's memory protection keys
(https://docs.kernel.org/core-api/protection-keys.html) on AArch64.
Overview of POE:
* Page table entries have a set of permissions. To change these, a
program would have to use a syscall which adds overhead.
* 3 bits of the page table entry are used for a protection key 0-7.
* POE adds a new register "por" (POR_EL0 in the manual) which stores
4-bit sets of permissions.
* The protection key is an index into this por register.
* Permissions in POR are applied on top of the page table permissions,
but may only remove permissions. For example, if you overlay
read/execute over read/write, the result is read. Since execute was not
[14 lines not shown]
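The overlay rule from the overview can be sketched as a bitwise AND
between the page table permissions and the 4-bit field selected by the
protection key. The bit assignments for read, execute and write and the
helper names below are assumptions made for illustration; they are not
taken from this change.

```python
# Assumed bit assignments within one 4-bit permission set (illustrative).
PERM_R, PERM_X, PERM_W = 0b001, 0b010, 0b100


def overlay_for_key(por, key):
    """Extract the 4-bit permission set that a protection key (0-7) selects.

    Hypothetical helper: treats the key as an index of 4-bit fields.
    """
    if not 0 <= key <= 7:
        raise ValueError("protection key must be in the range 0-7")
    return (por >> (4 * key)) & 0xF


def effective_permissions(page_perms, por, key):
    """Apply the overlay on top of the page permissions.

    The overlay can only remove permissions, which a bitwise AND models.
    """
    return page_perms & overlay_for_key(por, key)


# Key 2 overlays read/execute; every other key is left as no access.
por = (PERM_R | PERM_X) << (4 * 2)
# The page itself is read/write, so only read survives the overlay.
result = effective_permissions(PERM_R | PERM_W, por, key=2)
```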
i3lock: update to 2.16.
2025-10-31 i3lock 2.16
• fix crash when the user changes the XKB configuration
• when started on Wayland, display an error and usage
• switch to clang-format 15 (with InsertBraces)
• fix -Werror=calloc-transposed-args by swapping calloc args
• reword: remove "dynamic" TWM
• update meson setup command in README
• do not increase failed_attempts beyond 999
• i3lock.1 man page: fix acute accent
• declare a development shell in flake.nix
• fix in_dpi variable checking
• meson: use explicit_bzero if it is available
i3: update to 4.25.1.
This is i3 v4.25.1. This version is considered stable. All users of i3 are
strongly encouraged to upgrade.
cmd_floating: Fix crash when running empty workspace
Fix i3bar workspace buttons for primary screen
Fix ctype(3) function arguments.
use setlocale(3) (NetBSD lacks uselocale(3))
[DA] Remove `DependenceInfo::unifySubscriptType` (#181607)
`DependenceInfo::unifySubscriptType` is a function that takes two
subscripts and casts them to the wider type. Using this function can
sometimes lead to correctness issues, especially when combined with
`DependenceInfo::removeMatchingExtensions`, as in #148435. These two
functions are intended to broaden the scope of DA, but they can also
introduce correctness issues, mainly due to mishandling of `sext`/`zext`
and integer overflows.
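One general way widening can go wrong is that sign extension does not
commute with an addition that overflows in the narrow type. The sketch
below shows this with 8-bit and 16-bit integers. It illustrates the
class of problem only and is not the specific failure from #148435; the
helper names are hypothetical.

```python
def wrap_signed(value, bits):
    """Interpret value modulo 2**bits as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits (value is preserved)."""
    return wrap_signed(wrap_signed(value, from_bits), to_bits)


a, b = 127, 1

# Add in i8 first (the sum wraps), then extend to i16.
narrow_then_extend = sext(wrap_signed(a + b, 8), 8, 16)

# Extend both operands to i16 first, then add (no wrap).
extend_then_add = wrap_signed(sext(a, 8, 16) + sext(b, 8, 16), 16)
```

The two orders give different results, so reasoning about the widened
subscripts is not automatically valid for the original ones.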
To avoid these issues, this patch removes the `unifySubscriptType`
function. Currently, it has only one caller, which is part of the
validation logic for delinearization. Instead of calling
`unifySubscriptType`, this patch adds a type check and bails out if the
types do not match. Note that I'm not entirely sure whether there are
real cases where the types differ and the check is actually necessary.
Also, this patch doesn't include new test cases, as I have not found
concrete examples where `unifySubscriptType` itself causes actual
issues. That is, this patch may be NFC.
Fix #169807
[LV] Fix sub-reduction PHI in vectorized epilogue (#182072)
When the vectorized epilogue loop uses partial reductions, the PHI node
in the loop must start at 0 (because for partial sub-reductions the
sub is done in the middle block) and the compute-reduction-result must
subtract from the partial result (as calculated in the middle block of
the main vector loop), instead of subtracting from the original init
value.
This fixes the issue as reported on #178919 by @aeubanks.
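The arithmetic behind the fix can be modelled with a small scalar
sketch. This is one reading of the description and not the vectorizer
code: the function names, the `split` point between the main vector
loop and the epilogue, and the `buggy` switch are all hypothetical.

```python
def sub_reduce_scalar(init, values):
    """Reference result: init - v0 - v1 - ..."""
    acc = init
    for v in values:
        acc -= v
    return acc


def sub_reduce_main_plus_epilogue(init, values, split, buggy=False):
    """Model a main vector loop followed by a vectorized epilogue.

    Both loops accumulate into a PHI that starts at 0; the subtraction
    happens afterwards, as the message says of the middle block.
    """
    main_phi = 0
    for v in values[:split]:
        main_phi += v
    partial = init - main_phi  # middle block of the main vector loop

    epi_phi = 0  # the epilogue PHI must also start at 0
    for v in values[split:]:
        epi_phi += v

    # Correct: subtract from the partial result of the main loop.
    # Buggy: subtract from the original init value instead.
    base = init if buggy else partial
    return base - epi_phi


values = [1, 2, 3, 4, 5, 6]
expected = sub_reduce_scalar(100, values)
fixed = sub_reduce_main_plus_epilogue(100, values, split=4)
broken = sub_reduce_main_plus_epilogue(100, values, split=4, buggy=True)
```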