[GISel] import pattern `(A-(B-C)) to A+(C-B)` (#181676)
This PR imports the rewrite pattern `(A-(B-C)) to A+(C-B)` from
SelectionDAG to GlobalISel.
The rewrite should only trigger when `B-C` is used once.
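The rewrite is sound under wrap-around integer arithmetic because A-(B-C) = A-B+C = A+(C-B) holds modulo 2^N. Below is a minimal compile-time check on a few 32-bit edge cases. The helper names are illustrative and are not GlobalISel APIs.

```cpp
#include <cassert>
#include <cstdint>

// Both sides of the rewrite on 32-bit unsigned values. Unsigned arithmetic in
// C++ wraps modulo 2^32, like LLVM integer sub/add without nsw/nuw flags.
constexpr uint32_t lhs(uint32_t a, uint32_t b, uint32_t c) { return a - (b - c); }
constexpr uint32_t rhs(uint32_t a, uint32_t b, uint32_t c) { return a + (c - b); }

// Checks the identity on every combination of a few edge-case values,
// including 0, INT32_MAX, INT32_MIN (as a bit pattern) and UINT32_MAX.
constexpr bool identityHoldsOnSamples() {
  constexpr uint32_t samples[] = {0u, 1u, 7u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
  for (uint32_t a : samples)
    for (uint32_t b : samples)
      for (uint32_t c : samples)
        if (lhs(a, b, c) != rhs(a, b, c))
          return false;
  return true;
}

static_assert(identityHoldsOnSamples(), "A-(B-C) must equal A+(C-B) modulo 2^32");
```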
[SLP]Improve reductions for copyables/split nodes
The original support for copyables led to a regression in x264 on
RISC-V. This patch improves the detection of copyable candidates with a more
precise profitability check, and adds an extra check so that split-node
reduction is applied only when it is profitable.
Fixes #184313
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/185697
[OpenACC][NFC] Generalize wrapMultiBlockRegionWithSCFExecuteRegion (#187359)
Simplify `wrapMultiBlockRegionWithSCFExecuteRegion` by replacing the
`bool convertFuncReturn` parameter with a generic `getNumSuccessors() ==
0` check. Terminators with no successors are by definition region exit
points, so they can be identified automatically without requiring
callers to specify types. This enables downstream dialects (e.g., CUF
with fir::FirEndOp) to reuse the utility without modifying it.
```
// Before:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter, /*convertFuncReturn=*/true);
// After:
wrapMultiBlockRegionWithSCFExecuteRegion(region, mapping, loc, rewriter);
```
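Here is a minimal sketch of the generalized rule. A hypothetical `TerminatorInfo` struct stands in for MLIR's `Operation`, and the real code calls `getNumSuccessors()` on the terminator op instead. The helper names `isRegionExit` and `regionExits` are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an MLIR terminator: only the op name and the
// number of successor blocks matter for this sketch.
struct TerminatorInfo {
  std::string name;
  unsigned numSuccessors;
};

// Mirrors the generalized rule: a terminator with no successors cannot branch
// to another block in the region, so it is treated as a region exit.
bool isRegionExit(const TerminatorInfo &term) { return term.numSuccessors == 0; }

// Collects the names of the exit terminators among a region's terminators.
std::vector<std::string> regionExits(const std::vector<TerminatorInfo> &terms) {
  std::vector<std::string> exits;
  for (const TerminatorInfo &t : terms)
    if (isRegionExit(t))
      exits.push_back(t.name);
  return exits;
}
```

Consider terminators `cf.cond_br` (2 successors), `cf.br` (1) and `func.return` (0). Only `func.return` is reported as an exit. A dialect-specific terminator with no successors would be picked up the same way, with no extra parameter.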
[CIR] Add support for array new with ctor init (#187418)
This adds support for array new initialization that requires calling
constructors.
This diverges a bit from the classic codegen implementation in three
ways. First, we use the cir.array_ctor operation to represent all the
constructor calls that weren't part of an explicit initializer list.
This gets lowered to a loop during the LoweringPrepare pass. Second,
because CIR uses more explicit types, we have to insert a bitcast of the
array pointer to an explicit array type. Third, when an initializer list
is provided and we are calling constructors for the "filler" portion of
the list, we attempt to get the array size as a constant and create a
"tail array" to initialize that is sized to the number of elements
remaining.
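The example below illustrates the source-level semantics being lowered, not CIR's implementation. It is plain C++ in which an array new has an initializer list shorter than the array. The `Counted` type and the `makeArray` helper are hypothetical. With `n == 5`, the two listed elements use `Counted(int)` and the remaining three "filler" elements are default-constructed. That default-constructed part is the tail that gets sized to the number of remaining elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type that records which constructor built each element.
struct Counted {
  static int defaultCtorCalls;
  static int valueCtorCalls;
  int value;
  Counted() : value(0) { ++defaultCtorCalls; }
  explicit Counted(int v) : value(v) { ++valueCtorCalls; }
};
int Counted::defaultCtorCalls = 0;
int Counted::valueCtorCalls = 0;

// Array new with ctor init: the first two elements come from the
// initializer list, and the remaining n - 2 ("filler") elements are
// default-constructed. Throws std::bad_array_new_length if n < 2.
Counted *makeArray(std::size_t n) {
  return new Counted[n]{Counted(1), Counted(2)};
}
```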
[compiler-rt] Define GPU specific handling of profiling functions (#185763)
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to start building the standard `libclang_rt.profile.a` for GPU
targets. This PR expands on this by adding an optimized GPU routine for
counter increment and removing the special-case handling of these functions
in the OpenMP runtime.
The vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, such as value or
memory profiling.
[AMDGPU] Fix alias handling in module splitting functionality (#187295)
Summary:
The module splitting used for `-flto-partitions=8` support (which is
passed by default) did not correctly handle aliases. We mainly need to
do two things: keep the aliases in the partition they are used in, and
externalize them. Internal linkage needs to be handled conservatively.
This is needed because these aliases show up in PGO contexts.
---------
Co-authored-by: Shilei Tian <i at tianshilei.me>
Merge tag 'pm-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an idle loop issue exposed by recent changes and a race
condition related to device removal in the runtime PM core code:
- Consolidate the handling of two special cases in the idle loop that
occur when only one CPU idle state is present (Rafael Wysocki)
- Fix a race condition related to device removal in the runtime PM
core code that may cause a stale device object pointer to be
dereferenced (Bart Van Assche)"
* tag 'pm-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix a race condition related to device removal
sched: idle: Consolidate the handling of two special cases
Merge tag 'acpi-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support fixes from Rafael Wysocki:
"These fix an MFD child automatic modprobe issue introduced recently,
an ACPI processor driver issue introduced by a previous fix and an
ACPICA issue causing confusing messages regarding _DSM arguments to be
printed:
- Update the format of the last argument of _DSM to avoid printing
confusing error messages in some cases (Saket Dumbre)
- Fix MFD child automatic modprobe issue by removing a stale check
from acpi_companion_match() (Pratap Nirujogi)
- Prevent possible use-after-free in acpi_processor_errata_piix4()
from occurring by rearranging the code to print debug messages
while holding references to relevant device objects (Rafael
Wysocki)"
[4 lines not shown]
17962 klm: refresh_nlm_rpc() handle re-init is missing CLSET_BINDSRCADDR
Reviewed by: Gordon Ross <Gordon.W.Ross at gmail.com>
Approved by: Dan McDonald <danmcd at edgecast.io>
Services: Dnsmasq DNS & DHCP: Since client-id is a valid IPv4 reservation type as well, ensure the lease view handles it correctly. The same applies to the MAC address as an IPv6 reservation type.
[DAG] Use value tracking to detect or_disjoint patterns and add a add_like pattern matcher (#187478)
Extend the generic or_disjoint pattern to call haveNoCommonBitsSet. This
allows us to remove the similar x86 or_is_add pattern, use or_disjoint
directly, and merge some add/or_is_add matching patterns to use an
add_like wrapper pattern instead.
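A disjoint `or` can be treated as an add because, when the operands share no set bits, no bit position generates a carry, so `a | b == a + b`. Below is a minimal compile-time check of that property. `haveNoCommonBits` and `orEqualsAdd` are hypothetical helpers on concrete integers. LLVM's `haveNoCommonBitsSet` instead reasons about what is known of the bits of IR values.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit is set in both operands.
constexpr bool haveNoCommonBits(uint32_t a, uint32_t b) { return (a & b) == 0; }

// True when OR and ADD give the same result for these operands.
constexpr bool orEqualsAdd(uint32_t a, uint32_t b) { return (a | b) == a + b; }

// A value shifted left by 4 has zero low bits, so OR-ing in a 4-bit field is
// disjoint and therefore equivalent to addition.
static_assert(haveNoCommonBits(0xAB0u, 0x5u), "operands are disjoint");
static_assert(orEqualsAdd(0xAB0u, 0x5u), "disjoint OR equals ADD");
// With overlapping bits the equivalence breaks: 3 | 1 == 3 but 3 + 1 == 4.
static_assert(!orEqualsAdd(3u, 1u), "overlapping OR differs from ADD");
```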
You cannot use ibuf_add_n32() for a signed 32-bit type.
ibuf_add_nXY() and ibuf_add_hXY() pass values as uint64_t, so the sign
extension of a negative 32-bit value will cause an overflow check to trigger.
The relative metric field can be negative, so this will trigger the
error. Use ibuf_add() instead, which is closer to what this should have used anyway.
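Here is a small sketch of the failure mode. It assumes the overflow check rejects any uint64_t above UINT32_MAX; the actual ibuf check is not shown here. `fitsInUnsigned32` and `signedValuePassesCheck` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical range check: a helper that takes its value as uint64_t and
// rejects anything that does not fit in 32 bits.
bool fitsInUnsigned32(uint64_t value) { return value <= UINT32_MAX; }

// Converting a negative int32_t to uint64_t sign-extends it:
// -1 becomes 0xFFFFFFFFFFFFFFFF, far above UINT32_MAX.
bool signedValuePassesCheck(int32_t metric) {
  return fitsInUnsigned32(static_cast<uint64_t>(metric));
}
```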
Found the hard way by sthen@ who also debugged it.
OK tb@ sthen@
[mlir][memref] Rewrite scalar `memref.copy` through reinterpret_cast into load/store (#186118)
This change adds a rewrite that simplifies `memref.copy` operations whose
destination is a scalar view produced by `memref.reinterpret_cast`.
The pattern matches cases where a reinterpret cast creates a scalar view
(`sizes = [1, ..., 1]`) into a memref that has a single non-unit dimension. In
this situation the view refers to exactly one element in the base buffer, so
the accessed address depends only on the base pointer and the offset.
The stride information of the view does not affect the accessed element,
because the only valid index into the view is `[0, ..., 0]`.
Therefore the copy can be rewritten into a direct load from the source and a
store into the base memref using the offset from the reinterpret cast.
This makes the `memref.reinterpret_cast` redundant for the copy and simplifies
the IR.
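The argument can be checked with the usual strided-layout address formula, offset + sum(index[i] * stride[i]). Below is a minimal sketch; the `linearIndex` helper is hypothetical, not an MLIR API. With every index fixed at 0, the strides drop out and only the offset remains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linearized element position of a strided view:
// offset + sum over i of indices[i] * strides[i].
int64_t linearIndex(int64_t offset, const std::vector<int64_t> &strides,
                    const std::vector<int64_t> &indices) {
  assert(strides.size() == indices.size());
  int64_t linear = offset;
  for (std::size_t i = 0; i < indices.size(); ++i)
    linear += indices[i] * strides[i];
  return linear;
}
```

For a scalar view the only valid index is all zeros, so any two stride vectors yield the same position, namely the offset.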
[53 lines not shown]
NAS-140358 / 26.0.0-BETA.1 / Improve container migration for NIC devices, MAC addresses, and CPU pinning (by Qubad786) (#18512)
## Context
Improve Incus-to-container migration with the following changes:
- NIC device validation now correctly checks interface names against
both BRIDGE and MACVLAN choices instead of top-level dict keys
- MAC address lookup uses the Incus device name (e.g. `eth0`) for
volatile config keys instead of the parent interface (e.g. `br0`)
- Preserve CPU pinning (`limits.cpu`) during migration
- Reject underscores in cpuset values that Python's `int()` silently
accepts as numeric separators
Original PR: https://github.com/truenas/middleware/pull/18506
Co-authored-by: M. Rehan <mrehanlm93 at gmail.com>
NAS-140358 / 26.0.0-BETA.2 / Improve container migration for NIC devices, MAC addresses, and CPU pinning (#18506)
## Context
Improve Incus-to-container migration with the following changes:
- NIC device validation now correctly checks interface names against
both BRIDGE and MACVLAN choices instead of top-level dict keys
- MAC address lookup uses the Incus device name (e.g. `eth0`) for
volatile config keys instead of the parent interface (e.g. `br0`)
- Preserve CPU pinning (`limits.cpu`) during migration
- Reject underscores in cpuset values that Python's `int()` silently
accepts as numeric separators
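Python's `int()` accepts underscores as digit separators (`int("1_0")` returns `10`), so a naive parser could read a cpuset such as `0-1_0` as CPUs 0 through 10. A strict alternative validates every number as plain ASCII digits. It is sketched here in C++ as an illustration, not the middleware's actual Python code. The `parseCpuset` name and the comma-separated list/range syntax are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <set>
#include <sstream>
#include <string>

// Hypothetical strict parser for a cpuset list such as "0-3,8,10-11".
// Every number must be plain ASCII digits, so "1_0" is rejected even though
// Python's int() would accept it as 10.
std::optional<std::set<int>> parseCpuset(const std::string &spec) {
  if (spec.empty() || spec.back() == ',')
    return std::nullopt;
  std::set<int> cpus;
  std::stringstream items(spec);
  std::string item;
  while (std::getline(items, item, ',')) {
    std::size_t dash = item.find('-');
    std::string first = item.substr(0, dash);
    std::string last = dash == std::string::npos ? first : item.substr(dash + 1);
    for (const std::string &num : {first, last}) {
      // Limit to 4 digits (an assumed bound) so std::stoi cannot overflow.
      if (num.empty() || num.size() > 4)
        return std::nullopt;
      for (unsigned char ch : num)
        if (!std::isdigit(ch))
          return std::nullopt;  // rejects '_', '+', '-', whitespace, ...
    }
    int lo = std::stoi(first);
    int hi = std::stoi(last);
    if (lo > hi)
      return std::nullopt;
    for (int cpu = lo; cpu <= hi; ++cpu)
      cpus.insert(cpu);
  }
  return cpus;
}
```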
NAS-140362 / 26.0.0-BETA.1 / Fix vm.device.query filters to use attributes.dtype after pydantic conversion (by Qubad786) (#18511)
## Problem
After the Pydantic model conversion for VM devices, `dtype` was moved
into the `attributes` dict. Two callers of `vm.device.query` still
filter on the top-level `dtype` field, causing the filters to match
nothing.
## Solution
Update the query filters in `0012_libvirt_uid_gid.py` and
`zvol_utils.py` to use `attributes.dtype` instead of `dtype`.
Original PR: https://github.com/truenas/middleware/pull/18510
Co-authored-by: M. Rehan <mrehanlm93 at gmail.com>
[NFC][AMDGPU] New test for untested case in SILowerI1Copies (#186127)
[This line](https://github.com/ambergorzynski/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp#L646)
is untested by the existing LLVM test suite (checked using code coverage
and by inserting an `abort`).
We propose a new test that exercises this case. The test is demonstrated
by adding an abort to show that it is the only test that fails (the
abort is removed before merging).
NAS-140362 / 26.0.0-BETA.2 / Fix vm.device.query filters to use attributes.dtype after pydantic conversion (#18510)
## Problem
After the Pydantic model conversion for VM devices, `dtype` was moved
into the `attributes` dict. Two callers of `vm.device.query` still
filter on the top-level `dtype` field, causing the filters to match
nothing.
## Solution
Update the query filters in `0012_libvirt_uid_gid.py` and
`zvol_utils.py` to use `attributes.dtype` instead of `dtype`.
[mlir][EmitC] Support pointer-based memrefs in load/store lowering (#186828)
## Problem
In the MemRef → EmitC conversion, `memref.load` and `memref.store`
assume that the converted memref operand is an `emitc.array`, as defined
by the type conversion in `populateMemRefToEmitCTypeConversion`.
However, `memref.alloc` is lowered to a `malloc` call returning
`emitc.ptr`. When such values are used by `memref.load` or
`memref.store`, the conversion framework inserts a bridging
`builtin.unrealized_conversion_cast` from `emitc.ptr` to `emitc.array`.
These casts have no EmitC representation and therefore remain in the IR
after conversion, preventing valid C/C++ emission.
## Solution
Extend the `memref.load` and `memref.store` conversions to handle
[74 lines not shown]