Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).
For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
    int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```
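To illustrate what masking buys here, the following C sketch emulates one possible vector strategy with scalar code. It is a sketch under stated assumptions, not the IR LLVM emits: the vector factor of 4, the tail handled by masking, and the function name `csa_int_load_vf4` are all hypothetical. For each chunk it builds the lane mask from `a[i] > threshold`, reads `b[i]` only for active lanes (standing in for a masked load, so a lane whose condition is false never touches memory), and keeps the value from the last active lane.

```c
#include <stdbool.h>

#define VF 4 /* assumed vector factor for this sketch */

/* Hypothetical scalar emulation of the vectorized conditional assignment. */
int csa_int_load_vf4(const int *a, const int *b, int default_val, int N,
                     int threshold)
{
    int result = default_val;
    for (int i = 0; i < N; i += VF) {
        bool mask[VF];
        int loaded[VF];
        int last_active = -1;
        for (int lane = 0; lane < VF; ++lane) {
            int idx = i + lane;
            /* Lanes past N are inactive (tail handled by masking). */
            mask[lane] = idx < N && a[idx] > threshold;
            /* Masked load: only active lanes read b[idx]. */
            loaded[lane] = mask[lane] ? b[idx] : 0;
            if (mask[lane])
                last_active = lane;
        }
        /* Conditional scalar assignment: keep the last active lane's value. */
        if (last_active >= 0)
            result = loaded[last_active];
    }
    return result;
}
```

Because the load is masked, the sketch stays safe even when `b[i]` is only valid where `a[i] > threshold` holds, which is exactly the case an unconditional (speculative) load could not handle.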
[9 lines not shown]
[mlir][vector] Reuse vector TD op in vector.xfer flatten tests (#180606)
This change adds a `RUN` line in vector-transfer-flatten.mlir that
uses `vector.flatten_vector_transfer_ops`, which was introduced in #178134.
It also removes a test added in the original PR whose coverage is
already provided by pre-existing tests.
Extending UniformQuantizedType with interface-based support for new storage types in Quant dialect (#152966)
Currently, UniformQuantizedType only supports built-in MLIR storage
types such as Integer. LLM quantization research has introduced NF4 as a
low-precision data type (see https://arxiv.org/pdf/2305.14314). As more
types are added, there is a growing need to keep the system extensible
and maintainable. Ensuring that MLIR can natively support NF4 through a
clean, extensible interface is essential for both current and future
quantization workflows.
**Current Approach and Its Limitations:**
- The present implementation relies on dynamic checks (e.g., type
switches or if-else chains) to determine the storage type and retrieve
type-specific information for legality checks.
- This approach works for a small, fixed set of types, but as the number
of supported types grows, the code becomes harder to read, maintain, and
extend.
[23 lines not shown]
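As a language-neutral illustration of the trade-off (not the MLIR interface itself), the C sketch below replaces a type switch with a per-type table of function pointers that answers legality queries. Every name here (`StorageTypeInterface`, `kInt8`, `kNF4`, `storage_range_is_legal`) is hypothetical, and NF4 is modelled simply as a 4-bit code, so its storage range is assumed to be [0, 15].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface: each storage type supplies its own queries. */
typedef struct {
    const char *name;
    unsigned (*storage_width)(void);
    int64_t (*min_value)(void);
    int64_t (*max_value)(void);
} StorageTypeInterface;

static unsigned int8_width(void) { return 8; }
static int64_t int8_min(void) { return INT8_MIN; }
static int64_t int8_max(void) { return INT8_MAX; }

/* Assumption for this sketch: NF4 values are stored as 4-bit codes 0..15. */
static unsigned nf4_width(void) { return 4; }
static int64_t nf4_min(void) { return 0; }
static int64_t nf4_max(void) { return 15; }

static const StorageTypeInterface kInt8 = {"i8", int8_width, int8_min, int8_max};
static const StorageTypeInterface kNF4 = {"nf4", nf4_width, nf4_min, nf4_max};

/* Generic legality check: no switch over concrete types, so adding a new
   storage type only requires providing another table. */
bool storage_range_is_legal(const StorageTypeInterface *t, int64_t lo,
                            int64_t hi)
{
    return lo <= hi && lo >= t->min_value() && hi <= t->max_value();
}
```

The design point is that `storage_range_is_legal` never changes when a type is added, whereas an if-else chain would have to be extended at every query site.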
InstCombine: Use SimplifyDemandedFPClass on fmul (#177490)
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anything
was systematically trying to introduce fast-math flags before, though
a few odd transforms would set them.
Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.
I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
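Why a fold of `x * 0.0` depends on knowing the FP class of `x` can be checked with plain IEEE-754 arithmetic in C. The helpers below are hypothetical illustrations, not InstCombine code: the product is NaN when `x` is NaN or infinite (the cases nnan/ninf rule out), and for finite `x` the result is a zero carrying the sign of `x`.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: may x * 0.0 be replaced by a zero of known sign?
   True only when x is neither NaN nor infinite, i.e. exactly the facts
   that nnan and ninf assert. */
bool fmul_zero_fold_is_safe(double x)
{
    return !isnan(x) && !isinf(x);
}

/* The value the fold must produce when it is safe: a zero with the sign
   of x (ignoring that sign would additionally require nsz). */
double folded_fmul_by_zero(double x)
{
    return copysign(0.0, x);
}
```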
iflib: Add support for SIOCGIFDOWNREASON ioctl
This change adds native support for the SIOCGIFDOWNREASON ioctl in iflib.
When ifconfig issues SIOCGIFDOWNREASON, the request is now routed through a
new driver callback (IFDI_GET_DOWNREASON). iflib allocates the ifdownreason
structure, calls the driver to fill the down-reason message, and then
returns the data back to ifconfig for display.
Without this change, iflib-based drivers cannot implement link-down reason
reporting even if the hardware provides the information.
No functional change for existing drivers unless they implement the new
IFDI_GET_DOWNREASON method. Existing drivers continue to behave as before.
Reviewed by: gallatin, erj, kgalazka, ssaxena, #iflib
Differential Revision: https://reviews.freebsd.org/D54045
MFC After: 1 week
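The request flow can be modelled in a self-contained C sketch. All types and names here are stand-ins: `struct downreason_sketch` only loosely mirrors the kernel's `struct ifdownreason`, `driver_get_downreason` plays the role of a driver's IFDI_GET_DOWNREASON method, and the real iflib kobj dispatch and copyout path are not reproduced. The point is the ordering: iflib allocates, the driver fills in the message, iflib returns the data, and a driver without the method is left alone.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE 64 /* assumed message buffer size for this sketch */

/* Stand-in for the kernel's down-reason structure. */
struct downreason_sketch {
    char msg[MSG_SIZE];
};

/* Stand-in for a driver with an optional down-reason method. */
struct driver_sketch {
    int (*get_downreason)(struct driver_sketch *, struct downreason_sketch *);
};

/* Example driver method: reports a hardware-provided reason. */
static int driver_get_downreason(struct driver_sketch *drv,
                                 struct downreason_sketch *dr)
{
    (void)drv;
    snprintf(dr->msg, sizeof(dr->msg), "%s", "link partner not detected");
    return 0;
}

/* Models iflib's handling: allocate, ask the driver, copy the result back
   to the caller (ifconfig in the real flow), then free. Returning
   EOPNOTSUPP when the driver lacks the method is an assumption. */
int iflib_downreason_sketch(struct driver_sketch *drv, char *out,
                            size_t outlen)
{
    if (drv->get_downreason == NULL)
        return EOPNOTSUPP;
    struct downreason_sketch *dr = calloc(1, sizeof(*dr));
    if (dr == NULL)
        return ENOMEM;
    int error = drv->get_downreason(drv, dr);
    if (error == 0)
        snprintf(out, outlen, "%s", dr->msg);
    free(dr);
    return error;
}
```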
pci_iov: Reuse downstream bridge bus window if it already covers VF bus
If the parent bridge's [secondary, subordinate] window already covers
the VF bus (e.g., programmed by BIOS or a prior PF), skip allocating
PCI_RES_BUS. This avoids a duplicate rman allocation in the multi-PF
case while still allocating when growth is actually needed.
Reviewed by: ssaxena
Differential Revision: https://reviews.freebsd.org/D52163
MFC After: 1 week
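The reuse condition is a simple interval check. A minimal C sketch, with hypothetical helper names and bus numbers as plain integers:

```c
#include <stdbool.h>

/* Hypothetical helper: does the bridge's [secondary, subordinate] bus
   window already include the VF bus? */
bool bridge_window_covers(unsigned secondary, unsigned subordinate,
                          unsigned vf_bus)
{
    return secondary <= vf_bus && vf_bus <= subordinate;
}

/* Allocate a PCI_RES_BUS range only when the window would have to grow. */
bool need_bus_allocation(unsigned secondary, unsigned subordinate,
                         unsigned vf_bus)
{
    return !bridge_window_covers(secondary, subordinate, vf_bus);
}
```

For example, with a window of [2, 5] a VF on bus 3 reuses the existing range, while a VF on bus 6 still triggers an allocation.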
[flang] do not set nuw flag in CSHIFT bound arithmetic (#180520)
Fix https://github.com/llvm/llvm-project/issues/180374
I initially suspected a missing lower-bound adjustment, and indeed found
an unrelated issue: gen1DSection was always called with all-ones lower
bounds because genLowerbounds was called on the resulting fir.shape.
However, that code path is not exercised by this issue. The actual
problem was that `nuw` (no unsigned wrap) was being set on arithmetic
inside the kernel generated for CSHIFT. Because this arithmetic deals
with user-defined bounds, it may have to handle negative values (even if
the offsets from the CSHIFT itself are not negative).
This caused LLVM optimizations to generate completely invalid code when
the lower bounds of the CSHIFT input are zero or less.
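Why `nuw` is wrong here can be shown with a small C model. The sketch assumes a 1-D CSHIFT over an array whose lower bound `lb` is user-defined (possibly zero or negative); the helper names are hypothetical. `add_wraps_unsigned` reports whether an add of the same 64-bit bit patterns wraps when viewed as unsigned, which is exactly when an LLVM `add nuw` yields poison. Adding a small positive offset to a negative index wraps in that sense even though the signed result is perfectly valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b overflows when both are viewed as unsigned 64-bit values;
   an `add nuw` with such operands would be poison. */
bool add_wraps_unsigned(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    return ua + ub < ua;
}

/* Source index for element i of CSHIFT(array, shift), where valid indices
   are lb .. lb + n - 1 and n > 0. Signed arithmetic throughout, so a zero
   or negative lb is handled correctly. */
int64_t cshift_source_index(int64_t i, int64_t shift, int64_t lb, int64_t n)
{
    int64_t offset = (i - lb + shift) % n; /* may be negative in C */
    if (offset < 0)
        offset += n;
    return lb + offset;
}
```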
[MemorySSA] Relax clobbering checks for calls to consider writes only (#179721)
Now that getModRefInfo for calls handles read and write effects by
examining both calls, the clobbering query no longer needs to treat
reads as clobbers. Update the check to consider writes only, aligning
call handling with other instructions.
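The change comes down to which bit of the mod/ref result is tested. In this C sketch the enum values mirror the bit encoding of LLVM's `ModRefInfo` (Ref = 1, Mod = 2, ModRef = Ref | Mod), but the functions are illustrative, not MemorySSA's code: under the new check, a Ref-only result is no longer treated as a clobber.

```c
#include <stdbool.h>

/* Mirrors the bit encoding of LLVM's ModRefInfo. */
typedef enum {
    MR_NO_MOD_REF = 0,
    MR_REF = 1,
    MR_MOD = 2,
    MR_MOD_REF = MR_REF | MR_MOD
} ModRef;

static bool is_mod_or_ref_set(ModRef mr) { return mr != MR_NO_MOD_REF; }
static bool is_mod_set(ModRef mr) { return (mr & MR_MOD) != 0; }

/* Before: any interaction between the two calls counted as a clobber. */
bool call_clobbers_old(ModRef mr) { return is_mod_or_ref_set(mr); }

/* After: only a write (Mod) clobbers, as for other instructions. */
bool call_clobbers_new(ModRef mr) { return is_mod_set(mr); }
```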
[cross-project-tests][lldb] Relax llvm::Expected check
The `CHECK` for `(int)` was too strict. On macOS the type prints as:
```
08:46:24 29: (lldb) v -T ExpectedRef
08:46:24 30: (llvm::Expected<int &>) ExpectedRef = {
08:46:24 next:14'0 X error: no match found
08:46:24 31: (std::__1::reference_wrapper<int>::type) value = 100
08:46:24 next:14'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
08:46:24 next:14'1 ? possible intended match
08:46:24 32: }
08:46:24 next:14'0 ~~
```
[flang] Use alias analysis in lowering record assignments (#180628)
Without alias analysis, Flang assumes no aliasing when lowering record
assignments, which can result in miscompilation of programs using
`SEQUENCE` types and `EQUIVALENCE`.
Use alias analysis to guard the fast path in `genRecordAssignment`;
otherwise fall back to element-wise expansion.
Update FIR FileCheck expectations.
Add `FIRAnalysis` to "flang/unittests/Optimizer/CMakeLists.txt" to fix
the Windows x64 build failure (linker error).
Add `SEQUENCE` handling and update tests accordingly.
Fixes #175246 (and includes the fix to
"flang/lib/Optimizer/Builder/CMakeLists.txt" in PR #176483).
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski at hpe.com>
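The guard has a rough C analogue, where a Fortran `EQUIVALENCE` behaves much like a union: a whole-record block copy (`memcpy`) is only valid when source and destination do not overlap. The sketch below is an analogy under that assumption, with a hypothetical `Rec` type and helper names, and a byte-range overlap test standing in for real alias analysis. Its slow path snapshots the right-hand side into a temporary and then assigns component by component.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type used only for this sketch. */
typedef struct {
    int a;
    int b;
    int c;
} Rec;

/* Conservative stand-in for alias analysis: do the byte ranges overlap? */
static bool may_alias(const void *p, const void *q, size_t size)
{
    uintptr_t x = (uintptr_t)p, y = (uintptr_t)q;
    return x < y + size && y < x + size;
}

/* Fast path only when the operands cannot alias; otherwise snapshot the
   right-hand side and assign component by component. */
void assign_record(Rec *dst, const Rec *src)
{
    if (!may_alias(dst, src, sizeof(Rec))) {
        memcpy(dst, src, sizeof(Rec));
        return;
    }
    Rec tmp = *src;
    dst->a = tmp.a;
    dst->b = tmp.b;
    dst->c = tmp.c;
}
```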
[SimplifyLibCalls] Directly canonicalize fminimum_num to intrinsic (#180555)
Same as https://github.com/llvm/llvm-project/pull/177988, but for
fminimum_num/fmaximum_num. Directly canonicalize these to the
corresponding intrinsics, and let the shrinking happen directly on the
intrinsics.
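For reference, `fminimum_num` follows IEEE 754-2019 `minimumNumber`: a NaN operand is ignored in favour of the other operand, and -0.0 is treated as less than +0.0. The portable C sketch below (hypothetical name `fminimum_num_sketch`, since not every C library ships C23's `fminimum_num` yet) spells out those semantics; it is not LLVM's lowering.

```c
#include <math.h>

/* Sketch of IEEE 754-2019 minimumNumber semantics. */
double fminimum_num_sketch(double x, double y)
{
    if (isnan(x))
        return y; /* NaN only if both operands are NaN */
    if (isnan(y))
        return x;
    if (x == y)   /* equal values: only the sign of zero can differ */
        return signbit(x) ? x : y;
    return x < y ? x : y;
}
```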
[OCaml] Remove global_context (#180533)
This has been deprecated in the C API, so remove it from the OCaml
bindings. create_context and dispose_context should be used instead.
py-coverage: updated to 7.13.4
7.13.4
- Fix: the third-party code fix in 7.13.3 required examining the parent
directories where coverage was run. In the unusual situation that one of the
parent directories is unreadable, a PermissionError would occur, as
described in `issue 2129`_. This is now fixed.
- Fix: in test suites that change sys.path, coverage.py could fail with
"RuntimeError: Set changed size during iteration" as described and fixed in
`pull 2130`_. Thanks, Noah Fatsi.
- We now publish ppc64le wheels, thanks to `Pankhudi Jain <pull 2121_>`_.
openimageio: updated to 3.1.10.0
3.1.10.0
- *perf*: `IBA::resample()` and `oiiotool --resample` improvements to speed up 20x or more
- *ImageBuf*: IB::localpixels_as_[writable_]byte_image_span
- *ImageBufAlgo*: IBA::make_texture now honors "maketx:threads" hint
- *heif*: Add IOProxy for input and output
- *heif*: Cannot output AVIF when libheif has no HEVC support
- *heif*: Error saving multiple images with different bit depths
- *webp*: Use correct resolution limits for WebpOutput::open
- *webp*: Missing oiio:UnassociatedAlpha on input
- *fix*: Several bug fixes related to internal use of image_span
- *build*: Fix building on OpenBSD
- *deps*: Libheif 1.21 support
- *deps*: Bump build ver to 2.5.1
- *deps*: Use libheif exported config if available
- *tests*: Add new ref image for jpeg test
- *tests*: Fully disable tests when their required dependencies are missing
[5 lines not shown]
[OFFLOAD] Implement excluding filters for debugging (#180538)
Allow defining a set of types that are not shown by default when
doing default debug logging (e.g., LIBOMPTARGET_DEBUG=All).
Users can enable output of those types of messages by explicitly adding
them to LIBOMPTARGET_DEBUG.
Used to implement: #180545
---------
Co-authored-by: Michael Klemm <michael.klemm at amd.com>
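The filtering rule can be sketched in C. Everything here is hypothetical: the `Verbose` type, the `excluded_by_default` table, and the comma-separated parsing of the `LIBOMPTARGET_DEBUG` value are assumptions for illustration, not the offload runtime's actual code. A type is shown if it is named explicitly, or if `All` is given and the type is not on the exclusion list.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical exclusion list; "Verbose" stands in for an excluded type. */
static const char *const excluded_by_default[] = {"Verbose"};

/* True if `name` appears as an item in a comma-separated list. */
static bool list_contains(const char *list, const char *name)
{
    size_t len = strlen(name);
    for (const char *p = list; *p != '\0';) {
        const char *end = strchr(p, ',');
        size_t item_len = end ? (size_t)(end - p) : strlen(p);
        if (item_len == len && strncmp(p, name, len) == 0)
            return true;
        if (!end)
            break;
        p = end + 1;
    }
    return false;
}

static bool is_excluded_by_default(const char *type)
{
    size_t n = sizeof(excluded_by_default) / sizeof(excluded_by_default[0]);
    for (size_t i = 0; i < n; ++i)
        if (strcmp(excluded_by_default[i], type) == 0)
            return true;
    return false;
}

/* Decide whether messages of `type` are printed for a given setting. */
bool debug_type_enabled(const char *setting, const char *type)
{
    if (list_contains(setting, type))
        return true; /* an explicit request always wins */
    return list_contains(setting, "All") && !is_excluded_by_default(type);
}
```

With this rule, `All` prints everything except `Verbose`, and `All,Verbose` prints everything.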
[SDAG] Implement missing legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` (#180290)
This lowers the splitting as:
```
any_active(hi_mask)
? (find_last_active(hi_mask) + lo_mask.getVectorElementCount())
: find_last_active(lo_mask)
```
It also trivially lowers `<1 x i1>` scalarization to returning zero, which
is a natural result of the splitting (and of the lack of a sentinel
"none-active" result value).
The lowerings can likely be improved; this patch is for completeness.
Should fix:
https://github.com/llvm/llvm-project/pull/178862#issuecomment-3862310334
Fixes #180212
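The split formula can be checked against a direct scalar definition in C. The helpers below are hypothetical models over `bool` arrays, not SelectionDAG code. Since the operation has no "none-active" sentinel, this model arbitrarily returns 0 when no lane is active, which is consistent with the `<1 x i1>` case returning zero. The split version consults the high half first and offsets its result by the low half's element count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Index of the last true lane; 0 when no lane is active (model choice). */
size_t find_last_active(const bool *mask, size_t n)
{
    for (size_t i = n; i > 0; --i)
        if (mask[i - 1])
            return i - 1;
    return 0;
}

bool any_active(const bool *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (mask[i])
            return true;
    return false;
}

/* Split legalization: lo = mask[0, n_lo), hi = mask[n_lo, n). */
size_t find_last_active_split(const bool *mask, size_t n, size_t n_lo)
{
    const bool *hi = mask + n_lo;
    size_t n_hi = n - n_lo;
    return any_active(hi, n_hi)
               ? find_last_active(hi, n_hi) + n_lo
               : find_last_active(mask, n_lo);
}
```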