Patch tryCanonicalizeStructToVector to handle split slice tails (#201434)
We choose a vector alloca over a struct alloca when all users of the
alloca are memory or lifetime intrinsics. But we only accounted for
slices that start in the corresponding partition. We must also check
that all split slice tails overlapping the partition are memory or
lifetime intrinsics.
I also updated the `PassRegistry.def` to include the new pass option
because we forgot to add that.
[mlir][arith] Fix APInt bitwidth mismatch crash in int-range-optimizations (#205110)
Fixes https://github.com/llvm/llvm-project/issues/204909
When an op's `areTypesCompatible()` hook accepts integers of different
widths across a region boundary, the range analysis can propagate a
constant range whose APInt bitwidth does not match the IR type of the
destination value.
This caused `IntegerAttr::get` to `assert` in
`maybeReplaceWithConstant`.
Fix by bailing out in `maybeReplaceWithConstant` when the bitwidths
mismatch, and adding the same check to the needsReplacing lambda in
matchAndRewrite.
The second guard is necessary to mirror the existing isIntOrIndex()
guard — without it the pattern claims success without changing the IR,
causing the greedy rewrite driver to loop.
[14 lines not shown]
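
The two guards described above can be sketched outside of MLIR as follows. This is a minimal toy model, not the actual patch: `ToyConstant`, `ToyValue`, and the `...Sketch` functions are hypothetical stand-ins for the APInt constant, the IR value, and the real `maybeReplaceWithConstant` / `matchAndRewrite` code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a constant range that collapsed to one APInt.
struct ToyConstant {
  unsigned BitWidth;
  long long Value;
};

// Hypothetical stand-in for an integer-typed IR value.
struct ToyValue {
  unsigned TypeBitWidth;
  std::optional<long long> Folded; // set once the value is replaced
};

// First guard: bail out instead of building a constant whose width
// does not match the destination type. Returns true only on a rewrite.
bool maybeReplaceWithConstantSketch(ToyValue &V, const ToyConstant &C) {
  if (C.BitWidth != V.TypeBitWidth)
    return false;
  V.Folded = C.Value;
  return true;
}

// Second guard: the "needs replacing" predicate applies the same width
// check, so the pattern never reports success without changing anything.
bool matchAndRewriteSketch(std::vector<ToyValue> &Results,
                           const std::vector<ToyConstant> &Ranges) {
  assert(Results.size() == Ranges.size());
  auto NeedsReplacing = [&](std::size_t I) {
    return !Results[I].Folded &&
           Ranges[I].BitWidth == Results[I].TypeBitWidth;
  };
  bool Changed = false;
  for (std::size_t I = 0; I < Results.size(); ++I)
    if (NeedsReplacing(I))
      Changed |= maybeReplaceWithConstantSketch(Results[I], Ranges[I]);
  return Changed;
}
```

In this model a mismatched width makes both functions return false, which is what lets a driver that re-runs patterns until nothing changes terminate.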
website: Turn on verbose asciidoctor build
This shows a lot of typos in anchors. Enable it globally so
people see the typos as they are making them and can fix them.
[RISCV][P-ext] Rename pwcvt/pncvt pseudoinstructions for RV64. (#205227)
We need to add a 'w' to the suffix to indicate that it operates on a word
rather than a register pair as on RV32. See https://github.com/riscv/riscv-p-spec/pull/303
[SandboxVectorizer] Implement topdown vectorizer
This patch introduces the `top-down-vec` pass to the Sandbox Vectorizer,
adding the ability to traverse use-def chains top-down to discover and
collect vectorization opportunities.
Key changes include:
* TopDownVec Pass: Implemented `TopDownVec` which recursively processes
value bundles top-down, creates vectorization actions (widening, packing,
shuffles), and emits the final vector IR.
* Shared Infrastructure (VecPassBase): Extracted common IR emission logic
out of `BottomUpVec` and into a new shared base class, `VecPassBase`.
Functions for generating vector instructions, handling diamond reuse,
creating shuffles/packs, and collecting dead instructions are now shared
between the bottom-up and top-down vectorizers to prevent code
duplication.
* Pass Registration: Exposed `top-down-vec` in `PassRegistry.def` and
`SandboxVectorizerPassBuilder`, allowing it to be invoked within pass
pipelines via `opt`.
[3 lines not shown]
Reland [Allocator] Keep bump pointer at a minimum alignment (#205240)
Reland #203718 (reverted in #205091) by doing the computation in the
integer domain to avoid UB (nullptr + non-zero offset).
Add a `MinAlign` template parameter (default 8, sizeof(size_t) on 64-bit
platforms) so that the common case `Alignment <= MinAlign` can skip
realigning `CurPtr`.
This is achieved by rounding each allocation's size up to MinAlign, so
the bump pointer stays MinAlign-aligned between allocations.
SpecificBumpPtrAllocator::DestroyAll() walks objects at a fixed
sizeof(T) stride and needs tight packing, so it uses MinAlign=1.
(alignof(T) would pack just as tightly and reuse the default
instantiation, but T may be incomplete here, e.g.
`SpecificBumpPtrAllocator<MCSectionELF>`.)
Its `Allocate` still skips the realign: the slab is max_align_t-aligned
[9 lines not shown]
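
The size-rounding idea can be illustrated with a toy bump allocator. This is a sketch under stated assumptions, not the LLVM implementation: `ToyBumpAllocator`, `alignUp`, and `position` are hypothetical names, the allocator works on a single caller-provided buffer, and all arithmetic is done on `uintptr_t` values to mirror the integer-domain approach mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round Value up to the next multiple of Align (a power of two).
constexpr std::uintptr_t alignUp(std::uintptr_t Value, std::uintptr_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical toy allocator over one fixed buffer.
template <std::size_t MinAlign = 8> class ToyBumpAllocator {
  static_assert(MinAlign != 0 && (MinAlign & (MinAlign - 1)) == 0,
                "MinAlign must be a power of two");
  std::uintptr_t Cur; // bump position, kept as an integer
  std::uintptr_t End; // one past the end of the buffer

public:
  ToyBumpAllocator(void *Buffer, std::size_t Size)
      : Cur(alignUp(reinterpret_cast<std::uintptr_t>(Buffer), MinAlign)),
        End(reinterpret_cast<std::uintptr_t>(Buffer) + Size) {}

  void *allocate(std::size_t Size, std::size_t Alignment) {
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    std::uintptr_t Start = Cur;
    // Common case: Cur is already MinAlign-aligned, so no realign needed.
    if (Alignment > MinAlign)
      Start = alignUp(Start, Alignment);
    // Pad the size so the next bump position stays MinAlign-aligned.
    std::uintptr_t Padded = alignUp(Size, MinAlign);
    if (Start > End || Padded > End - Start)
      return nullptr; // out of space in this toy model
    Cur = Start + Padded;
    return reinterpret_cast<void *>(Start);
  }

  std::uintptr_t position() const { return Cur; }
};
```

With `MinAlign = 1` the padding step is a no-op, so consecutive objects are packed back to back, which is the property a fixed-stride walk relies on.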
[OpenMPOpt][Attributor] Selectively seed deglobalization AAs (#198710)
This addresses a compile-time issue observed on a large generated C++
translation unit compiled with `-fopenmp`.
The source code is not OpenMP-heavy. It mainly consists of generated
function-registration wrappers, template instantiations, lambdas, and
small helper functions. However, because the TU is compiled with OpenMP
enabled, `OpenMPOptCGSCCPass` runs and drives Attributor on a module
with many functions.
`OpenMPOpt::registerAAsForFunction` currently eagerly creates the
deglobalization AAs for every function in OpenMP device modules:
* `AAHeapToShared`
* `AAHeapToStack`
Most generated wrapper/helper functions in the motivating workload do
not contain `__kmpc_alloc_shared`, removable allocations, or free-like
[25 lines not shown]
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I7cea0dd64d050eb3e2143841e7136355cbb3bc50
Assisted-by: Cursor
[AMDGPU] Fold constant offsets into named barrier addresses
Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
Change-Id: Ie05b8c8cd127604ff174c423a74340fd2de4e405
Assisted-by: Cursor
[AMDGPU] Pre-commit test for constant-offset named barrier signal_var
A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.
Change-Id: I59f0e6fe6a72b4c96c8efb926610f7f2d3833e38
Assisted-by: Cursor
Merge tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most notable change is the removal of the fscache backend: it has
been deprecated for almost two years, mainly because EROFS file-backed
mounts and fanotify pre-content hooks (together with erofs-utils) now
provide better functionality and a simpler codebase. In addition,
fscache has depended on netfslib for years, which is undesirable for
EROFS since it is a local filesystem. More details in [1].
In addition, sparse support has been added to the pcluster layout,
which is helpful for large sparse AI datasets, and map requests for
chunk-based inodes have been optimized to be more efficient as well.
There are also the usual fixes and cleanups.
Summary:
- Report more consecutive chunks of the same type for
each iomap request
[21 lines not shown]
don't increment scatterlist length twice
This occurs because sg_dma_len() returns the length member of struct
scatterlist here, whereas on x86 Linux it returns a separate dma_length
member of the struct.
Problem reported by Ryan Fahy in FreeBSD drm-kmod PR 468.
Avoids a 'Data modified on freelist' panic on boot when using discrete
Intel cards (DG2). DG2 has other issues, so remains disabled for now.
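
The aliasing problem can be reproduced in a small standalone model. This is a sketch, assuming a caller that grows both the length and the DMA length of an entry; `SgSeparate`, `SgShared`, `sgDmaLen`, and `growBoth` are hypothetical stand-ins and not the real kernel or compat-layer definitions.

```cpp
#include <cassert>

// Layout where the DMA length has its own member.
struct SgSeparate {
  unsigned length = 0;
  unsigned dma_length = 0;
};

// Layout where the DMA length accessor aliases the length member.
struct SgShared {
  unsigned length = 0;
};

inline unsigned &sgDmaLen(SgSeparate &Sg) { return Sg.dma_length; }
inline unsigned &sgDmaLen(SgShared &Sg) { return Sg.length; }

// Code written for the separate-member layout: updates both fields.
template <typename Sg> void growBoth(Sg &S, unsigned Len) {
  S.length += Len;
  sgDmaLen(S) += Len; // a second increment of length when aliased
}

// Usage: the same call yields different lengths on the two layouts.
inline void demo() {
  SgSeparate A;
  growBoth(A, 4096);
  assert(A.length == 4096 && A.dma_length == 4096);

  SgShared B;
  growBoth(B, 4096);
  assert(B.length == 8192); // counted twice
}
```

In the aliased layout the entry ends up describing twice as many bytes as were actually added, so the fix is to perform only one of the two updates there.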
[RISCV][P-ext] packed exchanged add/sub codegen (#203473)
Wire up the already-defined exchanged add/sub instructions
pas/psa/psas/pssa/paas/pasa with llvm.riscv.* intrinsics and isel
patterns.