AMDGPU: Don't limit VGPR usage based on occupancy in dVGPR mode (#185981)
The maximum VGPR usage of a shader is limited based on the target occupancy,
ensuring that the targeted number of waves actually fit onto a CU/WGP.
However, in dynamic VGPR mode, we should not do that, because VGPRs are
allocated dynamically at runtime, and there are no static constraints based
on occupancy.
Fix that in this patch.
Also fix up the getMinNumVGPRs helper to behave consistently by always
returning zero in dVGPR mode.
This also fixes a problem where AMDGPUAsmPrinter bumps the VGPR usage to
at least the result of getMinNumVGPRs, per my understanding in order to
avoid an occupancy
[2 lines not shown]
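A minimal sketch of the behaviour described above, not the actual AMDGPU backend code. The function name, parameters, and the numbers in the usage note are all hypothetical; the only point is that the occupancy-derived cap is skipped when VGPRs are allocated dynamically.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not an LLVM API): returns the VGPR budget for one wave.
//   TotalVGPRs       - assumed size of the register file shared by all waves
//   AddressableVGPRs - assumed architectural per-wave maximum
//   Granule          - assumed allocation granularity
//   TargetWaves      - the occupancy the shader is targeting
//   DynamicVGPR      - whether dVGPR mode is enabled
unsigned maxVGPRsSketch(unsigned TotalVGPRs, unsigned AddressableVGPRs,
                        unsigned Granule, unsigned TargetWaves,
                        bool DynamicVGPR) {
  assert(TargetWaves > 0 && Granule > 0 && "invalid configuration");

  // In dVGPR mode registers are handed out at runtime, so the target
  // occupancy imposes no static limit.
  if (DynamicVGPR)
    return AddressableVGPRs;

  // Otherwise split the register file evenly between the targeted waves
  // and round down to the allocation granule.
  unsigned PerWave = TotalVGPRs / TargetWaves;
  PerWave -= PerWave % Granule;
  return std::min(PerWave, AddressableVGPRs);
}
```

With the made-up values `maxVGPRsSketch(512, 256, 8, 4, false)` the static budget is 128, while the same call with `DynamicVGPR` set to `true` yields the full 256.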
[AArch64] Add partial reduce patterns for new sve dot variants (#184649)
This patch enables generation of the new dot instructions added in the
2025 Arm extension from partial reduce nodes.
[IR][NFC] Hot-cold splitting in PatternMatch (#186777)
ConstantAggregates are rare, therefore split that check into a separate
function so that the fast path can be inlined.
Likewise for vectors, which occur much less frequently than scalar
values.
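The general shape of such a hot-cold split can be sketched as follows. This is an illustration only, not the PatternMatch code: `Node`, `isZero`, and `allElementsZeroSlow` are hypothetical names, and the `noinline` attribute is an assumption about how one might keep the rare path out of line.

```cpp
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for a constant that is either a scalar or an
// aggregate of scalars.
struct Node {
  bool IsAggregate;
  int Scalar;
  std::vector<int> Elems;
};

// Cold path: aggregates are assumed to be rare, so this loop lives in a
// separate out-of-line function and does not bloat the callers.
SKETCH_NOINLINE bool allElementsZeroSlow(const Node &N) {
  if (N.Elems.empty())
    return false;
  for (int E : N.Elems)
    if (E != 0)
      return false;
  return true;
}

// Hot path: a single cheap check that the compiler can inline everywhere.
inline bool isZero(const Node &N) {
  if (!N.IsAggregate)
    return N.Scalar == 0;
  return allElementsZeroSlow(N);
}
```

The design choice is that callers only pay for the scalar comparison and one branch; the loop is reached through a call only in the uncommon aggregate case.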
add: resolve shlibs and provides via symlink directory layout
When pkg add installs a package, it now resolves shlibs_required and
abstract requires by looking up provider packages in a pre-built symlink
directory alongside the package archive:
  PACKAGEDIR/
    All/pkgname-1.0.pkg
    shlibs/libfoo.so.1/provider.pkg -> ../../All/provider-1.0.pkg
    provides/vi-editor/vim.pkg -> ../../All/vim-9.0.pkg
Provider selection supports:
- Single provider: used directly
- Multiple providers: sorted alphabetically, the first wins
System shlibs and already-installed providers are skipped.
Resolution is disabled for stdin and upgrade modes.
Symlink directory creation is a poudriere/ports concern.
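The provider selection rule can be sketched as below. This is not pkg's implementation: `pickProvider` is a hypothetical helper, it assumes the candidates are the symlink names found under one `shlibs/<name>/` or `provides/<name>/` directory, and it assumes "alphabetically" means plain byte-wise string comparison.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: given the symlink names found in one provider
// directory, return the one to install, or nothing if there is none.
std::optional<std::string>
pickProvider(const std::vector<std::string> &Links) {
  if (Links.empty())
    return std::nullopt;
  // A single entry is returned directly; with several entries the
  // smallest one in byte-wise order wins (assumed sort order).
  return *std::min_element(Links.begin(), Links.end());
}
```

For example, a directory containing `vim.pkg` and `nvi.pkg` would resolve to `nvi.pkg` under these assumptions.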
[NFC][analyzer] Refactor ExprEngine::processCallExit (#186182)
This commit converts `ExprEngine::processCallExit` to the new paradigm
introduced in 1c424bfb03d6dd4b994a0d549e1f3e23852f1e16 where the current
`LocationContext` and `Block` are populated near the beginning of the
`dispatchWorkItem` call (= elementary analysis step) and remains
available during the whole step.
Unfortunately the first half of the `CallExit` procedure (`removeDead`)
happens within the callee context, while the second half (`PostCall` and
similar callbacks) happens in the caller context -- so I need to change
the current `LocationContext` and `Block` at the middle of this big
method.
This means that I need to discard my invariant that
`setCurrLocationContextAndBlock` is only called once per
`dispatchWorkItem`; but I think this exceptional case (first half in
callee, second half in caller) is still clear enough.
In addition to this main goal, I perform many small changes to clarify
and modernize the code of this old method.
[ADT] Add `Repeated<T>` for memory-efficient repeated-value ranges (#186721)
Introduce a lightweight range representing N copies of the same value
without materializing a dynamic array. The range owns this value.
I plan to use it with MLIR APIs that often end up requiring N copies of
the same thing. Currently, we use `SmallVector<T>(N, Val)` for these,
which is wasteful.
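To illustrate the idea, here is a minimal sketch of such a range. It is not the `Repeated<T>` added to ADT; `RepeatedSketch` and its interface are assumptions made for this example. The range stores one value and a count, and its iterators all refer to that single stored value.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

// Hypothetical sketch of a range of N copies of one owned value.
template <typename T> class RepeatedSketch {
  T Value;           // the single owned value
  std::size_t Count; // how many times it is logically repeated

public:
  class iterator {
    const T *Ptr = nullptr;
    std::size_t Index = 0;

  public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = const T *;
    using reference = const T &;

    iterator() = default;
    iterator(const T *Ptr, std::size_t Index) : Ptr(Ptr), Index(Index) {}

    // Every position dereferences to the same stored value.
    reference operator*() const { return *Ptr; }
    iterator &operator++() {
      ++Index;
      return *this;
    }
    iterator operator++(int) {
      iterator Tmp = *this;
      ++Index;
      return Tmp;
    }
    bool operator==(const iterator &RHS) const { return Index == RHS.Index; }
    bool operator!=(const iterator &RHS) const { return Index != RHS.Index; }
  };

  RepeatedSketch(std::size_t Count, T Value)
      : Value(std::move(Value)), Count(Count) {}

  iterator begin() const { return iterator(&Value, 0); }
  iterator end() const { return iterator(&Value, Count); }
  std::size_t size() const { return Count; }
  bool empty() const { return Count == 0; }

  const T &operator[](std::size_t I) const {
    assert(I < Count && "index out of range");
    return Value;
  }
};
```

Compared with `SmallVector<T>(N, Val)`, the storage here is one `T` plus a counter regardless of N, at the cost of elements being read-only and iterators being tied to the lifetime of the range object.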
---------
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[LLVM][CodeGen][SVE] insert_subvector(undef, splat(C), 0) -> splat(C). (#186090)
When converting a fixed-length constant splat to a scalable vector we can
instead regenerate the splat using the target type.
Shift NFS protocol xattr and acl tests to sharing_protocols
This commit cleans up the tests slightly and moves them to
the correct portion of our testing framework.
[NFC][analyzer] Eliminate NodeBuilder::getContext() (#186201)
This is a step towards the removal of the type `NodeBuilderContext`.
The few remaining locations that used `NodeBuilder::getContext()` were
changed to use the methods `getCurrBlock()` and `getNumVisitedCurrent()`
of `ExprEngine`.
The new code is equivalent to the old one because the `NodeBuilder`s
were constructed with `ExprEngine::currBldrCtx` as their context, which
is currently the "backend" behind `getCurrBlock()` and
`getNumVisitedCurrent()` -- but these methods will remain valid after
the removal of `NodeBuilderContext` and `currBldrCtx`.
[libclc] Add generic clc_mem_fence instruction (#185889)
Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
lang/gcc6-aux: unbreak with isl-27, cleanup
ISL27 has moved its isl_val_* definitions to a dedicated header, isl/val.h
Chase it to unbreak the build
Make graphite loop optimization optional; off by default to mimic previous behaviour
Clean up the make environment
PR: 292414
sysutils/rubygem-tmuxinator: fix conflict with shells/fish
They conflict because they both try to install the same completion file.
Fix the conflict by removing the completion file for tmuxinator, and
leaving it for fish.
PR: 293846
MFH: 2026Q1
Approved by: mfechner (ruby)
[C2y] Update the C Status Page from the recent meetings (#186487)
The Feb and Mar 2026 virtual meetings have now concluded; these are the
adopted papers which could potentially impact the compiler.
[IR] Drop BasicBlockEdge::isSingleEdge (#186767)
This was only called on CondBr instructions, where it is always faster
to access the successors directly than to use successors().
Multi-edges don't dominate anything, so this rare case is often already
handled by dominates().
There is also a very small (hardly measurable) performance
improvement here (it did show up in profiles at 0.03% or so).