[CHR] Skip regions containing convergent calls (#180882)
CHR (Control Height Reduction) merges multiple biased branches into a
single speculative check, cloning the region into hot/cold paths. On
GPU targets, the merged branch may be divergent (evaluated per-thread),
splitting the wavefront: some threads take the hot path, others the
cold path.
A convergent call like ds_bpermute (a cross-lane operation on AMDGPU)
requires a specific set of threads to be active — when thread X reads
from thread Y, thread Y must be active and participating in the same
call. After CHR cloning, thread Y may have gone to the cold path while
thread X is on the hot path, so the hot-path ds_bpermute reads a stale
register value from thread Y instead of the intended value.
This caused a miscompilation in rocPRIM's lookback scan: CHR duplicated
a region containing ds_bpermute, and the hot-path copy executed with a
different set of active threads, reading incorrect cross-lane data and
causing a memory access fault.
[2 lines not shown]
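The failure mode described above can be sketched with a toy model. This is only an illustration under stated assumptions: `bpermute`, `run`, the lane count, and the `-1` "stale" marker are hypothetical names and values, and the model does not claim to reproduce AMDGPU hardware semantics for inactive lanes.

```python
def bpermute(regs, src_index, active):
    """Toy cross-lane read: each active lane i reads regs[src_index[i]]."""
    return {i: regs[src_index[i]] for i in active}


def run(active_sets, n=4):
    """Run the region once per group of simultaneously active lanes."""
    regs = [-1] * n  # -1 marks a stale value that was never produced
    src_index = [(i + 1) % n for i in range(n)]  # lane i reads lane i+1
    out = {}
    for active in active_sets:
        for i in active:
            regs[i] = 10 * i  # each active lane produces its value
        out.update(bpermute(regs, src_index, active))
    return out


# Whole wavefront executes the region together: every read sees fresh data.
converged = run([[0, 1, 2, 3]])
# Region cloned by CHR: lanes 0-1 take the hot copy, lanes 2-3 the cold copy.
diverged = run([[0, 1], [2, 3]])

print(converged)
print(diverged)  # lane 1 read lane 2 before lane 2 produced its value
```

In the diverged run, lane 1 observes the stale marker because its source lane was not participating in the same call, which is the situation the convergent attribute is meant to rule out.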
[MLIR][Python] Support type definitions in Python-defined dialects (#182805)
This PR adds basic support for type definitions in Python-defined
dialects, including:
- IRDL codegen for type definitions
- Type builders like `MyType.get(..)` and type parameter accessors (e.g.
`my_type.param1`)
- Use Python-defined types in Python-defined operations
```python
class TestType(Dialect, name="ext_type"):
    pass

class Array(TestType.Type, name="array"):
    elem_type: IntegerType[32] | IntegerType[64]
    length: IntegerAttr

class MakeArrayOp(TestType.Operation, name="make_array"):
    arr: Result[Array]
[3 lines not shown]
[NFC][WebAssembly] Expanding load-ext testcases for the MVP CPU target (#182864)
Some features tested in load-ext require sign-ext.
To cover targets without it, add tests targeting the MVP CPU.
[ELF] Adjust allowed dynamic relocation types for x86-64 (#182905)
First, disallow R_X86_64_PC64 - generally only absolute relocations are
allowed in getDynRel. glibc and musl don't support R_X86_64_PC64 as
dynamic relocations.
Second, support R_X86_64_32 as dynamic relocation for the ILP32 ABI
(x32). GNU ld's behavior looks like:
- R_X86_64_32 => R_X86_64_RELATIVE
- R_X86_64_64 with addend 0 => R_X86_64_RELATIVE
- R_X86_64_64 with non-zero addend => R_X86_64_RELATIVE64 (unsupported
by musl; compilers do not generate such constructs to the best of my
knowledge)
For now we require R_X86_64_64 to be resolved at link-time for x32.
Fix #140465
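The resulting policy can be summarized as a small decision function. This is a sketch of the rules as stated in this message, not lld's actual getDynRel code; `allowed_dyn_rel` is a hypothetical name, and the LP64 branches (R_X86_64_64 accepted, R_X86_64_32 rejected) are assumptions based on the "only absolute relocations" remark.

```python
def allowed_dyn_rel(rel_type: str, is_x32: bool) -> bool:
    """Return True if rel_type may be emitted as a dynamic relocation."""
    if rel_type == "R_X86_64_PC64":
        # PC-relative: not supported by glibc or musl as a dynamic relocation.
        return False
    if rel_type == "R_X86_64_32":
        # Word-sized absolute relocation only under the ILP32 ABI (x32).
        return is_x32
    if rel_type == "R_X86_64_64":
        # Assumption: accepted on LP64; on x32 it must be resolved at link time.
        return not is_x32
    return False


for abi in (False, True):
    for rel in ("R_X86_64_PC64", "R_X86_64_32", "R_X86_64_64"):
        print("x32" if abi else "lp64", rel, allowed_dyn_rel(rel, abi))
```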
[libc][math] Refactor bf16mul family to header-only (#182018)
Refactors the bf16mul math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182017
Target Functions:
- bf16mul
- bf16mulf
- bf16mulf128
- bf16mull
[LLVM] Metric added - largest number of basic blocks in a single func… (#182970)
This metric reports the largest number of basic blocks found in any
single function.
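As a rough illustration of what the metric computes, here is a sketch over a toy module representation. The function name and the dict-of-lists layout are hypothetical and unrelated to the actual LLVM implementation.

```python
def largest_basic_block_count(functions):
    """functions: mapping of function name -> list of basic block labels."""
    return max((len(blocks) for blocks in functions.values()), default=0)


module = {
    "main": ["entry", "loop", "exit"],
    "helper": ["entry"],
    "decl": [],  # a declaration has no blocks
}
print(largest_basic_block_count(module))
```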
Make acpidmar useful for general IOMMU use on amd64.
1. Remove panics in favor of error returns
2. Make unmap ordering clear (PTEs > invalidate IOTLB > free IOVA)
3. Add locking so concurrent mappings cannot race installing intermediate
page table levels (when marked MPSAFE)
For AMD-Vi:
1. Add cache flush for page tables and IVHD command/event data
structures (no-op on coherent IOMMUs)
2. Add per-page/range IOTLB invalidation
3. Fix device/interrupt-table invalidations to be keyed by requester device ID
4. Move batch completion variable from stack to softc
For Intel VT-d:
1. Finish queued invalidation (QI) with batching
2. Add page-selective invalidation (PSI) with address-mask coalescing
[4 lines not shown]
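The general points above (error returns instead of panics, the unmap ordering, and locking around mapping) can be sketched as follows. This is a toy model only: `ToyIommuDomain` and its fields are hypothetical, and the model assumes a translation may be cached by the IOTLB as soon as it is mapped.

```python
import errno
import threading


class ToyIommuDomain:
    def __init__(self):
        self.lock = threading.Lock()  # serializes page table updates
        self.ptes = {}                # iova page -> physical page
        self.iotlb = set()            # iova pages the hardware may have cached
        self.free_iova = set()        # iova pages available for reuse
        self.log = []                 # order of unmap steps, for inspection

    def map(self, iova, pa):
        with self.lock:
            if iova in self.ptes:
                return errno.EBUSY    # report the error rather than panic
            self.ptes[iova] = pa
            self.iotlb.add(iova)      # assumption: cached once mapped
            self.free_iova.discard(iova)
            return 0

    def unmap(self, iova):
        with self.lock:
            if iova not in self.ptes:
                return errno.ENOENT
            del self.ptes[iova]       # 1. clear the PTE
            self.log.append("clear-pte")
            self.iotlb.discard(iova)  # 2. invalidate the IOTLB entry
            self.log.append("invalidate-iotlb")
            self.free_iova.add(iova)  # 3. only now release the IOVA
            self.log.append("free-iova")
            return 0


dom = ToyIommuDomain()
print(dom.map(0x1000, 0x9000), dom.unmap(0x1000), dom.log)
```

Freeing the IOVA last means the address cannot be handed out again while a stale translation for it might still be cached.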
Use fmprintf instead of logit for challenge-response name and info to
preserve UTF-8 characters where appropriate. Prompted by github PR#452,
with & ok djm@.
vfs_mount.c: Don't call VFS_MOUNT() if only exports are being updated
PR#293198 reports a hang within ZFS when exports
are being updated concurrently with a VOP_SETEXTATTR().
The hang appears to be caused by mishandling of the
z_teardown_lock, but fixing handling of this lock appears
to be a major effort. Since the hang occurs when
VFS_MOUNT() acquires a write/exclusive z_teardown_lock,
which rarely occurs, except when exports are being updated,
this patch avoids the VFS_MOUNT() call for this case.
Avoiding a VFS_MOUNT() call fixes the hang for the case
reported by PR#293198 and is also an optimization.
As such, this patch avoids the VFS_MOUNT() call when only exports
are being updated, similar to what was already being done
within vnet prisons.
PR: 293198
(cherry picked from commit 935cf3284f520c90a63baaadb762caaa30084f5c)
[NewPM][X86] Port AsmPrinter to NewPM
This patch makes AsmPrinter work with the NewPM. We essentially create
three new passes that wrap different parts of AsmPrinter so that we can
separate out doInitialization/doFinalization without needing to
materialize all MachineFunctions at the same time. This has a few
drawbacks for now:
1. We do not transfer any state between the three new AsmPrinter passes.
This means that debuginfo/CFI currently does not work. This will be
fixed in future passes by moving this state to MachineModuleInfo.
2. We probably incur some overhead by needing to set up analysis
callbacks for every MF rather than just per module. This should not
be large, and can be optimized in the future on top of this if
needed.
3. This solution is not really clean. However, a lot of cleanup is going
to be difficult to do while supporting two pass managers. Once we
remove LegacyPM support, we can make the code much cleaner and better
enforce invariants like a lack of state between
[5 lines not shown]
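The begin / per-function / end split can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the actual pass implementation: all names are hypothetical, and the only thing shared between the three passes here is the output list.

```python
def begin_pass(module, out):
    out.append(f"begin {module}")  # stands in for doInitialization


def function_pass(mf, out):
    out.append(f"emit {mf}")       # emit one machine function


def end_pass(module, out):
    out.append(f"end {module}")    # stands in for doFinalization


def machine_functions(names, live):
    """Yield functions one at a time, tracking which are materialized."""
    for name in names:
        live.append(name)
        yield name
        live.remove(name)          # released before the next one is built


def run_pipeline(module, names):
    out, live, peak = [], [], 0
    begin_pass(module, out)
    for mf in machine_functions(names, live):
        peak = max(peak, len(live))
        function_pass(mf, out)
    end_pass(module, out)
    return out, peak


out, peak = run_pipeline("m", ["f", "g"])
print(out, peak)
```

Because module-level begin and end work lives in its own passes, the per-function step never needs more than one function alive at a time in this model.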
[NFCi][NewPM][x86] Use callbacks to get analyses in AsmPrinter
This allows for overriding these callbacks when using the NewPM, which
has different methods for obtaining analysis results.
Reviewers: RKSimon, arsenm, phoebewang, mingmingl-llvm, aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182796
[CodeGen][NewPM] Adjust pipeline for AsmPrinter
AsmPrinter needs to be split into three passes (begin, per MF, end) to
avoid the need to materialize all machine functions at the same time.
Update the CodeGenPassBuilder hooks for this.
Reviewers: aeubanks, paperchalice, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/182795
[CodeGen][NewPM] Plumb MCContext through buildCodeGenPipeline
Otherwise we cannot create an MCStreamer without getting MMI, which we
cannot do until we have started running AsmPrinter without also plumbing
MMI through CodeGenPassBuilder.
Reviewers: arsenm, paperchalice, aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182794
sysutils/amdmsrtweaker: fix build on recent FreeBSD
bmake has recently started to support $^ in addition to $>, causing
both to expand and leading to a build error like
c++ -O2 -pipe -fstack-protector-strong -fno-strict-aliasing -Wall \
-Werror -pedantic -o amdmsrt Info.o AmdMsrTweaker.o WinRing0.o \
Worker.oInfo.o AmdMsrTweaker.o WinRing0.o Worker.o
c++: error: no such file or directory: 'Worker.oInfo.o'
Fix the error by avoiding both $^ and $>.
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
(cherry picked from commit 87999cd890995b259fa61e70dba80e8a8d153964)
biology/ncbi-cxx-toolkit: only for aarch64, amd64
This port uses SIMD intrinsics to compute CRC checksums.
It's probably easy to add a generic code path if desired.
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
Sponsored by: Raptor Computing Systems, LLC
(cherry picked from commit 2ea396c568c8df2627010da3d0f55ba9a98e7a85)
biology/infernal: not for ppc64le
Project can use VMX, but only on big endian platforms.
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
Sponsored by: Raptor Computing Systems, LLC
(cherry picked from commit 8e33257234d52d48e390cce3d55162aebaa6c59d)
biology/ncbi-blast+: only for aarch64, amd64
This port uses SIMD intrinsics to compute CRC checksums.
It's probably easy to add a generic code path if desired.
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
Sponsored by: Raptor Computing Systems, LLC
(cherry picked from commit c306479f2b4a4dd7c6d7b7c716574a25a0748986)
japanese/kdrill: fix build
This adds a missing parenthesis to $(LOCALBASE).
It is unclear why this hasn't been noticed earlier.
While we are at it, define LICENSE.
Fixes: 2546bd0290761071e3ad392427d7c2ba4e5a396b
Approved by: portmgr (build fix blanket)
MFH: 2026Q1
Sponsored by: Raptor Computing Systems, LLC
(cherry picked from commit 8552be0c42f43fbc0a2db02c7982e8355c6b52a4)
[NFCi][AsmPrinter] Refactor getting analyses to callbacks
As part of making AsmPrinter work with the new pass manager, we need to
be able to override how we get analyses. This patch does that by
refactoring getting all analyses/other related functionality to
callbacks that are set by default but can be overridden later (for example, by a
NewPM wrapper pass).
Reviewers: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/182793
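The callback pattern described here can be sketched roughly as below. `ToyPrinter`, `get_analysis`, and `wrap_for_new_pm` are hypothetical names for illustration, and the string results merely stand in for real analysis objects.

```python
class ToyPrinter:
    def __init__(self):
        # Default callback, standing in for the legacy way of fetching results.
        self.get_analysis = lambda name: f"legacy:{name}"

    def emit(self, name):
        # The printer only ever goes through the callback.
        return self.get_analysis(name)


def wrap_for_new_pm(printer, analysis_manager):
    """Override the callback so results come from a different source."""
    printer.get_analysis = lambda name: analysis_manager[name]
    return printer


p = ToyPrinter()
print(p.emit("MachineLoopInfo"))
wrap_for_new_pm(p, {"MachineLoopInfo": "newpm-result"})
print(p.emit("MachineLoopInfo"))
```

The printer body stays the same in both cases; only the callback set by the wrapper differs.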