LLVM/project 3a56470libc/shared rpc.h

[libc] Increase the maximum RPC port size for future hardware (#188756)

Summary:
We store the locks in local device memory for performance and
simplicity. The number here needs to correspond to the maximum occupancy
so that we never have a situation where a GPU thread is blocking another
GPU thread.

The number now is sufficient for most hardware, but modern compute chips
like the MI300x are already pushing ~12000 resident waves. This has ABI
implications, so I'd like to bump it up sooner rather than later. The
ABI change is within what OpenMP expects (LLVM major versions), and a
size mismatch will be caught statically, so there's no risk of silent
corruption.
DeltaFile
+3-1libc/shared/rpc.h
+3-11 files

LLVM/project ffd6a13compiler-rt/include/profile InstrProfData.inc, compiler-rt/lib/profile InstrProfilingPlatformOther.c InstrProfilingPlatformGPU.c

[compiler-rt] Rework profile data handling for GPU targets (#187136)

Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient as it requires a lot of small
high-latency data transfers rather than a few large ones. Additionally,
we force every single profiling symbol to have protected visibility.
This means potentially hundreds of unnecessary symbols in the symbol
table.

This PR changes the interface to move towards the start / stop section
handling. AMDGPU supports this natively as an ELF target, so we need
few changes. Instead of overriding visibility, we use a single table
to define the bounds that we can obtain with one contiguous load.

Using a table interface should also work for the in-progress HIP
implementation for this, as it wraps the start / stop sections into
standard void pointers which will be inside of an already mapped region
of memory, so they should be accessible from the HIP API.

    [13 lines not shown]
DeltaFile
+78-95offload/plugins-nextgen/common/src/GlobalHandler.cpp
+35-12compiler-rt/lib/profile/InstrProfilingPlatformOther.c
+44-0compiler-rt/lib/profile/InstrProfilingPlatformGPU.c
+24-15llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+36-0compiler-rt/include/profile/InstrProfData.inc
+36-0llvm/include/llvm/ProfileData/InstrProfData.inc
+253-1225 files not shown
+274-13911 files

FreeNAS/freenas 6b7f8e6

Empty commit to create PR on github.

You should reset it
DeltaFile
+0-00 files

FreeNAS/freenas 40b01b8src/middlewared/middlewared/etc_files scst.conf.mako, src/middlewared/middlewared/plugins/failover_ event.py

NAS-140407 / 25.10.2.2 / Fix FC/iSCSI path availability during ALUA failover (#18568)

Fixes FC/iSCSI path availability during HA failover when ALUA is
enabled.

Four independent problems caused paths to drop or I/O to fail during the
`dev_disk` -> `dev_vdisk` LUN swap window:

- **FC path death**: HA iSCSI session logout cascaded through SCST and
removed LUN mappings before the LUN swap, destroying the ALUA tgt_dev
filter and causing LUN NOT SUPPORTED on FC. Fixed by deferring
`reset_active` to after `become_active` has replaced all LUN mappings.

- **90-second global drain**: `activate_extents` wrote `active=1` via
sysfs, triggering `scst_suspend_activity(90s)`. Fixed by removing the
job entirely - `bind_alua_state=1` already handles dev_vdisk file-open
drain-free via `blockio_on_alua_state_change_finish`.

- **LUN replace blocks on in-flight commands**: `scst_acg_repl_lun`

    [10 lines not shown]
DeltaFile
+13-55src/middlewared/middlewared/plugins/failover_/event.py
+1-55src/middlewared/middlewared/plugins/iscsi_/alua.py
+26-0src/middlewared/middlewared/plugins/iscsi_/scst.py
+12-0src/middlewared/middlewared/etc_files/scst.conf.mako
+52-1104 files

LLVM/project 76f8806llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU ctls.ll

[AMDGPU] Remove AMDGPUISD::FFBH_I32 and add ISD::CTLS lowering (#187694)

This is a continuation of the previously reverted
https://github.com/llvm/llvm-project/pull/178420

The patch removes custom AMDGPUISD::FFBH_I32 SelectionDAG node. Call
sites that need raw hardware semantics (LowerINT_TO_FP32, legalizeITOFP)
now use amdgcn_sffbh intrinsic directly. ISD::CTLS is added as a Custom
operation for i32.

The previous attempt had an issue:
The hardware v_ffbh_i32 instruction (v_cls_i32 on newer targets) has
different semantics than ISD::CTLS:
- sffbh returns [1, BitWidth-1] for normal values, -1 for
all-same-bits
- CTLS returns [0, BitWidth-2] for normal values, BitWidth-1 for
all-same-bits

Now LowerCTLS handles this by: sffbh -> umin(sffbh, BitWidth) -> sub 1.
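
As a rough illustration of why the umin-then-subtract sequence bridges the two result ranges, the following standalone sketch models both operations in plain C++ for 32-bit values. It does not use any LLVM API. The names `model_sffbh`, `model_ctls`, and `lowered_ctls` are hypothetical, and the sffbh model simply follows the result ranges quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint32_t BitWidth = 32;

// Hypothetical model of the sffbh result as described above: the index,
// counted from the MSB, of the first bit that differs from the sign bit,
// or -1 (all ones when viewed as unsigned) if every bit equals the sign bit.
uint32_t model_sffbh(int32_t value) {
  const uint32_t bits = static_cast<uint32_t>(value);
  const uint32_t sign = bits >> (BitWidth - 1);
  for (uint32_t i = 1; i < BitWidth; ++i)
    if (((bits >> (BitWidth - 1 - i)) & 1u) != sign)
      return i;
  return UINT32_MAX;
}

// Hypothetical reference for CTLS: how many bits directly below the sign
// bit are copies of the sign bit.
uint32_t model_ctls(int32_t value) {
  const uint32_t bits = static_cast<uint32_t>(value);
  const uint32_t sign = bits >> (BitWidth - 1);
  uint32_t count = 0;
  for (uint32_t i = 1; i < BitWidth; ++i) {
    if (((bits >> (BitWidth - 1 - i)) & 1u) != sign)
      break;
    ++count;
  }
  return count;
}

// The sequence from the commit message: umin(sffbh, BitWidth), then sub 1.
// The unsigned min maps the -1 sentinel to BitWidth, so the result is
// BitWidth-1 for all-same-bits inputs.
uint32_t lowered_ctls(int32_t value) {
  return std::min<uint32_t>(model_sffbh(value), BitWidth) - 1;
}

// Spot-check a few inputs, including both all-same-bits cases.
void spot_check() {
  for (int32_t v : {0, -1, 1, -2, 0x40000000, INT32_MIN, 12345, -12345})
    assert(lowered_ctls(v) == model_ctls(v));
}
```

The unsigned comparison is what makes this work: the -1 sentinel is the largest unsigned value, so the min clamps it to BitWidth while leaving the normal range [1, BitWidth-1] untouched.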

    [6 lines not shown]
DeltaFile
+624-0llvm/test/CodeGen/AMDGPU/ctls.ll
+159-0llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ctls.mir
+41-2llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+25-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+18-1llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+0-4llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
+867-75 files not shown
+873-911 files

LLVM/project 51593c1clang/lib/CodeGen CGObjCMac.cpp

format
DeltaFile
+3-2clang/lib/CodeGen/CGObjCMac.cpp
+3-21 files

LLVM/project 249a3d1llvm/utils/gn/secondary/llvm/lib/Target/NVPTX BUILD.gn

[gn build] Port 28318d5db86f
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/lib/Target/NVPTX/BUILD.gn
+1-01 files

LLVM/project a5cd44fllvm/utils/gn/secondary/compiler-rt/lib/sanitizer_common BUILD.gn

[gn] port 25904ac91554
DeltaFile
+1-0llvm/utils/gn/secondary/compiler-rt/lib/sanitizer_common/BUILD.gn
+1-01 files

LLVM/project a111106clang/lib/CodeGen CodeGenModule.h CGObjCMac.cpp

isPreconditionThunkEnabled -> isObjCDirectPreconditionThunkEnabled
DeltaFile
+3-3clang/lib/CodeGen/CodeGenModule.h
+2-2clang/lib/CodeGen/CGObjCMac.cpp
+1-1clang/lib/CodeGen/CGObjC.cpp
+6-63 files

LLVM/project f08f7ecllvm/utils/gn/secondary/compiler-rt/lib/builtins BUILD.gn

[gn] "port" 80831832e03f
DeltaFile
+3-0llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn
+3-01 files

LLVM/project bbd69eellvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/X86 srem-seteq-vec-nonsplat.ll urem-seteq-vec-nonsplat.ll

[TargetLowering] In prepareUREMEqFold/prepareSREMEqFold, fix K=-1 for i64 elements. (#188600)

K is an unsigned, so it will be zero-extended to uint64_t for
the APInt constructor. If the ShSVT has more than 32 bits, we won't
create an all-ones ConstantSDNode.

To fix this, explicitly push an all ones constant to KAmts. This
also fixes an APInt ImplicitTrunc.

This allows turnVectorIntoSplatVector to work for this case.
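
The zero-extension pitfall can be reproduced without LLVM. This is a minimal sketch in plain C++, assuming a 32-bit `unsigned`. The helper names are hypothetical and nothing here is the actual TargetLowering code.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(unsigned) == 4, "sketch assumes a 32-bit unsigned");

// All-ones mask for a given element width (1..64 bits).
constexpr uint64_t all_ones(unsigned bits) {
  return bits >= 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1;
}

// Mimics passing an `unsigned` holding -1 where a uint64_t is expected:
// the conversion zero-extends, so the upper 32 bits stay clear.
constexpr uint64_t widen(unsigned k) { return k; }

// True if the widened value is still all-ones at the given element width.
constexpr bool widened_is_all_ones(unsigned bits) {
  return (widen(static_cast<unsigned>(-1)) & all_ones(bits)) == all_ones(bits);
}

void demo() {
  assert(widened_is_all_ones(32));   // fine for 32-bit elements
  assert(!widened_is_all_ones(64));  // upper half is zero for i64 elements
  assert(widen(static_cast<unsigned>(-1)) == 0x00000000FFFFFFFFull);
}
```

Building the all-ones value directly at the target width, as the fix describes, sidesteps the conversion entirely.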
DeltaFile
+119-0llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
+114-0llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
+6-10llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+239-103 files

FreeBSD/src 6a1ebd1sys/fs/fuse fuse_node.h fuse_node.c, tests/sys/fs/fusefs read.cc rename.cc

fusefs: redo vnode attribute locking

Previously most fields in fuse_vnode_data were protected by the vnode
lock.  But because DEBUG_VFS_LOCKS was never enabled by default until
stable/15 the assertions were never checked, and many were wrong.
Others were missing.  This led to panics in stable/15 and 16.0-CURRENT,
when a vnode was expected to be exclusively locked but wasn't, for fuse
file systems that mount with "-o async".

In some places it isn't possible to exclusively lock the vnode when
accessing these fields.  So protect them with a new mutex instead.  This
fixes panics and unprotected field accesses in VOP_READ,
VOP_COPY_FILE_RANGE, VOP_GETATTR, VOP_BMAP, and FUSE_NOTIFY_INVAL_ENTRY.
Add assertions everywhere the protected fields are accessed.

Lock the vnode exclusively when handling FUSE_NOTIFY_INVAL_INODE.

During fuse_vnode_setsize, if the vnode isn't already exclusively
locked, use the vn_delayed_setsize mechanism.  This fixes panics during

    [14 lines not shown]
DeltaFile
+192-0tests/sys/fs/fusefs/read.cc
+80-11sys/fs/fuse/fuse_node.h
+90-0tests/sys/fs/fusefs/rename.cc
+71-18sys/fs/fuse/fuse_node.c
+74-7sys/fs/fuse/fuse_vnops.c
+24-18sys/fs/fuse/fuse_internal.c
+531-545 files not shown
+609-7911 files

FreeBSD/src 9ac21f8tests/sys/fs/fusefs bmap.cc

fusefs: add a regression test for a cluster_read bug

VOP_BMAP is purely advisory.  If VOP_BMAP returns an error during
readahead, cluster_read should still succeed, because the actual data
was still read just fine.

Add a regression test for PR 264196, wherein cluster_read would fail if
VOP_BMAP did.

PR:             264196
Reported by:    danfe
Reviewed by:    arrowd
Differential Revision: https://reviews.freebsd.org/D51316

(cherry picked from commit 6d408ac490730614b3ed0ebd3caffcd23f303fb4)
DeltaFile
+87-0tests/sys/fs/fusefs/bmap.cc
+87-01 files

FreeBSD/src 1ebccc3sys/kern vfs_cluster.c

vfs_cluster.c: Do not propagate VOP_BMAP errors to the caller

The code that makes this VOP_BMAP call tries to perform a read-ahead I/O
operation. Failing to do that for any reason isn't fatal for `cluster_read()`,
because we still can return some data to the caller. This change is consistent
with other places within `cluster_read()`, where error returned by VOP_BMAP is
not returned to the caller - see the `if (nblks > 1)` block above the changed
lines and `if (reqbp)` at the end of the function.
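
The control flow described above can be pictured with a small user-space model. This is only a sketch, not the kernel code: `advisory_bmap`, `read_with_readahead`, and `ReadResult` are hypothetical names. The point is that a failure in the advisory mapping step skips read-ahead and leaves the result of the read itself untouched.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

struct ReadResult {
  int error;             // 0 on success, errno-style value otherwise
  std::string data;      // data returned to the caller
  bool readahead_issued; // whether the optional read-ahead was scheduled
};

// Hypothetical stand-in for the advisory block-mapping query.
int advisory_bmap(bool should_fail) { return should_fail ? EIO : 0; }

ReadResult read_with_readahead(const std::string &contents, bool bmap_fails) {
  // The requested data has already been read successfully at this point.
  ReadResult result{0, contents, false};

  // The mapping query only feeds read-ahead. If it fails, skip the
  // read-ahead rather than turning a successful read into an error.
  if (advisory_bmap(bmap_fails) != 0)
    return result;

  result.readahead_issued = true;
  return result;
}

void demo() {
  ReadResult r = read_with_readahead("payload", /*bmap_fails=*/true);
  assert(r.error == 0 && r.data == "payload" && !r.readahead_issued);
}
```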

PR:     264196
Approved by:    markj, kib
Differential Revision: https://reviews.freebsd.org/D51254

(cherry picked from commit 62aef3f73f38db9fb68bffc12cc8900fecd58f0e)
DeltaFile
+3-1sys/kern/vfs_cluster.c
+3-11 files

FreeBSD/src d069250sys/fs/fuse fuse_ipc.c fuse_ipc.h

fusefs: remove the obsolete rename_lock

This lock was included in the original GSoC submission.  Its purpose
seems to have been to prevent concurrent FUSE_RENAME operations for the
current mountpoint, as well as to synchronize FUSE_RENAME with
fuse_vnode_setparent.  But it's obsolete, now that ef6ea91593e added
mnt_renamelock.

Sponsored by:   ConnectWise
Reviewed by:    kib
Differential Revision: https://reviews.freebsd.org/D55231

(cherry picked from commit 7755a406a6ae3801e885a79f714155f97c4d2bc6)
DeltaFile
+0-2sys/fs/fuse/fuse_ipc.c
+0-2sys/fs/fuse/fuse_ipc.h
+0-2sys/fs/fuse/fuse_vnops.c
+0-63 files

LLVM/project 797916bllvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, mlir/test/Target/LLVMIR omptarget-region-host-device-llvm.mlir

[OpenMP][flang] Fix crash in host offload (#187847)

Guard `getGridValue` in `OMPIRBuilder` to avoid reaching the
`unreachable` in `getGridValue` when offloading to host device without
an explicit num_threads clause.
DeltaFile
+13-3llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+14-0mlir/test/Target/LLVMIR/omptarget-region-host-device-llvm.mlir
+27-32 files

LLVM/project 1422665clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenAtomic.cpp

[CIR] Add support for __atomic_fetch_uinc and __atomic_fetch_udec (#188050)

This patch adds CIRGen and LLVM lowering support for the
`__atomic_fetch_uinc` and the `__atomic_fetch_udec` built-in functions.

Assisted-by: Claude Opus 4.6
DeltaFile
+30-0clang/test/CIR/CodeGen/atomic.c
+16-5clang/lib/CIR/CodeGen/CIRGenAtomic.cpp
+14-0clang/test/CIR/IR/atomic.cir
+11-2clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5-2clang/include/clang/CIR/Dialect/IR/CIROps.td
+76-95 files

LLVM/project 14269b4openmp/runtime/test/taskgraph taskgraph_deps_23.cpp taskgraph_deps_25.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, add new tests
DeltaFile
+100-0openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp
+86-0openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp
+77-0openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp
+77-0openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp
+73-0openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp
+72-0openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp
+485-021 files not shown
+1,575-027 files

LLVM/project eec9d38openmp/runtime/test/tasking omp_record_replay_multiTDGs.cpp omp_record_replay_print_dot.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, remove obsolete tests
DeltaFile
+0-76openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
+0-69openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
+0-63openmp/runtime/test/tasking/omp_record_replay_deps.cpp
+0-58openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
+0-56openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
+0-50openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
+0-3721 files not shown
+0-4207 files

LLVM/project a409a9bclang/include/clang/AST OpenMPClause.h, clang/lib/AST OpenMPClause.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, frontend parts
DeltaFile
+447-221clang/lib/CodeGen/CGOpenMPRuntime.cpp
+71-2clang/include/clang/AST/OpenMPClause.h
+53-10clang/lib/CodeGen/CGStmtOpenMP.cpp
+28-0clang/lib/Sema/SemaOpenMP.cpp
+26-0clang/lib/Sema/TreeTransform.h
+17-2clang/lib/AST/OpenMPClause.cpp
+642-23511 files not shown
+734-24117 files

LLVM/project 15c75e1clang/include/clang/Driver Driver.h, clang/lib/Driver Driver.cpp

[Driver][HIP] Bundle AMDGPU -S output under the new offload driver (#188262)

The old offload driver emits bundled assembly code for -S in textual
clang-offload-bundler format. This allows a single .s file to contain
assembly code for both host and devices, which can be consumed by clang.
This eases manual optimization of assembly code for host and device.
There are existing HIP tests and examples depending on this feature. The
new offload driver does not support it, causing regressions. This patch
adds support for this feature with minor changes to the job action
creations.

Fixes: LCOMPILER-553
DeltaFile
+56-6clang/lib/Driver/Driver.cpp
+8-4clang/include/clang/Driver/Driver.h
+3-0clang/test/Driver/hip-phases.hip
+67-103 files

FreeNAS/freenas 065de0csrc/middlewared pyproject.toml

fix deprecated license format in pyproject.toml
DeltaFile
+1-1src/middlewared/pyproject.toml
+1-11 files

NetBSD/pkgsrc 9nGcgq6multimedia/libde265 Makefile

   libde265: cmake file says this needs c++17
VersionDeltaFile
1.16+2-1multimedia/libde265/Makefile
+2-11 files

LLVM/project 19420c0clang/lib/CodeGen CGOpenMPRuntime.cpp, clang/test/OpenMP target_update_codegen.cpp

[OpenMP] Fix non-contiguous array omp target update (#156889)

The existing implementation has three issues which this patch addresses.

1. The last dimension, which represents the bytes in the type, has the
wrong stride and count. For example, for a 4 byte int, count=1 and
stride=4. The correct representation here is count=4 and stride=1
because there are 4 bytes (count=4) that we need to copy and we do not
skip any bytes (stride=1).

2. The size of the data copy was computed using the last dimension.
However, this is incorrect in cases where some of the final dimensions
get merged into one. In this case we need to take the combined size of
the merged dimensions, which is (Count * Stride) of the first merged
dimension.

3. The Offset into a dimension was computed as a multiple of its Stride.
However, this Stride, which is in bytes, already includes the stride
multiplier given by the user. This means that when the user specified

    [3 lines not shown]
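
To make points 1 and 2 concrete, here is a simplified model of a count/stride descriptor in plain C++. It is not the runtime's actual data structure: `Dim`, `byte_offsets`, and `merged_size` are hypothetical, strides are in bytes, and offsets are left out because that part of the description is elided above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-dimension descriptor: `count` steps of `stride` bytes.
struct Dim {
  std::size_t count;
  std::size_t stride;
};

// Enumerate every byte offset touched by a nest of dimensions, outermost
// dimension first.
std::vector<std::size_t> byte_offsets(const std::vector<Dim> &dims) {
  std::vector<std::size_t> offsets{0};
  for (const Dim &d : dims) {
    std::vector<std::size_t> next;
    for (std::size_t base : offsets)
      for (std::size_t i = 0; i < d.count; ++i)
        next.push_back(base + i * d.stride);
    offsets = std::move(next);
  }
  return offsets;
}

// Point 2: the byte size of a run of merged dimensions is Count * Stride
// of the first merged dimension.
std::size_t merged_size(const Dim &first_merged) {
  return first_merged.count * first_merged.stride;
}

void demo() {
  // Every second 4-byte int, three elements: outer stride is 8 bytes.
  // Point 1: the innermost dimension is count=4, stride=1.
  std::vector<Dim> fixed{{3, 8}, {4, 1}};
  assert(byte_offsets(fixed).size() == 12); // 3 ints * 4 bytes each

  // The old count=1, stride=4 form touches only one byte per element.
  std::vector<Dim> old_form{{3, 8}, {1, 4}};
  assert(byte_offsets(old_form).size() == 3);
}
```

In this model, a fully contiguous `int[2][3]` has an outermost dimension of count=2, stride=12, so the merged copy size is 24 bytes.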
DeltaFile
+102-61offload/test/offloading/non_contiguous_update.cpp
+95-0offload/test/offloading/strided_offset_multidim_update.c
+22-21clang/test/OpenMP/target_update_codegen.cpp
+18-18offload/test/offloading/strided_update_variable_stride_misc.c
+12-8clang/lib/CodeGen/CGOpenMPRuntime.cpp
+9-7offload/test/offloading/strided_update_count_expression_complex.c
+258-1153 files not shown
+276-1249 files

FreeBSD/ports 28324fenet-mgmt/py-pypowerwall Makefile, net-mgmt/py-pypowerwall/files patch-pypowerwall_tedapi_____init____.py

net-mgmt/py-pypowerwall: Fix runtime error, bump PORTREVISION
DeltaFile
+11-0net-mgmt/py-pypowerwall/files/patch-pypowerwall_tedapi_____init____.py
+1-0net-mgmt/py-pypowerwall/Makefile
+12-02 files

LLVM/project 4ca9638llvm/lib/Analysis UniformityAnalysis.cpp

review: avoid adding NeverUniform arg and inst to uniformValues
DeltaFile
+18-11llvm/lib/Analysis/UniformityAnalysis.cpp
+18-111 files

LLVM/project bbc2335llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h

review: address suggestion on hasDivergence flag
DeltaFile
+26-16llvm/include/llvm/ADT/GenericUniformityImpl.h
+0-3llvm/include/llvm/ADT/GenericUniformityInfo.h
+26-192 files

LLVM/project 675abe6llvm/unittests/Target/AMDGPU UniformityAnalysisTest.cpp CMakeLists.txt

add unit test
DeltaFile
+95-0llvm/unittests/Target/AMDGPU/UniformityAnalysisTest.cpp
+3-0llvm/unittests/Target/AMDGPU/CMakeLists.txt
+98-02 files

LLVM/project 1aaa153llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: address suggestions
DeltaFile
+16-39llvm/include/llvm/ADT/GenericUniformityImpl.h
+15-0llvm/lib/Analysis/UniformityAnalysis.cpp
+31-392 files

LLVM/project fc05de6llvm/include/llvm/ADT GenericSSAContext.h GenericUniformityImpl.h, llvm/lib/CodeGen MachineSSAContext.cpp

review: rename isNeverDivergent to isAlwaysUniform
DeltaFile
+1-1llvm/lib/IR/SSAContext.cpp
+1-1llvm/include/llvm/ADT/GenericSSAContext.h
+1-1llvm/include/llvm/ADT/GenericUniformityImpl.h
+1-1llvm/lib/CodeGen/MachineSSAContext.cpp
+4-44 files