LLVM/project cfd1cddopenmp/runtime/src z_Windows_NT-586_util.cpp z_Linux_asm.S

Add indirect for kmp_invoke_microtask
DeltaFile
+26-0openmp/runtime/src/z_Windows_NT-586_util.cpp
+4-0openmp/runtime/src/z_Linux_asm.S
+1-0openmp/runtime/src/z_Windows_NT_util.cpp
+31-03 files

OpenZFS/src a459290man/man8 zpool-list.8

zpool-list.8: clarify that only imported pools are listed (#18352)

The man page stated "all pools in the system are listed" which is
misleading, as only imported pools are shown. Clarify this and
add a cross-reference to zpool-import(8) for discovering pools
available for import.

Signed-off-by: Christos Longros <chris.longros at gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
DeltaFile
+3-1man/man8/zpool-list.8
+3-11 files

LLVM/project ada44b0clang/lib/AST/ByteCode Interp.cpp

[clang][bytecode] Avoid a macro redefinition (#188052)

Fixes:

```
/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/lib/AST/ByteCode/Interp.cpp:46:9: error: 'MUSTTAIL' macro redefined [-Werror,-Wmacro-redefined]
   46 | #define MUSTTAIL
      |         ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/clang/lib/AST/ByteCode/Interp.cpp:32:9: note: previous definition is here
   32 | #define MUSTTAIL [[clang::musttail]]
      |         ^
1 error generated.
```
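
A redefinition like this usually means a second `#define` is reached after an earlier one without an intervening `#undef`. The following is a minimal sketch of one way to avoid that, not the actual one-line fix in Interp.cpp (which this digest does not show): every preprocessor path defines the macro exactly once. The target condition and the `sumTo` function are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical single point of definition: each path defines MUSTTAIL
// at most once, so -Wmacro-redefined has nothing to complain about.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail) && !defined(__aarch64__)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL // fallback: expands to nothing
#endif

// Tail-recursive sum of 1..n. When MUSTTAIL expands to the attribute,
// the compiler must emit the recursive call as a tail call.
static long sumTo(long n, long acc) {
  if (n == 0)
    return acc;
  MUSTTAIL return sumTo(n - 1, acc + n);
}
```

With the fallback in place, the same source builds whether or not the attribute is available on the target.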
DeltaFile
+1-0clang/lib/AST/ByteCode/Interp.cpp
+1-01 files

OpenZFS/src 8518e3econfig kernel-copy-from-user-inatomic.m4 kernel.m4

Linux 7.0: autoconf: Remove copy-from-user-inatomic API checks (#18348) (#18354)

This function was removed in c6442bd3b643: "Removing old code outside
of 4.18 kernsls", but fails at present on PowerPC builds due to the
recent inclusion of 6bc9c0a90522: "powerpc: fix KUAP warning in VMX
usercopy path" in the upstream kernel, which introduces a use of
cpu_feature_keys[], which is a GPL-only symbol. Removing the API
check as it doesn't appear necessary.

Signed-off-by: John Cabaj <john.cabaj at canonical.com>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
DeltaFile
+0-30config/kernel-copy-from-user-inatomic.m4
+0-2config/kernel.m4
+0-322 files

LLVM/project 27adb8fllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.exp.compr.ll

AMDGPU/GlobalISel: RegBankLegalize rules for exp_compr (#187822)

This intrinsic only accepts vectorTy. Correct the test to use v2s16.
DeltaFile
+19-20llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn-exp-compr.mir
+5-4llvm/test/CodeGen/AMDGPU/llvm.amdgcn.exp.compr.ll
+3-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+27-243 files

FreeNAS/freenas eb523d8src/middlewared/middlewared/api/v26_0_0 cloud_sync_providers.py, src/middlewared/middlewared/api/v27_0_0 cloud_sync_providers.py

Fix cloud sync with S3 provider behind a proxy
DeltaFile
+3-0src/middlewared/middlewared/api/v27_0_0/cloud_sync_providers.py
+3-0src/middlewared/middlewared/api/v26_0_0/cloud_sync_providers.py
+6-02 files

LLVM/project b670265llvm/test/CodeGen/AArch64 sve-intrinsics-perm-select.ll, llvm/test/CodeGen/PowerPC aix32-p8-scalar_vector_conversions.ll

[DAG] ComputeKnownBits - set low bit to zero for ADD(X,X) (#186461)

ADD(X,X) is equivalent to SHL(X,1), so bit[0] is always zero.

This allows downstream folds like `and(add(x,x), 1) -> 0`.

Fixes #186091
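
The property is easy to check on plain unsigned integers. This is only an illustration of the arithmetic, not the SelectionDAG code; the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// x + x equals x << 1 for unsigned values (wrap-around does not disturb
// the low bit), so bit 0 of the result is always zero.
static bool lowBitOfDoubleIsZero(uint32_t x) {
  uint32_t doubled = x + x;
  return doubled == (x << 1) && (doubled & 1u) == 0;
}

// Mirrors the downstream fold mentioned above: and(add(x, x), 1) -> 0.
static uint32_t andOfDoubleWithOne(uint32_t x) { return (x + x) & 1u; }
```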
DeltaFile
+2,034-2,026llvm/test/CodeGen/X86/clmul-vector.ll
+152-72llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
+11-0llvm/test/CodeGen/X86/known-bits.ll
+4-4llvm/test/CodeGen/PowerPC/aix32-p8-scalar_vector_conversions.ll
+3-3llvm/test/CodeGen/RISCV/xaluo.ll
+3-3llvm/test/CodeGen/RISCV/xqcisls.ll
+2,207-2,1088 files not shown
+2,224-2,12014 files

LLVM/project 0e0dc53llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.readfirstlane.ptr.ll llvm.amdgcn.readfirstlane.ll

AMDGPU/GlobalISel: Use B32 for readfirstlane (#187809)

Using B32 also adds the missing pointer support to the readfirstlane
intrinsic rule.
DeltaFile
+83-25llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ptr.ll
+40-40llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll
+2-2llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.m0.ll
+1-1llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+126-684 files

LLVM/project 44df411compiler-rt/cmake/Modules CompilerRTUtils.cmake, compiler-rt/lib/builtins CMakeLists.txt

[compiler-rt][ARM] cmake properties for complicated builtin sources (#179919)

In the builtins library, most functions have a portable C implementation
(e.g. `mulsf3.c`), and platforms might provide an optimized assembler
implementation (e.g. `arm/mulsf3.S`). The cmake script automatically
excludes the C source file corresponding to each assembly source file it
includes. Additionally, each source file name is automatically
translated into a flag that lit tests can query, with a name like
`librt_has_mulsf3`, to indicate that a function is available to be
tested.

In future commits I plan to introduce cases where a single .S file
provides more than one function (so that they can share code easily)
and must therefore supersede more than one existing source file.

I've introduced the `crt_supersedes` cmake property, which you can set
on a .S file to name a list of .c files that it should supersede. Also,
the `crt_provides` property can be set on any source file to indicate a
list of functions it makes available for testing, in addition to the one
implied by its name.
DeltaFile
+13-0compiler-rt/lib/builtins/CMakeLists.txt
+8-4compiler-rt/cmake/Modules/CompilerRTUtils.cmake
+4-1compiler-rt/test/builtins/CMakeLists.txt
+25-53 files

LLVM/project 35998c7mlir/test/Target/LLVMIR omptarget-region-host-device-llvm.mlir

add test
DeltaFile
+14-0mlir/test/Target/LLVMIR/omptarget-region-host-device-llvm.mlir
+14-01 files

LLVM/project 7ca70c5llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
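
As a rough model of the idea, the stall cost can be thought of as a single number per candidate that both queues are compared on. The sketch below is hypothetical: the `Candidate` struct and both functions are illustrative stand-ins rather than LLVM types or APIs, and combining the two waits with `max` is an assumption, not something the commit message states.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified scheduling candidate (not an LLVM type).
struct Candidate {
  unsigned ResourceStallCycles; // wait for a busy unbuffered resource
  unsigned HazardWaitStates;    // wait until sequence hazards clear
  bool Pending;                 // true if it came from the pending queue
};

// Assumption: the instruction can issue only once both constraints are
// satisfied, so the effective stall is the larger of the two waits.
static unsigned structuralStallCycles(const Candidate &C) {
  return std::max(C.ResourceStallCycles, C.HazardWaitStates);
}

// Unified comparison: pending and available candidates compete on stall
// cost instead of being separated by queue. Returns true if A is preferred.
static bool preferByStall(const Candidate &A, const Candidate &B) {
  return structuralStallCycles(A) < structuralStallCycles(B);
}
```

In this model a pending instruction with a short stall can beat an available instruction that would stall longer.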
DeltaFile
+41-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+37-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3-4llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+4-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+95-71 files not shown
+97-77 files

OPNSense/plugins 9c047f8www/OPNProxy Makefile, www/OPNProxy/src/etc/inc/plugins.inc.d opnproxy.inc

www/OPNProxy: fix issue with 2e56601903b39bba
DeltaFile
+1-1www/OPNProxy/Makefile
+1-1www/OPNProxy/src/etc/inc/plugins.inc.d/opnproxy.inc
+2-22 files

LLVM/project b2edc0allvm/lib/Target/WebAssembly WebAssemblyReduceToAnyAllTrue.cpp WebAssemblyTargetMachine.cpp, llvm/test/CodeGen/WebAssembly any-all-true.ll

wasm: recognize `any_true` and `all_true` (#155885)

fixes https://github.com/llvm/llvm-project/issues/129441

cc @lukel97 @badumbatish
https://github.com/llvm/llvm-project/pull/145108

I've been learning a bit about LLVM, trying to make progress on some of
these issues. The code below is based on
https://github.com/llvm/llvm-project/pull/145108#issuecomment-3004561085,
by implementing `shouldExpandReduction`.

The implementation works for the test cases I added, but (obviously)
fails for any existing cases. `ISD::VECREDUCE_AND` and
`ISD::VECREDUCE_OR` are now marked as legal, which is required for the
`Pat`s to fire, but when they don't that causes a selection failure.

So, I'm wondering, what is the right approach here. Should I mark these
intrinsics as `Custom` instead and manually perform the transformation

    [2 lines not shown]
DeltaFile
+133-0llvm/lib/Target/WebAssembly/WebAssemblyReduceToAnyAllTrue.cpp
+131-0llvm/test/CodeGen/WebAssembly/any-all-true.ll
+3-0llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
+2-0llvm/lib/Target/WebAssembly/WebAssembly.h
+1-0llvm/lib/Target/WebAssembly/CMakeLists.txt
+270-05 files

LLVM/project 7fa2752llvm/lib/Support APFloat.cpp

[NFC][Support] Minor code cleanup in APFloat.cpp (#187526)

Minor code cleanup: define variables at their first assignment as
opposed to at the start of functions, and use `[[maybe_unused]]` for
variables used in assert only.
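
Both conventions are shown in the short sketch below. It is a generic illustration, not code from APFloat.cpp, and `sumValues` is a made-up function.

```cpp
#include <cassert>
#include <vector>

// Sums the values. Each variable is declared where it is first assigned
// rather than at the top of the function.
static int sumValues(const std::vector<int> &Values) {
  // Only read inside assert(), which compiles away under NDEBUG, so the
  // attribute suppresses an unused-variable warning in release builds.
  [[maybe_unused]] const bool NonEmpty = !Values.empty();
  assert(NonEmpty && "expected at least one value");

  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```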
DeltaFile
+106-197llvm/lib/Support/APFloat.cpp
+106-1971 files

LLVM/project ce5a1dfllvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp VOP2Instructions.td, llvm/test/CodeGen/AMDGPU llvm.amdgcn.fdot2.ll llvm.amdgcn.fdot2.f32.bf16.ll

AMDGPU: Improve codegen for VOP2 v_dot2c_f32_f16/bf16 (#179225)

Select the VOP2 version when there are no src_modifiers; otherwise select VOP3.
DeltaFile
+64-212llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+12-60llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll
+20-48llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+32-2llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+14-12llvm/lib/Target/AMDGPU/VOP2Instructions.td
+22-0llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+164-3344 files not shown
+181-33410 files

LLVM/project 0748515llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mov.dpp.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mov_dpp (#187807)
DeltaFile
+10-7llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mov.dpp.ll
+3-3llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mov.dpp.ll
+4-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+17-103 files

LLVM/project a57db57clang/lib/Sema SemaHLSL.cpp HLSLBuiltinTypeDeclBuilder.cpp, clang/test/CodeGenHLSL/resources Texture2D-Subscript.hlsl

[HLSL] Implement Texture2D::operator[] (#186110)

Implements the Texture2D::operator[] method. It uses the same design as
Buffer::operator[]. However, this requires us to change the
resource_getpointer intrinsic to accept integer vectors for the index.

Assisted-by: Gemini
DeltaFile
+74-0clang/test/CodeGenHLSL/resources/Texture2D-Subscript.hlsl
+39-4clang/lib/Sema/SemaHLSL.cpp
+27-1clang/test/SemaHLSL/BuiltIns/resource_getpointer-errors.hlsl
+21-7clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp
+14-14llvm/test/Transforms/SimplifyCFG/DirectX/no-sink-dxgetpointer.ll
+12-12llvm/test/Transforms/GVN/no-sink-dxgetpointer.ll
+187-3818 files not shown
+306-9424 files

LLVM/project 6e5e1c9flang-rt/include/flang-rt/runtime format-implementation.h, flang-rt/lib/runtime io-api.cpp

[flang][flang-rt] Implement F202X leading-zero control edit descriptors LZ, LZS, and LZP for formatted output (F, E, D, and G editing) (#183500)

LZ: processor-dependent (default, flang prints leading zero); LZS:
suppress the optional leading zero before the decimal point; LZP: print
the optional leading zero before the decimal point. Changes span the
source parser, compile-time format validator, runtime format processing,
and runtime output formatting. Includes semantic test (io18.f90) and
documentation updates.
DeltaFile
+379-0flang-rt/unittests/Runtime/LeadingZeroTest.cpp
+126-0flang/test/Semantics/io18.f90
+41-5flang/include/flang/Common/format.h
+22-2flang-rt/include/flang-rt/runtime/format-implementation.h
+23-0flang-rt/lib/runtime/io-api.cpp
+19-1flang/lib/Parser/io-parsers.cpp
+610-816 files not shown
+699-2622 files

LLVM/project 0afc30fllvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[TargetLowering] Add helper to create FSHR like operation in expandDIVREMByConstant. NFC (#187979)
DeltaFile
+18-24llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+18-241 files

LLVM/project c75b8a1llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[TargetLowering] Avoid unnecessary nodes in the chunk loop in expandDIVREMByConstant (#187967)

We don't need an AND on the last iteration. If we shifted the dividend
due to trailing zeros in the divisor, we don't need a chunk that only
contains shifted-in zeros.
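
The first point can be illustrated with a chunk-sum remainder on plain integers. This is a sketch under stated assumptions, not the SelectionDAG code: it assumes a 64-bit dividend, 16-bit chunks, and a divisor `d` with `2^16 % d == 1` (for example 3, 5, 17, or 257), so the dividend is congruent to the sum of its chunks modulo `d`. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Computes x % d by summing 16-bit chunks of x. Valid only when
// 2^16 % d == 1, because then each chunk's positional weight is
// congruent to 1 modulo d.
static uint32_t uremByChunkSum(uint64_t x, uint32_t d) {
  assert(d != 0 && (1u << 16) % d == 1 && "need 2^16 % d == 1");
  const unsigned ChunkBits = 16;
  const unsigned NumChunks = 64 / ChunkBits;
  uint32_t Sum = 0;
  for (unsigned I = 0; I < NumChunks; ++I) {
    uint64_t Chunk = x >> (I * ChunkBits);
    // After the shift, the last chunk has no higher bits left, so the
    // mask (the AND) is only needed on the earlier iterations.
    if (I + 1 != NumChunks)
      Chunk &= 0xFFFFu;
    Sum += static_cast<uint32_t>(Chunk);
  }
  return Sum % d; // a much narrower remainder than the original
}
```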
DeltaFile
+4-4llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+4-41 files

LLVM/project 887c8cfllvm/include/llvm/ADT StringSwitch.h, llvm/unittests/ADT StringSwitchTest.cpp

[ADT] Add predicate based match support to StringSwitch

This introduces `Predicate` and `IfNotPredicate` case selection to
StringSwitch to allow use cases like

```
StringSwitch<...>(..)
  .Case("foo", FooTok)
  .Predicate(isAlpha, IdentifierTok)
...
```

This is mostly useful for improving conciseness and clarity when
processing generated strings, diagnostics, and similar.
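
To make the control flow concrete, here is a self-contained sketch of how a predicate case can sit alongside exact-match cases. `MiniStringSwitch`, `isAlphaString`, and `classify` are hypothetical stand-ins written for this example; the real `llvm::StringSwitch` signatures may differ, and `IfNotPredicate` is not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical stand-in for llvm::StringSwitch, reduced to an exact-match
// case and a predicate case. The first matching case wins.
template <typename T> class MiniStringSwitch {
  std::string_view Str;
  std::optional<T> Result;

public:
  explicit MiniStringSwitch(std::string_view S) : Str(S) {}

  // Exact string match.
  MiniStringSwitch &Case(std::string_view Literal, T Value) {
    if (!Result && Str == Literal)
      Result = Value;
    return *this;
  }

  // Match when the predicate accepts the whole string (an assumption
  // about how the predicate is applied).
  template <typename Pred> MiniStringSwitch &Predicate(Pred P, T Value) {
    if (!Result && P(Str))
      Result = Value;
    return *this;
  }

  T Default(T Value) const { return Result ? *Result : Value; }
};

enum Token { FooTok, IdentifierTok, UnknownTok };

// Assumed predicate: non-empty and purely alphabetic.
static bool isAlphaString(std::string_view S) {
  if (S.empty())
    return false;
  for (unsigned char C : S)
    if (!std::isalpha(C))
      return false;
  return true;
}

static Token classify(std::string_view S) {
  return MiniStringSwitch<Token>(S)
      .Case("foo", FooTok)
      .Predicate(isAlphaString, IdentifierTok)
      .Default(UnknownTok);
}
```

Because cases are tried in order, `"foo"` maps to `FooTok` even though it would also satisfy the predicate.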
DeltaFile
+33-0llvm/unittests/ADT/StringSwitchTest.cpp
+16-0llvm/include/llvm/ADT/StringSwitch.h
+49-02 files

LLVM/project 2088fccflang/lib/Semantics openmp-utils.cpp, flang/test/Semantics/OpenMP tile09.f90

Fix threshold for message about depth reset
DeltaFile
+2-2flang/lib/Semantics/openmp-utils.cpp
+1-0flang/test/Semantics/OpenMP/tile09.f90
+3-22 files

LLVM/project cdaed3bllvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir coexec-sched-warning.mir

[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling

This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.

It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.

It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
DeltaFile
+283-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+124-0llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+43-5llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+46-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+12-9llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+20-0llvm/test/CodeGen/AMDGPU/coexec-sched-warning.mir
+528-144 files not shown
+555-1410 files

FreeBSD/src b5b9517contrib/libcbor CMakeLists.txt, contrib/libcbor/doc/source using.rst

libcbor: Update to 0.13.0

Sponsored by:   The FreeBSD Foundation
DeltaFile
+377-34contrib/libcbor/test/copy_test.c
+225-114contrib/libcbor/CMakeLists.txt
+135-137contrib/libcbor/test/cbor_serialize_test.c
+170-26contrib/libcbor/src/cbor.c
+183-0contrib/libcbor/examples/crash_course.c
+0-174contrib/libcbor/doc/source/using.rst
+1,090-485122 files not shown
+3,665-2,450128 files

LLVM/project ed3d3bfmlir/include/mlir/Dialect/SPIRV/IR SPIRVTosaOps.td SPIRVTosaTypes.td, mlir/test/Dialect/SPIRV/IR tosa-ops.mlir tosa-ops-verification.mlir

[mlir][spirv] Add first 3 data layout ops in TOSA Ext Inst Set (#187714)

This patch introduces the following data layout operators:

spirv.Tosa.Concat
spirv.Tosa.Pad
spirv.Tosa.Reshape

Dialect and serialization round-trip tests have also been added.

Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
DeltaFile
+146-0mlir/include/mlir/Dialect/SPIRV/IR/SPIRVTosaOps.td
+125-0mlir/test/Target/SPIRV/tosa-ops.mlir
+72-0mlir/test/Dialect/SPIRV/IR/tosa-ops.mlir
+72-0mlir/test/Dialect/SPIRV/IR/tosa-ops-verification.mlir
+29-0mlir/include/mlir/Dialect/SPIRV/IR/SPIRVTosaTypes.td
+444-05 files

LLVM/project be31cffflang/test/Lower/Intrinsics mvbits.f90 move_alloc.f90

[flang][NFC] Converted five tests from old lowering to new lowering (part 37) (#188009)

Tests converted from test/Lower/Intrinsics: minval.f90, modulo.f90,
move_alloc.f90, mvbits.f90, not.f90
DeltaFile
+54-60flang/test/Lower/Intrinsics/mvbits.f90
+29-37flang/test/Lower/Intrinsics/move_alloc.f90
+23-32flang/test/Lower/Intrinsics/minval.f90
+22-16flang/test/Lower/Intrinsics/modulo.f90
+6-4flang/test/Lower/Intrinsics/not.f90
+134-1495 files

OPNSense/core f753abcsrc/opnsense/scripts/kea get_kea_leases.py

Simplify diff
DeltaFile
+0-2src/opnsense/scripts/kea/get_kea_leases.py
+0-21 files

LLVM/project 7152266clang/docs ThreadSafetyAnalysis.rst, clang/lib/Analysis ThreadSafety.cpp

Thread Safety Analysis: Support guarded_by/pt_guarded_by with multiple capabilities (#186838)

Previously, `guarded_by` and `pt_guarded_by` only accepted a single
capability argument. Introduce support for declaring that a variable is
guarded by multiple capabilities, which exploits the following property:
any writer must hold all capabilities, so holding any one of them
(exclusive or shared) guarantees at least shared (read) access.
Therefore, writing requires all listed capabilities to be held
exclusively, while reading only requires at least one to be held.

This synchronization pattern is frequently used where the underlying
lock implementation does not support real reader locking, and instead
several lock "shards" are used to reduce contention for readers. For
example, the Linux kernel makes frequent use of this pattern [1].

Backwards compatibility is not affected by this change: for the time
being we deliberately do not change the semantics of multiple stacked
attributes (this retains existing semantics precisely, while giving a
way to choose the "stricter" semantics if needed).

    [2 lines not shown]
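
The access rule described above can be modeled in a few lines. This is a plain C++ model of the rule only; it does not use the Clang attributes, and `Held`, `mayWrite`, and `mayRead` are names invented for the example. It assumes the list of guarding capabilities is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// How the current thread holds each capability guarding a variable.
enum class Held { None, Shared, Exclusive };

// Writing requires every listed capability to be held exclusively.
static bool mayWrite(const std::vector<Held> &Guards) {
  return std::all_of(Guards.begin(), Guards.end(),
                     [](Held H) { return H == Held::Exclusive; });
}

// Reading requires at least one listed capability, shared or exclusive:
// any writer must hold all of them, so holding one excludes writers.
static bool mayRead(const std::vector<Held> &Guards) {
  return std::any_of(Guards.begin(), Guards.end(),
                     [](Held H) { return H != Held::None; });
}
```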
DeltaFile
+63-5clang/lib/Analysis/ThreadSafety.cpp
+54-0clang/test/SemaCXX/warn-thread-safety-analysis.cpp
+30-7clang/docs/ThreadSafetyAnalysis.rst
+34-0clang/lib/Sema/AnalysisBasedWarnings.cpp
+12-15clang/lib/Sema/SemaDeclAttr.cpp
+10-12clang/test/SemaCXX/warn-thread-safety-parsing.cpp
+203-399 files not shown
+244-5315 files

LLVM/project b16efa6mlir/include/mlir/Bindings/Python IRCore.h

[mlir][python] Fix PyObjectRef copy/move assignment for MSVC (#186758)

PyObjectRef has a user-declared move constructor but no explicit
copy/move assignment operators. On at least some versions of MSVC,
instantiation of operator= is forced, causing a compile error:

```
In file included from mlir/lib/Bindings/Python/Globals.cpp:9:
In file included from mlir/include/mlir/Bindings/Python/IRCore.h:16:
<MSVC>/include/vector(1461,27): error: object of type 'value_type' (aka 'mlir::python::mlir::PyDiagnostic::DiagnosticInfo') cannot be assigned because its copy assignment operator is implicitly deleted
 1461 |                     *_Mid = *_First;
      |                           ^
<MSVC>/include/vector(1539,9): note: in instantiation of function template specialization 'std::vector<mlir::python::mlir::PyDiagnostic::DiagnosticInfo>::_Assign_counted_range<mlir::python::mlir::PyDiagnostic::DiagnosticInfo *>' requested here
 1539 |         _Assign_counted_range(_Right_data._Myfirst, static_cast<size_type>(_Right_data._Mylast - _Right_data._Myfirst));
      |         ^
mlir/include/mlir/Bindings/Python/IRCore.h(1317,33): note: in instantiation of member function 'std::vector<mlir::python::mlir::PyDiagnostic::DiagnosticInfo>::operator=' requested here
 1317 | struct MLIR_PYTHON_API_EXPORTED MLIRError {
      |                                 ^
mlir/include/mlir/Bindings/Python/IRCore.h(369,16): note: copy assignment operator of 'DiagnosticInfo' is implicitly deleted because field 'location' has a deleted copy assignment operator

    [20 lines not shown]
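
The underlying language rule can be reproduced without MLIR. In the sketch below, `RefWithoutAssign` and `RefWithAssign` are hypothetical types, not the real `PyObjectRef`, and the second struct shows one plausible shape of a fix rather than the actual patch: a user-declared move constructor makes the implicit copy assignment operator deleted, and declaring the assignment operators explicitly restores assignability.

```cpp
#include <cassert>
#include <type_traits>

// A user-declared move constructor causes the implicitly declared copy
// assignment operator to be defined as deleted; no move assignment
// operator is implicitly declared either.
struct RefWithoutAssign {
  int *Ptr = nullptr;
  RefWithoutAssign() = default;
  RefWithoutAssign(const RefWithoutAssign &) = default;
  RefWithoutAssign(RefWithoutAssign &&Other) noexcept : Ptr(Other.Ptr) {
    Other.Ptr = nullptr;
  }
};

// Hypothetical fix: spell out both assignment operators.
struct RefWithAssign {
  int *Ptr = nullptr;
  RefWithAssign() = default;
  RefWithAssign(const RefWithAssign &) = default;
  RefWithAssign(RefWithAssign &&Other) noexcept : Ptr(Other.Ptr) {
    Other.Ptr = nullptr;
  }
  RefWithAssign &operator=(const RefWithAssign &) = default;
  RefWithAssign &operator=(RefWithAssign &&Other) noexcept {
    if (this != &Other) {
      Ptr = Other.Ptr;
      Other.Ptr = nullptr;
    }
    return *this;
  }
};

static_assert(!std::is_copy_assignable<RefWithoutAssign>::value,
              "copy assignment is implicitly deleted");
static_assert(std::is_copy_assignable<RefWithAssign>::value,
              "copy assignment is explicitly provided");
static_assert(std::is_move_assignable<RefWithAssign>::value,
              "move assignment is explicitly provided");
```

A container such as `std::vector` whose `operator=` is instantiated for an element holding the first type would fail to compile in the same way as the error above.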
DeltaFile
+12-0mlir/include/mlir/Bindings/Python/IRCore.h
+12-01 files

LLVM/project 7bda811clang/lib/AST/ByteCode Interp.cpp

[clang][bytecode] Disable tailcalls on aarch64 (#188042)

Apparently it causes problems there, too.
See https://lab.llvm.org/buildbot/#/builders/24/builds/18781
DeltaFile
+3-1clang/lib/AST/ByteCode/Interp.cpp
+3-11 files