LLVM/project 726dc3cllvm CMakeLists.txt

use before/after for nominmax
DeltaFile
+10-0llvm/CMakeLists.txt
+10-01 files

LLVM/project 4cef1c1llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 bitcnt-big-integer.ll

[X86] canonicalizeShuffleWithOp - add handling for SHUFFLE(PSADBW(X,Y),PSADBW(Z,W)) -> PSADBW(SHUFFLE(X,Z),SHUFFLE(Y,W)) (#188072)

PSADBW takes vXi8 inputs and gives a vXi64 result so we need to tweak
the bitcasts (shuffle types checks will already ensure that the result
type isn't affected).

Minor improvement to #187447
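
A scalar sketch of why the fold is sound, assuming 128-bit vectors and a shuffle that moves whole 64-bit result lanes. The types and helpers below (`Bytes`, `Quads`, `Mask`, `ramp`, `foldHolds`) are hypothetical names for illustration and are not taken from X86ISelLowering.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

using Bytes = std::array<uint8_t, 16>; // models a v16i8 operand
using Quads = std::array<uint64_t, 2>; // models a v2i64 result
using Mask = std::array<int, 2>;       // 0..1 pick from input 1, 2..3 from input 2

// Scalar model of PSADBW: each 8-byte group yields one 64-bit sum of
// absolute differences.
Quads psadbw(const Bytes &x, const Bytes &y) {
  Quads r{};
  for (int g = 0; g < 2; ++g)
    for (int i = 0; i < 8; ++i)
      r[g] += static_cast<uint64_t>(
          std::abs(int(x[g * 8 + i]) - int(y[g * 8 + i])));
  return r;
}

// Shuffle of the 64-bit result lanes.
Quads shuffleQuads(const Quads &a, const Quads &b, const Mask &m) {
  Quads r{};
  for (int i = 0; i < 2; ++i)
    r[i] = m[i] < 2 ? a[m[i]] : b[m[i] - 2];
  return r;
}

// The same shuffle applied to the byte inputs, moving whole 8-byte groups.
Bytes shuffleGroups(const Bytes &a, const Bytes &b, const Mask &m) {
  Bytes r{};
  for (int g = 0; g < 2; ++g) {
    const Bytes &src = m[g] < 2 ? a : b;
    for (int i = 0; i < 8; ++i)
      r[g * 8 + i] = src[(m[g] % 2) * 8 + i];
  }
  return r;
}

// Test-data helper: byte i is start + i * step, wrapped to 8 bits.
Bytes ramp(int start, int step) {
  Bytes r{};
  for (int i = 0; i < 16; ++i)
    r[i] = static_cast<uint8_t>(start + i * step);
  return r;
}

// True if SHUFFLE(PSADBW(x,y),PSADBW(z,w)) equals
// PSADBW(SHUFFLE(x,z),SHUFFLE(y,w)) for the given mask.
bool foldHolds(const Bytes &x, const Bytes &y, const Bytes &z,
               const Bytes &w, const Mask &m) {
  return shuffleQuads(psadbw(x, y), psadbw(z, w), m) ==
         psadbw(shuffleGroups(x, z, m), shuffleGroups(y, w, m));
}

// Checks the identity for every two-lane mask over some sample inputs.
void checkAllMasks() {
  for (int a = 0; a < 4; ++a)
    for (int b = 0; b < 4; ++b) {
      Mask m{{a, b}};
      assert(foldHolds(ramp(0, 3), ramp(255, -5), ramp(9, 17), ramp(7, 0), m));
    }
}
```

The model only covers masks at 64-bit granularity, which is the case where the vXi8 inputs can be shuffled group by group without changing any sum.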
DeltaFile
+7-5llvm/lib/Target/X86/X86ISelLowering.cpp
+5-7llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+12-122 files

LLVM/project fd5f8b1llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
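
A minimal sketch of the idea, not the code in GCNSchedStrategy.cpp. The `Candidate` struct, its fields, and the choice to combine the two stall sources with `std::max` are assumptions made for illustration; the commit message does not say how the two are combined.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-candidate inputs.
struct Candidate {
  unsigned resourceStallCycles; // cycles until an unbuffered resource frees up
  unsigned hazardWaitStates;    // wait states until all hazards are resolved
  bool pending;                 // true if the candidate is in the pending queue
};

// Assumption: the structural stall is the larger of the two waits.
unsigned structuralStallCycles(const Candidate &c) {
  return std::max(c.resourceStallCycles, c.hazardWaitStates);
}

// Returns true if `a` is preferred over `b` on stall cost alone. Pending and
// available candidates go through the same comparison.
bool prefersByStall(const Candidate &a, const Candidate &b) {
  return structuralStallCycles(a) < structuralStallCycles(b);
}

// Usage: a pending candidate can win when its stall is shorter.
void demo() {
  Candidate available{0, 4, false};
  Candidate pending{2, 1, true};
  assert(structuralStallCycles(available) == 4);
  assert(structuralStallCycles(pending) == 2);
  assert(prefersByStall(pending, available));
}
```

In this model, queue membership no longer decides the outcome by itself; it only contributes through the stall cost.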
DeltaFile
+41-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+38-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3-4llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+4-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+96-71 files not shown
+98-77 files

LLVM/project c624851llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanHelpers.h, llvm/test/Transforms/LoopVectorize/X86 cost-divisor-overflow.ll

[LoopVectorize] Fix an integer narrowing conversion in `getPredBlockCostDivisor(...)` (#187605)

`LoopVectorizationCostModel::getPredBlockCostDivisor(...)` may return
large `uint64_t` values that get coerced to an `unsigned` by
`VPCostContext::getPredBlockCostDivisor(...)`, which can cause division
by zero.

Fixes #187584
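
The failure mode can be reproduced in isolation. The sketch below uses hypothetical function names (`wideDivisor`, `narrowedDivisor`, `widenedDivisor`, `scaledCost`) and models `unsigned` as `uint32_t`; it is not the LoopVectorize code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a cost-model query whose result does not fit in
// 32 bits.
uint64_t wideDivisor() { return uint64_t{1} << 32; }

// Narrowing wrapper: converting to a 32-bit unsigned keeps only the low
// 32 bits, so 2^32 becomes 0.
uint32_t narrowedDivisor() { return static_cast<uint32_t>(wideDivisor()); }

// Widened wrapper: keeping the full 64-bit type preserves the value.
uint64_t widenedDivisor() { return wideDivisor(); }

// Divides a cost by the divisor; a zero divisor would be undefined behaviour.
uint64_t scaledCost(uint64_t cost, uint64_t divisor) {
  assert(divisor != 0 && "divisor must be nonzero");
  return cost / divisor;
}

// Usage: only the widened divisor is safe to divide by.
void demo() {
  assert(narrowedDivisor() == 0);
  assert(scaledCost(uint64_t{1} << 33, widenedDivisor()) == 2);
}
```

Passing `narrowedDivisor()` to `scaledCost` would trip the assertion, which mirrors the division by zero described above.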
DeltaFile
+67-0llvm/test/Transforms/LoopVectorize/X86/cost-divisor-overflow.ll
+1-1llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1-1llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+69-23 files

NetBSD/pkgsrc-wip 575fed7. TODO, crush distinfo go-modules.mk

crush: update to 0.51.2
DeltaFile
+225-225crush/distinfo
+74-74crush/go-modules.mk
+1-1crush/Makefile
+0-1TODO
+300-3014 files

LLVM/project 54a3518clang/lib/CodeGen CGHLSLBuiltins.cpp, clang/lib/Headers/hlsl hlsl_alias_intrinsics.h

[HLSL] Add WaveActiveBitAnd builtin function (#187149)

This PR adds the WaveActiveBitAnd HLSL function.
Fixes https://github.com/llvm/llvm-project/issues/99166
DeltaFile
+82-0clang/test/CodeGenHLSL/builtins/WaveActiveBitAnd.hlsl
+34-0clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+32-0llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveBitAnd.ll
+23-0clang/test/CodeGenHLSL/builtins/WaveActiveBitAnd-errors.hlsl
+19-0llvm/test/CodeGen/DirectX/WaveActiveBitAnd.ll
+11-0clang/lib/CodeGen/CGHLSLBuiltins.cpp
+201-010 files not shown
+228-116 files

LLVM/project 1aa64ffclang/lib/Sema SemaHLSL.cpp, clang/test/CodeGenHLSL/BasicFeatures InitLists.hlsl

[HLSL] handle hlslAttributedResourceType in init list code (#187813)

Handle HLSL Attributed Resource Type in the init list code. Treat it
like it's a scalar value.
Closes #187568
DeltaFile
+20-0clang/test/CodeGenHLSL/BasicFeatures/InitLists.hlsl
+6-4clang/lib/Sema/SemaHLSL.cpp
+26-42 files

OPNSense/core a528dd8src/opnsense/mvc/app/views/OPNsense/Kea leases6.volt

Fix copy paste error in view from v4 to v6
DeltaFile
+10-1src/opnsense/mvc/app/views/OPNsense/Kea/leases6.volt
+10-11 files

LLVM/project 239ca11mlir/include/mlir/Interfaces MemorySlotInterfaces.td, mlir/lib/Dialect/SCF/IR MemorySlot.cpp

[MLIR][Mem2Reg] Add support for region control flow and SCF (#185036)

This PR adds support for region control-flow. Region control-flow and
CFG can be mixed together in the same program. See the [accompanying
RFC](https://discourse.llvm.org/t/rfc-support-region-control-flow-in-mem2reg/90082)
for some design considerations.

Beyond the considerations in the RFC, a few minor changes were
introduced:

- Calling the visitor hook for defined values is now deferred to the end
of promotion.
- The lazy creation of default values has been moved to the places where
it happens to prepare for a future change where it is actually lazy.
Documentation about it not working as intended for now was also added.

All SCF operations are supported, including `forall` and `parallel`,
which is pretty cool I think.


    [11 lines not shown]
DeltaFile
+1,120-0mlir/test/Dialect/SCF/mem2reg.mlir
+414-206mlir/lib/Transforms/Mem2Reg.cpp
+355-0mlir/lib/Dialect/SCF/IR/MemorySlot.cpp
+160-0mlir/test/Dialect/SCF/mem2reg-reject.mlir
+99-4mlir/include/mlir/Interfaces/MemorySlotInterfaces.td
+74-0mlir/test/Transforms/mem2reg.mlir
+2,222-2106 files not shown
+2,250-21812 files

LLVM/project 49311d4llvm/test/CodeGen/AMDGPU memintrinsic-unroll.ll, llvm/test/CodeGen/X86 vector-interleaved-load-i64-stride-7.ll vector-interleaved-load-i8-stride-8.ll

Merge branch 'main' into users/eas/fix-test
DeltaFile
+6,835-6,798llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+5,208-5,214llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll
+3,046-3,042llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll
+4,523-0llvm/test/tools/llvm-mca/RISCV/SiFiveX100/rvv/arithmetic.test
+2,034-2,026llvm/test/CodeGen/X86/clmul-vector.ll
+2,034-1,998llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll
+23,680-19,0781,532 files not shown
+120,402-51,9621,538 files

LLVM/project 94239b3llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.permlane.ll permlane16_opsel.ll

[AMDGPU][GlobalISel] Add RegBankLegalize rules for permlane16/permlanex16 (#187906)

Add RegBankLegalize rules for the amdgcn_permlane16
and amdgcn_permlanex16 intrinsics. Both intrinsics
are sources of divergence, so only the divergent
case is needed: result, old, and src0 map to VGPR,
while src1 and src2 are SGPR with ReadFirstLane if
divergent.

Update the GISEL RUN lines in llvm.amdgcn.permlane.ll
and permlane16_opsel.ll to use -new-reg-bank-select,
and regenerate check lines. The v8i16 test cases now
produce identical SDAG/GISEL output so their checks
are unified.
DeltaFile
+83-169llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
+5-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+2-2llvm/test/CodeGen/AMDGPU/permlane16_opsel.ll
+90-1713 files

OpenZFS/src ef47c3acontrib/pyzfs setup.py.in, scripts spdxcheck.pl

pyzfs: update license tags/classifiers

The standard for package license metadata[1] is an SPDX identifier in
the `license` field and that's all. So, update that, remove the deprecated
license classifier, and add a tag at the top of the file for
spdxcheck to find.

1. https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18356
DeltaFile
+2-2contrib/pyzfs/setup.py.in
+0-1scripts/spdxcheck.pl
+2-32 files

LLVM/project b32b31eclang/lib/Analysis/LifetimeSafety FactsGenerator.cpp, clang/test/Sema warn-lifetime-safety.cpp

[LifetimeSafety] Fix compiler crash with `static operator()` (#187853)

This PR removes the first argument from the `Args` list (which is `S()`)
before doing lifetime safety checks to ensure correct indexing.

It also adds a test to prevent regressions in the future.

Fixes #187426
<details>
<summary>Bug details</summary>

When calling a `static operator()` directly (with `S()(...)`), we also
store `S()` in `Args` as the first argument, so all indexing is off by
one.
The most interesting part is that `S::operator()(...)` works correctly
and does not add `S()` at the beginning of the argument list, so it does
not crash during lifetime checks.
This solution is probably not the cleanest, but I would love to hear
feedback on where to put it!
</details>
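
A toy model of the off-by-one described above. `CallModel` and `alignedArgs` are hypothetical and do not correspond to anything in FactsGenerator.cpp; they only show why the leading object expression has to be dropped before arguments are matched to parameters by index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a call site.
struct CallModel {
  std::vector<std::string> args; // argument expressions as recorded
  std::size_t numParams;         // number of declared parameters
  bool isStaticOperatorCall;     // true for `S()(...)` on a static operator()
};

// Returns the arguments aligned with the parameters: for a direct call to a
// static operator(), the leading object expression is not a parameter.
std::vector<std::string> alignedArgs(const CallModel &call) {
  std::vector<std::string> args = call.args;
  if (call.isStaticOperatorCall && !args.empty())
    args.erase(args.begin());
  assert(args.size() == call.numParams && "args must line up with params");
  return args;
}

// Usage: both call forms end up with the same aligned argument list.
void demo() {
  CallModel direct{{"S()", "a", "b"}, 2, true};       // models S()(a, b)
  CallModel qualified{{"a", "b"}, 2, false};          // models S::operator()(a, b)
  assert(alignedArgs(direct) == alignedArgs(qualified));
}
```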
DeltaFile
+24-0clang/test/Sema/warn-lifetime-safety.cpp
+8-1clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+32-12 files

LLVM/project 3073f11compiler-rt/lib/interception interception_win.cpp, compiler-rt/lib/sanitizer_common sanitizer_win.cpp sanitizer_unwind_win.cpp

nominmax
DeltaFile
+16-15compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
+5-4compiler-rt/lib/sanitizer_common/sanitizer_unwind_win.cpp
+3-2libcxxabi/src/cxa_personality.cpp
+3-2compiler-rt/lib/interception/interception_win.cpp
+2-1compiler-rt/test/asan/TestCases/Windows/dll_heap_allocation.cpp
+2-1libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.last_write_time/last_write_time.pass.cpp
+31-2595 files not shown
+128-27101 files

FreeNAS/freenas dcb6184src/middlewared/middlewared/plugins/update_ trains.py, src/middlewared/middlewared/plugins/zfs snapshot_hold_release_impl.py

Avoid checking unnecessary trains

(cherry picked from commit 211c0cd4f76638ba0061a4ef7eda9dd48a488fb2)
DeltaFile
+36-12src/middlewared/middlewared/pytest/unit/plugins/update/test_trains.py
+25-4src/middlewared/middlewared/plugins/update_/trains.py
+2-0src/middlewared/middlewared/plugins/zfs/snapshot_hold_release_impl.py
+63-163 files

NetBSD/src QDp6gyRsys/arch/hp300/conf files.hp300, sys/arch/hp300/include hp300spu.h cpu.h

   Use the new M68K_EC_VAC and M68K_EC_PAC options, based on configured
   model.

   As a transitional step, ensure that the new options are consistent with
   the legacy CACHE_HAVE_{PAC,VAC} defines.
VersionDeltaFile
1.57+11-4sys/arch/news68k/include/cpu.h
1.102+7-7sys/arch/hp300/conf/files.hp300
1.18+13-1sys/arch/hp300/include/hp300spu.h
1.80+5-4sys/arch/hp300/include/cpu.h
1.43+2-2sys/arch/news68k/conf/files.news68k
+38-185 files

LLVM/project 90daea7llvm/lib/Target/AMDGPU VOP3PInstructions.td GCNVOPDUtils.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.h

AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

Codegen for v_dual_dot2acc_f32_f16/bf16 for targets that only have the VOP3
version of the instruction.
Since there is no VOP2 version, introduce a temporary MIR DOT2ACC pseudo
that is selected when there are no src_modifiers. This DOT2ACC pseudo
has src2 tied to dst (like the VOP2 version); PostRA pseudo expansion
restores the pseudo to the VOP3 version of the instruction.
CreateVOPD will recognize such a VOP3 pseudo and generate v_dual_dot2acc.
DeltaFile
+166-491llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+96-95llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+31-4llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+21-8llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+19-1llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+15-0llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+348-5994 files not shown
+367-60110 files

LLVM/project af37ac8llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 reduction-across-different-bb.ll

[SLP]Use reduction root explicitly from reduction analysis to avoid non-determinism

Previously, the reduction root was detected using the last member of the UserIgnoreList set, which is unordered. It is better to use the reduction root explicitly to avoid non-determinism in the reduction parent block, which may cause incorrect scale factor estimation for the reduction cost.
DeltaFile
+122-0llvm/test/Transforms/SLPVectorizer/X86/reduction-across-different-bb.ll
+19-16llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+141-162 files

FreeBSD/src b24b533release release.sh, release/scripts pkg-stage.sh

release: Remove not-NO_ROOT cases

We always use NO_ROOT for release artifact builds, so remove the
alternate code paths.

For the first step we set NO_ROOT unconditionally in cases that invoke
submakes, and turn NO_ROOT being unset into an error in lower-level
targets so that we can catch potential out-of-tree build scripts (or
missed in-tree cases) that expect to run not-NO_ROOT builds.  The second
step will be to remove those entirely.

Reviewed by:    cperciva
Sponsored by:   The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D54179

(cherry picked from commit 54e006369c9aab4f3a22f026eb6924c0f9cafda8)
DeltaFile
+83-113release/tools/vmimage.subr
+4-10release/scripts/pkg-stage.sh
+2-11release/tools/azure.conf
+2-8release/tools/vagrant.conf
+2-6release/tools/ec2.conf
+1-3release/release.sh
+94-1516 files

FreeBSD/src 61f0453release Makefile Makefile.vm

release: Use make's `:H` rather than `/..`

In general we want to strip subdir components, rather than appending
`..`s.

Reviewed by:    lwhsu
Sponsored by:   The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D54373

(cherry picked from commit 3949c2b8c4691a6dff8be7b38805d56faab91187)
DeltaFile
+2-2release/Makefile
+1-1release/Makefile.vm
+3-32 files

FreeNAS/freenas 190e6dcsrc/middlewared/middlewared/plugins/update_ trains.py, src/middlewared/middlewared/pytest/unit/plugins/update test_trains.py

Avoid checking unnecessary trains

(cherry picked from commit 211c0cd4f76638ba0061a4ef7eda9dd48a488fb2)
DeltaFile
+27-3src/middlewared/middlewared/pytest/unit/plugins/update/test_trains.py
+25-4src/middlewared/middlewared/plugins/update_/trains.py
+52-72 files

LLVM/project 079be4elibc/utils/libctest format.py

[libc] Support Windows test executables in LibcTest lit format (#188057)

Updated LibcTest to handle Windows test executables:

* Added support for .exe extensions when identifying test executables.
* Skipped the executable bit check on Windows as it is not applicable.
* Updated .params file discovery to look for both <test>.exe.params and
<test>.params.

This allows running libc tests on Windows hosts.
DeltaFile
+31-9libc/utils/libctest/format.py
+31-91 files

LLVM/project 52dd610flang/lib/Semantics check-omp-loop.cpp

Common out similar messages
DeltaFile
+9-8flang/lib/Semantics/check-omp-loop.cpp
+9-81 files

LLVM/project bc61e85libclc/clc/lib/generic/math clc_tgamma.inc clc_tgamma.cl

libclc: Improve tgamma handling
DeltaFile
+213-0libclc/clc/lib/generic/math/clc_tgamma.inc
+12-54libclc/clc/lib/generic/math/clc_tgamma.cl
+225-542 files

LLVM/project 1abf9falibclc/clc/include/clc/math clc_lgamma_r_decl.inc clc_lgamma_r.h, libclc/clc/include/clc/shared unary_with_out_arg_scalarize_loop.inc

libclc: Update lgamma_r

This was originally ported from rocm device libs in
0ab07e1bde7d002f1a4c30babb6241c0cc366320. Merge
in more recent changes.
DeltaFile
+630-0libclc/clc/lib/generic/math/clc_lgamma_r_stret.inc
+27-591libclc/clc/lib/generic/math/clc_lgamma_r.cl
+67-0libclc/clc/include/clc/shared/unary_with_out_arg_scalarize_loop.inc
+16-7libclc/clc/lib/generic/math/clc_lgamma_r.inc
+21-0libclc/clc/include/clc/math/clc_lgamma_r_decl.inc
+3-1libclc/clc/include/clc/math/clc_lgamma_r.h
+764-5991 files not shown
+768-5997 files

LLVM/project da5d421mlir/include/mlir/IR OpImplementation.h, mlir/lib/AsmParser AsmParserImpl.h

[MLIR][TableGen] Make optional enum parser not consume the token when it is not matched (#188008)

Previously the optional parser would consume the token even when it
failed to match a value of the enum and prevented parsers later in the
op syntax from having an attempt. This PR changes that so that the token
is consumed only when the parsing succeeds. This change is made to the
emitted `FieldParser<std::optional<T>>` for enums.

This, for example, allows having a simple list of default valued props
in the assembly format without needing decorations around them. This
mimics the behaviour that is emitted for `DefaultValuedAttribute` when
it is used with `EnumAttr`.

This PR also adds `parseOptionalString` variant with an allow-list
argument as `parseOptionalKeyword` has and adds
`parseOptionalKeywordOrString` allow-list variant which combines these
two into a single utility wrapper. These methods do not consume the
token unless it is from the allow-list.
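
The consume-only-on-match behaviour can be sketched with a toy token cursor. `TokenCursor` and its members are hypothetical and are not the MLIR `AsmParser` API; the sketch only shows that a failed optional parse leaves the position untouched so a later parser can still try the same token.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy token cursor (not the MLIR AsmParser).
struct TokenCursor {
  std::vector<std::string> tokens;
  std::size_t pos = 0;

  // Returns the current token without consuming it.
  const std::string &current() const {
    assert(pos < tokens.size() && "no tokens left");
    return tokens[pos];
  }

  // Returns the matched keyword and advances only if the current token is in
  // the allow-list; otherwise the cursor is left where it was.
  std::optional<std::string>
  parseOptionalKeyword(const std::vector<std::string> &allowed) {
    if (pos >= tokens.size())
      return std::nullopt;
    for (const std::string &kw : allowed) {
      if (tokens[pos] == kw) {
        ++pos;
        return kw;
      }
    }
    return std::nullopt;
  }
};

// Usage: the first enum parser fails without consuming, so the second one
// still sees the same token.
void demo() {
  TokenCursor c;
  c.tokens = {"fast", "relaxed"};
  assert(!c.parseOptionalKeyword({"relaxed", "strict"}).has_value());
  assert(c.current() == "fast");
  assert(c.parseOptionalKeyword({"fast", "slow"}).has_value());
  assert(c.pos == 1);
}
```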
DeltaFile
+44-0mlir/test/IR/enum-attr-roundtrip.mlir
+33-0mlir/lib/AsmParser/AsmParserImpl.h
+28-0mlir/test/lib/Dialect/Test/TestOpsSyntax.td
+14-5mlir/tools/mlir-tblgen/EnumsGen.cpp
+12-0mlir/include/mlir/IR/OpImplementation.h
+9-0mlir/test/lib/Dialect/Test/TestEnumDefs.td
+140-56 files

LLVM/project 5a58e77llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
DeltaFile
+41-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+37-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+3-4llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+4-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+95-71 files not shown
+97-77 files

LLVM/project 3e4efe3llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir coexec-sched-warning.mir

[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling (#169616)

This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.

It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.

It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
DeltaFile
+283-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+124-0llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+43-5llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+46-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+12-9llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+20-0llvm/test/CodeGen/AMDGPU/coexec-sched-warning.mir
+528-144 files not shown
+555-1410 files

LLVM/project d69c670llvm/lib/Target/WebAssembly WebAssemblyTargetTransformInfo.cpp WebAssemblyTargetTransformInfo.h, llvm/test/Transforms/SLPVectorizer/WebAssembly simd-splat-shuffle-cost.ll

[WebAssembly] Add initial shuffle cost capabilities (#187596)

Fixes #178940

Fixes the case where a manual i16x8 or i8x16 splat was not recognized; the i32x4 case still remains.
DeltaFile
+406-0llvm/test/Transforms/SLPVectorizer/WebAssembly/simd-splat-shuffle-cost.ll
+19-0llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
+6-0llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
+431-03 files

LLVM/project cfd1cddopenmp/runtime/src z_Windows_NT-586_util.cpp z_Linux_asm.S

Add indirect for kmp_invoke_microtask
DeltaFile
+26-0openmp/runtime/src/z_Windows_NT-586_util.cpp
+4-0openmp/runtime/src/z_Linux_asm.S
+1-0openmp/runtime/src/z_Windows_NT_util.cpp
+31-03 files