LLVM/project 51dd3ec mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp

[MLIR][OpenMP] Bail early in sortMapIndices if indices are the same (#169474)

If we are given the same index in the comparator callback, simply return
false. Otherwise we will end up adding invalid items to
occludedChildren, causing extra items to get removed that should not be,
resulting in failures that manifest in different forms (assertions, asan
failures, ubsan failures, etc.).
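
The failure mode can be illustrated with a standalone sketch. This is not the actual `sortMapIndices` code: the index-path representation, the `isPrefix` helper and the `sortAndCollectOccluded` function are assumptions made for illustration. The point is that a comparator with side effects must return false for identical indices before any bookkeeping runs, because every path is trivially a prefix of itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

using Path = std::vector<int>; // hypothetical member-index path

// True when `a` is a prefix of `b`. Note that every path is a prefix of itself.
bool isPrefix(const Path &a, const Path &b) {
  return a.size() <= b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// Sorts `order` (positions into `paths`) lexicographically by path. As a side
// effect, records positions whose path extends another entry's path, for the
// pairs the sort happens to compare.
std::set<std::size_t> sortAndCollectOccluded(const std::vector<Path> &paths,
                                             std::vector<std::size_t> &order) {
  std::set<std::size_t> occluded;
  std::sort(order.begin(), order.end(),
            [&](std::size_t lhs, std::size_t rhs) {
              // Bail early: without this check, an entry compared against
              // itself would be recorded as occluded by itself.
              if (lhs == rhs)
                return false;
              if (isPrefix(paths[lhs], paths[rhs]))
                occluded.insert(rhs);
              else if (isPrefix(paths[rhs], paths[lhs]))
                occluded.insert(lhs);
              return paths[lhs] < paths[rhs];
            });
  return occluded;
}
```

Returning false for equal arguments is also what the strict weak ordering requirement of `std::sort` demands, so the early exit is safe regardless of what the rest of the comparator does.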
Delta File
+6-0 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+6-0 1 files

LLVM/project d6c16a7 flang/lib/Lower/Support ReductionProcessor.cpp

post-rebase fixes
Delta File
+2-1 flang/lib/Lower/Support/ReductionProcessor.cpp
+2-1 1 files

LLVM/project af41e99 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

review comments, Michael
Delta File
+4-4 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+2-2 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+6-6 2 files

LLVM/project 070b6c4 flang/lib/Lower/Support ReductionProcessor.cpp

review comments, Tom
Delta File
+2-3 flang/lib/Lower/Support/ReductionProcessor.cpp
+2-3 1 files

LLVM/project 8f2acf6 flang/include/flang/Optimizer/Dialect FIROps.td, flang/lib/Lower/Support ReductionProcessor.cpp

review comments, Pranav
Delta File
+7-2 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+6-2 flang/include/flang/Optimizer/Dialect/FIROps.td
+4-0 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1-1 flang/lib/Lower/Support/ReductionProcessor.cpp
+18-5 4 files

LLVM/project ec80eb7 clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

try to fix Windows build
Delta File
+2-2 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1-1 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+1-1 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+4-4 3 files

LLVM/project d715c56 flang/lib/Lower/Support ReductionProcessor.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

Add `data_ptr_ptr` region to `declare_reduction` op.
Delta File
+54-28 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+40-5 flang/lib/Lower/Support/ReductionProcessor.cpp
+37-0 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+12-4 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+10-5 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+11-2 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+164-44 4 files not shown
+183-47 10 files

LLVM/project 1863366 flang/lib/Lower/Support ReductionProcessor.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

[OpenMP][flang] Add initial support for by-ref reductions on the GPU

Adds initial support for GPU by-ref reductions. In particular, this diff
adds support for reductions on scalar allocatables where reductions
happen on loops nested in `target` regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target

    [12 lines not shown]
```
Delta File
+126-35 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+92-0 mlir/test/Target/LLVMIR/allocatable_gpu_reduction.mlir
+40-13 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+20-4 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+9-1 flang/lib/Lower/Support/ReductionProcessor.cpp
+4-4 mlir/test/Target/LLVMIR/omptarget-multi-reduction.mlir
+291-57 28 files not shown
+327-90 34 files

LLVM/project e1b0873 llvm/lib/CodeGen RegisterCoalescer.cpp, llvm/test/CodeGen/AArch64 pr164181-reduced.ll pr151592.mir

Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""

This reverts commit bb78728826ff57f3df859e79bfd857b5a175bb6d.
Delta File
+0-451 llvm/test/CodeGen/X86/subreg-to-reg-coalescing.mir
+0-185 llvm/test/CodeGen/X86/coalescer-breaks-subreg-to-reg-liveness.ll
+14-171 llvm/lib/CodeGen/RegisterCoalescer.cpp
+0-183 llvm/test/CodeGen/AArch64/pr164181-reduced.ll
+0-176 llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
+0-168 llvm/test/CodeGen/AArch64/pr151592.mir
+14-1,334 25 files not shown
+79-1,823 31 files

LLVM/project bc4143b llvm/include/llvm/CodeGen SDPatternMatch.h, llvm/unittests/CodeGen SelectionDAGPatternMatchTest.cpp

[DAG] SDPatternMatch - add m_SpecificFP matcher (#167438)

This patch introduces SpecificFP matcher for SelectionDAG nodes.

This includes:

Adding SpecificFP_match() in SDPatternMatch.h.
Adding test coverage in SelectionDAGPatternMatchTest.cpp.

Closes #165566
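
As a rough illustration of what a "specific floating-point constant" matcher does, here is a toy sketch. It does not use LLVM's `SDValue` or `SDPatternMatch.h`: the `Node` type and the `specificFP` and `matchNode` helpers are hypothetical stand-ins, and the real matcher may compare values differently (for example via `APFloat`).

```cpp
// Toy stand-in for a DAG node; not an LLVM type.
struct Node {
  enum Kind { ConstantFP, Other } kind;
  double value; // only meaningful when kind == ConstantFP
};

// Matches only a floating-point constant node holding exactly `expected`.
struct SpecificFPMatch {
  double expected;
  bool match(const Node &n) const {
    return n.kind == Node::ConstantFP && n.value == expected;
  }
};

// Factory in the style of the m_* helpers (hypothetical name).
inline SpecificFPMatch specificFP(double v) { return SpecificFPMatch{v}; }

// Generic entry point: any pattern type with a match(const Node &) method works.
template <typename Pattern>
bool matchNode(const Node &n, const Pattern &p) {
  return p.match(n);
}
```

A caller would write something like `matchNode(node, specificFP(1.0))` to test whether `node` is the constant 1.0.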
Delta File
+40-0 llvm/unittests/CodeGen/SelectionDAGPatternMatchTest.cpp
+22-0 llvm/include/llvm/CodeGen/SDPatternMatch.h
+62-0 2 files

LLVM/project c4b8de2 llvm/test/tools/UpdateTestChecks/update_givaluetracking_test_checks/Inputs const.mir.expected const.mir

Fixup test that I somehow missed
Delta File
+9-9 llvm/test/tools/UpdateTestChecks/update_givaluetracking_test_checks/Inputs/const.mir.expected
+6-6 llvm/test/tools/UpdateTestChecks/update_givaluetracking_test_checks/Inputs/const.mir
+15-15 2 files

LLVM/project 4b137e7 lldb/source/Plugins/UnwindAssembly/InstEmulation UnwindAssemblyInstEmulation.cpp

[lldb][NFC] Remove code duplication in favour of a named variable in UnwindAssemblyInstEmulation (#169369)

Delta File
+5-5 lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+5-5 1 files

LLVM/project c98ca39 llvm/test/CodeGen/AArch64 arm64-opt-remarks-lazy-bfi.ll

Update AArch64 test for name change
Delta File
+2-2 llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
+2-2 1 files

LLVM/project cb63e99 llvm/test/Transforms/LoopVectorize vplan-printing.ll vplan-printing-reductions.ll, llvm/test/Transforms/LoopVectorize/AArch64 synthesize-mask-for-call.ll

[VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466)

The change is non-functional with respect to emitted IR.
Delta File
+20-20 llvm/test/Transforms/LoopVectorize/vplan-printing.ll
+14-14 llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+6-6 llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll
+6-6 llvm/test/Transforms/LoopVectorize/vplan-widen-struct-return.ll
+6-6 llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll
+6-6 llvm/test/Transforms/LoopVectorize/X86/vplan-vp-intrinsics.ll
+58-58 12 files not shown
+87-87 18 files

LLVM/project b769f41 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

Remove [[maybe_unused]]

Change-Id: I762a5672deffef003dea39832f8fa11c202a78cb
Delta File
+1-2 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+1-2 1 files

LLVM/project 2ccc561 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

Add comment

Change-Id: I29831358c352e68eb5838bb4d8f2e424ba415adb
Delta File
+5-0 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+5-0 1 files

LLVM/project 847d296 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

Make transitions to OFF explicit

Change-Id: I80a6f954f308269684f205098ee43eb20e1bd670
Delta File
+11-9 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+11-9 1 files

LLVM/project 840c080 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp, llvm/test/CodeGen/AArch64 sme-za-exceptions.ll

Add extra test (and missing transition)

Change-Id: If4df9272e1951487a0491b734293f3265024c6a9
Delta File
+149-0 llvm/test/CodeGen/AArch64/sme-za-exceptions.ll
+3-2 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+152-2 2 files

LLVM/project 47a8e8f llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

TODO

Change-Id: Ib00ead9b59fdf77151ea21a26341d3d4b6502a32
Delta File
+1-1 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+1-1 1 files

LLVM/project f6a6d49 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sme-za-exceptions.ll sme-zt0-state.ll

[AArch64][SME] Support saving/restoring ZT0 in the MachineSMEABIPass

This patch extends the MachineSMEABIPass to support ZT0. This is done
with the addition of two new states:

  - `ACTIVE_ZT0_SAVED`
    * This is used when calling a function that shares ZA, but does
      not share ZT0 (i.e., no ZT0 attributes).
    * This state indicates ZT0 must be saved to the save slot, but
      must remain on, with no lazy save setup
  - `LOCAL_COMMITTED`
    * This is used for saving ZT0 in functions without ZA state.
    * This state indicates ZA is off and ZT0 has been saved.
    * This state is general enough to support ZA, but that support
      has not been implemented†

To aid with readability, the state transitions have been reworked to a
switch of `transitionFrom(<FromState>).to(<ToState>)`, rather than
nested ifs, which helps manage more transitions.
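
One way such a `transitionFrom(<FromState>).to(<ToState>)` switch could be built is to pack the two states into a single integer key with `constexpr` helpers. The sketch below is a guess at the general shape, not the pass's code: the `Active` state, the `Action` enum and the chosen actions are assumptions for illustration. Only `ACTIVE_ZT0_SAVED` and `LOCAL_COMMITTED` are taken from the description above.

```cpp
// States named in the commit message, plus an assumed Active state.
enum class ZAState : unsigned { Active, ActiveZT0Saved, LocalCommitted, Off };

// Illustrative actions; the real pass emits machine instructions instead.
enum class Action { SaveZT0, RestoreZT0, SaveZT0AndTurnZAOff, Unhandled };

// Packs a (from, to) pair into one integer usable as a case label.
struct TransitionSource {
  ZAState from;
  constexpr unsigned to(ZAState dest) const {
    return (static_cast<unsigned>(from) << 8) | static_cast<unsigned>(dest);
  }
};

constexpr TransitionSource transitionFrom(ZAState from) {
  return TransitionSource{from};
}

// One flat switch replaces nested ifs over the from-state and the to-state.
Action actionFor(ZAState from, ZAState to) {
  switch (transitionFrom(from).to(to)) {
  case transitionFrom(ZAState::Active).to(ZAState::ActiveZT0Saved):
    return Action::SaveZT0; // ZT0 saved, ZA stays on, no lazy save
  case transitionFrom(ZAState::ActiveZT0Saved).to(ZAState::Active):
    return Action::RestoreZT0;
  case transitionFrom(ZAState::Active).to(ZAState::LocalCommitted):
    return Action::SaveZT0AndTurnZAOff; // ZA off, ZT0 saved
  default:
    return Action::Unhandled;
  }
}
```

Because every case label is a compile-time constant, the compiler rejects a duplicated transition, which is harder to guarantee with nested conditionals.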

    [5 lines not shown]
Delta File
+150-26 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+96-28 llvm/test/CodeGen/AArch64/sme-za-exceptions.ll
+60-44 llvm/test/CodeGen/AArch64/sme-zt0-state.ll
+8-3 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+6-0 llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+0-4 llvm/test/CodeGen/AArch64/sme-peephole-opts.ll
+320-105 1 files not shown
+321-105 7 files

LLVM/project 7abe483 llvm/utils/UpdateTestChecks mir.py

python formatting, please ignore
Delta File
+2-0 llvm/utils/UpdateTestChecks/mir.py
+2-0 1 files

LLVM/project 0516726 llvm/lib/Target/AMDGPU SIFrameLowering.cpp

Clang format... Please ignore
Delta File
+6-6 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+6-6 1 files

LLVM/project ea8199e flang-rt/lib/runtime cudadevice.f90 __ppc_intrinsics.f90, flang/module cudadevice.f90 __ppc_intrinsics.f90

Merge branch 'main' into users/rovka/inheritable-legacy-fp-manager
Delta File
+2,242-0 flang-rt/lib/runtime/cudadevice.f90
+0-2,242 flang/module/cudadevice.f90
+1,911-0 flang-rt/lib/runtime/__ppc_intrinsics.f90
+0-1,911 flang/module/__ppc_intrinsics.f90
+456-878 llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+1,122-0 flang-rt/lib/runtime/mma.f90
+5,731-5,031 117 files not shown
+9,548-9,129 123 files

LLVM/project 795dbd1 llvm/include/llvm/CodeGen TargetPassConfig.h, llvm/lib/CodeGen TargetPassConfig.cpp

[AMDGPU] Insert inliner anchor earlier

Add a new hook for inserting passes right after the last DummyCGSCC pass
and use it to insert the anchor. This changes the last FunctionPass
manager to be an inlining pass manager, thus preserving some of the
analyses that might be computed before the inliner and used after it (to
be fair that's never going to be a lot of analyses, since inlining is
pretty plastic, but at least some of the IR-level analyses that have
absolutely no reason to change can be computed only once).
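
The hook pattern described here can be sketched in isolation. The class names, the `addPostCGSCCPasses` hook name and the pass names below are all hypothetical, since the commit message does not name the hook; the sketch only shows a base pipeline builder calling an overridable extension point at a fixed position.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a generic pass configuration.
class PassConfigSketch {
public:
  virtual ~PassConfigSketch() = default;

  // Builds the pipeline as a list of pass names, invoking the hook right
  // after the (placeholder) CGSCC pass.
  std::vector<std::string> buildPipeline() {
    std::vector<std::string> passes;
    passes.push_back("ir-passes");
    passes.push_back("dummy-cgscc");
    addPostCGSCCPasses(passes); // extension point for targets
    passes.push_back("isel");
    return passes;
  }

protected:
  // Default: targets that do not care insert nothing.
  virtual void addPostCGSCCPasses(std::vector<std::string> &) {}
};

// Hypothetical target that uses the hook to insert its anchor pass.
class TargetConfigSketch : public PassConfigSketch {
protected:
  void addPostCGSCCPasses(std::vector<std::string> &passes) override {
    passes.push_back("inliner-anchor");
  }
};
```

With a default no-op implementation, existing targets keep their pipelines unchanged and only the overriding target sees the extra pass.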

This is how I originally designed the code, but I don't feel like I have
a good name/abstraction for this exact point in the pipeline, hence the
separate patch.
Delta File
+9-0 llvm/include/llvm/CodeGen/TargetPassConfig.h
+2-5 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+6-0 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+3-1 llvm/lib/CodeGen/TargetPassConfig.cpp
+20-6 4 files

LLVM/project 7361d2a llvm/lib/Target/AMDGPU AMDGPUMachineLevelInliner.cpp AMDGPUMachineLevelInliner.h, llvm/test/CodeGen/AMDGPU amdgpu-machine-level-inliner-mfi.mir pal-metadata-3.6-inliner.ll

[AMDGPU] Update machine frame info during inlining

Update some of the machine frame info while inlining functions. The
stack of the caller will now contain an additional object representing
the stacks of its callees that have been inlined.

Also update some other info such as HasCalls and a few other pieces of
info that are trivial to update (this isn't very thorough or exhaustive,
and notably doesn't handle tail calls).
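
A minimal sketch of the bookkeeping described above, under stated assumptions. The `FrameSketch` type is a toy, not `MachineFrameInfo`. It assumes the single extra caller object is sized to the largest inlined callee frame, and that `hasCalls` is merged conservatively with a logical OR. Neither detail is confirmed by the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy frame description (hypothetical; not LLVM's MachineFrameInfo).
struct FrameSketch {
  std::vector<std::uint64_t> objectSizes; // one entry per stack object
  int inlinedCalleesSlot = -1;            // index of the shared object, if any
  bool hasCalls = false;

  std::uint64_t totalSize() const {
    return std::accumulate(objectSizes.begin(), objectSizes.end(),
                           std::uint64_t{0});
  }
};

// Records that `callee` was inlined into `caller`.
// Assumption: one shared object covers all inlined callees and grows to the
// largest callee frame seen so far.
void noteInlinedCallee(FrameSketch &caller, const FrameSketch &callee) {
  const std::uint64_t needed = callee.totalSize();
  if (caller.inlinedCalleesSlot < 0) {
    caller.objectSizes.push_back(needed);
    caller.inlinedCalleesSlot = static_cast<int>(caller.objectSizes.size()) - 1;
  } else {
    std::uint64_t &slot =
        caller.objectSizes[static_cast<std::size_t>(caller.inlinedCalleesSlot)];
    slot = std::max(slot, needed);
  }
  // Assumption: conservative merge; calls made by the callee now belong to
  // the caller's body.
  caller.hasCalls = caller.hasCalls || callee.hasCalls;
}
```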
Delta File
+651-0 llvm/test/CodeGen/AMDGPU/amdgpu-machine-level-inliner-mfi.mir
+199-0 llvm/test/CodeGen/AMDGPU/pal-metadata-3.6-inliner.ll
+123-0 llvm/test/CodeGen/AMDGPU/amdgpu-machine-level-inliner.ll
+99-0 llvm/test/CodeGen/AMDGPU/amdgpu-machine-level-inliner.mir
+71-0 llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.cpp
+16-0 llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.h
+1,159-0 6 files

LLVM/project 244e62b llvm/lib/Target/AMDGPU SIFrameLowering.cpp GCNSubtarget.h

[AMDGPU] Move getScratchScaleFactor to ST. NFC
Delta File
+6-10 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+4-0 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+10-10 2 files

LLVM/project 6e49828 llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs amdgpu-inlined-function.ll.expected, llvm/test/tools/UpdateTestChecks/update_mir_test_checks/Inputs inlined-function.mir.expected inlined-function-swapped.mir.expected

Support inlining in backend update scripts

Generate CHECK-NOT for MIR functions that are missing from the output.
Also look for conflicts where a MIR function is generated for some runs
but not others with the same prefixes.
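
The decision logic can be sketched as follows. The actual change lives in the Python script `mir.py`; this is a C++ rendering of the idea only, and the `planChecks` function, the `CheckPlan` type and the exact text of the emitted lines are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct CheckPlan {
  std::vector<std::string> lines;     // check lines to emit
  std::vector<std::string> conflicts; // functions present in only some runs
};

// `outputsPerRun` holds, for each RUN line sharing `prefix`, the set of MIR
// functions that run actually produced.
CheckPlan planChecks(const std::string &prefix,
                     const std::vector<std::string> &sourceFunctions,
                     const std::vector<std::set<std::string>> &outputsPerRun) {
  CheckPlan plan;
  for (const std::string &fn : sourceFunctions) {
    std::size_t present = 0;
    for (const std::set<std::string> &out : outputsPerRun)
      present += out.count(fn);

    if (present == 0)
      // Missing everywhere (e.g. inlined away): assert its absence.
      plan.lines.push_back("# " + prefix + "-NOT: name: " + fn);
    else if (present == outputsPerRun.size())
      plan.lines.push_back("# " + prefix + "-LABEL: name: " + fn);
    else
      // Same prefix, different outcomes: no single check line can be right.
      plan.conflicts.push_back(fn);
  }
  return plan;
}
```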
Delta File
+129-0 llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu-inlined-function.ll.expected
+55-8 llvm/utils/UpdateTestChecks/mir.py
+34-0 llvm/test/tools/UpdateTestChecks/update_mir_test_checks/Inputs/inlined-function.mir.expected
+34-0 llvm/test/tools/UpdateTestChecks/update_mir_test_checks/Inputs/inlined-function-swapped.mir.expected
+27-0 llvm/test/tools/UpdateTestChecks/update_mir_test_checks/Inputs/inlined-function.mir
+27-0 llvm/test/tools/UpdateTestChecks/update_mir_test_checks/Inputs/inlined-function-swapped.mir
+306-8 6 files not shown
+352-8 12 files

LLVM/project 64c9485 llvm/lib/Target/AMDGPU AMDGPUMachineLevelInliner.cpp AMDGPUMachineLevelInliner.h, llvm/test/CodeGen/AMDGPU amdgpu-machine-level-inliner.ll amdgpu-machine-level-inliner.mir

[AMDGPU] Actually perform the machine level inlining

Errors out if there's a recursive call.
Delta File
+188-14 llvm/test/CodeGen/AMDGPU/amdgpu-machine-level-inliner.ll
+164-1 llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.cpp
+63-12 llvm/test/CodeGen/AMDGPU/amdgpu-machine-level-inliner.mir
+8-0 llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.h
+423-27 4 files

LLVM/project 5e7631e llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchLSXInstrInfo.td, llvm/test/CodeGen/LoongArch/lasx and-not-combine.ll

[LoongArch][DAGCombiner] Combine vand (vnot ..) to vandn (#161037)

After this commit, DAGCombiner will have more opportunities to perform
vector folding. This patch includes several foldings, as follows:
- VANDN(x, NOT(y)) -> AND(NOT(x), NOT(y)) -> NOT(OR(x, y))
- VANDN(x, SplatVector(Imm)) -> AND(NOT(x), NOT(SplatVector(~Imm)))
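
Both folds rest on plain bitwise identities, which can be checked on a scalar model of a single lane. The sketch below assumes `VANDN(a, b)` computes `(~a) & b`, as the first rewrite step in the list implies; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Scalar model of one vector lane: vandn(a, b) computes (~a) & b.
constexpr std::uint64_t vandn(std::uint64_t a, std::uint64_t b) {
  return ~a & b;
}

// First fold: VANDN(x, NOT(y)) == NOT(OR(x, y)), by De Morgan's law.
constexpr bool firstFoldHolds(std::uint64_t x, std::uint64_t y) {
  return vandn(x, ~y) == ~(x | y);
}

// Second fold: VANDN(x, splat(imm)) == AND(NOT(x), NOT(splat(~imm))),
// since NOT(~imm) is imm again.
constexpr bool secondFoldHolds(std::uint64_t x, std::uint64_t imm) {
  return vandn(x, imm) == (~x & ~(~imm));
}

// Compile-time spot checks on arbitrary bit patterns.
static_assert(firstFoldHolds(0xF0F0F0F0F0F0F0F0ULL, 0x00FF00FF00FF00FFULL),
              "first fold must hold");
static_assert(secondFoldHolds(0x123456789ABCDEF0ULL, 0x00000000000000FFULL),
              "second fold must hold");
```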
Delta File
+148-0 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+26-54 llvm/test/CodeGen/LoongArch/lasx/and-not-combine.ll
+22-43 llvm/test/CodeGen/LoongArch/lsx/and-not-combine.ll
+14-13 llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+13-13 llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+223-123 5 files

LLVM/project f287abd llvm/include/llvm/CodeGen ValueTypes.td, llvm/lib/Target/NVPTX NVPTXISelLowering.cpp

[DAG][X86] Improve custom i256/i512 AVX512 CTLZ/CTTZ Handling with MVT::i256/i512 (#168860)

This patch proposes to move the AVX512 CTLZ/CTTZ i256/i512 codegen to
ReplaceNodeResults to allow them to be declared as custom lowering -
this allows expansion of larger int types (e.g. i1024) to fall back to
them during their expansion.

However, to declare these i256/i512 ops as custom, we need to add
MVT::i256/i512 simple types - I'm intending to add further large integer
handling in the future, some of which will use vector register
instructions, and it's going to be much easier if this can be handled
with i128/i256/i512 types that match the vector register sizes.

This exposed a regression in NVPTX due to their use of EVT::isSimple()
to match their upper integer size bounds.
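
The "fall back during expansion" idea is the usual split of a wide count-leading-zeros into two half-width operations. The sketch below shows that split for a 128-bit value built from two 64-bit halves. It is a generic illustration, not the X86 lowering, and an i1024 built from i512 pieces would follow the same shape with larger halves.

```cpp
#include <cstdint>

// Count leading zeros of a 64-bit value; returns 64 for zero.
unsigned ctlz64(std::uint64_t v) {
  unsigned n = 0;
  for (std::uint64_t bit = 1ULL << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Wide CTLZ expressed through the narrower one: if the high half is
// non-zero it decides the result, otherwise add its full width to the
// count from the low half.
unsigned ctlz128(std::uint64_t hi, std::uint64_t lo) {
  return hi != 0 ? ctlz64(hi) : 64 + ctlz64(lo);
}
```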
Delta File
+456-878 llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+248-246 llvm/include/llvm/CodeGen/ValueTypes.td
+61-67 llvm/lib/Target/X86/X86ISelLowering.cpp
+2-2 llvm/test/TableGen/CPtrWildcard.td
+2-1 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+769-1,194 5 files