LLVM/project 835c015 flang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[flang][OpenMP] Lower target in_reduction for host fallback

Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.

Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.

Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
Delta  File
+153 -14  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+77 -36  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+110 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+107 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+72 -19  flang/lib/Lower/OpenMP/OpenMP.cpp
+75 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction-multi.mlir
+594 -72  7 files not shown
+764 -89  13 files

LLVM/project 9eee6c0 llvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64InstrFormats.td, llvm/test/MC/AArch64 arm64-aliases.s brbe.s

[AArch64][llvm] Define APAS, BRB and TRCIT as SYS aliases (#203563)

`APAS`, `BRB IALL/INJ` and `TRCIT` use `SYS` encodings, so define them
as aliases of `SYSxt` instead of separate instructions.

Check that the preferred architectural aliases are printed when their
features are enabled and that disassembly falls back to the generic `SYS`
spelling when not enabled.
Delta  File
+27 -0  llvm/test/MC/AArch64/arm64-aliases.s
+8 -13  llvm/lib/Target/AArch64/AArch64InstrInfo.td
+0 -19  llvm/lib/Target/AArch64/AArch64InstrFormats.td
+4 -0  llvm/test/MC/AArch64/brbe.s
+39 -32  4 files

LLVM/project 74ac7c9 lldb/packages/Python/lldbsuite/test lldbtest.py, lldb/test/API/commands/expression/anonymous-struct TestCallUserAnonTypedef.py

[lldb][test] Introduce build_and_run test utility (#194386)

We currently have several hundred tests that require a running process in a
given state, and therefore perform the same three tasks:

* compile a test executable
* set a breakpoint by finding a source regex
* then launch the test process to hit that breakpoint.

A large chunk of these tests do this exact same setup with various
versions of copied boilerplate code. The different versions we have all
have different conventions of naming the breakpoint comment, the main
file (and whether it should be resolved), and different generated error
messages if things go wrong.

We already have a standardized and much shorter way of doing this in
LLDB (see below), but this still encourages test writers to specify
non-standard file names and non-standard breakpoint comment names.


    [15 lines not shown]
Delta  File
+33 -1  lldb/packages/Python/lldbsuite/test/lldbtest.py
+1 -4  lldb/test/API/commands/expression/anonymous-struct/TestCallUserAnonTypedef.py
+1 -4  lldb/test/API/commands/expression/dollar-in-variable/TestDollarInVariable.py
+1 -4  lldb/test/API/lang/objc/bitfield_ivars/TestBitfieldIvars.py
+1 -4  lldb/test/API/lang/objcxx/conflicting-names-class-update-utility-expr/TestObjCConflictingNamesForClassUpdateExpr.py
+37 -17  5 files

LLVM/project 33bb0e7 lldb/packages/Python/lldbsuite/test lldbpexpect.py

[lldb][test] Faster shut down for pexpect tests (#201171)

Our pexpect tests spend most of their time in the shutdown logic
waiting for the test child to shut down. For example, our editline
tests spend about 95% of their 40s runtime just waiting for the
pexpect child to terminate.

One of the reasons is that the ptyprocess terminate approach
uses a timeout to give the child time to shut down and be cleaned
up by the kernel. While this timeout makes sense, our timeout is
extremely long (6s) since 56fb7456950d2564d16500e40c5719c954a6987a .

Because the default ptyprocess implementation is designed for very
short timeouts (0.1s), it just sleeps and then checks the process
status. For our long timeout, the child most likely already terminated
way before the timeout on a fast system. However, because we have
some very slow builders, we cannot reduce this timeout without
making tests flaky again.
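
One general way to keep a long timeout for slow builders without always paying for it is to poll the child and return as soon as it has exited. The sketch below only illustrates that idea. `wait_for_exit` and its parameters are hypothetical names, and it uses the standard `subprocess` module rather than ptyprocess, so it should not be read as the code in this patch.

```python
import subprocess
import sys
import time


def wait_for_exit(proc, timeout=6.0, poll_interval=0.05):
    """Return True as soon as proc has exited, False once timeout elapses.

    Hypothetical helper: instead of sleeping for the whole timeout and
    checking once, check the process status repeatedly in short steps.
    """
    deadline = time.monotonic() + timeout
    while True:
        if proc.poll() is not None:
            return True  # child already terminated, no need to keep waiting
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # timeout reached, child is still running
        time.sleep(min(poll_interval, remaining))


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    print("exited in time:", wait_for_exit(child))
```

With this shape the timeout only bounds the worst case, so a fast machine returns shortly after the child exits while a slow builder still gets the full grace period.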


    [7 lines not shown]
Delta  File
+74 -5  lldb/packages/Python/lldbsuite/test/lldbpexpect.py
+74 -5  1 files

LLVM/project b24bc4d llvm/lib/Target/AMDGPU GCNVOPDUtils.cpp VOP3PInstructions.td, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp

AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3

For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark
them with usesCustomInserter which will be used to add pre-RA register
allocation hints to preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking if it is VOP2 like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
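
The eligibility conditions listed above can be modelled as a small predicate. This is a toy Python model, not the C++ in GCNVOPDUtils.cpp: the `Operand` and `Dot2` classes and the `commute` and `is_vop2_like` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Operand:
    kind: str   # "reg" for a register, "lit" for a literal (assumed encoding)
    value: str


@dataclass(frozen=True)
class Dot2:
    dst: Operand
    src0: Operand
    src1: Operand
    src2: Operand
    has_source_modifiers: bool = False
    has_clamp: bool = False


def commute(inst):
    """Swap src0 and src1, which is what marking the op commutable permits."""
    return replace(inst, src0=inst.src1, src1=inst.src0)


def is_vop2_like(inst):
    """dst==src2, no source modifiers, no clamp, and src1 is a register."""
    return (
        inst.dst == inst.src2
        and not inst.has_source_modifiers
        and not inst.has_clamp
        and inst.src1.kind == "reg"
    )


v0 = Operand("reg", "v0")
v1 = Operand("reg", "v1")
lit = Operand("lit", "0x3c003c00")

# A literal in src1 fails the check until the operands are commuted.
with_literal_in_src1 = Dot2(dst=v0, src0=v1, src1=lit, src2=v0)
```

In this model `is_vop2_like(with_literal_in_src1)` is false, while `is_vop2_like(commute(with_literal_in_src1))` is true, which is the reason the message gives for marking both instructions commutable.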

The original patch had a bug: it did not check whether physical src
registers match the register class of the appropriate operand in fullVOPD
instructions. That check is now done via isValidVOPDSrc.
Delta  File
+442 -520  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.ll
+163 -69  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll
+34 -1  llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+8 -5  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+8 -0  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+6 -0  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+661 -595  1 files not shown
+663 -597  7 files

LLVM/project cb1821d lldb/source/Expression IRExecutionUnit.cpp, lldb/test/API/lang/c/libc_calls TestLibcCalls.py main.c

[lldb] Avoid calling dyld's versions of libc functions (#201829)

dyld ships with its own version of various libc functions that we are
not supposed to call. This patch prevents the expression evaluator from
calling them by respecting the existing list of forbidden modules.
Delta  File
+75 -0  lldb/test/API/lang/c/libc_calls/TestLibcCalls.py
+11 -0  lldb/test/API/lang/c/libc_calls/main.c
+8 -0  lldb/source/Expression/IRExecutionUnit.cpp
+3 -0  lldb/test/API/lang/c/libc_calls/Makefile
+97 -0  4 files

LLVM/project c1ec4b3 flang/include/flang/Optimizer/Dialect FIROps.td FIROps.h, flang/lib/Optimizer/Dialect FIROps.cpp

[flang][mem2reg] promote memory slots through declares (#196975)

Leverage the new mem2reg APIs for views to remove the
"same block" limitation over fir.declare mem2reg, and to allow mem2reg
over fir.convert so that mixed dialect mem2reg with fir + memref is
possible.

Note that fir.declare_value for memory used with different value types
will be dropped (e.g. EQUIVALENCE). A later patch will improve
fir.declare_value to carry the variable type independently of the value
(like in LLVM), but more work is needed anyway to enable mem2reg with
equivalence, given that their storage is an array of bytes.

Assisted by: Claude
Delta  File
+195 -16  flang/test/Fir/mem2reg.mlir
+111 -24  flang/lib/Optimizer/Dialect/FIROps.cpp
+10 -3  flang/include/flang/Optimizer/Dialect/FIROps.td
+1 -0  flang/include/flang/Optimizer/Dialect/FIROps.h
+317 -43  4 files

LLVM/project b85c748 llvm/lib/Target/Mips MipsSEISelLowering.cpp MipsMSAInstrInfo.td, llvm/test/CodeGen/Mips/msa f16-llvm-ir.ll

[MIPS] soft-promote `f16` also when using `+msa` (#203065)

Fixes https://github.com/llvm/llvm-project/issues/202808

Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.

In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.

I don't really have a good way of testing this. The assembly changes
look reasonable but it's easy to miss something subtle of course. I've
tried to break the change up into smaller commits but it's still kind of
a lot.
Delta  File
+966 -1,105  llvm/test/CodeGen/Mips/msa/f16-llvm-ir.ll
+103 -416  llvm/lib/Target/Mips/MipsSEISelLowering.cpp
+0 -37  llvm/lib/Target/Mips/MipsMSAInstrInfo.td
+2 -14  llvm/lib/Target/Mips/MipsSEISelLowering.h
+0 -6  llvm/lib/Target/Mips/MipsRegisterInfo.td
+1 -1  llvm/lib/Target/Mips/MipsScheduleI6400.td
+1,072 -1,579  2 files not shown
+1,072 -1,583  8 files

LLVM/project f1ec325 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512bwvl.ll

[SelectionDAG] Fold subvector inserts into concat operands (#200937)

Push insert_subvector into the containing CONCAT_VECTORS operand when
the insertion is wholly contained there.
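
The fold can be pictured with plain Python lists standing in for vectors. This is only a model of the idea, not the DAGCombiner code: `insert_subvector`, `concat_vectors` and `fold_insert_into_concat` are hypothetical helpers, and the model assumes all concat operands have the same length and the inserted subvector is non-empty.

```python
def insert_subvector(vec, sub, idx):
    """Return vec with sub written over elements idx .. idx+len(sub)-1."""
    assert 0 <= idx and idx + len(sub) <= len(vec)
    return vec[:idx] + sub + vec[idx + len(sub):]


def concat_vectors(operands):
    """Concatenate equally sized operand vectors into one vector."""
    return [elt for op in operands for elt in op]


def fold_insert_into_concat(operands, sub, idx):
    """Push the insert into a single concat operand when it fits there.

    Returns the new operand list, or None when the inserted range spans
    more than one operand and the fold does not apply.
    """
    width = len(operands[0])
    assert all(len(op) == width for op in operands)
    first = idx // width                   # operand holding the first element
    last = (idx + len(sub) - 1) // width   # operand holding the last element
    if first != last:
        return None
    new_operands = list(operands)
    new_operands[first] = insert_subvector(operands[first], sub, idx - first * width)
    return new_operands


ops = [[0, 1, 2, 3], [4, 5, 6, 7]]
folded = fold_insert_into_concat(ops, [9, 9], 4)
```

Here `folded` is `[[0, 1, 2, 3], [9, 9, 6, 7]]`, and concatenating it gives the same vector as inserting `[9, 9]` at index 4 of the original concatenation.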

AI note: an LLM generated the code and the test, I've read them
Delta  File
+8 -36  llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
+31 -10  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+39 -46  2 files

LLVM/project 4af8d82 llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXAsmPrinter.cpp

Revert "[NVPTX] Rip out vestigial variadic support (NFC)" (#204106)

Reverts llvm/llvm-project#202385
Delta  File
+230 -64  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+21 -11  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+9 -4  llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+8 -0  llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+268 -79  4 files

LLVM/project a8236a6 flang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[flang][OpenMP] Lower target in_reduction for host fallback

Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.

Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.

Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
Delta  File
+143 -14  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+77 -36  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+107 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+72 -19  flang/lib/Lower/OpenMP/OpenMP.cpp
+83 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+75 -0  mlir/test/Target/LLVMIR/openmp-target-in-reduction-multi.mlir
+557 -72  6 files not shown
+685 -89  12 files

LLVM/project 6e3da4d llvm/lib/CodeGen MachineSink.cpp, llvm/test/CodeGen/X86 machine-sink-dbg-loc.mir

Fix machine-sink using debug instruction source locations in merged locations (#203900)
Delta  File
+123 -0  llvm/test/CodeGen/X86/machine-sink-dbg-loc.mir
+4 -4  llvm/lib/CodeGen/MachineSink.cpp
+127 -4  2 files

LLVM/project dcc97d8 llvm/include/llvm/Support CommandLine.h

[Support] Apply suggested ABI annotation fixup (NFC) (#204102)

Reported by the "LLVM ABI annotation checker" on a PR, but present in
main.

See: https://github.com/llvm/llvm-project/pull/203969#issuecomment-4711157875
Delta  File
+2 -2  llvm/include/llvm/Support/CommandLine.h
+2 -2  1 files

LLVM/project 841a606 llvm/lib/Target/AArch64 AArch64PerfectShuffle.cpp AArch64PerfectShuffle.h, llvm/test/CodeGen/AMDGPU fcanonicalize.ll llvm.log10.ll

Merge branch 'main' into users/krzysz00/insert-concat-dagcombine
Delta  File
+6,583 -0  llvm/lib/Target/AArch64/AArch64PerfectShuffle.cpp
+3 -6,571  llvm/lib/Target/AArch64/AArch64PerfectShuffle.h
+1,825 -1,328  llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
+2,269 -65  llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,134 -744  llvm/test/CodeGen/AMDGPU/llvm.log10.ll
+1,134 -744  llvm/test/CodeGen/AMDGPU/llvm.log.ll
+12,948 -9,452  1,357 files not shown
+65,358 -25,927  1,363 files

LLVM/project f36f02e flang/include/flang/Optimizer/Dialect FIROps.td, flang/lib/Optimizer/Dialect FIROps.cpp

rebase
Delta  File
+46 -22  flang/lib/Optimizer/Dialect/FIROps.cpp
+7 -5  flang/include/flang/Optimizer/Dialect/FIROps.td
+53 -27  2 files

LLVM/project 292362d flang/test/Fir mem2reg.mlir

update test after #198552
Delta  File
+1 -1  flang/test/Fir/mem2reg.mlir
+1 -1  1 files

LLVM/project b0a76c0 flang/include/flang/Optimizer/Dialect FIROps.td FIROps.h, flang/lib/Optimizer/Dialect FIROps.cpp

[flang][mem2reg] promote memory slots through declares
Delta  File
+195 -16  flang/test/Fir/mem2reg.mlir
+87 -24  flang/lib/Optimizer/Dialect/FIROps.cpp
+9 -4  flang/include/flang/Optimizer/Dialect/FIROps.td
+1 -0  flang/include/flang/Optimizer/Dialect/FIROps.h
+292 -44  4 files

LLVM/project 9018242 llvm/test/Transforms/LoopInterchange fixed-size-no-signed-wrap.ll

[LoopInterchange] Add test for #200788 (NFC) (#204107)
Delta  File
+80 -0  llvm/test/Transforms/LoopInterchange/fixed-size-no-signed-wrap.ll
+80 -0  1 files

LLVM/project e28c71d llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXAsmPrinter.cpp

Revert "[NVPTX] Rip out vestigial variadic support (NFC) (#202385)"

This reverts commit e63cd40ccce67f9472af9676185d7c87157043b4.
Delta  File
+230 -64  llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+21 -11  llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+9 -4  llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+8 -0  llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+268 -79  4 files

LLVM/project 798358f llvm/lib/Target/AMDGPU SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU fcopysign.bf16.ll fcopysign.f16.ll

[AMDGPU] Fix lowerFCOPYSIGN dropping the sign bit when narrowing the sign operand (#203492)

TRUNCATE of the v2i32-bitcast sign kept the low 16 bits of each lane but
dropped the f32 sign bit at bit 31.

Shift right by 16 first so the sign bit lands in the f16 sign position.
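
The bit arithmetic for a single lane can be checked with a short Python sketch. The helper names below are hypothetical and this models only the integer manipulation described above, not the SelectionDAG nodes in lowerFCOPYSIGN.

```python
import struct


def f32_bits(x):
    """Return the 32-bit IEEE-754 pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def truncate_to_i16(bits32):
    """Model of TRUNCATE i32 -> i16: keep only the low 16 bits."""
    return bits32 & 0xFFFF


def sign_bit_16(bits16):
    """Sign position of a 16-bit float (f16 or bf16) is bit 15."""
    return (bits16 >> 15) & 1


def narrow_sign_truncate_only(bits32):
    """Old behaviour: bit 31 is discarded along with the upper half."""
    return truncate_to_i16(bits32)


def narrow_sign_shift_first(bits32):
    """Fixed behaviour: move bit 31 down to bit 15 before truncating."""
    return truncate_to_i16(bits32 >> 16)
```

For `-1.0`, whose f32 pattern is `0xBF800000`, the truncate-only version yields `0x0000` with a clear sign bit, while shifting first yields `0xBF80` with the sign bit set.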

---------

Co-authored-by: Jay Foad <jay.foad at gmail.com>
Delta  File
+58 -52  llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll
+48 -46  llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll
+6 -2  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+112 -100  3 files

LLVM/project 0ad5f11 llvm/lib/Target/SPIRV SPIRVModuleAnalysis.cpp, llvm/test/CodeGen/SPIRV/hlsl-intrinsics InterlockedAdd_spv_i64.ll

[SPIR-V] Add Int64Atomics to Vulkan available capabilities (#203194)

fixes #202456
Delta  File
+0 -4  llvm/test/CodeGen/SPIRV/hlsl-intrinsics/InterlockedAdd_spv_i64.ll
+2 -1  llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+2 -5  2 files

LLVM/project 2fd832d clang/docs ReleaseNotes.rst, clang/lib/Driver/ToolChains Linux.cpp

fixup! Address PR comments
Delta  File
+44 -62  clang/test/Driver/stdc-predef.c
+11 -4  clang/lib/Driver/ToolChains/Linux.cpp
+2 -1  clang/docs/ReleaseNotes.rst
+57 -67  3 files

LLVM/project 427807e llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/InstCombine/AArch64 sve-intrinsic-mla-one.ll

[AArch64][SVE] add missing instcombine x+1 -> x (#201851)
Delta  File
+97 -0  llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-mla-one.ll
+25 -0  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+122 -0  2 files

LLVM/project f4c5539 llvm/lib/Transforms/Vectorize VPlanTransforms.h LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize vplan-print-before-after.ll vplan-print-after.ll

[LV] Add `-vplan-print-before=<pass-regex>` (#203933)

This can be helpful for debugging and for VPlan check tests (showing
before/after a specific transform).

This also adds `-vplan-print-before-all` for parity with
`-vplan-print-after-all`.
Delta  File
+103 -0  llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-before-after-all.ll
+0 -100  llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-after-all.ll
+27 -14  llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+34 -0  llvm/test/Transforms/LoopVectorize/vplan-print-before-after.ll
+0 -29  llvm/test/Transforms/LoopVectorize/vplan-print-after.ll
+8 -0  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+172 -143  6 files

LLVM/project 5556889 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mfma_scale
Delta  File
+5,788 -1  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
+1,911 -1  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
+7 -0  llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+7,706 -2  3 files

LLVM/project 22bda6c clang/lib/AST/ByteCode Pointer.h Pointer.cpp, clang/test/AST/ByteCode complex.cpp

[clang][bytecode] Add more checks around _Complex values (#204076)

Check the actual source type when converting a pointer to an rvalue. We
otherwise allow converting from a two-element primitive array.
Delta  File
+7 -0  clang/test/AST/ByteCode/complex.cpp
+5 -1  clang/lib/AST/ByteCode/Pointer.h
+1 -1  clang/lib/AST/ByteCode/Pointer.cpp
+13 -2  3 files

LLVM/project 98bd015 llvm/lib/Transforms/Vectorize VPlanUtils.cpp VPlanPatternMatch.h, llvm/test/Transforms/LoopVectorize/RISCV strided-accesses.ll

[VPlan] Recognize lshr in getSCEVExprForVPValue. (#203496)

When lshr v, const occurs and const is less than the type's bitwidth, it
can be treated as udiv v, (1 << const). This enables the vectorizer to
convert more gathers into strided loads.
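
The equivalence is easy to check exhaustively for a small bit width. The following is a Python model of fixed-width unsigned arithmetic, with hypothetical helper names, and is not the VPlanUtils.cpp change itself.

```python
def lshr(value, shift, bits):
    """Logical shift right on a bits-wide unsigned value."""
    assert 0 <= shift < bits  # the precondition stated in the message
    mask = (1 << bits) - 1
    return (value & mask) >> shift


def udiv(value, divisor, bits):
    """Unsigned division on bits-wide values."""
    mask = (1 << bits) - 1
    return (value & mask) // (divisor & mask)


def lshr_as_udiv(value, shift, bits):
    """Rewrite lshr v, const as udiv v, (1 << const)."""
    assert 0 <= shift < bits  # keeps 1 << shift representable in bits
    return udiv(value, 1 << shift, bits)


def equivalent_for_all(bits):
    """Check the rewrite for every value and every legal shift amount."""
    return all(
        lshr(v, s, bits) == lshr_as_udiv(v, s, bits)
        for v in range(1 << bits)
        for s in range(bits)
    )
```

The bound on the shift amount matters in this model because `1 << const` must still fit in the type; at `const` equal to the bitwidth the divisor would wrap to zero.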

Pre-commit test #203488
Delta  File
+10 -6  llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
+7 -0  llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+6 -0  llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+23 -6  3 files

LLVM/project 196912c llvm/lib/Target/SPIRV SPIRVModuleAnalysis.cpp

[SPIR-V] Remove duplicate collectReqs call in runOnModule (NFC) (#203947)
Delta  File
+0 -1  llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+0 -1  1 files

LLVM/project 6409ced llvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp

[SPIR-V] Constrain OpExtInst on the modf integral part path (#203946)
Delta  File
+0 -1  llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+0 -1  1 files

LLVM/project 9ed8b1d mlir/include/mlir/Dialect/Tensor/Transforms Passes.td Passes.h, mlir/lib/Dialect/Tensor/Transforms ScalarizeFunctionResult.cpp CMakeLists.txt

[MLGO][EmitC] Scalarize single-element tensor returns (#199686)

Add an EmitC-owned preparation pass,
`mlgo-scalarize-single-element-tensor-return`, that rewrites private
functions returning a statically-shaped ranked tensor with exactly one
element into functions returning the element type directly.

Assisted-by: Codex (refine implementation + tests). I reviewed all code
and tests before submission.

## Example

Before:
```mlir
func.func private @rank1(%arg0: tensor<1xi64>) -> tensor<1xi64> {
  return %arg0 : tensor<1xi64>
}
```


    [44 lines not shown]
Delta  File
+318 -0  mlir/lib/Dialect/Tensor/Transforms/ScalarizeFunctionResult.cpp
+294 -0  mlir/test/Dialect/Tensor/scalarize-single-elem-return.mlir
+14 -0  mlir/include/mlir/Dialect/Tensor/Transforms/Passes.td
+2 -1  mlir/include/mlir/Dialect/Tensor/Transforms/Passes.h
+2 -0  mlir/lib/Dialect/Tensor/Transforms/CMakeLists.txt
+630 -1  5 files