LLVM/project a214981lldb/include/lldb/Utility AnsiTerminal.h, lldb/source/Interpreter Options.cpp

[lldb] Move parts of OutputFormattedUsageText into utility function (#180947)

As seen in #177570, this code has a bunch of corner cases, does not
handle ANSI codes properly, and does not handle Unicode at all. That's
enough to fix that we need some tests first, to make it clear where
we're starting from.

The body of OutputFormattedUsageText is moved into a utility in the
AnsiTerminal.h header and tests are added to the existing
AnsiTerminalTest.cpp.

Some results are known to be wrong. Some that cause crashes are
commented out, to be enabled once fixed.
DeltaFile
+68-0lldb/unittests/Utility/AnsiTerminalTest.cpp
+64-0lldb/include/lldb/Utility/AnsiTerminal.h
+1-50lldb/source/Interpreter/Options.cpp
+133-503 files

LLVM/project 54177e9llvm/lib/Transforms/Scalar LowerMatrixIntrinsics.cpp, llvm/test/Transforms/LowerMatrixIntrinsics multiply-fused-loops-large-matrixes.ll data-layout-multiply-fused.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.

We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is anything roughly larger than a 6x6x6 double multiply.

Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and also drastically reduce binary size + compile time.
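
As a rough illustration of what a tiled loop nest looks like compared with fully unrolled code, here is a minimal C++ sketch. The function name `tiledMatMul`, the `Tile` parameter, and the column-major flat layout are assumptions made for this example. The pass itself emits LLVM IR, and its tile size and instruction estimate are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C (MxN) += A (MxK) * B (KxN), all stored column-major
// in flat vectors. The three outer loops walk over tiles, the three inner
// loops do the multiply-accumulate inside one tile.
void tiledMatMul(const std::vector<double> &A, const std::vector<double> &B,
                 std::vector<double> &C, std::size_t M, std::size_t K,
                 std::size_t N, std::size_t Tile) {
  assert(Tile > 0 && "tile size must be positive");
  for (std::size_t j0 = 0; j0 < N; j0 += Tile)
    for (std::size_t i0 = 0; i0 < M; i0 += Tile)
      for (std::size_t k0 = 0; k0 < K; k0 += Tile)
        // Clamp each tile to the matrix bounds to handle ragged edges.
        for (std::size_t j = j0; j < std::min(j0 + Tile, N); ++j)
          for (std::size_t k = k0; k < std::min(k0 + Tile, K); ++k)
            for (std::size_t i = i0; i < std::min(i0 + Tile, M); ++i)
              C[i + j * M] += A[i + k * M] * B[k + j * K];
}
```

The code size of this form is independent of M, K and N, whereas a fully unrolled multiply grows with their product, which is the motivation given above.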

PR: https://github.com/llvm/llvm-project/pull/179325
DeltaFile
+80-2,209llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+28-4llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
+2-2llvm/test/Transforms/LowerMatrixIntrinsics/data-layout-multiply-fused.ll
+1-1llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-volatile.ll
+1-1llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-dominance.ll
+1-1llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops.ll
+113-2,2181 files not shown
+114-2,2197 files

LLVM/project bf547f9mlir/include/mlir/Bindings/Python IRAttributes.h, mlir/lib/Bindings/Python IRAttributes.cpp

[mlir][IR] `DenseElementsAttr`: Remove `i1` dense packing special case
DeltaFile
+6-100mlir/lib/Bindings/Python/IRAttributes.cpp
+10-80mlir/lib/IR/BuiltinAttributes.cpp
+1-54mlir/lib/IR/AttributeDetail.h
+0-14mlir/unittests/IR/AttributeTest.cpp
+0-12mlir/include/mlir/Bindings/Python/IRAttributes.h
+0-10mlir/test/IR/attribute-roundtrip.mlir
+17-2703 files not shown
+25-2839 files

LLVM/project f5e5745mlir/lib/Dialect/Shard/Transforms Partition.cpp, mlir/test/Conversion/ShardToMPI convert-shard-to-mpi.mlir

[mlir][shard, mpi] Allow more than one last axis to be "unsplit" (#180754)

A resharding pattern allowed only a single trailing axis to be
"unsplit".
This PR allows multiple trailing axes to be "unsplit".
DeltaFile
+73-59mlir/lib/Dialect/Shard/Transforms/Partition.cpp
+24-0mlir/test/Dialect/Shard/partition.mlir
+12-1mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
+109-603 files

LLVM/project 2099dd6mlir/include/mlir/Bindings/Python IRAttributes.h, mlir/lib/Bindings/Python IRAttributes.cpp

[mlir][IR] `DenseElementsAttr`: Remove `i1` dense packing special case
DeltaFile
+6-92mlir/lib/Bindings/Python/IRAttributes.cpp
+10-80mlir/lib/IR/BuiltinAttributes.cpp
+1-54mlir/lib/IR/AttributeDetail.h
+0-14mlir/unittests/IR/AttributeTest.cpp
+0-12mlir/include/mlir/Bindings/Python/IRAttributes.h
+0-10mlir/test/IR/attribute-roundtrip.mlir
+17-2623 files not shown
+25-2759 files

LLVM/project 7b02e6cmlir/include/mlir/Bindings/Python IRAttributes.h, mlir/lib/Bindings/Python IRAttributes.cpp

[mlir][IR] `DenseElementsAttr`: Remove `i1` dense packing special case
DeltaFile
+6-92mlir/lib/Bindings/Python/IRAttributes.cpp
+6-54mlir/lib/IR/BuiltinAttributes.cpp
+1-51mlir/lib/IR/AttributeDetail.h
+0-14mlir/unittests/IR/AttributeTest.cpp
+0-12mlir/include/mlir/Bindings/Python/IRAttributes.h
+0-10mlir/test/IR/attribute-roundtrip.mlir
+13-2332 files not shown
+20-2418 files

LLVM/project d58304dmlir/include/mlir/IR BuiltinTypeInterfaces.td BuiltinAttributes.td, mlir/lib/AsmParser AttributeParser.cpp

[mlir][WIP] `DenseElementsAttr` generalized

getter / iterator via interface

extraTraitClassDeclaration to provide default FloatType impls

address comments

simplify parser
DeltaFile
+137-1mlir/lib/AsmParser/AttributeParser.cpp
+38-92mlir/lib/IR/BuiltinAttributes.cpp
+88-0mlir/lib/IR/BuiltinTypes.cpp
+83-0mlir/test/IR/dense-elements-type-interface.mlir
+74-1mlir/include/mlir/IR/BuiltinTypeInterfaces.td
+32-13mlir/include/mlir/IR/BuiltinAttributes.td
+452-1078 files not shown
+610-11914 files

LLVM/project cbc2930clang/include/clang/Analysis/Analyses/LifetimeSafety Loans.h Facts.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

Field and interior paths
DeltaFile
+190-100clang/include/clang/Analysis/Analyses/LifetimeSafety/Loans.h
+174-101clang/unittests/Analysis/LifetimeSafetyTest.cpp
+116-46clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+44-48clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+42-42clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+51-29clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+617-3668 files not shown
+786-46214 files

LLVM/project c7ec361llvm/lib/Transforms/Vectorize SLPVectorizer.cpp

Fix a comment

Created using spr 1.3.7
DeltaFile
+1-1llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1-11 files

LLVM/project 768cc03llvm/utils profcheck-xfail.txt

[ProfCheck] Add WinEH Tests to XFail List

This pass recently had NewPM coverage added, which means we can now see
profcheck issues with the pass. Disable it for now until we can get it
fixed, although it's not crucial for anything given it is only run for
32-bit X86 Windows.
DeltaFile
+5-0llvm/utils/profcheck-xfail.txt
+5-01 files

LLVM/project 19a7a56llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 shl-to-add-transformation4.ll gather-node-same-reduced.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+4-61llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+15-14llvm/test/Transforms/SLPVectorizer/X86/shl-to-add-transformation4.ll
+2-24llvm/test/Transforms/SLPVectorizer/X86/gather-node-same-reduced.ll
+21-993 files

LLVM/project a58268aflang/test/Lower fail_image.f90, flang/test/Lower/forall array-pointer.f90 array-constructor.f90

[flang][NFC] Converted five tests from old lowering to new lowering (part 16) (#180866)

Tests converted from test/Lower: fail_image.f90,
test/Lower/forall: array-constructor.f90, array-pointer.f90,
array-subscripts.f90, character-1.f90
DeltaFile
+418-523flang/test/Lower/forall/array-pointer.f90
+86-258flang/test/Lower/forall/array-constructor.f90
+23-17flang/test/Lower/forall/character-1.f90
+24-12flang/test/Lower/forall/array-subscripts.f90
+11-8flang/test/Lower/fail_image.f90
+562-8185 files

LLVM/project 5a65900clang/include/clang/Analysis/Analyses/LifetimeSafety Loans.h Facts.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

Field and interior paths
DeltaFile
+185-98clang/include/clang/Analysis/Analyses/LifetimeSafety/Loans.h
+174-101clang/unittests/Analysis/LifetimeSafetyTest.cpp
+116-46clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+42-46clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+42-42clang/test/Sema/warn-lifetime-safety-dataflow.cpp
+51-29clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+610-3628 files not shown
+779-45814 files

LLVM/project d7e5a7dlldb/include/lldb/Host/windows PseudoConsole.h, lldb/source/Host/windows PseudoConsole.cpp

[lldb-dap][windows] drain the ConPTY before attaching (#180578)

Add a step to drain the init sequences emitted by the ConPTY before
attaching it to the debuggee.

A ConPTY (PseudoConsole) emits init sequences which flush the screen and
contain the name of the program (ESC[2J for clear screen, ESC[H for
cursor home and more). It's not desirable to filter them out: if a
debuggee also emits them, lldb would filter that output as well. To work
around this, the ConPTY is drained by attaching a dummy process to it,
consuming the init sequences and then attaching the actual debuggee.

---------

Co-authored-by: Nerixyz <nero.9 at hotmail.de>
DeltaFile
+66-0lldb/source/Host/windows/PseudoConsole.cpp
+10-0lldb/include/lldb/Host/windows/PseudoConsole.h
+76-02 files

LLVM/project 0b0dca5clang/lib/Driver/ToolChains CommonArgs.cpp AMDGPU.cpp, clang/test/Driver opencl-libclc.cl hip-device-libs-llvm-env.hip

clang/AMDGPU: Do not look for rocm device libs if environment is llvm (#180922)

clang/AMDGPU: Do not look for rocm device libs if environment is llvm

Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.

I wasn't sure how to handle the confusing mess of -no-* flags. Try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL
since it's not really offload, but interpret it anyway.
DeltaFile
+23-0clang/test/Driver/opencl-libclc.cl
+16-6clang/lib/Driver/ToolChains/CommonArgs.cpp
+11-3clang/lib/Driver/ToolChains/AMDGPU.cpp
+11-0clang/test/Driver/hip-device-libs-llvm-env.hip
+2-2libclc/CMakeLists.txt
+3-0clang/lib/Driver/ToolChains/HIPAMD.cpp
+66-113 files not shown
+68-129 files

LLVM/project 6d6feb7libc/docs/gpu rpc.rst, libc/shared rpc_dispatch.h rpc_util.h

[libc] Add RPC helpers for dispatching functions to the host (#179085)

Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
DeltaFile
+258-0libc/shared/rpc_dispatch.h
+205-0offload/test/libc/rpc_callback.cpp
+176-2libc/shared/rpc_util.h
+0-66offload/test/libc/rpc_callback.c
+44-5libc/docs/gpu/rpc.rst
+35-3libc/shared/rpc.h
+718-766 files

LLVM/project 3f73f83clang/lib/CodeGen CGHLSLBuiltins.cpp, clang/lib/Sema HLSLBuiltinTypeDeclBuilder.cpp SemaHLSL.cpp

[HLSL] Implement Sample* methods for Texture2D (#179322)

This commit implements the methods:

- SampleBias
- SampleCmp
- SampleCmpLevelZero
- SampleGrad
- SampleLevel

They are added to the Texture2D resource type. All overloads are
implemented except for those with the `status` argument.

Part of https://github.com/llvm/llvm-project/issues/175630

Assisted-by: Gemini

---------

Co-authored-by: Helena Kotas <hekotas at microsoft.com>
DeltaFile
+348-2clang/test/AST/HLSL/Texture2D-AST.hlsl
+240-6clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp
+133-53clang/lib/Sema/SemaHLSL.cpp
+140-20clang/lib/CodeGen/CGHLSLBuiltins.cpp
+108-0clang/test/CodeGenHLSL/resources/Texture2D-SampleGrad.hlsl
+0-90clang/test/CodeGenHLSL/resources/Texture2D.sample.hlsl
+969-17115 files not shown
+1,550-17221 files

LLVM/project d8fdcc0lldb/test/API/riscv/conflicting-extensions-disassembly TestConflictingExtensions.py Makefile, lldb/test/API/riscv/disassembler TestDisassembler.py

update tests
DeltaFile
+8-32lldb/test/API/riscv/disassembler/TestDisassembler.py
+32-0lldb/test/API/riscv/conflicting-extensions-disassembly/TestConflictingExtensions.py
+17-0lldb/test/API/riscv/conflicting-extensions-disassembly/Makefile
+8-0lldb/test/API/riscv/conflicting-extensions-disassembly/main.c
+6-0lldb/test/API/riscv/conflicting-extensions-disassembly/file_with_zcmp.c
+6-0lldb/test/API/riscv/conflicting-extensions-disassembly/file_with_zcd.c
+77-321 files not shown
+82-327 files

LLVM/project 4677fc3lldb/include/lldb/Target Platform.h, lldb/source/Plugins/Architecture/Arm ArchitectureArm.cpp

Revert "[lldb] Step over non-lldb breakpoints" (#180944)

Reverts llvm/llvm-project#174348 due to reported failures on macOS and
Arm 32-bit Linux.
DeltaFile
+61-87lldb/source/Target/Platform.cpp
+0-76lldb/test/API/functionalities/builtin-debugtrap/TestBuiltinDebugTrap.py
+71-0lldb/test/API/macosx/builtin-debugtrap/TestBuiltinDebugTrap.py
+0-42lldb/source/Target/StopInfo.cpp
+0-30lldb/source/Plugins/Architecture/Arm/ArchitectureArm.cpp
+0-29lldb/include/lldb/Target/Platform.h
+132-26413 files not shown
+147-34419 files

LLVM/project a8f2119llvm/lib/CodeGen ExpandIRInsts.cpp, llvm/test/CodeGen/AMDGPU fptoi.i128.ll

[ExpandIRInsts] Support saturating fptoi (#179710)

Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.

The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:

* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi, use the previous saturation handling with fixes: We need
to map NaNs to 0 and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp -- there's no
requirement to do everything bitwise here.)
* For fptoui use a variation of the signed saturation handling: Negative
values need to go to zero and we saturate to unsigned max.
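
The saturating semantics listed above can be sketched in plain C++ for a 32-bit result. This is only a reference model under stated assumptions: the names `fptosiSat` and `fptouiSat` are hypothetical, the source type is fixed to `double`, and the actual patch works on IR (including very wide integers) rather than on C++ casts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference model of a saturating signed conversion:
// NaN maps to 0, out-of-range values clamp to the nearest bound.
int32_t fptosiSat(double x) {
  const double lo = static_cast<double>(std::numeric_limits<int32_t>::min());
  const double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
  if (std::isnan(x))
    return 0;
  if (x <= lo)
    return std::numeric_limits<int32_t>::min();
  if (x >= hi)
    return std::numeric_limits<int32_t>::max();
  assert(x > lo && x < hi); // the cast below is only defined in range
  return static_cast<int32_t>(x); // truncates toward zero
}

// Hypothetical unsigned variant: NaN and negative values go to zero,
// large values saturate to the unsigned maximum.
uint32_t fptouiSat(double x) {
  const double hi = static_cast<double>(std::numeric_limits<uint32_t>::max());
  if (std::isnan(x) || x <= 0.0)
    return 0;
  if (x >= hi)
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x); // truncates toward zero
}
```

The non-saturating fptosi and fptoui have no defined result for NaN or out-of-range inputs, which is why the message describes the old clamping in their expansion as unnecessary.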

Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
DeltaFile
+470-1,417llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+103-75llvm/test/Transforms/ExpandIRInsts/X86/expand-fp-convert-small.ll
+84-40llvm/lib/CodeGen/ExpandIRInsts.cpp
+9-75llvm/test/Transforms/ExpandIRInsts/X86/expand-large-fp-convert-fptoui129.ll
+9-51llvm/test/Transforms/ExpandIRInsts/X86/expand-large-fp-convert-fptosi129.ll
+675-1,6585 files

LLVM/project 98fcc11flang/test/Lower mixed_loops.f90 while_loop.f90, flang/test/Lower/forall forall-2.f90 degenerate.f90

[flang][NFC] Converted five tests from old lowering to new lowering (part 17) (#180869)

Tests converted from test/Lower: goto-do-body.f90, mixed_loops.f90,
while_loop.f90
From test/Lower/forall: degenerate.f90, forall-2.f90
DeltaFile
+90-139flang/test/Lower/forall/forall-2.f90
+58-63flang/test/Lower/mixed_loops.f90
+46-44flang/test/Lower/while_loop.f90
+25-17flang/test/Lower/forall/degenerate.f90
+19-16flang/test/Lower/goto-do-body.f90
+238-2795 files

LLVM/project 5456d63llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 nontemporal-store-interleaved.ll nontemporal-load-interleaved.ll

[AArch64] Lower factor-of-2 interleaved stores to STNP (#177938)

This patch prioritizes lowering to `stnp` over `st2` for store
instructions marked !nontemporal.

From a performance perspective, we should conservatively prioritize STNP
lowering for non-temporal stores: NT stores currently require explicit
use of the `__builtin_nontemporal_store()` intrinsic, so I think it's
reasonable to assume the developer explicitly intends to optimize
D-cache usage of some hot non-temporal execution. They can roll back if
it doesn't help.

The cost is a few extra instructions of code size (thus we predicate on
not optimizing for code size), a few extra fast instructions to execute,
a few extra short dependency chains (which should commonly be handled by
OOO execution), I-cache alignment effects, and a few extra registers. In
the future we may be able to approximate a cost model to select by.


    [3 lines not shown]
DeltaFile
+1,014-0llvm/test/CodeGen/AArch64/nontemporal-store-interleaved.ll
+999-0llvm/test/CodeGen/AArch64/nontemporal-load-interleaved.ll
+97-0llvm/test/CodeGen/AArch64/nontemporal-store-interleaved-optsize.ll
+60-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-2llvm/lib/Target/AArch64/AArch64InstrInfo.td
+1-1llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+2,173-46 files

LLVM/project 6af11dbclang/include/clang/Driver RocmInstallationDetector.h, clang/lib/Driver/ToolChains AMDGPU.cpp

clang/AMDGPU: Remove dead code in RocmInstallationDetector (#180920)

The defaulted constructor argument isn't used anywhere, so
this path is unreachable.
DeltaFile
+1-3clang/lib/Driver/ToolChains/AMDGPU.cpp
+1-2clang/include/clang/Driver/RocmInstallationDetector.h
+2-52 files

LLVM/project 0ec4aa5lldb/source/Host/windows/PythonPathSetup PythonPathSetup.cpp

[lldb][windows] switch to using std::string instead of std::wstring in Python setup (#180786)

This patch changes the return type of methods returning `std::wstring` to
`std::string` in `PythonPathSetup.cpp`.

This follows lldb's style of converting to `std::wstring` at the last
moment.
DeltaFile
+19-18lldb/source/Host/windows/PythonPathSetup/PythonPathSetup.cpp
+19-181 files

LLVM/project 99c9e5eclang/lib/CodeGen/Targets Hexagon.cpp, clang/test/CodeGen hexagon-linux-vararg.c

[Hexagon] Fix signed constant creation in EmitVAArgFromMemory (#180385)

Use ConstantInt::getSigned instead of ConstantInt::get when creating a
negative alignment mask in EmitVAArgFromMemory. This is the same fix as
commit 8546294db95d (PR #176115) which addressed the issue in
EmitVAArgForHexagonLinux.
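
A small sketch of why the signedness of an alignment mask matters. The helpers `fitsUnsigned`, `fitsSigned` and `alignUp` are hypothetical and written for this example only. The assumption is that the problem comes from a negative mask such as -8 being read as a large unsigned 64-bit value that does not fit the narrower integer type, while the signed reading does fit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does `v` fit in `bits` bits as an unsigned value?
bool fitsUnsigned(uint64_t v, unsigned bits) {
  assert(bits > 0 && bits < 64);
  return (v >> bits) == 0;
}

// Hypothetical helper: does `v` fit in `bits` bits as a signed value?
bool fitsSigned(int64_t v, unsigned bits) {
  assert(bits > 0 && bits < 64);
  const int64_t lo = -(int64_t{1} << (bits - 1));
  const int64_t hi = (int64_t{1} << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}

// Round `addr` up to a multiple of `align` (a power of two) using the
// mask -align, which has all bits set above the alignment bits.
uint32_t alignUp(uint32_t addr, uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two");
  const uint32_t mask = 0u - align; // same bit pattern as -align
  return (addr + align - 1) & mask;
}
```

For an 8-byte alignment, -8 fits in 32 bits as a signed value, but its 64-bit unsigned interpretation 0xFFFFFFFFFFFFFFF8 does not, so the constant has to be created through the signed path.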

Added a test case that exercises the EmitVAArgFromMemory alignment path
using a struct that is both >8 bytes (to trigger EmitVAArgFromMemory)
and has 8-byte alignment (to trigger the alignment masking code).
DeltaFile
+33-0clang/test/CodeGen/hexagon-linux-vararg.c
+1-1clang/lib/CodeGen/Targets/Hexagon.cpp
+34-12 files

LLVM/project 37ace28flang/lib/Lower/OpenMP ClauseProcessor.cpp OpenMP.cpp, flang/test/Lower/OpenMP dyn-groupprivate-clause.f90

[flang][mlir] Add flang to mlir lowering for dyn_groupprivate
DeltaFile
+54-0flang/test/Lower/OpenMP/dyn-groupprivate-clause.f90
+47-0flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+18-0llvm/include/llvm/Frontend/OpenMP/ConstructDecompositionT.h
+0-10flang/test/Lower/OpenMP/Todo/dyn-groupprivate-clause.f90
+4-2flang/lib/Lower/OpenMP/OpenMP.cpp
+3-1flang/lib/Lower/OpenMP/ClauseProcessor.h
+126-136 files

LLVM/project 7f2b875llvm/lib/Target/SPIRV SPIRVInstructionSelector.cpp SPIRVGlobalRegistry.cpp

[SPIRV] Replace `SPIRVType` with `SPIRVTypeInst` as much as we can (#180721)

Second part of https://github.com/llvm/llvm-project/pull/179947 where we
use `SPIRVTypeInst` as much as we can.

Co-authored-by: Cursor <cursoragent at cursor.com>
DeltaFile
+325-322llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+203-193llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
+178-167llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.h
+94-94llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+56-55llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp
+37-31llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+893-8629 files not shown
+988-95315 files

LLVM/project 4a2a450utils/bazel/llvm-project-overlay/lldb/source/Plugins BUILD.bazel

[bazel][lldb] Port 304c680809f05923edd097835d1056e6460a3646 (#180931)

Co-authored-by: Pranav Kant <prka at google.com>
DeltaFile
+3-0utils/bazel/llvm-project-overlay/lldb/source/Plugins/BUILD.bazel
+3-01 files

LLVM/project f2441cbflang/include/flang/Lower OpenMP.h, flang/lib/Lower/OpenMP OpenMP.cpp

[flang][mlir] Add flang to mlir lowering for groupprivate
DeltaFile
+135-1flang/lib/Lower/OpenMP/OpenMP.cpp
+57-0flang/test/Lower/OpenMP/groupprivate.f90
+0-9flang/test/Lower/OpenMP/Todo/groupprivate.f90
+1-0flang/include/flang/Lower/OpenMP.h
+193-104 files

LLVM/project 54cdd90llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 select-zext-analysis.ll

[SLP]Skip operands comparing on non-matching (but compatible) instructions

If the instructions are compatible but non-matching (a zext-select pair,
for example), there is no need to perform operand analysis; just return
that they are matching.
DeltaFile
+42-0llvm/test/Transforms/SLPVectorizer/AArch64/select-zext-analysis.ll
+2-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+44-02 files