[lldb] Move parts of OutputFormattedUsageText into utility function (#180947)
As seen in #177570, this code has a bunch of corner cases, does not
handle ANSI codes properly, and does not handle Unicode at all. There is
enough to fix that we first need some tests to make it clear where we're
starting from.
The body of OutputFormattedUsageText is moved into a utility in the
AnsiTerminal.h header, and tests are added to the existing
AnsiTerminalTest.cpp.
Some results are known to be wrong. Some that cause crashes are
commented out, to be enabled once fixed.
[Matrix] Use tiled loops automatically for large kernels. (#179325)
Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.
We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is anything roughly larger than a 6x6x6 double multiply.
Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and also drastically reduce binary size + compile time.
PR: https://github.com/llvm/llvm-project/pull/179325
[mlir][shard, mpi] Allow more than one last axis to be "unsplit" (#180754)
A resharding pattern allowed only a single trailing axis to be
"unsplit".
This PR allows multiple trailing axes to be "unsplit".
[ProfCheck] Add WinEH Tests to XFail List
This pass recently had NewPM coverage added which means we now can see
profcheck issues with the pass. Disable it for now until we can get it
fixed, although it's not crucial for anything given it is only run for
32-bit X86 Windows.
[flang][NFC] Converted five tests from old lowering to new lowering (part 16) (#180866)
Tests converted from test/Lower: fail_image.f90,
test/Lower/forall: array-constructor.f90, array-pointer.f90,
array-subscripts.f90, character-1.f90
[lldb-dap][windows] drain the ConPTY before attaching (#180578)
Add a step to drain the init sequences emitted by the ConPTY before
attaching it to the debuggee.
A ConPTY (PseudoConsole) emits init sequences which flush the screen and
contain the name of the program (ESC[2J for clear screen, ESC[H for
cursor home and more). It's not desirable to filter them out: if a
debuggee also emits them, lldb would filter that output as well. To work
around this, the ConPTY is drained by attaching a dummy process to it,
consuming the init sequences and then attaching the actual debuggee.
---------
Co-authored-by: Nerixyz <nero.9 at hotmail.de>
clang/AMDGPU: Do not look for rocm device libs if environment is llvm (#180922)
Introduce usage of the llvm environment type. This will be useful as
a switch to eventually stop depending on externally provided libraries,
and only take bitcode from the resource directory.
I wasn't sure how to handle the confusing mess of -no-* flags. Try
to handle them all. I'm not sure --no-offloadlib makes sense for OpenCL
since it's not really offload, but interpret it anyway.
[libc] Add RPC helpers for dispatching functions to the host (#179085)
Summary:
The RPC interface is useful for forwarding functions. This PR adds
helper functions for doing a completely bare forwarding of a function
from the client to the server. This is intended to facilitate
heterogeneous libraries that implement host functions on the GPU (like
MPI or Fortran).
[HLSL] Implement Sample* methods for Texture2D (#179322)
This commit implements the following methods:
- SampleBias
- SampleCmp
- SampleCmpLevelZero
- SampleGrad
- SampleLevel
They are added to the Texture2D resource type. All overloads are
implemented except for those with the `status` argument.
Part of https://github.com/llvm/llvm-project/issues/175630
Assisted-by: Gemini
---------
Co-authored-by: Helena Kotas <hekotas at microsoft.com>
[ExpandIRInsts] Support saturating fptoi (#179710)
Add support for expanding fptosi.sat and fptoui.sat via IR expansions.
Similar to fptosi/fptoui we would get legalization errors otherwise.
The previous expansion for fptosi/fptoui was already saturating -- but
those instructions do not actually require saturation, and the
implementation of the saturation was incorrect in lots of ways. What
this PR does is:
* For fptosi, remove the unnecessary saturation handling.
* For fptoui, remove the unnecessary saturation handling and sign
multiplication.
* For fptosi.sat, use the previous saturation handling with fixes: We
need to map NaNs to 0 and the saturation condition on the exponent was
incorrect. (I'm performing the NaN check via fcmp -- there's no
requirement to do everything bitwise here.)
* For fptoui.sat, use a variation of the signed saturation handling:
Negative values need to go to zero and we saturate to unsigned max.
Proofs: https://alive2.llvm.org/ce/z/Xv9FNd
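The saturating results described above can be sketched in plain C++. This is only an illustration of the expected values for a 32-bit result type, not the IR expansion the pass emits; the names `fptosiSat32`, `fptouiSat32` and `checkSaturation` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: saturating float -> int32_t conversion.
int32_t fptosiSat32(float X) {
  if (std::isnan(X))
    return 0; // NaN maps to 0
  if (X <= -2147483648.0f) // at or below -2^31
    return std::numeric_limits<int32_t>::min();
  if (X >= 2147483648.0f) // at or above 2^31
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(X); // in range: truncate toward zero
}

// Hypothetical helper: saturating float -> uint32_t conversion.
uint32_t fptouiSat32(float X) {
  if (std::isnan(X) || X <= 0.0f)
    return 0; // NaN and negative values go to zero
  if (X >= 4294967296.0f) // at or above 2^32
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(X); // in range: truncate toward zero
}

// Small usage example covering each case.
void checkSaturation() {
  assert(fptosiSat32(std::numeric_limits<float>::quiet_NaN()) == 0);
  assert(fptosiSat32(1e20f) == std::numeric_limits<int32_t>::max());
  assert(fptouiSat32(-5.0f) == 0u);
  assert(fptouiSat32(1e20f) == std::numeric_limits<uint32_t>::max());
}
```

The NaN check is a plain comparison rather than a bit test, mirroring the note above that not everything has to be done bitwise.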
[flang][NFC] Converted five tests from old lowering to new lowering (part 17) (#180869)
Tests converted from test/Lower: goto-do-body.f90, mixed_loops.f90,
while_loop.f90
From test/Lower/forall: degenerate.f90, forall-2.f90
[AArch64] Lower factor-of-2 interleaved stores to STNP (#177938)
This patch prioritizes lowering to `stnp` over `st2` for store
instructions marked !nontemporal.
From a performance perspective, we should conservatively prioritize STNP
lowering for non-temporal stores, because NT stores currently require
explicit usage of the `__builtin_nontemporal_store()` intrinsic, so I
think it's reasonable to assume the developer explicitly intends to
optimize D-cache usage of some hot non-temporal execution. They can roll
back if it doesn't help.
The cost here is a few extra instructions in code size (so this is
predicated on not optimizing for code size), a few extra fast
instructions to execute, a few extra short dependency chains (which
should commonly be handled by OOO execution), I-cache alignment effects,
and a few extra registers. In the future we may be able to approximate a
cost model to select by.
[3 lines not shown]
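For illustration, a source-level loop of the kind this message talks about might look like the sketch below: two streams written to alternating slots with explicit non-temporal stores. Whether this exact loop is vectorized into a factor-of-2 interleaved store and then lowered to `stnp` is an assumption, not something the message states. `interleaveNT`, `NT_STORE` and `checkInterleave` are hypothetical names, and the fallback to a plain store exists only so the sketch compiles on compilers without the builtin.

```cpp
#include <cassert>
#include <cstddef>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Use Clang's builtin when available; otherwise fall back to a plain store
// so the sketch still compiles elsewhere.
#if __has_builtin(__builtin_nontemporal_store)
#define NT_STORE(Val, Ptr) __builtin_nontemporal_store((Val), (Ptr))
#else
#define NT_STORE(Val, Ptr) (*(Ptr) = (Val))
#endif

// Interleave two input streams into Out as A0 B0 A1 B1 ...
// Out must have room for 2 * N elements.
void interleaveNT(const float *A, const float *B, float *Out, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I) {
    NT_STORE(A[I], &Out[2 * I]);
    NT_STORE(B[I], &Out[2 * I + 1]);
  }
}

// Small usage example: the stored values are the same as with plain stores.
void checkInterleave() {
  const float A[2] = {1.0f, 2.0f};
  const float B[2] = {10.0f, 20.0f};
  float Out[4] = {};
  interleaveNT(A, B, Out, 2);
  assert(Out[0] == 1.0f && Out[1] == 10.0f);
  assert(Out[2] == 2.0f && Out[3] == 20.0f);
}
```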
clang/AMDGPU: Remove dead code in RocmInstallationDetector (#180920)
The defaulted constructor argument isn't used anywhere, so
this path is unreachable.
[lldb][windows] switch to using std::string instead of std::wstring in Python setup (#180786)
This patch changes the return type of methods returning `std::wstring` to
`std::string` in `PythonPathSetup.cpp`.
This follows lldb's style of converting to `std::wstring` at the last
moment.
[Hexagon] Fix signed constant creation in EmitVAArgFromMemory (#180385)
Use ConstantInt::getSigned instead of ConstantInt::get when creating a
negative alignment mask in EmitVAArgFromMemory. This is the same fix as
commit 8546294db95d (PR #176115) which addressed the issue in
EmitVAArgForHexagonLinux.
Added a test case that exercises the EmitVAArgFromMemory alignment path
using a struct that is both >8 bytes (to trigger EmitVAArgFromMemory)
and has 8-byte alignment (to trigger the alignment masking code).
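A rough sketch of why the signedness of the mask matters, in plain C++ rather than the LLVM API. It assumes the problem is that a negative mask such as -8, once widened to a 64-bit unsigned value, no longer fits in a 32-bit integer type, whereas the same value is representable when treated as signed. The helpers `fitsUnsigned`, `fitsSigned`, `alignUp` and `checkMask` are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Does V fit in Bits bits when treated as an unsigned value? (0 < Bits < 64)
bool fitsUnsigned(uint64_t V, unsigned Bits) { return (V >> Bits) == 0; }

// Does V fit in Bits bits as a signed two's complement value? (0 < Bits < 64)
bool fitsSigned(int64_t V, unsigned Bits) {
  int64_t Min = -(int64_t{1} << (Bits - 1));
  int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  return V >= Min && V <= Max;
}

// Round Addr up to the next multiple of Align (a power of two) using the
// negative mask -Align.
uint32_t alignUp(uint32_t Addr, int32_t Align) {
  uint32_t Mask = static_cast<uint32_t>(-Align); // e.g. -8 -> 0xFFFFFFF8
  return (Addr + static_cast<uint32_t>(Align) - 1) & Mask;
}

// Small usage example for an 8-byte alignment mask in a 32-bit type.
void checkMask() {
  int64_t Mask = -8;
  assert(!fitsUnsigned(static_cast<uint64_t>(Mask), 32)); // 0xFFFFFFFFFFFFFFF8
  assert(fitsSigned(Mask, 32));
  assert(alignUp(13, 8) == 16u);
}
```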
[SPIRV] Replace `SPIRVType` with `SPIRVTypeInst` as much as we can (#180721)
Second part of https://github.com/llvm/llvm-project/pull/179947 where we
use `SPIRVTypeInst` as much as we can.
Co-authored-by: Cursor <cursoragent at cursor.com>
[SLP]Skip operands comparing on non-matching (but compatible) instructions
If the instructions are compatible but non-matching (a zext-select pair,
for example), there is no need to perform operand analysis; just return
that they are matching.