[Clang] Export inline move constructors in dllexport-ed template instantiations on non-MSVC targets (#168170)
Previously, even when MSVC compatibility was not requested, inline move
constructors in dllexport-ed templates were not exported, which was
seemingly unintended.
On non-MSVC targets (MinGW, Cygwin, and PS), such move constructors
should be exported consistently with copy constructors and with the
behavior of modern MSVC.
[lldb][docs] Fix plaintext markers in command map
RST tries to resolve single backticks as a reference.
Double backticks mean plaintext.
Fixes these warnings:
map.rst:803: WARNING: 'any' reference target not found: target.prefer-dynamic-value
map.rst:814: WARNING: 'any' reference target not found: expr
[lldb][docs] Fix Visual Studio link in build doc
Fixes warning:
build.rst:107: WARNING: 'any' reference target not found: https://visualstudio.microsoft.com
[CIR][CIRGen][Builtin][X86] Masked compress Intrinsics (#169582)
Added masked compress builtin in CIR.
Note: This is my first PR to llvm. Looking forward to corrections
---------
Co-authored-by: bhuvan1527 <balabhuvanvarma at gmail.com>
[AMDGPU] Modifies builtin def to take _Float16('x') for both HIP/C++ and for OpenCL (#167652)
For the extended image insts (amdgcn_image_sample_*_/gather4_* builtins),
use 'x' in the builtin def so that it takes _Float16 for both
HIP/C++ and OpenCL.
[SPIRV] Start adding support for `int128` (#170798)
LLVM has pretty thorough support for `int128`, and it has started seeing
some use. Even though we already have support for the
`SPV_ALTERA_arbitrary_precision_integers` extension, the BE was oddly
capping integer width to 64 bits. This patch adds partial support for
lowering 128-bit integers to `OpTypeInt 128`. Some work remains to be
done around legalisation support and validating constant uses (e.g.
cases that get lowered to `OpSpecConstantOp`).
[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456)
Fixed the argument types of the following intrinsics to match with the
ISA:
- vpdpwssd_128, vpdpwssd_256, vpdpwssd_512
- vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
- vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
- vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
- vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
- vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
- vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
- vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512
Fixes #97271. Note that this is the last PR for the issue.
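As a rough illustration of why the element types matter, the ISA treats the two multiplicand operands of VPDPWSSD as packed signed 16-bit words and the accumulator as 32-bit lanes. The sketch below models a single 32-bit lane of the non-saturating signed/signed form in plain Python. The helper names `to_signed` and `vpdpwssd_lane` are hypothetical and are not part of any LLVM or intrinsics API.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpdpwssd_lane(src, a_words, b_words):
    """Model one 32-bit lane: src + a[0]*b[0] + a[1]*b[1], wrapping to 32 bits.

    a_words and b_words each hold the two 16-bit words that share the lane.
    This is an illustrative model only, not the compiler's implementation.
    """
    assert len(a_words) == 2 and len(b_words) == 2
    total = src
    for a, b in zip(a_words, b_words):
        # Each word is sign-extended before the multiply.
        total += to_signed(a, 16) * to_signed(b, 16)
    # The non-saturating form wraps around on overflow.
    return to_signed(total, 32)
```

For example, `vpdpwssd_lane(10, [2, -3], [4, 5])` accumulates `2*4 + (-3)*5` into `10`. Typing the multiplicands as 16-bit element vectors makes this per-word interpretation explicit in the signature.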
[NVPTX] Add IR pass for FMA transformation in the llc pipeline (#154735)
This change introduces a new IR pass in the llc pipeline for NVPTX that
transforms sequences of FMUL followed by FADD or FSUB into a single FMA
instruction.
Currently, all FMA folding for NVPTX occurs at the DAGCombine stage,
which is too late for any IR-level passes that might want to optimize or
analyze FMAs. By moving this transformation earlier into the IR phase,
we enable more opportunities for FMA folding, including across basic
blocks.
Additionally, this new pass relies on the contract instruction level
fast-math flag to perform these transformations, rather than depending
on the -fp-contract=fast or -enable-unsafe-fp-math options passed to
llc.
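The reason contraction needs an explicit opt-in is that fusing changes rounding: a separate multiply rounds the product before the add, while an FMA rounds only once. The sketch below models both behaviours for doubles in Python, using exact rational arithmetic to stand in for the fused operation. The function names are hypothetical and this is not how the pass itself is implemented.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Unfused: the product is rounded to double before the add."""
    return (a * b) + c


def fused_mul_add(a, b, c):
    """Fused model: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 is not representable
# as a double and rounds to 1.0 when the multiply is performed on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
```

With these inputs, `mul_then_add(a, b, c)` loses the low-order term entirely, while `fused_mul_add(a, b, c)` keeps it. That observable difference is why the transformation is gated on a per-instruction flag.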
[LLDB] Run MSVC STL smart pointer tests with PDB (#166946)
Runs the `std::shared/unique_ptr` tests with PDB with two changes:
- PDB uses the "full" name, so `std::string` is `std::basic_string<char,
std::char_traits<char>, std::allocator<char>>`
- The type of the pointer inside the shared/unique_ptr isn't the
`element_type` typedef
[CIR][NFC] Add stubs for missing visitors in ScalarExprEmitter (#171222)
This adds stubs that issue NYI errors for any visitor that is present in
the ClangIR incubator but missing in the upstream implementation. This
will make it easier to find the correct locations to implement missing
functionality.
[CIR][NFC] Fix bad switch fallthroughs in emitStmt (#171224)
This moves a couple of statement emitters that were incorrectly
implemented in the middle of a switch statement where all cases in the
final group are intended to fall through to a handler that emits an NYI
error message. The placement of these implementations was causing some
statement types that should have emitted the NYI error to instead go to
a handler for a different statement type.
[RISCV] Use VM and VMNoV0 for "vr" and "vd" inline asm constraints with mask type. (#171235)
The inline assembly handling in SelectionDAG uses the first type
for the register class as the type at the input/output of the
inline assembly. If this isn't the type for the surrounding DAG,
it needs to be converted.
nxv8i8 is the first type for the VR and VRNoV0 register classes.
So we currently generate insert/extract_subvector and bitcasts to
convert to/from nxv8i8.
I believe some of the special casing we have for this in
splitValueIntoRegisterParts and joinRegisterPartsIntoValue is causing
us to also generate incorrect code for arguments with nxv16i4 types
that should be any extended to nxv16i8. Instead we widen them to nxv32i4
and bitcast to nxv16i8.
This patch uses VM and VMNoV0 for masks, which have nxv64i1 as their
first type. This means we will only emit an insert/extract_subvector
[5 lines not shown]
[RISCV] Add VMNoV0 register class with only the VMaskVTs. (#171231)
I plan to use this for inline assembly "vd" constraints with mask types
in a follow-up patch. Due to the test changes I wanted to post this
separately.
[lldb] Don't read firstSubclass and nextSiblingClass from class_rw_t (#171213)
We're considering modifying the ObjC runtime's class_rw_t structure to
remove the firstSubclass and nextSiblingClass fields in some cases. LLDB
is currently reading those but not actually using them. Stop doing that
to avoid issues if they are removed by the runtime.
rdar://166084122
[MLIR][XeGPU] Extend propagation and sg_to_lane distribution pass support broadcast with low rank and scalar source input (#170409)
This PR extends XeGPU layout propagation and distribution for
vector.broadcast operation.
It relaxes the restriction of layout propagation to allow low-rank and
scalar source input, and adds a pattern in sg-to-wi distribution to
support the lowering.
[mlir][vector] Fix crash in ReorderCastOpsOnBroadcast with non-vector result (#170985)
Fixes a crash in `ReorderCastOpsOnBroadcast` by ensuring the cast result
is a `VectorType` before applying the pattern.
A regression test has been added to
mlir/test/Dialect/Vector/vector-sink.mlir.
Fixes: #126371
[compiler-rt] Try bumping soft_rss_limit again (#171469)
This is still failing on some of the bots. Try bumping the limit again
to see if this fixes things.
[CI] Tweak wording for builds with passing tests and build errors (#171436)
"All tests passed" is too easily interpreted as every possible test was
run and was fine. A lot of the time it means all the tests that didn't
fail to build ran and were fine.
Maybe the wording is still too subtle but at least it hints at the idea
that the tests run might be fewer than if the build had no compilation
errors.
[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections
This patch implements support for constructors/destructors by introducing the
@@SQINIT section and emitting .xtor.<priority> sections within the SystemZ
AsmPrinter and in the GOFF object lowering layer. Improvements to ADA descriptor
handling is also done within this change.
[clang][FMV][AArch64] Remove O3 from failing test (#171457)
This fixes the buildbot failures from
https://github.com/llvm/llvm-project/pull/150267.
I could not reproduce them locally but my intuition suggests that the
-O3 option on the RUN line behaves inconsistently on different hosts
judging from the error logs.
My intention was to run an integration test which will use llvm's
globalopt pass, but there's no need actually. We have unittests in place
for it.
[AArch64][ARM] Optimize more `tbl`/`tbx` calls into `shufflevector` (#169748)
Resolves #169701.
This PR extends the existing InstCombine operation which folds `tbl1`
intrinsics to `shufflevector` if the mask operand is constant. Before
this change, it only handled 64-bit `tbl1` intrinsics with no
out-of-bounds indices. I've extended it to support both 64-bit and
128-bit vectors, and it now handles the full range of `tbl1`-`tbl4` and
`tbx1`-`tbx4`, as long as at most two of the input operands are actually
indexed into.
For the purposes of `tbl`, we need a dummy vector of zeroes if there are
any out-of-bounds indices, and for the purposes of `tbx`, we use the
"fallback" operand. Both of those take up an operand for the purposes of
`shufflevector`.
This works a lot like https://github.com/llvm/llvm-project/pull/169110,
with some added complexity because we need to handle multiple operands.
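The idea of spending one shuffle operand on the zero vector (for `tbl`) or on the fallback (for `tbx`) can be sketched with a small Python model. This is a simplified model over plain lists, assuming a single table operand and a constant index list. The function names are hypothetical and it does not reflect the actual InstCombine code.

```python
def tbl1(table, indices):
    """Table lookup model: out-of-range indices yield 0."""
    return [table[i] if 0 <= i < len(table) else 0 for i in indices]


def tbx1(fallback, table, indices):
    """Table extension model: out-of-range indices keep the fallback lane."""
    return [table[i] if 0 <= i < len(table) else fallback[lane]
            for lane, i in enumerate(indices)]


def shufflevector(lhs, rhs, mask):
    """Two-operand shuffle model: mask indexes into lhs followed by rhs."""
    combined = lhs + rhs
    return [combined[m] for m in mask]


def tbl1_as_shuffle(table, indices):
    """Rewrite tbl1 as a shuffle whose second operand is all zeroes."""
    n = len(table)
    # Every out-of-range index is redirected to lane 0 of the zero vector.
    mask = [i if 0 <= i < n else n for i in indices]
    return shufflevector(table, [0] * n, mask)


def tbx1_as_shuffle(fallback, table, indices):
    """Rewrite tbx1 as a shuffle whose second operand is the fallback."""
    n = len(table)
    # An out-of-range index in a given lane selects the same lane of fallback.
    mask = [i if 0 <= i < n else n + lane for lane, i in enumerate(indices)]
    return shufflevector(table, fallback, mask)
```

In this model the zero vector or the fallback always occupies the second shuffle operand, which is why only a limited number of table operands can actually be indexed into.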
[11 lines not shown]
[OCaml] Fix build
Fix a mistake introduced in https://github.com/llvm/llvm-project/pull/163979:
We should stick with the deprecated LLVMGetGlobalContext() API
in this file, as getGlobalContextForCAPI() is a C++ API that is
not available here.
[libc++] Don't try to be compatible with libstdc++ in __libcpp_refstring on iOS (#170816)
iOS doesn't provide a libstdc++ dylib anymore, so we can remove the
compatibility check for whether we can load the dylib.
[BOLT] Fix pacret-synchronous-unwind.cpp test (#171395)
The test case builds a binary from C++, and checks for the number of
functions the PointerAuthCFIFixup pass runs on.
This can change based on the platform. To account for this, the patch
changes the number to a regex.
The test failed when running on RHEL 9.