[libc++][C++03] Remove code in the C++03-specific tests that is guarded on the language version (#169354)
This is dead code, since `test/libcxx-03` is only ever executed with
`-std=c++03`.
[ARM] Remove Subtarget from ARMAsmPrinter (#168264)
Remove Subtarget uses from ARMAsmPrinter, making use of TargetMachine
where applicable and getting the Subtarget from the MF where not. Some
of the `if () llvm_unreachable` checks have been replaced by asserts.
[X86] Sync multiversion features with libgcc and refactor internal feature tables (#168750)
The compiler-rt internal feature table is synced with the one in libgcc
(common/config/i386/i386-cpuinfo.h).
The LLVM internal feature table is refactored to include an ABI_VALUE
field, so we no longer rely on ordering to keep the values correct. The
table is also synced to the one in compiler-rt.
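As a rough illustration of why an explicit value field is more robust than positional ordering, here is a minimal sketch. The `FeatureEntry` struct, the `lookupAbiValue` helper, the feature names, and the numeric values are all hypothetical placeholders; they are not the real LLVM, compiler-rt, or libgcc definitions.

```cpp
#include <string_view>

// Hypothetical table entry: the ABI value is stored explicitly instead of
// being implied by the entry's position in the table.
struct FeatureEntry {
  std::string_view Name;
  unsigned AbiValue;
};

// Placeholder entries. Because each entry carries its own value, the rows
// can be reordered or extended without changing any existing value.
constexpr FeatureEntry Features[] = {
    {"feature_b", 7},
    {"feature_a", 3},
    {"feature_c", 12},
};

// Returns the ABI value recorded for Name, or -1 if Name is not in the table.
constexpr int lookupAbiValue(std::string_view Name) {
  for (const FeatureEntry &E : Features)
    if (E.Name == Name)
      return static_cast<int>(E.AbiValue);
  return -1;
}
```

With this layout, inserting a new row in the middle of the table leaves the results of `lookupAbiValue` for existing names unchanged, which would not hold if the value were derived from the row index.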
[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule (#152736)
`dist_schedule` was previously supported in Flang/Clang but was not
implemented in MLIR; instead, a user would get a "not yet implemented"
error. This patch adds support for the `dist_schedule` clause to be
lowered to LLVM IR when used in an `omp.distribute` or `omp.wsloop`
section.
Some rework was required to ensure that MLIR/LLVM
emits the correct Schedule Type for the clause, as it uses a different
schedule type to other OpenMP directives/clauses in the runtime library.
This patch also ensures that when using dist_schedule or a chunked
schedule clause, the correct llvm loop parallel accesses details are
added.
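For readers unfamiliar with the clause, the sketch below models how the OpenMP specification describes `dist_schedule(static, chunk_size)`: iterations are split into chunks, and chunks are handed to teams round-robin in team-number order. This is only a model of the clause's semantics; the `assignTeams` helper is hypothetical and does not reflect how the MLIR lowering or the runtime library is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: for each iteration in [0, numIterations), returns the
// index of the team that would execute it under dist_schedule(static, chunkSize)
// with numTeams teams. Assumes chunkSize > 0 and numTeams > 0.
std::vector<std::size_t> assignTeams(std::size_t numIterations,
                                     std::size_t chunkSize,
                                     std::size_t numTeams) {
  std::vector<std::size_t> owner(numIterations);
  for (std::size_t i = 0; i < numIterations; ++i) {
    std::size_t chunk = i / chunkSize; // index of the chunk containing i
    owner[i] = chunk % numTeams;       // chunks go to teams round-robin
  }
  return owner;
}
```

For example, `assignTeams(10, 2, 3)` assigns iterations 0-1 to team 0, 2-3 to team 1, 4-5 to team 2, and then wraps around so that 6-7 go to team 0 and 8-9 to team 1.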
[OMPIRBuilder] always leave PARALLEL via the same barrier (#164586)
A barrier will pause execution until all threads reach it. If some
threads go to a different barrier, we deadlock. In practice, this means
the finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread is cancelled
or not, and no matter which cancellation point causes the cancellation.
The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like
```
...
%ret = call i32 @__kmpc_cancel(...)
[50 lines not shown]
[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation (#164326)
This PR adds checks when emitting weak aliases in `void
CodeGenModule::EmitGlobal(GlobalDecl GD)`. Previously, for device
compilation for OpenMP, HIP and CUDA, clang would look for the aliasee
even if it was never marked for device compilation.
For OpenMP the following case now works:
> This failed before when compiling for the device, i.e. `clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa`
> ```
> int __Two(void) { return 2; }
> int Two(void) __attribute__ ((weak, alias("__Two")));
> ```
For HIP / CUDA:
>
[17 lines not shown]
[MLIR][NVVM][Docs] Update docs (#169694)
This patch updates the NVVM Dialect docs to:
* include information on the type of pointers for the memory spaces.
* include high-level information on mbarrier objects.
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
[NFC][analyzer] Clean up obsolete installation instructions (#166193)
The documentation file `Installation.rst` contained very obsolete
instructions for installing the clang static analyzer. This commit
replaces it with a sentence which explains that the analyzer is part of
clang and links to the releases page of LLVM (for downloading clang).
This sentence is primarily added to the top-level page of the analyzer
documentation; but it also appears in a stubbed Installation.rst (for
users who followed a direct external link to this installation page).
This stubbed section is removed from the table of contents, but I kept
it as an orphaned page (to avoid breaking links).
Fixes #165571
[BOLT] Overhaul the comments in PAuthGadgetScanner for readability (NFC)
Update the comments in PAuthGadgetScanner.cpp to better describe the
current version of the code. Along the way, shorten identifier names
that are redundant taking their context into account:
`RegsToTrackInstsFor` (made `RegsToTrack`) and `getNumTrackedRegisters`
(made `getNumRegisters`).
[SystemZ] Serialize ada entry flags (#169395)
Adding support for serializing the ada entry flags helps with MIR-based
test cases. Without this change, the flags are simply displayed as
"unknown".
[CodeGenTypes] Remove explicit VT numbers from ValueTypes.td (#169670)
Remove explicit VT numbers from ValueTypes.td so that patches that add a
new VT do not have to renumber the entire file.
In TableGen VTs are now identified by ValueType.LLVMName instead of
ValueType.Value. This is important for target-defined types (typically
based on PtrValueType) which are not mentioned in ValueTypes.td itself.
[X86] Replace BF16 to F32 conversions with generic conversions (#169781)
Let standard casting / builtin_convertvector handle the conversions from BF16 to F32.
My only query is how to best implement _mm_cvtpbh_ps - I went for the
v8bf16 -> v8f32 conversion followed by subvector extraction in the end,
but could just as easily extract a v4bf16 first - makes no difference to
final optimized codegen.
First part of #154911
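As background for why a generic conversion suffices: a BF16 value has the same layout as the upper 16 bits of an IEEE-754 binary32 float, so widening is exact and amounts to a 16-bit left shift. The scalar sketch below illustrates that; the `bf16BitsToFloat` helper is hypothetical and is not the code in the patch or the header.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widens a BF16 value, given as its raw 16-bit pattern,
// to a 32-bit float. Assumes float is IEEE-754 binary32.
float bf16BitsToFloat(std::uint16_t bits) {
  // Place the BF16 bits in the upper half; the low 16 mantissa bits are zero.
  std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &widened, sizeof(result)); // bit-cast without UB
  return result;
}
```

For instance, the bit pattern `0x3F80` widens to `1.0f` and `0xC000` widens to `-2.0f`.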
[LV] Test more combinations of scalar stores using last lane of IV.
Extends test coverage to include different start and step values, as
well as interleaving.
[clang][bytecode] Check for invalid record decls in IntPointer::atOffset (#169786)
We can't access the RecordLayout of an invalid decl, so return failure
if that happens.
Fixes https://github.com/llvm/llvm-project/issues/167076
[X86] optimize ssse3 horizontal saturating add/sub (#169591)
Currently LLVM fails to recognize a manual implementation of `phadd`:
https://godbolt.org/z/zozrssaWb
```llvm
declare <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)
define <8 x i16> @phaddsw_v8i16_intrinsic(<8 x i16> %a, <8 x i16> %b) {
entry:
%res = call <8 x i16> @llvm.x86.ssse3.phadd.sw.128(<8 x i16> %a, <8 x i16> %b)
ret <8 x i16> %res
}
define <8 x i16> @phaddsw_v8i16_generic(<8 x i16> %a, <8 x i16> %b) {
entry:
[28 lines not shown]