[lldb][TypeSystemClang] Remove mostly unused is_complex output parameter to IsFloatingPointType (#178906)
Depends on:
* https://github.com/llvm/llvm-project/pull/178904
(only last commit is relevant for the review)
This is part of a patch series to clean up the
TypeSystemClang::IsFloatingPointType API. The `is_complex` parameter is
rarely checked. This patch introduces a `CompilerType::IsComplexType`
API which callers that previously checked `is_complex` can use instead.
This will also allow us to remove `CompilerType::IsFloat`, which is just
`IsFloatingPointType` that ignores the `is_complex` parameter.
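A minimal sketch of the call-site change described above, assuming a hypothetical `MockCompilerType` stand-in rather than the real lldb class. The member names mirror the ones in the message, but the mock omits any other parameters the real API may take:

```cpp
// Hypothetical stand-in for lldb's CompilerType; not the real class.
struct MockCompilerType {
  bool is_float = false;   // true for real and complex floating-point types
  bool is_complex = false; // true only for complex types

  // Old shape: complexity is reported through an output parameter.
  bool IsFloatingPointTypeOld(bool &is_complex_out) const {
    is_complex_out = is_complex;
    return is_float;
  }

  // New shape: two independent queries, no output parameter.
  bool IsFloatingPointType() const { return is_float; }
  bool IsComplexType() const { return is_complex; }
};

// A caller that cared about is_complex before the change.
bool IsRealFloatOld(const MockCompilerType &t) {
  bool is_complex = false;
  return t.IsFloatingPointTypeOld(is_complex) && !is_complex;
}

// The same caller written against the separate IsComplexType query.
bool IsRealFloatNew(const MockCompilerType &t) {
  return t.IsFloatingPointType() && !t.IsComplexType();
}
```

Callers that never looked at `is_complex` would simply drop the unused local.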
Attributor: Add -light options to -attributor-enable flag (#179346)
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
[AArch64] Fix cttz.elts codegen for fixed-length vectors (#178902)
When lowering cttz.elts for fixed-length vectors when SVE is available,
we use scalable container types for the predicate types since NEON
doesn't have dedicated predicate registers. Unfortunately, this also
discards the actual length of the vector to look at if it's shorter than a
full vector.
Example codegen for an llvm.experimental.cttz.elts.i64.v4i1:
shl v0.4h, v0.4h, 15
ptrue p0.h, vl4
ptrue p1.h
cmpne p0.h, p0/z, z0.h, #0
brkb p0.b, p1/z, p0.b
cntp x8, p0, p0.h
The 'ptrue p1.h' is where we went wrong -- if p0 is empty, we should
only set 4 lanes active at most, but since brkb's pg operand is all
active, it sets all available lanes (e.g. 8 .h lanes on a 128b SVE
[6 lines not shown]
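The over-count described above can be reproduced with a small lane-level model. This is a simplified sketch, not the backend code: it treats a 128-bit SVE register as 8 `.h` lanes, ignores the `.b` granularity of the real `brkb`, and all function names are hypothetical. Governing `brkb` with a `vl4` predicate is shown only as one way to get the expected count in this model. The truncated message above does not show what the actual fix does.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8; // .h lanes in a 128-bit SVE register (model)
using Pred = std::array<bool, kLanes>;

// Models "ptrue pN.h, vlN": the first n lanes are active.
Pred ptrue_vl(std::size_t n) {
  Pred p{};
  for (std::size_t i = 0; i < kLanes && i < n; ++i)
    p[i] = true;
  return p;
}

// Models "brkb pd, pg/z, pn": active lanes before the first active true
// lane of pn become true; every other lane is false.
Pred brkb_z(const Pred &pg, const Pred &pn) {
  Pred pd{};
  bool seen_break = false;
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (pg[i] && pn[i])
      seen_break = true;
    pd[i] = pg[i] && !seen_break;
  }
  return pd;
}

// Models "cntp xd, pg, pn": counts lanes that are active and true.
std::size_t cntp(const Pred &pg, const Pred &pn) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    count += (pg[i] && pn[i]) ? 1 : 0;
  return count;
}

// Models the sequence above for a v4i1 mask; brk_pg is brkb's governing
// predicate (p1 in the listing).
std::size_t cttz_elts_v4(const std::array<bool, 4> &mask, const Pred &brk_pg) {
  Pred cmp{}; // cmpne under "ptrue p0.h, vl4": lanes 4..7 stay false
  for (std::size_t i = 0; i < 4; ++i)
    cmp[i] = mask[i];
  const Pred brk = brkb_z(brk_pg, cmp);
  return cntp(brk, brk);
}
```

In this model an all-zero mask yields 8 with an all-active governing predicate and 4 with a `vl4` one, while a mask with any set lane gives the same answer either way.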
[CIR][CUDA] Upstream device stub body emission and name mangling (#177790)
Part of #175871
This patch adds the initial implementation of device stub body code
generation for the CUDA/NV runtime. Tested on CUDA; HIP coverage will be
added in a later PR.
[Analysis][CostModel] Add insert-extract runlines for Apple CPUs (NFC) (#179236)
Including `apple-latest` to cover new processors until (if) they
diverge.
[libc++] Refactor formatter_float.bench.cpp and drop some benchmarks (#178886)
`formatter_float.bench.cpp` currently benchmarks the floating point
formatting very extensively. This patch reduces the number of benchmarks
by removing some of the cases that are relatively meaningless.
The benchmark is also converted to the more recent style of benchmarks.
As a nice side-effect, this reduces the time it takes to compile the
benchmark by ~20x.
We may be able to drop more benchmarks, but I'm not an expert in this
area, so I have been rather conservative.
[CMake] Update "all" project/runtimes (#179270)
Move compiler-rt from "all" projects to "all" runtimes and add "openmp"
to "all" runtimes, as it was recently removed from "all" projects.
Attributor: Add -light options to -attributor-enable flag
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
[flang][acc] Fix cache directive with mapped component (#179335)
When a derived type component is mapped via a data clause (e.g.,
`copyin(data%A(...))`), the base address inside the parallel region
comes from an `hlfir.declare` op (for the mapped address) instead of
an `hlfir.designate` op. Use `FortranVariableOpInterface` to extract
shape/typeparams/attrs, which works for both cases since both ops
implement this interface.
[flang][acc] remap no_create array sections (#178660)
The workaround for no_create with array sections is no longer needed:
it is now expected that the runtime makes sure the fir.box for the
variable is always readable on the device, even when the variable is
not present.
[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)
When creating new nodes after type legalization, we should use the
promoted type where possible to avoid introducing nodes with illegal
types.
Fixes: https://github.com/llvm/llvm-project/issues/177155
[llvm][OpenMP] Allow Chunk Size on SIMD Guided (#178853)
As per the OpenMP Spec, Chunk Size is allowed when using the guided
kind-type with the schedule clause. However, when being used in cases
such as `!$omp do simd schedule (simd:guided,4)`, this was not allowed
as the base type, BaseGuidedSimd, would hit an assert not allowing
ChunkSizes.
With this change, the Guided type can be used with a ChunkSize in the
schedule clause when using OMPIRBuilder.
Fixes #82106
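For readers more familiar with the C/C++ spelling, the Fortran directive quoted above corresponds to the pragma in this sketch. The function name is hypothetical, and this only illustrates the clause syntax: a C++ frontend may lower it through a different path than the one fixed here. Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Scales every element in place. The schedule clause combines the simd
// modifier with the guided kind and a chunk size of 4.
void scale(std::vector<float> &v, float factor) {
  const long n = static_cast<long>(v.size());
#pragma omp parallel for simd schedule(simd : guided, 4)
  for (long i = 0; i < n; ++i)
    v[i] *= factor;
}
```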
[MC] Try to fix ubsan bot
Check that the size is non-zero to make sure we don't call
memcpy with null pointers. This is well-defined now, but ubsan
may still warn about it.
(cherry picked from commit d064f395af7ac226dec3f8e90516a26e96e2acf1)
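The guard described above follows a common pattern, sketched here with a hypothetical helper rather than the actual MC code: skip `memcpy` entirely when the size is zero, so that null pointers are never passed to it.

```cpp
#include <cstddef>
#include <cstring>

// Copies n bytes from src to dst. When n is zero the pointers may be
// null, so memcpy is not called at all.
void copy_bytes(void *dst, const void *src, std::size_t n) {
  if (n != 0)
    std::memcpy(dst, src, n);
}
```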
[X86][APX] Disable PP2/PPX generation on Windows (#178122)
The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.
This PR disables this support by default for code robustness. Workloads
that want it, e.g. for experimentation or for code paths that do not
need unwinder support, can change the default behavior by explicitly
specifying the flag options that enable it.
(cherry picked from commit 2f3935bcee6eaf7df8c85a21b7c0fbef967316b5)
[lldb] Fix SBBreakpointName::SetEnabled to propagate changes to breakpoints (#178734)
When setting the enabled state of a breakpoint name via the API, the
change was not being propagated to breakpoints using that name.
This was inconsistent with the CLI behaviour where `breakpoint name
configure --enable/--disable` correctly updates all associated
breakpoints.
(cherry picked from commit 8370304f1e5878c1860223239932ddd05d9ba4c8)
[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)
We can, when attempting to lower to tbz, skip a zext that is then not
accounted for elsewhere. The attached test ends up with a tbz from an
extract that then does not properly zext the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.
Fixes #173895
(cherry picked from commit 0321f3eeee5cceddc2541046ee155863f5f59585)
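The reason the zext can only be skipped for low bits can be seen in a small model. This is a sketch under assumptions, not the GlobalISel code: the helper names are hypothetical, the narrow value is taken to be 8 bits held in a 32-bit register, and the upper register bits are treated as arbitrary until the zext clears them.

```cpp
#include <cstdint>

// Tests a single bit of a 32-bit register value.
bool test_bit(std::uint32_t reg, unsigned bit) { return ((reg >> bit) & 1u) != 0; }

// Zero-extends the low 8 bits, clearing whatever was in the upper bits.
std::uint32_t zext8(std::uint32_t reg) { return reg & 0xFFu; }

// Testing the un-extended register gives the same answer only when the
// tested bit lies inside the narrow source value.
bool can_look_through_zext(unsigned bit, unsigned src_bits) {
  return bit < src_bits;
}
```

With a register holding `0x0000FF01`, bit 0 reads the same before and after the zext, but bit 8 reads as set before it and clear after it, so skipping the zext there would branch the wrong way.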
[X86] getScalarMaskingNode - FIXUPIMM scalar ops take upper elements from second operand (#179101)
FIXUPIMMSS/SD instructions pass through the upper elements of the SECOND operand, not the first like most (2-op) instructions.
Fixes #179057
(cherry picked from commit 49d2323447aec77c3d1ae8c941f3f8a126ff1480)
[X86] Add test coverage for #179057 (#179092)
Incorrect folding of fixupimm scalar intrinsics passthrough when the
mask is known zero
(cherry picked from commit 618d71dc98df760d0c724cff6fa69b780e8c0372)