[libc++] Refactor formatter_float.bench.cpp and drop some benchmarks (#178886)
`formatter_float.bench.cpp` currently benchmarks floating-point
formatting very extensively. This patch reduces the number of benchmarks
by removing some of the less meaningful cases.
The benchmark is also converted to the more recent style of benchmarks.
As a nice side-effect, this reduces the time it takes to compile the
benchmark by ~20x.
We may be able to drop more benchmarks, but I'm not an expert in this
area, so I have kept the removals conservative.
[CMake] Update "all" project/runtimes (#179270)
Move compiler-rt from "all" projects to "all" runtimes and add "openmp"
to "all" runtimes, as it was recently removed from "all" projects.
Attributor: Add -light options to -attributor-enable flag
Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.
Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.
I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
[flang][acc] Fix cache directive with mapped component (#179335)
When a derived type component is mapped via a data clause (e.g.,
`copyin(data%A(...))`), the base address inside the parallel region
comes from an `hlfir.declare` op (for the mapped address) instead of
an `hlfir.designate` op. Use `FortranVariableOpInterface` to extract
shape/typeparams/attrs, which works for both cases since both ops
implement this interface.
[flang][acc] remap no_create array sections (#178660)
The workaround for no_create with array sections is no longer needed:
the runtime is expected to make sure that the fir.box for the variable
is always readable on the device, even when the variable is not
present.
[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)
When creating new nodes after type legalization, we should use the
promoted type where possible to avoid creating nodes with illegal
types.
Fixes: https://github.com/llvm/llvm-project/issues/177155
[llvm][OpenMP] Allow Chunk Size on SIMD Guided (#178853)
As per the OpenMP Spec, Chunk Size is allowed when using the guided
kind-type with the schedule clause. However, when being used in cases
such as `!$omp do simd schedule (simd:guided,4)`, this was not allowed
as the base type, BaseGuidedSimd, would hit an assert not allowing
ChunkSizes.
This change allows the guided kind to be used with a chunk size in the
schedule clause when going through OMPIRBuilder.
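
For reference, a minimal C++ sketch of the same clause shape (the `simd`
modifier, the `guided` kind, and a chunk size of 4). The `scale` function is
hypothetical and only illustrates the syntax; it does not claim that Clang
exercises the same code path as the Fortran directive above. Without
`-fopenmp` the pragma is ignored and the loop runs serially with the same
result.

```cpp
#include <vector>

// Hypothetical example: multiply every element by Factor. The schedule clause
// mirrors `schedule (simd:guided,4)` from the Fortran directive.
void scale(std::vector<float> &V, float Factor) {
  const long N = static_cast<long>(V.size());
#pragma omp parallel for simd schedule(simd : guided, 4)
  for (long I = 0; I < N; ++I)
    V[I] *= Factor;
}
```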
Fixes #82106
[MC] Try to fix ubsan bot
Check that the size is non-zero to make sure we don't call
memcpy with null pointers. This is well-defined now, but ubsan
may still warn about it.
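
A minimal sketch of this kind of guard, assuming a plain byte copy.
`copyBytes` is a hypothetical helper for illustration, not the code touched
in MC:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy Size bytes, but skip the memcpy call entirely for
// an empty range so that it is never handed a null pointer.
inline void copyBytes(char *Dst, const char *Src, std::size_t Size) {
  if (Size == 0)
    return; // Nothing to copy; Dst and Src may legitimately be null here.
  assert(Dst && Src && "non-empty copy needs valid pointers");
  std::memcpy(Dst, Src, Size);
}
```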
(cherry picked from commit d064f395af7ac226dec3f8e90516a26e96e2acf1)
[X86][APX] Disable PP2/PPX generation on Windows (#178122)
The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.
This PR disables the support by default for robustness. Workloads that
want it (e.g. for experimentation, or for code paths that do not need
unwinder support) can still opt in by explicitly specifying the flags
that enable it.
(cherry picked from commit 2f3935bcee6eaf7df8c85a21b7c0fbef967316b5)
[lldb] Fix SBBreakpointName::SetEnabled to propagate changes to breakpoints (#178734)
When setting the enabled state of a breakpoint name via the API, the
change was not being propagated to breakpoints using that name.
This was inconsistent with the CLI behaviour where `breakpoint name
configure --enable/--disable` correctly updates all associated
breakpoints.
(cherry picked from commit 8370304f1e5878c1860223239932ddd05d9ba4c8)
[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)
We can, when attempting to lower to tbz, skip a zext that is then not
accounted for elsewhere. The attached test ends up with a tbz from an
extract that then does not properly zext the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.
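
The condition can be modelled roughly as below. This is a sketch under one
assumption: "the low part of the value" is read as "a bit index below the
width of the pre-extension source". Both functions are hypothetical and are
not the GlobalISel code.

```cpp
#include <cstdint>

// Hypothetical model: may a test of bit `Bit` on zext(Src) be rewritten as a
// test of the same bit on Src? SrcBits is the width of Src before extension.
constexpr bool canLookThroughZExt(uint64_t Bit, unsigned SrcBits) {
  // Bits at or above SrcBits exist only because of the extension, so the
  // narrower source has no such bit to test.
  return Bit < SrcBits;
}

// Reference semantics: the value of bit `Bit` in the zero-extended value,
// given a raw register whose bits above SrcBits may hold garbage.
constexpr bool testBitOfZExt(uint64_t Raw, unsigned SrcBits, unsigned Bit) {
  const uint64_t Mask = SrcBits >= 64 ? ~0ULL : ((1ULL << SrcBits) - 1);
  return Bit < 64 && (((Raw & Mask) >> Bit) & 1) != 0;
}
```

With an 8-bit source held in a wider register, bit 8 of the zero-extended
value is always 0, while the raw register may have that bit set. That is why
the zext can only be skipped for bits 0 through 7.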
Fixes #173895
(cherry picked from commit 0321f3eeee5cceddc2541046ee155863f5f59585)
[X86] getScalarMaskingNode - FIXUPIMM scalar ops take upper elements from second operand (#179101)
FIXUPIMMSS/SD instructions pass through the upper elements of the SECOND operand, not the first like most (2-op) instructions.
Fixes #179057
(cherry picked from commit 49d2323447aec77c3d1ae8c941f3f8a126ff1480)
[X86] Add test coverage for #179057 (#179092)
Incorrect folding of fixupimm scalar intrinsics passthrough when the
mask is known zero
(cherry picked from commit 618d71dc98df760d0c724cff6fa69b780e8c0372)
ValueTracking: Revert noundef checks in computeKnownFPClass for fmul/fma (#178850)
This functionally reverts fd5cfcc41311c6287e9dc408b8aae499501660e1 and
35ce17b6f6ca5dd321af8e6763554b10824e4ac4.
The change was correct and necessary, but it is causing performance
regressions since isGuaranteedNotToBeUndef is apparently not smart enough
to see through recurrences. Revert this for the release branch.
Also, the test coverage was inadequate for the fma case, so add a new
case whose output differs with and without the check.
(cherry picked from commit 07ec2fa1443ccd3cbb55612937f1dddebfe51c15)
[AMDGPU][PromoteAlloca] Set !amdgpu.non.volatile if promotion fails
I thought about doing this in a separate pass, but this pass already has all the necessary analysis for this to be a trivial addition.
We can simply set `!amdgpu.non.volatile` if all other attempts to promote the operation failed.
[AMDGPU] Set MONonVolatile on memory accesses for spills
Mark the memory operand of spill load/stores as non-volatile, so that these
loads and stores are emitted with `nv` set.
The reason is that scratch memory used by spills will never be shared by
another thread. It's purely thread local and thus a good fit for the `nv` bit.