[Hexagon] Fix hasFP to respect frame-pointer attribute unconditionally (#181524)
HexagonFrameLowering::hasFPImpl() incorrectly gated the
DisableFramePointerElim check behind MFI.getStackSize() > 0. This meant
leaf functions with no stack allocation would not get a frame pointer
even when "frame-pointer"="all" (-fno-omit-frame-pointer) was set,
violating the user/ABI request. Every other LLVM target checks
DisableFramePointerElim unconditionally.
Move the DisableFramePointerElim and EliminateFramePointer checks
outside the getStackSize() > 0 guard so they are always evaluated.
Update affected tests whose CHECK patterns change due to the now-
correct allocframe emission.
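As a rough illustration of the control-flow change (a simplified model, not the actual HexagonFrameLowering code), the sketch below contrasts the gated and ungated checks. `FrameInfoModel`, `hasFPBefore`, `hasFPAfter`, and `OtherStackReason` are hypothetical names; only the DisableFramePointerElim check is modeled, and the EliminateFramePointer check is left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the state hasFPImpl() consults; not the LLVM API.
struct FrameInfoModel {
  uint64_t StackSize;           // models MFI.getStackSize()
  bool DisableFramePointerElim; // models "frame-pointer"="all"
  bool OtherStackReason;        // models any remaining stack-dependent reason
};

// Before: the attribute is only consulted when the frame is non-empty,
// so a leaf function with no stack allocation never gets a frame pointer.
bool hasFPBefore(const FrameInfoModel &F) {
  if (F.StackSize > 0) {
    if (F.DisableFramePointerElim)
      return true;
    if (F.OtherStackReason)
      return true;
  }
  return false;
}

// After: the attribute is checked unconditionally; only the remaining
// reasons stay behind the stack-size guard.
bool hasFPAfter(const FrameInfoModel &F) {
  if (F.DisableFramePointerElim)
    return true;
  if (F.StackSize > 0 && F.OtherStackReason)
    return true;
  return false;
}
```

For a leaf function with `StackSize == 0` and the attribute set, `hasFPBefore` returns false while `hasFPAfter` returns true, which is the behavior change the updated tests reflect.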
Revert "[CMake][TableGen] Fix Ninja depslog error with implicit outputs on Ninja <1.10" (#182695)
Reverts llvm/llvm-project#179842
This seems to break some dependency tracking, as I no longer see .inc
files being regenerated when I update a TableGen .cpp file. Reverting
for now per the discussion on the PR.
AMDGPU: Implement expansion for f64 exp
I asked AI to port the device libs reference implementation.
It mostly worked, though it got the compares wrong and also
missed a fold that happened in the compiler. With that fixed I get
identical DAG output, and almost the same GlobalISel output (differing
by an inverted compare and select). Also adjusted some stylistic choices.
InstCombine: Fold out nanless canonicalize pattern
Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
NaNs. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.
The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling NaN, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero cost abstraction utility
[17 lines not shown]
InstCombine: Fold absorbing fmul of compared 0 into select (#172381)
This is similar to the select-bin-op identity case, except
in this case we are looking for the absorbing value for the
binary operator.
If the compared value is a floating-point 0, and the fmul is
implied to return a +0, put the 0 directly into the select
operand. This pattern appears in scale-if-denormal sequences
after optimizations assume denormals are treated as 0.
Fold:
```
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul.fabs.x = fmul float %fabs.x, known_positive
%x.is.zero = fcmp oeq float %x, 0.0
%select = select i1 %x.is.zero, float %mul.fabs.x, float %fabs.x
[12 lines not shown]
[ProfCheck] Exclude bitcode tests
These tests fail due to inserted function entry count annotations. Just
exclude them for now given they aren't actually running any passes.
[clang-tidy] Teach `performance-faster-string-find` about `starts_with`, `ends_with`, and `contains` (#182633)
These aren't "find" functions per se, so they don't totally match the
check name, but the same optimization is applicable to them (for
example, see
https://en.cppreference.com/w/cpp/string/basic_string_view/starts_with.html).
This optimization could be expanded to `operator+=` as well, but that's
a bit more involved, so I'm not doing it in this PR.
InstCombine: Fold absorbing fmul of compared 0 into select
This is similar to the select-bin-op identity case, except
in this case we are looking for the absorbing value for the
binary operator.
If the compared value is a floating-point 0, and the fmul is
implied to return a +0, put the 0 directly into the select
operand. This pattern appears in scale-if-denormal sequences
after optimizations assume denormals are treated as 0.
Fold:
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul.fabs.x = fmul float %fabs.x, known_positive
%x.is.zero = fcmp oeq float %x, 0.0
%select = select i1 %x.is.zero, float %mul.fabs.x, float %fabs.x
To:
%fabs.x = call float @llvm.fabs.f32(float %x)
[5 lines not shown]
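A small numeric sketch of why the fold is sound, assuming the multiplier is finite and positive (the commit's `known_positive`). The function names are hypothetical and this only models the scalar float case in C++, not the InstCombine implementation.

```cpp
#include <cmath>

// Models the original pattern: select(x == 0, fabs(x) * k, fabs(x)).
float selectOriginal(float X, float K) {
  float FabsX = std::fabs(X);
  float Mul = FabsX * K;
  return X == 0.0f ? Mul : FabsX;
}

// Models the folded pattern: when x compares equal to 0, fabs(x) is +0 and
// +0 times a finite positive k is +0, so the constant can be used directly.
float selectFolded(float X, float K) {
  float FabsX = std::fabs(X);
  (void)K; // the multiplier is no longer needed on the zero path
  return X == 0.0f ? 0.0f : FabsX;
}
```

Both functions agree for `X` equal to `+0.0f` and `-0.0f` (returning `+0.0f`) and for nonzero `X` (returning `fabs(X)`), under the stated assumption on `K`.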
[Scalarizer] Fix out-of-bounds crash (#180359)
When processing an extractelement instruction with an index that exceeds
the vector size (e.g., extracting index 2147483647 from a 4-element
vector), the scalarizer would calculate an out-of-bounds Fragment index
and crash with an assertion failure in `SmallVector::operator[]`.
This PR adds a bounds check in
`ScalarizerVisitor::visitExtractElementInst` to prevent a crash when the
extractelement index is out of bounds.
Fixes #179880
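The shape of such a guard can be sketched as follows. This is a simplified model with a hypothetical `extractFragment` helper over a `std::vector`, not the code in `ScalarizerVisitor::visitExtractElementInst`; it only shows that the index is validated before the container is indexed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a pointer to the fragment at Index, or
// nullptr when Index is out of range, instead of indexing blindly.
const int *extractFragment(const std::vector<int> &Fragments, uint64_t Index) {
  if (Index >= Fragments.size())
    return nullptr; // out-of-bounds index: bail out rather than assert
  return &Fragments[Index];
}
```

With a 4-element vector, an index of 2147483647 yields `nullptr` here, where an unchecked `operator[]` would trip an assertion.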
Remove whitespace on blank lines (#182574)
I removed trailing whitespace from lines in a workflow job that contained
only spaces. I did not remove the blank lines themselves, only the
whitespace junk, which I found by diffing against HEAD.
[InstCombine] Update test
This was breaking buildbots due to a mid-air collision where some change
caused test differences between when the test was put up/passed CI and
when it landed.
[InstCombine] Transform splat before n x i1 for vec.reduce.add (#182213)
```llvm
define i1 @src(i1 %0) {
%2 = insertelement <8 x i1> poison, i1 %0, i32 0
%3 = shufflevector <8 x i1> %2, <8 x i1> poison, <8 x i32> zeroinitializer
%4 = tail call i1 @llvm.vector.reduce.add.v8i1(<8 x i1> %3)
ret i1 %4
}
define i1 @tgt(i1 %0) {
ret i1 0
}
```
alive2: https://alive2.llvm.org/ce/z/vejxot
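The `ret i1 0` result follows from 1-bit arithmetic: adding the same bit 8 times wraps to 0. A minimal sketch of that reasoning, using hypothetical helper names and modeling the i1 add as XOR:

```cpp
#include <cstddef>

// Models llvm.vector.reduce.add over an <N x i1> splat: add Bit N times
// with 1-bit wraparound (1-bit addition is XOR).
bool reduceAddSplatI1(bool Bit, std::size_t N) {
  bool Sum = false;
  for (std::size_t I = 0; I < N; ++I)
    Sum = (Sum != Bit);
  return Sum;
}

// Closed form: the result is Bit when N is odd and 0 when N is even.
bool reduceAddSplatI1Closed(bool Bit, std::size_t N) {
  return Bit && (N % 2 == 1);
}
```

For `N = 8` both functions return false for either input bit, matching `@tgt` above.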
`vector_reduce_add(<n x i1>)` to `Trunc(ctpop(bitcast <n x i1> to in))`
interferes with the `vector_reduce_add(<splat>)` to `mul`, so I
[2 lines not shown]