[clang-tidy] Teach `performance-faster-string-find` about `starts_with`, `ends_with`, and `contains` (#182633)
These aren't "find" functions per se, so they don't totally match the
check name, but the same optimization is applicable to them (for
example, see
https://en.cppreference.com/w/cpp/string/basic_string_view/starts_with.html).
This optimization could be expanded to `operator+=` as well, but that's
a bit more involved, so I'm not doing it in this PR.
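To see why a single-character literal is worth rewriting, the two overloads can be approximated by hand. The free functions below (`startsWithStr`, `startsWithChar`, `endsWithStr`, `endsWithChar`) are hypothetical C++17 stand-ins for the C++20 members, not the check's implementation. They only illustrate that `s.starts_with("a")` and `s.starts_with('a')` agree, while the character form reduces to one comparison.

```cpp
#include <string_view>

// Hypothetical stand-in for s.starts_with("..."): compares a whole prefix.
inline bool startsWithStr(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}

// Hypothetical stand-in for s.starts_with('c'): a single character compare.
inline bool startsWithChar(std::string_view s, char c) {
  return !s.empty() && s.front() == c;
}

// Hypothetical stand-in for s.ends_with("..."): compares a whole suffix.
inline bool endsWithStr(std::string_view s, std::string_view suffix) {
  return s.size() >= suffix.size() &&
         s.substr(s.size() - suffix.size()) == suffix;
}

// Hypothetical stand-in for s.ends_with('c'): a single character compare.
inline bool endsWithChar(std::string_view s, char c) {
  return !s.empty() && s.back() == c;
}
```

For any one-character literal the string and character forms return the same answer, which is what makes the rewrite safe.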
InstCombine: Fold absorbing fmul of compared 0 into select
This is similar to the select-bin-op identity case, except
in this case we are looking for the absorbing value for the
binary operator.
If the compared value is a floating-point 0, and the fmul is
implied to return a +0, put the 0 directly into the select
operand. This pattern appears in scale-if-denormal sequences
after optimizations assume denormals are treated as 0.
Fold:
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul.fabs.x = fmul float %fabs.x, known_positive
%x.is.zero = fcmp oeq float %x, 0.0
%select = select i1 %x.is.zero, float %mul.fabs.x, float %fabs.x
To:
%fabs.x = call float @llvm.fabs.f32(float %x)
[5 lines not shown]
[Scalarizer] Fix out-of-bounds crash (#180359)
When processing an extractelement instruction with an index that exceeds
the vector size (e.g., extracting index 2147483647 from a 4-element
vector), the scalarizer would calculate an out-of-bounds Fragment index
and crash with an assertion failure in `SmallVector::operator[]`.
This PR adds a bounds check in
`ScalarizerVisitor::visitExtractElementInst` to prevent a crash when the
extractelement index is out of bounds.
Fixes #179880
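The shape of such a bounds check can be sketched in isolation. The helper `fragmentForElement` and its parameters are hypothetical and are not taken from the Scalarizer source. The sketch only shows the idea of rejecting an out-of-range index before it is used to index a container.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not the Scalarizer's actual code: map an element index
// to the fragment that holds it, or return nullopt when the index is out of
// range so the caller can skip the instruction instead of indexing past the end.
inline std::optional<unsigned> fragmentForElement(std::uint64_t index,
                                                  unsigned numElements,
                                                  unsigned elementsPerFragment) {
  if (elementsPerFragment == 0 || index >= numElements)
    return std::nullopt;
  return static_cast<unsigned>(index / elementsPerFragment);
}
```

With a 4-element vector, index 2147483647 yields no fragment rather than an out-of-bounds access.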
Remove whitespace on blank lines (#182574)
I removed the whitespace from lines in a workflow job that contained only
spaces. I did not remove the blank lines themselves, only the stray
whitespace, which I found by running git diff against HEAD.
[InstCombine] Update test
This was breaking buildbots due to a mid-air collision where some change
caused test differences between when the test was put up/passed CI and
when it landed.
[InstCombine] Transform splat before n x i1 for vec.reduce.add (#182213)
```llvm
define i1 @src(i1 %0) {
%2 = insertelement <8 x i1> poison, i1 %0, i32 0
%3 = shufflevector <8 x i1> %2, <8 x i1> poison, <8 x i32> zeroinitializer
%4 = tail call i1 @llvm.vector.reduce.add.v8i1(<8 x i1> %3)
ret i1 %4
}
define i1 @tgt(i1 %0) {
ret i1 0
}
```
alive2: https://alive2.llvm.org/ce/z/vejxot
`vector_reduce_add(<n x i1>)` to `Trunc(ctpop(bitcast <n x i1> to in))`
interferes with the `vector_reduce_add(<splat>)` to `mul`, so I
[2 lines not shown]
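The `src`/`tgt` pair above can be sanity-checked with a small model. This is only a sketch: `reduceAddSplatI1` is a hypothetical helper that models `i1` addition as arithmetic modulo 2, so adding a lane toggles the accumulator when the lane is set.

```cpp
#include <cstddef>

// Model llvm.vector.reduce.add over a splat <lanes x i1>. i1 addition wraps
// modulo 2, so each set lane flips the accumulator.
inline bool reduceAddSplatI1(bool value, std::size_t lanes) {
  bool acc = false;
  for (std::size_t i = 0; i < lanes; ++i)
    acc = (acc != value); // XOR, i.e. addition modulo 2
  return acc;
}
```

With an even lane count such as 8, the result is 0 for either input value, matching `ret i1 0` in `tgt`. An odd lane count would return the splatted value instead.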
FunctionAttrs: Basic propagation of nofpclass
Perform caller->callee propagation of nofpclass on callsites. As
far as I can tell the only prior callsite to callee propagation here
was for norecurse. This doesn't handle transitive callers.
I was hoping to avoid doing this, and instead get attributor/attributor-light
enabled in the default pass pipeline. nofpclass propagation enabled by
default is the main blocker for eliminating the finite_only_opt global
check in device-libs, but this single level of propagation is most likely
sufficient for that use. Implementing this here is probably the most expedient
path to removing the control library.
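Conceptually, a single level of caller-to-callee propagation amounts to intersecting what every call site guarantees. The sketch below is an assumption about the general idea and not the pass's code. The flag constants and `inferParamNoFPClass` are hypothetical, and it assumes every call site of the callee is visible.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bit flags for illustration; LLVM's real FP class masks are
// more detailed.
constexpr std::uint32_t NoNaN = 1u << 0;
constexpr std::uint32_t NoInf = 1u << 1;
constexpr std::uint32_t NoZero = 1u << 2;

// Returns the classes that every call site rules out for one argument.
// Assumes all call sites are known; with none, nothing is inferred.
inline std::uint32_t
inferParamNoFPClass(const std::vector<std::uint32_t> &callSiteMasks) {
  if (callSiteMasks.empty())
    return 0;
  std::uint32_t common = ~0u;
  for (std::uint32_t mask : callSiteMasks)
    common &= mask; // keep only what all callers guarantee
  return common;
}
```

A class survives only if no caller can pass it, so one call site without the guarantee clears the bit.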
Attributor: Avoid double map lookup in updateAttrMap
This will leave behind the map entry in the unchanged case,
but this seems to not matter. Could erase the newly inserted
entry if that happens, but that also doesn't seem to make a
difference.
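The trade-off described here can be illustrated with a standard container. This is a sketch with hypothetical names (`AttrMap`, `updateEntry`) and an arbitrary "keep the larger value" rule. It is not the Attributor's code and uses `std::map` rather than LLVM's own map types.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the attribute map; key and value types are
// illustrative only.
using AttrMap = std::map<std::string, int>;

// Returns true if the stored value changed. A single try_emplace replaces a
// find() followed by a second lookup for insertion.
inline bool updateEntry(AttrMap &map, const std::string &key, int candidate) {
  auto it = map.try_emplace(key, 0).first; // one lookup: find or insert default
  if (candidate <= it->second)
    return false; // unchanged: a freshly inserted default entry stays behind
  it->second = candidate;
  return true;
}
```

In the unchanged case the default entry remains in the map, which mirrors the leftover entry mentioned above.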
[RISCV] Fold shladd into Xqcisls scaled load/store in RISCVMergeBaseOffset (#182221)
We can fold `shxadd`/`qc.shladd` into base+offset load/store instructions
by transforming the load/store into `Xqcisls` scaled load/store
instructions.
For example:
```
qc.e.li vreg1, s
shxadd vreg2, vreg3, vreg1
lx vreg4, imm(vreg2)
can be transformed to
qc.e.li vreg1, s+imm
qc.lrx vreg4, vreg1, vreg3, (1-7)
```
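The fold relies on the address arithmetic being reassociable. The sketch below checks that under stated assumptions: 32-bit wrapping arithmetic, `shxadd` computing `vreg1 + (vreg3 << shamt)`, and the scaled load addressing `vreg1 + (vreg3 << shamt)`. The function names are hypothetical and the instruction semantics are as read from the example, not from the ISA manual.

```cpp
#include <cstdint>

// Address formed by the original sequence: shxadd builds s + (index << shamt),
// then the load adds the immediate offset.
inline std::uint32_t addrBeforeFold(std::uint32_t s, std::uint32_t index,
                                    unsigned shamt, std::int32_t imm) {
  std::uint32_t base = s + (index << shamt);
  return base + static_cast<std::uint32_t>(imm);
}

// Address formed after the fold: the immediate is merged into the constant,
// and the scaled load adds (index << shamt).
inline std::uint32_t addrAfterFold(std::uint32_t s, std::uint32_t index,
                                   unsigned shamt, std::int32_t imm) {
  std::uint32_t base = s + static_cast<std::uint32_t>(imm);
  return base + (index << shamt);
}
```

Because unsigned addition is associative and commutative modulo 2^32, both forms agree for any inputs, including negative immediates.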
[5 lines not shown]
Attributor: Avoid calling identifyDefaultAbstractAttributes on declarations
Previously it would be called and inserted into a visited map,
but would never be used. This could possibly go one step further
and never add declarations to the SetVector of Functions. If I try
that, only one call graph printing test fails.
[NFC][LFI] Reduce includes due to c-t impact (#182617)
Removes header includes that don't need to be made at the top-level by
moving transitive dependencies directly into source files and using
forward declarations. Biggest impact is that we no longer include
`MCLFIRewriter.h` in `MCStreamer.h` and `MCAsmParserExtension.h`.
[clang-tidy] Correctly handle array of pointers in misc-const-correctness (#179059)
In arrays of pointers, the `misc-const-correctness` check wrongly inspected
whether the array element type was const-qualified, rather than the type
it points to, leading to redundant `const` suggestions. This patch fixes
the problem.
Closes [#178880](https://github.com/llvm/llvm-project/issues/178880)
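The distinction between the element type and the pointee type can be made concrete with type traits. This is only an illustration of the two kinds of constness, not the check's logic. The aliases and the `elementIsConst`/`pointeeIsConst` helpers are hypothetical.

```cpp
#include <type_traits>

using ArrayOfPtrToMutable = int *[2];     // mutable pointers to mutable ints
using ArrayOfPtrToConst = const int *[2]; // mutable pointers to const ints
using ArrayOfConstPtr = int *const[2];    // const pointers to mutable ints

// Element type of the array, e.g. `const int *` for ArrayOfPtrToConst.
template <class T> using Elem = std::remove_extent_t<T>;

// Is the array element (the pointer itself) const-qualified?
template <class T> constexpr bool elementIsConst = std::is_const_v<Elem<T>>;

// Is the type the element points to const-qualified?
template <class T>
constexpr bool pointeeIsConst = std::is_const_v<std::remove_pointer_t<Elem<T>>>;
```

`const int *[2]` has a const pointee but a non-const element, while `int *const[2]` is the reverse, so the two questions give different answers.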
[Clang][AMDGPU][Docs] Add builtin documentation for AMDGPU builtins
Use the documentation generation infrastructure to document the AMDGPU builtins.
This PR starts with the ABI / Special Register builtins. Documentation for the
remaining builtin categories will be added incrementally in follow-up patches.
[Clang][TableGen] Add documentation generation infrastructure for builtins (#181573)
Add a `-gen-builtin-docs` TableGen backend that generates RST
documentation from builtin definitions, modeled after the existing
attribute documentation system (`-gen-attr-docs`).
The emitter generates per-builtin RST sections grouped by category,
including prototype rendering with optional named parameters (via
`ArgNames`), target feature annotations, and documentation content. A
mismatch between `ArgNames` count and prototype parameter count is a
fatal error.
[Clang] Added clang diagnostic when snprintf/vsnprintf uses sizeof(dest) for the len parameter
Closes: [#162366](https://github.com/llvm/llvm-project/issues/162366)
---------
Co-authored-by: Bogdan Zunic <bzunic at cisco.com>