InstCombine: Add minnum/maxnum SimplifyDemandedFPClass baseline tests
These are just the existing minimumnum/maximumnum tests, copied with
find-and-replace.
Revert "[Clang][Lexer] Reland "Detect SSE4.2 availability at runtime in fastParseASCIIIdentifier"" (#177322)
Reverts llvm/llvm-project#175452 due to reported buildbot failures.
[AArch64] Use +0.0 for accumulator for FMUL -> BFMLAL lowering in more cases (#174423)
Depending on the users of the FMUL, we may not need to preserve the sign
of the multiplication result (when it is zero).
See: 29611f4cbea0e867a3e55516f8dfdeca595be436
[AArch64] Fix partial_reduce v16i8 -> v2i32 (#177119)
The lowering doesn't need to check for `ConvertToScalable`, because it
lowers to another `PARTIAL_REDUCE_*MLA` node, which is subsequently
lowered using either fixed-length or scalable types.
This fixes https://github.com/llvm/llvm-project/issues/176954
[libc++][22.x] Target the release runners for the LLVM 22 release branch
This ensures that the CI on the release branch keeps working even as
we make changes to our CI setup on the main branch.
[ELF] --why-live: Skip symbol at index 0 and section symbols, handle .eh_frame symbols (#177099)
Symbols with empty names can be matched by `--why-live='*'`, but
reporting them is generally not useful:
* The first entry in a symbol table (STB_LOCAL and undefined)
* `STT_SECTION` symbols (emitted by the LLVM integrated assembler when
needed by relocations). These input section symbols will be demoted by
`demoteAndCopyLocalSymbols`, so they are not really live.
In addition, such symbols of non-allocable sections currently lead to
crashes: `whyLive` does not record the section, causing the second
iteration of the `while (true)` loop in printWhyLive to call
`std::get<Symbol *>(cur)` when `cur` is an `InputSectionBase *`.
Also handle GCC crtendS.o `__FRAME_END__`, which is defined
relative to a `.eh_frame` section created with
`__attribute__((used, section(".eh_frame")))`.
Fix #176890
[2 lines not shown]
[AArch64][SME] Disable tail calls in new ZA/ZT0 functions (#177152)
Allowing this can result in invalid tail calls to shared ZA functions.
It may be possible to limit this to the case where the caller is private
ZA and the callee shares ZA, but for now it is generally disabled.
(cherry picked from commit 10aca26ffffe6a9ee049f479ed7fee9e07421dad)
[LLVM] Update the default value for MaxLargeFPConvertBitWidthSupported to 128 (#176851)
Previously, we could not compile programs that convert 256-bit integers
to floating point and vice versa (the compiler would crash). After this
change, such programs compile.
Fix EXTEND_VECTOR_INREG widening when input > result size (#177095)
This patch fixes an LLVM crash on AMDGPU that occurred when compiling
valid code involving non-power-of-two vector sizes. During type
legalization, LLVM widened an EXTEND_VECTOR_INREG operation by first
widening the input vector, which could make the input larger than the
result and trigger an assertion failure.
The fix changes the logic to widen the result first and then extract the
needed portion so there's no invalid size mismatch. I've added a test
that previously crashed but now doesn't.
Fixes #176966.
---------
Co-authored-by: Natalia Kokoromyti <knatalia at yost-cm-01-imme.stanford.edu>
[LoongArch] Remove DAG combination for extractelement (#177083)
The combine of `trunc+extend+extractelement` into a single
`extractelement` can produce incorrect results, because the high bits of
the extract index, which the `trunc` operation clears, are preserved
after the combination. This commit removes the combine, fixing
https://github.com/llvm/llvm-project/issues/176839.
(cherry picked from commit f537408bc4fee1b7edc6b703e68792957f85f133)
[CodeGenPrepare] Fix infinite loop with same-type bitcasts (#176694)
OptimizeNoopCopyExpression was sinking same-type bitcasts (e.g. bitcast
i32 to i32) which would then be reintroduced by optimizePhiType, causing
an infinite loop.
Fix by adding a check (PhiTy == ConvertTy) in optimizePhiType to skip
the conversion when types are already identical.
Fixes #176688.
[MC] Explicitly use memcpy in emitBytes() (NFC) (#177187)
We've observed a compile-time regression in LLVM 22 when including large
blobs. The root cause was that emitBytes() was copying bytes one-by-one,
which is much slower than using memcpy for large objects.
Optimization of std::copy to memmove is apparently much less reliable
than one might think. In particular, when using a non-bleeding-edge
libstdc++ (anything older than version 15), this does not happen if the
types of the input and output iterators do not match (like here, where
there is a signed/unsigned mismatch).
As this code is performance sensitive, I think it makes sense to
directly use memcpy.
Previously this code used SmallVector::append, which explicitly uses
memcpy.