[Mips] Add r5900 (PlayStation 2 Emotion Engine) CPU support (#176666)
This PR adds basic support for the MIPS R5900 CPU, the Emotion Engine
processor used in the PlayStation 2.
**LLVM changes:**
- Add r5900 CPU definition (with soft float support for now)
- Disable instructions not supported by r5900 (64-bit multiply/divide,
LL/SC atomics, COP3)
- Add r5900-specific short loop delay slot fix (hardware erratum
workaround)
- Set ISA extension `AFL_EXT_5900` in ELF flags for proper ABI
identification
**Clang changes:**
- Add r5900 as a valid CPU target for `-mcpu=r5900`
- Add r5900 to CPU test coverage
[Clang] prevent assertion in __has_embed parameter recovery at end-of-directive (#175104)
Fixes #175088
---
This PR addresses an assertion failure in the preprocessor triggered
when `__has_embed` parameter parsing reaches end-of-directive while
expecting a parenthesized argument.
[RISCV] Run combineOrToBitfieldInsert after DAG legalize (#177830)
Not combining `OR` into `QC.INSB(I)` before DAG legalization lets known
bits analysis simplify the code first where possible.
(cherry picked from commit 3ed48305ab19bf0090d2ca714a37dd7b0667b6c2)
[ARM] Count register copies when estimating function size (#175763)
`EstimateFunctionSizeInBytes`, in `ARMFrameLowering.cpp`, provides an
early estimate of the compiled size of a function, in a context that
wants to overestimate rather than underestimate.
In some cases it was underestimating severely, by over 20%. The
discrepancy was entirely accounted for by the fact that `COPY`
operations were not being counted at all, even though each one (or at
least each one that survives any post-regalloc optimizations) takes 2
bytes in Thumb or 4 in Arm. This could lead to a compile failure, if the
underestimated function size led frame lowering to not stack LR, but
later, `ARMConstantIslandsPass` needed to insert an intra-function
branch long enough to require a `bl` instruction, needing LR to have
been stacked.
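
The accounting gap described above can be illustrated with a toy model. This is only a sketch: `Inst` and `estimateSize` are hypothetical names invented for illustration and are not the real `EstimateFunctionSizeInBytes` code. The only facts taken from the description are that a `COPY` takes 2 bytes in Thumb or 4 in Arm.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct Inst {
  bool IsCopy;        // true for a COPY pseudo-instruction
  unsigned SizeBytes; // encoded size used for non-COPY instructions
};

// Sum instruction sizes. When CountCopies is set, each COPY is charged
// 2 bytes in Thumb mode or 4 bytes in Arm mode; otherwise COPYs are
// ignored, which models the old underestimating behaviour.
inline std::size_t estimateSize(const std::vector<Inst> &Insts, bool IsThumb,
                                bool CountCopies) {
  std::size_t Total = 0;
  for (const Inst &I : Insts) {
    if (I.IsCopy) {
      if (CountCopies)
        Total += IsThumb ? 2 : 4;
    } else {
      Total += I.SizeBytes;
    }
  }
  return Total;
}
```

In this model, a copy-heavy function doubles its estimate once copies are counted, which is the kind of gap that could flip the decision about stacking LR.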
The result of `EstimateFunctionSizeInBytes` was not directly available
for testing, so I added an `LLVM_DEBUG` at the end of the function. That
way, the test file doesn't need to try to make a >2048 byte function
[11 lines not shown]
[WebAssembly] Zero and NaN checks for min/max (#177968)
Custom lower FMINNUM, FMINIMUMNUM, FMAXNUM and FMAXIMUMNUM to generate
relaxed_min and relaxed_max when the inputs cannot be NaN or signed
zero.
Tablegen patterns have also been modified to check the above conditions
when trying to match relaxed min/max using the pmin/pmax pattern.
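
To see why the NaN and signed-zero conditions matter, the sketch below compares WebAssembly's pseudo-minimum (`pmin(a, b)` is `b < a ? b : a`) with a hand-written reference minimum. `minimumNum` is a hypothetical helper modelled loosely on FMINIMUMNUM-style semantics (a single NaN operand is ignored, and -0.0 orders below +0.0); it is an assumption for illustration, not code from the patch.

```cpp
#include <cmath>

// Pseudo-minimum as WebAssembly defines pmin: b < a ? b : a.
inline double pmin(double A, double B) { return B < A ? B : A; }

// Hypothetical reference minimum: ignores a single NaN operand and
// treats -0.0 as smaller than +0.0.
inline double minimumNum(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B; // prefer the negative zero if present
  return A < B ? A : B;
}
```

The two functions agree on ordinary inputs such as `(3.0, 1.0)`, but disagree on `(NaN, 1.0)` and on `(+0.0, -0.0)`, so a cheaper min/max form is only a safe substitute once both cases have been ruled out.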
[Clang] Fix rewrite-includes-bom.c to use POSIX-compliant regex (#176043)
As `\s` is a GNU extension, it is not supported by the system grep on
AIX, so the test fails on the
[buildbot](https://lab.llvm.org/buildbot/#/builders/64/builds/6835):
```
******************** TEST 'Clang :: Frontend/rewrite-includes-bom.c' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
# RUN: at line 1
cat /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/clang/test/Frontend/Inputs/rewrite-includes-bom.h | od -t x1 | grep -q 'ef\s*bb\s*bf'
# executed command: cat /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/clang/test/Frontend/Inputs/rewrite-includes-bom.h
# executed command: od -t x1
# executed command: grep -q 'ef\s*bb\s*bf'
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1
--
[8 lines not shown]
[RISCV] Add tests for rv32 gather/scatter costs. NFC
There's a divergence with the rv32 costs that I plan on fixing in
another patch, so this precommits the tests for them.
The zve32f RUN lines were split off into another file so the check prefixes
are easier to reason about.
The -riscv-v-vector-bits-max RUN lines were also removed to simplify the
check prefixes since I'm not sure if they were intentionally testing any
specific logic.
(cherry picked from commit 3ad6d350c44f54482a86a7eb488732093eaed372)
[RISCV] Fix i64 gather/scatter cost on rv32 (#176105)
Fixes #175909
We compute the cost of a gather/scatter by multiplying the cost of the
scalar element type memory op by the estimated number of elements. On
rv32, though, a scalar i64 load costs 2, even if we have zve64x.
This causes the cost to diverge between a vector of f64 and a vector of
i64, even though both should cost the same. This fixes it by just using
TTI::TCC_Basic as the scalar memory op cost. The element type is checked
to be legal at this point.
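
The arithmetic can be sketched as follows. All function names here are hypothetical and the model is deliberately simplified; the only facts it relies on are that the cost is the scalar memory-op cost times the element count, that a scalar i64 load costs 2 on rv32, and that `TTI::TCC_Basic` has the value 1.

```cpp
// Mirrors the value of TargetTransformInfo::TCC_Basic.
constexpr unsigned TCC_Basic = 1;

// Hypothetical model of the old scalar memory-op cost: on rv32 a scalar
// i64 load costs 2, everything else in this sketch costs 1.
constexpr unsigned oldScalarCost(bool IsRV32, bool IsI64) {
  return (IsRV32 && IsI64) ? 2 : 1;
}

// Old behaviour: scale the type-dependent scalar cost by the element count.
constexpr unsigned gatherCostOld(unsigned NumElts, bool IsRV32, bool IsI64) {
  return NumElts * oldScalarCost(IsRV32, IsI64);
}

// New behaviour: always use TCC_Basic as the scalar memory-op cost.
constexpr unsigned gatherCostNew(unsigned NumElts) {
  return NumElts * TCC_Basic;
}
```

For a 4-element gather on rv32, the old model gives 8 for i64 but 4 for f64, while the new model gives 4 for both.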
I think we have the same issue for the strided op cost, but we don't
have test coverage for it yet.
(cherry picked from commit 0c1257cd46456513016b106d964dc5ad47c6289b)
Use reportFatalUsageError in llvm-omp-kernel-replay (#178371)
All error cases in this tool are usage errors (bad user input, missing
files, malformed JSON) rather than internal LLVM bugs, so
`reportFatalUsageError` is the appropriate replacement.
[JumpThreading] Avoid unnecessary map resizing in gatherIncomingValuesToPhi (#173596)
Previously, `gatherIncomingValuesToPhi` populated the `IncomingValues`
map with *all* non-undef incoming values from the PHI node. For PHI
nodes with a large number of incoming blocks, this caused the
`SmallDenseMap` to grow significantly, triggering expensive resizing and
rehashing operations, even when the caller
(`redirectValuesFromPredecessorsToPhi`) was only interested in a small
subset of predecessors.
This patch optimizes the logic to prevent this unnecessary map growth.
Instead of collecting all values, we now:
1. Initialize the `IncomingValues` map specifically for the blocks in
`BBPreds` (setting them to `nullptr` initially).
2. Iterate through the PHI node and update the map entries only if the
incoming block is already present in the map.
This ensures that the size of the map is bounded by the size of
[5 lines not shown]
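
The two steps above can be sketched outside of LLVM as follows. This is a minimal model under stated assumptions: `Value`, `Block`, `Phi`, and `gatherIncomingValues` are hypothetical stand-ins, and `std::unordered_map` replaces `SmallDenseMap`; it is not the actual patch.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for LLVM's Value, BasicBlock and PHINode.
struct Value {
  int Id;
  bool IsUndef;
};
using Block = int;
using Phi = std::vector<std::pair<Block, const Value *>>;

// Collect incoming values only for the blocks listed in BBPreds, so the
// map never grows beyond BBPreds.size() entries.
inline std::unordered_map<Block, const Value *>
gatherIncomingValues(const std::vector<Block> &BBPreds, const Phi &PN) {
  std::unordered_map<Block, const Value *> IncomingValues;
  IncomingValues.reserve(BBPreds.size());
  // Step 1: seed the map with the predecessors of interest.
  for (Block BB : BBPreds)
    IncomingValues.emplace(BB, nullptr);
  // Step 2: walk the PHI and fill in entries that already exist.
  for (const auto &In : PN) {
    if (In.second->IsUndef)
      continue; // undef incoming values are skipped
    auto It = IncomingValues.find(In.first);
    if (It != IncomingValues.end())
      It->second = In.second;
  }
  return IncomingValues;
}
```

Because step 2 only updates existing keys, a PHI with thousands of incoming blocks costs lookups but never triggers a rehash.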
[InstCombine][profcheck] Preserve !prof metadata when folding select. (#177707)
The new select created in `InstCombinerImpl::foldBinOpSelectBinOp` reuses
the same condition in the same BB as the original, so the profile info
can be trivially copied over.
[RISCV] Add OPC_C0/C1/C2 named values to tablegen. NFC (#178325)
This adds named opcodes for the compressed instructions like we have for
the 32-bit instructions.