[CodeGen] Generalise Hexagon flags for memop inline thresholds (#172829)
Generalise the Hexagon command-line options that control whether memset, memcpy or memmove intrinsics are inlined versus lowered to library calls, so they can be used by all backends:
• -max-store-memset
• -max-store-memcpy
• -max-store-memmove
These flags override the target-specific defaults set in TargetLowering (e.g., MaxStoresPerMemcpy) and allow fine-tuning of the inlining threshold for performance analysis and optimization.
The optsize variants from the Hexagon backend (-max-store-memset-Os, -max-store-memcpy-Os, -max-store-memmove-Os) were removed; the options above now apply in both the default and the optsize case.
The threshold is specified as a number of store operations, which is backend-specific. Operations requiring more stores than the threshold will call the corresponding library function instead of being inlined.
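A minimal sketch of the mechanism (illustrative only; everything except the flag name and the existing MaxStoresPerMemcpy field is made up):
```
// Illustrative sketch, not the actual patch: a generic command-line flag
// that can override a backend's default memcpy inlining threshold.
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// -max-store-memcpy=N : maximum number of store operations emitted before
// falling back to a call to the memcpy library function.
static cl::opt<unsigned> MaxStoreMemcpy(
    "max-store-memcpy", cl::Hidden,
    cl::desc("Maximum number of stores used to inline memcpy"));

// Hypothetical helper run during the target's lowering setup: only override
// the TargetLowering default when the flag was actually given.
static void applyMemcpyThresholdOverride(unsigned &MaxStoresPerMemcpy) {
  if (MaxStoreMemcpy.getNumOccurrences() > 0)
    MaxStoresPerMemcpy = MaxStoreMemcpy;
}
```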
[PPC] Disable some ORC-powerpc64le-linux tests. (#175100)
The tests fail to link when using the LLVM C++ library. Disable the tests
until the underlying cause can be investigated and fixed.
devel/onetbb: Improve port
- Only build unit tests when requested, to make the overall build faster
- Adjust port Makefile to more closely follow the Porter's Handbook
- Remove USES= tar:tgz (incorrect)
- Replace USES= localbase with localbase:ldflags
- Use a separate section for USE_GITHUB
PR: 292088
[MLIR][OpenMP] Support cancel taskgroup inside of taskloop (#174815)
Implementation follows exactly what is done for omp.wsloop and omp.task.
See #137841.
The change to the operation verifier is to allow a taskgroup
cancellation point inside of a taskloop. This was already allowed for
omp.cancel.
[X86][InstCombine] Generalize SSE/AVX fp MAX/MIN intrinsics to maxnum/minnum (#174806)
Fixes #173270
For the x86 SSE/AVX floating-point MAX/MIN intrinsics, attempt to generalize
them into `Intrinsic::maxnum` and `Intrinsic::minnum` when we can verify that
the inputs are limited to (PosNormal, NegNormal, PosZero).
This PR uses `llvm::computeKnownFPClass` to generate the FPClass
bitset and verify that the inputs cannot be of the other FP classes (NaN, Inf,
Subnormal, NegZero).
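As a standalone illustration of why the FP-class check matters (not code from the patch): the x86 MAX/MIN instructions always return the second operand when either input is NaN or when both inputs are zero, whereas maxnum returns the non-NaN operand.
```
// Compile for x86 with SSE (the default on x86-64). Shows the MAXPS semantics
// that make the fold invalid unless NaN and -0.0 inputs can be ruled out.
#include <xmmintrin.h>
#include <cmath>
#include <cstdio>

static float first_lane(__m128 v) {
  float out[4];
  _mm_storeu_ps(out, v);
  return out[0];
}

int main() {
  __m128 qnan = _mm_set1_ps(std::nanf(""));
  __m128 one  = _mm_set1_ps(1.0f);

  std::printf("%f\n", first_lane(_mm_max_ps(qnan, one))); // 1.0: second operand wins
  std::printf("%f\n", first_lane(_mm_max_ps(one, qnan))); // nan: second operand wins

  __m128 pz = _mm_set1_ps(+0.0f);
  __m128 nz = _mm_set1_ps(-0.0f);
  std::printf("%f\n", first_lane(_mm_max_ps(pz, nz)));    // -0.0: second operand wins
  return 0;
}
```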
[llvm-exegesis] Fix intermittent failure in setReg_init_check.s (#175148)
The test is failing intermittently after #174944. The issue this time is that
the `WSeqPair`/`XSeqPair` tests fail when the same register pair is used, since
fewer MOVs are then emitted.
The test was expecting:
```
0000000000000000 <foo>:
0: f81e0ffb str x27, [sp, #-0x20]!
4: a90163fa stp x26, x24, [sp, #0x10]
8: d2800006 mov x6, #0x0 // =0
c: d2800007 mov x7, #0x0 // =0
10: d280001a mov x26, #0x0 // =0
14: d280001b mov x27, #0x0 // =0
18: d2800018 mov x24, #0x0 // =0
1c: 48267f1a casp x6, x7, x26, x27, [x24]
```
but this can occur:
[8 lines not shown]
Firewall: Aliases - use new hostdiscovery (with arp/ndp fallback) in mac type aliases.
While here, clean up some redundant code: if a MAC address is in the local cache, the local cache should be complete at any time.
Technically, for legacy ndp, this might be a bit worse than before, but as hostdiscovery is more complete, that should be a small price to pay.
Eventually, when hostdiscovery is the standard, we should be able to ditch the /tmp/alias_filter_arp.cache construction, as hostdiscovery has its own database.
(cherry picked from commit b2a30fc5606ce2d6c781ae9b7282b83e8ec35ac3)
[libc++][NFC] Simplify the implementation of __mul_overflowed (#174956)
`__builtin_mul_overflow` does the right thing, even for `char` and
`short`, so the overloads for these types can simply be dropped. We can
also merge the remaining two overloads into a single one now, since we
don't do any dispatching for `char` and `short` anymore.
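A small standalone check (not from the patch) of the property relied on here: `__builtin_mul_overflow` computes the product as if in infinite precision and reports whether it fits in the result type, so narrow types need no special casing.
```
#include <cassert>

int main() {
  signed char c;
  // 100 * 2 == 200 does not fit in a signed char, so overflow is reported.
  assert(__builtin_mul_overflow((signed char)100, (signed char)2, &c));

  short s;
  // 400 * 100 == 40000 does not fit in a 16-bit short.
  assert(__builtin_mul_overflow((short)400, (short)100, &s));

  int i;
  // Values that fit the result type are stored and no overflow is reported.
  assert(!__builtin_mul_overflow(6, 7, &i) && i == 42);
  return 0;
}
```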
Fix lld crash using --fix-cortex-a53-843419 (#170495)
The original crash was observed in Chromium, in [1]. The problem occurs in
elf::isAArch64BTILandingPad, which didn't handle synthetic sections:
these can have a nullptr buf, so the function crashed while trying to read
that buf.
After fixing that, a second issue occurs: when the patched code grows too
much, it gets far away from the short jump, and the current implementation
assumes a R_AARCH64_JUMP26 will be enough.
This PR changes the implementation to:
(a) In isAArch64BTILandingPad, check whether a section is synthetic and
assume that it will NOT contain a landing pad, avoiding the buffer check;
(b) Suppress the size rounding for thunks that precede a section
(making the situation less likely to happen);
(c) Reimplement the patch by using a R_AARCH64_ABS64 in case the
[6 lines not shown]
[libc++][NFC] Refactor _LIBCPP_OVERRIDABLE_FUNCTION to be a normal attribute macro (#174964)
Currently `_LIBCPP_OVERRIDABLE_FUNCTION` takes the return type, function
name and argument list, but it simply constructs the function and adds
attributes without modifying the signature in any way. We can replace
this with a normal attribute macro, which makes the signature easier to
read and makes what's actually going on simpler to understand. Since it's an
internal macro we can also drop the `_LIBCPP_` prefix.
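Purely as an illustration of the two styles (hypothetical macro names, with a plain weak attribute standing in for whatever libc++ actually emits):
```
// Hypothetical example, not libc++'s real macros.

// Signature-wrapping style: return type, name and argument list are all
// macro arguments, so the declaration is hard to read at a glance.
#define OVERRIDABLE_FUNCTION(Ret, Name, Args) __attribute__((weak)) Ret Name Args

OVERRIDABLE_FUNCTION(void *, example_alloc, (unsigned long size)) { return nullptr; }

// Attribute-macro style: the macro only contributes attributes and the
// signature reads like any other function.
#define OVERRIDABLE __attribute__((weak))

OVERRIDABLE void *example_alloc2(unsigned long size) { return nullptr; }
```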
[libc++] Fix {deque,vector}::append_range assuming too much about the types (#162438)
Currently, `deque` and `vector`'s `append_range` is implemented in terms
of `insert_range`. The problem is that `insert_range` has stricter
preconditions, which results in valid code being rejected.
This also significantly improves performance for `deque` in some cases.
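For reference, a minimal use of the standard C++23 `append_range` API (generic usage, not code from the patch):
```
#include <deque>
#include <ranges>
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};
  v.append_range(std::views::iota(4, 7)); // v == {1, 2, 3, 4, 5, 6}

  std::deque<int> d;
  d.append_range(v); // copies all elements of v into the deque
  return 0;
}
```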
[libc++] Refactor variant benchmarks (#174743)
The variant benchmarks are currently very slow to compile and run because
they are extremely exhaustive. That exhaustiveness is usually a good
thing, but here it makes actually running the benchmarks prohibitive.
Even the new, heavily reduced set still takes almost 40 seconds just to
compile on my system.
[libc++] Introduce the notion of a minimum header version (#166074)
Introducing the notion of a minimum header version has multiple
benefits. It allows us to merge a bunch of ABI macros into a single one.
This makes configuring the library significantly easier, since, for a
stable ABI, you only need to know which version you started distributing
the library with, instead of checking which ABI flags have been
introduced at what point. For platforms which have a moving window of
the minimum version a program has been compiled against, this also makes
it simple to remove symbols from the dylib when they can't be used by
any program anymore.
[mlir][OpenMP] Don't allocate task context structure if not needed (#174588)
Don't allocate a task context structure if none of the private variables
needed it. This was already skipped when there were no private variables
at all.