[MLIR][OpenMP] Support cancel taskgroup inside of taskloop
Implementation follows exactly what is done for omp.wsloop and omp.task.
See #137841.
The change to the operation verifier is to allow a taskgroup
cancellation point inside of a taskloop. This was already allowed for
omp.cancel.
[libc++][NFC] Simplify the implementation of __mul_overflowed (#174956)
`__builtin_mul_overflow` does the right thing, even for `char` and
`short`, so the overloads for these types can simply be dropped. We can
also merge the remaining two overloads into a single one now, since we
don't do any dispatching for `char` and `short` anymore.
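A minimal sketch of what a single merged overload could look like, assuming a GCC- or Clang-compatible compiler. The name `mul_overflowed` is a hypothetical stand-in and not the actual libc++ internal:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the internal helper (not the real libc++ name).
// Returns true if a * b does not fit in T; otherwise stores the product in r.
template <class T>
bool mul_overflowed(T a, T b, T& r) {
  static_assert(std::is_integral<T>::value, "integer types only");
  // The builtin multiplies as if with infinite precision and then checks
  // whether the result fits in the type pointed to by the third argument,
  // so narrow types such as char and short need no separate overload.
  return __builtin_mul_overflow(a, b, &r);
}
```

With `short`, for instance, `200 * 200` is reported as an overflow, while `100 * 100` stores `10000` and reports none.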
Fix lld crash using --fix-cortex-a53-843419 (#170495)
Original crash was observed in Chromium, in [1]. The problem occurs in
elf::isAArch64BTILandingPad because it didn't handle synthetic sections,
which can have a nullptr as a buf, so it crashed while trying to read
that buf.
After fixing that, a second issue occurs: when the patched code grows too
much, it gets far away from the short jump, and the current implementation
assumes an R_AARCH64_JUMP26 will be enough.
This PR changes the implementation to:
(a) In isAArch64BTILandingPad, checks if a section is synthetic, and
assumes that it'll NOT contain a landing pad, avoiding the buffer check;
(b) Suppresses the size rounding for thunks that precede the section
(making the situation less likely to happen);
(c) Reimplements the patch by using an R_AARCH64_ABS64 in case the
[6 lines not shown]
[libc++][NFC] Refactor _LIBCPP_OVERRIDABLE_FUNCTION to be a normal attribute macro (#174964)
Currently `_LIBCPP_OVERRIDABLE_FUNCTION` takes the return type, function
name and argument list, but simply constructs the function and adds
attributes without modifying the signature in any way. We can replace
this with a normal attribute macro, making the signature easier to read
and making it simpler to understand what's actually going on. Since it's an
internal macro we can also drop the `_LIBCPP_` prefix.
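The difference between the two macro styles can be sketched as follows. Both macro names and the use of `__attribute__((weak))` are assumptions for illustration only; the attributes libc++ actually applies may differ:

```cpp
#include <cassert>

// Old style (hypothetical): the macro takes the return type, name and
// argument list and assembles the whole declaration itself.
#define SKETCH_OVERRIDABLE_FUNCTION_OLD(type, name, arglist) \
  __attribute__((weak)) type name arglist

// New style (hypothetical): a plain attribute macro placed in front of an
// ordinary-looking function signature.
#define SKETCH_OVERRIDABLE_FUNCTION __attribute__((weak))

// The signature is hidden inside the macro invocation.
SKETCH_OVERRIDABLE_FUNCTION_OLD(int, old_style_hook, (int x)) { return x + 1; }

// The signature reads like any other function definition.
SKETCH_OVERRIDABLE_FUNCTION int new_style_hook(int x) { return x + 1; }
```

Both definitions behave the same; only the second keeps the signature readable at a glance.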
[libc++] Fix {deque,vector}::append_range assuming too much about the types (#162438)
Currently, `deque` and `vector`'s `append_range` is implemented in terms
of `insert_range`. The problem with that is that `insert_range` has more
preconditions, resulting in us rejecting valid code.
This also significantly improves performance for `deque` in some cases.
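To illustrate why appending can get by with fewer requirements than inserting, here is a rough sketch that appends through `emplace_back` only. `append_range_sketch` and `NoAssign` are hypothetical names, and treating assignability as the extra precondition is an assumption made for the example, not a description of the actual patch:

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Constructible from int and move-constructible, but not assignable.
struct NoAssign {
  explicit NoAssign(int v) : value(v) {}
  NoAssign(NoAssign&&) = default;
  NoAssign& operator=(NoAssign&&) = delete;
  int value;
};

// Hypothetical helper: appends every element of r by constructing it in
// place at the end, so no element ever has to be assigned or shifted.
template <class Container, class Range>
void append_range_sketch(Container& c, Range&& r) {
  for (auto&& e : r)
    c.emplace_back(std::forward<decltype(e)>(e));
}
```

A `std::deque<NoAssign>` can be filled from an `int` array this way even though `NoAssign` cannot be assigned.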
[libc++] Refactor variant benchmarks (#174743)
The variant benchmarks are currently very slow to compile and run because
they are extremely exhaustive. Exhaustiveness is usually a good thing, but
here it makes actually running the benchmarks prohibitive. Even the new,
heavily reduced set still takes almost 40 seconds just to compile on my
system.
[libc++] Introduce the notion of a minimum header version (#166074)
Introducing the notion of a minimum header version has multiple
benefits. It allows us to merge a bunch of ABI macros into a single one.
This makes configuring the library significantly easier, since, for a
stable ABI, you only need to know which version you started distributing
the library with, instead of checking which ABI flags have been
introduced at what point. For platforms which have a moving window of
the minimum version a program has been compiled against, this also makes
it simple to remove symbols from the dylib when they can't be used by
any program anymore.
[mlir][OpenMP] Don't allocate task context structure if not needed (#174588)
Don't allocate a task context structure if none of the private variables
needed it. This was already skipped when there were no private variables
at all.
[AArch64] Fix using NEON copies in streaming-mode-enable regions. (#174738)
The current check for whether we're allowed to use a NEON copy is based on
the function attributes, which works most of the time. However, in one
particular case where a normal function calls a streaming one, there's a
window of time where we enable SM at the call site and then emit a copy for
an outgoing parameter. This copy was lowered to a NEON move, which is illegal.
There's also another case where we could end up generating these,
related to zero cycle move tuning features.
Both of these cases are fixed in this patch by walking back from the copy
to look for any streaming mode changes (within the current block). I know
this is pretty ugly but I don't have a better solution right now.
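The walk-back idea can be modelled roughly like this. The `Op` enum and both function names are hypothetical and do not correspond to the real LLVM data structures; the sketch only shows the shape of the backwards scan within one block:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified instruction kinds.
enum class Op { Other, SMStart, SMStop, Copy };

// Walks backwards from position pos within one block. The nearest streaming
// mode change decides the answer; if none is found, the function-level
// attribute is used as the fallback.
bool in_streaming_mode_at(const std::vector<Op>& block, std::size_t pos,
                          bool function_is_streaming) {
  for (std::size_t i = pos; i-- > 0;) {
    if (block[i] == Op::SMStart) return true;
    if (block[i] == Op::SMStop) return false;
  }
  return function_is_streaming;
}

// A NEON copy is only acceptable when streaming mode is off at that point.
bool can_use_neon_copy(const std::vector<Op>& block, std::size_t pos,
                       bool function_is_streaming) {
  return !in_streaming_mode_at(block, pos, function_is_streaming);
}
```

In a non-streaming function, a copy placed between an `SMStart` and the matching `SMStop` is rejected, while a copy after the `SMStop` is allowed again.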
rdar://167439642
bnxt: Fix up ioctl opcodes to support IOC_VOID along with IOC_IN
The driver and applications currently use hard-coded numeric ioctl command
opcodes. These opcodes are interpreted as having the IOC_IN direction (data
copied from the user application to the driver), regardless of the actual packet
size. Consequently, when the packet size is zero and the direction is set to
IOC_IN, the kernel fails these ioctls if COMPAT is disabled.
While the driver and applications should ideally set the direction correctly—
for example, using IOC_VOID when the packet size is zero—the driver will now
be updated to define ioctl opcodes using the _IOC macro to support both
IOC_VOID and IOC_IN. This change ensures backward compatibility with older
applications that exclusively use IOC_IN.
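As a rough model of the encoding issue, the sketch below builds command words in a BSD-style layout and accepts both directions for zero-length requests. All constants and function names here are assumptions modelled on that layout for illustration; they are not taken from the driver:

```cpp
#include <cassert>
#include <cstdint>

// Assumed BSD-style direction bits and length mask (illustrative values).
constexpr std::uint32_t kSketchIocVoid = 0x20000000u;  // no parameters
constexpr std::uint32_t kSketchIocIn = 0x80000000u;    // copy parameters in
constexpr std::uint32_t kSketchParmMask = (1u << 13) - 1;

// Hypothetical equivalent of an _IOC-like macro: direction, parameter
// length, group and command number packed into one 32-bit word.
constexpr std::uint32_t sketch_ioc(std::uint32_t dir, std::uint32_t group,
                                   std::uint32_t num, std::uint32_t len) {
  return dir | ((len & kSketchParmMask) << 16) | (group << 8) | num;
}

// Hypothetical acceptance check: the IOC_IN form is always accepted, and the
// IOC_VOID form is accepted as well when the packet size is zero.
constexpr bool sketch_cmd_matches(std::uint32_t cmd, std::uint32_t group,
                                  std::uint32_t num, std::uint32_t len) {
  return cmd == sketch_ioc(kSketchIocIn, group, num, len) ||
         (len == 0 && cmd == sketch_ioc(kSketchIocVoid, group, num, 0));
}
```

Under these assumptions, an older application that sends a zero-length command with the IN direction and a newer one that sends it with the VOID direction would both be recognised.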
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D54601
MFC after: 3 days
[X86] Add rewrite pattern for SSE41/AVX1 roundss/sd + blendps/pd (#172056)
Due to a previous PR (https://github.com/llvm/llvm-project/pull/171227),
operations like `_mm_ceil_sd` compile to suboptimal assembly:
```asm
roundsd xmm1, xmm1, 10
blendpd xmm0, xmm1, 1
```
This PR introduces a rewrite pattern to mitigate this, and fuse the corresponding operations.
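A scalar model of why the two instructions can be fused, assuming the usual SSE semantics (rounding immediate 10 selects round-up, blend mask 1 takes lane 0 from the source). The types and function names are hypothetical, and the exact instruction the patch emits is not shown in this message:

```cpp
#include <cassert>
#include <cmath>

// Two-lane stand-in for an XMM register holding doubles.
struct Vec2d { double lane0, lane1; };

// Models `roundsd dst, src, 10`: lane 0 becomes ceil(src.lane0),
// lane 1 of dst is left unchanged.
Vec2d roundsd_ceil(Vec2d dst, Vec2d src) {
  dst.lane0 = std::ceil(src.lane0);
  return dst;
}

// Models `blendpd dst, src, 1`: lane 0 comes from src, lane 1 stays.
Vec2d blendpd_mask1(Vec2d dst, Vec2d src) {
  dst.lane0 = src.lane0;
  return dst;
}

// The two-instruction sequence: round b in place, then blend into a.
Vec2d unfused(Vec2d a, Vec2d b) {
  Vec2d t = roundsd_ceil(b, b);
  return blendpd_mask1(a, t);
}

// A single round with a as the destination gives the same lanes.
Vec2d fused(Vec2d a, Vec2d b) { return roundsd_ceil(a, b); }
```

Both paths produce lane 0 equal to the rounded-up low element of `b` and lane 1 taken from `a`, which is what makes the blend redundant in this model.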
[MLIR][OpenMP] Add Initial Taskloop Clause Support (#174623)
Following on from the work to implement MLIR -> LLVM IR Translation for
Taskloop, this adds support for the following clauses to be used
alongside taskloop:
- if
- grainsize
- num_tasks
- untied
- nogroup
- final
- mergeable
- priority
These clauses are ones which work directly through the relevant OpenMP
Runtime functions, so their information just needed collecting from the
relevant location and passing through to the appropriate runtime
function.
Remaining clauses retain their TODO message as they have not yet been
implemented.
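For reference, this is how several of the listed clauses look at the source level. The sketch uses C++ with OpenMP pragmas purely for illustration, and the function name is hypothetical. `grainsize` and `num_tasks` are mutually exclusive, so only one appears here. Built without OpenMP enabled, the pragma is ignored and the loop runs serially:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example function: fills a vector with squares using a
// taskloop that carries several of the clauses listed above.
std::vector<int> squares_with_taskloop(int n) {
  std::vector<int> out(static_cast<std::size_t>(n), 0);
  // Use a raw pointer so each task writes into the same storage
  // regardless of how the variable is captured by the tasks.
  int* data = out.data();
  #pragma omp taskloop grainsize(4) if(n > 8) final(n < 4) untied mergeable priority(1)
  for (int i = 0; i < n; ++i)
    data[i] = i * i;
  // No nogroup clause is used, so the implicit taskgroup ensures all
  // generated tasks have finished before execution continues here.
  return out;
}
```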
[BOLT] Fix label in epilogue-determination.s test (#174960)
On RHEL8 we get the following error, which may originate from the typo
fixed here:
```
clang: warning: argument unused during compilation: '-ffreestanding' [-Wunused-command-line-argument]
ld.lld: error: relocation R_AARCH64_ADR_PREL_LO21 cannot be used against symbol '_jmptbl2'; recompile with -fPIC
>>> defined in /tmp/epilogue-determination-7bd9d4.o
>>> referenced by /tmp/epilogue-determination-7bd9d4.o:(.text+0x54)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```