[libc++] Avoid including <format> code in <optional> (#179466)
This patch moves `format_kind` to `<__fwd/format.h>` and moves the
mandate of `basic_format_context` into the class body. This reduces the
time it takes to parse `<optional>` by ~14%.
[lldb] Drop incomplete non-8-bit bytes support (#182025)
This was originally introduced to support kalimba DSPs featuring 24-bit
bytes by f03e6d84 and also c928de3e, but the kalimba support was mostly
removed by f8819bd5. This change removes the rest of the support, which
was far from complete.
[NFC][AArch64] Extract helper functions in AArch64ConditionOptimizer (#182197)
Extract shared logic between the cross- and intra-block paths into new
helper functions.
AMDGPU: Perform libcall recognition to replace fast OpenCL pow
If a float-typed call site is marked with afn, replace the 4
flavors of pow with a faster variant.
This transforms pow, powr, pown, and rootn to __pow_fast,
__powr_fast, __pown_fast, and __rootn_fast if available. Also
attempts to handle all of the same basic folds on the new fast
variants that were already performed with the base forms. This
maintains optimizations with OpenCL when the device libs unsafe
math control library is deleted. This maintains the status quo
of how libcalls work, and only handles 4 new entry points. This
only helps with the elimination of the control library, and not
general libcall emission problems.
This makes no practical difference for HIP, which is the status
quo for libcall optimizations. AMDGPULibCalls recognizes the OpenCL
mangled names. e.g., OpenCL float "pow" is really _Z3powff but the
HIP provided function "powf" is really named _ZL4powfff, and std::pow
[6 lines not shown]
[AArch64] Enable MaxInterleaveFactor4 for cortex-x series CPUs. (#181851)
This enables MaxInterleaveFactor4 for CPUs that have 4 or more vector
pipelines in the cortex-x series of CPUs.
[mlir][tosa] Add support for dense_resource in tosa-narrow-* passes (#182032)
Add support for `dense_resource` in `tosa-narrow-f64-to-f32` and
`tosa-narrow-i64-to-i32` passes.
[IRBuilder] Use ptrtoaddr in CreatePtrDiff() (#181855)
Make CreatePtrDiff() emit the pattern `ptrtoaddr(p1)-ptrtoaddr(p2)`.
This makes a few changes:
* The return type is now the address type instead of hardcoded to i64.
I've adjusted callers to deal with this where they didn't already.
* Don't use `ConstantExpr::getSizeOf()` and instead get the actual size
from DataLayout. These sizeof expressions will be removed as part of the
ptradd migration.
* Add a convenience overload without the element type, for the case
where you want a pure pointer difference.
I also adjusted some OpenMP code to consistently use zext for sizes, as
I had issues updating the test coverage otherwise (as we ended up
randomly picking zext or sext depending on the exact code path).
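
The arithmetic behind the `ptrtoaddr(p1)-ptrtoaddr(p2)` pattern can be sketched in plain C++. This is only an illustration of the computation, not the IRBuilder code: `ptrDiffBytes` and `ptrDiffElems` are hypothetical names, `std::uintptr_t` stands in for the address type, and the explicit `elemSize` parameter stands in for the size that the real code queries from DataLayout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the "pure pointer difference" case, i.e. the
// difference of the two addresses in bytes with no element type involved.
std::intptr_t ptrDiffBytes(const void *p1, const void *p2) {
  // Subtract the addresses as unsigned integers, then reinterpret as signed.
  return static_cast<std::intptr_t>(reinterpret_cast<std::uintptr_t>(p1) -
                                    reinterpret_cast<std::uintptr_t>(p2));
}

// Hypothetical helper: the element-typed case. The byte difference is
// divided by the element size, which is passed in explicitly here.
std::intptr_t ptrDiffElems(const void *p1, const void *p2,
                           std::size_t elemSize) {
  assert(elemSize != 0 && "element size must be non-zero");
  return ptrDiffBytes(p1, p2) / static_cast<std::intptr_t>(elemSize);
}
```

For two pointers into the same `int` array that are three elements apart, `ptrDiffBytes` yields `3 * sizeof(int)` and `ptrDiffElems` with `sizeof(int)` yields 3.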
tr: fix class handling in unicode world
The toupper/tolower logic was only handled for CCLASS_TOUPPER and
CCLASS_TOLOWER; add support for CCLASS ([:alpha:]).
PR: 219900
MFC After: 1 week
(cherry picked from commit 625dc44832cd760be3d7242d8e21a530c7e32bfc)
[Clang][HLSL] Start emitting structured GEP instruction (#177332)
StructuredGEP is a new LLVM intrinsic that will allow emitting proper
logical SPIR-V or DXIL. To properly stage this change across the FE,
BE and optimizations, this commit adds a new flag:
- `-fexperimental-emit-sgep`
When used, this flag will allow compatible frontends to emit the new
instructions. This will also allow us to migrate tests bit by bit,
adding the flag to each migrated test as we make progress on the
implementation.
Once the frontend migration is complete, the flag will remain, and work on
the backend will start. Compatible backends like SPIR-V will first allow
both instructions, but then, depending on a target bit similar to
`requiresStructuredCFG`, will declare that they require the SGEP
instruction and will start enforcing it.
Once the whole chain is complete, the flag will be defaulted to true and
removed, finishing the migration.
[LoopIdiomVectorize] Test all needles when vectorising find_first_of loops. (#179298)
Fixes #179187 - as described in the issue, the current FindFirstByte
transformation in LoopIdiomVectorizePass will incorrectly early-exit as
soon as a needle matching a search element is found, even if a previous
search element could match a subsequent needle.
This patch ensures all needles are tested before we return a matching
search element.
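
The intended semantics can be sketched with a scalar reference, shown next to a simplified model of the failure mode. Both functions are hypothetical illustrations and not the code in LoopIdiomVectorizePass: the real transformation works on vectors of search elements, while `findFirstOfNeedleMajor` only mimics returning as soon as any needle matches.

```cpp
#include <cstddef>
#include <string_view>

// Scalar reference: returns the index of the first element of `search` that
// equals any needle, or search.size() if there is none. Every needle is
// tested against search[i] before moving on to search[i + 1].
constexpr std::size_t findFirstOf(std::string_view search,
                                  std::string_view needles) {
  for (std::size_t i = 0; i < search.size(); ++i)
    for (char n : needles)
      if (search[i] == n)
        return i;
  return search.size();
}

// Simplified model of the bug: returns as soon as one needle matches
// anywhere, without checking whether a later needle matches an earlier
// search element.
constexpr std::size_t findFirstOfNeedleMajor(std::string_view search,
                                             std::string_view needles) {
  for (char n : needles)
    for (std::size_t i = 0; i < search.size(); ++i)
      if (search[i] == n)
        return i;
  return search.size();
}

// 'b' (the second needle) occurs at index 2, before 'a' at index 5.
static_assert(findFirstOf("xxbxxa", "ab") == 2, "all needles are tested");
static_assert(findFirstOfNeedleMajor("xxbxxa", "ab") == 5,
              "early exit on the first needle skips the earlier match");
```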
[mlirbc] Serialize dense elements attr i1 using packed (#182233)
The extra cost is localized to the serialization layer, and the result is
smaller bytecode files. This also keeps the format compatible with the
previous one.
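
As a rough illustration of why packing shrinks the output, a minimal sketch of packing boolean values eight to a byte follows. The names `packBits` and `unpackBits` and the least-significant-bit-first order are assumptions made for this example; the actual MLIR bytecode encoding may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs one bool per bit, least significant bit first.
// Unused bits in the last byte are left as zero.
std::vector<std::uint8_t> packBits(const std::vector<bool> &values) {
  std::vector<std::uint8_t> packed((values.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < values.size(); ++i)
    if (values[i])
      packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
  return packed;
}

// Hypothetical helper: the inverse of packBits. The element count is passed
// separately because the packed bytes do not record it.
std::vector<bool> unpackBits(const std::vector<std::uint8_t> &packed,
                             std::size_t count) {
  assert(count <= packed.size() * 8 && "not enough packed bytes");
  std::vector<bool> values(count);
  for (std::size_t i = 0; i < count; ++i)
    values[i] = ((packed[i / 8] >> (i % 8)) & 1u) != 0;
  return values;
}
```

With this layout, nine values occupy two bytes instead of nine, and the cost of the bit manipulation stays inside the pack and unpack steps.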
ncurses: merge update to ncurses 6.6
6.6 is ABI compatible with 6.5 (tested with abidiff)
Remove html documentation to ease updates
MFC After: 1 month
(cherry picked from commit 68ad2b0d7af2a3571c4abac9afa712f9b09b721c)