[ValueTracking] Support ptrtoaddr in inequality implication (#173362)
`ptrtoaddr(p1) - ptrtoaddr(p2) == non-zero` implies `p1 != p2`, same as
for ptrtoint.
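The implication can be illustrated outside of LLVM with ordinary pointers. This is a minimal sketch, not the ValueTracking code: `addr_of` is a hypothetical stand-in for `ptrtoaddr`, and the check is done at run time rather than on known bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for ptrtoaddr: yields the address bits of a pointer.
inline std::uintptr_t addr_of(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Equal pointers have equal addresses, so their address difference is zero.
// By contraposition, a non-zero difference means the pointers differ.
inline bool nonzero_diff_implies_ne(const void *p1, const void *p2) {
  std::uintptr_t diff = addr_of(p1) - addr_of(p2);
  return diff != 0;
}
```

For two distinct array elements the difference is non-zero and the function returns true; for the same element it returns false, so nothing is concluded.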
[ValueTracking] Support ptrtoaddr in computeKnownBits() (#173358)
ptrtoaddr can be handled the same as ptrtoint here. The pointer known
bits cover the full pointer width, and ptrtoaddr either passes those
through directly or truncates to the address size.
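The truncation case can be sketched with a small stand-alone model. The `KnownBits64` struct and `truncKnown` helper below are hypothetical illustrations, not LLVM's `KnownBits` API; they assume a pointer of at most 64 bits whose address occupies the low `AddrBits` bits.

```cpp
#include <cstdint>

// Hypothetical model: a set bit in Zero/One means that bit of the value is
// known to be 0/1 respectively.
struct KnownBits64 {
  std::uint64_t Zero;
  std::uint64_t One;
};

// Keep only the knowledge about the low AddrBits bits, as a truncation
// from the pointer width to the address width would.
inline KnownBits64 truncKnown(KnownBits64 K, unsigned AddrBits) {
  std::uint64_t Mask =
      AddrBits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << AddrBits) - 1);
  return {K.Zero & Mask, K.One & Mask};
}
```

For example, a 16-byte-aligned pointer has its low four bits known zero, and that fact survives truncation to a 32-bit address unchanged.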
[Flang] Add FIR and LLVM lowering support for prefetch directive (#167272)
Implementation details:
* Add PrefetchOp in FirOps
* Handle PrefetchOp in FIR Lowering and also pass required default
values
* Handle PrefetchOp in CodeGen.cpp
* Add required tests
[C++20] [Modules] Fix incorrect read of TULocalOffset for delayed namespace (#174365)
Close https://github.com/llvm/llvm-project/issues/158321
The root cause of the problem is a mismatch in an initializer.
[OpenMP][OMPIRBuilder] Hoist static parallel region allocas to the entry block on the CPU (#174314)
Follow-up to #171597. This PR hoists allocas in a parallel region to the
entry block of the corresponding outlined function on the CPU; #171597
introduced the main mechanism and applied it on the GPU.
[OpenMP][MLIR] Hoist static `alloca`s emitted by private `init` regions to the allocation IP of the construct (#171597)
Having more than one descriptor (allocatable or array) on the same
`private` clause triggers a runtime crash on GPUs at the moment.
For SPMD kernels, the issue happens because the initialization logic
includes:
* Allocating a number of temporary structs (these are emitted by flang
when `fir` is lowered to `mlir.llvm`).
* There is a conditional branch that determines whether we will allocate
storage for the descriptor and initialize array bounds from the original
descriptor or whether we will initialize the private descriptor to null.
Because of these two things, temp allocations needed for descriptors
beyond the first one are preceded by branching, which causes the observed
runtime crash.
This PR solves this issue by hoisting these static `alloca`
instructions to the suitable alloca IP of the parent construct.
[InstCombine] Fold redundant FP clamp selects; relax min-max-pattern bailout in visitFCmp (#173452)
visitFCmp() previously bailed out when a following select matched a
clamp pattern. This blocks simplifications when the clamp is provably
redundant.
This PR allows simplification for clamp selects of flavor SPF_FMAXNUM/
SPF_FMINNUM when one arm is a constant and the other is a sitofp/uitofp
of an integer value, and the constant equals the exact min/max of that
integer domain:
* SPF_FMAXNUM (pattern max(X,C)): redundant if C is the minimum integer
mapped exactly to FP (e.g. X = sitofp i8, C = -128.0f).
* SPF_FMINNUM (pattern min(X,C)): redundant if C is the maximum integer
mapped exactly to FP (e.g. X = uitofp i8, C = 255.0f).
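The redundancy claim is small enough to check exhaustively for `i8`. The sketch below is an illustration in plain C++, not the InstCombine code; it uses `std::fmax`/`std::fmin` as stand-ins for the FMAXNUM/FMINNUM flavors, and the function names are made up for this example.

```cpp
#include <cmath>
#include <cstdint>

// max(sitofp(x), -128.0f) never changes the value for any signed 8-bit x,
// because -128 is the smallest value the conversion can produce.
inline bool max_clamp_redundant_i8() {
  for (int v = -128; v <= 127; ++v) {
    float x = static_cast<float>(static_cast<std::int8_t>(v));
    if (std::fmax(x, -128.0f) != x)
      return false;
  }
  return true;
}

// min(uitofp(x), 255.0f) never changes the value for any unsigned 8-bit x,
// because 255 is the largest value the conversion can produce.
inline bool min_clamp_redundant_u8() {
  for (int v = 0; v <= 255; ++v) {
    float x = static_cast<float>(static_cast<std::uint8_t>(v));
    if (std::fmin(x, 255.0f) != x)
      return false;
  }
  return true;
}
```

Both functions return true, since every 8-bit integer converts exactly to `float` and the converted value can never be NaN.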
This fixes a regression in #173454
---------
Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
Co-authored-by: Yingwei Zheng <dtcxzyw at qq.com>
[Clang][Diagnostics] Mention 'import std' in typeid diagnostic (#173236)
Previously, the diagnostic only suggested including `<typeinfo>`. Since
C++20, the standard library may also be made available via `import std;`.
This change updates the diagnostic to mention `import std` as an
alternative and adds a test to cover the new wording.
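For context, the diagnostic concerns code that uses `typeid` without the declaration of `std::type_info` being visible. A minimal sketch of the conforming form, using the header route (the `Shape`/`Circle` types are hypothetical and exist only for illustration):

```cpp
#include <typeinfo> // makes std::type_info visible before typeid is used

struct Shape {
  virtual ~Shape() = default;
};
struct Circle : Shape {};

// typeid on a polymorphic reference yields the dynamic type of the object.
inline bool is_circle(const Shape &s) { return typeid(s) == typeid(Circle); }
```

Dropping the `#include` is the situation the diagnostic addresses; where the toolchain supports it, `import std;` is the other way to make the declaration available.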
[RISCV][SelectionDAG] Add a ISD::CTLS node for count leading redundant sign bits. Use it to select CLS(W). (#173417)
The RISC-V P extension adds an instruction equivalent to
__builtin_clrsb. AArch64 has a similar instruction that we currently fail to
select when using the builtin.
This patch adds a combine based on the canonical version of the pattern
emitted by clang for the builtin, (add (ctlz (xor x, (sra x, bw-1))),
-1). I'm starting the combine at the ctlz because the outer add can
easily be combined into other nodes, obscuring the full pattern. So we
generate (add (ctls x), 1) and hope the add will be combined away.
I've also added a combine for the pattern AArch64 recognizes
(ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1)).
I've only enabled the combines when the target has a Legal or Custom
action for the operation, taking into account type promotion. We
can relax this in the future by adding a default expansion to
LegalizeDAG and adding more type legalization rules.
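The two source patterns named above can be checked against a straightforward reference on 32-bit values. This is a sketch in plain C++ rather than SelectionDAG code; all function names are hypothetical, and `ctlz32` is a portable loop rather than a compiler builtin.

```cpp
#include <cstdint>

// Count leading zeros of a 32-bit value; returns 32 for zero.
inline unsigned ctlz32(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Reference: number of bits directly below the sign bit that equal it.
inline unsigned ctls_reference(std::int32_t x) {
  std::uint32_t u = static_cast<std::uint32_t>(x);
  std::uint32_t sign = u >> 31;
  unsigned n = 0;
  for (int i = 30; i >= 0 && ((u >> i) & 1u) == sign; --i)
    ++n;
  return n;
}

// (xor x, (sra x, bw-1)): the arithmetic shift is modeled as an all-ones
// mask for negative inputs and zero otherwise.
inline std::uint32_t xor_with_sign(std::int32_t x) {
  std::uint32_t u = static_cast<std::uint32_t>(x);
  std::uint32_t sra = 0u - (u >> 31);
  return u ^ sra;
}

// Pattern 1: (add (ctlz (xor x, (sra x, bw-1))), -1)
inline unsigned ctls_via_ctlz(std::int32_t x) {
  return ctlz32(xor_with_sign(x)) - 1;
}

// Pattern 2: (ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1))
inline unsigned ctls_via_shl_or(std::int32_t x) {
  return ctlz32((xor_with_sign(x) << 1) | 1u);
}
```

The xor result always has a clear top bit, so the ctlz in the first pattern is at least 1 and the subtraction cannot wrap; in the second pattern the or with 1 keeps the ctlz operand non-zero, which is why the zero-undef form is safe there.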