[C] Add (new) -Wimplicit-void-ptr-cast to -Wc++-compat (#136855)
This introduces a new diagnostic group (-Wimplicit-void-ptr-cast),
grouped under -Wc++-compat, which diagnoses implicit conversions from
void * to another pointer type in C. It's a common source of
incompatibility with C++ and is something GCC diagnoses (though GCC does
not have a specific warning group for this).
Fixes #17792
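As a rough illustration of the incompatibility being diagnosed, the sketch below shows the pattern in question. The helper name `make_buffer` is hypothetical and not part of the patch. The commented-out line is what C accepts implicitly and C++ rejects; the explicit cast is valid in both languages.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocates and zeroes a buffer of n ints.
int *make_buffer(std::size_t n) {
    void *raw = std::malloc(n * sizeof(int));
    if (raw == nullptr)
        return nullptr;
    std::memset(raw, 0, n * sizeof(int));

    // int *p = raw;  // Accepted implicitly in C (the kind of conversion the
    //                // new warning group targets); ill-formed in C++.
    int *p = static_cast<int *>(raw); // Explicit conversion works in C++;
                                      // in C the spelling would be (int *)raw.
    return p;
}
```

The caller owns the returned buffer and releases it with `std::free`.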
[LLVM][TargetParser] Handle -msys targets the same as -cygwin. (#136817)
MSYS2 uses i686-pc-msys and x86_64-pc-msys as target, and is a fork of
Cygwin. There's an effort underway to try to switch as much as possible
to use -pc-cygwin targets, but the -msys target will be hanging around
for the foreseeable future.
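A minimal standalone sketch of the idea, treating an `msys` OS component like `cygwin`. This is not the LLVM TargetParser API: `osComponent` and `isCygwinLike` are hypothetical helpers, and the sketch assumes the OS is the last `-`-separated component of the triple, which holds for the triples named above but not for triples that carry an environment suffix.

```cpp
#include <string_view>

// Hypothetical helper: returns the text after the last '-' in the triple.
constexpr std::string_view osComponent(std::string_view triple) {
    const auto pos = triple.rfind('-');
    return pos == std::string_view::npos ? triple : triple.substr(pos + 1);
}

// Hypothetical predicate: true for both Cygwin and MSYS style triples.
constexpr bool isCygwinLike(std::string_view triple) {
    const std::string_view os = osComponent(triple);
    return os == "cygwin" || os == "msys";
}
```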
[libclc] Remove unnecessary clcmacros.h (#137149)
The macros defined by this file (not to be confused with clcmacro.h)
don't appear necessary for building libclc.
The language version macros should be handled by clang, and there are no
uses of NULL or kernel_exec in the source code.
AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg
Uniform S16 shifts have to be extended to S32 using the appropriate Extend
before lowering to an S32 instruction.
Uniform packed V2S16 shifts are lowered to SGPR S32 instructions;
the other option is to use VALU packed V2S16 and ReadAnyLane.
For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are
instructions available.
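One way to picture why the extend has to match the shift is the scalar model below. It is a sketch only: the function names are hypothetical, the mapping of shift kind to extend kind is an interpretation of "appropriate Extend", and shift amounts are assumed to be below 16.

```cpp
#include <cstdint>

// Left shift: the upper 16 bits of the widened value never reach the low
// 16 result bits, so any extension works (zero extension is used here).
std::uint16_t shl16(std::uint16_t x, unsigned amt) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(x) << amt);
}

// Logical right shift: zero-extend so that zeros are shifted in from the top.
std::uint16_t lshr16(std::uint16_t x, unsigned amt) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(x) >> amt);
}

// Arithmetic right shift: sign-extend so that copies of the sign bit are
// shifted in. Right-shifting a negative signed value is arithmetic on
// mainstream compilers and guaranteed to be so from C++20.
std::int16_t ashr16(std::int16_t x, unsigned amt) {
    return static_cast<std::int16_t>(static_cast<std::int32_t>(x) >> amt);
}
```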
AMDGPU/GlobalISel: add RegBankLegalize rules for select
Uniform condition S1 is AnyExtended to S32 and high bits are
cleaned using AND with 1. Divergent S1 uses VCC.
Using B32/B64 rules to cover scalar, vector, and pointer types.
Divergent B64 is split to S32.
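The uniform condition handling can be modelled in plain C++ as follows. This is a sketch with hypothetical names: after an any-extend, bits 1..31 of the condition are unspecified, so they are masked off before the value is used.

```cpp
#include <cstdint>

// Keep only bit 0; the upper bits of an any-extended S1 carry no meaning.
std::uint32_t cleanCondition(std::uint32_t anyExtendedCond) {
    return anyExtendedCond & 1u;
}

// Scalar select driven by the cleaned condition.
std::uint32_t uniformSelect(std::uint32_t anyExtendedCond,
                            std::uint32_t ifTrue, std::uint32_t ifFalse) {
    return cleanCondition(anyExtendedCond) != 0 ? ifTrue : ifFalse;
}
```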
AMDGPU/GlobalISel: add RegBankLegalize rules for AND OR and XOR
Uniform S1 is lowered to S32.
Divergent S1 is selected as VCC(S1); instruction select will select the
SALU instruction based on wave size (S32 or S64).
S16 are selected as is. There are register classes for vgpr S16.
Since some isel patterns check for sgpr S16 we don't lower to S32.
For 32 and 64 bit types we use B32/B64 rules that cover scalar, vector,
and pointer types.
SALU B32 and B64 and VALU B32 instructions are available.
Divergent B64 is lowered to B32.
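Lowering a 64-bit bitwise operation to 32-bit pieces is sound because each result bit depends only on the operand bits at the same position. A small model of that split, with the hypothetical name `bitwise64ViaB32`:

```cpp
#include <cstdint>
#include <functional>

// Apply a 32-bit bitwise operation to the low and high halves separately,
// then merge the two 32-bit results back into a 64-bit value.
template <typename Op>
std::uint64_t bitwise64ViaB32(std::uint64_t a, std::uint64_t b, Op op) {
    const std::uint32_t lo =
        op(static_cast<std::uint32_t>(a), static_cast<std::uint32_t>(b));
    const std::uint32_t hi = op(static_cast<std::uint32_t>(a >> 32),
                                static_cast<std::uint32_t>(b >> 32));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Passing `std::bit_and`, `std::bit_or`, or `std::bit_xor` over `std::uint32_t` gives the same result as the corresponding 64-bit operator.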
AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc
Uniform S1:
Truncs to uniform S1 and AnyExts from S1 are left as is as they are meant
to be combined away. Uniform S1 ZExt and SExt are lowered using select.
Divergent S1:
Trunc of VGPR to VCC is lowered as compare.
Extends of VCC are lowered using select.
For remaining types:
S32 to S64 ZExt and SExt are lowered using merge values, AnyExt and Trunc
are again left as is to be combined away.
Notably uniform S16 for SExt and ZExt is not lowered to S32 and left as is
for instruction select to deal with them. This is because there are patterns
that check for S16 type.
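The S32 to S64 extends built from merge values can be sketched as below. The names are hypothetical, and the way the high half is computed here (a conditional on the sign bit) is only an illustration, not necessarily the sequence the rules emit.

```cpp
#include <cstdint>

// Model of a merge: combine a low and a high 32-bit half into 64 bits.
std::uint64_t mergeValues(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Zero extend: the high half is a constant zero.
std::uint64_t zext32to64(std::uint32_t x) {
    return mergeValues(x, 0u);
}

// Sign extend: the high half is all copies of the sign bit of the low half.
std::uint64_t sext32to64(std::uint32_t x) {
    const std::uint32_t hi = (x & 0x80000000u) != 0 ? 0xFFFFFFFFu : 0u;
    return mergeValues(x, hi);
}
```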
AMDGPU/GlobalISel: add RegBankLegalize rules for bitfield extract
A divergent S32 instruction is available; S64 needs to be lowered to S32.
Uniform instructions are available for both S32 and S64, but the bitfield
offset and the bitfield size need to be packed into a single S32. The uniform
instruction is selected directly since there is no available isel pattern.
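The packing step can be sketched as follows. The field positions used here (offset in the low six bits, size starting at bit 16) are an assumption for illustration and should be checked against the ISA documentation; the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: bits [5:0] hold the offset, bits from 16 upward the size.
std::uint32_t packBfeOperand(std::uint32_t offset, std::uint32_t size) {
    return (offset & 0x3Fu) | (size << 16);
}

// Inverse helpers, useful for checking the packing in isolation.
std::uint32_t unpackBfeOffset(std::uint32_t packed) { return packed & 0x3Fu; }
std::uint32_t unpackBfeSize(std::uint32_t packed) { return packed >> 16; }
```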
[mlir][vector] Update the folder for vector.{insert|extract} (#136579)
This is a minor follow-up to #135498. It ensures that operations like
the following are not treated as out-of-bounds accesses and can be
folded correctly (*):
```mlir
%c_neg_1 = arith.constant -1 : index
%0 = vector.insert %value_to_store, %dest[%c_neg_1] : vector<5xf32> into vector<4x5xf32>
%1 = vector.extract %src[%c_neg_1, 0] : f32 from vector<4x5xf32>
```
In addition to adding tests for the case above, this PR also relocates
the tests from #135498 to be alongside existing tests for the
`vector.{insert|extract}` folder, and reformats them to follow:
* https://mlir.llvm.org/getting_started/TestingGuide/
For example:
* The "no_fold" prefix is now used to label negative tests.
[4 lines not shown]
[X86] SimplifyDemandedVectorEltsForTargetNode - handle 512-bit X86ISD::VPERMI with lower half demanded elts (#137139)
512-bit X86ISD::VPERMI nodes handle the lower/upper 256-bits separately - so if we don't demand the upper half elements, we can just use the 256-bit variant.
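A software model of an immediate qword permute makes the reasoning concrete. This is a sketch, assuming the VPERMQ-style semantics where each 256-bit lane of four 64-bit elements is permuted independently by the same 2-bit-per-element immediate; `permuteQwords` and `lowerHalfMatches256` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Permute 64-bit elements within each group of four (one 256-bit lane).
// Element i of the result takes the source element selected by bits
// [2*(i%4)+1 : 2*(i%4)] of the immediate, within the same lane.
template <std::size_t N>
std::array<std::uint64_t, N>
permuteQwords(const std::array<std::uint64_t, N> &src, std::uint8_t imm) {
    static_assert(N % 4 == 0, "element count must be a multiple of one lane");
    std::array<std::uint64_t, N> dst{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t laneBase = (i / 4) * 4;
        const std::size_t sel = (imm >> (2 * (i % 4))) & 0x3u;
        dst[i] = src[laneBase + sel];
    }
    return dst;
}

// Checks that the lower half of the 8-element (512-bit) result equals the
// 4-element (256-bit) permute of the lower half of the source.
bool lowerHalfMatches256(const std::array<std::uint64_t, 8> &src,
                         std::uint8_t imm) {
    const auto wide = permuteQwords<8>(src, imm);
    const std::array<std::uint64_t, 4> low = {src[0], src[1], src[2], src[3]};
    const auto narrow = permuteQwords<4>(low, imm);
    for (std::size_t i = 0; i < 4; ++i)
        if (wide[i] != narrow[i])
            return false;
    return true;
}
```

Because no lane reads from the other lane, dropping the upper half of the source cannot change the lower half of the result.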
[libclang/python] Add equality comparison operators for File (#130383)
This covers the `File` interface changes added by #120590
---------
Co-authored-by: Mathias Stearn <redbeard0531 at gmail.com>
Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav at gmail.com>
SPIRV: Set NoPHIs property after rewriting them (#136327)
There should be no PHIs after selection, as OpPhi is used
instead. This hopefully avoids errors in #135277.
[mlir][tosa] Add verifier check for Concat Op (#136047)
This adds a verifier check for Concat Op
to make sure the sum of the concatenated axis dimensions is equal to the
output's axis dimension.
Adds tests in verifier.mlir.
Also moves existing concat verifier checks to verifier.mlir.
Signed-off-by: Tai Ly <tai.ly at arm.com>
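The invariant the Concat verifier check enforces can be written as a small standalone predicate. This is a sketch only: `concatAxisDimsMatch` is a hypothetical name, shapes are plain vectors of static sizes, and dynamic dimensions, which a real verifier has to account for, are not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when the sizes of all inputs along `axis` add up to the
// size of the output along `axis`. Shapes that do not have an `axis`
// dimension make the check fail.
bool concatAxisDimsMatch(
    const std::vector<std::vector<std::int64_t>> &inputShapes,
    const std::vector<std::int64_t> &outputShape, std::size_t axis) {
    if (axis >= outputShape.size())
        return false;
    std::int64_t sum = 0;
    for (const auto &shape : inputShapes) {
        if (axis >= shape.size())
            return false;
        sum += shape[axis];
    }
    return sum == outputShape[axis];
}
```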
[SystemZ] Add DAGCombine for FCOPYSIGN to remove rounding. (#136131)
Add a DAGCombine for FCOPYSIGN that removes the rounding which is never
needed as the sign bit is already in the correct place. This helps in particular the
rounding to f16 case which needs a libcall.
Also remove the roundings for other FP VTs and simplify the CPSDR
patterns correspondingly.
fp-copysign-03.ll test updated, now also covering the other FP VT
combinations.
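Why the rounding is unnecessary can be seen in a bit-level model: the sign of an IEEE value is always its top bit, so it can be taken from the wider operand directly. The sketch below uses the hypothetical name `copySignFromDouble` and assumes IEEE 754 `float` and `double` representations.

```cpp
#include <cstdint>
#include <cstring>

// Copy the sign of a double onto a float without converting the double
// to float first. Only the top bit of `sign` is inspected.
float copySignFromDouble(float magnitude, double sign) {
    static_assert(sizeof(float) == 4 && sizeof(double) == 8,
                  "sketch assumes 32-bit float and 64-bit double");
    std::uint32_t magBits = 0;
    std::uint64_t signBits = 0;
    std::memcpy(&magBits, &magnitude, sizeof(magBits));
    std::memcpy(&signBits, &sign, sizeof(signBits));

    const std::uint32_t signBit = static_cast<std::uint32_t>(signBits >> 63);
    magBits = (magBits & 0x7FFFFFFFu) | (signBit << 31);

    float result = 0.0f;
    std::memcpy(&result, &magBits, sizeof(result));
    return result;
}
```

A value such as `-1e300` does not fit in a `float`, yet its sign can still be transferred exactly because the magnitude of the sign operand is never used.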
[SystemZ] Handle f16 load positive/negative/complement without libcalls. (#136286)
This can be done directly with the (64-bit) target instruction as only the sign bit
is changed.
[TSan, SanitizerBinaryMetadata] Analyze the capture status for `alloca` rather than arbitrary `Addr` (#132756)
This PR is based on my last PR #132752 (the first commit of this PR),
but addressing a different issue.
This commit addresses the limitation in `PointerMayBeCaptured` analysis
when dealing with derived pointers (e.g. arr+1) as described in issue
#132739.
The current implementation of `PointerMayBeCaptured` may miss captures
of the underlying `alloca` when analyzing derived pointers, leading to
some false negatives (FNs) in TSan, as follows:
```cpp
void *Thread(void *a) {
((int*)a)[1] = 43;
return 0;
}
int main() {
[28 lines not shown]
[AArch64] Update __gcsss intrinsic to match revised ACLE specification (#136850)
The original __gcsss intrinsic was implemented based on:
https://github.com/ARM-software/acle/pull/260
with the signature: const void *__gcsss(const void *)
Per the updated specification in:
https://github.com/ARM-software/acle/pull/364
both const qualifiers have been removed. This commit updates the
signature accordingly to: void *__gcsss(void *)
This aligns the implementation with the latest ACLE definition.
[RISCV] Add fixed-length patterns for disjoint or patterns for vwadd[u].v{v,x} (#136824)
This is the fixed-length equivalent of #136716.
The pattern we need to match is ({s,z}ext_vl (or_vl disjoint a, b)).
This only allows or_vls with an undef passthru, which allows us to
ignore its mask and vl and just take it from the {s,z}ext_vl.
A riscv_or_vl_is_add_oneuse PatFrag is added to mirror or_is_add in
RISCVInstrInfo.td.
[libc++][ranges] Reject non-class types in ranges::to (#135802)
This patch adds `static_assert` using `is_class_v` and `is_union_v` to
reject non-class type template parameters.
Fixes #132133
---------
Co-authored-by: A. Jiang <de34 at live.cn>
Remove an incorrect assert in MFMASmallGemmSingleWaveOpt. (#130131)
This assert was failing in a fuzzing test. I consulted with @jrbyrnes
who said:
The MFMASmallGemmSingleWaveOpt::apply() method is invoked if and only if
the user has inserted an intrinsic llvm.amdgcn.iglp.opt(i32 1) into
their source code. This intrinsic applies a highly specialized DAG
mutation to result in specific scheduling for a specific set of kernels.
These assertions are really just confirming that the characteristics of
the kernel match what is expected (i.e. The kernels are similar to the
ones this DAG mutation strategy was designed against).
However, if we apply this DAG mutation to kernels for which it was not
designed, then we may not find the types of instructions we are looking
for, and may end up with empty caches.
I think it should be fine to just return false if the cache is empty
instead of the assert.
[mlir] add a fluent API to GreedyRewriterConfig (#137122)
This is similar to other configuration objects used across MLIR.
Rename some fields to better reflect that they are no longer booleans.
Reland 04d261101b4f229189463136a794e3e362a793af / #132253.
[VPlan] Remove ILV::sinkScalarOperands. (#136023)
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.
There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes don't support replicating. Those are pointer
inductions and blends.
We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.
Depends on https://github.com/llvm/llvm-project/pull/136021.
PR: https://github.com/llvm/llvm-project/pull/136023
[clangd] Strip invalid fromRanges for outgoing calls (#134657)
`CallHierarchyOutgoingCall::fromRanges` are interpreted as ranges in the
same file as the item for which 'outgoingCalls' was called.
It's possible for outgoing calls to be in a different file than that
item if the item is just a declaration (e.g. in a header file). Now,
such calls are dropped instead of being returned to the client.
This is the same as the change made in #111616, but now for outgoing
calls.
Fixes clangd/clangd#2350
---------
Co-authored-by: Nathan Ridge <zeratul976 at hotmail.com>