[SLP]Do not vectorize buildvector tree will scalars in first node, which should remain scalars
Such trees will be revectorized again, causing a compiler hang.
Fixes #172609
[mlir][scf] Fold away `scf.for` iter args cycles (#173436)
When iter args form cycle through region args/yields with the same init
value, we can replace them all with that init value.
---------
Signed-off-by: Ivan Butygin <ivan.butygin at gmail.com>
[LoopPeel] Peel last iteration to enable load widening
In loops that contain multiple consecutive small loads (e.g., 3 bytes
loading i8's), peeling the last iteration makes it safe to read beyond
the accessed region, enabling the use of a wider load (e.g., i32) for
all other N-1 iterations.
Patterns such as:
```
%a = load i8, ptr %p
%b = load i8, ptr %p+1
%c = load i8, ptr %p+2
...
%p.next = getelementptr i8, ptr %p, 3
```
Can be transformed to:
```
%wide = load i32, ptr %p ; Read 4 bytes
[9 lines not shown]
[MLIR] quick fix errors introduced in #173228 (#173474)
My bad, I thought that the build was being run for PRs, everything
succeeded and didn't test out on local.
Now main branch is throwing these errors on CI:
```
[490/2482] Building CXX object tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIComplex.dir/Complex.cpp.o
FAILED: tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIComplex.dir/Complex.cpp.o
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_CAPI_BUILDING_LIBRARY=1 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1/tools/mlir/lib/CAPI/Dialect -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/mlir/lib/CAPI/Dialect -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1/tools/mlir/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/mlir/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/stage1/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/llvm/include -mcpu=neoverse-v2 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIComplex.dir/Complex.cpp.o -MF tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIComplex.dir/Complex.cpp.o.d -o tools/mlir/lib/CAPI/Dialect/CMakeFiles/obj.MLIRCAPIComplex.dir/Complex.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/llvm/mlir/lib/CAPI/Dialect/Complex.cpp
../llvm/mlir/lib/CAPI/Dialect/Complex.cpp:15:45: error: no member named 'complex' in namespace 'mlir'
15 | mlir::complex::ComplexDialect)
| ~~~~~~^
../llvm/mlir/include/mlir/CAPI/Registration.h:39:30: note: expanded from macro 'MLIR_DEFINE_CAPI_DIALECT_REGISTRATION'
39 | unwrap(registry)->insert<ClassName>(); \
| ^~~~~~~~~
../llvm/mlir/lib/CAPI/Dialect/Complex.cpp:15:45: error: no member named 'complex' in namespace 'mlir'
15 | mlir::complex::ComplexDialect)
[52 lines not shown]
[AArch64][llvm] Add codegen for simd fpcvt intrinsics
Add tablegen patterns to provide codegen for SCVTF and UCVTF
operating purely on SIMD & FP registers, using explicit bitcasts.
[InstCombine][X86] Try to convert BLENDV(X,Y,SHL()) -> SELECT(ICMP_SGT(0,SHL()),Y,X) (#173389)
We are cautious about converting from BLENDV intrinsics as the mask is
usually bitcast from another type, often of an entirely different width
(especially for PBLENDVB which is often used for all integer types) -
incorrect handling can leave us with select ops working on the wrong
type width, which makes it difficult for other passes to make use of it
(VectorCombine in particular).
Currently BLENDV intrinsics are only folded to generic selects when we
know the mask is from a SEXT(vXi1) bool type.
But a second common use is to shift specific bits to the MSB of the
blend mask - this is common in fp mathlib code when working with bounds
etc. and the backend is pretty good at folding this back to a
VSELECT/BLENDV pattern (often better than using the shift directly
especially when it has a non-uniform shift amount).
I've been looking for other common arithmetic ops that would benefit
[2 lines not shown]
[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fadd_f32/f64 (#173381)
Allows for type checking depending on the built-in signature.
This introduces some subtle changes in code generation: before, since
the signature was meaningless, we would accept any pointer type without
casting. After this change, the pointer of the `atomicrmw` matches the
flat address space.
[ELF] Include sharded relocations in RelocationBaseSection::getSize
Although mergeRels is called prior to using this size for final layout,
Writer::setReservedSymbolSections uses this in order to set the value of
__rel[a]_iplt_end and, downstream in Morello LLVM, __rel[a]_dyn_end.
Currently none of the relocations that can exist when static linking (as
the case when these symbols are defined) are sharded, but a future
commit will change this for R_AARCH64_AUTH_RELATIVE, and similarly
R_MORELLO_RELATIVE is sharded downstream in Morello LLVM. Make sure we
compute the right size when called prior to mergeRels, and add a
regression test to demonstrate that R_AARCH64_AUTH_RELATIVE still gets
the right __rel[a]_ipt_end in future even when sharding is adopted.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/173285
[NFC][ELF] Move mergeRels/partitionRels into finalizeContents
Other than the ordering requirements that remain between sections, this
abstracts the details of how these sections are implemented.
Note that isNeeded already checks relocsVec for both section types, so
finalizeSynthetic can call it before mergeRels just fine.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/171203