[LV] Add VPlan printing test with casts converted to single scalar (NFC) (#202200)
Add test coverage for additional paths that can create single-scalar
casts: sinkScalarOperands and induction optimization.
[VectorCombine] foldShuffleChainsToReduce - add FADD/FMUL handling (#201302)
Extend `foldShuffleChainsToReduce` to fold shuffle-reduction chains of
fadd/fmul into the corresponding vector reduction intrinsics
(llvm.vector.reduce.fadd / llvm.vector.reduce.fmul).
The transformation requires the `reassoc` fast-math flag on every binop
in the chain, as described in the
[LangRef](https://llvm.org/docs/LangRef.html#rewrite-based-flags). The
output intrinsic receives the intersection of all binops' FMF, and the
identity start value is selected via ConstantExpr::getBinOpIdentity
(-0.0 for fadd, 1.0 for fmul, respecting nsz for the sign of zero).
Fixes #199030.
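To illustrate why `reassoc` is needed, here is a minimal sketch in plain Python (not the LLVM API) of the shape of a shuffle-reduction chain: each step combines the vector with a shifted copy of itself, so the operations are applied in a tree order rather than left to right. The helper names `shuffle_chain_reduce`, `ordered_reduce` and `intersect_fmf` are hypothetical and exist only for this example; the input values are small integers so both orders give exactly the same result.

```python
import math
import operator


def shuffle_chain_reduce(vec, binop):
    """Reduce a power-of-two-length vector in log2(n) steps.

    At each step lane i is combined with lane i + half, which models a
    shufflevector followed by a binop; the result ends up in lane 0.
    """
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    lanes = list(vec)
    half = n // 2
    while half >= 1:
        for i in range(half):
            lanes[i] = binop(lanes[i], lanes[i + half])
        half //= 2
    return lanes[0]


def ordered_reduce(vec, binop, start):
    """Strict left-to-right reduction starting from an identity value."""
    acc = start
    for x in vec:
        acc = binop(acc, x)
    return acc


def intersect_fmf(flag_sets):
    """Keep only the fast-math flags present on every binop in the chain."""
    result = set(flag_sets[0])
    for flags in flag_sets[1:]:
        result &= set(flags)
    return result


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
chain_sum = shuffle_chain_reduce(values, operator.add)
chain_product = shuffle_chain_reduce(values, operator.mul)
linear_sum = ordered_reduce(values, operator.add, -0.0)
linear_product = ordered_reduce(values, operator.mul, 1.0)

# -0.0 is the fadd identity because it preserves the sign of a -0.0 operand,
# whereas +0.0 would turn it into +0.0.
sign_with_neg_zero_start = math.copysign(1.0, -0.0 + -0.0)
sign_with_pos_zero_start = math.copysign(1.0, 0.0 + -0.0)

common_flags = intersect_fmf([{"reassoc", "nsz"}, {"reassoc"}, {"reassoc", "nnan"}])
```

With general floating-point inputs the tree order and the linear order can round differently, which is the reason the rewrite is only valid when reassociation is permitted.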
[VPlan] Add VPReplicateRecipe::operandsWithoutMask() (NFC). (#202115)
Add a helper to access a VPReplicateRecipe's operands while excluding
the mask of a predicated recipe, and use it in createReplicateRegion.
Split off from https://github.com/llvm/llvm-project/pull/201676.
[DAG] Narrow vselect mask to vXi1 in foldToMaskedStore (#201609)
foldToMaskedStore (added in
https://github.com/llvm/llvm-project/commit/1c0ac80d4a9ef6c21914f2317003979952c2a2c3)
rewrites
store(vselect(cond, x, load(ptr)), ptr) -> masked_store(x, ptr, cond)
passing the vselect condition straight through as the store mask. A masked
store follows the IR convention of a vXi1 mask, but the condition can be a
wider boolean vector. On AVX512F targets without VLX, a maxnum/minnum
store-back lowers the NaN test with a legacy packed (CMPP) comparison whose
result is a vXi32/vXi64 vector, so the masked store is created with a wide
mask and LowerMSTORE asserts:
Assertion `Mask.getSimpleValueType().getScalarType() == MVT::i1 &&
"Unexpected mask type"' failed.
[13 lines not shown]
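The equivalence behind the rewrite can be checked with a small model in plain Python (not SelectionDAG code). The functions below are hypothetical stand-ins for the DAG nodes, and `narrow_mask_to_i1` assumes a wide boolean vector whose lanes are either all-ones (-1) or zero; how the actual patch narrows the mask is not shown in this log.

```python
def vselect(cond, a, b):
    """Per-lane select: take a[i] where cond[i] is true, else b[i]."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def store_select_of_load(memory, ptr, cond, x):
    """Model of store(vselect(cond, x, load(ptr)), ptr)."""
    loaded = memory[ptr:ptr + len(x)]
    memory[ptr:ptr + len(x)] = vselect(cond, x, loaded)


def masked_store(memory, ptr, mask, x):
    """Model of masked_store(x, ptr, mask): write only the enabled lanes."""
    for i, (enabled, value) in enumerate(zip(mask, x)):
        if enabled:
            memory[ptr + i] = value


def narrow_mask_to_i1(wide_mask):
    """Assumption: each wide lane is -1 (true) or 0 (false)."""
    return [lane != 0 for lane in wide_mask]


initial = [10, 11, 12, 13, 14, 15]
new_values = [1, 2, 3, 4]
wide_cond = [-1, 0, 0, -1]  # e.g. the vXi32 result of a packed compare

mem_a = list(initial)
store_select_of_load(mem_a, 1, narrow_mask_to_i1(wide_cond), new_values)

mem_b = list(initial)
masked_store(mem_b, 1, narrow_mask_to_i1(wide_cond), new_values)
```

Both forms leave the unselected lanes untouched, so the two memories end up identical once the mask has one boolean per lane.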
[libc++][array] Test `[[nodiscard]]` with `array::const_iterator` (#202070)
Added tests with `array::const_iterator` for completeness.
Implemented in https://github.com/llvm/llvm-project/pull/198492
Towards #172124
[SimplifyCFG] Shrink integer lookup tables (#202071)
After #200664, we generate lookup tables in more cases, leading to
higher memory use and larger binaries. Partially alleviate this by
shrinking the lookup tables if all elements are small integers. The
underlying idea is that an extra integer extension can typically be
folded into a load instruction at no extra cost.
This reduces the size of stage2-clang by 0.13%.
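As a rough illustration of the idea, the following Python sketch picks the narrowest integer width that holds every table entry and compares the table sizes. It is not the SimplifyCFG implementation: the helper names are hypothetical, and the use of signed ranges (that is, sign extension on load) is an assumption made for this example.

```python
def smallest_int_width(values, widths=(8, 16, 32, 64)):
    """Return the narrowest width whose signed range holds every value."""
    for width in widths:
        lo = -(1 << (width - 1))
        hi = (1 << (width - 1)) - 1
        if all(lo <= v <= hi for v in values):
            return width
    raise ValueError("values do not fit in any candidate width")


def table_bytes(values, width):
    """Size in bytes of a table with len(values) elements of the given width."""
    return len(values) * width // 8


small_table = [0, 7, 42, 100, -3, 127]
wide_table = [0, 300, 70000]

small_width = smallest_int_width(small_table)
wide_width = smallest_int_width(wide_table)

original_size = table_bytes(small_table, 32)
shrunk_size = table_bytes(small_table, small_width)
```

Here a six-entry i32 table shrinks from 24 bytes to 6 bytes, while the second table has to stay at 32 bits because 70000 does not fit in 16 bits.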
[ASan] Improve qemu-alpha shadow mapping (#201861)
With a 1T fixed shadow offset the usable app memory is split between
LowMem (0-1T) and HighMem (1.5T-4T). This works on real Alpha hardware
where all addresses stay within TASK_SIZE (4T). However, under
qemu-alpha user mode mmap(NULL) returns addresses from the host x86-64
address space (~127T), outside both regions, causing AddrIsInMem() CHECK
failures in PoisonShadow.
Switch to a fixed shadow offset of 0x70000000000 (7 TiB). TASK_SIZE is
well below the shadow offset so HighMem is empty: kHighMemBeg =
MEM_TO_SHADOW(kHighMemEnd) + 1 > kHighMemEnd. All app memory fits in
LowMem [0, 7T), a simpler layout with no HighMem split. On qemu-alpha,
-R 0x80000000000 constrains guest mappings to [0, 8T), keeping them
within LowMem.
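The region boundaries quoted above can be reproduced with a few lines of arithmetic. This sketch assumes the usual ASan mapping `MEM_TO_SHADOW(a) = (a >> 3) + offset`, `kLowMemEnd = offset - 1`, and `kHighMemEnd = TASK_SIZE - 1`; the Python names are illustrative and not taken from the sanitizer sources.

```python
T = 1 << 40           # 1 TiB
SHADOW_SCALE = 3      # assumption: default ASan shadow scale
TASK_SIZE = 4 * T     # Alpha user address space limit, per the text above


def mem_to_shadow(addr, offset):
    return (addr >> SHADOW_SCALE) + offset


def layout(offset, high_mem_end):
    """Return (low_mem_end, high_mem_beg) for a given fixed shadow offset."""
    low_mem_end = offset - 1
    high_mem_beg = mem_to_shadow(high_mem_end, offset) + 1
    return low_mem_end, high_mem_beg


def addr_is_in_mem(addr, offset, high_mem_end):
    """True if addr falls in LowMem or in a non-empty HighMem."""
    low_mem_end, high_mem_beg = layout(offset, high_mem_end)
    in_low = addr <= low_mem_end
    in_high = high_mem_beg <= addr <= high_mem_end
    return in_low or in_high


high_mem_end = TASK_SIZE - 1

old_low_end, old_high_beg = layout(1 * T, high_mem_end)
new_low_end, new_high_beg = layout(0x70000000000, high_mem_end)

# An address of roughly 127T, like those qemu-alpha user mode hands out
# without -R, is outside both regions of the old layout.
host_like_addr = 127 * T
old_accepts_host_addr = addr_is_in_mem(host_like_addr, 1 * T, high_mem_end)
```

With the 1T offset HighMem starts at 1.5T; with the 7T offset the computed start (7.5T) lies above `kHighMemEnd`, so HighMem is empty and LowMem covers [0, 7T).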
[CIR][OpenCL] Attach kernel argument metadata to CIR functions
Emit the CIR OpenCL kernel argument metadata attribute for kernel functions. Preserve CIR language address-space kinds until lowering and include argument names only when `-cl-kernel-arg-info` is enabled.
[CIR][OpenCL] Add kernel argument metadata attribute
Add a CIR attribute that carries OpenCL kernel argument metadata in source argument order. Verify that each metadata field has the expected element type and that all present arrays describe the same number of arguments.
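The two checks described here (element type per field, and a consistent argument count across the arrays that are present) could look roughly like the Python sketch below. It is not CIR code: the field names are hypothetical, loosely modeled on the LLVM `kernel_arg_*` metadata kinds, and the dictionary representation is an assumption made for illustration.

```python
# Hypothetical field names and their expected element types.
EXPECTED_ELEMENT_TYPES = {
    "addr_space": int,
    "access_qual": str,
    "type": str,
    "base_type": str,
    "type_qual": str,
    "name": str,  # optional: only present when argument names are emitted
}


def verify_kernel_arg_metadata(fields):
    """Return a list of error strings; an empty list means the metadata is valid.

    `fields` maps a field name to a list of per-argument values, or to None
    when that array is absent.
    """
    errors = []
    lengths = {}
    for field_name, values in fields.items():
        expected = EXPECTED_ELEMENT_TYPES.get(field_name)
        if expected is None:
            errors.append(f"unknown field '{field_name}'")
            continue
        if values is None:
            continue  # absent arrays are allowed
        if not all(isinstance(v, expected) for v in values):
            errors.append(f"field '{field_name}' has a wrongly typed element")
        lengths[field_name] = len(values)
    if len(set(lengths.values())) > 1:
        errors.append(f"arrays describe different argument counts: {lengths}")
    return errors


good = {
    "addr_space": [1, 1, 0],
    "type": ["float*", "float*", "int"],
    "name": None,
}
bad = {
    "addr_space": [1, "global"],
    "type": ["float*", "float*", "int"],
}

good_errors = verify_kernel_arg_metadata(good)
bad_errors = verify_kernel_arg_metadata(bad)
```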
[clang-doc] Move Generator classes into the anonymous namespace (#202058)
Clang-Tidy suggests moving these classes into the anonymous namespace,
to enforce internal linkage.
[clang-doc] Clean up implementation with better casting (#202060)
Having access to RTTI style casting lets us use slightly nicer
structures to clean up the overly complicated dispatch logic in merging
and other places.
Updating test clang/test/Driver/driverkit-path.c for usage with CLANG_RESOURCE_DIR (#197154)
When the CMake option CLANG_RESOURCE_DIR is specified, it changes
the path to various tools and thus breaks some tests that look for things
in the "standard" location. This change updates one of the tests to take
into account the CLANG_RESOURCE_DIR value, if specified, by querying the
compiler with `-print-resource-dir` to find the expected directory more
accurately.
[flang][CUDA] Allocate converted kernel descriptors in device-accessible storage (#201950)
Fix CUDA descriptor lowering when an `fir.embox` result reaches a
`gpu.launch_func` through an intermediate `fir.convert`.
CodeGen previously failed to recognize this use chain and could place
the descriptor in host stack storage. Since CUDA kernels may dereference
assumed-shape descriptors on the device, such descriptors must be
allocated through the CUDA descriptor allocation path. Teach the
GPU-launch-use check to look through `fir.convert` so these descriptors
are lowered with `_FortranACUFAllocDescriptor`.
Also adds a regression test for the `fir.embox -> fir.convert ->
gpu.launch_func` case.
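The look-through can be pictured with a toy use-chain walk in Python. The `Op` class and `reaches_gpu_launch` function are hypothetical and are not the MLIR or flang API; they only model "follow the users of a value, and step through `fir.convert` when looking for a `gpu.launch_func` user".

```python
class Op:
    """Toy operation: a name plus the operations that use its result."""

    def __init__(self, name):
        self.name = name
        self.users = []

    def add_user(self, user):
        self.users.append(user)
        return user


def reaches_gpu_launch(op, look_through=("fir.convert",)):
    """True if op's result is used by gpu.launch_func, directly or through
    a chain of operations listed in `look_through`."""
    for user in op.users:
        if user.name == "gpu.launch_func":
            return True
        if user.name in look_through and reaches_gpu_launch(user, look_through):
            return True
    return False


# fir.embox -> fir.convert -> gpu.launch_func
embox = Op("fir.embox")
convert = embox.add_user(Op("fir.convert"))
convert.add_user(Op("gpu.launch_func"))

# fir.embox used only by a host-side call
host_embox = Op("fir.embox")
host_embox.add_user(Op("fir.call"))

found_with_look_through = reaches_gpu_launch(embox)
found_without_look_through = reaches_gpu_launch(embox, look_through=())
host_only = reaches_gpu_launch(host_embox)
```

Passing an empty `look_through` models the earlier behavior, where the intermediate `fir.convert` hid the kernel launch from the check.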
CI: move libclang python bindings tests to main CI
This removes the separate python bindings CI, which runs on the GitHub free
runners and takes more than one hour to build libclang.
The tests are executed instead in the monolithic pipelines,
whenever clang would be tested.
This is fine in terms of resources because all the dependencies are
built anyway, and the tests themselves take less than one second to
run on the free runners.
[clang] Reland: fix getTemplateInstantiationArgs (#202088)
Relands https://github.com/llvm/llvm-project/pull/199528
Previous: #201373
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the
template context of out-of-line definitions.
This greatly simplifies the signature of that function by removing a
bunch of workarounds and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
Also makes the explicit specialization AST nodes stop abusing the
[2 lines not shown]