[libc++] Use compiler explorer for Clang as well and update to LLVM 23 as head (#185168)
Using the Compiler Explorer infrastructure simplifies the Dockerfile a
bit, since we now have a single source for compilers instead of two
independent ones. Compiler Explorer is also usually significantly faster
at providing new versions than apt.llvm.org.
[mlir][Linalg] Prevent vectorization of generic Conv with dynamic dims (#185415)
-- We should use `isaConvolutionOpInterface` instead as it accommodates
both named as well as generic convolution ops.
-- https://github.com/llvm/llvm-project/pull/176339 missed making one
such update to `vectorizeDynamicLinalgOpPrecondition` and it got exposed
in a downstream project.
-- This commit therefore aims to fix the same.
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
[mlir][vector] Flatten transfer - support multi-dim scalar element (#185417)
Adds support for flattening multi-dimensional scalar vector transfers.
The addition prevents pattern crashes on such inputs and allows for
cleaner lowering of scalar vectors.
[DA] Test AddRecs are nsw before strong SIV test (#183421)
Currently the Strong SIV test does not check that the AddRecs involved do
not overflow. This is required for correctness of the tests. Strictly
speaking, the range-based independence check in Strong SIV relies on
SCEV, which internally takes care of potential overflows, so this is
mainly needed for the divisibility test and the distance/direction
calculations, but putting the check early in the function covers all the
cases anyway.
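To illustrate why the check belongs at the top, here is a minimal sketch of a strong SIV test over plain integers. It is not the DependenceAnalysis code: `AddRec`, `SIVResult`, `strongSIV`, and `MaxBackedgeTaken` are hypothetical names, the `NoSignedWrap` field only stands in for the nsw flag, and the constants are assumed small enough that the `int64_t` arithmetic itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of an affine recurrence {Start,+,Step}; not the SCEV API.
struct AddRec {
  int64_t Start;
  int64_t Step;
  bool NoSignedWrap; // stands in for the nsw flag on the recurrence
};

enum class SIVResult { Independent, Dependent, Unknown };

// Strong SIV: both subscripts advance by the same step each iteration.
// On Dependent, Distance holds the dependence distance in iterations.
SIVResult strongSIV(const AddRec &Src, const AddRec &Dst,
                    int64_t MaxBackedgeTaken,
                    std::optional<int64_t> &Distance) {
  assert(Src.Step == Dst.Step && Src.Step != 0 &&
         "strong SIV needs equal, nonzero steps");
  Distance.reset();

  // Checked first: without nsw, neither the divisibility test nor the
  // distance computed below can be trusted, so give up conservatively.
  if (!Src.NoSignedWrap || !Dst.NoSignedWrap)
    return SIVResult::Unknown;

  int64_t Delta = Src.Start - Dst.Start;

  // Divisibility test: a dependence needs Step to divide Delta exactly.
  if (Delta % Src.Step != 0)
    return SIVResult::Independent;

  int64_t D = Delta / Src.Step;
  int64_t AbsD = D < 0 ? -D : D;

  // Range test: the distance cannot exceed the iteration space.
  if (AbsD > MaxBackedgeTaken)
    return SIVResult::Independent;

  Distance = D;
  return SIVResult::Dependent;
}
```

With this shape, a missing nsw flag yields a conservative `Unknown` before any distance or direction is derived.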
[flang] Inline max/minval according to -ffp-maxmin-behavior. (#185148)
This patch takes into account the option setting when inlining
max/minval intrinsics. It is not an NFC change for Flang, because:
* Inlining for integer types now uses arith.max/minsi operations.
* We do not mark the reduction loops as `unordered`
under `reassoc` FMF. I think this was not quite correct.
Otherwise, the default Legacy setting should produce the same
MLIR as before.
[MLIR][LLVM] Fix crash in LLVMFunctionType::clone when erasing void function results (#185093)
LLVMFunctionType::clone(inputs, results) was asserting that
results.size() == 1, which caused a crash (later changed to return
null/failure) when erasing results from a void llvm.func via
FunctionOpInterface::eraseResults.
For LLVM function types, an empty results range maps to void return: the
FunctionOpInterface represents void llvm.func with 0 results, while the
underlying LLVMFunctionType stores an explicit LLVMVoidType. When
erasing all results (or no-op erasing 0 results from a void function),
the interface passes an empty TypeRange to clone(), which should produce
a void function type.
Fix by accepting an empty results range in LLVMFunctionType::clone() and
mapping it to LLVMVoidType. More than one result remains invalid.
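The mapping can be sketched outside of MLIR as follows. This is only an illustration: `FuncType` and `cloneWith` are hypothetical stand-ins, types are modeled as plain strings, and `"void"` plays the role of LLVMVoidType.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a function type; types are modeled by name only.
struct FuncType {
  std::vector<std::string> Params;
  std::string Return; // "void" models LLVMVoidType
};

// Mirrors the described behavior: zero results map to a void return type,
// one result is used as-is, and more than one result is rejected.
std::optional<FuncType> cloneWith(const std::vector<std::string> &Inputs,
                                  const std::vector<std::string> &Results) {
  if (Results.size() > 1)
    return std::nullopt; // still invalid for an LLVM-style function type
  std::string Ret = Results.empty() ? std::string("void") : Results.front();
  return FuncType{Inputs, Ret};
}
```

Erasing every result of a function, or erasing nothing from a void function, then both go through the empty-results path and yield a void type instead of failing.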
Fixes #128322
Assisted-by: Claude Code
[mlir] Fix crash in ForwardDominanceIterator when encountering graph regions (#185043)
ForwardDominanceIterator<NoGraphRegions=true> was asserting when it
encountered a region without SSA dominance (a "graph region"), such as
scf.forall.in_parallel's body. This crash was triggered by
-test-ir-visitors when walking functions that contain graph-region ops.
Change the behavior of ForwardDominanceIterator<true> and
ReverseDominanceIterator<true> to silently skip graph regions instead of
asserting, and update the documentation accordingly. This matches the
intended semantics of the NoGraphRegions flag: the traversal simply does
not enumerate blocks/ops inside such regions.
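The skipping behavior can be sketched with a toy IR. `Op`, `Region`, `HasSSADominance`, and `walk` are hypothetical names for illustration, not the MLIR classes or the iterator implementation.

```cpp
#include <string>
#include <vector>

struct Op;

// Toy region: HasSSADominance == false models a graph region.
struct Region {
  bool HasSSADominance;
  std::vector<Op> Ops;
};

// Toy operation that may own nested regions.
struct Op {
  std::string Name;
  std::vector<Region> Regions;
};

// Pre-order walk that visits the op owning a graph region but does not
// enumerate anything inside that region, rather than asserting on it.
void walk(const Op &Root, std::vector<std::string> &Visited) {
  Visited.push_back(Root.Name);
  for (const Region &R : Root.Regions) {
    if (!R.HasSSADominance)
      continue; // graph region: silently skipped
    for (const Op &Nested : R.Ops)
      walk(Nested, Visited);
  }
}
```

An op such as the toy equivalent of scf.forall.in_parallel is still reported to the visitor; only the contents of its graph region are left out.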
Fixes #116370
Assisted-by: Claude Code
[Hexagon] Add new register input/output types for qf instructions (#184398)
The v81 iset has been updated with input and output register
types/extensions for instructions. Currently, it supports qf32/qf16
register types. This patch implements a qf reg type lookup to query
these types. In the future, the register type extractor can be improved
and more APIs can be added to support other register types.
Co-authored-by: <santdas at qti.qualcomm.com>
Add AMO load with Compare and Swap Not Equal
This commit adds support for lwat/ldat atomic operations with function
code 16 (Compare and Swap Not Equal) via 4 clang builtins:
__builtin_amo_lwat_csne for 32-bit unsigned operations
__builtin_amo_ldat_csne for 64-bit unsigned operations
__builtin_amo_lwat_csne_s for 32-bit signed operations
__builtin_amo_ldat_csne_s for 64-bit signed operations
[TableGen] Fix ordering of register classes with artificial members.
The current implementation wouldn't advance IB to skip artificial
registers once IA has reached the end.
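The class of bug can be shown with a simplified two-iterator comparison. This is a sketch under assumptions, not the TableGen code: `Reg` and `lessIgnoringArtificial` are hypothetical, and the real ordering predicate may compare different properties.

```cpp
#include <vector>

// Hypothetical register record; Artificial entries must not affect ordering.
struct Reg {
  unsigned Id;
  bool Artificial;
};

// Lexicographic "less than" over the non-artificial members of A and B.
bool lessIgnoringArtificial(const std::vector<Reg> &A,
                            const std::vector<Reg> &B) {
  auto IA = A.begin(), EA = A.end();
  auto IB = B.begin(), EB = B.end();
  auto Skip = [](auto &I, auto E) {
    while (I != E && I->Artificial)
      ++I;
  };
  for (;;) {
    Skip(IA, EA);
    Skip(IB, EB); // must run even when IA has already reached the end
    if (IA == EA || IB == EB)
      break;
    if (IA->Id != IB->Id)
      return IA->Id < IB->Id;
    ++IA;
    ++IB;
  }
  // A sorts first only if it ran out while B still has real members left.
  return IA == EA && IB != EB;
}
```

If `IB` were not advanced past trailing artificial registers once `IA` is exhausted, two lists with the same real members would compare as different.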
[MVEGatherScatter] Fix GEP scale calculations (#185437)
The GEP scale for a single index GEP is the type alloc size of the
source element type. The pass was mostly computing it correctly, but two
places were doing something different.
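As a reminder of what that scale is, the sketch below computes an alloc size as the store size rounded up to the ABI alignment and uses it to scale a single GEP index. `TypeLayout`, `allocSize`, and `gepByteOffset` are hypothetical helpers with made-up layout numbers, not the DataLayout API or the pass's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout facts for a source element type, in bytes.
struct TypeLayout {
  uint64_t StoreSize;
  uint64_t ABIAlign; // must be a power of two
};

// Alloc size: store size rounded up to the ABI alignment, i.e. the stride
// between consecutive elements of this type in memory.
uint64_t allocSize(TypeLayout T) {
  assert(T.ABIAlign != 0 && (T.ABIAlign & (T.ABIAlign - 1)) == 0 &&
         "alignment must be a power of two");
  return (T.StoreSize + T.ABIAlign - 1) & ~(T.ABIAlign - 1);
}

// Byte offset contributed by a single-index GEP over SrcElt.
int64_t gepByteOffset(TypeLayout SrcElt, int64_t Index) {
  return Index * static_cast<int64_t>(allocSize(SrcElt));
}
```

For a type whose store size is smaller than its alignment, scaling by the store size or the bit width instead would give the wrong offset.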
[SandboxVec][DAG] Fix unscheduled succs when nodes are scheduled (#184946)
When we update use-def edges the DAG gets notified to update the
UnscheduledSuccs counters. However, if either edge node is already
scheduled we should not update UnscheduledSuccs because the
UnscheduledSuccs counter value should be treated as "undefined" after a
node has been scheduled, i.e., its value has a meaning only before the
node gets scheduled.
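The rule can be sketched with a toy node type. `Node`, `onEdgeAdded`, and `onEdgeRemoved` are hypothetical names, and this is not the Sandbox Vectorizer's DAG interface.

```cpp
#include <cassert>

// Toy DAG node: UnscheduledSuccs is only meaningful while !Scheduled.
struct Node {
  bool Scheduled = false;
  unsigned UnscheduledSuccs = 0;
};

// Notification for a new use-def edge Pred -> Succ.
void onEdgeAdded(Node &Pred, const Node &Succ) {
  if (Pred.Scheduled || Succ.Scheduled)
    return; // counter is "undefined" once either end has been scheduled
  ++Pred.UnscheduledSuccs;
}

// Notification for a removed use-def edge Pred -> Succ.
void onEdgeRemoved(Node &Pred, const Node &Succ) {
  if (Pred.Scheduled || Succ.Scheduled)
    return; // same rule: leave the counter untouched
  assert(Pred.UnscheduledSuccs > 0 && "counter underflow");
  --Pred.UnscheduledSuccs;
}
```

Skipping the update in both directions keeps the counter from being adjusted for edges it never accounted for.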
[mlir][xegpu] Add support for setting `order` in `SetDescLayoutOp` and `SetOpLayoutAttrOp` transform ops. (#184705)
Currently XeGPU transform dialect does not allow the user to set the
`order` attribute of a layout in `SetDescLayoutOp` and
`SetOpLayoutAttrOp`. This PR adds `order` as an optional argument to
these transform ops.
[MLIR][OpenMP] Prevent teams reductions from deadlocking (#184625)
Currently, simple Fortran reductions like the example below cause a
deadlock at runtime:
```f90
integer :: i, x
!$omp teams distribute reduction(+:x)
do i=1, 10
x = x + 1
end do
```
Preventing a redundant barrier from being added in that case addresses
this issue. Synchronization is already being handled by the
`__kmpc_reduce` and `__kmpc_end_reduce` runtime calls for the host, and
by the OMPIRBuilder-generated `_omp_reduction_inter_warp_copy_func`
function for GPUs.
[lldb][test] PlatformDarwinTest.cpp: move directory creation into SetUp
So it can be shared by future test-cases.
Drive-by change:
* Make the `TestParseVersionBuildDir` not depend on the test-fixture,
since it doesn't require any directory/debugger setup
[CIR][AArch64] Add support for the remaining `vceqz` builtins
Implement the remaining CIR lowerings for the AdvSIMD (Neon)
`vceqz` intrinsic group (bitwise equal to zero).
Most `vceqz` variants were already supported; this patch
completes the rest of the group [1] that was left as a TODO.
Tests for these intrinsics are moved from:
* test/CodeGen/AArch64/neon_intrinsics.c
* test/CodeGen/AArch64/v8.2a-fp16-intrinsics.c
to:
* test/CodeGen/AArch64/neon/intrinsics.c
* test/CodeGen/AArch64/neon/fullfp16,
respectively.
The implementation largely mirrors the existing lowering in
[4 lines not shown]
[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#185330)
This fixes a bug in USM mode where the `close` map type modifier was
attached to some `map.info.op`'s corresponding to user-defined type
members while the parent type instance itself is not marked as `close`.
This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.
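The consistency rule amounts to a small flag fix-up, sketched below. `MapInfo`, `kCloseFlag`, and `syncCloseWithParent` are hypothetical, the bit value is illustrative only, and the sketch handles a single level of members rather than the actual Flang lowering.

```cpp
#include <cstdint>
#include <vector>

// Illustrative bit for the `close` modifier; the value is an assumption.
constexpr uint64_t kCloseFlag = 0x400;

// Toy map entry: a parent record with the entries mapped for its members.
struct MapInfo {
  uint64_t Flags;
  std::vector<MapInfo> Members;
};

// If the parent is not marked `close`, clear the bit on every member so
// parent and members agree; all other flag bits are left untouched.
void syncCloseWithParent(MapInfo &Parent) {
  if (Parent.Flags & kCloseFlag)
    return; // parent is `close`: members keep whatever they have
  for (MapInfo &M : Parent.Members)
    M.Flags &= ~kCloseFlag;
}
```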
Gemini was used to create tests. The AI-generated test code, which was
derived from a reproducer I was working with to debug the issue, was
reviewed line-by-line by me.
Assisted-by: Gemini <gemini at google.com>