[mlir][bufferization] Drop TensorLikeType::getBufferType() (#201350)
Replace TensorLikeType::getBufferType() with the
options.unknownTypeConverterFn() hook. Make the hook work with
tensor-like and buffer-like types (instead of builtins) to maintain the
same behaviour at the API boundary level and still allow user types to
be properly supported.
Historically, an attempt to support user types within the one-shot
bufferization framework was made. As part of it,
TensorLikeType::getBufferType() was introduced to allow user-provided
types to customize bufferization. However, the whole affair proved to be
overly complex: there is an interface with customization points for
user-provided tensors, and an options-based (but insufficient)
implementation for builtin tensors. On top of this, there was always a
function-specific hook to customize function-level behaviour further. As
a result, users would need to implement two different mechanisms
on their end: an interface implementation plus option hooks.
[19 lines not shown]
[clang][bytecode] Fix shifting by negative IntAP values (#202505)
The negation of a negative value didn't necessarily result in a positive
value. Fix that by giving it one more bit of precision.
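To illustrate why one extra bit helps, here is a minimal sketch that models fixed-width two's-complement negation with plain integers. It is not the interpreter's code: `signExtend`, `negateAtWidth` and `negationDemo` are hypothetical helpers, and widths are assumed to be between 1 and 63 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret the low `bits` bits of `raw` as a signed
// two's-complement value. Assumes 1 <= bits <= 63.
int64_t signExtend(uint64_t raw, unsigned bits) {
  const uint64_t mask = (1ULL << bits) - 1;
  const uint64_t sign = 1ULL << (bits - 1);
  raw &= mask;
  return static_cast<int64_t>(raw ^ sign) - static_cast<int64_t>(sign);
}

// Hypothetical helper: negate `v` as if it were stored in a `bits`-wide
// signed integer, wrapping on overflow.
int64_t negateAtWidth(int64_t v, unsigned bits) {
  const uint64_t mask = (1ULL << bits) - 1;
  const uint64_t raw = static_cast<uint64_t>(v) & mask;
  return signExtend((0 - raw) & mask, bits);
}

inline void negationDemo() {
  // At 8 bits, the minimum value negates to itself and stays negative.
  assert(negateAtWidth(-128, 8) == -128);
  // With one more bit of precision, the negation is positive.
  assert(negateAtWidth(-128, 9) == 128);
}
```

Only the minimum value of a given width is affected; every other negative value already negates to a positive one at the original width.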
[OFFLOAD][L0] Add wait events for AsyncQueue memFill (#202287)
Fix an issue where memFill operations were not chained properly with respect to prior operations.
[Loads] Migrate isDereferenceable APIs to SimplifyQuery (#202553)
These take the usual set of analysis parameters, so we can encapsulate
them using SimplifyQuery.
[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>` (#201563)
In HIP, `constexpr` functions are treated as both `__host__` and
`__device__`.
A new version of the MS STL shipped with the build tools version
14.51.36231 has `constexpr` definitions for some `cmath` functions when
the compiler in use is Clang (this gets worse when C++23 is in use).
These definitions conflict with the `__device__` declarations we provide
in the header wrappers.
There is a workaround for this: We do not mark `constexpr` functions
[_that are defined in a system header_](https://github.com/llvm/llvm-project/blob/03127a03860b9d8cb440fe8f51c00647f45eb8be/clang/lib/Sema/SemaCUDA.cpp#L877)
as `__host__` and `__device__` if there is a previous `__device__`
declaration.
[14 lines not shown]
[libc++][vector] Test `[[nodiscard]]` applied to `vector::iterator` (#202262)
Adds test coverage.
`[[nodiscard]]` applied in:
- #198489
- #198492
Towards #172124
Co-authored-by: Hristo Hristov <zingam at outlook.com>
[clang][bytecode] Remove `InterpFrame::ThisPointerOffset` (#202322)
Replace it with a `uint8_t` representing some bool flags about the
function. This reduces the size of a frame from 88 to 80 bytes.
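As a rough illustration of the general technique (packing several booleans into one `uint8_t`), consider the sketch below. The flag names and the `FrameFlags` type are hypothetical and are not taken from the patch; the actual flags stored in `InterpFrame` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names, chosen only for illustration.
enum FrameFlag : uint8_t {
  HasThisPointer = 1u << 0,
  HasRVOPointer = 1u << 1,
};

// Hypothetical wrapper: up to eight boolean properties in a single byte.
struct FrameFlags {
  uint8_t bits = 0;

  void set(FrameFlag f) { bits = static_cast<uint8_t>(bits | f); }
  bool has(FrameFlag f) const { return (bits & f) != 0; }
};

inline void frameFlagsDemo() {
  FrameFlags flags;
  flags.set(HasThisPointer);
  assert(flags.has(HasThisPointer));
  assert(!flags.has(HasRVOPointer));
}
```

Whether a one-byte field shrinks the enclosing struct depends on the surrounding members and their alignment, so the 88-to-80-byte figure is specific to the real frame layout.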
[SCEVExpander] Don't expand a UDiv with a possibly-poison divisor (#202378)
SCEVExpander::isSafeToExpand only checks that the divisor isKnownNonZero,
which ignores the possibility of poison. For the following divisor:
```
%ct = call i32 @llvm.cttz.i32(i32 %x, i1 true)
%divisor = add i32 %ct, 1
...
%rem = urem i32 1, %divisor
```
The urem may be hoisted unsafely.
Fix by also checking that the divisor isGuaranteedNotToBePoison.
Fixes https://github.com/llvm/llvm-project/issues/202028
[mlir][Interfaces] Document completeness requirement of `RegionBranchOpInterface` (#202018)
Document that interface implementations must report all possible control
flow edges. Failure to report a possible edge may break
analyses/transformations/APIs such as
`RegionBranchOpInterface::isRepetitiveRegion`.
[GlobalISel][AMDGPU] Emit proper diagnostic when inline asm register allocation fails (#201380)
Replace the silent fallback return with a DiagnosticInfoInlineAsm error
and undef result values, so the failure is reported to the user instead
of relying on -global-isel-abort.
Discussed in https://github.com/llvm/llvm-project/pull/200771
[ConstantFolding] Fix dropped bits in non-integer-ratio bitcast with undef lane (#202282)
When constant-folding a vector bitcast (e.g. <4 x i24> -> <3 x i32>), an
undef source element inserted a DstBitSize-wide zero placeholder into
the bit buffer. This could clobber a defined source element, producing a
wrong result on big-endian targets.
Fix by inserting a SrcBitSize-wide zero instead.
Alive2 proof:
before (unsound): https://alive2.llvm.org/ce/z/R_ZQ75
after (verified): https://alive2.llvm.org/ce/z/VuV3mz
[mlir][spirv] Add Arm.ExperimentalMLOperations.1 extended inst set (#202283)
This instruction set provides a mechanism to encode experimental ML
operations in SPIR-V modules. Such instructions are encoded via the
single CALL operator in the instruction set by specifying an op_code and
customized input values.
Reference:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extended/Arm.ExperimentalMLOperations.asciidoc
Signed-off-by: Niklas Lithammer <niklas.lithammer at arm.com>
Signed-off-by: Davide Grohmann <davide.grohmann at arm.com>
[CI][Offload] Fix offload depends on openmp (#202541)
It appears that Offload depends on OpenMP. Thus, enable OpenMP as a
runtime to test when offload has changes.
[mlir][SPIR-V] Add SPIRVToLLVM direct conversions for cast, CL, GL and logical ops (#202506)
Lower the OpenCL extended instruction set math ops, GL math ops (Trunc,
Asin, Acos, Atan), logical Ordered/Unordered, and the pointer cast ops
to their LLVM dialect equivalents.
[AMDGPU] Use alloc size for array stride in LowerBufferFatPointers (#202530)
Array elements are laid out at multiples of getTypeAllocSize, not
getTypeStoreSize.
The LLVM memory model lays out array element `i` at `i * allocSize`
(reflected in `DataLayout::getTypeAllocSize`). Apply this to fat pointers
to prevent a miscompile.
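The difference between the two sizes can be sketched with plain arithmetic. This is a simplified model, not the pass's code: `TypeSizes`, `allocSize`, `elementOffset` and `strideDemo` are hypothetical names, and the alloc size is modeled as the store size rounded up to the ABI alignment. The 3-byte store size with 4-byte alignment is an assumed example (roughly what an `i24` looks like under a typical data layout).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the sizes a data layout reports for one type.
struct TypeSizes {
  uint64_t storeSize; // bytes actually written by a store
  uint64_t abiAlign;  // ABI alignment in bytes, assumed to be non-zero
};

// Alloc size modeled as the store size rounded up to the ABI alignment.
uint64_t allocSize(TypeSizes t) {
  return (t.storeSize + t.abiAlign - 1) / t.abiAlign * t.abiAlign;
}

// Byte offset of array element `i`: a multiple of the alloc size.
uint64_t elementOffset(TypeSizes t, uint64_t i) { return i * allocSize(t); }

inline void strideDemo() {
  const TypeSizes i24Like{3, 4}; // assumed example values
  // Using the store size as the stride would place element 2 at byte 6,
  // but the array layout places it at byte 8.
  assert(2 * i24Like.storeSize == 6);
  assert(elementOffset(i24Like, 2) == 8);
}
```

For types whose store size is already a multiple of their alignment, the two strides coincide, so the discrepancy only shows up for types like the one assumed here.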
[MLIR][NVVM] Add support for narrow-fp to bf16x2 conversions (#200157)
This change adds the following NVVM Ops to support narrow-fp to bf16x2
conversions:
- `nvvm.convert.f6x2.to.bf16x2`
- `nvvm.convert.f4x2.to.bf16x2`
- `nvvm.convert.f8x2.to.bf16x2` (updated to allow `E4M3FN` and `E5M2`
types)
Also removes unnecessary verifiers for narrow-fp to `f16x2` conversions
to instead use `TypeAttrOf` to validate the source type in the ODS
definition.
[Compiler-rt][test] Fix circular link dependency between builtins and libc (#199482)
Currently, the link order is `libclang_rt.builtins.a -lc -lm`. Builtins
are scanned first, after which symbols like `abort` are unresolved
references that are resolved through libc.a. However, resolving the
references to these symbols in turn leads to undefined references to
`__aeabi_uldivmod` etc. that can only be resolved through builtins.
Reversing the order also won't fix the issue, because `libc.a` introduces
`__aeabi_uldivmod`, which is resolved by builtins, but builtins in turn
introduces `abort`, which can only be resolved by libc.a.
This patch fixes this by wrapping the archives in a linker group
(--start-group/--end-group), which instructs the linker to rescan all
archives in the group until no new symbols can be resolved.
This error is exposed only when bfd-like linkers are used.