[LangRef] Clarify specification for float min/max operations (#172012)
This implements several clarifications to the specification of the
floating-point min/max operations, based on the discussion in
https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006.
The key changes are:
* Explicitly specify that minnum and maxnum with an sNaN operand
non-deterministically either return NaN or treat the sNaN as a qNaN
(see the IR sketch after this list). This was implied by our general
NaN semantics, but is important to call out here due to the special
behavior of sNaN.
* Explicitly specify the same non-determinism for the minnum/maxnum
based vector reductions as well.
* Explicitly specify the meaning of nsz on float min/max ops. In
particular, clarify that unlike normal nsz semantics, it does not allow
introducing a zero with a different sign out of thin air.
* Simplify the semantics comparison section. This now focuses only on
NaN and signed zero behavior, but omits information about exceptions
that is not relevant for these non-constrained intrinsics.
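A minimal LLVM IR sketch of the clarified sNaN rule (the function name and
constants are illustrative, not taken from the patch):
```llvm
; The first operand is a signaling NaN. Under the clarified wording, the call
; may non-deterministically return either a NaN (the sNaN quieted) or %y,
; i.e. the result of treating the sNaN operand as a qNaN.
define float @snan_minnum(float %y) {
  %r = call float @llvm.minnum.f32(float 0x7FF4000000000000, float %y)
  ret float %r
}

declare float @llvm.minnum.f32(float, float)
```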
[mlir] Add side-effect check to moveOperationDependencies (#176361)
This patch adds a side-effect check to `moveOperationDependencies` to
match the behavior of `moveValueDefinitions`. Previously,
`moveOperationDependencies` would move operations with side-effecting
dependencies, which could change program semantics.
**Note** that the existing test changes are needed because unregistered
operations (e.g., `"moved_op"()`) are treated as side-effecting. The tests
were updated to use pure operations in the moved slice, while keeping
unregistered ops for operations that are not moved (e.g., `"before"()`,
`"foo"()`). This ensures that the tests continue to exercise their intended
functionality without being blocked by the new side-effect check.
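A minimal MLIR sketch of the updated test shape (op and function names are
illustrative, not copied from the actual tests): the moved slice contains only
pure, registered ops, while the unregistered ops stay outside of it.
```mlir
func.func @move_deps_example(%m: memref<i32>) {
  // Unregistered op that is not part of the moved slice; it is
  // conservatively treated as side-effecting.
  "before"() : () -> ()
  // Pure ops forming the slice that memref.store depends on. Moving these
  // dependencies above "before"() is still allowed because they have no
  // side effects; an unregistered op in this slice would now block the move.
  %c0 = arith.constant 0 : i32
  %val = arith.addi %c0, %c0 : i32
  memref.store %val, %m[] : memref<i32>
  return
}
```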
[mlir][tosa] Add constant folding for tosa.add_shape operation (#173112)
This commit introduces constant folding for the tosa.add_shape
operation. When both operands of the add_shape operation are constant
shapes, the operation is evaluated at compile-time.
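A hedged sketch of the fold (assembly syntax and attribute names are
approximated from the op names, not taken from the patch), assuming the folded
result is the element-wise sum of the two constant shapes:
```mlir
// Before folding: both operands of tosa.add_shape are constant shapes.
%0 = tosa.const_shape {values = dense<[2, 3]> : tensor<2xindex>} : () -> !tosa.shape<2>
%1 = tosa.const_shape {values = dense<[4, 5]> : tensor<2xindex>} : () -> !tosa.shape<2>
%2 = tosa.add_shape %0, %1 : (!tosa.shape<2>, !tosa.shape<2>) -> !tosa.shape<2>
// After folding: %2 becomes a single constant shape holding [6, 8].
```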
[mlir][linalg] Update createWriteOrMaskedWrite (#174810)
`createWriteOrMaskedWrite` is used extensively in the Linalg vectorizer.
When a write uses non-zero indices, the helper currently computes mask
sizes as if the write started at 0 (`size = dim(d)`), which can produce
incorrect `vector.create_mask` operands for the generated
`vector.transfer_write`. Instead, the mask size should be computed as
`size = dim(d) - write_index(d)`.
EXAMPLE
-------
Let's use this example to illustrate:
```mlir
%res = tensor.insert_slice
%src into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<?x3xi32>
```
This op is vectorized as a pair of `vector.transfer_read` +
`vector.transfer_write` ops. When calculating the mask for the
[20 lines not shown]
[ml_program] fix bufferizesToMemoryRead for ml_program.global_store (#177387)
This is a fix for the `BufferizableOpInterface` implementation for
`ml_program.global_store`.
`bufferizesToMemoryRead` currently returns false for
`GlobalStoreOpInterface`, but I believe it should return true, as
`ml_program.global_store` needs to read its input buffer to know what
value to store to the global.
This manifested in a bug where `one-shot-bufferize` would produce MLIR
that copies uninitialized data to the global variable instead of the
intended value.
For the following MLIR:
```mlir
module {
ml_program.global private mutable @"state_tensor"(dense<0.0> : tensor<4x75xf32>) : tensor<4x75xf32>
[61 lines not shown]
[AMDGPU][NFC] Refine the representation of MODE register values.
- Eliminate the field masks.
- Segregate the encoding logic.
- Simplify and clarify the user code.
This should help with updating downstream branches, where we have a
more advanced version of the same facility.
DAG: Remove softPromoteHalfType
Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.
Really fixes #97975
R600: Remove softPromoteHalfType
Also includes a somewhat hacky, minimal change to fix kernel arguments
lowered as f16, to avoid assertions once softPromoteHalfType is removed.
Half support was never really implemented for R600; there just happened
to be a few incidental tests that included a half argument (and those
were not even meaningful, since the function body folded away to nothing
due to the lack of callable function support).
AMDGPU: Move softPromoteHalfType override to R600 only
As expected, the code is much worse, but more correct.
We could do a better job with source modifier management around
fp16_to_fp/fp_to_fp16.
AMDGPU/GlobalISel: Regbanklegalize rules for G_UNMERGE_VALUES
Move G_UNMERGE_VALUES handling to AMDGPURegBankLegalizeRules.cpp.
Fix sgpr S16 unmerge by lowering it using a shift and S32.
Previously sgpr S16 unmerge was selected using _lo16 and _hi16 subreg
indexes which are exclusive to vgpr register classes.
For the remaining cases we do a trivial mapping, assigning the same
register bank (vgpr or sgpr) to all operands.
[mlir][bufferization] Cache SymbolTableCollection for CallOp types (#176909)
Use the BufferizationState symbol table cache when resolving CallOp
callee types in getBufferType(), avoiding repeated SymbolTableCollection
creation. Add a const accessor (backed by a mutable cache) so const
state can reuse the same tables. Completes a marked TODO.