LLVM/project 6bae2a9 clang/docs LanguageExtensions.rst, llvm/docs LangRef.rst

[LangRef] Clarify specification for float min/max operations (#172012)

This implements some clarifications for the specification of floating
point min/max operations based on the discussion in
https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006.

The key changes are:

* Explicitly specify minnum and maxnum with an sNaN operand as
non-deterministically either returning NaN or treating sNaN as qNaN.
This was implied by our general NaN semantics, but is important to call
out here due to the special behavior of sNaN.
* Explicitly specify the same non-determinism for the minnum/maxnum
based vector reductions as well.
* Explicitly specify the meaning of nsz on float min/max ops. In
particular, clarify that unlike normal nsz semantics, it does not allow
introducing a zero with a different sign out of thin air.
* Simplify the semantics comparison section. This now focuses only on
NaN and signed zero behavior, but omits information about exceptions
that is not relevant for these non-constrained intrinsics.
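As a rough illustration of the sNaN point above, the sketch below lists the results a minnum-style operation may produce. `FPOperand` and `minnumOutcomes` are hypothetical names invented for this example, not LLVM APIs, and signed-zero/nsz behavior is deliberately left out.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical model of an operand: a value plus a flag saying whether a NaN
// value is signaling. The flag is ignored for non-NaN values.
struct FPOperand {
  double value;
  bool isSignaling;
};

// Lists the results a minnum-style operation may produce. An sNaN operand
// either yields NaN or is treated like a qNaN; a qNaN operand yields the
// other operand. Signed zeros are not modeled here.
inline std::vector<double> minnumOutcomes(FPOperand a, FPOperand b) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const bool aIsNaN = std::isnan(a.value);
  const bool bIsNaN = std::isnan(b.value);
  if (aIsNaN && bIsNaN)
    return {nan};
  if (aIsNaN || bIsNaN) {
    const FPOperand &nanOperand = aIsNaN ? a : b;
    const double other = aIsNaN ? b.value : a.value;
    if (nanOperand.isSignaling)
      return {nan, other}; // either result is permitted
    return {other};
  }
  return {std::fmin(a.value, b.value)};
}
```

With an sNaN and `1.0` as operands the sketch reports two permitted results (NaN and `1.0`); with a qNaN it reports only `1.0`.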
Delta File
+139 -139 llvm/docs/LangRef.rst
+10 -10 clang/docs/LanguageExtensions.rst
+149 -149 2 files

LLVM/project 2027460 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/AArch64 sve-load-store-legalisation.ll

DAG: Use poison in more vector legalization contexts
Delta File
+2 -83 llvm/test/CodeGen/AArch64/sve-load-store-legalisation.ll
+31 -30 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+33 -113 2 files

LLVM/project 7a10fc8 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/RISCV combine-clmul.ll

[DAG] Add basic folds for CLMUL nodes (#176961)

Closes #176783

Adds support for folding `ISD::CLMUL`/`CLMULH`/`CLMULR` nodes.
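
The commit message does not list which folds were added. As a hedged reference point, the sketch below defines a plain carry-less multiply (`clmul32` is a name made up for this example) and checks two identities that a basic fold could plausibly rely on: multiplying by zero gives zero, and multiplying by one gives the other operand.

```cpp
#include <cstdint>

// Reference carry-less multiply (low 32 bits): XOR together one shifted copy
// of `a` for every set bit of `b`. This is an illustration, not LLVM code.
constexpr uint32_t clmul32(uint32_t a, uint32_t b) {
  uint32_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      result ^= a << i;
  return result;
}

// Identities that hold for any carry-less multiply.
static_assert(clmul32(0x1234u, 0u) == 0u, "clmul(x, 0) == 0");
static_assert(clmul32(0x1234u, 1u) == 0x1234u, "clmul(x, 1) == x");
// 0b11 clmul 0b11 == 0b101, because no carry propagates between bits.
static_assert(clmul32(3u, 3u) == 5u, "clmul(3, 3) == 5");
```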
Delta File
+64 -349 llvm/test/CodeGen/X86/pclmulqdq.ll
+85 -0 llvm/test/CodeGen/RISCV/combine-clmul.ll
+27 -0 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+176 -349 3 files

LLVM/project 5faa181 mlir/include/mlir/Transforms RegionUtils.h, mlir/lib/Transforms/Utils RegionUtils.cpp

[mlir] Add side-effect check to moveOperationDependencies (#176361)

This patch adds a side-effect check to `moveOperationDependencies` to
match the behavior of `moveValueDefinitions`. Previously,
`moveOperationDependencies` would move operations with side-effecting
dependencies, which could change program semantics.
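
A toy model of such a check, assuming nothing about the real implementation: `ToyOp` and `dependenciesAreMovable` are hypothetical names, and the sketch ignores dominance and only asks whether any operation in the backward slice has side effects.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an operation; this is not an MLIR type.
struct ToyOp {
  std::string name;
  bool hasSideEffects;
  std::vector<const ToyOp *> operands; // ops that define this op's operands
};

// Returns true only if every op that `op` transitively depends on is free of
// side effects, i.e. the whole dependency slice is safe to move.
inline bool dependenciesAreMovable(const ToyOp &op) {
  for (const ToyOp *dep : op.operands) {
    if (dep->hasSideEffects || !dependenciesAreMovable(*dep))
      return false;
  }
  return true;
}
```

In this model, an op whose slice reaches a side-effecting op is rejected even when the side-effecting op is several definitions away.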

**Note** that the existing test changes are needed because unregistered
operations (e.g., "moved_op"()) are treated as side-effecting. These
tests were updated to use pure operations for operations in the moved
slice, while keeping unregistered ops for operations that aren't moved
(e.g., "before"(), "foo"()). This ensures that tests continue to
exercise their intended functionality without being blocked by the new
side-effect check.
Delta File
+113 -52 mlir/test/Transforms/move-operation-deps.mlir
+38 -8 mlir/lib/Transforms/Utils/RegionUtils.cpp
+2 -0 mlir/include/mlir/Transforms/RegionUtils.h
+153 -60 3 files

LLVM/project f135632 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/AArch64 sve-load-store-legalisation.ll

DAG: Use poison in more vector legalization contexts
Delta File
+2 -83 llvm/test/CodeGen/AArch64/sve-load-store-legalisation.ll
+28 -27 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+30 -110 2 files

LLVM/project c4afdd9 mlir/lib/Dialect/Linalg/Transforms Detensorize.cpp, mlir/test/Dialect/Linalg detensorize_if.mlir detensorize_while_impure_cf.mlir

[mlir][linalg] Remove abandoned Detensorize pass
Delta File
+0 -569 mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
+0 -177 mlir/test/Dialect/Linalg/detensorize_if.mlir
+0 -104 mlir/test/Dialect/Linalg/detensorize_while_impure_cf.mlir
+0 -102 mlir/test/Dialect/Linalg/detensorize_0d.mlir
+0 -71 mlir/test/Dialect/Linalg/detensorize_while.mlir
+0 -58 mlir/test/Dialect/Linalg/detensorize_while_pure_cf.mlir
+0 -1,081 5 files not shown
+0 -1,229 11 files

LLVM/project 4205c25 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeVectorOps.cpp, llvm/test/CodeGen/AMDGPU vector-reduce-umax.ll vector-reduce-or.ll

DAG: Use poison for unused shuffle operands in legalizer
Delta File
+71 -74 llvm/test/CodeGen/AMDGPU/vector-reduce-umax.ll
+28 -28 llvm/test/CodeGen/AMDGPU/vector-reduce-or.ll
+16 -16 llvm/test/CodeGen/X86/x86-interleaved-access.ll
+4 -4 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+4 -3 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+2 -2 llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+125 -127 6 files

LLVM/project dc741f2 mlir/include/mlir/Dialect/Tosa/IR TosaShapeOps.td, mlir/lib/Dialect/Tosa/IR TosaCanonicalizations.cpp

[mlir][tosa] Add constant folding for tosa.add_shape operation (#173112)

This commit introduces constant folding for the tosa.add_shape
operation. When both operands of the add_shape operation are constant
shapes, the operation is evaluated at compile-time.
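
A minimal sketch of the idea, assuming the fold adds the two shapes element by element and bails out when either operand is not a constant. `Shape` and `foldAddShape` are hypothetical names for this example and do not mirror the MLIR folder API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Shape = std::vector<int64_t>;

// Folds an add of two shapes when both are known constants of equal rank.
// An empty optional stands for "not a constant" (assumption of this sketch).
inline std::optional<Shape> foldAddShape(const std::optional<Shape> &lhs,
                                         const std::optional<Shape> &rhs) {
  if (!lhs || !rhs || lhs->size() != rhs->size())
    return std::nullopt; // nothing to fold
  Shape result(lhs->size());
  for (std::size_t i = 0; i < lhs->size(); ++i)
    result[i] = (*lhs)[i] + (*rhs)[i];
  return result;
}
```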
Delta File
+49 -17 mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp
+33 -0 mlir/test/Dialect/Tosa/constant_folding.mlir
+2 -0 mlir/include/mlir/Dialect/Tosa/IR/TosaShapeOps.td
+84 -17 3 files

LLVM/project 4b7cf46 lldb/include/lldb/Host/windows PseudoConsole.h ProcessLauncherWindows.h, lldb/source/Host/windows PseudoConsole.cpp ProcessLauncherWindows.cpp

[lldb][windows] add STDIN and STDOUT forwarding support (#175812)

Delta File
+92 -32 lldb/source/Plugins/Process/Windows/Common/ProcessWindows.cpp
+66 -0 lldb/source/Host/windows/PseudoConsole.cpp
+16 -8 lldb/source/Host/windows/ProcessLauncherWindows.cpp
+10 -0 lldb/include/lldb/Host/windows/PseudoConsole.h
+8 -1 lldb/include/lldb/Host/windows/ProcessLauncherWindows.h
+1 -1 lldb/test/Shell/Settings/TestFrameFormatColor.test
+193 -42 1 files not shown
+194 -43 7 files

LLVM/project f5b62a7 mlir/lib/Dialect/Linalg/Transforms Vectorization.cpp, mlir/test/Dialect/Linalg/vectorization insert-slice.mlir

[mlir][linalg] Update createWriteOrMaskedWrite (#174810)

`createWriteOrMaskedWrite` is used extensively in the Linalg vectorizer.
When a write uses non-zero indices, the helper currently computes mask
sizes as if the write started at 0 (`size = dim(d)`), which can produce
incorrect `vector.create_mask` operands for the generated
`vector.transfer_write`. Instead, the mask size should be computed as
`size = dim(d) - write_index(d)`.
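
The corrected formula can be sketched for a single dimension with static sizes. `maskSizeForWrite` is a hypothetical helper written for this illustration, not part of the vectorizer.

```cpp
#include <cassert>
#include <cstdint>

// Mask size along one dimension for a write that starts at `writeIndex`:
// only the elements from the write index to the end of the dimension are
// in bounds, so the mask covers dimSize - writeIndex elements.
inline int64_t maskSizeForWrite(int64_t dimSize, int64_t writeIndex) {
  assert(writeIndex >= 0 && writeIndex <= dimSize && "write index out of bounds");
  return dimSize - writeIndex;
}
```

For a dimension of size 3 written at index 2, this gives a mask size of 1, whereas treating the write as starting at 0 would give 3.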

EXAMPLE
-------
Let's use this example to illustrate:
```mlir
%res = tensor.insert_slice
    %src into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<?x3xi32>
```

This op is vectorized as a pair of `vector.transfer_read` +
`vector.transfer_write` ops. When calculating the mask for the

    [20 lines not shown]
Delta File
+73 -20 mlir/test/Dialect/Linalg/vectorization/insert-slice.mlir
+18 -3 mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+91 -23 2 files

LLVM/project 689f978 mlir/lib/Dialect/MLProgram/Transforms BufferizableOpInterfaceImpl.cpp, mlir/test/Dialect/MLProgram one-shot-bufferize.mlir

[ml_program] fix bufferizesToMemoryRead for ml_program.global_store (#177387)

This is a fix for the `BufferizableOpInterface` implementation for
`ml_program.global_store`.

`bufferizesToMemoryRead` currently returns false for
`GlobalStoreOpInterface`, but I believe it should return true, because
`ml_program.global_store` needs to read its input buffer to know what
value to store to the global.

This manifested in a bug where `one-shot-bufferize` would produce MLIR
that copies uninitialized data to the global var instead of the intended
value to be stored.

For the following MLIR:

```
module {
  ml_program.global private mutable @"state_tensor"(dense<0.0> : tensor<4x75xf32>) : tensor<4x75xf32>

    [61 lines not shown]
Delta File
+31 -0 mlir/test/Dialect/MLProgram/one-shot-bufferize.mlir
+1 -1 mlir/lib/Dialect/MLProgram/Transforms/BufferizableOpInterfaceImpl.cpp
+32 -1 2 files

LLVM/project 51b8d45 llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

[AMDGPU][NFC] Refine the representation of MODE register values.

- Eliminate the field masks.
- Segregate the encoding logic.
- Simplify and clarify the user code.

This should make it easier to update downstream branches, where we
have a more advanced version of the same facility.
Delta File
+55 -56 llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+55 -56 1 files

LLVM/project de8126d clang/lib/AST/ByteCode Interp.h, clang/test/AST/ByteCode c.c

[clang][bytecode] Fix mulc/divc op for IntegralAP types (#177565)

We need to allocate those.

Fixes https://github.com/llvm/llvm-project/issues/176740
Delta File
+23 -1 clang/lib/AST/ByteCode/Interp.h
+8 -0 clang/test/AST/ByteCode/c.c
+31 -1 2 files

LLVM/project 074485c llvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[CodeGen][NPM] Disable Machine verifier at the end of default pipelines
Delta File
+4 -8 llvm/test/CodeGen/X86/llc-pipeline-npm.ll
+3 -6 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+0 -3 llvm/include/llvm/Passes/CodeGenPassBuilder.h
+7 -17 3 files

LLVM/project 1b3fa6f llvm/lib/Transforms/Scalar LoopStrengthReduce.cpp

[CodeGen][LSR][NPM] Make LoopStrengthReduce pass preserve LCSSA
Delta File
+4 -0 llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+4 -0 1 files

LLVM/project 65f16e0 llvm/lib/Target/AMDGPU SILowerControlFlow.cpp, llvm/test/CodeGen/AMDGPU si-lower-control-flow-preserve-dom-tree.mir

[AMDGPU] Fix DomTree preservation in SILowerControlFlow when nodes are deleted
Delta File
+59 -0 llvm/test/CodeGen/AMDGPU/si-lower-control-flow-preserve-dom-tree.mir
+5 -0 llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
+64 -0 2 files

LLVM/project d765e1e llvm/test/CodeGen/AMDGPU si-lower-control-flow-preserve-dom-tree.mir

review comments
Delta File
+37 -31 llvm/test/CodeGen/AMDGPU/si-lower-control-flow-preserve-dom-tree.mir
+37 -31 1 files

LLVM/project 4ea3fcc llvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[CodeGen][NPM] Specify Loop pass adaptor to not use MSSA
Delta File
+2 -2 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+2 -2 llvm/test/CodeGen/X86/llc-pipeline-npm.ll
+2 -1 llvm/include/llvm/Passes/CodeGenPassBuilder.h
+6 -5 3 files

LLVM/project f5f8a49 llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[AMDGPU][NPM] Complete fast regalloc pipeline
Delta File
+38 -0 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+10 -1 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+48 -1 2 files

LLVM/project 1253a30 llvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[CodeGen][NPM] Add "PhysicalRegisterUsageAnalysis" once
Delta File
+417 -420 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+1 -4 llvm/include/llvm/Passes/CodeGenPassBuilder.h
+418 -424 2 files

LLVM/project 854d088 llvm/lib/CodeGen/SelectionDAG LegalizeFloatTypes.cpp LegalizeTypes.h

Delete the implementation functions
Delta File
+0 -655 llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+0 -37 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+0 -692 2 files

LLVM/project 4148355 llvm/include/llvm/CodeGen TargetLowering.h, llvm/lib/CodeGen TargetLoweringBase.cpp

DAG: Remove softPromoteHalfType

Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.

Really fixes #97975
Delta File
+7 -22 llvm/include/llvm/CodeGen/TargetLowering.h
+0 -20 llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+0 -11 llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+2 -7 llvm/lib/CodeGen/TargetLoweringBase.cpp
+0 -8 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
+0 -2 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+9 -70 1 files not shown
+9 -71 7 files

LLVM/project febe138 llvm/lib/Target/AMDGPU R600ISelLowering.cpp R600ISelLowering.h, llvm/test/CodeGen/AMDGPU kernel-args.ll

R600: Remove softPromoteHalfType

Also includes a somewhat hacky, minimal change to fix kernel arguments
lowered as f16, which avoids assertions once softPromoteHalfType is
removed. Half support was never really implemented for r600; a few
incidental tests just happened to include a half argument. Those tests
were not even meaningful, since the function body folded to nothing
because there is no callable function support.
Delta File
+164 -0 llvm/test/CodeGen/AMDGPU/kernel-args.ll
+3 -0 llvm/lib/Target/AMDGPU/R600ISelLowering.cpp
+0 -2 llvm/lib/Target/AMDGPU/R600ISelLowering.h
+167 -2 3 files

LLVM/project 2fa99dc llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

AMDGPU: Move softPromoteHalfType override to R600 only

As expected the code is much worse, but more correct.
We could do a better job with source modifier management around
fp16_to_fp/fp_to_fp16.
Delta File
+19,051 -23,588 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+7,381 -11,318 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+6,645 -10,108 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+6,103 -9,009 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+7,004 -7,821 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+5,419 -8,032 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+51,603 -69,876 116 files not shown
+97,949 -126,397 122 files

LLVM/project 6fc8028 llvm/lib/Analysis LazyValueInfo.cpp

[LVI] Fix the type when inferring nonnull from a dereferenceable attribute bundle (#177562)

Delta File
+1 -1 llvm/lib/Analysis/LazyValueInfo.cpp
+1 -1 1 files

LLVM/project 7b72ab8 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel fpext.ll unmerge-sgpr-s16.ll

AMDGPU/GlobalISel: Regbanklegalize rules for G_UNMERGE_VALUES

Move G_UNMERGE_VALUES handling to AMDGPURegBankLegalizeRules.cpp.
Fix sgpr S16 unmerge by lowering it with shifts on S32 values.
Previously, sgpr S16 unmerge was selected using the _lo16 and _hi16 subreg
indexes, which are exclusive to vgpr register classes.
For the remaining cases we do a trivial mapping that assigns the same reg bank
to all operands, vgpr or sgpr.
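
The shift-based idea can be illustrated on plain integers. This is only a sketch of splitting a 32-bit value into two 16-bit halves with 32-bit operations; `unmergeS16Pair` is a hypothetical name, and masking the low half is an assumption of the example rather than something the commit message states.

```cpp
#include <cstdint>
#include <utility>

// Splits a 32-bit value into its low and high 16-bit halves using only
// 32-bit operations, so no 16-bit subregister access is needed.
inline std::pair<uint32_t, uint32_t> unmergeS16Pair(uint32_t merged) {
  const uint32_t lo = merged & 0xFFFFu; // low half, kept in a 32-bit value
  const uint32_t hi = merged >> 16;     // high half, obtained with a shift
  return {lo, hi};
}
```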
Delta File
+47 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+13 -27 llvm/test/CodeGen/AMDGPU/GlobalISel/fpext.ll
+36 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.ll
+26 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+14 -9 llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
+14 -9 llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll
+150 -45 2 files not shown
+158 -49 8 files

LLVM/project 39a9e65 mlir/include/mlir/Dialect/Bufferization/IR BufferizableOpInterface.h, mlir/lib/Dialect/Bufferization/IR BufferizableOpInterface.cpp

[mlir][bufferization] Cache SymbolTableCollection for CallOp types (#176909)

Use the BufferizationState symbol table cache when resolving CallOp
callee types in getBufferType(), avoiding repeated SymbolTableCollection
creation. Add a const accessor (backed by a mutable cache) so const
state can reuse the same tables. Completes a marked TODO.
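
The "const accessor backed by a mutable cache" pattern can be sketched in isolation. `ToyState`, `getTable`, and `buildTable` are hypothetical names for this illustration and do not correspond to the BufferizationState API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a state object that caches lookup tables.
class ToyState {
public:
  using Table = std::map<std::string, int>;

  // Const accessor: returns the cached table for `scope`, building it on
  // first use. The cache is `mutable`, so const callers share the same tables.
  const Table &getTable(const std::string &scope) const {
    auto it = cache.find(scope);
    if (it == cache.end()) {
      ++buildCount;
      it = cache.emplace(scope, buildTable(scope)).first;
    }
    return it->second;
  }

  // Number of times a table was actually built (for illustration only).
  int getBuildCount() const { return buildCount; }

private:
  // Placeholder for the expensive construction the cache avoids repeating.
  static Table buildTable(const std::string &scope) {
    return Table{{scope + "::callee", 0}};
  }

  mutable std::map<std::string, Table> cache;
  mutable int buildCount = 0;
};
```

Repeated lookups for the same scope through a const reference build the table once and reuse it afterwards.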
Delta File
+2 -4 mlir/lib/Dialect/Bufferization/Transforms/FuncBufferizableOpInterfaceImpl.cpp
+3 -1 mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h
+4 -0 mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
+9 -5 3 files

LLVM/project b791501 clang/test/CodeGenObjC arc-foreach.m arc-unsafeclaim.m, clang/test/CodeGenObjCXX auto-release-result-assert.mm

Revert "Reapply "[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)" (#177285)" (#177533)

This reverts commit 4b939beb79e3390046b760bef71b7d891ba9b4df.

This commit seems to be causing these test failures:

- ThreadSanitizer-x86_64-iossim.Darwin.norace-objcxx-run-time.mm
https://ci.swift.org/job/llvm.org/job/clang-san-iossim/14230/testReport/junit/ThreadSanitizer-x86_64-iossim/Darwin/norace_objcxx_run_time_mm/
- ThreadSanitizer-x86_64-iossim.Darwin.objc-synchronize-cycle-tagged.mm
https://ci.swift.org/job/llvm.org/job/clang-san-iossim/14230/testReport/junit/ThreadSanitizer-x86_64-iossim/Darwin/objc_synchronize_cycle_tagged_mm/
- ThreadSanitizer-x86_64-iossim.Darwin.objc-synchronize-tagged.mm
https://ci.swift.org/job/llvm.org/job/clang-san-iossim/14230/testReport/junit/ThreadSanitizer-x86_64-iossim/Darwin/objc_synchronize_tagged_mm/
- ThreadSanitizer-x86_64-iossim.Darwin.objc-synchronize.mm
https://ci.swift.org/job/llvm.org/job/clang-san-iossim/14230/testReport/junit/ThreadSanitizer-x86_64-iossim/Darwin/objc_synchronize_mm/


With the error message:

```
fatal error: error in backend: Cannot select: intrinsic %llvm.objc.clang.arc.noop.use
```
Delta File
+1 -232 llvm/test/CodeGen/AArch64/call-rv-marker.ll
+89 -89 clang/test/CodeGenObjC/arc-foreach.m
+5 -45 clang/test/CodeGenObjC/arc-unsafeclaim.m
+16 -16 clang/test/CodeGenObjC/os_log.m
+1 -22 clang/test/CodeGenObjC/arc-arm.m
+6 -12 clang/test/CodeGenObjCXX/auto-release-result-assert.mm
+118 -416 10 files not shown
+155 -464 16 files

LLVM/project 2142388 llvm/test/CodeGen/X86 clmul-vector-256.ll clmul-vector-512.ll

[X86] Add 256-bit and 512-bit CLMULR and CLMULH test coverage (#177561)

Delta File
+1,844 -0 llvm/test/CodeGen/X86/clmul-vector-256.ll
+1,595 -3 llvm/test/CodeGen/X86/clmul-vector-512.ll
+3,439 -3 2 files

LLVM/project e619523 mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp, mlir/test/Dialect/XeGPU propagate-layout-subgroup.mlir

[MLIR][XeGPU] Add simple rank-based sg layout creation (#172867)

Delta File
+197 -24 mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+74 -0 mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
+271 -24 2 files