LLVM/project d1f65cd libcxx/docs/Status Cxx2cIssues.csv, libcxx/include optional

[libc++] Resolve LWG4308, correct `iterator` availability for `optional<T&>` (#173948)

Resolves #171345

Implements [proposed resolution for
LWG4308](https://cplusplus.github.io/LWG/issue4308) and removes
`const_iterator` from `optional<T&>`, which was missed.

- Constrains `iterator` so that it is only available if the type is not an
lvalue reference or, for `T&`, if `T` is an object type and is not an
unbounded array
- Adds a partial specialization of `__optional_iterator` for `T&`, which
only has the `iterator` type
- Corrects a static assert message as a drive-by
- Moves the libcxx-specific iterator test into the standard test because
the standard now specifies when the iterator should be available
Delta File
+43 -21 libcxx/include/optional
+32 -2 libcxx/test/std/utilities/optional/optional.iterator/iterator.pass.cpp
+0 -29 libcxx/test/libcxx/utilities/optional/optional.iterator/iterator.compile.pass.cpp
+1 -1 libcxx/docs/Status/Cxx2cIssues.csv
+76 -53 4 files

LLVM/project 58a5ade llvm/lib/Target/AArch64 AArch64SchedOryon.td, llvm/test/CodeGen/AArch64 aarch64-mcpu-oryon-runtime-unroll.ll

[AArch64] -  Allow for aggressive unrolling, with non-zero LoopMicroOpBufferSize for Oryon. (#172422)

Because LoopMicroOpBufferSize was 0 in the Oryon machine model,
unrolling based on the runtime trip count was disabled. The new value is
a pseudo value, as Oryon-1 does not have a loop-uop buffer in its
micro-architecture. The value 16 is empirical, inspired by the machine
model of cortex-a57, and can be tuned further if required.
Delta File
+152 -0 llvm/test/CodeGen/AArch64/aarch64-mcpu-oryon-runtime-unroll.ll
+1 -1 llvm/lib/Target/AArch64/AArch64SchedOryon.td
+153 -1 2 files

LLVM/project 3ff2637 llvm/lib/CodeGen/SelectionDAG SelectionDAGISel.cpp, llvm/test/TableGen dag-isel-regclass-emit-enum.td dag-isel-subregs.td

[SelectionDAG] Use SLEB128 for signed integers in isel table instead of 'signed rotated'. NFC (#173928)

Previously, we used a VBR that stored the sign bit in bit 0 followed by
the absolute value in subsequent bits.

This patch changes it to use SLEB128 which discards redundant sign bits,
but keeps the bits in the same positions. This uses the same number of
bytes to encode values so doesn't change the table size.

My goal is to remove OPC_EmitStringInteger as a special opcode type.
Instead, we can print the string directly with OPC_EmitInteger for any
string that has an enum value of 0..63.
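
The two encodings described above can be compared with a minimal sketch. The function names here are hypothetical and the code is written from the description in this message, not copied from the TableGen emitter; it assumes C++20, where right-shifting a negative signed value is guaranteed to be arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Old scheme as described: sign in bit 0, absolute value in the bits above,
// then emitted as a VBR (7 payload bits per byte, 0x80 = "more bytes follow").
std::vector<uint8_t> encodeSignedRotatedVBR(int64_t v) {
  // Unsigned arithmetic avoids signed overflow when negating.
  uint64_t mag = v < 0 ? (0 - static_cast<uint64_t>(v)) : static_cast<uint64_t>(v);
  uint64_t rotated = (mag << 1) | (v < 0 ? 1u : 0u);
  std::vector<uint8_t> out;
  do {
    uint8_t byte = static_cast<uint8_t>(rotated & 0x7f);
    rotated >>= 7;
    if (rotated != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (rotated != 0);
  return out;
}

// SLEB128: two's complement bits stay in place; encoding stops once the
// remaining bits are pure sign extension of bit 6 of the last byte.
std::vector<uint8_t> encodeSLEB128(int64_t v) {
  std::vector<uint8_t> out;
  bool more = true;
  while (more) {
    uint8_t byte = static_cast<uint8_t>(v & 0x7f);
    v >>= 7;  // arithmetic shift (guaranteed since C++20)
    bool signBit = (byte & 0x40) != 0;
    more = !((v == 0 && !signBit) || (v == -1 && signBit));
    if (more)
      byte |= 0x80;
    out.push_back(byte);
  }
  return out;
}

// A few hand-checked encodings.
void selfCheck() {
  assert((encodeSignedRotatedVBR(-1) == std::vector<uint8_t>{0x03}));
  assert((encodeSignedRotatedVBR(63) == std::vector<uint8_t>{0x7e}));
  assert((encodeSignedRotatedVBR(64) == std::vector<uint8_t>{0x80, 0x01}));
  assert((encodeSLEB128(-1) == std::vector<uint8_t>{0x7f}));
  assert((encodeSLEB128(63) == std::vector<uint8_t>{0x3f}));
  assert((encodeSLEB128(64) == std::vector<uint8_t>{0xc0, 0x00}));
}
```

In this sketch the small values 0..63 occupy a single byte whose low six bits are the value itself under SLEB128, which is what makes printing an enum name directly for that range plausible.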
Delta File
+29 -17 llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
+14 -26 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+7 -22 llvm/utils/TableGen/DAGISelMatcherGen.cpp
+8 -4 llvm/utils/TableGen/Common/DAGISelMatcher.h
+3 -3 llvm/test/TableGen/dag-isel-regclass-emit-enum.td
+3 -3 llvm/test/TableGen/dag-isel-subregs.td
+64 -75 2 files not shown
+65 -79 8 files

LLVM/project 3b7a973 llvm/include/llvm/Passes CodeGenPassBuilder.h, llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp

[AMDGPU][NPM] add "addPostBBSections()" to NPM (#172793)

Matches Legacy pipeline, GCNPassConfig::addPostBBSections()
Delta File
+8 -0 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+3 -3 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+4 -0 llvm/include/llvm/Passes/CodeGenPassBuilder.h
+15 -3 3 files

LLVM/project a258924 mlir/examples/standalone CMakeLists.txt pyproject.toml, mlir/test/Examples/standalone test.wheel.toy test.toy

gate standalone
Delta File
+3 -1 mlir/examples/standalone/CMakeLists.txt
+2 -0 mlir/examples/standalone/pyproject.toml
+1 -0 mlir/test/Examples/standalone/test.wheel.toy
+1 -0 mlir/test/Examples/standalone/test.toy
+7 -1 4 files

LLVM/project 4b6e55c mlir/include/mlir/Bindings/Python IRCore.h, mlir/lib/Bindings/Python IRCore.cpp MainModule.cpp

move impls
Delta File
+2,882 -268 mlir/lib/Bindings/Python/IRCore.cpp
+6 -2,653 mlir/lib/Bindings/Python/MainModule.cpp
+13 -19 mlir/include/mlir/Bindings/Python/IRCore.h
+2,901 -2,940 3 files

LLVM/project b007420 llvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV] Use RVPTernary_rrr for accumulator instructions in RISCVInstrInfoP.td. (#173426)

Delta File
+3 -3 llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+3 -3 1 files

LLVM/project eb89c8a llvm/test/Instrumentation/MemorySanitizer/AArch64 aarch64-matmul.ll aarch64-bf16-dotprod-intrinsics.ll

[msan][NFCI] Add tests for the matrix multiplication intrinsics on Arm (#174038)

Forked from corresponding files in llvm/test/CodeGen/AArch64
Delta File
+547 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-matmul.ll
+503 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-bf16-dotprod-intrinsics.ll
+379 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/sve-intrinsics-bfloat.ll
+195 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/sve-intrinsics-matmul-int8.ll
+46 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-matmul-fp16.ll
+45 -0 llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-matmul-fp32.ll
+1,715 -0 5 files not shown
+1,851 -0 11 files

LLVM/project 670a68e mlir/include/mlir/Dialect/Tensor/IR TensorOps.td, mlir/lib/Dialect/Linalg/Transforms ElementwiseOpFusion.cpp

[mlir][tensor] Preserve encoding in `CollapseShapeOp::build` (#173720)

This PR updates `CollapseShapeOp::build` so that when the result type is
not explicitly provided, the inferred result type preserves the encoding
of the source tensor.
Delta File
+7 -7 mlir/test/Dialect/Linalg/collapse-dim.mlir
+6 -5 mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+2 -4 mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
+1 -1 mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
+16 -17 4 files

LLVM/project 9d15857 mlir CMakeLists.txt

[NOMERGE] python bindings on by default
Delta File
+1 -1 mlir/CMakeLists.txt
+1 -1 1 files

LLVM/project 3c03a06 mlir/cmake/modules AddMLIRPython.cmake, mlir/lib/Bindings/Python IRCore.cpp MainModule.cpp

try twolevel_namespace
Delta File
+28 -0 mlir/lib/Bindings/Python/IRCore.cpp
+0 -28 mlir/lib/Bindings/Python/MainModule.cpp
+3 -0 mlir/cmake/modules/AddMLIRPython.cmake
+31 -28 3 files

LLVM/project a46cb15 mlir/include/mlir/Dialect/NVGPU/Utils MMAUtils.h, mlir/include/mlir/Dialect/Vector/Transforms VectorRewritePatterns.h

[mlir][vector] Fix typo in `vector.contract` mnemonic (NFC) (#173661)

Delta File
+2 -2 mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h
+1 -1 mlir/include/mlir/Dialect/NVGPU/Utils/MMAUtils.h
+1 -1 mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+1 -1 mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
+5 -5 4 files

LLVM/project 406171e mlir/cmake/modules AddMLIRPython.cmake, mlir/lib/Bindings/Python IRCore.cpp MainModule.cpp

try twolevel_namespace
Delta File
+33 -0 mlir/lib/Bindings/Python/IRCore.cpp
+0 -28 mlir/lib/Bindings/Python/MainModule.cpp
+3 -1 mlir/cmake/modules/AddMLIRPython.cmake
+3 -0 mlir/test/Examples/standalone/test.toy
+39 -29 4 files

LLVM/project 71579b1 mlir/cmake/modules AddMLIRPython.cmake, mlir/lib/Bindings/Python IRCore.cpp MainModule.cpp

try twolevel_namespace
Delta File
+33 -0 mlir/lib/Bindings/Python/IRCore.cpp
+0 -28 mlir/lib/Bindings/Python/MainModule.cpp
+3 -1 mlir/cmake/modules/AddMLIRPython.cmake
+36 -29 3 files

LLVM/project f43d683 flang-rt/include/flang-rt/runtime work-queue.h, flang-rt/lib/cuda memmove-function.cpp

Revert "Reland "[flang][cuda] Add support for derived-type initialization on device #172568" (#174033)

This fails https://lab.llvm.org/staging/#/builders/65
This reverts commit 1ac1a547ee3b74b4d02bc94faf02ca0381196d11.
Delta File
+17 -17 flang/test/Lower/allocatable-polymorphic.f90
+6 -24 flang-rt/lib/runtime/derived.cpp
+0 -27 flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
+7 -17 flang-rt/include/flang-rt/runtime/work-queue.h
+9 -9 flang/test/Lower/volatile-allocatable.f90
+0 -18 flang-rt/lib/cuda/memmove-function.cpp
+39 -112 26 files not shown
+114 -208 32 files

LLVM/project 6c81859 llvm/lib/CodeGen/GlobalISel GISelValueTracking.cpp, llvm/test/CodeGen/AArch64/GlobalISel knownbits-sadde.mir knownbits-uadde.mir

[GlobalISel] Implement G_UADDO/G_UADDE/G_SADDO/G_SADDE for computeKnownBits (#165497)

Addressing the carry out cases Matt mentioned in #159202.

Note: G_[US]SUB[OE] will be implemented in a different PR.
Delta File
+275 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-sadde.mir
+275 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-uadde.mir
+163 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-saddo.mir
+163 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-uaddo.mir
+31 -1 llvm/lib/CodeGen/GlobalISel/GISelValueTracking.cpp
+1 -5 llvm/test/CodeGen/X86/GlobalISel/legalize-trailing-zeros-undef.mir
+908 -6 1 files not shown
+909 -11 7 files

LLVM/project 18155c6 flang/include/flang/Runtime freestanding-tools.h

[flang][cuda] Fix device compilation after #172913 (#174031)

Delta File
+2 -1 flang/include/flang/Runtime/freestanding-tools.h
+2 -1 1 files

LLVM/project 0bd5975 llvm/include/llvm/CodeGen SelectionDAGISel.h, llvm/lib/CodeGen/SelectionDAG SelectionDAGISel.cpp

[SelectionDAG] Use uint8_t instead of unsigned char for isel MatcherTable. (#174014)

These are really the same type, but uint8_t is more accurate since we
make assumptions that a table element is 8 bits when we emit VBRs.
Delta File
+21 -25 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+1 -1 llvm/include/llvm/CodeGen/SelectionDAGISel.h
+1 -1 llvm/test/TableGen/CPtrWildcard.td
+1 -1 llvm/utils/TableGen/DAGISelMatcherEmitter.cpp
+24 -28 4 files

LLVM/project 0bc6491 clang/lib/CodeGen CGBuiltin.cpp, clang/test/CodeGen builtin_clrsb.c

[Clang] Add NUW to the Sub in __builtin_clrsb expansion. (#174010)

The ctlz will produce a value in the range [1..bitwidth]. It can't
produce 0. This means the subtract of 1 will not have unsigned wrap.

It also has no signed wrap, but the optimizer can figure that out on its
own.

It's very likely InstCombine will just drop the NUW when it
canonicalizes to Add, but maybe it will be helpful in some case.
Delta File
+2 -2 clang/test/CodeGen/builtin_clrsb.c
+2 -1 clang/lib/CodeGen/CGBuiltin.cpp
+4 -3 2 files

LLVM/project 6f6fca1 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp

[VPlan] Re-use common cast cost logic for VPReplicateRecipe (NFCI).

Move the logic to compute cast costs to getCostForRecipeWithOpcode and
use for VPReplicateRecipe.

This should match the costs computed by the legacy cost model for scalar
casts.
Delta File
+94 -59 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+94 -59 1 files

LLVM/project 746eced llvm/test/Transforms/LoopVectorize/AArch64 load-cast-context.ll pr46950-load-cast-context-crash.ll, llvm/test/Transforms/LoopVectorize/X86 cost-model.ll

[LV] Add extra tests for computing replicating cast costs (NFC)
Delta File
+315 -314 llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+198 -0 llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll
+0 -25 llvm/test/Transforms/LoopVectorize/AArch64/pr46950-load-cast-context-crash.ll
+513 -339 3 files

LLVM/project 35040a0 flang/include/flang/Evaluate tools.h, flang/test/Lower/CUDA cuda-data-transfer.cuf

[flang][cuda] Make copy to managed variable on host (#174012)

When the LHS has multiple symbols with the managed attribute, still
perform the copy on the host.
Delta File
+19 -0 flang/test/Lower/CUDA/cuda-data-transfer.cuf
+3 -3 flang/include/flang/Evaluate/tools.h
+22 -3 2 files

LLVM/project c16480d llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-minimumnum.ll simplify-demanded-fpclass-maximumnum.ll

InstCombine: Handle minimumnum/maximumnum in SimplifyDemandedFPClass
Delta File
+36 -59 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-minimumnum.ll
+34 -55 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-maximumnum.ll
+64 -12 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+134 -126 3 files

LLVM/project 10f04e1 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-maximum.ll simplify-demanded-fpclass-minimum.ll

InstCombine: Introduce nsz flag on minimum/maximum in SimplifyDemandedFPClass

Alive isn't particularly happy with this in the case where
one of the inputs could be zero, but I think
it's wrong: https://alive2.llvm.org/ce/z/dF7V6k

nsz shouldn't permit introducing a -0 result where
there wasn't one in the input here.
Delta File
+46 -46 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-maximum.ll
+46 -46 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-minimum.ll
+18 -2 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+110 -94 3 files

LLVM/project 7669370 llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Analysis ValueTracking.cpp

InstCombine: Implement SimplifyDemandedFPClass for sqrt
Delta File
+31 -0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+10 -19 llvm/lib/Analysis/ValueTracking.cpp
+24 -0 llvm/lib/Support/KnownFPClass.cpp
+7 -11 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-sqrt.ll
+4 -0 llvm/include/llvm/Support/KnownFPClass.h
+76 -30 5 files

LLVM/project 731dc7d llvm/test/Transforms/InstCombine simplify-demanded-fpclass-maximumnum.ll simplify-demanded-fpclass-minimumnum.ll

InstCombine: Add baseline minimumnum/maximumnum SimplifyDemandedFPClass tests
Delta File
+1,625 -0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-maximumnum.ll
+1,625 -0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-minimumnum.ll
+3,250 -0 2 files

LLVM/project 9744178 llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Analysis ValueTracking.cpp

InstCombine: Handle minimum/maximum in SimplifyDemandedFPClass
Delta File
+51 -80 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-maximum.ll
+49 -76 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-minimum.ll
+26 -87 llvm/lib/Analysis/ValueTracking.cpp
+94 -1 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+92 -0 llvm/lib/Support/KnownFPClass.cpp
+14 -0 llvm/include/llvm/Support/KnownFPClass.h
+326 -244 6 files

LLVM/project dd4394b llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-fmul.ll

InstCombine: Consider not-inf/nan context when simplifying fmul

When trying to simplify fmul into copysign, consider whether the
result can be nan and whether the flags imply that the inputs cannot
be infinity.
Delta File
+18 -12 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fmul.ll
+18 -6 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+36 -18 2 files

LLVM/project eb783bb llvm/test/Transforms/InstCombine simplify-demanded-fpclass-sqrt.ll

InstCombine: Add baseline tests for sqrt SimplifyDemandedFPClass
Delta File
+206 -0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-sqrt.ll
+206 -0 1 files

LLVM/project eb1e33a llvm/include/llvm/ADT FloatingPointMode.h, llvm/include/llvm/Support KnownFPClass.h

InstCombine: Handle log/log2/log10 in SimplifyDemandedFPClass
Delta File
+16 -30 llvm/lib/Analysis/ValueTracking.cpp
+37 -0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+9 -18 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-log.ll
+17 -0 llvm/lib/Support/KnownFPClass.cpp
+5 -0 llvm/include/llvm/ADT/FloatingPointMode.h
+4 -0 llvm/include/llvm/Support/KnownFPClass.h
+88 -48 6 files