LLVM/project bfa3da8 clang/lib/AST/ByteCode Record.h Record.cpp

[clang][bytecode] Optimize `interp::Record` a bit (#183494)

And things around it.

Remove the `FieldMap`, since we can use the field's index instead and
only keep an array around. `reserve()` the sizes and use
`emplace_back()`.
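
A minimal sketch of the idea described above, assuming each field carries its index within the parent record. The `FieldDecl`, `Field`, and `Record` types here are hypothetical stand-ins written for illustration, not the actual `clang::interp` classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real clang types.
struct FieldDecl {
  std::string Name;
  unsigned Index; // position of the field within its parent record
};

struct Field {
  const FieldDecl *Decl;
  unsigned Offset;
  Field(const FieldDecl *D, unsigned Off) : Decl(D), Offset(Off) {}
};

class Record {
  // Fields[I] describes the field whose index is I, so no separate
  // declaration-to-field map is needed.
  std::vector<Field> Fields;

public:
  explicit Record(const std::vector<FieldDecl> &Decls) {
    Fields.reserve(Decls.size()); // one allocation up front
    unsigned Offset = 0;
    for (const FieldDecl &D : Decls) {
      Fields.emplace_back(&D, Offset); // construct in place
      Offset += 8; // fixed field size, purely an assumption of this sketch
    }
  }

  // Lookup is a plain array access keyed by the field's index.
  const Field *getField(const FieldDecl &D) const {
    assert(D.Index < Fields.size() && "field index out of range");
    return &Fields[D.Index];
  }

  unsigned getNumFields() const {
    return static_cast<unsigned>(Fields.size());
  }
};
```

The sketch relies on the declarations outliving the `Record`, since it only stores pointers to them.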
Delta File
+15 -7 clang/lib/AST/ByteCode/Record.h
+1 -9 clang/lib/AST/ByteCode/Record.cpp
+5 -3 clang/lib/AST/ByteCode/Program.cpp
+21 -19 3 files

LLVM/project bb30e28 llvm/lib/Target/AArch64 AArch64InstrInfo.cpp, llvm/unittests/Target/AArch64 InstSizes.cpp

[AArch64] Report accurate sizes for MOVaddr and MOVimm pseudos
Delta File
+89 -0 llvm/unittests/Target/AArch64/InstSizes.cpp
+28 -16 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+117 -16 2 files

LLVM/project 9e48c00 llvm/test/CodeGen/AMDGPU local-stack-alloc-add-references.gfx8.mir coalesce-copy-to-agpr-to-av-registers.mir, llvm/test/TableGen ArtificialRegs.td

[TableGen] Complete the support for artificial registers

Artificial registers were added in eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than that
of their subregisters, even when they only contain a single
physical subregister.

Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.

This patch completes the support for artificial registers to:

- Ignore artificial registers when joining register unit uber
  sets. Artificial registers may be members of classes that
  together include registers and their sub-registers, making it
  impossible to compute normalised weights for uber sets they
  belong to.


    [28 lines not shown]
Delta File
+180 -180 llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx8.mir
+120 -120 llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+90 -90 llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx9.mir
+60 -7 llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+56 -0 llvm/test/TableGen/ArtificialRegs.td
+18 -18 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir
+524 -415 25 files not shown
+675 -562 31 files

LLVM/project d8ce0e7 flang/lib/Semantics check-omp-loop.cpp check-omp-structure.h

[flang][OpenMP] Inline CheckNestedBlock, NFC (#181732)

CheckNestedBlock no longer calls itself, which was the primary reason
for the code to be in a separate function.
Delta File
+21 -26 flang/lib/Semantics/check-omp-loop.cpp
+0 -2 flang/lib/Semantics/check-omp-structure.h
+21 -28 2 files

LLVM/project d3f76b3 llvm/lib/Target/AArch64 AArch64InstrInfo.cpp

[AArch64] Report accurate sizes for MOVaddr and MOVimm pseudos
Delta File
+28 -16 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+28 -16 1 files

LLVM/project 1ec3b86 llvm/lib/Target/AArch64 AArch64ExpandPseudo.cpp AArch64ExpandImm.cpp

[NFC][AArch64] Extract MOVaddr* expansion model into common header

This makes the expansion logic reusable by getInstSizeInBytes in a
follow-up patch.
Delta File
+742 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.cpp
+0 -722 llvm/lib/Target/AArch64/AArch64ExpandImm.cpp
+75 -56 llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+42 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.h
+0 -35 llvm/lib/Target/AArch64/AArch64ExpandImm.h
+10 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+869 -822 5 files not shown
+886 -839 11 files

LLVM/project 254cb2a llvm/include/llvm/CodeGen TargetInstrInfo.h, llvm/lib/CodeGen PostRAHazardRecognizer.cpp

[AMDGPU] Hoist WMMA coexecution hazard V_NOPs from loops to preheaders (#176895)

On GFX1250, V_NOPs inserted for WMMA coexecution hazards are placed at
the use-site. When the hazard-consuming instruction is inside a loop and
the WMMA is outside, these NOPs execute every iteration even though the
hazard only needs to be covered once.

This patch hoists the V_NOPs to the loop preheader, reducing executions
from N iterations to 1.

```
Example (assuming a hazard requiring K V_NOPs):
  Before:
    bb.0 (preheader): WMMA writes vgpr0
    bb.1 (loop):      V_NOP xK, VALU reads vgpr0, branch bb.1
                      -> K NOPs executed per iteration

  After:
    bb.0 (preheader): WMMA writes vgpr0, V_NOP xK

    [12 lines not shown]
Delta File
+516 -30 llvm/test/CodeGen/AMDGPU/wmma-nop-hoisting.mir
+163 -62 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+21 -4 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+14 -7 llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
+3 -2 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+3 -1 llvm/include/llvm/CodeGen/TargetInstrInfo.h
+720 -106 1 files not shown
+722 -107 7 files

LLVM/project 32b8b9b llvm/lib/Transforms/Vectorize VPlanConstruction.cpp VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize use-scalar-epilogue-if-tp-fails.ll

[VPlan] Simplify ExitingIVValue and use for tail-folded IVs. (#182507)

Now that we have ExitingIVValue, we can also use it for tail-folded
loops; the only difference is that we have to compute the end value with
the original trip count instead of the vector trip count.

This allows removing the induction increment operand only used when
tail-folding.

PR: https://github.com/llvm/llvm-project/pull/182507
Delta File
+66 -11 llvm/test/Transforms/LoopVectorize/X86/fold-tail-low-trip-count.ll
+48 -8 llvm/test/Transforms/LoopVectorize/AArch64/fold-tail-low-trip-count.ll
+12 -26 llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll
+10 -8 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+12 -6 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4 -9 llvm/test/Transforms/LoopVectorize/X86/small-size.ll
+152 -68 8 files not shown
+167 -96 14 files

LLVM/project f02c6ea llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: llvm.amdgcn.trig.preop cannot return negative values (#183306)

This returns a positive value less than 1.
Delta File
+2 -2 llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+2 -1 llvm/lib/Analysis/ValueTracking.cpp
+4 -3 2 files

LLVM/project 69115be llvm/test/CodeGen/AMDGPU annotate-kernel-features-hsa.ll attr-amdgpu-max-num-workgroups-propagate.ll

AMDGPU: Stop adding uniform-work-group-size=false

This is one of the string attributes that takes a boolean
value for no reason. There is no point in ever writing this
with an explicit false. Stop adding the noise and reporting
an unnecessary change.
Delta File
+45 -44 llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll
+29 -35 llvm/test/CodeGen/AMDGPU/attr-amdgpu-max-num-workgroups-propagate.ll
+29 -33 llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+24 -24 llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-sincos.ll
+23 -23 llvm/test/CodeGen/AMDGPU/amdgpu-attributor-min-agpr-alloc.ll
+21 -21 llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll
+171 -180 33 files not shown
+302 -327 39 files

LLVM/project 516d902 llvm/lib/Target/AArch64 AArch64ExpandPseudo.cpp AArch64ExpandImm.cpp

[NFC][AArch64] Extract MOVaddr* expansion model into common header

This makes the expansion logic reusable by getInstSizeInBytes in a
follow-up patch.
Delta File
+742 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.cpp
+0 -722 llvm/lib/Target/AArch64/AArch64ExpandImm.cpp
+75 -56 llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+42 -0 llvm/lib/Target/AArch64/AArch64ExpandPseudo.h
+0 -35 llvm/lib/Target/AArch64/AArch64ExpandImm.h
+9 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+868 -822 5 files not shown
+885 -839 11 files

LLVM/project 7cddd7b llvm/include/llvm/Analysis ScalarEvolution.h ScalarEvolutionExpressions.h, llvm/lib/Analysis ScalarEvolution.cpp

[SCEV] Introduce SCEVUse wrapper type (NFC)

Add SCEVUse as a PointerIntPair wrapper around const SCEV * to prepare
for storing additional per-use information.

This commit contains the mechanical changes of adding an initial SCEVUse
wrapper and updating all relevant interfaces to take SCEVUse. Note that
currently the integer part is never set, and all SCEVUses are
considered canonical.
Delta File
+295 -249 llvm/lib/Analysis/ScalarEvolution.cpp
+156 -47 llvm/include/llvm/Analysis/ScalarEvolution.h
+78 -70 llvm/include/llvm/Analysis/ScalarEvolutionExpressions.h
+36 -29 llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+25 -26 llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+17 -18 llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+607 -439 22 files not shown
+725 -543 28 files

LLVM/project cd68939 llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU AMDGPUAsmPrinter.cpp

[AMDGPU] Add attribute for FWD_PROGRESS (#181675)

Added an attribute for FWD_PROGRESS that allows it to be
turned off for some shaders.
Delta File
+5 -4 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
+4 -0 llvm/docs/AMDGPUUsage.rst
+1 -1 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+10 -5 3 files

LLVM/project 90b3fd7 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp TargetLowering.cpp, llvm/test/CodeGen/X86 known-pow2.ll

[DAG] Move (X +/- Y) & Y --> ~X & Y fold from visitAnd to SimplifyDemandedBits (#183270)

Add DemandedElts handling to allow better vector support.

To prevent RISCV falling back to a mul call in known-never-zero.ll I've
had to tweak the (mul step_vector(C0), C1) to (step_vector(C0 * C1))
fold to only occur if C0 is already non-power-of-2, C0 * C1 is a
power-of-2 or the target has good mul support.
Delta File
+5 -11 llvm/test/CodeGen/X86/known-pow2.ll
+4 -8 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+11 -0 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+20 -19 3 files

LLVM/project ff5dcb1 llvm/lib/Target/SPIRV SPIRVISelLowering.cpp SPIRVInstructionSelector.cpp

[SPIRV] Simplify `selectPhi` and remove unreachable code (#183060)

Previously, it created an `OpPhi` with a Type argument, only to immediately
remove this Type and change the opcode to `PHI`.

Only `TargetOpcode::PHI` gets to `SPIRVTargetLowering::finalizeLowering`.

The `TargetOpcode::PHI` gets lowered to `SPIRV::OpPhi` much later, by
`patchPhi` in the `SPIRVModuleAnalysis`.

`SPIRVModuleAnalysis` is requested by the
`SPIRVAsmPrinter` through `getAnalysisUsage` (which is ugly).
```
Delta File
+0 -16 llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+5 -9 llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+5 -25 2 files

LLVM/project d46a400 llvm/lib/Target/PowerPC PPCISelLowering.cpp, llvm/test/CodeGen/PowerPC fma-combine.ll

[PowerPC] Remove `NoSignedZerosFPMath` uses (#180087)

Users should use the `nsz` flag only.
Delta File
+225 -59 llvm/test/CodeGen/PowerPC/fma-combine.ll
+2 -4 llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+227 -63 2 files

LLVM/project 664666b clang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/CodeGenHIP builtins-amdgcn-gfx1250-wmma-f16.hip

[Clang][AMDGPU] Change __fp16 to _Float16 in GFX1250 WMMA/SWMMAC builtin definitions
Delta File
+469 -0 clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-wmma-f16.hip
+16 -16 clang/include/clang/Basic/BuiltinsAMDGPU.td
+485 -16 2 files

LLVM/project 8540986 llvm/test/Transforms/FunctionAttrs nofpclass-callsite-prop.ll

Rename
Delta File
+2 -2 llvm/test/Transforms/FunctionAttrs/nofpclass-callsite-prop.ll
+2 -2 1 files

LLVM/project 7a560a2 llvm/test/Transforms/FunctionAttrs nofpclass-callsite-prop.ll

Add another use in noundef test
Delta File
+3 -0 llvm/test/Transforms/FunctionAttrs/nofpclass-callsite-prop.ll
+3 -0 1 files

LLVM/project b101f01 libcxx/include/__iterator ostreambuf_iterator.h, libcxx/include/__locale_dir pad_and_output.h

[libc++] Optimize using std::copy with an ostreambuf_iterator (#181815)

```
Benchmark                                                  old             new    Difference    % Difference
----------------------------------------------  --------------  --------------  ------------  --------------
std::copy(CharT*,_CharT*,_ostreambuf_iterator)         8115.45          329.54      -7785.91         -95.94%
```
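
For reference, the call shape named in the benchmark row looks roughly like the sketch below. The helper `copyToStream` is a hypothetical name introduced here for illustration. The optimization itself lives inside libc++ and is not shown; the snippet only uses standard library calls.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Copies a contiguous character range into a stream buffer through
// std::ostreambuf_iterator. copyToStream is a hypothetical helper.
inline std::string copyToStream(const char *First, const char *Last) {
  assert(First <= Last && "expects a valid range");
  std::ostringstream OS;
  std::copy(First, Last, std::ostreambuf_iterator<char>(OS));
  return OS.str();
}
```

User code like this needs no source change to benefit from a faster `std::copy`; only the standard library implementation changes.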
Delta File
+53 -0 libcxx/test/std/algorithms/alg.modifying.operations/alg.copy/ostreambuf.copy.pass.cpp
+26 -0 libcxx/test/benchmarks/streams/copy.bench.cpp
+26 -0 libcxx/test/support/stream_types.h
+23 -0 libcxx/include/__iterator/ostreambuf_iterator.h
+2 -14 libcxx/include/__locale_dir/pad_and_output.h
+12 -0 libcxx/test/libcxx/algorithms/specialized_algorithms.compile.pass.cpp
+142 -14 1 files not shown
+144 -14 7 files

LLVM/project 23dffff lldb/include/lldb/Utility AnsiTerminal.h, lldb/test/API/commands/help TestHelp.py

[lldb] Fix issues handling ANSI codes and Unicode in option help (#183314)

Fixes #177570, and a bunch of FIXMEs for other tests known to be
incorrect.

To do this, I have adapted code from the existing ansi::TrimAndPad. At
first I tried a wrapper function, but there are a few things we need to
handle that cannot be done with a simple wrapper.

We must only split at word boundaries. This requires knowing whether the
last adjustment, which may be the final adjustment, was made at, or just
before, a word boundary. It must also check for single words wider than
the requested width (though a wrapper could do this).

For this reason, the new TrimAtWordBoundary has more special case checks
and a more complex inner loop, though the core is the same split into
left, ANSI escape code, and right that TrimAndPad uses.

It is that splitting that implements the "bias" we need to print

    [20 lines not shown]
Delta File
+130 -56 lldb/include/lldb/Utility/AnsiTerminal.h
+149 -35 lldb/unittests/Utility/AnsiTerminalTest.cpp
+14 -6 lldb/test/API/commands/help/TestHelp.py
+293 -97 3 files

LLVM/project 230481c llvm/lib/CodeGen/MIRParser MIRParser.cpp

CodeGen: Change error messages to follow phrasing guidance (#183488)

Avoid contractions and starting with a capital. These seem to be
missing tests.
Delta File
+2 -2 llvm/lib/CodeGen/MIRParser/MIRParser.cpp
+2 -2 1 files

LLVM/project ec58575 llvm/lib/Target/Mips MipsInstrFPU.td MicroMipsInstrFPU.td, llvm/test/CodeGen/Mips fmadd1.ll fabs.ll

[Mips] Remove NoNaNsFPMath uses (#183045)

Remove `NoNaNsFPMath` by using `PatFrag`; we should only use `nnan`.
Duplicate tests in `CodeGen/Mips/llvm-ir/nan-fp-attr.ll` are removed.
Delta File
+198 -7 llvm/test/CodeGen/Mips/fmadd1.ll
+1 -167 llvm/test/CodeGen/Mips/llvm-ir/nan-fp-attr.ll
+43 -37 llvm/test/CodeGen/Mips/fabs.ll
+47 -13 llvm/lib/Target/Mips/MipsInstrFPU.td
+5 -5 llvm/lib/Target/Mips/MicroMipsInstrFPU.td
+2 -2 llvm/lib/Target/Mips/MipsISelLowering.cpp
+296 -231 1 files not shown
+296 -234 7 files

LLVM/project bff5ef6 llvm/test/CodeGen/X86 funnel-shift-i512.ll

[X86] Add i512 funnel shift / rotate test coverage (#183486)

Delta File
+5,445 -0 llvm/test/CodeGen/X86/funnel-shift-i512.ll
+5,445 -0 1 files

LLVM/project 67ac275 mlir/include/mlir/Dialect/X86 X86.td, mlir/include/mlir/Dialect/X86Vector X86Vector.td

[mlir][x86] Rename x86vector to x86 (#183311)

Renames 'x86vector' dialect to 'x86'.

This is the first PR in a series of cleanups around dialects targeting x86
platforms.
The new naming scheme is shorter, cleaner, and opens the possibility of
integrating other x86-specific operations that do not strictly fit a pure
vector representation. For example, the generalization will allow for
a future merger of the AMX dialect into the x86 dialect to create a one-stop
x86 operations collection and boost discoverability.
Delta File
+1,480 -0 mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir
+0 -1,480 mlir/test/Dialect/X86Vector/vector-contract-to-packed-type-dotproduct.mlir
+0 -1,316 mlir/test/Dialect/X86Vector/vector-contract-bf16-to-fma.mlir
+1,315 -0 mlir/test/Dialect/X86/vector-contract-bf16-to-fma.mlir
+676 -0 mlir/include/mlir/Dialect/X86/X86.td
+0 -676 mlir/include/mlir/Dialect/X86Vector/X86Vector.td
+3,471 -3,472 122 files not shown
+8,969 -8,987 128 files

LLVM/project 99a74aa llvm/lib/CodeGen/MIRParser MIRParser.cpp

CodeGen: Change error messages to follow phrasing guidance

Avoid contractions and starting with a capital. These seem to be
missing tests.
Delta File
+2 -2 llvm/lib/CodeGen/MIRParser/MIRParser.cpp
+2 -2 1 files

LLVM/project 6d7ec4b llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp DAGCombiner.cpp, llvm/test/CodeGen/X86 known-pow2.ll

[DAG] Improved ISD::SHL handling in isKnownToBeAPowerOfTwo (#181882)

Fixes #181650
Delta File
+51 -0 llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+3 -7 llvm/test/CodeGen/X86/known-pow2.ll
+4 -4 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+2 -3 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+60 -14 4 files

LLVM/project 0f415a3 llvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Use `findCommonRegClass` API.
Delta File
+3 -7 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+3 -7 1 files

LLVM/project 369708b llvm/lib/Target/AArch64 AArch64ExpandPseudoInsts.cpp AArch64ExpandImm.cpp

[NFC][AArch64] Extract MOVaddr* expansion model into AArch64ExpandImm

This makes the expansion logic reusable by getInstSizeInBytes in a
follow-up patch.
Delta File
+72 -53 llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+20 -0 llvm/lib/Target/AArch64/AArch64ExpandImm.cpp
+7 -0 llvm/lib/Target/AArch64/AArch64ExpandImm.h
+99 -53 3 files

LLVM/project 850b045 lldb/source/Plugins/Process/FreeBSD-Kernel-Core ProcessFreeBSDKernelCore.cpp CMakeLists.txt, llvm/docs ReleaseNotes.md

Revert "[lldb][Process/FreeBSDKernelCore] Implement DoWriteMemory()" (#183485)

Reverts llvm/llvm-project#183237

This was landed without addressing review comments.
Delta File
+2 -69 lldb/source/Plugins/Process/FreeBSD-Kernel-Core/ProcessFreeBSDKernelCore.cpp
+0 -12 lldb/source/Plugins/Process/FreeBSD-Kernel-Core/CMakeLists.txt
+0 -8 lldb/source/Plugins/Process/FreeBSD-Kernel-Core/ProcessFreeBSDKernelCoreProperties.td
+0 -6 lldb/source/Plugins/Process/FreeBSD-Kernel-Core/ProcessFreeBSDKernelCore.h
+0 -2 llvm/docs/ReleaseNotes.md
+2 -97 5 files