LLVM/project bdb50b0 llvm/lib/Support ThreadPool.cpp

[llvm] Fix comment that references deprecated make_scope_exit (#175820)

After #173131 and #174030, make_scope_exit is no longer used in
ThreadPool. Update the comment that still refers to the old API so that
it refers to the new API instead.
Delta  File
+1 -1  llvm/lib/Support/ThreadPool.cpp
+1 -1  1 files

LLVM/project 4a807e8 llvm/test/Transforms/LoopVectorize single_early_exit_live_outs.ll single-early-exit-interleave.ll, llvm/test/Transforms/PhaseOrdering/AArch64 std-find.ll

[VPlan] Optimize BranchOnTwoConds to chain of 2 simple branches. (#174016)

This patch improves the lowering for BranchOnTwoConds added in
https://github.com/llvm/llvm-project/pull/172750 by replacing the branch
on OR with a chain of 2 branches.

On Apple M cores, the new lowering is ~8-10% faster for std::find-like
loops. It also makes it easier to determine the early exits in VPlan. I
am also planning on extensions to support loops with multiple early
exits and early-exits at different positions, which should also be
slightly easier to do with the new representation.
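
As a rough scalar analogy for the two lowerings described above (this is
not VPlan code or the IR the patch emits; the function names and the
`npos` sentinel are hypothetical), the sketch below writes a
std::find-like loop first with a single exit test on the OR of both
conditions, then as a chain of two simple branches:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel meaning "key not found".
constexpr std::size_t npos = static_cast<std::size_t>(-1);

// One combined exit test: leave the loop when either condition holds, then
// test again after the loop to learn which exit was taken.
inline std::size_t findBranchOnOr(const std::vector<int> &v, int key) {
  std::size_t i = 0;
  bool found = false;
  for (;;) {
    bool done = (i == v.size());
    found = !done && v[i] == key;
    if (found || done)
      break;
    ++i;
  }
  assert(found || i == v.size());
  return found ? i : npos;
}

// Chain of two simple branches: each condition has its own exit, so no
// second test is needed to tell the exits apart.
inline std::size_t findBranchChain(const std::vector<int> &v, int key) {
  for (std::size_t i = 0;; ++i) {
    if (i == v.size())
      return npos;
    if (v[i] == key)
      return i;
  }
}
```

Both functions return the same result for every input; only the shape of
the exit logic differs.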


PR: https://github.com/llvm/llvm-project/pull/174016
Delta  File
+105 -131  llvm/test/Transforms/LoopVectorize/single_early_exit_live_outs.ll
+134 -24  llvm/test/Transforms/PhaseOrdering/AArch64/std-find.ll
+36 -45  llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll
+31 -38  llvm/test/Transforms/LoopVectorize/single_early_exit.ll
+28 -35  llvm/test/Transforms/LoopVectorize/single-early-exit-deref-assumptions.ll
+27 -31  llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll
+361 -304  10 files not shown
+447 -396  16 files

LLVM/project d27d75e llvm/lib/Transforms/Vectorize VPlanTransforms.cpp LoopVectorize.cpp, llvm/unittests/Transforms/Vectorize VPlanUncountableExitTest.cpp VPlanHCFGTest.cpp

[VPlan] Use createHeaderPHIRecipes in native path (NFCI).

Simplify tryToBuildVPlan by using createHeaderPHIRecipes in the native
path as well.
Delta  File
+9 -23  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+7 -6  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+3 -5  llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+2 -4  llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
+1 -2  llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
+22 -40  5 files

LLVM/project 06dc02f clang-tools-extra Maintainers.rst

[clang-tools-extra] Update Maintainers for Clang-Doc

Currently, Erick Velez has been doing the bulk of clang-doc development.
The maintainer being removed hasn't participated in almost a year, so it
would be good to have active maintainers listed in the file.
Delta  File
+2 -2  clang-tools-extra/Maintainers.rst
+2 -2  1 files

LLVM/project 9f464f1 llvm/test/CodeGen/AMDGPU fneg-combines.f16.ll bf16.ll

AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7

Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.
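
To illustrate the encoding described above, here is a minimal sketch that
models a 32-bit register as a `uint32_t` and carries a 16-bit
floating-point bit pattern in its low half. The helper names are
hypothetical, and zero-filling the high half is an assumption of this
sketch; the commit message does not say what the high bits hold.

```cpp
#include <cstdint>

// Hypothetical model: carry a raw f16/bf16 bit pattern in the low half of a
// 32-bit register. The high half is zero-filled here (an assumption).
constexpr std::uint32_t toRegister(std::uint16_t halfBits) {
  return static_cast<std::uint32_t>(halfBits);
}

// Read the 16-bit pattern back out of the low half, ignoring the high half.
constexpr std::uint16_t fromRegister(std::uint32_t reg) {
  return static_cast<std::uint16_t>(reg & 0xFFFFu);
}
```

Because no floating-point conversion takes place, every bit pattern
survives the round trip unchanged, including patterns such as signaling
NaNs that an extend/truncate pair is not required to preserve.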

I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
Delta  File
+372 -419  llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+247 -430  llvm/test/CodeGen/AMDGPU/bf16.ll
+116 -174  llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll
+139 -139  llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+112 -153  llvm/test/CodeGen/AMDGPU/select-fabs-fneg-extract.f16.ll
+140 -114  llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll
+1,126 -1,429  81 files not shown
+3,579 -4,360  87 files

LLVM/project 06a1b06 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

AMDGPU: Change ABI of 16-bit element vectors on gfx6/7

Fix the ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
these would be scalarized and promoted to i32/float.
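
A minimal sketch of the packing described above, modeling a 32-bit
register as a `uint32_t`. The helper names are hypothetical, and placing
element 0 in the low half is an assumption of this sketch rather than
something the commit message states.

```cpp
#include <array>
#include <cstdint>

// Pack a <2 x i16> value into one 32-bit register. Element 0 goes in the low
// half (assumed ordering).
constexpr std::uint32_t packV2I16(std::uint16_t e0, std::uint16_t e1) {
  return static_cast<std::uint32_t>(e0) |
         (static_cast<std::uint32_t>(e1) << 16);
}

// Recover both 16-bit elements from a packed register.
constexpr std::array<std::uint16_t, 2> unpackV2I16(std::uint32_t reg) {
  return {static_cast<std::uint16_t>(reg & 0xFFFFu),
          static_cast<std::uint16_t>(reg >> 16)};
}

// Registers needed for n 16-bit elements: two per register when packed,
// one per register when each element is scalarized and promoted.
constexpr unsigned packedRegisterCount(unsigned n) { return (n + 1) / 2; }
constexpr unsigned scalarizedRegisterCount(unsigned n) { return n; }
```

Under this model a 4-element vector needs 2 registers when packed and 4
when scalarized.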

Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.

This will help with removal of softPromoteHalfType.
Delta  File
+47,697 -51,378  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+14,474 -16,242  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+16,328 -12,881  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+13,036 -14,705  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+11,668 -13,311  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+10,558 -11,908  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+113,761 -120,425  151 files not shown
+200,132 -204,069  157 files

LLVM/project fa1d723 llvm/lib/Support ThreadPool.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta  File
+1 -1  llvm/lib/Support/ThreadPool.cpp
+1 -1  1 files

LLVM/project a80cec9 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-fma.ll

InstCombine: Implement SimplifyDemandedFPClass for fma

This can't do much filtering on the sources, except for nans.
We can also attempt to introduce ninf/nnan.
Delta  File
+65 -14  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+16 -31  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fma.ll
+81 -45  2 files

LLVM/project 57541fe llvm/test/Transforms/InstCombine simplify-demanded-fpclass-fma.ll

InstCombine: Add baseline fma tests for SimplifyDemandedFPClass
Delta  File
+316 -0  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fma.ll
+316 -0  1 files

LLVM/project fe52072 llvm/lib/Analysis ValueTracking.cpp

Fix regression
Delta  File
+1 -5  llvm/lib/Analysis/ValueTracking.cpp
+1 -5  1 files

LLVM/project 55017b8 llvm/lib/Support KnownFPClass.cpp, llvm/test/Transforms/Attributor nofpclass-fma.ll

Can't prove -0 for fma
Delta  File
+24 -24  llvm/test/Transforms/Attributor/nofpclass-fma.ll
+21 -6  llvm/lib/Support/KnownFPClass.cpp
+45 -30  2 files

LLVM/project 4b8b922 llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Analysis ValueTracking.cpp

ValueTracking: Improve handling for fma/fmuladd

The handling for fma was very basic and only handled the
repeated input case. Re-use the fmul and fadd handling for more
accurate sign bit and nan handling.
Delta  File
+44 -44  llvm/test/Transforms/Attributor/nofpclass-fma.ll
+34 -12  llvm/lib/Analysis/ValueTracking.cpp
+13 -0  llvm/lib/Support/KnownFPClass.cpp
+11 -0  llvm/include/llvm/Support/KnownFPClass.h
+102 -56  4 files

LLVM/project b9c6a6e llvm/test/Transforms/Attributor nofpclass-fma.ll

ValueTracking: Add baseline tests for improved fma handling

Improved signbit and not-nan tracking.
Delta  File
+392 -4  llvm/test/Transforms/Attributor/nofpclass-fma.ll
+392 -4  1 files

LLVM/project 29cc6a8 llvm/lib/CodeGen/GlobalISel CallLowering.cpp

GlobalISel: Fix mishandling vector-as-scalar in return values

This fixes 2 cases when the AMDGPU ABI is fixed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not pack values
currently; this is a pre-fix for that change.

Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.

Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.

All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
Delta  File
+24 -2  llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+24 -2  1 files

LLVM/project 50703fa llvm/test/CodeGen/AMDGPU llc-pipeline-npm.ll

[AMDGPU][Test][AIX] use tr instead of sed for line split (#175557)

The test case uses the command `sed 's/,/,\n/g'` to split a line.
That does not work with the AIX system `sed`.

The AIX external buildbot fails; see
https://lab.llvm.org/buildbot/#/builders/64/builds/6911

This change substitutes
`sed 's/,/,\n/g'`
with
`tr ',' '\n'`
Because `tr` does not keep the comma, the text the test looks for also
had to change: the comma `,` is removed from it, since it is not needed
for correctness.
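
The difference between the two commands can be modeled with two small
string functions. This is only an illustration in C++: the function names
are hypothetical, and `likeSed` models the behavior the original `sed`
command was written to produce (a newline inserted after each comma), not
what the AIX `sed` actually does.

```cpp
#include <algorithm>
#include <string>

// Models the intended effect of sed 's/,/,\n/g': keep each comma and start a
// new line after it.
inline std::string likeSed(const std::string &in) {
  std::string out;
  for (char c : in) {
    out += c;
    if (c == ',')
      out += '\n';
  }
  return out;
}

// Models tr ',' '\n': each comma is replaced by a newline, so the comma is
// no longer present in the output.
inline std::string likeTr(std::string s) {
  std::replace(s.begin(), s.end(), ',', '\n');
  return s;
}
```

With the input `a,b,c`, the first form yields lines ending in a comma
while the second yields bare `a`, `b`, `c`, which is why the expected
text had to lose its commas.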

Co-authored-by: Daniel Chen <cdchen at ca.ibm.com>
Delta  File
+418 -418  llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+418 -418  1 files

LLVM/project 6f03d38 bolt/docs BinaryAnalysis.md

WIP: Rework user-facing documentation of BOLT gadget scanner
Delta  File
+322 -78  bolt/docs/BinaryAnalysis.md
+322 -78  1 files

LLVM/project bf5ee06 mlir/lib/IR Remarks.cpp

[mlir] Use bind_front in RemarkEngine. NFC. (#175818)

Switch from C++11 `std::bind` to C++26 `bind_front` backported in
https://github.com/llvm/llvm-project/pull/175056.

The former is an old design that predates lambdas and uses explicit
placeholders. `bind_front` should produce a much smaller object (we only
need one pointer).
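
The size argument above can be sketched as follows. `Engine`,
`makeWithBind`, and `bindFrontConstant` are hypothetical stand-ins, not
the MLIR `RemarkEngine` or the LLVM backport. Passing the member function
as a compile-time template argument is one way to end up storing only the
object pointer; whether the backport works this way is an assumption.

```cpp
#include <functional>
#include <utility>

// Hypothetical class with a member function we want to bind to an object.
struct Engine {
  int calls = 0;
  int report(int severity) {
    ++calls;
    return severity * 2;
  }
};

// std::bind style: the result stores the member-function pointer and the
// object pointer, and needs an explicit placeholder for the open argument.
inline auto makeWithBind(Engine *e) {
  using namespace std::placeholders;
  return std::bind(&Engine::report, e, _1);
}

// bind_front style with the callable as a template argument: only the object
// pointer is captured, and remaining arguments are forwarded.
template <auto Method, typename T> auto bindFrontConstant(T *obj) {
  return [obj](auto &&...args) -> decltype(auto) {
    return (obj->*Method)(std::forward<decltype(args)>(args)...);
  };
}
```

A call site would look like `bindFrontConstant<&Engine::report>(&engine)`.
On common implementations the resulting object is the size of one
pointer, while the `std::bind` result is larger.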
Delta  File
+1 -3  mlir/lib/IR/Remarks.cpp
+1 -3  1 files

LLVM/project c620b47 llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Analysis ValueTracking.cpp

ValueTracking: Account for undef in adjustKnownFPClassForSelectArm (#175372)

This needs to consider undef like the KnownBits case does.
Delta  File
+600 -600  llvm/test/Transforms/Attributor/nofpclass-implied-by-fcmp.ll
+25 -25  llvm/test/Transforms/Attributor/nofpclass-select.ll
+26 -11  llvm/test/Transforms/Attributor/nofpclass.ll
+19 -6  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+12 -2  llvm/include/llvm/Support/KnownFPClass.h
+9 -3  llvm/lib/Analysis/ValueTracking.cpp
+691 -647  1 files not shown
+692 -648  7 files

LLVM/project 0494132 llvm/lib/Target/RISCV RISCVInstrInfoP.td

[RISCV] Add isCommutable to more P extension instructions. (#175722)

Delta  File
+51 -51  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+51 -51  1 files

LLVM/project 7b699cc llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp GCNSchedStrategy.h, llvm/test/CodeGen/AMDGPU machine-scheduler-rematerialization-scoring.mir machine-scheduler-sink-trivial-remats-attr.mir

Revert "[AMDGPU][Scheduler] Scoring system for rematerializations (#175050)" (#175813)

This reverts 8ab79377740789f6a34fc6f04ee321a39ab73724 and
f21e3593371c049380f056a539a1601a843df558 which are causing a HIP failure
in a Blender test.
Delta  File
+290 -503  llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+0 -523  llvm/test/CodeGen/AMDGPU/machine-scheduler-rematerialization-scoring.mir
+194 -194  llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
+35 -242  llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
+50 -208  llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+5 -5  llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
+574 -1,675  1 files not shown
+575 -1,676  7 files

LLVM/project 313a382 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 bit-test.ll switch-cases-to-branch-and.ll

AArch64: Add TBZ/TBNZ matcher for x & (1 << y).

x & (1 << y) is InstCombine's canonical form of a bit test which is
currently code generated literally, missing an opportunity to use TBZ/TBNZ
on bit 0 of x >> y, which generally results in an instruction sequence
that is shorter by 2 instructions. Implement this optimization. On my
machine this results in a 0.05% reduction in clang binary size and a 0.25%
reduction in dynamic instruction count compiling AArch64ISelLowering.cpp.
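
The rewrite relies on the two forms of the bit test agreeing for every
in-range shift amount. A minimal C++ sketch of that equivalence follows;
the function names are hypothetical, and this models only the arithmetic,
not the instruction selection code.

```cpp
#include <cassert>
#include <cstdint>

// Canonical form: build a one-bit mask and AND it with x.
inline bool bitSetMaskForm(std::uint64_t x, unsigned y) {
  assert(y < 64 && "shift amount must be smaller than the bit width");
  return (x & (std::uint64_t{1} << y)) != 0;
}

// Shifted form: move bit y of x down to bit 0 and test bit 0, which is the
// shape a test-bit-and-branch instruction can consume.
inline bool bitSetShiftForm(std::uint64_t x, unsigned y) {
  assert(y < 64 && "shift amount must be smaller than the bit width");
  return ((x >> y) & 1u) != 0;
}
```

Both functions return the same value for any `x` and any `y` below 64.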

Reviewers: davemgreen, fhahn

Reviewed By: davemgreen

Pull Request: https://github.com/llvm/llvm-project/pull/172962
Delta  File
+94 -0  llvm/test/CodeGen/AArch64/bit-test.ll
+39 -26  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+16 -24  llvm/test/CodeGen/AArch64/switch-cases-to-branch-and.ll
+149 -50  3 files

LLVM/project 2f4fb38 clang/lib/Driver/ToolChains Clang.cpp, lld/ELF/Arch X86_64.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta  File
+6 -6  llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+3 -3  lld/ELF/Arch/X86_64.cpp
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+2 -1  llvm/lib/IR/Verifier.cpp
+1 -1  llvm/include/llvm/MC/MCSection.h
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+16 -14  6 files

LLVM/project d0eb856 clang/lib/Driver/ToolChains Clang.cpp, lld/ELF/Arch X86_64.cpp

Format

Created using spr 1.3.6-beta.1
Delta  File
+6 -6  llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+3 -3  lld/ELF/Arch/X86_64.cpp
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+2 -1  llvm/lib/IR/Verifier.cpp
+1 -1  llvm/include/llvm/MC/MCSection.h
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+16 -14  6 files

LLVM/project 86c8002 clang/lib/Driver/ToolChains Clang.cpp, lld/ELF/Arch X86_64.cpp

Format

Created using spr 1.3.6-beta.1
Delta  File
+6 -6  llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
+3 -3  lld/ELF/Arch/X86_64.cpp
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+2 -1  llvm/lib/IR/Verifier.cpp
+1 -1  llvm/include/llvm/MC/MCSection.h
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+16 -14  6 files

LLVM/project 2148e17 clang/lib/Driver/ToolChains Clang.cpp, lld/ELF/Arch X86_64.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta  File
+3 -3  lld/ELF/Arch/X86_64.cpp
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+1 -1  llvm/include/llvm/MC/MCSection.h
+8 -7  4 files

LLVM/project 87653f6 clang/lib/Driver/ToolChains Clang.cpp, lld/ELF/Arch X86_64.cpp

Format

Created using spr 1.3.6-beta.1
Delta  File
+3 -3  lld/ELF/Arch/X86_64.cpp
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCSection.h
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+8 -7  4 files

LLVM/project 3a3b379 clang/lib/Driver/ToolChains Clang.cpp, llvm/include/llvm/MC MCStreamer.h MCSection.h

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta  File
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+1 -1  llvm/include/llvm/MC/MCSection.h
+5 -4  3 files

LLVM/project 4521776 clang/lib/Driver/ToolChains Clang.cpp, llvm/include/llvm/MC MCSection.h MCStreamer.h

Format

Created using spr 1.3.6-beta.1
Delta  File
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCSection.h
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+5 -4  3 files

LLVM/project f5bd814 clang/lib/Driver/ToolChains Clang.cpp, llvm/include/llvm/MC MCStreamer.h MCSection.h

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.6-beta.1

[skip ci]
Delta  File
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+1 -1  llvm/include/llvm/MC/MCSection.h
+5 -4  3 files

LLVM/project f3d6dae clang/lib/Driver/ToolChains Clang.cpp, llvm/include/llvm/MC MCStreamer.h MCSection.h

Format

Created using spr 1.3.6-beta.1
Delta  File
+3 -2  clang/lib/Driver/ToolChains/Clang.cpp
+1 -1  llvm/include/llvm/MC/MCStreamer.h
+1 -1  llvm/include/llvm/MC/MCSection.h
+5 -4  3 files