LLVM/project 84812fd llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

[NFC][DAG] scalarizeExtractedBinOp - pull out constant build vector detection into isExtractFree helper (#197155)

Prep work for #196493
Delta File
+6 -5 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+6 -5 (1 file)

LLVM/project 783cb86 llvm/lib/Transforms/InstCombine InstCombineAndOrXor.cpp

[NFC][InstCombine] fix duplicate CreateNot in ((A^C)^B) & (B^A) fold
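
For reference, the expression in the title simplifies to `(A ^ B) & ~C`, since with `D = A ^ B` it reads `(D ^ C) & D`. The sketch below only checks that bitwise identity exhaustively on 8-bit values. It is not the InstCombine code, the helper names `original`, `simplified` and `identityHolds` are hypothetical, and the assumption that this is the form the fold produces is mine rather than something stated in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side exactly as written in the commit title (8-bit toy width).
unsigned original(unsigned A, unsigned B, unsigned C) {
  return (((A ^ C) ^ B) & (B ^ A)) & 0xFFu;
}

// Assumed simplified form: one xor, one not, one and.
unsigned simplified(unsigned A, unsigned B, unsigned C) {
  return ((A ^ B) & ~C) & 0xFFu;
}

// Bitwise operators act on each bit independently, so an exhaustive
// 8-bit check is enough to cover every bit pattern combination.
bool identityHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C)
        if (original(A, B, C) != simplified(A, B, C))
          return false;
  return true;
}
```

The simplified form needs a single `not`, which is consistent with the title's mention of a duplicate `CreateNot`.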
Delta File
+1 -1 llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+1 -1 (1 file)

LLVM/project e8f4a57 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

save one knownbits computation
Delta File
+7 -4 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+7 -4 (1 file)

LLVM/project ab87cb0 llvm/test/CodeGen/X86 vector-reduce-add-mask.ll vector-reduce-add-sext.ll

[X86] vector-reduce-add-*.ll - add 32-bit test coverage (#197152)
Delta File
+1,646 -808 llvm/test/CodeGen/X86/vector-reduce-add-mask.ll
+1,463 -870 llvm/test/CodeGen/X86/vector-reduce-add-sext.ll
+1,188 -489 llvm/test/CodeGen/X86/vector-reduce-add.ll
+441 -194 llvm/test/CodeGen/X86/vector-reduce-add-zext.ll
+4,738 -2,361 (4 files)

LLVM/project 06615e0 llvm/test/CodeGen/X86 vector-reduce-ctpop.ll

[X86] vector-reduce-ctpop.ll - add 32-bit test coverage (#197149)
Delta File
+4,686 -918 llvm/test/CodeGen/X86/vector-reduce-ctpop.ll
+4,686 -918 (1 file)

LLVM/project a464577 llvm/lib/IR Instructions.cpp

[IR] Preserve samesign when cloning ICmpInst (#197118)

Clone should preserve IR flags faithfully.
Delta File
+3 -1 llvm/lib/IR/Instructions.cpp
+3 -1 (1 file)

LLVM/project 41e493a llvm/lib/CodeGen InlineSpiller.cpp, llvm/test/CodeGen/AMDGPU remat-through-copy.mir

[RegAlloc] Trace through COPYs to find rematerializable definitions (#190955)

After live range splitting, successful rematerialization in one split
interval can remove the original defining instruction, leaving only COPY
instructions in other split intervals. When attempting to rematerialize
uses in those intervals, the code fails to find the original definition
and gives up.

This patch traces backwards through COPY chains to recover the original
rematerializable definition instead of giving up.
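
The chain-walking idea can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyInst`, `DefMap` and `traceToRematDef` are hypothetical names, each register has exactly one recorded definition, and nothing here models live intervals or how the patch recovers a definition that has already been deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>

// Hypothetical toy instruction: either a COPY from another virtual register
// or a defining instruction that may or may not be rematerializable.
struct ToyInst {
  bool IsCopy;
  unsigned SrcReg;       // meaningful only when IsCopy is true
  bool Rematerializable; // meaningful only when IsCopy is false
};

// Maps a register number to the single instruction that defines it.
using DefMap = std::unordered_map<unsigned, ToyInst>;

// Follows COPY sources starting at Reg until a non-COPY definition is found.
// Returns the register defined by a rematerializable instruction, or nullopt
// if the chain ends in an unknown or non-rematerializable definition.
std::optional<unsigned> traceToRematDef(const DefMap &Defs, unsigned Reg) {
  // The step bound guards against malformed (cyclic) COPY chains.
  for (std::size_t Step = 0; Step <= Defs.size(); ++Step) {
    auto It = Defs.find(Reg);
    if (It == Defs.end())
      return std::nullopt;
    const ToyInst &MI = It->second;
    if (!MI.IsCopy) {
      if (MI.Rematerializable)
        return Reg;
      return std::nullopt;
    }
    Reg = MI.SrcReg; // step backwards through the COPY
  }
  return std::nullopt;
}
```

With definitions `%1 = <remat def>`, `%2 = COPY %1`, `%3 = COPY %2`, a query for `%3` walks back to `%1` instead of stopping at the first COPY.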
Delta File
+82 -0 llvm/test/CodeGen/AMDGPU/remat-through-copy.mir
+39 -4 llvm/lib/CodeGen/InlineSpiller.cpp
+8 -30 llvm/test/CodeGen/RISCV/rvv/remat.ll
+129 -34 (3 files)

LLVM/project de0a051 mlir/docs Tokens.md LangRef.md, mlir/include/mlir/IR CommonTypeConstraints.td

address comments
Delta File
+17 -21 mlir/docs/Tokens.md
+2 -17 mlir/test/IR/token-type.mlir
+1 -8 mlir/include/mlir/IR/CommonTypeConstraints.td
+0 -7 mlir/test/lib/Dialect/Test/TestOps.td
+2 -1 mlir/docs/LangRef.md
+22 -54 (5 files)

LLVM/project c2d51a2 llvm/lib/Transforms/Vectorize VPlanAnalysis.cpp VPlan.h

[VPlan] Add Type* and getType() to VPSymbolicValue (NFC) (#195183)

Add a Type* field to VPSymbolicValue, along with a getType() method to
query the stored scalar type.

This makes it easier to retrieve the type of various symbolic values,
and also simplifies VPTypeAnalysis construction.

PR: https://github.com/llvm/llvm-project/pull/195183
Delta File
+3 -23 llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+16 -8 llvm/lib/Transforms/Vectorize/VPlan.h
+7 -4 llvm/lib/Transforms/Vectorize/VPlan.cpp
+7 -1 llvm/lib/Transforms/Vectorize/VPlanValue.h
+2 -6 llvm/lib/Transforms/Vectorize/VPlanAnalysis.h
+4 -3 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+39 -45 (5 files not shown)
+45 -50 (11 files)

LLVM/project 0b53aa8 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

save some knownbits computations

Co-authored-by: Copilot <copilot at github.com>
Delta File
+9 -5 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+9 -5 (1 file)

LLVM/project 9bb1933 clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add lowering for amdgcn readlane readfirstlane builtins
Delta File
+16 -0 clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+13 -2 clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+29 -2 (2 files)

LLVM/project 215bd25 clang/lib/StaticAnalyzer/Core ExprEngine.cpp

[analyzer] Clean up evalBind, fix bad logic (#196313)

This commit refactors `ExprEngine::evalBind` to eliminate the use of a
`NodeBuilder` and fix incorrect logic that was apparently introduced
because the `NodeBuilder` had obfuscated the underlying set operations.

In the special case when the engine is binding to an `Unknown` or
`Undefined` memory location, with the old code on each execution path
_either_ only the `check::Bind` checkers _or_ only the pointer escape
checkers were invoked. This commit ensures that on each execution path
_both_ the `check::Bind` checkers _and then_ the pointer escape checkers
get a chance to activate.

I'm pretty sure that the bad logic did not cause incorrect behavior of
the analyzer, because there are no `checkBind` checkers that generate
non-sink transitions when the location is `Unknown` or `Undefined`.

I also added an assertion that the location argument of `evalBind`
cannot be a `NonLoc`, because this is a common sense precondition, seems
to be actually true and makes it easier to reason about the behavior of
this function.
Delta File
+19 -33 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+19 -33 (1 file)

LLVM/project 26cae62 lldb/source/Host/common NativeProcessProtocol.cpp, lldb/source/Plugins/Process/FreeBSD NativeProcessFreeBSD.cpp

Reapply "[lldb] Do not refcount breakpoints in lldb-server" (#195858) (#196891)

This reapplies #195858 with a fix for 32-bit arm (and generally, any
architecture that uses software single-stepping). The problem was that
the temporary breakpoints used for single-stepping were interfering with
the breakpoints set by the client.

The fix is to check for existing breakpoints before setting the
temporary ones. To achieve this, I've separated the notion of "next PC
candidates for a thread" from "step breakpoints we've actually set".
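
The separation described above can be sketched with a toy model. This is not the lldb-server code: `ToyProcess`, `setupStep`, `finishStep` and `hasBreakpointAt` are hypothetical names, and breakpoints are reduced to plain sets of addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy model of a debugged process.
struct ToyProcess {
  std::set<uint64_t> ClientBreakpoints; // set at the client's request
  std::set<uint64_t> StepBreakpoints;   // temporary ones actually inserted

  // NextPCCandidates is every address the thread might execute next.
  // A temporary breakpoint is inserted only where the client has none.
  void setupStep(const std::vector<uint64_t> &NextPCCandidates) {
    for (uint64_t PC : NextPCCandidates)
      if (ClientBreakpoints.count(PC) == 0)
        StepBreakpoints.insert(PC);
  }

  // After the step, only the temporary breakpoints are removed, so the
  // client's breakpoints are never touched by single-stepping.
  void finishStep() { StepBreakpoints.clear(); }

  bool hasBreakpointAt(uint64_t PC) const {
    return ClientBreakpoints.count(PC) != 0 || StepBreakpoints.count(PC) != 0;
  }
};
```

Because the candidate list and the inserted set are tracked separately, a candidate that coincides with a client breakpoint is still a valid stop location but is never removed when the step finishes.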

The freebsd code had some software single stepping code, but:
- this was [introduced](https://reviews.llvm.org/D95802) for mips64
support, which was
[removed](https://github.com/llvm/llvm-project/pull/179582) earlier this
year
- AFAICT, this never worked since the original patch only checked
`m_threads_stepping_with_breakpoint`, but never set it to anything.


    [18 lines not shown]
Delta File
+19 -21 lldb/source/Plugins/Process/Utility/NativeProcessSoftwareSingleStep.cpp
+1 -16 lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
+4 -10 lldb/source/Host/common/NativeProcessProtocol.cpp
+11 -2 lldb/test/API/functionalities/multi-breakpoint/TestMultiBreakpoint.py
+6 -7 lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+5 -2 lldb/source/Plugins/Process/Utility/NativeProcessSoftwareSingleStep.h
+46 -58 (2 files not shown)
+51 -61 (8 files)

LLVM/project b1fe10d llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

1
Delta File
+0 -1 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+0 -1 (1 file)

LLVM/project 53b4b27 llvm/test/CodeGen/X86 fold-int-pow2-with-fmul-or-fdiv.ll

update test
Delta File
+16 -53 llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll
+16 -53 (1 file)

LLVM/project 6526bb7 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/X86 fold-int-pow2-with-fmul-or-fdiv.ll

[DAGCombiner] Use KnownBits in `combineFMulOrFDivWithIntPow2`
Delta File
+191 -0 llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll
+2 -3 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+193 -3 (2 files)

LLVM/project f7f911f libc/include CMakeLists.txt, libc/include/llvm-libc-types in_port_t.h CMakeLists.txt

[libc] Add some types to netinet/in.h (#196932)

Not including more types because I need to fix in_addr definition first.

This exposes stdint macros and types through the header, but POSIX
permits that behavior (and explicitly requires that we define uint8_t
and uint32_t).

No test as this is just adding a typedef, and I don't *think* we have
tests for that, but I can add a "check that type is defined" test if
that is desirable.
Delta File
+16 -0 libc/include/llvm-libc-types/in_port_t.h
+4 -1 libc/include/CMakeLists.txt
+4 -1 libc/include/netinet/in.yaml
+1 -0 libc/include/llvm-libc-types/CMakeLists.txt
+25 -2 (4 files)

LLVM/project 694fc0e llvm/test/CodeGen/AArch64/GlobalISel select-intrinsic-aarch64-sdiv.mir preselect-process-phis.mir

Fix tests
Delta File
+4 -22 llvm/test/CodeGen/AArch64/GlobalISel/select-intrinsic-aarch64-sdiv.mir
+2 -4 llvm/test/CodeGen/AArch64/GlobalISel/preselect-process-phis.mir
+6 -26 (2 files)

LLVM/project 65a206f llvm/test/CodeGen/X86 fold-int-pow2-with-fmul-or-fdiv.ll

[X86] fold-int-pow2-with-fmul-or-fdiv.ll - regenerate with (V)PADD asm comments (#197137)

Reduce diff in #197097
Delta File
+11 -11 llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll
+11 -11 (1 file)

LLVM/project 8393227 mlir/lib/Transforms Mem2Reg.cpp, mlir/test/Transforms mem2reg.mlir

fix handling of region
Delta File
+23 -0 mlir/test/Transforms/mem2reg.mlir
+3 -3 mlir/lib/Transforms/Mem2Reg.cpp
+26 -3 (2 files)

LLVM/project e08e48d llvm/test/CodeGen/AArch64/GlobalISel select-intrinsic-aarch64-sdiv.mir, llvm/test/TableGen/GlobalISelEmitter MatchTableOptimizerRecursion.td

[GlobalISel] Recursively Optimise MatchTable Matchers

The core of this change is the additional call to `Matcher::optimize()` in the `optimizeRules` function,
which enables the match table optimization logic to recurse on the children of every GroupMatcher, forming
additional groups (which hoist more common predicates into a shared group).

To enable that, I had to update the `getFirstConditionAsRootType` implementation to support `GroupMatcher`.
I also included a small refactoring of the match table optimization pipeline that was identical between the
GlobalISel and GlobalISelCombiner emitters.

The results of this change are up to a 25% size reduction for GlobalISel match tables.
There is a tiny increase (a few bytes) in a combiner table because we now create new groups
(which need up to 3 additional opcodes because of the new `Try` and `Reject` required) to hoist one predicate for only 2 rules, which
results in a small net negative change (one or two more ops).
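
To illustrate why recursing into groups can shrink a table, here is a toy count of emitted predicate checks. It is a sketch under loud assumptions: a rule is modeled as an ordered list of predicate names, only adjacent rules with the same leading predicate are grouped, and the `Try`/`Reject` overhead mentioned above is not modeled. `Rule` and `countChecks` are hypothetical names, not TableGen APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy rule: an ordered list of predicate names.
using Rule = std::vector<std::string>;

// Counts predicate checks emitted when adjacent rules that share their first
// predicate are grouped and that predicate is hoisted (emitted once).
// With Recurse set, the same grouping is applied again to the rule tails.
std::size_t countChecks(const std::vector<Rule> &Rules, bool Recurse) {
  std::size_t Total = 0;
  for (std::size_t I = 0; I < Rules.size();) {
    if (Rules[I].empty()) { // nothing left to check for this rule
      ++I;
      continue;
    }
    // Collect the run of adjacent rules with the same first predicate.
    std::vector<Rule> Tails;
    std::size_t J = I;
    while (J < Rules.size() && !Rules[J].empty() &&
           Rules[J][0] == Rules[I][0]) {
      Tails.emplace_back(Rules[J].begin() + 1, Rules[J].end());
      ++J;
    }
    Total += 1; // the shared predicate is emitted once for the group
    if (Recurse) {
      Total += countChecks(Tails, true);
    } else {
      for (const Rule &Tail : Tails)
        Total += Tail.size();
    }
    I = J;
  }
  return Total;
}
```

For the rules `{A,B,C}`, `{A,B,D}`, `{A,E}`, a flat table has 8 checks, one level of grouping has 6, and recursive grouping has 5 because `B` is also hoisted inside the `A` group.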

I used a small bash script to compare all relevant files, this is the before/after:
```
FILE                                          OLD      NEW    DIFF%    SAME?
----                                      -------  -------    -----    -----
```

    [8 lines not shown]
Delta File
+204 -0 llvm/test/TableGen/GlobalISelEmitter/MatchTableOptimizerRecursion.td
+67 -19 llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+5 -34 llvm/utils/TableGen/GlobalISelEmitter.cpp
+1 -34 llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+12 -7 llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h
+18 -0 llvm/test/CodeGen/AArch64/GlobalISel/select-intrinsic-aarch64-sdiv.mir
+307 -94 (1 file not shown)
+310 -94 (7 files)

LLVM/project d176a1e llvm/lib/CodeGen/GlobalISel InstructionSelect.cpp, llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp

[GlobalISel][AMDGPU][AArch64] Fix GlobalISel copy propagation (#188781)

Disallow propagation of sub-registers after GlobalISel, as the current
code is blindly dropping any sub-register information. This also fixes
bugs in the AArch64 and AMDGPU back-ends that rely on the incorrect
behavior and would fail with the fix:

* Update `selectG_UNMERGE_VALUES` in AMDGPU so instead of generating
`hi16` for SGPR it shifts higher bits into the destination register
using `lshr`.
* Prevent AArch64 back-end from generating spurious `sub_32:gpr32all`
when selecting copy.
* Test changes: `fpto[s/u]i-sat-vector.ll`: The correct number of
conversions is now generated as higher 16-bits are handled correctly;
however, it introduces `lshr` instructions. This should be resolved in
#188287 by enabling `s_cvt_hi_*`.
Delta File
+144 -55 llvm/test/CodeGen/AMDGPU/fptosi-sat-vector.ll
+128 -51 llvm/test/CodeGen/AMDGPU/fptoui-sat-vector.ll
+9 -7 llvm/lib/CodeGen/GlobalISel/InstructionSelect.cpp
+10 -2 llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+7 -0 llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+298 -115 (5 files)

LLVM/project 9346acd llvm/lib/TableGen TGParser.cpp, llvm/test/TableGen submulticlass-leteq.td submulticlass-typecheck.td

[TableGen] Add submulticlass typechecking to template arg values (#197128)

Some typechecking was missing when parsing a submulticlass reference.
Add the CheckTemplateArgValues call in ParseSubMultiClassReference.

Resolves https://github.com/llvm/llvm-project/issues/84910.
Delta File
+21 -0 llvm/test/TableGen/submulticlass-leteq.td
+12 -0 llvm/test/TableGen/submulticlass-typecheck.td
+5 -0 llvm/lib/TableGen/TGParser.cpp
+38 -0 (3 files)

LLVM/project 2ec483d clang/test/CodeGenOpenCL builtins-amdgcn-gfx1250.cl, llvm/include/llvm/IR IntrinsicsAMDGPU.td

[AMDGPU] Update permlane_bcast/down/up/xor intrinsic to support more types
Delta File
+2,848 -0 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.gfx1250.ll
+25 -8 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+12 -12 llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+9 -9 clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+16 -2 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+8 -6 llvm/lib/Target/AMDGPU/VOP3Instructions.td
+2,918 -37 (3 files not shown)
+2,938 -42 (9 files)

LLVM/project bfe5d5b clang/include/clang/Analysis/Analyses/LifetimeSafety LifetimeSafety.h, clang/include/clang/Basic DiagnosticSemaKinds.td

[LifetimeSafety] Diagnose invalidated-field (#196680)

Teach lifetime safety invalidation diagnostics to handle origins that
escape through fields before the referenced object is invalidated.
Previously they were skipped.

Partially addresses https://github.com/llvm/llvm-project/issues/195706
Delta File
+60 -0 clang/test/Sema/warn-lifetime-safety-invalidations.cpp
+32 -0 clang/lib/Sema/SemaLifetimeSafety.h
+20 -1 clang/lib/Analysis/LifetimeSafety/Checker.cpp
+6 -0 clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+4 -0 clang/include/clang/Basic/DiagnosticSemaKinds.td
+122 -1 (5 files)

LLVM/project d611791 llvm/include/llvm/IR Function.h InstructionListener.h, llvm/lib/IR Function.cpp Value.cpp

refactoring
Delta File
+15 -9 llvm/include/llvm/IR/Function.h
+11 -4 llvm/lib/IR/Function.cpp
+6 -2 llvm/include/llvm/IR/InstructionListener.h
+1 -3 llvm/lib/IR/Value.cpp
+33 -18 (4 files)

LLVM/project b4aa4d4 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-skx-insert-subvec.ll

[X86] combineINSERT_SUBVECTOR - only fold vXi1 zero-widening if scalar mask source has one use (#197125)

Fixes infinite loop reported on #192699
Delta File
+21 -0 llvm/test/CodeGen/X86/avx512-skx-insert-subvec.ll
+2 -2 llvm/lib/Target/X86/X86ISelLowering.cpp
+23 -2 (2 files)

LLVM/project fccd6fa mlir/lib/Interfaces ControlFlowInterfaces.cpp

[mlir][Interfaces] Disallow tokens in `BranchOpInterface` / `RegionBranchOpInterface` verifiers
Delta File
+14 -0 mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+14 -0 (1 file)

LLVM/project 9ca55c0 llvm/lib/Transforms/InstCombine InstCombineAddSub.cpp, llvm/test/Transforms/InstCombine sub-xor.ll

[InstCombine] Relax the requirements for (X ^ C2) + C -> (C2 + C) - X (#196897)

If (C2 - X) has no borrow between bits, it is equivalent to (X ^ C2).
A borrow would occur when c2_bit=0 and x_bit=1.
It follows that c2_bit=1 or x_bit=0 means no borrow.

Remove an artificial condition that C2 must be a low bits mask.

Proof: https://alive2.llvm.org/ce/z/uNMsg_
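
The no-borrow argument can also be checked by brute force on 8-bit values. The sketch below is not the InstCombine implementation: `noBorrow` and `foldHoldsFor` are hypothetical helper names, and the precondition is tested on concrete values of X rather than on known bits.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position has c2_bit = 0 and x_bit = 1,
// i.e. when C2 - X cannot borrow between bits (8-bit toy width).
bool noBorrow(unsigned X, unsigned C2) { return (X & ~C2 & 0xFFu) == 0; }

// For a fixed C, checks (X ^ C2) + C == (C2 + C) - X modulo 2^8
// for every X and C2 that satisfy the no-borrow condition.
bool foldHoldsFor(uint8_t C) {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C2 = 0; C2 < 256; ++C2) {
      if (!noBorrow(X, C2))
        continue;
      uint8_t Lhs = static_cast<uint8_t>((X ^ C2) + C);
      uint8_t Rhs = static_cast<uint8_t>((C2 + C) - X);
      if (Lhs != Rhs)
        return false;
    }
  return true;
}
```

Note that `C2 = 0xA0` with `X = 0x20` satisfies the condition even though `0xA0` is not a low-bits mask: `0x20 ^ 0xA0` and `0xA0 - 0x20` are both `0x80`.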
Delta File
+78 -0 llvm/test/Transforms/InstCombine/sub-xor.ll
+6 -7 llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+84 -7 (2 files)

LLVM/project 87cba87 clang/test/AST ast-dump-templates.cpp, llvm/test/CodeGen/AArch64 bf16-v8-instructions.ll

Merge branch 'main' into users/jmmartinez/error_on_prune
Delta File
+652 -9,305 clang/test/AST/ast-dump-templates.cpp
+5,061 -4,162 llvm/test/CodeGen/Thumb2/mve-clmul.ll
+7,601 -671 llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+8,195 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3.s
+8,182 -0 llvm/test/MC/AMDGPU/gfx13_asm_vop3-fake16.s
+6,873 -0 llvm/test/tools/llvm-mca/AArch64/Cortex/C1Premium-sve-instructions.s
+36,564 -14,138 (10,621 files not shown)
+527,596 -229,596 (10,627 files)