LLVM/project f42072ellvm/include/llvm/Support KnownBits.h, llvm/lib/Analysis ValueTracking.cpp

[Analysis] Add `KnownBits` optimization for `pdep` and `pext` (#204223)

Fixes #204136
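
For readers unfamiliar with the two operations, here is a minimal portable sketch of the usual bit-extract / bit-deposit semantics. The names `pextRef` and `pdepRef` are hypothetical and this is not the code added by the patch; it only illustrates the kind of facts a known-bits analysis can exploit.

```cpp
#include <cassert>
#include <cstdint>

// Parallel bit extract: gather the bits of Src selected by Mask and pack
// them into the low bits of the result.
uint64_t pextRef(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  unsigned Out = 0; // next free low bit in the result
  for (unsigned I = 0; I < 64; ++I) {
    if ((Mask >> I) & 1) {
      Result |= ((Src >> I) & 1) << Out;
      ++Out;
    }
  }
  return Result;
}

// Parallel bit deposit: scatter the low bits of Src to the positions
// selected by Mask; every other result bit is zero.
uint64_t pdepRef(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  unsigned In = 0; // next low bit of Src to consume
  for (unsigned I = 0; I < 64; ++I) {
    if ((Mask >> I) & 1) {
      Result |= ((Src >> In) & 1) << I;
      ++In;
    }
  }
  return Result;
}
```

Two properties follow directly from these loops: a `pdep` result never has a bit set outside the mask, and a `pext` result never has a bit set at or above position popcount(mask). The rules the patch actually implements are not shown in this digest.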
DeltaFile
+91-0llvm/test/Analysis/ValueTracking/knownbits-pext.ll
+89-0llvm/test/Analysis/ValueTracking/knownbits-pdep.ll
+65-0llvm/lib/Support/KnownBits.cpp
+3-9llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+10-0llvm/lib/Analysis/ValueTracking.cpp
+6-0llvm/include/llvm/Support/KnownBits.h
+264-91 files not shown
+266-97 files

LLVM/project 7376a70compiler-rt/lib/tsan/rtl tsan_platform.h

[tsan] fit Go/s390x mapping under QEMU (#204503)

QEMU linux-user first tries guest_base=0. In that identity-mapped mode,
fixed guest mappings use the same host addresses. On an x86-64 host
with four-level page tables, the Go/s390x meta shadow starts at
144 TiB, beyond the 128 TiB userspace limit, and its mmap fails with
ENOMEM during TSan initialization.

Move the meta shadow down by 32 TiB to
[0x700000000000, 0x780000000000), restoring the 16 TiB gap after the
shadow and placing all Go/s390x TSan regions below 2^47. Correct the
mapping comment's shadow size and ratio.
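
The address arithmetic above can be checked with a few compile-time assertions. This is a sketch only: the constant and function names are hypothetical and are not the identifiers used in `tsan_platform.h`.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t TiB = 1ull << 40;

// Values taken from the commit message; the names are illustrative.
constexpr uint64_t kOldMetaStart = 144 * TiB;          // old meta shadow start
constexpr uint64_t kUserLimit = 128 * TiB;             // four-level paging limit
constexpr uint64_t kNewMetaStart = 0x700000000000ull;  // 112 TiB
constexpr uint64_t kNewMetaEnd = 0x780000000000ull;    // 120 TiB

static_assert(kUserLimit == (1ull << 47), "128 TiB is 2^47");
static_assert(kOldMetaStart > kUserLimit, "old start is above the limit");
static_assert(kOldMetaStart - 32 * TiB == kNewMetaStart, "moved down 32 TiB");
static_assert(kNewMetaEnd <= kUserLimit, "new range fits below 2^47");

// True if the half-open range [Start, End) lies entirely below the limit.
bool fitsBelowUserLimit(uint64_t Start, uint64_t End) {
  assert(Start < End && "range must be non-empty");
  return End <= kUserLimit;
}
```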

Failure report and native s390x comparison:
https://github.com/golang/go/issues/67881

QEMU identity guest-base selection:

https://github.com/qemu/qemu/blob/v10.2.3/linux-user/elfload.c#L1036-L1042

    [9 lines not shown]
DeltaFile
+8-5compiler-rt/lib/tsan/rtl/tsan_platform.h
+8-51 files

LLVM/project 2978e2fllvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

Merge branch 'main' into users/ikudrin/clang-findallocationfunction-simplify
DeltaFile
+203-329llvm/test/CodeGen/X86/atomic-load-store.ll
+214-266llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+366-0llvm/test/tools/llvm-objcopy/MachO/linkedit-alignment.test
+241-0llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
+232-0llvm/test/Transforms/VectorCombine/X86/shuffle-chain-reduction-subvector.ll
+182-2llvm/test/Transforms/InstCombine/or.ll
+1,438-597120 files not shown
+4,268-1,755126 files

LLVM/project 5066d3aclang/include/clang/Sema Sema.h, clang/lib/Sema SemaExprCXX.cpp SemaOverload.cpp

fixup! Streamline overload resolution
DeltaFile
+202-175clang/lib/Sema/SemaExprCXX.cpp
+2-2clang/include/clang/Sema/Sema.h
+1-1clang/lib/Sema/SemaOverload.cpp
+205-1783 files

LLVM/project 9d6c686orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Sink Session::sendWrapperResult into Session.cpp. NFC. (#204956)

This function is never called inline (except by Session::wrapperReturn,
which is also in Session.cpp), so there's no need for it to be in the
header.
DeltaFile
+7-0orc-rt/lib/executor/Session.cpp
+1-6orc-rt/include/orc-rt/Session.h
+8-62 files

LLVM/project e1f65fallvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG convergent-loop-header.ll

[SimplifyCFG] Avoid threading loop-header branches in convergent functions

SimplifyCFG can fold a conditional branch when the condition is known from
a predecessor. When the destination is a loop header in a convergent function,
this can change the dynamic convergence structure of the loop even though the
scalar CFG rewrite is otherwise valid.

Skip this fold for loop-header branches in convergent functions so convergent
control flow is preserved.

Fixes ROCM-26496.
DeltaFile
+6-4llvm/test/Transforms/SimplifyCFG/convergent-loop-header.ll
+4-1llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+10-52 files

LLVM/project 0cddd5fllvm/test/Transforms/SimplifyCFG convergent-loop-header.ll

[NFC] Pre-commit a test case for a SimplifyCFG issue
DeltaFile
+50-0llvm/test/Transforms/SimplifyCFG/convergent-loop-header.ll
+50-01 files

LLVM/project ec56065.github/workflows new-prs.yml

workflows/new-prs: Remove obsolete code (#204955)

This was left over after 57e4352de0d2617bae1656dc2e2b3ca430e83c4c and
was causing the jobs to fail.
DeltaFile
+0-1.github/workflows/new-prs.yml
+0-11 files

LLVM/project afac572clang/test CMakeLists.txt

[clang] Add clang-format-check-format instead to CLANG_TEST_DEPS (#204908)

Ensure that clang-format doesn't break the existing format of its own
source.

Reverts #199169 and #199638.
DeltaFile
+1-5clang/test/CMakeLists.txt
+1-51 files

LLVM/project 61d601ellvm/lib/Target/AMDGPU GCNVOPDUtils.cpp

[AMDGPU][VOPD] Cache load reachability checks in VOPDpairing (#204854)

#201930 causes a significant compile-time regression when building
ROCm mathlibs.

Major regressions are caused by repeated queries to `DAG->IsReachable`
to detect possible scalarisation of loads when fusing a pair of
VOPD-capable instructions.
This patch caches the set of reachable loads for every potentially
hazardous load instruction to avoid the need to invoke
`DAG->IsReachable` at all.
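
The general idea of trading repeated reachability queries for a cached set can be sketched as follows. This is a generic illustration over a plain successor list, not the data structures used in `GCNVOPDUtils.cpp`; `ReachabilityCache` and its members are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Caches, per start node, the set of nodes reachable through successor edges.
struct ReachabilityCache {
  const std::vector<std::vector<int>> &Succs;
  std::unordered_map<int, std::unordered_set<int>> Cache;

  explicit ReachabilityCache(const std::vector<std::vector<int>> &S)
      : Succs(S) {}

  // Computes the reachable set once per node; later calls are lookups.
  const std::unordered_set<int> &reachableFrom(int Node) {
    auto It = Cache.find(Node);
    if (It != Cache.end())
      return It->second;
    assert(Node >= 0 && static_cast<std::size_t>(Node) < Succs.size());
    std::unordered_set<int> Seen;
    std::vector<int> Work{Node};
    while (!Work.empty()) {
      int N = Work.back();
      Work.pop_back();
      for (int S : Succs[N])
        if (Seen.insert(S).second)
          Work.push_back(S);
    }
    return Cache.emplace(Node, std::move(Seen)).first->second;
  }

  bool isReachable(int From, int To) {
    return reachableFrom(From).count(To) != 0;
  }
};
```

After the first query for a given start node, each further query is a hash lookup rather than a new graph walk.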
DeltaFile
+74-48llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+74-481 files

LLVM/project 959f069llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Keep split vector atomic store value in a vector register (#201566)

When the value of an ATOMIC_STORE has a vector type whose legalization
action is split (e.g. <4 x half>/<4 x bfloat> on X86 without F16C),
SplitVecOp_ATOMIC_STORE bitcast the value straight to a scalar integer
spanning the memory width. For a split vector that bitcast is expanded
element by element, reassembling the value in GPRs (a long pextrw/shl/or
sequence) before the store.

Instead, keep the value in a vector register when a legal vector form
exists: reinterpret it as a same-shaped integer-element vector (an FP
element type may have no legal vector form, e.g. bfloat on SSE2, while
the integer-of-element-size form does), widen that to a legal vector,
and extract the low integer element of the memory width. This issues the
store directly from a vector register (a single MOVQ/MOVD on X86),
matching the widen-path codegen already produced on AVX targets. Falls
back to the scalar bitcast when no suitable legal vector type exists.

Stacked on top of https://github.com/llvm/llvm-project/pull/197861 and
below #197862.
DeltaFile
+203-329llvm/test/CodeGen/X86/atomic-load-store.ll
+33-6llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+236-3352 files

LLVM/project 3c5f0c2llvm/test/Transforms/LoopVectorize/VPlan/AArch64 vplan-memory-op-decisions.ll

[VPlan] Add memory op decision test for scalarizing loads. (NFC) (#204949)

VPlan printing tests for
https://github.com/llvm/llvm-project/pull/196842
DeltaFile
+175-0llvm/test/Transforms/LoopVectorize/VPlan/AArch64/vplan-memory-op-decisions.ll
+175-01 files

LLVM/project 5502491llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 transform-narrow-interleave-to-widen-memory-with-wide-ops.ll transform-narrow-interleave-to-widen-memory-with-wide-ops-and-casts.ll

[VPlan] Properly check predicates and types in canNarrowOps. (#204948)

Update canNarrowOps to properly check the types of all members match.
Similarly, for recipes with predicates, the predicates must match.
DeltaFile
+241-0llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
+176-0llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops-and-casts.ll
+6-2llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+423-23 files

LLVM/project d0c2776llvm/test/Analysis/BasicAA recphi.ll phi-and-select.ll

[BasicAA] Add additional tests with GEPs with phi/select pointer ops (NFC) (#204947)
DeltaFile
+92-0llvm/test/Analysis/BasicAA/recphi.ll
+27-0llvm/test/Analysis/BasicAA/phi-and-select.ll
+21-0llvm/test/Analysis/BasicAA/phi-aa.ll
+140-03 files

LLVM/project a891d7bllvm/lib/ObjCopy/MachO MachOLayoutBuilder.cpp MachOObjcopy.cpp

[llvm-objcopy][MachO] Use alignToPowerOf2 instead of alignTo (#204033)

During the review of #203680 I noticed that the Mach-O objcopy files seem
to use `alignTo` and import `Alignment.h` to align some offsets to page
boundaries and similar requirements. However, the `alignTo` in
`Alignment.h`, while being intended for powers of 2, requires using an
alignment of type `llvm::Align`, and needs explicit conversion from
`uint64_t` and similar. Since `Alignment.h` includes `MathExtras.h`,
the `alignTo` being invoked ends up being a generic `alignTo` that does
not require powers of 2 and performs divisions and multiplications.
While some of those might be optimized by the compiler into efficient
power of 2 operations, there's an explicit `alignToPowerOf2` version
that is optimized and asserts the alignment is a power of 2 (with
asserts enabled). Since all the alignments should be power of 2 for the
Mach-O binary format, change from `alignTo` to `alignToPowerOf2` to make
the fact more visible (and get the extra safety net of the assertions).
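
The difference between the two flavours can be sketched as follows. The function names are hypothetical stand-ins, not LLVM's implementations; they only contrast a division-based round-up with a mask-based one that asserts its precondition.

```cpp
#include <cassert>
#include <cstdint>

// Generic round-up: valid for any non-zero alignment, but needs a division
// and a multiplication.
uint64_t alignToGeneric(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Power-of-two round-up: a single add and mask, with the precondition
// checked when assertions are enabled.
uint64_t alignToPow2(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of 2");
  return (Value + Align - 1) & ~(Align - 1);
}
```

For power-of-two alignments the two functions agree; the second one additionally documents, and checks in assert-enabled builds, that no other alignment is expected.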

As expected, the test suite of objcopy doesn't show any regressions, but
I have not done a performance benchmark around this either.
DeltaFile
+15-13llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
+4-3llvm/lib/ObjCopy/MachO/MachOObjcopy.cpp
+2-2llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.h
+21-183 files

LLVM/project 18c1cbcllvm/lib/ObjCopy/MachO MachOLayoutBuilder.cpp MachOWriter.cpp, llvm/test/tools/llvm-objcopy/MachO linkedit-alignment.test symbol-table.test

[llvm-objcopy][MachO] Align __LINKEDIT entries to pointer size (#203680)

Align Mach-O __LINKEDIT entries to the target pointer size when building
the tail layout. This matches the behavior of ld64 and lld-macho.

dyld on macOS 27 rejects loading dylibs with misaligned __LINKEDIT
entries.

See #203678 for details and the motivation of this fix.

AI Tool Use Disclosure:

Regarding the PR and the linked issue, I have personally written every
single part of the PR myself, and have run and verified every single part
of the issue report as well, without any AI tool usage.

I have used LLM-based coding agents only for debugging purposes, e.g. to
figure out why the dylib was not loading (from the original bug report),
and figuring out how to build, run, and test my local `llvm-objcopy`.
DeltaFile
+366-0llvm/test/tools/llvm-objcopy/MachO/linkedit-alignment.test
+51-34llvm/lib/ObjCopy/MachO/MachOLayoutBuilder.cpp
+30-12llvm/lib/ObjCopy/MachO/MachOWriter.cpp
+2-2llvm/test/tools/llvm-objcopy/MachO/symbol-table.test
+2-1llvm/test/tools/llvm-objcopy/MachO/linkedit-order-2.test
+2-1llvm/test/tools/llvm-objcopy/MachO/linkedit-order-1.test
+453-506 files

LLVM/project cb85dfellvm/lib/Transforms/Vectorize VPlanUtils.cpp, llvm/test/Transforms/LoopVectorize shl-shift-amount-out-of-range-scev.ll

[VPlan] Skip shl->mul SCEV rewrite for out-of-range shift amounts. (#204921)

getSCEVExprForVPValue rewrites `shl x, c` as `x * (1 << c)` using
ScalarEvolution::getPowerOfTwo, which asserts that the power is less
than the type's bit width.

Only perform the rewrite when the shift amount is less than the
operand's bit width, to avoid the assertion.
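
The shape of the guard can be sketched in isolation. `shlAsMultiplier` is a hypothetical helper, not the code in `VPlanUtils.cpp`; it models bit widths up to 64 with plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the multiplier (1 << ShiftAmt) that makes `shl x, ShiftAmt`
// equal to `x * multiplier` in a BitWidth-bit type, or std::nullopt when
// the shift amount is out of range and no rewrite should be attempted.
std::optional<uint64_t> shlAsMultiplier(unsigned BitWidth, uint64_t ShiftAmt) {
  assert(BitWidth >= 1 && BitWidth <= 64 && "sketch handles up to 64 bits");
  if (ShiftAmt >= BitWidth)
    return std::nullopt; // out of range: skip the rewrite
  return uint64_t{1} << ShiftAmt;
}
```

A caller would only build the multiply form when a value is returned, and otherwise leave the shift alone.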
DeltaFile
+65-0llvm/test/Transforms/LoopVectorize/shl-shift-amount-out-of-range-scev.ll
+5-2llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+70-22 files

LLVM/project b9c334dllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 insertvalue-reordered-operands.ll

[SLP] Fix scheduling crash for reordered insertvalue buildvector nodes

Insertvalue nodes keep scalars in program order but reorder operands, like
stores. Remap the operand lane via ReorderIndices for InsertValueInst (not
just StoreInst) in scheduling and the copyable helpers, fixing the
"Operand not found" assertion.

Fixes https://github.com/llvm/llvm-project/pull/200274#issuecomment-4753792761

Pull Request: https://github.com/llvm/llvm-project/pull/204941
DeltaFile
+62-0llvm/test/Transforms/SLPVectorizer/X86/insertvalue-reordered-operands.ll
+6-5llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+68-52 files

LLVM/project 2df672allvm/include/llvm/Support WithColor.h raw_ostream.h, llvm/utils/FileCheck FileCheck.cpp

[FileCheck] Use default colors in input dumps

This patch makes two improvements to colors used in FileCheck input
dumps:

1. Without this patch, input line numbers and ellipses have a
   foreground color of black, which is hard to see in a terminal with
   a dark color theme.  This patch changes that to the terminal's
   default color.
2. Without this patch, the input text is accidentally set to bold when
   neither `-v` nor `-vv` is specified.  Perhaps I never noticed
   because I tend to always use `-vv`.  This patch changes that to use
   the terminal's default color.

Case 2 exposes a problem with LLVM's color implementation.  Without
this patch, the call to `WithColor`'s constructor actually specifies
bold as `false`, but `WithColor` ignores that when the color is
`SAVEDCOLOR`.  While it seems like that should be fixed, I am
concerned about the impact of such a fix on other tools that might

    [12 lines not shown]
DeltaFile
+14-4llvm/utils/FileCheck/FileCheck.cpp
+4-0llvm/include/llvm/Support/WithColor.h
+2-0llvm/include/llvm/Support/raw_ostream.h
+20-43 files

LLVM/project 6619aa7llvm/lib/Target/AMDGPU AMDGPUBarrierLatency.cpp, llvm/test/CodeGen/AMDGPU fence-barrier-latency.ll llvm.amdgcn.update.dpp.ll

[AMDGPU] Use SchedModel latencies for Fence barrier edges (#204657)

For memory->fence dependencies, this PR sets the latency of the edge to
the instr latency of the predecessor memory instruction.

During lowering of these fences, we insert the necessary waitcnts, and
we end up waiting for any outstanding memory op at these fences. Thus,
the latency of the edges should be based on latency of the associated
load/stores.
DeltaFile
+149-0llvm/test/CodeGen/AMDGPU/fence-barrier-latency.ll
+18-17llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
+9-9llvm/test/CodeGen/AMDGPU/schedule-barrier-latency-gfx9.mir
+5-1llvm/lib/Target/AMDGPU/AMDGPUBarrierLatency.cpp
+181-274 files

LLVM/project 513ea0dllvm/include/llvm/Support WithColor.h raw_ostream.h, llvm/utils/FileCheck FileCheck.cpp

[FileCheck] Use default colors in input dumps

This patch makes two improvements to colors used in FileCheck input
dumps:

1. Without this patch, input line numbers and ellipses have a
   foreground color of black, which is hard to see in a terminal with
   a dark color theme.  This patch changes that to the terminal's
   default color.
2. Without this patch, the input text is accidentally set to bold when
   neither `-v` nor `-vv` is specified.  Perhaps I never noticed
   because I tend to always use `-vv`.  This patch changes that to use
   the terminal's default color.

Case 2 exposes a problem with LLVM's color implementation.  Without
this patch, the call to `WithColor`'s constructor actually specifies
bold as `false`, but `WithColor` ignores that when the color is
`SAVEDCOLOR`.  While it seems like that should be fixed, I am
concerned about the impact of such a fix on other tools that might

    [12 lines not shown]
DeltaFile
+10-4llvm/utils/FileCheck/FileCheck.cpp
+4-0llvm/include/llvm/Support/WithColor.h
+2-0llvm/include/llvm/Support/raw_ostream.h
+16-43 files

LLVM/project cd532fellvm/lib/Analysis LoopCacheAnalysis.cpp, llvm/test/Analysis/LoopCacheAnalysis compute-cost.ll partially-perfect-nest.ll

[LoopCacheAnalysis] Generate tests by update_analyze_test_checks.py (#204807)

Since loop interchange has been enabled in the default pipeline,
development on LoopCacheAnalysis, which is used by LoopInterchange, is
becoming more active. So I think it's a good time to support automatic
test generation for LoopCacheAnalysis.
This patch does two things. First, it changes LoopCachePrinterPass from
a loop pass to a function pass to make it possible to use
update_analyze_test_checks.py. Second, it rewrites all the CHECK
directives in the existing LoopCacheAnalysis tests using the script.
DeltaFile
+41-16llvm/test/Analysis/LoopCacheAnalysis/compute-cost.ll
+24-11llvm/lib/Analysis/LoopCacheAnalysis.cpp
+21-11llvm/test/Analysis/LoopCacheAnalysis/PowerPC/compute-cost.ll
+16-13llvm/test/Analysis/LoopCacheAnalysis/PowerPC/LoopnestFixedSize.ll
+14-11llvm/test/Analysis/LoopCacheAnalysis/partially-perfect-nest.ll
+11-8llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll
+127-7011 files not shown
+189-11817 files

LLVM/project 2c022e8llvm/test/Verifier range-1.ll nofpclass-metadata.ll

[Verifier] Only accept noundef metadata on loads and update metadata tests (#204922)

noundef metadata has been accepted everywhere so far, which seems to be
an oversight. This patch rejects it everywhere except for load
instructions, which seem to be the only ones where it's supposed to be
supported. The other metadata tests are also updated so they are
somewhat similar to each other.
DeltaFile
+0-163llvm/test/Verifier/range-1.ll
+38-84llvm/test/Verifier/nofpclass-metadata.ll
+68-0llvm/test/Verifier/range-metadata.ll
+0-21llvm/test/Verifier/nonnull_metadata.ll
+16-0llvm/test/Verifier/nonnull-metadata.ll
+12-0llvm/test/Verifier/noundef-metadata.ll
+134-26812 files not shown
+161-29018 files

LLVM/project 0c3c664llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine fold-shuffle-chains-to-reduce.ll

[VectorCombine] Add subvector reduction support to foldShuffleChainsToReduce (#199872)

Extends foldShuffleChainsToReduce to recognise subvector reductions
where the chain narrows through shuffles before extracting lane 0.

The matcher tracks per-output-lane attribution as the chain is walked.
Each lane carries a per-source bitmask of contributing source lanes plus
a poison flag. Shuffles permute these records. Binops union them. At the
extract, lane 0's bitmasks rebuild the reduction as one or more partial
reduce intrinsics. The walk is capped at 32 chain nodes.
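
The bookkeeping described above can be sketched for the simplified case of a single source vector with at most 64 lanes. All names here (`LaneInfo`, `identity`, `shuffle`, `binop`) are hypothetical; the real matcher tracks one bitmask per source and has additional checks that are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Attribution record for one output lane.
struct LaneInfo {
  uint64_t SrcLanes = 0; // bit i set: source lane i contributes to this lane
  bool Poison = false;   // lane was fed by a poison element
};
using Lanes = std::vector<LaneInfo>;

// The source vector itself: lane i is attributed to source lane i only.
Lanes identity(unsigned N) {
  assert(N <= 64 && "sketch uses a 64-bit mask");
  Lanes L(N);
  for (unsigned I = 0; I < N; ++I)
    L[I].SrcLanes = uint64_t{1} << I;
  return L;
}

// A shuffle permutes the records; a negative mask index marks poison.
Lanes shuffle(const Lanes &In, const std::vector<int> &Mask) {
  Lanes Out(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0) {
      Out[I].Poison = true;
    } else {
      assert(static_cast<std::size_t>(Mask[I]) < In.size());
      Out[I] = In[Mask[I]];
    }
  }
  return Out;
}

// A binop unions the contributions of its operands, lane by lane.
Lanes binop(const Lanes &A, const Lanes &B) {
  assert(A.size() == B.size());
  Lanes Out(A.size());
  for (std::size_t I = 0; I < A.size(); ++I) {
    Out[I].SrcLanes = A[I].SrcLanes | B[I].SrcLanes;
    Out[I].Poison = A[I].Poison || B[I].Poison;
  }
  return Out;
}
```

Walking a classic four-lane reduction (shuffle the high half down, add, shuffle lane 1 down, add) leaves lane 0 with all four source bits set and no poison, while the other lanes are poisoned and can be ignored at the extract.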

Also added new test file with 11 tests:

| Test | Reason |
| ------------------------------------------------------ | --------------------------------------------- |
| `_add_v4i32`, `_add_v8i16`, `_add_v16i8`, `_add_v64i8` | basic subvector reductions across types/sizes |
| `_mul_v16i8` | non-add reduction |

    [20 lines not shown]
DeltaFile
+214-266llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+232-0llvm/test/Transforms/VectorCombine/X86/shuffle-chain-reduction-subvector.ll
+39-41llvm/test/Transforms/VectorCombine/fold-shuffle-chains-to-reduce.ll
+6-27llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
+20-0llvm/test/Transforms/VectorCombine/X86/shuffle-chain-reduction-umin.ll
+17-0llvm/test/Transforms/VectorCombine/AArch64/partial-reduce-crash.ll
+528-3345 files not shown
+543-37011 files

LLVM/project c888371clang-tools-extra/clangd CompileCommands.cpp

[clangd] Look for resource-dir relative to detected compiler path as a fallback (#203332)

If the standard resource directory (which is searched for relative to the clangd
executable) does not exist, look for one relative to the detected compiler as a
fallback. This handles some packaging schemes where clangd and clang are
installed in different prefixes and the resource directory is only located in the
latter.

Also print an error message to the log if the fallback didn't find an existing
directory either.
DeltaFile
+23-1clang-tools-extra/clangd/CompileCommands.cpp
+23-11 files

LLVM/project bae51e7llvm/lib/IR Instructions.cpp, llvm/test/Transforms/InstCombine alloca-big.ll

[IR] handle oversized constant alloca counts in getAllocationSize (#204540)

AllocaInst::getAllocationSize() unconditionally calls getZExtValue() for
array allocas, which asserts when the constant element count is wider
than 64 bits.

Use tryZExtValue() when reading the constant array size instead. If the
count cannot be represented in uint64_t, return std::nullopt rather than
asserting, matching the existing contract.
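
The "try, then bail out" pattern can be sketched without LLVM types. `WideInt`, `tryZExt`, and `allocationSize` are hypothetical stand-ins for an arbitrary-width integer and its checked conversion; the multiplication overflow check is part of this sketch and is not a claim about what `getAllocationSize` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical wide integer: little-endian sequence of 64-bit words.
using WideInt = std::vector<uint64_t>;

// Returns the value if it fits in 64 bits, std::nullopt otherwise.
std::optional<uint64_t> tryZExt(const WideInt &V) {
  for (std::size_t I = 1; I < V.size(); ++I)
    if (V[I] != 0)
      return std::nullopt; // a high word is set: does not fit
  return V.empty() ? uint64_t{0} : V[0];
}

// Size in bytes of Count elements of ElemSize bytes, or std::nullopt when
// the result cannot be represented.
std::optional<uint64_t> allocationSize(const WideInt &Count,
                                       uint64_t ElemSize) {
  std::optional<uint64_t> N = tryZExt(Count);
  if (!N)
    return std::nullopt; // oversized count: report "unknown", do not assert
  if (ElemSize != 0 && *N > UINT64_MAX / ElemSize)
    return std::nullopt; // product would overflow (sketch-only check)
  return *N * ElemSize;
}
```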

Fixes #203519
DeltaFile
+4-1llvm/lib/IR/Instructions.cpp
+1-0llvm/test/Transforms/InstCombine/alloca-big.ll
+5-12 files

LLVM/project b2c0c48llvm/lib/Transforms/InstCombine InstCombineAndOrXor.cpp, llvm/test/Transforms/InstCombine or.ll add.ll

[InstCombine] Fold or (ashr X, BW-1), zext (icmp ne|sgt X, 0) to scmp(X, 0) (#196828)

Recognize the bitwise signum encoding
  or (ashr X, BW-1), zext (icmp ne  X, 0) --> llvm.scmp(X, 0)
  or (ashr X, BW-1), zext (icmp sgt X, 0) --> llvm.scmp(X, 0)

Alive2: https://alive2.llvm.org/ce/z/UZ7a7Q
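
As an informal complement to the Alive2 proof, both patterns can be checked exhaustively for 8-bit values. The function names are hypothetical, and the arithmetic shift is modelled with a comparison to avoid relying on implementation-defined behaviour of `>>` on negative values.

```cpp
#include <cassert>
#include <cstdint>

// Three-way compare against zero: -1, 0, or 1.
int8_t scmpZero(int8_t X) { return static_cast<int8_t>((X > 0) - (X < 0)); }

// or (ashr X, 7), zext (icmp ne X, 0)
int8_t signumViaOrNe(int8_t X) {
  int8_t Ashr = X < 0 ? -1 : 0; // all sign bits
  int8_t Zext = X != 0 ? 1 : 0;
  return static_cast<int8_t>(Ashr | Zext);
}

// or (ashr X, 7), zext (icmp sgt X, 0)
int8_t signumViaOrSgt(int8_t X) {
  int8_t Ashr = X < 0 ? -1 : 0;
  int8_t Zext = X > 0 ? 1 : 0;
  return static_cast<int8_t>(Ashr | Zext);
}

// Compares both encodings with scmpZero for every 8-bit value.
bool foldsHoldForAllI8() {
  for (int V = -128; V <= 127; ++V) {
    int8_t X = static_cast<int8_t>(V);
    if (signumViaOrNe(X) != scmpZero(X) || signumViaOrSgt(X) != scmpZero(X))
      return false;
  }
  return true;
}

void checkScmpFold() { assert(foldsHoldForAllI8()); }
```

The `ne` form works because for negative X the `ashr` term is already all ones, so OR-ing in the low bit changes nothing.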
DeltaFile
+182-2llvm/test/Transforms/InstCombine/or.ll
+17-0llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+3-11llvm/test/Transforms/InstCombine/add.ll
+6-6llvm/test/Transforms/InstCombine/and-or-icmps.ll
+208-194 files

LLVM/project f6296fbllvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine zext.ll

[InstCombine] Fold zext(and/or/xor(trunc nuw x), y) -> and/or/xor(zext(y), x) (#204927)

proof: https://alive2.llvm.org/ce/z/ZORvJ6
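
The role of the `nuw` flag can be illustrated with a brute-force check of the `or` case for a 16-bit `x` truncated to 8 bits. This sketch assumes the `zext` result type equals the type of `x`; the function names are hypothetical, and `and`/`xor` follow the same pattern.

```cpp
#include <cassert>
#include <cstdint>

// zext(or(trunc x, y)) with x : i16 and y : i8
uint16_t orBefore(uint16_t X, uint8_t Y) {
  uint8_t Narrow = static_cast<uint8_t>(static_cast<uint8_t>(X) | Y);
  return static_cast<uint16_t>(Narrow);
}

// or(zext(y), x)
uint16_t orAfter(uint16_t X, uint8_t Y) {
  return static_cast<uint16_t>(static_cast<uint16_t>(Y) | X);
}

// trunc nuw is only defined when the dropped high bits of x are zero,
// so only x < 256 needs to be checked.
bool orFoldHoldsUnderNuw() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (orBefore(static_cast<uint16_t>(X), static_cast<uint8_t>(Y)) !=
          orAfter(static_cast<uint16_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}

void checkZextFold() { assert(orFoldHoldsUnderNuw()); }
```

Without the flag the two sides can differ: for `x = 0x100` and `y = 0` the original yields 0 while the rewritten form yields 0x100.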
DeltaFile
+104-0llvm/test/Transforms/InstCombine/zext.ll
+9-0llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+113-02 files

LLVM/project 2ec6f28llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine set.ll

[InstCombine] Fold sext(and/or/xor(trunc nsw x), y) -> and/or/xor(sext(y), x) (#204928)

Proof: https://alive2.llvm.org/ce/z/ntVE_8
DeltaFile
+104-0llvm/test/Transforms/InstCombine/set.ll
+9-0llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+113-02 files

LLVM/project e26ff54llvm/lib/Transforms/InstCombine InstCombineCasts.cpp

[InstCombine] Remove fold with OneUse as there is fold without the check (NFC) (#204925)
DeltaFile
+0-5llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+0-51 files