LLVM/project d3b48cc llvm/utils/gn/secondary/llvm/lib/Transforms/Utils BUILD.gn

[gn build] Port a64928f267f3 (#204997)
Delta  File
+1 -0  llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
+1 -0  1 file

LLVM/project 47fd9ed llvm/utils/gn/secondary/llvm/lib/Target/AArch64 BUILD.gn

[gn build] Port 60a2d437bd04 (#204996)
Delta  File
+1 -0  llvm/utils/gn/secondary/llvm/lib/Target/AArch64/BUILD.gn
+1 -0  1 file

LLVM/project 0e13569 libc/include/llvm-libc-macros math-function-macros.h, libc/test/include iscanonical_test.c CMakeLists.txt

[libc][math] Extend iscanonical macro to _Float16 and float128

iscanonical is a C23 type-generic macro, so the f16/f128 variants are
surfaced through it rather than as functions in the generated math.h.
float128 is only listed when distinct from long double (LDBL_MANT_DIG !=
113) to avoid two _Generic associations with compatible types.
Delta   File
+22 -1  libc/include/llvm-libc-macros/math-function-macros.h
+16 -0  libc/test/include/iscanonical_test.c
+2 -0   libc/test/include/CMakeLists.txt
+40 -1  3 files

LLVM/project f19e3e6 flang/lib/Semantics check-omp-structure.cpp, llvm/include/llvm/Frontend/OpenMP OMP.td

[flang][OpenMP] Move unique clauses to allowedOnceClauses in OMP.td

Many unique clauses were listed in "allowedClauses", which turned off
the single-occurrence check in flang. Move these clauses to the right
category to enable this check.
One exception to this is the IF clause: the IF clause is unique for
all non-compound directives, but is repeatable on compound ones with
the restriction that at most one IF clause can apply to any of the
constituents. This restriction is currently not enforced correctly
in flang, and so the IF clause was left unchanged.
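
A minimal C++ sketch of what a single-occurrence check does with an allowed-once category, for illustration only. The `Clause` enum and `findRepeatedUniqueClauses` are hypothetical; this is not flang's code in check-omp-structure.cpp, and it ignores the compound-directive IF rules described above.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical clause kinds; the real list is generated from OMP.td.
enum class Clause { Collapse, Default, Ordered, Private };

// Returns each clause from `allowedOnce` that appears more than once on a
// directive, i.e. what a single-occurrence check would diagnose. Clauses
// outside `allowedOnce` (plain "allowed" clauses) may repeat freely.
std::vector<Clause> findRepeatedUniqueClauses(
    const std::vector<Clause> &clausesOnDirective,
    const std::set<Clause> &allowedOnce) {
  std::map<Clause, int> counts;
  std::vector<Clause> repeated;
  for (Clause c : clausesOnDirective)
    if (allowedOnce.count(c) != 0 && ++counts[c] == 2)
      repeated.push_back(c);  // Report each offending clause only once.
  return repeated;
}
```

In this sketch, moving a clause from the plain "allowed" set into `allowedOnce` is exactly what turns the check on for it, mirroring the effect the message describes.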

Although this change is applied to a file shared between flang and
clang, clang does not use these categories for its checks, and hence
is not affected by this patch.
Delta      File
+312 -260  llvm/include/llvm/Frontend/OpenMP/OMP.td
+0 -3      flang/lib/Semantics/check-omp-structure.cpp
+312 -263  2 files

LLVM/project 48c0a2a llvm/lib/CodeGen/SelectionDAG LegalizeDAG.cpp, llvm/lib/Target/PowerPC PPCISelLowering.cpp

Revert "[Legalizer] Add support for promoting integers for s/ucmp (#198554)" (#204978)

This reverts commit 91edd87a801fc5c9d12c7f5c6863edd50327cef8.

It was causing CI failures on Linux.
Delta    File
+33 -2   llvm/test/CodeGen/PowerPC/ucmp.ll
+11 -8   llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+0 -15   llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+44 -25  3 files

LLVM/project 6542d6d llvm/lib/Target/ARM ARMExpandPseudoInsts.cpp, llvm/test/CodeGen/Thumb2 cmpxchg.mir

[ARM] Use lo tCMPr opcode when expanding CMP_SWAP (#204567)

We were always generating tCMPhir, even when both registers were low;
tCMPhir with two low registers is an unpredictable instruction. Generate
tCMPr instead when both registers are low.
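
A minimal sketch of the opcode choice, assuming registers are identified by plain numbers 0-15 with r0-r7 as the low registers. `CmpOpcode` and `selectCmpOpcode` are hypothetical names and do not reproduce the code in ARMExpandPseudoInsts.cpp.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two Thumb CMP (register) encodings.
enum class CmpOpcode { tCMPr, tCMPhir };

// r0-r7 are the Thumb "low" registers.
constexpr bool isLowRegister(unsigned regNum) { return regNum < 8; }

// Use the low-register encoding when both operands are low registers and the
// high-register encoding otherwise, instead of always using tCMPhir.
CmpOpcode selectCmpOpcode(unsigned rn, unsigned rm) {
  assert(rn < 16 && rm < 16 && "expected a core register number (r0-r15)");
  return isLowRegister(rn) && isLowRegister(rm) ? CmpOpcode::tCMPr
                                                : CmpOpcode::tCMPhir;
}
```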

Fixes #204519.
Delta    File
+65 -6   llvm/test/CodeGen/Thumb2/cmpxchg.mir
+16 -5   llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
+81 -11  2 files

LLVM/project d6d4921 llvm/utils/gn/build write_cmake_config.gni, llvm/utils/gn/secondary/llvm/lib/Transforms/IPO BUILD.gn

[gn] Fix missing dependency (#204991)

This fixes an oversight in 27d344d36ecac364.
Delta   File
+4 -5   llvm/utils/gn/secondary/llvm/lib/Transforms/IPO/BUILD.gn
+7 -0   llvm/utils/gn/build/write_cmake_config.gni
+11 -5  2 files

LLVM/project 3b46feb llvm/lib/Transforms/Vectorize VPlanVerifier.cpp, llvm/test/Transforms/LoopVectorize vector-loop-backedge-elimination-tail-folding.ll

[VPlan] Allow plain active lane mask in LastActiveLane verifier. (#204982)

Active lane masks are prefix masks. After simplifying the backedge, we
may end up with an active-lane-mask operand of LastActiveLane that does
not match the header mask predicate.
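
To make "prefix mask" concrete, a small C++ sketch assuming the LangRef description of `llvm.get.active.lane.mask` (lane i is active iff base + i < n). The helper names are hypothetical. Since no active lane follows an inactive one, the last active lane of such a mask is the active-lane count minus one, no matter which predicate produced the mask; presumably this is what makes a plain active lane mask a safe LastActiveLane operand.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Lane i is active iff base + i < n, following the LangRef description of
// llvm.get.active.lane.mask (wrap-around is ignored in this sketch).
std::vector<bool> activeLaneMask(uint64_t base, uint64_t n, unsigned vf) {
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i)
    mask[i] = base + i < n;
  return mask;
}

// A prefix mask is true on lanes [0, k) and false on lanes [k, vf).
bool isPrefixMask(const std::vector<bool> &mask) {
  bool seenInactive = false;
  for (bool lane : mask) {
    if (lane && seenInactive)
      return false;
    if (!lane)
      seenInactive = true;
  }
  return true;
}

// Index of the last active lane, or std::nullopt if no lane is active.
std::optional<unsigned> lastActiveLane(const std::vector<bool> &mask) {
  for (std::size_t i = mask.size(); i-- > 0;)
    if (mask[i])
      return static_cast<unsigned>(i);
  return std::nullopt;
}
```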

This fixes a verifier failure for the new test.
Delta   File
+41 -0  llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-tail-folding.ll
+3 -0   llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+44 -0  2 files

LLVM/project 4f3eb80 llvm/lib/Target/Xtensa/MCTargetDesc XtensaInstPrinter.cpp XtensaMCCodeEmitter.cpp

[Xtensa] Call isUInt<8> in range-check asserts (#204731)

`printOffset8m8_AsmOperand` and `getSelect_256OpValue` assert on
`isUInt<8>` without calling it, so the expression takes the function's
address and the range check never runs. This also trips
`-Werror,-Wpointer-bool-conversion` in builds with assertions enabled.
Pass the operand value so the bound is actually checked.
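
A minimal sketch of the bug class. `isUInt` below is a simplified stand-in shaped like `llvm::isUInt<N>` from llvm/Support/MathExtras.h, and `encodeOffset8` is a hypothetical caller rather than the Xtensa code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in shaped like llvm::isUInt<N>: true if x fits in N bits.
template <unsigned N> constexpr bool isUInt(uint64_t x) {
  static_assert(N > 0 && N < 64, "sketch only handles 1..63 bits");
  return x < (UINT64_C(1) << N);
}

// Hypothetical encoder with the corrected assert. The buggy form,
// `assert(isUInt<8>)`, names the function instead of calling it; the
// function's address converts to true, so the check could never fire.
uint8_t encodeOffset8(uint64_t value) {
  assert(isUInt<8>(value) && "offset must fit in 8 bits");
  return static_cast<uint8_t>(value);
}
```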
Delta  File
+2 -1  llvm/lib/Target/Xtensa/MCTargetDesc/XtensaInstPrinter.cpp
+1 -1  llvm/lib/Target/Xtensa/MCTargetDesc/XtensaMCCodeEmitter.cpp
+3 -2  2 files

LLVM/project 31f308e llvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU AMDGPUInstructionSelector.cpp SIISelLowering.cpp

[AMDGPU] Guard more intrinsics with target features
Delta     File
+1 -51    llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+0 -42    llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+0 -24    llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+15 -2    llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+4 -4     llvm/test/CodeGen/AMDGPU/unsupported-av-store.ll
+4 -4     llvm/test/CodeGen/AMDGPU/unsupported-av-load.ll
+24 -127  12 files not shown
+45 -143  18 files

LLVM/project c1037fe clang/lib/CodeGen CodeGenAction.cpp, llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp

[RFC][CodeGen] Add generic target feature checks for intrinsics

This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.

It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.

Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
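
A rough sketch of the kind of check this enables, under loud assumptions: the intrinsic IDs, feature names and table contents are hypothetical, and so is the expression form (a comma-separated list of features that must all be enabled). The real TargetFeatures syntax and the emitted table may well differ.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical intrinsic IDs.
enum class IntrinsicID { generic_op, target_specific_op };

// Hypothetical stand-in for a TableGen-emitted intrinsic-to-feature table.
// An empty string means "no feature requirement".
const std::map<IntrinsicID, std::string> &intrinsicFeatureTable() {
  static const std::map<IntrinsicID, std::string> table = {
      {IntrinsicID::generic_op, ""},
      {IntrinsicID::target_specific_op, "feat-a,feat-b"},
  };
  return table;
}

// Assumed expression form: comma-separated features that must all be
// enabled. A lowering path would diagnose the call instead of selecting it
// when this returns false.
bool subtargetSupportsIntrinsic(IntrinsicID id,
                                const std::set<std::string> &enabled) {
  std::istringstream required(intrinsicFeatureTable().at(id));
  std::string feature;
  while (std::getline(required, feature, ','))
    if (!feature.empty() && enabled.count(feature) == 0)
      return false;
  return true;
}
```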

This PR uses one AMDGPU intrinsic as an example.
Delta    File
+96 -3   llvm/lib/MC/MCSubtargetInfo.cpp
+37 -0   clang/lib/CodeGen/CodeGenAction.cpp
+36 -0   llvm/lib/IR/DiagnosticInfo.cpp
+33 -1   llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp
+28 -0   llvm/test/TableGen/intrinsic-target-features.td
+25 -0   llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+255 -4  14 files not shown
+391 -9  20 files

LLVM/project d845955 llvm/lib/IR Verifier.cpp VerifierAMDGPU.cpp, llvm/test/Verifier callbr-intrinsic.ll

[RFC][IR] Extract AMDGPU-specific verification logic into `VerifierAMDGPU.cpp`

`Verifier.cpp` is large and already mixes generic IR verification with
target-specific checks. We also have a growing amount of AMDGPU verifier logic
downstream, which would all end up in the same file if we don't address this,
and that is not ideal.

This patch extracts AMDGPU-specific verification logic into a separate
`VerifierAMDGPU.cpp` file, with shared infrastructure (`VerifierSupport`) moved
into `VerifierInternal.h`.

This is purely a code organization change, not a target-dependent IR verifier.
All checks remain compiled and linked into `LLVMCore` regardless of the target
triple. The extracted functions are called unconditionally at well-defined
extension points in `Verifier.cpp`, and each function internally gates on
target-specific conditions (for example, triple checks or intrinsic IDs) as
needed. The file is strictly limited to AMDGPU-specific IR constructs (amdgcn
intrinsics, AMDGPU module flags, etc.), and does not contain generic IR rules
that vary by target.
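
A minimal sketch of that "always called, self-gating" shape. `Module`, `Call`, both functions and the example rule are hypothetical simplifications, not the contents of VerifierAMDGPU.cpp.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins for IR objects.
struct Module { std::string targetTriple; };
struct Call { std::string calleeName; };

static bool startsWith(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Target-specific hook: always called, but it gates itself and ignores
// anything that is not an AMDGPU construct. Returns an error message, or an
// empty string when there is nothing to report.
std::string verifyAMDGPUCall(const Module &m, const Call &c) {
  if (!startsWith(c.calleeName, "llvm.amdgcn."))
    return "";  // Not an amdgcn intrinsic: nothing to check here.
  // Hypothetical example rule, for illustration only.
  if (!startsWith(m.targetTriple, "amdgcn"))
    return "amdgcn intrinsic in a non-amdgcn module";
  return "";
}

// Generic entry point: no target #ifdefs; the hook is always linked in and
// always called.
std::string verifyCall(const Module &m, const Call &c) {
  // ... generic, target-independent checks would run first ...
  return verifyAMDGPUCall(m, c);
}
```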

    [10 lines not shown]
Delta      File
+23 -530   llvm/lib/IR/Verifier.cpp
+401 -0    llvm/lib/IR/VerifierAMDGPU.cpp
+233 -0    llvm/lib/IR/VerifierInternal.h
+6 -6      llvm/test/Verifier/callbr-intrinsic.ll
+1 -0      llvm/lib/IR/CMakeLists.txt
+1 -0      llvm/utils/gn/secondary/llvm/lib/IR/BUILD.gn
+665 -536  6 files

LLVM/project e0cc08d clang/lib/AST/ByteCode InterpBuiltin.cpp, clang/lib/Headers avx512vnniintrin.h avx512vlvnniintrin.h

[clang][x86] Add constexpr support for VNNI intrinsics (#190549)

Fixes #161340.

This adds constexpr support for the VNNI intrinsics by updating their
header files and TableGen definitions, implementing their evaluation in
InterpBuiltin.cpp and ExprConstant.cpp, and adding tests to the
corresponding test files.
Delta      File
+190 -1    clang/test/CodeGen/X86/avx512vlvnni-builtins.c
+162 -0    clang/test/CodeGen/X86/avxvnni-builtins.c
+86 -1     clang/test/CodeGen/X86/avx512vnni-builtins.c
+74 -1     clang/lib/AST/ByteCode/InterpBuiltin.cpp
+32 -35    clang/lib/Headers/avx512vnniintrin.h
+29 -34    clang/lib/Headers/avx512vlvnniintrin.h
+573 -72   3 files not shown
+659 -118  9 files

LLVM/project a12b7af llvm/lib/Target/X86 X86InstrMisc.td, llvm/test/CodeGen/X86 bmi.ll

[X86] Select BLSI for i8 operands (#202344) (#204746)

Add a TableGen pattern to select the 32-bit BLSI for `and (neg x), x` on i8 operands.
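
BLSI computes `x & -x`, isolating the lowest set bit. A small C++ sketch (helper names hypothetical) of why a 32-bit BLSI can serve an i8 operation: the low 8 bits of the 32-bit result equal the 8-bit result, even when the upper 24 input bits are arbitrary.

```cpp
#include <cstdint>

// BLSI semantics: x & -x isolates the lowest set bit (0 when x == 0).
uint8_t blsi8(uint8_t x) { return static_cast<uint8_t>(x & -x); }
uint32_t blsi32(uint32_t x) { return x & (0u - x); }

// Check that truncating a 32-bit BLSI to 8 bits gives the 8-bit answer,
// whatever the upper 24 bits of the 32-bit input hold.
bool truncatedBlsi32Matches(uint8_t low, uint32_t upperBits) {
  uint32_t wide = (upperBits << 8) | low;
  return static_cast<uint8_t>(blsi32(wide)) == blsi8(low);
}
```

If the low 8 bits are nonzero, the lowest set bit of the wide value lies among them; if they are zero, it lies higher and truncation yields 0, which is also what the 8-bit form returns.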

Fixes #202344
Delta    File
+94 -0   llvm/test/CodeGen/X86/bmi.ll
+12 -1   llvm/lib/Target/X86/X86InstrMisc.td
+106 -1  2 files

LLVM/project 9b36e4f orc-rt/include/orc-rt QueueingRunner.h, orc-rt/unittests QueueingTaskDispatcherTest.cpp SessionTest.cpp

[orc-rt] Replace TaskDispatcher with Session-supplied wrapper-runner. (#204965)

TaskDispatcher was only used to run wrapper-function calls that
originated from the controller. Replace it with a callable type:

  Session::RunWrapperCall = move_only_function<void(
      orc_rt_SessionRef, uint64_t, orc_rt_WrapperFunctionReturn,
      orc_rt_WrapperFunction, WrapperFunctionBuffer)>

Each call carries an outstanding ManagedCodeTaskGroup token; the runner
must eventually invoke Fn (which calls Return) or call Return directly
to bail out, otherwise Session shutdown blocks indefinitely.

Clients can supply any callable that satisfies the contract above. The
new QueueingRunner and ThreadPoolRunner classes (replacing
QueueingTaskDispatcher and ThreadPoolTaskDispatcher, respectively) are
provided as off-the-shelf options.
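
A minimal sketch of a runner that follows the contract above. It simplifies heavily: `PendingCall`, `SimpleQueueingRunner` and its methods are hypothetical, each call is reduced to a "run" action (which reports its own result) and a "bail out" action, and `std::function` stands in for `move_only_function`. It is not the QueueingRunner added by this commit.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical reduction of one wrapper call: `run` executes the wrapper
// (which reports its own result), `bail` reports an error without running it.
struct PendingCall {
  std::function<void()> run;
  std::function<void()> bail;
};

// Queues calls and runs them later on demand. Every queued call must end in
// exactly one of run() or bail(); otherwise its caller never gets a result.
class SimpleQueueingRunner {
public:
  void operator()(PendingCall call) { queue_.push_back(std::move(call)); }

  // Run queued calls in FIFO order until the queue is empty.
  void runAll() {
    while (!queue_.empty())
      popFront().run();
  }

  // Bail out of every still-queued call, e.g. before shutdown.
  void bailOutAll() {
    while (!queue_.empty())
      popFront().bail();
  }

  std::size_t pending() const { return queue_.size(); }

private:
  PendingCall popFront() {
    PendingCall call = std::move(queue_.front());
    queue_.pop_front();
    return call;
  }

  std::deque<PendingCall> queue_;
};
```

In this sketch, draining the queue with `bailOutAll` before shutdown is what keeps outstanding calls from blocking shutdown indefinitely.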
Delta      File
+0 -291    orc-rt/unittests/QueueingTaskDispatcherTest.cpp
+85 -141   orc-rt/unittests/SessionTest.cpp
+153 -0    orc-rt/unittests/ThreadPoolRunnerTest.cpp
+133 -0    orc-rt/unittests/QueueingRunnerTest.cpp
+0 -110    orc-rt/unittests/ThreadPoolTaskDispatcherTest.cpp
+82 -0     orc-rt/include/orc-rt/QueueingRunner.h
+453 -542  19 files not shown
+668 -984  25 files

LLVM/project 7b85647 llvm/test/CodeGen/AArch64 inline-asm-prepare.ll

[CodeGen][NFC] Use llc instead of opt
Delta  File
+1 -3  llvm/test/CodeGen/AArch64/inline-asm-prepare.ll
+1 -3  1 file

LLVM/project 4417256 llvm/lib/Transforms/Vectorize LoopVectorizationPlanner.cpp, llvm/test/Transforms/LoopVectorize/AArch64 vplan-native-outer-loop-wide-type.ll

[LV] Avoid zero-width VF in computeVPlanOuterloopVF. (#204918)

RegSize / WidestType may be 0 for types wider than the vector register
size. Clamp VF to at least 1 (scalar), to avoid a crash. This matches
inner loop behavior.
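
The clamp amounts to taking the maximum with 1. A hypothetical helper, not the actual LoopVectorizationPlanner code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: lanes of the widest element type that fit in one
// vector register, clamped to at least 1. Without the clamp, a type wider
// than the register gives 0 from the integer division (a zero-width VF).
unsigned computeOuterLoopVF(unsigned regSizeInBits, unsigned widestTypeInBits) {
  assert(widestTypeInBits != 0 && "element type must have a nonzero width");
  return std::max(1u, regSizeInBits / widestTypeInBits);
}
```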
Delta   File
+59 -0  llvm/test/Transforms/LoopVectorize/AArch64/vplan-native-outer-loop-wide-type.ll
+1 -1   llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.cpp
+60 -1  2 files

LLVM/project 71c2feb flang/test/Integration split-lto-unit-2.f90

Support for -fsplit-lto-unit option in flang driver (#204904)

Fix for buildbot failures in #202858

This commit fixes a regression introduced in commit
12aefe26cedd9a8f94546cc1f2be285cfddcc861 (Support for -fsplit-lto-unit
option in flang driver): when the compiler was built only for AArch64,
one of the test cases failed.

Add an explicit `%if x86-registered-target` check to that test case to
resolve the issue.
Delta  File
+6 -6  flang/test/Integration/split-lto-unit-2.f90
+6 -6  1 file

LLVM/project 8947e49 llvm/lib/Transforms/InstCombine InstCombineCalls.cpp, llvm/test/Transforms/InstCombine assume.ll

[InstCombine] Move alignment assumptions to the base of constant offset GEPs (#204602)
Delta   File
+14 -0  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+3 -6   llvm/test/Transforms/InstCombine/assume.ll
+17 -6  2 files

LLVM/project f42072e llvm/include/llvm/Support KnownBits.h, llvm/lib/Analysis ValueTracking.cpp

[Analysis] Add `KnownBits` optimization for `pdep` and `pext` (#204223)

Fixes #204136
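
As background, a C++ sketch of BMI2 `pext`/`pdep` semantics and two known-zero facts that hold when the mask is a constant: `pdep` can only set bits where the mask is set, and `pext` fills at most popcount(mask) low bits. The function names are hypothetical; this is not the `KnownBits` API and does not show how the patch handles partially known operands.

```cpp
#include <cstdint>

// Reference PEXT: gather the bits of src selected by mask into the low bits.
uint32_t pext32(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  unsigned outBit = 0;
  for (unsigned i = 0; i < 32; ++i) {
    if (!(mask & (1u << i)))
      continue;
    if (src & (1u << i))
      result |= 1u << outBit;
    ++outBit;
  }
  return result;
}

// Reference PDEP: scatter the low bits of src into the positions set in mask.
uint32_t pdep32(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  unsigned inBit = 0;
  for (unsigned i = 0; i < 32; ++i) {
    if (!(mask & (1u << i)))
      continue;
    if (src & (1u << inBit))
      result |= 1u << i;
    ++inBit;
  }
  return result;
}

// Bits known to be zero for every src, given a constant mask.
uint32_t pdepKnownZero(uint32_t mask) { return ~mask; }

uint32_t pextKnownZero(uint32_t mask) {
  unsigned setBits = 0;
  for (uint32_t m = mask; m != 0; m &= m - 1)  // Clear the lowest set bit.
    ++setBits;
  // Only the low `setBits` result bits can be nonzero (avoid shifting by 32).
  return setBits == 32 ? 0u : ~((1u << setBits) - 1u);
}
```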
Delta    File
+91 -0   llvm/test/Analysis/ValueTracking/knownbits-pext.ll
+89 -0   llvm/test/Analysis/ValueTracking/knownbits-pdep.ll
+65 -0   llvm/lib/Support/KnownBits.cpp
+3 -9    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+10 -0   llvm/lib/Analysis/ValueTracking.cpp
+6 -0    llvm/include/llvm/Support/KnownBits.h
+264 -9  1 file not shown
+266 -9  7 files

LLVM/project 7376a70 compiler-rt/lib/tsan/rtl tsan_platform.h

[tsan] fit Go/s390x mapping under QEMU (#204503)

QEMU linux-user first tries guest_base=0. In that identity-mapped mode,
fixed guest mappings use the same host addresses. On an x86-64 host
with four-level page tables, the Go/s390x meta shadow starts at
144 TiB, beyond the 128 TiB userspace limit, and its mmap fails with
ENOMEM during TSan initialization.

Move the meta shadow down by 32 TiB to
[0x700000000000, 0x780000000000), restoring the 16 TiB gap after the
shadow and placing all Go/s390x TSan regions below 2^47. Correct the
mapping comment's shadow size and ratio.
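
The address arithmetic above can be checked at compile time. The constant names are hypothetical; the values come from the message (1 TiB = 2^40 bytes).

```cpp
#include <cstdint>

constexpr uint64_t kTiB = uint64_t{1} << 40;

// Values from the commit message; names are hypothetical.
constexpr uint64_t kUserspaceLimit = 128 * kTiB;  // x86-64, 4-level paging
constexpr uint64_t kOldMetaShadowBeg = 144 * kTiB;
constexpr uint64_t kNewMetaShadowBeg = 0x700000000000;
constexpr uint64_t kNewMetaShadowEnd = 0x780000000000;

static_assert(kUserspaceLimit == (uint64_t{1} << 47), "128 TiB is 2^47");
static_assert(kOldMetaShadowBeg > kUserspaceLimit, "old start is unmappable");
static_assert(kOldMetaShadowBeg - kNewMetaShadowBeg == 32 * kTiB,
              "the meta shadow moves down by 32 TiB");
static_assert(kNewMetaShadowBeg == 112 * kTiB, "new start is 112 TiB");
static_assert(kNewMetaShadowEnd == 120 * kTiB, "new end is 120 TiB");
static_assert(kNewMetaShadowEnd <= kUserspaceLimit, "fits below 2^47");
```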

Failure report and native s390x comparison:
https://github.com/golang/go/issues/67881

QEMU identity guest-base selection:
https://github.com/qemu/qemu/blob/v10.2.3/linux-user/elfload.c#L1036-L1042

    [9 lines not shown]
Delta  File
+8 -5  compiler-rt/lib/tsan/rtl/tsan_platform.h
+8 -5  1 file

LLVM/project 2978e2f llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

Merge branch 'main' into users/ikudrin/clang-findallocationfunction-simplify
Delta          File
+203 -329      llvm/test/CodeGen/X86/atomic-load-store.ll
+214 -266      llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+366 -0        llvm/test/tools/llvm-objcopy/MachO/linkedit-alignment.test
+241 -0        llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll
+232 -0        llvm/test/Transforms/VectorCombine/X86/shuffle-chain-reduction-subvector.ll
+182 -2        llvm/test/Transforms/InstCombine/or.ll
+1,438 -597    120 files not shown
+4,268 -1,755  126 files

LLVM/project 5066d3a clang/include/clang/Sema Sema.h, clang/lib/Sema SemaExprCXX.cpp SemaOverload.cpp

fixup! Streamline overload resolution
Delta      File
+202 -175  clang/lib/Sema/SemaExprCXX.cpp
+2 -2      clang/include/clang/Sema/Sema.h
+1 -1      clang/lib/Sema/SemaOverload.cpp
+205 -178  3 files

LLVM/project 9d6c686 orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Sink Session::sendWrapperResult into Session.cpp. NFC. (#204956)

This function is never called inline (except by Session::wrapperReturn,
which is also in Session.cpp), so there's no need for it to be in the
header.
Delta  File
+1 -6  orc-rt/include/orc-rt/Session.h
+7 -0  orc-rt/lib/executor/Session.cpp
+8 -6  2 files

LLVM/project e1f65fa llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG convergent-loop-header.ll

[SimplifyCFG] Avoid threading loop-header branches in convergent functions

SimplifyCFG can fold a conditional branch when the condition is known from
a predecessor. When the destination is a loop header in a convergent function,
this can change the dynamic convergence structure of the loop even though the
scalar CFG rewrite is otherwise valid.

Skip this fold for loop-header branches in convergent functions so convergent
control flow is preserved.

Fixes ROCM-26496.
Delta   File
+6 -4   llvm/test/Transforms/SimplifyCFG/convergent-loop-header.ll
+4 -1   llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+10 -5  2 files

LLVM/project 0cddd5f llvm/test/Transforms/SimplifyCFG convergent-loop-header.ll

[NFC] Pre-commit a test case for a SimplifyCFG issue
Delta   File
+50 -0  llvm/test/Transforms/SimplifyCFG/convergent-loop-header.ll
+50 -0  1 file

LLVM/project ec56065 .github/workflows new-prs.yml

workflows/new-prs: Remove obsolete code (#204955)

This was left over after 57e4352de0d2617bae1656dc2e2b3ca430e83c4c and
was causing the jobs to fail.
Delta  File
+0 -1  .github/workflows/new-prs.yml
+0 -1  1 file

LLVM/project afac572 clang/test CMakeLists.txt

[clang] Add clang-format-check-format instead to CLANG_TEST_DEPS (#204908)

Ensure that clang-format doesn't break the existing format of its own
source.

Reverts #199169 and #199638.
Delta  File
+1 -5  clang/test/CMakeLists.txt
+1 -5  1 file

LLVM/project 61d601e llvm/lib/Target/AMDGPU GCNVOPDUtils.cpp

[AMDGPU][VOPD] Cache load reachability checks in VOPDpairing (#204854)

#201930 caused a significant compile-time regression when building the
ROCm math libraries.

Most of the regression comes from repeated queries to `DAG->IsReachable`
to detect possible scalarisation of loads when fusing a pair of
VOPD-capable instructions. This patch caches the set of reachable loads
for every potentially hazardous load instruction, so that
`DAG->IsReachable` no longer needs to be invoked at all.
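
A minimal sketch of the memoization idea on a toy graph. `Graph` and `ReachabilityCache` are hypothetical, and the sketch caches general reachable sets, whereas the patch caches the reachable loads for each potentially hazardous load.

```cpp
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy graph: succs[n] lists the successors of node n.
using Graph = std::vector<std::vector<std::size_t>>;

// Memoizes, per source node, the full set of nodes reachable from it, so
// each source costs one graph walk and later queries are hash lookups.
class ReachabilityCache {
public:
  explicit ReachabilityCache(const Graph &g) : g_(g) {}

  bool isReachable(std::size_t from, std::size_t to) {
    auto it = cache_.find(from);
    if (it == cache_.end())
      it = cache_.emplace(from, computeReachable(from)).first;
    return it->second.count(to) != 0;
  }

  // Number of graph walks performed so far (one per distinct source).
  std::size_t numWalks() const { return walks_; }

private:
  // Iterative DFS over successors; `from` itself is included only if it
  // lies on a cycle.
  std::unordered_set<std::size_t> computeReachable(std::size_t from) {
    ++walks_;
    std::unordered_set<std::size_t> seen;
    std::vector<std::size_t> stack(g_[from].begin(), g_[from].end());
    while (!stack.empty()) {
      std::size_t n = stack.back();
      stack.pop_back();
      if (seen.insert(n).second)
        stack.insert(stack.end(), g_[n].begin(), g_[n].end());
    }
    return seen;
  }

  const Graph &g_;
  std::unordered_map<std::size_t, std::unordered_set<std::size_t>> cache_;
  std::size_t walks_ = 0;
};
```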
Delta    File
+74 -48  llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+74 -48  1 file

LLVM/project 959f069 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Keep split vector atomic store value in a vector register (#201566)

When the value of an ATOMIC_STORE has a vector type whose legalization
action is split (e.g. <4 x half>/<4 x bfloat> on X86 without F16C),
SplitVecOp_ATOMIC_STORE bitcast the value straight to a scalar integer
spanning the memory width. For a split vector that bitcast is expanded
element by element, reassembling the value in GPRs (a long pextrw/shl/or
sequence) before the store.

Instead, keep the value in a vector register when a legal vector form
exists: reinterpret it as a same-shaped integer-element vector (an FP
element type may have no legal vector form, e.g. bfloat on SSE2, while
the integer-of-element-size form does), widen that to a legal vector,
and extract the low integer element of the memory width. This issues the
store directly from a vector register (a single MOVQ/MOVD on X86),
matching the widen-path codegen already produced on AVX targets. The
lowering falls back to the scalar bitcast when no suitable legal vector
type exists.

Stacked on top of https://github.com/llvm/llvm-project/pull/197861 and
below #197862.
Delta      File
+203 -329  llvm/test/CodeGen/X86/atomic-load-store.ll
+33 -6     llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+236 -335  2 files