LLVM/project caed089 llvm/include/llvm/Analysis TargetLibraryInfo.h, llvm/include/llvm/IR SystemLibraries.h

TargetLibraryInfo: Split off VectorLibrary enum and flag (#166980)

Move this to a new shared header to facilitate the eventual
merger of RuntimeLibcallsInfo and TargetLibraryInfo. Ideally
this would be replaced with a module flag. For now put it into
a common header both can use.
Delta  File
+10 -32  llvm/lib/Analysis/TargetLibraryInfo.cpp
+39 -0  llvm/include/llvm/IR/SystemLibraries.h
+34 -0  llvm/lib/IR/SystemLibraries.cpp
+1 -19  llvm/include/llvm/Analysis/TargetLibraryInfo.h
+9 -8  llvm/lib/Frontend/Driver/CodeGenOptions.cpp
+1 -0  llvm/lib/IR/CMakeLists.txt
+94 -59  6 files

LLVM/project 7fe60a7 llvm/lib/Target/AArch64 AArch64RegisterInfo.cpp, llvm/test/CodeGen/AArch64 sve-streaming-mode-fixed-length-int-minmax.ll sve-fixed-length-int-minmax.ll

[AArch64][SVE] Avoid movprfx by reusing register for _UNDEF pseudos. (#166926)

For predicated SVE instructions where we know that the inactive lanes
are undef, it is better to pick a destination register that is not
unique. This avoids introducing a movprfx to copy a unique register to
the destination operand, which would be needed to comply with the tied
operand constraints.

For example:
```
  %src1 = COPY $z1
  %src2 = COPY $z2
  %dst = SDIV_ZPZZ_S_UNDEF %p, %src1, %src2
```
Here it is beneficial to pick $z1 or $z2 as the destination register,
because if a unique register (e.g. $z0) were chosen, the pseudo expand
pass would need to insert a MOVPRFX to expand the operation into:

    [10 lines not shown]
Delta  File
+73 -13  llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+32 -48  llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-minmax.ll
+32 -48  llvm/test/CodeGen/AArch64/sve-fixed-length-int-minmax.ll
+30 -45  llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
+24 -36  llvm/test/CodeGen/AArch64/sve-fixed-length-int-shifts.ll
+24 -36  llvm/test/CodeGen/AArch64/sve-fixed-length-fp-arith.ll
+215 -226  24 files not shown
+417 -527  30 files

LLVM/project e1f8690 clang/include/clang/Basic BuiltinsX86.td

[X86] BuiltinsX86.td - move the SSE constexpr builtins together. NFC. (#167323)

Makes it much easier to work out what still needs to be converted to be constexpr compatible.
Delta  File
+24 -28  clang/include/clang/Basic/BuiltinsX86.td
+24 -28  1 file

LLVM/project 241bfac llvm/include/llvm/CodeGen MachineBasicBlock.h

[CodeGen] Use MCRegister in RegisterMaskPair constructor. NFC (#167274)

It's already used for the field in the struct.
Delta  File
+1 -1  llvm/include/llvm/CodeGen/MachineBasicBlock.h
+1 -1  1 file

LLVM/project 61e5bc3 llvm/docs SandboxIR.md

[SandboxIR] Fix typo in doc (#167315)

Delta  File
+2 -2  llvm/docs/SandboxIR.md
+2 -2  1 file

LLVM/project 0bae337 flang/lib/Optimizer/Transforms CUFComputeSharedMemoryOffsetsAndSize.cpp, flang/test/Fir/CUDA cuda-shared-offset.mlir

[flang][cuda] Fix detection of assumed size arrays in shared memory offset (#167231)

Delta  File
+14 -14  flang/test/Fir/CUDA/cuda-shared-offset.mlir
+1 -2  flang/lib/Optimizer/Transforms/CUFComputeSharedMemoryOffsetsAndSize.cpp
+15 -16  2 files

LLVM/project 9a783b6 clang/include/clang/Driver Options.td OptionUtils.h, clang/include/clang/Options Options.td OptionUtils.h

[clang] Refactor option-related code from clangDriver into new clangOptions library (#163659)

This change moves option-related code from clangDriver into a new
clangOptions library.

This refactoring is part of a broader effort to support driver-managed
builds for compilations using C++ named modules and/or Clang modules.
It is required for linking the dependency scanning tooling against the
driver without introducing cyclic dependencies, which would otherwise
cause build failures when dynamic linking is enabled.
In particular, clangFrontend must no longer depend on clangDriver
for this to be possible.

This PR is motivated by the following review comment:
https://github.com/llvm/llvm-project/pull/152770#discussion_r2430756918
Delta  File
+0 -9,644  clang/include/clang/Driver/Options.td
+9,644 -0  clang/include/clang/Options/Options.td
+256 -291  flang/lib/Frontend/CompilerInvocation.cpp
+0 -58  clang/include/clang/Driver/OptionUtils.h
+58 -0  clang/include/clang/Options/OptionUtils.h
+0 -57  clang/include/clang/Driver/Options.h
+9,958 -10,050  119 files not shown
+10,464 -10,484  125 files

LLVM/project 80e5448 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp

Include the cost of the select/predication in the cost of the partial reduction.

In practice this won't make much difference, because VPExpressions already account
for the cost of the predication.
Delta  File
+19 -9  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+19 -9  1 file

LLVM/project c04d219 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-dot-product-neon.ll partial-reduce-dot-product-epilogue.ll

[LV] Move condition to VPPartialReductionRecipe::execute

This means that VPExpressions will now be constructed for
VPPartialReductionRecipes when the loop has tail-folding predication.

Note that control-flow (if/else) predication is not yet handled
for partial reductions, because of the way partial reductions
are recognised and built up.
Delta  File
+243 -483  llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll
+81 -161  llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll
+11 -6  llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
+8 -2  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+1 -8  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+1 -1  llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll
+345 -661  6 files

LLVM/project b58b38e llvm/test/Transforms/LoopVectorize/AArch64 vplan-printing.ll

[LV] NFC: Add new test

This shows that no VPExpression is built for partial reductions
that have some form of predication.
Delta  File
+66 -0  llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
+66 -0  1 file

LLVM/project 38cade7 llvm/lib/Transforms/Instrumentation InstrProfiling.cpp, offload/test/offloading/gpupgo pgo_atomic_threads.c pgo_device_only.c

[PGO][Offload] Fix missing names bug in GPU PGO (#166444)

After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverses the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
Delta  File
+4 -0  llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+0 -1  offload/test/offloading/gpupgo/pgo_atomic_threads.c
+0 -1  offload/test/offloading/gpupgo/pgo_device_only.c
+0 -1  offload/test/offloading/gpupgo/pgo_atomic_teams.c
+0 -1  offload/test/offloading/gpupgo/pgo_device_and_host.c
+4 -4  5 files

LLVM/project b9e22cc clang/lib/Driver ToolChain.cpp

[Flang][driver] Do not emit -latomic on link line on Windows (#164648)

Flang on Windows added `-latomic` to the link line. This library does
not exist on Windows and the linker gives a warning.
Delta  File
+4 -1  clang/lib/Driver/ToolChain.cpp
+4 -1  1 file

LLVM/project 36287b0 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

Match shift to signbit pattern instead of computeKnownBits
Delta  File
+10 -4  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+10 -4  1 file

LLVM/project ed01dc9 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

Revert "Match shift to signbit pattern instead of computeKnownBits"

This reverts commit 49e2c3aa7a861fc8864c2d045b3804e31e1f13cc.

One case is slightly more sophisticated.
Delta  File
+4 -10  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+4 -10  1 file

LLVM/project d6b054d llvm/include/llvm/CodeGen SelectionDAG.h, llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp DAGCombiner.cpp

DAG: Fold copysign with a known signmask to a disjoint or

If the sign bit is a computed sign mask (i.e., we know it's
either +0 or -0), turn this into a disjoint or. This pattern
appears in the pow implementations.

We also need to know the sign bit of the magnitude is 0 for
the or to be disjoint. Unfortunately the DAG's FP tracking is
weak and we did not have a way to check if the sign bit is known
0, so add something for that. Ideally we would get a complete
computeKnownFPClass implementation.

This is intended to help avoid the regression which caused
d3e7c4ce7a3d7 to be reverted.
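
The bit-level reasoning behind the fold can be sketched in standalone C++. This is an illustration only, not the DAGCombiner code: the helper names (`toBits`, `fromBits`, `copysignGeneral`, `copysignAsDisjointOr`) are hypothetical, and the sketch assumes IEEE-754 binary32 `float`. When the sign operand is +0.0 or -0.0 and the magnitude's sign bit is 0, the two bit patterns share no set bits, so a plain OR (a "disjoint or") yields the same result as the general mask-and-merge.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// IEEE-754 binary32 sign bit (assumption: float is binary32).
constexpr std::uint32_t SignBit = 0x80000000u;

// Reinterpret a float as its raw bit pattern; memcpy avoids aliasing UB.
static std::uint32_t toBits(float F) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Reinterpret a raw bit pattern as a float.
static float fromBits(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// General copysign on raw bits: keep the magnitude's low 31 bits and take
// the sign bit from the sign operand.
float copysignGeneral(float Mag, float Sign) {
  return fromBits((toBits(Mag) & ~SignBit) | (toBits(Sign) & SignBit));
}

// Folded form (hypothetical helper). Only valid under the two preconditions
// from the commit message, which are asserted here.
float copysignAsDisjointOr(float Mag, float Sign) {
  std::uint32_t M = toBits(Mag), S = toBits(Sign);
  assert((S & ~SignBit) == 0 && "sign operand must be +0.0 or -0.0");
  assert((M & SignBit) == 0 && "magnitude's sign bit must be 0");
  // M and S have no common set bits, so the OR is disjoint.
  return fromBits(M | S);
}
```

If either precondition fails, the OR could leave a stale sign bit or corrupt the exponent and mantissa, which is why the commit needs a way to prove that the magnitude's sign bit is zero.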
Delta  File
+28 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+9 -14  llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll
+6 -16  llvm/test/CodeGen/AMDGPU/copysign-to-disjoint-or-combine.ll
+20 -0  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+4 -0  llvm/include/llvm/CodeGen/SelectionDAG.h
+67 -30  5 files

LLVM/project b86a3c5 llvm/lib/CodeGen/SelectionDAG SelectionDAGBuilder.cpp SelectionDAGBuilder.h, llvm/test/CodeGen/AMDGPU nofpclass-call.ll

DAG: Add AssertNoFPClass from call return attributes

This defends against regressions in future patches. This excludes
the target intrinsic case for now; I'm worried introducing an intermediate
AssertNoFPClass is likely to break combines.
Delta  File
+17 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+4 -12  llvm/test/CodeGen/AMDGPU/nofpclass-call.ll
+4 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+25 -12  3 files

LLVM/project 54053cf llvm/test/CodeGen/AMDGPU copysign-to-disjoint-or-combine.ll copysign-simplify-demanded-bits.ll

AMDGPU: Add baseline tests for copysign with known signmask input (#167265)

Delta  File
+208 -0  llvm/test/CodeGen/AMDGPU/copysign-to-disjoint-or-combine.ll
+108 -0  llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll
+316 -0  2 files

LLVM/project 726c049 llvm/test/CodeGen/AMDGPU nofpclass-call.ll

AMDGPU: Add baseline test for nofpclass on call results (#167263)

Delta  File
+199 -0  llvm/test/CodeGen/AMDGPU/nofpclass-call.ll
+199 -0  1 file

LLVM/project 669fbce llvm/test/CodeGen/AMDGPU copysign-to-disjoint-or-combine.ll copysign-simplify-demanded-bits.ll

AMDGPU: Add baseline tests for copysign with known signmask input (#167265)

Delta  File
+208 -0  llvm/test/CodeGen/AMDGPU/copysign-to-disjoint-or-combine.ll
+108 -0  llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll
+316 -0  2 files

LLVM/project ad35805 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

Use makeNegative/makeNonNegative
Delta  File
+2 -4  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+2 -4  1 file

LLVM/project 07fa1a5 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/AMDGPU compute-known-bits-nofpclass.ll

DAG: Handle AssertNoFPClass in computeKnownBits

It's possible to determine the sign bit if the value is known to be
one of the positive/negative classes and not NaN.
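
As a rough illustration of that rule (not the SelectionDAG implementation), the check can be written against a bitmask of excluded floating-point classes. The `FPClass` enum, the `SignBitState` type, and `signBitFromExcludedClasses` are local stand-ins invented for this sketch. The key point is that a NaN may carry either sign, so nothing can be concluded unless NaNs are excluded as well.

```cpp
#include <cstdint>

// Local stand-in for a floating-point class bitmask (hypothetical; defined
// here only so the sketch is self-contained).
enum FPClass : std::uint32_t {
  SNan = 1u << 0,
  QNan = 1u << 1,
  NegInf = 1u << 2,
  NegNormal = 1u << 3,
  NegSubnormal = 1u << 4,
  NegZero = 1u << 5,
  PosZero = 1u << 6,
  PosSubnormal = 1u << 7,
  PosNormal = 1u << 8,
  PosInf = 1u << 9,
};

constexpr std::uint32_t AnyNan = SNan | QNan;
constexpr std::uint32_t AllNegative =
    NegInf | NegNormal | NegSubnormal | NegZero;
constexpr std::uint32_t AllPositive =
    PosInf | PosNormal | PosSubnormal | PosZero;

enum class SignBitState { Unknown, KnownZero, KnownOne };

// Given the set of classes the value is asserted NOT to be, report what is
// known about its sign bit.
SignBitState signBitFromExcludedClasses(std::uint32_t Excluded) {
  // A NaN can have either sign, so both NaN kinds must be excluded first.
  if ((Excluded & AnyNan) != AnyNan)
    return SignBitState::Unknown;
  // No negative class possible: the sign bit must be 0.
  if ((Excluded & AllNegative) == AllNegative)
    return SignBitState::KnownZero;
  // No positive class possible: the sign bit must be 1.
  if ((Excluded & AllPositive) == AllPositive)
    return SignBitState::KnownOne;
  return SignBitState::Unknown;
}
```

Note that zero counts as a signed class here: excluding negative normals but not -0.0 is not enough to prove the sign bit is clear.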
Delta  File
+21 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+0 -2  llvm/test/CodeGen/AMDGPU/compute-known-bits-nofpclass.ll
+21 -2  2 files

LLVM/project 741ba82 llvm/test/CodeGen/AMDGPU compute-known-bits-nofpclass.ll

AMDGPU: Add baseline test for known bits of AssertNoFPClass (#167288)

Delta  File
+48 -0  llvm/test/CodeGen/AMDGPU/compute-known-bits-nofpclass.ll
+48 -0  1 file

LLVM/project 506b411 llvm/test/CodeGen/AMDGPU amdgpu-cs-chain-fp-nosave.ll amdgpu-cs-chain-frame-pointer.ll

Rename indirect to recurse and keep in original file -- invalid for gfx1200. Move fp_all test to fp-nosave since it compiles on both gfx942 and gfx1200.
Delta  File
+25 -0  llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
+5 -20  llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll
+30 -20  2 files

LLVM/project 9625cf6 utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[BAZEL] Add missing dependency on /llvm:Support from XeGPUTransformOps (#167332)

Fixes 1553f90f93d30b41457097cf274c3791b182f316
Delta  File
+1 -0  utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1 -0  1 file

LLVM/project d7dc554 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/AArch64 negate-ra-state-incorrect.s

[BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
Delta  File
+17 -10  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+1 -1  bolt/test/AArch64/negate-ra-state-incorrect.s
+18 -11  2 files

LLVM/project 7e17eea bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/runtime/AArch64 pacret-synchronous-unwind.cpp

[BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.
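
A minimal sketch of such a threshold check, assuming the pass tracks how many functions it ignored and how many it processed in total. The function name and parameters are hypothetical and are not taken from PointerAuthCFIAnalyzer.cpp.

```cpp
#include <cstddef>

// Hypothetical helper: true when strictly more than 10% of the functions
// were ignored.
bool shouldWarnAboutIgnoredFunctions(std::size_t Ignored, std::size_t Total) {
  // Integer form of Ignored / Total > 0.10. It avoids floating point and is
  // well defined for Total == 0 (the result is false, since Ignored is 0 too).
  return Ignored * 10 > Total;
}
```

With this formulation exactly 10% ignored does not trigger the warning, matching the "more than 10%" wording.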

See also: #165215
Delta  File
+33 -0  bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
+8 -1  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+41 -1  2 files

LLVM/project 451be48 bolt/docs PacRetDesign.md

Update PacRetDesign.md
Delta  File
+1 -1  bolt/docs/PacRetDesign.md
+1 -1  1 file

LLVM/project 12d91b0 bolt/docs PacRetDesign.md

Update bolt/docs/PacRetDesign.md

Co-authored-by: Paschalis Mpeis <paschalis.mpeis at arm.com>
Delta  File
+1 -1  bolt/docs/PacRetDesign.md
+1 -1  1 file

LLVM/project 85a832f bolt/lib/Passes InsertNegateRAStatePass.cpp PointerAuthCFIFixup.cpp, bolt/unittests/Passes InsertNegateRAState.cpp PointerAuthCFIFixup.cpp

[BOLT][NFC] Rename Pointer Auth DWARF rewriter passes

Original names were "working titles". After initial patches are merged,
I'd like to rename these passes to names that reflect their intent
better and show their relationship to each other:

InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.
Delta  File
+0 -350  bolt/lib/Passes/InsertNegateRAStatePass.cpp
+350 -0  bolt/lib/Passes/PointerAuthCFIFixup.cpp
+0 -288  bolt/unittests/Passes/InsertNegateRAState.cpp
+288 -0  bolt/unittests/Passes/PointerAuthCFIFixup.cpp
+145 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+0 -145  bolt/lib/Passes/MarkRAStates.cpp
+783 -783  13 files not shown
+929 -928  19 files

LLVM/project 56251e7 bolt/unittests/Passes InsertNegateRAState.cpp

[BOLT] Test fillUnknownStubs
Delta  File
+61 -0  bolt/unittests/Passes/InsertNegateRAState.cpp
+61 -0  1 file