LLVM/project 09f8f22 libcxx/include stdatomic.h, libcxx/test/extensions/libcxx/depr/depr.c.headers include_stdatomic_as_c.sh.cpp

[libc++] "Always" include_next for non C++ path in stdatomic.h (#178463)

In https://github.com/llvm/llvm-project/pull/176903, `#include
<__configuration/compiler.h>` was moved into the
`#ifdef __cplusplus` clause, so `_LIBCPP_COMPILER_CLANG_BASED` is no
longer set for C compiles. This caused a regression internally: C
compiles that include stdatomic.h no longer get the corresponding C
header.

C++ stdlib headers "shouldn't" be on the search path for C compiles, but
ours are, and so are lots of other people's, so libc++ tends to support that.
This include_next for a C compile should be unconditional, not
conditional upon being Clang.
Delta  File
+30 -0  libcxx/test/extensions/libcxx/depr/depr.c.headers/include_stdatomic_as_c.sh.cpp
+1 -1  libcxx/include/stdatomic.h
+31 -1  2 files

LLVM/project d1e2ddf llvm/lib/Target/AMDGPU SIInstrInfo.cpp, llvm/test/CodeGen/AMDGPU misaligned-vgpr-regsequence.mir siloadstoreopt-misaligned-regsequence.ll

[AMDGPU] Emit b32 movs if (a)v_mov_b64_pseudo dest vgprs are misaligned (#160547)

#154115 exposed a possible v_mov_b64 with a misaligned destination.

Relaxes the v_mov_b64_pseudo register class constraint to match
av_mov_b64_pseudo's register class.
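As a rough illustration of the decision in the title, the following C++ sketch models the expansion with strings instead of machine instructions. The function `expandMovB64`, the textual assembly, and the rule that a 64-bit move needs an even-aligned destination pair are assumptions made for illustration; the real logic lives in `SIInstrInfo.cpp` and covers more cases than shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model, not SIInstrInfo code: expand a 64-bit immediate move
// into the VGPR pair (Reg, Reg + 1). Assumes the 64-bit move needs the pair
// to start at an even register; otherwise two 32-bit moves are emitted.
std::vector<std::string> expandMovB64(unsigned Reg, uint64_t Imm) {
  const std::string Lo = std::to_string(static_cast<uint32_t>(Imm));
  const std::string Hi = std::to_string(static_cast<uint32_t>(Imm >> 32));
  if (Reg % 2 == 0) {
    // Aligned destination: a single 64-bit move is allowed.
    return {"v_mov_b64 v[" + std::to_string(Reg) + ":" +
            std::to_string(Reg + 1) + "], " + std::to_string(Imm)};
  }
  // Misaligned destination: move the low and high halves separately.
  return {"v_mov_b32 v" + std::to_string(Reg) + ", " + Lo,
          "v_mov_b32 v" + std::to_string(Reg + 1) + ", " + Hi};
}
```

For example, `expandMovB64(3, 0x100000002)` yields two `v_mov_b32` moves, writing 2 to `v3` and 1 to `v4`.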
Delta  File
+22 -14  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+30 -0  llvm/test/CodeGen/AMDGPU/misaligned-vgpr-regsequence.mir
+21 -0  llvm/test/CodeGen/AMDGPU/siloadstoreopt-misaligned-regsequence.ll
+20 -0  llvm/test/CodeGen/AMDGPU/av_movimm_pseudo_expansion.mir
+9 -0  llvm/test/CodeGen/AMDGPU/v_mov_b64_expansion.mir
+3 -2  llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir
+105 -16  1 file not shown
+106 -17  7 files

LLVM/project a726b19 llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-chained.ll

NFC: Clean up AArch64/partial-reduce-chained.ll

This test had some unused loop attributes.
Also cleaned up the flags a little bit.
Delta  File
+16 -15  llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+16 -15  1 file

LLVM/project 6a2d74d llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Handle nsz in copysign SimplifyDemandedFPClass (#176916)

If the only sign bit difference is for 0, fold through the source.
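To make the condition concrete, here is a small C++ check of one reading of it: `copysign(Mag, Sign)` may be replaced by its magnitude operand when the two can only differ in the sign of a zero, which nsz allows the optimizer to ignore. The helper name is hypothetical, and it checks individual values, whereas SimplifyDemandedFPClass reasons about the known FP classes of all possible inputs.

```cpp
#include <cassert>
#include <cmath>

// Illustrative per-value check, not InstCombine code: can copysign(Mag, Sign)
// be replaced by Mag when the sign of zero does not matter (nsz)?
// copysign only rewrites the sign bit, so the two values can differ only in
// that bit; the replacement is acceptable when they agree or Mag is +/-0.
bool foldableUnderNsz(double Mag, double Sign) {
  const double Result = std::copysign(Mag, Sign);
  if (std::signbit(Result) == std::signbit(Mag))
    return true;      // Bit-identical: the fold is always fine.
  return Mag == 0.0;  // Only a zero's sign changed (false for NaN and nonzero).
}
```

For instance, `foldableUnderNsz(-0.0, 1.0)` is true because only the sign of zero changes, while `foldableUnderNsz(-2.0, 1.0)` is false.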
Delta  File
+30 -0  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+2 -4  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+32 -4  2 files

LLVM/project 5823ce0 clang/include/clang/Analysis/Analyses/LifetimeSafety MovedLoans.h Facts.h, clang/lib/Analysis/LifetimeSafety MovedLoans.cpp Facts.cpp

Revisit handling moved origins
Delta  File
+108 -0  clang/lib/Analysis/LifetimeSafety/MovedLoans.cpp
+66 -5  clang/lib/Analysis/LifetimeSafety/Facts.cpp
+32 -24  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+37 -17  clang/test/Sema/warn-lifetime-safety.cpp
+44 -0  clang/include/clang/Analysis/Analyses/LifetimeSafety/MovedLoans.h
+36 -6  clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+323 -52  15 files not shown
+480 -117  21 files

LLVM/project 1dbc705 llvm/lib/Transforms/Scalar SimplifyCFGPass.cpp

[SimplifyCFG] Increase iterative simplification convergence limit. (#178406)

https://github.com/llvm/llvm-project/commit/a9b0776a81e84d8042716863842fe1f8adf39cad
added an assertion to avoid infinite loops. However, the limit seems
arbitrary: there is no justification for it in either the code or
the commit message, so I think it can be increased.
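The kind of guard being discussed can be sketched as a fixpoint loop with an iteration cap. This is a hypothetical C++ illustration, not the code in `SimplifyCFGPass.cpp`; `iterateToFixpoint`, `SimplifyOnce`, and `MaxIterations` are invented names, and the actual limit value is not reproduced here.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of an iterate-until-no-change loop with a convergence
// cap; MaxIterations stands in for the limit this commit raises.
unsigned iterateToFixpoint(const std::function<bool()> &SimplifyOnce,
                           unsigned MaxIterations) {
  unsigned Iterations = 0;
  // Keep simplifying while the previous round changed something.
  while (SimplifyOnce()) {
    ++Iterations;
    // Guard against infinite loops: too many rounds means no convergence.
    assert(Iterations < MaxIterations && "simplification did not converge");
  }
  return Iterations;
}
```

In a loop shaped like this, raising the cap only changes when the assertion fires; inputs that already converged under the old limit behave the same.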
Delta  File
+1 -1  llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
+1 -1  1 file

LLVM/project d8c17dc llvm/lib/Target/AMDGPU VOP3PInstructions.td, llvm/test/CodeGen/AMDGPU wmma-gfx12-convergent.mir

[AMDGPU] Ensure all WMMA instructions are marked as convergent (#178314)

This is an extension of
https://github.com/llvm/llvm-project/pull/165602. It is needed to fix an
issue with V_WMMA_F32_16X16X16_F16_twoaddr_w32 being incorrectly sunk by
machine-sink.

All WMMA instructions in AMDGPUGenInstrInfo.inc were verified to be
marked as convergent.

---------

Signed-off-by: John Lu <John.Lu at amd.com>
Delta  File
+151 -2  llvm/test/CodeGen/AMDGPU/wmma-gfx12-convergent.mir
+2 -2  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+153 -4  2 files

LLVM/project 995b340 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

Fix using Known as input
Delta  File
+2 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+2 -3  1 file

LLVM/project d014487 llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

InstCombine: Handle multiple use copysign

Handle multiple use copysign in SimplifyDemandedFPClass
Delta  File
+36 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+7 -7  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+7 -0  llvm/include/llvm/Support/KnownFPClass.h
+50 -10  3 files

LLVM/project f304b41 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

Address comments
Delta  File
+3 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+3 -3  1 file

LLVM/project 51a9fcc llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Handle nsz in copysign SimplifyDemandedFPClass

If the only sign bit difference is for 0, fold through the source.
Delta  File
+31 -1  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+2 -4  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+33 -5  2 files

LLVM/project 914ca54 llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Add baseline tests for SimplifyDemandedFPClass copysign improvements (#176915)

Prepare to support more folds and multiple uses.
Delta  File
+651 -0  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+651 -0  1 file

LLVM/project 7803b4a llvm/lib/Target/AMDGPU AMDGPUAttributor.cpp, llvm/test/CodeGen/AMDGPU amdgpu-attributor-trap-leaf.ll amdgpu-attributor-nocallback-intrinsics.ll

AMDGPU: Add support for llvm.trap to handling of intrinsics with !nocallback (#175230)

This adds support for allowlisting trap intrinsics when handling
intrinsics without nocallback. This fixes the issues behind the previous
revert of #131759.

The attributor was exiting early whenever it saw an intrinsic without the nocallback bit, so trap-only kernels lost all the inferred “no implicit arg” metadata and their amdgpu-agpr-alloc=0 guarantees. That conservative fallback broke certain workloads by forcing unnecessary implicit arguments and AGPR reservations. This patch lets the pass recognize leaf-like trap intrinsics, so they no longer poison the analysis.
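A simplified C++ model of the change, contrasting the old early exit with the new trap exception, is sketched below. All names are hypothetical, and the real pass works on LLVM call sites and attributes rather than a flat list of calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the decision described above; not AMDGPUAttributor code.
enum class IntrinsicKind { Trap, Other };

struct IntrinsicCall {
  IntrinsicKind Kind;
  bool HasNoCallback; // whether the callee carries the nocallback attribute
};

// Old behavior: any intrinsic without nocallback stops the analysis.
bool canInferOld(const std::vector<IntrinsicCall> &Calls) {
  for (const IntrinsicCall &C : Calls)
    if (!C.HasNoCallback)
      return false;
  return true;
}

// New behavior: trap is treated as a leaf and no longer blocks inference.
bool canInferNew(const std::vector<IntrinsicCall> &Calls) {
  for (const IntrinsicCall &C : Calls)
    if (!C.HasNoCallback && C.Kind != IntrinsicKind::Trap)
      return false;
  return true;
}
```

For a kernel whose only intrinsic lacking nocallback is a trap, `canInferOld` gives up while `canInferNew` lets the analysis continue.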

---------

Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
Delta  File
+65 -0  llvm/test/CodeGen/AMDGPU/amdgpu-attributor-trap-leaf.ll
+16 -3  llvm/test/CodeGen/AMDGPU/amdgpu-attributor-nocallback-intrinsics.ll
+12 -2  llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+93 -5  3 files

LLVM/project 86d0509 llvm/test/CodeGen/AMDGPU fneg-combines.f16.ll fneg-combines.ll, llvm/test/CodeGen/AMDGPU/GlobalISel llvm.amdgcn.s.buffer.load.ll

Rebase

Created using spr 1.3.7
Delta  File
+56,025 -0  llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+14,154 -5,110  llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+850 -5,393  llvm/test/CodeGen/RISCV/fpclamptosat.ll
+2,230 -3,501  llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+2,626 -2,303  llvm/test/CodeGen/AMDGPU/fneg-combines.ll
+2,801 -1,573  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.s.buffer.load.ll
+78,686 -17,880  833 files not shown
+121,849 -36,064  839 files

LLVM/project 6e0881c clang/lib/CIR/CodeGen CIRGenCall.cpp, clang/test/CIR/CodeGen misc-attrs.cpp

[CIR] 3 more 'quick' function attribute lowerings through LLVMIRDialect (#178443)

This patch lowers 3 more attributes, two of which are trivial, and one
which has a touch of a complication.

The two trivial ones are no_caller_saved_registers and nocallback, which
are language-level attributes that are effectively just passed on.

The final one is a touch more complicated, as it is a 'string'
attribute: modular-format. Its LLVM-IR name also contains a dash,
which isn't allowed in a name in the LLVM-IR MLIR Dialect (see the
inline comment). Its string value also has some consequence (it is
checked in LLVM), but it is simply passed through to LLVM directly.
Delta  File
+52 -8  clang/lib/CIR/CodeGen/CIRGenCall.cpp
+40 -0  clang/test/CIR/CodeGen/misc-attrs.cpp
+37 -0  mlir/test/Target/LLVMIR/Import/instructions.ll
+33 -0  mlir/test/Target/LLVMIR/llvmir.mlir
+18 -0  mlir/test/Dialect/LLVMIR/func.mlir
+18 -0  mlir/test/Target/LLVMIR/Import/function-attributes.ll
+198 -8  7 files not shown
+261 -8  13 files

LLVM/project 7d9f720 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Improve single-use fneg(fabs(x)) SimplifyDemandedFPClass handling (#176360)

Match the multi-use case's logic for understanding no-nan/no-inf
context.
Also only apply the nsz handling in the single use case. alive2 seems to
treat nsz as nondeterministic for each use.
Delta  File
+265 -11  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+73 -20  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+338 -31  2 files

LLVM/project 13b7307 clang/lib/Driver/ToolChains Linux.cpp, clang/test/Driver linux-header-search.cpp linux-cross.cpp

[Clang] Include clang standard lib header directory from Linux (#175593)

Summary:
LLVM-libc stores its headers in the target-specific include
directory. This PR makes the Linux toolchain include the standard library
directory when used. This allows LLVM-libc, and any other standard
language headers installed there, to work. This directory is searched first.
Delta  File
+13 -0  clang/test/Driver/linux-header-search.cpp
+5 -0  clang/lib/Driver/ToolChains/Linux.cpp
+3 -0  clang/test/Driver/linux-cross.cpp
+0 -0  clang/test/Driver/Inputs/basic_linux_tree/usr/include/x86_64-unknown-linux-gnu/.keep
+0 -0  clang/test/Driver/Inputs/basic_linux_tree/usr/bin/.keep
+21 -0  5 files

LLVM/project 0694daa llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/X86 binop-shuffle-mask1-cm.ll

[VectorCombine] Fix typo in foldPermuteOfBinops cost calculation (#178072)

Addresses an issue in #173153. That patch expanded the supported ops for
folding binary ops through shuffles, but seemingly had a typo that
could inaccurately increase the unmodified cost.
Delta  File
+18 -0  llvm/test/Transforms/VectorCombine/X86/binop-shuffle-mask1-cm.ll
+1 -1  llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+19 -1  2 files

LLVM/project e05ce8f clang/lib/Driver/ToolChains SPIRV.cpp

[SPIRV] Properly discover LLVM tools that live next to the compiler (#178779)

Summary:
When we compile with `-emit-llvm`, it will try to use `llvm-link`. The
toolchain does not properly add the driver directory as a valid path, so
this falls back to the user's search path. Like other tools, this
should prioritize the binaries living next to the compiler.

Side note: why is this not the default behavior?
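The intended lookup order can be sketched as follows. This is a hypothetical C++ illustration, not the driver's actual toolchain code; `findTool` and its parameters are invented for the example, with the filesystem check passed in as a predicate so the ordering logic stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the lookup order described above; not the clang
// driver API. Exists abstracts the filesystem so the logic is testable.
std::optional<std::string>
findTool(const std::string &Name, const std::string &DriverDir,
         const std::vector<std::string> &UserPath,
         const std::function<bool(const std::string &)> &Exists) {
  // Directory of the running compiler first, then the user's search path.
  std::vector<std::string> Dirs{DriverDir};
  Dirs.insert(Dirs.end(), UserPath.begin(), UserPath.end());
  for (const std::string &Dir : Dirs) {
    std::string Candidate = Dir + "/" + Name;
    if (Exists(Candidate))
      return Candidate;
  }
  return std::nullopt;
}
```

Because the driver directory is tried first, an `llvm-link` installed next to the compiler takes precedence over one found on the user's search path.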
Delta  File
+3 -0  clang/lib/Driver/ToolChains/SPIRV.cpp
+3 -0  1 file

LLVM/project 1a86c14 offload/liboffload/API Platform.td, offload/liboffload/src OffloadImpl.cpp

[Offload] Add a function to register an RPC Server callback (#178774)

Summary:
We provide an RPC server to manage calls initiated by the device to run
on the host. This is very useful for the built-in handling we have;
however, there are cases where we would want to extend this
functionality.

Cases like Fortran or MPI would be useful, but we cannot put references
to these in the core offloading runtime. This way, we can provide this
as a library interface that registers custom handlers for whatever code
people want.
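To illustrate the idea of extending the server with externally registered handlers, here is a hypothetical C++ dispatcher. The class, opcodes, and handler signature are invented for illustration and do not reflect the actual `olRegisterRPCCallback` interface or the RPC protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical host-side dispatcher illustrating "register a custom handler";
// this is NOT the liboffload API or its signature.
class RPCDispatcher {
public:
  using Handler = std::function<uint64_t(uint64_t)>;
  static constexpr uint32_t BuiltinEcho = 0; // stand-in for a built-in service

  // Library code (e.g. a Fortran or MPI runtime) registers its own opcodes.
  void registerCallback(uint32_t Opcode, Handler H) {
    Custom[Opcode] = std::move(H);
  }

  // Serve one device request: built-ins first, then registered callbacks.
  std::optional<uint64_t> handle(uint32_t Opcode, uint64_t Arg) const {
    if (Opcode == BuiltinEcho)
      return Arg;
    auto It = Custom.find(Opcode);
    if (It == Custom.end())
      return std::nullopt; // unknown opcode: nobody handles it
    return It->second(Arg);
  }

private:
  std::unordered_map<uint32_t, Handler> Custom;
};
```

With a registry like this, the core runtime never references Fortran or MPI symbols directly; those libraries plug in their own handlers at startup.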
Delta  File
+66 -0  offload/test/libc/rpc_callback.c
+26 -8  offload/plugins-nextgen/common/src/RPC.cpp
+30 -0  offload/unittests/OffloadAPI/platform/olRegisterRPCCallback.cpp
+23 -0  offload/liboffload/API/Platform.td
+7 -0  offload/libomptarget/interface.cpp
+6 -0  offload/liboffload/src/OffloadImpl.cpp
+158 -8  3 files not shown
+169 -8  9 files

LLVM/project b6e7a71 clang/include/clang/Analysis/Analyses/LifetimeSafety MovedLoans.h, clang/lib/Analysis/LifetimeSafety MovedLoans.cpp FactsGenerator.cpp

Revisit handling moved origins
Delta  File
+121 -0  clang/lib/Analysis/LifetimeSafety/MovedLoans.cpp
+32 -24  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+37 -17  clang/test/Sema/warn-lifetime-safety.cpp
+45 -0  clang/include/clang/Analysis/Analyses/LifetimeSafety/MovedLoans.h
+24 -12  clang/lib/Analysis/LifetimeSafety/Checker.cpp
+26 -9  clang/lib/Sema/AnalysisBasedWarnings.cpp
+285 -62  15 files not shown
+422 -106  21 files

LLVM/project 5080e0e llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Handle nsz in copysign SimplifyDemandedFPClass

If the only sign bit difference is for 0, fold through the source.
Delta  File
+31 -1  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+2 -4  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+33 -5  2 files

LLVM/project 5a6b498 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

Fix using Known as input
Delta  File
+2 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+2 -3  1 file

LLVM/project bad165e llvm/include/llvm/Support KnownFPClass.h, llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

InstCombine: Handle multiple use copysign

Handle multiple use copysign in SimplifyDemandedFPClass
Delta  File
+36 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+7 -7  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+7 -0  llvm/include/llvm/Support/KnownFPClass.h
+50 -10  3 files

LLVM/project 1fbea7e llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

Address comments
Delta  File
+3 -3  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+3 -3  1 file

LLVM/project 6c2469a llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Add baseline tests for SimplifyDemandedFPClass copysign improvements

Prepare to support more folds and multiple uses.
Delta  File
+651 -0  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+651 -0  1 file

LLVM/project d67cb8e llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

One use check
Delta  File
+7 -4  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+1 -1  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+8 -5  2 files

LLVM/project 8257000 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp

Remove redundant hasOneUse check
Delta  File
+1 -2  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+1 -2  1 file

LLVM/project c4fd242 llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

Add another test
Delta  File
+18 -0  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+18 -0  1 file

LLVM/project 84c091a llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: Improve single-use fneg(fabs(x)) SimplifyDemandedFPClass handling

Match the multi-use case's logic for understanding no-nan/no-inf context.
Also only apply the nsz handling in the single use case. alive2 seems to treat
nsz as nondeterministic for each use.
Delta  File
+244 -11  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+74 -20  llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+318 -31  2 files