LLVM/project f73f43c cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb pointer-union.test pointer-union.cpp, llvm/utils lldbDataFormatters.py

[llvm][formatters] Add LLDB formatter for llvm::PointerUnion (#175218)

We make use of the fact that the `PointerUnion` element is a
`PointerIntPair`, for which we have a synthetic provider already. We get
the `Int` portion of the pair (which is the index into the template
parameter pack of the union) to get the active type and the `Pointer`
portion of the pair to get the actual pointer value.
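
The decoding described above can be sketched in plain Python, without the `lldb` module. This is a minimal model, not the code added to `lldbDataFormatters.py`: the function name `decode_pointer_union`, the `int_bits`/`int_shift` parameters, and the assumption that the tag sits in the low bits of the word are all illustrative.

```python
from typing import NamedTuple, Sequence


class DecodedUnion(NamedTuple):
    active_type: str  # name of the active template parameter
    pointer: int      # raw word with the tag bits cleared


def decode_pointer_union(raw: int, type_names: Sequence[str],
                         int_bits: int, int_shift: int = 0) -> DecodedUnion:
    """Split a PointerIntPair-style word into (active type, pointer).

    Assumption: the tag occupies `int_bits` bits starting at bit `int_shift`.
    Simplification: only the tag bits are cleared from the pointer.
    """
    tag_mask = ((1 << int_bits) - 1) << int_shift
    index = (raw & tag_mask) >> int_shift  # `Int` portion: index into the pack
    pointer = raw & ~tag_mask              # `Pointer` portion
    if index >= len(type_names):
        raise ValueError(f"tag {index} does not name a union member")
    return DecodedUnion(type_names[index], pointer)


# A two-member union needs one tag bit; here tag 1 selects the second member.
print(decode_pointer_union(0x1000 | 1, ["Z *", "float *"], int_bits=1))
```

A real formatter would read the word and the template arguments from the debugger rather than take them as parameters.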

Before:
```
(lldb) v -T z_float
(llvm::PointerUnion<Z *, float *>) z_float = {
  (llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 0, Z *, float *>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 0, Z *, float *> = {
    (llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 1, float *>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 1, float *> = {
      (llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >, 2>) llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<Z *, float *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> > >, 2> = {
        (llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<Z *, float *> >) Val = {...}
      }
    }
  }

    [8 lines not shown]
Delta File
+86-0 cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/pointer-union.test
+55-0 llvm/utils/lldbDataFormatters.py
+28-0 cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/pointer-union.cpp
+2-0 cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/CMakeLists.txt
+171-0 4 files

LLVM/project b9859d0 llvm/lib/Target/X86 X86InstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/X86 x86-scalar-max-min.ll

[X86] InstCombine: Generalize scalar SSE MAX/MIN intrinsics (#175375)

Fixes #175162

This patch handles x86_sse_max_ss/min_ss and related intrinsics. It
checks whether it is known to be safe to convert them to
llvm.maxnum/minnum.

These intrinsics can be converted to `@llvm.maxnum` and `@llvm.minnum`.
This optimization can be done if the inputs are known to be free of NaN,
Inf, Subnormal, and NegZero. If the inputs are not known to be free of
these, the instructions remain the same.
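
As a rough illustration of why the conversion needs such a guard, the sketch below models the two operations in Python for the NaN case only. The function names are hypothetical, `x86_max_ss` encodes the MAXSS rule that the second operand is returned unless the first is strictly greater, and `maxnum` models only the "ignore a quiet NaN operand" behaviour; neither is taken from the patch.

```python
import math


def x86_max_ss(a: float, b: float) -> float:
    """Model of scalar SSE MAXSS: return b unless a is strictly greater."""
    return a if a > b else b


def maxnum(a: float, b: float) -> float:
    """Model of a maxnum-style operation: a quiet NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


nan = float("nan")
# Ordinary inputs: both models agree.
print(x86_max_ss(2.0, 3.0), maxnum(2.0, 3.0))
# NaN in the second operand: MAXSS propagates it, the maxnum model does not.
print(x86_max_ss(1.0, nan), maxnum(1.0, nan))
```

The two models agree on ordinary values and disagree once a NaN is involved, which is one reason the rewrite is only valid when the inputs are known to exclude such values.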
Delta File
+74-0 llvm/test/Transforms/InstCombine/X86/x86-scalar-max-min.ll
+40-9 llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp
+114-9 2 files

LLVM/project 128731f llvm/include/llvm/Analysis ScalarEvolution.h, llvm/lib/Analysis ScalarEvolution.cpp

[SCEV] Handle all PtrToIntExpr construction in CastSinkingRewriter (NFC) (#174435)

Move SCEVPtrToIntSinkingRewriter out of getLosslessPtrToIntExpr to be
re-used for PtrToAddr. Also streamline code in getLosslessPtrToIntExpr
by moving zero handling to the rewriter and removing special handling
for SCEVUnknown in getLosslessPtrToIntExpr. Instead, always use the
rewriter, which will automatically handle the case where the expression
is a SCEVUnknown.

This makes it slightly easier to add support for PtrToAddr as a follow-up
to https://github.com/llvm/llvm-project/pull/158032

PR: https://github.com/llvm/llvm-project/pull/174435
Delta File
+80-98 llvm/lib/Analysis/ScalarEvolution.cpp
+2-2 llvm/include/llvm/Analysis/ScalarEvolution.h
+82-100 2 files

LLVM/project 26e10cd openmp/runtime CMakeLists.txt, openmp/runtime/src CMakeLists.txt

[OpenMP] Add libomp unit test infrastructure (#168063)

(The tests in `TestKmpStr.cpp` are an automatically generated POC to
make sure things work.)
Delta File
+331-0 openmp/runtime/unittests/String/TestKmpStr.cpp
+89-0 openmp/runtime/unittests/CMakeLists.txt
+14-18 openmp/runtime/src/CMakeLists.txt
+22-0 openmp/runtime/test/Unit/lit.cfg.py
+14-0 openmp/runtime/CMakeLists.txt
+9-0 openmp/runtime/unittests/README.md
+479-18 4 files not shown
+499-19 10 files

LLVM/project 194a4d2 llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV intrinsic-cttz-elts-vscale.ll

[RISCV] Fix ReplaceNodeResults of Intrinsic::experimental_cttz_elts for RV32 (#174992)

The test case added in this patch crashes on rv32v without this fix. We
attempt to trunc the i32 type of the select produced by lowerCttzElts to
i64, which asserts. Use getZExtOrTrunc instead.
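
The difference between an unconditional truncate and a "zero-extend or truncate" helper can be modelled with plain integers. This is only a sketch of the idea: the Python functions below are hypothetical stand-ins, not the SelectionDAG API, and bit widths are passed explicitly.

```python
def trunc(value: int, from_bits: int, to_bits: int) -> int:
    """Model of a truncate: only valid when the result is narrower."""
    if to_bits >= from_bits:
        raise ValueError(f"cannot truncate i{from_bits} to i{to_bits}")
    return value & ((1 << to_bits) - 1)


def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Model of a zero-extend: only valid when the result is wider."""
    if to_bits <= from_bits:
        raise ValueError(f"cannot zero-extend i{from_bits} to i{to_bits}")
    return value & ((1 << from_bits) - 1)


def zext_or_trunc(value: int, from_bits: int, to_bits: int) -> int:
    """Pick the right conversion for the two widths; no-op if they match."""
    if to_bits > from_bits:
        return zext(value, from_bits, to_bits)
    if to_bits < from_bits:
        return trunc(value, from_bits, to_bits)
    return value & ((1 << from_bits) - 1)


# An i32 value widened to i64 works through the combined helper,
# whereas trunc(7, 32, 64) would raise.
print(zext_or_trunc(7, 32, 64))
```

In this model, the failing case in the commit corresponds to calling `trunc` with a 32-bit source and a 64-bit destination.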
Delta File
+25-8 llvm/test/CodeGen/RISCV/intrinsic-cttz-elts-vscale.ll
+1-2 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+26-10 2 files

LLVM/project b04cf3b llvm/lib/Transforms/Vectorize VPlanVerifier.cpp, llvm/test/Transforms/LoopVectorize/RISCV cse.ll

[VPlan] Remove verifier check that EVL can only be used by VPInstruction with one use (#175502)

Fixes #175028

We have a VPlanVerifier assertion that a VPInstruction that uses EVL
only has one use. This used to hold until we implemented CSE, but now we
can run into the case where e.g. a multiply from an expanded
VPWidenPointerInductionRecipe gets cse'd, causing it to have multiple
uses:

    EMIT ir<%0> = WIDEN-POINTER-INDUCTION ir<%.pre3>, ir<6>, vp<%5>
    EMIT ir<%1> = WIDEN-POINTER-INDUCTION ir<%.pre>, ir<6>, vp<%5>
    EMIT-SCALAR vp<%5> = EXPLICIT-VECTOR-LENGTH vp<%avl>

    -->

    EMIT-SCALAR vp<%10> = EXPLICIT-VECTOR-LENGTH vp<%avl>
    EMIT vp<%11> = mul ir<6>, vp<%10>
    EMIT vp<%ptr.ind> = ptradd vp<%pointer.phi>, vp<%11>

    [13 lines not shown]
Delta File
+64-0 llvm/test/Transforms/LoopVectorize/RISCV/cse.ll
+0-10 llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+64-10 2 files

LLVM/project 670ecd7 .github/workflows issue-write.yml

GHA: Add the "Check LLVM ABI" flow to issue-write (#175549)

This is needed to properly test that #172673 is working as expected.
Delta File
+1-0 .github/workflows/issue-write.yml
+1-0 1 files

LLVM/project 52d6170 clang/include/clang/Basic BuiltinsX86.td, clang/lib/AST ExprConstant.cpp

[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - Allow SSE/AVX FP MAX/MIN intrinsics to be used in constexpr (#171966)

* Implemented a generic function interp__builtin_elementwise_fp_binop
* NaN, Infinity, Denormal cases can be integrated into the lambda in
the future. For now, these cases are hardcoded in the generic function

Resolves: #169991
Delta File
+52-52 clang/lib/Headers/avx512fintrin.h
+74-0 clang/lib/AST/ByteCode/InterpBuiltin.cpp
+68-0 clang/lib/AST/ExprConstant.cpp
+32-16 clang/test/CodeGen/X86/avx512vl-builtins.c
+29-18 clang/include/clang/Basic/BuiltinsX86.td
+20-26 clang/lib/Headers/avx512vlfp16intrin.h
+275-112 11 files not shown
+380-164 17 files

LLVM/project a823a2a clang/lib/Basic/Targets SPIR.h, clang/test/Sema spirv-address-space.c

[SPIR-V] Do not allow AS(2) to convert to generic (#175275)

Summary:
The original logic permitted this, while it's not permitted by the
standard.

---------

Co-authored-by: Dmitry Sidorov <18708689+MrSidims at users.noreply.github.com>
Delta File
+3-2 clang/lib/Basic/Targets/SPIR.h
+1-2 clang/test/Sema/spirv-address-space.c
+4-4 2 files

LLVM/project 27074aa llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug-output-crash.ll

[AMDGPU] Fix crash in SIInsertWaitcnts debug output (#175518)

In some cases we were accessing `OldWaitcntInstr.getParent()->end()`
after `OldWaitcntInstr` had already been erased from its parent.
Delta File
+28-32 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+19-0 llvm/test/CodeGen/AMDGPU/waitcnt-debug-output-crash.ll
+47-32 2 files

LLVM/project 48a5c1d llvm/test/MC/AMDGPU isa-version-pal.s dl-insts-err.s

[AMDGPU] Use -filetype=null for MC tests that do not check stdout (#175543)

Delta File
+6-6 llvm/test/MC/AMDGPU/isa-version-pal.s
+5-5 llvm/test/MC/AMDGPU/dl-insts-err.s
+5-5 llvm/test/MC/AMDGPU/isa-version-unk.s
+5-5 llvm/test/MC/AMDGPU/isa-version-hsa.s
+3-3 llvm/test/MC/AMDGPU/elf-header-cov.s
+3-3 llvm/test/MC/AMDGPU/gfx950_asm_vop3.s
+27-27 24 files not shown
+58-58 30 files

LLVM/project 46016e6 libcxx/include string

[libc++] Make basic_string::__erase_external_with_move noexcept (#171591)

`__erase_external_with_move` is in the dylib, so the compiler doesn't
see the definition. Marking it `noexcept` sometimes allows clang to
remove exception-related code, improving code size slightly.
Delta File
+3-2 libcxx/include/string
+3-2 1 files

LLVM/project 85c1ce9 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass.ll

InstCombine: SimplifyDemandedFPClass multiple use support for select
Delta File
+56-6 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass.ll
+27-1 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+83-7 2 files

LLVM/project 03e1a43 clang/lib/Driver/ToolChains PS4CPU.cpp, clang/test/Driver ps5-linker.c

[PS5][Driver] forward -ffat-lto-objects to the linker (#172854)

When clang is driving the linker and is passed -ffat-lto-objects, pass
it on to the linker as --fat-lto-objects.
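
The forwarding rule can be pictured as a small mapping from driver arguments to extra linker arguments. The sketch below is a simplified model with a hypothetical function name; it ignores negated forms and argument ordering, which the real driver may handle differently.

```python
from typing import List


def extra_linker_args(driver_args: List[str]) -> List[str]:
    """Return linker flags implied by the driver flags (simplified model)."""
    extra: List[str] = []
    # The driver spelling uses a single-dash -f prefix; the linker spelling
    # described in the commit message uses a double dash.
    if "-ffat-lto-objects" in driver_args:
        extra.append("--fat-lto-objects")
    return extra


print(extra_linker_args(["-flto", "-ffat-lto-objects", "main.o"]))
```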
Delta File
+13-0 clang/test/Driver/ps5-linker.c
+4-0 clang/lib/Driver/ToolChains/PS4CPU.cpp
+17-0 2 files

LLVM/project 3b2d14b llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU] Inline two helpers in SIInsertWaitcnts. NFC. (#174557)

Delta File
+9-22 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+9-22 1 files

LLVM/project 185f078 libc/src/__support/GPU allocator.cpp

[libc] Improve SIMT control flow in the GPU allocator

Summary:
The Volta independent thread scheduling is very difficult to work with.
This is a first attempt to make the logic more sound when lanes execute
independently. This isn't all that's required, but it ends up improving
control flow for AMDGPU as well.
Delta File
+46-39 libc/src/__support/GPU/allocator.cpp
+46-39 1 files

LLVM/project 263802c flang/lib/Optimizer/Builder IntrinsicCall.cpp, flang/test/Lower/Intrinsics show_descriptor.f90

[flang] Enhance show_descriptor intrinsic to avoid extra descriptor copies (#173461)

Originally, the argument to the show_descriptor() intrinsic was declared
with the "asBox" passing mechanism. This caused a `fir.load` instruction
to be emitted to pass the descriptor "asBox", which resulted in an extra
llvm.memcpy in LLVM IR. The current change eliminates this, so that
show_descriptor() prints information about the original descriptor, not
about its copy.

The current change modifies the passing mechanism of the argument to
show_descriptor() to "asInquired". The lowering of show_descriptor() now
passes the reference to a descriptor directly to the runtime routine. If
the descriptor is passed as a value in an SSA register, then it is spilled
to the stack and its address is passed to the runtime routine. If a
non-descriptor value is passed to show_descriptor(), then this value is
spilled to the stack, wrapped into a descriptor that is also spilled to
the stack, and the resulting descriptor pointer is passed to
show_descriptor().

The show_descriptor() LIT test was modified to correspond to the new
implementation, and additional test cases were added to it.
Delta File
+111-59 flang/test/Lower/Intrinsics/show_descriptor.f90
+36-5 flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+147-64 2 files

LLVM/project f3187e7 llvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 arm64-cvt-simd-fptoi.ll arm64-cvtf-simd-itofp.ll

[AArch64][llvm] Allow FPRCVT insns to run in streaming mode if safe

For FEAT_FPRCVT instructions, allow them to run in streaming mode if safe
Delta File
+2-2 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+3-0 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-0 llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi.ll
+2-0 llvm/test/CodeGen/AArch64/arm64-cvtf-simd-itofp.ll
+2-0 llvm/test/CodeGen/AArch64/fp16_i16_intrinsic_scalar.ll
+11-2 5 files

LLVM/project 2220c00 mlir/lib/Analysis/DataFlow SparseAnalysis.cpp

[mlir][dataflow] Use OpWithFlags skipRegions to replace opName when printing op in SparseAnalysis.cpp (NFC) (#175418)

Delta File
+13-7 mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp
+13-7 1 files

LLVM/project b983b0e llvm/include/llvm/CodeGen SelectionDAGTargetInfo.h, llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

[PowerPC] Use millicode call for strcpy instead of lib call (#174782)

AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strcpy routine is a millicode implementation;
we use millicode for the strcpy function instead of a library call to
improve performance.

---------

Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
Delta File
+57-26 llvm/test/CodeGen/PowerPC/milicode64.ll
+41-21 llvm/test/CodeGen/PowerPC/milicode32.ll
+26-0 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+10-0 llvm/lib/Target/PowerPC/PPCSelectionDAGInfo.cpp
+6-4 llvm/lib/Target/SystemZ/SystemZSelectionDAGInfo.h
+4-5 llvm/include/llvm/CodeGen/SelectionDAGTargetInfo.h
+144-56 5 files not shown
+164-62 11 files

LLVM/project 14e97d6 mlir/lib/Dialect/Shape/Transforms OutlineShapeComputation.cpp

[mlir][Shape] Fix Yoda condition in OutlineShapeComputation (#174146)

Change `nullptr != inpDefOp` to `inpDefOp != nullptr` for better
readability and consistency with LLVM coding standards.
Delta File
+1-1 mlir/lib/Dialect/Shape/Transforms/OutlineShapeComputation.cpp
+1-1 1 files

LLVM/project e661230 lldb/test/API/commands/frame/var-dil/basics/LocalVars TestFrameVarDILLocalVars.py

[lldb][test] Remove unused imports in TestFrameVarDILLocalVars.py (#175541)

Delta File
+0-4 lldb/test/API/commands/frame/var-dil/basics/LocalVars/TestFrameVarDILLocalVars.py
+0-4 1 files

LLVM/project 51ee583 clang-tools-extra/include-cleaner/lib Analysis.cpp, clang-tools-extra/include-cleaner/unittests AnalysisTest.cpp

[include-cleaner] Report refs from macro-concat'd tokens as ambiguous (#175532)

Previously, we completely ignored these references because we couldn't
detect whether some pieces of a concat'd token originated from the main
file, and we wanted to prevent false positives. Unfortunately, this
results in false negatives in certain cases and is breaking builds.

After this change, include-cleaner will treat such references as
ambiguous to prevent deletion of likely-used headers (if they're already
directly included), while still giving the user the opportunity to
explicitly delete them.
Delta File
+26-0 clang-tools-extra/include-cleaner/unittests/AnalysisTest.cpp
+13-1 clang-tools-extra/include-cleaner/lib/Analysis.cpp
+39-1 2 files

LLVM/project 8380b57 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaOverload.cpp

[Clang] prevent an assertion failure caused by C++ constant expression checks in C23 floating conversions (#174113)

Fixes #173847

---

This patch addresses an assertion failure during compilation of C23 code
involving floating-point conversions.

As part of the C23 constexpr support introduced in PR #73099, Clang
began reusing parts of the C++ constant evaluation and narrowing logic.
In C23 mode, a failed constant evaluation caused the condition to
proceed to C++ constant-expression checks, resulting in an assertion
failure.

This change evaluates constants using `EvaluateAsRValue` in C23 mode and
restricts C++ constant-expression checks to C++ mode.
Delta File
+8-0 clang/test/Sema/constexpr.c
+2-1 clang/lib/Sema/SemaOverload.cpp
+1-0 clang/docs/ReleaseNotes.rst
+11-1 3 files

LLVM/project a9a39e1 llvm/lib/Target/AMDGPU SIFrameLowering.cpp, llvm/test/CodeGen/AMDGPU amdgpu-cs-chain-frame-pointer.ll

merge host code branches, so simplify expression arg to emitCSRSpillRestores
Delta File
+2-5 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+0-7 llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-frame-pointer.ll
+2-12 2 files

LLVM/project 986a1a5 llvm/test/CodeGen/AMDGPU ran-out-of-sgprs-allocation-failure.mir

lit test update after rebase from main.
Delta File
+81-102 llvm/test/CodeGen/AMDGPU/ran-out-of-sgprs-allocation-failure.mir
+81-102 1 files

LLVM/project 66cb85a llvm/test/CodeGen/AMDGPU regpressure-mitigation-with-subreg-reload.mir

test rebase
Delta File
+12-12 llvm/test/CodeGen/AMDGPU/regpressure-mitigation-with-subreg-reload.mir
+12-12 1 files

LLVM/project 710afcd llvm/lib/Target/AMDGPU #SIRegisterInfo.cpp#, llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll skip-partial-reload-for-16bit-regaccess.mir

[InlineSpiller][AMDGPU] Implement subreg reload during RA spill

Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
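
The idea of restoring only the sub-registers a use actually needs can be sketched as follows. This is a toy model and not the InlineSpiller logic: the function name, the integer sub-register indices, and the `partial` switch are all assumptions made for illustration.

```python
from typing import Iterable, List


def subregs_to_reload(tuple_size: int, used: Iterable[int],
                      partial: bool = True) -> List[int]:
    """Return the sub-register indices to restore from the spill slot.

    With partial=False the whole tuple is restored (the old behaviour as
    described above); with partial=True only the used indices are restored.
    """
    if not partial:
        return list(range(tuple_size))
    wanted = sorted(set(used))
    if any(i < 0 or i >= tuple_size for i in wanted):
        raise ValueError("sub-register index outside the tuple")
    return wanted


# An 8-wide tuple whose use touches only sub-registers 2 and 5.
print(subregs_to_reload(8, [5, 2, 5]))
print(subregs_to_reload(8, [5, 2, 5], partial=False))
```

Fewer restored sub-registers means fewer registers live after the reload, which is the register-pressure benefit the commit message describes.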
Delta File
+3,429-4,107 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+3,938-0 llvm/lib/Target/AMDGPU/#SIRegisterInfo.cpp#
+91-0 llvm/test/CodeGen/AMDGPU/skip-partial-reload-for-16bit-regaccess.mir
+35-56 llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+40-40 llvm/test/CodeGen/AMDGPU/ra-inserted-scalar-instructions.mir
+26-52 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+7,559-4,255 68 files not shown
+7,886-4,481 74 files

LLVM/project 7676097 llvm/test/CodeGen/AMDGPU regpressure-mitigation-with-subreg-reload.mir

compacted the virt-reg numbers
Delta File
+14-14 llvm/test/CodeGen/AMDGPU/regpressure-mitigation-with-subreg-reload.mir
+14-14 1 files

LLVM/project 6f69429 llvm/test/CodeGen/AMDGPU regpressure-mitigation-with-subreg-reload.mir

[AMDGPU] Test precommit for subreg reload

This test currently fails due to insufficient
registers during allocation. Once the subreg
reload is implemented, it will begin to pass
as the partial reload helps mitigate register
pressure.
Delta File
+37-0 llvm/test/CodeGen/AMDGPU/regpressure-mitigation-with-subreg-reload.mir
+37-0 1 files