LLVM/project 7e2411cclang/docs StandardCPlusPlusModules.rst

[clang][docs] Add link to C++ modules Wikipedia page to docs (#169200)

This PR adds a link to the "[Modules
(C++)](https://en.wikipedia.org/wiki/Modules_(C++))" page on Wikipedia
and a similar link to cppreference, as recommended by another
contributor.
DeltaFile
+4-1clang/docs/StandardCPlusPlusModules.rst
+4-11 files

LLVM/project 399166cllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

fix: resolve issue after rebase
DeltaFile
+0-15llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-151 files

LLVM/project 4289849bolt/docs BAT.md

Improve formatting in BAT.md (#170254)

Make "Header" a subheading to improve readability in the Functions table
section.
DeltaFile
+1-0bolt/docs/BAT.md
+1-01 files

LLVM/project f411dc4llvm/lib/Target/AMDGPU SIISelLowering.h

Fix build
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIISelLowering.h
+1-11 files

LLVM/project ea3fdc5llvm/lib/Analysis InstructionSimplify.cpp, llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp

Avoid maxnum(sNaN, x) optimizations / folds (#170181)

The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN,
x)` has become controversial, and there are ongoing discussions about
which behaviour we want to specify in the LLVM IR LangRef.

See:
  - https://github.com/llvm/llvm-project/issues/170082
  - https://github.com/llvm/llvm-project/pull/168838
  - https://github.com/llvm/llvm-project/pull/138451
  - https://github.com/llvm/llvm-project/pull/170067
  - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimizations and constant-folding support for
`maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should
allow for some more flexibility so the implementation can conform to
either the old or new version of the semantics specified without any
changes.
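
As a rough illustration of the distinction this patch draws, the sketch below
classifies IEEE-754 binary32 bit patterns as quiet or signaling NaNs and folds
`maxnum` only in the quiet case. It assumes the common encoding in which a set
top mantissa bit marks a quiet NaN. All names are hypothetical and are not
LLVM APIs; the real logic lives in InstructionSimplify.cpp and DAGCombiner.cpp.

```cpp
#include <cassert>
#include <cstdint>

// IEEE-754 binary32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
constexpr std::uint32_t kExpMask = 0x7F800000u;
constexpr std::uint32_t kMantMask = 0x007FFFFFu;
// Assumption: the most significant mantissa bit set means "quiet".
constexpr std::uint32_t kQuietBit = 0x00400000u;

inline bool isNaNBits(std::uint32_t b) {
  return (b & kExpMask) == kExpMask && (b & kMantMask) != 0;
}

inline bool isQuietNaNBits(std::uint32_t b) {
  return isNaNBits(b) && (b & kQuietBit) != 0;
}

inline bool isSignalingNaNBits(std::uint32_t b) {
  return isNaNBits(b) && (b & kQuietBit) == 0;
}

// Hypothetical fold helper working on raw bit patterns.
// Returns true and writes `result` only when one operand is a quiet NaN,
// in which case maxnum yields the other operand. Any sNaN operand blocks
// the fold, so either proposed sNaN semantics can be applied later.
inline bool tryFoldMaxnumWithNaN(std::uint32_t lhs, std::uint32_t rhs,
                                 std::uint32_t &result) {
  if (isSignalingNaNBits(lhs) || isSignalingNaNBits(rhs))
    return false; // leave sNaN cases unfolded
  if (isQuietNaNBits(lhs)) {
    result = rhs;
    return true;
  }
  if (isQuietNaNBits(rhs)) {
    result = lhs;
    return true;
  }
  return false; // no NaN operand: outside the scope of this sketch
}
```

Working on bit patterns rather than `float` values avoids accidentally
quieting a signaling NaN when it passes through floating-point registers.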

    [4 lines not shown]
DeltaFile
+37-8llvm/test/CodeGen/X86/fminnum.ll
+37-8llvm/test/CodeGen/X86/fmaxnum.ll
+32-7llvm/test/CodeGen/ARM/fminmax-folds.ll
+22-13llvm/test/Transforms/InstSimplify/fminmax-folds.ll
+11-14llvm/lib/Analysis/InstructionSimplify.cpp
+7-8llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+146-586 files not shown
+174-6612 files

LLVM/project f5dd2dcllvm/cmake/modules CrossCompile.cmake

[cmake] Fix semicolon expansion when passing LLVM_TABLEGEN_FLAGS (#169518)

This patch uses a common workaround for CMake's expansion of semicolons
to spaces.
DeltaFile
+3-1llvm/cmake/modules/CrossCompile.cmake
+3-11 files

LLVM/project 0e6d612llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 neon-anyof-splat.ll expand-select.ll

[AArch64] Improve select dagcombine (#169925)

An AnyOf reduction (aka vector.reduce.or) with a fixed-width vector is
canonicalized to a bitcast of the mask vector to an integer of the same
overall size, which is then compared against zero.

If the scalar result of the bitcast is smaller than the element size of
vectors being selected, we often end up with suboptimal codegen. This
fixes the main cases, removing scalarized code.
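
The canonical form described above can be modelled in scalar code. This is
only a sketch of the idea, not the DAG combine itself: an 8-lane boolean mask
stands in for `<8 x i1>`, the packing function stands in for the bitcast to
`i8`, and the lane-to-bit order is an assumption. Function names are
hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack an 8-lane boolean mask into one byte, mirroring a bitcast from
// <8 x i1> to i8. Lane 0 maps to bit 0 here (an assumption of this sketch).
inline std::uint8_t bitcastMaskToInt(const std::array<bool, 8> &mask) {
  std::uint8_t bits = 0;
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      bits = static_cast<std::uint8_t>(bits | (1u << lane));
  return bits;
}

// AnyOf reduction in its canonical form: bitcast the mask, compare with zero.
inline bool anyOf(const std::array<bool, 8> &mask) {
  return bitcastMaskToInt(mask) != 0;
}
```

Note that the scalar result here is 8 bits wide, which is narrower than the
elements of most vectors being selected; that size mismatch is the situation
the commit message describes.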
DeltaFile
+67-0llvm/test/CodeGen/AArch64/neon-anyof-splat.ll
+20-30llvm/test/CodeGen/AArch64/expand-select.ll
+9-6llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+96-363 files

LLVM/project 153c7e4runtimes/cmake/Modules WarningFlags.cmake

[libc++] Use private CMake flags to enable the pragma system_header macro when building (#138826)

That property doesn't need to be propagated beyond the translation units
of the built libc++ library itself.
DeltaFile
+1-1runtimes/cmake/Modules/WarningFlags.cmake
+1-11 files

LLVM/project 709640dllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

skip expanding out-of-order events
DeltaFile
+143-20llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+42-12llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+185-322 files

LLVM/project 7e993fbllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Address reviewer feedback: fix getWaitCountMax and reduce code duplication

- Fix getWaitCountMax() to use correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
DeltaFile
+18-32llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+18-321 files

LLVM/project a28ab4ellvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

add run line for diff GPU Gen and counter types
DeltaFile
+567-203llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+567-2031 files

LLVM/project 48c7b23llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

[AMDGPU] Add -amdgpu-expand-waitcnt-profiling option for PC-sampling profiling
DeltaFile
+230-0llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+172-22llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+402-222 files

LLVM/project 74552a4llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp AMDGPUInstrInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel extractelement.i128.ll implicit-kernarg-backend-usage-global-isel.ll

Revert "AMDGPU: Fix treating unknown mem operands as uniform (#168980)"

This reverts commit d23e1765a9a7cc52673e374be7869f5f0ffc6486.
DeltaFile
+52-222llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
+29-45llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
+10-8llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+5-3llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
+1-1llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+97-2795 files

LLVM/project a21cbdallvm/lib/Target/AMDGPU AMDGPUCallLowering.cpp

Fix build
DeltaFile
+2-2llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+2-21 files

LLVM/project 5202766llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

Remove comment
DeltaFile
+0-1llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+0-11 files

LLVM/project acde738llvm/lib/Target/AMDGPU AMDGPUCallLowering.cpp

Use MF variable
DeltaFile
+1-1llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+1-11 files

LLVM/project d23e176llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp AMDGPUInstrInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel extractelement.i128.ll implicit-kernarg-backend-usage-global-isel.ll

AMDGPU: Fix treating unknown mem operands as uniform (#168980)

The test changes are mostly GlobalISel specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with new
regbankselect.

There is an additional regression for addrspacecast for cov4. We
probably ought to be using a separate PseudoSourceValue for the
access of the queue pointer.
DeltaFile
+222-52llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i128.ll
+43-27llvm/test/CodeGen/AMDGPU/GlobalISel/implicit-kernarg-backend-usage-global-isel.ll
+8-10llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+3-5llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
+1-1llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+277-955 files

LLVM/project 9d55129llvm/lib/Target/AMDGPU AMDGPUCallLowering.cpp

Use helper
DeltaFile
+1-3llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+1-31 files

LLVM/project 2054131llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslator-amdgpu_kernel.ll regbankselect-widen-scalar-loads.mir

AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads

This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.
DeltaFile
+216-216llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
+76-76llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
+73-73llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-load.mir
+22-9llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20-7llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+8-8llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-split-scalar-load-metadata.mir
+415-3894 files not shown
+433-39110 files

LLVM/project 5c59c10llvm/include/llvm/IR RuntimeLibcalls.td, llvm/test/Transforms/Util/DeclareRuntimeLibcalls aix.ll

PowerPC: Add vec_malloc functions to AIX in RuntimeLibcalls
DeltaFile
+7-0llvm/test/Transforms/Util/DeclareRuntimeLibcalls/aix.ll
+4-0llvm/include/llvm/IR/RuntimeLibcalls.td
+11-02 files

LLVM/project 7bced74llvm/test/CodeGen/X86 combine-icmp.ll

[X86] combine-icmp.ll - fix copy+paste typo in concat_icmp_v64i8_v16i8 test (#170281)

I changed the condcode for variety but failed to update the constant to prevent constant folding
DeltaFile
+84-12llvm/test/CodeGen/X86/combine-icmp.ll
+84-121 files

LLVM/project 9ba5fa2llvm/test/Analysis/Delinearization validation_large_size.ll

[Delinearization] Add test for inferred array size exceeds integer range (NFC) (#169048)

Add test cases where the delinearized arrays may not satisfy the
following "common" property:

`&A[I_1][I_2]...[I_n] == &A[J_1][J_2]...[J_n]` iff
`(I_1, I_2, ..., I_n) == (J_1, J_2, ..., J_n)`

The root cause of this issue is that the inferred array size is too
large and the offset calculation overflows.
Such results should be discarded during validation. This will be fixed
by #169902.
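
A minimal sketch of how the property can fail, assuming a two-dimensional
array and 64-bit wrapping arithmetic for the offset. The helper names are
hypothetical, and the actual analysis works on SCEV expressions rather than
concrete integers.

```cpp
#include <cassert>
#include <cstdint>

// Linear element offset of A[i][j] when the inner dimension holds
// `innerSize` elements. Unsigned arithmetic wraps modulo 2^64, which
// models the overflow described above.
inline std::uint64_t linearOffset(std::uint64_t i, std::uint64_t j,
                                  std::uint64_t innerSize) {
  return i * innerSize + j;
}

// True when two different subscript pairs land on the same offset,
// i.e. the "equal address iff equal subscripts" property is violated.
inline bool subscriptsCollide(std::uint64_t i1, std::uint64_t j1,
                              std::uint64_t i2, std::uint64_t j2,
                              std::uint64_t innerSize) {
  const bool sameSubscripts = (i1 == i2) && (j1 == j2);
  return !sameSubscripts &&
         linearOffset(i1, j1, innerSize) == linearOffset(i2, j2, innerSize);
}
```

With an inner size of 2^63, `A[0][0]` and `A[2][0]` both map to offset 0
because `2 * 2^63` wraps around, whereas a small inner size such as 16 keeps
the two subscripts apart.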
DeltaFile
+140-0llvm/test/Analysis/Delinearization/validation_large_size.ll
+140-01 files

LLVM/project 3098bfeclang/docs ReleaseNotes.rst, llvm/docs ReleaseNotes.md

[llvm][Docs] Add release notes about dwarf fission with relaxations (#169871)

DeltaFile
+3-0clang/docs/ReleaseNotes.rst
+2-0llvm/docs/ReleaseNotes.md
+5-02 files

LLVM/project f741851llvm/include/llvm/Transforms/Utils ARMCommonInstCombineIntrinsic.h, llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp

Revert "[AArch64][ARM] Move ARM-specific InstCombine transforms into `Transforms/Utils` (#169589)"

This reverts commit 1c32b6f51ccaaf9c65be11d7dca9e5a476cddb5a due to failures on
BUILD_SHARED_LIBS builds.
DeltaFile
+0-135llvm/lib/Transforms/Utils/ARMCommonInstCombineIntrinsic.cpp
+104-0llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+0-56llvm/include/llvm/Transforms/Utils/ARMCommonInstCombineIntrinsic.h
+0-14llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+0-13llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+1-1llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll
+105-2194 files not shown
+107-22310 files

LLVM/project e8bf011llvm/include/llvm/Transforms/Vectorize LoopVectorizationLegality.h, llvm/lib/Transforms/Vectorize LoopVectorizationLegality.cpp

[LV] Emit better debug and opt-report messages when vectorization is disallowed in the LoopVectorizer (#158513)

While looking into fixing #158499, I found some other cases where the
messages emitted could be improved. This PR improves both the messages
printed to the debug output and the missed-optimization messages in
cases where:

- loop vectorization is explicitly disabled
- loop vectorization is implicitly disabled by disabling all loop
transformations
- loop vectorization is set to happen only where explicitly enabled

A branch that should currently be unreachable is also added. If the
related logic ever breaks (e.g. due to changes to getForce() or the
ForceKind enum), this should alert developers and users. New test cases
are also added to verify that the correct messages (and only those) are
emitted.
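
The three cases and the defensive branch can be sketched as follows. This is
a hypothetical model, not the code in LoopVectorizationLegality.cpp: the enum
values, the `Reason` type, the function names, and the boolean parameters are
all stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the hint state and the reported reason.
enum class ForceKind { Undefined, Disabled, Enabled };
enum class Reason {
  Allowed,
  ExplicitlyDisabled,
  ImplicitlyDisabled,
  NotExplicitlyEnabled
};

// Assumption: an undefined hint becomes "disabled" when all loop
// transformations are disabled for the loop.
inline ForceKind effectiveForce(ForceKind stored, bool allTransformsDisabled) {
  if (stored == ForceKind::Undefined && allTransformsDisabled)
    return ForceKind::Disabled;
  return stored;
}

inline Reason classify(ForceKind stored, bool allTransformsDisabled,
                       bool onlyWhenForced) {
  const ForceKind force = effectiveForce(stored, allTransformsDisabled);
  if (force == ForceKind::Disabled) {
    if (stored == ForceKind::Disabled)
      return Reason::ExplicitlyDisabled;
    if (allTransformsDisabled)
      return Reason::ImplicitlyDisabled;
    // Should be unreachable today; aborting here makes a future change to
    // effectiveForce() or ForceKind fail loudly instead of silently.
    std::abort();
  }
  if (onlyWhenForced && force != ForceKind::Enabled)
    return Reason::NotExplicitlyEnabled;
  return Reason::Allowed;
}
```

Separating the stored hint from the effective one is what lets the explicit
and implicit cases produce different messages.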

---------

    [2 lines not shown]
DeltaFile
+94-0llvm/test/Transforms/LoopVectorize/diag-disabled-vectorization-msgs.ll
+24-4llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+9-0llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+127-43 files

LLVM/project 4b6ad11llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize hoist-predicated-loads-with-predicated-stores.ll

[VPlan] Sink predicated stores with complementary masks. (#168771)

Extend the logic to hoist predicated loads
(https://github.com/llvm/llvm-project/pull/168373) to sink predicated
stores with complementary masks in a similar fashion.

The patch refactors some of the existing logic for legality checks to be
shared between hoisting and sinking, and adds a new sinking transform on
top.

With respect to the legality checks, for sinking stores the code also
checks for stores that may alias, not only loads.
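
At the source level, the effect of sinking two stores with complementary
masks can be sketched as below. This is an illustration under simplifying
assumptions (plain vectors, no aliasing accesses between the two stores), not
the VPlan transform itself, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: two predicated stores to the same address, guarded by a mask
// and its complement.
inline void storeWithComplementaryMasks(std::vector<int> &dst,
                                        const std::vector<bool> &cond,
                                        const std::vector<int> &x,
                                        const std::vector<int> &y) {
  assert(dst.size() == cond.size() && dst.size() == x.size() &&
         dst.size() == y.size());
  for (std::size_t i = 0; i < dst.size(); ++i) {
    if (cond[i])
      dst[i] = x[i];
    if (!cond[i])
      dst[i] = y[i];
  }
}

// After: exactly one of the two masks is always true, so the stores can be
// replaced by a single unconditional store of a select.
inline void storeSunkSelect(std::vector<int> &dst,
                            const std::vector<bool> &cond,
                            const std::vector<int> &x,
                            const std::vector<int> &y) {
  assert(dst.size() == cond.size() && dst.size() == x.size() &&
         dst.size() == y.size());
  for (std::size_t i = 0; i < dst.size(); ++i)
    dst[i] = cond[i] ? x[i] : y[i];
}
```

The rewrite is only valid when no other load or store that may alias
`dst[i]` sits between the two original stores, which is the condition the
legality checks guard.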

PR: https://github.com/llvm/llvm-project/pull/168771
DeltaFile
+206-203llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+216-102llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+7-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+1-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+430-3054 files

LLVM/project 753f47dllvm/lib/Target/X86 X86ISelLowering.cpp

[X86] Make VBMI2 funnel shifts use VSHLD/VSHRD for const splats (#169401)

Make ISD::FSHL/FSHR legal on VBMI2 vector targets and convert to VSHLD/VSHRD in a combine

closes #166949
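
For reference, the per-lane behaviour of a generic funnel shift can be
modelled in scalar code. The sketch below follows the semantics of LLVM's
`llvm.fshl` and `llvm.fshr` intrinsics for 32-bit lanes; the function names
are hypothetical, and it says nothing about the operand order or encoding of
the VSHLD/VSHRD instructions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left: concatenate hi:lo into a 64-bit value, shift left by
// amt modulo 32, and keep the upper 32 bits.
inline std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo,
                            std::uint32_t amt) {
  const std::uint32_t s = amt % 32u;
  if (s == 0)
    return hi; // avoid an undefined shift by 32
  return (hi << s) | (lo >> (32u - s));
}

// Funnel shift right: concatenate hi:lo, shift right by amt modulo 32,
// and keep the lower 32 bits.
inline std::uint32_t fshr32(std::uint32_t hi, std::uint32_t lo,
                            std::uint32_t amt) {
  const std::uint32_t s = amt % 32u;
  if (s == 0)
    return lo; // avoid an undefined shift by 32
  return (hi << (32u - s)) | (lo >> s);
}
```

A constant splat shift amount means every lane uses the same `amt`, which is
the case this commit targets. Passing the same value for `hi` and `lo` turns
either function into a rotate.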
DeltaFile
+48-14llvm/lib/Target/X86/X86ISelLowering.cpp
+48-141 files

LLVM/project 4580350llvm/test/MC/AArch64 seh-large-func-multi-epilog.s seh-packed-unwind.s

[AArch64] [test] Make unwind info tests actually use the right instructions

This makes them match the expected decoding of the unwind info
opcodes, avoiding mismatch indications from "dumpbin -unwindinfo".
DeltaFile
+19-19llvm/test/MC/AArch64/seh-large-func-multi-epilog.s
+2-2llvm/test/MC/AArch64/seh-packed-unwind.s
+1-1llvm/test/MC/AArch64/seh-packed-epilog.s
+22-223 files

LLVM/project 3e5b86cllvm/test/MC/AArch64 seh.s

[AArch64] [test] Write the seh.s test output object to a file

This is what is done in other tests; this makes it easier to
inspect the output of this test manually.
DeltaFile
+2-1llvm/test/MC/AArch64/seh.s
+2-11 files

LLVM/project e50ac8allvm/test/MC/AArch64 seh.s

[AArch64] [test] Move tests for custom unwind opcodes to a separate function

These custom opcodes disable the checker for having the prologue
length actually match the opcodes (see checkARM64Instructions in
MCWin64EH.cpp) - which led to the prologue mismatching the opcodes
by one instruction, since 312d6b488ef9d7c0e4d649827820db7285e36406.

Move the special opcodes to a separate test function.

Remove the mismatched nop instruction at the end of the main
function, as this prologue now is assembled with the strict length
checking enabled.
DeltaFile
+41-22llvm/test/MC/AArch64/seh.s
+41-221 files