LLVM/project 35ebb8cllvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU fptosi-sat-vector.ll fptoui-sat-vector.ll

[AMDGPU] Saturate at i16 for f16 to i1/i8 conversion (#187467)

By using a native `v_cvt_i16/u16_f16` conversion and saturating at `i16`,
we avoid the additional `f16` to `f32` conversion that is required to
saturate at `i32`. It also allows clamping to be performed with
`i16` instructions, reducing the number of registers needed in *true16* mode
in some of the lit tests. The behavior is disabled for pre-gfx8 targets
by checking `has16BitInsts()`.
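
As an illustration of why saturating at `i16` first is safe, the sketch below
models the signed case in Python and checks every `f16` bit pattern. It is a
toy model of the semantics, not the AMDGPU lowering itself; the helper names
`fptosi_sat`, `clamp_signed` and `all_f16_values` are made up for this example.

```python
import struct

def fptosi_sat(x, bits):
    """Saturating float -> signed int conversion (NaN maps to 0)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if x != x:          # NaN
        return 0
    if x <= lo:         # also covers -inf
        return lo
    if x >= hi:         # also covers +inf
        return hi
    return int(x)       # truncates toward zero

def clamp_signed(v, bits):
    """Clamp an integer to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))

def all_f16_values():
    """Yield every half-precision value, including infinities and NaNs."""
    for pattern in range(1 << 16):
        yield struct.unpack("<e", struct.pack("<H", pattern))[0]

# Count inputs where "saturate at i16, then clamp to i8" differs from
# saturating directly at i8.
mismatches = sum(
    1 for x in all_f16_values()
    if clamp_signed(fptosi_sat(x, 16), 8) != fptosi_sat(x, 8)
)
```

In this model the two-step route agrees with direct `i8` saturation for all
inputs, which is the property the narrower intermediate type relies on.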
DeltaFile
+313-433llvm/test/CodeGen/AMDGPU/fptosi-sat-vector.ll
+314-430llvm/test/CodeGen/AMDGPU/fptoui-sat-vector.ll
+37-21llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+22-33llvm/test/CodeGen/AMDGPU/fptosi-sat-scalar.ll
+20-30llvm/test/CodeGen/AMDGPU/fptoui-sat-scalar.ll
+706-9475 files

LLVM/project bda702cllvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h

review: address suggestion on hasDivergence flag
DeltaFile
+26-16llvm/include/llvm/ADT/GenericUniformityImpl.h
+0-3llvm/include/llvm/ADT/GenericUniformityInfo.h
+26-192 files

LLVM/project da8d0abflang/test/Lower/Intrinsics merge.f90 minloc.f90

[flang][NFC] Converted five tests from old lowering to new lowering (part 36) (#187628)

Tests converted from test/Lower/Intrinsics: maxloc.f90, maxval.f90,
merge.f90, merge_bits.f90, minloc.f90
DeltaFile
+62-40flang/test/Lower/Intrinsics/merge.f90
+35-58flang/test/Lower/Intrinsics/minloc.f90
+35-58flang/test/Lower/Intrinsics/maxloc.f90
+33-36flang/test/Lower/Intrinsics/maxval.f90
+45-24flang/test/Lower/Intrinsics/merge_bits.f90
+210-2165 files

LLVM/project 19b0c68llvm/lib/Transforms/Vectorize LoopVectorize.cpp LoopVectorizationPlanner.h, llvm/test/Transforms/LoopVectorize/AArch64 transform-narrow-interleave-to-widen-memory-epilogue-vec.ll

[VPlan] Skip epilogue vectorization if dead after narrowing IGs. (#187016)

When narrowing interleave groups, the main vector loop processes IC
iterations instead of VF * IC. Update selectEpilogueVectorizationFactor
to use the effective VF, checking if the canonical IV controlling the
loop now steps by UF instead of VFxUF.

This avoids epilogue vectorization with dead epilogue vector loops and
also prevents crashes in cases where we can prove both the epilogue and
scalar loop are dead.

Fixes https://github.com/llvm/llvm-project/issues/186846

PR: https://github.com/llvm/llvm-project/pull/187016
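
A rough sketch of the "effective VF" idea described above, under the
assumption that the only signal is the step of the canonical IV. The function
names are hypothetical and the real logic in
selectEpilogueVectorizationFactor is more involved.

```python
def effective_vf(vf, uf, iv_step):
    """Toy model: VF the main vector loop effectively runs at.

    If the canonical IV steps by UF only, interleave groups were narrowed
    and each unrolled part covers a single original iteration.
    """
    if iv_step == uf:
        return 1
    assert iv_step == vf * uf, "unexpected IV step in this toy model"
    return vf

def iterations_per_main_loop_step(vf, uf, iv_step):
    """Original loop iterations covered by one main vector loop iteration."""
    return effective_vf(vf, uf, iv_step) * uf
```

For example, with VF = 4 and UF = 2, a loop whose IV steps by 2 covers 2
iterations per step in this model, not 8, so an epilogue factor chosen against
VF = 4 would be based on the wrong remainder size.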
DeltaFile
+39-18llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+53-0llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-epilogue-vec.ll
+2-2llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+94-203 files

LLVM/project 2600c72libc/src/__support/File file.cpp

[libc][NFC] Fix typo in file.cpp (#91192) (#187688)

Corrected language and spelling errors in a comment within file.cpp.

Credit GH user @iBlanket for identifying this typo.
DeltaFile
+1-1libc/src/__support/File/file.cpp
+1-11 files

LLVM/project 5712b1cllvm/lib/Target/AMDGPU DSInstructions.td, llvm/test/MC/AMDGPU gfx13_asm_vds.s gfx13_asm_vds_alias.s

Add VDS encoding for gfx13
DeltaFile
+1,987-0llvm/test/MC/AMDGPU/gfx13_asm_vds.s
+179-149llvm/lib/Target/AMDGPU/DSInstructions.td
+147-0llvm/test/MC/AMDGPU/gfx13_asm_vds_alias.s
+2,313-1493 files

LLVM/project a6a3433clang/lib/StaticAnalyzer/Core SimpleSValBuilder.cpp, clang/test/Analysis allow-equality-of-stack-and-symbolic-ptr.c ptr-arith.c

[analyzer] Don't rule out symbolic pointer pointing to stack (#187080)

Ensure that the analyzer doesn't rule out the equality (or guarantee
disequality) of a pointer to the stack and a symbolic pointer in unknown
space. Previously the analyzer incorrectly assumed that stack pointers
cannot be equal to symbolic pointers in unknown space.

It is true that functions cannot validly return pointers to their own
stack frame, but they can easily return a pointer to some other stack
frame (e.g. a function can return a pointer received as an argument).

The old behavior was introduced intentionally in 2012 by commit
3563fde6a02c2a75d0b4ba629d80c5511056a688, but it causes incorrect
analysis, e.g. it prevents the correct handling of some testcases from
the Juliet suite because it rules out the "fgets succeeds" branch.

Reported-by: Daniel Krupp <daniel.krupp at ericsson.com>
DeltaFile
+61-0clang/test/Analysis/allow-equality-of-stack-and-symbolic-ptr.c
+2-6clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+1-1clang/test/Analysis/ptr-arith.c
+64-73 files

LLVM/project bdc8d92clang/lib/Headers/llvm_libc_wrappers ctype.h string.h

[OFFLOAD] Add GPU wrappers for headers currently supported by SPIRV built libc (#181913)

This adds GPU wrappers for headers that are currently supported by
libc built for SPIRV.
DeltaFile
+3-2clang/lib/Headers/llvm_libc_wrappers/ctype.h
+3-2clang/lib/Headers/llvm_libc_wrappers/string.h
+6-42 files

LLVM/project 1dfd268llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 vector-reverse.ll

[VPlan] Simplify mul x, -1 -> sub 0, x (#187551)

Simplify exactly as InstCombine does. A follow-up would include
simplifying add x, (sub 0, y) -> sub x, y.

Alive2 proof: https://alive2.llvm.org/ce/z/Af7QiD
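
Besides the Alive2 proof, the identity is easy to check exhaustively for a
small bit width. The sketch below assumes 8-bit wrapping arithmetic and models
-1 as the all-ones bit pattern; it is only an illustration, not part of the
patch.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0xFF, also the bit pattern of -1

def mul_minus_one(x):
    """mul x, -1 with wrap-around at BITS bits."""
    return (x * MASK) & MASK

def sub_from_zero(x):
    """sub 0, x with wrap-around at BITS bits."""
    return (0 - x) & MASK

# Compare both forms for every 8-bit input, including the minimum value.
all_equal = all(mul_minus_one(x) == sub_from_zero(x) for x in range(1 << BITS))
```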
DeltaFile
+16-16llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+5-5llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reverse-load-store.ll
+9-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4-4llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-interleave.ll
+3-3llvm/test/Transforms/LoopVectorize/RISCV/dbg-tail-folding-by-evl.ll
+2-2llvm/test/Transforms/LoopVectorize/AArch64/vector-reverse.ll
+39-303 files not shown
+42-339 files

LLVM/project b6accfallvm/test/Transforms/LoopVectorize induction-ptrcasts.ll

[LV] Regen induction-ptrcasts test with UTC (NFC) (#187678)
DeltaFile
+51-13llvm/test/Transforms/LoopVectorize/induction-ptrcasts.ll
+51-131 files

LLVM/project b2da42allvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 revec-reductions.ll

Outlined, fix PHIs and split vector nodes

Created using spr 1.3.7
DeltaFile
+120-40llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+49-19llvm/test/Transforms/SLPVectorizer/AArch64/revec-reductions.ll
+169-592 files

LLVM/project 39d6bb2lldb/source/Plugins/SymbolLocator/SymStore SymbolLocatorSymStore.cpp, lldb/test/API/symstore TestSymStore.py TestSymStoreLocal.py

[lldb] Add HTTP support in SymbolLocatorSymStore (#186986)

The initial version of SymbolLocatorSymStore supported servers only on
local paths. This patch extends it to HTTP/HTTPS end-points. For that to
work on Windows, we add a WinHTTP-based HTTP client backend in LLVM next
to the existing CURL-based implementation.

We don't add an HTTP server implementation, because there is no use for
one right now. Test coverage for the LLVM part is built on llvm-debuginfod-find
and works server-less, since it checks textual output of request
headers. The existing CURL-based implementation uses the same approach.
The LLDB API test for the specific SymbolLocatorSymStore feature spawns
an HTTP server from Python.

To keep the size of this patch within reasonable limits, the initial
implementation of the SymbolLocatorSymStore feature is dumb: There is no
caching, no verification of downloaded files and no protection against
file corruption. We use a local implementation of LLVM's
HTTPResponseHandler, but should think about extracting and reusing

    [9 lines not shown]
DeltaFile
+246-13llvm/lib/Support/HTTP/HTTPClient.cpp
+146-7lldb/source/Plugins/SymbolLocator/SymStore/SymbolLocatorSymStore.cpp
+153-0lldb/test/API/symstore/TestSymStore.py
+0-117lldb/test/API/symstore/TestSymStoreLocal.py
+0-33llvm/test/tools/llvm-debuginfod-find/headers.test
+33-0llvm/test/tools/llvm-debuginfod-find/headers-curl.test
+578-1704 files not shown
+617-17210 files

LLVM/project 22f5b8dlibclc/clc/lib/generic/math clc_acos.inc clc_acos.cl

libclc: Update acos (#187666)

This was originally ported from rocm device libs in
efeafa1bdaa715733fc100bcd9d21f93c7272368; merge in more
recent changes.
DeltaFile
+114-105libclc/clc/lib/generic/math/clc_acos.inc
+2-0libclc/clc/lib/generic/math/clc_acos.cl
+116-1052 files

LLVM/project a021a93clang/include/clang/Analysis MacroExpansionContext.h, clang/lib/Analysis MacroExpansionContext.cpp

Revert "Reapply [clang][analyzer] Format macro expansions" (#186614)

This reverts commit 6bc779506107d3a2f3cbf266621fc19649f593c5 (#172479).
Some concerns were raised on discourse:

https://discourse.llvm.org/t/can-we-link-clang-format-into-clanganalysis/89014/7

To be on the safe side, let's revert this for now.
DeltaFile
+134-70clang/test/Analysis/plist-macros-with-expansion.cpp
+0-37clang/unittests/Analysis/MacroExpansionContextTest.cpp
+0-30clang/lib/Analysis/MacroExpansionContext.cpp
+0-11clang/include/clang/Analysis/MacroExpansionContext.h
+2-2clang/test/Analysis/plist-macros-with-expansion-ctu.c
+2-2clang/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
+138-1522 files not shown
+139-1558 files

LLVM/project 214bc4dllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx10_2fptosi_satcvtds.ll avx10_2_512fptosi_satcvtds.ll

[X86][AVX10.2] Canonicalize narrow FP_TO_{S,U}INT_SAT (#186786)

When SatWidth < DstWidth, type legalization left narrow SatVT in
carrier-width nodes.

Example:
v8i32 = fp_to_sint_sat v8f32, sat=i24

Canonicalize narrow SatVT forms on AVX10.2.
Preserve existing legal full-width lowering.

Rewrite narrow SatVT forms to full-width sat + clamp.

Results:
v8i32 = fp_to_sint_sat v8f32, sat=i32
v8i32 = smax ..., min_i24
v8i32 = smin ..., max_i24

Avoid scalar i48 isel failures.

    [2 lines not shown]
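
The rewrite above (full-width saturation followed by a clamp) can be modeled
in a few lines. This is a scalar sketch of the semantics for the `sat=i24`
example, assuming NaN converts to 0; the function names are illustrative and
are not taken from X86ISelLowering.cpp.

```python
MIN_I24 = -(1 << 23)
MAX_I24 = (1 << 23) - 1

def fp_to_sint_sat(x, bits):
    """Saturating float -> signed int conversion (NaN maps to 0)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if x != x:
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return int(x)  # truncates toward zero

def narrow_sat_via_full_width(x):
    """sat=i32 conversion, then smax with min_i24 and smin with max_i24."""
    wide = fp_to_sint_sat(x, 32)
    return min(max(wide, MIN_I24), MAX_I24)

samples = [0.0, 3.7, -3.7, 1e10, -1e10, 8388607.5, float("inf"), float("nan")]
agrees = all(narrow_sat_via_full_width(x) == fp_to_sint_sat(x, 24) for x in samples)
```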
DeltaFile
+408-0llvm/test/CodeGen/X86/avx10_2fptosi_satcvtds.ll
+137-3llvm/test/CodeGen/X86/avx10_2_512fptosi_satcvtds.ll
+46-16llvm/lib/Target/X86/X86ISelLowering.cpp
+591-193 files

LLVM/project 277bd13clang/lib/StaticAnalyzer/Core CallEvent.cpp

[analyzer] Fix logic in CallEvent::getReturnValueUnderConstruction (#187020)

The `CallEvent` has data members that store the `LocationContext` and
the `CFGElementRef` (i.e. `CFGBlock` + index of statement within that
block); but the method `getReturnValueUnderConstruction` ignored these
and used the currently analyzed `LocationContext` and `CFGBlock` instead
of them.

This was logically incorrect and would have caused problems if the
`CallEvent` was used later when the "currently analyzed" things are
different. However, the lit tests do pass even if I assert that the
currently analyzed `LocationContext` and `CFGBlock` are the same as the
ones saved in the `CallEvent`, so I'm pretty sure that there was no
actual problem caused by this bad logic and this commit won't cause
functional changes.

I also evaluated this change on a set of open source projects (postgres,
tinyxml2, libwebm, xerces, bitcoin, protobuf, qtbase, contour, openrct2)
and validated that it doesn't change the results of the analysis.
DeltaFile
+4-6clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+4-61 files

LLVM/project 73f9769libc/src/__support/CPP iterator.h

fix iterator
DeltaFile
+1-3libc/src/__support/CPP/iterator.h
+1-31 files

LLVM/project e8ac71elibc/src/__support/CPP iterator.h

fix iterator
DeltaFile
+1-3libc/src/__support/CPP/iterator.h
+1-31 files

LLVM/project 334ab49llvm/lib/CodeGen CodeGenPrepare.cpp, llvm/test/Transforms/CodeGenPrepare/AArch64 ptrauth.ll

[CGP][PAC] Flip PHI and blends when all immediate modifiers are the same

GVN PRE, SimplifyCFG and possibly other passes may hoist the call to
`@llvm.ptrauth.blend` intrinsic, introducing multiple duplicate call
instructions hidden behind a PHI node. This prevents the instruction
selector from generating safer code by absorbing the address and
immediate modifiers into separate operands of AUT, PAC, etc. pseudo
instructions.

This patch makes the CodeGenPrepare pass detect when a discriminator is
computed as a PHI node with all incoming values being blends with the
same immediate modifier. Each such discriminator value is replaced by a
single blend, whose address argument is computed by a PHI node.
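
The shape of the rewrite can be sketched on a toy tuple-based IR. Everything
here is hypothetical (the tuple encoding, the function name, the block
labels); it only illustrates turning a PHI of blends into a blend of a PHI
when all immediate modifiers match.

```python
def flip_phi_of_blends(incoming):
    """incoming: list of (pred_block, value) pairs feeding a PHI.

    Each value is expected to look like ("blend", addr, imm). Returns
    ("blend", ("phi", [(pred_block, addr), ...]), imm) when every incoming
    value is a blend with the same immediate, otherwise None.
    """
    imms = set()
    addrs = []
    for pred, value in incoming:
        is_blend = (
            isinstance(value, tuple) and len(value) == 3 and value[0] == "blend"
        )
        if not is_blend:
            return None
        _, addr, imm = value
        imms.add(imm)
        addrs.append((pred, addr))
    if len(imms) != 1:          # mixed (or no) immediate modifiers
        return None
    return ("blend", ("phi", addrs), imms.pop())

same_imm = [("bb1", ("blend", "%a", 42)), ("bb2", ("blend", "%b", 42))]
mixed_imm = [("bb1", ("blend", "%a", 42)), ("bb2", ("blend", "%b", 7))]
flipped = flip_phi_of_blends(same_imm)
```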
DeltaFile
+142-0llvm/test/Transforms/CodeGenPrepare/AArch64/ptrauth.ll
+75-0llvm/lib/CodeGen/CodeGenPrepare.cpp
+217-02 files

LLVM/project 172c0bbclang-tools-extra/clang-tidy/tool check_alphabetical_order_test.py check_alphabetical_order.py

[clang-tidy] Fix alphabetical order check for multiline doc entries and whitespace handling (#186950)

The `check_alphabetical_order.py` script previously only scanned the
first line of each bullet point in `ReleaseNotes.rst`, causing sorting
failures when a `:doc:` tag was split across multiple lines.

Also, when sorting the last entry of a section, the script inserted
unnecessary whitespace.

This PR fixes these two problems.
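
To illustrate the first problem, here is a minimal sketch of extracting a sort
key from a bullet whose `:doc:` tag spans two lines. The regex and the
`sort_key` helper are assumptions made for this example and are not the
actual code in `check_alphabetical_order.py`.

```python
import re

# Hypothetical pattern for a :doc:`label <target>` reference.
DOC_RE = re.compile(r":doc:`(?P<label>[^`<]+?)\s*<[^`>]+>`")

def sort_key(bullet_lines):
    """Join all lines of a bullet before searching for the :doc: label."""
    text = " ".join(line.strip() for line in bullet_lines)
    match = DOC_RE.search(text)
    return match.group("label").strip().lower() if match else text.lower()

entry = [
    "- Improved :doc:`bugprone-foo",
    "  <clang-tidy/checks/bugprone/foo>` check.",
]
first_line_only = DOC_RE.search(entry[0])   # None: the tag is incomplete
key = sort_key(entry)                       # label found after joining lines
```

Scanning only `entry[0]` finds no complete tag, while joining the lines first
recovers the label used for ordering.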
DeltaFile
+43-3clang-tools-extra/clang-tidy/tool/check_alphabetical_order_test.py
+14-6clang-tools-extra/clang-tidy/tool/check_alphabetical_order.py
+57-92 files

LLVM/project 03cd306libc/utils/wctype_utils gen.py

remove flag
DeltaFile
+1-1libc/utils/wctype_utils/gen.py
+1-11 files

LLVM/project 66bc565utils/bazel/llvm-project-overlay/mlir/python BUILD.bazel

[BAZEL] Add missing affine python enum gen (#187669)
DeltaFile
+10-4utils/bazel/llvm-project-overlay/mlir/python/BUILD.bazel
+10-41 files

LLVM/project 7410a49llvm/test/CodeGen/AArch64 ptrauth-isel.ll

[AArch64][PAC] Precommit ptrauth-isel.ll tests on calls and tail calls
DeltaFile
+209-0llvm/test/CodeGen/AArch64/ptrauth-isel.ll
+209-01 files

LLVM/project 2d4b569llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64InstrInfo.td, llvm/lib/Target/AArch64/GISel AArch64CallLowering.cpp

[AArch64][PAC] Rework discriminator analysis for calls and tail calls

Make use of fixupBlendComponents for AUTH_TCRETURN[_BTI] and for
BLRA[_RVMARKER] pseudos the same way it is done for AUT/PAC/AUTPAC.

This patch unifies discriminator analysis for DAGISel and GlobalISel
and improves cross-BB analysis in case of DAGISel.
DeltaFile
+18-41llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+23-16llvm/test/CodeGen/AArch64/ptrauth-isel.ll
+6-18llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
+3-1llvm/lib/Target/AArch64/AArch64InstrInfo.td
+2-2llvm/test/CodeGen/AArch64/ptrauth-call.ll
+52-785 files

LLVM/project df3b24alibc/utils/wctype_utils gen.py, libc/utils/wctype_utils/conversion hex_writer.py

format
DeltaFile
+4-5libc/utils/wctype_utils/gen.py
+1-2libc/utils/wctype_utils/conversion/hex_writer.py
+5-72 files

LLVM/project 2a19234llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 deactivation-symbols.ll

Move handling of COPY $xzr here from PR147136
DeltaFile
+4-8llvm/test/CodeGen/AArch64/deactivation-symbols.ll
+6-0llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+10-82 files

LLVM/project 21f439fllvm/lib/Passes PassBuilder.cpp PassRegistry.def, llvm/lib/Transforms/Scalar LoopRotation.cpp

[LoopRotate] Use SCEV exit counts to improve rotation profitability (#187483)

Most loop transformations, like unrolling and vectorization, expect the
latch branch to be countable. Allow rotation if it turns the latch from
uncountable to countable.

This uses SCEV to check for countable exits if CheckExitCount is set.
Currently it is not set for the LPM1 run (where SCEV is not used by
other passes), only in LPM.

With that, compile-time impact is mostly neutral:

https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u

ClamAV is consistently slower (~+0.15%) and 7zip faster in most cases
(~-0.13%)

Across a large test set based on C/C++ workloads, this rotates ~0.8%
more loops with ~2.68M rotated loops.

    [16 lines not shown]
DeltaFile
+164-0llvm/test/Transforms/PhaseOrdering/AArch64/loop-rotate-to-enable-unrolling-and-vectorization.ll
+155-0llvm/test/Transforms/LoopRotate/rotate-exitcount.ll
+20-9llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+12-6llvm/lib/Transforms/Scalar/LoopRotation.cpp
+12-4llvm/lib/Passes/PassBuilder.cpp
+5-3llvm/lib/Passes/PassRegistry.def
+368-224 files not shown
+376-2810 files

LLVM/project 1083efclibc/utils/wctype_utils gen.py, libc/utils/wctype_utils/conversion hex_writer.py

format
DeltaFile
+2-1libc/utils/wctype_utils/conversion/hex_writer.py
+2-1libc/utils/wctype_utils/gen.py
+4-22 files

LLVM/project 14de6dallvm/lib/Target/SPIRV SPIRVAsmPrinter.cpp SPIRVModuleAnalysis.h, llvm/test/CodeGen/SPIRV/transcoding GlobalVarAnnotate.ll

[SPIR-V] Support global variable annotations in llvm.global.annotations (#187241)

The SPIR-V backend previously only supported function annotations in
llvm.global.annotations and crashed with a fatal error when encountering
global variable entries.
DeltaFile
+23-0llvm/test/CodeGen/SPIRV/transcoding/GlobalVarAnnotate.ll
+6-8llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
+5-5llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.h
+5-4llvm/lib/Target/SPIRV/SPIRVMCInstLower.cpp
+6-3llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
+45-205 files

LLVM/project 658bed5libc/src/__support/wctype perfect_hash_map.h lower_to_upper.h, libc/utils/wctype_utils/conversion hex_writer.py

[libc][wctype] Add perfect hash map for conversion functions
DeltaFile
+876-0libc/src/__support/wctype/perfect_hash_map.h
+568-0libc/src/__support/wctype/lower_to_upper.h
+553-0libc/src/__support/wctype/upper_to_lower.h
+0-400libc/src/__support/wctype/lower_to_upper.inc
+0-390libc/src/__support/wctype/upper_to_lower.inc
+71-1libc/utils/wctype_utils/conversion/hex_writer.py
+2,068-7918 files not shown
+2,256-79714 files