LLVM/project 716bd04clang/lib/CIR/CodeGen CIRGenBuiltin.cpp, clang/test/CIR/CodeGen builtins-x86.c

[CIR] Upstream generic intrinsic emission path (#179098)

This PR upstreams the generic intrinsic emission path and tests it for
the rdpmc builtin. The incubator has llvm_unreachable("NYI") when the
intrinsic return type doesn't match. This PR adds the type coercion to
handle that case.
DeltaFile
+189-1clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+42-0clang/test/CIR/CodeGen/builtins-x86.c
+28-0clang/test/CIR/CodeGenBuiltins/X86/rd-builtins.c
+259-13 files

LLVM/project 9a496ddllvm/test/CodeGen/RISCV rv32p.ll rv64p.ll

[RISCV] Rename the P extensions srx/slx tests and add fshl/fshr intrinsic tests. NFC (#180984)

I plan to change the i64 shift lowering on RV32 to use nsrl/nsra
instead of srx. Only fshr will use srx.

We now have shift tests with constant shift amounts < XLEN and >= XLEN,
and non-constant shift amounts that are fully unknown, known < XLEN, and
known >= XLEN.

Assisted-by: claude
DeltaFile
+208-10llvm/test/CodeGen/RISCV/rv32p.ll
+206-8llvm/test/CodeGen/RISCV/rv64p.ll
+414-182 files
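
For readers unfamiliar with the cases these tests cover, here is a minimal sketch of a 64-bit logical right shift built from two 32-bit halves (XLEN = 32), split on whether the shift amount is < XLEN or >= XLEN. It models only the arithmetic, using the documented semantics of the `llvm.fshr` intrinsic; it does not model the srx/nsrl/nsra instructions or what the backend actually emits. The names `fshr32`, `Halves`, `lshr64`, and `matchesNative` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift right on 32-bit values: the low 32 bits of
// (hi:lo) >> (amt % 32), matching llvm.fshr for i32.
uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t amt) {
  amt &= 31;
  if (amt == 0)
    return lo; // avoids the undefined 32-bit shift by 32 below
  return (lo >> amt) | (hi << (32 - amt));
}

// A 64-bit value held as two 32-bit halves.
struct Halves {
  uint32_t lo;
  uint32_t hi;
};

// Logical right shift of a 64-bit value on a 32-bit machine.
Halves lshr64(Halves v, uint32_t amt) {
  amt &= 63;
  if (amt < 32) // shift amount < XLEN: the low half needs bits from both halves
    return {fshr32(v.hi, v.lo, amt), v.hi >> amt};
  // shift amount >= XLEN: the low half comes only from the high half
  return {v.hi >> (amt - 32), 0u};
}

// Compares the split implementation with the native 64-bit shift
// for every shift amount from 0 to 63.
bool matchesNative(uint64_t x) {
  Halves v{static_cast<uint32_t>(x), static_cast<uint32_t>(x >> 32)};
  for (uint32_t amt = 0; amt < 64; ++amt) {
    Halves r = lshr64(v, amt);
    uint64_t got = (static_cast<uint64_t>(r.hi) << 32) | r.lo;
    if (got != (x >> amt))
      return false;
  }
  return true;
}
```

When the shift amount is a compile-time constant or known to be on one side of XLEN, only one of the two branches is needed, which is why the tests distinguish those cases.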

LLVM/project d4c7d5bclang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenBuiltinX86.cpp CIRGenBuilder.h

[CIR] X86 vector masked load builtins (#169464)

Add a new CIR masked load operation and use it to handle X86 masked load builtins.

Part of https://github.com/llvm/llvm-project/issues/167752
DeltaFile
+364-0clang/test/CIR/CodeGenBuiltins/X86/avx512vl-builtins.c
+45-0clang/include/clang/CIR/Dialect/IR/CIROps.td
+27-0clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+21-0clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+16-0clang/lib/CIR/CodeGen/CIRGenBuilder.h
+473-05 files

LLVM/project 39f2ce3libc/docs configure.rst, libc/shared rpc_util.h rpc_dispatch.h

[libc] Cleanup RPC helpers and comments

Summary:
Mostly NFC; replaces some inconsistent comments and replaces `class`
with `typename` for consistency. Also fixes the incomplete type detection
that I forgot to merge in the RPC helper PR.
DeltaFile
+41-29libc/shared/rpc_util.h
+12-9libc/shared/rpc_dispatch.h
+6-5libc/shared/rpc.h
+1-1libc/docs/configure.rst
+60-444 files

LLVM/project 9f85fa2clang/lib/Driver/ToolChains Clang.cpp, clang/test/Driver offload.f90

[Flang] Fix finding the Flang runtime for the GPU (#180971)

Summary:
We were looking for `flang_rt.builtins` instead of `flang_rt.runtime`.
Also adds a test so we know that it actually works.
DeltaFile
+7-3clang/lib/Driver/ToolChains/Clang.cpp
+5-0clang/test/Driver/offload.f90
+0-0clang/test/Driver/Inputs/resource_dir_with_per_target_subdir/lib/amdgcn-amd-amdhsa/libflang_rt.runtime.a
+12-33 files

LLVM/project 0991d7blibunwind/src Unwind-seh.cpp

[libunwind] Fix building with EXCEPTION_DISPOSITION as enum (#180513)

On Windows, libunwind is normally only built in mingw mode; it's not
relevant for MSVC targets. (Attempting to build it is entirely blocked
in CMake, see [1]).

In mingw headers, the type EXCEPTION_DISPOSITION is defined as an int
while it is an enum in the MSVC vcruntime headers.

However, in addition to the canonical enum members, we also use a value
which has no assigned enum member. In older mingw-w64 headers, there was
a define for this value, 4, with the name ExceptionExecuteHandler, but
that was removed in [2]. The libunwind code references this value just by
the literal value 4, with a comment referencing it as
ExceptionExecuteHandler.

This extra enum value isn't passed to the outside Microsoft runtime, but
is only used to pass values from one part of libunwind to another; this
handling is necessary for the forced_unwind{1-3}.pass.cpp tests to

    [24 lines not shown]
DeltaFile
+6-7libunwind/src/Unwind-seh.cpp
+6-71 files

LLVM/project e22db5cmlir/lib/Bindings/Python stubgen_runner.py, utils/bazel/llvm-project-overlay/mlir stubgen_runner.py BUILD.bazel

[MLIR] [Bazel] Moved stubgen_runner.py out of llvm-project-overlay (#181029)

It was checked into the overlay by accident.
DeltaFile
+54-0mlir/lib/Bindings/Python/stubgen_runner.py
+0-54utils/bazel/llvm-project-overlay/mlir/stubgen_runner.py
+1-1utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+55-553 files

LLVM/project c4f74a1lldb/source/Host/windows/PythonPathSetup CMakeLists.txt

[lldb][windows] fix link issue when building with dylibs (#180976)

Fix a link issue which was introduced by
https://github.com/llvm/llvm-project/pull/179306 when building with
dylibs (with MinGW).

LLVMSupport is not needed by `PythonPathSetup`. It's safe to remove it.
DeltaFile
+0-3lldb/source/Host/windows/PythonPathSetup/CMakeLists.txt
+0-31 files

LLVM/project a816f92compiler-rt/lib/scudo/standalone secondary.h primary32.h, compiler-rt/lib/scudo/standalone/include/scudo interface.h

[scudo] Add new fast purge option. (#175266)

This adds a new option to do a faster version of a purge.

When doing a release to OS due to a purge call, if another thread is
also doing a release, the call can be blocked while that operation
concludes. In some cases, code wants a fast version that releases as
quickly as possible without blocking.

For example, on Android, when destroying a Bitmap a purge occurs to save
memory. But this can cause some jank if the purge takes too long.

In the future, I envision that this option will also do a calculation to
stop purging after some cutoff value to avoid being blocked in this call
for too long.
DeltaFile
+13-4compiler-rt/lib/scudo/standalone/secondary.h
+11-2compiler-rt/lib/scudo/standalone/primary32.h
+11-2compiler-rt/lib/scudo/standalone/primary64.h
+11-0compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
+4-2compiler-rt/lib/scudo/standalone/common.h
+4-0compiler-rt/lib/scudo/standalone/include/scudo/interface.h
+54-101 files not shown
+57-107 files
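
The blocking versus non-blocking behavior described above can be sketched with a try-lock. This is a minimal illustration under stated assumptions, not scudo's implementation: the `Region` class, its `purge(bool fast)` method, and the release counter are hypothetical, and `std::mutex` stands in for whatever lock the allocator uses.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical allocator region; the names are illustrative, not scudo's API.
class Region {
public:
  // Returns true if a release ran, false if the fast path skipped it
  // because another thread was already holding the lock.
  bool purge(bool fast) {
    if (fast) {
      // Fast purge: do not wait if a release is already in progress.
      std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
      if (!lock.owns_lock())
        return false;
      releaseLocked();
      return true;
    }
    // Regular purge: block until any in-progress release has finished.
    std::lock_guard<std::mutex> lock(mutex_);
    releaseLocked();
    return true;
  }

  int releases() const { return releases_; }

private:
  // Placeholder for returning pages to the OS; must be called with the lock held.
  void releaseLocked() { ++releases_; }

  std::mutex mutex_;
  int releases_ = 0;
};
```

In this sketch a caller on a latency-sensitive path would call `purge(true)` and accept that the call may do nothing when another thread is already releasing.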

LLVM/project 0bab981clang/test/CIR/CodeGen builtin-floating-point.c, llvm/test/CodeGen/AMDGPU fptoi.i128.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+5,835-5,584llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+80-2,209llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+2,212-0clang/test/CIR/CodeGen/builtin-floating-point.c
+470-1,417llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+1,656-0llvm/test/Transforms/SLPVectorizer/AArch64/aarch64-vector-functions.ll
+0-1,411llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll
+10,253-10,6211,226 files not shown
+58,618-31,8641,232 files

LLVM/project e039949clang/test/CIR/CodeGen builtin-floating-point.c, llvm/test/CodeGen/AMDGPU fptoi.i128.ll

reb

Created using spr 1.3.7
DeltaFile
+5,835-5,584llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+80-2,209llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+2,212-0clang/test/CIR/CodeGen/builtin-floating-point.c
+470-1,417llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+1,656-0llvm/test/Transforms/SLPVectorizer/AArch64/aarch64-vector-functions.ll
+0-1,411llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll
+10,253-10,6211,226 files not shown
+58,619-31,8651,232 files

LLVM/project cfbb56bclang/test/CIR/CodeGen builtin-floating-point.c, llvm/test/CodeGen/AMDGPU fptoi.i128.ll

reb

Created using spr 1.3.7
DeltaFile
+5,835-5,584llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+80-2,209llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+2,212-0clang/test/CIR/CodeGen/builtin-floating-point.c
+470-1,417llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+1,656-0llvm/test/Transforms/SLPVectorizer/AArch64/aarch64-vector-functions.ll
+0-1,411llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll
+10,253-10,6211,226 files not shown
+58,618-31,8641,232 files

LLVM/project aab5d1aclang/test/CIR/CodeGen builtin-floating-point.c, llvm/test/CodeGen/AMDGPU fptoi.i128.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
DeltaFile
+5,835-5,584llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+80-2,209llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops-large-matrixes.ll
+2,212-0clang/test/CIR/CodeGen/builtin-floating-point.c
+470-1,417llvm/test/CodeGen/AMDGPU/fptoi.i128.ll
+1,656-0llvm/test/Transforms/SLPVectorizer/AArch64/aarch64-vector-functions.ll
+0-1,411llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll
+10,253-10,6211,226 files not shown
+58,616-31,8621,232 files

LLVM/project 97e250cclang/lib/Analysis/FlowSensitive/Models UncheckedStatusOrAccessModel.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+41-41clang/lib/Analysis/FlowSensitive/Models/UncheckedStatusOrAccessModel.cpp
+41-411 files

LLVM/project 5c114d8cross-project-tests/dtlto path.test

[TEST][FIX] Fix typo in tool name: 'llvm-ar'
DeltaFile
+2-2cross-project-tests/dtlto/path.test
+2-21 files

LLVM/project 59238f6utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[mlir] Fix #180988: Add GPUDialect and DataLayoutInterfaces to OpenACC related dependencies (#181027)

DeltaFile
+3-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+3-01 files

LLVM/project 1268e76lldb/include/lldb/Symbol Variable.h, lldb/source/Plugins/SymbolFile/DWARF SymbolFileDWARF.cpp

[LLDB][NFCI] Teach LLDB to read the DW_AT_LLVM_tag_offset attribute for variables (#181011)

LLVM added support for emitting the tagging offset of a variable for
hwasan/memtag-stack using the DW_AT_LLVM_tag_offset attribute in
dabd262. This patch teaches LLDB to read this attribute.
DeltaFile
+8-1lldb/include/lldb/Symbol/Variable.h
+5-1lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+2-2lldb/source/Symbol/Variable.cpp
+15-43 files

LLVM/project 5710e41cross-project-tests lit.site.cfg.py.in, cross-project-tests/dtlto path.test

[DTLTO][Windows] Expand short 8.3 form paths in ThinLTO module IDs (#178303)

Windows supports short 8.3 form filenames (e.g. `compile_commands.json`
-> `COMPIL~1.JSO`) for legacy reasons. See:
https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#short-vs-long-names.

Short 8.3 form paths are undesirable in distributed compilation
scenarios because they are local mappings tied to a specific directory
layout on a specific machine. As a result, they can break or defeat
sandboxing and path-based isolation mechanisms used by distributed build
systems.

We have observed such failures with DTLTO even in simple scenarios. For
example, on Windows, running:

```
clang.exe hello.c -flto=thin -fuse-ld=lld -fthinlto-distributor=fastbuild.exe -###
```


    [29 lines not shown]
DeltaFile
+92-0cross-project-tests/dtlto/path.test
+59-9llvm/lib/DTLTO/DTLTO.cpp
+14-1cross-project-tests/lit.site.cfg.py.in
+6-1llvm/include/llvm/DTLTO/DTLTO.h
+171-114 files

LLVM/project 524ae2fmlir/include/mlir/Dialect/Linalg/IR LinalgInterfaces.h, mlir/lib/Dialect/Linalg/IR LinalgInterfaces.cpp

[mlir][linalg] Make conv dim inference return pairing (outputImage, filterLoop) (#180859)

The original method sorts all the dimensions, which loses the pairing
information. This makes other transformations that work on the generic op
form harder. The revision preserves the pairing, so callers have more
useful information when they work on transformations.

---------

Signed-off-by: hanhanW <hanhan0912 at gmail.com>
DeltaFile
+173-0mlir/unittests/Dialect/Linalg/InferConvolutionDimsTest.cpp
+26-10mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
+11-0mlir/unittests/Dialect/Linalg/CMakeLists.txt
+6-2mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.h
+1-0mlir/unittests/Dialect/CMakeLists.txt
+217-125 files
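
To illustrate why sorting loses the pairing, here is a small sketch. The dimension numbers and helper names are hypothetical and this is not the MLIR API; it only shows that two lists sorted independently no longer line up index by index, whereas sorting the pairs keeps each output-image dimension next to its filter-loop dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Dim = unsigned;
using DimPair = std::pair<Dim, Dim>; // (outputImage, filterLoop)

// Hypothetical inference result: output-image dim 1 pairs with filter-loop
// dim 5, and output-image dim 2 pairs with filter-loop dim 4.
std::vector<DimPair> inferredPairs() { return {{2, 4}, {1, 5}}; }

// Old style: two independently sorted lists. Index i of one list no longer
// corresponds to index i of the other.
std::pair<std::vector<Dim>, std::vector<Dim>> sortedSeparately() {
  std::vector<Dim> outputImage, filterLoop;
  for (const DimPair &p : inferredPairs()) {
    outputImage.push_back(p.first);
    filterLoop.push_back(p.second);
  }
  std::sort(outputImage.begin(), outputImage.end());
  std::sort(filterLoop.begin(), filterLoop.end());
  return {outputImage, filterLoop};
}

// New style: sort the pairs by the output-image dim so the pairing survives.
std::vector<DimPair> sortedAsPairs() {
  std::vector<DimPair> pairs = inferredPairs();
  std::sort(pairs.begin(), pairs.end());
  return pairs;
}
```

With the separately sorted lists, index 0 reads as (1, 4), a pair that was never inferred; the paired form yields (1, 5) and (2, 4).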

LLVM/project fc64868llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 vec_list_bias_external_insert_shuffled.ll vec_list_bias-inseltpoison.ll

[SLP] Add external uses estimations into tree throttling

Added basic estimations for external uses when calculating the cost
of non-profitable trees. Stores and insertelement instructions are
excluded, as they are very good candidates for vectorization. Also tuned
the buildvector/gather cost with minimum bitwidth analysis data.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/178024
DeltaFile
+111-26llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+12-14llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias_external_insert_shuffled.ll
+12-13llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll
+12-13llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll
+2-1llvm/test/Transforms/SLPVectorizer/X86/gathered-loads-non-full-reg.ll
+149-675 files

LLVM/project da6e301lldb/tools/lldb-dap EventHelper.cpp

[lldb-dap] Adjusting multi-stopped event order. (#181001)

When multiple stopped events are detected we should send the
`"allThreadsStopped":true` last.

Currently, if there are multiple stopped threads and we attempt to step
around, the 'allThreadsStopped' event ends up with multiple stops
highlighted in the UI.

Reporting the focused thread last fixes this while still correctly
updating the thread state of all stopped threads.

This fixes an issue reported in
https://github.com/llvm/llvm-project/pull/176273#discussion_r2775979486
DeltaFile
+29-17lldb/tools/lldb-dap/EventHelper.cpp
+29-171 files
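
The ordering described above can be sketched as follows. This is a simplified illustration, not the lldb-dap code: the `StoppedEvent` struct and `orderStoppedEvents` function are hypothetical, and the sketch assumes that only the focused thread's event carries `allThreadsStopped: true`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a DAP "stopped" event.
struct StoppedEvent {
  int threadId;
  bool allThreadsStopped; // assumed to be set only for the focused thread
};

// Builds one event per stopped thread and places the focused thread's
// event (the one with allThreadsStopped = true) last.
std::vector<StoppedEvent> orderStoppedEvents(const std::vector<int> &stopped,
                                             int focused) {
  std::vector<StoppedEvent> events;
  for (int tid : stopped)
    if (tid != focused)
      events.push_back({tid, false});
  // Only report the focused thread if it is actually among the stopped ones.
  if (std::find(stopped.begin(), stopped.end(), focused) != stopped.end())
    events.push_back({focused, true});
  return events;
}
```

Because every non-focused thread is still reported, the client can update the state of all stopped threads while the last event determines what the UI highlights.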

LLVM/project 0deb1b6offload/plugins-nextgen/level_zero/src L0DynWrapper.cpp

[Offload] Try to load Level Zero loader with version suffix (#180042)

The default Level Zero loader `libze_loader.so` may not be available on
systems that don't have the Level Zero development package installed. In
that case, Level Zero loaders with a major version suffix are searched for.
DeltaFile
+28-2offload/plugins-nextgen/level_zero/src/L0DynWrapper.cpp
+28-21 files
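
A minimal sketch of the fallback search described above. Only the name `libze_loader.so` comes from the commit message; the `.so.<major>` naming of the fallbacks, the range of major versions tried, and the helper names are assumptions for illustration. The `tryLoad` callback stands in for a `dlopen`-style call so the sketch stays portable.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Candidate library names: the unversioned loader first, then fallbacks
// with a major version suffix. The range 1..maxMajor is an assumption.
std::vector<std::string> loaderCandidates(unsigned maxMajor) {
  std::vector<std::string> names{"libze_loader.so"};
  for (unsigned major = 1; major <= maxMajor; ++major)
    names.push_back("libze_loader.so." + std::to_string(major));
  return names;
}

// Returns the first candidate that tryLoad accepts, or an empty string
// if none could be loaded.
std::string findLoader(const std::function<bool(const std::string &)> &tryLoad,
                       unsigned maxMajor) {
  for (const std::string &name : loaderCandidates(maxMajor))
    if (tryLoad(name))
      return name;
  return "";
}
```

On a system without the development package, the unversioned name would fail and the search would continue with the suffixed names.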

LLVM/project 1919b3bllvm/lib/Target/SPIRV SPIRVGlobalRegistry.cpp, llvm/test/CodeGen/SPIRV single-element-vector.ll

[SPIRV] Scalarize single-element vectors in type creation (#180735)

SPIR-V requires vectors to have at least 2 components. So treat <1 x T>
as T.

Fixes: https://github.com/llvm/llvm-project/issues/171175
DeltaFile
+53-0llvm/test/CodeGen/SPIRV/single-element-vector.ll
+6-0llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.cpp
+59-02 files
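
The rule "treat <1 x T> as T" can be sketched on a toy type descriptor. The `SimpleType` struct and `legalizeForSpirv` function are hypothetical and unrelated to the actual SPIRVGlobalRegistry code; they only illustrate the mapping.

```cpp
#include <cassert>

// Hypothetical, simplified type descriptor (not the LLVM or SPIR-V API).
struct SimpleType {
  int elementBits;   // width of the scalar element type T
  unsigned numElems; // number of vector components; ignored for scalars
  bool isVector;
};

// SPIR-V vectors need at least 2 components, so a single-element
// vector <1 x T> is mapped to the scalar T. Other types pass through.
SimpleType legalizeForSpirv(SimpleType t) {
  if (t.isVector && t.numElems == 1)
    t.isVector = false;
  return t;
}
```

For example, a `<1 x i32>` descriptor comes back as a plain 32-bit scalar, while `<4 x i32>` is left unchanged.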

LLVM/project 79c281acompiler-rt/lib/asan asan_allocator.cpp, compiler-rt/test/asan/TestCases/Windows rtlsizeheap_zero.cpp

[compiler-rt][ASan][Windows] Fix false positive for zero sized rtl allocations
DeltaFile
+107-0compiler-rt/test/asan/TestCases/Windows/rtlsizeheap_zero.cpp
+44-0compiler-rt/lib/asan/asan_allocator.cpp
+151-02 files

LLVM/project 612ddd8llvm/utils/gn/secondary/llvm/unittests/Transforms/Vectorize BUILD.gn

[gn build] Port 0215f6b6cf81
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/unittests/Transforms/Vectorize/BUILD.gn
+1-01 files

LLVM/project 08f131dcompiler-rt/lib/asan asan_allocator.cpp, compiler-rt/test/asan/TestCases/Windows rtlsizeheap_zero.cpp

[compiler-rt][ASan][Windows] Fix false positive for zero sized rtl allocations
DeltaFile
+107-0compiler-rt/test/asan/TestCases/Windows/rtlsizeheap_zero.cpp
+44-0compiler-rt/lib/asan/asan_allocator.cpp
+151-02 files

LLVM/project d3afa17llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/X86 cost-model.ll

[LV] Don't scalarize loads that need predication in legacy CM.

The legacy cost model tries to scalarize loads that are used as
pointers. Skip if the load would need predicating when scalarized,
because that would incur very high costs, see useEmulatedMaskMemRefHack.

Fixes https://github.com/llvm/llvm-project/issues/180780.
DeltaFile
+83-0llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+3-3llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+86-32 files

LLVM/project 0215f6bllvm/include/llvm/Analysis DominanceFrontier.h DominanceFrontierImpl.h, llvm/include/llvm/CodeGen MachineDominanceFrontier.h

[DominanceFrontier] Support post-dominators on graphs with single root (#179336)

I plan to use this in a subsequent patch to optimize mask creation in the
VPlan predicator by `or`ing edge masks from the post-dominance frontier
instead of from all predecessors. Note that this would require using
the same unmodified post-dom tree for *all* the basic blocks in a VPlan
that is already limited to a particular loop nest, so the algorithmic
complexity concerns behind the "deprecation" notice at the beginning of
`DominanceFrontier.h` (and the discussion in the
https://discourse.llvm.org/t/dominance-frontiers/21755 thread) don't
apply to my use case (at least to the best of my understanding).

The change here is to properly use graph-traits for children traversal
plus inline `ForwardDominanceFrontierBase` into `DominanceFrontierBase` now 
that it's used for post-dom-frontier.

Since the only planned use case is in the vectorizer, I'm adding a
VPlan-based unit test along with this change.


    [2 lines not shown]
DeltaFile
+94-0llvm/unittests/Transforms/Vectorize/VPPostDomFrontierTest.cpp
+18-26llvm/include/llvm/Analysis/DominanceFrontier.h
+6-6llvm/include/llvm/Analysis/DominanceFrontierImpl.h
+4-5llvm/include/llvm/CodeGen/MachineDominanceFrontier.h
+0-2llvm/lib/CodeGen/MachineDominanceFrontier.cpp
+0-1llvm/lib/Analysis/DominanceFrontier.cpp
+122-401 files not shown
+123-407 files

LLVM/project c6329a3llvm/include/llvm/Transforms/Utils MemoryTaggingSupport.h, llvm/lib/Target/AArch64 AArch64StackTagging.cpp

[NFC] [MemoryTagging] pass AllocaInfo to isStandardLifetime (#180311)

DeltaFile
+7-8llvm/lib/Transforms/Utils/MemoryTaggingSupport.cpp
+2-4llvm/include/llvm/Transforms/Utils/MemoryTaggingSupport.h
+1-2llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
+1-2llvm/lib/Target/AArch64/AArch64StackTagging.cpp
+11-164 files

LLVM/project d518d49lldb/source/Target Process.cpp

[LLDB] Move cleanup to dtor (#181010)

I attempted to clean up the raw ptr in pr/180996, but the cleanup should
have been done in the dtor, not in `StopPrivateStateThread`, so this moves
it there.
DeltaFile
+2-2lldb/source/Target/Process.cpp
+2-21 files