LLVM/project e3aad30lldb/include/lldb/Target Process.h, lldb/source/Target Process.cpp ThreadPlanStepOverBreakpoint.cpp

[lldb] Implement delayed breakpoints
DeltaFile
+83-9lldb/source/Target/Process.cpp
+28-1lldb/include/lldb/Target/Process.h
+4-2lldb/source/Target/ThreadPlanStepOverBreakpoint.cpp
+5-0lldb/source/Target/TargetProperties.td
+120-124 files

LLVM/project a1dcec4lldb/include/lldb/Breakpoint BreakpointSite.h, lldb/include/lldb/Target Process.h

[lldb][NFC] Move BreakpointSite::IsEnabled/SetEnabled into Process

The Process class is the one responsible for managing the state of a
BreakpointSite inside the process. As such, it should be the one
answering questions about the state of the site.
DeltaFile
+23-29lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+6-14lldb/include/lldb/Breakpoint/BreakpointSite.h
+12-4lldb/source/Target/Process.cpp
+10-0lldb/include/lldb/Target/Process.h
+5-5lldb/source/Plugins/Process/MacOSX-Kernel/ProcessKDP.cpp
+8-0lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+64-525 files not shown
+72-6111 files

LLVM/project 38a505alldb/source/Plugins/Process/gdb-remote GDBRemoteCommunicationClient.cpp GDBRemoteCommunicationClient.h

[lldb][GDBRemote] Parse MultiBreakpoint+ capability
DeltaFile
+10-0lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
+3-0lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
+13-02 files

LLVM/project 8265e86lldb/include/lldb/Utility StringExtractorGDBRemote.h, lldb/packages/Python/lldbsuite/test/tools/lldb-server gdbremote_testcase.py

[lldbremote] Implement support for MultiBreakpoint packet

This is fairly straightforward, thanks to the helper functions created in
the previous commit.

https://github.com/llvm/llvm-project/pull/192910
DeltaFile
+59-0lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
+2-0lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.h
+2-0lldb/source/Utility/StringExtractorGDBRemote.cpp
+1-0lldb/include/lldb/Utility/StringExtractorGDBRemote.h
+1-0lldb/packages/Python/lldbsuite/test/tools/lldb-server/gdbremote_testcase.py
+0-1lldb/test/API/functionalities/multi-breakpoint/TestMultiBreakpoint.py
+65-16 files

LLVM/project d2ffdc3utils/bazel/llvm-project-overlay/lldb/source/Plugins BUILD.bazel

[Bazel] Fixes 920b203 (#192922)

This fixes 920b203fe2a2ad9f4403e8923b2b06518a46129e.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+2-0utils/bazel/llvm-project-overlay/lldb/source/Plugins/BUILD.bazel
+2-01 files

LLVM/project 1e6f8aamlir/lib/Tools/mlir-opt MlirOptMain.cpp, mlir/test/Dialect/Transform normal-forms.mlir

[mlir] MlirOptMain: avoid double verification (#192661)

MlirOptMain would run verification twice at the end of the processing:
  1. after the last pass in the pipeline;
  2. prior to printing.

Since there is no logic that could mutate, and thus potentially
invalidate, the IR between the two, the second verification is
redundant. Skip it when possible.
DeltaFile
+7-2mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
+1-3mlir/test/Dialect/Transform/normal-forms.mlir
+8-52 files

LLVM/project de5ac00llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp SelectionDAG.cpp

[SelectionDAG] Return poison instead of undef for out-of-bounds EXTRACT_VECTOR_ELT (#192844)

Out-of-bounds EXTRACT_VECTOR_ELT on fixed-length vectors is undefined
behavior.

Return poison instead of undef to be consistent with LangRef semantics.

Prep work to help with https://github.com/llvm/llvm-project/pull/190307
DeltaFile
+2-2llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+2-2llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+4-42 files

LLVM/project 8800beellvm/lib/Transforms/InstCombine InstructionCombining.cpp, llvm/test/Transforms/InstCombine urem-via-cmp-select.ll

InstCombine: Update assumption cache when replacing values

Fixes worklist verifier error with assumption cache

Co-authored-by: Yingwei Zheng <dtcxzyw at qq.com>
DeltaFile
+16-2llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+3-6llvm/test/Transforms/InstCombine/urem-via-cmp-select.ll
+19-82 files

LLVM/project 0b0cb9dllvm/lib/Analysis ValueTracking.cpp BasicAliasAnalysis.cpp, llvm/lib/Transforms/Utils SimplifyCFG.cpp

ValueTracking: Use SimplifyQuery for computeConstantRange (#191726)

This does introduce new context passing in a few of the updated contexts.
DeltaFile
+22-28llvm/lib/Analysis/ValueTracking.cpp
+20-20llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+18-10llvm/unittests/Analysis/ValueTrackingTest.cpp
+4-3llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+4-3llvm/lib/Analysis/BasicAliasAnalysis.cpp
+4-1llvm/test/Transforms/InstCombine/urem-via-cmp-select.ll
+72-656 files not shown
+83-7712 files

LLVM/project 962e90dclang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add lowering for amdgcn_div_scale builtins
DeltaFile
+49-0clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+27-4clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+76-42 files

LLVM/project 18c3c86clang/include/clang/Options Options.td, clang/lib/CodeGen CGExpr.cpp CGHLSLRuntime.cpp

[HLSL][Clang] Start emitting @llvm.structured.alloca (#190157)

Allow some patterns in the FE to emit this new instruction, which
produces logical pointers. Renamed the experimental-emit-sgep flag to
reflect the broader logic it gates.
This also updates the few frontend tests to reflect the newly emitted
alloca.

Next step is to handle the Mem2Reg/Reg2Mem.
DeltaFile
+10-5clang/lib/CodeGen/CGExpr.cpp
+11-4clang/lib/CodeGen/CGHLSLRuntime.cpp
+5-5clang/test/CodeGenHLSL/sgep/array_load.hlsl
+5-5clang/test/CodeGenHLSL/sgep/array_store.hlsl
+10-0llvm/include/llvm/IR/IRBuilder.h
+5-5clang/include/clang/Options/Options.td
+46-243 files not shown
+52-309 files

LLVM/project 26c45c5llvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Refactor loop
DeltaFile
+23-19llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+23-191 files

LLVM/project 8b20914llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/Analysis/CostModel/AArch64 masked_expand_load.ll

[AArch64] Lower masked.expandload intrinsic using SVE2p2/SME2p2 EXPAND (#190999)

The masked.expandload intrinsic can be lowered using the EXPAND instruction
when available, where the source vector is the result of a contiguous load
of the number of active elements in the predicate. EXPAND is available with
either feature in non-streaming mode. It is available in streaming-mode
with SME2p2, or with SVE2p2 when SME_FA64 is also enabled.
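
The availability rule in the paragraph above can be written as a small predicate. This is only an illustrative sketch in Python, not the code from AArch64ISelLowering.cpp; the function name and its boolean parameters are hypothetical stand-ins for the subtarget feature checks.

```python
def expand_available(streaming: bool, sve2p2: bool, sme2p2: bool,
                     sme_fa64: bool) -> bool:
    """Hypothetical predicate mirroring the rule stated in the commit message."""
    if not streaming:
        # Non-streaming mode: either feature is enough.
        return sve2p2 or sme2p2
    # Streaming mode: SME2p2 on its own, or SVE2p2 together with SME_FA64.
    return sme2p2 or (sve2p2 and sme_fa64)
```

For example, SVE2p2 alone is sufficient in non-streaming mode but not in streaming mode, where SME_FA64 must also be enabled.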

Intrinsics which return a fixed-width result can also be lowered using SVE
instructions when preferred, otherwise they will be scalarised by falling
back on scalarizeMaskedExpandLoad.

Scalable vectors are not supported when EXPAND is not available. In this
case, the cost model will return an invalid cost for the intrinsic.
DeltaFile
+26,606-0llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+281-0llvm/test/Analysis/CostModel/AArch64/masked_expand_load.ll
+78-12llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+89-0llvm/test/CodeGen/AArch64/scalable-expanding-load.ll
+39-0llvm/test/CodeGen/AArch64/fixed-expanding-load.ll
+31,171-122 files not shown
+31,197-148 files

LLVM/project 318a1ealibcxx/test/std/algorithms/alg.sorting/alg.sort/sort ranges.sort.pass.cpp

[libc++][test] Unblock cases for `ranges::sort` with proxy ranges (#188490)

libc++ switched to using `iter_move`/`iter_swap` a long time ago, so we
should unblock these cases.
DeltaFile
+0-3libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/ranges.sort.pass.cpp
+0-31 files

LLVM/project 4a41e9cllvm/lib/Target/AMDGPU SIFrameLowering.cpp, llvm/test/CodeGen/AMDGPU callee-frame-setup.ll

AMDGPU: Don't save FP/BP for noreturn functions (#187668)

As suggested here:
https://github.com/llvm/llvm-project/pull/184616#discussion_r2889401998

We could probably skip saving other regs too, but that can be a future
patch.

Assisted-by: Sline with Claude Opus.
DeltaFile
+49-0llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
+8-5llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+57-52 files

LLVM/project 24ec6d4llvm/lib/Target/SPIRV SPIRVBuiltins.cpp, llvm/test/CodeGen/SPIRV/transcoding OpImageSampleExplicitLod.ll OpImageReadMS.ll

[SPIR-V] Fix image query and sampler type (#190767)

- Use OpImageQuerySize instead of OpImageQuerySizeLod for multisampled
images; the SPIR-V spec requires MS=0 for OpImageQuerySizeLod
- Use `target("spirv.Sampler")` instead of i32 for non-constant sampler
kernel parameters so they produce OpTypeSampler as required by
OpSampledImage

related to https://github.com/llvm/llvm-project/issues/190736
DeltaFile
+4-4llvm/test/CodeGen/SPIRV/transcoding/OpImageSampleExplicitLod.ll
+4-2llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp
+1-1llvm/test/CodeGen/SPIRV/transcoding/OpImageReadMS.ll
+9-73 files

LLVM/project 90ecec0clang/lib/Sema SemaARM.cpp

[clang][NFC] Simplify boolean return in `SemaARM::checkTargetClonesAttr` (#192832)
DeltaFile
+1-3clang/lib/Sema/SemaARM.cpp
+1-31 files

LLVM/project 050da78llvm/test/CodeGen/RISCV/rvv vfsub-vp.ll fixed-vectors-vfsub-vp.ll

[RISCV] Remove codegen for vp_fsub (#191832)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off the vp_fsub intrinsic from #179622.
DeltaFile
+378-564llvm/test/CodeGen/RISCV/rvv/vfsub-vp.ll
+135-143llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfsub-vp.ll
+45-45llvm/test/CodeGen/RISCV/rvv/vfrsub-vp.ll
+36-36llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfrsub-vp.ll
+8-12llvm/test/CodeGen/RISCV/rvv/sink-splat-operands.ll
+9-6llvm/test/CodeGen/RISCV/rvv/fold-vp-fsub-and-vp-fmul.ll
+611-8063 files not shown
+616-8119 files

LLVM/project c3f8eccllvm/lib/Target/AMDGPU SIRegisterInfo.cpp

Refactor loop
DeltaFile
+16-23llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+16-231 files

LLVM/project d3141b4llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion indirect-br.ll

[LoopFusion] Validate loop structure before creating LoopCandidates (#192280)

This patch deletes the assert that required the loop to have a
preheader. A loop is not guaranteed to have a preheader when loops
are structured using `indirectbr`. Instead, we now rely on the header.

Fixes #156670.
DeltaFile
+59-0llvm/test/Transforms/LoopFusion/indirect-br.ll
+6-3llvm/lib/Transforms/Scalar/LoopFuse.cpp
+65-32 files

LLVM/project 69c566blldb/source/Plugins/SymbolLocator/SymStore SymbolLocatorSymStore.cpp SymbolLocatorSymStoreProperties.td, lldb/test/API/symstore TestSymStore.py

[lldb] Add caching and _NT_SYMBOL_PATH parsing in SymbolLocatorSymStore (#191782)

The _NT_SYMBOL_PATH environment variable is the idiomatic way to set a
system-wide lookup order of symbol servers and a local cache for
SymStore. It holds a semicolon-separated list of entries in the
following notations:
* srv*[<cache>*]<source> sets a source and an optional explicit cache
* cache*<cache> sets an implicit cache for all subsequent entries
* all other entries are bare local directories

Since symbol paths are closely intertwined with the caching of symbol
files, this patch proposes support in LLDB for both features at once.
ParseEnvSymbolPaths() implements the parsing logic, which processes
entries of the symbol path string from left to right to create a series
of LookupEntry objects that each store a source and a cache location.
The source of a LookupEntry can be a local directory or an HTTP server
address. The cache is a local directory or empty. This representation
unifies the implicit vs. explicit caching options from the SymStore
protocol.
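
The left-to-right parsing described above can be sketched roughly as follows. This is a Python illustration, not the C++ in SymbolLocatorSymStore.cpp: `parse_symbol_path` is a hypothetical name, and only `LookupEntry` with its source and cache fields comes from the description. Several behaviours here are assumptions: keywords are matched case-insensitively, an `srv*` entry without an explicit cache and a bare directory both pick up the most recent implicit cache, and malformed entries are treated as bare directories.

```python
from dataclasses import dataclass


@dataclass
class LookupEntry:
    source: str  # local directory or HTTP server address
    cache: str   # local cache directory, or "" when there is none


def parse_symbol_path(value: str) -> list[LookupEntry]:
    """Hypothetical sketch: split on ';' and process entries left to right."""
    entries: list[LookupEntry] = []
    implicit_cache = ""  # set by the most recent cache*<cache> entry
    for raw in value.split(";"):
        item = raw.strip()
        if not item:
            continue  # skip empty entries such as a trailing ';'
        parts = item.split("*")
        keyword = parts[0].lower()  # assumption: keywords are case-insensitive
        if keyword == "cache" and len(parts) == 2:
            # cache*<cache>: applies to all subsequent entries
            implicit_cache = parts[1]
        elif keyword == "srv" and len(parts) == 3:
            # srv*<cache>*<source>: explicit cache
            entries.append(LookupEntry(source=parts[2], cache=parts[1]))
        elif keyword == "srv" and len(parts) == 2:
            # srv*<source>: assumed to fall back to the implicit cache
            entries.append(LookupEntry(source=parts[1], cache=implicit_cache))
        else:
            # Anything else is treated as a bare local directory
            entries.append(LookupEntry(source=item, cache=implicit_cache))
    return entries
```

Storing the resolved cache on every entry is what lets one record type cover both the implicit and the explicit caching notation.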

    [22 lines not shown]
DeltaFile
+256-18lldb/source/Plugins/SymbolLocator/SymStore/SymbolLocatorSymStore.cpp
+158-3lldb/test/API/symstore/TestSymStore.py
+113-0lldb/unittests/Symbol/SymStoreTest.cpp
+6-2lldb/source/Plugins/SymbolLocator/SymStore/SymbolLocatorSymStoreProperties.td
+8-0lldb/source/Plugins/SymbolLocator/SymStore/SymbolLocatorSymStore.h
+2-0lldb/unittests/Symbol/CMakeLists.txt
+543-236 files

LLVM/project 3323903llvm/test/CodeGen/AArch64/GlobalISel knownbits-vector.mir, llvm/unittests/CodeGen/GlobalISel KnownBitsVectorTest.cpp

[AArch64][GlobalISel] Move KnownBitsVectorTest to mir. NFC (#192536)

This ports some of the older C++ GlobalISel known-bits tests to use
print<gisel-value-tracking> in a mir file. This is mostly autogenerated,
but attempts to keep the existing comments. Some tests have not been
ported as they are entirely in C++ or tested isKnownToBeAPowerOfTwo,
which is not tested in the print output.
DeltaFile
+0-1,370llvm/unittests/CodeGen/GlobalISel/KnownBitsVectorTest.cpp
+1,291-0llvm/test/CodeGen/AArch64/GlobalISel/knownbits-vector.mir
+1,291-1,3702 files

LLVM/project 7a64225clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-misc.c neon-intrinsics.c

[CIR] add vsqrt and vsqrtq support (#192282)

Part of https://github.com/llvm/llvm-project/issues/185382

co-authored by: @Kouunnn <xerw1314 at gmail.com>

---------

Co-authored-by: Zile Xiong <xiongzile99 at gmail.com>
Co-authored-by: ZCkouun <1765074320 at qq.com>
DeltaFile
+62-0clang/test/CodeGen/AArch64/neon/intrinsics.c
+0-39clang/test/CodeGen/AArch64/neon-misc.c
+1-13clang/test/CodeGen/AArch64/neon-intrinsics.c
+6-0clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+69-524 files

LLVM/project 57e35e9llvm/test/Transforms/LoopVectorize/X86 replicating-load-store-costs.ll

[LV][NFC] Regen CHECK lines in LoopVectorize/X86/replicating-load-store-costs.ll (#192682)
DeltaFile
+525-525llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+525-5251 files

LLVM/project b37a607compiler-rt/lib/builtins CMakeLists.txt, compiler-rt/lib/builtins/aarch64 sme-abi.S

[compiler-rt] Don't provide `__arm_sme_state` for baremetal targets (#191434)

Previously, we required baremetal runtimes to implement an undocumented
`__aarch64_sme_accessible` hook to check if SME is available (as
checking CPU features may vary across targets).

This allowed us to provide a generic `__arm_sme_state` implementation
but caused some friction for toolchains that depend on compiler-rt.

This patch instead removes the implementation of `__arm_sme_state` for
baremetal. This makes it the responsibility of the runtime (e.g. libc)
to provide this function for baremetal targets.

The requirements of this function are documented in the AAPCS64:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#811__arm_sme_state

All other SME ABI routines are still provided by compiler-rt.
DeltaFile
+21-16compiler-rt/lib/builtins/cpu_model/aarch64/fmv/baremetal.inc
+0-9llvm/test/CodeGen/AArch64/aarch64-sme-stubs.ll
+3-3compiler-rt/lib/builtins/cpu_model/aarch64.c
+0-5mlir/lib/ExecutionEngine/ArmSMEStubs.cpp
+4-1compiler-rt/lib/builtins/aarch64/sme-abi.S
+1-1compiler-rt/lib/builtins/CMakeLists.txt
+29-356 files

LLVM/project 6fd5b52mlir/lib/Dialect/Transform/IR Utils.cpp, mlir/test/Dialect/Transform normal-forms.mlir

[mlir] reduce excessive verification in transform (#192653)

`mergeSymbolsInto` called by the transform interpreter for named
sequence management was calling a full verifier after renaming symbols.
The renaming could have potentially broken symbol table-related
invariants, but not really anything else. Only verify the symbol
table-related invariants instead.
DeltaFile
+8-4mlir/lib/Dialect/Transform/IR/Utils.cpp
+4-5mlir/test/Dialect/Transform/normal-forms.mlir
+12-92 files

LLVM/project 75d63afllvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp, llvm/test/CodeGen/AArch64 itofp-bf16.ll itofp.ll

[AArch64] Fix lowering of non-power2 uitofp (#190921)

The code in DAGTypeLegalizer::SplitVecOp_TruncateHelper attempts to use
getFloatingPointVT(InElementSize/2), which is invalid for non-power2
type sizes. Fall back to the existing SplitVecOp_UnaryOp in this case.
DeltaFile
+77-0llvm/test/CodeGen/AArch64/itofp-bf16.ll
+72-0llvm/test/CodeGen/AArch64/itofp.ll
+2-2llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+151-23 files

LLVM/project f2efeabllvm/include/llvm/CodeGen MIRYamlMapping.h MachineFrameInfo.h, llvm/lib/CodeGen TargetOptionsImpl.cpp MachineFunction.cpp

[CodeGen] Parse frame-pointer attribute once when creating MachineFunction (#191974)

TargetOptions::DisableFramePointerElim is hot and showing up in
compile-time profiles via AArch64FrameLowering::hasFPImpl on
aarch64-O0-g builds. Repeatedly looking up the function attribute is
expensive. Parsing it once at MachineFunction initialisation and storing
as FramePointerKind on MachineFrameInfo is a -0.21% geomean improvement
on CTMark stage1-aarch64-O0-g. Also helps debug builds on other targets.

https://llvm-compile-time-tracker.com/compare.php?from=215f35eb8f1c313ac135ad47db1cc0b99b3ae694&to=51f6617517177bea1cc49baeab3acaf62d5e9df9&stat=instructions%3Au
DeltaFile
+61-0llvm/test/CodeGen/MIR/Generic/frame-info.mir
+19-18llvm/lib/CodeGen/TargetOptionsImpl.cpp
+17-0llvm/lib/CodeGen/MachineFunction.cpp
+14-0llvm/include/llvm/CodeGen/MIRYamlMapping.h
+10-0llvm/include/llvm/CodeGen/MachineFrameInfo.h
+2-0llvm/lib/CodeGen/MIRParser/MIRParser.cpp
+123-182 files not shown
+126-188 files

LLVM/project 920b203lldb/source/Plugins/ABI/RISCV ABISysV_riscv.cpp, lldb/source/Plugins/DynamicLoader/POSIX-DYLD DynamicLoaderPOSIXDYLD.cpp

[lldb][RISCV] Implement access to TLS variables on RISC-V (#191410)

On RISC-V Linux, LLDB computes TLS variable addresses incorrectly:
`GetThreadLocalData` returns a correct tls_block, but then
unconditionally adds tls_file_addr from `DW_OP_GNU_push_tls_address`,
which on RISC-V/glibc is a VMA inside PT_TLS, not a pure offset. This
results in an over-shifted address.

This patch:

* Adds a small helper that, for an ELF module, finds the PT_TLS program
header and reads its p_vaddr.

* In `DynamicLoaderPOSIXDYLD::GetThreadLocalData`, normalizes
tls_file_addr to an offset: if `PT_TLS` is found and tls_file_addr >=
p_vaddr, it uses tpoff = tls_file_addr - p_vaddr, otherwise keeps the
old value.

* Returns tls_block + tpoff instead of always tls_block + tls_file_addr.
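
The address arithmetic in the list above can be sketched as follows. This is a Python illustration of the stated rule only, not the C++ in DynamicLoaderPOSIXDYLD.cpp; the function names are hypothetical, and `pt_tls_vaddr` being `None` stands in for the case where no PT_TLS program header is found.

```python
from typing import Optional


def normalize_tls_offset(tls_file_addr: int,
                         pt_tls_vaddr: Optional[int]) -> int:
    """Hypothetical helper: turn a VMA inside PT_TLS into a pure offset."""
    if pt_tls_vaddr is not None and tls_file_addr >= pt_tls_vaddr:
        # tpoff = tls_file_addr - p_vaddr
        return tls_file_addr - pt_tls_vaddr
    # No PT_TLS found, or the value is already below p_vaddr: keep it as is.
    return tls_file_addr


def thread_local_address(tls_block: int, tls_file_addr: int,
                         pt_tls_vaddr: Optional[int]) -> int:
    # Return tls_block + tpoff rather than tls_block + tls_file_addr.
    return tls_block + normalize_tls_offset(tls_file_addr, pt_tls_vaddr)
```

With a PT_TLS `p_vaddr` of 0x2000 and a `tls_file_addr` of 0x2010, the offset is 0x10, so the variable is found 0x10 bytes into the TLS block instead of 0x2010 bytes past it.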

    [9 lines not shown]
DeltaFile
+54-2lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
+2-1lldb/source/Plugins/ABI/RISCV/ABISysV_riscv.cpp
+1-1lldb/source/Plugins/Process/Utility/RegisterInfos_riscv64.h
+1-1llvm/docs/ReleaseNotes.md
+58-54 files

LLVM/project 9366436mlir/lib/Dialect/Tensor/IR TensorOps.cpp, mlir/lib/Dialect/Tensor/Transforms EmptyOpPatterns.cpp

[mlir][tensor] Preserve tensor encodings when materializing tensor.empty in some passes (#192411)

This PR fixes tensor encoding propagation bugs in some `tensor.empty`
materialization paths that could produce type-invalid IR (encoded result
expected, unencoded value produced).

Assisted-by: Cursor (Codex 5.3)
DeltaFile
+27-0mlir/test/Interfaces/TilingInterface/tile-pad-using-interface.mlir
+16-0mlir/test/Dialect/Tensor/fold-empty-op.mlir
+6-3mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+4-2mlir/lib/Dialect/Tensor/Transforms/EmptyOpPatterns.cpp
+53-54 files