LLVM/project faa87b0llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 clmul-scalable.ll

[AArch64] Lower scalable i64 CLMUL with SVE2/SME (#198999)

When AES or SSVE-AES are not available, but SVE2 or SME are,
clmul.nxv2i64 can benefit from a cross-byte CLMUL of .S precision. This
re-uses the functionality added for nxv8i16.
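
The commit message does not spell out the lowering, so the following is only a sketch of the general arithmetic that makes a narrower carry-less multiply useful for a 64-bit result. The helper names (`clmul64`, `clmul32_wide`, `clmul64_from32`) are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Reference: low 64 bits of the carry-less (XOR-accumulated) product.
uint64_t clmul64(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 64; ++i)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

// Hypothetical widening 32x32 -> 64 carry-less multiply, standing in for a
// narrower-precision CLMUL primitive.
uint64_t clmul32_wide(uint32_t a, uint32_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((b >> i) & 1)
      r ^= static_cast<uint64_t>(a) << i;
  return r;
}

// Same low 64 bits built from 32-bit halves:
//   lo*lo ^ ((lo*hi ^ hi*lo) << 32)
// The hi*hi term starts at bit 64 and therefore drops out.
uint64_t clmul64_from32(uint64_t a, uint64_t b) {
  uint32_t alo = static_cast<uint32_t>(a);
  uint32_t ahi = static_cast<uint32_t>(a >> 32);
  uint32_t blo = static_cast<uint32_t>(b);
  uint32_t bhi = static_cast<uint32_t>(b >> 32);
  uint64_t cross = clmul32_wide(alo, bhi) ^ clmul32_wide(ahi, blo);
  return clmul32_wide(alo, blo) ^ (cross << 32);
}
```

Whether the actual lowering uses exactly this decomposition is an assumption; the point is only that three narrower multiplies and a few XORs reproduce the 64-bit result.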
DeltaFile
+16-448llvm/test/CodeGen/AArch64/clmul-scalable.ll
+54-40llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+70-4882 files

LLVM/project caf469dllvm/include/llvm/Support ConvertUTF.h, llvm/lib/Support ConvertUTFWrapper.cpp

[Support] Take ArrayRef in convertWideToUTF8 (#200687)

`convertWideToUTF8` took a `std::wstring`, but it never modified its
data. Either an `ArrayRef` or a `std::wstring_view` is sufficient here. I
chose `ArrayRef<wchar_t>` over `std::wstring_view` because it can be
implicitly constructed from any range that provides `data()` and
`size()`. A second overload taking a `const wchar_t *` is provided to
convert null-terminated wide C-strings.
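
A minimal sketch of the overload pattern described above, written without LLVM headers so it stands alone. `WideRef` is a hypothetical stand-in for `ArrayRef<wchar_t>`, and `toUTF8` is a toy encoder that treats each `wchar_t` as one Unicode scalar value (no surrogate-pair handling); neither is the real `convertWideToUTF8`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-owning view: pointer + length, like ArrayRef<wchar_t>.
struct WideRef {
  const wchar_t *Data = nullptr;
  std::size_t Size = 0;

  WideRef(const wchar_t *D, std::size_t N) : Data(D), Size(N) {}

  // Implicitly constructible from anything exposing data() and size().
  template <typename C,
            typename = decltype(std::declval<const C &>().data()),
            typename = decltype(std::declval<const C &>().size())>
  WideRef(const C &Container)
      : Data(Container.data()), Size(Container.size()) {}
};

// Toy encoder: one wchar_t == one scalar value (assumption, see lead-in).
std::string toUTF8(WideRef Src) {
  std::string Out;
  for (std::size_t I = 0; I < Src.Size; ++I) {
    std::uint32_t C = static_cast<std::uint32_t>(Src.Data[I]);
    if (C < 0x80) {
      Out += static_cast<char>(C);
    } else if (C < 0x800) {
      Out += static_cast<char>(0xC0 | (C >> 6));
      Out += static_cast<char>(0x80 | (C & 0x3F));
    } else if (C < 0x10000) {
      Out += static_cast<char>(0xE0 | (C >> 12));
      Out += static_cast<char>(0x80 | ((C >> 6) & 0x3F));
      Out += static_cast<char>(0x80 | (C & 0x3F));
    } else {
      Out += static_cast<char>(0xF0 | (C >> 18));
      Out += static_cast<char>(0x80 | ((C >> 12) & 0x3F));
      Out += static_cast<char>(0x80 | ((C >> 6) & 0x3F));
      Out += static_cast<char>(0x80 | (C & 0x3F));
    }
  }
  return Out;
}

// Second overload for null-terminated wide C strings.
std::string toUTF8(const wchar_t *Src) {
  return toUTF8(WideRef(Src, std::wcslen(Src)));
}
```

With this shape, a `std::wstring`, a `std::vector<wchar_t>`, and a wide string literal can all be passed directly, and none of them is copied before conversion.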
DeltaFile
+10-5llvm/include/llvm/Support/ConvertUTF.h
+5-1llvm/lib/Support/ConvertUTFWrapper.cpp
+15-62 files

LLVM/project 73ded45clang/lib/Analysis LiveVariables.cpp, clang/lib/StaticAnalyzer/Core ExprEngineCXX.cpp

[Liveness][analyzer] Fix handling of [[assume]] attributes (#198618)

Before this commit, if the analyzer encountered code like
```
int f(int a, int b) {
  [[assume(a == 2), assume(b == 3)]];
  return a + b;
}
```
it performed the following steps:
1. It visited the expression `a == 2` with `ExprEngine::Visit` (after
visiting its sub-expressions, within the regular visitation that visits
each statement of the `CFGBlock`). This triggered the `EagerlyAssume`
logic and separated two execution paths.
2. It discarded the result bound to `a == 2` from the `Environment`
because `a == 2` is not a direct child of the `AttributedStmt`.
3. Analogously, it visited and evaluated `b == 3`.
4. Analogously, it discarded the result bound to `b == 3`.
5. On each execution path `VisitAttributedStmt` was reached, it ran the

    [32 lines not shown]
DeltaFile
+27-5clang/test/Analysis/cxx23-assume-attribute.cpp
+16-4clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+9-0clang/lib/Analysis/LiveVariables.cpp
+52-93 files

LLVM/project 9738adalibsycl/include/sycl/__impl queue.hpp, libsycl/include/sycl/__impl/detail unified_range_view.hpp get_device_kernel_info.hpp

[libsycl] Add single_task (#192499)

Depends on liboffload PR:
https://github.com/llvm/llvm-project/pull/194333.

The approach with void sycl_kernel_launch(pack of arguments) implies
that we can use or copy arguments only during that call. Since it passes
only kernel arguments as parameters and returns void, we have to split
the setting of extra kernel data (event dependencies and range) and the
retrieval of the result event from argument handling and direct kernel
submission, if possible. Key stages: 1) passing dependency events and
range (for parallel_for) to the queue (or handler in future) and saving
them in the queue (copy/move). 2) wrapping kernel arguments into
typeless wrappers (pointer based, initially

    [39 lines not shown]
DeltaFile
+113-0libsycl/src/detail/queue_impl.cpp
+100-0libsycl/include/sycl/__impl/queue.hpp
+53-0libsycl/test/basic/get_backend.cpp
+51-0libsycl/include/sycl/__impl/detail/unified_range_view.hpp
+43-0libsycl/include/sycl/__impl/detail/get_device_kernel_info.hpp
+40-0libsycl/src/detail/queue_impl.hpp
+400-06 files not shown
+468-012 files

LLVM/project 0059fe6clang/include/clang/Analysis/Analyses/LifetimeSafety FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

[LifetimeSafety] Add support for lifetime capture_by (#196884)

This PR implements support for the `[[clang::lifetime_capture_by(X)]]`
attribute within the lifetime-safety analysis.

The PR introduces a new helper in `FactsGenerator.cpp` called
`handleLifetimeCaptureBy` which detects
`[[clang::lifetime_capture_by(X)]]` on parameters. If detected, the
analyzer now generates an `OriginFlowFact` ensuring that captured
dependencies are added to the capturer's state. The PR supports
capture_by params and `this` and currently doesn't implement attributes
on function declarations.
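
As a rough illustration of the parameter and `this` forms mentioned above (separate from the commit's own example), the sketch below annotates a free function and a member function. The `CAPTURE_BY` macro, `remember`, and `Registry` are hypothetical names introduced here; the macro expands to nothing on compilers that do not know the attribute.

```cpp
#include <set>
#include <string>
#include <string_view>

// Use the attribute only where the compiler recognizes it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetime_capture_by)
#    define CAPTURE_BY(X) [[clang::lifetime_capture_by(X)]]
#  endif
#endif
#ifndef CAPTURE_BY
#  define CAPTURE_BY(X)
#endif

// `Name` is stored into `Names`, so whatever `Name` views must outlive
// `Names`. The attribute states that relationship on the parameter.
void remember(std::string_view Name CAPTURE_BY(Names),
              std::set<std::string_view> &Names) {
  Names.insert(Name);
}

// Same idea with the capturer being the object itself.
struct Registry {
  std::set<std::string_view> Names;
  void add(std::string_view Name CAPTURE_BY(this)) { Names.insert(Name); }
};
```

A call such as `remember(std::string("tmp"), Names)` stores a view of a temporary that dies at the end of the statement. That is the kind of capture the annotation describes; whether the analysis in this commit reports that exact case is not confirmed here.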

Example:
Integrate `[[clang::lifetimebound]]`: This existing Clang annotation is
crucial for specifying that the lifetime of a function's output is tied
to one of its inputs.

```cpp

    [60 lines not shown]
DeltaFile
+160-0clang/test/Sema/warn-lifetime-safety.cpp
+61-0clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+5-0clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+226-03 files

LLVM/project 8fed372clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Implement lowering for __builtin_amdgcn_dispatch_ptr
DeltaFile
+29-6clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+8-0clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+37-62 files

LLVM/project 02d839allvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis exact-siv-mul-overflow.ll

[DA] Fix overflow in the Exact test
DeltaFile
+8-6llvm/lib/Analysis/DependenceAnalysis.cpp
+1-3llvm/test/Analysis/DependenceAnalysis/exact-siv-mul-overflow.ll
+9-92 files

LLVM/project 7872403llvm/test/Analysis/DependenceAnalysis exact-siv-mul-overflow.ll

[DA] Add test for the Exact test misses dependency due to overflow
DeltaFile
+54-0llvm/test/Analysis/DependenceAnalysis/exact-siv-mul-overflow.ll
+54-01 files

LLVM/project 356d7e6llvm/lib/IR Value.cpp, llvm/test/Analysis/ValueTracking memory-dereferenceable.ll

[IR] Handle nofree noalias in canBeFreed() (#200194)

Based on the argument nofree semantics specified in
https://github.com/llvm/llvm-project/pull/195658, we can conclude that
an argument with both nofree and noalias cannot be freed.

This also handles the case of readonly + noalias, to be consistent with
the logic for functions (and because we had a FIXME for it...)
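
The rule stated above can be sketched as a small predicate. `ArgAttrs` and `canBeFreedSketch` are hypothetical names that model only the two attribute combinations this message mentions; the real `canBeFreed()` in `Value.cpp` considers more than this.

```cpp
// Hypothetical flags standing in for attributes on a pointer argument.
struct ArgAttrs {
  bool NoAlias;
  bool NoFree;
  bool ReadOnly;
};

// Conservative answer: "may be freed" unless the attributes rule it out.
// Per the commit message, noalias combined with either nofree or readonly
// is enough to conclude the argument cannot be freed.
bool canBeFreedSketch(const ArgAttrs &A) {
  if (A.NoAlias && (A.NoFree || A.ReadOnly))
    return false;
  return true;
}
```

Neither `nofree` nor `readonly` alone changes the answer in this sketch, since only the combinations with `noalias` are covered by the message.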
DeltaFile
+20-7llvm/test/Analysis/ValueTracking/memory-dereferenceable.ll
+6-0llvm/lib/IR/Value.cpp
+26-72 files

LLVM/project bd1b3d4lldb/include/lldb/Core Mangled.h, lldb/source/Core Mangled.cpp

[lldb] Reduce size of Mangled class (#200181)

The Mangled class is used in several places in LLDB, most notably as a
direct member of Symbol. This makes it one of the most common
long-lived allocations in LLDB.

In commit a2672250be871bdac18c1a955265a98704434218, this class got a
(large) cache that stores information about demangled data. This cache
is stored in a std::optional member, which means the memory for the
cache is allocated within our Mangled object. It should be noted that
this cache is only used when we actually demangle the name, which
doesn't happen for every mangled name we encounter.

The additional cache member caused the size of Mangled to grow from
16B to 152B by default (that is, even if the Mangled name was never
demangled).

This patch replaces the std::optional with a unique_ptr which stores the
cache on first use in a separate heap allocation. This change decreases
the amount of allocated memory when debugging a relatively small
Objective-C project from 1.57GiB to 1.18GiB (-400MiB).
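
The size effect described above can be reproduced with a small sketch. The types below (`DemangleCache`, `InlineCache`, `LazyCache`) are hypothetical and do not mirror the real layout of Mangled; the 136-byte payload is chosen only to stand in for a "large" cache.

```cpp
#include <memory>
#include <optional>

// Hypothetical cache payload.
struct DemangleCache {
  char Bytes[136];
};

// Before: std::optional reserves the payload's storage inside every object,
// whether or not the cache is ever filled.
struct InlineCache {
  const char *Name = nullptr;
  const char *Demangled = nullptr;
  std::optional<DemangleCache> Cache;
};

// After: only a pointer lives in the object; the payload is allocated on
// first use.
struct LazyCache {
  const char *Name = nullptr;
  const char *Demangled = nullptr;
  std::unique_ptr<DemangleCache> Cache;

  DemangleCache &getOrCreate() {
    if (!Cache)
      Cache = std::make_unique<DemangleCache>();
    return *Cache;
  }
};

static_assert(sizeof(InlineCache) >= sizeof(DemangleCache),
              "optional keeps the payload inline");
static_assert(sizeof(LazyCache) < sizeof(InlineCache),
              "unique_ptr keeps only a pointer inline");
```

The trade-off is one extra heap allocation and one pointer indirection for objects that do use the cache, in exchange for a much smaller footprint for the ones that never do.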
DeltaFile
+21-19lldb/unittests/Core/MangledTest.cpp
+24-2lldb/include/lldb/Core/Mangled.h
+6-6lldb/source/Core/Mangled.cpp
+1-1lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
+52-284 files

LLVM/project 919f72a.github CODEOWNERS, libc/docs porting.rst

[docs] Remove all references to Maintainers.rst (#200368)

All projects are using Maintainers.md files as of #200365.
DeltaFile
+4-5llvm/docs/DeveloperPolicy.rst
+2-2llvm/docs/Contributing.rst
+1-1.github/CODEOWNERS
+1-1libc/docs/porting.rst
+8-94 files

LLVM/project 105e8cellvm/include/llvm/IR CFG.h, llvm/lib/AsmParser LLParser.cpp

[spr] initial version

Created using spr 1.3.8-wip
DeltaFile
+0-51llvm/lib/AsmParser/LLParser.cpp
+0-42llvm/test/Assembler/uselistorder_bb.ll
+5-17llvm/include/llvm/IR/CFG.h
+2-14llvm/lib/CodeGen/IndirectBrExpandPass.cpp
+0-11llvm/test/Assembler/invalid-uselistorder_bb-numbered.ll
+0-11llvm/test/Assembler/invalid-uselistorder_bb-float-literal.ll
+7-14616 files not shown
+21-19022 files

LLVM/project 050c202clang-tools-extra Maintainers.md, clang-tools-extra/docs conf.py Maintainers.md

[clang-tools-extra][docs] Convert maintainers file to Markdown (#200365)

Following the way clang does it.

* Moved files to .md (done in #200769).
* Reformatted into Markdown.
* Changed the stub file docs/Maintainers.rst into docs/Maintainers.md
and used a myst directive for the include.
* In the config file, added myst parser and ".md" as a recognised file
extension.

After this change, all maintainers files in llvm-project will be in
Markdown format.
DeltaFile
+54-56clang-tools-extra/Maintainers.md
+3-2clang-tools-extra/docs/conf.py
+2-1clang-tools-extra/docs/Maintainers.md
+1-1llvm/Maintainers.md
+60-604 files

LLVM/project 1b24a40clang-tools-extra Maintainers.md Maintainers.rst, clang-tools-extra/docs Maintainers.rst Maintainers.md

[clang-tools-extra] Move maintainer files to .md files (#200769)

This moves the files without any formatting changes. It will break the
docs build, but a follow-up (#200365) will fix the formatting and so on.
DeltaFile
+84-0clang-tools-extra/Maintainers.md
+0-84clang-tools-extra/Maintainers.rst
+0-1clang-tools-extra/docs/Maintainers.rst
+1-0clang-tools-extra/docs/Maintainers.md
+85-854 files

LLVM/project 044b63dclang/lib/AST/ByteCode Compiler.cpp

[clang][bytecode][NFC] Avoid some code duplication for `ScalarValueInitExpr` (#200755)
DeltaFile
+13-26clang/lib/AST/ByteCode/Compiler.cpp
+13-261 files

LLVM/project 1678d2elibsycl/src/detail queue_impl.cpp

fix liboffload usage

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
DeltaFile
+5-3libsycl/src/detail/queue_impl.cpp
+5-31 files

LLVM/project 4176fb6lld/test/ELF aarch64-reloc-pauth.s

Address review comments
DeltaFile
+5-11lld/test/ELF/aarch64-reloc-pauth.s
+5-111 files

LLVM/project 9a0f9dallvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll llvm.amdgcn.av.load.b128.ll

Merge branch 'main' into users/KseniyaTikhomirova/kernel_submit_single_3
DeltaFile
+23,873-20,923llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+12,982-11,930llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+12,365-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.av.load.b128.ll
+10,469-10llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir
+8,268-12llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+2,674-2,698llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+70,631-35,5736,841 files not shown
+387,475-197,6156,847 files

LLVM/project ea8f3dfllvm/test/Transforms/LoopVectorize cast-costs.ll vscale-cost.ll

[LV][NFC] Add cost model tests for VPInstructionWithType (#200135)
DeltaFile
+80-0llvm/test/Transforms/LoopVectorize/cast-costs.ll
+36-0llvm/test/Transforms/LoopVectorize/vscale-cost.ll
+116-02 files

LLVM/project f2f9eaellvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512vbmi2.ll

[X86] matchShuffleAsVSHLD - fix incorrect shift factor (#200754)

#200604 left the non-commuted case still scaling by 8 bits instead of the src scalar bit size
DeltaFile
+17-0llvm/test/CodeGen/X86/vector-shuffle-combining-avx512vbmi2.ll
+1-1llvm/lib/Target/X86/X86ISelLowering.cpp
+18-12 files

LLVM/project 581c37autils/bazel/llvm-project-overlay/libc BUILD.bazel

[bazel] Port ae1d75e (#200758)
DeltaFile
+1-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1-01 files

LLVM/project 63c29dfclang/lib/Serialization ASTReaderDecl.cpp, clang/test/PCH friend-template-spec-redecl.cpp

[Serialization] Fix assertion on re-deserialized friend template specialization in PCH (#198133) (#200566)

A friend function-template specialization declared inside a class
template is serialized into a PCH. When the class template is later
instantiated while loading the PCH, the friend specialization can be
deserialized re-entrantly (VisitFriendDecl -> VisitFunctionDecl -> ...
-> VisitFunctionDecl for the same specialization) at the same time as
the canonical copy, producing two redeclarations of the same
specialization in the template's specialization set.

ASTDeclReader::VisitFunctionDecl asserted that this collision could only
happen when merging declarations from different modules. Since
38b3d87bd384, friend functions defined inside dependent class templates
are loaded eagerly, so the collision can now also occur within a single
PCH/AST file (non-modules build), tripping the assertion:

  Assertion failed: (Reader.getContext().getLangOpts().Modules &&

    [7 lines not shown]
DeltaFile
+34-0clang/test/PCH/friend-template-spec-redecl.cpp
+0-2clang/lib/Serialization/ASTReaderDecl.cpp
+34-22 files

LLVM/project ae1d75elibc/src/__support/math hypotf16.h expxf16_utils.h

[libc][math] Guard f16 math headers to fix certain 32-bit ARM builds (#200715)

Wrap hypotf16.h and expxf16_utils.h in LIBC_TYPES_HAS_FLOAT16 macros
like other float16 math headers. This fixes build breaks on systems
where float16 is unsupported (like some 32-bit ARM).
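
The guard pattern can be sketched as follows. `DEMO_HAS_FLOAT16` is a hypothetical stand-in for `LIBC_TYPES_HAS_FLOAT16`, and `demo_half` and `demo_has_float16` are made-up names; the real headers differ in content.

```cpp
// DEMO_HAS_FLOAT16 is hypothetical: a build would define it only on targets
// where a 16-bit floating-point type is actually available.
#ifdef DEMO_HAS_FLOAT16

// Everything that mentions the float16 type sits inside the guard, so
// targets without the type never see these declarations.
inline _Float16 demo_half(_Float16 X) { return X / 2; }

#endif // DEMO_HAS_FLOAT16

// Code outside the guard can still query whether the guarded part exists.
constexpr bool demo_has_float16() {
#ifdef DEMO_HAS_FLOAT16
  return true;
#else
  return false;
#endif
}
```

Because the macro is left undefined here, the guarded function is skipped entirely and the file still compiles on a target with no float16 support.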
DeltaFile
+6-0libc/src/__support/math/hypotf16.h
+6-0libc/src/__support/math/expxf16_utils.h
+12-02 files

LLVM/project e9556fcmlir/lib/Conversion/MathToSPIRV MathToSPIRV.cpp, mlir/test/Conversion/MathToSPIRV math-to-gl-spirv.mlir math-to-opencl-spirv.mlir

[mlir][SPIR-V] Convert math.trunc to GL Trunc and CL trunc (#200739)
DeltaFile
+4-0mlir/test/Conversion/MathToSPIRV/math-to-gl-spirv.mlir
+4-0mlir/test/Conversion/MathToSPIRV/math-to-opencl-spirv.mlir
+2-0mlir/lib/Conversion/MathToSPIRV/MathToSPIRV.cpp
+10-03 files

LLVM/project 52e2280clang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
DeltaFile
+10-9clang/lib/CodeGen/Targets/AMDGPU.cpp
+12-6clang/lib/CodeGen/TargetInfo.h
+7-8clang/lib/CodeGen/Targets/SPIR.cpp
+11-2clang/lib/CodeGen/CodeGenModule.cpp
+5-6clang/lib/CodeGen/TargetInfo.cpp
+6-3clang/lib/CodeGen/Targets/AVR.cpp
+51-346 files

LLVM/project c9f6a05llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
DeltaFile
+21-21llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+8-6mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4-4mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2-2mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1-1mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+36-345 files

LLVM/project 005e564llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space that is used for global variables that are used to represent the barrier IDs in GFX12.5.

These barrier addresses just have values corresponding 1-1 to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
DeltaFile
+442-0llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62-45llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+54-31llvm/test/CodeGen/AMDGPU/s-barrier.ll
+52-14llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+35-31llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+32-32llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+677-15342 files not shown
+1,108-44148 files

LLVM/project 7e5a386llvm/lib/Target/AMDGPU AMDGPULowerExecSync.cpp

clang-format
DeltaFile
+1-2llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+1-21 files

LLVM/project 62b7cf9llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU buffer-store-dwordx4-vpk-mul-war-hazard-gfx942.mir

[AMDGPU] Widen MUBUF/MTBUF source-vgpr WAR hazard on gfx940-family to SGPR soffset (#197267)

createsVALUHazard previously gated the MUBUF/MTBUF source-vgpr WAR
hazard to fire only when SOFFSET was a literal or absent. On
gfx940-family subtargets that gate is too narrow: the hazard also fires
when SOFFSET is sourced from an SGPR.

Concretely, on gfx950 a sequence of the form

```
  buffer_store_dwordx4 v[X:X+3], voff, descr, sN offen
  v_pk_mul_f32 v[X:X+1], <src>, <src>           # next VALU cycle
```

deterministically commits the post-pk_mul value of v[X+1] to memory for
the second dword of the store; the other three dwords store correctly.

The wait-state window depends on the SOFFSET shape:


    [20 lines not shown]
DeltaFile
+122-0llvm/test/CodeGen/AMDGPU/buffer-store-dwordx4-vpk-mul-war-hazard-gfx942.mir
+58-21llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+180-212 files

LLVM/project dc01d53llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
DeltaFile
+21-21llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+8-6mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4-4mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2-2mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1-1mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+36-345 files