LLVM/project e602cd1llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU wmma-coexecution-valu-hazards.mir wmma-hazards-gfx1250-w32.mir

[AMDGPU] Add a few wmma co-execution hazard checks, NFC (#203658)

This is to reflect the gfx1251 update regarding wmma*8f6f4 with
matrix format as F4.

Also fix a comment in GCNHazardRecognizer.cpp.
DeltaFile
+362-0llvm/test/CodeGen/AMDGPU/wmma-coexecution-valu-hazards.mir
+87-0llvm/test/CodeGen/AMDGPU/wmma-hazards-gfx1250-w32.mir
+1-1llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+450-13 files

LLVM/project 24faf90clang/include/clang/Basic AttrDocs.td Attr.td, clang/test/AST undocumented-attrs.cpp

[Clang] Add AttrDocs entry for OverflowBehavior (#203392)

These docs were previously missing.

Fixes: #203322

Signed-off-by: Justin Stitt <justinstitt at google.com>
DeltaFile
+45-0clang/include/clang/Basic/AttrDocs.td
+1-2clang/test/AST/undocumented-attrs.cpp
+1-1clang/include/clang/Basic/Attr.td
+47-33 files

LLVM/project e63cd40llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXAsmPrinter.cpp

[NVPTX] Rip out vestigial variadic support (NFC) (#202385)
DeltaFile
+65-231llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+11-21llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+4-9llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+0-8llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+80-2694 files

LLVM/project 18a4c90libcxx/include optional, libcxx/test/std/utilities/optional/optional.monadic and_then.pass.cpp transform.pass.cpp

[libc++] Fix bug where `optional<T&>` couldn't be constructed from `transform()` (#203462)

- Add the proper from monadic base constructor
- Fix the constraint so it allows references.
- Add tests
DeltaFile
+35-2libcxx/test/std/utilities/optional/optional.monadic/and_then.pass.cpp
+26-5libcxx/include/optional
+17-4libcxx/test/std/utilities/optional/optional.monadic/transform.pass.cpp
+9-0libcxx/test/std/utilities/optional/optional.monadic/or_else.pass.cpp
+87-114 files

LLVM/project 44cc797libcxx/include/__memory uninitialized_algorithms.h ranges_uninitialized_algorithms.h, libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default ranges_uninitialized_default_construct.pass.cpp uninitialized_default_construct_n.pass.cpp

[libc++] P3369R0: constexpr for `uninitialized_default_construct` (#200163)

Remarks:
- Tests also verify that `uninitialized_default_construct(_n)`
algorithms do not initialize trivially default-constructible elements
(`int` in these tests) to determined values during constant evaluation.
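
For orientation, a minimal runtime sketch of what `std::uninitialized_default_construct` does on raw storage. It does not exercise the new constexpr support, and the function name `defaultConstructStrings` is made up for illustration. `std::string` is used because its default constructor produces a well-defined empty value; for `int`, default-initialization leaves the elements indeterminate, which is the behavior the remark above refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical helper: default-construct N strings in raw storage,
// use them, then destroy them and release the storage.
std::size_t defaultConstructStrings() {
  constexpr std::size_t N = 3;
  std::allocator<std::string> Alloc;
  std::string *First = Alloc.allocate(N); // raw, uninitialized memory

  // Starts the lifetime of N default-constructed (empty) strings.
  std::uninitialized_default_construct(First, First + N);

  First[0] = "abc"; // the objects are now usable like any other string
  std::size_t Total = 0;
  for (std::size_t I = 0; I < N; ++I)
    Total += First[I].size();

  std::destroy(First, First + N); // end lifetimes before deallocating
  Alloc.deallocate(First, N);
  return Total;
}
```

Only the first element is assigned, so the summed length is 3.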
DeltaFile
+179-0libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/ranges_uninitialized_default_construct.pass.cpp
+110-1libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/uninitialized_default_construct_n.pass.cpp
+108-1libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/uninitialized_default_construct.pass.cpp
+109-0libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/ranges_uninitialized_default_construct_n.pass.cpp
+7-4libcxx/include/__memory/uninitialized_algorithms.h
+5-3libcxx/include/__memory/ranges_uninitialized_algorithms.h
+518-97 files not shown
+531-1613 files

LLVM/project 08e6e14llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/test/CodeGen/RISCV/GlobalISel rv64zbb.ll

[GlobalISel] Fix sign-extended byte mask in lowerBswap (#199387)

The per-byte mask in `LegalizerHelper::lowerBswap` was constructed via

```
APInt APMask(SizeInBytes * 8, 0xFF << (i * 8));
```

where `0xFF << (i * 8)` is evaluated as a signed `int`. For `i*8 >= 24`
(byte-3 mask of an s64 G_BSWAP) the value `0xFF000000` does not fit in a
positive 32-bit `int`; the conversion to signed `int` is
implementation-defined under C++17 (UB under C++11, fully defined under
C++20) and on two's-complement targets produces `-16777216`. The modular
conversion to `uint64_t` in the `APInt` constructor then materializes
that negative `int` as `0xFFFFFFFFFF000000` — the intended mask was
`0x00000000FF000000`. The over-wide mask preserved bytes 4-7 of the
source where only byte 3 was intended, and the spurious bytes propagated
through the subsequent shift/OR chain.
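
The widening problem can be reproduced outside of LLVM. The sketch below is a standalone model, not the code from `LegalizerHelper.cpp`: `maskViaInt32` and `maskViaUInt64` are hypothetical names, and the 32-bit step is written with an explicit cast so that the example avoids the shift itself being ill-behaved on older language modes. It assumes a two's-complement target and `I <= 3`.

```cpp
#include <cassert>
#include <cstdint>

// Models a mask that is first computed as a 32-bit signed value and only
// then widened to 64 bits. For I == 3 the 32-bit value is negative, so the
// widening sign-extends it.
std::uint64_t maskViaInt32(unsigned I) {
  std::int32_t Narrow =
      static_cast<std::int32_t>(std::uint32_t{0xFF} << (I * 8));
  return static_cast<std::uint64_t>(Narrow);
}

// One way to avoid the problem: do the shift in the 64-bit unsigned type.
std::uint64_t maskViaUInt64(unsigned I) {
  return std::uint64_t{0xFF} << (I * 8);
}

void demo() {
  assert(maskViaInt32(2) == 0x0000000000FF0000ULL);  // still fine
  assert(maskViaInt32(3) == 0xFFFFFFFFFF000000ULL);  // over-wide mask
  assert(maskViaUInt64(3) == 0x00000000FF000000ULL); // intended mask
}
```

Bytes 0 through 2 are unaffected because their masks stay positive as a 32-bit value; only the byte-3 mask picks up the extra high bits.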


    [3 lines not shown]
DeltaFile
+28-0llvm/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp
+13-12llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
+1-1llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-bswap-rv64.mir
+1-1llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+1-1llvm/test/CodeGen/RISCV/GlobalISel/legalizer/legalize-bitreverse-rv64.mir
+44-155 files

LLVM/project afeee22llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/X86 fold-signbit-reduction-cmp-codesize.ll

[VectorCombine] Use TCK_CodeSize for size-optimized functions (#202207)

VectorCombine currently uses `TCK_RecipThroughput` for all functions,
including functions optimized for size.

Select `TCK_CodeSize` when `Function::hasOptSize()` is true, covering
both `-Os` (`optsize`) and `-Oz` (`minsize`), while retaining
`TCK_RecipThroughput` for the default optimization mode.
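
The selection rule can be modeled in a few lines. This is a standalone sketch, not the patch itself: `CostKind` and `FunctionAttrs` are stand-ins for `TargetTransformInfo::TargetCostKind` and `llvm::Function`, and `selectCostKind` is a hypothetical name.

```cpp
#include <cassert>

// Stand-in for the two cost kinds mentioned above.
enum class CostKind { RecipThroughput, CodeSize };

// Stand-in for a function carrying the `optsize` / `minsize` attributes.
struct FunctionAttrs {
  bool OptSize = false; // -Os
  bool MinSize = false; // -Oz
  // Mirrors the idea that the opt-size query is true for either attribute.
  bool hasOptSize() const { return OptSize || MinSize; }
};

// Pick the code-size model for size-optimized functions, throughput otherwise.
CostKind selectCostKind(const FunctionAttrs &F) {
  return F.hasOptSize() ? CostKind::CodeSize : CostKind::RecipThroughput;
}

void demo() {
  assert(selectCostKind(FunctionAttrs{}) == CostKind::RecipThroughput);
  assert(selectCostKind(FunctionAttrs{true, false}) == CostKind::CodeSize);
  assert(selectCostKind(FunctionAttrs{false, true}) == CostKind::CodeSize);
}
```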

The X86 regression test demonstrates a sign-bit reduction where the
throughput cost model folds an `or` reduction into a `umax` reduction.
The code-size model preserves the smaller form for `optsize` and
`minsize` functions, while the default function retains the existing
throughput-oriented transformation.

Fixes #153375.
DeltaFile
+83-0llvm/test/Transforms/VectorCombine/X86/fold-signbit-reduction-cmp-codesize.ll
+3-2llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+86-22 files

LLVM/project 756ff71clang/lib/CodeGen CGHLSLRuntime.cpp

[HLSL][NFC] Move HLSLBufferCopyEmitter class (#203595)

Move the `HLSLBufferCopyEmitter` class to the anonymous namespace at the top
of `CGHLSLRuntime.cpp` and use it directly from
`CGHLSLRuntime::createBufferMatrixTempAddress` instead of going through the
`CGHLSLRuntime::emitBufferCopy` call. No changes were made to the
`HLSLBufferCopyEmitter` code.

This is preparation for work related to resources in cbuffer structs
which will be changing the signature of `CGHLSLRuntime::emitBufferCopy`
and modifying the `HLSLBufferCopyEmitter`.
DeltaFile
+163-165clang/lib/CodeGen/CGHLSLRuntime.cpp
+163-1651 files

LLVM/project e3e2fd6llvm/lib/Target/RISCV RISCVInstrInfo.td RISCVInstrInfo.cpp

[RISCV] Add PseudoClearGPR to the special cases in RISCVInstrInfo::getInstSizeInBytes. (#203637)

This instruction is expanded to an ADDI with an immediate of 0 and should
then be compressed to c.li with Zca. The compression code doesn't know
this because the instruction is still a pseudo, so manually give a size
of 2 for Zca.
DeltaFile
+1-2llvm/lib/Target/RISCV/RISCVInstrInfo.td
+1-0llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+2-22 files

LLVM/project 4e5fa3bllvm/lib/Target/RISCV RISCVFrameLowering.cpp, llvm/test/CodeGen/RISCV shadowcallstack-frame-flags.ll

[RISCV] Mark HW shadow stack ops as frame setup/destroy (#203362)

This change follows up on PR #200182 and addresses the issue in the
[related
comment](https://github.com/llvm/llvm-project/pull/200182#discussion_r3329197379).

It sets `FrameSetup` on SSPUSH/C_SSPUSH and `FrameDestroy` on SSPOPCHK
instructions emitted by RISCVFrameLowering for the HW shadow stack path.
The test was written manually (instead of using
`utils/update_mir_test_checks.py`) to keep it simple and avoid
unnecessary fragility.
DeltaFile
+21-0llvm/test/CodeGen/RISCV/shadowcallstack-frame-flags.ll
+9-3llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
+30-32 files

LLVM/project 81877c8clang/test/Sema warn-lifetime-safety.cpp, clang/test/Sema/LifetimeSafety safety.cpp

minor simplification

Created using spr 1.3.8-beta.1
DeltaFile
+3,204-3,450llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+1,905-2,037llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+3,716-0clang/test/Sema/LifetimeSafety/safety.cpp
+0-3,653clang/test/Sema/warn-lifetime-safety.cpp
+2,760-227llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,813-654llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll
+13,398-10,021689 files not shown
+62,422-27,646695 files

LLVM/project 3956031clang/test/Sema warn-lifetime-safety.cpp, clang/test/Sema/LifetimeSafety safety.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.8-beta.1

[skip ci]
DeltaFile
+3,204-3,450llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+1,905-2,037llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+3,716-0clang/test/Sema/LifetimeSafety/safety.cpp
+0-3,653clang/test/Sema/warn-lifetime-safety.cpp
+2,760-227llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,813-654llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll
+13,398-10,021688 files not shown
+62,378-27,596694 files

LLVM/project 2e472f5clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaLifetimeSafety.h

helpful-destroyed-here
DeltaFile
+157-157clang/test/Sema/LifetimeSafety/safety.cpp
+44-44clang/test/Sema/LifetimeSafety/nocfg.cpp
+7-7clang/test/Sema/LifetimeSafety/annotation-suggestions.cpp
+2-2clang/test/Sema/LifetimeSafety/cfg-bailout.cpp
+2-1clang/lib/Sema/SemaLifetimeSafety.h
+1-1clang/include/clang/Basic/DiagnosticSemaKinds.td
+213-2126 files

LLVM/project b8e34c5clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaLifetimeSafety.h

[LifetimeSafety] Improve diagnostic messages for invalidations (#203577)
DeltaFile
+74-74clang/test/Sema/LifetimeSafety/invalidations.cpp
+60-55clang/test/Sema/LifetimeSafety/safety.cpp
+39-25clang/test/Sema/LifetimeSafety/nocfg.cpp
+28-7clang/lib/Sema/SemaLifetimeSafety.h
+6-6clang/test/Sema/LifetimeSafety/annotation-suggestions.cpp
+4-4clang/include/clang/Basic/DiagnosticSemaKinds.td
+211-1716 files

LLVM/project b5a0aa7clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaLifetimeSafety.h

helpful-destroyed-here
DeltaFile
+157-157clang/test/Sema/LifetimeSafety/safety.cpp
+44-44clang/test/Sema/LifetimeSafety/nocfg.cpp
+7-7clang/test/Sema/LifetimeSafety/annotation-suggestions.cpp
+2-2clang/test/Sema/LifetimeSafety/cfg-bailout.cpp
+2-1clang/lib/Sema/SemaLifetimeSafety.h
+1-1clang/include/clang/Basic/DiagnosticSemaKinds.td
+213-2126 files

LLVM/project 0579490llvm/include/llvm/IR Intrinsics.td Intrinsics.h, llvm/lib/IR Intrinsics.cpp

[NFC][LLVM] Refactor IIT_ANY payload for vector/element constraint (#203506)

Change the `IIT_ANY` payload from a single packed OverloadIndex + AnyKind
byte to 2 bytes:
- An 8-bit OverloadIndex
- An 8-bit packed vector + element type constraint.
This will enable `IIT_ANY` to express constraints on the overload type
in a more general fashion compared to a flat `AnyKind` enum.

Also fixed a latent bug in fixed encodings generated by the intrinsic
emitter (exposed by this change). Existing `encodePacked` packs the
type-signature as 8 nibbles into a 32-bit word and then checks if the
MSB bit position (i.e., bit 15) is 0 (to allow its use in fixed
encoding). This effectively drops any 0-valued bytes in the encoding in
the upper 4 nibbles. Fix this by changing `encodePacked` to use the
actual fixed encoding type and its size.
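
As a rough illustration of sizing the packing by the fixed encoding type, here is a standalone sketch. It is not the TableGen emitter code: `packNibbles` is a hypothetical helper, the lowest-nibble-first order is an assumption, and so is the reading that the top bit of the fixed word must stay 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper: pack 4-bit codes into FixedT, lowest nibble first.
// Returns nullopt when the signature cannot be represented in FixedT.
template <typename FixedT>
std::optional<FixedT> packNibbles(const std::vector<std::uint8_t> &Sig) {
  constexpr unsigned Bits = std::numeric_limits<FixedT>::digits;
  // Reject by length, so trailing zero nibbles are not silently dropped.
  if (Sig.size() > Bits / 4)
    return std::nullopt;
  FixedT Word = 0;
  for (std::size_t I = 0; I < Sig.size(); ++I) {
    if (Sig[I] > 0xF)
      return std::nullopt; // not a nibble
    Word |= static_cast<FixedT>(static_cast<FixedT>(Sig[I]) << (4 * I));
  }
  // Assumption: the top bit of the fixed word is reserved and must be 0.
  if ((Word >> (Bits - 1)) != 0)
    return std::nullopt;
  return Word;
}

void demo() {
  assert(packNibbles<std::uint16_t>({1, 2, 3}) == std::uint16_t{0x321});
  assert(packNibbles<std::uint16_t>({1, 2, 3, 7}) == std::uint16_t{0x7321});
  assert(!packNibbles<std::uint16_t>({1, 2, 3, 8}));    // top bit set
  assert(!packNibbles<std::uint16_t>({1, 2, 3, 4, 0})); // fifth nibble is 0
}
```

The last case is the interesting one: a check that only looks at the packed value cannot tell a five-nibble signature ending in 0 from a four-nibble one, whereas a check against the size of the fixed type can.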
DeltaFile
+67-56llvm/include/llvm/IR/Intrinsics.td
+66-15llvm/lib/IR/Intrinsics.cpp
+42-26llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp
+38-0llvm/test/TableGen/intrinsic-overload-index-oor.td
+15-12llvm/test/TableGen/intrinsic-struct.td
+13-9llvm/include/llvm/IR/Intrinsics.h
+241-1186 files

LLVM/project 7670d88flang/lib/Optimizer/Transforms/CUDA CUFDeviceFuncTransform.cpp, flang/test/Fir/CUDA cuda-device-func-transform.mlir

[flang][cuda] Set kernel intent(in) as const __restrict__ (#203652)

Set attributes on `intent(in)` so `ld.global.nc` is generated by the
backend.
DeltaFile
+38-0flang/lib/Optimizer/Transforms/CUDA/CUFDeviceFuncTransform.cpp
+16-2flang/test/Fir/CUDA/cuda-device-func-transform.mlir
+54-22 files

LLVM/project 1b58516clang/lib/Sema SemaLifetimeSafety.h, clang/test/Sema/LifetimeSafety safety.cpp nocfg.cpp

[LifetimeSafety] Improve aliasing notes to include callee name (#203606)
DeltaFile
+56-51clang/test/Sema/LifetimeSafety/safety.cpp
+39-25clang/test/Sema/LifetimeSafety/nocfg.cpp
+14-1clang/lib/Sema/SemaLifetimeSafety.h
+6-6clang/test/Sema/LifetimeSafety/annotation-suggestions.cpp
+115-834 files

LLVM/project f676da3clang/include/clang/Basic DiagnosticSemaKinds.td, clang/lib/Sema SemaLifetimeSafety.h

users/usx95/helpful-invalidations
DeltaFile
+74-74clang/test/Sema/LifetimeSafety/invalidations.cpp
+14-6clang/lib/Sema/SemaLifetimeSafety.h
+4-4clang/test/Sema/LifetimeSafety/safety.cpp
+4-4clang/include/clang/Basic/DiagnosticSemaKinds.td
+96-884 files

LLVM/project af60d56flang/lib/Semantics expression.cpp, flang/test/Semantics cuf28.cuf cuf-generic-literal-host.cuf

[flang][CUDA] Keep host literals from using unified-memory generic distance (#201257)

Fix CUDA generic resolution under `-gpu=mem:unified` so unattributed
literals and expression temporaries are not treated as unified-memory
actuals.

Previously, a host scalar literal such as `1.0` could score as
compatible with a `DEVICE` dummy and incorrectly select the
device-scalar overload. This could pass a host stack address to a device
helper and fail at runtime. The fix applies the unified/managed memory
distance columns only to symbol-backed actuals.
DeltaFile
+37-0flang/test/Semantics/cuf28.cuf
+34-0flang/test/Semantics/cuf-generic-literal-host.cuf
+15-2flang/lib/Semantics/expression.cpp
+86-23 files

LLVM/project 3203867llvm/test/CodeGen/AMDGPU fcanonicalize.ll maximumnum.ll, llvm/test/Transforms/LICM vector-insert.ll

Merge branch 'main' into users/usx95/06-12-users_usx95_helpful-invalidations
DeltaFile
+2,760-227llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,357-0llvm/test/CodeGen/AMDGPU/maximumnum.ll
+1,317-0llvm/test/CodeGen/AMDGPU/minimumnum.ll
+1,313-0llvm/test/CodeGen/AMDGPU/packed-u64.ll
+736-0llvm/test/CodeGen/AMDGPU/shl.v2i64.ll
+0-572llvm/test/Transforms/LICM/vector-insert.ll
+7,483-799172 files not shown
+14,680-2,140178 files

LLVM/project 04f1175clang/lib/Sema SemaLifetimeSafety.h, clang/test/Sema/LifetimeSafety safety.cpp nocfg.cpp

[LifetimeSafety] Improve aliasing notes to include callee name (#203606)
DeltaFile
+56-51clang/test/Sema/LifetimeSafety/safety.cpp
+39-25clang/test/Sema/LifetimeSafety/nocfg.cpp
+14-1clang/lib/Sema/SemaLifetimeSafety.h
+6-6clang/test/Sema/LifetimeSafety/annotation-suggestions.cpp
+115-834 files

LLVM/project 6b82a04flang/lib/Optimizer/Transforms/CUDA CUFOpConversion.cpp, flang/test/Fir/CUDA cuda-global-addr.mlir

[flang][cuda] Fix host loads from CUDA constant globals (#203064)

This fixes CUDA Fortran lowering for scalar module variables with the
constant attribute that are read from host code, such as launch
configuration expressions or CUF kernel loop bounds.

Previously, host-side declarations for these globals could be rewritten
to device constant-memory addresses, causing host loads to dereference
the result of _FortranACUFGetDeviceAddress. The fix preserves host reads
from the host-visible global while still using the device address for
host-to-device assignment updates.

A FIR regression test covers host reads and assignment updates for
scalar CUDA constant globals.
DeltaFile
+36-0flang/lib/Optimizer/Transforms/CUDA/CUFOpConversion.cpp
+25-0flang/test/Fir/CUDA/cuda-global-addr.mlir
+61-02 files

LLVM/project 0a713dbllvm/test/CodeGen/AMDGPU fcanonicalize.ll llvm.amdgcn.sched.group.barrier.ll

Merge branch 'main' into users/petar-avramovic/pk-f64
DeltaFile
+2,760-227llvm/test/CodeGen/AMDGPU/fcanonicalize.ll
+1,813-654llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll
+1,357-0llvm/test/CodeGen/AMDGPU/maximumnum.ll
+1,317-0llvm/test/CodeGen/AMDGPU/minimumnum.ll
+1,313-0llvm/test/CodeGen/AMDGPU/packed-u64.ll
+784-230llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.ll
+9,344-1,111279 files not shown
+20,659-3,533285 files

LLVM/project baf76a8bolt/lib/Profile DataAggregator.cpp, bolt/test/perf2bolt perf_test.test

[BOLT] Change DataAggregator error types (#203651)

1. In `filterBinaryMMapInfo`, replace `inconvertibleErrorCode` with an errc
   code, as `parseMainEvents` converts the returned Error to std::error_code.
2. In `parsePerfData`, pass through the Error returned by `prepareToParse`
   for memory events.

Test Plan: updated perf_test.test
DeltaFile
+4-0bolt/test/perf2bolt/perf_test.test
+2-2bolt/lib/Profile/DataAggregator.cpp
+6-22 files

LLVM/project 52a3108llvm/test/CodeGen/RISCV clmul.ll clmulr.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll clmul-sdnode.ll

Merge branch 'main' into users/shiltian/reqd_work_group_size-verifier
DeltaFile
+38,494-84,026llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+22,388-22,086llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+19,087-24,391llvm/test/CodeGen/RISCV/clmul.ll
+10,473-12,572llvm/test/CodeGen/RISCV/clmulr.ll
+10,281-12,374llvm/test/CodeGen/RISCV/clmulh.ll
+8,361-8,920llvm/test/CodeGen/RISCV/rvv/expandload.ll
+109,084-164,3694,730 files not shown
+509,330-370,8814,736 files

LLVM/project a8e3c08libc/src/__support/threads raw_rwlock.h raw_mutex.h, libc/src/__support/threads/linux futex_utils.h

[libc] fix EAGAIN being treated as timeout in mutex and rwlock (#203574)

Fixes #203411.

This PR addresses the problem that `EAGAIN` may be treated as timeout in
mutex and rwlock. Two changes are applied:

1. timeout sites always explicitly check for timeout now to make the
logic more robust;
2. the futex wait now discards the error of `EAGAIN/EWOULDBLOCK` and
returns 0;
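
The two changes can be sketched as a pair of small helpers. This is a standalone model with hypothetical names (`normalizeFutexWaitError`, `classifyWait`, `WaitOutcome`), not the code in `futex_utils.h` or the lock headers; it only shows the error mapping, with no real futex call involved.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical wrapper step: a value mismatch reported by the futex wait
// (EAGAIN / EWOULDBLOCK) is not an error for the caller, so report 0.
int normalizeFutexWaitError(int Err) {
  if (Err == EAGAIN || Err == EWOULDBLOCK)
    return 0;
  return Err;
}

enum class WaitOutcome { Retry, TimedOut };

// Hypothetical timeout site: only an explicit ETIMEDOUT counts as a
// timeout; every other result sends the caller back to its retry loop.
WaitOutcome classifyWait(int RawErr) {
  int Err = normalizeFutexWaitError(RawErr);
  return Err == ETIMEDOUT ? WaitOutcome::TimedOut : WaitOutcome::Retry;
}

void demo() {
  assert(normalizeFutexWaitError(EAGAIN) == 0);
  assert(classifyWait(EAGAIN) == WaitOutcome::Retry);
  assert(classifyWait(0) == WaitOutcome::Retry);
  assert(classifyWait(ETIMEDOUT) == WaitOutcome::TimedOut);
}
```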

We don't distinguish waking up from a signal and waking up from a mismatch
for the following 3 reasons:
- We have a userspace guard to avoid the futex syscall if we already know
the value would match; it seems awkward to make that check return an error,
as we may wake up and loop back to the check, where the signal is consumed
but we still return an error;
- futex syscall can spuriously wake up anyway, there is no way to tell

    [3 lines not shown]
DeltaFile
+4-2libc/src/__support/threads/raw_rwlock.h
+5-0libc/src/__support/threads/linux/futex_utils.h
+2-2libc/src/__support/threads/raw_mutex.h
+1-1libc/test/integration/src/__support/threads/futex_requeue_test.cpp
+12-54 files

LLVM/project 92d7a7fmlir/include/mlir/Dialect/Quant/IR QuantDialectBytecode.td QuantBase.td, mlir/lib/Dialect/Quant/IR QuantDialectBytecode.cpp

QuantileType bytecode patch (#203495)

Since PR https://github.com/llvm/llvm-project/pull/190321 was merged,
some issues have been identified, such as QuantileType not being added
to the bytecode files. This PR fixes these missing pieces, which should
make QuantileType a complete and functional type.
DeltaFile
+23-0mlir/lib/Dialect/Quant/IR/QuantDialectBytecode.cpp
+15-1mlir/include/mlir/Dialect/Quant/IR/QuantDialectBytecode.td
+16-0mlir/test/Dialect/Quant/Bytecode/types.mlir
+10-0mlir/include/mlir/Dialect/Quant/IR/QuantBase.td
+1-0mlir/include/mlir/Dialect/Quant/IR/Quant.h
+65-15 files

LLVM/project c9b25a6libc/include stdlib.yaml, libc/src/stdlib mkstemp.cpp mkstemp.h

[libc] implement mkstemp (#199220)

Fixes #191266.
Implements `mkstemp` as specified in POSIX.
Currently Linux-only, since it relies on the Linux syscall wrappers for
`getrandom` and `open`.
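
For readers unfamiliar with the interface, a short usage sketch of POSIX `mkstemp` from the caller's side. It says nothing about how the libc implementation works internally; `writeToTempFile` is a hypothetical name, and the example assumes a POSIX system with a writable `/tmp`.

```cpp
#include <cassert>
#include <stdlib.h> // mkstemp
#include <string>
#include <unistd.h> // write, unlink, close

// Hypothetical helper: create a unique temporary file, write Payload to
// it, and clean up. Returns true if everything succeeded.
bool writeToTempFile(const std::string &Payload) {
  // The template must end in "XXXXXX"; mkstemp rewrites it in place with
  // the generated name, so it has to be a modifiable array.
  char Template[] = "/tmp/mkstemp-demo-XXXXXX";
  int Fd = mkstemp(Template);
  if (Fd < 0)
    return false;

  ssize_t Written = write(Fd, Payload.data(), Payload.size());

  unlink(Template); // remove the name; the open descriptor stays valid
  close(Fd);
  return Written == static_cast<ssize_t>(Payload.size());
}
```

A call such as `writeToTempFile("hello")` is expected to return true when `/tmp` is writable.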
DeltaFile
+207-0libc/test/src/stdlib/mkstemp_test.cpp
+87-0libc/src/stdlib/mkstemp.cpp
+31-0libc/src/stdlib/mkstemp.h
+21-0libc/test/src/stdlib/CMakeLists.txt
+17-0libc/src/stdlib/CMakeLists.txt
+6-0libc/include/stdlib.yaml
+369-03 files not shown
+372-09 files

LLVM/project 7430170clang/test/Analysis/Scalable/PointerFlow lref-to-rref-cast.test

add ' --ssaf-compilation-unit-id'
DeltaFile
+2-1clang/test/Analysis/Scalable/PointerFlow/lref-to-rref-cast.test
+2-11 files