LLVM/project 6ae5803llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch fsqrt-reciprocal-estimate.ll

[LoongArch] Fix incorrect reciprocal sqrt estimate semantics (#187621)

The current implementation of getSqrtEstimate() has incorrect semantics
when using `FRSQRTE`.

`FRSQRTE` computes an approximation to 1/sqrt(x), but the existing code
multiplies the estimate by the operand when Reciprocal is true. This
results in returning sqrt(x) instead of 1/sqrt(x), effectively reversing
the intended semantics of the 'Reciprocal' flag.

Additionally, the implementation does not properly account for LLVM's
Newton-Raphson refinement pipeline. When refinement steps are requested,
the initial estimate must be in reciprocal form so that the generic
DAGCombiner can apply NR iterations correctly.

This patch fixes the behavior by:

- Returning the raw FRSQRTE result when Reciprocal is true, or when
  refinement steps are required.

    [13 lines not shown]
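
The relationship between the reciprocal estimate, refinement, and the final multiply can be sketched as follows. This is a minimal illustration, not the code in LoongArchISelLowering.cpp or the DAGCombiner: `roughRsqrtEstimate`, `refineRsqrt`, and `estimateSqrt` are hypothetical names, and the 5% error is an arbitrary stand-in for a hardware estimate such as `FRSQRTE`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware estimate instruction such as FRSQRTE:
// returns 1/sqrt(x) with a deliberate 5% error. Not the real instruction.
inline float roughRsqrtEstimate(float x) {
  assert(x > 0.0f && "sketch only handles positive inputs");
  return 1.05f / std::sqrt(x);
}

// One Newton-Raphson step for y ~= 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y).
inline float refineRsqrt(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// Refinement runs on the reciprocal form. The multiply by x that turns
// 1/sqrt(x) into sqrt(x) happens only at the end, and only when the caller
// did not ask for the reciprocal.
inline float estimateSqrt(float x, bool reciprocal, int refinementSteps) {
  float y = roughRsqrtEstimate(x);
  for (int i = 0; i < refinementSteps; ++i)
    y = refineRsqrt(x, y);
  return reciprocal ? y : x * y;
}
```

With two refinement steps, `estimateSqrt(4.0f, true, 2)` lands close to 0.5 and `estimateSqrt(4.0f, false, 2)` close to 2.0. Multiplying by `x` in the reciprocal case would return the wrong quantity, which is the bug the commit message describes.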
DeltaFile
+67-31llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+0-12llvm/test/CodeGen/LoongArch/fsqrt-reciprocal-estimate.ll
+0-4llvm/test/CodeGen/LoongArch/lasx/fsqrt-reciprocal-estimate.ll
+0-2llvm/test/CodeGen/LoongArch/lsx/fsqrt-reciprocal-estimate.ll
+67-494 files

LLVM/project d49d24cllvm/lib/Target/RISCV RISCVInstrInfoXRivos.td, llvm/test/Analysis/CostModel/RISCV rvv-extractelement.ll rvv-insertelement.ll

[RISCV] Remove the experimental XRivosVisni extension (#188370)
DeltaFile
+0-399llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
+0-331llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
+0-293llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
+0-291llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
+1-86llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
+0-43llvm/test/MC/RISCV/xrivosvisni-valid.s
+1-1,44313 files not shown
+7-1,50819 files

LLVM/project 218f240llvm/lib/Target/RISCV RISCVTargetTransformInfo.cpp

[RISCV] Fix ssub_sat cost model to use signed VSSUB instead of VSSUBU (#188195)

Intrinsic::ssub_sat was incorrectly mapped to RISCV::VSSUBU_VV (unsigned
saturating subtract) instead of RISCV::VSSUB_VV (signed saturating
subtract), causing wrong cost estimates.
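
For context, the signed and unsigned operations differ in semantics, not just in name. The following is a scalar sketch on 8-bit values. `ssubSat` and `usubSat` are hypothetical helpers for illustration and are not part of the RISC-V cost model.

```cpp
#include <cstdint>

// Signed saturating subtract on 8-bit values: the result clamps to [-128, 127].
inline std::int8_t ssubSat(std::int8_t a, std::int8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);
  if (wide > INT8_MAX) wide = INT8_MAX;
  if (wide < INT8_MIN) wide = INT8_MIN;
  return static_cast<std::int8_t>(wide);
}

// Unsigned saturating subtract on 8-bit values: the result clamps at 0.
inline std::uint8_t usubSat(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : 0;
}
```

The same bit pattern gives different answers: `ssubSat(-100, 100)` saturates to -128, while `usubSat(156, 100)` (156 being the unsigned reading of the same byte) is 56.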

Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
DeltaFile
+1-1llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+1-11 files

LLVM/project 448c10ellvm/lib/Target/LoongArch LoongArchISelLowering.cpp

rebase after vshuf4i_d was defined
DeltaFile
+1-1llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+1-11 files

LLVM/project 5d7b810llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lasx vec-shuffle-byte-rotate.ll

[LoongArch] Custom legalize vector_shuffle to `xvshuf4i.d`
DeltaFile
+28-4llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+3-7llvm/test/CodeGen/LoongArch/lasx/vec-shuffle-byte-rotate.ll
+2-7llvm/test/CodeGen/LoongArch/lasx/ir-instruction/shuffle-as-xvshuf4i.ll
+33-183 files

LLVM/project 4a83fe7llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/test/CodeGen/LoongArch/lsx vec-sext.ll vec-shuffle-any-ext.ll

[LoongArch] Custom legalize vector_shuffle to `vextrins`

TODO: LASX support will be added in a later commit.
DeltaFile
+88-0llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+24-27llvm/test/CodeGen/LoongArch/lsx/vec-sext.ll
+24-24llvm/test/CodeGen/LoongArch/lsx/ir-instruction/shuffle-as-vextrins.ll
+18-21llvm/test/CodeGen/LoongArch/lsx/vec-shuffle-any-ext.ll
+11-23llvm/test/CodeGen/LoongArch/lsx/vec-trunc.ll
+4-10llvm/test/CodeGen/LoongArch/lsx/vmskcond.ll
+169-1056 files

LLVM/project 303a1a4libc/src/__support/wctype perfect_hash_map.h

implicit cast
DeltaFile
+9-7libc/src/__support/wctype/perfect_hash_map.h
+9-71 files

LLVM/project 04c12callvm/test/CodeGen/LoongArch/lsx/ir-instruction shuffle-as-vextrins.ll

[LoongArch][NFC] Pre-commit tests for `vextrins`
DeltaFile
+78-0llvm/test/CodeGen/LoongArch/lsx/ir-instruction/shuffle-as-vextrins.ll
+78-01 files

LLVM/project 8de427blibc/src/__support/wctype upper_to_lower.h lower_to_upper.h, libc/test/src/__support/wctype wctype_perfect_hash_test.cpp

fix size difference on windows
DeltaFile
+540-523libc/src/__support/wctype/upper_to_lower.h
+460-538libc/src/__support/wctype/lower_to_upper.h
+42-28libc/utils/wctype_utils/conversion/hex_writer.py
+24-0libc/test/src/__support/wctype/wctype_perfect_hash_test.cpp
+1,066-1,0894 files

LLVM/project cb12534compiler-rt/lib/builtins emutls.c

[compiler-rt] Suppress unused variable report in emutls

Pull Request: https://github.com/llvm/llvm-project/pull/188329
DeltaFile
+4-0compiler-rt/lib/builtins/emutls.c
+4-01 files

LLVM/project a32d903bolt/docs profiles.md, llvm/include/llvm/CodeGenTypes LowLevelType.h

Merge branch 'main' into users/vitalybuka/spr/compiler-rt-suppress-unused-variable-report-in-emutls
DeltaFile
+132-389llvm/include/llvm/CodeGenTypes/LowLevelType.h
+364-0llvm/test/CodeGen/SPIRV/bool-vector-bitcast.ll
+253-0llvm/lib/Target/SPIRV/SPIRVCtorDtorLowering.cpp
+73-160llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+212-0bolt/docs/profiles.md
+0-135llvm/unittests/CodeGen/GlobalISel/IRTranslatorBF16Test.cpp
+1,034-68494 files not shown
+2,235-1,312100 files

LLVM/project aecfaf1flang/lib/Optimizer/OpenACC/Support RegisterOpenACCExtensions.cpp, flang/test/Fir/OpenACC offload-livein-value-canonicalization.fir

[flang][acc] Handle fir.undefined with OutlineRematerializationOpInterface in OffloadLiveInValueCanonicalization (#188325)

Example:
```fortran
!$ACC KERNELS PRESENT(CG, W1)
  CG(1:W1%WDES1%NPL, NN) = W1%CPTWFP(1:W1%WDES1%NPL)
  CPROJ(:, NN) = W1%CPROJ(1:SIZE(CPROJ,1))
!$ACC END KERNELS
```

When compiling OpenACC kernels containing array section assignments of
rank-2 arrays with a scalar index in one dimension (e.g. `CG(1:NPL,
NN)`), the Fortran lowering creates a `fir.slice` where collapsed
(scalar) dimensions use `fir.undefined index` as the stop/step values.
`SliceOp::getOutputRank()` relies on `getDefiningOp()` returning
`fir::UndefOp` to identify these collapsed dimensions and compute the
correct output rank.

When `fir.undefined` values defined outside an offload region are used

    [15 lines not shown]
DeltaFile
+52-0flang/test/Fir/OpenACC/offload-livein-value-canonicalization.fir
+4-0flang/lib/Optimizer/OpenACC/Support/RegisterOpenACCExtensions.cpp
+56-02 files

LLVM/project 408bb4dlibc/src/math totalordermagbf16.h totalorderbf16.h

[libc] Wrong guards for `totalorderbf16` and `totalordermagbf16` (#188241)

Currently the guards for `totalorderbf16` and `totalordermagbf16` are as
follows:
```
#ifndef LLVM_LIBC_SRC_MATH_TOTALORDERMAGF16_H
#define LLVM_LIBC_SRC_MATH_TOTALORDERMAGF16_H
-
#endif // LLVM_LIBC_SRC_MATH_TOTALORDERMAGF16_H
```
and 
```
#ifndef LLVM_LIBC_SRC_MATH_TOTALORDERF16_H
#define LLVM_LIBC_SRC_MATH_TOTALORDERF16_H
-
#endif // LLVM_LIBC_SRC_MATH_TOTALORDERF16_H
```
As we can see, these guards name F16 rather than BF16.
This PR fixes them to use the correct guards, `TOTALORDERBF16` and
`TOTALORDERMAGBF16`.
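
One reason a copied guard matters: if two headers share a guard macro, whichever is included second is silently skipped. The sketch below simulates that in a single translation unit. The file names and `DEMO_*` macros are hypothetical, and the commit message does not say that such a collision was actually observed.

```cpp
// ---- contents of a hypothetical foo_f16.h ----
#ifndef DEMO_MATH_FOO_F16_H
#define DEMO_MATH_FOO_F16_H
inline int fooF16() { return 16; }
#endif // DEMO_MATH_FOO_F16_H

// ---- contents of a hypothetical foo_bf16.h that copied the F16 guard ----
#ifndef DEMO_MATH_FOO_F16_H // wrong: should be DEMO_MATH_FOO_BF16_H
#define DEMO_MATH_FOO_F16_H
#define DEMO_FOO_BF16_DECLARED 1
inline int fooBF16() { return 1; }
#endif // DEMO_MATH_FOO_F16_H

// Records whether the second header's body survived preprocessing.
#ifdef DEMO_FOO_BF16_DECLARED
constexpr bool kBF16HeaderSeen = true;
#else
constexpr bool kBF16HeaderSeen = false;
#endif
```

Here `kBF16HeaderSeen` ends up `false` and `fooBF16` is never declared, because the first header already defined the shared guard.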
DeltaFile
+3-3libc/src/math/totalordermagbf16.h
+3-3libc/src/math/totalorderbf16.h
+6-62 files

LLVM/project 9999f7flibc/src/math atanpif16.h

[libc] Wrong header guard comment for atanpif16 (#188310)

This PR fixes a small nit introduced in
[1c1135b](https://github.com/llvm/llvm-project/pull/150400/commits/1c1135b3fccf59537243fc365e83a568f77273ae), changing
```
#endif // LLVM_LIBC_SRC_MATH_ASINIF16_H
```
to 
```
#endif // LLVM_LIBC_SRC_MATH_ATANPIF16_H
```
DeltaFile
+1-1libc/src/math/atanpif16.h
+1-11 files

LLVM/project b60a39elld/test/wasm/lto thinlto-shared-memory-atomics.ll, lld/wasm LTO.cpp

[lld][WebAssembly] Propagate +atomics for ThinLTO when using --shared-memory (#188381)

When compiling WebAssembly with ThinLTO, functions are partitioned into
isolated `.bc` modules and dispatched to individual LTO backend threads.
During code generation, the `CoalesceFeaturesAndStripAtomics` pass
iterates over the module to gather the union of target features (like
`+atomics`) attached to defined functions. In particular, when threads
are not in use, it lowers atomics and TLS variables to their
single-threaded equivalents.

However, if a partitioned module only contains globally defined TLS
variables (e.g. there are no functions, or all functions were fully
inlined or stripped by dropDeadSymbols before ThinLTO optimization), the
module becomes completely devoid of function definitions. The coalescing
pass then falls back to fetching features from the `TargetMachine`.
Because in LTO the `TargetMachine` defaults to a generic target without
atomics enabled, the TLS is lowered away and the `wasm-feature-atomics`
flag is omitted from the resulting ThinLTO object partition, causing
`wasm-ld` to immediately reject it.

    [8 lines not shown]
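
The fallback described above can be modelled with a toy sketch. This is not the `CoalesceFeaturesAndStripAtomics` pass or the change in lld/wasm/LTO.cpp: `ToyModule`, `FeatureSet`, and `coalesceFeatures` are hypothetical names, and features are plain strings.

```cpp
#include <set>
#include <string>
#include <vector>

using FeatureSet = std::set<std::string>;

// Toy model of a module: each defined function carries its own feature set.
struct ToyModule {
  std::vector<FeatureSet> functionFeatures;
};

// Union the features of all defined functions. With no function definitions
// there is nothing to union, so fall back to the target machine's defaults.
inline FeatureSet coalesceFeatures(const ToyModule &m,
                                   const FeatureSet &targetDefaults) {
  if (m.functionFeatures.empty())
    return targetDefaults;
  FeatureSet result;
  for (const FeatureSet &f : m.functionFeatures)
    result.insert(f.begin(), f.end());
  return result;
}
```

In this model, a partition with no functions and generic defaults loses `+atomics`, while the same partition keeps it if the defaults already carry `+atomics`. That is the direction the commit title suggests, though the elided lines above describe the actual mechanism.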
DeltaFile
+40-0lld/test/wasm/lto/thinlto-shared-memory-atomics.ll
+13-0lld/wasm/LTO.cpp
+53-02 files

LLVM/project aafe5bdlibc/src/__support/wctype perfect_hash_map.h

fix bitwidth
DeltaFile
+2-2libc/src/__support/wctype/perfect_hash_map.h
+2-21 files

LLVM/project 6708e82libc/src/__support/wctype perfect_hash_map.h

bit field
DeltaFile
+2-2libc/src/__support/wctype/perfect_hash_map.h
+2-21 files

LLVM/project dd9885cclang-tools-extra/clang-tidy/modernize UseStdFormatCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Add missing #include insertion in macros for modernize-use-std-format (#188247)

Added missing ``#include`` insertion when the format function call
appears as an argument to a macro.

Part of #175183

---------

Co-authored-by: Victor Chernyakin <chernyakin.victor.j at outlook.com>
DeltaFile
+21-0clang-tools-extra/test/clang-tidy/checkers/modernize/use-std-format-macro.cpp
+6-2clang-tools-extra/docs/ReleaseNotes.rst
+2-2clang-tools-extra/clang-tidy/modernize/UseStdFormatCheck.cpp
+29-43 files

LLVM/project 9ecae70libc/test/src/__support/wctype wctype_perfect_hash_test.cpp

fix tests
DeltaFile
+12-6libc/test/src/__support/wctype/wctype_perfect_hash_test.cpp
+12-61 files

LLVM/project d2dab97llvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp, llvm/test/CodeGen/SPIRV bool-vector-bitcast.ll

[SPIR-V] Decompose bitcasts involving bool vectors (#187960)

OpTypeBool has no defined bitwidth in SPIR-V, so OpBitcast is invalid
for boolean vector types. Decompose `<N x i1> <-> iN` bitcasts into
element-wise extract/shift/OR and AND/icmp/insert sequences during IR
preprocessing.
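
The two decompositions can be sketched in scalar form as below. `packBools` and `unpackBools` are hypothetical helpers, not code from SPIRVEmitIntrinsics.cpp, and placing element 0 in the least significant bit is an assumption made for the illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// <N x i1> -> iN: extract each element, widen it, shift it into place, OR it in.
template <std::size_t N>
std::uint32_t packBools(const std::array<bool, N> &v) {
  static_assert(N <= 32, "sketch packs into a 32-bit integer");
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    bits |= static_cast<std::uint32_t>(v[i]) << i;
  return bits;
}

// iN -> <N x i1>: AND out each bit, compare against zero, insert the result.
template <std::size_t N>
std::array<bool, N> unpackBools(std::uint32_t bits) {
  static_assert(N <= 32, "sketch unpacks from a 32-bit integer");
  std::array<bool, N> v{};
  for (std::size_t i = 0; i < N; ++i)
    v[i] = ((bits >> i) & 1u) != 0;
  return v;
}
```

Under that bit-order assumption, the vector `{true, false, true, true}` packs to 13, and unpacking 13 as four elements returns the same vector.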

Fixes:
https://github.com/kuhar/iree/blob/amdgcn-spirv/spirv-repros/bitcast_crash.ll
and https://github.com/llvm/llvm-project/issues/185815
DeltaFile
+364-0llvm/test/CodeGen/SPIRV/bool-vector-bitcast.ll
+80-0llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+66-0llvm/test/CodeGen/SPIRV/llvm-intrinsics/masked-load-store.ll
+510-03 files

LLVM/project 9d7e716libc/src/__support/wctype perfect_hash_map.h lower_to_upper.h, libc/utils/wctype_utils/conversion hex_writer.py

fix windows build errors
DeltaFile
+16-14libc/src/__support/wctype/perfect_hash_map.h
+2-2libc/utils/wctype_utils/conversion/hex_writer.py
+1-1libc/src/__support/wctype/lower_to_upper.h
+1-1libc/src/__support/wctype/upper_to_lower.h
+1-1libc/src/__support/wctype/CMakeLists.txt
+21-195 files

LLVM/project 2a74c82llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.h AMDGPUCoExecSchedStrategy.cpp

Formatting

Change-Id: I3d89fba145471141ef945b1de15330caa245e82d
DeltaFile
+4-4llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+4-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+8-72 files

LLVM/project f4180a7llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUCoExecSchedStrategy.h, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll

Claude Code review

Change-Id: Iab06de2981b27667cc29a56931dd378ecf7a1b0c
DeltaFile
+115-109llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+16-26llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+5-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+136-1353 files

LLVM/project 710c2f0llvm/unittests/SandboxIR TrackerTest.cpp

[SandboxIR][Tracker] Test UncondBrInst CondBrInst setters (#187549)

This checks the `setCondition()` and `setSuccessor()` setters introduced
in #187196.
DeltaFile
+78-0llvm/unittests/SandboxIR/TrackerTest.cpp
+78-01 files

LLVM/project 2a7b0f0lldb/bindings/python python-wrapper.swig

[lldb] use the Py_REFCNT() macro instead of directly accessing member (#188161)

[PyObject members are not to be accessed
directly](https://docs.python.org/3/c-api/structures.html#c.PyObject),
but rather through macros, in this case `Py_REFCNT()`.

In most CPython builds, i.e. those with the Global Interpreter Lock
enabled, `Py_REFCNT()` expands to a direct access of `ob_refcnt` anyway.
However, in free-threaded CPython with the limited API disabled (since it
requires the GIL for now), that member does not exist, so the build
fails. The macro expands to the correct access method in the
free-threaded configuration.
DeltaFile
+1-1lldb/bindings/python/python-wrapper.swig
+1-11 files

LLVM/project 30084d7libcxx/include/ext hash_map, libcxx/test/extensions/gnu/hash_map non_standard_layout.pass.cpp

[libc++] Fix type confusion in hash_{,multi}map

The type `__gnu_cxx::hash_{,multi}map` creates objects of type
`std::pair<Key, Value>` and returns pointers to them of type
`std::pair<const Key, Value>`. If either `Key` or `Value` is
non-standard-layout, this is UB, and is furthermore considered by
pointer field protection to be a type confusion, which leads to a
program crash. Fix it by using the correct type for the pair's storage
and using const_cast to form a pointer to the key in the one place where
that is needed.
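
The shape of such a fix can be sketched as follows, assuming the storage uses the same `std::pair<const Key, Value>` type that callers receive. `ToyNode` and its members are hypothetical and do not reflect the actual libc++ `hash_map` internals.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// The two pair types are distinct and unrelated, so a pointer to one must not
// be reinterpreted as a pointer to the other.
static_assert(!std::is_same<std::pair<std::string, int>,
                            std::pair<const std::string, int>>::value,
              "pair<K, V> and pair<const K, V> are different types");

template <class Key, class Value>
struct ToyNode {
  // Stored with the type that is handed out, so no pointer cast between
  // pair types is needed.
  std::pair<const Key, Value> value;

  std::pair<const Key, Value> *get() { return &value; }

  // Forms a non-const pointer to the key. The sketch only forms the pointer
  // and never writes through it.
  Key *keyPointer() { return const_cast<Key *>(&value.first); }
};
```

With this layout, `get()` returns the stored object's own type, and the single `const_cast` is confined to the key.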

Reviewers: ldionne

Reviewed By: ldionne

Pull Request: https://github.com/llvm/llvm-project/pull/183223
DeltaFile
+7-11libcxx/include/ext/hash_map
+16-0libcxx/test/extensions/gnu/hash_map/non_standard_layout.pass.cpp
+16-0libcxx/test/extensions/gnu/hash_multimap/non_standard_layout.pass.cpp
+39-113 files

LLVM/project ea4e247llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp GCNSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-scheduler.ll

[AMDGPU] Add block carried latency to CoExecSched

Change-Id: Ib04e40e57d38e127d6c5452d1719e32dacef2ade
DeltaFile
+880-4llvm/test/CodeGen/AMDGPU/coexec-scheduler.ll
+167-45llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+0-37llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+22-5llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+0-4llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+1,069-955 files

LLVM/project c3d69edmlir/lib/Transforms Mem2Reg.cpp, mlir/test/Dialect/LLVMIR mem2reg.mlir

[mlir][mem2reg] Process direct uses inside other regions. (#188359)

Regions that contain direct uses must be added to the list for
processing; otherwise the direct uses are not removed and continue
to use the slot after promotion.

The added LIT test previously triggered the "after promotion, the slot
pointer should not be used anymore" assertion.
DeltaFile
+16-0mlir/test/Dialect/LLVMIR/mem2reg.mlir
+1-0mlir/lib/Transforms/Mem2Reg.cpp
+17-02 files

LLVM/project 303afa0lldb/source/Plugins/SymbolFile/DWARF DWARFASTParserClang.cpp DWARFASTParserClang.h, lldb/test/API/lang/cpp/non-type-template-param-member-ptr main.cpp TestCppNonTypeTemplateParamPtrToMember.py

[lldb][DWARFASTParserClang] Handle pointer-to-member-data non-type template (#187598)

## Description

### Problem
MakeAPValue in DWARFASTParserClang.cpp did not handle
pointer-to-member-data non-type template parameters (e.g., template <int
S::*P>), causing LLDB to produce incorrect results or crash.

DWARF encodes pointer-to-member-data NTTPs as
`DW_TAG_template_value_parameter` with a `DW_AT_const_value`
representing the byte offset of the member within the containing struct.
MakeAPValue is responsible for converting this value into a clang
APValue, but it only handled integer/enum and floating-point types. For
pointer-to-member types, it returned `std::nullopt`.

This caused the caller (ParseTemplateDIE) to fall back to creating a
type-only TemplateArgument (kind=Type) instead of a value-carrying one.
When two specializations differ only by which member they point to

    [209 lines not shown]
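
For reference, the kind of template at issue looks like the sketch below. `S` and `P` follow the names in the description, while `Reader` and the offset constants are hypothetical and are not taken from the LLDB test added by this commit.

```cpp
#include <cstddef>
#include <type_traits>

struct S {
  int x;
  int y;
};

// A non-type template parameter that is a pointer to a data member of S.
template <int S::*P>
struct Reader {
  static int read(const S &s) { return s.*P; }
};

// Two specializations that differ only in which member they point to are
// distinct types, so a debugger must keep the parameter's value to tell
// them apart.
static_assert(!std::is_same<Reader<&S::x>, Reader<&S::y>>::value,
              "specializations on different members are different types");

// Byte offsets of the members within S, the kind of value the description
// says DW_AT_const_value carries for such parameters.
constexpr std::size_t kOffsetX = offsetof(S, x);
constexpr std::size_t kOffsetY = offsetof(S, y);
```

`Reader<&S::x>::read` and `Reader<&S::y>::read` return different members of the same object. If the parameter is modelled as type-only, the two specializations become indistinguishable.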
DeltaFile
+94-11lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+16-0lldb/test/API/lang/cpp/non-type-template-param-member-ptr/main.cpp
+14-0lldb/test/API/lang/cpp/non-type-template-param-member-ptr/TestCppNonTypeTemplateParamPtrToMember.py
+7-0lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.h
+3-0lldb/test/API/lang/cpp/non-type-template-param-member-ptr/Makefile
+134-115 files

LLVM/project 809e412llvm/test/CodeGen/AMDGPU memintrinsic-unroll.ll fptosi-sat-vector.ll, llvm/test/CodeGen/X86 vector-interleaved-load-i64-stride-7.ll vector-interleaved-load-i8-stride-8.ll

Merge branch 'fix-blockfreq-unroll-unconditional-latches--fast' into fix-blockfreq-unroll-unconditional-latches--uniform
DeltaFile
+6,835-6,798llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+5,208-5,214llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll
+3,046-3,042llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll
+4,674-713llvm/test/CodeGen/AMDGPU/fptosi-sat-vector.ll
+4,293-678llvm/test/CodeGen/AMDGPU/fptoui-sat-vector.ll
+4,523-0llvm/test/tools/llvm-mca/RISCV/SiFiveX100/rvv/arithmetic.test
+28,579-16,4453,996 files not shown
+217,312-84,1304,002 files