LLVM/project e1ad646llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll memintrinsic-unroll.ll, llvm/test/tools/llvm-dwarfdump/X86 simplified-template-names.s

Merge branch 'main' into users/kparzysz/flang-test-fix
DeltaFile
+6,475-9,691llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+7,387-7,087llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+6,665-6,661llvm/test/tools/llvm-ir2vec/output/reference_x86_entities.txt
+5,202-5,039llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+4,325-0llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir
+768-2,280llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+30,822-30,7581,080 files not shown
+73,945-62,1301,086 files

LLVM/project aa3f930libc/src/__support/math atanf_float.h atanf.h, libc/test/src/math atanf_test.cpp

[libc][math] Add float-only implementation for atanf. (#167004)

Algorithm:
```
  1)  atan(x) = sign(x) * atan(|x|)

  2)  If |x| > 1 + 1/32, atan(|x|) = pi/2 - atan(1/|x|)

  3)  For 1/16 < |x| < 1 + 1/32, we find k such that: | |x| - k/16 | <= 1/32.
      Let y = |x| - k/16, then using the angle summation formula, we have:
    atan(|x|) = atan(k/16) + atan( (|x| - k/16) / (1 + |x| * k/16) )
              = atan(k/16) + atan( y / (1 + (y + k/16) * k/16) )
              = atan(k/16) + atan( y / ((1 + k^2/256) + y * k/16) )

  4)  Let u = y / (1 + k^2/256), then we can rewrite the above as:
    atan(|x|) = atan(k/16) + atan( u / (1 + u * k/16) )
              ~ atan(k/16) + (u - k/16 * u^2 + (k^2/256 - 1/3) * u^3
                              + (k/16 - (k/16)^3) * u^4) + O(u^5)
```

    [2 lines not shown]
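
The four steps above can be sketched in plain C++ as follows. This is only an illustration of the range reduction, not the code added in `atanf_float.h`: the name `atanf_sketch` is hypothetical, and `std::atan(k/16)` stands in for whatever precomputed table the real implementation may use.

```cpp
#include <cassert>
#include <cmath>

// Illustrative sketch of steps 1-4; not the libc implementation.
float atanf_sketch(float x) {
  if (std::isnan(x))
    return x;
  // Step 1: atan(x) = sign(x) * atan(|x|).
  const float sign = std::signbit(x) ? -1.0f : 1.0f;
  float ax = std::fabs(x);
  // Step 2: for |x| > 1 + 1/32, use atan(|x|) = pi/2 - atan(1/|x|).
  const bool reciprocal = ax > 1.0f + 1.0f / 32.0f;
  if (reciprocal)
    ax = 1.0f / ax; // +inf maps to 0
  // Step 3: choose k so that | |x| - k/16 | <= 1/32 (k = 0 for tiny inputs).
  const int k = static_cast<int>(std::nearbyint(ax * 16.0f));
  assert(k >= 0 && k <= 16);
  const float c = static_cast<float>(k) / 16.0f;
  const float y = ax - c;
  // Step 4: u = y / (1 + k^2/256), then the degree-4 polynomial in u,
  // evaluated in Horner form.
  const float u = y / (1.0f + c * c);
  const float poly =
      u * (1.0f +
           u * (-c + u * ((c * c - 1.0f / 3.0f) + u * (c - c * c * c))));
  // Assumption: std::atan(c) replaces a lookup table of atan(k/16).
  float r = std::atan(c) + poly;
  if (reciprocal)
    r = 1.57079632679489662f - r;
  return sign * r;
}
```

Since |u| <= 1/32, the dropped O(u^5) term is on the order of 1e-8, so the sketch should agree with `std::atan` to roughly float precision; it makes no claim about the rounding guarantees of the actual patch.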
DeltaFile
+168-0libc/src/__support/math/atanf_float.h
+41-0libc/test/src/math/exhaustive/atanf_float_test.cpp
+15-0libc/test/src/math/exhaustive/CMakeLists.txt
+10-3libc/test/src/math/atanf_test.cpp
+10-0libc/src/__support/math/atanf.h
+1-0libc/src/__support/math/CMakeLists.txt
+245-36 files

LLVM/project a74bfc0mlir/lib/Dialect/Tosa/IR TosaCanonicalizations.cpp, mlir/test/Dialect/Tosa canonicalize.mlir

[mlir][tosa] Fix select folder when operands are broadcast (#165481)

This commit addresses a crash in the dialect's folder. The current
folder assumes no broadcasting of the input operand happens and
therefore the folder can complain that the returned value was not the
same shape as the result.

For now, this commit ensures no folding happens when broadcasting is
involved. In the future, folding with a broadcast could likely be
supported by inserting a `tosa.tile` operation before returning the
operand. This type of transformation is likely better suited for a
canonicalization pass. This commit only aims to avoid the crash.
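
A minimal standalone sketch of the guard described above, assuming shapes are plain integer vectors of equal rank. The names `Shape`, `broadcastShape`, and `canFoldToOperand` are hypothetical and do not come from `TosaCanonicalizations.cpp`; the point is only that an operand may be returned from the folder when its shape already equals the result shape.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Shape = std::vector<std::int64_t>;

// Broadcast two equal-rank shapes: each dimension pair must match or
// contain a 1. Returns std::nullopt when the shapes are incompatible.
std::optional<Shape> broadcastShape(const Shape &a, const Shape &b) {
  if (a.size() != b.size())
    return std::nullopt;
  Shape out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] == b[i] || b[i] == 1)
      out[i] = a[i];
    else if (a[i] == 1)
      out[i] = b[i];
    else
      return std::nullopt;
  }
  return out;
}

// Folding select(pred, x, y) to one operand is only safe when that operand
// already has the result shape; otherwise it was broadcast and must be kept.
bool canFoldToOperand(const Shape &operand, const Shape &result) {
  return operand == result;
}
```

For example, an operand of shape `{1, 4}` selected against one of shape `{3, 4}` yields a `{3, 4}` result, so the `{1, 4}` operand cannot be returned directly.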
DeltaFile
+42-0mlir/test/Dialect/Tosa/canonicalize.mlir
+17-0mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp
+59-02 files

LLVM/project 364fe55flang/lib/Lower Bridge.cpp, flang/test/Lower volatile3.f90 volatile4.f90

[flang] simplify pointer assignments (#168732)

Pointer assignment lowering was done in different ways depending on
contexts and types, sometimes still using runtime calls when this is not
needed and the complexity of doing this inline is very limited (the
pointer and target descriptors were already prepared inline, the runtime
is just doing the descriptor assignment and ensuring the pointer
descriptor keeps its pointer flag).

Slightly extend the inline version that was used for Forall and use it
for all cases.
When lowering without HLFIR is removed, this will allow removing more
code.
DeltaFile
+165-165flang/test/Lower/volatile3.f90
+55-67flang/test/Lower/volatile4.f90
+59-54flang/lib/Lower/Bridge.cpp
+69-0flang/test/Lower/pointer-disassociate-character.f90
+26-24flang/test/Lower/HLFIR/allocatable-and-pointer-status-change.f90
+6-5flang/test/Lower/HLFIR/issue80884.f90
+380-3154 files not shown
+395-32910 files

LLVM/project a9a14d6flang-rt/lib/runtime type-code.cpp, flang-rt/unittests/Runtime TypeCode.cpp CMakeLists.txt

[flang-rt] Fix TypeCategory for quad-precision COMPLEX (#168090)

Modify the TypeCategory for quad-precision COMPLEX to
CFI_type_float128_Complex so it matches the TypeCode returned
by SELECT TYPE lowering.

Fixes #134565
DeltaFile
+43-0flang-rt/unittests/Runtime/TypeCode.cpp
+1-1flang-rt/lib/runtime/type-code.cpp
+1-0flang-rt/unittests/Runtime/CMakeLists.txt
+45-13 files

LLVM/project 0e8222bflang/include/flang/Optimizer/Dialect/FIRCG CGOps.td, flang/lib/Optimizer/CodeGen PreCGRewrite.cpp

[flang][debug] Make common blocks data extraction more robust. (#168752)

Our current implementation for extracting information about common blocks
required traversal of FIR which was not ideal but previously there was
no other way to obtain that information. The `[hl]fir.declare` was
extended in commit https://github.com/llvm/llvm-project/pull/155325 to
include storage and storage_offset. This commit adds these operands in
`fircg.ext_declare` and then uses them in `AddDebugInfoPass` to create
debug data for common blocks.
DeltaFile
+69-51flang/lib/Optimizer/Transforms/AddDebugInfo.cpp
+18-18flang/test/Transforms/debug-common-block.fir
+20-0flang/test/Fir/declare-codegen.fir
+20-0flang/test/Integration/debug-module-equivalence.f90
+9-1flang/include/flang/Optimizer/Dialect/FIRCG/CGOps.td
+1-0flang/lib/Optimizer/CodeGen/PreCGRewrite.cpp
+137-706 files

LLVM/project 5d0bfd1mlir/lib/Conversion/SCFToGPU SCFToGPU.cpp, mlir/test/Conversion/SCFToGPU parallel_loop.mlir

[MLIR][SCFToGPU] Guard operands before AffineApplyOp::create to avoid crash (#167959)

This fixes a crash in SCF→GPU when building the per‑dim index for mapped
scf.parallel.

**Change**:
- Map step/lb through cloningMap, then run ensureLaunchIndependent.
- If either is still unavailable at launch scope, emit a match‑failure;
otherwise build the affine.apply.

**Why this is correct:**
- Matches how the pass already handles launch bounds; avoids creating an
op with invalid operands and replaces a segfault with a clear
diagnostic.

**Tests**:
- Added two small regressions that lower to gpu.launch and exercise the
affine.apply path.


    [2 lines not shown]
DeltaFile
+48-0mlir/test/Conversion/SCFToGPU/parallel_loop.mlir
+16-2mlir/lib/Conversion/SCFToGPU/SCFToGPU.cpp
+64-22 files

LLVM/project 4bb4ad4llvm/lib/Target/AArch64 AArch64MachineFunctionInfo.cpp AArch64MachineFunctionInfo.h

[AArch64][PAC] Use enum to describe LR signing condition (NFC) (#168548)

Express the condition of signing the return address in a function using
an `enum class` instead of a pair of `bool`s. Define `enum class
SignReturnAddress` with the values corresponding to the three possible
modes that can be requested via "sign-return-address" function
attribute.

Previously, there were two overloads of `shouldSignReturnAddress`
accepting either `const MachineFunction &` or `bool` argument. Due to
pointer-to-bool conversion, when `shouldSignReturnAddress` was
incorrectly called with `const MachineFunction *` argument, the latter
overload was used instead of reporting a compile-time error.
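
The overload pitfall can be reproduced in a few lines. The sketch below uses a hypothetical `FunctionStub` in place of `MachineFunction`, and the enumerator names are assumptions modeled on the "none", "non-leaf", and "all" values of the attribute; it is not the code from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for MachineFunction.
struct FunctionStub {
  bool SignsLR;
};

// Old style: one overload takes the function, the other a bool.
bool shouldSignOld(const FunctionStub &F) { return F.SignsLR; }
bool shouldSignOld(bool SpillsLR) { return SpillsLR; }

// Passing a pointer compiles silently: the pointer converts to bool, so the
// bool overload is chosen and the result is true for any non-null pointer.
bool callOldWithPointer(const FunctionStub *F) { return shouldSignOld(F); }

// New style: the three modes are spelled out, and a pointer no longer
// converts implicitly to the parameter type.
enum class SignReturnAddress { None, NonLeaf, All };

bool shouldSignNew(SignReturnAddress Mode, bool SpillsLR) {
  switch (Mode) {
  case SignReturnAddress::None:
    return false;
  case SignReturnAddress::NonLeaf:
    return SpillsLR;
  case SignReturnAddress::All:
    return true;
  }
  assert(false && "unhandled SignReturnAddress value");
  return false;
}
```

With `FunctionStub F{false}`, `shouldSignOld(F)` is false but `callOldWithPointer(&F)` is true, which is the silent misbehavior the enum-based interface rules out at compile time.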
DeltaFile
+23-21llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.cpp
+18-8llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
+10-7llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+1-2llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+52-384 files

LLVM/project b1c5e6eopenmp/runtime/src/include omp_lib.F90.var

fix omp_lib.F90.var and add F90 test
DeltaFile
+2-0openmp/runtime/src/include/omp_lib.F90.var
+2-01 files

LLVM/project e085082mlir/lib/Dialect/MemRef/Utils MemRefUtils.cpp, mlir/test/Dialect/MemRef transform-ops.mlir

[mlir][memref] Generalize dead store detection to all view-like ops (#168507)

The dead alloc elimination pass previously considered only subviews when
checking for dead stores. This change generalizes the logic to support
all view-like operations, ensuring broader coverage.
DeltaFile
+67-0mlir/test/Dialect/MemRef/transform-ops.mlir
+2-2mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp
+69-22 files

LLVM/project 4544ff6openmp/runtime/src CMakeLists.txt

[OpenMP][AIX] Not to create symbolic links to libomp.so in install step (NFC) (#168585)

Commit bb563b1 handles the links in the build directory but 
misses the case in the install step. This patch is to link only 
the libomp.a on AIX.
DeltaFile
+9-3openmp/runtime/src/CMakeLists.txt
+9-31 files

LLVM/project aeba7a8clang/docs ReleaseNotes.rst, clang/include/clang/Basic DiagnosticSemaKinds.td

[clang][diagnostics] added warning for possible enum compare typo (#168445)

Added diagnosis and fixit comment for possible accidental comparison
operator in an enum.

Closes: #168146
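
As an illustration of the kind of mistake this targets, the sketch below assumes the typo is a `<` written where `<<` was intended; the commit message only says "accidental comparison operator", so the exact patterns diagnosed are not confirmed here. The enum and its names are made up for the example.

```cpp
// Hypothetical flag enum with one mistyped enumerator.
enum Permissions {
  Read = 1 << 0,  // 1
  Write = 1 << 1, // 2
  Exec = 1 < 2    // comparison: evaluates to 1 (true), not 1 << 2 == 4
};

// The typo compiles cleanly and silently aliases another flag.
static_assert(Exec == Read, "Exec collides with Read because of the typo");
```

The code is valid C++ either way, which is why a dedicated warning is useful: the collision only shows up later as a flag that behaves like a different one.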
DeltaFile
+98-0clang/test/Sema/warn-enum-compare-typo.c
+55-0clang/lib/Sema/SemaDecl.cpp
+8-0clang/include/clang/Basic/DiagnosticSemaKinds.td
+4-0clang/docs/ReleaseNotes.rst
+165-04 files

LLVM/project eb37df5flang/lib/Lower/OpenMP OpenMP.cpp

Fix clang-format.
DeltaFile
+3-3flang/lib/Lower/OpenMP/OpenMP.cpp
+3-31 files

LLVM/project 859b874flang/test/Lower/OpenMP target.f90

Fix test.
DeltaFile
+1-3flang/test/Lower/OpenMP/target.f90
+1-31 files

LLVM/project 3f65b96flang/test/Lower/OpenMP target.f90 is-device-ptr-target-lowering.f90

move test to flang/test/Lower/OpenMP/target.f90.
DeltaFile
+32-0flang/test/Lower/OpenMP/target.f90
+0-30flang/test/Lower/OpenMP/is-device-ptr-target-lowering.f90
+32-302 files

LLVM/project 150d9b7. .git-blame-ignore-revs

[clang-tidy][NFC] Add clang-tidy formatting commit to `.git-blame-ignore-revs` (#167126)

Co-authored-by: Baranov Victor <bar.victor.2002 at gmail.com>
DeltaFile
+9-0.git-blame-ignore-revs
+9-01 files

LLVM/project cfda27dmlir/lib/Dialect/Vector/Transforms LowerVectorScan.cpp, mlir/test/Dialect/Vector vector-scan-transforms.mlir

[mlir][Vector] Add support for scalable vectors to `ScanToArithOps` (#123117)

Note, scalable reduction dims are left as a TODO.
DeltaFile
+93-1mlir/test/Dialect/Vector/vector-scan-transforms.mlir
+12-2mlir/lib/Dialect/Vector/Transforms/LowerVectorScan.cpp
+105-32 files

LLVM/project 2480ae5openmp/runtime/src/include omp_lib.h.var

fix omp_lib.h.var
DeltaFile
+2-2openmp/runtime/src/include/omp_lib.h.var
+2-21 files

LLVM/project 96e1c90llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel legalize-min-max-crash.mir

[AArch64][GlobalISel] Don't crash when legalising G_*MIN/G_*MAX of pointer vector
DeltaFile
+174-0llvm/test/CodeGen/AArch64/GlobalISel/legalize-min-max-crash.mir
+1-1llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+175-12 files

LLVM/project 57765c0flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Lower/OpenMP is-device-ptr-target-lowering.f90

[Flang][OpenMP] Add lowering support for is_device_ptr clause

Add support for OpenMP is_device_ptr clause for target directives.
DeltaFile
+76-3flang/lib/Lower/OpenMP/OpenMP.cpp
+30-0flang/test/Lower/OpenMP/is-device-ptr-target-lowering.f90
+106-32 files

LLVM/project 76f1949clang/lib/Headers/hlsl hlsl_intrinsics.h, clang/test/CodeGenHLSL/builtins fwidth.hlsl

[HLSL] Implement the `fwidth` intrinsic for DXIL and SPIR-V target (#161378)

Adds the fwidth intrinsic for HLSL.
The DXIL path only requires modification to the hlsl headers.
The SPIRV path implements the OpFwidth builtin in Clang and instruction
selection for the OpFwidth instruction in LLVM.
Also adds shader stage tests to the ddx_coarse and ddy_coarse
instructions used by fwidth.

Closes #99120

---------

Co-authored-by: Alexander Johnston <alexander.johnston at amd.com>
DeltaFile
+118-0clang/test/CodeGenHLSL/builtins/fwidth.hlsl
+47-0llvm/test/CodeGen/SPIRV/hlsl-intrinsics/fwidth.ll
+41-0clang/test/CodeGenSPIRV/Builtins/fwidth.c
+40-0clang/lib/Headers/hlsl/hlsl_intrinsics.h
+24-0clang/test/SemaSPIRV/BuiltIns/fwidth-errors.c
+11-12llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+281-128 files not shown
+365-1814 files

LLVM/project b40af54bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::addBTItoBBStart

This function contains most of the logic for BTI:
- it takes the BasicBlock and the instruction used to jump to it.
- then it checks if the first non-pseudo instruction is a sufficient
landing pad for the used call.
- if not, it generates the correct BTI instruction.

Also introduce the isBTIVariantCoveringCall helper to simplify the logic.
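
The check-then-fix logic described above might look roughly like the sketch below. Everything in it is hypothetical: `BTIVariant`, `BranchKind`, `coversBranch`, and `widen` are illustrative names rather than the MCPlusBuilder API, the field names borrow the `CallTarget`/`JumpTarget` terminology only for readability, and any register-specific BTI rules are ignored.

```cpp
// Which kinds of indirect branch a BTI landing pad accepts.
struct BTIVariant {
  bool CallTarget; // accepts call-style branches
  bool JumpTarget; // accepts jump-style branches
};

// How control reaches the basic block in this simplified model.
enum class BranchKind { Call, Jump };

// True when the existing landing pad is already sufficient for the branch.
bool coversBranch(BTIVariant V, BranchKind K) {
  return K == BranchKind::Call ? V.CallTarget : V.JumpTarget;
}

// Returns the variant the block start needs so that it also accepts K,
// keeping whatever the existing landing pad already allowed.
BTIVariant widen(BTIVariant Existing, BranchKind K) {
  if (coversBranch(Existing, K))
    return Existing;
  if (K == BranchKind::Call)
    Existing.CallTarget = true;
  else
    Existing.JumpTarget = true;
  return Existing;
}
```

A block that starts with a jump-only landing pad and is then reached by a call would be widened to accept both, while a block that is already covered is left unchanged.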
DeltaFile
+105-0bolt/unittests/Core/MCPlusBuilder.cpp
+75-0bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+13-0bolt/include/bolt/Core/MCPlusBuilder.h
+193-03 files

LLVM/project 905a5eabolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::updateBTIVariant

Checks if an instruction is BTI, and updates the immediate value to the
newly requested variant.
DeltaFile
+8-0bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+6-0bolt/include/bolt/Core/MCPlusBuilder.h
+6-0bolt/unittests/Core/MCPlusBuilder.cpp
+20-03 files

LLVM/project d054c47bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::isBTILandingPad

- takes both implicit and explicit BTIs into account
- fix related comment in AArch64BranchTargets.cpp
DeltaFile
+18-0bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+17-0bolt/unittests/Core/MCPlusBuilder.cpp
+14-0bolt/include/bolt/Core/MCPlusBuilder.h
+4-2llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
+53-24 files

LLVM/project c508aacbolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT] Fix naming

- CouldCall -> CallTarget
- CouldJump -> JumpTarget
DeltaFile
+3-3llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+2-2bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+1-1bolt/include/bolt/Core/MCPlusBuilder.h
+6-63 files

LLVM/project 21c4c15llvm/lib/Target/AArch64 AArch64ISelLowering.cpp SVEInstrFormats.td, llvm/test/CodeGen/AArch64 sve-bf16-combines.ll sve-bf16-arith.ll

[LLVM][CodeGen][SVE] Only use unpredicated bfloat instructions when all lanes are in use. (#168387)

While SVE support for exception safe floating point code generation is
bare bones we try to ensure inactive lanes remain inert. I mistakenly
broke this rule when adding support for SVE-B16B16 by lowering some
bfloat operations of unpacked vectors to unpredicated instructions.
DeltaFile
+10-26llvm/test/CodeGen/AArch64/sve-bf16-combines.ll
+12-6llvm/test/CodeGen/AArch64/sve-bf16-arith.ll
+3-3llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+0-2llvm/lib/Target/AArch64/SVEInstrFormats.td
+25-374 files

LLVM/project 9e86c0dmlir/lib/Dialect/Linalg/IR LinalgOps.cpp

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgOps.cpp (NFC)
DeltaFile
+1-1mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+1-11 files

LLVM/project c6a79a5mlir/lib/Target/LLVMIR/Dialect/LLVMIR LLVMToLLVMIRTranslation.cpp

[MLIR] Apply clang-tidy fixes for readability-identifier-naming in LLVMToLLVMIRTranslation.cpp (NFC)
DeltaFile
+5-5mlir/lib/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.cpp
+5-51 files

LLVM/project 3da82afmlir/lib/Dialect/SparseTensor/Transforms SparseBufferRewriting.cpp

[MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseBufferRewriting.cpp (NFC)
DeltaFile
+4-4mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
+4-41 files

LLVM/project 848f8bellvm/include/llvm/IR IntrinsicsAMDGPU.td, llvm/lib/Target/AMDGPU SIISelLowering.cpp SIInstructions.td

[AMDGPU] Add wave reduce intrinsics for float types - 2

Supported Ops: `fadd`, `fsub`
DeltaFile
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+1,001-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+44-3llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2-0llvm/lib/Target/AMDGPU/SIInstructions.td
+1-1llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+2-0llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+2,051-46 files