LLVM/project c85457flldb/source/Plugins/ABI/ARM ABISysV_arm.cpp, lldb/source/Plugins/ABI/X86 ABISysV_x86_64.cpp ABIWindows_x86_64.cpp

[lldb][TypeSystemClang] Remove mostly unused is_complex output parameter to IsFloatingPointType (#178906)

Depends on:
* https://github.com/llvm/llvm-project/pull/178904

(only last commit is relevant for the review)

This is part of a patch series to clean up the
TypeSystemClang::IsFloatingPointType API. The `is_complex` parameter is
rarely checked. This patch introduces a `CompilerType::IsComplexType`
API which callers that previously checked `is_complex` can use instead.

This will also allow us to remove `CompilerType::IsFloat`, which is just
`IsFloatingPointType` that ignores the `is_complex` parameter.
DeltaFile
+12-11lldb/source/Symbol/CompilerType.cpp
+10-12lldb/source/Plugins/ABI/ARM/ABISysV_arm.cpp
+5-14lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
+15-0lldb/unittests/Symbol/TestTypeSystemClang.cpp
+2-4lldb/source/Plugins/ABI/X86/ABISysV_x86_64.cpp
+2-4lldb/source/Plugins/ABI/X86/ABIWindows_x86_64.cpp
+46-455 files not shown
+54-5511 files
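
A minimal sketch of the API shape change described above, using a toy `Kind` enum instead of lldb's real type classes. Every name here except `IsFloatingPointType` and `IsComplexType` is hypothetical. The sketch also assumes that the new `IsFloatingPointType` still returns true for complex types; the commit message does not confirm this.

```cpp
#include <cassert>

// Hypothetical stand-in for a type system's type kinds; not lldb's real types.
enum class Kind { Int, Float, Double, ComplexFloat, Pointer };

// Old shape: one query with an output parameter that most callers ignore.
bool IsFloatingPointTypeOld(Kind k, bool &is_complex) {
  is_complex = (k == Kind::ComplexFloat);
  return k == Kind::Float || k == Kind::Double || is_complex;
}

// New shape: two independent queries, no output parameter.
bool IsComplexType(Kind k) { return k == Kind::ComplexFloat; }

// Assumption of this sketch: complex types still count as floating point.
bool IsFloatingPointType(Kind k) {
  return k == Kind::Float || k == Kind::Double || IsComplexType(k);
}

// A caller that used to read is_complex now asks the second question itself.
bool UsesScalarFloatRegister(Kind k) {
  return IsFloatingPointType(k) && !IsComplexType(k);
}

// Sanity helper: the split queries agree with the old combined query.
void CheckAgreesWithOld(Kind k) {
  bool is_complex = false;
  bool is_float = IsFloatingPointTypeOld(k, is_complex);
  assert(is_float == IsFloatingPointType(k));
  assert(is_complex == IsComplexType(k));
}
```

Callers that never looked at `is_complex` simply drop the dummy variable; only the few that did need the extra `IsComplexType` call.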

LLVM/project 871c643llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Attributor: Add -light options to -attributor-enable flag (#179346)

Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.

Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.

I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
DeltaFile
+24-0llvm/test/Other/opt-pipeline-attributor-enable.ll
+12-2llvm/lib/Passes/PassBuilderPipelines.cpp
+5-1llvm/include/llvm/Transforms/IPO/Attributor.h
+41-33 files

LLVM/project da43386clang/test/CIR/CodeGenCUDA filter-decl.cu

fix nit test case
DeltaFile
+1-1clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1-11 files

LLVM/project ac0327aclang/lib/CIR/CodeGen CIRGenModule.cpp

le format monsieur
DeltaFile
+3-4clang/lib/CIR/CodeGen/CIRGenModule.cpp
+3-41 files

LLVM/project b204f16clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

fix lit includes yet again
DeltaFile
+5-5clang/test/CIR/CodeGenCUDA/filter-decl.cu
+2-2clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+7-72 files

LLVM/project bb7054cclang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

nit: fix lit includes
DeltaFile
+4-4clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1-1clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+5-52 files

LLVM/project 503be9fclang/lib/CIR/CodeGen CIRGenModule.cpp

fmt yo
DeltaFile
+1-1clang/lib/CIR/CodeGen/CIRGenModule.cpp
+1-11 files

LLVM/project d9fbd4aclang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGenCUDA filter-decl.cu

address comments
DeltaFile
+32-14clang/test/CIR/CodeGenCUDA/filter-decl.cu
+2-9clang/lib/CIR/CodeGen/CIRGenModule.cpp
+34-232 files

LLVM/project fed5831clang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

[CIR][CUDA] Add NVPTX target info and CUDA/HIP global emission filtering
DeltaFile
+66-0clang/lib/CIR/CodeGen/CIRGenModule.cpp
+37-0clang/test/CIR/CodeGenCUDA/filter-decl.cu
+30-0clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+19-0clang/lib/CIR/CodeGen/TargetInfo.cpp
+4-0clang/lib/CIR/CodeGen/CIRGenModule.h
+2-0clang/lib/CIR/CodeGen/TargetInfo.h
+158-06 files

LLVM/project 5121780llvm/lib/Target/AArch64 AArch64SVEInstrInfo.td AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 intrinsic-cttz-elts-sve.ll

[AArch64] Fix cttz.elts codegen for fixed-length vectors (#178902)

When lowering cttz.elts for fixed-length vectors when SVE is available,
we use scalable container types for the predicate types since NEON
doesn't have dedicated predicate registers. Unfortunately, this also
discards the actual length of the vector when it is shorter than a
full vector.
Example codegen for llvm.experimental.cttz.elts.i64.v4i1:

  shl v0.4h, v0.4h, 15
  ptrue p0.h, vl4
  ptrue p1.h
  cmpne p0.h, p0/z, z0.h, #0
  brkb p0.b, p1/z, p0.b
  cntp x8, p0, p0.h

The 'ptrue p1.h' is where we went wrong -- if p0 is empty, we should
only set 4 lanes active at most, but since brkb's pg operand is all
active, it sets all available lanes (e.g. 8 .h lanes on a 128b SVE

    [6 lines not shown]
DeltaFile
+36-54llvm/test/CodeGen/AArch64/intrinsic-cttz-elts-sve.ll
+28-28llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+6-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+2-2llvm/lib/Target/AArch64/AArch64InstrInfo.td
+72-854 files
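
The miscount described above can be reproduced with a small lane-level model of the quoted instruction sequence. This is only a sketch: the helper names are hypothetical, predicates are modelled per `.h` lane rather than per byte, and the `shl` step is folded into a plain nonzero check. The commit message is truncated in this listing, so the sketch does not claim to show the patch's actual fix; it only illustrates that governing `brkb` with the `vl4` predicate bounds the count at 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // .h lanes in a 128-bit SVE register
using Pred = std::array<bool, kLanes>;
using Vec = std::array<int, kLanes>;

// ptrue pD.h, vl<n>: the first n lanes are active.
Pred PTrue(std::size_t n) {
  assert(n <= kLanes);
  Pred p{};
  for (std::size_t i = 0; i < n; ++i) p[i] = true;
  return p;
}

// cmpne pD.h, pg/z, z.h, #0: active lanes whose element is nonzero.
Pred CmpNeZero(const Pred &pg, const Vec &z) {
  Pred out{};
  for (std::size_t i = 0; i < kLanes; ++i) out[i] = pg[i] && z[i] != 0;
  return out;
}

// brkb pD, pg/z, pN: active lanes before the first active true lane of pN
// become true; inactive lanes are zeroed.
Pred BrkB(const Pred &pg, const Pred &pn) {
  Pred out{};
  bool seen_break = false;
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (!pg[i]) continue;  // inactive lanes stay false
    if (pn[i]) seen_break = true;
    out[i] = !seen_break;
  }
  return out;
}

// cntp xD, pg, pN.h: number of lanes that are true in both predicates.
std::size_t CntP(const Pred &pg, const Pred &pn) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < kLanes; ++i) n += (pg[i] && pn[i]) ? 1 : 0;
  return n;
}

// cttz.elts on a 4-element vector held in an 8-lane container.
// brkb_pg is the governing predicate handed to brkb.
std::size_t CttzElts4(const Vec &z, const Pred &brkb_pg) {
  Pred p0 = CmpNeZero(PTrue(4), z);
  p0 = BrkB(brkb_pg, p0);
  return CntP(p0, p0);
}
```

With an all-zero 4-element input, `CttzElts4(z, PTrue(8))` models the all-active `ptrue p1.h` and counts every container lane, while `CttzElts4(z, PTrue(4))` stops at the real vector length.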

LLVM/project 2d12279clang/lib/CIR/CodeGen CIRGenCUDANV.cpp CIRGenCUDARuntime.h, clang/test/CIR/CodeGenCUDA kernel-call.cu kernel-stub-name.cu

[CIR][CUDA] Upstream device stub body emission and name mangling (#177790)

Part of #175871

This patch adds the initial implementation of the CUDA/NV runtime
code that generates the device stub body. Tested on CUDA. HIP coverage
to be added in a later PR.
DeltaFile
+341-0clang/lib/CIR/CodeGen/CIRGenCUDANV.cpp
+80-0clang/test/CIR/CodeGenCUDA/Inputs/cuda.h
+50-0clang/lib/CIR/CodeGen/CIRGenCUDARuntime.h
+50-0clang/test/CIR/CodeGenCUDA/kernel-call.cu
+27-2clang/lib/CIR/CodeGen/CIRGenModule.cpp
+22-0clang/test/CIR/CodeGenCUDA/kernel-stub-name.cu
+570-26 files not shown
+618-312 files

LLVM/project e706614llvm/test/Analysis/CostModel/AArch64 insert-extract.ll

[Analysis][CostModel] Add insert-extract runlines for Apple CPUs (NFC) (#179236)

Including `apple-latest` to cover new processors until (if) they
diverge.
DeltaFile
+11-11llvm/test/Analysis/CostModel/AArch64/insert-extract.ll
+11-111 files

LLVM/project c7cd15flibcxx/test/benchmarks/format formatter_float.bench.cpp

[libc++] Refactor formatter_float.bench.cpp and drop some benchmarks (#178886)

`formatter_float.bench.cpp` currently benchmarks the floating point
formatting very extensively. This patch reduces the number of benchmarks
by removing some of the cases that are relatively meaningless.
The benchmark is also converted to the more recent style of benchmarks.
As a nice side-effect, this reduces the time it takes to compile the
benchmark by ~20x.

We may be able to drop more benchmarks, but I'm not an expert here, so
I'm being rather conservative.
DeltaFile
+311-229libcxx/test/benchmarks/format/formatter_float.bench.cpp
+311-2291 files

LLVM/project 6d15d1cllvm CMakeLists.txt

[CMake] Update "all" project/runtimes (#179270)

Move compiler-rt from "all" projects to "all" runtimes and add "openmp"
to "all" runtimes, as it was recently removed from "all" projects.
DeltaFile
+5-6llvm/CMakeLists.txt
+5-61 files

LLVM/project f74d072llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Rename all to full
DeltaFile
+3-3llvm/test/Other/opt-pipeline-attributor-enable.ll
+2-2llvm/lib/Passes/PassBuilderPipelines.cpp
+1-1llvm/include/llvm/Transforms/IPO/Attributor.h
+6-63 files

LLVM/project d752536llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Attributor: Add -light options to -attributor-enable flag

Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.

Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.

I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
DeltaFile
+24-0llvm/test/Other/opt-pipeline-attributor-enable.ll
+10-0llvm/lib/Passes/PassBuilderPipelines.cpp
+5-1llvm/include/llvm/Transforms/IPO/Attributor.h
+39-13 files

LLVM/project c9ab97fflang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC acc-cache.f90

[flang][acc] Fix cache directive with mapped component (#179335)

When a derived type component is mapped via a data clause (e.g.,
`copyin(data%A(...))`), the base address inside the parallel region
comes from an `hlfir.declare` op (for the mapped address) instead of
an `hlfir.designate` op. Use `FortranVariableOpInterface` to extract
shape/typeparams/attrs, which works for both cases since both ops
implement this interface.
DeltaFile
+104-0flang/test/Lower/OpenACC/acc-cache.f90
+12-6flang/lib/Lower/OpenACC.cpp
+116-62 files

LLVM/project 9745669flang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC acc-no-create-array-section.f90

[flang][acc] remap no_create array sections (#178660)

The workaround for no_create with array sections is no longer needed:
the runtime is expected to make sure the fir.box for the variable is
always readable on the device, even when the variable is not present.
DeltaFile
+21-0flang/test/Lower/OpenACC/acc-no-create-array-section.f90
+1-8flang/lib/Lower/OpenACC.cpp
+22-82 files

LLVM/project 8124ee5mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-todo.mlir

[OpenMP][MLIR] Add num_threads mlir->llvm lowering
DeltaFile
+40-26mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+3-3mlir/test/Target/LLVMIR/openmp-todo.mlir
+43-292 files

LLVM/project 38e280dllvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/LoongArch/lsx issue177155.ll

[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)

When creating new nodes with illegal types after type legalization, we
should try to use promoted type to avoid creating nodes with illegal
types.

Fixes: https://github.com/llvm/llvm-project/issues/177155
DeltaFile
+26-0llvm/test/CodeGen/LoongArch/lsx/issue177155.ll
+7-0llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+33-02 files

LLVM/project f95b63bllvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, mlir/test/Target/LLVMIR openmp-simd-guided.mlir

[llvm][OpenMP] Allow Chunk Size on SIMD Guided (#178853)

As per the OpenMP Spec, Chunk Size is allowed when using the guided
kind-type with the schedule clause. However, when being used in cases
such as `!$omp do simd schedule (simd:guided,4)`, this was not allowed
as the base type, BaseGuidedSimd, would hit an assert not allowing
ChunkSizes.

By making this change, we can allow the use of the Guided type, with a
ChunkSize and the schedule clause when using OMPIRBuilder.

Fixes #82106
DeltaFile
+23-0mlir/test/Target/LLVMIR/openmp-simd-guided.mlir
+1-1llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+24-12 files

LLVM/project 6d994d8llvm/lib/MC MCObjectStreamer.cpp

[MC] Try to fix ubsan bot

Check that the size is non-zero to make sure we don't call
memcpy with null pointers. This is well-defined now, but ubsan
may still warn about it.

(cherry picked from commit d064f395af7ac226dec3f8e90516a26e96e2acf1)
DeltaFile
+2-1llvm/lib/MC/MCObjectStreamer.cpp
+2-11 files
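
The guard described above follows a common pattern, sketched here with a hypothetical `AppendBytes` helper. This is not the `MCObjectStreamer` code, only an illustration of skipping `memcpy` when the size is zero so that a null source pointer is never passed to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: append `size` bytes from `data` to `out`.
// An empty input may arrive as (nullptr, 0), so check the size first
// and never hand a null pointer to memcpy.
void AppendBytes(std::vector<char> &out, const char *data, std::size_t size) {
  if (size == 0)
    return;  // nothing to copy; data may be null here
  assert(data != nullptr);
  const std::size_t old_size = out.size();
  out.resize(old_size + size);
  std::memcpy(out.data() + old_size, data, size);
}
```

Calling `AppendBytes(buf, nullptr, 0)` leaves `buf` untouched and never reaches `memcpy`.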

LLVM/project 1655d51clang/include/clang/Options Options.td, clang/lib/Basic/Targets X86.cpp

[X86][APX] Disable PP2/PPX generation on Windows (#178122)

The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.

This PR disables this support by default for code robustness;
workloads that choose to explicitly enable this support can change the
default behavior by explicitly specifying the flag options that enable
this support e.g. for experimentation or code paths that do not need
unwinder support.

(cherry picked from commit 2f3935bcee6eaf7df8c85a21b7c0fbef967316b5)
DeltaFile
+25-5clang/lib/Driver/ToolChains/Arch/X86.cpp
+6-5clang/test/Driver/x86-target-features.c
+6-5clang/test/Driver/cl-x86-flags.c
+8-2clang/lib/Basic/Targets/X86.cpp
+2-6clang/include/clang/Options/Options.td
+4-0llvm/lib/TargetParser/Host.cpp
+51-232 files not shown
+56-238 files

LLVM/project 25b8d52lldb/source/API SBBreakpointName.cpp, lldb/test/API/functionalities/breakpoint/breakpoint_names TestBreakpointNames.py

[lldb] Fix SBBreakpointName::SetEnabled to propagate changes to breakpoints (#178734)

When setting the enabled state of a breakpoint name via the API, the
change was not being propagated to breakpoints using that name.
This was inconsistent with the CLI behaviour where `breakpoint name
configure --enable/--disable` correctly updates all associated
breakpoints.

(cherry picked from commit 8370304f1e5878c1860223239932ddd05d9ba4c8)
DeltaFile
+66-2lldb/test/API/functionalities/breakpoint/breakpoint_names/TestBreakpointNames.py
+1-0lldb/source/API/SBBreakpointName.cpp
+67-22 files

LLVM/project 189c8e4mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-todo.mlir openmp-target-launch-host.mlir

[OpenMP][MLIR] Add num_teams mlir to llvm lowering
DeltaFile
+80-34mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+18-2mlir/test/Target/LLVMIR/openmp-todo.mlir
+3-3mlir/test/Target/LLVMIR/openmp-target-launch-host.mlir
+101-393 files

LLVM/project 81f8bdallvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp, llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)

We can, when attempting to lower to tbz, skip a zext that is then not
accounted for elsewhere. The attached test ends up with a tbz from an
extract that then does not properly zext the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.

Fixes #173895

(cherry picked from commit 0321f3eeee5cceddc2541046ee155863f5f59585)
DeltaFile
+7-7llvm/test/CodeGen/AArch64/GlobalISel/widen-narrow-tbz-tbnz.mir
+5-4llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+4-3llvm/test/CodeGen/AArch64/GlobalISel/opt-fold-xor-tbz-tbnz.mir
+5-1llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+21-154 files
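
The reasoning behind "only look through a G_ZEXT if the bit checked is in the low part of the value" can be sketched with plain integers. The function names below are hypothetical and this is not the `getTestBitReg` implementation; it only assumes that the narrow source value sits in a wider register whose upper bits may hold unrelated data.

```cpp
#include <cassert>
#include <cstdint>

// Test a single bit of a 64-bit register value, as tbz/tbnz would.
bool TestBit(std::uint64_t reg, unsigned bit) {
  assert(bit < 64);
  return ((reg >> bit) & 1) != 0;
}

// Zero-extend the low src_bits of reg: everything above them becomes 0.
std::uint64_t ZExt(std::uint64_t reg, unsigned src_bits) {
  assert(src_bits > 0 && src_bits < 64);
  return reg & ((std::uint64_t{1} << src_bits) - 1);
}

// Testing `bit` of zext(src) matches testing `bit` of src itself only when
// the bit lies inside the source width; above it, the zext forces a 0 that
// the unextended register does not guarantee.
bool CanLookThroughZExt(unsigned bit, unsigned src_bits) {
  return bit < src_bits;
}
```

For an 8-bit value held in a register with set upper bits, testing bit 0 gives the same answer with or without the extension, but testing bit 9 does not, which is the case the patch stops folding.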

LLVM/project 1e55b98llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64] Update aarch64-tbz.ll test. NFC

(cherry picked from commit 8302e8ae6694978806f94aca81cd31258db66169)
DeltaFile
+179-25llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+179-251 files

LLVM/project 61203aellvm/test/CodeGen/AMDGPU fneg-combines.f16.ll fneg-combines.ll, llvm/test/CodeGen/RISCV fpclamptosat.ll

Merge branch 'main' into users/zhaoqi5/promote-type-afterlegalizetypes
DeltaFile
+56,025-0llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+14,154-5,110llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+850-5,393llvm/test/CodeGen/RISCV/fpclamptosat.ll
+2,230-3,501llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+2,626-2,303llvm/test/CodeGen/AMDGPU/fneg-combines.ll
+4,716-0llvm/test/MC/AMDGPU/gfx13_asm_sop2.s
+80,601-16,3071,738 files not shown
+167,210-45,6751,744 files

LLVM/project 6e0577fllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-intrinsics.ll

[X86] getScalarMaskingNode - FIXUPIMM scalar ops take upper elements from second operand (#179101)

FIXUPIMMSS/SD instructions pass through the upper elements of the SECOND operand, not the first like most (2-op) instructions.

Fixes #179057

(cherry picked from commit 49d2323447aec77c3d1ae8c941f3f8a126ff1480)
DeltaFile
+6-4llvm/test/CodeGen/X86/avx512-intrinsics.ll
+6-4llvm/lib/Target/X86/X86ISelLowering.cpp
+12-82 files

LLVM/project 0e8db60llvm/test/CodeGen/X86 avx512-intrinsics.ll

[X86] Add test coverage for #179057 (#179092)

Incorrect folding of fixupimm scalar intrinsics passthrough when the
mask is known zero

(cherry picked from commit 618d71dc98df760d0c724cff6fa69b780e8c0372)
DeltaFile
+36-0llvm/test/CodeGen/X86/avx512-intrinsics.ll
+36-01 files