LLVM/project c7cd15f libcxx/test/benchmarks/format formatter_float.bench.cpp

[libc++] Refactor formatter_float.bench.cpp and drop some benchmarks (#178886)

`formatter_float.bench.cpp` currently benchmarks the floating point
formatting very extensively. This patch reduces the number of benchmarks
by removing some of the cases that are relatively meaningless.
The benchmark is also converted to the more recent style of benchmarks.
As a nice side-effect, this reduces the time it takes to compile the
benchmark by ~20x.

We may be able to drop more benchmarks, but I'm not an expert in this
area, so I am being conservative.
Delta  File
+311 -229  libcxx/test/benchmarks/format/formatter_float.bench.cpp
+311 -229  1 files

LLVM/project 6d15d1c llvm CMakeLists.txt

[CMake] Update "all" project/runtimes (#179270)

Move compiler-rt from "all" projects to "all" runtimes and add "openmp"
to "all" runtimes, as it was recently removed from "all" projects.
Delta  File
+5 -6  llvm/CMakeLists.txt
+5 -6  1 files

LLVM/project f74d072 llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Rename all to full
Delta  File
+3 -3  llvm/test/Other/opt-pipeline-attributor-enable.ll
+2 -2  llvm/lib/Passes/PassBuilderPipelines.cpp
+1 -1  llvm/include/llvm/Transforms/IPO/Attributor.h
+6 -6  3 files

LLVM/project d752536 llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Attributor: Add -light options to -attributor-enable flag

Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.

Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.

I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
Delta  File
+24 -0  llvm/test/Other/opt-pipeline-attributor-enable.ll
+10 -0  llvm/lib/Passes/PassBuilderPipelines.cpp
+5 -1  llvm/include/llvm/Transforms/IPO/Attributor.h
+39 -1  3 files

LLVM/project c9ab97f flang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC acc-cache.f90

[flang][acc] Fix cache directive with mapped component (#179335)

When a derived type component is mapped via a data clause (e.g.,
`copyin(data%A(...))`), the base address inside the parallel region
comes from an `hlfir.declare` op (for the mapped address) instead of
an `hlfir.designate` op. Use `FortranVariableOpInterface` to extract
shape/typeparams/attrs, which works for both cases since both ops
implement this interface.
Delta  File
+104 -0  flang/test/Lower/OpenACC/acc-cache.f90
+12 -6  flang/lib/Lower/OpenACC.cpp
+116 -6  2 files

LLVM/project 9745669 flang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC acc-no-create-array-section.f90

[flang][acc] remap no_create array sections (#178660)

The workaround for no_create with array sections is no longer needed:
the runtime is now expected to make sure the fir.box for the variable is
always readable on the device, even when the variable is not present.
Delta  File
+21 -0  flang/test/Lower/OpenACC/acc-no-create-array-section.f90
+1 -8  flang/lib/Lower/OpenACC.cpp
+22 -8  2 files

LLVM/project 8124ee5 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-todo.mlir

[OpenMP][MLIR] Add num_threads mlir->llvm lowering
Delta  File
+40 -26  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+3 -3  mlir/test/Target/LLVMIR/openmp-todo.mlir
+43 -29  2 files

LLVM/project 38e280d llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/LoongArch/lsx issue177155.ll

[SelectionDAG] Use promoted types when creating nodes after type legalization (#178617)

When creating new nodes with illegal types after type legalization, we
should try to use promoted type to avoid creating nodes with illegal
types.

Fixes: https://github.com/llvm/llvm-project/issues/177155
Delta  File
+26 -0  llvm/test/CodeGen/LoongArch/lsx/issue177155.ll
+7 -0  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+33 -0  2 files

LLVM/project f95b63b llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, mlir/test/Target/LLVMIR openmp-simd-guided.mlir

[llvm][OpenMP] Allow Chunk Size on SIMD Guided (#178853)

As per the OpenMP Spec, Chunk Size is allowed when using the guided
kind-type with the schedule clause. However, when being used in cases
such as `!$omp do simd schedule (simd:guided,4)`, this was not allowed
as the base type, BaseGuidedSimd, would hit an assert not allowing
ChunkSizes.

By making this change, we can allow the use of the Guided type, with a
ChunkSize and the schedule clause when using OMPIRBuilder.
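
For readers more familiar with C++ than Fortran, the clause in question looks roughly like the sketch below. This is only an illustration of `schedule(simd: guided, 4)` on a loop; the function name `scale` is hypothetical, and nothing here claims that a C++ front end reaches the same OMPIRBuilder code path as the Fortran case above.

```cpp
#include <vector>

// Hypothetical example: scale every element in place.
// With OpenMP enabled (e.g. -fopenmp) the pragma requests guided scheduling
// with the simd modifier and a chunk size of 4. Without OpenMP the pragma is
// ignored and the loop simply runs serially.
void scale(std::vector<float> &V, float Factor) {
  const long N = static_cast<long>(V.size());
#pragma omp parallel for simd schedule(simd : guided, 4)
  for (long I = 0; I < N; ++I)
    V[I] *= Factor;
}
```

The result is the same with or without OpenMP; only the way iterations are distributed changes.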

Fixes #82106
Delta  File
+23 -0  mlir/test/Target/LLVMIR/openmp-simd-guided.mlir
+1 -1  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+24 -1  2 files

LLVM/project 6d994d8 llvm/lib/MC MCObjectStreamer.cpp

[MC] Try to fix ubsan bot

Check that the size is non-zero to make sure we don't call
memcpy with null pointers. This is well-defined now, but ubsan
may still warn about it.
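
A minimal sketch of this kind of guard, not the actual MCObjectStreamer change: `appendBytes` is a hypothetical helper that skips the copy entirely when the size is zero, so a null source pointer never reaches `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: append Size bytes from Src to Out.
// When Size is zero, Src is allowed to be null, so return before memcpy
// is called and sanitizers have nothing to complain about.
void appendBytes(std::vector<char> &Out, const char *Src, std::size_t Size) {
  if (Size == 0)
    return;
  assert(Src && "a non-empty copy needs a valid source pointer");
  const std::size_t Old = Out.size();
  Out.resize(Old + Size);
  std::memcpy(Out.data() + Old, Src, Size);
}
```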

(cherry picked from commit d064f395af7ac226dec3f8e90516a26e96e2acf1)
Delta  File
+2 -1  llvm/lib/MC/MCObjectStreamer.cpp
+2 -1  1 files

LLVM/project 1655d51 clang/include/clang/Options Options.td, clang/lib/Basic/Targets X86.cpp

[X86][APX] Disable PP2/PPX generation on Windows (#178122)

The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.

This PR disables this support by default for code robustness;
workloads that choose to explicitly enable this support can change the
default behavior by explicitly specifying the flag options that enable
this support e.g. for experimentation or code paths that do not need
unwinder support.

(cherry picked from commit 2f3935bcee6eaf7df8c85a21b7c0fbef967316b5)
Delta  File
+25 -5  clang/lib/Driver/ToolChains/Arch/X86.cpp
+6 -5  clang/test/Driver/x86-target-features.c
+6 -5  clang/test/Driver/cl-x86-flags.c
+8 -2  clang/lib/Basic/Targets/X86.cpp
+2 -6  clang/include/clang/Options/Options.td
+4 -0  llvm/lib/TargetParser/Host.cpp
+51 -23  2 files not shown
+56 -23  8 files

LLVM/project 25b8d52 lldb/source/API SBBreakpointName.cpp, lldb/test/API/functionalities/breakpoint/breakpoint_names TestBreakpointNames.py

[lldb] Fix SBBreakpointName::SetEnabled to propagate changes to breakpoints (#178734)

When setting the enabled state of a breakpoint name via the API, the
change was not being propagated to breakpoints using that name.
This was inconsistent with the CLI behaviour where `breakpoint name
configure --enable/--disable` correctly updates all associated
breakpoints.

(cherry picked from commit 8370304f1e5878c1860223239932ddd05d9ba4c8)
Delta  File
+66 -2  lldb/test/API/functionalities/breakpoint/breakpoint_names/TestBreakpointNames.py
+1 -0  lldb/source/API/SBBreakpointName.cpp
+67 -2  2 files

LLVM/project 189c8e4 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-todo.mlir openmp-target-launch-host.mlir

[OpenMP][MLIR] Add num_teams mlir to llvm lowering
Delta  File
+80 -34  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+18 -2  mlir/test/Target/LLVMIR/openmp-todo.mlir
+3 -3  mlir/test/Target/LLVMIR/openmp-target-launch-host.mlir
+101 -39  3 files

LLVM/project 81f8bda llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp, llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64][GlobalISel] Do not skip zext in getTestBitReg. (#177991)

We can, when attempting to lower to tbz, skip a zext that is then not
accounted for elsewhere. The attached test ends up with a tbz from an
extract that then does not properly zext the value extracted from the
vector. This patch fixes that by only looking through a G_ZEXT if the
bit checked is in the low part of the value, lining up the code with the
comment.
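
The condition can be illustrated with a small sketch. This is not the code in getTestBitReg; both function names are hypothetical, and the example only shows why a bit test may be moved from `zext(x)` to `x` when the tested bit lies inside the narrow source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate: is it safe to test bit `Bit` on the narrow source
// of a zero-extension instead of on the extended value? Only bits below the
// source width exist in the source; higher bits are zero after the extension.
constexpr bool canLookThroughZExt(unsigned Bit, unsigned SrcBits) {
  return Bit < SrcBits;
}

// Reference behaviour: test a bit of an 8-bit value zero-extended to 32 bits.
bool testBitOfZExt(std::uint8_t X, unsigned Bit) {
  assert(Bit < 32 && "bit index out of range for a 32-bit value");
  const std::uint32_t Wide = X; // zero-extension from 8 to 32 bits
  return ((Wide >> Bit) & 1u) != 0;
}
```

For an 8-bit source, bit 7 can be tested on the source directly, while bit 8 and above are always zero in the extended value and cannot be read from whatever wider register happens to hold the source.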

Fixes #173895

(cherry picked from commit 0321f3eeee5cceddc2541046ee155863f5f59585)
Delta  File
+7 -7  llvm/test/CodeGen/AArch64/GlobalISel/widen-narrow-tbz-tbnz.mir
+5 -4  llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+4 -3  llvm/test/CodeGen/AArch64/GlobalISel/opt-fold-xor-tbz-tbnz.mir
+5 -1  llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+21 -15  4 files

LLVM/project 1e55b98 llvm/test/CodeGen/AArch64 aarch64-tbz.ll

[AArch64] Update aarch64-tbz.ll test. NFC

(cherry picked from commit 8302e8ae6694978806f94aca81cd31258db66169)
Delta  File
+179 -25  llvm/test/CodeGen/AArch64/aarch64-tbz.ll
+179 -25  1 files

LLVM/project 61203ae llvm/test/CodeGen/AMDGPU fneg-combines.f16.ll fneg-combines.ll, llvm/test/CodeGen/RISCV fpclamptosat.ll

Merge branch 'main' into users/zhaoqi5/promote-type-afterlegalizetypes
Delta  File
+56,025 -0  llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+14,154 -5,110  llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+850 -5,393  llvm/test/CodeGen/RISCV/fpclamptosat.ll
+2,230 -3,501  llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+2,626 -2,303  llvm/test/CodeGen/AMDGPU/fneg-combines.ll
+4,716 -0  llvm/test/MC/AMDGPU/gfx13_asm_sop2.s
+80,601 -16,307  1,738 files not shown
+167,210 -45,675  1,744 files

LLVM/project 6e0577f llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-intrinsics.ll

[X86] getScalarMaskingNode - FIXUPIMM scalar ops take upper elements from second operand (#179101)

FIXUPIMMSS/SD instructions pass through the upper elements of the SECOND operand, not the first as most (2-op) instructions do.

Fixes #179057

(cherry picked from commit 49d2323447aec77c3d1ae8c941f3f8a126ff1480)
Delta  File
+6 -4  llvm/test/CodeGen/X86/avx512-intrinsics.ll
+6 -4  llvm/lib/Target/X86/X86ISelLowering.cpp
+12 -8  2 files

LLVM/project 0e8db60 llvm/test/CodeGen/X86 avx512-intrinsics.ll

[X86] Add test coverage for #179057 (#179092)

Incorrect folding of fixupimm scalar intrinsics passthrough when the
mask is known zero

(cherry picked from commit 618d71dc98df760d0c724cff6fa69b780e8c0372)
Delta  File
+36 -0  llvm/test/CodeGen/X86/avx512-intrinsics.ll
+36 -0  1 files

LLVM/project 6299a32 llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor nofpclass-fma.ll nofpclass-fmul.ll

ValueTracking: Revert noundef checks in computeKnownFPClass for fmul/fma (#178850)

This functionally reverts fd5cfcc41311c6287e9dc408b8aae499501660e1 and
35ce17b6f6ca5dd321af8e6763554b10824e4ac4.

This was correct and necessary, but it is causing performance regressions
because isGuaranteedNotToBeUndef is apparently not smart enough to see
through recurrences. Revert this for the release branch.

Also the test coverage was inadequate for the fma case, so add a new
case which changes with and without the check.

(cherry picked from commit 07ec2fa1443ccd3cbb55612937f1dddebfe51c15)
Delta  File
+30 -4  llvm/test/Transforms/Attributor/nofpclass-fma.ll
+4 -5  llvm/lib/Analysis/ValueTracking.cpp
+1 -1  llvm/test/Transforms/Attributor/nofpclass-fmul.ll
+35 -10  3 files

LLVM/project cebe861 clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

fix lit includes yet again
Delta  File
+5 -5  clang/test/CIR/CodeGenCUDA/filter-decl.cu
+2 -2  clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+7 -7  2 files

LLVM/project 22ee9f5 clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

nit: fix lit includes
Delta  File
+4 -4  clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1 -1  clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+5 -5  2 files

LLVM/project 6ca0f42 clang/lib/CIR/CodeGen CIRGenModule.cpp

fmt yo
Delta  File
+1 -1  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+1 -1  1 files

LLVM/project 1404014 clang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

[CIR][CUDA] Add NVPTX target info and CUDA/HIP global emission filtering
Delta  File
+66 -0  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+37 -0  clang/test/CIR/CodeGenCUDA/filter-decl.cu
+30 -0  clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+19 -0  clang/lib/CIR/CodeGen/TargetInfo.cpp
+4 -0  clang/lib/CIR/CodeGen/CIRGenModule.h
+2 -0  clang/lib/CIR/CodeGen/TargetInfo.h
+158 -0  6 files

LLVM/project ee5efc9 clang/test/CIR/CodeGenCUDA filter-decl.cu

fix nit test case
Delta  File
+1 -1  clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1 -1  1 files

LLVM/project 3bf2d33 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGenCUDA filter-decl.cu

address comments
Delta  File
+32 -14  clang/test/CIR/CodeGenCUDA/filter-decl.cu
+2 -9  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+34 -23  2 files

LLVM/project c58709c clang/lib/CIR/CodeGen CIRGenModule.cpp

le format monseiur
Delta  File
+3 -4  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+3 -4  1 files

LLVM/project c1a90f3 clang/test/CIR/CodeGenCUDA kernel-call.cu kernel-stub-name.cu

Fix includes once again
Delta  File
+2 -2  clang/test/CIR/CodeGenCUDA/kernel-call.cu
+2 -2  clang/test/CIR/CodeGenCUDA/kernel-stub-name.cu
+4 -4  2 files

LLVM/project 56d3169 mlir/include/mlir/Dialect/XeGPU/uArch IntelGpuXe2.h uArchBase.h, mlir/lib/Dialect/XeGPU/Transforms XeGPUPropagateLayout.cpp

[MLIR][XeGPU] Reorganize uArch for easier extension (#178907)

Delta  File
+10 -20  mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
+26 -0  mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
+12 -9  mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+48 -29  3 files

LLVM/project e29b48e llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-non-volatile-accesses.ll promote-alloca-vgpr-ratio.ll

[AMDGPU][PromoteAlloca] Set !amdgpu.non.volatile if promotion fails

I thought about doing this in a separate pass, but this pass already has all the necessary analysis for this to be a trivial addition.
We can simply set `!amdgpu.non.volatile` if all other attempts to promote the operation failed.
Delta  File
+45 -0  llvm/test/CodeGen/AMDGPU/promote-alloca-non-volatile-accesses.ll
+23 -18  llvm/test/CodeGen/AMDGPU/promote-alloca-vgpr-ratio.ll
+29 -2  llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+2 -2  llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
+99 -22  4 files

LLVM/project 550cb85 llvm/test/CodeGen/AMDGPU whole-wave-functions.ll accvgpr-spill-scc-clobber.mir

[AMDGPU] Set MONonVolatile on memory accesses for spills

Mark the memory operand of spill load/stores as non-volatile, so that these
loads and stores are emitted with `nv` set.

The reason is that scratch memory used by spills will never be shared by
another thread. It's purely thread local and thus a good fit for the `nv` bit.
Delta  File
+5,528 -5,528  llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+4,314 -4,314  llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+1,260 -1,260  llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+902 -902  llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+180 -180  llvm/test/CodeGen/AMDGPU/sgpr-spill.mir
+166 -166  llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+12,350 -12,350  42 files not shown
+13,195 -13,181  48 files