LLVM/project 82799a4llvm/lib/Target/AMDGPU AMDGPULibCalls.cpp, llvm/test/CodeGen/AMDGPU amdgpu-simplify-libcall-pown.ll amdgpu-simplify-libcall-pow.ll

Reapply "AMDGPU: Use real copysign in fast pow (#97152)" (#178036)

This reverts commit bff619f91015a633df659d7f60f842d5c49351df.

This was reverted due to regressions caused by poor copysign
optimization, which have been fixed.
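
As a rough picture of what "real copysign" means here: a fast pown can be expanded as exp2(n * log2(|x|)), with the sign reattached when x is negative and n is odd. The sketch below is not the code in AMDGPULibCalls.cpp; `fast_pown_sketch` is a hypothetical name, and the sketch only assumes that the sign is applied through a copysign operation rather than through manual sign-bit manipulation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration only; not the AMDGPULibCalls.cpp implementation.
// Computes x^n as exp2(n * log2(|x|)) and restores the sign with copysign.
inline float fast_pown_sketch(float x, int n) {
  float magnitude =
      std::exp2(static_cast<float>(n) * std::log2(std::fabs(x)));
  // The result is negative only when x is negative and n is odd, so the
  // sign is taken from x for odd n and from +1.0 otherwise.
  float sign_source = (n & 1) ? x : 1.0f;
  return std::copysign(magnitude, sign_source);
}
```

Expressing the sign transfer as a single copysign leaves the optimizer free to fold it, which is presumably why the quality of copysign optimization mattered for the earlier revert.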
DeltaFile
+24-32llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll
+21-28llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
+8-9llvm/test/CodeGen/AMDGPU/simplify-libcalls.ll
+4-4llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
+4-3llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp
+61-765 files

LLVM/project 4c05ff1clang/lib/Frontend CompilerInstance.cpp, clang/lib/Lex PPDirectives.cpp

[clang][modules] Support every import syntax in single-module-parse-mode (#179610)

Previously, `-fmodules-single-module-parse-mode` only prevented module
compilation/loading when initiated from an `#include` or `#import`
directive. This PR does the same for `@import`, `#pragma clang module
import` and `#pragma clang module load`. This is done by sinking the
logic down into `CompilerInstance::loadModule()`.
DeltaFile
+36-0clang/test/Modules/single-module-parse-mode-compiles.m
+17-0clang/lib/Frontend/CompilerInstance.cpp
+1-6clang/lib/Lex/PPDirectives.cpp
+54-63 files

LLVM/project 0d69410llvm/lib/Target/AMDGPU SIInstrInfo.cpp AMDGPUInstructionSelector.cpp, llvm/test/CodeGen/AMDGPU licm-wwm.mir

[AMDGPU] Disable VALU sinking and hoisting with WWM

Machine LICM can hoist a VALU instruction out of a WWM region.
In that case the WQM pass has to create yet another WWM region
around the hoisted instruction, which is undesirable.

Unfortunately, we cannot tell whether an instruction is in a WWM
region, so this patch disables hoisting if WWM is used in the
function.

This works around the bug SWDEV-502411.
DeltaFile
+20-0llvm/test/CodeGen/MIR/AMDGPU/uses-whole.wave.ll
+9-7llvm/test/CodeGen/AMDGPU/licm-wwm.mir
+11-1llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+5-1llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+4-0llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll
+4-0llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
+53-99 files not shown
+70-915 files

LLVM/project ccf4615mlir/include/mlir/Dialect/Utils ReshapeOpsUtils.h, mlir/test/Dialect/MemRef canonicalize.mlir

[mlir] disable folding collapse expand to cast (#179209)

Folding expand(collapse(src)) to cast(src) is supported in cases
where the source and result are cast compatible but not equal. When the
source has dynamic dimensions, this can enable the cast even though
certain dimensions are cast from static to dynamic when the dynamic
size is not guaranteed to equal the static size.
For now, this patch blocks the folding when the source has dynamic
dimensions, to preserve correctness.
In the future it may be possible to enable some cases of folding when
not all dimensions of the source are static, for example when:
  1) the expand and collapse happen on non-dynamic dims
  2) the expand and collapse on dynamic dims can be folded to a no-op
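
The guard described above can be pictured as a simple predicate over the source shape. This is a minimal sketch, not the code in ReshapeOpsUtils.h: `kDynamicDim` and `mayFoldExpandOfCollapseToCast` are hypothetical names, and the sketch models only the "no dynamic source dimensions" condition, not the cast-compatibility check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a "dynamic dimension" marker in a shape.
constexpr std::int64_t kDynamicDim = std::numeric_limits<std::int64_t>::min();

// Hypothetical guard: allow the expand(collapse(src)) -> cast(src) fold only
// when every source dimension is static.
inline bool
mayFoldExpandOfCollapseToCast(const std::vector<std::int64_t> &srcShape) {
  for (std::int64_t dim : srcShape)
    if (dim == kDynamicDim)
      return false; // A dynamic size cannot be proven equal to a static one.
  return true;
}
```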
DeltaFile
+15-0mlir/test/Dialect/MemRef/canonicalize.mlir
+2-1mlir/include/mlir/Dialect/Utils/ReshapeOpsUtils.h
+17-12 files

LLVM/project 059176dmlir/include/mlir-c Support.h, mlir/include/mlir/Bindings/Python NanobindUtils.h

[MLIR][Python] Add llvm raw fd ostream c api (#179770)

This PR adds a C API `MlirLlvmRawFdOstream` for `llvm::raw_fd_ostream`,
which cannot be safely replaced by `std::ofstream` on Windows.
`llvm::raw_fd_ostream` configures Win32 file sharing flags, allowing
other handles (e.g. Python temp file handles) to coexist, see details
[here](https://llvm.org/doxygen/Windows_2Path_8inc_source.html#l1281),
while `std::ofstream` disables file sharing by default.
DeltaFile
+38-0mlir/lib/CAPI/IR/Support.cpp
+26-10mlir/include/mlir/Bindings/Python/NanobindUtils.h
+32-0mlir/include/mlir-c/Support.h
+2-0mlir/include/mlir/CAPI/Support.h
+98-104 files

LLVM/project 7bf47e2mlir/lib/Dialect/Tensor/IR TensorOps.cpp, mlir/test/Dialect/Tensor invalid.mlir

[mlir][tensor] Guard constant reshape folding (#179077)

DeltaFile
+10-0mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+10-0mlir/test/Dialect/Tensor/invalid.mlir
+20-02 files

LLVM/project 9639e96llvm/lib/CodeGen/GlobalISel InlineAsmLowering.cpp, llvm/test/CodeGen/AArch64/GlobalISel inline-asm.ll irtranslator-inline-asm.ll

[AArch64] fix copy from GPR32 to FPR16 (#176594)

fixes https://github.com/llvm/llvm-project/issues/79822
cc https://github.com/rust-lang/rust/issues/120374

The example fails on nightly https://godbolt.org/z/zEojPzqWc.
DeltaFile
+73-5llvm/test/CodeGen/AArch64/GlobalISel/inline-asm.ll
+17-8llvm/lib/CodeGen/GlobalISel/InlineAsmLowering.cpp
+3-2llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
+93-153 files

LLVM/project 8921f7flibc/src/__support/wctype wctype_classification_utils.cpp wctype_classification_utils.h, llvm/test/CodeGen/AArch64 clmul-fixed.ll clmul-scalable.ll

Merge branch 'main' into users/arsenm/reapply-use-real-copysign-in-fast-pow
DeltaFile
+4,100-13llvm/test/CodeGen/AArch64/clmul-fixed.ll
+3,681-0libc/src/__support/wctype/wctype_classification_utils.cpp
+5-3,665libc/src/__support/wctype/wctype_classification_utils.h
+2,212-1,142llvm/test/CodeGen/AArch64/clmul-scalable.ll
+756-0llvm/test/CodeGen/AArch64/clmul.ll
+1-469llvm/test/Transforms/LoopVectorize/multiple-result-intrinsics.ll
+10,755-5,289769 files not shown
+21,258-10,695775 files

LLVM/project 943782blldb/include/lldb/Target Target.h, lldb/source/Target Target.cpp

[lldb] Broadcast `eBroadcastBitStackChanged` when frame providers change (#171482)

We want to reload the call stack whenever the frame providers are
updated. To do so, we now emit a `eBroadcastBitStackChanged` on all
threads whenever any changes to the frame providers take place.

I found this very useful while iterating on a frame provider in
lldb-dap. Previously, a new frame provider only took effect after
continuing execution. Now the backtrace in VS Code gets refreshed
immediately upon running `target frame-provider add`.
DeltaFile
+77-0lldb/test/API/functionalities/scripted_frame_provider/TestScriptedFrameProvider.py
+36-23lldb/source/Target/Target.cpp
+6-0lldb/include/lldb/Target/Target.h
+119-233 files

LLVM/project ee81694libc/shared/math f16fmaf.h, libc/src/__support/math f16fmaf.h CMakeLists.txt

[libc][math] Refactor f16fmaf to Header Only. (#178851)

closes #175319 
DeltaFile
+33-0libc/src/__support/math/f16fmaf.h
+31-0libc/shared/math/f16fmaf.h
+12-1utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+10-0libc/src/__support/math/CMakeLists.txt
+2-4libc/src/math/generic/f16fmaf.cpp
+1-2libc/src/math/generic/CMakeLists.txt
+89-73 files not shown
+94-79 files

LLVM/project 9ebdeb2lldb/include/lldb/Target Process.h, lldb/source/Core DynamicLoader.cpp

[lldb] Return Expected<ModuleSP> from Process::ReadModuleFromMemory (#179583)

I noticed that Module::GetMemoryObjectFile populates a Status object
upon error but it's effectively dropped on the floor. Instead, the
clients can report the error as desired.

At the moment, all clients either (1) consume the error because
they are only trying to find a module, or (2) log the error and bail out
early. I tried to preserve existing behavior as faithfully as possible.
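
The two client patterns can be sketched with a toy result type. This is only an illustration under stated assumptions: `ExpectedModule` is a `std::variant` stand-in for `llvm::Expected<ModuleSP>`, and `Module`, `readModuleFromMemorySketch`, `findModuleOrNull`, and `loadModuleOrLog` are hypothetical names, not LLDB APIs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>

struct Module {
  std::string name;
};
using ModuleSP = std::shared_ptr<Module>;
// Toy stand-in for llvm::Expected<ModuleSP>: a module or an error message.
using ExpectedModule = std::variant<ModuleSP, std::string>;

// Hypothetical reader that either yields a module or reports why it failed.
inline ExpectedModule readModuleFromMemorySketch(bool succeed) {
  if (!succeed)
    return std::string("failed to read object file from memory");
  return std::make_shared<Module>(Module{"libfoo"});
}

// Pattern (1): the caller is only probing for a module, so the error is
// consumed and a null pointer is returned.
inline ModuleSP findModuleOrNull(bool succeed) {
  ExpectedModule result = readModuleFromMemorySketch(succeed);
  if (auto *module = std::get_if<ModuleSP>(&result))
    return *module;
  return nullptr;
}

// Pattern (2): the caller records the error and bails out early.
inline bool loadModuleOrLog(bool succeed, std::string &log) {
  ExpectedModule result = readModuleFromMemorySketch(succeed);
  if (auto *error = std::get_if<std::string>(&result)) {
    log += *error;
    return false;
  }
  return true;
}
```

Either way the decision about the error moves to the caller instead of being dropped inside the callee.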
DeltaFile
+23-25lldb/source/Target/Process.cpp
+30-16lldb/source/Plugins/DynamicLoader/MacOSX-DYLD/DynamicLoaderDarwin.cpp
+17-3lldb/source/Core/DynamicLoader.cpp
+17-2lldb/source/Plugins/DynamicLoader/FreeBSD-Kernel/DynamicLoaderFreeBSDKernel.cpp
+16-2lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
+14-3lldb/include/lldb/Target/Process.h
+117-515 files not shown
+159-6111 files

LLVM/project 3199749clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp

Rebase
DeltaFile
+2-2clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+2-21 files

LLVM/project fad9b2ellvm/include/llvm/ADT ScopeExit.h

[llvm][ADT] Mark the whole scope_exit class [[nodiscard]] instead (#180008)

This PR is to address:
https://github.com/llvm/llvm-project/pull/179720#issuecomment-3854792269
https://github.com/llvm/llvm-project/pull/179720#issuecomment-3855339636
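
To illustrate what marking the whole class `[[nodiscard]]` looks like, here is a minimal sketch of a scope guard. It is not the contents of ScopeExit.h; `scope_exit_sketch` and `runWithCleanup` are hypothetical names. The assumption is that the attribute sits on the class so that compilers are encouraged to warn whenever a guard value is created and immediately discarded.

```cpp
#include <cassert>
#include <utility>

// Hypothetical scope guard; the attribute is placed on the class itself.
template <typename Callable> class [[nodiscard]] scope_exit_sketch {
  Callable ExitFunction;
  bool Engaged = true;

public:
  explicit scope_exit_sketch(Callable F) : ExitFunction(std::move(F)) {}
  scope_exit_sketch(const scope_exit_sketch &) = delete;
  scope_exit_sketch &operator=(const scope_exit_sketch &) = delete;
  // Runs the callable when the guard leaves scope, unless released.
  ~scope_exit_sketch() {
    if (Engaged)
      ExitFunction();
  }
  void release() { Engaged = false; }
};

// Counts how many times the cleanup ran for one guarded scope.
inline int runWithCleanup() {
  int cleanups = 0;
  {
    // Binding the guard to a name keeps it alive until the end of the scope.
    scope_exit_sketch guard([&] { ++cleanups; });
  }
  return cleanups;
}
```

Writing `scope_exit_sketch([&] { ++cleanups; });` without a variable name would destroy the guard at the end of that statement, which is the kind of mistake the attribute is meant to surface.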
DeltaFile
+3-4llvm/include/llvm/ADT/ScopeExit.h
+3-41 files

LLVM/project 1c60dbeclang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp CIRGenFunction.h

Address comments from Andy
DeltaFile
+16-14clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+1-1clang/lib/CIR/CodeGen/CIRGenFunction.h
+17-152 files

LLVM/project c8f6577clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp CIRGenTypes.cpp, clang/test/CIR/CodeGenBuiltins/AArch64 acle_sve_dup.c

[CIR][AArch64] Add lowering for predicated SVE svdup builtins (zeroing)

This PR adds CIR lowering support for predicated SVE `svdup` builtins on
AArch64. The corresponding ACLE intrinsics are documented at:
  https://developer.arm.com/architectures/instruction-sets/intrinsics

This change focuses on the zeroing-predicated variants (suffix `_z`, e.g.
`svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic with a
`zeroinitializer` passthrough operand.

IMPLEMENTATION NOTES
--------------------
* The CIR type converter is extended to support `BuiltinType::SveBool`,
  which is lowered to `cir.vector<[16] x i1>`, matching current Clang
  behaviour and ensuring compatibility with existing LLVM SVE lowering.
* Added logic that converts `cir.vector<[16] x i1>` according to the
  underlying element type. This is done by calling
  `@llvm.aarch64.sve.convert.from.svbool`.


    [56 lines not shown]
DeltaFile
+472-5clang/test/CIR/CodeGenBuiltins/AArch64/acle_sve_dup.c
+86-9clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+4-0clang/lib/CIR/CodeGen/CIRGenTypes.cpp
+2-0clang/lib/CIR/CodeGen/CIRGenFunction.h
+564-144 files

LLVM/project 59a63b2llvm/lib/Analysis StaticDataProfileInfo.cpp

add comments based on code review and offline discussions
DeltaFile
+12-1llvm/lib/Analysis/StaticDataProfileInfo.cpp
+12-11 files

LLVM/project 91f3953clang/test/CodeGenHLSL/builtins pow-overloads.hlsl distance.hlsl, clang/test/Headers __clang_hip_math.hip

Inliner: Handle nofpclass return attributes (#179776)

Follow along with how range is handled.
DeltaFile
+52-52clang/test/Headers/__clang_hip_math.hip
+57-0llvm/test/Transforms/Inline/ret_attr_nofpclass.ll
+20-20clang/test/CodeGenHLSL/builtins/pow-overloads.hlsl
+16-16clang/test/CodeGenHLSL/builtins/distance.hlsl
+16-16clang/test/CodeGenHLSL/builtins/length.hlsl
+9-9clang/test/CodeGenHLSL/builtins/refract.hlsl
+170-1136 files not shown
+209-14212 files

LLVM/project d659267llvm/lib/Target/SystemZ SystemZAsmPrinter.cpp, llvm/test/CodeGen/SystemZ zos-func-alias.ll

Add lit test
DeltaFile
+17-0llvm/test/CodeGen/SystemZ/zos-func-alias.ll
+2-2llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+19-22 files

LLVM/project fe754dfllvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/PhaseOrdering/X86 loadcombine.ll

[SLP]Remove LoadCombine workaround after handling of the copyables

LoadCombine pattern handling was added as a workaround for the cases,
where the SLP vectorizer could not vectorize the code effectively. With
the copyables support, it can handle it directly.

Also, this patch adds support for the scalar loads[ + bswap] pattern for
byte-sized loads (+ reverse bytes for bswap).
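
As a reading aid, the byte-load pattern can be shown in scalar form: OR-ing shifted byte loads in one order equals a single wide load, and in the opposite order it equals that wide load followed by a byte swap. This is a sketch of the arithmetic identity only, under the assumption that this is the pattern meant; the function names are hypothetical and nothing here reflects the SLPVectorizer implementation.

```cpp
#include <cassert>
#include <cstdint>

// Combines four bytes with p[0] as the least significant byte, which is
// what a single little-endian 32-bit load would produce.
inline std::uint32_t combineBytesLE(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Combines four bytes with p[0] as the most significant byte.
inline std::uint32_t combineBytesBE(const std::uint8_t *p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}

// Portable 32-bit byte swap.
inline std::uint32_t byteSwap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}
```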

Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/174205
DeltaFile
+38-378llvm/test/Transforms/SLPVectorizer/X86/bad-reduction.ll
+145-99llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+12-204llvm/test/Transforms/PhaseOrdering/X86/loadcombine.ll
+4-17llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll
+4-17llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll
+2-17llvm/test/Transforms/SLPVectorizer/X86/bswap-reduction-aliased.ll
+205-7326 files

LLVM/project 5326166llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU][SIInsertWaitcnt][NFC] Don't expose internal data structure to user (#179736)

With this patch we are no longer exposing the internal data structure
that holds the WaitEvents to the user through the `getWaitEventMask()`
API. Instead we only allow the user to query a specific type and get the
corresponding `WaitEventSet` with `getWaitEvents(T)`.
Note: This patch also renames `getWaitEventMask()` to `getWaitEvents()`
because we are no longer returning a mask but instead a `WaitEventSet`
object.
DeltaFile
+15-17llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+15-171 files

LLVM/project ba58225mlir/python CMakeLists.txt, mlir/python/mlir/dialects X86VectorTransformOps.td X86Vector.td

[mlir][x86vector] Python bindings for x86vector dialect (#179958)

Registers python bindings for x86vector dialect and transform ops.
DeltaFile
+76-0mlir/test/python/dialects/x86vector.py
+40-0mlir/test/python/dialects/transform_x86vector_ext.py
+16-0mlir/python/CMakeLists.txt
+14-0mlir/python/mlir/dialects/X86VectorTransformOps.td
+14-0mlir/python/mlir/dialects/X86Vector.td
+6-0mlir/python/mlir/dialects/x86vector.py
+166-01 files not shown
+171-07 files

LLVM/project d040788clang/lib/CIR/CodeGen CIRGenExpr.cpp, clang/lib/CodeGen CGExpr.cpp CodeGenModule.cpp

[clang] remove unused SrcAddr parameter from performAddrSpaceCast (#179330)

The conversion code always ended up just getting the type of Src from
the Src argument itself, with no virtual users of this, so there is no
point in also providing this API hook. Fix the documentation as well,
since it seems DestAddr must have been similarly removed at some point
in the past from the API but was still documented.

Also fixes CIR to actually return the casted value!
DeltaFile
+11-16clang/lib/CodeGen/CGExpr.cpp
+11-15clang/lib/CodeGen/CodeGenModule.cpp
+0-25clang/lib/CodeGen/TargetInfo.h
+7-15clang/lib/CIR/CodeGen/CIRGenExpr.cpp
+0-21clang/lib/CodeGen/TargetInfo.cpp
+8-10clang/lib/CodeGen/CGBuiltin.cpp
+37-10214 files not shown
+75-17120 files

LLVM/project d762cc2llvm/lib/CodeGen/GlobalISel IRTranslator.cpp, llvm/lib/Target/RISCV RISCVISelLowering.cpp

[GlobalISel] Add SVE support for alloca (#178976)

Complementary to the same handling code in SelectionDAG:

https://github.com/llvm/llvm-project/blob/f3d81d4110f3415eab3459e07b52043872b9e03b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp#L160-L165

https://github.com/llvm/llvm-project/blob/f3d81d4110f3415eab3459e07b52043872b9e03b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L4613-L4623

Co-authored-by: Claude Sonnet 4.5 <noreply at anthropic.com>
DeltaFile
+24-5llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+15-0llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/alloca.ll
+14-0llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-dynamic-alloca-scalable.ll
+0-11llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/fallback.ll
+0-5llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+53-215 files

LLVM/project 13c9276llvm/test/CodeGen/PowerPC aix-ifunc-toc-restore-query-neg.ll

[AIX] fix aix-ifunc-toc-restore-query-neg.ll (#153049)
DeltaFile
+1-0llvm/test/CodeGen/PowerPC/aix-ifunc-toc-restore-query-neg.ll
+1-01 files

LLVM/project 92d0fd7mlir/include/mlir/IR Remarks.h

[MLIR][NFC] Use toVector instead toStringRef (#179998)

DeltaFile
+1-1mlir/include/mlir/IR/Remarks.h
+1-11 files

LLVM/project 259487bllvm/lib/Support StringMap.cpp, llvm/lib/Target/SystemZ SystemZAsmPrinter.cpp SystemZAsmPrinter.h

Initial implementation
DeltaFile
+50-0llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+22-1llvm/lib/Support/StringMap.cpp
+8-4llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+80-53 files

LLVM/project 366dfffllvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AArch64 fold-reduce-add-cmp-zero.ll icmp-vector-reduce.ll

[VectorCombine] Fold (icmp eq/ne (reduce.add X), 0) to reduce.umax

When vector elements are known to be either non-positive (e.g., from
sext i1), or non-negative (e.g., from zext i1), comparing the sum
against zero is equivalent to checking if all elements are zero. This
can be done more efficiently using reduce.umax.
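
The equivalence can be checked on small scalar models of the two reductions. This is a sketch, not the VectorCombine code: `sumIsZero` and `umaxIsZero` are hypothetical names, lanes are modeled as `int8_t`, and the sum is accumulated in a wider `int` so it cannot wrap (an assumption of this sketch).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Models icmp eq (reduce.add X), 0. The sum is kept in a wider type so it
// cannot wrap in this sketch.
inline bool sumIsZero(const std::vector<std::int8_t> &lanes) {
  int sum = std::accumulate(lanes.begin(), lanes.end(), 0);
  return sum == 0;
}

// Models icmp eq (reduce.umax X), 0, with lanes reinterpreted as unsigned.
inline bool umaxIsZero(const std::vector<std::int8_t> &lanes) {
  std::uint8_t best = 0;
  for (std::int8_t lane : lanes)
    best = std::max(best, static_cast<std::uint8_t>(lane));
  return best == 0;
}
```

When every lane is 0 or -1 (as from `sext i1`), or every lane is 0 or 1 (as from `zext i1`), all lanes share a sign, so the sum is zero exactly when every lane is zero, and that is also exactly when the unsigned maximum is zero.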
DeltaFile
+227-0llvm/test/Transforms/VectorCombine/AArch64/fold-reduce-add-cmp-zero.ll
+227-0llvm/test/Transforms/VectorCombine/RISCV/fold-reduce-add-cmp-zero.ll
+71-0llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+14-19llvm/test/Transforms/VectorCombine/RISCV/icmp-vector-reduce.ll
+14-19llvm/test/Transforms/VectorCombine/AArch64/icmp-vector-reduce.ll
+6-6llvm/test/Transforms/VectorCombine/RISCV/fold-signbit-reduction-cmp.ll
+559-441 files not shown
+561-467 files

LLVM/project f4441cbllvm/utils/gn/secondary/clang/unittests/Analysis/Scalable BUILD.gn

[gn build] Port e59e9fcd38a9
DeltaFile
+1-0llvm/utils/gn/secondary/clang/unittests/Analysis/Scalable/BUILD.gn
+1-01 files

LLVM/project ecd1767offload/libomptarget DeviceImage.cpp

[offload] Fix DeviceImage to handle OffloadBinary::create returning vector (#180003)

OffloadBinary::create() now returns
`Expected<SmallVector<unique_ptr<OffloadBinary>>>` instead of a single
unique_ptr, to support multiple entries in the version 2 format.

Updated the DeviceImageTy constructor to extract the first binary from
the returned vector, with an empty check. In this context, only one
image per OffloadBinary is expected.
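
The "take the first entry, but check for empty" step can be sketched with standard containers. The names `BinarySketch` and `takeFirstBinary` are hypothetical, and `std::vector` stands in for `SmallVector`; this is not the DeviceImage.cpp code and it omits the `Expected` error path.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an offload binary.
struct BinarySketch {
  std::string image;
};

// Takes ownership of the first binary in the list, or returns nullptr when
// the list is empty so the caller can report the problem.
inline std::unique_ptr<BinarySketch>
takeFirstBinary(std::vector<std::unique_ptr<BinarySketch>> binaries) {
  if (binaries.empty())
    return nullptr;
  return std::move(binaries.front());
}
```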
DeltaFile
+10-4offload/libomptarget/DeviceImage.cpp
+10-41 files

LLVM/project 4f97d09mlir/lib/Conversion/XeGPUToXeVM XeGPUToXeVM.cpp, mlir/test/Conversion/XeGPUToXeVM materializecast.mlir

[MLIR][XeGPU][XeVM] Update single element vector type handling. (#178558)

The type conversion rule for single-element vectors and the
materialization function that supports the conversion are mismatched.
Update the materialization function to match the type conversion rule.
DeltaFile
+85-39mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
+29-0mlir/test/Conversion/XeGPUToXeVM/materializecast.mlir
+114-392 files