LLVM/project c703ea5clang/lib/Headers/hlsl hlsl_alias_intrinsics.h, clang/lib/Sema SemaHLSL.cpp

[HLSL][DirectX][SPIRV] Implement the `fma` API (#185304)

This PR adds the `fma` HLSL intrinsic (with support for matrices).
It follows all of the steps from #99117.
Closes #99117.
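
For readers unfamiliar with the operation: a fused multiply-add computes `a * b + c` with a single rounding step. The sketch below illustrates that in C++ with `std::fma`. It is an analogy only, not the code added to `hlsl_alias_intrinsics.h`. The helper name `fma_elementwise` is hypothetical, and treating a matrix as a flat array with element-wise application is an assumption made for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of the PR): applies a fused multiply-add
// to each element of three equally sized flat arrays, which is how a
// vector or matrix overload is assumed to behave here.
template <std::size_t N>
std::array<double, N> fma_elementwise(const std::array<double, N> &a,
                                      const std::array<double, N> &b,
                                      const std::array<double, N> &c) {
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = std::fma(a[i], b[i], c[i]); // a[i] * b[i] + c[i], rounded once
  return out;
}
```

A 2x2 matrix would be passed as a `std::array<double, 4>` in this sketch.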
DeltaFile
+138-0clang/test/CodeGenHLSL/builtins/fma.hlsl
+113-0clang/test/SemaHLSL/BuiltIns/fma-errors.hlsl
+54-0clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+53-0llvm/test/CodeGen/DirectX/fma.ll
+35-0clang/lib/Sema/SemaHLSL.cpp
+11-3llvm/lib/Target/DirectX/DXILShaderFlags.cpp
+404-34 files not shown
+430-610 files

LLVM/project 3d5a255llvm/lib/Transforms/Instrumentation MemorySanitizer.cpp

[msan] Disambiguate "Strict" vs. "Heuristic" when dumping instructions (#188873)

When -msan-dump-strict-instructions and
-msan-dump-heuristic-instructions are simultaneously enabled, it is
unclear from the output whether each instruction is strictly vs.
heuristically handled. [*] This patch fixes the issue by tagging the
output.

The actual instrumentation of the code is unaffected by this change.

[*] A workaround is to compile the code once with only
-msan-dump-strict-instructions, and a second time with
-msan-dump-heuristic-instructions, but this unnecessarily doubles the
compilation time.
DeltaFile
+10-8llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+10-81 files

LLVM/project ebce149llvm/lib/Target/AMDGPU SIRegisterInfo.cpp, llvm/test/CodeGen/AMDGPU frame-index-disjoint-s-or-b32.ll eliminate-frame-index-scalar-bit-ops.mir

Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32 (#102345)"

This reverts commit fc2dac83ed0cac4dccbf1ef72445e7ebe84553b1.
DeltaFile
+0-220llvm/test/CodeGen/AMDGPU/frame-index-disjoint-s-or-b32.ll
+0-161llvm/test/CodeGen/AMDGPU/eliminate-frame-index-scalar-bit-ops.mir
+2-6llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2-3873 files

LLVM/project 00aebbfllvm/include/llvm/Analysis DependenceAnalysis.h, llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Refactor signature of weakCrossingSIVtest and check inputs (NFCI) (#187117)

Pass SCEVAddRecExpr objects directly to weakCrossingSIVtest and
check the validity of the input operands.
DeltaFile
+12-9llvm/lib/Analysis/DependenceAnalysis.cpp
+2-4llvm/include/llvm/Analysis/DependenceAnalysis.h
+14-132 files

LLVM/project ead9ac8libc/include limits.yaml locale.h.def, utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc] Remove header templates from several C standard headers. (#188878)

Switches the following headers to hdrgen-produced ones by referencing
some macro from the C standard, and the file containing the declarations,
in the corresponding YAML files:

* limits.h (referenced _WIDTH / _MAX / _MIN families).
* locale.h (referenced LC_ family).
* time.h (referenced CLOCKS_PER_SEC).
* wchar.h (referenced WEOF).
DeltaFile
+73-2libc/include/limits.yaml
+4-16utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+0-18libc/include/locale.h.def
+0-17libc/include/wchar.h.def
+16-1libc/include/locale.yaml
+0-17libc/include/time.h.def
+93-713 files not shown
+101-879 files

LLVM/project 80b304dlld/COFF LTO.cpp, lld/ELF LTO.cpp

[DTLTO] Improve performance of adding files to the link (#186366)

The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.

When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).

This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
  clarity.
- Modify linker comments that implied the AddBuffer path is

    [12 lines not shown]
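
The difference between appending bytes and handing over a buffer can be sketched with toy types. Everything below is hypothetical: `ToyBuffer`, `ToyAddBufferFn`, and `ToyLinker` are stand-ins, not the LLVM `MemoryBuffer`, `AddBufferFn`, or LLD types, and the callback signature is an assumption chosen for illustration.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins; none of these are LLVM or LLD types.
using ToyBuffer = std::unique_ptr<std::string>;
using ToyAddBufferFn = std::function<void(unsigned task, ToyBuffer buf)>;

struct ToyLinker {
  std::vector<ToyBuffer> files;

  // Returns a callback that takes ownership of a finished buffer.
  ToyAddBufferFn addBuffer() {
    return [this](unsigned task, ToyBuffer buf) {
      if (files.size() <= task)
        files.resize(task + 1);
      files[task] = std::move(buf); // ownership transfer, no byte copy
    };
  }
};
```

In this sketch the linker ends up holding the same allocation the backend produced, rather than a copy of its contents.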
DeltaFile
+30-30llvm/lib/LTO/LTO.cpp
+19-19llvm/tools/llvm-lto2/llvm-lto2.cpp
+17-9lld/COFF/LTO.cpp
+16-8lld/ELF/LTO.cpp
+3-1llvm/include/llvm/LTO/LTO.h
+85-675 files

LLVM/project d271bd3llvm/lib/LTO LTO.cpp, llvm/tools/llvm-lto2 llvm-lto2.cpp

Revert "[DTLTO] Speed up temporary file removal in the ThinLTO backed (#189043)

This reverts commit 11b439c5c5a07c95d30ce25abd6adf7f5fbb7105.

timeTraceProfilerCleanup() can be called before the temporary file
deletion has completed in LLD. This causes memory leaks that were
flagged up by sanitizer builds, e.g.:

https://lab.llvm.org/buildbot/#/builders/24/builds/18840/steps/11/logs/stdio
DeltaFile
+7-49llvm/lib/LTO/LTO.cpp
+1-6llvm/tools/llvm-lto2/llvm-lto2.cpp
+8-552 files

LLVM/project 7e2f789clang/include/clang/Support RISCVVIntrinsicUtils.h, clang/lib/Support RISCVVIntrinsicUtils.cpp

[RISCV][NFC] Use enum types to improve debuggability (#188418)

So that we can see the enum values instead of integral values when
dumping in debuggers.
DeltaFile
+14-16clang/lib/Support/RISCVVIntrinsicUtils.cpp
+7-6clang/include/clang/Support/RISCVVIntrinsicUtils.h
+21-222 files

LLVM/project 030ef70libc/docs build_and_test.rst

[libc][docs] Document libc-shared-tests ninja target (#189062)

Added a brief description of the libc-shared-tests target to the
Building and Testing page.

This target allows running tests for shared standalone components like
math primitives without the full libc runtime.
DeltaFile
+6-0libc/docs/build_and_test.rst
+6-01 files

LLVM/project bd947eaclang/lib/Sema SemaExpr.cpp, clang/test/SemaCXX cxx23-builtin-subscript.cpp

[Clang] [Sema] Don't diagnose multidimensional subscript operators on dependent types (#188910)

I forgot to check for dependent types in #187828; we somehow didn’t have
tests for this so CI didn’t catch this...
DeltaFile
+3-0clang/test/SemaCXX/cxx23-builtin-subscript.cpp
+2-1clang/lib/Sema/SemaExpr.cpp
+5-12 files

LLVM/project a5b6a4clibsycl/include/sycl/__impl/detail kernel_arg_helpers.hpp

removed invalid comment

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
DeltaFile
+0-2libsycl/include/sycl/__impl/detail/kernel_arg_helpers.hpp
+0-21 files

LLVM/project cb58fe9mlir/lib/Dialect/SCF/Utils Utils.cpp, mlir/test/Dialect/SCF loop-unroll.mlir

[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001)

`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).

Two bugs were introduced by this:

1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.

2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the

    [11 lines not shown]
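
The sign-extension pitfall can be reproduced outside MLIR. The following is a minimal sketch with hypothetical helpers (`signExtend`, `zeroExtend`, and the two `needsEpilogue*` functions are not MLIR APIs); it only shows why a narrow unsigned bound must be zero-extended before comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret the low `width` bits of `bits` as signed.
int64_t signExtend(uint64_t bits, unsigned width) {
  assert(width >= 1 && width <= 64);
  uint64_t mask = width == 64 ? ~0ULL : ((1ULL << width) - 1);
  uint64_t v = bits & mask;
  uint64_t signBit = 1ULL << (width - 1);
  // Flip the sign bit, then subtract it; this wraps modulo 2^64.
  return static_cast<int64_t>((v ^ signBit) - signBit);
}

// Hypothetical helper: interpret the low `width` bits of `bits` as unsigned.
uint64_t zeroExtend(uint64_t bits, unsigned width) {
  assert(width >= 1 && width <= 64);
  uint64_t mask = width == 64 ? ~0ULL : ((1ULL << width) - 1);
  return bits & mask;
}

// Signed comparison of the unrolled upper bound against the original one.
bool needsEpilogueSigned(uint64_t unrolledUb, uint64_t ub, unsigned width) {
  return signExtend(unrolledUb, width) < signExtend(ub, width);
}

// Unsigned comparison, which is what an unsigned loop needs.
bool needsEpilogueUnsigned(uint64_t unrolledUb, uint64_t ub, unsigned width) {
  return zeroExtend(unrolledUb, width) < zeroExtend(ub, width);
}
```

For an `i1` upper bound of 1 and an unrolled upper bound of 0, the signed variant compares `0 < -1` and reports that no epilogue is needed, while the unsigned variant compares `0 < 1` and keeps the epilogue.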
DeltaFile
+142-0mlir/test/Dialect/SCF/loop-unroll.mlir
+29-5mlir/lib/Dialect/SCF/Utils/Utils.cpp
+171-52 files

LLVM/project 28e2fa3mlir/lib/Dialect/XeGPU/IR XeGPUOps.cpp, mlir/lib/Dialect/XeGPU/Transforms XeGPUUnroll.cpp XeGPUWgToSgDistribute.cpp

[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874)

This PR adds scalar type support to the convert_layout op's result and
operand. It also enhances the convert_layout patterns in wg-to-sg,
unrolling, and sg-to-lane distribution.

This supports reduction to scalar, since layout propagation currently
doesn't allow a scalar to carry any layout. The design choice is to
insert a convert_layout op after the reduction-to-scalar op to record the
layout information permanently across the passes.
DeltaFile
+15-6mlir/lib/Dialect/XeGPU/Transforms/XeGPUUnroll.cpp
+10-8mlir/lib/Dialect/XeGPU/IR/XeGPUOps.cpp
+18-0mlir/test/Dialect/XeGPU/xegpu-blocking.mlir
+15-0mlir/test/Dialect/XeGPU/subgroup-distribute-unit.mlir
+14-0mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+10-3mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp
+82-174 files not shown
+99-2010 files

LLVM/project 2af95b2libc/utils/docgen docgen.py

[libc][docs] Fix POSIX basedefs links for nested headers (#188738)

Fix broken POSIX basedefs links for nested headers in llvm-libc docs.

The docgen script currently emits paths like `sys/wait.h.html`, but the
Open Group uses `sys_wait.h.html`, for example:
- https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_wait.h.html

This updates nested-header link generation while leaving flat headers
unchanged.
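
The mapping described here amounts to replacing path separators with underscores before appending `.html`. The sketch below shows that idea in C++. The actual change is in the Python `docgen.py` script, and the function name `posixBasedefsPage` is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: maps a header name to its Open Group basedefs page
// name. Nested headers such as "sys/wait.h" become "sys_wait.h.html";
// flat headers such as "stdio.h" only gain the ".html" suffix.
std::string posixBasedefsPage(std::string header) {
  std::replace(header.begin(), header.end(), '/', '_');
  return header + ".html";
}
```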
DeltaFile
+7-3libc/utils/docgen/docgen.py
+7-31 files

LLVM/project 22cfe6flldb/include/lldb/Breakpoint BreakpointLocation.h, lldb/source/Breakpoint BreakpointLocation.cpp

[lldb] Make single-argument Address constructor explicit (NFC) (#189035)

This is to highlight places where we (probably unintentionally)
construct an `Address` object from an already resolved address, making
it unresolved again.
See the changes in `DynamicLoaderDarwin.cpp` for a quick example.

Also, use this constructor instead of `Address(lldb::addr_t file_addr,
const SectionList *section_list)` when `section_list` is `nullptr`.
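
The effect of `explicit` can be shown with a toy class. `ToyAddress` and `isResolved` below are hypothetical stand-ins, not lldb's `Address` API; the sketch only demonstrates that an explicit single-argument constructor blocks silent conversion from a raw integer.

```cpp
#include <cstdint>
#include <type_traits>

// Toy stand-in for an address class; all names here are hypothetical.
struct ToyAddress {
  // Building from a raw file address yields an unresolved address.
  explicit ToyAddress(std::uint64_t file_addr)
      : offset(file_addr), resolved(false) {}
  ToyAddress(std::uint64_t off, bool is_resolved)
      : offset(off), resolved(is_resolved) {}

  std::uint64_t offset;
  bool resolved;
};

bool isResolved(const ToyAddress &addr) { return addr.resolved; }

// Direct construction still works, but implicit conversion does not, so
// isResolved(0x1000) would fail to compile instead of silently creating
// an unresolved ToyAddress.
static_assert(std::is_constructible<ToyAddress, std::uint64_t>::value,
              "explicit construction is allowed");
static_assert(!std::is_convertible<std::uint64_t, ToyAddress>::value,
              "implicit conversion is rejected");
```

With the constructor marked `explicit`, each call site has to spell out `ToyAddress(value)`, which makes accidental re-wrapping of an already resolved address visible in review.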
DeltaFile
+7-7lldb/source/Plugins/Language/ObjC/NSString.cpp
+6-4lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
+4-4lldb/include/lldb/Breakpoint/BreakpointLocation.h
+2-6lldb/source/Breakpoint/BreakpointLocation.cpp
+4-4lldb/source/Expression/DWARFExpression.cpp
+4-3lldb/source/Commands/CommandObjectTarget.cpp
+27-2826 files not shown
+59-6132 files

LLVM/project 9e44babmlir/lib/Dialect/Vector/Transforms VectorTransforms.cpp, mlir/test/Dialect/Vector vector-transfer-collapse-inner-most-dims.mlir

[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841)

This revision clears a long-overdue TODO: it supports the lowering when
transfer_read/write ops have a mask by inserting a vector.shape_cast op
for the mask.

---------

Signed-off-by: hanhanW <hanhan0912 at gmail.com>
DeltaFile
+26-13mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
+31-0mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir
+57-132 files

LLVM/project 104cceflibsycl/include/sycl/__impl index_space_classes.hpp queue.hpp, libsycl/include/sycl/__impl/detail kernel_arg_helpers.hpp

draft

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>

add tests for parallel_for

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>

remove operators from index space classes

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
DeltaFile
+413-0libsycl/include/sycl/__impl/index_space_classes.hpp
+192-31libsycl/include/sycl/__impl/queue.hpp
+187-0libsycl/include/sycl/__impl/detail/kernel_arg_helpers.hpp
+111-0libsycl/test/basic/wrapped_usm_pointers.cpp
+75-0libsycl/include/sycl/__spirv/spirv_vars.hpp
+47-0libsycl/test/basic/queue_parallel_for_generic.cpp
+1,025-312 files not shown
+1,031-328 files

LLVM/project a125d9bllvm/lib/Transforms/Vectorize SLPVectorizer.cpp

[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689)

Refactor to precede #185964.

Much of this is a refactor to address these issues. Instead of iterating over one chain at a time and attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF.

Includes fix for use after free bug.
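
The new iteration order is a loop interchange: VFs on the outside, chains on the inside. A minimal sketch follows, assuming chains can be identified by index. `attemptOrder` is a hypothetical helper, not a function in `SLPVectorizer.cpp`, and it only records the order in which (VF, chain) pairs would be tried.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: returns (VF, chain index) pairs in the order they
// would be attempted when iterating over VFs first and chains second.
std::vector<std::pair<unsigned, unsigned>>
attemptOrder(const std::vector<unsigned> &vfs, unsigned numChains) {
  std::vector<std::pair<unsigned, unsigned>> order;
  for (unsigned vf : vfs)
    for (unsigned chain = 0; chain < numChains; ++chain)
      order.emplace_back(vf, chain);
  return order;
}
```

With VFs `{8, 4}` and two chains, every chain is tried at VF 8 before any chain is tried at VF 4.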
DeltaFile
+471-258llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+471-2581 files

LLVM/project 87bec47llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.div.fixup.f16.ll llvm.amdgcn.div.fixup.ll

AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305)
DeltaFile
+197-276llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll
+218-47llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.fixup.f16.ll
+20-12llvm/test/CodeGen/AMDGPU/GlobalISel/constant-bus-restriction.ll
+20-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+17-2llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.fixup.ll
+4-5llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.div.fmas.mir
+476-3422 files not shown
+481-3488 files

LLVM/project f52797ccompiler-rt/cmake/Modules CompilerRTUtils.cmake

[compiler-rt] Fix irrelevant warning on the builtins target (#189055)

Summary:
Currently, building through runtimes will yield this warning:
```
  CMake Warning at compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
    LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
  Call Stack (most recent call first)
```

This happens because the builtins target does not go through the
standard runtimes path and sets them as BUILDTREE_ONLY, so they do not
show up. These are not used in this case, so just guard the condition to
suppress the warning.
DeltaFile
+15-11compiler-rt/cmake/Modules/CompilerRTUtils.cmake
+15-111 files

LLVM/project 15bfc06offload CMakeLists.txt, offload/cmake OpenMPTesting.cmake

[Offload][NFC] Various minor changes to Offload CMake (#189029)

Summary:
Most of these just remove some redundancy or rename `openmp` ->
`offload` where the variable is purely internal.
DeltaFile
+15-20offload/cmake/OpenMPTesting.cmake
+5-24offload/cmake/Modules/LibomptargetGetDependencies.cmake
+6-8offload/tools/CMakeLists.txt
+2-9offload/CMakeLists.txt
+3-3offload/test/lit.site.cfg.in
+1-1offload/plugins-nextgen/CMakeLists.txt
+32-653 files not shown
+34-689 files

LLVM/project 9238b0fllvm/test/CodeGen/SPIRV/passes SPIRVEmitIntrinsics-infer-ptr-type.ll

[NFC][SPIRV] New test for untested case in SPIRVEmitIntrinsics (#188950)

[This
case](https://github.com/llvm/llvm-project/blob/bc3571569685bfa4671e80d112dc0d5c8fc7b25d/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp#L2815-L2818)
is not covered by any existing test (checked via code coverage and
inserting an `abort`). New test proposed that covers this line, as
demonstrated by test failure when an `abort` is present in that line.
DeltaFile
+28-0llvm/test/CodeGen/SPIRV/passes/SPIRVEmitIntrinsics-infer-ptr-type.ll
+28-01 files

LLVM/project d340a68clang/lib/AST MicrosoftMangle.cpp, clang/test/CodeGenCXX mangle-ms.cpp cfi-icall.cpp

Produce back-references for anonymous namespaces (#188843)

The Microsoft mangle implementation does not produce back-references for
anonymous namespaces, which results in nonsensical output from both
`undname` and `llvm-undname`. Consider the following example:

```
namespace {
    struct X {};
    X foo(X, X);
}

int main() {
    foo({}, {});
}
```

Clang 22.1.0
```

    [30 lines not shown]
DeltaFile
+13-1clang/test/CodeGenCXX/mangle-ms.cpp
+3-1clang/lib/AST/MicrosoftMangle.cpp
+1-1clang/test/CodeGenCXX/cfi-icall.cpp
+17-33 files

LLVM/project 7603603clang/lib/AST/ByteCode Interp.h, clang/test/Frontend fixed_point_sub_const.c

[clang][bytecode] Fix an assertion failure with fixed-point types (#189033)

Negation can also fail for fixed-point values.
DeltaFile
+3-0clang/test/Frontend/fixed_point_sub_const.c
+1-1clang/lib/AST/ByteCode/Interp.h
+4-12 files

LLVM/project 1a08f41lldb/packages/Python/lldbsuite/test decorators.py

[lldb] Disable arm64e tests under ASan (#189052)

Technically ASan is supported, but we need an arm64e sanitizer runtime.
It's still enabled when the whole test suite runs as arm64e, assuming
that you need arm64e runtimes regardless.

This will fix
https://ci.swift.org/view/all/job/llvm.org/job/lldb-cmake-sanitized-os-verification/

rdar://173313715
DeltaFile
+7-0lldb/packages/Python/lldbsuite/test/decorators.py
+7-01 files

LLVM/project 5c1ddabclang/lib/Driver Driver.cpp, clang/test/Driver pseudo-probe.c

[clang][Darwin] Externalize pseudoprobe and debug info (#186873)
DeltaFile
+22-0clang/test/Driver/pseudo-probe.c
+13-5clang/lib/Driver/Driver.cpp
+35-52 files

LLVM/project fa7ce27clang/docs OpenMPSupport.rst

[Clang][OpenMP][NFC] Fix status color mismatches in OpenMPSupport.rst (#189050)

Correct the colors used in the OpenMP support tables so they
consistently match their status text:

- :good: (green) is for 'done' only
- :part: (yellow) is for in-progress states ('partial', 'worked on', 'in
progress', 'prototyped', etc.)
- :none: (red) is for 'unclaimed' only

Assisted with copilot
DeltaFile
+10-10clang/docs/OpenMPSupport.rst
+10-101 files

LLVM/project 166f996libc/include/sys socket.yaml, utils/bazel/llvm-project-overlay/libc BUILD.bazel libc_build_rules.bzl

[libc][bazel] Add generation for public headers (#184889)

Previously there was a single rule for stdbit. This PR adds generated
header targets for the rest of the Linux headers. It also adds a
cc_library for all of the public headers, which also includes the types
and macros headers.
DeltaFile
+275-12utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+14-2utils/bazel/llvm-project-overlay/libc/libc_build_rules.bzl
+1-1libc/include/sys/socket.yaml
+290-153 files

LLVM/project 5688acallvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp

AMDGPU: Simplify synthesis of nextdown(1.0) constant (#189039)
DeltaFile
+3-5llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+3-51 files

LLVM/project 9a0b003

[MLIR][XeVM] Wrap in-place op modifications in modifyOpInPlace in LLVMLoadStoreToOCLPattern (#188952)

LLVMLoadStoreToOCLPattern::matchAndRewrite was calling op->removeAttr()
and op->setOperand() directly without going through the rewriter API.
This caused MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS to report "expected
pattern to replace the root operation or modify it in place".

Fix: wrap the direct mutations in rewriter.modifyOpInPlace().

Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
DeltaFile
+0-00 files