LLVM/project 383f858llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sve2-fixed-length-sra.ll sve2-fixed-length-sra-flag.ll

[AArch64] Preserve SDNode flags when lowering fixed vectors to scalable operations (#204616)

Preserve the original SDNode flags when LowerToScalableOp rebuilds fixed-length vector operations using their scalable container type. This allows combines to use flag information generated before the scalable operation was created.
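
A minimal sketch of the idea, using a toy `Node` model rather than LLVM's `SDNode` API. When a fixed-length operation is rebuilt on its scalable container type, the flags are copied to the new node instead of being dropped. The `Node` class, the `container_type` mapping, and the `exact` flag on `sra` are illustrative assumptions, not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a DAG node (hypothetical, not LLVM's SDNode)."""
    opcode: str
    vtype: str
    flags: frozenset = frozenset()


def container_type(fixed_type: str) -> str:
    # Hypothetical mapping from a fixed-length vector type to a scalable one.
    return "nx" + fixed_type


def lower_to_scalable(node: Node, preserve_flags: bool = True) -> Node:
    """Rebuild `node` on its scalable container type.

    With preserve_flags=False the flags are lost, which models the
    behaviour the commit message describes as being fixed.
    """
    flags = node.flags if preserve_flags else frozenset()
    return Node(node.opcode, container_type(node.vtype), flags)


fixed = Node("sra", "v4i32", frozenset({"exact"}))
print(lower_to_scalable(fixed))
print(lower_to_scalable(fixed, preserve_flags=False))
```

In this model, a later combine that inspects `flags` only sees `exact` when the lowering step carried it over.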
DeltaFile
+832-0llvm/test/CodeGen/AArch64/sve2-fixed-length-sra.ll
+28-0llvm/test/CodeGen/AArch64/sve2-fixed-length-sra-flag.ll
+2-1llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+862-13 files

LLVM/project 17d49bclldb/source/Plugins/Process/Linux NativeRegisterContextLinux_arm64.cpp

[lldb][Linux][AArch64] Get NT_ARM_ constants from llvm's ELF header (#205834)

The first thing I do for any new register set is add it to the llvm
header. So we should just use those values instead of having all these
macros to handle older kernels.
DeltaFile
+33-68lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
+33-681 files

LLVM/project ec6eed0libcxx/include/__algorithm simd_utils.h, libcxx/include/__locale_dir num.h

[libc++] Remove __broadcast simd function (#205559)

The simd vector type in Clang already provides a conversion operator
that acts as a broadcast constructor. We can use that instead.
DeltaFile
+0-9libcxx/include/__algorithm/simd_utils.h
+1-2libcxx/include/__locale_dir/num.h
+1-112 files

LLVM/project c1869ffflang-rt/lib/runtime CMakeLists.txt cudadevice.f90, openmp/module omp_lib.F90.var CMakeLists.txt

[flang-rt][CMake] Avoid 'use, intrinsic ::' (#205634)

Two build failures reported after #204260

* Unix Makefiles generator stops working: The cause is that the rules
for building each OBJECT library land in its own Makefile, e.g.
`flang_rt.mod.fortran.builtins.dir/build.make` and
`libomp-mod.dir/build.make`. Trying to inject dependencies directly for
build rules in the other file does not work.

* `__ppc_types.f90` not tracked: Forgotten in #204260 due to being only
conditionally enabled for PowerPC targets.

The solution for both is to just remove the workaround for CMake not
recognizing module uses declared with `intrinsic`, which caused these
problems. This PR promotes the `use` constructs in the module sources to
normal dependencies that are not ignored by CMake.

The `intrinsic` modifier changes the search path to only look for such a

    [35 lines not shown]
DeltaFile
+5-66flang-rt/lib/runtime/CMakeLists.txt
+20-20openmp/module/omp_lib.F90.var
+17-7runtimes/cmake/config-Fortran.cmake
+0-7openmp/module/CMakeLists.txt
+3-3flang-rt/lib/runtime/cudadevice.f90
+2-2flang-rt/lib/runtime/cooperative_groups.f90
+47-1051 files not shown
+48-1067 files

LLVM/project 213c7b7clang/lib/CIR/CodeGen CIRGenOpenMPClause.cpp CIRGenStmtOpenMP.cpp, clang/test/CIR/CodeGenOpenMP target-map-llvm-host.c target-map-llvm-device.c

[CIR][OpenMP] Initial implementation of target region support (#195452)

This patch adds support for target regions with some basic support for map
clauses. It also changes the clause handling to make use of the OMP dialect
ClauseOps to simplify op construction.

Assisted-by: Cursor / claude-4.6-opus-high
DeltaFile
+120-70clang/lib/CIR/CodeGen/CIRGenOpenMPClause.cpp
+163-0clang/test/CIR/CodeGenOpenMP/target-map-llvm-host.c
+130-0clang/test/CIR/CodeGenOpenMP/target-map-llvm-device.c
+113-7clang/lib/CIR/CodeGen/CIRGenStmtOpenMP.cpp
+105-0clang/test/CIR/CodeGenOpenMP/target-map.c
+88-0clang/lib/CIR/CodeGen/CIRGenOpenMPClause.h
+719-775 files not shown
+746-8511 files

LLVM/project 0bb2b00libsycl/docs index.rst, libsycl/include/sycl/__impl index_space_classes.hpp

[libsycl] add operators to sycl::range and sycl::id (#203572)

This PR was assisted by GH Copilot (tests extension).

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
DeltaFile
+284-0libsycl/test/basic/index_space_classes.cpp
+180-34libsycl/include/sycl/__impl/index_space_classes.hpp
+16-0libsycl/test/basic/index_space_classes_negative.cpp
+1-1libsycl/docs/index.rst
+481-354 files

LLVM/project 84c42feclang/docs ReleaseNotes.rst, clang/lib/Sema SemaType.cpp

[clang][opencl][sycl] Deprecate opencl_global_device and opencl_global_host (#203569)

These attributes were originally introduced as part of the SYCL
upstreaming effort to enable improved performance for USM pointers on
FPGA targets. However, subsequent evaluation indicates that they are not
meaningfully used in practice. Additionally, given the current shift in
focus away from FPGAs in DPC++, these attributes no longer serve an
active purpose. Their removal would simplify the codebase and reduce
ongoing maintenance burden.

RFC:
https://discourse.llvm.org/t/rfc-remove-opencl-global-device-and-opencl-global-host-address-space-attributes/90677
DeltaFile
+12-12clang/test/CodeGenOpenCL/address-spaces.cl
+6-6clang/test/SemaOpenCL/usm-address-spaces-conversions.cl
+4-4clang/test/SemaSYCL/address-space-conversions.cpp
+4-4clang/test/CodeGenOpenCL/address-spaces-conversions.cl
+5-2clang/lib/Sema/SemaType.cpp
+5-1clang/docs/ReleaseNotes.rst
+36-295 files not shown
+44-3211 files

LLVM/project 28475c2llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVISelDAGToDAG.cpp, llvm/test/CodeGen/RISCV rv32p.ll

[RISCV][P-ext] Select signed widening add/sub accumulate to wadda/wsuba (#205475)

WADDA is rd += sext(rs1) + sext(rs2) and WSUBA is rd += sext(rs1) - sext(rs2),
the signed counterparts of WADDAU/WSUBAU added in #181396.

Add the WADDA/WSUBA SelectionDAG nodes, fold ADDD/SUBD whose addend is a
sign-extended i32 (high half == sra(lo, 31)) into them, collapse chained
accumulates into the free source slot, and select them to the wadda/wsuba
instructions.
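
The semantics stated above can be modelled in a few lines. This is a sketch only: it assumes a 64-bit accumulator `rd` (held in a register pair on RV32) and 32-bit sources, and the function names are hypothetical helpers, not LLVM or ISA identifiers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def wadda(rd: int, rs1: int, rs2: int) -> int:
    """rd += sext(rs1) + sext(rs2), wrapped to 64 bits (assumed width)."""
    return (rd + sext32(rs1) + sext32(rs2)) & MASK64


def wsuba(rd: int, rs1: int, rs2: int) -> int:
    """rd += sext(rs1) - sext(rs2), wrapped to 64 bits (assumed width)."""
    return (rd + sext32(rs1) - sext32(rs2)) & MASK64


def is_sign_extended_i32(lo: int, hi: int) -> bool:
    """True when hi == sra(lo, 31), i.e. hi:lo is a sign-extended i32."""
    return (hi & MASK32) == ((sext32(lo) >> 31) & MASK32)


print(hex(wadda(10, 0xFFFFFFFF, 0xFFFFFFFF)))
print(is_sign_extended_i32(0x80000000, 0xFFFFFFFF))
```

`is_sign_extended_i32` mirrors the fold condition from the message: a 64-bit addend qualifies only when its high half equals the arithmetic shift of its low half by 31.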
DeltaFile
+84-0llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+72-0llvm/test/CodeGen/RISCV/rv32p.ll
+24-5llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+4-0llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+184-54 files

LLVM/project b45bb83llvm/test/Transforms/LoopVectorize runtime-checks-hoist.ll

[LV] Regen runtime-checks-hoist.ll with newer UTC. NFC (#205831)

Fixes some noise in an upcoming test diff with labels being renamed.
DeltaFile
+354-354llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll
+354-3541 files

LLVM/project acff1e5lldb/examples/python/templates scripted_process.py

[lldb] Don't fail scripted frame construction on WebAssembly (#205692)
DeltaFile
+7-0lldb/examples/python/templates/scripted_process.py
+7-01 files

LLVM/project e1cc4f0mlir/include/mlir/Dialect/Vector/IR VectorOps.td, mlir/lib/Dialect/Vector/IR VectorOps.cpp

Revert "[mlir][vector] add consistent stride verification to `masked load/sto…"

This reverts commit 4d4c865933e1048842f836490e02296b2cb48711.
DeltaFile
+2-38mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+0-38mlir/test/Dialect/Vector/invalid.mlir
+0-12mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+2-883 files

LLVM/project f36745elibc/shared/math tanbf16.h, libc/src/__support/math tanbf16.h

[libc][math][c23] Add tanbf16 math function (#185100)

Adds the tanbf16 higher math function for the bfloat16 type.
DeltaFile
+83-0libc/src/__support/math/tanbf16.h
+43-0libc/test/src/math/tanbf16_test.cpp
+41-0libc/test/src/math/smoke/tanbf16_test.cpp
+24-0utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+23-0libc/shared/math/tanbf16.h
+21-0libc/src/math/tanbf16.h
+235-023 files not shown
+322-129 files

LLVM/project 385328aflang/lib/Lower/OpenMP OpenMP.cpp, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[flang][OpenMP] Lower target in_reduction for host fallback

Enable host-fallback lowering for target in_reduction in Flang and MLIR OpenMP translation.

Model target in_reduction through the matching map entry, force address-preserving implicit mapping for Flang in_reduction list items, and emit the host-side task-reduction lookup with __kmpc_task_reduction_get_th_data. The runtime entry point takes and returns a generic, default-address-space pointer, so normalize a non-default-address-space captured pointer to the generic address space before the call and cast the returned private pointer back to the map block argument's address space, mirroring the in_reduction handling on omp.taskloop. Unsupported device/offload-entry and richer reduction forms remain diagnosed.

Add Flang lowering, MLIR verifier/translation, and LLVM IR tests for the supported host-fallback path, including a non-default-address-space case, and the remaining unsupported cases.
DeltaFile
+131-14mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+95-21mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+110-3mlir/test/Target/LLVMIR/openmp-todo.mlir
+107-0mlir/test/Target/LLVMIR/openmp-target-in-reduction.mlir
+77-0mlir/test/Target/LLVMIR/openmp-target-in-reduction-multi.mlir
+60-15flang/lib/Lower/OpenMP/OpenMP.cpp
+580-5312 files not shown
+911-8018 files

LLVM/project c31dd96llvm/test/MC/AMDGPU gfx11_asm_vop3_from_vop2.s, llvm/test/MC/Disassembler/AMDGPU gfx11_dasm_vop3_from_vop2.txt gfx11_dasm_vop3_from_vop2-fake16.txt

[AMDGPU][NFC] Roundtrip gfx11_asm_vop3_from_vop2.s

Removes the need for gfx11_dasm_vop3_from_vop2_hi.txt sitting
downstream.

Catches a problem with printing op_sel for the tied operands in
v_fmac_f16_e64.
DeltaFile
+0-2,217llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_from_vop2.txt
+1,849-0llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_from_vop2-fake16.txt
+7-3llvm/test/MC/AMDGPU/gfx11_asm_vop3_from_vop2.s
+1,856-2,2203 files

LLVM/project 7b54691clang/lib/Headers __clang_hip_libdevice_declares.h

clang/HIP: Remove some unused ocml function declarations (#204735)
DeltaFile
+0-93clang/lib/Headers/__clang_hip_libdevice_declares.h
+0-931 files

LLVM/project 755bc5clibcxx/include/__memory shared_ptr.h

[libc++] Add missing attribute usages to `<__memory/shared_ptr.h>` (#205776)

Since 44546e0e32077241ca9a9a90ac57f2f086f9488a, missing
`_LIBCPP_NODEBUG` and `_LIBCPP_HIDE_FROM_ABI` attributes are caught by
clang-tidy. This patch adds them wherever expected.
DeltaFile
+6-6libcxx/include/__memory/shared_ptr.h
+6-61 files

LLVM/project bef95e5libc/src/stdlib environ_internal.cpp CMakeLists.txt, libc/src/stdlib/linux unsetenv.cpp

[libc][stdlib] Add unsetenv (#202422)

Added the POSIX unsetenv() function and its internal support.

Implemented EnvironmentManager::unset() to remove a variable by name,
free the string if allocated, and compact the array.
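
The remove-and-compact step can be sketched as follows. This is not the libc implementation: it models the environment as a Python list of `NAME=value` strings, the function name `unset` is reused only for illustration, and removing every matching entry is an assumption. The validation of the name (empty, or containing `=`) follows POSIX `unsetenv`.

```python
import errno


def unset(environ: list, name: str) -> int:
    """Remove every "name=value" entry from `environ` in place.

    Returns 0 on success, or errno.EINVAL for an empty name or one
    containing '='. (POSIX unsetenv returns -1 and sets errno instead.)
    """
    if not name or "=" in name:
        return errno.EINVAL
    prefix = name + "="
    write = 0
    for entry in environ:
        if not entry.startswith(prefix):
            # Keep this entry, sliding it down over any removed slots.
            environ[write] = entry
            write += 1
        # A C implementation would free an owned string here.
    del environ[write:]  # drop the now-unused tail of the array
    return 0


env = ["A=1", "B=2", "A=3", "C=4"]
print(unset(env, "A"), env)
```

Matching on `name + "="` rather than on `name` alone keeps a variable such as `AB` from being removed when `A` is unset.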

Updated EnvironmentManager to synchronize the public global environ
pointer when transitioning to managed storage.

Registered for x86_64, aarch64, and riscv. Integration tests cover basic
operations and edge cases.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+176-0libc/test/integration/src/stdlib/unsetenv_test.cpp
+44-0libc/src/stdlib/environ_internal.cpp
+43-0libc/src/stdlib/linux/unsetenv.cpp
+27-9libc/src/stdlib/CMakeLists.txt
+25-0libc/src/stdlib/unsetenv.h
+16-0libc/test/integration/src/stdlib/CMakeLists.txt
+331-97 files not shown
+363-1013 files

LLVM/project ec574cfutils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 5314be5 (#205818)

This fixes 5314be5a740c9985b0b3ab958269b5f1824cce02.

Signed-off-by: Ingo Müller <ingomueller at google.com>
Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+32-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+32-01 files

LLVM/project 647c298llvm/lib/Target/AArch64 AArch64SchedA510.td, llvm/test/CodeGen/AArch64 zext-to-tbl.ll

[AArch64] Correct A510 scheduling information for LDn instructions (#205518)

The latency and throughput for these instructions don't match what's in
the A510 Software Optimization Guide, so adjust them so that they do
match. Also rearrange the definitions to match how they're structured in
the optimization guide and rename things in a similar manner to how the
C1 CPUs do things, as it's much clearer.
DeltaFile
+378-378llvm/test/tools/llvm-mca/AArch64/Cortex/A510-writeback.s
+57-57llvm/test/tools/llvm-mca/AArch64/Cortex/A510-neon-instructions.s
+56-54llvm/lib/Target/AArch64/AArch64SchedA510.td
+13-13llvm/test/CodeGen/AArch64/zext-to-tbl.ll
+504-5024 files

LLVM/project 4d4c865mlir/include/mlir/Dialect/Vector/IR VectorOps.td, mlir/lib/Dialect/Vector/IR VectorOps.cpp

[mlir][vector] add consistent stride verification to `masked load/store` and `gather/scatter` ops (#204842)

Extend negative stride checks to MaskedLoadOp, MaskedStoreOp, GatherOp,
and ScatterOp to match LoadOp and StoreOp behavior.

Depends on: #204611.

AI Disclaimer: I used AI for the tests.

---------

Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
DeltaFile
+38-2mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+38-0mlir/test/Dialect/Vector/invalid.mlir
+12-0mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+88-23 files

LLVM/project 630125allvm/include/llvm/IR InstrTypes.h, llvm/lib/Transforms/InstCombine InstCombineCalls.cpp InstructionCombining.cpp

Revert "Reapply "[InstCombine] Merge consecutive assumes", round 2" (#205805)

It looks like there is still a bug with removing assumes from the
assumption cache.

Reverts llvm/llvm-project#205773
DeltaFile
+14-22llvm/test/Transforms/InstCombine/assume.ll
+3-19llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+0-6llvm/include/llvm/IR/InstrTypes.h
+2-1llvm/test/Transforms/InstCombine/assume-loop-align.ll
+2-1llvm/test/Transforms/PhaseOrdering/AArch64/std-find.ll
+1-1llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+22-506 files

LLVM/project c834d1aclang/lib/AST/ByteCode Interp.cpp InterpState.h, clang/test/AST/ByteCode evaluate-dtor.cpp

[clang][bytecode] Fix `evaluateDestruction()` (#205778)

My previous testing regarding this seems to have been insufficient. Or
this regressed some time along the way.

Now that `CLANG_USE_EXPERIMENTAL_CONST_INTERP` is used for testing I
noticed a few regressions.

We need to special-case the evaluating decl in a few places, since it's
a global variable that we're allowed to modify.
DeltaFile
+45-14clang/lib/AST/ByteCode/Interp.cpp
+56-0clang/test/AST/ByteCode/evaluate-dtor.cpp
+34-0clang/lib/AST/ByteCode/InterpState.h
+5-4clang/lib/AST/ByteCode/Compiler.cpp
+2-1clang/test/CodeGenCXX/const-init-cxx2a.cpp
+3-0clang/lib/AST/ByteCode/EvalEmitter.cpp
+145-195 files not shown
+150-2111 files

LLVM/project 87c11c9libc/config/linux/aarch64 headers.txt, libc/config/linux/arm headers.txt

[libc] Add libgen.h to target public headers (#205804)

Ensure libgen.h is included in TARGET_PUBLIC_HEADERS for Linux targets
so that it gets generated and installed.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+1-0libc/config/linux/aarch64/headers.txt
+1-0libc/config/linux/arm/headers.txt
+1-0libc/config/linux/riscv/headers.txt
+1-0libc/config/linux/x86_64/headers.txt
+4-04 files

LLVM/project 5fdc948offload/test/offloading/fortran target-no-loop.f90

[Offload][OpenMP][Flang] Update no-loop test (#205803)

Updates to the kernel type detection logic now allow `target parallel
do` to be promoted to SPMD-No-Loop.

A currently broken offload test that was affected by this change is
updated here.
DeltaFile
+3-1offload/test/offloading/fortran/target-no-loop.f90
+3-11 files

LLVM/project a787b01llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

Rebase

Created using spr 1.3.7
DeltaFile
+12,991-3,310llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856-3,719llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0-8,306llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672-0llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126-0llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0-4,257llvm/docs/ProgrammersManual.rst
+35,645-19,5924,340 files not shown
+184,506-135,0904,346 files

LLVM/project 23589b7clang/test/CodeGenOpenCL amdgpu-sizeof-alignof.cl func-call-dbg-loc.cl

clang/AMDGPU: Remove gizcl triples from tests

These are a leftover from a very old migration
DeltaFile
+6-10clang/test/CodeGenOpenCL/amdgpu-sizeof-alignof.cl
+1-1clang/test/CodeGenOpenCL/func-call-dbg-loc.cl
+7-112 files

LLVM/project e098135clang/lib/Analysis/FlowSensitive WatchedLiteralsSolver.cpp

[clang][dataflow] Move expensive solver asserts under EXPENSIVE_CHECKS (#205715)

The watched-literal solver has a few invariant checks that run on every
solver iteration in assertion builds. Some of these checks rebuild and
iterate over the watched-literal state. This overhead is usually hidden,
but it becomes dominant for large flow-sensitive analyses.
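
A rough sketch of the pattern, not the solver's code: an invariant check that rebuilds state on every iteration is gated behind a flag, so ordinary assertion builds skip it. The `expensive_checks` parameter stands in for the build-time `EXPENSIVE_CHECKS` option, and the watch-list layout is a simplifying assumption.

```python
def build_watch_lists(clauses):
    """Map each clause's first literal to the indices of clauses watching it."""
    watched = {}
    for idx, clause in enumerate(clauses):
        watched.setdefault(clause[0], []).append(idx)
    return watched


def run_solver(clauses, iterations, expensive_checks=False):
    """Run a dummy solver loop and return how many invariant checks ran."""
    watched = build_watch_lists(clauses)
    checks_run = 0
    for _ in range(iterations):
        if expensive_checks:
            # Costly: rebuilds the whole watch state on every iteration.
            assert build_watch_lists(clauses) == watched
            checks_run += 1
        # Cheap per-iteration solver work would go here.
    return checks_run


print(run_solver([[1, 2], [1, 3]], iterations=5))
print(run_solver([[1, 2], [1, 3]], iterations=5, expensive_checks=True))
```

The cost of the gated check grows with both the iteration count and the size of the watched state, which is why it can dominate large analyses.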

While testing clang-tidy's `unchecked-optional-access` check on
real-world projects (in this case, LLVM itself), we found a few
extremely slow analyses caused by this overhead.

| Time    | File                                                |
|---------|-----------------------------------------------------|
| 8235.7s | llvm-project/clang/utils/TableGen/RISCVVEmitter.cpp |
| 8197.2s | llvm-project/clang/lib/Driver/Multilib.cpp          |

(Run on a machine with a 32-core Ice Lake CPU and 128 GB of memory)

After moving these asserts to `EXPENSIVE_CHECKS`, the same files

    [13 lines not shown]
DeltaFile
+2-0clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
+2-01 files

LLVM/project 87f7884llvm/lib/Transforms/Scalar NaryReassociate.cpp, llvm/test/Transforms/NaryReassociate nary-gep-zero-sized-element.ll

[NaryReassociate] Fix divide by zero crash in NaryReassociatePass (#202377)

Updates NaryReassociatePass with a safety check to guard against GEPs
into arrays with zero-sized element types (e.g. [0 x ptr]) to prevent
division by zero.
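
The kind of guard described can be illustrated with a small helper. This is a hypothetical sketch, not the pass's code: `index_for_offset` and its parameters are invented names, and the divisibility check is an added assumption.

```python
def index_for_offset(byte_offset: int, element_size: int):
    """Return the array index for byte_offset, or None if none can be formed."""
    if element_size == 0:
        # Zero-sized element type: dividing by its size would crash.
        return None
    if byte_offset % element_size != 0:
        # Offset does not land on an element boundary.
        return None
    return byte_offset // element_size


print(index_for_offset(16, 8))
print(index_for_offset(16, 0))
```

Checking the size before any `%` or `//` matters because both operations fail on a zero divisor.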
DeltaFile
+22-0llvm/test/Transforms/NaryReassociate/nary-gep-zero-sized-element.ll
+1-1llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+23-12 files

LLVM/project 0c540b9clang/lib/StaticAnalyzer/Core ExprEngineCallAndReturn.cpp

[analyzer] Fix unjustified early return in processCallExit (#205656)

In `ExprEngine::processCallExit` step 3 may theoretically split the
state because it calls `removeDead`, which activates `LiveSymbols` and
`DeadSymbols` callbacks of various checkers. (However, in practice it is
likely that these checker callbacks never actually split the state -- at
least, no such state splits happen in the LIT tests.)

The nodes produced by `removeDead` are placed in the set `CleanedNodes`;
in theory the different execution paths should be handled in parallel,
independently of each other. However, the loop `for (ExplodedNode *N :
CleanedNodes)` contained an early return statement, which meant that if
the creation of `CEENode` failed for a node `N`, then the subsequent
iterations were skipped altogether.

This commit replaces the `return` with a `continue` to ensure that the
nodes in `CleanedNodes` are handled independently (if there are several
such nodes).
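
The difference between the two control-flow choices can be shown with a toy loop. This is a sketch with invented names (`process_nodes`, `make_cee_node`, `stop_on_failure`); it models only the loop shape, not the analyzer's data structures.

```python
def process_nodes(cleaned_nodes, make_cee_node, stop_on_failure):
    """Build a result for each node; a None result counts as a failure."""
    handled = []
    for node in cleaned_nodes:
        cee_node = make_cee_node(node)
        if cee_node is None:
            if stop_on_failure:
                return handled  # old behaviour: later nodes are skipped
            continue  # new behaviour: only this node is skipped
        handled.append(cee_node)
    return handled


def make_cee_node(node):
    # Hypothetical factory that fails for node "b".
    return None if node == "b" else node.upper()


print(process_nodes(["a", "b", "c"], make_cee_node, stop_on_failure=True))
print(process_nodes(["a", "b", "c"], make_cee_node, stop_on_failure=False))
```

With the early return, node `c` is never processed once `b` fails. With `continue`, each node is handled independently.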


    [6 lines not shown]
DeltaFile
+1-1clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
+1-11 files

LLVM/project c4a11bbllvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp, llvm/test/CodeGen/AArch64/GlobalISel legalize-saddsat.mir legalize-ssubsat.mir

GlobalISel/LegalizerHelper: Use same LLT kind as WideTy for widen merge

In widenScalarMergeValues, WideTy is an input given by the target. Use the
same LLT kind for the other types of different sizes instead of LLT::scalar.
This makes a difference with extendedLLTs.
DeltaFile
+2-2llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+1-2llvm/test/CodeGen/AArch64/GlobalISel/legalize-saddsat.mir
+1-2llvm/test/CodeGen/AArch64/GlobalISel/legalize-ssubsat.mir
+4-63 files