LLVM/project 73df0d6llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU gfx1250_asm_wmma_w32.s gfx1250_asm_wmma_w32_err.s

[AMDGPU] Disable neg_lo[0:1] and neg_hi[0:1] on gfx1250 WMMA, MC part (#188349)
DeltaFile
+10-45llvm/test/MC/AMDGPU/gfx1250_asm_wmma_w32.s
+10-31llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_wmma_w32.txt
+30-0llvm/test/MC/AMDGPU/gfx1250_asm_wmma_w32_err.s
+6-0llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+56-764 files

LLVM/project a0f4e65llvm/test/CodeGen/X86 vector-interleaved-store-i64-stride-7.ll vector-interleaved-store-i64-stride-6.ll

Merge branch 'main' into users/arsenm/libclc/override-amdgpu-fast-fma-macro
DeltaFile
+4,978-4,984llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll
+4,590-4,623llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll
+3,850-4,310llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll
+3,562-3,632llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll
+2,430-2,474llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll
+1,815-1,852llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
+21,225-21,875404 files not shown
+38,939-33,584410 files

LLVM/project c1cd0d5libclc/clc/include/clc/math math.h gentype.inc, libclc/clc/lib/generic/math clc_ep.inc clc_sincos_helpers.inc

libclc: Unify fast FMA controls (#188244)

This was defined in multiple places with different names. Consolidate
on one, with a gentype wrapper for it. Also set the value based on the
standard FP_FAST_FMA* macros.
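
As a rough illustration of keying a single control off the standard macros, the sketch below derives one flag from `FP_FAST_FMAF` and uses it to choose between a fused and an unfused multiply-add. The names `EXAMPLE_HAVE_FAST_FMA_F32` and `mul_add` are hypothetical; the actual libclc macro and wrapper names are not shown in this message.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical control macro (not the libclc name): defined in exactly one
// place and derived from the standard FP_FAST_FMAF macro from <cmath>.
#if defined(FP_FAST_FMAF)
#define EXAMPLE_HAVE_FAST_FMA_F32 1
#else
#define EXAMPLE_HAVE_FAST_FMA_F32 0
#endif

// Computes a * b + c, using a fused multiply-add only when the target
// advertises that fma is at least as fast as a separate multiply and add.
inline float mul_add(float a, float b, float c) {
#if EXAMPLE_HAVE_FAST_FMA_F32
  return std::fma(a, b, c);
#else
  return a * b + c;
#endif
}
```

Call sites then test only the one flag instead of several differently named controls.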
DeltaFile
+9-28libclc/clc/lib/generic/math/clc_ep.inc
+17-2libclc/clc/include/clc/math/math.h
+6-0libclc/clc/include/clc/math/gentype.inc
+2-2libclc/clc/lib/generic/math/clc_sincos_helpers.inc
+1-0libclc/clc/lib/generic/math/clc_ep.cl
+35-325 files

LLVM/project 7cb57c6mlir/include/mlir/Dialect/XeGPU/TransformOps XeGPUTransformOps.td, mlir/lib/Dialect/XeGPU/TransformOps XeGPUTransformOps.cpp

[MLIR][XeGPU][TransformOps] Remove obsolete transform ops (#187561)

Cleaning up XeGPU transform ops. Now that XeGPU layout propagation
works, it is sufficient to set the layouts for anchor ops (e.g.
load/store/dpas ops) only.

Changes:
* Remove `xegpu.get_desc_op` and `xegpu.set_desc_layout`. Users should
not change the layout of descriptor op's return value anymore.
* Add `xegpu.get_load_op(value)` that finds either `xegpu.load_nd` or
`xegpu.load` op in the value's producer chain. This is a useful utility
as load ops often need to be annotated with a layout.
* The generic `xegpu.set_op_layout_attr(op, ...)` is now replaced by
`xegpu.set_anchor_layout(op, ...)` that only sets layout attribute of
anchor ops. Raises an error if the given op does not support anchor
layouts.
* `xegpu.insert_prefetch` takes a load op handle instead of a value.
DeltaFile
+141-279mlir/test/Dialect/XeGPU/transform-ops.mlir
+73-196mlir/test/python/dialects/transform_xegpu_ext.py
+65-181mlir/lib/Dialect/XeGPU/TransformOps/XeGPUTransformOps.cpp
+26-105mlir/include/mlir/Dialect/XeGPU/TransformOps/XeGPUTransformOps.td
+13-94mlir/python/mlir/dialects/transform/xegpu.py
+5-78mlir/test/Dialect/XeGPU/transform-ops-invalid.mlir
+323-9336 files

LLVM/project 34ee487llvm/lib/Target/AMDGPU SIInstrInfo.cpp SIInstrInfo.h, llvm/test/CodeGen/AMDGPU si-lower-sgpr-spills-vgpr-lanes-usage.mir

AMDGPU: Implement memsize forms of isLoadFromStackSlot/isStoreToStackSlot (#188264)

Requested in #182673, though I'm not sure why this needs to be pushed
into targets. The size can be taken from the machine mem operand
generically.
DeltaFile
+22-12llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+19-4llvm/lib/Target/AMDGPU/SIInstrInfo.h
+0-3llvm/test/CodeGen/AMDGPU/si-lower-sgpr-spills-vgpr-lanes-usage.mir
+41-193 files

LLVM/project 1904867llvm/test/Transforms/InstCombine nanless-canonicalize-combine.ll

InstCombine: Add baseline test for nanless canonicalize combine (#172997)
DeltaFile
+832-0llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+832-01 files

LLVM/project 4c4c1dblibc/src/__support/OSUtil/linux/syscall_wrappers utimensat.h, libc/src/sys/stat utimensat.h

[libc] Add utimensat syscall wrapper and entrypoint (#188347)

Implemented the utimensat syscall for Linux and added the entrypoint to
sys/stat.h.

* Added utimensat syscall wrapper to OSUtil
* Updated utimes to use the utimensat wrapper
* Added utimensat unit tests to sys/stat
* Configured entrypoints for x86_64, riscv, and aarch64
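
For readers unfamiliar with how `utimes` relates to `utimensat`, here is a minimal sketch using the POSIX calls directly. It is not the LLVM libc code: `utimes_via_utimensat` is a hypothetical helper, and range validation of `tv_usec` is omitted.

```cpp
#include <fcntl.h>     // AT_FDCWD
#include <sys/stat.h>  // utimensat, stat
#include <sys/time.h>  // struct timeval
#include <ctime>       // struct timespec

// Hypothetical helper: a utimes-style call expressed through utimensat.
// tv[0] is the access time and tv[1] is the modification time.
inline int utimes_via_utimensat(const char *path, const struct timeval tv[2]) {
  if (tv == nullptr)
    return utimensat(AT_FDCWD, path, nullptr, 0);  // null sets both to "now"
  struct timespec ts[2];
  for (int i = 0; i < 2; ++i) {
    ts[i].tv_sec = tv[i].tv_sec;
    ts[i].tv_nsec = tv[i].tv_usec * 1000;  // microseconds to nanoseconds
  }
  return utimensat(AT_FDCWD, path, ts, 0);
}
```

The only real work is the unit conversion, which is why routing `utimes` through a shared `utimensat` wrapper avoids a second raw syscall path.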
DeltaFile
+119-0libc/test/src/sys/stat/utimensat_test.cpp
+42-0libc/src/__support/OSUtil/linux/syscall_wrappers/utimensat.h
+15-18libc/src/sys/time/linux/utimes.cpp
+29-0libc/src/sys/stat/linux/utimensat.cpp
+22-0libc/src/sys/stat/utimensat.h
+21-0libc/test/src/sys/stat/CMakeLists.txt
+248-189 files not shown
+301-2215 files

LLVM/project e40062coffload/plugins-nextgen/level_zero/include L0Device.h, offload/plugins-nextgen/level_zero/src L0Device.cpp

[OFFLOAD][L0] Add support to run ctor/dtor code (#187510)

This PR adds support in the Level Zero plugin to execute
constructors/destructors in device code. As spirv-link has some
limitations, it mimics the CUDA plugin behavior where the RTL constructs
the device side tables before invoking the kernel that will execute
them.

The kernel and other necessary symbols to create the device tables are
created by the SPIRVCtorDtorLowering pass to be added in #187509
DeltaFile
+143-0offload/plugins-nextgen/level_zero/src/L0Device.cpp
+10-0offload/plugins-nextgen/level_zero/include/L0Device.h
+153-02 files

LLVM/project abd26bfllvm/include/llvm/ExecutionEngine/Orc SelfExecutorProcessControl.h, llvm/lib/ExecutionEngine/Orc SelfExecutorProcessControl.cpp

[ORC] Move DylibManager impl out of SelfExecutorProcessControl. (#188417)

SelfExecutorProcessControl no longer implements DylibManager. Instead a
private inner class, InProcessDylibManager, is used to implement this
interface. This change should not affect the behavior of
SelfExecutorProcessControl from the perspective of API clients.

This is a step towards decoupling ExecutorProcessControl implementations
from other interfaces.
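
The general shape of this refactoring can be sketched as follows. All names ending in `Like` are hypothetical stand-ins, not the ORC classes: the outer class stops inheriting the interface and instead owns a private nested implementation that it exposes through an accessor.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical interface standing in for a dylib-manager style API.
class DylibManagerLike {
public:
  virtual ~DylibManagerLike() = default;
  virtual int loadDylib(const std::string &path) = 0;
};

// The outer class no longer derives from the interface. Clients reach the
// implementation only through getDylibMgr().
class ProcessControlLike {
public:
  ProcessControlLike()
      : DylibMgr(std::make_unique<InProcessDylibManagerLike>()) {}

  DylibManagerLike &getDylibMgr() { return *DylibMgr; }

private:
  // Private inner class that implements the interface.
  class InProcessDylibManagerLike : public DylibManagerLike {
  public:
    // Pretends to load a library and returns a fresh handle number;
    // no real loading happens in this sketch.
    int loadDylib(const std::string & /*path*/) override {
      return NextHandle++;
    }

  private:
    int NextHandle = 1;
  };

  std::unique_ptr<DylibManagerLike> DylibMgr;
};
```

Callers that went through the accessor before keep working unchanged, which matches the claim that API clients should see no behavioral difference.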
DeltaFile
+41-39llvm/lib/ExecutionEngine/Orc/SelfExecutorProcessControl.cpp
+14-8llvm/include/llvm/ExecutionEngine/Orc/SelfExecutorProcessControl.h
+55-472 files

LLVM/project d69fc65llvm/lib/Target/AMDGPU AMDGPUPromoteAlloca.cpp, llvm/test/CodeGen/AMDGPU promote-alloca-placeholder-replacement.ll promote-alloca-use-after-erase.ll

AMDGPU: Simplify placeholder replacement in AMDGPUPromoteAlloca (#188202)

If `promoteAllocaUserToVector` returns the placeholder, it means the
instruction does not actually modify the alloca. We don't need to add
the placeholder as a block available value for correctness. Instructions
that appear afterwards in the same block can still get the placeholder
as the source value through the GetCurVal() call. Instructions in other
blocks that access the alloca will be set up later when we actually do
the placeholder replacement.

This helps simplify the placeholder replacement logic.
DeltaFile
+13-15llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+20-0llvm/test/CodeGen/AMDGPU/promote-alloca-placeholder-replacement.ll
+2-3llvm/test/CodeGen/AMDGPU/promote-alloca-use-after-erase.ll
+35-183 files

LLVM/project f0b58c1libc/docs porting.rst, libc/docs/dev entrypoints.rst cmake_build_rules.rst

[libc][docs][NFC] Expand entrypoints technical reference (#4) (#188255)

Expanded entrypoints.rst with details about definitions, registration
rules, and the lifecycle of an entrypoint.

Updated multiple documents to remove redundant technical details and
link to the centralized entrypoints reference:

- libc/docs/dev/cmake_build_rules.rst
- libc/docs/dev/implementation_standard.rst
- libc/docs/porting.rst
- libc/docs/dev/source_tree_layout.rst
DeltaFile
+111-3libc/docs/dev/entrypoints.rst
+11-21libc/docs/porting.rst
+7-10libc/docs/dev/cmake_build_rules.rst
+10-7libc/docs/dev/implementation_standard.rst
+3-2libc/docs/dev/source_tree_layout.rst
+2-0libc/docs/dev/fuzzing.rst
+144-431 files not shown
+146-437 files

LLVM/project a5a7f62compiler-rt/test/builtins CMakeLists.txt, compiler-rt/test/builtins/Unit lit.cfg.py lit.site.cfg.py.in

[compiler-rt] CRT builtins tests should not run on mac/windows under LLVM_ENABLE_RUNTIMES (#187835)

#171941 got the builtins tests running under LLVM_ENABLE_RUNTIMES by
testing the builtins as part of the runtimes build.

As a consequence, CMake in `lib/builtins/` is no longer visible when
configuring the tests (but `test/builtins/` is). This means that the
`cmake_dependent_option` from `lib/builtins/` is not accounted for by
the tests, allowing COMPILER_RT_BUILD_CRT to be YES when
COMPILER_RT_HAS_CRT is NO. As a consequence, the CRT tests are running
on platforms where COMPILER_RT_HAS_CRT is false (#176892).


https://github.com/llvm/llvm-project/blob/367da15a11c52886c50e7f020cb4de59fe6d07ca/compiler-rt/lib/builtins/CMakeLists.txt#L1106-L1108

Although the long-term solution could be to split both the builtins (and
their tests) out of compiler-rt into a top-level directory with shared
options, this works around the issue for the moment by checking both
COMPILER_RT_HAS_CRT and COMPILER_RT_BUILD_CRT before enabling the "crt"

    [2 lines not shown]
DeltaFile
+6-3compiler-rt/test/builtins/CMakeLists.txt
+2-2compiler-rt/test/builtins/Unit/lit.cfg.py
+1-1compiler-rt/test/builtins/Unit/lit.site.cfg.py.in
+9-63 files

LLVM/project fb51d6fllvm/test/CodeGen/AMDGPU llvm.amdgcn.reduce.sub.ll llvm.amdgcn.reduce.add.ll

Update test cases
DeltaFile
+424-146llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.sub.ll
+420-142llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.add.ll
+406-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.min.ll
+406-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.max.ll
+372-132llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.xor.ll
+394-108llvm/test/CodeGen/AMDGPU/llvm.amdgcn.reduce.umax.ll
+2,422-7443 files not shown
+3,512-1,0689 files

LLVM/project acbbe5cclang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std stddef.h cstddef, clang-tools-extra/test/clang-tidy/checkers/readability implicit-bool-conversion.cpp implicit-bool-conversion-cxx98.cpp

[clang-tidy][NFC] Remove hack in readability-implicit-bool-conversion testcases (#188399)

Another attempt after #184850
DeltaFile
+4-7clang-tools-extra/test/clang-tidy/checkers/readability/implicit-bool-conversion.cpp
+1-4clang-tools-extra/test/clang-tidy/checkers/readability/implicit-bool-conversion-cxx98.cpp
+2-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/stddef.h
+2-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/cstddef
+9-114 files

LLVM/project 621f0a6llvm/test/CodeGen/AMDGPU schedule-gcn-physreg-pressure.ll

Added test for early clobber with a tuple register.
DeltaFile
+71-0llvm/test/CodeGen/AMDGPU/schedule-gcn-physreg-pressure.ll
+71-01 files

LLVM/project 2eae3f3clang/lib/DependencyScanning DependencyScannerImpl.cpp, clang/test/ClangScanDeps p1689-suppress-warnings.cppm

[Reland] [ClangScanDeps] Do not emit warning for P1689 format (#186966) (#188401)

Close https://github.com/llvm/llvm-project/issues/185394

This is only for the P1689 format, as ClangScanDeps/optimize-vfs-pch.m
checks for the warning message. I'll leave this to people who want to
change that.
DeltaFile
+23-0clang/test/ClangScanDeps/p1689-suppress-warnings.cppm
+2-0clang/lib/DependencyScanning/DependencyScannerImpl.cpp
+25-02 files

LLVM/project 3b04d26compiler-rt/test/asan/TestCases stack_container_dynamic_lib.c stack_container_dynamic_lib.cpp

[asan] Convert __SANITIZER_DISABLE_CONTAINER_OVERFLOW__ tests to C (#188406)

As-is, the tests do not pass on Android with older C++ headers.
There is nothing C++-specific in the tests.

Followup to #181721.
DeltaFile
+126-0compiler-rt/test/asan/TestCases/stack_container_dynamic_lib.c
+0-118compiler-rt/test/asan/TestCases/stack_container_dynamic_lib.cpp
+49-0compiler-rt/test/asan/TestCases/disable_container_overflow_checks.c
+0-49compiler-rt/test/asan/TestCases/disable_container_overflow_checks.cpp
+175-1674 files

LLVM/project 311b4declang/lib/Sema SemaLookup.cpp

clang-format
DeltaFile
+1-1clang/lib/Sema/SemaLookup.cpp
+1-11 files

LLVM/project bd31d4dclang/lib/AST ASTContext.cpp, clang/lib/Sema SemaLookup.cpp SemaChecking.cpp

[Clang] define memory scopes as a builtin enum

Clang currently represents memory scopes as pre-defined preprocessor macros that
evaluate to integers. But so far, there are three sets of conflicting scopes:
"common" clang scopes, HIP scopes and OpenCL scopes. These sets use the same
integers in different orders, making it impossible to validate their use. A
better approach is to represent these scopes as enum types, so that the integer
values become less significant. Sema can now validate the scope argument by its
type instead.

Both C and C++ define an enum for memory_order, but there is no standard enum
for memory_scope. This change introduces a Clang-specific enum "memory_scope".
The pre-defined macros are now mapped to this enum. Later changes can add
similar enums for other languages.

enum __memory_scope {
  __memory_scope_system,
  __memory_scope_device,
  __memory_scope_workgroup,

    [19 lines not shown]
DeltaFile
+78-0clang/lib/AST/ASTContext.cpp
+66-0clang/lib/Sema/SemaLookup.cpp
+30-30clang/test/Preprocessor/init.c
+58-0clang/test/Sema/scoped-atomic-scope-warning.c
+56-0clang/test/Sema/builtin-memory-scope.c
+50-0clang/lib/Sema/SemaChecking.cpp
+338-3013 files not shown
+473-6019 files

LLVM/project f599bfclld/ELF LinkerScript.cpp

[ELF] Guard relocation section handling behind copyRelocs in addOrphanSections. NFC (#188409)

In addOrphanSections, getRelocatedSection() only returns non-null for -r
or --emit-relocs links. Guard code blocks with `copyRelocs` to skip
unnecessary dyn_cast + getRelocatedSection calls per section in the
common case. Hoist copyRelocs and relocatable to local variables so the
compiler does not reload them through ctx on every loop iteration.
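
A minimal sketch of the hoist-and-guard pattern described above. The types are hypothetical stand-ins, not lld's, and it assumes `copyRelocs` means "either -r or --emit-relocs is in effect".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for linker state; not lld's real types.
struct Config {
  bool relocatable = false;  // -r
  bool emitRelocs = false;   // --emit-relocs
};
struct Ctx {
  Config arg;
};
struct Section {
  bool isRelocSection = false;
};

// Counts the sections that need relocation-section handling.
inline int countRelocSections(const Ctx &ctx,
                              const std::vector<Section> &sections) {
  // Hoisted once: the loop reads locals instead of going through ctx.
  const bool relocatable = ctx.arg.relocatable;
  const bool copyRelocs = relocatable || ctx.arg.emitRelocs;

  int handled = 0;
  for (const Section &s : sections) {
    // Guard: when copyRelocs is false (the common case) the per-section
    // check on the right-hand side is never evaluated.
    if (copyRelocs && s.isRelocSection)
      ++handled;
  }
  return handled;
}
```

The result is the same with or without the guard; the guard only skips work that cannot produce a match, which is consistent with the NFC tag.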

"Assign sections" decreases by 1ms.
DeltaFile
+28-22lld/ELF/LinkerScript.cpp
+28-221 files

LLVM/project 665c2e9flang/test/Integration/OpenMP taskloop-bounds-cast.f90, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[Flang][OpenMP] Remove dead restoreIP in OpenMP taskloop lowering (#187222)

This fixes an intermittent crash in `OpenMP` taskloop lowering.

In `OMPIRBuilder::createTaskloop`, the `restoreIP` in `PostOutlineCB`
was immediately overwritten by the following
`Builder.SetInsertPoint(StaleCI)` with no instructions created in
between, so it was effectively dead. This patch removes that dead
restore, which is the smallest change and preserves the intended IR
placement.

Adds a regression test that compiles a taskloop to LLVM IR and verifies
the bounds casts and __kmpc_taskloop call are present.
DeltaFile
+33-0flang/test/Integration/OpenMP/taskloop-bounds-cast.f90
+27-0mlir/test/Target/LLVMIR/openmp-taskloop-bounds-cast.mlir
+0-2llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+60-23 files

LLVM/project b16e012flang-rt/lib/runtime trampoline.cpp

[flang-rt] Fix macOS build: define _DARWIN_C_SOURCE for mmap flags (#186142)

On Darwin, `sys/mman.h` hides `MAP_JIT` and `MAP_ANON(YMOUS)` when
`_POSIX_C_SOURCE` is defined unless `_DARWIN_C_SOURCE` is also defined.
`trampoline.cpp` uses those flags, so this change defines
`_DARWIN_C_SOURCE` before including `<sys/mman.h>` in this file.
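
The pattern looks roughly like this. It is a sketch rather than the actual `trampoline.cpp` change: `map_anonymous` is a hypothetical helper, and the `MAP_ANONYMOUS` fallback is an extra assumption added for portability.

```cpp
// The define must appear before the first system header is included,
// otherwise the feature-test logic has already been evaluated.
#if defined(__APPLE__) && !defined(_DARWIN_C_SOURCE)
#define _DARWIN_C_SOURCE
#endif
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Some systems only spell the flag MAP_ANON.
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

// Hypothetical helper: maps `size` bytes of private, zero-filled anonymous
// memory. Returns nullptr on failure.
inline void *map_anonymous(std::size_t size) {
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```

On non-Darwin platforms the guard compiles to nothing, so the define is harmless there.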

Fixes build failure reported in #183108.

Co-authored-by: Sairudra More <moresair at pe31.hpc.amslabs.hpecorp.net>
DeltaFile
+15-2flang-rt/lib/runtime/trampoline.cpp
+15-21 files

LLVM/project 483d8b6llvm/cmake/modules UnityBuild.cmake

use the right target ClangTidyTests
DeltaFile
+2-2llvm/cmake/modules/UnityBuild.cmake
+2-21 files

LLVM/project 98742dc.ci monolithic-linux.sh, llvm CMakeLists.txt

refactor UnityBuild.cmake
DeltaFile
+1,448-1,308llvm/cmake/modules/UnityBuild.cmake
+31-24llvm/CMakeLists.txt
+1-1.ci/monolithic-linux.sh
+1,480-1,3333 files

LLVM/project 798e545clang/lib/CodeGen CMakeLists.txt, clang/unittests CMakeLists.txt

works
DeltaFile
+1,460-0llvm/cmake/modules/UnityBuild.cmake
+0-62clang/unittests/CMakeLists.txt
+0-38clang/lib/CodeGen/CMakeLists.txt
+0-34llvm/utils/TableGen/CMakeLists.txt
+0-26llvm/lib/CodeGen/CMakeLists.txt
+0-23mlir/test/lib/IR/CMakeLists.txt
+1,460-183117 files not shown
+1,470-1,163123 files

LLVM/project 7384e9dclang/include/clang/CIR MissingFeatures.h, clang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h

[CIR] Add addLLVMUsed and addLLVMCompilerUsed methods to CIRGenModule
DeltaFile
+104-2clang/lib/CIR/CodeGen/CIRGenModule.cpp
+27-0clang/test/CIR/CodeGenHIP/hip-cuid.hip
+19-0clang/lib/CIR/CodeGen/CIRGenModule.h
+0-1clang/include/clang/CIR/MissingFeatures.h
+150-34 files

LLVM/project 1a1dcf2clang/include/clang/CIR MissingFeatures.h, clang/include/clang/CIR/Dialect/IR CIROps.td

[CIR][CIRGen] Support for section attribute
DeltaFile
+26-13clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+20-4clang/lib/CIR/CodeGen/CIRGenModule.cpp
+14-0clang/test/CIR/CodeGen/global-section.c
+7-1clang/include/clang/CIR/Dialect/IR/CIROps.td
+0-1clang/include/clang/CIR/MissingFeatures.h
+67-195 files

LLVM/project 2e35e62clang/include/clang/CIR MissingFeatures.h, clang/lib/CIR/CodeGen TargetInfo.cpp TargetInfo.h

update requiresAMDGPUProtectedVisibility and other minor fixes
DeltaFile
+51-64clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+12-5clang/lib/CIR/CodeGen/TargetInfo.cpp
+4-0clang/lib/CIR/CodeGen/TargetInfo.h
+0-1clang/include/clang/CIR/MissingFeatures.h
+67-704 files

LLVM/project 7dd140cclang/lib/CIR/CodeGen/Targets AMDGPU.cpp, clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVMIR.cpp

add support for amdgpu-expand-waitcnt-profiling
DeltaFile
+44-32clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+16-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+1-4clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+61-363 files

LLVM/project 177b4f0clang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/lib/CIR/CodeGen/Targets AMDGPU.cpp

[CIR][AMDGPU] Add AMDGPU-specific function attributes for HIP kernels
DeltaFile
+256-0clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+82-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+24-3clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+8-6clang/lib/CIR/CodeGen/CIRGenModule.cpp
+10-0clang/lib/CIR/CodeGen/TargetInfo.cpp
+5-0clang/lib/CIR/CodeGen/TargetInfo.h
+385-91 files not shown
+386-97 files