LLVM/project 23e4914 .github CODEOWNERS

[NFC] Add default code reviewers for some AMDGPU-related files (#185942)
Delta  File
+11 -0  .github/CODEOWNERS
+11 -0  1 file

LLVM/project 0bebee6 libc/shared/math fmabf16.h, libc/src/__support/math fmabf16.h

[libc][math][c++23] Add Fmabf16 math function (#182836)

closes #180171 
part of #177259 

Beyond the usual changes, two extra changes were needed:
1. `libc/src/__support/FPUtil/generic/add_sub.h` → +0 -0 error
2. `libc/src/__support/FPUtil/generic/FMA.h` → implemented to handle
fmabf16(Normal,Normal,+/-INF)
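
The special case in item 2 can be sketched as follows. This is a minimal illustration, not the LLVM libc implementation: it uses `float` and `std::fma` as stand-ins for `bfloat16` and `fmabf16`, and the helper names `unfused_mul_add` and `fused_mul_add` are hypothetical. It shows why a finite product combined with an infinite addend must yield that infinity rather than NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Unfused evaluation: the product is rounded to float first, so a large
// finite product can overflow to +inf before the addend is applied.
inline float unfused_mul_add(float x, float y, float z) {
  volatile float product = x * y; // volatile forces the intermediate rounding
  return product + z;             // (+inf) + (-inf) is NaN
}

// Fused evaluation: std::fma rounds only once, after the exact x * y + z.
// The exact product of two finite values is finite, so finite + (-inf) is -inf.
inline float fused_mul_add(float x, float y, float z) {
  return std::fma(x, y, z);
}
```

With `x` and `y` set to the largest finite `float` and `z` set to negative infinity, the unfused version returns NaN while the fused version returns negative infinity, which is the result MPFR reports in the failure log.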

```jsx
/home/runner/work/llvm-project/llvm-project/libc/test/src/math/fmabf16_test.cpp:62: FAILURE
Failed to match __llvm_libc_23_0_0_git::fmabf16(x, y, z) against LIBC_NAMESPACE::testing::mpfr::get_mpfr_matcher<mpfr::Operation::Fma>( input, __llvm_libc_23_0_0_git::fmabf16(x, y, z), 0.5, mpfr::RoundingMode::Nearest).
Input decimal: x: 338953138925153547590470800371487866880.00000000000000000000000000000000000000000000000000 y: 338953138925153547590470800371487866880.00000000000000000000000000000000000000000000000000 z: -inf
 First input bits: 0x7F7F = (S: 0, E: 0x00FE, M: 0x007F)
Second input bits: 0x7F7F = (S: 0, E: 0x00FE, M: 0x007F)
 Third input bits: (-Infinity)
Libc result: nan
MPFR result: -inf
```

    [16 lines not shown]
Delta  File
+77 -0  libc/test/src/math/exhaustive/fmabf16_test.cpp
+66 -0  libc/test/src/math/fmabf16_test.cpp
+27 -0  libc/src/__support/math/fmabf16.h
+23 -0  libc/shared/math/fmabf16.h
+21 -0  libc/src/math/fmabf16.h
+19 -0  libc/test/src/math/exhaustive/CMakeLists.txt
+233 -0  26 files not shown
+361 -2  32 files

LLVM/project 9b61ff2 clang/test/OpenMP target_teams_distribute_parallel_for_simd_schedule_codegen.cpp teams_distribute_parallel_for_simd_schedule_codegen.cpp

Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat" (#186309)

Reverts llvm/llvm-project#185989
Delta  File
+4,814 -5,294  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp
+4,758 -5,238  clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
+4,098 -4,350  clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp
+3,524 -4,004  clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp
+3,520 -4,000  clang/test/OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp
+3,174 -3,590  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
+23,888 -26,476  350 files not shown
+112,671 -126,870  356 files

LLVM/project a8c6cce libc/config/linux/aarch64 entrypoints.txt, libc/config/linux/x86_64 entrypoints.txt

[libc] Add posix_memalign as external entrypoint on Linux x86/ARM. (#185310)

`posix_memalign` is provided by the Scudo allocator and is part of the
POSIX standard, so we can safely declare it in the `<stdlib.h>` header on
Linux systems.
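
For readers unfamiliar with the call, here is a minimal usage sketch of `posix_memalign` as specified by POSIX. The wrapper names `alloc_aligned` and `is_aligned` are hypothetical helpers introduced for illustration and are not part of this change.

```cpp
#include <cassert>
#include <cstdint>
#include <stdlib.h> // posix_memalign, free (POSIX)

// Hypothetical helper: returns `size` bytes aligned to `alignment`, or
// nullptr on failure. POSIX requires `alignment` to be a power of two and
// a multiple of sizeof(void *).
inline void *alloc_aligned(size_t alignment, size_t size) {
  void *ptr = nullptr;
  // posix_memalign returns 0 on success and an error number otherwise.
  if (posix_memalign(&ptr, alignment, size) != 0)
    return nullptr;
  return ptr;
}

// Hypothetical helper: checks that an address is a multiple of `alignment`.
inline bool is_aligned(const void *ptr, size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Memory obtained this way is released with `free`, as with `malloc`.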
Delta  File
+8 -0  libc/src/stdlib/CMakeLists.txt
+1 -0  libc/config/linux/aarch64/entrypoints.txt
+1 -0  libc/config/linux/x86_64/entrypoints.txt
+10 -0  3 files

LLVM/project 863e058 clang/include/clang/CIR/Dialect/IR CIRDialect.td, clang/lib/CIR/CodeGen CIRGenAMDGPU.cpp CIRGenModule.cpp

[CIR][AMDGPU] Add module flags for AMDGPU target (#186081)

Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2100

This PR adds support to emit AMDGPU-specific module flags
`amdhsa_code_object_version` and `amdgpu_printf_kind` to match OGCG
behavior.

In `CIRGenModule`, the flags are stored as CIR module attributes:

- `cir.amdhsa_code_object_version` (integer)
- `cir.amdgpu_printf_kind` (string: "hostcall" or "buffered")

During lowering to LLVM IR (in `LowerToLLVMIR.cpp`), these attributes are
converted to LLVM module flags.
Delta  File
+41 -0  clang/lib/CIR/CodeGen/CIRGenAMDGPU.cpp
+30 -0  clang/test/CIR/CodeGenHIP/amdgpu-module-flags.hip
+22 -1  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+3 -0  clang/include/clang/CIR/Dialect/IR/CIRDialect.td
+3 -0  clang/lib/CIR/CodeGen/CIRGenModule.cpp
+3 -0  clang/lib/CIR/CodeGen/CIRGenModule.h
+102 -1  1 file not shown
+103 -1  7 files

LLVM/project 6bf197d mlir/lib/Conversion/XeVMToLLVM XeVMToLLVM.cpp, mlir/lib/Target/LLVMIR/Dialect/XeVM XeVMToLLVMIRTranslation.cpp

Merge branch 'main' into revert-185989-UpdateImplicitArgs
Delta  File
+247 -16  mlir/lib/Conversion/XeVMToLLVM/XeVMToLLVM.cpp
+84 -44  mlir/test/Conversion/XeVMToLLVM/xevm-to-llvm.mlir
+17 -42  openmp/docs/design/Runtimes.rst
+10 -36  mlir/test/Target/LLVMIR/xevm.mlir
+1 -37  mlir/lib/Target/LLVMIR/Dialect/XeVM/XeVMToLLVMIRTranslation.cpp
+0 -31  offload/test/api/omp_dynamic_shared_memory.c
+359 -206  7 files not shown
+371 -244  13 files

LLVM/project 0770c83 mlir/lib/Conversion/XeVMToLLVM XeVMToLLVM.cpp, mlir/lib/Target/LLVMIR/Dialect/XeVM XeVMToLLVMIRTranslation.cpp

[XeVM] Add translation for XeVM cache-control attributes. (#181856)

Use `llvm.intr.ptr.annotation` to attach cache-control metadata to a
pointer. Each cache-control attribute produces its own annotation call;
multiple attributes are chained so every annotation sits on the same
pointer.

This approach protects the metadata across optimizations.
Delta  File
+247 -16  mlir/lib/Conversion/XeVMToLLVM/XeVMToLLVM.cpp
+84 -44  mlir/test/Conversion/XeVMToLLVM/xevm-to-llvm.mlir
+10 -36  mlir/test/Target/LLVMIR/xevm.mlir
+1 -37  mlir/lib/Target/LLVMIR/Dialect/XeVM/XeVMToLLVMIRTranslation.cpp
+342 -133  4 files

LLVM/project ac71b18 offload/libomptarget OffloadRTL.cpp, offload/plugins-nextgen/amdgpu/src rtl.cpp

[offload] Remove LIBOMPTARGET_SHARED_MEMORY_SIZE envar (#186231)

This commit removes the `LIBOMPTARGET_SHARED_MEMORY_SIZE` envar and
outputs a runtime warning if it is defined. Access to dynamic shared memory
should be obtained through the `dyn_groupprivate` clause (OpenMP 6.1) or
the launch arguments in liboffload kernel launch.
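
The "warn if still defined" behavior described above can be sketched as follows. This is an illustration only, not the code added to `OffloadRTL.cpp`: the helper name `warn_if_removed_envar_set` and the warning text are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <stdlib.h> // getenv

// Hypothetical helper: returns true (and prints a warning to stderr) when a
// removed environment variable is still set by the user.
inline bool warn_if_removed_envar_set(const char *name) {
  if (getenv(name) == nullptr)
    return false; // not set, nothing to report
  std::fprintf(stderr, "warning: %s is no longer supported and is ignored\n",
               name);
  return true;
}
```

A runtime would call such a helper once during initialization, for example with `"LIBOMPTARGET_SHARED_MEMORY_SIZE"`.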
Delta  File
+17 -42  openmp/docs/design/Runtimes.rst
+0 -31  offload/test/api/omp_dynamic_shared_memory.c
+0 -26  offload/test/api/omp_dynamic_shared_memory_amdgpu.c
+11 -0  offload/libomptarget/OffloadRTL.cpp
+0 -4  offload/plugins-nextgen/cuda/src/rtl.cpp
+0 -4  offload/plugins-nextgen/amdgpu/src/rtl.cpp
+28 -107  3 files not shown
+29 -111  9 files

LLVM/project 9226288 clang/test/OpenMP target_teams_distribute_parallel_for_simd_schedule_codegen.cpp teams_distribute_parallel_for_simd_schedule_codegen.cpp

Revert "[OpenMP] Move OpenMP implicit argument to the end and reformat (#185989)"

This reverts commit 4376fbd7931048837d34c91b79f01cd1246637a5.
Delta  File
+4,814 -5,294  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp
+4,758 -5,238  clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
+4,098 -4,350  clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp
+3,524 -4,004  clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp
+3,520 -4,000  clang/test/OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp
+3,174 -3,590  clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
+23,888 -26,476  350 files not shown
+112,671 -126,870  356 files

LLVM/project e51e9af mlir/lib/Dialect/XeGPU/Transforms XeGPUSgToWiDistributeExperimental.cpp, mlir/test/Dialect/XeGPU sg-to-wi-experimental-unit.mlir

[MLIR][XeGPU] Add distribution pattern for convertLayoutOp (#184826)
Delta  File
+22 -2  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSgToWiDistributeExperimental.cpp
+13 -0  mlir/test/Dialect/XeGPU/sg-to-wi-experimental-unit.mlir
+35 -2  2 files

LLVM/project 13e94ea clang/lib/CodeGen CodeGenFunction.cpp

remove unused var
Delta  File
+0 -1  clang/lib/CodeGen/CodeGenFunction.cpp
+0 -1  1 file

LLVM/project 1cf130d mlir/test/Dialect/Builtin/Bytecode builtin_fixed.mlir

Exclude known failure case (#186305)

External resources do not produce the same result on big-endian targets.
Keeping this test scoped to regressions of the encoding keeps it simple,
and the failure doesn't affect usage there, so just mark it as XFAIL.
Delta  File
+3 -0  mlir/test/Dialect/Builtin/Bytecode/builtin_fixed.mlir
+3 -0  1 file

LLVM/project 40b2079 clang/test/CodeGen attr-target-clones-ppc.c

adjust test to reflect the new IR we generate after rebasing on b3d99ac2cda4
Delta  File
+4 -4  clang/test/CodeGen/attr-target-clones-ppc.c
+4 -4  1 file

LLVM/project 7b677a8 mlir/test/Dialect/Builtin/Bytecode builtin_fixed.mlir

Exclude known failure case

External resources do not produce the same result on big-endian targets. Keeping this test scoped to regressions of the encoding keeps it simple, and the failure doesn't affect usage there, so just mark it as XFAIL.
Delta  File
+3 -0  mlir/test/Dialect/Builtin/Bytecode/builtin_fixed.mlir
+3 -0  1 file

LLVM/project 94ba117 mlir/lib/Dialect/XeGPU/Transforms XeGPUSubgroupDistribute.cpp, mlir/lib/Dialect/XeGPU/Utils XeGPUUtils.cpp

[MLIR][XeGPU] Support leading unit dim for reduction in sg to wi pass (#185110)
Delta  File
+74 -0  mlir/test/Dialect/XeGPU/subgroup-distribute-unit.mlir
+30 -16  mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+31 -15  mlir/lib/Dialect/XeGPU/Utils/XeGPUUtils.cpp
+135 -31  3 files

LLVM/project 4540415 llvm/lib/Target/RISCV RISCVTargetTransformInfo.cpp, llvm/test/Analysis/CostModel/RISCV rvp-shuffle-reverse.ll

[RISCV] Fix crash in getShuffleCost for P-extension without V extension (#186149)

RISCVTTIImpl::getShuffleCost() crashes when querying the cost of a
reverse shufflevector on a target with the P-extension but without V/Zve
extensions. The SK_Reverse case calls
getContainerForFixedLengthVector(), which asserts hasVInstructions().

The P-extension uses fixed-width packed SIMD in GPRs, not RVV registers,
so V extension is typically not enabled.

Add an early return for P-extension fixed vectors in getShuffleCost,
consistent with the existing guards in getScalarizationOverhead,
getCastInstrCost, and getVectorInstrCost.
Delta  File
+70 -0  llvm/test/Analysis/CostModel/RISCV/rvp-shuffle-reverse.ll
+7 -0  llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+77 -0  2 files

LLVM/project bcdf3c9 llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV rvp-narrowing-shift-trunc.ll

[RISCV] Fix crash in combinePExtTruncate for truncate(srl) without MUL/SUB (#186141)

combinePExtTruncate is called from performTRUNCATECombine when the
P-extension is enabled. It attempts to match patterns like
truncate(srl(mul/sub(...), shamt)) and combine them into P-extension
narrowing shift instructions (e.g. PNSRLI, PNSRAI).

However, after extracting the shift input operand `Op` from the SRL
node, the function unconditionally accessed Op.getOperand(0) and
Op.getOperand(1) without first verifying that Op has at least two
operands. For example, when combining:

```
  truncate(v2i16
    srl(v2i32
      bitcast(v2i32 i64),   <-- Op = bitcast, a unary op with 1 operand
      BUILD_VECTOR <8, 8>))
```
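
The missing operand-count check described above can be sketched on a toy node type. This is a hedged illustration, not the actual patch: `Opcode`, `Node`, and `matches_narrowing_shift` are hypothetical stand-ins for the SelectionDAG types and for the pattern match inside `combinePExtTruncate`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for SelectionDAG opcodes and nodes.
enum class Opcode { Truncate, Srl, Mul, Sub, Bitcast, Other };

struct Node {
  Opcode opcode;
  std::vector<const Node *> operands;
};

// Returns true only for truncate(srl(mul/sub(a, b), shamt)).
inline bool matches_narrowing_shift(const Node &trunc) {
  if (trunc.opcode != Opcode::Truncate || trunc.operands.size() != 1)
    return false;
  const Node &shift = *trunc.operands[0];
  if (shift.opcode != Opcode::Srl || shift.operands.size() != 2)
    return false;
  const Node &op = *shift.operands[0];
  // Guard: reject unary inputs such as bitcast before touching operand 1.
  if (op.opcode != Opcode::Mul && op.opcode != Opcode::Sub)
    return false;
  if (op.operands.size() < 2)
    return false;
  return op.operands[0] != nullptr && op.operands[1] != nullptr;
}
```

With this guard, the `truncate(srl(bitcast(...), ...))` input from the example is rejected instead of reading a second operand that does not exist.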


    [7 lines not shown]
Delta  File
+80 -0  llvm/test/CodeGen/RISCV/rvp-narrowing-shift-trunc.ll
+4 -0  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+84 -0  2 files

LLVM/project c6395bb mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp XeGPUPropagateLayout.cpp, mlir/test/Dialect/XeGPU propagate-layout.mlir propagate-layout-inst-data.mlir

[MLIR][XeGPU] Enhance Layout Propagation for broadcasting both leading dimensions and inner unit dimensions (#185583)

This PR enhances the layout propagation rules for broadcast operations.

The source layout is derived from the result layout based on the
broadcast pattern:
1. Broadcast on leading dimensions
   The source layout is the slice layout of the result layout.
2. Broadcast on inner unit dimensions
   The source layout matches the result layout, with sg_data and lane_data
   set to 1.
3. Broadcast on both leading dimensions and inner unit dimensions
   The source layout is derived by combining the above two rules.
Delta  File
+17 -12  mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+27 -0  mlir/test/Dialect/XeGPU/propagate-layout.mlir
+21 -0  mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+21 -0  mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
+11 -4  mlir/lib/Dialect/XeGPU/Transforms/XeGPUPropagateLayout.cpp
+7 -7  mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+104 -23  3 files not shown
+115 -36  9 files

LLVM/project 2aaef84 llvm/include module.modulemap

[llvm] Fix modules build after 2e614f353871a9a0738c3b49d35d5ed4e480633b (#186302)
Delta  File
+1 -0  llvm/include/module.modulemap
+1 -0  1 file

LLVM/project eb5f780 clang/lib/Basic/Targets PPC.cpp PPC.h, clang/lib/CodeGen CodeGenFunction.cpp CodeGenModule.cpp

code review

code review

code review
Delta  File
+3 -9  clang/lib/CodeGen/CodeGenFunction.cpp
+4 -6  clang/lib/CodeGen/CodeGenModule.cpp
+0 -7  clang/lib/Basic/Targets/PPC.cpp
+0 -1  clang/lib/Basic/Targets/PPC.h
+7 -23  4 files

LLVM/project b2c6788 clang/test/CodeGen attr-target-clones-ppc.c

test all supported CPUs
Delta  File
+13 -2  clang/test/CodeGen/attr-target-clones-ppc.c
+13 -2  1 file

LLVM/project 5c8c073 clang/include/clang/Basic AttrDocs.td

update target_clones documentation
Delta  File
+6 -0  clang/include/clang/Basic/AttrDocs.td
+6 -0  1 file

LLVM/project cc711c8 clang/test/Sema attr-target-clones-ppc.c

fix test
Delta  File
+3 -1  clang/test/Sema/attr-target-clones-ppc.c
+3 -1  1 file

LLVM/project 35b2e4f clang/lib/Sema SemaPPC.cpp

create PPCTargetInfo::isTargetClonesSupportedCPU to filter out unsupported CPUs during Sema
Delta  File
+6 -3  clang/lib/Sema/SemaPPC.cpp
+6 -3  1 file

LLVM/project 74e4342 clang/include/clang/Sema SemaPPC.h, clang/lib/Basic/Targets PPC.cpp PPC.h

normalize CPU during Sema

fix Sema and create ppc target_clones tests based on the x86 test
Delta  File
+130 -0  clang/test/Sema/attr-target-clones-ppc.c
+15 -5  clang/lib/Sema/SemaPPC.cpp
+7 -0  clang/lib/Basic/Targets/PPC.cpp
+2 -1  clang/lib/Sema/SemaDeclAttr.cpp
+2 -1  clang/include/clang/Sema/SemaPPC.h
+1 -0  clang/lib/Basic/Targets/PPC.h
+157 -7  6 files

LLVM/project 062947c clang/lib/CodeGen CodeGenModule.cpp, clang/lib/Sema SemaPPC.cpp

diagnose non-cpu strings in target_clones in Sema
Delta  File
+3 -5  clang/lib/Sema/SemaPPC.cpp
+1 -1  clang/lib/CodeGen/CodeGenModule.cpp
+4 -6  2 files

LLVM/project 8a2f16a clang/lib/Basic/Targets PPC.cpp, clang/lib/CodeGen CodeGenFunction.cpp

now that we normalize CPU on target_clones in Sema, remove normalization in codegen
Delta  File
+5 -5  clang/lib/CodeGen/CodeGenFunction.cpp
+2 -7  clang/lib/Basic/Targets/PPC.cpp
+7 -12  2 files

LLVM/project d900078 clang/lib/Sema SemaPPC.cpp

checkTargetClonesAttr: compute TargetInfo once
Delta  File
+4 -4  clang/lib/Sema/SemaPPC.cpp
+4 -4  1 file

LLVM/project eaf36eb clang/include/clang/Sema SemaPPC.h, clang/lib/Sema SemaPPC.cpp

code review: add const to parameters
Delta  File
+2 -2  clang/include/clang/Sema/SemaPPC.h
+2 -1  clang/lib/Sema/SemaPPC.cpp
+4 -3  2 files

LLVM/project 07e2abc clang/lib/CodeGen CodeGenModule.cpp

inline the only call to IgnoreFMVOnADeclaration
Delta  File
+7 -15  clang/lib/CodeGen/CodeGenModule.cpp
+7 -15  1 file