LLVM/project 86fbaef flang-rt/lib/runtime cudadevice.f90 __ppc_intrinsics.f90, flang/module cudadevice.f90 __ppc_intrinsics.f90

[Flang] Move builtin .mod generation into runtimes (#137828)

Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:

1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Most other modules have `#ifdef`-enclosed code as well.

2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.

3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.



    [33 lines not shown]
Delta File
+0 -2,242 flang/module/cudadevice.f90
+2,242 -0 flang-rt/lib/runtime/cudadevice.f90
+1,911 -0 flang-rt/lib/runtime/__ppc_intrinsics.f90
+0 -1,911 flang/module/__ppc_intrinsics.f90
+1,122 -0 flang-rt/lib/runtime/mma.f90
+0 -1,122 flang/module/mma.f90
+5,275 -5,275 75 files not shown
+7,862 -7,685 81 files

LLVM/project a11e734 . .gitignore

[llvm][nfc] Ignore OpenAI Codex artifacts (#162481)

Follow-up to #153853 to also ignore Codex artifacts [1]. AGENTS.md may
be at the root or in sub-directories, so unlike other Markdown config
files I've not prefixed it with '/'.

[1] https://github.com/openai/codex/blob/main/docs/getting-started.md#memory-with-agentsmd
Delta File
+2 -0 .gitignore
+2 -0 1 files

LLVM/project cf5234b llvm/lib/Target/AArch64 MachineSMEABIPass.cpp

[AArch64] Silence a warning (NFC)

/llvm-project/llvm/lib/Target/AArch64/MachineSMEABIPass.cpp:952:12:
 error: unused variable 'SMEFnAttrs' [-Werror,-Wunused-variable]
  SMEAttrs SMEFnAttrs = AFI->getSMEFnAttrs();
           ^
1 error generated.
Delta File
+1 -1 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+1 -1 1 files

LLVM/project a086fb2 llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp GCNSubtarget.h, llvm/test/CodeGen/AMDGPU memory-legalizer-buffer-atomics.ll spillv16.ll

[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated (#168852)

The xcnt wait is actually required before any memory access that can
only be done once, so atomic stores and volatile accesses are affected.
This patch also ensures buffer instructions are handled.
Delta File
+435 -0 llvm/test/CodeGen/AMDGPU/memory-legalizer-buffer-atomics.ll
+16 -3 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+14 -0 llvm/test/CodeGen/AMDGPU/spillv16.ll
+6 -3 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+6 -0 llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll
+4 -0 llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll
+481 -6 12 files not shown
+504 -6 18 files

LLVM/project eb568d6 llvm/lib/Target/AArch64 MachineSMEABIPass.cpp AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 sme-zt0-state.ll

[AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (#166361)

In the MachineSMEABIPass, if we have a function with ZT0 state, then
there are some additional cases where we need to zero ZA and ZT0.

If the function has a private ZA interface, i.e., new ZT0 (and new ZA if
present), then ZT0/ZA must be zeroed when committing the incoming ZA
save.

If the function has a shared ZA interface, e.g. new ZA and shared ZT0,
then ZA must be zeroed on function entry (without a ZA save commit).

The logic in the ABI pass has been reworked to use an "ENTRY" state to
handle this (rather than the more specific "CALLER_DORMANT" state).
Delta File
+54 -42 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+11 -18 llvm/test/CodeGen/AArch64/sme-zt0-state.ll
+0 -9 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+65 -69 3 files

LLVM/project 2ce363d llvm/utils/gn/secondary/llvm/lib/IR BUILD.gn

[gn build] Port a39af125dba2
Delta File
+1 -0 llvm/utils/gn/secondary/llvm/lib/IR/BUILD.gn
+1 -0 1 files

LLVM/project fbd3fc2 bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::addBTItoBBStart

This function contains most of the logic for BTI:
- it takes the BasicBlock and the instruction used to jump to it.
- then it checks if the first non-pseudo instruction is a sufficient
landing pad for the used call.
- if not, it generates the correct BTI instruction.

Also introduce the isBTIVariantCoveringCall helper to simplify the logic.
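
The "is this landing pad sufficient" check can be sketched in isolation. This is a minimal standalone sketch, not BOLT's code: the `BTIKind` and `BranchKind` enums and the `coversBranch` and `widen` functions are hypothetical names. The rules encoded are the architectural ones (`bti c` accepts `BLR` and `BR` through x16/x17, `bti j` accepts any `BR`, `bti jc` accepts all of them).

```cpp
#include <cassert>

// Hypothetical types for illustration; not BOLT's or LLVM's definitions.
enum class BTIKind { None, C, J, JC };
enum class BranchKind { IndirectCall, IndirectJumpX16X17, IndirectJumpOther };

// Returns true if a landing pad of kind Pad accepts a branch of kind Br.
constexpr bool coversBranch(BTIKind Pad, BranchKind Br) {
  switch (Pad) {
  case BTIKind::None:
    return false;
  case BTIKind::C: // BLR, or BR via x16/x17
    return Br != BranchKind::IndirectJumpOther;
  case BTIKind::J: // any BR, but not BLR
    return Br != BranchKind::IndirectCall;
  case BTIKind::JC:
    return true;
  }
  return false;
}

// Picks a pad kind that accepts Br in addition to what Existing accepts.
constexpr BTIKind widen(BTIKind Existing, BranchKind Br) {
  if (coversBranch(Existing, Br))
    return Existing; // already sufficient, nothing to change
  if (Existing == BTIKind::None)
    return Br == BranchKind::IndirectCall ? BTIKind::C : BTIKind::J;
  return BTIKind::JC; // C and J each miss one case; JC covers both
}
```

For example, `widen(BTIKind::C, BranchKind::IndirectJumpOther)` yields `BTIKind::JC`, since a `bti c` pad alone would reject that jump.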
Delta File
+105 -0 bolt/unittests/Core/MCPlusBuilder.cpp
+75 -0 bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+13 -0 bolt/include/bolt/Core/MCPlusBuilder.h
+193 -0 3 files

LLVM/project 7c2404b bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::updateBTIVariant

Checks if an instruction is BTI, and updates the immediate value to the
newly requested variant.
Delta File
+8 -0 bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+6 -0 bolt/include/bolt/Core/MCPlusBuilder.h
+6 -0 bolt/unittests/Core/MCPlusBuilder.cpp
+20 -0 3 files

LLVM/project f4f312b bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::isBTILandingPad

- takes both implicit and explicit BTIs into account
- fix related comment in AArch64BranchTargets.cpp
Delta File
+18 -0 bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+17 -0 bolt/unittests/Core/MCPlusBuilder.cpp
+14 -0 bolt/include/bolt/Core/MCPlusBuilder.h
+4 -2 llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
+53 -2 4 files

LLVM/project ed95c4d bolt/include/bolt/Core MCPlusBuilder.h, bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp

[BOLT][BTI] Add MCPlusBuilder::createBTI (#167305)

- creates a BTI j|c landing pad MCInst.
- create getBTIHintNum utility in AArch64/Utils, to make sure BOLT
  generates BTI immediates the same way as LLVM.
- add MCPlusBuilder unittests to cover new function.
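
For context, BTI lives in the HINT instruction space, so the landing-pad variant is selected purely by the hint immediate. The sketch below shows that mapping under stated assumptions: `btiHintImm` is a hypothetical name, and the real `getBTIHintNum` may have a different signature, which this log does not show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not the actual getBTIHintNum.
// Computes the HINT immediate for a BTI landing pad.
constexpr uint32_t btiHintImm(bool CallTarget, bool JumpTarget) {
  uint32_t Hint = 32; // HINT #32 is plain `bti`
  if (CallTarget)
    Hint |= 2; // `bti c` is HINT #34
  if (JumpTarget)
    Hint |= 4; // `bti j` is HINT #36, `bti jc` is HINT #38
  return Hint;
}
```

Sharing one such helper between LLVM and BOLT keeps the two from drifting apart on the encoding, which is the motivation stated above.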
Delta File
+30 -0 bolt/unittests/Core/MCPlusBuilder.cpp
+10 -0 llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+2 -7 llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
+7 -0 bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+5 -0 bolt/include/bolt/Core/MCPlusBuilder.h
+54 -7 5 files

LLVM/project 6193f2a llvm/lib/Target/AArch64 AArch64ExpandImm.cpp

[AArch64] Assert `expandMOVImm` prioritizes optimal single MOVZ/N (#169341)

The expansion of move immediate in `expandMOVImm` follows the priority
of the `MOV` alias. In addition, the selection there properly prefers
expansion based on perf optimality order. This change adds a simple
assert that `expandMOVImmSimple` expands a single optimal MOVZ/MOVK.
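
To illustrate what "a single move suffices" means, here is a minimal sketch for 64-bit registers only. The helper names are hypothetical and this is not the code in `AArch64ExpandImm.cpp`; it also ignores logical-immediate (ORR) encodings and the 32-bit register forms.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many of the four 16-bit chunks of V are non-zero.
constexpr int nonZeroChunks(uint64_t V) {
  int N = 0;
  for (int Shift = 0; Shift < 64; Shift += 16)
    if ((V >> Shift) & 0xFFFF)
      ++N;
  return N;
}

// A single MOVZ writes one 16-bit chunk and zeroes the rest.
constexpr bool fitsSingleMOVZ(uint64_t Imm) { return nonZeroChunks(Imm) <= 1; }

// A single MOVN writes the bitwise NOT of one shifted 16-bit chunk,
// so it works when the inverted value has at most one non-zero chunk.
constexpr bool fitsSingleMOVN(uint64_t Imm) { return nonZeroChunks(~Imm) <= 1; }
```

Under these assumptions, `0x12340000` fits a single MOVZ and `0xFFFFFFFFFFFF1234` fits a single MOVN, while `0x0001000100000000` needs more than one instruction.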
Delta File
+2 -0 llvm/lib/Target/AArch64/AArch64ExpandImm.cpp
+2 -0 1 files

LLVM/project a39af12 llvm/include/llvm/IR NVVMIntrinsicUtils.h, llvm/lib/IR NVVMIntrinsicUtils.cpp CMakeLists.txt

[NVVM] Move pretty-print functions from NVVMIntrinsicUtils.h to cpp file (#168997)

This patch moves the print functions from `NVVMIntrinsicUtils.h` to
`NVVMIntrinsicUtils.cpp`, a file created in the `llvm/lib/IR` directory.

Signed-off-by: Dharuni R Acharya <dharunira at nvidia.com>
Delta File
+61 -0 llvm/lib/IR/NVVMIntrinsicUtils.cpp
+4 -45 llvm/include/llvm/IR/NVVMIntrinsicUtils.h
+1 -0 llvm/lib/IR/CMakeLists.txt
+66 -45 3 files

LLVM/project 4bcd279 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/tools/llvm-dwarfdump/X86 simplified-template-names.s

Merge branch 'main' into users/jmmartinez/fix/remat_on_excess_sgpr_to_vgpr
Delta File
+31,478 -35,882 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+10,429 -9,804 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+5,981 -8,885 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+5,981 -8,885 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+7,387 -7,087 llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+3,868 -6,624 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+65,124 -77,167 5,728 files not shown
+349,796 -283,682 5,734 files

LLVM/project f817a1b flang/lib/Optimizer/Builder/Runtime Reduction.cpp, lldb/include/lldb/API SBStructuredData.h

[NFC] Fix typo of `integer` (#169325)

Delta File
+2 -2 lldb/include/lldb/API/SBStructuredData.h
+1 -1 mlir/include/mlir/Analysis/DataFlow/IntegerRangeAnalysis.h
+1 -1 mlir/lib/Target/LLVMIR/ModuleImport.cpp
+1 -1 flang/lib/Optimizer/Builder/Runtime/Reduction.cpp
+5 -5 4 files

LLVM/project 5490bcf bolt/lib/Rewrite RewriteInstance.cpp

[BOLT] Add missing new line. NFC
Delta File
+1 -1 bolt/lib/Rewrite/RewriteInstance.cpp
+1 -1 1 files

LLVM/project 30c49a4 mlir/lib/Target/LLVMIR ModuleTranslation.cpp, mlir/test/Target/LLVMIR anonymous-tbaa.mlir

[mlir][LLVMIR] Handle anonymous TBAA roots during metadata emission (#169167)

This commit enhances MLIR's TBAA export with support for anonymous TBAA roots. The import for this was around for a bit but the export was missing.

Fixes: #160721
Delta File
+21 -0 mlir/test/Target/LLVMIR/anonymous-tbaa.mlir
+14 -6 mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
+35 -6 2 files

LLVM/project 1d64fd5 clang/test/CodeGen/arm-mve-intrinsics vmulq.c vsubq.c, clang/utils/TableGen MveEmitter.cpp

[ARM] Introduce intrinsics for MVE add/sub/mul under strict-fp. (#169156)

As far as I understand, the MVE fp vadd/vsub/vmul instructions will set
exception flags in the same way as scalar fadd/fsub/fmul, but will not
honor flush-to-zero (for f32 they always flush, for f16 they follow the
FPSCR flags) and will always use the default rounding mode.

This means that we cannot convert the vadd_f32/vsub_f32/vmul_f32
intrinsics to llvm.constrained.fadd/fsub/fmul and then vadd/vsub/vmul
without changing the expected behaviour under strict-fp. This patch
introduces a set of intrinsics that we can use instead, going from
vadd_f32 -> llvm.arm.mve.vadd -> MVE_VADD.

The current implementation assumes that the standard variant of a
strictfp alternative will be an IRBuilder; this can be changed to take an
IRBuilder or IRInt.
Delta File
+266 -124 clang/test/CodeGen/arm-mve-intrinsics/vmulq.c
+146 -68 clang/test/CodeGen/arm-mve-intrinsics/vsubq.c
+146 -68 clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
+143 -0 llvm/test/CodeGen/Thumb2/mve-intrinsics/strict-intrinsics.ll
+34 -2 clang/utils/TableGen/MveEmitter.cpp
+22 -11 llvm/lib/Target/ARM/ARMInstrMVE.td
+757 -273 4 files not shown
+795 -284 10 files

LLVM/project 44a7d2f llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/lib/Target/AArch64/GISel AArch64InstructionSelector.cpp

[AArch64] Add patterns for add(x, trunc(shift)) (#168927)

This can be lowered to a 64bit add where we only use the bottom 32bits
of the result. It is conceptually the same as
https://alive2.llvm.org/ce/z/Xfz3Rf, but with the sext replaced by an
anyext.
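
The equivalence can be checked numerically with a small sketch. The function names are hypothetical, a logical right shift is assumed (the log does not say which shift the patterns match), and `Junk` models the unspecified upper bits that an anyext may leave in the 64-bit register.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit form: add(x, trunc(y >> s)). S is assumed to be below 64.
constexpr uint32_t narrowAdd(uint32_t X, uint64_t Y, unsigned S) {
  return X + static_cast<uint32_t>(Y >> S);
}

// 64-bit form: any-extend X (upper 32 bits arbitrary), do a 64-bit add,
// then keep only the bottom 32 bits of the result.
constexpr uint32_t wideAdd(uint32_t X, uint32_t Junk, uint64_t Y, unsigned S) {
  uint64_t XAny = (static_cast<uint64_t>(Junk) << 32) | X;
  return static_cast<uint32_t>(XAny + (Y >> S));
}
```

Both forms agree for any `Junk`, because the low 32 bits of a sum depend only on the low 32 bits of its operands; carries propagate upward, never downward.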
Delta File
+51 -62 llvm/test/CodeGen/AArch64/rem-by-const.ll
+10 -11 llvm/test/CodeGen/AArch64/combine-sdiv.ll
+14 -0 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+4 -4 llvm/test/CodeGen/AArch64/srem-lkk.ll
+4 -4 llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+83 -81 5 files

LLVM/project 98751f2 flang/lib/Lower/Support ReductionProcessor.cpp

post-rebase fixes
Delta File
+2 -1 flang/lib/Lower/Support/ReductionProcessor.cpp
+2 -1 1 files

LLVM/project f86f6de clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

try to fix Windows build
Delta File
+2 -2 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1 -1 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+1 -1 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+4 -4 3 files

LLVM/project 59092bb flang/lib/Lower/Support ReductionProcessor.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

Add `data_ptr_ptr` region to `declare_reduction` op.
Delta File
+54 -28 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+40 -5 flang/lib/Lower/Support/ReductionProcessor.cpp
+37 -0 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+12 -4 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+10 -5 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+11 -2 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+164 -44 4 files not shown
+183 -47 10 files

LLVM/project e424d81 flang/lib/Lower/Support ReductionProcessor.cpp

review comments, Tom
Delta File
+2 -3 flang/lib/Lower/Support/ReductionProcessor.cpp
+2 -3 1 files

LLVM/project eeeed4a llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

review comments, Michael
Delta File
+4 -4 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+2 -2 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+6 -6 2 files

LLVM/project 6f1108e flang/lib/Lower/Support ReductionProcessor.cpp, llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h

[OpenMP][flang] Add initial support for by-ref reductions on the GPU

Adds initial support for GPU by-ref reductions. In particular, this diff
adds support for reductions on scalar allocatables where reductions
happen on loops nested in `target` regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target

    [12 lines not shown]
Delta File
+126 -35 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+92 -0 mlir/test/Target/LLVMIR/allocatable_gpu_reduction.mlir
+40 -13 llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+20 -4 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+9 -1 flang/lib/Lower/Support/ReductionProcessor.cpp
+4 -4 mlir/test/Target/LLVMIR/omptarget-multi-reduction.mlir
+291 -57 28 files not shown
+327 -90 34 files

LLVM/project 675dc35 .github/workflows gha-codeql.yml libclang-abi-tests.yml

Update [Github] Update GHA Dependencies (#169257)

This PR contains the following updates:

| Package | Type | Update | Change | Pending |
|---|---|---|---|---|
| ghcr.io/llvm/ci-ubuntu-24.04-abi-tests | container | digest | `f80125c` -> `9138b6a` | |
| [github/codeql-action](https://redirect.github.com/github/codeql-action) | action | patch | `v4.31.3` -> `v4.31.4` | `v4.31.5` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

    [58 lines not shown]
Delta File
+2 -2 .github/workflows/gha-codeql.yml
+2 -2 .github/workflows/libclang-abi-tests.yml
+2 -2 .github/workflows/llvm-abi-tests.yml
+1 -1 .github/workflows/scorecard.yml
+7 -7 4 files

LLVM/project c25e0d3 llvm/test/Transforms/LoopVectorize single-early-exit-cond-poison.ll, llvm/test/Transforms/LoopVectorize/AArch64 transform-narrow-interleave-to-widen-memory-derived-ivs.ll sve-inductions-unusual-types.ll

[VPlan] Simplify x + 0 -> x (#169394)

Delta File
+26 -50 llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll
+17 -18 llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-derived-ivs.ll
+3 -7 llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
+2 -4 llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll
+1 -5 llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
+2 -4 llvm/test/Transforms/LoopVectorize/single-early-exit-cond-poison.ll
+51 -88 2 files not shown
+56 -92 8 files

LLVM/project 488ed96 orc-rt/unittests SessionTest.cpp

[orc-rt] Remove stray debugging output. NFCI. (#169451)

Delta File
+0 -4 orc-rt/unittests/SessionTest.cpp
+0 -4 1 files

LLVM/project 28fde68 flang/lib/Semantics check-omp-loop.cpp check-omp-structure.cpp, flang/test/Semantics/OpenMP target-teams-nesting.f90

[Flang] - Enhance testing for strictly-nested teams in target regions. (#168437)

This patch enhances the semantics test for checking that teams
directives are strictly nested inside target directives.

Fixes https://github.com/llvm/llvm-project/issues/153173
Delta File
+20 -0 flang/test/Semantics/OpenMP/target-teams-nesting.f90
+9 -0 flang/lib/Semantics/check-omp-loop.cpp
+7 -0 flang/lib/Semantics/check-omp-structure.cpp
+36 -0 3 files

LLVM/project d92d501 clang/lib/CodeGen CGOpenMPRuntimeGPU.cpp, llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

try to fix Windows build
Delta File
+2 -2 mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+1 -1 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+1 -1 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+4 -4 3 files

LLVM/project 9c2d5e2 llvm/lib/Target/Mips MipsISelLowering.cpp MipsISelLowering.h, llvm/test/CodeGen/Mips fp-strict-fcmp.ll

[Mips] Set custom lowering for STRICT_FSETCC/STRICT_FSETCCS ops. (#168303)

Delta File
+586 -0 llvm/test/CodeGen/Mips/fp-strict-fcmp.ll
+28 -1 llvm/lib/Target/Mips/MipsISelLowering.cpp
+1 -0 llvm/lib/Target/Mips/MipsISelLowering.h
+615 -1 3 files