LLVM/project dbc3667flang/test/Semantics/OpenMP declare-target-flags.f90 declare-target-modfile.f90

Add common block to tests
DeltaFile
+21-14flang/test/Semantics/OpenMP/declare-target-flags.f90
+9-0flang/test/Semantics/OpenMP/declare-target-modfile.f90
+30-142 files

LLVM/project 32564b2clang/lib/Driver/ToolChains Clang.cpp, flang/lib/Optimizer/HLFIR/Transforms OptimizedBufferization.cpp

Merge branch 'users/hvdijk/dxilprettyprinter-ir-printing' into users/hvdijk/directx-delay-converting-debug-info
DeltaFile
+38-1libc/test/src/__support/FPUtil/bfloat16_test.cpp
+30-0flang/test/HLFIR/opt-bufferization-eval_in_mem.fir
+23-5flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+13-12clang/lib/Driver/ToolChains/Clang.cpp
+13-1libc/src/__support/FPUtil/bfloat16.h
+117-195 files

LLVM/project a224465clang/lib/Driver/ToolChains Clang.cpp, flang/lib/Optimizer/HLFIR/Transforms OptimizedBufferization.cpp

Merge branch 'users/hvdijk/aaw-emitmdnodeannot' into users/hvdijk/dxilprettyprinter-ir-printing
DeltaFile
+38-1libc/test/src/__support/FPUtil/bfloat16_test.cpp
+30-0flang/test/HLFIR/opt-bufferization-eval_in_mem.fir
+23-5flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+13-12clang/lib/Driver/ToolChains/Clang.cpp
+13-1libc/src/__support/FPUtil/bfloat16.h
+117-195 files

LLVM/project 4cee4eeclang/lib/Driver/ToolChains Clang.cpp, flang/lib/Optimizer/HLFIR/Transforms OptimizedBufferization.cpp

Merge branch 'main' into users/hvdijk/aaw-emitmdnodeannot
DeltaFile
+38-1libc/test/src/__support/FPUtil/bfloat16_test.cpp
+30-0flang/test/HLFIR/opt-bufferization-eval_in_mem.fir
+23-5flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+13-12clang/lib/Driver/ToolChains/Clang.cpp
+13-1libc/src/__support/FPUtil/bfloat16.h
+117-195 files

LLVM/project afa43b5llvm/test/Transforms/LoopInterchange call-instructions.ll

update another test
DeltaFile
+7-3llvm/test/Transforms/LoopInterchange/call-instructions.ll
+7-31 files

LLVM/project ef4fb18flang/lib/Optimizer/HLFIR/Transforms OptimizedBufferization.cpp, flang/test/HLFIR opt-bufferization-eval_in_mem.fir

[flang][hlfir] Resolve shape_of users when bufferizing eval_in_mem (#201214)

A follow-up to #197814.

Example:

```fortran
bmat = matmul(mat, mat)   ! bmat is allocatable
```

In this code, `SeparateAllocatableAssign` sizes the reallocation with an
`hlfir.shape_of` of the RHS. Once the `matmul` is lowered to
`hlfir.eval_in_mem`, that `shape_of` is an extra user, so
`EvaluateIntoMemoryAssignBufferization` erases the `eval_in_mem` while
it's still used, hitting a `use-after-erase` assertion at `-O2`.

Fix: in `OptimizedBufferization`, redirect a `shape_of` user to the
`eval_in_mem`'s shape operand before erasing it.
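
The redirect-before-erase step can be sketched on a toy use-def graph. This is only an illustration: `Op`, `replaceOperand`, and `forwardShapeOfUsers` are hypothetical names and not MLIR or HLFIR APIs, and the actual patch may handle more cases.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy use-def model. These types are hypothetical and are not MLIR classes.
struct Op {
  std::string name;
  std::vector<Op *> operands; // values this op reads
  std::vector<Op *> users;    // ops that read this op's result
};

// Make `user` read `to` wherever it read `from`, keeping user lists in sync.
inline void replaceOperand(Op &user, Op &from, Op &to) {
  for (Op *&operand : user.operands) {
    if (operand == &from) {
      operand = &to;
      to.users.push_back(&user);
    }
  }
  from.users.erase(std::remove(from.users.begin(), from.users.end(), &user),
                   from.users.end());
}

// Forward the users of every shape_of that reads `evalInMem` to
// `shapeOperand`, then detach the dead shape_of. Returns true when
// `evalInMem` has no users left and is therefore safe to erase.
inline bool forwardShapeOfUsers(Op &evalInMem, Op &shapeOperand) {
  std::vector<Op *> users = evalInMem.users; // copy: the list is mutated below
  for (Op *user : users) {
    if (user->name != "shape_of")
      continue;
    std::vector<Op *> shapeUsers = user->users;
    for (Op *shapeUser : shapeUsers)
      replaceOperand(*shapeUser, *user, shapeOperand);
    user->operands.clear();
    evalInMem.users.erase(
        std::remove(evalInMem.users.begin(), evalInMem.users.end(), user),
        evalInMem.users.end());
  }
  return evalInMem.users.empty();
}
```

In this model, erasing the `eval_in_mem` is only attempted once `forwardShapeOfUsers` returns true, which mirrors the ordering the fix describes.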
DeltaFile
+30-0flang/test/HLFIR/opt-bufferization-eval_in_mem.fir
+23-5flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+53-52 files

LLVM/project a2369b9clang/lib/Driver/ToolChains Clang.cpp

[Clang] Fix leftover use of old LTO path (#201360)

Summary:
This was accidentally missed when I merged the refactor because it
showed up after I made the PR and didn't have any merge conflicts I
noticed.
DeltaFile
+13-12clang/lib/Driver/ToolChains/Clang.cpp
+13-121 files

LLVM/project c689c16libc/src/__support/FPUtil bfloat16.h, libc/test/src/__support/FPUtil bfloat16_test.cpp

[libc] Add compound assignment operator overloads for BFloat16 (#201301)

The current BFloat16 type has the normal operator overloads `+`, `-`, `=`,
`!=`, `*`, and `/`.
`*=` was added later, after a function failure, in
https://github.com/llvm/llvm-project/pull/182882
For completeness, the remaining compound assignment operators `/=`, `+=`,
and `-=` are added, along with some smoke tests.
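
A minimal sketch of the pattern, assuming the compound forms are written in terms of the existing binary operators. `Num` is a hypothetical stand-in that wraps a `float`; it is not the libc `BFloat16` class and does not model its 16-bit storage or rounding.

```cpp
// Hypothetical stand-in type: wraps a float instead of a 16-bit payload.
struct Num {
  float value;

  Num operator+(Num other) const { return Num{value + other.value}; }
  Num operator-(Num other) const { return Num{value - other.value}; }
  Num operator*(Num other) const { return Num{value * other.value}; }
  Num operator/(Num other) const { return Num{value / other.value}; }
  bool operator==(Num other) const { return value == other.value; }
  bool operator!=(Num other) const { return !(*this == other); }

  // Compound forms reuse the binary operators and return *this by
  // reference so that chained expressions behave as for built-in types.
  Num &operator+=(Num other) { return *this = *this + other; }
  Num &operator-=(Num other) { return *this = *this - other; }
  Num &operator*=(Num other) { return *this = *this * other; }
  Num &operator/=(Num other) { return *this = *this / other; }
};
```

Delegating to the binary operators keeps the arithmetic in one place, so a smoke test only has to check that each compound form agrees with its binary counterpart.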
DeltaFile
+38-1libc/test/src/__support/FPUtil/bfloat16_test.cpp
+13-1libc/src/__support/FPUtil/bfloat16.h
+51-22 files

LLVM/project 6c5b5fdllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-load-global.mir

Merge branch 'users/hvdijk/dxilprettyprinter-ir-printing' into users/hvdijk/directx-delay-converting-debug-info
DeltaFile
+13,610-111llvm/test/CodeGen/X86/clmul-vector.ll
+5,590-5,510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+10,469-10llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir
+2,241-2,241llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/arithmetic.test
+1,831-1,831llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/fp.test
+1,541-1,541llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/vlseg-vsseg.test
+35,282-11,2444,290 files not shown
+202,249-99,7944,296 files

LLVM/project aca8961llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-load-global.mir

Merge branch 'users/hvdijk/aaw-emitmdnodeannot' into users/hvdijk/dxilprettyprinter-ir-printing
DeltaFile
+13,610-111llvm/test/CodeGen/X86/clmul-vector.ll
+5,590-5,510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+10,469-10llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir
+2,241-2,241llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/arithmetic.test
+1,831-1,831llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/fp.test
+1,541-1,541llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/vlseg-vsseg.test
+35,282-11,2444,290 files not shown
+202,249-99,7944,296 files

LLVM/project 1fe0072llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-load-global.mir

Merge branch 'main' into users/hvdijk/aaw-emitmdnodeannot
DeltaFile
+13,610-111llvm/test/CodeGen/X86/clmul-vector.ll
+5,590-5,510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+10,469-10llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir
+2,241-2,241llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/arithmetic.test
+1,831-1,831llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/fp.test
+1,541-1,541llvm/test/tools/llvm-mca/RISCV/SiFiveP400/rvv/vlseg-vsseg.test
+35,282-11,2444,290 files not shown
+202,249-99,7944,296 files

LLVM/project e5332f3llvm/include/llvm/MC MCGOFFObjectWriter.h, llvm/lib/MC GOFFObjectWriter.cpp

[SystemZ][GOFF] Implement reset() for GOFFObjectWriter (#201197)

The reset() method is used to free memory before the object is
destructed or reused. This change adds this functionality to the GOFF
writer.
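
As a rough illustration of what a `reset()` of this kind does, here is a toy writer. `WriterSketch` and its members are hypothetical and do not reflect the fields of the real GOFF writer.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer with per-object state that must not leak into reuse.
struct WriterSketch {
  std::vector<std::string> sections;
  unsigned nextId = 1;

  void addSection(std::string name) {
    sections.push_back(std::move(name));
    ++nextId;
  }

  // Drop accumulated state and release the storage so that the object
  // can be reused for another output file.
  void reset() {
    std::vector<std::string>().swap(sections); // frees the buffer
    nextId = 1;
  }
};
```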
DeltaFile
+7-0llvm/lib/MC/GOFFObjectWriter.cpp
+2-0llvm/include/llvm/MC/MCGOFFObjectWriter.h
+9-02 files

LLVM/project 911eddeflang/lib/Frontend CompilerInvocation.cpp, flang/test lit.cfg.py

[Flang] Fix omp_lib.h location and search path (#201104)

Before this PR, omp_lib.h is emitted to `${PREFIX}/include` or
`${PREFIX}/lib/clang/<version>/include` (install prefix) and
`${PREFIX}/runtime/src/omp_lib.h` (builddir prefix). It is never found
there because the driver only adds `${PREFIX}/include/flang/OpenMP` to
the include path.

Fix the `omp_lib.h` include by using the same mechanism as the
omp_lib.mod; that is, move it to
`${PREFIX}/lib/clang/<version>/finclude/flang/<target-triple>`. The
search path is already added by the driver via
`-fintrinsics-modules-path`. Although omp_lib.h currently
does not contain anything target-specific, it could do so in the future
and I don't think it is worth the effort to add a mechanism without the
target triple. It should also be consistent with omp_lib.mod.

The changes in detail consist of:
1. Move the omp_lib.h output in the builddir to

    [16 lines not shown]
DeltaFile
+25-18flang/test/Driver/include-omp-header.f90
+0-14flang/lib/Frontend/CompilerInvocation.cpp
+0-8flang/test/lit.cfg.py
+3-3openmp/module/CMakeLists.txt
+28-434 files

LLVM/project 70d62e1clang/include/clang/Driver Driver.h ToolChain.h, clang/lib/Driver Driver.cpp ToolChain.cpp

[Clang] Rework LTO mode selection to be a Toolchain property (#201155)

Summary:
Currently, the LTO mode is a property of the Driver, which makes sense
because it is used to set up phases. However, we currently have `-flto`
and `-foffload-lto`, which is a split that doesn't fully work with the
full context of a heterogeneous compilation as it is 'all-or-nothing'.

This PR seeks to be mostly NFC for now, just moving the queries to a
per-toolchain interface rather than the static driver mode setting we
have right now. The *single* use of this before ToolChains are created
is for the WebAssembly toolchain to set an include path. This is now
just a direct check on the flag, which is consistent. In the future they
could shift to fat LTO objects as well.

The main goal for the PR is to allow the GPU / Offloading toolchains to
specify their "real" LTO behavior. Right now SPIR-V and AMDGCN both
default to LTO, but rather than re-use the LTO handling we hack through
the driver phases to override it. Allowing this split would let us

    [6 lines not shown]
DeltaFile
+39-75clang/lib/Driver/Driver.cpp
+53-0clang/lib/Driver/ToolChain.cpp
+3-29clang/include/clang/Driver/Driver.h
+13-12clang/lib/Driver/ToolChains/Clang.cpp
+13-0clang/include/clang/Driver/ToolChain.h
+7-5clang/lib/Driver/ToolChains/WebAssembly.cpp
+128-12121 files not shown
+175-18027 files

LLVM/project 52cf94allvm/lib/Target/AMDGPU AMDGPUMemoryUtils.cpp AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU amdgpu-late-codegenprepare.ll widen_extending_scalar_loads.ll

[AMDGPU] Drop !noundef when widening sub-DWORD constant loads (#201085)

The widened i32 load reads bytes outside the original sub-DWORD load, so
the new op cannot claim !noundef.
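
The reasoning can be modeled on plain bytes: a guarantee about the narrow load says nothing about its neighbors, yet the widened value depends on them. The helpers below are hypothetical, and the byte order is fixed by construction rather than taken from the host.

```cpp
#include <cstdint>

// Narrow load: reads only the byte at `index`.
inline std::uint8_t loadNarrow(const unsigned char *dword, unsigned index) {
  return dword[index];
}

// Widened load: combines all four bytes of the enclosing DWORD
// (byte 0 is placed in the low bits by construction).
inline std::uint32_t loadWide(const unsigned char *dword) {
  return static_cast<std::uint32_t>(dword[0]) |
         static_cast<std::uint32_t>(dword[1]) << 8 |
         static_cast<std::uint32_t>(dword[2]) << 16 |
         static_cast<std::uint32_t>(dword[3]) << 24;
}

// Recover the originally requested byte from the widened value.
inline std::uint8_t extractByte(std::uint32_t wide, unsigned index) {
  return static_cast<std::uint8_t>(wide >> (8 * index));
}
```

Two buffers that agree on the requested byte but differ elsewhere give the same narrow result and different wide results, so a property asserted for the narrow value does not carry over to the wide one.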
DeltaFile
+59-0llvm/test/CodeGen/AMDGPU/amdgpu-late-codegenprepare.ll
+17-0llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+16-0llvm/test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
+7-8llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+5-0llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+2-2llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
+106-106 files

LLVM/project 0fd2402offload/ci hip-tpl.py

[HIP] Remove explicit compiler-rt from bot recipe (#201329)

The same change was done to the AnnotatedBuilder script recently. Let's
keep them in sync.
https://github.com/llvm/llvm-zorg/pull/861
DeltaFile
+0-1offload/ci/hip-tpl.py
+0-11 files

LLVM/project cf1e507llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange function-attr.ll

[LoopInterchange] Bail out if function that may diverge is called
DeltaFile
+16-40llvm/test/Transforms/LoopInterchange/function-attr.ll
+4-2llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+20-422 files

LLVM/project 1a3e053llvm/lib/Transforms/Scalar LoopInterchange.cpp

address review
DeltaFile
+2-2llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+2-21 files

LLVM/project b1d5d7ellvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize/AArch64 conditional-branches-cost.ll

[VPlan] Don't use the legacy cost model for loop conditions (#156864)

The current behaviour of using the legacy cost model for instructions
that compute the loop exit condition gets the wrong result when the loop
has been transformed to use a different exit condition, e.g. when we have
tail-folded predicated vectorization the exit condition is based on the
predicate vector.

Fix this by adding cost computation for BranchOnCount and removing the
restriction on computing the cost for scalar ICmp/FCmp, and removing the
use of the legacy cost model for loop exit conditions.

This causes quite a lot of changes to expected output in tests. Some of
these are just changes to the -debug output, others are choosing a
different VF due to previously over or under-estimating the cost, and in
others the minimum trip count has changed as we now compute the cost for
compares in the middle block.
DeltaFile
+35-35llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+20-20llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+0-39llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+27-9llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
+22-12llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+8-16llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
+112-1316 files not shown
+150-17812 files

LLVM/project 906968bllvm/test/Transforms/LoopInterchange function-attr.ll

[LoopInterchange] Add tests for func attributes called in loops (NFC) (#201331)

LoopInterchange has special handling for call instructions. In general,
loops that contain call instructions are not legal to interchange, but
if a call satisfies certain conditions, we allow the interchange to
proceed. Currently, the legality checker only verifies whether the call
reads or writes memory. However, as pointed out in
https://github.com/llvm/llvm-project/pull/200828#issuecomment-4593914293,
we also need to ensure that the call does not diverge. Otherwise, an
illegal interchange may occur.
This patch adds test cases that demonstrate the issue, which will be
fixed in a follow-up patch.
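
To see why a call that may not return blocks interchange, consider the sketch below. `mayDiverge` and `run` are hypothetical, and the `throw` is only a stand-in for any call that does not return normally; the callee touches no memory that the loop reads or writes.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

using Trace = std::vector<std::pair<int, int>>;

// Hypothetical callee: no memory effects visible to the loop nest, but it
// does not always return to its caller.
inline void mayDiverge(int i, int j) {
  if (i == 1 && j == 0)
    throw std::runtime_error("diverged");
}

// Runs an n x n loop nest in the original (i outer, j inner) or the
// interchanged (j outer, i inner) order and records each completed iteration.
inline Trace run(bool interchanged, int n) {
  Trace trace;
  try {
    for (int outer = 0; outer < n; ++outer) {
      for (int inner = 0; inner < n; ++inner) {
        int i = interchanged ? inner : outer;
        int j = interchanged ? outer : inner;
        mayDiverge(i, j);
        trace.emplace_back(i, j); // observable side effect
      }
    }
  } catch (const std::runtime_error &) {
    // Execution of the loop nest stops here.
  }
  return trace;
}
```

With `n == 2`, the original order completes iterations (0,0) and (0,1) before the call fails to return, while the interchanged order completes only (0,0). The observable behavior differs even though the call neither reads nor writes memory.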
DeltaFile
+172-2llvm/test/Transforms/LoopInterchange/function-attr.ll
+172-21 files

LLVM/project d67b41allvm/test/CodeGen/X86 clmul-vector.ll

[X86] Add clmul vector allones baseline tests for #200592 (#201321)
DeltaFile
+13,610-111llvm/test/CodeGen/X86/clmul-vector.ll
+13,610-1111 files

LLVM/project d3b2471flang/lib/Semantics resolve-directives.cpp

Update resolve-directives.cpp
DeltaFile
+2-0flang/lib/Semantics/resolve-directives.cpp
+2-01 files

LLVM/project 799af5dllvm/lib/Target/AArch64/GISel AArch64LegalizerInfo.cpp, llvm/test/CodeGen/AArch64 arm64-vcnt.ll

[AArch64][GlobalISel] Add handling for cls intrinsic (#200440)

The Neon intrinsic neon.cls wasn't linked to the generic node G_CTLS.
Add this link during legalization (LegalizeIntrinsic) so that the intrinsic lowers properly.
DeltaFile
+32-12llvm/test/CodeGen/AArch64/arm64-vcnt.ll
+2-0llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+34-122 files

LLVM/project 92e40d8flang/lib/Semantics check-omp-variant.cpp check-omp-metadirective.cpp, llvm/lib/LTO LTO.cpp

Merge branch 'main' into users/shiltian/reqd_work_group_size-verifier
DeltaFile
+0-1,898llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+754-0flang/lib/Semantics/check-omp-variant.cpp
+0-754flang/lib/Semantics/check-omp-metadirective.cpp
+694-0llvm/test/CodeGen/AMDGPU/div-rem-fast-path.ll
+0-682llvm/test/CodeGen/RISCV/rvv/vector-interleave-fixed.ll
+93-541llvm/lib/LTO/LTO.cpp
+1,541-3,875607 files not shown
+16,259-10,008613 files

LLVM/project 0cd33adflang/lib/Semantics symbol.cpp

Update symbol.cpp
DeltaFile
+1-1flang/lib/Semantics/symbol.cpp
+1-11 files

LLVM/project 27fd70fclang/include/clang/Basic LangOptions.h, clang/lib/AST ItaniumMangle.cpp RecordLayoutBuilder.cpp

[clang][NFC] Introduce `LangOptions::isCompatibleWith(ClangABI)` (#201067)

This slightly improves readability and reduces the probability of
off-by-one errors.
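
A miniature of the pattern, assuming the helper means "the selected ABI is the given version or older". The enumerators and the `LangOptionsSketch` type are hypothetical and are not the real `LangOptions` definition.

```cpp
// Hypothetical version enumeration, ordered from oldest to newest.
enum class ClangABI { Ver14, Ver15, Ver17, Latest };

struct LangOptionsSketch {
  ClangABI compat = ClangABI::Latest;

  ClangABI getClangABICompat() const { return compat; }

  // Assumed semantics: true when the selected ABI is `version` or older.
  // Call sites then read as a question instead of a raw `<=` comparison
  // whose direction and boundary are easy to get wrong.
  bool isCompatibleWith(ClangABI version) const { return compat <= version; }
};
```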
DeltaFile
+10-12clang/lib/AST/ItaniumMangle.cpp
+10-11clang/lib/AST/RecordLayoutBuilder.cpp
+8-8clang/lib/CodeGen/Targets/X86.cpp
+3-3clang/lib/Basic/TargetInfo.cpp
+3-3clang/lib/Sema/SemaDeclCXX.cpp
+4-0clang/include/clang/Basic/LangOptions.h
+38-374 files not shown
+42-4310 files

LLVM/project 036babellvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange function-attr.ll

[LoopInterchange] Bail out if function that may diverge is called
DeltaFile
+16-40llvm/test/Transforms/LoopInterchange/function-attr.ll
+4-2llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+20-422 files

LLVM/project 0385a1bllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 avx512-trunc.ll avg-mask.ll

[X86] combineSelect - fold select(c,trunc(x),y) -> X86ISD::MTRUNC(x,y,c) for non-BWI targets (#201339)

Fixes #200617
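
The fold can be checked lane by lane on a scalar model. The masked-truncate semantics used here (active lanes take the truncated value, inactive lanes keep the passthrough) are an assumption read off the operand order in the title, and all names below are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t Lanes = 4;
using Wide = std::array<std::uint32_t, Lanes>;
using Narrow = std::array<std::uint8_t, Lanes>;
using Mask = std::array<bool, Lanes>;

// select(c, trunc(x), y): truncate every lane, then pick per lane.
inline Narrow selectOfTrunc(const Mask &c, const Wide &x, const Narrow &y) {
  Narrow result{};
  for (std::size_t i = 0; i < Lanes; ++i) {
    std::uint8_t truncated = static_cast<std::uint8_t>(x[i]);
    result[i] = c[i] ? truncated : y[i];
  }
  return result;
}

// Assumed masked truncate: start from the passthrough and overwrite only
// the lanes whose mask bit is set.
inline Narrow maskedTrunc(const Wide &x, const Narrow &passthru,
                          const Mask &c) {
  Narrow result = passthru;
  for (std::size_t i = 0; i < Lanes; ++i)
    if (c[i])
      result[i] = static_cast<std::uint8_t>(x[i]);
  return result;
}
```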
DeltaFile
+9-20llvm/test/CodeGen/X86/avx512-trunc.ll
+5-6llvm/test/CodeGen/X86/avg-mask.ll
+9-0llvm/lib/Target/X86/X86ISelLowering.cpp
+23-263 files

LLVM/project 8943bfbllvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPURegisterBankInfo.cpp, llvm/test/CodeGen/AMDGPU dynamic_stackalloc.ll amdgpu-cs-chain-fp-nosave.ll

[AMDGPU] Do not scale private alloca size when using flat-scratch (#201142)

When using flat-scratch, the `scratch_load/scratch_store` instructions
scale the stack offset by the wavefront size on their own.

Scaling the alloca size by the wavefront size led to accesses outside
of the private-memory limit.
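
The arithmetic behind the fix, as a simplified sketch. `spIncrement` is a hypothetical helper and the numbers are illustrative; the real lowering involves more than a single multiplication.

```cpp
// Stack-pointer increment for a dynamic alloca of `bytesPerLane` bytes.
// Assumption: without flat-scratch the compiler scales by the wavefront
// size; with flat-scratch the scratch instructions already apply that
// scaling, so the size is used unchanged.
inline unsigned spIncrement(unsigned bytesPerLane, unsigned wavefrontSize,
                            bool flatScratch) {
  return flatScratch ? bytesPerLane : bytesPerLane * wavefrontSize;
}
```

For a 16-byte alloca on a 64-lane wavefront this gives 1024 without flat-scratch and 16 with it. Applying the 1024 increment in flat-scratch mode would have the scaling applied twice, which matches the out-of-bounds accesses described above.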
DeltaFile
+99-134llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+24-32llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-fp-nosave.ll
+20-9llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+5-16llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-uniform.ll
+16-4llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+8-8llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
+172-2031 files not shown
+176-2117 files

LLVM/project 89d4276llvm/docs/AMDGPU AMDGPUAsmGFX12.rst, llvm/lib/Target/AArch64 AArch64SystemOperands.td

Merge branch 'main' into users/kparzysz/w02-declare-target-mod
DeltaFile
+1,087-1,602llvm/docs/AMDGPU/AMDGPUAsmGFX12.rst
+1,044-1,044llvm/lib/Target/AArch64/AArch64SystemOperands.td
+516-1,543llvm/test/Transforms/GVN/PRE/rle.ll
+0-1,898llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll
+887-803llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll
+855-771llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll
+4,389-7,6612,103 files not shown
+62,627-40,0482,109 files