LLVM/project e602eab llvm/lib/Target/AMDGPU GCNHazardRecognizer.cpp, llvm/test/CodeGen/AMDGPU misched-into-wmma-hazard-shadow.mir

[AMDGPU] Allow hazard checks for WMMA co-exec

Currently we just insert V_NOP instructions; try to schedule
something into the hazard shadow instead.

This is still somewhat imprecise. For example, AdvanceCycle() will
use TII.getNumWaitStates() anyway, but in scheduling mode we are
not required to be precise. We only have to be precise in the final
hazard recognizer mode. The EmittedInstrs buffer is also limited to
MaxLookAhead, even though VALU-only hazards may never expire and
would require an unbounded buffer. That is OK: we can at least
mitigate whatever the buffer can hold. The buffer is also currently
much bigger than any VALU hazard may need.

That said, the rest of the 'fix*' functions here that use V_NOPs
can be changed the same way. This one is just the worst because it
may require up to 9 nops.
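
As a rough illustration of why filling the shadow helps, here is a toy
model (not the GCNHazardRecognizer API; the function and parameter names
are hypothetical). It assumes a hazard demands a fixed number of wait
states between the producer and a dependent consumer, and that every wait
state covered by an independent instruction is one fewer V_NOP to emit.

```cpp
#include <algorithm>
#include <cassert>

// Toy model only. `requiredWaitStates` is the gap the hazard demands;
// `independentWaitStates` is how many of those wait states the scheduler
// managed to cover with unrelated instructions placed in the shadow.
int nopsStillNeeded(int requiredWaitStates, int independentWaitStates) {
  assert(requiredWaitStates >= 0 && independentWaitStates >= 0);
  // Whatever is not covered by real work must still be padded with nops.
  return std::max(0, requiredWaitStates - independentWaitStates);
}
```

With the worst case of 9 wait states mentioned above, scheduling four
wait states of independent work would leave five nops in this model, and
nine or more would leave none.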
DeltaFile
+56 -0 llvm/test/CodeGen/AMDGPU/misched-into-wmma-hazard-shadow.mir
+6 -0 llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+62 -0 2 files

LLVM/project 76d614b llvm/lib/Analysis InstructionSimplify.cpp, llvm/test/Transforms/InstSimplify compare.ll

[InstSimplify] Extend icmp-of-add simplification to sle/sgt/sge (#168900)

When comparing additions with the same base where one has `nsw`, the
following simplification can be performed:

```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
=>
icmp slt/sgt/sle/sge C1, C2
```

Previously this was only done for `slt`. This patch extends it to the
`sgt`, `sle`, and `sge` predicates when either of the following
conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`

This patch also handles the `C1 == C2` case, which was previously
excluded.

Proof: https://alive2.llvm.org/ce/z/LtmY4f
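
As an informal sanity check alongside the Alive2 proof, the fold can be
brute-forced at 8 bits. This is only a sketch under stated assumptions:
the helper names are hypothetical, the plain `add` is modelled as
wrapping, and inputs where the `nsw` add would overflow are treated as
poison and skipped.

```cpp
#include <cassert>

// Wrap an integer into the signed 8-bit range, modelling a plain `add`.
int wrapToI8(int v) {
  int m = ((v % 256) + 256) % 256;
  assert(m >= 0 && m < 256);
  return m >= 128 ? m - 256 : m;
}

// True if C1/C2 satisfy either precondition listed in the commit message.
bool preconditionHolds(int c1, int c2) {
  return (c1 <= c2 && c1 >= 0) || (c2 <= c1 && c1 <= 0);
}

// For one x, check that all four signed predicates give the same answer
// for (x + C1) vs (x +nsw C2) as for C1 vs C2.
bool foldAgrees(int x, int c1, int c2) {
  int rhs = x + c2;
  if (rhs < -128 || rhs > 127)
    return true; // nsw violated: result is poison, nothing to check
  int lhs = wrapToI8(x + c1);
  return (lhs < rhs) == (c1 < c2) && (lhs > rhs) == (c1 > c2) &&
         (lhs <= rhs) == (c1 <= c2) && (lhs >= rhs) == (c1 >= c2);
}

// Exhaustively check every 8-bit x, C1, C2 that meets a precondition.
bool foldHoldsForAllI8() {
  for (int c1 = -128; c1 <= 127; ++c1)
    for (int c2 = -128; c2 <= 127; ++c2) {
      if (!preconditionHolds(c1, c2))
        continue;
      for (int x = -128; x <= 127; ++x)
        if (!foldAgrees(x, c1, c2))
          return false;
    }
  return true;
}
```

The preconditions matter: with `C1 = -1`, `C2 = 1`, and `x = -128`, the
first add wraps to 127 while the second gives -127, so `slt` on the sums
disagrees with `slt` on the constants.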
DeltaFile
+102 -12 llvm/test/Transforms/InstSimplify/compare.ll
+8 -8 llvm/lib/Analysis/InstructionSimplify.cpp
+110 -20 2 files

LLVM/project 04acac2 compiler-rt/test/asan/TestCases stack_container_dynamic_lib.cpp

[compiler-rt] [test] Generalize an UNSUPPORTED marking (#168858)

Don't specifically target windows-msvc - the same goes for any windows
target; mingw doesn't have dlfcn.h either.
DeltaFile
+1 -1 compiler-rt/test/asan/TestCases/stack_container_dynamic_lib.cpp
+1 -1 1 files

LLVM/project 244b230 llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-scale-to-agpr.mir

[AMDGPU] Precommit test for issue in amdgpu-rewrite-agpr-copy-mfma, (#168609)

which reassigns the scale operand from a vgpr_32 register to agpr_32,
which the instruction format does not permit. Reduced from ck.

---------

Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
Co-authored-by: theRonShark <ron.lieberman at amd.com>
DeltaFile
+33 -0 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-scale-to-agpr.mir
+33 -0 1 files

LLVM/project a3f6c43 llvm/test/Transforms/LoopVectorize/X86 replicating-load-store-costs.ll

[LV] Add a low-trip count test without folding the tail.

Add a low trip count test that is currently vectorized but unprofitable,
for https://github.com/llvm/llvm-project/issues/167858.
DeltaFile
+124 -0 llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+124 -0 1 files

LLVM/project 1056584 llvm/lib/Transforms/Instrumentation MemorySanitizer.cpp, llvm/test/Instrumentation/MemorySanitizer/X86 avx2-intrinsics-x86.ll avx-intrinsics-x86.ll

[msan] Fix handling of 256-bit hadd/hsub instructions (#168121)

These horizontal add/sub instructions are currently handled by
adding/subtracting tuples of the first operand, followed by tuples of
the second operand. This is not the correct semantics for the 256-bit
instructions: they process the first half of the first operand, then the
first half of the second operand, then the second half of the first
operand, and finally the second half of the second operand (trust me bro
[*]).

This patch fixes the issue by applying the "shards" functionality that
was added in https://github.com/llvm/llvm-project/pull/167954, to handle
the top and bottom 128-bit "shards" in turn.

[*] clang/test/CodeGen/X86/avx2-builtins.c:
```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
    (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
    (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
    30,70,20,60,110,150,100,140));
```
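
The ordering described above can be written out as a scalar model. This
is a sketch, not MemorySanitizer code: `hadd256` and `V8` are hypothetical
names, and the model only mirrors the per-128-bit-lane behaviour implied
by the clang test quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<int32_t, 8>;

// Wrapping 32-bit add, done in unsigned arithmetic to avoid signed overflow.
int32_t wrapAdd(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) +
                              static_cast<uint32_t>(y));
}

// Scalar model of a 256-bit horizontal add on 32-bit elements. Each
// 128-bit half ("shard") is processed independently: pairs from `a`
// first, then pairs from `b`, within the same half.
V8 hadd256(const V8 &a, const V8 &b) {
  V8 r{};
  for (int shard = 0; shard < 2; ++shard) {
    int base = shard * 4; // first element index of this 128-bit half
    r[base + 0] = wrapAdd(a[base + 0], a[base + 1]);
    r[base + 1] = wrapAdd(a[base + 2], a[base + 3]);
    r[base + 2] = wrapAdd(b[base + 0], b[base + 1]);
    r[base + 3] = wrapAdd(b[base + 2], b[base + 3]);
  }
  return r;
}
```

Feeding in the two vectors from the test above yields
30, 70, 20, 60, 110, 150, 100, 140, matching the expected values there.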
DeltaFile
+12 -12 llvm/test/Instrumentation/MemorySanitizer/X86/avx2-intrinsics-x86.ll
+12 -12 llvm/test/Instrumentation/MemorySanitizer/i386/avx2-intrinsics-i386.ll
+8 -8 llvm/test/Instrumentation/MemorySanitizer/i386/avx-intrinsics-i386.ll
+8 -8 llvm/test/Instrumentation/MemorySanitizer/X86/avx-intrinsics-x86.ll
+4 -8 llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+44 -48 5 files

LLVM/project 1b8a4aa flang/include/flang/Optimizer/Builder CUFCommon.h, flang/lib/Optimizer/Builder CUFCommon.cpp

[flang][cuda] Extract element count computation into helper function (#168937)

This patch extracts the common logic for computing array element counts
from shape operands into a reusable helper function in CUFCommon.
DeltaFile
+41 -0 flang/lib/Optimizer/Builder/CUFCommon.cpp
+2 -25 flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
+4 -0 flang/include/flang/Optimizer/Builder/CUFCommon.h
+47 -25 3 files

LLVM/project 7e43715 llvm/test/ExecutionEngine/JITLink/ppc64 ELF_ppc64_relocations.s

[JITLINK] Disable ELF_ppc64_relocations.s on SystemZ host (#168939)

Mark ELF_ppc64_relocations.s as unsupported on SystemZ because of a
cross-build issue related to using dlsym for host symbols.
The test fails to resolve __tls_get_addr on a SystemZ host.

Co-authored-by: anoopkg6 <anoopkg6 at github.com>
DeltaFile
+4 -0 llvm/test/ExecutionEngine/JITLink/ppc64/ELF_ppc64_relocations.s
+4 -0 1 files

LLVM/project 91e777f libc/cmake/caches armv6m-none-eabi.cmake armv7em-none-eabi.cmake

[libc] Removed unused flags from baremetal cache files (#168942)

These flags are not needed for building libc.
DeltaFile
+1 -1 libc/cmake/caches/armv6m-none-eabi.cmake
+1 -1 libc/cmake/caches/armv7em-none-eabi.cmake
+1 -1 libc/cmake/caches/armv7m-none-eabi.cmake
+3 -3 3 files

LLVM/project 600917e llvm/test/MC/AMDGPU gfx11_asm_vop1.s, llvm/utils update_mc_test_checks.py

[Utils][update_llc_test_checks] Support generating asm tests from templates.

Reduces the pain of manually editing tests when applying the same
changes across multiple instructions and keeping them consistent.
DeltaFile
+472 -150 llvm/test/MC/AMDGPU/gfx11_asm_vop1.s
+94 -8 llvm/utils/update_mc_test_checks.py
+566 -158 2 files

LLVM/project 318e7df llvm/docs LangRef.rst

[LangRef] Docs: more detailed categories for Vector intrinsics (#168924)

Fixes: https://github.com/llvm/llvm-project/issues/167132
DeltaFile
+45 -39 llvm/docs/LangRef.rst
+45 -39 1 files

LLVM/project 1552efe utils/bazel MODULE.bazel.lock .bazelversion

[bazel] Bump to 8.4.2 (#168933)

Just staying up to date
DeltaFile
+17 -11 utils/bazel/MODULE.bazel.lock
+1 -1 utils/bazel/.bazelversion
+18 -12 2 files

LLVM/project 06562e2 clang/lib/Sema SemaPPC.cpp, clang/test/CodeGen/PowerPC builtins-amo-err.c

Add 64-bit target check for signed AMO builtins
DeltaFile
+6 -6 clang/test/CodeGen/PowerPC/builtins-amo-err.c
+2 -0 clang/lib/Sema/SemaPPC.cpp
+8 -6 2 files

LLVM/project 35c1bfd mlir/include/mlir/Target/LLVMIR ModuleImport.h, mlir/lib/Target/LLVMIR ModuleImport.cpp ConvertFromLLVMIR.cpp

[mlir][llvm] Handle debug record import edge cases

This commit enables the direct import of debug records by default and
fixes issues with two edge cases:
- Detect early on if the address operand is an argument list
  (calling getAddress() for argument lists asserts).
- Use getAddress() to check if the address operand is null, which
  means the address operand is an empty metadata node, which is
  currently not supported.

It also adds support for debug label records.

This is a follow-up to:
https://github.com/llvm/llvm-project/pull/167812
DeltaFile
+84 -44 mlir/lib/Target/LLVMIR/ModuleImport.cpp
+7 -6 mlir/include/mlir/Target/LLVMIR/ModuleImport.h
+5 -7 mlir/test/Target/LLVMIR/Import/import-failure.ll
+1 -1 mlir/lib/Target/LLVMIR/ConvertFromLLVMIR.cpp
+97 -58 4 files

LLVM/project 35cbf2d llvm/include/llvm/CodeGen SDPatternMatch.h, llvm/unittests/CodeGen SelectionDAGPatternMatchTest.cpp

DAG: Handle poison in m_Undef
DeltaFile
+3 -1 llvm/include/llvm/CodeGen/SDPatternMatch.h
+2 -0 llvm/unittests/CodeGen/SelectionDAGPatternMatchTest.cpp
+5 -1 2 files

LLVM/project 930066f utils/bazel MODULE.bazel.lock MODULE.bazel, utils/bazel/llvm-project-overlay/clang BUILD.bazel

[bazel] Add explicit dep on protobuf (#168928)

This is required for correctly loading the protobuf rules. It's
possible we could drop the version here to a lower version, as long as
that version supports the versions of bazel we support. I picked this
because it is the current version being used by bazel 8.0.0 (which is
defined in the .bazelversion). Users can override this in their project
anyway if they need an older one.
DeltaFile
+349 -0 utils/bazel/MODULE.bazel.lock
+9 -8 utils/bazel/MODULE.bazel
+2 -0 utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+360 -8 3 files

LLVM/project d015af2 clang/include/clang/CIR/Sema CIRAnalysisKind.h FallThroughWarning.h, clang/lib/CIR/FrontendAction CIRGenAction.cpp

Skeleton
DeltaFile
+170 -0 clang/lib/CIR/Sema/FallThroughWarning.cpp
+117 -0 clang/include/clang/CIR/Sema/CIRAnalysisKind.h
+67 -0 clang/lib/CIR/Sema/CIRAnalysisKind.cpp
+66 -0 clang/include/clang/CIR/Sema/FallThroughWarning.h
+36 -0 clang/lib/CIR/FrontendAction/CIRGenAction.cpp
+19 -0 clang/lib/CIR/Sema/CMakeLists.txt
+475 -0 4 files not shown
+485 -0 10 files

LLVM/project 01a4177 llvm/include/llvm/Frontend/Directive DirectiveBase.td, llvm/include/llvm/Frontend/OpenMP OMP.td

[OpenMP] Introduce "loop sequence" as directive association

OpenMP 6.0 introduced a `fuse` directive, and with it a "loop sequence"
as the associated code. What used to be "loop association" has become
"loop-nest association".

Rename Association::Loop to LoopNest, add Association::LoopSeq to
represent the "loop sequence" association.

Change the association of fuse from "block" to "loop sequence".
DeltaFile
+12 -12 llvm/include/llvm/Frontend/OpenMP/OMP.td
+10 -9 llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
+3 -2 llvm/include/llvm/Frontend/Directive/DirectiveBase.td
+3 -2 llvm/test/TableGen/directive1.td
+3 -2 llvm/test/TableGen/directive2.td
+2 -2 llvm/lib/Frontend/OpenMP/OMP.cpp
+33 -29 3 files not shown
+38 -34 9 files

LLVM/project 88055b3 clang-tools-extra/clang-doc ClangDoc.cpp Representation.h

[clang-doc][NFC] Remove unused headers (#168806)

Removes unused headers or replaces them with headers that directly
provide the symbol instead. For example, `Serialize.h` included `AST.h`,
but it was actually `Serialize.cpp` that needed concept expressions, so
now it includes just `ExprConcepts.h`.
DeltaFile
+0 -4 clang-tools-extra/clang-doc/ClangDoc.cpp
+1 -3 clang-tools-extra/clang-doc/Representation.h
+0 -4 clang-tools-extra/clang-doc/BitcodeWriter.h
+0 -3 clang-tools-extra/clang-doc/Serialize.h
+2 -0 clang-tools-extra/clang-doc/Serialize.cpp
+0 -2 clang-tools-extra/clang-doc/ClangDoc.h
+3 -16 5 files not shown
+4 -23 11 files

LLVM/project be8c7c9 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel irtranslator-amdgpu_kernel.ll regbankselect-widen-scalar-loads.mir

AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads

This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.
DeltaFile
+216 -216 llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-amdgpu_kernel.ll
+76 -76 llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-widen-scalar-loads.mir
+73 -73 llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-load.mir
+22 -9 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+20 -7 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+8 -8 llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-split-scalar-load-metadata.mir
+415 -389 4 files not shown
+433 -391 10 files

LLVM/project 954dc93 llvm/lib/Target/AMDGPU SIISelLowering.cpp

AMDGPU: Handle invariant when lowering global loads

Global with invariant should be treated identically to
constant.
DeltaFile
+1 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1 -1 1 files

LLVM/project 7acfbc2 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlan.h, llvm/test/Transforms/LoopVectorize pointer-induction-index-width-smaller-than-iv-width.ll

[VPlan] Remove PtrIV::IsScalarAfterVectorization, use VPlan analysis. (#168289)

Remove `VPWidenPointerInductionRecipe::IsScalarAfterVectorization` and
replace it with `onlyScalarValuesUsed`. This removes the need to carry
state from the legacy cost model through VPlan, and the VPlan-based
analysis gives more accurate results, avoiding a number of extracts.

PR: https://github.com/llvm/llvm-project/pull/168289
DeltaFile
+11 -18 llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
+10 -7 llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll
+5 -10 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+3 -8 llvm/lib/Transforms/Vectorize/VPlan.h
+4 -7 llvm/test/Transforms/LoopVectorize/X86/interleave-opaque-pointers.ll
+1 -5 llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
+34 -55 2 files not shown
+36 -58 8 files

LLVM/project 155a7d8 llvm/include/llvm/Support JSON.h

[Support] Add vector::erase to JSON::Array (#168835)

DeltaFile
+2 -0 llvm/include/llvm/Support/JSON.h
+2 -0 1 files

LLVM/project 1c8ddb0 llvm/test/CodeGen/AMDGPU load-global-invariant.ll

AMDGPU: Add baseline test for split/widen invariant loads
DeltaFile
+70 -0 llvm/test/CodeGen/AMDGPU/load-global-invariant.ll
+70 -0 1 files

LLVM/project b3d1e92 compiler-rt/test/asan/TestCases stack_container_dynamic_lib.cpp disable_container_overflow_checks.cpp

[ASAN] Disable broken __SANITIZER_DISABLE_CONTAINER_OVERFLOW__ tests on iOS/Android (#168821)

The tests added by #163468 appear to be broken due to lack of libcxx support (?).

Marking unsupported everywhere for now since it passes on some platforms and fails on others, and
I don't know the full list.

Android fail: https://lab.llvm.org/buildbot/#/builders/186/builds/14106
DeltaFile
+2 -0 compiler-rt/test/asan/TestCases/stack_container_dynamic_lib.cpp
+1 -0 compiler-rt/test/asan/TestCases/disable_container_overflow_checks.cpp
+3 -0 2 files

LLVM/project bd003aa llvm/lib/Transforms/Instrumentation BoundsChecking.cpp

correct logic

Created using spr 1.3.7
DeltaFile
+1 -1 llvm/lib/Transforms/Instrumentation/BoundsChecking.cpp
+1 -1 1 files

LLVM/project 54d9d4d llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 commutable-node-with-non-sched-parent.ll

[SLP]Check if the non-schedulable phi parent node has unique operands

Need to check if the non-schedulable phi parent node has unique
operands, if the incoming node has copyables, and the node is
commutative. Otherwise, there might be issues with the correct
calculation of the dependencies.

Fixes #168589
DeltaFile
+54 -0 llvm/test/Transforms/SLPVectorizer/X86/commutable-node-with-non-sched-parent.ll
+28 -8 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+82 -8 2 files

LLVM/project 777935c utils/bazel WORKSPACE

[bazel] Delete WORKSPACE file (#168926)

This has been replaced by the MODULE.bazel file. Users can still use
their own WORKSPACE files, but they didn't inherit this file anyway.
Users should migrate to bzlmod, since it is required as of bazel 9.x.
DeltaFile
+0 -215 utils/bazel/WORKSPACE
+0 -215 1 files

LLVM/project 0041660 llvm/lib/Transforms/Instrumentation BoundsChecking.cpp

undo stray change

Created using spr 1.3.7
DeltaFile
+1 -2 llvm/lib/Transforms/Instrumentation/BoundsChecking.cpp
+1 -2 1 files

LLVM/project 827ff2c llvm/test/Transforms/LoopVectorize/AArch64 fold-tail-low-trip-count.ll, llvm/test/Transforms/LoopVectorize/X86 fold-tail-low-trip-count.ll induction-costs.ll

[LV] Add tests for loops with low trip counts requiring tail-folding.

Add extra tests for over-eager tail-folding for tiny trip-count loops.

Reduced from https://github.com/llvm/llvm-project/issues/167858.
DeltaFile
+116 -0 llvm/test/Transforms/LoopVectorize/AArch64/fold-tail-low-trip-count.ll
+99 -0 llvm/test/Transforms/LoopVectorize/X86/fold-tail-low-trip-count.ll
+44 -28 llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll
+56 -10 llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll
+19 -36 llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+334 -74 5 files