LLVM/project 65907e9 llvm/include/llvm/IR InstrTypes.h, llvm/lib/Transforms/InstCombine InstCombineCalls.cpp InstructionCombining.cpp

Revert "Reapply "[InstCombine] Merge consecutive assumes", round 2 (#205773)"

This reverts commit 2c1d884c47b03e35283639f7122499ded8986b2b.
Delta  File
+14 -22  llvm/test/Transforms/InstCombine/assume.ll
+3 -19  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+0 -6  llvm/include/llvm/IR/InstrTypes.h
+2 -1  llvm/test/Transforms/InstCombine/assume-loop-align.ll
+2 -1  llvm/test/Transforms/PhaseOrdering/AArch64/std-find.ll
+1 -1  llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+22 -50  6 files

LLVM/project 1f84b4a compiler-rt/lib/tysan tysan_interceptors.cpp tysan.cpp, compiler-rt/test/tysan memcpy_doesnt_change_shadow.c global.c

[TySan] Remove incorrect memcpy shadow memory tracking (#199958)

Fixes the false positive in
https://github.com/llvm/llvm-project/issues/122934

memcpy is allowed to bypass strict aliasing rules (see
https://en.cppreference.com/c/string/byte/memcpy), so we shouldn't alter
shadow memory when it is used.
Delta  File
+25 -0  compiler-rt/test/tysan/memcpy_doesnt_change_shadow.c
+3 -11  compiler-rt/test/tysan/global.c
+0 -13  compiler-rt/lib/tysan/tysan_interceptors.cpp
+0 -2  compiler-rt/lib/tysan/tysan.cpp
+28 -26  4 files

LLVM/project d3bb0ca llvm/docs ProgrammersManual.rst, llvm/test/CodeGen/AMDGPU sched-handleMoveUp-dead-def-join.mir

Merge branch 'main' into users/c8ef/atomic_minmax
Delta  File
+12,991 -3,310  llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719  llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+0 -8,306  llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt
+5,672 -0  llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16-fake.txt
+5,126 -0  llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-dead-def-join.mir
+0 -4,257  llvm/docs/ProgrammersManual.rst
+35,645 -19,592  4,357 files not shown
+185,020 -135,223  4,363 files

LLVM/project c07bc29 llvm/test/Transforms/ArgumentPromotion noipa.ll, llvm/test/Transforms/GlobalOpt resolve-fmv-ifunc-noipa.ll global-constructor-noipa.ll

[Attr] Add `noipa` function attribute (#203304)

This adds a `noipa` function attribute to LLVM IR. This new attribute
disables any interprocedural analysis that inspects the definition of
the function. Setting this attribute is equivalent to moving the
function definition to a separate, optimizer-opaque, module.

The `noipa` attribute does *not* control inlining or outlining. Add the
`noinline` and `nooutline` attributes as well in cases where inlining
and outlining should additionally be disabled.

Revival of https://reviews.llvm.org/D101011
Discussed in https://discourse.llvm.org/t/noipa-continues/74411

LLVM portion of https://github.com/llvm/llvm-project/issues/40819
Delta  File
+76 -0  llvm/test/Transforms/GlobalOpt/resolve-fmv-ifunc-noipa.ll
+53 -0  llvm/test/Transforms/MergeFunc/noipa.ll
+36 -0  llvm/test/Transforms/GlobalOpt/global-constructor-noipa.ll
+30 -0  llvm/test/Transforms/WholeProgramDevirt/vcp-noipa.ll
+29 -0  llvm/test/Transforms/GlobalOpt/resolve-static-ifunc-noipa.ll
+27 -0  llvm/test/Transforms/ArgumentPromotion/noipa.ll
+251 -0  26 files not shown
+453 -7  32 files

LLVM/project 94ae3dd clang-tools-extra/clangd/unittests SymbolDocumentationTests.cpp HoverTests.cpp, clang/lib/AST CommentSema.cpp CommentParser.cpp

[clangd] Fix unknown doxygen command parsing in parameter documentation (#202121)

This patch mainly fixes a bug with parsing of unknown doxygen commands
in function parameter documentation.

To extract the parameter documentation from the function documentation,
the whole function documentation is parsed first.
Then the documentation paragraph for the requested parameter is
"converted" to a string and stored as the documentation for the
parameter. The string is converted by visiting and dumping all chunks of
the parsed paragraph.

When unknown doxygen commands are parsed (during the function
documentation parsing step), they are registered in a
`clang::comments::CommandTraits` object.
Visiting the unknown command requires querying the registered commands
through the `clang::comments::CommandTraits` object to get the command
name.


    [18 lines not shown]
Delta  File
+100 -0  clang-tools-extra/clangd/unittests/SymbolDocumentationTests.cpp
+66 -0  clang-tools-extra/clangd/unittests/HoverTests.cpp
+54 -0  clang-tools-extra/clangd/unittests/CodeCompletionStringsTests.cpp
+36 -4  clang/unittests/AST/CommentLexer.cpp
+11 -9  clang/lib/AST/CommentSema.cpp
+11 -8  clang/lib/AST/CommentParser.cpp
+278 -21  8 files not shown
+321 -60  14 files

LLVM/project 5314be5 mlir/lib/Conversion/RaiseWasm RaiseWasmMLIR.cpp, mlir/test/Conversion/RaiseWasm wasm-div-to-arith-div.mlir wasm-convert-to-arith-tofp.mlir

[MLIR][WASM] Re-Introduce the RaiseWasmMLIRPass to convert WasmSSA MLIR to core dialects (#205483)

See the previous PR here:
https://github.com/llvm/llvm-project/pull/164562

It was reverted by @lforg37 because of some build bot issue: see
https://github.com/llvm/llvm-project/pull/164562#issuecomment-4756828598.

However, after checking on my end, I could not reproduce the buildbot
issue. Seeing that the problem triggered in `flang` which is completely
unrelated to this work, I assume that it was a builder or a flaky test
problem so I'm re-opening this PR as it had been initially merged.

---------

Signed-off-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire at woven-planet.global>
Co-authored-by: Ferdinand Lemaire <flemairen6 at gmail.com>
Delta  File
+469 -0  mlir/lib/Conversion/RaiseWasm/RaiseWasmMLIR.cpp
+109 -0  mlir/test/Conversion/RaiseWasm/wasm-div-to-arith-div.mlir
+81 -0  mlir/test/Conversion/RaiseWasm/wasm-convert-to-arith-tofp.mlir
+80 -0  mlir/test/Conversion/RaiseWasm/wasm-sub-to-arith-sub.mlir
+79 -0  mlir/test/Conversion/RaiseWasm/wasm-add-to-arith-add.mlir
+78 -0  mlir/test/Conversion/RaiseWasm/wasm-mul-to-arith-mul.mlir
+896 -0  34 files not shown
+1,868 -1  40 files

LLVM/project 4e0108c llvm/utils amdgpu-pin-default-subtarget.py amdgpu-pin-default-subtarget-batch.sh

utils/AMDGPU: Add scripts to update tests using default subtarget

Add vibe-coded scripts to migrate AMDGPU codegen tests that run llc
without a -mcpu argument. This should either be uncommitted or deleted
after the migration is completed.

amdgpu-pin-default-subtarget.py adds an explicit -mcpu matching the current
default for the triple's OS (amdhsa defaults to gfx700, all others to
gfx600). The blank default subtarget is a featureless generic that does not
match any explicit -mcpu, so pinning changes codegen; the batch driver
amdgpu-pin-default-subtarget-batch.sh regenerates the autogenerated CHECK
lines and reverts anything that still fails.
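
As a rough illustration of the pinning rule described above (not the
committed script), the following Python sketch picks the default CPU from
the triple's OS component and appends it to a RUN line. The names
`default_cpu_for_triple` and `pin_run_line` and the regular expression are
hypothetical; the real script may parse RUN lines differently.

```python
import re

# Hypothetical pattern: only handles the `-mtriple=amdgcn...` spelling.
_TRIPLE_RE = re.compile(r"-mtriple=(amdgcn[\w.-]*)")


def default_cpu_for_triple(triple: str) -> str:
    """Return the CPU matching the old implicit default for this triple.

    Per the commit message: amdhsa defaults to gfx700, everything else
    to gfx600.
    """
    parts = triple.split("-")
    os_name = parts[2] if len(parts) > 2 else ""
    return "gfx700" if os_name.startswith("amdhsa") else "gfx600"


def pin_run_line(line: str) -> str:
    """Insert an explicit -mcpu after -mtriple on an llc RUN line.

    Lines that are not llc RUN lines, or that already pass -mcpu,
    are returned unchanged.
    """
    if "RUN:" not in line or "llc" not in line or "-mcpu" in line:
        return line
    match = _TRIPLE_RE.search(line)
    if match is None:
        return line
    cpu = default_cpu_for_triple(match.group(1))
    return line[: match.end()] + f" -mcpu={cpu}" + line[match.end():]
```

Inserting the flag right after the triple keeps the rest of the RUN line
untouched, which makes the resulting diff easy to review.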

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+92 -0  llvm/utils/amdgpu-pin-default-subtarget.py
+64 -0  llvm/utils/amdgpu-pin-default-subtarget-batch.sh
+156 -0  2 files

LLVM/project d9e53b2 clang/test/CodeGen amdgpu-feature-builtins-invalid-use.cpp, clang/test/SemaCXX amdgpu-feature-builtins-invalid-use.cpp

clang: Move __builtin_amdgcn_processor_is diagnostic test to sema (#205734)

This wasn't checking the codegen result, so move it to the right place
and use -verify instead of FileChecking stderr.

Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
Delta  File
+50 -0  clang/test/SemaCXX/amdgpu-feature-builtins-invalid-use.cpp
+0 -48  clang/test/CodeGen/amdgpu-feature-builtins-invalid-use.cpp
+50 -48  2 files

LLVM/project 3ff82c7 llvm/include/llvm/Support CHERICapabilityFormat.h, llvm/lib/Support CHERICapabilityFormat.cpp

[Support][CHERI] Refactor CHERICapabilityFormatBase to embrace CRTP (#205623)

Currently CHERICapabilityFormatBase does not provide a definition for
getAlignmentMask, but does provide a declaration, which leads to
warnings when building with MSVC. We want to have an abstract base here
without any dynamic dispatch, which is what CRTP is for, so use it for
getAlignmentMask such that the base can provide a definition that uses
each derived type's implementation, just as the two base wrappers were
already doing when calling getAlignmentMask. Whilst doing this we might
as well move the wrappers to the header so they can be inlined (and now
that getAlignmentMask is defined we can use it in the helpers rather
than needing each of them to explicitly use the derived type).

Fixes: 7dc09d0d3cf1 ("[CHERI] Add a Support utility for determining
alignment requirements of CHERI capabilities. (#197402)")
Delta  File
+19 -5  llvm/include/llvm/Support/CHERICapabilityFormat.h
+3 -17  llvm/lib/Support/CHERICapabilityFormat.cpp
+22 -22  2 files

LLVM/project 850462e llvm/test/CodeGen/AMDGPU/GlobalISel udiv.i64.ll urem.i64.ll

AMDGPU: Avoid default subtarget in codegen tests (4/9)

Continue migrating targets away from codegenning the dummy target
by script.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+1,410 -1,359  llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
+1,351 -1,351  llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
+442 -429  llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll
+408 -393  llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll
+174 -174  llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
+167 -167  llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i32.ll
+3,952 -3,873  89 files not shown
+4,226 -4,164  95 files

LLVM/project 7e2a762 llvm/test/CodeGen/AMDGPU wave_dispatch_regs.ll zext-lid.ll

AMDGPU: Avoid default subtarget in hand-written codegen tests (8/9)

Introduce the missing -mcpu argument to some tests which are not
autogenerated.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+2 -2  llvm/test/CodeGen/AMDGPU/wave_dispatch_regs.ll
+2 -2  llvm/test/CodeGen/AMDGPU/zext-lid.ll
+1 -1  llvm/test/CodeGen/AMDGPU/zext-i64-bit-operand.ll
+1 -1  llvm/test/CodeGen/AMDGPU/vop-shrink.ll
+1 -1  llvm/test/CodeGen/AMDGPU/waitcnt-no-redundant.mir
+1 -1  llvm/test/CodeGen/AMDGPU/waitcnt-trailing.mir
+8 -8  6 files

LLVM/project e5e3c81 llvm/test/CodeGen/AMDGPU llvm.amdgcn.unreachable.ll indirect-private-64.ll

AMDGPU: Avoid default subtarget in hand-written codegen tests (6/9)

Introduce -mcpu arguments in tests which didn't require check line
updates.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+4 -4  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.unreachable.ll
+3 -3  llvm/test/CodeGen/AMDGPU/indirect-private-64.ll
+3 -3  llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+2 -2  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.dispatch.id.ll
+2 -2  llvm/test/CodeGen/AMDGPU/implicit-def-muse.ll
+2 -2  llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll
+16 -16  94 files not shown
+116 -116  100 files

LLVM/project 250a0fe llvm/test/CodeGen/AMDGPU use-sgpr-multiple-times.ll gep-address-space.ll

AMDGPU: Avoid default subtarget in hand-written codegen tests (9/9)

Fix some manual test checks using amdgcn triples without -mcpu. These require the
most careful consideration. The highest-impact changes are the optimizations
removing execz branches now that there's a sched model.
Delta  File
+6 -14  llvm/test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll
+6 -10  llvm/test/CodeGen/AMDGPU/gep-address-space.ll
+7 -7  llvm/test/CodeGen/AMDGPU/setcc.ll
+4 -6  llvm/test/CodeGen/AMDGPU/si-lower-control-flow-unreachable-block.ll
+4 -4  llvm/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll
+3 -3  llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll
+30 -44  1 file not shown
+32 -46  7 files

LLVM/project 5176131 llvm/test/CodeGen/AMDGPU trap.ll ran-out-of-registers-errors.ll

AMDGPU: Avoid default subtarget in hand-written codegen tests (7/9)

Introduce an -mcpu argument to tests missing it to avoid codegenning
the default dummy target. These are cases that didn't require adjusting
the check lines.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+18 -18  llvm/test/CodeGen/AMDGPU/trap.ll
+5 -5  llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
+4 -4  llvm/test/CodeGen/AMDGPU/set-wave-priority.ll
+2 -2  llvm/test/CodeGen/AMDGPU/subreg-intervals.mir
+2 -2  llvm/test/CodeGen/AMDGPU/stack-size-overflow.ll
+2 -2  llvm/test/CodeGen/AMDGPU/rename-independent-subregs-mac-operands.mir
+33 -33  94 files not shown
+142 -142  100 files

LLVM/project e64c5f7 llvm/test/CodeGen/AMDGPU call-graph-register-usage.ll debug_frame.ll

AMDGPU: Avoid default subtarget in hand-written codegen tests (5/9)

Introduce -mcpu arguments in tests that did not need check line updates.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+6 -6  llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
+4 -4  llvm/test/CodeGen/AMDGPU/debug_frame.ll
+3 -3  llvm/test/CodeGen/AMDGPU/elf.ll
+3 -3  llvm/test/CodeGen/AMDGPU/eh_frame.ll
+3 -3  llvm/test/CodeGen/AMDGPU/amdgpu-function-calls-option.ll
+3 -3  llvm/test/CodeGen/AMDGPU/force-alwaysinline-lds-global-address-codegen.ll
+22 -22  94 files not shown
+137 -137  100 files

LLVM/project 01f78e6 llvm/test/CodeGen/AMDGPU vselect.ll valu-i1.ll

AMDGPU: Avoid default subtarget in generated codegen tests (3/9)

Another batch of tests updated by script.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+51 -45  llvm/test/CodeGen/AMDGPU/vselect.ll
+26 -27  llvm/test/CodeGen/AMDGPU/valu-i1.ll
+26 -26  llvm/test/CodeGen/AMDGPU/widen-vselect-and-mask.ll
+1 -1  llvm/test/CodeGen/AMDGPU/vgpr_constant64_to_sgpr.mir
+1 -1  llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir
+1 -1  llvm/test/CodeGen/AMDGPU/vector-legalizer-divergence.ll
+106 -101  2 files not shown
+108 -103  8 files

LLVM/project c3a51b7 llvm/test/CodeGen/AMDGPU div_v2i128.ll bf16.ll

AMDGPU: Avoid using default subtarget in generated codegen tests (1/9)

Fix codegen tests using amdgcn triples without a target-cpu. The dummy
default subtarget has always been an irritating edge case to deal with.
For unknown/mesa3d/amdpal triples, the result has been gfx600-like; for
amdhsa, it has been gfx700-like. Convert tests to use the explicit
target. This was performed by a vibe-coded script, and covers tests
using update_{llc|mir}_test_checks. There are some minor codegen differences
to be expected, mostly due to now having a scheduling model.

In the future we should forbid trying to codegen the default target.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+2,592 -2,587  llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+1,940 -1,931  llvm/test/CodeGen/AMDGPU/bf16.ll
+639 -643  llvm/test/CodeGen/AMDGPU/fceil64.ll
+538 -538  llvm/test/CodeGen/AMDGPU/calling-conventions.ll
+238 -746  llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll
+238 -746  llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll
+6,185 -7,191  91 files not shown
+11,628 -12,661  97 files

LLVM/project 1c5aea0 llvm/test/CodeGen/AMDGPU load-constant-i8.ll indirect-addressing-si.ll

AMDGPU: Avoid default subtarget in generated codegen tests (2/9)

Continue migrating away from testing the dummy target, and use
real targets approximating the old behavior. Performed by script.

Co-Authored-By: Claude <noreply at anthropic.com> (Claude-Opus-4.8)
Delta  File
+899 -888  llvm/test/CodeGen/AMDGPU/load-constant-i8.ll
+523 -1,180  llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll
+710 -768  llvm/test/CodeGen/AMDGPU/load-global-i8.ll
+693 -685  llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+664 -639  llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
+626 -635  llvm/test/CodeGen/AMDGPU/load-local-i16.ll
+4,115 -4,795  90 files not shown
+6,994 -7,678  96 files

LLVM/project 8bf29ea clang/include/clang/Analysis/Analyses/LifetimeSafety Facts.h FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

[LifetimeSafety] Fix loop liveness leakage for conditional operator

Generate flow facts for conditional operators in their respective
predecessor blocks (branches) instead of the merge block, path-isolating
the flows and preventing liveness from leaking across loop backedges.

Also includes tests, formatting cleanups, and refactoring of the flow propagation.

TAG=agy
CONV=b4614911-a1e1-489f-a395-2f895c423788
Delta  File
+49 -55  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+17 -0  clang/test/Sema/LifetimeSafety/safety.cpp
+4 -0  clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+2 -1  clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+72 -56  4 files

LLVM/project 2c1d884 llvm/include/llvm/IR InstrTypes.h, llvm/lib/Transforms/InstCombine InstCombineCalls.cpp InstructionCombining.cpp

Reapply "[InstCombine] Merge consecutive assumes", round 2 (#205773)

This patch was reverted due to triggering another bug. That bug has been
fixed by https://github.com/llvm/llvm-project/pull/205275, so this
should be ready to land now.

Original commit message:

This should make assumes a bit more efficient, since it removes a few
instructions. This should also help with optimizations that are limited
in how many instructions they step through.

This reverts commit 053d75c1d580e0c394f4cfb0688bafd05c187b0f.
Delta  File
+22 -14  llvm/test/Transforms/InstCombine/assume.ll
+19 -3  llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+6 -0  llvm/include/llvm/IR/InstrTypes.h
+1 -2  llvm/test/Transforms/PhaseOrdering/AArch64/std-find.ll
+1 -2  llvm/test/Transforms/InstCombine/assume-loop-align.ll
+1 -1  llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+50 -22  6 files

LLVM/project 47caa5d mlir/lib/Conversion/MathToSPIRV MathToSPIRV.cpp, mlir/test/Conversion/MathToSPIRV math-to-gl-spirv.mlir

[mlir][SPIR-V] Convert math.atan2 to GL atan2 (#201928)
Delta  File
+14 -0  mlir/test/Conversion/MathToSPIRV/math-to-gl-spirv.mlir
+1 -0  mlir/lib/Conversion/MathToSPIRV/MathToSPIRV.cpp
+15 -0  2 files

LLVM/project 4b85f3c llvm/lib/Target/X86 X86InstrMisc.td, llvm/test/CodeGen/X86 bmi.ll

[X86] Select BLSMSK for i8 operands (#205093)

Adds a tablegen pattern to select BLSMSK for i8 operands when matching:
```
  %neg = sub i8 %x, 1
  %and = xor i8 %neg, %x
```

I used Claude to generate the comment line before the tablegen entry and the decoding in the .ll file, which I confirmed after running llc.

Fixes #204984
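
For reference, the matched IR computes `x ^ (x - 1)`, which sets every
bit up to and including the lowest set bit of `x` (and yields all ones
when `x` is zero). A small Python model of the 8-bit case, checked
against a straightforward reference; the helper names are illustrative
and do not come from the patch:

```python
def blsmsk8(x: int) -> int:
    """Model `xor i8 (sub i8 x, 1), x` with 8-bit wraparound."""
    x &= 0xFF
    return (x ^ ((x - 1) & 0xFF)) & 0xFF


def mask_up_to_lowest_set_bit(x: int) -> int:
    """Reference: set bits 0..k, where k is the index of the lowest set bit.

    For x == 0 there is no set bit, so x - 1 wraps to 0xFF and the result
    is all ones.
    """
    x &= 0xFF
    if x == 0:
        return 0xFF
    k = (x & -x).bit_length() - 1  # index of the lowest set bit
    return (1 << (k + 1)) - 1


def check_all_i8() -> bool:
    """Exhaustively compare the two definitions over every 8-bit value."""
    return all(blsmsk8(x) == mask_up_to_lowest_set_bit(x) for x in range(256))
```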
Delta  File
+49 -0  llvm/test/CodeGen/X86/bmi.ll
+14 -0  llvm/lib/Target/X86/X86InstrMisc.td
+63 -0  2 files

LLVM/project 2efe4d4 clang/include/clang/Analysis/Analyses/LifetimeSafety Facts.h FactsGenerator.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

[LifetimeSafety] Fix loop liveness leakage for conditional operator

Generate flow facts for conditional operators in their respective
predecessor blocks (branches) instead of the merge block, path-isolating
the flows and preventing liveness from leaking across loop backedges.

Also includes tests, formatting cleanups, and refactoring of the flow propagation.

TAG=agy
CONV=b4614911-a1e1-489f-a395-2f895c423788
Delta  File
+49 -55  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+17 -0  clang/test/Sema/LifetimeSafety/safety.cpp
+5 -0  clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+2 -1  clang/include/clang/Analysis/Analyses/LifetimeSafety/FactsGenerator.h
+73 -56  4 files

LLVM/project 236a0d1 libsycl/include/sycl/__impl queue.hpp, libsycl/unittests/mock helpers.cpp mock.cpp

[libsycl] add UT for kernel submission (#203931)

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta  File
+98 -0  libsycl/unittests/queue/sycl_kernel_launch.cpp
+25 -1  libsycl/unittests/mock/helpers.cpp
+14 -2  libsycl/unittests/mock/mock.cpp
+8 -1  libsycl/unittests/mock/helpers.hpp
+2 -0  libsycl/include/sycl/__impl/queue.hpp
+1 -0  libsycl/unittests/queue/CMakeLists.txt
+148 -4  6 files

LLVM/project fdb2f1b llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 gfni-xor-fold.ll gfni-xor-fold-avx512.ll

[X86] Fold splat XOR on VGF2P8AFFINEQB source (#204508)

Given that XORs are associative, a XOR on `vgf2p8affineqb`'s source can
be reassociated to occur after the instruction by first transforming the
XOR operand by the matrix. If the XOR operand is an 8-bit splat, it can be
applied for free by combining it with the immediate. This patch:

- Folds XOR by splat on `vgf2p8affineqb`'s source into its immediate.
- Only occurs when the matrix is both constant and splat across each
64-bit lane.
- Can occur when the XOR is multi-use, as it can still reduce the
dependency chain.
- Includes test coverage for a variety of matrices and negative cases
for when the fold isn't possible.

Fixes #179606
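
The fold relies on the affine transform being linear over GF(2):
transforming `x ^ c` and then XORing the immediate gives the same result
as transforming `x` and XORing an immediate that already includes the
transformed `c`. The Python sketch below models a single byte lane to
check that identity. The row and bit ordering in `affine_byte` follows
one reading of Intel's pseudocode for the instruction and should be
treated as an assumption; the identity itself does not depend on that
ordering. All function names are hypothetical and not taken from the patch.

```python
import random


def parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def affine_byte(matrix: int, x: int, imm: int) -> int:
    """Model one byte lane: result bit i = parity(row & x) ^ imm bit i.

    `matrix` is a 64-bit integer holding eight 8-bit rows. Assumed
    ordering: bit i of the result uses byte (7 - i) of the matrix.
    """
    out = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        out |= (parity(row & x) ^ ((imm >> i) & 1)) << i
    return out


def fold_xor_into_imm(matrix: int, splat: int, imm: int) -> int:
    """Return the immediate that absorbs an XOR by `splat` on the source."""
    return imm ^ affine_byte(matrix, splat, 0)


def check_fold(trials: int = 1000, seed: int = 0) -> bool:
    """Compare the unfolded and folded forms on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        matrix = rng.getrandbits(64)
        x, c, imm = (rng.getrandbits(8) for _ in range(3))
        unfolded = affine_byte(matrix, x ^ c, imm)
        folded = affine_byte(matrix, x, fold_xor_into_imm(matrix, c, imm))
        if unfolded != folded:
            return False
    return True
```

Because the new immediate depends on the matrix, the per-lane matrix has
to be a known constant, which matches the restriction listed above.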
Delta  File
+151 -0  llvm/test/CodeGen/X86/gfni-xor-fold.ll
+99 -0  llvm/test/CodeGen/X86/gfni-xor-fold-avx512.ll
+36 -0  llvm/lib/Target/X86/X86ISelLowering.cpp
+286 -0  3 files

LLVM/project e8cca37 flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Optimizer/OpenMP DoConcurrentConversion.cpp

[Flang][OpenMP] Add combined construct information (#198783)

This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.

This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.

Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
Delta  File
+1,094 -0  flang/test/Lower/OpenMP/compound.f90
+58 -20  flang/lib/Lower/OpenMP/OpenMP.cpp
+6 -6  flang/test/Transforms/DoConcurrent/use_loop_bounds_in_body.f90
+5 -5  flang/test/Transforms/DoConcurrent/local_device.mlir
+4 -4  flang/test/Transforms/DoConcurrent/reduce_device.mlir
+6 -2  flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+1,173 -37  27 files not shown
+1,227 -71  33 files

LLVM/project a1396ed flang/lib/Lower/OpenMP OpenMP.cpp, llvm/test/Transforms/LoopVectorize uniform_across_vf_induction2.ll uniform_across_vf_induction1_lshr.ll

Merge branch 'main' into users/usx95/06-25-suggesionsopt-in-suggestions
Delta  File
+464 -464  llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
+418 -197  flang/lib/Lower/OpenMP/OpenMP.cpp
+220 -220  llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
+206 -228  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+155 -155  llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+239 -0  mlir/test/Dialect/Linalg/linalg-morph-elementwise-to-named.mlir
+1,702 -1,264  327 files not shown
+5,395 -3,557  333 files

LLVM/project 3e4c108 mlir/include/mlir/Dialect/OpenMP OpenMPOps.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Explicit tagging of combined constructs (#198782)

Combined OpenMP constructs, such as `parallel do`, which represent nests
of constructs where each one contains a single other construct without
any other directives or statements in between, are currently not marked
in any way in the MLIR representation.

This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated in the
host.

So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error prone,
computationally expensive and it can't really work in the general case.
On the other hand, a compiler frontend can easily tell the difference

    [10 lines not shown]
Delta  File
+137 -134  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+123 -76  mlir/test/Dialect/OpenMP/invalid.mlir
+106 -0  mlir/test/Dialect/OpenMP/invalid-interface.mlir
+33 -33  mlir/test/Dialect/OpenMP/ops.mlir
+29 -33  mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+24 -24  mlir/test/Target/LLVMIR/openmp-teams-clauses-trunc-ext.mlir
+452 -300  36 files not shown
+574 -379  42 files

LLVM/project 8cd49c6 flang/lib/Lower/OpenMP OpenMP.cpp, flang/test/Transforms/OpenMP function-filtering-host-ops.mlir

[Flang][MLIR][OpenMP] Explicitly represent omp.target kernel types (#186166)

Currently, the kernel type (i.e. `generic`, `spmd`, `spmd-no-loop` and
`bare`) of an `omp.target` operation is not an explicit attribute of the
operation. Rather, this is inferred based on the contents of its region
and clauses.

The problems with this approach are that it can be a potentially
resource intensive check for large kernels, and misidentifications are
prone to happen based on the presence of arbitrary operations from other
dialects.

Since the AST already contains the information needed to identify the
kernel type in a more reliable manner, this patch moves that
responsibility to the Flang frontend. Other MLIR passes that create
`omp.target` operations are updated as well.

One known limitation of this approach is that the MLIR op verifier for
`omp.target` can't completely check that the contents of its region are

    [4 lines not shown]
Delta  File
+418 -197  flang/lib/Lower/OpenMP/OpenMP.cpp
+110 -135  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+96 -50  mlir/test/Dialect/OpenMP/ops.mlir
+116 -28  mlir/test/Dialect/OpenMP/invalid.mlir
+37 -36  flang/test/Transforms/OpenMP/function-filtering-host-ops.mlir
+29 -28  mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+806 -474  159 files not shown
+1,227 -916  165 files

LLVM/project 4b42e25 clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp

no fallback
Delta  File
+21 -21  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+21 -21  1 file