LLVM/project 3e72f02openmp/device/src Kernel.cpp

[Offload][OpenMP][libdevice] Make check to enter state machine architecture dependent (#188144)

The genericStateMachine call uses synchronize::thread, which is expected
to be implemented using a workgroup-level barrier.
On some architectures, a race condition can occur if threads in the same
warp as the main thread reach that barrier, so a condition currently
keeps some threads from entering the state machine.
On Intel GPUs, however, all threads must reach the barrier for it to
complete; otherwise the threads in the state machine never make
progress.

This PR moves the condition into an architecture-dependent config so it
can work correctly for both kinds of hardware.
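
As a rough illustration of what an architecture-dependent entry check could look like, here is a minimal sketch. Every name in it (`BarrierModel`, `shouldEnterStateMachine`, `MaxWorkerThreads`) is hypothetical, and the `ThreadId < MaxWorkerThreads` test is an assumption about the shape of the existing condition, not the code in Kernel.cpp.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is not the code in Kernel.cpp.
enum class BarrierModel {
  // Threads sharing a warp with the main thread can race with it if they
  // reach the barrier, so they have to stay out of the state machine.
  MainWarpMustStayOut,
  // The barrier completes only once every thread in the workgroup arrives.
  AllThreadsMustArrive,
};

// Decide whether a thread should enter the generic state machine.
// ThreadId is the thread's index within its workgroup. MaxWorkerThreads is
// assumed to count only threads that can ever be active workers, i.e. it
// excludes the main thread's warp.
inline bool shouldEnterStateMachine(BarrierModel Model, uint32_t ThreadId,
                                    uint32_t MaxWorkerThreads) {
  switch (Model) {
  case BarrierModel::AllThreadsMustArrive:
    return true; // Skipping any thread would leave the barrier incomplete.
  case BarrierModel::MainWarpMustStayOut:
    return ThreadId < MaxWorkerThreads;
  }
  return false;
}
```

Keeping the decision in one function selected by a per-architecture value means the kernel entry code itself does not need to know which kind of hardware it runs on.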
DeltaFile
+29-15openmp/device/src/Kernel.cpp
+29-151 files

LLVM/project 46cc035flang-rt CMakeLists.txt

Explicitly warn about disabled tests
DeltaFile
+8-4flang-rt/CMakeLists.txt
+8-41 files

LLVM/project f871503llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV shuffled-gather-casted.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+25-14llvm/test/Transforms/SLPVectorizer/RISCV/shuffled-gather-casted.ll
+13-6llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+38-202 files

LLVM/project 80dc5aaflang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP declare-simd.f90 linear_modifier.f90

[flang][mlir][OpenMP] Add linear modifier (val, ref, uval) (#187142)

Add support for OpenMP linear modifiers `val`, `ref`, and `uval` as
defined in OpenMP 5.2 (5.4.6).
DeltaFile
+107-23mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+82-0mlir/test/Dialect/OpenMP/ops.mlir
+75-0mlir/test/Dialect/OpenMP/invalid.mlir
+55-8flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+55-2flang/test/Lower/OpenMP/declare-simd.f90
+54-0flang/test/Lower/OpenMP/linear_modifier.f90
+428-338 files not shown
+521-6614 files

LLVM/project 33da12amlir/lib/Dialect/OpenACC/Transforms ACCIfClauseLowering.cpp, mlir/test/Dialect/OpenACC acc-if-clause-lowering.mlir

[acc] Lower acc if with multi-block host fallback via scf.execute_region (#188350)

Handle multi-block host fallback regions by wrapping them in
scf.execute_region, instead of rejecting them with `not yet implemented:
region with multiple blocks`.
DeltaFile
+22-15mlir/lib/Dialect/OpenACC/Transforms/ACCIfClauseLowering.cpp
+36-0mlir/test/Dialect/OpenACC/acc-if-clause-lowering.mlir
+58-152 files

LLVM/project 7aaec28llvm/lib/Transforms/InstCombine InstCombineSelect.cpp, llvm/test/Transforms/InstCombine nanless-canonicalize-combine.ll

InstCombine: Fold out nanless canonicalize pattern (#172998)

Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
nans. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.

The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling nan, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero cost abstraction utility

    [16 lines not shown]
DeltaFile
+41-121llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+99-0llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+140-1212 files

LLVM/project 51c3f97lld/ELF/Arch Hexagon.cpp, lld/test/ELF hexagon-undefined-weak.s

[lld][Hexagon] Test undefined weak branches (#186613)

Undefined weak branches do not needsThunk().

Add a test case to cover undef weak.
DeltaFile
+28-0lld/test/ELF/hexagon-undefined-weak.s
+4-1lld/ELF/Arch/Hexagon.cpp
+32-12 files

LLVM/project 1dc5b0cllvm/include/llvm/Analysis AliasAnalysis.h, llvm/lib/Analysis BasicAliasAnalysis.cpp

[DSE] Use CycleInfo instead of LoopInfo (#188253)

DSE needs to reason about cycles in order to correctly handle
loop-carried dependencies. It currently does this by using LoopInfo and
performing a separate check for irreducible control flow.

Instead, we can use CycleInfo, which is like LoopInfo but also handles
irreducible cycles.

This requires computing CycleInfo (which, unlike LoopInfo, won't be
reused by surrounding passes), but ends up being neutral in terms of
compile-time overall.
DeltaFile
+17-27llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+8-4llvm/lib/Analysis/BasicAliasAnalysis.cpp
+5-2llvm/include/llvm/Analysis/AliasAnalysis.h
+1-0llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+1-0llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+1-0llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+33-335 files not shown
+38-3311 files

LLVM/project 08c94c0lldb/source/Plugins/ObjectContainer/BSD-Archive ObjectContainerBSDArchive.cpp, lldb/source/Plugins/ObjectFile/COFF ObjectFileCOFF.cpp

[lldb] Clear up GetModuleSpecifications return value confusion (#188276)

Some plugins were returning the number of specifications they have
added, while others were returning the total final number. Particularly
devious plugins (Minidump) were clearing the specification list
altogether. This resulted in nondeterministic failures (depending on
plugin initialization order) in TestSBModule.

This PR defines the problem away by having each plugin only return the
specifications it is responsible for. If the caller wants to merge them,
it is free to do so. This *might* be slightly less efficient, but this is
hardly hot code.

I'm not touching the ObjectFile::GetModuleSpecifications function (the
caller of all these functions) as the PR is big enough, although the
same approach might be warranted there as well.

Fixes https://github.com/llvm/llvm-project/issues/178625.
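
The contract described above (each plugin returns only its own specifications, and the caller merges them) can be sketched as follows. The types and the `collectSpecs` helper are hypothetical stand-ins for illustration and do not reflect LLDB's actual signatures.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; not LLDB's ModuleSpec or ModuleSpecList.
struct ModuleSpecSketch {
  std::string Description;
};
using SpecList = std::vector<ModuleSpecSketch>;

// A plugin inspects a file and returns only the specifications it found.
// It never sees, counts, or clears what other plugins produced.
using Plugin = std::function<SpecList(const std::string &)>;

// The caller owns the merge, so the result does not depend on which plugin
// happened to be registered first.
SpecList collectSpecs(const std::vector<Plugin> &Plugins,
                      const std::string &Path) {
  SpecList All;
  for (const Plugin &P : Plugins) {
    SpecList Own = P(Path);
    All.insert(All.end(), Own.begin(), Own.end());
  }
  return All;
}
```

Because no plugin receives the accumulated list, there is no shared return-value convention to get wrong; the cost is one temporary list per plugin.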
DeltaFile
+13-12lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp
+12-7lldb/source/Symbol/ObjectFile.cpp
+8-9lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
+8-7lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp
+7-8lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp
+7-7lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
+55-5026 files not shown
+182-19232 files

LLVM/project b2d161ccompiler-rt/lib/builtins/arm dnan2.c

Remove unnecessary 'else' to match fnan2.c
DeltaFile
+1-2compiler-rt/lib/builtins/arm/dnan2.c
+1-21 files

LLVM/project 4dbcc3flibc/src/__support/File file.cpp file.h, libc/src/__support/File/linux file.cpp

[libc] implement fflush(NULL) support (#188217)

Implement support for flushing all open streams when fflush is called
with a NULL pointer.

* Added a global linked list to track all open File objects.
* Updated File class to include prev/next pointers and list management
methods.
* Implemented POSIX requirement for fflush to sync seekable input
streams back to the host environment.
* Updated Linux-specific file creation to register new files in the
global list.
* Fixed a memory safety bug in create_file_from_fd using delete instead
of free.
* Added unit test for fflush(NULL).
* Added explanatory comments to fflush.cpp and file.cpp.
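
To make the list-based approach concrete, here is a minimal sketch of a global doubly linked list of open streams with a flush-all path. The `Stream` type and the function names are hypothetical, and locking is omitted for brevity; LLVM libc's `File` class is considerably more involved.

```cpp
#include <string>

// Hypothetical stream type for illustration only.
struct Stream {
  std::string Buffer; // Pending bytes that have not been written out yet.
  std::string Sink;   // Stands in for the underlying file.
  Stream *Prev = nullptr;
  Stream *Next = nullptr;

  void flush() {
    Sink += Buffer;
    Buffer.clear();
  }
};

// Head of the global list of open streams (no locking in this sketch).
Stream *OpenStreams = nullptr;

// Link a newly opened stream at the front of the list.
void registerStream(Stream *S) {
  S->Prev = nullptr;
  S->Next = OpenStreams;
  if (OpenStreams)
    OpenStreams->Prev = S;
  OpenStreams = S;
}

// Unlink a stream when it is closed.
void unregisterStream(Stream *S) {
  if (S->Prev)
    S->Prev->Next = S->Next;
  else
    OpenStreams = S->Next;
  if (S->Next)
    S->Next->Prev = S->Prev;
  S->Prev = S->Next = nullptr;
}

// A null argument flushes every registered stream, mirroring fflush(NULL).
int flushSketch(Stream *S) {
  if (S) {
    S->flush();
    return 0;
  }
  for (Stream *It = OpenStreams; It; It = It->Next)
    It->flush();
  return 0;
}
```

Storing the prev/next pointers inside the stream object lets removal on close run in constant time without a separate allocation.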
DeltaFile
+42-3libc/src/stdio/generic/fflush.cpp
+41-1libc/src/__support/File/file.cpp
+21-0libc/test/src/stdio/fileop_test.cpp
+16-3libc/src/__support/File/file.h
+5-1libc/src/__support/File/linux/file.cpp
+4-0libc/src/stdio/generic/CMakeLists.txt
+129-86 files

LLVM/project 1e3a31eutils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes e6cfdd0 (#188498)

This fixes e6cfdd01ae1a52dd193499bb324ac7aa007fa22d.
DeltaFile
+2-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+2-01 files

LLVM/project 902b89bllvm/utils/gn/secondary/clang/lib/Headers BUILD.gn

[gn] fix typo in 627f6aa1cd930e6a
DeltaFile
+2-2llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
+2-21 files

LLVM/project e6cfdd0mlir/lib/Conversion/XeGPUToXeVM XeGPUToXeVM.cpp, mlir/test/Conversion/XeGPUToXeVM failed_conversion.mlir dpas.mlir

[MLIR][XeGPU] Validate DPAS operand types against uArch in XeGPUToXeVM conversion (#185081)

`DpasOp` would crash with `llvm_unreachable` on unsupported types
(like i16, or i32 in an operand) during lowering to the XeVM dialect.
This happens in both `encodePrecision` and `getNumOperandsPerDword`.

Per
https://github.com/llvm/llvm-project/issues/180107#issuecomment-4009160113,
we handle this in the `matchAndRewrite` by retrieving the uArch instance
and fetching the registered `SubgroupMatrixMultiplyAcc` instruction.
Then, we validate with `getSupportedTypes` and check `aTy`, `bTy`, and
`resultType` correctly with `notifyMatchError` for reporting and
graceful handling.

We add a failed conversion test for a simplified version of the
reproducible error in #180107
DeltaFile
+35-0mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
+14-0mlir/test/Conversion/XeGPUToXeVM/failed_conversion.mlir
+1-1mlir/test/Conversion/XeGPUToXeVM/dpas.mlir
+50-13 files

LLVM/project a36b969llvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp

[AsmPrinter] Handle fragment splitting in instruction size verification

Also verify the size if the fragment ends up being split.
DeltaFile
+12-5llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+12-51 files

LLVM/project 627f6aallvm/utils/gn/secondary/clang/lib/Headers BUILD.gn, llvm/utils/gn/secondary/clang/utils/TableGen BUILD.gn

[gn build] Port daec3b9fb6e2 (clang hlsl header tblgen)
DeltaFile
+14-0llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
+1-0llvm/utils/gn/secondary/clang/utils/TableGen/BUILD.gn
+15-02 files

LLVM/project 3864b2allvm/lib/Transforms/InstCombine InstCombineSelect.cpp

Address comments
DeltaFile
+9-13llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+9-131 files

LLVM/project 02c3892llvm/test/Transforms/InstCombine nanless-canonicalize-combine.ll

regen tests
DeltaFile
+38-14llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+38-141 files

LLVM/project 5aff33allvm/lib/Transforms/InstCombine InstCombineSelect.cpp, llvm/test/Transforms/InstCombine nanless-canonicalize-combine.ll

InstCombine: Fold out nanless canonicalize pattern

Pattern match a wrapper around llvm.canonicalize which
weakens the semantics to not require quieting signaling
nans. Depending on the denormal mode and FP type, we can
either drop the pattern entirely or reduce it only to
a canonicalize call. I'm inventing this pattern to deal
with LLVM's lax canonicalization model in math library
code.

The math library code currently has explicit checks for
the denormal mode, and conditionally canonicalizes the
result if there is flushing. Semantically, this could be
directly replaced with a simple call to llvm.canonicalize,
but doing so would incur an additional cost when using
standard IEEE behavior. If we do not care about quieting
a signaling nan, this should be a no-op unless the denormal
mode may flush. This will allow replacement of the
conditional code with a zero cost abstraction utility

    [17 lines not shown]
DeltaFile
+51-155llvm/test/Transforms/InstCombine/nanless-canonicalize-combine.ll
+103-0llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+154-1552 files

LLVM/project c4c9d99llvm/include/llvm/Analysis CycleAnalysis.h, llvm/include/llvm/CodeGen MachineCycleAnalysis.h

[CycleInfo] Preserve if CFG preserved (#188443)

Add a custom invalidator that makes sure CycleAnalysis is preserved if
the CFGAnalyses set is preserved. The implementation matches that of
LoopInfo.
DeltaFile
+11-0llvm/lib/CodeGen/MachineCycleAnalysis.cpp
+9-0llvm/lib/Analysis/CycleAnalysis.cpp
+3-0llvm/include/llvm/Analysis/CycleAnalysis.h
+3-0llvm/include/llvm/CodeGen/MachineCycleAnalysis.h
+26-04 files

LLVM/project 396d638mlir/lib/Bindings/Python IRCore.cpp IRAffine.cpp, mlir/test/mlir-tblgen op-python-bindings.td

[MLIR] [Python] More improvements to type annotations (#188468)

* `mlir.ir` now exports `_OperationBase`. It is handy to use when both
`Operation` and `OpView` are accepted.
* Added type arguments where they were missing, e.g.
`list[ir.Attribute]` instead of just `list`.
* Changed `OpView.build_generic` and `OpView.parse` to return `Self`
instead of the supertype `Type`.
* Changed the bindings generator to emit a parameterized `OpResult` when
the exact type is available.
DeltaFile
+19-8mlir/lib/Bindings/Python/IRCore.cpp
+10-7mlir/lib/Bindings/Python/IRAffine.cpp
+15-0mlir/test/mlir-tblgen/op-python-bindings.td
+4-2mlir/lib/Bindings/Python/IRAttributes.cpp
+4-0mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
+2-2mlir/lib/Bindings/Python/IRTypes.cpp
+54-191 files not shown
+55-197 files

LLVM/project b86a279llvm/lib/CodeGen Rematerializer.cpp, llvm/unittests/CodeGen RematerializerTest.cpp

[CodeGen] Fix incorrect rematerialization order in rematerializer

When rematerializing DAGs of registers wherein multiple paths exist
between some registers of the DAG, it is possible that the
rematerialization determines an incorrect rematerialization order that
does not ensure that a register's dependencies are rematerialized before
itself; an invariant that is otherwise required.

This fixes that using a simpler recursive logic to determine a correct
rematerialization order that honors this invariant. A minimal unit test
is added that fails on the current implementation.
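
The invariant (a register's dependencies come before the register itself) is what a post-order depth-first traversal produces on an acyclic graph. The sketch below shows that general technique; the graph representation and the names `DepGraph`, `visitReg`, and `rematOrder` are hypothetical and are not taken from Rematerializer.cpp.

```cpp
#include <vector>

// Deps[R] lists the registers that register R depends on. This indexed
// representation is an assumption made for the sketch.
using DepGraph = std::vector<std::vector<unsigned>>;

// Post-order visit: emit every dependency of Reg before Reg itself.
// Assumes the dependency graph is acyclic.
static void visitReg(unsigned Reg, const DepGraph &Deps,
                     std::vector<bool> &Seen, std::vector<unsigned> &Order) {
  if (Seen[Reg])
    return; // Already scheduled through another path of the DAG.
  Seen[Reg] = true;
  for (unsigned Dep : Deps[Reg])
    visitReg(Dep, Deps, Seen, Order);
  Order.push_back(Reg);
}

// Returns an order in which Root and everything it transitively depends on
// can be rematerialized, dependencies first.
std::vector<unsigned> rematOrder(unsigned Root, const DepGraph &Deps) {
  std::vector<bool> Seen(Deps.size(), false);
  std::vector<unsigned> Order;
  visitReg(Root, Deps, Seen, Order);
  return Order;
}
```

With register 0 depending on 1 and 3, register 1 on 2, and register 2 on 3, there are two paths from 0 to 3; the traversal still places 3 first, because a register is appended only after all of its dependencies have been visited.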
DeltaFile
+20-33llvm/lib/CodeGen/Rematerializer.cpp
+38-0llvm/unittests/CodeGen/RematerializerTest.cpp
+58-332 files

LLVM/project 9c6054dcross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb llvm-support.cpp llvm-support.gdb, cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb pointer-union.test pointer-union.cpp

[lldb][ADT] Fix LLDB/GDB formatters for PointerUnion after refactoring (#188483)

In #188242, we replaced `PointerUnion`'s `PointerIntPair` storage with
`PunnedPointer<void*>`. The old formatters relied on the PIP synthetic
provider (LLDB) / `get_pointer_int_pair` helper (GDB), which no longer
work.

Instead, read raw bytes from `PunnedPointer` and compute the active tag
from template argument type alignments -- the same fixed-width encoding
the C++ implementation uses. When template arg enumeration is truncated
(e.g., function-local types in GDB), the formatters fall back to showing
a tag-stripped `void*` instead of silently misdecoding.

Alternatives that didn't work out:
- Adding a C++ helper (`getActiveMemberIdx`) callable from Python: gets
optimized out even with `__attribute__((used, noinline))`, and
expression evaluation fails for synthetic children.
- Using `isa`/`dyn_cast` checks from Python: requires expression
evaluation, which does not work for local types or synthetic children

    [2 lines not shown]
DeltaFile
+64-20llvm/utils/lldbDataFormatters.py
+57-5llvm/utils/gdb-scripts/prettyprinters.py
+21-0cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/pointer-union.test
+11-3cross-project-tests/debuginfo-tests/llvm-prettyprinters/lldb/pointer-union.cpp
+2-2cross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb/llvm-support.cpp
+1-1cross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb/llvm-support.gdb
+156-316 files

LLVM/project 5b7cec1libclc/clc/lib/generic/async clc_prefetch.inc

libclc: Use prefetch builtin to implement default prefetch
DeltaFile
+1-1libclc/clc/lib/generic/async/clc_prefetch.inc
+1-11 files

LLVM/project ca9ac0ellvm/lib/Target/AMDGPU AMDGPUSwLowerLDS.cpp, llvm/lib/Transforms/Utils EntryExitInstrumenter.cpp

[CHERI] Allow @llvm.returnaddress to return a pointer in any address space. (#188464)

Clang now constructs calls to it using the default program address space from the DataLayout.

Co-authored-by: Alex Richardson <alexrichardson at google.com>
DeltaFile
+8-9llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-nested-asan.ll
+8-8llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-static-indirect-access-nested.ll
+8-8llvm/test/Transforms/EntryExitInstrumenter/mcount.ll
+9-3llvm/lib/Transforms/Utils/EntryExitInstrumenter.cpp
+6-6llvm/test/Instrumentation/ThreadSanitizer/atomic-non-integer.ll
+7-4llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+46-3850 files not shown
+147-12856 files

LLVM/project 412912clibclc/opencl/lib/generic/atomic atomic_fetch_sub.cl atomic_fetch_add.cl

Fix formatting
DeltaFile
+9-9libclc/opencl/lib/generic/atomic/atomic_fetch_sub.cl
+9-9libclc/opencl/lib/generic/atomic/atomic_fetch_add.cl
+18-182 files

LLVM/project dca3b7cflang/lib/Semantics openmp-utils.cpp check-omp-loop.cpp, flang/test/Parser/OpenMP tile-fail.f90 interchange-fail.f90

[flang][OpenMP] Check if loop nest/sequence is well-formed

Check if the code associated with a nest or sequence construct is well
formed. Emit diagnostic messages if not.

Make a clearer separation for checks of loop-nest-associated and loop-
sequence-associated constructs.

Unify structure of some of the more common messages.

Issue: https://github.com/llvm/llvm-project/issues/185287
DeltaFile
+115-47flang/lib/Semantics/openmp-utils.cpp
+55-74flang/lib/Semantics/check-omp-loop.cpp
+0-31flang/test/Parser/OpenMP/tile-fail.f90
+0-31flang/test/Parser/OpenMP/interchange-fail.f90
+18-0flang/test/Semantics/OpenMP/tile-fail.f90
+18-0flang/test/Semantics/OpenMP/interchange-fail.f90
+206-18317 files not shown
+270-23623 files

LLVM/project e76bbaalibclc/opencl/lib/generic/atomic atomic_fetch_add.cl atomic_fetch_sub.cl

Address comments
DeltaFile
+40-17libclc/opencl/lib/generic/atomic/atomic_fetch_add.cl
+40-17libclc/opencl/lib/generic/atomic/atomic_fetch_sub.cl
+80-342 files

LLVM/project 6b6d157flang/include/flang/Semantics openmp-utils.h, flang/lib/Semantics openmp-utils.cpp check-omp-loop.cpp

[flang][OpenMP] Provide reasons for calculated sequence length (#187866)

If the length was limited by some factor, include the reason for what
caused the reduction.

Issue: https://github.com/llvm/llvm-project/issues/185287
DeltaFile
+34-23flang/lib/Semantics/openmp-utils.cpp
+9-7flang/lib/Semantics/check-omp-loop.cpp
+5-5flang/include/flang/Semantics/openmp-utils.h
+2-0flang/test/Semantics/OpenMP/loop-transformation-construct04.f90
+1-0flang/test/Semantics/OpenMP/loop-transformation-construct02.f90
+1-0flang/test/Semantics/OpenMP/fuse1.f90
+52-356 files

LLVM/project 857a405llvm/utils/gn/secondary/compiler-rt/test/builtins BUILD.gn

[gn] port a5a7f6266ef05
DeltaFile
+1-1llvm/utils/gn/secondary/compiler-rt/test/builtins/BUILD.gn
+1-11 files