LLVM/project 1503293mlir/include/mlir-c ExtensibleDialect.h, mlir/lib/Bindings/Python IRCore.cpp

[MLIR][Python] Support `has_trait` for operations (#188492)

This PR adds a `has_trait(trait_cls)` API to `_OperationBase`, which can
be used with:
- C++-defined operations and C++-defined traits (e.g.
`func_return_op.has_trait(IsTerminatorTrait)`)
- Python-defined operations and C++-defined traits (e.g.
`my_python_op.has_trait(IsTerminatorTrait)`)
- Python-defined operations and Python-defined traits (e.g.
`my_python_op.has_trait(MyPythonTrait)`)

---------

Co-authored-by: Maksim Levental <maksim.levental at gmail.com>
DeltaFile
+27-2mlir/lib/Bindings/Python/IRCore.cpp
+20-1mlir/test/python/dialects/ext.py
+16-0mlir/test/python/dialects/builtin.py
+15-0mlir/test/python/dialects/func.py
+8-0mlir/include/mlir-c/ExtensibleDialect.h
+8-0mlir/lib/CAPI/IR/ExtensibleDialect.cpp
+94-34 files not shown
+109-410 files

LLVM/project dfefc03llvm/utils/lit/lit TestRunner.py

[lit] Explicitly unset timer to free thread stack (#188717)

Currently the virtual address space usage of lit fluctuates wildly, with
peak usage exceeding 4GB, which results in subsequent thread spawning
errors on 32-bit systems.

The cause of this is a circular reference in TimeoutHelper._timer (via the
callback), which causes the 8MB thread stack to not be immediately
reclaimed when the timer is cancelled.

We can avoid this by explicitly unsetting the timer.
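
The cycle described above can be sketched in a few lines. This is a minimal
illustration, not lit's actual code: `TimeoutHelperSketch`, `_on_timeout`,
and `timed_out` are hypothetical names, and only the `_timer` attribute name
is taken from the commit message.

```python
import threading


class TimeoutHelperSketch:
    """Hypothetical stand-in for lit's TimeoutHelper; not the real class."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._timer = None
        self.timed_out = False

    def start(self):
        # The bound method self._on_timeout references self, and
        # self._timer references the Timer: a reference cycle.
        self._timer = threading.Timer(self.timeout, self._on_timeout)
        self._timer.start()

    def _on_timeout(self):
        self.timed_out = True

    def cancel(self):
        if self._timer is not None:
            self._timer.cancel()
            # Dropping the reference breaks the cycle, so the Timer no
            # longer has to wait for the cycle collector to be freed.
            self._timer = None
```

After `cancel()` the helper no longer holds the `Timer`, so in CPython the
object can be released by ordinary reference counting once the timer thread
has exited.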
DeltaFile
+2-0llvm/utils/lit/lit/TestRunner.py
+2-01 files

LLVM/project c9c1520llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Fix -Wunused-variable

A couple of these variables are used only within LLVM_DEBUG statements,
which the preprocessor removes in non-assertions builds, leaving the
variables unused. Mark them maybe_unused, since the names make the code
more readable.
DeltaFile
+4-4llvm/lib/Analysis/DependenceAnalysis.cpp
+4-41 files

LLVM/project e80604aflang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Lower/Support ReductionProcessor.cpp PrivateReductionUtils.cpp

[flang][OpenMP] Support user-defined declare reduction with derived types (#184897)

Fix lowering of `!$omp declare reduction` for intrinsic operators applied
to user-defined derived types (e.g., `+` on `type(t)`). Previously, this
hit a TODO in `ReductionProcessor::getReductionInitValue` because the code
tried to compute an init value for a non-predefined type, when it should
instead use the initializer region from the `DeclareReductionOp`.

This fixes issue #176278: [Flang][OpenMP] Compilation error when
type-list in declare reduction directive is derived type name.

The root cause was a naming mismatch: `genOMP` for
`OpenMPDeclareReductionConstruct` used a raw operator string (e.g., "Add")
as the reduction name, while `processReductionArguments` at the use site
computed a canonical name via `getReductionName` (e.g.,
"add_reduction_byref_rec__QFTt"). The `lookupSymbol` in

    [83 lines not shown]
DeltaFile
+151-30flang/lib/Lower/OpenMP/OpenMP.cpp
+86-0flang/test/Lower/OpenMP/declare-reduction-finalizer.f90
+18-22flang/test/Lower/OpenMP/omp-declare-reduction-derivedtype.f90
+25-10flang/lib/Lower/Support/ReductionProcessor.cpp
+25-2flang/test/Lower/OpenMP/declare-reduction-intrinsic-op.f90
+26-0flang/lib/Lower/Support/PrivateReductionUtils.cpp
+331-643 files not shown
+362-809 files

LLVM/project 841d96dllvm/test/Transforms/LoopVectorize/AArch64 maxbandwidth-regpressure.ll

Fix maxbandwidth-regpressure.ll after rebase
DeltaFile
+2-2llvm/test/Transforms/LoopVectorize/AArch64/maxbandwidth-regpressure.ll
+2-21 files

LLVM/project b87084eclang/lib/CodeGen CGDebugInfo.cpp, clang/test/CodeGenHLSL/debug source-language.hlsl

Revert "[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info (…"

This reverts commit 85049fc357ac3917350b97f4812209d9d00fe808.
DeltaFile
+0-34clang/test/CodeGenHLSL/debug/source-language.hlsl
+0-32llvm/test/CodeGen/SPIRV/debug-info/hlsl-debug-info-auto-activation.ll
+5-6llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
+3-5llvm/docs/SPIRVUsage.rst
+2-6clang/lib/CodeGen/CGDebugInfo.cpp
+2-2llvm/test/CodeGen/SPIRV/debug-info/debug-compilation-unit.ll
+12-854 files not shown
+15-9010 files

LLVM/project 4e383ecclang/test/Analysis ctu-main.cpp ctu-on-demand-parsing.cpp, clang/test/Analysis/Inputs ctu-other.cpp

[NFC][clang][analyzer] Move CTU tests and inputs into a dedicated subfolder

Move CTU related LIT tests to a dedicated directory.

-- 

CPP-7804
DeltaFile
+249-0clang/test/Analysis/ctu/main.cpp
+0-249clang/test/Analysis/ctu-main.cpp
+0-183clang/test/Analysis/Inputs/ctu-other.cpp
+183-0clang/test/Analysis/ctu/Inputs/other.cpp
+0-116clang/test/Analysis/ctu-on-demand-parsing.cpp
+116-0clang/test/Analysis/ctu/on-demand-parsing.cpp
+548-54874 files not shown
+1,522-1,52080 files

LLVM/project 5524fa4clang/lib/Basic/Targets AMDGPU.cpp, clang/test/Driver amdgpu-macros.cl

clang: Define FP_FAST_FMA_HALF macro for AMDGPU (#188243)
DeltaFile
+4-0clang/lib/Basic/Targets/AMDGPU.cpp
+1-0clang/test/Driver/amdgpu-macros.cl
+1-0clang/test/Preprocessor/predefined-arch-macros.c
+6-03 files

LLVM/project 233faf1llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-no-dotprod.ll partial-reduce-fdot-product.ll

Improvements to cost-model

The chosen costs are more precise because the model makes better use of the target features to determine whether something can be expanded.
The costs in sdot-i16-i32 are now more accurate, and the loops that didn't vectorise before result in equivalent or better codegen.
DeltaFile
+62-42llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+9-9llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll
+8-8llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fdot-product.ll
+3-3llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
+2-2llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-sdot.ll
+84-645 files

LLVM/project e448bd5llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-fdot-product.ll partial-reduce-add-sdot-i16-i32.ll

Various changes to the cost-model.

This has a number of changes to the partial reduction cost-model:

* Implement the fact that *MLALB/T instructions can be used for
  16-bit -> 32-bit partial reductions (or *MLAL/MLAL2 for NEON).

* Fix the cost of reductions that don't have specific lowering:
  rather than returning a random number, we now return the cost of
  expanding the partial reduction in ISel.

  For sub-reductions we scale the cost to make them slightly cheaper,
  so that they're still candidates for forming cdot operations.

* Reduce the cost of FP reductions, which are currently prohibitively
  expensive.
DeltaFile
+39-26llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+22-22llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fdot-product.ll
+26-2llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
+2-2llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-sdot.ll
+89-524 files

LLVM/project 187dc3allvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-sub-sdot.ll partial-reduce-add-sdot-i16-i32.ll

Address comments
DeltaFile
+22-24llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+2-2llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub-sdot.ll
+1-1llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-add-sdot-i16-i32.ll
+25-273 files

LLVM/project 26f6c2fllvm/include/llvm/Analysis TargetTransformInfo.h, llvm/lib/Analysis TargetTransformInfo.cpp

Distinguish between extends
DeltaFile
+13-11llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+16-0llvm/lib/Analysis/TargetTransformInfo.cpp
+3-0llvm/include/llvm/Analysis/TargetTransformInfo.h
+32-113 files

LLVM/project e107cd0llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-chained.ll

NFC Pre-commit of rerunning checks on partial-reduce-chained.ll
DeltaFile
+24-24llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll
+24-241 files

LLVM/project 3a56470libc/shared rpc.h

[libc] Increase the maximum RPC port size for future hardware (#188756)

Summary:
We store the locks in local device memory for performance and
simplicity. The number here needs to correspond to the maximum occupancy
so that we never have a situation where a GPU thread is blocking another
GPU thread.

The number now is sufficient for most hardware, but modern compute chips
like the MI300x are already pushing ~12000 resident waves. This has ABI
implications so I'd like to bump it up sooner rather than later. The
ABI change is within what OpenMP expects, LLVM major versions, and it
will be caught statically so there's no risk of silent corruption (size
doesn't match).
DeltaFile
+3-1libc/shared/rpc.h
+3-11 files

LLVM/project ffd6a13compiler-rt/include/profile InstrProfData.inc, compiler-rt/lib/profile InstrProfilingPlatformOther.c InstrProfilingPlatformGPU.c

[compiler-rt] Rework profile data handling for GPU targets (#187136)

Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient as it requires a lot of small
high-latency data transfers rather than a few large ones. Additionally,
we force every single profiling symbol to have protected visibility.
This means potentially hundreds of unnecessary symbols in the symbol
table.

This PR changes the interface to move towards the start / stop section
handling. AMDGPU supports this natively as an ELF target, so we need
little changes. Instead of overriding visibility, we use a single table
to define the bounds that we can obtain with one contiguous load.

Using a table interface should also work for the in-progress HIP
implementation for this, as it wraps the start / stop sections into
standard void pointers which will be inside of an already mapped region
of memory, so they should be accessible from the HIP API.

    [13 lines not shown]
DeltaFile
+78-95offload/plugins-nextgen/common/src/GlobalHandler.cpp
+35-12compiler-rt/lib/profile/InstrProfilingPlatformOther.c
+44-0compiler-rt/lib/profile/InstrProfilingPlatformGPU.c
+24-15llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+36-0compiler-rt/include/profile/InstrProfData.inc
+36-0llvm/include/llvm/ProfileData/InstrProfData.inc
+253-1225 files not shown
+274-13911 files

LLVM/project 76f8806llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU ctls.ll

[AMDGPU] Remove AMDGPUISD::FFBH_I32 and add ISD::CTLS lowering (#187694)

This is a continuation of the previously reverted
https://github.com/llvm/llvm-project/pull/178420

The patch removes the custom AMDGPUISD::FFBH_I32 SelectionDAG node. Call
sites that need raw hardware semantics (LowerINT_TO_FP32, legalizeITOFP)
now use the amdgcn_sffbh intrinsic directly. ISD::CTLS is added as a Custom
operation for i32.

The previous attempt had an issue:
The hardware v_ffbh_i32 instruction (v_cls_i32 on newer targets) has
different semantics than ISD::CTLS:
- sffbh returns [1, BitWidth-1] for normal values, -1 for all-same-bits
- CTLS returns [0, BitWidth-2] for normal values, BitWidth-1 for
  all-same-bits

Now LowerCTLS handles this by: sffbh -> umin(sffbh, BitWidth) -> sub 1.
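
The mapping above can be checked with a small bit-level model. This is a
sketch based only on the ranges quoted in this message, assuming a 32-bit
width; `sffbh_model`, `ctls_reference`, and `ctls_lowered` are hypothetical
names, not LLVM functions.

```python
BITS = 32
MASK = (1 << BITS) - 1


def sffbh_model(x):
    """Position (counted from the MSB) of the first bit that differs from
    the sign bit, or -1 (all ones, as unsigned) if every bit matches."""
    x &= MASK
    sign = x >> (BITS - 1)
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            return pos
    return MASK  # -1 reinterpreted as an unsigned 32-bit value


def ctls_reference(x):
    """Count the leading bits below the sign bit that equal the sign bit."""
    x &= MASK
    sign = x >> (BITS - 1)
    count = 0
    for pos in range(1, BITS):
        if (x >> (BITS - 1 - pos)) & 1 != sign:
            break
        count += 1
    return count


def ctls_lowered(x):
    # sffbh -> umin(sffbh, BitWidth) -> sub 1, as described above.
    return min(sffbh_model(x), BITS) - 1
```

The unsigned minimum is what handles the all-same-bits case: -1 viewed as
unsigned is larger than BitWidth, so it clamps to BitWidth and the final
subtraction yields BitWidth-1.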

    [6 lines not shown]
DeltaFile
+624-0llvm/test/CodeGen/AMDGPU/ctls.ll
+159-0llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ctls.mir
+41-2llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+25-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+18-1llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+0-4llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
+867-75 files not shown
+873-911 files

LLVM/project 51593c1clang/lib/CodeGen CGObjCMac.cpp

format
DeltaFile
+3-2clang/lib/CodeGen/CGObjCMac.cpp
+3-21 files

LLVM/project 249a3d1llvm/utils/gn/secondary/llvm/lib/Target/NVPTX BUILD.gn

[gn build] Port 28318d5db86f
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/lib/Target/NVPTX/BUILD.gn
+1-01 files

LLVM/project a5cd44fllvm/utils/gn/secondary/compiler-rt/lib/sanitizer_common BUILD.gn

[gn] port 25904ac91554
DeltaFile
+1-0llvm/utils/gn/secondary/compiler-rt/lib/sanitizer_common/BUILD.gn
+1-01 files

LLVM/project a111106clang/lib/CodeGen CodeGenModule.h CGObjCMac.cpp

isPreconditionThunkEnabled -> isObjCDirectPreconditionThunkEnabled
DeltaFile
+3-3clang/lib/CodeGen/CodeGenModule.h
+2-2clang/lib/CodeGen/CGObjCMac.cpp
+1-1clang/lib/CodeGen/CGObjC.cpp
+6-63 files

LLVM/project f08f7ecllvm/utils/gn/secondary/compiler-rt/lib/builtins BUILD.gn

[gn] "port" 80831832e03f
DeltaFile
+3-0llvm/utils/gn/secondary/compiler-rt/lib/builtins/BUILD.gn
+3-01 files

LLVM/project bbd69eellvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/X86 srem-seteq-vec-nonsplat.ll urem-seteq-vec-nonsplat.ll

[TargetLowering] In prepareUREMEqFold/prepareSREMEqFold, fix K=-1 for i64 elements. (#188600)

K is an unsigned, so it will be zero-extended to uint64_t for
the APInt constructor. If the ShSVT has more than 32 bits, we won't
create an all ones ConstantSDNode.

To fix this, explicitly push an all ones constant to KAmts. This
also fixes an APInt ImplicitTrunc.

This allows turnVectorIntoSplatVector to work for this case.
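
The zero-extension problem can be illustrated with plain integer arithmetic.
This is a simplified model, not LLVM code: `constant_from_unsigned` and
`all_ones` are hypothetical helpers, and the model silently truncates where
the real APInt constructor may check for implicit truncation.

```python
def all_ones(bit_width):
    """The all-ones value for an integer of the given width."""
    return (1 << bit_width) - 1


def constant_from_unsigned(k, bit_width):
    """Model a bit_width-wide constant built from a 32-bit `unsigned` k."""
    as_u32 = k & 0xFFFFFFFF  # unsigned K = -1 is stored as 0xFFFFFFFF
    as_u64 = as_u32          # zero extension only adds zero bits on top
    return as_u64 & all_ones(bit_width)


# For i32 elements the result happens to be all ones.
print(hex(constant_from_unsigned(-1, 32)))
# For i64 elements the upper 32 bits stay zero, so it is not all ones.
print(hex(constant_from_unsigned(-1, 64)))
```

Building the all-ones constant directly from the element width, as the fix
does, avoids depending on the width of the intermediate `unsigned`.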
DeltaFile
+119-0llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
+114-0llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
+6-10llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+239-103 files

LLVM/project 797916bllvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, mlir/test/Target/LLVMIR omptarget-region-host-device-llvm.mlir

[OpenMP][flang] Fix crash in host offload (#187847)

Guard `getGridValue` in `OMPIRBuilder` to avoid reaching the
`unreachable` in `getGridValue` when offloading to host device without
an explicit num_threads clause.
DeltaFile
+13-3llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+14-0mlir/test/Target/LLVMIR/omptarget-region-host-device-llvm.mlir
+27-32 files

LLVM/project 1422665clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenAtomic.cpp

[CIR] Add support for __atomic_fetch_uinc and __atomic_fetch_udec (#188050)

This patch adds CIRGen and LLVM lowering support for the
`__atomic_fetch_uinc` and the `__atomic_fetch_udec` built-in functions.

Assisted-by: Claude Opus 4.6
DeltaFile
+30-0clang/test/CIR/CodeGen/atomic.c
+16-5clang/lib/CIR/CodeGen/CIRGenAtomic.cpp
+14-0clang/test/CIR/IR/atomic.cir
+11-2clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+5-2clang/include/clang/CIR/Dialect/IR/CIROps.td
+76-95 files

LLVM/project 14269b4openmp/runtime/test/taskgraph taskgraph_deps_23.cpp taskgraph_deps_25.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, add new tests
DeltaFile
+100-0openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp
+86-0openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp
+77-0openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp
+77-0openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp
+73-0openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp
+72-0openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp
+485-021 files not shown
+1,575-027 files

LLVM/project eec9d38openmp/runtime/test/tasking omp_record_replay_multiTDGs.cpp omp_record_replay_print_dot.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, remove obsolete tests
DeltaFile
+0-76openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
+0-69openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
+0-63openmp/runtime/test/tasking/omp_record_replay_deps.cpp
+0-58openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
+0-56openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
+0-50openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
+0-3721 files not shown
+0-4207 files

LLVM/project a409a9bclang/include/clang/AST OpenMPClause.h, clang/lib/AST OpenMPClause.cpp

[OpenMP] OpenMP 6.0 "taskgraph" support, frontend parts
DeltaFile
+447-221clang/lib/CodeGen/CGOpenMPRuntime.cpp
+71-2clang/include/clang/AST/OpenMPClause.h
+53-10clang/lib/CodeGen/CGStmtOpenMP.cpp
+28-0clang/lib/Sema/SemaOpenMP.cpp
+26-0clang/lib/Sema/TreeTransform.h
+17-2clang/lib/AST/OpenMPClause.cpp
+642-23511 files not shown
+734-24117 files

LLVM/project 15c75e1clang/include/clang/Driver Driver.h, clang/lib/Driver Driver.cpp

[Driver][HIP] Bundle AMDGPU -S output under the new offload driver (#188262)

The old offload driver emits bundled assembly code for -S in textual
clang-offload-bundler format. This allows a single .s file to contain
assembly code for both host and devices, which can be consumed by clang.
This eases manual optimization of assembly code for host and device. There
are existing HIP tests and examples depending on this feature. The new
offload driver does not support it, causing regressions. This patch adds
support for this feature with minor changes to the job action creations.

Fixes: LCOMPILER-553
DeltaFile
+56-6clang/lib/Driver/Driver.cpp
+8-4clang/include/clang/Driver/Driver.h
+3-0clang/test/Driver/hip-phases.hip
+67-103 files

LLVM/project 19420c0clang/lib/CodeGen CGOpenMPRuntime.cpp, clang/test/OpenMP target_update_codegen.cpp

[OpenMP] Fix non-contiguous array omp target update (#156889)

The existing implementation has three issues which this patch addresses.

1. The last dimension, which represents the bytes in the type, has the
wrong stride and count. For example, for a 4 byte int, count=1 and
stride=4. The correct representation here is count=4 and stride=1
because there are 4 bytes (count=4) that we need to copy and we do not
skip any bytes (stride=1).

2. The size of the data copy was computed using the last dimension.
However, this is incorrect in cases where some of the final dimensions
get merged into one. In this case we need to take the combined size of
the merged dimensions, which is (Count * Stride) of the first merged
dimension.

3. The Offset into a dimension was computed as a multiple of its Stride.
However, this Stride, which is in bytes, already includes the stride
multiplier given by the user. This means that when the user specified

    [3 lines not shown]
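
Point 1 can be pictured with a toy model of per-dimension (count, stride)
pairs. This is only an illustration under stated assumptions: strides are
taken to be in bytes, a descriptor is taken to cover every combination of
indices, and `covered_bytes` is a hypothetical helper, not the runtime's
actual copy logic.

```python
from itertools import product


def covered_bytes(dims):
    """Return the byte offsets touched by a list of (count, stride) pairs,
    outermost dimension first, with strides measured in bytes."""
    ranges = [range(count) for count, _ in dims]
    strides = [stride for _, stride in dims]
    return sorted(
        sum(i * s for i, s in zip(index, strides))
        for index in product(*ranges)
    )


# Two 4-byte ints that sit 8 bytes apart (every other element of an array).
old_style = [(2, 8), (1, 4)]  # byte dimension as count=1, stride=4
new_style = [(2, 8), (4, 1)]  # byte dimension as count=4, stride=1

print(covered_bytes(old_style))
print(covered_bytes(new_style))
```

In this model only the count=4, stride=1 form touches all four bytes of each
element, which matches the reasoning given in point 1.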
DeltaFile
+102-61offload/test/offloading/non_contiguous_update.cpp
+95-0offload/test/offloading/strided_offset_multidim_update.c
+22-21clang/test/OpenMP/target_update_codegen.cpp
+18-18offload/test/offloading/strided_update_variable_stride_misc.c
+12-8clang/lib/CodeGen/CGOpenMPRuntime.cpp
+9-7offload/test/offloading/strided_update_count_expression_complex.c
+258-1153 files not shown
+276-1249 files

LLVM/project 4ca9638llvm/lib/Analysis UniformityAnalysis.cpp

review: avoid adding NeverUniform arg and inst to uniformValues
DeltaFile
+18-11llvm/lib/Analysis/UniformityAnalysis.cpp
+18-111 files