[mlir][linalg][elementwise] Fold broadcast into new elementwise (#167626)
Fold a broadcast into the new elementwise op, which has an affine map attached.
Merging on behalf of @someoneinjd
[libc] Fix llvm-gpu-loader passing uninitialized device memory (#186804)
Summary:
The return value was not zeroed. The zeroing was accidentally dropped
when we did the port, and the value is zero "almost always", so I didn't
notice. Hopefully this makes the test suite no longer flaky.
[flang] Reorder messages wrt line number before diff(actual, expect)
When messages are attached together, the source locations to which they
refer are not necessarily monotonically increasing. For example
```
error: foo.f90:10: There is a problem here # line 10
because: foo.f90:12: This thing is invalid # line 12 (attached)
error: foo.f90:11: There is another problem here # line 11
```
There is no way to represent that in the source file via ERROR annotations,
so, before running unified_diff, "canonicalize" the list of messages into an
order that corresponds to the line numbers.
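A minimal Python sketch of this canonicalization follows. It assumes messages shaped like the example above (`<severity>: <file>:<line>: <text>`). `line_number`, `canonicalize` and `diff_messages` are hypothetical helpers and are not the actual test-harness code. Only `difflib.unified_diff` is a real standard-library call.

```python
import difflib
import re

# Assumed message shape, modeled on the example above:
#   "<severity>: <file>:<line>: <text>"
_LINE_RE = re.compile(r"^\w+: [^:]+:(\d+):")


def line_number(message):
    """Extract the line number from a message (hypothetical helper)."""
    match = _LINE_RE.match(message)
    # Messages without a recognizable location sort first (an assumed policy).
    return int(match.group(1)) if match else 0


def canonicalize(messages):
    # sorted() is stable, so messages on the same line keep their relative order.
    return sorted(messages, key=line_number)


def diff_messages(actual, expect):
    """Diff actual against expected messages after putting both in line order."""
    return list(
        difflib.unified_diff(
            canonicalize(actual),
            canonicalize(expect),
            fromfile="actual",
            tofile="expect",
            lineterm="",
        )
    )
```

Because the sort is stable, an attached note keeps its position relative to other messages on the same line. The only reordering happens across different line numbers.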
[Flang][OpenMP] Provide option to use heap allocation for private adjustable arrays (#186795)
The size of adjustable Fortran arrays is not known at compilation time.
Using limited GPU stack memory may cause hard-to-debug errors. On the
other hand, switching to heap memory allocation may lead to missed
optimization opportunities and significantly increased kernel execution
time.
Adding the option `-mmlir --enable-gpu-heap-alloc` allows the user to
generate valid code for adjustable Fortran arrays. The flag is off by
default, so there is no efficiency penalty for code that does not use
adjustable arrays.
[SPIR-V] Fix llvm.spv.gep return type for vector-indexed GEPs (#185931)
The `int_spv_gep` intrinsic was defined with `llvm_anyptr_ty` which
forced it to return a scalar pointer. Change the return type to
`llvm_any_ty` to allow the intrinsic to match the actual result type of
the original GEP, whether scalar or vector.
[lit] Stop holding subprocess objects open in TimeoutHelper (#186712)
Tweak TestRunner's TimeoutHelper storage to hold only PIDs rather
than the whole process objects. Holding the objects keeps many pipes
open when all we need is the PID.
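The storage change can be sketched in Python as below. `PidTimeoutHelper` and its methods are hypothetical and do not reproduce lit's actual TimeoutHelper interface. The kill function is injectable only so the sketch can be exercised without sending real signals.

```python
import os
import signal


class PidTimeoutHelper:
    """Hypothetical helper that remembers only the PIDs of child processes."""

    def __init__(self, kill=os.kill):
        self._pids = []
        # Injectable so the behavior can be tested without real signals.
        self._kill = kill

    def add_process(self, proc):
        # Keep just the integer PID. Not holding the Popen object lets its
        # pipes be closed once the caller is done with it.
        self._pids.append(proc.pid)

    def kill_all(self):
        for pid in self._pids:
            try:
                self._kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # the process already exited
        self._pids.clear()
```

One trade-off of storing bare PIDs is that an exited process's PID can in principle be reused by an unrelated process. This sketch does not guard against that.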
Addresses #185941
[lldb][NativePDB] Compile `vbases.test` without default libraries (#186510)
#185735 added `vbases.test`, which compiles with
`--target=x86_64-windows-msvc`. This causes the final executable to
be linked against `libcmt.lib`. That doesn't work on ARM, so this PR changes
the command line to link without the default libraries. They're not
needed if we disable `/GS` (buffer security check), as in other tests.
We use `%clang_cl` instead of `%build` to be able to compile with DWARF as
well.
[CodeGen] Fix C++ global dtor for non-zero program AS targets (#186484)
In codegen for C++ global destructors, we pass a pointer to the
destructor to be called at program exit as the first arg to the
`__cxa_atexit` function.
If the target's default program AS and default AS are not equal, we need
to emit an addrspacecast from the program AS to the generic AS (which is
used as the argument type for the first arg of `__cxa_atexit`) in the
function call.
---------
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[Utils] Modernize type annotations in git-llvm-push
Import annotations from __future__ so we can start using more modern
annotations now, rather than once we move to Python 3.10, while still
preserving Python 3.8 compatibility. Also fix a couple of typing issues
while here.
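The following small illustration shows what the future import buys. `first_remote` is a hypothetical function and is not taken from git-llvm-push. Under PEP 563, annotations are stored as strings instead of being evaluated at definition time. As a result, syntax like `list[str] | None` is accepted even on Python 3.8. Without the import, evaluating that annotation on 3.8 raises `TypeError`.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings


# Hypothetical helper: the PEP 585/604 annotations below are never evaluated,
# so this definition also works on Python 3.8.
def first_remote(remotes: list[str] | None) -> str | None:
    """Return the first remote name, or None if there are none."""
    if not remotes:
        return None
    return remotes[0]
```

Code that inspects annotations at runtime sees these strings and must resolve them itself, for example with `typing.get_type_hints`.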
Reviewers: ilovepi, petrhosek
Pull Request: https://github.com/llvm/llvm-project/pull/186690
[IR][NFC] Inline CmpInst::isSigned/isUnsigned (#186791)
These are small helper functions that are called somewhat often and
compile to three instructions each, so inlining them is beneficial,
even though the improvement is very minor.
[DWARFVerifier] Fix infinite loop in verifyDebugInfoCallSite (#186413)
When attempting to find the call site for a DIE to check whether it was
valid, there was a while loop that incorrectly attempted to walk up the
DIE parent hierarchy. It set `Curr` to the parent, but each iteration then
set `Curr` to that same original parent (`Die.getParent()`) instead of
`Curr.getParent()`. This caused an infinite loop when llvm-dwarfdump
validated some kernel binaries in which DW_TAG_call_site was nested inside
a DW_TAG_lexical_block (or any tag other than DW_TAG_subprogram and
DW_TAG_inlined_subroutine).
Fix by changing Die.getParent() to Curr.getParent() so the loop
correctly walks up the DIE tree.
Add a new test that validates this scenario. Without this change, that
test hangs rather than succeeding.
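The corrected walk can be sketched in Python as follows. The real verifier is C++. `Die`, `STOP_TAGS` and `enclosing_scope` are hypothetical stand-ins that model only a tag and a parent link. The stop tags follow the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Hypothetical stand-in for a DWARF DIE: a tag plus a parent link."""
    tag: str
    parent: Optional["Die"] = None


STOP_TAGS = {"DW_TAG_subprogram", "DW_TAG_inlined_subroutine"}


def enclosing_scope(die):
    """Walk up from `die` to the nearest subprogram or inlined subroutine."""
    curr = die.parent
    while curr is not None and curr.tag not in STOP_TAGS:
        # The bug amounted to `curr = die.parent` here, which never advances
        # past a non-scope parent such as DW_TAG_lexical_block.
        curr = curr.parent
    return curr
```

With the buggy update, a call site under a lexical block would reassign the lexical block forever. The corrected update reaches the subprogram, or `None` at the root.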
AMDGPU: Don't limit VGPR usage based on occupancy in dVGPR mode (#185981)
The maximum VGPR usage of a shader is limited based on the target
occupancy, ensuring that the targeted number of waves actually fit onto a
CU/WGP. However, in dynamic VGPR mode, we should not do that, because
VGPRs are allocated dynamically at runtime, and there are no static
constraints based on occupancy.
Fix that in this patch.
Also fix up the getMinNumVGPRs helper to behave consistently by always
returning zero in dVGPR mode.
This also fixes a problem where AMDGPUAsmPrinter bumps the VGPR usage to
at least the result of getMinNumVGPRs, per my understanding in order to
avoid an occupancy
[2 lines not shown]
[AArch64] Add partial reduce patterns for new sve dot variants (#184649)
This patch enables generation of the new SVE dot instructions added in the
2025 Arm extension from partial reduce nodes.
[IR][NFC] Hot-cold splitting in PatternMatch (#186777)
ConstantAggregates are rare; therefore, split that check into a separate
function so that the fast path can be inlined.
Likewise for vectors, which occur much less frequently than scalar
values.
[NFC][analyzer] Refactor ExprEngine::processCallExit (#186182)
This commit converts `ExprEngine::processCallExit` to the new paradigm
introduced in 1c424bfb03d6dd4b994a0d549e1f3e23852f1e16, where the current
`LocationContext` and `Block` are populated near the beginning of the
`dispatchWorkItem` call (= elementary analysis step) and remain
available during the whole step.
Unfortunately, the first half of the `CallExit` procedure (`removeDead`)
happens within the callee context, while the second half (`PostCall` and
similar callbacks) happens in the caller context, so I need to change
the current `LocationContext` and `Block` in the middle of this big
method.
This means that I need to discard my invariant that
`setCurrLocationContextAndBlock` is only called once per
`dispatchWorkItem`, but I think this exceptional case (first half in
callee, second half in caller) is still clear enough.
In addition to this main goal, I perform many small changes to clarify
and modernize the code of this old method.
[ADT] Add `Repeated<T>` for memory-efficient repeated-value ranges (#186721)
Introduce a lightweight range representing N copies of the same value
without materializing a dynamic array. The range owns this value.
I plan to use it with MLIR APIs that often end up requiring N copies of
the same thing. Currently, we use `SmallVector<T>(N, Val)` for these,
which is wasteful.
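The following Python analog illustrates the idea. The real `Repeated<T>` is a C++ ADT whose exact interface is not reproduced here, and the `Repeated` class below is hypothetical. It stores one value and a count, and it answers length, indexing and iteration queries without materializing the copies. Python's `itertools.repeat` is lazy in a similar way, but it provides neither `len()` nor indexing.

```python
from collections.abc import Sequence


class Repeated(Sequence):
    """Hypothetical analog of a repeated-value range: one value, N positions."""

    def __init__(self, value, count):
        if count < 0:
            raise ValueError("count must be non-negative")
        self._value = value  # the range owns its single copy of the value
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice of a repeated range is another repeated range.
            return Repeated(self._value, len(range(*index.indices(self._count))))
        if index < 0:
            index += self._count
        if not 0 <= index < self._count:
            raise IndexError("Repeated index out of range")
        return self._value
```

Inheriting from `collections.abc.Sequence` supplies iteration, `in`, `index()` and `count()` on top of `__len__` and `__getitem__`. The memory use stays constant regardless of N.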
---------
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[LLVM][CodeGen][SVE] insert_subvector(undef, splat(C), 0) -> splat(C). (#186090)
When converting fixed-length constant splats to scalable vectors, we can
instead regenerate the splat using the target type.