[mlir][x86] Lower packed type vector.contract to AMX dot-product (online-packing) (#188192)
A transform pass that lowers flat-layout `vector.contract` operations to
(a) `amx.tile_mulf` for BF16 or (b) `amx.tile_muli` for Int8 packed types
via `online` packing.
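As a rough illustration of what packing means here, assume that `online` packing refers to rearranging the B operand into the VNNI layout that AMX dot-product instructions expect (pairs of K rows for BF16, groups of four for Int8) at lowering time, instead of requiring pre-packed input. The hypothetical helper `packVnni` below shows that index mapping on a plain row-major array. It is a sketch, not the IR the pass emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy VNNI-style packing of a row-major K x N matrix B. Each group of
// VnniFactor consecutive rows is interleaved so that the VnniFactor values
// sharing a column become contiguous:
//   Packed[k / VnniFactor][n * VnniFactor + k % VnniFactor] = B[k][n]
// VnniFactor is 2 for BF16 and 4 for Int8.
template <typename T>
std::vector<T> packVnni(const std::vector<T> &B, std::size_t K, std::size_t N,
                        std::size_t VnniFactor) {
  assert(B.size() == K * N && K % VnniFactor == 0);
  std::vector<T> Packed(K * N);
  const std::size_t PackedCols = N * VnniFactor;
  for (std::size_t k = 0; k < K; ++k)
    for (std::size_t n = 0; n < N; ++n)
      Packed[(k / VnniFactor) * PackedCols + n * VnniFactor + k % VnniFactor] =
          B[k * N + n];
  return Packed;
}
```

For a 4x2 matrix holding 0..7 row by row, factor 2 yields `0 2 1 3 4 6 5 7`: each packed row interleaves two source rows column by column.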
TODO: A follow-up patch is planned to refactor this pass and retire the
`convert-vector-to-amx` pass.
[TargetLowering] Support larger divisors in expandDIVREMByConstant. (#191119)
Instead of bailing out if the original divisor exceeds HBitWidth,
allow divisors that fit in HBitWidth after removing trailing zeros.
PartialRem now needs both a low and a high part, and shifting RemL
left now needs to handle bits shifted into RemH.
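A minimal sketch of the trailing-zeros idea, assuming 64-bit values split into 32-bit halves (so HBitWidth is 32). `divRemByShiftedConstant` and `shlParts` are hypothetical helpers, and plain 64-bit `/` and `%` stand in for the half-width expansion the real lowering performs. The sketch shows why the partial remainder (the bits shifted out of the dividend) can span both halves, and why shifting the low remainder back left must carry into the high half.

```cpp
#include <cassert>
#include <cstdint>

// Shift a 64-bit value held as two 32-bit halves left by Amt (0 <= Amt < 64),
// moving bits that leave Lo into Hi. Assumes the result fits in 64 bits.
void shlParts(uint32_t &Lo, uint32_t &Hi, unsigned Amt) {
  if (Amt == 0)
    return;
  if (Amt >= 32) {
    Hi = Lo << (Amt - 32);
    Lo = 0;
  } else {
    Hi = (Hi << Amt) | (Lo >> (32 - Amt));
    Lo <<= Amt;
  }
}

struct DivRem {
  uint64_t Quot;
  uint64_t Rem;
};

// Hypothetical helper: X / Divisor and X % Divisor for a divisor whose odd
// part fits in 32 bits, even when the divisor itself does not.
DivRem divRemByShiftedConstant(uint64_t X, uint64_t Divisor) {
  assert(Divisor != 0);
  unsigned TZ = 0;
  while (((Divisor >> TZ) & 1) == 0)
    ++TZ;
  uint64_t Odd = Divisor >> TZ;
  assert(Odd <= UINT32_MAX && "odd part must fit in the half width");

  // Bits shifted out of the dividend; for TZ > 32 they span both halves.
  uint64_t Mask = TZ == 0 ? 0 : (~uint64_t(0) >> (64 - TZ));
  uint64_t PartialRem = X & Mask;

  // Divide the shifted dividend by the odd part (stand-in for the real
  // half-width expansion).
  uint64_t Shifted = X >> TZ;
  uint64_t Quot = Shifted / Odd;

  // The odd-part remainder fits in the low half, but shifting it back left
  // by TZ can move bits into the high half.
  uint32_t RemL = static_cast<uint32_t>(Shifted % Odd);
  uint32_t RemH = 0;
  shlParts(RemL, RemH, TZ);
  uint64_t Rem = ((uint64_t(RemH) << 32) | RemL) | PartialRem;
  return {Quot, Rem};
}
```

For example, a divisor of `3ULL << 40` does not fit in 32 bits, but its odd part 3 does, so it qualifies under the relaxed check.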
Assisted-by: Claude Sonnet 4.5
Address review comments
- Revert many mnemonic renames introduced by a brute-force sed.
- Add -filetype=null to unsupported test RUN lines
- Regenerate CHECK lines in codegen tests
Assisted-By: Claude Opus 4.6 (1M context)
Skip type check for metadata operands in addTypeCheckPredicate
Metadata operands are trivially always metadata, so the predicate
introduced in #191389 is not needed.
[libclc] Enable LLVM_RUNTIME_TARGETS in build system (#189892)
The libclc target is now passed in through LLVM_RUNTIME_TARGETS.
The old configure flow based on `-DLLVM_ENABLE_RUNTIMES=libclc` is
deprecated because libclc no longer has a default target.
`-DLLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="<target-triple>"`
still works, but it is considered legacy.
The new standard build requires each target to be selected explicitly
on the CMake command line through both the runtimes target-specific
cache entry and LLVM_RUNTIME_TARGETS.
For example:
-DRUNTIMES_amdgcn-amd-amdhsa-llvm_LLVM_ENABLE_RUNTIMES=libclc
-DLLVM_RUNTIME_TARGETS="amdgcn-amd-amdhsa-llvm"
-DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libclc
-DLLVM_RUNTIME_TARGETS="nvptx64-nvidia-cuda"
-DRUNTIMES_clspv--_LLVM_ENABLE_RUNTIMES=libclc
[17 lines not shown]
[LSR] Use TTI to check if zero-start IV is free in getSetupCost (#190587)
This avoids a downstream regression where LSR prefers the induction
variable {-1,+1}. When constant zero typically requires no initialization
in the preheader (as queried via TTI::getIntImmCost), treat it as free
in getSetupCost.
Three test changes are improvements: amx-across-func.ll,
2011-11-29-postincphi.ll and pr62660-normalization-failure.ll.
Other test changes are neutral.
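A toy model of the cost change, assuming a target on which zero is free (for example, because it can be read from a zero register). `toyIntImmCost` is a hypothetical stand-in for `TTI::getIntImmCost`, and `toySetupCost` models only the constant-start part of `getSetupCost`, not LSR's full formula cost.

```cpp
#include <cassert>
#include <cstdint>

// Toy cost values; the real ones come from TargetTransformInfo.
constexpr unsigned CostFree = 0;
constexpr unsigned CostBasic = 1;

// Hypothetical stand-in for TTI::getIntImmCost on a target where zero is free.
unsigned toyIntImmCost(int64_t Imm) { return Imm == 0 ? CostFree : CostBasic; }

// Toy setup cost for an induction variable {Start,+1}. Without the TTI
// query, every constant start costs one preheader instruction; with it, a
// zero start that the target reports as free costs nothing.
unsigned toySetupCost(int64_t Start, bool QueryTTI) {
  if (QueryTTI && Start == 0 && toyIntImmCost(Start) == CostFree)
    return 0;
  return 1;
}
```

In this toy model, {0,+1} and {-1,+1} tie on setup cost without the query; with it, the zero start is strictly cheaper.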
[libclc] Refine generic __clc_get_sub_group_size with fast full sub-group path (#188895)
Add a fast path for the common case where the total work-group size is
a multiple of the maximum sub-group size.
The fallback path is ported from amdgpu/workitem/clc_get_sub_group_size.cl.
The compiler can generate predicated instructions for the fallback path
to avoid branches.
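A sketch of the two paths, with the work-group size, maximum sub-group size, and sub-group id passed as plain integers instead of OpenCL work-item builtins. `subGroupSize` is hypothetical and assumes the last sub-group is the only partial one. The fallback is written as a select, the kind of code a compiler can lower to predicated instructions rather than a branch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a generic sub-group size query.
uint32_t subGroupSize(uint32_t WorkGroupSize, uint32_t MaxSubGroupSize,
                      uint32_t SubGroupId) {
  // Fast path: the work-group splits into full sub-groups only.
  if (WorkGroupSize % MaxSubGroupSize == 0)
    return MaxSubGroupSize;

  // Fallback: only the last sub-group is partial.
  uint32_t NumSubGroups =
      (WorkGroupSize + MaxSubGroupSize - 1) / MaxSubGroupSize;
  uint32_t Tail = WorkGroupSize % MaxSubGroupSize;
  return SubGroupId == NumSubGroups - 1 ? Tail : MaxSubGroupSize;
}
```

For a work-group of 100 items and a maximum sub-group size of 64, sub-group 0 reports 64 and sub-group 1 reports 36.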
[LLVM][Intrinsics] Eliminate range check for IIT table in `DecodeIITType` (#190260)
`DecodeIITType` does a range check each time it reads the next entry from
the IIT encoding table. This is required to handle IIT encodings that
are inlined into the `IIT_Table` entries, since the `IITEntries` array
in `getIntrinsicInfoTableEntries` is terminated after the last non-zero
nibble is seen in the inlined encoding (which may not be the actual
end). Change this code so that, for the inlined case, the `IITEntries`
array points to the full `IITValues` array payload plus an IIT_Done
terminator. Such entries then look exactly as they would if they were
encoded in the long encoding table, which allows removing the range
check in `DecodeIITType` and streamlining that code a bit.
Additionally, change some uses of 0 (in loop conditions and in the
default-constructed terminator in the IIT long encoding table) to use
IIT_Done explicitly, which makes the code clearer.
Also use `consume_front()` in a few places instead of `front()` followed
by `slice(1)`.
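A toy sketch of the layout change, assuming a hypothetical 32-bit inlined encoding read low nibble first and ignoring how the real `IIT_Table` marks an entry as inlined. `IIT_Done` is assumed to be 0, consistent with the literal 0s the old code used in its place; all function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed to be 0, matching the literal 0s previously used as terminators.
constexpr uint8_t IIT_Done = 0;

// Old layout (toy): stop after the last non-zero nibble. Trailing zero
// nibbles that belong to the encoding are dropped, so a reader can run
// past the end and needs a range check.
std::vector<uint8_t> unpackTrimmed(uint32_t Encoding) {
  std::vector<uint8_t> Out;
  for (; Encoding != 0; Encoding >>= 4)
    Out.push_back(Encoding & 0xF);
  return Out;
}

// Range-checked read: positions past the end behave like IIT_Done.
uint8_t readChecked(const std::vector<uint8_t> &Entries, std::size_t Pos) {
  return Pos < Entries.size() ? Entries[Pos] : IIT_Done;
}

// New layout (toy): keep all eight nibbles and append IIT_Done, so the entry
// looks like one from the long encoding table and reads up to the
// terminator need no check.
std::array<uint8_t, 9> unpackFull(uint32_t Encoding) {
  std::array<uint8_t, 9> Out{};
  for (unsigned I = 0; I < 8; ++I)
    Out[I] = (Encoding >> (4 * I)) & 0xF;
  Out[8] = IIT_Done;
  return Out;
}
```

Both layouts yield the same value at every position up to the terminator; only the full layout guarantees those positions exist.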
[NFC][LLVM] Rename IRBuilder/LLVM C API params for overload types (#191674)
Rename IRBuilder and LLVM C API function parameters for overload types
to names that better reflect their meaning.
[clang][bytecode] Stop using QualTypes when checking evaluation results (#191732)
The QualTypes might not match the descriptor contents exactly, so look
only at the descriptors.
[VPlan] Handle calls in VPInstruction::opcodeMayReadOrWriteFromMemory. (#190681)
Retrieve the called function and check its memory attributes to
determine whether a VPInstruction calling that function reads or writes
memory. Use this to strengthen the assert in areAllLoadsDereferenceable.
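A toy model of the query, assuming a simplified instruction that carries its opcode and, for calls with a known callee, a summary of that callee's memory attributes. `ToyInstruction`, `ToyFunction`, and `toyMayReadOrWriteMemory` are hypothetical; the real code inspects LLVM function attributes. The point is that a call is treated as touching memory only if its callee may read or write memory, while an unknown callee stays conservative.

```cpp
#include <cassert>

// Hypothetical summary of a callee's memory attributes.
struct ToyFunction {
  bool MayRead;
  bool MayWrite;
};

enum class ToyOpcode { Add, Load, Store, Call };

struct ToyInstruction {
  ToyOpcode Opcode;
  const ToyFunction *Callee; // non-null only for calls with a known callee
};

// Toy query: arithmetic never touches memory, loads and stores always do,
// and a call does unless its known callee is marked as not accessing memory.
bool toyMayReadOrWriteMemory(const ToyInstruction &I) {
  switch (I.Opcode) {
  case ToyOpcode::Add:
    return false;
  case ToyOpcode::Load:
  case ToyOpcode::Store:
    return true;
  case ToyOpcode::Call:
    if (!I.Callee)
      return true; // unknown callee: stay conservative
    return I.Callee->MayRead || I.Callee->MayWrite;
  }
  return true; // unreachable for valid opcodes
}

// Example callees.
const ToyFunction PureFn{false, false};
const ToyFunction ReadOnlyFn{true, false};
```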
PR: https://github.com/llvm/llvm-project/pull/190681
[clang-format] treat continuation as indent for aligned lines (#191217)
This lets the lines we want to align inherit the tabbed indent of the
lines we break. As a result, in the AlignWithSpaces mode, aligned lines
do not get a smaller indent than the lines they are aligned to.
[Clang][diagtool] Fix memory leak in ShowEnabledWarnings (#191711)
Fix a 136-byte memory leak introduced in commit 6dc059ac3c7c. Before
that commit, the TextDiagnosticBuffer was passed to the DiagnosticsEngine
constructor, which took ownership and managed its lifetime. After the
refactoring, the buffer is no longer passed to DiagnosticsEngine, so it
becomes an orphaned allocation that is never freed. Use std::unique_ptr
for automatic cleanup.
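A minimal sketch of the ownership fix, using a hypothetical `ToyDiagBuffer` in place of `TextDiagnosticBuffer` and counting live instances so the cleanup is observable. Once nothing else takes ownership, a `std::unique_ptr` ensures the buffer is destroyed when it goes out of scope.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for TextDiagnosticBuffer that counts live instances.
struct ToyDiagBuffer {
  static inline int Live = 0;
  ToyDiagBuffer() { ++Live; }
  ~ToyDiagBuffer() { --Live; }
  ToyDiagBuffer(const ToyDiagBuffer &) = delete;
  ToyDiagBuffer &operator=(const ToyDiagBuffer &) = delete;
};

// The buffer is owned by a unique_ptr, so it is freed on every exit path
// even though no other object takes ownership of it. A raw `new` with no
// owner would leave it allocated.
int showEnabledWarningsToy() {
  auto Buffer = std::make_unique<ToyDiagBuffer>();
  // ... collect diagnostics into *Buffer ...
  return ToyDiagBuffer::Live; // 1 while the buffer is in use
}
```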