[AArch64][GlobalISel] Extend smaller than i32 gpr loads/stores in RegBankSelect. (#175810)
An i8/i16 load or store is only legal for FPR registers. This patch extends
the types of i8/i16 G_LOADs and G_STOREs on GPRs to i32 using anyext/trunc, so that
selection can be simpler and does not need to handle illegal operations.
This can leave some anyext(trunc) pairs that are not removed yet but
should be possible to optimize away.
Add persistent option to cache plugin
This commit adds the ability to persistently set cache entries
(which survive middleware restarts and reboots, but not system
upgrades), and to set clustered cache entries (with the same
lifecycle).
[AMDGPU] Further improve `AMDGPUSubtargetFeature` multiclass (#177077)
This PR extends the multiclass to support two additional parameters: one
for specifying whether an `AssemblerPredicate` should be generated, and
another for dependent `SubtargetFeatures`. This allows 15 more
definitions to be converted to use the multiclass.
[SLP]Correctly handle vector nodes, coming from same incoming blocks in PHI nodes
If multiple nodes are generated from the same PHI node for the same block,
the vector nodes still need to be vectorized, even if the value for the incoming block was already emitted.
Fixes #177124
[lldb][cmake] Fix standalone Xcode build header staging (#177033)
The LLDB standalone build using Xcode fails because the staging
directory custom command output is attached to multiple
liblldb-stage-header-* targets, but none of these targets depend on each
other. Xcode's new build system doesn't allow this.
This creates a new target `liblldb-header-staging-dir` that depends on
the staging directory creation, and makes all header staging targets
depend on it instead of directly depending on the directory in their
custom commands. This ensures all targets share a common dependency,
satisfying Xcode's build system requirements.
[ROCDL] Refactored MFMA ops in ODS; added constraints (#175775)
This PR improves the ROCDL MFMA intrinsics by making their operand and
result types explicit in the IR and by modeling immediate arguments
(immargs) as attributes rather than opaque operands.
This brings MFMA intrinsics in line with recent changes made to ROCDL
WMMA operations, where intrinsic signatures were clarified to avoid
treating them as an unstructured “blob of arguments”.
[NFCI][AMDGPU] Use X-macro to reduce boilerplate in `GCNSubtarget.h` (#176844)
`GCNSubtarget.h` contained a large amount of repetitive code following
the pattern `bool HasXXX = false;` for member declarations and `bool
hasXXX() const { return HasXXX; }` for getters. This boilerplate made
the file unnecessarily long and harder to maintain.
This patch introduces an X-macro pattern `GCN_SUBTARGET_HAS_FEATURE`
that consolidates 135 simple subtarget features into a single list. The
macro is expanded twice: once in the protected section to generate
member variable declarations, and once in the public section to generate
the corresponding getter methods. This reduces the file by approximately
600 lines while preserving the exact same API and functionality.
Features with complex getter logic or inconsistent naming conventions
are left as manual implementations for future improvement.
Ideally, these could be generated by TableGen using
`GET_SUBTARGETINFO_MACRO`, similar to the X86 backend. However,
`AMDGPU.td` has several issues that prevent direct adoption: duplicate
[5 lines not shown]
[win][x64] Unwind v2: Avoid non-terminator instructions after terminator by using a different pseudo for splitting frame infos (#177007)
After #159206 was merged, the newly added tests failed machine code
verification with:
```
*** Bad machine code: Non-terminator instruction after the first terminator ***
- function: has_funclet
- basic block: %bb.4 call.block.4 (0x8000f837e8)
- instruction: SEH_SplitChained
First terminator was: RET64 $eax
*** Bad machine code: Non-terminator instruction after the first terminator ***
- function: has_funclet
- basic block: %bb.4 call.block.4 (0x8000f837e8)
- instruction: SEH_EndPrologue
First terminator was: RET64 $eax
```
[3 lines not shown]
[NFC][NVVM][NVPTX] Moved common code for tcgen05.mma to the base class (#176327)
This change moves common code parts for `tcgen05.mma` intrinsics to a
separate base class. It removes code duplication and increases
readability. There are no functional changes.
[bazel][IR2Vec] Exclude ir2vec python bindings from main tool (#177230)
#176571 adds Python bindings (using nanobind) in a subdirectory of the
ir2vec tool directory, but the Bazel target just globs everything. Exclude the
bindings directory, which needs to be built in a special way.
[AMDGPU] Remove intrinsic declarations in a couple tests, NFC (#177218)
Intrinsics no longer need to be declared explicitly. This PR only removes
the intrinsic declarations in the two recently touched tests.
[OpenACC][MLIR] clone reduction operands during ACCIfClauseLowering (#177196)
Clone the reduction operands into the compute region side. This also
fixes an issue where references to acc.reduction remain on the host
side.
[TableGen] Add MatchNumber to CheckChildSameMatcher::printImpl. NFC
Make the formatting more consistent with the other child matchers.
This function is only used for debugging, so this change does not affect
the generated output.
bar syntax and only print input if different from output.
Breaks update_test_checks Function Attrs comment check in the rare
case where the modes mismatch.