[AMDGPU] Ensure v_mfma_scale_f32_{16x16x128|32x32x64}_f8f6f4 instructions are convergent (#178627)
The scaled variants of the mfma instructions are not marked as
"convergent", so the machine-sink pass sinks them, which is
incorrect.
This patch ensures that the instructions get marked as "convergent". The
new test also covers other mfma variants, but only the scale variants
are mistreated without the changes from this patch.
[clang-tools-extra][docs] Add documentation for clang-reorder-fields (#178446)
Add comprehensive documentation for the clang-reorder-fields tool,
addressing #35520. The tool has existed in the repository but was
previously undocumented.
The documentation includes:
- Basic usage examples for C and C++ structs/classes
- Constructor initializer list reordering
- Designated initializer support (C++20)
- Detailed limitations and caveats
- Command line option reference
- Common use cases (memory layout optimization, etc.)
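
As an illustration of the memory layout use case listed above, here is a minimal sketch. The struct names are hypothetical and are not taken from the tool's documentation; actual sizes depend on the target ABI, so the check only asserts that the reordered layout is no larger.

```cpp
#include <cstddef>

// Hypothetical example types, for illustration only.
struct Before {
  char tag;      // 1 byte, typically followed by padding before `value`
  double value;  // usually has the strictest alignment of the three fields
  char flag;     // 1 byte, typically followed by tail padding
};

// Same fields, ordered from most to least strictly aligned.
struct After {
  double value;
  char tag;
  char flag;
};

// Both structs hold the same fields; only the padding can differ.
static_assert(sizeof(After) <= sizeof(Before),
              "reordered layout should not be larger");
```

On a typical x86-64 ABI this reduces the struct from 24 to 16 bytes, but that figure is ABI-dependent.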
Fixes #35520
---------
Co-authored-by: EugeneZelenko <eugene.zelenko at gmail.com>
[lldb-dap] Conditionally check UBSan stack trace on Darwin only (#178655)
Non-Darwin platforms may have incorrect stop information location
heuristics. Enable the assertion once the UBSan stopInfo heuristic is updated.
I hit this locally. I don't see it hitting any CI bot, although it should;
most likely the Linux CI bots do not have the `compiler_rt` runtime enabled.
See
https://github.com/llvm/llvm-project/pull/177964#discussion_r2732271531
[AMDGPU] Have VCC as a first-class member of the SGPR pool.
Add VCC and tuples using VCC to SGPR register classes.
We already support VCC as an allocatable register for 32-bit SGPR
operands, so it seems most natural to support it for register
tuple operands as well.
s106/s107 are still not allowed as aliases of vcc_lo/hi in
AsmParser.
The names given to the VCC tuples match those produced by SP3,
though it feels like there is room for improvement.
https://github.com/llvm/llvm-project/issues/62651
[openmp] Build doxygen in bootstrapping builds (#178298)
When LLVM_ENABLE_DOXYGEN=ON, forward the `doxygen-openmp` build target
from the nested (default target) runtimes build. When
LLVM_BUILD_DOCS=ON, also trigger `doxygen-build` with `ninja doxygen`.
LLVM_INCLUDE_DOCS=ON is required in the runtimes build, which is the
default.
This is required to update the OpenMP doxygen documentation at
https://openmp.llvm.org/doxygen by the publish-doxygen-docs buildbot,
discussed here:
https://github.com/llvm/llvm-zorg/pull/716#pullrequestreview-3713032311
[Clang] Try to fix HIPSPV tests after #168043
Summary:
https://github.com/llvm/llvm-project/pull/168043 seems to not have
specified the target triple for the tests so different architectures
fail these tests. Try to set it manually. If this doesn't clear up the
bots I'll revert both.
[ExpandIRInsts] Test fptoi expansion for small types
Allow testing fptoui/fptosi on half types, which are small enough
for alive2 to verify the result.
They currently pass for non-undef/poison input. (The fptoui
expansion is the same as fptosi, which is confusing, but not
incorrect, because the saturation it performs is not actually
required by fptoi.)
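
To make the remark about saturation concrete, here is a minimal sketch of a saturating float-to-signed-integer conversion. It is a hypothetical model using `float` and `int32_t`, not the ExpandIRInsts code. Plain `fptosi`/`fptoui` yield poison for out-of-range inputs, so clamping them, as this helper does, is permitted but not required.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: saturating float -> int32_t conversion.
// NaN maps to 0 and out-of-range values clamp to the integer limits.
inline int32_t satFPToSI32(float x) {
  if (std::isnan(x))
    return 0;
  if (x >= 2147483648.0f)  // 2^31, exactly representable as float
    return std::numeric_limits<int32_t>::max();
  if (x <= -2147483648.0f) // -2^31, exactly representable as float
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(x); // in range: truncates toward zero
}
```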
[AArch64][SME2] Allow lowering to whilelo.x2 in non-streaming mode (#178399)
Since #145322 relaxed the SME predicate for the multi-register while
instructions, these instructions are allowed in non-streaming mode
when SME2 is available.
This patch removes the isStreaming() restriction from both
performActiveLaneMaskCombine & ReplaceGetActiveLaneMaskResults,
allowing the whilelo.x2 intrinsic to be used if SVE or streaming
SVE is available.
[Clang] Lift HIPSPV onto the new offload driver (#168043)
Update HIPSPV toolchain to support `--offload-new-driver`. Additionally,
tailor `llvm-spirv` invocation for
[chipStar](https://github.com/CHIP-SPV/chipStar) via
`spirv64-*-chipstar` offload triple.
Depends on one commit from #170467 and one from #170655.
---------
Co-authored-by: Henry Linjamäki <henry.mikael.linjamaki at intel.com>
Co-authored-by: Joseph Huber <huberjn at outlook.com>
[mlir][Linalg] Preserve discardable/user-defined attributes during generalization (#178599)
-- As observed in a [downstream
project](https://github.com/iree-org/iree/pull/23294#discussion_r2734982998),
the named-to-generic linalg op conversion wasn't preserving
discardable attributes.
-- This commit fixes that.
-- Only a single test case is added as the change applies to any named
linalg op's generalization.
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
[MLIR][Arith] Ensure ConstantOp validates signless integers for vectors (#177857)
Fixes #177818
`arith::ConstantOp::isBuildableWith()` was only checking scalar integers
for signlessness, allowing signed vector element types to pass
validation incorrectly.
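
The difference between the two checks can be sketched with a simplified model. Everything below (`TypeModel`, `Sign`, and both functions) is hypothetical and is not the MLIR API; the real patch may be structured differently.

```cpp
// Hypothetical, simplified model of an integer or vector-of-integer type.
enum class Sign { Signless, Signed, Unsigned };

struct TypeModel {
  bool isVector; // true for a vector of integers, false for a scalar integer
  Sign elemSign; // signedness of the scalar, or of the vector element type
};

// Models the behaviour described above: only scalars are checked.
inline bool buildableScalarOnly(const TypeModel &t) {
  if (!t.isVector)
    return t.elemSign == Sign::Signless;
  return true; // vectors pass without their element type being inspected
}

// Models the intended behaviour: the element type is checked either way.
inline bool buildableElementAware(const TypeModel &t) {
  return t.elemSign == Sign::Signless;
}
```

In this model, a vector with a signed element type passes `buildableScalarOnly` but is rejected by `buildableElementAware`.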
---------
Co-authored-by: Milos Poletanovic <mpoletanovic at syrmia.com>
[MLIR][OpenMP] Simplify OpenMP device codegen (#137201)
After removing host operations from the device MLIR module, it is no
longer necessary to provide special codegen logic to prevent these
operations from causing compiler crashes or miscompilations.
This patch removes these now unnecessary code paths to simplify codegen
logic. Some MLIR tests are now replaced with Flang tests, since the
responsibility of dealing with host operations has been moved earlier in
the compilation flow.
MLIR tests holding target device modules are updated to no longer
include now unsupported host operations.
[Flang][OpenMP] Minimize host ops remaining in device compilation (#137200)
This patch updates the function filtering OpenMP pass intended to remove
host functions from the MLIR module created by Flang lowering when
targeting an OpenMP target device.
Host functions holding target regions must be kept, so that the target
regions within them can be translated for the device. The issue is that
non-target operations inside these functions cannot be discarded because
some of them hold information that is also relevant during target device
codegen. Specifically, mapping information resides outside of
`omp.target` regions.
Previously, all host operations were preserved. This patch instead
discards those that are not actually needed by target device codegen.
In practice, this means only keeping target
regions and mapping information needed by the device. Arguments for some
of these remaining operations are replaced by placeholder allocations
and `fir.undefined`, since they are only actually defined inside of the
[4 lines not shown]
[LoopInterchange] Initialize new_var to InitValue on first iteration (#178370)
Fixed a bug found during testing:
- If it is the first iteration, `new_var` should be initialized to
'InitValue'.
[Clang] avoid assertion in __underlying_type for enum redeclarations (#177984)
Fixes #177943
---
This patch addresses cases where `__underlying_type` is used with enum
redeclarations. The previously added assertion
(https://github.com/llvm/llvm-project/pull/155900) treated a missing
`int` on the referenced `EnumDecl` as an indicator of a _demoted
definition_, while this condition can also occur for redeclarations.
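
A minimal sketch of the kind of input involved, assuming an opaque enum that is declared twice. This is an illustration only and is not necessarily the reproducer from #177943.

```cpp
#include <type_traits>

// Opaque enum declaration with a fixed underlying type.
enum E : short;
// Redeclaration of the same enum; still not a definition with enumerators.
enum E : short;

// `__underlying_type` is the compiler builtin behind std::underlying_type.
using U = __underlying_type(E);

static_assert(std::is_same<U, short>::value,
              "underlying type should be the declared fixed type");
```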
[LLVM][DAGCombiner] Look through freeze when combining extensions of loads (#175022)
Following on from https://github.com/llvm/llvm-project/pull/172484 I
have added support to tryToFoldExtOfLoad for looking through freezes, in
order to catch more cases of extending loads. This type of code is
sometimes seen being generated by the loop vectoriser. For now I've
limited this to cases where the load is only used by the freeze, since
otherwise it leads to worse code in some X86 tests.
[lldb] Refactor command option printing (#178208)
This makes #177570 easier to fix.
Changes I have made:
* Init a variable inside if statement to reduce scope.
* Added const to some variables.
* Early return if we print a single line, and dedent the "else" that
handles multiple lines.
* Only convert lldb's short codes into ANSI codes once.
* Rename a couple of variables where they could have either referred to
the visible text or the raw data with the ANSI codes in.
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
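
The resulting gating rule can be sketched as a small predicate. `SubtargetModel` and both functions are hypothetical and are not the AArch64 backend's actual predicates; the sketch also assumes that the remaining `tlbip` operations stay gated on `+d128` alone.

```cpp
#include <initializer_list>
#include <string>

// Hypothetical stand-in for the enabled target features.
struct SubtargetModel {
  bool hasD128;
  bool hasTLBID;
};

// True if the operation name contains one of the affected substrings.
inline bool isAffectedOp(const std::string &op) {
  for (const char *tag : {"E1IS", "E1OS", "E2IS", "E2OS"})
    if (op.find(tag) != std::string::npos)
      return true;
  return false;
}

// Affected ops need +d128 or +tlbid; others are assumed to need +d128.
inline bool tlbipAllowed(const std::string &op, const SubtargetModel &st) {
  if (isAffectedOp(op))
    return st.hasD128 || st.hasTLBID;
  return st.hasD128;
}
```

For example, with only `+tlbid` enabled, `tlbipAllowed("VAE1IS", ...)` is true while `tlbipAllowed("VAE1", ...)` is false under this model.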