[MLIR][OpenMP] Simplify OpenMP device codegen (#137201)
After removing host operations from the device MLIR module, it is no
longer necessary to provide special codegen logic to prevent these
operations from causing compiler crashes or miscompilations.
This patch removes these now unnecessary code paths to simplify codegen
logic. Some MLIR tests are now replaced with Flang tests, since the
responsibility of dealing with host operations has been moved earlier in
the compilation flow.
MLIR tests holding target device modules are updated to no longer
include now unsupported host operations.
[Flang][OpenMP] Minimize host ops remaining in device compilation (#137200)
This patch updates the function filtering OpenMP pass intended to remove
host functions from the MLIR module created by Flang lowering when
targeting an OpenMP target device.
Host functions holding target regions must be kept, so that the target
regions within them can be translated for the device. The issue is that
non-target operations inside these functions cannot be discarded because
some of them hold information that is also relevant during target device
codegen. Specifically, mapping information resides outside of
`omp.target` regions.
Previously, all host operations were preserved. This patch instead
discards those that are not actually needed by target device codegen.
This, in practice, means only keeping target
regions and mapping information needed by the device. Arguments for some
of these remaining operations are replaced by placeholder allocations
and `fir.undefined`, since they are only actually defined inside of the
[4 lines not shown]
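As a rough illustration of the filtering idea described above, here is a toy model in plain Python. It is not the Flang pass and does not use the MLIR API: the `Op` class, the `map.info` and `host.*` op names, and the rule of following only direct operands are all assumptions made for this sketch. It also does not model the placeholder allocations or `fir.undefined` replacements mentioned in the message.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    # Hypothetical stand-in for an operation inside a host function.
    name: str                                     # e.g. "omp.target"
    result: str = ""                              # value defined by this op, if any
    operands: list = field(default_factory=list)  # values this op uses


def filter_host_ops(ops):
    """Keep target regions plus the ops whose results they consume."""
    # Collect every value used directly by a target region.
    needed = {v for op in ops if op.name == "omp.target" for v in op.operands}
    # Keep target regions and the producers of those values; drop the rest.
    return [op for op in ops if op.name == "omp.target" or op.result in needed]
```

In this model, a function body made of a host call, a mapping op, a target region that uses the mapping result, and a host store would be reduced to the mapping op and the target region.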
[LoopInterchange] Initialize new_var to InitValue on first iteration (#178370)
Fixed a bug found during testing: on the first iteration, `new_var`
should be initialized to `InitValue`.
[Clang] avoid assertion in __underlying_type for enum redeclarations (#177984)
Fixes #177943
---
This patch addresses cases where `__underlying_type` is used with enum
redeclarations. The previously added assertion
(https://github.com/llvm/llvm-project/pull/155900) treated a missing
`int` on the referenced `EnumDecl` as an indicator of a _demoted
definition_, while this condition can also occur for redeclarations.
[LLVM][DAGCombiner] Look through freeze when combining extensions of loads (#175022)
Following on from https://github.com/llvm/llvm-project/pull/172484, I
have added support to `tryToFoldExtOfLoad` for looking through freezes, in
order to catch more cases of extending loads. This type of code is
sometimes seen being generated by the loop vectoriser. For now I've
limited this to cases where the load is only used by the freeze, since
otherwise it leads to worse code in some X86 tests.
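The shape of this combine can be sketched on a toy expression tree. This is plain Python, not SelectionDAG code: the tuple encoding, the `zextload` node name, the `uses` map, and the choice to leave the freeze wrapped around the extending load are assumptions of this sketch, not details taken from the patch.

```python
def fold_ext_of_load(node, uses):
    """Fold ("zext", ("freeze", ("load", p))) into ("freeze", ("zextload", p)).

    `uses` maps a node to its number of users. The fold through a freeze is
    only attempted when the load has exactly one user (the freeze itself).
    """
    op, operand = node
    if op != "zext":
        return node
    # Plain case: extension applied directly to a load.
    if operand[0] == "load":
        return ("zextload", operand[1])
    # Look through a freeze that sits between the extension and the load.
    if operand[0] == "freeze" and operand[1][0] == "load":
        load = operand[1]
        if uses.get(load, 0) == 1:
            return ("freeze", ("zextload", load[1]))
    # Anything else is left untouched.
    return node
```

The single-use check mirrors the restriction described above: when the load has other users, the sketch leaves the expression alone.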
[lldb] Refactor command option printing (#178208)
This refactoring makes it easier to fix #177570.
Changes I have made:
* Initialized a variable inside the if statement to reduce its scope.
* Added const to some variables.
* Returned early when printing a single line, and dedented the "else"
that handles multiple lines.
* Converted lldb's short codes into ANSI codes only once.
* Renamed a couple of variables that could have referred to either
the visible text or the raw data containing the ANSI codes.
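The distinction between visible text and raw data in the last item matters whenever widths are computed for line wrapping. A minimal sketch in Python, assuming standard ANSI SGR escape sequences; the function names are hypothetical and this is not lldb's code:

```python
import re

# Matches ANSI SGR sequences such as "\x1b[1m" (bold) or "\x1b[0m" (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def visible_text(raw):
    """Return the text as a user sees it, with escape sequences removed."""
    return ANSI_SGR.sub("", raw)


def visible_width(raw):
    """Number of characters that occupy columns on the terminal."""
    return len(visible_text(raw))
```

Measuring `len(raw)` instead of `visible_width(raw)` would overcount, since the escape sequences take up no columns.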
[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128
Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:
```
All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
that are currently dependent on FEAT_D128 are updated to be dependent
on FEAT_D128 or FEAT_TLBID
```
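The new gating rule can be written as a small predicate. This is a sketch in Python, not the AArch64 backend's TableGen: the function name and feature strings are hypothetical, and the rule that the remaining `tlbip` operations still require `+d128` alone is an assumption based on the quoted sentence.

```python
# Substrings named in the commit message.
GATED_MARKERS = ("E1IS", "E1OS", "E2IS", "E2OS")


def tlbip_allowed(op_name, features):
    """Return True if a tlbip operation is accepted with the given features."""
    if any(marker in op_name for marker in GATED_MARKERS):
        # Updated rule: either feature is sufficient.
        return "tlbid" in features or "d128" in features
    # Assumption: other tlbip operations remain dependent on d128 only.
    return "d128" in features
```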
[lldb] Fix UbSan decorator (#177964)
The UBSan decorator previously assumed the platform is macOS.
macOS symbol names have an extra leading underscore, so the check now
matches two or more underscores.
It also uses the llvm-nm that is built instead of the system's nm.
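A platform-neutral symbol check of this kind could look like the following Python sketch. The `ubsan_handle` symbol stem, the function name, and the sample `nm` lines are assumptions for illustration and are not copied from the decorator.

```python
import re

# Two or more leading underscores: "__ubsan_handle..." in most symbol tables,
# "___ubsan_handle..." on platforms that prepend an extra underscore.
UBSAN_SYMBOL = re.compile(r"_{2,}ubsan_handle")


def has_ubsan_symbols(nm_output):
    """Return True if any line of nm output names a UBSan handler symbol."""
    return any(UBSAN_SYMBOL.search(line) for line in nm_output.splitlines())
```

Matching on `_{2,}` rather than a fixed count avoids hard-coding either platform's symbol prefix.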
[MLIR][OpenMP] Simplify OpenMP device codegen
After removing host operations from the device MLIR module, it is no longer
necessary to provide special codegen logic to prevent these operations from
causing compiler crashes or miscompilations.
This patch removes these now unnecessary code paths to simplify codegen logic.
Some MLIR tests are now replaced with Flang tests, since the responsibility of
dealing with host operations has been moved earlier in the compilation flow.
MLIR tests holding target device modules are updated to no longer include now
unsupported host operations.
[Flang][OpenMP] Minimize host ops remaining in device compilation
This patch updates the function filtering OpenMP pass intended to remove host
functions from the MLIR module created by Flang lowering when targeting an
OpenMP target device.
Host functions holding target regions must be kept, so that the target regions
within them can be translated for the device. The issue is that non-target
operations inside these functions cannot be discarded because some of them hold
information that is also relevant during target device codegen. Specifically,
mapping information resides outside of `omp.target` regions.
This patch updates the previous behavior where all host operations were
preserved to then ignore all of those that are not actually needed by target
device codegen. This, in practice, means only keeping target regions and mapping
information needed by the device. Arguments for some of these remaining
operations are replaced by placeholder allocations and `fir.undefined`, since
they are only actually defined inside of the target regions themselves.
[3 lines not shown]
[mlir] Fix integer overflow in ShapedType::getNumElements and `makeCanonicalStridedLayoutExpr` (#178395)
Add a new `tryGetNumElements()` API to `ShapedTypeInterface`. It returns
`std::optional<int64_t>`, yielding `std::nullopt` on overflow instead of
invoking undefined behavior, and uses `llvm::checkedMul` for overflow detection.
`getNumElements()` now uses this new API to assert on overflow.
Also fix `AffineExpr` canonicalization to avoid crashing on overflow
using `llvm::checkedMul`.
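The overflow-checked product can be modelled in a few lines. The sketch below is Python, so 64-bit bounds are emulated explicitly; `checked_mul` and `try_get_num_elements` are hypothetical names that mirror the idea, not the MLIR implementation.

```python
from typing import Optional, Sequence

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def checked_mul(a: int, b: int) -> Optional[int]:
    """Return a * b, or None if the product does not fit in a signed 64-bit int."""
    product = a * b  # Python ints never overflow, so check the range by hand.
    if INT64_MIN <= product <= INT64_MAX:
        return product
    return None


def try_get_num_elements(shape: Sequence[int]) -> Optional[int]:
    """Multiply all dimensions, returning None as soon as overflow is detected."""
    num = 1
    for dim in shape:
        num = checked_mul(num, dim)
        if num is None:
            return None
    return num
```

A caller that must have a value can then assert on the result, which corresponds to how `getNumElements()` is described above.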
Fixes #178362
Fixes #177816
---------
Co-authored-by: Claude Opus 4.5 <noreply at anthropic.com>
[AArch64][Driver] Enable host supported features with march=native. (#177128)
Currently, march=native enables the base features implied by the host
system architecture, such as Armv8.2-A, Armv9-A, etc., rather than the
actual features supported by the host (e.g. crypto). This is suboptimal
as it generally leaves optional but supported features disabled.
This patch aligns the behaviour of march=native with mcpu=native by
using the latter's decoding logic to decode the former as well. This
means both options should enable a similar set of features. We also set
the target-cpu accordingly, so that march=native becomes a drop-in
replacement for mcpu=native.