[GlobalISel][AMDGPU][AArch64] Fix GlobalISel copy propagation (#188781)
Disallow propagation of sub-registers after GlobalISel, as the current
code blindly drops any sub-register information. This also fixes
bugs in the AArch64 and AMDGPU back-ends that relied on the incorrect
behavior and would fail with the fix:
* Update `selectG_UNMERGE_VALUES` in AMDGPU so instead of generating
`hi16` for SGPR it shifts higher bits into the destination register
using `lshr`.
* Prevent AArch64 back-end from generating spurious `sub_32:gpr32all`
when selecting copy.
* Test changes: `fpto[s/u]i-sat-vector.ll`: The correct number of
conversions is now generated because the high 16 bits are handled
correctly; however, this introduces `lshr` instructions. This should be
resolved in #188287 by enabling `s_cvt_hi_*`.
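The shift-based extraction described in the first bullet can be modelled on plain integers. This is only an illustrative sketch, not the AMDGPU selection code: `unmerge32To16` is a hypothetical helper name, and the real change operates on machine instructions rather than C++ values.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model: split a 32-bit value into {low 16 bits, high 16 bits}.
// The high half is obtained with a logical shift right instead of a
// sub-register access.
std::pair<uint16_t, uint16_t> unmerge32To16(uint32_t Src) {
  uint16_t Lo = static_cast<uint16_t>(Src & 0xFFFFu);
  uint16_t Hi = static_cast<uint16_t>(Src >> 16); // logical shift right
  return {Lo, Hi};
}
```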
[TableGen] Add submulticlass typechecking to template arg values (#197128)
Some typechecking was missing when parsing a submulticlass reference.
Add the CheckTemplateArgValues call in ParseSubMultiClassReference.
Resolves https://github.com/llvm/llvm-project/issues/84910.
[LifetimeSafety] Diagnose invalidated-field (#196680)
Teach lifetime safety invalidation diagnostics to handle origins that
escape through fields before the referenced object is invalidated.
Previously they were skipped.
Partially addresses https://github.com/llvm/llvm-project/issues/195706
[InstCombine] Relax the requirements for (X ^ C2) + C -> (C2 + C) - X (#196897)
If (C2 - X) has no borrow between bits, it is equivalent to (X ^ C2).
A borrow would occur when c2_bit=0 and x_bit=1.
It follows that c2_bit=1 or x_bit=0 means no borrow.
Remove an artificial condition that C2 must be a low bits mask.
Proof: https://alive2.llvm.org/ce/z/uNMsg_
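The no-borrow argument can also be checked exhaustively on 8-bit values. The sketch below is an independent sanity check, not the InstCombine code: `noBorrow` and `foldIsSound` are hypothetical names, and concrete values of X stand in for what the optimizer would establish through known bits.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position has c2_bit=0 and x_bit=1, i.e. every set bit
// of X is also set in C2, so (C2 - X) never borrows.
bool noBorrow(uint8_t X, uint8_t C2) {
  return (X & static_cast<uint8_t>(~C2)) == 0;
}

// Checks (X ^ C2) + C == (C2 + C) - X modulo 2^8 for every X, C2, C
// that satisfies the no-borrow condition.
bool foldIsSound() {
  for (unsigned C2 = 0; C2 < 256; ++C2)
    for (unsigned X = 0; X < 256; ++X) {
      if (!noBorrow(static_cast<uint8_t>(X), static_cast<uint8_t>(C2)))
        continue;
      for (unsigned C = 0; C < 256; ++C) {
        uint8_t Before = static_cast<uint8_t>((X ^ C2) + C);
        uint8_t After = static_cast<uint8_t>((C2 + C) - X);
        if (Before != After)
          return false;
      }
    }
  return true;
}
```

Note that C2 does not have to be a low-bits mask here: `noBorrow(0x10, 0x90)` holds even though `0x90` is not of the form `2^k - 1`.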
[lld] Remove unused argument of DataExtractor constructor (NFC) (#196361)
`AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See #190519 for more context.
[AArch64] Guard against vector invalidation in EmitAArch64CpuSupports. (#196909)
This prevents the Vector from being invalidated whilst iterating over it.
As far as I can tell we were adding elements twice.
Fixes #196789
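As a generic illustration of the hazard (not the actual `EmitAArch64CpuSupports` change), appending to a `std::vector` inside a range-based loop over the same vector can reallocate and invalidate the loop's iterators. One way to avoid that is to collect additions separately and append them afterwards. The helper `addImplied` and its `Implies` table are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: for each entry that implies another entry, append the
// implied one. Additions go into Pending first so Features is never modified
// while it is being iterated.
void addImplied(std::vector<std::string> &Features,
                const std::map<std::string, std::string> &Implies) {
  std::vector<std::string> Pending;
  for (const std::string &F : Features) {
    auto It = Implies.find(F);
    if (It != Implies.end())
      Pending.push_back(It->second);
  }
  // Safe: the loop over Features has finished.
  Features.insert(Features.end(), Pending.begin(), Pending.end());
}
```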
[ValueTracking] Handle sext, zext in computeConstantRange
Propagate constant ranges through sign extension, zero extension.
Extends the existing handling for truncations.
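To make the idea concrete, here is a simplified model of range propagation through an i8 to i16 extension. It uses inclusive, non-wrapping unsigned intervals, which is an assumption made for brevity; LLVM's `ConstantRange` also represents wrapped ranges and can be more precise. `Range16`, `zextRange`, and `sextRange` are hypothetical names, and both functions assume `Lo <= Hi`.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive unsigned interval over 16-bit values (never wraps in this model).
struct Range16 {
  uint16_t Lo, Hi;
};

// Zero extension keeps the unsigned value of every element unchanged.
Range16 zextRange(uint8_t Lo, uint8_t Hi) { return {Lo, Hi}; }

// Sign extension fills the upper byte with the sign bit of each element.
Range16 sextRange(uint8_t Lo, uint8_t Hi) {
  auto SExt = [](uint8_t V) {
    return static_cast<uint16_t>(V < 0x80 ? V : (0xFF00u | V));
  };
  // Entirely non-negative or entirely negative: endpoints map monotonically.
  if (Hi < 0x80 || Lo >= 0x80)
    return {SExt(Lo), SExt(Hi)};
  // Straddles the sign boundary: non-negative inputs stay in [Lo, 0x7F] and
  // negative inputs land in [0xFF80, SExt(Hi)], so take the enclosing interval.
  return {Lo, SExt(Hi)};
}
```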
[MachineBlockPlacement] Fix use-after-erase (#197109)
`ComputedEdges.erase(FoundEdge)` invalidates `FoundEdge`, but the
function then returns `FoundEdge->second`. Read the bucket value into
a local before erasing.
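The pattern can be sketched with a standard container. This is a minimal stand-in, assuming `std::unordered_map<int, int>` in place of the real map type; `takeEdge` is a hypothetical name.

```cpp
#include <cassert>
#include <unordered_map>

// Returns the value stored for Key and removes the entry, or -1 if absent.
int takeEdge(std::unordered_map<int, int> &ComputedEdges, int Key) {
  auto FoundEdge = ComputedEdges.find(Key);
  if (FoundEdge == ComputedEdges.end())
    return -1;
  int Value = FoundEdge->second;  // copy out while the iterator is valid
  ComputedEdges.erase(FoundEdge); // FoundEdge must not be used after this
  return Value;
}
```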
[AA] Respect potential synchronization effects of inline asm (#196965)
Respect potential synchronization effects of inline assembly calls on
not-yet-escaped memory.
We only do this if the call is both non-nosync and ModRefs "other"
memory. This is consistent with the atomic memory effects established in
https://github.com/llvm/llvm-project/pull/193768 and makes sure that
things like readonly/argmemonly continue to work as expected even for
frontends that do not emit nosync (which, right now, is all of them).
The limitation to inline asm should not actually exist: The issue
applies to all calls. This just fixes a particularly important case in a
targeted way. (The fact that inline asm memory barriers do not work as
expected is a problem for making optimizations of monotonic accesses
more aggressive, e.g. it caused issues for
https://github.com/llvm/llvm-project/pull/195015.)
The ability of inline asm (with a `~{memory}` clobber) to synchronize
was explicitly specified in
https://github.com/llvm/llvm-project/pull/150191.
[PowerPC] Fix types when emitting ppc_altivec_vupklsw (#187789)
When lowering BUILD_VECTOR, we produce this intrinsic node, but fail to
adjust the input/output types to ensure ISel works.
This patch simply adds the necessary bitcasts.
Fixes: https://github.com/llvm/llvm-project/issues/175297
[clang][bytecode] Pass correct QualType to getFixedPointSemantics() (#196952)
The expression type might be different, so pass the QualType we have at
hand.
[AArch64] Add a regression test for Apple tuning features (NFC) (#196792)
This patch adds a TableGen regression test that directly checks complete
feature lists per generation for Apple CPUs, to guard against changes
that can break the <CPU,features> association if we lack indirect
coverage.
A followup patch should introduce generational delta encoding for Apple
tuning features that this test should help verify.
[clang][bytecode] Fix a crash with invalid ArraySubscriptExprs (#196964)
In the attached test case, `arr` becomes the _index_, not the base,
which causes us later to run into issues because the index is a pointer
and not an integer.
libclc: Pass LLVM_NATIVE_TOOL_DIR to runtime builds (#196498)
This patch sets `LLVM_NATIVE_TOOL_DIR` in the runtime build
configuration to point to the directory containing the just-built LLVM
tools, allowing libclc to find them without requiring them to be
installed on the host system.
Fixes build errors like:
```
Error evaluating generator expression: $<TARGET_FILE:opt>
No target "opt"
```
A few lines above this change, the `extra_deps` list of dependencies for
libclc is created, but those tools don't get built in the runtime build.
We build libclc in the monolithic build, where all the tools are
available, which is why I've added the path to discover them.
[GlobalISel] Defer RegBankSelect operand mapper creation (#196985)
RegBankSelect::applyMapping constructs an OperandsMapper before applying
repairs. Default mappings that only need Reassign repairs only update
the register bank and do not create replacement operands, so the generic
applyDefaultMapping path has no rewriting work to do in that case.
Defer OperandsMapper creation until an Insert repair actually needs new
virtual registers. If no mapper was needed for a default mapping, return
after applying the repairs.
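The deferral can be sketched as lazy construction. The types below (`RepairKind`, `MapperSketch`, `ApplyResult`) are hypothetical stand-ins for illustration, not the RegBankSelect classes; the point is only that the mapper is built on the first Insert repair and never for Reassign-only mappings.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum class RepairKind { Reassign, Insert };

// Hypothetical stand-in for the object that hands out new virtual registers.
struct MapperSketch {
  int NewVRegs = 0;
  int createVReg() { return NewVRegs++; }
};

struct ApplyResult {
  bool MapperCreated;
  int NewVRegs;
};

ApplyResult applyRepairs(const std::vector<RepairKind> &Repairs) {
  std::unique_ptr<MapperSketch> Mapper; // deliberately not constructed yet
  for (RepairKind K : Repairs) {
    if (K == RepairKind::Reassign)
      continue; // only the register bank changes; no new operands needed
    if (!Mapper)
      Mapper.reset(new MapperSketch()); // first Insert repair: build it now
    Mapper->createVReg();
  }
  return {Mapper != nullptr, Mapper ? Mapper->NewVRegs : 0};
}
```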
CTMark geomean -0.23% improvement on aarch64-O0-g.
https://llvm-compile-time-tracker.com/compare.php?from=ed50ea52004259af958bb3e5636268342c49ee62&to=1a4730426e14969626cad43c6b06e93bde707bd1&stat=instructions%3Au
Assisted-by: Codex
[clang] Fix x86_64-windows-msvc over- and under-alignment (#196505)
This fixes two issues where Clang was both over- and under-aligning
variables:
1) We were applying the x86_64 Sys V psABI "large array" alignment increase
(default when inheriting from X86_64TargetInfo), but MSVC doesn't follow
that ABI.
2) MSVC implements a similar scheme though, where it increases the
alignment of large objects. This is documented for ARM64 [1] and was
implemented in Clang b7c6d95af5e295c560d1445e7090e31eb9289932, but it
also applies to x86_64. ([2] says "MSVC does size (total size, not
element size) based alignment for global symbols on ARM64 *which is
copied from AMD64*").
This patch stops doing 1) and implements 2) for x86_64-windows-msvc.
[1]
[4 lines not shown]