[InstCombine] Relax the requirements for (X ^ C2) + C -> (C2 + C) - X (#196897)
If (C2 - X) has no borrow between bits, it is equivalent to (X ^ C2).
A borrow would occur when c2_bit=0 and x_bit=1.
It follows that c2_bit=1 or x_bit=0 means no borrow.
Remove an artificial condition that C2 must be a low bits mask.
Proof: https://alive2.llvm.org/ce/z/uNMsg_
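The borrow argument above can be checked by brute force on 8-bit values. The following is a minimal sketch, not the InstCombine code: the helper names `noBorrow` and `countCounterexamples` are hypothetical, and it tests concrete values of X rather than the known-bits reasoning the pass would use.

```cpp
#include <cstdint>

// True when (C2 - X) needs no borrow at any bit position, i.e. at every
// bit either c2_bit=1 or x_bit=0. Equivalent to: X has no bit outside C2.
bool noBorrow(uint8_t X, uint8_t C2) { return (X & ~C2) == 0; }

// Exhaustively compares (X ^ C2) + C against (C2 + C) - X in 8-bit
// wrapping arithmetic for every (X, C2) pair that satisfies noBorrow.
// Returns the number of mismatches found.
unsigned countCounterexamples() {
  unsigned Bad = 0;
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned C2 = 0; C2 < 256; ++C2) {
      if (!noBorrow(X, C2))
        continue;
      for (unsigned C = 0; C < 256; ++C) {
        uint8_t Lhs = static_cast<uint8_t>((X ^ C2) + C);
        uint8_t Rhs = static_cast<uint8_t>((C2 + C) - X);
        if (Lhs != Rhs)
          ++Bad;
      }
    }
  }
  return Bad;
}
```

Note that C2 = 0b1010 with X = 0b0010 satisfies the condition even though C2 is not a low bits mask.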
[lld] Remove unused argument of DataExtractor constructor (NFC) (#196361)
The `AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See #190519 for more context.
[AArch64] Guard against vector invalidation in EmitAArch64CpuSupports. (#196909)
This prevents the vector from being invalidated while iterating over it.
As far as I can tell we were adding elements twice.
Fixes #196789
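For context, a generic sketch of this class of bug and one way to guard against it. It is not the code from EmitAArch64CpuSupports; the function `expand` and the "-dep" suffix are hypothetical. Appending to a `std::vector` may reallocate and invalidate iterators, so the loop walks by index over the elements that were present on entry.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends one derived entry for each element that was present on entry.
// Iterating by index up to the original size avoids holding iterators
// across push_back, which may reallocate the storage.
std::vector<std::string> expand(std::vector<std::string> Features) {
  const std::size_t N = Features.size();
  for (std::size_t I = 0; I < N; ++I) {
    // Copy first: a reference into the vector could dangle after growth.
    std::string Derived = Features[I] + "-dep";
    Features.push_back(Derived);
  }
  return Features;
}
```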
[ValueTracking] Handle sext, zext in computeConstantRange
Propagate constant ranges through sign extension, zero extension.
Extends the existing handling for truncations.
[MachineBlockPlacement] Fix use-after-erase (#197109)
`ComputedEdges.erase(FoundEdge)` invalidates `FoundEdge`, but the
function then returns `FoundEdge->second`. Read the bucket value into
a local before erasing.
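The pattern can be sketched as follows. This uses `std::unordered_map` as a stand-in for the container in the pass, and `takeOr` is a hypothetical helper, not a function from MachineBlockPlacement.

```cpp
#include <unordered_map>

// Looks up Key, removes its entry, and returns the stored value.
// Returns Default when Key is absent.
int takeOr(std::unordered_map<int, int> &Map, int Key, int Default) {
  auto Found = Map.find(Key);
  if (Found == Map.end())
    return Default;
  int Value = Found->second; // Read the value before erase invalidates Found.
  Map.erase(Found);
  return Value;              // Found must not be dereferenced after erase.
}
```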
[AA] Respect potential synchronization effects of inline asm (#196965)
Respect potential synchronization effects of inline assembly calls on
not-yet-escaped memory.
We only do this if the call is both non-nosync and ModRefs "other"
memory. This is consistent with the atomic memory effects established in
https://github.com/llvm/llvm-project/pull/193768 and makes sure that
things like readonly/argmemonly continue to work as expected even for
frontends that do not emit nosync (which, right now, is all of them).
The limitation to inline asm should not actually exist: The issue
applies to all calls. This just fixes a particularly important case in a
targeted way. (The fact that inline asm memory barriers do not work as
expected is a problem for making optimizations of monotonic accesses
more aggressive, e.g. it caused issues for
https://github.com/llvm/llvm-project/pull/195015.)
The ability of inline asm (with a `~{memory}` clobber) to synchronize
was explicitly specified in
https://github.com/llvm/llvm-project/pull/150191.
[PowerPC] Fix types when emitting ppc_altivec_vupklsw (#187789)
When lowering BUILD_VECTOR, we produce this intrinsic node, but fail to
adjust the input/output types to ensure ISel works.
This patch simply adds the necessary bitcasts.
Fixes: https://github.com/llvm/llvm-project/issues/175297
[clang][bytecode] Pass correct QualType to getFixedPointSemantics() (#196952)
The expression type might be different, so pass the QualType we have at
hand.
[AArch64] Add a regression test for Apple tuning features (NFC) (#196792)
This patch adds a TableGen regression test that directly checks complete
feature lists per generation for Apple CPUs, to guard against changes
that can break the <CPU,features> association if we lack indirect
coverage.
A followup patch should introduce generational delta encoding for Apple
tuning features that this test should help verify.
[clang][bytecode] Fix a crash with invalid ArraySubscriptExprs (#196964)
In the attached test case, `arr` becomes the _index_, not the base,
which causes us later to run into issues because the index is a pointer
and not an integer.
libclc: Pass LLVM_NATIVE_TOOL_DIR to runtime builds (#196498)
This patch sets `LLVM_NATIVE_TOOL_DIR` in the runtime build
configuration to point to the directory containing the just-built LLVM
tools, allowing libclc to find them without requiring them to be
installed on the host system.
Fixes build errors like:
```
Error evaluating generator expression: $<TARGET_FILE:opt>
No target "opt"
```
A few lines above this change, the `extra_deps` list of dependencies for
libclc is created, but those tools don't get built in the runtime build.
We build libclc in the monolithic build, where all the tools are
available, which is why I've added the path to discover them.
[GlobalISel] Defer RegBankSelect operand mapper creation (#196985)
RegBankSelect::applyMapping constructs an OperandsMapper before applying
repairs. Default mappings that only need Reassign repairs only update
the register bank and do not create replacement operands, so the generic
applyDefaultMapping path has no rewriting work to do in that case.
Defer OperandsMapper creation until an Insert repair actually needs new
virtual registers. If no mapper was needed for a default mapping, return
after applying the repairs.
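The deferral can be illustrated with a lazily constructed helper. This is a minimal C++17 sketch under stated assumptions: `Mapper`, `RepairKind`, and `applyRepairs` are hypothetical stand-ins, not the RegBankSelect types or their API.

```cpp
#include <optional>
#include <vector>

enum class RepairKind { Reassign, Insert };

// Hypothetical stand-in for a helper that is costly to construct.
struct Mapper {
  static inline int Constructed = 0; // Counts constructions, for illustration.
  int NewRegs = 0;
  Mapper() { ++Constructed; }
  void createNewReg() { ++NewRegs; }
};

// Applies the repairs and returns true if a Mapper had to be built.
// The Mapper is created only when the first Insert repair is seen.
bool applyRepairs(const std::vector<RepairKind> &Repairs) {
  std::optional<Mapper> M;
  for (RepairKind Kind : Repairs) {
    if (Kind != RepairKind::Insert)
      continue; // Reassign needs no new registers in this sketch.
    if (!M)
      M.emplace();
    M->createNewReg();
  }
  return M.has_value();
}
```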
CTMark geomean improves by 0.23% on aarch64-O0-g:
https://llvm-compile-time-tracker.com/compare.php?from=ed50ea52004259af958bb3e5636268342c49ee62&to=1a4730426e14969626cad43c6b06e93bde707bd1&stat=instructions%3Au
Assisted-by: Codex
[clang] Fix x86_64-windows-msvc over- and under-alignment (#196505)
This fixes two issues where Clang was both over- and under-aligning
variables:
1) We were applying the x86_64 Sys V psABI "large array" alignment increase
(default when inheriting from X86_64TargetInfo), but MSVC doesn't follow
that ABI.
2) MSVC implements a similar scheme though, where it increases the
alignment of large objects. This is documented for ARM64 [1] and was
implemented in Clang b7c6d95af5e295c560d1445e7090e31eb9289932, but it
also applies to x86_64. ([2] says "MSVC does size (total size, not
element size) based alignment for global symbols on ARM64 *which is
copied from AMD64*").
This patch stops doing 1) and implements 2) for x86_64-windows-msvc.
[1]
[4 lines not shown]
[VPlan] Add SCEV support for abs intrinsic (#195678)
Teach `getSCEVExprForVPValue` to model `llvm.abs` via
`ScalarEvolution::getAbsExpr`, preserving the intrinsic's
is_int_min_poison flag as the SCEV IsNSW argument. Add a unit test
covering both poison and wrapping llvm.abs forms.
[Instrumentor] Allow multiple config files with different filters
To instrument different functions in different ways, multiple config
files can now be provided. Each file results in one instrumentation
run. Multiple files can be passed via a command line option or listed in
a "summary" file that is passed via a command line option (to keep the
command length manageable).
[Instrumentor] Add a global function regexp to limit the instrumentation
Only functions that match the "function_regex" or have the
instrumentation attribute will be instrumented.
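The selection rule can be sketched with `std::regex`. `FunctionInfo` and `shouldInstrument` are hypothetical names, and the commit does not say whether the pattern is anchored; this sketch assumes an unanchored search.

```cpp
#include <regex>
#include <string>

// Hypothetical summary of the two properties the rule looks at.
struct FunctionInfo {
  std::string Name;
  bool HasInstrumentAttr;
};

// A function is selected if it carries the attribute or if its name
// matches the regex (unanchored search; this is an assumption).
bool shouldInstrument(const FunctionInfo &F, const std::regex &FunctionRegex) {
  return F.HasInstrumentAttr || std::regex_search(F.Name, FunctionRegex);
}
```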
[Instrumentor] Add unreachable support; unreachable stack trace printing
Allow instrumenting unreachable and provide a use case for stack trace
printing.