[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only
In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.
Reviewers: RKSimon, hiraditya, bababuck
Pull Request: https://github.com/llvm/llvm-project/pull/197763
[AMDGPU] Use shorter form for i16 operands
For 16-bit operands, an inline constant is zero-extended,
which in particular allows the use of FP constants. These
have 16 zero bits in the high half and the FP16
value in the low 16 bits.
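As an illustration of the layout described above, here is a minimal sketch of zero-extending a 16-bit encoding to 32 bits. The helper names `zeroExtend16` and `highHalfIsZero` are hypothetical and are not part of the AMDGPU backend; only the FP16 bit pattern for 1.0 (0x3C00) is a fixed fact of the IEEE 754 binary16 format.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16 encoding of 1.0: sign 0, biased exponent 15, mantissa 0.
constexpr std::uint16_t kHalfOne = 0x3C00;

// Hypothetical helper: zero-extend a 16-bit operand encoding to 32 bits.
// The low half keeps the original bits and the high half is all zeroes.
constexpr std::uint32_t zeroExtend16(std::uint16_t bits) {
  return static_cast<std::uint32_t>(bits);
}

// Hypothetical helper: true when the high 16 bits of a 32-bit value are zero.
constexpr bool highHalfIsZero(std::uint32_t value) {
  return (value >> 16) == 0;
}
```

With these definitions, `zeroExtend16(kHalfOne)` yields 0x00003C00, which is the shape of constant the message refers to.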
[flang-rt] Rework findloc.cpp to dispatch target at runtime (#197756)
Summary:
The previous code had a combinatorial explosion of functions by
templating on both the source and target types. This created around 170
instantiations. Instead we just template on the source type and then use
a simple runtime check. This should not affect performance
significantly: it adds perhaps a few branches to what is already a
non-trivial operation, and I do not think avoiding them justifies a
two-minute compile time.
The result is that this file goes from 120 seconds to 12 on my machine
and the resulting file goes from 7.2 MiB to 757 kiB. Functionally, this
makes us instantiate 1/10th of the functions.
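The idea of templating only on the source type and checking the target type at runtime can be sketched roughly as follows. This is an illustration under assumptions, not the flang-rt code: `TargetKind`, `Target`, `equalsTarget`, and `findLoc` are hypothetical names, and the real runtime uses its own type codes and descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tag for the target's type (not the flang-rt type codes).
enum class TargetKind { Int32, Int64, Float64 };

// Type-erased target value: a runtime tag plus a pointer to the data.
struct Target {
  TargetKind kind;
  const void *value;
};

// Compare one source element with the target. The switch replaces what
// would otherwise be one template instantiation per (source, target) pair.
template <typename Source>
bool equalsTarget(const Source &element, const Target &target) {
  switch (target.kind) {
  case TargetKind::Int32:
    return element == *static_cast<const std::int32_t *>(target.value);
  case TargetKind::Int64:
    return element == *static_cast<const std::int64_t *>(target.value);
  case TargetKind::Float64:
    return element == *static_cast<const double *>(target.value);
  }
  return false;
}

// Templated on the source type only. Returns the 1-based position of the
// first matching element, or 0 when nothing matches.
template <typename Source>
std::size_t findLoc(const Source *data, std::size_t count,
                    const Target &target) {
  assert(data != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i)
    if (equalsTarget(data[i], target))
      return i + 1;
  return 0;
}
```

With N source types and M target types, this shape needs N instantiations of `findLoc` instead of N times M, at the cost of one switch per comparison.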
[dsymutil] Collect .cas-config files in dSYM bundles (#197818)
When caching is enabled the Swift compiler might substitute CAS
identifiers for on-disk paths. In order to resolve them the build system
puts a .cas-config file in the build directory. Dsymutil needs to
collect the contents of these files so tools consuming the dSYM (which
do not have access to the original build directory) can resolve these
CAS identifiers, too.
Assisted-by: claude
rdar://169986664
[AMDGPU] Optimize fcanonicalize/fneg/fsub with packed bf16 math ops (#197318)
This work makes fcanonicalize v2bf16 'Legal' and implements the
selection pattern for it with v_pk_mul_bf16.
We also make fneg and fabs 'Legal' in this patch. With this change,
packed fadd can be selected for vector fsub with bf16. Also, the vector
fneg can be successfully folded into the operand in the packed bf16 math ops.
[AArch64] Do not pass debug insn to liveness analysis (#197989)
This time in the MachineSMEABIPass, where we can skip any debug
instructions. See #193104.
[lldb][NFC] Socket::Send should return a signed type (#197844)
The underlying `::send` and `::sendto` return `ssize_t` on macOS/Linux
and `int` on Windows. These functions also communicate an error by
returning -1 (which is not expressible in a size_t).
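To show why an unsigned return type loses the error value, here is a small sketch. `SendResult`, `fakeSend`, and `asUnsigned` are hypothetical stand-ins, not the lldb `Socket` API, and `std::ptrdiff_t` is used only as an assumed portable signed type in place of `ssize_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Assumed portable signed result type standing in for ssize_t / int.
using SendResult = std::ptrdiff_t;

// Hypothetical send-like function: returns the number of bytes "sent",
// or -1 to signal an error, mirroring the convention of ::send.
SendResult fakeSend(std::size_t length, bool fail) {
  return fail ? -1 : static_cast<SendResult>(length);
}

// What a caller would see if the result were forced through size_t:
// -1 wraps around to the largest representable value.
std::size_t asUnsigned(SendResult result) {
  return static_cast<std::size_t>(result);
}
```

A signed return lets callers test `result < 0` directly, while the unsigned form turns the error into a very large byte count that looks like success.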
[CodeExtractor] fix use list iterator invalidation (#197986)
Fix crash in HotColdSplit that uses CodeExtractor to outline cold
functions from https://github.com/llvm/llvm-project/pull/191824.
When `CodeExtractor::insertReplacerCall` replaces the outlined function
return value, calling `replaceUsesOfWith` invalidates the users iterator,
causing the loop to exit early without having replaced all of the original
users of `FuncRetVal`.
```
Referring to an instruction in another function!
%s.sroa.0.0 = phi ptr [ %call.i, %codeRepl ], [ undef, %entry ]
LLVM ERROR: Broken module found, compilation aborted!
```
Reproducer: https://godbolt.org/z/G5qv35nnq
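The general hazard can be sketched with a toy use list. This is not the LLVM code and not necessarily how the patch fixes it: `ToyValue`, `ToyUser`, `replaceOperand`, and `replaceAllUses` are hypothetical, and snapshotting the users before the loop is just one common way to avoid walking a list that is being edited.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ToyValue;

// Toy user with a single operand.
struct ToyUser {
  ToyValue *operand = nullptr;
};

// Toy value that tracks which users refer to it.
struct ToyValue {
  std::vector<ToyUser *> users;
};

// Rewrite one user's operand. This edits the use list of `from`, which is
// what would invalidate an iterator walking `from.users` directly.
void replaceOperand(ToyUser &user, ToyValue &from, ToyValue &to) {
  assert(user.operand == &from);
  from.users.erase(std::remove(from.users.begin(), from.users.end(), &user),
                   from.users.end());
  to.users.push_back(&user);
  user.operand = &to;
}

// Replace every use of `from` with `to`, iterating over a snapshot so the
// loop is unaffected by the edits made inside replaceOperand.
void replaceAllUses(ToyValue &from, ToyValue &to) {
  std::vector<ToyUser *> snapshot = from.users;
  for (ToyUser *user : snapshot)
    replaceOperand(*user, from, to);
}
```

Iterating `from.users` directly in `replaceAllUses` would erase elements from under the loop, which is the kind of mistake that leaves some users unreplaced.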
[ARM][NFC] Test that debug insns are not passed to stepBackward (#197976)
Test that debug insns are not passed to stepBackward. This testcase
shows the necessity of https://github.com/llvm/llvm-project/pull/197769.
Signed-off-by: John Lu <John.Lu at amd.com>
[Flang][OpenMP] Support iterator modifiers in map and motion clauses
Support iterated array elements and array sections in map and motion clauses for
target data, target enter data, target exit data, and target update constructs.
Preserve mapper resolution for iterated entries, including explicit mappers,
user-defined default mappers, declare mapper entries, and implicit default
mappers.
This PR is stacked on top of #197047 and #197752.
This patch is part of the feature work for #188061.
Assisted with copilot.
[lldb] Make use of ConstString(StringRef) where applicable (NFC) (#197954)
Replace uses of `ConstString(char *)`, mostly with the `StringRef` constructor.
This eliminates some unnecessary `strlen` calls and a few copies.
[flang-rt] Move template parameters to runtime to improve compile times (#197738)
Summary:
These files take an extraordinarily long time to compile. This
is due to the combinatorial explosion of template parameters. This PR
takes a few choice factors and makes them runtime components instead.
This should have no real impact on runtime with optimizations, but
brings the compilation for these from 60s each to 30s on my machine. A
future PR will split these up by component so we get more build-system
parallelism.
[-Wunsafe-buffer-usage] Warning-only analysis no longer re-analyzes pre-compiled code
Move the warning-only analysis back to the end of parsing each Decl.
The warning-only analysis no longer does any extra AST deserialization.
Pre-compiled code will only be analyzed once during its own compilation.
When `-fsafe-buffer-usage-suggestions` is used, the behavior is the
same as before, because it requires visibility of the whole
translation unit.
rdar://177185295
[lldb][test] Don't overwrite existing decorators in _skipForVariant (#197409)
_skipForVariant uses a dictionary to save the decorator functions for
specific variants. However, if the dictionary already contains a
decorator function, then that decorator is overwritten in the current
implementation.
This means that if we have two decorators that skip a variant in
combination with another check, then the second decorator overwrites the
first one and is effectively ignored.
Downstream we are hit by this because we have an embedded Swift variant.
If we skip this variant for both Linux and Windows, then depending on
the decorator order we still run the test on one of those two platforms
(as one decorator is overwritten by the other).
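The overwrite problem is independent of the test framework's language, so here is a C++ analogue of it. This is not the lldb test-suite code (which is Python), and chaining the checks is only an assumed way to keep both: `SkipTable`, `registerOverwriting`, and `registerChained` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// A skip check returns true when the test should be skipped.
using SkipCheck = std::function<bool()>;
using SkipTable = std::map<std::string, SkipCheck>;

// Overwriting registration: a later check for the same variant silently
// replaces the earlier one.
void registerOverwriting(SkipTable &table, const std::string &variant,
                         SkipCheck check) {
  table[variant] = std::move(check);
}

// Chaining registration: keep the existing check and skip when either the
// old or the new check asks for it.
void registerChained(SkipTable &table, const std::string &variant,
                     SkipCheck check) {
  auto it = table.find(variant);
  if (it == table.end()) {
    table.emplace(variant, std::move(check));
    return;
  }
  SkipCheck previous = std::move(it->second);
  it->second = [previous = std::move(previous),
                check = std::move(check)] { return previous() || check(); };
}
```

With the overwriting version, registering a "skip on Linux" check and then a "skip on Windows" check for the same variant leaves only the Windows check active, which matches the symptom described above.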
[safestack][test] Fix sigaltstack.c on Solaris (#197967)
The `SafeStack-Standalone-*:: sigaltstack.c` test `FAIL`s on Solaris. It
uses `MAP_STACK`, which is not portable and is just a no-op on glibc systems.
Therefore this patch provides a fallback definition.
Tested on `x86_64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
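A fallback of this kind usually looks something like the sketch below. The value 0 is an assumption chosen so the flag has no effect where the platform does not define it; the actual definition in the patch may differ, and `mapAltStack` is a hypothetical helper rather than code from the test.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Assumed fallback: where MAP_STACK is missing, define it as 0 so that
// OR-ing it into the mmap flags changes nothing.
#ifndef MAP_STACK
#define MAP_STACK 0
#endif

// Hypothetical helper: map an anonymous, private region that could back an
// alternate signal stack. Returns nullptr on failure.
void *mapAltStack(std::size_t size) {
  assert(size > 0);
  void *region = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  return region == MAP_FAILED ? nullptr : region;
}
```

The returned region can then be handed to `sigaltstack` and released with `munmap` once the test is done.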
[CodeView] Generate debug info for artificial subprograms (#196327)
Based on https://clang.llvm.org/docs/AttributeReference.html#artificial,
artificial subprograms are not required to have a non-zero line number
location, so don't ignore them.
Fix #195768