Handle more cases in DebugInfoFinder (#194684)
In #181028 we discovered that DebugInfoFinder is missing some cases.
This corrects several of these. It is hard to know if I found them all.
[Support] Support runtime override for LLVM_WINDOWS_PREFER_FORWARD_SLASH (#199210)
Allow overriding the compile-time LLVM_WINDOWS_PREFER_FORWARD_SLASH
setting at runtime using an environment variable of the same name.
This enables testing both path separator behaviors (forward slash vs.
backslash on Windows) using a single build, which is useful for
CI/Buildbots.
The environment variable is checked once and cached in a static variable
for performance.
Also updated relevant tests in SupportTests (Path.cpp and
CommandLineTest.cpp) to dynamically detect the preferred separator style
at runtime instead of relying on the compile-time macro, making them
compatible with the override.
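The check-once-and-cache pattern described above can be sketched roughly
as follows. This is a minimal illustration, not the code from the patch:
the helper names, the `CompileTimeDefault` constant, and the accepted
values ("1"/"0") are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the value baked in at configure time.
constexpr bool CompileTimeDefault = false;

// Assumed parsing rule (not taken from the patch): "1" enables, "0"
// disables, anything else (including an unset variable) keeps the default.
bool parseOverride(const char *Value, bool Default) {
  if (!Value)
    return Default;
  if (std::strcmp(Value, "1") == 0)
    return true;
  if (std::strcmp(Value, "0") == 0)
    return false;
  return Default;
}

bool preferForwardSlash() {
  // A function-local static is initialized once, on the first call, so the
  // environment is read a single time and later calls reuse the cached value.
  static const bool Cached = parseOverride(
      std::getenv("LLVM_WINDOWS_PREFER_FORWARD_SLASH"), CompileTimeDefault);
  return Cached;
}
```

One consequence of caching is that changing the variable after the first
query has no effect within the same process.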
[RISCV][GlobalISel] Lower i8 bitreverse using brev8 with Zbkb (#199469)
This teaches RISC-V GlobalISel to custom-lower scalar i8 G_BITREVERSE
using brev8 when Zbkb is available.
The i8 source is zero-extended to XLEN before applying the riscv_brev8
intrinsic. Since brev8 reverses bits independently within each byte, the
high zero bytes remain zero, so the result can be truncated back to i8.
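The reasoning above can be checked with a small software model. The sketch
below assumes a 64-bit XLEN and models brev8 in plain C++; the function
names are made up for the example and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Software model of brev8 for XLEN=64: reverse the bits inside each byte
// independently, leaving the byte order untouched.
uint64_t brev8(uint64_t X) {
  X = ((X & 0x5555555555555555ULL) << 1) | ((X >> 1) & 0x5555555555555555ULL);
  X = ((X & 0x3333333333333333ULL) << 2) | ((X >> 2) & 0x3333333333333333ULL);
  X = ((X & 0x0F0F0F0F0F0F0F0FULL) << 4) | ((X >> 4) & 0x0F0F0F0F0F0F0F0FULL);
  return X;
}

// i8 bitreverse as described: zero-extend, apply brev8, truncate.
uint8_t bitreverse8ViaBrev8(uint8_t V) {
  uint64_t Wide = V;                 // zero-extend to XLEN
  uint64_t Reversed = brev8(Wide);
  assert((Reversed >> 8) == 0 && "high zero bytes must stay zero");
  return static_cast<uint8_t>(Reversed); // truncate back to i8
}

// Straightforward reference implementation used for comparison.
uint8_t bitreverse8Naive(uint8_t V) {
  uint8_t R = 0;
  for (unsigned I = 0; I < 8; ++I)
    if (V & (1u << I))
      R = static_cast<uint8_t>(R | (1u << (7 - I)));
  return R;
}
```

Comparing the two functions over all 256 inputs confirms that the
truncated brev8 result matches a plain 8-bit reversal.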
[CloneModule] Clone undefined ifuncs (#197353)
To satisfy the verifier rule "IFunc resolver must be a definition", fix
ifunc handling when cloning modules.
When cloning a module, if an IFunc has no definition
(ShouldCloneDefinition returns false), directly create an external
GlobalValue (Function or GlobalVariable) instead of trying to clone the
ifunc.
Add a test case for llvm-split to verify the ifunc cloning/splitting
behavior works correctly.
[VPlan] Make TransformState::get BCast-logic robust (#197589)
The logic for inserting Broadcasts in a more optimal location in
VPTransformState::get is quite fragile, especially around scalable VFs.
Fix it, resulting in minor improvements.
[LinkerWrapper] Fix temps being dumped to CWD instead of output path (#198679)
Summary:
Offloading save temps is a complex dance where we have clang,
linker-wrapper, and lld all making their own temp files. The ones in the
linker wrapper were not respecting the output directory because we
stripped everything with filename. Just get rid of this so it uses the
output file's directory properly in this mode.
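As a rough illustration of the difference, assuming the temp name is
derived from the output path: both helpers below are hypothetical and the
`.suffix` naming scheme is invented for the example, not taken from the
linker wrapper.

```cpp
#include <cassert>
#include <string>

// Old behavior in this sketch: keep only the filename component, so the
// temp file lands in the current working directory.
std::string tempPathInCwd(const std::string &OutputFile,
                          const std::string &Suffix) {
  std::string::size_type Slash = OutputFile.find_last_of("/\\");
  std::string Name =
      Slash == std::string::npos ? OutputFile : OutputFile.substr(Slash + 1);
  return Name + "." + Suffix;
}

// New behavior in this sketch: keep the full output path, so the temp file
// is created next to the output file.
std::string tempPathBesideOutput(const std::string &OutputFile,
                                 const std::string &Suffix) {
  return OutputFile + "." + Suffix;
}
```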
[LV] Optimize partial reduction extends before handling inloop subs
The crash avoided in #194660 was caused by the extend optimizations
failing to match due to the extra sub/negation added to the
"ExtendedOp".
A similar crash exists for [us]abs partial reductions
(see https://godbolt.org/z/MerMon5rE), which is fixed with this patch.
This patch solves the underlying issue by running the extend optimizations
before any inloop sub/fsub handling.
Fixes #194000
[LV] Support partial reduce subs/fsubs without a mul operand
This allows the `UpdateR(PrevValue, ext(...))` form for fsub/sub
updates (i.e., AddWithSub or Sub reductions). For Sub reductions the
codegen/handling is identical to add reductions (with the sub handled
out of loop). For AddWithSub reductions, the sub is handled in-loop
with a NegatedExtendedReduction VP expression, which encapsulates
`reduce.[f]add(neg(ext(op)))`.
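The arithmetic behind the two strategies can be sketched with scalar code.
This only models the identities involved (an i8 input sign-extended into
an i32 accumulator is an assumption for the example); it does not reflect
the VPlan recipes themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original loop form: acc = acc - ext(x) on every iteration.
int32_t subInLoop(int32_t Init, const std::vector<int8_t> &V) {
  int32_t Acc = Init;
  for (int8_t X : V)
    Acc -= static_cast<int32_t>(X);
  return Acc;
}

// Sub reduction handled like an add reduction, with one sub after the loop.
int32_t subOutOfLoop(int32_t Init, const std::vector<int8_t> &V) {
  int32_t Sum = 0;
  for (int8_t X : V)
    Sum += static_cast<int32_t>(X);
  return Init - Sum;
}

// In-loop form: acc = acc + neg(ext(x)), the scalar analogue of
// reduce.add(neg(ext(op))).
int32_t addNegatedExtend(int32_t Init, const std::vector<int8_t> &V) {
  int32_t Acc = Init;
  for (int8_t X : V)
    Acc += -static_cast<int32_t>(X);
  return Acc;
}
```

All three functions return the same value for inputs that do not overflow
the accumulator.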
[GlobalsModRef] Don't erase while iterating
The loop erases from AllocsForIndirectGlobals while walking it, which
now hits the iterator invalidation assert in DenseMap::erase. Use
remove_if instead.
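For illustration, the same idea with a standard container. This sketch
uses std::unordered_map rather than DenseMap, and `removeIf` is a helper
written for the example; it relies on erase() returning the next valid
iterator instead of advancing an iterator that was just invalidated.

```cpp
#include <cassert>
#include <unordered_map>

// Remove every entry for which Pred returns true. The loop never touches an
// iterator after its element has been erased.
template <typename MapT, typename PredT>
void removeIf(MapT &Map, PredT Pred) {
  for (auto It = Map.begin(); It != Map.end();) {
    if (Pred(*It))
      It = Map.erase(It); // erase returns the iterator to the next element
    else
      ++It;
  }
}
```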
[ORC] Avoid iterator invalidation when erasing image info symbols
processObjCImageInfo iterated the section's DenseSet of symbols while
calling removeDefinedSymbol, which erases from that same set. Re-fetch
begin() each iteration so the iterator is always fresh.
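A minimal sketch of the re-fetch pattern, using std::unordered_set in
place of DenseSet. `removeSymbol` is a hypothetical stand-in for
removeDefinedSymbol, and the loop here drains the whole set, which is an
assumption made to keep the example small.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-in for a callee that erases from the very set the caller is walking.
void removeSymbol(std::unordered_set<std::string> &Symbols,
                  const std::string &Name,
                  std::vector<std::string> &Removed) {
  Removed.push_back(Name);
  Symbols.erase(Name);
}

std::vector<std::string> drainSymbols(std::unordered_set<std::string> &Symbols) {
  std::vector<std::string> Removed;
  while (!Symbols.empty()) {
    // Fetch begin() afresh every iteration and copy the element, since the
    // callee destroys it; no iterator is held across the erase.
    const std::string Name = *Symbols.begin();
    removeSymbol(Symbols, Name, Removed);
  }
  return Removed;
}
```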
[Clang][test] check-clang-format not created with LLVM_ENABLE_IDE (#199638)
add_lit_testsuites skips creating targets for each subdirectory when
LLVM_ENABLE_IDE is set. Only create the dependency (introduced in #199169)
when the check-clang-format target actually exists.
Fixes the LLVM build when using an IDE.
[Coroutines] Allow rematerialization of unary operators and selected intrinsics (#197698)
All of those can be cheaply recomputed when the coroutine has resumed.
Before this change, results of unary operators and intrinsics were
spilled into the coroutine frame and reloaded on resume:
```
%neg = fneg float %n
store float %neg, ptr %neg.spill.addr
; In resume:
%neg.reload = load float, ptr %neg.reload.addr
; ... use %neg.reload
```
After this change, only the operand is spilled and the operation is
rematerialized on each resume, avoiding the frame store:
[9 lines not shown]
[mlir][mem2reg] fix assert for indirect blocking uses inside regions (#199193)
When adding new blocking uses created by the interface of a previous
blocking use (typically forwarding the blocking uses to the op result
users), the mem2reg framework was assuming that the new blocking uses
are in the same region as the original blocking use, which is not true
in general and led to the assert:
`Transforms/Mem2Reg.cpp:743: void
{anonymous}::MemorySlotPromoter::removeBlockingUses(mlir::Region*):
Assertion `op->getParentRegion() == region && "all operations must still
be in the same region"' failed.`
This patch fixes this by adding the new uses into the userToBlockingUses
for the region of the new blocking uses.
[LV] Add support for partial reduction chains with fsubs. (#197114)
The cost-model prevented this from happening, but the LV would otherwise
generate incorrect code (i.e. without the fneg).
[RISCV] Remove TargetLowering arg from getContainerForFixedLengthVector. NFC (#199629)
Unless I'm missing something, we can just fetch the TLI from
RISCVSubtarget.
build: adjust LLDB and clang library naming on Windows (#185084)
Ensure that use of the GNU driver does not change the library name on
Windows. Previously, we checked whether the build tools were MSVC rather
than whether we were targeting Windows to select the output name.
(cherry picked from commit 687e66c989887542b1702a7a99eeaa4e25edd12e)
[libc] Demote compiler check error to a warning (#198033)
Summary:
This check exists to encode the policy that this is only intended to be
built with a just-built compiler. In practice it's a little too strict
and breaks pretty much every six months when the version bumps or when
people try to build a separate patch. Just demote to a warning.
(cherry picked from commit 13da33e922fe43cd97246f5e33320acc4f5ea186)
[NFC] Add null terminator assert to CodeViewRecordIO::mapStringZ (#199624)
mapStringZ assumes that there's a null terminator past the end of Value
(I suppose the name hints at this too). This doesn't seem very nice to
me, but at least we can add an assert to check that the assumption
holds.
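The assumption being asserted can be sketched as follows. `StrRef` and
`writeStringZ` are hypothetical stand-ins written for this example, not
the CodeViewRecordIO code: the view's length excludes the terminator, yet
the writer reads one byte past the end and expects it to be '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning string view: Size does not count a terminator.
struct StrRef {
  const char *Data;
  std::size_t Size;
};

// Writes the string plus its terminator into Out and returns the number of
// bytes written. Out must have room for Value.Size + 1 bytes.
std::size_t writeStringZ(StrRef Value, char *Out) {
  // The byte just past the view must be the null terminator.
  assert(Value.Data[Value.Size] == '\0' &&
         "expected null terminator past the end of Value");
  std::memcpy(Out, Value.Data, Value.Size + 1);
  return Value.Size + 1;
}
```

A view into the middle of a larger buffer (for example the first two
characters of "abc") would trip the assert instead of silently emitting a
non-terminated string.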
[LoongArch] Fix musttail with indirect arguments by forwarding incoming pointers (#198965)
When a `musttail` call passes arguments indirectly (fp128 on LA32, i128
on LA32), the backend allocates a stack temporary and hands the callee a
pointer. The tail call deallocates the caller's frame, and the pointer
dangles.
Fix by forwarding the incoming indirect pointers instead. They point to
the caller's caller's frame, which stays valid after the tail call.
Forwarded formal parameters reuse the pointer directly; computed values
get stored into the incoming buffer first.
The pointers are saved in virtual registers (`CopyToReg`/`CopyFromReg`)
rather than SDValues. The SelectionDAG is cleared between basic blocks
and musttail calls can appear in non-entry blocks, so storing raw
SDValues across BBs is unsound (this was the bug that led to the revert
in 501417baa60f). The vreg save only fires when the function has
musttail calls; other functions see no codegen change.
[2 lines not shown]
[X86] LowerBUILD_VECTORvXi1 - scalarize the bool masks if we insert a single non-const value (#199523)
Minor generalization of the existing fold for splat bool masks - if only
a single value is used in insertion(s) (as well as any immediate/undef
values), then fold to a scalar select (val, insert|immediate, immediate).
Yak shaving for #198162
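The equivalence behind the fold can be modeled with an 8-lane mask packed
into a byte. This is a scalar sketch only: the function names and the
bitmask encoding of lanes are assumptions for the example, and undef lanes
are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Lane-by-lane construction: lanes selected by InsertLanes take Val, all
// other lanes take their constant bit from ImmBits.
uint8_t buildMaskPerLane(bool Val, uint8_t InsertLanes, uint8_t ImmBits) {
  uint8_t Mask = 0;
  for (unsigned I = 0; I < 8; ++I) {
    bool Bit = ((InsertLanes >> I) & 1) ? Val : ((ImmBits >> I) & 1);
    Mask = static_cast<uint8_t>(Mask | (static_cast<unsigned>(Bit) << I));
  }
  return Mask;
}

// Scalarized form: a single select between two constants,
// select(val, insert | immediate, immediate).
uint8_t buildMaskSelect(bool Val, uint8_t InsertLanes, uint8_t ImmBits) {
  // Only the constant bits on non-inserted lanes contribute.
  uint8_t Imm = static_cast<uint8_t>(ImmBits & ~InsertLanes);
  return Val ? static_cast<uint8_t>(InsertLanes | Imm) : Imm;
}
```

Both functions agree for every combination of value, insert lanes, and
constant bits.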