[CloneModule] Clone undefined ifuncs (#197353)
To satisfy the verifier rule "IFunc resolver must be a definition", we
fix IFunc handling when cloning modules.
When cloning a module, if an IFunc has no definition
(ShouldCloneDefinition returns false), directly create an external
GlobalValue (Function or GlobalVariable) instead of trying to clone the
ifunc.
Add a test case for llvm-split to verify the ifunc cloning/splitting
behavior works correctly.
[VPlan] Make TransformState::get BCast-logic robust (#197589)
The logic for inserting Broadcasts in a more optimal location in
VPTransformState::get is quite fragile, especially around scalable VFs.
Fix it, resulting in minor improvements.
[LinkerWrapper] Fix temps being dumped to CWD instead of output path (#198679)
Summary:
Offloading save temps is a complex dance where we have clang,
linker-wrapper, and lld all making their own temp files. The ones in the
linker wrapper were not respecting the output directory because we
stripped everything with filename. Just get rid of this so it uses the
output file's directory properly in this mode.
[LV] Optimize partial reduction extends before handling inloop subs
The crash avoided in #194660 was caused by the extend optimizations
failing to match due to the extra sub/negation added to the
"ExtendedOp".
A similar crash exists for [us]abs partial reductions
(see https://godbolt.org/z/MerMon5rE), which is fixed with this patch.
This patch solves the underlying issue by running the extend optimizations
before any inloop sub/fsub handling.
Fixes #194000
[LV] Support partial reduce subs/fsubs without a mul operand
This allows the `UpdateR(PrevValue, ext(...))` form for fsub/sub
updates (i.e., AddWithSub or Sub reductions). For Sub reductions the
codegen/handling is identical to add reductions (with the sub handled
out of loop). For AddWithSub reductions, the sub is handled in-loop
with a NegatedExtendedReduction VP expression, which encapsulates
`reduce.[f]add(neg(ext(op)))`.
[GlobalsModRef] Don't erase while iterating
The loop erases from AllocsForIndirectGlobals while walking it, which
now hits the iterator invalidation assert in DenseMap::erase. Use
remove_if instead.
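The hazard and the fix can be illustrated outside LLVM. The sketch below
is not the GlobalsModRef code: it uses std::unordered_map as a stand-in
for DenseMap, and `AllocMap`, `removeIf` and `eraseAllocsFor` are
hypothetical names. It only shows the general shape of a remove_if-style
helper, which never touches an iterator after its element was erased.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for AllocsForIndirectGlobals: maps an allocation id
// to the name of the global that owns it.
using AllocMap = std::unordered_map<int, std::string>;

// Generic remove_if-style helper: erases every entry matching Pred and
// returns how many entries were removed.
template <typename MapT, typename PredT>
std::size_t removeIf(MapT &Map, PredT Pred) {
  std::size_t Removed = 0;
  for (auto It = Map.begin(); It != Map.end();) {
    if (Pred(*It)) {
      // erase() returns the iterator following the erased element, so the
      // loop never advances an invalidated iterator.
      It = Map.erase(It);
      ++Removed;
    } else {
      ++It;
    }
  }
  return Removed;
}

// Drops all allocations recorded for one global.
inline std::size_t eraseAllocsFor(AllocMap &Allocs, const std::string &Global) {
  return removeIf(Allocs, [&](const AllocMap::value_type &Entry) {
    return Entry.second == Global;
  });
}
```

Keeping the erase inside one helper means callers cannot accidentally
keep using an iterator that the erase invalidated.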
[ORC] Avoid iterator invalidation when erasing image info symbols
processObjCImageInfo iterated the section's DenseSet of symbols while
calling removeDefinedSymbol, which erases from that same set. Re-fetch
begin() each iteration so the iterator is always fresh.
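A minimal sketch of the "re-fetch begin()" pattern, assuming every symbol
in the set ends up removed. `Section`, `removeSymbol` and `drainSymbols`
are hypothetical stand-ins built on std::unordered_set, not the ORC types
or the actual processObjCImageInfo code.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a section that owns a set of symbols.
struct Section {
  std::unordered_set<int> Symbols;

  // Plays the role of removeDefinedSymbol: it erases from the very set a
  // caller might be iterating over.
  void removeSymbol(int Sym) { Symbols.erase(Sym); }
};

// Removes every symbol from the section and returns the removed symbols.
inline std::vector<int> drainSymbols(Section &Sec) {
  std::vector<int> Removed;
  while (!Sec.Symbols.empty()) {
    // Fetch a fresh iterator on each iteration and copy the element out
    // before the set is mutated; no iterator survives across the erase.
    int Sym = *Sec.Symbols.begin();
    Removed.push_back(Sym);
    Sec.removeSymbol(Sym);
  }
  return Removed;
}
```

The loop holds no iterator across the call that mutates the set, so it
stays valid even if the removal helper later erases more than one element.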
[Clang][test] check-clang-format not created with LLVM_ENABLE_IDE (#199638)
add_lit_testsuites skips creating targets for each subdirectory when
LLVM_ENABLE_IDE is enabled. Only create the dependency (introduced in
#199169) when the check-clang-format target actually exists.
Fixes the LLVM build when using an IDE.
[Coroutines] Allow rematerialization of unary operators and selected intrinsics (#197698)
All of those can be cheaply recomputed when the coroutine has resumed.
Before this change, results of unary operators and intrinsics were
spilled into the coroutine frame and reloaded on resume:
```
%neg = fneg float %n
store float %neg, ptr %neg.spill.addr
; In resume:
%neg.reload = load float, ptr %neg.reload.addr
; ... use %neg.reload
```
After this change, only the operand is spilled and the operation is
rematerialized on each resume, avoiding the frame store:
[9 lines not shown]
[mlir][mem2reg] fix assert for indirect blocking uses inside regions (#199193)
When adding new blocking uses created by the interface of a previous
blocking use (typically forwarding the blocking uses to the op result
users), the mem2reg framework was assuming that the new blocking uses
are in the same region as the original blocking use, which is not true
in general and led to the assert:
`Transforms/Mem2Reg.cpp:743: void
{anonymous}::MemorySlotPromoter::removeBlockingUses(mlir::Region*):
Assertion `op->getParentRegion() == region && "all operations must still
be in the same region"' failed.`
This patch fixes this by adding the new uses into the userToBlockingUses
for the region of the new blocking uses.
[LV] Add support for partial reduction chains with fsubs. (#197114)
The cost-model prevented this from happening, but the LV would otherwise
generate incorrect code (i.e. without the fneg).
[RISCV] Remove TargetLowering arg from getContainerForFixedLengthVector. NFC (#199629)
Unless I'm missing something, we can just fetch the TLI from
RISCVSubtarget.
build: adjust LLDB and clang library naming on Windows (#185084)
Ensure that use of the GNU driver does not change the library name on
Windows. Previously, we checked whether the build tools were MSVC,
rather than whether the target was Windows, to select the output name.
(cherry picked from commit 687e66c989887542b1702a7a99eeaa4e25edd12e)
[libc] Demote compiler check error to a warning (#198033)
Summary:
This check exists to encode the policy that this is only intended to be
built with a just-built compiler. In practice it's a little too strict
and breaks pretty much every six months when the version bumps or when
people try to build a separate patch. Just demote to a warning.
(cherry picked from commit 13da33e922fe43cd97246f5e33320acc4f5ea186)
[NFC] Add null terminator assert to CodeViewRecordIO::mapStringZ (#199624)
mapStringZ assumes that there's a null terminator past the end of Value
(I suppose the name hints at this too). This doesn't seem very nice to
me, but at least we can add an assert to check that the assumption
holds.
[LoongArch] Fix musttail with indirect arguments by forwarding incoming pointers (#198965)
When a `musttail` call passes arguments indirectly (fp128 on LA32, i128
on LA32), the backend allocates a stack temporary and hands the callee a
pointer. The tail call deallocates the caller's frame, and the pointer
dangles.
Fix by forwarding the incoming indirect pointers instead. They point to
the caller's caller's frame, which stays valid after the tail call.
Forwarded formal parameters reuse the pointer directly; computed values
get stored into the incoming buffer first.
The pointers are saved in virtual registers (`CopyToReg`/`CopyFromReg`)
rather than SDValues. The SelectionDAG is cleared between basic blocks
and musttail calls can appear in non-entry blocks, so storing raw
SDValues across BBs is unsound (this was the bug that led to the revert
in 501417baa60f). The vreg save only fires when the function has
musttail calls; other functions see no codegen change.
[2 lines not shown]
[X86] LowerBUILD_VECTORvXi1 - scalarize the bool masks if we insert a single non-const value (#199523)
Minor generalization of the existing fold for splat bool masks - if only
a single value is used in insertion(s) (as well as any immediate/undef
values), then fold to a scalar select (val, insert|immediate, immediate).
Yak shaving for #198162
[ELF] Initialize Symbol fields in the constructor instead of via memset (#198129)
`initSectionsAndLocalSyms` and `makeDefined` memset the storage to zero
and then placement-new a Symbol-derived object into it. Placement new
begins a new object's lifetime. The standard does not seem to guarantee
the memset bytes carry into members the constructor leaves
uninitialized.
lld built by GCC 16 can make Valgrind report reads of Symbol::flags
(via getSymSectionIndex during finalizeSections) as uses of
uninitialized values (ClangBuiltLinux/linux#2162).
This patch reinstates the per-field initialization that commit
778742760534 ("[ELF] Avoid redundant assignment to Symbol fields. NFC")
had replaced with a bulk memset.
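The difference can be sketched with a toy type. This is an illustration
only: `Sym`, its fields and `makeSym` are hypothetical and much simpler
than lld's Symbol hierarchy. The constructor initializes every field, so
the object's state does not depend on whatever bytes the storage held
before placement new.

```cpp
#include <cstdint>
#include <new>

// Hypothetical, simplified symbol type.
struct Sym {
  const char *Name;
  std::uint8_t Binding;
  std::uint16_t Flags;

  // Every member is initialized here. Nothing relies on the storage having
  // been zeroed (e.g. by memset) before the object's lifetime began.
  Sym(const char *N, std::uint8_t B) : Name(N), Binding(B), Flags(0) {}
};

// Constructs a Sym in caller-provided storage, which must be suitably
// sized and aligned for Sym.
inline Sym *makeSym(void *Storage, const char *Name, std::uint8_t Binding) {
  return new (Storage) Sym(Name, Binding);
}
```

With this shape, even storage deliberately filled with non-zero bytes
yields a symbol whose `Flags` reads as 0.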
(cherry picked from commit 905a88b923433eb8cd83677ea55bee82eb9ba498)
[RISCV] Fix fixed-length masked.{u,s}{div,rem} lowering not converting operands (#197913)
Similar to #197724, but this time I also somehow forgot to convert the
operands to scalable vectors. I'm surprised that nothing asserted here,
since SDT_RISCVIntBinOp_VL has a type profile constraint that the
operands and result types need to be the same.
Reland [C++20] [Modules] Don't profiling the callee of CXXFoldExpr (#190732) (#195983)
Close https://github.com/llvm/llvm-project/issues/190333
For the test case, the root cause of the problem is that the compiler
thought the declaration of `operator &&` in consumer.cpp may change the
meaning of '&&' in the requires clause of `F::operator()`. But that
doesn't make sense. Here we skip profiling the callee to solve the
problem. Note that we've already recorded the kind of the operator, so
'&&' and '||' won't be confused.
---
See the discussion in https://github.com/llvm/llvm-project/pull/194283
For the newly found pattern, where we may have another binary operator
(e.g., operator +) in the requires clause, e.g.,
```C++
[8 lines not shown]
[Clang][Sema] Fix crash in __builtin_dump_struct with immediate callables (#192880)
## Motivation
`ComplexRemove` (used by `Sema::PopExpressionEvaluationContext` to strip
nested `ConstantExpr` wrappers) inherits the default
`TreeTransform::TransformOpaqueValueExpr`, which asserts on any
`OpaqueValueExpr` with a non-null `SourceExpr` unless a binding has
already been set up.
`__builtin_dump_struct` binds the record pointer to an `OpaqueValueExpr`
inside a `PseudoObjectExpr`. When the callable argument is
immediate-escalated (e.g. via `__builtin_is_within_lifetime`),
`RemoveNestedImmediateInvocation` roots `ComplexRemove` inside the PSE's
semantic form, reaching that OVE without the binding the assert expects
- triggering a crash.
## Closing Issues
[6 lines not shown]
[CoroSplit] Never collect allocas used by catchpad into frame (#186728)
Windows EH requires exception objects to be allocated on the stack. But there is
no reliable way to identify them. CoroSplit employs a best-effort
algorithm to determine whether allocas persist on the stack or the
frame, which may result in miscompilation when Windows exceptions are
used.
This patch proposes that we treat allocas used by catchpad as exception
objects and never place them on the frame. A verifier check is added to
enforce that operands of catchpad are either constants or allocas.
Close #143235 Close #153949 Close #182584
[VPlan] Fold canonical IV recipe creation into createLoopRegion. (#198383)
Remove the separate addCanonicalIVRecipes transform and create the
canonical IV's increment and the latch's exiting branch directly in
createLoopRegion, using the loop region's VPRegionValue for the
canonical IV. The temporary VPPhi placeholder previously inserted in the
header is no longer needed.
PR: https://github.com/llvm/llvm-project/pull/198383
[Clang][AMDGPU] Add ``amdgcn_av("none")`` attribute for atomic expressions
Add a statement attribute that suppresses MakeAvailable/MakeVisible
cache operations on AMDGPU atomic instructions while preserving memory
ordering (waits).
The attribute takes a string argument specifying the mode. Currently
"none" is the only supported mode. The resulting atomic or fence
instruction carries !mmra !{!"amdgcn-av", !"none"} metadata.
Assisted-By: Claude Opus 4.6