This new flag depends both on the compiler version AND the linker (#188864)
version, and clang will say it supports the flag even if the linker
can't use its output.
The compiler actually has code to know whether the linker version is
right, and sets it to the default if the stars align.
So I'm going to just test whether whatever is the default method for the
compiler and linker works.
[lld-macho] Make safe ICF conservative without __llvm_addrsig (#188400)
MachO --icf=safe and --icf=safe_thunks used to keep folding code from
object files that did not contain __llvm_addrsig, which was inconsistent
with the conservative ELF/COFF behavior. Mark all symbols in such
objects as address-significant instead, and add regression coverage for
both safe ICF modes with and without addrsig.
Revert "[VPlan] Explicitly unroll replicate-regions without live-outs by VF." (#188868)
Reverts llvm/llvm-project#170212
This appears to cause a failure with expensive checks:
https://lab.llvm.org/buildbot/#/builders/187/builds/18306
[AArch64][llvm] Separate TLBI-only feature gating from TLBIP aliases
Refactor the TLBI system operand definitions so that TLBI and TLBIP
records are emitted through separate helper multiclasses, whilst keeping
the table layout readable.
The feature-scoped wrappers now apply FeatureTLB_RMI, FeatureRME, and
FeatureTLBIW only to TLBI records (previously they were also applied,
incorrectly, to TLBIP instructions), while TLBIP aliases remain gated
only by FeatureD128, including their nXS forms.
Update testcases accordingly.
[AArch64][llvm] Rewrite the TLBI multiclass to be much clearer (NFC)
The `tlbi` multiclass is really doing four jobs at once: base TLBI,
synthesized nXS, optional TLBIP, and synthesized TLBIP nXS. Also,
`needsreg` and `optreg` are really just a 3-state operand policy in
disguise. Likewise, the PLBI multiclass has this same issue.
Change `needsreg` and `optreg` into a combined fake enum, so it's
clearer whether the instruction takes no register operand, a required
register operand or an optional register operand.
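
As a rough illustration of the idea, and not the TableGen code itself, a
single three-valued policy names each legal state directly instead of
encoding it in two booleans. This is a minimal Python sketch; the enum
and function names are hypothetical and do not appear in the patch:

```python
from enum import Enum


class RegOperandPolicy(Enum):
    """Hypothetical names for the three register-operand states."""
    NO_REGISTER = "none"
    REQUIRED_REGISTER = "required"
    OPTIONAL_REGISTER = "optional"


def accepts(policy: RegOperandPolicy, has_register: bool) -> bool:
    """Return True if this policy allows the instruction to be written
    with (has_register=True) or without (has_register=False) a register."""
    if policy is RegOperandPolicy.NO_REGISTER:
        return not has_register
    if policy is RegOperandPolicy.REQUIRED_REGISTER:
        return has_register
    # OPTIONAL_REGISTER accepts both forms.
    return True
```

With one value per state, a reader does not have to work out which
combinations of two flags are meaningful.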
This improves on my original change 66e8270e8.
[UnsafeBufferUsage] Check for uninstantiated default arguments to prevent crash. (#188817)
Fix a crash introduced by
https://github.com/llvm/llvm-project/pull/184899
The -Wunsafe-buffer-usage analysis was crashing when it encountered a
template function with a default argument that hadn't been instantiated
yet. This occurred in populateStmtsForFindingGadgets when it attempted
to access the default argument of each parameter.
This fix adds a check to ensure the default argument is instantiated
before attempting to access it.
Assisted-by: Gemini
[lldb] Enable caching for BytecodeSyntheticChildren::FrontEnd::Update (#181199)
Update `BytecodeSyntheticChildren` to support `ChildCacheState` return
values from `@update` implementations.
[lldb][bytecode] Change compiler to require update return type decl (#188637)
To better ensure that bytecode `@update` implementations return a 0/1
value (see https://github.com/llvm/llvm-project/pull/181199), this
changes the Python -> formatter bytecode compiler to require that Python
`update` methods be declared to return `bool`.
A declaration like this will be a compiler error:
```py
def update(self):
    # implementation...
```
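By contrast, a declaration along these lines should satisfy the
requirement that `update` be declared to return `bool`. This is a
minimal sketch: the class name and the returned value are hypothetical
and only illustrate the annotation.

```python
class ExampleSynthetic:
    """Hypothetical synthetic-children formatter, for illustration only."""

    def update(self) -> bool:
        # The return annotation is the point of this example. The value
        # returned here is a placeholder, not a recommendation.
        return False
```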
[AMDGPU] Do not overlap dst with srcs for v_cvt_scalef32_2xpk16_fp6/bf6_f32 (#188809)
v_cvt_scalef32_2xpk16_fp6_f32 and v_cvt_scalef32_2xpk16_bf6_f32 are
multipass instructions, so the destination operand must not overlap with
any of the source operands.
This change applies Constraints = "@earlyclobber $vdst" to these two instructions.
Fixes: LCCOMPILER-561
[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212)
This patch adds a new replicateReplicateRegionsByVF transform to
unroll replicate-regions by VF, dissolving them. The transform creates
VF copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.
The initial version skips regions with live-outs (VPPredInstPHIRecipe),
which will be added in follow-up patches.
Depends on https://github.com/llvm/llvm-project/pull/170053
PR: https://github.com/llvm/llvm-project/pull/170212
[CUDA] Use SetVector for CUDADeviceVarODRUsedByHost for determinism (#188616)
This replaces DenseSet with SetVector to avoid non-deterministic
iteration order when emitting device variables ODR-used by host.
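
The C++ containers are not reproduced here. As a loose Python analogy of
the same idea, the sketch below keeps unique items in insertion order so
that iteration does not depend on hashing. The class and variable names
are hypothetical:

```python
class OrderedUniqueList:
    """Hypothetical insertion-ordered set, loosely analogous to a SetVector."""

    def __init__(self):
        # dict keys preserve insertion order (guaranteed since Python 3.7).
        self._items = {}

    def insert(self, item) -> bool:
        """Add item if absent; return True when it was newly inserted."""
        if item in self._items:
            return False
        self._items[item] = None
        return True

    def __iter__(self):
        return iter(self._items)


device_vars = OrderedUniqueList()
for name in ["var_b", "var_a", "var_b", "var_c"]:
    device_vars.insert(name)

# Iteration follows first-insertion order, with duplicates dropped.
emitted = list(device_vars)
```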
[mlir] Bump SmallVector sizes along hot paths (#188827)
This is based on empirical data from compiling 9 medium to large
language and diffusion models with IREE. End to end, this improves compilation
times by 0.33% in terms of `instructions:u` (the same metric is used by the
[CTMark for
Clang](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#compile-time-improvements)).
I explored using other constants and these are the ones that performed
best while keeping the sizes relatively small.
[libc] Fix check-libc-lit running tests during build (#188081)
Updated check-libc-lit to depend only on build-only targets. Added
libc-integration-tests-build to track integration test executables and
updated LLVMLibCTestRules.cmake to populate it.
Removed incorrect dependencies on execution suites in include and
integration tests that were introduced in #184366.
[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135)
When not folding the tail, the minimum iteration count check ensures that
the vector loop is not executed if computing the trip count wraps around
to zero, as the trip count must be at least VF when vectorizing without
tail-folding.
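
The wraparound case can be sketched with a little arithmetic. This
assumes an 8-bit counter and a trip count computed as the
backedge-taken count plus one; the bit width, the VF of 4, and the
helper names are illustrative and not taken from the patch:

```python
BITS = 8                    # assumed counter width, for illustration
MASK = (1 << BITS) - 1


def trip_count(backedge_taken_count: int) -> int:
    """Trip count as an unsigned BITS-wide value; may wrap to zero."""
    return (backedge_taken_count + 1) & MASK


def runs_vector_loop(tc: int, vf: int) -> bool:
    """Minimum iteration count check: enter the vector loop only
    when the (unsigned) trip count is at least VF."""
    return tc >= vf


VF = 4
wrapped = trip_count(MASK)  # 255 + 1 wraps to 0 in 8 bits
normal = trip_count(9)      # 10 iterations, no wrap
```

In the wrapped case the computed trip count is below VF, so the check
sends execution past the vector loop.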
Add and use a new tryToRefineConstantMaxTripCount helper. This ensures
we do not create dead main loops when vectorizing the epilogue, as we
choose smaller main VFs.
PR: https://github.com/llvm/llvm-project/pull/188135
[X86] Remove custom widening legalization of vector udiv/sdiv/urem/srem. (#188786)
This custom legalization was preserving splat values in widened
build_vector to allow the div by constant optimization to work.
We now allow division by constant optimization on narrow vector types
before type legalization so we no longer need this.