[libc] Move function argument from rpc::dispatch to template (#194953)
Summary:
This was previously put here for ergonomics, since putting it in the template
required decltype. However, this has the effect of putting an actual
function pointer in an escaping context if it is not fully removed or
inlined. C++17 has `auto` non-type template parameters that we can use to
keep the interface clean. Use those instead.
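The difference between the interfaces can be sketched roughly as below. This is a minimal illustration, not the libc RPC code: `add_one`, `dispatch_arg`, `dispatch_typed`, and `dispatch_tmpl` are hypothetical names invented for the example.

```cpp
#include <cstdint>

// Hypothetical handler standing in for whatever function gets dispatched.
inline std::uint32_t add_one(std::uint32_t x) { return x + 1; }

// Function-argument form: the callee arrives as a runtime function pointer.
// Unless the call is fully inlined, the pointer value has to exist at runtime.
inline std::uint32_t dispatch_arg(std::uint32_t (*fn)(std::uint32_t),
                                  std::uint32_t v) {
  return fn(v);
}

// Pre-C++17 template form: the parameter's type must be spelled out, so the
// call site needs decltype: dispatch_typed<decltype(&add_one), &add_one>(v).
template <typename F, F Fn> std::uint32_t dispatch_typed(std::uint32_t v) {
  return Fn(v);
}

// C++17 form: `auto` deduces the type of the non-type template parameter,
// so the call site is just dispatch_tmpl<&add_one>(v).
template <auto Fn> std::uint32_t dispatch_tmpl(std::uint32_t v) {
  return Fn(v);
}
```

In the template forms the callee is a compile-time constant of the instantiation, so no function pointer needs to be passed around as a value.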
[X86] Add basic ISD::VECREDUCE_AND/OR/XOR handling (#195063)
Custom lower ISD::VECREDUCE_AND/OR/XOR using vector logic ops.
Handling of any_of/all_of/parity patterns will happen later, once we start dismantling combinePredicateReduction().
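As a scalar model of reducing with vector logic ops, the sketch below repeatedly combines the upper half of a vector with the lower half until one element remains. It only illustrates the general halving idea; the function name `reduce_by_halving` and the fixed 8 x 32-bit shape are assumptions, and the actual X86 lowering may order its steps differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

// Reduce 8 lanes to 1 by combining the upper half into the lower half at
// each step (8 -> 4 -> 2 -> 1). Each inner loop models one vector-wide
// AND/OR/XOR on a progressively narrower vector.
template <typename Op>
std::uint32_t reduce_by_halving(std::array<std::uint32_t, 8> v, Op op) {
  for (std::size_t width = v.size() / 2; width >= 1; width /= 2)
    for (std::size_t i = 0; i < width; ++i)
      v[i] = op(v[i], v[i + width]);
  return v[0];
}
```

Because AND, OR, and XOR are associative and commutative, the order in which lanes are combined does not affect the result, which is what makes the halving scheme valid.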
[VPlanSLP] Strip stub (#192635)
VPlanSLP hasn't seen much progress since it was checked in 7 years ago,
and it is unclear if there ever will be any progress. Strip it from the
tree to avoid confusion.
[GlobalISel][KnownBits] Use KnownBits::urem for G_UREM (#193455)
This updates the implementation of G_UREM in GlobalISel to use
KnownBits::urem instead of reimplementing the logic.
Supersedes #189087.
[MLIR][NVVM] SpecialRegister&PureSpecialRegister takes result type (#195030)
Use concrete `I32` (default) and `I64` (clock64, globaltimer) instead of
generic `LLVM_Type` for special-register op results. The dialect
verifier now rejects mismatches up-front, and the Python op-binding
generator emits the inferred-result form, so callers can write
`nvvm.ThreadIdXOp()` with no arguments. Strict tightening: no valid
existing IR is rejected.
[LV] Re-generate check lines with UTC version 6. (NFC) (#195061)
The checks in the re-generated files match chains of if.pred blocks, whose
names are prone to renumbering. Re-generate with version 6 to avoid
unnecessary test changes due to renumbering.
[SPIR-V] Recover aggregate type for stores of undef/composite constants (#195003)
preprocessUndefs/preprocessCompositeConstants lower aggregate values to
spv_undef/spv_const_composite calls returning i32, stashing the original
type in AggrConstTypes
[AArch64] ConditionOptimizer: replace intra-block scan with map-based algorithm (#190455)
The previous condopt implementation found the first two CSINC
instructions in a block and attempted one optimisation, ignoring other
possible pairs. It also performed extra forward and backward walks.
Replace the two-CSINC scan with a single forward walk maintaining a
DenseMap keyed by canonical (copy-traced) register. Any number of pairs
per block are now handled.
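The single-walk idea can be sketched as follows. This is a simplified model under stated assumptions, not the pass itself: `Inst`, `canonical`, and `findPairs` are hypothetical names, registers are plain integers, the copy map is assumed acyclic, and std::unordered_map stands in for DenseMap.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instruction: an id, the register it tests,
// and whether it is a candidate for pairing.
struct Inst {
  int id;
  int reg;
  bool isCandidate;
};

// Follow copies back to the original register (assumes no copy cycles).
inline int canonical(int reg, const std::unordered_map<int, int> &copyOf) {
  for (auto it = copyOf.find(reg); it != copyOf.end(); it = copyOf.find(reg))
    reg = it->second;
  return reg;
}

// One forward walk: remember the latest unpaired candidate per canonical
// register, and form a pair when a second candidate on that register shows up.
inline std::vector<std::pair<int, int>>
findPairs(const std::vector<Inst> &block,
          const std::unordered_map<int, int> &copyOf) {
  std::unordered_map<int, int> pending; // canonical reg -> candidate id
  std::vector<std::pair<int, int>> pairs;
  for (const Inst &I : block) {
    if (!I.isCandidate)
      continue;
    int key = canonical(I.reg, copyOf);
    auto it = pending.find(key);
    if (it == pending.end()) {
      pending.emplace(key, I.id);
    } else {
      pairs.emplace_back(it->second, I.id);
      pending.erase(it);
    }
  }
  return pairs;
}
```

Keying the map by the copy-traced register is what lets two candidates that read the same value through different copies be matched, and keeping per-key state is what allows more than one pair per block.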
[clang][unittests] Fix flaky PerformPendingInstantiations nesting in TimeProfilerTest (#193717)
buildTraceGraph already compensates for timer rounding that makes
PerformPendingInstantiations appear to be inside the previous event, but
only when it is nested exactly one level deep. The aarch64-darwin
buildbot produced three-level nesting for ConstantEvaluationC99, which
slipped through the normalization and broke the expected trace output.
Keep popping while PerformPendingInstantiations looks nested—we know it
is always a top-level event in these tests—instead of stopping at the
single-level case.
Followup to https://github.com/llvm/llvm-project/pull/138613.
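A rough model of the normalization change, assuming events sorted by start time and a stack of currently open events. `Event`, `nestingDepths`, and the `popAllLevels` switch are hypothetical and exist only to contrast the single-level behavior with the new one; the real buildTraceGraph is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Event {
  std::string name;
  long start;
  long end;
};

// Returns the nesting depth assigned to each event.
inline std::vector<std::size_t> nestingDepths(const std::vector<Event> &events,
                                              bool popAllLevels) {
  std::vector<const Event *> open; // innermost open event is last
  std::vector<std::size_t> depths;
  for (const Event &e : events) {
    // Ordinary rule: close events that ended strictly before this one starts.
    while (!open.empty() && open.back()->end < e.start)
      open.pop_back();
    if (e.name == "PerformPendingInstantiations") {
      if (popAllLevels)
        open.clear();              // new behavior: force it to top level
      else if (open.size() == 1)
        open.pop_back();           // old behavior: only one level handled
    }
    depths.push_back(open.size());
    open.push_back(&e);
  }
  return depths;
}
```

With three events that all end at the same rounded timestamp at which PerformPendingInstantiations starts, the single-level rule leaves it at depth 3, while popping every level places it at depth 0.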
[LV][NFC] Remove unused -simplifycfg-*** option from tests (#195044)
The -simplifycfg-require-and-preserve-domtree=1 option used in two tests
had no effect.
[LoopInterchange] Fix handling of PHI which refers to another PHI (#194364)
In the transformation phase, at first LoopInterchange moves several
instructions in the inner loop into the new latch block. The
instructions used as incoming values to the induction variables from the
latch block are the targets of this movement. Previously, this process
could result in an infinite loop when a PHI node refers to another PHI
node, as in the following example:
```
%i = phi i64 [ 0, %entry ], [ %i.inc, %latch ]
%j = phi i64 [ 0, %entry ], [ %i, %latch ]
```
The root cause was that `%i` was enqueued for processing because it is used
by `%j`.
This patch fixes the issue by preventing induction variables from being
enqueued into the movement list.
Fix #193733
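The filtering step can be modeled as below. This is a toy sketch on value names, not the LoopInterchange code: `Phi` and `collectToMove` are hypothetical, and each PHI is reduced to its name and its incoming value from the latch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a header PHI: its name and its latch incoming value.
struct Phi {
  std::string name;
  std::string latchIncoming;
};

// Collect the values to move into the new latch, skipping any incoming
// value that is itself one of the induction PHIs.
inline std::vector<std::string>
collectToMove(const std::vector<Phi> &inductions) {
  std::set<std::string> inductionNames;
  for (const Phi &p : inductions)
    inductionNames.insert(p.name);

  std::vector<std::string> toMove;
  for (const Phi &p : inductions)
    if (inductionNames.count(p.latchIncoming) == 0)
      toMove.push_back(p.latchIncoming);
  return toMove;
}
```

For the example above, the PHIs `%i` and `%j` yield a movement list containing only `%i.inc`; `%i` is skipped even though `%j` uses it.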
[JITLink][COFF] Move GetImageBaseSymbol utility into public header. (#195041)
This utility may be useful for people writing
LinkGraphLinkingLayer::Plugins for COFF LinkGraphs, so this commit moves
it to a public header where it can easily be reused
(llvm/ExecutionEngine/JITLink/COFF.h).
Also adds unit tests for the utility.
[AMDGPU][NFC] Use LaneMaskConstants for waterfall loops in AMDGPURegBankLegalizeHelper (#190792)
Use `LaneMaskConstants` for generating waterfall loops in
`AMDGPURegBankLegalizeHelper`.
No Functionality Change.
[AMDGPU] Extend max3/min3 tree-reduction combine to cover ternary chains (#194845)
The tree-reduction combine for min/max currently triggers on shapes where
both children of a node are same-opcode. This patch extends it to also
recognize cases where only one child is same-opcode and one-use, such as
max(max(a, b), c) feeding another max.
For example, with R = max(max(A, B), C) where A, B, and C are each
ternary chains of the form max(max(x, y), z), the current predicate does
not recognize the ternary-chain interiors as still combinable, so the
higher-level rules fire eagerly and produce max3 nodes with 2-op maxes
inside them. With the extended predicate, each ternary chain is allowed
to fold into a max3 first, after which the higher levels reduce cleanly
without leaving stranded 2-op maxes behind.
Adds six regression tests covering a 2-level ternary chain, a mixed
ternary+binary shape and vector examples.
Fix: LCOMPILER-2166
[AMDGPU] Refactor setreg handling in the VGPR MSB lowering
The lowering can skip inserting S_SET_VGPR_MSB if we set the mode via
piggybacking. We are now relying on the HW bug for correct
behavior. If/when the bug is fixed, the lowering will be incorrect.
SETREG is not a piggybacking target anymore. Instead piggybacking is
disabled if we have seen a SETREG since the last mode change.
[LifetimeSafety] Add placement new support (#194030)
Allows flow from placement new closely resembling standard library form.
Comes as part of the completion of #164963.
[LoongArch] Add patterns for vector bitwise selection (#193753)
Add instruction selection patterns for VBITSEL_V/XVBITSEL_V and
VBITSELI_B/XVBITSELI_B to match the canonical bitwise select idiom:
`(a & b) | (~a & c)`
This enables the backend to generate dedicated bitwise select
instructions instead of separate AND/ANDN/OR sequences.
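For reference, the idiom itself can be written as a small scalar function. `bitsel` is a hypothetical name, and how `a`, `b`, and `c` map onto the instruction operands is not specified here.

```cpp
#include <cstdint>

// Bitwise select: for each bit position, take the bit from b where the mask
// a is 1, and from c where the mask a is 0.
constexpr std::uint64_t bitsel(std::uint64_t a, std::uint64_t b,
                               std::uint64_t c) {
  return (a & b) | (~a & c);
}

// Mask 0xFF00 takes the high byte from 0x1234 and the low byte from 0xABCD.
static_assert(bitsel(0xFF00, 0x1234, 0xABCD) == 0x12CD);
```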
[libc++] Refactor std::print to allow for constant folding of the format part (#185459)
```
---------------------------------------------------------
Benchmark                        old          new
---------------------------------------------------------
std::print("Hello, World!")      43.6 ns      9.88 ns
```