[SLP] Apply reused-scalar reduction counters at the vectorized lane
The horizontal reduction reuse-counter scale was placed by deduplicated
candidate order, but the emitted reduction vector lane order is defined by
the root node, which may be reordered or split (SplitVectorize). As a
result a repeat count could be applied to the wrong lane, producing a wrong
reduction result. Place each counter at the lane the matching candidate is
vectorized to.
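
As a rough illustration of the idea (this is not the SLPVectorizer code), the sketch below places each repeat count at the lane its candidate is vectorized to. The names `Candidate`, `LaneOf` and `buildScaleByLane` are invented for this example, and the candidate-to-lane mapping is assumed to be given.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each deduplicated reduction candidate carries the
// number of times the original scalar was repeated.
struct Candidate {
  int Id;          // stand-in for the scalar value
  unsigned Repeat; // how many times it appeared in the reduction
};

// LaneOf[I] is the lane that Cands[I] ends up in after the root node was
// reordered. Returns one scale factor per lane.
std::vector<unsigned> buildScaleByLane(const std::vector<Candidate> &Cands,
                                       const std::vector<std::size_t> &LaneOf) {
  assert(Cands.size() == LaneOf.size() && "one lane per candidate");
  std::vector<unsigned> Scale(Cands.size(), 1);
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    assert(LaneOf[I] < Scale.size() && "lane out of range");
    // Place the counter by lane, not by deduplicated candidate order.
    Scale[LaneOf[I]] = Cands[I].Repeat;
  }
  return Scale;
}
```

With candidates in order (x3, x1, x2) and lanes (2, 0, 1), the counter 3 lands in lane 2 rather than lane 0, which is the placement the commit describes.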
Fixes #206476
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/206611
[AMDGPU] Autogen checks for tests in AMDGPU Cost Model. NFC (#206595)
Even though there are comments in the test files saying checks
are autogenerated, it seems some checks are not actually updated.
This work autogenerates checks based on the latest llvm source.
RAIDZ: Fix parity regeneration/check condition
Profiling RAIDZ1/dRAID1 resilver, I noticed that it calculates
the parity twice for most blocks: first to reconstruct the data
column and then to "verify" the parity column. At the same time, it
is obvious that parity generated from data reconstructed from the
parity will be identical to the original. The code even had this
condition, but it was overridden by the ZIO_FLAG_RESILVER check.
I think the ZIO_FLAG_RESILVER condition is not right. Instead we
should check for parity_errors > 0, i.e. when we failed to read some
parity columns that we'll need to rewrite. It should not matter
whether we are resilvering or just self-healing on a regular read.
Profiling shows this saving ~16% of ZFS CPU time when resilvering
RAIDZ1. RAIDZ2+ are out of luck, unless two disks are replaced at
the same time, since there is still a parity to verify.
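
The claim that regenerated parity matches the original can be checked with a small model. This is only a sketch: it assumes plain XOR single parity over equally sized columns, and the helper names `xorParity` and `reconstruct` are made up for illustration, not taken from the ZFS sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<std::uint8_t>;

// XOR all columns together; this models single parity only.
Column xorParity(const std::vector<Column> &Cols) {
  Column P(Cols.empty() ? 0 : Cols.front().size(), 0);
  for (const Column &C : Cols)
    for (std::size_t I = 0; I < P.size(); ++I)
      P[I] ^= C[I];
  return P;
}

// Rebuild the data column at index Missing from the surviving data columns
// and the parity column. The stale contents of Data[Missing] are ignored.
Column reconstruct(const std::vector<Column> &Data, std::size_t Missing,
                   const Column &Parity) {
  Column R = Parity;
  for (std::size_t C = 0; C < Data.size(); ++C) {
    if (C == Missing)
      continue;
    for (std::size_t I = 0; I < R.size(); ++I)
      R[I] ^= Data[C][I];
  }
  return R;
}
```

In this model, writing the reconstructed column back and calling `xorParity` again yields the parity that was used for the reconstruction, so the second pass adds no information when that parity column was read successfully.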
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin at TrueNAS.com>
Closes #18707
[llvm][clang] Remove `format_object_base` forward declarations (#206526)
PR https://github.com/llvm/llvm-project/pull/206319 removed the
`format_object_base` class itself, but not some of its
forward-declarations. NFCI
[clang][docs] Refactor compiler standard references from c94 to C95 (#206403)
The patch changes references from a nonexistent c94 standard
to C95 (C90 + AMD1).
Closes #206389
[SSAF][Extractor][Do not merge] Extract operator new/delete overload entities that shall retain their types
This commit creates an extractor for operator new/delete overloads.
Overloads of operator new shall retain their void* return type,
regardless of whether they are propagated by unsafe buffers. The same
applies to the parameters of operator delete overloads.
Therefore, clang-reforge eventually needs this information.
rdar://179151541
sysutils/spiped: Clean up UNIX sockets
When a TCP socket is closed, it becomes possible to create a new
socket listening on the same address; the behaviour of UNIX (aka
"local") sockets is different, in that an inode remains even after
it is closed, and blocks the creation of a new socket with the same
address.
When spiped is launched with a UNIX socket as its source address,
delete any existing socket with that address first. This makes it
possible to "service spiped restart" when UNIX sockets are used.
Deleting the socket when stopping spiped would also work for the
case of restarting the daemon, but not for the case of starting the
daemon after an unclean system shutdown; so deleting only prior to
starting the daemon seemed like the better option.
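
For illustration only, the "delete before starting" step could look like the POSIX sketch below. How the port actually performs the cleanup is not shown in this message; the function name `removeStaleSocket` is hypothetical, and the check that the path really is a socket is an extra safety assumption of this example.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Remove Path if it is a leftover UNIX socket. Returns true when the
// address is free afterwards (it was absent or was removed), false when
// the path exists but is not a socket or could not be removed.
bool removeStaleSocket(const std::string &Path) {
  struct stat St;
  if (lstat(Path.c_str(), &St) != 0)
    return errno == ENOENT; // nothing to clean up
  if (!S_ISSOCK(St.st_mode))
    return false; // refuse to delete something that is not a socket
  return unlink(Path.c_str()) == 0;
}
```

Running this before binding covers both the restart case and the unclean-shutdown case, which is why cleanup at start is preferred over cleanup at stop.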
PR: 295432
Reported by: feld
[VPlan] Remove unused InductionDescriptor VPDerivedIVRecipe constructor (#206583)
Both callers use the 5-argument (Kind, FPBinOp, ...) constructor; the
delegating InductionDescriptor overload has no users.
[flang][OpenMP] Add explicit return type to visitor lambdas (#206588)
This should silence MSVC (14.51.36231) error:
error C2338: static assertion failed: 'visit() requires the result of
all potential invocations to have the same type and value category
(N4950 [variant.visit]/5).'
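
A minimal standalone example of the pattern, assuming C++17 and using invented types (`Name`, `Index`) rather than the flang ones: without the trailing return types the two lambdas would return `std::nullopt_t` and `int`, which `std::visit` rejects.

```cpp
#include <optional>
#include <string>
#include <variant>

struct Name { std::string Text; };
struct Index { int Value; };

// Common helper that merges several lambdas into one overload set.
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

std::optional<int> indexOf(const std::variant<Name, Index> &V) {
  return std::visit(
      Overloaded{
          // The explicit "-> std::optional<int>" gives every alternative
          // the same result type and value category.
          [](const Name &) -> std::optional<int> { return std::nullopt; },
          [](const Index &I) -> std::optional<int> { return I.Value; },
      },
      V);
}
```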
e.g. https://lab.llvm.org/buildbot/#/builders/166/builds/9664
[lldb] Add a BugReporter plugin type and "diagnostics report" (#206578)
Introduce a BugReporter plugin kind that files an assembled
Diagnostics::Report through a pluggable destination, plus a "diagnostics
report" command (aliased "bugreport") that collects the bundle and files
it through the first registered reporter.
CreateBugReporterInstance() returns the first registered reporter, so a
reporter registered earlier wins and a downstream tree can take over by
registering ahead of the built-ins. BugReporterNone is the
always-registered, last-in-order fallback. Its File() returns an error
pointing at LLDB_BUG_REPORT_URL, so the command surfaces "no tracker
configured" through the normal error path instead of special-casing it.
"diagnostics report" writes the bundle, prints a review warning, and
files it unless --no-open is given. The upcoming GitHub reporter, gated
by a CMake option, is the first real destination.
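
The "first registered reporter wins" rule can be sketched as below. This is a simplified stand-in, not the LLDB plugin API: `Reporter`, `NoneReporter`, `EchoReporter` and `Registry` are hypothetical names, and the error text is a placeholder.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Reporter {
  virtual ~Reporter() = default;
  // Returns an empty string on success, otherwise an error message.
  virtual std::string file(const std::string &BundlePath) = 0;
};

// Fallback that always fails, so callers see the error on the normal path.
struct NoneReporter : Reporter {
  std::string file(const std::string &) override {
    return "no tracker configured";
  }
};

// Stand-in for a real destination; always succeeds.
struct EchoReporter : Reporter {
  std::string file(const std::string &) override { return ""; }
};

class Registry {
  std::vector<std::function<std::unique_ptr<Reporter>()>> Creators;

public:
  void add(std::function<std::unique_ptr<Reporter>()> C) {
    Creators.push_back(std::move(C));
  }
  // The earliest registration wins; later ones are never consulted.
  std::unique_ptr<Reporter> createFirst() const {
    if (Creators.empty())
      return nullptr;
    return Creators.front()();
  }
};
```

Registering the fallback last means it is only reached when nothing else was registered ahead of it, which matches the ordering described above.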
[VPlan] Pass CostCtx to makeMemOpWideningDecisions (NFC). (#206580)
makeMemOpWideningDecisions already uses 2 members (PSE, L) and will need
more in the future. Directly pass CostCtx.
AMDGPU: Migrate unittests to subarch triples
Replace specifying a processor name with the triple
subarch.
The register-limit helpers in AMDGPUUnitTests.cpp that enumerate every
valid CPU via fillValidArchListAMDGCN still pass the CPU explicitly, as
does the MC Disassembler smoke test (its C disassembler API derives the
subtarget from the CPU, not the triple subarch).
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
clang: Start using new amdgpu subarch triples
Fixup invocations using --target=amdgcn + -mcpu to introduce
the subarch in the triple.
For offload toolchains, a single toolchain is constructed for the
top level amdgpu architecture, and the effective triple is used for
target specific tool invocations.
The specifics of the resource directory layout are TBD. This does
try to find resources in the subarch-named directory. The paths
are searched at toolchain creation time, so that does not work
when there are multiple subarches.
Fixes #154925
clang/AMDGPU: Stop passing redundant -target-cpu to cc1
Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. This avoids polluting the resultant IR with
unwanted "target-cpu" attributes. The net result is the desired codegen
when compiling libraries for a major subarch and linking them into a
program compiled for a specific arch. For example, compiling for
"gfx9-generic" would previously pollute the IR with
"target-cpu"="gfx9-generic", so codegen would ultimately be performed
for the generic target even after linking into the concrete gfx9 cpu.
The specialization will now be achieved by merging the triples without
the linker or optimization passes needing to fix up function attributes.
clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch
Restrict the reported list of valid target-cpus based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the
triple is split between the driver and here. The driver-based diagnostic
seems to be an amdgpu-ism in 2 different places (though there is one arm
validation emitting the same diagnostic). In the future we could probably
drop those.