[SLP] Apply reused-scalar reduction counters at the vectorized lane
The horizontal reduction reuse-counter scale was placed by deduplicated
candidate order, but the emitted reduction vector lane order is defined by
the root node, which may be reordered or split (SplitVectorize). As a
result a repeat count could be applied to the wrong lane, producing a wrong
reduction result. Place each counter at the lane the matching candidate is
vectorized to.
Fixes #206476
Pull Request: https://github.com/llvm/llvm-project/pull/206611
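The lane-placement idea in the SLP fix above can be illustrated with a small standalone sketch. This is not the SLPVectorizer code: `buildLaneScales`, `reduceAddScaled`, and the `LaneOfCand` mapping are hypothetical names chosen for illustration, and the only assumption is that each deduplicated candidate has a known lane in the emitted vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not LLVM code).
//   Counts[I]     - how often deduplicated candidate I occurred in the reduction
//   LaneOfCand[I] - lane that candidate I occupies in the emitted vector
// Returns one repeat count per lane, placed by lane rather than by
// candidate order.
std::vector<unsigned>
buildLaneScales(const std::vector<unsigned> &Counts,
                const std::vector<std::size_t> &LaneOfCand) {
  assert(Counts.size() == LaneOfCand.size() && "one lane per candidate");
  std::vector<unsigned> Scales(Counts.size(), 1);
  for (std::size_t I = 0; I < Counts.size(); ++I) {
    assert(LaneOfCand[I] < Scales.size() && "lane out of range");
    Scales[LaneOfCand[I]] = Counts[I];
  }
  return Scales;
}

// Add-reduction over lanes, each lane multiplied by its repeat count.
long reduceAddScaled(const std::vector<long> &Lanes,
                     const std::vector<unsigned> &Scales) {
  assert(Lanes.size() == Scales.size() && "one scale per lane");
  long Sum = 0;
  for (std::size_t L = 0; L < Lanes.size(); ++L)
    Sum += Lanes[L] * static_cast<long>(Scales[L]);
  return Sum;
}
```

For candidates a = 10 (seen three times) and b = 20 (seen once), with the root reordered so the vector is [b, a], the counters land at lanes [1, 3] and the reduction gives a + a + a + b. Applying the counters in candidate order [3, 1] would instead scale b by three.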
[AMDGPU] Autogen checks for tests in AMDGPU Cost Model. NFC (#206595)
Although comments in the test files say the checks are autogenerated,
some checks were not actually up to date.
This change regenerates the checks against the latest LLVM source.
[llvm][clang] Remove `format_object_base` forward declarations (#206526)
PR https://github.com/llvm/llvm-project/pull/206319 removed the
`format_object_base` class itself, but not some of its
forward-declarations. NFCI
[clang][docs] Refactor compiler standard references from c94 to C95 (#206403)
The patch changes references to a nonexistent c94 standard so that they
refer to C95 (C90 + AMD1).
Closes #206389
[SSAF][Extractor][Do not merge] Extract operator new/delete overload entities that shall retain their types
This commit creates an extractor for operator new/delete overloads.
Overloads of operator new shall retain their void* return type,
regardless of whether they are propagated by unsafe buffers. The same
applies to the parameters of operator delete overloads.
Therefore, clang-reforge will eventually need this information.
rdar://179151541
[VPlan] Remove unused InductionDescriptor VPDerivedIVRecipe constructor (#206583)
Both callers use the 5-argument (Kind, FPBinOp, ...) constructor; the
delegating InductionDescriptor overload has no users.
[flang][OpenMP] Add explicit return type to visitor lambdas (#206588)
This should silence MSVC (14.51.36231) error:
error C2338: static assertion failed: 'visit() requires the result of
all potential invocations to have the same type and value category
(N4950 [variant.visit]/5).'
e.g. https://lab.llvm.org/buildbot/#/builders/166/builds/9664
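A minimal sketch of the general pattern behind this fix, assuming a visitor whose lambdas would otherwise deduce different return types. The `Overloaded` helper, `widen`, and the `int`/`long` variant are illustrative and are not taken from the flang sources.

```cpp
#include <variant>

// Common C++17 idiom for building one visitor from several lambdas.
template <class... Ts> struct Overloaded : Ts... {
  using Ts::operator()...;
};
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Without the explicit "-> long", the first lambda would deduce int and the
// second long, so the invocations would not all have the same result type,
// which std::visit requires.
long widen(const std::variant<int, long> &V) {
  return std::visit(Overloaded{
                        [](int I) -> long { return I; },
                        [](long L) -> long { return L; },
                    },
                    V);
}
```

Spelling out the return type on every lambda keeps the result type and value category identical across alternatives, regardless of what each body happens to return.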
[lldb] Add a BugReporter plugin type and "diagnostics report" (#206578)
Introduce a BugReporter plugin kind that files an assembled
Diagnostics::Report through a pluggable destination, plus a "diagnostics
report" command (aliased "bugreport") that collects the bundle and files
it through the first registered reporter.
CreateBugReporterInstance() returns the first registered reporter, so a
reporter registered earlier wins and a downstream tree can take over by
registering ahead of the built-ins. BugReporterNone is the
always-registered, last-in-order fallback. Its File() returns an error
pointing at LLDB_BUG_REPORT_URL, so the command surfaces "no tracker
configured" through the normal error path instead of special-casing it.
"diagnostics report" writes the bundle, prints a review warning, and
files it unless --no-open is given. The upcoming GitHub reporter, gated
by a CMake option, is the first real destination.
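The "first registered reporter wins" ordering with an always-failing fallback could look roughly like the sketch below. It is standalone and does not use the LLDB plugin API: `Reporter`, `Registry`, `NoneReporter`, `DownstreamReporter`, and `fileThroughFirst` are hypothetical names, and a `bool` plus error string stands in for whatever error type the real code uses.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reporter interface (illustration only).
struct Reporter {
  virtual ~Reporter() = default;
  // Returns true on success; on failure fills Error.
  virtual bool File(const std::string &BundlePath, std::string &Error) = 0;
};

// Fallback: always fails and points the user at a URL placeholder.
struct NoneReporter : Reporter {
  bool File(const std::string &, std::string &Error) override {
    Error = "no tracker configured; see LLDB_BUG_REPORT_URL";
    return false;
  }
};

// Stand-in for a reporter that a downstream tree registers ahead of built-ins.
struct DownstreamReporter : Reporter {
  bool File(const std::string &, std::string &) override { return true; }
};

class Registry {
  std::vector<std::function<std::unique_ptr<Reporter>()>> Factories;

public:
  void Register(std::function<std::unique_ptr<Reporter>()> F) {
    Factories.push_back(std::move(F));
  }
  // The earliest registration wins.
  std::unique_ptr<Reporter> CreateFirst() const {
    return Factories.empty() ? nullptr : Factories.front()();
  }
};

// The failure of the fallback travels through the same error path as any
// other reporter failure, so the caller needs no special case.
bool fileThroughFirst(const Registry &R, const std::string &Bundle,
                      std::string &Error) {
  std::unique_ptr<Reporter> Rep = R.CreateFirst();
  if (!Rep) {
    Error = "no reporter registered";
    return false;
  }
  return Rep->File(Bundle, Error);
}
```

Registering the fallback last means it is reached only when nothing else was registered before it.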
[VPlan] Pass CostCtx to makeMemOpWideningDecisions (NFC). (#206580)
makeMemOpWideningDecisions already uses 2 members (PSE, L) and will need
more in the future. Directly pass CostCtx.
AMDGPU: Migrate unittests to subarch triples
Replace specifying a processor name with the triple
subarch.
The register-limit helpers in AMDGPUUnitTests.cpp that enumerate every
valid CPU via fillValidArchListAMDGCN still pass the CPU explicitly, as
does the MC Disassembler smoke test (its C disassembler API derives the
subtarget from the CPU, not the triple subarch).
Co-authored-by: Claude (Opus 4.8) <noreply at anthropic.com>
clang: Start using new amdgpu subarch triples
Fix up invocations using --target=amdgcn + -mcpu to introduce
the subarch in the triple.
For offload toolchains, a single toolchain is constructed for the
top level amdgpu architecture, and the effective triple is used for
target specific tool invocations.
The specifics of the resource directory layout are TBD. This does
try to find resources in the subarch-named directory. The paths
are searched at toolchain creation time, so this does not work
when there are multiple subarches.
Fixes #154925
clang/AMDGPU: Stop passing redundant -target-cpu to cc1
Now that the exact target is encoded in the triple's subarch field,
-target-cpu is redundant. This avoids polluting the resultant IR with
unwanted "target-cpu" attributes. The net result is the desired codegen
when compiling libraries for a major subarch and linking them into a
program compiled for a specific arch. For example, compiling for "gfx9-generic"
would pollute the IR with "target-cpu"="gfx9-generic", so codegen
would ultimately be performed for the generic target even after
linking into the concrete gfx9 cpu. The specialization will now be
achieved by merging the triples without the linker or optimization
passes needing to fixup function attributes.
clang/AMDGPU: Validate -target-cpu in cc1 is valid for the subarch
Restrict the reported list of valid target-cpus based on the triple's
subarch. This is more consistent with how other targets validate the
target CPU name. Currently, validation of the target name for the triple is
split between the driver and here. The driver-based diagnostic seems to be an
amdgpu-ism in 2 different places (though there is one arm validation emitting
the same diagnostic). In the future we could probably drop those.
AMDGPU: Introduce amdgpu triple arch
Move towards using the triple for representing incompatible
ISA changes. Use the subarch field to represent the various
incompatible cases. Previously we pretended a single triple arch
was universally compatible, and only distinguished by function
level subtargets. Move towards using distinct triples to enable
more sophisticated toolchain handling in the future, like proper
runtime library linking.
Introduce a new subarch per unique ISA, but also introduce
"major subarches" which are compatible by a set of covered
minor ISA versions. These map to the existing generic targets.
There are a few placeholder subarch entries, which currently
have missing backing generic arches for codegen.
This should be the preferred triple arch name going forward,
but is treated as an alias of amdgcn. This does not yet change
clang to emit the new triples.
[2 lines not shown]
[mlir][acc] Lower sequential acc.loop to scf.for in ACCComputeLowering (#206165)
Sequential loops already have fixed parallelism, so represent them with
`scf.for` rather than `scf.parallel`. To prevent further analysis and
parallelization, `parDimAttr` is set to seq.
[HLSL] Implement codegen for copying cbuffer structs with resources (#204232)
Global-scope structs are in `hlsl_constant` address space and use
cbuffer layout. When those structs contain resources, the resources are
not stored inline in the constant buffer. Instead, they are represented
as separate globals or, in the case of resource arrays, initialized on
demand.
This change implements the HLSL codegen for cases where a cbuffer-backed
struct with embedded resources is copied into a local variable or passed
as a function argument. CodeGen materializes a temporary in the default
address space, copies the constant-data fields using the cbuffer struct
layout, and reconstructs the resource members in the local copy.
Fixes #182990
[PGO] Fix malformed raw profile test (#206574)
PR #190708 added a uniform counter pointer to the raw profile data
record, but a hand-written raw profile test gained one extra zero word
in the record.
Remove the extra word so the name section starts at the offset expected
by the reader. This fixes the regression while keeping the test focused
on the malformed counter pointer.
Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/24/builds/21581
Test:
```
~/git/scripts_shared/scripts/llvm/llvm-dev.sh llvm test llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test
```
[libc] Add regex AST and ExprPool (#198728)
Implemented the core AST nodes and the ExprPool arena-based allocator.
Utilised AllocChecker for memory safety and enforced hardening at node
initialisation.
Assisted-by: Automated tooling, human reviewed.
[lldb][Windows] also run tests with LLDB_TEST_USE_LLDB_SERVER=1 (#206511)
`LLDB_TEST_USE_LLDB_SERVER` defaults to 0, meaning that the Green Dragon
lldb config on Windows only tests lldb with the in-process plugin.
This patch runs the test suite both with and without lldb-server.
This only affects
https://ci-external.swift.org/job/lldb-windows/job/main/.
[BOLT] Fix use-old-text-zero-padding on FreeBSD
BSD od accepts only a decimal value for the -N parameter. To fix the test
failure, use a decimal value instead of a hex value in this test case.