[Clang] Fix Itanium mangling crash for local lambda in ctor/dtor (#181068)
Fixes #176395
Note: I need someone to help me merge this PR, since I don't have commit
access.
[AArch64][SVE] Use SUBR for unpredicated bitwise NOT. (#191155)
This relies on the identity NOT (x) = -1 - x, which can be lowered as
byte SUBR (x, 255). The recently added pseudos for SUBR (immediate)
should avoid cases where we would risk emitting a MOV.
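The identity can be checked exhaustively for bytes. The following is a minimal sketch in plain C++, not the SVE lowering itself; `notViaReverseSub` and `identityHoldsForAllBytes` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bitwise NOT of a byte written as a reversed
// subtraction from 255. In 8-bit two's complement, 255 and -1 share the
// same bit pattern, so 255 - x equals -1 - x, which equals ~x.
constexpr std::uint8_t notViaReverseSub(std::uint8_t x) {
  return static_cast<std::uint8_t>(255 - x);
}

// Compare against the built-in operator for every byte value.
constexpr bool identityHoldsForAllBytes() {
  for (int v = 0; v <= 255; ++v) {
    const auto x = static_cast<std::uint8_t>(v);
    if (notViaReverseSub(x) != static_cast<std::uint8_t>(~x))
      return false;
  }
  return true;
}

static_assert(identityHoldsForAllBytes(),
              "NOT(x) == 255 - x for every byte value");

// Small runtime usage example.
inline void checkNotExamples() {
  assert(notViaReverseSub(0x00) == 0xFF);
  assert(notViaReverseSub(0x0F) == 0xF0);
}
```

Because the check is a `static_assert`, the identity is verified at compile time; `checkNotExamples()` only repeats two spot checks at run time.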
[NVPTX] Do not permit calls to ptx_kernel CC (#190434)
Summary:
Removes support for calling the ptx_kernel CC. Regenerate bitcode that
used it. There is no auto-upgrade because this never worked; it just
wasn't diagnosed.
[VPlan] Test tail folding with find-last-iv-sinkable-expr.ll tests. NFC (#191176)
I believe these are being miscompiled after #183911, since we're no
longer using the original select on the header mask added in
addReductionResultComputation.
This is additional test coverage for #191166
[Clang] Do not try to create invalid variable specializations. (#190960)
When a variable specialization was ambiguous,
we would still create a node for it.
If the first such specialization took place in a SFINAE context, i.e.
when appearing in a concept, the initial diagnostic was silenced, but no
further errors were emitted on that specialization because the variable
was created anyway.
Instead, we no longer create a specialization in this case.
Fixes #132592
[AArch64] Fix broken SME code with GlobalISel (#191140)
The checks introduced in #190135 are too restrictive because neither SVE
nor SME is required to compile streaming-compatible or agnostic-ZA
functions. With those checks, incorrect code is now generated for
streaming-compatible/agnostic-ZA functions when the function has
neither `+sve` nor `+sme`.
[NFC][SPIR-V] Remove unnecessary 'REQUIRES: asserts' from tests (#190986)
Remove `REQUIRES: asserts` from tests that don't use any assertions-only
functionality and should run for all build configurations.
[LV] Update forced epilogue VF options to allow different VFs than main. (#190393)
Previously, epilogue vector factors forced via the command-line options
were required to match the forced main VF (or the VF to be built in
general). This led to a number of awkward tests, where we end up with
dead epilogue vector loops.
Update the logic to build an additional VPlan with the epilogue vector
factor, and require the provided epilogue VF to be < IC * MainLoopVF.
Otherwise, epilogue vectorization is skipped.
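The acceptance rule above can be sketched as a small predicate. This is an illustration only, assuming fixed-width VFs; `isForcedEpilogueVFUsable` is a hypothetical name and not a function in the LoopVectorizer.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): returns true when a forced
// epilogue VF satisfies EpilogueVF < IC * MainLoopVF. Assumes fixed-width
// vectorization factors and small values that do not overflow unsigned.
constexpr bool isForcedEpilogueVFUsable(unsigned EpilogueVF,
                                        unsigned MainLoopVF, unsigned IC) {
  return EpilogueVF < IC * MainLoopVF;
}

// With MainLoopVF = 8 and IC = 2 the bound is 16.
static_assert(isForcedEpilogueVFUsable(4, 8, 2), "4 < 16: accepted");
static_assert(isForcedEpilogueVFUsable(8, 8, 2), "8 < 16: accepted");
static_assert(!isForcedEpilogueVFUsable(16, 8, 2), "16 is not < 16: skipped");

// Small runtime usage example.
inline void checkEpilogueVFExamples() {
  assert(isForcedEpilogueVFUsable(2, 4, 1));
  assert(!isForcedEpilogueVFUsable(4, 4, 1));
}
```

When the predicate is false, the description above says epilogue vectorization is skipped rather than falling back to another VF.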
This only impacts the forced epilogue VF option used for testing. It
ensures that epilogue tests cover more realistic scenarios and makes
them more robust w.r.t. additional VPlan-based folding.
PR: https://github.com/llvm/llvm-project/pull/190393
[DAG] computeKnownFPClass - Add handling for AssertNoFPClass (#190185)
Resolves #189478
Adds code to handle AssertNoFPClass in computeKnownFPClass and adds IR
test coverage for RISC-V.
[Clang] Do not create a NoSFINAETrap for variable specialization. (#191000)
Nothing in the standard says this should happen outside of the
immediate context.
Fixes #54439
[AMDGPU] Use wavefront scope for single-wave workgroup synchronization (#187673)
Workgroup-scoped fences and non-relaxed workgroup atomics were
previously legalized with synchronization strong enough for multi-wave
workgroups.
When the kernel's maximum flat work-group size does not exceed the
wavefront size, the workgroup contains only a single wavefront, so
workgroup-scoped synchronization is equivalent to wavefront scope and
the stronger legalization is unnecessary.
SIMemoryLegalizer now demotes workgroup scope to wavefront scope
in this case for workgroup-scoped fences and for non-relaxed atomic
load, store, atomicrmw, and cmpxchg operations.
This allows subsequent legalization to operate at wavefront scope.
The decision is based on AMDGPUSubtarget::isSingleWavefrontWorkgroup.
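The demotion rule can be sketched as follows. This is a simplified model, not the SIMemoryLegalizer code: the `SyncScope` enum, the free function `isSingleWavefrontWorkgroup` (a stand-in for the subtarget query named above), `demoteScope`, and the boolean flag for the operation kind are all assumptions made for illustration.

```cpp
#include <cassert>

// Simplified scope model for this sketch only.
enum class SyncScope { Wavefront, Workgroup, Agent, System };

// Stand-in for the subtarget query: a workgroup holds a single wavefront
// when the maximum flat work-group size does not exceed the wavefront size.
constexpr bool isSingleWavefrontWorkgroup(unsigned MaxFlatWorkGroupSize,
                                          unsigned WavefrontSize) {
  return MaxFlatWorkGroupSize <= WavefrontSize;
}

// Hypothetical helper: demote workgroup scope to wavefront scope for fences
// and non-relaxed atomics when the workgroup is a single wavefront.
// All other scopes and operations are returned unchanged.
constexpr SyncScope demoteScope(SyncScope Scope, unsigned MaxFlatWorkGroupSize,
                                unsigned WavefrontSize,
                                bool IsFenceOrNonRelaxedAtomic) {
  if (Scope == SyncScope::Workgroup && IsFenceOrNonRelaxedAtomic &&
      isSingleWavefrontWorkgroup(MaxFlatWorkGroupSize, WavefrontSize))
    return SyncScope::Wavefront;
  return Scope;
}

static_assert(demoteScope(SyncScope::Workgroup, 64, 64, true) ==
                  SyncScope::Wavefront,
              "single-wave workgroup is demoted");
static_assert(demoteScope(SyncScope::Workgroup, 256, 64, true) ==
                  SyncScope::Workgroup,
              "multi-wave workgroup keeps workgroup scope");

// Small runtime usage example.
inline void checkDemotionExamples() {
  assert(demoteScope(SyncScope::Agent, 32, 64, true) == SyncScope::Agent);
  assert(demoteScope(SyncScope::Workgroup, 32, 64, false) ==
         SyncScope::Workgroup);
}
```

Only workgroup scope is ever changed in this sketch, so agent and system scope keep their original synchronization.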
---------
Co-authored-by: Barbara Mitic <Barbara.Mitic at amd.com>
[OMPIRBuilder] Move debug records to correct blocks. (#157125)
Consider the following small OpenMP target region:
```
!$omp target map(tofrom: x)
x = x + 1
!$omp end target
```
Currently, when compiled with `flang`, it will generate an outlined
function like below (with irrelevant bits removed).
```
void @__omp_offloading_10303_14e8afc__QQmain_l13(ptr %0, ptr %1) {
entry:
%2 = alloca ptr, align 8, addrspace(5)
%3 = addrspacecast ptr addrspace(5) %2 to ptr
...
br i1 %exec_user_code, label %user_code.entry, label %worker.exit
[36 lines not shown]
[analyzer] Fix crash in CStringChecker on zero-size element types (#191061)
Move the null check of Offset before its dereference in checkInit. When
the element type has zero size (e.g., an empty struct in C), the
division returns an empty optional, which was dereferenced
unconditionally.
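The pattern behind the fix, checking an optional before dereferencing it, can be sketched in isolation. This is not the checker's code: `elementIndex` and `isIndexBelow` are hypothetical helpers, and `std::optional` stands in for whatever optional type the analyzer uses.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a division that can fail: returns an empty
// optional when the element size is zero (e.g. an empty struct in C).
inline std::optional<std::uint64_t> elementIndex(std::uint64_t ByteOffset,
                                                 std::uint64_t ElemSize) {
  if (ElemSize == 0)
    return std::nullopt;
  return ByteOffset / ElemSize;
}

// Hypothetical caller: the emptiness check comes before the dereference,
// so a zero-size element type leads to an early return, not a crash.
inline bool isIndexBelow(std::uint64_t ByteOffset, std::uint64_t ElemSize,
                         std::uint64_t Limit) {
  const std::optional<std::uint64_t> Idx = elementIndex(ByteOffset, ElemSize);
  if (!Idx)
    return false; // Nothing to compare; bail out before using *Idx.
  return *Idx < Limit;
}

// Small runtime usage example.
inline void checkOptionalExamples() {
  assert(isIndexBelow(8, 4, 3));  // 8 / 4 == 2, and 2 < 3
  assert(!isIndexBelow(8, 0, 3)); // zero-size element: empty optional
}
```

Returning `false` for the empty case is a choice made for this sketch; what the checker does after the early exit is not described in the commit message.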
Fixes #190457
[clang][ssaf][test] Fix the extraction-works-alongside-compilation.cpp test (#191162)
I forgot that we need this `REQUIRES: asserts` for the test.
Fixes build bots not setting `LLVM_ENABLE_ASSERTIONS=ON`.
For example:
https://lab.llvm.org/buildbot/#/builders/11/builds/37623
This fixes up #191058
[LV] NFCI: Create VPExpressions in transformToPartialReductions.
With this change, all logic to generate partial reductions and
recognising them as VPExpressions is contained in
`transformToPartialReductions`, without the need for a second
transform pass.
This PR is intended to be a non-functional change.
[LV] Simplify costing partial reduction chain links (NFCI) (#190980)
Previously, `getPartialReductionLinkCost()` needed to figure out which
case `matchExtendedReductionOperand()` had matched in order to compute a
cost. This made adding new cases to `matchExtendedReductionOperand()`
more complex and added some redundancy.
This patch updates `ExtendedReductionOperand` so that it contains all
the information needed to compute the cost ready to pass to
`getPartialReductionCost()`. This means matching new operand forms only
needs to be done in `matchExtendedReductionOperand()`.
This is split off from #188043 (this change simplifies matching absolute
difference operands).