[LV] Update forced epilogue VF options to allow different VFs than main. (#190393)
Previously, epilogue vector factors forced via command-line options
were required to match the forced main VF (or, more generally, the VF
being built). This led to a number of awkward tests in which we ended
up with dead epilogue vector loops.
Update the logic to build an additional VPlan with the epilogue vector
factor, and require the provided epilogue VF to be < IC * MainLoopVF.
Otherwise, epilogue vectorization is skipped.
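The bound above can be illustrated with a minimal sketch. It assumes
fixed-width VFs only, and `isForcedEpilogueVFUsable` is a hypothetical
helper name for illustration, not a function in LoopVectorize:
```cpp
#include <cassert>

// Hypothetical helper (not LLVM's API): a forced epilogue VF is only
// usable when it is strictly smaller than IC * MainLoopVF, i.e. the
// number of elements one unrolled main-loop iteration processes.
// Scalable VFs are deliberately ignored in this sketch.
constexpr bool isForcedEpilogueVFUsable(unsigned EpilogueVF,
                                        unsigned MainLoopVF, unsigned IC) {
  return EpilogueVF < IC * MainLoopVF;
}

// Small usage example for the sketch above.
inline void exampleEpilogueVF() {
  assert(isForcedEpilogueVFUsable(4, 8, 1));  // 4 < 8: epilogue is built
  assert(!isForcedEpilogueVFUsable(8, 8, 1)); // 8 == 8: epilogue skipped
  assert(isForcedEpilogueVFUsable(8, 8, 2));  // 8 < 16: allowed with IC = 2
}
```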
This only impacts the forced epilogue VF option used for testing. It
ensures that epilogue tests cover more realistic scenarios and makes
them more robust w.r.t. additional VPlan-based folding.
PR: https://github.com/llvm/llvm-project/pull/190393
[DAG] computeKnownFPClass - Add handling for AssertNoFPClass (#190185)
Resolves #189478
Adds code to handle AssertNoFPClass in computeKnownFPClass and adds IR
test coverage for RISC-V.
[Clang] Do not create a NoSFINAETrap for variable specialization. (#191000)
Nothing in the standard says this should happen outside of the
immediate context.
Fixes #54439
[AMDGPU] Use wavefront scope for single-wave workgroup synchronization (#187673)
Workgroup-scoped fences and non-relaxed workgroup atomics were
previously legalized with synchronization strong enough for multi-wave
workgroups.
When the kernel's maximum flat work-group size does not exceed the
wavefront size, the workgroup contains only a single wavefront, so
workgroup-scoped synchronization is equivalent to wavefront scope and
the stronger legalization is unnecessary.
SIMemoryLegalizer now demotes workgroup scope to wavefront scope
in this case for workgroup-scoped fences and for non-relaxed atomic
load, store, atomicrmw, and cmpxchg operations.
This allows subsequent legalization to operate at wavefront scope.
The decision is based on AMDGPUSubtarget::isSingleWavefrontWorkgroup.
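A minimal sketch of the demotion rule described above. The `SyncScope`
enum, the `KernelInfo` struct and `legalizeScope` are hypothetical
stand-ins for illustration; only the name `isSingleWavefrontWorkgroup`
comes from the commit message, and the check for non-relaxed ordering
on atomics is omitted:
```cpp
#include <cassert>

// Hypothetical model types, not the SIMemoryLegalizer data structures.
enum class SyncScope { Wavefront, Workgroup, Agent, System };

struct KernelInfo {
  unsigned MaxFlatWorkGroupSize;
  unsigned WavefrontSize;
};

// A workgroup that cannot exceed one wavefront's worth of work-items
// consists of a single wavefront.
inline bool isSingleWavefrontWorkgroup(const KernelInfo &K) {
  return K.MaxFlatWorkGroupSize <= K.WavefrontSize;
}

// Demote workgroup scope to wavefront scope for single-wave workgroups;
// every other scope is left unchanged.
inline SyncScope legalizeScope(SyncScope S, const KernelInfo &K) {
  if (S == SyncScope::Workgroup && isSingleWavefrontWorkgroup(K))
    return SyncScope::Wavefront;
  return S;
}

// Small usage example for the sketch above.
inline void exampleScopeDemotion() {
  assert(legalizeScope(SyncScope::Workgroup, KernelInfo{64, 64}) ==
         SyncScope::Wavefront);
  assert(legalizeScope(SyncScope::Workgroup, KernelInfo{256, 64}) ==
         SyncScope::Workgroup);
}
```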
---------
Co-authored-by: Barbara Mitic <Barbara.Mitic at amd.com>
[OMPIRBuilder] Move debug records to correct blocks. (#157125)
Consider the following small OpenMP target region:
```
!$omp target map(tofrom: x)
x = x + 1
!$omp end target
```
Currently, when compiled with `flang`, it will generate an outlined
function like below (with irrelevant bits removed).
```
void @__omp_offloading_10303_14e8afc__QQmain_l13(ptr %0, ptr %1) { entry:
%2 = alloca ptr, align 8, addrspace(5)
%3 = addrspacecast ptr addrspace(5) %2 to ptr
...
br i1 %exec_user_code, label %user_code.entry, label %worker.exit
[36 lines not shown]
[analyzer] Fix crash in CStringChecker on zero-size element types (#191061)
Move the null check of Offset before its dereference in checkInit. When
the element type has zero size (e.g., an empty struct in C), the
division returns an empty optional, which was dereferenced
unconditionally.
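The bug pattern can be sketched in isolation with `std::optional`.
`elementIndex` and `isFirstElement` are hypothetical names; this is an
illustration of checking before dereferencing, not the actual
`checkInit` code:
```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the division: yields no value when the
// element size is zero (e.g. an empty struct in C).
inline std::optional<std::uint64_t> elementIndex(std::uint64_t ByteOffset,
                                                 std::uint64_t ElemSize) {
  if (ElemSize == 0)
    return std::nullopt;
  return ByteOffset / ElemSize;
}

// The emptiness check comes before the dereference, so a zero-size
// element type is handled instead of triggering undefined behavior.
inline bool isFirstElement(std::uint64_t ByteOffset, std::uint64_t ElemSize) {
  std::optional<std::uint64_t> Offset = elementIndex(ByteOffset, ElemSize);
  if (!Offset)
    return false;
  return *Offset == 0;
}

// Small usage example for the sketch above.
inline void exampleZeroSizeElement() {
  assert(isFirstElement(0, 4));
  assert(!isFirstElement(0, 0)); // zero-size element: no crash
}
```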
Fixes #190457
[clang][ssaf][test] Fix the extraction-works-alongside-compilation.cpp test (#191162)
I forgot that we need this `REQUIRES: asserts` for the test.
Fixes build bots not setting `LLVM_ENABLE_ASSERTIONS=ON`.
For example:
https://lab.llvm.org/buildbot/#/builders/11/builds/37623
This fixes up #191058
[LV] NFCI: Create VPExpressions in transformToPartialReductions.
With this change, all logic to generate partial reductions and
recognising them as VPExpressions is contained in
`transformToPartialReductions`, without the need for a second
transform pass.
This PR is intended to be a non-functional change.
[LV] Simplify costing partial reduction chain links (NFCI) (#190980)
Previously, `getPartialReductionLinkCost()` needed to figure out what
case `matchExtendedReductionOperand()` matched to compute a cost. This
made adding new cases to `matchExtendedReductionOperand()` more complex
and added some redundancy.
This patch updates `ExtendedReductionOperand` so that it contains all
the information needed to compute the cost ready to pass to
`getPartialReductionCost()`. This means matching new operand forms only
needs to be done in `matchExtendedReductionOperand()`.
This is split off from #188043 (this change simplifies matching absolute
difference operands).
[mlir][Vector] Make createWriteOrMaskedWrite utility (#190967)
Analogous to https://github.com/llvm/llvm-project/pull/89119, make
`createWriteOrMaskedWrite` a vector utility, exposing it for re-use by
downstream users.
This PR is mostly just moving code and updating documentation but also
addresses a `TODO` for `isMaskTriviallyFoldable` to use that utility in
`createReadOrMaskedRead` as well.
No new tests were added, because the functionality is covered by existing tests.
---------
Signed-off-by: Lukas Sommer <lukas.sommer at amd.com>
[VPlan] Handle AnyOf Or reduction via ComputeReductionResult. (#191049)
Instead of having ComputeAnyOfResult handle the Or reduction of unrolled
parts inline, route it through ComputeReductionResult with
RecurKind::Or. ComputeAnyOfResult now takes a pre-reduced scalar and
only performs the freeze + select.
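A scalar model of the split described above, as a rough sketch. The
function names are hypothetical, plain `bool`/`int` values stand in for
VPlan values, and the freeze has no scalar analogue here, so it is only
noted in a comment:
```cpp
#include <cassert>
#include <vector>

// Step 1 (generic reduction): OR together the per-part results of the
// unrolled loop, as a reduction with RecurKind::Or would.
inline bool reduceOr(const std::vector<bool> &Parts) {
  bool Result = false;
  for (bool Part : Parts)
    Result = Result || Part;
  return Result;
}

// Step 2 (AnyOf-specific): take the pre-reduced scalar and select
// between the new value and the start value. The real lowering also
// freezes the condition first; that is not modeled in this sketch.
inline int selectAnyOfResult(bool AnyOf, int NewVal, int StartVal) {
  return AnyOf ? NewVal : StartVal;
}

// Small usage example for the sketch above.
inline void exampleAnyOf() {
  assert(selectAnyOfResult(reduceOr({false, true, false}), 7, 0) == 7);
  assert(selectAnyOfResult(reduceOr({false, false}), 7, 0) == 0);
}
```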
This is a preparatory step towards removing ComputeAnyOfResult entirely
in https://github.com/llvm/llvm-project/pull/190039.
PR: https://github.com/llvm/llvm-project/pull/191049
[mlir][debug] Make DICompileUnitAttr recursive. (#190808)
This PR adds `DIRecursiveTypeAttrInterface` to `DICompileUnitAttr`. It
should fix the circular dependency problem we have had since the
`importedEntities` field was added.
[Clang] Improve concept performance 1/N (#188421)
The concept parameter mapping patch significantly impacted performance
in scenarios where concepts are heavily used, even with
atomic-expression-level caching.
After normalization, we often end up with large atomic expressions
containing numerous duplicate and complex template parameter mappings.
Previously, we were substituting and checking these repeatedly, which
was highly inefficient.
We now cache these substitution results within TemplateInstantiator.
This saves us a lot of duplicate semantic checking and provides us some
performance improvement, as in these regression cases:
usb_ids_gen.cpp:
clang-21: 1.41s
clang-22: 3.90s
This patch: 2.45s
[12 lines not shown]
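The caching idea in this entry can be sketched as plain memoization.
`SubstitutionCache`, its string keys and the `Compute` callback are all
hypothetical simplifications; the actual cache lives inside
TemplateInstantiator and its key and value types are not shown in this
message:
```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical memoization table: (parameter mapping, arguments) -> result.
class SubstitutionCache {
  std::map<std::pair<std::string, std::string>, std::string> Cache;
  unsigned Misses = 0;

public:
  using ComputeFn =
      std::function<std::string(const std::string &, const std::string &)>;

  // Return the cached result if present; otherwise run the expensive
  // substitution once and remember its result.
  std::string substitute(const std::string &Mapping, const std::string &Args,
                         const ComputeFn &Compute) {
    auto Key = std::make_pair(Mapping, Args);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second;
    ++Misses;
    return Cache.emplace(Key, Compute(Mapping, Args)).first->second;
  }

  unsigned misses() const { return Misses; }
};

// Small usage example: the second identical query is served from the cache.
inline void exampleSubstitutionCache() {
  SubstitutionCache C;
  auto Compute = [](const std::string &M, const std::string &A) {
    return M + "<" + A + ">";
  };
  assert(C.substitute("T", "int", Compute) == "T<int>");
  assert(C.substitute("T", "int", Compute) == "T<int>");
  assert(C.misses() == 1);
}
```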
[clang][test] Modernize 2004-02-13-Memset.c to use FileCheck (#191092)
Replace `grep | count` verification with `FileCheck` and update `CHECK`
directives with current codegen output.
[LV][NFC] Remove unneeded LLVM intrinsic declarations (#190993)
We no longer need to declare LLVM intrinsics in .ll files as the
intrinsics are populated automatically in the module. Remove the
declarations from tests to reduce test noise and size.
This came from a suggestion on PR #190786.
Revert "[SelectionDAG] Recurse through mask expression trees in WidenVSELECTMask (#188085)" (#191151)
This reverts commit 815edc3ff646392bfee2b381d37dd35e4b04f9c5.
[clang][ssaf] Preserve AST after codegen for SSAF extractors (#191058)
This is a use-after-free.
Codegen would drop the AST before starting the optimizations on the LLVM
IR level. This means that the ASTConsumers of the SSAF extractors only
had dangling TU Decls etc.
For now, let's override this option to force-keep the AST alive. Note
that PluginActions already did the same if their consumers were added
after the main frontend-action.
See:
https://github.com/llvm/llvm-project/blob/69e0367e8221b8002b5d438fb70ff3daf36257fc/clang/lib/Frontend/FrontendAction.cpp#L470
```c++
CI.getCodeGenOpts().ClearASTBeforeBackend = false;
```
Long term, we could think about the stability implications of running
the extractors before codegen to be able to drop the AST, thus save
[13 lines not shown]
[clang][CIR] Add lowering for vcvt_n_ and vcvtq_n_ conversion intrinsics (#190961)
This PR adds lowering for the conversion intrinsics with an immediate
argument (identified by `_n_` in the intrinsic name), excluding FP16
variants.
It also moves the corresponding tests from:
* clang/test/CodeGen/AArch64/neon_intrinsics.c
to:
* clang/test/CodeGen/AArch64/neon/intrinsics.c
The lowering follows the existing implementation in
CodeGen/TargetBuiltins/ARM.cpp and adds the `getFloatNeonType` helper
to support it. The remaining changes are code motion and refactoring.
Reference:
[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#conversions
[libc] Implement listen(2) on linux (#190755)
I'm using the new syscall wrapper framework, and enabling the entry
point for x86_64, aarch64 and riscv. I also extend the connect test to
check for successful connection, now that we have that ability.
[AArch64][CodeGen] match (or x (not y)) to generate mov+orn (#191145)
Fixes: #100045
Adds a tablegen pattern that matches (or x (not y)) and generates a
mov+orn instead of the original mvn+orr.
The number of instructions stays the same, but mov+orn can be
considered better than mvn+orr for two reasons:
1. Symmetry: For the same input with an 'and' instead of 'or', mov+bic
is generated.
2. Optimization through register renaming: If the mov is an immediate
move (for example, 'mov x1, #0x4'), it can be retired early by the
register renamer and never issued for execution.
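That the two sequences compute the same value can be checked with a
small bit-level model. The helper functions below mirror the semantics
of the named instructions on 64-bit values; which operand ends up in
which register is an assumption made for illustration only:
```cpp
#include <cassert>
#include <cstdint>

// Bit-level models of the instructions involved (64-bit registers).
inline std::uint64_t mvn(std::uint64_t A) { return ~A; }
inline std::uint64_t orr(std::uint64_t A, std::uint64_t B) { return A | B; }
inline std::uint64_t orn(std::uint64_t A, std::uint64_t B) { return A | ~B; }
inline std::uint64_t mov(std::uint64_t Imm) { return Imm; }

// Old sequence: invert y first, then OR it with x.
inline std::uint64_t viaMvnOrr(std::uint64_t X, std::uint64_t Y) {
  std::uint64_t Tmp = mvn(Y);
  return orr(X, Tmp);
}

// New sequence: materialize x with mov, then OR-NOT it with y.
inline std::uint64_t viaMovOrn(std::uint64_t X, std::uint64_t Y) {
  std::uint64_t Tmp = mov(X);
  return orn(Tmp, Y);
}

// Small usage example: both sequences agree on (or x (not y)).
inline void exampleOrn() {
  assert(viaMovOrn(0x4, 0xF0) == viaMvnOrr(0x4, 0xF0));
  assert(viaMovOrn(0x4, 0xF0) == (0x4ull | ~0xF0ull));
}
```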
This patch was reverted as I wanted to change my email associated with
the patch.
Original patch: #190769
[2 lines not shown]
[CIR][Aarch64] upstream scalar & vector intrinsics (FP16) (#190310)
This PR upstreams the following fp16 intrinsics as part of #185382:
- vaddh_f16,
- vsubh_f16,
- vmulh_f16,
- vdivh_f16
This is my first PR to LLVM, so any feedback is greatly appreciated!
[lldb] Handle simulator printout in TestSimulatorPlatform (#189571)
This test invokes a binary in a simulator and then reads the first line
of stderr to parse the PID of the invoked binary.
This approach fails when the simulator itself prints a warning/error on
startup. In this case, we try to parse the error as the PID and fail.
This patch just removes the line limit. The limit doesn't seem to add
any value: we need to search until we find the PID line anyway, and if
there is no PID line we cannot do anything but time out eventually.
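The scan-until-found approach can be sketched as follows. The actual
test is not shown in this message, so the `PID: ` prefix and the
`findPid` helper are hypothetical; unlike the real test, which
eventually times out, this sketch simply returns an empty result when
the stream ends:
```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical scanner: skip any simulator warnings or errors and keep
// reading until a line carrying the PID shows up.
inline std::optional<long> findPid(std::istream &Err) {
  const std::string Prefix = "PID: "; // assumed line format
  std::string Line;
  while (std::getline(Err, Line)) {
    if (Line.compare(0, Prefix.size(), Prefix) != 0)
      continue; // not the PID line, e.g. a startup warning
    try {
      return std::stol(Line.substr(Prefix.size()));
    } catch (const std::logic_error &) {
      // Malformed number after the prefix: keep searching.
    }
  }
  return std::nullopt;
}

// Small usage example: a warning on the first line no longer breaks parsing.
inline void exampleFindPid() {
  std::istringstream In("warning: simulator noise\nPID: 4242\n");
  assert(findPid(In).value() == 4242);
}
```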
See also rdar://169799464