[lldb][test] TestAbiTagStructors.py: XFAIL on Windows
This is failing on the lldb-aarch64-windows bots (see error below).
XFAIL for now because it's unlikely that these expression evaluator
calls were supported before they were added.
```
======================================================================
FAIL: test_nested_no_structor_linkage_names_dwarf (TestAbiTagStructors.AbiTagStructorsTestCase.test_nested_no_structor_linkage_names_dwarf)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 1828, in test_method
return attrvalue(self)
```
[109 lines not shown]
[X86] Allow AVX512 rotate intrinsics to be used in constexpr (#157652)
Now that they wrap the __builtin_elementwise_fshl/fshr builtin intrinsics this is pretty trivial.
Another step towards #153152 - just VBMI2 double shifts remaining
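As a rough illustration of why a rotate becomes easy to constant-evaluate once it is expressed as a funnel shift, here is a minimal scalar sketch. The names `fshl32`, `fshr32`, `rotl32` and `rotr32` are hypothetical helpers written for this note; they are not the Clang builtins or the AVX512 intrinsics themselves, and they model a single 32-bit lane only.

```cpp
#include <cstdint>

// Funnel shift left: treat hi:lo as one 64-bit value, shift it left by
// (amt mod 32) and keep the upper 32 bits. Hypothetical scalar model.
constexpr std::uint32_t fshl32(std::uint32_t hi, std::uint32_t lo,
                               std::uint32_t amt) {
  return (amt % 32u) == 0u
             ? hi
             : (hi << (amt % 32u)) | (lo >> (32u - (amt % 32u)));
}

// Funnel shift right: same concatenation, shifted right, keeping the
// lower 32 bits.
constexpr std::uint32_t fshr32(std::uint32_t hi, std::uint32_t lo,
                               std::uint32_t amt) {
  return (amt % 32u) == 0u
             ? lo
             : (hi << (32u - (amt % 32u))) | (lo >> (amt % 32u));
}

// A rotate is a funnel shift with both inputs equal.
constexpr std::uint32_t rotl32(std::uint32_t x, std::uint32_t amt) {
  return fshl32(x, x, amt);
}
constexpr std::uint32_t rotr32(std::uint32_t x, std::uint32_t amt) {
  return fshr32(x, x, amt);
}

// Evaluated entirely at compile time.
static_assert(rotl32(0x80000001u, 1u) == 0x00000003u, "rotl by 1");
static_assert(rotr32(0x00000003u, 1u) == 0x80000001u, "rotr by 1");
```

The zero-shift case is handled separately so that the sketch never shifts by the full lane width, which would be undefined behaviour in C++.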
[DAGCombiner] Relax condition for extract_vector_elt combine
Checking `isOperationLegalOrCustom` instead of `isOperationLegal`
allows more optimization opportunities. In particular, if a target
wants to mark `extract_vector_elt` as `Custom` rather than `Legal`
in order to optimize certain cases, this combine would otherwise
be skipped and miss some improvements.
Previously, using `isOperationLegalOrCustom` was avoided due to
the risk of getting stuck in infinite loops (as noted in
https://github.com/llvm/llvm-project/commit/61ec738b60a4fb47ec9b7195de55f1ecb5cbdb45).
After testing, the issue no longer reproduces, but the coverage
is limited to the regression/unit tests and the test-suite.
Would it make sense to relax this condition to enable more
optimizations? And what would be the best way to ensure that
doing so does not reintroduce infinite loop regressions?
Any suggestions would be appreciated.
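To make the difference between the two checks concrete, here is a simplified model. The `Action` enum and the `mayCombine*` helpers are hypothetical stand-ins written for this note; the real `isOperationLegal` and `isOperationLegalOrCustom` queries also take the opcode and value type into account, which this sketch leaves out.

```cpp
// Simplified stand-in for a target's legalization action for one
// operation (hypothetical; not the LLVM enum itself).
enum class Action { Legal, Promote, Expand, LibCall, Custom };

// Models the stricter check: only natively legal operations pass.
constexpr bool isLegal(Action A) { return A == Action::Legal; }

// Models the relaxed check: custom-lowered operations pass as well.
constexpr bool isLegalOrCustom(Action A) {
  return A == Action::Legal || A == Action::Custom;
}

// Hypothetical gates for the extract_vector_elt combine before and
// after the proposed change.
constexpr bool mayCombineStrict(Action ExtractElt) {
  return isLegal(ExtractElt);
}
constexpr bool mayCombineRelaxed(Action ExtractElt) {
  return isLegalOrCustom(ExtractElt);
}

// A target that marks the operation Custom is skipped by the strict
// gate but accepted by the relaxed one.
static_assert(!mayCombineStrict(Action::Custom), "strict gate rejects Custom");
static_assert(mayCombineRelaxed(Action::Custom), "relaxed gate accepts Custom");
static_assert(!mayCombineRelaxed(Action::Expand), "Expand is still rejected");
```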
[LoopUtils] Simplify expanded RT-checks (#157518)
Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions
after expansion.) to hoist the SCEVExpander::eraseDeadInstructions call
from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so
that other callers (LoopDistribute and LoopVersioning) can benefit from
the patch.
[LLD][COFF] Make `/summary` work when `/debug` isn't provided (#157476)
Previously, `/summary` was meant to print some PDB information. Now move
handling of `/summary` to `Writer.cpp` so that it can have an effect
when `/debug` isn't provided. This will also provide grounds for
extending with more general information.
[analyzer] In LivenessValues::equals also check liveBindings (#157645)
This was likely omitted by accident when `liveBindings` was introduced.
I don't think it matters in practice.
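A minimal sketch of the kind of omission described, using a hypothetical `LivenessSketch` struct in which each live set is reduced to a bitmask. It assumes the earlier comparison covered only the expression and declaration sets; the real `LivenessValues` class uses immutable set types, not integers.

```cpp
#include <cstdint>

// Hypothetical stand-in for LivenessValues: each live set is modelled
// as a 64-bit mask purely for illustration.
struct LivenessSketch {
  std::uint64_t liveExprs;
  std::uint64_t liveDecls;
  std::uint64_t liveBindings;
};

// Assumed shape of the comparison before the fix: liveBindings is
// not compared.
constexpr bool equalsBefore(const LivenessSketch &A, const LivenessSketch &B) {
  return A.liveExprs == B.liveExprs && A.liveDecls == B.liveDecls;
}

// Comparison after the fix: all three sets take part.
constexpr bool equalsAfter(const LivenessSketch &A, const LivenessSketch &B) {
  return A.liveExprs == B.liveExprs && A.liveDecls == B.liveDecls &&
         A.liveBindings == B.liveBindings;
}

// Two states that differ only in liveBindings compare equal under the
// old check and unequal under the new one.
static_assert(equalsBefore(LivenessSketch{1, 2, 3}, LivenessSketch{1, 2, 4}),
              "old check ignores liveBindings");
static_assert(!equalsAfter(LivenessSketch{1, 2, 3}, LivenessSketch{1, 2, 4}),
              "new check sees the difference");
```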
[X86] Allow XOP rotate intrinsics to be used in constexpr (#157643)
Now that they wrap the __builtin_elementwise_fshl/fshr builtin intrinsics this is pretty trivial.
Another step towards #153152 - I'll handle the AVX512 rotates next
[AMDGPU][gfx1250] Support "cluster" syncscope
Defaults to "agent" for targets that do not support it.
- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
[AArch64][SME] Support agnostic ZA functions in the MachineSMEABIPass (#149064)
This extends the MachineSMEABIPass to handle agnostic ZA functions. This
case is currently handled like shared ZA functions, but we don't require
ZA state to be reloaded before agnostic ZA calls.
Note: This patch does not yet fully handle agnostic ZA functions that
can catch exceptions. E.g.:
```
__arm_agnostic("sme_za_state") void try_catch_agnostic_za_callee()
{
try {
agnostic_za_call();
} catch(...) {
noexcept_agnostic_za_call();
}
}
```
[3 lines not shown]
[flang][OpenMP] Support multi-block reduction combiner regions on the GPU
Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The IP at the end of the inlined region was
not used, which resulted in BBs with multiple terminators being emitted.
[flang][OpenMP] `do concurrent`: support `reduce` on device
Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wloop` part.
[flang][OpenMP] `do concurrent`: support `local` on device
Extends support for mapping `do concurrent` on the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and use the mapped value as the
`private` clause operand in the nested `omp.parallel` op.