[LifetimeSafety] Add support for tracking non-trivially destructed temporary objects (#172007)
Add support for tracking loans to temporary materializations that
require non-trivial destructors. We only support non-trivially
destructed temporaries as they have a nice end-of-life marker via the
`CFGTemporaryDtor`.
This small PR introduces the following changes:
1. AccessPaths can now also represent `MaterializeTemporaryExpr *` via
`llvm::PointerUnion`
3. `FactsGenerator::VisitMaterializeTemporaryExpr` now checks whether
the temporary materialization requires a non-trivial destructor (by
looking for a child `CXXBindTemporaryExpr` node once all implicit casts
are stripped away). If so, it creates a Loan whose AccessPath is a
pointer to that `MaterializeTemporaryExpr` and issues it to the origin
represented by the `MaterializeTemporaryExpr` node being visited. When
no child `CXXBindTemporaryExpr` is found, it falls back to an
`OriginFlow` as before.
4. `FactsGenerator::handleTemporaryDtor` is called from
[11 lines not shown]
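To illustrate why a non-trivially destructed temporary has a well-defined end-of-life point, here is a minimal sketch in plain C++. The `Tracked` type and `danglingScenario` function are hypothetical names made up for this example and are not part of the patch; whether the analysis actually diagnoses this exact shape depends on details not shown in this message. The pointer is deliberately never dereferenced, so the sketch stays well-defined.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type for illustration only. The user-provided destructor makes
// it non-trivially destructible, so the end of a temporary's lifetime is an
// observable event.
struct Tracked {
  std::vector<std::string> *Log;
  explicit Tracked(std::vector<std::string> *L) : Log(L) {
    assert(L != nullptr);
    Log->push_back("ctor");
  }
  ~Tracked() { Log->push_back("dtor"); }
  // Hands out a pointer to the object itself.
  const Tracked *self() const { return this; }
};

// Records the order of events when a pointer outlives the temporary it
// refers to.
std::vector<std::string> danglingScenario() {
  std::vector<std::string> Log;
  // The Tracked temporary is destroyed at the end of this full-expression,
  // so P dangles from the next statement on. P is never dereferenced.
  const Tracked *P = Tracked(&Log).self();
  Log.push_back("after full-expression");
  (void)P;
  return Log;
}
```

The destructor call sits between the statement that creates the pointer and the next statement, which is the kind of point a destructor marker in the CFG can represent.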
[flang-rt] Disable testing support for the GPU cross build (#175583)
Summary:
We're starting to provide the GPU version of the Fortran runtime with
the GPU cross-build semantics. This does not support tests right now, but
it still attempts to build the unit tests and fails to find gtest for the
GPU. Disable testing for now so the runtime can build.
[SampleProf] test that calls to function aliases get profile info (#169355)
When a call is made to a function alias, in
SampleProfileLoader::generateMDProfMetadata we look up the actual call
target name in the profile to resolve the alias, in the same way as we
do for indirect calls. Add a test for this so we don't lose profile info
on these calls some day.
[AArch64] Add new pass after VirtRegRewriter to add implicit-defs
When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds extra
implicit-defs to instructions that define the low N bits of a GPR/FPR
register to represent that the top bits are written, because all AArch64
instructions that write the low bits of a GPR/FPR also implicitly zero the
top bits.
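The zeroing behaviour described above can be modelled in a few lines of C++. This is only an illustrative sketch of the semantics for a 64-bit register with a 32-bit low half; `GPR64`, `writeLow32`, `writeLow32Merging`, and `checkWriteSemantics` are hypothetical names and have nothing to do with the pass's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit register whose low 32 bits form a
// sub-register (for example X0 and W0).
struct GPR64 {
  std::uint64_t Bits = 0;
};

// Models the behaviour described in the text: writing the low 32 bits also
// zeroes the top 32 bits, so the whole 64-bit register is defined.
inline void writeLow32(GPR64 &Reg, std::uint32_t Value) {
  Reg.Bits = static_cast<std::uint64_t>(Value);
}

// For contrast only: a merging write that would preserve the top bits.
inline void writeLow32Merging(GPR64 &Reg, std::uint32_t Value) {
  Reg.Bits = (Reg.Bits & 0xFFFFFFFF00000000ULL) | Value;
}

// Small self-check contrasting the two models.
inline void checkWriteSemantics() {
  GPR64 Zeroing;
  Zeroing.Bits = 0xDEADBEEFCAFEF00DULL;
  writeLow32(Zeroing, 0x12345678u);
  assert(Zeroing.Bits == 0x0000000012345678ULL);

  GPR64 Merging;
  Merging.Bits = 0xDEADBEEFCAFEF00DULL;
  writeLow32Merging(Merging, 0x12345678u);
  assert(Merging.Bits == 0xDEADBEEF12345678ULL);
}
```

Under the zeroing model the old top bits never survive the write, which is the fact an extra implicit-def of the full register conveys.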
These semantics are originally represented in the MIR using `SUBREG_TO_REG`,
but during register coalescing this information is lost and when rewriting
virtual -> physical registers the implicit-defs are not added to represent
that the top bits are written.
There have been several attempts to fix this in the coalescer (#168353),
but each iteration has exposed new bugs and the patch had to be reverted.
Additionally, the concept of adding 'implicit-def' of a virtual register
during the register allocation process is particularly fragile and many
places don't expect it (for example in `X86::commuteInstructionImpl` the
code only looks at specific operands and does not consider implicit-defs.
[9 lines not shown]
[flang-rt] Fix unused flag warning when compiling for the GPU (#175643)
Summary:
Because we link the `cxx` target directly, we do not need to use this
flag. That is also why we pass `-nostdinc++`, which makes this an unused
command line flag, hence the warning.
[AArch64] Let LoadStoreOptimizer handle renamable implicit-defs.
The LoadStoreOptimizer is very conservative when handling instructions
that have implicit-def operands, and only supports them for 2 instructions.
However, they can also be considered when explicitly marked as 'renamable'.
[AMDGPU][Scheduler] Fix compile failure due to const/sort interaction (#175755)
On some configurations, sorting `ScoredRemat` objects, which contain
const members, causes a compile failure because the objects cannot be
swapped/moved. The problem was introduced in #175050.
This removes const from those fields to address the issue. The design
will soon change anyway to not rely on sorting objects of this type, and
consts were only here for semantic clarity.
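A minimal sketch of this class of failure, using hypothetical stand-in types (`ScoredConst`, `ScoredMutable`) rather than the real `ScoredRemat` definition: a const data member deletes the implicit assignment operators, so the type is not move-assignable, which `std::sort` requires of its elements.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in with a const member: the implicit copy and move
// assignment operators are deleted, so elements cannot be reordered in place.
struct ScoredConst {
  const int Score;
};

// Same shape without const: assignment works, so sorting is possible.
struct ScoredMutable {
  int Score;
  bool operator<(const ScoredMutable &Other) const {
    return Score < Other.Score;
  }
};

static_assert(!std::is_move_assignable<ScoredConst>::value,
              "a const member deletes the implicit assignment operators");
static_assert(std::is_move_assignable<ScoredMutable>::value,
              "without const members the type can be move-assigned");

// Sorts the non-const variant and returns the scores in ascending order.
inline std::vector<int> sortedScores(std::vector<ScoredMutable> Items) {
  std::sort(Items.begin(), Items.end());
  std::vector<int> Scores;
  for (const ScoredMutable &Item : Items)
    Scores.push_back(Item.Score);
  assert(std::is_sorted(Scores.begin(), Scores.end()));
  return Scores;
}
```

Calling `std::sort` on a `std::vector<ScoredConst>` would be rejected at compile time for the reason the first `static_assert` documents.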
[MC/DC] Prune MCDCLogOpStack and use CGF.isMCDCDecisionExpr. NFC (#125410)
`MCDCLogOpStack` is used only for detection of the Decision root. It can
be detected with `MCDC::State::DecisionByStmt`.
[AMDGPU] Have VCC as a first-class member of the SGPR pool.
Add VCC and tuples using VCC to SGPR register classes.
We already support VCC as an allocatable register for 32-bit SGPR
operands, so it seems most natural to support it for register
tuple operands as well.
s106/s107 are still not allowed as aliases of vcc_lo/hi in
AsmParser.
The names given to the VCC tuples match those produced by SP3,
though it feels like there is room for improvement.
https://github.com/llvm/llvm-project/issues/62651
[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit
Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.
Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.
This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
where we spill SGPRs to VGPRs, as long as this does not end up in
excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
includes the SGPRs that would be spilled to VGPR).
With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
[4 lines not shown]
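The relaxed success criterion can be sketched as follows. This is a simplified model, not the actual `GCNRPTarget` code: the `Pressure` and `Limits` structs and all function names are hypothetical, and it assumes each excess SGPR is counted as one additional VGPR, which may differ from the real accounting.

```cpp
#include <cassert>

// Hypothetical register-pressure snapshot for a scheduling region.
struct Pressure {
  unsigned NumSGPRs;
  unsigned NumVGPRs;
};

// Hypothetical per-class register limits.
struct Limits {
  unsigned MaxSGPRs;
  unsigned MaxVGPRs;
};

// SGPRs above the limit; in this model they would be spilled to VGPRs.
inline unsigned excessSGPRs(const Pressure &P, const Limits &L) {
  return P.NumSGPRs > L.MaxSGPRs ? P.NumSGPRs - L.MaxSGPRs : 0;
}

// VGPR excess once spilled SGPRs are included.
// Assumption: one spilled SGPR is counted as one extra VGPR.
inline unsigned excessVGPRsWithSpills(const Pressure &P, const Limits &L) {
  unsigned Total = P.NumVGPRs + excessSGPRs(P, L);
  return Total > L.MaxVGPRs ? Total - L.MaxVGPRs : 0;
}

// Relaxed criterion: SGPR spills are acceptable as long as they do not push
// the region over the VGPR limit.
inline bool targetSatisfied(const Pressure &P, const Limits &L) {
  return excessVGPRsWithSpills(P, L) == 0;
}

// Small self-check of the two cases described in the text.
inline void checkRelaxedCriterion() {
  const Limits L = {100, 64};
  // 4 excess SGPRs fit into the 4 free VGPRs: accepted.
  assert(targetSatisfied(Pressure{104, 60}, L));
  // 4 excess SGPRs with only 2 free VGPRs: 2 excess VGPRs, rejected.
  assert(!targetSatisfied(Pressure{104, 62}, L));
}
```

In the rejected case, the remaining VGPR excess is what would make rematerializing VGPR candidates worthwhile under the second change.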