[LSR] Preserve LCSSA in SCEVRewriter
This is necessary to fix some regressions when switching to the NewPM
and seems to improve optimization quality in some cases due to LSR
currently not understanding loop nests (usage of getSCEV vs
getSCEVScoped). This patch just enables LCSSA
preservation for SCEVRewriter and updates all the relevant tests. There
are some further fixes needed to get this fully working; they
will be included in follow-up patches. This patch also only changes
behavior in the NewPM path to get that unblocked while more work is done
on ensuring LCSSA preservation/requirements do not regress LSR.
Similar to #185373 (although without the follow-up fixes, and with a
regression test).
The regression test added for the specific NewPM case is in
Transforms/LoopStrengthReduce/X86/lcssa-preservation-regression.ll
(does not reproduce without the target triple).
[2 lines not shown]
Fix SectionList::ReplaceSection to not replace incorrect section. (#204677)
We use SectionList::ReplaceSection to check for some sections in the
main object file and in separate debug info files. It was relying on
section IDs being consistent between different individual section lists
in different object files, which does not hold. I fixed this by
identifying the section to replace by its shared pointer instead of its
section ID, so a section from another list can no longer be matched by
accident.
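
The difference between the two lookups can be sketched as follows. This is a minimal illustration, not LLDB's code: `Section`, `ReplaceById`, and `ReplaceByPointer` are hypothetical stand-ins, and the only assumption taken from the message above is that section IDs are unique within one object file but not across files.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a section; not LLDB's real class.
struct Section {
  std::uint64_t id;  // unique only within the object file that owns it
  std::string name;
};
using SectionSP = std::shared_ptr<Section>;

// Lookup by ID: can match an unrelated section when `id` was taken from a
// section that belongs to a different object file.
inline bool ReplaceById(std::vector<SectionSP> &list, std::uint64_t id,
                        const SectionSP &replacement) {
  assert(replacement && "replacement must not be null");
  for (SectionSP &s : list) {
    if (s->id == id) {
      s = replacement;
      return true;
    }
  }
  return false;
}

// Lookup by identity: only the exact section object is replaced.
inline bool ReplaceByPointer(std::vector<SectionSP> &list,
                             const SectionSP &old_sp,
                             const SectionSP &replacement) {
  assert(replacement && "replacement must not be null");
  for (SectionSP &s : list) {
    if (s == old_sp) {
      s = replacement;
      return true;
    }
  }
  return false;
}
```

With a list holding `.text` (ID 1) and a section from a debug info file that also happens to have ID 1, `ReplaceById` overwrites `.text`, while `ReplaceByPointer` leaves the list untouched and returns false.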
[ProfileData] Avoid unnecessary copies. (#204875)
Make `Frame` moveable and avoid some unnecessary copies in `RawMemProfReader`. Unnecessary copies fixed in this PR were found by the CSan prototype described in the RFC [1] CopySanitizer (CSan): Detecting unneccessary object copies at runtime.
[1] https://discourse.llvm.org/t/rfc-copysanitizer-csan-detecting-unneccessary-object-copies-at-runtime/91038
Co-authored-by: Jan Newger <jannewger at google.com>
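
As a rough illustration of what making a type moveable buys, consider the sketch below. The `Frame` here is a hypothetical stand-in with a single string member, not the real memprof `Frame`, and `buildCallStack` is an invented helper; the copy counter exists only to make the effect observable.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in; the real memprof Frame has different members.
struct Frame {
  std::string SymbolName;

  // Counts copy operations so the effect of moving is observable.
  static int &copies() {
    static int Count = 0;
    return Count;
  }

  explicit Frame(std::string Name) : SymbolName(std::move(Name)) {}
  Frame(const Frame &Other) : SymbolName(Other.SymbolName) { ++copies(); }
  Frame &operator=(const Frame &Other) {
    SymbolName = Other.SymbolName;
    ++copies();
    return *this;
  }
  // User-declared copy operations suppress the implicit move operations,
  // so they are requested explicitly here.
  Frame(Frame &&) noexcept = default;
  Frame &operator=(Frame &&) noexcept = default;
};

// Builds a call stack without copying any Frame or any name string.
inline std::vector<Frame> buildCallStack(std::vector<std::string> Names) {
  std::vector<Frame> Stack;
  Stack.reserve(Names.size());
  for (std::string &Name : Names) {
    Frame F(std::move(Name));
    Stack.push_back(std::move(F));  // push_back(F) would copy instead
  }
  return Stack;
}
```

Without the two defaulted move operations, `std::move(F)` would silently fall back to the copy constructor, which is the kind of hidden copy a runtime tool can surface.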
[lldb][tests] Fix FS timing issue in `TestRerunAndExprDylib`. (#205116)
This PR fixes a timing issue that made `TestRerunAndExprDylib` fail with
a small probability. The test rebuilds a library; however, the build and
the re-build may fall into the same timestamp if the underlying
filesystem only has second granularity such that LLDB doesn't reload the
rebuilt library for the second execution.
The fix consists in artificially aging the library file from the first
build, i.e., setting its timestamp 10 seconds into the past. This not
only guarantees that LLDB reloads the file but also that it is
rebuilt, so the explicit removal is now unnecessary and has been dropped.
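
The aging step can be sketched as below. The test itself lives in LLDB's Python test suite, so this C++17 version only illustrates the idea; `ageFile` is a hypothetical helper name, and the 10-second default mirrors the value mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>

namespace fs = std::filesystem;

// Shifts a file's modification time into the past by `Age`, so that a
// later rebuild gets a strictly newer timestamp even on filesystems that
// only store whole seconds.
inline void ageFile(const fs::path &File,
                    std::chrono::seconds Age = std::chrono::seconds(10)) {
  assert(fs::exists(File) && "file to age must exist");
  const fs::file_time_type Old = fs::last_write_time(File);
  fs::last_write_time(File, Old - Age);
}
```

Any shift larger than the filesystem's timestamp granularity would do; a generous margin simply keeps the test robust.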
This issue has existed for at least six months, possibly since the test
was added; I was not able to test older versions. However, we have recently
seen frequent failures, probably due to some change in our underlying
testing infrastructure.
Signed-off-by: Ingo Müller <ingomueller at google.com>
[SimplifyCFG] Allow hoisting in the presence of pseudoprobes (#199753)
Fix regressions in the presence of pseudoprobes that prevent
SimplifyCFG from hoisting instructions into the predecessor. Teach
`hoistCommonCodeFromSuccessors` and `foldBranchToCommonDest` to ignore
pseudo probes and drop them when the BB is eliminated.
The minor loss of profile quality in these cases is justified, as not
performing these hoists degrades performance more and blocks downstream
passes like loop-vectorize (the degradation can be up to 30% in
526.blender_r and 525.x264_r).
[AMDGPU] Update packed FP32 intrinsic cost model (#205145)
Intrinsics will not have packed vector benefit if they don't have
the corresponding packed instructions.
AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo (#204315)
Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
[PGO][HIP] HSA-introspection device profile drain + GPU PGO tests (#203056)
## Summary
Follow-up to #202095 (now landed). #202095's host-shadow device-profile
drain can only collect device counters for kernels that registered a
host-side shadow via `__hipRegisterVar`. Device-linked programs (e.g.
RCCL), whose instrumented code objects are linked directly into the
device image with no host shadow, are never drained.
This adds a **supplemental, Linux-only HSA-introspection drain** that
runs after the host-shadow drain: it walks each GPU agent, enumerates
only the code objects actually resident there, reads each one's
`__llvm_profile_sections`
[83 lines not shown]
[flang][OpenMP] Check that IF clause applies to at most one leaf
This also allows placing the IF clause in the "allowedClauses" set for
all directives, instead of having it in "allowedOnceClauses" for some
directives and in "allowedClauses" for others.
The emitted diagnostic will show which constituent has multiple IF
clauses applying to it:
```
if.f90:4:35: error: At most one IF clause can apply to each directive constituent
!$omp & if(target teams: x > 0) if(teams distribute: y > 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
if.f90:4:11: Previous IF clause applying to the TEAMS constituent
!$omp & if(target teams: x > 0) if(teams distribute: y > 0)
^^^^^^^^^^^^^^^^^^^^^^^
```
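
The rule behind this diagnostic can be sketched as a small standalone check. This is not flang's implementation: `findDuplicateIf` and the representation of each IF clause as a list of constituent names are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Constituents that one IF clause applies to (hypothetical representation).
using Constituents = std::vector<std::string>;

// Returns the first constituent that two different IF clauses apply to,
// or std::nullopt if every constituent is covered by at most one clause.
inline std::optional<std::string>
findDuplicateIf(const std::vector<Constituents> &Clauses) {
  std::map<std::string, std::size_t> FirstSeen;  // constituent -> clause index
  for (std::size_t I = 0; I < Clauses.size(); ++I) {
    for (const std::string &Leaf : Clauses[I]) {
      auto [It, Inserted] = FirstSeen.emplace(Leaf, I);
      if (!Inserted && It->second != I)
        return Leaf;
    }
  }
  return std::nullopt;
}
```

For the example above, the clauses cover {target, teams} and {teams, distribute}, so the check reports `teams`, matching the TEAMS constituent named in the diagnostic.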