[LoongArch] Enable tail calls for sret and byval functions (#168506)
Allow tail calls for functions returning via sret when the caller's sret
pointer can be reused. Also support tail calls for byval arguments.
The previous restriction requiring exact match of caller and callee
arguments is relaxed: tail calls are allowed as long as the callee does
not use more stack space than the caller.
Fixes #168152
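A minimal sketch of the relaxed rule described above, not the backend's actual code. `CallInfo`, `mayTailCall`, and `calleeReusesCallerSRet` are hypothetical names introduced for illustration, and stack usage is reduced to a single byte count:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of how a function uses the stack for its arguments.
struct CallInfo {
  std::uint64_t stackArgBytes; // bytes of arguments passed on the stack
  bool hasSRet;                // result is returned through an sret pointer
};

// Simplified eligibility rule (an assumption, not the LoongArch lowering).
bool mayTailCall(const CallInfo &caller, const CallInfo &callee,
                 bool calleeReusesCallerSRet) {
  // An sret callee qualifies only if it writes into the caller's sret buffer.
  if (callee.hasSRet && !(caller.hasSRet && calleeReusesCallerSRet))
    return false;
  // The callee must fit in the stack argument area the caller already owns.
  return callee.stackArgBytes <= caller.stackArgBytes;
}

// Usage checks for the rule above.
void checkTailCallRule() {
  assert(mayTailCall({16, false}, {8, false}, false));  // fewer stack bytes
  assert(!mayTailCall({8, false}, {16, false}, false)); // callee needs more
  assert(mayTailCall({0, true}, {0, true}, true));      // sret pointer reused
  assert(!mayTailCall({0, true}, {0, true}, false));    // sret not reusable
}
```

The point of the sketch is that the argument lists no longer have to match exactly; only the stack footprint and the sret buffer matter.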
[RISCV] Add isCommutable=1 to some binary P extension instructions. (#175692)
This allows MachineCSE to commute these instructions if it would allow
CSE.
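To illustrate why the flag matters, here is a toy value-numbering sketch. It is an assumption-laden simplification: it canonicalizes operand order up front, which is a swapped-in technique and not how MachineCSE itself is implemented. `Inst`, `valueNumber`, and the opcode strings are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical two-operand instruction over virtual register numbers.
struct Inst {
  std::string opcode;
  int lhs;
  int rhs;
  bool commutable; // plays the role of isCommutable=1
};

// For each instruction, return the index of the earliest equivalent one
// (its own index if no earlier match exists).
std::vector<std::size_t> valueNumber(const std::vector<Inst> &insts) {
  std::map<std::tuple<std::string, int, int>, std::size_t> seen;
  std::vector<std::size_t> leader;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    int a = insts[i].lhs;
    int b = insts[i].rhs;
    // Only commutable instructions may have their operands reordered.
    if (insts[i].commutable && a > b)
      std::swap(a, b);
    auto it = seen.emplace(std::make_tuple(insts[i].opcode, a, b), i).first;
    leader.push_back(it->second);
  }
  return leader;
}

// Usage checks: the swapped commutable pair is merged, the other is not.
void checkValueNumber() {
  std::vector<Inst> insts = {{"padd", 1, 2, true},
                             {"padd", 2, 1, true},
                             {"psub", 1, 2, false},
                             {"psub", 2, 1, false}};
  std::vector<std::size_t> leader = valueNumber(insts);
  assert(leader[1] == 0); // same value as instruction 0
  assert(leader[3] == 3); // non-commutable: operand order matters
}
```

Without the flag, the second `padd` would look like a distinct value and survive CSE.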
Remove caching from sysdataset plugin
Retrieving the underlying dataset name for the sysdataset path
now takes only two syscalls (statx + statmount) instead of reading
the entire contents of /proc/self/mountinfo, so the extra caching
is actually hurting us now.
Rework system dataset migration to be less bad
This commit reworks how we migrate the system datasets so that
it's somewhat less racy and uses kernel APIs for this.
On migration:
1. build new mount tree in middleware run dir
2. sync data from old to new
3. move new under old
4. move old to middleware rundir
5. restart services
6. cleanup
[Metadata][profcheck] Handle identical MDNodes in getMergedProfMetadata
This fixes a bug where !prof metadata was dropped from SelectInsts when GVN simplified/merged them.
Guarded by -profcheck-disable-metadata-fixes. Exposed by the tests in
Transforms/SampleProfile.
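A minimal sketch of the idea, assuming profile metadata can be modeled as a vector of branch weights and that differing profiles are conservatively dropped. `BranchWeights` and `mergeProf` are hypothetical stand-ins, not the LLVM API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a !prof node; nullptr means "no metadata".
using BranchWeights = std::vector<std::uint64_t>;

// Merge the profile data of two instructions that are being combined.
const BranchWeights *mergeProf(const BranchWeights *a,
                               const BranchWeights *b) {
  if (a == nullptr || b == nullptr)
    return nullptr; // one side has no profile: nothing to keep
  if (a == b || *a == *b)
    return a;       // identical profiles: keep them instead of dropping
  return nullptr;   // differing profiles: dropped in this sketch
}

// Usage checks for the identical-node case.
void checkMergeProf() {
  BranchWeights w1 = {1, 3};
  BranchWeights w2 = {1, 3};
  BranchWeights w3 = {2, 2};
  assert(mergeProf(&w1, &w1) == &w1);
  assert(mergeProf(&w1, &w2) == &w1);
  assert(mergeProf(&w1, &w3) == nullptr);
}
```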
[libc++][istream] Removed `[[nodiscard]]` from `peek()` (#175591)
Calling `peek()` after constructing a stream is something one can use to
make the stream ignore empty inputs:
```
#include <cstdio>
#include <sstream>

int main() {
  std::istringstream s;
  s.peek(); // on an empty stream this sets eofbit, so the loop is skipped
  while (s && !s.eof()) {
    char c;
    s >> c;
    std::printf("not eof; read '%c' (%d)\n", c, c);
  }
}
```
[2 lines not shown]
[InlineSpiller][AMDGPU] Implement subreg reload during RA spill
Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
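A rough sketch of the idea, assuming a tuple made of 32-bit lanes spilled contiguously and a bitmask of the lanes a use actually reads. `ReloadRange` and `partialReload` are hypothetical, and the sketch reloads one contiguous range, so it may still cover unused lanes that sit between used ones:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of what to reload from the spill slot.
struct ReloadRange {
  unsigned firstLane;  // first 32-bit lane to reload
  unsigned numLanes;   // number of lanes to reload; 0 means nothing
  unsigned byteOffset; // offset of firstLane within the spill slot
};

// usedLanes: bit i set means lane i of the tuple is read by the use.
ReloadRange partialReload(std::uint32_t usedLanes, unsigned tupleLanes) {
  assert(tupleLanes >= 1 && tupleLanes <= 32);
  // Ignore bits beyond the width of the tuple.
  if (tupleLanes < 32)
    usedLanes &= (std::uint32_t{1} << tupleLanes) - 1;
  if (usedLanes == 0)
    return ReloadRange{0, 0, 0};
  unsigned first = 0;
  while (((usedLanes >> first) & 1u) == 0)
    ++first;
  unsigned last = 31;
  while (((usedLanes >> last) & 1u) == 0)
    --last;
  // Each lane is 4 bytes wide in this model.
  return ReloadRange{first, last - first + 1, first * 4};
}

// Usage check: lanes 2 and 3 of an 8-lane tuple are used.
void checkPartialReload() {
  ReloadRange r = partialReload(0x0C, 8);
  assert(r.firstLane == 2 && r.numLanes == 2 && r.byteOffset == 8);
}
```

In this model a full reload would restore all 8 lanes, while the partial reload restores 2, which is where the register pressure savings would come from.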
[AMDGPU] Put back ProperlyAlignedRC helper functions
Put back the functions that were recently deleted
because they were found to be unused. They are needed for
implementing subreg reload during RA.
[AMDGPU] Test precommit for subreg reload
This test currently fails due to insufficient
registers during allocation. Once subreg reload
is implemented, it will begin to pass because
partial reloads help mitigate register
pressure.
[CodeGen] Enhance createFrom for sub-reg aware cloning
Instead of just cloning the virtual register, this
function now creates a new virtual register derived
from a subregister class of the original value.
[AMDGPU] Make AMDGPURewriteAGPRCopyMFMA aware of subreg reload
The AMDGPURewriteAGPRCopyMFMA pass is currently not subreg-aware.
In particular, the logic that optimizes spills into COPY
instructions assumes full register reloads. This becomes
problematic when the reload instruction partially restores
a tuple register. This patch introduces the necessary changes
to make this pass subreg-aware, for a future patch that
implements subreg reload during RA.
[llvm][utils] Make git-llvm-push set the skip-precommit-approval label (#174833)
The skip-precommit-approval label is intended for simple PRs that don't
require approval. To reduce the volume of notifications, label all PRs
created using the git-llvm-push script with the skip-precommit-approval
label.
Fixes #174825
[AMDGPU] Introduce Offset field in SGPR spill Pseudos
Currently, SGPR spill pseudo-instructions lack
an offset field to represent non-zero stack offsets.
This patch introduces an additional offset field to
SGPR spill pseudo-instructions and updates all
relevant passes that handle spill lowering to support
this new field. This field is essential for a future
patch that implements subreg reload of tuple registers
from their stack location during RA.