[LoongArch] Enable tail calls for sret functions and relax argument matching
Allow tail-calling functions that return via sret when the caller has an
incoming sret pointer that can be forwarded.
Remove the overly strict requirement that tail-call argument values must
exactly match the caller's incoming arguments. The real constraint is only
that the callee uses no more argument stack space than the caller.
This fixes musttail codegen and enables significantly more tail-call
optimizations.
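
As a rough illustration of the source pattern this affects (a sketch, not taken from the patch; `struct Big`, `make_big`, and `forward` are hypothetical names): a struct too large for registers is typically returned through a hidden sret pointer, so a caller that itself returns such a struct has an incoming pointer it could forward to a call in tail position.

```c
/* Hypothetical example, not from the patch. */
struct Big {
  long v[8]; /* large enough that typical ABIs return it via a hidden sret pointer */
};

/* Callee: returns Big through memory. */
struct Big make_big(long seed) {
  struct Big b;
  for (int i = 0; i < 8; i++)
    b.v[i] = seed + i;
  return b;
}

/* Caller: also returns Big, so it receives an sret pointer of its own.
 * The call below is in tail position, passes an argument that does not
 * match the caller's incoming argument (seed + 1 vs. seed), and needs no
 * more argument stack space than the caller. */
struct Big forward(long seed) {
  return make_big(seed + 1);
}
```

Whether the call in `forward` is actually emitted as a tail call depends on the target, ABI, and optimization level; the sketch only shows the shape of code the message describes.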
[clang][bytecode] Fix fallthrough to switch labels (#168484)
We need to fall through here in case we're not jumping to the labels.
This is only needed in expression contexts.
[CIR] X86 vector fcmp-sse vector builtins (#167125)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/163895.
This adds the fcmp-sse part of the X86 vector builtins for CIR.
---------
Co-authored-by: liuzhenya <zyliu at siorigin.com>
[RISCV] Reduce minimum VL needed for vslidedown.vx in RISCVVLOptimizer (#168392)
Once #149042 is relanded, we will start EVL tail folding vectorized
loops that have live-outs, e.g.:
```c
int f(int *x, int n) {
  int y = 0; // declared outside the loop so the live-out is in scope at the return
  for (int i = 0; i < n; i++) {
    y = x[i] + 1;
    x[y] = y;
  }
  return y;
}
```
These are vectorized by extracting the last "active lane" in the loop's
exit:
```llvm
[41 lines not shown]
```
[RISCV] Remove unused argument check (NFC) (#168313)
The index == 0 scenario has already been handled by the early return, so
only the upper half scenario is relevant here.
[clang-doc] Fix whitespace issues in Mustache basic test
I found that the issues we've been seeing in the HTML
whitespace/alignment are due to partials inserting their own whitespace
and to partials being called on indented lines or on lines that already
contain text.
This patch gets rid of unnecessary whitespace in the comment and
function partials so that they are properly indented when inserted.
[mlir][NVVM] Add no-rollback option to NVVM lowering passes (#168477)
Add pass options to run lowerings to NVVM without pattern rollback. This
makes the dialect conversions easier to debug and improves
performance/memory usage.
[clang][NVPTX] Fix SM requirement of f32-tf32 rna satfinite conversion (#167836)
This change fixes the SM requirement of the f32 to tf32 conversion with
`rna` rounding mode and `.satfinite` modifier. The current requirement
specified is `sm_89` but this conversion is supported from `sm_80`
onwards after it was added in PTX 8.1.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
InstCombine: Stop transforming EQ/NE of SHR to 0 to ULT/UGT if >1 use
This is a small code size optimization that lets us avoid both shifting
and comparing to a constant if we need the shifted value anyway. On most
architectures the zero comparison is cheaper than a constant comparison
(or free if the shift sets flags).
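
The equivalence behind the transform can be sketched in C (illustrative only; the function names are hypothetical, and this assumes an unsigned value with a shift amount smaller than the bit width). `(x >> k) == 0` holds exactly when `x < (1u << k)`, so when the shifted value is needed anyway, the first form can reuse it and compare against zero.

```c
#include <stdint.h>

/* Form kept when the shift has more than one use:
 * compare the already-computed shifted value against zero. */
static int is_zero_after_shift(uint32_t x, unsigned k) {
  return (x >> k) == 0;
}

/* Equivalent form: unsigned compare of the unshifted value
 * against a constant (valid for k in 0..31). */
static int below_threshold(uint32_t x, unsigned k) {
  return x < (UINT32_C(1) << k);
}

/* Multi-use case: the shifted value is returned as well as tested,
 * so the shift has to be computed regardless. */
static uint32_t shift_and_test(uint32_t x, unsigned k, int *is_zero) {
  uint32_t s = x >> k;
  *is_zero = (s == 0);
  return s;
}
```

In `shift_and_test`, rewriting the test as a compare against `1u << k` would add a constant comparison without removing the shift.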
Although this change appears to remove the optimization entirely, we
continue to do this transform if there is one use because of the code
below the removed code that transforms the shift into an and, followed
by the PR10267 case in InstCombinerImpl::foldICmpAndConstConst that
transforms the and into a ult/ugt. Added a test case to verify this
explicitly.
Per [1], this reduces clang .text size by 0.09% and dynamic instruction
count by 0.01%.
[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text
[5 lines not shown]
[ORC] Make tests work with Internal Shell (#168471)
This patch makes objc-imageinfo.S work with the internal shell. The test
uses a subshell to temporarily change the directory. The internal shell
does not support subshells, so this construct was replaced with a
pushd/popd sequence.
[ORC] Support scanning "fallback" slices for interfaces. (#168472)
When scanning an interface source (dylib or TBD file), consider
"fallback" architectures (CPUType / CPUSubType pairs) in addition to the
process's CPUType / CPUSubType.
Background:
When dyld loads a dylib into a process it may load a dylib or slice whose
CPU type / subtype isn't an exact match for the process's CPU type /
subtype. E.g. arm64 processes can load arm64e dylibs / slices.
When building an interface we need to follow the same logic, otherwise
we risk generating a spurious "does not contain a compatible slice"
error. E.g. if we're running an arm64 JIT'd program and loading an
interface from a TBD file, and if no arm64 slice is present in that
file, then we should fall back to looking for an arm64e slice.
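
The fallback lookup described above can be sketched as follows. This is a minimal illustration, not the ORC implementation: it uses architecture name strings instead of CPUType / CPUSubType pairs, and `pick_slice` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first candidate architecture that is
 * present among the available slices, or NULL if none is compatible.
 * Candidates are ordered by preference: the process's own architecture
 * first, then any fallback architectures. */
static const char *pick_slice(const char *const *candidates, size_t ncand,
                              const char *const *slices, size_t nslices) {
  for (size_t i = 0; i < ncand; i++)
    for (size_t j = 0; j < nslices; j++)
      if (strcmp(candidates[i], slices[j]) == 0)
        return slices[j];
  return NULL;
}
```

For an arm64 process the candidate list would be `{"arm64", "arm64e"}`, so a file containing only an arm64e slice still yields a match, while an exact arm64 slice is preferred when both are present.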
rdar://164510783
[libc] implement inet_addr (#167708)
This patch adds the POSIX function `inet_addr`. Since most of the
parsing logic is delegated to `inet_aton`, I have only included some
basic smoke tests for testing purposes.
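
The delegation to `inet_aton` can be pictured roughly like this. It is a sketch under assumptions, not the LLVM libc source: `sketch_inet_addr` is a hypothetical name, and the code relies on the host's `inet_aton`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical wrapper illustrating the delegation; not the libc source. */
static in_addr_t sketch_inet_addr(const char *cp) {
  struct in_addr addr;
  /* inet_aton returns nonzero on success and zero for an invalid string. */
  if (inet_aton(cp, &addr) == 0)
    return INADDR_NONE;
  return addr.s_addr; /* already in network byte order */
}
```

One known caveat of the `inet_addr` interface is that the valid address 255.255.255.255 has the same value as the `INADDR_NONE` error return, so callers cannot distinguish the two.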
[ORC] Merge GetDylibInterface.h APIs into MachO.h. (#168462)
These APIs are MachO specific, and the interfaces are about to be
extended to support more MachO-specific behavior. For now it makes sense
to group them with other MachO specific APIs in MachO.h.
Add documentation about CMAKE_OSX_SYSROOT (#168024)
Add documentation about CMAKE_OSX_SYSROOT so that folks bringing up on
OSX can have a clean test run.
[Arm64EC] Preserve X9 for indirect calls. (#167782)
Arm64EC indirect calls use a function __os_arm64x_check_icall... this
has one obvious return value, x11, which is the function to call.
However, it actually returns one other important value: x9, which is the
final destination for the emulator after the call. If the call is
calling x64 code, x9 is used by the thunk.
Previously, we didn't model this, and it mostly worked because the
compiler usually doesn't modify x9 in the narrow window between the
check and the call. That said, it can happen in some cases; one
reliable way is to do an indirect tail-call with stack protectors
enabled. (You can also just get unlucky with register allocation, but
it's harder to write a testcase for that.)
This patch uses the cfguardtarget bundle to simplify the calling
convention handling, for similar reasons that x64 uses it: modifying
arbitrary calls is difficult without a separate marking.
Fixes #167430.
[CI] Gracefully Fail when Job Completion Timestamp is None (#168457)
There seem to be cases where the workflow status is completed but the
jobs have not completed. We need to gracefully handle these cases to
avoid a crash loop in the metrics container.
Revert "Reapply "[compiler-rt] Default to Lit's Internal Shell" (#168232)"
This reverts commit bde90624185ea2cead0a8d7231536e2625d78798.
This caused failures on Darwin that were not caught by upstream
buildbots. Reverting for now to give myself some time to fix.