LLVM/project 5fefddd llvm/lib/Target/LoongArch LoongArchISelLowering.cpp LoongArchMachineFunctionInfo.h, llvm/test/CodeGen/LoongArch musttail.ll tail-calls.ll

[LoongArch] Enable tail calls for sret functions and relax argument matching

Allow tail-calling functions that return via sret when the caller has an
incoming sret pointer that can be forwarded.

Remove the overly strict requirement that tail-call argument values must
exactly match the caller's incoming arguments. The real constraint is only
that the callee uses no more argument stack space than the caller.

This fixes musttail codegen and enables significantly more tail-call
optimizations.
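
The relaxed rule described above can be illustrated with a minimal sketch. This is not the code in LoongArchISelLowering.cpp: `CallFootprint` and `mayTailCall` are hypothetical names, and a real eligibility check considers more conditions than the two mentioned in this message.

```cpp
#include <cstdint>

// Hypothetical summary of a function's calling-convention footprint.
struct CallFootprint {
  uint64_t ArgStackBytes; // bytes of arguments passed on the stack
  bool ReturnsViaSRet;    // result is written through an sret pointer
};

// Sketch of the two constraints named in the commit message (assumed
// simplification; other eligibility checks are omitted).
bool mayTailCall(const CallFootprint &Caller, const CallFootprint &Callee) {
  // An sret callee needs a pointer to forward; only a caller that
  // received an incoming sret pointer has one.
  if (Callee.ReturnsViaSRet && !Caller.ReturnsViaSRet)
    return false;
  // The callee must not need more argument stack space than the caller.
  return Callee.ArgStackBytes <= Caller.ArgStackBytes;
}
```

Note that the sketch never compares argument values, only stack sizes, which is the relaxation this commit describes.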
Delta File
+397 -0 llvm/test/CodeGen/LoongArch/musttail.ll
+65 -10 llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+4 -9 llvm/test/CodeGen/LoongArch/tail-calls.ll
+7 -0 llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
+6 -0 llvm/lib/Target/LoongArch/LoongArchISelLowering.h
+479 -19 5 files

LLVM/project f15b756 llvm/lib/ExecutionEngine/Orc MachO.cpp

[ORC] Remove unnecessary LLVM_ABI on function def. NFCI. (#168478)

Delta File
+1 -1 llvm/lib/ExecutionEngine/Orc/MachO.cpp
+1 -1 1 file

LLVM/project 886d24d clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode literals.cpp

[clang][bytecode] Fix fallthrough to switch labels (#168484)

We need to fall through here in case we're not jumping to the labels.
This is only needed in expression contexts.
Delta File
+11 -0 clang/test/AST/ByteCode/literals.cpp
+2 -0 clang/lib/AST/ByteCode/Compiler.cpp
+13 -0 2 files

LLVM/project 7354533 clang/include/clang/CIR MissingFeatures.h, clang/include/clang/CIR/Dialect/Builder CIRBaseBuilder.h

[CIR] X86 vector fcmp-sse vector builtins (#167125)

### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/163895.
It adds the fcmp-sse part of the X86 vector builtins for CIR.

---------

Co-authored-by: liuzhenya <zyliu at siorigin.com>
Delta File
+213 -0 clang/test/CIR/CodeGen/builtin-fcmp-sse.c
+58 -15 clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+18 -0 clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+1 -0 clang/include/clang/CIR/MissingFeatures.h
+290 -15 4 files

LLVM/project 485b3af llvm/lib/Target/RISCV RISCVVLOptimizer.cpp, llvm/test/CodeGen/RISCV/rvv vl-opt-live-out.ll vl-opt.mir

[RISCV] Reduce minimum VL needed for vslidedown.vx in RISCVVLOptimizer (#168392)

Once #149042 is relanded, we will start EVL tail folding vectorized
loops that have live-outs, e.g.:

```c
int f(int *x, int n) {
  int y = 0; // declared outside the loop so it is live-out at the return
  for (int i = 0; i < n; i++) {
    y = x[i] + 1;
    x[y] = y;
  }
  return y;
}
```

These are vectorized by extracting the last "active lane" in the loop's
exit:

    [41 lines not shown]
Delta File
+44 -0 llvm/test/CodeGen/RISCV/rvv/vl-opt-live-out.ll
+41 -1 llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+35 -0 llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
+120 -1 3 files

LLVM/project ea26d92 llvm/lib/Target/RISCV RISCVISelLowering.cpp

[RISCV] Remove unused argument check (NFC) (#168313)

The index == 0 scenario has already been handled by the early return, so
only the upper half scenario is relevant here.
Delta File
+1 -1 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+1 -1 1 file

LLVM/project 3b27356 clang-tools-extra/clang-doc/assets comment-template.mustache function-template.mustache, clang-tools-extra/test/clang-doc basic-project.mustache.test

[clang-doc] Fix whitespace issues in Mustache basic test

I found that the issues we've been seeing in the HTML
whitespace/alignment are due to partials inserting their own whitespace
and calling partials on indented lines or lines containing text already.
This patch gets rid of unnecessary whitespace in the comment and
function partials so that they are properly indented when inserted.
Delta File
+358 -453 clang-tools-extra/test/clang-doc/basic-project.mustache.test
+42 -47 clang-tools-extra/clang-doc/assets/comment-template.mustache
+1 -5 clang-tools-extra/clang-doc/assets/function-template.mustache
+1 -1 clang-tools-extra/clang-doc/assets/class-template.mustache
+402 -506 4 files

LLVM/project 951ab04 mlir/include/mlir/Dialect/GPU/Pipelines Passes.h, mlir/lib/Conversion/GPUToNVVM LowerGpuOpsToNVVMOps.cpp

[mlir][NVVM] Add no-rollback option to NVVM lowering passes (#168477)

Add pass options to run lowerings to NVVM without pattern rollback. This
makes the dialect conversions easier to debug and improves
performance/memory usage.
Delta File
+4 -1 mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+4 -0 mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h
+1 -1 mlir/test/Integration/GPU/CUDA/two-modules.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/shuffle.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/printf.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/multiple-all-reduce.mlir
+12 -5 19 files not shown
+32 -19 25 files

LLVM/project 45279a4 mlir/include/mlir/Conversion Passes.td, mlir/lib/Conversion/SCFToControlFlow SCFToControlFlow.cpp

[mlir][SCF] Add pass option to deactivate pattern rollback
Delta File
+5 -2 mlir/lib/Conversion/SCFToControlFlow/SCFToControlFlow.cpp
+4 -0 mlir/include/mlir/Conversion/Passes.td
+1 -0 mlir/test/Conversion/SCFToControlFlow/convert-to-cfg.mlir
+10 -2 3 files

LLVM/project 35a95fe clang/include/clang/Basic BuiltinsNVPTX.td, clang/test/CodeGen builtins-nvptx.c

[clang][NVPTX] Fix SM requirement of f32-tf32 rna satfinite conversion (#167836)

This change fixes the SM requirement of the f32 to tf32 conversion with
`rna` rounding mode and `.satfinite` modifier. The current requirement
specified is `sm_89` but this conversion is supported from `sm_80`
onwards after it was added in PTX 8.1.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
Delta File
+18 -0 llvm/test/CodeGen/NVPTX/convert-sm80-sf.ll
+7 -3 clang/test/CodeGen/builtins-nvptx.c
+0 -7 llvm/test/CodeGen/NVPTX/convert-sm89.ll
+1 -1 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+1 -1 clang/include/clang/Basic/BuiltinsNVPTX.td
+27 -12 5 files

LLVM/project b3c5491 llvm/lib/Transforms/InstCombine InstCombineCompares.cpp, llvm/test/Analysis/ValueTracking known-power-of-two-urem.ll

InstCombine: Stop transforming EQ/NE of SHR to 0 to ULT/UGT if >1 use

This is a small code size optimization that lets us avoid both shifting
and comparing to a constant if we need the shifted value anyway. On most
architectures the zero comparison is cheaper than a constant comparison
(or free if the shift sets flags).

Although this change appears to remove the optimization entirely, we
continue to do this transform if there is one use because of the code
below the removed code that transforms the shift into an and, followed
by the PR10267 case in InstCombinerImpl::foldICmpAndConstConst that
transforms the and into a ult/ugt. Added a test case to verify this
explicitly.
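
As an illustration of the two forms involved, here is a small sketch of the equivalence for a 32-bit unsigned value and a logical shift amount below 32. The function names are hypothetical and this is source-level C++, not the InstCombine code itself.

```cpp
#include <cstdint>

// Original form: shift right, then compare the shifted value with zero.
// The shifted value stays available for other users.
bool shiftedIsZero(uint32_t X, unsigned C) { return (X >> C) == 0; }

// Transformed form: compare X directly against the constant 1 << C.
// With multiple uses of the shift, this needs both the shift (for the
// other users) and a constant compare.
bool belowPowerOfTwo(uint32_t X, unsigned C) {
  return X < (uint32_t{1} << C);
}
```

Both functions return the same result for every `X` when `C` is in the range 0 to 31; the NE case corresponds to the negation of each.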

Per [1], this reduces clang .text size by 0.09% and dynamic instruction
count by 0.01%.

[1] https://llvm-compile-time-tracker.com/compare.php?from=1f38d49ebe96417e368a567efa4d650b8a9ac30f&to=0873787a12b8f2eab019d8211ace4bccc1807343&stat=size-text


    [5 lines not shown]
Delta File
+45 -5 llvm/test/Transforms/InstCombine/icmp-shr.ll
+6 -8 llvm/test/Transforms/PhaseOrdering/ARM/arm_mean_q7.ll
+0 -10 llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+2 -2 llvm/test/Transforms/LoopVectorize/loop-scalars.ll
+2 -2 llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
+1 -1 llvm/test/Transforms/LoopVectorize/induction.ll
+56 -28 1 file not shown
+57 -29 7 files

LLVM/project 3fb98e7 libcxx/include span, libcxx/test/libcxx/containers/views/views.span nodiscard.verify.cpp

[libc++][span] Mark functions as `[[nodiscard]]` (#168033)

https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

---------

Co-authored-by: Hristo Hristov <zingam at outlook.com>
Co-authored-by: Nikolas Klauser <nikolasklauser at berlin.de>
Delta File
+53 -37 libcxx/include/span
+76 -0 libcxx/test/libcxx/containers/views/views.span/nodiscard.verify.cpp
+129 -37 2 files

LLVM/project e7df8d7 mlir/include/mlir/Dialect/GPU/Pipelines Passes.h, mlir/lib/Conversion/GPUToNVVM LowerGpuOpsToNVVMOps.cpp

[mlir][NVVM] Add no-rollback option to NVVM lowering passes
Delta File
+4 -1 mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+4 -0 mlir/include/mlir/Dialect/GPU/Pipelines/Passes.h
+1 -1 mlir/test/Integration/GPU/CUDA/two-modules.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/shuffle.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/printf.mlir
+1 -1 mlir/test/Integration/GPU/CUDA/multiple-all-reduce.mlir
+12 -5 19 files not shown
+32 -19 25 files

LLVM/project 0e3fba8 llvm/lib/Target/RISCV RISCVInstrInfoXSfmm.td, llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

[RISCV] Remove Match_InvalidXSfmmVType. NFC (#168465)

It's not reachable because the custom parser will accept or fail the
whole instruction.
Delta File
+0 -4 llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+0 -1 llvm/lib/Target/RISCV/RISCVInstrInfoXSfmm.td
+0 -5 2 files

LLVM/project d464c99 compiler-rt/test/orc/TestCases/Darwin/arm64 objc-imageinfo.S

[ORC] Make tests work with Internal Shell (#168471)

This patch makes objc-imageinfo.S work with the internal shell. The test
uses a subshell to temporarily change the directory. The internal shell
does not support subshells, so this construct was replaced with a
pushd/popd sequence.
Delta File
+3 -1 compiler-rt/test/orc/TestCases/Darwin/arm64/objc-imageinfo.S
+3 -1 1 file

LLVM/project 4b0d422 llvm/include/llvm/ExecutionEngine/Orc MachO.h, llvm/lib/ExecutionEngine/Orc MachO.cpp

[ORC] Support scanning "fallback" slices for interfaces. (#168472)

When scanning an interface source (dylib or TBD file), consider
"fallback" architectures (CPUType / CPUSubType pairs) in addition to the
process's CPUType / CPUSubType.

Background:

When dyld loads a dylib into a process it may load a dylib or slice whose
CPU type / subtype isn't an exact match for the process's CPU type /
subtype. E.g. arm64 processes can load arm64e dylibs / slices.

When building an interface we need to follow the same logic, otherwise
we risk generating a spurious "does not contain a compatible slice"
error. E.g. if we're running an arm64 JIT'd program and loading an
interface from a TBD file, and if no arm64 slice is present in that
file, then we should fall back to looking for an arm64e slice.
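
The lookup order described above can be sketched as follows. This is only an illustration under simplifying assumptions: `pickSlice` is a hypothetical helper, and architectures are modeled as plain strings rather than the CPUType / CPUSubType pairs the commit refers to.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: Preferred lists the process architecture first,
// followed by its fallback architectures. Returns the first entry of
// Preferred that is present in Available, or nullopt if none matches.
std::optional<std::string>
pickSlice(const std::vector<std::string> &Available,
          const std::vector<std::string> &Preferred) {
  for (const std::string &Want : Preferred)
    for (const std::string &Have : Available)
      if (Want == Have)
        return Have;
  return std::nullopt; // no compatible slice
}
```

For an arm64 process, `Preferred` would be `{"arm64", "arm64e"}` in this model, so an interface file containing only an arm64e slice still yields a match instead of an error.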

rdar://164510783
Delta File
+105 -41 llvm/lib/ExecutionEngine/Orc/MachO.cpp
+21 -6 llvm/include/llvm/ExecutionEngine/Orc/MachO.h
+23 -0 llvm/test/ExecutionEngine/JITLink/AArch64/Inputs/MachO_Foo_arm64e.tbd
+23 -0 llvm/test/ExecutionEngine/JITLink/AArch64/Inputs/MachO_Foo_arm64.tbd
+0 -23 llvm/test/ExecutionEngine/JITLink/AArch64/Inputs/MachO_Foo.tbd
+8 -2 llvm/test/ExecutionEngine/JITLink/AArch64/MachO_weak_link.test
+180 -72 6 files

LLVM/project a5590a2 libc/include/arpa inet.yaml, libc/src/arpa/inet inet_addr.cpp inet_addr.h

[libc] implement inet_addr (#167708)

This patch adds the POSIX function `inet_addr`. Since most of the
parsing logic is delegated to `inet_aton`, I have only included some
basic smoke tests for testing purposes.
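
A rough sketch of what such delegation can look like, assuming a system that provides `inet_aton` in `<arpa/inet.h>`. `sketch_inet_addr` is a hypothetical name chosen to avoid clashing with the system function; this is not the code added in libc/src/arpa/inet/inet_addr.cpp.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical wrapper: let inet_aton do the parsing and map its
// failure result to the all-ones value that inet_addr returns on error.
in_addr_t sketch_inet_addr(const char *cp) {
  in_addr addr;
  if (inet_aton(cp, &addr) == 0)       // 0 means the string was invalid
    return static_cast<in_addr_t>(-1); // same value as INADDR_NONE
  return addr.s_addr;                  // already in network byte order
}
```

Because the error value is also a valid address (255.255.255.255), callers that must distinguish the two generally need `inet_aton` directly.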
Delta File
+25 -0 libc/test/src/arpa/inet/inet_addr_test.cpp
+23 -0 libc/src/arpa/inet/inet_addr.cpp
+21 -0 libc/src/arpa/inet/inet_addr.h
+15 -0 libc/src/arpa/inet/CMakeLists.txt
+11 -0 libc/test/src/arpa/inet/CMakeLists.txt
+7 -0 libc/include/arpa/inet.yaml
+102 -0 4 files not shown
+106 -0 10 files

LLVM/project 7c09f12 llvm/utils/gn/secondary/llvm/lib/ExecutionEngine/Orc BUILD.gn

[gn build] Port 17f0afe40ae8
Delta File
+0 -1 llvm/utils/gn/secondary/llvm/lib/ExecutionEngine/Orc/BUILD.gn
+0 -1 1 file

LLVM/project 5b1a4db llvm/lib/Target/RISCV/AsmParser RISCVAsmParser.cpp

[RISCV] Remove unused function declaration. NFC (#168459)

Delta File
+0 -1 llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+0 -1 1 file

LLVM/project 17f0afe llvm/include/llvm/ExecutionEngine/Orc GetDylibInterface.h MachO.h, llvm/lib/ExecutionEngine/Orc GetDylibInterface.cpp MachO.cpp

[ORC] Merge GetDylibInterface.h APIs into MachO.h. (#168462)

These APIs are MachO specific, and the interfaces are about to be
extended to support more MachO-specific behavior. For now it makes sense
to group them with other MachO specific APIs in MachO.h.
Delta File
+0 -128 llvm/lib/ExecutionEngine/Orc/GetDylibInterface.cpp
+110 -0 llvm/lib/ExecutionEngine/Orc/MachO.cpp
+0 -41 llvm/include/llvm/ExecutionEngine/Orc/GetDylibInterface.h
+18 -0 llvm/include/llvm/ExecutionEngine/Orc/MachO.h
+0 -1 llvm/tools/llvm-jitlink/llvm-jitlink.cpp
+0 -1 llvm/lib/ExecutionEngine/Orc/CMakeLists.txt
+128 -171 6 files

LLVM/project f6ebb35 llvm/docs CMake.rst

Add documentation about CMAKE_OSX_SYSROOT (#168024)

Add documentation about CMAKE_OSX_SYSROOT so that folks bringing up a
build on OSX can have a clean test run.
Delta File
+11 -0 llvm/docs/CMake.rst
+11 -0 1 file

LLVM/project 186b8ba lldb/bindings/lua lua-typemaps.swig

[lldb] Update Lua typemap for #167764 (#168464)

Delta File
+6 -5 lldb/bindings/lua/lua-typemaps.swig
+6 -5 1 file

LLVM/project 307d7ed llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU BUILD.gn

[gn build] Port 49d5bb0ad0cb
Delta File
+1 -0 llvm/utils/gn/secondary/llvm/lib/Target/AMDGPU/BUILD.gn
+1 -0 1 file

LLVM/project ec3e5dc llvm/utils/gn/secondary/llvm/unittests/CodeGen BUILD.gn

[gn build] Port 472e4ab0b02d
Delta File
+0 -1 llvm/utils/gn/secondary/llvm/unittests/CodeGen/BUILD.gn
+0 -1 1 file

LLVM/project be96137 llvm/utils/gn/secondary/llvm/lib/Target/X86 BUILD.gn

[gn build] Port 1425d75c7116
Delta File
+0 -2 llvm/utils/gn/secondary/llvm/lib/Target/X86/BUILD.gn
+0 -2 1 file

LLVM/project 5ba8579 llvm/lib/Target/AArch64 AArch64Arm64ECCallLowering.cpp AArch64CallingConvention.td, llvm/test/CodeGen/AArch64 arm64ec-indirect-call.ll

[Arm64EC] Preserve X9 for indirect calls. (#167782)

Arm64EC indirect calls go through a function, __os_arm64x_check_icall. It
has one obvious return value, x11, which is the function to call.
However, it actually returns one other important value: x9, which is the
final destination for the emulator after the call. If the call is
calling x64 code, x9 is used by the thunk.

Previously, we didn't model this, and it mostly worked because the
compiler usually doesn't modify x9 in the narrow window between the
check and the call. That said, it can happen in some cases; one
reliable way is to do an indirect tail-call with stack protectors
enabled. (You can also just get unlucky with register allocation, but
it's harder to write a testcase for that.)

This patch uses the cfguardtarget bundle to simplify the calling
convention handling, for similar reasons that x64 uses it: modifying
arbitrary calls is difficult without a separate marking.

Fixes #167430.
Delta File
+50 -0 llvm/test/CodeGen/AArch64/arm64ec-indirect-call.ll
+17 -3 llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+11 -2 llvm/lib/Target/AArch64/AArch64CallingConvention.td
+78 -5 3 files

LLVM/project efee326 .ci/metrics metrics.py

[CI] Gracefully Fail when Job Completion Timestamp is None (#168457)

There seem to be cases where the workflow status is completed but the
jobs have not completed. We need to gracefully handle these cases to
avoid a crash loop in the metrics container.
Delta File
+7 -0 .ci/metrics/metrics.py
+7 -0 1 file

LLVM/project eb20b53 compiler-rt/test lit.common.cfg.py

Revert "Reapply "[compiler-rt] Default to Lit's Internal Shell" (#168232)"

This reverts commit bde90624185ea2cead0a8d7231536e2625d78798.

This caused failures on Darwin that were not caught by upstream
buildbots. Reverting for now to give myself some time to fix.
Delta File
+3 -5 compiler-rt/test/lit.common.cfg.py
+3 -5 1 file

LLVM/project 2c4bce4 llvm/utils/gn/secondary/llvm/lib/Target/SystemZ BUILD.gn

[gn] port 320c18a066b29 (systemz SDNodeInfo)
Delta File
+7 -0 llvm/utils/gn/secondary/llvm/lib/Target/SystemZ/BUILD.gn
+7 -0 1 file

LLVM/project 1bf902e utils/bazel/llvm-project-overlay/llvm BUILD.bazel

[bazel] Fix #168108 (#168461)

Delta File
+4 -0 utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+4 -0 1 file