[WebAssembly] Fold any/alltrue SIMD boolean reductions with eqz (#184704)
Existing ISel patterns match setne/seteq following SIMD boolean reductions
any_true and all_true, and drop the ones that are redundant (because the
reductions always return 1 or 0). This adds patterns to also produce eqz
instructions instead of a comparison with a const.
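As a rough illustration of why the fold is valid, the following is a scalar C++ model, not the ISel patterns themselves. The type `V4` and the helper names are hypothetical; the only property relied on is the one stated above, that the reductions return 1 or 0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar stand-in for a 4 x i32 SIMD vector (hypothetical helper type).
using V4 = std::array<std::int32_t, 4>;

// Models any_true: 1 if any lane is non-zero, else 0.
int any_true(const V4 &v) {
  for (std::int32_t lane : v)
    if (lane != 0)
      return 1;
  return 0;
}

// Models all_true: 1 if every lane is non-zero, else 0.
int all_true(const V4 &v) {
  for (std::int32_t lane : v)
    if (lane == 0)
      return 0;
  return 1;
}

// Models an eqz instruction: 1 if the operand is zero, else 0.
int eqz(int x) { return x == 0 ? 1 : 0; }

// The reduction compared against a constant 0.
int none_true_cmp(const V4 &v) { return any_true(v) == 0 ? 1 : 0; }

// The same value expressed as a single eqz of the reduction.
int none_true_eqz(const V4 &v) { return eqz(any_true(v)); }

// Sanity check: both forms agree for a given vector.
void check_fold(const V4 &v) { assert(none_true_cmp(v) == none_true_eqz(v)); }
```

Because the reduction result is already 0 or 1, a `!= 0` comparison is redundant and an `== 0` comparison is exactly an eqz.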
[flang-rt] Need to pad the output of execute_command_line(..., CMDMSG) (#185509)
Previously the error message was copied, but not padded for cases where
the message was shorter than the passed CMDMSG string. Add the padding
and also change the test case to test padding on all platforms.
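A minimal sketch of the copy-then-pad behaviour, assuming a fixed-length character buffer that is blank-padded rather than NUL-terminated. The function names are hypothetical and this is not the flang-rt implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies `msg` into a fixed-length buffer of `len` bytes and fills the
// remainder with blanks. No terminating NUL is written.
void copy_and_pad(char *dest, std::size_t len, const char *msg) {
  assert(dest != nullptr && msg != nullptr);
  std::size_t n = std::min(len, std::strlen(msg));
  std::memcpy(dest, msg, n);
  std::memset(dest + n, ' ', len - n); // the padding step
}

// Convenience wrapper for experimenting: starts from stale contents ('x')
// to show that the tail is overwritten with blanks.
std::string padded_copy(const char *msg, std::size_t len) {
  std::string out(len, 'x');
  copy_and_pad(&out[0], len, msg);
  return out;
}
```

Without the `memset`, a message shorter than the buffer would leave whatever the buffer previously held after the copied text.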
[Offload][AMDGPU] Fix RPC server on mixed w32 w64 workloads (#185496)
Summary:
This was a regression from the original LLVM-gpu-loader. We used to
handle `-mwavefrontsize64` correctly in the loader by over-allocating
memory and just leaving the upper 32 bits masked off. In order to handle
this in offload we need to scan loaded kernels to see how much memory we
need to allocate. This should be safe: the protocol is designed to
handle an arbitrary size, and in the worst case this just wastes space.
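The sizing idea can be sketched as follows. This is an assumption-laden model, not the offload plugin code: `KernelInfo`, `max_wavefront_size`, and `rpc_buffer_bytes` are hypothetical names, and the buffer layout (ports times lanes times bytes per lane) is invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a loaded kernel; not an actual offload type.
struct KernelInfo {
  std::uint32_t wavefront_size; // 32 or 64
};

// Returns the widest wavefront size among the kernels, defaulting to 32
// when no kernel asks for more (the default is an assumption).
std::uint32_t max_wavefront_size(const std::vector<KernelInfo> &kernels) {
  std::uint32_t lanes = 32;
  for (const KernelInfo &k : kernels) {
    assert(k.wavefront_size == 32 || k.wavefront_size == 64);
    lanes = std::max(lanes, k.wavefront_size);
  }
  return lanes;
}

// Sizes a buffer for the widest case so that w32 and w64 kernels can share
// it; w32 kernels simply leave the upper lanes unused.
std::uint64_t rpc_buffer_bytes(const std::vector<KernelInfo> &kernels,
                               std::uint64_t ports,
                               std::uint64_t bytes_per_lane) {
  return ports * max_wavefront_size(kernels) * bytes_per_lane;
}
```

Sizing for the maximum means a w32-only workload pays nothing extra, while a mixed workload wastes at most the unused upper lanes.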
[libc] Add more macro/type declarations to Elf headers. (#185348)
* Add several `AT_` macro values from `<sys/auxv.h>`. In particular,
this makes internal Linux auxv header parsing more hermetic by
removing one of the Linux header includes.
* Add constants between `DT_ADDRNGLO` and `DT_ADDRNGHI`, in particular
`DT_GNU_HASH`, which is the de facto standard on many platforms.
* Add `Elf32_auxv_t` and `Elf64_auxv_t` types which define the auxv
entries and can be used by VDSO parsing code. Note that this PR doesn't
yet update libc's own Linux auxv header support (in
`src/__support/OSUtil/linux/auxv.h`).
This fixes some of the missing definitions when building code working
with Elf files, such as Abseil's debugging support in
https://github.com/abseil/abseil-cpp/tree/master/absl/debugging/internal.
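For readers unfamiliar with the auxv entry types mentioned above, here is a small sketch of how such entries are typically walked. The struct mirrors the conventional `Elf64_auxv_t` layout but uses a hypothetical name (`SketchElf64Auxv`) to avoid clashing with a real header, and the sample vector contents are made up.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in with the conventional Elf64_auxv_t layout.
struct SketchElf64Auxv {
  std::uint64_t a_type;
  union {
    std::uint64_t a_val;
  } a_un;
};

constexpr std::uint64_t kAtNull = 0;   // value of AT_NULL
constexpr std::uint64_t kAtPagesz = 6; // value of AT_PAGESZ

// Scans an AT_NULL-terminated vector for `type`; returns `fallback` if the
// entry is absent.
std::uint64_t find_auxv(const SketchElf64Auxv *auxv, std::uint64_t type,
                        std::uint64_t fallback) {
  assert(auxv != nullptr);
  for (; auxv->a_type != kAtNull; ++auxv)
    if (auxv->a_type == type)
      return auxv->a_un.a_val;
  return fallback;
}

// Example: a tiny hand-built vector (values are invented for illustration).
std::uint64_t demo_page_size() {
  const SketchElf64Auxv auxv[] = {
      {3, {0x1000}}, {kAtPagesz, {4096}}, {kAtNull, {0}}};
  return find_auxv(auxv, kAtPagesz, 0);
}
```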
[clang-doc] Cleanup CMake files and ensure benchmarks build (#185469)
There's some poor formatting, and ClangDocBenchmark references several
targets that are required, but only because they're required for clang-doc
itself. We can just get those requirements from the clangDoc target.
Additionally, we can make sure the benchmark builds as part of testing
when LLVM_INCLUDE_BENCHMARKS is set.
[arm64ec] Fix missing sret return in Arm64EC entry thunks for large struct returns (#185452)
When an Arm64EC function returns a struct by value that is too large for
x64's `RAX` (>8 bytes), the entry thunk synthesizes a hidden sret
pointer parameter for the x64 side. However, this
parameter was never marked with the sret attribute, so ISel did not copy
its value into `x8` (the Arm64EC mapping of `RAX`) on return. This
caused the x64 caller to see a garbage pointer in `RAX` instead of the
return buffer address.
The change adds the sret attribute to the thunk's synthesized pointer
parameter, so that `LowerFormalArguments` saves it and `LowerReturn`
restores it to `x8` before the tail call to `__os_arm64x_dispatch_ret`.
Fixes #185390
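To make the convention concrete, here is a hand-written C++ model of an sret-style return. It is not the thunk code; `Pair`, `make_pair_by_value`, and `make_pair_sret` are hypothetical names. The second function spells out what the hidden parameter does: the caller supplies the return buffer, and the callee hands the same pointer back, which is the value the x64 caller expects to find in `RAX`.

```cpp
#include <cassert>
#include <cstdint>

// A 16-byte struct: too large to come back in a single 8-byte register.
struct Pair {
  std::uint64_t lo;
  std::uint64_t hi;
};
static_assert(sizeof(Pair) > 8, "Pair must not fit in one 8-byte register");

// What the source looks like: return by value.
Pair make_pair_by_value(std::uint64_t lo, std::uint64_t hi) {
  return Pair{lo, hi};
}

// Model of the lowered form: the result is written through a caller-provided
// buffer, and that same buffer address is returned.
Pair *make_pair_sret(Pair *ret_buf, std::uint64_t lo, std::uint64_t hi) {
  assert(ret_buf != nullptr);
  ret_buf->lo = lo;
  ret_buf->hi = hi;
  return ret_buf; // forgetting this is the analogue of the bug described
}

// Checks that the returned pointer is the buffer that was passed in.
bool sret_returns_buffer() {
  Pair buf{0, 0};
  Pair *ret = make_pair_sret(&buf, 1, 2);
  return ret == &buf && buf.lo == 1 && buf.hi == 2;
}
```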
[flang][cuda][NFC] Add filename and line number in error reporting (#185516)
Some entry points carry over the filename and line number for error
reporting. Use this information when reporting CUDA errors.
[libunwind][PAC] Defang ptrauth's PC in valid CFI range abort
It turns out that making the CFI check a release-mode abort causes many,
if not the majority of, JITs to fail during unwinding, as they do not
set up CFI sections for their generated code. As a result, any JITs
that do nominally support unwinding (and catching) through their JIT
or assembly frames trip this abort.
rdar://170862047
[Clang] Update the 'gpuintrin.h' lane scan handling (#185451)
Summary:
This patch uses a more efficient algorithm for the reduction rather than
a divergent branch. We also provide prefix and suffix versions; the sum
is now just the first element of this.
This changes the names to the following, which is technically a breaking
change, but I don't think these were really used in practice, and it's a
trivial change based on the Clang version if it's really needed.
```
__gpu_prefix_scan_sum_u32(...)
__gpu_suffix_scan_sum_u32(...)
```
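The commit message does not spell out the algorithm, so the following is only a CPU-side sketch of one common branch-free approach: a doubling (Hillis-Steele) scan over an array that stands in for the lanes. The function names are hypothetical and deliberately lack the `__gpu_` prefix; the snapshot copy models lanes reading each other's values in lockstep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive prefix sum in log2(n) doubling steps: at each step, lane i adds
// the value held by lane i - offset, if that lane exists.
std::vector<std::uint32_t> prefix_scan_sum(std::vector<std::uint32_t> lanes) {
  for (std::size_t offset = 1; offset < lanes.size(); offset *= 2) {
    const std::vector<std::uint32_t> prev = lanes; // lockstep snapshot
    for (std::size_t i = offset; i < lanes.size(); ++i)
      lanes[i] = prev[i] + prev[i - offset];
  }
  return lanes;
}

// Inclusive suffix sum: lane i ends up with the sum of lanes i..n-1.
std::vector<std::uint32_t> suffix_scan_sum(std::vector<std::uint32_t> lanes) {
  for (std::size_t offset = 1; offset < lanes.size(); offset *= 2) {
    const std::vector<std::uint32_t> prev = lanes; // lockstep snapshot
    for (std::size_t i = 0; i + offset < lanes.size(); ++i)
      lanes[i] = prev[i] + prev[i + offset];
  }
  return lanes;
}

// In this model the full reduction is the first element of the suffix scan
// (equivalently, the last element of the prefix scan).
std::uint32_t reduce_sum(const std::vector<std::uint32_t> &lanes) {
  assert(!lanes.empty());
  return suffix_scan_sum(lanes).front();
}
```

Every lane performs the same sequence of steps, so there is no data-dependent branch of the kind a serial loop over active lanes would need.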
[libc][docs] Furo theme, new landing page, cleanups (#184303)
Switch the libc documentation site from the alabaster theme to Furo,
which provides mobile-friendly layout, a collapsible sidebar with
caption-based section grouping, and built-in "Edit this page" links.
Changes by area:
conf.py
- Switch html_theme to "furo"
- Add myst_parser extension (already in llvm/docs/requirements.txt, used
by LLDB/Clang/LLVM docs) to allow Markdown alongside RST
- Accept both .rst and .md source suffixes
- Configure Furo source_repository/source_branch/source_directory for
"Edit this page" links pointing to GitHub
- Wire _static/copybutton.{js,css} for copy-to-clipboard buttons on code
blocks (no new pip dependency; can migrate to sphinx-copybutton later
once it's in requirements-hashed.txt)
- Exclude plan-docs.md and Helpers/ from Sphinx processing
[31 lines not shown]
[mlir][spirv] Make `MatrixType` type a `ShapedType` (#185470)
This will allow enforcing some of the type constraints in ODS using
builtin classes, e.g., `AllElementTypesMatch`. This is the first PR in a
series of PRs moving all verification for Matrix ops to ODS.
[CodeGen] Fix prefetch-targets-error.mir
#184194 introduced this test, which was failing in some configurations
because its incorrectly specified -o flags made it try to write output
to the test directory.
[libc] Use explicit cast to time_t in utimes_test. (#185307)
This fixes an error on the RISCV-32 bot, where time_t is a "long long"
(64-bit, as required by POSIX) instead of a "long".
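The commit message does not show the failing line, so the sketch below only illustrates one way such a mismatch can surface: a comparison helper that requires both operands to have exactly the same type. `strict_equal` and `matches_expected` are hypothetical and are not the libc test macros.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical strict comparison helper: both arguments must have exactly
// the same type, so template deduction fails on a long / long long mix.
template <typename T> bool strict_equal(T lhs, T rhs) { return lhs == rhs; }

bool matches_expected(std::time_t actual) {
  const long expected = 1700000000L; // made-up timestamp that fits in 32 bits
  // strict_equal(actual, expected) would fail to compile wherever time_t is
  // not long. The explicit cast makes the types match on every target.
  return strict_equal(actual, static_cast<std::time_t>(expected));
}
```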