[OpenMP][clang] Indirect and Virtual function call mapping from host to device (#159857)
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
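
For illustration only, here is a minimal sketch of the kind of host-to-device address translation such a lookup performs. The patch text does not describe the runtime's table layout or search strategy, so the `IndirectCallEntry` struct, the `lookupIndirectCall` helper, the sorted-table assumption, and the "return the input unchanged" fallback are all hypothetical, not the actual `__llvm_omp_indirect_call_lookup` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical table entry: a host function address paired with the address
// of the matching device function.
struct IndirectCallEntry {
  std::uintptr_t HostAddr;
  std::uintptr_t DeviceAddr;
};

// Hypothetical lookup over a table that is assumed to be sorted by HostAddr.
// Returns the device address when the host address is registered; otherwise
// returns the input unchanged (an assumed fallback, e.g. for a pointer that
// already refers to device code).
inline std::uintptr_t lookupIndirectCall(const IndirectCallEntry *Table,
                                         std::size_t Size,
                                         std::uintptr_t HostAddr) {
  const IndirectCallEntry *End = Table + Size;
  // Binary search for the first entry whose HostAddr is not less than the key.
  const IndirectCallEntry *It = std::lower_bound(
      Table, End, HostAddr,
      [](const IndirectCallEntry &E, std::uintptr_t Addr) {
        return E.HostAddr < Addr;
      });
  if (It != End && It->HostAddr == HostAddr)
    return It->DeviceAddr;
  return HostAddr;
}
```

In this sketch the generated device code would pass the loaded function pointer through the lookup before making the call, so a pointer taken on the host still reaches the device copy of the function.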
---------
Co-authored-by: Youngsuk Kim
girara & zathura: update buildlinking to match current state
girara had an SO major bump, so at minimum its ABI_DEPENDS needs
bumping. It also no longer requires GTK3, that moved to zathura, so
reflect it there instead.
No revbumps are needed, since all the dependent packages were updated
to new versions anyway. Ride those updates from earlier today.
shells/ksh-devel: Update to latest github commit plus additional fixes
Main changes:
- Removal of obsolete comments and build system workarounds.
- Update build/test command invocations.
- Add -j${MAKE_JOBS_NUMBER} flag to enable parallel building (I added
support for this last year).
- Ensure that ${SH} (/bin/sh) is used for shell actions while building.
- Install default shell functions in /usr/local/share/fun (for use with
FPATH and the autoload command). Symlink /usr/local/share/examples/ksh*
to that. (Of course you may decide to handle this differently, but it
would be good if the canonical share/fun directory were available.)
- Install the version with dynamic (*.so*) libraries by default (the STATIC
option can now be used to link those libraries statically). The version
with the libraries is preferred because this enables access to all the
libcmd built-ins (which are bound to /opt/ast/bin by default) and allows
writing C programs that link against these libraries -- you can even embed
the entire shell as a library. It would be good if this received wider
testing.
[5 lines not shown]
[AMDGPU] Insert readfirstlane for uniform VGPR arguments (#178198)
Fix the handling of an inreg argument that is uniform but is passed in a
VGPR because the SGPRs have run out.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault at amd.com>
[HLSL] Add globals for resources embedded in structs
For each resource or resource array member of a struct declared
at global scope or inside a cbuffer, create an implicit global
variable of the same resource type. The variable name will be
derived from the struct instance name and the member name.
The new global is associated with the struct declaration using
a new attribute HLSLAssociatedResourceDeclAttr.
Closes #182988
math/octave-forge-mboct-fem-pkg: Preemptive fix for GCC-15.
- Changes so that when GCC defaults to version 15, the port will
still build.
PR: 293334
Reported by: salvadore at freebsd.org
[mlir][acc] Add ACCRecipeMaterialization pass and reduction ops (#184252)
Pass
----
Add the `acc-recipe-materialization` pass, which materializes OpenACC
privatization, firstprivate and reduction recipes by inlining their
init, copy, combiner, and destroy regions into the operation for the
construct. The pass runs on acc.parallel, acc.serial, acc.kernels, and
acc.loop.
- Firstprivate: Inserts acc.firstprivate_map so the initial value is
available on the device, then clones the recipe init and copy regions
into the construct and replaces uses with the materialized alloca.
Optional destroy region is cloned before the region terminator.
- Private: Clones the recipe init region into the construct (at region
entry or at the loop op for acc.loop private). Replaces uses of the
recipe result with the materialized alloca. Optional destroy region is
cloned before the region terminator.
[42 lines not shown]
[Github] Respect LLVM_VERSION when building windows container (#184231)
Otherwise setting LLVM_VERSION does not actually do anything. This
reduces a toolchain bump from updating ~8 different locations in the
file to updating just 1 place.
[Github] Bump Github Runner to v2.332.0 (#184230)
To stay ahead of the support horizon. There were no major feature
changes/bug fixes from a cursory glance at the release notes.
[Clang] Add missing extension cl_intel_split_work_group_barrier declaration (#184269)
All the OpenCL extensions must be declared in OpenCLExtensions.def,
otherwise the frontend won't recognize them and won't be able to use
them in the code. This patch adds the missing declaration for the
`cl_intel_split_work_group_barrier` extension.
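
As a rough illustration of why a missing entry makes an extension invisible, the sketch below uses the X-macro pattern that `.def` files rely on. The `SKETCH_EXTENSIONS` list and the `isKnownExtension` helper are hypothetical stand-ins; the real `OpenCLExtensions.def` uses its own macros and arguments, which are not reproduced here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a .def file: each X(...) line declares one
// extension. An extension that is not listed here does not exist as far as
// the rest of the program is concerned.
#define SKETCH_EXTENSIONS(X)                                                   \
  X(cl_khr_fp64)                                                               \
  X(cl_intel_split_work_group_barrier)

// Expand the list into the array of names this toy "frontend" recognizes.
static const char *const KnownExtensions[] = {
#define SKETCH_AS_STRING(Name) #Name,
    SKETCH_EXTENSIONS(SKETCH_AS_STRING)
#undef SKETCH_AS_STRING
};

// Returns true only for names that were declared in the list above.
inline bool isKnownExtension(const char *Name) {
  for (const char *Known : KnownExtensions)
    if (std::strcmp(Known, Name) == 0)
      return true;
  return false;
}
```

Because every consumer is generated from the one list, adding a single declaration line is enough to make the name recognized wherever the list is expanded.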
net-im/deltachat-desktop: Fix distinfo
The distinfo in my staged commit was from before dc5d1ed9379bf9b2bebfe8631b2caa99a7cf0819
which fixed distinfo reproducibility when using pnpm.
[CIR] Upstream basic CodeGen tests from incubator (#183998)
This PR upstreams `expressions.cpp` and `c89-implicit-int.c` from the
ClangIR incubator to the mainline.
Following the incremental approach discussed in #156747 and the feedback
from the closed PR #157333, I have:
1. Copied the files directly from the incubator to preserve history.
2. Updated the `RUN` lines to use the `--check-prefix=CIR` flag.
3. Converted `CHECK:` lines to `CIR:`.
4. Standardized variable captures using the `%[[VAR:.*]]` regex syntax
(in `expressions.cpp`).
Verified locally with `llvm-lit`. This is a partial fix for #156747.
*Note: As suggested in previous reviews, I am focusing only on the `CIR`
checks for now to keep the upstreaming incremental. OGCG/LLVM
verification can be added in a follow-up PR once the base tests land.*
[RISCV] Update Andes45 vector reduction scheduling info (#182980)
This PR adds latency/throughput for all RVV reductions to the andes45
series scheduling model.
[clang-doc] Improve complexity of Index construction
The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
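
The difference between the two approaches can be sketched as follows. This is not the clang-doc code: `Record`, `insertByScan`, and `MapIndex` are hypothetical names, and `std::unordered_map` stands in for `llvm::StringMap` so that the example has no LLVM dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index record; the real index entries carry more fields.
struct Record {
  std::string Name;
};

// Scan-based insertion: each insert walks the whole vector to check for an
// existing entry, so inserting N unique records costs O(N^2) comparisons.
inline void insertByScan(std::vector<Record> &Children,
                         const std::string &Name) {
  for (const Record &R : Children)
    if (R.Name == Name)
      return;
  Children.push_back(Record{Name});
}

// Map-based insertion: each insert is one expected O(1) hash lookup, so
// inserting N unique records costs O(N) expected time.
struct MapIndex {
  std::unordered_map<std::string, Record> Children;

  void insert(const std::string &Name) {
    // emplace leaves an existing entry untouched, matching the scan version.
    Children.emplace(Name, Record{Name});
  }
};
```

Both versions end up with the same set of unique records; only the cost of the duplicate check changes, which is why the gap widens as N grows.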
The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.
| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10 | 9,977 | 11,004 | 0.91x | +10.3% |
| 64 | 69,249 | 69,166 | 1.00x | -0.1% |
| 512 | 1,932,714 | 525,877 | 3.68x | -72.8% |
| 4,096 | 92,411,535 | 4,589,030 | 20.1x | -95.0% |
| 10,000 | 577,384,945 | 12,998,039 | 44.4x | -97.7% |
The patch delivers significant improvements to scalability. At 10,000
[13 lines not shown]