[clang][bytecode][NFC] Add popToUInt64() to builtin evaluation (#170164)
We often don't need the APSInt at all, so add a version that pops the
integral from the stack and just static_casts to uint64_t.
[OpenMP][flang] Support GPU team-reductions on allocatables (#169651)
Extends the work started in #165714 by supporting team reductions.
Similar to what was done in #165714, this PR introduces proper
allocations, loads, and stores for by-ref reductions in teams-related
callbacks:
* `_omp_reduction_list_to_global_copy_func`,
* `_omp_reduction_list_to_global_reduce_func`,
* `_omp_reduction_global_to_list_copy_func`, and
* `_omp_reduction_global_to_list_reduce_func`.
[LLDB] Add type casting to DIL, part 1 of 3. (#165199)
This is an alternative to
https://github.com/llvm/llvm-project/pull/159500, breaking that PR down
into three separate PRs, to make it easier to review.
This first PR of the three adds the basic framework for doing type
casing to the DIL code, but it does not actually do any casting: In this
PR the DIL parser only recognizes builtin type names, and the DIL
interpreter does not do anything except return the original operand (no
casting). The second and third PRs will add most of the type parsing,
and do the actual type casting, respectively.
[LSV] Merge contiguous chains across scalar types (#154069)
This change enables the LoadStoreVectorizer to merge and vectorize
contiguous chains even when their scalar element types differ, as long
as the total bitwidth matches. To do so, we rebase offsets between
chains, normalize value types to a common integer type, and insert the
necessary casts around loads and stores. This uncovers more
vectorization opportunities and explains the expected codegen updates
across AMDGPU tests.
Key changes:
- Chain merging
- Build contiguous subchains and then merge adjacent ones when:
- They refer to the same underlying pointer object and address space.
- They are either all loads or all stores.
- A constant leader-to-leader delta exists.
- Rebasing one chain into the other's coordinate space does not overlap.
- All elements have equal total bit width.
- Rebase the second chain by the computed delta and append it to the
[22 lines not shown]
[mlir][tensor] Fix bug in `ConcatOpInterface` (#168676)
This PR fixes an issue in `ConcatOpInterface` where `tensor.concat`
fails when the concat dimension is dynamic while the result type is
static. The fix unifies the computation by using `OpFoldResult`,
avoiding the need to separately handle dynamic and static dimension
values. Fixes #162776.
[bazel] Rewrite overlay handling to starlark (#170000)
Starlark is perfectly capable of doing what we need and this avoids the
dependency on a host Python
[lldb/Target] Track containing StackFrameList to avoid circular dependencies (#170226)
This change adds tracking of the StackFrameList that produced each frame
by storing a weak pointer (m_frame_list_wp) in both `StackFrame` and
`ExecutionContextRef`.
When resolving frames through `ExecutionContextRef::GetFrameSP`, the
code now first attempts to use the remembered frame list instead of
immediately calling `Thread::GetStackFrameList`. This breaks circular
dependencies that can occur during frame provider initialization, where
creating a frame provider might trigger `ExecutionContext` resolution,
which would then call back into `Thread::GetStackFrameList()`, creating
an infinite loop.
The `StackFrameList` now sets m_frame_list_wp on every frame it creates,
and a new virtual method `GetOriginatingStackFrameList` allows frames to
expose their originating list.
Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
[XRay] Run tests inside bootstrapping build (#168378)
COMPILER_RT_STANDALONE_BUILD is set when doing a bootstrapping build
through LLVM_ENABLE_RUNTIMES with the CMake source directory being in
llvm/. This patch changes the XRay tests to also detect that we have
LLVM sources and the llvm-xray tool if we are in a bootstrapping build
through the use of the LLVM_TREE_AVAILABLE variable which is set in
runtimes/CMakeLists.txt.
[BoundsSafety][LLDB] Implement instrumentation plugin for -fbounds-safety soft traps (#169117)
This patch tries to upstream code landed downstream in
https://github.com/swiftlang/llvm-project/pull/11835.
This patch implements an instrumentation plugin for the
`-fbounds-safety` soft trap mode first implemented in
https://github.com/swiftlang/llvm-project/pull/11645 (rdar://158088757).
That functionality isn't supported in upstream Clang yet, however the
instrumented plugin can be compiled without issue so this patch tries to
upstream it. The included tests are all disabled when the clang used for
testing doesn't support `-fbounds-safety`. This means the tests will be
skipped. However, it's fairly easy to point LLDB at a clang that does
support `-fbounds-safety. I've done this and confirmed the tests pass.
To use a custom clang the following can be done:
* For API tests set the `LLDB_TEST_COMPILER` CMake cache variable to
point to appropriate compiler.
* For shell tests applying a patch like this can be used to set the
[43 lines not shown]
[libclc] Fix bitfield_insert implementation (#170208)
The `bitfield_insert` function in the OpenCL C library had an incorrect
`__CLC_BODY` definition, that included the `.inc` file for the
`__clc_bitfield_insert` declaration instead of the correct
implementation. So, the function was not defined at all, leading to
linker errors when trying to use it.
[CIR] Upstream Emit the resume branch for cir.await op (#169864)
This PR upstreams the emission of the `cir.await` resume branch.
Handling the case where the return value of `co_await` is not ignored is
deferred to a future PR, which will be added once `co_return` is
upstreamed. Additionally, the `forLValue` variable is always `false` in
the current implementation. When support for emitting `coro_yield` is
added, this variable will be set to `true`, so that work is also
deferred to a future PR.