[flang][cuda] Fix ignore_tkr(m) to also cover CUDA unified attribute (#192131)
The ignore_tkr(m) directive suppresses CUDA managed attribute checking
on dummy arguments, but it was not covering the unified attribute. This
caused a spurious error when passing a plain host array to a unified
dummy with ignore_tkr(m):
```
error: dummy argument 'x=' has ATTRIBUTES(UNIFIED) but its associated actual argument has no CUDA data attribute
```
Extend the IgnoreTKR::Managed check in AreCompatibleCUDADataAttrs to
accept Unified in addition to Managed and no-attribute.
[OpenMP] Create check-openmp target for device targets (#192175)
offload/cmake/caches/AMDGPUBot.cmake enables
RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES="openmp". In that
sub-build, check-openmp target doesn't exist and there is build error
`unknown target 'check-openmp'` after 18f63d1375d0, which makes
top-level check-openmp depend on check-openmp-amdgcn-amd-amdhsa.
In openmp, the device targets only call add_subdirectory(device), which
doesn't calls construct_check_openmp_target() and check-openmp target
doesn't exist. `ninja check-openmp-amdgcn-amd-amdhsa` also fails with
the same error before 18f63d1375d0.
Fix by adding construct_check_openmp_target() for device targets as well.
Assisted-by: Claude Sonnet 4.6
[SPIRV]Implementing PopCount for 16 and 64 bits (#191283)
`OpBitCount` only supports 32bit types. So this patch modifies the
codegen to follow a similar pattern as `firstbithigh` and `firstbitlow`.
On 8 and 16 bits, the parameters are zero-extended to 32 bits. With 64
bits it is bitcasting into 2xi32 types. The logic is adapted to larger
component counts as well.
Fix: https://github.com/llvm/llvm-project/issues/142677
---------
Co-authored-by: Joao Saffran <jderezende at microsoft.com>
Make GSYM 64 bit safe and add a new version 2 of the GSYM files (#190353)
# Motivation
GSYM files are approaching the need for 64 bit offsets in the GSYM
files. We also want to add more global data to GSYM files. Right now the
GSYM file format is:
```
Header
AddressOffsets
AddressInfoOffsets
FileTable
StringTable
FunctionInfos
```
The location of the `AddressOffsets`, `AddressInfoOffsets` and
`FileTable` are always immediately following the Header. The
`StringTable` is pointed to by the header and the header uses 32 bit
integers for the string table file offset and file size. The
[74 lines not shown]
[LSR][IndVarSimplify] Update assertion message (#192168)
rewriteLoopExitValues is called by both LSR and IndVarSimplify. Update
the assertion message to match this reality rather than only mentioning
IndVarSimplify.
[runtimes] Aggregate per-target runtime checks in top-level check-${runtime_name} (#191743)
When a per-target runtime build exports a
check-${runtime_name}-${target} proxy, make the top-level
check-${runtime_name} target depend on it, creating
check-${runtime_name} on demand (it may not exist).
This applies regardless of whether the runtime comes from the default
LLVM_ENABLE_RUNTIMES set or from a target-specific
RUNTIMES_<target>_LLVM_ENABLE_RUNTIMES override.
This allows a single `check-${runtime_name}` command to trigger all
per-target tests for that runtime.
[lldb] Fix deadlock when scripted frame providers load on private state thread (#191913)
Frame providers are an overlay on top of the parent reality (the
unwinder stack). The private state thread (PST) manages the stop of that
parent reality, so the correct view for PST logic IS the parent --
providers should only be applied once the process has settled and
clients query the stopped state.
When a scripted breakpoint's `was_hit` callback calls
`EvaluateExpression` on the PST, `RunThreadPlan` spawns an override PST
(Thread B) and reassigns `m_current_private_state_thread_sp` to it. Two
threads then need to see parent frames:
- Thread B (override PST): processes stop events via
`HandlePrivateEvent` -> `ShouldStop` -> `GetStackFrameList`. If it loads
a provider, the provider's Python code can acquire locks held by Thread
A, causing a deadlock.
- Thread A (original PST): processes events inline via
[25 lines not shown]
[BOLT] Fix DW_FORM_implicit_const values lost during DWARF5 rewriting
Summary:
Fix two bugs in DIEBuilder that caused DW_FORM_implicit_const values to
be zeroed out when rewriting DWARF5 debug sections (--update-debug-sections).
1. In constructDIEFast(), DWARFFormValue was constructed with just the
form code, leaving the value at 0. For DW_FORM_implicit_const,
extractValue() is a no-op since the value is expected to be pre-set.
Fix: use AttrSpec.getFormValue() which initializes the value from the
abbreviation table.
2. In assignAbbrev(), AddAttribute(Attr.getAttribute(), Attr.getForm())
used the two-argument overload which discards the implicit_const
value. Fix: use AddAttribute(Attr) to copy the full DIEAbbrevData.
Fixes https://github.com/llvm/llvm-project/issues/192084
[AMDGPU] Add object linking support for LDS and named barrier lowering in the middle end (#191645)
This is the first patch in a series introducing object linking support
for AMDGPU.
This PR adds the `-amdgpu-enable-object-linking` flag to enable object
linking in the backend. It also updates the `AMDGPULowerModuleLDSPass`
and `AMDGPULowerExecSync` passes to support lowering LDS and named
barrier globals when object linking is enabled.
[RISCV][GISel] Use a single FEQ for fcmp ord/uno x, x (#192022)
When both operands of an ORD/UNO compare are the same register,
the double-FEQ + AND sequence is redundant: a single FEQ x, x
gives the same result. Addresses the FIXME in selectFCmp.
[LV][VPlan] Print VPlan after construction and initial optimizations. NFC (#187443)
This patch add a helper pass `printOptimizedVPlan` to print the plan at
the end of the VPlan construction and optimize pipeline.
This patch enables the opportunity that we can further clamp and attach
VF range after `VPlanTransforms::optimize` and not changing the test
printing (#172799).
[CMake] Pass ZLIB_LIBRARY_* to runtimes bootstrap (#191555)
Runtimes external project (compiler-rt / combined runtimes) reconfigures
with an initial cache that did not propagate `ZLIB_LIBRARY_RELEASE`.
CMake 4.x `FindZLIB` may leave `ZLIB_LIBRARY` unset while finding
headers, leading to:
```
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY) (found version "...")
```
and later when loading LLVM exports from the main build:
```
The link interface of target "LLVMSupport" contains: ZLIB::ZLIB
but the target was not found.
```
This was found by building the Windows installer with:
```
llvm\utils\release\build_llvm_release.bat --x64 --version 23.0.0 --skip-checkout --local-python
```
[flang][CodeGen] Fix address space mismatch for CUF globals in AddrOfOpConversion (#190408)
AddrOfOpConversion in CodeGen.cpp only handled `LLVM::GlobalOp` when
determining the address space for `llvm.mlir.addressof`. When the global
was still a `fir::GlobalOp` (not yet converted), it fell back to address
space 0, breaking CUF constant globals (addr_space 4) and AMDGPU targets
(global addr_space 1).
This extends the upstream fix (#192111, which only covered Constant) to
also handle Shared and Managed CUF data attributes, and returns
`std::nullopt` instead of 0 for non-CUF globals so the target's default
address space is preserved.