[lldb-server] Implement support for MultiBreakpoint packet
This is fairly straightforward, thanks to the helper functions created
in the previous commit.
https://github.com/llvm/llvm-project/pull/192910
[DAG] visitAND - attempt to fold (and buildvector(), buildvector()) -> buildvector() (#193987)
See if we can fold all elements of an AND of buildvectors: AND(-1,X) -> X, AND(0,X) -> 0, etc.
Companion to #183032
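The per-element fold described above can be modelled outside of SelectionDAG as a rough sketch. Everything here is hypothetical and not the actual DAGCombiner code: `Lane`, `foldAndLane`, and `foldAndBuildVectors` are illustrative names, lanes are fixed at 32 bits, and folding two constant lanes together is an assumption covered only by the "etc." in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lane model: either a known constant or an opaque value id.
struct Lane {
  bool IsConst;
  uint32_t Value; // constant bits if IsConst, otherwise an opaque id
};

// Try to fold AND(A, B) for a single lane. Returns false if no fold applies.
bool foldAndLane(const Lane &A, const Lane &B, Lane &Out) {
  if (A.IsConst && B.IsConst) { // assumption: plain constant folding
    Out = Lane{true, A.Value & B.Value};
    return true;
  }
  if (A.IsConst && A.Value == 0) { Out = A; return true; }          // AND(0, X) -> 0
  if (B.IsConst && B.Value == 0) { Out = B; return true; }          // AND(X, 0) -> 0
  if (A.IsConst && A.Value == UINT32_MAX) { Out = B; return true; } // AND(-1, X) -> X
  if (B.IsConst && B.Value == UINT32_MAX) { Out = A; return true; } // AND(X, -1) -> X
  return false;
}

// Fold only if every lane folds; otherwise leave the AND in place.
bool foldAndBuildVectors(const std::vector<Lane> &L, const std::vector<Lane> &R,
                         std::vector<Lane> &Out) {
  assert(L.size() == R.size() && "build vectors must have equal lane counts");
  std::vector<Lane> Result;
  for (std::size_t I = 0; I < L.size(); ++I) {
    Lane Folded;
    if (!foldAndLane(L[I], R[I], Folded))
      return false;
    Result.push_back(Folded);
  }
  Out = Result;
  return true;
}
```

The all-or-nothing check mirrors the "fold all elements" wording: a single lane that cannot be simplified keeps the original AND.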
[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint
This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.
https://github.com/llvm/llvm-project/pull/192910
[lldb] Implement delayed breakpoints
This patch changes the Process class so that it delays *physically*
enabling/disabling breakpoints until the process is about to
resume/detach/be destroyed, potentially reducing the packets transmitted
by batching all breakpoints together.
Most classes only need to know whether a breakpoint is "logically"
enabled, as opposed to "physically" enabled (i.e. the remote server has
actually enabled the breakpoint). However, lower level classes like
derived Process classes, or StopInfo may actually need to know whether
the breakpoint was physically enabled. As such, this commit also adds a
"IsPhysicallyEnabled" API.
https://github.com/llvm/llvm-project/pull/192910
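As a minimal sketch of the logical/physical split described above, the model below records requests immediately and only applies them in one batch when flushed. It is not the lldb implementation: `DelayedBreakpointModel`, `Flush`, and `PacketsSent` are hypothetical names, and only `IsPhysicallyEnabled` echoes the API named in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

typedef uint64_t Addr;

// Hypothetical stand-in for a process that delays breakpoint updates.
class DelayedBreakpointModel {
public:
  unsigned PacketsSent = 0; // counts batched updates, one per non-empty flush

  // Record the request only; nothing is sent to the (imaginary) server yet.
  void SetEnabled(Addr A, bool Enable) { Logical[A] = Enable; }

  bool IsEnabled(Addr A) const {
    std::map<Addr, bool>::const_iterator It = Logical.find(A);
    return It != Logical.end() && It->second;
  }

  bool IsPhysicallyEnabled(Addr A) const {
    std::map<Addr, bool>::const_iterator It = Physical.find(A);
    return It != Physical.end() && It->second;
  }

  // Assumed to run right before resume/detach/destroy. Applies every pending
  // change in a single batch and returns how many sites changed state.
  std::size_t Flush() {
    std::vector<std::pair<Addr, bool> > Batch;
    for (std::map<Addr, bool>::const_iterator It = Logical.begin();
         It != Logical.end(); ++It)
      if (IsPhysicallyEnabled(It->first) != It->second)
        Batch.emplace_back(It->first, It->second);
    for (std::size_t I = 0; I < Batch.size(); ++I) {
      Physical[Batch[I].first] = Batch[I].second;
      assert(IsPhysicallyEnabled(Batch[I].first) == Batch[I].second);
    }
    if (!Batch.empty())
      ++PacketsSent;
    return Batch.size();
  }

private:
  std::map<Addr, bool> Logical;  // what the user asked for
  std::map<Addr, bool> Physical; // what the server has actually applied
};
```

In this model, a breakpoint that is enabled and then disabled before the next flush never reaches the server, which is where the packet savings would come from.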
[lldb][NFC] Move BreakpointSite::IsEnabled/SetEnabled into Process
The Process class is the one responsible for managing the state of a
BreakpointSite inside the process. As such, it should be the one
answering questions about the state of the site.
https://github.com/llvm/llvm-project/pull/192910
[MC] Take MCAsmInfo by reference in MCContext and TargetMachine. NFC (#194280)
Both MCContext::MCContext and TargetMachine::getMCAsmInfo treat
MCAsmInfo as a pointer that must be non-null. Make the contract
explicit:
* MCContext's constructor takes `const MCAsmInfo &MAI`.
* TargetMachine::getMCAsmInfo returns `const MCAsmInfo &`.
Make this change now since the MCContext ctor has recently been updated.
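The contract change can be illustrated with stand-in types. This is only a sketch: `AsmInfo`, `ContextBefore`, and `ContextAfter` are hypothetical and are not the real MCAsmInfo/MCContext classes. It shows how a reference parameter moves the non-null requirement from a runtime assert into the type.

```cpp
#include <cassert>

// Hypothetical stand-in for an assembler-info object.
struct AsmInfo {
  int PointerSize;
};

// Before: pointer parameter, non-null only by convention plus an assert.
class ContextBefore {
public:
  explicit ContextBefore(const AsmInfo *MAI) : MAI(MAI) {
    assert(MAI && "AsmInfo must be non-null");
  }
  const AsmInfo *getAsmInfo() const { return MAI; }

private:
  const AsmInfo *MAI;
};

// After: reference parameter, so a null argument cannot be expressed.
class ContextAfter {
public:
  explicit ContextAfter(const AsmInfo &MAI) : MAI(MAI) {}
  const AsmInfo &getAsmInfo() const { return MAI; }

private:
  const AsmInfo &MAI;
};
```

Callers of the reference form write `Ctx.getAsmInfo().PointerSize` without a null check, at the cost of having to guarantee that the referenced object outlives the context.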
[flang] only instantiate required symbols from parent modules (#193689)
Currently, lowering instantiates (creates fir.address_of/hlfir.declare
for) all module variables from the host module and submodules. For
instance, in the new host_module_variable_instantiation.f90 test, a
fir.address_of was generated for the unused var2 inside the procedure
foo.
This created a lot of noise (and in the worst cases, compile time
performance issues), and also some extra complexity, at least for
OpenACC, where the IR acc routine ended up referencing globals that are
not actually needed, creating the need to copy them to the GPU or to
have custom logic to ignore the globals.
This patch addresses this by doing a visit of the parse tree to detect
the required symbols and only instantiate those.
libcxxabi: declare __gnu_unwind_frame in cxa_personality (#189787)
ARM EHABI builds of libcxxabi fail with clang-22+ because
cxa_personality.cpp calls __gnu_unwind_frame without a visible
declaration, triggering:
error: use of undeclared identifier '__gnu_unwind_frame'
Add an extern "C" forward declaration before the EHABI unwind helper so
the source compiles correctly.
Signed-off-by: Khem Raj <khem.raj at oss.qualcomm.com>
[flang][OpenMP] Support lowering of metadirective (part 1)
This patch implements the following features in metadirective:
- implementation={vendor(...)}
- device={kind(...), isa(...), arch(...)}
- user={condition(<constant-expr>)}
- construct={parallel, target, teams}
- default, nothing, and otherwise clauses
Dynamic user conditions, target_device, and loop-associated
variants are deferred to follow-up patches.
This patch is part of the feature work for #188820.
Assisted with copilot and GPT-5.4
[CIR][AMDGPU] Add lowering for amdgcn_div_scale builtins (#192931)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/2050
This PR adds support for lowering the __builtin_amdgcn_div_scale* AMDGPU
builtins to ClangIR.
The lowering follows the reference Clang-to-LLVM-IR lowering in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.
[flang][OpenMP] Support lowering of metadirective (part 1)
This patch implements the following features in metadirective:
- implementation={vendor(...)}
- device={kind(...), isa(...), arch(...)}
- user={condition(<constant-expr>)}
- construct={parallel, target, teams}
- default, nothing, and otherwise clauses
Dynamic user conditions and loop-associated variants are deferred
to follow-up patches.
This patch is part of the feature work for #188820.
Assisted with copilot and GPT-5.4
[NVPTX] Improve error diagnostic when handling unknown intrinsics (#191194)
Following up on #146726, it may be desirable to gracefully fail the
compilation in the presence of unknown NVVM intrinsics, which
cannot be lowered by the NVPTX backend, rather than silently
emitting invalid PTX.
[LoongArch] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector (#164943)
On 64-bit targets the generic legalizer will use an i64 load and a
scalar_to_vector for us. But on 32-bit targets, i64 isn't legal, and the
generic legalizer will end up emitting two 32-bit loads. This patch uses
an f64 load instead, avoiding both the split and the redundant int->fp
conversion.
[MLIR][OpenMP] Post-translate declare-target USM indirection in OpenMPIRBuilder
When lowering OpenMP to LLVM IR for the target device, record pairs of the
`declare target` device global and the OMPIRBuilder "ref" pointer global
(used for unified shared memory) via `OpenMPIRBuilder`. During the
`OpenMPIRBuilder::finalize` pass, run a postpass that rewrites remaining uses of the
original global to load from the ref global and adjust the pointer (shared
path for `ConstantExpr` addrspace/bitcast chains and for direct
instruction uses).
This follows what is done by clang for similar cases:
https://reviews.llvm.org/D63108.
Co-authored-by: Composer
Co-authored-by: Gemini Pro
[Flang][OpenMP] Clear close on descriptor members for box parents in USM
Extend the MapInfoFinalization walk introduced in #185330 so
parent/member close consistency is enforced whenever
unified_shared_memory is in effect, not only when the parent map's
variable is a fir.RecordType. Allocatable (box) roots expand to member
maps the same way as derived-type instances; getDescriptorMapType may
add OMP_MAP_CLOSE to implicit descriptor members while the parent map
does not set close, which led to bad device behavior under
-fopenmp-force-usm with multiple mapped allocatables.
Co-authored-by: Composer (Cursor) <ai at cursor.com>
[LoopFusion][NFC] UTC gen some tests (#193755)
Some variables need renaming because UTC normalizes IR value names.
Also, remove the dead variables `%M` and `%N` from
`double_loop_nest_inner_guard.ll`.
AMDGPU: Back-propagate wqm for sources of side-effect instruction (#193395)
A readfirstlane instruction returns an undefined value if exec is zero.
To handle the case where only helper lanes execute the parent block, we
let the readfirstlane execute under wqm. But this is not enough. If the
parent block is also executed by non-helper lanes, we also need to make
sure the sources of the readfirstlane are calculated under wqm.
Otherwise, if the instruction that generates a source of the
readfirstlane is executed under exact mode, the value contains garbage
data in the helper lanes, and that garbage may be returned by the
readfirstlane running under wqm.
To fix this issue, we need to enforce the back-propagation of wqm for
instructions like readfirstlane. Previously, this was only done if the
instruction was possibly in the middle of a wqm region (by checking
OutNeeds).
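The back-propagation idea can be sketched as a worklist walk over a toy def-use graph. This is a simplified illustration, not the SIWholeQuadMode pass: `Inst`, `NeedsWQM`, and `backPropagateWQM` are hypothetical names, and instructions are referred to by their index in a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction: the indices of the instructions that define its
// sources, and a flag saying whether it must execute under wqm.
struct Inst {
  std::vector<int> Sources;
  bool NeedsWQM;
};

// Mark Root (e.g. a readfirstlane) and, transitively, every instruction that
// produces one of its sources as needing wqm.
void backPropagateWQM(std::vector<Inst> &Insts, int Root) {
  std::vector<int> Worklist;
  Worklist.push_back(Root);
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    assert(I >= 0 && static_cast<std::size_t>(I) < Insts.size());
    if (Insts[I].NeedsWQM)
      continue; // already visited
    Insts[I].NeedsWQM = true;
    for (std::size_t S = 0; S < Insts[I].Sources.size(); ++S)
      Worklist.push_back(Insts[I].Sources[S]);
  }
}
```

Instructions that do not feed the root keep their flag clear, so they remain free to run in exact mode.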