[libc++] Disable mistakenly enabled `optional<T&>` constructors for `optional<T>` (#194446)
Resolves #194415
- A constructor specifically meant for `optional<T&>` was left enabled
for `optional<T>`
- Fix it, and add a test to check for regression.
- This patch also corrects the constraints for `optional(optional<U>&)`
and `optional(const optional<U>&)`, as they were incorrectly
disallowing [valid conversions](https://godbolt.org/z/1r5Ea7z5M)
- Also, correct the `noexcept` specification.
- Add tests for both corrections.
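As a rough illustration of the kind of constraint involved, the sketch below shows a constructor that should only exist for a reference-holding wrapper and is removed from overload resolution otherwise. `ToyOptional` and `BindTag` are hypothetical names invented for this example; this is not libc++'s implementation, and the real `optional<T&>` constraints are more involved.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical tag type, used only to give the constructor a distinct shape.
struct BindTag {};

// Toy wrapper for illustration; this is not libc++'s std::optional.
template <class T>
class ToyOptional {
public:
    ToyOptional() = default;

    // Reference-only constructor: it participates in overload resolution only
    // when T is an lvalue reference type that U& converts to.
    template <class U,
              std::enable_if_t<std::is_lvalue_reference_v<T> &&
                                   std::is_convertible_v<U&, T>,
                               int> = 0>
    explicit ToyOptional(BindTag, U& u) noexcept {
        T ref = u;                   // T is an lvalue reference here
        ptr_ = std::addressof(ref);  // remember the referent, do not copy it
    }

    bool has_value() const noexcept { return ptr_ != nullptr; }
    std::remove_reference_t<T>& get() const noexcept { return *ptr_; }

private:
    std::remove_reference_t<T>* ptr_ = nullptr;
};

// The constructor exists for the reference form and is disabled for the value form.
static_assert(std::is_constructible_v<ToyOptional<int&>, BindTag, int&>);
static_assert(!std::is_constructible_v<ToyOptional<int>, BindTag, int&>);

// Binds to a local, mutates the local, and checks the wrapper observes the change.
inline bool demoReferenceBinding() {
    int x = 5;
    ToyOptional<int&> o(BindTag{}, x);
    x = 7;
    return o.has_value() && o.get() == 7;
}
```

A regression test in this style asserts the negative case (`!std::is_constructible_v<...>`) so that the constructor cannot silently become available again for the value form.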
[CIR] Emit target-cpu, target-features, and tune-cpu attrs on cir.func (#193458)
Add `getCPUAndFeaturesAttributes` to `CIRGenModule`, mirroring OGCG's
`GetCPUAndFeaturesAttributes`.
This sets `cir.target-cpu`, `cir.target-features` and `cir.tune-cpu`
string attributes on `cir.func`.
For AMDGPU, only features that differ from the target CPU's defaults are
emitted, matching OGCG.
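The AMDGPU filtering rule can be pictured with a small sketch: drop every feature that the CPU already implies and join the rest into one comma-separated string. The function name `nonDefaultFeatures` and the feature strings used below are made up for illustration; the actual logic lives in `getCPUAndFeaturesAttributes` and may differ in detail.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep only the requested features that are not already
// part of the CPU's default set, and join them with commas.
inline std::string nonDefaultFeatures(const std::vector<std::string>& requested,
                                      const std::set<std::string>& cpuDefaults) {
    std::string out;
    for (const std::string& feature : requested) {
        if (cpuDefaults.count(feature))
            continue;  // implied by the target CPU, so not emitted
        if (!out.empty())
            out += ',';
        out += feature;
    }
    return out;
}
```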
[OMPT][OpenMP] Use omp_initial_device for host in callbacks (#192924)
The OpenMP specification offers different ways for identifying the host
device. While users of the OpenMP API can use `omp_get_initial_device()`
or the constant `omp_initial_device` (available since OpenMP v5.2), a
tool needs to rely on the `initial_device_num` passed by the OpenMP
runtime during the `initialize` callback.
In #134451, it was discovered that the `initial_device_num` passed is
always `0`, regardless of whether any devices are available for offload
execution. For host-only OpenMP code, this matches the result of
`omp_get_num_devices()`, and is a valid result. In the case of devices
being available though, the passed identifier is incorrect. Although
`libomp` calls `omp_get_num_devices()`, `libomptarget` has not fully
initialized its PluginManager at that point, so the call reports no
available devices. Tools relying on `initial_device_num` might therefore
incorrectly assume host-side execution when some code runs on a device.
Since the `ompt_get_num_devices()` entry point is also not fully
implemented, tools currently need to do on-the-fly handling for the host
[10 lines not shown]
[OpenMP][NFC] Update OpenMP Support doc for Tools Interface (#193173)
All enum values for OpenMP v5.1 are implemented.
Add entries for added and deprecated OpenMP Tools Interface features in
OpenMP v6.0.
Also fix link to PR for `transparent clause (hull tasks)`.
Signed-off-by: Jan André Reuter <j.reuter at fz-juelich.de>
[clang][NFC] Mark CWG2807 as implemented and add a test (#194755)
CWG2807 (https://wg21.link/cwg2807): One part of the standard correctly
said destructors can't be `consteval`, but another incorrectly said they
can be.
Clang diagnosed this in 9.0, for some reason started accepting it in
10.0, then went back to diagnosing in 11.0:
https://godbolt.org/z/6sWTYT38M. I've marked it as implemented since
11.0.
The issue that prompted the DR: #65665
[lldb] Implement delayed breakpoints
This patch changes the Process class so that it delays *physically*
enabling/disabling breakpoints until the process is about to
resume/detach/be destroyed, potentially reducing the packets transmitted
by batching all breakpoints together.
Most classes only need to know whether a breakpoint is "logically"
enabled, as opposed to "physically" enabled (i.e. the remote server has
actually enabled the breakpoint). However, lower level classes like
derived Process classes, or StopInfo may actually need to know whether
the breakpoint was physically enabled. As such, this commit also adds an
"IsPhysicallyEnabled" API.
https://github.com/llvm/llvm-project/pull/192910
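The logical/physical split described above can be sketched with a toy model. Everything here (`ToyRemote`, `ToyProcess`, and the method names) is hypothetical and only mimics the idea; it is not the LLDB `Process` API. Enable and disable requests only update local state, and the difference against what the server has applied is sent as one batch on resume.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical stand-in for a remote debug server that counts packets.
struct ToyRemote {
    int packets = 0;
    std::map<Addr, bool> enabled;  // what the server has actually applied

    void sendBatch(const std::vector<std::pair<Addr, bool>>& changes) {
        if (changes.empty())
            return;  // nothing to send
        ++packets;   // one packet for the whole batch
        for (const auto& change : changes)
            enabled[change.first] = change.second;
    }
};

class ToyProcess {
public:
    explicit ToyProcess(ToyRemote& remote) : remote_(remote) {}

    // These only record intent; no packet is sent yet.
    void enableBreakpoint(Addr a) { logical_[a] = true; }
    void disableBreakpoint(Addr a) { logical_[a] = false; }

    bool isLogicallyEnabled(Addr a) const { return lookup(logical_, a); }
    bool isPhysicallyEnabled(Addr a) const { return lookup(remote_.enabled, a); }

    // Pending changes are flushed right before the process runs again.
    void resume() { flush(); }

private:
    static bool lookup(const std::map<Addr, bool>& m, Addr a) {
        auto it = m.find(a);
        return it != m.end() && it->second;
    }

    // Send only the breakpoints whose logical state differs from the
    // state the server currently has.
    void flush() {
        std::vector<std::pair<Addr, bool>> changes;
        for (const auto& entry : logical_)
            if (lookup(remote_.enabled, entry.first) != entry.second)
                changes.emplace_back(entry.first, entry.second);
        remote_.sendBatch(changes);
    }

    ToyRemote& remote_;
    std::map<Addr, bool> logical_;
};
```

In this model a breakpoint that is enabled and then disabled before the next resume never reaches the server at all, which is where the packet savings come from.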
[llvm-ir2vec] Place IR2Vec Python bindings in the tools/llvm-ir2vec/Bindings build directory (#194301)
## Place IR2Vec Python bindings `.so` in the Bindings build directory
Without an explicit output directory, CMake places the nanobind
extension module in `<build>/lib/`, alongside unrelated LLVM libraries.
- This change adds `set_target_properties` to redirect the output to
`<build>/tools/llvm-ir2vec/Bindings/`, keeping the Python bindings
isolated within their own tool's build tree. This mirrors MLIR's
convention, where Python extension modules are placed under
`<build>/tools/mlir/python_packages/` rather than the global `lib/`
directory.
- %llvm_lib_dir was pointing to build-llvm/lib but the .so actually
lives at build-llvm/tools/llvm-ir2vec/Bindings/. The tests were silently
[7 lines not shown]
[GlobalISel] skip type check when matching metadata operand (#191389)
Assisted-by: Claude Opus 4.6
---------
Co-authored-by: macurtis-amd <macurtis at amd.com>
[VPlan] Don't create sub(ext(mul(...))) partial reductions (#194660)
Currently, if a loop performs a sub(ext(mul(...))) reduction,
createPartialReductions tries to transform it into a partial reduction
and then crashes by hitting an llvm_unreachable in
createPartialReductionExpression.
It looks like handling this in createPartialReductionExpression would
require adding a new expression recipe kind, so for now just don't try
to use a partial reduction so we avoid the crash.
Fixes #194000
[X86][FastISel] Restore support for struct returns (#194586)
After #180322, X86 FastISel forces SDAG fallback for any call with a
struct return. This caused major compile-time regressions for debug
builds in Rust, where struct returns are very common.
The type legality check should work on the de-aggregated types, not on
the return type directly.
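To make the distinction concrete, here is a toy model of checking legality on the flattened leaves of a return type instead of on the aggregate itself. `ToyType`, `flatten`, `isLegalScalar` and `canFastSelectReturn` are hypothetical names, and the set of "legal" widths is an assumption chosen for the example; none of this is the actual X86 FastISel code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy type model: a scalar has a bit width, a struct has fields.
struct ToyType {
    unsigned bits = 0;            // scalar width; unused for structs
    std::vector<ToyType> fields;  // non-empty means "struct" in this toy
    bool isAggregate() const { return !fields.empty(); }
};

inline ToyType scalar(unsigned bits) { return ToyType{bits, {}}; }
inline ToyType structOf(std::vector<ToyType> fields) {
    return ToyType{0, std::move(fields)};
}

// Recursively collect the scalar leaves of a (possibly nested) struct.
inline void flatten(const ToyType& t, std::vector<unsigned>& leaves) {
    if (!t.isAggregate()) {
        leaves.push_back(t.bits);
        return;
    }
    for (const ToyType& field : t.fields)
        flatten(field, leaves);
}

// Assumed set of legal scalar widths, for this example only.
inline bool isLegalScalar(unsigned bits) {
    return bits == 8 || bits == 16 || bits == 32 || bits == 64;
}

// A struct return is acceptable when every flattened leaf is legal, even
// though the struct as a whole is not itself a legal scalar type.
inline bool canFastSelectReturn(const ToyType& ret) {
    std::vector<unsigned> leaves;
    flatten(ret, leaves);
    for (unsigned bits : leaves)
        if (!isLegalScalar(bits))
            return false;
    return true;
}
```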
[LLD][COFF] Move Archive::create call to LinkerDriver::addBuffer (NFC) (#194346)
This allows an upcoming change to Archive::create() to make decisions
based on the archive type.
[MLIR][GPU] Add cooperative launch support to gpu.launch_func (#190639)
Add a `cooperative` UnitAttr to `gpu.launch_func` that enables
cooperative kernel launch semantics. Cooperative launches guarantee that
all thread blocks in the grid are co-resident on the GPU simultaneously,
enabling grid-wide synchronization patterns.
## Implementation
When `cooperative` is set (with or without cluster sizes), the lowering
emits a call to the new `mgpuLaunchKernelCooperative` runtime function,
which uses `cuLaunchKernelEx` with a `CUlaunchConfig` and
`CU_LAUNCH_ATTRIBUTE_COOPERATIVE`. This API is guarded behind
`CUDA_VERSION >= 12000`. The HIP path funnels through
`hipModuleLaunchCooperativeKernel`.
## Changes
- **GPUOps.td**: add `cooperative` UnitAttr and assembly format keyword
[17 lines not shown]