[LoopFusion] Remove DT edge from ExitBlock to ExitBlockSuc (#193641)
The exit block cannot be removed while it still has successors. If this
edge is not removed, applying the updates to the DT triggers the
following assertion:
"Assertion `Node->isLeaf() && "Node is not a leaf node."' failed"
The assertion does not always fire because, in some situations, the
"ApplyUpdates" function in "GenericDomTreeConstruction" runs
CalculateFromScratch before applying the updates:
    // Make unittests of the incremental algorithm work
    if (DT.DomTreeNodes.size() <= 100) {
      if (BUI.NumLegalized > DT.DomTreeNodes.size())
        CalculateFromScratch(DT, &BUI);
    } else if (BUI.NumLegalized > DT.DomTreeNodes.size() / 40)
      CalculateFromScratch(DT, &BUI);
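The quoted condition can be restated as a standalone predicate. This is a minimal sketch, not LLVM code: `recalculatesFromScratch` and its parameters are hypothetical names, with `numNodes` standing in for `DT.DomTreeNodes.size()` and `numLegalized` for `BUI.NumLegalized`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the quoted condition; not an LLVM API.
// Returns true when the quoted code would call CalculateFromScratch
// instead of applying the updates incrementally.
inline bool recalculatesFromScratch(std::size_t numNodes,
                                    std::size_t numLegalized) {
  // Small trees: rebuild only when there are more updates than nodes.
  if (numNodes <= 100)
    return numLegalized > numNodes;
  // Larger trees: rebuild once updates exceed 1/40 of the node count
  // (integer division, as in the quoted snippet).
  return numLegalized > numNodes / 40;
}
```

With 400 nodes, for example, 11 legalized updates cross the threshold while 10 do not, which is consistent with the assertion showing up only for some inputs.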
[LLD][COFF] Use lazy object mechanism instead of relying on the archive map for thin archives on ARM64EC (#194349)
On ARM64EC/ARM64X, an archive may contain both native and EC symbols in
the symbol table, which can potentially conflict. Regular archives
handle this using the extended archive format, which stores the EC
symbol table in a separate section, but this is not available for thin
archives.
Work around this limitation by lazily parsing all thin archive members
instead of relying on the archive symbol table. This uses the same
mechanism as when thin archive members are passed with
-start-lib/-end-lib, where symbols are added to the symbol table without
pulling in the object file unless it is referenced.
Fixing this at the archive format level would require changes to the
format. Currently, the ECSYMBOLS section is supported only by the COFF
archive format, while thin archives require the GNU format. We would
either need to extend the COFF format to support thin archives or
introduce ECSYMBOLS support in the GNU format.
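The lazy mechanism described above can be illustrated with a toy model. This is a sketch under stated assumptions, not LLD code: `ToyMember`, `ToySymbolTable`, `addLazy` and `reference` are hypothetical names, and the model ignores the native/EC distinction entirely. It only shows symbols being recorded without loading a member until one of them is referenced.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model, not LLD code; every name here is hypothetical.
struct ToyMember {
  std::string path;                 // e.g. a thin archive member on disk
  std::vector<std::string> defines; // symbols the member defines
  bool loaded = false;              // set once the member is pulled in
};

class ToySymbolTable {
public:
  // Register a member lazily: record its symbols without loading it.
  void addLazy(ToyMember &member) {
    for (const std::string &name : member.defines)
      m_lazy.emplace(name, &member); // the first definition wins
  }

  // A reference to `name` pulls in the defining member, if there is one.
  bool reference(const std::string &name) {
    auto it = m_lazy.find(name);
    if (it == m_lazy.end())
      return false;
    it->second->loaded = true;
    return true;
  }

private:
  std::map<std::string, ToyMember *> m_lazy;
};
```

In this model, registering two members and referencing a symbol from the first leaves the second unloaded, which is the behaviour the commit message attributes to -start-lib/-end-lib.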
[LLD][ELF] Fix performance regression when using linker scripts (#194668)
The addition of the support for `--enable-non-contiguous-regions` from
PR #90007 moved an "early out" condition in
`LinkerScript::computeInputSections()`. As a result, other relatively
expensive checks (`pat.sectionPat.match`, `cmd->matchesFile`,
`pat.excludesFile` and `flagsMatch`) could be performed unnecessarily
in the default situation where `--enable-non-contiguous-regions` is
disabled.
This fix restores the "early out" condition. It shows a ~14%
improvement for the Linux kernel benchmark link and has been seen to
improve performance by up to ~30% for a large UE5 link.
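The effect of moving an "early out" can be shown with a toy loop. This is a sketch, not LLD code: `ToySection`, `MatchStats`, `expensiveMatch` and `countMatches` are hypothetical, the `skip` flag stands in for whatever cheap condition the real code tests, and a prefix comparison stands in for the pattern and file checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not LLD code; every name here is hypothetical.
struct ToySection {
  std::string name;
  bool skip; // cheap condition under which the section is ignored
};

struct MatchStats {
  std::size_t matched = 0;
  std::size_t expensiveCalls = 0;
};

// Stand-in for the costly pattern checks; counts how often it runs.
inline bool expensiveMatch(const ToySection &sec, const std::string &prefix,
                           MatchStats &stats) {
  ++stats.expensiveCalls;
  return sec.name.compare(0, prefix.size(), prefix) == 0;
}

// earlyOut == true:  test the cheap condition first and skip the rest.
// earlyOut == false: run the expensive match first, then discard.
inline MatchStats countMatches(const std::vector<ToySection> &secs,
                               const std::string &prefix, bool earlyOut) {
  assert(!prefix.empty() && "toy model expects a non-empty prefix");
  MatchStats stats;
  for (const ToySection &sec : secs) {
    if (earlyOut && sec.skip)
      continue;
    if (!expensiveMatch(sec, prefix, stats))
      continue;
    if (sec.skip)
      continue;
    ++stats.matched;
  }
  return stats;
}
```

Both orderings report the same matches in this model; only the number of expensive calls differs, which is the kind of saving the commit message describes.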
[mlir] Update CODEOWNERS after x86 dialects refactoring (#194388)
The two separate x86 dialects ('amx' and 'x86vector') have been merged
into a single 'x86' dialect.
Relevant paths are updated accordingly.
Also, adding myself to the 'x86' dialect to enable notifications.
[libc++] Disable mistakenly enabled `optional<T&>` constructors for `optional<T>` (#194446)
Resolves #194415
- A constructor specifically meant for `optional<T&>` was left enabled
for `optional<T>`
- Fix it, and add a test to check for regression.
- This patch also corrects the constraints for `optional(optional<U>&)`
and `optional(const optional<U>&)`, as they were incorrectly
disallowing [valid conversions](https://godbolt.org/z/1r5Ea7z5M)
- Also, correct the `noexcept` specification.
- Add tests for both corrections.
[CIR] Emit target-cpu, target-features, and tune-cpu attrs on cir.func (#193458)
Add `getCPUAndFeaturesAttributes` to `CIRGenModule`, mirroring OGCG's
`GetCPUAndFeaturesAttributes`.
This sets `cir.target-cpu`, `cir.target-features` and `cir.tune-cpu`
string attributes on `cir.func`.
For AMDGPU, only features that differ from the target CPU's defaults are
emitted, matching OGCG.
[OMPT][OpenMP] Use omp_initial_device for host in callbacks (#192924)
The OpenMP specification offers different ways for identifying the host
device. While users of the OpenMP API can use `omp_get_initial_device()`
or the constant `omp_initial_device` (available since OpenMP v5.2), a
tool needs to rely on the `initial_device_num` passed by the OpenMP
runtime during the `initialize` callback.
In #134451, it was discovered that the `initial_device_num` passed is
always `0`, regardless of whether any devices are available for offload
execution. For host-only OpenMP code, this matches the result of
`omp_get_num_devices()`, and is a valid result. In the case of devices
being available though, this passed identifier is incorrect. While
`libomp` calls `omp_get_num_devices()`, `libomptarget` has not fully
initialized its PluginManager at that point, hence returning no
available devices. Tools relying on `initial_device_num` might therefore
incorrectly assume host-side execution when some code runs on a device.
Since the `ompt_get_num_devices()` entry point is also not fully
implemented, tools currently need to do on-the-fly handling for the host
[10 lines not shown]
[OpenMP][NFC] Update OpenMP Support doc for Tools Interface (#193173)
All enum values for OpenMP v5.1 are implemented.
Add entries for added and deprecated OpenMP Tools Interface features in
OpenMP v6.0.
Also fix link to PR for `transparent clause (hull tasks)`.
Signed-off-by: Jan André Reuter <j.reuter at fz-juelich.de>
[clang][NFC] Mark CWG2807 as implemented and add a test (#194755)
CWG2807 (https://wg21.link/cwg2807): One part of the standard correctly
said destructors can't be `consteval`, but another incorrectly said they
can be.
Clang diagnosed this in 9.0, for some reason started accepting it in
10.0, then went back to diagnosing in 11.0:
https://godbolt.org/z/6sWTYT38M. I've marked it as implemented since
11.0.
The issue that prompted the DR: #65665
[lldb] Implement delayed breakpoints
This patch changes the Process class so that it delays *physically*
enabling/disabling breakpoints until the process is about to
resume/detach/be destroyed, potentially reducing the packets transmitted
by batching all breakpoints together.
Most classes only need to know whether a breakpoint is "logically"
enabled, as opposed to "physically" enabled (i.e. the remote server has
actually enabled the breakpoint). However, lower-level classes, such as
derived Process classes or StopInfo, may need to know whether
the breakpoint was physically enabled. As such, this commit also adds an
"IsPhysicallyEnabled" API.
https://github.com/llvm/llvm-project/pull/192910
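The split between "logically" and "physically" enabled breakpoints can be sketched with a toy class. This is a minimal model, not LLDB code: apart from the "IsPhysicallyEnabled" name taken from the message above, `ToyProcess` and its methods are hypothetical, and a counter stands in for the packets sent to a remote server.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model, not LLDB code; names other than IsPhysicallyEnabled are
// hypothetical.
class ToyProcess {
public:
  // Record the requested state only; nothing is sent yet.
  void SetBreakpointEnabled(std::uint64_t addr, bool enabled) {
    m_logical[addr] = enabled;
  }

  bool IsLogicallyEnabled(std::uint64_t addr) const {
    return Lookup(m_logical, addr);
  }

  bool IsPhysicallyEnabled(std::uint64_t addr) const {
    return Lookup(m_physical, addr);
  }

  // Meant to run just before resume/detach/destroy: apply every pending
  // change as one batch. Returns the number of breakpoints that changed.
  std::size_t FlushBreakpoints() {
    std::size_t changed = 0;
    for (const auto &entry : m_logical) {
      bool &physical = m_physical[entry.first]; // defaults to false
      if (physical != entry.second) {
        physical = entry.second;
        ++changed;
      }
    }
    // Every logically known breakpoint now has a physical state.
    assert(m_physical.size() == m_logical.size());
    if (changed != 0)
      ++m_batches_sent;
    return changed;
  }

  std::size_t BatchesSent() const { return m_batches_sent; }

private:
  static bool Lookup(const std::map<std::uint64_t, bool> &states,
                     std::uint64_t addr) {
    auto it = states.find(addr);
    return it != states.end() && it->second;
  }

  std::map<std::uint64_t, bool> m_logical;
  std::map<std::uint64_t, bool> m_physical;
  std::size_t m_batches_sent = 0;
};
```

In this model, enabling two breakpoints and flushing once produces a single batch, and a disable followed by a re-enable before the next flush produces no traffic at all.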
[llvm-ir2vec] Place IR2Vec Python bindings in the tools/llvm-ir2vec/Bindings build directory (#194301)
## Place IR2Vec Python bindings `.so` in the Bindings build directory
Without an explicit output directory, CMake places the nanobind
extension module in `<build>/lib/`, alongside unrelated LLVM libraries.
- This change adds `set_target_properties` to redirect the output to
`<build>/tools/llvm-ir2vec/Bindings/`, keeping the Python bindings
isolated within their own tool's build tree. This mirrors MLIR's
convention, where Python extension modules are placed under
`<build>/tools/mlir/python_packages/` rather than the global `lib/`
directory.
- %llvm_lib_dir was pointing to build-llvm/lib but the .so actually
lives at build-llvm/tools/llvm-ir2vec/Bindings/. The tests were silently
[7 lines not shown]
[GlobalISel] skip type check when matching metadata operand (#191389)
Assisted-by: Claude Opus 4.6
---------
Co-authored-by: macurtis-amd <macurtis at amd.com>
[VPlan] Don't create sub(ext(mul(...))) partial reductions (#194660)
Currently, if a loop performs a sub(ext(mul(...))) reduction,
createPartialReductions tries to transform it into a partial
reduction but then crashes on an llvm_unreachable in
createPartialReductionExpression.
It looks like handling this in createPartialReductionExpression would
require adding a new expression recipe kind, so for now just don't try
to use a partial reduction so we avoid the crash.
Fixes #194000
[X86][FastISel] Restore support for struct returns (#194586)
After #180322, X86 FastISel forces SDAG fallback for any call with a
struct return. This caused major compile-time regressions for debug
builds in Rust, where struct returns are very common.
The type legality check should work on the de-aggregated types, not on
the return type directly.
[LLD][COFF] Move Archive::create call to LinkerDriver::addBuffer (NFC) (#194346)
This allows an upcoming change to Archive::create() to make decisions
based on the archive type.