[AMDGPU] This reverts patches to use fp16 inline constants for i16 (#200091)
Patches reverted:
commit c315c662cd2d33e0c7f962fed742ee53626d8005
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
Date: Wed May 27 12:51:13 2026
[AMDGPU] Fix codesize estimate after #198005 (#200033)
This fixes a failure in the libc tests, which check the exact encoding
size. The encoding is now shorter, but the size estimate did not recognize
fp16 immediates as inlinable constants and assumed a literal encoding.
Shorter encodings were created here:
https://github.com/llvm/llvm-project/pull/198005
commit 2b3bc03b5ef00e7eaa245420ca981c700e1c05c4
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
[15 lines not shown]
[alpha.webkit.UncountedLocalVarsChecker] Detect an assignment to a guardian argument (#198695)
A function parameter of type RefPtr<T>& should not be used as a guardian
variable of a raw pointer/reference variable if the function body
contains an assignment to it since such an assignment can shorten the
lifetime of the guarded object.
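The hazard can be sketched as follows. `Node` and `NodePtr` are hypothetical stand-ins for a ref-counted class and WebKit's `RefPtr`, reduced to the minimum needed here; they are not the real WTF types, and the function name is illustrative only.

```cpp
#include <cassert>
#include <utility>

// Hypothetical ref-counted object. The flag lets a caller observe destruction
// without touching the freed object.
struct Node {
    int refCount = 0;
    bool* destroyedFlag = nullptr;
    ~Node() { if (destroyedFlag) *destroyedFlag = true; }
};

// Hypothetical minimal smart pointer (assumption: not WebKit's RefPtr).
class NodePtr {
public:
    NodePtr() = default;
    explicit NodePtr(Node* p) : m_ptr(p) { if (m_ptr) ++m_ptr->refCount; }
    NodePtr(const NodePtr& o) : m_ptr(o.m_ptr) { if (m_ptr) ++m_ptr->refCount; }
    NodePtr& operator=(const NodePtr& o) {
        NodePtr tmp(o);               // copy-and-swap: tmp releases the old value
        std::swap(m_ptr, tmp.m_ptr);
        return *this;
    }
    ~NodePtr() { if (m_ptr && --m_ptr->refCount == 0) delete m_ptr; }
    Node* get() const { return m_ptr; }
private:
    Node* m_ptr = nullptr;
};

// `guardian` looks like it keeps `raw` alive for the whole function body, but
// the assignment drops the last reference, so `raw` dangles afterwards.
// Returns true if the object `raw` pointed to was destroyed by the assignment.
bool assignmentShortensLifetime(NodePtr& guardian, bool& destroyed) {
    Node* raw = guardian.get();       // raw pointer "guarded" by the parameter
    raw->destroyedFlag = &destroyed;
    guardian = NodePtr();             // releases the guarded object
    return destroyed;                 // `raw` must not be dereferenced past here
}
```

When the caller holds the only reference, the function returns true: the raw pointer outlived the object it was supposed to be guarded by.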
test(llvm-symbolizer): fix Wasm layering violation by using YAML (#200080)
Avoid using wasm-ld in LLVM tests by prebuilding the test binary
as a YAML file and using yaml2obj at test time.
This matches the approach taken in
4bce216e6b550c770f2e536422c3d95333f65ba3.
Because yaml2obj always uses 5-byte LEBs, the CODE section offset
shifted from 0x37 to 0x4b, so the file offsets passed to llvm-symbolizer
were updated accordingly.
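As a rough illustration of why fixed-width LEBs move offsets (0x4b - 0x37 = 20 bytes here), the sketch below contrasts a minimal ULEB128 encoding with one padded to five bytes. The function names are made up for this example and are not yaml2obj's or LLVM's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal ULEB128: emit only as many bytes as the value needs.
std::vector<uint8_t> encodeULEB128(uint32_t value) {
    std::vector<uint8_t> out;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) byte |= 0x80;   // continuation bit: more bytes follow
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Padded ULEB128: always emit exactly `width` bytes. Every byte except the
// last carries the continuation bit, even when its payload is zero.
// Assumes the value fits in 7 * width bits.
std::vector<uint8_t> encodePaddedULEB128(uint32_t value, unsigned width = 5) {
    std::vector<uint8_t> out;
    for (unsigned i = 0; i < width; ++i) {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (i + 1 < width) byte |= 0x80;
        out.push_back(byte);
    }
    return out;
}

// Decoder that accepts both forms, since padding does not change the value.
uint32_t decodeULEB128(const std::vector<uint8_t>& bytes) {
    uint32_t result = 0;
    unsigned shift = 0;
    for (uint8_t b : bytes) {
        result |= static_cast<uint32_t>(b & 0x7f) << shift;
        shift += 7;
        if ((b & 0x80) == 0) break;
    }
    return result;
}
```

Both encodings decode to the same value, but each padded field that previously fit in one byte occupies four more bytes, so everything after it in the file sits at a higher offset.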
Replaces #200046
Assisted-by: Gemini
[AMDGPU] This reverts patches to use fp16 inline constants for i16
Patches reverted:
commit c315c662cd2d33e0c7f962fed742ee53626d8005
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
Date: Wed May 27 12:51:13 2026
[AMDGPU] Fix codesize estimate after #198005 (#200033)
This fixes a failure in the libc tests, which check the exact encoding
size. The encoding is now shorter, but the size estimate did not recognize
fp16 immediates as inlinable constants and assumed a literal encoding.
Shorter encodings were created here:
https://github.com/llvm/llvm-project/pull/198005
commit 2b3bc03b5ef00e7eaa245420ca981c700e1c05c4
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
[16 lines not shown]
[clang] fix member specializations of class and variable partial specializations
A partial specialization may be a member specialization even if it is not
an instantiation of a member partial specialization.
For example:
```C++
template<class> struct X {
template<class> struct Inner;
};
template<> template<class T>
struct X<int>::Inner<T*> {};
```
Make sure this state is represented, so that [temp.spec.partial.member]p2
can be applied.
Split off from #199528
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
This also stops the explicit specialization AST nodes from abusing the
template parameter lists to store their own template parameter list, by
creating a dedicated field for it, similar to partial specializations.
[coro] Use C calling convention for C++20 coroutines (#198943)
Change the calling convention for resume / destroy functions of C++
coroutines from `fastcc` to the C calling convention.
The resume / destroy functions are exposed as part of the coroutine ABI
and must be compatible with other compilers and other versions of LLVM.
However, `fastcc` is an LLVM-internal, unstable calling convention.
In practice, fastcc and the C calling convention are in sync for `void
func(void*)` function signatures on almost all platforms. Therefore, I
think we can still do this change without widespread ABI breakage.
`fastcc` and `ccc` do differ for i686 (x86-32), MIPS O32, PowerPC64
ELFv1 and Lanai. As far as I know, those are all legacy ABIs, and a recent
feature like C++20 coroutines is unlikely to be used by projects still
targeting them.
Historical context: I tried to figure out why `fastcc` was used. It is
[6 lines not shown]
[PGO][AMDGPU] Add basic HIP offload PGO support (#177665)
Provide the minimum HIP/offload path for device profile collection and
merge on HIP before layering profile-format and uniformity-specific
changes separately.
This adds the ROCm collection runtime, hooks device collection into the
host write-file path, lowers AMDGPU instrumentation to
__llvm_profile_instrument_gpu with regular counters, and disables GPU
indirect-call value profiling.
Clarify dynamic metadirective selection lowering
Explain that statically applicable variants are ranked before dynamic
user conditions. When a dynamic condition is selected, it is lowered to a
runtime branch whose else region continues selection among the remaining
candidates.
Add a begin/end variant test that includes clauses, and tighten checks
for the empty `nothing` fallback.
Place dynamic condition cleanups before branching
A dynamic user condition can create expression temporaries before the
selected variant is lowered. For example, a metadirective condition
such as:
when(user={condition(getbool("hello"))}: barrier)
passes a character literal through an associated temporary. That
temporary belongs to evaluating the condition, so it must be cleaned
up before lowering enters the generated fir.if that selects between
variants.
Finalize the statement context after evaluating the condition and
before creating the branch. Keep the condition expression and source
location together as DynamicUserCondition, use that source for
generated operations, and add a regression for the temporary-producing
condition case.
[AMDGPU][True16] Upstream some True16 test runlines
This batch of tests precedes a change to G_MERGE_VALUES in True16 and will help demonstrate its effect. The runlines all currently fail, so they are commented out.
[gsymutil] Disable readahead in `GsymReader::openFile()` (#199230)
`GsymReader::lookup()` has a random access pattern (i.e. it binary-searches
for an address, then spot-loads and parses info from the rest of the GSYM
data). Kernel readahead (which is enabled by default) doesn't necessarily
improve (and may degrade) performance for this pattern. This patch disables
readahead.
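On POSIX systems, one way to tell the kernel that a mapping will be accessed randomly is `madvise` with `MADV_RANDOM`. The sketch below shows only that call; the helper name is hypothetical, and how `GsymReader::openFile()` actually requests this inside LLVM is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: advise the kernel that [addr, addr + len) will be
// accessed in random order, so aggressive readahead is not worthwhile.
// `addr` must be page-aligned, as addresses returned by mmap are.
// Returns true if the kernel accepted the advice.
bool adviseRandomAccess(void* addr, std::size_t len) {
    return ::madvise(addr, len, MADV_RANDOM) == 0;
}
```

The advice is a hint: it changes how much the kernel prefetches around each fault, not which data is readable.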
In a production system, a similar change yielded a 5% improvement in IOPS
and data reads. An offline performance test on a Linux machine shows
similar results: it reduces total data read by 14.3%, CPU usage by 3.5%,
and wall time by 2.9% (while increasing page faults by 9.4%). The reduction
in total data read and CPU usage may help the performance of a
heavily-loaded production system.
```
┌────────────────┬─────────────┬─────────┬────────┐
│ Metric │ MADV_RANDOM │ Default │ Diff │
├────────────────┼─────────────┼─────────┼────────┤
│ Wall (s) │ 0.286 │ 0.294 │ -2.9% │
[18 lines not shown]
[DirectX] Add an "offset" operand to llvm.dbg.value (#197478)
The offset operand was removed in abe04759a6, so we need to bring it back
for DXIL. If the offset is not specified, it should be zero.
---------
Co-authored-by: Andrew Savonichev <andrew.savonichev at gmail.com>
[lldb][Darwin] Read Mach-O binaries out of memory more efficiently (#200072)
When lldb needs to read a Mach-O binary out of memory, it first reads
512 bytes to get the mach header, which includes the size of the load
commands, and then does a second read to get the mach header and load
commands.
I am changing the initial read to get 3192 bytes, which will include the
full load commands for most binaries.
In April I changed debugserver to return the correct size of the mach
header and load commands in a `sizeof_mh_and_loadcmds` key. If this
number is provided, refine the amount we read to this size.
This reduces the number of memory read packets we issue from 2 to 1 for
a memory module, outside of packets that may be needed to get the symbol
table.
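The size selection described above can be sketched as follows. Both helpers are hypothetical and only model the decision; they are not lldb's actual functions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Initial read size quoted in the commit message.
constexpr std::size_t kDefaultInitialRead = 3192;

// Hypothetical helper: number of bytes to request in the first memory read.
// If the debug stub reported `sizeof_mh_and_loadcmds`, use that exact size;
// otherwise fall back to the default guess.
std::size_t initialReadSize(std::optional<std::size_t> sizeofMhAndLoadcmds) {
    return sizeofMhAndLoadcmds.value_or(kDefaultInitialRead);
}

// Hypothetical helper: a second read is only needed when the mach header plus
// load commands turn out to be larger than what the first read fetched.
bool needsSecondRead(std::size_t bytesRead, std::size_t headerAndLoadCmdsSize) {
    return headerAndLoadCmdsSize > bytesRead;
}
```

With an exact size from the stub, the first read is never too small; without it, the larger default guess covers most binaries in one read.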
[LifetimeSafety] Propagate inner origins through std::move and related casts (#199600)
std::move and related casts (std::forward, std::forward_like,
std::move_if_noexcept, std::as_const) are reference casts: the result
refers to the same object as the argument. Flow all origin levels for
this family.
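A small sketch of the property being relied on: these casts produce a reference to the argument itself, so anything that points into the argument also points into the result. The helper name is illustrative only and has nothing to do with the analysis code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Returns true if std::move, std::forward and std::as_const applied to `obj`
// all yield references to `obj` itself. No object is moved from or copied.
template <class T>
bool castsReferToSameObject(T& obj) {
    T&& moved = std::move(obj);               // only a cast to an rvalue reference
    T& forwarded = std::forward<T&>(obj);     // lvalue forwarded as an lvalue
    const T& constRef = std::as_const(obj);   // adds const, same object
    return std::addressof(moved) == std::addressof(obj) &&
           std::addressof(forwarded) == std::addressof(obj) &&
           std::addressof(constRef) == std::addressof(obj);
}
```

Because the addresses are identical, a view or pointer derived from the cast result has the same lifetime constraints as one derived from the original object.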
Fixes #191954
[clang] fix getTemplateInstantiationArgs
This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the template
context of out-of-line definitions.
This greatly simplifies the signature of that function by removing a bunch
of workarounds and simplifying a couple that weren't removed yet.
Since this now relies on qualifiers and template parameter lists,
this patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.
This also stops the explicit specialization AST nodes from abusing the
template parameter lists to store their own template parameter list, by
creating a dedicated field for it, similar to partial specializations.
[VectorCombine] Fold deinterleave2 with smaller effective element size (#192121)
Found in real-world code where this sequence:
```
%d = llvm.vector.deinterleave2 <vscale x 16 x i32> %v
%f0 = extractvalue { <vscale x 8 x i32>, <vscale x 8 x i32> } %d, 0
%f1 = extractvalue { <vscale x 8 x i32>, <vscale x 8 x i32> } %d, 1
%low0 = and <vscale x 8 x i32> %f0, splat (i32 65535)
%low1 = shl <vscale x 8 x i32> %f1, splat (i32 16)
%merge0 = or disjoint <vscale x 8 x i32> %low0, %low1
%high0 = and <vscale x 8 x i32> %f1, splat (i32 -65536)
%high1 = lshr <vscale x 8 x i32> %f0, splat (i32 16)
%merge1 = or disjoint <vscale x 8 x i32> %high0, %high1
```
is really just doing `deinterleave2` but on `<vscale x 32 x i16>`. That
is, the same total vector size but with half the element width. So we
can turn it into:
[11 lines not shown]
[CodeGenPrepare] Use recomputed split-branch weights. (#199822)
splitBranchCondition computes new branch weights after splitting an
and/or condition into two branches, but then passed the original weights
to createBranchWeights at each metadata update. The recomputed values
were discarded.
Pass the scaled NewTrueWeight/NewFalseWeight values when installing
metadata on both generated branches.
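To see why the recomputed weights matter, here is a worked sketch for splitting `br (x || y)` with original weights {A, B}. It assumes the split {A, A + 2B} for the first branch and {A, 2B} for the second; the commit message does not spell out the formula, and the scaling step it mentions is omitted. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using Weights = std::pair<uint64_t, uint64_t>;  // {true weight, false weight}

struct SplitWeights {
    Weights first;   // branch testing the first operand
    Weights second;  // branch testing the second operand
};

// Assumed scheme for an or-condition with original weights {a, b}.
SplitWeights splitOrWeights(uint64_t a, uint64_t b) {
    return {{a, a + 2 * b}, {a, 2 * b}};
}

// Probability of reaching the true successor through the two branches,
// returned as an exact fraction {numerator, denominator}:
//   P = t1/(t1+f1) + f1/(t1+f1) * t2/(t2+f2)
std::pair<uint64_t, uint64_t> trueProbability(const SplitWeights& s) {
    auto [t1, f1] = s.first;
    auto [t2, f2] = s.second;
    return {t1 * (t2 + f2) + f1 * t2, (t1 + f1) * (t2 + f2)};
}

// Checks that the split preserves the original probability a / (a + b),
// using cross-multiplication to avoid floating point.
bool preservesProbability(uint64_t a, uint64_t b, const SplitWeights& s) {
    auto [num, den] = trueProbability(s);
    return num * (a + b) == a * den;
}
```

For A = 1, B = 3, the recomputed split keeps the overall true probability at 1/4, whereas installing the original {1, 3} on both branches gives 7/16.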
This bug was found by a large run of Opus 4.7 looking for bugs in LLVM.
[clang][deps] Disable app extensions during scanning (#200041)
Application extension contributes to the context hash, but only affects
the availability attribute on declarations. Since it cannot affect
dependencies, disable it for the scan to reduce the number of scanning
PCM variants.
[flang][FIRToMemRef] fix stride calculation for complex lowering (#200035)
**Summary**
When `fir.array_coor` targets a projected slice of a complex array (path
0 = real, 1 = imag), FIRToMemRef must not treat the result as a dense
memref.
**Bug:** The pass stopped after fir.convert to `memref<…×complex>` (or
static-shape fast path) and used default/dense strides. Loads/stores
then stepped by sizeof(complex) instead of sizeof(re)/sizeof(im).
**Fix:** For constant `%re/%im` on `complex<T>` storage:
- `fir.convert` storage to `memref<…×2×T>` and index the component (0 or
  1).
- Read layout from `fir.box_dims` on the box (even if the memref shape is
  static).
- Set each memref stride to `box_dims_byte_stride / sizeof(T)`.
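The stride arithmetic can be illustrated outside of MLIR with plain C++. This is only an analogy: `elementStride` and `sumComponent` are hypothetical helpers, and the byte stride is taken from `sizeof` rather than from a real Fortran descriptor.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert a byte stride (as a descriptor would report
// it) into a stride counted in elements of the component type T.
template <class T>
std::ptrdiff_t elementStride(std::ptrdiff_t byteStride) {
    return byteStride / static_cast<std::ptrdiff_t>(sizeof(T));
}

// Sum the real (component = 0) or imaginary (component = 1) parts of a
// complex array by viewing its storage as n x 2 values of type T.
template <class T>
T sumComponent(const std::vector<std::complex<T>>& data, int component) {
    // The C++ standard guarantees that an array of std::complex<T> can be
    // accessed as an array of T with real and imaginary parts interleaved.
    const T* base = reinterpret_cast<const T*>(data.data());
    // Consecutive real parts are sizeof(complex<T>) bytes apart, which is
    // 2 elements of T, not 1.
    const std::ptrdiff_t stride = elementStride<T>(sizeof(std::complex<T>));
    T sum = T(0);
    for (std::size_t i = 0; i < data.size(); ++i)
        sum += base[static_cast<std::ptrdiff_t>(i) * stride + component];
    return sum;
}
```

Using a dense stride of 1 on the component view would mix real and imaginary parts, which is the same class of mistake as stepping by the wrong element size.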
Advised by Cursor
[lldb] Edits and clarifications to DataFileCache comments, NFC (#199787)
I was reading through Greg Clayton's DataFileCache PR and fixed a few
small typos as I went along.
I also had a little trouble understanding the two types of hashes that
are calculated for a file, at first, and I tried to write comments for
the relevant methods (in Module, ObjectFile, and DataFileCache) to be
more explicit about their role and the role of the other hashes that are
calculated. It may be more detail than necessary, but it would have been
helpful for me while reading this through.
[lldb] Keep addr for Memory Modules separate (#199810)
This change is to make DataFileCache symbol table caching work with
memory-read binary modules.
When we read a Module out of memory, we keep the address of the module
in Module's m_object_name field as a string. This is normally the name
of a file in a ranlib/static library/.a archive like the "main.o" in
"foo.a(main.o)". The address is most often seen in the "image list"
output, and is the only easy way to distinguish in that output which
binaries were read out of memory, versus found on local disk. The "name"
of the Module ends up being the combination of the FileSpec plus this
m_object_name.
Reading a binary out of memory is expensive, primarily because of
reading the symbol table. The DataFileCache feature that Greg introduced
five years ago can cache the Symbol Table for a binary locally, and when
we see the same binary loaded again in a future debug session/lldb
session, we can skip parsing the symbol table (or in the case of Memory
[26 lines not shown]