[CIR][CodeGen] Remove dead srcAS code in emitCastLValue address spacecast (#197016)
The srcAS variable was computed but never used since upstream's
performAddrSpaceCast only takes (value, destType). Remove the dead code
and its errorNYI for non-target address spaces.
Fixes part of #192314
[CIR] Implement implicit value init for aggregates (#197029)
This implements the AggExprEmitter::VisitImplicitValueInitExpr function
for CIR. The code to emit a zero-initializer was already present. We
just needed to hook it up to the visitor.
[CIR] Implement copy construction of EH catch values (#196419)
This change implements handling of exception variables that require copy
construction (on Itanium targets) before they can be used in a catch
handler, using the cir.construct_catch_param operation.
Some targets, such as MSABI, do not need to perform an explicit copy.
The construct_catch_param operation is effectively a noop for those
cases and will be lowered as such when the EHABI lowering is implemented
for those targets.
Assisted-by: Cursor / claude-opus-4.7-thinking-xhigh
[Instrumentor] Add Alloca and Function support; stack usage example
This adds support for alloca instrumentation and function pre/post
instrumentation. Alloca support follows load/store support directly.
Functions require special care to determine the insertion points.
Together, we can showcase how the stack high watermark can be profiled,
see InstrumentorStackUsage.cpp.
[Instrumentor] Use the pass builder's FileSystem for reading files
In the IO sandbox, the old read calls caused the CI to fail. This
change uses the PassBuilder's FileSystem the same way other passes
read files from disk (during CI).
[flang][cuda][openacc] Don't apply CUDA Fortran COMMON/EQUIVALENCE rule to internal UseDevice marker (#197036)
`CUDADataAttr::UseDevice` is not user-spellable; the symbol that
actually lives in COMMON/EQUIVALENCE carries no CUDA attribute. The CUDA
Fortran restriction (CUDA Fortran Programming Guide §3.2) does not apply
to it.
Exclude `UseDevice` from the COMMON/EQUIVALENCE check alongside the
existing `Pinned` exclusion, and add a Semantics regression test.
[gsymutil] Fix build error in 196448 and remove warning message (#197028)
**Problem:**
#196448 broke the Linux build of the `DebugInfoGSYMTests` test. See this
[buildbot](https://lab.llvm.org/buildbot/#/builders/10/builds/28337).
**Root cause:**
`BinaryFormat` is a dependency that is required when the build is
done with `-DBUILD_SHARED_LIBS=ON`. This explains why some of the Linux
builds pass while the above buildbot fails.
**Fix:**
This patch fixes this by adding that dependency.
This patch also removes the warning message that was added by the same
patch, which should be added in a different way, as pointed out by this
[comment](https://github.com/llvm/llvm-project/pull/196448#discussion_r3221162626).
[10 lines not shown]
[lldb] Add specific error message for "process plugin" with no plugin loaded (#196933)
Fixes #196535.
The error was:
> command is not implemented
Which is incorrect. It is now:
> no process plugin commands are currently registered
Which is not very helpful either but it's not wrong at least. We could
expand it but I'm not sure what would help anyone here, given how rare
it is that anyone encounters this anyway.
[compiler-rt][common] Only unmap stacks the runtime has actually mapped (#179000)
When the sanitizer hasn't mapped the alternate signal stack, but the
host program has (like LLVM), the runtime still tries to unilaterally
unmap the alternate stack. Instead, the runtime should just check if
it's actually mmaped the alternate stack, and only unmap it if it has.
---------
Co-authored-by: Vitaly Buka <vitalybuka at google.com>
[WebAssembly][GlobalISel] Fix ordering of operands for calls and other issues (#196898)
Fixes a few issues with `WebAssemblyCallLowering::lowerCall`.
- Fixes the ordering of operands on the call instruction. Defs (so call
returns) must come before uses (call target and args).
- Prevents the tail-call bail out from null derefing when the call base
is empty (e.g. for libcalls).
- Ensures that the reg class is always set for the return registers of
the call instruction (before, if the regs didn't need splitting, it
wouldn't assign a reg-class to the existing reg, causing failures later
down the pipeline).
[msan] Strengthen LLVM/NEON floating-point<->int propagation (#196875)
This generalizes handleNEONVectorConvertIntrinsic() to apply it to LLVM
cross-platform floating point<->int conversion intrinsics. The handler
uses an all-or-nothing approach: if any bit of an input element is
uninitialized, the corresponding output element is fully uninitialized.
This approximates how a single bit flip in an integer can affect
multiple bits of the equivalent floating-point (likewise for FP to int).
This implements the future work suggested in
https://github.com/llvm/llvm-project/pull/196429.
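The all-or-nothing rule described above can be sketched per element as follows. This is a minimal illustration, not the MemorySanitizer code: `propagateConvertShadow` is a hypothetical helper, and it assumes 32-bit elements whose shadow word has a bit set wherever the corresponding value bit is uninitialized.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of MSan): computes the output shadow of an
// element-wise int<->FP conversion. A set shadow bit means "uninitialized".
template <std::size_t N>
std::array<std::uint32_t, N>
propagateConvertShadow(const std::array<std::uint32_t, N> &inputShadow) {
  std::array<std::uint32_t, N> outputShadow{};
  for (std::size_t i = 0; i < N; ++i) {
    // All-or-nothing: one poisoned input bit poisons the whole output element,
    // because a single input bit can influence many bits of the result.
    outputShadow[i] = inputShadow[i] != 0 ? ~std::uint32_t{0} : std::uint32_t{0};
  }
  return outputShadow;
}
```

A fully initialized input element keeps a zero shadow, so the approach only loses precision within elements that are already partly uninitialized.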
[DWARFLinker] Preserve children of DW_TAG_GNU_template_parameter_pack (#196439)
Pack children were not getting ordered synthetic keys, so TypePool
deduplicated them by name and TypesComparator sorted the survivors
alphabetically. Register the two missing tags with
SyntheticTypeNameBuilder.
[lldb] Add completion support for direct ivars (#195187)
Fixes the current shortcoming where `v m_na<TAB>` will not complete the
member `m_name` on `this`. This implements tab completion to complement
direct ivar access support in `frame variable`.
Assisted-by: claude
---------
Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>
[BOLT] Fix EH data encoding checks in relocateEHFrameSection (#196777)
Previously committed in 7ab26d7c3a16 (#195691) and later reverted
in bc654b438ffe (#196672) due to failures in extended bolt-tests.
The problem was that the mask should be `0x70` instead of `0xf0`,
so as to allow `DW_EH_PE_indirect` to pass through.
The `DW_EH_PE_*rel` constants are not defined as values that each
have only one distinctive bit set, so we rewrote the conditions to
check encoding scheme explicitly.
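To illustrate why the mask and the explicit comparison matter, here is a small sketch. The constant values are the standard DWARF EH pointer-encoding values; the helpers `applicationOf` and `isPCRel` are hypothetical names for illustration and are not taken from the BOLT patch.

```cpp
#include <cstdint>

// Standard DWARF EH pointer-encoding values.
constexpr std::uint8_t DW_EH_PE_sdata4 = 0x0b;
constexpr std::uint8_t DW_EH_PE_pcrel = 0x10;
constexpr std::uint8_t DW_EH_PE_datarel = 0x30;
constexpr std::uint8_t DW_EH_PE_indirect = 0x80;

// Hypothetical helper: keep only the "application" field (bits 4-6).
// Masking with 0x70 rather than 0xf0 drops the DW_EH_PE_indirect flag (0x80).
constexpr std::uint8_t applicationOf(std::uint8_t encoding) {
  return static_cast<std::uint8_t>(encoding & 0x70);
}

// Hypothetical helper: compare the whole field instead of testing one bit,
// since e.g. datarel (0x30) shares the 0x10 bit with pcrel (0x10).
constexpr bool isPCRel(std::uint8_t encoding) {
  return applicationOf(encoding) == DW_EH_PE_pcrel;
}

// An indirect pcrel encoding is still recognized as pcrel with the 0x70 mask.
static_assert(isPCRel(DW_EH_PE_indirect | DW_EH_PE_pcrel | DW_EH_PE_sdata4),
              "indirect flag must not affect the check");
// A bit test would misclassify datarel as pcrel; the equality check does not.
static_assert(!isPCRel(DW_EH_PE_datarel | DW_EH_PE_sdata4),
              "datarel is not pcrel");
```

With a `0xf0` mask, the first encoding above would yield `0x90`, which matches none of the `DW_EH_PE_*rel` values.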
[MLIR][MemRef] Extend narrow-type emulation for dynamic offsets (#196945)
This patch adds three related extensions to the MemRef narrow-type
emulation patterns.
* `ConvertMemRefSubview` now accepts a dynamic innermost offset.
* `ConvertMemRefReinterpretCast` is generalized from the previous
static-rank-1, static-offset shape to accept any rank and dynamic
offsets, with the same alignment contract as the subview pattern.
* A new `ConvertMemRefCast` pattern handles `memref.cast` between
equivalent narrow-typed memref types so that emulation does not get
blocked by trivial casts.
[libc] Remove global printf_core StorageType declarations in float_inf_nan_converter.h (#196859)
fixed_converter.h and float_hex_converter.h have local declarations with
the same name shadowing it, causing -Wshadow warnings. The using
declaration is used in only one function, so just make it local.
[clang] Don't warn on __COUNTER__ in system macros
The introduction of extension and compatibility warnings means
that __COUNTER__ has started causing warnings (and -Werror= build
failures) due to use of system APIs.
This PR simply ensures that these diagnostics don't get reported
for system macro expansions as well.
[AMDGPU] Account for inline asm size in inst_pref_size calculation (#192306)
`SIProgramInfo::getFunctionCodeSize()` with `IsLowerBound=true` was
completely skipping inline assembly instructions, treating them as zero
bytes. This caused `amdhsa_inst_pref_size` to be severely underestimated
for kernels containing inline asm, defeating instruction prefetch on
gfx11+.
Use MCExpr label subtraction (`.Lfunc_end - func_sym`) to compute exact
function code size, resolved at assembly time. This avoids inline asm
string parsing which cannot reliably estimate code size and risks
overestimation (which causes prefetch of unmapped memory and a fatal
segfault).
Add a new `AMDGPUMCExpr` variant (`AGVK_InstPrefSize`) to compute
`min(divideCeil(codeSize, cacheLineSize), maxFieldVal)` as a custom
MCExpr, following the same pattern as `AGVK_Occupancy` and
`AGVK_AlignTo`. The cache line size and field width are derived from the
subtarget via `IsaInfo::getInstCacheLineSize` and feature-bit checks
[24 lines not shown]
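The `min(divideCeil(codeSize, cacheLineSize), maxFieldVal)` formula can be sketched on plain integers as below. This is only an arithmetic illustration, not the `AMDGPUMCExpr` implementation: `instPrefSize` is a hypothetical function, and deriving `maxFieldVal` as `2^fieldWidthBits - 1` is an assumption about how the field width is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of instruction cache lines to prefetch,
// saturated to the largest value the hardware field can hold.
std::uint64_t instPrefSize(std::uint64_t codeSizeBytes,
                           std::uint64_t cacheLineSizeBytes,
                           unsigned fieldWidthBits) {
  assert(cacheLineSizeBytes != 0 && "cache line size must be non-zero");
  assert(fieldWidthBits > 0 && fieldWidthBits < 64 && "unsupported width");
  // Assumption: the field is an unsigned bitfield of fieldWidthBits bits.
  const std::uint64_t maxFieldVal = (std::uint64_t{1} << fieldWidthBits) - 1;
  // Ceiling division written so it cannot overflow for large code sizes.
  const std::uint64_t lines = codeSizeBytes / cacheLineSizeBytes +
                              (codeSizeBytes % cacheLineSizeBytes != 0 ? 1 : 0);
  return std::min(lines, maxFieldVal);
}
```

For example, with an illustrative 128-byte cache line and a 6-bit field, 300 bytes of code round up to 3 lines, while very large functions saturate at 63.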
[libc] Fix -Wshadow warning in sqrtf128.h (#196851)
sqrtf128() contained both `using namespace sqrtf128_internal;` and
`using FPBits = fputil::FPBits<float128>;`, but sqrtf128_internal also
had a `using FPBits = fputil::FPBits<float128>;`. The outer `using`
wasn't actually used, so remove that one.
[AMDGPU] Add `.amdgpu.info` section for per-function metadata (#192384)
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the
compiler must emit per-function metadata and call graph edges in the
relocatable object so the linker can compute whole-program resource
requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged,
length-prefixed binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags,
private segment size) and relational edges (direct calls, LDS uses,
indirect call signatures). String data such as function type signatures
[4 lines not shown]
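A reader for the `[kind: u8] [len: u8] [payload]` framing might look like the sketch below. It covers only the entry framing shown above; `InfoEntry` and `parseInfoEntries` are hypothetical names, and the meaning of individual kinds and payloads (for example how `INFO_FUNC` encodes its symbol reference) is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of one tagged, length-prefixed entry.
struct InfoEntry {
  std::uint8_t kind;
  std::vector<std::uint8_t> payload;
};

// Splits a raw byte buffer into entries. Returns std::nullopt if the buffer
// ends in the middle of a header or payload.
std::optional<std::vector<InfoEntry>>
parseInfoEntries(const std::vector<std::uint8_t> &bytes) {
  std::vector<InfoEntry> entries;
  std::size_t pos = 0;
  while (pos < bytes.size()) {
    if (bytes.size() - pos < 2)
      return std::nullopt; // truncated [kind][len] header
    const std::uint8_t kind = bytes[pos];
    const std::size_t len = bytes[pos + 1];
    pos += 2;
    if (bytes.size() - pos < len)
      return std::nullopt; // payload runs past the end of the buffer
    const auto first = bytes.begin() + static_cast<std::ptrdiff_t>(pos);
    const auto last = first + static_cast<std::ptrdiff_t>(len);
    entries.push_back({kind, std::vector<std::uint8_t>(first, last)});
    pos += len;
  }
  return entries;
}
```

Because every entry carries its own length, a consumer can skip kinds it does not understand without knowing their payload layout.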