[CIR] Coerce Direct args and returns in CallConvLowering (#195879)
Fourth PR in the split of #192119/#192124. Implements the
Direct-with-coercion path in CallConvLowering.
Every Direct argument or return whose ABI type differs from its source
type is now coerced through a store/reload roundtrip via an entry-block
alloca, mirroring classic codegen's CreateCoercedLoad/CreateCoercedStore.
The temporary alloca uses max(srcAlign, dstAlign) from the DataLayout and
is hoisted into the entry block so it composes with HoistAllocas
regardless of pipeline order. When the coerced type is larger than the
source -- e.g. a 12-byte aggregate returned as { i64, i64 } -- the slot is
sized to the larger type and accessed through a source-typed view for the
store and a destination-typed view for the load, so neither side
over-reads.
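The roundtrip can be modeled outside the compiler as follows. This is a
minimal sketch, not the CIR implementation: `Vec3`, `Coerced`, `Slot`, and the
helper functions are hypothetical names, and `memcpy` stands in for the
source-typed store and destination-typed load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 12-byte source aggregate (not taken from the patch).
struct Vec3 {
  float x, y, z;
};

// Stand-in for the coerced ABI type { i64, i64 } (16 bytes).
struct Coerced {
  std::uint64_t lo, hi;
};

static_assert(sizeof(Vec3) == 12, "example assumes 4-byte float");
static_assert(sizeof(Coerced) == 16, "example assumes no padding");

// Temporary slot: sized to the larger type, aligned to the stricter one.
struct Slot {
  alignas(std::max(alignof(Vec3), alignof(Coerced)))
      unsigned char bytes[std::max(sizeof(Vec3), sizeof(Coerced))];
};

// Source -> ABI: store 12 bytes through the source-typed view, then load
// 16 bytes through the destination-typed view. Both stay inside the slot.
Coerced coerceToAbi(const Vec3 &src) {
  Slot slot{}; // zero-filled, so the 4 tail bytes are deterministic
  std::memcpy(slot.bytes, &src, sizeof(Vec3));
  Coerced out;
  std::memcpy(&out, slot.bytes, sizeof(Coerced));
  return out;
}

// ABI -> source: the reverse direction through the same kind of slot.
Vec3 coerceFromAbi(const Coerced &abi) {
  Slot slot{};
  std::memcpy(slot.bytes, &abi, sizeof(Coerced));
  Vec3 out;
  std::memcpy(&out, slot.bytes, sizeof(Vec3));
  return out;
}

// True when a value survives source -> ABI -> source unchanged.
bool roundTrips(const Vec3 &v) {
  Vec3 back = coerceFromAbi(coerceToAbi(v));
  return back.x == v.x && back.y == v.y && back.z == v.z;
}
```

Because the slot is as large as the bigger of the two types, the 16-byte load
never reads past the 12 bytes that were stored into memory it does not own.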
CallConvLowering is split into three phases (function-definition
coercion, call-site rewriting, and Ignore cleanup) because in-place
block-argument type changes from Direct-with-coerce otherwise confused the
[3 lines not shown]
[clang-sycl-linker][test] Improve dry-run mode and tighten test coverage (#200513)
- Rework `--dry-run` in `clang-sycl-linker` so it skips all real output
(writing bitcode, executing tools, etc.).
- The `link:`, `sycl-module-split:`, and a new `sycl-bundle:` summary
line are now gated on `-v` alone.
- Tighten `sycl-bundle:` checks in `basic.ll`, `split-mode.ll`, and
`triple.ll` to pin kind, triple, and arch (instead of just kind),
and add `-NOT: {{.+}}` after fully-covered dry-run check groups.
- Replace the `clang-sycl-linker` + `llvm-objdump --offloading`
round-trip with a single `--dry-run -v` invocation.
- Add a dedicated `non-dry-run` mode test to verify code paths not exposed
in `dry-run`.
Assisted by Claude.
[X86][APX] Extend original LI to the same range as DstReg (#199182)
#189222 folds NDD+Load to non-NDD when the NDD memory variant is not
preferred. However, this changes DstReg from a regular def to an
early-clobber def, which causes a "corrupted sub-interval" error in
reMaterializeFor because OrigLI is not updated at the same time.
Fixes: https://godbolt.org/z/7n8ozz1EG
Assisted-by: Claude Sonnet 4.6
[libc] add shrink in-place support for reallocations (#200272)
This PR adds shrinking in place for the freelist heap. This allows the
heap to reuse the existing block when a reallocation shrinks it by more
than a minimal block unit.
Synthesized random-action tests show that this increases the heap
utilization rate from 87% to 97%, roughly in line with what is expected
of dlmalloc.
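The shrink decision can be sketched as below. This is a simplified model
under stated assumptions, not the libc code: `kMinBlockSize`, `ShrinkResult`,
and `shrinkInPlace` are hypothetical names, block headers and alignment are
ignored, and the threshold is taken as "the freed tail is at least one minimal
block".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal block unit; the real value depends on the heap.
constexpr std::size_t kMinBlockSize = 16;

struct ShrinkResult {
  std::size_t keptSize;  // bytes still owned by the allocation
  std::size_t freedSize; // tail handed back to the free list (0 if none)
};

// Decide how to shrink a block of blockSize bytes down to newSize bytes
// without moving it.
ShrinkResult shrinkInPlace(std::size_t blockSize, std::size_t newSize) {
  assert(newSize <= blockSize && "only shrinking is modeled here");
  std::size_t tail = blockSize - newSize;
  // A tail smaller than the minimal block cannot become a free block of
  // its own, so the allocation simply keeps its current size.
  if (tail < kMinBlockSize)
    return {blockSize, 0};
  // Otherwise split: keep the front, release the tail.
  return {newSize, tail};
}
```

For example, `shrinkInPlace(128, 64)` splits off a 64-byte tail, while
`shrinkInPlace(128, 120)` leaves the block untouched because the 8-byte tail
is below the assumed minimum.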
Assisted-by: AI tools, manually checked.
[CIR] Implement lowering for const-emitted global compound literals (#201152)
This came up in a test suite as an NYI; it just emits a
constant-backing literal for an initializer. These are specific to C, as
global compound literals have static storage duration in C. This patch,
just like classic codegen, creates a '.compoundliteral' object as
backing for these variables and lets us create references to them.
---------
Co-authored-by: Andy Kaylor <akaylor at nvidia.com>
[lldb] Stop hard-linking libpython into the dynamic Python plugin (#200530)
Drops ${Python3_LIBRARIES} from the SHARED build of
lldbPluginScriptInterpreterPython and lets undefined Python symbols
through at link time (`-undefined dynamic_lookup` on Darwin,
`--allow-shlib-undefined` on Linux; Windows keeps its existing
delay-load + import lib).
SystemInitializerFull::Initialize resolves the Python runtime loader
via ScriptInterpreterRuntimeLoader::Get(eScriptLanguagePython) and
calls Load() before initializing any plugin, so libpython is mapped
into the process before either entry point that references it: the
static script interpreter's Initialize() (which invokes Python via
the LLDB_PLUGIN_INITIALIZE loop) and the dynamic plugin's dlopen
(whose undefined references resolve against the in-process
libpython). This covers both LLDB_ENABLE_DYNAMIC_SCRIPTINTERPRETERS
=ON and =OFF, and keeps Windows working in static builds where the
delay-load thunks live in liblldb itself. The loader is
once_flag-cached, and errors propagate out via the existing Expected
[14 lines not shown]
[lldb] Add PythonRuntimeLoader for runtime libpython lookup (NFC) (#200524)
Generalizes the Windows-only Python lookup in PythonPathSetup into a
cross-platform abstraction. Adds an abstract ScriptInterpreterRuntimeLoader
with a per-language factory. The Python implementation dynamically loads the
Python library into the current process.
The loader no-ops when Python is already in the process, then walks
LLDB_PYTHON_LIBRARY env override, the build-time Python
(LLDB_PYTHON_RUNTIME_LIBRARY_BUILD_PATH) and finally a platform candidate list:
- Darwin: DEVELOPER_DIR, the bundled Xcode.app, and Command Line Tools joined
against Python3.framework. Then python.org, /opt/homebrew, and /usr/local
joined against Python.framework. Then xcrun -f python3 and if that fails,
libpython3.dylib as a last resort.
- Linux: libpython3.so plus descending stable-ABI SONAMEs.
- Windows: the LLDB_PYTHON_RUNTIME_LIBRARY_FILENAME bare name (resolved via the
loader's default search list) and the exe-relative
LLDB_PYTHON_DLL_RELATIVE_PATH fallback (built off GetModuleFileNameW).
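The lookup order described above can be sketched as follows. This is an
illustrative model only: `LoaderInputs`, `LoadResult`, `candidateOrder`, and
`loadRuntime` are hypothetical names rather than LLDB APIs, the actual
library load is abstracted behind a callback, and falling through to the next
candidate when one fails to load is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct LoaderInputs {
  bool alreadyLoaded;                        // Python already in the process
  std::optional<std::string> envOverride;    // LLDB_PYTHON_LIBRARY
  std::optional<std::string> buildTimePath;  // LLDB_PYTHON_RUNTIME_LIBRARY_BUILD_PATH
  std::vector<std::string> platformCandidates; // per-platform list, in order
};

struct LoadResult {
  bool ok;
  std::string loadedFrom; // empty when nothing had to be loaded
};

// Flatten the three sources into one ordered candidate list.
std::vector<std::string> candidateOrder(const LoaderInputs &in) {
  std::vector<std::string> order;
  if (in.envOverride)
    order.push_back(*in.envOverride);
  if (in.buildTimePath)
    order.push_back(*in.buildTimePath);
  order.insert(order.end(), in.platformCandidates.begin(),
               in.platformCandidates.end());
  return order;
}

// Walk the candidates and stop at the first one that loads. tryLoad is a
// placeholder for the platform's dynamic-library open call.
LoadResult loadRuntime(
    const LoaderInputs &in,
    const std::function<bool(const std::string &)> &tryLoad) {
  if (in.alreadyLoaded)
    return {true, ""}; // no-op: the runtime is already mapped
  for (const std::string &path : candidateOrder(in))
    if (tryLoad(path))
      return {true, path};
  return {false, ""};
}
```

Passing the load step as a callback keeps the ordering logic testable without
touching the real dynamic loader.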
[5 lines not shown]
[clang-linker-wrapper] Drop SYCL dry-run stub-image special case (#201222)
Remove the `DryRun` branch in `bundleSYCL` that emitted a stub
`OffloadBinary`. SYCL goes through the same empty-buffer path as other
offload kinds, so the special case is no longer needed.
Update `linker-wrapper-image.c` to expect the resulting `[0 x i8]
zeroinitializer` constant and a size of `0` in the register/unregister
calls.
Assisted by Claude.
[CIR] Set ExternalWeakLinkage on weak/weak_import function declarations (#198422)
Classic CodeGen's `SetFunctionAttributes` calls `setLinkageForGV` to force `ExternalWeakLinkage` on `__attribute__((weak))` and Darwin `weak_import` declarations. CIR had no equivalent: weak function declarations were emitted with `ExternalLinkage` instead of `ExternalWeakLinkage`.
This adds `setLinkageForFunction` — the same weak/external-weak logic as `setLinkageForGV` — and calls it from `setFunctionAttributes`. The underlying crash on inline forward declarations (the original motivation) is already fixed by #195257; what remains is this linkage gap.
`inline-forward-decl.c` covers `__attribute__((weak))` on an inline forward declaration; `func-linkage-weak-import.c` covers Darwin `weak_import` (→ `extern_weak` in CIR and LLVM).
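For context, the source-level pattern that depends on external-weak linkage
looks roughly like the sketch below. `optional_hook` is a hypothetical symbol
that is declared weak and never defined; the example assumes a GCC- or
Clang-compatible compiler targeting ELF, where an unresolved weak reference
evaluates to a null address instead of failing at link time.

```cpp
#include <cassert>

// Hypothetical optional hook: declared weak, defined nowhere in this program.
extern "C" void optional_hook(void) __attribute__((weak));

// With external-weak linkage the symbol's address may be null at run time.
bool hookAvailable() { return optional_hook != nullptr; }

// Call the hook only when some other object actually provides it.
void callHookIfPresent() {
  if (hookAvailable())
    optional_hook();
}
```

If the declaration were emitted with plain external linkage instead, the
missing definition would typically surface as an undefined-symbol link error
rather than a null check at run time.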
[NVPTX] Fix aggregate load/store lowering for (potentially) overlapping copies (#201177)
NVPTXLowerAggrCopies lowers load/store pairs of large values into a loop
of smaller copies.
However, it was incorrectly assuming that the load/store pairs it found
never alias.
This patch adds an alias check. If the pointers may alias, we emit a
memmove, which handles overlap correctly.
CUDA reproducer:
typedef char vec __attribute__((vector_size(256)));
__global__ void boom(char *p) {
*(vec *)(p + 8) = *(vec *)p;
}
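The hazard in the reproducer can be modeled on the host as below. This is a
sketch, not the pass itself: all names are hypothetical, the forward byte loop
stands in for the generated copy loop, and the overlap test is done at run
time on addresses, whereas the patch makes its decision at compile time from
alias information.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVecBytes = 256; // size of the copied value
constexpr std::size_t kShift = 8;      // destination starts 8 bytes later
using Buffer = std::array<unsigned char, kVecBytes + kShift>;

Buffer makeInput() {
  Buffer b{};
  for (std::size_t i = 0; i < b.size(); ++i)
    b[i] = static_cast<unsigned char>(i);
  return b;
}

// Forward element-by-element copy; only correct if the ranges are disjoint.
void copyLoop(unsigned char *dst, const unsigned char *src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// True when [a, a+n) and [b, b+n) share at least one byte.
bool mayOverlap(const unsigned char *a, const unsigned char *b,
                std::size_t n) {
  auto x = reinterpret_cast<std::uintptr_t>(a);
  auto y = reinterpret_cast<std::uintptr_t>(b);
  return x < y + n && y < x + n;
}

// Fall back to memmove whenever the ranges might overlap.
void copyChecked(unsigned char *dst, const unsigned char *src,
                 std::size_t n) {
  if (mayOverlap(dst, src, n))
    std::memmove(dst, src, n);
  else
    copyLoop(dst, src, n);
}

// What a load-then-store of the whole value should produce.
Buffer expectedResult() {
  Buffer in = makeInput(), out = in;
  for (std::size_t i = 0; i < kVecBytes; ++i)
    out[kShift + i] = in[i];
  return out;
}

Buffer runLoopOnly() {
  Buffer b = makeInput();
  copyLoop(b.data() + kShift, b.data(), kVecBytes);
  return b;
}

Buffer runChecked() {
  Buffer b = makeInput();
  copyChecked(b.data() + kShift, b.data(), kVecBytes);
  return b;
}
```

In this model `runLoopOnly()` overwrites source bytes before reading them, so
the first 8 bytes repeat through the destination, while `runChecked()` matches
`expectedResult()`.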
[lldb][debugserver] Arguments to kill(2) are reversed (#201226)
This codepath is only executed as an attempt to clean up during a failed
launch, so the reversed arguments were rarely actually used.
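For reference, POSIX declares `int kill(pid_t pid, int sig)`, so the process
ID comes first. A minimal sketch of the correct order follows; `sendSignal`
and `processExists` are hypothetical helpers, not debugserver code.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Thin wrapper whose parameter names document the argument order.
int sendSignal(pid_t pid, int sig) { return ::kill(pid, sig); }

// Signal 0 performs only the existence and permission checks; nothing is
// delivered. A process that exists but cannot be signaled by the caller
// is reported as false here.
bool processExists(pid_t pid) { return sendSignal(pid, 0) == 0; }
```

Both parameters are integer types, so swapping them compiles without a
diagnostic, which is how a bug like this can go unnoticed.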
rdar://175507620
[docs] Migrate 22 popular LLVM docs to MyST
This was done with LLM assistance.
I opened all 22 docs in a browser and scrolled through them, catching
and fixing a few errors.
[docs] Rename 20 popular LLVM docs .rst -> .md
Update filename references, but leave the docs with reST syntax to
ensure rename detection works.
I updated filename references so that the docs still build and pass
premerge checks.
[VPlan] Don't expand SCEVs without uses to VPInstructions (NFC). (#201221)
If a VPExpandSCEVRecipe does not have users, there's no benefit in
expanding it to VPInstructions, which then have to be cleaned up.
This also prevents DCE from removing VPInstructions pointed to by
TripCount after expansion.
[lldb] Have TestRunLocker run both styles of launch (#200978)
While debugging flaky behavior with TestRunLocker, I noticed that it is
intended to run its test once with a stop at the entry function (and
then continue) and once where we launch to the main() loop. But we were
never exercising the stop-at-entry codepath.
This doesn't fix the flaky behavior, although that only happens with
the launch-directly-into-main() codepath; I don't get failures when I
stop at the entry point and then continue.
[ORC] Make SimpleExecutorDylibManager::resolve an instance method. (#201211)
Promote the lambda inside resolveWrapper to a public method on
SimpleExecutorDylibManager. This brings SimpleExecutorDylibManager into
better alignment with the NativeDylibManager implementation in the new
ORC runtime, and is a step towards allowing NativeDylibManager to be
used as a drop-in replacement for SimpleExecutorDylibManager.
[RISCV][GISel] Add GPRPair to GPRB register bank and use getXLen() for GPRSize
Map GPRPair register classes to the GPRB register bank during GlobalISel
instruction selection. This is required because the introduction of HwMode-dependent
base pointer register classes (e.g. via PtrRegClassByHwMode) causes TableGen to
emit register bank checks for GPRPair variants in RISCVGenGlobalISel.inc.
Without this mapping, instruction selection crashes on unsupported classes.
To avoid assertion failures when GPRB's maximum size increases to 128-bit on RV64
due to the register pairs, update RISCVRegisterBankInfo::getInstrMapping to query
Subtarget.getXLen() for the scalar register width instead of relying on the bank's
getMaximumSize(). This matches AArch64's design pattern of mapping register pairs
(XSeqPairsClass) to GPR and resolving scalar register sizes dynamically.
This was fine previously but was exposed by the HwMode changes in
https://github.com/llvm/llvm-project/pull/177073.
Pull Request: https://github.com/llvm/llvm-project/pull/200510
[mlir][bytecode] Add option to elide locations during serialization (#201183)
Adds a setElideLocations option to BytecodeWriterConfig to elide
locations during bytecode serialization. When enabled, all LocationAttrs
are mapped to UnknownLoc during numbering and writing to produce
location-invariant bytecode (e.g., for stable fingerprinting).
Another way to achieve the same thing would be to apply the
strip-debuginfo pass, but that requires mutating the module, which in
turn requires cloning the module if one still requires the unstripped
original.
Assisted-by: Antigravity / Gemini
[cmake] Fix host tool path with driver build on Windows (#199152)
On Windows, the llvm-shlib dylib build uses the llvm-nm host tool to
make all symbols visible by default. The LLVM_TOOL_LLVM_DRIVER_BUILD=ON
build would fail because $<TARGET_FILE:llvm-nm> was invalid. This change
passes the name of the symlink / executable copy as a custom property so
things work out and the llvm-nm.exe host tool can be found.
Preserve dynamic user condition ranking
Non-constant user={condition(expr)} selectors use expr only for runtime
dispatch, so do not let it affect static applicability. Split the VMI:
static: compile-time traits, with runtime user_condition_unknown removed
ranking: static traits + user_condition_{true|unknown} [score] for explicit variants
lowering: if (expr) variant else next candidate
Use the static VMI for applicability and the ranking VMI for score/subset
specificity, so vendor(llvm), user={condition(flag)} still ranks above
vendor(llvm) and keeps any explicit score(...), even when condition(flag)
is unscored.
For extension(match_none), rank with user_condition_unknown instead of
user_condition_true since the latter is active in OMPContext and would
make the candidate reject itself.
Repeated condition traits are rejected semantically, so lowering never has
to choose between multiple runtime expressions in one user selector.
Reverting PR #184065 and #200323 to address some interplay with CFI (#201194)
There is a relation between CFI and ThinLTO GUIDs that still needs to be
disentangled first. Note that we leave the `MD_unique_id` in
`FixedMetadataKinds.def` to avoid needing to re-number it later. Plus
the metadata string ("guid") itself is used by ctxprof.
[lldb][test] Make delayed-definition-die-searching CU-layout agnostic (#201206)
The second `ParseTypeFromDWARF` for t1 (after `p v2`) only fires when
t1's definition lives in a separate CU from its forward declaration:
LLDB parses the forward-decl DIE during `p v1` and a distinct definition
DIE during `p v2`. dsymutil's parallel linker collapses both into a
single DIE in the artificial type unit, so t1 is parsed once during `p
v1` and only re-resolved during `p v2`.
Drop the second-parse CHECK so the test no longer presumes a per-CU type
layout. The remaining `'t1' resolving forward declaration...` CHECK
after `p v2` still verifies what the test was designed to catch: t1's
complete-type resolution is deferred until v2 is evaluated. If LLDB
regressed to eager resolution during `p v1`, that log line would move
and the test would fail. Add a `(t1) (x = 0)` CHECK at the end to cover
the end-to-end value.
[flang] Define array named constants from the iso_fortran_env intrinsic module (#201190)
createIntrinsicModuleDefinitions() only emitted definitions for array
named constants belonging to the __fortran_ieee_exceptions intrinsic
module. Array constants declared directly in the iso_fortran_env
intrinsic module -- in practice character_kinds -- were therefore only
lowered as bodyless `fir.global` external declarations at their use site
and never defined anywhere, producing an undefined reference at link
time.
This is usually hidden because scalar iso_fortran_env parameters fold to
immediates and constant-shape array accesses are folded away, so the
dangling external symbol is DCE'd before linking. It surfaces when the
address of the array genuinely escapes to runtime, e.g.:
```
use iso_fortran_env
integer :: i, x(1)
[16 lines not shown]
[X86] Check useBWIRegs() instead of hasBWI() before creating x86_avx512_psad_bw_512 intrinsic. (#201167)
Need to check that 512-bit vectors are enabled before using a 512-bit
intrinsic.