[SelectionDAG] Fold extracts of subvector inserts
Fold extract_subvector(insert_subvector(...)) when the extraction is
outside the inserted subvector or the inserted subvector only amends
the extracted
In particular,
1. vA extract_subvector (vB insert_subvector(vB X, vC Y, C1), C2) =>
vA extract_subvector(X, C2) when [C2, C2 + A) intersect [C1, C1 + C)
is the empty set
2. ... => extract_subvector(Y, C2 - C1) if [C2, C2 + Y) is a subset of
[C1, C1 + C) - an existing simplification
3. ... => vA insert_subvector(vA extract_subvector(vB X, C2), vC Y, C1 - C2)
if [C1, C1 + C) is a subset of [C2, C2 + A) - that is, if you're only
updating the extracted sub-part.
Adds a regresssion tests for an infinite SelectionDAG cycle that is
fixed by a stack of commits that ends with this one.
[3 lines not shown]
[SelectionDAG] Fold extracts of subvector inserts
Fold extract_subvector(insert_subvector(...)) when the extraction is
outside the inserted subvector or the inserted subvector only amends
the extracted
In particular,
1. vA extract_subvector (vB insert_subvector(vB X, vC Y, C1), C2) =>
vA extract_subvector(X, C2) when [C2, C2 + A) intersect [C1, C1 + C)
is the empty set
2. ... => extract_subvector(Y, C2 - C1) if [C2, C2 + Y) is a subset of
[C1, C1 + C) - an existing simplification
3. ... => vA insert_subvector(vA extract_subvector(vB X, C2), vC Y, C1 - C2)
if [C1, C1 + C) is a subset of [C2, C2 + A) - that is, if you're only
updating the extracted sub-part.
Adds a regresssion tests for an infinite SelectionDAG cycle that is
fixed by a stack of commits that ends with this one.
[3 lines not shown]
[SelectionDAG] Fold subvector inserts into concat operands
Push insert_subvector into the containing CONCAT_VECTORS operand when the insertion is wholly contained there.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold extracts spanning concat operands
Factor the extract_subvector-of-CONCAT_VECTORS logic and handle
extracts that cover multiple whole concat operands by rebuilding a
smaller concat directly.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Track demanded select elements in noundef checks
Propagate demanded elements through to the two arms of a select, and
check the condition with or without demanded elements depending on if
it's a vector or not.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Fold nonzero extract-of-extract indices
Generalize the extract_subvector-of-extract_subvector fold to compose
nonzero indices instead of only handling an outer index of zero.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Track demanded concat elements in noundef checks
Teach isGuaranteedNotToBeUndefOrPoison to distribute fixed-length
demanded element masks across CONCAT_VECTORS operands. This is part of
the series of fixes needed to resolve a SelectionDAG hang by making it
possible to prove certain values don't need to be frozen.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Track bitcast demanded elements in noundef tests
Bitcasts preserve undef/poison status, but vector bitcasts can change
which source lanes cover a demanded result lane. Map the demanded
element mask through fixed-length vector bitcasts before checking the
source where possible.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Look through freeze in undef demanded checks
There were cycles where the freeze combiner and thet
demanded-elements simplification code would get into fights about
whethere the operands to a shuffle or a concat should be
`freeze undef` or `undef` once the simplifier had concluded zero
elements were demanded from some operation. This PR prevents such
cases.
AI note: an LLM generated the code and the test, I've read them
Co-Authored-By: OpenAI Codex <codex at openai.com>
[SelectionDAG] Pre-commit tests for dagcombine improvements
I've got a stack of dagcombine improvements that together make an
infinite cycle relating to freeze insertion in vector-manipulation IR.
Here we have
- Handling freeze(undef) in demanded-elts for shufflevector
- Improvements to noundef checks for bitcast, concat, and select
- Improvements to extract(concat), extract(extract), and
- extract(insert) nadling
[NVPTX] Fix the build after ce465594e239. (#201268)
ce465594e239 (#201177) added sm_90 / PTX ISA 7.8 instructions to
lower-aggr-copies.ll, so we need to guard the RUN line appropriately.
[AMDGPU] Added min operation for MCExprs (#199746)
The min operation is needed in MC Expressions for a future change that
caps the max number of registers used for indirect calls.
---------
Co-authored-by: JoshuaGrindstaff <jgrindst at amd.com>
[DAGCombiner] Remove no longer tested VP_MUL handling. (#201238)
We no longer use VP_MUL in SelectionDAG on RISC-V so this code isn't
tested.
This effectively reverts db6de1a20f75cbfe1024f41e64ad39def91fa70f
[RISCV] Make VSETTM/VSETTK not affect the VSETVL emit (#197890)
VSETTM/TK will modify VTYPE, but it only affects the TM/TK bits. This
modification is safe for other RVV operations. The TM/TK value will be
maintained in insertVSETMTK.
[clangd] Handle dependent call to function with explicit object parameter in InlayHintVisitor (#201264)
Dependent calls do not yet have the implicit object argument preprended
to the CallExpr's argument list, so the first argument should not be
expected to be present and dropped in this case.
Fixes https://github.com/llvm/llvm-project/issues/198588
[clang][CUDA] Avoid ambiguity in host/device template specializations (#201049)
This commit changes SemaOverload to resolve an otherwise diagnosed
ambiguity between addresses of template specializations of functions
that are overloaded for both device and host. Similar to how it works
for non-templated function overloads, these changes prioritizes the
specializations that corresponds to the target of the owning function,
i.e. if compiling for host, the address of the host specialization takes
precedence over the device specialization and vice versa.
Fixes https://github.com/llvm/llvm-project/issues/199299
---------
Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
[BOLT] Fix data race in multi-threaded DWP type unit processing and DWP type unit duplication (#197359)
## Summary
This PR fixes a race condition in LLVM BOLT's
DIEBuilder::buildTypeUnits() that is triggered when DWARF5 split-DWARF
(.dwo/.dwp) inputs are processed with multi-threaded CU processing.
Concurrent invocations from different worker threads share the same DWP
type-unit state, which results in duplicated DIE extraction, assertion
failures, and intermittent crashes. The fix serializes buildTypeUnits()
for DWP inputs via a function-local static std::mutex, leaving the
non-DWO fast path unchanged.
## Problem Description
When BOLT processes DWARF debug info with --debug-thread-count=4
--cu-processing-batch-size=4 on testcase
dwarf5-df-types-dup-dwp-input.test, multiple threads concurrently call
DIEBuilder::buildTypeUnits() on shared DWP type units. Since type units
within a DWP file are shared across compilation units, multiple threads
may attempt to extract DIEs from the same type unit simultaneously,
violating the assertion.
[5 lines not shown]
[lldb] Enable MyST colon_fence and deflist extensions (NFC) (#201250)
Enable the colon_fence and deflist MyST parser extensions in the LLDB
docs configuration. This is a preparatory step for converting the
remaining reStructuredText documentation pages to Markdown, where these
two extensions are needed to translate RST admonition directives
(:::{note}) and definition lists.
Context:
https://discourse.llvm.org/t/rfc-make-myst-markdown-the-llvm-docs-format-rip-rest/
[docs][Kaleidoscope] fix function name InitializeModuleAndManagers in Kaleidoscope (#199601)
### Description
resloves #199477
The Kaleidoscope tutorial was not fully updated with the new Pass
Manager. This pr aligns the tutorial doc with the example code.
### Changes
- Use `InitializeModuleAndManagers` instead of
`InitializeModuleAndPassManager`.
- Remove `TheModule->setDataLayout(TheJIT->getDataLayout());` in line
141, as the `setDataLayout` was introduced later.
- Use `KaleidoscopeJIT` instead of `my cool jit` as the ModuleName, to
align with the final code.
[M68k] Add to LINK_COMPONENTS to fix BUILD_SHARED_LIBS build (#201248)
Fixes: 6897c5e24ce5 ("[M68k][MC] Add MC support for PCI w/ base
displacement addressing mode (#200696)")
[NVPTX] NVVMIntrRange: Handle maxntid > UINT32_MAX. (#201245)
Previously we computed the overall maxntid and downcast it to unsigned
int. This is not correct; it can be larger than UINT32_MAX.
This would cause reads of tid.xyz and ntid.xyz to have incorrect range
information. Also if maxntid was an exact multiple of 2^32, we'd get an
ICE (because we'd incorrectly think that maxntid is 0).
[CIR] Implement destruction of TLS and static global references (#200227)
This implements destruction of lifetime-extended reference temporaries
used to initialize TLS or static duration reference variables.
Assisted-by: Cursor / claude-opus-4.7
[CIR] Fix insertion point tracking for switch with cleanups (#201210)
We had some problems where we would incorrectly maintain the insertion
point for switch statements that contained cleanup scopes. This resulted
in cir.scope statements without a terminator, tripping a verification
error.
This change adds a RunCleanupsScope RAII object for the switch statement
and adds a check inside popCleanup() to avoid moving the insertion point
to the point after the now-closed cleanup scope if the insertion point
had previously been somewhere other than inside the cleanup scope.
Assisted-by: Cursor / claude-opus-4.8
[CIR] Coerce Direct args and returns in CallConvLowering (#195879)
Fourth PR in the split of #192119/#192124. Implements the
Direct-with-coercion path in CallConvLowering.
Every Direct argument or return whose ABI type differs from its source
type is now coerced through a store/reload roundtrip via an entry-block
alloca, mirroring classic codegen's CreateCoercedLoad/CreateCoercedStore.
The temporary alloca uses max(srcAlign, dstAlign) from the DataLayout and
is hoisted into the entry block so it composes with HoistAllocas
regardless of pipeline order. When the coerced type is larger than the
source -- e.g. a 12-byte aggregate returned as { i64, i64 } -- the slot is
sized to the larger type and accessed through a source-typed view for the
store and a destination-typed view for the load, so neither side
over-reads.
CallConvLowering is split into three phases (function-definition
coercion, call-site rewriting, and Ignore cleanup) because in-place
block-argument type changes from Direct-with-coerce otherwise confused the
[3 lines not shown]
[clang-sycl-linker][test] Improve dry-run mode and tighten test coverage (#200513)
- Rework `--dry-run` in `clang-sycl-linker` so it skips all real output
(writing bitcode, executing tools, etc.).
- The `link:`, `sycl-module-split:`, and a new `sycl-bundle:` summary
line are now gated on `-v` alone.
- Tighten `sycl-bundle:` checks in `basic.ll`, `split-mode.ll`, and
`triple.ll` to pin kind, triple, and arch (instead of just kind),
and add `-NOT: {{.+}}` after fully-covered dry-run check groups.
- replace the `clang-sycl-linker` + `llvm-objdump --offloading`
round-trip with a single `--dry-run -v` invocation.
- add dedicated `non-dry-run` mode test to verify code paths not exposed
in `dry-run`.
Assisted by Claude.
[X86][APX] Extend original LI to the same range as DstReg (#199182)
The #189222 folds NDD+Load to non-NDD when NDD memory variant not
preferred. However, this will changes DstReg from regular def to
early-clobber def, which causes "corrupted sub-interval" in
reMaterializeFor, because the OrigLI is not updated at the same time.
Fixes: https://godbolt.org/z/7n8ozz1EG
Assisted-by: Claude Sonnet 4.6
[libc] add shrink in-place support for reallocations (#200272)
This PR adds shrinking in-place for the freelist heap. This allows the
heap to reuse the place if the reallocation shrinks the size larger than
a minimal block unit.
Synthesized random action tests show that that increase heap utilization
rate from 87% to 97% percent, basically aligns with the expectation of
dlmalloc.
Assisted-by: AI tools, manually checked.