[SROA] Extend tree-structured merge to handle init + RMW pattern (#194441)
## Problem
When SROA rewrites an alloca used as a read-modify-write accumulator, it
emits a linear chain of `shufflevector + select` per partial store.
`InstCombine`'s `SimplifyDemandedVectorElts` walks this chain
recursively per element, scaling quadratically with chain length — in
practice tens of seconds of compile time on some matmul kernels.
## Example
Take an `<8 x float>` alloca initialized once and then updated in 4
chunks of 2 elements each:
```llvm
%alloca = alloca <8 x float>
store <8 x float> %init, ptr %alloca ; full init
[104 lines not shown]
[RISCV] Rename VPseudoTernaryMaskPolicy->VPseudoReductionMaskPolicy. NFC (#204053)
This makes it clearer why this class doesn't set UsesMaskPolicy and can
prevent accidental misuse in the future.
[PAC][clang] Fix ptrauth module flags behavior
The `Error` merge behavior only takes effect when module flag values
mismatch; it still allows a flag to be present in one module and
absent in another.
Always emit `ptrauth-elf-got` module flag for AArch64 targets and
`ptrauth-sign-personality` module flag for AArch64 Linux targets.
The value is either 0 or 1.
[OpenMP] Introduce the ompx_name clause for kernel naming
This adds support for the ompx_name clause that allows users to specify
custom kernel names for OpenMP target offloading regions. The clause
accepts a string literal and overrides the default compiler-generated
kernel names.
Example usage:
#pragma omp target ompx_name("my_kernel")
{ ... }
Kernel names need to be unique or they are diagnosed at compile or link
time as errors.
Co-Authored-By: Claude (claude-sonnet-4.5) <noreply at anthropic.com>
[OpenMP] Use external linkage for kernel handles; global handles keep their linkage
Host handles are now emitted with external linkage so that they clash if
two kernels with the same name are registered. This could already happen
and silently corrupt the program, but it becomes more likely once
we allow users to name their kernels.
In the same patch we make global variable handles retain the linkage of
the global variable, forcing clashes for external ones while continuing to
support weak use cases.
[XRay][Hexagon] Use PC-rel addressing for runtime globals in trampoline (#203122)
The trampolines load the runtime handler globals
(__xray::XRayPatchedFunction and friends) with absolute
constant-extended immediates, which cannot be used in a PIC/PIE link, so
linking a default-PIE executable against the xray runtime fails -- and
-fPIC on user code does not help, the bad relocations are inside the
runtime archive:
ld.lld: error: relocation R_HEX_32_6_X cannot be used against symbol
'__xray::XRayPatchedFunction'; recompile with -fPIC
[XRay][Hexagon] Fix immext encoding of high bits in sled patcher (#203129)
encodeConstantExtender() places the high 12 bits of the 26-bit extension
at the wrong offset (<<16 instead of <<2), dropping them for any
constant above ~2^20. The runtime sled patcher then encodes a corrupted
trampoline address for PIE executables (load base 0x08000000+), so the
first patched function call jumps to a bogus address and crashes.
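The bit placement described above can be sketched as follows. This is a minimal illustration, not the runtime's code: the helper names `placeExtensionBits` and `placeExtensionBitsBuggy` are hypothetical, the input is assumed to be the 26-bit extension value already extracted from the constant, and both the field layout (low 14 bits in place, high 12 bits moved up by 2) and the shape of the faulty variant are inferred from the shift amounts quoted in the message.

```cpp
#include <cassert>
#include <cstdint>

// Assumed field masks for a 26-bit extension value.
constexpr uint32_t LoMask = 0x00003FFFu; // bits 13:0
constexpr uint32_t HiMask = 0x03FFC000u; // bits 25:14

// Hypothetical helper: keep bits 13:0 in place and move bits 25:14
// up by 2, so they land in bits 27:16 of the result.
constexpr uint32_t placeExtensionBits(uint32_t Ext26) {
  return ((Ext26 & HiMask) << 2) | (Ext26 & LoMask);
}

// Assumed shape of the faulty variant: same mask, shift of 16.
// Unsigned shifts wrap, so most of the high field leaves the 32-bit word.
constexpr uint32_t placeExtensionBitsBuggy(uint32_t Ext26) {
  return ((Ext26 & HiMask) << 16) | (Ext26 & LoMask);
}
```

Assuming the extension is the constant shifted right by 6, a PIE load base of 0x08000000 gives an extension of 0x00200000. The sketch with `<< 2` keeps that bit (at position 23), while the `<< 16` variant loses it entirely; values that fit in the low 14 bits are unaffected by either.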
[DirectX] Generate PDB file with debug info (#202762)
This change adds the DXContainerPDB pass to the DirectX pipeline.
The pass creates a PDB file containing sections with shader debug
information. The PDB files comply with the format used by existing
DirectX debugging tools.
---------
Co-authored-by: Vladislav Dzhidzhoev <vdzhidzhoev at accesssoftek.com>
[flang][Semantics] Warn on repeated do-variable in nested I/O implied DO (#198757)
Fixes #198528
Add a warning when an io-implied-do's do-variable appears as, or is
associated with, the do-variable of a containing io-implied-do. This
diagnoses violations of Fortran 2023 12.6.3p7:
>The do-variable of an io-implied-do that is in another io-implied-do
shall not appear as, nor be associated with, the do-variable of the
containing io-implied-do.
Since this is not a constraint, a warning is emitted rather than an
error. As suggested in the associated issue, the warning is on by
default and can be suppressed with `-Wno-io-implied-do-index-conflict`.
The check detects:
- Direct name reuse (same symbol in inner and outer implied DO)
- Association via EQUIVALENCE
[14 lines not shown]
[lldb] Add unit tests for the MCP server (#202752)
Add unit-test coverage for the MCP protocol types and server under
source/Protocol/MCP and the MCP plugin under
source/Plugins/Protocol/MCP.
The Server handlers run over the in-memory TestTransport, which gains
SimulateError/SimulateClosed/SetRegisterMessageHandlerShouldFail helpers
to drive the handler lifecycle without a real socket.
Code that touches the filesystem or otherwise requires mucking with the
test environment is deliberately left uncovered until those layers can
be mocked.
Assisted-by: Claude
[lldb] Strip code pointers in lldb-server test binary on arm64e (#203988)
Otherwise an unstripped pointer will be sent to debugserver. LLDB strips
pointers before sending them to debugserver, so debugserver does not
know how to handle an unstripped one.
This fixes TestGdbRemoteSingleStep.py, TestGdbRemote_qMemoryRegion.py,
and TestGdbRemote_vCont.py on arm64e.
AMDGPU: Replace tgsplit subtarget feature with attribute
This is a per-entrypoint property and has a corresponding
assembler directive, so it should not be baked into the
subtarget. I couldn't find much documentation on what this
actually does, so the description isn't great.
Fixes #204149
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
[mlir][arith] Fix crash in ConstantOp range inference for zero-element constants (#204180)
arith::ConstantOp::inferResultRanges computes the union of element
ranges by iterating a DenseIntElementsAttr. For a zero-element constant
the loop body never runs,
leaving the std::optional result unset.
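The failure mode can be reproduced in isolation. The sketch below is not MLIR code: `Range` and `unionOfElementRanges` are hypothetical stand-ins, and plain `int64_t` values replace the attribute's elements. It only shows why an accumulator seeded inside the loop stays empty for zero elements; how the patch handles that case is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical closed integer range [Min, Max].
struct Range {
  int64_t Min;
  int64_t Max;
};

// Union of per-element ranges. The accumulator is only seeded by the
// first element, so it remains std::nullopt when Elements is empty.
std::optional<Range> unionOfElementRanges(const std::vector<int64_t> &Elements) {
  std::optional<Range> Result;
  for (int64_t V : Elements) {
    if (!Result) {
      Result = Range{V, V};
    } else {
      Result->Min = std::min(Result->Min, V);
      Result->Max = std::max(Result->Max, V);
    }
  }
  return Result;
}
```

A caller has to check `has_value()` before dereferencing the result; dereferencing the empty optional is the kind of access that goes wrong in the zero-element case.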
Fixes #202531
[SLP][NFC] Fix compile-time hang in isMaskedLoadCompress
For a large gathered-load cluster LoadVecTy spans hundreds of vector
registers and the shuffle cost query blows up in processShuffleMasks.
The shuffle cost is non-negative, so bail out before computing it when
VectorGEPCost + LoadCost already reaches GatherCost (not profitable).
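The early exit can be sketched as below. This is an illustration under stated assumptions, not the SLP vectorizer's code: costs are plain `int` values rather than LLVM's cost type, `isProfitable` is a hypothetical name, and the expensive shuffle cost query is modeled as a callback so that skipping it is observable.

```cpp
#include <cassert>
#include <functional>

// Hypothetical profitability check. The shuffle cost is assumed to be
// non-negative, so once the cheap terms reach GatherCost the total cannot
// drop below it and the expensive query is skipped.
bool isProfitable(int VectorGEPCost, int LoadCost, int GatherCost,
                  const std::function<int()> &ComputeShuffleCost) {
  int Partial = VectorGEPCost + LoadCost;
  if (Partial >= GatherCost)
    return false; // not profitable; shuffle cost never computed
  return Partial + ComputeShuffleCost() < GatherCost;
}
```

The result is the same as computing the full sum first; only the order of evaluation changes, which is consistent with the NFC tag.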
Fixes #204163
Pull Request: https://github.com/llvm/llvm-project/pull/204211
[MLIR][SCF] Support permutation-based parallel loop fusion (#203207)
Improve SCF parallel-loop fusion for loops with permuted iteration
spaces.
Allow fusion after rewriting the second loop using an arbitrary
permutation of its iteration space. When multiple axes have identical
bounds and steps, also enumerate additional candidate remaps for those
equal axes.
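A brute-force sketch of what enumerating candidate remaps could look like follows. It is not the pass's algorithm: `Axis` and `candidateRemaps` are hypothetical, bounds and steps are assumed to be compile-time constants, and a remap is assumed to be valid when every axis of the second loop matches the bounds and step of the first-loop axis it is mapped onto.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical description of one loop axis.
struct Axis {
  long Lower;
  long Upper;
  long Step;
};

bool operator==(const Axis &A, const Axis &B) {
  return A.Lower == B.Lower && A.Upper == B.Upper && A.Step == B.Step;
}

// Returns every permutation P such that Second[P[I]] == First[I] for all I.
// Axes with identical bounds and steps yield more than one candidate.
std::vector<std::vector<std::size_t>>
candidateRemaps(const std::vector<Axis> &First, const std::vector<Axis> &Second) {
  std::vector<std::vector<std::size_t>> Result;
  if (First.size() != Second.size())
    return Result;
  std::vector<std::size_t> Perm(First.size());
  std::iota(Perm.begin(), Perm.end(), 0); // start from the identity
  do {
    bool Matches = true;
    for (std::size_t I = 0; I < Perm.size(); ++I) {
      if (!(Second[Perm[I]] == First[I])) {
        Matches = false;
        break;
      }
    }
    if (Matches)
      Result.push_back(Perm);
  } while (std::next_permutation(Perm.begin(), Perm.end()));
  return Result;
}
```

With distinct axes there is at most one candidate; with two identical axes both orderings are returned, and a real implementation would still need to check each candidate against the loops' memory accesses before fusing.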
[msan] Apply handleGenericVectorConvertIntrinsic() to fptrunc/fpext (#204197)
The current instrumentation uses handleShadowOr(), which effectively
truncates or zero-extends the shadows for fptrunc/fpext respectively;
this is overly lax because floating-point has both mantissa and exponent
components (e.g., if the mantissa is initialized but the exponent is
uninitialized, an fptrunc might end up with a fully initialized shadow,
which is incorrect; conversely, if a floating-point value is fully
uninitialized, we want the fpext'ed shadow to be fully uninitialized,
not zero-extended). This patch strengthens the instrumentation of
fptrunc/fpext by using handleGenericVectorConvertIntrinsic(), which
applies an "all-or-nothing" approach to uninitialized bits of each
scalar.
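The difference between the two shadow policies can be shown on a single scalar. The sketch below is not MSan's implementation: the four helper names are hypothetical, shadows are modeled as plain integers with a set bit meaning "uninitialized", and a 16-bit/32-bit pair stands in for the narrower and wider floating-point types.

```cpp
#include <cassert>
#include <cstdint>

// Lax policy for fpext: the shadow is zero-extended, so the new high
// bits are always reported as initialized.
constexpr uint32_t zeroExtendShadow(uint16_t Shadow) { return Shadow; }

// Lax policy for fptrunc: the shadow is truncated, so uninitialized
// high bits (for example the exponent) are silently dropped.
constexpr uint16_t truncateShadow(uint32_t Shadow) {
  return static_cast<uint16_t>(Shadow);
}

// All-or-nothing policy: any uninitialized input bit poisons the
// whole result scalar.
constexpr uint32_t allOrNothingExtShadow(uint16_t Shadow) {
  return Shadow != 0 ? 0xFFFFFFFFu : 0u;
}

constexpr uint16_t allOrNothingTruncShadow(uint32_t Shadow) {
  return static_cast<uint16_t>(Shadow != 0 ? 0xFFFFu : 0u);
}
```

For a 32-bit value whose exponent bits alone are uninitialized (shadow 0x7F800000), truncation yields a fully initialized shadow while the all-or-nothing policy marks the result fully uninitialized. For a fully uninitialized 16-bit value, zero-extension leaves the upper half marked initialized while the all-or-nothing policy does not.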
Note: https://github.com/llvm/llvm-project/pull/203903 auto-upgraded
aarch64_neon_vcvtfp2hf and aarch64_neon_vcvthf2fp to fptrunc and fpext,
which had the effect of weakening MSan's instrumentation for those NEON
intrinsics. This patch restores the stronger instrumentation for them
(and also generalizes it to all instances of fptrunc and fpext).