[DAG] scalarizeExtractedBinOp - extract from non-constant one use buildvectors (#198013)
When attempting to scalarize a vector binop that has a single extract,
we currently only fold if either of the binop's operands is a constant
buildvector - but we can extract from non-constant buildvectors without
increasing instruction count as long as the vector binop was the only
use of the buildvector.
More yak shaving for #196493
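The fold rests on the identity that extracting lane `i` of an element-wise binop gives the same value as applying the scalar op to lane `i` of each operand. A minimal Python model of that identity follows; it is not the SelectionDAG code, and `build_vector`, `vector_binop`, `extract`, and `scalarized_extract` are hypothetical names used only for illustration:

```python
import operator

def build_vector(*scalars):
    # Model of a (non-constant) buildvector: just collect the scalar operands.
    return list(scalars)

def vector_binop(op, lhs, rhs):
    # Element-wise binop over two vectors of equal length.
    return [op(a, b) for a, b in zip(lhs, rhs)]

def extract(vec, idx):
    # Model of extracting a single lane.
    return vec[idx]

def scalarized_extract(op, lhs_scalars, rhs_scalars, idx):
    # The folded form: apply the op to the original scalars of lane idx,
    # never materializing either vector.
    return op(lhs_scalars[idx], rhs_scalars[idx])

lhs = (3, 5, 7, 11)
rhs = (2, 4, 6, 8)
for i in range(4):
    vec_result = extract(
        vector_binop(operator.add, build_vector(*lhs), build_vector(*rhs)), i)
    assert vec_result == scalarized_extract(operator.add, lhs, rhs, i)
```

When the vector binop is the only user of the buildvector, both become dead after the fold, which is why the instruction count does not grow.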
[flang][acc] Handle Fortran do loops as acc loops in acc routine (#198420)
As was previously done for do loops in acc compute constructs in
https://github.com/llvm/llvm-project/issues/149614 , this PR does the
same for do loops in `acc routine`. The rules are as follows:
- Do loops not marked with `acc loop` are considered `auto`
- Do concurrent loops are considered `independent`
- Any loops in an `acc routine seq` are considered `seq`
This ensures that the IV is correctly privatized and attached to acc
loop.
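The three rules can be read as a small decision table. The sketch below is only a model of that table, not the Flang lowering code: `implicit_loop_kind` and its parameters are hypothetical, and giving the `acc routine seq` rule precedence over the other two is an assumption based on the word "any" in the list above.

```python
def implicit_loop_kind(in_routine_seq, is_do_concurrent, has_acc_loop):
    """Return the implied acc loop kind, or None if nothing is implied."""
    if in_routine_seq:
        # Assumption: "any loops" means this rule wins over the others.
        return "seq"
    if is_do_concurrent:
        return "independent"
    if not has_acc_loop:
        return "auto"
    # A loop explicitly marked with `acc loop` keeps whatever it specifies.
    return None

print(implicit_loop_kind(False, False, False))  # plain do loop
print(implicit_loop_kind(False, True, False))   # do concurrent
print(implicit_loop_kind(True, True, False))    # inside `acc routine seq`
```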
Reland "[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering" (#197436)
Replace typed struct GEPs with byte array allocation and ptradd
operations:
1. Track root offsets as byte offsets instead of building typed struct.
2. Use `ComputeFrameLayout` to compute byte offsets based on DataLayout,
properly accounting for each root's size and alignment.
3. Allocate frame as `[FrameSize x i8]` byte array instead of typed
struct.
4. Replace all CreateGEP operations with CreatePtrAdd using computed
offsets.
5. Frame layout unchanged: `[Next ptr | Map ptr | Root 0 | Root 1 | ...
| Root N]` where each root is placed at its computed aligned offset.
6. Zero out padding between roots with memset for deterministic frame
contents for GC.
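The byte-offset computation in steps 1, 2, 5, and 6 can be sketched as below. This is a Python model under stated assumptions, not the pass itself: `compute_frame_layout` and `align_to` are hypothetical names, pointers are assumed to be 8 bytes, each root is given as a `(size, alignment)` pair, and the frame is assumed to end right after the last root.

```python
def align_to(offset, alignment):
    # Round offset up to the next multiple of alignment.
    return (offset + alignment - 1) // alignment * alignment

def compute_frame_layout(roots, pointer_size=8):
    """roots: list of (size, alignment) in bytes.

    Returns (root_offsets, frame_size, padding), where padding is a list of
    (offset, length) gaps that would be zeroed for deterministic contents.
    """
    offset = 2 * pointer_size  # header: Next ptr, then Map ptr
    root_offsets = []
    padding = []
    for size, alignment in roots:
        aligned = align_to(offset, alignment)
        if aligned > offset:
            padding.append((offset, aligned - offset))
        root_offsets.append(aligned)
        offset = aligned + size
    return root_offsets, offset, padding

offsets, frame_size, gaps = compute_frame_layout([(8, 8), (4, 4), (16, 16)])
print(offsets, frame_size, gaps)
```

Each entry in `offsets` corresponds to the constant that would be fed to a ptradd from the frame base, and `frame_size` to the `FrameSize` in `[FrameSize x i8]`.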
Benefits:
- Removes dependency on `getAllocatedType` for building frame struct
[7 lines not shown]
[AMDGPU][NFCI] Change MCSubtargetInfo references in AMDGPUBaseInfo.h/.cpp to be const ref instead of pointers (#197038)
Change all `AMDGPU::IsaInfo` functions and `initDefaultAMDKernelCodeT`
to take `const MCSubtargetInfo &` instead of `const MCSubtargetInfo *`.
These functions never accept null, so a reference better expresses the
contract.
Also change `AMDGPUMCKernelCodeT::initDefault` to take a const reference
for consistency, and convert local `MCSubtargetInfo` pointer variables
to references in `AMDGPUMCExpr.cpp` where the pointer is always
dereferenced.
Requested by @arsenm in
https://github.com/llvm/llvm-project/pull/192306#discussion_r2076113671.
Co-authored-by: Claude Opus 4 (1M context) <noreply at anthropic.com>
[Utils] Examine debug info type instead of alloca type to guess the debug behavior of the alloca uses (#177480)
Replace the `isArray` and `isStructure` helpers, which queried the alloca
IR type, with an `isCompositeType` helper that checks the debug variable's
source-level type from debug info metadata to decide whether it seems
profitable to convert the debug info from a #debug_declare to a
#debug_value.
This changes behavior: the lowering decision is now based on the
source-level type from debug info rather than the IR alloca type, which
is more semantically correct for debug info processing. This should
have minimal effect on clang, but may change behavior more
significantly on front-ends like Rust that have not used semantically
meaningful alloca element types.
Removes all uses of getAllocatedType() from Utils/Local.cpp.
This seemed slightly more semantically correct to me, though it is
slightly challenging to enumerate all of the possible scalar debug
[7 lines not shown]
[VPlan] Simplify select x, (i1 y | z), y -> y | (x && z) (#190196)
Fixes https://github.com/llvm/llvm-project/issues/189553
This adds a canonicalization `select x, (i1 y | z), y -> y | (x && z)`,
[Alive2](https://alive2.llvm.org/ce/z/qcQRn6). InstCombine already
performs this.
The canonicalization lets the `lhs | (headermask && rhs)
-> vp.merge rhs, true, lhs, evl` pattern in optimizeMasksToEVL match,
improving the RISC-V codegen for an anyof select reduction.
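For plain boolean values the rewrite can be checked exhaustively over all eight inputs. The Python check below does only that; it does not model poison semantics, which is what the Alive2 proof covers, and `original` and `simplified` are names introduced here for illustration:

```python
from itertools import product

def original(x, y, z):
    # select x, (y | z), y
    return (y or z) if x else y

def simplified(x, y, z):
    # y | (x && z)
    return y or (x and z)

for x, y, z in product([False, True], repeat=3):
    assert original(x, y, z) == simplified(x, y, z), (x, y, z)
```

When `x` is false both forms reduce to `y`; when `x` is true both reduce to `y | z`.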
[flang] Inline scalar-to-array hlfir.assign at -O0 (#197092)
At `-O0`, Flang can lower trivial scalar-to-array broadcasts such as `c
= a(1) + 1.0` through `_FortranAAssign`. That runtime path can call
`free()`, which is not valid in OpenMP GPU device code.
This patch teaches `InlineHLFIRAssign` to handle trivial scalar RHS
values. At `-O0`, the pipeline runs it in a scalar-RHS-only mode, so
only scalar-to-array broadcasts are inlined. Array-to-array assignments
still fall back to `_FortranAAssign` at `-O0`.
Scalar RHS values are materialized before the generated loop with
`loadTrivialScalar`, preserving intrinsic assignment ordering for cases
like `a = a(1)`. At `-O1+`, the full `InlineHLFIRAssign` pass still runs
as before, now also supporting scalar RHS.
The remaining files are test updates from scalar-to-array assignments
now being inlined at `-O0` instead of lowering through
`_FortranAAssign`.
[5 lines not shown]
[lldb][windows] Fix second-chance exception delivery on lldb-server (#197956)
Currently, all tests that wait for the debugger to stop when the process
crashes time out on Windows under `LLDB_USE_LLDB_SERVER=1` because of 2
issues:
1. The `if (!first_chance) SetState(eStateStopped, false)` before the
switch silently advances `m_state` on every second-chance event. The
`default:` branch later calls `SetState(eStateStopped, true)` but this
is never reached because `state == m_state`. The client is waiting for a
reply that is never sent.
2. The `default:` branch's first-chance handling stops all threads and
then returns `SendToApplication`, which tells Windows
`DBG_EXCEPTION_NOT_HANDLED`. This hangs the process, so the second-chance
event never arrives. `ProcessWindows` is a no-op on first-chance
non-breakpoint exceptions because of this: it just returns
`ExceptionResult::SendToApplication` with no `StopThread/SetState`.
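Issue 1 can be modelled with a tiny state machine. The sketch below is a Python model, not the lldb-server code: the `Process` class, `set_state`, and `on_second_chance` are hypothetical, and it assumes (as the description above implies) that setting the state to its current value returns early without notifying anyone.

```python
class Process:
    def __init__(self):
        self.state = "running"
        self.notifications = []  # stop replies that would reach the client

    def set_state(self, new_state, notify):
        if new_state == self.state:
            return  # assumed early-out: no state change, nothing is sent
        self.state = new_state
        if notify:
            self.notifications.append(new_state)

def on_second_chance(proc, pre_advance):
    if pre_advance:
        # The silent update before the switch.
        proc.set_state("stopped", notify=False)
    # The `default:` branch, which is supposed to notify the client.
    proc.set_state("stopped", notify=True)
    return proc.notifications

print(on_second_chance(Process(), pre_advance=True))   # no stop reply
print(on_second_chance(Process(), pre_advance=False))  # one stop reply
```

With the silent pre-advance in place, the notifying call sees an unchanged state and the client never gets its reply.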
[11 lines not shown]
[Flang][OpenMP] Restrict implicit default declare mapper from applying deep-copies of pointer members (#197885)
According to the OpenMP specification, only allocatables should get
deep-copy behaviour inside of implicit default declare mappers. This PR
restricts this behaviour. Relevant specification excerpt, added as a
comment for a reminder:
// "If a component of a derived type list item is a map clause list item
// that results from the predefined default mapper for that derived type,
// and the component is not also an explicit list item or the array base
// of an explicit list item on the same construct, then: if it has the
// POINTER attribute, it is attach-INELIGIBLE. If a list item in a map
// clause is an associated pointer that is attach-ineligible, the effect
// of the map clause does not apply to its pointer target."
This prevents certain programs from unexpected over-mapping via pointer
nesting. It doesn't prevent that for allocatables, but that's OpenMP
specification mandated foot shooting, so it's fair game.
[clang] Give unnamed namespaces internal linkage (#198215)
Recently in #194600 we exposed formal linkage in AST dump. That PR came
with a bunch of FIXMEs. One of them is about the fact that we consider
unnamed namespaces to have external linkage, while the Standard says
it's internal linkage
([[basic.link]/4](https://eel.is/c++draft/basic.link#4.sentence-1)):
> An unnamed namespace or a namespace declared directly or indirectly
within an unnamed namespace has internal linkage.
Of course, declarations within unnamed namespaces still had internal
linkage (nothing would work otherwise).
The intent of this patch is to give unnamed namespaces internal linkage
and to do a bit of refactoring in
`LinkageComputer::getLVForNamespaceScopeDecl` to use linkage of the
enclosing namespace as the default linkage of declarations within it,
now that all kinds of namespaces have the correct linkage. No changes to
the behavior of programs are intended.
[libc] Port remaining socket functions to syscall_wrappers (#198463)
While in there:
- fix file headers to conform to latest standards
- add missing restrict qualifier to recvfrom
Assisted by Gemini.
[BOLT] Gate PointerAuthCFIFixup unit test on AArch64 target availability (#197464)
The test bodies reference AArch64:: namespace identifiers (ADDSXri, X0)
which fail to compile when AArch64 is not in LLVM_TARGETS_TO_BUILD. Wrap
all TEST_P bodies in #ifdef AARCH64_AVAILABLE and add
GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST to suppress GoogleTest's
uninstantiated suite error when no target instantiates the tests.
[libc] Add struct sockaddr_in (#197909)
The struct needs to be 16 bytes long for compatibility with the Linux
kernel (which rejects smaller sizes, even though the rest of the bytes
are unused).
The padding field (and its name) is not specified by POSIX, but it's
traditionally called sin_zero, and there exists a fair amount of code
that references that name, so I'm matching it as well.
I'm testing the compatibility of this struct by binding to a localhost
address. This test requires that the machine has a loopback interface
with an assigned IPv4 address. If some of the environments do not have
it, we can try to detect this in the test and skip it, but this would
diminish the value of the test.
As a drive-by, I'm also adding the (non-POSIX) INADDR_LOOPBACK constant.
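As a rough illustration of the 16-byte requirement, the Linux layout can be mirrored with Python's `struct` module: a 2-byte family, a 2-byte port, a 4-byte address, and 8 bytes of `sin_zero` padding. This is a sketch of the layout, not the libc header; `SOCKADDR_IN_FORMAT` and `pack_sockaddr_in` are names made up here, and `INADDR_LOOPBACK` is defined locally as 127.0.0.1 in host byte order.

```python
import socket
import struct

# sin_family (u16), sin_port (u16), sin_addr (4 bytes), sin_zero (8 bytes).
# "=" selects standard sizes with no implicit padding.
SOCKADDR_IN_FORMAT = "=HH4s8s"

INADDR_LOOPBACK = 0x7F000001  # 127.0.0.1, host byte order

def pack_sockaddr_in(port, ipv4):
    return struct.pack(
        SOCKADDR_IN_FORMAT,
        socket.AF_INET,
        socket.htons(port),      # port is stored in network byte order
        socket.inet_aton(ipv4),  # address bytes, already network order
        b"\0" * 8,               # sin_zero padding
    )

addr = pack_sockaddr_in(8080, "127.0.0.1")
print(len(addr))
```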
Assisted by Gemini.
[AtomicExpand] Add bitcasts when expanding store atomic vector
AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
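The idea behind the bitcast is that a vector and an integer of the same total width carry the same bits, so the store can be expressed on the integer. The Python model below shows only that bit-level equivalence for a vector of 32-bit floats; it is not what AtomicExpand emits, and `bitcast_vector_to_int` and `bitcast_int_to_vector` are hypothetical helpers assuming little-endian lane order.

```python
import struct

def bitcast_vector_to_int(floats):
    # Reinterpret the lanes' bytes as one wide integer; returns (value, bits).
    raw = struct.pack(f"<{len(floats)}f", *floats)
    return int.from_bytes(raw, "little"), len(raw) * 8

def bitcast_int_to_vector(value, count):
    # Inverse direction: split the wide integer back into 32-bit float lanes.
    raw = value.to_bytes(count * 4, "little")
    return list(struct.unpack(f"<{count}f", raw))

vec = [1.0, 2.0, -0.5, 4.0]
as_int, bits = bitcast_vector_to_int(vec)
assert bits == 128
assert bitcast_int_to_vector(as_int, len(vec)) == vec
```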
[X86] Cast atomic vectors in IR to support floats
Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.
Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.
Vectors whose `getTypeAction` is split rather than widen still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two
[4 lines not shown]
[SelectionDAG] Split vector types for atomic store
Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with element types bfloat and half.
[X86] Remove extra MOV after widening atomic store
This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
[SelectionDAG] Widen <2 x T> vector types for atomic store
Vector types with 2 elements must be widened. This change does so
for the vector types of atomic stores in SelectionDAG so that it can
translate aligned vectors with more than one element.
[X86] Manage atomic store of fp -> int promotion in DAG
When lowering atomic <1 x T> vector types with floats, selection can fail
since this pattern is unsupported. To support this, floats can be cast to
an integer type of the same size.