LLVM/project f6fb8f5 lldb/test/API/functionalities/gdb_remote_client TestGDBRemoteClient.py

[lldb][windows] remove path separator replacement from TestGDBRemoteClient.py (#198537)

Since https://github.com/llvm/llvm-project/pull/197942, vRun packets use
the native path separators. TestGDBRemoteClient.py now fails on Windows
because it converts the path to a POSIX-style path, which was a workaround
for the issue that https://github.com/llvm/llvm-project/pull/197942 fixed.
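For context: in the GDB remote serial protocol, `vRun;filename[;argument]...` carries the program path and arguments as hex-encoded fields inside the usual `$payload#checksum` framing, where the checksum is the byte sum modulo 256. The sketch below uses hypothetical helper names (not lldb's API) and assumes UTF-8 encoding. It shows that a Windows path with backslashes survives the round trip unchanged, so the test has no reason to rewrite separators.

```python
def hex_encode(arg: str) -> str:
    """Hex-encode one vRun field (two hex digits per byte, UTF-8 assumed)."""
    return arg.encode("utf-8").hex()

def frame_packet(payload: str) -> str:
    """Wrap a payload as $payload#cs, where cs is the byte sum modulo 256."""
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"

def vrun_packet(program: str, *args: str) -> str:
    """Build 'vRun;' followed by ';'-separated hex-encoded path and arguments."""
    fields = [hex_encode(program)] + [hex_encode(a) for a in args]
    return frame_packet("vRun;" + ";".join(fields))
```

For example, `vrun_packet(r"C:\work\a.out")` hex-encodes the backslashes along with every other byte. The separators never appear unescaped in the packet, so they cannot collide with the protocol's own delimiters.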

rdar://177342572
Delta File
+3 -6 lldb/test/API/functionalities/gdb_remote_client/TestGDBRemoteClient.py
+3 -6 1 file

LLVM/project 673b17e llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 arm64-neon-v1i1-setcc.ll

[DAG] scalarizeExtractedBinOp - extract from non-constant one use buildvectors (#198013)

When attempting to scalarize a vector binop that has a single extract,
we currently only fold if either of the binop's operands is a constant
buildvector. However, we can also extract from non-constant buildvectors
without increasing the instruction count, as long as the vector binop is
the only use of the buildvector.
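The operand condition can be modeled roughly as follows. This is a toy representation, not SelectionDAG's API: `BuildVector`, `is_cheap_to_extract`, and `lane` are hypothetical names. The model also ignores the other checks the real combine performs, such as the single-extract requirement and legality.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class BuildVector:
    elts: List[str]       # scalar operand names, one per lane
    is_constant: bool     # True if every lane is a constant
    num_uses: int = 1     # number of nodes that use this buildvector

Operand = Union[BuildVector, str]  # a plain str stands for any other vector value

def is_cheap_to_extract(op: Operand) -> bool:
    """Constant buildvectors qualified before; one-use buildvectors now qualify too."""
    return isinstance(op, BuildVector) and (op.is_constant or op.num_uses == 1)

def lane(op: Operand, idx: int) -> str:
    if isinstance(op, BuildVector):
        return op.elts[idx]                    # the lane is already a scalar
    return f"extract_vector_elt({op}, {idx})"  # otherwise an explicit extract

def scalarize_extracted_binop(opcode: str, lhs: Operand, rhs: Operand,
                              idx: int) -> Optional[str]:
    """extract_vector_elt(binop(lhs, rhs), idx) -> binop(lhs[idx], rhs[idx])."""
    if not (is_cheap_to_extract(lhs) or is_cheap_to_extract(rhs)):
        return None  # no cheap operand: keep the vector binop
    return f"{opcode}({lane(lhs, idx)}, {lane(rhs, idx)})"
```

A buildvector with a second user still exists after the fold, so scalarizing through it would add a scalar op without removing anything. That is why the sketch rejects the multi-use, non-constant case.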

More yak shaving for #196493
Delta File
+44 -60 llvm/test/CodeGen/X86/ifma-combine-vpmadd52.ll
+25 -27 llvm/test/CodeGen/X86/masked_gather_scatter_widen.ll
+2 -6 llvm/test/CodeGen/X86/i128-add.ll
+3 -4 llvm/test/CodeGen/X86/known-signbits-vector.ll
+1 -2 llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll
+2 -1 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+77 -100 6 files

LLVM/project bcfa53e flang/include/flang/Lower OpenACC.h, flang/lib/Lower OpenACC.cpp Bridge.cpp

[flang][acc] Handle Fortran do loops as acc loops in acc routine (#198420)

As was previously done for do loops in acc compute constructs in
https://github.com/llvm/llvm-project/issues/149614, this PR does the
same for do loops in `acc routine`. The rules are as follows:
- Do loops not marked with `acc loop` are considered `auto`
- Do concurrent loops are considered `independent`
- Any loops in an `acc routine seq` are considered `seq`

This ensures that the IV is correctly privatized and attached to the acc
loop.
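The three rules above can be read as a small decision procedure for a DO loop that carries no `acc loop` directive. The sketch below uses hypothetical names (`Par`, `classify_unannotated_loop`) and is not Flang's lowering code. Giving the `acc routine seq` rule precedence is our reading of "any loops" in the third rule.

```python
from enum import Enum

class Par(Enum):
    SEQ = "seq"
    AUTO = "auto"
    INDEPENDENT = "independent"

def classify_unannotated_loop(in_seq_routine: bool, is_do_concurrent: bool) -> Par:
    """Parallelism assumed for a DO loop without `acc loop` inside `acc routine`."""
    if in_seq_routine:
        return Par.SEQ          # any loop in `acc routine seq` is seq
    if is_do_concurrent:
        return Par.INDEPENDENT  # DO CONCURRENT asserts iteration independence
    return Par.AUTO             # plain DO loop: left to the compiler
```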
Delta File
+108 -1 flang/test/Lower/OpenACC/do-loops-to-acc-loops.f90
+81 -24 flang/lib/Lower/OpenACC.cpp
+8 -0 flang/include/flang/Lower/OpenACC.h
+4 -3 flang/lib/Lower/Bridge.cpp
+201 -28 4 files

LLVM/project 75e4aaf llvm/lib/CodeGen ShadowStackGCLowering.cpp, llvm/test/CodeGen/Generic shadow-stack-gc-lowering.ll

Reland "[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering" (#197436)

Replace typed struct GEPs with byte array allocation and ptradd
operations:

1. Track root offsets as byte offsets instead of building typed struct.
2. Use `ComputeFrameLayout` to compute byte offsets based on DataLayout,
properly accounting for each root's size and alignment.
3. Allocate frame as `[FrameSize x i8]` byte array instead of typed
struct.
4. Replace all CreateGEP operations with CreatePtrAdd using computed
offsets.
5. Frame layout unchanged: `[Next ptr | Map ptr | Root 0 | Root 1 | ... | Root N]`
where each root is placed at its computed aligned offset.
6. Zero out padding between roots with memset for deterministic frame
contents for GC.
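The offset computation in steps 2, 5, and 6 can be sketched as follows. This is a hypothetical model, not the pass's `ComputeFrameLayout`. Root sizes and alignments are passed in as plain integers, whereas the pass derives them from the DataLayout. The sketch also assumes no tail padding after the last root.

```python
from typing import List, Tuple

def align_to(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align (assumed a power of two)."""
    return (offset + align - 1) & ~(align - 1)

def compute_frame_layout(ptr_size: int, roots: List[Tuple[int, int]]):
    """Lay out [Next ptr | Map ptr | Root 0 | ... | Root N] in bytes.

    roots holds (size, align) pairs. Returns (root_offsets, frame_size, padding),
    where padding lists the (start, length) gaps that would be zeroed with memset.
    """
    offset = 2 * ptr_size  # Next ptr followed by Map ptr
    root_offsets: List[int] = []
    padding: List[Tuple[int, int]] = []
    for size, align in roots:
        aligned = align_to(offset, align)
        if aligned > offset:
            padding.append((offset, aligned - offset))  # gap before this root
        root_offsets.append(aligned)
        offset = aligned + size
    return root_offsets, offset, padding
```

With 8-byte pointers and roots of (size, align) `(4, 4)`, `(8, 8)`, `(1, 1)`, the roots land at offsets 16, 24, and 32. The 4-byte gap at offset 20 is the padding that would be zeroed.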

Benefits:
- Removes dependency on `getAllocatedType` for building frame struct

    [7 lines not shown]
Delta File
+101 -86 llvm/lib/CodeGen/ShadowStackGCLowering.cpp
+30 -44 llvm/test/CodeGen/Generic/shadow-stack-gc-lowering.ll
+131 -130 2 files

LLVM/project 52ca170 .github/workflows/containers/github-action-ci-tooling Dockerfile

[Github] Hashpin base container in CI Tooling containerfile (#197315)

https://github.com/llvm/llvm-project/security/code-scanning/1492
Delta File
+2 -2 .github/workflows/containers/github-action-ci-tooling/Dockerfile
+2 -2 1 file

LLVM/project 213b329 llvm/lib/Target/AMDGPU GCNSubtarget.h AMDGPUAsmPrinter.cpp, llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp

[AMDGPU][NFCI] Change MCSubtargetInfo references in AMDGPUBaseInfo.h/.cpp to be const ref instead of pointers (#197038)

Change all `AMDGPU::IsaInfo` functions and `initDefaultAMDKernelCodeT`
to take `const MCSubtargetInfo &` instead of `const MCSubtargetInfo *`.
These functions never accept null, so a reference better expresses the
contract.

Also change `AMDGPUMCKernelCodeT::initDefault` to take a const reference
for consistency, and convert local `MCSubtargetInfo` pointer variables
to references in `AMDGPUMCExpr.cpp` where the pointer is always
dereferenced.

Requested by @arsenm in
https://github.com/llvm/llvm-project/pull/192306#discussion_r2076113671.

Co-authored-by: Claude Opus 4 (1M context) <noreply at anthropic.com>
Delta File
+72 -72 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+30 -30 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+17 -17 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+8 -8 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+5 -6 llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+4 -5 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+136 -138 8 files not shown
+153 -155 14 files

LLVM/project 94b1d19 llvm/lib/Transforms/Utils Local.cpp, llvm/test/DebugInfo/Generic dbg-value-lower-linenos.ll

[Utils] Examine debug info type instead of alloca type to guess the debug behavior of the alloca uses (#177480)

Replace the `isArray` and `isStructure` helpers, which queried the alloca
IR type, with an `isCompositeType` helper that checks the debug variable's
source-level type from debug info metadata. The helper decides whether
converting this debug info from a #dbg_declare to a #dbg_value seems
profitable.

This changes behavior: the lowering decision is now based on the
source-level type from debug info rather than the IR alloca type, which
is more semantically correct for debug info processing. This should
have minimal effect on Clang, but may change behavior more
significantly for front ends like Rust that have not used semantically
meaningful alloca element types.

Removes all uses of getAllocatedType() from Utils/Local.cpp.
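A rough Python model of a debug-info-based composite check follows. The `DIType` class is hypothetical. Which DWARF tags count as composite, and which tags are looked through, are also assumptions made for illustration. The actual helper in Local.cpp may draw the line differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DIType:
    tag: str                          # e.g. "DW_TAG_structure_type"
    base: Optional["DIType"] = None   # underlying type for typedefs/qualifiers

# Assumed classification, for illustration only.
COMPOSITE_TAGS = {"DW_TAG_array_type", "DW_TAG_structure_type",
                  "DW_TAG_union_type", "DW_TAG_class_type"}
TRANSPARENT_TAGS = {"DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type"}

def is_composite_type(ty: Optional[DIType]) -> bool:
    """Look through typedefs and qualifiers, then test for an aggregate tag."""
    while ty is not None and ty.tag in TRANSPARENT_TAGS:
        ty = ty.base
    return ty is not None and ty.tag in COMPOSITE_TAGS
```

Working from the source-level type means a Rust-style alloca of `[N x i8]` that holds a struct is still classified as composite, which the old IR-type check could not see.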

This seemed slightly more semantically correct to me, though it is
slightly challenging to enumerate all of the possible scalar debug

    [7 lines not shown]
Delta File
+35 -9 llvm/lib/Transforms/Utils/Local.cpp
+4 -3 llvm/test/DebugInfo/Generic/dbg-value-lower-linenos.ll
+2 -2 llvm/test/Transforms/InstCombine/dbg-simplify-alloca-size.ll
+1 -1 llvm/test/Transforms/InstCombine/dbg-scalable-store-fixed-frag.ll
+42 -15 4 files

LLVM/project 7d6ed54 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer alternate-non-profitable.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+74 -48 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+24 -24 llvm/test/Transforms/SLPVectorizer/RISCV/buildvector-all-external-scalars.ll
+6 -6 llvm/test/Transforms/SLPVectorizer/X86/pr48879-sroa.ll
+4 -4 llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll
+3 -3 llvm/test/Transforms/SLPVectorizer/X86/copyable_reorder.ll
+3 -3 llvm/test/Transforms/SLPVectorizer/alternate-non-profitable.ll
+114 -88 15 files not shown
+135 -109 21 files

LLVM/project b7cc800 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize select-cmp-predicated.ll

[VPlan] Simplify select x, (i1 y | z), y -> y | (x && z) (#190196)

Fixes https://github.com/llvm/llvm-project/issues/189553

This adds the canonicalization `select x, (i1 y | z), y -> y | (x && z)`
([Alive2](https://alive2.llvm.org/ce/z/qcQRn6)), which InstCombine already
performs.

With it, the `lhs | (headermask && rhs) -> vp.merge rhs, true, lhs, evl`
pattern in optimizeMasksToEVL now matches, improving the RISC-V codegen
for an anyof select reduction.
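All three operands are `i1`, so the identity can be checked exhaustively over the eight input combinations. The sketch below covers only defined values. Poison propagation, which the Alive2 proof also accounts for, is not modeled.

```python
from itertools import product

def select(c: bool, t: bool, f: bool) -> bool:
    """Model of `select c, t, f` on i1 values."""
    return t if c else f

def check_select_or_identity() -> bool:
    """Compare select x, (y | z), y against y | (x && z) for all i1 inputs."""
    return all(select(x, y or z, y) == (y or (x and z))
               for x, y, z in product([False, True], repeat=3))
```

When `x` is false, both sides reduce to `y`. When `x` is true, both sides reduce to `y | z`.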
Delta File
+7 -7 llvm/test/Transforms/LoopVectorize/select-cmp-predicated.ll
+9 -0 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2 -3 llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+2 -2 llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
+20 -12 4 files

LLVM/project 29f345e llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64Arm64ECCallLowering.cpp, llvm/test/CodeGen/AArch64 arm64ec-exit-thunks.ll arm64ec-hybrid-patchable.ll

Revert "[AArch64] Copy x4/x5 vararg payload into the x64 stack in Arm64EC exi…"

This reverts commit e6a12781bcc2d1713f9e5593de36f68cc00aaab6.
Delta File
+6 -208 llvm/test/CodeGen/AArch64/arm64ec-exit-thunks.ll
+4 -62 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+9 -11 llvm/test/CodeGen/AArch64/arm64ec-hybrid-patchable.ll
+1 -9 llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+20 -290 4 files

LLVM/project 581dd5b flang/lib/Optimizer/HLFIR/Transforms InlineHLFIRAssign.cpp, flang/test/Driver mlir-pass-pipeline.f90

[flang] Inline scalar-to-array hlfir.assign at -O0 (#197092)

At `-O0`, Flang can lower trivial scalar-to-array broadcasts such as
`c = a(1) + 1.0` through `_FortranAAssign`. That runtime path can call
`free()`, which is not valid in OpenMP GPU device code.

This patch teaches `InlineHLFIRAssign` to handle trivial scalar RHS
values. At `-O0`, the pipeline runs it in a scalar-RHS-only mode, so
only scalar-to-array broadcasts are inlined. Array-to-array assignments
still fall back to `_FortranAAssign` at `-O0`.

Scalar RHS values are materialized before the generated loop with
`loadTrivialScalar`, preserving intrinsic assignment ordering for cases
like `a = a(1)`. At `-O1` and above, the full `InlineHLFIRAssign` pass
still runs as before and now also supports scalar RHS values.

The remaining files are test updates from scalar-to-array assignments
now being inlined at `-O0` instead of lowering through
`_FortranAAssign`.

    [5 lines not shown]
Delta File
+86 -42 flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+52 -0 flang/test/HLFIR/inline-hlfir-assign.fir
+27 -13 flang/lib/Optimizer/HLFIR/Transforms/InlineHLFIRAssign.cpp
+13 -26 flang/test/Integration/OpenMP/private-global.f90
+7 -2 flang/test/Driver/mlir-pass-pipeline.f90
+7 -1 flang/test/Lower/OpenMP/workdistribute-saxpy-and-scalar-assign.f90
+192 -84 5 files not shown
+216 -87 11 files

LLVM/project ed81c50 lldb/source/Plugins/Process/Windows/Common NativeProcessWindows.cpp

[lldb][windows] Fix second-chance exception delivery on lldb-server (#197956)

Currently, all tests that wait for the debugger to stop when the process
crashes time out on Windows under `LLDB_USE_LLDB_SERVER=1` because of 2
issues:

1. The `if (!first_chance) SetState(eStateStopped, false)` before the
switch silently advances `m_state` on every second-chance event. The
`default:` branch later calls `SetState(eStateStopped, true)` but this
is never reached because `state == m_state`. The client is waiting for a
reply that is never sent.

2. The `default:` branch's first-chance handling stops all threads and
then returns `SendToApplication`, which tells Windows
`DBG_EXCEPTION_NOT_HANDLED`. This hangs the process, and the
second-chance event never arrives. This is why `ProcessWindows` is a
no-op on first-chance non-breakpoint exceptions: it just returns
`ExceptionResult::SendToApplication` with no `StopThread/SetState`.
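Issue 1 comes down to a state setter that does nothing when the state is unchanged. The toy model below uses hypothetical names and is not lldb's class. It shows how an early, silent transition swallows the later notifying one. The "fixed" variant is only an illustration, since the actual fix is in the lines not shown.

```python
from typing import List

class ToyProcess:
    """Hypothetical model of the SetState guard described above."""

    def __init__(self) -> None:
        self.state = "running"
        self.stop_replies: List[str] = []  # replies the client would receive

    def set_state(self, state: str, notify: bool) -> None:
        if state == self.state:
            return  # unchanged state: nothing is recorded or sent
        self.state = state
        if notify:
            self.stop_replies.append(state)

def second_chance_buggy(p: ToyProcess) -> None:
    p.set_state("stopped", False)  # early silent update before the switch
    p.set_state("stopped", True)   # default: branch, now skipped by the guard

def second_chance_fixed(p: ToyProcess) -> None:
    p.set_state("stopped", True)   # single transition that also notifies
```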


    [11 lines not shown]
Delta File
+4 -9 lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
+4 -9 1 file

LLVM/project 6938554 llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn] port 182ae96a82fc (#198533)
Delta File
+1 -2 llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+1 -2 1 file

LLVM/project d06e693 flang/lib/Utils OpenMP.cpp, flang/test/Lower/OpenMP default-mapper-no-pointer-map.f90 implicit-mapper-no-pointer-map.f90

[Flang][OpenMP] Restrict implicit default declare mapper from applying deep-copies of pointer members (#197885)

According to the OpenMP specification, only allocatables should get
deep-copy behaviour inside of implicit default declare mappers. This PR
restricts this behaviour accordingly. The relevant specification excerpt,
added as a comment for a reminder:

// "If a component of a derived type list item is a map clause list item
// that results from the predefined default mapper for that derived type,
// and the component is not also an explicit list item or the array base
// of an explicit list item on the same construct, then: if it has the
// POINTER attribute, it is attach-INELIGIBLE. If a list item in a map
// clause is an associated pointer that is attach-ineligible, the effect
// of the map clause does not apply to its pointer target."

This prevents certain programs from unexpectedly over-mapping through
nested pointers. It does not prevent this for allocatables, but that
behaviour is mandated by the OpenMP specification.
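Read as a per-component decision, the rule looks roughly like this. The `Component` class and `target_is_mapped` are hypothetical, not Flang's implementation. The sketch ignores map types and array sections, and it treats an explicitly mapped pointer as governed by its own map clause.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    is_pointer: bool = False
    is_allocatable: bool = False
    explicit: bool = False  # also an explicit list item (or its array base)

def target_is_mapped(c: Component) -> bool:
    """Whether mapping the parent via the implicit default mapper maps c's data."""
    if c.is_allocatable:
        return True        # allocatables keep deep-copy behaviour
    if c.is_pointer:
        return c.explicit  # implicit POINTER components are attach-ineligible
    return False           # plain components have no separate target
```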
Delta File
+76 -0 flang/test/Lower/OpenMP/default-mapper-no-pointer-map.f90
+74 -0 flang/test/Lower/OpenMP/implicit-mapper-no-pointer-map.f90
+30 -0 flang/lib/Utils/OpenMP.cpp
+180 -0 3 files

LLVM/project 52871b5 lldb/source/Plugins/ObjectFile/PECOFF ObjectFilePECOFF.cpp, lldb/test/API/functionalities/data-formatter/bytecode-summary TestBytecodeSummary.py

[lldb][PECOFF] Recognise truncated .lldb{summaries,formatters} section names (#198377)
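PE/COFF section headers store the name in an 8-byte field, and the PE/COFF specification does not give image files a string table for longer names. As a result, `.lldbsummaries` can end up as `.lldbsum` in a linked image. The matcher below is a hypothetical sketch of accepting both spellings, not the code in ObjectFilePECOFF.cpp.

```python
from typing import Optional

COFF_SECTION_NAME_MAX = 8  # size of the Name field in a COFF section header

KNOWN_SECTIONS = {".lldbsummaries": "summaries", ".lldbformatters": "formatters"}

def classify_section(name: str) -> Optional[str]:
    """Match a full or 8-byte-truncated section name against the known names."""
    for full, kind in KNOWN_SECTIONS.items():
        if name == full or (len(name) == COFF_SECTION_NAME_MAX
                            and full.startswith(name)):
            return kind
    return None
```

The two truncated names, `.lldbsum` and `.lldbfor`, remain distinct, so an 8-byte prefix match is unambiguous here.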
Delta File
+8 -5 lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
+0 -2 lldb/test/API/functionalities/data-formatter/bytecode-synthetic/TestBytecodeSynthetic.py
+0 -1 lldb/test/API/functionalities/data-formatter/bytecode-summary/TestBytecodeSummary.py
+8 -8 3 files

LLVM/project 7ae1a3f clang/lib/AST Decl.cpp, clang/test/AST ast-dump-linkage-internal.cpp

[clang] Give unnamed namespaces internal linkage (#198215)

Recently in #194600 we exposed formal linkage in AST dump. That PR came
with a bunch of FIXMEs. One of them is about the fact that we consider
unnamed namespaces to have external linkage, while the Standard says
it's internal linkage
([[basic.link]/4](https://eel.is/c++draft/basic.link#4.sentence-1)):

> An unnamed namespace or a namespace declared directly or indirectly
within an unnamed namespace has internal linkage.

Of course, declarations within unnamed namespaces still had internal
linkage (nothing would work otherwise).

The intent of this patch is to give unnamed namespaces internal linkage
and to do a bit of refactoring in
`LinkageComputer::getLVForNamespaceScopeDecl` to use linkage of the
enclosing namespace as the default linkage of declarations within it,
now that all kinds of namespaces have the correct linkage. No changes to
the behavior of programs are intended.
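The namespace rule, and the idea of inheriting the enclosing namespace's linkage as the default, can be modeled as follows. The classes are hypothetical, not Clang's `LinkageComputer`. The model ignores everything else that affects a declaration's linkage, such as `static`, `const`, or module attachment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Namespace:
    name: Optional[str]                   # None for an unnamed namespace
    parent: Optional["Namespace"] = None  # None: directly at translation-unit scope

def namespace_linkage(ns: Optional[Namespace]) -> str:
    """[basic.link]/4: an unnamed namespace, or one nested in it, is internal."""
    while ns is not None:
        if ns.name is None:
            return "internal"
        ns = ns.parent
    return "external"

def default_decl_linkage(enclosing: Optional[Namespace]) -> str:
    """Default linkage of a namespace-scope declaration: that of its namespace."""
    return namespace_linkage(enclosing)
```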
Delta File
+35 -30 clang/lib/AST/Decl.cpp
+9 -3 clang/test/AST/ast-dump-linkage-internal.cpp
+44 -33 2 files

LLVM/project 8076d17 libc/src/__support/OSUtil/linux/syscall_wrappers CMakeLists.txt recvfrom.h, libc/src/sys/socket/linux CMakeLists.txt

[libc] Port remaining socket functions to syscall_wrappers (#198463)

While in there:
- fix file headers to conform to latest standards
- add missing restrict qualifier to recvfrom

Assisted by Gemini.
Delta File
+96 -0 libc/src/__support/OSUtil/linux/syscall_wrappers/CMakeLists.txt
+26 -22 libc/src/sys/socket/linux/CMakeLists.txt
+39 -0 libc/src/__support/OSUtil/linux/syscall_wrappers/recvfrom.h
+39 -0 libc/src/__support/OSUtil/linux/syscall_wrappers/sendto.h
+36 -0 libc/src/__support/OSUtil/linux/syscall_wrappers/bind.h
+36 -0 libc/src/__support/OSUtil/linux/syscall_wrappers/sendmsg.h
+272 -22 14 files not shown
+482 -139 20 files

LLVM/project c8d9852 llvm/test/CodeGen/AMDGPU reassoc-mul-add-1-to-mad.ll literal64.ll, llvm/test/CodeGen/AMDGPU/GlobalISel mul.ll

[AMDGPU][NFC] Add tests for 64bit literals in single DWORD instructions for gfx13 (#197907)

Co-authored-by: sstipano <sstipano7 at gmail.com>
Delta File
+1,343 -0 llvm/test/CodeGen/AMDGPU/reassoc-mul-add-1-to-mad.ll
+413 -186 llvm/test/CodeGen/AMDGPU/literal64.ll
+557 -0 llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll
+344 -171 llvm/test/CodeGen/AMDGPU/scale-offset-global.ll
+310 -162 llvm/test/CodeGen/AMDGPU/scale-offset-scratch.ll
+426 -0 llvm/test/CodeGen/AMDGPU/mul.ll
+3,393 -519 4 files not shown
+4,022 -519 10 files

LLVM/project 99824dc mlir/test/Transforms mem2reg.mlir, mlir/test/lib/Dialect/Test TestOpDefs.cpp TestOps.td

add dual aliaser test
Delta File
+56 -0 mlir/test/lib/Dialect/Test/TestOpDefs.cpp
+27 -0 mlir/test/Transforms/mem2reg.mlir
+20 -0 mlir/test/lib/Dialect/Test/TestOps.td
+103 -0 3 files

LLVM/project d6d5a3d bolt/unittests/Passes PointerAuthCFIFixup.cpp

[BOLT] Gate PointerAuthCFIFixup unit test on AArch64 target availability (#197464)

The test bodies reference AArch64:: namespace identifiers (ADDSXri, X0)
which fail to compile when AArch64 is not in LLVM_TARGETS_TO_BUILD. Wrap
all TEST_P bodies in #ifdef AARCH64_AVAILABLE and add
GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST to suppress GoogleTest's
uninstantiated suite error when no target instantiates the tests.
Delta File
+4 -0 bolt/unittests/Passes/PointerAuthCFIFixup.cpp
+4 -0 1 file

LLVM/project f3c6257 llvm/test/CodeGen/Lanai multiply.ll

[lanai] multiply.ll - regenerate test checks (#198521)
Delta File
+75 -15 llvm/test/CodeGen/Lanai/multiply.ll
+75 -15 1 file

LLVM/project ae8d973 libc/cmake/modules LLVMLibCLibraryRules.cmake

Fix unused parameter for add_bitcode_entrypoint_library for GPU Libc (#198458)
Delta File
+2 -1 libc/cmake/modules/LLVMLibCLibraryRules.cmake
+2 -1 1 file

LLVM/project 39a1ed4 lldb/test/API/functionalities/postmortem/elf-core TestLinuxCore.py

[lldb][Windows] Disable TestLinuxCore.LinuxCoreTestCase.test_object_map on Windows (#198473)

See https://github.com/llvm/llvm-project/issues/198471 for details.
Delta File
+3 -0 lldb/test/API/functionalities/postmortem/elf-core/TestLinuxCore.py
+3 -0 1 file

LLVM/project e2464bf libc/hdr/types struct_sockaddr_in.h CMakeLists.txt, libc/include/llvm-libc-types struct_sockaddr_in.h

[libc] Add struct sockaddr_in (#197909)

The struct needs to be 16 bytes long for compatibility with the Linux
kernel (which rejects smaller sizes, even though the rest of the bytes
are unused).

The padding field (and its name) is not specified by POSIX, but it's
traditionally called sin_zero, and there exists a fair amount of code
that references that name, so I'm matching it as well.

I'm testing the compatibility of this struct by binding to a localhost
address. This test requires that the machine has a loopback interface
with an assigned IPv4 address. If some environments do not have one,
we can try to detect this in the test and skip it, but this would
diminish the value of the test.

As a drive-by, I'm also adding the (non-POSIX) INADDR_LOOPBACK constant.
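The traditional Linux layout adds up to the 16 bytes the kernel expects. It consists of a 2-byte family, a 2-byte port, a 4-byte IPv4 address (the port and address in network byte order), and 8 bytes of `sin_zero` padding. The ctypes sketch below mirrors that traditional layout, not llvm-libc's header. It assumes a 16-bit `sa_family_t` as on Linux. `loopback_addr` is a hypothetical helper.

```python
import ctypes
import socket

class SockaddrIn(ctypes.Structure):
    _fields_ = [
        ("sin_family", ctypes.c_uint16),   # AF_INET (16-bit sa_family_t assumed)
        ("sin_port", ctypes.c_uint16),     # port, network byte order
        ("sin_addr", ctypes.c_uint32),     # IPv4 address, network byte order
        ("sin_zero", ctypes.c_uint8 * 8),  # padding up to 16 bytes
    ]

INADDR_LOOPBACK = 0x7F000001  # 127.0.0.1 in host byte order

def loopback_addr(port: int) -> SockaddrIn:
    """Fill in a sockaddr_in for 127.0.0.1:port; ctypes zero-fills sin_zero."""
    addr = SockaddrIn()
    addr.sin_family = socket.AF_INET
    addr.sin_port = socket.htons(port)
    addr.sin_addr = socket.htonl(INADDR_LOOPBACK)
    return addr
```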

Assisted by Gemini.
Delta File
+40 -8 libc/test/src/sys/socket/linux/bind_test.cpp
+33 -0 libc/include/llvm-libc-types/struct_sockaddr_in.h
+26 -0 libc/hdr/types/struct_sockaddr_in.h
+9 -0 libc/hdr/types/CMakeLists.txt
+4 -2 libc/utils/docgen/netinet/in.yaml
+4 -0 libc/test/src/sys/socket/linux/CMakeLists.txt
+116 -10 4 files not shown
+124 -11 10 files

LLVM/project fb63f23 llvm/lib/CodeGen AtomicExpandPass.cpp, llvm/test/CodeGen/ARM atomic-load-store.ll

[AtomicExpand] Add bitcasts when expanding store atomic vector

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling from #148900.
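In IR terms, the added casts reinterpret the vector's bits as an integer of the same width. A sized integer libcall (for example libatomic's `__atomic_store_8` for 8 bytes) can then store that integer. The Python sketch below only models the bit reinterpretation for a `<2 x float>`. It assumes little-endian lane order, as on x86. The helper names are hypothetical, and the ptrtoint step needed for pointer lanes is not modeled.

```python
import struct
from typing import Sequence, Tuple

def bitcast_f32x2_to_i64(vec: Sequence[float]) -> int:
    """Reinterpret a <2 x float> as one i64 (lane 0 in the low 32 bits)."""
    return struct.unpack("<Q", struct.pack("<2f", *vec))[0]

def bitcast_i64_to_f32x2(bits: int) -> Tuple[float, float]:
    """Inverse bitcast, as a load-side expansion would need."""
    return struct.unpack("<2f", struct.pack("<Q", bits))
```

For example, `<2 x float> <1.0, 2.0>` becomes `0x400000003F800000`, with the bit pattern of `1.0` (`0x3F800000`) in the low half.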
Delta File
+99 -6 llvm/test/CodeGen/X86/atomic-load-store.ll
+98 -0 llvm/test/Transforms/AtomicExpand/X86/expand-atomic-non-integer.ll
+49 -0 llvm/test/CodeGen/ARM/atomic-load-store.ll
+4 -2 llvm/lib/CodeGen/AtomicExpandPass.cpp
+250 -8 4 files

LLVM/project 07d1319 llvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrFragmentsSIMD.td X86InstrAVX512.td

[X86] Cast atomic vectors in IR to support floats

Extend the X86 `alignedstore` PatFrag to also match `atomic_store`
with vector-size alignment, so existing MOVAPS/MOVAPD/MOVDQA-family
aligned-store patterns cover 128-bit aligned vector atomic stores on
SSE/AVX/AVX-512 without per-type duplicates. `<4 x float>`,
`<2 x double>`, `<2 x i64>`, `<4 x i32>`, `<8 x half>`, and `<8 x bfloat>`
all codegen to a single `movaps`/`movapd` on AVX+ via this.

Adds v8f16/v8bf16 bitconvert variants to the widen-path
`atomic_store_32` / `atomic_store_64` patterns so `<2 x half>`,
`<2 x bfloat>`, `<4 x half>`, and `<4 x bfloat>` atomic stores reaching
the PR4 widen path also collapse to a single instruction on AVX+
targets.

Vectors whose `getTypeAction` is split rather than widened still rely
on PR6's `SplitVecOp_ATOMIC_STORE`; that path bitcasts the vector
to a scalar integer and issues an integer `atomic_store_N`, picked
up by the pre-existing scalar atomic-store patterns. The two

    [4 lines not shown]
Delta File
+86 -0 llvm/test/CodeGen/X86/atomic-load-store.ll
+5 -4 llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+3 -2 llvm/lib/Target/X86/X86InstrAVX512.td
+1 -1 llvm/include/llvm/Target/TargetSelectionDAG.td
+95 -7 4 files

LLVM/project 9a15cc0 llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Split vector types for atomic store

Vector types that aren't widened are split so that a single ATOMIC_STORE
is issued for the entire vector at once. This enables SelectionDAG to
translate vectors with bfloat and half element types.
Delta File
+440 -0 llvm/test/CodeGen/X86/atomic-load-store.ll
+20 -0 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1 -0 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+461 -0 3 files

LLVM/project 7b09891 llvm/include/llvm/Target TargetSelectionDAG.td, llvm/lib/Target/X86 X86InstrSSE.td X86InstrAVX512.td

[X86] Remove extra MOV after widening atomic store

This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>, <2 x ptr addrspace(270)>.
Delta File
+47 -64 llvm/test/CodeGen/X86/atomic-load-store.ll
+30 -24 llvm/test/CodeGen/X86/atomic-unordered.ll
+35 -0 llvm/include/llvm/Target/TargetSelectionDAG.td
+10 -10 llvm/lib/Target/X86/X86InstrSSE.td
+6 -6 llvm/lib/Target/X86/X86InstrAVX512.td
+1 -1 llvm/lib/Target/X86/X86ISelLowering.cpp
+129 -105 6 files

LLVM/project f6ebebc llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeTypes.h, llvm/test/CodeGen/X86 atomic-load-store.ll

[SelectionDAG] Widen <2 x T> vector types for atomic store

Two-element vector types must be widened. This change does so for the
vector types of atomic stores in SelectionDAG, so that it can translate
aligned vectors with more than one element.
Delta File
+198 -0 llvm/test/CodeGen/X86/atomic-load-store.ll
+54 -0 llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+1 -0 llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+253 -0 3 files

LLVM/project 21e67f6 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 atomic-load-store.ll

[X86] Manage atomic store of fp -> int promotion in DAG

When lowering atomic <1 x T> vector types with floating-point elements,
instruction selection can fail because this pattern is unsupported. To
support it, the floats can be cast to an integer type of the same size.
Delta File
+130 -0 llvm/test/CodeGen/X86/atomic-load-store.ll
+4 -0 llvm/lib/Target/X86/X86ISelLowering.cpp
+134 -0 2 files