LLVM/project c02ace3llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Merge branch 'main' into users/kasuga-fj/pipeliner-remove-performcheap
DeltaFile
+42,349-42,348llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,5265,591 files not shown
+1,438,292-1,248,9205,597 files

LLVM/project f7ac184llvm/lib/Transforms/Scalar DeadStoreElimination.cpp, llvm/test/Transforms/DeadStoreElimination simple.ll

feedback

Created using spr 1.3.7
DeltaFile
+14-0llvm/test/Transforms/DeadStoreElimination/simple.ll
+2-2llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+16-22 files

LLVM/project c9f4bb6llvm/lib/Target/X86 X86PassRegistry.def

[X86][NewPM] Add rest of non-ported passes to X86PassRegistry (#176068)

I noticed these when writing up the pass builder. Add them to the pass
registry so it is easier to see what remains unported when people
start porting more passes.
DeltaFile
+7-0llvm/lib/Target/X86/X86PassRegistry.def
+7-01 files

LLVM/project b77f952llvm/utils/gn/secondary/clang/include/clang/Basic BUILD.gn, llvm/utils/gn/secondary/clang/lib/Basic BUILD.gn

gn build: Port d5442b8c963d
DeltaFile
+4-0llvm/utils/gn/secondary/clang/include/clang/Basic/BUILD.gn
+1-0llvm/utils/gn/secondary/clang/lib/Basic/BUILD.gn
+5-02 files

LLVM/project 4db68e4llvm/include/llvm/Passes MachinePassRegistry.def

[NewPM][CodeGen] Add missing non-ported pass to registry

Not sure why this did not make it into the list originally. Adding it
so that someone looking in the registry for passes to port will see it.
DeltaFile
+1-0llvm/include/llvm/Passes/MachinePassRegistry.def
+1-01 files

LLVM/project 036fa67llvm/lib/Target/AMDGPU SIRegisterInfo.td GCNSubtarget.cpp, llvm/test/CodeGen/AMDGPU regalloc-vgpr_lo128-gfx1250-t16.mir regalloc-vgpr_lo128-gfx1250.mir

[AMDGPU] Limit allocation of lo128 registers for occupancy

The parent change allows allocation of lo128 VGPRs from all 4 banks.
That may result in an undesired allocation that leaves a hole of up
to 128 registers, for example when v0-v127 are allocated and
v128-v255 are free.

Limit the available allocation order to the occupancy. Both hard
occupancy limits and occupancy achieved during scheduling are
considered. It is better to spill a register than to drop occupancy
in this case.
DeltaFile
+97-1llvm/test/CodeGen/AMDGPU/regalloc-vgpr_lo128-gfx1250-t16.mir
+97-1llvm/test/CodeGen/AMDGPU/regalloc-vgpr_lo128-gfx1250.mir
+53-0llvm/test/CodeGen/AMDGPU/shrink-vgpr_lo128-gfx1250-t16.mir
+30-4llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+29-0llvm/test/CodeGen/AMDGPU/shrink-vgpr_lo128-gfx1250.mir
+9-0llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+315-61 files not shown
+323-67 files

LLVM/project 9979f63llvm/test/MC/AMDGPU gfx10_asm_vopc_e64.s gfx11_asm_vop3_from_vopc.s, llvm/test/MC/Disassembler/AMDGPU gfx10_vop3c.txt gfx10_vop3.txt

Merge branch 'users/chapuni/mcdc/nest/nest' into users/chapuni/mcdc/nest/trunk
DeltaFile
+5,421-5,421llvm/test/MC/AMDGPU/gfx10_asm_vopc_e64.s
+5,392-5,392llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+3,733-3,733llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+2,919-2,919llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_from_vopc.txt
+2,802-2,802llvm/test/MC/AMDGPU/gfx11_asm_vop3_from_vopc.s
+2,645-2,645llvm/test/MC/AMDGPU/gfx11_asm_vop3_from_vopc-fake16.s
+22,912-22,912742 files not shown
+126,623-105,479748 files

LLVM/project 1b17f31llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Merge branch 'users/chapuni/mcdc/nest/covmapdesc' into users/chapuni/mcdc/nest/nest

Conflicts:
        clang/lib/CodeGen/CodeGenPGO.cpp
DeltaFile
+42,349-42,348llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,5265,661 files not shown
+1,449,922-1,252,6225,667 files

LLVM/project ef2ee43llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s, llvm/test/MC/Disassembler/AMDGPU gfx9_vop3.txt

Merge branch 'users/chapuni/mcdc/nest/bitmapaddr' into users/chapuni/mcdc/nest/covmapdesc
DeltaFile
+42,349-42,348llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+22,711-22,884llvm/test/MC/Disassembler/AMDGPU/gfx9_vop3.txt
+22,276-22,275llvm/test/MC/AMDGPU/gfx8_asm_vopc.s
+193,358-193,5265,663 files not shown
+1,450,232-1,252,8205,669 files

LLVM/project a837107llvm/test/MC/AMDGPU gfx10_asm_vopc_e64.s gfx10_asm_vop1.s, llvm/test/MC/Disassembler/AMDGPU gfx10_vop3c.txt gfx10_vop3.txt

Merge branch 'users/chapuni/mcdc/nest/covgen' into users/chapuni/mcdc/nest/bitmapaddr
DeltaFile
+10,845-10,844llvm/test/MC/AMDGPU/gfx10_asm_vopc_e64.s
+5,425-5,424llvm/test/MC/AMDGPU/gfx10_asm_vop1.s
+5,392-5,392llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+4,676-4,675llvm/test/MC/AMDGPU/gfx10_asm_vop3.s
+4,672-4,671llvm/test/MC/AMDGPU/gfx10_asm_vop2.s
+3,733-3,733llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+34,743-34,7392,431 files not shown
+225,579-160,7002,437 files

LLVM/project 11efca0llvm/test/MC/AMDGPU gfx10_asm_vopc_e64.s gfx11_asm_vop3_from_vopc.s, llvm/test/MC/Disassembler/AMDGPU gfx10_vop3c.txt gfx10_vop3.txt

Merge branch 'main' into users/chapuni/mcdc/nest/covgen
DeltaFile
+5,421-5,421llvm/test/MC/AMDGPU/gfx10_asm_vopc_e64.s
+5,392-5,392llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3c.txt
+3,733-3,733llvm/test/MC/Disassembler/AMDGPU/gfx10_vop3.txt
+2,919-2,919llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_from_vopc.txt
+2,802-2,802llvm/test/MC/AMDGPU/gfx11_asm_vop3_from_vopc.s
+2,645-2,645llvm/test/MC/AMDGPU/gfx11_asm_vop3_from_vopc-fake16.s
+22,912-22,912747 files not shown
+126,682-105,505753 files

LLVM/project a7f489bllvm/lib/Target/AMDGPU SIRegisterInfo.td GCNSubtarget.h, llvm/test/CodeGen/AMDGPU regalloc-vgpr_lo128-gfx1250.mir regalloc-vgpr_lo128-gfx1250-t16.mir

[AMDGPU] Limit allocation of lo128 registers for occupancy

The parent change allows allocation of lo128 VGPRs from all 4 banks.
That may result in an undesired allocation that leaves a hole of up
to 128 registers, for example when v0-v127 are allocated and
v128-v255 are free.

Limit the available allocation order to the occupancy. Both hard
occupancy limits and occupancy achieved during scheduling are
considered. It is better to spill a register than to drop occupancy
in this case.
DeltaFile
+97-1llvm/test/CodeGen/AMDGPU/regalloc-vgpr_lo128-gfx1250.mir
+97-1llvm/test/CodeGen/AMDGPU/regalloc-vgpr_lo128-gfx1250-t16.mir
+53-0llvm/test/CodeGen/AMDGPU/shrink-vgpr_lo128-gfx1250-t16.mir
+30-4llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+29-0llvm/test/CodeGen/AMDGPU/shrink-vgpr_lo128-gfx1250.mir
+8-0llvm/lib/Target/AMDGPU/GCNSubtarget.h
+314-61 files not shown
+321-67 files

LLVM/project 663647fcross-project-tests/dtlto multimodule.test, llvm/include/llvm/LTO LTO.h

[DTLTO] Fix handling of multi-module bitcode inputs (#174624)

This change fixes two issues when processing multi-module bitcode files
in DTLTO:

1. The DTLTO archive handling code incorrectly uses
getSingleBitcodeModule(), which asserts when the bitcode file contains
more than one module.
2. The temporary file containing the contents of an input archive member
was not emitted for multi-module bitcode files. This was due to
incorrect logic for recording whether a bitcode input contains any
ThinLTO modules. In a typical multi-module bitcode file, the first
module is a ThinLTO module while a subsequent auxiliary module is
non-ThinLTO. When modules are processed in order, the auxiliary module
causes the entire bitcode file to be classified as non-ThinLTO, and the
archive-member emission logic then incorrectly skips it.
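
The misclassification described in item 2 can be illustrated with a small sketch. This is not the DTLTO code: `ModuleInfo`, `lastModuleWins` and `anyModuleIsThinLTO` are hypothetical names, and the sketch assumes the bug amounted to a per-file flag being overwritten by each module in turn.

```cpp
#include <vector>

// Hypothetical stand-in for one module inside a bitcode file.
struct ModuleInfo {
  bool IsThinLTO;
};

// Assumed buggy pattern: every module overwrites the flag, so the
// last module processed decides the classification of the whole file.
bool lastModuleWins(const std::vector<ModuleInfo> &Mods) {
  bool HasThinLTO = false;
  for (const ModuleInfo &M : Mods)
    HasThinLTO = M.IsThinLTO;
  return HasThinLTO;
}

// Intended pattern: the flag stays set once any module is ThinLTO.
bool anyModuleIsThinLTO(const std::vector<ModuleInfo> &Mods) {
  bool HasThinLTO = false;
  for (const ModuleInfo &M : Mods)
    HasThinLTO |= M.IsThinLTO;
  return HasThinLTO;
}
```

For a file whose first module is ThinLTO and whose auxiliary module is not, the first function reports false while the second reports true.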

In addition, this patch adds a test that verifies that multi-module
bitcode files can be successfully linked with DTLTO. The test reproduces

    [2 lines not shown]
DeltaFile
+42-0cross-project-tests/dtlto/multimodule.test
+3-1llvm/lib/LTO/LTO.cpp
+2-0llvm/include/llvm/LTO/LTO.h
+1-1llvm/lib/DTLTO/DTLTO.cpp
+48-24 files

LLVM/project d7b6df7clang/lib/CIR/CodeGen CIRGenBuiltinX86.cpp, clang/test/CIR/CodeGenBuiltins/X86 avx512vldq-builtins.c avx10_2bf16-builtins.c

[CIR][X86] Add CIR codegen support for fpclass x86 builtins (#172813)

This implements the handling for x86-specific fpclass builtin functions.
DeltaFile
+173-0clang/test/CIR/CodeGenBuiltins/X86/avx512vldq-builtins.c
+72-0clang/test/CIR/CodeGenBuiltins/X86/avx10_2bf16-builtins.c
+72-0clang/test/CIR/CodeGenBuiltins/X86/avx512dq-builtins.c
+60-4clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+37-0clang/test/CIR/CodeGenBuiltins/X86/avx512fp16-builtins.c
+36-0clang/test/CIR/CodeGenBuiltins/X86/avx10_2_512bf16-builtins.c
+450-41 files not shown
+451-47 files

LLVM/project a11feefclang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp, clang/test/CIR/IR vector.cir throw.cir

[CIR] Make cir.alloca alignment mandatory (#172663)

Fixed a crash in `CIRToLLVMAllocaOpLowering` where `cir.alloca`
operations without an explicit alignment attribute caused failures.

Modified the ODS definition of `cir.alloca` to use
`ConfinedAttr<I64Attr, [IntMinValue<0>]>`. This ensures the attribute is
always present.

Added a regression test in `clang/test/CIR/Lowering/alloca.cir`.

---------

Co-authored-by: Sirui Mu <msrlancern at gmail.com>
DeltaFile
+17-17clang/test/CIR/IR/vector.cir
+13-0clang/test/CIR/Lowering/alloca.cir
+3-3clang/test/CIR/IR/throw.cir
+3-3clang/test/CIR/IR/invalid-complex.cir
+2-2clang/test/CIR/Transforms/vector-extract-fold.cir
+2-2clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+40-2710 files not shown
+53-4016 files

LLVM/project 779c05aclang/lib/CodeGen CodeGenPGO.cpp CodeGenPGO.h, clang/test/Profile c-mcdc-logicalop-ternary.c

[MC/DC] Create dedicated MCDCCondBitmapAddr for each Decision (#125411)

MCDCCondBitmapAddr is moved from `CodeGenFunction` into `MCDCState` and
created for each Decision.

In `maybeCreateMCDCCondBitmap`, allocate bitmaps for all valid Decisions
and emit them ordered by ID to prevent nondeterminism.
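
A minimal sketch of why ordering by ID helps, assuming the Decisions live in a container keyed by pointer whose iteration order is not stable. `Decision` and `emissionOrder` are hypothetical names, not the types used in `CodeGenPGO.cpp`.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical record for one MC/DC Decision.
struct Decision {
  unsigned ID;
  bool Valid;
};

// Collect the IDs of valid Decisions and sort them, so the emission
// order does not depend on the iteration order of the pointer-keyed map.
std::vector<unsigned>
emissionOrder(const std::unordered_map<const void *, Decision> &Decisions) {
  std::vector<unsigned> IDs;
  for (const auto &KV : Decisions)
    if (KV.second.Valid)
      IDs.push_back(KV.second.ID);
  std::sort(IDs.begin(), IDs.end());
  return IDs;
}
```

Whatever order the map yields its entries in, the returned list is the same for the same set of valid Decisions.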
DeltaFile
+44-10clang/lib/CodeGen/CodeGenPGO.cpp
+10-8clang/test/Profile/c-mcdc-logicalop-ternary.c
+3-5clang/lib/CodeGen/CodeGenPGO.h
+0-3clang/lib/CodeGen/CodeGenFunction.h
+2-0clang/lib/CodeGen/MCDCState.h
+59-265 files

LLVM/project 01f7057utils/bazel MODULE.bazel.lock, utils/bazel/llvm-project-overlay/clang BUILD.bazel

Revert "Fix bazel build for d5442b8 (#176034)"

This reverts commit 43f1edf0cfcbcce7c928e0e27221a5de1fb797ba.

Fixed already by 44b691a1e9e1201034120d71de8bc5b9b3c044e6.
DeltaFile
+1-18utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+2-3utils/bazel/MODULE.bazel.lock
+3-212 files

LLVM/project a809862llvm/lib/Analysis ValueTracking.cpp, llvm/lib/IR Instruction.cpp Operator.cpp

[IR] Teach `drop/hasPoisonGeneratingAnnotations()` about `abs`, `ctlz` and `cttz` (#175941)

DeltaFile
+9-9llvm/test/Transforms/InstCombine/freeze-integer-intrinsics.ll
+13-0llvm/lib/IR/Instruction.cpp
+11-0llvm/lib/IR/Operator.cpp
+2-3llvm/lib/Analysis/ValueTracking.cpp
+0-5llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+35-175 files

LLVM/project 43f1edfutils/bazel MODULE.bazel.lock, utils/bazel/llvm-project-overlay/clang BUILD.bazel

Fix bazel build for d5442b8 (#176034)

Bazel equivalent of the CMakeLists changes.
DeltaFile
+18-1utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+3-2utils/bazel/MODULE.bazel.lock
+21-32 files

LLVM/project 882560eclang/lib/Sema OpenCLBuiltins.td, clang/test/SemaOpenCL fdeclare-opencl-builtins.cl

[OpenCL] Add missing mipmap read_write image builtins to OpenCLBuiltins.td (#175748)

This issue was discovered while writing tests for #175120.
DeltaFile
+57-0clang/test/SemaOpenCL/fdeclare-opencl-builtins.cl
+2-2clang/lib/Sema/OpenCLBuiltins.td
+59-22 files

LLVM/project eaa7516llvm/lib/Target/X86 X86ISelLowering.cpp X86InstrSSE.td, llvm/test/CodeGen/X86 clmul.ll clmul-x86.ll

[X86] Lower scalar llvm.clmul intrinsics to PCLMULQDQ (#175189) (#175216)

Add support for lowering scalar llvm.clmul intrinsics (i8/i16/i32/i64)
to the PCLMULQDQ hardware instruction on X86 targets with the PCLMUL
feature, instead of using the default software expansion.

The lowering:

- Extends smaller types to the target's native width (i64 on x86-64, i32
on i686)
- Uses SCALAR_TO_VECTOR to create vectors (v2i64 on x86-64, v4i32 with
bitcast to v2i64 on i686)
- Performs X86ISD::PCLMULQDQ with immediate 0x00
- Extracts the result and truncates back to the original type

i8/i16/i32 CLMUL is enabled on both 32-bit and 64-bit targets. i64
CLMUL/CLMULH is only enabled on 64-bit targets.
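
The extend, multiply, truncate sequence can be checked against a software model. The sketch below is a reference implementation for illustration only, not the lowering code; it assumes the scalar intrinsic yields the low bits of the carry-less product, which is what the truncation step implies. The function names are hypothetical.

```cpp
#include <cstdint>

// Software carry-less multiply: XOR together shifted copies of A,
// one per set bit of B, keeping only the low 64 bits of the product.
uint64_t clmul64(uint64_t A, uint64_t B) {
  uint64_t R = 0;
  for (int I = 0; I < 64; ++I)
    if ((B >> I) & 1)
      R ^= A << I;
  return R;
}

// i8 result computed the way the lowering is described: zero-extend
// to the wide type, multiply there, then truncate.
uint8_t clmul8ViaWide(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(clmul64(A, B));
}

// i8 result computed directly at 8-bit width, for comparison.
uint8_t clmul8Direct(uint8_t A, uint8_t B) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I)
    if ((B >> I) & 1)
      R ^= static_cast<uint8_t>(A << I);
  return R;
}
```

Both 8-bit functions agree because bit k of a carry-less product depends only on bits 0..k of the inputs, so widening the operands does not change the low bits.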

Also adds ISD::CLMULH i64 support by extracting the upper element from

    [2 lines not shown]
DeltaFile
+215-0llvm/test/CodeGen/X86/clmul.ll
+58-0llvm/lib/Target/X86/X86ISelLowering.cpp
+11-13llvm/lib/Target/X86/X86InstrSSE.td
+18-0llvm/test/CodeGen/X86/clmul-x86.ll
+4-4llvm/lib/Target/X86/X86InstrAVX512.td
+3-0llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
+309-172 files not shown
+315-178 files

LLVM/project fb0881fmlir/lib/Dialect/SCF/Transforms TileUsingInterface.cpp, mlir/lib/Dialect/Tensor/Transforms SwapExtractSliceWithProducerPatterns.cpp

[mlir][Tensor] Add rank-reducing slice in generatedSlices (#174248)

When `replaceExtractSliceWithTiledProducer` creates a rank-reducing
slice to handle type mismatches, it should be tracked in
`generatedSlices` so downstream cleanup patterns (like IREE's
FoldExtractSliceOfBroadcast) can process it.
 
This PR also fixes an infinite loop in getUntiledProducerFromSliceSource
where adding the slice to generatedSlices caused the fusion worklist to
repeatedly try to re-fuse producers already inside the innermost loop;
the fix skips producers that are already inside the innermost loop via
an isProperAncestor check.
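
The skip can be pictured with a toy model. The sketch below does not use the MLIR API: `Node`, the free function `isProperAncestor` and `producersToFuse` are hypothetical stand-ins that only mirror the idea of dropping candidates already nested inside the innermost loop.

```cpp
#include <vector>

// Toy IR node with a link to its enclosing node.
struct Node {
  const Node *Parent = nullptr;
};

// True if A strictly encloses B, i.e. A is on B's parent chain
// and A is not B itself.
bool isProperAncestor(const Node *A, const Node *B) {
  for (const Node *P = B->Parent; P; P = P->Parent)
    if (P == A)
      return true;
  return false;
}

// Keep only candidates that are not already nested inside the
// innermost loop, so the worklist never tries to re-fuse them.
std::vector<const Node *>
producersToFuse(const std::vector<const Node *> &Candidates,
                const Node *InnermostLoop) {
  std::vector<const Node *> Out;
  for (const Node *C : Candidates)
    if (!isProperAncestor(InnermostLoop, C))
      Out.push_back(C);
  return Out;
}
```

A producer that was already fused into the loop is filtered out, so the worklist shrinks on every round and the loop terminates.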

Added a lit test (@fuse_through_rank_reducing_slice) demonstrating
correct fusion through rank-reducing slices. Note that demonstrating the
generatedSlices tracking benefit requires a cleanup pattern
(SwapExtractSliceWithFillPatterns) to consume the slice; IREE's full CI
suite (iree-org/iree#23012) validates this works correctly in practice
with patterns like FoldExtractSliceOfBroadcast.

    [3 lines not shown]
DeltaFile
+60-0mlir/test/Interfaces/TilingInterface/tile-and-fuse-using-interface.mlir
+12-1mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+4-1mlir/test/Interfaces/TilingInterface/tile-and-fuse-with-reduction-tiling.mlir
+1-0mlir/lib/Dialect/Tensor/Transforms/SwapExtractSliceWithProducerPatterns.cpp
+77-24 files

LLVM/project a0b71b0lldb/source/Plugins/Language/CPlusPlus MsvcStlVariant.cpp, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant TestDataFormatterStdVariant.py main.cpp

Revert "[LLDB] Fix MS STL `variant` with non-trivial types" (#176059)

Reverts llvm/llvm-project#171489 because it causes
`TestDataFormatterStdVariant.py` to fail on Darwin.

Affected bots:

- https://ci.swift.org/view/all/job/llvm.org/view/LLDB/job/as-lldb-cmake/
- https://ci.swift.org/view/all/job/llvm.org/view/LLDB/job/lldb-cmake/
DeltaFile
+0-21lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/TestDataFormatterStdVariant.py
+5-11lldb/source/Plugins/Language/CPlusPlus/MsvcStlVariant.cpp
+0-5lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/main.cpp
+5-373 files

LLVM/project 21b3642lldb/source/Plugins/SymbolFile/NativePDB PdbAstBuilderClang.cpp PdbAstBuilder.cpp, lldb/source/Plugins/TypeSystem/Clang TypeSystemClang.h

[LLDB][NativePDB] Introduce PdbAstBuilderClang (#175840)

This changes `PdbAstBuilder` to a language-neutral abstract interface
and moves all of its functionality to the `PdbAstBuilderClang` derived
class.

All Clang-specific methods with external callers are now public methods
on `PdbAstBuilderClang`. `TypeSystemClang` and `UdtRecordCompleter` use
`PdbAstBuilderClang` directly.

Did my best to clean up includes and unused methods.

RFC for context:

https://discourse.llvm.org/t/rfc-lldb-make-pdbastbuilder-language-agnostic/89117
DeltaFile
+1,547-0lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilderClang.cpp
+0-1,544lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp
+182-0lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilderClang.h
+23-148lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.h
+7-1lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h
+3-4lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.h
+1,762-1,6975 files not shown
+1,770-1,70711 files

LLVM/project 0f85aa1lldb/source/Plugins/Language/CPlusPlus MsvcStlVariant.cpp, lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant TestDataFormatterStdVariant.py main.cpp

Revert "[LLDB] Fix MS STL `variant` with non-trivial types (#171489)"

This reverts commit 9a632fd684e1729b93f9f5272ad6b5798f38ba77.
DeltaFile
+0-21lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/TestDataFormatterStdVariant.py
+5-11lldb/source/Plugins/Language/CPlusPlus/MsvcStlVariant.cpp
+0-5lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/variant/main.cpp
+5-373 files

LLVM/project 1463fballvm/test/CodeGen/AMDGPU local-stack-alloc-add-references.gfx8.mir coalesce-copy-to-agpr-to-av-registers.mir, llvm/test/MC/Disassembler/AMDGPU gfx12_dasm_vop1_dpp8.txt

[AMDGPU] Allow allocation of lo128 registers from all banks

We can encode 16-bit operands in a short form for VGPRs [0..127].
When we have 1K registers available we can in fact allocate 4
times more from all 4 banks. That, however, requires an allocatable
class for these operands. While for most of the instructions it will
result in the longer VOP3 form, for V_FMAAMK/FMADAK_F16 it will
simply prohibit the encoding because these do not have VOP3 forms.

A straightforward solution would be to create a register class
with all registers having bit 8 of the encoding zero, i.e. to
create a register class with holes punched in it: [0-127, 256-383,
512-639, 768-895]. LLVM, however, does not like register classes
with punched holes when they also have subregisters. The
cross-product of all classes explodes and some combinations of a
'class having a common subreg with another' become impossible. Just
doing so explodes our register info to 4+ GB, which is uncompilable too.
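
The punched ranges can be reproduced with a few lines of arithmetic. This sketch assumes "bit 8" counts from one, i.e. the bit with value 128 in the register index, since that reading yields exactly the ranges listed above; `lowHalfRanges` is a hypothetical helper, not part of the backend.

```cpp
#include <utility>
#include <vector>

// Collect maximal runs of register indices in [0, NumRegs) whose
// bit with value 128 is clear. Each pair is an inclusive [first, last].
std::vector<std::pair<unsigned, unsigned>> lowHalfRanges(unsigned NumRegs) {
  std::vector<std::pair<unsigned, unsigned>> Ranges;
  for (unsigned R = 0; R < NumRegs; ++R) {
    if (R & 0x80u)
      continue; // Upper half of a 256-register bank: not encodable.
    if (!Ranges.empty() && Ranges.back().second + 1 == R)
      Ranges.back().second = R; // Extend the current run.
    else
      Ranges.push_back({R, R}); // Start a new run.
  }
  return Ranges;
}
```

Called with 1024, this yields the four runs 0-127, 256-383, 512-639 and 768-895, for 512 registers in total.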

The solution proposed is to define _lo128 RC with contiguous 896

    [17 lines not shown]
DeltaFile
+180-180llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx8.mir
+118-118llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+90-90llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx9.mir
+177-0llvm/test/CodeGen/AMDGPU/shrink-vgpr_lo128-gfx1250-t16.mir
+49-46llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop1_dpp8.txt
+94-0llvm/test/CodeGen/AMDGPU/regalloc-vgpr_lo128-gfx1250-t16.mir
+708-43440 files not shown
+1,340-73446 files

LLVM/project f1821a5clang/include/clang/Options Options.td, llvm/docs MemProf.rst

[docs][MemProf]Update compiler options for static data partitioning (#175872)

https://github.com/llvm/llvm-project/pull/124991 introduces a Clang
option for static data partitioning. Update the LLVM option with the
Clang option and some notes on how data hotness is inferred from
profiles.
DeltaFile
+16-5llvm/docs/MemProf.rst
+1-1clang/include/clang/Options/Options.td
+17-62 files

LLVM/project 782bf6allvm/lib/Target/X86 X86ISelLoweringCall.cpp X86ISelLowering.h, llvm/test/CodeGen/X86 musttail-struct.ll musttail-tailcc.ll

x86: fix musttail sibcall miscompilation (#168956)

fixes https://github.com/llvm/llvm-project/issues/56891
fixes https://github.com/llvm/llvm-project/issues/72390
fixes https://github.com/llvm/llvm-project/issues/147813

Currently the x86 backend miscompiles straightforward tail calls when
the stack is used for argument passing. This program segfaults at any
optimization level:

https://godbolt.org/z/5xr99jr4v

```c
typedef struct {
    uint64_t x;
    uint64_t y;
    uint64_t z;
} S;


    [41 lines not shown]
DeltaFile
+320-0llvm/test/CodeGen/X86/musttail-struct.ll
+154-34llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+0-18llvm/test/CodeGen/X86/musttail-tailcc.ll
+16-0llvm/lib/Target/X86/X86ISelLowering.h
+7-2llvm/test/CodeGen/X86/sibcall.ll
+2-4llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
+499-583 files not shown
+502-649 files

LLVM/project 44b691autils/bazel/llvm-project-overlay/clang BUILD.bazel

[bazel] Update for #175873 (BuiltinsAMDGPU)
DeltaFile
+9-0utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+9-01 files

LLVM/project 8ac6c4aclang/test/Frontend rewrite-includes-bom.c

[Clang] Fix rewrite-includes-bom.c to use POSIX-compliant regex (#176043)

As `\s` is a GNU extension, it is not supported by the system grep on
AIX, so the test fails on the
[buildbot](https://lab.llvm.org/buildbot/#/builders/64/builds/6835):

```
******************** TEST 'Clang :: Frontend/rewrite-includes-bom.c' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
# RUN: at line 1
cat /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/clang/test/Frontend/Inputs/rewrite-includes-bom.h | od -t x1 | grep -q 'ef\s*bb\s*bf'
# executed command: cat /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/clang/test/Frontend/Inputs/rewrite-includes-bom.h
# executed command: od -t x1
# executed command: grep -q 'ef\s*bb\s*bf'
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1
--

    [6 lines not shown]
DeltaFile
+2-2clang/test/Frontend/rewrite-includes-bom.c
+2-21 files