LLVM/project 48eb697 llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/X86 replicating-load-store-costs.ll

[LV] Count cost of middle block if TC <= VF. (#168949)

If the expected trip count is less than or equal to the VF, the vector
loop will execute only a single iteration. When that's the case, the cost of the
middle block has the same impact as the cost of the vector loop. Include
it in isOutsideLoopWorkProfitable to avoid vectorizing when the extra
work in the middle block makes it unprofitable.

Note that isOutsideLoopWorkProfitable already scales the cost of blocks
outside the vector region, but the patch restricts accounting for the
middle block to cases where ExpectedTC <= VF, to initially catch some of the
worst cases and avoid regressions.
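
The trade-off can be illustrated with a toy cost model. This is a minimal sketch, not LoopVectorize's cost model: `isVectorizationProfitable` and its parameters are hypothetical, the vector iteration count is rounded up (as if the tail were folded), and the scaling that isOutsideLoopWorkProfitable applies is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy cost model; not LLVM's LoopVectorize implementation.
// Returns true if vectorizing with the given VF is expected to be cheaper
// than running the scalar loop for ExpectedTC iterations.
bool isVectorizationProfitable(uint64_t ScalarIterCost,
                               uint64_t VectorIterCost,
                               uint64_t MiddleBlockCost, unsigned VF,
                               unsigned ExpectedTC) {
  assert(VF > 0 && ExpectedTC > 0 && "VF and trip count must be positive");
  uint64_t ScalarCost = ScalarIterCost * ExpectedTC;
  // Round up: assumes the tail is folded into the vector loop.
  uint64_t VectorIters = (uint64_t(ExpectedTC) + VF - 1) / VF;
  uint64_t VectorCost = VectorIterCost * VectorIters;
  // With TC <= VF the vector loop runs once, so the middle block weighs as
  // much as the loop body; only in that case is its cost added.
  if (ExpectedTC <= VF)
    VectorCost += MiddleBlockCost;
  return VectorCost < ScalarCost;
}
```

For example, with a scalar iteration cost of 4, a vector iteration cost of 6, a middle-block cost of 10, VF = 8 and an expected trip count of 3, the scalar loop costs 12 while one vector iteration plus the middle block costs 16, so the sketch rejects vectorization; ignoring the middle block (cost 6) would have accepted it.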

This initial version should specifically avoid unprofitable tail-folding
for loops with low trip counts after re-applying
https://github.com/llvm/llvm-project/pull/149042.

PR: https://github.com/llvm/llvm-project/pull/168949
DeltaFile
+17 -34 llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+14 -3 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+31 -37 (2 files)

LLVM/project e92bb83 llvm/lib/Target/AArch64 AArch64AsmPrinter.cpp

[AArch64][PAC] Simplify emission of authenticated pointer check (NFC) (#160899)

The `AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue` method accepts
two arguments, `bool ShouldTrap` and `const MCSymbol *OnFailure`, that
control the behavior of the emitted instruction sequence when the check
fails:
* `ShouldTrap` requests an error to be generated
* `OnFailure` requests branching to the given label after clearing the
  PAC field

An assertion in `emitPtrauthCheckAuthenticatedValue` ensures that when
`ShouldTrap` is true, `OnFailure` must be null. But the opposite holds
as well: when `ShouldTrap` is false, `OnFailure` is always non-null,
as otherwise the entire sequence following the `AUT[ID][AB]` instruction
would turn into a very expensive equivalent of XPAC (unless the CPU
implements FEAT_FPAC):

    authenticate Xn
    inspect PAC field of Xn

    [12 lines not shown]
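
Because exactly one of the two parameters is ever meaningful, the pointer alone can encode the choice. The sketch below illustrates that invariant with a hypothetical `Label` stand-in for `MCSymbol` and hypothetical `describeCheck*` helpers; it is not the code the patch emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::MCSymbol.
struct Label {
  std::string Name;
};

// Two-parameter form: callers must keep ShouldTrap and OnFailure consistent.
std::string describeCheckOld(bool ShouldTrap, const Label *OnFailure) {
  assert(ShouldTrap != (OnFailure != nullptr) &&
         "exactly one of ShouldTrap and OnFailure must be set");
  if (ShouldTrap)
    return "trap";
  return "clear PAC field, branch to " + OnFailure->Name;
}

// One-parameter form: a null OnFailure means "trap", so the inconsistent
// combinations can no longer be expressed by the caller.
std::string describeCheck(const Label *OnFailure) {
  if (!OnFailure)
    return "trap";
  return "clear PAC field, branch to " + OnFailure->Name;
}
```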
DeltaFile
+24 -30 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+24 -30 (1 file)

LLVM/project 0549aa1 llvm/include/llvm/DWARFLinker/Classic DWARFLinkerDeclContext.h DWARFLinker.h, llvm/lib/DWARFLinker/Classic DWARFLinker.cpp DWARFLinkerDeclContext.cpp

[llvm][dsymutil] Use the DW_AT_name of the uniqued DIE for insertion into .debug_names (#168513)

Depends on:
* https://github.com/llvm/llvm-project/pull/168895

Note: the last commit contains the actual fix. The others are
drive-by/test changes.

We've been seeing dsymutil verification failures like:
```
error: Name Index @ 0x0: Entry @ 0x11949d: mismatched Name of DIE @ 0x9c644c:
index - apply<(lambda at /some/build/dir/lib/LLVMSupport/include/llvm/Support/Error.h:1070:35)>;
debug_info - apply<(lambda at /some/build/dir/lib/LLVMCustom/include/llvm/Support/Error.h:1070:35)>
apply, _ZN11custom_llvm18ErrorHandlerTraitsIRFvRNS_13ErrorInfoBaseEEE5applyIZNS_12consumeErrorENS_5ErrorEEUlRKS1_E_EES7_OT_NSt3__110unique_ptrIS1_NSD_14default_deleteIS1_EEEE.
```
Note how the name of the DIE has a different lambda path than the one
that was used to insert the DIE into .debug_names.

The root cause of the issue is that we have a DW_TAG_subprogram

    [31 lines not shown]
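
The idea in the title, taking the index name from the uniqued DIE rather than from the DIE currently being processed, can be sketched with a toy model. `ToyDIE`, its `Canonical` field, and `nameForIndex` are hypothetical, and because the root-cause details are truncated here, this is not DWARFLinker's actual logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy DIE; Canonical models the DIE kept after ODR uniquing.
struct ToyDIE {
  std::string Name;                  // this DIE's DW_AT_name
  const ToyDIE *Canonical = nullptr; // uniqued DIE emitted in its place, if any
};

// Name to insert into the accelerator table. Using the name of the DIE that
// actually ends up in .debug_info keeps the index consistent with it.
const std::string &nameForIndex(const ToyDIE &D) {
  const ToyDIE &Emitted = D.Canonical ? *D.Canonical : D;
  return Emitted.Name;
}
```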
DeltaFile
+69 -7 llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
+18 -14 llvm/lib/DWARFLinker/Classic/DWARFLinkerDeclContext.cpp
+28 -0 llvm/test/tools/dsymutil/AArch64/odr-uniquing-DW_AT_name-conflict.test
+9 -5 llvm/include/llvm/DWARFLinker/Classic/DWARFLinkerDeclContext.h
+5 -1 llvm/include/llvm/DWARFLinker/Classic/DWARFLinker.h
+6 -0 llvm/test/tools/dsymutil/Inputs/odr-uniquing-DW_AT_name-conflict/main.cpp
+135 -27 (8 files not shown)
+155 -30 (14 files)

LLVM/project 658675f clang/lib/CIR/CodeGen CIRGenDeclOpenACC.cpp CIRGenOpenACCClause.cpp, clang/test/CIR/CodeGenOpenACC declare-deviceresident.cpp

[OpenACC][CIR] 'device_resident' clause lowering for local declare (#169389)

Just like the last handful of clauses, this is a pretty simple one,
doing device_resident (entry op: declare_device_resident; exit op:
delete). This should be the last of the 'local' declare patches.
DeltaFile
+199 -0 clang/test/CIR/CodeGenOpenACC/declare-deviceresident.cpp
+8 -8 clang/lib/CIR/CodeGen/CIRGenDeclOpenACC.cpp
+12 -0 clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp
+219 -8 (3 files)

LLVM/project 4a0d485 clang-tools-extra/clang-doc/assets class-template.mustache, clang-tools-extra/test/clang-doc namespace.cpp

[clang-doc] Add definition information to class templates (#169109)

DeltaFile
+4 -5 clang-tools-extra/test/clang-doc/namespace.cpp
+1 -0 clang-tools-extra/clang-doc/assets/class-template.mustache
+5 -5 (2 files)

LLVM/project f5e228b llvm/lib/Target/DirectX DXILDataScalarization.cpp, llvm/test/CodeGen/DirectX bugfix_150050_data_scalarize_const_gep.ll scalarize-alloca.ll

[DirectX] Simplify DXIL data scalarization, and data scalarize whole GEP chains (#168096)

- The DXIL data scalarizer only needs to change vectors into arrays. It
does not need to change the types of GEPs to match the pointer type.
This PR simplifies the `visitGetElementPtrInst` method to do just that
while also accounting for nested GEPs from ConstantExprs. (Before this
PR, there were still vector types lingering in nested GEPs with
ConstantExprs.)
- The `equivalentArrayTypeFromVector` function was awkwardly placed near
the top of the file and away from the other helper functions. The
function is now moved next to the other helper functions.
- Removed an unnecessary `||` condition from `isVectorOrArrayOfVectors`.

Related tests have also been cleaned up, and the test CHECKs have been
modified to account for the new simplified behavior.
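
The vector-to-array rewrite can be sketched on a toy type representation: a vector `<N x T>` becomes `[N x T]`, and an array of vectors is rebuilt around its converted element type. `ToyType`, `containsVectors`, and `scalarizedType` are hypothetical and only play roughly the roles of `llvm::Type`, `isVectorOrArrayOfVectors`, and `equivalentArrayTypeFromVector`; they are not the pass's code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy type representation; not llvm::Type.
struct ToyType {
  enum Kind { Scalar, Vector, Array } K;
  unsigned NumElts;                   // element count (Vector/Array)
  std::shared_ptr<const ToyType> Elt; // element type (Vector/Array)
  std::string Name;                   // scalar name, e.g. "float"
};
using TypePtr = std::shared_ptr<const ToyType>;

TypePtr scalar(std::string Name) {
  return std::make_shared<ToyType>(ToyType{ToyType::Scalar, 0, nullptr, std::move(Name)});
}
TypePtr vec(unsigned N, TypePtr Elt) {
  return std::make_shared<ToyType>(ToyType{ToyType::Vector, N, std::move(Elt), ""});
}
TypePtr arr(unsigned N, TypePtr Elt) {
  return std::make_shared<ToyType>(ToyType{ToyType::Array, N, std::move(Elt), ""});
}

// True for a vector, or an array (possibly nested) whose innermost element is a vector.
bool containsVectors(const TypePtr &T) {
  if (T->K == ToyType::Vector)
    return true;
  if (T->K == ToyType::Array)
    return containsVectors(T->Elt);
  return false;
}

// Rewrites <N x T> as [N x T]; arrays are rebuilt around converted elements.
TypePtr scalarizedType(const TypePtr &T) {
  switch (T->K) {
  case ToyType::Vector:
    return arr(T->NumElts, T->Elt);
  case ToyType::Array:
    return arr(T->NumElts, scalarizedType(T->Elt));
  case ToyType::Scalar:
    break;
  }
  return T;
}

// Prints the type in LLVM IR-like syntax for inspection.
std::string print(const TypePtr &T) {
  switch (T->K) {
  case ToyType::Vector:
    return "<" + std::to_string(T->NumElts) + " x " + print(T->Elt) + ">";
  case ToyType::Array:
    return "[" + std::to_string(T->NumElts) + " x " + print(T->Elt) + "]";
  case ToyType::Scalar:
    break;
  }
  return T->Name;
}
```

With this model, `[2 x <4 x float>]` prints as `[2 x [4 x float]]` after conversion, and the result no longer contains any vector type.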
DeltaFile
+47 -82 llvm/lib/Target/DirectX/DXILDataScalarization.cpp
+21 -14 llvm/test/CodeGen/DirectX/bugfix_150050_data_scalarize_const_gep.ll
+3 -3 llvm/test/CodeGen/DirectX/scalarize-alloca.ll
+3 -3 llvm/test/CodeGen/DirectX/scalarize-global.ll
+74 -102 (4 files)

LLVM/project f7f6b44 clang-tools-extra/clang-doc/assets class-template.mustache, clang-tools-extra/test/clang-doc namespace.cpp

[clang-doc] Add definition information to class templates
DeltaFile
+4 -5 clang-tools-extra/test/clang-doc/namespace.cpp
+1 -0 clang-tools-extra/clang-doc/assets/class-template.mustache
+5 -5 (2 files)

LLVM/project 4459564 clang/docs ReleaseNotes.rst, clang/lib/Sema SemaDecl.cpp

[clang][Sema] Handle target_clones redeclarations that omit the attribute (#169259)

This patch adds a case to `CheckMultiVersionAdditionalDecl()` that
detects redeclarations of `target_clones` functions which omit the
attribute, and makes sure they are marked as redeclarations. It also
updates the comment at the call site of
`CheckMultiVersionAdditionalDecl()` to reflect this.

Previously, `target_clones` multiversioned functions that omitted the
attribute from subsequent declarations would cause Clang to hit an
`llvm_unreachable` and crash. In the following example, the second
declaration (the function definition) should inherit the `target_clones`
attribute from the first declaration (the forward declaration):

```
__attribute__((target_clones("arch=atom", "default")))
void foo(void);

void foo(void) { /* ... */ }
```

    [14 lines not shown]
DeltaFile
+29 -0 clang/test/CodeGen/attr-target-clones.c
+13 -2 clang/lib/Sema/SemaDecl.cpp
+13 -0 clang/test/Sema/attr-target-clones.c
+2 -0 clang/docs/ReleaseNotes.rst
+57 -2 (4 files)

LLVM/project 40fb2ca llvm/utils/gn/secondary/llvm/lib/Target/RISCV BUILD.gn

[gn build] Port 645e0dcbff33
DeltaFile
+1 -0 llvm/utils/gn/secondary/llvm/lib/Target/RISCV/BUILD.gn
+1 -0 (1 file)

LLVM/project 0e86510 llvm/utils/gn/secondary/clang/lib/Driver BUILD.gn, llvm/utils/gn/secondary/clang/lib/Frontend BUILD.gn

[gn build] Port 3773bbe9e791
DeltaFile
+2 -0 llvm/utils/gn/secondary/clang/lib/Driver/BUILD.gn
+1 -1 llvm/utils/gn/secondary/clang/lib/Frontend/BUILD.gn
+3 -1 (2 files)

LLVM/project d4cd331 llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn build] Port 2bdd1357c826
DeltaFile
+0 -1 llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+0 -1 (1 file)

LLVM/project 740d0bd mlir/include/mlir-c/Dialect LLVM.h, mlir/lib/Bindings/Python DialectLLVM.cpp

[MLIR][Python] add GetTypeID for llvm.struct_type and llvm.ptr and enable downcasting (#169383)

DeltaFile
+8 -0 mlir/lib/CAPI/Dialect/LLVM.cpp
+4 -3 mlir/lib/Bindings/Python/DialectLLVM.cpp
+6 -0 mlir/test/python/dialects/llvm.py
+4 -0 mlir/include/mlir-c/Dialect/LLVM.h
+22 -3 (4 files)

LLVM/project 5242cfc llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU expand-waitcnt-profiling.ll

address review
DeltaFile
+130 -184 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+19 -11 llvm/test/CodeGen/AMDGPU/expand-waitcnt-profiling.ll
+149 -195 (2 files)

LLVM/project 34b1dd9 clang-tools-extra/clang-doc/assets class-template.mustache, clang-tools-extra/test/clang-doc namespace.cpp

[clang-doc] Add definition information to class templates
DeltaFile
+4 -5 clang-tools-extra/test/clang-doc/namespace.cpp
+1 -0 clang-tools-extra/clang-doc/assets/class-template.mustache
+5 -5 (2 files)

LLVM/project 1b65752 clang/lib/CIR/CodeGen CIRGenDeclOpenACC.cpp CIRGenOpenACCClause.cpp, clang/test/CIR/CodeGenOpenACC declare-present.cpp

[OpenACC][CIR] Implement 'present' lowering on local-declare (#169381)

Just like the last handful of patches that did copy, copyin, copyout,
create, etc., this patch has the exact same behavior, except that the
entry op is present and the exit op is delete.
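
The entry/exit pairing for the two local-declare clauses whose ops are spelled out in these commits (device_resident and present) can be summarized as a lookup. `localDeclareOps` is a hypothetical helper for illustration, not the CIR lowering code; clauses not described here are simply not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical lookup: the (entry op, exit op) pair used when lowering a
// local `declare` clause, for the two clauses covered here. Other clauses
// are not modeled and yield std::nullopt.
std::optional<std::pair<std::string, std::string>>
localDeclareOps(std::string_view Clause) {
  using Ops = std::pair<std::string, std::string>;
  if (Clause == "device_resident")
    return Ops{"declare_device_resident", "delete"};
  if (Clause == "present")
    return Ops{"present", "delete"};
  return std::nullopt;
}
```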
DeltaFile
+199 -0 clang/test/CIR/CodeGenOpenACC/declare-present.cpp
+6 -4 clang/lib/CIR/CodeGen/CIRGenDeclOpenACC.cpp
+7 -3 clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp
+212 -7 (3 files)

LLVM/project 1c92344 clang/lib/Sema SemaDeclCXX.cpp

[clang][NFC] Don't copy into a vector just to iterate in `IsInitListMemberExprInitialized`.
DeltaFile
+11 -11 clang/lib/Sema/SemaDeclCXX.cpp
+11 -11 (1 file)

LLVM/project 0a0e570 clang-tools-extra/clang-doc/assets class-template.mustache, clang-tools-extra/test/clang-doc namespace.cpp

[clang-doc] Add definition information to class templates
DeltaFile
+4 -5 clang-tools-extra/test/clang-doc/namespace.cpp
+1 -0 clang-tools-extra/clang-doc/assets/class-template.mustache
+5 -5 (2 files)

LLVM/project 8617eef llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/tools/llvm-dwarfdump/X86 simplified-template-names.s

Merge branch 'main' into fix-zero-estimated-trip-count
DeltaFile
+41,820 -45,029 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+11,644 -10,635 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+5,981 -8,885 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll
+5,981 -8,885 llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll
+7,387 -7,087 llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s
+3,868 -6,624 llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v3i64.ll
+76,681 -87,145 (6,545 files not shown)
+410,734 -318,051 (6,551 files)

LLVM/project a27bb38 llvm/tools/bugpoint ExecutionDriver.cpp BugDriver.h

Reapply "[NFC][bugpoint] Namespace cleanup in `bugpoint`" (#168961) (#169055)

This reverts commit b83e458fe5330227581e1e65f3866ddfcd597837.

Also undo the use of namespace qualifier for `ReducePassList` as that
seems to cause build failures.
DeltaFile
+6 -9 llvm/tools/bugpoint/ExecutionDriver.cpp
+5 -0 llvm/tools/bugpoint/BugDriver.h
+0 -5 llvm/tools/bugpoint/Miscompilation.cpp
+0 -4 llvm/tools/bugpoint/OptimizerDriver.cpp
+0 -3 llvm/tools/bugpoint/ExtractFunction.cpp
+11 -21 (5 files)

LLVM/project 314c97a llvm/include/llvm/CodeGen MachineFunction.h, llvm/lib/CodeGen MachineFunction.cpp

[AMDGPU][MC] Replace shifted registers in CFI instructions
DeltaFile
+67 -67 llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir
+33 -0 llvm/lib/MC/MCDwarf.cpp
+15 -15 llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll
+10 -0 llvm/lib/CodeGen/MachineFunction.cpp
+4 -4 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+4 -0 llvm/include/llvm/CodeGen/MachineFunction.h
+133 -86 (4 files not shown)
+141 -88 (10 files)

LLVM/project 5af79ab llvm/test/CodeGen/AMDGPU gfx-callable-argument-types.ll accvgpr-spill-scc-clobber.mir

[AMDGPU] Implement CFI for CSR spills

Introduce new SPILL pseudos so that CFI is generated only for CSR
spills, and so that the information is accurate at the level of
individual ISA instructions.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_loc
instructions needed to describe the unwinding accurately.
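
To see why instruction-accurate CFI costs space, recall that each DWARF CFA advance is encoded in the smallest of four forms: `DW_CFA_advance_loc` packs a delta below 64 into the opcode byte, while `DW_CFA_advance_loc1`, `DW_CFA_advance_loc2` and `DW_CFA_advance_loc4` take an opcode plus a 1-, 2- or 4-byte operand. The sketch below uses hypothetical helper names and assumes deltas are already divided by the code alignment factor; it counts the advance bytes needed for a sorted list of CFI offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes needed to encode one CFA advance of Delta code-alignment units,
// choosing the smallest form: DW_CFA_advance_loc (delta in the low 6 bits
// of the opcode), then DW_CFA_advance_loc1/2/4 (opcode + 1/2/4 bytes).
unsigned advanceLocSize(uint64_t Delta) {
  assert(Delta <= UINT32_MAX && "delta too large for DW_CFA_advance_loc4");
  if (Delta < 64)
    return 1;
  if (Delta <= UINT8_MAX)
    return 2;
  if (Delta <= UINT16_MAX)
    return 3;
  return 5;
}

// Total advance bytes for CFI attached at the given sorted offsets, starting
// from offset 0. Each distinct offset needs its own advance instruction.
unsigned totalAdvanceBytes(const std::vector<uint64_t> &Offsets) {
  unsigned Total = 0;
  uint64_t Prev = 0;
  for (uint64_t Off : Offsets) {
    assert(Off >= Prev && "offsets must be sorted");
    if (Off != Prev)
      Total += advanceLocSize(Off - Prev);
    Prev = Off;
  }
  return Total;
}
```

For instance, CFI attached at offsets 4, 8 and 12 needs three one-byte advances, whereas a single CFI point at offset 12 needs only one.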

Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+1,932 -1,933 llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+2,688 -0 llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+490 -82 llvm/test/CodeGen/AMDGPU/pei-vgpr-block-spill-csr.mir
+487 -11 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+171 -160 llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll
+114 -114 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.576bit.ll
+5,882 -2,300 (70 files not shown)
+7,411 -3,163 (76 files)

LLVM/project 9875b29 llvm/lib/Target/AMDGPU SIFrameLowering.cpp SIMachineFunctionInfo.h, llvm/test/CodeGen/AMDGPU amdgpu-spill-cfi-saved-regs.ll

[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+2,556 -0 llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
+35 -10 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+11 -2 llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+9 -0 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+7 -0 llvm/lib/Target/AMDGPU/SIFrameLowering.h
+2 -1 llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+2,620 -13 (2 files not shown)
+2,623 -14 (8 files)

LLVM/project 5e30be8 llvm/lib/Target/AMDGPU SIFrameLowering.cpp, llvm/test/CodeGen/AMDGPU whole-wave-functions.ll accvgpr-spill-scc-clobber.mir

WIP attempt to avoid MCRegAliasIterator
DeltaFile
+45 -49 llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+30 -13 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+12 -12 llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+4 -4 llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+91 -78 (4 files)

LLVM/project 13aef6f

Use register pair for PC spill
DeltaFile
+0 -0 (0 files)

LLVM/project 32bd3c3 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll gfx-callable-argument-types.ll

Use nounwind to avoid touching unrelated tests
DeltaFile
+6,439 -112 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+579 -587 llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+9 -694 llvm/test/CodeGen/AMDGPU/pei-amdgpu-cs-chain-preserve.mir
+99 -541 llvm/test/CodeGen/AMDGPU/pei-vgpr-block-spill-csr.mir
+8 -574 llvm/test/CodeGen/AMDGPU/pei-amdgpu-cs-chain.mir
+9 -286 llvm/test/CodeGen/AMDGPU/amdgcn-call-whole-wave.ll
+7,143 -2,794 (37 files not shown)
+8,092 -3,542 (43 files)

LLVM/project b46525b llvm/test/CodeGen/AMDGPU materialize-frame-index-sgpr.ll gfx-callable-argument-types.ll

Use register pair for PC spill
DeltaFile
+818 -816 llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
+616 -618 llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+552 -552 llvm/test/CodeGen/AMDGPU/indirect-call.ll
+160 -139 llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll
+140 -140 llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll
+111 -111 llvm/test/CodeGen/AMDGPU/sibling-call.ll
+2,397 -2,376 (50 files not shown)
+3,262 -3,229 (56 files)

LLVM/project 539d743 llvm/test/CodeGen/AMDGPU eliminate-frame-index-v-add-u32.mir eliminate-frame-index-v-add-co-u32.mir

Respect MachineFunction::needsFrameMoves

Use nounwind to try to avoid cluttering tests
DeltaFile
+55 -204 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-u32.mir
+63 -134 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32.mir
+52 -114 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-i32.mir
+22 -26 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32-wave32.mir
+11 -20 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir
+10 -10 llvm/test/CodeGen/AMDGPU/issue98474-virtregrewriter-live-out-undef-subregisters.mir
+213 -508 (12 files not shown)
+254 -565 (18 files)

LLVM/project 4161e2f llvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir

[AMDGPU] Implement CFI for non-kernel functions

This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.

Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+5,568 -0 llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000 -96 llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+2,208 -72 llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196 -0 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+2,136 -0 llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir
+1,671 -1 llvm/test/CodeGen/AMDGPU/debug-frame.ll
+16,779 -169 (77 files not shown)
+25,213 -1,123 (83 files)

LLVM/project f312202 llvm/lib/Target/AMDGPU SIFrameLowering.cpp

Prefer SIRegisterInfo to MCRegisterInfo and add braces
DeltaFile
+10 -10 llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+10 -10 (1 file)

LLVM/project 2f1e405 llvm/test/CodeGen/AMDGPU eliminate-frame-index-v-add-u32.mir eliminate-frame-index-v-add-co-u32.mir, llvm/test/CodeGen/AMDGPU/GlobalISel memory-legalizer-atomic-fence.ll

Don't add IR sections to MIR tests just to add nounwind
DeltaFile
+0 -480 llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll
+204 -55 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-u32.mir
+134 -63 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32.mir
+114 -52 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-i32.mir
+26 -22 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32-wave32.mir
+20 -11 llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-u32.mir
+498 -683 (4 files not shown)
+518 -713 (10 files)