LLVM/project 2fc42c7 clang/lib/CodeGen CGHLSLRuntime.cpp CGHLSLRuntime.h, clang/test/CodeGenHLSL sret_output.hlsl

[HLSL] Add initial support for output semantics (#168095)

This commit adds the first part of the output semantics. It only
considers return values (and sret), and does not yet handle `inout` or
`out` parameters.
Those missing bits will reuse the same code but will require additional
testing and some fixups, so they are planned as separate changes.
Delta  File
+164 -7  clang/lib/CodeGen/CGHLSLRuntime.cpp
+56 -0  clang/test/CodeGenHLSL/semantics/semantic.struct.output.hlsl
+37 -0  clang/test/CodeGenHLSL/semantics/semantic.array.output.hlsl
+35 -0  clang/test/CodeGenHLSL/semantics/semantic-struct-2-output.hlsl
+34 -0  clang/lib/CodeGen/CGHLSLRuntime.h
+22 -10  clang/test/CodeGenHLSL/sret_output.hlsl
+348 -17  20 files not shown
+427 -75  26 files

LLVM/project 7b8eee6 llvm/test/CodeGen/RISCV sincos-expansion.ll

[RISCV][test] Add sincos-expansion.ll test case
Delta  File
+178 -0  llvm/test/CodeGen/RISCV/sincos-expansion.ll
+178 -0  1 files

LLVM/project 5da0445 llvm/include/llvm/Transforms/Vectorize LoopVectorizationLegality.h, llvm/lib/Transforms/Vectorize LoopVectorize.cpp LoopVectorizationLegality.cpp

[LV] Consolidate shouldOptimizeForSize and remove unused BFI/PSI. NFC (#168697)

#158690 plans on passing BFI as a lazy lambda to avoid computing
BlockFrequencyInfo when not needed.

In preparation for that, this PR removes BFI and PSI from some
constructors where they aren't used. It also consolidates the two calls to
llvm::shouldOptimizeForSize so that the result is computed once and
passed where needed.
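
As a rough illustration of the two ideas above (compute the size decision once and pass it along, and defer an expensive analysis behind a callable), here is a minimal standalone sketch. It does not use LLVM's API: `FakeBlockFrequency`, `decideOptForSize`, `CostModelSketch` and `PlannerSketch` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for an expensive analysis result (not LLVM's BFI).
struct FakeBlockFrequency {
  bool headerIsCold;
};

// Counts how many times the expensive analysis was actually computed.
inline int &analysisRuns() {
  static int runs = 0;
  return runs;
}

// The expensive computation, kept in one function so callers can defer it.
inline FakeBlockFrequency computeBlockFrequency() {
  ++analysisRuns();
  return FakeBlockFrequency{true};
}

// Decide once whether to optimize for size. The analysis is only requested
// when the cheap check (an opt-for-size attribute) does not already decide.
inline bool decideOptForSize(
    bool hasOptSizeAttr,
    const std::function<FakeBlockFrequency()> &getBFI) {
  if (hasOptSizeAttr)
    return true; // cheap path: the analysis is never computed
  return getBFI().headerIsCold; // expensive path: computed on demand
}

// Hypothetical consumers that receive the precomputed flag instead of
// recomputing it themselves.
struct CostModelSketch {
  bool optForSize;
};
struct PlannerSketch {
  bool optForSize;
};
```

A caller would evaluate `decideOptForSize` a single time and hand the resulting `bool` to each consumer, so the analysis runs at most once and only when the cheap check is inconclusive.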

This also renames OptForSize in LoopVectorizationLegality to clarify
that it's to prevent runtime SCEV checks, see
https://reviews.llvm.org/D68082
Delta  File
+38 -51  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+10 -14  llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+1 -3  llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+49 -68  3 files

LLVM/project 9eee396 flang/lib/Optimizer/Builder/Runtime Character.cpp

[flang] "Almost NFC" changes to fir::runtime::genCharCompare() (#168563)

As part of investigating a related issue, I made the following changes
to fir::runtime::genCharCompare():
- Renamed a variable
- Added an error check for the same kind of input args
- Updated another error check to use the same error found elsewhere in
this source file
Delta  File
+15 -8  flang/lib/Optimizer/Builder/Runtime/Character.cpp
+15 -8  1 files

LLVM/project 7fe3564 clang/lib/Headers/llvm_libc_wrappers ctype.h string.h, libc/cmake/modules LLVMLibCHeaderRules.cmake

[Clang] Gut the libc wrapper headers and simplify (#168438)

Summary:
These were originally intended to represent the functions that are
present on the GPU, as provided by the LLVM libc implementation.
The original plan was that LLVM libc would report which functions were
supported and then the offload interface would mark those as supported.
The problem is that these wrapper headers are very difficult to make
work given the various libc extensions in use, so they were
extremely fragile.

OpenMP already declares all functions used inside of a target region as
implicitly host / device, while these headers weren't even used for CUDA
/ HIP yet anyway. The only things we need to define right now are the
stdio FILE types. If we want to make this work for CUDA we'd need to
define these manually, but we're a ways off and that's way easier
because they do proper overloading.
Delta  File
+3 -115  clang/lib/Headers/llvm_libc_wrappers/ctype.h
+2 -70  clang/lib/Headers/llvm_libc_wrappers/string.h
+0 -54  libc/utils/hdrgen/hdrgen/gpu_headers.py
+10 -38  clang/lib/Headers/llvm_libc_wrappers/stdio.h
+3 -24  clang/lib/Headers/llvm_libc_wrappers/stdlib.h
+0 -15  libc/cmake/modules/LLVMLibCHeaderRules.cmake
+18 -316  6 files not shown
+31 -351  12 files

LLVM/project a2ddb02 llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange pr57148.ll loopnest-with-outer-btc0.ll

[LoopInterchange] Don't consider loops with BTC=0 (#167113)

Do not consider loops with a zero backedge taken count as candidates for
interchange. This seems like a sensible thing because it suggests the loop
doesn't execute and there is no point in interchanging. As a bonus, this
seems to avoid triggering an assert about phis and their uses from source
code, so this is a partial fix for #163954 but it needs more work to properly
fix that.
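
The filtering idea can be sketched in isolation as follows. This is not the LoopInterchange code: `LoopSummary` and both helpers are hypothetical, and whether the real pass inspects every loop of a nest or only some of them is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical loop summary. An empty optional models a backedge-taken
// count that is not a known constant.
struct LoopSummary {
  std::optional<std::uint64_t> backedgeTakenCount;
};

// True only when the count is known and equal to zero, i.e. the backedge
// is never taken.
inline bool hasZeroBackedgeTakenCount(const LoopSummary &L) {
  return L.backedgeTakenCount.has_value() && *L.backedgeTakenCount == 0;
}

// Assumption for this sketch: a nest is rejected as soon as any of its
// loops has a known backedge-taken count of zero.
inline bool isInterchangeCandidate(const std::vector<LoopSummary> &nest) {
  for (const LoopSummary &L : nest)
    if (hasZeroBackedgeTakenCount(L))
      return false;
  return true;
}
```

Unknown counts are deliberately left alone in this sketch, so only loops that provably never take their backedge are filtered out.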
Delta  File
+48 -34  llvm/test/Transforms/LoopInterchange/pr57148.ll
+74 -0  llvm/test/Transforms/LoopInterchange/loopnest-with-outer-btc0.ll
+54 -0  llvm/test/Transforms/LoopInterchange/zero-btc.ll
+18 -0  llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+3 -3  llvm/test/Transforms/LoopInterchange/pr43326.ll
+197 -37  5 files

LLVM/project 68d2ce8 llvm/lib/Analysis DependenceAnalysis.cpp, llvm/test/Analysis/DependenceAnalysis StrongSIV.ll Coupled.ll

[DA] Replace delinearization for fixed size array (#161822)

This patch replaces the delinearization function used in DA, switching
from one that depends on type information in GEPs to one that does not.
There are three types of changes in regression tests: improvements,
degradations, and degradations but the related features will be
removed. Since there were very few cases that are classified into the
second category, I believe the impact of this change should be
practically insignificant.
Delta  File
+16 -18  llvm/lib/Analysis/DependenceAnalysis.cpp
+4 -10  llvm/test/Analysis/DependenceAnalysis/StrongSIV.ll
+6 -5  llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll
+2 -4  llvm/test/Analysis/DependenceAnalysis/Coupled.ll
+3 -3  llvm/test/Analysis/DependenceAnalysis/Separability.ll
+5 -0  llvm/test/Transforms/LoopFusion/pr164082.ll
+36 -40  9 files not shown
+57 -50  15 files

LLVM/project 6fc48de llvm/lib/Target/AArch64 AArch64SchedNeoverseN3.td AArch64SchedNeoverseN2.td, llvm/test/tools/llvm-mca/AArch64/Neoverse N3-sve-instructions.s V1-zero-dependency.s

[AArch64] Update zero latency instructions in Neoverse scheduling tables (#165690)

NeoverseZeroMove was introduced for Neoverse-V2 and was added to V3 and
V3AE.
Use NeoverseZeroMove for Neoverse-V1, N2, N3 in the same way, including
these instructions:
MOV Xd|Wd, #0|XZR|WZR

For all the above Neoverse targets, the following instructions are also
decoded as not utilizing the scheduling and execution resources of the
machine:
MOV Wd,Wn
MOV Xd,Xn

For Neoverse-N3 only, these instructions also have zero latency:
FMOV Dd, Dn
FMOV Sd, Sn
MOV Vd, Vn (vector)
MOV Zd.D, Zn.D
PTRUE
PFALSE
Delta  File
+28 -28  llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-sve-instructions.s
+34 -7  llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
+20 -20  llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-zero-dependency.s
+15 -15  llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
+15 -15  llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-basic-instructions.s
+25 -3  llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
+137 -88  5 files not shown
+189 -109  11 files

LLVM/project 655662e mlir/include/mlir/IR Properties.td

[MLIR][ODS] Fully qualify namespace for mlir::Attribute in ODS generated code (#168536)

ODS-generated code can be included and used outside of the `mlir`
namespace, so references to symbols in the `mlir` namespace
must be fully qualified.
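
To show why a leading `::` can matter, here is a small standalone sketch of one way name lookup can go wrong. The `vendor` namespace, its nested `mlir` namespace and both functions are hypothetical, and the exact failure that motivated this commit is not stated here, so treat this only as an illustration of the lookup rule.

```cpp
#include <cassert>

namespace mlir {
// Stand-in for the intended type in the global `mlir` namespace.
struct Attribute {
  static constexpr int origin = 1;
};
} // namespace mlir

namespace vendor {
// Hypothetical: a nested namespace that also happens to be named `mlir`.
namespace mlir {
struct Attribute {
  static constexpr int origin = 2;
};
} // namespace mlir

// Stand-ins for generated code that ends up inside `vendor`.
// `mlir::Attribute` is looked up from the enclosing scope outwards, so it
// finds vendor::mlir first.
inline int partiallyQualified() { return mlir::Attribute::origin; }

// `::mlir::Attribute` always starts lookup at the global namespace.
inline int fullyQualified() { return ::mlir::Attribute::origin; }
} // namespace vendor
```

In this sketch `vendor::partiallyQualified()` picks up the nested type, while `vendor::fullyQualified()` reaches the global one regardless of where the code is included.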
Delta  File
+3 -3  mlir/include/mlir/IR/Properties.td
+3 -3  1 files

LLVM/project 267c93a bolt/test/AArch64 inline-pauth-lr.s

[BOLT] Add unittest for inliner using retaasppc <label>
Delta  File
+19 -3  bolt/test/AArch64/inline-pauth-lr.s
+19 -3  1 files

LLVM/project dce6002 clang/include/clang/Basic Builtins.h, clang/lib/Basic Builtins.cpp

[Clang][Codegen] Move floating point math intrinsic check to separate function [NFC] (#168198)

This PR moves the code that checks whether an LLVM intrinsic should be
generated instead of a call to floating point math functions to a
separate function. This simplifies `EmitBuiltinExpr` in `CGBuiltin.cpp`
and will allow us to reuse the logic in ClangIR.
Delta  File
+90 -0  clang/lib/Basic/Builtins.cpp
+6 -76  clang/lib/CodeGen/CGBuiltin.cpp
+17 -0  clang/include/clang/Basic/Builtins.h
+113 -76  3 files

LLVM/project bdcaa00 llvm/lib/Target/AArch64 AArch64PerfectShuffle.h, llvm/test/CodeGen/AArch64 reduce-shuffle.ll arm64-trn.ll

[AArch64] match TRN starting from undef elements (#167955)

When the first element of a trn mask is undef, the `isTRNMask` function
assumes `WhichResult = 1`. That has a 50% chance of being wrong, so we
fail to match some valid trn1/trn2.

This patch introduces a more precise test to determine the correct value
of `WhichResult`, based on corresponding code in the `isZIPMask` and
`isUZPMask` functions.
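
The difference between guessing from element 0 and testing both variants can be sketched outside of LLVM as below. This is a simplified model, not the code in `AArch64PerfectShuffle.h`: `matchesTRN`, `guessFromFirstElement` and `matchTRN` are hypothetical names, and negative mask entries are assumed to mean undef.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Mask convention assumed here: a negative entry means "undef".
// For N-element inputs, trn1 is <0, N, 2, N+2, ...> and
// trn2 is <1, N+1, 3, N+3, ...>.

// Check the mask against one variant (0 = trn1, 1 = trn2), treating undef
// entries as wildcards.
inline bool matchesTRN(const std::vector<int> &M, unsigned which) {
  const unsigned N = static_cast<unsigned>(M.size());
  if (N == 0 || N % 2 != 0)
    return false;
  for (unsigned i = 0; i < N; i += 2) {
    if (M[i] >= 0 && static_cast<unsigned>(M[i]) != i + which)
      return false;
    if (M[i + 1] >= 0 && static_cast<unsigned>(M[i + 1]) != i + N + which)
      return false;
  }
  return true;
}

// Guessing from element 0 only: an undef first element selects trn2, which
// is wrong whenever the rest of the mask describes trn1.
inline std::optional<unsigned> guessFromFirstElement(const std::vector<int> &M) {
  if (M.empty())
    return std::nullopt;
  const unsigned which = (M[0] == 0) ? 0u : 1u;
  if (matchesTRN(M, which))
    return which;
  return std::nullopt;
}

// More precise: test both variants and report the first one that fits.
inline std::optional<unsigned> matchTRN(const std::vector<int> &M) {
  for (unsigned which = 0; which < 2; ++which)
    if (matchesTRN(M, which))
      return which;
  return std::nullopt;
}
```

With the mask `<undef, 4, 2, 6>` the first-element guess rejects a valid trn1, whereas trying both variants accepts it. Trying both variants is just one straightforward way to get the precise answer; the patch itself may track the candidates differently.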

- This change is based on #89578. I'd like to follow it up with a
further change along the lines of #167235.
Delta  File
+228 -230  llvm/test/CodeGen/AArch64/reduce-shuffle.ll
+113 -1  llvm/test/CodeGen/AArch64/arm64-trn.ll
+10 -10  llvm/test/CodeGen/AArch64/insert-extend.ll
+10 -10  llvm/test/CodeGen/AArch64/vldn_shuffle.ll
+17 -2  llvm/lib/Target/AArch64/AArch64PerfectShuffle.h
+378 -253  5 files

LLVM/project c32c1d0 compiler-rt/cmake base-config-ix.cmake, offload CMakeLists.txt

[Runtimes] Default build must use its own output dirs (#168266)

Post-commit fix of #164794 reported at
https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493

`LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` are used by
`AddLLVM.cmake` as output directories. Unless we are in a
bootstrapping build, they must not point to directories found by
`find_package(LLVM)`, which may be read-only. MLIR, for
instance, sets these variables to its own build output
directory, and so should the runtimes.
Delta  File
+12 -6  runtimes/CMakeLists.txt
+6 -6  offload/CMakeLists.txt
+4 -4  openmp/CMakeLists.txt
+2 -2  compiler-rt/cmake/base-config-ix.cmake
+2 -2  offload/cmake/OpenMPTesting.cmake
+2 -2  openmp/cmake/OpenMPTesting.cmake
+28 -22  2 files not shown
+30 -24  8 files

LLVM/project 7b94dd3 llvm/lib/Transforms/Vectorize VPlan.h

[VPLan] Reduce duplication in VPHeaderPHIRecipe::classof. (NFCI)

Implement VPHeaderPHIRecipe::classof(const VPValue *V) in terms of the
variant taking VPRecipeBase.
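
The delegation pattern can be sketched with a tiny stand-in hierarchy. None of these types are the VPlan classes: `Recipe`, `Value` and `HeaderPhi` are hypothetical, and the kind range is invented for the example.

```cpp
#include <cassert>

// Hypothetical miniature recipe with a kind tag.
struct Recipe {
  enum Kind { WidenKind, FirstHeaderPhiKind, LastHeaderPhiKind };
  Kind kind;
  Kind getKind() const { return kind; }
};

// A value optionally knows the recipe that defines it.
struct Value {
  const Recipe *def = nullptr;
  const Recipe *getDefiningRecipe() const { return def; }
};

struct HeaderPhi {
  // Primary check: the kind-range test lives in exactly one place.
  static bool classof(const Recipe *R) {
    return R->getKind() >= Recipe::FirstHeaderPhiKind &&
           R->getKind() <= Recipe::LastHeaderPhiKind;
  }

  // The value overload only resolves the defining recipe and delegates,
  // so the range test is not duplicated.
  static bool classof(const Value *V) {
    const Recipe *R = V->getDefiningRecipe();
    return R != nullptr && classof(R);
  }
};
```

Keeping the range test in a single overload means a later change to the kind range only has to be made once.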

Reduces some duplication, split off from
https://github.com/llvm/llvm-project/pull/141431.
Delta  File
+4 -6  llvm/lib/Transforms/Vectorize/VPlan.h
+4 -6  1 files

LLVM/project 42e24cf clang/include/clang/Driver Options.td, clang/include/clang/Options Options.td

Merge branch 'main' into users/s.barannikov/decoder-operands-7-arm
Delta  File
+36,400 -36,393  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+11,724 -10,707  llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+5,202 -5,039  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+4,719 -5,242  llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp16.txt
+9,668 -0  clang/include/clang/Options/Options.td
+0 -9,644  clang/include/clang/Driver/Options.td
+67,713 -67,025  6,017 files not shown
+299,678 -208,490  6,023 files

LLVM/project 58e6d02 llvm/lib/CodeGen/GlobalISel CombinerHelper.cpp, llvm/lib/Target/AArch64/GISel AArch64LegalizerInfo.cpp

[AArch64][GlobalISel] Check unmergeSrc is a vector in matchCombineBuildUnmerge (#168692)

This aims to fix the crash in #168495. My combine rule was
missing a check that the source was in fact a vector. This then
caused the legality check to fail in this example, as the concat was
trying to concatenate a non-vector.

I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.

Fixes #168495
Delta  File
+72 -0  llvm/test/CodeGen/AArch64/GlobalISel/combine-unmerge-undef.mir
+3 -1  llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+3 -0  llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+78 -1  3 files

LLVM/project 5343dd9 clang/include/clang/Analysis/Analyses/LifetimeSafety Facts.h, clang/lib/Analysis/LifetimeSafety LiveOrigins.cpp Checker.cpp

[LifetimeSafety] Detect use-after-return (#165370)

Adding "use-after-return" in Lifetime Analysis.

Detecting when a function returns a reference to its own stack memory:
[UAR Design Doc](https://docs.google.com/document/d/1Wxjn_rJD_tuRdejP81dlb9VOckTkCq5-aE1nGcerb_o/edit?usp=sharing)

Consider the following example:

```cpp
std::string_view foo() {
    std::string_view a;
    std::string str = "small scoped string";
    a = str;
    return a;
}
```


    [35 lines not shown]
Delta  File
+283 -0  clang/unittests/Analysis/LifetimeSafetyTest.cpp
+155 -1  clang/test/Sema/warn-lifetime-safety.cpp
+28 -10  clang/lib/Analysis/LifetimeSafety/LiveOrigins.cpp
+24 -8  clang/lib/Analysis/LifetimeSafety/Checker.cpp
+14 -4  clang/lib/Analysis/LifetimeSafety/Facts.cpp
+12 -6  clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+516 -29  9 files not shown
+567 -44  15 files

LLVM/project 50791c3 clang/include/clang/Basic BuiltinsX86.td, clang/lib/AST ExprConstant.cpp

[Clang][X86] allow VPERMILPD/S imm intrinsics to be used in constexpr (#168044)

Resolves #166529
Delta  File
+53 -0  clang/test/CodeGen/X86/avx512vl-builtins.c
+32 -0  clang/test/CodeGen/X86/avx512f-builtins.c
+23 -1  clang/lib/AST/ExprConstant.cpp
+19 -0  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+7 -5  clang/include/clang/Basic/BuiltinsX86.td
+5 -0  clang/test/CodeGen/X86/avx-builtins.c
+139 -6  6 files

LLVM/project b42851b llvm/lib/Target/X86 X86ISelLowering.cpp

[X86] EltsFromConsecutiveLoads - add recursion depth limiter (#168694)

EltsFromConsecutiveLoads can be recursively called - ensure we limit the recursion depth.
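
A depth limiter for a self-recursive helper generally looks like the sketch below. It is a generic illustration, not the X86 lowering code: `kMaxDepth`, `combine` and the `deepest` bookkeeping are hypothetical, and the limit actually used by the backend is not given in this entry.

```cpp
#include <cassert>

// Hypothetical limit chosen only for this example.
constexpr unsigned kMaxDepth = 6;

// Hypothetical recursive helper. It gives up (returns false) once the depth
// budget is exhausted instead of recursing further, and records the deepest
// level it actually entered.
inline bool combine(unsigned remainingWork, unsigned depth, unsigned &deepest) {
  if (depth >= kMaxDepth)
    return false; // bail out: recursion budget exhausted
  if (depth > deepest)
    deepest = depth;
  if (remainingWork == 0)
    return true; // finished within the budget
  return combine(remainingWork - 1, depth + 1, deepest);
}
```

Threading the depth through as a parameter keeps the limit local to each call chain, so independent top-level calls each start with a full budget.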
Delta  File
+7 -3  llvm/lib/Target/X86/X86ISelLowering.cpp
+7 -3  1 files

LLVM/project 5b9efab clang/include/clang/Analysis/Analyses/LifetimeSafety Origins.h, clang/lib/Analysis/LifetimeSafety FactsGenerator.cpp Origins.cpp

lifetime-safety-multi-origin
Delta  File
+232 -86  clang/lib/Analysis/LifetimeSafety/FactsGenerator.cpp
+75 -53  clang/lib/Analysis/LifetimeSafety/Origins.cpp
+104 -23  clang/test/Sema/warn-lifetime-safety.cpp
+79 -15  clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+19 -50  clang/lib/Sema/CheckExprLifetime.cpp
+31 -0  clang/lib/Analysis/LifetimeSafety/LifetimeAnnotations.cpp
+540 -227  9 files not shown
+582 -258  15 files

LLVM/project 1500536 clang/docs AllocToken.rst, clang/include/clang/Basic LangOptions.h

[AllocToken] Fix and clarify -falloc-token-max=0 (#168689)

The option -falloc-token-max=0 is supposed to be usable to override
previous settings back to the target default max tokens (SIZE_MAX).

This did not work for the builtin:
```
| executed command: clang -cc1 [..] -nostdsysteminc -triple x86_64-linux-gnu -std=c++23 -fsyntax-only -verify clang/test/SemaCXX/alloc-token.cpp -falloc-token-max=0
| clang: llvm/lib/Support/AllocToken.cpp:38: std::optional<uint64_t> llvm::getAllocToken(AllocTokenMode, const AllocTokenMetadata &, uint64_t): Assertion `MaxTokens && "Must provide non-zero max tokens"' failed.
```

Fix it by also picking the default if "0" is passed.

Improve the documentation to make clearer what the value "0" means.
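
The "0 means use the default" rule can be sketched as a tiny normalization step. `effectiveMaxTokens` is a hypothetical helper written for this illustration and is not the function changed by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a requested maximum of 0 means "no explicit limit",
// so fall back to the default stated above (SIZE_MAX). Any non-zero request
// is used unchanged.
inline std::uint64_t effectiveMaxTokens(std::uint64_t requested) {
  return requested == 0 ? static_cast<std::uint64_t>(SIZE_MAX) : requested;
}
```

Normalizing the value before it reaches code that asserts on a non-zero maximum is what keeps an input of 0 from tripping an assertion like the one quoted above.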
Delta  File
+4 -3  llvm/lib/Transforms/Instrumentation/AllocToken.cpp
+3 -3  clang/docs/AllocToken.rst
+2 -2  clang/include/clang/Basic/LangOptions.h
+2 -1  clang/lib/AST/ByteCode/InterpBuiltin.cpp
+2 -1  clang/lib/AST/ExprConstant.cpp
+1 -1  clang/include/clang/Options/Options.td
+14 -11  1 files not shown
+15 -11  7 files

LLVM/project ed7f2a4 clang/test/CodeGen sparcv9-abi.c sparcv9-dwarf.c, clang/test/CodeGen/Sparc sparcv9-abi.c sparcv9-dwarf.c

[SPARC][NFC] Move clang tests into own subdirectory (#168657)

Delta  File
+0 -276  clang/test/CodeGen/sparcv9-abi.c
+276 -0  clang/test/CodeGen/Sparc/sparcv9-abi.c
+0 -99  clang/test/CodeGen/sparcv9-dwarf.c
+99 -0  clang/test/CodeGen/Sparc/sparcv9-dwarf.c
+0 -42  clang/test/CodeGen/sparcv8-inline-asm.c
+42 -0  clang/test/CodeGen/Sparc/sparcv8-inline-asm.c
+417 -417  10 files not shown
+556 -556  16 files

LLVM/project 125af56 llvm/lib/Target/AMDGPU AMDGPUISelDAGToDAG.cpp, llvm/test/CodeGen/AMDGPU memintrinsic-unroll.ll fold-gep-offset.ll

[AMDGPU][SDAG] Only fold flat offsets if they are inbounds PTRADDs (#165427)

For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, SDISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.

This patch only selects flat instructions with immediate offsets from PTRADD
address computations with the inbounds flag: If the PTRADD does not leave the
bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
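
The argument above can be modeled with plain address ranges. This is a deliberately simplified sketch and not a description of the hardware or of SDISel: `Range`, the three helpers and all addresses are hypothetical, and one-past-the-end pointers are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Half-open address range [begin, end). All addresses used with it are
// made up for illustration.
struct Range {
  std::uint64_t begin;
  std::uint64_t end;
  bool contains(std::uint64_t addr) const {
    return addr >= begin && addr < end;
  }
};

// Model of the aperture test as described above: it looks at the base
// register only and ignores the immediate offset.
inline bool apertureTestHits(const Range &aperture, std::uint64_t base,
                             std::uint64_t /*immOffset*/) {
  return aperture.contains(base);
}

// What the program meant: the aperture of the final address.
inline bool finalAddressInAperture(const Range &aperture, std::uint64_t base,
                                   std::uint64_t immOffset) {
  return aperture.contains(base + immOffset);
}

// Simplified stand-in for "inbounds": base and base + offset both lie
// inside the same allocated object.
inline bool staysInObject(const Range &object, std::uint64_t base,
                          std::uint64_t immOffset) {
  return object.contains(base) && object.contains(base + immOffset);
}
```

In this model, a base just below the aperture combined with an offset that crosses into it makes the two checks disagree, which corresponds to the miscompilation described above. When both addresses stay inside one object that lies within the aperture, the checks agree and folding the offset is harmless.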

Affected tests:

- CodeGen/AMDGPU/fold-gep-offset.ll: Offsets are no longer wrongly folded, added
  new positive tests where we still do fold them.
- CodeGen/AMDGPU/infer-addrspace-flat-atomic.ll: Offset folding doesn't seem

    [18 lines not shown]
Delta  File
+5,202 -5,039  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+789 -119  llvm/test/CodeGen/AMDGPU/fold-gep-offset.ll
+104 -84  llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll
+69 -58  llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+34 -24  llvm/test/CodeGen/AMDGPU/no-folding-imm-to-inst-with-fi.ll
+33 -16  llvm/test/CodeGen/AMDGPU/loop-prefetch-data.ll
+6,231 -5,340  2 files not shown
+6,246 -5,351  8 files

LLVM/project 2f6a8a7 mlir/docs/Dialects NVVMDialect.md

[MLIR][NVVM] Add operations and interfaces
Delta  File
+14 -1  mlir/docs/Dialects/NVVMDialect.md
+14 -1  1 files

LLVM/project e38529d llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 vector-compress-freeze.ll

[DAG] Update canCreateUndefOrPoison to handle ISD::VECTOR_COMPRESS (#168010)

Fixes #167710
Delta  File
+36 -0  llvm/test/CodeGen/X86/vector-compress-freeze.ll
+3 -0  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+39 -0  2 files

LLVM/project 0730913 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize vplan-printing.ll

[VPlan] Print debug info for all recipes. (#168454)

Use the recently refactored VPRecipeBase::print to print debug location
for all recipes.

PR: https://github.com/llvm/llvm-project/pull/168454
Delta  File
+11 -22  llvm/test/Transforms/LoopVectorize/vplan-printing.ll
+4 -5  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+15 -27  2 files

LLVM/project d0943f1 bolt/lib/Target/AArch64 AArch64MCPlusBuilder.cpp, bolt/test/AArch64 inline-pauth-lr.s

[BOLT] Fix: copy operands of MCInst in createMatchingAuth

- some PAuthAndRet variants need operands, so we need to copy them from
  the to-be-removed MCInst to the new one
- remove extra assertion
- add unittest about inlining an Armv9.5-A PAuthAndRet variant (with the
  operand copy).
Delta  File
+45 -0  bolt/test/AArch64/inline-pauth-lr.s
+1 -3  bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+46 -3  2 files

LLVM/project 915e9ad clang-tools-extra/clang-tidy/google AvoidCStyleCastsCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Provide fix-its for casts to void* in google-readability-casting (#167655)

Delta  File
+6 -0  clang-tools-extra/clang-tidy/google/AvoidCStyleCastsCheck.cpp
+4 -0  clang-tools-extra/test/clang-tidy/checkers/google/readability-casting.cpp
+1 -1  clang-tools-extra/docs/ReleaseNotes.rst
+11 -1  3 files

LLVM/project 907e851 clang/lib/Interpreter IncrementalExecutor.cpp, llvm/include/llvm/ExecutionEngine/Orc EPCDebugObjectRegistrar.h

[ORC] Remove now unused EPCDebugObjectRegistrar (NFC) (#167868)

EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions
https://github.com/llvm/llvm-project/pull/167866
Delta  File
+0 -69  llvm/include/llvm/ExecutionEngine/Orc/EPCDebugObjectRegistrar.h
+0 -61  llvm/lib/ExecutionEngine/Orc/EPCDebugObjectRegistrar.cpp
+0 -16  llvm/lib/ExecutionEngine/Orc/TargetProcess/JITLoaderGDB.cpp
+1 -3  clang/lib/Interpreter/IncrementalExecutor.cpp
+2 -2  utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+0 -3  llvm/include/llvm/ExecutionEngine/Orc/TargetProcess/JITLoaderGDB.h
+3 -154  9 files not shown
+5 -164  15 files

LLVM/project a2af185 mlir/lib/Dialect/Tosa/Transforms CMakeLists.txt

[mlir][tosa] Fix linker failure in build bots introduced by #165581 (#168581)

This commit fixes linker failures evident on some failing build bots.
Delta  File
+1 -0  mlir/lib/Dialect/Tosa/Transforms/CMakeLists.txt
+1 -0  1 files