LLVM/project 946a068 clang/test/Driver clang_f_opts.c

Specify linker and triple for LTO checks
Delta  File
+4 -4  clang/test/Driver/clang_f_opts.c
+4 -4  (total, 1 file)

LLVM/project 09a3d83 clang/lib/CodeGen CGExpr.cpp, clang/test/CodeGen builtin-counted-by-ref.c attr-counted-by-pr88931.c

[Clang][CodeGen] Fix __builtin_counted_by_ref for nested struct FAMs (#182575) (#182590)

GetCountedByFieldExprGEP() used getOuterLexicalRecordContext() to find
the RecordDecl containing the counted_by count field. This walks up
through all lexically enclosing records to find the outermost one, which
is wrong when a struct with a counted_by FAM is defined nested inside
another named struct.

For example, when struct inner (containing the FAM) is defined inside
struct outer, getOuterLexicalRecordContext() resolves to struct outer
instead of struct inner. The StructAccessBase visitor then fails to
match the base expression type (struct inner *) against the expected
record (struct outer), returning nullptr. This nullptr propagates back
as the GEP result, and the subsequent dereference in
*__builtin_counted_by_ref() triggers an assertion failure in
Address::getBasePointer().

Replace getOuterLexicalRecordContext() with a walk that only traverses
anonymous structs and unions, which are transparent in C and must be

    [13 lines not shown]
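
A toy model of the two lookups follows. It is a sketch only: `Record`, `outermostLexicalRecord`, and `countFieldRecord` are hypothetical names rather than Clang AST APIs. The stopping rule (keep climbing only while the current record is anonymous) is our reading of the message above, whose remainder is not shown.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a RecordDecl: a struct or union that may be
// lexically nested inside another record.
struct Record {
  std::string name;                // empty for anonymous structs/unions
  const Record *parent = nullptr;  // lexically enclosing record, if any
  bool isAnonymous() const { return name.empty(); }
};

// Models getOuterLexicalRecordContext(): climb to the outermost record,
// crossing named records as well.
const Record *outermostLexicalRecord(const Record *r) {
  while (r->parent)
    r = r->parent;
  return r;
}

// Models the fixed walk: only climb out of anonymous records (transparent
// in C) and stop at the first named record.
const Record *countFieldRecord(const Record *r) {
  while (r->isAnonymous() && r->parent)
    r = r->parent;
  return r;
}
```

With `struct inner` nested in `struct outer`, the first walk lands on `outer`, the record the StructAccessBase visitor then fails to match. The second walk stays on `inner`, and it only reaches an enclosing record when the FAM sits inside an anonymous struct.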
Delta  File
+27 -0  clang/test/CodeGen/builtin-counted-by-ref.c
+12 -1  clang/lib/CodeGen/CGExpr.cpp
+6 -1  clang/test/CodeGen/attr-counted-by-pr88931.c
+45 -2  (total, 3 files)

LLVM/project 8ed04d9 llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPlan.h, llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll

[VPlan] Start implementing VPlan-based stride multiversioning

This commit only implements the run-time guard without actually
optimizing the vector loop. That would come in a separate PR to ease
review.
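
Stride multiversioning guards a loop whose stride is only known at run time with a check that the stride equals 1. When the check passes, a unit-stride version runs; otherwise the original loop runs. The C++ below shows that shape at source level only. `scaleStrided` is a hypothetical kernel, and the real transformation operates on VPlan, not on source code. Per the message, this commit adds only the guard, without yet optimizing the unit-stride path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: a[i * stride] *= k for i in [0, n).
void scaleStrided(std::vector<float> &a, std::size_t n, std::size_t stride,
                  float k) {
  if (stride == 1) {
    // Unit-stride version: contiguous accesses that a vectorizer can turn
    // into plain wide loads and stores.
    for (std::size_t i = 0; i < n; ++i)
      a[i] *= k;
  } else {
    // Fallback: the original strided loop (gather/scatter or scalar code).
    for (std::size_t i = 0; i < n; ++i)
      a[i * stride] *= k;
  }
}
```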
Delta  File
+227 -59  llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+137 -65  llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+110 -0  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+43 -0  llvm/lib/Transforms/Vectorize/VPlan.h
+14 -3  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+7 -0  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+538 -127  (files shown; 4 files not shown)
+556 -129  (total, 10 files)

LLVM/project a22cf92 mlir/lib/Dialect/XeGPU/IR XeGPUDialect.cpp

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in XeGPUDialect.cpp (NFC)
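
For readers unfamiliar with the check: `llvm-qualified-auto` asks that `auto` variables deduced to pointers spell out the `*`, plus `const` where the pointee is const. The following is a generic before/after sketch with hypothetical `getPtr`/`getConstPtr` helpers, not the actual XeGPUDialect.cpp diff:

```cpp
#include <cassert>
#include <type_traits>

int value = 42;
const int constValue = 7;

// Hypothetical helpers returning pointers.
int *getPtr() { return &value; }
const int *getConstPtr() { return &constValue; }

void demo() {
  // Before: `auto P = getPtr();` hides that P is a pointer.
  auto *P = getPtr();              // after: pointer-ness is spelled out
  // Before: `auto CP = getConstPtr();`
  const auto *CP = getConstPtr();  // after: pointee constness is spelled out
  static_assert(std::is_same_v<decltype(P), int *>);
  static_assert(std::is_same_v<decltype(CP), const int *>);
  assert(*P == 42 && *CP == 7);
}
```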
Delta  File
+3 -3  mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp
+3 -3  (total, 1 file)

LLVM/project 603e5c8 flang/include/flang/Optimizer/Dialect FIRAttr.td, flang/lib/Lower CallInterface.cpp

[flang][debug] Supply missing subprogram attributes (#181425)

Add DW_AT_elemental, DW_AT_pure, and DW_AT_recursive attributes to
subprograms and functions when they are specified in the source.
Delta  File
+17 -0  flang/test/Transforms/debug-fn-attr.fir
+14 -0  flang/lib/Optimizer/Transforms/AddDebugInfo.cpp
+7 -5  flang/include/flang/Optimizer/Dialect/FIRAttr.td
+7 -2  flang/lib/Lower/CallInterface.cpp
+8 -0  flang/test/Lower/HLFIR/recursive-user-procedure.f90
+2 -2  flang/test/Lower/OpenMP/declare-target-func-and-subr.f90
+55 -9  (files shown; 3 files not shown)
+60 -14  (total, 9 files)

LLVM/project 4196411 mlir/lib/Dialect/ArmSME/IR Utils.cpp

[MLIR] Apply clang-tidy fixes for readability-simplify-boolean-expr in Utils.cpp (NFC)
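
As a generic illustration of what `readability-simplify-boolean-expr` rewrites, an `if`/`else` that only returns boolean literals collapses into returning the condition itself. The `isEven` helpers are hypothetical; this is not the code changed in Utils.cpp.

```cpp
#include <cassert>

// Before: the shape the check flags.
bool isEvenVerbose(int x) {
  if (x % 2 == 0)
    return true;
  else
    return false;
}

// After: return the condition directly.
bool isEven(int x) { return x % 2 == 0; }
```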
Delta  File
+1 -4  mlir/lib/Dialect/ArmSME/IR/Utils.cpp
+1 -4  (total, 1 file)

LLVM/project 9bd13fc llvm/test/Transforms/LoopVectorize vplan-based-stride-mv.ll, llvm/test/Transforms/LoopVectorize/VPlan vplan-based-stride-mv.ll

[NFC][VPlan] Add initial tests for future VPlan-based stride MV

I tried to include the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride), cases where the current implementation behaves poorly
(e.g., https://godbolt.org/z/h31c3zKxK), and some other potentially
interesting scenarios I could imagine.
Delta  File
+2,282 -0  llvm/test/Transforms/LoopVectorize/VPlan/vplan-based-stride-mv.ll
+2,027 -0  llvm/test/Transforms/LoopVectorize/vplan-based-stride-mv.ll
+4,309 -0  (total, 2 files)

LLVM/project 633449f llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/VPlan vplan-print-after-all.ll

[NFC][VPlan] Split `makeMemOpWideningDecisions` into subpasses

The idea is to handle strided memory operations (either from
https://github.com/llvm/llvm-project/pull/147297 or for VPlan-based
multiversioning of unit-strided accesses) after some mandatory
processing has been performed (e.g., some types **must** be scalarized)
but before the legacy cost model's (CM's) decision to widen
(gather/scatter) or scalarize has been committed.

In the longer term, we can move all other memory-widening decisions
here, directly at the VPlan level. I expect this structure would also
be beneficial for that.
Delta  File
+83 -38  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+3 -0  llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-after-all.ll
+86 -38  (total, 2 files)

LLVM/project a2a4bea llvm/lib/Transforms/Vectorize LoopVectorize.cpp VPRecipeBuilder.h, llvm/test/Transforms/LoopVectorize/AArch64 predication_costs.ll

[NFCI][VPlan] Split initial mem-widening into a separate transformation

Preparation change before implementing stride-multiversioning as a
VPlan-based transformation. Might help
https://github.com/llvm/llvm-project/pull/147297/ as well.
Delta  File
+92 -31  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+14 -12  llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+3 -2  llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+4 -0  llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+1 -0  llvm/test/Transforms/LoopVectorize/VPlan/vplan-print-after-all.ll
+114 -45  (total, 5 files)

LLVM/project fd4dec9 mlir/lib/Dialect/Complex/IR ComplexOps.cpp, mlir/test/Dialect/Complex canonicalize.mlir

[MLIR][Complex] Check for FastMathFlag in DivOp folder (#176249)

- Fold DivOp with LHS that has NaN as real or imag to Complex of NaNs
- Fold `div(a, Complex<1, 0>) -> a` if fast math flag with nnan is set
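
A plausible reason the second fold is gated on `nnan`: the textbook formula for (a + bi) / (c + di) computes terms like `b * d` even when d == 0. An infinite component therefore becomes NaN (inf * 0), and the quotient is not `a`. The sketch models both folds on `std::complex<double>` constants. `naiveDiv`, `foldDiv`, and its `nnan` flag are hypothetical; MLIR's actual folder works on attributes and SSA values.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>
#include <optional>

using C = std::complex<double>;

// Textbook division (a+bi)/(c+di); the b*d and a*d terms are computed
// even when d == 0, so inf * 0 can inject a NaN.
C naiveDiv(C x, C y) {
  double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  double den = c * c + d * d;
  return C((a * c + b * d) / den, (b * c - a * d) / den);
}

// Hypothetical folder: returns a folded constant, or nullopt for "no fold".
std::optional<C> foldDiv(C lhs, C rhs, bool nnan) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  // Fold 1: a NaN in either part of the LHS yields a complex of NaNs.
  if (std::isnan(lhs.real()) || std::isnan(lhs.imag()))
    return C(nan, nan);
  // Fold 2: div(a, 1 + 0i) -> a, only when NaNs may be assumed absent.
  if (nnan && rhs == C(1.0, 0.0))
    return lhs;
  return std::nullopt;
}
```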
Delta  File
+44 -2  mlir/test/Dialect/Complex/canonicalize.mlir
+24 -21  mlir/lib/Dialect/Complex/IR/ComplexOps.cpp
+68 -23  (total, 2 files)

LLVM/project 1f53b18 llvm/lib/Target/AMDGPU AMDGPULibFunc.cpp

AMDGPU: Try to fix leak in AMDGPULibFunc

I don't know why this was trying to do placement new. I guess
this was overwriting the unique_ptr, bypassing its destructor.
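
The suspected leak works like this: building a new `std::unique_ptr` on top of an existing one with placement new never runs the old `unique_ptr`'s destructor, so the object it owned is never deleted. The sketch shows two correct alternatives. `Tracked` and both helpers are hypothetical, and the actual fix in AMDGPULibFunc.cpp may look different.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Counts live instances so that a leak would be observable.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Placement new is only correct after explicitly ending the old
// unique_ptr's lifetime; skipping the destructor call would leak.
void replaceWithPlacementNew(std::unique_ptr<Tracked> &slot) {
  Tracked *fresh = new Tracked();  // allocate first: a throw leaves slot intact
  slot.~unique_ptr();              // deletes the previously owned object
  new (&slot) std::unique_ptr<Tracked>(fresh);
}

// The idiomatic alternative: assignment deletes the previously owned object.
void replaceByAssignment(std::unique_ptr<Tracked> &slot) {
  slot = std::make_unique<Tracked>();
}
```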
Delta  File
+2 -0  llvm/lib/Target/AMDGPU/AMDGPULibFunc.cpp
+2 -0  (total, 1 file)

LLVM/project 67fcdc9 llvm/lib/Target/AMDGPU SIFrameLowering.cpp GCNSubtarget.cpp

[AMDGPU] Efficient way to get NumArchVGPRs. (#182537)

No functional change. Clean up how the number of VGPRs is obtained for
different AMDGPU targets based on their features.
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+1 -1  llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+1 -1  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+3 -4  (total, 3 files)

LLVM/project e1feac3 libc/shared rpc_dispatch.h rpc_util.h

[libc] Properly handle null handles in rpc_dispatch.h

Summary:
We automatically dereference pointers, so we should check whether they
are null. This is a minimal change that keeps a null pointer as zero and
handles the zero case.
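
One possible reading of "keeping it zero" is sketched below. A pointer argument travels as a size followed by its payload, a null pointer is sent as size 0 with no payload, and the receiver maps size 0 back to `nullptr` instead of dereferencing it. The wire format and the `encodePtrArg`/`decodePtrArg` helpers are hypothetical, not the rpc_dispatch.h interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical wire format: [uint64_t size][size bytes of payload].
// T is assumed to be trivially copyable.
template <typename T>
void encodePtrArg(std::vector<std::uint8_t> &buf, const T *p) {
  std::uint64_t size = p ? sizeof(T) : 0;  // null pointer -> size 0
  const auto *sz = reinterpret_cast<const std::uint8_t *>(&size);
  buf.insert(buf.end(), sz, sz + sizeof(size));
  if (p) {
    const auto *bytes = reinterpret_cast<const std::uint8_t *>(p);
    buf.insert(buf.end(), bytes, bytes + sizeof(T));
  }
}

// Decodes one argument at `off` (buffer assumed well-formed). Returns
// nullptr for size 0 instead of dereferencing; otherwise copies the
// payload into `storage` and returns its address.
template <typename T>
const T *decodePtrArg(const std::vector<std::uint8_t> &buf, std::size_t &off,
                      T &storage) {
  std::uint64_t size;
  std::memcpy(&size, buf.data() + off, sizeof(size));
  off += sizeof(size);
  if (size == 0)
    return nullptr;
  std::memcpy(&storage, buf.data() + off, sizeof(T));
  off += sizeof(T);
  return &storage;
}
```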
Delta  File
+10 -7  libc/shared/rpc_dispatch.h
+2 -1  libc/shared/rpc_util.h
+12 -8  (total, 2 files)

LLVM/project 70b5a1d flang-rt/lib/runtime io-api-server.cpp io-api-gpu.cpp, offload/test/offloading/fortran formatted-io.f90

[flang-rt] Add support for formatted I/O on the GPU (#182580)

Summary:
Expands on the previous support to enable formatted output, characters,
and checking basic iostat. We intentionally do not handle cases where
the descriptor is non-null, as this is a non-trivial class that cannot
easily be shepherded across the wire.
Delta  File
+67 -0  offload/test/offloading/fortran/formatted-io.f90
+52 -9  flang-rt/lib/runtime/io-api-server.cpp
+26 -6  flang-rt/lib/runtime/io-api-gpu.cpp
+15 -12  flang-rt/lib/runtime/io-api-gpu.h
+160 -27  (total, 4 files)

LLVM/project 2b07482 flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef array-coor-block-arg.mlir no-declare.mlir

Reapply "[flang] Lowering a ArrayCoorOp to arithmetic computations" (#182585)

Reapplying the changes; they were reverted by mistake yesterday.

This reverts commit 3c6523dcb8ebc0396f69c578285599b66e16dce7.
Delta  File
+14 -15  flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+28 -0  flang/test/Transforms/FIRToMemRef/array-coor-block-arg.mlir
+7 -6  flang/test/Transforms/FIRToMemRef/no-declare.mlir
+49 -21  (total, 3 files)

LLVM/project fafaaa1 llvm/lib/Transforms/Scalar LoopFuse.cpp

[LoopFusion] Improve collectFusionCandidates() (#182571)

The order of visiting loops in collectFusionCandidates() guarantees that
a new member can only possibly be added to the end of a set.

Also, `NumFusionCandidates` currently counts every loop that is added to
a candidate set. Usually the large majority of candidate sets have a
single member, so they are not really candidates for fusion. Only the
second and subsequent members of a candidate set should be counted as
fusion candidates.
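
The counting rule can be sketched as follows. Loops join candidate sets in visit order, so a new member is simply appended, and only a loop that joins a non-empty set increments the counter. `CandidateSets`, `addLoop`, and the `canFuseWith` predicate are hypothetical stand-ins for LoopFuse's data structures and its equivalence test.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model: loops are int ids; each set keeps visit order.
struct CandidateSets {
  std::vector<std::vector<int>> sets;
  unsigned numFusionCandidates = 0;

  // canFuseWith(head, loop) decides whether `loop` joins the set led by `head`.
  void addLoop(int loop, const std::function<bool(int, int)> &canFuseWith) {
    for (auto &set : sets) {
      if (canFuseWith(set.front(), loop)) {
        set.push_back(loop);    // visit order: new members go at the end
        ++numFusionCandidates;  // second and later members count
        return;
      }
    }
    sets.push_back({loop});     // a singleton set is not a fusion candidate
  }
};
```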
Delta  File
+2 -11  llvm/lib/Transforms/Scalar/LoopFuse.cpp
+2 -11  (total, 1 file)

LLVM/project 2f47bbf clang/include/clang/CIR/Dialect/IR CIRCUDAAttrs.td CIRAttrs.td, clang/lib/CIR/CodeGen CIRGenCall.cpp

[CIR][CUDA] Add CUDAKernelNameAttr for device stubs (#180051)

Besides the attribute description, it is worth noting that this
attribute will later be consumed when handling runtime registration in
loweringPrepare.
Delta  File
+40 -0  clang/include/clang/CIR/Dialect/IR/CIRCUDAAttrs.td
+10 -0  clang/lib/CIR/CodeGen/CIRGenCall.cpp
+3 -3  clang/test/CIR/CodeGenCUDA/kernel-stub-name.cu
+2 -0  clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+55 -3  (total, 4 files)

LLVM/project 5c92981 utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[bazel][NVVM][MLIR] Port 7d9b863bc39f690c6f7716dd27d24e6ccfcfae33 (#182588)

Co-authored-by: Pranav Kant <prka at google.com>
Delta  File
+27 -0  utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+27 -0  (total, 1 file)

LLVM/project 8735e69 flang/lib/Optimizer/CodeGen CodeGen.cpp

[flang] OPTIONAL char dummy has no defining op; add null check (#182582)

size.getDefiningOp() returns nullptr when `size` is a block argument, as
happens when an OPTIONAL character length generates the conditional
"fir.if". Check for nullptr before calling mlir::isa<> to avoid the crash.
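
The crash pattern generalizes: in MLIR, `Value::getDefiningOp()` returns null for a block argument, and `mlir::isa<>` must not be called on null. The toy model below uses hypothetical `Op`/`ConstantOp`/`Value` types, with `dynamic_cast` standing in for `mlir::isa<>`; it is not the flang CodeGen change. LLVM's `llvm::isa_and_nonnull` packages the same guard.

```cpp
#include <cassert>

// Hypothetical stand-ins for mlir::Operation and one specific op kind.
struct Op { virtual ~Op() = default; };
struct ConstantOp : Op {};

// A value is either produced by an op or is a block argument (no producer).
struct Value {
  Op *definingOp = nullptr;
  Op *getDefiningOp() const { return definingOp; }
};

// Guarded check: test for null before asking what kind of op it is.
// (dynamic_cast would tolerate null, but mlir::isa<> asserts on it.)
bool isDefinedByConstant(const Value &v) {
  Op *def = v.getDefiningOp();
  return def != nullptr && dynamic_cast<ConstantOp *>(def) != nullptr;
}
```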

Addresses: https://github.com/llvm/llvm-project/issues/182436
Passes check-flang, check-flang-rt, and llvm-test-suite (x86_64)

---------

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval at gmail.com>
Delta  File
+3 -1  flang/lib/Optimizer/CodeGen/CodeGen.cpp
+3 -1  (total, 1 file)

LLVM/project 94724e8 llvm/test/Transforms/LoopVectorize epilog-iv-select-cmp.ll

[LV] Add epilogue test with FindLast reduction of wrapping IV.

Extra test coverage for https://github.com/llvm/llvm-project/pull/172569
Delta  File
+71 -0  llvm/test/Transforms/LoopVectorize/epilog-iv-select-cmp.ll
+71 -0  (total, 1 file)

LLVM/project 7d9b863 mlir/include/mlir/Conversion Passes.td, mlir/include/mlir/Conversion/GPUToNVVM GPUToNVVMPass.h

[NVVM][MLIR] Refactor conversion of Math / Arith operations into separate passes (#180058)

This commit refactors the conversion of Math / Arith operations to NVVM
into a separate pass called MathToNVVM, so that the lowering of Math /
Arith operations can also be supported in flang. This mirrors what was
done in MathToROCDL.

This is PR (1/2) to address
https://github.com/llvm/llvm-project/issues/147023 and
https://github.com/llvm/llvm-project/issues/179347.

PR(2/2) that adds this pass to flang is here:
https://github.com/llvm/llvm-project/pull/180060
Delta  File
+279 -0  mlir/lib/Conversion/MathToNVVM/MathToNVVM.cpp
+1 -217  mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+27 -0  mlir/include/mlir/Conversion/MathToNVVM/MathToNVVM.h
+26 -0  mlir/lib/Conversion/MathToNVVM/CMakeLists.txt
+14 -0  mlir/include/mlir/Conversion/Passes.td
+0 -6  mlir/include/mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h
+347 -223  (files shown; 3 files not shown)
+350 -223  (total, 9 files)

LLVM/project f1a9f1a llvm/test/CodeGen/AMDGPU amdgpu-simplify-libcall-pow.ll amdgpu-simplify-libcall-powr.ll

Revert "AMDGPU: Perform libcall recognition to replace fast OpenCL pow (#182135)"

This reverts commit fdc4274e2fcc79b8ba3064235da721573cdeea83.
Delta  File
+1,728 -4,143  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow.ll
+881 -1,244  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-powr.ll
+440 -633  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-rootn.ll
+0 -658  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-fast.ll
+0 -566  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown-fast.ll
+0 -487  llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-powr-fast.ll
+3,049 -7,731  (files shown; 6 files not shown)
+3,268 -8,541  (total, 12 files)

LLVM/project a8989c0 clang/unittests/Analysis/Scalable CMakeLists.txt, clang/unittests/Analysis/Scalable/Serialization JSONFormatTest.cpp

[clang][ssaf] Refactor `JSONFormatTest` into a directory with a shared fixture header (#182523)

This change converts `Serialization/JSONFormatTest.cpp` into a directory
to support reuse of the `JSONFormatTest` fixture by upcoming test files
for additional data structures with JSON serialization support. New test
files for other serializable data structures can now include
`JSONFormatTest.h`, inherit from `JSONFormatTest`, and add their own
fixture and tests without duplicating the filesystem scaffolding.
Delta  File
+0 -1,848  clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest.cpp
+1,736 -0  clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest/TUSummaryTest.cpp
+144 -0  clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest/JSONFormatTest.h
+1 -1  clang/unittests/Analysis/Scalable/CMakeLists.txt
+1,881 -1,849  (total, 4 files)

LLVM/project 23d1e36 llvm/lib/Target/AMDGPU AMDGPULibFunc.cpp

AMDGPU: Try to fix leak in AMDGPULibFunc

I don't know why this was trying to do placement new. I guess
this was overwriting the unique_ptr, bypassing its destructor.
Delta  File
+1 -1  llvm/lib/Target/AMDGPU/AMDGPULibFunc.cpp
+1 -1  (total, 1 file)

LLVM/project 0dd1cb0 clang/lib/CodeGen/TargetBuiltins ARM.cpp, clang/lib/Sema SemaARM.cpp

[clang][ARM] Refactor argument handling in `EmitAArch64BuiltinExpr` (2/2) (NFC) (#181974)

Refactor `EmitAArch64BuiltinExpr` so that all AArch64/NEON builtins
handled by this hook _and marked as overloaded_ share a common path
for generating LLVM IR arguments (collected into the `Ops`
`SmallVector<Value*>`) (*). This is a follow-up for #181794 - please
refer to that PR for more context.

As in the previous PR, the key change is implemented in
`HasExtraNeonArgument`, i.e. in the hook that identifies builtins with
the extra argument. In this PR, I am replacing the ad-hoc switch
statement with a more principled approach borrowed from SemaARM.cpp,
namely:
```cpp
static bool HasExtraNeonArgument(unsigned BuiltinID) {
  // (...)
  uint64_t mask = 0;
  switch (BuiltinID) {
  #define GET_NEON_OVERLOAD_CHECK

    [29 lines not shown]
Delta  File
+48 -254  clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+3 -1  clang/lib/Sema/SemaARM.cpp
+51 -255  (total, 2 files)

LLVM/project 11d7e13 clang/tools/c-index-test core_main.cpp

[c-index-test] Avoid loading a module input file when we need a file name only. (#182426)

Loading a module input file triggers its validation. Avoid this process
when we need only a file name.

rdar://167647519
Delta  File
+10 -5  clang/tools/c-index-test/core_main.cpp
+10 -5  (total, 1 file)

LLVM/project 741b2cd llvm/cmake/modules HandleLLVMOptions.cmake

Re-enable MSVC C4706 diagnostic; NFC (#182564)

From MSDN:

https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4706?view=msvc-170

> assignment used as a condition

This diagnostic was disabled as part of enabling /W4 in
5c73e1f85c5d37a5b037c70f3c112eec5646acb3, when there were hundreds of
instances of the diagnostic being triggered. However, local testing
suggests we now add the parentheses required to silence the
diagnostic, so I believe it can be enabled again.
Delta  File
+0 -1  llvm/cmake/modules/HandleLLVMOptions.cmake
+0 -1  (total, 1 file)

LLVM/project 2092145 libc/src/setjmp/arm longjmp.cpp setjmp.cpp

[libc] Save one instruction on ARM (#181515)

For assembly functions, making them as tight as possible is paramount.
Save sp in ARM mode, restore sp in ARM mode.
Delta  File
+17 -1  libc/src/setjmp/arm/longjmp.cpp
+16 -1  libc/src/setjmp/arm/setjmp.cpp
+33 -2  (total, 2 files)

LLVM/project 6767bfe lld/test/ELF/lto linker-script-symbols-ipo.ll, llvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp

CodeGen: Emit .prefalign directives based on the prefalign attribute.

The result of the MachineFunction preferred alignment query is emitted
as a .prefalign directive if supported, otherwise it gets combined into
the minimum alignment.

Part of this RFC:
https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

Reviewers: nikic, vitalybuka

Reviewed By: vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/155529
Delta  File
+27 -0  llvm/test/CodeGen/X86/prefalign.ll
+14 -2  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+2 -2  llvm/test/Transforms/SampleProfile/pseudo-probe-emit.ll
+1 -1  lld/test/ELF/lto/linker-script-symbols-ipo.ll
+44 -5  (total, 4 files)

LLVM/project a5fc887 llvm/utils/gn/secondary/libcxx/include BUILD.gn

[gn build] Port d7a24d30f62bc
Delta  File
+1 -0  llvm/utils/gn/secondary/libcxx/include/BUILD.gn
+1 -0  (total, 1 file)