LLVM/project cb1661bllvm/lib/Transforms/Vectorize VPlanUnroll.cpp, llvm/test/Transforms/LoopVectorize tail-folding-optimize-vector-induction-width.ll struct-return-replicate.ll

[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212)

This patch adds a new replicateReplicateRegionsByVF transform to
unroll replicate-regions by VF, dissolving them. The transform creates
VF copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.

The initial version skips regions with live-outs (VPPredInstPHIRecipe);
support for them will be added in follow-up patches.

Depends on https://github.com/llvm/llvm-project/pull/170053

PR: https://github.com/llvm/llvm-project/pull/170212
DeltaFile
+156-0llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+49-98llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+40-80llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
+43-41llvm/test/Transforms/LoopVectorize/VPlan/vplan-predicate-switch.ll
+9-18llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll
+6-14llvm/test/Transforms/LoopVectorize/struct-return-replicate.ll
+303-25154 files not shown
+420-53160 files

LLVM/project e849c68libcxx/utils/ci/docker linux-builder-base.dockerfile

[libc++] Install venv in the CI Docker image (#188825)

To support #165769
DeltaFile
+1-0libcxx/utils/ci/docker/linux-builder-base.dockerfile
+1-01 files

LLVM/project a4ce617clang/include/clang/AST ASTContext.h

[CUDA] Use SetVector for CUDADeviceVarODRUsedByHost for determinism (#188616)

This replaces DenseSet with SetVector to avoid non-deterministic
iteration order when emitting device variables ODR-used by host.
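
To illustrate why an insertion-ordered set gives deterministic output, here is a minimal sketch. `OrderedNameSet` and `uniqueInFirstUseOrder` are hypothetical stand-ins written with the standard library only; they are not the LLVM `SetVector` implementation and not the code touched by this commit.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set: a hash set enforces
// uniqueness, while a vector records the order of first insertion.
class OrderedNameSet {
public:
  // Returns true if Name was not present yet.
  bool insert(const std::string &Name) {
    if (!Seen.insert(Name).second)
      return false;
    Order.push_back(Name);
    return true;
  }

  // Iteration walks the vector, so it always follows insertion order
  // instead of the hash table's bucket order.
  const std::vector<std::string> &ordered() const { return Order; }

private:
  std::unordered_set<std::string> Seen;
  std::vector<std::string> Order;
};

// Deduplicates Uses while keeping first-use order (illustrative helper).
std::vector<std::string>
uniqueInFirstUseOrder(const std::vector<std::string> &Uses) {
  OrderedNameSet Set;
  for (const std::string &U : Uses)
    Set.insert(U);
  assert(Set.ordered().size() <= Uses.size());
  return Set.ordered();
}
```

Emitting from `ordered()` yields the same sequence on every run for the same input, which is the property the commit message is after.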
DeltaFile
+1-1clang/include/clang/AST/ASTContext.h
+1-11 files

LLVM/project 1aefe3boffload/test/offloading strided_offset_multidim_update.c

[offload][L0] Remove XFAIL from XPASSING test strided_offset_multidim_update.c (#188836)

Passing now I guess

https://lab.llvm.org/buildbot/#/builders/225/builds/4729

```
********************
Unexpectedly Passed Tests (1):
  libomptarget :: spirv64-intel :: offloading/strided_offset_multidim_update.c
```

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
DeltaFile
+0-1offload/test/offloading/strided_offset_multidim_update.c
+0-11 files

LLVM/project 80baf85mlir/include/mlir/Interfaces ValueBoundsOpInterface.h, mlir/lib/Analysis FlatLinearValueConstraints.cpp

[mlir] Bump SmallVector sizes along hot paths (#188827)

This is based on empirical data from compiling 9 medium to large
language and diffusion models with IREE. e2e, this improves compilation
times by 0.33% in terms of `instructions:u` (same metric is used by the
[CTMark for
Clang](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#compile-time-improvements)).

I explored using other constants and these are the ones that performed
best while keeping the sizes relatively small.
DeltaFile
+8-4mlir/lib/IR/AffineMap.cpp
+6-2mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
+6-2mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
+4-3mlir/lib/Transforms/Utils/DialectConversion.cpp
+3-1mlir/lib/Interfaces/InferIntRangeInterface.cpp
+3-1mlir/lib/Analysis/FlatLinearValueConstraints.cpp
+30-133 files not shown
+37-169 files

LLVM/project ecfcdd6libc/cmake/modules LLVMLibCTestRules.cmake, libc/test CMakeLists.txt

[libc] Fix check-libc-lit running tests during build (#188081)

Updated check-libc-lit to depend only on build-only targets. Added
libc-integration-tests-build to track integration test executables and
updated LLVMLibCTestRules.cmake to populate it.

Removed incorrect dependencies on execution suites in include and
integration tests that were introduced in #184366.
DeltaFile
+29-7libc/cmake/modules/LLVMLibCTestRules.cmake
+3-1libc/test/CMakeLists.txt
+1-1libc/test/include/CMakeLists.txt
+1-1libc/test/integration/CMakeLists.txt
+34-104 files

LLVM/project 5aae014llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 vector-loop-backedge-elimination-epilogue.ll epilogue-vectorization-fix-scalar-resume-values.ll

[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135)

When not folding the tail the minimum iteration count check ensures that
the vector loop is not executed if computing the trip count wraps around
to zero, as the trip count must be at least VF when vectorizing without
tail-folding.

Add and use a new tryToRefineConstantMaxTripCount helper. This ensures
we do not create dead main loops when vectorizing the epilogue, as we
choose smaller main VFs.

PR: https://github.com/llvm/llvm-project/pull/188135
DeltaFile
+66-88llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-ordered-reduction.ll
+36-3llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+4-25llvm/test/Transforms/LoopVectorize/AArch64/vector-loop-backedge-elimination-epilogue.ll
+6-22llvm/test/Transforms/LoopVectorize/AArch64/epilogue-vectorization-fix-scalar-resume-values.ll
+112-1384 files

LLVM/project 120c4cfllvm/lib/Target/X86 X86ISelLowering.cpp

[X86] Remove custom widening legalization of vector udiv/sdiv/urem/srem. (#188786)

This custom legalization was preserving splat values in widened
build_vector to allow the div by constant optimization to work.

We now allow division by constant optimization on narrow vector types
before type legalization so we no longer need this.
DeltaFile
+0-29llvm/lib/Target/X86/X86ISelLowering.cpp
+0-291 files

LLVM/project 14321cclldb/source/Host/common File.cpp, lldb/unittests/Host FileTest.cpp

[lldb] Fix missing return in NativeFile::SeekFromEnd stream path (#188596)

The stream path in NativeFile::SeekFromEnd was missing a `return result`
statement after the fseek block, causing it to fall through to the error
handler which overwrites the error status with "invalid file handle"
even on success. Both SeekFromStart and SeekFromCurrent correctly return
after their stream blocks.

While there are no active callers of this function, it is still worth fixing.
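
The fall-through described above can be sketched as follows. `FakeStream`, `SeekResult` and both functions are hypothetical stand-ins for illustration; the real `NativeFile::SeekFromEnd` uses `fseek` and reports errors through a different interface.

```cpp
// Hypothetical stand-ins: a fake stream and a result that carries an error
// message (nullptr means success). The real NativeFile API differs.
struct FakeStream {
  long Size;
};

struct SeekResult {
  long Offset;
  const char *Error;
};

// Buggy shape: the stream block computes a result but never returns it.
constexpr SeekResult seekFromEndBuggy(const FakeStream *Stream, long Offset) {
  SeekResult Result{-1, nullptr};
  if (Stream) {
    Result.Offset = Stream->Size + Offset; // stands in for the fseek block
    // Missing `return Result;`: control falls through to the error handler.
  }
  Result.Error = "invalid file handle";
  return Result;
}

// Fixed shape: return right after the stream block.
constexpr SeekResult seekFromEndFixed(const FakeStream *Stream, long Offset) {
  SeekResult Result{-1, nullptr};
  if (Stream) {
    Result.Offset = Stream->Size + Offset;
    return Result; // success is now reported as success
  }
  Result.Error = "invalid file handle";
  return Result;
}
```

In the buggy variant a valid stream still ends up with the "invalid file handle" error, even though the offset was computed correctly.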
DeltaFile
+30-0lldb/unittests/Host/FileTest.cpp
+1-0lldb/source/Host/common/File.cpp
+31-02 files

LLVM/project 48e9c76llvm/lib/CodeGen/AsmPrinter CodeViewDebug.cpp CodeViewDebug.h, llvm/lib/IR DebugInfoMetadata.cpp

Revert "[CodeView] Generate `S_DEFRANGE_REGISTER_REL_INDIR` (#187709)" (#188833)

This reverts commit 08a4085. The change breaks `nvro.cpp` in the
debugging tests on the buildbot
(https://lab.llvm.org/buildbot/#/builders/46/builds/32873) but works
locally for me. It might be because the buildbot is using an older
Windows SDK.

In addition, it reverts parts of #188769 (using `.` over `->`).
DeltaFile
+0-212llvm/test/DebugInfo/COFF/indirect-local.ll
+66-56llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
+40-29llvm/test/DebugInfo/COFF/types-array-advanced.ll
+19-38llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.h
+1-34llvm/lib/MC/MCParser/AsmParser.cpp
+7-16llvm/lib/IR/DebugInfoMetadata.cpp
+133-3858 files not shown
+149-44314 files

LLVM/project 13f1fd0llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[TargetLowering] Remove AllowTruncation from matchUnaryPredicate in BuildExactSDIV/BuildExactUDIV. (#188785)

After #187378 these are no longer tested. I'm concerned that we can
create illegal scalar types after type legalization. I don't know how to
test this now so I'd like to remove support until it is needed and can
be tested.
DeltaFile
+2-4llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+2-41 files

LLVM/project 86475aflldb/source/Host/common FileCache.cpp

[lldb] Fix incorrect return value on error paths in FileCache (#188608)

WriteFile and ReadFile return uint64_t with UINT64_MAX as the error
sentinel, but two error paths incorrectly returned false (0), which
could be mistaken for a successful zero-byte operation.
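
A minimal sketch of why `return false` is a problem here, assuming simplified hypothetical signatures (`writeBuggy`, `writeFixed` and `failed` are illustrative names, not the FileCache API):

```cpp
#include <cstdint>

// Error sentinel named in the commit message.
constexpr std::uint64_t kErrorSentinel = UINT64_MAX;

// Buggy shape: `false` converts implicitly to 0, which looks like a
// successful operation on zero bytes.
constexpr std::uint64_t writeBuggy(bool HaveFile, std::uint64_t NumBytes) {
  if (!HaveFile)
    return false; // becomes 0, not the sentinel
  return NumBytes;
}

// Fixed shape: the error path returns the documented sentinel.
constexpr std::uint64_t writeFixed(bool HaveFile, std::uint64_t NumBytes) {
  if (!HaveFile)
    return kErrorSentinel;
  return NumBytes;
}

// How a caller would detect failure.
constexpr bool failed(std::uint64_t Result) { return Result == kErrorSentinel; }
```

With the buggy variant, `failed(writeBuggy(false, 16))` is false, so the caller cannot tell the error apart from a zero-byte write.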
DeltaFile
+2-2lldb/source/Host/common/FileCache.cpp
+2-21 files

LLVM/project 1759b81llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 copyable-operands-reordering.ll

[SLP]Improve analysis of copyable operands for commutative main instruction

For commutative copyables, instruction operands are always placed on the
LHS and all other operands on the RHS. But if an instruction is the main
instruction, has two instruction operands, and its RHS operand is more
compatible with the LHS operands than its LHS operand is, these operands
need to be swapped for better analysis.

Reviewers: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/185320
DeltaFile
+18-20llvm/test/Transforms/SLPVectorizer/X86/copyable-operands-reordering.ll
+18-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+36-202 files

LLVM/project 968b6aalibc/utils/hdrgen/hdrgen header.py, libc/utils/hdrgen/tests test_integration.py

[libc][hdrgen] Print __BEGIN_C_DECLS / __END_C_DECLS conditionally. (#188830)

Clean up the `%public_api` printer code slightly - get rid of explicit
`\n` and ensure we only print `__BEGIN_C_DECLS` and `__END_C_DECLS` if
the generated header actually contains functions or objects to declare.

I've noticed that after 27ba9e2a44c11f8123528c350227db2c9a707c8f landed,
the generated errno.h header has two blocks of `__BEGIN_C_DECLS` /
`__END_C_DECLS`: an empty one was generated automatically from the
`%public_api` section, which was intended to only add the `errno_t` type
declaration.
DeltaFile
+16-0libc/utils/hdrgen/tests/expected_output/macro_only.h
+10-2libc/utils/hdrgen/hdrgen/header.py
+8-1libc/utils/hdrgen/tests/test_integration.py
+6-0libc/utils/hdrgen/tests/input/macro_only.yaml
+40-34 files

LLVM/project 049700fclang/test/Format dont-crash-on-nul.cpp, clang/tools/clang-format ClangFormat.cpp

[clang-format] Don't crash on an input with a NUL char (#188631)

In dry-run mode we copied the memory buffer, but the copy only read up to
the first NUL char. Since we exit directly afterwards, we can instead move
the buffer into the check and retain the size information.
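
The loss of size information can be shown with plain `std::string`; this is an analogy under stated assumptions, not the clang-format code, and `copyUntilNul` / `copyWithSize` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies through a C-string view: the length is found by scanning for NUL,
// so everything after the first NUL byte is silently dropped.
std::string copyUntilNul(const char *Data) {
  assert(Data != nullptr);
  return std::string(Data);
}

// Copies with an explicit size: embedded NUL bytes are preserved.
std::string copyWithSize(const char *Data, std::size_t Size) {
  assert(Data != nullptr);
  return std::string(Data, Size);
}
```

For a 13-byte input with a NUL at index 6, the first helper returns 6 bytes and the second returns all 13, so any later offset computed against the original size is only valid for the second copy.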

Fixes https://github.com/llvm/llvm-project/issues/188500
DeltaFile
+5-6clang/tools/clang-format/ClangFormat.cpp
+0-0clang/test/Format/dont-crash-on-nul.cpp
+5-62 files

LLVM/project 2a3e8dbllvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.h AMDGPUCoExecSchedStrategy.cpp

Minor changes

Change-Id: I9d877e83c003bf72726fe49715d4472ecad51fec
DeltaFile
+5-4llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+4-4llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+9-82 files

LLVM/project 6b3556acompiler-rt/lib/scudo/standalone common.h

[scudo] Add Last entry to ReleaseToOS enum. (#188645)

This allows static asserts to be set in tracing code that might use the
ReleaseToOS values as indexes.

This would have caused a compile failure instead of a runtime crash when
I added the use of a new ReleaseToOS value.
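
A sketch of the kind of static assert a `Last` entry enables. The enumerator names other than `Last`, the `ReleaseNames` table and `releaseName` are assumptions made for illustration; they are not taken from scudo's tracing code.

```cpp
#include <cstddef>

// Hypothetical enum modeled on the commit description; only `Last` is
// confirmed by the commit message.
enum class ReleaseToOS : unsigned char { Normal, Force, ForceAll, Last };

// Hypothetical tracing table indexed by ReleaseToOS values.
constexpr const char *ReleaseNames[] = {"normal", "force", "force-all"};

constexpr std::size_t NumReleaseNames =
    sizeof(ReleaseNames) / sizeof(ReleaseNames[0]);

// Adding an enumerator before Last without extending the table now fails
// to compile instead of indexing out of bounds at runtime.
static_assert(NumReleaseNames == static_cast<std::size_t>(ReleaseToOS::Last),
              "ReleaseNames must cover every ReleaseToOS value");

constexpr const char *releaseName(ReleaseToOS R) {
  return ReleaseNames[static_cast<std::size_t>(R)];
}
```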
DeltaFile
+1-0compiler-rt/lib/scudo/standalone/common.h
+1-01 files

LLVM/project 122fb43libclc/clc/include/clc clc_target_defines.h, libclc/clc/include/clc/collective clc_work_group_scan_decl.inc clc_work_group_scan.h

libclc: Add work group scan functions
DeltaFile
+157-0libclc/clc/lib/generic/collective/clc_work_group_scan.inc
+41-0libclc/opencl/lib/generic/collective/work_group_scan.inc
+38-0libclc/clc/lib/generic/collective/clc_work_group_scan.cl
+25-0libclc/clc/include/clc/collective/clc_work_group_scan_decl.inc
+23-0libclc/clc/include/clc/clc_target_defines.h
+20-0libclc/clc/include/clc/collective/clc_work_group_scan.h
+304-03 files not shown
+321-09 files

LLVM/project b20e36blibclc/clc/lib/amdgpu/subgroup clc_amdgpu_ds_swizzle.inc clc_sub_group_scan.cl

Shrink ds_swizzle wrappers
DeltaFile
+23-33libclc/clc/lib/amdgpu/subgroup/clc_amdgpu_ds_swizzle.inc
+0-33libclc/clc/lib/amdgpu/subgroup/clc_sub_group_scan.cl
+23-662 files

LLVM/project d39a7celibclc/clc/include/clc/subgroup clc_sub_group_scan.inc, libclc/clc/lib/amdgpu/subgroup clc_sub_group_scan.cl clc_amdgpu_ds_swizzle.inc

libclc: Add subgroup scan functions

Add the base implementation using ds_swizzle which should work
on all subtargets. There are at least 2 more paths available for
newer targets.
DeltaFile
+133-0libclc/clc/lib/amdgpu/subgroup/clc_sub_group_scan.cl
+87-0libclc/clc/lib/amdgpu/subgroup/clc_amdgpu_ds_swizzle.inc
+83-0libclc/clc/lib/amdgpu/subgroup/clc_sub_group_scan.inc
+28-0libclc/opencl/lib/generic/subgroup/sub_group_scan_inclusive.inc
+28-0libclc/opencl/lib/generic/subgroup/sub_group_scan_exclusive.inc
+27-0libclc/clc/include/clc/subgroup/clc_sub_group_scan.inc
+386-06 files not shown
+441-012 files

LLVM/project e9cb778clang/test/Sema aix-pragma-pack-and-align.c ppc-pair-mma-types.c, clang/test/Sema/PowerPC aix-pragma-pack-and-align.c ppc-pair-mma-types.c

[NFC] Move PowerPC sema tests to test/Sema/PowerPC subdir (#188639)
DeltaFile
+229-0clang/test/Sema/PowerPC/aix-pragma-pack-and-align.c
+0-229clang/test/Sema/aix-pragma-pack-and-align.c
+0-223clang/test/Sema/ppc-pair-mma-types.c
+223-0clang/test/Sema/PowerPC/ppc-pair-mma-types.c
+0-189clang/test/Sema/ppc-dmf-types.c
+189-0clang/test/Sema/PowerPC/ppc-dmf-types.c
+641-64150 files not shown
+1,652-1,65256 files

LLVM/project 5499de2llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp

Use reference

Change-Id: Idd36fcb1752b8faf543b8d9384b1232de0a166b7
DeltaFile
+2-2llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+2-21 files

LLVM/project a9c6f38libc/include/llvm-libc-types Elf32_Phdr.h Elf64_Off.h, libc/src/__support frac128.h

[libc] Fixes all guard comments of libc (#188701)

This PR intends to fix ALL the wrong guard comments for libc


Script used:
[guard_checker](https://github.com/Sukumarsawant/guard_checker/blob/main/check_headers.py)
DeltaFile
+2-2libc/test/src/math/performance_testing/Timer.h
+1-1libc/include/llvm-libc-types/Elf32_Phdr.h
+1-1libc/include/llvm-libc-types/Elf64_Off.h
+1-1libc/src/__support/frac128.h
+1-1libc/src/__support/math/exp10_float16_constants.h
+1-1libc/src/__support/math/fmabf16.h
+7-716 files not shown
+23-2322 files

LLVM/project f2829b9clang Maintainers.md

Stepping up as clang-format maintainer (#188602)
DeltaFile
+3-0clang/Maintainers.md
+3-01 files

LLVM/project 0383cd0llvm/lib/Transforms/InstCombine InstCombineCompares.cpp, llvm/test/Transforms/InstCombine fcmp.ll

[InstCombine] Fold `fcmp (C - [su]itofp X), C` to integer compares (#185826)

Recognize `fcmp pred (C - [su]itofp X), C` in InstCombine and fold it to
`fcmp swap(pred) [su]itofp X, 0` for certain constants `C` (to make sure
`C - Y` never rounds back to `C`), so that the new pattern can then be
folded by `foldFCmpIntToFPConst` to integer compares.
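
As a rough numeric check of the fold, the sketch below compares the three forms for the `olt` predicate with a signed conversion. The function names are hypothetical, the constants are chosen for illustration only, and this does not reproduce the exact conditions on `C` that the patch checks.

```cpp
#include <cstdint>

// Original form: fcmp olt (C - sitofp X), C.
constexpr bool originalCompare(float C, std::int32_t X) {
  const float Diff = C - static_cast<float>(X); // rounded to float precision
  return Diff < C;
}

// After the first fold: fcmp ogt (sitofp X), 0 (olt with swapped operands).
constexpr bool foldedCompare(std::int32_t X) {
  return static_cast<float>(X) > 0.0f;
}

// After the follow-up fold to an integer compare: icmp sgt X, 0.
constexpr bool integerCompare(std::int32_t X) { return X > 0; }

// Checks that all three forms agree for every X in [Lo, Hi].
constexpr bool formsAgree(float C, std::int32_t Lo, std::int32_t Hi) {
  for (std::int32_t X = Lo; X <= Hi; ++X)
    if (originalCompare(C, X) != foldedCompare(X) ||
        foldedCompare(X) != integerCompare(X))
      return false;
  return true;
}
```

With `C = 1.0f` the forms agree for every `X` in `[-1000, 1000]`. With `C = 1e10f` and `X = 1` they do not: `1e10f - 1.0f` rounds back to `1e10f`, so the original compare is false while the folded one is true, which is why the constant has to be restricted.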

Fixes #185561
alive2: https://alive2.llvm.org/ce/z/9dWsCb
alive2 with constant constraints (needs local alive2 build):
https://alive2.llvm.org/ce/z/wDs9Tj

I tried generalizing the pattern to any `fcmp pred (C - Y), C`, but
alive2 rejects it: https://alive2.llvm.org/ce/z/qMLGah. So I will try to
find more constraints on C and Y that make this rewrite hold in future
PRs.
DeltaFile
+139-0llvm/test/Transforms/InstCombine/fcmp.ll
+30-0llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+169-02 files

LLVM/project 423f410llvm/lib/Transforms/Instrumentation MemorySanitizer.cpp, llvm/test/Instrumentation/MemorySanitizer/AArch64 aarch64-matmul.ll aarch64-bf16-dotprod-intrinsics.ll

[msan] Micro-optimize NEON matrix-multiply instrumentation (#188815)

Replace Or(SExt(),SExt()) with the equivalent SExt(Or()).
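
The equivalence can be spot-checked on plain integers. This sketch assumes an 8-bit to 32-bit extension purely for illustration; the widths used by the MSan instrumentation may differ, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Sign-extends an 8-bit value to 32 bits.
constexpr std::int32_t sext(std::int8_t V) {
  return static_cast<std::int32_t>(V);
}

// Or(SExt(A), SExt(B)): two extensions, then one OR on the wide type.
constexpr std::int32_t orOfSExt(std::int8_t A, std::int8_t B) {
  return sext(A) | sext(B);
}

// SExt(Or(A, B)): one OR on the narrow type, then a single extension.
constexpr std::int32_t sextOfOr(std::int8_t A, std::int8_t B) {
  return sext(static_cast<std::int8_t>(A | B));
}
```

Both forms produce the same bits because the sign bit of `A | B` is the OR of the two sign bits, so the extended upper bits match; the second form simply needs one extension instead of two.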
DeltaFile
+6-9llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-matmul.ll
+4-5llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
+2-3llvm/test/Instrumentation/MemorySanitizer/AArch64/aarch64-bf16-dotprod-intrinsics.ll
+12-173 files

LLVM/project 5b8c175libcxx/include __hash_table, libcxx/test/extensions/gnu/hash_map copy.pass.cpp

[libc++] Add another const_cast to support hash_map copy assignment

There was one more const_cast needed after #183223 without which
copy assignment of hash_map was broken. Add it, together with a copy
assignment test.

Reviewers: ldionne

Pull Request: https://github.com/llvm/llvm-project/pull/188660
DeltaFile
+12-4libcxx/include/__hash_table
+5-1libcxx/test/extensions/gnu/hash_map/copy.pass.cpp
+17-52 files

LLVM/project f8fe67cllvm/lib/Transforms/Vectorize VPlan.cpp VPlanUtils.h

[VPlan] Expose cloneFrom and mergeBlocksIntoPredecessors. (NFC) (#188818)

Move cloneFrom from a file-static function in VPlan.cpp to a public
static method VPBlockUtils::cloneFrom, and move
mergeBlocksIntoPredecessors from a file-static function in
VPlanTransforms.cpp to a public static method
VPlanTransforms::mergeBlocksIntoPredecessors.

This is in preparation for dissolving replicate regions which needs both
utilities.

Split off from approved
https://github.com/llvm/llvm-project/pull/170212.

PR: https://github.com/llvm/llvm-project/pull/188818
DeltaFile
+4-10llvm/lib/Transforms/Vectorize/VPlan.cpp
+7-0llvm/lib/Transforms/Vectorize/VPlanUtils.h
+1-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+4-0llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+16-134 files

LLVM/project 0f963cbllvm/lib/Transforms/Vectorize VPlanUnroll.cpp

[VPlan] Extract addLaneToStartIndex helper from cloneForLane. (NFC) (#188819)

Factor out the logic for adding a lane offset to a
VPScalarIVStepsRecipe's start index into a standalone
addLaneToStartIndex helper function. This makes the logic reusable for
dissolving replicate regions.

PR: https://github.com/llvm/llvm-project/pull/188819
DeltaFile
+38-32llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+38-321 files

LLVM/project 09951fdclang/lib/CodeGen CGDebugInfo.cpp, clang/test/CodeGenHLSL/debug source-language.hlsl

Revert "[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info" (#188771)

Reverts llvm/llvm-project#187051

Breaks some OpenMP offload tests
DeltaFile
+0-34clang/test/CodeGenHLSL/debug/source-language.hlsl
+0-32llvm/test/CodeGen/SPIRV/debug-info/hlsl-debug-info-auto-activation.ll
+5-6llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
+3-5llvm/docs/SPIRVUsage.rst
+2-6clang/lib/CodeGen/CGDebugInfo.cpp
+2-2llvm/test/CodeGen/SPIRV/debug-info/debug-compilation-unit.ll
+12-854 files not shown
+15-9010 files