[VPlan] Explicitly unroll replicate-regions without live-outs by VF. (#170212)
This patch adds a new replicateReplicateRegionsByVF transform to
unroll replicate-regions by VF, dissolving them. The transform creates
VF copies of the replicate-region's content, connects them and converts
recipes to single-scalar variants for the corresponding lanes.
The initial version skips regions with live-outs (VPPredInstPHIRecipe);
support for those will be added in follow-up patches.
Depends on https://github.com/llvm/llvm-project/pull/170053
PR: https://github.com/llvm/llvm-project/pull/170212
[CUDA] Use SetVector for CUDADeviceVarODRUsedByHost for determinism (#188616)
This replaces DenseSet with SetVector to avoid non-deterministic
iteration order when emitting device variables ODR-used by host.
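DenseSet iterates in hash order. For pointer keys that order depends on addresses, which can differ from run to run. The sketch below shows the idea behind SetVector: pair a hash set for membership with a vector that records first-insertion order, and iterate the vector. `InsertionOrderedSet` is a hypothetical name, and this is not LLVM's SetVector implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set in the spirit of llvm::SetVector (a sketch,
// not LLVM's code): the hash set answers "seen before?", the vector keeps the
// order in which elements were first inserted.
template <typename T> class InsertionOrderedSet {
public:
  // Returns true if V was newly inserted, false if it was already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Iteration follows insertion order, so it does not depend on hash values
  // or on addresses that may change between runs.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }

private:
  std::unordered_set<T> Seen;
  std::vector<T> Order;
};
```

Inserting "gvar_b" then "gvar_a" always iterates in that order, whatever their hashes are.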
[mlir] Bump SmallVector sizes along hot paths (#188827)
This is based on empirical data from compiling 9 medium to large
language and diffusion models with IREE. End-to-end, this improves compilation
times by 0.33% in terms of `instructions:u` (same metric is used by the
[CTMark for
Clang](https://www.npopov.com/2024/01/01/This-year-in-LLVM-2023.html#compile-time-improvements)).
I explored using other constants and these are the ones that performed
best while keeping the sizes relatively small.
[libc] Fix check-libc-lit running tests during build (#188081)
Updated check-libc-lit to depend only on build-only targets. Added
libc-integration-tests-build to track integration test executables and
updated LLVMLibCTestRules.cmake to populate it.
Removed incorrect dependencies on execution suites in include and
integration tests that were introduced in #184366.
[LV] Refine tripcount estimate using minimum iteration count rt check. (#188135)
When not folding the tail, the minimum iteration count check ensures that
the vector loop is not executed if computing the trip count wraps around
to zero, as the trip count must be at least VF when vectorizing without
tail-folding.
Add and use a new tryToRefineConstantMaxTripCount helper. This ensures
we do not create dead main loops when vectorizing the epilogue, as we
choose smaller main VFs.
PR: https://github.com/llvm/llvm-project/pull/188135
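The wraparound argument is easier to see with concrete numbers. The sketch below uses a hypothetical 8-bit induction variable. The trip count is the backedge-taken count plus one in the IV's width, so a backedge-taken count of 255 wraps to a trip count of 0. The minimum iteration count check (trip count >= VF) then bypasses the vector loop, so whenever the vector loop runs, the trip count did not wrap. The helper names are hypothetical, and this is not the LoopVectorize code or `tryToRefineConstantMaxTripCount` itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit IV: trip count = backedge-taken count + 1, computed in
// the IV's width, so 255 + 1 wraps around to 0.
uint8_t tripCount(uint8_t BackedgeTakenCount) {
  return static_cast<uint8_t>(BackedgeTakenCount + 1);
}

// Minimum iteration count check without tail folding: the vector loop is
// entered only if the (possibly wrapped) trip count is at least VF.
bool entersVectorLoop(uint8_t TripCount, unsigned VF) {
  return TripCount >= VF;
}
```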
[X86] Remove custom widening legalization of vector udiv/sdiv/urem/srem. (#188786)
This custom legalization was preserving splat values in widened
build_vector to allow the div by constant optimization to work.
We now allow division by constant optimization on narrow vector types
before type legalization so we no longer need this.
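For reference, the "division by constant optimization" replaces a divide by a constant with a multiply-high and a shift. With a splat divisor, each vector lane uses the same arithmetic. The sketch below shows only the scalar arithmetic for an unsigned 32-bit divide by 3, where the magic constant is 0xAAAAAAAB = ceil(2^33 / 3). It is not the X86 or SelectionDAG lowering code, and `udiv3` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit X / 3 without a divide: multiply by ceil(2^33 / 3) in 64
// bits, then keep the bits above position 33.
uint32_t udiv3(uint32_t X) {
  uint64_t Product = static_cast<uint64_t>(X) * 0xAAAAAAABu;
  return static_cast<uint32_t>(Product >> 33);
}
```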
[lldb] Fix missing return in NativeFile::SeekFromEnd stream path (#188596)
The stream path in NativeFile::SeekFromEnd was missing a `return result`
statement after the fseek block, causing it to fall through to the error
handler, which overwrites the error status with "invalid file handle"
even on success. Both SeekFromStart and SeekFromCurrent correctly return
after their stream blocks.
While there are no active callers of this function, it is still worth fixing.
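The sketch below is a hypothetical, simplified model of the bug pattern, not LLDB's NativeFile code. A seek helper has a stream path and reports failure through a shared handler at the end. Without the `return` at the end of the stream block, a successful fseek would fall through and be reported as an invalid handle. `SeekFromEndSketch` and `SeekStatus` are invented for illustration.

```cpp
#include <cassert>
#include <cstdio>

enum class SeekStatus { Success, SeekFailed, InvalidHandle };

// Hypothetical model: stream path first, shared error handler last.
long SeekFromEndSketch(std::FILE *Stream, long Offset, SeekStatus &Status) {
  if (Stream) {
    if (std::fseek(Stream, Offset, SEEK_END) != 0) {
      Status = SeekStatus::SeekFailed;
      return -1;
    }
    Status = SeekStatus::Success;
    // The statement the fix adds: without it, control falls through to the
    // handler below and a successful seek is reported as "invalid handle".
    return std::ftell(Stream);
  }
  Status = SeekStatus::InvalidHandle;
  return -1;
}
```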
Revert "[CodeView] Generate `S_DEFRANGE_REGISTER_REL_INDIR` (#187709)" (#188833)
This reverts commit 08a4085. The change breaks `nvro.cpp` in the
debugging tests on the buildbot
(https://lab.llvm.org/buildbot/#/builders/46/builds/32873) but works
locally for me. It might be because the buildbot is using an older
Windows SDK.
In addition, it reverts parts of #188769 (using `.` over `->`).
[TargetLowering] Remove AllowTruncation from matchUnaryPredicate in BuildExactSDIV/BuildExactUDIV. (#188785)
After #187378 these are no longer tested. I'm concerned that we can
create illegal scalar types after type legalization. I don't know how to
test this now so I'd like to remove support until it is needed and can
be tested.
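BuildExactSDIV/BuildExactUDIV lower a division that is known to be exact into a shift by the divisor's trailing zeros followed by a multiply with the multiplicative inverse of its odd part. No multiply-high is needed. The sketch below shows that arithmetic for unsigned 32-bit values, computing the inverse with Newton's iteration. It is a standalone sketch, not the TargetLowering code. `inverseMod2_32` and `exactUDiv` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd D modulo 2^32. Inv = D is correct in the low 3 bits
// (D * D == 1 mod 8 for odd D), and each step Inv *= 2 - D * Inv doubles the
// number of correct low bits: 3 -> 6 -> 12 -> 24 -> 48 >= 32.
uint32_t inverseMod2_32(uint32_t D) {
  assert((D & 1u) != 0 && "only odd values are invertible mod 2^32");
  uint32_t Inv = D;
  for (int I = 0; I < 4; ++I)
    Inv *= 2u - D * Inv;
  return Inv;
}

// Exact unsigned division (X is a multiple of D): shift out D's factors of
// two, then multiply by the inverse of the remaining odd part.
uint32_t exactUDiv(uint32_t X, uint32_t D) {
  assert(D != 0 && X % D == 0 && "exact division requires D to divide X");
  unsigned Shift = 0;
  while ((D & 1u) == 0) {
    D >>= 1;
    ++Shift;
  }
  return (X >> Shift) * inverseMod2_32(D);
}
```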
[lldb] Fix incorrect return value on error paths in FileCache (#188608)
WriteFile and ReadFile return uint64_t with UINT64_MAX as the error
sentinel, but two error paths incorrectly returned false (0), which
could be mistaken for a successful zero-byte operation.
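This bug compiles silently because `bool` converts implicitly to `uint64_t`, so `return false;` yields 0. That value is indistinguishable from a successful zero-byte read. The sketch below is hypothetical, not LLDB's FileCache code, and contrasts the two error paths.

```cpp
#include <cassert>
#include <cstdint>

// Failure sentinel; every other return value is a byte count.
constexpr uint64_t kFileCacheError = UINT64_MAX;

// Hypothetical read with the correct error path.
uint64_t readSketch(bool HandleValid, uint64_t BytesRead) {
  if (!HandleValid)
    return kFileCacheError;  // Callers compare against the sentinel.
  return BytesRead;          // 0 is a legitimate "read zero bytes" result.
}

// The bug pattern: `false` converts to 0, so the error looks like an empty
// but successful read.
uint64_t buggyReadSketch(bool HandleValid, uint64_t BytesRead) {
  if (!HandleValid)
    return false;
  return BytesRead;
}
```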
[SLP]Improve analysis of copyable operands for commutative main instruction
For commutative copyables, the instruction operands always go to the LHS
and the other operands to the RHS. However, if the main instruction has
two instruction operands and its RHS operand is more compatible with the
LHS operands than its LHS operand is, these operands need to be swapped
for better analysis.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/185320
[libc][hdrgen] Print __BEGIN_C_DECLS / __END_C_DECLS conditionally. (#188830)
Clean up the `%public_api` printer code slightly - get rid of explicit
`\n` and ensure we only print `__BEGIN_C_DECLS` and `__END_C_DECLS` if
the generated header actually contains functions or objects to declare.
I've noticed that after 27ba9e2a44c11f8123528c350227db2c9a707c8f landed,
the generated errno.h header has two blocks of `__BEGIN_C_DECLS` /
`__END_C_DECLS`: an empty one was generated automatically from the
`%public_api` section that was intended to only add the `errno_t` type
declaration.
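The emission rule can be sketched in a few lines. The real hdrgen is a Python tool, so this C++ version and the name `emitPublicApi` are hypothetical. Type declarations are emitted unconditionally, but the `__BEGIN_C_DECLS` / `__END_C_DECLS` pair is printed only when there is at least one function or object to declare.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the %public_api printer rule: types always,
// C-linkage block only when there are function/object declarations.
std::string emitPublicApi(const std::vector<std::string> &Types,
                          const std::vector<std::string> &Decls) {
  std::string Out;
  for (const std::string &T : Types)
    Out += T + "\n";
  if (Decls.empty())
    return Out;  // Avoids an empty __BEGIN_C_DECLS / __END_C_DECLS pair.
  Out += "\n__BEGIN_C_DECLS\n\n";
  for (const std::string &D : Decls)
    Out += D + "\n";
  Out += "\n__END_C_DECLS\n";
  return Out;
}
```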
[clang-format] Don't crash on an input with a NUL char (#188631)
In dry-run mode we copied the memory buffer, but the copy only read up to
the first NUL char. Since we exit directly afterwards, we can instead move
the buffer into the check and retain the size information.
Fixes https://github.com/llvm/llvm-project/issues/188500
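The underlying pitfall is generic. Copying through a NUL-terminated C string stops at the first NUL byte, while a copy that carries an explicit length keeps every byte. The sketch below illustrates this with `std::string` rather than LLVM's buffer types. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Input that legitimately contains an embedded NUL byte.
const char kInput[] = "int a;\0int b;";
constexpr std::size_t kInputSize = sizeof(kInput) - 1;  // Drop the final terminator.

// Treating the data as a C string stops at the first NUL ...
std::string copyAsCString(const char *Data) { return std::string(Data); }

// ... while passing the size along preserves the whole buffer.
std::string copyWithSize(const char *Data, std::size_t Size) {
  return std::string(Data, Size);
}
```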
[scudo] Add Last entry to ReleaseToOS enum. (#188645)
This allows static asserts to be set in tracing code that might use the
ReleaseToOS values as indexes.
This would have caused a compile failure instead of a runtime crash when
I added the use of a new ReleaseToOS value.
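A trailing `Last` enumerator lets a table indexed by enum value be checked at compile time. The sketch below is hypothetical: the enumerators are modeled on the description, and the real scudo enum and tracing code may differ. The names table deduces its length from its initializer. Adding a new value before `Last` without extending the table therefore trips the `static_assert` rather than indexing past the end at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enum modeled on the description; Last counts the real values.
enum class ReleaseToOS : uint8_t { Normal, Force, ForceAll, Last };

constexpr std::size_t kNumReleaseModes =
    static_cast<std::size_t>(ReleaseToOS::Last);

// Tracing table indexed by ReleaseToOS value; its length comes from the
// initializer, so a forgotten entry is caught by the static_assert.
constexpr const char *kReleaseModeNames[] = {"Normal", "Force", "ForceAll"};
static_assert(sizeof(kReleaseModeNames) / sizeof(kReleaseModeNames[0]) ==
                  kNumReleaseModes,
              "update kReleaseModeNames when adding a ReleaseToOS value");

const char *releaseModeName(ReleaseToOS Mode) {
  assert(Mode < ReleaseToOS::Last && "Last is not a real mode");
  return kReleaseModeNames[static_cast<std::size_t>(Mode)];
}
```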
libclc: Add subgroup scan functions
Add the base implementation using ds_swizzle which should work
on all subtargets. There are at least 2 more paths available for
newer targets.
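`ds_swizzle` is an AMDGPU cross-lane permute. A common way to build a scan from such permutes is the log-step (Hillis-Steele) pattern: at step k each lane reads the value from the lane 2^k positions below it and adds it. The sketch below simulates that pattern on an array, one element per lane. Whether libclc uses exactly these steps is an assumption, and this is not its implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar simulation of a log-step inclusive add-scan across N lanes. On the
// GPU, building Shifted would be one cross-lane permute per step.
template <std::size_t N>
std::array<int, N> inclusiveScan(std::array<int, N> Lanes) {
  for (std::size_t Offset = 1; Offset < N; Offset *= 2) {
    std::array<int, N> Shifted{};  // Value each lane receives this step.
    for (std::size_t I = 0; I < N; ++I)
      Shifted[I] = I >= Offset ? Lanes[I - Offset] : 0;  // 0 is the + identity.
    for (std::size_t I = 0; I < N; ++I)
      Lanes[I] += Shifted[I];
  }
  return Lanes;
}
```

An N-lane subgroup needs log2(N) steps, for example 6 steps for a 64-lane wave.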
[InstCombine] Fold `fcmp (C - [su]itofp X), C` to integer compares (#185826)
Recognize `fcmp pred (C - [su]itofp X), C` in InstCombine and fold it to
`fcmp swap(pred) [su]itofp X, 0` for certain constants `C` (to make sure
`C - Y` never rounds back to `C`); the new pattern can then be further
folded by `foldFCmpIntToFPConst` into integer compares.
Fixes #185561
alive2: https://alive2.llvm.org/ce/z/9dWsCb
alive2 with constant constraints (needs local alive2 build):
https://alive2.llvm.org/ce/z/wDs9Tj
I tried generalizing the pattern to any `fcmp pred (C - Y), C` but
alive2 says no: https://alive2.llvm.org/ce/z/qMLGah. So I will try to
find more constraints on C and Y to make this rewrite hold in future
PRs.
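The constraint on `C` matters because the subtraction rounds. The sketch below compares `fcmp olt (C - sitofp X), C` with the folded `fcmp ogt (sitofp X), 0`, where swap(olt) = ogt. It uses `double` as a stand-in for the IR type. When `C - Y` is exact (C = 1.0 with small X), both forms agree. With C = 1e16 and X = 1, `1e16 - 1` rounds back to `1e16`, and the two forms disagree. Only one predicate and `sitofp` are checked. The function names are hypothetical, and this is not the InstCombine matcher.

```cpp
#include <cassert>
#include <cstdint>

// Unfolded: fcmp olt (C - sitofp X), C. `volatile` forces the difference to
// be rounded to double even where intermediates use excess precision.
bool unfoldedLess(double C, int32_t X) {
  volatile double Diff = C - static_cast<double>(X);
  return Diff < C;
}

// Folded: fcmp ogt (sitofp X), 0.0, valid only if C - Y never rounds to C.
bool foldedGreater(int32_t X) { return static_cast<double>(X) > 0.0; }
```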
[libc++] Add another const_cast to support hash_map copy assignment
One more const_cast was needed after #183223; without it, copy
assignment of hash_map was broken. Add it, together with a copy
assignment test.
Reviewers: ldionne
Pull Request: https://github.com/llvm/llvm-project/pull/188660
[VPlan] Expose cloneFrom and mergeBlocksIntoPredecessors. (NFC) (#188818)
Move cloneFrom from a file-static function in VPlan.cpp to a public
static method VPBlockUtils::cloneFrom, and move
mergeBlocksIntoPredecessors from a file-static function in
VPlanTransforms.cpp to a public static method
VPlanTransforms::mergeBlocksIntoPredecessors.
This is in preparation for dissolving replicate regions, which needs both
utilities.
Split off from approved
https://github.com/llvm/llvm-project/pull/170212.
PR: https://github.com/llvm/llvm-project/pull/188818
[VPlan] Extract addLaneToStartIndex helper from cloneForLane. (NFC) (#188819)
Factor out the logic for adding a lane offset to a
VPScalarIVStepsRecipe's start index into a standalone
addLaneToStartIndex helper function. This makes the logic reusable for
dissolving replicate regions.
PR: https://github.com/llvm/llvm-project/pull/188819
Revert "[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info" (#188771)
Reverts llvm/llvm-project#187051
Breaks some OpenMP offload tests