AMDGPU: Add range attribute to mbcnt intrinsic callsites (#189191)
It seems the known bits handling added in
686987a540bc176bceaad43ffe530cb3e88796d5 is insufficient to perform many
range-based optimizations. For some reason, computeConstantRange doesn't
fall back on KnownBits, and has a separate, less-used form which tries to
use computeKnownBits.
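
As a rough illustration of how known bits relate to a range, the sketch below derives an unsigned range from known-zero and known-one masks. It is a standalone toy, not LLVM code: `KnownBits32`, `URange32`, and `rangeFromKnownBits` are hypothetical names. Known bits can only express bounds shaped like bit masks, whereas a range attribute can state arbitrary bounds directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for known-bits information on a 32-bit value:
// a bit set in `zero` is known to be 0, a bit set in `one` is known to be 1.
struct KnownBits32 {
  uint32_t zero;
  uint32_t one;
};

// Inclusive unsigned range [lo, hi].
struct URange32 {
  uint32_t lo;
  uint32_t hi;
};

// The smallest possible value clears every unknown bit; the largest sets
// every unknown bit.
inline URange32 rangeFromKnownBits(KnownBits32 k) {
  assert((k.zero & k.one) == 0 && "conflicting known bits");
  return {k.one, ~k.zero};
}
```

For example, if the upper 26 bits are known to be zero, the derived range is [0, 63].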
[CIR] Implement global decomposition declarations (#190364)
There is no real challenge to these: it is effectively a copy/paste of
the classic codegen, as it just requires that we properly emit the holding
variable. The rest falls out of our existing handling of variables.
[clang][bytecode] Don't unref constexpr-unknown references (#190177)
If the pointer for a reference is constexpr-unknown, use the pointer
itself instead of dereferencing it. Unfortunately, that means
constexpr-unknown pointers now reach a lot more places than before.
Split DWARF v2 tests to exclude 64-bit AIX targets (#189077)
64-bit AIX requires DWARF64 format, which was only introduced in DWARF
v3. DWARF v2 only supports 32-bit DWARF format, making it incompatible
with 64-bit AIX (the compiler throws a fatal error). These changes split
DWARF v2 tests into separate files that exclude 64-bit AIX targets while
still running on 32-bit AIX and other 64-bit platforms where DWARF v2 is
supported.
[CodeGen] Ignore `ANNOTATION_LABEL` in scheduler (#190499)
This fixes a crash in `clang` for `armv7` targets when optimizations are
enabled.
Fixes #190497
[VPlan] Skip successors outside any loop when updating LoopInfo. (#190553)
Successors outside of any loop do not contribute to the innermost loop.
Skip them to avoid incorrect results due to
getSmallestCommonLoop(nullptr, X) returning nullptr.
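
The skipping logic can be sketched with a toy loop tree. This is a minimal illustration under assumptions, not the VPlan code: `Loop`, `smallestCommonLoop`, and `innermostCommonLoop` are hypothetical, and only the null-in, null-out behaviour is taken from the description above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal loop node; `parent` is null for a top-level loop.
struct Loop {
  Loop *parent = nullptr;
};

// True if `inner` is `outer` or is nested somewhere inside it.
inline bool contains(const Loop *outer, const Loop *inner) {
  for (; inner; inner = inner->parent)
    if (inner == outer)
      return true;
  return false;
}

// Models the behaviour described above: a null argument yields null.
inline Loop *smallestCommonLoop(Loop *a, Loop *b) {
  if (!a || !b)
    return nullptr;
  for (Loop *l = a; l; l = l->parent)
    if (contains(l, b))
      return l;
  return nullptr;
}

// Fold the loops of all successors, skipping successors outside any loop
// (represented here by a null entry).
inline Loop *innermostCommonLoop(const std::vector<Loop *> &succLoops) {
  Loop *result = nullptr;
  bool seen = false;
  for (Loop *l : succLoops) {
    if (!l)
      continue; // Not in any loop: contributes nothing.
    result = seen ? smallestCommonLoop(result, l) : l;
    seen = true;
  }
  return result;
}
```

Without the `continue`, a single out-of-loop successor would collapse the result to null even when all remaining successors share a loop.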
[InstCombine] Fix #163110: Support peeling off matching shifts from icmp operands via canEvaluateShifted (#165975)
Consider a pattern like `icmp (shl nsw X, L), (add nsw (shl nsw Y, L),
K)`. When the constant K is a multiple of 2^L, this can be simplified to
`icmp X, (add nsw Y, K >> L)`.
This patch extends canEvaluateShifted to support `Instruction::Add` and
updates its signature to accept `Instruction::BinaryOps` instead of a
boolean. This change allows the function to distinguish between LShr and
AShr requirements, ensuring that information is preserved according to
the signedness and overflow flags (nsw/nuw) of the operands.
The logic is integrated into `foldICmpCommutative` to enable peeling off
matching shifts from both sides of a comparison even when an offset is
present.
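
The arithmetic behind the fold can be checked with a small brute-force sketch. It is an illustration only, not InstCombine code: the helper names are hypothetical, signed less-than stands in for one representative predicate, and 64-bit arithmetic on small inputs models the no-overflow (nsw) assumption.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (X << L) < ((Y << L) + K). Multiplication by 2^L is used
// in place of a shift so that negative inputs stay well defined.
inline bool cmpShifted(int64_t x, int64_t y, int64_t k, unsigned l) {
  int64_t step = int64_t{1} << l;
  return x * step < y * step + k;
}

// Peeled form: X < (Y + (K >> L)), valid when K is a multiple of 2^L.
inline bool cmpPeeled(int64_t x, int64_t y, int64_t k, unsigned l) {
  int64_t step = int64_t{1} << l;
  assert(k % step == 0 && "K must be a multiple of 2^L");
  return x < y + k / step;
}

// Exhaustively compare both forms for X and Y in [-lim, lim].
inline bool formsAgree(int lim, unsigned l, int64_t k) {
  for (int64_t x = -lim; x <= lim; ++x)
    for (int64_t y = -lim; y <= lim; ++y)
      if (cmpShifted(x, y, k, l) != cmpPeeled(x, y, k, l))
        return false;
  return true;
}
```

For instance, `formsAgree(50, 3, 16)` compares `(X << 3) < (Y << 3) + 16` against `X < Y + 2` over the whole range.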
Fixes: #163110
[LV] Return best VPlan together with VF from computeBestVF (NFC). (#190385)
computeBestVF iterates over all VPlans and picks the VF of the most
profitable VPlan. This VPlan is later needed for execution and
additional checks. Instead of retrieving it multiple times later, just
directly return it from computeBestVF.
This removes some redundant lookups.
PR: https://github.com/llvm/llvm-project/pull/190385
[llvm-ir2vec] Added Enum for ir2vec embedding mode (#190466)
Currently, initEmbedding() takes the mode as a string. This patch
changes it to take an enum value instead.
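
The general shape of such a change might look like the sketch below. The enumerator names, the accepted strings, and `parseMode` are assumptions for illustration and may not match the tool's actual interface.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical mode enum; the real enumerators may differ.
enum class EmbeddingMode { Symbolic, FlowAware };

// Convert a user-facing string to the enum once, at the boundary, so that
// the rest of the code never compares strings. The spellings "sym" and
// "fa" are assumed here for illustration.
inline std::optional<EmbeddingMode> parseMode(std::string_view s) {
  if (s == "sym")
    return EmbeddingMode::Symbolic;
  if (s == "fa")
    return EmbeddingMode::FlowAware;
  return std::nullopt; // Unknown mode: let the caller report the error.
}
```

An enum parameter lets the compiler reject misspelled modes that a string parameter would only catch at run time.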
[VPlan] Mark unary ops as not having side-effects (NFC). (#190554)
Mark unary ops (currently only FNeg) as neither reading nor writing
memory, similar to binary and cast ops.
Should currently be NFC end-to-end.
[MLIR][NVVM] Add new narrow FP convert Ops (#184291)
This change adds the following NVVM Ops for new narrow FP conversions
introduced in PTX 9.1:
- `convert.{f32x2/bf16x2}.to.s2f6x2`
- `convert.s2f6x2.to.bf16x2`
- `convert.bf16x2.to.f8x2` (extended for `f8E4M3FN` and `f8E5M2` types)
- `convert.{f16x2/bf16x2}.to.f6x2`
- `convert.{f16x2/bf16x2}.to.f4x2`
PTX ISA Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
Early exit llvm-bolt when coming across empty data files (#176859)
perf2bolt generates empty fdata files for small binaries. Right now
BOLT does this check while parsing, by evaluating `((!hasBranchData() &&
!hasMemData()))`. Instead, exit early with an error message as soon as
the buffer finishes reading the data file.
[Polly] Correct integer comparison bit width (#190493)
To make an integer comparable to bool, don't compare it to bool.
The bug occurred during the reduction of #190459
[runtimes] Skip custom linker validation for gpu/offload targets (#189933)
This fixes the `Host compiler does not support '-fuse-ld=lld'` error when
cross-building libclc for a GPU target. The CMake configure command is:
-DRUNTIMES_amdgcn-amd-amdhsa-llvm_LLVM_ENABLE_RUNTIMES=libclc \
-DLLVM_RUNTIME_TARGETS="amdgcn-amd-amdhsa-llvm"
libclc targets only support offload-target cross-builds and can't link a
host executable. The configuration error is a false positive for offload.
This PR adds a baseline test to first check whether the target can link
an executable. If it fails (typical for gpu/offload), we skip the custom
linker validation.
[LV] Additional epilogue tests for find-iv and with uses of IV.(NFC) (#190548)
Additional test coverage for loops not yet supported, with sinkable
find-iv expressions (github.com/llvm/llvm-project/pull/183911) and uses
of the IV.
PR: https://github.com/llvm/llvm-project/pull/190548
[VPlan] Refactor FindLastSelect matching to use m_Specific(PhiR) (NFC). (#190547)
Match the select operands directly against PhiR using m_Specific,
binding only the non-phi IV expression. This replaces the generic
TrueVal/FalseVal matching followed by an assert and conditional
extraction.
Split off from approved
https://github.com/llvm/llvm-project/pull/183911/ as suggested.
[Orc][LibResolver] Fix GNU/Hurd build (#184470)
GNU/Hurd does not put a PATH_MAX static constraint on path lengths. We can instead check the symlink length.
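
One common POSIX pattern for reading a symlink without PATH_MAX is sketched below. It is not the code from this patch: `readSymlink` is a hypothetical helper, and the fallback buffer size of 256 is an arbitrary choice.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Read a symlink target using a buffer sized from lstat() rather than
// PATH_MAX. Returns an empty string on failure or if `path` is not a
// symlink.
inline std::string readSymlink(const std::string &path) {
  struct stat st;
  if (lstat(path.c_str(), &st) != 0 || !S_ISLNK(st.st_mode))
    return {};
  // For a symlink, st_size is the length of the target. Some filesystems
  // report 0, so start from a fallback size and grow if needed.
  size_t cap = st.st_size > 0 ? static_cast<size_t>(st.st_size) + 1 : 256;
  for (;;) {
    std::string buf(cap, '\0');
    ssize_t n = readlink(path.c_str(), &buf[0], buf.size());
    if (n < 0)
      return {};
    if (static_cast<size_t>(n) < buf.size()) {
      buf.resize(static_cast<size_t>(n));
      return buf;
    }
    cap *= 2; // Possibly truncated: retry with a larger buffer.
  }
}
```

The retry loop covers the case where the link changes between `lstat` and `readlink`, or where the reported size is unreliable.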