[DA] Fix overflow of calculation in weakCrossingSIVtest
This patch fixes a correctness issue where integer overflow in the
upper bound calculation of weakCrossingSIVtest caused the pass to
incorrectly prove independence.
The previous logic used `SCEV::getMulExpr` to calculate
`2 * ConstCoeff * UpperBound` and compared it to `Delta` using
`isKnownPredicate`. In the presence of overflow, this could yield
unsafe results.
This change replaces the SCEV arithmetic with `ConstantRange` and
its operation (`smul_fast`). If the calculation overflows,
`intersectWith(MLRange).isEmptySet()` would be false, ensures we
conservatively assume a dependence if the bounds cannot be proven
safe.
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
[compiler-rt][msan] Fix 32-bit overflow in CheckMemoryLayoutSanity (#189199)
Use start + (end - start) / 2 instead of (start + end) / 2 to compute
the midpoint address. The original expression overflows when start + end
exceeds UPTR_MAX, which happens on 32-bit targets whose memory layout
includes regions above 0x80000000.
[compiler-rt][sanitizer] Add struct_rlimit64_sz for musl (#189197)
On musl, rlimit64 is an alias for rlimit rather than a distinct type
provided by glibc. Add a SANITIZER_MUSL elif branch so that
struct_rlimit64_sz is defined for musl-based Linux targets.
[Polly] Forward VFS from PassBuilder for IO sandboxing (#188657)
#184545 default-enables the IO sandbox in assert-builds. This causes
Clang using Polly to crash (#188568).
The issue is that `PassBuilder` uses `vfs::getRealFileSystem()` by
default which is considered a IO sandbox violation in the Clang process.
With this PR store the VFS from the `PassBuilder` from the original
`registerPollyPasses` call for creating other `PassBuilder` instances.
This PR also adds infrastructure for running Polly in `clang` (in
addition in `opt`). `opt` does not enable the sandbox such that we need
separate tests using Clang.
Closes: #188568
[SLP] Check if potential bitcast/bswap candidate is a root of reduction
Need to check if the potential bitcast/bswap-like construct is a root of
the reduction, otherwise it cannot represent a bitcast/bswap construct.
Fixes #189184
[DA] Fix overflow of calculation in weakCrossingSIVtest
This patch fixes a correctness issue where integer overflow in the
upper bound calculation of weakCrossingSIVtest caused the pass to
incorrectly prove independence.
The previous logic used `SCEV::getMulExpr` to calculate
`2 * ConstCoeff * UpperBound` and compared it to `Delta` using
`isKnownPredicate`. In the presence of overflow, this could yield
unsafe results.
This change replaces the SCEV arithmetic with `ConstantRange` and
its operation (`smul_fast`). If the calculation overflows,
`intersectWith(MLRange).isEmptySet()` would be false, ensures we
conservatively assume a dependence if the bounds cannot be proven
safe.
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
[DA] Hoist division check for early exit in weakCrossingSIVtest (NFC)
This patch moves the check that `Coeff` divides `Delta` earlier in the
function to enable an early exit. Potentially improve performance.
Signed-off-by: Ruoyu Qiu <cabbaken at outlook.com>
AMDGPU: Add range attribute to mbcnt intrinsic callsites
It seems the known bits handling added in 686987a540bc176bceaad43ffe530cb3e88796d5
is insufficient to perform many range based optimizations. For some reason
computeConstantRange doesn't fall back on KnownBits, and has a separate,
less used form which tries to use computeKnownBits.
AMDGPU: Skip last corrections in afn f64 reciprocal
Device libs has a fast reciprocal macro that is close
to the fast division expansion, but skips the last terms
compared to the full division.
The basic reciprocal handling has identical output to this
macro. The negative reciprocal case has different fneg placement
and smaller code size, but I believe should be the same.
[MLIR][XeGPU] Add distribution patterns for vector step, shape_cast & broadcast from sg-to-wi (#185960)
This PR adds distribution patterns for vector.step, vector.shape_cast &
vector.broadcast in the new sg-to-wi pass
[clang-format] Fix breaking enum braces when combined with export (#189128)
This fixes #186684.
Also fix (not) breaking variables declared on the same line as the
closing brace.
And adapt whitesmith to that changes.
[MLIR][Python] Add more field specifiers to Python-defined operations (#188064)
This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).
```python
def result(
*,
infer_type: bool = False,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
def operand(
[43 lines not shown]
[MLIR] Fix crash in test-bytecode-roundtrip when test dialect is absent (#189163)
When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.
When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.
A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.
Fixes #128321
[2 lines not shown]
[MLIR][Vector] Fix crash in foldDenseElementsAttrDestInsertOp on poison index (#188508)
When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.
Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
Fixes #188404
Assisted-by: Claude Code