[DA] Add tests for nsw doesn't hold on entire iteration space (NFC) (#162281)
The monotonicity definition states its domain as follows:
```
/// The property of monotonicity of a SCEV. To define the monotonicity, assume
/// a SCEV defined within N-nested loops. Let i_k denote the iteration number
/// of the k-th loop. Then we can regard the SCEV as an N-ary function:
///
/// F(i_1, i_2, ..., i_N)
///
/// The domain of i_k is the closed range [0, BTC_k], where BTC_k is the
/// backedge-taken count of the k-th loop
```
The current monotonicity check implementation doesn't match this definition
because:
- Just checking the nowrap property of addrecs recursively is not sufficient
[7 lines not shown]
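To make the domain in the quoted definition concrete, here is a minimal brute-force sketch for N = 2. It is not the DependenceAnalysis implementation: `isNonDecreasingOnDomain` and `wrappingI8Sum` are hypothetical names, and "monotonic" is assumed here to mean non-decreasing in each i_k. The point is only that a property can hold on a small domain and fail once [0, BTC_k] is large enough for the expression to wrap.

```cpp
#include <cstdint>
#include <functional>

// A SCEV inside a 2-nested loop, viewed as a function F(i_1, i_2).
using Scev2 = std::function<int64_t(int64_t, int64_t)>;

// Hypothetical helper: checks by enumeration that f is non-decreasing in each
// argument over the closed domain [0, btc1] x [0, btc2].
bool isNonDecreasingOnDomain(const Scev2 &f, int64_t btc1, int64_t btc2) {
  for (int64_t i1 = 0; i1 <= btc1; ++i1) {
    for (int64_t i2 = 0; i2 <= btc2; ++i2) {
      if (i1 < btc1 && f(i1 + 1, i2) < f(i1, i2))
        return false;
      if (i2 < btc2 && f(i1, i2 + 1) < f(i1, i2))
        return false;
    }
  }
  return true;
}

// Models the i8 expression 100 + i_1 + i_2 with two's-complement wrapping.
int64_t wrappingI8Sum(int64_t i1, int64_t i2) {
  int64_t low8 = (100 + i1 + i2) & 0xFF;
  return low8 >= 128 ? low8 - 256 : low8;
}
```

With BTC_1 = BTC_2 = 10 the sum stays below 128 and the check passes; with BTC_1 = BTC_2 = 20 the sum wraps inside the domain and the check fails.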
[mlir][arith] Fix SelectOp unsafe int range inference with uninitialized range case (#173716)
This PR fixes a bug in `arith::SelectOp::inferResultRangesFromOptional`
where uninitialized SelectOp branch int ranges were incorrectly joined
with initialized int ranges during dataflow analysis, leading to
incorrect folding in `-int-range-optimizations`.
**The Issue:**
When an `arith.select` branch has an uninitialized range (e.g., from an
op like `nvvm.read.ptx.sreg.cluster.ctaid.x`, `scf.switch`, `llvm.call`,
... that lacks range inference), the analysis computed
`IntegerValueRange::join(Uninitialized, Constant) = Constant`. This
caused the `arith.select` to be replaced with the constant, ignoring the
dynamic branch.
**Example:**
```mlir
// The bug before fix: -int-range-optimizations replaces %1 with %c32
// led to incorrect results and unsafe behaviours
[14 lines not shown]
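The join behaviour described under "The Issue" can be modelled outside MLIR with a small sketch. The types and function names below are hypothetical stand-ins, not the `IntegerValueRange` API, and `joinConservative` is just one safe alternative; it is not claimed to be what the PR implements.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an integer range; std::nullopt models an
// uninitialized range.
struct Range {
  int64_t lo;
  int64_t hi;
};
using MaybeRange = std::optional<Range>;

// Models the problematic behaviour: an uninitialized operand is ignored, so
// join(Uninitialized, Constant) == Constant.
MaybeRange joinIgnoringUninitialized(const MaybeRange &a, const MaybeRange &b) {
  if (!a)
    return b;
  if (!b)
    return a;
  return Range{std::min(a->lo, b->lo), std::max(a->hi, b->hi)};
}

// A conservative alternative (assumption): if either branch of the select is
// uninitialized, the result stays uninitialized and nothing is folded.
MaybeRange joinConservative(const MaybeRange &a, const MaybeRange &b) {
  if (!a || !b)
    return std::nullopt;
  return Range{std::min(a->lo, b->lo), std::max(a->hi, b->hi)};
}
```

Joining an uninitialized range with the constant range [32, 32] yields [32, 32] under the first function, which is what lets a later pass replace the select with the constant; the second function yields no range at all.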
Add fine-grained `__has_feature()` cutout (#170822)
This is a follow-up to pull #148323. It mints
`-fsanitize-ignore-for-ubsan-feature=...`, accepting a list of (UBSan)
sanitizers that should not cause
`__has_feature(undefined_behavior_sanitizer)` to evaluate true.
---------
Co-authored-by: Kalvin Lee <kdlee at chromium.org>
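For context, this is roughly how code typically consumes the feature test that the new flag influences. It is a sketch only: `kUBSanFeatureVisible` and `ubsanFeatureVisible` are hypothetical names, and the snippet does not demonstrate the flag itself, only the `__has_feature(undefined_behavior_sanitizer)` query whose result it affects.

```cpp
// __has_feature is a compiler extension; fall back to 0 where it is missing.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical constant recording what the feature test evaluated to in this
// translation unit.
#if __has_feature(undefined_behavior_sanitizer)
constexpr bool kUBSanFeatureVisible = true;
#else
constexpr bool kUBSanFeatureVisible = false;
#endif

// Hypothetical helper so that callers can branch on the result at run time.
constexpr bool ubsanFeatureVisible() { return kUBSanFeatureVisible; }
```

Per the description above, a build that enables only sanitizers listed in `-fsanitize-ignore-for-ubsan-feature=...` would be expected to take the `false` branch here.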
[libc++] Improve the script to manage libc++ conformance issues (#172905)
The previous script was fairly inflexible. This patch refactors the
script into a tool that can be used in various ways to manage the
conformance-tracking bits of libc++. This should make it possible to
synchronize the CSV status files, but also to find GitHub issues that
aren't linked to the 'C++ Standards Conformance' project, to create
missing issues more easily, etc.
Fix typos and spelling errors across codebase (#156270)
Corrected various spelling mistakes such as 'occurred', 'receiver',
'initialized', 'length', and others in comments, variable names,
function names, and documentation throughout the project. These
changes improve code readability and maintain consistency in naming
and documentation.
Co-authored-by: Louis Dionne <ldionne.2 at gmail.com>
AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.
I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7
Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.
Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.
This will help with removal of softPromoteHalfType.
GlobalISel: Fix mishandling vector-as-scalar in return values
This fixes 2 cases when the AMDGPU ABI is fixed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not pack values
currently; this is a pre-fix for that change.
Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.
All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
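The difference between the bitcast path and the scalarize-and-extend path for a packed <2 x i16> value can be illustrated with plain integers. This is a sketch with hypothetical function names, assuming little-endian lane order (element 0 in the low half); it is not GlobalISel code.

```cpp
#include <cstdint>

// Bitcast-like packing of <2 x i16> into one 32-bit register:
// element 0 occupies the low half, element 1 the high half.
uint32_t packV2I16(uint16_t e0, uint16_t e1) {
  return static_cast<uint32_t>(e0) | (static_cast<uint32_t>(e1) << 16);
}

// Models the scalarize-and-extend path when only a single 32-bit part is
// available: element 0 is zero-extended and element 1 is lost.
uint32_t extendFirstElementOnly(uint16_t e0, uint16_t /*e1*/) {
  return static_cast<uint32_t>(e0);
}

// Reads back the high element of a packed register.
uint16_t highElement(uint32_t reg) { return static_cast<uint16_t>(reg >> 16); }
```

For the input elements 0x1234 and 0xABCD, the packed register is 0xABCD1234, while the extend path produces 0x00001234, so reading the high element back gives 0.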
AMDGPU: Directly use v2bf16 as register type for bf16 vectors. (#175761)
Previously we were casting v2bf16 to i32, unlike the f16 case. Simplify
this by using the natural vector type. This is probably a leftover from
before v2bf16 was treated as legal. This is preparation for fixing a
miscompile in globalisel.
[NFC] Add tablegen_compile_commands.yml to .gitignore (#175687)
People may want to symlink the autogenerated
tablegen_compile_commands.yml into their source directories by analogy
with compile_commands.json, so this commit gives it similar
.gitignore treatment.
[OpenMP] Remove special handling of implicit clauses in decomposition (#174654)
Applying implicit clauses should not cause any issues. The only
exception is that "simd linear(x)" could imply a "firstprivate", and
that clause is not allowed on the simd construct.
Add a check for that specific case, and apply all implicit clauses as if
they were explicit.
InstCombine: Handle fadd in SimplifyDemandedFPClass (#174853)
Note that some of the tests currently fail with alive, but not
due to this patch. The failures occur when performing the fadd x, 0 -> x
simplification in functions with non-IEEE denormal handling.
The existing instsimplify ignores the denormals-are-zero hazard by
checking cannotBeNegativeZero instead of isKnownNeverLogicalZero.
Also note the self handling doesn't really do anything yet, other
than propagate consistent known-fpclass information, until
multiple-use support is added.
This also leaves behind the original ValueTracking support, without
switching to the new KnownFPClass::fadd utility. This will be easier
to clean up after the subsequent fsub support patch.
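The zero hazard behind the fadd x, 0 -> x simplification can be reproduced with ordinary floats. The sketch below is a simplified model, not InstCombine code: `flushInputDAZ` is a hypothetical function that imitates denormals-are-zero input handling in software, and default round-to-nearest arithmetic without fast-math flags is assumed.

```cpp
#include <cmath>
#include <limits>

// Hypothetical model of denormals-are-zero: a subnormal input is replaced by
// a zero of the same sign before the operation sees it.
float flushInputDAZ(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// fadd x, +0.0 under IEEE semantics.
float faddPlusZeroIEEE(float x) { return x + 0.0f; }

// fadd x, +0.0 when inputs are flushed as described above.
float faddPlusZeroDAZ(float x) { return flushInputDAZ(x) + 0.0f; }
```

Under IEEE semantics the only input for which x + 0.0 differs from x is -0.0, whose sum is +0.0. With flushed inputs, a negative subnormal behaves like -0.0 as well, so ruling out only a literal negative zero is not enough; the value must not be a "logical" zero either.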
[LowerMemIntrinsics] Propagate value profile to branch weights (#174490)
If the mem intrinsics have value profile information associated, we can synthesize branch weights when converting them (the intrinsics) to loops.
Issue #147390
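One way to picture the synthesis, as a sketch under stated assumptions rather than the pass's actual algorithm: treat the value profile as (size, count) pairs, assume the lowered loop moves a fixed number of bytes per iteration, and derive how often the loop latch branches back versus exits. All names here (`SizeProfileEntry`, `LatchWeights`, `synthesizeLatchWeights`, `bytesPerIteration`) are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// One value-profile record: the intrinsic was called `count` times with
// length `size`.
struct SizeProfileEntry {
  uint64_t size;
  uint64_t count;
};

// Weights for the loop latch: how often it branches back vs. leaves the loop.
struct LatchWeights {
  uint64_t backedge;
  uint64_t exit;
};

// Assumes bytesPerIteration > 0 and that calls with a zero trip count bypass
// the loop entirely, so they contribute to neither weight.
LatchWeights synthesizeLatchWeights(const std::vector<SizeProfileEntry> &profile,
                                    uint64_t bytesPerIteration) {
  uint64_t iterations = 0; // total loop-body executions
  uint64_t entries = 0;    // times the loop was entered (and thus exited)
  for (const SizeProfileEntry &e : profile) {
    uint64_t tripCount = e.size / bytesPerIteration;
    if (tripCount == 0)
      continue;
    iterations += tripCount * e.count;
    entries += e.count;
  }
  // Each entry runs the latch tripCount times: tripCount - 1 backedges, 1 exit.
  return LatchWeights{iterations - entries, entries};
}
```

For a profile of 10 calls of size 16, 5 calls of size 64 and 3 calls of size 0, with 16 bytes per iteration, the body runs 30 times over 15 loop entries, giving 15 backedges and 15 exits.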
[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit
Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.
Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.
This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
where we spill SGPRs to VGPRs, as long as this does not end up in
excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
includes the SGPRs that would be spilled to VGPR).
With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
[4 lines not shown]
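The relaxed success criterion can be sketched as a small predicate. This is an illustration under assumptions, not the GCNRPTarget code: the struct and function names are hypothetical, and `sgprsPerSpillVGPR` (how many spilled SGPRs fit into one VGPR) is a made-up parameter for the model.

```cpp
// Hypothetical register pressure of a scheduling region.
struct RegionPressure {
  unsigned sgprs;
  unsigned vgprs;
};

// Hypothetical per-function register budgets.
struct RegisterLimits {
  unsigned maxSGPRs;
  unsigned maxVGPRs;
};

// VGPR demand once SGPRs above the limit are spilled into VGPRs.
unsigned vgprsIncludingSGPRSpills(RegionPressure p, RegisterLimits l,
                                  unsigned sgprsPerSpillVGPR) {
  unsigned excessSGPRs = p.sgprs > l.maxSGPRs ? p.sgprs - l.maxSGPRs : 0;
  // Round up: a partially used spill VGPR still occupies a whole register.
  unsigned spillVGPRs = (excessSGPRs + sgprsPerSpillVGPR - 1) / sgprsPerSpillVGPR;
  return p.vgprs + spillVGPRs;
}

// Stricter criterion: no excess in either register file.
bool satisfiedStrict(RegionPressure p, RegisterLimits l) {
  return p.sgprs <= l.maxSGPRs && p.vgprs <= l.maxVGPRs;
}

// Relaxed criterion: SGPR spills are acceptable as long as the VGPR budget
// still holds after accounting for them.
bool satisfiedRelaxed(RegionPressure p, RegisterLimits l,
                      unsigned sgprsPerSpillVGPR) {
  return vgprsIncludingSGPRSpills(p, l, sgprsPerSpillVGPR) <= l.maxVGPRs;
}
```

In this model, a region with 110 SGPRs and 200 VGPRs against limits of 100 and 256 fails the strict check but passes the relaxed one, whereas the same region with 256 VGPRs fails both, which is the case where rematerializing VGPRs becomes worthwhile.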