[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder.
Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder.
SCEVPtrToAddr is handled similarly to SCEVPtrToInt.
Fixes a crash with debug info after bd40d1de9c9ee, which started to
generate ptrtoaddr instead of ptrtoint expressions.
[mlir][Interfaces] Add `ExecutionProgressOpInterface` + folding pattern (#179039)
Add the `ExecutionProgressOpInterface` with an interface method to check
if an operation "must progress". Add `mustProgress` attributes to
`scf.for` and `scf.while` (default value is "true").
`mustProgress` corresponds to the [`llvm.loop.mustprogress`
metadata](https://llvm.org/docs/LangRef.html#langref-llvm-loop-mustprogress).
Also add a canonicalization pattern to erase `RegionBranchOpInterface`
ops that must progress but loop infinitely (and are non-side-effecting).
This canonicalization pattern is enabled for `scf.for` and `scf.while`.
RFC: https://discourse.llvm.org/t/infinite-loops-and-dead-code/89530
[mlir] Fix build after #179039 (#179180)
Fix build after #179039.
[ValueTracking] Propagate sign information out of loop (#175590)
LLVM converts sqrt libcall to intrinsic call if the argument is within
the range(greater than or equal to 0.0). In this case the compiler is
not able to deduce the non-negativity on its own. Extended ValueTracking
to understand such loops.
Fixes llvm/llvm-project#174813
[mlir][AMDGPU] Avoid verifier crash in DPPOp on vector operand types (#178887)
### whats the problem
mlir-opt could crash while verifying amdgpu.dpp when its operands had
vector
types, such as ARM SME tile vectors produced by arm_sme.get_tile.
The crash occurred during IR verification, before any lowering or passes
ran.
### why it happens
DPPOp::verify() called Type::getIntOrFloatBitWidth() on the operand
type.
When the operand was a VectorType, this hit an assertion because only
scalar
integer and float types have a bitwidth.
### whats the fix
Query the bitwidth on the element type using getElementTypeOrSelf()
instead of
[5 lines not shown]
[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341)
## Summary
Fix null pointer dereference in
`SelectionDAGBuilder::resolveDanglingDebugInfo`.
## Problem
`Val.getNode()->getIROrder()` is called before checking if
`Val.getNode()` is null, causing crashes when compiling code with debug
info that contains aggregate constants with nested empty structs.
## Solution
Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())`
block.
## Test Case
Reproduces with aggregate types containing nested empty structs:
```llvm
%3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 }
[47 lines not shown]
[InstCombine] Shrink added constant using LHS known zeros (#174380)
Previously, `SimplifyDemandedUseBits` for `add` instructions only
used known zeros from the RHS to simplify the LHS. It failed to
handle the symmetric case where the LHS has known zeros and the
result does not demand the low bits.
This patch implements this missing optimization, allowing the RHS
constant to be shrunk when the LHS low bits are known zero and unused.
Proof: https://alive2.llvm.org/ce/z/6v9iFY
Fixed: https://github.com/llvm/llvm-project/issues/135411
[compiler-rt] [ubsan] Fix missing include directory (#180341)
Fixes missing `-I` path that broke standalone builds in #179011. Matches
`include_directories()` in other compiler-rt libraries.
Signed-off-by: Michał Górny <mgorny at gentoo.org>
[libc++] Reduce the number of runs on the stop_token benchmarks (#179914)
Testing a bunch of sizes has relatively little value. This reduces the
number of benchmarks so we can run them on a regular basis.
Fixes #179697
[AMDGPU] Fix pattern selecting fmul to v_fma_mix_f32 (#180210)
This needs to use an addend of -0.0 to get the correct result when the
result should be -0.0.
[AMDGPU] Optimize S_OR_B32 to S_ADDK_I32 where possible (#177949)
This PR fixes #177753, converting disjoint S_OR_B32 to S_ADDK_I32
whenever possible, it avoids this transformation in case S_OR_B32 can be
converted to bitset.
Note on Test Failures (Draft Status) This change causes significant
register reshuffling across the test suite due to the new allocation
hints and the swaps performed in case src0 is not a register and src1,
along with the change from or to addk. To avoid a massive, noisy diff
during the initial logic review:
This Draft PR only includes a representative sample of updated tests.
CodeGen/AMDGPU/combine-reg-or-const.ll -> Showcases change from S_OR to
S_ADDK
CodeGen/AMDGPU/s-barrier.ll -> Showcases swap between Src0 and Src1 if
src0 is not a register
The rest of the tests show the result of the register allocation hint we
[3 lines not shown]
Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321)
Reverts llvm/llvm-project#179652
This PR causes the out-of-memory build failures on many Windows
builders.
[mlir][Linalg] Promote lhs/rhs when vectorizing conv1D as outerproduct (#179883)
-- vector.outerproduct requires lhs/rhs to have same element type as the
result.
-- This commit adds a fix to promote lhs/rhs to have result's element
type when vectorizing conv1D slice to vector.outerproduct.
-- This is along the similar lines of what happens when we are
vectorizing conv1D slice to vector.contract - the corresponding
CHECK line was incorrect and this commit fixes that too.
Signed-off-by: Abhishek Varma <abhvarma at amd.com>