[SelectionDAG] Fix null pointer dereference in resolveDanglingDebugInfo (#174341)
## Summary
Fix null pointer dereference in
`SelectionDAGBuilder::resolveDanglingDebugInfo`.
## Problem
`Val.getNode()->getIROrder()` is called before checking if
`Val.getNode()` is null, causing crashes when compiling code with debug
info that contains aggregate constants with nested empty structs.
## Solution
Move the `ValSDNodeOrder` declaration inside the `if (Val.getNode())`
block.
## Test Case
Reproduces with aggregate types containing nested empty structs:
```llvm
%3 = insertvalue { { i1, {} }, ptr, { { {} }, { {} } }, i64 }
[47 lines not shown]
[InstCombine] Shrink added constant using LHS known zeros (#174380)
Previously, `SimplifyDemandedUseBits` for `add` instructions only
used known zeros from the RHS to simplify the LHS. It failed to
handle the symmetric case where the LHS has known zeros and the
result does not demand the low bits.
This patch implements this missing optimization, allowing the RHS
constant to be shrunk when the LHS low bits are known zero and unused.
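As an illustration (a sketch, not taken from the patch or its tests): here the `shl` makes the low four bits of the LHS known zero, and the `and` means those bits of the sum are never demanded, so the added constant can be shrunk from 63 to 48 without changing the result.
```llvm
define i32 @example(i32 %x) {
  %lhs = shl i32 %x, 4          ; low 4 bits of %lhs are known zero
  %sum = add i32 %lhs, 63       ; constant low bits cannot carry into bit 4
  %res = and i32 %sum, -16      ; low 4 bits of the add are not demanded
  ret i32 %res                  ; so the add can become: add i32 %lhs, 48
}
```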
Proof: https://alive2.llvm.org/ce/z/6v9iFY
Fixed: https://github.com/llvm/llvm-project/issues/135411
[compiler-rt] [ubsan] Fix missing include directory (#180341)
Fixes a missing `-I` path that broke standalone builds in #179011, matching
the `include_directories()` in other compiler-rt libraries.
Signed-off-by: Michał Górny <mgorny at gentoo.org>
[libc++] Reduce the number of runs on the stop_token benchmarks (#179914)
Testing a bunch of sizes has relatively little value. This reduces the
number of benchmarks so we can run them on a regular basis.
Fixes #179697
[AMDGPU] Fix pattern selecting fmul to v_fma_mix_f32 (#180210)
The pattern needs to use an addend of -0.0 to produce the correct result
when that result should be -0.0.
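For illustration only (not from the patch, which concerns the AMDGPU v_fma_mix_f32 selection pattern): the signed-zero issue is already visible at the IR level, where folding an `fmul` into an FMA is only exact with a -0.0 addend.
```llvm
declare float @llvm.fma.f32(float, float, float)

define float @mul_as_fma(float %x, float %y) {
  ; fmul %x, %y == fma(%x, %y, -0.0) for all inputs, including when the
  ; product is -0.0. With a +0.0 addend, %x = -0.0, %y = 2.0 would give
  ; -0.0 + +0.0 = +0.0 instead of the expected -0.0.
  %r = call float @llvm.fma.f32(float %x, float %y, float -0.000000e+00)
  ret float %r
}
```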
[AMDGPU] Optimize S_OR_B32 to S_ADDK_I32 where possible (#177949)
This PR fixes #177753 by converting disjoint S_OR_B32 to S_ADDK_I32
whenever possible. The transformation is avoided when the S_OR_B32 can
instead be converted to a bitset instruction.
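For context, a sketch of the underlying identity at the IR level (not taken from the PR): when the two operands of an `or` share no set bits, the `or` computes the same value as an `add`, which is what makes rewriting a disjoint S_OR_B32 with an immediate as an add legal.
```llvm
define i32 @or_as_add(i32 %x) {
  %hi = and i32 %x, -65536          ; low 16 bits are known zero
  %r = or disjoint i32 %hi, 1234    ; shares no set bits with 1234,
  ret i32 %r                        ; so this equals: add i32 %hi, 1234
}
```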
Note on test failures (draft status): This change causes significant
register reshuffling across the test suite due to the new allocation
hints, the operand swaps performed when src0 is not a register but src1
is, and the change from or to addk. To avoid a massive, noisy diff
during the initial logic review:
This Draft PR only includes a representative sample of updated tests:
- CodeGen/AMDGPU/combine-reg-or-const.ll -> showcases the change from S_OR
to S_ADDK
- CodeGen/AMDGPU/s-barrier.ll -> showcases the swap between Src0 and Src1
when src0 is not a register
The rest of the tests show the result of the register allocation hint we
[3 lines not shown]
Revert "[MC][TableGen] Expand Opcode field of MCInstrDesc" (#180321)
Reverts llvm/llvm-project#179652
This PR causes out-of-memory build failures on many Windows builders.
[mlir][Linalg] Promote lhs/rhs when vectorizing conv1D as outerproduct (#179883)
-- vector.outerproduct requires lhs/rhs to have the same element type as
the result.
-- This commit adds a fix to promote lhs/rhs to the result's element type
when vectorizing a conv1D slice to vector.outerproduct.
-- This is along similar lines to what happens when vectorizing a conv1D
slice to vector.contract - the corresponding CHECK line was incorrect and
this commit fixes that too.
Signed-off-by: Abhishek Varma <abhvarma at amd.com>
[SPIRV] Optimize getAllocatedType calls in LegalizeZeroSizeArrays (#179068)
Compute zero-sized allocations accurately using the size APIs, and replace
them with 1 byte of space instead of 1 pointer.
Co-authored-by: Claude Sonnet 4.5 <noreply at anthropic.com>
[X86] AMD Zen 6 Initial enablement (#179150)
This patch adds initial support for AMD Zen 6 architecture (znver6):
- Added znver6 CPU target recognition in Clang and LLVM
- Updated compiler-rt CPU model detection for znver6
- Added znver6 to target parser and host CPU detection
- Added znver6 to various optimizer tests
znver6 features: FP16, AVXVNNIINT8, AVXNECONVERT, AVXIFMA (without BMM).
CodeGen, Driver: Add -fsanitize-trap-loop option.
This option may be used to opt into infinite loops for failed UBSan and
CFI checks. It causes Clang to generate an llvm.cond.loop intrinsic call
instead of a conditional branch to a trap instruction when generating
code for a conditional trap.
Part of this RFC:
https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456
Reviewers: fmayer, vitalybuka
Reviewed By: vitalybuka, fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/177688
Add llvm.cond.loop intrinsic.
The llvm.cond.loop intrinsic is semantically equivalent to a conditional
branch conditioned on ``pred`` to a basic block consisting only of an
unconditional branch to itself. Unlike such a branch, it is guaranteed
to use specific instructions. This allows an interrupt handler or
other introspection mechanism to straightforwardly detect whether
the program is currently spinning in the infinite loop and possibly
terminate the program if so. The intent is that this intrinsic may
be used as a more efficient alternative to a conditional branch to
a call to ``llvm.trap`` in circumstances where the loop detection
is guaranteed to be present. This construct has been experimentally
determined to be executed more efficiently (when the branch is not taken)
than a conditional branch to a trap instruction on AMD and older Intel
microarchitectures, and is also more code size efficient by avoiding the
need to emit a trap instruction and possibly a long branch instruction.
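A minimal IR sketch of the described semantics (the intrinsic's exact signature is omitted; only the equivalent control flow from the description above is shown):
```llvm
define void @failed_check(i1 %pred) {
entry:
  ; The intrinsic behaves like this conditional branch into a block that
  ; only branches back to itself: if %pred is set, the program spins here,
  ; where an interrupt handler or other introspection can detect it.
  br i1 %pred, label %spin, label %cont
spin:
  br label %spin
cont:
  ret void
}
```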
On i386 and x86_64, the infinite loop is guaranteed to consist of a short
conditional branch instruction that branches to itself. Specifically,
[9 lines not shown]