Improved ISD::SRL handling in isKnownToBeAPowerOfTwo (#182562)
Fixes #181651
Added a DemandedElts argument to the isConstOrConstSplat and
isKnownToBeAPowerOfTwo calls; OrZero || isKnownNeverZero(Val, Depth) is
now checked before isKnownToBeAPowerOfTwo. Also added unit tests.
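The ordering of the two checks can be sketched on concrete integers. This is a toy model, not the SelectionDAG code: srlIsPowerOfTwo and isPowerOfTwo are hypothetical helpers, and the sketch assumes Val refers to the shift result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a power-of-two query on a concrete value.
bool isPowerOfTwo(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Toy model of the SRL case: is (Src >> ShAmt) a power of two
// (or zero, when OrZero is set)?
bool srlIsPowerOfTwo(uint64_t Src, unsigned ShAmt, bool OrZero) {
  assert(ShAmt < 64 && "shift amount out of range");
  uint64_t Val = Src >> ShAmt;   // assumption: Val is the shift result
  bool NeverZero = Val != 0;     // concrete stand-in for isKnownNeverZero
  // Cheap guard first: a result that may be zero only qualifies with OrZero.
  if (!(OrZero || NeverZero))
    return false;
  // A source with at most one set bit keeps at most one set bit after the
  // shift. This is deliberately conservative: 12 >> 3 == 1 is rejected.
  return Src == 0 || isPowerOfTwo(Src);
}
```

For example, shifting 8 right by 2 qualifies, while shifting 8 right by 4 yields zero and qualifies only when OrZero is set.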
[IR] Make BranchInst operand order consistent (#186609)
Ensure that successors are always reported in the same order in which
they are stored in the operand list.
[CIR] Split CIR_UnaryOp into individual operations
Split the monolithic cir.unary operation (which dispatched on a
UnaryOpKind enum) into four separate operations: cir.inc, cir.dec,
cir.minus, and cir.not.
This follows the same pattern used when cir.binop was split into
individual binary operations (AddOp, SubOp, etc.).
Changes:
- Add CIR_UnaryOpInterface with getInput()/getResult() methods
- Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes
- Define IncOp, DecOp, MinusOp, NotOp with per-op folds
- Add Involution trait to NotOp for not(not(x)) -> x folding
- Replace createUnaryOp() with createInc/Dec/Minus/Not builders
- Split LLVM lowering into four separate patterns
- Split LoweringPrepare complex-type handling per unary op
- Update CIRCanonicalize and CIRSimplify for new op types
- Update all codegen files to use bool params instead of UnaryOpKind
[6 lines not shown]
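The not(not(x)) -> x fold that the Involution trait provides can be illustrated with a toy expression tree. This is a sketch in plain C++, not MLIR or CIR code; Expr, leaf, makeNot and foldNot are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical toy expression: either a leaf or a unary "not".
struct Expr {
  enum Kind { Leaf, Not } kind;
  int id;                       // leaf identifier, unused for Not
  std::shared_ptr<Expr> input;  // operand of Not, null for Leaf
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(int Id) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, Id, nullptr});
}
ExprPtr makeNot(ExprPtr In) {
  return std::make_shared<Expr>(Expr{Expr::Not, 0, std::move(In)});
}

// Involution fold: not(not(x)) -> x. Anything else is returned unchanged.
ExprPtr foldNot(const ExprPtr &E) {
  assert(E && "null expression");
  if (E->kind == Expr::Not && E->input->kind == Expr::Not)
    return E->input->input;
  return E;
}
```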
[CIR] Remove cir.unary(plus, ...) and emit nothing for unary plus
Traditional codegen never emits any operation for unary plus — it just
visits the subexpression as a pure identity at the codegen level. Align
CIRGen with this behavior by removing Plus from UnaryOpKind entirely
and having VisitUnaryPlus directly visit the subexpression with the
appropriate promotion/demotion handling.
[CIR] Add cir.min op and refactor cir.max lowering (#185276)
Add cir.min operation for integer minimum computation. Refactor cir.max
lowering into a shared lowerMinMaxOp template reused by both ops.
[msan][NFCI] Replace unnecessary shadow cast with assertion (#186498)
Fabian Wolff pointed out that #176031 made the output of CreateIntCast()
unused in handleBitwiseAnd().
Upon closer inspection, the CreateIntCast()s are unnecessary, because the
arguments to handleBitwiseAnd() (and visitOr()) are integers or vectors of
integers, for which the shadow types are the same as the original types.
This patch removes the unnecessary if and shadow cast, and adds
assertions.
[IR] Add Instruction::successors() (#186606)
Nowadays all terminators store all successor operands consecutively, so
we can expose the range of successors through a unified interface.
Rename succ_op_iterator to succ_iterator for consistency, also with
Machine IR.
Preliminary work for replacing the succ_iterator in CFG.h with an
iterator that iterates directly over the uses.
[lldb] Rename Status variables to avoid confusion (NFC) (#186486)
Rename Status variables that are named `error` to `status` to avoid
confusion with llvm::Error as the latter becomes more and more
prevalent.
[llvm-mc] Default output assembly variant to AssemblerDialect (#186317)
Previously, llvm-mc always defaulted to output assembly variant 0
regardless of the target's AssemblerDialect. This was inconsistent:
llvm-mc -x86-asm-syntax=intel changed the input parser to Intel syntax
but output stayed AT&T, unlike clang's -masm=intel which affects both.
When --output-asm-variant is not explicitly specified, fall back to
MAI->getAssemblerDialect() instead of hardcoding variant 0. This
makes the output match the target's configured dialect:
- X86: -x86-asm-syntax=intel now produces Intel output
- AArch64: Apple triples default to Apple syntax output
- SystemZ: z/OS triples default to HLASM syntax output
Tests that relied on a specific output variant now use explicit
--output-asm-variant=0.
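The fallback rule can be sketched as follows. This is a minimal illustration rather than the llvm-mc source: selectOutputVariant is a hypothetical helper, and the dialect parameter stands in for the value returned by MAI->getAssemblerDialect().

```cpp
#include <optional>

// Hypothetical helper: an explicitly requested variant wins; otherwise the
// target's configured assembler dialect is used instead of a hardcoded 0.
unsigned selectOutputVariant(std::optional<unsigned> ExplicitVariant,
                             unsigned AssemblerDialect) {
  return ExplicitVariant.value_or(AssemblerDialect);
}
```

With no explicit variant and a dialect of 1, the result is 1; an explicit 0 still selects 0, which is what the updated tests rely on.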
[X86] Reject 'p' constraint without 'a' modifier in inline asm (#185799)
The 'p' constraint produces an address operand that should only be
printed with the 'a' modifier (e.g., %a0). Without it, GCC and Clang
produce different and arguably incorrect output:
https://github.com/llvm/llvm-project/issues/185343#issuecomment-4029670370
Reject the combination to catch misuse early.
[WebAssembly][NFC] Rename and test FastISel selectBr (#186577)
selectBr only handles conditional branches and was not tested.
Clarify the name and add a test that enforces that there is no fallback.
[IR][NFC] Remove BranchInst successor functions (#186604)
Efficient access is now handled by UncondBrInst/CondBrInst, and
Instruction functions handle the more generic cases. These functions are
largely unused now that most uses of BranchInst are gone.
Preliminary work for making the CondBrInst operand order consistent.
[CIR] Add cir.min op and refactor cir.max lowering
Add cir.min operation for integer minimum computation. Refactor cir.max
lowering into a shared lowerMinMaxOp template reused by both ops. Includes
lowering tests for signed, unsigned, and vector types, plus canonicalization
tests.
[LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305)
Move handleEarlyExits, predication and region creation to operate
directly on VPlan0. This means they only have to run once, reducing
compile time a bit; the relative order remains unchanged.
Introducing the regions at this point in particular unlocks performing
more transforms once, on the initial VPlan, instead of running them for
each VF.
Whether a scalar epilogue is required is still determined by the legacy
cost model, so we still need to account for that in the VF-specific VPlan
logic.
PR: https://github.com/llvm/llvm-project/pull/185305
[Transforms/Scalar][NFC] Drop uses of BranchInst (#186592)
I ended up relaxing some of the checks that LoopInterchange made; the
assumptions that certain instructions were branches seemed not to be
used at all.
[clang-tidy] Fix false positive in `readability-else-after-return` on `return` jumped over by `goto` (#186370)
Given this code:
```cpp
if (...) {
goto skip_over_return;
return;
skip_over_return:
foo();
} else {
...
}
```
...the check suggests removing the `else`, which is not a valid
transformation. This is because it looks at *all* the substatements of
the then-branch for interrupting statements. This PR changes it to only
look at the *final* substatement.
[17 lines not shown]
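The difference between inspecting all substatements and only the final one can be sketched with a toy model. This is not the clang-tidy implementation: Stmt, anyInterrupts and endsWithInterrupt are hypothetical, and only `return` is treated as interrupting here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical statement kinds for a flattened then-branch.
enum class Stmt { Call, Goto, Return, Label };

bool isInterrupting(Stmt S) { return S == Stmt::Return; }

// Behaviour described as the bug: any interrupting substatement counts.
bool anyInterrupts(const std::vector<Stmt> &Body) {
  return std::any_of(Body.begin(), Body.end(), isInterrupting);
}

// Behaviour described as the fix: only the final substatement counts.
bool endsWithInterrupt(const std::vector<Stmt> &Body) {
  return !Body.empty() && isInterrupting(Body.back());
}
```

For the example above, the then-branch is {Goto, Return, Label, Call}: the first predicate fires on the skipped `return`, while the second does not because the branch ends in a call.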
[CIR] Add Commutative/Idempotent traits to binary ops (#185163)
Add missing MLIR traits to CIR binary operations:
- AndOp, OrOp: Commutative, Idempotent
- AddOp, MulOp, XorOp, MaxOp: Commutative
Add these ops to the CIRCanonicalize pass op list so trait-based
folding is exercised by applyOpPatternsGreedily.
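What the two traits buy can be sketched on a toy operand model. This is plain C++, not MLIR code, and Operand, foldIdempotent and canonicalizeCommutative are hypothetical names: an idempotent op with identical operands folds to that operand, and a commutative op can have a constant operand moved to the right-hand side.

```cpp
#include <optional>
#include <utility>

// Hypothetical operand: a value id plus a flag marking constants.
struct Operand {
  bool isConstant;
  int id;
};

bool sameOperand(const Operand &A, const Operand &B) {
  return A.isConstant == B.isConstant && A.id == B.id;
}

// Idempotent: op(x, x) -> x. Returns nothing when the fold does not apply.
std::optional<Operand> foldIdempotent(const Operand &L, const Operand &R) {
  if (sameOperand(L, R))
    return L;
  return std::nullopt;
}

// Commutative: canonicalize op(const, x) to op(x, const).
void canonicalizeCommutative(Operand &L, Operand &R) {
  if (L.isConstant && !R.isConstant)
    std::swap(L, R);
}
```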
[CIR] Add Commutative/Idempotent traits to binary ops
Add missing MLIR traits to CIR binary operations, matching the arith
dialect conventions:
- AndOp, OrOp: Commutative, Idempotent (fixes FIXME)
- AddOp, MulOp, XorOp, MaxOp: Commutative
Add these ops to the CIRCanonicalize pass op list so trait-based
folding is exercised by applyOpPatternsGreedily.
Update testFloatingPointBinOps in binop.cpp to use computed values,
preventing DCE of the now-canonicalized ops.
[StructurizeCFG] Fix incorrect zero-cost hoisting in nested control flow (#183792)
hoistZeroCostElseBlockPhiValues() hoists zero-cost instructions from
else blocks to their common dominator with the then block. When the
merge point has additional predecessors beyond the simple if-else
pattern, the hoisted instruction ends up in a dominator that feeds
a Flow phi on every edge, including edges where the else block was
never taken. simplifyHoistedPhis() then replaces poison entries in
those Flow phis with the hoisted value, causing it to leak into
unrelated paths.
This manifests as miscompilation in sorting kernels compiled with
code coverage: the PGO counter blocks create deeply nested CFGs
where the hoisted shufflevector (used for swapping sort keys)
reaches the no-swap path, corrupting sort results.
Fix by requiring a simple if-else CFG shape before hoisting: ThenBB
must branch directly to ElseSucc and ElseSucc must have exactly 2
predecessors. This matches the structure that simplifyHoistedPhis
assumes.
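The shape requirement can be sketched on a toy CFG. This is an illustration under assumptions, not the StructurizeCFG code: the CFG map, countPredecessors and isSimpleIfElseShape are hypothetical, and "branches directly" is modelled as ElseSucc being a successor of ThenBB.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy CFG: block name -> successor block names.
using CFG = std::map<std::string, std::vector<std::string>>;

// Count the blocks that list BB among their successors.
std::size_t countPredecessors(const CFG &G, const std::string &BB) {
  std::size_t N = 0;
  for (const auto &Entry : G) {
    const auto &Succs = Entry.second;
    if (std::find(Succs.begin(), Succs.end(), BB) != Succs.end())
      ++N;
  }
  return N;
}

// Hoisting is allowed only when ThenBB branches to ElseSucc and ElseSucc
// has exactly two predecessors (the then and else sides).
bool isSimpleIfElseShape(const CFG &G, const std::string &ThenBB,
                         const std::string &ElseSucc) {
  auto It = G.find(ThenBB);
  if (It == G.end())
    return false;
  const auto &Succs = It->second;
  bool ThenBranchesToSucc =
      std::find(Succs.begin(), Succs.end(), ElseSucc) != Succs.end();
  return ThenBranchesToSucc && countPredecessors(G, ElseSucc) == 2;
}
```

A plain diamond passes the check; adding a third edge into the merge block, as nested control flow can, makes it fail and so blocks the hoist.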
[X86] apply mulx optimization for two-wide mul instruction (mull, mulq) (#185127)
References: https://github.com/llvm/llvm-project/pull/184462
In the discussion for the linked PR, which removes unnecessary register
to register moves when one operand is in %rdx for mulx, the point was
brought up that this pattern also happens for mull and mulq.
The IR below:
```llvm
declare i32 @foo32()
declare i64 @foo64()
define i32 @mul32_no_implicit_copy(i32 %a0) {
%a1 = call i32 @foo32()
%a2 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %a0, i32 %a1)
%a3 = extractvalue { i32, i1 } %a2, 0
ret i32 %a3
[53 lines not shown]