[CodeGen] Avoid ambiguous Register comparison in C++20; NFC (#205814)
Fix an "ambiguous overload for ‘operator==’" error when compiling with
`-std=c++20`, caused by C++20's rewritten operator== candidate rules.
[VectorCombine] Bail out on all-poison leaves in shuffle transform (#206503)
foldShufflesOfLengthChangingShuffles() skips undef sources when
determining Y, so if all the leaves are undef, we can end up with Y
being nullptr after the loop. Bail out in this degenerate case.
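A minimal sketch of the bail-out, under stated assumptions: `Leaf`, `findSourceY`, and `canFold` are hypothetical stand-ins, not the VectorCombine data structures. Undef leaves are skipped while choosing Y, and the caller gives up when no Y was found instead of continuing with an empty value.

```cpp
#include <array>
#include <cstddef>

// Hypothetical toy model of a shuffle leaf (not LLVM's representation).
struct Leaf {
  bool IsUndef; // undef/poison source
  int Value;    // stands in for a non-undef source operand
};

// Undef leaves are skipped while choosing Y, so Y stays unset when every
// leaf is undef. Returns the index of Y, or -1 in that degenerate case.
template <std::size_t N>
constexpr int findSourceY(const std::array<Leaf, N> &Leaves) {
  for (std::size_t I = 0; I < N; ++I)
    if (!Leaves[I].IsUndef)
      return static_cast<int>(I);
  return -1;
}

// Hypothetical driver: bail out instead of using a missing Y.
template <std::size_t N>
constexpr bool canFold(const std::array<Leaf, N> &Leaves) {
  if (findSourceY(Leaves) < 0)
    return false; // all leaves are undef/poison: nothing to fold
  return true;    // a real transform would continue with Leaves[Y] here
}
```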
[X86] Hygon C86-4G-M8 Initial enablement (#204587)
This patch adds initial support for Hygon C86-4G-M8 architectures:
- Added recognition of C86-4G-M8 CPU targets in Clang and LLVM
- Added C86-4G-M8 to target parser and host CPU detection
- Updated compiler-rt CPU model detection for C86-4G-M8
- Added C86-4G-M8 to various optimizer tests
- Added scheduler models and llvm-mca tests for C86-4G-M8 CPU targets
[clang][bytecode] Use ASTRecordLayout offsets when subtracting pointers (#206496)
What we did here didn't work properly for pointers cast to bases. Add
`Pointer::computeLayoutOffset()` and use that to return the proper
values.
[LoopUnroll] Skip called function in constant-op reduction filter (#200868)
canParallelizeReductionWhenUnrolling iterates the latch instruction's
operands and rejects the reduction if any is a Constant. For calls the
called function is itself a Constant, falsely rejecting every intrinsic
form (fmuladd, smin/smax/umin/umax, etc.). Use CallBase::args() to
restrict the check to data operands.
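A toy model of why scanning all operands misfires: in LLVM the called function is stored as the last operand of a call and a `Function` is a `Constant`, so a scan over every operand always finds a constant for a direct intrinsic call. The `Kind` enum and `Call` struct below are hypothetical stand-ins; `anyArgConstant` plays the role of iterating `CallBase::args()`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical operand kinds (not LLVM's Value hierarchy).
enum class Kind { Instruction, Argument, Constant, Function };

// A Function counts as a Constant, as it does in LLVM.
constexpr bool isConstant(Kind K) {
  return K == Kind::Constant || K == Kind::Function;
}

// Toy call: data arguments plus the called function as a separate operand.
template <std::size_t NumArgs> struct Call {
  std::array<Kind, NumArgs> Args;
  Kind Callee = Kind::Function;
};

// Old behaviour: every operand is scanned, including the callee.
template <std::size_t N>
constexpr bool anyOperandConstant(const Call<N> &C) {
  for (Kind K : C.Args)
    if (isConstant(K))
      return true;
  return isConstant(C.Callee); // always true for a direct call
}

// Fixed behaviour: only the data arguments are scanned.
template <std::size_t N>
constexpr bool anyArgConstant(const Call<N> &C) {
  for (Kind K : C.Args)
    if (isConstant(K))
      return true;
  return false;
}
```

For an `fmuladd`-style reduction whose arguments are all non-constant values, the old check rejects the reduction while the fixed one accepts it.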
[DAGCombiner][NVPTX] Avoid forming illegal-typed shuffles after type-legalization (#205056)
Currently, `combineInsertEltToShuffle` can create a shuffle of an
illegal type after type legalization, which triggers an assertion
("Unexpected illegal type!") when it reaches the operation legalizer.
https://github.com/llvm/llvm-project/pull/198259 fixed a crash resulting
from this in NVPTX, but caused regressions for some types because the
check blocked pre-type-legalization folds in addition to the illegal
post-type-legalization shuffle.
This change removes the TTI override in NVPTX and adds a guard in the
`combineInsertEltToShuffle` pattern to avoid forming illegal-typed
shuffles after type legalization.
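A sketch of the shape of such a guard, assuming a flag that records whether type legalization has run (loosely analogous to DAGCombiner's `LegalTypes`); `SimpleVT`, `isTypeLegal`, and `mayFormShuffle` are hypothetical. Before type legalization any type is allowed because the type legalizer will still run; afterwards only legal types may be formed, which keeps the pre-legalization folds that a blanket check would block.

```cpp
// Hypothetical vector types for illustration only.
enum class SimpleVT { v4i8, v3i8 };

// Hypothetical target where v3i8 is not a legal type.
constexpr bool isTypeLegal(SimpleVT VT) { return VT != SimpleVT::v3i8; }

// Guard: form the shuffle freely before type legalization; afterwards,
// only when its type is legal, so the operation legalizer never sees it.
constexpr bool mayFormShuffle(SimpleVT VT, bool AfterTypeLegalization) {
  return !AfterTypeLegalization || isTypeLegal(VT);
}
```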
[clang] The `__reference_meows_from_temporary` builtins should be SFINAE-friendly when the first type is not a reference type (#206527)
Suppose that `__reference_constructs_from_temporary` is defined as:
```cpp
__reference_constructs_from_temporary(_Tp, _Up);
```
A non-reference type can never bind to a temporary, so the result is
always `false` for such a `_Tp`. We should short-circuit before reaching
the instantiations by checking the type of `_Tp`. But clang's
`__reference_constructs_from_temporary` eagerly instantiates the
construction of `_Up` (including the element's constructor exception
specification) even when `_Tp` is not a reference, which can hard-error
on misbehaved types.
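The short-circuit has a library-level analogue in standard C++: `std::conjunction` stops at the first false operand without instantiating the later ones. The sketch below is not clang's implementation. `ExpensiveBindingCheck` and `ref_constructs_v` are hypothetical, and `std::is_constructible` is only a stand-in for the real "binds to a temporary" logic. The `static_assert` fires only if the expensive check is instantiated for a non-reference `T`.

```cpp
#include <type_traits>

// Stand-in for the eager work that can hard-error on misbehaved types.
// Instantiating it with a non-reference T trips the static_assert.
template <class T, class U> struct ExpensiveBindingCheck {
  static_assert(std::is_reference_v<T>,
                "must only be instantiated for reference types");
  static constexpr bool value = std::is_constructible_v<T, U>;
};

// Check "is T a reference?" first; conjunction does not instantiate the
// expensive operand when the first one is false.
template <class T, class U>
inline constexpr bool ref_constructs_v =
    std::conjunction_v<std::is_reference<T>, ExpensiveBindingCheck<T, U>>;
```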
The following code should be accepted, but clang raises a hard error:
```cpp
struct NoConv {};
[13 lines not shown]
[flang][cmake] Order flang profdata generation after clang's (#206023)
The clang and flang PGO pipelines clean and regenerate the same shared
profraw directories, so running them concurrently can truncate a profraw
while the other merge has it mmap'd. Add an ordering edge so flang's
pipeline runs after clang's.
Fixes issues introduced by
https://github.com/llvm/llvm-project/pull/198863
[mlir][linalg] Handle existing destination-passing-style ops in `transform.structured.rewrite_in_destination_passing_style` (#205034)
`transform.structured.rewrite_in_destination_passing_style` may be
applied to an operation that is already in destination-passing style,
e.g. `linalg.add`. In this case, the operation does not need to be
rewritten, but the current `TypeSwitch` does not handle
`DestinationStyleOpInterface` and falls through to the unreachable case.
Such operations can be handled by returning them unchanged. This makes
the transform accept already destination-style operations and avoids the
crash.
A regression test applying `rewrite_in_destination_passing_style` to
`linalg.add` is added.
Fixes #204099
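A toy version of the dispatch, with an explicit case for ops that are already in destination-passing style. Everything here is hypothetical: `OpKind` and `RewriteResult` are illustrative enums, the assumed rewritable ops are `tensor.pad` and `tensor.generate`, and reporting `Unsupported` (rather than reaching an unreachable) is this sketch's choice, not necessarily the transform's.

```cpp
// Hypothetical op kinds; LinalgAdd is already destination-style.
enum class OpKind { TensorPad, TensorGenerate, LinalgAdd, Other };
enum class RewriteResult { Rewritten, Unchanged, Unsupported };

constexpr bool isDestinationStyle(OpKind K) { return K == OpKind::LinalgAdd; }

constexpr RewriteResult rewriteInDPS(OpKind K) {
  switch (K) {
  case OpKind::TensorPad:
  case OpKind::TensorGenerate:
    return RewriteResult::Rewritten; // ops that actually need rewriting
  default:
    break;
  }
  // Already in destination-passing style: return the op unchanged.
  if (isDestinationStyle(K))
    return RewriteResult::Unchanged;
  return RewriteResult::Unsupported; // sketch-only fallback
}
```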
[WebAssembly][NFC] Remove direct access to FeatureKV (#206232)
This is preparatory work for changing the representation of
FeatureKV/SubTypeKV, in which they will no longer be that easily
accessible as global variables. Therefore, get them from the subtarget
instead.
[clang][bytecode] Implement support for `Expr::EvaluateWithSubstitution()` (#204781)
This regresses `Sema/enable_if.c`, which now fails when run with the
bytecode interpreter. We also get 14 more diagnostic differences in
`SemaCXX/builtin-object-size-cxx14.cpp`.
Fixes https://github.com/llvm/llvm-project/issues/138473
[mlir][python][NFC] Clean up nanobind compile options (#206559)
Follow-up to #204230.
Refactor nanobind warning suppression flags into `build_nanobind_lib`.
Drop duplicate RTTI and exception flags.
Revert "[MergeFunctions] Preserve entry counts on folds" (#206640)
Reverts llvm/llvm-project#202218
Causes build failures and needs to be rebased on top of main before
relanding.
[Clang][RISCV] packed reduction sum intrinsics (#206441)
Add the __riscv_predsum/predsumu_* header wrappers over new
__builtin_riscv_* builtins, lowering to the llvm.riscv.predsum/predsumu
intrinsics.
[RISCV][P-ext] Avoid redundant accumulator extend for reduction sum (#206430)
For a reduction sum with an i32 accumulator on RV64, the result is
computed at i64 and truncated, so the accumulator's upper bits are
unused. Any-extend it instead of sign-/zero-extending, dropping a
redundant sext.w/zext.w. Follow-up to #206004.
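The arithmetic behind the any-extend: the low 32 bits of a 64-bit sum depend only on the low 32 bits of its operands, so the accumulator's upper bits cannot affect the truncated result. The check below compares zero-extended, sign-extended, and arbitrary upper bits; `reduceSum` and `upperBitsIrrelevant` are hypothetical names.

```cpp
#include <cstdint>

// Sum computed at 64 bits, then truncated to 32 bits.
constexpr std::uint32_t reduceSum(std::uint64_t Acc64, std::uint64_t Partial) {
  return static_cast<std::uint32_t>(Acc64 + Partial);
}

// Whatever fills the upper 32 bits of the accumulator (zext, sext, or
// arbitrary bits as with an any-extend), the truncated sum is identical.
constexpr bool upperBitsIrrelevant(std::uint32_t Acc, std::uint64_t Partial,
                                   std::uint32_t Garbage) {
  std::uint64_t ZExt = Acc;
  std::uint64_t SExt = static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(Acc)));
  std::uint64_t AnyExt = (static_cast<std::uint64_t>(Garbage) << 32) | Acc;
  return reduceSum(ZExt, Partial) == reduceSum(SExt, Partial) &&
         reduceSum(ZExt, Partial) == reduceSum(AnyExt, Partial);
}
```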
[FixIrreducible] Handle conditional branch with both successors as header (#206057)
A conditional branch redirecting edges to the cycle header may have both
successors equal to the header (e.g. `br i1 %c, label %h, label %h`),
which the previous `Succ1 = Succ0 ? nullptr : Header` logic mishandled
by dropping the second edge.
Check each successor independently against the header instead.
Fixes https://github.com/llvm/llvm-project/issues/191979.
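A minimal sketch of checking each successor independently, assuming blocks are identified by integers and edges to the header are redirected to a hypothetical guard block. `CondBr` and `redirectHeaderEdges` are illustrative only. With `br i1 %c, label %h, label %h` both edges are redirected, and neither is dropped.

```cpp
// Hypothetical conditional branch with integer block IDs.
struct CondBr {
  int Succ0;
  int Succ1;
};

// Each successor is compared against the header on its own, so a branch
// whose two successors are both the header gets both edges redirected.
constexpr CondBr redirectHeaderEdges(CondBr Br, int Header, int Guard) {
  if (Br.Succ0 == Header)
    Br.Succ0 = Guard;
  if (Br.Succ1 == Header)
    Br.Succ1 = Guard;
  return Br;
}
```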
[MergeFunctions] Preserve entry counts on folds (#202218)
**Summary**
`MergeFunctions` can fold equivalent functions into a single retained
implementation. When that happens, the retained body may be reached by
callers of both original functions, but its `function_entry_count`
metadata previously preserved only one side of the profile data.
For example, folding functions with entry counts `2000` and `1000` could
leave the retained body with only `2000`. This patch updates the
retained implementation after a successful merge, so the entry count
becomes `3000`, using saturating add.
For ODR/double-thunk merges, the private backing body gets the combined
count while the thunks keep their own entry counts. For alias-backed
merges, the backing function carries the combined count.
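A standalone version of the saturating add used to combine the counts; `saturatingAdd` is a hypothetical helper written for illustration, not necessarily the routine the patch calls. It clamps to the maximum instead of wrapping, so very large counts never overflow into a small number.

```cpp
#include <cstdint>
#include <limits>

// Adds two entry counts, clamping at the maximum instead of wrapping.
constexpr std::uint64_t saturatingAdd(std::uint64_t A, std::uint64_t B) {
  constexpr std::uint64_t Max = std::numeric_limits<std::uint64_t>::max();
  return A > Max - B ? Max : A + B;
}
```

With the counts from the example above, `saturatingAdd(2000, 1000)` gives `3000`.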
**AI Assistance Disclosure**
[3 lines not shown]
[RFC][CodeGen] Add generic target feature checks for intrinsics (#201470)
This PR adds target-independent infrastructure for annotating LLVM
intrinsics with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen
records. TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering
target intrinsics. This allows targets to opt in by annotating intrinsic
definitions directly, rather than adding custom checks during lowering,
legalization, or instruction selection.
This PR uses one AMDGPU intrinsic as an example.
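A sketch of the lookup-then-check flow, under simplifying assumptions: the feature expression is taken to be a comma-separated list of features that must all be present (the real `TargetFeatures` syntax may be richer), and `IntrinsicID`, `IntrinsicFeatureTable`, and `canLower` are hypothetical stand-ins for the TableGen-emitted table and the check done before lowering.

```cpp
#include <cstddef>
#include <string_view>

// True if Item appears in the comma-separated List.
constexpr bool listContains(std::string_view List, std::string_view Item) {
  while (!List.empty()) {
    std::size_t Comma = List.find(',');
    if (List.substr(0, Comma) == Item)
      return true;
    if (Comma == std::string_view::npos)
      break;
    List.remove_prefix(Comma + 1);
  }
  return false;
}

// True if every feature in Required is present in Available.
constexpr bool featuresSatisfied(std::string_view Required,
                                 std::string_view Available) {
  while (!Required.empty()) {
    std::size_t Comma = Required.find(',');
    if (!listContains(Available, Required.substr(0, Comma)))
      return false;
    if (Comma == std::string_view::npos)
      break;
    Required.remove_prefix(Comma + 1);
  }
  return true;
}

// Hypothetical intrinsic-to-feature table.
enum class IntrinsicID { Generic, TargetFoo };
struct IntrinsicFeatures {
  IntrinsicID ID;
  std::string_view Required;
};
constexpr IntrinsicFeatures IntrinsicFeatureTable[] = {
    {IntrinsicID::TargetFoo, "feat-a,feat-b"}};

// Unannotated intrinsics have no requirement.
constexpr bool canLower(IntrinsicID ID, std::string_view Available) {
  for (const IntrinsicFeatures &E : IntrinsicFeatureTable)
    if (E.ID == ID)
      return featuresSatisfied(E.Required, Available);
  return true;
}
```

Keeping the requirement in one table lets both instruction selectors run the same check, instead of each target repeating it during lowering.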
[Sema][NFC] Extract allocation overload diagnostics (#206219)
This extracts the code that emits diagnostics when no viable function is
found for allocation overload resolution to reduce the diff in #203824.