[KnownBits][SelectionDAG] Add KnownBits::clmul. Support trailing bits. NFC (#177517)
Borrow the known trailing bits logic from KnownBits::mul, but use
APIntOps::clmul instead of multiplication.
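A minimal sketch of the trailing-bits reasoning follows. It assumes a simplified 64-bit stand-in for KnownBits; `Known64`, `clmul64` and `knownClmul` are hypothetical names, not LLVM's API. Two facts make it work, just as for ordinary multiplication. First, every term of the carry-less product has its lowest set bit at or above tz(A) + tz(B). Second, bit J of the result depends only on operand bits 0..J.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for llvm::KnownBits on a 64-bit value:
// a set bit in Zero means "known 0", a set bit in One means "known 1".
struct Known64 {
  uint64_t Zero = 0;
  uint64_t One = 0;
};

// Mask with the low N bits set (N may be 64).
static uint64_t lowMask(unsigned N) {
  return N >= 64 ? ~uint64_t(0) : (uint64_t(1) << N) - 1;
}

// Count the trailing one bits of V.
static unsigned countTrailingOnes(uint64_t V) {
  unsigned N = 0;
  while (N < 64 && ((V >> N) & 1))
    ++N;
  return N;
}

// Low 64 bits of the carry-less product: XOR of A << I for every set bit I of B.
static uint64_t clmul64(uint64_t A, uint64_t B) {
  uint64_t R = 0;
  for (unsigned I = 0; I < 64; ++I)
    if ((B >> I) & 1)
      R ^= A << I;
  return R;
}

// Known trailing bits of clmul(A, B), modeled on the trailing-bits part of
// KnownBits::mul.
Known64 knownClmul(const Known64 &A, const Known64 &B) {
  assert((A.Zero & A.One) == 0 && (B.Zero & B.One) == 0 && "conflicting bits");
  Known64 R;
  // Each term A << I has I >= tz(B), so its lowest set bit is at or above
  // tz(A) + tz(B); every result bit below that position is zero.
  unsigned TZ =
      std::min(64u, countTrailingOnes(A.Zero) + countTrailingOnes(B.Zero));
  R.Zero |= lowMask(TZ);
  // Result bit J depends only on operand bits 0..J, so the low K bits are
  // exact when the low K bits of both operands are fully known.
  unsigned K = std::min(countTrailingOnes(A.Zero | A.One),
                        countTrailingOnes(B.Zero | B.One));
  uint64_t Low = clmul64(A.One, B.One) & lowMask(K);
  R.One |= Low;
  R.Zero |= ~Low & lowMask(K);
  return R;
}
```

For example, if the low four bits of the operands are known to be 0b0110 and 0b0011, the low four bits of the result are known to be 0b1010.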
[RISCV] Select (clmul (zext_inreg X, i32), (zext_inreg X, i32)) as (clmulh (slli X, 32), (slli X, 32)). (#177429)
This applies only without Zba, matching what we already do for
MUL->MULHU without Zba.
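The identity behind this selection can be checked with a small software model of RV64 clmul/clmulh. In this sketch, `clmulFull`, `viaZextInreg` and `viaSlliClmulh` are hypothetical helpers. The carry-less product of two 32-bit values has at most 63 bits. Shifting both operands left by 32 moves that product exactly into the high 64 bits of the 128-bit result. The shifts also discard the upper bits that zext_inreg would otherwise have to clear.

```cpp
#include <cassert>
#include <cstdint>

// Full 128-bit carry-less product of two 64-bit values, split into halves.
struct U128 {
  uint64_t Lo;
  uint64_t Hi;
};

static U128 clmulFull(uint64_t A, uint64_t B) {
  U128 R{0, 0};
  for (unsigned I = 0; I < 64; ++I) {
    if (!((B >> I) & 1))
      continue;
    R.Lo ^= A << I;
    // Bits of A shifted past bit 63 land in the high half (none when I == 0,
    // and shifting by 64 would be undefined, hence the guard).
    if (I != 0)
      R.Hi ^= A >> (64 - I);
  }
  return R;
}

// Models of RV64 clmul (low half) and clmulh (high half).
static uint64_t clmul(uint64_t A, uint64_t B) { return clmulFull(A, B).Lo; }
static uint64_t clmulh(uint64_t A, uint64_t B) { return clmulFull(A, B).Hi; }

// Original pattern: zero-extend the low 32 bits of each operand, then clmul.
uint64_t viaZextInreg(uint64_t X, uint64_t Y) {
  return clmul(X & 0xFFFFFFFFu, Y & 0xFFFFFFFFu);
}

// Selected form: slli by 32 drops the upper bits, and the 128-bit product of
// the shifted operands is the 32-bit-operand product shifted left by 64, so
// it sits entirely in the high half that clmulh returns.
uint64_t viaSlliClmulh(uint64_t X, uint64_t Y) {
  return clmulh(X << 32, Y << 32);
}
```

The same argument with ordinary multiplication gives the MUL->MULHU case.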
AMDGPU: Use generic legality checks instead of checking subtarget feature
Avoid checking predicates on AMDGPUSubtarget when possible. Also add a couple
of tests for the ctlz combine where ffbh isn't legal. I'm not sure what
the point of the previous check was.
[LLVM] Update assert to remove unused variable warning. (#177632)
Remove the variable definition and move the function call directly into
the assert statement. Otherwise builds with -Werror and assertions
disabled would fail.
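A minimal sketch of the pattern follows. A hypothetical side-effect-free `isWellFormed` stands in for the real call. Moving a call into `assert` is only safe when the call has no side effects, because with NDEBUG defined the call is not evaluated at all. When the call must always run, the usual alternative is to keep the variable and mark it `[[maybe_unused]]` or cast it to `void`.

```cpp
#include <cassert>

// Hypothetical side-effect-free predicate standing in for the real call.
static bool isWellFormed(int Value) { return Value >= 0; }

// Before (commented out): with NDEBUG defined, assert() expands to nothing,
// so `Ok` is never read and -Wunused-variable becomes an error under -Werror:
//
//   bool Ok = isWellFormed(Value);
//   assert(Ok && "value must be well formed");
//
// After: the call lives inside the assert, so no variable is left behind.
// With NDEBUG the call is skipped entirely, which is fine only because it
// has no side effects.
int doubleChecked(int Value) {
  assert(isWellFormed(Value) && "value must be well formed");
  return Value * 2;
}

// Alternative when the call must run in every build: keep the variable and
// mark it as possibly unused (C++17).
int doubleCheckedAlways(int Value) {
  [[maybe_unused]] bool Ok = isWellFormed(Value);
  assert(Ok && "value must be well formed");
  return Value * 2;
}
```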
[Support] Avoid misguided FreeBSD hack (#177508)
FreeBSD doesn't do anything wrong here; it just happens to define and
use a `struct thread` in its own headers. The problems arise because
LLVM has `using namespace llvm` before including system headers,
which is bad practice for precisely this reason. If we instead play by
the rules and defer `using namespace llvm` until after we've included
the system headers, we no longer need this hack.
This hack is particularly problematic because, as of 9093ba9f7ee5
("[Support] Include Support/thread.h before api implementations
(#111175)"), it is conditional on __FreeBSD__. On other platforms
Threading.inc can reference anything in Support/thread.h, so such
references only cause errors on FreeBSD, which is precisely what
happened in 64be34c562a2 ("Enable using threads on z/OS (#171847)").
Deferring `using namespace llvm` until after Threading.inc is included
may introduce build failures on untested platforms, where unqualified
identifiers will need to be qualified by prepending `llvm::`.
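The following reduced sketch shows the ordering issue. It uses hypothetical stand-ins for `llvm::thread` and for a FreeBSD header that declares its own `struct thread`. If the using-directive appeared before the stand-in header, the elaborated type `struct thread` in `sysThreadId` would be ambiguous between `::thread` and `llvm::thread`. Placed after the header, it lets the header parse cleanly, and only the LLVM code that follows has to qualify names.

```cpp
// Hypothetical stand-in for llvm::thread from Support/thread.h.
namespace llvm {
class thread {};
} // namespace llvm

// Stand-in for a FreeBSD system header that declares and uses its own
// `struct thread`. No using-directive is active yet, so `struct thread`
// here can only mean ::thread.
struct thread {
  int tid;
};
inline int sysThreadId(const struct thread *T) { return T->tid; }

// Only now, after the "system headers", bring in the llvm namespace.
using namespace llvm;

// From here on an unqualified `thread` would be ambiguous between ::thread
// and llvm::thread, so LLVM code must spell its class llvm::thread.
inline int useBoth() {
  ::thread Sys{42};
  llvm::thread Ours;
  (void)Ours;
  return sysThreadId(&Sys);
}
```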
AMDGPU: Ignore type legality in isFAbsFree (#177630)
This treats fabs as free on targets without legal f16. This
matches the existing logic for fneg, and the two should be the same.
The test changes are mostly neutral with a few improvements.
[HIP] Pass HIP library directly and refactor (#176019)
Summary:
Currently we pass `-L` and `-l` to get the HIP library. Because we are
tied to a single HIP installation, it's far better to pass the library
by filename: the `-L` path could end up ordered after other user
library paths, and a library found there could override it. Someone
using HIP with a specific ROCm installation most likely wants that
installation's library; otherwise incompatibilities can occur. This can
still be overridden with command-line flags if users want to pass a
different library for some reason.
This PR also refactors the handling to be more generic for future
additions.
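The following hypothetical sketch contrasts the two argument styles. `linkBySearch` and `linkByPath` are illustrative helpers, not Clang driver functions, and the `lib<name>.so` file naming assumes a Linux-style shared library.

```cpp
#include <string>
#include <vector>

// Old style: add a search directory and a library name. The linker resolves
// -l against all -L directories in command-line order, so another directory
// searched earlier can supply a different copy of the library.
std::vector<std::string> linkBySearch(const std::string &LibDir,
                                      const std::string &Name) {
  return {"-L" + LibDir, "-l" + Name};
}

// New style: name the library file from the selected installation directly,
// so search-path ordering cannot substitute another copy.
std::vector<std::string> linkByPath(const std::string &LibDir,
                                    const std::string &Name) {
  return {LibDir + "/lib" + Name + ".so"};
}
```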
[profcheck] Fix profile metadata propagation for Large Integer operations (#175862)
This PR improves the propagation of profile metadata within the
ExpandIRInsts pass. When lowering large integer division operations, the
pass now ensures that branch weights are correctly attached to the
generated control flow, preventing the loss of profile data during IR
expansion.
This PR improves signed and unsigned division/remainder for non-native
bit widths (e.g., `sdiv/udiv i129`, `srem/urem i129`) and implements
heuristic-based branch weights, using established heuristics for edge
cases such as division-by-zero guards and magnitude comparisons
between dividends and divisors.
It also adds detailed comments within the expansion logic to explain the
rationale behind specific branch weight choices and the underlying
mathematical invariants.
Please refer to the implementation details in the source code for the
[2 lines not shown]
[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections (#171476)
This patch implements support for constructors/destructors by
introducing the `@@SQINIT` section and emitting `.xtor.<priority>`
sections within the SystemZ AsmPrinter and in the GOFF object lowering
layer.
AMDGPU: Remove dead code configuring f16 is_fpclass (#177626)
isTypeLegal can never be true here: the register classes
are only registered at the end of the target lowering constructor
and in the subclasses.
[NFCI][AMDGPU] Fix the predicate `HasDsSrc2Insts` (#177621)
I'm not sure why the predicate has a `!`, and more surprisingly,
removing it doesn't change anything.
[clang][test] Fix builtin-rotate.c failure on ARM32 (#177290)
Replace unsigned __int128 with unsigned _BitInt(128) since __int128 is
not supported on ARM 32-bit targets.
Fixes https://lab.llvm.org/buildbot/#/builders/79/builds/2754
[VectorCombine] foldShuffleOfBinops - failure to track OperandValueInfo (#171934)
Resolves #170500.
Implemented a static mergeInfo helper that returns the
TTI::OperandValueInfo data common to both inputs.
Added the merged OperandValueInfo `Op0Info` and `Op1Info` to the
NewCost calculation.