[llvm] Fix comment that references deprecated make_scope_exit (#175820)
After #173131 and #174030, make_scope_exit is no longer used in
ThreadPool. Update the comment that still references the old APIs so
that it references the new API instead.
[VPlan] Optimize BranchOnTwoConds to chain of 2 simple branches. (#174016)
This patch improves the lowering for BranchOnTwoConds added in
https://github.com/llvm/llvm-project/pull/172750 by replacing the branch
on OR with a chain of 2 branches.
On Apple M cores, the new lowering is ~8-10% faster for std::find-like
loops. It also makes it easier to determine the early exits in VPlan. I
am also planning extensions to support loops with multiple early
exits and early exits at different positions, which should also be
slightly easier to do with the new representation.
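As a rough scalar illustration of the two control-flow shapes (not VPlan
code), the sketch below contrasts a single branch on the OR of two
conditions with a chain of two simple branches. The `Exit` enum and the
function names are hypothetical and exist only for this example; the
assumption is that the first condition selects the early exit and the
second selects the latch exit.

```cpp
#include <cassert>

// Hypothetical exit labels, used only for this illustration.
enum class Exit { Continue, EarlyExit, LatchExit };

// Shape 1: one branch on (Found || Done), followed by a second test
// that tells the two exits apart.
inline Exit branchOnOr(bool Found, bool Done) {
  if (Found || Done)
    return Found ? Exit::EarlyExit : Exit::LatchExit;
  return Exit::Continue;
}

// Shape 2: a chain of two simple branches, one per condition.
inline Exit branchChain(bool Found, bool Done) {
  if (Found)
    return Exit::EarlyExit;
  if (Done)
    return Exit::LatchExit;
  return Exit::Continue;
}

// Both shapes pick the same successor for every input combination.
inline void checkShapesAgree() {
  for (int F = 0; F < 2; ++F)
    for (int D = 0; D < 2; ++D)
      assert(branchOnOr(F != 0, D != 0) == branchChain(F != 0, D != 0));
}
```

In the chained form each exit has its own branch, which is one way to see
why the early exits are easier to identify.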
PR: https://github.com/llvm/llvm-project/pull/174016
[clang-tools-extra] Update Maintainers for Clang-Doc
Currently, Erick Velez has been doing the bulk of clang-doc development.
The maintainer being removed hasn't participated in almost a year, so it
would be good to have active maintainers listed in the file.
AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.
I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
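A minimal sketch of the encoding described above, in plain C++ rather
than backend code: the raw 16-bit pattern is carried in the low half of a
32-bit value and read back unchanged, with no FP conversion involved. The
helper names are hypothetical, and zeroing the high half is an assumption
of this sketch, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>

// Place a raw 16-bit FP bit pattern in the low half of a 32-bit register
// image. Assumption: the high half is zero in this sketch.
inline uint32_t toRegister(uint16_t Bits) { return uint32_t(Bits); }

// Recover the 16-bit pattern from the low half.
inline uint16_t fromRegister(uint32_t Reg) { return uint16_t(Reg & 0xFFFFu); }

// The bit pattern survives unchanged, including a half-precision
// signaling NaN pattern such as 0x7D00.
inline void checkRoundTrip() {
  assert(fromRegister(toRegister(0x7D00)) == 0x7D00);
}
```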
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7
Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.
Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.
This will help with removal of softPromoteHalfType.
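The packing can be pictured with the following sketch, which is plain C++
and not the calling-convention code. `packHalves` is a hypothetical
helper. It assumes the even element lands in the low half of each 32-bit
register and that an odd trailing element is padded with zero; neither
detail is stated in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 16-bit elements pairwise into 32-bit register images.
// Assumptions: even element in the low half, zero padding for an odd tail.
inline std::vector<uint32_t> packHalves(const std::vector<uint16_t> &Elts) {
  std::vector<uint32_t> Regs;
  for (std::size_t I = 0; I < Elts.size(); I += 2) {
    uint32_t Lo = Elts[I];
    uint32_t Hi = (I + 1 < Elts.size()) ? Elts[I + 1] : 0u;
    Regs.push_back(Lo | (Hi << 16));
  }
  // Two elements per register, rounded up.
  assert(Regs.size() == (Elts.size() + 1) / 2);
  return Regs;
}
```

With this layout a 4-element vector occupies 2 registers, where the
scalarized and promoted form would use 4.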
InstCombine: Implement SimplifyDemandedFPClass for fma
This can't do much filtering on the sources, except for nans.
We can also attempt to introduce ninf/nnan.
ValueTracking: Improve handling for fma/fmuladd
The handling for fma was very basic and only handled the
repeated input case. Re-use the fmul and fadd handling for more
accurate sign bit and nan handling.
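For intuition about the repeated input case, here is a small standalone
check using `std::fma` from `<cmath>`. It is not ValueTracking code, and
the helper name is hypothetical. Under the stated preconditions the
product X*X cannot be negative, so adding a non-negative Y leaves the
sign bit clear and cannot produce a NaN (default rounding mode assumed).

```cpp
#include <cassert>
#include <cmath>

// fma(X, X, Y) computes X*X + Y with a single rounding.
// Assumed preconditions: X is not NaN, Y is not NaN, and Y >= 0.
inline bool squareFmaHasClearSign(double X, double Y) {
  assert(!std::isnan(X) && !std::isnan(Y) && Y >= 0.0);
  double R = std::fma(X, X, Y);
  return !std::isnan(R) && !std::signbit(R);
}
```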
GlobalISel: Fix mishandling vector-as-scalar in return values
This fixes 2 cases when the AMDGPU ABI is fixed to pass <2 x i16>
values as packed on gfx6/gfx7. The ABI does not pack values
currently; this is a pre-fix for that change.
Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.
All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
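The difference between the two paths for a <2 x i16> value returned in a
single 32-bit part can be modeled as below. This is a plain C++ model,
not GlobalISel code; the struct and function names are hypothetical, and
placing element 0 in the low half is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a <2 x i16> value.
struct V2I16 {
  uint16_t Elt0;
  uint16_t Elt1;
};

// Bitcast-style handling: both elements end up in the 32-bit part.
// Assumption: element 0 occupies the low half.
inline uint32_t asPackedI32(V2I16 V) {
  return uint32_t(V.Elt0) | (uint32_t(V.Elt1) << 16);
}

// Model of the scalarize-and-extend path on a single part: only
// element 0 is extended, so the high element is lost.
inline uint32_t scalarizeAndExtend(V2I16 V) { return uint32_t(V.Elt0); }

inline void checkHighElement() {
  V2I16 V{0x1234, 0xABCD};
  assert((asPackedI32(V) >> 16) == 0xABCD);     // high element kept
  assert((scalarizeAndExtend(V) >> 16) == 0x0); // high element dropped
}
```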
[AMDGPU][Test][AIX] use tr instead of sed for line split (#175557)
The test case uses the sed command `sed 's/,/,\n/g'` to split a line.
That does not work with the AIX system's `sed`.
The AIX external BB fails starting from
https://lab.llvm.org/buildbot/#/builders/64/builds/6911
This patch substitutes:
`sed 's/,/,\n/g'`
with:
`tr ',' '\n'`
Because `tr` does not keep the comma, the texts that the test looks for
also had to change: the comma `,` is removed from them, since it is not
needed for correctness.
Co-authored-by: Daniel Chen <cdchen at ca.ibm.com>
[mlir] Use bind_front in RemarkEngine. NFC. (#175818)
Switch from C++11 `std::bind` to C++26 `bind_front` backported in
https://github.com/llvm/llvm-project/pull/175056.
The former is an old design that predates lambdas and uses explicit
placeholders. `bind_front` should produce a much smaller object (we only
need one pointer).
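To illustrate the contrast, the sketch below binds the same call once
with `std::bind` and an explicit placeholder, and once with a small
bind-front-style helper. `Engine`, `addTo`, and `bindFrontSketch` are
hypothetical names for this example; the helper is not the one added in
the PR above and only approximates the idea. Taking the callable as a
template argument is an assumption of this sketch; it means the returned
closure captures nothing but the bound pointer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical types and functions for the illustration.
struct Engine {
  int Base;
};
inline int addTo(const Engine *E, int X) { return E->Base + X; }

// Bind-front-style helper (requires C++17 for the auto template
// parameter): the callable is a template argument, so the closure
// captures only the bound value B.
template <auto Fn, typename Bound> auto bindFrontSketch(Bound B) {
  return [B](auto &&...Args) {
    return Fn(B, std::forward<decltype(Args)>(Args)...);
  };
}

// Old style: explicit placeholder for the remaining argument.
inline int viaBind(const Engine &E, int X) {
  auto Call = std::bind(addTo, &E, std::placeholders::_1);
  return Call(X);
}

// New style: leading argument bound, the rest passed through.
inline int viaBindFront(const Engine &E, int X) {
  auto Call = bindFrontSketch<&addTo>(&E);
  return Call(X);
}

inline void checkSameResult() {
  Engine E{10};
  assert(viaBind(E, 5) == viaBindFront(E, 5));
}
```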
AArch64: Add TBZ/TBNZ matcher for x & (1 << y).
x & (1 << y) is InstCombine's canonical form of a bit test which is
currently code generated literally, missing an opportunity to use TBZ/TBNZ
on bit 0 of x >> y, which generally results in an instruction sequence
that is shorter by 2 instructions. Implement this optimization. On my
machine this results in a 0.05% reduction in clang binary size and a 0.25%
reduction in dynamic instruction count compiling AArch64ISelLowering.cpp.
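The rewrite relies on the two forms of the bit test agreeing for every
in-range shift amount. A standalone C++ check of that equivalence, with
hypothetical function names and assuming a 64-bit x and y < 64:

```cpp
#include <cassert>
#include <cstdint>

// Canonical form: mask out bit Y of X. Precondition: Y < 64.
inline bool testBitMask(uint64_t X, unsigned Y) {
  assert(Y < 64);
  return (X & (uint64_t(1) << Y)) != 0;
}

// Rewritten form: shift bit Y down to position 0, then test bit 0.
inline bool testBitShift(uint64_t X, unsigned Y) {
  assert(Y < 64);
  return ((X >> Y) & 1u) != 0;
}
```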
Reviewers: davemgreen, fhahn
Reviewed By: davemgreen
Pull Request: https://github.com/llvm/llvm-project/pull/172962