[AMDGPU] Enabled GCN trackers (amdgpu-use-amdgpu-trackers) by default.
The LIT tests have been generally updated in one of the following ways:
(1) If the above option was not present and the test was auto-generated,
the test has been re-auto-generated to reflect the new default.
(2) If the above option was not present and the test was not
auto-generated, added the option -amdgpu-use-amdgpu-trackers=0 so as to
preserve any specific attributes the test was already checking.
(3) If the above option was present in a test, then its value has been
updated to reflect the change in the default.
Currently, there are 4 tests in category (2). They are:
CodeGen/AMDGPU/
addrspacecast.ll
schedule-regpressure-limit.ll
schedule-regpressure-limit2.ll
sema-v-unsched-bundle.ll
There are 8 tests in category (3). They are:
[15 lines not shown]
[DirectX] Denote `dx.resource.getpointer` with `IntrInaccessibleMemOnly` and `IntrReadMem` (#193593)
`IntrConvergent` was originally added to `dx.resource.getpointer` to
prevent optimization passes (`SimplifyCFG`, `GVN`) from sinking the
intrinsic out of control flow branches, which would create phi nodes on
the returned pointer.
Using `IntrInaccessibleMemOnly` and `IntrReadMem` semantics still
prevents passes from merging or sinking identical calls across branches.
However, this allows the call to be moved within a single control flow
path.
Updates relevant tests and adds a new test demonstrating an
optimization that is now legal.
This was discovered when
https://github.com/llvm/llvm-project/pull/188792 caused the following
failure:
https://github.com/llvm/llvm-project/actions/runs/24577221310/job/71865579618.
[5 lines not shown]
[SLP] Skip FMulAdd conversion for alt-shuffle FAdd/FSub nodes (#193960)
isAddSubLikeOp() admits alt-shuffle nodes that mix FAdd and FSub, so
transformNodes() was marking them with CombinedOp = FMulAdd. The cost
model then priced the node as a single llvm.fmuladd vector intrinsic,
but emission for an alt shuffle still goes through the ShuffleVector
path and produces fmul + fadd + fsub + shufflevector, which the backend
cannot fuse into a single fmuladd. The resulting under-count made SLP
choose the vector form over the scalar form even when the scalar form
lowers to real FMAs (e.g. fmadd + fmsub on AArch64).
[clang][CIR] Add lowering for vcvtd_n_ and vcvts_n_ conversion intrinsics (#190961) (#193273)
This PR adds lowering for the missing conversion intrinsics with an
immediate argument (identified by `_n_` in the intrinsic name), namely
the `vcvts_n_` and `vcvtd_n_` variants.
It also moves the corresponding tests from:
* clang/test/CodeGen/AArch64/neon_intrinsics.c
to:
* clang/test/CodeGen/AArch64/neon/intrinsics.c
The lowering follows the existing implementation in
CodeGen/TargetBuiltins/ARM.cpp.
Reference:
[1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#conversions
[ThinLTO] Reduce the number of renamings due to promotions in distributed mode (#188074)
For ThinLTO, pull request [1] reduced the number of renamings due to
promotions in in-process mode. This has been used in the Linux kernel
([2]), as it helps kernel live patching a lot.
Recently, I found that Rong Xu has added ThinLTO distributed-mode
support for the Linux kernel ([3]), and it is likely to be merged into
the kernel as well. So it would be good for LLVM to support reducing
the number of renamings in distributed mode too.
To implement this, in gatherImportedSummariesForModule(), import
functions into the summaries if those functions do not need renaming.
This ensures that imported functions keep the same names as in their
original modules.
[1] https://github.com/llvm/llvm-project/pull/183793
[3 lines not shown]
CodeGen: Fix double counting bundles in inst size verification (#191460)
The AMDGPU implementation handles bundles by summing the
member instructions. This was starting with the size of the
bundle instruction, then re-adding all of the same instructions.
This loop is over the iterator, not instr_iterator, so it should
not be looking through the bundled instructions. Most of the other
uses of getInstSizeInBytes are also on the iterator, not the
instr_iterator, so the convention seems to be that targets need to
handle BUNDLE correctly themselves.
[ARM] Fold SELECT (AND(X,1) == 0), C1, C2 -> XOR(C1,AND(NEG(AND(X,1)),XOR(C1,C2)) in Thumb1 (#185898)
Thumb1 has no native cmov, so this is a better solution.
[flang] Fix abort on invalid -fdo-concurrent-to-openmp value. (#193929)
We observed that the following command can cause an assertion failure:
`flang -fopenmp -fdo-concurrent-to-openmp=devic,e <file>`
It happened because `parseDoConcurrentMapping` reported an error but
still called `val.value()` on the failure path, tripping std::optional
assertions.
The fix is to return false on error and wire that result into
`createFromArgs`.