[AMDGPU][GlobalIsel] Add regbank support for cvt_scalef32_sr_pk_f6_f116/32 intrinsics (#192745)
This patch adds register bank legalization rules for
cvt_scalef32_sr_pk_f6_f116/32 intrinsics in the AMDGPU GlobalISel
pipeline.
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking the linker aggregates resource usage across TUs via
`.amdgpu.info`, so compile-time pessimism and call-graph propagation duplicate
the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
[4 lines not shown]
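As a rough illustration of the `[kind][len][payload]` layout described above, the following sketch walks a byte buffer and splits it into entries. It is not the code from the patch: the `InfoEntry` struct and `parseInfoEntries` function are hypothetical names, and the sketch deliberately does not interpret kind values such as `INFO_FUNC`, since their numeric encoding is not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical container for one decoded entry: the kind byte plus a copy
// of its payload bytes. Not a type from the patch.
struct InfoEntry {
  std::uint8_t kind;
  std::vector<std::uint8_t> payload;
};

// Walks a buffer laid out as repeated [kind: u8][len: u8][payload] records.
// Returns std::nullopt if a header or payload is truncated.
std::optional<std::vector<InfoEntry>>
parseInfoEntries(const std::vector<std::uint8_t> &buf) {
  std::vector<InfoEntry> entries;
  std::size_t pos = 0;
  while (pos < buf.size()) {
    if (buf.size() - pos < 2)
      return std::nullopt; // kind/len header cut short
    const std::uint8_t kind = buf[pos];
    const std::size_t len = buf[pos + 1];
    pos += 2;
    if (buf.size() - pos < len)
      return std::nullopt; // payload cut short
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    const auto last = first + static_cast<std::ptrdiff_t>(len);
    entries.push_back(InfoEntry{kind, std::vector<std::uint8_t>(first, last)});
    pos += len;
  }
  return entries;
}
```

Because the length is a single byte, a payload in this layout can hold at most 255 bytes, which is consistent with longer string data living in the companion `.amdgpu.strtab` section.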
[RISCV] Cost UDIV/UREM by a constant power of 2 as a SHL/AND in getArithmeticInstrCost() (#179570)
Similar to behavior in X86 and AArch64.
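The arithmetic identity behind this costing can be sketched as follows. This is only an illustration of why a single shift or mask suffices, not the cost-model code itself; the helper names are hypothetical. Note that for an unsigned divide the shift is a logical right shift.

```cpp
#include <cassert>
#include <cstdint>

// True when c is a nonzero power of two.
inline bool isPowerOf2(std::uint32_t c) { return c != 0 && (c & (c - 1)) == 0; }

// Returns k such that c == 2^k. Requires c to be a power of two.
inline unsigned log2Exact(std::uint32_t c) {
  assert(isPowerOf2(c));
  unsigned k = 0;
  while (c > 1) {
    c >>= 1;
    ++k;
  }
  return k;
}

// x / 2^k is a logical right shift by k.
inline std::uint32_t udivPow2(std::uint32_t x, std::uint32_t c) {
  return x >> log2Exact(c);
}

// x % 2^k keeps the low k bits, i.e. an AND with (2^k - 1).
inline std::uint32_t uremPow2(std::uint32_t x, std::uint32_t c) {
  assert(isPowerOf2(c));
  return x & (c - 1);
}
```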
---------
Co-authored-by: Ryan Buchner <rbuchner at qti.qualcomm.com>
Co-authored-by: Luke Lau <luke_lau at icloud.com>
Revert "Reland "[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)"" (#192741)
Reverts llvm/llvm-project#190642
A bisect shows this as the change leading to the link failure at
https://g-issues.fuchsia.dev/issues/503377901
[clang-tidy] Prevent false-positive in presence of derived-to-base cast in bugprone-use-after-move (#189638)
The following scenario is quite common, but was reported as a
use-after-move:
```cpp
#include <utility>

struct Base {
  Base(Base&&);
};
struct C : Base {
  int field;
  C(C&& c) :
    Base(std::move(c)), // << only moves through the base type
    field(c.field)      // << this is a valid use-after-move
  {}
};
```
Fix this by checking field origin when the moved value is immediately
cast to base.
[AArch64][GlobalISel] Fix nonterminating legalization for <8 x s4> vectors. (#192747)
G_CONCAT_VECTORS with <16 x s4> sources hits the bitcast legalization
path, which round-trips through scalar types (e.g. s32) and regenerates
<8 x s4> vectors via G_UNMERGE_VALUES and G_BUILD_VECTOR. The
G_BUILD_VECTOR is then widened to <8 x s8> (via .minScalarOrElt(0, s8)),
producing G_ANYEXT/G_TRUNC artifact pairs. The artifact combiner folds
these pairs away, restoring the original <8 x s4> types, which feeds
back into G_CONCAT_VECTORS again.
This change:
* Adds .minScalarOrElt(1, s8) to the G_ICMP rules to ensure operand
vector elements are at least s8. This causes <16 x s4> operands to be
widened to <16 x s8>, and the result type follows via minScalarEltSameAs.
* Adds custom legalization for G_CONCAT_VECTORS when element size < 8.
The custom handler widens source operands via G_ANYEXT (e.g.
<8 x s4> -> <8 x s8>), concats the widened vectors (producing a
[6 lines not shown]
security/nss: update to 3.123
Announcement:
https://groups.google.com/a/mozilla.org/g/dev-tech-crypto/c/AW6VHkn6E0o
Patch patch-lib_softoken_pkcs11c.c was dropped - it is unclear if it
was still relevant. The last discussion of the problem this patch was
supposed to fix happened >15 years ago, and nothing came out of that.
[flang] NameUniquer helper for detecting module-scope data (#192733)
Add NameUniquer::isModuleScopeDataUniquedName to detect uniqued names
for module-scope data (variables, named constants, and common blocks),
excluding procedures and other prefixed symbols.
[InstCombine] Fold bitcast into vp.load (#192173)
Similar to normal loads, we should be able to fold a bitcast into
`vp.load` if (1) the mask is all-ones and (2) either the new vector type has a
larger known minimum length than the original vector, or the original EVL is
exactly divisible by the factor by which the known minimum length
decreases.
This patch adds this fold, though it only supports cases where
the new vector type has a larger known minimum length.
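One way to read the two conditions is as a rule for rescaling the EVL when the element count changes. The sketch below is an interpretation of the commit message, not InstCombine code: `foldedEVL` and its parameters are hypothetical, it assumes the two known minimum lengths divide one another evenly, and it also models the narrowing case that the patch itself does not implement.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the EVL to use after folding the bitcast,
// or std::nullopt when the fold is not legal under the stated conditions.
std::optional<std::uint64_t> foldedEVL(bool maskIsAllOnes,
                                       std::uint64_t oldMinElts,
                                       std::uint64_t newMinElts,
                                       std::uint64_t evl) {
  if (!maskIsAllOnes || oldMinElts == 0 || newMinElts == 0)
    return std::nullopt;
  if (newMinElts >= oldMinElts) {
    // More (narrower) elements: every old lane maps to `factor` new lanes.
    if (newMinElts % oldMinElts != 0)
      return std::nullopt;
    return evl * (newMinElts / oldMinElts);
  }
  // Fewer (wider) elements: legal only if the EVL divides exactly.
  // The commit message describes this case, but the patch does not handle it.
  if (oldMinElts % newMinElts != 0)
    return std::nullopt;
  const std::uint64_t factor = oldMinElts / newMinElts;
  if (evl % factor != 0)
    return std::nullopt;
  return evl / factor;
}
```

For example, under these assumptions a load of 4 elements with an EVL of 3 that is bitcast to 8 elements would use an EVL of 6, while narrowing from 8 to 4 elements with an EVL of 5 would be rejected because 5 is not divisible by 2.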
Merge tag 'hwlock-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock updates from Bjorn Andersson:
"Remove the unused u8500 hardware spinlock driver, and clean out the
hwspinlock_pdata struct as this was the last user of the struct"
* tag 'hwlock-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
hwspinlock: remove now unused pdata from header file
hwspinlock: u8500: delete driver
Merge tag 'rpmsg-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
"Mark 'data' argument in rpmsg_send() const, and percolate to related
drivers. Replace deprecated class_destroy() with class_unregister()"
* tag 'rpmsg-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
media: platform: mtk-mdp3: Constify buffer passed to mdp_vpu_sendmsg()
ASoC: qcom: Constify GPR packet being sent over GPR interface
rpmsg: Constify buffer passed to send API
remoteproc: mtk_scp: Constify buffer passed to scp_send_ipi()
remoteproc: mtk_scp_ipi: Constify buffer passed to scp_ipi_send()
drivers: rpmsg: class_destroy() is deprecated