Revert "[lldb] Rally around triple rather than arch in the API tests (#191416)" (#192763)
Temporarily reverting to unblock CI while we investigate the TestMacCatalyst.py
and TestRosetta.py failures introduced by this PR.
This reverts commit 86397f49c7725f35a51517a8290cb4207c97771d.
[BPF] Handle aliases in CodeGenModule::EmitExternalDeclaration. Fixes #192365 (#192374)
Adds handling of global aliases in
CodeGenModule::EmitExternalDeclaration. This fixes a clang crash on some
real-world code; see llvm#192365.
[lldb] Store the dummy target in the selected execution context (#190496)
Store the dummy target in the selected execution context. There's no
reason for everybody to have to independently fall back to the dummy
target.
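Centralizing the fallback can be sketched as follows. `Debugger`, `Target`, `selectTarget`, and `selectedContextTarget` are hypothetical, simplified stand-ins, not LLDB's actual classes or methods. The point is only that the fallback to the dummy target happens in one place instead of at every call site.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-ins; these are not LLDB's real classes.
struct Target {
  bool isDummy = false;
};

class Debugger {
public:
  Debugger() : dummy_(std::make_shared<Target>()) { dummy_->isDummy = true; }

  void selectTarget(std::shared_ptr<Target> target) {
    selected_ = std::move(target);
  }

  // The selected execution context always carries a target: the selected
  // target if there is one, otherwise the dummy target. Callers no longer
  // implement this fallback themselves.
  std::shared_ptr<Target> selectedContextTarget() const {
    assert(dummy_ && "the dummy target always exists");
    return selected_ ? selected_ : dummy_;
  }

private:
  std::shared_ptr<Target> dummy_;
  std::shared_ptr<Target> selected_;
};
```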
[mlir][tensor] Remove unit-stride restriction in InsertSliceOp folding (#192600)
This PR replaces manual offset/size resolution with `affine::mergeOffsetsSizesAndStrides`, simplifying the code and extending subview-of-subview folding to support non-unit strides.
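The arithmetic behind folding a slice of a slice with arbitrary strides can be sketched with static integers. Element `i` of the inner slice sits at outer index `inner.offset + i * inner.stride`, which maps to base index `outer.offset + (inner.offset + i * inner.stride) * outer.stride`, so the merged slice has offset `outer.offset + inner.offset * outer.stride`, the inner size, and stride `outer.stride * inner.stride`. `Slice1D` and `composeSlices` are hypothetical names; the real `affine::mergeOffsetsSizesAndStrides` works on `OpFoldResult` values and can handle dynamic quantities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Static slice parameters for one dimension (hypothetical helper type).
struct Slice1D {
  int64_t offset;
  int64_t size;
  int64_t stride;
};

// Compose `inner`, expressed in the coordinates of the `outer` slice, into a
// single slice of the original tensor.
Slice1D composeSlices(const Slice1D &outer, const Slice1D &inner) {
  assert((inner.size == 0 ||
          inner.offset + (inner.size - 1) * inner.stride < outer.size) &&
         "inner slice must lie within the outer slice");
  return {outer.offset + inner.offset * outer.stride, inner.size,
          outer.stride * inner.stride};
}

// Compose dimension by dimension.
std::vector<Slice1D> composeSlices(const std::vector<Slice1D> &outer,
                                   const std::vector<Slice1D> &inner) {
  assert(outer.size() == inner.size());
  std::vector<Slice1D> merged;
  for (std::size_t d = 0; d < outer.size(); ++d)
    merged.push_back(composeSlices(outer[d], inner[d]));
  return merged;
}
```

With non-unit strides the only change from the unit-stride case is the multiplication by `outer.stride`, which is why the restriction can be lifted.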
[UBSan][test] Make aggregate alignment test precise for Darwin
Darwin adds an alignment check on dest, which was causing the test to fail.
rdar://120802910
[NewPM] Port AArch64RedundantCondBranch to the new pass manager (#190897)
Adds a new pass manager (NewPM) version of AArch64RedundantCondBranch:
- Refactors the base logic into an Impl class
- Renames the old pass with the "Legacy" suffix
- Adds the new pass manager pass using the refactored logic
- Updates the existing .mir tests to also run with the new pass manager.
Context and motivation in
https://llvm.org/docs/NewPassManager.html#status-of-the-new-and-legacy-pass-managers
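The Impl-class refactoring can be sketched in standalone C++. Everything here (`MachineFunctionStub`, `RedundantCondBranchImpl`, `RedundantCondBranchLegacy`, `RedundantCondBranchPass`, `Preserved`) is a hypothetical stand-in, not LLVM's `MachineFunctionPass` or `PreservedAnalyses` API; it only shows both wrappers delegating to one shared implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a machine function (not LLVM's MachineFunction):
// it only records how many redundant conditional branches it contains.
struct MachineFunctionStub {
  std::string name;
  int redundantBranches = 0;
};

// Shared logic that both pass wrappers call (the "Impl" class).
class RedundantCondBranchImpl {
public:
  // Returns true if the function was changed.
  bool run(MachineFunctionStub &MF) {
    assert(MF.redundantBranches >= 0);
    bool changed = MF.redundantBranches > 0;
    MF.redundantBranches = 0; // stands in for deleting the redundant branches
    return changed;
  }
};

// Legacy-style wrapper: reports a "changed" flag.
struct RedundantCondBranchLegacy {
  bool runOnMachineFunction(MachineFunctionStub &MF) {
    return RedundantCondBranchImpl().run(MF);
  }
};

// New-PM-style wrapper: reports what is preserved instead of a bool
// (a simplified stand-in for PreservedAnalyses).
enum class Preserved { All, None };

struct RedundantCondBranchPass {
  Preserved run(MachineFunctionStub &MF) {
    return RedundantCondBranchImpl().run(MF) ? Preserved::None
                                             : Preserved::All;
  }
};
```

Keeping the transformation in one class means both pass managers run identical logic, which is what lets the same .mir tests be run under either.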
[NVPTX] Constant fold blockDim when reqntid is specified (#191575)
Currently, NVPTX cannot fold the `ntid.x/y/z` intrinsic calls into constant
values when `reqntid` is specified, which blocks further optimization.
Therefore, this change extends the `NVVMIntrRange` pass to:
- Tighten `ntid.x/y/z` intrinsic calls to a single-value range, which a
later InstCombine pass can constant-fold
- Tighten `tid.x/y/z` range attributes to use per-dimension `reqntid`
bounds
- If `.reqntid` exceeds hardware limits, the result is garbage-in/garbage-out
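The range computation described above can be sketched as follows, assuming all three `reqntid` dimensions are known. `Range`, `ThreadRanges`, and `rangesFromReqntid` are hypothetical names, not the `NVVMIntrRange` implementation. Ranges are half-open `[lo, hi)`, matching LLVM `range` attributes, so a range with `hi == lo + 1` pins the value to a constant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Half-open value range [lo, hi), like an LLVM range attribute.
struct Range {
  uint32_t lo;
  uint32_t hi;
  bool isSingleValue() const { return hi == lo + 1; }
};

// Hypothetical result type: ranges for ntid.{x,y,z} and tid.{x,y,z}.
struct ThreadRanges {
  std::array<Range, 3> ntid;
  std::array<Range, 3> tid;
};

ThreadRanges rangesFromReqntid(const std::array<uint32_t, 3> &reqntid) {
  ThreadRanges r{};
  for (std::size_t d = 0; d < 3; ++d) {
    assert(reqntid[d] > 0 && "a block dimension must be non-zero");
    // ntid.d is exactly reqntid[d]: a single-value range that a later
    // InstCombine can turn into a constant.
    r.ntid[d] = {reqntid[d], reqntid[d] + 1};
    // tid.d uses its own dimension's bound: 0 <= tid.d < reqntid[d].
    r.tid[d] = {0, reqntid[d]};
    // No hardware-limit check here: an out-of-range reqntid simply yields
    // out-of-range results (garbage-in/garbage-out).
  }
  return r;
}
```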
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking, the linker aggregates resource usage across TUs via
`.amdgpu.info`, so compile-time pessimism and call-graph propagation would
either duplicate the linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
max/or expressions.
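The difference between the two reporting modes can be sketched for a single resource (a VGPR count). `FuncInfo`, `Module`, `reportedVGPRs`, and the pessimistic constant for unknown callees are hypothetical simplifications, not `AMDGPUResourceUsageAnalysis` or `AMDGPUMCResourceInfo` code. Without object linking, a function's value is a max over its call graph, with a conservative guess for callees the compiler cannot see; with object linking, only the local value is reported and the linker does the combining.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-function record: locally used VGPRs plus direct callees.
struct FuncInfo {
  int localVGPRs = 0;
  std::vector<std::string> callees;
};

using Module = std::map<std::string, FuncInfo>;

// Hypothetical pessimistic guess for callees not defined in this TU.
constexpr int kAssumedExternalVGPRs = 64;

// Max over the function and everything it transitively calls.
int propagatedVGPRs(const Module &m, const std::string &fn,
                    std::set<std::string> &visiting) {
  auto it = m.find(fn);
  if (it == m.end())
    return kAssumedExternalVGPRs; // unknown callee: conservative bump
  int result = it->second.localVGPRs;
  if (!visiting.insert(fn).second)
    return result; // already on the call path (recursion): stop here
  for (const std::string &callee : it->second.callees)
    result = std::max(result, propagatedVGPRs(m, callee, visiting));
  visiting.erase(fn);
  return result;
}

int reportedVGPRs(const Module &m, const std::string &fn, bool objectLinking) {
  if (objectLinking)
    return m.at(fn).localVGPRs; // local constant; the linker aggregates
  std::set<std::string> visiting;
  return propagatedVGPRs(m, fn, visiting);
}
```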
[AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format in which each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
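The `[kind: u8] [len: u8] [payload]` layout can be sketched as a small encoder and decoder. The numeric kind values, the `InfoEntry` type, and the helper names are hypothetical, and the payloads are plain bytes: in a real relocatable object, a symbol reference would also need a relocation. Because every entry carries its length, a reader can skip kinds it does not understand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical kind values; the real INFO_* numbering may differ.
enum InfoKind : uint8_t {
  INFO_FUNC = 1,      // opens a function scope
  INFO_NUM_VGPRS = 2, // example per-function attribute
};

struct InfoEntry {
  uint8_t kind;
  std::vector<uint8_t> payload;
};

// Append one entry as [kind: u8] [len: u8] [payload: <len> bytes].
void appendEntry(std::vector<uint8_t> &out, uint8_t kind,
                 const std::vector<uint8_t> &payload) {
  if (payload.size() > 0xFF)
    throw std::length_error("payload does not fit in a u8 length");
  out.push_back(kind);
  out.push_back(static_cast<uint8_t>(payload.size()));
  out.insert(out.end(), payload.begin(), payload.end());
}

// Decode a buffer back into entries, rejecting truncated input.
std::vector<InfoEntry> decodeEntries(const std::vector<uint8_t> &buf) {
  std::vector<InfoEntry> entries;
  std::size_t pos = 0;
  while (pos < buf.size()) {
    if (buf.size() - pos < 2)
      throw std::runtime_error("truncated entry header");
    uint8_t kind = buf[pos];
    uint8_t len = buf[pos + 1];
    pos += 2;
    if (buf.size() - pos < len)
      throw std::runtime_error("truncated entry payload");
    auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos);
    entries.push_back({kind, std::vector<uint8_t>(first, first + len)});
    pos += len;
  }
  return entries;
}
```

The one-byte length keeps entries compact at the cost of capping each payload at 255 bytes; longer data such as type signatures fits the design of storing strings in the companion string table.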
[4 lines not shown]
[AMDGPU][GlobalIsel] Add regbank support for cvt_scalef32_sr_pk_f6_f116/32 intrinsics (#192745)
This patch adds register bank legalization rules for
cvt_scalef32_sr_pk_f6_f116/32 intrinsics in the AMDGPU GlobalISel
pipeline.
[RISCV] Cost UDIV/UREM by a constant power of 2 as a SHL/AND in getArithmeticInstrCost() (#179570)
Similar to behavior in X86 and AArch64.
---------
Co-authored-by: Ryan Buchner <rbuchner at qti.qualcomm.com>
Co-authored-by: Luke Lau <luke_lau at icloud.com>
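The cost change reflects that unsigned division or remainder by a constant power of two needs no divide instruction: `x / 2^k` is a single logical right shift by `k`, and `x % 2^k` is a single AND with `2^k - 1`. A minimal sketch of those identities (not the RISC-V cost-model code; helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// True if c is a non-zero power of two.
constexpr bool isPowerOf2(uint64_t c) { return c != 0 && (c & (c - 1)) == 0; }

// log2 of an exact power of two.
constexpr unsigned log2Exact(uint64_t c) {
  unsigned k = 0;
  while (c > 1) {
    c >>= 1;
    ++k;
  }
  return k;
}

// Unsigned division by 2^k: one logical right shift.
uint64_t udivPow2(uint64_t x, uint64_t c) {
  assert(isPowerOf2(c));
  return x >> log2Exact(c);
}

// Unsigned remainder by 2^k: keep the low k bits with one AND.
uint64_t uremPow2(uint64_t x, uint64_t c) {
  assert(isPowerOf2(c));
  return x & (c - 1);
}
```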
Revert "Reland "[LTO][LLD] Prevent invalid LTO libfunc transforms (#164916)"" (#192741)
Reverts llvm/llvm-project#190642
A bisect shows this as the change leading to the link failure at
https://g-issues.fuchsia.dev/issues/503377901
[clang-tidy] Prevent false positive in the presence of a derived-to-base cast in bugprone-use-after-move (#189638)
The following scenario is quite common, but was reported as a
use-after-move:
```cpp
#include <utility>

struct Base {
  Base(Base&&);
};

struct C : Base {
  int field;
  C(C&& c)
      : Base(std::move(c)),  // only moves the Base subobject of `c`
        field(c.field)       // valid: `field` was not moved from
  {}
};
```
Fix this by checking where the accessed field is declared when the moved
value is immediately cast to a base class.
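A self-contained variant of the example, with hypothetical `std::string` members added so the effect is observable, shows why reading `c.field` is safe: `Base(std::move(c))` binds `c` to `Base&&`, so the base move constructor can only touch the `Base` subobject.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Base {
  std::string name;
  Base(std::string n) : name(std::move(n)) {}
  // Moves only what Base owns.
  Base(Base &&other) : name(std::move(other.name)) {}
};

struct C : Base {
  std::string field;
  C(std::string n, std::string f) : Base(std::move(n)), field(std::move(f)) {}
  // std::move(c) is cast to Base&& here, so c.field is still intact when it
  // is read on the next line.
  C(C &&c) : Base(std::move(c)), field(c.field) {}
};
```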
[AArch64][GlobalISel] Fix nonterminating legalization for <8 x s4> vectors. (#192747)
G_CONCAT_VECTORS with <16 x s4> sources hits the bitcast legalization
path, which round-trips through scalar types (e.g. s32) and regenerates
<8 x s4> vectors via G_UNMERGE_VALUES and G_BUILD_VECTOR. The
G_BUILD_VECTOR is then widened to <8 x s8> (via .minScalarOrElt(0, s8)),
producing G_ANYEXT/G_TRUNC artifact pairs. The artifact combiner folds
these pairs away, restoring the original <8 x s4> types, which feeds
back into G_CONCAT_VECTORS again.
This change:
* Adds .minScalarOrElt(1, s8) to the G_ICMP rules to ensure operand
vector elements are at least s8. This causes <16 x s4> operands to be
widened to <16 x s8>, and the result type follows via minScalarEltSameAs.
* Adds custom legalization for G_CONCAT_VECTORS when element size < 8.
The custom handler widens source operands via G_ANYEXT (e.g.
<8 x s4> -> <8 x s8>), concats the widened vectors (producing a
[6 lines not shown]
[flang] NameUniquer helper for detecting module-scope data (#192733)
Add NameUniquer::isModuleScopeDataUniquedName to detect uniqued names
for module-scope data (variables, named constants, and common blocks),
excluding procedures and other prefixed symbols.
[InstCombine] Fold bitcast into vp.load (#192173)
As with ordinary loads, we should be able to fold a bitcast into
`vp.load` if (1) the mask is all-ones, and (2) either the new vector type
has a larger known minimum length than the original vector type, or the
original EVL is exactly divisible by the factor by which the known
minimum length decreases.
This patch adds this fold, but only supports cases where the new vector
type has a larger known minimum length.
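The supported case can be sketched as a legality check plus an EVL rescaling. `VecTy` and `foldedEVL` are hypothetical, and the divisibility check on the element counts is an assumption of this sketch rather than a condition stated above: when each original element splits into `factor` new elements, the number of active lanes scales by the same factor.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of <vscale x minElts x iN> (or a fixed vector).
struct VecTy {
  uint64_t minElts;
  uint64_t eltBits;
  bool scalable;
};

// Returns the EVL for the bitcast type, or nullopt if the fold is not done.
std::optional<uint64_t> foldedEVL(const VecTy &from, const VecTy &to,
                                  bool maskAllOnes, uint64_t evl) {
  assert(from.minElts > 0 && to.minElts > 0);
  // A bitcast must preserve the total (minimum) size in bits.
  if (from.scalable != to.scalable ||
      from.minElts * from.eltBits != to.minElts * to.eltBits)
    return std::nullopt;
  // Condition (1): the mask must be all-ones.
  if (!maskAllOnes)
    return std::nullopt;
  // Only the "larger known minimum length" case is handled.
  if (to.minElts <= from.minElts || to.minElts % from.minElts != 0)
    return std::nullopt;
  uint64_t factor = to.minElts / from.minElts;
  return evl * factor;
}
```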
[CIR] Fix typeinfo linkage and comdat (#192721)
We weren't properly setting the linkage on typeinfo objects, leading to
multiple-definition link errors when the typeinfo for a class was
referenced in multiple source files. We had the correct linkage
available in the buildTypeInfo function, but we weren't doing anything
with it. This also prevented us from hitting the diagnostic saying that
we should have set the comdat attribute for the typeinfo. This change
fixes both of those problems.