[AMDGPU] Change static NOP last terminator SI_DEMOTE_I1 to be replaced by S_BRANCH instead of assert (#204649)
This issue was first discovered in downstream testing. A specific
chain of transformations on a ballot instruction with a constant
argument followed by an llvm.amdgcn.wqm.demote call leads to
`SI_DEMOTE_I1 -1, 0` being the last terminator of a block
with a single successor. This instruction is a NOP and can safely be
replaced with an S_BRANCH to the block's successor instead of hitting an
assertion failure.
The test added in this change is a heavily simplified recreation of the
pattern seen in the downstream shader compilation that led to the
assertion failure.
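
The replacement rule can be sketched on a toy block model. This is a minimal illustration, not the AMDGPU backend code: `Instr`, `Block`, `isStaticNopDemote`, and `replaceNopDemoteTerminator` are hypothetical names, and the NOP check only recognizes the operand pattern `-1, 0` named in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for machine IR; these are NOT LLVM's MachineInstr classes.
enum class Opcode { SI_DEMOTE_I1, S_BRANCH, Other };

struct Instr {
  Opcode op;
  int64_t cond;      // SI_DEMOTE_I1: first (constant) operand
  int64_t killValue; // SI_DEMOTE_I1: second operand
  int target;        // S_BRANCH: destination block id, -1 otherwise
};

struct Block {
  std::vector<Instr> instrs;
  std::vector<int> successors;
};

// Assumption: only the exact pattern from the commit message is treated
// as a static NOP in this sketch.
bool isStaticNopDemote(const Instr &i) {
  return i.op == Opcode::SI_DEMOTE_I1 && i.cond == -1 && i.killValue == 0;
}

// If the last terminator is a NOP demote and the block has exactly one
// successor, turn it into an unconditional branch to that successor.
bool replaceNopDemoteTerminator(Block &b) {
  if (b.instrs.empty() || b.successors.size() != 1)
    return false;
  Instr &last = b.instrs.back();
  if (!isStaticNopDemote(last))
    return false;
  last = Instr{Opcode::S_BRANCH, 0, 0, b.successors.front()};
  return true;
}
```

A block with two successors is left untouched in this sketch, since a single unconditional branch could not stand in for its terminator.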
[CIR] Lower byval/byref args in CallConvLowering (#201717)
ArgKind::Indirect arguments were hitting an errorNYI in
CIRABIRewriteContext. Add the lowering: in the callee the block argument
type changes to !cir.ptr<T>, a load is inserted at entry so the body sees
the original value type, and llvm.byval or llvm.byref is attached based on
ownership. At call sites, both byval and byref are lowered by allocating a
stack slot, copying the value in, and passing the pointer.
For byval, llvm.noalias and llvm.noundef are also added -- llvm.noalias
because the call-site rewrite always produces a fresh alloca+store
(equivalent to -fpass-by-value-is-noalias), and llvm.noundef because the
copy is always fully defined. byref carries only llvm.byref and llvm.align
since it does not assert exclusive ownership.
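
As a rough source-level analogy for the rewrite described above (hand-written C++, not CIR output; `Big`, `sumByValue`, `sumIndirect`, and `callIndirect` are hypothetical names), the callee takes a pointer and loads the value at entry, while the call site copies into a fresh stack slot and passes its address:

```cpp
#include <cassert>

struct Big {
  long a, b, c, d;
};

// Original shape: the aggregate is passed by value.
long sumByValue(Big v) {
  v.a += 1; // mutates only the callee's copy
  return v.a + v.b + v.c + v.d;
}

// Callee after the rewrite: the parameter becomes a pointer, and a load
// at entry gives the body the value type it originally expected.
long sumIndirect(Big *slot) {
  Big v = *slot; // load inserted at function entry
  v.a += 1;
  return v.a + v.b + v.c + v.d;
}

// Call site after the rewrite: allocate a stack slot, copy the value in,
// and pass the pointer. The slot is fresh for every call, which is what
// makes a noalias-style guarantee plausible for the byval case.
long callIndirect(const Big &original) {
  Big slot = original; // alloca + store
  return sumIndirect(&slot);
}
```

Both call paths return the same result and leave the caller's object unchanged, which is the behavior the lowering has to preserve.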
Preserve descriptor strides for iterator map bounds
Iterator map lowering keeps a stable array base pointer and describes the
selected element or section with bounds inside omp.iterator. For boxed arrays,
those bounds must use the byte stride stored for each descriptor dimension.
Use fir.box_dims for boxed iterator map bounds and pass through each
dimension's descriptor byte stride. This preserves non-contiguous
assumed-shape actuals while keeping unit element strides for non-box arrays.
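
To illustrate why the descriptor's byte stride matters, here is a minimal sketch with a simplified one-dimensional descriptor. `Dim`, `Box1D`, and `elementAddress` are hypothetical and far simpler than a real Fortran descriptor; the point is only that the address step comes from the descriptor rather than from the element size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified descriptor dimension (not the runtime layout).
struct Dim {
  std::ptrdiff_t lowerBound;
  std::ptrdiff_t extent;
  std::ptrdiff_t byteStride; // distance in bytes between consecutive elements
};

struct Box1D {
  const char *base; // stable base address of the described data
  Dim dim;
};

// Address of the element at `index`, stepping by the descriptor's byte
// stride instead of assuming sizeof(int) between elements.
const int *elementAddress(const Box1D &box, std::ptrdiff_t index) {
  std::ptrdiff_t offset = (index - box.dim.lowerBound) * box.dim.byteStride;
  return reinterpret_cast<const int *>(box.base + offset);
}
```

For a section that takes every other element of an `int` array, the byte stride is `2 * sizeof(int)`; using a unit element stride there would read the wrong elements.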
[X86] Hoist getMOVriOpcode to X86InstrInfo.h and share it, NFC (#205187)
The x86 backend often needs to materialize potentially 64-bit immediates
into registers, and the logic to pick between the available opcodes
exists in at least three places. Move this to X86InstrInfo.h so it can be
shared across the x86 backend without copying it.
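
The kind of choice such a helper makes can be sketched as follows. This is an assumption-laden illustration, not the code that was hoisted: the enum is a local stand-in whose labels merely echo x86 move-immediate forms, and the selection order is one plausible reading of "pick the smallest encoding that represents the immediate".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustrative labels only; not the backend's opcode enum.
enum class MovImm { MOV32ri, MOV32ri64, MOV64ri32, MOV64ri };

// Hypothetical selector: prefer shorter encodings when the immediate fits.
MovImm pickMovImm(bool use64BitReg, int64_t imm) {
  if (!use64BitReg)
    return MovImm::MOV32ri;
  // Fits in 32 bits unsigned: a 32-bit move zero-extends into the 64-bit register.
  if (imm >= 0 && imm <= std::numeric_limits<uint32_t>::max())
    return MovImm::MOV32ri64;
  // Fits in 32 bits signed: a sign-extended 32-bit immediate is enough.
  if (imm >= std::numeric_limits<int32_t>::min() &&
      imm <= std::numeric_limits<int32_t>::max())
    return MovImm::MOV64ri32;
  // Otherwise a full 64-bit immediate is required.
  return MovImm::MOV64ri;
}
```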
An LLM did the refactoring.
[DebugInfo][CodeView] Resolve forward references to types without unique name (#203781)
In the following code:
```cpp
// header.h
typedef struct lua_State lua_State;
lua_State *getState();
// source.c
#include "header.h"
struct lua_State { int field; };
lua_State *getState() {
static lua_State state = {.field=42}; // make sure the type is emitted
return &state;
}
// main.cpp
extern "C" {
```
[16 lines not shown]
[RISCV][XCV] Add missing IsRV32 predicate to the XCVmac block (#205095)
The XCVmac instruction block was missing the `IsRV32` predicate that
every other XCV block already carries. `HasVendorXCVmac` on its own does
not require RV32, so `-mtriple=riscv64 -mattr=+xcvmac` could select
these RV32-only vendor instructions on RV64. Add `IsRV32` to the XCVmac
block to match the other XCV extensions and prevent selecting invalid
instructions on RV64.
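
The effect of the added predicate can be modeled as a plain conjunction. The struct and function names below are hypothetical and only mirror the description; the real predicates live in the RISC-V TableGen files.

```cpp
#include <cassert>

// Hypothetical subtarget model for illustration.
struct Subtarget {
  bool is64Bit;
  bool hasVendorXCVmac;
};

// Before: the feature bit alone gated selection, so RV64 could match.
bool selectableBefore(const Subtarget &st) { return st.hasVendorXCVmac; }

// After: the block also requires RV32, like the other XCV blocks.
bool selectableAfter(const Subtarget &st) {
  return st.hasVendorXCVmac && !st.is64Bit;
}
```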
Split out of #204879 at review request (one fix per PR).
Part of a CORE-V (XCV) series; see RFC:
https://discourse.llvm.org/t/rfc-core-v-xcv-support-for-cv32e40p-clang-builtins-xcvsimd-intrinsics-and-generic-auto-selection/91111
[CIR] Implement support for emitting label address constants (#203644)
The evalloop.c test in the llvm-test-suite single source tests contains
a static array that is initialized with the address of labels within the
enclosing function. This wasn't implemented in CIR.
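
For readers unfamiliar with the pattern, a minimal sketch of a static table of label addresses follows. It is not taken from evalloop.c; `run` and the opcodes are hypothetical, and the code relies on the GNU labels-as-values extension (`&&label` and `goto *`), which GCC and Clang accept but standard C++ does not.

```cpp
#include <cassert>

// Tiny interpreter: opcode 0 increments, 1 decrements, 2 halts.
int run(const unsigned char *program) {
  // Static array initialized with the addresses of labels in this function.
  static void *const dispatch[] = {&&op_inc, &&op_dec, &&op_halt};
  int acc = 0;
  goto *dispatch[*program++];
op_inc:
  ++acc;
  goto *dispatch[*program++];
op_dec:
  --acc;
  goto *dispatch[*program++];
op_halt:
  return acc;
}
```

The `dispatch` initializer is the part that needs a label address constant, and each `goto *` is an indirect branch through it.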
This change adds an implementation. The constant emitter change was
trivial. We just needed to create a #cir.block_addr_info attribute.
However, using that attribute as an initializer for a global requires
some additional handling and special lowering for the initializer.
The goto solver also needed to be updated to consider uses of labels in
global initializers.
The test case here was copied over directly from classic codegen. The
original test has an additional test case for the difference between two
label addresses. Support for that case will be added in a future change.
Assisted-by: Cursor / claude-opus-4.8
[SystemZ] Add missing asserts requirement for pre-RA sched mir tests (#205403)
This is based on https://github.com/llvm/llvm-project/pull/188823 and
is needed because the tests are failing in release (non-asserts) builds.
[BasicAA] Allow some more recursion across GEPs/phis. (#205010)
Allow recursive base-object analysis for some GEPs. The new version
still retains some bail-outs to avoid some redundant work.
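
The general idea of walking through GEPs and phis to find base objects, with a bail-out, can be sketched on a toy value graph. This is not the BasicAA implementation: `Value`, `collectBases`, `provablyNoAlias`, and the depth budget are all assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class Kind { Alloca, GEP, Phi };

// Toy pointer value: a GEP has one base operand, a phi has its incoming values.
struct Value {
  Kind kind;
  std::vector<const Value *> operands;
};

// Collect the allocas a pointer may be based on. Returns false (bail out)
// when the depth budget is exhausted or the value has no operands.
bool collectBases(const Value *v, unsigned depthBudget,
                  std::set<const Value *> &bases) {
  if (v->kind == Kind::Alloca) {
    bases.insert(v);
    return true;
  }
  if (depthBudget == 0 || v->operands.empty())
    return false;
  for (const Value *op : v->operands)
    if (!collectBases(op, depthBudget - 1, bases))
      return false;
  return true;
}

// Two pointers cannot alias here if both base sets are fully known and
// share no alloca. A bail-out conservatively answers "may alias".
bool provablyNoAlias(const Value *a, const Value *b, unsigned depthBudget) {
  std::set<const Value *> basesA, basesB;
  if (!collectBases(a, depthBudget, basesA) ||
      !collectBases(b, depthBudget, basesB))
    return false;
  for (const Value *base : basesA)
    if (basesB.count(base))
      return false;
  return true;
}
```

In this model, a larger budget lets the walk see through more GEP and phi layers and prove more no-alias results, at the cost of extra work; a small budget falls back to the conservative answer.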
This has a notable impact across a large IR corpus (32k modules from a
large set of C/C++ workloads).
Some of the highlights include:
aa.NumNoAlias +1.52%
aa.NumMayAlias −0.10%
licm.NumMovedLoads +20.47%
licm.NumHoisted +2.03%
early-cse.NumCSELoad +1.59%
SLP.NumVectorInstructions +0.86%
loop-vectorize.LoopsVectorized +0.21%
instcount.TotalInsts −0.05%
instcount.NumLoadInst −0.10%
[17 lines not shown]
[dsymutil] Use more portable way to compare timestamp (NFC) (#204680)
`find` on AIX does not support the `-maxpath` option. This patch uses
Python to compare the `mtime` of the file/directory instead.
[SLP] Don't recognize rotated widened strided stores in analyzeRtStrideCandidate() (#204013)
These cases, which are nearly strided stores, are being incorrectly
recognized as strided stores. Fixes #204011.
[lldb] Fix data race in Module::GetSectionList (#205226)
Module::GetSectionList built the section list (m_sections_up) without a
lock, so parallel module loading (e.g. crashlog.py's thread pool) could
race two builders on the unique_ptr and the SectionList vector, crashing
in AppleObjCRuntime::GetObjCVersion.
Build through ObjectFile::GetSectionList instead of locking the module
mutex and calling CreateSections directly: that path locks the object
file's section mutex before the module mutex, and the build can re-enter
it, so holding the module mutex across the build would invert the order
and risk a deadlock. This is the Module-level counterpart to a0176fd9dfc5.
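
The lock-ordering argument can be sketched with toy classes. `ToyModule` and `ToyObjectFile` are hypothetical stand-ins, not lldb's classes; the sketch only shows the module delegating to the object file so that the section mutex is always taken before the module mutex.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

struct SectionList {
  std::vector<int> sections;
};

class ToyModule;

class ToyObjectFile {
public:
  explicit ToyObjectFile(ToyModule &m) : module(m) {}
  // Locks the section mutex first, then the module mutex.
  SectionList *getSectionList();

private:
  ToyModule &module;
  std::recursive_mutex sectionMutex;
};

class ToyModule {
public:
  ToyModule() : objectFile(*this) {}
  // Delegates without holding the module mutex, so the order
  // "section mutex, then module mutex" is never inverted.
  SectionList *getSectionList() { return objectFile.getSectionList(); }
  int buildCount = 0;

private:
  friend class ToyObjectFile;
  std::recursive_mutex mutex;
  std::unique_ptr<SectionList> sections;
  ToyObjectFile objectFile;
};

SectionList *ToyObjectFile::getSectionList() {
  std::lock_guard<std::recursive_mutex> sectionGuard(sectionMutex);
  std::lock_guard<std::recursive_mutex> moduleGuard(module.mutex);
  if (!module.sections) {
    // Built at most once, and only while both locks are held.
    module.sections.reset(new SectionList{{1, 2, 3}});
    ++module.buildCount;
  }
  return module.sections.get();
}
```

Had `ToyModule::getSectionList` locked its own mutex before delegating, a second thread entering through the object file would acquire the two mutexes in the opposite order, which is the inversion the change avoids.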
rdar://180308581
Revert "[flang][cuda] Do not emit data transfer for constant read on the rhs" (#205394)
Reverts llvm/llvm-project#205185
This is causing a couple of downstream tests to fail. Another approach is
needed.
[llvm][option] Remove bitfield marshalling (#203051)
Marshalling of bitfield options adds extra complexity in the
form of extractors and mergers, but is now unused. This PR removes that
feature.