Revert "[Allocator] Keep bump pointer at a minimum alignment" (#205091)
Arithmetic on nullptr is UB and gets flagged by UBSan.
Reverts llvm/llvm-project#203718
[mlir][x86] Fail on missing read source operation (#205077)
Adds an extra check to AMX lowering to fail gracefully when a source
operation for contraction input data is not found.
[flang][debug] Add fake use ops for dynamic array dimension variables (#200061)
In cases where the upper or lower bounds of a dynamic array are not
explicitly referenced in code, flang can optimize away the internal
variables that represent these values. This causes missing values to
appear in the debugger when examining the dynamic array's type. Adding
an llvm.fake.use op for each bound preserves it for use by a debugger,
similar to the fix for #185432.
Resolves #119474
[AMDGPU][NFC] Templatise and roundtrip gfx13_asm_vop3_dpp16.s (#204849)
Again, this is based on the templatised version of
gfx12_asm_vop3_dpp16.s with the GFX13-specific changes re-applied on top
of it.
gfx13_dasm_vop3_dpp16.txt was never upstreamed, so no changes for the
disassembler side.
[AMDGPU][NFC] Templatise and roundtrip gfx12_asm_vop3_dpp16.s (#203953)
This is effectively the changes between the non-template versions of
gfx11/12_asm_vop3_dpp16.s applied on top of the templatised
gfx11_asm_vop3_dpp16.s.
[orc-rt] Default QueueingRunner to a synchronized queue. (#205088)
Adds orc_rt::detail::SynchronizedDeque<T>, a mutex-protected deque whose
pop_front / pop_back return std::optional<T> (std::nullopt indicates the
queue is empty), and makes it QueueingRunner's default WorkQueue type.
Using a synchronized queue type allows QueueingRunner to be used in
multi-threaded contexts.
Updates SessionTest and InProcessControllerAccessTest to use the new
default, and extends QueueingRunnerTest to cover the new contract.
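
A minimal sketch of the contract described above, assuming a plain std::mutex and std::deque. The class name SynchronizedDequeSketch, the push methods, and the demo function are hypothetical and are not taken from the orc-rt sources; only the optional-returning pop_front / pop_back behaviour comes from the commit message.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical illustration; not orc_rt::detail::SynchronizedDeque<T> itself.
template <typename T> class SynchronizedDequeSketch {
public:
  void push_back(T V) {
    std::lock_guard<std::mutex> Lock(M);
    Q.push_back(std::move(V));
  }

  void push_front(T V) {
    std::lock_guard<std::mutex> Lock(M);
    Q.push_front(std::move(V));
  }

  // The emptiness check and the removal happen under one lock, so callers
  // never need a separate (racy) empty() query. std::nullopt means "empty".
  std::optional<T> pop_front() {
    std::lock_guard<std::mutex> Lock(M);
    if (Q.empty())
      return std::nullopt;
    std::optional<T> Result(std::move(Q.front()));
    Q.pop_front();
    return Result;
  }

  std::optional<T> pop_back() {
    std::lock_guard<std::mutex> Lock(M);
    if (Q.empty())
      return std::nullopt;
    std::optional<T> Result(std::move(Q.back()));
    Q.pop_back();
    return Result;
  }

private:
  std::mutex M;
  std::deque<T> Q;
};

// Single-threaded walk through the contract.
inline void demoSynchronizedDeque() {
  SynchronizedDequeSketch<int> Q;
  assert(!Q.pop_front().has_value()); // empty queue yields std::nullopt
  Q.push_back(1);
  Q.push_back(2);
  assert(*Q.pop_front() == 1);
  assert(*Q.pop_back() == 2);
}
```

Returning std::optional<T> from the pop operations lets a worker loop both test for and take the next item in one call, which is what makes the type safe to share between threads.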
Reapply "[AMDGPU] Add compiler-rt checks for the GPU runtime" (#204898)
The original issue should've been solved by
https://github.com/llvm/llvm-project/pull/204694
Reverts llvm/llvm-project#204370
clang: Use the effective triple string for offload jobs (#205065)
Track the future effective triple for the job, rather than
the toolchain's default triple. In the future this will
change the result when amdgpu starts adjusting the triples
to contain subarches.
[SLP][Revec] Fix SLP crash when trying to fold trailing scalars into the reduced value (#203477)
The cost-modelling change introduced by commit f15666d enables revec of
the test in the issue referenced below. This introduces a crash because
the reduced value (a scalar) is added to a vector tail value.
This patch tries to address that.
Fixes #203195
AMDGPU/GlobalISel: Switch to extended LLTs
With minimal changes. Most notably, because of changes to the jump table in isel,
GIM_SwitchType requires explicit integer/float types and does not match scalar.
In most places the change is in lowering, to use LLT::integer or LLT::float.
Other changes:
- replaceRegWith can also change the type of the Dst register, which can cause
CSE data corruption (the fix is to notify the observer)
- mixed i32/f32 in G_MERGE_VALUES/G_UNMERGE_VALUES, common when legalizing
ray tracing and image intrinsics
- extra bitcasts between i32/f32 are needed in some places
clang/AMDGPU: Use effective triple instead of raw toolchain triple (#205054)
Start using the effective triple instead of the raw toolchain triple.
For the moment this is NFC, but will change when new uses of the subarch
field are introduced.
[LifetimeSafety] resolved lifetimeBound violation in constructor (#204797)
Fix https://github.com/llvm/llvm-project/issues/203839.
A constructor body does not produce `ReturnEscapeFact`, but a constructor
parameter marked [[clang::lifetimebound]] may still be valid if it
escapes into a field of the constructed object.
Update the `LifetimeBound` logic to accept `FieldEscapeFact` for
constructors, and add a test case for this pattern.
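
A minimal sketch of the pattern in question, assuming it looks roughly like this: a constructor parameter marked [[clang::lifetimebound]] whose address is stored in a field. The type NameView, the macro SKETCH_LIFETIMEBOUND, and the helper viewedLength are hypothetical names for illustration and are not the test case added by the patch. The macro only exists so the snippet also compiles with non-Clang compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical macro: expands to the attribute only when building with Clang.
#if defined(__clang__)
#define SKETCH_LIFETIMEBOUND [[clang::lifetimebound]]
#else
#define SKETCH_LIFETIMEBOUND
#endif

// Hypothetical type: the annotated parameter never flows to a return value
// (constructors return nothing); instead it escapes into the field `Name`.
struct NameView {
  const std::string *Name;
  explicit NameView(const std::string &N SKETCH_LIFETIMEBOUND) : Name(&N) {}
};

// The view is only used while the referenced string is still alive.
inline std::size_t viewedLength(const std::string &S) {
  NameView V(S);
  assert(V.Name == &S);
  return V.Name->size();
}
```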
[NFC][analyzer] Eliminate some simple node builders (#204187)
Replace four `NodeBuilder`s with direct use of the `makeNode` method
family. This is part of my commit series that simplifies the engine by
gradually eliminating the class `NodeBuilder`.
[DA] Add overflow check before calculating absolute value (#201964)
In findGCD, we call APInt::abs, which can overflow when the value is the
signed minimum. Such overflow can lead to miscompilation, so this patch
adds an overflow check for absolute value calculations and bails out
early if it actually overflows.
Fix #201559.
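
A minimal sketch of the underlying problem, using int64_t in place of APInt so it stands alone: the signed minimum has no representable absolute value, so the check has to happen before negating. The helpers checkedAbs and gcdOfMagnitudes are hypothetical and do not mirror the code in findGCD.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: returns |V| when it is representable, std::nullopt
// when V is the signed minimum (negating it would overflow).
inline std::optional<int64_t> checkedAbs(int64_t V) {
  if (V == std::numeric_limits<int64_t>::min())
    return std::nullopt;
  return V < 0 ? -V : V;
}

// Hypothetical caller: computes gcd(|A|, |B|) and bails out early instead of
// continuing with a wrapped value when either magnitude would overflow.
inline std::optional<int64_t> gcdOfMagnitudes(int64_t A, int64_t B) {
  std::optional<int64_t> AbsA = checkedAbs(A);
  std::optional<int64_t> AbsB = checkedAbs(B);
  if (!AbsA || !AbsB)
    return std::nullopt;
  int64_t X = *AbsA, Y = *AbsB;
  while (Y != 0) {
    assert(X >= 0 && Y > 0); // both operands stay non-negative
    int64_t T = X % Y;
    X = Y;
    Y = T;
  }
  return X;
}
```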
[clang][bytecode] Check for block pointers in Free() (#205043)
We need a block pointer here for the following operations, and non-block
pointers aren't valid anyway.
AMDGPU/GlobalISel: Remove -new-reg-bank-select option (#203929)
AMDGPU's -global-isel pipeline that uses AMDGPURegBankSelect and
AMDGPURegBankLegalize, previously -global-isel -new-reg-bank-select,
is now the default -global-isel pipeline.
Remove -new-reg-bank-select option from the compiler.
Remove -new-reg-bank-select from all llvm regression tests.
Edit a couple comments to reference RegBankLegalize instead of
-new-reg-bank-select.
[Allocator] Keep bump pointer at a minimum alignment (#203718)
Add a `MinAlign` template parameter (default 8, sizeof(size_t) on 64-bit
platforms) so that the common case `Alignment <= MinAlign` can skip
realigning `CurPtr`.
This is achieved by rounding each allocation's size up to MinAlign, so
the bump pointer stays MinAlign-aligned between allocations.
SpecificBumpPtrAllocator::DestroyAll() walks objects at a fixed
sizeof(T) stride and needs tight packing, so it uses MinAlign=1. (alignof(T) would
pack just as tightly and reuse the default instantiation, but T may be
incomplete here, e.g. `SpecificBumpPtrAllocator<MCSectionELF>`.)
Its `Allocate` still skips the realign: the slab is max_align_t-aligned
and every size is a multiple of alignof(T), so the bump pointer stays
alignof(T)-aligned and we can just request alignment 1. Over-aligned
types (alignof(T) > alignof(max_align_t)) keep requesting alignof(T).
[5 lines not shown]
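
A minimal sketch of the size-rounding idea, assuming a single fixed slab and no slab growth. The class BumpSketch is hypothetical and much simpler than the real allocator; it only shows why rounding each size up to MinAlign lets the common case `Alignment <= MinAlign` skip realigning the bump pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-slab bump allocator; not the LLVM allocator itself.
template <std::size_t MinAlign = 8> class BumpSketch {
  static_assert(MinAlign != 0 && (MinAlign & (MinAlign - 1)) == 0,
                "MinAlign must be a power of two");

public:
  // Assumption: the caller passes a non-null slab that is MinAlign-aligned.
  BumpSketch(char *Slab, std::size_t Size) : CurPtr(Slab), End(Slab + Size) {
    assert(reinterpret_cast<std::uintptr_t>(Slab) % MinAlign == 0);
  }

  void *allocate(std::size_t Size, std::size_t Alignment) {
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    std::size_t Adjust = 0;
    // Only over-aligned requests need to realign; CurPtr is always
    // MinAlign-aligned, which already satisfies any smaller alignment.
    if (Alignment > MinAlign) {
      std::uintptr_t P = reinterpret_cast<std::uintptr_t>(CurPtr);
      Adjust = (Alignment - P % Alignment) % Alignment;
    }
    // Round the size up to a multiple of MinAlign so the bump pointer stays
    // MinAlign-aligned for the next allocation.
    std::size_t Padded = (Size + MinAlign - 1) & ~(MinAlign - 1);
    if (Adjust + Padded > static_cast<std::size_t>(End - CurPtr))
      return nullptr; // out of space in this sketch; no new slab is created
    char *Result = CurPtr + Adjust;
    CurPtr = Result + Padded;
    return Result;
  }

private:
  char *CurPtr;
  char *End;
};
```

With MinAlign=1 the rounding is a no-op, so consecutive objects are packed tightly, which matches the fixed-stride walk described above.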
Revert "[lldb] Survive ptrace(PT_DENY_ATTACH) when attaching" (#205075)
Reverts llvm/llvm-project#204688
This breaks green dragon where the error message is `error: attach
failed: this is a non-interactive debug session, cannot get permission
to debug processes.`
AMDGPU/GlobalISel: Use AMDGPURegBankSelect + AMDGPURegBankLegalize by default (#203928)
Change AMDGPU's default -global-isel pipeline to use AMDGPURegBankSelect
and AMDGPURegBankLegalize (previously -global-isel -new-reg-bank-select)
by default instead of RegBankSelect which uses AMDGPURegisterBankInfo.
The -global-isel pipeline that used RegBankSelect/AMDGPURegisterBankInfo is
now deprecated, since it could not generate functionally correct code in
some cases involving divergent control flow and phis.
The -new-reg-bank-select option now does nothing and will be removed in a
follow-up patch.
Delete regbankselect-mui.ll and regbankselect-mui-salu-float.ll, which
existed to compare -global-isel against -global-isel -new-reg-bank-select.
Temporarily disable a couple of tests that are missing AMDGPURegBankLegalize
support.