[CodeGen][InstCombine][Sanitizers] Emit lifetimes when compiling with memtag-stack (#177130)
Currently we do not emit lifetimes by default when compiling with
memtag-stack, which means we don't catch use-after-scope (when
compiling without optimization).
This patch fixes that by mirroring ASan, HWASan and MSan, and always
emitting lifetime markers. The patch is based on the changes made in
aeca569.
rdar://163713381
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7
Fix the ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
these would be scalarized and promoted to i32/float.
Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.
This will help with removal of softPromoteHalfType.
AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.
I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
GlobalISel: Fix mishandling vector-as-scalar in return values
This fixes two cases hit once the AMDGPU ABI is changed to pass
<2 x i16> values as packed on gfx6/gfx7. The ABI does not pack
values currently; this is a pre-fix for that change.
Insert a bitcast if there is a single part with a different size.
Previously this would miscompile by going through the scalarization
and extend path, dropping the high element.
Also fix assertions in odd cases, like <3 x i16> -> i32. This needs
to unmerge with excess elements from the widened source vector.
All of this code is in need of a cleanup; this should look more
like the DAG version using getVectorTypeBreakdown.
[X86] computeKnownBitsForTargetNode - add basic X86ISD::BZHI handling (#177347)
Currently limited to constant masks: if the mask (truncated to i8) is
less than the bitwidth then it will zero the upper bits.
So far it mainly just handles BZHI(X,0) -> 0 and BZHI(C1,C2) constant
folding.
All the BMI node combines seem to just call SimplifyDemandedBits - so
I've merged them into a single combineBMI.
[NFCI][AMDGPU] Remove more redundant code from `GCNSubtarget.h` (#177297)
We are getting pretty close to using `GET_SUBTARGETINFO_MACRO` in the
header with this cleanup.
[InstCombine] Propagate profiles when folding addrscast through loads (#177214)
#176352 introduced a new fold and a new test for this functionality.
Given the select condition is the same before and after, we can
propagate any profile information that may be attached to the select
instruction. We should not need to explicitly drop any metadata off the
select.
[Clang] Rename `uinc_wrap` and add normal atomic builtin (#177253)
Summary:
The `__scoped_atomic` builtins are supposed to match the standard
GNU-flavored `__atomic` builtins. We added a scoped builtin without a
corresponding standard one before the fork so this should be added in
the release candidate. These were originally added in
https://github.com/llvm/llvm-project/pull/168666
Also, the name `uinc_wrap` does not follow the naming convention. The
GNU atomics use `fetch_xyz` to indicate that the builtin returns the
previous location's value as part of the RMW operation, which these do.
This PR renames it and its uses.
[CoroFrame][NFC] Create more helper functions for insertSpills (#177149)
This allows us to delete some variables and simplify the core loop of
insertSpills.
AMDGPU: Select VGPR MFMAs by default (#159493)
AGPRs are undesirable since they are only usable by a handful of
instructions, such as loads, stores, and MFMAs; everything else
requires copies to/from VGPRs. Using the AGPR form should be a
measure of last resort, only if we must use more than 256 VGPRs.
[SPIRV] Unify unsized array handling for AMDGCN flavoured SPIR-V (#175848)
Currently we handle 0-sized arrays in multiple places, non-uniformly,
either via `SPIRVLegalizeZeroSizeArrays` or via `SPIRVPrepareGlobals`.
For AMDGCN flavoured SPIR-V we have a singular, simpler solution: set
all 0-sized arrays to be `UINT64_MAX` sized. This is an unambiguous
token that we can use during reverse translation to restore the intended
0 size.
[CIR][NFC] Move ABI lowering of dynamic_cast to CXXABILowering (#176931)
This patch moves the ABI lowering for `dynamic_cast` from
LoweringPrepare to the new CXXABILowering pass. This effectively removes
the ABI lowering code from LoweringPrepare, so the patch also removes
the LoweringPrepareCXXABI classes and files.
Related to #175968 .
[lldb] Fix crash when there is no compile unit (#177278)
The crash occurred in lldb-dap when we are in a shared library with no
debug information and we are trying to get the expression path for an
address.
[Clang][CIR] Implement CIRGen logic for __builtin_bit_cast
NOTE: This patch merely upstreams code from
* https://github.com/llvm/clangir.
This Op was originally implemented by Sirui Mu in #762. Further
modifications were made by other ClangIR contributors.
Co-authored-by: Sirui Mu <msrlancern at gmail.com>
[CIR] Add cir.libc.memcpy Op (#176781)
The operation is a 1:1 mapping to libc's memcpy.
NOTE: This patch upstreams code from
* https://github.com/llvm/clangir.
This Op was originally implemented by Vinicius Couto Espindola
in https://github.com/llvm/clangir/pull/237. Further
modifications were made by other ClangIR contributors.
Co-authored-by: Vinicius Couto Espindola <vini.couto.e at gmail.com>