[test][LowerTypeTests] Re-generate jump table tests with --check-globals (#192734)
Debug information will be updated in
https://github.com/llvm/llvm-project/pull/192736,
so we want to track the difference.
Prevent undefined behavior caused by combination of branch and load delay slots on MIPS1 (#185427)
Under certain conditions, the LLVM `MipsDelaySlotFiller` fills a branch
delay slot with an instruction that itself requires a load delay slot.
However, the `MipsDelaySlotFiller` does not check the filling instruction
for hazards, which leads to code like this:
```asm
beqz $1, $BB0_5
lbu $2, %lo(_RNvCs5jWYnRsDZoD_3app13CONTROLLERS_A)($2)
# --- Some other instructions
$BB0_5:
andi $1, $2, 1
```
`lbu` was moved into the branch delay slot but itself has a load delay
slot, so when the branch to `$BB0_5` is taken, the value of `$2` is not
yet ready, which leads to undefined behavior.
This PR proposes treating instructions with a load delay slot as
hazardous for the branch delay slot, on `MIPS1` only. This will prevent
[21 lines not shown]
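The rule described above (an instruction with a load delay slot is a hazard for the branch delay slot, on `MIPS1` only) can be modeled with a small predicate. This is a simplified sketch, not the `MipsDelaySlotFiller` code: `Isa`, `Instr`, and `canFillBranchDelaySlot` are hypothetical names, and the model assumes loads are the only instructions with a load delay slot.

```cpp
#include <cassert>

// Hypothetical, simplified model; this is NOT the MipsDelaySlotFiller API.
enum class Isa { Mips1, Mips2OrLater };

struct Instr {
  // True if the instruction is a load. On MIPS1 the loaded value is not
  // available to the instruction that executes immediately afterwards.
  bool IsLoad;
};

// Returns true if Candidate may be placed in a branch delay slot.
// On MIPS1 a load is rejected: the branch target could read the loaded
// register before the load has completed.
bool canFillBranchDelaySlot(const Instr &Candidate, Isa Target) {
  if (Target == Isa::Mips1 && Candidate.IsLoad)
    return false;
  return true;
}
```

In this model the `lbu` from the example would be rejected as a filler on MIPS1, while the same load stays eligible on later ISA revisions, matching the "only for `MIPS1`" scope of the change.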
[ObjC] Fix missing ptrauth signing of isa in constant ObjC literals (#191091)
154d2267b897 added support for emitting ObjC number, array, and
dictionary literals as constants, but did not sign the class pointer
fields in NSConstantIntegerNumber, NSConstantFloatNumber,
NSConstantDoubleNumber, NSConstantArray, and NSConstantDictionary
structs with the ObjCIsaPointers ptrauth schema on arm64e. Fix this by
using addSignedPointer instead of add when emitting those fields.
rdar://174359070
[clang] implement CWG2064: ignore value dependence for decltype
The 'decltype' of a value-dependent (but non-type-dependent) expression should
be known, so this patch makes such types non-opaque instead.
This patch also implements what's necessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.
This also re-adds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.
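For readers unfamiliar with the two notions involved, the following sketch shows a `decltype` whose operand is value-dependent but not type-dependent, and the common `std::void_t` detection idiom. It only illustrates the kind of code the message refers to, not the specific cases fixed by the patch. The names `SizeType`, `HasValueType`, `WithMember`, and `WithoutMember` are made up for this example, which assumes C++17.

```cpp
#include <cstddef>
#include <type_traits>

// sizeof(T) is value-dependent (its value depends on T) but not
// type-dependent: its type is always std::size_t.
template <class T>
using SizeType = decltype(sizeof(T));

// std::void_t detection idiom: the partial specialization is selected
// only when T::value_type is a valid type (SFINAE otherwise).
template <class T, class = void>
struct HasValueType : std::false_type {};

template <class T>
struct HasValueType<T, std::void_t<typename T::value_type>>
    : std::true_type {};

struct WithMember { using value_type = int; };
struct WithoutMember {};

static_assert(std::is_same_v<SizeType<int>, std::size_t>, "");
static_assert(HasValueType<WithMember>::value, "");
static_assert(!HasValueType<WithoutMember>::value, "");
```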
Fixes #8740
Fixes #61818
Fixes #190388
[clang][riscv] Add tests for __builtin_reduce_X support [NFC] (#193082)
It turns out we already support use of the __builtin_reduce_ family of
builtins on the builtin RVV types, but we have no test coverage which
demonstrates this.
Note that __builtin_reduce_mul is a bit of a corner case: currently the
clang part works just fine, but the lowering will crash since we don't
have a vredprod-esque instruction. (See
https://github.com/llvm/llvm-project/pull/193094 for the lowering fix.)
[offload] Remove unnecessary extra allocations in kernel replay tool (#193108)
The tool had two extra allocations holding the device memory and
globals. Apparently, the AMDGPU plugin previously failed to transfer
data directly from the file memory mapping and required these extra
buffers. Testing on MI300A and MI250X shows that this issue is no longer
present, so we are removing them for now.
[SLP] Normalize copyable operand order to group loads for better vectorization
When building operands for entries with copyable elements, non-copyable
lanes may have inconsistent operand order (e.g., some lanes have
load,add while others have add,load for commutative ops). This prevents
VLOperands::reorder() from grouping consecutive loads on one side,
degrading downstream vectorization.
Normalize in two steps during buildOperands:
1) Majority voting: swap lanes that are the exact inverse of the
   majority operand-type pattern.
2) Load preference: if the majority pattern has loads at OpIdx 1
   (strict majority), swap to put loads at OpIdx 0, enabling
   vector load + copyable patterns.
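The two steps above can be sketched on a toy model. This is not the SLPVectorizer implementation: `Kind`, `Lane`, and `normalizeLanes` are hypothetical names, operands are reduced to "load" versus "anything else", and step 2 is assumed to mean "if strictly more than half of the lanes have the load at index 1, swap those lanes".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical operand classification; the real pass inspects llvm::Value.
enum class Kind { Load, Other };

// The two operands of one commutative operation in a single lane.
using Lane = std::array<Kind, 2>;

// Toy model of the two normalization steps (illustrative names only).
void normalizeLanes(std::vector<Lane> &Lanes) {
  // Step 1: majority voting over (operand 0, operand 1) kind patterns.
  std::map<std::pair<Kind, Kind>, std::size_t> Count;
  for (const Lane &L : Lanes)
    ++Count[{L[0], L[1]}];

  std::pair<Kind, Kind> Majority{Kind::Other, Kind::Other};
  std::size_t Best = 0;
  for (const auto &Entry : Count) {
    if (Entry.second > Best) {
      Majority = Entry.first;
      Best = Entry.second;
    }
  }

  // Swap lanes whose pattern is the exact inverse of the majority pattern.
  for (Lane &L : Lanes)
    if (L[0] != L[1] && L[0] == Majority.second && L[1] == Majority.first)
      std::swap(L[0], L[1]);

  // Step 2 (assumed reading): if a strict majority of lanes now has the
  // load at index 1, swap those lanes so the loads sit at index 0.
  std::size_t LoadsAtOne = 0;
  for (const Lane &L : Lanes)
    if (L[0] == Kind::Other && L[1] == Kind::Load)
      ++LoadsAtOne;

  if (2 * LoadsAtOne > Lanes.size())
    for (Lane &L : Lanes)
      if (L[0] == Kind::Other && L[1] == Kind::Load)
        std::swap(L[0], L[1]);
}
```

Given lanes with the patterns (other, load), (load, other), (other, load), step 1 flips the middle lane to match the majority, and step 2 then flips all three so that every lane has its load at index 0.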
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/189181