[libc++] P3798R1: The unexpected in std::expected (#204826)
Closes #204394
Implements P3798 and related tests.
Applies the paper as a Defect Report per https://wg21.link/P3798/github.
[AMDGPU][DAGCombiner] Fix UADDO/USUBO_CARRY carry-out miscompile and remove redundant AMDGPU combine (#204362)
performAddCarrySubCarryCombine in SIISelLowering folded:
uaddo_carry((x+y), 0, cc) -> uaddo_carry(x, y, cc)
usubo_carry((x-y), 0, cc) -> usubo_carry(x, y, cc)
Both produce the same value but differ in carry-out when x+y (or x-y)
wraps. The fold was missing a !N->hasAnyUseOfValue(1) guard, giving
wrong carry values to consumers. E.g. x=0xFFFFFFFF, y=1, cc=0:
original: ((x+y) mod 2^32 + cc) >= 2^32 = 0 (correct)
folded: (x+y+cc) >= 2^32 = 1 (wrong)
The generic visitUADDO_CARRY (DAGCombiner.cpp) already handles
the UADDO_CARRY/ADD fold with the correct guard. Since target combines
fire before generic ones, the AMDGPU ADD arm was a buggy duplicate.
The USUBO_CARRY/SUB arm is produced by AMDGPU's performAddCombine
which converts add(sub(v,a), sext(cmp)) -> usubo_carry(sub(v,a), 0,
[12 lines not shown]
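The carry-out discrepancy in the x=0xFFFFFFFF, y=1, cc=0 example can be reproduced with a small standalone model. This is only a sketch: modelUAddoCarry, beforeFold and afterFold are hypothetical helpers written for illustration, not code from SIISelLowering or DAGCombiner, and they assume 32-bit operands.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical 32-bit model of uaddo_carry(a, b, cin): returns {sum, carry-out}.
inline std::pair<std::uint32_t, bool>
modelUAddoCarry(std::uint32_t a, std::uint32_t b, bool cin) {
  const std::uint64_t wide =
      std::uint64_t(a) + std::uint64_t(b) + std::uint64_t(cin ? 1 : 0);
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// Unfolded form: uaddo_carry((x+y), 0, cc). The inner add wraps modulo 2^32
// before the carry-out is computed.
inline std::pair<std::uint32_t, bool>
beforeFold(std::uint32_t x, std::uint32_t y, bool cc) {
  return modelUAddoCarry(x + y, 0u, cc);
}

// Folded form: uaddo_carry(x, y, cc). The carry-out now also reports the
// wrap of x+y itself.
inline std::pair<std::uint32_t, bool>
afterFold(std::uint32_t x, std::uint32_t y, bool cc) {
  return modelUAddoCarry(x, y, cc);
}
```

In this model both forms agree on the sum for every input, so the fold is only safe when nothing reads the carry-out result, which is what a !N->hasAnyUseOfValue(1) guard expresses.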
[mlir][memref] Add SCFDialect dependency to RuntimeOpVerification (#205241)
Explicitly load SCFDialect as a dependent dialect in
RuntimeOpVerification to avoid unregistered dialect errors when
generating `scf.if`/`scf.yield` ops. Fixes #204295.
[orc-rt] Add MacroUtils.h header for general purpose macros. (#205337)
For now just contains ORC_RT_DEPAREN, a macro for stripping parentheses
from its argument. This will be used in an upcoming commit.
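For readers unfamiliar with the idiom, a paren-stripping macro can be sketched as below. This is not the definition from MacroUtils.h, which is not shown here: DEMO_DEPAREN, DEMO_DEPAREN_IMPL and DEMO_DECLARE are hypothetical names, and this variant assumes the argument is always wrapped in exactly one pair of parentheses.

```cpp
#include <cassert>
#include <map>
#include <type_traits>

// Hypothetical stand-in for a paren-stripping macro. Pasting the helper name
// in front of the parenthesized argument turns "(a, b)" into a macro call
// whose expansion is just "a, b".
#define DEMO_DEPAREN(X) DEMO_DEPAREN_IMPL X
#define DEMO_DEPAREN_IMPL(...) __VA_ARGS__

// Typical use: let a caller pass a type containing commas as ONE macro
// argument by wrapping it in parentheses, then strip them again.
#define DEMO_DECLARE(TYPE, NAME) DEMO_DEPAREN(TYPE) NAME

// Expands to: std::map<int, int> DemoTable;
DEMO_DECLARE((std::map<int, int>), DemoTable);
```

Without the parentheses, the comma inside std::map<int, int> would be parsed as a macro argument separator.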
Reland "[clang][ssaf][NFC] Make SSAFOptions available in Builders and Extractors" (#205334)
The original version of this was reverted as part of #205279 because I
didn't know if this or the other patch caused the Windows build
failures. It turns out this patch is fine. I'm relanding this now.
---
Now that we have SSAFOptions, it would make it a lot more ergonomic if
it was accessible from builders and extractors.
This PR does exactly that.
Part of rdar://179151023
Co-authored-by: Jan Korous <jkorous at apple.com>
Co-authored-by: Claude Opus 4.7 <noreply at anthropic.com>
[AArch64] Add flag to conditionally write FPMR (#203911)
Add an AArch64 codegen flag to make llvm.aarch64.set.fpmr avoid writing
FPMR when it already contains the requested value.
By default, llvm.aarch64.set.fpmr continues to lower directly to an MSR
FPMR instruction. With -aarch64-conditional-fpmr-write, the backend
lowers the intrinsic to an MRS/MSR conditional branch sequence.
This is based on the initial implementation from:
https://github.com/llvm/llvm-project/pull/114248
However, this PR keeps the conditional FPMR write sequence behind a
codegen flag. One reason to change the codegen lowering is that GCC
emits the conditional branch sequence unconditionally. LLVM preserves
the existing direct MSR lowering by default.
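The difference between the two lowerings can be illustrated with a toy model. This is a sketch only: FakeFpmr, setDirect and setConditional are hypothetical names, the struct merely counts accesses, and nothing here reflects the actual instruction sequence or its cost on real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the FPMR system register that counts accesses.
struct FakeFpmr {
  std::uint64_t value = 0;
  unsigned reads = 0;
  unsigned writes = 0;
};

// Models the default lowering: always write (one MSR-like write per call).
inline void setDirect(FakeFpmr &reg, std::uint64_t requested) {
  reg.value = requested;
  ++reg.writes;
}

// Models the flag-enabled lowering: read first (MRS-like), then branch
// around the write when the register already holds the requested value.
inline void setConditional(FakeFpmr &reg, std::uint64_t requested) {
  ++reg.reads;
  if (reg.value != requested) {
    reg.value = requested;
    ++reg.writes;
  }
}
```

In this model, repeated calls with the same value cost one write under setConditional and one write per call under setDirect, at the price of an extra read and a branch each time.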
[llvm-offload-binary] Add `member` key to single out archive members (#205170)
Summary:
Currently, archives offer three approaches.
1. `--archive` which takes an archive and puts all the output in a new
archive
2. No filename, which outputs based on the member names
3. Filename, which just matches everything.
This has a gap for when people want a single file without relying on
implicit naming that dumps all the contents to the CWD.
This PR adds `member` which lets you specify the member names as you
would get from `ar t libfoo.a` for this.
[Clang] Accept 'noconvergent' attributes outside of CUDA (#205247)
Summary:
There is no reason that `convergent` should be a generic attribute but
not `noconvergent`.
[LLVM][Runtimes] Forward 'LLVM_LIBDIR_SUFFIX' to runtimes by default (#205182)
Summary:
This option controls the logical path of the installed libraries. The
runtimes often reach into libraries, or want to install to the same
location as the main build. Previously you had to set this per-runtime,
but we should likely forward it by default.
Fixes: https://github.com/llvm/llvm-project/issues/159762
Reapply "[InstCombine] Merge consecutive assumes" (#205177) (#205324)
The crash was caused by using `getOperandBundle` for an assume, which
requires that the operand bundles are unique. This isn't guaranteed by
assume bundles. This patch adds `hasOperandBundle` instead, which
doesn't have the same constraint.
Original message:
This should make assumes a bit more efficient, since it removes a few
instructions. This should also help with optimizations that are
limited in how many instructions they step through.
This reverts commit 3f0ef1efb26206c3f5d5621d86d740c7f466c67b.
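The uniqueness constraint described above can be pictured with a small standalone model. This is a sketch under stated assumptions: DemoBundle, findUniqueBundle and hasBundle are hypothetical helpers, not the LLVM API, and the exception merely stands in for whatever failure the real lookup produces when a tag is repeated.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a tagged operand bundle.
struct DemoBundle {
  std::string tag;
  int payload;
};

// A "get"-style lookup that is only well defined when the tag is unique.
inline const DemoBundle *findUniqueBundle(const std::vector<DemoBundle> &bundles,
                                          const std::string &tag) {
  const DemoBundle *found = nullptr;
  for (const DemoBundle &b : bundles) {
    if (b.tag != tag)
      continue;
    if (found)
      throw std::logic_error("tag appears more than once");
    found = &b;
  }
  return found;
}

// A "has"-style query that tolerates repeated tags.
inline bool hasBundle(const std::vector<DemoBundle> &bundles,
                      const std::string &tag) {
  for (const DemoBundle &b : bundles)
    if (b.tag == tag)
      return true;
  return false;
}
```

After two assumes are merged, the combined bundle list may repeat a tag, so only the presence query remains safe in this model.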
AMDGPU: Use module flags to control xnack and sramecc
This ensures these ABI details are encoded in the IR module
rather than depending on external state from command-line flags.
Previously, these were encoded as function-level subtarget features.
The code object output was a single target ID directive implied
by the global subtarget. The backend would previously check if a
function's subtarget feature mismatched the global subtarget. This
is avoided by making xnack and sramecc module-level properties from
the start. This also provides proper linker compatibility
enforcement, moving the error point earlier.
The old encoding was also an abuse of the subtarget feature system.
Subtarget features are a bitvector, and later features in the string
can override earlier ones. The old handling added a special case
where explicit settings were preserved: ordinarily +feature,-feature
should result in the feature being disabled, but +xnack,-xnack would
preserve the explicit "-xnack" state, which differs from the absence
of any xnack setting.
[25 lines not shown]
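The distinction between an explicit "-xnack" and no xnack setting at all can be sketched with a tri-state parser. This is an illustrative model only: DemoSetting, lastSetting and bitEnabled are hypothetical names, and the parsing rules (comma-separated tokens, last occurrence wins) are assumptions for the sketch rather than the backend's actual code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Three states: a plain feature bit can only distinguish On from "not On".
enum class DemoSetting { Unspecified, Off, On };

// Scans a "+a,-b" style feature string; the last mention of `name` wins.
inline DemoSetting lastSetting(const std::string &features,
                               const std::string &name) {
  DemoSetting result = DemoSetting::Unspecified;
  std::istringstream in(features);
  std::string tok;
  while (std::getline(in, tok, ',')) {
    if (tok.size() < 2 || tok.compare(1, std::string::npos, name) != 0)
      continue;
    if (tok[0] == '+')
      result = DemoSetting::On;
    else if (tok[0] == '-')
      result = DemoSetting::Off;
  }
  return result;
}

// What an ordinary bitvector feature would report: enabled or not.
inline bool bitEnabled(const std::string &features, const std::string &name) {
  return lastSetting(features, name) == DemoSetting::On;
}
```

Here "+xnack,-xnack" and the empty string both leave the bit cleared, yet the first yields Off and the second Unspecified, which is the extra state the old handling had to special-case.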
[lldb] Fix race/timeout in TestInternalThreadSuspension (#203202)
This test launches a thread and then waits for a signal from the
launched thread. Below is one possible interleaving, where the
`pthread_cond_signal` (2) wins the race and becomes a no-op while (3) is
blocking until the test times out.
```
void *
suspend_func (void *unused) {
[...]
// 2. Created thread reaches this and signals.
pthread_cond_signal(&signal_cond);
[...]
}
int main() {
pthread_mutex_lock(&signal_mutex);
[11 lines not shown]
[OpenMP] Remove unused isStrictSubset template (NFC) (#202987)
The `isStrictSubset` `ArrayRef<T>` template has no callers, so it never
instantiates and trips `-Wunused-template`. The `VariantMatchInfo`
overload does the work that's actually used, and `isSubset` stays
untouched. Removing the dead template.
NFC.
Part of #202945.
[orc-rt] Split AllocAction tests by SPS dependency. (#205322)
Rewrites AllocActionTest.cpp's integration tests (RunBasicAction,
RunFinalize*) to drive AllocActionFunction::handle with a small local
IntPtrDeserializer / IdentitySerializer pair instead of going through
SPS, and moves the existing SPS-using AllocAction tests into
SPSAllocActionTest.cpp.
Also adds two new SPS tests covering previously-uncovered paths:
- RunActionWithSPSArgsAndWFBReturn — SPS argument deserialization plus
AllocActionSPSSerializer's identity (WrapperFunctionBuffer) overload.
- RunActionWithUndecodableArgs — the deserialization-failure path in
AllocActionFunction::handle.
After the split, an AllocActionTest failure indicates problems with the
AllocAction machinery, and an SPSAllocActionTest failure without a
corresponding AllocActionTest failure indicates an SPS encoding /
decoding issue for AllocAction.
[LLVM][CodeGen] Remove +bf16 for ARM/AArch64 tests that don't strictly need the feature flag. (#204199)
Tests that use bfloat purely as an opaque datatype should not use
instructions that require the bf16 feature.
AMDGPU: Rename AMDGPUTargetID to TargetID (#205269)
The AMDGPU prefix is redundant with the namespace.
Co-Authored-By: Claude <noreply at anthropic.com>
[clang][X86] Add constexpr support for mpsadbw128/256 intrinsics (#202257)
Enable constexpr evaluation for `_mm_mpsadbw_epu8` and
`_mm256_mpsadbw_epu8` (`__builtin_ia32_mpsadbw128`/`mpsadbw256`).
Fixes #157522.
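As a rough picture of what constant evaluation has to compute, here is a scalar constexpr model of the 128-bit operation. It is a sketch based on my reading of Intel's documented semantics for `_mm_mpsadbw_epu8`, not the Clang implementation: Bytes16, Words8 and mpsadbwModel are hypothetical names, and the 256-bit per-lane variant is not modeled.

```cpp
#include <cassert>
#include <cstdint>

struct Bytes16 { std::uint8_t v[16]; };
struct Words8 { std::uint16_t v[8]; };

// Assumed semantics: imm8 bit 2 selects a 4-byte-aligned start in `a`,
// imm8 bits 1:0 select a 4-byte block of `b`. Each of the 8 results is the
// sum of absolute differences between that block and a sliding 4-byte
// window of `a`.
constexpr Words8 mpsadbwModel(const Bytes16 &a, const Bytes16 &b, int imm8) {
  Words8 out{};
  const int aOff = ((imm8 >> 2) & 1) * 4;
  const int bOff = (imm8 & 3) * 4;
  for (int j = 0; j < 8; ++j) {
    int sum = 0;
    for (int k = 0; k < 4; ++k) {
      const int d = int(a.v[aOff + j + k]) - int(b.v[bOff + k]);
      sum += d < 0 ? -d : d;
    }
    out.v[j] = static_cast<std::uint16_t>(sum);
  }
  return out;
}

constexpr Bytes16 kDemoRamp{{0, 1, 2, 3, 4, 5, 6, 7,
                             8, 9, 10, 11, 12, 13, 14, 15}};
constexpr Bytes16 kDemoZero{};

// Evaluated entirely at compile time: window {0,1,2,3} against zeros.
static_assert(mpsadbwModel(kDemoRamp, kDemoZero, 0).v[0] == 6,
              "0+1+2+3 == 6");
```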
[SystemZ] Enable liveness reduction in pre-RA sched strategy. (#188823)
Add some handling of register pressure by scheduling an SU "low" if it closes a
live range (under certain conditions).
As this is checked before latency reduction, the "data-sequences" check that was
used to selectively enable latency reduction can now be removed.
This gives good improvements on several benchmarks and is also a simplification
of the SystemZPreRASchedStrategy.