[offload][OpenMP] Fix record replay when no memory is used
Programs that do not use any memory (e.g., no mappings) were failing because
record replay tried to execute zero-size transfers.
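The shape of such a fix can be sketched as a guard in front of the transfer. This is a minimal illustration only: `TransferFn` and `restoreRecordedMemory` are hypothetical names, not the offload runtime's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a device transfer routine; not the offload API.
using TransferFn = std::function<bool(const void *Src, std::size_t Size)>;

// Returns true on success. A recording that holds no memory is trivially
// restored, so the transfer is skipped instead of being issued with Size == 0.
bool restoreRecordedMemory(const void *Src, std::size_t Size,
                           const TransferFn &Transfer) {
  if (Size == 0)
    return true; // Nothing was recorded; do not issue an empty transfer.
  assert(Src && "a non-empty transfer needs a source buffer");
  return Transfer(Src, Size);
}
```

With this guard, a program without mappings never reaches the transfer routine at all.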
[clang][cmake] Disable exceptions for ASan runtime on Fuchsia (#204512)
Fuchsia's default runtime environment prefers no-exceptions. Compiling
the C++ slice of ASan (asan_new_delete.cpp) with exceptions introduces
dependencies on EH symbols (__cxa_begin_catch, etc.) in
libclang_rt.asan.so. This causes link failures when linking ASan-enabled
binaries with noexcept libc++abi.
Explicitly disable COMPILER_RT_ASAN_ENABLE_EXCEPTIONS for Fuchsia
targets in the stage2 cache.
[X86] Simplify duplicate MMO offset tracking in breakBlockedCopies (NFC) (#202904)
LMMOffset and SMMOffset in breakBlockedCopies/buildCopies/buildCopy were
both initialized to 0 and advanced in lockstep by identical amounts, so
they were always equal. Collapse them into a single Offset used for both
the load and store MachineMemOperands.
This also removes a latent typo: the final buildCopies call passed
LMMOffset for the store offset argument instead of SMMOffset. Since the
two were always equal this was harmless, and the unified Offset makes
the divergence unrepresentable.
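As a simplified illustration of the idea (not the actual X86 pass code), the sketch below splits a copy into fixed-size chunks and records the load and store offsets of each chunk. `CopyOffsets` and `splitCopy` are hypothetical names, and the fixed chunking is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One (load offset, store offset) pair per emitted copy. Hypothetical record
// standing in for the offsets of the two MachineMemOperands.
using CopyOffsets = std::pair<int64_t, int64_t>;

// Splits a copy of TotalSize bytes into chunks of at most ChunkSize bytes.
// A single Offset feeds both sides, so the load and store offsets of a chunk
// cannot drift apart.
std::vector<CopyOffsets> splitCopy(int64_t TotalSize, int64_t ChunkSize) {
  assert(ChunkSize > 0 && "chunk size must be positive");
  std::vector<CopyOffsets> Copies;
  int64_t Offset = 0;
  while (Offset < TotalSize) {
    int64_t Size = std::min(ChunkSize, TotalSize - Offset);
    Copies.emplace_back(Offset, Offset); // Same value for load and store.
    Offset += Size;
  }
  return Copies;
}
```

With two separate counters, passing the wrong one to a call compiles and goes unnoticed; with one counter there is no wrong one to pass.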
Found via @jlebar's X86 LLVM bug hunt / FuzzX effort:
https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/042-sfb-buildcopies-wrong-mmo-offset/NOTES.md
cc @jlebar
[lldb][test] Speed up ProcessAttach test (#201530)
ProcessAttach is our slowest test and runs for about 70s. We spend 60s
in the autocontinue test waiting for the target program to terminate.
We wait for the program because the autocontinue test does not run its
command in async mode, so after the attach we block until the next
breakpoint is hit or the program terminates.
This patch makes the attach and autocontinue run in async mode so we
don't wait for the program to finish. This reduces the test time from
70s to about 10s.
It also replaces an assertTrue call that should have been an
assertEqual; the mistake let the test pass even though the inferior
process had already terminated.
[AArch64][GlobalISel] Select narrow G_INSERT_VECTOR_ELT GPR operands (#203568)
RegBankSelect currently extends narrow i8/i16 G_INSERT_VECTOR_ELT GPR
operands to 32 bits. Move this widening to pre-isel lowering. This will
help enable a simple, fast, purely type-based RBS alternative.
Assisted-by: codex
[libomp] Add kmp_vector (ADT 2/2) (#176163)
See rationale in the commit adding kmp_str_ref.
This commit introduces kmp_vector, a class intended primarily for small
vectors. It currently only includes methods I need at the moment, but
it's easily extensible.
AMDGPU: Remove xnack-any-only subtarget feature and handling
This reverts commit f4caa0a172d96597c375e6b6b2192c289723a6b9.
This feature was added to gfx12-5-generic only, which does not make
sense given that both gfx1250 and gfx1251 have the same unconditional
xnack handling. It also does not make sense to diagnose trying to use
a specific xnack mode on the generic target only, and only from the
backend.
The current feature management is a confusing mess, given that we have
2 parallel feature systems. AMDGPUTargetParser has a table containing
a bitmask of features, which already contained FEATURE_XNACK_ALWAYS
for gfx1250/gfx1251, but not gfx12-5-generic. Add this handling there
so the sanitizer detection is consistent on the generic target.
These 2 feature tables probably should be unified in some way. We also
probably should have a subtarget feature for the xnack handling, but it
should be inverted. xnack-any-only is an antifeature, in that it removes
[2 lines not shown]
[lld-macho] Ignore labels on sections ld64 treats as ignoreLabel (#194275)
In ld64, labels on records in some sections never become named atoms and
never enter the symbol table:
- Unconditionally: __cfstring, __objc_classrefs, and __objc_selrefs
- Prefix-gated on `L`/`l`: __literal{4,8,16} and __cstring-family
sections such as __objc_methname
LLD, however, ran every such label through `SymbolTable::addDefined`,
which diverged from ld64 whenever an identically-named symbol appeared
in another section. This patch mirrors ld64's behavior in LLD. The
Defined is still created for the affected labels, but it bypasses the
symbol table entirely and cannot collide with any cross-TU symbol.
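The section rules listed above can be sketched as a predicate. This is an illustration under assumptions: `shouldBypassSymbolTable` is a hypothetical name, "prefix-gated" is read as "ignored only when the label starts with `L` or `l`", and the cstring-family list contains only `__cstring` and the one section this message names, so it is incomplete.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns true when a label in the given section should stay out of the
// symbol table. Hypothetical helper; section lists follow the commit message.
bool shouldBypassSymbolTable(const std::string &Section,
                             const std::string &Label) {
  // Labels in these sections are ignored unconditionally.
  static const std::set<std::string> Always = {
      "__cfstring", "__objc_classrefs", "__objc_selrefs"};
  // Labels in these sections are ignored only with an L/l prefix.
  // The cstring-family entries here are a partial list.
  static const std::set<std::string> PrefixGated = {
      "__literal4", "__literal8", "__literal16", "__cstring",
      "__objc_methname"};

  if (Always.count(Section))
    return true;
  if (PrefixGated.count(Section))
    return !Label.empty() && (Label[0] == 'L' || Label[0] == 'l');
  return false;
}
```

A caller would still create the symbol object for such a label and simply skip the table insertion, which is what keeps it from colliding with a same-named symbol elsewhere.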
I have encountered a few link failures caused by this, and reduced them
into the regression tests in the patch.
[RISCV] Fix the AST type printing code for VectorKind::RVVFixedLengthMask_1/2/4 (#204498)
These types have fixed sizes of 1, 2, and 4, respectively. The formula
used for the other types does not apply.
Assisted-by: Claude
[clang-cl][test] Use /Zs to avoid writing unnecessary output files (#204501)
#194779 adds a test, clang/test/Preprocessor/init-datetime-macros.c, which
verifies some diagnostics. However, it does so with `/c`, which
unnecessarily generates an output file. On a build system that does not
run tests in a writable directory by default, this causes the test to
fail.
Since we don't care about the resulting object file, use `/Zs`
(equivalent of `-fsyntax-only`) to check the diagnostics but not produce
any output files.
[RFC][BOLT] Add a new parallel DWARF processing(2/2) (#197859)
This PR implements a new parallel DWARF debug info processing pipeline
for BOLT that significantly speeds up `--update-debug-sections` for
large binaries. It is the second part of the split from the overall RFC
changes.
RFC - [[RFC][BOLT] A New Parallel DWARF Processing Approach in
BOLT](https://discourse.llvm.org/t/rfc-bolt-a-new-parallel-dwarf-processing-approach-in-bolt/90736)
(The overall changes.)
This PR does the following:
1. **Equivalence-class CU partitioning:** Replaces batchsize grouping
with union-find over DW_FORM_ref_addr references. Connected CUs share a
bucket; isolated CUs become singletons.
> For the non-LTO case, CUs have no cross-CU dependencies, so each CU is
placed into its own singleton bucket and processed fully in parallel.
> For the LTO case, CUs with cross-CU dependencies are grouped into the
same bucket and processed sequentially within that bucket, while
[7 lines not shown]
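The equivalence-class partitioning from item 1 can be sketched with a small union-find. This is an illustration rather than BOLT's implementation: CUs are plain indices, each `(From, To)` pair stands in for one DW_FORM_ref_addr reference between two CUs, and `UnionFind` and `partitionCUs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Minimal union-find with path halving.
class UnionFind {
  std::vector<std::size_t> Parent;

public:
  explicit UnionFind(std::size_t N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }
  std::size_t find(std::size_t X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // Point X at its grandparent.
      X = Parent[X];
    }
    return X;
  }
  void unite(std::size_t A, std::size_t B) { Parent[find(A)] = find(B); }
};

// Groups CU indices [0, NumCUs) into buckets of connected CUs.
// CUs without cross-CU references end up in singleton buckets.
std::vector<std::vector<std::size_t>>
partitionCUs(std::size_t NumCUs,
             const std::vector<std::pair<std::size_t, std::size_t>> &CrossRefs) {
  UnionFind UF(NumCUs);
  for (auto [From, To] : CrossRefs) {
    assert(From < NumCUs && To < NumCUs && "reference to unknown CU");
    UF.unite(From, To);
  }
  std::map<std::size_t, std::vector<std::size_t>> ByRoot;
  for (std::size_t CU = 0; CU < NumCUs; ++CU)
    ByRoot[UF.find(CU)].push_back(CU);

  std::vector<std::vector<std::size_t>> Buckets;
  for (auto &Entry : ByRoot)
    Buckets.push_back(std::move(Entry.second));
  return Buckets;
}
```

With no cross-CU references, every CU is its own bucket and all buckets can run in parallel. With references, only the CUs inside one bucket need to be processed sequentially.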
[AMDGPU] Keep i64 carry chains on VCC when feeding VALU users
This PR fixes an issue where ISel could mix scalar and vector carry chains when
lowering widened integer add/sub operations. A scalar-looking i64 carry producer
may feed a divergent carry consumer, so ISel now keeps that carry chain on VCC
to avoid invalid MIR.
[LoongArch] Combine FP_TO_UINT/FP_TO_SINT with [X]VFTINTRZ instruction (#201569)
Combine double conversion to signed 32-bit integer with
`[X]VFTINTRZ_W_D` instructions.
There are three cases:
1. For a VT smaller than i32, we promote it to i32 and then truncate to
the final result.
2. For `fptoui double to i32`, we convert it to `fptosi double to i64`
and then truncate. We avoid doing so when LASX is enabled because the
corresponding pattern already exists in TableGen.
3. Finally, for `fptosi double to i32`, we split the operands into blocks
(128-bit or 256-bit, depending on whether LASX is enabled) and feed them
into `[X]VFTINTRZ_W_D` instructions. When using the XV version, a
shuffle is needed because the data layout is per 128-bit lane.
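Case 2 relies on the fact that a signed 64-bit conversion followed by a truncate gives the unsigned 32-bit result for every input that `fptoui` defines. A scalar sketch of that identity follows; `fpToUI32ViaI64` is a hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// fptoui double -> i32, expressed as fptosi double -> i64 plus a truncate.
// Only inputs whose truncated value lies in [0, 2^32) are accepted; other
// inputs have no defined fptoui result and are rejected by the assertion.
uint32_t fpToUI32ViaI64(double D) {
  assert(D > -1.0 && D < 4294967296.0 && "input out of fptoui i32 range");
  int64_t Wide = static_cast<int64_t>(D); // Rounds toward zero, like fptosi.
  return static_cast<uint32_t>(Wide);     // Keep the low 32 bits.
}
```

Values above the signed 32-bit maximum, such as 3000000000.0, are the reason a direct signed 32-bit conversion is not enough: they fit in i64 and survive the truncate unchanged.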
[LoopInterchange] Reject if inner loop header has duplicate successors (#204128)
Previously, loop interchange crashed in several cases where the inner
loop header had duplicate successors. In practice, the following was
happening:
- During the transformation phase, the inner loop header was not split
because its first non-PHI instruction was its terminator.
- `updateSuccessor` was called on the header with `MustUpdateOnce=true`,
which triggers an assertion failure.
This patch fixes the issue by rejecting such cases during the legality
check phase. I believe this situation is rare, so it should not
significantly affect real-world cases.
Fix #203887.
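The rejected condition can be sketched as follows. This is a simplified model rather than the LoopInterchange code: integer ids stand in for basic blocks, and `hasDuplicateSuccessors` and `isHeaderLegalForInterchange` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Successors lists the targets of a block's terminator in order, so a
// conditional branch whose two edges lead to the same block appears as {B, B}.
bool hasDuplicateSuccessors(const std::vector<int> &Successors) {
  std::set<int> Seen;
  for (int Succ : Successors)
    if (!Seen.insert(Succ).second)
      return true; // This target was already seen.
  return false;
}

// Hypothetical legality hook: refuse the transformation up front instead of
// hitting an assertion later while rewriting successors.
bool isHeaderLegalForInterchange(const std::vector<int> &HeaderSuccessors) {
  return !hasDuplicateSuccessors(HeaderSuccessors);
}
```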