[libFuzzer] Fix -Wunused-variable when building with NDEBUG (#188301)
The variable `FuzzerInitIsRunning` is only used within `assert()`.
Follow up to #178342.
[lldb] Fix trace load hang (#187768)
#179799 removed the `SetPrivateState(eStateStopped)` call in
`ProcessTrace::DidAttach()`. This makes the call to
`WaitForProcessToStop` hang forever, causing the `trace load` command to
hang.
This fix reintroduces the `SetPrivateState` call so a postmortem trace
process will "stop" after being loaded, matching the logic used in
`Process::LoadCore()`.
[CFGuard] Consider function aliases as indirect call targets (#188223)
With vector deleting destructors, it's common to include function
aliases in vftables.
After #185653 it's become more likely that the alias gets overridden in
a different TU. It's therefore important that it's the alias itself that
goes in the control-flow guard table.
[IR] Allow non-constrained math intrinsics in strictfp functions
The current implementation of floating-point support uses two different
representations for each floating-point operation, such as `llvm.trunc`
and `llvm.experimental.constrained.trunc`. The main difference between
them is the presence of side effects that describe interaction with the
floating-point environment. Which of the two functions should be used is
determined by the enclosing function's attribute 'strictfp'. The
compiler does not check whether a regular function, like `llvm.trunc`,
is used in a strictfp function, so maintaining consistency is the user's
responsibility. It is easy to mistakenly use the regular,
side-effect-free intrinsic in a strictfp function, and even LLVM tests
contain examples of this.
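A minimal illustration of the two coexisting forms (hypothetical snippet; the constrained variant carries exception-behavior metadata per the LangRef):

```
define float @foo(float %x) strictfp {
  ; constrained form: models interaction with the FP environment
  %r = call float @llvm.experimental.constrained.trunc.f32(
           float %x, metadata !"fpexcept.strict") strictfp
  ret float %r
}

define float @bar(float %x) {
  ; regular form: side-effect free, freely reorderable
  %r = call float @llvm.trunc.f32(float %x)
  ret float %r
}

declare float @llvm.trunc.f32(float)
declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
```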
If the variant of the intrinsic is determined solely by the 'strictfp'
function attribute, the distinction between the two forms appears to be
redundant, and the regular form could be used in all cases. This would
require the compiler to deduce side effects from the function
attributes. In this scenario, floating-point operations would have
[17 lines not shown]
[RISCV] Merge Base Offset for SFB Pseudos (#187620)
This implements the Merge Base Offset pass for the SFB Load Pseudos.
These Pseudos are expanded after Merge Base Offset, so the pass needs to
handle them.
I also had to extend support in MergeBaseOffset to allow ImmOp to be
a Constant Pool Index, which seemed to be supported in some
checks but not others.
[NVPTX] Fix assumption of sm versioning (#188282)
The test case in #188118 assumes sm-90 is always available, leading to a
crash
```
# | ptxas fatal : SM version specified by .target is higher than default SM version assumed
```
This PR updates the test case to follow the check specified in
`llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-reduce.ll`,
namely `%if ptxas-sm_90 && ptxas-isa-7.8`
[X86][NewPM] Mark X86AsmPrinter isRequired (#188278)
Otherwise the pass does not run when a function has the optnone
attribute, which means we get no assembly out for functions marked
optnone.
[CIR][NFC] Mark invalid-linkage.cir as XFAIL (#188279)
The invalid-linkage.cir test is currently failing as a result of a
recent change to the MLIR attribute parser. I am temporarily marking
this test as XFAIL while that problem is being worked on to unblock CIR
development. I added a check that will force the test to fail even after
the problem is fixed so that we don't start getting unexpected passes
when the fix is merged. (CIR testing isn't run during CI for MLIR
changes.) I will reenable the test after the problem has been fixed.
[AMDGPU][Uniformity][TTI] Make Uniformity Analysis Operand-Aware via Custom Uniformity Checks (#137639)
See: https://github.com/llvm/llvm-project/issues/131779
Extends uniformity analysis to support instructions whose uniformity
depends on which specific operands are uniform. Introduces
`InstructionUniformity::Custom` and a target hook `TTI::isUniform(I,
UniformArgs)` that allows targets to define custom uniformity rules.
During propagation, custom candidates are checked via the target hook.
If we can prove they are uniform, we skip marking them divergent and let
iterative propagation re-evaluate as operands change.
Implements AMDGPU's `llvm.amdgcn.wave.shuffle` rules (uniform when
either operand is uniform, divergent only when both are divergent) as
the motivating example.
This inverted-logic approach is critical for correctness: proving
uniformity early during propagation would be unsafe, as operands can
transition from uniform to divergent during divergence propagation.
[3 lines not shown]
AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16 add their VOPDName and mark
them with usesCustomInserter which will be used to add pre-RA register
allocation hints to preferably assign dst and src2 to the same physical
register. When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking if it is VOP2 like:
dst==src2, no source modifiers, no clamp, and src1 is a register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
[Support] Use atomic counter in parallelFor instead of per-task spawning (#187989)
This function is primarily used by lld and debug info tools.
Instead of pre-splitting work into up to MaxTasksPerGroup (1024) tasks
and spawning each through the Executor's mutex+condvar, use an atomic
counter for work distribution. Only ThreadCount workers are spawned;
each grabs the next chunk via atomic fetch_add.
This reduces futex calls from ~31K (glibc, release+assertions build) to
~1.4K when linking clang-14 (191MB PIE with --export-dynamic) with
`ld.lld --threads=8` (each parallelFor spawned up to 1024 tasks, each
requiring mutex lock + condvar signal).
```
Wall System futex
glibc (assertions) before: 927ms 897ms 31K
glibc (assertions) after: 879ms 765ms 1.4K
mimalloc before: 872ms 694ms 25K
mimalloc after: 830ms 661ms 1K
```
[compiler-rt] Support unit tests for the GPU build (#187895)
Summary:
This PR enables the basic unit tests for builtins to be run on the GPU
architectures. Other targets like profiling are supported, but their
host-device nature will make it more difficult to adequately unit test.
It may be possible to do basic tests there, simply to verify that
counters are present and in the proper format for when they are copied
to the host.
[clang-tidy] Do not provide diagnostics for cert-dcl58-cpp on implicit declarations (#188152)
Do not provide diagnostics for cert-dcl58-cpp on compiler-generated
intrinsics, as they would be false positives.
In the provided tests, the compiler generates align_val_t, which ends up
inside the std namespace, resulting in the std::align_val_t symbol. This
symbol is compiler-generated and has no source location, which caused a
crash. There is also no point in notifying the user about violations
they have no control over.
Resolution: Diagnostics suppressed.
Co-authored-by: Vladislav Aranov <vladislav.aranov at ericsson.com>
[PowerPC] Fix some instruction sizes (#188227)
This fixes:
* PADDIdtprel: Lowers to PADDI8, which is prefixed.
* PATCHABLE_FUNCTION_ENTER/PATCHABLE_RET: Handle XRay sleds.
These came up when generalizing the instruction size verification
infrastructure.
[ADT] Add predicate based match support to StringSwitch (#188046)
This introduces `Predicate` and `IfNotPredicate` case selection to
StringSwitch to allow use cases like
```
StringSwitch<...>(..)
.Case("foo", FooTok)
.Predicate([](StringRef Str){ ... }, IdentifierTok)
...
```
This is mostly useful for improving conciseness and clarity when
processing generated strings, diagnostics, and the like.