[clang][deps] Don't treat the 'P1689' output format specially (#182069)
This patch essentially reverts 62fec3d2, which was an NFC, and replaces
the remaining check for the output format with a more generic scanning
service option that controls whether we report absolute or relative file
paths. Along with #182063, this makes the scanner implementation entirely
independent of the desired output format.
[clang] Add missing support for traversal kind in addMatcher overloads (#170953)
This was noted in #170540 and seems to simply be an oversight. This
patch just adds the same logic already used in other addMatcher()
implementations that honor the traversal kind.
Fixes #179386.
[NFC] [HWAsan] Run UTC on hwasan tests (#181437)
```
for x in $(grep -l 'UTC' llvm/test/Instrumentation/HWAddressSanitizer/**/*.ll); do
llvm/utils/update_test_checks.py --opt-binary build/bin/opt $x; done
```
[libc][math] Refactor fmax family to header-only (#182165)
Refactors the fmax math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182164
Target Functions:
- fmax
- fmaxbf16
- fmaxf
- fmaxf128
- fmaxf16
- fmaxl
[libc] Fix RPC server with independent thread scheduling (#182211)
Summary:
The NVIDIA ITS protocol allows lanes to diverge inside of a warp. We
previously had contingencies around this, but there were cases where
issues would still show up under highly stressed usage.
The rules state that as long as the PC is the same, threads can
reconverge. This means that we can see a 'convergent' warp even when
its lanes took completely divergent paths to get there. This resulted in
the 'index' value in the RPC port lookup loop thinking we were in a
convergent group while all the indices were different. Fix this with a
broadcast to force the expected behavior.
Additionally, we did not ensure that the threads were actually done with
their 'work_fn'. If the work included something that caused divergence,
the other threads could continue and toggle the mailbox, resulting in
the server seeing unfinished work. Fix this with an explicit sync and
have one thread toggle the mailbox.
Add a test to make sure this actually works.
[libc] Improve GPU allocator lane usage and fences (#182388)
Summary:
Improves performance of the GPU allocator. First, we can use `uniform`
as our mask value when we obtain a slab. Because this is guaranteed
uniform, we can safely treat it as our mask. This also improves the
behavior on NVIDIA's ITS.
Secondly, we do not actually need acquire / release fences on the
bitfield. These are one-to-one interfaces and the malloc / free
interface provides the necessary happens-before context. The only fences
that matter are the lifetime management for the guard pointer.
[LoopUnroll] Fix freqs for unconditional latches: N>2, uniform
This patch introduces the command-line option
`-unroll-uniform-weights`. When computing probabilities for the
remaining N conditional latches in the unrolled loop after converting
some iterations' latches to unconditional, LoopUnroll now supports the
following three strategies:
- A. If N <= 2, use a simple formula to compute a single uniform
probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply
the original loop's probability to all N latches and then, as
needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform
probability across all N latches. This patch implements this
strategy.
An issue with C is that it could impact compiler performance, so this
patch makes it opt-in. Its appeal over B is that it treats all
[5 lines not shown]
[LoopUnroll] Fix freqs for unconditional latches: N>2, fast
This patch extends PR #179520 to the N > 2 case, where N is the number
of remaining conditional latches. Its strategy is to apply the
original loop's probability to all N latches and then, as needed,
adjust as few of them as possible.
[Hexagon] Fix UB from left shift of negative in SelectSHL (#181235)
Perform left shifts in unsigned arithmetic in SelectSHL to avoid
undefined behavior when the multiply constant is negative. The result is
cast back to int32_t for the subsequent isInt<9> range check.
Also fix a similar issue where shifting 1 by up to 31 could shift into
the sign bit.
This fixes the UBSan issue in ISelDAGtoDAG:
/local/mnt/workspace/bcain-20260212_105837/llvm-project/llvm/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp:597:44: runtime error: left shift of negative value -1431655765
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /local/mnt/workspace/bcain-20260212_105837/llvm-project/llvm/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp:597:44 in
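The general technique described above, shifting in unsigned arithmetic and casting back for a signed range check, might look like the following sketch. The helper names are hypothetical and this is not the code from SelectSHL; `fitsInSigned9` merely stands in for the `isInt<9>` check.

```cpp
#include <cstdint>

// Hypothetical helper: left-shift a possibly negative 32-bit value without
// signed-shift undefined behavior. Unsigned shifts simply discard the bits
// shifted out of the top.
inline int32_t shiftLeftWrapping(int32_t Value, unsigned Amount) {
  uint32_t Shifted = static_cast<uint32_t>(Value) << (Amount & 31u);
  // The conversion back to int32_t is modular (guaranteed since C++20, and
  // what mainstream compilers do in earlier standards).
  return static_cast<int32_t>(Shifted);
}

// Stand-in for a signed 9-bit range check: true if X is in [-256, 255].
inline bool fitsInSigned9(int32_t X) { return X >= -256 && X <= 255; }

// Build a single-bit mask from an unsigned 1 so that bit 31 never shifts
// into the sign bit of a signed int.
inline uint32_t bitMask(unsigned Bit) { return uint32_t{1} << (Bit & 31u); }
```

With these helpers, the value from the sanitizer report can be shifted safely: `shiftLeftWrapping(-1431655765, 1)` wraps instead of invoking undefined behavior.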
[libc][math] Refactor ceil family to header-only (#182121)
Refactors the ceil math family to be header-only.
Closes https://github.com/llvm/llvm-project/issues/182120
Target Functions:
- ceil
- ceilbf16
- ceilf
- ceilf128
- ceilf16
- ceill
Extract many changes into descendant PRs
New commit log:
[LoopUnroll] Fix freqs for unconditional latches: N<=2
As another step in issue #135812, this patch fixes block frequencies
when LoopUnroll converts a conditional latch in an unrolled loop
iteration to unconditional. It thus includes complete loop unrolling
(the conditional backedge becomes an unconditional loop exit), which
might be applied to the original loop or to its remainder loop.
As explained in detail in the header comments on the
fixProbContradiction function that this patch introduces, these
conversions mean LoopUnroll has proven that the original uniform latch
probability is incorrect for the original loop iterations associated
with the converted latches. However, LoopUnroll often is able to
perform these corrections for only some iterations, leaving other
iterations with the original latch probability, and thus corrupting
[19 lines not shown]
[lldb] Batch breakpoint step-over for threads stopped at the same site (#180101)
Following up from
https://discourse.llvm.org/t/improving-performance-of-multiple-threads-stepping-over-the-same-breakpoint/89637
When multiple threads are stopped at the same breakpoint, LLDB currently
steps each thread over the breakpoint one at a time. Each step requires
disabling the breakpoint, single-stepping one thread, and re-enabling
it, resulting in N disable/enable cycles and N individual vCont packets
for N threads.
Now we batch the step-over so that all threads at the same breakpoint
site are stepped together in a single vCont packet, with the breakpoint
disabled once at the start and re-enabled once after the last thread
finishes.
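To illustrate the saving, here is a small sketch that groups stopped threads by breakpoint site and counts disable/enable cycles both ways. The `StoppedThread` type and all function names are hypothetical illustrations, not LLDB classes or APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record for illustration only (not an LLDB type).
struct StoppedThread {
  uint64_t tid;       // Thread ID.
  uint64_t site_addr; // Address of the breakpoint site the thread stopped at.
};

// Group thread IDs by the breakpoint site they are stopped at.
inline std::map<uint64_t, std::vector<uint64_t>>
groupBySite(const std::vector<StoppedThread> &threads) {
  std::map<uint64_t, std::vector<uint64_t>> groups;
  for (const StoppedThread &t : threads)
    groups[t.site_addr].push_back(t.tid);
  return groups;
}

// One disable/enable cycle per thread when stepping threads one at a time.
inline std::size_t cyclesUnbatched(const std::vector<StoppedThread> &threads) {
  return threads.size();
}

// One disable/enable cycle per distinct site when threads are batched.
inline std::size_t cyclesBatched(const std::vector<StoppedThread> &threads) {
  return groupBySite(threads).size();
}
```

For three threads at one site and a fourth at another, this model gives four cycles unbatched and two batched.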
When we hit `WillResume`, any leftover `StepOverBreakpoint` plans from a
previous cycle are popped with their re-enable side effect suppressed
via `SetReenabledBreakpointSite`, giving a clean slate.
[23 lines not shown]
sha2_test: do correctness checks for all implementations
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Attila Fülöp <attila at fueloep.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18232
get_cpu_freq: handle CPUs with variable frequency
If a CPU has variable frequency, then lscpu will list separate "CPU min
freq" and "CPU max freq" values. In this case, take the maximum.
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Attila Fülöp <attila at fueloep.org>
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin at TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18232
[NFC][TableGen] Use `bit` instead of `int` for some Target flags (#182375)
Change `AllowRegisterRenaming` and `RegistersAreIntervals` from `int` to
`bit`, since they are just boolean flags.
[BOLT] Mark BOLTReserved segment executable (#181606)
Summary:
When the .bolt_reserved section is defined in the linker script, there's
no way to mark the containing segment executable other than via the PHDRS
command, which overrides program headers entirely and is therefore
impractical. Since .bolt_reserved contains executable code, mark the
segment executable in BOLT.
Test Plan: bolt-reserved.test
[SimplifyCFG] process prof data when remove case in umin (#182261)
In #164097, we introduced an optimization for umin, but it does not
handle profile data correctly.
This PR removes profile data when removing cases.
Fixed: #181837
[Hexagon] Fix UB from signed left shift overflow in evaluateEXTRACTi (#181243)
The evaluateEXTRACTi function in HexagonConstPropagation uses a left
shift to position a bitfield at the top of a 64-bit word before
extracting it with a right shift. When the source value has high bits
set, the left shift of the int64_t value overflows, which is undefined
behavior.
Fix by performing the left shift in uint64_t, then casting to int64_t
only for the subsequent arithmetic right shift (signed extract case).
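A minimal sketch of that pattern, assuming a hypothetical helper `extractBits` (not the actual evaluateEXTRACTi code) and the preconditions `Bits > 0` and `Offset + Bits <= 64`:

```cpp
#include <cstdint>

// Hypothetical helper: extract a Bits-wide field starting at bit Offset.
// Assumes 0 < Bits and Offset + Bits <= 64, so every shift amount is < 64.
inline int64_t extractBits(uint64_t Value, unsigned Bits, unsigned Offset,
                           bool Signed) {
  // Position the field at the top of the 64-bit word. Doing this in
  // uint64_t means bits shifted out are discarded with no overflow UB.
  uint64_t Top = Value << (64 - Bits - Offset);
  if (Signed) {
    // Cast to int64_t only for the arithmetic right shift, which
    // sign-extends the field (guaranteed since C++20, and what mainstream
    // compilers do in earlier standards).
    return static_cast<int64_t>(Top) >> (64 - Bits);
  }
  // Unsigned extract: a logical right shift zero-extends the field.
  return static_cast<int64_t>(Top >> (64 - Bits));
}
```

The signed and unsigned paths differ only in which type performs the right shift; the left shift is always unsigned.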
kdsoap: fix build on NetBSD-current
Remove -Wl,--fatal-warnings; there are linker warnings about
ld: /usr/lib/libutil.so.7: warning: warning: reference to compatibility login_getpwclass(); include <login_cap.h> for correct reference
etc.
[clangd] Guard against null TemplateName in DumpVisitor (#181554)
Add a guard against null values for TemplateName in
DumpVisitor::TraverseTemplateName.
clangd’s DumpVisitor may attempt to traverse a null TemplateName when
handling dependent nested template names. On LLVM main this can lead to
a crash in TemplateName::getKind().
Add a defensive check in DumpVisitor::TraverseTemplateName() to skip
null TemplateName instances before invoking traverseNode(). This follows
the same design as other functions in the class.
No functional change is intended beyond preventing the crash.
Fixes: #180902
---------
Signed-off-by: Emily Dror <emilydror01 at gmail.com>
[clang][TypePrinter] Introduce AnonymousTagMode enum (#182317)
As part of https://github.com/llvm/llvm-project/pull/159592, we want to
emit unique lambda names into debug-info without relying on
`AnonymousTagLocations` (i.e., we don't want the source files included
in the names).
The plan is to implement this as a third `AnonymousTagMode`. This patch
turns the existing `AnonymousTagLocations` into an enum as preparation.
(full prototype is at https://github.com/llvm/llvm-project/pull/168533)