Reland HIP offload PGO compiler support and link the device-profile runtime (#201607)
This mostly relands the compiler part of #177665 (approved and merged,
then reverted in #201416). The first commit restores it as merged: the
AMDGPU instrumentation in LLVM and the HIP codegen in Clang.
#177665 was reverted because of a Windows CRT problem, fixed by
splitting the ROCm runtime into a separate library clang_rt.profile_rocm
(see the compiler-rt PR). The second commit links that library on the
host for HIP device PGO, in addOffloadRTLibs for the Linux and MSVC
toolchains, gated on HIP + profiling + the library being present. It is
a superset of clang_rt.profile and is linked first, so the base library
stays inert. Non-HIP links are unaffected.
Depends on the compiler-rt PR that adds clang_rt.profile_rocm.
[HLSL] Set visibility of cbuffer global variables to internal (#200312)
Global variables for all resources except `cbuffer` are already emitted
with internal linkage (since #166844). This change adds internal linkage
to the `cbuffer` handle globals as well.
One problem is that the `cbuffer` handle globals appear unused between
Clang CodeGen and the `{DXIL|SPIRV}CBufferAccess` pass, which replaces
individual `cbuffer` constant globals with accesses through the
`cbuffer` handle globals. Before this pass runs, the unused globals
could get optimized away in `GlobalOptPass` with `-O3`.
To solve this, the `cbuffer` handle globals are added to the
`@llvm.compiler.used` list to make sure they stay in the module until
the `{DXIL|SPIRV}CBufferAccess` pass, which then removes them from the
list.
Reland "[clang-tidy] Preserve line endings in macro-to-enum fixes" (#202271)
Use StringRef::detectEOL() when inserting enum braces so fix-its do not
mix LF into CRLF source files.
This reland fixes the previous buildbot failure by adding `--` to the
test file.
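
The idea can be sketched outside of LLVM as follows. This is a minimal
standalone illustration, not the clang-tidy code: `detectEol` and
`wrapInEnum` are hypothetical names, and the detection logic is a
simplified stand-in for what an EOL-detection helper does.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical standalone helper (not LLVM's StringRef::detectEOL()):
// guess the file's line ending from the first '\r' in the buffer.
std::string_view detectEol(std::string_view Text) {
  const std::size_t Pos = Text.find('\r');
  if (Pos == std::string_view::npos)
    return "\n"; // no carriage return at all: assume LF
  if (Pos + 1 < Text.size() && Text[Pos + 1] == '\n')
    return "\r\n"; // CRLF
  return "\r"; // lone CR
}

// Hypothetical fix-it helper: wrap a body in enum braces, reusing the
// line ending that the surrounding file already uses.
std::string wrapInEnum(std::string_view Body, std::string_view FileText) {
  const std::string Eol(detectEol(FileText));
  return "enum {" + Eol + std::string(Body) + Eol + "};" + Eol;
}
```

With a CRLF input buffer, every line break in the inserted text is CRLF,
so the fix-it does not leave the file with mixed line endings.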
[rtsan][clang] Add Hexagon support for RTSan (#200313)
Enable RTSan for the Hexagon architecture.
* Add Hexagon to ALL_RTSAN_SUPPORTED_ARCH in cmake
* Add a clang driver test for hexagon-unknown-linux-musl
* Guard the static_assert(sizeof(unsigned long) >= sizeof(off_t)) with
SANITIZER_WORDSIZE >= 64, since off_t syscall args are split into two
regs on 32-bit targets.
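
A rough sketch of the guard pattern, assuming a POSIX system that
provides `off_t`. `EXAMPLE_WORDSIZE` is a hypothetical stand-in for
SANITIZER_WORDSIZE, derived here from the pointer width; the real macro
is defined by the sanitizer runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h> // off_t (POSIX)

// Hypothetical stand-in for SANITIZER_WORDSIZE.
#if UINTPTR_MAX > 0xFFFFFFFFu
#define EXAMPLE_WORDSIZE 64
#else
#define EXAMPLE_WORDSIZE 32
#endif

// Only 64-bit targets are required to pass an off_t in one register-sized
// argument; 32-bit targets may split it, so the check is skipped there.
#if EXAMPLE_WORDSIZE >= 64
static_assert(sizeof(unsigned long) >= sizeof(off_t),
              "off_t must fit in a single syscall argument");
#endif

// Runtime-visible form of the same condition, for illustration.
constexpr bool offsetFitsInOneArg() {
  return sizeof(unsigned long) >= sizeof(off_t);
}
```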
[test][Support] Disable CFI-icall for DynamicLibrary Overload test (#202446)
The test performs manual symbol lookup and calls, which triggers
Control Flow Integrity indirect call checks.
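
For illustration, this is roughly how a function that calls through a
manually looked-up symbol can opt out of the check. It is a sketch, not
the test's code: `EXAMPLE_NO_CFI_ICALL`, `answer`, and
`callThroughErasedPointer` are hypothetical names, and the symbol lookup
is replaced by a plain cast.

```cpp
#include <cassert>

// Hypothetical macro: Clang accepts no_sanitize("cfi-icall"); other
// compilers get an empty definition.
#if defined(__clang__)
#define EXAMPLE_NO_CFI_ICALL __attribute__((no_sanitize("cfi-icall")))
#else
#define EXAMPLE_NO_CFI_ICALL
#endif

using IntFn = int (*)();

int answer() { return 42; }

// Calls a function through a type-erased pointer, the way code does
// after a manual symbol lookup. The attribute disables the indirect-call
// check for this function only.
EXAMPLE_NO_CFI_ICALL int callThroughErasedPointer(void *Sym) {
  auto Fn = reinterpret_cast<IntFn>(Sym);
  return Fn();
}
```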
[Clang][counted_by] Honor counted_by in __bdos on direct struct access (#201161)
__builtin_dynamic_object_size on a flexible array member must consult
the 'counted_by' attribute even when the containing struct is accessed
directly (a local or global variable) rather than through a pointer
dereference. The pointer-deref form (p->fam) already worked because the
constant evaluator could not determine the LValue for an opaque
parameter and fell through to the counted_by-aware runtime path in
CGBuiltin. The direct form (af.fam, gaf.fam) was being folded by
tryEvaluateBuiltinObjectSize to a layout-derived size (e.g. trailing
struct padding for locals, trailing initializer data for globals),
silently bypassing emitCountedBySize.
Make the AST constant evaluator refuse to fold __bdos on the same
operands that CGBuiltin's __bdos lowering classifies as a counted_by
FAM access. The check runs after the existing negative-offset early
return so that obviously out-of-bounds operands like &p->array[-42]
still fold to 0, preserving the behavior the sanitizer-bounds test in
attr-counted-by.c (test35) relies on.
[25 lines not shown]
[clang-cl] Fix friend class warning on Windows (#201720)
clang-cl warned on "friend class CallInst;" because MSVC may resolve
that to "friend llvm::CallInst" instead of the sbox IR mirrored
hierarchy. Drop the class tag and refer to forward declared names
instead.
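
A minimal sketch of the pattern, using hypothetical namespaces and
classes (`outer`, `mirror`, `Tracker`) rather than the real LLVM ones.
The mirrored class is forward declared, and the friend declaration names
it without a class tag:

```cpp
#include <cassert>

namespace outer {
class CallInst {}; // stands in for the outer-namespace class

namespace mirror {
class CallInst; // forward declaration of the mirrored class

class Tracker {
  // No class tag: this names the already-declared mirror::CallInst.
  friend CallInst;
  int Secret = 42;
};

class CallInst {
public:
  // Allowed only because mirror::CallInst is a friend of Tracker.
  static int peek(const Tracker &T) { return T.Secret; }
};
} // namespace mirror
} // namespace outer
```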
[Github] Make bazel-checks workflow validate shasums (#202405)
For some added security (although it's probably not super helpful here),
and consistency across the code base.
[SPIR-V] Allow bfloat vector atomics lowering without scalarization (#202083)
This is a workaround for the AMD triple only: it uses
SPV_NV_shader_atomic_fp16_vector for this lowering.
[WebAssembly] Don't stackify multi-def instructions (#200429)
This commit updates the `WebAssemblyRegStackify.cpp` pass to
specifically exclude attempting to stackify the first def of a multi-def
instruction. As the previous comments indicate, this is possible in
some situations, but the current logic is incomplete and has led to
miscompilations such as #98323 and #199910. One option would be to make
the logic more robust; until that happens, this change completely
disables stackification in these situations. This
provides at least a "known working" base to build on later and fixes the
known regressions around this.
Closes #98323
Closes #199910
[clang][ssaf] Convert `JSONFormat` tests for `TUSummary` and `TUSummaryEncoding` to lit tests (#192187)
This change converts most of the `TUSummary` JSON tests in `TUSummaryTest.cpp` to use `lit`. Some tests require more care and will be addressed in future PRs.
[AMDGPU] Do not always add latency between LDSDMA -> S_WAIT_LDSDMA (#201942)
In loop bodies we typically see LDSDMA instructions prefetched an
iteration or more ahead. Thus, we may have an LDSDMA followed by an
S_WAIT_LDSDMA that is waiting on a prior iteration's LDSDMA. Currently,
the scheduler thinks there will be a long stall between this LDSDMA and
the S_WAIT_LDSDMA.
This adds some basic checking for LDSDMA and S_WAIT_LDSDMA in the same
region to avoid adding latency in cases where we are certain the
S_WAIT_LDSDMA does not correspond with the LDSDMA.
[lldb][NFC] Remove redundant TypeSystemClang.h includes (#202439)
TypeSystemClang.h includes a lot of other unique headers, and should not
be included unless needed.
[VPlan] Fix vplan printing for VPExpressionRecipe w/conditional reduction. (#198954)
This patch contains two parts.
- Add a new vplan-printing test that is duplicated from
vplan-printing-reductions.ll and forces tail folding.
- Fix the printing of VPExpressionRecipe for conditional reductions.
Since the mask operand cannot be accessed directly through the reduction
recipe once folded, it needs to be fetched from the expression recipe's
operands.
[test][DynamicLibrary] Add visibility attribute for GCC/Clang in PipSqueak.h (#202445)
By default, CFI builds use hidden visibility, which breaks the test's
expectation that the symbols are exported.
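
A sketch of what such an export macro can look like. `EXAMPLE_EXPORT`
and `exampleGreeting` are hypothetical names; the actual macro and
symbols in PipSqueak.h may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical export macro: force default visibility on GCC/Clang so
// the symbol stays exported even when the build defaults to hidden.
#if defined(_WIN32)
#define EXAMPLE_EXPORT __declspec(dllexport)
#elif defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_EXPORT
#endif

// A symbol meant to be found by name in a shared library.
extern "C" EXAMPLE_EXPORT const char *exampleGreeting() { return "hello"; }
```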
[analyzer] Fix misleading 'initialized here' note for uninitialized declarations (#198345)
When a variable is declared without an initializer, the
BugReporterVisitor would emit 'initialized here' as a note, which is
confusing because the variable was never initialized.
Change the note to 'declared without an initial value' for declarations
that have no initializer. Global-storage variables are also taken into
consideration.
Removed the SI.Value.isUndef() case, as it is unreachable in practice
because core.uninitialized.Assign (a core checker, always enabled)
reports the assignment before this note can surface.