[libc] Make cpp::byte alias-safe (#194171)
Change LIBC_NAMESPACE::cpp::byte from an enum-backed type to unsigned
char so libc’s raw-memory utilities and sorting code can legally access
object representations without violating C++ strict-aliasing rules.
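
A minimal sketch of the kind of access this enables, assuming a local `byte` alias as a stand-in for `LIBC_NAMESPACE::cpp::byte` (the helper names are hypothetical and not taken from libc):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for LIBC_NAMESPACE::cpp::byte after this change.
using byte = unsigned char;

// Sums the bytes of an object's representation. Reading any object through
// `const unsigned char *` is permitted by the C++ aliasing rules.
template <typename T> unsigned sumObjectBytes(const T &value) {
  const byte *p = reinterpret_cast<const byte *>(&value);
  unsigned total = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    total += p[i];
  return total;
}

// Usage: the byte sum does not depend on endianness.
inline void byteAliasExample() {
  std::uint32_t x = 0x01020304;
  assert(sumObjectBytes(x) == 1u + 2u + 3u + 4u);
}
```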
[MemoryBuiltins] Capture more information for alloc/free from attributes
We now read the `alloc_align` attribute to provide better alignment
information to users. `alloc-family` should be used as well, as
described in the LangRef. Two new helpers provide argument numbers,
rather than values.
[flang] Recognize effects on non-addressable resources in opt-bufferization.
opt-bufferization has only been handling `fir::DebuggingResource`
explicitly. This patch adds support for other non-addressable
resources, such as `fir::VolatileMemoryResource`. This allows
merging elemental/assign for the `volatile_src_nonvolatile_dst`
example in the updated LIT test.
[flang] Pass-through fir.volatile_cast in FIR AliasAnalysis.
It should be safe to pass-through `fir.volatile_cast` for the purpose
of alias analysis. The missing pass-through prevented optimization
of the `nonvolatile_src_volatile_dst` test (see updated LIT test).
[libc] Fix install-libc to work with LLVM_LIBC_FULL_BUILD=OFF (#197366)
Initialize variables that are conditionally set to avoid undefined
references in install-libc and install-libc-stripped targets:
- Initialize added_bitcode_targets to empty string (may be undefined
when LIBC_TARGET_OS_IS_GPU=OFF)
- Initialize startup_target to empty string and only set to
"libc-startup" when both LLVM_LIBC_FULL_BUILD=ON and NOT baremetal
(startup directory is only included in full builds)
- Initialize header_install_target to empty string (may be undefined
when LLVM_LIBC_FULL_BUILD=OFF)
[DirectX] Do not emit !dbg on function definitions (#197449)
This was not done in LLVM 3.7. Instead, the !DISubprogram contains a
reference to the function (already emitted).
[libc] Add config option to use memory builtin functions. (#197977)
Add a new CMake and C++ definition configuration option
`LIBC_CONF_USE_MEM_BUILTINS` to allow users to use compiler builtins for
memory utility functions (memcpy, memset, memmove, memcmp, and bcmp)
instead of LLVM libc's internal implementations. Main use-cases are:
- when users want to bring their own memory function implementations
that are highly optimized for their targets
- improve portability by providing a fallback for targets for which LLVM
libc does not have memory utility implementations yet
- to be used for libc/shared functions and their tests, as we expect
libc/shared functions to provide their own memory functions.
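
A rough sketch of how such an option could select between a compiler builtin and an internal implementation. Only the macro name comes from this patch; how it is consumed, the `sketch` namespace, and both function names are assumptions for illustration. `__builtin_memcpy` is the GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the build system defines this when the option is enabled.
// It is defined here only so the sketch compiles on its own.
#ifndef LIBC_CONF_USE_MEM_BUILTINS
#define LIBC_CONF_USE_MEM_BUILTINS 1
#endif

namespace sketch {

// Hypothetical internal fallback: a plain byte-by-byte copy.
inline void *bytewiseCopy(void *dst, const void *src, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

// Hypothetical entry point: defer to the compiler builtin when the option
// is on, otherwise use the internal implementation.
inline void *inlineMemcpy(void *dst, const void *src, std::size_t n) {
#if LIBC_CONF_USE_MEM_BUILTINS
  return __builtin_memcpy(dst, src, n);
#else
  return bytewiseCopy(dst, src, n);
#endif
}

} // namespace sketch

// Usage: both paths must produce the same bytes.
inline void memBuiltinExample() {
  const char src[4] = "abc";
  char viaOption[4] = {};
  char viaFallback[4] = {};
  sketch::inlineMemcpy(viaOption, src, sizeof(src));
  sketch::bytewiseCopy(viaFallback, src, sizeof(src));
  for (std::size_t i = 0; i < sizeof(src); ++i)
    assert(viaOption[i] == viaFallback[i]);
}
```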
[lldb] Fix data race in ObjectFile::GetSectionList (#197812)
The early `m_sections_up == nullptr` check was performed outside the
module mutex, so two threads sharing the same Module could both enter
the branch and race on the write in CreateSections. Restructure so the
check and populate both happen under the module mutex; this is a
standard double-checked locking fix.
Found by ThreadSanitizer as part of #197792.
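
The restructured shape, sketched with hypothetical stand-in types (this is not the lldb code; the class, the counter, and the use of `std::recursive_mutex` are assumptions made to keep the example self-contained):

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a section list.
struct SectionListSketch {
  std::vector<int> sections;
};

class ObjectFileSketch {
public:
  // The null check and the population both happen while the mutex is held,
  // so two threads can never both observe a null pointer and both populate.
  SectionListSketch *GetSectionList() {
    std::lock_guard<std::recursive_mutex> guard(m_module_mutex);
    if (!m_sections_up)
      CreateSections();
    return m_sections_up.get();
  }

  int createCalls() const { return m_create_calls; }

private:
  // Must only be called with m_module_mutex held.
  void CreateSections() {
    assert(!m_sections_up && "sections must be created at most once");
    m_sections_up = std::make_unique<SectionListSketch>();
    ++m_create_calls;
  }

  std::recursive_mutex m_module_mutex; // stand-in for the module mutex
  std::unique_ptr<SectionListSketch> m_sections_up;
  int m_create_calls = 0;
};
```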
[PGO] Drop consecutive-zeros.ll test
pgo-memop-opt has so far validated VP metadata and bailed out if it ran
into duplicate values. VP metadata values will soon be deduplicated at
construction, which makes this check unnecessary and will also cause
this test to fail, so drop the test. Keep the
verification/deduplication in pgo-memop-opt for now to avoid leaving main
in a broken state.
Reviewers: mtrofin, ormris
Pull Request: https://github.com/llvm/llvm-project/pull/197615
[AMDGPU] Fix VOPD assembler validation for GFX12+ (#198034)
The related `codegen` side of this change was already landed by
https://github.com/llvm/llvm-project/commit/c510ee553e2057f94c2f023c72abb3c9afec0962
("[AMDGPU] VOPD: AllowSameVGPR on GFX12"), which changed
`GCNVOPDUtils.cpp` to use `hasGFX12Insts()` instead of
`hasGFX1250Insts()`.
However, the assembler validation in `AMDGPUAsmParser.cpp` was not
updated to match, causing it to reject valid VOPD instruction pairs that
share the same VGPR as src0 on `gfx1200`.
This fix aligns the assembler with the `codegen` by changing
`isGFX1250Plus()` to `isGFX12Plus()` in `checkVOPDRegBankConstraints`,
and adds a positive test case to verify same-VGPR src0 pairs assemble
correctly on `gfx12`.
[Instrumentor] Add call instrumentation support
We can now instrument call instructions and extract information about
the arguments, (de)allocation, intrinsic kind, etc.
[IR] Note that duplicate profile values are illegal in VP metadata
It is not legal to have duplicate VP metadata as it should be merged
appropriately before it actually ends up transcribed into the IR.
I will put up a verifier patch for this to follow this one, but do so
separately in case we need to revert due to detecting actual issues in
the code base.
Reviewers: david-xl, teresajohnson, mtrofin
Pull Request: https://github.com/llvm/llvm-project/pull/193077
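
For illustration only, the property being documented can be stated as a small check over a flattened list of (value, count) pairs. The representation and the function name are hypothetical; this is not the planned verifier code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical flattened view of one VP metadata node: (value, count) pairs.
using ValueProfileEntries =
    std::vector<std::pair<std::uint64_t, std::uint64_t>>;

// Returns true if any profiled value appears more than once.
inline bool hasDuplicateProfileValues(const ValueProfileEntries &entries) {
  std::unordered_set<std::uint64_t> seen;
  for (const auto &entry : entries)
    if (!seen.insert(entry.first).second)
      return true;
  return false;
}

// Usage: distinct values are fine, a repeated value is rejected.
inline void vpDuplicateExample() {
  assert(!hasDuplicateProfileValues({{1, 10}, {2, 5}}));
  assert(hasDuplicateProfileValues({{1, 10}, {1, 5}}));
}
```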
[CIR][CUDA] Support device-side printf for NVPTX (#196573)
Implement device-side printf lowering for NVPTX targets in CIR codegen.
The variadic arguments are packed into a stack-allocated struct and
passed to vprintf, matching the classic codegen behavior in
CGGPUBuiltin.cpp.
When the target triple is NVPTX and the builtin is
printf/__builtin_printf, we route to emitNVPTXDevicePrintfCallExpr.
The no-varargs case passes a null pointer directly.
AMDGCN device printf remains NYI.
Part of https://github.com/llvm/llvm-project/issues/179278
[MLIR] Add `IntegerDivisibilityAnalysis` and `InferIntDivisibilityOpInterface` (#197728)
This patch is a port from
https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Dialect/Util/Analysis/IntegerDivisibilityAnalysis.cpp
to upstream.
It introduces a dataflow analysis that tracks integer divisibility
(divisor + remainder lattice) for SSA values, plus an op interface
`InferIntDivisibilityOpInterface` for ops to participate.
It adds:
* `IntegerDivisibilityAnalysis` produces a `Divisibility` lattice
`{divisor, remainder}`
* `InferIntDivisibilityOpInterface` interface
* External-model implementations for `arith` and `affine` ops
* `test-int-divisibility` test pass + lit tests
Example:
Here is the usual approach to load element `i` from an `i4` buffer emulated
[11 lines not shown]
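
As a generic illustration of a divisor + remainder lattice (not the code in this patch), the sketch below reads `{divisor, remainder}` as "value is congruent to `remainder` modulo `divisor`" and joins two facts into the strongest fact implied by both. That reading, the struct, and the function names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lattice element: value == remainder (mod divisor).
struct DivisibilitySketch {
  std::uint64_t divisor;
  std::uint64_t remainder;
};

// Euclid's algorithm; gcdU64(a, 0) == a.
inline std::uint64_t gcdU64(std::uint64_t a, std::uint64_t b) {
  while (b != 0) {
    std::uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Join of two facts: the new divisor must divide both divisors and the
// difference of the remainders, so that both inputs agree modulo it.
inline DivisibilitySketch join(DivisibilitySketch lhs, DivisibilitySketch rhs) {
  assert(lhs.divisor != 0 && rhs.divisor != 0);
  std::uint64_t diff = lhs.remainder > rhs.remainder
                           ? lhs.remainder - rhs.remainder
                           : rhs.remainder - lhs.remainder;
  std::uint64_t d = gcdU64(gcdU64(lhs.divisor, rhs.divisor), diff);
  return {d, lhs.remainder % d};
}
```

For example, joining "2 mod 8" with "6 mod 12" gives "2 mod 4", since both inputs leave remainder 2 when divided by 4.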
[flang][acc] Accept component of global variable in `acc declare` (#197819)
This MR partially extends the current implementation to accept cases of
`acc declare` on a `parent%comp` whenever the `parent` has been `acc
declare`d with the same clause. This is done by generating the acc
global constructor only for mapping the parent, as the child is expected
to be part of the parent.
The limitations remain as a TODO unless it can be proven that the parent
is mapped. A generic implementation would need either compiler-generated
ordering on the global constructors used for mapping or runtime-managed
ordering.
[AArch64] Do not pass debug insn to liveness analysis (#198021)
Fix another stepBackward location.
Debug instructions must not affect liveness analysis. stepBackward has
an assertion failure on debug instructions after
https://github.com/llvm/llvm-project/pull/193104.
Signed-off-by: John Lu <John.Lu at amd.com>
[RISCV][MCA] Use the new infrastructure for SiFive P500 and P800's tests. NFC (#198016)
Some tests -- mostly vector crypto -- are kept for SiFive P800.
NFC.
[flang][NFC] Finishing touches on legacy lowering conversion (#197973)
At the beginning of legacy lowering conversion, some tests were
initially converted to emit FIR. After some discussion, it was decided
to revisit those tests and convert them to emit HLFIR. This change
completes that step and should be the final change in removing vestiges
of legacy lowering.
Assisted-by: AI
[lldb] Avoid unnecessary strlen of mangled names in ConstString (NFC) (#197995)
C++ mangled names are known to be quite long at times. This change makes
use of available length data, instead of using the `StringRef(const char
*)` constructor which calls `strlen`.
The main detail is to replace `selectPool(llvm::StringRef(raw))` with a
call to `selectPool` using a readily available StringRef.
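
The difference can be sketched with a small stand-in for `llvm::StringRef`. The struct, the hash, and the function names below are hypothetical and do not reflect how `ConstString` actually selects a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for llvm::StringRef.
struct StringRefSketch {
  const char *data;
  std::size_t size;
  // Mirrors StringRef(const char *): must walk the string with strlen.
  explicit StringRefSketch(const char *cstr)
      : data(cstr), size(std::strlen(cstr)) {}
  // Mirrors StringRef(const char *, size_t): length is supplied by the caller.
  StringRefSketch(const char *ptr, std::size_t len) : data(ptr), size(len) {}
};

// Hypothetical pool selector: hashes the bytes and picks one of 8 pools.
inline std::size_t selectPoolSketch(StringRefSketch s) {
  std::size_t hash = 0;
  for (std::size_t i = 0; i < s.size; ++i)
    hash = hash * 31 + static_cast<unsigned char>(s.data[i]);
  return hash % 8;
}

// Before: the length is recomputed from the raw pointer.
inline std::size_t poolWithStrlen(const char *raw) {
  return selectPoolSketch(StringRefSketch(raw));
}

// After: reuse a reference whose length is already known.
inline std::size_t poolWithKnownLength(StringRefSketch known) {
  return selectPoolSketch(known);
}
```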
[libc] Reduce number of iterations in threading tests. (#198030)
Previously the threading tests ran noticeably slowly and caused
flaky timeouts on some buildbots (e.g.
https://lab.llvm.org/buildbot/#/builders/71/builds/48420).
[CIR] Cast global var address to declared type at dtor call site
A C++ global with a constexpr default constructor that fixes the active member of a union — `std::basic_string`'s SSO `__short` variant is a common example — has a `cir.global` whose stored record type is the narrowed shape of that active variant. Classic CodeGen does the same (`@g = global { { { [16 x i8] } } } zeroinitializer`) and accepts the resulting `__cxa_atexit(@D1, @g, ...)` because LLVM IR uses opaque pointers. CIR has typed pointers, so the `cir.call` registering the destructor for `__cxa_atexit` carries an operand type that doesn't match the dtor's `this` parameter. This trips 16 libcxx tests and 71 cases total across libcxx, MultiSource, SingleSource, and SPEC in our build.
`verifyPointerTypeArgs(oldF, newF, userMap)` in `CIRGenModule::applyReplacements` (`clang/lib/CIR/CodeGen/CIRGenModule.cpp:1700`) catches this when ctor-dtor aliases are enabled and D1 is RAUW'd by D2. Without aliases, the `cir.call` op verifier rejects the same operand-type mismatch directly.
The fix mirrors the cast pattern `emitGlobalVarDeclLValue` (`clang/lib/CIR/CodeGen/CIRGenExpr.cpp:441-445`) already uses for every AST-level reference to a global: bitcast the result of `getAddrOfGlobalVar` to `convertTypeForMem(type)` before any typed-pointer op consumes it. `getAddrOfGlobalVar` itself stays raw so callers that walk to the underlying `GetGlobalOp` via `getDefiningOp()` keep working.
`global-dtor-union-narrowed.cpp` pins the CIR bitcast, the lowered LLVM helper-wrapped `__cxa_atexit`, and the equivalent OGCG direct `__cxa_atexit`.
[clang] remove lots of "innocuous" addrspacecasts (#197745)
Many addrspacecasts were originally added early on, where they often
weren't needed or could be added later. This makes these fairly
straightforward to remove (other than changing some tests). By swapping
all calls to this function (except the intended semantic ones for
parameters and variables) with the uncasted version, AMDGPU will
eventually not need to attempt to apply a fix up afterwards by having
different addrspace maps. This PR does not yet fix all calls, but the
main ones that might have been missed are in matrix/vector extensions
(which seem to weirdly override the memory type for temporary values to
be different from the type of the object in all other uses).
Fix dynamic map iterator target data lowering
Hoist runtime-sized offload map array allocation for regional target data with
iterator modifiers so the dynamic count and arrays dominate both begin and end
runtime calls.
[libc] Fix shadowing in printf (#197985)
The 320 bit float converter defined StorageType and DECIMAL_POINT
outside of its functions. This caused issues with other definitions of
the same variables after #197516.
[DWARFLinker] Preserve source order of member subprograms (#196443)
Children of class/struct/union/interface DIEs in the parallel
DWARFLinker's artificial type unit are sorted lexicographically by the
TypePool synthetic-name key. Data members already get a positional slot
through the synthetic name, but subprograms don't: they collapse to
alphabetical-by-linkage-name order. That breaks LLDB's
SBType::GetMemberFunctionAtIndex(N), which contractually returns members
in DWARF order.
Add a uint32_t SortKey on TypeEntryBody, atomically min-merged across
CUs with the input DIE's ordinal in its parent's child list, and consult
it before the synthetic-name key in TypePool's comparator. The ordinal
is computed by cloneDIE's existing child walk and threaded into
createTypeDIEandCloneAttributes. Scoped to children of
class/struct/union/interface so top-level types in the artificial type
unit keep their existing sort order.
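
A sketch of the min-merge and the comparator order described above. The struct, the function names, and the choice of `UINT32_MAX` as the initial key are assumptions; the real TypePool types are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for the relevant part of TypeEntryBody.
struct TypeEntryBodySketch {
  // Assumption: starts at the maximum so any real ordinal replaces it.
  std::atomic<std::uint32_t> SortKey{
      std::numeric_limits<std::uint32_t>::max()};
};

// Atomically lowers SortKey to `ordinal` if `ordinal` is smaller. Safe to
// call from several threads; the smallest ordinal seen wins.
inline void mergeMinSortKey(TypeEntryBodySketch &body, std::uint32_t ordinal) {
  std::uint32_t current = body.SortKey.load(std::memory_order_relaxed);
  while (ordinal < current &&
         !body.SortKey.compare_exchange_weak(current, ordinal,
                                             std::memory_order_relaxed)) {
    // On failure `current` holds the latest value; the loop re-checks it.
  }
}

// Comparator order: the sort key first, then the synthetic name.
inline bool lessBySortKeyThenName(std::uint32_t keyA, const std::string &nameA,
                                  std::uint32_t keyB,
                                  const std::string &nameB) {
  if (keyA != keyB)
    return keyA < keyB;
  return nameA < nameB;
}
```

With this ordering, a member declared earlier sorts first even when its linkage name is alphabetically later.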