[lld][AMDGPU] Support R_AMDGPU_ABS32_(LO|HI) relocations (#191550)
Summary:
These relocations are extremely rare, but they are listed as an expected
relocation in https://llvm.org/docs/AMDGPUUsage.html#relocation-records
and you can theoretically make them happen so we should probably support
it in the linker.
Changed stat passes to count instructions before and after optimizations (#188837)
Created this for instcount and func-properties-analysis to be able to
see the change the optimization pipelines have on stats
[clang-doc] Use distinct APIs for fixed arena allocation sites
Typically, code either always emits data into the TransientArena or the
PersistentArena. Use more explicit APIs to convey the intent directly
instead of relying on parameters or defaults.
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.
[flang] Detect non-optional boxes inside acc.compute_region. (#191328)
This should be a temporary change until we figure out
a better way for representing definitely present boxes.
It allows me to experiment with flang-licm further,
so I would like to ask for approval.
[flang][CUF] Limit Flang LICM for operations with symbol operands. (#191494)
There is probably an ordering issue between `CUFDeviceGlobal`
and `OffloadLiveInValueCanonicalization` passes: Flang LICM hoists
`fir.address_of` out of `cuf.kernel`, it is pulled back by
`OffloadLiveInValueCanonicalization`, but the symbol is never added
into the device module because `CUFDeviceGlobal` does not run after.
Changing the passes order may take some time, so this is a temporary
workaround to unblock #191309.
The change is currently NFC.
[Support] On Windows, silence FARPROC casts (#191563)
When building with clang-cl 19, this was generating:
```
warning: cast from 'FARPROC' ... converts to incompatible function type
[-Wcast-function-type-mismatch]
```
[clang-doc] Initialize member variable (#191570)
We don't always initialize the IsType field in the current
implementation. We can ensure this field is always initialized to
`false`, and avoid any UB due to garbage data.
[mlir][sparse][gpu] fix sparse GPU codegen out buffer (#189221)
When lowering sparse tensor operations to GPU code using
`-sparse-gpu-codegen`, the generated `gpu.memcpy` op for device-to-host
copy was targeting the wrong buffer. In my case, it did not copy back
the output buffer and instead only copied back the input positions
buffer which results in the output buffer in host memory being empty.
The `SparseGPUCodegen` pass carries an assumption that the first buffer
is the out buffer. It looks like this assumption is not always true, as
in my case its the input positions buffer which made it the only buffer
getting copied back to host.
This change introduces a fix by removing the assumption and replacing it
with an analysis that checks for `memref::StoreOp` and write
MemoryEffects. This change also adds a regression test which highlights
the problematic edge case.
Assisted by Gemini 3.1 Pro for finding the issue of using incorrect
buffers in `gpu.memcpy` op in the lowered code.
[clang-doc] Initialize member variable
We don't always initialize the IsType field in the current
implementation. We can ensure this field is always initialized to
`false`, and avoid any UB due to garbage data.
[Codegen, X86] Add prefetch insertion based on Propeller profile (#166324)
This PR implements the prefetch insertion in the InsertCodePrefetch pass
based on the
[RFC](https://discourse.llvm.org/t/rfc-code-prefetch-insertion/88668).
If the prefetch target is not defined in the same module (i.e, prefetch
target function is not defined in the same module), we emit a fallback
weak symbol after the prefetch instruction so that if the symbol is not
ever defined, we don't get undefined symbol error and the prefetch
instruction prefetches the next address:
```
prefetchit1 __llvm_prefetch_target_foo(%rip)
.weak __llvm_prefetch_target_foo
__llvm_prefetch_target_foo:
```
The weak symbol semantic is tied to ELF, so this makes this PR
target-dependent.
[SystemZ][z/OS] Show instruction encoding in HLASM output
This change adds the support to show instruction encoding as a comment
when emitting HLASM text. With this, the last 2 LIT tests migrate to
HLASM syntax.
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Use distinct APIs for fixed arena allocation sites
Typically, code either always emits data into the TransientArena or the
PersistentArena. Use more explicit APIs to convey the intent directly
instead of relying on parameters or defaults.
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.
[clang-doc] Merge data into persistent memory (#190056)
We have a need for persistent memory for the final info. Since each
group processes a single USR at a time, every USR is only ever processed
by a single thread from the thread pool. This means that we can keep per
thread persistent storage for all the info. There is significant
duplicated data between all the serialized records, so we can just merge
the final/unique items into the persistent arena, and clear out the
scratch/transient arena as we process each record in the bitcode.
The patch adds some APIs to help with managing the data, merging, and
allocation of data in the correct arena. It also safely merges and deep
copies data from the transient arenas into persistent storage that is
never reset until the program completes.
This patch reduces memory by another % over the previous patches,
bringing the total savings over the baseline to 57%. Runtime performance
and benchmarks stay mostly flat with modest improvements.
[30 lines not shown]
[flang][OpenMP] Use common utility functions to get affected nest depth (#191418)
Remove the existing code that calculates the number of affected loops in
an OpenMP construct. There is a single function that does that and that
handles all directives and clauses.
Issue: https://github.com/llvm/llvm-project/issues/191249
[clang-doc] Removed OwnedPtr alias
The alias served a purpose during migration, but now conveys the wrong
semantics, as the memory of these pointers is generally interned inside
a local arena.
[clang-doc] Update type aliases
Many of the type aliases we introduced to simplify migration to arena
allocation are no longer relevant after completing the migration. We
can use more relevant names and remove dead aliases.