[VPlan] Print debug info for all recipes. (#168454)
Use the recently refactored VPRecipeBase::print to print debug location
for all recipes.
PR: https://github.com/llvm/llvm-project/pull/168454
[ORC] Remove now unused EPCDebugObjectRegistrar (NFC) (#167868)
EPCDebugObjectRegistrar is unused now that the ELF debugger support plugin uses AllocActions
https://github.com/llvm/llvm-project/pull/167866
[AMDGPU] Ignore wavefront barrier latency during scheduling DAG mutation (#168500)
Do not add latency for wavefront and singlethread scope fences during
barrier latency DAG mutation.
These scopes do not typically introduce any latency and adjusting
schedules based on them significantly impacts latency hiding.
[orc-rt] Initial ORC Runtime design documentation. (#168681)
This document aims to lay out the high level design and goals of the ORC
runtime, and the relationships between key components.
[RISCV] Convert -mtune=generic to generic-rv32/rv64 in RISCVSubtarget::initializeSubtargetDependencies. (#168612)
The "generic" entry in tablegen is really a dummy entry. We shouldn't
use it for anything. Remap "generic" to either generic-rv32 or
generic-rv64 based on the triple.
[NFCI][bolt][test] Use AT&T syntax explicitly (#167225)
This enables building LLVM with `-mllvm -x86-asm-syntax=intel` in one's
Clang config files (i.e. a global preference for Intel syntax).
`-masm=att` is insufficient as it doesn't override a specification of `-mllvm -x86-asm-syntax`.
[clang][NVPTX] Add remaining float to fp16 conversions (#167641)
This change adds intrinsics and clang builtins for the remaining float
to fp16 conversions. This includes the following conversions:
- float to bf16x2 - satfinite variants
- float to f16x2 - satfinite variants
- float to bf16 - satfinite variants
- float to f16 - all variants
Tests are added in `convert-sm80.ll` and `convert-sm80-sf.ll` for the
intrinsics and in `builtins-nvptx.c` for the clang builtins.
[RISCV][NewPM] Port RISCVCodeGenPrepare to the new pass manager (#168381)
As suggested in the review for #160536 it would be good to follow up and
port the RISC-V passes to the new pass manager. This PR starts that
task. It provides the bare minimum necessary to run RISCVCodeGenPrepare
with opt -passes=riscv-codegenprepare. The approach used is modeled on
my observations of the AMDGPU backend and the recent work to port the
X86 passes.
The testing approach is to add a `-passes=riscv-foo` RUN line to at
least one test, if an appropriate test exists.
ELF,test: Test unversioned undefined symbols of index 0 and 1
My 2020 change that added versioned symbol recognition
(reviews.llvm.org/D80059) checks both VER_NDX_LOCAL and VER_NDX_GLOBAL,
though test coverage was missing. lld/test/ELF/dso-undef-extract-lazy.s
checks that the undefined symbol is indeed considered unversioned.
[libc] Fix -Wshorten-64-to-32 in fileop_test. (#168451)
Explicitly cast 0 to size_t type to match fread() return type. This
follows the pattern used elsewhere in this file, and fixes
-Wshorten-64-to-32 warnings when building the test.
[orc-rt] Simplify Session shutdown. (#168664)
Moves all Session member variables dedicated to shutdown into a new
ShutdownInfo struct, and uses the presence / absence of this struct as
the flag to indicate that we've entered the "shutting down" state. This
simplifies the implementation of the shutdown process.
[MLIR][XeGPU] Allow create mem desc from 2d memref (#167767)
This PR relax the create_mem_desc's restriction on source memref,
allowing it to be a 2d memref.
[libclc] Use CLC atomic functions for legacy OpenCL atom/atomic builtins (#168325)
Main changes:
* OpenCL legacy atom/atomic builtins now call CLC atomic functions
(which use Clang __scoped_atomic_*), replacing previous Clang __sync_*
functions.
* Change memory order from seq_cst to relaxed; keep device scope (spec
permits broader than workgroup). LLVM IR for _Z8atom_decPU3AS1Vi in
amdgcn--amdhsa.bc:
Before:
%2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1
syncscope("agent") seq_cst
After:
%2 = atomicrmw volatile sub ptr subrspace(1) %0, i32 1
syncscope("agent") monotonic
* Also adds OpenCL 1.0 atom_* variants without volatile on the pointer.
They are added for backward compatibility.
[LV]: Skip Epilogue scalable VF greater than RemainingIterations. (#156724)
Consider skipping epilogue scalable VF when they are greater than
RemainingIterations same as fixed VF.
And skip scalable RemainingIterations from that comparison because
SCEV ATM can't evaluate non-canonical vscale-based expressions.
Reapply "[Github] Update PR labeller to v6.0.1 (#167246)"
This reverts commit d772663a9f003a08ee76414397963c58e80b27d7.
This fixes the final issue with the labeller landing. There were
two remaining issues:
1. There was an extra quote on one of the globs
2. Some of the yaml keys were named incorrectly (should have been
plural)
[PowerPC] Add custom lowering for SADD overflow for i32 and i64 (#159255)
This patch improves the codegen for saddo on i32 and i64 in both 32-bit
and 64-bit modes by custom lowering. It implements signed-add overflow
detection using the `(x eqv y) & (sum xor x)`bit-level sequence.