[RISCV] Emit lpad for function with returns-twice attribute (#170520)
Insert a landing pad after calls to functions with the "returns-twice"
attribute, since such functions can return via an indirect branch (e.g.
`setcontext`, `swapcontext`, `setjmp`). This lets them use a normal indirect
branch, which is safer than a software-guarded branch.
[MLIR][NVVM] Add mbarrier.try_wait Op (#170285)
This patch adds an Op for the mbarrier.try_wait operation, which lowers
to the corresponding intrinsics. The Op supports an optional
time-limit, state-or-phase, and relaxed memory semantics,
completing the features of this Op up to Blackwell.
Unlike the existing `nvvm.mbarrier.try_wait.parity` Op, this Op
does not provide a _blocking_ implementation. We intend to
add looping around this at NVGPU in a subsequent PR
(and deprecate the inline-asm based Op here).
lit tests are added to verify the lowering to the intrinsics.
Signed-off-by: Durgadoss R <durgadossr at nvidia.com>
[delinearize] use SCEV exprs in getIndexExpressionsFromGEP (#162888)
Clean up the interface of getIndexExpressionsFromGEP to return SCEV
expressions instead of ints for the Sizes of the arrays.
This is intended to simplify the code in #156342 by avoiding conversions
from SCEV to int and back to SCEV.
[KCFI][NFC] Remove unused header (#170599)
In addition to being unused, this header forms a layering violation between
Transforms/Utils and Transforms/Instrumentation.
[C-API] LLVMOrcCreateObjectLinkingLayerWithInProcessMemoryManager (#169862)
Allow C programs to use JITLink with a trivial new C-API wrapper. Modeled
on `LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager`
except that it has to deal with failure of
`jitlink::InProcessMemoryManager::Create()`. Function name suggested by
@lhames in https://github.com/llvm/llvm-project/issues/106203.
I suppose failure of underlying platform-specific things like
`sysconf(_SC_PAGESIZE)` shouldn't really happen. An alternative error
reporting style might be to follow
`LLVMOrcCreateDynamicLibrarySearchGeneratorForProcess` and return
`LLVMErrorRef` with an output parameter for the `LLVMOrcObjectLayerRef`,
but then it wouldn't be a drop-in replacement for
`LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager`.
Thoughts?
This is wanted by PostgreSQL (branch using this API:
https://github.com/macdice/postgres/tree/llvm-22-proposed-c-api). (We're
[4 lines not shown]
[CodeGen] Fix lpad padding at section start after empty block (#112595)
If a landing pad is at the very start of a split section, it has to be
padded by a nop instruction. Otherwise its offset is marked as zero in
the LSDA, which means no landing pad (leading it to be skipped).
LLVM already handles this. If a landing pad is the first machine block
in a section, a nop is inserted to ensure a non-zero offset. However, if
the landing pad is preceded by an empty block, the nop would be
omitted.
To fix this, this patch adds a field to machine blocks indicating
whether this block contains the first instruction in its section. This
variable is then used to determine whether to emit the padding.
Co-authored-by: Jinjie Huang <huangjinjie at bytedance.com>
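The padding rule described above can be modelled with a small sketch. This is a toy model, not LLVM code: `ToyBlock` and `needsNopPadding` are hypothetical names, and the model tracks "has any code been emitted in this section yet" rather than the exact per-block field the patch adds.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine basic block (hypothetical; not LLVM's
// MachineBasicBlock).
struct ToyBlock {
  std::size_t numInstrs;  // 0 models an empty block
  bool isLandingPad;
  bool beginsSection;     // first block laid out in its section
};

// For each block, report whether a nop would have to be emitted in front
// of it so that a landing pad never sits at section offset zero.
std::vector<bool> needsNopPadding(const std::vector<ToyBlock> &blocks) {
  std::vector<bool> pad(blocks.size(), false);
  bool sectionHasCode = false;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const ToyBlock &b = blocks[i];
    if (b.beginsSection)
      sectionHasCode = false;
    // Empty predecessors emit no bytes, so the offset is still zero
    // even though this is not the first block of the section.
    pad[i] = b.isLandingPad && !sectionHasCode;
    // The nop itself, or any instruction in the block, makes every
    // later offset in the section non-zero.
    if (pad[i] || b.numInstrs > 0)
      sectionHasCode = true;
  }
  return pad;
}
```

In this model, a section laid out as an empty block followed by a landing pad still gets its nop, which a check based only on `beginsSection` would miss.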
[AMDGPU] Readd assertions requirement to test after #170468
This was removed in #170468 now that debug counters are enabled by
default rather than requiring asserts. This AMDGPU test exercises
functionality in SIInsertWaitcnts.cpp that is fully wrapped in NDEBUG
though, so this test still needs an assertions requirement to pass.
expandFMINIMUMNUM_FMAXIMUMNUM: Improve compare between zeros (#140193)
1. On GPR32 platforms, expandIS_FPCLASS may fail because the ISD::BITCAST
from double to int64 may fail. FP_ROUND the double to float first.
Since the result is used only if MinMax is zero, the flushing won't
break anything.
2. Only one IS_FPCLASS is needed. MinMax will always be RHS if the operands
are equal, so we can select between LHS and MinMax.
It is even safe if FP_ROUND flushes a small LHS: if LHS is not zero,
then MinMax won't be zero, so we will always use MinMax.
---------
Co-authored-by: Nikita Popov <github at npopov.com>
Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
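A scalar model may help illustrate the two points above for the minimum case. It is only a sketch under stated assumptions: `minimumNumModel` is a hypothetical name, the `static_cast<float>` stands in for FP_ROUND, and the real expansion operates on SelectionDAG nodes rather than C++ doubles.

```cpp
#include <cmath>

// Scalar model of the minimumnum expansion (illustrative only).
double minimumNumModel(double lhs, double rhs) {
  // minimumnum ignores a single NaN operand.
  if (std::isnan(lhs))
    return rhs;
  if (std::isnan(rhs))
    return lhs;

  // When the operands compare equal (including -0.0 vs +0.0), the
  // comparison is false, so MinMax is RHS.
  double minMax = (lhs < rhs) ? lhs : rhs;

  if (minMax == 0.0) {
    // Stand-in for FP_ROUND: classify a narrowed copy of LHS so that no
    // 64-bit integer bitcast is needed.
    float narrowed = static_cast<float>(lhs);
    // Single class test: is LHS negative zero? If MinMax is zero here,
    // LHS is zero or positive, so a flushed LHS can never look like -0.0.
    bool lhsIsNegZero = (narrowed == 0.0f) && std::signbit(narrowed);
    if (lhsIsNegZero)
      return lhs;
  }
  return minMax;
}
```

With this model, both `minimumNumModel(-0.0, 0.0)` and `minimumNumModel(0.0, -0.0)` yield negative zero, using one class test on LHS only.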
[llvm-dwp] Fix FoundCUUnit problem on soft-stop with DWARF5 (#169783)
Currently, when a 'soft-stop' is triggered due to debug_info overflow,
there is an additional check for DWARF5 to verify that the DWO contains a
split_compile unit (CU). However, since split_type units (TUs) are
typically placed before CUs in debug_info for DWARF5, if an overflow is
detected within a TU, causing an early break, the logic incorrectly
assumes this DWO lacks a CU and triggers an error.
Since the overflowing DWO will be discarded anyway, this validation is
redundant. This patch fixes the problem by removing the CU check during
a soft-stop.
Before this patch:
```
llvm-dwp main.dwo -continue-on-cu-index-overflow=soft-stop -o main.dwp
warning: debug_info Section Contribution Offset overflow 4G. Previous Offset 4294967271, After overflow offset 38.
error: no compile unit found in file: main.dwo
```
[4 lines not shown]
[MLIR][Presburger] optimize bound computation by pruning orthogonal constraints (#164199)
IntegerRelation uses Fourier-Motzkin elimination and Gaussian
elimination to simplify constraints. These methods may repeatedly
perform calculations and elimination on irrelevant variables.
Preemptively eliminating irrelevant variables and their associated
constraints can speed up the calculation process.
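One way to picture the pruning is as a reachability computation over variables that share constraints. The sketch below is not the IntegerRelation implementation: `Row` and `relevantConstraints` are hypothetical names, constant terms are omitted, and it assumes the dropped constraints are satisfiable on their own (otherwise the whole set is empty and pruning would change the answer).

```cpp
#include <cstddef>
#include <vector>

// One constraint: a coefficient per variable (constant term omitted).
using Row = std::vector<long long>;

// Return the indices of constraints that are transitively connected to
// `target` through shared variables. All other constraints cannot affect
// a bound on `target` and could be skipped before elimination.
std::vector<std::size_t> relevantConstraints(const std::vector<Row> &rows,
                                             std::size_t numVars,
                                             std::size_t target) {
  std::vector<bool> varRelevant(numVars, false);
  std::vector<bool> rowKept(rows.size(), false);
  varRelevant[target] = true;

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t r = 0; r < rows.size(); ++r) {
      if (rowKept[r])
        continue;
      bool touches = false;
      for (std::size_t v = 0; v < numVars; ++v)
        if (rows[r][v] != 0 && varRelevant[v])
          touches = true;
      if (!touches)
        continue;
      // Keep the row and mark every variable it mentions as relevant.
      rowKept[r] = true;
      changed = true;
      for (std::size_t v = 0; v < numVars; ++v)
        if (rows[r][v] != 0)
          varRelevant[v] = true;
    }
  }

  std::vector<std::size_t> kept;
  for (std::size_t r = 0; r < rows.size(); ++r)
    if (rowKept[r])
      kept.push_back(r);
  return kept;
}
```

For variables (x, y, z, w) with constraints on {x, y}, {y}, {z, w} and {w}, a bound query on x keeps only the first two rows, so elimination never touches z or w.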
Utils: Inhibit load/store folding through phis for llvm.protected.field.ptr.
Protected pointer field loads/stores should be paired with the intrinsic
to avoid unnecessary address escapes.
Reviewers: nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/151649
[TTI] Remove masked/gather-scatter/strided/expand-compress costing from TTIImpl (#169885)
Following #165532, this patch moves scalarization-cost computation into
BaseT::getMemIntrinsicCost and lets backends override it via their
getMemIntrinsicCost.
It also removes the masked/gather-scatter/strided/expand-compress
costing interfaces from TTIImpl.
Targets may keep them locally if needed.
Stacked on #170426 and #170436.
[libclc] Fix memory fence scope mapping for OpenCL (#170542)
The function `__opencl_get_memory_scope` incorrectly assumed that the
Clang built-in `__MEMORY_SCOPE_*` macros are defined as bitmasks, while they
are actually defined as distinct integer values. This led to incorrect
mapping of OpenCL memory fence flags to LLVM memory scopes, causing
issues in generated code.
The fix involves updating the `__opencl_get_memory_scope` function to
return the correct `__MEMORY_SCOPE_*` values based on the provided
`cl_mem_fence_flags`. Additionally, the `__opencl_get_memory_semantics`
and the `__opencl_get_memory_scope` functions are marked as `static`
to avoid potential multiple definition issues during linking.
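The class of bug described here can be shown with a small sketch. Everything in it is an assumption made for illustration: the constants are stand-ins rather than the real `cl_mem_fence_flags` or `__MEMORY_SCOPE_*` definitions, the function names are hypothetical, and the flag-to-scope policy is not taken from the libclc source.

```cpp
// Stand-in fence flags: these ARE bitmasks and may be OR-ed together.
enum : unsigned {
  kLocalFence = 1u << 0,
  kGlobalFence = 1u << 1,
  kImageFence = 1u << 2,
};

// Stand-in scopes: distinct integer values, NOT combinable bits.
enum Scope : int {
  kScopeSystem = 0,
  kScopeDevice = 1,
  kScopeWorkGroup = 2,
  kScopeWavefront = 3,
  kScopeSingle = 4,
};

// Buggy pattern: OR-ing scope values as if they were bitmasks.
int scopeAsBitmask(unsigned flags) {
  int scope = 0;
  if (flags & kLocalFence)
    scope |= kScopeWorkGroup;
  if (flags & kGlobalFence)
    scope |= kScopeDevice;
  return scope;  // 2 | 1 == 3, which names an unrelated scope
}

// Corrected pattern: select exactly one scope value (assumed policy:
// the widest scope requested by the flags wins).
int scopeAsEnum(unsigned flags) {
  if (flags & (kGlobalFence | kImageFence))
    return kScopeDevice;
  if (flags & kLocalFence)
    return kScopeWorkGroup;
  return kScopeSingle;
}
```

With these stand-in values, combining the local and global flags makes the bitmask version return the wavefront scope, while the enum version returns the device scope.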