Reapply "AMDGPU: Use real copysign in fast pow (#97152)" (#178036)
This reverts commit bff619f91015a633df659d7f60f842d5c49351df.
This was reverted due to regressions caused by poor copysign
optimization, which have been fixed.
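For intuition only, and not the AMDGPU libcall simplifier code itself, a "fast" pow with an integer-valued exponent can be expanded as exp2(y * log2(|x|)), with the sign of x restored for odd y. The sketch below does that final step with a real `copysign` rather than manual sign-bit manipulation. The helper names `isOddInteger` and `fastPowIntExp` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true if y is a finite, odd, integer-valued number.
bool isOddInteger(double y) {
  return std::isfinite(y) && std::trunc(y) == y && std::fmod(y, 2.0) != 0.0;
}

// Sketch of a fast pow expansion, assuming y is integer-valued
// (non-integer y with negative x is not handled here).
// The magnitude comes from exp2/log2 on |x|; the sign is restored with
// copysign only when the exponent is odd, otherwise the result is positive.
double fastPowIntExp(double x, double y) {
  double mag = std::exp2(y * std::log2(std::fabs(x)));
  return isOddInteger(y) ? std::copysign(mag, x) : mag;
}
```

Using `copysign` keeps the sign handling visible to later floating-point optimizations, instead of hiding it behind integer bit operations.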
[clang][modules] Support every import syntax in single-module-parse-mode (#179610)
Previously, `-fmodules-single-module-parse-mode` only prevented module
compilation/loading when initiated from an `#include` or `#import`
directive. This PR does the same for `@import`, `#pragma clang module
import` and `#pragma clang module load`. This is done by sinking the
logic down into `CompilerInstance::loadModule()`.
[AMDGPU] Disable VALU sinking and hoisting with WWM
Machine LICM can hoist a VALU instruction out of a WWM region.
In this case the WQM pass has to create yet another WWM region
around the hoisted instruction, which is not desired.
Unfortunately, we cannot tell whether an instruction is in a WWM
region, so this patch disables VALU hoisting and sinking if WWM is
used in the function.
This works around the bug SWDEV-502411.
[mlir] disable folding collapse expand to cast (#179209)
Folding expand(collapse(src)) to cast(src) is supported in cases
where the source and result are cast compatible but not equal. When the
source has dynamic dimensions, this can enable the cast even though
certain dimensions are cast from static to dynamic while the dynamic
size is not guaranteed to equal the static size.
For now, this patch blocks the fold when the source has dynamic
dimensions, to preserve correctness.
In the future it could be possible to enable some cases of folding when
not all dimensions of the source are static.
Such cases could be when:
1) the expand and collapse only affect non-dynamic dims
2) the expand and collapse on dynamic dims can be folded to a no-op
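The guard described above can be modeled outside MLIR as a predicate over shapes. This is a standalone sketch, not the MLIR implementation: `kDynamic` is a stand-in sentinel for a `?` dimension, and `areCastCompatible`, `hasDynamicDim`, and `canFoldToCast` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in sentinel for a dynamic ("?") dimension in this model.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

using Shape = std::vector<int64_t>;

// Shapes are cast compatible if they have the same rank and every pair of
// dimensions either matches or involves at least one dynamic dimension.
bool areCastCompatible(const Shape &a, const Shape &b) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i] && a[i] != kDynamic && b[i] != kDynamic)
      return false;
  return true;
}

bool hasDynamicDim(const Shape &s) {
  for (int64_t d : s)
    if (d == kDynamic)
      return true;
  return false;
}

// Model of the guarded fold: rewrite to a cast only when the shapes are
// cast compatible AND the source is fully static, so no dynamic size is
// silently assumed to equal a static one.
bool canFoldToCast(const Shape &src, const Shape &result) {
  return areCastCompatible(src, result) && !hasDynamicDim(src);
}
```

For example, a source of `?x8` and a result of `4x8` are cast compatible, but the model refuses the fold because nothing guarantees that `?` is 4.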
[MLIR][Python] Add llvm raw fd ostream c api (#179770)
This PR adds a C API `MlirLlvmRawFdOstream` for `llvm::raw_fd_ostream`,
which cannot be safely replaced by `std::ofstream` on Windows.
`llvm::raw_fd_ostream` configures Win32 file sharing flags, allowing
other handles (e.g. Python temp file handles) to coexist, see details
[here](https://llvm.org/doxygen/Windows_2Path_8inc_source.html#l1281),
while `std::ofstream` disables file sharing by default.
[lldb] Broadcast `eBroadcastBitStackChanged` when frame providers change (#171482)
We want to reload the call stack whenever the frame providers are
updated. To do so, we now emit an `eBroadcastBitStackChanged` on all
threads whenever the frame providers change.
I found this very useful while iterating on a frame provider in
lldb-dap. Previously, a new frame provider only took effect after
execution was continued. Now the backtrace in VS Code is refreshed
immediately upon running `target frame-provider add`.
[lldb] Return Expected<ModuleSP> from Process::ReadModuleFromMemory (#179583)
I noticed that Module::GetMemoryObjectFile populates a Status object
upon error, but the error is effectively dropped on the floor. Returning
an Expected instead lets clients report the error as desired.
At the moment, all clients either (1) consume the error because they
are only trying to find a module, or (2) log the error and bail out
early. I tried to preserve existing behavior as faithfully as possible.
[CIR][AArch64] Add lowering for predicated SVE svdup builtins (zeroing)
This PR adds CIR lowering support for predicated SVE `svdup` builtins on
AArch64. The corresponding ACLE intrinsics are documented at:
https://developer.arm.com/architectures/instruction-sets/intrinsics
This change focuses on the zeroing-predicated variants (suffix `_z`, e.g.
`svdup_n_f32_z`), which lower to the LLVM SVE `dup` intrinsic with a
`zeroinitializer` passthrough operand.
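The lane-wise meaning of a zeroing-predicated dup can be modeled in plain C++. This sketches the semantics only, not the CIR lowering: the fixed lane count `N` stands in for the scalable SVE vector length, and `dupZeroing` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a `_z` dup: active lanes (predicate bit set) receive the
// scalar, inactive lanes are zero. Starting from an all-zero vector plays
// the role of the `zeroinitializer` passthrough operand.
template <std::size_t N>
std::array<float, N> dupZeroing(const std::array<bool, N> &pg, float x) {
  std::array<float, N> out{};  // value-initialized: every lane is 0.0f
  for (std::size_t i = 0; i < N; ++i)
    if (pg[i])
      out[i] = x;
  return out;
}
```

A merging (`_m`) form would instead start from a caller-supplied vector rather than zeros.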
IMPLEMENTATION NOTES
--------------------
* The CIR type converter is extended to support `BuiltinType::SveBool`,
which is lowered to `cir.vector<[16] x i1>`, matching current Clang
behaviour and ensuring compatibility with existing LLVM SVE lowering.
* Added logic that converts `cir.vector<[16] x i1>` according to the
underlying element type. This is done by calling
`@llvm.aarch64.sve.convert.from.svbool`.
[56 lines not shown]
[SLP]Remove LoadCombine workaround after handling of the copyables
LoadCombine pattern handling was added as a workaround for cases
where the SLP vectorizer could not vectorize the code effectively. With
copyables support, it can now handle these cases directly.
The patch also adds support for the scalar loads [+ bswap] pattern for
byte-sized loads (+ reversed bytes for bswap).
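The scalar pattern referred to here is typically the byte-assembly idiom below. This is an illustrative sketch; the function names are hypothetical and not taken from the patch or its tests, and whether a wide load plus `bswap` is actually emitted depends on the target and cost model.

```cpp
#include <cassert>
#include <cstdint>

// Four byte-sized loads assembled in little-endian order. On a
// little-endian target this is equivalent to a single 32-bit load.
uint32_t loadLE32(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
         (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// The same byte loads with the byte order reversed. On a little-endian
// target this is a 32-bit load followed by a byte swap (bswap).
uint32_t loadBE32(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}
```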
Recommit after revert in 6377c86d718232fe60c548dfd7ab439f7ff84df7
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/174205
[AMDGPU][SIInsertWaitcnt][NFC] Don't expose internal data structure to user (#179736)
With this patch we are no longer exposing the internal data structure
that holds the WaitEvents to the user through the `getWaitEventMask()`
API. Instead we only allow the user to query a specific type and get the
corresponding `WaitEventSet` with `getWaitEvents(T)`.
Note: This patch also renames `getWaitEventMask()` to `getWaitEvents()`
because we are no longer returning a mask but instead a `WaitEventSet`
object.
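The encapsulation change can be illustrated with a small model. Only `WaitEventSet` and `getWaitEvents` come from the description above; `WaitEventInfo`, `CounterType`, the event names, and the bitset storage are hypothetical and do not reflect the actual SIInsertWaitcnts data structures.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical counter types and wait events for this model.
enum class CounterType { Load, Store, Export, NumTypes };
enum class WaitEvent { VmemRead, VmemWrite, ExpGpr, NumEvents };

constexpr std::size_t idx(CounterType T) { return static_cast<std::size_t>(T); }
constexpr std::size_t idx(WaitEvent E) { return static_cast<std::size_t>(E); }

// Value type wrapping a set of events; the raw bits stay private.
class WaitEventSet {
  std::bitset<idx(WaitEvent::NumEvents)> Bits;

public:
  void insert(WaitEvent E) { Bits.set(idx(E)); }
  bool contains(WaitEvent E) const { return Bits.test(idx(E)); }
};

class WaitEventInfo {
  // Internal per-type table; no accessor hands out the whole table.
  std::array<WaitEventSet, idx(CounterType::NumTypes)> Table;

public:
  WaitEventInfo() {
    Table[idx(CounterType::Load)].insert(WaitEvent::VmemRead);
    Table[idx(CounterType::Store)].insert(WaitEvent::VmemWrite);
    Table[idx(CounterType::Export)].insert(WaitEvent::ExpGpr);
  }

  // Users query one type at a time and get a copy of its event set.
  WaitEventSet getWaitEvents(CounterType T) const { return Table[idx(T)]; }
};
```

Returning a set by value per type means callers cannot depend on how the table is laid out, so the internal representation can change freely.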
[clang] remove unused SrcAddr parameter from performAddrSpaceCast (#179330)
The conversion code always ended up just getting the type of Src from
the Src argument itself, with no virtual users of this, so there is no
point in also providing this API hook. Fix the documentation as well,
since it seems DestAddr must have been similarly removed at some point
in the past from the API but was still documented.
Also fixes CIR to actually return the casted value!
[VectorCombine] Fold (icmp eq/ne (reduce.add X), 0) to reduce.umax
When all vector elements are known to be non-positive (e.g., from
sext i1), or all known to be non-negative (e.g., from zext i1), comparing
the sum against zero is equivalent to checking whether all elements are
zero. This can be done more efficiently using reduce.umax.
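The equivalence can be checked with a scalar model of both reductions over i32 lanes. This is a sketch for intuition, not VectorCombine code; `sumIsZero` and `umaxIsZero` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Original form: (icmp eq (reduce.add X), 0), with a wrapping add.
bool sumIsZero(const std::vector<int32_t> &v) {
  uint32_t sum = 0;  // unsigned arithmetic models the wrapping add
  for (int32_t x : v)
    sum += static_cast<uint32_t>(x);
  return sum == 0;
}

// Replacement form: the unsigned max is zero iff every lane is zero.
bool umaxIsZero(const std::vector<int32_t> &v) {
  uint32_t m = 0;
  for (int32_t x : v)
    m = std::max(m, static_cast<uint32_t>(x));
  return m == 0;
}
```

With sext i1 lanes (0 or -1) or zext i1 lanes (0 or 1) the two functions agree. With mixed signs they do not: for `{1, -1}` the sum is zero but the unsigned max is not, which is why the fold requires all lanes to share a sign.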
[offload] Fix DeviceImage to handle OffloadBinary::create returning vector (#180003)
OffloadBinary::create() now returns
`Expected<SmallVector<unique_ptr<OffloadBinary>>>`
instead of a single unique_ptr, to support multiple entries in version 2
format.
Updated the DeviceImageTy constructor to extract the first binary from
the returned vector, with an empty check. In this context, only one
image per OffloadBinary is expected.
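The "take the first entry, with an empty check" step can be sketched with standard types. `BinaryEntry`, `BinaryList`, and `takeFirstBinary` are hypothetical stand-ins rather than the offload runtime's types. The real constructor works with `Expected` and may report an error on an empty result; this sketch simply returns `nullptr`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one parsed offload binary entry.
struct BinaryEntry {
  std::string Triple;
};

// Hypothetical stand-in for a parse result that may hold several entries.
using BinaryList = std::vector<std::unique_ptr<BinaryEntry>>;

// Take ownership of the first entry, or return nullptr if there is none.
std::unique_ptr<BinaryEntry> takeFirstBinary(BinaryList &Binaries) {
  if (Binaries.empty())
    return nullptr;
  return std::move(Binaries.front());
}
```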
[MLIR][XeGPU][XeVM] Update single element vector type handling. (#178558)
The type conversion rule for single-element vectors and the
materialization function that supports the conversion were mismatched.
Update the materialization function to match the type conversion rule.