[SPIRV] Support non-constant indices for vector insert/extract (#172514)
This patch updates the legalization of spv_insertelt and spv_extractelt to
handle non-constant (dynamic) indices. When a dynamic index is encountered,
the vector is spilled to the stack, and the element is accessed via
OpAccessChain (lowered from spv_gep).
This patch also adds custom legalization for G_STORE to scalarize vector
stores and refines the legalization rules for G_LOAD, G_STORE, and
G_BUILD_VECTOR.
Fixes https://github.com/llvm/llvm-project/issues/170534
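The spill-and-index approach described in this commit can be pictured with a small host-side analogy. This is only a sketch, not code from the patch: `extractDynamic` and `insertDynamic` are hypothetical names, the vector width of 4 is an assumption, and plain pointer arithmetic stands in for OpAccessChain.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration: read element `idx` of a 4-wide vector when
// `idx` is only known at run time.
float extractDynamic(const std::array<float, 4> &vec, std::size_t idx) {
  assert(idx < 4 && "index out of range");
  // "Spill" the vector to a stack slot so it becomes addressable memory.
  float slot[4];
  for (std::size_t i = 0; i < 4; ++i)
    slot[i] = vec[i];
  // Computing the element address plays the role of the access chain.
  const float *elt = slot + idx;
  return *elt;
}

// Hypothetical illustration: write `value` at a run-time index and return
// the updated vector.
std::array<float, 4> insertDynamic(const std::array<float, 4> &vec,
                                   std::size_t idx, float value) {
  assert(idx < 4 && "index out of range");
  float slot[4];
  for (std::size_t i = 0; i < 4; ++i)
    slot[i] = vec[i];
  slot[idx] = value; // store through the computed element address
  std::array<float, 4> result{};
  for (std::size_t i = 0; i < 4; ++i)
    result[i] = slot[i]; // reload the whole vector from the slot
  return result;
}
```

With a constant index no stack slot is needed, which is presumably why only the dynamic case takes this path.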
Address reviewer feedback: fix getWaitCountMax and reduce code duplication
- Fix getWaitCountMax() to use the correct bitmasks based on architecture:
  - Pre-GFX12: Use getVmcntBitMask/getLgkmcntBitMask for LOAD_CNT/DS_CNT
  - GFX12+: Use getLoadcntBitMask/getDscntBitMask for LOAD_CNT/DS_CNT
- Refactor repetitive if-blocks for LOAD_CNT, DS_CNT, EXP_CNT into
  a single loop using getCounterRef helper function
- Fix X_CNT to return proper getXcntBitMask(IV) instead of 0
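The "single loop over a counter-reference helper" refactor mentioned in this commit might look roughly like the sketch below. Every name here (`WaitCounts`, `CounterKind`, `clampToMax`, and this particular `getCounterRef` signature) is a hypothetical stand-in rather than the AMDGPU backend's actual types; the point is only how one helper replaces three near-identical if-blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical counter kinds; not the backend's real enum.
enum CounterKind { LOAD_CNT, DS_CNT, EXP_CNT };

// Hypothetical container with one field per counter.
struct WaitCounts {
  unsigned LoadCnt = 0;
  unsigned DsCnt = 0;
  unsigned ExpCnt = 0;
};

// Map a counter kind to the field that stores it, so callers can loop
// over kinds instead of repeating one block per field.
unsigned &getCounterRef(WaitCounts &W, CounterKind K) {
  switch (K) {
  case LOAD_CNT:
    return W.LoadCnt;
  case DS_CNT:
    return W.DsCnt;
  case EXP_CNT:
    return W.ExpCnt;
  }
  assert(false && "unhandled counter kind");
  return W.LoadCnt;
}

// Clamp each counter in W to the matching maximum in Max.
// Max is taken by value so the same non-const helper can be reused.
void clampToMax(WaitCounts &W, WaitCounts Max) {
  for (CounterKind K : {LOAD_CNT, DS_CNT, EXP_CNT}) {
    unsigned &C = getCounterRef(W, K);
    C = std::min(C, getCounterRef(Max, K));
  }
}
```

Adding another counter then means extending the enum, the switch, and the loop list, with no new if-block.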
[LLDB] Add MSVC STL span formatter (#173053)
`std::span` didn't have a formatter for MSVC's STL yet. The type is
quite useful in C++20, so this PR adds a formatter for it.
Since the formatter is new, I made it work with both DWARF and PDB from
the start.
[SPIRV] Lower i1 comparisons to logical operations in regularizer pass.
UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE predicates for i1 types are now
lowered to equivalent logical operations (AND, OR, NOT) to ensure
valid SPIR-V, since SPIR-V boolean types only support logical operations.
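One possible set of logical equivalents for these predicates is sketched below. The exact forms the pass emits are not stated in the commit message, so treat these as assumptions; the function names are illustrative. The unsigned view reads i1 as 0/1, while the signed view reads true as -1, which is why the signed predicates mirror the unsigned ones.

```cpp
#include <cassert>

// Unsigned view of i1: false = 0, true = 1.
bool ult(bool a, bool b) { return !a && b; }
bool ule(bool a, bool b) { return !a || b; }
bool ugt(bool a, bool b) { return a && !b; }
bool uge(bool a, bool b) { return a || !b; }

// Signed view of i1: false = 0, true = -1, so the ordering flips.
bool slt(bool a, bool b) { return a && !b; }
bool sle(bool a, bool b) { return a || !b; }
bool sgt(bool a, bool b) { return !a && b; }
bool sge(bool a, bool b) { return !a || b; }

// Exhaustively check every logical form against the integer comparison
// it is meant to replace, over all four input combinations.
void verifyAll() {
  for (int ia = 0; ia < 2; ++ia) {
    for (int ib = 0; ib < 2; ++ib) {
      bool a = ia != 0, b = ib != 0;
      unsigned ua = a ? 1u : 0u, ub = b ? 1u : 0u;
      int sa = a ? -1 : 0, sb = b ? -1 : 0;
      assert(ult(a, b) == (ua < ub));
      assert(ule(a, b) == (ua <= ub));
      assert(ugt(a, b) == (ua > ub));
      assert(uge(a, b) == (ua >= ub));
      assert(slt(a, b) == (sa < sb));
      assert(sle(a, b) == (sa <= sb));
      assert(sgt(a, b) == (sa > sb));
      assert(sge(a, b) == (sa >= sb));
    }
  }
}
```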
Revert "SelectionDAG: Do not propagate divergence through glue (#174766)"
This reverts commit 47a0d0e42832558f999b149b22cfd48c46ef2a57.
Reverted due to test failures in LLVM_ENABLE_EXPENSIVE_CHECKS builds.
[libclc] Initial support for cross-compiling OpenCL libraries (#174022)
Summary:
The other GPU-enabled libraries (openmp, flang-rt, compiler-rt, libc,
libcxx, libcxx-abi) all support builds through a runtime cross-build. In
these builds we use a separate CMake build that cross-compiles to a
single target.
This patch provides basic support for this with the `libclc` libraries.
Changes include adding support for the more standard GPU compute triples
(amdgcn-amd-amdhsa, nvptx64-nvidia-cuda) and building only one target in
this mode.
Some things left to do:
This patch does not change the compiler invocations. This method would
allow us to use standard CMake routines, but leaving them alone keeps the
patch minimal.
The prebuild support is questionable and doesn't fit into this scheme
[3 lines not shown]
[OpenMP][OMPIRBuilder] Attach `Attribute::OptimizeNone` to user-defined (and default) mappers
Disabling opts for user-defined mappers since, in some cases (see
`default-mapper-nested-derived-type.f90`), some optimizations and
instrumentations cause runtime crashes on the host. In particular, the
following:
- the `X86DAGToDAGISel` pass and
- `OptNoneInstrumentation` (TODO: zoom in further on passes that use this
instrumentation object to find out the exact pass(es) causing the crash).
I couldn't find a way to fine-tune this for only the problematic passes
yet. I am very reluctant to do this, please let me know if there are
better solutions.
[cross-project-tests][formatters] Move LLDB test setup into its own CMakeLists.txt
Once we start adding more tests, having this in a separate CMakeLists.txt is more maintainable.
SelectionDAG: Do not propagate divergence through glue (#174766)
Glue does not carry any value (in the LLVM IR Value sense) that could be
considered uniform or divergent.
[mlir][Transforms] `remove-dead-values`: Rely on canonicalizer for region simplification (#173505)
This commit simplifies the `remove-dead-values` pass and fixes a bug in
the handling of `RegionBranchOpInterface` ops. The pass used to produce
invalid IR ("null value found") for the newly added test case.
`remove-dead-values` is a pass for additional IR simplification that
cannot be performed by the canonicalizer pass. Based on a liveness
analysis, it erases dead values / IR. (The liveness analysis is a
dataflow analysis that has more information about the IR than a
canonicalization pattern, which can see only "local" information.)
Region-based ops are difficult. The liveness analysis may determine that
an SSA value is dead. However, that does not mean that the value can
actually be removed. Doing so may violate a region data flow (as
modeled by the `RegionBranchOpInterface`). As an example, consider the
case where a region branch terminator may dispatch to one of two region
successors with the same forwarded values. A successor input (block
argument) can be erased only if it is dead on both successors.
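The rule in the preceding paragraph can be modeled without any MLIR types. In this sketch, which is an assumption-laden simplification and not the pass's code, `liveBySuccessor[s][i]` records whether input `i` of successor `s` is live, and `canEraseForwardedValue` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a terminator forwards the same values to every
// successor, so slot `index` may be dropped only if no successor uses it.
bool canEraseForwardedValue(
    const std::vector<std::vector<bool>> &liveBySuccessor,
    std::size_t index) {
  for (const std::vector<bool> &inputs : liveBySuccessor) {
    assert(index < inputs.size() && "index out of range");
    if (inputs[index])
      return false; // live on at least one successor: must be kept
  }
  return true; // dead on every successor
}
```

For two successors with liveness `{false, true}` and `{false, false}`, slot 0 is erasable and slot 1 is not, even though the second successor considers slot 1 dead.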
[11 lines not shown]
[clang][bytecode] Fix some imag/real corner cases (#174764)
Fix real/imag when taking a primitive parameter _and_ being discarded,
and fix the case where their subexpression can't be classified.
Fixes https://github.com/llvm/llvm-project/issues/174668
[AMDGPU] Handle `s_setreg_imm32_b32` targeting `MODE` register
On certain hardware, this instruction clobbers VGPR MSB `bits[12:19]`, so we need to restore the current mode.