[MLIR][Linalg][NFC] Simplify tiling canonical pattern (#182909)
Prepare for better composition of canonicalization patterns by splitting
Linalg's own canonicalizers from those added for particular purposes (e.g.
tiling).
Once dialects have their own registration mechanisms, specific passes
can just add more ops/dialects using a yet-to-be-created helper that
would be similar to the existing
`populateLinalgTilingCanonicalizationPatterns`.
[mlir][python] Fix segfault in DenseResourceElementsAttr.get_from_buffer for 0-d tensors (#183070)
When `ndim == 0`, `view->strides[view->ndim - 1]` is an out-of-bounds
access (unsigned underflow to `SIZE_MAX`). Use `view->itemsize` for
alignment instead, since a scalar buffer is trivially aligned to its
element size.
Fixes iree-org/iree-turbine#1312.
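The underflow can be reproduced without Python or MLIR. The sketch below is illustrative only: `BufferView` is a hypothetical stand-in that models just the three fields named above (`ndim`, `strides`, `itemsize`), with an unsigned rank assumed to match the description, and `innermostStrideIndex` and `alignmentFor` are made-up helper names, not the binding code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer view. Only the fields named in the
// commit message are modeled; the unsigned rank is an assumption.
struct BufferView {
  std::size_t ndim;                    // number of dimensions (0 for a scalar)
  std::vector<std::ptrdiff_t> strides; // one stride per dimension, in bytes
  std::size_t itemsize;                // size of one element, in bytes
};

// Index that a `strides[ndim - 1]` lookup would use.
// For ndim == 0 the unsigned subtraction wraps around to SIZE_MAX.
std::size_t innermostStrideIndex(const BufferView &view) {
  return view.ndim - 1;
}

// Illustrative replacement: take the alignment from the element size,
// which is well defined for every rank, including 0.
std::size_t alignmentFor(const BufferView &view) {
  return view.itemsize;
}
```

For a 0-d view the strides array is empty, so any index into it is out of bounds, while `itemsize` is still meaningful.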
[clang] Clean up large clang binaries copied into test temp directories (#182304)
I noticed a couple of tests leave behind copies of clang binaries they
copy into their temp directories. Replicate the cleanup from another
test (clang/test/Driver/clang_f_opts_withspaces.c) to remove these.
[LAA][LV]Allow recognition of strided pointers with constant stride (#171151)
This patch fixes an issue found during LoopAccessAnalysis with respect
to recognizing strided pointers that make use of runtime constants. Loop
accesses of the form `p[base + offset * const]`, where `const` is a
runtime constant, should be considered for vectorization. However, there
were cases where these access patterns weren't recognized. This patch
resolves this by adding an explicit pattern match within LAA.
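For illustration, a source-level loop with the access shape described above might look like the sketch below. The function name `sumStrided` and its parameters are hypothetical; this is not a test case from the patch, and whether LAA recognizes a given loop depends on the IR the frontend produces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `count` elements of `p`, starting at `base` and stepping by `stride`.
// `stride` is unknown at compile time but does not change while the loop
// runs, so the access has the form p[base + offset * const].
long sumStrided(const std::vector<int> &p, std::size_t base,
                std::size_t stride, std::size_t count) {
  long sum = 0;
  for (std::size_t offset = 0; offset < count; ++offset)
    sum += p[base + offset * stride];
  return sum;
}
```

The caller is responsible for keeping `base + (count - 1) * stride` within bounds.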
---------
Co-authored-by: Florian Hahn <flo at fhahn.com>
[MLIR][NVVM][NVPTX] Support for new mma/mma.sp variants from PTX 9.1 (#182325)
This change adds support for `.scale_vec::4X` with `.ue8m0` as `.stype`
with `.kind::mxf4nvf4` for `mma/mma.sp` instructions introduced in
[PTX ISA 9.1](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=mma%2520sp#ptx-isa-version-9-1).
Also, it updates MLIR mma/mma.sp block scale tests with struct usage
instead of vector.
[CIR] Implement 'assume' attribute lowering (#182960)
This attribute applies to null statements and emits an assume-op
sometimes. This patch adds this for statements, which includes the
infrastructure for AttributedStmt lowering.
---------
Co-authored-by: Sirui Mu <msrlancern at gmail.com>
[IR] Specify alloca with poison element count (#183072)
An alloca with a poison element count is undefined behavior.
This matches existing behavior of optimizations. This also matches the
behavior of llubi for `poison`, but llubi currently does not report
immediate UB for `undef` element counts. A future patch will fix that.
[Hexagon] Avoid contracting predicates in createHvxPrefixPred
The function createHvxPrefixPred should only need to expand a predicate
to match the result's bytes-per-bit. Otherwise, contracting of the
predicate may lead to an input that is shorter than 4 bytes, making it
unsuitable for VINSERTW0.
When calling createHvxPrefixPred for vector concatenation, re-group the
inputs to the concat to make sure that the resulting inputs to
createHvxPrefixPred would not need contraction.
Fixes https://github.com/llvm/llvm-project/issues/181362
[libsycl] Fix for static vars deinit order (libsycl vs liboffload) (#181366)
Both libsycl and liboffload use static variables.
On Linux, a static variable's destructor is called earlier than a function
marked with `__attribute__((destructor(...)))`.
This fix avoids a crash caused by early destruction of liboffload's static
variables.
The approach relies on the following rule
"For each local object obj with static storage duration, obj is
destroyed as if a function calling the destructor of obj were registered
with
[std::atexit](https://en.cppreference.com/w/cpp/utility/program/atexit.html)
at the completion of the constructor of obj."
from `std::exit`.
In the first call of get_platforms we call liboffload's iterateDevices,
which leads to liboffload static storage initialization. Then we
initialize our own local static var after this to be able to call our
shutdown methods earlier and before the liboffload objects are
[8 lines not shown]
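The ordering rule quoted above can be sketched in isolation. Everything in this example is hypothetical: `Tracked`, `backendState`, and `frontendState` are stand-ins for a liboffload-side static and a libsycl-side static, not the real code.

```cpp
#include <cassert>
#include <cstdio>

// Counts completed constructors so the construction order can be inspected.
int constructionCounter = 0;

// Hypothetical object that records when it was constructed and reports
// when it is destroyed.
struct Tracked {
  const char *name;
  int order; // 1 for the first object constructed, 2 for the second, ...
  explicit Tracked(const char *n) : name(n), order(++constructionCounter) {}
  ~Tracked() { std::printf("destroying %s\n", name); }
};

// Stand-in for a static owned by the dependency.
Tracked &backendState() {
  static Tracked backend("backend");
  return backend;
}

// Stand-in for the dependent library's own static. Touching the backend
// first ensures its constructor completes before ours starts.
Tracked &frontendState() {
  backendState();
  static Tracked frontend("frontend");
  return frontend;
}
```

Because `frontend` finishes construction after `backend`, it is destroyed first at normal program exit, which gives the dependent side a chance to shut down while the dependency's statics are still alive.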
[HIP] Do not apply 'externally_initialized' to constant device variables (#182157)
Summary:
From the Language reference:
> By default, global initializers are optimized by assuming that global
variables defined within the module are not modified from their initial
values before the start of the global initializer. This is true even for
variables potentially accessible from outside the module, including
those with external linkage or appearing in @llvm.used or dllexported
variables. This assumption may be suppressed by marking the variable
with externally_initialized.
This is intended because device programs can be modified beyond the
normal lifetime expected by the optimization pipeline. However, for
constant variables we should be able to safely assume that these are
truly constant within the module. In the vast majority of cases these
will not get externally visible symbols, but even for `extern const` uses
we should assert that the user must not write to variables that are
marked const.
[Offload][clang-linker-wrapper][SPIRV] Tell spirv-link to not optimize out exported symbols (#182930)
`spirv-link` seems to internalize all symbols, so the OpenMP Device
Environment global generated by the OMP FE gets optimized out. As a
result, `liboffload` runs in the wrong parallelization mode, which breaks
at least one liboffload lit test.
Pass `--create-library` to tell `spirv-link` not to do that.
```
--create-library
Link the binaries into a library, keeping all exported symbols.
```
This fixes the test.
Closes: https://github.com/llvm/llvm-project/issues/182901
Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
[CIR] Fix global-refs test that got committed in github 'race' (#183068)
Despite my best efforts, this crossed in the air with the attributes on
arguments patch, and thus had a problem with the test. This patch
updates the test to be tolerant of the attributes.
[MC/DC] Make covmap tolerant of nested Decisions (#125407)
CoverageMappingWriter reorders `Region`s by `endLoc DESC` to prioritize
the wider `Decision` among those with the same `startLoc`.
In `llvm-cov`, search Decisions in reverse order so that the smaller
Decision is found first.
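A minimal sketch of the ordering idea, under simplifying assumptions: `Region` here is a hypothetical struct with integer locations, and `sortWiderFirst` and `findInnermost` are illustrative names, not the functions in CoverageMappingWriter or `llvm-cov`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical region with integer source locations (inclusive range).
struct Region {
  int startLoc;
  int endLoc;
};

// Orders by startLoc ascending; for equal startLoc, by endLoc descending,
// so the wider region precedes the narrower one it encloses.
void sortWiderFirst(std::vector<Region> &regions) {
  std::stable_sort(regions.begin(), regions.end(),
                   [](const Region &a, const Region &b) {
                     if (a.startLoc != b.startLoc)
                       return a.startLoc < b.startLoc;
                     return a.endLoc > b.endLoc;
                   });
}

// Scans the sorted list from the back, so among regions that contain
// `loc` and share a startLoc, the narrower one is returned first.
const Region *findInnermost(const std::vector<Region> &regions, int loc) {
  for (auto it = regions.rbegin(); it != regions.rend(); ++it)
    if (it->startLoc <= loc && loc <= it->endLoc)
      return &*it;
  return nullptr;
}
```

With two regions starting at the same location, the forward order lists the outer one first while the reverse scan reaches the inner one first.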
llvmorg-23-init-2321-g8f690ec7ffd8
[LLVM] Fix accidentally included POSIX header on Windows
Summary:
This used to build only on Linux; I forgot that these changes would
cause it to be built on Windows.