[AMDGPU] Take into account amdgpu-waves-per-eu in getRegPressureLimit
The minimum occupancy computed by `getOccupancyWithWorkGroupSizes`
doesn't take into account that the user may have provided a
low-occupancy target through the amdgpu-waves-per-eu attribute.
Use `getWavesPerEU`, which gives the proper occupancy bounds.
When the user specifies a small amdgpu-waves-per-eu range (like "1,1"), this
results in higher VGPR limits.
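The sketch below shows why a low minimum occupancy raises the register budget. It assumes a simplified model in which the VGPR file is split evenly between the waves resident on an EU and each wave's share is rounded down to an allocation granule. The constants and the names `maxVGPRsForWaves` and `vgprPressureLimit` are hypothetical. The real values and logic live in the AMDGPU subtarget code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Illustrative constants only (assumed); real values are per-subtarget.
constexpr unsigned kTotalVGPRs = 256;       // VGPRs shared by resident waves
constexpr unsigned kAddressableVGPRs = 256; // per-wave addressable maximum
constexpr unsigned kAllocGranule = 4;       // allocation granularity

// Hypothetical helper: largest VGPR count per wave that still allows
// `waves` waves per EU under the simplified model above.
unsigned maxVGPRsForWaves(unsigned waves) {
  assert(waves > 0 && "occupancy must be at least one wave");
  unsigned perWave = kTotalVGPRs / waves;
  perWave -= perWave % kAllocGranule; // round down to the allocation granule
  return std::min(perWave, kAddressableVGPRs);
}

// Hypothetical pressure limit: driven by the *minimum* of the
// amdgpu-waves-per-eu range, so "1,1" yields the largest budget.
unsigned vgprPressureLimit(std::pair<unsigned, unsigned> wavesPerEU) {
  return maxVGPRsForWaves(wavesPerEU.first);
}
```

With a range of "1,1" the sketch returns the full 256-register budget. A minimum of 10 waves leaves only 24 (256 / 10 = 25, rounded down to a multiple of 4).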
InstCombine: Stop preserving undef in SimplifyDemandedFPClass
If we know there are no valid values, fold to poison. Previously this
would leave values that started as undef alone.
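Here is a minimal sketch of the fold. It assumes FP classes are modeled as a bitmask laid out like `llvm::FPClassTest`. When the classes a value can take and the classes its users demand do not intersect, no valid value remains, so the value can be replaced with poison. The helper `simplifyDemandedSketch` is hypothetical. It applies the same rule whether or not the value started as undef.

```cpp
#include <cassert>
#include <cstdint>

// FP class bits, laid out like llvm::FPClassTest (assumption for this sketch).
constexpr uint32_t fcNan = 0x3;        // sNaN | qNaN
constexpr uint32_t fcNegInf = 0x4;
constexpr uint32_t fcPosZero = 0x40;
constexpr uint32_t fcPosNormal = 0x100;
constexpr uint32_t fcAllFlags = 0x3FF; // every class, e.g. what undef may be

enum class Fold { Poison, Keep };

// Hypothetical helper: `possible` is what the value can produce, `demanded`
// is what its users care about. An empty intersection means no valid value
// remains, so fold to poison; an undef input gets no special treatment.
Fold simplifyDemandedSketch(uint32_t possible, uint32_t demanded) {
  return (possible & demanded) == 0 ? Fold::Poison : Fold::Keep;
}
```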
[NFC][AMDGPU] Add missing `lit.local.cfg` to `PreISelIntrinsicLowering` tests (#178154)
Add `lit.local.cfg` to restrict the `PreISelIntrinsicLowering/AMDGPU`
tests to AMDGPU only.
These tests were previously being run for all targets.
[flang][OpenMP][DoConcurrent] Add `collapse` clause to generated `omp.loop_nest` op (#178138)
Adds the collapse clause to the generated loop nest on both host and
device.
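This example illustrates what `collapse` means semantically. It does not show the generated MLIR. With `collapse(2)`, the two loops of a nest form one iteration space of `n * m` iterations that can be divided among threads, and each original induction variable is recovered from the linear index. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Index2D = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: visit (i, j) with an ordinary nested loop.
std::vector<Index2D> nestedOrder(std::size_t n, std::size_t m) {
  std::vector<Index2D> out;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      out.emplace_back(i, j);
  return out;
}

// Hypothetical helper: the same nest treated as collapse(2), i.e. a single
// loop over n * m iterations with i and j recovered from the linear index.
std::vector<Index2D> collapsedOrder(std::size_t n, std::size_t m) {
  std::vector<Index2D> out;
  for (std::size_t k = 0; k < n * m; ++k)
    out.emplace_back(k / m, k % m); // body never runs when m == 0
  return out;
}
```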
InstCombine: Apply demanded mask at recursion limit in SimplifyDemandedFPClass
This fixes missed flag inference in some cases, due to not inferring
that a no-nan result implies a no-nan source. This also starts treating
explicit nofpclass attributes as leaf values, like constants or arguments.
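The following sketch illustrates both ideas under simplifying assumptions: FP classes are a bitmask, and the operation returns NaN exactly when its input is NaN (as `exp` does). At the recursion limit, the demanded mask still narrows the result. If NaN is not demanded of the result, the source need not be NaN either. Both helpers are hypothetical and much simpler than the real `SimplifyDemandedFPClass`.

```cpp
#include <cassert>
#include <cstdint>

// FP class bits modeled on llvm::FPClassTest (assumed layout).
constexpr uint32_t fcNan = 0x3;        // sNaN | qNaN
constexpr uint32_t fcPosInf = 0x200;
constexpr uint32_t fcAllFlags = 0x3FF; // every class

// Hypothetical: once the recursion limit is hit nothing is known about the
// value (all classes possible), but the demanded mask is still applied.
uint32_t classesAtRecursionLimit(uint32_t demanded) {
  return fcAllFlags & demanded;
}

// Hypothetical: for a unary op that returns NaN exactly when its input is
// NaN (such as exp), NaN is demanded of the source only if it is demanded of
// the result. Every non-NaN source class is conservatively kept.
uint32_t demandedSourceForNaNPropagatingOp(uint32_t demandedResult) {
  uint32_t source = fcAllFlags & ~fcNan;
  if (demandedResult & fcNan)
    source |= fcNan;
  return source;
}
```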
InstCombine: Add a few more tests for SimplifyDemandedFPClass exp handling (#178147)
These got lost in various merges. Test a few cases where flags are
inferred from context.
[Clang] Remove gnuwin32 documentation references (#177557)
Remove the documentation references to GnuWin32. The project is no
longer maintained, and since LLVM now uses Git, `llvm-lit` uses the GNU
core utilities packaged with Git rather than requiring a separate
installation.
This appears to have been discussed on
[discourse](https://discourse.llvm.org/t/gnuwin32-alternatives-for-tests-of-msvc-build/42846/3)
but not yet implemented.
InstCombine: Add a few more tests for SimplifyDemandedFPClass exp handling
These got lost in various merges. Test a few cases where flags are inferred
from context.
[mlir][shard, bufferization] Adding sharding extensions for bufferization ops (#177378)
Adding trivial sharding support for `bufferization.alloc_tensor`,
`bufferization.dealloc_tensor` and
`bufferization.materialize_in_destination`.
include/mlir/Dialect/Tensor/IR/ShardingInterfaceImpl.h -> mlir/include/mlir/Dialect/Bufferization/Extensions/ShardingExtensions.h
Co-authored-by: Adam Siemieniuk <adam.siemieniuk at intel.com>
[OpenMP][MLIR] Add num_threads clause with dims modifier support (#171767)
This PR adds support for the OpenMP 6.1 `num_threads` clause with the `dims`
modifier. LLVM IR translation for `num_threads` with the `dims` modifier is
marked as NYI (not yet implemented).
[AArch64][GlobalISel] Remove -global-isel-abort=2 from a number of tests. NFC
This cleans up some -global-isel-abort=2 uses, either removing the unnecessary
flags or cleaning up the tests that use them.
[SPIRV] Emit intrinsics for globals only in function that references them
In the SPIRV backend, the `SPIRVEmitIntrinsics::processGlobalValue`
function adds intrinsic calls for every global variable of the module
in every function.
These intrinsics are used to keep track of global variables, their types and
initializers.
In SPIRV everything is an instruction (even globals/constants). We currently
represent these global entities as individual instructions on every function.
Later, the `SPIRVModuleAnalysis` collects these entities and maps function-_local_ registers
to _global_ registers. The `SPIRVAsmPrinter` is in charge of mapping the _local_
registers back to the appropriate _global_ register.
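Below is a toy model of the _local_ to _global_ mapping described above. It assumes each entity is identified by name and each function refers to it through its own local register number. `GlobalRegMap` is hypothetical and does not mirror the actual `SPIRVModuleAnalysis` data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of the local-to-global register mapping.
class GlobalRegMap {
  // Global register assigned to each global entity (variable, constant, ...).
  std::map<std::string, unsigned> entityToGlobal;
  // (function name, local register) -> global register.
  std::map<std::pair<std::string, unsigned>, unsigned> localToGlobal;

public:
  // Record that `fn` refers to `entity` through `localReg`. The first sighting
  // of an entity allocates the next global register number.
  void record(const std::string &fn, unsigned localReg,
              const std::string &entity) {
    unsigned next = static_cast<unsigned>(entityToGlobal.size());
    unsigned globalReg = entityToGlobal.try_emplace(entity, next).first->second;
    localToGlobal[{fn, localReg}] = globalReg;
  }

  // The global register to print in place of `fn`'s local register.
  unsigned lookup(const std::string &fn, unsigned localReg) const {
    return localToGlobal.at({fn, localReg});
  }
};
```

Two functions that refer to the same global through different local registers end up printing the same global register.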
Emitting these instructions in functions that do not reference the corresponding globals
leads to a bloated intermediate representation and high memory consumption (as happened
in https://github.com/llvm/llvm-project/issues/170339).
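The following sketches the filtering idea. It assumes a function needs tracking intrinsics only for the globals it references directly, plus, conservatively, any globals reachable through their initializers. The `Module` struct and `globalsToEmit` are hypothetical. The transitive walk through initializers is an assumption, not a description of the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical module model: globals each function uses directly, and
// globals each global's initializer refers to.
struct Module {
  std::map<std::string, std::vector<std::string>> functionUses;
  std::map<std::string, std::vector<std::string>> initializerUses;
};

// Collect the globals `fn` needs intrinsics for, via a worklist over
// direct references and (assumed) initializer references.
std::set<std::string> globalsToEmit(const Module &m, const std::string &fn) {
  std::set<std::string> result;
  std::vector<std::string> worklist;
  if (auto it = m.functionUses.find(fn); it != m.functionUses.end())
    worklist = it->second;
  while (!worklist.empty()) {
    std::string g = worklist.back();
    worklist.pop_back();
    if (!result.insert(g).second)
      continue; // already visited
    if (auto it = m.initializerUses.find(g); it != m.initializerUses.end())
      worklist.insert(worklist.end(), it->second.begin(), it->second.end());
  }
  return result;
}
```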
[25 lines not shown]
[libc++][NFC] Don't use std::distance in std::equal (#177113)
We don't need to use `std::distance`, since we know for a fact that we
have random access iterators at that point. Instead, we can just
subtract the iterators, avoiding a bunch of template machinery and
improving compile times a bit.
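Here is a minimal sketch of the idea, assuming a four-iterator `equal` whose arguments are known to be random access. `equalRandomAccess` is a hypothetical name, not libc++'s internal helper. The length check is a plain iterator subtraction rather than a call to `std::distance`.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical four-iterator equal() restricted to random-access iterators.
template <class RandomIt1, class RandomIt2>
bool equalRandomAccess(RandomIt1 first1, RandomIt1 last1,
                       RandomIt2 first2, RandomIt2 last2) {
  static_assert(
      std::is_base_of_v<std::random_access_iterator_tag,
                        typename std::iterator_traits<RandomIt1>::iterator_category>,
      "sketch assumes random-access iterators");
  // Lengths are plain differences; no std::distance dispatch needed.
  if (last1 - first1 != last2 - first2)
    return false;
  for (; first1 != last1; ++first1, ++first2)
    if (!(*first1 == *first2))
      return false;
  return true;
}
```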
[flang] fix DIR IVDEP for array assignments inside loops (#177940)
The access attribute set on hlfir.assign for arrays was lost in
InlineHLFIRAssign.cpp. This patch propagates it to the created loads and
stores.
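This simplified sketch of the fix uses a toy op model rather than the real HLFIR/FIR C++ API. When an array assignment is expanded into element-wise loads and stores, each generated memory op copies the assignment's access attribute instead of dropping it. `Op`, `ArrayAssign`, and `inlineAssign` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical op model; the attribute is represented as an opaque string.
struct Op {
  std::string name;                      // "load" or "store"
  std::optional<std::string> accessAttr; // attribute copied from the assign
};

// Hypothetical stand-in for an hlfir.assign on an array.
struct ArrayAssign {
  std::size_t extent;
  std::optional<std::string> accessAttr;
};

// Expand the assignment element-wise, propagating the access attribute
// onto every generated load and store.
std::vector<Op> inlineAssign(const ArrayAssign &assign) {
  std::vector<Op> ops;
  for (std::size_t i = 0; i < assign.extent; ++i) {
    ops.push_back({"load", assign.accessAttr});
    ops.push_back({"store", assign.accessAttr});
  }
  return ops;
}
```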
[compiler-rt][aarch64][sme] Add SVE/FP variant of `__arm_sc_memcpy` (#127093)
When SVE is available, use the `-sve` variant of memcpy from AOR for
`__arm_sc_memcpy`. From:
https://github.com/ARM-software/optimized-routines/blob/71e36403858ab3ff743fcde336fb31890e57af7e/string/aarch64/memcpy-sve.S
This implementation uses FPR/ZPR load/store instructions to do the copy,
so it should not cause memory hazards if called in streaming mode (with the
memory later being accessed in the streaming mode with SVE/SME
instructions).
The implementation has been slightly modified from AOR to use local
labels (matching other compiler-rt functions) but still passes the
memcpy and memmove tests from AOR.
[analyzer][docs] Add basic description of checker 'core.CallAndMessage' (#177179)
The checker had very little documentation. This adds a more detailed
(though still brief) description of its features and options.