[RISCV] Relax reversed mask's mask requirement in reverse to strided load/store combine (#180706)
We have combines for vp.reverse(vp.load) -> vp.strided.load stride=-1
and vp.store(vp.reverse) -> vp.strided.store stride=-1.
If the load or store is masked, the mask needs to be also a vp.reverse
with the same EVL. However we also have the requirement that the mask's
vp.reverse is unmasked (has an all-ones mask).
vp.reverse's mask only sets masked off lanes to poison, and doesn't
affect the permutation of elements. So given those lanes are poison, I
believe the combine is valid for any mask, not just all ones.
This is split off from another patch I plan on posting to generalize
those combines to vector.splice+vector.reverse patterns, as part of
#172961
[Mips] Fix cttz.i32 fails to lower on mips16 (#179633)
MIPS16 cannot handle constant pools created by CTTZ table lookup
expansion. This causes "Cannot select" errors when trying to select
MipsISD::Lo nodes for constant pool addresses.
Modify the table lookup conditions to check ConstantPool operation
status, and only set ConstantPool to Custom in non-MIPS16 mode in MIPS
backend.
This ensures MIPS16 uses the ISD::CTPOP instead of attempting
unsupported constant pool operations.
Fix #61055.
[LLDB] Fix KDP plugin path (#180897)
This should fix a failure on the macOS buildbots (see
https://github.com/llvm/llvm-project/pull/179524#issuecomment-3882784085).
I can't test this, but the only plugins not enabled on Linux and Windows
are `ProcessKDP`, `PlatformDarwin`, and `PlatformDarwinKernel`. Looking
at the path for KDP, it uses `GetPluginNameStatic` as the last name in
the path. This is `kdp-remote` instead of `kdp`.
[PowerPC] Require PPC32 for 32-bit addc/adde/subc/sube (#179186)
Unlike the base add/sub opcodes which will just overflow, these will
produce incorrect results, because the carry operates on the full
64-bits. Trying to use these with i32 operands on PPC64 should result in
a selection failure instead of a silent miscompile, like the one seen in
https://github.com/llvm/llvm-project/pull/178979.
[OFFLOAD] Support host plugin on Windows (#180401)
Changes to make host plugin compile on Windows:
* Change IO code to be portable
* Adjust Makefiles
Allow plugin to work partially when libffi support is not found
dynamically (compilation works fine even on Windows because of the
wrapper support).
[MLIR] Guard optional operand resolution in generated op parsers (#180796)
Skip resolveOperands for optional operands when they are absent to
avoid out-of-bounds access on the empty types vector.
[AMDGPU] Introduce asyncmark/wait intrinsics (#180467)
Asynchronous operations are memory transfers (usually between the global
memory and LDS) that are completed independently at an unspecified
scope. A thread that requests one or more asynchronous transfers can use
async marks to track their completion. The thread waits for each mark to
be completed, which indicates that requests initiated in program order
before this mark have also completed.
For now, we implement asyncmark/wait operations on pre-GFX12
architectures that support "LDS DMA" operations. Future work will extend
support to GFX12Plus architectures that support "true" async operations.
This is part of a stack split out from #173259
- #180467
- #180466
Co-authored-by: Ryan Mitchell ryan.mitchell at amd.com
Fixes: SWDEV-521121
[libc++][test] Include `<ios>` and `<ctime>` in tests for `time` locale facets (#179986)
Add inclusion of `<ios>` and `<ctime>` to ensure that the definitions of `std::basic_ios` and `std::tm` are available.
As a drive-by fix, change uses of `tm` to `std::tm`. The latter is guaranteed to be available in `<ctime>`, but the former isn't.
[CIR] Add CIRGen support for static local variables with non-constant initializers
This adds CIRGen infrastructure for C++ function-local static variables
that require guarded initialization (Itanium C++ ABI).
Changes:
- Add ASTVarDeclAttr to carry VarDecl AST through the pipeline
- Add emitGuardedInit() to CIRGenCXXABI for guarded initialization
- Add emitCXXGuardedInit() to CIRGenFunction
- Replace NYI in addInitializerToStaticVarDecl() with ctor region emission
- Set static_local attribute on GlobalOp and GetGlobalOp
The global's ctor region contains the initialization code, which will be
lowered by LoweringPrepare to emit the actual guard variable pattern with
__cxa_guard_acquire/__cxa_guard_release calls.
[NewPM] Port x86-insert-x87-wait (#180128)
Similar to other portings created by @aidenboom154. No specific test
coverage as there are no MIR->MIR tests that exercise this pass. Going
with other naming conventions, I renamed WaitInsert to
X86InsertX87WaitLegacy
[MLIR][Affine] Remove restriction in slice validity check on symbols (#180709)
Remove restriction in affine analysis utility for checking slice
validity. This was unnecessarily bailing out still after the underlying
methods were extended. This update enables fusion of affine nests with
symbolic bounds.
Fixes: https://github.com/llvm/llvm-project/issues/61784
Based on and revived from https://reviews.llvm.org/D148559 from
@anoopjs.
[Flang][OpenMP] Fix visibility of user-defined reductions for derived types and module imports (#180552)
User-defined reductions declared in a module were not visible to
programs that imported the module via USE statements, causing valid code
to be incorrectly rejected. The reduction identifier defined in the
module scope wasn't being found during semantic analysis of the main
program.
Ref:
OpenMP Spec 5.1
_"If a directive appears in the specification part of a module then the
behavior is as if that directive,
with the variables, types and procedures that have PRIVATE accessibility
omitted, appears in the
specification part of any compilation unit that references the module
unless otherwise specified "_
Fixes :
[https://github.com/llvm/llvm-project/issues/176279](https://github.com/llvm/llvm-project/issues/176279)
Co-authored-by: Chandra Ghale <ghale at pe31.hpc.amslabs.hpecorp.net>
[AMDGPU] Introduce asyncmark/wait intrinsics
Asynchronous operations are memory transfers (usually between the global memory
and LDS) that are completed independently at an unspecified scope. A thread that
requests one or more asynchronous transfers can use async marks to track their
completion. The thread waits for each mark to be completed, which indicates that
requests initiated in program order before this mark have also completed.
For now, we implement asyncmark/wait operations on pre-GFX12 architectures that
support "LDS DMA" operations. Future work will extend support to GFX12Plus
architectures that support "true" async operations.
Co-authored-by: Ryan Mitchell ryan.mitchell at amd.com
Fixes: SWDEV-521121
[clang] Ensure -mno-outline adds attributes
Before this change, `-mno-outline` and `-moutline` only controlled the
pass pipelines for the invoked compiler/linker.
The drawback of this implementation is that, when using LTO, only the
flag provided to the linker invocation is honoured (and any files which
individually use `-mno-outline` will have that flag ignored).
This change serialises the `-mno-outline` flag into each function's
IR/Bitcode, so that we can correctly disable outlining from functions in
files which disabled outlining, without affecting outlining choices for
functions from other files. This matches how other optimisation flags
are handled so the IR/Bitcode can be correctly merged during LTO.
[clang] Add clang::nooutline Attribute
This change:
- Adds a `[[clang::nooutline]]` function attribute for C and C++. There
is no equivalent GNU syntax for this attribute, so no `__attribute__`
syntax.
- Uses the presence of `[[clang::nooutline]]` to add the `nooutline`
attribute to IR function definitions.
- Adds test for the above.
The `nooutline` attribute disables both the Machine Outliner (enabled at
Oz for some targets), and the IR Outliner (disabled by default).