[flang][OpenMP] Implement nest depth calculation in LoopSequence (#186477)
Calculate two depths, a semantic one and a perfect one. The former is
the depth of a loop nest taking into account any loop- or
sequence-transforming OpenMP constructs. The latter is the maximum level
to which the semantic nest is a perfect nest.
Issue: https://github.com/llvm/llvm-project/issues/185287
Reinstate PR185298 after a fix has been merged in PR186416. Includes a
testcase that triggered failures before.
[X86] Blocklist instructions that are unsafe for masked-load folding. (#178888)
This PR blocklist instructions that are unsafe for masked-load folding.
Folding with the same mask is only safe if every active destination
element reads only from source elements that are also active under the
same mask. These instructions perform element rearrangement or
broadcasting, which may cause active destination elements to read from
masked-off source elements.
VPERMILPD and VPERMILPS are safe only in the rrk form, the rik form
needs to be blocklisted. In the rrk form, the masked source operand is a
control mask, while in the rik form the masked source operand is the
data/value. This is also why VPSHUFB is safe to fold, while other
shuffles such as VSHUFPS are not.
Examples:
```
EVEX.128.66.0F.WIG 67 /r VPACKUSWB xmm1{k1}{z}, xmm2, xmm3/m128
[35 lines not shown]
[flang][NFC] Converted five tests from old lowering to new lowering (part 31) (#186299)
Tests converted from test/Lower/Intrinsics: iall.f90, iand.f90,
iany.f90, ibclr.f90, ibits.f90
[AMDGPU] Simplify state clearing in SIInsertWaitcnts. NFC. (#186399)
There is no need to clear state at the start or end of the run method,
because a fresh instance of SIInsertWaitcnts is constructed for each run
on a MachineFunction.
[AMDGPU] Initialize more fields in the SIInsertWaitcnts constructor. NFC. (#186394)
ST, TII, TRI and MRI can all be initialized in the constructor and hence
be references instead of pointers.
[orc-rt] Add Controller Interface (CI) symbol table to Session. (#186747)
The Controller Interface is the extended set of symbols (mostly wrapper
functions) that the controller can call prior to loading any JIT'd code.
It is expected that it will be used to inspect the process and create /
configure services to enable JITing.
[lldb-dap] Mark return value as readonly (#186329)
Marked return value as readonly to give VS Code a hint that this
variable doesn't support `setVariable` request.
[C++20] [Modules] Don't add discardable variables to module initializers (#186752)
Close https://github.com/llvm/llvm-project/issues/170099
The root cause of the problem is, we shouldn't add the inline variable
(which is discardable in linker's point of view) to the module's
initializers.
I verified with GCC's generated code to make the behavior consistent.
This is also a small optimization by the way.
[ARM] Try to lower sign bit SELECT_CC to shift (#186349)
Lower a `x < 0 ? 1 : 0` style SELECT_CC to `x>>(bw-1)`. This will become
more important with an upcoming change, but also appears to be somewhat
useful by itself.
[clang-tidy] Fix an edge case in readability-implicit-bool-conversion (#186234)
Fix a FP for condition expressions wrapped by `ExprWithCleanups`.
Co-authored-by: EugeneZelenko <eugene.zelenko at gmail.com>
Co-authored-by: Zeyi Xu <zeyi2 at nekoarch.cc>
[mlir][linalg] Use inferConvolutionDims for generic convolution downscaling (#180586)
The goal of this PR is to implement a generic, structure-aware
convolution downscaling transformation that works for any
convolution-like operation regardless of its specific layout or naming,
rather than relying on pattern-matching against specific named
operations.
Each pattern we currently have, have hardcoded dimension indices
specific to its layout (e.g., NHWC vs NCHW).
This approach :-
1. Requires maintaining many similar patterns.
2. Is brittle when new layouts are introduced.
3. Cannot handle batchless versions of the conv variants.
This PR thus creates a single downscaleSizeOneWindowedConvolution
function that uses `inferConvolutionDims` to semantically understand the
convolution structure (batch dims, output image dims, filter loop dims,
etc.) rather than hardcoding indices.
[8 lines not shown]
[Clang] Dump noexcept expression in compound requirement AST dumps
When a compound requirement has a noexcept(expr) specification, the
expression is now visited as a child node in AST dumps. The text dumper
also shows "noexcept(expr)" instead of just "noexcept" to indicate the
presence of the expression.
Fix noexcept requirement not being checked when concept is used in another concept
The RecursiveASTVisitor was not traversing the noexcept expression in
compound requirements, causing template parameters used only in noexcept
expressions to be missed during constraint normalization.
This resulted in concepts with dependent noexcept requirements (like
noexcept(noexc) where noexc is a template parameter) not being properly
evaluated when the concept was used inside another concept definition.
Fix by adding traversal of getNoexceptExpr() in
TraverseConceptExprRequirement.
Test: Uncommented and verified the test case in compound-requirement.cpp
that was previously commented out because it didn't work.
[SelectionDAG] Add CTTZ_ELTS[_ZERO_POISON] nodes. NFCI (#185600)
Currently llvm.experimental.cttz.elts are directly lowered from the
intrinsic.
If the type isn't legal then the target tells SelectionDAGBuilder to
expand it into a reduction, but this means we can't split the operation.
E.g. it's possible to split a cttz.elts nxv32i1 into two nxv16i1,
instead of expanding it into a nxv32i64 reduction.
vp.cttz.elts can be split because it has a dedicated SelectionDAG node.
This adds CTTZ_ELTS and CTTZ_ELTS[_ZERO_POISON] nodes and just enough
legalization to get tests passing. A follow up patch will add splitting
and move the expansion into LegalizeDAG.
[Clang] Support noexcept(expr) in C++ concepts compound requirements
This patch implements P3822R0 support for noexcept specifications with constant expressions in C++20 concepts compound requirements.
Previously, only 'noexcept' keyword was supported, which is now
equivalent to 'noexcept(true)'.