[X86] Add tests showing failure to concat 256-bit rotate nodes on non-vlx targets (#203517)
These are widened in tablegen, we don't need to limit these to VLX targets
[mlir][vector] extend `createReadOrMaskedRead`/`createWriteOrMaskedWrite` with permutation map support (#202766)
Follow-up to #201180.
Extends the existing `createReadOrMaskedRead` and
`createWriteOrMaskedWrite` utilities in `VectorUtils` with two optional
trailing parameters:
- `ArrayRef<Value> indices`
- `AffineMap permutationMap`
The affine super-vectorizer is updated to call these functions instead
of constructing `TransferReadOp`/`TransferWriteOp` directly.
@banach-space, please correct me if this wasn't what you meant in the
previous PR.
---------
Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski at gmail.com>
[MergeICmps] Perform dereferenceability check with context (#202884)
To support deref-at-point semantics, we need to check dereferenceability
with a context instruction. Currently, MergeICmps does the check for
each individual load instruction. In this PR, I'm replacing this with a
check for all the loads that are part of a chain after they have been
collected, so we do the context-sensitive check only once.
The choice of context instruction is a bit tricky: Normally, this would
just be the first block in the chain (the "entry block"), but it's also
possible for the block to "do extra work", in which case it will get
split. If this happens, we should be checking at the splitting point, as
the extra work might be freeing the pointer.
Another question to consider here is whether we need to be concerned
about frees at all: After all, the original code will be accessing at
least one byte of the two objects, so doesn't that imply that it wasn't
freed already? This is indeed the case, as long as allocations cannot
shrink. This is something we currently don't allow, but I think it's
something we want to allow, so I'm going with the conservative treatment
here.
[DirectX] Drop DICommonBlock metadata (#201948)
DICommonBlock cannot be represented in LLVM 3.7, but it is a scope
within a parent scope, so we can refer to the parent scope instead.
[X86] - Prevent the wrong fold of x86_avx512_mask_cmp_ss/sd to fcmp (#202321)
The issue is based upon the SemiAnalysisAI by @jlebar.
[058-mask-cmp-ss-imm-immediate-not-validated](https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/058-mask-cmp-ss-imm-immediate-not-validated/NOTES.md)
It is not a real bug, just a warning for the future fold implementation
of mask_cmp → fcmp.
There is non to fix as of now in the source code. Added a few comments
and test cases for the future implementation of the folds.
@topperc @phoebewang
[lit] Add support for %{s:stem} substitution. (#202885)
It provides the source file name with the (last) extension removed.
This is to align with what is available for %t and actually needed
downstream.
[X86] combineConcatVectorOps - concat(permi(x,imm0),permi(y,imm1)) -> vpermv3(widen(x),m,widen(y)) (#203508)
Add handling for X86ISD::VPERMI nodes with different immediates -
folding to a X86ISD::VPERMV3 instead, replacing a
INSERT_SUBVECTOR+2xPERMI nodes with a mask load
We don't need to concat the source operands - we have other folds that
will do this if beneficial - we just rely on (free) implicit widening.
[clang][bytecode] Add `PtrView` for non-tracking pointers (#184129)
Currently, when creating a `Pointer` (of block type, which I will assume
here), the pointer will add itself (via its address) to its block's
pointer list. This way, a block always knows what pointers point to it.
That's important so we can handle the case when a block (which was e.g.
created for a local variable) is destroyed and we now need to update its
pointers.
However, since always do this for all `Pointer` instances, it creates a
weird performance problem where we do this dance all the time for no
reason, e.g. consider `Pointer::stripBaseCasts()`:
https://github.com/llvm/llvm-project/blob/88693c49d9ac58a33af5978d31f6c70fe1d5b45b/clang/lib/AST/ByteCode/Pointer.h#L778-L783
This will add and remove the newly created pointer from the block's
pointer list every iteration. Other offenders are `Pointer::toRValue()`,
`EvaluationResult::checkFullyInitialized()` or
`Pointer::computeOffsetForComparison()`.
[8 lines not shown]
[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging
It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.
I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
[Flang][OpenMP] Add combined construct information
This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.
This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.
Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
[MLIR][OpenMP] Explicit tagging of combined constructs
Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.
This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be inconditionally evaluated in the
host.
So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error prone,
computationally expensive and it can't really work in the general case.
On the other hand, a compiler frontend can easily tell the difference
[10 lines not shown]
[openmp] Fix export file paths (#202692)
The files omp_lib.h and omp-tools.h are the outputs of two
configure_file invocations which specify the full path of the outputs.
Use these full paths in LibompExports.cmake so they can actually be
found.
[Dexter] Add at_frame_idx to check values in frames above current
This patch adds a new attribute for !and nodes, `at_frame_idx`, which
matches against frames above its parent node; for example, in the script:
```
!where {function: foo}:
!where {function: bar}:
!and {at_frame_idx: 1}:
!value x: 0
```
The `!value x` node checks the value of 'x' in 'foo' while the debugger is
inside 'bar'. Use of this attribute comes with some restrictions: a !where
node can never be nested under a !and{at_frame_idx} node, and neither can
another !and{at_frame_idx} node.
runtimes: Pass CMAKE_SYSTEM_NAME based on target triple
Compute the cmake system name from the target triple, rather
than passing through the host's. This is primarily to stop
forwarding OSX specific cmake variables.
This fixes build failures when trying to build gpu libc on mac
hosts. Previously it would fail on several issues, starting with
an unused argument -mmacos-version-min error, followed by other
errors caused by passing -isysroot.
Secondarily, restrict the cmake imported targets when cross compiling.
Without this, the amdgpu build prints many cmake warnings about the
target not supporting shared libraries.
Claude did most of the actual work, though it required quite a few
rounds of prodding to get it into the right place. In particular it
took care of handling all of the cmake platform recognized names from
the triple.
[2 lines not shown]
Emit debug type vector (#200056)
This emits `DebugTypeVector` for HLSL `float4`-style vectors.
`partitionTypes()` separates vector `DICompositeType` nodes from basic
types so both can be visited in a single pass over the debug metadata. A
new `emitDebugTypeVector()` helper builds the `DebugTypeVector`
instruction and looks up the base-type register in `DebugTypeRegs`.
The helper skips four cases silently:
1. Absent or non-`DIBasicType` base type: only scalar element types are
supported for now.
2. Base type not yet emitted: the type was not reached during the
`DebugTypeBasic` pass.
3. Multiple subranges: `DebugTypeVector` models one-dimensional vectors
only (NSDI cannot encode multi-subrange types).
4. Non-constant subrange count: NSDI cannot represent variable-length
counts.
[2 lines not shown]