[HLSL][DirectX] Emit convergence control tokens when targeting DirectX (#188792)
This PR allows codegen to generate convergence control tokens. These
give a more accurate description of convergence behaviour, which
prevents invalid control flow graph transforms (and allows valid ones).
As noted, using convergence control tokens is the preferred norm, and
this change follows that by enabling them for `DirectX`.
This is being done now to prevent a convergent loop exit condition from
being illegally moved across control flow. Test cases for this are added
explicitly.
Please see the individual commits for logically similar chunks.
Unfortunately, it is tricky to stage this in smaller individual commits.
Resolves https://github.com/llvm/llvm-project/issues/180621.
https://github.com/llvm/llvm-project/pull/188537 is a prerequisite for
this to pass the HLSL offload suite tests.
Assisted by: GitHub Copilot
[AMDGPU] Report only local per-function resource usage when object linking is enabled
With object linking, the linker aggregates resource usage across TUs, so
compile-time pessimism and call-graph propagation either duplicate the
linker's work or pollute its inputs.
In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
`max`/`or` expressions.
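To make the difference between the two modes concrete, the C sketch below contrasts propagating a `max` through the call graph with reporting only a function's local value, for a single resource (a VGPR count). It is a hypothetical model that assumes an acyclic call graph; `func_info` and `reported_vgprs` are illustrative names, not the actual `AMDGPUResourceUsageAnalysis` or `AMDGPUMCResourceInfo` code, which builds symbolic expressions rather than computing integers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function resource record (not the real AMDGPU data structures). */
struct func_info {
  int local_vgprs;                  /* registers used by this function's own body */
  const struct func_info **callees; /* direct callees; the graph is assumed acyclic */
  size_t num_callees;
};

/* Without object linking: report the max over the function and all its callees.
   With object linking: report only the local value and let the linker aggregate. */
int reported_vgprs(const struct func_info *f, bool object_linking) {
  int result = f->local_vgprs;
  if (object_linking)
    return result;
  for (size_t i = 0; i < f->num_callees; ++i) {
    int callee = reported_vgprs(f->callees[i], object_linking);
    if (callee > result)
      result = callee;
  }
  return result;
}
```

For a caller using 16 VGPRs that calls a leaf using 40, the call-graph mode reports 40 for the caller, while the object-linking mode reports 16 and leaves the combination to the linker.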
[AArch64][clang][llvm] Add ACLE Armv9.7 MMLA intrinsics
Implement new ACLE matrix multiply-accumulate intrinsics for Armv9.7:
```c
// 16-bit floating-point matrix multiply-accumulate.
// Only if __ARM_FEATURE_SVE_B16MM
// Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);
// Half-precision matrix multiply accumulating to single-precision
// instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b);
// Non-widening half-precision matrix multiply instruction. Requires the
// +f16mm architecture extension.
float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b);
```
[GlobalISel] Fix -Wunused-variable (#193009)
These variables are only used in assertions and are assigned separately
from their definition, so mark them `[[maybe_unused]]`.
[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989)
The Uses tracked in foldShuffleToIdentity are intended to detect where an
operand is used, in order to distinguish between splats, identities and
concats of the same value. However, when looking through multiple
unsimplified shuffles, the same Use could be both a splat and an
identity. This patch changes each Use entry to a Value paired with its
original Use, so that even when looking through multiple vectors we
correctly recognise whether each use is a splat, an identity or a concat.
Fixes #180338
(cherry picked from commit fd40c606652137706bc336ef80ed1814ab3d3680)
[NFC][test] Precommit test for pr188989 (#188667)
Precommit test for #188989.
This test case covers a scenario in the vector combine
foldShuffleToIdentity function where incorrect folding was caused when
different shuffle sequences shared the same initial Use *. This issue
may be due to cost model differences and currently reproduces only on
LoongArch for this test case.
(cherry picked from commit 3e015b89e8bd9c71f6bb1cf38747d2862f5d5a3d)
[X86] Fix missing ByValTemporaries update in CopyViaTemp path for musttail calls (#190540)
This fixes a miscompilation in musttail calls with byval arguments on
X86.
In the CopyViaTemp path, a temporary stack object is created and the
argument is copied into it.
However, the temporary is not recorded in ByValTemporaries,
so the final lowering phase does not emit the copy to the real outgoing
argument slot.
As a result, the callee may read incorrect values from the stack.
Fix this by recording the temporary in ByValTemporaries so that the
final lowering step correctly copies the argument to the expected stack
location.
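The shape of the bug can be modelled with a small C sketch: an early phase copies a `byval` argument into a temporary, and a final phase copies only the temporaries it has recorded into the outgoing argument slots. This is a hypothetical model, not the X86 lowering code; `lowering_state`, `copy_via_temp`, and `emit_outgoing_copies` are illustrative names, and passing `record = false` reproduces the missing `ByValTemporaries` update.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TEMPS 8
#define ARG_SIZE 16

/* Hypothetical lowering state; temp_slot plays the role of ByValTemporaries. */
struct lowering_state {
  unsigned char temps[MAX_TEMPS][ARG_SIZE]; /* temporary stack objects */
  size_t temp_slot[MAX_TEMPS];              /* recorded temp -> outgoing slot */
  size_t num_recorded;
};

/* CopyViaTemp path: copy the argument into a temporary and, if `record` is
   set (the fix), remember which outgoing slot it must end up in. */
void copy_via_temp(struct lowering_state *s, const unsigned char *arg,
                   size_t outgoing_slot, bool record) {
  size_t t = s->num_recorded; /* next free temporary */
  if (t >= MAX_TEMPS)
    return;
  memcpy(s->temps[t], arg, ARG_SIZE);
  if (record) {
    s->temp_slot[t] = outgoing_slot;
    s->num_recorded++;
  }
}

/* Final phase: copy every recorded temporary into its real outgoing slot.
   An unrecorded temporary is silently skipped, which is the miscompile. */
void emit_outgoing_copies(const struct lowering_state *s,
                          unsigned char outgoing[][ARG_SIZE]) {
  for (size_t i = 0; i < s->num_recorded; ++i)
    memcpy(outgoing[s->temp_slot[i]], s->temps[i], ARG_SIZE);
}
```

Without the record step, the outgoing slot keeps whatever it held before, which corresponds to the callee reading incorrect values from the stack.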
Reproducer: https://github.com/llvm/llvm-project/issues/190429
(cherry picked from commit abd502a44e5ef19a302d943eeb017c29124b96e9)
[libsycl] Fix _LIBSYCL_EXPORT placement (#192243)
The current placement of `_LIBSYCL_EXPORT` in usm_functions.hpp causes
compilation errors on Windows and is inconsistent with the other header
files.
[PatternMatchHelpers] Factor deferred and bind matchers (NFC) (#191373)
Factor bind_ty and deferredval_ty as match_bind and match_deferred from
existing PatternMatch implementations into PatternMatchHelpers.
[lldb][docs] Add conference talks to the links page (#192724)
At EuroLLVM, I mentioned a previous LLDB talk and realized that such
talks would be a lot more discoverable if we linked them from the website.
[LV][NFC] Rename PreferPredicateOverEpilogue to TailFoldingPolicy (#191803)
Rename the -prefer-predicate-over-epilogue flag and its associated
enum values to use 'TailFold' terminology instead of 'Predicate'. The
term 'Predicate' is overloaded in the vectorizer context and would
cause further confusion as more tail-folding styles are added.
[VPlan] CSE ScalarIVSteps recipes (#191307)
Extend getOpCodeOrIntrinsicID to return a pseudo opcode for
ScalarIVSteps so that these recipes can be CSE'd, with the equality
check extended to also compare the InductionOpcode.
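As an illustration of why the induction opcode has to be part of the comparison, the C sketch below runs a naive CSE over a list of recipes keyed on an opcode plus operands. Recipes without a real opcode get a pseudo opcode, and ScalarIVSteps-like recipes additionally compare their induction opcode, so two steps that differ only in that opcode are not merged. All names (`recipe`, `PSEUDO_SCALAR_IV_STEPS`, `cse`) and the pseudo-opcode value are hypothetical; this does not reflect the actual VPlan implementation.

```c
#include <stdbool.h>
#include <stddef.h>

enum recipe_kind { KIND_INSTRUCTION, KIND_SCALAR_IV_STEPS };

/* Hypothetical pseudo opcode, assumed to lie outside the range of real opcodes. */
#define PSEUDO_SCALAR_IV_STEPS 1000

struct recipe {
  enum recipe_kind kind;
  int opcode;           /* real opcode, KIND_INSTRUCTION only */
  int induction_opcode; /* meaningful for KIND_SCALAR_IV_STEPS only */
  int operands[2];
};

/* Return the real opcode, or the pseudo opcode for recipes that lack one. */
int get_opcode_or_pseudo(const struct recipe *r) {
  return r->kind == KIND_SCALAR_IV_STEPS ? PSEUDO_SCALAR_IV_STEPS : r->opcode;
}

bool recipes_equal(const struct recipe *a, const struct recipe *b) {
  if (get_opcode_or_pseudo(a) != get_opcode_or_pseudo(b))
    return false;
  /* Same operands but a different induction opcode must not be merged. */
  if (a->kind == KIND_SCALAR_IV_STEPS &&
      a->induction_opcode != b->induction_opcode)
    return false;
  return a->operands[0] == b->operands[0] && a->operands[1] == b->operands[1];
}

/* Naive O(n^2) CSE: replacement[i] is the index of the first equivalent recipe. */
void cse(const struct recipe *recipes, size_t n, size_t *replacement) {
  for (size_t i = 0; i < n; ++i) {
    replacement[i] = i;
    for (size_t j = 0; j < i; ++j) {
      if (recipes_equal(&recipes[j], &recipes[i])) {
        replacement[i] = replacement[j];
        break;
      }
    }
  }
}
```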
[lldb] Fix crash when evaluating expressions in Wasm targets (#192893)
LLDB crashes with "LLVM ERROR: Incompatible object format!" when
evaluating expressions while debugging WebAssembly because ProcessWasm
never disables JIT. RuntimeDyld only supports ELF, MachO, and COFF
object formats, so attempting to JIT-compile an expression for a Wasm
target produces the aforementioned fatal error.
This PR avoids the crash by calling `SetCanJIT(false)` in the
`ProcessWasm` constructor. Simple expressions still work via the IR
interpreter, while expressions requiring the JIT now produce a proper
error message instead of crashing.
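The resulting decision flow can be sketched in C. This is a hypothetical model rather than LLDB code: `process`, `process_wasm_init`, and `evaluate` are illustrative names, and the sketch assumes the IR interpreter is tried first and the JIT is only needed for expressions the interpreter cannot handle.

```c
#include <stdbool.h>

enum eval_result { EVAL_INTERPRETED, EVAL_JITTED, EVAL_ERROR_NO_JIT };

/* Hypothetical process state; can_jit mirrors what SetCanJIT controls. */
struct process {
  bool can_jit;
};

/* Models the ProcessWasm constructor disabling the JIT. */
void process_wasm_init(struct process *p) {
  p->can_jit = false;
}

/* Simple expressions go through the interpreter; anything else needs the JIT,
   and with the JIT disabled we report an error instead of aborting. */
enum eval_result evaluate(const struct process *p, bool interpretable) {
  if (interpretable)
    return EVAL_INTERPRETED;
  if (!p->can_jit)
    return EVAL_ERROR_NO_JIT;
  return EVAL_JITTED;
}
```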
Fixes #179915
[clang] implement CWG2064: ignore value dependence for decltype
The type denoted by `decltype` of a value-dependent (but not
type-dependent) expression should be known, so this patch makes such
types non-opaque instead.
This patch also implements what's necessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.
This also re-adds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.
Fixes #8740
Fixes #61818
Fixes #190388
[SPIR-V] Handle [N x i8] byte addressing in SPIRVEmitIntrinsics
LLVM started generating [N x i8] types for array-indexing GEPs.
SPIRVEmitIntrinsics did not know how to handle them, so it generated a
cast to [N x i8] to perform the GEP, which does not work in logical
addressing mode.
To handle this, we extend the `i8` GEP handling for logical addressing
mode to work with byte addressing of arbitrary size.
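Under logical addressing a raw byte offset cannot be applied to a pointer directly, so a byte-based GEP has to be re-expressed as an index into the pointee's elements. The C sketch below shows only that arithmetic, for a GEP over `[N x i8]` with index `i` (a byte offset of `i * N`) applied to an array whose elements are `elem_size` bytes. The function name and the choice to reject offsets that are not element-aligned are assumptions for illustration, not the actual `SPIRVEmitIntrinsics` logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: map a GEP over [n_bytes x i8] with index `index` to an
   element index of an array of `elem_size`-byte elements. Returns false when
   the byte offset does not land on an element boundary. */
bool byte_gep_to_element_index(int64_t index, int64_t n_bytes,
                               int64_t elem_size, int64_t *elem_index) {
  if (elem_size <= 0)
    return false;
  int64_t byte_offset = index * n_bytes;
  if (byte_offset % elem_size != 0)
    return false;
  *elem_index = byte_offset / elem_size;
  return true;
}
```

For example, index 3 over `[8 x i8]` is a 24-byte offset, which corresponds to element 6 of an array of 4-byte elements.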
[VPlan] Remove dead partial reduction case in addReductionResultComputation. NFCI (#192985)
Partial reductions don't exist until createPartialReductions, which is
called after addReductionResultComputation. So we don't need to check
partial reductions anymore. I presume this happened after #167851.
[MLIR][OpenMP] Unify device shared memory logic
This patch creates a utils library for the OpenMP dialect with functions
used by MLIR to LLVM IR translation as well as the stack-to-shared pass
to determine which allocations must use local stack memory or device
shared memory.
[MLIR][OpenMP][OMPIRBuilder] Improve shared memory checks
This patch refines checks to decide whether to use device shared memory or
regular stack allocations. In particular, it adds support for parallel regions
residing on standalone target device functions.
The changes are:
- Shared memory is introduced for `omp.target` implicit allocations, such as
those related to privatization and mapping, as long as they are shared across
threads in a nested parallel region.
- Standalone target device functions are interpreted as being part of a Generic
kernel, since the fact that they are present in the module after filtering
means they must be reachable from a target region.
- Prevent allocations whose only shared uses inside of an `omp.parallel` region
are as part of a `private` clause from being moved to device shared memory.
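The rules above can be summarised as a small decision function. This is a simplified sketch that assumes each allocation has already been reduced to a few flags; `alloc_summary`, `use_shared_memory`, and the context kinds are hypothetical names, and treating SPMD kernels as never needing shared memory is an assumption of the sketch rather than something stated in this patch. The actual checks operate on MLIR operations and their uses.

```c
#include <stdbool.h>

enum context_kind { CTX_TARGET_GENERIC, CTX_TARGET_SPMD, CTX_DEVICE_FUNCTION };

/* Hypothetical summary of an allocation and where it is used. */
struct alloc_summary {
  enum context_kind context;
  bool used_in_nested_parallel; /* some use lies inside an omp.parallel region */
  bool only_private_uses;       /* every such use is through a private clause */
};

bool use_shared_memory(const struct alloc_summary *a) {
  /* Standalone device functions are treated as part of a Generic kernel,
     since they must be reachable from a target region. */
  bool generic = a->context == CTX_TARGET_GENERIC ||
                 a->context == CTX_DEVICE_FUNCTION;
  /* Assumption of this sketch: outside Generic mode, or when not shared
     with threads of a nested parallel region, regular stack memory is used. */
  if (!generic || !a->used_in_nested_parallel)
    return false;
  /* Privatized uses get per-thread copies, so they need no shared memory. */
  return !a->only_private_uses;
}
```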
[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint
This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.
https://github.com/llvm/llvm-project/pull/192910
[clang] fix profiling of pack index expressions (#192810)
This replaces a few incorrect calls of VisitExpr on subcomponents, which
should have been plain `Visit` instead, because the former just
implements the commonality between all kind-specific profile functions
(marking the class kind and visiting children).
For example, this would visit a DeclRefExpr but not actually profile
any of its properties, like the parameter declaration, so it would fail
to distinguish between DeclRefExprs referencing distinct entities.
This also adds a call to record the PackIndexExpr's kind in the profile,
to avoid false positives when comparing expressions with different
kinds.
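The effect of visiting subcomponents through the common part instead of the full dispatch can be shown with a toy profiler that appends integers to a buffer. In this hypothetical C model, `profile_common` records only the node kind and recurses into children (roughly the role the description gives `VisitExpr`), while `profile` also records kind-specific properties such as the referenced declaration (the role of `Visit`). None of these names correspond to clang's `StmtProfiler` API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum expr_kind { EXPR_DECL_REF, EXPR_PACK_INDEX };

struct expr {
  enum expr_kind kind;
  int decl_id; /* referenced declaration, EXPR_DECL_REF only */
  const struct expr *children[2];
  size_t num_children;
};

#define PROFILE_CAP 32
struct profile_buf {
  int data[PROFILE_CAP];
  size_t len;
};

static void profile_add(struct profile_buf *p, int v) {
  if (p->len < PROFILE_CAP)
    p->data[p->len++] = v;
}

void profile(struct profile_buf *p, const struct expr *e);

/* Common part only: record the kind and visit children. */
void profile_common(struct profile_buf *p, const struct expr *e) {
  profile_add(p, (int)e->kind);
  for (size_t i = 0; i < e->num_children; ++i)
    profile(p, e->children[i]);
}

/* Full dispatch: common part plus kind-specific properties. */
void profile(struct profile_buf *p, const struct expr *e) {
  profile_common(p, e);
  if (e->kind == EXPR_DECL_REF)
    profile_add(p, e->decl_id);
}

/* Profile a pack-index node, visiting its subexpressions either with the full
   dispatch (fixed) or with only the common part (buggy). */
void profile_pack_index(struct profile_buf *p, const struct expr *e, bool fixed) {
  profile_add(p, (int)e->kind); /* record this node's kind */
  for (size_t i = 0; i < e->num_children; ++i) {
    if (fixed)
      profile(p, e->children[i]);
    else
      profile_common(p, e->children[i]);
  }
}

bool same_profile(const struct profile_buf *a, const struct profile_buf *b) {
  return a->len == b->len &&
         memcmp(a->data, b->data, a->len * sizeof(int)) == 0;
}
```

With the buggy path, two pack-index expressions whose subexpressions reference different declarations produce identical profiles; with the fixed path, the declaration IDs make them differ.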
[lldb] Implement delayed breakpoints
This patch changes the Process class so that it delays *physically*
enabling/disabling breakpoints until the process is about to resume,
detach, or be destroyed, potentially reducing the number of packets
transmitted by batching all breakpoints together.
Most classes only need to know whether a breakpoint is "logically"
enabled, as opposed to "physically" enabled (i.e. the remote server has
actually enabled the breakpoint). However, lower level classes like
derived Process classes, or StopInfo may actually need to know whether
the breakpoint was physically enabled. As such, this commit also adds an
`IsPhysicallyEnabled` API.
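The split between logical and physical state, and the batching on resume, can be modelled in C. This is a hypothetical sketch: `breakpoint`, `set_enabled`, `is_physically_enabled`, and `flush_breakpoints` are illustrative names, and the step where the batch would be transmitted (via the MultiBreakpoint packet in the related ProcessGDBRemote change) is only indicated by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

struct breakpoint {
  bool logically_enabled;  /* what the user asked for */
  bool physically_enabled; /* what the remote server has actually applied */
};

/* Enabling or disabling only updates the logical state; nothing is sent yet. */
void set_enabled(struct breakpoint *bp, bool enabled) {
  bp->logically_enabled = enabled;
}

bool is_physically_enabled(const struct breakpoint *bp) {
  return bp->physically_enabled;
}

/* Called before resume/detach/destroy: gather every breakpoint whose physical
   state differs from its logical state into one batch, then mark the batch as
   applied. Returns the number of requests in the batch. */
size_t flush_breakpoints(struct breakpoint *bps, size_t n, size_t *batch) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    if (bps[i].logically_enabled != bps[i].physically_enabled)
      batch[count++] = i;
  /* A real implementation would transmit the whole batch here in one packet. */
  for (size_t k = 0; k < count; ++k)
    bps[batch[k]].physically_enabled = bps[batch[k]].logically_enabled;
  return count;
}
```

A breakpoint that is enabled and then disabled again before the next resume never reaches the batch, which is one way delaying the physical update can save packets.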
https://github.com/llvm/llvm-project/pull/192910
[mlir][acc][flang] Add genCast API to PointerLikeType (#192720)
Introduces a new PointerLikeType API named genCast, which can be used
to generate IR that performs type conversions. It is implemented for
FIR reference types, memref, and LLVM ptr.
[Flang][OpenMP] Add pass to replace allocas with device shared memory
This patch introduces a new OpenMP MLIR pass, only for target device modules,
that identifies `llvm.alloca` operations that should use device shared
memory and replaces them with pairs of `omp.alloc_shared_mem` and
`omp.free_shared_mem` operations.
This works in conjunction with the MLIR to LLVM IR translation pass' handling of
privatization, mapping and reductions in the OpenMP dialect to select the right
memory space for allocations based on where they are made and where they are
used.
This pass, in particular, handles explicit stack allocations in MLIR, whereas
the aforementioned translation pass takes care of implicit ones represented by
entry block arguments.
[clang] fix matching constrained out-of-line definitions of class specialization member function templates (#192806)
The method which gathered the template arguments for transforming
constraints was incorrectly skipping adding the arguments for function
templates which are members of class template specializations.
This fixes that, and removes an undocumented workaround for template
alias CTAD.
Also adds a test case showing that #139276 caused a profiling issue with
PackIndexExprs. For the tests added in that PR, this gave the false
impression that they were fixing the problem, when the change was
actually making the implementation too accepting, which masked the bug
solved in this patch.
[MIR] Always print symbolic INLINEASM operands
We don't need the flag now that all tests are updated to use
symbolic operands.
Remove the update_mir_regclass_numbers script as it shouldn't be
needed anymore.