[flang][acc] Add ACCOptimizeFirstprivateMap pass (#178546)
This pass optimizes acc.firstprivate_map operations generated during
OpenACC recipe materialization when acc.firstprivate is materialized
into the mapping and a private allocation inside region. The
optimization applies to scalar variables of trivial types (integers,
reals, logicals) as long as they are not optional.
The pass hoists loads from the firstprivate variable to before the
compute region, converting the firstprivate copy to a pass-by-value
pattern. This eliminates the need for runtime copying the firstprivate
variable since only its value is needed for initializing private copies.
[OpenMP][MLIR] Modify lowering OpenMP Dialect lowering to support attach mapping
This PR adjusts the LLVM-IR lowering to support the new attach map type that the runtime
uses to link data and pointer together, this swaps the mapping from the older
OMP_MAP_PTR_AND_OBJ map type in most cases and allows slightly more complicated ref_ptr/ptee
and attach semantics.
[HLSL] Make log10, exp2, sinh overload tests stricter NFC (#177495)
This patch updates log10, exp2, sinh overload tests to use -O1 instead
of -disable-llvm-passes; also, the checks are updated to match the
change accordingly.
This work is part of https://github.com/llvm/llvm-project/issues/138016.
[CodeGen] Remove -gc-empty-basic-blocks alias flag (#178716)
This was deprecated earlier and switched to an alias so we would have
some time to update things through our internal compiler release
process. That has completed, so we can now remove the flag upstream.
[AMDGPU][GlobalISel] Add RegBankLegalize rules for buffer store variants (#178488)
Add rules for G_AMDGPU_BUFFER_STORE, G_AMDGPU_BUFFER_STORE_FORMAT,
G_AMDGPU_BUFFER_STORE_FORMAT_D16, and G_AMDGPU_TBUFFER_STORE_FORMAT.
[Offload] Provide a cache file for building OpenMP w/ Flang offloading (#178472)
Summary:
This build is more annoying, enables everything required for this
version of the runtime.
[ProfCheck] Add ExpandIRInsts test to profcheck-xfail.txt
There are in-flight patches to fix the remaining ExpandIRInsts failures
that should fix this. Disable for now to get the bot back to green.
[libc] Add pthread_getattr_np declaration (#178549)
Add header declaration and implementation header for the GNU
extension function pthread_getattr_np. An actual implementation
can come later.
[NFC][UTC] Avoid copy-paste in `update_analyze_test_checks.py` (#178456)
I'm going to extend it to support VPlan dump tests soon and this cleanup
avoids future growth of the `if/elif/elif` chain.
[lldb] Make `print` delegate to synthetic frames.
This patch is more of a proposal in that it's a pretty dramatic change to the way that `print` works. It completely delegates getting values to the frame if the frame is synthetic, and does not redirect at all if the frame fails.
For this patch, the main goal was to allow the synthetic frame to bubble up its own errors in expression evaluation, rather than having errors come back with an extra "could not find identifier <blah>" or worse, simply get swallowed. If there's a better way to handle this, I'm more than happy to change this as long as the core goals of 'delegate variable/value extraction to the synthetic frame', and 'allow the synthetic frame to give back errors that are displayed to the user' can be met.
stack-info: PR: https://github.com/llvm/llvm-project/pull/178602, branch: users/bzcheeseman/stack/7
[lldb] Add support for ScriptedFrame to provide values/variables.
This patch adds plumbing to support the implementations of StackFrame::Get{*}Variable{*} on ScriptedFrame. The major pieces required are:
- A modification to ScriptedFrameInterface, so that we can actually call the python methods.
- A corresponding update to the python implementation to call the python methods.
- An implementation in ScriptedFrame that can get the variable list on construction inside ScriptedFrame::Create, and pass that list into the ScriptedFrame so it can get those values on request.
There is a major caveat, which is that if the values from the python side don't have variables attached, right now, they won't be passed into the scripted frame to be stored in the variable list. Future discussions around adding support for 'extended variables' when printing frame variables may create a reason to change the VariableListSP into a ValueObjectListSP, and generate the VariableListSP on the fly, but that should be addressed at a later time.
This patch also adds tests to the frame provider test suite to prove these changes all plumb together correctly.
stack-info: PR: https://github.com/llvm/llvm-project/pull/178575, branch: users/bzcheeseman/stack/6
[lldb] Add conversions for SBValueList and SBValue to the python bridge.
This patch adds support for:
- PyObject -> SBValueList (which was surprisingly not there before!)
- PyObject -> SBValue
- SBValue -> ValueObjectSP using the ScriptInterpreter
These three are the main remaining plumbing changes necessary before we can get to the meat of actually using ScriptedFrame to provide values to the printer/etc. Future patches build off this change in order to allow ScriptedFrames to provide variables and get values for variable expressions.
stack-info: PR: https://github.com/llvm/llvm-project/pull/178574, branch: users/bzcheeseman/stack/5
[lldb] Move ValueImpl and ValueLocker to ValueObject, NFC.
This patch moves ValueImpl and ValueLocker to ValueObject.{h,cpp}. This follows the example set in TypeImpl/SBType, where we have something that SBType uses internally that needs to be exposed in the layer below. In this case, SBValue uses ValueImpl, which wraps ValueObject. The wrapper helps avoid bugs, so we want to keep it, but the script interpreter needs to use it and said interpreter is conceptually *below* the SB layer...which means we can't use methods on SBValue.
This patch is purely the code motion part of that, future patches will actually make use of this moved code.
stack-info: PR: https://github.com/llvm/llvm-project/pull/178573, branch: users/bzcheeseman/stack/4
[mlir][vector] Add assumeAligned mode to vector.store narrow type emulation (#178565)
The revision adds a new `assumeAligned` mode to the emulation, so
downstream projects can use simple path when it meets the requirements.
E.g., if the offset is always aligned with container's element type, we
can skip the check of front padding sizes.
---------
Signed-off-by: hanhanW <hanhan0912 at gmail.com>
[ELFDebugObjectPlugin] Do not wait for std::future in post-fixup phase in the absent of debug info (#178541)
If there is no debug information, we wouldn't call
`DebugObject::collectTargetAlloc` in the post-allocation phase.
Therefore, when it's in the post-fixup phase,
`DebugObject::awaitTargetMem` will fail with _"std::future_error: No
associated state"_ because the std::future was not even populated.
[libc++] Fix `__split_buffer_size_layout` bugs (#178341)
As `__split_buffer` doesn't have any unit tests, and because #139632
disassociated adding `__split_buffer` support for both pointer-based and
size-based layouts from its `vector` counterpart, libc++ had no way to
expose `__split_buffer_size_layout` bugs until the size-based vector
patch integrated #139632. This commit fixes the two problems that were
identified while working on #155330.
[Github] Make prune branches workflow save branch patches
This patch makes the prune branches workflow create patches for the
branches that it is about to prune containing all their changes. These
are then uploaded as a Github artifact with a 90 day experiation just in
case there was something in a branch that someone will later need and is
not available elsewhere.
Reviewers: petrhosek, vbvictor, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/178539
[SPIRV] Split async copy tests and fix invalid tests
After a spirv-val update, tests that mix spirv32 and spirv64 targets with
the same LLVM IR are now correctly flagged as invalid. The SPIR-V
specification requires that NumElements and Stride operands in
OpGroupAsyncCopy must be 32-bit integers when the addressing model is
Physical32, and 64-bit integers for Physical64.
[mlir][xegpu] Add `XeGPUSgToWiDistributeExperimental` pass. (#177492)
Currently XeGPU lowering pipeline uses `XeGPUSubgroupDistribute` pass to
subgroup to work item distribution of ops. This pass is well established
and relies on vector distribution's `WarpOp` based distribution
mechanism. However, recent experiments with larger kernels have shown
that this pass is very expensive in terms of compile time (see below).
This prompted us to create a new pass that does not rely on `WarpOp`
based distribution. This PR adds the initial infra to move away from the
old way and align Wg To WI distribution with Wg to Sg distribution. New
pass also uses context-aware type conversion based on XeGPU layouts to
distributed vector types from SG to WI.
This PR adds the following changes:
* SG to WI distribution pass based on context-aware type conversions
using `OpConversionPatterns`
* Test pass for testing individual patterns
(`TestXeGPUSgToWiDistributeExperimental`)
[32 lines not shown]
[mlir][GPU|NVVM] Update the default SM to 7.5 (#177469)
Update MLIR's default SM to `sm_75`. This matches the behavior of
offline compilation tools in the CUDA Toolkit (`nvcc`, `ptxas`, ...) and
follows suit with 9fc5fd0ad689eed94f65b1d6d10f9c5642935e68.
Additionally, `sm_75` is the oldest GPU variant compatible with the
widest range of recent major CUDA Toolkit versions (11/12/13).
[MLIR] convert OpAsmDialectInterface using ODS (#171488)
This PR converts OpAsmDialectInterface using ODS.
It also introduces a new Interface Method class `InterfaceMethodDeclaration` which will declare the function without definition.
[mlir][spirv] Enforce `GroupNonUniformQuadSwap` direction values using an attribute (#178684)
The direction can only take one of the three values {0, 1, 2} so we use
a SPIR-V attribute to enforce it. This property cannot be enforced when
the direction is a constant value as the verifier cannot test for
non-local properties.