[Clang] Make gpuintrin out of range grid dimension accessors match OpenCL (#174605)
Summary:
Currently these return an unreachable / invalid value if used out of
range. This PR changes this to match the OpenCL behavior to both give it
a defined value and make it easier to use in those contexts.
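For illustration only, a minimal C++ sketch of the OpenCL convention this refers to: OpenCL's work-item functions return 1 for sizes and counts and 0 for IDs when the dimension index is out of range. `ToyGrid`, `numBlocks`, and `blockId` are hypothetical names, not the gpuintrin.h accessors, and the sketch assumes a three-dimensional grid.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical grid description; not the real gpuintrin.h interface.
struct ToyGrid {
  uint32_t numBlocks[3];
  uint32_t blockId[3];
};

// Counts fall back to 1 for an out-of-range dimension, as in OpenCL.
inline uint32_t numBlocks(const ToyGrid &g, int dim) {
  return (dim >= 0 && dim < 3) ? g.numBlocks[dim] : 1u;
}

// IDs fall back to 0 for an out-of-range dimension, as in OpenCL.
inline uint32_t blockId(const ToyGrid &g, int dim) {
  return (dim >= 0 && dim < 3) ? g.blockId[dim] : 0u;
}

// Small usage example.
inline void demo() {
  ToyGrid g{{4, 2, 1}, {3, 1, 0}};
  assert(numBlocks(g, 0) == 4 && numBlocks(g, 3) == 1);
  assert(blockId(g, 0) == 3 && blockId(g, 3) == 0);
}
```

With these defaults, a loop that multiplies sizes or sums IDs over extra dimensions still produces a well-defined result.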
[CodeGen] Consider imm offsets when sorting framerefs (#171012)
The LocalStackSlotAllocation pass disallows negative offsets with respect to
a base register, so it ends up introducing a new register for such
frame references. This patch makes LocalStackSlotAllocation additionally
consider the immediate offset of an instruction when sorting frame refs,
which avoids negative offsets and maximizes reuse of the existing
registers.
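A rough sketch of the sorting idea, not the pass's actual code: if references are ordered by slot offset plus immediate offset, a base placed at the first reference leaves every other reference at a non-negative delta. `FrameRef`, `sortFrameRefs`, and `deltasFromFirst` are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a frame reference: the local offset of the
// stack slot plus the immediate offset encoded in the instruction.
struct FrameRef {
  int64_t slotOffset;
  int64_t immOffset;
  int64_t effective() const { return slotOffset + immOffset; }
};

// Sort by effective offset (slot + immediate) rather than slot alone.
inline void sortFrameRefs(std::vector<FrameRef> &refs) {
  std::stable_sort(refs.begin(), refs.end(),
                   [](const FrameRef &a, const FrameRef &b) {
                     return a.effective() < b.effective();
                   });
}

// Offset of each reference relative to a base placed at the first one.
inline std::vector<int64_t> deltasFromFirst(const std::vector<FrameRef> &refs) {
  assert(!refs.empty() && "need at least one frame reference");
  std::vector<int64_t> out;
  for (const FrameRef &r : refs)
    out.push_back(r.effective() - refs.front().effective());
  return out;
}
```

For the references `{16, 0}`, `{0, 8}`, `{16, -12}`, sorting by slot offset alone would put the base at effective offset 8 and leave the last reference at delta -4. Sorting by effective offset puts the base at 4, so no delta is negative.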
[mlir][Python] fix namespace shadowing on MSVC (#175077)
If you set `MLIR_PYTHON_BINDINGS_DOMAIN=mlir`, you get namespace nesting
like `mlir::python::mlir` and then `mlir::Twine` shadows `llvm::Twine`
(but only on MSVC). So prefix with `::llvm` to have the correct root
namespace.
Co-authored-by: Abhishek Varma <abhvarma at amd.com>
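The MSVC-specific behavior cannot be reproduced in portable C++, but the general hazard of a nested namespace that reuses an outer name, and why a leading `::` is robust against it, can be sketched as follows. The `which()` functions are hypothetical markers added for the example; only the namespace nesting mirrors the description above.

```cpp
#include <cassert>
#include <string>

namespace llvm {
inline std::string which() { return "::llvm"; }
} // namespace llvm

namespace mlir {
inline std::string which() { return "::mlir"; }
namespace python {
namespace mlir { // nested domain namespace, giving mlir::python::mlir
inline std::string which() { return "::mlir::python::mlir"; }

// Looking up `mlir` from here finds the innermost namespace with that
// name, which is the nested one rather than the global one.
inline std::string relativeLookup() { return mlir::which(); }

// A leading `::` anchors the lookup at the global namespace.
inline std::string rootedMlir() { return ::mlir::which(); }
inline std::string rootedLlvm() { return ::llvm::which(); }
} // namespace mlir
} // namespace python
} // namespace mlir

// Small usage example.
inline void demo() {
  assert(mlir::python::mlir::relativeLookup() == "::mlir::python::mlir");
  assert(mlir::python::mlir::rootedLlvm() == "::llvm");
}
```

Rooting the name with `::` removes any dependence on which enclosing namespaces happen to be in scope.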
[clang][driver][darwin] Report bad SDKSettings as a fatal error rather than unreachable (#175073)
A fatal error is more appropriate than unreachable when the SDKSettings is
not in a recognized form (encountered in a few tests with incomplete
SDKSettings.json).
[SFrame][Retry] Add assembler option --gsframe (#165806)
This plumbs the option --gsframe through the various levels needed to
support it in the assembler.
This is the final step in assembler-level sframe support for x86. With
it in place, clang produces sframe-sections that successfully link with
gnu-ld.
LLD support is pending some discussion.
The previous PR (https://github.com/llvm/llvm-project/pull/165322) had a
bad merge, but the only review comments were the two below. Both are done.
1. Fix some stray formatting.
2. Add tests that:
   - the option is passed on to cc1
   - the correct error is emitted when an unsupported platform is used
[4 lines not shown]
[CodeGen] add RuntimeLibraryInfoWrapper pass to addPassesToEmitMC (#174682)
Register RuntimeLibraryInfoWrapper with the pass manager, following the
change in 04c81a99735c, so that codegen works correctly in JIT compilers
that use ORC JIT.
In our downstream target, memcpy was lowered to a loop because
RuntimeLibraryInfo was missing.
[ORC] Add JITDylibDefunct Error. (#174923)
This Error can be returned from operations on JITDylibs that cannot
proceed as the target JITDylib has been closed.
This patch uses the new error to replace an unsafe assertion in
JITDylib::define: if a JITDylib::define operation is run by an in-flight
task after the target JITDylib is closed, it should return an error rather
than assert.
See also https://github.com/llvm/llvm-project/issues/174922
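A simplified sketch of the error-instead-of-assert pattern described above. It does not use the ORC API or `llvm::Error`; `ToyDylib` and its string-based error are assumptions made to keep the example self-contained (C++17).

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a JITDylib; not the ORC API.
class ToyDylib {
public:
  void close() { open_ = false; }

  // Returns an error message on failure and std::nullopt on success.
  std::optional<std::string> define(const std::string &symbol) {
    // A late caller gets an error instead of tripping an assertion.
    if (!open_)
      return std::string("dylib is defunct");
    if (!symbols_.insert(symbol).second)
      return "duplicate definition of " + symbol;
    return std::nullopt;
  }

private:
  bool open_ = true;
  std::set<std::string> symbols_;
};

// Small usage example: a define that arrives after close() fails cleanly.
inline void demo() {
  ToyDylib d;
  assert(!d.define("foo").has_value());
  d.close();
  assert(d.define("bar").has_value());
}
```

Returning an error lets a task that was already in flight when the dylib was closed unwind normally, whereas an assertion would abort in debug builds and be skipped in release builds.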
www/nginx-vts-exporter: deprecate and expire
The hnlq715/nginx-vts-exporter upstream has been unmaintained for years.
An actively maintained replacement is available as www/nginx-prometheus-exporter.
Sponsored by: Netzkommune GmbH
[SelectionDAG] Unify ISD::LOAD handling in ComputeNumSignBits. NFC (#175060)
Range metadata was handled in an ISD::LOAD case in the main opcode
switch. Extending loads and constant pools were handled with special
code after the main switch. Move this code into the ISD::LOAD case of
the main switch.
There is one slight change here: I put the Op.getResNo() == 0 check
before the range handling. This should be more correct.
[lldb] Keep the unexpected b/p state for suspended threads (#174264)
This fixes stepping out for a case when two threads reach the
stepping-out target breakpoint simultaneously, but a concurrent thread
executes the breakpoint first. The issue affects platforms with software
breakpoints. The scenario is as follows:
* The `step-out` command is executed for thread `A`.
* `ThreadPlanStepOut` creates a breakpoint at the target location.
* All threads are resumed, because the `step-out` command does not
suspend other threads.
* Threads `A` and `B` reach the stepping-out address at the same time,
but `B` executes the breakpoint instruction first.
* `SetThreadStoppedAtUnexecutedBP()` is called for thread `A`, and
`SetThreadHitBreakpointSite()` is called for thread `B`.
* Thread `B` has no plans to stop at this location, so
`ThreadPlanStepOverBreakpoint` is scheduled.
* The plan disables the breakpoint and resumes thread `B` with
`eStateStepping`; for thread `A`, `ShouldResume(eStateSuspended)` is
[8 lines not shown]
[OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946)
Update the device information provided by the Level Zero plugin so that it
is more consistent with other plugins.
[LoopFusion] Non-loop block must be the immediate successor of exit (#175034)
Loop fusion assumes the non-loop block of a guarded adjacent loop is the
immediate successor of its exit block. This patch ensures that this
condition holds and fixes the crash in #166356.
[AMDGPU] Optimize block count calculations to the new ABI (#174112)
Summary:
We already have a way to get the block count using the old grid size
lookup and dividing it by the number of threads. We did not want to make
a new intrinsic to do the same thing, so this optimization pattern-matches
that usage and automatically rewrites it to the new form. This
should improve performance of old kernels by converting branches into a
simple index lookup and removing the division.
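The arithmetic behind the rewrite can be sketched as below. This is not the AMDGPU pass or the real implicit-argument layout; `ToyLaunch` and its field names are hypothetical, and the sketch does not model the branches mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical launch description; field names are illustrative only.
struct ToyLaunch {
  uint32_t gridSize[3];   // total work-items per dimension
  uint32_t blockSize[3];  // work-items per block
  uint32_t blockCount[3]; // blocks per dimension, stored directly
};

// Old-style computation: derive the block count with a division.
inline uint32_t blockCountByDivision(const ToyLaunch &l, unsigned dim) {
  assert(dim < 3 && "dimension out of range");
  return l.gridSize[dim] / l.blockSize[dim];
}

// New-style computation: a plain indexed load.
inline uint32_t blockCountByLookup(const ToyLaunch &l, unsigned dim) {
  assert(dim < 3 && "dimension out of range");
  return l.blockCount[dim];
}
```

When the grid size is an exact multiple of the block size, both functions return the same value, which is what allows the division form to be replaced by the lookup.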
[SPIR-V] Add clang builtin for group-wide barrier (#175064)
Summary:
This adds a clang builtin for the existing group sync. I was considering
instead exposing a raw barrier operation and chaining it with a
`__scoped_atomic_thread_fence` but this seemed simpler. Right now this
implies a sequentially consistent memory fence. These semantics should
already match what CUDA's `__syncthreads` implies. I'm unsure
whether there's a situation where we'd need more control. If we want more
control we'd probably just want to match it up with the scoped atomic
scopes.
[SPIR-V] Add clang builtin for subgroup shuffles (#174655)
Summary:
This is an attempt to begin filling out some missing pieces to allow
more generic compute code to use SPIR-V flavored builtins. This should
provide the basic shuffle operation. The next most important one is the
ballot, but I don't think we have an IR intrinsic for that yet.
I don't know SPIR-V very well so let me know if this is the proper
function with the proper semantic checks.
[SampleProf] Handle coro wrapper function name canonicalization (#174881)
Fix an issue where `FunctionSamples::getCanonicalFnName` incorrectly
canonicalizes coro await suspend wrapper functions so that their names
collide with the coro function itself. This causes sample annotation to
skip the coro function. Canonicalization strips everything that comes
after the first dot (.), unless the function attribute
"sample-profile-suffix-elision-policy" is set to "selected", in which
case it strips only after the known suffixes. The wrapper function name has
the suffix ".__await_suspend_wrapper__" + await_kind. Add the
attribute to the wrapper function so that the suffix is not stripped.
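A simplified sketch of the two policies described above, not the actual `getCanonicalFnName` implementation. The known-suffix list is passed in as a parameter; the `.llvm.` entry, the function name `foo`, and the `init` await kind used in the example are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ElisionPolicy { All, Selected };

// Under All, strip everything from the first dot. Under Selected, strip
// only from the last occurrence of each suffix in knownSuffixes.
inline std::string canonicalize(const std::string &name, ElisionPolicy policy,
                                const std::vector<std::string> &knownSuffixes) {
  if (policy == ElisionPolicy::All)
    return name.substr(0, name.find('.'));
  std::string result = name;
  for (const std::string &suffix : knownSuffixes) {
    std::string::size_type pos = result.rfind(suffix);
    if (pos != std::string::npos)
      result.erase(pos);
  }
  return result;
}

// Small usage example with a hypothetical wrapper name.
inline void demo() {
  const std::vector<std::string> known = {".llvm."};
  const std::string wrapper = "foo.__await_suspend_wrapper__init";
  // Default policy: the wrapper collapses onto the coro function's name.
  assert(canonicalize(wrapper, ElisionPolicy::All, known) == "foo");
  // Selected policy: the wrapper keeps its own distinct name.
  assert(canonicalize(wrapper, ElisionPolicy::Selected, known) == wrapper);
}
```

Under the default policy both `foo` and its wrapper map to `foo`, which is the collision described above. Under the selected policy the wrapper suffix is not in the known list, so the name is left intact.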