[DirectX] Don't byte-swap returned byte-offset (#155860)
The returned byte offset from `rewriteOffsetToCurrentByte` should not
be byte-swapped, because its uses compare and interpret it as a
`uint32_t`.
This commit corrects build failures that hit an assert on big-endian
builds.
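To illustrate why a byte-swapped offset breaks integer comparisons, here is a minimal Python sketch. The helper `byteswap32` and the values `offset` and `limit` are hypothetical and are not taken from `rewriteOffsetToCurrentByte`; the sketch only shows that a swapped value is no longer the same number once it is read as a `uint32_t`.

```python
import struct

def byteswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

offset = 16   # hypothetical byte offset that callers treat as a plain uint32_t
limit = 64    # hypothetical bound the offset is compared against

swapped = byteswap32(offset)

# The plain offset passes the range check; the swapped one does not,
# because swapping turned 0x00000010 into 0x10000000.
in_range_plain = offset < limit
in_range_swapped = swapped < limit
```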
libclc: CMake: include GetClangResourceDir (#155836)
`get_clang_resource_dir` is not guaranteed to be available. Make sure
it is by including `GetClangResourceDir`.
[lit] Refactor available `ptxas` features (#154439)
ToT `lit` currently assumes that a given `ptxas` version supports all
capabilities of prior `ptxas` releases. This approach was flexible
enough to support the removal of 32-bit address compilation from `ptxas`
in CUDA 12.1, but it struggles with the removal of Volta and prior
compilation in CUDA 13.0.
To deal with this, this PR refactors how `lit` defines the set of
features available for a given `ptxas` version. It invokes `ptxas` not
just to get its version, but also to get the list of supported SMs,
supported PTX ISA versions, and support for 32-bit compilation.
This approach should be flexible enough to deal with the changing
support matrix of `ptxas` as it goes forward. One obvious downside is
that this relies on parsing the `stdout` of `ptxas`, something that's
inherently unstable. But, IMO, this is something that we can fix as
needed.
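The idea of deriving features from what the tool reports, rather than from its version number, can be sketched as follows. This is only an illustration: the sample text, the helper names `parse_sm_list` and `features_for`, and the `ptxas-sm_NN` feature spelling are assumptions, and the real `ptxas` output and `lit` feature names may differ.

```python
import re

def parse_sm_list(text: str) -> list[int]:
    """Collect the distinct sm_NN architectures mentioned in tool output."""
    return sorted({int(m) for m in re.findall(r"\bsm_(\d+)\b", text)})

def features_for(text: str) -> set[str]:
    """Turn each reported SM into a (hypothetical) lit feature name."""
    return {f"ptxas-sm_{sm}" for sm in parse_sm_list(text)}

# Made-up sample text; real ptxas output may be laid out differently.
SAMPLE = "Allowed values: 'sm_75','sm_80','sm_90', 'sm_80'"

supported = parse_sm_list(SAMPLE)
features = features_for(SAMPLE)
```

Because the feature set comes from the parsed list, an SM that a newer release drops simply never appears, with no version comparison involved.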
[MLIR][Vector] Add warp distribution for `vector.step` op (#155425)
This PR adds a distribution pattern for
[`vector.step`](https://mlir.llvm.org/docs/Dialects/Vector/#vectorstep-vectorstepop)
op.
The result of the step op is a vector containing the sequence
`[0,1,...,N-1]`. For the warp distribution, we consider a vector with
`N == warp_size` (think SIMD). Distributing it to SIMT means that each
lane is represented by a thread/lane id scalar.
More complex cases with the support for warp size multiples (e.g.,
`[0,1,...,2*N-1]`) require additional layout information to be handled
properly. Such support may be added later.
The lane id scalar is wrapped into a `vector<1xindex>` to emulate the
sequence distribution result.
Other than that, the distribution is similar to that of
`arith.constant`.
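The intended equivalence can be modelled in a few lines of Python. This is a sketch of the semantics described above, not of the MLIR pattern itself; `vector_step`, `distributed_step`, and the warp size of 32 are illustrative assumptions.

```python
def vector_step(n: int) -> list[int]:
    """Reference semantics of the step op: [0, 1, ..., n-1]."""
    return list(range(n))

def distributed_step(lane_id: int) -> list[int]:
    """Per-lane result: the lane id wrapped in a one-element vector."""
    return [lane_id]

WARP_SIZE = 32  # assumed warp size, for illustration only

# Gathering every lane's one-element vector reproduces the undistributed result.
gathered = [x for lane in range(WARP_SIZE) for x in distributed_step(lane)]
```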
Provide ErrorBadParamsToCopyContiguousContainerAnnotations a more correct 'reason' when constructing ErrorBase (#139870)
Co-authored-by: Tacet <advenam.tacet at gmail.com>
[flang][OpenMP] Reassociate floating-point ATOMIC update expressions (#155840)
This is a follow-up to PR153488. This time the reassociation is enabled
for floating-point expressions, but only when associative-math is
enabled in the language options. This can be done via -ffast-math on the
command line.
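A small Python example shows why floating-point reassociation has to be opt-in. It assumes the reassociation regroups an update such as `x = x + a + b` into `x = x + (a + b)`; the values are chosen only to make the rounding difference visible.

```python
x, a, b = 1.0, 1e20, -1e20

# Left-to-right evaluation: 1.0 is absorbed when added to 1e20.
left_to_right = (x + a) + b

# Regrouped evaluation: a and b cancel first, so 1.0 survives.
reassociated = x + (a + b)
```

The two groupings give different results, so the regrouping is only valid when the user has allowed floating-point addition to be treated as associative.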
[CI] Support using blob prefix for downloading
So that uploading and downloading use the same file paths. Also refactor
everything so that _get_blob_prefix is the common implementation.
[HLSL][DirectX] Remove uniformity bit from resource initialization intrinsics (#155332)
Removes the uniformity bit from the resource initialization intrinsics `llvm.{dx|spv}.resource.handlefrombinding` and `llvm.{dx|spv}.resource.handlefromimplicitbinding`. The flag is currently always set to `false`. It should be derived from resource analysis and not provided by codegen.
Closes #135452
[HLSL] Add static methods for resource initialization and a constructor from handle
Adds static methods `__createFromBinding` and `__createFromImplicitBinding`
to resource classes. These methods will be used for resource initialization
instead of the resource constructors that take binding information.
Also adds a private resource constructor that takes an initialized resource handle.
This constructor will be called from the static create methods.
[flang][OpenMP] Replace OpenMPBlockConstruct with OmpBlockConstruct
OpenMPBlockConstruct, somewhat confusingly, represents most but not all
block-associated constructs. It's derived from OmpBlockConstruct, as are
all the remaining block-associated constructs.
It does not correspond to any well-defined group of constructs. It's the
collection of constructs that don't have their own types (and those that
do have their own types do so for their own reasons).
Using the broader OmpBlockConstruct in type-based visitors won't cause
issues, because the specific overloads (for classes derived from it) will
always be preferred.
[CI] Prefix lit timing blobs with platform
Windows and Linux might have vastly different test runtimes. We also end
up with significantly fewer files on Windows and do not want to incur
the overhead of unpacking/repacking a bunch of files that we never end
up using.
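A per-platform prefix shared by upload and download could look roughly like the sketch below. All names here (`blob_prefix_for`, `blob_name`, the `lit_timing_` layout, the `.timing` suffix) are hypothetical and are not the CI scripts' actual implementation.

```python
def blob_prefix_for(system: str) -> str:
    """Return a per-platform prefix for timing blobs (hypothetical layout)."""
    return f"lit_timing_{system.lower()}/"

def blob_name(system: str, test_suite: str) -> str:
    """Build a blob path; upload and download would both go through this."""
    return blob_prefix_for(system) + test_suite + ".timing"

linux_blob = blob_name("Linux", "llvm")
windows_blob = blob_name("Windows", "llvm")
```

With this shape, a Windows job only lists and unpacks blobs under its own prefix and never touches the Linux ones.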
[LTO] Enhance unified/nonunified LTO checks. (#148229)
For the PS targets, the unified LTO pipeline is the default behaviour when enabling LTO. Other targets use the distinct LTO pipeline as their default behaviour. Unified/nonunified LTO checks are enhanced in this PR:
1. Check that the default, unified, and non-unified behaviour for asan-unified-lto.ll all match (irrespective of what the default unified/nonunified behaviour is).
2. Check that the difference in behaviour between 'unified' and 'full' runs (regarding the passes) is what is expected, rather than simply that they are different.
[HLSL] Reorder the arguments of handle initialization builtins
Reorder the arguments of handle initialization builtins to match the order of the
llvm intrinsics, and also to match the arguments on the static create methods for
resources (coming soon).
[TargetLoweringObjectFile] Handle riscv BE (#155166)
Add DWARF exception handling support for riscv big-endian targets.
More CodeGen changes related to riscvbe are coming.