[libclc] Add OpenCL atomic_*_explicit builtins (#168318)
Implement atomic_*_explicit (e.g. atomic_store_explicit) with
memory_order plus optional memory_scope.
OpenCL memory_order maps 1:1 to Clang (e.g. OpenCL memory_order_relaxed
== Clang __ATOMIC_RELAXED), so we pass it unchanged to the clc_atomic_*
functions, which forward to Clang's __scoped_atomic_* builtins.
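The snippet below is a rough C11 analog of that 1:1 mapping. It is not libclc code. It assumes Clang's or GCC's `<stdatomic.h>`, where each `memory_order` enumerator is defined as the matching `__ATOMIC_*` macro. `store_explicit_sketch` is a hypothetical wrapper that passes the caller's ordering through with no translation step. It uses the unscoped `__atomic_store_n` builtin rather than a scoped one, so it models only the ordering half of the change.

```c
#include <stdatomic.h>

/* Assumes Clang's or GCC's <stdatomic.h>, where each enumerator is defined
   as the corresponding __ATOMIC_* macro (OpenCL C has no consume order,
   so it is omitted here). */
_Static_assert(memory_order_relaxed == __ATOMIC_RELAXED, "relaxed");
_Static_assert(memory_order_acquire == __ATOMIC_ACQUIRE, "acquire");
_Static_assert(memory_order_release == __ATOMIC_RELEASE, "release");
_Static_assert(memory_order_acq_rel == __ATOMIC_ACQ_REL, "acq_rel");
_Static_assert(memory_order_seq_cst == __ATOMIC_SEQ_CST, "seq_cst");

/* Hypothetical wrapper: the caller's ordering is handed to the builtin
   verbatim, with no lookup table. Valid orders for a store are relaxed,
   release and seq_cst. */
void store_explicit_sketch(int *p, int v, memory_order order) {
  __atomic_store_n(p, v, (int)order);
}
```

Because the enumerator values already coincide, forwarding requires nothing more than a cast.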
Other changes:
* Add __opencl_get_clang_memory_scope helper in opencl/utils.h (OpenCL
scope -> Clang scope).
* Correct atomic_compare_exchange return type to bool.
* Fix atomic_compare_exchange to return true when the value stored at
the pointer equals the expected value.
* Remove volatile from CLC functions so that volatile does not appear in
the generated LLVM IR.
* Add the '-fdeclare-opencl-builtins -finclude-default-header' flags to
include
[3 lines not shown]
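The compare-exchange contract that the fix above restores can be sketched in plain C. This uses C11 `<stdatomic.h>` and is not libclc code; `cas_reference` and `cas_c11` are hypothetical names. The call returns `true` exactly when the current value equals `*expected`, and in that case it stores `desired`. On failure, `*expected` receives the value that was actually observed. OpenCL C 2.0's `atomic_compare_exchange_strong_explicit` follows the same contract.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sequential reference semantics of compare-exchange (atomicity ignored):
   returns true exactly when *obj held the expected value and was updated. */
bool cas_reference(int *obj, int *expected, int desired) {
  if (*obj == *expected) {
    *obj = desired;
    return true;
  }
  *expected = *obj; /* on failure, report the value actually observed */
  return false;
}

/* The same contract through C11; the failure order (acquire) must not be
   stronger than the success order (acq_rel). */
bool cas_c11(atomic_int *obj, int *expected, int desired) {
  return atomic_compare_exchange_strong_explicit(
      obj, expected, desired, memory_order_acq_rel, memory_order_acquire);
}
```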
Add a `breakpoint add` command to fix the option-madness that is `breakpoint set` (#156067)
Someone came up with a clever idea for a new breakpoint type, but we
couldn't figure out how to add it in an ergonomic way because
`breakpoint set` has used up all the short-option characters. And even
if they did find a way to add it, the help for `break set` is so
confusing, because of the way it is implemented, that very few people
would notice the addition.
The basic problem is that `break set` distinguishes amongst the
fundamental breakpoint types it offers by which "required option" you
provide. If you pass `-a`, you are setting an address breakpoint; if
`-n`, `-F`, etc., a symbol-name-based breakpoint; and so forth. However,
that is hard to discern from the option groupings printed by
`help break set`. `break set` also suffers from the problem that it uses
common options in different ways depending on which "required" option is
present, which makes documenting the various behaviors difficult. And as
we run out of single letters, extending it becomes difficult to
impossible.
[130 lines not shown]
build_llvm_release.bat: Use absolute path when building the tarball (#169951)
The rest of the script uses an absolute path for the llvm source
directory too.
[Clang][OpenCL] Add support for the cl_intel_subgroup_buffer_prefetch (#170532)
The commit adds support for the cl_intel_subgroup_buffer_prefetch OpenCL
extension. The extension introduces new built-in functions that prefetch
data from global memory into caches as a subgroup-level operation.
The extension is defined here:
https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroup_buffer_prefetch.html
---------
Co-authored-by: Mészáros Gergely <maetveis at gmail.com>
Revert "[WebAssembly] Implement addrspacecast to funcref" (#170785)
Reverts llvm/llvm-project#166820
There was a failure in the ENABLE_EXPENSIVE_CHECKS configuration.
[clang-doc] Remove uses of consumeError (#168759)
In BitcodeReader, we were using consumeError(), which silently drops the
error and hides it from callers. To avoid that, we slightly tweak the API
to return an Expected<T> and propagate the error accordingly.
[WebAssembly] Implement addrspacecast to funcref (#166820)
Adds lowering of `addrspacecast [0 -> 20]` to allow easy conversion of
function pointers to Wasm `funcref`.
When given a constant function pointer, it lowers to a direct
`ref.func`. Otherwise it lowers to a `table.get` from
`__indirect_function_table` using the provided pointer as the index.
Reland Refactor WIDE_READ to allow finer control over high-performance function selection (#165613) (#170738)
[Previous commit had an incorrect default case when
FIND_FIRST_CHARACTER_WIDE_READ_IMPL was not specified in config.json.
This PR is identical to that one with one line fixed.]
As we implement more high-performance string-related functions, we have
found a need for better control over their selection than the big-hammer
LIBC_CONF_STRING_LENGTH_WIDE_READ. For example, I have a memchr
implementation coming, and unless I implement it in every variant, a
simple binary value doesn't work.
This PR gives finer-grained control over high-performance
functions than the generic LIBC_CONF_UNSAFE_WIDE_READ option. For any
function they like, the user can now select one of four implementations
at build time:
1. element, which reads byte-by-byte (or wchar by wchar)
2. wide, which reads by unsigned long
[11 lines not shown]
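The first two strategies can be compared concretely with a string-length scan. The standalone C sketch below is not LLVM libc's implementation, and `strlen_element`, `strlen_wide` and `has_zero_byte` are hypothetical names. The element version reads one byte at a time. The wide version steps byte by byte to an aligned address and then reads one `unsigned long` at a time. It detects a terminator with the common "has a zero byte" bit trick. Its final read can extend past the terminator, so this sketch assumes the caller's buffer has trailing padding. That kind of over-read is a plausible reason such strategies are opt-in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element strategy: read one byte at a time. */
size_t strlen_element(const char *s) {
  const char *p = s;
  while (*p != '\0')
    ++p;
  return (size_t)(p - s);
}

/* Nonzero iff some byte of v is zero (classic bit trick). */
static int has_zero_byte(unsigned long v) {
  const unsigned long ones = ~0UL / 0xFF;  /* 0x0101...01 */
  const unsigned long highs = ones * 0x80; /* 0x8080...80 */
  return ((v - ones) & ~v & highs) != 0;
}

/* Wide strategy: byte-step to alignment, then scan word by word.
   The last word read may cover bytes after the terminator; this sketch
   assumes those bytes are inside the caller's (padded) buffer. */
size_t strlen_wide(const char *s) {
  const char *p = s;
  while ((uintptr_t)p % sizeof(unsigned long) != 0) {
    if (*p == '\0')
      return (size_t)(p - s);
    ++p;
  }
  for (;;) {
    unsigned long w;
    memcpy(&w, p, sizeof w); /* aligned word-sized load */
    if (has_zero_byte(w))
      break;
    p += sizeof w;
  }
  while (*p != '\0') /* find the exact terminator inside the last word */
    ++p;
  return (size_t)(p - s);
}
```

A per-function build option, as described above, would then choose which of these variants gets compiled for each function.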
[mlir][acc] Improve verifier messages for device_type duplicates (#170773)
This improves the acc dialect IR verifier messages when duplicate
device_types are found by also noting which device_type is the one
causing the error.
[acc] Add acc.specialized_routine attribute (#170766)
Introduce a new attribute `acc.specialized_routine` to mark functions
that have been specialized from a host function marked with
`acc.routine_info`.
The new attribute captures:
- A SymbolRefAttr referencing the original `acc.routine` operation
- The parallelism level via the new `ParLevel` enum
- The original function name (since specialized functions may be
renamed)
Example - before specialization:
```
acc.routine @routine_gang func(@foo) gang
acc.routine @routine_vector func(@foo) vector
func.func @foo() attributes {
acc.routine_info = #acc.routine_info<[@routine_gang, @routine_vector]>
[26 lines not shown]
[profcheck] Fix missing profile metadata in ExpandMemCmp (#169979)
This patch fixes missing profile metadata in the `ExpandMemCmp` pass
when it expands `memcmp` calls. Without it, branches between the
newly created blocks lose their profile data, potentially leading to
suboptimal code generation.
The patch updates the `ExpandMemCmp` pass to set branch weights to a
default `unknown` (50/50 weights) value when a profile is available. This
prevents the expansion from making a previously profiled branch
unprofiled.
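For intuition only, here is a hand-written C model of the kind of code such an expansion can produce for an equality-only `memcmp(a, b, 16) == 0` on a 64-bit target. The real pass rewrites LLVM IR, and its exact block structure depends on the target and options. `memcmp16_eq_expanded` is a hypothetical name. The early exit corresponds to a conditional branch between expanded blocks, and branches like that one are where the pass would now attach `unknown` weights.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 iff the first 16 bytes of a and b are equal. Models two
   "load and compare" blocks with an early exit to a result block. */
int memcmp16_eq_expanded(const void *a, const void *b) {
  const unsigned char *pa = a, *pb = b;
  uint64_t x, y;

  /* Block 1: compare bytes [0, 8). */
  memcpy(&x, pa, sizeof x);
  memcpy(&y, pb, sizeof y);
  if (x != y) /* conditional branch: mismatch -> result block */
    return 0;

  /* Block 2: compare bytes [8, 16). */
  memcpy(&x, pa + 8, sizeof x);
  memcpy(&y, pb + 8, sizeof y);
  return x == y;
}
```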
The patch also includes updates to the tests to reflect the new branch
weights.
Co-authored-by: Jin Huang <jingold at google.com>