[AArch64][GlobalISel] Update fp legalization mir tests. NFC (#194561)
This updates a number of the floating point mir legalization tests to
use f
types instead of generic s types.
Hash.cpp: include ErrorHandling.h (#194553)
Hash.cpp uses llvm_unreachable but currently picks up ErrorHandling.h
only transitively through xxhash.h -> ArrayRef.h -> Hashing.h.
[LangRef] Specify that syncscopes can affect the monotonic modification order
If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.
So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
[AMDGPUUsage] Specify what one-as syncscopes do (#189016)
This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
[LangRef] Specify that syncscopes can affect the monotonic modification order
If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.
So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
[AMDGPUUsage] Specify what one-as syncscopes do
This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
[LangRef][AMDGPU] Specify that syncscope can cause atomic operations to race (#189015)
Targets should be able to specify that the syncscope of atomic operations
influences whether they participate in data races with each other.
For example, in AMDGPU, we want (and already implement) the load in the
following case to be in a data race (i.e., return `undef` according to the
current definition), because there is an atomic store with workgroup syncscope
executing in a different workgroup:
```
; workgroup 0:
store atomic i32 1, ptr %p syncscope("workgroup") monotonic, align 4
; workgroup 1:
store atomic i32 2, ptr %p syncscope("workgroup") monotonic, align 4
load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4
```
[3 lines not shown]
[libc][math] Refactor fmin family to header-only (#194549)
Refactors the fmin math family to be header-only.
part of: #147386
Target Functions:
- fmin
- fminbf16
- fminf
- fminf128
- fminf16
- fminl
[clang][bytecode] Add a missing fallthrough() call (#194537)
When the local variable is enabled but we don't emit any dtor
instructions, we need to fallthrough to the `EndLabel`.
[mlir][tosa] Make tosa-attach-target optional in addTosaToLinalgPasses (#193467)
addTosaToLinalgPasses unconditionally scheduled tosa-attach-target, thus
adding a `tosa.target_env` attribute to the module. Callers therefore
had no way to opt out of such attribute. This attribute is consumed if
validationOptions is enabled, which is optional. Therefore, if the
caller disables validationOptions, the tosa-attach-target attribute will
exist even after TosaToLinalg. So consumers that don't load the TOSA
dialect can't even parse the resulting module.
This PR makes sure that we schedule tosa-attach-target only when the
caller actually needs it, i.e. when validationOptions or
attachTargetOptions is set. The default values stay inside the
`!attachTargetOptions` branch so callers that set validationOptions
still get a target env to validate against, preserving existing
behavior.
Also add a `validation` pipeline option (default `true`) to the
registered `tosa-to-linalg-pipeline`, so it can be invoked without
scheduling either `tosa-attach-target` / `tosa-validate`. A lit test is
also added.
[clang-tidy] Fix UB in SuspiciousIncludeCheck when IgnoredRegex is not set (#194521)
When the "IgnoredRegex" option is not set, `IgnoredRegexString` is
default-constructed, i.e. initialized with a null data pointer. This is
passed to `llvm_regcomp` as the pattern argument, causing a nullptr+0 UB
in regcomp.c (caught by UBSan). Fix by initializing
`IgnoredRegexString` with an empty string literal instead.
[CodeGen] Make AsmPrinter::MAI a reference. NFC (#194538)
AsmPrinter::MAI is non-null. This is made more explicit after
PR #194523 changed TargetMachine::getMCAsmInfo to return a reference
with recent MCAsmInfo/MCTargetOptions related refactoring.
Convert the member from const MCAsmInfo * to const MCAsmInfo & and
update all consumers.
Fix UNSUPPORTED added to test in #194502. (#194541)
The change in #194502 attempted to mark the test as UNSUPPORTED for
AArch64, but it didn't work because it wasn't specified correctly. This
fixes it.
[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO
With AMDGPU object linking, device functions are compiled separately from the
kernels that call them. Without whole-program visibility, the compiler must be
conservative about occupancy for every device function, leading to suboptimal
resource usage. However, GPU kernels typically carry explicit occupancy control
attributes that constrain the launch environment. ThinLTO is the natural place
to propagate these kernel attributes to callees: the combined module summary
index contains a cross-TU call graph, allowing occupancy information to be
propagated top-down from kernels to all reachable device functions. The backend
can then generate better code with the propagated constraints, achieving
whole-program awareness without the compile-time overhead of full LTO.
This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes
per-function summary data alongside the standard module summary. The block is
scoped to AMDGPU so that non-AMDGPU targets are completely unaffected. A
follow-up patch will add the ThinLTO propagation logic that reads these
summaries and applies conservative attribute bounds to device functions
reachable from multiple kernels.
[DataLayout] Add null pointer value infrastructure
Add support for specifying the null pointer bit representation per address space
in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones
When neither flag is present, the address space inherits the default set by the
new 'N<null-value>' top-level specifier ('Nz' or 'No'). If that is also absent,
the null pointer value is zero.
No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to support
targets with non-zero null pointers (e.g. AMDGPU).
[CIR][AMDGPU] Add lowering for amdgcn div fmas builtins (#194334)
Upstreaming ClangIR PR: https://github.com/llvm/clangir/pull/2051
This PR adds support for lowering of _builtin_amdgcn_div_fmas* amdgpu
builtins to clangIR.
Followed similar lowering from reference clang->llvmir in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.