AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7 (#175795)
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.
I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
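As a rough illustration of the low-half encoding described in this commit (not the actual backend code), the sketch below models a 32-bit register as a `uint32_t` and a bf16/f16 value as its raw 16-bit pattern. The helper names `packHalfBits`, `unpackHalfBits`, and `checkRoundTrip` are hypothetical, and zeroing the high half is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers. A 32-bit register is modeled as uint32_t and an
// f16/bf16 value as its raw 16-bit pattern.
constexpr uint32_t packHalfBits(uint16_t Bits) {
  // Assumption for this sketch: the high 16 bits are simply zero.
  return static_cast<uint32_t>(Bits);
}

constexpr uint16_t unpackHalfBits(uint32_t Reg) {
  // Only the low half carries the value; the high half is ignored.
  return static_cast<uint16_t>(Reg & 0xFFFFu);
}

// Passing the value is a pure bit copy, so even a signaling-NaN pattern
// (IEEE half: exponent all ones, quiet bit clear) arrives unchanged.
inline void checkRoundTrip() {
  const uint16_t SignalingNaN = 0x7D00;
  assert(unpackHalfBits(packHalfBits(SignalingNaN)) == SignalingNaN);
}
```

Because no fpext/fptrunc is involved, the bit pattern the callee sees is exactly the one the caller passed.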
Add persistent option to cache plugin
This commit adds the ability to set cache entries persistently
(they survive middleware restarts and reboots, but not system
upgrades) and to set clustered cache entries (with the same
lifecycle).
In basic benchmarking there wasn't much performance difference
between implementations, so this reduces overall complexity.
Reapply "[CGObjC] Allow clang.arc.attachedcall on -O0 (#164875)" (#177285)
This reverts commit 8eac375a4bff84f0a10a9c8ee23c4da409a805f9.
Only a unit test needed to be updated.
I do not have merge permissions.
[flang][cuda] Allow device descriptor in show_descriptor (#177424)
Descriptors are always in managed memory, so it is safe to call
show_descriptor for them.
[NVPTX] Update the default SM to 7.5 (#176021)
Update NVPTX's default SM to sm_75. This matches the behavior of offline
compilation tools in the CUDA Toolkit (nvcc, ptxas, ...).
[Clang][CIR] Implement CIRGen logic for __builtin_bit_cast (#176782)
NOTE: This PR upstreams code from
https://github.com/llvm/clangir.
This logic was originally implemented by Sirui Mu in
https://github.com/llvm/clangir/pull/762. Further
modifications were made by other ClangIR contributors.
Co-authored-by: Sirui Mu <msrlancern at gmail.com>
[HLSL] Make radians overload tests stricter. NFC (#177252)
This patch updates
`clang/test/CodeGenHLSL/builtins/radians-overloads.hlsl` to use -O1
instead of -disable-llvm-passes, and updates the checks
accordingly.
This work is part of #138016.
[TableGen] Prefer base class on tied RC sizes (#177257)
When searching for a matching subclass, TableGen's behavior is
non-deterministic if several classes have the same size.
Break the tie by choosing a class with smaller BaseClassOrder.
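The tie-break above can be sketched roughly as follows. This is a minimal illustration, not TableGen's actual code: `RegClassInfo` and `pickClass` are hypothetical names, and the only detail taken from the commit is that equally sized candidates are ordered by the smaller BaseClassOrder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a register class record; not TableGen's real type.
struct RegClassInfo {
  std::string Name;
  unsigned SizeInBits;
  unsigned BaseClassOrder; // smaller value wins the tie
};

// Returns the class with the requested size, preferring the smallest
// BaseClassOrder among equally sized candidates. Returns nullptr if no
// class has the requested size.
const RegClassInfo *pickClass(const std::vector<RegClassInfo> &Classes,
                              unsigned WantedSize) {
  const RegClassInfo *Best = nullptr;
  for (const RegClassInfo &RC : Classes) {
    if (RC.SizeInBits != WantedSize)
      continue;
    // Strict comparison: the result no longer depends on which equally
    // sized class happens to be visited last.
    if (!Best || RC.BaseClassOrder < Best->BaseClassOrder)
      Best = &RC;
  }
  return Best;
}
```

With an explicit ordering key, the selected class is the same on every run regardless of iteration order, provided the BaseClassOrder values are distinct.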
[SystemZ] Implement ctor/dtor emission via @@SQINIT and .xtor sections
This patch implements support for constructors/destructors by introducing the
@@SQINIT section and emitting .xtor.<priority> sections within the SystemZ
AsmPrinter and in the GOFF object lowering layer. Improvements to ADA descriptor
handling are also made in this change.
DAG: Remove softPromoteHalfType
Remove the now unimplemented target hook and associated DAG machinery
for the old half legalization path.
Really fixes #97975
[msan] Handle NEON vsli/vsri (vector shift left/right and insert) (#177283)
This does a shift and combine on the two vectors, hence we can precisely
propagate the shadow by applying the intrinsic to the input shadows.
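To make the shadow propagation concrete, here is a scalar sketch of a single 8-bit lane of "shift left and insert". It is an illustration under stated assumptions, not the MemorySanitizer implementation: `sliLane` and `sliLaneShadow` are hypothetical names, the shift amount is assumed to be a known constant in the range 0..7, and a shadow bit of 1 means "uninitialized".

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 8-bit lane of "shift left and insert": the shifted
// source supplies the high bits, and the low Shift bits of the destination
// operand are preserved. Assumes 0 <= Shift <= 7.
uint8_t sliLane(uint8_t Dest, uint8_t Src, unsigned Shift) {
  assert(Shift <= 7 && "shift amount out of range for an 8-bit lane");
  const uint8_t KeepMask = static_cast<uint8_t>((1u << Shift) - 1u);
  return static_cast<uint8_t>((Src << Shift) | (Dest & KeepMask));
}

// Shadow propagation in this sketch: run the same operation on the shadows,
// so every result bit inherits the shadow of the input bit it came from.
uint8_t sliLaneShadow(uint8_t DestShadow, uint8_t SrcShadow, unsigned Shift) {
  return sliLane(DestShadow, SrcShadow, Shift);
}
```

For example, with a fully initialized destination, a fully uninitialized source, and a shift of 4, only the high four bits of the result are marked uninitialized.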
mdmfs: Fix soft updates logic
Now that newfs(8) has a command-line argument to disable soft updates,
use that instead of running tunefs(8) after the fact to turn them off.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: mckusick, imp
Differential Revision: https://reviews.freebsd.org/D54783
[libc] Fix character converter in C++20 (#177421)
Internally, the character converter uses char8_t to represent a character.
Before C++20 we provide a typedef for it, but since C++20 it's a
keyword. The keyword version isn't listed in our is_integral, causing
countl to reject it as an invalid type. This patch just casts from
char8_t to int8_t to sidestep the issue, but in future we may want to
add char8_t, char16_t, and char32_t to our is_integral.
[profcheck] Print the function name in the error (#177264)
This is helpful for tests with a lot of test cases, where grepping for
instructions with missing profile data isn't feasible because it doesn't
account for things like vectors, which are exempt.
Tracking issue: #147390
[lld][ELF/MachO] Use `.contains` rather than `.count` for set membership. NFC (#177404)
This matches the usage in the other linker backends.
See #176610, #177067
R600: Remove softPromoteHalfType
Also includes a kind of hacky, minimal change to avoid assertions
when softPromoteHalfType is removed to fix kernel arguments
lowered as f16. Half support was never really implemented
for r600, and there just happened to be a few incidental tests
which included a half argument (which were also not even meaningful,
since the function body just folded to nothing due to no callable
function support).