AMDGPU: Stop implementing shouldCoalesce
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern is whether introducing wider registers makes it more
likely to push register usage up to the next occupancy tier.
[mlir][linalg] Clean up op verifiers without custom checks (NFC) (#168712)
This PR removes op verifiers that do not implement any custom
verification logic.
TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)
Query RuntimeLibcalls for both the support check and the symbol name.
The check that the implementation is exactly __guard_local, rather
than merely not unsupported, feels a bit strange.
add an sshbuf_get_nulterminated_string() function to pull a
\0-terminated string from an sshbuf. Intended to be used to improve
parsing of SOCKS headers for dynamic forwarding.
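A rough usage sketch for the SOCKS case; the signature of
sshbuf_get_nulterminated_string() and the helper name are assumptions
for illustration, not taken from the actual interface:

#include "sshbuf.h"

static int
socks4_get_userid(struct sshbuf *input, char **useridp)
{
        char *userid = NULL;
        size_t len = 0;
        int r;

        /* assumed behaviour: consume bytes up to and including the
         * terminating \0, returning 0 on success or an SSH_ERR_* code
         * (e.g. when no \0 has arrived yet and more data is needed) */
        if ((r = sshbuf_get_nulterminated_string(input, &userid, &len)) != 0)
                return r;
        *useridp = userid;
        return 0;
}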
ok deraadt; feedback Tim van der Molen
[OpenCL] Add missing OpenCL 3.0 features to OpenCLExtensions.def; revert header-only macros (#168016)
Adds the remaining optional feature macros from the OpenCL C 3.0 spec
(section 6.2.1 table). Targets can now enable these via
OpenCLFeaturesMap returned by getSupportedOpenCLOpts().
Revert a84599f177a6 (header-only feature macros).
Header-only macros are difficult to disable on SPIR-V targets,
and the prior undef approach (a60b8f468119) does not scale.
After this PR, they can be disabled via `-cl-ext=-<feature>`.
https://github.com/KhronosGroup/OpenCL-Docs/issues/1328 also notes that
unconditional definition of the header-only macros in opencl-c-base.h
should be removed.
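For context, kernels guard optional functionality on these feature
macros; whether a macro is defined now follows the target's
OpenCLFeaturesMap or an explicit -cl-ext override. A minimal sketch,
using one example feature from the 3.0 set (not code from this PR):

/* With e.g. clang -cl-std=CL3.0
 *   -cl-ext=-__opencl_c_work_group_collective_functions ...
 * the fallback branch is compiled; without the override, the target's
 * feature map decides whether the macro is defined. */
kernel void reduce(global const float *in, global float *out) {
#ifdef __opencl_c_work_group_collective_functions
    float v = work_group_reduce_add(in[get_global_id(0)]);
    if (get_local_id(0) == 0)
        out[get_group_id(0)] = v;
#else
    out[get_global_id(0)] = in[get_global_id(0)];   /* fallback path */
#endif
}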
15.0: RC3 builds are underway
The hopefully-final Release Candidate builds are underway. Unless
any critical issues emerge in the next week, 15.0-RELEASE should
land on schedule on December 2nd.
[Clang] Refactor getOptimizationLevel and getOptimizationLevelSize to non-static. NFC. (#168839)
So that we can reuse these functions in a few places, such as
clang/lib/Driver/ToolChains/CommonArgs.cpp, where part of the code is
currently copied from getOptimizationLevel.
AMDGPU: Fix treating unknown mem operands as uniform
The test changes are mostly GlobalISel-specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with the new
regbankselect.
There is an additional regression for addrspacecast with code object
v4 (cov4). We probably ought to be using a separate PseudoSourceValue
for the queue pointer access.
VectorCombine: Improve the insert/extract fold in the narrowing case
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:
1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.
There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.
commit-id:c151bb04
AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"
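To illustrate that counting perspective, a small standalone model (a
simplification for exposition, not the code in the patch): each
v_perm_b32 assembles one 32-bit register of the result from any four
bytes of two source registers, so the per-dword question is how many
source dwords its bytes come from.

#include <stdio.h>

/* mask[i] = source element index of result element i, or -1 for undef */
static unsigned
shuffle_cost(const int *mask, unsigned num_elts, unsigned elt_bytes)
{
    unsigned per_dword = 4 / elt_bytes;   /* elements per 32-bit register */
    unsigned dst_dwords = (num_elts + per_dword - 1) / per_dword;
    unsigned cost = 0;

    for (unsigned d = 0; d < dst_dwords; d++) {
        /* collect the distinct source dwords feeding this result dword */
        int srcs[4], nsrcs = 0;
        for (unsigned i = d * per_dword;
             i < (d + 1) * per_dword && i < num_elts; i++) {
            if (mask[i] < 0)
                continue;                 /* undef elements are free */
            int s = mask[i] / (int)per_dword, seen = 0;
            for (int j = 0; j < nsrcs; j++)
                seen |= (srcs[j] == s);
            if (!seen)
                srcs[nsrcs++] = s;
        }
        /* one perm merges two sources; extra sources need extra perms */
        if (nsrcs > 0)
            cost += nsrcs <= 2 ? 1 : nsrcs - 1;
    }
    return cost;
}

int
main(void)
{
    /* reversing a <8 x i16>: each result dword is built from a single
     * source dword, so the model counts 4 v_perm_b32s */
    const int rev[8] = { 7, 6, 5, 4, 3, 2, 1, 0 };
    printf("cost = %u\n", shuffle_cost(rev, 8, 2));
    return 0;
}

In the common case the cost is simply the number of 32-bit registers
occupied by the result.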
The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
commit-id:8b76e888
Determine how many queue pairs we have by looking at the I40E_PFLAN_QALLOC
register, rather than assuming we have the full capacity of the whole
chip, which is likely to be split among 2 or 4 functions.
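For illustration, the arithmetic is roughly the following, assuming
FIRSTQ in bits 10:0, LASTQ in bits 26:16 and VALID in bit 31 (the
X710/XL710 datasheet layout); the helper name is made up, so check the
driver rather than relying on this sketch:

#include <stdint.h>

#define PFLAN_QALLOC_FIRSTQ(x)  (((x) >> 0) & 0x7ff)
#define PFLAN_QALLOC_LASTQ(x)   (((x) >> 16) & 0x7ff)
#define PFLAN_QALLOC_VALID(x)   (((x) >> 31) & 0x1)

/* number of queue pairs allocated to this PF, given the register value */
static unsigned int
pf_nqueue_pairs(uint32_t qalloc)
{
        if (!PFLAN_QALLOC_VALID(qalloc))
                return 0;
        /* the PF owns the contiguous queue range [FIRSTQ, LASTQ] */
        return PFLAN_QALLOC_LASTQ(qalloc) - PFLAN_QALLOC_FIRSTQ(qalloc) + 1;
}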
ok jan@ dlg@
[AMDGPU] Remove the language-sensitive 'e' handling for _Float16 and half (#168037)
Remove the 'e' handling for the amdgcn builtins, as we decided to use
_Float16 for both HIP/C++ and OpenCL.
AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads
This isn't quite a constant pool, but it's probably close enough for
this purpose; we just need some address that is known to hold an
invariant value. Aliasing queries against the real kernarg base
pointer will falsely report no aliasing, but for invariant memory it
probably doesn't matter.