[dwarf] make dwarf fission compatible with RISCV relaxations 2/2
This patch makes DWARF fission compatible with RISC-V relaxations by
using indirect addressing for the DW_AT_high_pc attribute. This
eliminates the remaining relocations in .dwo files.
[dwarf] make dwarf fission compatible with RISCV relaxations 1/2
Currently, -gsplit-dwarf and -mrelax are incompatible options in
Clang. The issue is that .dwo files should not contain any
relocations, as they are not processed by the linker. However,
relaxable code emits relocations in DWARF for debug ranges that
reside in the .dwo file when DWARF fission is enabled.
This patch makes DWARF fission compatible with RISC-V relaxations.
It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which
allow referencing addresses from .debug_addr instead of using
absolute addresses. This approach eliminates relocations from .dwo
files.
[llc][NPM] Use buffer_ostream support for non-seekable streams (#168842)
NPM was missing buffering for non-seekable output streams (stdout,
pipes), causing assertion failures when generating object files with `-o
-`.
Use buffer_ostream to provide seekable buffering, matching legacy PM
behavior.
Co-authored-by: vikhegde <vikram.hegde at amd.com>
[clang][NFC] Inline Frontend/FrontendDiagnostic.h -> Basic/DiagnosticFrontend.h (#162883)
d076608d58d1ec55016eb747a995511e3a3f72aa moved some deps around to avoid
cycles and left clang/Frontend/FrontendDiagnostic.h as a shim that
simply includes clang/Basic/DiagnosticFrontend.h. This PR inlines it so
that nothing in tree still includes clang/Frontend/FrontendDiagnostic.h.
Doing this will help prevent future layering issues. See #162865.
Frontend already depends on Basic, so no new deps need to be added
anywhere except for places that do strict dep checking.
AMDGPU: Stop implementing shouldCoalesce
Use the default, which freely coalesces anything it can.
This mostly shows improvements, with a handful of regressions.
The main concern would be if introducing wider registers is more
likely to push the register usage up to the next occupancy tier.
[mlir][linalg] Clean up op verifiers without custom checks(NFC) (#168712)
This PR removes op verifiers that do not implement any custom
verification logic.
TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744)
Query RuntimeLibcalls for the support and the name. The check
that the implementation is exactly __guard_local instead of
unsupported feels a bit strange.
[OpenCL] Add missing OpenCL 3.0 features to OpenCLExtensions.def; revert header-only macros (#168016)
Adds the remaining optional feature macros from the OpenCL C 3.0 spec
(section 6.2.1 table). Targets can now enable these via
OpenCLFeaturesMap returned by getSupportedOpenCLOpts().
Revert a84599f177a6 (header‑only feature macros).
Header‑only macros are difficult to disable on SPIR-V targets,
and the prior undef approach (a60b8f468119) does not scale.
After this PR, they can be disabled via `-cl-ext=-<feature>`.
https://github.com/KhronosGroup/OpenCL-Docs/issues/1328 also notes that
unconditional definition of the header‑only macros in opencl-c-base.h
should be removed.
[Clang] Refactor getOptimizationLevel and getOptimizationLevelSize to non-static. NFC. (#168839)
So that we can reuse these functions in few place, such as in
clang/lib/Driver/ToolChains/CommonArgs.cpp. Part of the code there is
currently copied from getOptimizationLevel.
AMDGPU: Fix treating unknown mem operands as uniform
The test changes are mostly GlobalISel specific regressions.
GlobalISel is still relying on isUniformMMO, but it doesn't really
have an excuse for doing so. These should be avoidable with new
regbankselect.
There is an additional regression for addrspacecast for cov4. We
probably ought to be using a separate PseudoSourceValue for the
access of the queue pointer.
VectorCombine: Improve the insert/extract fold in the narrowing case
Keeping the extracted element in a natural position in the narrowed
vector has two beneficial effects:
1. It makes the narrowing shuffles cheaper (at least on AMDGPU), which
allows the insert/extract fold to trigger.
2. It makes the narrowing shuffles in a chain of extract/insert
compatible, which allows foldLengthChangingShuffles to successfully
recognize a chain that can be folded.
There are minor X86 test changes that look reasonable to me. The IR
change for AVX2 in llvm/test/Transforms/VectorCombine/X86/extract-insert-poison.ll
doesn't change the assembly generated by `llc -mtriple=x86_64-- -mattr=AVX2`
at all.
commit-id:c151bb04
AMDGPU: Improve getShuffleCost accuracy for 8- and 16-bit shuffles
These shuffles can always be implemented using v_perm_b32, and so this
rewrites the analysis from the perspective of "how many v_perm_b32s does
it take to assemble each register of the result?"
The test changes in Transforms/SLPVectorizer/reduction.ll are
reasonable: VI (gfx8) has native f16 math, but not packed math.
commit-id:8b76e888
[AMDGPU] Removal of language sensitive option for _Float16 and half( 'e') handling (#168037)
Removing the 'e' handling for the amdgcn builtins as we decided to use
_Float16 for both HIP/C++ and OpenCL
AMDGPU: Use ConstantPool as source value for DAG lowered kernarg loads
This isn't quite a constant pool, but probably close enough for this
purpose. We just need some known invariant value address. The aliasing
queries against the real kernarg base pointer will falsely report
no aliasing, but for invariant memory it probably doesn't matter.