[X86] combineConcatVectorOps - IsConcatFree - detect splats that come from a common load/broadcastload (#174986)
This allows us to handle freely concatenatable cases after a broadcast load
has become shared by uses of different vector widths, by peeking through
bitcast/extract_subvector nodes.
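A minimal sketch of the peek-through idea, assuming a toy node type: `Node`, `Op`, and both helper functions are hypothetical, not the SelectionDAG API. Each concat operand is stripped of bitcast/extract_subvector wrappers, and the concat is treated as free only if every operand reaches the same broadcast load.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for DAG nodes (hypothetical; not LLVM's SDNode/SDValue API).
enum class Op { BroadcastLoad, Bitcast, ExtractSubvector, Other };

struct Node {
  Op op;
  const Node *operand = nullptr; // input of a Bitcast/ExtractSubvector
};

// Strip bitcast and extract_subvector wrappers to reach the real source.
const Node *peekThroughBitcastsAndExtracts(const Node *n) {
  while (n && (n->op == Op::Bitcast || n->op == Op::ExtractSubvector))
    n = n->operand;
  return n;
}

// Simplification: treat the concat as free when every operand traces back
// to one and the same broadcast load.
bool isConcatOfCommonBroadcastLoad(const std::vector<const Node *> &ops) {
  if (ops.empty())
    return false;
  const Node *src = peekThroughBitcastsAndExtracts(ops.front());
  if (!src || src->op != Op::BroadcastLoad)
    return false;
  for (const Node *op : ops)
    if (peekThroughBitcastsAndExtracts(op) != src)
      return false;
  return true;
}
```

Here a narrower use reached through `extract_subvector(bitcast(load))` and the original wide load are recognized as sharing one source.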
[RISC-V][Mach-O] Print immediate operands in hexadecimal format. (#174505)
This is done for logical operations and auipc/lui.
Patch based on code written by Tim Northover.
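A minimal sketch of choosing the immediate format per opcode, assuming a hypothetical `Opc` enum rather than the real RISC-V opcode tables. Printing negative hex values as `-0x...` is also an assumption, not necessarily what the printer emits.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical opcode subset (not LLVM's RISCV:: opcode enum).
enum class Opc { ANDI, ORI, XORI, LUI, AUIPC, ADDI };

// Logical operations and auipc/lui get hexadecimal immediates.
bool printsImmInHex(Opc opc) {
  switch (opc) {
  case Opc::ANDI:
  case Opc::ORI:
  case Opc::XORI:
  case Opc::LUI:
  case Opc::AUIPC:
    return true;
  default:
    return false;
  }
}

std::string formatImm(Opc opc, std::int64_t imm) {
  std::ostringstream os;
  if (!printsImmInHex(opc)) {
    os << imm; // everything else stays decimal
    return os.str();
  }
  // Negate in unsigned arithmetic so INT64_MIN does not overflow.
  std::uint64_t mag = imm < 0 ? 0 - static_cast<std::uint64_t>(imm)
                              : static_cast<std::uint64_t>(imm);
  os << (imm < 0 ? "-0x" : "0x") << std::hex << mag;
  return os.str();
}
```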
[SPIRV] Additional fixes for const init via `UtoPtr` (#172584)
#166494 added support for using `inttoptr` in global initialisation, and
for lowering it into `OpSpecConstantOp OpConvertUToPtr`. Unfortunately, a
slightly more subtle case exposed an existing issue around the `COPY`
pseudo-op. This patch ensures that we look through a `COPY` when
figuring out whether an `OpConvertUToPtr` is actually operating on a
global. We also correctly handle the case where a `G_PTR_ADD` is used by
an `OpSpecConstantOp` in the context of global initialisation, which
would otherwise lead to broken SPIR-V in which the latter references
a non-constant Op.
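A minimal sketch of looking through `COPY` chains before checking for a global, assuming a hypothetical string-keyed def table in SSA form. `DefMap`, `lookThroughCopies`, and `operatesOnGlobal` are illustrative names, not SPIR-V backend functions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of a virtual register's defining instruction.
enum class Kind { GlobalValue, Copy, Constant, Other };

struct Def {
  Kind kind;
  std::string src; // source register of a Copy; unused otherwise
};

using DefMap = std::unordered_map<std::string, Def>;

// Assumes SSA form, so a chain of COPYs cannot loop back on itself.
const Def *lookThroughCopies(const DefMap &defs, std::string reg) {
  for (;;) {
    auto it = defs.find(reg);
    if (it == defs.end())
      return nullptr;     // no known definition
    if (it->second.kind != Kind::Copy)
      return &it->second; // first non-COPY definition
    reg = it->second.src; // step to the COPY's source
  }
}

// Decide whether an operand really refers to a global, ignoring COPYs.
bool operatesOnGlobal(const DefMap &defs, const std::string &operand) {
  const Def *def = lookThroughCopies(defs, operand);
  return def != nullptr && def->kind == Kind::GlobalValue;
}
```

Without the look-through, an operand defined by `COPY %g` would be classified as "not a global" even though it is one.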
---------
Co-authored-by: Marcos Maronas <marcos.maronas at intel.com>
[InlineSpiller][AMDGPU] Implement subreg reload during RA spill
Currently, when a virtual register is partially used, the
entire tuple is restored from the spilled location, even if
only a subset of its sub-registers is needed. This patch
introduces support for partial reloads by analyzing actual
register usage and restoring only the required sub-registers.
This improvement enhances register allocation efficiency,
particularly for cases involving tuple virtual registers.
For AMDGPU, this change brings considerable improvements
in workloads that involve matrix operations, large vectors,
and complex control flows.
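A simplified sketch of how a partial reload could be sized, assuming each use of the tuple is summarized as a bitmask of the 32-bit lanes it reads (this is not LLVM's `LaneBitmask` encoding, and `planPartialReload` is hypothetical). The masks are unioned, and the smallest contiguous lane range covering them is restored, starting at the matching byte offset within the spill slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of the (hypothetical) partial-reload analysis.
struct ReloadPlan {
  unsigned firstLane;  // first 32-bit lane to restore
  unsigned numLanes;   // number of consecutive lanes to restore
  unsigned byteOffset; // offset of firstLane within the spill slot
};

// Each element is a bitmask of the lanes one use reads (bit i = lane i).
ReloadPlan planPartialReload(const std::vector<std::uint32_t> &useLaneMasks) {
  std::uint32_t used = 0;
  for (std::uint32_t m : useLaneMasks)
    used |= m; // union of all lanes actually read
  if (used == 0)
    return {0, 0, 0}; // nothing to reload
  unsigned first = 0;
  while (!(used & (1u << first)))
    ++first;
  unsigned last = 31;
  while (!(used & (1u << last)))
    --last;
  // Restore only [first, last] instead of the whole tuple.
  return {first, last - first + 1, first * 4};
}
```

Restoring a contiguous range is a simplifying choice: uses of lanes 0 and 3 alone would still reload four lanes, while a real implementation is constrained by which sub-register classes exist.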
[AMDGPU] Test precommit for subreg reload
This test currently fails due to insufficient
registers during allocation. Once the subreg
reload is implemented, it will begin to pass
as the partial reload helps mitigate register
pressure.
[AMDGPU] Put back ProperlyAlignedRC helper functions
Put back the functions that were recently deleted
because they were found to be unused. They are needed
to implement subreg reload during RA.
[CodeGen] Enhance createFrom for sub-reg aware cloning
Instead of just cloning the virtual register, this
function now creates a new virtual register derived
from a subregister class of the original value.
[AMDGPU] Make AMDGPURewriteAGPRCopyMFMA aware of subreg reload
The AMDGPURewriteAGPRCopyMFMA pass is currently not subreg-aware.
In particular, the logic that optimizes spills into COPY
instructions assumes full register reloads. This becomes
problematic when the reload instruction partially restores
a tuple register. This patch makes the pass subreg-aware
in preparation for a future patch that implements subreg
reload during RA.
[AMDGPU] Introduce Offset field in SGPR spill Pseudos
Currently, SGPR spill pseudo-instructions lack
an offset field to represent non-zero stack offsets.
This patch introduces an additional offset field to
SGPR spill pseudo-instructions and updates all
relevant passes that handle spill lowering to support
this new field. This field is essential for a future
patch that implements subreg reload of tuple registers
from their stack location during RA.
[SLP]Do not generate extractelement subnodes with the same indices
The compiler should not generate subvectors with the same extractelement
instructions; doing so may cause a crash and leads to inefficient
vectorization.
Fixes #174773
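A minimal sketch of a duplicate check, assuming each candidate subnode is described only by the list of extractelement indices it would use. `hasDuplicateSubnodes` is a hypothetical helper, not the SLP vectorizer's code.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns true if two candidate subnodes would reuse exactly the same
// extractelement indices; such a split should not be emitted.
bool hasDuplicateSubnodes(const std::vector<std::vector<int>> &subnodeIndices) {
  std::set<std::vector<int>> seen;
  for (const auto &indices : subnodeIndices)
    if (!seen.insert(indices).second) // insert fails: already seen
      return true;
  return false;
}
```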
[Headers][X86] __builtin_ia32_pmovwb128_mask is not constexpr (#174985)
Appears to be a copy+paste typo; most of the x86 masked truncation intrinsics still can't be made constexpr at this time.
Fixes #166814
[SDPatternMatch] Add m_FAbs matcher (#174975)
Adds a pattern matcher for floating-point absolute value (ISD::FABS),
following the same pattern as m_Abs for integer absolute value.
Fixes #174751
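A toy model of how an `m_FAbs`-style matcher composes, mirroring `m_Abs` with only the opcode changed. `Expr`, `Opcode`, `UnaryOpMatcher`, and `matches` are hypothetical stand-ins; the real matchers in `SDPatternMatch.h` operate on `SDValue`s.

```cpp
#include <cassert>

// Toy expression nodes (hypothetical; not LLVM's SelectionDAG).
enum class Opcode { FAbs, Abs, Leaf };

struct Expr {
  Opcode opc;
  const Expr *op0 = nullptr;
};

// Matches any node and binds it to the caller's variable.
struct BindMatcher {
  const Expr *&bound;
  bool match(const Expr *e) const {
    bound = e;
    return true;
  }
};

// Matches a unary node with a given opcode whose operand matches `sub`.
template <typename Sub>
struct UnaryOpMatcher {
  Opcode opc;
  Sub sub;
  bool match(const Expr *e) const {
    return e && e->opc == opc && sub.match(e->op0);
  }
};

inline BindMatcher m_Value(const Expr *&v) { return {v}; }

template <typename Sub>
UnaryOpMatcher<Sub> m_Abs(const Sub &s) { return {Opcode::Abs, s}; }

// Same shape as m_Abs; only the opcode differs.
template <typename Sub>
UnaryOpMatcher<Sub> m_FAbs(const Sub &s) { return {Opcode::FAbs, s}; }

template <typename Pattern>
bool matches(const Expr *e, const Pattern &p) { return p.match(e); }
```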
[compiler-rt][AArch64] Exit early from __arm_za_disable. (#174942)
Because `__arm_za_disable` is a private-ZA function, it's only ever
entered with ZA state `off` or `dormant`. If the state is `off` then we
can safely return and there is no need to call `__arm_tpidr2_save` or to
explicitly set PSTATE.ZA or TPIDR2_EL0 to zero.
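A minimal model of the early exit, assuming only the two entry states described above. `ZAState`, `Actions`, and `zaDisableModel` are hypothetical; the real routine manipulates PSTATE.ZA and TPIDR2_EL0 directly rather than returning flags.

```cpp
#include <cassert>

// The only ZA states a private-ZA function can be entered with.
enum class ZAState { Off, Dormant };

// Records which steps the model performed (hypothetical bookkeeping).
struct Actions {
  bool savedTpidr2 = false;   // stands in for calling __arm_tpidr2_save
  bool clearedTpidr2 = false; // stands in for zeroing TPIDR2_EL0
  bool disabledZA = false;    // stands in for clearing PSTATE.ZA
};

Actions zaDisableModel(ZAState state) {
  Actions a;
  // Early exit: with ZA off there is no lazy save to commit and nothing
  // to clear.
  if (state == ZAState::Off)
    return a;
  // Dormant: commit the pending lazy save, then turn ZA off.
  a.savedTpidr2 = true;
  a.clearedTpidr2 = true;
  a.disabledZA = true;
  return a;
}
```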
[libc++][NFC] Update <any> to a more modern code style (#174619)
This patch refactors the `enable_if`s inside `<any>` to use the
`..., int> = 0` variant that we try to use throughout the code base, and
inlines some of the functions into the class body to avoid duplicating
the `enable_if`s.
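A minimal sketch of the two `enable_if` styles using standard-library names (libc++ itself uses its internal reserved spellings); `oldStyle` and `kind` are illustrative functions, not part of `<any>`. With the `int> = 0` form the constraint is part of the template's signature, so two overloads that differ only in their constraint can coexist; with a defaulted type parameter they would be redeclarations of the same template.

```cpp
#include <cassert>
#include <type_traits>

// Older style: the constraint lives in a defaulted *type* template parameter.
template <class T, class = std::enable_if_t<std::is_integral<T>::value>>
int oldStyle(T) { return 1; }

// Preferred style: a defaulted non-type template parameter of type int.
template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
int kind(T) { return 1; }

// The constraint differs, so this is a distinct overload, not a redefinition.
template <class T, std::enable_if_t<std::is_floating_point<T>::value, int> = 0>
int kind(T) { return 2; }
```

Defining such a member template out of line would also require repeating the full template header, `enable_if` included, which is why moving the bodies into the class avoids the duplication.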
[Clang] expunge `trivially_relocate_if_eligible` (#174344)
In Kona, WG21 decided to revert trivial relocation (P2786).
Keep the notion of relocatability
(used in the wild and likely to come back),
but remove the keyword, which is no longer conforming.
[mlir][OpenMP] Fix sanitizer error in buildTaskLikeBodyGenCallback (#174983)
This is a fix for the asan bot after
https://github.com/llvm/llvm-project/pull/174386
Failing bot: https://lab.llvm.org/buildbot/#/builders/24/builds/16371
This commit undoes a simplification I thought reduced copied+pasted
code. I will merge it like this now to unblock the bot, and then work
separately on a different way to share code between both callbacks.
[AMDGPU] Fix a potential use-after-erase in `AMDGPUPromoteAlloca` pass
In some cases, the placeholder itself can be used as the value for its corresponding block in `SSAUpdater`, and later used as an incoming value in another block in `GetValueInMiddleOfBlock`. If we erase it too early, this can lead to a use-after-erase. The tricky part is that it may not trigger any error right away, but can cause weird and completely unrelated issues later in the pipeline.
[PowerPC] Change `half` to use soft promotion rather than `PromoteFloat` (#152632)
On PowerPC targets, `half` uses the default legalization of promoting to
an `f32`. However, this has some fundamental issues related to the
inability to round-trip. Resolve this by switching to soft promotion,
which passes `f16` as an `i16`.
The PowerPC ABI Specification does not define a `_Float16` type, so the
calling convention changes are acceptable.
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97981
[SystemZ][z/OS] Improve use of formatv (#174503)
Using a `raw_svector_ostream` object is not necessary, because this is
hidden in the conversion function. In addition, there is no need to
reason about zero termination of the string. Declaring the ASCII and
EBCDIC versions of the string variables at the same time ensures that
both strings are allocated with the same size.
[flang] Check for errors when analyzing array constructors (#173092)
Errors in array constructor values result in the array having
fewer elements than it should, which can cause other errors that
will confuse the user. Avoid this by not returning an expression
on errors.
Fixes #127425
[AMDGPU] Add intrinsic exposing s_alloc_vgpr
Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge
footgun, and using it for anything other than compiler-internal purposes is
heavily discouraged. The calling code must make sure that it does not
allocate fewer VGPRs than necessary; the intrinsic is NOT a request to
the backend to limit the number of VGPRs it uses (in essence it's not so
different from what we do with the dynamic VGPR flags of the
`amdgcn.cs.chain` intrinsic, it just makes it possible to use this
functionality in other scenarios).
Revert "[BAZEL] Move FuncTransformsPassIncGen to CAPIIR header dep (#174982)"
This reverts commit 46d0862773ac3ac07fd1a8abe76db623b26d7d45.
This previously landed a couple of commits ago and now duplicates the dep,
breaking the bazel build.