[TableGen][NFCI] Change TableGenMain() to take function_ref. (#167888)
It was switched from a function pointer to std::function in
f675ec6165ab6add5e57cd43a2e9fa1a9bc21d81
("TableGen: Make 2nd arg MainFn of TableGenMain(argv0, MainFn) optional."),
but there's no mention of any particular reason for that.
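For readers unfamiliar with the trade-off, here is a minimal sketch of a non-owning callable reference in the spirit of function_ref. `FunctionRef` and `applyTwice` are hypothetical names written for this illustration; this is not the LLVM implementation and not the TableGenMain() signature.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

template <typename Fn> class FunctionRef;

// Non-owning reference to a callable: stores one function pointer and one
// address, never allocates, and must not outlive the callable it refers to.
template <typename Ret, typename... Params>
class FunctionRef<Ret(Params...)> {
  Ret (*Callback)(std::intptr_t, Params...) = nullptr;
  std::intptr_t Target = 0;

  // Type-erased trampoline: recovers the callable's real type and calls it.
  template <typename CallableT>
  static Ret invoke(std::intptr_t Addr, Params... Ps) {
    return (*reinterpret_cast<CallableT *>(Addr))(std::forward<Params>(Ps)...);
  }

public:
  FunctionRef() = default;

  // Constrained so that copying a FunctionRef does not wrap it again.
  template <typename CallableT,
            std::enable_if_t<
                !std::is_same<std::remove_cv_t<std::remove_reference_t<CallableT>>,
                              FunctionRef>::value,
                int> = 0>
  FunctionRef(CallableT &&Fn)
      : Callback(invoke<std::remove_reference_t<CallableT>>),
        Target(reinterpret_cast<std::intptr_t>(&Fn)) {}

  Ret operator()(Params... Ps) const {
    return Callback(Target, std::forward<Params>(Ps)...);
  }

  explicit operator bool() const { return Callback != nullptr; }
};

// Hypothetical consumer: takes the callback by value, as is usual for a
// reference-like type, and only uses it for the duration of the call.
inline int applyTwice(FunctionRef<int(int)> Fn, int X) {
  assert(Fn && "callback must be set");
  return Fn(Fn(X));
}
```

A capturing lambda can be passed directly to `applyTwice` without any allocation. The reference is only valid while the callable is alive, which fits a callback that is used only during the call.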
[CMake] Declare all parts of *GenRegisterInfo.inc as outputs. (#168405)
This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.
A follow-up from
<https://github.com/llvm/llvm-project/pull/167700>.
[Clang] Fix cleanup attribute by delaying type checks after the type is deduced (#164440)
Previously, the handling of the `cleanup` attribute had some checks
based on the type, but we were deducing the type after handling the
attribute.
This PR fixes how we handle type checks for the `cleanup` attribute by
delaying those checks until after the type has been deduced.
The fix is structured so that it can be adapted for other attributes
that perform type-based checks.
The following C/C++ attributes perform type-based checks and will need
to be fixed in additional PRs:
- CUDAShared
- MutualExclusions
- PassObjectSize
- InitPriority
- Sentinel
- AcquireCapability
- RequiresCapability
[5 lines not shown]
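As background on why the check depends on the variable's type, here is a small sketch of the `cleanup` attribute with an explicitly typed variable. `releaseInt`, `ReleasedTotal` and `runScopedCleanup` are hypothetical names; the deduced-type case that this PR addresses is deliberately not exercised here, and the attribute is a GNU extension supported by GCC and Clang.

```cpp
#include <cassert>

// Hypothetical counter and cleanup function, used only for this sketch.
static int ReleasedTotal = 0;

// A cleanup function receives a pointer to the annotated variable.
static void releaseInt(int *Value) { ReleasedTotal += *Value; }

// Returns the counter after one guarded variable has gone out of scope.
inline int runScopedCleanup() {
  ReleasedTotal = 0;
  {
    // The attribute check compares `int *` (the parameter of releaseInt)
    // against the type of Guard, so that type has to be known first.
    int Guard __attribute__((cleanup(releaseInt))) = 5;
    assert(ReleasedTotal == 0 && "cleanup must not run before scope exit");
  }
  // releaseInt(&Guard) ran when the inner block was left.
  return ReleasedTotal;
}
```

If the variable's type were still undeduced when the attribute is processed, the comparison in the comment above would have nothing to compare against, which is the ordering problem the description refers to.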
[AArch64][GlobalISel] Add better basic legalization for llround. (#168427)
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. The
resulting codegen is now identical to the SelectionDAG version.
[LLVM][AArch64] Mark SVE integer intrinsics as speculatable. (#167915)
Exceptions include intrinsics that:
* take or return floating point data
* read or write FFR
* read or write memory
* read or write SME state
[MLIR][SPIRV] Lower SPIR-V Tan/Tanh ops to LLVM intrinsics (#168419)
Fixed #148354
Lower SPIR-V Tan/Tanh ops using the corresponding LLVM intrinsics to
reduce instructions and prevent overflow caused by the previous
`exp`-based expansion.
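To illustrate the kind of overflow an `exp`-based tanh can hit, here is a small sketch using the textbook identity tanh(x) = (e^(2x) - 1) / (e^(2x) + 1). `naiveTanh` is a hypothetical helper; the exact expansion the lowering used previously is not shown in this message, so treat this as an assumption about its general shape.

```cpp
#include <cassert>
#include <cmath>

// Textbook identity evaluated naively in single precision.
inline float naiveTanh(float X) {
  float E = std::exp(2.0f * X); // overflows to +inf for large X
  return (E - 1.0f) / (E + 1.0f);
}

// Usage: compare the naive form against the library function.
inline void demoTanhOverflow() {
  // For moderate inputs the two agree closely.
  assert(std::fabs(naiveTanh(0.5f) - std::tanh(0.5f)) < 1e-5f);
  // For large inputs exp(2x) is +inf, so the quotient is inf/inf = NaN,
  // while the true result is 1 to within float precision.
  assert(std::isnan(naiveTanh(100.0f)));
  assert(std::fabs(std::tanh(100.0f) - 1.0f) < 1e-6f);
}
```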
[VPlan] Support isa/dyn_cast from VPRecipeBase to VPIRMetadata (NFC). (#166245)
Implement CastInfo from VPRecipeBase to VPIRMetadata to support
isa/dyn_cast. This is similar to CastInfoVPPhiAccessors, supporting
dyn_cast by down-casting to the concrete recipe types inheriting from
VPIRMetadata.
This can be used for more generalized VPIRMetadata printing following
https://github.com/llvm/llvm-project/pull/165825.
PR: https://github.com/llvm/llvm-project/pull/166245
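The down-casting idea can be sketched without any LLVM headers. All types below (`RecipeBase`, `Metadata`, `WidenRecipe`, `ReplicateRecipe`, `BranchRecipe`) and the function `dynCastToMetadata` are hypothetical stand-ins, not the VPlan classes or the CastInfo specialization itself.

```cpp
#include <cassert>

// Mixin that is NOT derived from the recipe base class.
struct Metadata {
  int Tag = 0;
};

struct RecipeBase {
  enum Kind { WidenKind, ReplicateKind, BranchKind };
  explicit RecipeBase(Kind K) : K(K) {}
  Kind getKind() const { return K; }

private:
  Kind K;
};

// Two concrete recipes carry metadata, one does not.
struct WidenRecipe : RecipeBase, Metadata {
  WidenRecipe() : RecipeBase(WidenKind) {}
};
struct ReplicateRecipe : RecipeBase, Metadata {
  ReplicateRecipe() : RecipeBase(ReplicateKind) {}
};
struct BranchRecipe : RecipeBase {
  BranchRecipe() : RecipeBase(BranchKind) {}
};

// Cross-cast from the base to the mixin: down-cast to the concrete type
// selected by the kind, then let the implicit up-cast adjust the pointer.
inline Metadata *dynCastToMetadata(RecipeBase *R) {
  switch (R->getKind()) {
  case RecipeBase::WidenKind:
    return static_cast<WidenRecipe *>(R);
  case RecipeBase::ReplicateKind:
    return static_cast<ReplicateRecipe *>(R);
  case RecipeBase::BranchKind:
    return nullptr; // this recipe kind has no metadata
  }
  return nullptr;
}

// Usage: a recipe with metadata casts successfully, one without does not.
inline void demoCrossCast() {
  WidenRecipe W;
  W.Tag = 7;
  RecipeBase *R = &W;
  Metadata *M = dynCastToMetadata(R);
  assert(M != nullptr && M->Tag == 7);

  BranchRecipe B;
  assert(dynCastToMetadata(&B) == nullptr);
}
```

A direct static_cast from the base to the mixin is not possible because the two are unrelated types, which is why the cast has to go through each concrete type.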
[compiler-rt][ARM] Optimized mulsf3 and divsf3 (#168394)
(Reland of #161546, fixing three build and test issues)
This commit adds optimized assembly versions of single-precision float
multiplication and division. Both functions are implemented in a style
that can be assembled as either Arm or Thumb2; for multiplication, a
separate implementation is provided for Thumb1. Also, extensive new
tests are added for multiplication and division.
These implementations can be removed from the build by setting the
CMake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.
Outlying parts of the functionality which are not on the fast path, such
as NaN handling and underflow, are handled in helper functions written
in C. These can be shared between the Arm/Thumb2 and Thumb1
implementations, and also reused by other optimized assembly functions
we hope to add in future.
[LoongArch] Add late branch optimisation pass
This commit adds a new target-specific optimization pass for
LoongArch that converts conditional branches into unconditional
branches when the condition can be statically evaluated.
It is similar to the corresponding RISC-V pass.
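The core decision can be sketched as follows, assuming both branch operands may or may not be known constants. `CondCode`, `Fold`, `evaluateCond` and `foldBranch` are hypothetical names for this illustration and do not come from the pass itself.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Condition codes matching the usual compare-and-branch forms.
enum class CondCode { EQ, NE, LT, GE, LTU, GEU };

// Evaluates "LHS <cc> RHS" for two known 64-bit register values.
inline bool evaluateCond(CondCode CC, std::int64_t LHS, std::int64_t RHS) {
  const auto ULHS = static_cast<std::uint64_t>(LHS);
  const auto URHS = static_cast<std::uint64_t>(RHS);
  switch (CC) {
  case CondCode::EQ:  return LHS == RHS;
  case CondCode::NE:  return LHS != RHS;
  case CondCode::LT:  return LHS < RHS;
  case CondCode::GE:  return LHS >= RHS;
  case CondCode::LTU: return ULHS < URHS;
  case CondCode::GEU: return ULHS >= URHS;
  }
  return false;
}

enum class Fold { Keep, AlwaysTaken, NeverTaken };

// A branch can only be folded when both operands are known constants.
inline Fold foldBranch(CondCode CC, std::optional<std::int64_t> LHS,
                       std::optional<std::int64_t> RHS) {
  if (!LHS || !RHS)
    return Fold::Keep;
  return evaluateCond(CC, *LHS, *RHS) ? Fold::AlwaysTaken : Fold::NeverTaken;
}

// Usage: signedness matters when the operands are -1 and 1.
inline void demoFoldBranch() {
  assert(foldBranch(CondCode::LT, -1, 1) == Fold::AlwaysTaken);
  assert(foldBranch(CondCode::LTU, -1, 1) == Fold::NeverTaken);
  assert(foldBranch(CondCode::EQ, std::nullopt, 0) == Fold::Keep);
}
```

In this sketch, a branch whose outcome is known either way no longer needs its condition, which is what makes replacing it with an unconditional branch possible.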
Extend MemoryEffects to Support Target-Specific Memory Locations (#148650)
This patch introduces preliminary support for two additional memory
locations, target_mem0 and target_mem1, which model memory locations
that cannot be represented with the existing ones.
This solution was suggested in:
https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6
Currently, these locations are not yet target-specific. The goal is to
enable the compiler to express read/write effects on these resources.
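As a rough illustration of tracking read/write effects per location, here is a sketch that packs two bits of access information for each location into one integer. `Access`, `Location` and `Effects` are hypothetical types; the layout and the location list are assumptions for this example, not the actual MemoryEffects definition.

```cpp
#include <cassert>
#include <cstdint>

// Two bits per location: bit 0 = may read, bit 1 = may write.
enum class Access : std::uint32_t { None = 0, Read = 1, Write = 2, ReadWrite = 3 };

// Assumed location list, with the two new target locations at the end.
enum class Location : unsigned {
  ArgMem = 0,
  InaccessibleMem,
  ErrnoMem,
  Other,
  TargetMem0,
  TargetMem1
};

class Effects {
  std::uint32_t Data = 0;
  static constexpr unsigned BitsPerLoc = 2;
  static constexpr std::uint32_t LocMask = 3;

  static unsigned shiftFor(Location Loc) {
    return static_cast<unsigned>(Loc) * BitsPerLoc;
  }

public:
  // Returns a copy with the access for one location replaced.
  Effects withAccess(Location Loc, Access A) const {
    Effects Result = *this;
    Result.Data &= ~(LocMask << shiftFor(Loc));
    Result.Data |= static_cast<std::uint32_t>(A) << shiftFor(Loc);
    return Result;
  }

  Access get(Location Loc) const {
    return static_cast<Access>((Data >> shiftFor(Loc)) & LocMask);
  }

  // True if no location other than Loc is read or written.
  bool onlyAccesses(Location Loc) const {
    return (Data & ~(LocMask << shiftFor(Loc))) == 0;
  }
};

// Usage: an operation that only writes the first target location.
inline void demoTargetMemEffects() {
  Effects E = Effects().withAccess(Location::TargetMem0, Access::Write);
  assert(E.get(Location::TargetMem0) == Access::Write);
  assert(E.get(Location::ArgMem) == Access::None);
  assert(E.onlyAccesses(Location::TargetMem0));
}
```

With a representation along these lines, adding a location costs two more bits and leaves queries on existing locations unchanged.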
[AArch64][llvm] GICv5 instruction `GIC CDEOI` takes no operand (#167322)
There was a minor oversight in commit 6836261ee; the AArch64 GICv5
instruction `GIC CDEOI` takes no operands, since the text of the
specification says:
```
The Rt field should be set to 0b11111. If the Rt field is not
set to 0b11111, it is CONSTRAINED UNPREDICTABLE whether:
* The instruction is UNDEFINED.
* The instruction behaves as if the Rt field is set to 0b11111.
```
[AMDGPU] Rematerialize VGPR candidates when SGPR spills to VGPR over the VGPR limit
Before, when selecting candidates to rematerialize, we would only
consider SGPR candidates when there was an excess of SGPR registers.
Failing to eliminate the excess would result in spills to VGPRs.
This is normally not an issue, unless spilling to VGPRs results in
excess VGPRs.
This patch does 2 things:
* It relaxes the GCNRPTarget success criteria: now we accept regions
where we spill SGPRs to VGPRs, as long as this does not end up in
excess VGPRs.
* It changes isSaveBeneficial to consider the excess VGPRs (which
includes the SGPRs that would be spilled to VGPR).
With these changes, the compiler rematerializes VGPRs when the excess
SGPRs would result in VGPR excess.
[4 lines not shown]
[X86][GlobalISel] Enable nest arguments (#165173)
Nest arguments are supported by the calling conventions in
X86CallingConv.td. Nothing special is required in GlobalISel, as we
reuse that code.
The nest attribute is mostly generated by the Fortran frontend.
[ORC] Move DebugObjectManagerPlugin into Debugging/ELFDebugObjectPlugin (NFC) (#168343)
In 4 years the plugin wasn't adapted to other object formats. This patch
makes it specific to ELF, which will allow removing some abstractions
down the line. It also moves the plugin from LLVMOrcJIT into
LLVMOrcDebugging, which didn't exist back then.
[Headers][X86] Allow AVX512 masked arithmetic pd/ps/epi/epu intrinsics to be used in constexpr (#168496)
### Summary
This PR resolves #160559 by covering the remaining pd/ps/epi/epu part of the AVX512 masked arithmetic intrinsics.
[Headers][X86] Allow AVX512 masked arithmetic ss/sd intrinsics to be used in constexpr (#162816)
This PR resolves only the ss/sd part of the AVX512 masked arithmetic intrinsics from #160559.
[BOLT] Fix when inlining into a context with a tailcall
When inlining into a call site that is a tail call, the return in the
inlined block is not removed. Because of this, we don't have to generate
the matching authentication.
Add a test for this case.
[mlir][bufferization] Refine tensor-buffer compatibility checks (#167705)
Generally, to_tensor and to_buffer already perform sufficient
verification. However, there are some unnecessarily strict constraints:
* a builtin tensor requires its buffer counterpart to always be a memref
* to_buffer on a ranked tensor is required to always return a memref
These checks are assertions (i.e. preconditions); however, they
prevent an apparently useful bufferization where builtin tensors could
become custom buffers. Lift these assertions, leaving the
verification procedure unchanged, to allow builtin -> custom
bufferizations at operation boundary level.
[MC] AsmLexer assert buffer is null-terminated at CurBuf.end() (#154972)
AsmLexer expects the buffer it's provided for lexing to be
NULL-terminated, where the NULL terminator is pointed to by
`CurBuf.end()`. However, this expectation isn't explicitly stated
anywhere.
This commit adds a couple of comments as well as an assert as a means
of documenting this expectation.
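The expectation can be illustrated with a toy scanner that relies on the terminator instead of a bounds check. `lexIdentifierLength` and `demoLex` are hypothetical helpers written for this sketch and are unrelated to the actual AsmLexer code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the length of the leading alphanumeric run in Buf.
// Precondition: the byte at Buf.data() + Buf.size() is readable and '\0'.
inline std::size_t lexIdentifierLength(std::string_view Buf) {
  assert(Buf.data() != nullptr && Buf.data()[Buf.size()] == '\0' &&
         "buffer must be null-terminated at end()");
  const char *Cur = Buf.data();
  // No comparison against the end pointer: the terminator stops the loop.
  while (std::isalnum(static_cast<unsigned char>(*Cur)))
    ++Cur;
  return static_cast<std::size_t>(Cur - Buf.data());
}

// Usage: a std::string guarantees a terminator right after its contents,
// so a view over the whole string satisfies the precondition.
inline std::size_t demoLex() {
  std::string Source = "mov";
  return lexIdentifierLength(std::string_view(Source));
}
```

A view over only part of a larger string would not satisfy the precondition, and that is the kind of misuse an assert like this is meant to catch early.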