CodeGen: Add LibcallLoweringInfo analysis pass
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.
This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).
This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
CodeGen: Move libcall lowering configuration to subtarget
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
CodeGen: Add subtarget to TargetLoweringBase constructor
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
[NVPTX] TableGen-erate SDNode descriptions (#168367)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.
The verification functionality detected a few issues, two of them were
fixed (missing `SDNPMemOperand` property on `TCGEN05_MMA` nodes and
extra glue operand/result on `CallPrototype`), the one remaining is with
`ProxyReg` node, see `NVPTXSelectionDAGInfo::verifyTargetNode()`.
Part of #119709.
Pull Request: https://github.com/llvm/llvm-project/pull/168367
[libc] Implement pkey_alloc/free/get/set/mprotect for x86_64 linux (#162362)
This patch provides definitions for `pkey_*` functions for linux x86_64.
`pkey_alloc`, `pkey_free`, and `pkey_mprotect` are simple syscall
wrappers. `pkey_set` and `pkey_get` modify architecture-specific
registers. The logic for these live in architecture specific
directories:
* `libc/src/sys/mman/linux/x86_64/pkey_common.h` has a real
implementation
* `libc/src/sys/mman/linux/generic/pkey_common.h` contains stubs that
just return `ENOSYS`.
[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit (#168458)
On some targets, a packed f32 instruction can only read 32 bits from a
scalar operand (SGPR or literal) and replicates the bits to both
channels. In this case, we should not fold an immediate value if it
can't be replicated from its lower 32-bit.
Fixes SWDEV-567139.
[VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI).
FCmp instructions have both a predicate and fast-math flags. Introduce a
new FCmp kind, that combines both to model this correctly in the current
system.
This should be NFC modulo VPlan printing which now includes the correct
fast-math flags.
[runtimes] Remove pstl from the list of supported runtimes (#168414)
The pstl top-level directory was removed, but we forgot to remove pstl
from the list of valid subdirectories.
[HLSL][DirectX] Use a padding type for HLSL buffers. (#167404)
This change drops the use of the "Layout" type and instead uses explicit
padding throughout the compiler to represent types in HLSL buffers.
There are a few parts to this, though it's difficult to split them up as
they're very interdependent:
1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding
of arbitrary types.
2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when
generating aggregate copies, array accesses, and structure accesses.
3. Simplify DXILCBufferAccesses such that it directly replaces accesses
with dx.resource.getpointer rather than recalculating the layout.
4. Basic infrastructure for SPIR-V handling, but the implementation
itself will need work in follow ups.
Fixes several issues, including #138996, #144573, and #156084.
Resolves #147352.
[clang-tidy] Fix bugs in misc-coroutine-hostile-raii check (#167947)
1. Handle transformed awaitables for `AllowedCallees`, which generate
temporaries and weren't being handled by #167778.
1. Fix name mismatches in `storeOptions`.
Implement a more seamless way to provide missing functions on z/OS (#167703)
In this PR I'm changing the way we provide the missing functions like
strnlen() on z/OS from the separate header file to a wrapper around the
system headers that declare these functions. This will be less
intrusive.
---------
Co-authored-by: Zibi Sarbinowski <zibi at ca.ibm.com>
[VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (#168560)
Follow up on a cse OpType-mismatch crash reported due to ef023cae388d
(Reland [VPlan] Expand WidenInt inductions with nuw/nsw), setting the
OpType correctly when returning from getFlagsFromIndDesc.
[lldb/aarch64] Add STR/LDR instructions for FP registers to Emulator (#168187)
A function prologue can begin with a pre-index STR instruction for a
floating-point register. To construct an unwind plan from assembly
correctly, the instruction emulator must support such instructions.
[clang][deps] NFC: Use qualified names for function definitions (#168586)
The compiler doesn't emit a diagnostics when the signature of a function
defined in a namespace gets out-of-sync with its declaration. Let's use
qualified names for function definitions instead of nesting them in a
namespace so that mismatches are diagnosed by the compiler rather than
by the (less understandable) linker.
[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit
On some targets, a packed f32 instruction can only read 32 bits from a scalar operand (SGPR or literal) and replicates the bits to both channels. In this case, we should not fold an immediate value if it can't be replicated from its lower 32-bit.
[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584)
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.
I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
[CIR] Mark globals as constants (#168463)
We previously added support for marking GlobalOp operations as constant,
but the handling to actually do so was left mostly unimplemented. This
fills in the missing pieces.
[AllocToken] Test compatibility with -fsanitize=kcfi,memtag (#168600)
Test that -fsanitize=alloc-token is compatible with kcfi and memtag, as
these should also be possible to combine.
NFC.
Mips: Remove manual libcall name search and table (#168595)
This should really check if the libcall is known supported.
For now mips doesn't configure its RuntimeLibcallsInfo
correctly, and does not have any of the mips16 calls in it.
For now there isn't a way to add them without triggering conflicting
cases in tablegen, so keep parsing the raw name as it was before.
[AArch64] - Improve costing for Identity shuffles for SVE targets. (#165375)
Identity masks can be treated as free when scalable vectorization is
possible making the check agnostic of the vectorization policy
fixed/scalable, This allows for aggressive vector combines for identity
shuffle masks.