LLVM/project 7cd747d llvm/include/llvm/CodeGen LibcallLoweringInfo.h, llvm/lib/Analysis RuntimeLibcallInfo.cpp

CodeGen: Add LibcallLoweringInfo analysis pass

The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.

This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (it also stays reusable across
repeated compilations).
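
The "module analysis with a lookup keyed on the subtarget" shape described above can be sketched roughly as follows. This is an illustration only: `Subtarget`, `LoweringInfo`, and `ModuleLoweringCache` are hypothetical stand-ins, not the LLVM classes, and the real pass is registered with the pass manager rather than used directly like this.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the LLVM classes.
struct Subtarget {
  std::string CPU;
  bool HasHardFloat;
};

struct LoweringInfo {
  // Routine chosen for a 32-bit float add, or empty if the subtarget
  // needs no library call for it.
  std::string FAdd32;
};

// One cache per module. Results are computed lazily, once per distinct
// subtarget, so the common single-subtarget case computes exactly once.
class ModuleLoweringCache {
  std::unordered_map<const Subtarget *, LoweringInfo> Cache;
  unsigned NumComputed = 0;

public:
  const LoweringInfo &get(const Subtarget &ST) {
    auto It = Cache.find(&ST);
    if (It != Cache.end())
      return It->second; // already computed for this subtarget
    ++NumComputed;
    LoweringInfo Info;
    Info.FAdd32 = ST.HasHardFloat ? "" : "__addsf3";
    return Cache.emplace(&ST, std::move(Info)).first->second;
  }

  unsigned numComputed() const { return NumComputed; }
};

// Small self-check: two lookups for the same subtarget compute only once.
void checkLoweringCache() {
  Subtarget Soft{"soft", false}, Hard{"hard", true};
  ModuleLoweringCache C;
  assert(C.get(Soft).FAdd32 == "__addsf3");
  assert(C.get(Soft).FAdd32 == "__addsf3");
  assert(C.get(Hard).FAdd32.empty());
  assert(C.numComputed() == 2);
}
```

A function pass would fetch the cache once and call `get` with its own function's subtarget, which is the lookup the message wants to keep cheap.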

This also switches ExpandFp and PreISelIntrinsicLowering over to it as
a sample function pass and module pass, respectively. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
Delta File
+68-0 llvm/include/llvm/CodeGen/LibcallLoweringInfo.h
+36-17 llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+42-0 llvm/lib/CodeGen/LibcallLoweringInfo.cpp
+26-5 llvm/lib/CodeGen/ExpandFp.cpp
+17-6 llvm/tools/opt/NewPMDriver.cpp
+16-0 llvm/lib/Analysis/RuntimeLibcallInfo.cpp
+205-28 30 files not shown
+303-61 36 files

LLVM/project 2921bb9 llvm/lib/Target/ARM ARMSubtarget.cpp ARMISelLowering.cpp, llvm/lib/Target/MSP430 MSP430Subtarget.cpp MSP430ISelLowering.cpp

CodeGen: Move libcall lowering configuration to subtarget

Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
Delta File
+70-0 llvm/lib/Target/ARM/ARMSubtarget.cpp
+0-68 llvm/lib/Target/ARM/ARMISelLowering.cpp
+64-0 llvm/lib/Target/MSP430/MSP430Subtarget.cpp
+0-62 llvm/lib/Target/MSP430/MSP430ISelLowering.cpp
+40-0 llvm/lib/Target/Sparc/SparcSubtarget.cpp
+1-36 llvm/lib/Target/Sparc/SparcISelLowering.cpp
+175-166 11 files not shown
+239-205 17 files

LLVM/project 1fd957f llvm/include/llvm/CodeGen TargetLowering.h, llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

CodeGen: Add subtarget to TargetLoweringBase constructor

Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
Delta File
+16-14 llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+4-2 llvm/include/llvm/CodeGen/TargetLowering.h
+3-2 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+3-2 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+3-2 llvm/unittests/CodeGen/MFCommon.inc
+4-0 llvm/lib/Target/SPIRV/SPIRVISelLowering.cpp
+33-22 25 files not shown
+62-49 31 files

LLVM/project e47e9f3 llvm/lib/Target/NVPTX NVPTXISelLowering.h NVPTXISelLowering.cpp

[NVPTX] TableGen-erate SDNode descriptions (#168367)

This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

The verification functionality detected a few issues. Two of them were
fixed (missing `SDNPMemOperand` property on `TCGEN05_MMA` nodes and
extra glue operand/result on `CallPrototype`); the one remaining is with
the `ProxyReg` node, see `NVPTXSelectionDAGInfo::verifyTargetNode()`.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/168367
Delta File
+0-114 llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+8-98 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+51-3 llvm/lib/Target/NVPTX/NVPTXSelectionDAGInfo.cpp
+42-2 llvm/lib/Target/NVPTX/NVPTXSelectionDAGInfo.h
+11-2 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+1-1 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+113-220 2 files not shown
+115-220 8 files

LLVM/project db71cc5 libc/src/sys/mman/linux CMakeLists.txt pkey_mprotect.cpp, libc/src/sys/mman/linux/x86_64 pkey_common.h

[libc] Implement pkey_alloc/free/get/set/mprotect for x86_64 linux (#162362)

This patch provides definitions for `pkey_*` functions for linux x86_64.

`pkey_alloc`, `pkey_free`, and `pkey_mprotect` are simple syscall
wrappers. `pkey_set` and `pkey_get` modify architecture-specific
registers. The logic for these lives in architecture-specific
directories:

* `libc/src/sys/mman/linux/x86_64/pkey_common.h` has a real
implementation
* `libc/src/sys/mman/linux/generic/pkey_common.h` contains stubs that
just return `ENOSYS`.
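
For readers unfamiliar with what "modify architecture-specific registers" means on x86_64: the protection-key rights live in the PKRU register, two bits per key (access-disable in the low bit, write-disable in the high bit). The sketch below only models that bit layout on a plain integer. The helper names are hypothetical, it does not read or write the real register, and it is not the code from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Values match the Linux PKEY_DISABLE_ACCESS / PKEY_DISABLE_WRITE flags.
constexpr uint32_t kDisableAccess = 0x1;
constexpr uint32_t kDisableWrite = 0x2;
constexpr unsigned kNumKeys = 16; // x86_64 PKRU holds 16 keys, 2 bits each

// Hypothetical helper: extract the rights bits for one key from a PKRU value.
uint32_t pkruGetRights(uint32_t Pkru, unsigned Pkey) {
  assert(Pkey < kNumKeys);
  return (Pkru >> (2 * Pkey)) & 0x3u;
}

// Hypothetical helper: return a PKRU value with one key's rights replaced.
uint32_t pkruSetRights(uint32_t Pkru, unsigned Pkey, uint32_t Rights) {
  assert(Pkey < kNumKeys);
  assert((Rights & ~0x3u) == 0 && "only the two rights bits are valid");
  uint32_t Cleared = Pkru & ~(0x3u << (2 * Pkey));
  return Cleared | (Rights << (2 * Pkey));
}

// Small self-check of the bit layout.
void checkPkruLayout() {
  uint32_t Pkru = pkruSetRights(0, 1, kDisableWrite);
  assert(Pkru == 0x8u);                       // key 1 occupies bits 2..3
  assert(pkruGetRights(Pkru, 1) == kDisableWrite);
  assert(pkruGetRights(Pkru, 0) == 0);        // other keys are untouched
  assert(pkruSetRights(0xFFFFFFFFu, 0, 0) == 0xFFFFFFFCu);
}
```

A real `pkey_set` would read the register, apply an update like `pkruSetRights`, and write the result back; the generic stubs skip all of this and report `ENOSYS`.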
Delta File
+241-0 libc/test/src/sys/mman/linux/pkey_test.cpp
+95-0 libc/src/sys/mman/linux/CMakeLists.txt
+91-0 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+61-0 libc/src/sys/mman/linux/x86_64/pkey_common.h
+58-0 libc/src/sys/mman/linux/pkey_mprotect.cpp
+38-0 libc/src/sys/mman/linux/mprotect_common.h
+584-0 20 files not shown
+1,007-12 26 files

LLVM/project 6665642 llvm/lib/Target/AMDGPU SIFoldOperands.cpp GCNSubtarget.h, llvm/test/CodeGen/AMDGPU bug-pk-f32-imm-fold.mir packed-fp32.ll

[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit (#168458)

On some targets, a packed f32 instruction can only read 32 bits from a
scalar operand (SGPR or literal) and replicates those bits to both
channels. In this case, we should not fold an immediate value if it
can't be replicated from its lower 32 bits.

Fixes SWDEV-567139.
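
Read literally, "can be replicated from its lower 32 bits" means the two 32-bit halves of the 64-bit immediate are identical. A minimal sketch of that test follows; the function name is hypothetical and the actual check in SIFoldOperands.cpp may be expressed differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if broadcasting the low 32 bits to both
// channels reproduces the full 64-bit immediate.
bool isReplicatedFromLow32(uint64_t Imm) {
  uint32_t Lo = static_cast<uint32_t>(Imm);
  uint32_t Hi = static_cast<uint32_t>(Imm >> 32);
  return Lo == Hi;
}

// Small self-check using the bit pattern of 1.0f (0x3F800000).
void checkReplication() {
  assert(isReplicatedFromLow32(0x3F8000003F800000ull));  // <1.0f, 1.0f>
  assert(!isReplicatedFromLow32(0x000000003F800000ull)); // <1.0f, 0.0f>
  assert(isReplicatedFromLow32(0));
}
```

Folding the second value would silently turn `<1.0f, 0.0f>` into `<1.0f, 1.0f>` on a target that replicates the low half, which is the miscompile the commit avoids.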
Delta File
+64-0 llvm/test/CodeGen/AMDGPU/bug-pk-f32-imm-fold.mir
+41-0 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+5-4 llvm/test/CodeGen/AMDGPU/packed-fp32.ll
+7-0 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+117-4 4 files

LLVM/project 1e3ea03 llvm/lib/Transforms/Vectorize VPlan.h VPlanRecipes.cpp, llvm/test/Transforms/LoopVectorize vplan-printing.ll

[VPlan] VPIRFlags kind for FCmp with predicate + fast-math flags (NFCI).

FCmp instructions have both a predicate and fast-math flags. Introduce a
new FCmp kind that combines both to model this correctly in the current
system.

This should be NFC modulo VPlan printing, which now includes the correct
fast-math flags.
Delta File
+55-16 llvm/lib/Transforms/Vectorize/VPlan.h
+26-17 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+2-2 llvm/test/Transforms/LoopVectorize/ARM/mve-icmpcost.ll
+1-1 llvm/test/Transforms/LoopVectorize/vplan-printing.ll
+84-36 4 files

LLVM/project 5cde345 runtimes CMakeLists.txt

[runtimes] Remove pstl from the list of supported runtimes (#168414)

The pstl top-level directory was removed, but we forgot to remove pstl
from the list of valid subdirectories.
Delta File
+1-1 runtimes/CMakeLists.txt
+1-1 1 files

LLVM/project c4898f3 clang/lib/CodeGen HLSLBufferLayoutBuilder.cpp CGHLSLRuntime.cpp, clang/test/CodeGenHLSL/resources cbuffer.hlsl

[HLSL][DirectX] Use a padding type for HLSL buffers. (#167404)

This change drops the use of the "Layout" type and instead uses explicit
padding throughout the compiler to represent types in HLSL buffers.

There are a few parts to this, though it's difficult to split them up as
they're very interdependent:

1. Refactor HLSLBufferLayoutBuilder to allow us to calculate the padding
of arbitrary types.
2. Teach Clang CodeGen to use HLSL specific paths for cbuffers when
generating aggregate copies, array accesses, and structure accesses.
3. Simplify DXILCBufferAccesses such that it directly replaces accesses
with dx.resource.getpointer rather than recalculating the layout.
4. Basic infrastructure for SPIR-V handling, but the implementation
itself will need work in follow ups.

Fixes several issues, including #138996, #144573, and #156084.
Resolves #147352.
Delta File
+27-281 llvm/lib/Target/DirectX/DXILCBufferAccess.cpp
+79-218 clang/lib/CodeGen/HLSLBufferLayoutBuilder.cpp
+266-23 clang/lib/CodeGen/CGHLSLRuntime.cpp
+164-85 clang/test/CodeGenHLSL/resources/cbuffer.hlsl
+0-216 llvm/test/CodeGen/DirectX/CBufferAccess/memcpy.ll
+29-97 llvm/test/CodeGen/DirectX/CBufferAccess/arrays.ll
+565-920 33 files not shown
+988-1,303 39 files

LLVM/project 31ec633 clang-tools-extra/clang-tidy/misc CoroutineHostileRAIICheck.cpp, clang-tools-extra/test/clang-tidy/checkers/misc coroutine-hostile-raii.cpp

[clang-tidy] Fix bugs in misc-coroutine-hostile-raii check (#167947)

1. Handle transformed awaitables for `AllowedCallees`, which generate
temporaries and weren't being handled by #167778.

2. Fix name mismatches in `storeOptions`.
Delta File
+9-5 clang-tools-extra/clang-tidy/misc/CoroutineHostileRAIICheck.cpp
+10-2 clang-tools-extra/test/clang-tidy/checkers/misc/coroutine-hostile-raii.cpp
+19-7 2 files

LLVM/project 8fce476 llvm CMakeLists.txt, llvm/include/llvm/Support/SystemZ zOSSupport.h

Implement a more seamless way to provide missing functions on z/OS (#167703)

In this PR I'm changing the way we provide the missing functions like
strnlen() on z/OS from the separate header file to a wrapper around the
system headers that declare these functions. This will be less
intrusive.

---------

Co-authored-by: Zibi Sarbinowski <zibi at ca.ibm.com>
Delta File
+82-0 llvm/lib/Support/zOSLibFunctions.cpp
+0-47 llvm/include/llvm/Support/SystemZ/zOSSupport.h
+35-0 llvm/include/llvm/Support/SystemZ/zos_wrappers/string.h
+3-4 llvm/lib/Support/Unix/Program.inc
+5-0 llvm/CMakeLists.txt
+0-1 llvm/tools/obj2yaml/macho2yaml.cpp
+125-52 11 files not shown
+126-62 17 files

LLVM/project 507f236 llvm/lib/Transforms/Vectorize VPlanUtils.h, llvm/test/Transforms/LoopVectorize induction-wrapflags.ll

[VPlan] Fix OpType-mismatch in getFlagsFromIndDesc (#168560)

Follow-up to a CSE OpType-mismatch crash reported due to ef023cae388d
(Reland [VPlan] Expand WidenInt inductions with nuw/nsw), setting the
OpType correctly when returning from getFlagsFromIndDesc.
Delta File
+70-0 llvm/test/Transforms/LoopVectorize/induction-wrapflags.ll
+4-1 llvm/lib/Transforms/Vectorize/VPlanUtils.h
+74-1 2 files

LLVM/project d3c2973 lldb/source/Plugins/Instruction/ARM64 EmulateInstructionARM64.cpp, lldb/unittests/UnwindAssembly/ARM64 TestArm64InstEmulation.cpp

[lldb/aarch64] Add STR/LDR instructions for FP registers to Emulator (#168187)

A function prologue can begin with a pre-index STR instruction for a
floating-point register. To construct an unwind plan from assembly
correctly, the instruction emulator must support such instructions.
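
As background for why the emulator has to model this form: a pre-index store such as `str d8, [sp, #-16]!` first updates the base register and then stores at the updated address, so it moves the stack pointer as a side effect. The sketch below contrasts that with post-index addressing. `MachineState` and both functions are hypothetical illustrations, not lldb's emulator API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, minimal machine state: only the base register of interest.
struct MachineState {
  uint64_t SP;
};

// Pre-index: the base register is updated first and the store uses the
// updated value. Returns the address the store writes to.
uint64_t storePreIndex(MachineState &S, int64_t Imm) {
  S.SP += static_cast<uint64_t>(Imm);
  return S.SP;
}

// Post-index: the store uses the old base, then the base is updated.
uint64_t storePostIndex(MachineState &S, int64_t Imm) {
  uint64_t Addr = S.SP;
  S.SP += static_cast<uint64_t>(Imm);
  return Addr;
}

// Small self-check: both forms leave SP at 0xFF0, but store to
// different addresses.
void checkIndexing() {
  MachineState Pre{0x1000};
  assert(storePreIndex(Pre, -16) == 0xFF0 && Pre.SP == 0xFF0);

  MachineState Post{0x1000};
  assert(storePostIndex(Post, -16) == 0x1000 && Post.SP == 0xFF0);
}
```

An unwinder that ignores the write-back would compute the frame address from a stale stack pointer for every later instruction in the prologue.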
Delta File
+108-0 lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
+32-11 lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
+140-11 2 files

LLVM/project 3e8dc4d clang/lib/Tooling/DependencyScanning DependencyScannerImpl.cpp

[clang][deps] NFC: Use qualified names for function definitions (#168586)

The compiler doesn't emit a diagnostic when the signature of a function
defined in a namespace gets out-of-sync with its declaration. Let's use
qualified names for function definitions instead of nesting them in a
namespace so that mismatches are diagnosed by the compiler rather than
by the (less understandable) linker.
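
The difference can be seen in a small example. The namespace and function names here are hypothetical and unrelated to the actual DependencyScannerImpl.cpp code.

```cpp
#include <cassert>
#include <string>

namespace deps {
// Declaration, as it would appear in a header.
int scan(const std::string &Path, bool Verbose);
} // namespace deps

// Qualified definition: the compiler requires this to match a declaration
// already present in namespace deps. If the parameter list drifted, this
// line would fail to compile.
int deps::scan(const std::string &Path, bool Verbose) {
  return static_cast<int>(Path.size()) + (Verbose ? 1 : 0);
}

// By contrast, a definition nested in the namespace with a drifted
// signature compiles fine, because it simply declares a new overload:
//
//   namespace deps {
//   int scan(const std::string &Path) { ... }
//   }
//
// The original two-parameter scan then stays undefined, and the mistake
// only surfaces as an unresolved symbol at link time.

// Small self-check of the qualified definition.
void checkScan() {
  assert(deps::scan("abc", true) == 4);
  assert(deps::scan("", false) == 0);
}
```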
Delta File
+23-21 clang/lib/Tooling/DependencyScanning/DependencyScannerImpl.cpp
+23-21 1 files

LLVM/project bb03359 llvm/lib/Target/AMDGPU SIFoldOperands.cpp

further name
Delta File
+4-3 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+4-3 1 files

LLVM/project a3e8aa6 llvm/lib/Target/AMDGPU SIFoldOperands.cpp

use `getEffectiveImmVal` instead of adding a new API function
Delta File
+1-6 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+1-6 1 files

LLVM/project 67f5f1f llvm/lib/Target/AMDGPU SIFoldOperands.cpp AMDGPU.td, llvm/test/CodeGen/AMDGPU bug-pk-f32-imm-fold.mir packed-fp32.ll

[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32-bit

On some targets, a packed f32 instruction can only read 32 bits from a scalar operand (SGPR or literal) and replicates those bits to both channels. In this case, we should not fold an immediate value if it can't be replicated from its lower 32 bits.
Delta File
+64-0 llvm/test/CodeGen/AMDGPU/bug-pk-f32-imm-fold.mir
+41-0 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+5-4 llvm/test/CodeGen/AMDGPU/packed-fp32.ll
+8-0 llvm/lib/Target/AMDGPU/AMDGPU.td
+6-0 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+124-4 5 files

LLVM/project d679d57 llvm/lib/Target/AMDGPU AMDGPU.td GCNSubtarget.h

remove target feature
Delta File
+0-8 llvm/lib/Target/AMDGPU/AMDGPU.td
+3-2 llvm/lib/Target/AMDGPU/GCNSubtarget.h
+1-1 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+4-11 3 files

LLVM/project 58aa511 llvm/lib/Target/AMDGPU SIFoldOperands.cpp

update helper function name
Delta File
+7-3 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+7-3 1 files

LLVM/project 1157a22 llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp

[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584)

For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.

I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
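
To see why the element width matters, consider one common way to lower a leading-zero count: smear the highest set bit downward, count the set bits, and subtract from the bit width. The sketch below assumes that scheme on a single element held in a `uint64_t`; the function names are hypothetical and this is not the LegalizerHelper code.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears the lowest set bit each iteration).
unsigned popcount64(uint64_t X) {
  unsigned N = 0;
  while (X) {
    X &= X - 1;
    ++N;
  }
  return N;
}

// Leading-zero count for a value that is Width bits wide:
// ctlz(x) = Width - popcount(x smeared to the right).
unsigned ctlzViaSmear(uint64_t X, unsigned Width) {
  assert(Width >= 1 && Width <= 64);
  uint64_t Mask = Width == 64 ? ~0ull : ((1ull << Width) - 1);
  X &= Mask;
  for (unsigned Shift = 1; Shift < Width; Shift *= 2)
    X |= X >> Shift;
  return Width - popcount64(X);
}

// Small self-check: an i8 element of a <4 x i8> vector holding 0x10.
void checkCtlzWidth() {
  assert(ctlzViaSmear(0x10, 8) == 3);   // element width: correct answer
  assert(ctlzViaSmear(0x10, 32) == 27); // total vector width: wrong for i8
  assert(ctlzViaSmear(0, 8) == 8);
}
```

Both the number of shift steps and the final subtraction depend on `Width`, so using the total vector size instead of the element size changes the result.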
Delta File
+3-3 llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+3-3 1 files

LLVM/project 56b1d42 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp

[CIR] Mark globals as constants (#168463)

We previously added support for marking GlobalOp operations as constant,
but the handling to actually do so was left mostly unimplemented. This
fills in the missing pieces.
Delta File
+30-30 clang/test/CIR/CodeGen/constant-inits.cpp
+20-0 clang/test/CIR/CodeGen/global-constant.c
+8-8 clang/test/CIR/CodeGen/record-zero-init-padding.c
+7-7 clang/test/CIR/CodeGen/vtt.cpp
+9-2 clang/lib/CIR/CodeGen/CIRGenModule.cpp
+1-3 clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+75-50 2 files not shown
+77-52 8 files

LLVM/project 69be8f6 openmp/runtime/unittests CMakeLists.txt

rename add_openmp_unittest argument
Delta File
+5-5 openmp/runtime/unittests/CMakeLists.txt
+5-5 1 files

LLVM/project e1bb50b utils/bazel/llvm-project-overlay/llvm BUILD.bazel

[bazel] fix #168212 (#168598)

Delta File
+4-0 utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+4-0 1 files

LLVM/project 8aca6c3 clang/test/Driver fsanitize-alloc-token.c

[AllocToken] Test compatibility with -fsanitize=kcfi,memtag (#168600)

Test that -fsanitize=alloc-token is compatible with kcfi and memtag, as
these should also be possible to combine.

NFC.
Delta File
+1-0 clang/test/Driver/fsanitize-alloc-token.c
+1-0 1 files

LLVM/project 4155cdc llvm/lib/Target/Mips Mips16ISelLowering.cpp

Mips: Remove manual libcall name search and table (#168595)

This should really check whether the libcall is known to be supported.
Mips doesn't currently configure its RuntimeLibcallsInfo
correctly and has none of the mips16 calls in it.
There isn't yet a way to add them without triggering conflicting
cases in tablegen, so keep parsing the raw name as it was before.
Delta File
+32-67 llvm/lib/Target/Mips/Mips16ISelLowering.cpp
+32-67 1 files

LLVM/project d770308 llvm/lib/Target/AMDGPU SIFoldOperands.cpp

update helper function name
Delta File
+7-3 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+7-3 1 files

LLVM/project a898af9 clang/test/Sema/AArch64 arm_sve_feature_dependent_sve___sme.c, llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll

Rebase

Created using spr 1.3.7
Delta File
+81,706-74,281 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+24,223-22,518 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+19,255-3,889 llvm/test/CodeGen/RISCV/atomic-rmw.ll
+19,470-0 clang/test/Sema/AArch64/arm_sve_feature_dependent_sve___sme.c
+10,536-7,642 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+10,015-7,219 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+165,205-115,549 21,592 files not shown
+1,778,562-657,016 21,598 files

LLVM/project 124fa5c llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Analysis/CostModel/AArch64 shuffle-other.ll

[AArch64] - Improve costing for Identity shuffles for SVE targets. (#165375)

Identity masks can be treated as free when scalable vectorization is
possible, making the check agnostic of the vectorization policy
(fixed or scalable). This allows aggressive vector combines for identity
shuffle masks.
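
For reference, an identity shuffle mask is one where every lane selects the element already at that position, so the shuffle is a no-op and costing it as free is reasonable. The sketch below checks that property on a plain vector of indices. Treating `-1` as a "don't care" lane follows the usual LLVM mask convention but is an assumption here, and the function is not the cost-model code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every defined lane I of the mask selects element I.
// A lane of -1 is treated as "don't care" (an assumption in this sketch).
bool isIdentityMask(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] == -1)
      continue;
    if (Mask[I] != static_cast<int>(I))
      return false;
  }
  return true;
}

// Small self-check.
void checkIdentityMask() {
  assert(isIdentityMask({0, 1, 2, 3}));
  assert(isIdentityMask({0, -1, 2, 3}));
  assert(!isIdentityMask({1, 0, 2, 3}));
}
```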
Delta File
+61-0 llvm/test/Transforms/VectorCombine/AArch64/identity-shuffle-sve.ll
+9-8 llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+12-0 llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll
+82-8 3 files

LLVM/project 164c72f mlir/lib/Conversion/XeGPUToXeVM XeGPUToXeVM.cpp, mlir/test/Conversion/XeGPUToXeVM loadstore_matrix.mlir

using implicit type converter for memref input
Delta File
+3-10 mlir/lib/Conversion/XeGPUToXeVM/XeGPUToXeVM.cpp
+2-3 mlir/test/Conversion/XeGPUToXeVM/loadstore_matrix.mlir
+5-13 2 files

LLVM/project 576e1af llvm/lib/Target/AMDGPU AMDGPUIGroupLP.cpp

[NFC][AMDGPU] IGLP: Fixes for unsigned int handling (#135090)

Fixes unsigned int underflows in
`MFMASmallGemmSingleWaveOpt::applyIGLPStrategy`.
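
As a generic illustration of this bug class (not the code from `applyIGLPStrategy`, which is not shown here): unsigned arithmetic wraps around instead of going negative, so a countdown loop written as `for (unsigned I = N - 1; I >= 0; --I)` never terminates and starts out of bounds when `N` is zero. One safe pattern is sketched below with a hypothetical helper.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Unsigned subtraction wraps: 0u - 1u is the largest unsigned value.
static_assert(0u - 1u == std::numeric_limits<unsigned>::max(),
              "unsigned arithmetic wraps around");

// Hypothetical helper: index of the last positive element, or -1 if none.
// The loop tests I before decrementing it, so an empty vector is handled
// without ever forming the wrapped value size() - 1.
int lastPositiveIndex(const std::vector<int> &V) {
  for (unsigned I = static_cast<unsigned>(V.size()); I-- > 0;)
    if (V[I] > 0)
      return static_cast<int>(I);
  return -1;
}

// Small self-check, including the empty case that trips the naive loop.
void checkLastPositiveIndex() {
  assert(lastPositiveIndex({}) == -1);
  assert(lastPositiveIndex({1, -2, 3, -4}) == 2);
  assert(lastPositiveIndex({-1, -2}) == -1);
}
```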
Delta File
+3-3 llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
+3-3 1 files