LLVM/project 3249623 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Use `post_order` directly as `vp_post_order_shallow` has been removed
Delta File
+2 -2 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2 -2 1 files

LLVM/project 3d4b452 clang/lib/CIR/CodeGen CIRGenCall.cpp CIRGenBuiltin.cpp, clang/test/CIR/CodeGenBuiltins setjmp.c

[CIR] Allow _setjmp and _setjmpex to fall through to library calls (#193021)

This change allows calls to _setjmp and _setjmpex to fall through the
builtin handling and be emitted as library calls when we are not
targeting OSMSVCRT. It also adds the code to set "returns_twice" on
functions matching an explicit list, as they are in classic codegen.
Delta File
+95 -0 clang/test/CIR/CodeGenBuiltins/setjmp.c
+15 -2 clang/lib/CIR/CodeGen/CIRGenCall.cpp
+7 -1 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+117 -3 3 files

LLVM/project de45a7c llvm/docs/CommandGuide dsymutil.rst, llvm/test/tools/dsymutil embed-resource.test cmdline.test

[dsymutil] Add --embed-resource to copy files into dSYM bundles. (#190663)

Add a new --embed-resource flag that copies files or directories into
the dSYM bundle's Contents/Resources/ directory during generation.

Projects often need to embed files such as LLDB Python scripts into dSYM
bundles. This is usually done with a script that runs after dSYM generation,
which may race with the stripping and code signing steps.

rdar://50633614
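
As a rough illustration of the copy step described above, here is a minimal
sketch using std::filesystem. It is not the dsymutil implementation: the
function name `embedResource` and the overwrite policy are assumptions made
for this example, and only the Contents/Resources/ destination comes from the
commit message.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not dsymutil's code): copies Resource, which may be a
// file or a directory, into <Bundle>/Contents/Resources/.
// Returns the error code of the first failing filesystem call, if any.
inline std::error_code embedResource(const fs::path &Bundle,
                                     const fs::path &Resource) {
  std::error_code EC;
  const fs::path Dest = Bundle / "Contents" / "Resources";

  // Create the destination directory; an existing directory is not an error.
  fs::create_directories(Dest, EC);
  if (EC)
    return EC;

  // Assumption: directories are copied recursively and existing files are
  // overwritten. The real tool may choose a different policy.
  fs::copy(Resource, Dest / Resource.filename(),
           fs::copy_options::recursive | fs::copy_options::overwrite_existing,
           EC);
  return EC;
}
```

Note that `Resource.filename()` is empty for a path with a trailing slash, so
a caller of this sketch would need to normalize such paths first.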
Delta File
+65 -0 llvm/test/tools/dsymutil/embed-resource.test
+62 -0 llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
+26 -0 llvm/tools/dsymutil/dsymutil.cpp
+9 -0 llvm/tools/dsymutil/Options.td
+7 -0 llvm/docs/CommandGuide/dsymutil.rst
+4 -0 llvm/test/tools/dsymutil/cmdline.test
+173 -0 2 files not shown
+179 -0 8 files

LLVM/project 356ab40 llvm/test/tools/llvm-nm special-syms-arm.test special-syms-csky.test, llvm/tools/llvm-nm llvm-nm.cpp

[llvm-nm] Drop STT_FILE/STT_SECTION from --special-syms (#192129)

The filter for SF_FormatSpecific symbols exempted all such symbols
for architectures having mapping symbols. This caused STT_FILE and
STT_SECTION symbols to appear with --special-syms on these targets
but not on x86_64. Narrow the exemption to only STT_NOTYPE symbols,
which are the actual mapping symbols ($d, $x, etc.).
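
The narrowed exemption can be pictured with a small sketch. It is a simplified
model under stated assumptions, not the code in llvm-nm.cpp: the `SymbolInfo`
struct and `shouldHide` function are hypothetical, and other options such as
--debug-syms are ignored. The numeric symbol types are the standard ELF
values (STT_NOTYPE = 0, STT_SECTION = 3, STT_FILE = 4).

```cpp
#include <cstdint>

// Standard ELF symbol type values.
enum ElfSymType : std::uint8_t { NoType = 0, Section = 3, File = 4 };

// Hypothetical, simplified view of a symbol (not llvm-nm's data structure).
struct SymbolInfo {
  bool FormatSpecific; // stands in for the SF_FormatSpecific flag
  std::uint8_t Type;   // ELF STT_* value
};

// Returns true if the symbol should be left out of the listing.
inline bool shouldHide(const SymbolInfo &Sym, bool ArchHasMappingSymbols,
                       bool SpecialSyms) {
  if (!Sym.FormatSpecific)
    return false;
  // Only untyped symbols on architectures with mapping symbols count as
  // mapping symbols; STT_FILE and STT_SECTION no longer qualify.
  const bool IsMappingSymbol = ArchHasMappingSymbols && Sym.Type == NoType;
  return !(SpecialSyms && IsMappingSymbol);
}
```

In this model, a format-specific STT_FILE symbol is hidden regardless of the
target, which matches the x86_64 behavior the commit message describes.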
Delta File
+14 -11 llvm/tools/llvm-nm/llvm-nm.cpp
+13 -5 llvm/test/tools/llvm-nm/special-syms-arm.test
+12 -4 llvm/test/tools/llvm-nm/special-syms-csky.test
+12 -4 llvm/test/tools/llvm-nm/special-syms-aarch64.test
+12 -4 llvm/test/tools/llvm-nm/special-syms-riscv.test
+63 -28 5 files

LLVM/project 944f382 clang/lib/CodeGen CGHLSLRuntime.cpp CGExpr.cpp, clang/lib/Sema SemaHLSL.cpp

[HLSL] Add codegen for accessing resource members of a struct (#187127)

Any expression that accesses a resource or resource array member of a global struct instance must be replaced during codegen by an access to the corresponding implicit global resource variable.

When codegen encounters a `MemberExpr` of a resource type, it traverses the AST to locate the parent struct declaration, building the expected global resource variable name along the way. If the parent declaration
is a non-static global struct instance, codegen searches its `HLSLAssociatedResourceDeclAttr` attributes to locate the matching global resource variable and then generates IR code to access the resource global in place of the member access.

Fixes #182989
Delta File
+146 -10 clang/lib/CodeGen/CGHLSLRuntime.cpp
+132 -0 clang/test/CodeGenHLSL/resources/resources-in-structs-inheritance.hlsl
+100 -0 clang/test/CodeGenHLSL/resources/resources-in-structs-array.hlsl
+83 -0 clang/test/CodeGenHLSL/resources/resources-in-structs.hlsl
+19 -24 clang/lib/Sema/SemaHLSL.cpp
+12 -4 clang/lib/CodeGen/CGExpr.cpp
+492 -38 3 files not shown
+525 -38 9 files

LLVM/project 43d4b7b llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Addressing code review comments
Delta File
+21 -22 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+21 -22 1 files

LLVM/project 99929c3 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Minor stylistic cleanup
Delta File
+12 -12 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+12 -12 1 files

LLVM/project a2bcfae llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Just use `vputils::onlyFirstLaneUsed`
Delta File
+1 -7 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+1 -7 1 files

LLVM/project a2907d6 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Extend post_order's lifetime
Delta File
+5 -3 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+5 -3 1 files

LLVM/project 32b0750 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

Use `reverse`/`IsaPred`
Delta File
+2 -4 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2 -4 1 files

LLVM/project 0779995 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h

Don't pass RecipeBuilder

The legacy path calls `setRecipe` on all processed recipes but only queries
`getRecipe` for memory operations, which the scalarization does not touch
because it runs after all memory recipes have been processed.
Delta File
+1 -3 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+1 -2 llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+1 -1 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+3 -6 3 files

LLVM/project 6271936 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanTransforms.h, llvm/test/Transforms/LoopVectorize/AArch64 binop-costs.ll

[VPlan] Scalarize to first-lane-only directly on VPlan

This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.

I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because that would require us to introduce
interleave groups much earlier in VPlan pipeline, and without that we
can't really `assert` this new decision matches the previous CM-based
one. And without those `assert`s it's really hard to ensure we properly
port all the previous logic.

As such, I decided to implement something much simpler that is
enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and takes precedence even for consecutive loads/stores.

Depends on https://github.com/llvm/llvm-project/pull/182592 but is stacked on
top of https://github.com/llvm/llvm-project/pull/182594 to enable linear
stacking for https://github.com/llvm/llvm-project/pull/182595.
Delta File
+65 -0 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+5 -5 llvm/test/Transforms/LoopVectorize/AArch64/binop-costs.ll
+6 -0 llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+4 -2 llvm/test/Transforms/LoopVectorize/X86/funclet.ll
+2 -2 llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
+3 -0 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+85 -9 3 files not shown
+88 -11 9 files

LLVM/project 725ecf6 llvm/lib/Target/NVPTX NVPTXAsmPrinter.cpp

[NVPTX] NVPTXAsmPrinter::bufferAggregateConstVec - append null constants instead of iterating (#192742)

Avoids an unnecessary loop with repeated ConstantInt::getNullValue calls
and fixes an MSVC unused variable warning.

Introduced by #183628
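
The general pattern of appending a run of identical values in one call, rather
than pushing them one at a time, can be sketched with a plain std::vector.
This is an analogy only: the function names are hypothetical, `int` zeros
stand in for null constants, and none of this is the NVPTXAsmPrinter code.

```cpp
#include <cstddef>
#include <vector>

// Loop form: one push_back per padding element. If the loop variable is not
// otherwise used (for example in a range-based loop), some compilers warn.
inline void padWithLoop(std::vector<int> &Elems, std::size_t Count) {
  for (std::size_t I = 0; I != Count; ++I)
    Elems.push_back(0);
}

// Bulk form: append Count copies of the zero value in a single call.
inline void padWithAppend(std::vector<int> &Elems, std::size_t Count) {
  Elems.insert(Elems.end(), Count, 0);
}
```

Both functions leave the vector in the same state; the bulk form simply
creates the padding value once and has no loop variable.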
Delta File
+2 -2 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+2 -2 1 files

LLVM/project a0ac2ed llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/AArch64/GlobalISel select-with-no-legality-check.mir knownbits-vector.mir

comments

Created using spr 1.3.7
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+0 -1,370 llvm/unittests/CodeGen/GlobalISel/KnownBitsVectorTest.cpp
+662 -662 llvm/test/CodeGen/AArch64/GlobalISel/select-with-no-legality-check.mir
+1,291 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-vector.mir
+34,241 -3,599 741 files not shown
+53,383 -16,854 747 files

LLVM/project 633a098 llvm/test/CodeGen/AArch64 sve-fixed-length-masked-expandloads.ll sve-streaming-mode-fixed-length-masked-expandload.ll, llvm/test/CodeGen/AArch64/GlobalISel select-with-no-legality-check.mir knownbits-vector.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+26,606 -0 llvm/test/CodeGen/AArch64/sve-fixed-length-masked-expandloads.ll
+4,078 -0 llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-expandload.ll
+1,604 -1,567 llvm/test/CodeGen/AArch64/clmul-scalable.ll
+0 -1,370 llvm/unittests/CodeGen/GlobalISel/KnownBitsVectorTest.cpp
+662 -662 llvm/test/CodeGen/AArch64/GlobalISel/select-with-no-legality-check.mir
+1,291 -0 llvm/test/CodeGen/AArch64/GlobalISel/knownbits-vector.mir
+34,241 -3,599 740 files not shown
+53,382 -16,838 746 files

LLVM/project 62d7aa0 clang/test/Modules DebugInfoSubmodules.c lsv-debuginfo.cpp

[clang] Disable some module tests on AIX (#193008)

PR https://github.com/llvm/llvm-project/pull/190062 makes two module
tests fail on AIX. Disable them on that platform until we get to the
bottom of it.
Delta File
+1 -0 clang/test/Modules/DebugInfoSubmodules.c
+1 -0 clang/test/Modules/lsv-debuginfo.cpp
+2 -0 2 files

LLVM/project 0dd5054 mlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/include/mlir/Dialect/GPU/Transforms IndexedAccessOpInterfaceImpl.h

[mlir][MemRef][GPU] Migrate GPU dialect ops to IndexedAccessOpInterface (#190380)

This commit migrates the handling of GPU dialect ops in
fold-memref-alias-ops from hard-coded support to the new
IndexedAccessOpInterface, which also adds expand_shape folding support
for those ops.

Once other memref-dialect passes are migrated to use this interface,
this will allow us to break the dependency between the memref and gpu
dialects.
Delta File
+119 -0 mlir/lib/Dialect/GPU/Transforms/IndexedAccessOpInterfaceImpl.cpp
+79 -0 mlir/test/Dialect/GPU/fold-memref-alias-ops.mlir
+0 -21 mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp
+21 -0 mlir/include/mlir/Dialect/GPU/Transforms/IndexedAccessOpInterfaceImpl.h
+8 -0 mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+3 -0 mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+230 -21 2 files not shown
+233 -21 8 files

LLVM/project 78b0a51 llvm/utils/TableGen/Common/GlobalISel GlobalISelMatchTable.cpp

[GlobalISel] Protect against Variable 'NumBucketedMatchers' set but not used Error/Warning. (#193000)

Fixes the build issue reported on #177158
Delta File
+6 -2 llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+6 -2 1 files

LLVM/project 68d22f4 flang/include/flang/Semantics tools.h, flang/lib/Lower OpenACC.cpp

[flang][cuda] Only apply the implicit data attribute on the component for use_device (#192146)

For interoperability between CUDA Fortran and OpenACC, the OpenACC
host_data use_device clause needs to implicitly add the DEVICE attribute to
the object symbol for the duration of the region. When the object was a
component, we were adding the attribute to the base symbol, which is not
what we want.
Update the handling to copy the base symbol with a new DerivedTypeScope
and set the attribute on the component. A new test checks that the
attribute is indeed applied to the component.
Delta File
+159 -37 flang/lib/Semantics/resolve-names.cpp
+64 -0 flang/test/Lower/OpenACC/acc-host-data-cuda-device.f90
+64 -0 flang/lib/Semantics/type.cpp
+30 -9 flang/lib/Semantics/expression.cpp
+9 -0 flang/include/flang/Semantics/tools.h
+2 -6 flang/lib/Lower/OpenACC.cpp
+328 -52 1 files not shown
+331 -52 7 files

LLVM/project a298e79 clang/docs UsersManual.rst, clang/lib/Driver/ToolChains Clang.cpp

Option to control signaling NaN support

This change implements the Clang command-line option `-fsignaling-nans`,
which is a counterpart of the GCC option with the same name. It allows a
user to control support for signaling NaNs. This option instructs the
compiler that signaling NaNs are to be treated according to IEEE 754:
they are quieted in arithmetic operations and raise the `Invalid`
floating-point exception. The opposite option, `-fno-signaling-nans`,
does the reverse: it indicates that signaling NaNs are handled
identically to quiet NaNs. If neither option is specified, no
signaling NaN support is assumed, except for functions that have the
`strictfp` attribute.

At the IR level, signaling NaN support is represented by the function
attribute "signaling-nans". It is set by Clang when it generates code in
cases when signaling NaNs are supported. If the target architecture does
not support signaling NaNs, Clang does not set this attribute.

The primary motivation for this change is the optimization of strictfp

    [11 lines not shown]
Delta File
+187 -2 llvm/test/Transforms/InstSimplify/strictfp-fsub.ll
+111 -2 llvm/test/Transforms/InstSimplify/strictfp-fadd.ll
+30 -0 clang/docs/UsersManual.rst
+18 -1 clang/test/Driver/clang_f_opts.c
+19 -0 llvm/docs/LangRef.rst
+16 -1 clang/lib/Driver/ToolChains/Clang.cpp
+381 -6 19 files not shown
+460 -22 25 files

LLVM/project b549d96 llvm/lib/Target/AArch64 AArch64PointerAuth.cpp

[llvm][ptrauth] Refactor an early return into authenticateLR. NFC (#190415)
Delta File
+24 -23 llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
+24 -23 1 files

LLVM/project 918b8f5 mlir/lib/Transforms CSE.cpp

[mlir][CSE] Pre-process trivially dead ops (NFC) (#191135)

This PR avoids calling `simplifyRegion` on dead region ops.
`simplifyRegion` attempts to perform CSE optimization on the ops within
the region, which is unnecessary for ops that are already trivially
dead.
Delta File
+10 -8 mlir/lib/Transforms/CSE.cpp
+10 -8 1 files

LLVM/project 871c9bb mlir/lib/Dialect/SparseTensor/IR SparseTensorDialect.cpp, mlir/test/Dialect/SparseTensor encoding_with_symbols.mlir

[mlir][SparseTensor] add `numSymbols` information to simplify affine expressions (#191649)

Previously, the `translateShape` function hard-coded the `numSymbols`
parameter to 0. This makes the affine expression fail when the sparse
tensor encoding has symbols.

This PR fixes the issue by extracting and passing the `numSymbols`
information during translation. A regression test has also been added to
ensure this behavior remains supported.

Closes #191209
Delta File
+26 -0 mlir/test/Dialect/SparseTensor/encoding_with_symbols.mlir
+7 -2 mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
+33 -2 2 files

LLVM/project 00a70e8 llvm/lib/Target/AMDGPU AMDGPUMCResourceInfo.cpp AMDGPUResourceUsageAnalysis.cpp, llvm/test/CodeGen/AMDGPU object-linking-local-resources.ll

[AMDGPU] Report only local per-function resource usage when object linking is enabled (#192594)

With object linking the linker aggregates resource usage across TUs, so
compile-time pessimism and call-graph propagation duplicate the linker's
work or pollute its inputs.

In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building
call-graph max/or expressions.
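
The difference between the two modes can be modeled with a toy call graph.
The sketch below is an illustration only: `FuncInfo`, `propagatedVGPRs`, and
`localVGPRs` are hypothetical names, a single register count stands in for
all resource symbols, and the call graph is assumed to be acyclic. It does
not reflect the MC expressions the backend actually builds.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-function record (not the AMDGPU backend's types).
struct FuncInfo {
  unsigned LocalVGPRs;              // registers used by the body itself
  std::vector<std::size_t> Callees; // indices of directly called functions
};

// Call-graph propagation: the reported value is the maximum over the
// function and everything reachable from it. Assumes no recursion.
inline unsigned propagatedVGPRs(const std::vector<FuncInfo> &Funcs,
                                std::size_t F) {
  unsigned Max = Funcs[F].LocalVGPRs;
  for (std::size_t Callee : Funcs[F].Callees)
    Max = std::max(Max, propagatedVGPRs(Funcs, Callee));
  return Max;
}

// Object-linking mode in this model: report only the local constant and
// leave aggregation across functions to the linker.
inline unsigned localVGPRs(const std::vector<FuncInfo> &Funcs, std::size_t F) {
  return Funcs[F].LocalVGPRs;
}
```

For a chain where function 0 uses 8 registers and calls a function that uses
32, the propagated value for function 0 is 32 while the local value stays 8.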
Delta File
+109 -0 llvm/test/CodeGen/AMDGPU/object-linking-local-resources.ll
+26 -8 llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
+10 -1 llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+4 -0 llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h
+149 -9 4 files

LLVM/project 4c2834d flang/module cudadevice.f90

[flang] add missing accel intrinsics (#193020)

Add the missing `__float2ull_*` intrinsic interfaces.

Co-authored-by: Yebin Chon <ychon at nvidia.com>
Delta File
+28 -0 flang/module/cudadevice.f90
+28 -0 1 files

LLVM/project 093d807 llvm/lib/CodeGen MachineBlockHashInfo.cpp

[CodeGen] Fix non-determinism in MachineBlockHashInfo (#192826)

The previous implementation used `hash_value(MachineOperand)`, which
is not guaranteed to be stable across different executions because it
hashes pointers for certain operand types (like MBB, GlobalAddress,
etc).

Use the existing stableHashValue, which does not have this problem.

The rest of the file should behave the same, but the change may break profile
compatibility.
Changing the behavior for operands is not an issue, as the existing hash is a
low-quality RNG.

The code does not have test coverage; this will be fixed in #192911.

Fixes #173933.
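
To illustrate why hashing a pointer is unstable while hashing content is
stable, here is a small standalone sketch. It is not LLVM's `hash_value` or
`stableHashValue`: the `Operand` struct and both functions are hypothetical,
and FNV-1a is used only as a convenient content hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for an operand that refers to a named entity.
struct Operand {
  const std::string *Target; // the address can differ from run to run
};

// Unstable: hashes the address, which depends on allocation order and ASLR.
inline std::size_t pointerHash(const Operand &Op) {
  return std::hash<const void *>{}(Op.Target);
}

// Stable: 64-bit FNV-1a over the bytes of the referenced name, so equal
// names hash equally no matter where they live in memory.
inline std::uint64_t stableHash(const Operand &Op) {
  std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : *Op.Target) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}
```

Two operands that point at distinct strings with the same contents get the
same `stableHash`, whereas `pointerHash` gives no such guarantee.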
Delta File
+4 -2 llvm/lib/CodeGen/MachineBlockHashInfo.cpp
+4 -2 1 files

LLVM/project ce4ebde llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for merge-like opcodes

Move RegbankLegalize handling for G_BUILD_VECTOR, G_MERGE_VALUES and
G_CONCAT_VECTORS from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules
by implementing rules for all supported types.
Delta File
+0 -22 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+10 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+0 -10 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+0 -3 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+10 -35 4 files

LLVM/project b2b27b8 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp AMDGPURegBankLegalize.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for G_BITCAST

Move RegbankLegalize handling for G_BITCAST from AMDGPURegBankLegalize to
AMDGPURegBankLegalizeRules by implementing rules for all supported types.
Delta File
+4 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1 -1 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+5 -1 2 files

LLVM/project ede881b llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp AMDGPURegBankLegalize.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for undef and constants

Move RegbankLegalize handling for G_IMPLICIT_DEF, G_CONSTANT and G_FCONSTANT
from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules by implementing
rules for all supported types.
Delta File
+17 -5 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+0 -12 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+17 -17 2 files

LLVM/project e5bce12 clang/docs ReleaseNotes.rst, clang/include/clang/Basic AttrDocs.td

[Clang][AMDGPU] Deprecate `amdgpu-num-vgpr` and `amdgpu-num-sgpr`

For now, we only emit a warning. The attributes still take effect for
regular compilation, but they are ignored when object linking is enabled.
Delta File
+13 -5 clang/docs/ReleaseNotes.rst
+16 -0 clang/test/SemaOpenCL/amdgpu-num-sgpr-vgpr-deprecated.cl
+6 -4 llvm/docs/AMDGPUUsage.rst
+5 -1 clang/include/clang/Basic/AttrDocs.td
+5 -0 llvm/docs/ReleaseNotes.md
+4 -0 clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+49 -10 6 files not shown
+61 -14 12 files