LLVM/project 00a70e8llvm/lib/Target/AMDGPU AMDGPUMCResourceInfo.cpp AMDGPUResourceUsageAnalysis.cpp, llvm/test/CodeGen/AMDGPU object-linking-local-resources.ll

[AMDGPU] Report only local per-function resource usage when object linking is enabled (#192594)

With object linking the linker aggregates resource usage across TUs, so
compile-time pessimism and call-graph propagation duplicate the linker's
work or pollute its inputs.

In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building
call-graph max/or expressions.
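
As a rough illustration of the difference between the two modes, here is a minimal sketch. The `FuncInfo` type and both functions are hypothetical and do not mirror the actual `AMDGPUMCResourceInfo` code; the sketch also assumes an acyclic call graph.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-function record; not an LLVM data structure.
struct FuncInfo {
  unsigned LocalVGPRs;                    // registers used by the function body itself
  std::vector<const FuncInfo *> Callees;  // direct callees (assumed acyclic)
};

// Call-graph propagation: the maximum over the function and everything it can reach.
inline unsigned propagatedVGPRs(const FuncInfo &F) {
  unsigned Max = F.LocalVGPRs;
  for (const FuncInfo *Callee : F.Callees)
    Max = std::max(Max, propagatedVGPRs(*Callee));
  return Max;
}

// With object linking, only the local constant is reported and the
// aggregation across functions is left to the linker.
inline unsigned reportedVGPRs(const FuncInfo &F, bool ObjectLinking) {
  return ObjectLinking ? F.LocalVGPRs : propagatedVGPRs(F);
}

inline void checkReportedVGPRs() {
  FuncInfo Leaf{32, {}};
  FuncInfo Caller{8, {&Leaf}};
  assert(reportedVGPRs(Caller, /*ObjectLinking=*/false) == 32);
  assert(reportedVGPRs(Caller, /*ObjectLinking=*/true) == 8);
}
```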
DeltaFile
+109-0llvm/test/CodeGen/AMDGPU/object-linking-local-resources.ll
+26-8llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
+10-1llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+4-0llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h
+149-94 files

LLVM/project 4c2834dflang/module cudadevice.f90

[flang] add missing accel intrinsics (#193020)

Add the missing `__float2ull_*` intrinsic interfaces.

Co-authored-by: Yebin Chon <ychon at nvidia.com>
DeltaFile
+28-0flang/module/cudadevice.f90
+28-01 files

LLVM/project 093d807llvm/lib/CodeGen MachineBlockHashInfo.cpp

[CodeGen] Fix non-determinism in MachineBlockHashInfo (#192826)

The previous implementation used `hash_value(MachineOperand)`, which
is not guaranteed to be stable across different executions because it
hashes pointers for certain operand types (like MBB, GlobalAddress,
etc).

Use the existing stableHashValue, which does not have this problem.

The rest of the file should be the same, but this may break profile
compatibility.
Changing the behavior for operands is not an issue, as the existing hash
is effectively a low-quality RNG.

The code does not have test coverage; this will be fixed in #192911.

Fixes #173933.
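
To illustrate why hashing a pointer is unstable while hashing content is not, here is a minimal sketch. `BlockRef`, `pointerBasedHash` and `contentBasedHash` are hypothetical names and are not the LLVM `hash_value` or `stableHashValue` implementations; FNV-1a is used only as a convenient stable hash.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for an operand that refers to a basic block.
struct BlockRef {
  const std::string *Name; // the address can differ between runs
  unsigned Number;         // stable numbering
};

// Unstable: the address feeds the hash, so the value may change per execution.
inline std::uint64_t pointerBasedHash(const BlockRef &B) {
  return std::hash<const void *>{}(B.Name);
}

// Stable: FNV-1a over content that is identical in every execution.
inline std::uint64_t contentBasedHash(const BlockRef &B) {
  std::uint64_t H = 14695981039346656037ull; // FNV offset basis
  auto Mix = [&H](unsigned char Byte) { H = (H ^ Byte) * 1099511628211ull; };
  for (unsigned char C : *B.Name)
    Mix(C);
  for (int I = 0; I < 4; ++I)
    Mix(static_cast<unsigned char>(B.Number >> (8 * I)));
  return H;
}

inline void checkContentHash() {
  // Two separate allocations with equal content hash identically.
  std::string A = "bb.1", B = "bb.1";
  assert(contentBasedHash({&A, 1}) == contentBasedHash({&B, 1}));
}
```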
DeltaFile
+4-2llvm/lib/CodeGen/MachineBlockHashInfo.cpp
+4-21 files

LLVM/project ce4ebdellvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for merge-like opcodes

Move RegbankLegalize handling for G_BUILD_VECTOR, G_MERGE_VALUES and
G_CONCAT_VECTORS from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules
by implementing rules for all supported types.
DeltaFile
+0-22llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+10-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+0-10llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+0-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+10-354 files

LLVM/project b2b27b8llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp AMDGPURegBankLegalize.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for G_BITCAST

Move RegbankLegalize handling for G_BITCAST from AMDGPURegBankLegalize to
AMDGPURegBankLegalizeRules by implementing rules for all supported types.
DeltaFile
+4-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1-1llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+5-12 files

LLVM/project ede881bllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp AMDGPURegBankLegalize.cpp

AMDGPU/GlobalISel: RegbankLegalize rules for undef and constants

Move RegbankLegalize handling for G_IMPLICIT_DEF, G_CONSTANT and G_FCONSTANT
from AMDGPURegBankLegalize to AMDGPURegBankLegalizeRules by implementing
rules for all supported types.
DeltaFile
+17-5llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+0-12llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+17-172 files

LLVM/project e5bce12clang/docs ReleaseNotes.rst, clang/include/clang/Basic AttrDocs.td

[Clang][AMDGPU] Deprecate `amdgpu-num-vgpr` and `amdgpu-num-sgpr`

For now, we only emit a warning. The attributes still take effect in
regular compilation, but with object linking they are simply ignored.
DeltaFile
+13-5clang/docs/ReleaseNotes.rst
+16-0clang/test/SemaOpenCL/amdgpu-num-sgpr-vgpr-deprecated.cl
+6-4llvm/docs/AMDGPUUsage.rst
+5-1clang/include/clang/Basic/AttrDocs.td
+5-0llvm/docs/ReleaseNotes.md
+4-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+49-106 files not shown
+61-1412 files

LLVM/project 357b8e8clang/lib/CIR/CodeGen CIRGenExprCXX.cpp CIRGenFunction.cpp, clang/test/CIR/CodeGen paren-list-agg-init.cpp

[CIR] Implement emitNewArrayInit for constant and strings (#192666)

This patch further fleshes out emitNewArrayInit for constant and
string variables. Implementation-wise, this is pretty much the same as
classic-codegen, but it required a few differences. First, our use
of cir.copy instead of a memcpy call means we had to 'lift' a
dyn_allocated pointer type to the array type. Second, we had to make
some changes to make sure that 'empty' extra init was skipped in a place
where we didn't skip it before.

In order to test this, I found 2 tests from classic-codegen that I
pulled in nearly verbatim. 'Check' lines from paren-list-agg-init.cpp
are converted to LLVM lines with slight relaxation, mostly to make up
for cases where CIR lowering introduces extra branches or GEPs on
conversion changes.

new-array-init.cpp's 'Check' lines were particularly bad/not detailed,
so I wrote new ones.

One test was commented out, as it requires the rest of emitNewArrayInit
to be implemented.
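
For context, the kind of source construct involved is an array new-expression with an initializer list. A minimal sketch covering only the constant-initializer case (string initializers are not shown); the helper name `makeInts` is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Allocates N ints (N must be at least 3). The first three elements come
// from the initializer list; the remaining ones are zero-initialized.
inline std::unique_ptr<int[]> makeInts(std::size_t N) {
  return std::unique_ptr<int[]>(new int[N]{1, 2, 3});
}

inline void checkMakeInts() {
  std::unique_ptr<int[]> P = makeInts(5);
  assert(P[0] == 1 && P[1] == 2 && P[2] == 3);
  assert(P[3] == 0 && P[4] == 0);
}
```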
DeltaFile
+884-0clang/test/CIR/CodeGen/paren-list-agg-init.cpp
+651-0clang/test/CIR/CodeGenCXX/new-array-init.cpp
+107-16clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+5-0clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+1,647-164 files

LLVM/project 3df82f5lldb/packages/Python/lldbsuite/test decorators.py, lldb/test/API/windows/conpty TestConPTY.py

[lldb][windows] only run basic ConPTY tests on older Windows versions (#192984)
DeltaFile
+24-2lldb/test/API/windows/conpty/TestConPTY.py
+14-0lldb/packages/Python/lldbsuite/test/decorators.py
+38-22 files

LLVM/project 4a21ae9clang/lib/Sema SemaOpenACC.cpp, clang/test/SemaOpenACC compute-construct-private-clause.cpp

[OpenACC] Make sure array-section diag normalizes width for diag (#193013)

The below issue exposed that the comparison of the values was not
properly adjusting the width of the constant values before comparison.
Thus, when we did an addition of the two, it caused an assert in APSInt.

This patch makes sure that the 'adjust width + sign' branch is taken if
the sign or width don't match. Previously we only did this if it was a
sign mismatch.

Fixes: #192783
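
The idea of normalizing both operands when either the sign or the width differs can be sketched as follows. `SizedInt` is a hypothetical model, not `llvm::APSInt`, and the widening rule (one extra bit when an unsigned value becomes signed) is an assumption made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical constant with an explicit bit width and signedness.
struct SizedInt {
  std::int64_t Value;
  unsigned Width;
  bool IsSigned;
};

// Bring both operands to a common width and signedness. The adjustment
// runs when the sign OR the width differs, not only on a sign mismatch.
inline void normalize(SizedInt &A, SizedInt &B) {
  if (A.IsSigned == B.IsSigned && A.Width == B.Width)
    return;
  unsigned W = std::max(A.Width + (A.IsSigned ? 0u : 1u),
                        B.Width + (B.IsSigned ? 0u : 1u));
  A.Width = B.Width = W;
  A.IsSigned = B.IsSigned = true;
}

inline SizedInt add(SizedInt A, SizedInt B) {
  normalize(A, B);
  // Models the invariant whose violation caused the assertion failure.
  assert(A.Width == B.Width && A.IsSigned == B.IsSigned);
  return {A.Value + B.Value, A.Width, A.IsSigned};
}
```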
DeltaFile
+17-0clang/test/SemaOpenACC/compute-construct-private-clause.cpp
+4-2clang/lib/Sema/SemaOpenACC.cpp
+21-22 files

LLVM/project 91cba19clang/lib/CIR/CodeGen CIRGenItaniumCXXABI.cpp, clang/test/CIR/CodeGen dynamic-cast.cpp

[CIR] Fix dynamic cast of const types (#192751)

When a dynamic cast was performed using const-qualified values, we were
generating a reference to const-qualified typeinfo but never emitting
such const-qualified typeinfo, leading to an undefined reference at link
time.

This change fixes that by stripping the type qualifiers before
processing the cast. This matches the behavior of classic codegen in
ItaniumCXXABI::emitDynamicCastCall.
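
A minimal sketch of the kind of source that exercises this path; the class and function names are made up for illustration. The cast target is const-qualified, but the type information consulted at run time belongs to the unqualified class.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int Tag = 42;
};

// dynamic_cast on const-qualified pointers.
inline const Derived *asDerived(const Base *B) {
  return dynamic_cast<const Derived *>(B);
}

inline void checkConstCast() {
  Derived D;
  Base B;
  assert(asDerived(&D) == &D);
  assert(asDerived(&B) == nullptr);
  // typeid ignores top-level cv-qualifiers, so both name the same type_info.
  assert(typeid(const Derived) == typeid(Derived));
}
```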
DeltaFile
+38-0clang/test/CIR/CodeGen/dynamic-cast.cpp
+2-2clang/lib/CIR/CodeGen/CIRGenItaniumCXXABI.cpp
+40-22 files

LLVM/project 2c8c2bdclang/test/CodeGenHLSL/builtins mad.hlsl, clang/test/CodeGenHLSL/convergence for.hlsl while.hlsl

[HLSL][DirectX] Emit convergence control tokens when targeting DirectX (#188792)

This PR allows codegen to generate convergence control tokens. This
allows for a more accurate description of convergence behaviour, so that
invalid control flow graph transforms are prevented and valid ones are
allowed. As noted, the use of convergence control tokens is the ideal
norm, and this follows that by enabling them for `DirectX`.

This is done now in order to prevent a convergent exit condition of a
loop from being illegally moved across control flow. Test cases for
this are explicitly added.

Please see the individual commits for logically similar chunks.
Unfortunately, it is tricky to stage this in smaller individual commits.

Resolves https://github.com/llvm/llvm-project/issues/180621.

https://github.com/llvm/llvm-project/pull/188537 is a prerequisite for
this to pass the HLSL offload suite tests.

Assisted by: Github Copilot
DeltaFile
+72-0llvm/test/Transforms/LoopRotate/convergent-controlled.ll
+70-0llvm/test/Transforms/IndVarSimplify/convergent-controlled-loop.ll
+70-0llvm/test/Transforms/SimpleLoopUnswitch/convergent-controlled.ll
+28-26clang/test/CodeGenHLSL/convergence/for.hlsl
+24-24clang/test/CodeGenHLSL/builtins/mad.hlsl
+21-19clang/test/CodeGenHLSL/convergence/while.hlsl
+285-6967 files not shown
+697-16373 files

LLVM/project d534b56llvm/lib/Target/AMDGPU AMDGPUMCResourceInfo.cpp AMDGPUResourceUsageAnalysis.cpp, llvm/test/CodeGen/AMDGPU object-linking-local-resources.ll

[AMDGPU] Report only local per-function resource usage when object linking is enabled

With object linking the linker aggregates resource usage across TUs, so
compile-time pessimism and call-graph propagation duplicate the linker's work or
pollute its inputs.

In this mode, skip the per-callsite conservative bumps in
`AMDGPUResourceUsageAnalysis` and assign each resource symbol in
`AMDGPUMCResourceInfo` a concrete local constant instead of building call-graph
`max`/`or` expressions.
DeltaFile
+109-0llvm/test/CodeGen/AMDGPU/object-linking-local-resources.ll
+26-8llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.cpp
+10-1llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+4-0llvm/lib/Target/AMDGPU/AMDGPUMCResourceInfo.h
+149-94 files

LLVM/project 8f89591llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.global.prefetch.ll llvm.amdgcn.flat.prefetch.ll

AMDGPU/GlobalISel: RegBankLegalize rules for flat/global prefetch (#192764)
DeltaFile
+4-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1-1llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.prefetch.ll
+1-1llvm/test/CodeGen/AMDGPU/llvm.amdgcn.flat.prefetch.ll
+6-23 files

LLVM/project 861ebcaflang/lib/Evaluate fold-integer.cpp, flang/lib/Semantics resolve-names.cpp

Adding tests and intrinsic piece required for relationals.
DeltaFile
+117-0flang/test/Semantics/enumeration-type-relational.f90
+84-0flang/test/Semantics/enumeration-type-declarations.f90
+28-0flang/lib/Evaluate/fold-integer.cpp
+1-4flang/lib/Semantics/resolve-names.cpp
+230-44 files

LLVM/project 133eb8cclang/test/CodeGen/AArch64 v9.7a-neon-mmla-intrinsics.c, clang/test/CodeGen/AArch64/sve-intrinsics acle_sve_mmla-f16.c acle_sve_mmla-bf16.c

[AArch64][clang][llvm] Add ACLE Armv9.7 MMLA intrinsics

Implement new ACLE matrix multiply-accumulate intrinsics for Armv9.7:

```c
  // 16-bit floating-point matrix multiply-accumulate.
  // Only if __ARM_FEATURE_SVE_B16MM
  // Variant also available for _f16 if (__ARM_FEATURE_SVE2p2 && __ARM_FEATURE_F16MM).
  svbfloat16_t svmmla[_bf16](svbfloat16_t zda, svbfloat16_t zn, svbfloat16_t zm);

  // Half-precision matrix multiply accumulating to single-precision
  // instruction from Armv9.7-A. Requires the +f16f32mm architecture extension.
  float32x4_t vmmlaq_f32_f16(float32x4_t r, float16x8_t a, float16x8_t b);

  // Non-widening half-precision matrix multiply instruction. Requires the
  // +f16mm architecture extension.
  float16x8_t vmmlaq_f16_f16(float16x8_t r, float16x8_t a, float16x8_t b);
```
DeltaFile
+47-0clang/test/CodeGen/AArch64/v9.7a-neon-mmla-intrinsics.c
+32-0clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_mmla-f16.c
+32-0clang/test/Sema/AArch64/arm_sve_non_streaming_only_sve_AND_sve2p2_AND_f16mm.c
+32-0clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_mmla-bf16.c
+32-0clang/test/Sema/AArch64/arm_sve_non_streaming_only_sve_AND_sve-b16mm.c
+16-0llvm/test/CodeGen/AArch64/sve-fmmla-f16.ll
+191-014 files not shown
+297-1220 files

LLVM/project 58fb00fllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.unreachable.ll

AMDGPU/GlobalISel: RegBankLegalize rules for amdgcn_unreachable (#192762)
DeltaFile
+2-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.unreachable.ll
+1-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+3-02 files

LLVM/project dcbb5c8llvm/utils/TableGen/Common/GlobalISel GlobalISelMatchTable.cpp

[GlobalISel] Fix -Wunused-variable (#193009)

These variables are only used in assertions and set outside of the
variable definition, so mark them [[maybe_unused]].
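
A minimal sketch of the pattern, with a made-up function for illustration: the variable is assigned after its declaration and read only inside `assert`, so release builds (where `assert` compiles away) would otherwise warn.

```cpp
#include <cassert>
#include <vector>

inline int sumAll(const std::vector<int> &Values) {
  // Only read inside assert(); the attribute silences the warning
  // when NDEBUG removes the assertion.
  [[maybe_unused]] bool AllNonNegative;
  AllNonNegative = true;
  int Sum = 0;
  for (int V : Values) {
    if (V < 0)
      AllNonNegative = false;
    Sum += V;
  }
  assert(AllNonNegative && "expected non-negative input");
  return Sum;
}
```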
DeltaFile
+2-2llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp
+2-21 files

LLVM/project 74049f6llvm/lib/Transforms/Vectorize VectorCombine.cpp, llvm/test/Transforms/VectorCombine/AArch64 shuffletoidentity.ll

[VectorCombine] Fix transitive Uses in foldShuffleToIdentity (#188989)

The Uses in foldShuffleToIdentity is intended to detect where an operand
is used to distinguish between splats, identities and concats of the
same value. When looking through multiple unsimplified shuffles the same
Use could be both a splat and an identity, though. This patch changes the
Use to a Value and an original Use, so that even if we are looking
through multiple vectors we recognise the splat vs identity vs concat of
each use correctly.

Fixes #180338

(cherry picked from commit fd40c606652137706bc336ef80ed1814ab3d3680)
DeltaFile
+82-73llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+131-0llvm/test/Transforms/VectorCombine/AArch64/shuffletoidentity.ll
+4-2llvm/test/Transforms/VectorCombine/LoongArch/shuffle-identity-miscompile.ll
+217-753 files

LLVM/project a01a450llvm/test/Transforms/VectorCombine/LoongArch shuffle-identity-miscompile.ll lit.local.cfg

[NFC][test] Precommit test for pr188989 (#188667)

Precommit test for #188989.

This test case covers a scenario in the vector combine
foldShuffleToIdentity function where incorrect folding was caused when
different shuffle sequences shared the same initial Use *. This issue
may be due to cost model differences and currently reproduces only on
LoongArch for this test case.

(cherry picked from commit 3e015b89e8bd9c71f6bb1cf38747d2862f5d5a3d)
DeltaFile
+22-0llvm/test/Transforms/VectorCombine/LoongArch/shuffle-identity-miscompile.ll
+4-0llvm/test/Transforms/VectorCombine/LoongArch/lit.local.cfg
+26-02 files

LLVM/project 9f29c1ellvm/lib/Target/X86 X86ISelLoweringCall.cpp, llvm/test/CodeGen/X86 musttail-struct.ll

[X86] Fix missing ByValTemporaries update in CopyViaTemp path for musttail calls (#190540)

This fixes a miscompilation in musttail calls with byval arguments on
X86.

In the CopyViaTemp path, a temporary stack object is created and the
argument is copied into it.
However, the temporary is not recorded in ByValTemporaries,
so the final lowering phase does not emit the copy to the real outgoing
argument slot.

As a result, the callee may read incorrect values from the stack.

Fix this by recording the temporary in ByValTemporaries so that the
final lowering step correctly copies the argument to the expected stack
location.

Reproducer: https://github.com/llvm/llvm-project/issues/190429
(cherry picked from commit abd502a44e5ef19a302d943eeb017c29124b96e9)
DeltaFile
+45-13llvm/test/CodeGen/X86/musttail-struct.ll
+1-0llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+46-132 files

LLVM/project 6796efelibsycl/include/sycl/__impl usm_functions.hpp

[libsycl] Fix _LIBSYCL_EXPORT placement (#192243)

Current placement of _LIBSYCL_EXPORT in usm_functions.hpp causes
compilation errors on Windows and is not aligned with other header
files.
DeltaFile
+10-10libsycl/include/sycl/__impl/usm_functions.hpp
+10-101 files

LLVM/project 2f0b2ffruntimes CMakeLists.txt, runtimes/cmake config-Fortran.cmake

Partial inlining of config-Fortran
DeltaFile
+34-5runtimes/CMakeLists.txt
+0-18runtimes/cmake/config-Fortran.cmake
+34-232 files

LLVM/project fd1b872llvm/include/llvm/Analysis ScalarEvolutionPatternMatch.h, llvm/include/llvm/IR PatternMatch.h

[PatternMatchHelpers] Factor deferred and bind matchers (NFC) (#191373)

Factor bind_ty and deferredval_ty as match_bind and match_deferred from
existing PatternMatch implementations into PatternMatchHelpers.
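
For readers unfamiliar with the two matcher kinds, here is a miniature model. `Node`, `MatchBind`, `MatchDeferred` and `mAdd` are hypothetical names for this sketch and are not the `match_bind` / `match_deferred` definitions from the patch. A bind matcher captures whatever it is matched against; a deferred matcher compares against a variable whose value is read at match time, so it can refer to something bound earlier in the same pattern.

```cpp
#include <cassert>

// Hypothetical expression node: '+' for an add, 'v' for a leaf value.
struct Node {
  char Op;
  const Node *LHS = nullptr;
  const Node *RHS = nullptr;
};

// Binds the matched node to a caller-provided variable.
struct MatchBind {
  const Node *&Var;
  bool match(const Node *N) const {
    Var = N;
    return N != nullptr;
  }
};

// Matches only the node currently stored in the referenced variable.
struct MatchDeferred {
  const Node *const &Var;
  bool match(const Node *N) const { return N == Var; }
};

template <typename L, typename R> struct MatchAdd {
  L Left;
  R Right;
  bool match(const Node *N) const {
    return N && N->Op == '+' && Left.match(N->LHS) && Right.match(N->RHS);
  }
};

template <typename L, typename R> MatchAdd<L, R> mAdd(L Left, R Right) {
  return {Left, Right};
}

inline void checkDeferred() {
  const Node A{'v'}, B{'v'};
  const Node Same{'+', &A, &A}, Diff{'+', &A, &B};
  const Node *X = nullptr;
  assert(mAdd(MatchBind{X}, MatchDeferred{X}).match(&Same)); // X + X
  assert(!mAdd(MatchBind{X}, MatchDeferred{X}).match(&Diff));
}
```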
DeltaFile
+37-84llvm/include/llvm/IR/PatternMatch.h
+20-32llvm/include/llvm/Analysis/ScalarEvolutionPatternMatch.h
+10-30llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+21-0llvm/include/llvm/Support/PatternMatchHelpers.h
+1-1llvm/lib/Transforms/Scalar/NaryReassociate.cpp
+89-1475 files

LLVM/project df6792blldb/docs/use links.rst

[lldb][docs] Add conference talks to the links page (#192724)

At EuroLLVM, I mentioned a previous LLDB talk and realized such talks
would be a lot more discoverable if we linked them from the website.
DeltaFile
+28-0lldb/docs/use/links.rst
+28-01 files

LLVM/project 6b201d5llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 scalable-strict-fadd.ll sve-wide-lane-mask.ll

[LV][NFC] Rename PreferPredicateOverEpilogue to TailFoldingPolicy (#191803)

Rename the -prefer-predicate-over-epilogue flag and its associated
enum values to use 'TailFold' terminology instead of 'Predicate'. The
term 'Predicate' is overloaded in the vectorizer context and would
cause further confusion as more tail-folding styles are added.
DeltaFile
+23-34llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+6-6llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
+5-5llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll
+4-4llvm/test/Transforms/LoopVectorize/VPlan/RISCV/vplan-vp-intrinsics-reduction.ll
+4-4llvm/test/Transforms/LoopVectorize/AArch64/sve-wide-lane-mask.ll
+4-4llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-intermediate-store.ll
+46-57142 files not shown
+260-271148 files

LLVM/project eff4d47llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 transform-narrow-interleave-to-widen-memory-multi-block.ll

[VPlan] CSE ScalarIVSteps recipes (#191307)

Extend getOpCodeOrIntrinsicID to return a pseudo opcode for
ScalarIVSteps, so it can be CSE'd, when extended to also check the
InductionOpcode.
DeltaFile
+14-7llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+6-15llvm/test/Transforms/LoopVectorize/X86/cost-conditional-branches.ll
+1-3llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll
+1-2llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-multi-block.ll
+22-274 files

LLVM/project 94482e8lldb/source/Plugins/Process/wasm ProcessWasm.cpp

[lldb] Fix crash when evaluating expressions in Wasm targets (#192893)

LLDB crashes with "LLVM ERROR: Incompatible object format!" when
evaluating expressions while debugging WebAssembly because ProcessWasm
never disables JIT. RuntimeDyld only supports ELF, MachO, and COFF
object formats, so attempting to JIT-compile an expression for a Wasm
target produces the aforementioned fatal error.

This PR avoids the crash by calling `SetCanJIT(false)` in the
`ProcessWasm` ctor. Simple expressions will still work via the IR
interpreter, while expressions requiring the JIT now show a proper error
message instead of crashing.

Fixes #179915
DeltaFile
+3-0lldb/source/Plugins/Process/wasm/ProcessWasm.cpp
+3-01 files

LLVM/project ea897e6clang/include/clang/AST ASTContext.h, clang/lib/AST ASTContext.cpp ItaniumMangle.cpp

[clang] implement CWG2064: ignore value dependence for decltype

The 'decltype' of a value-dependent (but non-type-dependent) expression
should be known, so this patch makes such types non-opaque instead.

This patch also implements what's necessary to allow overloading
on pure differences in instantiation dependence, making `std::void_t`
usable for SFINAE purposes.

This also re-adds a few test cases from da98651, which was a previous attempt
at resolving CWG2064.

Fixes #8740
Fixes #61818
Fixes #190388
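
To make the terminology concrete, here is a small sketch with made-up names. It only illustrates a value-dependent but non-type-dependent operand and the `std::void_t` detection idiom; it does not reproduce the overloading cases addressed by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Inside the template, sizeof(T) is value-dependent, yet its type is
// always std::size_t regardless of T.
template <typename T> constexpr bool sizeofTypeIsSizeT() {
  return std::is_same<decltype(sizeof(T)), std::size_t>::value;
}

// Detection idiom: the partial specialization is viable only when
// the expression inside decltype is well-formed.
template <typename T, typename = void> struct HasSize : std::false_type {};
template <typename T>
struct HasSize<T, std::void_t<decltype(std::declval<const T &>().size())>>
    : std::true_type {};

struct WithSize {
  int size() const; // used only in an unevaluated context
};

inline void checkDetection() {
  assert(sizeofTypeIsSizeT<int>());
  assert(HasSize<WithSize>::value);
  assert(!HasSize<int>::value);
}
```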
DeltaFile
+906-175clang/lib/AST/ASTContext.cpp
+312-12clang/test/SemaTemplate/instantiation-dependence.cpp
+151-93clang/lib/AST/ItaniumMangle.cpp
+76-68clang/lib/AST/Type.cpp
+76-48clang/lib/Sema/SemaTemplate.cpp
+93-16clang/include/clang/AST/ASTContext.h
+1,614-41282 files not shown
+2,373-77588 files

LLVM/project 916f9fdllvm/lib/Target/SPIRV SPIRVEmitIntrinsics.cpp, llvm/test/CodeGen/SPIRV/pointers getelementptr-byte-addressing-array.ll

[SPIR-V] Handle [N x i8] byte addressing in SPIRVEmitIntrinsics

LLVM started generating [N x i8] types on array indexing GEPs. The emit
intrinsics pass did not know what to do with them, so it generated a
cast to [N x i8] to perform the GEP. This does not work in logical
addressing.

To handle this, we extend the `i8` GEP handling for logical addressing
mode to work for arbitrary-size byte addressing.
DeltaFile
+52-18llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+64-0llvm/test/CodeGen/SPIRV/pointers/getelementptr-byte-addressing-array.ll
+116-182 files