LLVM/project a5aaa9d llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 bool-mask.ll

[SLP]Convert compares from zexts, promoted to selects, to inverted op, if it improves codegen

Some zext i1 (cmp) + select sequences can be transformed by
inverting compare predicates to remove extra shuffles. For example,
zext i1 (cmp ne) + select (cmp eq), 0, 2 can be modeled as
select <2 x i1> (cmp ne), <1, 2>, zeroinitializer.
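A scalar model of the two-lane example above may help. It is a sketch in plain C++, not SLP code, and assumes 32-bit integer lanes where each lane compares its own pair of operands. `mixedPredicates` mirrors the original zext + select pair; `singleSelect` mirrors the inverted form in which one `ne` compare drives both lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<int32_t, 2>;

// Original form: each lane has its own predicate (ne in lane 0, eq in
// lane 1), so a vectorized version needs two compares plus shuffles.
Lanes mixedPredicates(const Lanes &A, const Lanes &B) {
  const int32_t Lane0 = static_cast<int32_t>(A[0] != B[0]);  // zext i1 (cmp ne)
  const int32_t Lane1 = (A[1] == B[1]) ? 0 : 2;              // select (cmp eq), 0, 2
  return {Lane0, Lane1};
}

// Inverted form: lane 1's eq is flipped to ne and its select arms swapped,
// so one compare feeds one select:
//   select <2 x i1> (cmp ne), <1, 2>, zeroinitializer
Lanes singleSelect(const Lanes &A, const Lanes &B) {
  const Lanes OnTrue = {1, 2};
  Lanes Result = {0, 0};  // zeroinitializer
  for (std::size_t I = 0; I < Result.size(); ++I)
    if (A[I] != B[I])  // the shared (cmp ne) predicate
      Result[I] = OnTrue[I];
  return Result;
}
```

Because `select (cmp eq), 0, 2` is equivalent to `select (cmp ne), 2, 0`, both lanes can share one predicate, which is what removes the need for the extra shuffle.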

Reviewers: RKSimon, hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/181580
Delta File
+102 -8 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+12 -36 llvm/test/Transforms/SLPVectorizer/X86/bool-mask.ll
+114 -44 2 files

LLVM/project 3dd1a3d lldb/source/Plugins/Process/elf-core ThreadElfCore.cpp, lldb/test/API/commands/target/stop-hooks/on-core-load TestStopHookOnCoreLoad.py

[LLDB][ELF CORE] Only display a stop reason when there is a valid signo (#172781)

This patch fixes an issue where ELF cores would report every thread with
`STOP REASON 0`.

This was a large personal annoyance of mine. Added a test to verify that a
default ELF core process/thread has no valid stop reason.
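As a rough illustration only: the sketch below assumes, hypothetically, that a "valid signo" means a positive signal number and that per-thread core data exposes it as a plain `int`. `CoreThreadInfo` and `stopReasonFor` are invented names, not LLDB's API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for per-thread data parsed from an ELF core note.
struct CoreThreadInfo {
  int signo = 0;  // signal number recorded for the thread, 0 if none
};

// Produce a signal stop reason only for a real signal number; a zero signo
// yields "no stop reason" instead of something like "STOP REASON 0".
std::optional<std::string> stopReasonFor(const CoreThreadInfo &Thread) {
  if (Thread.signo <= 0)
    return std::nullopt;
  return "signal " + std::to_string(Thread.signo);
}
```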
Delta File
+17 -0 lldb/unittests/Process/elf-core/ThreadElfCoreTest.cpp
+9 -3 lldb/test/API/commands/target/stop-hooks/on-core-load/TestStopHookOnCoreLoad.py
+9 -0 lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp
+4 -4 lldb/test/Shell/Register/Core/x86-32-netbsd-multithread.test
+4 -4 lldb/test/Shell/Register/Core/x86-64-netbsd-multithread.test
+3 -3 lldb/test/Shell/Register/Core/x86-32-linux-multithread.test
+46 -14 2 files not shown
+50 -20 8 files

LLVM/project 46ed620 llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp AArch64LegalizerInfo.cpp, llvm/test/CodeGen/AArch64 fp16_i16_intrinsic_scalar.ll fp16_intrinsic_scalar_1op.ll

[AArch64][GlobalISel] Remove fallbacks for fpcvt intrinsics with 16-bit operands (#179693)

Previously, GlobalISel failed to lower NEON fpcvt intrinsics, as
RegBankSelect was not keeping the result on an FPR.
An additional fix is needed for the fpcvtz intrinsics, as these are the
"default" floating point convert intrinsics. As a result, Instruction
Selection has patterns mapping the FPCVTZ intrinsic to the
architecture-agnostic G_FP_TO*I_SAT node.
This also provides the opportunity for more optimisations to be made to
the code before Selection.
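For reference, here is a sketch of the saturating semantics that the generic saturating conversions describe (`G_FPTOSI_SAT`/`G_FPTOUI_SAT` in GlobalISel, `llvm.fptosi.sat`/`llvm.fptoui.sat` in IR). It is shown for f32 to i32 because portable C++ has no half-precision type, and the helper names are hypothetical. AArch64's FCVTZS/FCVTZU conversions also saturate and return 0 for NaN, which is presumably what makes the pattern mapping valid.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Models llvm.fptosi.sat.i32.f32: NaN -> 0, values outside the i32 range
// clamp to INT32_MIN/INT32_MAX, in-range values truncate toward zero.
int32_t fptosiSat32(float X) {
  if (std::isnan(X))
    return 0;
  if (X <= -2147483648.0f)  // -2^31 is exactly representable as a float
    return std::numeric_limits<int32_t>::min();
  if (X >= 2147483648.0f)   // 2^31 is the first value past INT32_MAX
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(X);  // in range: the cast truncates toward zero
}

// Models llvm.fptoui.sat.i32.f32: NaN and non-positive values -> 0,
// values at or above 2^32 clamp to UINT32_MAX.
uint32_t fptouiSat32(float X) {
  if (std::isnan(X) || X <= 0.0f)
    return 0;
  if (X >= 4294967296.0f)  // 2^32
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(X);
}
```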
Delta File
+0 -128 llvm/test/CodeGen/AArch64/fp16_i16_intrinsic_scalar.ll
+110 -0 llvm/test/CodeGen/AArch64/fp16_intrinsic_scalar_1op.ll
+11 -6 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+4 -0 llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+125 -134 4 files

LLVM/project 99263d5 clang/lib/Driver/ToolChains Clang.cpp, clang/test/Driver openmp-offload-gpu.c

Revert "[Clang][OpenMP][Driver] Make linker to link Device RTL  when built for SPIRV" (#181876)

Reverts llvm/llvm-project#180066
Delta File
+0 -17 clang/test/Driver/openmp-offload-gpu.c
+1 -1 clang/lib/Driver/ToolChains/Clang.cpp
+1 -18 2 files

LLVM/project 26f944b llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 revec-non-pow2.ll

[SLP]Fix an ArrayRef out-of-bounds access in slice

If revec is enabled, the number of parts (registers) may be computed for
the combined node rather than for a single-element node, so the slice
must be checked for a potential out-of-bounds access.

Fixes #181798
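The failure mode can be sketched with a standard container; `slicePart` is a hypothetical helper, not the SLP code. `llvm::ArrayRef::slice(N, M)` asserts that `N + M <= size()`, so when the element count is not a multiple of the part size (for example, 6 elements split into parts of 4), the last part has to be clamped rather than sliced at full width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return part number Part of Values when it is split
// into chunks of PartSize elements. With a non-power-of-2 element count the
// last part is shorter, so the length is clamped instead of reading past
// the end.
std::vector<int> slicePart(const std::vector<int> &Values, std::size_t Part,
                           std::size_t PartSize) {
  const std::size_t Begin = std::min(Values.size(), Part * PartSize);
  const std::size_t Len = std::min(PartSize, Values.size() - Begin);
  const auto First = Values.begin() + static_cast<std::ptrdiff_t>(Begin);
  return std::vector<int>(First, First + static_cast<std::ptrdiff_t>(Len));
}
```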
Delta File
+121 -0 llvm/test/Transforms/SLPVectorizer/AArch64/revec-non-pow2.ll
+7 -2 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+128 -2 2 files

LLVM/project 1d57656 llvm/lib/CodeGen/MIRParser MIParser.cpp

[NFC][CodeGen] Add braces for else per LLVM coding standard (#181750)

Add braces to `else` bodies when the corresponding `if` body has braces.
Delta File
+16 -8 llvm/lib/CodeGen/MIRParser/MIParser.cpp
+16 -8 1 file

LLVM/project e4e671f llvm/test/CodeGen/AArch64 sve-fixed-length-int-reduce.ll

Add missing SVE2 checks
Delta File
+65 -31 llvm/test/CodeGen/AArch64/sve-fixed-length-int-reduce.ll
+65 -31 1 file

LLVM/project a4962c4 clang/lib/CIR/CodeGen CIRGenTypes.cpp CIRGenExprConstant.cpp, clang/test/CIR/CodeGen pointer-to-member-func.cpp pointer-to-member-func-cast.cpp

[CIR] Fix emission of functions referenced by member-pointer (#181452)

While working on attributes for these, I discovered that when a function
was referenced only via a member function pointer (see the no-odr-use.cpp
test for the example that failed!), we were incorrectly generating the
function's type without the 'this' pointer. This restores the correct
behavior by making sure we generate the type for the member-pointer type
correctly.
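A minimal C++ reproduction of the source pattern involved, assuming the Itanium C++ ABI: `Counter::add` is used only through a member function pointer. The names are illustrative; this is not the contents of no-odr-use.cpp.

```cpp
#include <cassert>

struct Counter {
  int Value = 0;
  // Referenced only through the member function pointer below.
  int add(int N) {
    Value += N;
    return Value;
  }
};

// The member-pointer type int (Counter::*)(int) never spells out the object
// parameter, yet under the Itanium C++ ABI Counter::add is emitted as a
// function taking the object pointer first, roughly int(Counter *, int).
int callThroughMemberPointer(Counter &C, int N) {
  int (Counter::*Fn)(int) = &Counter::add;
  return (C.*Fn)(N);
}
```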
Delta File
+25 -25 clang/test/CIR/CodeGen/pointer-to-member-func.cpp
+10 -10 clang/test/CIR/CodeGen/pointer-to-member-func-cast.cpp
+17 -0 clang/test/CIR/CodeGen/no-odr-use.cpp
+6 -6 clang/test/CIR/CodeGen/pointer-to-member-func-cmp.cpp
+7 -3 clang/lib/CIR/CodeGen/CIRGenTypes.cpp
+2 -1 clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp
+67 -45 6 files

LLVM/project 5929882 llvm/lib/Target/SPIRV SPIRVTypeInst.h SPIRVGlobalRegistry.h

[NFC][SPIRV] Move `SPIRVTypeInst` to its own header: `SPIRVTypeInst.h/cpp`
Delta File
+77 -0 llvm/lib/Target/SPIRV/SPIRVTypeInst.h
+1 -65 llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.h
+26 -0 llvm/lib/Target/SPIRV/SPIRVTypeInst.cpp
+2 -0 llvm/lib/Target/SPIRV/SPIRVUtils.h
+1 -0 llvm/lib/Target/SPIRV/CMakeLists.txt
+107 -65 5 files

LLVM/project 49571d5

Fix Bazel build for dfc5469 (#181868)

Co-authored-by: Pranav Kant <prka at google.com>
Delta File
+0 -0 0 files

LLVM/project ade05a3 utils/bazel/llvm-project-overlay/mlir BUILD.bazel

[bazel] fix #181383 (#181867)

fix #181383
Delta File
+1 -1 utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+1 -1 1 file

LLVM/project a166de9 llvm/lib/MCA InstrBuilder.cpp, llvm/test/tools/llvm-mca/AArch64/HiSilicon tsv110-forwarding.s

[llvm-mca] Missing data dependencies due to constant registers not being cached (#177990)

Commit 385f59f modified MCA InstrBuilder methods `populateReads` and
`populateWrites` to discard information about constant registers and
avoid creating non-existent dependency chains.

However, information about reads and writes is cached based on
instruction descriptions. As a result, if the same instruction is
encountered first with a constant register (before) and later without
one (after), the cached entry will not contain information about that
specific register, resulting in missing data dependencies.

This patch moves the constant-register check to `createInstruction`, so
that cached entries always take constant registers into account; if
necessary, those registers are then discarded when the instruction is
created.
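A heavily simplified model of the fix, using hypothetical types (`Inst`, `Desc`) rather than llvm-mca's real classes, and assuming register 0 stands in for a constant register such as a hardwired zero register. The cache records every read slot, and constant registers are filtered per instance.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, heavily simplified model; none of these types are llvm-mca's.
struct Inst {
  unsigned Opcode;
  std::vector<unsigned> ReadRegs;  // register read operands, in order
};

// Assumption for this sketch: register 0 models a constant register whose
// reads never create a dependency.
bool isConstantReg(unsigned Reg) { return Reg == 0; }

// Descriptor cached per opcode: which operand slots are register reads.
struct Desc {
  std::vector<std::size_t> ReadSlots;
};

const Desc &getOrCreateDesc(const Inst &I) {
  static std::map<unsigned, Desc> Cache;
  auto [It, Inserted] = Cache.try_emplace(I.Opcode);
  if (Inserted)
    for (std::size_t Slot = 0; Slot < I.ReadRegs.size(); ++Slot)
      It->second.ReadSlots.push_back(Slot);  // record every slot, constant or not
  return It->second;
}

// Per-instance step: constant registers are dropped here, so the cached
// descriptor does not depend on which instance happened to be seen first.
std::vector<unsigned> createInstructionReads(const Inst &I) {
  std::vector<unsigned> Reads;
  for (std::size_t Slot : getOrCreateDesc(I).ReadSlots)
    if (!isConstantReg(I.ReadRegs[Slot]))
      Reads.push_back(I.ReadRegs[Slot]);
  return Reads;
}
```

If `getOrCreateDesc` skipped constant registers instead, a first instance with operands `{0, 3}` would cache only slot 1, and a later instance with `{5, 3}` would silently lose its read of register 5, which is the kind of missing dependency described above.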
Delta File
+97 -0 llvm/test/tools/llvm-mca/RISCV/Andes45/zero-reg.s
+18 -17 llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-forwarding.s
+18 -17 llvm/test/tools/llvm-mca/AArch64/Neoverse/V3AE-forwarding.s
+12 -12 llvm/test/tools/llvm-mca/AArch64/HiSilicon/tsv110-forwarding.s
+3 -10 llvm/lib/MCA/InstrBuilder.cpp
+148 -56 5 files

LLVM/project 1229c23 clang/lib/CodeGen CGHLSLRuntime.cpp, clang/test/CodeGenHLSL/semantics semantic.nested.vs.hlsl semantic.explicit-mix-builtin.vs.hlsl

[Clang][HLSL] Fix struct semantic store (#181681)

The store to a nested semantic had a bug: the field index was not
incremented when walking through the nested struct.
One of the checked-in tests was also wrong, which let this slip by.

Fixes #181674
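The class of bug can be illustrated with a hypothetical walk over nested fields that records GEP-style index paths. `Field`, `collectLeafPaths`, and the path representation are assumptions for illustration, not the code in CGHLSLRuntime.cpp.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a struct whose fields carry semantics; a field
// with children is a nested struct.
struct Field {
  std::string Semantic;
  std::vector<Field> Children;
};

using FieldPath = std::vector<unsigned>;  // GEP-style indices to a leaf field

// Walks the (possibly nested) fields and records the index path of every leaf.
void collectLeafPaths(const std::vector<Field> &Fields, FieldPath &Path,
                      std::vector<std::pair<std::string, FieldPath>> &Out) {
  unsigned Index = 0;
  for (const Field &F : Fields) {
    Path.push_back(Index);
    if (F.Children.empty())
      Out.emplace_back(F.Semantic, Path);
    else
      collectLeafPaths(F.Children, Path, Out);  // recurse into the nested struct
    Path.pop_back();
    ++Index;  // without this, every field would be addressed as index 0
  }
}
```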
Delta File
+52 -0 clang/test/CodeGenHLSL/semantics/semantic.nested.vs.hlsl
+5 -1 clang/test/CodeGenHLSL/semantics/semantic.explicit-mix-builtin.vs.hlsl
+1 -1 clang/lib/CodeGen/CGHLSLRuntime.cpp
+58 -2 3 files

LLVM/project 3c32747 flang-rt/unittests/Runtime/CUDA DefaultStream.cpp, flang/include/flang/Optimizer/Builder CUDAIntrinsicCall.h

[flang][cuda] Lower set/get default stream (#181775)

Delta File
+37 -0 flang/lib/Optimizer/Builder/CUDAIntrinsicCall.cpp
+15 -0 flang-rt/unittests/Runtime/CUDA/DefaultStream.cpp
+4 -0 flang/include/flang/Optimizer/Builder/CUDAIntrinsicCall.h
+2 -2 flang/module/cuda_runtime_api.f90
+58 -2 4 files

LLVM/project 9bbd8e2 llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp

[AArch64][GlobalISel] Add other factors to comment

The FP conversion result may also be stored on an FPR if the result is the same size as its input, or if FPRCVT is present.
Delta File
+3 -1 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+3 -1 1 file

LLVM/project 5addddf llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU][SIInsertWaitcnts][NFC] Move soft xcnt deletion to separate function (#181760)

This patch simplifies the logic of `insertWaitcntInBlock()` by moving
the code that removes the redundant soft xcnt instructions to a new
function: `removeRedundantSoftXcnts()`.

While doing so, this patch also cleans up the logic a bit by dropping
the AtomicRMWState and the corresponding functions.

This helps in several ways:
- insertWaitcntInBlock() will now do what its name suggests, i.e., only
insert and not remove.
- it makes it clear that removal of softxcnts is orthogonal to insertion
of waitcnts.
- we won't have to worry about both erased and new instructions in
insertWaitcntInBlock()'s loop.

The change should be NFC.
Delta File
+42 -72 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+42 -72 1 file

LLVM/project 60e50a4 clang/lib/CIR/CodeGen CIRGenBuiltin.cpp, clang/test/CIR/CodeGenBuiltins builtin_call.cpp

[CIR] Fix handling of boolean builtin expressions (#181444)

Previously we were generating a signed 1-bit integer constant for
builtin expressions that returned a boolean value. This caused a
verification error of mismatched types when we tried to store this
constant result to a pointer-to-bool location. This change adds a check
for boolean types.
Delta File
+11 -10 clang/test/CIR/CodeGenBuiltins/builtin_call.cpp
+6 -1 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+17 -11 2 files

LLVM/project c52f2b6 llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp

[AArch64][GlobalISel] Re-add necessary brackets

Some brackets are needed for the "Build and Test Linux" CI test to pass.
This is because some configurations of clang flag the order of operations in `A || B && C` as ambiguous. Add the brackets back to avoid this.
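Since `&&` binds more tightly than `||`, the brackets change no behavior; they only make the grouping explicit, which is what clang's `-Wlogical-op-parentheses` warning ("'&&' within '||'") asks for. A small check of the two possible groupings, with illustrative function names:

```cpp
#include <cassert>

// `A || B && C` already parses as `A || (B && C)`; the explicit brackets
// only document that intent.
bool withBrackets(bool A, bool B, bool C) { return A || (B && C); }

// The other possible reading, which is NOT what the unbracketed form means.
bool otherGrouping(bool A, bool B, bool C) { return (A || B) && C; }
```

With `A = true, B = false, C = false`, the first grouping yields `true` and the second `false`, so the brackets must wrap `B && C` to preserve the original meaning.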
Delta File
+6 -7 llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+6 -7 1 file

LLVM/project 8a49130 flang/test/Lower/OpenMP/Todo omp-declare-simd.f90

[NFC][Flang][OpenMP] Remove obsolete declare simd lowering TODO test (#181756)

The TODO test for Flang OpenMP `declare simd` lowering is no longer
needed, as the lowering was implemented in
https://github.com/llvm/llvm-project/pull/175604.
Delta File
+0 -11 flang/test/Lower/OpenMP/Todo/omp-declare-simd.f90
+0 -11 1 file

LLVM/project d3e683c lld/ELF/Arch SystemZ.cpp, lld/test/ELF systemz-tls-ld.s

[ELF][SystemZ] Fix R_390_TLS_LDO32/64 in non-SHF_ALLOC sections

These can appear in .debug_info so, like other architectures (e.g.
X86_64), we still need to handle them in getRelExpr.

Fixes: aec1c984266c ("[ELF] Add target-specific relocation scanning for SystemZ (#181563)")
Delta File
+11 -0 lld/test/ELF/systemz-tls-ld.s
+3 -0 lld/ELF/Arch/SystemZ.cpp
+14 -0 2 files

LLVM/project 7131244 mlir/lib/Dialect/AMDGPU/IR AMDGPUOps.cpp, mlir/test/Dialect/AMDGPU canonicalize.mlir

[mlir][AMDGPU] Allow packing of exactly 4 elements. (#181843)

`amdgpu.scaled_mfma` ops ingest byte-sized scales stored in 4-byte
registers. To avoid unnecessary padding (where we only ever use the
first byte of this 4-byte register), this canonicalization looks for
opportunities to pack multiple scales into 4-byte chunks whenever
possible. Note that this is necessary but not sufficient to avoid
byte loads from LDS.

This canonicalization should try to pack scales that are extracted from
an alloc in shared memory of 4 bytes or larger (meaning packing to 4
bytes is possible). Previously it bailed out if the alloc was exactly 4
bytes long, which was incorrect; this PR fixes that.
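A hypothetical sketch of the kind of bounds check involved. `packScale` and `PackedLoc` are invented names, and the rule that the enclosing 4-byte chunk must lie fully inside the alloc is an assumption about the intent, not the code in AMDGPUOps.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: where a byte-sized scale at ScaleOffset would live if
// the scales of an AllocBytes-byte buffer were packed into 4-byte chunks.
struct PackedLoc {
  int64_t ChunkOffset;  // byte offset of the enclosing 4-byte chunk
  int64_t ByteInChunk;  // which byte of the chunk holds the scale (0-3)
};

std::optional<PackedLoc> packScale(int64_t AllocBytes, int64_t ScaleOffset) {
  if (ScaleOffset < 0 || ScaleOffset >= AllocBytes)
    return std::nullopt;
  const int64_t Chunk = ScaleOffset & ~int64_t{3};  // round down to a multiple of 4
  // The whole chunk must fit in the alloc. Writing `>=` here would be an
  // off-by-one that rejects an alloc of exactly 4 bytes.
  if (Chunk + 4 > AllocBytes)
    return std::nullopt;
  return PackedLoc{Chunk, ScaleOffset - Chunk};
}
```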

---------

Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
Delta File
+15 -0 mlir/test/Dialect/AMDGPU/canonicalize.mlir
+2 -2 mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+17 -2 2 files

LLVM/project 03ad654 llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/AArch64 funnel-shift.ll

[DAGCombiner] Combine (fshl A, X, Y) | (shl X, Y) --> fshl (A|X), X, Y (#180887)

Similarly, (fshr X, B, Y) | (srl X, Y) --> fshr X, (X|B), Y.

This is similar to the FSHL/FSHR handling in
hoistLogicOpWithSameOpcodeHands, but here we treat a shl/srl like an
fshl/fshr with a zero operand.

The pattern doesn't require X to be the same on both sides, but that's
what occurred in the case I was looking at, so that's what is
implemented.

Alive2: https://alive2.llvm.org/ce/z/eUou-u
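The identity can also be checked exhaustively at a small bit width. This sketch models LLVM's funnel-shift semantics (shift amount taken modulo the bit width) on 8-bit values and only tests shift amounts below the width, since `shl`/`srl` by the width or more is poison; it complements, rather than replaces, the Alive2 proof.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BW = 8;  // bit width used for the exhaustive check

// fshl: concatenate A:B, shift left by Amt % BW, keep the high BW bits.
uint8_t fshl(uint8_t A, uint8_t B, unsigned Amt) {
  const unsigned S = Amt % BW;
  if (S == 0)
    return A;
  return static_cast<uint8_t>((A << S) | (B >> (BW - S)));
}

// fshr: concatenate A:B, shift right by Amt % BW, keep the low BW bits.
uint8_t fshr(uint8_t A, uint8_t B, unsigned Amt) {
  const unsigned S = Amt % BW;
  if (S == 0)
    return B;
  return static_cast<uint8_t>((A << (BW - S)) | (B >> S));
}

// Checks both combines for every 8-bit A (used as B for fshr), every X,
// and every shift amount Y < BW.
bool combinesHoldExhaustively() {
  for (unsigned AV = 0; AV < 256; ++AV)
    for (unsigned XV = 0; XV < 256; ++XV)
      for (unsigned Y = 0; Y < BW; ++Y) {
        const uint8_t A = static_cast<uint8_t>(AV);
        const uint8_t X = static_cast<uint8_t>(XV);
        const uint8_t Shl = static_cast<uint8_t>(X << Y);
        const uint8_t Srl = static_cast<uint8_t>(X >> Y);
        // (fshl A, X, Y) | (shl X, Y) --> fshl (A|X), X, Y
        if ((fshl(A, X, Y) | Shl) != fshl(A | X, X, Y))
          return false;
        // (fshr X, B, Y) | (srl X, Y) --> fshr X, (X|B), Y, with A as B
        if ((fshr(X, A, Y) | Srl) != fshr(X, X | A, Y))
          return false;
      }
  return true;
}
```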
Delta File
+20 -40 llvm/test/CodeGen/X86/funnel-shift.ll
+23 -31 llvm/test/CodeGen/AArch64/funnel-shift.ll
+26 -8 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+69 -79 3 files

LLVM/project fb46677 clang/include/clang/StaticAnalyzer/Core/PathSensitive CoreEngine.h, clang/lib/StaticAnalyzer/Core ExprEngine.cpp ExprEngineC.cpp

[NFC][analyzer] Remove StmtNodeBuilder (#181431)

The class `StmtNodeBuilder` was practically equivalent to its base class
`NodeBuilder` -- its data members and constructors were identical and
the only distinguishing feature was that it supported two additional
methods that were not present in `NodeBuilder`.

This commit moves those two methods to `NodeBuilder` (there is no reason
why they cannot be defined there) and replaces all references to
`StmtNodeBuilder` with plain `NodeBuilder`.

Note that `StmtNodeBuilder` previously had a distinguishing feature:
its destructor could pass nodes to an "enclosing node builder". This
became dead code at some point in the past, so my previous commit
320d0b5467b9586a188e06dd2620126f5cb99318 removed it.
Delta File
+16 -35 clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
+21 -21 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+18 -17 clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
+16 -16 clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+12 -10 clang/lib/StaticAnalyzer/Core/ExprEngineObjC.cpp
+1 -12 clang/test/Analysis/misc-ps-eager-assume.m
+84 -111 2 files not shown
+87 -115 8 files

LLVM/project dfc5469 mlir/include/mlir/Dialect/OpenACC/Transforms Passes.td, mlir/lib/Dialect/OpenACC/Transforms ACCDeclareGPUModuleInsertion.cpp CMakeLists.txt

[mlir][acc] Add pass to insert acc declare globals into GPU module (#181383)

Adds a new OpenACC pass that copies globals with the `acc.declare`
attribute into the GPU module so that device code (acc routine, compute
regions) can reference them.

---------

Co-authored-by: Susan Tan <zujunt at nvidia.com>
Delta File
+145 -0 mlir/lib/Dialect/OpenACC/Transforms/ACCDeclareGPUModuleInsertion.cpp
+14 -0 mlir/test/Dialect/OpenACC/acc-declare-gpu-module-insertion.mlir
+9 -0 mlir/include/mlir/Dialect/OpenACC/Transforms/Passes.td
+1 -0 mlir/lib/Dialect/OpenACC/Transforms/CMakeLists.txt
+169 -0 4 files

LLVM/project 1ed85c7 clang/test/CIR/CodeGen ret-attrs.cpp

[NFC] Remove forgotten self-reminder comment that is no longer relevant (#181849)

Delta File
+0 -1 clang/test/CIR/CodeGen/ret-attrs.cpp
+0 -1 1 file

LLVM/project ff30eab llvm/lib/Transforms/Scalar ScalarizeMaskedMemIntrin.cpp, llvm/test/CodeGen/X86 masked_gather_scatter.ll

[ScalarizeMaskedMemIntr][ProfCheck] Correctly annotate branch weights (#181568)

There are two cases in ScalarizeMaskedMemIntr where conditional branches
are created using conditions derived from the mask. Given that these are
synthesized and we do not have VP metadata for them, we need to mark
their branch weights as unknown.
Delta File
+12 -5 llvm/test/CodeGen/X86/masked_gather_scatter.ll
+11 -2 llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp
+0 -1 llvm/utils/profcheck-xfail.txt
+23 -8 3 files

LLVM/project 73e7eb4 .ci/green-dragon relay-clang-stage2-thinlto.groovy relay-clang-stage2-sanitizers.groovy

[green dragon] limit relay jobs to prevent concurrent builds (#181854)

Limit the relay jobs to prevent them from eating up infra resources.
Delta File
+4 -0 .ci/green-dragon/relay-clang-stage2-thinlto.groovy
+4 -0 .ci/green-dragon/relay-clang-stage2-sanitizers.groovy
+4 -0 .ci/green-dragon/relay-lnt-ctmark.groovy
+4 -0 .ci/green-dragon/relay-test-suite-verify-machineinstrs.groovy
+16 -0 4 files

LLVM/project 24fad91 clang/test/OpenMP teams_distribute_simd_codegen.cpp target_is_device_ptr_codegen.cpp

[Clang] Regenerate test checks (NFC)

To reduce spurious diffs in future changes.
Delta File
+194 -194 clang/test/OpenMP/teams_distribute_simd_codegen.cpp
+168 -168 clang/test/OpenMP/target_is_device_ptr_codegen.cpp
+149 -149 clang/test/OpenMP/for_reduction_codegen_UDR.cpp
+127 -127 clang/test/OpenMP/target_teams_generic_loop_codegen.cpp
+29 -29 clang/test/OpenMP/target_teams_distribute_parallel_for_reduction_task_codegen.cpp
+28 -28 clang/test/OpenMP/teams_distribute_parallel_for_reduction_task_codegen.cpp
+695 -695 12 files not shown
+937 -937 18 files

LLVM/project 095c8aa llvm/lib/Target/AArch64 AArch64ISelLowering.cpp

Fixups
Delta File
+4 -6 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+4 -6 1 file

LLVM/project 7018ddf llvm/test/Transforms/LoopVectorize/RISCV runtime-check-dependent-on-stride.ll

[NFC][VPlan] Test showing that unit-stride-mv should be done later in pipeline (#180292)

Right now, memory dependence checks and speculation on unit-strideness
are performed more or less simultaneously. This is wrong because:

* Ideally, if accesses aren't unit-strided at runtime, we might want to
take a version with a gather/strided load (longer term). Those two loops
should share legality checks, and the dispatch based on stride should
only happen after the legality condition has been satisfied.
* Even if we don't generate multiple vector loops (current situation),
not vectorizing at all is worse than generating a gather-only vector loop.

This PR adds a test for the latter as that could be a first step in
adding full support for the former.

This isn't target-specific, but gathers aren't supported by the generic
target and result in very ugly scalarized code/CHECKs, hence the test is
placed under RISCV/.
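A hypothetical sketch of the ordering argued for above, with invented names (`LoopVersion`, `chooseVersion`) and the simplifying assumption that dependence legality can be summarized as one boolean: legality gates vectorization as a whole, and the runtime stride only selects among vector loops that have already passed it.

```cpp
#include <cassert>
#include <cstdint>

enum class LoopVersion { Scalar, VectorUnitStride, VectorGather };

// Legality first: only a failed dependence check falls back to the scalar
// loop. The stride dispatch happens afterwards and picks between two vector
// loops that share the same legality result.
LoopVersion chooseVersion(bool DependenceChecksPass, int64_t RuntimeStride) {
  if (!DependenceChecksPass)
    return LoopVersion::Scalar;
  if (RuntimeStride == 1)
    return LoopVersion::VectorUnitStride;
  return LoopVersion::VectorGather;  // still vectorized, just with gathers
}
```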

Co-authored-by: Florian Hahn <flo at fhahn.com>
Delta File
+158 -0 llvm/test/Transforms/LoopVectorize/RISCV/runtime-check-dependent-on-stride.ll
+158 -0 1 file