LLVM/project a1f83ba llvm/lib/Transforms/Vectorize VPlanTransforms.cpp

[LV] NFCI: Move extend optimization to transformToPartialReduction. (#182860)

The reason for doing this in `transformToPartialReduction` is so that we
can create the VPExpressions directly when transforming reductions into
partial reductions (to be done in a follow-up PR).

I also intend to see whether we can merge the in-loop reductions with
partial reductions, so that the separate `convertToAbstractRecipes`
VPlan transform pass is no longer needed.
Delta File
+65-8 llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+65-8 1 files

LLVM/project 5af5bd4 llvm/lib/Target/X86 X86ISelLowering.cpp X86ExpandPseudo.cpp

[AMX][NFC] Match pseudo name with ISA (#182235)

Adds the missing suffix so the pseudo name reflects the intended ISA.
We switched from `TILEMOVROWrre` to `TILEMOVROWrte` in
https://github.com/llvm/llvm-project/pull/168193, but the pseudo name
stayed the same. This patch renames `PTILEMOVROWrre` to
`PTILEMOVROWrte` to match, even though the pseudo does not actually
have any tile register.

---------

Co-authored-by: mattarde <mattarde at intel.com>
Delta File
+24-24 llvm/lib/Target/X86/X86ISelLowering.cpp
+24-24 llvm/lib/Target/X86/X86ExpandPseudo.cpp
+18-18 llvm/lib/Target/X86/X86InstrAMX.td
+12-12 llvm/lib/Target/X86/X86PreTileConfig.cpp
+78-78 4 files

LLVM/project 058705b clang/include/clang/StaticAnalyzer/Core/PathSensitive ProgramState.h, clang/lib/StaticAnalyzer/Core ProgramState.cpp

[Clang][NFCI] Make program state GDM key const pointer (#183477)

This commit makes the GDM key in ProgramState a constant pointer. This
is done to better reflect the intention of the key as a unique
identifier for the data stored in the GDM, and to prevent the use of the
storage pointed to by the key as global state.

Signed-off-by: Steffen Holst Larsen <sholstla at amd.com>
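
The idea of a key that serves only as a unique identifier can be sketched as follows. This is a minimal illustration, not the ProgramState code: `GenericDataMap`, `LockTrait` and `TaintTrait` are hypothetical names, and the stored value type is simplified to a string. Because the key is a `const void *`, nothing can write through it, so the tag object cannot double as global state.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a generic data map keyed by unique addresses.
class GenericDataMap {
  std::unordered_map<const void *, std::string> Data;

public:
  void set(const void *Key, std::string Value) { Data[Key] = std::move(Value); }

  // Returns nullptr when no entry exists for Key.
  const std::string *lookup(const void *Key) const {
    auto It = Data.find(Key);
    return It == Data.end() ? nullptr : &It->second;
  }
};

// Each trait owns a static tag. Only the tag's address is used; its value
// is never read or written through the key.
struct LockTrait {
  static const void *key() {
    static const int Tag = 0;
    return &Tag;
  }
};

struct TaintTrait {
  static const void *key() {
    static const int Tag = 0;
    return &Tag;
  }
};
```

Two traits get distinct keys because their tags are distinct objects, and a lookup with one trait's key never sees data stored under the other.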
Delta File
+9-8 clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+8-7 clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+17-15 2 files

LLVM/project 9145bf6 llvm/lib/Target/ARM ARMISelLowering.cpp, llvm/test/CodeGen/ARM fp-intrinsics-vector-v8.ll

Lower strictfp vector rounding operations similar to default mode

Previously the strictfp rounding nodes were lowered by unrolling to
scalar operations, which hurts performance. This issue was partially
fixed in #180480; this change continues that work and implements
optimized lowering for v4f16 and v8f16.
Delta File
+10-220 llvm/test/CodeGen/ARM/fp-intrinsics-vector-v8.ll
+7-12 llvm/lib/Target/ARM/ARMISelLowering.cpp
+17-232 2 files

LLVM/project db56f21 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll rsq.f64.ll

AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64

Device libs has a fast sqrt macro implemented this way.
Delta File
+240-652 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+140-602 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+23-17 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+22-17 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+425-1,288 4 files

LLVM/project a4b65ab llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll rsq.f64.ll

AMDGPU: Improve fsqrt f64 expansion with ninf

Address todo to reduce the is_fpclass check to an fcmp with 0.
Delta File
+52-92 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+60-80 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+10-6 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-3 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+3-2 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fsqrt.mir
+132-183 5 files

LLVM/project 9270406 llvm/lib/Target/X86 X86TargetTransformInfo.cpp, llvm/lib/Transforms/Vectorize VectorCombine.cpp

[VectorCombine][X86] Ensure we recognise free sign extends of vector comparison results (#183575)

Unless we're working with AVX512 mask predicate types, sign extending a
vXi1 comparison result back to the width of the comparison source types
is free.

VectorCombine::foldShuffleOfCastops - pass the original CastInst in the
getCastInstrCost calls to track the source comparison instruction.

Fixes #165813
Delta File
+12-54 llvm/test/Transforms/VectorCombine/X86/shuffle-of-casts.ll
+14-0 llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+2-2 llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+28-56 3 files

LLVM/project 4708508 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll rsq.f64.ll

AMDGPU: Skip last corrections and scaling for afn llvm.sqrt.f64

Device libs has a fast sqrt macro implemented this way.
Delta File
+180-660 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+143-625 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+23-17 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+22-17 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+368-1,319 4 files

LLVM/project a5bbedf llvm/test/Transforms/LoopVectorize/AArch64 scalable-reductions-tf.ll

[LV] Convert test to UTC. NFC
Delta File
+40-16 llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll
+40-16 1 files

LLVM/project 96a65cf llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU rsq.f64.ll fdiv.f64.ll

AMDGPU: Skip last corrections in afn f64 reciprocal

Device libs has a fast reciprocal macro that is close
to the fast division expansion, but skips the last terms
compared to the full division.

The basic reciprocal handling has identical output to this
macro. The negative reciprocal case has different fneg placement
and smaller code size, but I believe the result should be the same.
Delta File
+32-116 llvm/test/CodeGen/AMDGPU/rsq.f64.ll
+37-7 llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
+16-0 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+15-0 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+12-2 llvm/test/CodeGen/AMDGPU/fdiv.f64.ll
+0-4 llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll
+112-129 1 files not shown
+112-131 7 files

LLVM/project becbc33 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU fsqrt.f64.ll

AMDGPU: Improve fsqrt f64 expansion with ninf

Address todo to reduce the is_fpclass check to an fcmp with 0.
Delta File
+52-92 llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll
+10-6 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+7-3 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+3-2 llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fsqrt.mir
+72-103 4 files

LLVM/project b0b3e3e llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/RISCV reductions.ll tail-folding-cast-intrinsics.ll

[VPlan] Don't drop NUW flag on tail folded canonical IVs (#183301)

After #183080 vscale can no longer be a non-power of 2, which means the
canonical IV can't overflow with tail folding w/ scalable vectors
anymore. Therefore we don't need to drop the NUW flag.

IVUpdateMayOverflow is left to be removed in a separate PR since it
removes further runtime checks.
Delta File
+36-36 llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll
+22-22 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-cast-intrinsics.ll
+18-18 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-call-intrinsics.ll
+17-17 llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+7-25 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+14-14 llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll
+114-132 54 files not shown
+243-261 60 files

LLVM/project 72ad4d2 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/MC/AMDGPU gfx8_asm_vop3.s gfx7_asm_vop3.s

Merge branch 'main' into users/evelez7/clang-doc-anonymous-enums
Delta File
+121,423-138,333 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+43,316-44,830 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+42,349-42,348 llvm/test/MC/AMDGPU/gfx8_asm_vop3.s
+41,419-41,418 llvm/test/MC/AMDGPU/gfx7_asm_vop3.s
+36,428-36,427 llvm/test/MC/AMDGPU/gfx9_asm_vop3.s
+28,175-28,174 llvm/test/MC/AMDGPU/gfx9_asm_vopc.s
+313,110-331,530 18,218 files not shown
+2,990,465-1,991,465 18,224 files

LLVM/project 192acd6 clang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/CodeGenHIP builtins-amdgcn-gfx1250-wmma-f16.hip

[Clang][AMDGPU] Change __fp16 to _Float16 in GFX1250 WMMA/SWMMAC builtin definitions (#183493)

Change the type signature of `gfx1250 WMMA/SWMMAC` builtins from
`__fp16` to `_Float16` in the tablegen builtin definitions.
Delta File
+469-0 clang/test/CodeGenHIP/builtins-amdgcn-gfx1250-wmma-f16.hip
+16-16 clang/include/clang/Basic/BuiltinsAMDGPU.td
+485-16 2 files

LLVM/project 64d9e6b llvm/test/CodeGen/ARM fp-intrinsics-vector-v8.ll

Precommit tests: strictfp rounding vector f16 intrinsics
Delta File
+361-1 llvm/test/CodeGen/ARM/fp-intrinsics-vector-v8.ll
+361-1 1 files

LLVM/project 32134a6 mlir/lib/Dialect/LLVMIR/IR LLVMDialectBytecode.cpp, mlir/test/mlir-tblgen bytecode-write.td

[mlirbc] Switch generator to enable writes with failures. (#182464)

Previously one had to have a matching case per entry (e.g., one could
use a printer predicate, but the assumption was that one would never
fall back) and just always return success.
Delta File
+28-0 mlir/test/mlir-tblgen/bytecode-write.td
+8-5 mlir/tools/mlir-tblgen/BytecodeDialectGen.cpp
+0-7 mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp
+36-12 3 files

LLVM/project ed8f080 clang/www cxx_status.html

[Clang][docs] Fix proposal number typo for P1847R4 (#183671)

This PR fixes a typo in `clang/www/cxx_status.html`. 

The link text for the feature "Make declaration order layout mandated"
incorrectly referred to **P1874R4**, while the actual URL
(https://wg21.link/p1847r4) and the feature name correctly point to
**P1847R4**.

This change corrects the displayed text to match the proposal number.
Delta File
+1-1 clang/www/cxx_status.html
+1-1 1 files

LLVM/project 86b99ef clang/docs ReleaseNotes.rst, clang/lib/Sema SemaDecl.cpp

Revert "[Sema] Fix crash on invalid operator template-id (#181404)" (#183682)

Reverts llvm/llvm-project#181404
(c056d7c5d6ea076b38fa937c54ab44ce2e5a95e1) because of a post-commit CI
failure.
Delta File
+0-7 clang/test/SemaCXX/crash-invalid-operator-template.cpp
+0-4 clang/lib/Sema/SemaDecl.cpp
+0-1 clang/docs/ReleaseNotes.rst
+0-12 3 files

LLVM/project 07007b7 lldb/source/Host/macosx/objcxx HostInfoMacOSX.mm

[lldb] Don't add remap entries for empty segments (#183651)

There are some binaries in the shared cache with a zero-length segment,
or with segments that get mapped to lldb address 0 to indicate a
failure. Do not add entries to the VirtualDataExtractor's LookupTable
for those; they are not readable.

rdar://171106338
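
The filtering described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in HostInfoMacOSX.mm: `SegmentMapping` and `buildRemapEntries` are hypothetical names, and the real lookup table entries may carry different fields.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one shared-cache segment mapping.
struct SegmentMapping {
  uint64_t FileAddr;  // address of the segment in the original image
  uint64_t LocalAddr; // address in lldb's own memory; 0 signals a failure
  uint64_t Size;      // segment length in bytes
};

// Keep only mappings that are actually readable: skip zero-length segments
// and segments whose local address is 0.
std::vector<SegmentMapping>
buildRemapEntries(const std::vector<SegmentMapping> &Segments) {
  std::vector<SegmentMapping> Entries;
  for (const SegmentMapping &Seg : Segments) {
    if (Seg.Size == 0 || Seg.LocalAddr == 0)
      continue;
    Entries.push_back(Seg);
  }
  return Entries;
}
```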
Delta File
+2-1 lldb/source/Host/macosx/objcxx/HostInfoMacOSX.mm
+2-1 1 files

LLVM/project 145051b clang/docs ReleaseNotes.rst, clang/lib/Sema SemaDecl.cpp

Revert "[Sema] Fix crash on invalid operator template-id (#181404)"

This reverts commit c056d7c5d6ea076b38fa937c54ab44ce2e5a95e1.
Delta File
+0-7 clang/test/SemaCXX/crash-invalid-operator-template.cpp
+0-4 clang/lib/Sema/SemaDecl.cpp
+0-1 clang/docs/ReleaseNotes.rst
+0-12 3 files

LLVM/project b033dcb clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest TUSummaryTest.cpp, lldb/test/API/functionalities/postmortem/FreeBSD-Kernel-Core/tools libfbsdvmcore-hacks.patch

rebase

Created using spr 1.3.7
Delta File
+456-103 llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
+0-323 lldb/test/API/functionalities/postmortem/FreeBSD-Kernel-Core/tools/libfbsdvmcore-hacks.patch
+301-0 llvm/test/CodeGen/AArch64/faddv.ll
+140-134 clang/unittests/Analysis/Scalable/Serialization/JSONFormatTest/TUSummaryTest.cpp
+160-64 llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
+2-221 llvm/test/CodeGen/PowerPC/fmf-propagation.ll
+1,059-845 180 files not shown
+4,264-1,909 186 files

LLVM/project 77600cb mlir/lib/Dialect/XeGPU/Transforms XeGPULayoutImpl.cpp, mlir/test/Dialect/XeGPU propagate-layout-inst-data.mlir propagate-layout.mlir

[MLIR][XeGPU] XeGPU Layout adds support for fractional-subgroup-size vector (#183434)

This PR enhances the layout assignment for XeGPU load/store operations
to handle vector sizes smaller than the subgroup size, for example
vector[4] with lane_data=[1], lane_layout=[4] and inst_data=[4].
Fractional-subgroup-size vector support is required for the
cross-subgroup reduction case. The number of participating subgroups
in a reduction can be small, so each subgroup needs to reduce a small
vector, often a fraction of the subgroup size.
Most layout-based subgroup distribution patterns support
fractional-subgroup-size vectors without any change, except a few:
reduction, insert/extract, constant. We don't expect ND operations
(like load_nd/store_nd/dpas) to accept fractional-subgroup-size
vectors.
Delta File
+18-16 mlir/lib/Dialect/XeGPU/Transforms/XeGPULayoutImpl.cpp
+15-1 mlir/test/Dialect/XeGPU/propagate-layout-inst-data.mlir
+14-0 mlir/test/Dialect/XeGPU/propagate-layout.mlir
+47-17 3 files

LLVM/project f30dfe7 mlir/tools/mlir-tblgen OpDocGen.cpp

Revert "[mlir-tblgen] Remove `namespace {}` around OpDocGroup (#182721)" (#183458)

Reverts #182721, it's not needed after #183457.

It was a work around for #182720.

This reverts commit a0f344f69d7eb5d87dd78c628a196a3a7440e792.
Delta File
+2-4 mlir/tools/mlir-tblgen/OpDocGen.cpp
+2-4 1 files

LLVM/project b354b20 clang/lib/Driver SanitizerArgs.cpp, clang/test/Driver fsanitize-minimal-runtime.c

[SafeStack] Allow -fsanitize-minimal-runtime with -fsanitize=safestack (#183644)

SafeStack does not require a full sanitizer runtime, so it should be
compatible with the minimal runtime flag.
Delta File
+3-3 clang/lib/Driver/SanitizerArgs.cpp
+6-0 compiler-rt/test/safestack/overflow.c
+4-0 clang/test/Driver/fsanitize-minimal-runtime.c
+13-3 3 files

LLVM/project 5929c90 mlir/lib/Dialect/Vector/IR VectorOps.cpp

[mlir][vector] Fix fold result for empty vector.mask with no results (#180345)

This PR fixes `foldEmptyMaskOp` to return `failure` when folding an
empty vector.mask whose terminator has no operands. Previously this case
returned success without producing any folded results, which violates
the folding contract. Fixes #177825.
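
The contract at stake can be sketched without MLIR. In this simplified model, `FoldStatus` and `foldEmptyMask` are hypothetical stand-ins, values are plain integers, and the assumption is that an empty mask folds to the operands of its terminator. A fold that reports success is expected to have produced results, so the no-operand case has to report failure.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not MLIR types.
enum class FoldStatus { Failure, Success };

// Folds an empty mask region to the values yielded by its terminator.
// Success is only reported when at least one result was produced.
FoldStatus foldEmptyMask(const std::vector<int> &TerminatorOperands,
                         std::vector<int> &Results) {
  if (TerminatorOperands.empty())
    return FoldStatus::Failure; // nothing to fold to
  Results.insert(Results.end(), TerminatorOperands.begin(),
                 TerminatorOperands.end());
  return FoldStatus::Success;
}
```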
Delta File
+2-4 mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+2-4 1 files

LLVM/project a54df46

rebase

Created using spr 1.3.7
Delta File
+0-0 0 files

LLVM/project df77f2b

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+0-0 0 files

LLVM/project 5d78c8c llvm/include/llvm/ADT DenseMap.h

[DenseMap] Add memory barrier for sanitizers in getInlineBuckets/getLargeRep (#183457)

Add a compiler memory barrier to prevent optimizations from triggering
false positives on partially poisoned buckets in (HW)ASan.

Fixes #182720.
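
For readers unfamiliar with the term, a compiler-only memory barrier can be sketched in standard C++ as below. This is not the code added to DenseMap.h, and the patch may well use a different mechanism; `compilerBarrier`, `Bucket` and `readValueAfterBarrier` are hypothetical names. `std::atomic_signal_fence` restricts compiler reordering across the fence without emitting a hardware fence instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: stops the compiler from moving memory accesses
// across this point. No CPU fence instruction is emitted.
inline void compilerBarrier() {
  std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Simplified bucket; a real DenseMap bucket is a key/value pair.
struct Bucket {
  int Key;
  int Value;
};

// Reads a bucket field only after the barrier, so the load cannot be
// hoisted above it or merged with earlier accesses by the optimizer.
int readValueAfterBarrier(const Bucket *B) {
  compilerBarrier();
  return B->Value;
}
```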
Delta File
+9-0 llvm/include/llvm/ADT/DenseMap.h
+9-0 1 files

LLVM/project aeba098 clang/lib/CIR/CodeGen CIRGenCleanup.cpp, llvm/test/CodeGen/AMDGPU load-saddr-offset-imm.ll

rebase

Created using spr 1.3.7
Delta File
+456-103 llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
+160-64 llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
+8-91 clang/lib/CIR/CodeGen/CIRGenCleanup.cpp
+88-3 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+89-0 llvm/test/CodeGen/AMDGPU/load-saddr-offset-imm.ll
+0-64 llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
+801-325 67 files not shown
+1,634-578 73 files

LLVM/project 682add2 clang/lib/CIR/CodeGen CIRGenCleanup.cpp, llvm/test/CodeGen/AMDGPU load-saddr-offset-imm.ll

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta File
+456-103 llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll
+160-64 llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
+8-91 clang/lib/CIR/CodeGen/CIRGenCleanup.cpp
+88-3 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+89-0 llvm/test/CodeGen/AMDGPU/load-saddr-offset-imm.ll
+0-64 llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-runtime.ll
+801-325 67 files not shown
+1,634-578 73 files