LLVM/project 055ef48 llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll

[X86] Add tests showing failure to concat 256-bit rotate nodes on non-vlx targets (#203517)

These are widened in tablegen, so we don't need to limit them to VLX targets.
DeltaFile
+34 -0 llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+34 -0 1 files

LLVM/project 3b63f04 mlir/include/mlir/Dialect/Vector/IR VectorOps.td, mlir/include/mlir/Dialect/Vector/Utils VectorUtils.h

[mlir][vector] extend `createReadOrMaskedRead`/`createWriteOrMaskedWrite` with permutation map support (#202766)

Follow-up to #201180.

Extends the existing `createReadOrMaskedRead` and
`createWriteOrMaskedWrite` utilities in `VectorUtils` with two optional
trailing parameters:
- `ArrayRef<Value> indices`
- `AffineMap permutationMap`

The affine super-vectorizer is updated to call these functions instead
of constructing `TransferReadOp`/`TransferWriteOp` directly.

@banach-space, please correct me if this wasn't what you meant in the
previous PR.

---------

Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski at gmail.com>
DeltaFile
+83 -25 mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+25 -23 mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+9 -34 mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+12 -4 mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+9 -2 mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+138 -88 5 files

LLVM/project f77a290 llvm/lib/Transforms/Scalar MergeICmps.cpp, llvm/test/Transforms/MergeICmps/X86 no-gep-other-work.ll opaque-ptr.ll

[MergeICmps] Perform dereferenceability check with context (#202884)

To support deref-at-point semantics, we need to check dereferenceability
with a context instruction. Currently, MergeICmps does the check for
each individual load instruction. In this PR, I'm replacing this with a
check for all the loads that are part of a chain after they have been
collected, so we do the context-sensitive check only once.

The choice of context instruction is a bit tricky: Normally, this would
just be the first block in the chain (the "entry block"), but it's also
possible for the block to "do extra work", in which case it will get
split. If this happens, we should be checking at the splitting point, as
the extra work might be freeing the pointer.

Another question to consider here is whether we need to be concerned
about frees at all: After all, the original code will be accessing at
least one byte of the two objects, so doesn't that imply that it wasn't
freed already? This is indeed the case, as long as allocations cannot
shrink. This is something we currently don't allow, but I think it's
something we want to allow, so I'm going with the conservative treatment
here.
DeltaFile
+49 -10 llvm/lib/Transforms/Scalar/MergeICmps.cpp
+53 -4 llvm/test/Transforms/MergeICmps/X86/no-gep-other-work.ll
+1 -0 llvm/test/Transforms/MergeICmps/X86/opaque-ptr.ll
+103 -14 3 files

LLVM/project 229e547 llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange lcssa-incoming-value-is-not-instr.ll

[LoopInterchange] Fix crash when followLCSSA returns constant
DeltaFile
+70 -0 llvm/test/Transforms/LoopInterchange/lcssa-incoming-value-is-not-instr.ll
+7 -5 llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+77 -5 2 files

LLVM/project a4bdf9d llvm/lib/Target/DirectX/DirectXIRPasses DXILDebugInfo.cpp, llvm/test/CodeGen/DirectX/DebugInfo dbg-assign.ll dbg-value-arglist.ll

[DirectX] Lower DbgAssign to DbgValue (#200267)

DbgAssign is not representable in LLVM 3.7.
DeltaFile
+119 -15 llvm/lib/Target/DirectX/DirectXIRPasses/DXILDebugInfo.cpp
+45 -0 llvm/test/CodeGen/DirectX/DebugInfo/dbg-assign.ll
+44 -0 llvm/test/tools/dxil-dis/dbg-assign.ll
+43 -0 llvm/test/tools/dxil-dis/dbg-value-arglist.ll
+41 -0 llvm/test/CodeGen/DirectX/DebugInfo/dbg-value-arglist.ll
+292 -15 5 files

LLVM/project fb009c3 llvm/lib/Target/DirectX/DirectXIRPasses DXILDebugInfo.cpp, llvm/test/CodeGen/DirectX/DebugInfo di-commonblock.ll

[DirectX] Drop DICommonBlock metadata (#201948)

DICommonBlock cannot be represented in LLVM 3.7, but it is a scope
within a parent scope, so we can refer to the parent scope instead.
DeltaFile
+48 -0 llvm/test/CodeGen/DirectX/DebugInfo/di-commonblock.ll
+42 -0 llvm/test/tools/dxil-dis/di-commonblock.ll
+8 -0 llvm/lib/Target/DirectX/DirectXIRPasses/DXILDebugInfo.cpp
+98 -0 3 files

LLVM/project 03e33ec llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange lcssa-incoming-value-is-not-instr.ll

[LoopInterchange] Fix crash when followLCSSA returns constant
DeltaFile
+70 -0 llvm/test/Transforms/LoopInterchange/lcssa-incoming-value-is-not-instr.ll
+2 -2 llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+72 -2 2 files

LLVM/project 1f21f15 clang/test/CodeGen/X86 avx512f-builtins-constrained-cmp.c

[X86] - Prevent the wrong fold of x86_avx512_mask_cmp_ss/sd to fcmp (#202321)

The issue is based on a SemiAnalysisAI FuzzX report by @jlebar:
[058-mask-cmp-ss-imm-immediate-not-validated](https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/058-mask-cmp-ss-imm-immediate-not-validated/NOTES.md)

It is not a real bug, just a warning for a future implementation of the
mask_cmp → fcmp fold.

There is nothing to fix in the source code as of now. Added a few comments
and test cases for the future implementation of the folds.

@topperc @phoebewang
DeltaFile
+54 -0 clang/test/CodeGen/X86/avx512f-builtins-constrained-cmp.c
+54 -0 1 files

LLVM/project 8f069e7 llvm/docs/CommandGuide lit.rst, llvm/utils/lit/lit TestRunner.py

[lit] Add support for %{s:stem} substitution. (#202885)

It provides the source file name with the (last) extension removed.

This aligns with what is already available for %t and is needed
downstream.
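
As a rough illustration of what "file name with the (last) extension removed" means, here is a minimal Python sketch. The `source_stem` helper is hypothetical and is not lit's implementation; it also assumes the substitution yields only the last path component, which the commit message does not state.

```python
import os


def source_stem(source_path: str) -> str:
    """Return the last path component with only its final extension removed.

    Hypothetical helper for illustration; not taken from lit's TestRunner.py.
    """
    base = os.path.basename(source_path)   # drop the directory part (assumption)
    stem, _ext = os.path.splitext(base)    # strip only the last extension
    return stem


print(source_stem("llvm/test/CodeGen/X86/foo.ll"))  # foo
print(source_stem("archive.tar.gz"))                # archive.tar
```

Only the final extension is dropped, so a name such as `archive.tar.gz` keeps its `.tar` part.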
DeltaFile
+2 -0 llvm/utils/lit/lit/TestRunner.py
+2 -0 llvm/utils/lit/tests/substitutions.py
+1 -0 llvm/docs/CommandGuide/lit.rst
+5 -0 3 files

LLVM/project 2b4e89b llvm/test/CodeGen/X86 vector-interleaved-store-i16-stride-7.ll vector-interleaved-store-i16-stride-6.ll

[X86] combineConcatVectorOps - concat(permi(x,imm0),permi(y,imm1)) -> vpermv3(widen(x),m,widen(y)) (#203508)

Add handling for X86ISD::VPERMI nodes with different immediates -
folding to an X86ISD::VPERMV3 instead, replacing the
INSERT_SUBVECTOR+2xPERMI nodes with a mask load.

We don't need to concat the source operands - we have other folds that
will do this if beneficial - we just rely on (free) implicit widening.
DeltaFile
+3,204 -3,450 llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+1,905 -2,037 llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+812 -846 llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll
+638 -628 llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
+592 -660 llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+600 -624 llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll
+7,751 -8,245 2 files not shown
+7,779 -8,253 8 files

LLVM/project 9623ae8 clang/lib/AST/ByteCode Pointer.h Pointer.cpp

[clang][bytecode] Add `PtrView` for non-tracking pointers (#184129)

Currently, when creating a `Pointer` (of block type, which I will assume
here), the pointer will add itself (via its address) to its block's
pointer list. This way, a block always knows what pointers point to it.
That's important so we can handle the case when a block (which was e.g.
created for a local variable) is destroyed and we now need to update its
pointers.

However, since we always do this for all `Pointer` instances, it creates a
weird performance problem where we do this dance all the time for no
reason, e.g. consider `Pointer::stripBaseCasts()`:

https://github.com/llvm/llvm-project/blob/88693c49d9ac58a33af5978d31f6c70fe1d5b45b/clang/lib/AST/ByteCode/Pointer.h#L778-L783

This will add and remove the newly created pointer from the block's
pointer list every iteration. Other offenders are `Pointer::toRValue()`,
`EvaluationResult::checkFullyInitialized()` or
`Pointer::computeOffsetForComparison()`.
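
To make the cost concrete, the following is a toy Python model, not the clang code. `Block`, `TrackingPointer` and `View` are hypothetical names, and the sketch assumes that a non-tracking view simply skips registration with the block; the elided part of the message may describe the real design differently.

```python
class Block:
    """Toy stand-in for an interpreter block that tracks its pointers."""

    def __init__(self):
        self.pointers = []      # tracking pointers currently registered
        self.list_updates = 0   # how often the pointer list was modified


class TrackingPointer:
    """Registers itself with the block on creation; must be released."""

    def __init__(self, block, offset=0):
        self.block = block
        self.offset = offset
        block.pointers.append(self)
        block.list_updates += 1

    def release(self):
        self.block.pointers.remove(self)
        self.block.list_updates += 1


class View:
    """Non-tracking view: same data, but the block never learns about it."""

    def __init__(self, block, offset=0):
        self.block = block
        self.offset = offset


def walk_with_tracking(block, steps):
    for offset in range(steps):
        tmp = TrackingPointer(block, offset)  # added to the list...
        tmp.release()                         # ...and removed again


def walk_with_views(block, steps):
    for offset in range(steps):
        View(block, offset)  # no pointer-list traffic at all


tracked, viewed = Block(), Block()
walk_with_tracking(tracked, 4)
walk_with_views(viewed, 4)
print(tracked.list_updates, viewed.list_updates)  # 8 0
```

In this model every temporary tracking pointer costs two list updates per loop iteration, while the view costs none.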

    [8 lines not shown]
DeltaFile
+371 -210 clang/lib/AST/ByteCode/Pointer.h
+65 -67 clang/lib/AST/ByteCode/Pointer.cpp
+24 -23 clang/lib/AST/ByteCode/InterpBuiltin.cpp
+20 -21 clang/lib/AST/ByteCode/EvaluationResult.cpp
+18 -15 clang/lib/AST/ByteCode/Interp.cpp
+11 -10 clang/lib/AST/ByteCode/InterpBuiltinBitCast.cpp
+509 -346 3 files not shown
+535 -350 9 files

LLVM/project 67444b6 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mfma_scale
DeltaFile
+9,287 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
+4,206 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
+7 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+13,500 -2 3 files

LLVM/project 5e65f12 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging

It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
DeltaFile
+1 -50 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -44 llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1 -94 2 files

LLVM/project 375a36c llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Add helper for getLimit
DeltaFile
+9 -8 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+9 -8 1 files

LLVM/project fdbb553 clang/test/Sema/aarch64-sme2p3-intrinsics acle_sme2p3_target.c, clang/test/Sema/aarch64-sve2p3-intrinsics acle_sve2p3_target.c

fixup! Last few adjustments based on CR comments
DeltaFile
+0 -54 clang/test/Sema/aarch64-sve2p3-intrinsics/acle_sve2p3_target.c
+0 -20 clang/test/Sema/aarch64-sme2p3-intrinsics/acle_sme2p3_target.c
+1 -0 llvm/include/llvm/IR/IntrinsicsAArch64.td
+1 -74 3 files

LLVM/project 6d73d5c flang/lib/Lower/OpenMP OpenMP.cpp

address review comments
DeltaFile
+8 -6 flang/lib/Lower/OpenMP/OpenMP.cpp
+8 -6 1 files

LLVM/project b08a295 flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Optimizer/OpenMP DoConcurrentConversion.cpp

[Flang][OpenMP] Add combined construct information

This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.

This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.

Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
DeltaFile
+1,094 -0 flang/test/Lower/OpenMP/compound.f90
+56 -20 flang/lib/Lower/OpenMP/OpenMP.cpp
+6 -6 flang/test/Transforms/DoConcurrent/use_loop_bounds_in_body.f90
+5 -5 flang/test/Transforms/DoConcurrent/local_device.mlir
+4 -4 flang/test/Transforms/DoConcurrent/reduce_device.mlir
+6 -2 flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+1,171 -37 27 files not shown
+1,225 -71 33 files

LLVM/project 9e60e47 mlir/include/mlir/Dialect/OpenMP OpenMPOps.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Explicit tagging of combined constructs

Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.

This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated in the
host.

So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error-prone,
computationally expensive and it can't really work in the general case.
On the other hand, a compiler frontend can easily tell the difference

    [10 lines not shown]
DeltaFile
+137 -134 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+123 -76 mlir/test/Dialect/OpenMP/invalid.mlir
+106 -0 mlir/test/Dialect/OpenMP/invalid-interface.mlir
+33 -33 mlir/test/Dialect/OpenMP/ops.mlir
+29 -33 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+24 -24 mlir/test/Target/LLVMIR/openmp-teams-clauses-trunc-ext.mlir
+452 -300 35 files not shown
+565 -370 41 files

LLVM/project 0e5c89e flang/lib/Lower/OpenMP OpenMP.cpp

address review comments
DeltaFile
+14 -13 flang/lib/Lower/OpenMP/OpenMP.cpp
+14 -13 1 files

LLVM/project 8f6cb73 cmake/Modules GetTripleCMakeSystemName.cmake

Handle more cases from the chart
DeltaFile
+20 -3 cmake/Modules/GetTripleCMakeSystemName.cmake
+20 -3 1 files

LLVM/project 355735e cmake/Modules GetTripleCMakeSystemName.cmake

Handle mingw
DeltaFile
+1 -1 cmake/Modules/GetTripleCMakeSystemName.cmake
+1 -1 1 files

LLVM/project 23d906e openmp/runtime/cmake LibompExports.cmake

[openmp] Fix export file paths (#202692)

The files omp_lib.h and omp-tools.h are the outputs of two
configure_file invocations which specify the full path of the outputs.
Use these full paths in LibompExports.cmake so they can actually be
found.
DeltaFile
+2 -2 openmp/runtime/cmake/LibompExports.cmake
+2 -2 1 files

LLVM/project 6505f14 llvm/include/llvm/CodeGen UnreachableBlockElim.h, llvm/lib/Target/AMDGPU AMDGPU.h SIWholeQuadMode.h

[NPM] Make few more passes Required
DeltaFile
+4 -4 llvm/lib/Target/AMDGPU/AMDGPU.h
+2 -2 llvm/include/llvm/CodeGen/UnreachableBlockElim.h
+1 -1 llvm/lib/Target/AMDGPU/SIWholeQuadMode.h
+1 -1 llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h
+1 -1 llvm/lib/Target/AMDGPU/SILowerWWMCopies.h
+1 -1 llvm/lib/Target/AMDGPU/SILowerSGPRSpills.h
+10 -10 13 files not shown
+23 -23 19 files

LLVM/project 88429ca cross-project-tests/debuginfo-tests/dexter/dex/dextIR StepIR.py, cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/debugging then_at_frame.cpp

[Dexter] Add at_frame_idx to check values in frames above current

This patch adds a new attribute for !and nodes, `at_frame_idx`, which
matches against frames above its parent node; for example, in the script:

```
!where {function: foo}:
  !where {function: bar}:
    !and {at_frame_idx: 1}:
      !value x: 0
```

The `!value x` node checks the value of 'x' in 'foo' while the debugger is
inside 'bar'. Use of this attribute comes with some restrictions: a !where
node can never be nested under a !and{at_frame_idx} node, and neither can
another !and{at_frame_idx} node.
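
The frame lookup in the example can be pictured with a small Python model. This is a sketch under stated assumptions, not Dexter's code: `Frame` and `value_at_frame` are hypothetical names, the stack is assumed to be ordered innermost-first, and the index is taken relative to the current frame, which is where the parent `!where {function: bar}` node matches in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    function: str
    variables: dict = field(default_factory=dict)


def value_at_frame(stack, at_frame_idx, name):
    """Look up `name` in the frame `at_frame_idx` levels above the current one.

    `stack` is ordered innermost-first, so index 0 is the current frame.
    Returns None when the frame or the variable does not exist.
    """
    if at_frame_idx < 0 or at_frame_idx >= len(stack):
        return None
    return stack[at_frame_idx].variables.get(name)


# The debugger is stopped in 'bar', which was called from 'foo'.
stack = [Frame("bar", {"y": 7}), Frame("foo", {"x": 0})]
print(value_at_frame(stack, 1, "x"))  # 0
```

With `at_frame_idx: 1` the lookup reads 'x' from 'foo' even though execution is paused in 'bar'.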
DeltaFile
+61 -0 cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/Inputs/rewrite_at_frame_expected.cpp
+60 -0 cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/debugging/then_at_frame.cpp
+49 -0 cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/rewrite_at_frame.cpp
+46 -0 cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/evaluation/eval_at_frame.cpp
+26 -13 cross-project-tests/debuginfo-tests/dexter/dex/dextIR/StepIR.py
+33 -0 cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/parser/reject-bad-at_frame_idx.test
+275 -13 13 files not shown
+364 -53 19 files

LLVM/project a782830 llvm/docs AMDGPUUsage.rst

Clean-up docs
DeltaFile
+3 -3 llvm/docs/AMDGPUUsage.rst
+3 -3 1 files

LLVM/project 6684278 . .git-blame-ignore-revs

Add "Split clang/lib/CodeGen/CGBuiltin.cpp" to .git-blame-ignore-revs (#203419)
DeltaFile
+3 -0 .git-blame-ignore-revs
+3 -0 1 files

LLVM/project 663bcb3 clang/lib/CodeGen CodeGenFunction.h, clang/lib/CodeGen/TargetBuiltins ARM.cpp

[SVE] Replace unnecessary Intrinsic::aarch64_sve_ptrue construction. (#203349)

Prefer ConstantInt::getTrue() over sve.ptrue(31) when creating
all-active boolean vectors.
DeltaFile
+24 -46 llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
+22 -30 clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_dupq.c
+14 -23 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+6 -14 llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+1 -8 clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+0 -1 clang/lib/CodeGen/CodeGenFunction.h
+67 -122 6 files

LLVM/project ae2ef21 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mfma_scale
DeltaFile
+9,306 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
+4,210 -1 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
+7 -0 llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+13,523 -2 3 files

LLVM/project 33c56b4 clang/cmake/modules ClangConfig.cmake.in, cmake/Modules GetTripleCMakeSystemName.cmake

runtimes: Pass CMAKE_SYSTEM_NAME based on target triple

Compute the cmake system name from the target triple, rather
than passing through the host's. This is primarily to stop
forwarding OSX specific cmake variables.
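
The idea of deriving a system name from a triple can be sketched in Python as follows. This is only an illustration: the lookup table, the `cmake_system_name` function and the `Generic` fallback are assumptions, and the actual logic in `GetTripleCMakeSystemName.cmake` may differ. The returned strings are standard `CMAKE_SYSTEM_NAME` values.

```python
# Hypothetical lookup table: triple OS prefix -> CMAKE_SYSTEM_NAME value.
_OS_TO_CMAKE = {
    "linux": "Linux",
    "darwin": "Darwin",
    "macos": "Darwin",
    "windows": "Windows",
    "mingw": "Windows",
    "freebsd": "FreeBSD",
}


def cmake_system_name(triple: str, default: str = "Generic") -> str:
    """Map a target triple to a CMake system name (illustrative only)."""
    parts = triple.lower().split("-")
    # Skip the architecture and scan the remaining components for a known OS.
    for component in parts[1:]:
        for prefix, name in _OS_TO_CMAKE.items():
            if component.startswith(prefix):
                return name
    # Assumption: triples without a recognized OS fall back to "Generic".
    return default


print(cmake_system_name("x86_64-unknown-linux-gnu"))  # Linux
print(cmake_system_name("arm64-apple-macosx14.0"))    # Darwin
print(cmake_system_name("x86_64-w64-mingw32"))        # Windows
```

Because the name comes from the target rather than the host, a macOS host building for a non-Darwin target would no longer select Darwin-specific settings in this model.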

This fixes build failures when trying to build gpu libc on mac
hosts. Previously it would fail on several issues, starting with
an unused argument -mmacos-version-min error, followed by other
errors caused by passing -isysroot.

Secondarily, restrict the cmake imported targets when cross compiling.
Without this, the amdgpu build prints many cmake warnings about the
target not supporting shared libraries.

Claude did most of the actual work, though it required quite a few
rounds of prodding to get it into the right place. In particular, it
took care of handling all of the platform names that cmake recognizes
from the triple.

    [2 lines not shown]
DeltaFile
+32 -37 llvm/cmake/modules/LLVMConfig.cmake.in
+65 -0 cmake/Modules/GetTripleCMakeSystemName.cmake
+25 -2 llvm/cmake/modules/LLVMExternalProjectUtils.cmake
+4 -1 clang/cmake/modules/ClangConfig.cmake.in
+0 -4 llvm/runtimes/CMakeLists.txt
+126 -44 5 files

LLVM/project 056b4a7 llvm/lib/Target/SPIRV SPIRVNonSemanticDebugHandler.cpp SPIRVNonSemanticDebugHandler.h, llvm/test/CodeGen/SPIRV/debug-info debug-type-vector-skipped.ll debug-type-vector.ll

Emit debug type vector (#200056)

This emits `DebugTypeVector` for HLSL `float4`-style vectors.

`partitionTypes()` separates vector `DICompositeType` nodes from basic
types so both can be visited in a single pass over the debug metadata. A
new `emitDebugTypeVector()` helper builds the `DebugTypeVector`
instruction and looks up the base-type register in `DebugTypeRegs`.

The helper skips four cases silently:

1. Absent or non-`DIBasicType` base type: only scalar element types are
supported for now.
2. Base type not yet emitted: the type was not reached during the
`DebugTypeBasic` pass.
3. Multiple subranges: `DebugTypeVector` models one-dimensional vectors
only (NSDI cannot encode multi-subrange types).
4. Non-constant subrange count: NSDI cannot represent variable-length
counts.
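
The four skip cases above can be modelled as a single predicate. The Python below is a toy sketch, not the SPIR-V backend code: `BasicType`, `VectorType` and `should_emit_vector` are hypothetical names, a constant subrange count is modelled as an `int` and a variable one as a `str`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class BasicType:
    name: str


@dataclass
class VectorType:
    base: Optional[object]                      # element type, may be absent
    subrange_counts: Sequence[Union[int, str]]  # int = constant, str = variable


def should_emit_vector(vec: VectorType, emitted_basic_types: set) -> bool:
    """Return False for each of the four silently skipped cases."""
    # 1. Absent or non-basic base type.
    if not isinstance(vec.base, BasicType):
        return False
    # 2. Base type not yet emitted by the basic-type pass.
    if vec.base not in emitted_basic_types:
        return False
    # 3. Multiple subranges. This sketch also rejects zero subranges,
    #    which is an added assumption.
    if len(vec.subrange_counts) != 1:
        return False
    # 4. Non-constant subrange count.
    if not isinstance(vec.subrange_counts[0], int):
        return False
    return True


float_ty = BasicType("float")
emitted = {float_ty}
print(should_emit_vector(VectorType(float_ty, [4]), emitted))    # True
print(should_emit_vector(VectorType(float_ty, ["n"]), emitted))  # False
```

A `float4`-style vector with an already emitted scalar base passes all four checks; anything else is skipped without a diagnostic in this model.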

    [2 lines not shown]
DeltaFile
+66 -0 llvm/test/CodeGen/SPIRV/debug-info/debug-type-vector-skipped.ll
+66 -0 llvm/test/CodeGen/SPIRV/debug-info/debug-type-vector.ll
+49 -7 llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.cpp
+14 -1 llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.h
+195 -8 4 files