LLVM/project 75383d6clang/test/Format lit.local.cfg

clang-format/test: Anchor the empty .clang-format-ignore to test_exec_root (#203444)

The test suite's lit.local.cfg creates an empty .clang-format-ignore at
config discovery time to protect the multiple-inputs[-inplace].cpp tests
that work on files in temporary locations.

This file should be written to where the tests execute instead of the
CWD during config discovery. The CWD might not even be an ancestor of
where the tests execute, and it might be the repository root which does
have a .clang-format-ignore that is incorrectly clobbered without this
change.

An alternative would be to just fix the tests that need to be protected,
but having a blanket guard like this does seem like a reasonable thing
to do.

Fixes: 915de1a5889c ("Generate empty .clang-format-ignore before running
tests (#136154)")
DeltaFile
+3-3clang/test/Format/lit.local.cfg
+3-31 files
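
The anchoring idea in the lit.local.cfg change above can be sketched in isolation. This is a minimal sketch, not the code from the commit: `write_empty_ignore` is a hypothetical helper, and it assumes the caller passes in the directory where the tests execute (in a real lit config that would be the test exec root) instead of relying on the current working directory.

```python
import os
import tempfile


def write_empty_ignore(test_exec_root):
    """Create an empty .clang-format-ignore under test_exec_root.

    Hypothetical helper: in a real lit.local.cfg the directory would come
    from the lit config object rather than from a function argument.
    """
    os.makedirs(test_exec_root, exist_ok=True)
    path = os.path.join(test_exec_root, ".clang-format-ignore")
    # Mode "w" creates the file or truncates an existing one, so it ends up empty.
    with open(path, "w"):
        pass
    return path


if __name__ == "__main__":
    # Demo: anchor the file to an explicit directory, never to os.getcwd().
    exec_root = os.path.join(tempfile.mkdtemp(), "test", "Format")
    print(write_empty_ignore(exec_root))
```

Because the path is built from an explicit directory, a `.clang-format-ignore` that happens to live in the process's working directory is never touched.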

LLVM/project daa9ecfllvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll

[X86] combineConcatVectorOps - concat(roti(x,i),roti(y,i)) -> roti(concat(x,y),i) on non-vlx targets (#203528)

128/256-bit rotates are widened in tablegen, so we don't need to limit
this fold to VLX targets - any AVX512 target can perform these rotates.

We already have test coverage to ensure 128-bit XOP rotates don't get
concatenated to 256-bit
DeltaFile
+3-4llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+1-1llvm/lib/Target/X86/X86ISelLowering.cpp
+4-52 files
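
The fold in the commit above rests on a simple identity: rotating every lane by the same immediate commutes with concatenating the vectors. A minimal sketch of that identity, modelling vectors as Python lists of 32-bit lanes (the helper names `rotl32` and `rot_lanes` are illustrative and not taken from LLVM):

```python
def rotl32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def rot_lanes(vec, amount):
    """Apply the same rotate-by-immediate to every lane of a vector."""
    return [rotl32(lane, amount) for lane in vec]


def concat(lo, hi):
    """Model vector concatenation as list concatenation."""
    return lo + hi


if __name__ == "__main__":
    x = [0x00000001, 0x80000000, 0xDEADBEEF, 0x00000007]
    y = [0xFFFFFFFF, 0x12345678, 0x00000000, 0xCAFEBABE]
    # Rotating the halves separately matches rotating the concatenation.
    print(concat(rot_lanes(x, 5), rot_lanes(y, 5)) == rot_lanes(concat(x, y), 5))
```

The identity only holds when both halves use the same rotate amount, which is why the fold is written with a shared immediate `i`.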

LLVM/project b0cc322mlir/test/IR test-func-erase-arg-error.mlir, mlir/test/lib/IR TestFunc.cpp

[mlir] Check for argument uses in test-func-erase-arg pass (#203367)

The -test-func-erase-arg pass crashed when erasing arguments that still
had uses. Diagnose every such argument and fail the pass without
erasing.

Fixes https://github.com/llvm/llvm-project/issues/203218

Assisted-by: Claude (Claude Code)
DeltaFile
+13-3mlir/test/lib/IR/TestFunc.cpp
+15-0mlir/test/IR/test-func-erase-arg-error.mlir
+28-32 files
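
The "diagnose every offending argument, then fail without erasing" behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not the MLIR implementation: `erase_args`, the use-count list, and the diagnostic wording are all hypothetical.

```python
def erase_args(args, use_counts, to_erase):
    """Erase the arguments at the indices in to_erase unless any still has uses.

    Returns (ok, remaining_args, diagnostics). On failure nothing is erased
    and one diagnostic is produced for every argument that still has uses.
    """
    diagnostics = [
        f"argument {index} still has uses"  # hypothetical message text
        for index in sorted(to_erase)
        if use_counts[index] > 0
    ]
    if diagnostics:
        # Fail the whole operation and leave the argument list untouched.
        return False, list(args), diagnostics
    remaining = [arg for index, arg in enumerate(args) if index not in to_erase]
    return True, remaining, []


if __name__ == "__main__":
    print(erase_args(["a", "b", "c"], [0, 2, 1], {1, 2}))
    print(erase_args(["a", "b", "c"], [0, 0, 1], {0, 1}))
```

Collecting all diagnostics before deciding keeps the operation all-or-nothing: either every requested argument is erased or none is.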

LLVM/project 5096057flang/lib/Semantics check-omp-structure.cpp, flang/test/Semantics/OpenMP linear-clause01.f90

[Flang][OpenMP] Fix crash when common block name is used in LINEAR clause (#203250)

  Using a common block name in a LINEAR clause (e.g. linear(/c/)) caused
  a symbol-must-have-a-type crash during lowering. The semantic checker
  was not emitting an error because GetSymbolsInObjectList expands /c/
  to its member variables before the check runs, so the
  symbol->has<CommonBlockDetails>() guard was never reached.

  Fix by checking for common block names directly on the OmpObjectList
  before the expansion, where the Name variant of OmpObject still holds
  the common block symbol.

  Fixes #202329
DeltaFile
+14-0flang/test/Semantics/OpenMP/linear-clause01.f90
+10-0flang/lib/Semantics/check-omp-structure.cpp
+24-02 files
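
The ordering problem described above (the check ran after the common block had already been expanded into its members) can be illustrated with a toy model. Everything here is an assumption made for illustration: `Obj`, `expand`, `check_linear`, and the error text are hypothetical and do not mirror Flang's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy stand-in for one entry of a clause's object list."""
    name: str
    is_common_block: bool = False
    members: list = field(default_factory=list)


def expand(objects):
    """Replace each common block with its member names."""
    names = []
    for obj in objects:
        names.extend(obj.members if obj.is_common_block else [obj.name])
    return names


def check_linear(objects):
    """Check for common blocks on the unexpanded list, then expand."""
    errors = [
        f"common block /{obj.name}/ is not allowed here"  # hypothetical text
        for obj in objects
        if obj.is_common_block
    ]
    return errors, expand(objects)


if __name__ == "__main__":
    clause = [Obj("x"), Obj("c", is_common_block=True, members=["a", "b"])]
    print(expand(clause))        # the common block is no longer visible here
    print(check_linear(clause))  # the check sees it before expansion
```

After expansion only plain variable names remain, so a check that runs on the expanded list has no way to notice that a common block was named.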

LLVM/project 1badbb2llvm/lib/Target/AMDGPU SIISelLowering.cpp

[AMDGPU] Fix copy-paste in hasNon16BitAccesses OpIs16Bit check (#203499)

OpIs16Bit tested the width of TempOtherOp instead of TempOp, mismatching
the symmetric OtherOpIs16Bit clause.

No miscompiles or other direct issues due to this have been observed so far.
DeltaFile
+1-1llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+1-11 files

LLVM/project 7d42028lldb/packages/Python/lldbsuite/test/make Makefile.rules

[lldb][Windows] Make RM_RF a no-op on an empty argument and swallow errors (#203040)

This patch makes the Windows `RM_RF` a no-op on an empty argument and
swallow errors, matching Unix `rm -rf`. This fixes issues in swiftlang
on fresh builds.

This is needed for https://github.com/swiftlang/llvm-project/pull/13180
DeltaFile
+1-1lldb/packages/Python/lldbsuite/test/make/Makefile.rules
+1-11 files
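
The semantics that the RM_RF commit above aims for (do nothing on an empty argument, never fail) can be modelled outside of make. The real change is to a Makefile macro; the Python function below is only a hypothetical model of the intended `rm -rf`-like behaviour.

```python
import os
import shutil
import tempfile


def rm_rf(path):
    """Remove path recursively; do nothing for an empty argument or on errors."""
    if not path:
        # An empty argument is a no-op.
        return
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)
    except OSError:
        # Swallow errors such as a missing file.
        pass


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "artifact.o"), "w").close()
    rm_rf("")                                # no-op
    rm_rf(os.path.join(scratch, "missing"))  # error swallowed
    rm_rf(scratch)                           # directory removed
    print(os.path.exists(scratch))
```

Treating the empty argument as a no-op matters on fresh builds, where the variable naming the files to delete may expand to nothing.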

LLVM/project f3dcc7fllvm/test/Analysis/DependenceAnalysis gcd-miv-addrec-wrap.ll

[DA] Add test for addrec can wrap in GCD MIV (NFC)
DeltaFile
+73-0llvm/test/Analysis/DependenceAnalysis/gcd-miv-addrec-wrap.ll
+73-01 files

LLVM/project 7c0a3a5clang/lib/CodeGen CGCUDANV.cpp, clang/lib/Driver/ToolChains MSVC.cpp

[PGO][HIP] Fix HIP device profile collection and sections emission (#202095)

Several related HIP device-PGO fixes:

Windows device collection. HIP rejects a hipMemcpy that reads past the bounds
of a symbol registered with __hipRegisterVar, but device data/counters/names
live in merged linker sections. Register a separate shadow for each device
data, counters, and names symbol and copy each one by its exact
hipGetSymbolSize size; this also lets static TUs with several kernels keep all
their profile data. Open the device profile file in binary mode and pass the
device names to the correct lprofWriteDataImpl arguments so llvm-profdata can
read the raw profile. Open the versioned amdhip64_7.dll first, falling back to

    [41 lines not shown]
DeltaFile
+650-107compiler-rt/lib/profile/InstrProfilingPlatformROCm.cpp
+79-45clang/lib/CodeGen/CGCUDANV.cpp
+74-3llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+48-6clang/test/CodeGenHIP/offload-pgo-sections.hip
+5-6compiler-rt/lib/profile/InstrProfilingFile.c
+7-1clang/lib/Driver/ToolChains/MSVC.cpp
+863-1683 files not shown
+875-1699 files

LLVM/project fc15b71clang/lib/CodeGen/Targets SystemZ.cpp

[SystemZ] Rename GetSingleElementType to getSingleElementType (#203078)

# Refactor: Rename GetSingleElementType to getSingleElementType in SystemZ ABI

## Summary
This PR refactors the SystemZ ABI code to follow LLVM coding standards
by renaming `GetSingleElementType` to `getSingleElementType` (camelCase
convention).

## Motivation
Rename to avoid having `GetSingleElementType` in one class and
`getSingleElementType` in another.
DeltaFile
+6-6clang/lib/CodeGen/Targets/SystemZ.cpp
+6-61 files

LLVM/project 055ef48llvm/test/CodeGen/X86 vector-shuffle-combining-avx512f.ll

[X86] Add tests showing failure to concat 256-bit rotate nodes on non-vlx targets (#203517)

These are widened in tablegen, so we don't need to limit these to VLX targets
DeltaFile
+34-0llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
+34-01 files

LLVM/project 0eaca71mlir/include/mlir-c/Dialect LLVM.h, mlir/include/mlir/Dialect/LLVMIR LLVMAttrDefs.td

fix: rename `MDFunc` to `MDValue`
DeltaFile
+9-9mlir/include/mlir-c/Dialect/LLVM.h
+9-9mlir/lib/Bindings/Python/DialectLLVM.cpp
+8-8mlir/include/mlir/Dialect/LLVMIR/LLVMAttrDefs.td
+8-8mlir/lib/CAPI/Dialect/LLVM.cpp
+7-7mlir/test/python/dialects/llvm.py
+4-4mlir/test/Dialect/LLVMIR/roundtrip.mlir
+45-455 files not shown
+55-5511 files

LLVM/project 42a87bdmlir/test/Dialect/LLVMIR call-intrin.mlir

fix: delete invalid test (read_global_ref_metadata)
DeltaFile
+0-13mlir/test/Dialect/LLVMIR/call-intrin.mlir
+0-131 files

LLVM/project 3b63f04mlir/include/mlir/Dialect/Vector/IR VectorOps.td, mlir/include/mlir/Dialect/Vector/Utils VectorUtils.h

[mlir][vector] extend `createReadOrMaskedRead`/`createWriteOrMaskedWrite` with permutation map support (#202766)

Follow-up to #201180.

Extends the existing `createReadOrMaskedRead` and
`createWriteOrMaskedWrite` utilities in `VectorUtils` with two optional
trailing parameters:
- `ArrayRef<Value> indices`
- `AffineMap permutationMap`

The affine super-vectorizer is updated to call these functions instead
of constructing `TransferReadOp`/`TransferWriteOp` directly.

@banach-space, please correct me if this wasn't what you meant in the
previous PR.

---------

Signed-off-by: Federico Bruzzone <federico.bruzzone.i at gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski at gmail.com>
DeltaFile
+83-25mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+25-23mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+9-34mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp
+12-4mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+9-2mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+138-885 files

LLVM/project f77a290llvm/lib/Transforms/Scalar MergeICmps.cpp, llvm/test/Transforms/MergeICmps/X86 no-gep-other-work.ll opaque-ptr.ll

[MergeICmps] Perform dereferenceability check with context (#202884)

To support deref-at-point semantics, we need to check dereferenceability
with a context instruction. Currently, MergeICmps does the check for
each individual load instruction. In this PR, I'm replacing this with a
check for all the loads that are part of a chain after they have been
collected, so we do the context-sensitive check only once.

The choice of context instruction is a bit tricky: Normally, this would
just be the first block in the chain (the "entry block"), but it's also
possible for the block to "do extra work", in which case it will get
split. If this happens, we should be checking at the splitting point, as
the extra work might be freeing the pointer.

Another question to consider here is whether we need to be concerned
about frees at all: After all, the original code will be accessing at
least one byte of the two objects, so doesn't that imply that it wasn't
freed already? This is indeed the case, as long as allocations cannot
shrink. This is something we currently don't allow, but I think it's
something we want to allow, so I'm going with the conservative treatment
here.
DeltaFile
+49-10llvm/lib/Transforms/Scalar/MergeICmps.cpp
+53-4llvm/test/Transforms/MergeICmps/X86/no-gep-other-work.ll
+1-0llvm/test/Transforms/MergeICmps/X86/opaque-ptr.ll
+103-143 files

LLVM/project 229e547llvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange lcssa-incoming-value-is-not-instr.ll

[LoopInterchange] Fix crash when followLCSSA returns constant
DeltaFile
+70-0llvm/test/Transforms/LoopInterchange/lcssa-incoming-value-is-not-instr.ll
+7-5llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+77-52 files

LLVM/project a4bdf9dllvm/lib/Target/DirectX/DirectXIRPasses DXILDebugInfo.cpp, llvm/test/CodeGen/DirectX/DebugInfo dbg-assign.ll dbg-value-arglist.ll

[DirectX] Lower DbgAssign to DbgValue (#200267)

DbgAssign is not representable in LLVM 3.7.
DeltaFile
+119-15llvm/lib/Target/DirectX/DirectXIRPasses/DXILDebugInfo.cpp
+45-0llvm/test/CodeGen/DirectX/DebugInfo/dbg-assign.ll
+44-0llvm/test/tools/dxil-dis/dbg-assign.ll
+43-0llvm/test/tools/dxil-dis/dbg-value-arglist.ll
+41-0llvm/test/CodeGen/DirectX/DebugInfo/dbg-value-arglist.ll
+292-155 files

LLVM/project fb009c3llvm/lib/Target/DirectX/DirectXIRPasses DXILDebugInfo.cpp, llvm/test/CodeGen/DirectX/DebugInfo di-commonblock.ll

[DirectX] Drop DICommonBlock metadata (#201948)

DICommonBlock cannot be represented in LLVM 3.7, but it is a scope
within a parent scope, so we can refer to the parent scope instead.
DeltaFile
+48-0llvm/test/CodeGen/DirectX/DebugInfo/di-commonblock.ll
+42-0llvm/test/tools/dxil-dis/di-commonblock.ll
+8-0llvm/lib/Target/DirectX/DirectXIRPasses/DXILDebugInfo.cpp
+98-03 files

LLVM/project 03e33ecllvm/lib/Transforms/Scalar LoopInterchange.cpp, llvm/test/Transforms/LoopInterchange lcssa-incoming-value-is-not-instr.ll

[LoopInterchange] Fix crash when followLCSSA returns constant
DeltaFile
+70-0llvm/test/Transforms/LoopInterchange/lcssa-incoming-value-is-not-instr.ll
+2-2llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+72-22 files

LLVM/project 1f21f15clang/test/CodeGen/X86 avx512f-builtins-constrained-cmp.c

[X86] - Prevent the wrong fold of x86_avx512_mask_cmp_ss/sd to fcmp (#202321)

The issue is based on the SemiAnalysisAI FuzzX report by @jlebar:
[058-mask-cmp-ss-imm-immediate-not-validated](https://github.com/SemiAnalysisAI/FuzzX/blob/master/x86/bugs/058-mask-cmp-ss-imm-immediate-not-validated/NOTES.md)

It is not a real bug, just a warning for the future fold implementation
of mask_cmp → fcmp.

There is nothing to fix in the source code as of now. Added a few comments
and test cases for the future implementation of the folds.

@topperc @phoebewang
DeltaFile
+54-0clang/test/CodeGen/X86/avx512f-builtins-constrained-cmp.c
+54-01 files

LLVM/project 8f069e7llvm/docs/CommandGuide lit.rst, llvm/utils/lit/lit TestRunner.py

[lit] Add support for %{s:stem} substitution. (#202885)

It provides the source file name with the (last) extension removed.

This is to align with what is available for %t and actually needed
downstream.
DeltaFile
+2-0llvm/utils/lit/lit/TestRunner.py
+2-0llvm/utils/lit/tests/substitutions.py
+1-0llvm/docs/CommandGuide/lit.rst
+5-03 files
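
The description of `%{s:stem}` above ("the source file name with the (last) extension removed") can be illustrated as follows. This is a sketch of that description only: `source_stem` is a hypothetical helper, and it assumes the value is the base name without its directory, which the commit message does not spell out.

```python
import os


def source_stem(source_path):
    """Return the file name of source_path with only its last extension removed."""
    base = os.path.basename(source_path)
    # splitext splits off the last extension only: "foo.tar.gz" -> "foo.tar".
    stem, _extension = os.path.splitext(base)
    return stem


if __name__ == "__main__":
    for path in ("clang/test/Format/example.cpp", "archive/foo.tar.gz", "README"):
        print(path, "->", source_stem(path))
```

Removing only the last extension means a name with several dots keeps everything before the final one.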

LLVM/project 2b4e89bllvm/test/CodeGen/X86 vector-interleaved-store-i16-stride-7.ll vector-interleaved-store-i16-stride-6.ll

[X86] combineConcatVectorOps - concat(permi(x,imm0),permi(y,imm1)) -> vpermv3(widen(x),m,widen(y)) (#203508)

Add handling for X86ISD::VPERMI nodes with different immediates by
folding to an X86ISD::VPERMV3 instead, replacing the
INSERT_SUBVECTOR+2xPERMI nodes with a mask load.

We don't need to concat the source operands - we have other folds that
will do this if beneficial - we just rely on (free) implicit widening.
DeltaFile
+3,204-3,450llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
+1,905-2,037llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll
+812-846llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll
+638-628llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll
+592-660llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
+600-624llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll
+7,751-8,2452 files not shown
+7,779-8,2538 files

LLVM/project 9623ae8clang/lib/AST/ByteCode Pointer.h Pointer.cpp

[clang][bytecode] Add `PtrView` for non-tracking pointers (#184129)

Currently, when creating a `Pointer` (of block type, which I will assume
here), the pointer will add itself (via its address) to its block's
pointer list. This way, a block always knows what pointers point to it.
That's important so we can handle the case when a block (which was e.g.
created for a local variable) is destroyed and we now need to update its
pointers.

However, since we always do this for all `Pointer` instances, it creates a
weird performance problem where we do this dance all the time for no
reason, e.g. consider `Pointer::stripBaseCasts()`:

https://github.com/llvm/llvm-project/blob/88693c49d9ac58a33af5978d31f6c70fe1d5b45b/clang/lib/AST/ByteCode/Pointer.h#L778-L783

This will add and remove the newly created pointer from the block's
pointer list every iteration. Other offenders are `Pointer::toRValue()`,
`EvaluationResult::checkFullyInitialized()` or
`Pointer::computeOffsetForComparison()`.

    [8 lines not shown]
DeltaFile
+371-210clang/lib/AST/ByteCode/Pointer.h
+65-67clang/lib/AST/ByteCode/Pointer.cpp
+24-23clang/lib/AST/ByteCode/InterpBuiltin.cpp
+20-21clang/lib/AST/ByteCode/EvaluationResult.cpp
+18-15clang/lib/AST/ByteCode/Interp.cpp
+11-10clang/lib/AST/ByteCode/InterpBuiltinBitCast.cpp
+509-3463 files not shown
+535-3509 files
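
The cost described in the `PtrView` commit above (every temporary `Pointer` registers with and unregisters from its block) can be shown with a toy model. This is a sketch under assumptions, not the clang bytecode interpreter: the `Block` bookkeeping, the `registrations` counter, and the two walk functions are hypothetical, and the real `PtrView` design is partly elided in the excerpt above.

```python
class Block:
    """Storage that remembers which tracking pointers refer to it."""

    def __init__(self):
        self.pointers = []
        self.registrations = 0  # counts registrations, for illustration only


class Pointer:
    """Tracking pointer: registers itself with its block on creation."""

    def __init__(self, block, offset=0):
        self.block = block
        self.offset = offset
        block.pointers.append(self)
        block.registrations += 1

    def release(self):
        self.block.pointers.remove(self)


class PtrView:
    """Non-tracking view: the block is never told about it."""

    def __init__(self, block, offset=0):
        self.block = block
        self.offset = offset


def walk_tracking(block, steps):
    """Step through offsets, creating a tracking pointer at every step."""
    current = Pointer(block)
    for _ in range(steps):
        following = Pointer(block, current.offset + 1)
        current.release()
        current = following
    offset = current.offset
    current.release()
    return offset


def walk_view(block, steps):
    """Same walk using views, which never touch the block's pointer list."""
    view = PtrView(block)
    for _ in range(steps):
        view = PtrView(block, view.offset + 1)
    return view.offset


if __name__ == "__main__":
    tracked, viewed = Block(), Block()
    print(walk_tracking(tracked, 3), tracked.registrations)
    print(walk_view(viewed, 3), viewed.registrations)
```

Both walks reach the same offset, but only the tracking version pays for a registration and removal at every step.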

LLVM/project 67444b6llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mfma_scale
DeltaFile
+9,287-1llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
+4,206-1llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
+7-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+13,500-23 files

LLVM/project 5e65f12llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging

It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
DeltaFile
+1-50llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-44llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1-942 files

LLVM/project 375a36cllvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Add helper for getLimit
DeltaFile
+9-8llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+9-81 files

LLVM/project fdbb553clang/test/Sema/aarch64-sme2p3-intrinsics acle_sme2p3_target.c, clang/test/Sema/aarch64-sve2p3-intrinsics acle_sve2p3_target.c

fixup! Last few adjustments based on CR comments
DeltaFile
+0-54clang/test/Sema/aarch64-sve2p3-intrinsics/acle_sve2p3_target.c
+0-20clang/test/Sema/aarch64-sme2p3-intrinsics/acle_sme2p3_target.c
+1-0llvm/include/llvm/IR/IntrinsicsAArch64.td
+1-743 files

LLVM/project 6d73d5cflang/lib/Lower/OpenMP OpenMP.cpp

address review comments
DeltaFile
+8-6flang/lib/Lower/OpenMP/OpenMP.cpp
+8-61 files

LLVM/project b08a295flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Optimizer/OpenMP DoConcurrentConversion.cpp

[Flang][OpenMP] Add combined construct information

This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.

This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.

Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
DeltaFile
+1,094-0flang/test/Lower/OpenMP/compound.f90
+56-20flang/lib/Lower/OpenMP/OpenMP.cpp
+6-6flang/test/Transforms/DoConcurrent/use_loop_bounds_in_body.f90
+5-5flang/test/Transforms/DoConcurrent/local_device.mlir
+4-4flang/test/Transforms/DoConcurrent/reduce_device.mlir
+6-2flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+1,171-3727 files not shown
+1,225-7133 files

LLVM/project 9e60e47mlir/include/mlir/Dialect/OpenMP OpenMPOps.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Explicit tagging of combined constructs

Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.

This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated in the
host.

So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error prone,
computationally expensive and it can't really work in the general case.
On the other hand, a compiler frontend can easily tell the difference

    [10 lines not shown]
DeltaFile
+137-134mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+123-76mlir/test/Dialect/OpenMP/invalid.mlir
+106-0mlir/test/Dialect/OpenMP/invalid-interface.mlir
+33-33mlir/test/Dialect/OpenMP/ops.mlir
+29-33mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+24-24mlir/test/Target/LLVMIR/openmp-teams-clauses-trunc-ext.mlir
+452-30035 files not shown
+565-37041 files

LLVM/project 0e5c89eflang/lib/Lower/OpenMP OpenMP.cpp

address review comments
DeltaFile
+14-13flang/lib/Lower/OpenMP/OpenMP.cpp
+14-131 files