LLVM/project 332c06bllvm/include/llvm/Target Target.td, llvm/test/TableGen aarch64-apple-tuning-features.td

[llvm] Sort the Subtarget feature implies list by name (#197700)
DeltaFile
+76-76llvm/test/TableGen/aarch64-apple-tuning-features.td
+1-1llvm/include/llvm/Target/Target.td
+77-772 files

LLVM/project 44027b2llvm/lib/Transforms/Scalar LoopInterchange.cpp

address review comment
DeltaFile
+17-14llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+17-141 files

LLVM/project 0646ec9clang/include/clang/AST ASTContext.h, clang/include/clang/Basic Builtins.td

Revert "Add clang warning if fp exception functions are called without appropriate flags/pragmas" (#198341)

Reverts llvm/llvm-project#187860

Reason: this breaks compiling several different versions of libc, and is
also issuing diagnostics for platforms that are incompatible (see
https://github.com/llvm/llvm-project/pull/187860 for details).

Revert for now until we resolve how to move forward and reland.
DeltaFile
+0-68clang/test/Sema/fenv-access.c
+0-55clang/include/clang/Basic/Builtins.td
+0-51clang/test/Sema/builtin-fenv.c
+0-36clang/lib/Serialization/ASTReader.cpp
+1-34clang/include/clang/AST/ASTContext.h
+0-35clang/test/Sema/fenv-access-implicit.c
+1-27914 files not shown
+2-42420 files

LLVM/project 72daa33clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp, clang/test/CodeGen/AArch64 neon-across.c neon-intrinsics.c

[CIR] max-across-vector (vmaxv_*) intrinsics (#197095)

Part of #185382 
Added the vmaxv_* variants.
Moved the test cases to
[intrinsics.c](https://github.com/llvm/llvm-project/pull/clang/test/CodeGen/AArch64/neon/intrinsics.c)
Removed the test cases from
[neon-intrinsics.c](clang/test/CodeGen/AArch64/neon/intrinsics.c)
DeltaFile
+169-0clang/test/CodeGen/AArch64/neon/intrinsics.c
+1-112clang/test/CodeGen/AArch64/neon-across.c
+0-39clang/test/CodeGen/AArch64/neon-intrinsics.c
+15-0clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+185-1514 files

LLVM/project 6a86650mlir/include/mlir/Dialect/AMDGPU/Utils MemorySpaceUtils.h, mlir/lib/Dialect/AMDGPU/Transforms MemoryAccessOpInterfacesImpl.cpp FoldMemRefsOps.cpp

[mlir][AMDGPU] Move memory access op folding to memref interfaces (#197310)

This PR implements IndexedAccessOpInterface and
IndexedMemCopyOpInterface for relevant ops in the AMDGPU dialect,
removing the custom folding pass we used to have, now that there are
interfaces for this sort of thing.

As a result:

- The in-bounds semantics of various AMDGPU ops have been clarified
- Interface methods to enable oob checks on DMA operations have been
added (to prevent accidental `disjoint`ing and the like)
- Said memref rewrite patterns have been hardened to allow for mixed
tensor/memref semantics.
- Helpers for detecting memory spaces were factored out of
`AMDGPUOps.cpp` so that they could be re-used in the interface
implementations.

# Breaking changes / migration

    [4 lines not shown]
DeltaFile
+644-0mlir/test/Dialect/AMDGPU/fold-memref-alias-ops.mlir
+0-470mlir/test/Dialect/AMDGPU/amdgpu-fold-memrefs.mlir
+258-0mlir/lib/Dialect/AMDGPU/Transforms/MemoryAccessOpInterfacesImpl.cpp
+0-211mlir/lib/Dialect/AMDGPU/Transforms/FoldMemRefsOps.cpp
+94-0mlir/test/Dialect/AMDGPU/invalid.mlir
+64-0mlir/include/mlir/Dialect/AMDGPU/Utils/MemorySpaceUtils.h
+1,060-68112 files not shown
+1,244-77318 files

LLVM/project daff70eutils/bazel/llvm-project-overlay/mlir BUILD.bazel

[Bazel] Fixes 755732f (#198347)

This fixes 755732f184ea73b9f6f28765b33cf3030c0dc9d7.

Co-authored-by: Google Bazel Bot <google-bazel-bot at google.com>
DeltaFile
+2-0utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+2-01 files

LLVM/project 2a53990lldb/packages/Python/lldbsuite/test lldbtest.py dotest.py

[lldb][test] Use octal literal (NFC) (#198343)
DeltaFile
+1-3lldb/packages/Python/lldbsuite/test/lldbtest.py
+2-2lldb/packages/Python/lldbsuite/test/dotest.py
+3-52 files

LLVM/project 755732fmlir/include/mlir/Dialect/GPU/IR GPUOps.td, mlir/lib/Conversion/GPUToNVVM LowerGpuOpsToNVVMOps.cpp

[mlir][GPU] Extend gpu.barrier with scope and named-barrier support (#195692)

This commit adds two features to gpu.barrier that are supported on
targets like recent AMDGPU chips, Nvidia's hardware, and SPIR-V.

The first of these is named barriers, which allow creating a barrier
object that is initialized with the number of subgroups that must arrive
at it before those subgroups are released. These are represented in MLIR
with a new `!gpu.named_barrier` type and created by the
`gpu.initialized_named_barrier` operation. These named barriers then
become arguments to `gpu.barrier`.
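
As a loose host-side analogy only (plain Python threads, not MLIR or GPU code), a barrier that is initialized with an arrival count and releases its participants once that many have arrived behaves much like `threading.Barrier`. The function name `run_named_barrier_demo` and the log strings are made up for this sketch.

```python
import threading


def run_named_barrier_demo(num_participants):
    """Each worker logs 'arrive', waits at the barrier, then logs 'release'."""
    # The barrier is created with the number of participants that must arrive.
    barrier = threading.Barrier(num_participants)
    log = []
    lock = threading.Lock()

    def worker():
        with lock:
            log.append("arrive")
        # Blocks until num_participants threads have called wait().
        barrier.wait()
        with lock:
            log.append("release")

    threads = [threading.Thread(target=worker) for _ in range(num_participants)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no thread can pass `wait()` before every thread has logged its arrival, all "arrive" entries precede all "release" entries in the returned log.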

The other change is adding a "scope" enum and using it to specify the
execution scope of barriers. This allows for representing cluster- and
subgroup-wide barriers (the latter exists on AMDGPU and Nvidia, and
while I suspect Nvidia has cluster-scope barriers, I didn't go looking)
and allows us to fully lower to SPIR-V's OpControlBarrier.

While these are two different features, I figured I'd land them in one

    [4 lines not shown]
DeltaFile
+173-53mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+131-8mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+83-16mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+69-8mlir/lib/Conversion/GPUToSPIRV/GPUToSPIRV.cpp
+69-0mlir/test/Dialect/GPU/named-barrier.mlir
+54-0mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+579-8516 files not shown
+863-9322 files

LLVM/project 7eab3e0flang/lib/Optimizer/Transforms FIRToMemRef.cpp, flang/test/Transforms/FIRToMemRef omp-wsloop-simd-private.mlir

[FIRToMemRef] Fix fir.convert insertion inside omp.wsloop (#197653)

When replaceFIRMemrefs inserted a fir.convert before an op inside a
LoopWrapperInterface region (e.g. omp.simd inside omp.wsloop), it
violated the single-nested-op invariant, producing a verifier error. Fix
by walking up the LoopWrapperInterface parent chain and inserting before
the outermost wrapper instead.
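
The parent-chain walk can be sketched on a toy model. This is not the flang implementation: the `Op` class, its `is_loop_wrapper` flag, and `insertion_anchor` are hypothetical stand-ins for MLIR operations and LoopWrapperInterface.

```python
class Op:
    """Toy operation: a name, a parent op, and a flag marking loop wrappers."""

    def __init__(self, name, parent=None, is_loop_wrapper=False):
        self.name = name
        self.parent = parent
        self.is_loop_wrapper = is_loop_wrapper


def insertion_anchor(op):
    """Return the op before which a new op can be inserted.

    While the parent is a loop wrapper, inserting next to the current op
    would add a second op to the wrapper's region, so move up one level.
    """
    anchor = op
    while anchor.parent is not None and anchor.parent.is_loop_wrapper:
        anchor = anchor.parent
    return anchor


# Toy nesting: func > omp.wsloop > omp.simd > omp.loop_nest
func = Op("func.func")
wsloop = Op("omp.wsloop", parent=func, is_loop_wrapper=True)
simd = Op("omp.simd", parent=wsloop, is_loop_wrapper=True)
loop_nest = Op("omp.loop_nest", parent=simd)
```

With this nesting, both `insertion_anchor(simd)` and `insertion_anchor(loop_nest)` resolve to the `omp.wsloop` op, so the new op lands outside every wrapper region.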

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
DeltaFile
+33-0flang/test/Transforms/FIRToMemRef/omp-wsloop-simd-private.mlir
+7-1flang/lib/Optimizer/Transforms/FIRToMemRef.cpp
+40-12 files

LLVM/project 58ee64eclang/lib/Driver/ToolChains OHOS.cpp Fuchsia.cpp, clang/test/Driver fuchsia.c ohos.c

[Driver] Uniform handling of invalid rtlib across drivers (#198219)

This is mostly an NFC except for a different diagnostic being emitted.
The goal is to unify validation and handling of invalid rtlib value
across different drivers to simplify supporting more -rtlib= values in
the future.
DeltaFile
+4-4clang/lib/Driver/ToolChains/OHOS.cpp
+4-4clang/lib/Driver/ToolChains/Fuchsia.cpp
+1-1clang/lib/Driver/ToolChains/Darwin.cpp
+1-1clang/test/Driver/fuchsia.c
+1-1clang/test/Driver/ohos.c
+11-115 files

LLVM/project 9886c72llvm/include/llvm/Analysis FunctionPropertiesAnalysis.h, llvm/include/llvm/IR FunctionProperties.def

Add noreturn call count to FunctionPropertiesAnalysis pass (#198322)

This adds a metric to visualize how many noreturn functions there are,
with the idea of analyzing their relationship with unreachable
instructions.
DeltaFile
+4-0llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp
+1-0llvm/include/llvm/Analysis/FunctionPropertiesAnalysis.h
+1-0llvm/include/llvm/IR/FunctionProperties.def
+6-03 files

LLVM/project 1d4c14bclang/include/clang/AST ASTContext.h, clang/include/clang/Basic Builtins.td

Revert "Add clang warning if fp exception functions are called without approp…"

This reverts commit 5f2bedca745d5efa1955369cfe352bcd09be4633.
DeltaFile
+0-68clang/test/Sema/fenv-access.c
+0-55clang/include/clang/Basic/Builtins.td
+0-51clang/test/Sema/builtin-fenv.c
+0-36clang/lib/Serialization/ASTReader.cpp
+0-35clang/test/Sema/fenv-access-implicit.c
+1-34clang/include/clang/AST/ASTContext.h
+1-27914 files not shown
+2-42420 files

LLVM/project 823be5ellvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel legalize-load-local.mir

rebase

Created using spr 1.3.7
DeltaFile
+23,873-20,923llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+8,633-8,584llvm/test/CodeGen/Thumb2/mve-clmul.ll
+1,243-8,768llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+0-4,752llvm/test/tools/llvm-mca/RISCV/SiFiveP800/vlseg-vsseg.s
+4,549-0llvm/test/tools/llvm-mca/RISCV/SiFiveP800/rvv/arithmetic.test
+3,706-328llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-local.mir
+42,004-43,3552,168 files not shown
+133,338-93,5472,174 files

LLVM/project c93c17bclang/include/clang/DependencyScanning ModuleDepCollector.h, clang/lib/DependencyScanning ModuleDepCollector.cpp

[clang][deps] Move `ModuleDepCollectorPP` to .cpp file (#197964)

This PR moves the `ModuleDepCollectorPP` type into the .cpp file. It's
an implementation detail that the header doesn't need to expose.
DeltaFile
+54-45clang/lib/DependencyScanning/ModuleDepCollector.cpp
+2-33clang/include/clang/DependencyScanning/ModuleDepCollector.h
+56-782 files

LLVM/project f4ee477llvm/test/Transforms/HotColdSplit issue-197982.ll

[NFC][CodeExtractor] simplify test for #197986 (#198011)
DeltaFile
+16-63llvm/test/Transforms/HotColdSplit/issue-197982.ll
+16-631 files

LLVM/project dd199b4llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp

[AMDGPU][GlobalISel] Remove dependency on legal ruleset (#197371)

This fills in always-legal rules, to remove the dependency on the legacy
ruleset. This is not guaranteed to be all the rules, just the ones that
appear in tests.
DeltaFile
+7-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+7-01 files

LLVM/project 633d731llvm/test/TableGen aarch64-apple-tuning-features.td

[llvm] Re-format aarch64-apple-tuning-features.td. NFC (#197777)

It's much easier to review diffs with each feature on its own line. Also
add an -implicit-check-not so we don't miss any CPUs going forward.
DeltaFile
+274-23llvm/test/TableGen/aarch64-apple-tuning-features.td
+274-231 files

LLVM/project cab2fdallvm/lib/Target/SPIRV SPIRVLegalizerInfo.cpp SPIRVUtils.h, llvm/test/CodeGen/SPIRV/GlobalISel fn-ptr-addrspacecast.ll

[SPIRV] Allow casting between CodeSectionINTEL and Generic storage classes (#197556)

In the previous versions of the SPV_INTEL_function_pointers
[spec](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc),
casts between the CodeSectionINTEL storage class (used for function
pointers) and the Generic storage class were illegal.

The spec was updated a few months ago, and the new version allows the
cast, specifying `CodeSectionINTEL` as one of the overloaded storage
classes that can be represented by Generic, alongside `WorkGroup`, etc.
I also confirmed with a spec author that one of the intentions of the
spec updates was to allow the cast.

Update the SPIR-V backend to allow the cast. This is basically required
to use function pointers in real world use cases.

---------

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
DeltaFile
+23-3llvm/test/CodeGen/SPIRV/GlobalISel/fn-ptr-addrspacecast.ll
+0-3llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+1-0llvm/lib/Target/SPIRV/SPIRVUtils.h
+24-63 files

LLVM/project f33e9e4libc/hdr float_macros.h limits_macros.h

[libc][NFC] Fix #endif comments in hdr/ proxy headers (#198313)

The #endif closing the LIBC_FULL_BUILD guard used the CMake variable
name LLVM_LIBC_FULL_BUILD in its comment rather than the preprocessor
macro LIBC_FULL_BUILD that the #ifdef above references. These are
distinct: LLVM_LIBC_FULL_BUILD is the CMake option; LIBC_FULL_BUILD is
the C macro defined via -DLIBC_FULL_BUILD when that option is ON.

Fixed 113 files under libc/hdr/ with a mechanical substitution.
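
A substitution of this kind might look like the sketch below. The exact comment shape (`#endif // LLVM_LIBC_FULL_BUILD`) is an assumption, and `fix_endif_comment` is a hypothetical helper, not the tooling used for the commit.

```python
import re

# Assumed shape of the stale comment; the real headers may format it differently.
_STALE_ENDIF = re.compile(
    r"^([ \t]*#endif[ \t]*//[ \t]*)LLVM_LIBC_FULL_BUILD\b", re.MULTILINE
)


def fix_endif_comment(text):
    """Rename the CMake variable to the C macro, but only in #endif comments."""
    return _STALE_ENDIF.sub(r"\g<1>LIBC_FULL_BUILD", text)


HEADER = (
    "#ifdef LIBC_FULL_BUILD\n"
    '#include "include/llvm-libc-macros/float-macros.h"\n'
    "#else\n"
    "#include <float.h>\n"
    "#endif // LLVM_LIBC_FULL_BUILD\n"
)
```

Anchoring the pattern on `#endif` leaves any other mention of `LLVM_LIBC_FULL_BUILD` untouched, for example in prose comments about the CMake option.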

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+1-1libc/hdr/float_macros.h
+1-1libc/hdr/limits_macros.h
+1-1libc/hdr/link_macros.h
+1-1libc/hdr/locale_macros.h
+1-1libc/hdr/math_function_macros.h
+1-1libc/hdr/math_macros.h
+6-6107 files not shown
+113-113113 files

LLVM/project 1c8c500llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU ctpop.ll ctpop16.ll

[AMDGPU][GISel] Add Register Bank Legalization rules for G_CTPOP. (#197510)
DeltaFile
+1,631-154llvm/test/CodeGen/AMDGPU/ctpop.ll
+746-0llvm/test/CodeGen/AMDGPU/ctpop16.ll
+207-3llvm/test/CodeGen/AMDGPU/ctpop64.ll
+11-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+6-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+2-3llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ctpop.mir
+2,603-1601 files not shown
+2,604-1607 files

LLVM/project 176134cllvm/test/CodeGen/AArch64 aarch64-isel-umin.ll

[AArch64] Regenerate aarch64-isel-umin.ll. NFC (#198337)
DeltaFile
+54-116llvm/test/CodeGen/AArch64/aarch64-isel-umin.ll
+54-1161 files

LLVM/project 2f35736llvm/include/llvm/DebugInfo/PDB/Native RawConstants.h PDBFile.h, llvm/lib/DebugInfo/PDB/Native PDBFile.cpp

[PDB] Add DXContainer parsing inside PDBFile
DeltaFile
+145-0llvm/test/tools/llvm-pdbutil/dxcontainer.test
+25-1llvm/lib/DebugInfo/PDB/Native/PDBFile.cpp
+4-1llvm/include/llvm/DebugInfo/PDB/Native/RawConstants.h
+3-0llvm/include/llvm/DebugInfo/PDB/Native/PDBFile.h
+177-24 files

LLVM/project cf80e0ellvm/test/Transforms/SLPVectorizer/X86 scalarize-ctlz.ll arith-fp-inseltpoison.ll

[SLP] Preserve profitable trees when subtree trimming would reduce to buildvector-only

In calculateTreeCostAndTrimNonProfitable, the subtree trim loop returns
Invalid when trimming node Idx==1 under an InsertElement root would
leave only a buildvector, to avoid infinite vectorization attempts.
This is too aggressive when the original untrimmed tree is already
profitable (Cost < -SLPCostThreshold). In that case, undo any partial
trims and return the original cost instead of rejecting the tree.

Original Pull Request: https://github.com/llvm/llvm-project/pull/197763

Recommit after unrelated revert in https://github.com/llvm/llvm-project/pull/198265

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/198336
DeltaFile
+48-29llvm/test/Transforms/SLPVectorizer/X86/scalarize-ctlz.ll
+19-32llvm/test/Transforms/SLPVectorizer/X86/arith-fp-inseltpoison.ll
+19-32llvm/test/Transforms/SLPVectorizer/X86/arith-fp.ll
+9-10llvm/test/Transforms/SLPVectorizer/X86/deleted-instructions-clear.ll
+7-10llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll
+7-10llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll
+109-1234 files not shown
+138-14010 files

LLVM/project dce3bc2.ci compute_projects_test.py compute_projects.py

[CI] Run libc tests on clang changes (#198295)

The libc tests are relatively lightweight, and given we build libc with
a just-built clang, it's very easy for clang changes to cause issues in
libc, especially with -Werror. For example, #187860 broke libc due to
adding a new warning that libc was not clean on.
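
The idea of mapping a changed project to the projects whose tests should run can be sketched as follows. The table `DEPENDENTS_TO_TEST` and the function `projects_to_test` are hypothetical and do not mirror the actual contents of `.ci/compute_projects.py`.

```python
# Hypothetical table: which projects' tests to run when a project changes.
DEPENDENTS_TO_TEST = {
    "clang": {"clang", "libc"},  # clang changes also trigger libc tests
    "libc": {"libc"},
}


def projects_to_test(changed_paths):
    """Collect the projects to test from the top-level directory of each path."""
    projects = set()
    for path in changed_paths:
        top_level = path.split("/", 1)[0]
        projects |= DEPENDENTS_TO_TEST.get(top_level, set())
    return sorted(projects)
```

In this sketch a change under `clang/` selects both `clang` and `libc`, while a path outside the table selects nothing.
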
DeltaFile
+6-4.ci/compute_projects_test.py
+1-1.ci/compute_projects.py
+7-52 files

LLVM/project 4c76d40libsycl/src/detail queue_impl.cpp

apply new liboffload kernel launch API

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
DeltaFile
+4-7libsycl/src/detail/queue_impl.cpp
+4-71 files

LLVM/project 20165e1llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 reused-extract-scalar-lanes.ll

[SLP] Prefer VF-matching scalar-set match in gather-shuffle lookup

In isGatherShuffledSingleRegisterEntry, the perfect-match search accepted
an entry that isSame(TE->Scalars) regardless of the entry's vector factor.
isSame can succeed via ReuseShuffleIndices on an entry whose actual VF is
smaller than TE->Scalars.size(); the subsequent mask construction then
copies TE->getCommonMask() indices that overrun the chosen source's lanes,
producing wrong shufflevector masks and a more-poisonous result than the
scalar code.

Fixes #197765

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/198334
DeltaFile
+3-1llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1-3llvm/test/Transforms/SLPVectorizer/X86/reused-extract-scalar-lanes.ll
+4-42 files

LLVM/project 2e93539llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 reused-extract-scalar-lanes.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+3-1llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+1-3llvm/test/Transforms/SLPVectorizer/X86/reused-extract-scalar-lanes.ll
+4-42 files

LLVM/project 8ec281bllvm/lib/IR Intrinsics.cpp, llvm/test/CodeGen/AArch64 sve-bad-intrinsics.ll

[LLVM] Precise error message for intrinsic signature verification (1/n) (#196802)

Generate a more precise error message when intrinsic signature
verification fails. Keep track of the current position/component of the
intrinsic signature being checked and print a more descriptive error
message which includes the position/element of the signature that failed
and the reason it failed.
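
The general technique of tracking the position while checking a signature can be illustrated with a toy checker. The function `check_signature`, the string type names, and the message format are all made up for this sketch and do not reflect the actual output of `Intrinsics.cpp`.

```python
def check_signature(expected, actual):
    """Compare two signatures given as [return_type, arg0, arg1, ...].

    Returns None on a match, otherwise a message naming the component
    that failed and why.
    """
    if len(expected) != len(actual):
        return "expected {} arguments, got {}".format(
            len(expected) - 1, len(actual) - 1
        )
    # Label each position so the error can point at it.
    labels = ["return type"] + [
        "argument {}".format(i) for i in range(len(expected) - 1)
    ]
    for label, want, got in zip(labels, expected, actual):
        if want != got:
            return "{}: expected {}, got {}".format(label, want, got)
    return None
```

A generic "signature mismatch" result carries no position, whereas here a mismatch in the second slot is reported against "argument 0".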

Note that not all cases in `matchIntrinsicType` generate errors, so have
a temporary fallback to keep generating a generic error message in those
cases. This fallback will eventually be removed.

Added a C++ unit test for testing intrinsic struct return type that is
either an identified struct or a packed struct, as these cases cannot be
created from a .ll file directly (since autoupgrade in the parser fixes
them up).
DeltaFile
+291-0llvm/test/Verifier/intrinsic-bad-arg-type1.ll
+171-47llvm/lib/IR/Intrinsics.cpp
+36-0llvm/unittests/IR/VerifierTest.cpp
+2-2llvm/test/CodeGen/AArch64/sve-bad-intrinsics.ll
+1-1llvm/test/CodeGen/WinEH/wineh-intrinsics-invalid.ll
+501-505 files

LLVM/project f63b8eellvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 pr196804.ll

[SelectionDAG] Fix miscompile in known-0/1 setcc fold with XOR (#196804) (#197767)

When simplifySetCC folds `(xor X, C) != 0` (where the XOR result is
known 0/1) into `TRUNCATE(XOR X, C)`, later DAG combines can incorrectly
fold the XOR back into its source operand, losing the NOT semantics.
This causes the x86 backend to test the original value instead of the
XOR result, inverting the condition and producing wrong code.

Fix by folding `(xor X, C) ==/!= N1` directly into `setcc(X, N1^C,
cond)` instead of returning TRUNCATE(XOR). The SETCC form is canonical
and immune to the problematic DAG combine.
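
The algebraic identity behind the new fold, `(X ^ C) == N1` exactly when `X == (N1 ^ C)`, can be checked exhaustively on small integers. This is a standalone sanity check of the arithmetic, not LLVM code; `fold_is_sound` is a name chosen for this sketch.

```python
def fold_is_sound(bits=4):
    """Check (x ^ c) == n1  <=>  x == (n1 ^ c) for all `bits`-wide values."""
    size = 1 << bits
    for x in range(size):
        for c in range(size):
            for n1 in range(size):
                before = (x ^ c) == n1   # original comparison on the XOR result
                after = x == (n1 ^ c)    # folded comparison on X directly
                if before != after:
                    return False
    return True
```

Since the two comparisons agree for equality on every input, they also agree for inequality, which covers both the `==` and `!=` forms mentioned above.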

Fixes #196804.
DeltaFile
+26-0llvm/test/CodeGen/X86/pr196804.ll
+3-1llvm/lib/Target/X86/X86ISelLowering.cpp
+29-12 files

LLVM/project e88d1b7clang/docs TypeSanitizer.rst

[Docs] Fix and update TySan docs (#198331)
DeltaFile
+3-3clang/docs/TypeSanitizer.rst
+3-31 files