LLVM/project 1ffae54llvm/lib/Target/X86 X86TargetTransformInfo.cpp, llvm/test/Analysis/CostModel/X86 alternate-shuffle-cost.ll shuffle-select.ll

[CostModel][X86] Reduce cost of pre-SSE41 select shuffle (#207400)

The all-logic instruction sequences have better throughput/latency than shuffles.

Confirmed with uops.info, llvm-mca, and Agner Fog's instruction tables.
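
As a rough illustration of what an "all-logic" select looks like: when every result lane comes from the same lane of one of two sources, the result can be built from AND, ANDN and OR with a constant lane mask, with no shuffle involved. The sketch below models that per lane in portable C++. `selectLanes` and the 4 x i32 shape are hypothetical names chosen for this example; they are not taken from the patch or from the X86 backend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 4 x i32 "select" shuffle: lane I of the result is
// lane I of A when bit I of TakeAMask is set, otherwise lane I of B.
std::array<uint32_t, 4> selectLanes(const std::array<uint32_t, 4> &A,
                                    const std::array<uint32_t, 4> &B,
                                    unsigned TakeAMask) {
  assert(TakeAMask < 16 && "only four lanes are modelled");
  std::array<uint32_t, 4> Result{};
  for (std::size_t I = 0; I < 4; ++I) {
    // All-ones or all-zeros lane mask, as a constant vector operand would hold.
    const uint32_t Mask = ((TakeAMask >> I) & 1u) ? 0xFFFFFFFFu : 0u;
    // (A & Mask) | (B & ~Mask): the AND / ANDN / OR sequence, one lane at a time.
    Result[I] = (A[I] & Mask) | (B[I] & ~Mask);
  }
  return Result;
}
```

Calling `selectLanes(A, B, 0x5)` takes lanes 0 and 2 from `A` and lanes 1 and 3 from `B`.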
DeltaFile
+24-24llvm/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
+14-18llvm/test/Transforms/SLPVectorizer/X86/reduced-value-stored.ll
+8-8llvm/test/Analysis/CostModel/X86/shuffle-select.ll
+6-6llvm/test/Analysis/CostModel/X86/shuffle-two-src.ll
+3-7llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+2-3llvm/test/Transforms/PhaseOrdering/X86/hadd.ll
+57-662 files not shown
+61-718 files

LLVM/project 4ba692fclang/include/clang/Lex TextEncoding.h, clang/lib/Lex TextEncoding.cpp

Change ToLiteralEncodingConverter to a unique_ptr (#207258)

This patch changes ToLiteralEncodingConverter from a raw pointer to a unique_ptr
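
A minimal sketch of this kind of change, assuming a holder class that owns its converter. `Converter` and `EncodingHolder` are hypothetical stand-ins made up for the example; the real Clang class and member layout may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical converter type, standing in for the real one.
struct Converter {
  std::string Name;
  explicit Converter(std::string N) : Name(std::move(N)) {}
};

// Hypothetical owner. With a raw `Converter *` member the class would need a
// hand-written destructor and copy/move handling; with std::unique_ptr the
// ownership is expressed in the type and cleanup is automatic.
class EncodingHolder {
public:
  void setConverter(std::unique_ptr<Converter> C) { ToLiteral = std::move(C); }
  bool hasConverter() const { return ToLiteral != nullptr; }
  const Converter &getConverter() const {
    assert(ToLiteral && "converter has not been set");
    return *ToLiteral;
  }

private:
  std::unique_ptr<Converter> ToLiteral;
};
```

A caller would hand over ownership with `Holder.setConverter(std::make_unique<Converter>("UTF-8"))`.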
DeltaFile
+2-2clang/lib/Lex/TextEncoding.cpp
+1-1clang/include/clang/Lex/TextEncoding.h
+3-32 files

LLVM/project 433eef6llvm/include/llvm/IR Module.h, llvm/lib/AsmParser LLParser.cpp

Revert "[IR] Explicitly specify target feature for module asm" (#207399)

Reverts llvm/llvm-project#204548

This is causing the runtimes build to fail with e.g.:

```
<inline asm>:11:5: error: 32 bit reloc applied to a field with a different size
   11 | jmp __interceptor_strlen@plt
      |     ^
```

See comments on the PR.
DeltaFile
+11-85llvm/include/llvm/IR/Module.h
+36-38llvm/lib/Object/ModuleSymbolTable.cpp
+12-29llvm/lib/IR/AsmWriter.cpp
+3-36llvm/lib/AsmParser/LLParser.cpp
+0-29llvm/test/Bitcode/module-asm.ll
+6-22llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+68-23953 files not shown
+180-56059 files

LLVM/project 6bc5135llvm/lib/CodeGen/AsmPrinter DwarfDebug.cpp, llvm/test/DebugInfo/X86 implicit-value-truncated-integer.ll

[DebugInfo] Truncate implicit value constants to source type width (#206671)

This is a follow-up to #204353.

mikaelholmen and bevin-hansson reported that the previous change could
assert downstream when emitting `DW_OP_implicit_value` for a source
integer type wider than the target generic DWARF stack type, if the
debug-value carrier integer contains bits outside the declared source
type width.

The fix is to construct the source-width `APInt` with explicit
truncation enabled before emitting the implicit value bytes. This
preserves the intended wrap/truncate behavior and avoids asserting on
otherwise recoverable debug-value input.

A regression test is added for an `unsigned _BitInt(48)` debug value on
i386, covering both an out-of-range positive carrier value and an
all-ones negative carrier value.
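
The wrap/truncate behavior described above can be sketched in portable C++ as follows. This is only a model of the idea using a 64-bit carrier; `truncateToWidth` is a hypothetical helper, not the `APInt` call used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `Bits` bits of a 64-bit carrier value, discarding any
// bits that lie outside the declared source type width.
uint64_t truncateToWidth(uint64_t Carrier, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "width must fit in the carrier");
  if (Bits == 64)
    return Carrier; // Shifting by 64 would be undefined, so handle it apart.
  return Carrier & ((uint64_t{1} << Bits) - 1);
}
```

For a 48-bit source type, a carrier of `2^48 + 1` wraps to `1`, and an all-ones carrier becomes the 48-bit all-ones value.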
DeltaFile
+38-0llvm/test/DebugInfo/X86/implicit-value-truncated-integer.ll
+2-1llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+40-12 files

LLVM/project e788a46mlir/lib/Conversion/SPIRVToLLVM SPIRVToLLVM.cpp, mlir/test/Conversion/SPIRVToLLVM gl-ops-to-llvm.mlir

[mlir][SPIR-V] Add SPIRVToLLVM lowering for GL Radians and Degrees ops (#205967)
DeltaFile
+35-0mlir/lib/Conversion/SPIRVToLLVM/SPIRVToLLVM.cpp
+30-0mlir/test/Conversion/SPIRVToLLVM/gl-ops-to-llvm.mlir
+65-02 files

LLVM/project fa0822amlir/lib/Dialect/SPIRV/IR SPIRVOps.cpp, mlir/test/Dialect/SPIRV/IR structure-ops.mlir

[mlir][SPIR-V] Fix null deref in SpecConstantOperationOp::verifyRegions (#207328)
DeltaFile
+11-0mlir/test/Dialect/SPIRV/IR/structure-ops.mlir
+3-2mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
+14-22 files

LLVM/project 2a95108mlir/lib/Dialect/SPIRV/Transforms UnifyAliasedResourcePass.cpp, mlir/test/Dialect/SPIRV/Transforms unify-aliased-resource.mlir

[mlir][SPIR-V] Guard getSizeInBytes() optionals in UnifyAliasedResourcePass rewriters (#207325)
DeltaFile
+18-6mlir/lib/Dialect/SPIRV/Transforms/UnifyAliasedResourcePass.cpp
+18-0mlir/test/Dialect/SPIRV/Transforms/unify-aliased-resource.mlir
+36-62 files

LLVM/project d1c2d86clang/include/clang/Lex TextEncoding.h, clang/lib/Lex TextEncoding.cpp

Forward declare TextEncodingConverter in TextEncoding.h, move config.h into TextEncoding.cpp (#207382)

This patch forward declares TextEncodingConverter in
clang/include/clang/Lex/TextEncoding.h, and moves the config.h include
from the header into llvm/lib/Support/TextEncoding.cpp.
DeltaFile
+4-1clang/include/clang/Lex/TextEncoding.h
+1-0llvm/lib/Support/TextEncoding.cpp
+0-1llvm/include/llvm/Support/TextEncoding.h
+1-0clang/lib/Lex/TextEncoding.cpp
+6-24 files

LLVM/project df4641ellvm/include/llvm/IR Module.h, llvm/lib/AsmParser LLParser.cpp

Revert "[IR] Explicitly specify target feature for module asm (#204548)"

This reverts commit 0341dd51303fc13cafdd21a33e2985cbf603c66b.
DeltaFile
+11-85llvm/include/llvm/IR/Module.h
+36-38llvm/lib/Object/ModuleSymbolTable.cpp
+12-29llvm/lib/IR/AsmWriter.cpp
+3-36llvm/lib/AsmParser/LLParser.cpp
+0-29llvm/test/Bitcode/module-asm.ll
+6-22llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+68-23953 files not shown
+180-56059 files

LLVM/project 62f8a7bllvm/lib/Transforms/Vectorize VectorCombine.cpp

[VectorCombine] isExtractExtractCheap - add dbg message showing OldCost vs NewCost (#207386)

Add missing debug message for the VectorCombine fold cost decision
DeltaFile
+3-0llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+3-01 files

LLVM/project ee82fc0clang/lib/Sema SemaExpr.cpp, clang/test/Sema matrix-type-operators.c

[Clang] Fix crash on subscripting a complete matrix subscript expression (#207317)

Subscripting a complete MatrixSubscriptExpr (which has scalar type)
caused an assertion failure in ActOnArraySubscriptExpr because the code
unconditionally asserted isIncomplete() on any MatrixSubscriptExpr base.

Fix by guarding the matrix subscript path with an isIncomplete() check,
allowing complete matrix subscript expressions to fall through to the
standard subscript handling, which emits an appropriate diagnostic.

Fixes #203163
DeltaFile
+5-0clang/test/Sema/matrix-type-operators.c
+1-3clang/lib/Sema/SemaExpr.cpp
+6-32 files

LLVM/project ccfab9ellvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 build-vector-reconstructshuffle.ll

[AArch64] Fix ReconstructShuffle for known vscale>1 (#205099)

The code at AArch64TargetLowering::ReconstructShuffle expects
NEON-compatible types. But for e.g. vscale_range = {2}, we can get legal
fixed-length vectors that are wider than 128 bits.
DeltaFile
+99-0llvm/test/CodeGen/AArch64/build-vector-reconstructshuffle.ll
+6-2llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+105-22 files

LLVM/project 6247810llvm/test/CodeGen/X86 haddsub-undef.ll

[X86] haddsub-undef.ll - update remaining tests to match IR generated by middle-end (#207391)

Last prep for #143000
DeltaFile
+104-268llvm/test/CodeGen/X86/haddsub-undef.ll
+104-2681 files

LLVM/project c87a57fclang/include/clang/Basic IdentifierTable.h, clang/lib/Basic IdentifierTable.cpp

[Clang] Remove unused TokenKey::KEYNOZOS (#207132)

  KEYNOZOS was defined as a TokenKey flag to mark keywords not supported
  on z/OS, but no keyword in TokenKinds.def actually uses it. This patch
  removes the unused enum value and its associated handling code.

  Build: `ninja clang` succeeded (2923/2923 targets).
  Tests: `ninja check-clang` passed — 51180 passed, 0 failed.

AI assistance was used for code review analysis and CI failure
debugging.

  Fixes #206877

Co-authored-by: Chenguang Ding <dingchenguang at kylinos.cn>
DeltaFile
+8-3clang/include/clang/Basic/IdentifierTable.h
+0-3clang/lib/Basic/IdentifierTable.cpp
+8-62 files

LLVM/project 1d577d2clang/docs ClangStaticAnalyzer.md, clang/docs/ScalableStaticAnalysis index.md

[analyzer][docs] Fix invalid MyST toctree 'numbered' option after Markdown migration (#207217)

The RST-to-Markdown migration (#206181) converted the RST flag
`:numbered:` into `:numbered: true`.

MyST parses the toctree `numbered` option as `int_or_nothing`, so the
string `true` fails with:

```
'toctree': Invalid option value for 'numbered': true:
invalid literal for int() with base 10: 'true'
```

This breaks the `-W` (warnings-as-errors) `docs-clang-html` build.
Make `numbered` a valueless flag, which MyST accepts (equivalent to the
original RST behavior of numbering all levels).

Assisted-By: claude
DeltaFile
+1-1clang/docs/ClangStaticAnalyzer.md
+1-1clang/docs/ScalableStaticAnalysis/index.md
+2-22 files

LLVM/project 7a3ed4dclang/include/clang/Basic BuiltinsAArch64NeonSVEBridge.def, clang/test/CodeGen/aarch64_neon_sve_bridge_intrinsics acle_neon_sve_bridge_dup_neonq.c acle_neon_sve_bridge_get_neonq.c

[Clang][SVE ACLE] Remove +bf16 requirement from neon-sve bridge builtins. (#205332)

These builtins only care about the size of the element type and do not
require bfloat specific instructions.
DeltaFile
+5-5clang/test/CodeGen/aarch64_neon_sve_bridge_intrinsics/acle_neon_sve_bridge_dup_neonq.c
+5-5clang/test/CodeGen/aarch64_neon_sve_bridge_intrinsics/acle_neon_sve_bridge_get_neonq.c
+5-5clang/test/CodeGen/aarch64_neon_sve_bridge_intrinsics/acle_neon_sve_bridge_set_neonq.c
+3-3clang/include/clang/Basic/BuiltinsAArch64NeonSVEBridge.def
+18-184 files

LLVM/project c1a0167llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

[AMDGPU] Accept sext addresses when folding image ops to a16 (#203189)

canSafelyConvertTo16Bit() only accepts a zext when narrowing image
address coordinates to 16 bits. Add an opt-in AllowI16SExt flag so a
sext from i16 is accepted too, and enable it for sampler-less image
instructions.
Coordinates of sampler-less loads/stores are unsigned, so sext and zext
only disagree for a negative i16 (>= 0x8000), which is already out of
bounds since the maximum image dimension is <= 0x8000. Accepting the
sext therefore lets such coordinates fold to the a16 form, reducing VGPR
pressure.
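
The argument above can be checked exhaustively over all 16-bit patterns. The sketch below is a portable model under the stated assumption that image dimensions never exceed 0x8000; `zext16`, `sext16` and `sextIsHarmless` are hypothetical helpers written for this example, not functions from the AMDGPU backend.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a 16-bit pattern to 32 bits.
uint32_t zext16(uint16_t V) { return V; }

// Sign-extend a 16-bit pattern to 32 bits without relying on signed casts.
uint32_t sext16(uint16_t V) {
  return (V & 0x8000u) ? (0xFFFF0000u | V) : uint32_t{V};
}

// True when reading Coord through sext instead of zext cannot change the
// result of an in-bounds access: either both extensions agree, or the
// coordinate is already out of bounds as an unsigned value.
bool sextIsHarmless(uint16_t Coord, uint32_t MaxDim) {
  assert(MaxDim <= 0x8000u && "assumed upper bound on image dimensions");
  const bool Agree = zext16(Coord) == sext16(Coord);
  const bool OutOfBounds = zext16(Coord) >= MaxDim;
  return Agree || OutOfBounds;
}
```

The two extensions first differ at 0x8000, where `sext16` yields 0xFFFF8000, and every coordinate from there up is out of bounds for any dimension up to 0x8000.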

Co-authored-by: Barbara Mitic <Barbara.Mitic at amd.com>
DeltaFile
+14-5llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+3-4llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+17-92 files

LLVM/project a4e51ffllvm/lib/Transforms/Vectorize VPlanPatternMatch.h VPlanVerifier.cpp

[VPlan] Introduce m_Branch matcher (NFC) (#207383)
DeltaFile
+14-10llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+2-7llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+16-172 files

LLVM/project 4cdb033llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize tail-folding-iv-outside-user.ll vector-loop-backedge-elimination-tail-folding.ll

[VPlan] Optimize pre-increment IV latch users with tail folding (#206499)

This was noticed after #204089 caused IndVarsSimplify to convert some
live-out IV users to use the pre-incremented IV rather than the
post-incremented one.

Tail folded live-outs don't have the `(extract-last-lane
(extract-last-part foo))` form, but instead have the form `(extract-lane
(last-active-lane header-mask), foo)`.
For post-incremented IVs in tail folding, these are converted to
VPInstruction::ExitingIVValue which are handled separately. But
ExitingIVValue can't be used for the pre-incremented IV. So this teaches
optimizeLatchExitInductionUser to detect the last-active-lane of the
header mask form.
DeltaFile
+66-0llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-iv-outside-user.ll
+16-14llvm/test/Transforms/LoopVectorize/VPlan/buildvector-first-lane-only.ll
+3-20llvm/test/Transforms/LoopVectorize/tail-folding-iv-outside-user.ll
+2-8llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-tail-folding.ll
+8-2llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+2-4llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll
+97-486 files

LLVM/project 722c030libc/src/string CMakeLists.txt

Fix CMakeLists dependencies
DeltaFile
+1-1libc/src/string/CMakeLists.txt
+1-11 files

LLVM/project ac47f99clang/include/clang/Basic TokenKinds.h, llvm/include/llvm/ADT DenseMap.h DenseMapInfo.h

[ADT][NFC] Remove unused includes in DenseMap/DenseSet headers (#207282)

Remove unused includes in DenseMap/DenseSet headers.
`llvm/Support/AlignOf.h` was transitively included in
`llvm/Support/JSON.h`
DeltaFile
+2-0clang/include/clang/Basic/TokenKinds.h
+0-2llvm/include/llvm/ADT/DenseMap.h
+0-2llvm/include/llvm/ADT/DenseMapInfo.h
+1-0llvm/include/llvm/Support/JSON.h
+0-1llvm/include/llvm/ADT/DenseSet.h
+3-55 files

LLVM/project afe7d8cllvm/lib/Target/CSKY/AsmParser CSKYAsmParser.cpp

[CSKY] Fix build (#207389)

After 0b413b7d0f5a64e2bb1dea136688b3f4e4ea5e22.
DeltaFile
+2-2llvm/lib/Target/CSKY/AsmParser/CSKYAsmParser.cpp
+2-21 files

LLVM/project 4f2222elibc/include/llvm-libc-macros annex-k-macros.h, libc/src/string strcpy_s.cpp string_utils.h

Address comments
DeltaFile
+37-31libc/src/string/strcpy_s.cpp
+8-1libc/include/llvm-libc-macros/annex-k-macros.h
+4-0libc/src/string/string_utils.h
+1-1libc/src/string/strnlen_s.cpp
+50-334 files

LLVM/project 22e5273mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR omptarget-declare-target-func-visibility.mlir openmp-llvm.mlir

[mlir][OpenMP] Change device declare target functions to hidden visibility (#207234)

During OpenMP lowering, globally visible device functions are emitted.
These functions might not be kernels themselves, but are designed to
only be called in a kernel context. However, if they are unused, and not
inlined, and reference LDS, the AMDGPU ISel emits lots of misleading
warnings related to "local memory global used by non-kernel function".
Fix by changing visibility from external+default to external+hidden,
which allows DCE to just remove the functions.

Claude assisted with this patch.
DeltaFile
+38-0mlir/test/Target/LLVMIR/omptarget-declare-target-func-visibility.mlir
+15-0mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+3-3mlir/test/Target/LLVMIR/openmp-llvm.mlir
+2-2mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
+1-1mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
+1-1mlir/test/Target/LLVMIR/omptarget-device-shared-mem.mlir
+60-76 files

LLVM/project 965b73allvm/lib/Target/M68k M68kRegisterInfo.cpp

[M68k] Fix build after removal of RegisterClasses pointer array (#207364)

Commit 4d8ec1968023 ("[CodeGen][NFC] Remove RegisterClasses pointer
array (#207204)") removed regclass_begin()/regclass_end() from
TargetRegisterInfo, so those names now resolve to the MCRegisterInfo
versions whose iterator dereferences to a MCRegisterClass rather than a
const TargetRegisterClass *, breaking getMaximalPhysRegClass():

  error: cannot convert 'const llvm::MCRegisterClass' to
  'const llvm::TargetRegisterClass*' in initialization

M68k was not updated in that commit. Switch to the range-based
regclasses() idiom used elsewhere in the same change.

Regressor: 4d8ec1968023 ("[CodeGen][NFC] Remove RegisterClasses pointer
array") (#207204)
DeltaFile
+4-7llvm/lib/Target/M68k/M68kRegisterInfo.cpp
+4-71 files

LLVM/project df6e380llvm/lib/Target/AArch64 AArch64LoadStoreOptimizer.cpp

[AArch64] Minor simplification in aarch64-ldst-opt with an early return (#207182)

Remove the local `MBBIWithRenameReg` by moving an early return to an
even earlier point.

When `MBBIWithRenameReg` is set we always return early. By moving the
early return to the point where `MBBIWithRenameReg` is updated, we get
rid of a local variable whose scope spans 200+ lines. This also fixes a
misleading debug print between the `MBBIWithRenameReg` update and the
early return:

```
LLVM_DEBUG(dbgs() << "Unable to combine these instructions due to "
                << "interference in between, keep looking.\n");
```

This line shouldn't be printed when we set `MBBIWithRenameReg`, which is
fixed with this change.
DeltaFile
+1-5llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+1-51 files

LLVM/project 3b908aallvm/test/CodeGen/X86 haddsub-undef.ll, llvm/test/Transforms/PhaseOrdering/X86 hadd.ll hsub.ll

[X86] haddsub-undef.ll - sync more testnames with their phaseordering equivalents (#207370)

Ensure we have equivalent hadd/sub middle-end test coverage with similar names for lookup
DeltaFile
+455-0llvm/test/Transforms/PhaseOrdering/X86/hadd.ll
+455-0llvm/test/Transforms/PhaseOrdering/X86/hsub.ll
+31-31llvm/test/CodeGen/X86/haddsub-undef.ll
+941-313 files

LLVM/project 61f64e3llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnt] Remove Leftover Comment (#207378)

The right test cases were added in #206439 so that comment no longer
applies.
DeltaFile
+1-3llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1-31 files

LLVM/project f16abd7clang/include/clang/Options FlangOptions.td, flang/include/flang/Lower LoweringOptions.def

[flang][Driver] Add option for real sum reassociation

Compiler driver option for #207371: -freal-sum-reassociation. This is in
the hidden help for now. Disabled by default.

Assisted-by: Codex
DeltaFile
+9-0clang/include/clang/Options/FlangOptions.td
+1-6flang/lib/Lower/ConvertExprToHLFIR.cpp
+5-0flang/lib/Frontend/CompilerInvocation.cpp
+4-0flang/include/flang/Lower/LoweringOptions.def
+2-1flang/test/Lower/split-sum-expression-tree-lowering.f90
+2-0flang/test/Driver/frontend-forwarding.f90
+23-72 files not shown
+26-78 files

LLVM/project 4f06fa9libcxx/test/std/library/description/conventions/customization.point.object cpo.compile.pass.cpp

[libc++][ranges] Enable CPO compile tests (#207123)

`adjacent_transform_view` and `stride_view` were implemented but the
test cases were omitted.

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+2-2libcxx/test/std/library/description/conventions/customization.point.object/cpo.compile.pass.cpp
+2-21 files