LLVM/project 8240cf3llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanRecipes.cpp, llvm/unittests/Transforms/Vectorize VPlanTest.cpp VPlanVerifierTest.cpp

[VPlan] Always set flags for overflowing ops etc via VPIRFlags. (#179138)

Enforce that all VPInstructions set the correct OpType of the VPIRFlags.
Flag mis-matches (e.g. VPInstruction Add without `OverflowingBinOp`
being set) can cause crashes (e.g. in CSE) or potentially mis-compiles.

Add a few helpers in VPBuilder to create common instructions with
correct flags.
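
The kind of mismatch described above can be illustrated with a toy model. This is only a sketch: `Opcode`, `FlagKind`, `ToyInstruction`, `expectedFlagKind` and `flagsMatchOpcode` are hypothetical names made up for illustration, not the VPlan API; only the `OverflowingBinOp` kind is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical opcodes and flag kinds; not the real VPlan types.
enum class Opcode { Add, Sub, Mul, FAdd, Xor };
enum class FlagKind { OverflowingBinOp, FPMathOp, Other };

// Map each toy opcode to the flag kind it is expected to carry.
inline FlagKind expectedFlagKind(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Mul:
    return FlagKind::OverflowingBinOp; // may carry wrap flags
  case Opcode::FAdd:
    return FlagKind::FPMathOp; // may carry fast-math flags
  case Opcode::Xor:
    return FlagKind::Other;
  }
  return FlagKind::Other;
}

// A toy instruction: an opcode plus the flag kind it was created with.
struct ToyInstruction {
  Opcode Op;
  FlagKind Flags;
};

// A verifier-style check: the flag kind must agree with the opcode.
inline bool flagsMatchOpcode(const ToyInstruction &I) {
  return I.Flags == expectedFlagKind(I.Op);
}
```

In this model, an `Add` created with `FlagKind::Other` fails the check, which is the sort of inconsistency the commit says should now be rejected.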

PR: https://github.com/llvm/llvm-project/pull/179138
DeltaFile
+43-44llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+48-19llvm/unittests/Transforms/Vectorize/VPlanTest.cpp
+59-0llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+29-9llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+27-9llvm/unittests/Transforms/Vectorize/VPlanVerifierTest.cpp
+10-15llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+216-964 files not shown
+249-10810 files

LLVM/project 0fe2c50libcxx/include/__algorithm for_each.h generate_n.h

make func accept rval
DeltaFile
+10-4libcxx/include/__algorithm/for_each.h
+5-2libcxx/include/__algorithm/generate_n.h
+3-2libcxx/include/__algorithm/for_each_n.h
+18-83 files

LLVM/project aff1d33llvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: add mir test for sgpr s16 unmerge
DeltaFile
+65-0llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+65-01 files

LLVM/project 2ba1b05llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: Fix sgpr s16 unmerge lowering in regbanklegalize

Used to fail EXPENSIVE_CHECKS because of type mismatch.
DeltaFile
+5-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+9-72 files

LLVM/project affcbcelibcxx/test/benchmarks/format std_format_spec_string_unicode_escape.bench.cpp std_format_spec_string_unicode.bench.cpp

[libc++][NFC] Disable std_format_spec benchmarks through lit instead of the preprocessor (#179228)

This is probably a relic from when we didn't use lit to run benchmarks.
Nowadays we should just use the lit features to disable benchmarks like
we do in any other test instead of using the preprocessor.
DeltaFile
+10-15libcxx/test/benchmarks/format/std_format_spec_string_unicode_escape.bench.cpp
+10-14libcxx/test/benchmarks/format/std_format_spec_string_unicode.bench.cpp
+20-292 files

LLVM/project da092b4llvm/lib/CodeGen CodeGenPrepare.cpp

Use DT.get() for eliminateFallThrough(), since the dominator tree is only used there, not updated.
DeltaFile
+1-1llvm/lib/CodeGen/CodeGenPrepare.cpp
+1-11 files

LLVM/project f0b184allvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel regbankselect-unmerge-values.mir

AMDGPU/GlobalISel: Fix sgpr s16 unmerge lowering in regbanklegalize

Used to fail EXPENSIVE_CHECKS because of type mismatch.
DeltaFile
+5-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-unmerge-values.mir
+9-72 files

LLVM/project d0b552ellvm/test/CodeGen/AMDGPU/GlobalISel regbankselect-unmerge-values.mir

AMDGPU/GlobalISel: add mir test for sgpr s16 unmerge
DeltaFile
+66-3llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-unmerge-values.mir
+66-31 files

LLVM/project 9411f5dllvm/test/Transforms/SLPVectorizer/X86 shl-to-add-transformation2.ll

[SLP][NFC]Add another test for shl-to-add transformation, NFC
DeltaFile
+34-0llvm/test/Transforms/SLPVectorizer/X86/shl-to-add-transformation2.ll
+34-01 files

LLVM/project 3b6b109llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeHelper.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: Fix sgpr s16 unmerge lowering in regbanklegalize

Used to fail EXPENSIVE_CHECKS because of type mismatch.
DeltaFile
+5-3llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+9-72 files

LLVM/project b2fe23ellvm/test/CodeGen/AMDGPU/GlobalISel unmerge-sgpr-s16.mir

AMDGPU/GlobalISel: add mir test for sgpr s16 unmerge
DeltaFile
+65-0llvm/test/CodeGen/AMDGPU/GlobalISel/unmerge-sgpr-s16.mir
+65-01 files

LLVM/project 9ceb6a8polly/lib/CodeGen LoopGeneratorsKMP.cpp LoopGeneratorsGOMP.cpp

[Polly][NFCI] Avoid R-value modification
DeltaFile
+1-1polly/lib/CodeGen/LoopGeneratorsKMP.cpp
+1-1polly/lib/CodeGen/LoopGeneratorsGOMP.cpp
+2-22 files

LLVM/project a2c7c60llvm/lib/Transforms/Scalar SeparateConstOffsetFromGEP.cpp, llvm/test/Transforms/SeparateConstOffsetFromGEP negative-i32-offset.ll

Revert "[SeparateConstOffsetFromGEP] Decompose constant xor operand if possible" (#179339)

A miscompile was found (see #175724), and it's complicated to fix. We're
going to revert for now, and look at reimplementing a fixed version
later.
DeltaFile
+0-435llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/xor-decompose.ll
+4-81llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+2-3llvm/test/Transforms/SeparateConstOffsetFromGEP/negative-i32-offset.ll
+6-5193 files

LLVM/project 139e2fbmlir/include/mlir/Dialect/Tosa/IR TosaShapeOps.td, mlir/lib/Dialect/Tosa/IR TosaCanonicalizations.cpp

[mlir][tosa]: Add Binary Shape Ops folders (#178877)

* SUB_SHAPE
* MUL_SHAPE
* DIV_CEIL_SHAPE
* DIV_FLOOR_SHAPE
* MOD_SHAPE
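
As a rough sketch of what folding these ops over constant shapes could look like, the following applies each operation elementwise. It is not the MLIR implementation: `ShapeOp`, `Shape` and `foldShapeOp` are hypothetical names, and refusing to fold on rank mismatch, negative operands or non-positive divisors is an assumption made to keep the sketch simple (overflow is also ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the five binary shape ops listed above.
enum class ShapeOp { Sub, Mul, DivCeil, DivFloor, Mod };
using Shape = std::vector<int64_t>;

// Fold an elementwise binary op over two constant shapes.
// Returns {false, {}} when this sketch declines to fold.
inline std::pair<bool, Shape> foldShapeOp(ShapeOp Op, const Shape &LHS,
                                          const Shape &RHS) {
  if (LHS.size() != RHS.size())
    return std::make_pair(false, Shape());
  Shape Result;
  Result.reserve(LHS.size());
  for (std::size_t I = 0; I < LHS.size(); ++I) {
    int64_t A = LHS[I], B = RHS[I];
    bool IsDivLike = Op != ShapeOp::Sub && Op != ShapeOp::Mul;
    // Assumption: only fold division-like ops for A >= 0 and B > 0.
    if (IsDivLike && (A < 0 || B <= 0))
      return std::make_pair(false, Shape());
    switch (Op) {
    case ShapeOp::Sub:      Result.push_back(A - B); break;
    case ShapeOp::Mul:      Result.push_back(A * B); break;
    case ShapeOp::DivCeil:  Result.push_back((A + B - 1) / B); break;
    case ShapeOp::DivFloor: Result.push_back(A / B); break;
    case ShapeOp::Mod:      Result.push_back(A % B); break;
    }
  }
  return std::make_pair(true, Result);
}
```

For example, `foldShapeOp(ShapeOp::DivCeil, {7, 8}, {2, 4})` yields the shape `{4, 2}` in this sketch.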


Change-Id: I12500bbc05c62730e0dc9cc8d3f20b02845d407e

Signed-off-by: Udaya Ranga <udaya.ranga at arm.com>
DeltaFile
+165-0mlir/test/Dialect/Tosa/constant_folding.mlir
+141-11mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp
+10-0mlir/include/mlir/Dialect/Tosa/IR/TosaShapeOps.td
+316-113 files

LLVM/project df0c2e4llvm/lib/Target/AMDGPU SIFoldOperands.cpp

[AMDGPU] Clear no convergence flag on operand folding. NFCI

Clear the flag, since verification fails if it is set: only convergent
operations may have the NoConvergent flag. NFCI for now because this
case does not currently occur.
DeltaFile
+2-0llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+2-01 files

LLVM/project 79eb804llvm/lib/Target/AArch64 MachineSMEABIPass.cpp, llvm/test/CodeGen/AArch64 sme-za-lazy-save-buffer.ll sme-agnostic-za.ll

[AArch64][SME] Limit where SME ABI optimizations apply (#179273)

These were added recently with a fairly complex propagation step;
however, these optimizations can cause regressions in some cases.

This patch limits the cross-block optimizations to the simple case of
picking a state that matches all incoming blocks. If any block doesn't
match, we fall back to using "ACTIVE", the default state.
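
The simplified rule can be sketched as follows. This is an illustration only, not the pass's code: `ZAState`, its `LocalSaved` and `Off` members, and `pickIncomingState` are hypothetical names; only the "ACTIVE" default comes from the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical state names; only "ACTIVE" appears in the commit message.
enum class ZAState { Active, LocalSaved, Off };

// Pick the incoming state for a block: if every predecessor leaves in the
// same state, reuse it; otherwise fall back to the default, Active.
inline ZAState pickIncomingState(const std::vector<ZAState> &PredOutStates) {
  if (PredOutStates.empty())
    return ZAState::Active; // no predecessors: use the default
  ZAState First = PredOutStates.front();
  for (ZAState S : PredOutStates)
    if (S != First)
      return ZAState::Active; // any mismatch: use the default
  return First;
}
```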
DeltaFile
+17-149llvm/lib/Target/AArch64/MachineSMEABIPass.cpp
+48-93llvm/test/CodeGen/AArch64/sme-za-lazy-save-buffer.ll
+27-55llvm/test/CodeGen/AArch64/sme-agnostic-za.ll
+50-19llvm/test/CodeGen/AArch64/sme-za-exceptions.ll
+15-11llvm/test/CodeGen/AArch64/sme-za-control-flow.ll
+4-17llvm/test/CodeGen/AArch64/sme-new-za-function.ll
+161-3446 files

LLVM/project a35b594clang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

[CIR][CUDA][HIP] Add NVPTX target info and CUDA/HIP global emission filtering (#177827)

related: #175871 

This patch adds foundational infra for device-side CUDA/HIP compilation
by introducing NVPTX target info and implementing the global emission
filtering logic.


  NVPTX target info, which allows us to compile against that triple:
  - Add NVPTXABIInfo and NVPTXTargetCIRGenInfo classes
  - Wire up nvptx and nvptx64 triples in getTargetCIRGenInfo()
  - Add createNVPTXTargetCIRGenInfo() factory function

CUDA/HIP global emission filtering (most of this is boilerplate from the
AST). This basically narrows down to:
- Skip host-only functions (no `__device__` attribute) when
  `-fcuda-is-device`
- Skip device-only functions (device without host) on host side

    [5 lines not shown]
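
The two filtering rules listed above can be sketched in isolation. This is a minimal model, not the CIRGen code: `FuncAttrs` and `shouldSkip` are hypothetical names, and `__global__` kernels and any rules from the elided lines are deliberately not modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for a declaration's CUDA/HIP attributes.
struct FuncAttrs {
  bool HasHost;   // declared __host__
  bool HasDevice; // declared __device__
};

// Returns true if the function should be skipped on this compilation side.
inline bool shouldSkip(const FuncAttrs &F, bool IsDeviceCompile) {
  if (IsDeviceCompile)
    return !F.HasDevice;            // host-only: no __device__ attribute
  return F.HasDevice && !F.HasHost; // device-only on the host side
}
```

Under this model a `__host__ __device__` function is emitted on both sides, and an unattributed function is treated as host-only.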
DeltaFile
+58-0clang/lib/CIR/CodeGen/CIRGenModule.cpp
+55-0clang/test/CIR/CodeGenCUDA/filter-decl.cu
+30-0clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+19-0clang/lib/CIR/CodeGen/TargetInfo.cpp
+4-0clang/lib/CIR/CodeGen/CIRGenModule.h
+2-0clang/lib/CIR/CodeGen/TargetInfo.h
+168-06 files

LLVM/project c7408d1llvm/include/llvm/IR IRBuilder.h, llvm/lib/IR IRBuilder.cpp

[AMDGPU][SROA] Unify cast chain implementations (#177945)

The AMDGPU promote alloca pass is missing a conversion link when casting
between vectors of pointers and pointers or vectors of pointers with
different number of elements. This causes codegen to crash due to
invalid casts being generated. To address this, this commit adds the
missing conversion link.

In addition to this, the commit moves the common load/store cast logic
into a new function `createLoadStoreCastChain`.

---------

Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
Co-authored-by: Steffen Holst Larsen <HolstLarsen.Steffen at amd.com>
DeltaFile
+13-106llvm/lib/Transforms/Scalar/SROA.cpp
+73-0llvm/lib/IR/IRBuilder.cpp
+57-6llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll
+8-44llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+7-0llvm/include/llvm/IR/IRBuilder.h
+158-1565 files

LLVM/project 9f7c00ellvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Remove `+xs` gating for `tlbip *nxs` instructions

A recent specification update has removed FEAT_XS gating for `tlbip *nxs`
instructions. The `tlbi *nxs` instructions remain gated on FEAT_XS.
DeltaFile
+6-16llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+8-9llvm/test/MC/AArch64/armv9a-sysp.s
+0-8llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
+2-2llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+1-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+17-375 files

LLVM/project e0fb6d7lldb/include/lldb/Symbol CompilerType.h, lldb/source/Symbol CompilerType.cpp

[lldb][CompilerType] Remove CompilerType::IsFloat (#179212)

Depends on:
* https://github.com/llvm/llvm-project/pull/178906

Ever since https://github.com/llvm/llvm-project/pull/178906 this API is
the same as `IsFloatingPointType`. There's no compelling reason for this
to exist.
DeltaFile
+9-9lldb/source/ValueObject/ValueObject.cpp
+3-2lldb/source/ValueObject/DILEval.cpp
+0-2lldb/source/Symbol/CompilerType.cpp
+0-2lldb/include/lldb/Symbol/CompilerType.h
+12-154 files

LLVM/project 347e21apolly/lib/CodeGen BlockGenerators.cpp, polly/test/CodeGen/OpenMP issue179135.ll

[Polly] Use GenDT in assertion (#179433)

`DT` is always the analysis for the to-be-optimized function while
`GenDT` is the analysis of the function that we currently generate code
for which can also be an outlined function. Here, we want to check
dominance in the generated code, hence we must use `GenDT`.

Fixes: #179135
DeltaFile
+32-0polly/test/CodeGen/OpenMP/issue179135.ll
+4-4polly/lib/CodeGen/BlockGenerators.cpp
+36-42 files

LLVM/project a6b232allvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

[AArch64][llvm] Gate some `tlbip` insns with +tlbid or +d128

Change the gating of `tlbip` instructions containing `*E1IS*`, `*E1OS*`,
`*E2IS*` or `*E2OS*` to be used with `+tlbid` or `+d128`. This is because
the 2025 Armv9.7-A MemSys specification says:

```
  All TLBIP *E1IS*, TLBIP*E1OS*, TLBIP*E2IS* and TLBIP*E2OS* instructions
  that are currently dependent on FEAT_D128 are updated to be dependent
  on FEAT_D128 or FEAT_TLBID
```
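
Read literally, the quoted rule amounts to the check sketched below. This is an assumption-laden illustration rather than the assembler's logic: `matchesSharedE1E2` and `tlbipAllowed` are hypothetical helpers, and the sketch assumes that `tlbip` operands not matching the four patterns stay gated on `+d128` alone.

```cpp
#include <cassert>
#include <string>

// True if the operand name contains one of the patterns named in the quote.
inline bool matchesSharedE1E2(const std::string &Name) {
  for (const char *Pat : {"E1IS", "E1OS", "E2IS", "E2OS"})
    if (Name.find(Pat) != std::string::npos)
      return true;
  return false;
}

// Assumed gating: +d128 always suffices; +tlbid suffices only for the
// *E1IS*, *E1OS*, *E2IS* and *E2OS* forms.
inline bool tlbipAllowed(const std::string &Name, bool HasD128,
                         bool HasTLBID) {
  if (HasD128)
    return true;
  return HasTLBID && matchesSharedE1E2(Name);
}
```

The operand names passed in (for example `VAE1IS`) are only illustrative inputs to the pattern match.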
DeltaFile
+259-0llvm/test/MC/AArch64/tlbip-tlbid-or-d128.s
+66-66llvm/test/MC/AArch64/armv9a-sysp.s
+14-6llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+20-0llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+11-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+370-745 files

LLVM/project e7475e6llvm/lib/Target/AArch64 AArch64InstrInfo.td AArch64SystemOperands.td, llvm/test/MC/AArch64 armv9a-sysp.s armv9-mrrs.s

[AArch64][llvm] Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions

Remove `+d128` gating on `sysp`, `msrr` and `mrrs` instructions.

We removed gating for `sys`, `msr` and `mrs` instructions previously,
on the basis that it doesn't add value, as it doesn't indicate that
any particular system registers or system instructions are available.

Therefore, remove `+d128` gating for these too.

(In an upcoming change, some `tlbip` instructions, which are `sysp` aliases,
are allowed to be used with either `+d128` or `+tlbid`. If we don't remove
this gating, then it would require some ugly work-arounds in the code to
support the relaxation mandated by the 2025 MemSys specification.

In this change, retain `+d128` gating for all `tlbip` instructions, which
will then be loosened to either `+d128` or `+tlbid` in a subsequent change.)
DeltaFile
+122-196llvm/test/MC/AArch64/armv9a-sysp.s
+7-97llvm/test/MC/AArch64/armv9-mrrs.s
+42-46llvm/lib/Target/AArch64/AArch64InstrInfo.td
+7-53llvm/test/MC/AArch64/armv9-msrr.s
+4-2llvm/lib/Target/AArch64/AArch64SystemOperands.td
+2-3llvm/test/MC/AArch64/directive-arch_extension-negative.s
+184-3973 files not shown
+190-3989 files

LLVM/project b96ef9cclang/include/clang/StaticAnalyzer/Core/PathSensitive CoreEngine.h ExprEngine.h, clang/lib/StaticAnalyzer/Core ExprEngine.cpp CoreEngine.cpp

[NFC][analyzer] Refactor switch handling in the engine (#178678)

This commit refactors `ExprEngine::processSwitch()` and related logic to
make it easier to understand and "prepare the ground" for planned
functional changes.

Unfortunately there were many idiosyncratic decisions in this part of
the engine -- e.g. `BranchNodeBuilder` does not derive from
`NodeBuilder` and doesn't use a `NodeBuilderContext`. For now I left
these skeletons in the closet, but I tried to pick the low-hanging fruit
and moved `processSwitch` a bit closer to its "big sibling"
`processBranch`.

For example I moved the initialization of the node builder into the body
of `processSwitch` because if I want to trigger `BranchCondition`
callbacks from this method (the way `processBranch` does it) I will need
to iterate over the nodes created by checkers and construct a new node
builder in each iteration.


    [5 lines not shown]
DeltaFile
+35-44clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+8-41clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
+5-10clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+2-1clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+50-964 files

LLVM/project c85457flldb/source/Plugins/ABI/ARM ABISysV_arm.cpp, lldb/source/Plugins/ABI/X86 ABISysV_x86_64.cpp ABIWindows_x86_64.cpp

[lldb][TypeSystemClang] Remove mostly unused is_complex output parameter to IsFloatingPointType (#178906)

Depends on:
* https://github.com/llvm/llvm-project/pull/178904

(only last commit is relevant for the review)

This is part of a patch series to clean up the
TypeSystemClang::IsFloatingPointType API. The `is_complex` parameter is
rarely checked. This patch introduces a `CompilerType::IsComplexType`
API which callers that previously checked `is_complex` can use instead.

This will also allow us to remove `CompilerType::IsFloat`, which is just
`IsFloatingPointType` that ignores the `is_complex` parameter.
DeltaFile
+12-11lldb/source/Symbol/CompilerType.cpp
+10-12lldb/source/Plugins/ABI/ARM/ABISysV_arm.cpp
+5-14lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp
+15-0lldb/unittests/Symbol/TestTypeSystemClang.cpp
+2-4lldb/source/Plugins/ABI/X86/ABISysV_x86_64.cpp
+2-4lldb/source/Plugins/ABI/X86/ABIWindows_x86_64.cpp
+46-455 files not shown
+54-5511 files

LLVM/project 871c643llvm/include/llvm/Transforms/IPO Attributor.h, llvm/lib/Passes PassBuilderPipelines.cpp

Attributor: Add -light options to -attributor-enable flag (#179346)

Add light, module-light, and cgscc-light options. This just
supplements the existing flag to use the light variants of the
pass in place of the full versions.

Way back when attributor-light was added in 400fde92963588ae2b,
there was no way to change the pass pipeline to use it. There
were some benchmarks posted, but I don't see precisely how it
was benchmarked in the pipeline.

I'm also surprised this option is only additive, and doesn't remove
FunctionAttrs. If this is to be the option to drive the enablement,
I would expect it to not run the old passes.
DeltaFile
+24-0llvm/test/Other/opt-pipeline-attributor-enable.ll
+12-2llvm/lib/Passes/PassBuilderPipelines.cpp
+5-1llvm/include/llvm/Transforms/IPO/Attributor.h
+41-33 files

LLVM/project da43386clang/test/CIR/CodeGenCUDA filter-decl.cu

fix nit test case
DeltaFile
+1-1clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1-11 files

LLVM/project ac0327aclang/lib/CIR/CodeGen CIRGenModule.cpp

le format monseiur
DeltaFile
+3-4clang/lib/CIR/CodeGen/CIRGenModule.cpp
+3-41 files

LLVM/project b204f16clang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

fix lit includes yet again
DeltaFile
+5-5clang/test/CIR/CodeGenCUDA/filter-decl.cu
+2-2clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+7-72 files

LLVM/project bb7054cclang/test/CIR/CodeGenCUDA filter-decl.cu nvptx-basic.cu

nit: fix lit includes
DeltaFile
+4-4clang/test/CIR/CodeGenCUDA/filter-decl.cu
+1-1clang/test/CIR/CodeGenCUDA/nvptx-basic.cu
+5-52 files