LLVM/project ed5f4af mlir/include/mlir/Bytecode BytecodeImplementation.h, mlir/include/mlir/IR BuiltinDialect.h BuiltinDialect.td

[mlir][bytecode] Add builtin dialect version (#184678)

This adds a single Builtin dialect version for use with bytecode
serialization. The version is currently not printed unless it is set and
nonzero (no bump is planned until the next LLVM version). A unit test was
added, as this was the easiest way to track it.

Additionally, add an emitWarning virtual method to DialectBytecodeReader,
mirroring emitError.

Tested with an old mlir-opt reader, which could read the output, so this
should be a non-breaking change.
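
A minimal sketch of the two ideas above, using stand-in types rather than
the real MLIR classes: a reader interface that gains a warning hook next to
its error hook, and a writer-side check that skips the version when it is
unset or 0. `ToyReader`, `RecordingReader`, `shouldEmitVersion`, and
`readVersion` are hypothetical names for illustration; the actual
DialectBytecodeReader signatures are not reproduced here, and warning on a
newer version is an assumed use of the hook.

```cpp
#include <optional>
#include <string>
#include <vector>

// Stand-in for a bytecode reader interface (NOT the real MLIR class).
class ToyReader {
public:
  virtual ~ToyReader() = default;
  // Existing hook: report a failure.
  virtual void emitError(const std::string &msg) = 0;
  // New hook mirroring emitError: report a non-fatal diagnostic.
  virtual void emitWarning(const std::string &msg) = 0;
};

// Test double that records diagnostics instead of printing them.
class RecordingReader : public ToyReader {
public:
  std::vector<std::string> errors, warnings;
  void emitError(const std::string &msg) override { errors.push_back(msg); }
  void emitWarning(const std::string &msg) override { warnings.push_back(msg); }
};

// Policy described in the message: only emit the version if set and nonzero.
inline bool shouldEmitVersion(std::optional<unsigned> version) {
  return version.has_value() && *version != 0;
}

// Hypothetical reader-side check: warn (rather than fail) on a newer version.
inline bool readVersion(ToyReader &reader, unsigned seen, unsigned supported) {
  if (seen > supported)
    reader.emitWarning("bytecode has newer dialect version " +
                       std::to_string(seen));
  return true; // reading continues either way in this sketch
}
```

Because the version is omitted when it is 0, output produced today stays
byte-compatible with readers that predate the version field, which matches
the non-breaking claim above.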
DeltaFile
+56-0mlir/unittests/IR/BuiltinDialectVersionTest.cpp
+35-0mlir/lib/IR/BuiltinDialectBytecode.cpp
+20-0mlir/include/mlir/IR/BuiltinDialect.h
+11-0mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+4-1mlir/include/mlir/IR/BuiltinDialect.td
+3-0mlir/include/mlir/Bytecode/BytecodeImplementation.h
+129-12 files not shown
+132-18 files

LLVM/project d8474ab clang/lib/Driver/ToolChains/Arch X86.cpp, llvm/lib/Target/X86 X86RegisterInfo.cpp X86Subtarget.h

[X86] Reduce -ffixed-r compile-time overhead (#184606)

PR #180242 added reserve-r support across the driver and backend, but it
also introduced avoidable compile-time work in hot paths.

In Clang, delay +egpr detection until -ffixed-r16 through -ffixed-r31
are actually queried instead of computing it for every x86_64
invocation.

In LLVM, store X86Subtarget::ReservedRReg in a fixed-size std::bitset
and update X86RegisterInfo::getReservedRegs() to iterate only over the
reserve-r register ranges instead of scanning every target register.

These changes keep reserve-r behavior unchanged while trimming the extra
compile-time overhead introduced by the PR.

Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian at gmail.com>
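
The LLVM-side change can be pictured with a small sketch: keep one bit per
reservable register in a fixed-size std::bitset and walk only that range
when collecting reserved registers. `ToySubtarget`, `collectReserved`, and
the r16..r31 numbering are assumptions for illustration; the real register
ranges and the X86RegisterInfo code are not shown in this message.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical numbering: 16 registers, r16..r31, can be reserved.
constexpr unsigned FirstReservableR = 16;
constexpr unsigned NumReservableR = 16;

struct ToySubtarget {
  // One bit per reservable register; bit i corresponds to r(16 + i).
  std::bitset<NumReservableR> ReservedRReg;

  // Callers must pass a register in [16, 31]; std::bitset throws otherwise.
  void reserve(unsigned reg) { ReservedRReg.set(reg - FirstReservableR); }
  bool isReserved(unsigned reg) const {
    return ReservedRReg.test(reg - FirstReservableR);
  }
};

// Collect reserved registers by walking only the reservable range,
// not every register the target defines.
inline std::vector<unsigned> collectReserved(const ToySubtarget &st) {
  std::vector<unsigned> out;
  if (st.ReservedRReg.none())
    return out; // fast path: nothing was reserved
  for (std::size_t i = 0; i < NumReservableR; ++i)
    if (st.ReservedRReg.test(i))
      out.push_back(FirstReservableR + static_cast<unsigned>(i));
  return out;
}
```

The loop bound depends only on the number of reservable registers, so the
common case with no -ffixed-r flags costs a single `none()` check.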
DeltaFile
+27-18clang/lib/Driver/ToolChains/Arch/X86.cpp
+9-6llvm/lib/Target/X86/X86RegisterInfo.cpp
+3-2llvm/lib/Target/X86/X86Subtarget.h
+1-2llvm/lib/Target/X86/X86Subtarget.cpp
+40-284 files

LLVM/project 1901886 llvm/lib/Target/RISCV RISCVSchedAndes45.td, llvm/test/tools/llvm-mca/RISCV/Andes45 rvv-mask.s rvv-permutation.s

[RISCV] Update Andes45 vector mask scheduling info (#184719)

This PR adds latency/throughput information for all RVV mask instructions
to the Andes45 series scheduling model.
DeltaFile
+314-314llvm/test/tools/llvm-mca/RISCV/Andes45/rvv-mask.s
+35-35llvm/test/tools/llvm-mca/RISCV/Andes45/rvv-permutation.s
+19-6llvm/lib/Target/RISCV/RISCVSchedAndes45.td
+368-3553 files

LLVM/project 7c13f88 mlir/test/Conversion/NVGPUToNVVM nvgpu-to-nvvm.mlir, mlir/test/Dialect/NVGPU canonicalization.mlir

[mlir][NVGPU] Fix double spaces in tests after ODS printer fix. NFC. (#185327)

Follow-up to #184253. Update tests that checked for the old double-space
output of GPU and NVVM ops using GPU_DimensionAttr and
SetMaxRegisterActionAttr.

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+12-12mlir/test/Examples/NVGPU/Ch4.py
+11-11mlir/test/Examples/NVGPU/Ch5.py
+2-2mlir/test/Dialect/NVGPU/canonicalization.mlir
+1-1mlir/test/Examples/NVGPU/Ch0.py
+1-1mlir/test/Examples/NVGPU/Ch3.py
+1-1mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir
+28-282 files not shown
+30-308 files

LLVM/project ade6309 mlir/test/Dialect/XeGPU sg-to-wi-experimental.mlir propagate-layout-subgroup.mlir, mlir/test/Integration/Dialect/XeGPU/LANE no-xegpu-ops.mlir

[mlir][XeGPU] Fix double spaces in tests after ODS printer fix. NFC. (#185324)

Follow-up to #184253. Update tests that checked for the old double-space
output of gpu.block_id using GPU_DimensionAttr.

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+12-12mlir/test/Dialect/XeGPU/sg-to-wi-experimental.mlir
+4-4mlir/test/Dialect/XeGPU/propagate-layout-subgroup.mlir
+4-4mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+3-3mlir/test/Integration/Dialect/XeGPU/LANE/no-xegpu-ops.mlir
+3-3mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
+2-2mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+28-281 files not shown
+29-297 files

LLVM/project 9ed0012 llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp

[NFC][AMDGPU] Add debug print to `AMDGPULowerVGPREncoding.cpp`
DeltaFile
+91-3llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+91-31 files

LLVM/project 47cc090 llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp, llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

Refactor createIteratorLoop to use OMPIRBuilder utility functions and make end-of-block insertion robust.

- Replace manual splitBasicBlock/branch with splitBB
  and redirectTo()
- When insertion point is at BB.end() and the block is terminated, split
  before the terminator so the original successor path is preserved
  through omp.it.cont
- Add test for unterminated blocks
DeltaFile
+66-0llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+13-23llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+2-1mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+81-243 files

LLVM/project 7ede5b3 flang/lib/Optimizer/OpenMP MapInfoFinalization.cpp, flang/test/Transforms omp-map-info-finalization-usm.fir

[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#1557)

This fixes a bug in USM mode where the `close` map type modifier was
attached to some `map.info.op`s corresponding to user-defined type
members while the parent type instance itself was not marked as `close`.

This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.
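
The rule in the paragraph above can be sketched on plain bit flags. This is
not the MapInfoFinalization code: `ToyParentMap`, `ToyMemberMap`, and
`propagateClose` are hypothetical, and the value chosen for `MapClose` is an
assumption made only so the example has a concrete bit to clear.

```cpp
#include <cstdint>
#include <vector>

// Illustrative flag bit; the value is an assumption for this sketch.
constexpr std::uint64_t MapClose = 0x400;

struct ToyMemberMap {
  std::uint64_t flags; // map-type flags of one record member
};

struct ToyParentMap {
  std::uint64_t flags;               // map-type flags of the record itself
  std::vector<ToyMemberMap> members; // maps for the record's members
};

// If the parent map lacks `close`, clear it from every member map.
inline void propagateClose(ToyParentMap &parent) {
  if (parent.flags & MapClose)
    return; // parent is `close`; members keep whatever they have
  for (ToyMemberMap &m : parent.members)
    m.flags &= ~MapClose;
}
```

Only the `close` bit is touched, so any other flags already present on a
member map survive the cleanup.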

Gemini was used to create tests. The AI-generated test code, which was
derived from a reproducer I was working with to debug the issue, was
reviewed line-by-line by me.

Assisted-by: Gemini <gemini at google.com>
DeltaFile
+35-0offload/test/offloading/fortran/usm_derived_type_allocatable_member.f90
+35-0flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+24-0flang/test/Transforms/omp-map-info-finalization-usm.fir
+94-03 files

LLVM/project a7e4d09 llvm/lib/Target/AMDGPU GCNSchedStrategy.cpp AMDGPUCoExecSchedStrategy.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

[AMDGPU] Add structural stall heuristic to scheduling strategies

Implements a structural stall heuristic that considers both resource
hazards and latency constraints when selecting instructions. In coexec,
this changes the pending queue from a binary “not ready to issue”
distinction into part of a unified candidate comparison. Pending
instructions still identify structural stalls in the current cycle, but
they are now evaluated directly against available instructions by stall
cost, making the heuristics both more intuitive and more expressive.

- Add getStructuralStallCycles() to GCNSchedStrategy that computes the
number of cycles an instruction must wait due to:
  - Resource conflicts on unbuffered resources (from the SchedModel)
  - Sequence-dependent hazards (from GCNHazardRecognizer)

- Add getHazardWaitStates() to GCNHazardRecognizer that returns the number
of wait states until all hazards for an instruction are resolved,
providing cycle-accurate hazard information for scheduling heuristics.
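
A rough sketch of how a stall cost could unify the two queues, under stated
assumptions: the message does not say how the resource and hazard delays are
combined, so taking their maximum is a guess, and `ToyCandidate`,
`structuralStallCycles`, and `betterCandidate` are hypothetical names rather
than the GCNSchedStrategy API.

```cpp
#include <algorithm>

struct ToyCandidate {
  unsigned resourceStall; // cycles until an unbuffered resource frees up
  unsigned hazardWait;    // wait states until sequence hazards resolve
  bool pending;           // true if taken from the pending queue
};

// Assumption: the instruction must wait for the longer of the two delays.
inline unsigned structuralStallCycles(const ToyCandidate &c) {
  return std::max(c.resourceStall, c.hazardWait);
}

// Unified comparison: a lower stall cost wins regardless of which queue
// the candidate came from.
inline bool betterCandidate(const ToyCandidate &a, const ToyCandidate &b) {
  return structuralStallCycles(a) < structuralStallCycles(b);
}
```

With this shape, a pending instruction that stalls for one cycle can beat an
available instruction that would stall for three, which a binary
ready/not-ready split cannot express.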
DeltaFile
+35-0llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+26-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+7-2llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+6-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+4-0llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+2-2llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+80-71 files not shown
+82-77 files

LLVM/project ed37ede llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir

Update coexec-sched-effective-stall.mir
DeltaFile
+0-2llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+0-21 files

LLVM/project 8fec3f2 llvm/lib/Target/AMDGPU AMDGPUTargetMachine.cpp GCNSubtarget.cpp, llvm/test/CodeGen/AMDGPU amdgpu-workload-type-scheduler-debug.mir

Remove module "workload-type" metadata.
DeltaFile
+0-114llvm/test/CodeGen/AMDGPU/amdgpu-workload-type-scheduler-debug.mir
+10-45llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+1-16llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+11-3llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+4-1llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+4-0llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
+30-1796 files

LLVM/project 8ed46c2 llvm/lib/Target/AMDGPU AMDGPUCoExecSchedStrategy.cpp AMDGPUTargetMachine.cpp, llvm/test/CodeGen/AMDGPU coexec-sched-effective-stall.mir amdgpu-workload-type-scheduler-debug.mir

[AMDGPU] Add ML-oriented coexec scheduler selection and queue handling

This patch adds the initial coexec scheduler scaffold for machine
learning workloads on gfx1250.

It introduces function and module-level controls for selecting the
AMDGPU preRA and postRA schedulers, including an `amdgpu-workload-type`
module flag that maps ML workloads to coexec preRA scheduling and a nop
postRA scheduler by default.

It also updates the coexec scheduler to use a simplified top-down
candidate selection path that considers both available and pending
queues through a single flow, setting up follow-on heuristic work.
DeltaFile
+275-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.cpp
+124-0llvm/test/CodeGen/AMDGPU/coexec-sched-effective-stall.mir
+114-0llvm/test/CodeGen/AMDGPU/amdgpu-workload-type-scheduler-debug.mir
+64-5llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+43-0llvm/lib/Target/AMDGPU/AMDGPUCoExecSchedStrategy.h
+22-0llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+642-53 files not shown
+663-149 files

LLVM/project 2e93d4c llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Refactor and support multiple affinity register for a task

- Support multiple affinity register for a task
- Move iterator loop generation logic to OMPIRBuilder
- Extract iterator loop body conversion logic
- Refactor buildAffinityData by hoisting the creation of affinity_list
- IteratorsOp -> IteratorOp
- Add mlir to llvmir test
DeltaFile
+143-123mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+226-0mlir/test/Target/LLVMIR/openmp-iterator.mlir
+68-16llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+34-1llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+33-0mlir/test/Target/LLVMIR/openmp-llvm.mlir
+504-1405 files

LLVM/project a3e5345 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir

Translate affinity entries to LLVMIR by passing affinity information to
createTask (__kmpc_omp_reg_task_with_affinity is created inside PostOutlineCB).
DeltaFile
+92-0llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+59-13mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+19-3llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+12-6llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+0-12mlir/test/Target/LLVMIR/openmp-todo.mlir
+2-0mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
+184-346 files

LLVM/project 9cba219 mlir/lib/Target/LLVMIR/Dialect/OpenMP OpenMPToLLVMIRTranslation.cpp, mlir/test/Target/LLVMIR openmp-iterator.mlir openmp-llvm.mlir

Fix insert point for affinity list

Fix a dominance issue that occurs if the affinity list is created before
the dynamic count.
DeltaFile
+37-8mlir/test/Target/LLVMIR/openmp-iterator.mlir
+3-5mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+3-2mlir/test/Target/LLVMIR/openmp-llvm.mlir
+43-153 files

LLVM/project a8f0895 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Implement lowering for omp.iterator in affinity

Create IteratorLoopNestScope for building the nested loops for an iterator.
It takes advantage of RAII so that each level of the loop nest gets a
correct exit.
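
The RAII idea can be illustrated without any IR at all. In this sketch a
scope object records a loop header when constructed and the matching exit
when destroyed, so exits are produced in reverse nesting order even on early
return. `ToyLoopScope` and `buildTwoLevelNest` are hypothetical and only
model the ordering; they are not the IteratorLoopNestScope implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII helper: the constructor logs the loop header and the
// destructor logs the matching exit.
class ToyLoopScope {
  std::vector<std::string> &log;
  std::string name;

public:
  ToyLoopScope(std::vector<std::string> &log, std::string name)
      : log(log), name(std::move(name)) {
    this->log.push_back("enter " + this->name);
  }
  ~ToyLoopScope() { log.push_back("exit " + name); }

  // A scope owns exactly one loop level, so copying is disabled.
  ToyLoopScope(const ToyLoopScope &) = delete;
  ToyLoopScope &operator=(const ToyLoopScope &) = delete;
};

inline std::vector<std::string> buildTwoLevelNest() {
  std::vector<std::string> log;
  {
    ToyLoopScope outer(log, "i");
    {
      ToyLoopScope inner(log, "j");
      log.push_back("body");
    } // inner exit is logged here
  }   // outer exit is logged here
  return log;
}
```

Tying the exit to the destructor means a level cannot be left open by
accident, which is the property the message relies on.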
DeltaFile
+158-22mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+82-0llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+27-0llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+1-0mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
+268-224 files

LLVM/project 75decb4 llvm/include/llvm/Frontend/OpenMP OMPIRBuilder.h, llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

Use createLoopSkeleton instead of manually building nested loop

Create a flattened one-dimensional canonical loop for omp.iterator
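
Flattening a loop nest into one canonical loop usually means iterating a
single counter over the product of the extents and recovering the
per-dimension indices inside the body. The sketch below shows that
arithmetic only; `flatTripCount` and `delinearize` are hypothetical helpers,
and treating the last dimension as fastest-varying is an assumption, not
something the message specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Trip count of the flattened loop: product of the per-dimension counts.
inline std::uint64_t flatTripCount(const std::vector<std::uint64_t> &extents) {
  std::uint64_t total = 1;
  for (std::uint64_t e : extents)
    total *= e;
  return total;
}

// Recover per-dimension indices from the flat induction variable,
// treating the last dimension as the fastest-varying one.
inline std::vector<std::uint64_t>
delinearize(std::uint64_t flat, const std::vector<std::uint64_t> &extents) {
  std::vector<std::uint64_t> idx(extents.size());
  for (std::size_t d = extents.size(); d-- > 0;) {
    idx[d] = flat % extents[d];
    flat /= extents[d];
  }
  return idx;
}
```

A single loop has one header and one exit, so no per-level exit bookkeeping
is needed once the indices are derived this way.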
DeltaFile
+92-52mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+0-82llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+0-27llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+92-1613 files

LLVM/project e4160cb flang/lib/Lower/OpenMP Utils.cpp ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

[Flang][mlir][OpenMP] Support affinity clause codegen in Flang (#182222)

This patch translates the Flang AST to the OpenMP dialect for the
affinity clause, including support for the iterator modifier.

2/3 in stack for implementing affinity clause with iterator modifier
1/3 #182218
2/3 #182222
3/3 #182223
DeltaFile
+495-45flang/test/Lower/OpenMP/task-affinity.f90
+302-0flang/lib/Lower/OpenMP/Utils.cpp
+194-7flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+84-18mlir/test/Dialect/OpenMP/ops.mlir
+70-18mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+36-0flang/lib/Lower/OpenMP/Utils.h
+1,181-885 files not shown
+1,216-10411 files

LLVM/project fac1866 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 known-pow2.ll setcc-logic.ll

[X86] Remove redundant and-not pattern code in X86 (#157687)

These transforms are now handled in DAGCombine, so enable
hasAndNotCompare for all scalar cases on X86, and remove the
platform-specific code that does the same thing.
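
For readers unfamiliar with the pattern, the and-not compare rewrite, as I
understand the hook, replaces a masked equality test with a test of the
complemented operand against zero. The two helper functions below are
hypothetical and just demonstrate that the scalar forms agree; they say
nothing about which instructions X86 selects.

```cpp
#include <cstdint>

// Original form: are all bits of y also set in x?
constexpr bool maskCompare(std::uint32_t x, std::uint32_t y) {
  return (x & y) == y;
}

// And-not form: no bit of y is missing from x.
constexpr bool andNotCompare(std::uint32_t x, std::uint32_t y) {
  return (~x & y) == 0;
}

// Compile-time spot checks that both forms give the same answer.
static_assert(maskCompare(0xE, 0x6) == andNotCompare(0xE, 0x6), "");
static_assert(maskCompare(0x8, 0x6) == andNotCompare(0x8, 0x6), "");
```

The equivalence holds because a bit of `y` fails the first test exactly when
it is set in `y` and clear in `x`, which is what `~x & y` isolates.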
DeltaFile
+12-52llvm/lib/Target/X86/X86ISelLowering.cpp
+30-30llvm/test/CodeGen/X86/known-pow2.ll
+5-5llvm/test/CodeGen/X86/setcc-logic.ll
+2-2llvm/test/CodeGen/X86/avx512-cmp.ll
+49-894 files

LLVM/project 15e7177 mlir/test/Conversion/VectorToGPU vector-to-mma-ops.mlir, mlir/test/Dialect/GPU transform-gpu.mlir subgroupId-rewrite.mlir

[mlir][GPU] Fix double spaces in tests after ODS printer fix. NFC. (#185325)

Follow-up to #184253. The ODS attr/type printer fix removed the leading
space from generated print() methods. Update tests that checked for the
old double-space output of GPU ops using GPU_DimensionAttr and
GPU_MmaElementwiseOpAttr.

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+38-38mlir/test/Dialect/GPU/transform-gpu.mlir
+9-9mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
+7-7mlir/test/Integration/GPU/CUDA/sm90/tma_load_128x128_stride_noswizzle.mlir
+6-6mlir/test/Conversion/VectorToGPU/vector-to-mma-ops.mlir
+5-5mlir/test/Dialect/GPU/subgroupId-rewrite.mlir
+4-4mlir/test/python/dialects/gpu/dialect.py
+69-6919 files not shown
+116-11625 files

LLVM/project 4df95b1 mlir/test/Conversion/NVVMToLLVM nvvm-to-llvm.mlir, mlir/test/Dialect/LLVMIR nvvm.mlir

[mlir][NVVM] Fix double spaces in tests after ODS printer fix. NFC. (#185326)

Follow-up to #184253. Update tests that checked for the old double-space
output of NVVM ops using ReductionKindAttr, ShflKindAttr, and
LoadCacheModifierAttr.

Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
DeltaFile
+36-36mlir/test/python/dialects/nvvm.py
+8-8mlir/test/Conversion/NVVMToLLVM/nvvm-to-llvm.mlir
+8-8mlir/test/Dialect/LLVMIR/nvvm.mlir
+8-8mlir/test/Target/LLVMIR/nvvmir.mlir
+60-604 files

LLVM/project 483fc73 llvm/include/llvm/Analysis LoopAccessAnalysis.h Loads.h, llvm/lib/Analysis Loads.cpp LoopAccessAnalysis.cpp

[Loads] Add overload for isDerefAndAlignedInLoop that takes SCEVs. (NFC)

Add an overload of isDereferenceableAndAlignedInLoop that directly takes
the pointer and element sizes as SCEVs. This allows using it from
contexts without relying on an underlying load instruction in follow-up
patches.
DeltaFile
+18-7llvm/lib/Analysis/Loads.cpp
+18-4llvm/lib/Analysis/LoopAccessAnalysis.cpp
+10-3llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+8-0llvm/include/llvm/Analysis/Loads.h
+54-144 files

LLVM/project ae28297 clang-tools-extra/clang-tidy/readability ElseAfterReturnCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Fix readability-else-after-return for [[likely]]/[[unlikely]] if (#184684)

Following PR #181878, I noticed a false negative when the `if` statement
carries an attribute.

Repro:

```cpp
void f()
{
    if (true) {
        return;
    } else {
        // Warns as expected
    }

    if (true) [[likely]] {
        return;
    } else {

    [5 lines not shown]
DeltaFile
+41-0clang-tools-extra/test/clang-tidy/checkers/readability/else-after-return-cxx20.cpp
+7-2clang-tools-extra/docs/ReleaseNotes.rst
+3-2clang-tools-extra/clang-tidy/readability/ElseAfterReturnCheck.cpp
+51-43 files

LLVM/project ecda6d1 mlir/lib/Bindings/Python stubgen_runner.py, utils/bazel/llvm-project-overlay/mlir BUILD.bazel build_defs.bzl

[MLIR] [Bazel] Removed the stubgen plumbing added in #179211 (#185292)

The hope was that we would be able to reuse parts of this for
Google-internal builds and for open-source jaxlib builds, but we ended
up with custom plumbing in both cases, so I am now removing this
effectively dead code.
DeltaFile
+0-54mlir/lib/Bindings/Python/stubgen_runner.py
+0-24utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
+0-21utils/bazel/llvm-project-overlay/mlir/build_defs.bzl
+0-2utils/bazel/third_party_build/nanobind.BUILD
+0-1014 files

LLVM/project 74b009c llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 bswap-i64-by-i32-chunks.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
DeltaFile
+19-11llvm/test/Transforms/SLPVectorizer/X86/bswap-i64-by-i32-chunks.ll
+17-0llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+36-112 files

LLVM/project c7f6d95 clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/Dialect/IR CIRDialect.cpp

[CIR] Split CIR_UnaryOp into individual operations

Split the monolithic cir.unary operation (which dispatched on a
UnaryOpKind enum) into four separate operations: cir.inc, cir.dec,
cir.minus, and cir.not.

This follows the same pattern used when cir.binop was split into
individual binary operations (AddOp, SubOp, etc.).

Changes:
- Add CIR_UnaryOpInterface with getInput()/getResult() methods
- Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes
- Define IncOp, DecOp, MinusOp, NotOp with per-op folds
- Add Involution trait to NotOp for not(not(x)) -> x folding
- Replace createUnaryOp() with createInc/Dec/Minus/Not builders
- Split LLVM lowering into four separate patterns
- Split LoweringPrepare complex-type handling per unary op
- Update CIRCanonicalize and CIRSimplify for new op types
- Update all codegen files to use bool params instead of UnaryOpKind

    [6 lines not shown]
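
The not(not(x)) -> x fold mentioned in the list can be sketched on a toy
representation. Here a value is modeled as a chain of unary ops applied
innermost-first; `ToyUnary` and `foldInvolution` are hypothetical and do not
reflect how the CIR ops or the Involution trait are implemented.

```cpp
#include <vector>

enum class ToyUnary { Inc, Dec, Minus, Not };

// Fold not(not(x)) -> x on a chain of unary ops applied innermost-first.
inline std::vector<ToyUnary>
foldInvolution(const std::vector<ToyUnary> &chain) {
  std::vector<ToyUnary> out;
  for (ToyUnary op : chain) {
    if (op == ToyUnary::Not && !out.empty() && out.back() == ToyUnary::Not)
      out.pop_back(); // two adjacent Not ops cancel
    else
      out.push_back(op);
  }
  return out;
}
```

Having one op per kind lets a rule like this attach to the Not op alone
instead of branching on an enum inside a shared fold.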
DeltaFile
+91-105clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+56-88clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+111-28clang/include/clang/CIR/Dialect/IR/CIROps.td
+62-62clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-CtorDtor.cpp
+41-41clang/test/CIR/CodeGenOpenACC/private-clause-pointer-array-recipes-NoOps.cpp
+36-36clang/test/CIR/CodeGenOpenACC/loop-reduction-clause-outline-ops.cpp
+397-36078 files not shown
+1,393-1,36784 files

LLVM/project e25e010 llvm/test/Transforms/SLPVectorizer/X86 bswap-i64-by-i32-chunks.ll

[SLP][NFC] Add a test for bswap of i64 by 2 i32 bswaps
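
The identity the test name points at can be written out in scalar code: a
64-bit byte swap equals byte-swapping each 32-bit half and exchanging the
halves. `bswap32` and `bswap64ByHalves` are hypothetical helpers written for
this note; the IR pattern in the actual test is not reproduced here.

```cpp
#include <cstdint>

// Reverse the four bytes of a 32-bit value.
constexpr std::uint32_t bswap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Byte-swap a 64-bit value by swapping each 32-bit half and exchanging them.
inline std::uint64_t bswap64ByHalves(std::uint64_t v) {
  const std::uint32_t lo = static_cast<std::uint32_t>(v);
  const std::uint32_t hi = static_cast<std::uint32_t>(v >> 32);
  return (static_cast<std::uint64_t>(bswap32(lo)) << 32) | bswap32(hi);
}
```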
DeltaFile
+67-0llvm/test/Transforms/SLPVectorizer/X86/bswap-i64-by-i32-chunks.ll
+67-01 files

LLVM/project 2207296 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/RISCV low-trip-count.ll

[VPlan] Fold constant trunc after EVL simplification.

This fixes a crash for the new test after
6aa115bba55054b0dc81ebfc049e8c7a29e614b2.
DeltaFile
+67-0llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll
+9-0llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+76-02 files

LLVM/project f7ab364 llvm/lib/Target/ARM ARMLoadStoreOptimizer.cpp ARMPassRegistry.def, llvm/test/CodeGen/ARM prera-ldst-aliasing.mir

[ARM] Add basic NPM support for LoadStoreOptimizer (#184139)

This is similar to #184090 for ARM, porting the LoadStoreOptimizer to
the new pass manager. This time there are both a pre-RA and a post-RA
variant, and both are ported.
DeltaFile
+103-48llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
+31-0llvm/lib/Target/ARM/ARMPassRegistry.def
+17-3llvm/lib/Target/ARM/ARM.h
+12-6llvm/lib/Target/ARM/ARMTargetMachine.cpp
+2-0llvm/lib/Target/ARM/ARMTargetMachine.h
+1-0llvm/test/CodeGen/ARM/prera-ldst-aliasing.mir
+166-571 files not shown
+167-577 files

LLVM/project 647b1e3 mlir/include/mlir/Dialect/LLVMIR NVVMOps.td, mlir/lib/Dialect/LLVMIR/IR NVVMDialect.cpp

[MLIR][NVVM] Add LLVMIR lowering for nvvm.subf (#184968)

This change adds direct LLVMIR lowering to the `nvvm.subf` operation
added in #179162 to prevent translation failures when canonicalization
is not run. Also adds `mlir-translate` tests for `nvvm.subf`.

PTX ISA Reference:
1. https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-sub
2. https://docs.nvidia.com/cuda/parallel-thread-execution/#half-precision-floating-point-instructions-sub
DeltaFile
+313-0mlir/test/Target/LLVMIR/nvvm/subf/subf_vector.mlir
+117-0mlir/test/Target/LLVMIR/nvvm/subf/subf.mlir
+67-0mlir/test/Target/LLVMIR/nvvm/subf/subf_invalid.mlir
+20-15mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+13-19mlir/lib/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.cpp
+14-2mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+544-361 files not shown
+551-437 files