LLVM/project 6d96ae6llvm/lib/Target/RISCV RISCVInstrInfoXSf.td

[RISCV] Add tied destination constraint to CustomSiFiveVMACC. (#179567)

As the name suggests, these are multiply-accumulate instructions and
thus they have 3 sources.
DeltaFile
+2-1llvm/lib/Target/RISCV/RISCVInstrInfoXSf.td
+2-11 files

LLVM/project 24c7a10clang/lib/AST Type.cpp, clang/lib/CodeGen/TargetBuiltins WebAssembly.cpp

[Clang][WebAssembly] Fix WASM tables to allow `__funcref` function pointers (#178720)

Allows __funcref pointers to be used as the element type for WASM tables
in Clang (static, global, zero-length arrays of a reference type).
Modifies `QualType::isWebAssemblyFuncrefType` to correctly look at the
addrspace of the pointee, rather than the pointer type.

Related: #140933
DeltaFile
+69-0clang/test/CodeGen/WebAssembly/builtins-table-funcref.c
+0-67clang/test/CodeGen/WebAssembly/builtins-table.c
+67-0clang/test/CodeGen/WebAssembly/builtins-table-externref.c
+18-0clang/test/Sema/wasm-funcref-table.c
+2-2clang/lib/CodeGen/TargetBuiltins/WebAssembly.cpp
+2-1clang/lib/AST/Type.cpp
+158-706 files

LLVM/project 7083354llvm/lib/Target/RISCV RISCVSubtarget.h

[RISCV] Remove deprecated RISCVSubtarget::hasStdExtCOrZca(). NFC (#179616)

DeltaFile
+0-2llvm/lib/Target/RISCV/RISCVSubtarget.h
+0-21 files

LLVM/project 273ee97llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel regbankselect-ssube.mir regbankselect-sadde.mir

[AMDGPU][GlobalISel] Add G_SADDE/SSUBE RegBankLegalize rule (#179603)

DeltaFile
+54-111llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ssube.mir
+53-110llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-sadde.mir
+1-1llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+108-2223 files

LLVM/project 26b1f61mlir/include/mlir/Dialect/MPI/IR Utils.h, mlir/lib/Conversion/MPIToLLVM MPIToLLVM.cpp

[mlir][shard,mpi] Fixing lowering allgather shard->mpi->llvm (#178870)

`shard.allgather` concatenates along a specified gather-axis. However,
`mpi.allgather` always concatenates along the first dimension and there
is no MPI operation that allows gathering along an arbitrary axis.
Hence, if gather-axis!=0, we need to create a temporary buffer where we
gather along the first dimension and then copy from that buffer to the
final output along the specified gather-axis. This is not ideal by far.
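
The gather-then-copy workaround described above can be sketched in plain Python on nested lists. This is only an illustration of the data movement for a 2-D block gathered along axis 1; the function names are hypothetical and `allgather_dim0` merely stands in for an MPI-style allgather, it is not the MLIR lowering itself.

```python
def allgather_dim0(blocks):
    """Stand-in for an MPI-style allgather: concatenate per-rank blocks along dim 0."""
    return [row[:] for block in blocks for row in block]


def allgather_along_axis1(blocks):
    """Gather 2-D blocks along axis 1 via a temporary dim-0 gather plus a copy."""
    nranks = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    tmp = allgather_dim0(blocks)  # temporary buffer, shape (nranks * rows, cols)
    out = [[None] * (nranks * cols) for _ in range(rows)]
    for r in range(nranks):
        for i in range(rows):
            # Row i of rank r lands in columns [r * cols, (r + 1) * cols) of the output.
            out[i][r * cols:(r + 1) * cols] = tmp[r * rows + i]
    return out


rank0 = [[1, 2], [3, 4]]
rank1 = [[5, 6], [7, 8]]
gathered = allgather_along_axis1([rank0, rank1])
```

The extra buffer and the second pass over the data are the cost the commit message refers to.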

Along the way also
- fixing computation of memref size in mpitollvm
- adding a simple canonicalization pattern for comm_size for easier
  debugging
- adding more tests
DeltaFile
+93-7mlir/test/Conversion/ShardToMPI/convert-shard-to-mpi.mlir
+90-6mlir/lib/Conversion/ShardToMPI/ShardToMPI.cpp
+43-15mlir/test/Conversion/MPIToLLVM/mpitollvm.mlir
+54-2mlir/test/Dialect/Shard/partition.mlir
+38-15mlir/lib/Conversion/MPIToLLVM/MPIToLLVM.cpp
+43-0mlir/include/mlir/Dialect/MPI/IR/Utils.h
+361-453 files not shown
+391-679 files

LLVM/project 7ea33e6llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/lib/Target/PowerPC PPCInstrVSX.td

[CodeGen] Remove unused first operand of SUBREG_TO_REG (#179690)

The first input operand of SUBREG_TO_REG was an immediate that most
targets set to 0. In practice it had no effect on codegen. Remove it.
DeltaFile
+137-155llvm/lib/Target/PowerPC/PPCInstrVSX.td
+107-115llvm/lib/Target/AArch64/AArch64InstrInfo.td
+35-35llvm/test/CodeGen/AArch64/bf16_fast_math.ll
+32-32llvm/lib/Target/X86/X86InstrAVX512.td
+30-30llvm/test/CodeGen/X86/tail-dup-pred-succ-size.mir
+22-22llvm/test/CodeGen/AArch64/aarch64-combine-gather-lanes.mir
+363-389148 files not shown
+840-944154 files

LLVM/project 8dc73b6llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 arm64-cvt-simd-fptoi.ll arm64-cvtf-simd-itofp.ll

[AArch64][llvm] Pre-commit tests for enabling streaming with +fprcvt

Add pre-commit tests for enabling streaming with +fprcvt. Because I've
added a `+sve,+neon,+fullfp16,+fprcvt -force-streaming-compatible` line
to the testfiles, this required a small change to prevent an assert.
DeltaFile
+1,552-0llvm/test/CodeGen/AArch64/arm64-cvt-simd-fptoi.ll
+282-0llvm/test/CodeGen/AArch64/arm64-cvtf-simd-itofp.ll
+62-0llvm/test/CodeGen/AArch64/fp16_i16_intrinsic_scalar.ll
+4-0llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+1,900-04 files

LLVM/project 265a994llvm/lib/Target/RISCV RISCVSubtarget.cpp RISCVSubtarget.h, llvm/lib/Target/RISCV/MCTargetDesc RISCVMCTargetDesc.cpp RISCVMCTargetDesc.h

[RISCV] Add C/Zcf/Zcd/Zce implication rules to subtarget construction. (#179615)

This ensures the feature bits and RISCVSubtarget flags match what
RISCVISAInfo would do.

I'm not excited about the code duplication, but I need to set the
RISCVSubtarget flags along with calling ToggleFeature. I'll think about
how to improve this.
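
For readers unfamiliar with these implications, here is a rough sketch of how the compressed-extension rules compose, based on the RISC-V definitions of C and Zce. The function name and the set-of-strings representation are hypothetical, and the exact rules applied in `RISCVISAInfo` and the subtarget may differ in detail.

```python
def apply_compressed_implications(exts, xlen):
    """Return a new extension set with C/Zce implications applied (illustrative only)."""
    out = set(exts)
    if "zce" in out:
        # Zce bundles Zca, Zcb, Zcmp and Zcmt, plus Zcf when F is present on RV32.
        out |= {"zca", "zcb", "zcmp", "zcmt"}
        if xlen == 32 and "f" in out:
            out.add("zcf")
    if "c" in out:
        # C always includes Zca; Zcf and Zcd depend on F (RV32 only) and D.
        out.add("zca")
        if xlen == 32 and "f" in out:
            out.add("zcf")
        if "d" in out:
            out.add("zcd")
    return out


rv64_cfd = apply_compressed_implications({"c", "f", "d"}, xlen=64)
rv32_zce_f = apply_compressed_implications({"zce", "f"}, xlen=32)
```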
DeltaFile
+62-2llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+9-0llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+3-4llvm/lib/Target/RISCV/RISCVSubtarget.h
+5-0llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h
+1-2llvm/lib/Target/RISCV/RISCVFeatures.td
+80-85 files

LLVM/project 96448efllvm/test/tools/llvm-readobj/ELF unwind-sdata8.test unwind.test, llvm/tools/llvm-readobj DwarfCFIEHPrinter.h

[llvm-readelf] --unwind: Support DW_EH_PE_sdata8 encoding (#179152)

... for both eh_frame_ptr_enc and table_enc fields when parsing the
PT_GNU_EH_FRAME program header (which contains .eh_frame_hdr). This is
needed for large binaries where offsets exceed the 32-bit range.
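
As a minimal sketch of why the wider encoding matters, the snippet below decodes a little-endian value according to the low nibble of a `DW_EH_PE_*` encoding byte. The helper name `read_encoded` is hypothetical, only three value formats are covered, and the application bits (such as pcrel or datarel) are ignored; it does not mirror the llvm-readobj code.

```python
import struct

# Value formats (low 4 bits) of the DWARF exception-handling pointer encoding.
DW_EH_PE_udata4 = 0x03
DW_EH_PE_sdata4 = 0x0B
DW_EH_PE_sdata8 = 0x0C

_FORMATS = {
    DW_EH_PE_udata4: "<I",  # unsigned 32-bit
    DW_EH_PE_sdata4: "<i",  # signed 32-bit
    DW_EH_PE_sdata8: "<q",  # signed 64-bit
}


def read_encoded(data, offset, encoding):
    """Decode one little-endian value; return (value, next_offset).

    Only the value format is handled here; relocation-style application
    bits in the high nibble are deliberately ignored in this sketch.
    """
    fmt = _FORMATS[encoding & 0x0F]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)


# An offset of -2**33 does not fit in sdata4 but round-trips through sdata8.
big_offset = -(1 << 33)
buf = struct.pack("<q", big_offset)
decoded, next_off = read_encoded(buf, 0, 0x10 | DW_EH_PE_sdata8)
```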

The sdata8 encoding has been tested on an executable
generated by lld patched with
https://github.com/llvm/llvm-project/pull/179089

```
% cat a.cc
#include <stdio.h>
int main() { try { throw 1; } catch (...) { puts("a"); } }
% cat a.lds
SECTIONS
{
  . = SIZEOF_HEADERS;


    [39 lines not shown]
DeltaFile
+112-0llvm/test/tools/llvm-readobj/ELF/unwind-sdata8.test
+33-0llvm/test/tools/llvm-readobj/ELF/unwind.test
+20-7llvm/tools/llvm-readobj/DwarfCFIEHPrinter.h
+165-73 files

LLVM/project 4fdb10bclang/include/clang/Frontend CompilerInstance.h, clang/lib/Frontend CompilerInstance.cpp

[clang][modules] Allow specifying thread-safe module cache (#179510)

This PR adds a new member to `CompilerInstance::ThreadSafeCloneConfig` to
allow using a different `ModuleCache` instance in the cloned
`CompilerInstance`. This is done so that the original and the clone
can't concurrently work on the same `InMemoryModuleCache`, which is not
thread safe. This will be made use of shortly from the dependency
scanner along with the single-module-parse-mode to compile modules
asynchronously/concurrently.

This also fixes an old comment that incorrectly claimed that
`CompilerInstance`'s constructor is responsible for finalizing
`InMemoryModuleCache` buffers, which is no longer the case.
DeltaFile
+9-5clang/lib/Frontend/CompilerInstance.cpp
+5-1clang/include/clang/Frontend/CompilerInstance.h
+14-62 files

LLVM/project f03da7fllvm/include/llvm/Transforms/InstCombine InstCombiner.h, llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp

AMDGPU: Strip sign bit operations on llvm.amdgcn.trig.preop uses

The instruction ignores the sign bit, so we can find the magnitude source.
The real library use has a fabs input which this avoids.

stripSignOnlyFPOps should probably go directly into PatternMatch in some
form.
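
The idea of looking through sign-only operations can be illustrated with a toy expression tree. This sketch assumes the sign-only operations of interest are `fneg`, `fabs` and `copysign` (magnitude taken from the first operand); the tuple representation and the looping behaviour are illustrative choices, not the LLVM helper.

```python
SIGN_ONLY_OPS = {"fneg", "fabs", "copysign"}


def strip_sign_only_ops(expr):
    """Peel operations that only affect the sign bit and return the magnitude source.

    Expressions are tuples of the form (opcode, operand, ...); anything else is a leaf.
    """
    while isinstance(expr, tuple) and expr[0] in SIGN_ONLY_OPS:
        expr = expr[1]  # the magnitude always comes from the first operand
    return expr


# A consumer that ignores the sign bit can use "x" directly in each case.
through_fabs = strip_sign_only_ops(("fabs", ("fneg", "x")))
through_copysign = strip_sign_only_ops(("copysign", "x", "y"))
```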
DeltaFile
+64-0llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+11-0llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
+0-9llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+5-0llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+80-94 files

LLVM/project b0aea05llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel llvm.amdgcn.struct.tbuffer.load.f16.ll llvm.amdgcn.struct.ptr.tbuffer.load.f16.ll

[AMDGPU][GlobalISel] Add buffer load format D16 RegBankLegalize rules (#179566)

DeltaFile
+5-5llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.tbuffer.load.f16.ll
+4-4llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.ptr.tbuffer.load.f16.ll
+7-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+3-3llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.buffer.load.format.f16.ll
+3-3llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.struct.buffer.load.format.f16.ll
+3-3llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.tbuffer.load.f16.ll
+25-183 files not shown
+31-249 files

LLVM/project 23bf55elldb/test/API/tools/lldb-dap/eventStatistic TestVSCode_eventStatistic.py Makefile, lldb/tools/lldb-dap JSONUtils.cpp JSONUtils.h

[lldb]Send statistics in initialized event (#178978)

Re-attempt landing of an old commit:
https://github.com/llvm/llvm-project/commit/7fe3586cda5b683766ec6b6d5ca2d98c2baaf162

Co-authored-by: George Hu <georgehuyubo at gmail.com>
DeltaFile
+76-0lldb/test/API/tools/lldb-dap/eventStatistic/TestVSCode_eventStatistic.py
+17-0lldb/test/API/tools/lldb-dap/eventStatistic/Makefile
+8-0lldb/test/API/tools/lldb-dap/eventStatistic/main.cpp
+6-0lldb/tools/lldb-dap/JSONUtils.cpp
+6-0lldb/tools/lldb-dap/JSONUtils.h
+2-2lldb/tools/lldb-dap/Handler/RequestHandler.h
+115-23 files not shown
+119-29 files

LLVM/project 4a0a205. .mailmap, clang Maintainers.rst AreaTeamMembers.txt

Update my email across the project (#179361)

At the moment, LLVM OSS stuff should go to rnk at llvm.org.
DeltaFile
+2-2clang/Maintainers.rst
+2-2llvm/Maintainers.md
+1-1.mailmap
+1-1clang/AreaTeamMembers.txt
+6-64 files

LLVM/project 7df2bd6llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-fract.ll

AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.fract (#179134)

DeltaFile
+66-0llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-fract.ll
+17-0llvm/lib/Analysis/ValueTracking.cpp
+83-02 files

LLVM/project e747287llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-trig-preop.ll

AMDGPU: Implement computeKnownFPClass for llvm.amdgcn.trig.preop (#179026)

Surprisingly this doesn't consider the special cases, and literally
just extracts the exponent and proceeds as normal.
DeltaFile
+12-0llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-trig-preop.ll
+4-0llvm/lib/Analysis/ValueTracking.cpp
+16-02 files

LLVM/project 8a83911llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

AMDGPU: Fix incorrect fold of undef for llvm.amdgcn.trig.preop (#179025)

We were folding undef inputs to qnan which is incorrect. The instruction
never returns nan. Out of bounds segment select will return 0, so fold
undef segment to 0.
DeltaFile
+29-28llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+18-18llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+47-462 files

LLVM/project c61d434llvm/test/CodeGen/NVPTX fence.py fence-nocluster.ll

[NVPTX][NFC] Update fence.py and cmpxchg.py to generate ptxas-sm_XY and ptxas-isa-X.Y checks in RUN lines (#179378)

The cmpxchg-sm*.ll and fence*.ll files were manually updated to include
version checks. This change modifies the generator scripts so that they
generate the version checks themselves.

Fixes the issue raised in
https://github.com/llvm/llvm-project/pull/176078#issuecomment-3792304497
that led to
https://github.com/llvm/llvm-project/commit/acff9fa4dba2e39da73227d835dfd12be434645e.
(Thanks @vvereschaka!)

When I regenerated cmpxchg tests, I ended up overwriting the ptxas-sm
checks, because the generator script does not have them. Added comments
in the tests explaining that they should not be modified manually.
DeltaFile
+7-3llvm/test/CodeGen/NVPTX/fence.py
+4-3llvm/test/CodeGen/NVPTX/fence-nocluster.ll
+3-2llvm/test/CodeGen/NVPTX/cmpxchg.py
+2-1llvm/test/CodeGen/NVPTX/cmpxchg-sm70.ll
+2-1llvm/test/CodeGen/NVPTX/cmpxchg-sm90.ll
+2-1llvm/test/CodeGen/NVPTX/cmpxchg-sm60.ll
+20-111 files not shown
+21-127 files

LLVM/project 8b28f52llvm/lib/Target/SystemZ SystemZFrameLowering.cpp, llvm/test/CodeGen/SystemZ zos-prologue-epilog.ll

[SystemZ][z/OS] Reverse the order of instructions to save and restore CSRs (#179540)

Reverse the order of the instructions that save and restore CSRs so
that the instruction for the lower-numbered register goes first.
DeltaFile
+30-30llvm/test/CodeGen/SystemZ/zos-prologue-epilog.ll
+7-2llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
+37-322 files

LLVM/project c08b2c7mlir/lib/Dialect/GPU/Transforms EliminateBarriers.cpp, mlir/test/Dialect/GPU barrier-elimination.mlir

[mlir][gpu] Make barrier elimination address-space aware (#178101)

Upgrade the barrier elimination pass to account for the address spaces
of accessed memory when deciding which barriers to eliminate. In
particular, a loop that only reads and writes global memory that has a
workgroup-memory-fencing barrier inside of it will now have that barrier
marked for elimination, as the global memory traffic is not being
synchronized by the barrier.

The pass is also adjusted to ignore barriers whose memory fencing list
is [], as those do not synchronize memory and therefore the logic in
this pass would potentially incorrectly remove them after proving that
fact.
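
A heavily simplified sketch of the decision described above follows. The function name and the use of plain string sets for address spaces are hypothetical; the real pass reasons about the memory effects surrounding each barrier rather than a single set of accessed spaces.

```python
def barrier_is_redundant(fenced_spaces, accessed_spaces):
    """Return True if the barrier fences none of the address spaces actually accessed."""
    if not fenced_spaces:
        # An empty fence list means the barrier does not synchronize memory,
        # so memory-based reasoning must not be used to remove it.
        return False
    return fenced_spaces.isdisjoint(accessed_spaces)


# Loop touching only global memory, barrier fencing only workgroup memory.
global_only_loop = barrier_is_redundant({"workgroup"}, {"global"})
# Same barrier, but the loop also touches workgroup memory.
mixed_loop = barrier_is_redundant({"workgroup"}, {"workgroup", "global"})
# Barrier with an empty fence list is left alone.
empty_fence = barrier_is_redundant(set(), {"global"})
```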

---------

Co-authored-by: Jakub Kuderski <kubakuderski at gmail.com>
DeltaFile
+173-69mlir/lib/Dialect/GPU/Transforms/EliminateBarriers.cpp
+207-0mlir/test/Dialect/GPU/barrier-elimination.mlir
+380-692 files

LLVM/project 10d8070mlir/include/mlir/Dialect/EmitC/Transforms Passes.td Transforms.h, mlir/lib/Dialect/EmitC/Transforms WrapFuncInClass.cpp

[mlir][emitc] Update the `WrapFuncInClassPass` pass (#179184)

Update the `WrapFuncInClassPass` pass so that, by default, the generated
method is named `operator()()` rather than `execute()`. This makes the
pass more generic, instead of catering to specific users expecting an
`execute()` method.

To preserve the original behaviour, add a new pass option to override
the method name: `func-name`. For example:

```bash
  mlir-opt file.mlir -wrap-emitc-func-in-class=func-name=execute
```

Additionally, make a couple of small editorial changes:
* Rename `populateFuncPatterns` to `populateWrapFuncInClass` to make it
  clear that the corresponding pattern is specific to the
  `WrapFuncInClass` pass.
* Remove `// CHECK: module {` to reduce test noise.

    [2 lines not shown]
DeltaFile
+12-6mlir/lib/Dialect/EmitC/Transforms/WrapFuncInClass.cpp
+9-6mlir/test/Dialect/EmitC/wrap-func-in-class.mlir
+9-2mlir/include/mlir/Dialect/EmitC/Transforms/Passes.td
+5-2mlir/include/mlir/Dialect/EmitC/Transforms/Transforms.h
+35-164 files

LLVM/project 36dadddflang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP task-affinity.f90

[Flang][mlir][OpenMP] Add affinity clause to omp.task and Flang lowering (#179003)

- Add MLIR OpenMP affinity clause
- Lower flang task affinity to mlir
- Emit TODO for iterator modifier and update negative test
DeltaFile
+111-0flang/test/Lower/OpenMP/task-affinity.f90
+30-3mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+30-0mlir/test/Dialect/OpenMP/ops.mlir
+23-0mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+17-0flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+10-0mlir/test/Target/LLVMIR/openmp-todo.mlir
+221-35 files not shown
+233-1011 files

LLVM/project 2828ee6llvm/test/CodeGen/X86 peephole.mir

[X86] Fix incorrect SUBREG_TO_REG usage in a MIR test (#179682)

DeltaFile
+2-2llvm/test/CodeGen/X86/peephole.mir
+2-21 files

LLVM/project fc5b1bbflang/include/flang/Optimizer/Passes Pipelines.h, flang/lib/Optimizer/Passes Pipelines.cpp

[flang] Add getFIRToLLVMPassOptions helper function (#179293)

Extract `FIRToLLVMPassOptions` initialization into a helper function,
allowing other code to construct pass options from pipeline
configuration without duplication.

---------

Co-authored-by: Delaram Talaashrafi <dtalaashrafi at rome5.pgi.net>
DeltaFile
+8-2flang/lib/Optimizer/Passes/Pipelines.cpp
+4-0flang/include/flang/Optimizer/Passes/Pipelines.h
+12-22 files

LLVM/project ca30ef8libc/src/__support/math exp.h expm1.h

[libc][math] Resolve size issues on baremetal and cleanup code.
DeltaFile
+21-20libc/src/__support/math/exp.h
+17-18libc/src/__support/math/expm1.h
+16-18libc/src/__support/math/exp10.h
+24-9libc/src/__support/math/sincosf_utils.h
+14-13libc/src/__support/math/acosf.h
+12-14libc/src/__support/math/exp2.h
+104-92110 files not shown
+336-322116 files

LLVM/project 279600alibcxx/test/benchmarks/format formatter_int.bench.cpp

[libc++] Refactor formatter_int.bench.cpp to not use CartesianProduct (#179483)

The CartesianProduct machinery is incredibly expensive and makes it
trivial to add significant amounts of benchmarks which may not actually
serve much of a purpose. This patch doesn't remove any of the actual
benchmarks, but explicitly lists the benchmarks previously generated via
the CartesianProduct machinery. Still, the benchmarks run ~2x faster.

Fixes #178458
DeltaFile
+78-143libcxx/test/benchmarks/format/formatter_int.bench.cpp
+78-1431 files

LLVM/project 569a9b4llvm/test/Transforms/SLPVectorizer/X86 shl-to-add-transformation3.ll

[SLP][NFC]Add another shl-to-add transformation test, NFC
DeltaFile
+56-0llvm/test/Transforms/SLPVectorizer/X86/shl-to-add-transformation3.ll
+56-01 files

LLVM/project 3c0e326llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 add-i512.ll sub-i512.ll

[X86] Lower i512 ADD/SUB using Kogge-Stone on AVX512 (#174761)

Closes #173996
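
The commit message gives no detail, so the following is only a generic illustration of Kogge-Stone carry propagation applied to a 512-bit addition split into eight 64-bit lanes. All names are hypothetical and the code says nothing about the instruction sequence the X86 lowering actually emits.

```python
LANES, BITS = 8, 64
MASK = (1 << BITS) - 1


def split(x):
    """Split a 512-bit integer into eight 64-bit lanes, least significant first."""
    return [(x >> (BITS * i)) & MASK for i in range(LANES)]


def join(lanes):
    """Inverse of split()."""
    return sum(lane << (BITS * i) for i, lane in enumerate(lanes))


def add_i512_kogge_stone(a, b):
    """Add two 512-bit values modulo 2**512 using a parallel-prefix carry network."""
    al, bl = split(a), split(b)
    sums = [(x + y) & MASK for x, y in zip(al, bl)]
    gen = [(x + y) > MASK for x, y in zip(al, bl)]  # lane produces a carry by itself
    prop = [s == MASK for s in sums]                # lane would pass an incoming carry on
    dist = 1
    while dist < LANES:  # log2(8) = 3 combining steps
        new_gen, new_prop = list(gen), list(prop)
        for i in range(dist, LANES):
            new_gen[i] = gen[i] or (prop[i] and gen[i - dist])
            new_prop[i] = prop[i] and prop[i - dist]
        gen, prop = new_gen, new_prop
        dist *= 2
    # gen[i] is now the carry out of lanes 0..i, so lane i receives gen[i - 1].
    carry_in = [False] + gen[:-1]
    return join([(s + int(c)) & MASK for s, c in zip(sums, carry_in)])


a = (1 << 512) - 1
wrapped = add_i512_kogge_stone(a, 1)
rippled = add_i512_kogge_stone(MASK, 1)
```

The point of the prefix network is that all carries are resolved in a logarithmic number of lane-parallel steps instead of rippling through the lanes one at a time.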
DeltaFile
+228-39llvm/test/CodeGen/X86/add-i512.ll
+228-39llvm/test/CodeGen/X86/sub-i512.ll
+73-2llvm/lib/Target/X86/X86ISelLowering.cpp
+529-803 files

LLVM/project 6932f1fmlir/include/mlir/IR BuiltinTypeInterfaces.td, mlir/lib/AsmParser AttributeParser.cpp

[mlir][WIP] `DenseElementsAttr` generalized
DeltaFile
+145-4mlir/lib/AsmParser/AttributeParser.cpp
+57-15mlir/lib/IR/AsmPrinter.cpp
+58-0mlir/include/mlir/IR/BuiltinTypeInterfaces.td
+28-0mlir/test/lib/Dialect/Test/TestTypes.cpp
+28-0mlir/test/IR/dense-elements-type-interface.mlir
+14-0mlir/test/lib/Dialect/Test/TestTypeDefs.td
+330-193 files not shown
+336-199 files

LLVM/project bc80d1allvm/lib/Target/SystemZ SystemZFrameLowering.cpp, llvm/test/CodeGen/SystemZ zos-prologue-epilog.ll

[SystemZ][z/OS] Set R5 as not restored. (#179666)

R5 (the environment register) should not be restored. This was missing
from the code.
Add it back and also add a test to verify it.
DeltaFile
+25-4llvm/test/CodeGen/SystemZ/zos-prologue-epilog.ll
+3-1llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
+28-52 files