LLVM/project ffe446e llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv vp-combine-store-reverse.ll vp-combine-reverse-load.ll

[RISCV] Relax reversed mask's mask requirement in reverse to strided load/store combine (#180706)

We have combines for vp.reverse(vp.load) -> vp.strided.load stride=-1
and vp.store(vp.reverse) -> vp.strided.store stride=-1.

If the load or store is masked, the mask must also be a vp.reverse
with the same EVL. However, we also require that the mask's
vp.reverse be unmasked (have an all-ones mask).

vp.reverse's mask only sets masked-off lanes to poison, and doesn't
affect the permutation of elements. So given those lanes are poison, I
believe the combine is valid for any mask, not just all ones.
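
To make the load-side equivalence concrete, here is a toy C++ model of the combine. It is only a sketch under stated assumptions: poison lanes are modeled as `std::nullopt`, the stride is counted in elements rather than bytes, and the helper names (`maskedLoad`, `reverseFirst`, `stridedLoadNeg1`, `combineMatches`) are hypothetical, not LLVM APIs. It covers the base identity with a fully defined mask and does not model poison lanes in the mask itself.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A lane holds a defined value or poison (modeled as std::nullopt).
using Vec = std::vector<std::optional<int>>;
using Mask = std::vector<bool>;

// Model of vp.load: lane i reads mem[base + i] when i < evl and mask[i] is
// set; every other lane stays poison.
Vec maskedLoad(const std::vector<int> &mem, std::size_t base, const Mask &mask,
               std::size_t evl) {
  assert(evl <= mask.size() && base + evl <= mem.size());
  Vec out(mask.size());
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i])
      out[i] = mem[base + i];
  return out;
}

// Model of an unmasked vp.reverse: reverses the first evl lanes. Lanes at or
// past evl keep the default value (poison for Vec, false for Mask).
template <typename T>
std::vector<T> reverseFirst(const std::vector<T> &v, std::size_t evl) {
  assert(evl <= v.size());
  std::vector<T> out(v.size());
  for (std::size_t i = 0; i < evl; ++i)
    out[i] = v[evl - 1 - i];
  return out;
}

// Model of vp.strided.load with a stride of -1 element: lane i reads
// mem[start - i] when i < evl and mask[i] is set.
Vec stridedLoadNeg1(const std::vector<int> &mem, std::size_t start,
                    const Mask &mask, std::size_t evl) {
  assert(evl <= mask.size() && start < mem.size() && start + 1 >= evl);
  Vec out(mask.size());
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i])
      out[i] = mem[start - i];
  return out;
}

// Checks reverse(load(base, reverse(mask))) against
// strided_load(base + evl - 1, stride -1, mask).
bool combineMatches(const std::vector<int> &mem, std::size_t base,
                    const Mask &mask, std::size_t evl) {
  assert(evl > 0);
  Vec viaReverse =
      reverseFirst(maskedLoad(mem, base, reverseFirst(mask, evl), evl), evl);
  Vec viaStrided = stridedLoadNeg1(mem, base + evl - 1, mask, evl);
  return viaReverse == viaStrided;
}
```

In this model the lanes that the mask disables are poison on both sides, which is why the two forms compare equal lane by lane.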

This is split off from another patch I plan on posting to generalize
those combines to vector.splice+vector.reverse patterns, as part of
#172961
Delta File
+10-9 llvm/test/CodeGen/RISCV/rvv/vp-combine-store-reverse.ll
+16-0 llvm/test/CodeGen/RISCV/rvv/vp-combine-reverse-load.ll
+2-4 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+28-13 3 files

LLVM/project 9b043cc mlir/test/CAPI rewrite.c

[MLIR] Fix mismatched format specifier warning (#180792)

Delta File
+3-2 mlir/test/CAPI/rewrite.c
+3-2 1 files

LLVM/project 1e42c76 llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/lib/Target/Mips MipsISelLowering.cpp

[Mips] Fix cttz.i32 fails to lower on mips16 (#179633)

MIPS16 cannot handle constant pools created by the CTTZ table-lookup
expansion. This causes "Cannot select" errors when trying to select
MipsISD::Lo nodes for constant pool addresses.

Modify the table-lookup conditions to check the ConstantPool operation
status, and only set ConstantPool to Custom in non-MIPS16 mode in the MIPS
backend.

This ensures MIPS16 uses the ISD::CTPOP-based expansion instead of
attempting unsupported constant pool operations.
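
For reference, count-trailing-zeros can be computed from a population count with no lookup table, using the identity cttz(x) = ctpop(~x & (x - 1)). The following is a minimal sketch of that identity in plain C++; the function names are hypothetical and this is not the SelectionDAG code itself.

```cpp
#include <bitset>
#include <cstdint>

// Population count of a 32-bit value (std::bitset avoids requiring C++20).
unsigned popcount32(std::uint32_t x) {
  return static_cast<unsigned>(std::bitset<32>(x).count());
}

// ~x & (x - 1) sets exactly the bits below the lowest set bit of x, so its
// population count equals the number of trailing zeros. For x == 0 every bit
// is set and the result is 32.
unsigned cttzViaCtpop(std::uint32_t x) {
  return popcount32(static_cast<std::uint32_t>(~x & (x - 1u)));
}
```

Because the computation is pure arithmetic, it needs no constant pool entry.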

Fix #61055.
Delta File
+61-0 llvm/test/CodeGen/Mips/GlobalISel/llvm-ir/cttz-mips16.ll
+5-1 llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+4-2 llvm/lib/Target/Mips/MipsISelLowering.cpp
+70-3 3 files

LLVM/project ee92a9e lldb/source/Plugins/Process/MacOSX-Kernel ProcessKDPProperties.td

[LLDB] Fix KDP plugin path (#180897)

This should fix a failure on the macOS buildbots (see
https://github.com/llvm/llvm-project/pull/179524#issuecomment-3882784085).
I can't test this, but the only plugins not enabled on Linux and Windows
are `ProcessKDP`, `PlatformDarwin`, and `PlatformDarwinKernel`. Looking
at the path for KDP, it uses `GetPluginNameStatic` as the last name in
the path. This is `kdp-remote` instead of `kdp`.
Delta File
+1-1 lldb/source/Plugins/Process/MacOSX-Kernel/ProcessKDPProperties.td
+1-1 1 files

LLVM/project 7353ca7 llvm/include/llvm/TableGen TableGenBackend.h, llvm/lib/TableGen Main.cpp TableGenBackend.cpp

[NFC][TableGen] Use std::move to avoid copy (#180785)

Delta File
+1-1 llvm/include/llvm/TableGen/TableGenBackend.h
+1-1 llvm/lib/TableGen/Main.cpp
+1-1 llvm/lib/TableGen/TableGenBackend.cpp
+3-3 3 files

LLVM/project 17a9170 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-sqrt.ll

InstCombine: Fix wrong insert point for sqrt -> copysign simplify (#180838)

Delta File
+12-0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-sqrt.ll
+2-0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+14-0 2 files

LLVM/project bd4fe78 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-fmul.ll

InstCombine: Fix wrong insert point for various fmul->copysign simplifies (#180840)

Delta File
+76-0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fmul.ll
+18-0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+94-0 2 files

LLVM/project 8503cb6 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-fdiv.ll

InstCombine: Fix wrong insert point for fdiv->copysign simplify (#180839)

Delta File
+24-0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-fdiv.ll
+6-0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+30-0 2 files

LLVM/project 1f05778 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-rounding-intrinsics.ll

InstCombine: Fix insert point for rounding intrinsic -> copysign (#180837)

This would use the wrong insert point if reached in a recursive
call.
Delta File
+12-0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-rounding-intrinsics.ll
+3-0 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+15-0 2 files

LLVM/project ad1e107 llvm/lib/Target/PowerPC PPCInstrInfo.td

[PowerPC] Require PPC32 for 32-bit addc/adde/subc/sube (#179186)

Unlike the base add/sub opcodes, which simply overflow, these
produce incorrect results because the carry operates on the full
64 bits. Trying to use these with i32 operands on PPC64 should result in
a selection failure instead of a silent miscompile, like the one seen in
https://github.com/llvm/llvm-project/pull/178979.
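
A small C++ model may help show why the carry differs. This is a sketch that assumes the i32 operands sit zero-extended in 64-bit registers (an assumption made purely for illustration); the helper names are hypothetical.

```cpp
#include <cstdint>

// Carry out of bit 31: what an i32 add-with-carry chain expects.
bool carryOut32(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint32_t>(a + b) < a;
}

// Carry out of bit 63: what a carrying add over full 64-bit registers
// records when the same values are held in 64-bit registers.
bool carryOut64(std::uint64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>(a + b) < a;
}
```

With a = 0xFFFFFFFF and b = 1, `carryOut32` is true but `carryOut64` is false, so a dependent add-with-carry would consume the wrong carry bit.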
Delta File
+17-10 llvm/lib/Target/PowerPC/PPCInstrInfo.td
+17-10 1 files

LLVM/project 680124c llvm/lib/Option Option.cpp

[Option] Fix param name mismatch & coding style (NFC) (#180746)

Align parameter names between the declaration and definition in
Option.cpp. Also update variable names to adhere to the LLVM Coding
Standards regarding casing.

See the declaration:

https://github.com/llvm/llvm-project/blob/6c0ff8d12fe5b7d1d55098ca31dac56d8925bf7b/llvm/include/llvm/Option/Option.h#L242

However, the definition is:

https://github.com/llvm/llvm-project/blob/6c0ff8d12fe5b7d1d55098ca31dac56d8925bf7b/llvm/lib/Option/Option.cpp#L112-L114
Delta File
+14-14 llvm/lib/Option/Option.cpp
+14-14 1 files

LLVM/project b2444d0 llvm/test/CodeGen/AArch64 branch-cond-split-fcmp.ll, llvm/test/CodeGen/Thumb2 arm_canberra_distance_f32.ll

[AArch64][ARM] Add some tests for fcmp or branches. NFC
Delta File
+425-0 llvm/test/CodeGen/AArch64/branch-cond-split-fcmp.ll
+88-0 llvm/test/CodeGen/Thumb2/arm_canberra_distance_f32.ll
+513-0 2 files

LLVM/project 8b9fd48 offload CMakeLists.txt, offload/plugins-nextgen/host CMakeLists.txt

[OFFLOAD] Support host plugin on Windows (#180401)

Changes to make host plugin compile on Windows:
* Change IO code to be portable
* Adjust Makefiles

Allow plugin to work partially when libffi support is not found
dynamically (compilation works fine even on Windows because of the
wrapper support).
Delta File
+48-32 offload/plugins-nextgen/host/src/rtl.cpp
+0-6 offload/plugins-nextgen/host/CMakeLists.txt
+0-5 offload/CMakeLists.txt
+48-43 3 files

LLVM/project 6c51938 mlir/test/lib/Dialect/Test TestOpsSyntax.td, mlir/test/mlir-tblgen op-format.mlir

[MLIR] Guard optional operand resolution in generated op parsers (#180796)

Skip resolveOperands for optional operands when they are absent to
avoid out-of-bounds access on the empty types vector.
Delta File
+11-0 mlir/test/lib/Dialect/Test/TestOpsSyntax.td
+6-0 mlir/test/mlir-tblgen/op-format.mlir
+6-0 mlir/tools/mlir-tblgen/OpFormatGen.cpp
+23-0 3 files

LLVM/project 128437f clang/test/CodeGenOpenCL builtins-amdgcn-asyncmark.cl, llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU] Introduce asyncmark/wait intrinsics (#180467)

Asynchronous operations are memory transfers (usually between the global
memory and LDS) that are completed independently at an unspecified
scope. A thread that requests one or more asynchronous transfers can use
async marks to track their completion. The thread waits for each mark to
be completed, which indicates that requests initiated in program order
before this mark have also completed.

For now, we implement asyncmark/wait operations on pre-GFX12
architectures that support "LDS DMA" operations. Future work will extend
support to GFX12Plus architectures that support "true" async operations.

This is part of a stack split out from #173259
- #180467
- #180466

Co-authored-by: Ryan Mitchell ryan.mitchell at amd.com

Fixes: SWDEV-521121
Delta File
+268-12 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+279-0 llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll
+194-75 llvm/test/CodeGen/AMDGPU/asyncmark-pregfx12.ll
+38-16 llvm/test/CodeGen/AMDGPU/async-buffer-loads.ll
+19-0 llvm/test/CodeGen/AMDGPU/asyncmark-err.ll
+16-0 clang/test/CodeGenOpenCL/builtins-amdgcn-asyncmark.cl
+814-103 6 files not shown
+870-105 12 files

LLVM/project e0285e4 libcxx/test/std/localization/locale.categories/category.time/locale.time.get.byname get_weekday_wide.pass.cpp get_one_wide.pass.cpp, libcxx/test/std/localization/locale.categories/category.time/locale.time.get/locale.time.get.members get_time_wide.pass.cpp

[libc++][test] Include `<ios>` and `<ctime>` in tests for `time` locale facets (#179986)

Add inclusion of `<ios>` and `<ctime>` to ensure that the definitions of `std::basic_ios` and `std::tm` are available.

As a drive-by fix, change uses of `tm` to `std::tm`. The latter is guaranteed to be available in `<ctime>`, but the former isn't.
Delta File
+5-2 libcxx/test/std/localization/locale.categories/category.time/locale.time.put/locale.time.put.members/put2.pass.cpp
+5-2 libcxx/test/std/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp
+4-2 libcxx/test/std/localization/locale.categories/category.time/locale.time.put/locale.time.put.members/put1.pass.cpp
+4-1 libcxx/test/std/localization/locale.categories/category.time/locale.time.get.byname/get_weekday_wide.pass.cpp
+4-1 libcxx/test/std/localization/locale.categories/category.time/locale.time.get/locale.time.get.members/get_time_wide.pass.cpp
+4-1 libcxx/test/std/localization/locale.categories/category.time/locale.time.get.byname/get_one_wide.pass.cpp
+26-9 12 files not shown
+50-13 18 files

LLVM/project aa1c310 clang/include/clang/CIR/Dialect/IR CIRAttrs.td, clang/lib/CIR/CodeGen CIRGenDeclCXX.cpp CIRGenDecl.cpp

[CIR] Add CIRGen support for static local variables with non-constant initializers

This adds CIRGen infrastructure for C++ function-local static variables
that require guarded initialization (Itanium C++ ABI).

Changes:
- Add ASTVarDeclAttr to carry VarDecl AST through the pipeline
- Add emitGuardedInit() to CIRGenCXXABI for guarded initialization
- Add emitCXXGuardedInit() to CIRGenFunction
- Replace NYI in addInitializerToStaticVarDecl() with ctor region emission
- Set static_local attribute on GlobalOp and GetGlobalOp

The global's ctor region contains the initialization code, which will be
lowered by LoweringPrepare to emit the actual guard variable pattern with
__cxa_guard_acquire/__cxa_guard_release calls.
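
As a source-level illustration, this is the kind of function-local static that needs guarded initialization. The names are hypothetical, and the guard pattern in the comment is a simplified outline of the Itanium C++ ABI scheme, not the exact code that CIRGen or LoweringPrepare emits.

```cpp
// Counts how often the initializer runs (for demonstration only).
static int initCalls = 0;

// A non-constant initializer: it must run at most once, on first use.
int expensiveInit() {
  ++initCalls;
  return 42;
}

int getValue() {
  // Conceptually, the compiler lowers the declaration below to roughly:
  //   if (guard byte not yet set) {
  //     if (__cxa_guard_acquire(&guard)) {
  //       value = expensiveInit();
  //       __cxa_guard_release(&guard);
  //     }
  //   }
  static int value = expensiveInit();
  return value;
}

// Exposes the counter so callers can observe the once-only behavior.
int initCallCount() { return initCalls; }
```

Calling `getValue()` repeatedly returns 42 each time while `initCallCount()` stays at 1, which is the behavior the guard variable exists to enforce.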
Delta File
+41-0 clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+30-0 clang/test/CIR/CodeGen/static-local.cpp
+18-0 clang/lib/CIR/CodeGen/CIRGenDeclCXX.cpp
+13-3 clang/lib/CIR/CodeGen/CIRGenDecl.cpp
+5-0 clang/lib/CIR/CodeGen/CIRGenFunction.h
+2-1 clang/lib/CIR/CodeGen/CIRGenCXX.cpp
+109-4 5 files not shown
+116-8 11 files

LLVM/project c8e7c3a llvm/utils/gn/secondary/llvm/lib/Target/X86 BUILD.gn

[gn build] Port 70b96befd832
Delta File
+1-1 llvm/utils/gn/secondary/llvm/lib/Target/X86/BUILD.gn
+1-1 1 files

LLVM/project 70b96be llvm/lib/Target/X86 X86InsertX87Wait.cpp X86InsertWait.cpp, llvm/test/CodeGen/X86 llc-pipeline-npm.ll

[NewPM] Port x86-insert-x87-wait (#180128)

Similar to other ports created by @aidenboom154. No specific test
coverage as there are no MIR->MIR tests that exercise this pass. Following
the existing naming conventions, I renamed WaitInsert to
X86InsertX87WaitLegacy.
Delta File
+142-0 llvm/lib/Target/X86/X86InsertX87Wait.cpp
+0-130 llvm/lib/Target/X86/X86InsertWait.cpp
+6-1 llvm/lib/Target/X86/X86.h
+4-0 llvm/test/CodeGen/X86/llc-pipeline-npm.ll
+1-1 llvm/lib/Target/X86/X86PassRegistry.def
+1-1 llvm/lib/Target/X86/X86TargetMachine.cpp
+154-133 2 files not shown
+156-135 8 files

LLVM/project 8a00fd0 mlir/lib/Dialect/Affine/Analysis Utils.cpp, mlir/test/Dialect/Affine loop-fusion-4.mlir

[MLIR][Affine] Remove restriction in slice validity check on symbols (#180709)

Remove a restriction in the affine analysis utility that checks slice
validity. The check was still bailing out unnecessarily after the
underlying methods had been extended. This update enables fusion of
affine nests with symbolic bounds.

Fixes: https://github.com/llvm/llvm-project/issues/61784

Based on and revived from https://reviews.llvm.org/D148559 from
@anoopjs.
Delta File
+59-0 mlir/test/Dialect/Affine/loop-fusion-4.mlir
+2-6 mlir/lib/Dialect/Affine/Analysis/Utils.cpp
+61-6 2 files

LLVM/project 4b33d45 flang/include/flang/Semantics symbol.h, flang/lib/Semantics check-omp-structure.cpp

[Flang][OpenMP] Fix visibility of user-defined reductions for derived types and module imports (#180552)

User-defined reductions declared in a module were not visible to
programs that imported the module via USE statements, causing valid code
to be incorrectly rejected. The reduction identifier defined in the
module scope wasn't being found during semantic analysis of the main
program.

Ref: OpenMP Spec 5.1:
_"If a directive appears in the specification part of a module then the
behavior is as if that directive, with the variables, types and
procedures that have PRIVATE accessibility omitted, appears in the
specification part of any compilation unit that references the module
unless otherwise specified"_

Fixes: https://github.com/llvm/llvm-project/issues/176279

Co-authored-by: Chandra Ghale <ghale at pe31.hpc.amslabs.hpecorp.net>
Delta File
+30-0 flang/test/Semantics/OpenMP/declare-reduction-derived-module.f90
+16-0 flang/include/flang/Semantics/symbol.h
+14-0 flang/lib/Semantics/check-omp-structure.cpp
+60-0 3 files

LLVM/project 7995fc0 clang/test/CodeGenOpenCL builtins-amdgcn-asyncmark.cl, llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

[AMDGPU] Introduce asyncmark/wait intrinsics

Asynchronous operations are memory transfers (usually between the global memory
and LDS) that are completed independently at an unspecified scope. A thread that
requests one or more asynchronous transfers can use async marks to track their
completion. The thread waits for each mark to be completed, which indicates that
requests initiated in program order before this mark have also completed.

For now, we implement asyncmark/wait operations on pre-GFX12 architectures that
support "LDS DMA" operations. Future work will extend support to GFX12Plus
architectures that support "true" async operations.

Co-authored-by: Ryan Mitchell ryan.mitchell at amd.com

Fixes: SWDEV-521121
Delta File
+268-12 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+279-0 llvm/test/CodeGen/AMDGPU/asyncmark-max-pregfx12.ll
+194-75 llvm/test/CodeGen/AMDGPU/asyncmark-pregfx12.ll
+38-16 llvm/test/CodeGen/AMDGPU/async-buffer-loads.ll
+19-0 llvm/test/CodeGen/AMDGPU/asyncmark-err.ll
+16-0 clang/test/CodeGenOpenCL/builtins-amdgcn-asyncmark.cl
+814-103 7 files not shown
+874-109 13 files

LLVM/project f63477f llvm/lib/Target/AMDGPU SIInstrInfo.h

don't move usesLGKM_CNT()
Delta File
+4-4 llvm/lib/Target/AMDGPU/SIInstrInfo.h
+4-4 1 files

LLVM/project 67acd02 clang/docs ReleaseNotes.rst

Release Note
Delta File
+2-0 clang/docs/ReleaseNotes.rst
+2-0 1 files

LLVM/project 186a2bf clang/include/clang/Basic CodeGenOptions.def, clang/include/clang/Options Options.td

[clang] Ensure -mno-outline adds attributes

Before this change, `-mno-outline` and `-moutline` only controlled the
pass pipelines for the invoked compiler/linker.

The drawback of this implementation is that, when using LTO, only the
flag provided to the linker invocation is honoured (and any files which
individually use `-mno-outline` will have that flag ignored).

This change serialises the `-mno-outline` flag into each function's
IR/Bitcode, so that we can correctly disable outlining from functions in
files which disabled outlining, without affecting outlining choices for
functions from other files. This matches how other optimisation flags
are handled so the IR/Bitcode can be correctly merged during LTO.
Delta File
+7-10 clang/include/clang/Options/Options.td
+14-3 clang/test/CodeGen/attr-no-outline.c
+6-5 clang/lib/Driver/ToolChains/CommonArgs.cpp
+3-1 clang/lib/CodeGen/CodeGenModule.cpp
+3-0 clang/include/clang/Basic/CodeGenOptions.def
+1-1 clang/test/Driver/aarch64-outliner.c
+34-20 3 files not shown
+37-23 9 files

LLVM/project 2aa680e clang/docs ReleaseNotes.rst, clang/test/Sema attr-nooutline.cpp attr-nooutline.c

Address Review Feedback: Flags, Release Notes
Delta File
+3-3 clang/docs/ReleaseNotes.rst
+1-1 clang/test/Sema/attr-nooutline.cpp
+1-1 clang/test/Sema/attr-nooutline.c
+5-5 3 files

LLVM/project 3876629 clang/include/clang/Basic AttrDocs.td, clang/test/CodeGen attr-no-outline.c attr-nooutline.c

Change spelling to clang::no_outline, more tests
Delta File
+107-0 clang/test/CodeGen/attr-no-outline.c
+40-0 clang/test/CodeGenObjC/attr-no-outline.m
+0-25 clang/test/CodeGen/attr-nooutline.c
+0-18 clang/include/clang/Basic/AttrDocs.td
+0-7 clang/test/Sema/attr-nooutline.c
+0-7 clang/test/Sema/attr-nooutline.cpp
+147-57 5 files not shown
+166-62 11 files

LLVM/project 99d640d clang/docs ReleaseNotes.rst

Release Notes
Delta File
+3-0 clang/docs/ReleaseNotes.rst
+3-0 1 files

LLVM/project c20d4c5 clang/include/clang/Basic AttrDocs.td Attr.td, clang/test/CodeGen attr-nooutline.c

Address reviewer feedback: Tests, Docs, TableGen
Delta File
+30-0 clang/include/clang/Basic/AttrDocs.td
+19-10 clang/test/CodeGen/attr-nooutline.c
+2-2 clang/include/clang/Basic/Attr.td
+51-12 3 files

LLVM/project 5e156d4 clang/include/clang/Basic Attr.td, clang/lib/CodeGen CodeGenModule.cpp

[clang] Add clang::nooutline Attribute

This change:
- Adds a `[[clang::nooutline]]` function attribute for C and C++. There
  is no equivalent GNU syntax for this attribute, so no `__attribute__`
  syntax.
- Uses the presence of `[[clang::nooutline]]` to add the `nooutline`
  attribute to IR function definitions.
- Adds tests for the above.

The `nooutline` attribute disables both the Machine Outliner (enabled at
Oz for some targets), and the IR Outliner (disabled by default).
Delta File
+16-0 clang/test/CodeGen/attr-nooutline.c
+7-0 clang/test/Sema/attr-nooutline.c
+7-0 clang/test/Sema/attr-nooutline.cpp
+7-0 clang/include/clang/Basic/Attr.td
+3-0 clang/lib/CodeGen/CodeGenModule.cpp
+1-0 clang/test/Misc/pragma-attribute-supported-attributes-list.test
+41-0 6 files