LLVM/project cacf225 llvm/docs/PDB CodeViewSymbols.rst, llvm/include/llvm/DebugInfo/CodeView SymbolRecord.h

[LLVM][CodeView] Add `S_REGREL32_INDIR` (#183172)

This adds `RegRelativeIndirSym` (`S_REGREL32_INDIR`) as a record, so we
can emit and dump it (#34392). It encodes a variable at the location
`*($Register + Offset) + OffsetInUdt` and is used by MSVC in C++20
coroutines and C++17 structured bindings. Clang also needs this for
coroutines (for `__promise`, which has the location `DW_OP_deref,
DW_OP_plus_uconst, 16`).
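As a rough illustration of the location expression, the sketch below evaluates `*($Register + Offset) + OffsetInUdt` against a simulated stack frame. It assumes a flat address space in which the base register's value can be modeled as a pointer into the frame. `resolveRegRelIndir` and `readBindingY` are hypothetical helpers, not LLVM or CodeView APIs; the layout (hidden reference at offset 8, member at `offsetof(Foo, b)`) mirrors this commit's structured-binding example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): evaluates the S_REGREL32_INDIR
// location *(Register + Offset) + OffsetInUdt, where RegisterValue models the
// runtime value of the base register (e.g. rsp) as a pointer into the frame.
const unsigned char *resolveRegRelIndir(const unsigned char *RegisterValue,
                                        std::int32_t Offset,
                                        std::int32_t OffsetInUdt) {
  assert(RegisterValue != nullptr && "base register value must be known");
  // 1. The slot at Register + Offset holds a pointer to the UDT, not the UDT.
  const unsigned char *Slot = RegisterValue + Offset;
  // 2. Load that pointer (memcpy avoids alignment assumptions about Slot).
  const unsigned char *UdtBase = nullptr;
  std::memcpy(&UdtBase, Slot, sizeof(UdtBase));
  // 3. Add the member's offset within the pointed-to UDT.
  return UdtBase + OffsetInUdt;
}

struct Foo { int a, b; };

// Simulates `auto &[x, y] = f;`: the frame stores a pointer to `f` at
// offset 8, and `y` is the member located at offsetof(Foo, b).
int readBindingY() {
  Foo f = {1, 2};
  const unsigned char *HiddenRef = reinterpret_cast<const unsigned char *>(&f);
  alignas(alignof(void *)) unsigned char Frame[8 + sizeof(HiddenRef)] = {};
  std::memcpy(Frame + 8, &HiddenRef, sizeof(HiddenRef));
  const unsigned char *Loc = resolveRegRelIndir(
      Frame, 8, static_cast<std::int32_t>(offsetof(Foo, b)));
  int Y = 0;
  std::memcpy(&Y, Loc, sizeof(Y));
  return Y;
}
```

The extra indirection in step 2 is the only difference from a plain `S_REGREL32` location, which would stop at `Register + Offset`.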

For example:

```cpp
struct Foo { int a, b; };

void fn() {
  Foo f = {1, 2};
  //  ╰─ S_REGREL32{ reg = rsp, offset = 0 }
  auto &[x, y] = f;
  //     │  ╰─ S_REGREL32_INDIR{ reg = rsp, offset = 8, offset-in-udt = 4, type = int }

    [17 lines not shown]
Delta  File
+52 -0  llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
+36 -0  llvm/docs/PDB/CodeViewSymbols.rst
+20 -0  llvm/include/llvm/DebugInfo/CodeView/SymbolRecord.h
+11 -0  llvm/lib/DebugInfo/CodeView/SymbolRecordMapping.cpp
+11 -0  llvm/lib/DebugInfo/CodeView/SymbolDumper.cpp
+11 -0  llvm/tools/llvm-pdbutil/MinimalSymbolDumper.cpp
+141 -0  8 files not shown
+177 -2  14 files

LLVM/project 5e88771 clang/test/CodeGen arm-bf16-getset-intrinsics.c, clang/test/CodeGen/AArch64 bf16-getset-intrinsics.c

[Clang][AArch64] Remove duplicate CodeGen test for bf16 get/set intrinsics (#186084)

The following test files contain identical test bodies (aside from the
RUN lines):

  * clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c
  * clang/test/CodeGen/arm-bf16-getset-intrinsics.c

The differences in the RUN lines do not appear to be relevant for the
tested functionality. This change keeps a single test file and
simplifies its RUN lines to match the generic style used in
clang/test/CodeGen/AArch64/neon.

This also moves toward unifying and reusing RUN lines across tests.
Delta  File
+0 -175  clang/test/CodeGen/arm-bf16-getset-intrinsics.c
+1 -2  clang/test/CodeGen/AArch64/bf16-getset-intrinsics.c
+1 -177  2 files

LLVM/project 93b4720 clang/lib/CodeGen/Targets AMDGPU.cpp

[AMDGPU] Address post-commit review of #177343 (#186064)

- OpenCL has no cluster scope, so "cluster-one-as" does not exist and
cannot be emitted.
Delta  File
+2 -1  clang/lib/CodeGen/Targets/AMDGPU.cpp
+2 -1  1 file

LLVM/project 78d8a94 flang/include/flang/Semantics openmp-utils.h

Fix merge error
Delta  File
+0 -8  flang/include/flang/Semantics/openmp-utils.h
+0 -8  1 file

LLVM/project ad81f7b llvm/docs AMDGPUUsage.rst

Address comments
Delta  File
+50 -54  llvm/docs/AMDGPUUsage.rst
+50 -54  1 file

LLVM/project 2f573ac clang/test/CodeGen scoped-atomic-ops.c, llvm/test/CodeGen/AArch64 clmul-fixed.ll

Merge branch 'main' into users/kparzysz/e06-sequence-class
Delta  File
+853 -1,663  llvm/test/CodeGen/AArch64/clmul-fixed.ll
+927 -1,424  llvm/test/tools/dsymutil/AArch64/stmt-seq-macho.test
+706 -1,470  llvm/test/CodeGen/X86/funnel-shift-i512.ll
+1,769 -0  llvm/test/CodeGen/X86/vector-mul-i8-decompose.ll
+1,189 -529  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll
+1,419 -130  clang/test/CodeGen/scoped-atomic-ops.c
+6,863 -5,216  2,631 files not shown
+112,605 -40,655  2,637 files

LLVM/project 1cd094f openmp/tools/omptest CMakeLists.txt

Revert "[OpenMP][OMPT] Remove Threads dependency from omptest" (#186111)

Reverts llvm/llvm-project#185930

Breaks various buildbots
Delta  File
+4 -0  openmp/tools/omptest/CMakeLists.txt
+4 -0  1 file

LLVM/project e78c797 llvm/lib/Target/AMDGPU GCNVOPDUtils.cpp, llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp

[AMDGPU] Allow bank conflicts on src0 for V_DUAL_MOV_B32 for gfx1170 (#186100)
Delta  File
+147 -339  llvm/test/CodeGen/AMDGPU/wmma-gfx12-w32-imm.ll
+238 -112  llvm/test/CodeGen/AMDGPU/vopd-combine.mir
+8 -0  llvm/test/MC/AMDGPU/gfx1170_asm_features.s
+2 -2  llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+2 -1  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+397 -454  5 files

LLVM/project ef7301c clang/lib/Sema SemaHLSL.cpp HLSLBuiltinTypeDeclBuilder.cpp, clang/test/CodeGenHLSL/resources Texture2D-Subscript.hlsl

[HLSL] Implement Texture2D::operator[]

Implements the Texture2D::operator[] method. It uses the same design as
Buffer::operator[]. However, this requires us to change the
resource_getpointer intrinsic to accept integer vectors for the index.

Assisted-by: Gemini
Delta  File
+74 -0  clang/test/CodeGenHLSL/resources/Texture2D-Subscript.hlsl
+40 -4  clang/lib/Sema/SemaHLSL.cpp
+27 -1  clang/test/SemaHLSL/BuiltIns/resource_getpointer-errors.hlsl
+21 -7  clang/lib/Sema/HLSLBuiltinTypeDeclBuilder.cpp
+14 -14  llvm/test/Transforms/SimplifyCFG/DirectX/no-sink-dxgetpointer.ll
+12 -12  llvm/test/Transforms/GVN/no-sink-dxgetpointer.ll
+188 -38  18 files not shown
+312 -100  24 files

LLVM/project ef8db55 flang/lib/Semantics openmp-utils.cpp

[flang][OpenMP] Implement checks of intervening code (#185295)

Invalid intervening code will cause the containing loop to be the final
loop in the loop nest. Transparent intervening code will not affect
perfect nesting if present. Currently, compiler directives are considered
transparent so that code mixing OpenMP and such directives still compiles.
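The rule can be modeled in a few lines. This is a hypothetical abstraction, not flang's semantic-check code: each loop level records the kinds of code found between it and the next loop, `Invalid` code ends the nest at the containing loop, `Transparent` code (such as compiler directives) is ignored, and any other intervening code (the assumed `Valid` category) keeps the loop in the nest but means the nest is no longer perfect. All names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of code found between two nested loops.
enum class Intervening { Transparent, Valid, Invalid };

struct LoopLevel {
  // Code between this loop's header and the next nested loop.
  std::vector<Intervening> Code;
};

struct NestInfo {
  std::size_t Depth;    // number of loops that belong to the nest
  bool PerfectlyNested; // true if only transparent code sits between them
};

NestInfo analyzeNest(const std::vector<LoopLevel> &Levels) {
  NestInfo Info{Levels.size(), true};
  auto Contains = [](const LoopLevel &L, Intervening K) {
    return std::find(L.Code.begin(), L.Code.end(), K) != L.Code.end();
  };
  // Only code between loop I and loop I + 1 counts as intervening code.
  for (std::size_t I = 0; I + 1 < Levels.size(); ++I) {
    if (Contains(Levels[I], Intervening::Invalid)) {
      Info.Depth = I + 1; // loop I becomes the final loop of the nest
      break;
    }
    if (Contains(Levels[I], Intervening::Valid))
      Info.PerfectlyNested = false; // allowed, but breaks perfect nesting
    // Transparent code (e.g. compiler directives) is simply ignored.
  }
  return Info;
}
```

Once invalid code cuts the nest short, the code below it is part of the final loop's body rather than intervening code, which is why the sketch stops checking at that level.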

Issue: https://github.com/llvm/llvm-project/issues/185287
Delta  File
+153 -2  flang/lib/Semantics/openmp-utils.cpp
+153 -2  1 file

LLVM/project 7e39b52 clang/test/CodeGen scoped-atomic-ops.c, llvm/test/CodeGen/AMDGPU llvm.amdgcn.cvt.pkrtz.ll maximumnum.ll

Merge branch 'main' into users/kasuga-fj/da-fix-signature-of-weak-zero-siv
Delta  File
+927 -1,424  llvm/test/tools/dsymutil/AArch64/stmt-seq-macho.test
+706 -1,470  llvm/test/CodeGen/X86/funnel-shift-i512.ll
+1,769 -0  llvm/test/CodeGen/X86/vector-mul-i8-decompose.ll
+1,189 -529  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll
+1,419 -130  clang/test/CodeGen/scoped-atomic-ops.c
+1,240 -0  llvm/test/CodeGen/AMDGPU/maximumnum.ll
+7,250 -3,553  1,053 files not shown
+43,288 -15,652  1,059 files

LLVM/project fb0cb77 llvm/lib/Transforms/Vectorize VPlanPredicator.cpp VPlanDominatorTree.h, llvm/test/Transforms/LoopVectorize if-pred-stores.ll hoist-predicated-loads-with-predicated-stores.ll

[VPlan] Simplify the computation of the block entry mask. (#173265)

When encountering a control-flow join, VPPredicator emits a disjunction
over the incoming edge masks as the entry mask of the joining block.
However, such a complex mask is not always necessary. If the block is
control-flow equivalent to the header block, we can directly use the
header block’s entry mask as the entry mask of that block.

This patch introduces a VPlan post-dominator tree to determine whether a
block is control-flow equivalent to the header block, and simplifies the
computation of block masks accordingly.
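To make the criterion concrete, here is a small sketch using the textbook definition of control-flow equivalence: a block B is equivalent to the header H when H dominates B and B post-dominates H, so B executes exactly when H does. It computes (post)dominator sets by naive fixed-point iteration on a toy CFG with a single exit. `entryMask`, `dominatorSets`, and the string masks are hypothetical stand-ins, not VPlan's VPPredicator or dominator-tree APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy CFG: Succs[V] lists V's successors. Block 0 is the header; the last
// block is the single exit. Hypothetical model, not VPlan data structures.
using Graph = std::vector<std::vector<int>>;

Graph reverseEdges(const Graph &G) {
  Graph R(G.size());
  for (std::size_t U = 0; U < G.size(); ++U)
    for (int V : G[U])
      R[V].push_back(static_cast<int>(U));
  return R;
}

// Naive fixed point: Dom(V) = {V} plus the intersection of Dom(P) over all
// predecessors P. Passing successor lists instead yields post-dominators.
std::vector<std::set<int>> dominatorSets(const Graph &Preds, int Root) {
  const int N = static_cast<int>(Preds.size());
  std::set<int> All;
  for (int V = 0; V < N; ++V)
    All.insert(V);
  std::vector<std::set<int>> Dom(N, All);
  Dom[Root] = {Root};
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int V = 0; V < N; ++V) {
      if (V == Root)
        continue;
      std::set<int> New = All;
      for (int P : Preds[V]) {
        std::set<int> Meet;
        for (int X : New)
          if (Dom[P].count(X))
            Meet.insert(X);
        New = Meet;
      }
      New.insert(V);
      if (New != Dom[V]) {
        Dom[V] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// Entry mask of block B: reuse the header's mask when B is control-flow
// equivalent to the header, otherwise OR the incoming edge masks.
std::string entryMask(const Graph &Succs, int B) {
  const int Header = 0, Exit = static_cast<int>(Succs.size()) - 1;
  Graph Preds = reverseEdges(Succs);
  auto Dom = dominatorSets(Preds, Header);
  auto PostDom = dominatorSets(Succs, Exit);
  if (B == Header || (Dom[B].count(Header) && PostDom[Header].count(B)))
    return "mask(header)";
  std::string Mask;
  for (int P : Preds[B]) {
    if (!Mask.empty())
      Mask += " | ";
    Mask += "edge(" + std::to_string(P) + "->" + std::to_string(B) + ")";
  }
  return Mask;
}
```

In a diamond `0 -> {1, 2} -> 3`, block 3 post-dominates the header and gets `mask(header)`; if block 1 can also branch straight to the exit, block 3 falls back to the disjunction of its incoming edges.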

Based on #178724
Delta  File
+18 -23  llvm/test/Transforms/LoopVectorize/VPlan/predicator.ll
+10 -10  llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
+17 -2  llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+6 -10  llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-complex-mask.ll
+6 -6  llvm/test/Transforms/LoopVectorize/hoist-predicated-loads-with-predicated-stores.ll
+9 -0  llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+66 -51  1 file not shown
+69 -55  7 files

LLVM/project 2c4f4ef llvm/test/CodeGen/AMDGPU coalesce-copy-to-agpr-to-av-registers.mir

[AMDGPU] Fix missing "---" in MIR test. NFCI. (#186097)

The only problem this caused was confusing the update script so that it
failed to update checks in the following function.
Delta  File
+1 -0  llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir
+1 -0  1 file

LLVM/project 9a81d71 libc/cmake/modules prepare_libc_gpu_build.cmake

[libc] Use the proper name for the 'llvm-gpu-loader' (#186101)

Summary:
This used to be two separate executables but was merged a while back. The
LLVM libc code was never updated to use the new tool name, and a recent
refactoring unintentionally removed the symlinks. Just look for
`llvm-gpu-loader`.
Delta  File
+3 -13  libc/cmake/modules/prepare_libc_gpu_build.cmake
+3 -13  1 file

LLVM/project 3a8c16f llvm/test/CodeGen/AMDGPU memset-pattern.ll

Add AS7 tests
Delta  File
+1,066 -0  llvm/test/CodeGen/AMDGPU/memset-pattern.ll
+1,066 -0  1 file

LLVM/project 90d8edf llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/CodeGen/AMDGPU memset-pattern.ll lower-buffer-fat-pointers-mem-transfer.ll

[LowerMemIntrinsics][AMDGPU] Optimize memset.pattern lowering

This patch changes the lowering of the [experimental.memset.pattern intrinsic](https://llvm.org/docs/LangRef.html#llvm-experimental-memset-pattern-intrinsic)
to match the optimized memset and memcpy lowering when possible. (The tl;dr of
memset.pattern is that it is like memset, except that you can use it to set
values that are wider than a single byte.)

The memset.pattern lowering now queries `TTI::getMemcpyLoopLoweringType` for a
preferred memory access type. If the size of that type is a multiple of the
size of the set value's type, and if both types have consistent store and alloc
sizes (memset.pattern does not lend itself to access widening when store and
alloc sizes differ), the memset.pattern is lowered into two loops:
a main loop that stores a sufficiently wide vector splat of the SetValue with
the preferred memory access type, and a residual loop that covers the remaining
set values individually.
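The shape of that two-loop lowering can be sketched at source level. This is a hypothetical C++ model, not the IR that LowerMemIntrinsics emits: `WideBytes` stands in for the size of the type returned by `TTI::getMemcpyLoopLoweringType` (assumed to be 16 bytes here), and `std::memcpy` stands in for the wide and narrow stores. For a `uint32_t` pattern, each main-loop iteration writes four copies of the set value and the residual loop handles the remaining `Count % 4` values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the two-loop memset.pattern lowering. In C++,
// sizeof(T) already includes padding, so the store/alloc-size consistency
// condition is assumed to hold for the types used here.
template <typename T, std::size_t WideBytes = 16>
void memsetPatternTwoLoops(unsigned char *Dst, T Pattern, std::size_t Count) {
  static_assert(WideBytes % sizeof(T) == 0,
                "wide access size must be a multiple of the set value's size");
  constexpr std::size_t PerWide = WideBytes / sizeof(T);

  // Build the "vector splat" of the set value once.
  unsigned char Splat[WideBytes];
  for (std::size_t I = 0; I < PerWide; ++I)
    std::memcpy(Splat + I * sizeof(T), &Pattern, sizeof(T));

  // Main loop: each wide store covers PerWide copies of the pattern.
  const std::size_t MainIters = Count / PerWide;
  for (std::size_t I = 0; I < MainIters; ++I)
    std::memcpy(Dst + I * WideBytes, Splat, WideBytes);

  // Residual loop: store the remaining set values one at a time.
  unsigned char *Tail = Dst + MainIters * WideBytes;
  for (std::size_t I = 0; I < Count % PerWide; ++I)
    std::memcpy(Tail + I * sizeof(T), &Pattern, sizeof(T));
}
```

With `Count = 10`, this performs two 16-byte main-loop stores (eight values) followed by two 4-byte residual stores.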

In contrast to the memset lowering, this patch doesn't include a specialized
lowering for residual loops with known constant lengths. Loops that are
statically known to be unreachable will not be emitted.

    [7 lines not shown]
Delta  File
+745 -0  llvm/test/CodeGen/AMDGPU/memset-pattern.ll
+273 -0  llvm/test/Transforms/PreISelIntrinsicLowering/AMDGPU/memset-pattern.ll
+105 -56  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+104 -30  llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-mem-transfer.ll
+31 -31  llvm/test/CodeGen/RISCV/memset-pattern.ll
+14 -14  llvm/test/Transforms/PreISelIntrinsicLowering/RISCV/memset-pattern.ll
+1,272 -131  5 files not shown
+1,297 -145  11 files

LLVM/project 95bc1ab llvm/lib/Transforms/Utils LowerMemIntrinsics.cpp, llvm/test/Transforms/PreISelIntrinsicLowering/X86 memcpy-inline-non-constant-len.ll memset-inline-non-constant-len.ll

[LowerMemIntrinsics] Avoid emitting unreachable loops in insertLoopExpansion

This patch refactors insertLoopExpansion so that it can skip loops that are
statically known to be unreachable and turn conditional branches with a
statically known condition into unconditional ones. Those situations arise when
the loop count is a known constant.

These cases don't occur at the existing call sites in the memcpy and memset
lowering, since they have custom handling for constant loop sizes anyway. They
will however occur in a follow-up patch that uses insertLoopExpansion for
memset.pattern, where similar custom handling for constant loop sizes would
make less sense.

This is mostly NFC with the current use except for slight changes in the branch
weight computation from profiling data (which causes the included test
changes).
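A compile-time analogue of this idea uses `if constexpr`: when the element count is a constant, a loop whose trip count is known to be zero is never instantiated, and the guard around it disappears. `planLoops` is a hypothetical helper that records which loops would be emitted instead of generating IR; it is not how insertLoopExpansion is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: describes the loops that would be emitted for a
// constant element count, processing PerIter elements per main-loop
// iteration and the remainder in a residual loop.
template <std::size_t Count, std::size_t PerIter>
std::vector<std::string> planLoops() {
  static_assert(PerIter > 0, "each main-loop iteration must make progress");
  constexpr std::size_t MainIters = Count / PerIter;
  constexpr std::size_t ResidualIters = Count % PerIter;
  std::vector<std::string> Emitted;
  // With a constant count these conditions are decided at compile time, so
  // there is no runtime branch and an unreachable loop is never generated.
  if constexpr (MainIters > 0)
    Emitted.push_back("main loop: " + std::to_string(MainIters) +
                      " iterations");
  if constexpr (ResidualIters > 0)
    Emitted.push_back("residual loop: " + std::to_string(ResidualIters) +
                      " iterations");
  return Emitted;
}
```

For example, `planLoops<16, 4>()` yields only the main loop, while `planLoops<3, 4>()` yields only the residual loop.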
Delta  File
+194 -85  llvm/lib/Transforms/Utils/LowerMemIntrinsics.cpp
+5 -5  llvm/test/Transforms/PreISelIntrinsicLowering/X86/memcpy-inline-non-constant-len.ll
+3 -3  llvm/test/Transforms/PreISelIntrinsicLowering/X86/memset-inline-non-constant-len.ll
+202 -93  3 files

LLVM/project bae8b84 llvm/test/CodeGen/AMDGPU select-nsz-known-values-to-fmin-fmax.ll

AMDGPU: Add more tests for fp min/max combines (#184336)

There's some overlap with existing tests which
use the nnan flag. The vector cases get missed here.
Delta  File
+956 -0  llvm/test/CodeGen/AMDGPU/select-nsz-known-values-to-fmin-fmax.ll
+956 -0  1 file

LLVM/project 5f97e19 openmp/tools/omptest CMakeLists.txt

[OpenMP][OMPT] Remove Threads dependency from omptest (#185930)

Removed the link against `Threads`.
Reason: it is potentially problematic and optional.

The issue would manifest if `omptest` is used via `find_package`, since
`Threads` might not be found, causing a link error.
Delta  File
+0 -4  openmp/tools/omptest/CMakeLists.txt
+0 -4  1 file

LLVM/project 754abc1 clang/include/clang/AST OpenMPClause.h, clang/lib/AST OpenMPClause.cpp

[OpenMP] Add variable capture support for transparent clause expression. (#185419)

This patch extends the `transparent` clause implementation to properly
handle runtime variable expressions as the `impex-type` argument, as
required by the OpenMP specification:
`"The use of a variable in an impex-type expression causes an implicit
reference to the variable in all enclosing constructs. The impex-type
expression is evaluated in the context outside of the construct on which
the clause appears."`
Delta  File
+222 -71  clang/test/OpenMP/task_transparent_serialization.cpp
+142 -101  clang/test/OpenMP/taskloop_codegen.cpp
+33 -15  clang/lib/Sema/SemaOpenMP.cpp
+13 -8  clang/include/clang/AST/OpenMPClause.h
+4 -4  clang/test/OpenMP/task_transparent_messages.cpp
+2 -1  clang/lib/AST/OpenMPClause.cpp
+416 -200  6 files

LLVM/project b2dbefd llvm/lib/Target/AArch64 AArch64SystemOperands.td, llvm/lib/Target/AArch64/AsmParser AArch64AsmParser.cpp

fixup! More simplification
Delta  File
+415 -447  llvm/test/MC/AArch64/armv9a-tlbip.s
+1 -15  llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+7 -9  llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+1 -8  llvm/lib/Target/AArch64/AArch64SystemOperands.td
+424 -479  4 files

LLVM/project bcd8e64 llvm/include/llvm/Support FormatProviders.h NativeFormatting.h, llvm/lib/DebugInfo/DWARF DWARFExpressionPrinter.cpp DWARFCFIPrinter.cpp

[llvm][Support] formatv: non-negative-plus for integral numbers (#185008)

The older `format()` allows you to print a `+` sign for non-negative
integral numbers upon request.

Examples:

```c++
format("%+d", 255); // -> "+255"
format("%+d", -12); // -> "-12"
```

This change adds the ability to do the same with `formatv()`:

```c++
formatv("{0:+d}", 255); // -> "+255"
formatv("{0:+d}", -12); // -> "-12"
```

    [9 lines not shown]
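The `+` flag used by `format()` follows the usual printf rule, which matters for the edge case of zero: every non-negative value, including `0`, gets a leading `+`. The sketch below demonstrates that rule with `std::snprintf` directly; `withPlusSign` is a hypothetical helper, not part of LLVM's `format()` or `formatv()`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper illustrating printf's "+" flag: signed conversions
// always start with a sign, so non-negative values (zero included) get '+'.
std::string withPlusSign(long long Value) {
  char Buf[32]; // large enough for any 64-bit value plus sign and NUL
  std::snprintf(Buf, sizeof(Buf), "%+lld", Value);
  return std::string(Buf);
}
```

For instance, `withPlusSign(0)` produces `"+0"`, not `"0"`.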
Delta  File
+46 -4  llvm/unittests/Support/FormatVariadicTest.cpp
+24 -19  llvm/lib/Support/NativeFormatting.cpp
+12 -1  llvm/include/llvm/Support/FormatProviders.h
+7 -6  llvm/include/llvm/Support/NativeFormatting.h
+5 -4  llvm/lib/DebugInfo/DWARF/DWARFExpressionPrinter.cpp
+2 -1  llvm/lib/DebugInfo/DWARF/DWARFCFIPrinter.cpp
+96 -35  6 files

LLVM/project 0c6bca6 flang/lib/Parser openmp-parsers.cpp, flang/test/Parser/OpenMP no-commas-in-ods-list-item.f90

[flang][OpenMP] Allow parsing ODS as directive-specification list item (#185737)

Normally a directive specification may use commas between the directive
name and the clauses, and between the clauses. There are some instances,
however, when a directive-specification is treated as a list item:
specifically, in arguments to the APPLY clause, and as an argument to
WHEN, OTHERWISE, and the now-deprecated DEFAULT when used on a
METADIRECTIVE. In those cases, commas are prohibited to avoid confusion
between commas that are part of the directive-specification and the
argument-list separators.
Delta  File
+123 -61  flang/lib/Parser/openmp-parsers.cpp
+16 -0  flang/test/Parser/OpenMP/no-commas-in-ods-list-item.f90
+139 -61  2 files

LLVM/project d352aac libclc check_external_funcs.sh, libclc/cmake/modules AddLibclc.cmake

[libclc][CMake] Add check-libclc umbrella test target (#186053)

This allows running the full test suite using `ninja check-libclc`.
Delta  File
+50 -0  libclc/test/CMakeLists.txt
+46 -0  libclc/test/lit.cfg.py
+30 -0  libclc/test/check_external_funcs.sh
+0 -30  libclc/check_external_funcs.sh
+14 -0  libclc/test/lit.site.cfg.py.in
+0 -10  libclc/cmake/modules/AddLibclc.cmake
+140 -40  1 file not shown
+142 -42  7 files

LLVM/project ab12828 flang/lib/Parser openmp-parsers.cpp

format
Delta  File
+0 -1  flang/lib/Parser/openmp-parsers.cpp
+0 -1  1 file

LLVM/project 540ea54 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize pr128062-interleaved-accesses-narrow-group.ll

Revert "[VPlan] Extend interleave-group-narrowing to WidenCast" (#186072)

This reverts commit bd5f9384 (#183204) to buy us time to investigate an
AArch64 SVE-fixed-length buildbot miscompile.

Ref: https://lab.llvm.org/buildbot/#/builders/143/builds/14601
Delta  File
+20 -20  llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops-and-casts.ll
+26 -3  llvm/test/Transforms/LoopVectorize/pr128062-interleaved-accesses-narrow-group.ll
+8 -9  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+54 -32  3 files

LLVM/project 69e0768 flang/lib/Parser openmp-parsers.cpp

Move deletions to the beginning of the file
Delta  File
+13 -11  flang/lib/Parser/openmp-parsers.cpp
+13 -11  1 file

LLVM/project efd20a3 llvm/test/CodeGen/AMDGPU maximumnum.ll minimumnum.ll

[AMDGPU] Codegen for min/max instructions for gfx1170 (#185625)

gfx1170 does not have s_minimum/maximum_f16/f32 instructions, so a new
feature, `SALUMinimumMaximumInsts`, is added for gfx12+ subtargets.
Delta  File
+1,240 -0  llvm/test/CodeGen/AMDGPU/maximumnum.ll
+1,204 -0  llvm/test/CodeGen/AMDGPU/minimumnum.ll
+811 -0  llvm/test/CodeGen/AMDGPU/fminimum3.ll
+811 -0  llvm/test/CodeGen/AMDGPU/fmaximum3.ll
+678 -0  llvm/test/CodeGen/AMDGPU/vector-reduce-fmax.ll
+678 -0  llvm/test/CodeGen/AMDGPU/vector-reduce-fmin.ll
+5,422 -0  24 files not shown
+10,350 -315  30 files

LLVM/project a372eca libclc/clc/lib/generic/math clc_maxmag.inc clc_maxmag.cl

libclc: Improve minmag and maxmag (#186092)

Gives slightly better codegen.
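For reference, the OpenCL C specification defines `maxmag(x, y)` as `x` if |x| > |y|, `y` if |y| > |x|, and `fmax(x, y)` otherwise; `minmag` is the mirror image using `fmin`. A scalar C++ sketch of those semantics follows. It restates the spec and is not libclc's implementation in `clc_maxmag.inc` or `clc_minmag.inc`.

```cpp
#include <cassert>
#include <cmath>

// Scalar restatement of OpenCL's maxmag: pick the argument with the larger
// magnitude; on a tie (or if a comparison involves NaN) defer to fmax.
double maxmag(double X, double Y) {
  const double AX = std::fabs(X), AY = std::fabs(Y);
  if (AX > AY)
    return X;
  if (AY > AX)
    return Y;
  return std::fmax(X, Y);
}

// Scalar restatement of OpenCL's minmag, the mirror image using fmin.
double minmag(double X, double Y) {
  const double AX = std::fabs(X), AY = std::fabs(Y);
  if (AX < AY)
    return X;
  if (AY < AX)
    return Y;
  return std::fmin(X, Y);
}
```

Because every comparison with a NaN is false, a NaN argument falls through to `fmax`/`fmin`, which return the other (non-NaN) operand.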
Delta  File
+4 -7  libclc/clc/lib/generic/math/clc_maxmag.inc
+2 -8  libclc/clc/lib/generic/math/clc_maxmag.cl
+2 -8  libclc/clc/lib/generic/math/clc_minmag.cl
+4 -6  libclc/clc/lib/generic/math/clc_minmag.inc
+12 -29  4 files

LLVM/project 5fc04b9 flang/test/Semantics/OpenMP resolve07.f90

Use test_symbols.py instead of test_errors.py
Delta  File
+20 -20  flang/test/Semantics/OpenMP/resolve07.f90
+20 -20  1 file