LLVM/project b123b70clang/lib/Headers opencl-c.h

[NFC] Fix build error: multi-line comment in opencl-c.h (#171953)

Fix build error after enabling cl_ext_float_atomics in downstream target:
opencl-c.h:13798:80: error: multi-line // comment [-Werror,-Wcomment]
13798 | #endif // defined(__opencl_c_ext_fp16_global_atomic_min_max) && \
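
For illustration only, a minimal sketch of the pattern involved, not the actual header contents: a `//` comment that ends in a backslash is continued onto the next line, which `-Wcomment` rejects. The macros `FEATURE_A` and `FEATURE_B` and the function `atomics_enabled()` are hypothetical stand-ins, and keeping the trailing comment on a single line is only one possible way to avoid the diagnostic.

```cpp
// Hypothetical feature macros standing in for the real extension macros.
#define FEATURE_A 1
#define FEATURE_B 1

// A backslash continuation is fine inside the directive itself.
#if defined(FEATURE_A) && \
    defined(FEATURE_B)
int atomics_enabled() { return 1; }
#else
int atomics_enabled() { return 0; }
#endif // defined(FEATURE_A) && defined(FEATURE_B)
// The trailing comment above stays on one line. Copying the two-line
// condition into a // comment, backslash included, is what -Wcomment flags.
```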

Patch By: Jinsong Ji <jinsong.ji at intel.com>
DeltaFile
+16-16clang/lib/Headers/opencl-c.h
+16-161 files

LLVM/project 9c5744cflang/include/flang/Lower ConvertVariable.h

[flang] add missing headers in ConvertVariable.h after #171501 (#171983)

Missing headers were caught by windows build bots:
- https://lab.llvm.org/buildbot/#/builders/222/builds/817/steps/6/logs/stdio
- https://lab.llvm.org/buildbot/#/builders/207/builds/10970/steps/5/logs/stdio

In #171501, I removed the IterationSpace.h include from Utils.h, but
IterationSpace.h itself included SymbolMap.h and FIRBuilder.h, which
are required here.
DeltaFile
+2-7flang/include/flang/Lower/ConvertVariable.h
+2-71 files

LLVM/project 04ce013llvm/utils/lit/lit TestTimes.py cl_arguments.py, llvm/utils/lit/tests filter-failed.py filter-failed-rerun.py

Reapply "[llvm][lit] Add option to run only the failed tests" (#171588)

This reverts commit 3847648e84d2ff5194f605a8a9a5c0a5e5174939.

Relands https://github.com/llvm/llvm-project/pull/158043 which got
auto-merged on a revision which wasn't approved.

The only addition to the approved version was that we adjust how we set
the time for failed tests. We used to just assign it the negative value
of the elapsed time. But if the test failed with `0` seconds (which some
of the new tests do), we would mark it `-0`. But the check for whether
something failed checks for `time < 0`. That messed with the new
`--filter-failed` option of this PR. This was only an issue on Windows
CI, but presumably can happen on any platform. Happy to do this in a
separate PR.
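
The negative-zero pitfall is plain IEEE-754 behaviour and can be sketched outside of lit. lit itself is Python; the C++ below is only an analogue, all function names are hypothetical, and the clamping repair is an assumption rather than a copy of what the patch does.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical encoding: a failed test is stored as a negative elapsed time.
double encodeFailedNaive(double elapsedSeconds) { return -elapsedSeconds; }

// The check described above: only strictly negative values count as failed.
bool looksFailed(double recorded) { return recorded < 0; }

// One possible repair (an assumption): clamp the elapsed time to the
// smallest positive normal double before negating, so the result is never -0.
double encodeFailedClamped(double elapsedSeconds) {
  const double smallest = std::numeric_limits<double>::min();
  return -std::max(elapsedSeconds, smallest);
}
```

With this sketch, `looksFailed(encodeFailedNaive(0.0))` is false because `-0.0 < 0` does not hold, while `looksFailed(encodeFailedClamped(0.0))` is true.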

---- Original PR

This patch adds a new --filter-failed option to llvm-lit, which when
set, will only run the tests that have previously failed.
DeltaFile
+23-0llvm/utils/lit/tests/filter-failed.py
+18-0llvm/utils/lit/tests/filter-failed-rerun.py
+16-0llvm/utils/lit/tests/filter-failed-delete.py
+6-1llvm/utils/lit/lit/TestTimes.py
+7-0llvm/utils/lit/tests/Inputs/filter-failed/lit.cfg
+6-0llvm/utils/lit/lit/cl_arguments.py
+76-17 files not shown
+90-113 files

LLVM/project 123d4d9llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp

[AMDGPUInstCombine] Use getSigned() for frexp exponent

It may be negative.
DeltaFile
+2-1llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+2-11 files

LLVM/project 1d7bfb7llvm/lib/CodeGen SafeStack.cpp

[SafeStack] Use getSigned() for negative value
DeltaFile
+2-2llvm/lib/CodeGen/SafeStack.cpp
+2-21 files

LLVM/project 89c37fellvm/lib/Transforms/IPO WholeProgramDevirt.cpp

[WPD] Use getSigned() for offset

This offset is a signed int64_t which can take negative values.
DeltaFile
+1-1llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+1-11 files

LLVM/project 7eb4bfeclang/lib/ExtractAPI DeclarationFragments.cpp, clang/test/ExtractAPI typedef.c

[ExtractAPI] Format typedef params correctly (#171516)

Typically, pointer types are formatted in a way where the identifier
comes right after the type definition without a space separating them,
e.g. `int *foo`, where the type is `int *` and the identifier is `foo`.
However, if a type alias to a pointer type is used, the emitted
declaration fragments are incorrect due to the missing space between the
type and identifier, like in the below example:

```
typedef int *T;
// The declaration fragment contains `Tbar` instead of `T bar`
void foo(T bar);
```

This patch checks if pointer types are aliased, and inserts the space
correctly if so.
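
A rough sketch of the spacing rule described above, assuming a simplified model of a parameter type. `ParamType` and `formatParam` are hypothetical names for illustration and are not part of clang's ExtractAPI code.

```cpp
#include <string>

// Hypothetical, simplified description of a parameter's type.
struct ParamType {
  std::string spelling; // text as written, e.g. "int *" or "T"
  bool isPointer;       // the underlying type is a pointer
  bool isAlias;         // the spelling is a typedef/alias name
};

// The identifier attaches directly only when the pointer is spelled out;
// an aliased pointer such as `T` still needs a separating space.
std::string formatParam(const ParamType &type, const std::string &name) {
  const bool attachDirectly = type.isPointer && !type.isAlias;
  return type.spelling + (attachDirectly ? "" : " ") + name;
}
```

Under these assumptions, `int *` with `foo` yields `int *foo`, while the alias `T` with `bar` yields `T bar`.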

rdar://132022003
(cherry picked from commit 794218bc53a42bd87048317506e8794deb0dc8be)
DeltaFile
+80-1clang/test/ExtractAPI/typedef.c
+4-1clang/lib/ExtractAPI/DeclarationFragments.cpp
+84-22 files

LLVM/project 917e458mlir/lib/Conversion/LinalgToStandard LinalgToStandard.cpp

[mlir] Cleanup the addLegalOp of convert-linalg-to-std pass (NFC) (#171979)

DeltaFile
+1-1mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp
+1-11 files

LLVM/project 2aa3450flang/include/flang/Lower SymbolMap.h, flang/include/flang/Lower/Support Utils.h

[flang][OpenACC] remap component references in structured constructs (#171501)

OpenACC data clauses of structured constructs may contain component
references (`obj%comp`, or `obj%array(i:j:k)`, ...).

This change allows using the ACC dialect data operation result for such
clauses every time the component is referred to inside the scope of the
construct.

The bulk of the change is to add the ability to map
`evaluate::Component` to mlir values in the symbol map used in lowering.
This is done by adding the `ComponentMap` helper class to the lowering
symbol map, and using it to override `evaluate::Component` reference
lowering in expression lowering (ConvertExprToHLFIR.cpp).

Some changes are made in Lower/Support/Utils.h in order to set-up/expose
the hashing/equality helpers needed to use `evaluate::Component` in
llvm::DenseMap.


    [26 lines not shown]
DeltaFile
+310-260flang/lib/Lower/OpenACC.cpp
+157-0flang/test/Lower/OpenACC/acc-use-device-remapping.f90
+74-0flang/include/flang/Lower/SymbolMap.h
+39-4flang/include/flang/Lower/Support/Utils.h
+22-0flang/lib/Lower/SymbolMap.cpp
+15-1flang/lib/Lower/ConvertExprToHLFIR.cpp
+617-2655 files not shown
+644-28111 files

LLVM/project f0d7d83lldb/test/API/tools/lldb-dap/disassemble TestDAP_disassemble.py, lldb/tools/lldb-dap JSONUtils.cpp JSONUtils.h

[lldb-dap] Allow empty memory reference in disassemble arguments (#162517)

This patch implements a workaround for a VSCode bug that causes it to
send disassemble requests with an empty memory reference. You can find a
more detailed description
[here](https://github.com/microsoft/vscode/pull/270361). I propose to
allow an empty memory reference and return invalid instructions when this
occurs.
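
As a sketch of the idea only: the parser can distinguish an empty reference from a malformed one, so the handler can answer the former with invalid instructions instead of an error. `RefKind`, `MemRef` and `parseMemoryReference` are hypothetical names and do not reflect the lldb-dap sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

enum class RefKind { Empty, Address, Malformed };

struct MemRef {
  RefKind kind;
  std::uint64_t address; // only meaningful when kind == RefKind::Address
};

// Hypothetical parser: an empty string is accepted as its own case.
MemRef parseMemoryReference(const std::string &text) {
  if (text.empty())
    return {RefKind::Empty, 0};
  try {
    std::size_t consumed = 0;
    // Base 0 lets std::stoull accept both "4096" and "0x1000".
    const unsigned long long value = std::stoull(text, &consumed, 0);
    if (consumed != text.size())
      return {RefKind::Malformed, 0};
    return {RefKind::Address, value};
  } catch (const std::logic_error &) {
    // std::invalid_argument and std::out_of_range both derive from this.
    return {RefKind::Malformed, 0};
  }
}
```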

Error log example:
```
1759923554.517830610 (stdio) --> {"command":"disassemble","arguments":{"memoryReference":"","offset":0,"instructionOffset":-50,"instructionCount":50,"resolveSymbols":true},"type":"request","seq":3}
1759923554.518007517 (stdio) queued (command=disassemble seq=3)
1759923554.518254757 (stdio) <-- {"body":{"error":{"format":"invalid arguments for request 'disassemble': malformed memory reference at arguments.memoryReference\n{\n  \"instructionCount\": 50,\n  \"instructionOffset\": -50,\n  \"memoryReference\": /* error: malformed memory reference */ \"\",\n  \"offset\": 0,\n  \"resolveSymbols\": true\n}","id":3,"showUser":true}},"command":"disassemble","request_seq":3,"seq":0,"success":false,"type":"response"}
```

I am not sure that we should add a workaround here when the bug is on
the VSCode side, but I think this bug affects our users. WDYT?
DeltaFile
+22-0lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
+6-1lldb/tools/lldb-dap/JSONUtils.cpp
+5-1lldb/tools/lldb-dap/JSONUtils.h
+5-0lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp
+1-1lldb/tools/lldb-dap/Protocol/ProtocolRequests.cpp
+39-35 files

LLVM/project b0d982bflang/lib/Lower OpenACC.cpp

limit no_create remapping until other issue is fixed
DeltaFile
+8-1flang/lib/Lower/OpenACC.cpp
+8-11 files

LLVM/project c47d65cflang/lib/Lower OpenACC.cpp, flang/test/Lower/OpenACC acc-use-device-remapping.f90

review comments
DeltaFile
+11-10flang/test/Lower/OpenACC/acc-use-device-remapping.f90
+3-3flang/lib/Lower/OpenACC.cpp
+14-132 files

LLVM/project feace08flang/include/flang/Lower SymbolMap.h, flang/include/flang/Lower/Support Utils.h

[flang][OpenACC] remap component references in structured constructs
DeltaFile
+303-260flang/lib/Lower/OpenACC.cpp
+156-0flang/test/Lower/OpenACC/acc-use-device-remapping.f90
+74-0flang/include/flang/Lower/SymbolMap.h
+39-4flang/include/flang/Lower/Support/Utils.h
+22-0flang/lib/Lower/SymbolMap.cpp
+15-1flang/lib/Lower/ConvertExprToHLFIR.cpp
+609-2655 files not shown
+636-28111 files

LLVM/project 71c3acbllvm/include/llvm/CodeGen BasicTTIImpl.h, llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp

[Analysis][AArch64] Add cost model for loop.dependence.{war/raw}.mask (#167551)

This PR adds the cost model for the loop dependence mask intrinsics,
both for cases where they must be expanded and when they can be lowered
for AArch64.

---------

Co-authored-by: Benjamin Maxwell <benjamin.maxwell at arm.com>
DeltaFile
+189-0llvm/test/Analysis/CostModel/AArch64/loop_dependence_mask.ll
+50-0llvm/include/llvm/CodeGen/BasicTTIImpl.h
+13-0llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+252-03 files

LLVM/project 51bd0edlibcxx/include valarray, libcxx/test/libcxx/numerics/numarray nodiscard.verify.cpp

[libc++][valarray] Applied `[[nodiscard]]` (#170996)

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html
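
A small illustration of what the attribute does, using a hypothetical free function rather than any libc++ declaration:

```cpp
#include <vector>

// Discarding this result is almost certainly a mistake: the call has no
// side effects, so ignoring the answer means the call did nothing useful.
[[nodiscard]] bool isEmpty(const std::vector<int> &values) {
  return values.empty();
}
```

Writing `isEmpty(v);` as a statement on its own is then diagnosed by the compiler, while `if (isEmpty(v)) ...` compiles silently.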
DeltaFile
+59-52libcxx/include/valarray
+96-0libcxx/test/libcxx/numerics/numarray/nodiscard.verify.cpp
+155-522 files

LLVM/project 51e5074compiler-rt/lib/sanitizer_common sanitizer_linux.cpp sanitizer_platform_limits_posix.h

[compiler-rt][sanitizer] fix i386 build for Haiku (#171075)

r13 does not provide the trap err.

Co-authored-by: Jerome Duval <jerome.duval at gmail.com>
(cherry picked from commit 62dbe573cf05135875e36fc2a81f5f56c0db5820)
DeltaFile
+10-2compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
+1-1compiler-rt/lib/sanitizer_common/sanitizer_platform_limits_posix.h
+11-32 files

LLVM/project d2e835bllvm/lib/CodeGen SelectOptimize.cpp, llvm/test/CodeGen/AArch64 selectopt-cast.ll

[SelectOptimize] Fix incorrect -1 immediate for large integers (#170860)

This was creating a -1 with zero extension, while it needs to use sign
extension.
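
The difference can be sketched with fixed-width integers. This is a scaled-down analogue (32 to 64 bits) with hypothetical helper names, not the SelectOptimize code, which deals with integers wider than 64 bits.

```cpp
#include <cstdint>

// Widen by filling the upper bits with zeros.
std::uint64_t zeroExtend32To64(std::int32_t value) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(value));
}

// Widen by replicating the sign bit into the upper bits.
std::uint64_t signExtend32To64(std::int32_t value) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(value));
}
```

For `-1`, zero extension yields `0x00000000FFFFFFFF`, which is no longer all-ones in the wider type, whereas sign extension yields `0xFFFFFFFFFFFFFFFF`.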

(cherry picked from commit 1165e41c876f3beba938805329416647bd21ee5e)
DeltaFile
+46-0llvm/test/CodeGen/AArch64/selectopt-cast.ll
+1-1llvm/lib/CodeGen/SelectOptimize.cpp
+47-12 files

LLVM/project 4b24e73llvm/lib/Target/WebAssembly WebAssemblyExplicitLocals.cpp WebAssemblyRegStackify.cpp, llvm/test/CodeGen/WebAssembly fake-use.ll

[WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160768)

`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.

---

This is a reapplication of #160228, which broke the Wasm waterfall. This
PR additionally prevents uses by `FAKE_USE`s from being stackified.

Previously, a 'def' whose first use was a `FAKE_USE` was able to be
stackified as `TEE`:
- Before
```
Reg = INST ...            // Def
FAKE_USE ..., Reg, ...    // Insert
INST ..., Reg, ...
INST ..., Reg, ...

    [46 lines not shown]
DeltaFile
+25-0llvm/test/CodeGen/WebAssembly/fake-use.ll
+14-0llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp
+4-0llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp
+43-03 files

LLVM/project 81e746ellvm/lib/Transforms/InstCombine InstCombineCalls.cpp, llvm/test/Transforms/InstCombine ldexp.ll fold-select-fmul-if-zero.ll

Reapply "InstCombine: Fold ldexp with constant exponent to fmul" (#171895)

This reverts commit 757c5b3bc70c6f0b55afa310f3fab07a4985e8b8.

Reapply with the transform skipped if the scaling overflows or underflows.
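
A minimal sketch of the idea in scalar C++, assuming the guard is "the scale factor 2^exponent must be a normal, finite value". The exact condition used by the patch is not stated here, and the helper names are hypothetical.

```cpp
#include <cmath>

// True if 2^exponent is a normal, finite double, i.e. computing the
// scale factor neither overflowed to infinity nor underflowed.
bool scaleIsRepresentable(int exponent) {
  return std::isnormal(std::ldexp(1.0, exponent));
}

// Replace ldexp(x, exponent) with a multiply only when the scale is safe;
// otherwise keep the original ldexp call.
double ldexpViaMultiply(double x, int exponent) {
  if (scaleIsRepresentable(exponent))
    return x * std::ldexp(1.0, exponent);
  return std::ldexp(x, exponent);
}
```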
DeltaFile
+26-26llvm/test/Transforms/InstCombine/ldexp.ll
+13-0llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+2-8llvm/test/Transforms/InstCombine/fold-select-fmul-if-zero.ll
+41-343 files

LLVM/project cf1c1bfllvm/lib/Transforms/InstCombine InstCombineCalls.cpp, llvm/test/Transforms/InstCombine ldexp.ll

Skip if overflow or underflow
DeltaFile
+8-5llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+1-1llvm/test/Transforms/InstCombine/ldexp.ll
+9-62 files

LLVM/project 1a1e7d6llvm/test/Transforms/InstCombine ldexp.ll

InstCombine: Add more ldexp by constant tests
DeltaFile
+80-0llvm/test/Transforms/InstCombine/ldexp.ll
+80-01 files

LLVM/project d714a6cclang/lib/AST ASTContext.cpp DeclCXX.cpp, clang/lib/CodeGen CGClass.cpp MicrosoftCXXABI.cpp

Reland [MS][clang] Add support for vector deleting destructors (#170337)

This reverts commit
https://github.com/llvm/llvm-project/commit/54a4da9df6906b63878ad6d0ea6da3ed7d2d8432.

MSVC supports an extension that allows deleting an array of objects via
a pointer whose static type doesn't match its dynamic type. This is done
by generating special destructors: vector deleting destructors.
MSVC's virtual tables always contain a pointer to the vector deleting
destructor for classes with virtual destructors, so not having this
extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor in the vtable. As a bonus, the
deletion of an array of polymorphic objects will work just like it does
with MSVC: no memory leaks, and the correct destructors are called.

This patch will cause clang to emit code that is compatible with code
produced by MSVC but not compatible with code produced with clang of
older versions, so the new behavior can be disabled via passing

    [2 lines not shown]
DeltaFile
+336-0clang/test/CodeGenCXX/microsoft-vector-deleting-dtors.cpp
+102-1clang/lib/CodeGen/CGClass.cpp
+99-0clang/test/CodeGenCXX/microsoft-vector-deleting-dtors2.cpp
+85-0clang/lib/AST/ASTContext.cpp
+63-10clang/lib/AST/DeclCXX.cpp
+56-14clang/lib/CodeGen/MicrosoftCXXABI.cpp
+741-2555 files not shown
+1,364-17061 files

LLVM/project a318c50mlir/lib/Conversion/TosaToLinalg TosaToLinalg.cpp, mlir/test/Conversion/TosaToLinalg tosa-to-linalg.mlir

[mlir][tosa] Remove NegateOp to SubOp and 48-bit promotion in TosaToLinalg (#170622)

The patch is motivated by the failure of the TOSA conformance test negate_32x45x49_i16_full.

The TosaToLinalg pass has an optimization that converts a TOSA Negate to a Sub when the zero points are zero. However, when the input value is the minimum negative number, this transformation causes an underflow. With the transformation removed, the zp = 0 case also goes through the promotion path, which avoids the underflow.
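
The corner case can be sketched with 16-bit integers: negating the minimum value does not fit in the same width, so the negation is done in a wider type first. The helper names are hypothetical, and clamping the result back into range is an assumption of this sketch, not a statement about what the pass emits.

```cpp
#include <cstdint>
#include <limits>

// Negate in a wider type so that -(-32768) is representable.
std::int32_t negateWidened(std::int16_t value) {
  return -static_cast<std::int32_t>(value);
}

// Bring the widened result back into int16 range (assumed behaviour).
std::int16_t negateClamped(std::int16_t value) {
  const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  std::int32_t wide = negateWidened(value);
  if (wide < lo) wide = lo;
  if (wide > hi) wide = hi;
  return static_cast<std::int16_t>(wide);
}
```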

The promotion could go from int32 to int48, but the TOSA negate specification does not mention support for int48. Should we consider removing the promotion to int48 to stay aligned with the TOSA spec?
DeltaFile
+21-4mlir/test/Conversion/TosaToLinalg/tosa-to-linalg.mlir
+1-11mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
+22-152 files

LLVM/project 19e1011llvm/lib/CodeGen/SelectionDAG LegalizeVectorTypes.cpp LegalizeVectorOps.cpp, llvm/test/CodeGen/AArch64 alias_mask_scalable.ll alias_mask.ll

[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565)

Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.

The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.

It is defined as the OR of:

* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`

Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:

* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`
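
The two bullet lists above can be modelled lane by lane in scalar code. This is only an illustrative reference model with hypothetical names (`warMask`, `firstLane`); it is not the SelectionDAG lowering.

```cpp
#include <cstdint>
#include <vector>

// Scalar model of the WAR mask: a lane is active if the pointers do not
// conflict at all, or if the lane's byte offset is below the distance.
// `firstLane` is the lane index the returned part starts at, so the high
// half of a split mask uses firstLane = number of lanes in the low half.
std::vector<bool> warMask(std::int64_t ptrA, std::int64_t ptrB,
                          std::int64_t elementSize, unsigned numLanes,
                          unsigned firstLane = 0) {
  const std::int64_t diff = ptrB - ptrA;
  std::vector<bool> mask(numLanes);
  for (unsigned i = 0; i < numLanes; ++i) {
    const std::int64_t lane = static_cast<std::int64_t>(firstLane) + i;
    mask[i] = diff <= 0 || elementSize * lane < diff;
  }
  return mask;
}
```

In this model, concatenating `warMask(a, b, size, 4)` and `warMask(a, b, size, 4, 4)` gives the same lanes as `warMask(a, b, size, 8)`.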

    [17 lines not shown]
DeltaFile
+50-506llvm/test/CodeGen/AArch64/alias_mask_scalable.ll
+107-434llvm/test/CodeGen/AArch64/alias_mask.ll
+34-15llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+2-46llvm/test/CodeGen/AArch64/alias_mask_scalable_nosve2.ll
+21-26llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+0-45llvm/test/CodeGen/AArch64/loop-dependence-mask-ccmp.ll
+214-1,0726 files not shown
+268-1,13512 files

LLVM/project fbde1dcflang/include/flang/Lower DirectivesCommon.h, flang/lib/Lower OpenACC.cpp

[flang][OpenACC] do not load pointer and allocatables component in data clauses (#171445)

`gatherDataOperandAddrAndBounds` did not handle pointer and allocatable
components (`obj%p`) in the same way as pointer and allocatable
whole objects (`p`).

The difference is that whole objects are kept as a descriptor address
(`fir.ref<fir.box>`) in the acc data operation, while components were
dereferenced (`fir.box<>`).

I do not think this was intentional; it is mainly a side effect of
`genExprAddr` generating a dereference for pointer/allocatable
components.

In the work that I am doing on remapping components, this is an issue
because the data operation must return a fir.ref<fir.box> so that I can
remap any appearance of the component to it (which could be in a pointer
association statement for instance, requiring access to a descriptor
address as opposed to a value).
DeltaFile
+25-17flang/test/Lower/OpenACC/acc-enter-data.f90
+14-5flang/include/flang/Lower/DirectivesCommon.h
+4-11flang/test/Lower/OpenACC/acc-bounds.f90
+8-4flang/lib/Lower/OpenACC.cpp
+51-374 files

LLVM/project 025d0c0llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU insert-waitcnts-merge.ll lds-dma-waits.ll

(reland) [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077) (#171779)

Fixed a crash in Blender due to some weird control flow.
The issue was with the "merge" function which was only looking at the
keys of the "Other" VMem/SGPR maps. It needs to look at the keys of both
maps and merge them.
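
To illustrate why both key sets matter, here is a simplified sketch. It assumes, purely for illustration, that each side's scores are adjusted by a per-side shift during the merge, so an entry that exists only in one map still has to be visited. `ScoreMap`, `mergeScores` and the shift parameters are hypothetical and do not mirror the pass's data structures.

```cpp
#include <algorithm>
#include <map>

using ScoreMap = std::map<unsigned, unsigned>; // register unit -> score

// Merge two score maps, visiting the keys of BOTH inputs. Walking only
// `other` would leave entries unique to `mine` without their shift applied.
ScoreMap mergeScores(const ScoreMap &mine, unsigned myShift,
                     const ScoreMap &other, unsigned otherShift) {
  ScoreMap merged;
  for (const auto &entry : mine)
    merged[entry.first] = entry.second + myShift;
  for (const auto &entry : other) {
    unsigned &slot = merged[entry.first]; // zero if the key is new
    slot = std::max(slot, entry.second + otherShift);
  }
  return merged;
}
```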

Original commit message below
----

The pass was already "reinventing" the concept just to deal with 16 bit
registers. Clean up the entire tracking logic to only use register
units.

There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of
register units.


    [8 lines not shown]
DeltaFile
+322-281llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+168-0llvm/test/CodeGen/AMDGPU/insert-waitcnts-merge.ll
+4-4llvm/test/CodeGen/AMDGPU/lds-dma-waits.ll
+494-2853 files

LLVM/project b492b35llvm/test/Transforms/LoopInterchange large-nested-6d.ll large-nested-4d.ll

[LoopInterchange] Motivating example for interchange. NFC. (#171631)

This is precommitting a full reproducer of one of our motivating
examples. Looking at a full reproducer is helpful for further discussion
on DependenceAnalysis and Delinearization issues and the runtime
predicates discussion. I appreciate that this is a larger than usual
test case, but that is by design, because I think it is useful to look
at the whole thing with all of its complexities.

I have given useful names to all the relevant loop variables, and the
relevant blocks in these loops and their functions, but have
intentionally not done that for others as there are quite a few more.
DeltaFile
+569-0llvm/test/Transforms/LoopInterchange/large-nested-6d.ll
+155-0llvm/test/Transforms/LoopInterchange/large-nested-4d.ll
+724-02 files

LLVM/project 3e2a8e2libcxx/include set, libcxx/test/libcxx/diagnostics multiset.nodiscard.verify.cpp

[libc++][multiset] Applied `[[nodiscard]]` (#171654)

`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html
- https://wg21.link/multiset
DeltaFile
+107-0libcxx/test/libcxx/diagnostics/multiset.nodiscard.verify.cpp
+49-39libcxx/include/set
+156-392 files

LLVM/project 5bc7b9dllvm/lib/DebugInfo/DWARF DWARFDie.cpp, llvm/test/DebugInfo/Generic namespace.ll import-inlined-declaration.ll

[llvm][dwarfdump] Print the name (if available) of entities referenced by DW_AT_import (#171859)

Instead of this:
```
0x00018cff:   DW_TAG_imported_declaration
                DW_AT_decl_line (12)
                DW_AT_import    (0x0000000000018cfb)
```
print:
```
0x00018cff:   DW_TAG_imported_declaration
                DW_AT_decl_line (12)
                DW_AT_import    (0x0000000000018cfb "platform")
```

Where `0x0000000000018cfb` in this example could be a `DW_TAG_module`
with `DW_AT_name ("platform")`
DeltaFile
+122-0llvm/test/tools/llvm-dwarfdump/AArch64/DW_AT_import.yaml
+14-14llvm/test/DebugInfo/Generic/namespace.ll
+2-2llvm/test/DebugInfo/X86/dwarfdump-DIImportedEntity_elements.ll
+2-2llvm/test/tools/dsymutil/X86/modules-empty.m
+1-1llvm/test/DebugInfo/Generic/import-inlined-declaration.ll
+1-1llvm/lib/DebugInfo/DWARF/DWARFDie.cpp
+142-202 files not shown
+144-228 files

LLVM/project 3660338llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

merge v_read_access and v_mem_access because no subtarget uses both
DeltaFile
+4-5llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+4-51 files