LLVM/project 9841f81 llvm/docs LangRef.rst

Apply reviewer suggestions

Co-authored-by: Sameer Sahasrabuddhe <sameer.sahasrabuddhe at amd.com>
Delta File
+9 -9 llvm/docs/LangRef.rst
+9 -9 1 file

LLVM/project f05e531 lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.h ProcessGDBRemote.cpp

fixup! use map typedef
Delta File
+2 -4 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+2 -3 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+4 -7 2 files

LLVM/project bdf7a56 lldb/include/lldb/Utility GDBRemote.h, lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.cpp ProcessGDBRemote.h

[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint

This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.

https://github.com/llvm/llvm-project/pull/192910
Delta File
+192 -0 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+8 -0 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+4 -0 lldb/source/Utility/GDBRemote.cpp
+3 -0 lldb/include/lldb/Utility/GDBRemote.h
+207 -0 4 files

LLVM/project 157fcff lldb/include/lldb/Target Process.h, lldb/source/Target Process.cpp

fixup! fix order of class declaration
Delta File
+14 -12 lldb/include/lldb/Target/Process.h
+2 -3 lldb/source/Target/Process.cpp
+16 -15 2 files

LLVM/project fb4af2a llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 atomic-lock-and-setcc-folded.ll

[X86] lock opt ptr const inconsistencies (#185195)

Resolves: https://github.com/llvm/llvm-project/issues/147280

The linked issue mentions cases of atomic arithmetic followed by a test
whose result could be recovered from the flags, but which currently emit
CAS loops instead of `lock` + op.

One fold solves the issue's code: `lock and` sets ZF from the result
`old & C`, so any comparison of `new = old & C` against zero can be
answered with ZF. This fold does exactly that, reducing the comparison to
`!= 0` or `== 0`.
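A minimal C++ sketch of the source pattern this fold targets, for illustration only: `and_then_test` is a hypothetical name, and whether a given compiler lowers it to `lock and` plus `setne` (rather than a CAS loop) is an assumption that this sketch does not verify. What it shows is that the tested value is exactly `old & C`, the value on which `lock and` sets ZF.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only): atomically ANDs `word` with `mask`
// and reports whether the new value is nonzero.
inline bool and_then_test(std::atomic<std::uint32_t>& word, std::uint32_t mask) {
  // fetch_and stores `old & mask` and returns `old`.
  std::uint32_t old_value = word.fetch_and(mask);
  // The tested value is exactly what `lock and` writes and sets ZF from,
  // so in principle the comparison can be answered from ZF alone.
  std::uint32_t new_value = old_value & mask;
  return new_value != 0;
}
```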

I also decided to refactor `shouldExpandCmpArithRMWInIR` into a
dispatching function and make `getCmpArithCC` just return X86::CondCodes
directly. This deleted the dispatching switch later in the code.

Also I broke out the different cases of `getCmpArithWithCC` into helper
functions for each case (add, sub, and, xor, or, add with overflow, sub

    [3 lines not shown]
Delta File
+159 -74 llvm/lib/Target/X86/X86ISelLowering.cpp
+59 -0 llvm/test/CodeGen/X86/atomic-lock-and-setcc-folded.ll
+218 -74 2 files

LLVM/project 6db8e46 llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 file

LLVM/project 4274239 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 packus.ll

[X86] combineVTRUNCSAT - don't split 128-bit concatenated vectors when folding to PACKSS/US (#194347)

If the VTRUNCS/US node has 128-bit src and dst types, then ensure we
don't split into sub-128-bit vectors - just treat it as padded with
zeros to match VTRUNC behaviour.

Fixes #194344
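A scalar C++ model of the 128-bit case, written only to illustrate the zero-padding convention the message describes: an unsigned saturating truncation of 8 x i16 into a 128-bit vector of 16 x i8, with the upper 8 bytes zero. `truncus_v8i16_to_v16i8` is a hypothetical name, not an LLVM function, and the model ignores how the DAG combine actually forms PACKSS/PACKUS.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not LLVM code): unsigned saturating truncation of
// a 128-bit v8i16 source into a 128-bit v16i8 result.
inline std::array<std::uint8_t, 16>
truncus_v8i16_to_v16i8(const std::array<std::uint16_t, 8>& src) {
  std::array<std::uint8_t, 16> dst{};  // all 16 bytes start as zero
  for (std::size_t i = 0; i < src.size(); ++i) {
    // Saturate: values above 255 clamp to 255 before narrowing.
    dst[i] = static_cast<std::uint8_t>(std::min<std::uint16_t>(src[i], 0xFF));
  }
  // Bytes 8..15 stay zero, matching the zero-padded view the commit adopts.
  return dst;
}
```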
Delta File
+19 -0 llvm/test/CodeGen/X86/packus.ll
+14 -3 llvm/lib/Target/X86/X86ISelLowering.cpp
+33 -3 2 files

LLVM/project c8a148b lldb/include/lldb/Target Process.h, lldb/source/Target Process.cpp

fixup! fix order of class declaration
Delta File
+14 -12 lldb/include/lldb/Target/Process.h
+2 -3 lldb/source/Target/Process.cpp
+16 -15 2 files

LLVM/project e9ad511 llvm/lib/Transforms/Vectorize VPRecipeBuilder.h

[VPlan] Remove unused PhisToFix member from VPRecipeBuilder. NFC (#194451)

The field is not referenced anywhere; the only PhisToFix in the codebase
is a local variable in VPlanConstruction.cpp.
Delta File
+0 -5 llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+0 -5 1 file

LLVM/project cad0ef5 llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/CodeGen/AMDGPU bpermute-xor-dpp-combine.ll

[AMDGPU] Convert ds_bpermute/wave_shuffle XOR patterns to DPP row_xmask and permlanex16
Delta File
+115 -7 llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+19 -84 llvm/test/CodeGen/AMDGPU/bpermute-xor-dpp-combine.ll
+18 -74 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-bpermute-xor-to-dpp.ll
+11 -38 llvm/test/Transforms/InstCombine/AMDGPU/llvm.amdgcn.wave.shuffle.ll
+163 -203 4 files
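The commit message gives no detail on the lowering. One plausible piece of the reasoning, stated here as an assumption, is that DPP `row_xmask` operates within 16-lane rows, so it can only replace a XOR shuffle whose mask keeps every lane's source inside its own row. The arithmetic behind that condition can be checked with a small C++ sketch; `xor_stays_in_row` and `xor_source_lane` are hypothetical helpers.

```cpp
#include <cassert>

// Assumed row width for this sketch: DPP row operations act on 16-lane rows.
constexpr unsigned kRowSize = 16;

// A XOR shuffle makes lane `lane` read from lane `lane ^ mask`.
constexpr unsigned xor_source_lane(unsigned lane, unsigned mask) {
  return lane ^ mask;
}

// Hypothetical helper: true if, for every lane of a wave with `wave_size`
// lanes, the XOR source lane lies in the same 16-lane row as the lane itself.
constexpr bool xor_stays_in_row(unsigned mask, unsigned wave_size) {
  for (unsigned lane = 0; lane < wave_size; ++lane) {
    if (xor_source_lane(lane, mask) / kRowSize != lane / kRowSize)
      return false;
  }
  return true;
}

// Masks below 16 only flip the low four lane bits, so they never leave the row.
static_assert(xor_stays_in_row(15, 64), "mask 15 stays within a row");
static_assert(!xor_stays_in_row(16, 64), "mask 16 crosses into another row");
```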

LLVM/project e4b6407 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-replicaton-i1-mask.ll avx512-mask-op.ll

[X86] combineKSHIFT - fold kshift(logicop(X,C1),C2) -> logicop(kshift(X,C2),kshift(C1,C2)) (#194343)

Attempt to push KSHIFTs up through logicops in the DAG to expose additional folding.

This requires us to add constant folding handling for KSHIFTL/R instructions as well.

Yak shaving for #193700
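The fold relies on logical shifts distributing over bitwise AND/OR/XOR; the shifted constant `kshift(C1, C2)` is then what the new KSHIFTL/R constant folding evaluates. A C++ sketch that models a 16-bit mask register as `std::uint16_t` and checks the identity for all three logic ops in both shift directions (`kshiftl`, `kshiftr`, and `kshift_fold_holds` are hypothetical model functions, not LLVM APIs):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 16-bit mask register: shifts are logical and the
// result is truncated to 16 bits; shifting by 16 or more yields zero.
inline std::uint16_t kshiftl(std::uint16_t k, unsigned amt) {
  return static_cast<std::uint16_t>(amt >= 16 ? 0u : (unsigned{k} << amt));
}

inline std::uint16_t kshiftr(std::uint16_t k, unsigned amt) {
  return static_cast<std::uint16_t>(amt >= 16 ? 0u : (unsigned{k} >> amt));
}

// Checks kshift(op(X, C1), C2) == op(kshift(X, C2), kshift(C1, C2)) for
// op in {AND, OR, XOR} and both shift directions.
inline bool kshift_fold_holds(std::uint16_t x, std::uint16_t c1, unsigned c2) {
  const auto u16 = [](unsigned v) { return static_cast<std::uint16_t>(v); };
  const bool shl_ok =
      kshiftl(u16(x & c1), c2) == u16(kshiftl(x, c2) & kshiftl(c1, c2)) &&
      kshiftl(u16(x | c1), c2) == u16(kshiftl(x, c2) | kshiftl(c1, c2)) &&
      kshiftl(u16(x ^ c1), c2) == u16(kshiftl(x, c2) ^ kshiftl(c1, c2));
  const bool shr_ok =
      kshiftr(u16(x & c1), c2) == u16(kshiftr(x, c2) & kshiftr(c1, c2)) &&
      kshiftr(u16(x | c1), c2) == u16(kshiftr(x, c2) | kshiftr(c1, c2)) &&
      kshiftr(u16(x ^ c1), c2) == u16(kshiftr(x, c2) ^ kshiftr(c1, c2));
  return shl_ok && shr_ok;
}
```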
Delta File
+43 -43 llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+21 -2 llvm/lib/Target/X86/X86ISelLowering.cpp
+4 -12 llvm/test/CodeGen/X86/avx512-mask-op.ll
+5 -9 llvm/test/CodeGen/X86/avx512-insert-extract.ll
+3 -9 llvm/test/CodeGen/X86/avx512-ext.ll
+4 -4 llvm/test/CodeGen/X86/masked_store.ll
+80 -79 6 files

LLVM/project 60e9465 llvm/test/CodeGen/AMDGPU bpermute-xor-dpp-combine.ll, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-bpermute-xor-to-dpp.ll llvm.amdgcn.wave.shuffle.ll

[AMDGPU][NFC] Pre-commit tests for ds_bpermute XOR to DPP/permlane lowering
Delta File
+461 -0 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-bpermute-xor-to-dpp.ll
+251 -0 llvm/test/Transforms/InstCombine/AMDGPU/llvm.amdgcn.wave.shuffle.ll
+240 -0 llvm/test/CodeGen/AMDGPU/bpermute-xor-dpp-combine.ll
+952 -0 3 files

LLVM/project 61760fd llvm/test/CodeGen/AArch64/GlobalISel legalize-pow.mir legalize-atan2.mir

[AArch64][GlobalISel] Update fp legalization mir tests. NFC (#194561)

This updates a number of the floating point mir legalization tests to
use `f` types instead of generic `s` types.
Delta File
+166 -166 llvm/test/CodeGen/AArch64/GlobalISel/legalize-pow.mir
+159 -159 llvm/test/CodeGen/AArch64/GlobalISel/legalize-atan2.mir
+125 -125 llvm/test/CodeGen/AArch64/GlobalISel/legalize-intrinsic-roundeven.mir
+108 -108 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fexp2.mir
+96 -96 llvm/test/CodeGen/AArch64/GlobalISel/legalize-modf.mir
+92 -92 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fma.mir
+746 -746 32 files not shown
+2,456 -2,448 38 files

LLVM/project f299805 compiler-rt/test/cfi/cross-dso/icall diag.cpp

update

Created using spr 1.3.7
Delta File
+1 -1 compiler-rt/test/cfi/cross-dso/icall/diag.cpp
+1 -1 1 file

LLVM/project 9085c76 utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][math][bazel] fix build (#194559)
Delta File
+1 -1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -1 1 file

LLVM/project 347c5d6 llvm/lib/Support Hash.cpp

Hash.cpp: include ErrorHandling.h (#194553)

Hash.cpp uses llvm_unreachable but currently picks up ErrorHandling.h
only transitively through xxhash.h -> ArrayRef.h -> Hashing.h.
Delta File
+1 -0 llvm/lib/Support/Hash.cpp
+1 -0 1 file

LLVM/project a5f8f47 llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 file

LLVM/project 534ba39 llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do (#189016)

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta File
+89 -77 llvm/docs/AMDGPUUsage.rst
+89 -77 1 file

LLVM/project c857cb2 libc/shared/math dsqrtf128.h, libc/src/__support/math dsqrtf128.h CMakeLists.txt

[libc][math] Refactor dsqrtf128 to header-only (#194552)

part of: #147386
Delta File
+31 -0 libc/src/__support/math/dsqrtf128.h
+29 -0 libc/shared/math/dsqrtf128.h
+11 -1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+11 -0 libc/src/__support/math/CMakeLists.txt
+2 -4 libc/src/math/generic/dsqrtf128.cpp
+1 -2 libc/src/math/generic/CMakeLists.txt
+85 -7 4 files not shown
+90 -7 10 files

LLVM/project 90372f5 compiler-rt/test/cfi mfcall.cpp cross-dso-diagnostic.cpp, compiler-rt/test/cfi/cross-dso/icall icall-from-dso.cpp icall.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+2 -3 compiler-rt/test/cfi/icall/bad-signature.c
+2 -2 compiler-rt/test/cfi/mfcall.cpp
+1 -1 compiler-rt/test/cfi/cross-dso/icall/icall-from-dso.cpp
+1 -1 compiler-rt/test/cfi/cross-dso-diagnostic.cpp
+1 -1 compiler-rt/test/cfi/cross-dso/icall/icall.cpp
+7 -8 5 files

LLVM/project d48b631 libc/src/__support/FPUtil BasicOperations.h, libc/src/__support/math fmax.h fmaxf16.h

[libc][math] Qualify fmax functions to be constexpr (#194551)
Delta File
+9 -0 libc/test/shared/shared_math_constexpr_test.cpp
+3 -1 libc/src/__support/math/fmax.h
+3 -1 libc/src/__support/math/fmaxf16.h
+3 -1 libc/src/__support/math/fmaxf.h
+1 -1 libc/src/__support/math/fmaxf128.h
+1 -1 libc/src/__support/FPUtil/BasicOperations.h
+20 -5 2 files not shown
+22 -7 8 files

LLVM/project b73875a llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 file

LLVM/project 677ebca llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta File
+89 -77 llvm/docs/AMDGPUUsage.rst
+89 -77 1 file

LLVM/project 4f6b1b0 llvm/docs AMDGPUUsage.rst LangRef.rst

[LangRef][AMDGPU] Specify that syncscope can cause atomic operations to race (#189015)

Targets should be able to specify that the syncscope of atomic operations
influences whether they participate in data races with each other.

For example, in AMDGPU, we want (and already implement) the load in the
following case to be in a data race (i.e., return `undef` according to the
current definition), because there is an atomic store with workgroup syncscope
executing in a different workgroup:

```
; workgroup 0:
  store atomic i32 1, ptr %p syncscope("workgroup") monotonic, align 4

; workgroup 1:
  store atomic i32 2, ptr %p syncscope("workgroup") monotonic, align 4
  load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4
```


    [3 lines not shown]
Delta File
+4 -1 llvm/docs/AMDGPUUsage.rst
+2 -1 llvm/docs/LangRef.rst
+6 -2 2 files

LLVM/project 8146275 libc/shared/math fminf16.h fminf128.h, libc/src/__support/math CMakeLists.txt fminf128.h

[libc][math] Refactor fmin family to header-only (#194549)

Refactors the fmin math family to be header-only.

part of: #147386

Target Functions:
  - fmin
  - fminbf16
  - fminf
  - fminf128
  - fminf16
  - fminl
DeltaFile
+88-3utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+65-0libc/src/__support/math/CMakeLists.txt
+31-0libc/src/__support/math/fminf128.h
+31-0libc/src/__support/math/fminf16.h
+29-0libc/shared/math/fminf16.h
+29-0libc/shared/math/fminf128.h
+273-320 files not shown
+519-4926 files

LLVM/project 0ad22a2 llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU hsa-gfx12-v4-user-sgpr-err.s hsa-gfx125x-v4-user-sgpr-err.s

Update error msg and code format
Delta File
+3 -1 llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+1 -2 llvm/test/MC/AMDGPU/hsa-gfx12-v4-user-sgpr-err.s
+1 -2 llvm/test/MC/AMDGPU/hsa-gfx125x-v4-user-sgpr-err.s
+5 -5 3 files

LLVM/project 4c092f6 clang/lib/AST/ByteCode Compiler.h, clang/test/AST/ByteCode records.cpp

[clang][bytecode] Add a missing fallthrough() call (#194537)

When the local variable is enabled but we don't emit any dtor
instructions, we need to fall through to the `EndLabel`.
Delta File
+8 -1 clang/test/AST/ByteCode/records.cpp
+1 -0 clang/lib/AST/ByteCode/Compiler.h
+9 -1 2 files

LLVM/project 7b49e58 libc/src/__support/math CMakeLists.txt fminimumf128.h, libc/src/math/generic CMakeLists.txt

[libc][math] Refactor fmaximum-fminimum family to header-only (#194533)

Refactors the fmaximum-fminimum math family to be header-only.

part of: #147386

Target Functions:
  - fmaximum
  - fmaximumbf16
  - fmaximumf
  - fmaximumf128
  - fmaximumf16
  - fmaximuml
  - fminimum
  - fminimumbf16
  - fminimumf
  - fminimumf128
  - fminimumf16
  - fminimuml
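`fminimum`/`fmaximum` differ from `fmin`/`fmax` in how they treat NaN and signed zero: following C23 (IEEE 754-2019 `minimum`), `fminimum` returns NaN if either operand is NaN and orders `-0.0` below `+0.0`, whereas `fmin` ignores a single NaN operand. A minimal C++ sketch of the `double` case, assuming those semantics; `fminimum_sketch` is a hypothetical name, not LLVM libc's code, and `fmaximum` would mirror it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical sketch (not LLVM libc's code) of C23 fminimum for double.
inline double fminimum_sketch(double x, double y) {
  // Unlike fmin, a NaN in either operand produces a (quiet) NaN result.
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<double>::quiet_NaN();
  // -0.0 must be treated as less than +0.0 (fmin leaves this unspecified).
  if (x == y)
    return std::signbit(x) ? x : y;
  return x < y ? x : y;
}
```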
DeltaFile
+176-6utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+130-0libc/src/__support/math/CMakeLists.txt
+12-38libc/src/math/generic/CMakeLists.txt
+31-0libc/src/__support/math/fminimumf128.h
+31-0libc/src/__support/math/fmaximumf128.h
+31-0libc/src/__support/math/fminimumf16.h
+411-4438 files not shown
+1,065-10144 files

LLVM/project 50a6d97 clang/lib/AST/ByteCode Interp.cpp, clang/test/AST/ByteCode openmp.cpp

[clang][bytecode] Reject more constexpr-unknown pointers in CheckStore (#194529)

Even in constant contexts.
Delta File
+8 -1 clang/test/AST/ByteCode/openmp.cpp
+1 -1 clang/lib/AST/ByteCode/Interp.cpp
+9 -2 2 files

LLVM/project fb2c65d mlir/lib/Conversion/TosaToLinalg TosaToLinalgPass.cpp, mlir/test/Conversion/TosaToLinalg tosa-to-linalg-pipeline-no-validation.mlir

[mlir][tosa] Make tosa-attach-target optional in addTosaToLinalgPasses (#193467)

addTosaToLinalgPasses unconditionally scheduled tosa-attach-target, thus
adding a `tosa.target_env` attribute to the module. Callers therefore
had no way to opt out of this attribute. The attribute is consumed if
validationOptions is enabled, which is optional. Therefore, if the
caller disables validationOptions, the `tosa.target_env` attribute added
by tosa-attach-target remains even after TosaToLinalg, so consumers that
don't load the TOSA dialect can't even parse the resulting module.

This PR makes sure that we schedule tosa-attach-target only when the
caller actually needs it, i.e. when validationOptions or
attachTargetOptions is set. The default values stay inside the
`!attachTargetOptions` branch so callers that set validationOptions
still get a target env to validate against, preserving existing
behavior.
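The scheduling condition described above can be summarized with a small C++ model. Everything in it is hypothetical: `ValidationOptions`, `AttachTargetOptions`, and `tosa_prefix_passes` stand in for the real MLIR option structs and pipeline code, and the sketch only records pass names instead of adding passes to a pass manager.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real option structs.
struct ValidationOptions {};
struct AttachTargetOptions {};

// Returns the names of the passes this model would schedule ahead of the
// TOSA-to-Linalg conversion itself.
inline std::vector<std::string> tosa_prefix_passes(
    const std::optional<ValidationOptions>& validation,
    const std::optional<AttachTargetOptions>& attachTarget) {
  std::vector<std::string> passes;
  // Attach a target env only when something needs it: explicit target
  // options, or validation (which validates against the target env).
  if (validation || attachTarget) {
    // If attachTarget is empty, the real code picks default target options
    // inside its `!attachTargetOptions` branch; this model just records the pass.
    passes.push_back("tosa-attach-target");
  }
  if (validation) passes.push_back("tosa-validate");
  return passes;
}
```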

Also add a `validation` pipeline option (default `true`) to the
registered `tosa-to-linalg-pipeline`, so it can be invoked without
scheduling `tosa-attach-target` or `tosa-validate`. A lit test is
also added.
Delta File
+33 -12 mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp
+14 -0 mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-pipeline-no-validation.mlir
+47 -12 2 files