LLVM/project f5b6e4f libcxx/include CMakeLists.txt, libcxx/include/__locale_dir locale_base_api.h

[libc++][NFC] Remove unused header <__support/xlocale/__nop_locale_mgmt.h> (#194316)
Delta File
+0 -33 libcxx/include/__support/xlocale/__nop_locale_mgmt.h
+3 -0 libcxx/include/__locale_dir/locale_base_api.h
+0 -1 libcxx/include/CMakeLists.txt
+3 -34 3 files

LLVM/project a6cf1aa mlir/lib/Dialect/SPIRV/IR SPIRVOps.cpp, mlir/test/Dialect/SPIRV/IR structure-ops.mlir

[mlir][SPIR-V] Allow SpecConstantComposite constituents to reference other SpecConstantComposites (#193416)

The verifier for spirv.SpecConstantComposite previously assumed all
constituents were spirv.SpecConstant ops, which caused a crash when
referencing nested spirv.SpecConstantComposite ops.

Per the SPIR-V spec (s3.3.7, OpSpecConstantComposite), constituents
"must be the <id>s of other specialization constants, constant
declarations, or an OpUndef", which includes OpSpecConstantComposite.
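
As a rough illustration of the relaxed check, here is a minimal Python sketch. It is not the MLIR verifier: the `symbols` table, the kind strings used as dictionary values, and the function `verify_composite` are all hypothetical names chosen for this example. The point is only that the lookup accepts either kind of op and reports an error for anything else, rather than assuming a single kind.

```python
# Hypothetical mini-model: a symbol table mapping a symbol name to the kind
# of op that defines it. None of this mirrors the real MLIR data structures.
ALLOWED_KINDS = {"spirv.SpecConstant", "spirv.SpecConstantComposite"}


def verify_composite(constituents, symbols):
    """Return None if every constituent is acceptable, else an error string."""
    for name in constituents:
        kind = symbols.get(name)
        if kind is None:
            return f"unknown constituent @{name}"
        if kind not in ALLOWED_KINDS:
            # Report a diagnostic instead of assuming the kind.
            return f"constituent @{name} has unsupported kind {kind}"
    return None


symbols = {
    "sc_a": "spirv.SpecConstant",
    "inner": "spirv.SpecConstantComposite",
    "f": "spirv.func",
}
nested_ok = verify_composite(["sc_a", "inner"], symbols)
bad_kind = verify_composite(["f"], symbols)
```

Here `nested_ok` is `None` because a nested composite is accepted, while `bad_kind` carries an error message.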
Delta File
+51 -0 mlir/test/Dialect/SPIRV/IR/structure-ops.mlir
+20 -6 mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
+71 -6 2 files

LLVM/project a726282 lldb/source/Commands CommandObjectThread.cpp CommandObjectType.cpp, lldb/source/Interpreter CommandInterpreter.cpp

[lldb] Remove full stop from AppendErrorWithFormat format strings (part 2) (#194352)

To fit the style guide:
https://llvm.org/docs/CodingStandards.html#error-and-warning-messages

I found these with:
* Find `(\.AppendErrorWithFormat\(([\s\r\n]+)?"(?:(?:\\.|[^"\\])*))\."`
and replace with `$1"` using Visual Studio Code.
* Putting a call to `validate_diagnostic` in `AppendErrorWithFormat`.
* Manual inspection.

Note that this change *does not* include a call to `validate_diagnostic`
because I do not know what's going to crash on platforms that I haven't
tested on.
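
The find-and-replace step above can be reproduced outside the editor. The following is a minimal Python sketch that applies the same pattern with the standard `re` module; the helper name `strip_full_stop` and the sample input line are made up for illustration.

```python
import re

# The pattern from the commit message, unchanged. Group 1 captures everything
# up to (but not including) the full stop that precedes the closing quote.
PATTERN = re.compile(
    r'(\.AppendErrorWithFormat\(([\s\r\n]+)?"(?:(?:\\.|[^"\\])*))\."'
)


def strip_full_stop(source: str) -> str:
    # Visual Studio Code's `$1"` replacement is written `\1"` in Python.
    return PATTERN.sub(r'\1"', source)


before = 'result.AppendErrorWithFormat("no such thread.");'
after = strip_full_stop(before)
```

The pattern only matches a full stop that sits directly before the closing quote, so a format string ending in `.\n` is left untouched by this step.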
Delta File
+31 -32 lldb/source/Commands/CommandObjectThread.cpp
+13 -14 lldb/source/Commands/CommandObjectType.cpp
+13 -13 lldb/source/Commands/CommandObjectTarget.cpp
+10 -12 lldb/source/Commands/CommandObjectSource.cpp
+7 -7 lldb/source/Interpreter/CommandInterpreter.cpp
+3 -3 lldb/source/Commands/CommandObjectProcess.cpp
+77 -81 6 files not shown
+87 -91 12 files

LLVM/project 711a17d llvm/include/llvm/CodeGenTypes LowLevelType.h, llvm/lib/CodeGen/GlobalISel LegalizerHelper.cpp

[AArch64][GlobalISel] Lower BF16 FPTRUNC (#193941)

When the +bf16 architecture feature is available, this is simple, as we
lower to a standard instruction. When it is not available, we need to
expand to a series of instructions that perform the necessary rounding.
The code to do that is a port of TargetLowering::expandFP_ROUND to GISel,
minus the float64 odd rounding via expandRoundInexactToOdd. f64 will
follow in a follow-up patch.
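
For readers unfamiliar with the software expansion, the following Python sketch shows the widely used integer bit trick for rounding an f32 value to bf16 with round-to-nearest-even. It is an illustration of the general technique, not a transcription of the ported code; the function name `f32_to_bf16_bits` is hypothetical, and the input is assumed to be representable as f32.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Round an f32 value to bf16 (nearest-even); returns the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: keep the top 16 bits and force the quiet bit, so that a NaN
        # with a small payload cannot turn into infinity.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    # Exact ties therefore round toward an even bf16 mantissa.
    lsb = (bits >> 16) & 1
    return (bits + 0x7FFF + lsb) >> 16


one = f32_to_bf16_bits(1.0)
tie_down = f32_to_bf16_bits(1.00390625)  # 1 + 2**-8, tie between 0x3F80 and 0x3F81
tie_up = f32_to_bf16_bits(1.01171875)    # 1 + 2**-7 + 2**-8, tie between 0x3F81 and 0x3F82
```

Both tie cases land on the neighbour with an even low mantissa bit, which is the rounding behaviour the expansion needs to reproduce.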

uitofp and sitofp are currently disabled, so that we can take this one
step at a time and check each part in turn. The LLT fp types attempt to
return true for ieee types without UseExtended for types of the correct
size, always returning false for non-standard types.
Delta File
+77 -39 llvm/test/CodeGen/AArch64/bf16-instructions.ll
+46 -22 llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+47 -6 llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+29 -15 llvm/test/CodeGen/AArch64/bf16-v4-instructions.ll
+32 -3 llvm/include/llvm/CodeGenTypes/LowLevelType.h
+7 -7 llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+238 -92 1 files not shown
+239 -92 7 files

LLVM/project a910a0b mlir/lib/Dialect/X86/Transforms VectorContractToPackedTypeDotProduct.cpp, mlir/test/Dialect/X86 vector-contract-to-packed-type-dotproduct.mlir

[mlir][x86] Fix - Replace `load` with `transfer_read` to support on tensor type. (#194543)

This patch replaces the `vector.load` operation with a
`vector.transfer_read` op, so that the rewrite lowers
`vector.contract` ops to `bf16_avx512_dp`.
Delta File
+105 -0 mlir/test/Dialect/X86/vector-contract-to-packed-type-dotproduct.mlir
+10 -2 mlir/lib/Dialect/X86/Transforms/VectorContractToPackedTypeDotProduct.cpp
+115 -2 2 files

LLVM/project b6ed6b6 llvm/docs AMDGPUUsage.rst

Add operands doc
Delta File
+13 -0 llvm/docs/AMDGPUUsage.rst
+13 -0 1 files

LLVM/project 9edf0e7 flang/include/flang/Optimizer/Analysis ArraySectionAnalyzer.h, flang/lib/Optimizer/Analysis ArraySectionAnalyzer.cpp

[flang] improve array section analysis for WHERE (#194399)

The array section analysis in the HLFIR pass in charge of WHERE lowering
was unable to tell that the LHS and RHS are the same array section when
the base is an assumed shape or when variables are used as indices.

This patch adds an optional callback to the array section
analysis to tell if two SSA values have the same value. This callback
is then implemented to report two SSA values as the same only if
they are the results of equivalent operations with no memory effect (they
may be non-speculatable) and with operands that have the same value
(recursively), or if they are loads from the same variable. The latter is
OK in the context of WHERE RHS/LHS thanks to Fortran 2023 10.1.4, which
guarantees that a variable referenced on both the RHS and LHS cannot be
modified by side effects in the RHS/LHS.
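
The recursive rule described above can be sketched on a toy model. The Python below is an illustration only: the `Value` class, its fields, and `same_value` are hypothetical stand-ins for SSA values and the callback, not the flang API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Value:
    op: str                          # operation name, e.g. "addi" or "load"
    operands: Tuple["Value", ...] = ()
    attrs: Tuple = ()                # e.g. the literal of a constant
    has_memory_effect: bool = False
    variable: Optional[str] = None   # for loads: the variable being read


def same_value(a: Value, b: Value) -> bool:
    """Conservatively decide whether two toy SSA values are equal."""
    if a is b:
        return True
    # Loads of the same variable are treated as equal; this is only valid
    # under the WHERE LHS/RHS guarantee cited in the commit message.
    if a.op == "load" and b.op == "load":
        return a.variable is not None and a.variable == b.variable
    if a.has_memory_effect or b.has_memory_effect:
        return False
    if a.op != b.op or a.attrs != b.attrs or len(a.operands) != len(b.operands):
        return False
    # Equivalent operations whose operands are recursively the same value.
    return all(same_value(x, y) for x, y in zip(a.operands, b.operands))


# Two separately built copies of `i + 1`, as the LHS and RHS subscripts might be.
lhs_index = Value("addi", (Value("load", variable="i"), Value("constant", attrs=(1,))))
rhs_index = Value("addi", (Value("load", variable="i"), Value("constant", attrs=(1,))))
other_index = Value("addi", (Value("load", variable="j"), Value("constant", attrs=(1,))))
```

With this model, `same_value(lhs_index, rhs_index)` holds even though the two subscripts are distinct objects, while `other_index` is rejected because it loads a different variable.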

Assisted by: Claude
Delta File
+143 -0 flang/test/HLFIR/order_assignments/where-equivalent-subscripts.fir
+114 -15 flang/lib/Optimizer/HLFIR/Transforms/ScheduleOrderedAssignments.cpp
+24 -3 flang/include/flang/Optimizer/Analysis/ArraySectionAnalyzer.h
+19 -5 flang/lib/Optimizer/Analysis/ArraySectionAnalyzer.cpp
+300 -23 4 files

LLVM/project 972d2c2 llvm/docs AMDGPUUsage.rst

Comments
Delta File
+4 -9 llvm/docs/AMDGPUUsage.rst
+4 -9 1 files

LLVM/project 20ae894 libc/src/__support/math CMakeLists.txt fminimum_numf128.h, libc/src/math/generic CMakeLists.txt

[libc][math] Refactor fmaximum_num-fminimum_num family to header-only (#194562)

Refactors the fmaximum_num-fminimum_num math family to be header-only.

part of: #147386

Target Functions:
  - fmaximum_num
  - fmaximum_numbf16
  - fmaximum_numf
  - fmaximum_numf128
  - fmaximum_numf16
  - fmaximum_numl
  - fminimum_num
  - fminimum_numbf16
  - fminimum_numf
  - fminimum_numf128
  - fminimum_numf16
  - fminimum_numl
Delta File
+176 -6 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+132 -0 libc/src/__support/math/CMakeLists.txt
+12 -38 libc/src/math/generic/CMakeLists.txt
+31 -0 libc/src/__support/math/fminimum_numf128.h
+31 -0 libc/src/__support/math/fmaximum_numf16.h
+31 -0 libc/src/__support/math/fmaximum_numf128.h
+413 -44 38 files not shown
+1,047 -95 44 files

LLVM/project e867cb0 llvm/docs AMDGPUExecutionSynchronization.rst AMDGPUUsage.rst

[AMDGPU][Doc] Move barrier documentation to a separate document

Create a new "AMDGPU Execution Synchronization" document.
For now, it just documents barriers and their execution model.
Hopefully, over time, we can improve it to document the
programming model of the most common methods of synchronizing execution
of threads (e.g. using memory/spinlock).

I kept the documentation mostly as-is, but I did make some minor changes
to make it flow a bit better as a standalone document. For example,
the fact that barriers work at a wavefront granularity has been moved
to the section about `s_barrier` specifically.
I also moved the note about barrier objects existing within a scope
into the main documentation. As a result, the "target-specific properties"
section has been eliminated.
Delta File
+431 -0 llvm/docs/AMDGPUExecutionSynchronization.rst
+4 -411 llvm/docs/AMDGPUUsage.rst
+4 -0 llvm/docs/UserGuides.rst
+439 -411 3 files

LLVM/project 7ef9330 flang/lib/Optimizer/OpenMP MapInfoFinalization.cpp, flang/test/Transforms omp-map-info-finalization-usm.fir

Revert "[Flang][OpenMP] Clear close on descriptor members for box parents in …"

This reverts commit 1cb85dd46074e4e081995407cae973eee4a1aa38.
Delta File
+0 -49 offload/test/offloading/fortran/usm-box-parent-descriptor-close.f90
+12 -12 flang/test/Transforms/omp-map-info-finalization-usm.fir
+12 -6 flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+24 -67 3 files

LLVM/project 2e7541b llvm/test/CodeGen/AArch64/Atomics aarch64-atomicrmw-v8a.ll aarch64-atomicrmw-rcpc3.ll, llvm/test/CodeGen/AMDGPU regalloc-hoist-spill-live-range-upd.ll

Merge branch 'main' into users/hev/vbitops-tests
Delta File
+326 -4,626 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-setcc.ll
+3,230 -456 llvm/test/CodeGen/WebAssembly/strided-int-mac.ll
+2,870 -0 llvm/test/CodeGen/AMDGPU/regalloc-hoist-spill-live-range-upd.ll
+1,250 -1,305 llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-v8a.ll
+1,250 -1,305 llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-rcpc3.ll
+1,250 -1,305 llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-rcpc.ll
+10,176 -8,997 2,985 files not shown
+109,936 -56,226 2,991 files

LLVM/project 9841f81 llvm/docs LangRef.rst

Apply reviewer suggestions

Co-authored-by: Sameer Sahasrabuddhe <sameer.sahasrabuddhe at amd.com>
Delta File
+9 -9 llvm/docs/LangRef.rst
+9 -9 1 files

LLVM/project f05e531 lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.h ProcessGDBRemote.cpp

fixup! use map typedef
Delta File
+2 -4 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+2 -3 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+4 -7 2 files

LLVM/project bdf7a56 lldb/include/lldb/Utility GDBRemote.h, lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.cpp ProcessGDBRemote.h

[lldb] Override UpdateBreakpointSites in ProcessGDBRemote to use MultiBreakpoint

This concludes the implementation of MultiBreakpoint by actually using
the new packet to batch breakpoint requests.

https://github.com/llvm/llvm-project/pull/192910
Delta File
+192 -0 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+8 -0 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+4 -0 lldb/source/Utility/GDBRemote.cpp
+3 -0 lldb/include/lldb/Utility/GDBRemote.h
+207 -0 4 files

LLVM/project 157fcff lldb/include/lldb/Target Process.h, lldb/source/Target Process.cpp

fixup! fix order of class declaration
Delta File
+14 -12 lldb/include/lldb/Target/Process.h
+2 -3 lldb/source/Target/Process.cpp
+16 -15 2 files

LLVM/project fb4af2a llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 atomic-lock-and-setcc-folded.ll

[X86] lock opt ptr const inconsistencies (#185195)

Resolves: https://github.com/llvm/llvm-project/issues/147280

The linked issue mentions cases of atomic arithmetic followed by a test
whose result can be recovered from flags, but which emit cas loops instead
of a lock + op whose result can be inferred from flags.

There's one fold that solves the issue's code: `lock and` sets ZF on the
result of old & C, so any nonzero comparison against new = old & C can
be answered with ZF. This fold does just that, reducing the comparison to
a != 0 or == 0.

I also decided to refactor `shouldExpandCmpArithRMWInIR` into a
dispatching function and make `getCmpArithCC` just return X86::CondCodes
directly. This deleted the dispatching switch later in the code.

Also I broke out the different cases of `getCmpArithWithCC` into helper
functions for each case (add, sub, and, xor, or, add with overflow, sub

    [3 lines not shown]
Delta File
+159 -74 llvm/lib/Target/X86/X86ISelLowering.cpp
+59 -0 llvm/test/CodeGen/X86/atomic-lock-and-setcc-folded.ll
+218 -74 2 files

LLVM/project 6db8e46 llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 files

LLVM/project 4274239 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 packus.ll

[X86] combineVTRUNCSAT - don't split 128-bit concatenated vectors when folding to PACKSS/US (#194347)

If the VTRUNCS/US node has 128-bit src and dst types, then ensure we
don't split into sub-128-bit vectors - just treat it as padded with
zeros to match VTRUNC behaviour.

Fixes #194344
Delta File
+19 -0 llvm/test/CodeGen/X86/packus.ll
+14 -3 llvm/lib/Target/X86/X86ISelLowering.cpp
+33 -3 2 files

LLVM/project c8a148b lldb/include/lldb/Target Process.h, lldb/source/Target Process.cpp

fixup! fix order of class declaration
Delta File
+14 -12 lldb/include/lldb/Target/Process.h
+2 -3 lldb/source/Target/Process.cpp
+16 -15 2 files

LLVM/project e9ad511 llvm/lib/Transforms/Vectorize VPRecipeBuilder.h

[VPlan] Remove unused PhisToFix member from VPRecipeBuilder. NFC (#194451)

The field is not referenced anywhere; the only PhisToFix in the codebase
is a local variable in VPlanConstruction.cpp.
Delta File
+0 -5 llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+0 -5 1 files

LLVM/project cad0ef5 llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/CodeGen/AMDGPU bpermute-xor-dpp-combine.ll

[AMDGPU] Convert ds_bpermute/wave_shuffle XOR patterns to DPP row_xmask and permlanex16
Delta File
+115 -7 llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+19 -84 llvm/test/CodeGen/AMDGPU/bpermute-xor-dpp-combine.ll
+18 -74 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-bpermute-xor-to-dpp.ll
+11 -38 llvm/test/Transforms/InstCombine/AMDGPU/llvm.amdgcn.wave.shuffle.ll
+163 -203 4 files

LLVM/project e4b6407 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 vector-replicaton-i1-mask.ll avx512-mask-op.ll

[X86] combineKSHIFT - fold kshift(logicop(X,C1),C2) -> logicop(kshift(X,C2),kshift(C1,C2)) (#194343)

Attempt to push KSHIFTs up through logicops in the DAG to expose additional folding.

Requires us to add constant folding handling for KSHIFTL/R instructions as well.
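
The fold relies on shifts distributing over bitwise logic ops. The Python sketch below brute-forces that identity on small masks; `kshiftl`, `kshiftr`, and `fold_holds` are hypothetical helpers that model a mask register as a fixed-width unsigned integer, not LLVM code.

```python
import operator


def kshiftl(x: int, n: int, width: int) -> int:
    # Left shift within a fixed-width mask; bits shifted out are dropped.
    return (x << n) & ((1 << width) - 1)


def kshiftr(x: int, n: int, width: int) -> int:
    # Logical right shift; x is assumed to already fit in `width` bits.
    return x >> n


def fold_holds(width: int = 6) -> bool:
    """Check shift(op(x, c1), n) == op(shift(x, n), shift(c1, n)) exhaustively."""
    for shift in (kshiftl, kshiftr):
        for op in (operator.and_, operator.or_, operator.xor):
            for n in range(width + 1):
                for x in range(1 << width):
                    for c1 in range(1 << width):
                        lhs = shift(op(x, c1), n, width)
                        rhs = op(shift(x, n, width), shift(c1, n, width))
                        if lhs != rhs:
                            return False
    return True


identity_ok = fold_holds()
```

Since `kshift(C1, C2)` has two constant operands, it can be folded to a constant, which is why the commit also needs constant folding for KSHIFTL/R.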

Yak shaving for #193700
Delta File
+43 -43 llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll
+21 -2 llvm/lib/Target/X86/X86ISelLowering.cpp
+4 -12 llvm/test/CodeGen/X86/avx512-mask-op.ll
+5 -9 llvm/test/CodeGen/X86/avx512-insert-extract.ll
+3 -9 llvm/test/CodeGen/X86/avx512-ext.ll
+4 -4 llvm/test/CodeGen/X86/masked_store.ll
+80 -79 6 files

LLVM/project 60e9465 llvm/test/CodeGen/AMDGPU bpermute-xor-dpp-combine.ll, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-bpermute-xor-to-dpp.ll llvm.amdgcn.wave.shuffle.ll

[AMDGPU][NFC] Pre-commit tests for ds_bpermute XOR to DPP/permlane lowering
Delta File
+461 -0 llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-bpermute-xor-to-dpp.ll
+251 -0 llvm/test/Transforms/InstCombine/AMDGPU/llvm.amdgcn.wave.shuffle.ll
+240 -0 llvm/test/CodeGen/AMDGPU/bpermute-xor-dpp-combine.ll
+952 -0 3 files

LLVM/project 61760fd llvm/test/CodeGen/AArch64/GlobalISel legalize-pow.mir legalize-atan2.mir

[AArch64][GlobalISel] Update fp legalization mir tests. NFC (#194561)

This updates a number of the floating point mir legalization tests to
use f types instead of generic s types.
Delta File
+166 -166 llvm/test/CodeGen/AArch64/GlobalISel/legalize-pow.mir
+159 -159 llvm/test/CodeGen/AArch64/GlobalISel/legalize-atan2.mir
+125 -125 llvm/test/CodeGen/AArch64/GlobalISel/legalize-intrinsic-roundeven.mir
+108 -108 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fexp2.mir
+96 -96 llvm/test/CodeGen/AArch64/GlobalISel/legalize-modf.mir
+92 -92 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fma.mir
+746 -746 32 files not shown
+2,456 -2,448 38 files

LLVM/project f299805 compiler-rt/test/cfi/cross-dso/icall diag.cpp

update

Created using spr 1.3.7
Delta File
+1 -1 compiler-rt/test/cfi/cross-dso/icall/diag.cpp
+1 -1 1 files

LLVM/project 9085c76 utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][math][bazel] fix build (#194559)
Delta File
+1 -1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -1 1 files

LLVM/project 347c5d6 llvm/lib/Support Hash.cpp

Hash.cpp: include ErrorHandling.h (#194553)

Hash.cpp uses llvm_unreachable but currently picks up ErrorHandling.h
only transitively through xxhash.h -> ArrayRef.h -> Hashing.h.
Delta File
+1 -0 llvm/lib/Support/Hash.cpp
+1 -0 1 files

LLVM/project a5f8f47 llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 files

LLVM/project 534ba39 llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do (#189016)

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta File
+89 -77 llvm/docs/AMDGPUUsage.rst
+89 -77 1 files