LLVM/project 61760fd llvm/test/CodeGen/AArch64/GlobalISel legalize-pow.mir legalize-atan2.mir

[AArch64][GlobalISel] Update fp legalization mir tests. NFC (#194561)

This updates a number of the floating point mir legalization tests to
use f types instead of generic s types.
Delta File
+166 -166 llvm/test/CodeGen/AArch64/GlobalISel/legalize-pow.mir
+159 -159 llvm/test/CodeGen/AArch64/GlobalISel/legalize-atan2.mir
+125 -125 llvm/test/CodeGen/AArch64/GlobalISel/legalize-intrinsic-roundeven.mir
+108 -108 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fexp2.mir
+96 -96 llvm/test/CodeGen/AArch64/GlobalISel/legalize-modf.mir
+92 -92 llvm/test/CodeGen/AArch64/GlobalISel/legalize-fma.mir
+746 -746 32 files not shown
+2,456 -2,448 38 files

LLVM/project f299805 compiler-rt/test/cfi/cross-dso/icall diag.cpp

update

Created using spr 1.3.7
Delta File
+1 -1 compiler-rt/test/cfi/cross-dso/icall/diag.cpp
+1 -1 1 files

LLVM/project 9085c76 utils/bazel/llvm-project-overlay/libc BUILD.bazel

[libc][math][bazel] fix build (#194559)
Delta File
+1 -1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -1 1 files

LLVM/project 347c5d6 llvm/lib/Support Hash.cpp

Hash.cpp: include ErrorHandling.h (#194553)

Hash.cpp uses llvm_unreachable but currently picks up ErrorHandling.h
only transitively through xxhash.h -> ArrayRef.h -> Hashing.h.
Delta File
+1 -0 llvm/lib/Support/Hash.cpp
+1 -0 1 files

LLVM/project a5f8f47 llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 files

LLVM/project 534ba39 llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do (#189016)

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta File
+89 -77 llvm/docs/AMDGPUUsage.rst
+89 -77 1 files

LLVM/project c857cb2 libc/shared/math dsqrtf128.h, libc/src/__support/math dsqrtf128.h CMakeLists.txt

[libc][math] Refactor dsqrtf128 to header-only (#194552)

part of: #147386
Delta File
+31 -0 libc/src/__support/math/dsqrtf128.h
+29 -0 libc/shared/math/dsqrtf128.h
+11 -1 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+11 -0 libc/src/__support/math/CMakeLists.txt
+2 -4 libc/src/math/generic/dsqrtf128.cpp
+1 -2 libc/src/math/generic/CMakeLists.txt
+85 -7 4 files not shown
+90 -7 10 files

LLVM/project 90372f5 compiler-rt/test/cfi mfcall.cpp cross-dso-diagnostic.cpp, compiler-rt/test/cfi/cross-dso/icall icall-from-dso.cpp icall.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta File
+2 -3 compiler-rt/test/cfi/icall/bad-signature.c
+2 -2 compiler-rt/test/cfi/mfcall.cpp
+1 -1 compiler-rt/test/cfi/cross-dso/icall/icall-from-dso.cpp
+1 -1 compiler-rt/test/cfi/cross-dso-diagnostic.cpp
+1 -1 compiler-rt/test/cfi/cross-dso/icall/icall.cpp
+7 -8 5 files

LLVM/project d48b631 libc/src/__support/FPUtil BasicOperations.h, libc/src/__support/math fmax.h fmaxf16.h

[libc][math] Qualify fmax functions to be constexpr (#194551)
Delta File
+9 -0 libc/test/shared/shared_math_constexpr_test.cpp
+3 -1 libc/src/__support/math/fmax.h
+3 -1 libc/src/__support/math/fmaxf16.h
+3 -1 libc/src/__support/math/fmaxf.h
+1 -1 libc/src/__support/math/fmaxf128.h
+1 -1 libc/src/__support/FPUtil/BasicOperations.h
+20 -5 2 files not shown
+22 -7 8 files

LLVM/project b73875a llvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
Delta File
+13 -9 llvm/docs/LangRef.rst
+13 -9 1 files

LLVM/project 677ebca llvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
Delta File
+89 -77 llvm/docs/AMDGPUUsage.rst
+89 -77 1 files

LLVM/project 4f6b1b0 llvm/docs AMDGPUUsage.rst LangRef.rst

[LangRef][AMDGPU] Specify that syncscope can cause atomic operations to race (#189015)

Targets should be able to specify that the syncscope of atomic operations
influences whether they participate in data races with each other.

For example, in AMDGPU, we want (and already implement) the load in the
following case to be in a data race (i.e., return `undef` according to the
current definition), because there is an atomic store with workgroup syncscope
executing in a different workgroup:

```
; workgroup 0:
  store atomic i32 1, ptr %p syncscope("workgroup") monotonic, align 4

; workgroup 1:
  store atomic i32 2, ptr %p syncscope("workgroup") monotonic, align 4
  load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4
```


    [3 lines not shown]
Delta File
+4 -1 llvm/docs/AMDGPUUsage.rst
+2 -1 llvm/docs/LangRef.rst
+6 -2 2 files

LLVM/project 8146275 libc/shared/math fminf16.h fminf128.h, libc/src/__support/math CMakeLists.txt fminf128.h

[libc][math] Refactor fmin family to header-only (#194549)

Refactors the fmin math family to be header-only.

part of: #147386

Target Functions:
  - fmin
  - fminbf16
  - fminf
  - fminf128
  - fminf16
  - fminl
Delta File
+88 -3 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+65 -0 libc/src/__support/math/CMakeLists.txt
+31 -0 libc/src/__support/math/fminf128.h
+31 -0 libc/src/__support/math/fminf16.h
+29 -0 libc/shared/math/fminf16.h
+29 -0 libc/shared/math/fminf128.h
+273 -3 20 files not shown
+519 -49 26 files

LLVM/project 0ad22a2 llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/test/MC/AMDGPU hsa-gfx12-v4-user-sgpr-err.s hsa-gfx125x-v4-user-sgpr-err.s

Update error msg and code format
Delta File
+3 -1 llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+1 -2 llvm/test/MC/AMDGPU/hsa-gfx12-v4-user-sgpr-err.s
+1 -2 llvm/test/MC/AMDGPU/hsa-gfx125x-v4-user-sgpr-err.s
+5 -5 3 files

LLVM/project 4c092f6 clang/lib/AST/ByteCode Compiler.h, clang/test/AST/ByteCode records.cpp

[clang][bytecode] Add a missing fallthrough() call (#194537)

When the local variable is enabled but we don't emit any dtor
instructions, we need to fall through to the `EndLabel`.
Delta File
+8 -1 clang/test/AST/ByteCode/records.cpp
+1 -0 clang/lib/AST/ByteCode/Compiler.h
+9 -1 2 files

LLVM/project 7b49e58 libc/src/__support/math CMakeLists.txt fminimumf128.h, libc/src/math/generic CMakeLists.txt

[libc][math] Refactor fmaximum-fminimum family to header-only (#194533)

Refactors the fmaximum-fminimum math family to be header-only.

part of: #147386

Target Functions:
  - fmaximum
  - fmaximumbf16
  - fmaximumf
  - fmaximumf128
  - fmaximumf16
  - fmaximuml
  - fminimum
  - fminimumbf16
  - fminimumf
  - fminimumf128
  - fminimumf16
  - fminimuml
Delta File
+176 -6 utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+130 -0 libc/src/__support/math/CMakeLists.txt
+12 -38 libc/src/math/generic/CMakeLists.txt
+31 -0 libc/src/__support/math/fminimumf128.h
+31 -0 libc/src/__support/math/fmaximumf128.h
+31 -0 libc/src/__support/math/fminimumf16.h
+411 -44 38 files not shown
+1,065 -101 44 files

LLVM/project 50a6d97 clang/lib/AST/ByteCode Interp.cpp, clang/test/AST/ByteCode openmp.cpp

[clang][bytecode] Reject more constexpr-unknown pointers in CheckStore (#194529)

Even in constant contexts.
Delta File
+8 -1 clang/test/AST/ByteCode/openmp.cpp
+1 -1 clang/lib/AST/ByteCode/Interp.cpp
+9 -2 2 files

LLVM/project fb2c65d mlir/lib/Conversion/TosaToLinalg TosaToLinalgPass.cpp, mlir/test/Conversion/TosaToLinalg tosa-to-linalg-pipeline-no-validation.mlir

[mlir][tosa] Make tosa-attach-target optional in addTosaToLinalgPasses (#193467)

addTosaToLinalgPasses unconditionally scheduled tosa-attach-target, thus
adding a `tosa.target_env` attribute to the module, and callers had no
way to opt out of that attribute. The attribute is consumed only if
validationOptions is enabled, which is optional. Therefore, if the
caller disables validationOptions, the attribute added by
tosa-attach-target still exists after TosaToLinalg, and consumers that
don't load the TOSA dialect can't even parse the resulting module.

This PR makes sure that we schedule tosa-attach-target only when the
caller actually needs it, i.e. when validationOptions or
attachTargetOptions is set. The default values stay inside the
`!attachTargetOptions` branch so callers that set validationOptions
still get a target env to validate against, preserving existing
behavior.

Also add a `validation` pipeline option (default `true`) to the
registered `tosa-to-linalg-pipeline`, so it can be invoked without
scheduling either `tosa-attach-target` or `tosa-validate`. A lit test is
also added.
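
The scheduling rule described above can be sketched in isolation. This is a simplified illustration, not the code from TosaToLinalgPass.cpp: the option structs, the `PassPlan` result type, the `planTosaPasses` function, and the `"default-profile"` value are all hypothetical stand-ins. It only models which of the two passes would be scheduled for a given combination of options.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for the real option types.
struct AttachTargetOptions { std::string profile; };
struct ValidationOptions { bool strict = false; };

// Hypothetical summary of what the pipeline builder would schedule.
struct PassPlan {
  bool attachTarget = false;  // schedule tosa-attach-target?
  bool validate = false;      // schedule tosa-validate?
  std::string targetProfile;  // target env used, if any
};

inline PassPlan
planTosaPasses(std::optional<AttachTargetOptions> attachTargetOptions,
               std::optional<ValidationOptions> validationOptions) {
  PassPlan plan;
  // Attach a target env only when someone will use it.
  if (validationOptions || attachTargetOptions) {
    if (!attachTargetOptions) {
      // Defaults are applied only in this branch, so callers that ask for
      // validation alone still get a target env to validate against.
      attachTargetOptions = AttachTargetOptions{"default-profile"};
    }
    plan.attachTarget = true;
    plan.targetProfile = attachTargetOptions->profile;
  }
  plan.validate = validationOptions.has_value();
  return plan;
}

// Small self-check: with no options set, nothing TOSA-specific is scheduled.
inline void checkOptOut() {
  PassPlan plan = planTosaPasses(std::nullopt, std::nullopt);
  assert(!plan.attachTarget && !plan.validate);
}
```

With neither option set the module is left without the extra attribute, while passing only validation options keeps the previous behavior of attaching a default target env.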
Delta File
+33 -12 mlir/lib/Conversion/TosaToLinalg/TosaToLinalgPass.cpp
+14 -0 mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-pipeline-no-validation.mlir
+47 -12 2 files

LLVM/project 33b9e74 clang-tools-extra/clang-tidy/bugprone SuspiciousIncludeCheck.cpp

[clang-tidy] Fix UB in SuspiciousIncludeCheck when IgnoredRegex is not set (#194521)

When the "IgnoredRegex" option is not set, `IgnoredRegexString` is
default-constructed, i.e. initialized with a null data pointer. This is
passed to `llvm_regcomp` as the pattern argument, causing a nullptr+0 UB
in regcomp.c (caught by UBSan). Fix by initializing
`IgnoredRegexString` with an empty string literal instead.
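
The difference between a default-constructed string view and one built from an empty literal can be shown with the standard library alone. The sketch below uses `std::string_view` as a stand-in for the string type used in the check (an assumption; the commit does not name the type), and `patternPointerIsSafe` is a hypothetical placeholder for a C API that does pointer arithmetic on its pattern argument.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a C API that computes `pattern + len` and so
// must not be handed a null pointer, even when len is 0.
inline bool patternPointerIsSafe(const char *pattern) {
  return pattern != nullptr;
}

// Default construction: data() is guaranteed to be nullptr, size() is 0.
inline std::string_view defaultConstructed() { return std::string_view(); }

// Construction from an empty literal: size() is still 0, but data() points
// at the literal's terminating NUL, so it is non-null.
inline std::string_view fromEmptyLiteral() { return std::string_view(""); }

// Small self-check: both views are empty, only one has a usable pointer.
inline void checkEmptyViews() {
  assert(defaultConstructed().empty() && fromEmptyLiteral().empty());
  assert(!patternPointerIsSafe(defaultConstructed().data()));
  assert(patternPointerIsSafe(fromEmptyLiteral().data()));
}
```

Both views compare equal and report a size of zero, which is why this kind of bug tends to surface only under a sanitizer.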
Delta File
+1 -1 clang-tools-extra/clang-tidy/bugprone/SuspiciousIncludeCheck.cpp
+1 -1 1 files

LLVM/project c8ff862 llvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp AsmPrinterInlineAsm.cpp, llvm/lib/Target/AArch64 AArch64AsmPrinter.cpp

[CodeGen] Make AsmPrinter::MAI a reference. NFC (#194538)

AsmPrinter::MAI is never null. This became more explicit after
PR #194523 changed TargetMachine::getMCAsmInfo to return a reference
as part of the recent MCAsmInfo/MCTargetOptions refactoring.

Convert the member from const MCAsmInfo * to const MCAsmInfo & and
update all consumers.
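
The shape of such a conversion, and what it means for consumers, can be sketched with toy types. `AsmInfoSketch`, `PrinterWithPointer`, and `PrinterWithReference` are hypothetical names made up for this illustration and do not mirror the real MCAsmInfo or AsmPrinter interfaces.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the info object the printer depends on.
struct AsmInfoSketch {
  std::string commentString = "#";
};

// Before: a pointer member. Nothing in the type says it cannot be null,
// and every consumer writes MAI->...
class PrinterWithPointer {
  const AsmInfoSketch *MAI;

public:
  explicit PrinterWithPointer(const AsmInfoSketch *info) : MAI(info) {
    assert(MAI && "info must not be null");
  }
  std::string comment() const { return MAI->commentString; }
};

// After: a reference member. Non-null is part of the type, and consumers
// switch from MAI->... to MAI....
class PrinterWithReference {
  const AsmInfoSketch &MAI;

public:
  explicit PrinterWithReference(const AsmInfoSketch &info) : MAI(info) {}
  std::string comment() const { return MAI.commentString; }
};
```

Behavior is identical in both versions, which matches the NFC tag; the change only moves the non-null guarantee from a convention into the member's type.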
Delta File
+55 -55 llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+11 -11 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+8 -8 llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
+8 -8 llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
+6 -6 llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+4 -4 llvm/lib/Target/X86/X86AsmPrinter.cpp
+92 -92 28 files not shown
+142 -141 34 files

LLVM/project 4049935 clang/test/Driver modules-driver-import-std.cpp

Fix UNSUPPORTED added to test in #194502. (#194541)

The change in #194502 attempted to mark the test as UNSUPPORTED for
AArch64, but it didn't work because it wasn't specified correctly. This
fixes it.
Delta File
+1 -1 clang/test/Driver/modules-driver-import-std.cpp
+1 -1 1 files

LLVM/project e4196c1 clang/test/CIR/CodeGenHIP target-features.hip

fix comment in test
Delta File
+0 -2 clang/test/CIR/CodeGenHIP/target-features.hip
+0 -2 1 files

LLVM/project 259f40d llvm/include/llvm/Bitcode LLVMBitCodes.h, llvm/include/llvm/Support AMDGPUSummary.h

[RFC][AMDGPU] Add AMDGPU_SUMMARY bitcode block for ThinLTO

With AMDGPU object linking, device functions are compiled separately from the
kernels that call them. Without whole-program visibility, the compiler must be
conservative about occupancy for every device function, leading to suboptimal
resource usage. However, GPU kernels typically carry explicit occupancy control
attributes that constrain the launch environment. ThinLTO is the natural place
to propagate these kernel attributes to callees: the combined module summary
index contains a cross-TU call graph, allowing occupancy information to be
propagated top-down from kernels to all reachable device functions. The backend
can then generate better code with the propagated constraints, achieving
whole-program awareness without the compile-time overhead of full LTO.

This patch introduces a dedicated AMDGPU_SUMMARY bitcode block that serializes
per-function summary data alongside the standard module summary. The block is
scoped to AMDGPU so that non-AMDGPU targets are completely unaffected. A
follow-up patch will add the ThinLTO propagation logic that reads these
summaries and applies conservative attribute bounds to device functions
reachable from multiple kernels.
Delta File
+92 -0 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+82 -0 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+47 -0 llvm/test/ThinLTO/AMDGPU/amdgpu-summary-roundtrip.ll
+46 -0 llvm/include/llvm/Support/AMDGPUSummary.h
+11 -0 llvm/lib/Bitcode/Reader/BitcodeAnalyzer.cpp
+10 -0 llvm/include/llvm/Bitcode/LLVMBitCodes.h
+288 -0 1 files not shown
+293 -0 7 files

LLVM/project ec88858 clang/lib/CIR/CodeGen CIRGenModule.cpp, clang/test/CIR/CodeGenHIP target-features.hip

add comments and update test
Delta File
+35 -0 clang/test/CIR/CodeGenHIP/target-features.hip
+10 -0 clang/lib/CIR/CodeGen/CIRGenModule.cpp
+45 -0 2 files

LLVM/project 3deb42b orc-rt/include/orc-rt/sps-ci AllSPSCI.h, orc-rt/lib/executor/sps-ci AllSPSCI.cpp

[orc-rt] Sink include into implementation file. (#194379)
Delta File
+1 -1 orc-rt/include/orc-rt/sps-ci/AllSPSCI.h
+1 -0 orc-rt/lib/executor/sps-ci/AllSPSCI.cpp
+2 -1 2 files

LLVM/project 549ecfe mlir/include/mlir/Dialect/Vector/Transforms LoweringPatterns.h, mlir/lib/Dialect/Vector/Transforms LowerVectorContract.cpp

experimental expose vector contract lowerings to have multiple options

Signed-off-by: Eric Feng <Eric.Feng at amd.com>
Delta File
+202 -84 mlir/lib/Dialect/Vector/Transforms/LowerVectorContract.cpp
+91 -0 mlir/test/Dialect/Vector/vector-contract-composable-lowering.mlir
+64 -0 mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
+27 -0 mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h
+384 -84 4 files

LLVM/project e6676b0 mlir/include/mlir/Dialect/AMDGPU/Utils Chipset.h, mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp

experimental amdgpu reduction

Signed-off-by: Eric Feng <Eric.Feng at amd.com>
Delta File
+379 -0 mlir/lib/Dialect/AMDGPU/Transforms/VectorReductionToDot.cpp
+323 -0 mlir/test/Dialect/AMDGPU/vector-reduction-to-dot.mlir
+103 -20 mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+0 -53 mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+53 -0 mlir/include/mlir/Dialect/AMDGPU/Utils/Chipset.h
+43 -0 mlir/test/Dialect/AMDGPU/vector-reduction-to-dot-gfx9.mlir
+901 -73 4 files not shown
+938 -73 10 files

LLVM/project 4940171 llvm/unittests/Frontend OpenMPIRBuilderTest.cpp

Try to fix unit tests
Delta File
+35 -0 llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+35 -0 1 files

LLVM/project 0f9dd88 llvm/docs LangRef.rst ReleaseNotes.md, llvm/include/llvm/IR DataLayout.h

[DataLayout] Add null pointer value infrastructure

Add support for specifying the null pointer bit representation per address space
in DataLayout via new pointer spec flags:
- 'z': null pointer is all-zeros
- 'o': null pointer is all-ones

When neither flag is present, the address space inherits the default set by the
new 'N<null-value>' top-level specifier ('Nz' or 'No'). If that is also absent,
the null pointer value is zero.
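
The fallback order described above (per-address-space flag, then the top-level default, then zero) can be sketched as a small lookup. This is an illustration only: `NullValue`, `NullPointerSpec`, and `resolveNullValue` are hypothetical names, not the DataLayout API, and no layout-string parsing is modeled.

```cpp
#include <cassert>
#include <map>
#include <optional>

enum class NullValue { AllZeros, AllOnes };

// Hypothetical, already-parsed view of the null pointer settings.
struct NullPointerSpec {
  // Set when the top-level 'Nz' or 'No' specifier is present.
  std::optional<NullValue> topLevelDefault;
  // Address spaces whose pointer spec carries a 'z' or 'o' flag.
  std::map<unsigned, NullValue> perAddressSpace;
};

inline NullValue resolveNullValue(const NullPointerSpec &spec,
                                  unsigned addrSpace) {
  // 1. An explicit per-address-space flag wins.
  auto it = spec.perAddressSpace.find(addrSpace);
  if (it != spec.perAddressSpace.end())
    return it->second;
  // 2. Otherwise inherit the top-level default; 3. otherwise all-zeros.
  return spec.topLevelDefault.value_or(NullValue::AllZeros);
}

// Small self-check: an empty spec resolves every address space to zero.
inline void checkEmptySpec() {
  NullPointerSpec spec;
  assert(resolveNullValue(spec, 0) == NullValue::AllZeros);
}
```

Because the last fallback is all-zeros, a layout string that uses none of the new specifiers keeps a zero null pointer in every address space.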

No target DataLayout strings are updated in this change. This is pure
infrastructure for a future ConstantPointerNull semantic change to support
targets with non-zero null pointers (e.g. AMDGPU).
Delta File
+136 -1 llvm/unittests/IR/DataLayoutTest.cpp
+61 -6 llvm/lib/IR/DataLayout.cpp
+23 -1 llvm/include/llvm/IR/DataLayout.h
+17 -1 llvm/docs/LangRef.rst
+8 -0 llvm/docs/ReleaseNotes.md
+245 -9 5 files

LLVM/project 922d95a clang/lib/CIR/CodeGen CIRGenBuiltinAMDGPU.cpp, clang/test/CIR/CodeGenHIP builtins-amdgcn.hip

[CIR][AMDGPU] Add lowering for amdgcn div fmas builtins (#194334)

Upstreaming ClangIR PR: https://github.com/llvm/clangir/pull/2051

This PR adds support for lowering the __builtin_amdgcn_div_fmas* AMDGPU
builtins to ClangIR.
The lowering follows the reference clang-to-LLVM-IR lowering in
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp.
Delta File
+16 -0 clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+10 -4 clang/lib/CIR/CodeGen/CIRGenBuiltinAMDGPU.cpp
+26 -4 2 files