LLVM/project 1788345 llvm/test/CodeGen/AMDGPU memmove-param-combinations.ll, llvm/test/MC/AMDGPU gfx10_unsupported.s gfx7_unsupported.s

Merge remote-tracking branch 'upstream/main' into rewrite-hlsl-intrinsics-to-tablegen
Delta  File
+2,210 -1,106  llvm/test/MC/AMDGPU/gfx10_unsupported.s
+863 -863  llvm/test/MC/AMDGPU/gfx7_unsupported.s
+601 -1,016  llvm/test/CodeGen/AMDGPU/memmove-param-combinations.ll
+1,185 -397  llvm/test/MC/AMDGPU/gfx950_asm_features.s
+691 -691  llvm/test/MC/AMDGPU/gfx11_unsupported.s
+613 -613  llvm/test/MC/AMDGPU/gfx8_unsupported.s
+6,163 -4,686  2,156 files not shown
+62,177 -31,890  2,162 files

LLVM/project 502b5e0 llvm/lib/Transforms/Instrumentation MemProfUse.cpp, llvm/test/Transforms/PGOProfile memprof-inline-call-stacks.ll

[MemProf] Dump inline call stacks as optimization remarks (#188678)

This patch teaches the MemProf matching pass to dump inline call
stacks as analysis remarks like so:

frame: 704e4117e6a62739 main:10:5
frame: 273929e54b9f1234 foo:2:12
inline call stack: 704e4117e6a62739,273929e54b9f1234

The output consists of two types of remarks:

- "frame": Acts as a dictionary mapping a unique MD5-based FrameID
  to source information (function name, line offset, and column).

- "inline call stack": Provides the full call stack for a call site
  as a sequence of FrameIDs.

Both types of remarks are deduplicated to reduce the output size.

This patch is intended to be a debugging aid.
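To make the two remark kinds concrete, here is a minimal Python sketch that stitches them back together: it builds the FrameID dictionary from the "frame" remarks and resolves each "inline call stack" into source locations. It assumes the remark text looks exactly like the example above (one remark per line, `frame: <id> <function>:<line offset>:<column>`); the helper `resolve_call_stacks` is hypothetical and not part of MemProfUse.cpp.

```python
def resolve_call_stacks(lines):
    """Map each 'inline call stack' remark to a list of (function, line offset, column)."""
    frames = {}  # FrameID -> (function, line offset, column)
    stacks = []  # each entry: FrameIDs in the order they were printed
    for raw in lines:
        line = raw.strip()
        if line.startswith("frame: "):
            frame_id, location = line[len("frame: "):].split(" ", 1)
            # rsplit keeps qualified names such as "ns::foo" intact.
            function, line_offset, column = location.rsplit(":", 2)
            frames[frame_id] = (function, int(line_offset), int(column))
        elif line.startswith("inline call stack: "):
            stacks.append(line[len("inline call stack: "):].split(","))
    # Resolve only after the full scan: deduplicated frames are printed once
    # and may be referenced by many call stacks.
    return [[frames[frame_id] for frame_id in stack] for stack in stacks]


remarks = [
    "frame: 704e4117e6a62739 main:10:5",
    "frame: 273929e54b9f1234 foo:2:12",
    "inline call stack: 704e4117e6a62739,273929e54b9f1234",
]
for stack in resolve_call_stacks(remarks):
    print(", ".join(f"{fn}:{off}:{col}" for fn, off, col in stack))
```

Because the dictionary is filled before any stack is resolved, the sketch does not depend on "frame" remarks preceding the stacks that use them.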
Delta  File
+65 -2  llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+38 -0  llvm/test/Transforms/PGOProfile/memprof-inline-call-stacks.ll
+103 -2  2 files

LLVM/project 4537293 llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Match fract from compare and select and minimum

Implementing this with any of the minnum variants overconstrains what
the use actually needs. Existing patterns use fmin, then have to manually
clamp NaN inputs to get NaN-propagating behavior. It's cleaner to express
this with a NaN-propagating operation to start with.
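The difference between the two clamps is easiest to see side by side. Below is a minimal Python sketch, assuming double precision as a stand-in for f32: `minnum` models the NaN-avoiding fmin/`llvm.minnum` (a NaN operand is ignored) and `minimum` models the NaN-propagating `llvm.minimum`. The helper names and the upper bound `0x1.fffffep-1` (the largest f32 value below 1.0) are illustrative assumptions, not code taken from AMDGPUCodeGenPrepare.cpp.

```python
import math

ONE_MINUS_ULP = float.fromhex("0x1.fffffep-1")  # largest float32 below 1.0


def ieee_floor(x: float) -> float:
    # math.floor raises on NaN/inf; IEEE floor returns them (and +/-0) unchanged.
    return x if not math.isfinite(x) or x == 0.0 else float(math.floor(x))


def minnum(a: float, b: float) -> float:
    # NaN-avoiding minimum (C fmin / llvm.minnum): a NaN operand is ignored.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def minimum(a: float, b: float) -> float:
    # NaN-propagating minimum (llvm.minimum): any NaN operand yields NaN.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


def fract_with_minnum(x: float) -> float:
    r = minnum(x - ieee_floor(x), ONE_MINUS_ULP)
    r = x if math.isnan(x) else r  # manual select to restore NaN propagation
    return 0.0 if math.isinf(x) else r


def fract_with_minimum(x: float) -> float:
    r = minimum(x - ieee_floor(x), ONE_MINUS_ULP)  # NaN flows through on its own
    return 0.0 if math.isinf(x) else r


for x in (2.75, -0.25, math.inf, math.nan):
    print(x, fract_with_minnum(x), fract_with_minimum(x))
```

With `minimum`, a NaN input already produces NaN, so only the infinity select remains; the minnum-based form has to carry the extra NaN select to get the same result.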
Delta  File
+197 -264  llvm/test/CodeGen/AMDGPU/fract-match.ll
+124 -85  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+321 -349  2 files

LLVM/project 0cfea9c llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Match fract pattern with swapped edge case check

A fract implementation can equivalently be written as
  r = fmin(x - floor(x));
  r = isnan(x) ? x : r;
  r = isinf(x) ? 0.0 : r;

or:
  r = fmin(x - floor(x));
  r = isinf(x) ? 0.0 : r;
  r = isnan(x) ? x : r;

Previously, only the first form was matched. Also match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., moving the clamp of
infinity to the input), but I haven't encountered them in the wild.

The existing code seems to be trying too hard to match noncanonical
variants of the pattern. Only handle the form that instcombine
produces for all 4 permutations of compare and select.
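Because `isnan(x)` and `isinf(x)` can never both be true for the same `x`, the two selects commute, which is why both orderings compute the same function. A minimal Python sketch that checks this numerically, assuming double precision stands in for f32 and `fmin` follows C semantics (a NaN operand is ignored). The pseudocode above leaves out fmin's second operand; the bound `0x1.fffffep-1` (the largest f32 below 1.0) used here is an assumption for illustration.

```python
import math

ONE_MINUS_ULP = float.fromhex("0x1.fffffep-1")  # largest float32 below 1.0


def ieee_floor(x: float) -> float:
    # math.floor raises on NaN/inf; IEEE floor returns them (and +/-0) unchanged.
    return x if not math.isfinite(x) or x == 0.0 else float(math.floor(x))


def fmin(a: float, b: float) -> float:
    # C fmin semantics: a NaN operand is ignored.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def fract_nan_then_inf(x: float) -> float:
    r = fmin(x - ieee_floor(x), ONE_MINUS_ULP)
    r = x if math.isnan(x) else r    # NaN check is the inner clamp
    r = 0.0 if math.isinf(x) else r
    return r


def fract_inf_then_nan(x: float) -> float:
    r = fmin(x - ieee_floor(x), ONE_MINUS_ULP)
    r = 0.0 if math.isinf(x) else r  # infinity check is the inner clamp
    r = x if math.isnan(x) else r
    return r


def same(a: float, b: float) -> bool:
    # Treat two NaNs as equal for the comparison.
    return (math.isnan(a) and math.isnan(b)) or a == b


inputs = [0.0, -0.0, 2.75, -0.25, -1e-20, 1e30, math.inf, -math.inf, math.nan]
print(all(same(fract_nan_then_inf(x), fract_inf_then_nan(x)) for x in inputs))
```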
Delta  File
+328 -349  llvm/test/CodeGen/AMDGPU/fract-match.ll
+47 -17  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+375 -366  2 files

LLVM/project 28f24b5 llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Add baseline tests for more fract patterns (#189092)
Delta  File
+2,235 -0  llvm/test/CodeGen/AMDGPU/fract-match.ll
+2,235 -0  1 file

LLVM/project 871d675 compiler-rt/lib/profile CMakeLists.txt

[compiler-rt] Add PTX feature specifically when CUDA is not available (#189083)

Summary:
People need to be able to build this without a CUDA installation.

Long term, we should bump up the minimum version, as I'm pretty sure every
architecture before this has been deprecated by NVIDIA.
Delta  File
+2 -0  compiler-rt/lib/profile/CMakeLists.txt
+2 -0  1 file

LLVM/project df6d6c9 compiler-rt/lib/scudo/standalone/tests combined_test.cpp

[Scudo] Disable ScudoCombinedTests.NewType (#189070)

This is failing in some configurations on AArch64 Linux. Given that there are
a lot of follow-up commits that make this hard to revert, just disable
it for now pending further investigation.
Delta  File
+1 -1  compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
+1 -1  1 file

LLVM/project ba44df4 clang/tools/clang-format git-clang-format

[clang-format] Add pre-commit CI env var support to git-clang-format (#188816)

When git-clang-format is invoked with no explicit commit arguments and
both PRE_COMMIT_FROM_REF and PRE_COMMIT_TO_REF are set, the script
automatically uses those refs as the diff range and implies --diff. If
the variables are absent, existing behavior is fully preserved.

This allows projects to use `git-clang-format` directly inside CI
pipelines via the [pre-commit](https://pre-commit.com/) framework
without any wrapper scripts or extra configuration.
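The selection rule described above fits in a small function. This is a hypothetical Python sketch (git-clang-format is itself a Python script, but `resolve_diff_range` is not its actual code); it assumes an empty environment variable counts as unset.

```python
import os
from typing import Mapping, Sequence


def resolve_diff_range(
    cli_commits: Sequence[str], env: Mapping[str, str] = os.environ
) -> tuple[list[str], bool]:
    """Return (commits to diff, whether --diff is implied).

    Hypothetical helper: explicit CLI commits always win; otherwise both
    PRE_COMMIT_FROM_REF and PRE_COMMIT_TO_REF must be set (non-empty).
    """
    if cli_commits:
        return list(cli_commits), False  # existing behavior, nothing implied
    from_ref = env.get("PRE_COMMIT_FROM_REF")
    to_ref = env.get("PRE_COMMIT_TO_REF")
    if from_ref and to_ref:
        return [from_ref, to_ref], True  # two-commit diff mode implies --diff
    return [], False  # variables absent: fall back to the default behavior


print(resolve_diff_range([], {"PRE_COMMIT_FROM_REF": "origin/main",
                              "PRE_COMMIT_TO_REF": "HEAD"}))
```

Checking the CLI arguments first is what makes explicit commits override the environment, matching the manual verification described in the message.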


Closes: #188813

No existing lit test suite for this script. Verified manually that env
vars activate two-commit diff mode, existing behavior is preserved
without them, and explicit CLI args always override them.
Delta  File
+15 -0  clang/tools/clang-format/git-clang-format
+15 -0  1 file

LLVM/project 354f742 clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage UnsafeBufferUsage.cpp

Update clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp

Replace "const char * const" with "llvm::StringLiteral"

Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
Delta  File
+1 -3  clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp
+1 -3  1 file

LLVM/project 1611a23 offload/test/offloading back2back_distribute.c bug49021.cpp, openmp/device/src Synchronization.cpp

[OFFLOAD] Add spirv implementation for named barrier (#180393)

This change adds an implementation of named barriers for the SPIRV backend.
Since SPIRV has no built-in API or intrinsics for named barriers,
the implementation loosely follows the AMD implementation.
Delta  File
+22 -9  openmp/device/src/Synchronization.cpp
+2 -1  offload/test/offloading/back2back_distribute.c
+2 -1  offload/test/offloading/bug49021.cpp
+2 -1  offload/test/offloading/atomic-compare-signedness.c
+2 -1  offload/test/offloading/bug51781.c
+2 -1  offload/test/offloading/bug51982.c
+32 -14  82 files not shown
+56 -96  88 files

LLVM/project de65a73 llvm/lib/Analysis ValueTracking.cpp

Rename function to show nan doesn't matter
Delta  File
+4 -4  llvm/lib/Analysis/ValueTracking.cpp
+4 -4  1 file

LLVM/project f8a2e0e clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage UnsafeBufferUsage.cpp

Update clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp

Remove "#include SSAFForceLinker.h"

Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
Delta  File
+0 -1  clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp
+0 -1  1 file

LLVM/project 8420612 clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage UnsafeBufferUsage.cpp

Update clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp

Adjust the position of the file title

Co-authored-by: Balázs Benics <benicsbalazs at gmail.com>
Delta  File
+1 -1  clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsage.cpp
+1 -1  1 file

LLVM/project a609bff llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Add baseline tests for more fract patterns
Delta  File
+2,235 -0  llvm/test/CodeGen/AMDGPU/fract-match.ll
+2,235 -0  1 file

LLVM/project 3c625a1 llvm/test/MC/AMDGPU gfx10_unsupported.s gfx7_unsupported.s

[AMDGPU][MC] Improving assembler error message for unsupported instructions (#185778)

The updated error message shows both the instruction name and the GPU
target name.
Delta  File
+2,210 -1,106  llvm/test/MC/AMDGPU/gfx10_unsupported.s
+863 -863  llvm/test/MC/AMDGPU/gfx7_unsupported.s
+1,185 -397  llvm/test/MC/AMDGPU/gfx950_asm_features.s
+691 -691  llvm/test/MC/AMDGPU/gfx11_unsupported.s
+613 -613  llvm/test/MC/AMDGPU/gfx8_unsupported.s
+376 -376  llvm/test/MC/AMDGPU/gfx1250_asm_wmma_w32.s
+5,938 -4,046  52 files not shown
+10,005 -7,666  58 files

LLVM/project 7b5c33d flang/lib/Optimizer/OpenMP LowerWorkdistribute.cpp, mlir/include/mlir/Dialect/OpenMP OpenMPClauses.td OpenMPOps.td

[mlir][OpenMP] Add iterator support to depend clause

Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.

Assisted with Copilot
Delta  File
+107 -58  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+35 -2  mlir/test/Dialect/OpenMP/ops.mlir
+24 -4  mlir/test/Dialect/OpenMP/invalid.mlir
+11 -5  mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+3 -3  mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+4 -0  flang/lib/Optimizer/OpenMP/LowerWorkdistribute.cpp
+184 -72  1 file not shown
+186 -72  7 files

LLVM/project 07f63da clang/test/SemaHLSL Texture2D-mips-errors.hlsl

[HLSL] Fix up Texture2D-mips-errors test

The Texture2D-mips-errors test was supposed to test for an error when the mips
types are used as templates. It was initially disabled because of a
crash. On further investigation, the crash was related to int2(0,0), and
not the mips type.

Follow-up issue for the int2(0,0) crash: #189086

Fixes #188556
Delta  File
+5 -7  clang/test/SemaHLSL/Texture2D-mips-errors.hlsl
+5 -7  1 file

LLVM/project 55f15ad clang/lib/Headers/hlsl hlsl_alias_intrinsics.h, clang/test/CodeGenHLSL/builtins fma.hlsl

Merge branch 'main' into users/amehsan/weakc-nsw
Delta  File
+0 -220  llvm/test/CodeGen/AMDGPU/frame-index-disjoint-s-or-b32.ll
+0 -161  llvm/test/CodeGen/AMDGPU/eliminate-frame-index-scalar-bit-ops.mir
+138 -0  clang/test/CodeGenHLSL/builtins/fma.hlsl
+113 -0  clang/test/SemaHLSL/BuiltIns/fma-errors.hlsl
+54 -0  clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+53 -0  llvm/test/CodeGen/DirectX/fma.ll
+358 -381  11 files not shown
+481 -415  17 files

LLVM/project 509f181 flang/test/Transforms debug-imported-entity.fir, mlir/test/Dialect/LLVMIR bytecode.mlir

[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065)

When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.

Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
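The ambiguity and the fix can be modeled with a toy `key = value` syntax. In this hypothetical Python sketch (not the code AttrOrTypeFormatGen.cpp generates), a schema of `(key, is_array)` pairs plays the role of the struct() parameter list. Without brackets, the commas inside `dims = 1, 2, rank = 3` look just like the separators between pairs, so a non-last array is bracketed, while the last array keeps its bare form and simply consumes the remaining elements.

```python
import re

# Hypothetical schema: ordered (key, is_array) pairs, standing in for the
# parameters listed in a struct() directive.
Schema = list[tuple[str, bool]]


def print_struct(params: list[tuple[str, object]]) -> str:
    parts = []
    for i, (key, value) in enumerate(params):
        if isinstance(value, list):
            body = ", ".join(str(v) for v in value)
            if i != len(params) - 1:
                # Non-last array: bracket it so its commas cannot be
                # confused with the commas separating key-value pairs.
                body = f"[{body}]"
        else:
            body = str(value)
        parts.append(f"{key} = {body}")
    return ", ".join(parts)


def parse_struct(text: str, schema: Schema) -> list[tuple[str, object]]:
    toks = re.findall(r"[A-Za-z_]\w*|-?\d+|[=,\[\]]", text)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(toks) or (expected is not None and toks[pos] != expected):
            raise ValueError(f"expected {expected!r} at token {pos}")
        pos += 1
        return toks[pos - 1]

    result = []
    for i, (key, is_array) in enumerate(schema):
        if i:
            take(",")
        take(key)
        take("=")
        if not is_array:
            result.append((key, int(take())))
        elif i != len(schema) - 1:
            take("[")  # analogue of parseLSquare()
            values = []
            while pos < len(toks) and toks[pos] != "]":
                if values:
                    take(",")
                values.append(int(take()))
            take("]")  # analogue of parseRSquare()
            result.append((key, values))
        else:
            # Last array: every remaining token belongs to it.
            values = [int(t) for t in toks[pos:] if t != ","]
            pos = len(toks)
            result.append((key, values))
    if pos != len(toks):
        raise ValueError("trailing tokens")
    return result


params = [("dims", [1, 2]), ("rank", 3), ("tail", [4, 5])]
schema = [("dims", True), ("rank", False), ("tail", True)]
text = print_struct(params)
print(text)
print(parse_struct(text, schema) == params)
```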

Fixes #156623

Assisted-by: Claude Code
Delta  File
+62 -9  mlir/tools/mlir-tblgen/AttrOrTypeFormatGen.cpp
+68 -0  mlir/test/mlir-tblgen/attr-or-type-format.td
+33 -0  mlir/test/lib/Dialect/Test/TestAttrDefs.td
+18 -1  mlir/test/mlir-tblgen/attr-or-type-format-roundtrip.mlir
+1 -1  mlir/test/Dialect/LLVMIR/bytecode.mlir
+1 -1  flang/test/Transforms/debug-imported-entity.fir
+183 -12  2 files not shown
+185 -14  8 files

LLVM/project 0760a72 flang/lib/Optimizer/OpenMP LowerWorkdistribute.cpp, mlir/include/mlir/Dialect/OpenMP OpenMPClauses.td OpenMPOps.td

[mlir][OpenMP] Add iterator support to depend clause

Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.
Delta  File
+102 -50  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+35 -2  mlir/test/Dialect/OpenMP/ops.mlir
+11 -5  mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td
+6 -6  mlir/test/Dialect/OpenMP/invalid.mlir
+3 -3  mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+4 -0  flang/lib/Optimizer/OpenMP/LowerWorkdistribute.cpp
+161 -66  1 file not shown
+163 -66  7 files

LLVM/project a996f2a llvm/lib/Target/AMDGPU SIRegisterInfo.cpp, llvm/test/CodeGen/AMDGPU frame-index-disjoint-s-or-b32.ll eliminate-frame-index-scalar-bit-ops.mir

Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074)

Reverts llvm/llvm-project#102345

unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403
Delta  File
+0 -220  llvm/test/CodeGen/AMDGPU/frame-index-disjoint-s-or-b32.ll
+0 -161  llvm/test/CodeGen/AMDGPU/eliminate-frame-index-scalar-bit-ops.mir
+2 -6  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2 -387  3 files

LLVM/project 3405dc4 libclc/clc/lib/generic/math clc_fract.inc

libclc: Simplify fract implementation

This is NaN-propagating, so it's unnatural to implement it
in terms of the NaN-avoiding fmin. Implement it with compare and
select, which is the least constrained way to implement the clamp.
Delta  File
+2 -2  libclc/clc/lib/generic/math/clc_fract.inc
+2 -2  1 file

LLVM/project ac1863e llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Match fract from compare and select and minimum

Implementing this with any of the minnum variants overconstrains what
the use actually needs. Existing patterns use fmin, then have to manually
clamp NaN inputs to get NaN-propagating behavior. It's cleaner to express
this with a NaN-propagating operation to start with.
Delta  File
+780 -30  llvm/test/CodeGen/AMDGPU/fract-match.ll
+124 -85  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+904 -115  2 files

LLVM/project 6f23cbd llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor nofpclass-fmul.ll

ValueTracking: x - floor(x) cannot introduce overflow

This returns a value with an absolute value less than 1, so it
should be possible to propagate no-infs.
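The claim can be spot-checked numerically. This Python sketch (double precision, hypothetical helper names) confirms that for finite inputs `x - floor(x)` stays finite and within [0, 1]. Note that the subtraction can round up to exactly 1.0 for tiny negative inputs (for example `x = -1e-20`); that value is still finite, so the no-infs conclusion is unaffected, and this rounding is one reason fract implementations clamp the result below 1.0.

```python
import math
import random


def ieee_floor(x: float) -> float:
    # math.floor raises on NaN/inf; IEEE floor returns them (and +/-0) unchanged.
    return x if not math.isfinite(x) or x == 0.0 else float(math.floor(x))


def frac_part(x: float) -> float:
    return x - ieee_floor(x)


def check_finite_and_bounded(values) -> bool:
    # For every finite input, x - floor(x) must be finite and lie in [0, 1].
    for x in values:
        if not math.isfinite(x):
            continue
        r = frac_part(x)
        if not (math.isfinite(r) and 0.0 <= r <= 1.0):
            return False
    return True


rng = random.Random(0)  # fixed seed for reproducibility
samples = [0.0, -0.0, 0.5, -0.5, 1e308, -1e308, 5e-324, -5e-324, -1e-20]
samples += [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
print(check_finite_and_bounded(samples))
print(frac_part(-1e-20))  # 1.0: the subtraction rounds up
```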
Delta  File
+42 -0  llvm/test/Transforms/Attributor/nofpclass-fmul.ll
+9 -1  llvm/lib/Analysis/ValueTracking.cpp
+51 -1  2 files

LLVM/project 6587af1 llvm/lib/Target/AMDGPU AMDGPUCodeGenPrepare.cpp, llvm/test/CodeGen/AMDGPU fract-match.ll

AMDGPU: Match fract pattern with swapped edge case check

A fract implementation can equivalently be written as
  r = fmin(x - floor(x));
  r = isnan(x) ? x : r;
  r = isinf(x) ? 0.0 : r;

or:
  r = fmin(x - floor(x));
  r = isinf(x) ? 0.0 : r;
  r = isnan(x) ? x : r;

Previously, only the first form was matched. Also match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., moving the clamp of
infinity to the input), but I haven't encountered them in the wild.

The existing code seems to be trying too hard to match noncanonical
variants of the pattern. Only handle the form that instcombine
produces for all 4 permutations of compare and select.
Delta  File
+1,401 -1  llvm/test/CodeGen/AMDGPU/fract-match.ll
+47 -17  llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+1,448 -18  2 files

LLVM/project ba823d0 llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor/AMDGPU nofpclass-amdgcn-fract.ll

ValueTracking: llvm.amdgcn.fract cannot introduce overflow

This returns a value with an absolute value less than 1.
Delta  File
+26 -0  llvm/test/Transforms/Attributor/AMDGPU/nofpclass-amdgcn-fract.ll
+2 -1  llvm/lib/Analysis/ValueTracking.cpp
+28 -1  2 files

LLVM/project 88bc265 mlir/lib/ExecutionEngine LevelZeroRuntimeWrappers.cpp, mlir/lib/Target/LLVM/XeVM Target.cpp

[XeVM] Use `ocloc` for binary generation. (#188331)

XeVM currently doesn't support native binary generation. This PR enables
ahead-of-time (AOT) compilation of GPU modules to native binaries using
`ocloc`.

Currently, this only works with LevelZeroRuntimeWrappers.
Delta  File
+19 -6  mlir/lib/Target/LLVM/XeVM/Target.cpp
+15 -8  mlir/lib/ExecutionEngine/LevelZeroRuntimeWrappers.cpp
+34 -14  2 files

LLVM/project 34a4fe5 clang/lib/Headers __clang_spirv_libdevice_declares.h

[OFFLOAD] Fix a build break (#189076)

This PR fixes a build break reported after the introduction of SPIRV
function declarations.
Delta  File
+5 -0  clang/lib/Headers/__clang_spirv_libdevice_declares.h
+5 -0  1 file

LLVM/project 42ac467 clang/test/OpenMP target_teams_distribute_parallel_for_simd_schedule_codegen.cpp teams_distribute_parallel_for_simd_schedule_codegen.cpp, libc/AOR_v20.02/math/test/traces sincosf.txt exp.txt

Merge branch 'main' into users/amehsan/weakc-nsw
Delta  File
+0 -31,999  libc/AOR_v20.02/math/test/traces/sincosf.txt
+0 -16,000  libc/AOR_v20.02/math/test/traces/exp.txt
+6,835 -6,798  llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+6,432 -6,562  llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll
+5,294 -4,814  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp
+5,238 -4,758  clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
+23,799 -70,931  9,495 files not shown
+592,132 -389,763  9,501 files

LLVM/project c703ea5 clang/lib/Headers/hlsl hlsl_alias_intrinsics.h, clang/lib/Sema SemaHLSL.cpp

[HLSL][DirectX][SPIRV] Implement the `fma` API (#185304)

This PR adds the `fma` HLSL intrinsic (with support for matrices).
It follows all of the steps from #99117.
Closes #99117.
Delta  File
+138 -0  clang/test/CodeGenHLSL/builtins/fma.hlsl
+113 -0  clang/test/SemaHLSL/BuiltIns/fma-errors.hlsl
+54 -0  clang/lib/Headers/hlsl/hlsl_alias_intrinsics.h
+53 -0  llvm/test/CodeGen/DirectX/fma.ll
+35 -0  clang/lib/Sema/SemaHLSL.cpp
+11 -3  llvm/lib/Target/DirectX/DXILShaderFlags.cpp
+404 -3  4 files not shown
+430 -6  10 files