LLVM/project 8b02238clang/docs LanguageExtensions.rst, clang/include/clang/Basic BuiltinsAMDGPUDocs.td

[AMDGPU][Clang] add __builtin_amdgcn_av_(load|store)_b128 (#199176)

These builtins allow the program to request store-available and
load-visible accesses as described in #191246. Each of them takes a
__MEMORY_SCOPE_* operand that is then translated to target-specific
cache policy bits.

This patch was extracted from #172090.

Co-authored-by: macurtis-amd <macurtis at amd.com>
Assisted-by: Claude Opus 4.6

---------

Co-authored-by: macurtis-amd <macurtis at amd.com>
DeltaFile
+250-0clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl
+29-0clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+28-0clang/docs/LanguageExtensions.rst
+24-0clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl
+22-0clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl
+20-0clang/test/SemaHIP/amdgpu-av-load-store.hip
+373-09 files not shown
+432-715 files

LLVM/project 54d5646clang/test/SemaCXX attr-section.cpp

[Sema] Add original GH192264 reproducer as a section-conflict regression test (#202276)

Follow up of https://github.com/llvm/llvm-project/pull/200873
DeltaFile
+7-0clang/test/SemaCXX/attr-section.cpp
+7-01 files

LLVM/project 9129363llvm/include/llvm/CodeGen/GlobalISel GIMatchTableExecutorImpl.h

[GlobalISel] Use more inline elements in a match table SmallVector (#202568)

The 4 inline elements for OnFailResumeAt only cover 36.8% (33455 / 90902) of
aarch64-isel executeMatchTable invocations encountered while compiling sqlite3
on aarch64-O0-g.

With 8 inline elements, coverage is 100% (the maximum observed depth was 6).
This gives a small -0.09% CTMark geomean improvement on aarch64-O0-g.
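
The coverage figures above can be read as "the fraction of invocations whose
peak depth fits in the inline storage". A minimal Python sketch of that
calculation follows. The function name and the sample depth list are
hypothetical and are not taken from the patch; only the 33455 / 90902 ratio
comes from the commit message.

```python
def inline_coverage(peak_depths, inline_elements):
    """Fraction of invocations whose peak depth fits in the inline storage.

    An invocation that exceeds `inline_elements` would force the small
    vector to fall back to a heap allocation.
    """
    fits = sum(1 for depth in peak_depths if depth <= inline_elements)
    return fits / len(peak_depths)


# Hypothetical sample of per-invocation peak depths (made up for illustration).
sample_depths = [2, 4, 5, 6, 3, 6, 1, 5]

coverage_with_4 = inline_coverage(sample_depths, 4)
coverage_with_8 = inline_coverage(sample_depths, 8)

# The ratio reported in the commit message, as a percentage with one decimal.
reported_percent = round(33455 / 90902 * 100, 1)
```

Because the maximum observed depth was 6, any inline capacity of 6 or more would give full coverage on this workload; 8 leaves some headroom.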

https://llvm-compile-time-tracker.com/compare.php?from=2de2edb943fe1b83d79bdffa03606eb8c5452e9b&to=8deb4f949b5f80a26a8a61775fb411bf30fefd80&stat=instructions%3Au

Assisted-by: codex
DeltaFile
+1-1llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h
+1-11 files

LLVM/project 4c2061aclang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/CodeGen link-builtin-bitcode.c amdgpu-builtin-is-invocable.c

fix failing tests
DeltaFile
+3-3clang/test/CodeGen/link-builtin-bitcode.c
+2-2clang/test/CodeGenCXX/dynamic-cast-address-space.cpp
+1-1clang/test/CodeGen/amdgpu-builtin-is-invocable.c
+1-1clang/test/CodeGen/amdgpu-builtin-processor-is.c
+1-1clang/include/clang/Basic/BuiltinsAMDGPU.td
+8-85 files

LLVM/project 271c8b7mlir/include/mlir/Dialect/Linalg/IR LinalgEnums.td, mlir/lib/Dialect/Linalg/IR LinalgOps.cpp

[mlir][linalg] Add sin, cos, tan to elementwise operations (#200950)

Add sin, cos, and tan as UnaryFn entries in the linalg dialect, enabling
their use via linalg.elementwise, named ops (linalg.sin, linalg.cos,
linalg.tan), and specialization from linalg.generic.

---------

Co-authored-by: Vinit Deodhar <vinitdeodhar at users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply at anthropic.com>
Co-authored-by: Vinit Deodhar <vdeodhar at ah-vdeodhar-l.dhcp.mathworks.com>
DeltaFile
+31-1mlir/test/Dialect/Linalg/specialize-generic-ops.mlir
+15-0mlir/test/Dialect/Linalg/roundtrip-morphism-linalg-category-ops.mlir
+11-0mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+6-0mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+4-1mlir/include/mlir/Dialect/Linalg/IR/LinalgEnums.td
+67-25 files

LLVM/project adac149llvm/lib/Target/AMDGPU SIFoldOperands.cpp, llvm/test/CodeGen/AMDGPU si-fold-operands-bundle.mir ds_gws_align.ll

[AMDGPU] SIFoldOperands: update BUNDLE header implicit use when folding

When folding an operand inside a BUNDLE, also rewrite the matching
implicit use on the bundle header. LiveVariables iterates a
MachineBasicBlock with the bundle-aware iterator and only inspects the
header, so without this update its kill flags go stale and a later
MachineVerifier run reports "Using a killed virtual register".

Co-Authored-By: Claude Opus 4 <noreply at anthropic.com>
DeltaFile
+61-0llvm/test/CodeGen/AMDGPU/si-fold-operands-bundle.mir
+12-0llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+11-0llvm/test/CodeGen/AMDGPU/ds_gws_align.ll
+84-03 files

LLVM/project 057a1fellvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU async-buffer-loads.ll

[AMDGPU][GISel] Add register bank legalization rules for amdgcn_raw_buffer_load_async_lds (#201406)

Also amdgcn_struct_buffer_load_async_lds,
amdgcn_raw_ptr_buffer_load_async_lds, and
amdgcn_struct_ptr_buffer_load_async_lds.
DeltaFile
+22-1llvm/test/CodeGen/AMDGPU/async-buffer-loads.ll
+15-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+37-12 files

LLVM/project 7581dc5clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode unions.cpp

[clang][bytecode] Fix an assertion failure in visitDtorCall() (#202507)

In `emitDestructionPop()`, we assert that the Descriptor has a
non-trivial dtor. Check this first here so we don't do all this work for
nothing.
DeltaFile
+12-0clang/test/AST/ByteCode/unions.cpp
+5-0clang/lib/AST/ByteCode/Compiler.cpp
+17-02 files

LLVM/project 323285fllvm/test/Transforms/SLPVectorizer/X86 revec-reduced-value-vectorized-later.ll

[SLP] Update test against const-folding (#202532)

223ef1f3 ([IRBuilder] ConstFold unary intrinsics, #200496) made a lot of
test updates to SLPVectorizer. The tests were written a long time ago,
and it is unclear what their intent was, but at least update the one
test to replace constants with arguments, where the intent is clear.
DeltaFile
+40-4llvm/test/Transforms/SLPVectorizer/X86/revec-reduced-value-vectorized-later.ll
+40-41 files

LLVM/project f7e4167llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp SIInstrInfo.h

[AMDGPU] Clamp load_monitor scope to minimum SCOPE_SE (#198245)

The load_monitor instructions monitor L2 cache lines and therefore
require at least SCOPE_SE to ensure the L2 cache is hit. The current
memory model requires the user to ensure that the specified scope is
such that it results in at least SCOPE_SE, otherwise the behaviour is
undefined. Instead, we now clamp the emitted scope at a minimum of
SCOPE_SE, so that the undefined behaviour is converted into a
performance loss instead.
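
The clamping described above amounts to taking the maximum of the requested
scope and SCOPE_SE. A minimal Python sketch, assuming a four-level scope
ordering from narrowest to widest; the enum and its numbering are
illustrative assumptions and are not copied from SIMemoryLegalizer.cpp:

```python
from enum import IntEnum


class Scope(IntEnum):
    # Assumed ordering, narrowest to widest. Only SCOPE_SE is named in the
    # commit message; the other members are illustrative.
    CU = 0
    SE = 1
    DEV = 2
    SYS = 3


def clamp_load_monitor_scope(requested: Scope) -> Scope:
    """Return the scope to emit: never narrower than SE."""
    return max(requested, Scope.SE)
```

A request narrower than SE is widened to SE (possibly costing performance), while SE and wider requests pass through unchanged.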

Assisted-By: Claude Opus 4.6
DeltaFile
+37-3llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+21-0llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+17-0llvm/lib/Target/AMDGPU/SIInstrInfo.h
+3-4llvm/docs/AMDGPUUsage.rst
+78-74 files

LLVM/project ca227bfllvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

[AMDGPU] Produce ballot/icmp/fcmp lane masks at wavefront width (#201358)
DeltaFile
+7-9llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+4-2llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+11-112 files

LLVM/project 9dfcf76llvm/docs ProgrammersManual.rst, llvm/include/llvm/ADT StringMap.h

[StringMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R (#202103)

StringMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with Knuth TAOCP 6.4 Algorithm R deletion,
similar to DenseMap #200595.

erase now relocates the following entries to close the hole. StringMap
buckets are pointers to heap-allocated entries, so only the pointers
(and the parallel hash array) move. References and pointers to entries
remain valid, but iterators are invalidated.
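
For readers unfamiliar with tombstone-free deletion, here is a minimal Python
sketch of linear probing with hole-closing erase in the style of Algorithm R.
It is not the StringMap implementation: the class name, the fixed capacity,
the forward probe direction, and the injected `hash_fn` are all assumptions
made for illustration. As in the description above, the slots hold references
to the stored objects, so erase moves only references.

```python
class LinearProbeSet:
    """Fixed-capacity open-addressing set with two slot states: empty or live."""

    def __init__(self, capacity=8, hash_fn=hash):
        self.slots = [None] * capacity  # None means empty; there are no tombstones
        self.hash_fn = hash_fn
        self.count = 0

    def _home(self, key):
        return self.hash_fn(key) % len(self.slots)

    def _find(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return i
            i = (i + 1) % len(self.slots)
        return -1

    def contains(self, key):
        return self._find(key) >= 0

    def insert(self, key):
        if self._find(key) >= 0:
            return False
        # Keep one slot empty so that every probe sequence terminates.
        if self.count >= len(self.slots) - 1:
            raise OverflowError("table full (this sketch does not grow)")
        i = self._home(key)
        while self.slots[i] is not None:
            i = (i + 1) % len(self.slots)
        self.slots[i] = key
        self.count += 1
        return True

    def erase(self, key):
        i = self._find(key)
        if i < 0:
            return False
        n = len(self.slots)
        j = i
        while True:
            self.slots[i] = None  # open a hole at i
            while True:
                j = (j + 1) % n
                if self.slots[j] is None:
                    self.count -= 1
                    return True
                h = self._home(self.slots[j])
                # The entry at j may stay only if its home slot lies
                # cyclically in (i, j]; otherwise the hole would cut it
                # off from its probe sequence.
                if i <= j:
                    stays = i < h <= j
                else:
                    stays = h > i or h <= j
                if not stays:
                    break
            self.slots[i] = self.slots[j]  # relocate the entry into the hole
            i = j
```

After an erase, later entries of the same probe cluster shift back toward their home slots, so lookups only ever need to distinguish empty from live. The shift is also why positions held before the erase (the analogue of iterators) can no longer be trusted.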

Depends on #202237 and #202520
Aided by Claude Opus 4.8
DeltaFile
+32-47llvm/include/llvm/ADT/StringMap.h
+28-50llvm/lib/Support/StringMap.cpp
+77-0llvm/unittests/ADT/StringMapTest.cpp
+2-2llvm/docs/ProgrammersManual.rst
+1-2llvm/utils/gdb-scripts/prettyprinters.py
+140-1015 files

LLVM/project 85ab773llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine issue173148-sext-phi-select-infloop.ll

[InstCombine] Fix infinite combine loop in evaluateInDifferentType (#202572)

The implementation assumes that all original uses inside visited
instructions would get removed as part of changing the type. However,
this is not true for uses in select conditions, as only the value
operands change type in that case. Bail out if we encounter uses in
select conditions to avoid this.

Fixes https://github.com/llvm/llvm-project/issues/173148.
DeltaFile
+26-0llvm/test/Transforms/InstCombine/issue173148-sext-phi-select-infloop.ll
+12-2llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+38-22 files

LLVM/project c2f1439clang/include/clang/Basic BuiltinsAMDGPU.td, clang/test/SemaOpenCL builtins-amdgcn-global-load-store-target-error.cl

add ArgNames; use target feature "flat-global-insts"
DeltaFile
+8-0llvm/lib/TargetParser/AMDGPUTargetParser.cpp
+5-3clang/include/clang/Basic/BuiltinsAMDGPU.td
+2-2clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl
+15-53 files

LLVM/project 43d1fb1clang/include/clang/Basic BuiltinsAMDGPUDocs.td BuiltinsAMDGPU.td, clang/test/SemaHIP amdgpu-av-load-store.hip

add docs for the builtins; split the target test; add a host/device test
DeltaFile
+29-0clang/include/clang/Basic/BuiltinsAMDGPUDocs.td
+12-14clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl
+20-0clang/test/SemaHIP/amdgpu-av-load-store.hip
+8-2clang/include/clang/Basic/BuiltinsAMDGPU.td
+69-164 files

LLVM/project 9315208clang/docs LanguageExtensions.rst, clang/lib/CodeGen/TargetBuiltins AMDGPU.cpp

[AMDGPU][Clang] add __builtin_amdgcn_av_(load|store)_b128

These builtins allow the program to request store-available and load-visible
accesses as described in #191246. Each of them takes a __MEMORY_SCOPE_* operand
that is then translated to target-specific cache policy bits.

This patch was extracted from #172090.

Co-authored-by: macurtis-amd <macurtis at amd.com>
Assisted-by: Claude Opus 4.6
DeltaFile
+250-0clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl
+28-0clang/docs/LanguageExtensions.rst
+26-0clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl
+22-0clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl
+16-0clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+12-0clang/lib/Sema/SemaAMDGPU.cpp
+354-02 files not shown
+362-08 files

LLVM/project 663fc1eclang/test/CodeGen/LoongArch/lasx builtin-alias.c builtin.c, llvm/test/CodeGen/AMDGPU memintrinsic-unroll.ll

Merge upstream/main into users/mariusz-sikora-at-amd/gfx13/add-atomic-optimization-tests
DeltaFile
+3,563-3,543llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll
+6,598-111llvm/test/CodeGen/X86/clmul-vector.ll
+2,749-2,749clang/test/CodeGen/LoongArch/lasx/builtin-alias.c
+2,745-2,745clang/test/CodeGen/LoongArch/lasx/builtin.c
+3,092-2,392llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+5,294-0llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512vl.s
+24,041-11,5403,354 files not shown
+218,020-89,0953,360 files

LLVM/project 0f8a3b2llvm/lib/Target/AArch64 AArch64.h AArch64TargetMachine.cpp, llvm/lib/Target/AArch64/GISel AArch64PostLegalizerCombiner.cpp

[NewPM][AArch64][GlobalISel] Port AArch64PostLegalizerCombiner to NewPM (#194156)

This is a standard port.

Updates some (but not all) tests to verify the NewPM path is working.
DeltaFile
+156-103llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+21-2llvm/lib/Target/AArch64/AArch64.h
+2-2llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+2-0llvm/lib/Target/AArch64/AArch64PassRegistry.def
+1-0llvm/test/CodeGen/AArch64/GlobalISel/opt-overlapping-and-postlegalize.mir
+1-0llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combine-ptr-add-chain.mir
+183-1074 files not shown
+187-10710 files

LLVM/project a1c1cddllvm/lib/Transforms/Scalar AlignmentFromAssumptions.cpp, llvm/test/Transforms/AlignmentFromAssumptions simple.ll

[AlignmentFromAssumes] Skip huge alignment (#202567)

Fixes https://github.com/llvm/llvm-project/issues/202043
Although `align` does not support huge alignments, the test case below
confirms that huge alignments are allowed in `assume`:
https://github.com/llvm/llvm-project/blob/c4f4206ff3ab97db9577f11bb2dabd40896bcca9/llvm/test/Transforms/InstCombine/assume.ll#L71
AlignmentFromAssumes should therefore skip huge alignments.
DeltaFile
+12-0llvm/test/Transforms/AlignmentFromAssumptions/simple.ll
+3-0llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp
+15-02 files

LLVM/project f744f48llvm/docs AMDGPUAsyncOperations.rst

[AMDGPU] Improve the description of asyncmark semantics

- The semantics of asyncmarks is now defined purely in terms of sequences,
  without referring to the implementation.
- The examples incorrectly used (post)dominance. Fixed that with wording in
  terms of asyncmark sequences.
DeltaFile
+113-51llvm/docs/AMDGPUAsyncOperations.rst
+113-511 files

LLVM/project 11cf295llvm/tools/llvm-exegesis/lib Assembler.h

Revert "[NFC][llvm-exegesis] Disable CFI-icall for JIT-executed function" (#202571)

Reverts llvm/llvm-project#202472
DeltaFile
+1-2llvm/tools/llvm-exegesis/lib/Assembler.h
+1-21 files

LLVM/project 00e3e6fllvm/lib/Target/X86 X86ScheduleC864GM7.td, llvm/test/tools/llvm-mca/X86/C864GM4 resources-x86_64.s

[X86] Hygon Processors Initial enablement (#187622)

This patch adds initial support for several Hygon architectures.

The Hygon architectures include:

- c86-4g-m4
- c86-4g-m6
- c86-4g-m7

This patch includes:

- Added Hygon architectures CPU targets recognition in Clang and LLVM
- Added Hygon architectures to target parser and host CPU detection
- Updated compiler-rt CPU model detection for Hygon architectures
- Added Hygon architectures to various optimizer tests
- Added scheduler models for Hygon architectures CPU targets
DeltaFile
+5,294-0llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512vl.s
+3,721-0llvm/lib/Target/X86/X86ScheduleC864GM7.td
+3,264-0llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512.s
+2,976-0llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512bwvl.s
+2,894-0llvm/test/tools/llvm-mca/X86/C864GM7/resources-x86_64.s
+2,890-0llvm/test/tools/llvm-mca/X86/C864GM4/resources-x86_64.s
+21,039-0125 files not shown
+52,603-0131 files

LLVM/project f704a92clang/lib/CodeGen CGHLSLRuntime.cpp, clang/test/CodeGenHLSL/resources cbuffer.hlsl cbuffer-empty-struct-array.hlsl

Revert "[HLSL] Set visibility of cbuffer global variables to internal" (#202538)

Reverts llvm/llvm-project#200312

Breaks several buildbots, e.g.,
https://lab.llvm.org/buildbot/#/builders/203/builds/48531

Co-authored-by: Nikolas Klauser <nikolasklauser at berlin.de>
DeltaFile
+0-38llvm/test/CodeGen/DirectX/cbuffer_global_elim.ll
+0-36llvm/test/CodeGen/SPIRV/cbuffer_global_elim.ll
+10-14clang/test/CodeGenHLSL/resources/cbuffer.hlsl
+0-17llvm/lib/Frontend/HLSL/CBuffer.cpp
+2-10clang/lib/CodeGen/CGHLSLRuntime.cpp
+4-4clang/test/CodeGenHLSL/resources/cbuffer-empty-struct-array.hlsl
+16-11911 files not shown
+30-13717 files

LLVM/project b942ed2llvm/docs AMDGPUUsage.rst, mlir/include/mlir/Dialect/LLVMIR ROCDLDialect.td

Address comments, fix rebase
DeltaFile
+4-4llvm/docs/AMDGPUUsage.rst
+2-0mlir/include/mlir/Dialect/LLVMIR/ROCDLDialect.td
+6-42 files

LLVM/project 96480b2mlir/lib/Dialect/Bufferization/IR BufferizableOpInterface.cpp, mlir/test/Dialect/Bufferization/Transforms test-one-shot-module-bufferize.mlir one-shot-module-bufferize.mlir

[mlir][bufferization] Drop TensorLikeType::getBufferType() (#201350)

Replace TensorLikeType::getBufferType() with
options.unknownTypeConverterFn() hook. Make the hook work with
tensor-like and buffer-like types (instead of builtins) to maintain the
same behaviour at the API boundary level and still allow user types to
be properly supported.

Historically, an attempt to support user types within the one-shot
bufferization framework was made. As part of it,
TensorLikeType::getBufferType() was introduced to allow user-provided
types to customize bufferization. However, the whole affair proved to be
overly complex: there is an interface with customization points for
user-provided tensors, and options-based (not sufficient) implementation
for builtin tensors. On top of this, there was always a
function-specific hook to customize function-level behaviour further. As
a result of this, users would need to implement two different mechanisms
on their end: interface implementation + option hooks.


    [19 lines not shown]
DeltaFile
+298-0mlir/test/Dialect/Bufferization/Transforms/test-one-shot-module-bufferize.mlir
+0-232mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
+16-48mlir/test/lib/Dialect/Test/TestOpDefs.cpp
+22-24mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
+32-7mlir/test/lib/Dialect/Bufferization/TestOneShotModuleBufferize.cpp
+0-38mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir
+368-34913 files not shown
+407-45219 files

LLVM/project 9ac836bclang/lib/AST/ByteCode Interp.h, clang/test/AST/ByteCode intap.cpp

[clang][bytecode] Fix shifting by negative IntAP values (#202505)

The negation of a negative value didn't necessarily result in a positive
value. Fix that by giving the value one more bit of precision before negating.
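
The underlying issue is that the most negative N-bit two's-complement value
has no positive counterpart in N bits. A minimal Python sketch of the effect
and of the one-extra-bit fix; the helper names are hypothetical and this
models fixed-width arithmetic generically rather than the interpreter's IntAP
type:

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret `value` modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def negate_same_width(value: int, bits: int) -> int:
    """Negate within the original width; overflows for the minimum value."""
    return wrap_signed(-value, bits)


def negate_widened(value: int, bits: int) -> int:
    """Sign-extend by one bit first, then negate; the result always fits."""
    widened = wrap_signed(value, bits + 1)
    return wrap_signed(-widened, bits + 1)
```

For an 8-bit value of -128, `negate_same_width` wraps back to -128, while `negate_widened` yields 128, which is what a shift by a negative amount needs in order to be turned into a shift in the opposite direction.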
DeltaFile
+8-2clang/test/AST/ByteCode/intap.cpp
+1-1clang/lib/AST/ByteCode/Interp.h
+9-32 files

LLVM/project 0e9ff98offload/plugins-nextgen/level_zero/src L0Queue.cpp

[OFFLOAD][L0] Add wait events for AsyncQueue memFill (#202287)

Fix an issue where memFill operations were not chained properly with respect to prior operations.
DeltaFile
+3-1offload/plugins-nextgen/level_zero/src/L0Queue.cpp
+3-11 files

LLVM/project d7d9601llvm/include/llvm/Analysis Loads.h, llvm/lib/Analysis Loads.cpp ValueTracking.cpp

[Loads] Migrate isDereferenceable APIs to SimplifyQuery (#202553)

These take the usual set of analysis parameters, so we can encapsulate
them using SimplifyQuery.
DeltaFile
+55-77llvm/lib/Analysis/Loads.cpp
+9-13llvm/include/llvm/Analysis/Loads.h
+5-3llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+3-3llvm/lib/CodeGen/TargetLoweringBase.cpp
+3-3llvm/lib/Analysis/ValueTracking.cpp
+2-2llvm/lib/CodeGen/MachineOperand.cpp
+77-1015 files not shown
+86-10811 files

LLVM/project 3519974llvm/tools/llvm-exegesis/lib Assembler.h

Revert "[NFC][llvm-exegesis] Disable CFI-icall for JIT-executed function (#20…"

This reverts commit 1ada747bd3be297fcd2ac309f679f0a4024c1b1f.
DeltaFile
+1-2llvm/tools/llvm-exegesis/lib/Assembler.h
+1-21 files

LLVM/project 01d3932clang/lib/Headers __clang_hip_runtime_wrapper.h, clang/test/Headers hip-constexpr-cmath.hip

[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>` (#201563)

In HIP, `constexpr` functions are treated as both `__host__` and
`__device__`.

A new version of the MS STL shipped with the build tools version
14.51.36231 has `constexpr` definitions for some `cmath` functions when
the compiler in use is Clang (this gets worse when C++23 is in use).

These definitions conflict with the `__device__` declarations we provide
in the header wrappers.

There is a workaround for this: We do not mark `constexpr`
functions [_that are defined in a system
header_](https://github.com/llvm/llvm-project/blob/03127a03860b9d8cb440fe8f51c00647f45eb8be/clang/lib/Sema/SemaCUDA.cpp#L877)
as `__host__` and `__device__` if there is a previous `__device__`
declaration.

    [14 lines not shown]
DeltaFile
+70-0clang/test/Headers/hip-constexpr-cmath.hip
+6-1clang/lib/Headers/__clang_hip_runtime_wrapper.h
+76-12 files