LLVM/project 323285f llvm/test/Transforms/SLPVectorizer/X86 revec-reduced-value-vectorized-later.ll

[SLP] Update test against const-folding (#202532)

223ef1f3 ([IRBuilder] ConstFold unary intrinsics, #200496) made a lot of
test updates to SLPVectorizer. The tests were written a long time ago,
and their intent is unclear, but at least update the one test whose
intent is clear, replacing its constants with arguments.
DeltaFile
+40 -4  llvm/test/Transforms/SLPVectorizer/X86/revec-reduced-value-vectorized-later.ll
+40 -4  (1 file)

LLVM/project f7e4167 llvm/docs AMDGPUUsage.rst, llvm/lib/Target/AMDGPU SIMemoryLegalizer.cpp SIInstrInfo.h

[AMDGPU] Clamp load_monitor scope to minimum SCOPE_SE (#198245)

The load_monitor instructions monitor L2 cache lines and therefore
require at least SCOPE_SE to ensure the L2 cache is hit. Previously, the
memory model required the user to ensure that the specified scope
resulted in at least SCOPE_SE; otherwise the behaviour was undefined.
We now clamp the emitted scope to a minimum of SCOPE_SE instead, turning
the undefined behaviour into a performance loss.

Assisted-By: Claude Opus 4.6
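A minimal sketch of the clamping rule described above, assuming the scopes are totally ordered from narrowest to widest. The enum `Scope` and the function `clampLoadMonitorScope` are hypothetical illustrations, not the names or encodings used in SIMemoryLegalizer.cpp or SIInstrInfo.h.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scope encoding, ordered from narrowest to widest.
// The real encoding used by the AMDGPU backend may differ.
enum class Scope : unsigned { CU = 0, SE = 1, DEV = 2, SYS = 3 };

// Widen any scope narrower than SE to SE; wider scopes pass through.
// Widening only costs performance, whereas a too-narrow scope was UB.
Scope clampLoadMonitorScope(Scope Requested) {
  unsigned Min = static_cast<unsigned>(Scope::SE);
  unsigned Req = static_cast<unsigned>(Requested);
  return static_cast<Scope>(std::max(Req, Min));
}
```

Using `std::max` leaves DEV and SYS requests untouched, so only scopes that would otherwise miss the L2 cache are raised.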
DeltaFile
+37 -3  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.load.monitor.gfx1250.ll
+21 -0  llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+17 -0  llvm/lib/Target/AMDGPU/SIInstrInfo.h
+3 -4  llvm/docs/AMDGPUUsage.rst
+78 -7  (4 files)

LLVM/project ca227bf llvm/lib/Target/AMDGPU AMDGPUInstCombineIntrinsic.cpp, llvm/test/Transforms/InstCombine/AMDGPU amdgcn-intrinsics.ll

[AMDGPU] Produce ballot/icmp/fcmp lane masks at wavefront width (#201358)
DeltaFile
+7 -9  llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+4 -2  llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+11 -11  (2 files)

LLVM/project 9dfcf76 llvm/docs ProgrammersManual.rst, llvm/include/llvm/ADT StringMap.h

[StringMap] Replace tombstone deletion with TAOCP 6.4 Algorithm R (#202103)

StringMap uses quadratic probing with lazy deletion: an erased entry
becomes a tombstone, a third bucket state alongside empty and live that
every find/insert must inspect.

Switch to linear probing with Knuth TAOCP 6.4 Algorithm R deletion,
similar to DenseMap #200595.

erase now relocates the following entries to close the hole. StringMap
buckets are pointers to heap-allocated entries, so only the pointers
(and the parallel hash array) move. References and pointers to entries
remain valid, but iterators are invalidated.
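A minimal sketch of linear probing with Algorithm R deletion, assuming a fixed power-of-two capacity that always keeps at least one slot empty. It stores keys directly in `std::optional` slots alongside a parallel hash array, whereas StringMap's buckets are pointers to heap-allocated entries; the class `LinearProbeSet` is hypothetical and is not StringMap's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical string set: linear probing, no tombstones, fixed capacity.
class LinearProbeSet {
public:
  explicit LinearProbeSet(std::size_t Capacity)
      : Slots(Capacity), Hashes(Capacity, 0), Mask(Capacity - 1) {
    assert(Capacity != 0 && (Capacity & Mask) == 0 && "power of two");
  }

  std::size_t size() const { return Size; }

  bool contains(const std::string &Key) const {
    return findSlot(Key).has_value();
  }

  bool insert(const std::string &Key) {
    // Keep one slot empty so every probe sequence terminates.
    assert(Size + 1 < Slots.size() && "sketch never grows the table");
    std::size_t H = std::hash<std::string>{}(Key);
    for (std::size_t I = H & Mask;; I = (I + 1) & Mask) {
      if (!Slots[I]) {
        Slots[I] = Key;
        Hashes[I] = H;
        ++Size;
        return true;
      }
      if (Hashes[I] == H && *Slots[I] == Key)
        return false; // Already present.
    }
  }

  // TAOCP 6.4 Algorithm R (forward-probing form): empty the slot, then walk
  // the rest of the cluster and move back any entry whose probe path from
  // its home slot would otherwise cross the new hole.
  bool erase(const std::string &Key) {
    std::optional<std::size_t> Found = findSlot(Key);
    if (!Found)
      return false;
    std::size_t Hole = *Found;
    Slots[Hole].reset();
    --Size;
    for (std::size_t J = (Hole + 1) & Mask; Slots[J]; J = (J + 1) & Mask) {
      std::size_t Home = Hashes[J] & Mask;
      // The entry may stay at J iff Home lies cyclically in (Hole, J].
      bool Stays = Hole < J ? (Hole < Home && Home <= J)
                            : (Hole < Home || Home <= J);
      if (Stays)
        continue;
      Slots[Hole] = std::move(Slots[J]);
      Hashes[Hole] = Hashes[J];
      Slots[J].reset();
      Hole = J;
    }
    return true;
  }

private:
  std::optional<std::size_t> findSlot(const std::string &Key) const {
    std::size_t H = std::hash<std::string>{}(Key);
    for (std::size_t I = H & Mask; Slots[I]; I = (I + 1) & Mask)
      if (Hashes[I] == H && *Slots[I] == Key)
        return I;
    return std::nullopt;
  }

  std::vector<std::optional<std::string>> Slots;
  std::vector<std::size_t> Hashes; // Parallel hash array.
  std::size_t Mask;
  std::size_t Size = 0;
};
```

Because erase moves later entries back instead of leaving tombstones, lookups only ever stop at truly empty slots; the price is that erase can relocate entries, which is why iterators are invalidated.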

Depends on #202237 and #202520
Aided by Claude Opus 4.8
DeltaFile
+32 -47  llvm/include/llvm/ADT/StringMap.h
+28 -50  llvm/lib/Support/StringMap.cpp
+77 -0  llvm/unittests/ADT/StringMapTest.cpp
+2 -2  llvm/docs/ProgrammersManual.rst
+1 -2  llvm/utils/gdb-scripts/prettyprinters.py
+140 -101  (5 files)

LLVM/project 85ab773 llvm/lib/Transforms/InstCombine InstCombineCasts.cpp, llvm/test/Transforms/InstCombine issue173148-sext-phi-select-infloop.ll

[InstCombine] Fix infinite combine loop in evaluateInDifferentType (#202572)

The implementation assumes that all original uses inside visited
instructions would get removed as part of changing the type. However,
this is not true for uses in select conditions, as only the value
operands change type in that case. Bail out if we encounter uses in
select conditions to avoid this.

Fixes https://github.com/llvm/llvm-project/issues/173148.
DeltaFile
+26 -0  llvm/test/Transforms/InstCombine/issue173148-sext-phi-select-infloop.ll
+12 -2  llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+38 -2  (2 files)

LLVM/project 0f8a3b2 llvm/lib/Target/AArch64 AArch64.h AArch64TargetMachine.cpp, llvm/lib/Target/AArch64/GISel AArch64PostLegalizerCombiner.cpp

[NewPM][AArch64][GlobalISel] Port AArch64PostLegalizerCombiner to NewPM (#194156)

Performs a standard port of the pass to the new pass manager.

Updates some (but not all) tests to verify the NewPM path is working.
DeltaFile
+156 -103  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+21 -2  llvm/lib/Target/AArch64/AArch64.h
+2 -2  llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+2 -0  llvm/lib/Target/AArch64/AArch64PassRegistry.def
+1 -0  llvm/test/CodeGen/AArch64/GlobalISel/opt-overlapping-and-postlegalize.mir
+1 -0  llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combine-ptr-add-chain.mir
+183 -107  (files shown; 4 files not shown)
+187 -107  (10 files)

LLVM/project a1c1cdd llvm/lib/Transforms/Scalar AlignmentFromAssumptions.cpp, llvm/test/Transforms/AlignmentFromAssumptions simple.ll

[AlignmentFromAssumes] Skip huge alignment (#202567)

Fixes https://github.com/llvm/llvm-project/issues/202043

Although huge alignments are not supported by `align`, the following
test confirms that `assume` accepts them:
https://github.com/llvm/llvm-project/blob/c4f4206ff3ab97db9577f11bb2dabd40896bcca9/llvm/test/Transforms/InstCombine/assume.ll#L71
In this case, AlignmentFromAssumes should skip the huge alignment.
DeltaFile
+12 -0  llvm/test/Transforms/AlignmentFromAssumptions/simple.ll
+3 -0  llvm/lib/Transforms/Scalar/AlignmentFromAssumptions.cpp
+15 -0  (2 files)

LLVM/project f744f48 llvm/docs AMDGPUAsyncOperations.rst

[AMDGPU] Improve the description of asyncmark semantics

- The semantics of asyncmarks are now defined purely in terms of sequences,
  without referring to the implementation.
- The examples incorrectly relied on (post)dominance; they are now worded
  in terms of asyncmark sequences.
DeltaFile
+113 -51  llvm/docs/AMDGPUAsyncOperations.rst
+113 -51  (1 file)

LLVM/project 11cf295 llvm/tools/llvm-exegesis/lib Assembler.h

Revert "[NFC][llvm-exegesis] Disable CFI-icall for JIT-executed function" (#202571)

Reverts llvm/llvm-project#202472
DeltaFile
+1 -2  llvm/tools/llvm-exegesis/lib/Assembler.h
+1 -2  (1 file)

LLVM/project 00e3e6f llvm/lib/Target/X86 X86ScheduleC864GM7.td, llvm/test/tools/llvm-mca/X86/C864GM4 resources-x86_64.s

[X86] Hygon Processors Initial enablement (#187622)

This patch adds initial support for several Hygon architectures.

The Hygon architectures include:

- c86-4g-m4
- c86-4g-m6
- c86-4g-m7

This patch includes:

- Added recognition of the Hygon CPU targets in Clang and LLVM
- Added the Hygon architectures to the target parser and host CPU detection
- Updated compiler-rt CPU model detection for the Hygon architectures
- Added the Hygon architectures to various optimizer tests
- Added scheduler models for the Hygon CPU targets
DeltaFile
+5,294 -0  llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512vl.s
+3,721 -0  llvm/lib/Target/X86/X86ScheduleC864GM7.td
+3,264 -0  llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512.s
+2,976 -0  llvm/test/tools/llvm-mca/X86/C864GM7/resources-avx512bwvl.s
+2,894 -0  llvm/test/tools/llvm-mca/X86/C864GM7/resources-x86_64.s
+2,890 -0  llvm/test/tools/llvm-mca/X86/C864GM4/resources-x86_64.s
+21,039 -0  (files shown; 125 files not shown)
+52,603 -0  (131 files)

LLVM/project f704a92 clang/lib/CodeGen CGHLSLRuntime.cpp, clang/test/CodeGenHLSL/resources cbuffer.hlsl cbuffer-empty-struct-array.hlsl

Revert "[HLSL] Set visibility of cbuffer global variables to internal" (#202538)

Reverts llvm/llvm-project#200312

Breaks several buildbots, e.g.,
https://lab.llvm.org/buildbot/#/builders/203/builds/48531

Co-authored-by: Nikolas Klauser <nikolasklauser at berlin.de>
DeltaFile
+0 -38  llvm/test/CodeGen/DirectX/cbuffer_global_elim.ll
+0 -36  llvm/test/CodeGen/SPIRV/cbuffer_global_elim.ll
+10 -14  clang/test/CodeGenHLSL/resources/cbuffer.hlsl
+0 -17  llvm/lib/Frontend/HLSL/CBuffer.cpp
+2 -10  clang/lib/CodeGen/CGHLSLRuntime.cpp
+4 -4  clang/test/CodeGenHLSL/resources/cbuffer-empty-struct-array.hlsl
+16 -119  (files shown; 11 files not shown)
+30 -137  (17 files)

LLVM/project b942ed2 llvm/docs AMDGPUUsage.rst, mlir/include/mlir/Dialect/LLVMIR ROCDLDialect.td

Address comments, fix rebase
DeltaFile
+4 -4  llvm/docs/AMDGPUUsage.rst
+2 -0  mlir/include/mlir/Dialect/LLVMIR/ROCDLDialect.td
+6 -4  (2 files)

LLVM/project 96480b2 mlir/lib/Dialect/Bufferization/IR BufferizableOpInterface.cpp, mlir/test/Dialect/Bufferization/Transforms test-one-shot-module-bufferize.mlir one-shot-module-bufferize.mlir

[mlir][bufferization] Drop TensorLikeType::getBufferType() (#201350)

Replace TensorLikeType::getBufferType() with the
options.unknownTypeConverterFn() hook. Make the hook work with
tensor-like and buffer-like types (instead of builtins) to maintain the
same behaviour at the API boundary level while still allowing user types
to be properly supported.

Historically, an attempt was made to support user types within the
one-shot bufferization framework. As part of it,
TensorLikeType::getBufferType() was introduced to allow user-provided
types to customize bufferization. However, the whole affair proved to be
overly complex: there is an interface with customization points for
user-provided tensors, and an options-based (insufficient) implementation
for builtin tensors. On top of this, there was always a
function-specific hook to customize function-level behaviour further. As
a result, users would need to implement two different mechanisms on
their end: an interface implementation plus option hooks.


    [19 lines not shown]
DeltaFile
+298 -0  mlir/test/Dialect/Bufferization/Transforms/test-one-shot-module-bufferize.mlir
+0 -232  mlir/test/Dialect/Bufferization/Transforms/one-shot-module-bufferize.mlir
+16 -48  mlir/test/lib/Dialect/Test/TestOpDefs.cpp
+22 -24  mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
+32 -7  mlir/test/lib/Dialect/Bufferization/TestOneShotModuleBufferize.cpp
+0 -38  mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize.mlir
+368 -349  (files shown; 13 files not shown)
+407 -452  (19 files)

LLVM/project 9ac836b clang/lib/AST/ByteCode Interp.h, clang/test/AST/ByteCode intap.cpp

[clang][bytecode] Fix shifting by negative IntAP values (#202505)

The negation of a negative value didn't necessarily result in a positive
value. Fix that by giving it one more bit of precision.
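A small illustration of why negating a negative value need not produce a positive one, using fixed-width integers as a stand-in for clang's arbitrary-precision `IntAP` type (an assumption of this sketch). In two's complement the most negative N-bit value has no N-bit positive counterpart, but it does once one more bit is available; here `int16_t` plays the role of "one more bit".

```cpp
#include <cassert>
#include <cstdint>

// Negate within 8 bits. The wrap-around is computed in unsigned arithmetic,
// so -(-128) comes back as -128 (modular conversion, guaranteed since C++20).
std::int8_t negateSameWidth(std::int8_t V) {
  std::uint8_t Bits = static_cast<std::uint8_t>(V);
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(0u - Bits));
}

// Negate after sign-extending to a wider type, so every 8-bit input
// (including -128) has a representable positive result.
std::int16_t negateWithExtraBits(std::int8_t V) {
  return static_cast<std::int16_t>(-static_cast<std::int16_t>(V));
}
```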
DeltaFile
+8 -2  clang/test/AST/ByteCode/intap.cpp
+1 -1  clang/lib/AST/ByteCode/Interp.h
+9 -3  (2 files)

LLVM/project 0e9ff98 offload/plugins-nextgen/level_zero/src L0Queue.cpp

[OFFLOAD][L0] Add wait events for AsyncQueue memFill (#202287)

Fix an issue where memFill operations were not chained properly with respect to prior operations.
DeltaFile
+3 -1  offload/plugins-nextgen/level_zero/src/L0Queue.cpp
+3 -1  (1 file)

LLVM/project d7d9601 llvm/include/llvm/Analysis Loads.h, llvm/lib/Analysis Loads.cpp ValueTracking.cpp

[Loads] Migrate isDereferenceable APIs to SimplifyQuery (#202553)

These take the usual set of analysis parameters, so we can encapsulate
them using SimplifyQuery.
DeltaFile
+55 -77  llvm/lib/Analysis/Loads.cpp
+9 -13  llvm/include/llvm/Analysis/Loads.h
+5 -3  llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+3 -3  llvm/lib/CodeGen/TargetLoweringBase.cpp
+3 -3  llvm/lib/Analysis/ValueTracking.cpp
+2 -2  llvm/lib/CodeGen/MachineOperand.cpp
+77 -101  (files shown; 5 files not shown)
+86 -108  (11 files)

LLVM/project 3519974 llvm/tools/llvm-exegesis/lib Assembler.h

Revert "[NFC][llvm-exegesis] Disable CFI-icall for JIT-executed function (#20…"

This reverts commit 1ada747bd3be297fcd2ac309f679f0a4024c1b1f.
DeltaFile
+1 -2  llvm/tools/llvm-exegesis/lib/Assembler.h
+1 -2  (1 file)

LLVM/project 01d3932 clang/lib/Headers __clang_hip_runtime_wrapper.h, clang/test/Headers hip-constexpr-cmath.hip

[Clang][HIP] Include `__clang_cuda_math_forward_declares.h` before `<cmath>` (#201563)

In HIP, `constexpr` functions are treated as both `__host__` and
`__device__`.

A new version of the MS STL, shipped with build tools version
14.51.36231, has `constexpr` definitions for some `cmath` functions when
the compiler in use is Clang (this gets worse when C++23 is in use).

These definitions conflict with the `__device__` declarations we provide
in the header wrappers.

There is a workaround for this: we do not mark `constexpr` functions
[_that are defined in a system header_](https://github.com/llvm/llvm-project/blob/03127a03860b9d8cb440fe8f51c00647f45eb8be/clang/lib/Sema/SemaCUDA.cpp#L877)
as `__host__` and `__device__` if there is a previous `__device__`
declaration.

    [14 lines not shown]
DeltaFile
+70 -0  clang/test/Headers/hip-constexpr-cmath.hip
+6 -1  clang/lib/Headers/__clang_hip_runtime_wrapper.h
+76 -1  (2 files)

LLVM/project c4f4206 libcxx/test/libcxx/containers/sequences/vector nodiscard.iterator.verify.cpp

[libc++][vector] Test `[[nodiscard]]` applied to `vector::iterator` (#202262)

Adds test coverage.

`[[nodiscard]]` was applied in:

- #198489
- #198492

Towards #172124

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+52 -0  libcxx/test/libcxx/containers/sequences/vector/nodiscard.iterator.verify.cpp
+52 -0  (1 file)

LLVM/project 09b451f libcxx/include __bit_reference, libcxx/test/libcxx/containers/sequences/vector.bool nodiscard.iterator.verify.cpp

[libc++][vector] Apply `[[nodiscard]]` to `vector<bool>::iterator` (#202265)

Towards #172124

Co-authored-by: Hristo Hristov <zingam at outlook.com>
DeltaFile
+52 -0  libcxx/test/libcxx/containers/sequences/vector.bool/nodiscard.iterator.verify.cpp
+9 -6  libcxx/include/__bit_reference
+61 -6  (2 files)

LLVM/project 96a2636 clang/lib/AST/ByteCode InterpFrame.h InterpFrame.cpp

[clang][bytecode] Remove `InterpFrame::ThisPointerOffset` (#202322)

Replace it with a `uint8_t` representing some bool flags about the
function. This reduces the size of a frame from 88 to 80 bytes.
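A minimal sketch of packing per-function boolean properties into a single `uint8_t`, as the commit describes. The flag names and the `FrameFlags` struct are hypothetical; the actual flags stored in `InterpFrame` are not listed in this summary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-frame flags, one bit each.
enum FrameFlag : std::uint8_t {
  HasThisPointer = 1u << 0,
  HasRVOPointer = 1u << 1,
  IsVariadic = 1u << 2,
};

// One byte of flags instead of a wider offset field.
struct FrameFlags {
  std::uint8_t Bits = 0;

  void set(FrameFlag F) { Bits = static_cast<std::uint8_t>(Bits | F); }
  bool has(FrameFlag F) const { return (Bits & F) != 0; }
};

static_assert(sizeof(FrameFlags) == 1, "flags fit in a single byte");
```

Whether such a change shrinks an enclosing object depends on the surrounding field layout and padding; the 88-to-80-byte figure is the commit's measurement, not something this sketch demonstrates.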
DeltaFile
+9 -4  clang/lib/AST/ByteCode/InterpFrame.h
+4 -6  clang/lib/AST/ByteCode/InterpFrame.cpp
+13 -10  (2 files)

LLVM/project 0f6f15a llvm/include/llvm/Analysis ScalarEvolution.h, llvm/lib/Transforms/Utils ScalarEvolutionExpander.cpp

[SCEVExpander] Don't expand a UDiv with a possibly-poison divisor (#202378)

SCEVExpander::isSafeToExpand only checks that the divisor isKnownNonZero,
which ignores the possibility of poison. Consider the following divisor:
```
%ct = call i32 @llvm.cttz.i32(i32 %x, i1 true)
%divisor = add i32 %ct, 1
...
%rem = urem i32 1, %divisor
```
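Because `llvm.cttz` with `is_zero_poison` set returns poison when `%x` is
zero, `%divisor` is known non-zero yet may be poison, so the urem may be
hoisted unsafely.

Fix this by also checking that the divisor isGuaranteedNotToBePoison.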

Fixes https://github.com/llvm/llvm-project/issues/202028
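A toy model of the reasoning above, assuming `std::nullopt` stands for poison. The helpers `cttzZeroIsPoison`, `addOne`, and `isSafeToHoistURem` are hypothetical and only mirror the IR in the example; they are not SCEVExpander code. The point is that a non-zero fact is a statement about non-poison values, so on its own it does not make the divisor safe.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// std::nullopt models an LLVM poison value.
using MaybePoison = std::optional<std::uint32_t>;

// Models llvm.cttz.i32(%x, i1 true): the result is poison when %x == 0.
MaybePoison cttzZeroIsPoison(std::uint32_t X) {
  if (X == 0)
    return std::nullopt;
  std::uint32_t Count = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++Count;
  }
  return Count;
}

// Models `add i32 %ct, 1`: poison propagates through the add.
MaybePoison addOne(MaybePoison V) {
  if (!V)
    return std::nullopt;
  return *V + 1;
}

// Hoisting `urem i32 1, %divisor` is safe only if the divisor is non-zero
// and not poison. With CheckPoison == false this mimics the old check.
bool isSafeToHoistURem(MaybePoison Divisor, bool CheckPoison) {
  // A "known non-zero" fact only constrains non-poison values, so model
  // poison as passing that check; it is what the old code relied on.
  bool KnownNonZero = !Divisor || *Divisor != 0;
  bool NotPoison = Divisor.has_value();
  return KnownNonZero && (!CheckPoison || NotPoison);
}
```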
DeltaFile
+42 -0  llvm/test/Transforms/IndVarSimplify/exit-value-safe-udiv.ll
+3 -3  llvm/include/llvm/Analysis/ScalarEvolution.h
+2 -1  llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
+47 -4  (3 files)

LLVM/project 1b819ad llvm/docs AMDGPUUsage.rst, llvm/include/llvm/Support AMDGPUAddrSpace.h

Fix docs
DeltaFile
+2 -2  llvm/docs/AMDGPUUsage.rst
+1 -1  llvm/include/llvm/Support/AMDGPUAddrSpace.h
+3 -3  (2 files)

LLVM/project 05de6ce llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
DeltaFile
+21 -21  llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+6 -6  mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4 -4  mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2 -2  mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1 -1  mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+34 -34  (5 files)

LLVM/project ed352e2 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space for the global variables that represent barrier IDs in GFX12.5.

These barrier addresses have values that correspond one-to-one to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
DeltaFile
+442 -0  llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62 -45  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+54 -31  llvm/test/CodeGen/AMDGPU/s-barrier.ll
+35 -31  llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+52 -14  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+32 -32  llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+677 -153  (files shown; 42 files not shown)
+1,107 -440  (48 files)

LLVM/project 2b3bb59 llvm/lib/Target/AMDGPU AMDGPUMemoryUtils.cpp AMDGPUMemoryUtils.h

[NFC][AMDGPU] Generalize some LDS MemoryUtils

In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS-specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.

I also cleaned up a few things that did not follow the coding standards, such as lowercase variable names.
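A minimal sketch of the refactoring pattern described above: a helper that used to hard-code an LDS check instead takes a predicate from the caller. `GlobalVar`, `collectGlobals`, and `isLDS` are hypothetical stand-ins, not the AMDGPUMemoryUtils API, and `std::function` is used for self-containment where LLVM code would typically use `llvm::function_ref`. Address space 3 is the AMDGPU LDS (local) address space.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a global variable.
struct GlobalVar {
  std::string Name;
  unsigned AddrSpace;
};

// Generic helper: the "which globals do we care about?" query is supplied
// by the caller instead of being hard-coded inside the utility.
std::vector<const GlobalVar *>
collectGlobals(const std::vector<GlobalVar> &Globals,
               const std::function<bool(const GlobalVar &)> &Pred) {
  std::vector<const GlobalVar *> Result;
  for (const GlobalVar &GV : Globals)
    if (Pred(GV))
      Result.push_back(&GV);
  return Result;
}

// The original LDS-only behavior is recovered by passing an LDS predicate.
constexpr unsigned LocalAddressSpace = 3; // AMDGPU LDS

bool isLDS(const GlobalVar &GV) { return GV.AddrSpace == LocalAddressSpace; }
```

A caller that later needs a different class of globals, such as the barrier GVs mentioned in the related commits, only has to supply a different predicate.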
DeltaFile
+30 -36  llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+37 -9  llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+20 -17  llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+21 -7  llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+7 -6  llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+115 -75  (5 files)

LLVM/project 9408122 clang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the address space (AS) of any global variable, not just of those to which Clang cannot assign one.
This enables the next patch, which will give the LDS GVs used for barriers their own address space.
DeltaFile
+10 -9  clang/lib/CodeGen/Targets/AMDGPU.cpp
+12 -6  clang/lib/CodeGen/TargetInfo.h
+7 -8  clang/lib/CodeGen/Targets/SPIR.cpp
+11 -2  clang/lib/CodeGen/CodeGenModule.cpp
+5 -6  clang/lib/CodeGen/TargetInfo.cpp
+6 -3  clang/lib/CodeGen/Targets/AVR.cpp
+51 -34  (6 files)

LLVM/project 435be7f llvm/lib/Target/AMDGPU AMDGPULowerExecSync.cpp

clang-format
DeltaFile
+1 -2  llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+1 -2  (1 file)

LLVM/project bfb1cd6 llvm/test/CodeGen/SPIRV preserve-interface.ll, mlir/include/mlir/Dialect/LLVMIR NVVMOps.td

Merge branch 'main' into revert-200312-emit-cbuffer-globals-as-internal
DeltaFile
+137 -41  mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+112 -26  mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+101 -0  mlir/test/Conversion/SPIRVToLLVM/cl-ops-to-llvm.mlir
+69 -5  mlir/test/Target/LLVMIR/nvvmir-invalid.mlir
+0 -69  llvm/test/CodeGen/SPIRV/preserve-interface.ll
+55 -0  mlir/test/Conversion/SPIRVToLLVM/cast-ops-to-llvm.mlir
+474 -141  (files shown; 39 files not shown)
+917 -407  (45 files)

LLVM/project 944284f mlir/include/mlir/Interfaces ControlFlowInterfaces.td

[mlir][Interfaces] Document completeness requirement of `RegionBranchOpInterface` (#202018)

Document that interface implementations must report all possible control
flow edges. Failure to report a possible edge may break
analyses/transformations/APIs such as
`RegionBranchOpInterface::isRepetitiveRegion`.
DeltaFile
+19 -20  mlir/include/mlir/Interfaces/ControlFlowInterfaces.td
+19 -20  (1 file)