LLVM/project 613c5b4 flang/lib/Semantics tools.cpp, flang/test/Lower/CUDA cuda-program-global.cuf

[flang][cuda] Lower unified variables as cuf.alloc in main program scope (#190713)

Remove the unified exception from CanCUDASymbolBeGlobal so unified
variables follow the same cuf.alloc lowering path as other CUDA data
attributes.
Delta  File
+1 -3  flang/lib/Semantics/tools.cpp
+2 -1  flang/test/Lower/CUDA/cuda-program-global.cuf
+3 -4  2 files

LLVM/project f9adee2 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUInstructionSelector.cpp, llvm/test/CodeGen/AMDGPU asyncmark-gfx12plus.ll asyncmark-err.ll

[AMDGPU] asyncmark support for ASYNC_CNT (#185813)

The ASYNC_CNT is used to track the progress of asynchronous copies
between global and LDS memories. By including it in asyncmark, the
compiler can now assist the programmer in generating waits for
ASYNC_CNT.

Assisted-By: Claude Sonnet 4.5

This is part of a stack:

- #185813
- #185810 

Fixes: LCOMPILER-332
Delta  File
+359 -0  llvm/test/CodeGen/AMDGPU/asyncmark-gfx12plus.ll
+14 -7  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0 -19  llvm/test/CodeGen/AMDGPU/asyncmark-err.ll
+1 -2  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+3 -0  llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -1  llvm/lib/Target/AMDGPU/SOPInstructions.td
+378 -29  1 file not shown
+380 -29  7 files

LLVM/project 5567b34 llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp, llvm/test/CodeGen/AMDGPU vgpr-setreg-mode-swar.mir hazard-setreg-vgpr-msb-gfx1250.mir

[AMDGPU] Fix setreg handling in the VGPR MSB lowering

There are multiple issues with it:

1. It can skip inserting S_SET_VGPR_MSB if we set the mode via
   piggybacking. We are now relying on the HW bug for correct
   behavior. If/when the bug is fixed, the lowering will be incorrect.
2. We should just unconditionally update MSBs if the immediate allows it.
   We shall set the correct bits and keep the rest of the immediate
   (that is done). There is no reasonable way for a user to change
   MSBs, nor does it do any good to set them with SETREG and then
   immediately overwrite them with S_SET_VGPR_MSB.
3. We can always update the immediate if Offset is zero.
4. Redundant mode changes are created, as seen in
   hazard-setreg-vgpr-msb-gfx1250.mir.

With the unconditional immediate update most of the time, and without
relying on SETREG for setting MSBs, there is no good reason to complicate
handling by supporting SETREG as a piggybacking target. Moreover,

    [10 lines not shown]
Delta  File
+209 -47  llvm/test/CodeGen/AMDGPU/vgpr-setreg-mode-swar.mir
+20 -39  llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+12 -18  llvm/test/CodeGen/AMDGPU/hazard-setreg-vgpr-msb-gfx1250.mir
+241 -104  3 files

LLVM/project 164505d llvm/utils/TableGen AsmMatcherEmitter.cpp

[NFC][AsmMatcher] Add Commented Name for FeatureBitsets (#190688)
Delta  File
+1 -1  llvm/utils/TableGen/AsmMatcherEmitter.cpp
+1 -1  1 file

LLVM/project 75bb30d llvm/lib/CodeGen PreISelIntrinsicLowering.cpp, llvm/lib/Transforms/InstCombine InstCombineLoadStoreAlloca.cpp

Move {load,store}(llvm.protected.field.ptr) lowering to InstCombine.

The previous position of llvm.protected.field.ptr lowering for loads
and stores was problematic as it not only inhibited optimizations such
as DSE (as stores to a llvm.protected.field.ptr were not considered to
must-alias stores to the non-protected.field pointer) but also required
changes to other optimization passes to avoid transformations that would
reduce PFP coverage.

Address this by moving the load/store part of the lowering to
InstCombine, where it will run earlier than the PFP-breaking and
AA-relying transformations. The deactivation symbol, null comparison
and EmuPAC parts of the lowering remain in PreISelLowering.

Now that the transformation inhibitions are no longer needed, remove them
(i.e. partially revert #151649, and revert #182976).

This change resulted in a 2.4% reduction in Fleetbench .text size and
the following improvements to PFP performance overhead for BM_PROTO_Arena

    [11 lines not shown]
Delta  File
+57 -73  llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+17 -86  llvm/test/Transforms/PreISelIntrinsicLowering/protected-field-pointer.ll
+17 -86  llvm/test/Transforms/PreISelIntrinsicLowering/protected-field-pointer-addrspace1.ll
+64 -0  llvm/test/Transforms/PreISelIntrinsicLowering/emupac.ll
+62 -0  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+61 -0  llvm/test/Transforms/InstCombine/protected-field-ptr.ll
+278 -245  6 files not shown
+331 -316  12 files

LLVM/project ac3745e clang/test/Analysis/Scalable/ssaf-format list.test

Apply suggestion from @ziqingluo-90
Delta  File
+1 -1  clang/test/Analysis/Scalable/ssaf-format/list.test
+1 -1  1 file

LLVM/project eb35aa9 llvm/lib/Target/RISCV RISCVInstrInfoZvk.td, llvm/test/CodeGen/RISCV/rvv vrol.ll

[RISCV] Use per-SEW immediate inversion for vrol intrinsic patterns (#190113)

The VPatBinaryV_VI_VROL multiclass was using InvRot64Imm for all SEW
widths when converting vrol immediate intrinsics to vror.vi. This
produced unnecessarily large immediates for narrower element types
(e.g., 61 instead of 5 for SEW=8 rotate-left by 3).

Use the appropriate InvRot{SEW}Imm transform to match what the SDNode
patterns already do.
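
The arithmetic behind this fix can be checked with a small model. The following is only a sketch in Python, not the TableGen transform: `rotl`, `rotr`, and `inv_rot_imm` are hypothetical helpers, and the model assumes that a rotate amount is taken modulo the element width, which is why 61 and 5 give the same result for SEW=8.

```python
def rotl(x: int, n: int, sew: int) -> int:
    """Rotate the low `sew` bits of x left by n (n taken modulo sew)."""
    n %= sew
    mask = (1 << sew) - 1
    x &= mask
    return ((x << n) | (x >> (sew - n))) & mask


def rotr(x: int, n: int, sew: int) -> int:
    """Rotate the low `sew` bits of x right by n (n taken modulo sew)."""
    n %= sew
    mask = (1 << sew) - 1
    x &= mask
    return ((x >> n) | (x << (sew - n))) & mask


def inv_rot_imm(imm: int, sew: int) -> int:
    """Rotate-right amount equivalent to a rotate-left by imm at width sew."""
    return (sew - imm) % sew


if __name__ == "__main__":
    x = 0b10010110
    # Inverting against 64 yields 61; inverting against SEW=8 yields 5.
    print(inv_rot_imm(3, 64), inv_rot_imm(3, 8))
    # Both immediates describe the same 8-bit rotation in this model.
    print(rotl(x, 3, 8) == rotr(x, 61, 8) == rotr(x, 5, 8))
```

In this model both immediates produce identical results, so the per-SEW inversion only changes how large the emitted immediate is.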

Co-authored-by: Claude Opus 4.6 (1M context) <noreply at anthropic.com>
Delta  File
+36 -36  llvm/test/CodeGen/RISCV/rvv/vrol.ll
+3 -2  llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
+39 -38  2 files

LLVM/project 82505fb llvm/include/llvm/Transforms/Utils Cloning.h, llvm/lib/Transforms/IPO Inliner.cpp

[Inliner] Put inline history into IR as !inline_history metadata (#190700)

(Reland of #190092 with verifier change to look through GlobalAliases)

So that it's preserved across all inline invocations rather than just
one inliner pass run.

This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.

For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.

The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant

    [5 lines not shown]
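
As a rough illustration of why losing the inline history matters, the toy model below inlines through a pair of mutually recursive functions. It is a sketch under stated assumptions, not the inliner's logic: `should_inline`, `inline_call`, and `run` are hypothetical names, a call site's history is modeled as a tuple of function names, and dropping the history between pass runs is modeled by clearing it on every step.

```python
from __future__ import annotations


def should_inline(callee: str, history: tuple[str, ...]) -> bool:
    """Refuse a callee that already appears in the call site's history."""
    return callee not in history


def inline_call(callee: str, history: tuple[str, ...],
                callees_of: dict[str, list[str]]) -> list[tuple[str, tuple[str, ...]]]:
    """Inline `callee`; the call sites it exposes inherit the extended history."""
    new_history = history + (callee,)
    return [(c, new_history) for c in callees_of[callee]]


def run(preserve_history: bool, max_steps: int = 16) -> int:
    """Count inlining steps for f <-> g, starting from one call to f."""
    callees_of = {"f": ["g"], "g": ["f"]}
    worklist: list[tuple[str, tuple[str, ...]]] = [("f", ())]
    inlined = 0
    while worklist and inlined < max_steps:
        callee, history = worklist.pop()
        if not preserve_history:
            history = ()  # models the history being lost
        if not should_inline(callee, history):
            continue
        worklist.extend(inline_call(callee, history, callees_of))
        inlined += 1
    return inlined


if __name__ == "__main__":
    print(run(preserve_history=True))   # stops once f reappears in the history
    print(run(preserve_history=False))  # only the step budget stops it
```

With the history kept, the model stops after inlining `f` and `g` once each; without it, only the step budget ends the loop.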
Delta  File
+102 -0  llvm/test/Transforms/Inline/inline-history.ll
+57 -28  llvm/lib/Transforms/Utils/InlineFunction.cpp
+25 -36  llvm/lib/Transforms/IPO/Inliner.cpp
+61 -0  llvm/test/Verifier/inline-history-metadata.ll
+25 -26  llvm/lib/Transforms/Utils/CloneFunction.cpp
+19 -17  llvm/include/llvm/Transforms/Utils/Cloning.h
+289 -107  13 files not shown
+394 -213  19 files

LLVM/project 63be9b2 clang/include/clang/ScalableStaticAnalysisFramework/Analyses EntityPointerLevel.h

fix format
Delta  File
+2 -2  clang/include/clang/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel.h
+2 -2  1 file

LLVM/project 2931325 clang/test/Analysis/Scalable/ssaf-format list.test

Apply suggestion from @ziqingluo-90
Delta  File
+1 -1  clang/test/Analysis/Scalable/ssaf-format/list.test
+1 -1  1 file

LLVM/project fa70ee4 clang/lib/CIR/CodeGen CIRGenBuiltin.cpp, clang/test/CIR/CodeGenBuiltins builtins-floating-point.c

[CIR] Implement __builtin_flt_rounds and __builtin_set_flt_rounds (#190706)

This adds CIR handling for the __builtin_flt_rounds and
__builtin_set_flt_rounds builtin functions. Because the LLVM dialect
does not have dedicated operations for these, I have chosen not to
implement them as operations in CIR either. Instead, we just call the
LLVM intrinsic.
Delta  File
+26 -0  clang/test/CIR/CodeGenBuiltins/builtins-floating-point.c
+17 -3  clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+43 -3  2 files

LLVM/project 511a7aa clang/include/clang/CIR/Dialect/IR CIRAttrs.td, clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp CMakeLists.txt

[CIR][NFC] Use tablegen to create CIRAttrToValue visitor declarations (#187607)

This change introduces TableGen support for indicating CIR attributes
that require a CIRAttrToValue visitor, adds the new flag to all
attributes to which it applies, and replaces the explicit declarations
with the tablegen output.
Delta  File
+34 -27  clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+46 -0  clang/utils/TableGen/CIRLoweringEmitter.cpp
+3 -24  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+4 -0  clang/test/CIR/Lowering/poison.cir
+1 -0  clang/lib/CIR/Lowering/DirectToLLVM/CMakeLists.txt
+88 -51  5 files

LLVM/project abc7647 clang/include/clang/ScalableStaticAnalysisFramework/Analyses EntityPointerLevel.h, clang/lib/ScalableStaticAnalysisFramework/Analyses SSAFAnalysesCommon.h

fix typo
Delta  File
+4 -8  clang/lib/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageExtractor.cpp
+4 -4  clang/unittests/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageTest.cpp
+3 -3  clang/lib/ScalableStaticAnalysisFramework/Analyses/SSAFAnalysesCommon.h
+1 -1  clang/include/clang/ScalableStaticAnalysisFramework/Analyses/EntityPointerLevel.h
+12 -16  4 files

LLVM/project 120df3e llvm/lib/Target/AMDGPU AMDGPURewriteAGPRCopyMFMA.cpp, llvm/test/CodeGen/AMDGPU rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll

[AMDGPU] Fixed verifier crash because of multiple live range components.

In Rewrite AGPR-Copy-MFMA pass, after replacing spill instructions, the
replacement register may have multiple live range components when the
spill slot was stored to more than once. The verifier crashes with a
bad machine code error. This patch fixes the problem by splitting a live
range but assigning the same physical register in this scenario. A new
test has been added that verifies the absence of this verifier error.

Assisted-by: Claude Opus
Delta  File
+146 -0  llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-spill-multi-store.ll
+12 -0  llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+158 -0  2 files

LLVM/project 4711f40 llvm/lib/CodeGen Rematerializer.cpp

Remove lambda
Delta  File
+20 -21  llvm/lib/CodeGen/Rematerializer.cpp
+20 -21  1 file

LLVM/project 1f48b88 llvm/lib/CodeGen Rematerializer.cpp, llvm/unittests/CodeGen RematerializerTest.cpp

[CodeGen] Fix incorrect rematerialization order in rematerializer

When rematerializing DAGs of registers wherein multiple paths exist
between some registers of the DAG, it is possible that the
rematerializer determines an incorrect rematerialization order that
does not ensure that a register's dependencies are rematerialized before
the register itself, an invariant that is otherwise required.

This fixes that using a simpler recursive logic to determine a correct
rematerialization order that honors this invariant. A minimal unit test
is added that fails on the current implementation.
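
One way to picture a recursive ordering that honors this invariant is a post-order walk over the dependency DAG. The sketch below is an assumption about the general technique, not the code in Rematerializer.cpp: `remat_order` and the register names are hypothetical, and dependencies are modeled as a plain dictionary.

```python
from __future__ import annotations


def remat_order(root: str, deps: dict[str, list[str]]) -> list[str]:
    """Return registers so that each one appears after all its dependencies."""
    order: list[str] = []
    visited: set[str] = set()

    def visit(reg: str) -> None:
        if reg in visited:
            return  # already scheduled via another path through the DAG
        visited.add(reg)
        for dep in deps.get(reg, []):
            visit(dep)
        order.append(reg)  # emitted only after every dependency

    visit(root)
    return order


if __name__ == "__main__":
    # "d" reaches "a" and "b" along more than one path.
    deps = {"d": ["b", "c"], "b": ["a"], "c": ["a", "b"], "a": []}
    print(remat_order("d", deps))
```

Because a register is appended only after the recursion over its dependencies returns, multiple paths to the same register cannot place it ahead of something it depends on.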
Delta  File
+20 -33  llvm/lib/CodeGen/Rematerializer.cpp
+38 -0  llvm/unittests/CodeGen/RematerializerTest.cpp
+58 -33  2 files

LLVM/project 8e54890 clang/include/clang/ScalableStaticAnalysisFramework SSAFBuiltinForceLinker.h, clang/unittests/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage UnsafeBufferUsageTest.cpp

fix bugs
Delta  File
+1 -1  clang/include/clang/ScalableStaticAnalysisFramework/SSAFBuiltinForceLinker.h
+1 -1  clang/unittests/ScalableStaticAnalysisFramework/Analyses/UnsafeBufferUsage/UnsafeBufferUsageTest.cpp
+2 -2  2 files

LLVM/project 94875ae llvm/include/llvm/CodeGen LiveIntervals.h, llvm/lib/CodeGen Rematerializer.cpp

[CodeGen] Fix multiple connected component issue in rematerializer (#186674)

This fixes a rematerializer issue wherein re-creating the interval of a
non-rematerializable super-register defined over multiple MIs, some of
which define entirely dead sub-registers, could cause a crash when
changing the order of sub-definitions (for example during scheduling)
because the re-created interval could end up with multiple connected
components, which is illegal. The solution is to split separate
components of the interval in such cases. The added unit test crashes
without that added behavior.
Delta  File
+71 -0  llvm/unittests/CodeGen/RematerializerTest.cpp
+16 -1  llvm/lib/CodeGen/Rematerializer.cpp
+6 -0  llvm/include/llvm/CodeGen/LiveIntervals.h
+93 -1  3 files

LLVM/project 90ec5f2 mlir/test/Integration/GPU/CUDA async.mlir

[MLIR][test] Re-disable FileCheck on async.mlir integration test (#190702)

#190563 re-enabled FileCheck on `Integration/GPU/CUDA/async.mlir`, but
the buildbot has shown intermittent wrong-output failures
([example](https://lab.llvm.org/buildbot/#/builders/116/builds/27026)):
the test produces `[42, 42]` instead of the expected `[84, 84]`.

This wrong-output flakiness is distinct from the cleanup-time
`cuModuleUnload` errors that #190563 actually fixes — it's the
underlying issue tracked by #170833. The merged commit message for
#190563 incorrectly says `Fixes #170833`; that issue should be reopened,
since the cleanup-error fix doesn't address the wrong-output behavior.

This PR puts the test back in its previously-disabled state. The runtime
cleanup fix in #190563 is unaffected.
Delta  File
+5 -2  mlir/test/Integration/GPU/CUDA/async.mlir
+5 -2  1 file

LLVM/project aedd4e0 clang/lib/CIR/CodeGen CIRGenExprConstant.cpp, clang/test/CIR/CodeGen static-local.cpp

[CIR] Handle static local var decl constants (#190699)

This adds the handling for the case where the address of a static local
variable is used to initialize another static local. In this case, the
address of the first variable is emitted as a constant in the
initializer of the second variable.
Delta  File
+17 -3  clang/test/CIR/CodeGen/static-local.cpp
+4 -3  clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp
+21 -6  2 files

LLVM/project 228b6ae clang/lib/CIR/CodeGen CIRGenBuiltin.cpp, clang/test/CIR/CodeGenBuiltins builtin-signbit.c

[CIR][CodeGen] Implement __builtin_signbit (#188433)

The __builtin_signbit function reports whether the sign bit of a
floating-point number is set.
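
For readers unfamiliar with the semantics, the check can be modeled outside the compiler. This is a sketch of the behavior only, not the CIR lowering: `signbit` is a hypothetical Python helper, and it assumes the value is an IEEE 754 double whose top bit is the sign bit.

```python
import struct


def signbit(x: float) -> bool:
    """Return True if the sign bit of x, encoded as a big-endian double, is set."""
    # The first byte of the big-endian encoding holds the sign bit in bit 7.
    return struct.pack(">d", x)[0] >> 7 == 1


if __name__ == "__main__":
    for value in (1.5, -1.5, 0.0, -0.0, float("-inf")):
        print(value, signbit(value))
```

The negative-zero case is the one a plain `x < 0` comparison would miss, which is why the test inspects the bit itself.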
Delta  File
+158 -0  clang/test/CIR/CodeGenBuiltins/builtin-signbit.c
+10 -1  clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+168 -1  2 files

LLVM/project fe07678 .github/workflows libcxx-build-and-test.yaml

[libc++] Switch CI runners to use the latest Docker image (#190363)
Delta  File
+3 -3  .github/workflows/libcxx-build-and-test.yaml
+3 -3  1 file

LLVM/project 85eb6b3 llvm/include/llvm/CodeGen LiveIntervals.h

Format
Delta  File
+1 -1  llvm/include/llvm/CodeGen/LiveIntervals.h
+1 -1  1 file

LLVM/project fe9a478 llvm/include/llvm/CodeGen LiveIntervals.h, llvm/lib/CodeGen Rematerializer.cpp

[CodeGen] Fix multiple connected component issue in rematerializer

This fixes a rematerializer issue wherein re-creating the interval of a
non-rematerializable super-register defined over multiple MIs, some of
which define entirely dead subregisters, could cause a crash when
changing the order of sub-definitions (for example during scheduling)
because the re-created interval could end up with multiple connected
components, which is illegal. The solution is to split separate
components of the interval in such cases. The added unit test crashes
without that added behavior.
Delta  File
+71 -0  llvm/unittests/CodeGen/RematerializerTest.cpp
+16 -1  llvm/lib/CodeGen/Rematerializer.cpp
+6 -0  llvm/include/llvm/CodeGen/LiveIntervals.h
+93 -1  3 files

LLVM/project 014d5d5 lldb/packages/Python/lldbsuite/test/make Makefile.rules

[lldb] Change most tests to build with system libc++ on Darwin (#190034)

Today, on Darwin platforms, almost every test binary in our test suite
loads two copies of libc++, libc++abi, and libunwind. This is because
each of the test binaries explicitly links against a just-built libc++
(which is explicitly required on Darwin right now) but we don't take the
correct steps to replace the system libc++. Doing so is unnecessary and
potentially error-prone, so most tests should link against the system
libc++ where possible.

Background:
The lldb test suite has a collection of tests that rely on libc++
explicitly. The two biggest categories are data formatter tests (which
make sure that we can correctly display values for std types) and
import-std-module tests (which test that we can import the libc++ std
module). To make sure these tests are run, we require a just-built
libc++ to be used.

All of the test binaries link against the just-built libc++, so it gets

    [12 lines not shown]
Delta  File
+18 -11  lldb/packages/Python/lldbsuite/test/make/Makefile.rules
+18 -11  1 file

LLVM/project 8e1ea8a llvm/lib/Target/RISCV RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rv32p.ll

[RISCV][P-ext] Add isel patterns for mhacc/mhaccu/mhaccsu. (#190670)
Delta  File
+130 -0  llvm/test/CodeGen/RISCV/rv32p.ll
+7 -0  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+137 -0  2 files

LLVM/project ce61fe5 llvm/tools/llvm-profgen MissingFrameInferrer.cpp ProfileGenerator.cpp

[NFC][llvm-profgen] Fix a few minor issues (#190019)

A few NFC (mostly) fixes:
- Drop unused parameters.
- Check return error.
- Fix return type.
Delta  File
+8 -6  llvm/tools/llvm-profgen/MissingFrameInferrer.cpp
+4 -5  llvm/tools/llvm-profgen/ProfileGenerator.cpp
+1 -2  llvm/tools/llvm-profgen/ProfileGenerator.h
+2 -0  llvm/tools/llvm-profgen/llvm-profgen.cpp
+1 -1  llvm/tools/llvm-profgen/PerfReader.h
+16 -14  5 files

LLVM/project 1ae179b llvm/lib/Transforms/IPO SampleProfileMatcher.cpp, llvm/test/Transforms/SampleProfile pseudo-probe-stale-profile-backward-matching.ll

[SampleProfileMatcher] Fix backward matching of non-anchor locations (#190118)

The backward matching loop in `matchNonCallsiteLocs` was ineffective
because `InsertMatching` used `std::unordered_map::insert()` which does
not overwrite existing entries. Since forward matching already inserted
entries for all non-anchor locations, the backward matching for the
second half was silently ignored.

The backward matching can update forward mappings in
`IRToProfileLocationMap` in 2 ways:
- The IR location maps to a new, different profile location. Change
`insert()` to `insert_or_assign()` so that the entry can be overwritten.
- The IR location maps to the same profile location. Add `erase()` to
remove such a mapping.
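
The difference between the two update styles can be modeled with a dictionary. This is a sketch only, not the code in SampleProfileMatcher.cpp: `insert_matching_old` and `insert_matching_fixed` are hypothetical names, locations are modeled as plain integers, and Python's `dict.setdefault` stands in for the no-overwrite behavior of `insert()`.

```python
from __future__ import annotations


def insert_matching_old(mapping: dict[int, int], ir_loc: int, profile_loc: int) -> None:
    """Models insert(): an existing entry for ir_loc is left untouched."""
    mapping.setdefault(ir_loc, profile_loc)


def insert_matching_fixed(mapping: dict[int, int], ir_loc: int, profile_loc: int) -> None:
    """Models the two update cases described above."""
    if ir_loc == profile_loc:
        mapping.pop(ir_loc, None)        # same location: drop the mapping
    else:
        mapping[ir_loc] = profile_loc    # different location: overwrite


if __name__ == "__main__":
    forward = {5: 7, 6: 8}  # entries left behind by forward matching

    old = dict(forward)
    insert_matching_old(old, 5, 9)
    print(old)  # the backward result for location 5 is silently ignored

    fixed = dict(forward)
    insert_matching_fixed(fixed, 5, 9)  # overwrite with the backward result
    insert_matching_fixed(fixed, 6, 6)  # identity mapping is erased
    print(fixed)
```

In the old model the forward entry survives the backward pass, which matches the "silently ignored" behavior the commit describes.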
Delta  File
+92 -0  llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-backward-matching.ll
+8 -6  llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+6 -0  llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-backward-matching.prof
+106 -6  3 files

LLVM/project fc1ce37 llvm/lib/Target/AMDGPU AMDGPU.td

[AMDGPU] Enable real true16 on gfx1250
Delta  File
+1 -0  llvm/lib/Target/AMDGPU/AMDGPU.td
+1 -0  1 file

LLVM/project dd4e284 llvm/lib/Target/AMDGPU AMDGPULowerVGPREncoding.cpp, llvm/test/CodeGen/AMDGPU vgpr-setreg-mode-swar.mir hazard-setreg-vgpr-msb-gfx1250.mir

[AMDGPU] Fix setreg handling in the VGPR MSB lowering

There are multiple issues with it:

1. It can skip inserting S_SET_VGPR_MSB if we set the mode via
   piggybacking. We are now relying on the HW bug for correct
   behavior. If/when the bug is fixed, the lowering will be incorrect.
2. We should just unconditionally update MSBs if the immediate allows it.
   We shall set the correct bits and keep the rest of the immediate
   (that is done). There is no reasonable way for a user to change
   MSBs, nor does it do any good to set them with SETREG and then
   immediately overwrite them with S_SET_VGPR_MSB.
3. We can always update the immediate if Offset is zero.
4. Redundant mode changes are created, as seen in
   hazard-setreg-vgpr-msb-gfx1250.mir.

With the unconditional immediate update most of the time, and without
relying on SETREG for setting MSBs, there is no good reason to complicate
handling by supporting SETREG as a piggybacking target. Moreover,

    [10 lines not shown]
Delta  File
+209 -47  llvm/test/CodeGen/AMDGPU/vgpr-setreg-mode-swar.mir
+27 -40  llvm/lib/Target/AMDGPU/AMDGPULowerVGPREncoding.cpp
+12 -18  llvm/test/CodeGen/AMDGPU/hazard-setreg-vgpr-msb-gfx1250.mir
+248 -105  3 files