LLVM/project c91c9fc llvm/include/llvm/Support Allocator.h, llvm/unittests/Support AllocatorTest.cpp

Reland [Allocator] Keep bump pointer at a minimum alignment (#205240)

Reland #203718 (reverted in #205091) by doing the computation in the integer
domain to avoid UB (nullptr + non-zero offset).

Add a `MinAlign` template parameter (default 8, sizeof(size_t) on 64-bit
platforms) so that the common case `Alignment <= MinAlign` can skip
realigning `CurPtr`.

This is achieved by rounding each allocation's size up to MinAlign, so
the bump pointer stays MinAlign-aligned between allocations.
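A minimal sketch of the rounding idea, assuming a toy allocator that owns a single fixed slab. Sizes are rounded up to `MinAlign`, so `CurPtr` stays `MinAlign`-aligned and requests with `Alignment <= MinAlign` skip the realign; addresses are kept as `uintptr_t` so the arithmetic never operates on a null pointer. `ToyBumpAllocator` and `alignUp` are hypothetical names, not LLVM's `BumpPtrAllocatorImpl` or `llvm::alignTo`, and slab refill is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round Value up to a multiple of Align (Align must be a power of two).
// Done on integers, so no pointer arithmetic on a possibly-null pointer.
inline std::uintptr_t alignUp(std::uintptr_t Value, std::uintptr_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical single-slab allocator; not LLVM's BumpPtrAllocatorImpl.
template <std::size_t MinAlign = 8> class ToyBumpAllocator {
  static_assert(MinAlign != 0 && (MinAlign & (MinAlign - 1)) == 0,
                "MinAlign must be a power of two");
  static_assert(MinAlign <= alignof(std::max_align_t),
                "slab start must already satisfy MinAlign");

  alignas(std::max_align_t) unsigned char Slab[4096];
  std::uintptr_t CurPtr; // invariant: always a multiple of MinAlign
  std::uintptr_t End;

public:
  ToyBumpAllocator()
      : CurPtr(reinterpret_cast<std::uintptr_t>(Slab)),
        End(reinterpret_cast<std::uintptr_t>(Slab) + sizeof(Slab)) {}

  void *allocate(std::size_t Size, std::size_t Alignment) {
    std::uintptr_t Start = CurPtr;
    // Common case: Alignment <= MinAlign, CurPtr is already aligned enough.
    if (Alignment > MinAlign)
      Start = alignUp(Start, Alignment);
    // Round the size up so the next CurPtr is still MinAlign-aligned.
    std::uintptr_t NewCur = Start + alignUp(Size, MinAlign);
    if (NewCur > End)
      return nullptr; // a real allocator would start a new slab instead
    CurPtr = NewCur;
    return reinterpret_cast<void *>(Start);
  }
};
```

With the default `MinAlign = 8`, a 3-byte request followed by a 4-byte request should place the second object 8 bytes after the first; with `MinAlign = 1` the two land 3 bytes apart. The cost is a little padding per allocation in exchange for skipping the realign in the common case.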

SpecificBumpPtrAllocator::DestroyAll() walks objects at a fixed
sizeof(T) stride and needs tight packing, so it uses MinAlign=1.
(alignof(T) would pack just as tightly and reuse the default
instantiation, but T may be incomplete here, e.g.
`SpecificBumpPtrAllocator<MCSectionELF>`.)
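The stride problem can be checked with plain arithmetic. This sketch assumes all objects sit back to back in one slab and compares where the I-th object would be placed with where a fixed `sizeof(T)` walk would look for it. `Twelve`, `placedOffset`, and `walkedOffset` are hypothetical, and the 12-byte size assumes a common ABI where `uint32_t` is 4-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round N up to a multiple of Align.
constexpr std::size_t roundUp(std::size_t N, std::size_t Align) {
  return (N + Align - 1) / Align * Align;
}

// Where the I-th object actually starts if every allocation of ObjSize bytes
// is rounded up to MinAlign (simplification: one slab, back to back).
constexpr std::size_t placedOffset(std::size_t I, std::size_t ObjSize,
                                   std::size_t MinAlign) {
  return I * roundUp(ObjSize, MinAlign);
}

// Where a DestroyAll-style walk expects the I-th object: fixed sizeof(T) stride.
constexpr std::size_t walkedOffset(std::size_t I, std::size_t ObjSize) {
  return I * ObjSize;
}

struct Twelve { std::uint32_t A, B, C; }; // 12 bytes, 4-byte aligned on common ABIs

// MinAlign = 8 breaks the walk: object 1 is at byte 16, the walk looks at 12.
static_assert(placedOffset(1, sizeof(Twelve), 8) != walkedOffset(1, sizeof(Twelve)),
              "padding breaks the fixed stride");
// MinAlign = 1, or MinAlign = alignof(T), keeps placement and walk in sync,
// because sizeof(T) is always a multiple of alignof(T).
static_assert(placedOffset(1, sizeof(Twelve), 1) == walkedOffset(1, sizeof(Twelve)),
              "tight packing");
static_assert(placedOffset(1, sizeof(Twelve), alignof(Twelve)) ==
                  walkedOffset(1, sizeof(Twelve)),
              "alignof(T) packs just as tightly");
```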

Its `Allocate` still skips the realign: the slab is max_align_t-aligned

    [9 lines not shown]
Delta File
+41 -17 llvm/include/llvm/Support/Allocator.h
+19 -0 llvm/unittests/Support/AllocatorTest.cpp
+60 -17 total (2 files)

LLVM/project e2765f3 llvm/lib/Transforms/IPO OpenMPOpt.cpp, llvm/test/Transforms/Attributor/reduced openmp_opt_constant_type_crash.ll

[OpenMPOpt][Attributor] Selectively seed deglobalization AAs (#198710)

This addresses a compile-time issue observed on a large generated C++
translation unit compiled with `-fopenmp`.

The source code is not OpenMP-heavy. It mainly consists of generated
function-registration wrappers, template instantiations, lambdas, and
small helper functions. However, because the TU is compiled with OpenMP
enabled, `OpenMPOptCGSCCPass` runs and drives Attributor on a module
with many functions.

`OpenMPOpt::registerAAsForFunction` currently eagerly creates the
deglobalization AAs for every function in OpenMP device modules:

* `AAHeapToShared`
* `AAHeapToStack`

Most generated wrapper/helper functions in the motivating workload do
not contain `__kmpc_alloc_shared`, removable allocations, or free-like

    [25 lines not shown]
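A toy sketch of selective seeding, assuming a function can be summarized by the names of its callees: only functions that call something the deglobalization AAs could act on get those AAs. `ToyFunction`, `mayNeedDeglobalization`, and `functionsToSeed` are hypothetical; the exact criteria the patch uses are in the elided part of the message, so the callee list below is an illustrative assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model: a function is summarized by its callees.
struct ToyFunction {
  std::string Name;
  std::vector<std::string> Callees;
};

// True if F calls something the deglobalization AAs could act on.
// The callee list is an illustrative assumption, not the patch's criteria.
bool mayNeedDeglobalization(const ToyFunction &F) {
  for (const std::string &Callee : F.Callees)
    if (Callee == "__kmpc_alloc_shared" || Callee == "__kmpc_free_shared")
      return true;
  return false;
}

// Seed the (expensive) AAs only for functions that pass the cheap pre-check,
// instead of eagerly for every function in the module.
std::vector<std::string> functionsToSeed(const std::vector<ToyFunction> &Fns) {
  std::vector<std::string> Seeded;
  for (const ToyFunction &F : Fns)
    if (mayNeedDeglobalization(F))
      Seeded.push_back(F.Name);
  return Seeded;
}
```

The point of the design is that the pre-check is a cheap linear scan, so modules full of small wrappers pay little for it, while functions that do allocate shared memory still get the full analysis.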
Delta File
+34 -13 llvm/test/Transforms/Attributor/reduced/openmp_opt_constant_type_crash.ll
+34 -10 llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+3 -3 llvm/test/Transforms/OpenMP/single_threaded_execution.ll
+71 -26 total (3 files)

LLVM/project 77879b4 llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Fold constant offsets into named barrier addresses

Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.
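To make "folding the offset into the address" concrete, here is a toy model assuming a global address is a symbol plus a constant byte offset: when folding is legal, `base + C` becomes a single address whose offset can be emitted as a relocation addend; otherwise a separate add would be needed. `GlobalAddr`, `tryFoldOffset`, and `relocExpr` are hypothetical and do not reflect the real `isOffsetFoldingLegal` hook's signature; the offsets in the example are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of a global-address operand: a symbol plus a byte offset.
struct GlobalAddr {
  std::string Symbol;
  std::int64_t Offset = 0;
};

// Fold a constant added to the address into the address itself, when legal.
// If not legal, the caller would have to emit a separate runtime add.
bool tryFoldOffset(GlobalAddr &Addr, std::int64_t Const, bool FoldingLegal) {
  if (!FoldingLegal)
    return false;
  Addr.Offset += Const;
  return true;
}

// Render the address as a relocation expression, e.g. "bars+16".
std::string relocExpr(const GlobalAddr &Addr) {
  if (Addr.Offset == 0)
    return Addr.Symbol;
  return Addr.Symbol + (Addr.Offset > 0 ? "+" : "") + std::to_string(Addr.Offset);
}
```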

Change-Id: I639bc723eb001573585cc05d0ad19f2773054f21
Assisted-by: Cursor
Delta File
+11 -5 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+12 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2 -5 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+25 -11 total (3 files)

LLVM/project 261d748 llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

[AMDGPU] Pre-commit test for constant-offset named barrier signal_var

A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.

Change-Id: I7cea0dd64d050eb3e2143841e7136355cbb3bc50
Assisted-by: Cursor
Delta File
+119 -0 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+119 -0 total (1 file)

LLVM/project 86184ab llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp SIISelLowering.cpp, llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep-object-linking.ll s-barrier-signal-var-gep.ll

[AMDGPU] Fold constant offsets into named barrier addresses

Allow isOffsetFoldingLegal to fold a constant offset into an LDS
named-barrier global, and include the node offset when materializing the
LDS address in LowerGlobalAddress. s_barrier_signal_var on a GEP'd named
barrier now selects the immediate form, matching a bare global and GlobalISel.
With object linking the offset folds into the relocation addend.

Change-Id: Ie05b8c8cd127604ff174c423a74340fd2de4e405
Assisted-by: Cursor
Delta File
+11 -5 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+12 -1 llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+2 -2 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep-object-linking.ll
+1 -2 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+26 -10 total (4 files)

LLVM/project b820eb7 llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll s-barrier-signal-var-gep-object-linking.ll

[AMDGPU] Pre-commit test for constant-offset named barrier signal_var

A GEP into a named-barrier array (&bars[1]) lowers s_barrier_signal_var to
the dynamic m0 form on SelectionDAG, unlike the bare global and GlobalISel.
With object linking it emits a runtime add of the offset instead of folding
it into the relocation addend.

Change-Id: I59f0e6fe6a72b4c96c8efb926610f7f2d3833e38
Assisted-by: Cursor
Delta File
+59 -0 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+40 -0 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep-object-linking.ll
+99 -0 total (2 files)

LLVM/project d853c05 clang/include/clang/CIR/Dialect/Builder CIRBaseBuilder.h, clang/lib/CIR/CodeGen CIRGenBuiltin.cpp CIRGenExpr.cpp

[CIR] Add support for __builtin_nontemporal_store and __builtin_nontemporal_load (#197872)

Add nontemporal attribute to cir.load and cir.store ops.
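For reference, a small usage sketch of the two builtins, which are Clang extensions (`__builtin_nontemporal_store(value, ptr)` and `__builtin_nontemporal_load(ptr)`). The `__has_builtin` guard and plain-copy fallback are added here so the sketch also builds with compilers that lack them, and `copyNontemporal` is a hypothetical helper. Per the commit, ClangIR can now represent such calls as `cir.load`/`cir.store` with a nontemporal attribute; the sketch itself does not show CIR output.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__has_builtin)
#if __has_builtin(__builtin_nontemporal_store) && __has_builtin(__builtin_nontemporal_load)
#define HAVE_NONTEMPORAL_BUILTINS 1
#endif
#endif

// Copy N floats, hinting that the data will not be reused soon (streaming).
void copyNontemporal(float *Dst, const float *Src, std::size_t N) {
  assert((N == 0 || (Dst != nullptr && Src != nullptr)) && "null buffer");
  for (std::size_t I = 0; I < N; ++I) {
#ifdef HAVE_NONTEMPORAL_BUILTINS
    float V = __builtin_nontemporal_load(&Src[I]);
    __builtin_nontemporal_store(V, &Dst[I]);
#else
    Dst[I] = Src[I]; // fallback: ordinary load and store
#endif
  }
}
```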
Delta File
+77 -0 clang/test/CIR/CodeGenBuiltins/builtin-nontemporal.cpp
+12 -8 clang/lib/CIR/Dialect/Transforms/FlattenCFG.cpp
+18 -2 clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp
+8 -11 clang/lib/CIR/CodeGen/CIRGenExpr.cpp
+9 -7 clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+7 -5 clang/lib/CIR/CodeGen/CIRGenBuilder.h
+131 -33 (shown above), 10 files not shown
+161 -47 total (16 files)

LLVM/project 1cbfe8b llvm/include/llvm/IR GlobalValue.h, llvm/include/llvm/Transforms/Utils AssignGUID.h

Reland #184065
Delta File
+61 -17 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+45 -30 llvm/lib/LTO/LTO.cpp
+64 -2 llvm/lib/IR/Globals.cpp
+49 -3 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+45 -5 llvm/include/llvm/IR/GlobalValue.h
+49 -0 llvm/include/llvm/Transforms/Utils/AssignGUID.h
+313 -57 (shown above), 120 files not shown
+872 -416 total (126 files)

LLVM/project 9b228b5 llvm/lib/Transforms/IPO ThinLTOBitcodeWriter.cpp WholeProgramDevirt.cpp, llvm/test/ThinLTO/X86 devirt_function_alias2.ll

[CFI] Create an external linkage alias instead of promoting internals
Delta File
+20 -33 llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
+20 -5 llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+10 -7 llvm/test/Transforms/ThinLTOBitcodeWriter/comdat.ll
+16 -0 llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+6 -4 llvm/test/ThinLTO/X86/devirt_function_alias2.ll
+4 -2 llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
+76 -51 (shown above), 3 files not shown
+83 -56 total (9 files)

LLVM/project f519bd9 llvm/lib/IR Verifier.cpp, llvm/test/Verifier memprof-metadata-bad.ll

[Verifier] Require !callsite with !memprof metadata (#205053)

Fixes: https://github.com/llvm/llvm-project/issues/181237
Delta File
+10 -6 llvm/test/Verifier/memprof-metadata-bad.ll
+3 -0 llvm/lib/IR/Verifier.cpp
+13 -6 total (2 files)

LLVM/project 8995486 llvm/include/llvm/IR IntrinsicsRISCV.td, llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVInstrInfoP.td

[RISCV][P-ext] packed exchanged add/sub codegen (#203473)

Wire up the already-defined exchanged add/sub instructions
pas/psa/psas/pssa/paas/pasa with llvm.riscv.* intrinsics and isel
patterns.
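A scalar reference model can help when reading the isel tests. This sketch assumes 16-bit lanes in a 32-bit register and borrows the cross add/subtract lane assignment from earlier P-extension drafts (the high lane pairs with rs2's low half, the low lane with rs2's high half). The exact semantics of `pas`/`psa` in the current specification are an assumption here, `pasModel`/`psaModel` are hypothetical names, and the saturating and averaging variants (`psas`, `pssa`, `paas`, `pasa`) are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Split a 32-bit register into its two 16-bit lanes, and join them back.
struct Lanes { std::uint16_t Lo, Hi; };
inline Lanes split(std::uint32_t R) {
  return {static_cast<std::uint16_t>(R & 0xFFFF), static_cast<std::uint16_t>(R >> 16)};
}
inline std::uint32_t join(std::uint16_t Lo, std::uint16_t Hi) {
  return (static_cast<std::uint32_t>(Hi) << 16) | Lo;
}

// Assumed "exchanged" add/sub: rs2's halves are swapped before the lane-wise op.
// Arithmetic wraps modulo 2^16 (non-saturating forms only).
std::uint32_t pasModel(std::uint32_t Rs1, std::uint32_t Rs2) {
  Lanes A = split(Rs1), B = split(Rs2);
  auto Hi = static_cast<std::uint16_t>(A.Hi + B.Lo); // high lane: add
  auto Lo = static_cast<std::uint16_t>(A.Lo - B.Hi); // low lane: subtract
  return join(Lo, Hi);
}

std::uint32_t psaModel(std::uint32_t Rs1, std::uint32_t Rs2) {
  Lanes A = split(Rs1), B = split(Rs2);
  auto Hi = static_cast<std::uint16_t>(A.Hi - B.Lo); // high lane: subtract
  auto Lo = static_cast<std::uint16_t>(A.Lo + B.Hi); // low lane: add
  return join(Lo, Hi);
}
```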
Delta File
+174 -0 llvm/test/CodeGen/RISCV/rvp-simd-64.ll
+68 -2 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+54 -0 llvm/test/CodeGen/RISCV/rvp-simd-32.ll
+24 -0 llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+8 -0 llvm/include/llvm/IR/IntrinsicsRISCV.td
+328 -2 total (5 files)

LLVM/project 677d378 llvm/lib/Transforms/IPO ThinLTOBitcodeWriter.cpp WholeProgramDevirt.cpp, llvm/test/ThinLTO/X86 devirt_function_alias2.ll

[CFI] Create an external linkage alias instead of promoting internals
Delta File
+20 -33 llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
+20 -5 llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+10 -7 llvm/test/Transforms/ThinLTOBitcodeWriter/comdat.ll
+16 -0 llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+6 -4 llvm/test/ThinLTO/X86/devirt_function_alias2.ll
+4 -2 llvm/test/Transforms/ThinLTOBitcodeWriter/split-vfunc-internal.ll
+76 -51 (shown above), 3 files not shown
+83 -56 total (9 files)

LLVM/project 4bf5379 llvm/include/llvm/IR GlobalValue.h, llvm/include/llvm/Transforms/Utils AssignGUID.h

Reland #184065
Delta File
+61 -17 llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+45 -30 llvm/lib/LTO/LTO.cpp
+64 -2 llvm/lib/IR/Globals.cpp
+49 -3 llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+45 -5 llvm/include/llvm/IR/GlobalValue.h
+49 -0 llvm/include/llvm/Transforms/Utils/AssignGUID.h
+313 -57 (shown above), 119 files not shown
+858 -410 total (125 files)

LLVM/project b0bd945 compiler-rt/lib/instrumentor-examples/precision-analysis precision_analysis_runtime.cpp CMakeLists.txt, compiler-rt/test/instrumentor-examples precision_fp16_overflow.c precision_detailed.c

[Instrumentor] Add runtime examples: [2/N] A FP precision analysis

Second example:
Check all floating-point operations and track whether they could be done
at lower precision.
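Two checks of the kind such an analysis might perform, sketched under the assumption that values are observed as `double`: whether a value survives a round trip through `float` exactly, and whether it lies beyond the largest finite binary16 value (65504), which is what an fp16 overflow test would care about. `fitsInFloat` and `exceedsHalfRange` are hypothetical; the actual logic in `precision_analysis_runtime.cpp` is not shown here and may differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Largest finite IEEE 754 binary16 (half-precision) value.
constexpr double kHalfMax = 65504.0;

// True if X is exactly representable as a float (lossless round trip).
bool fitsInFloat(double X) {
  if (std::isnan(X) || std::isinf(X))
    return true; // float has NaN and infinities too
  if (std::fabs(X) > std::numeric_limits<float>::max())
    return false; // beyond float range: not exact, so skip the conversion
  return static_cast<double>(static_cast<float>(X)) == X;
}

// True if a finite X lies beyond the largest finite binary16 value.
bool exceedsHalfRange(double X) {
  return std::isfinite(X) && std::fabs(X) > kHalfMax;
}
```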

Partially developed by Claude (AI), tested and verified by me.
Delta File
+616 -0 compiler-rt/lib/instrumentor-examples/precision-analysis/precision_analysis_runtime.cpp
+92 -0 compiler-rt/test/instrumentor-examples/precision_fp16_overflow.c
+76 -0 compiler-rt/test/instrumentor-examples/precision_detailed.c
+68 -0 compiler-rt/lib/instrumentor-examples/precision-analysis/CMakeLists.txt
+66 -0 compiler-rt/test/instrumentor-examples/precision_mixed.c
+56 -0 compiler-rt/test/instrumentor-examples/simple_precision.c
+974 -0 (shown above), 5 files not shown
+1,051 -0 total (11 files)

LLVM/project dfe8b22 llvm/lib/Target/BPF BPFSelectionDAGInfo.cpp

[BPF] Increase BPFMaxStoresPerMemFunc from 128 to 192 (#205222)

With commits [1] and [2], memory operations like memcpy/memmove lower to
a sequence of loads/stores whose width is the minimum of the source and
destination alignment, and the store count is bounded by
BPFMaxStoresPerMemFunc. For 1-byte alignment, the maximum copy length
that can be inlined is therefore 128 bytes.

This may regress cases that were previously inlined. Consider a memcpy with
src alignment 8, dst alignment 1 and size 136. After [1]/[2], the store
width is the minimum alignment (1 byte), so the store count is 136,
which exceeds the 128 limit and the copy falls back. Before [1]/[2], the
store count was computed with a fixed 8-byte unit regardless of the
actual alignment (each unit expands to 8 one-byte stores when the
minimum alignment is 1), so the total count was only 17 (136/8 = 17 < 128)
and the copy was inlined.

Raise the limit from 128 to 192 to mitigate. Alternatively, users can
increase alignment to avoid the regression.
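The arithmetic in the example can be reproduced with a back-of-the-envelope model. It assumes the size is a multiple of the store width (tail handling is ignored) and that a copy is inlined when the store count is at most the limit, consistent with a 128-byte maximum at 1-byte alignment. `storesAfter`, `storesBefore`, and `inlines` are hypothetical helpers, not BPF backend code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Store count after [1]/[2]: width = min(src, dst) alignment.
// Simplification: Size is assumed to be a multiple of that width.
std::uint64_t storesAfter(std::uint64_t Size, std::uint64_t SrcAlign,
                          std::uint64_t DstAlign) {
  std::uint64_t Width = std::min(SrcAlign, DstAlign);
  assert(Width != 0 && Size % Width == 0 && "simplified model");
  return Size / Width;
}

// Count before [1]/[2]: fixed 8-byte units regardless of actual alignment.
std::uint64_t storesBefore(std::uint64_t Size) { return Size / 8; }

// A copy is inlined when its store count does not exceed the limit.
bool inlines(std::uint64_t Stores, std::uint64_t Limit) { return Stores <= Limit; }
```

Under this model, raising the limit moves the largest copy that can be inlined at 1-byte alignment from 128 to 192 bytes, which covers the 136-byte example.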

    [2 lines not shown]
Delta File
+1 -1 llvm/lib/Target/BPF/BPFSelectionDAGInfo.cpp
+1 -1 total (1 file)

LLVM/project 0dddce7 compiler-rt/lib/instrumentor-examples/precision-analysis precision_analysis_runtime.cpp CMakeLists.txt, compiler-rt/test/instrumentor-examples precision_fp16_overflow.c precision_detailed.c

[Instrumentor] Add runtime examples: [2/N] A FP precision analysis

Second example:
Check all floating-point operations and track whether they could be done
at lower precision.

Partially developed by Claude (AI), tested and verified by me.
Delta File
+569 -0 compiler-rt/lib/instrumentor-examples/precision-analysis/precision_analysis_runtime.cpp
+91 -0 compiler-rt/test/instrumentor-examples/precision_fp16_overflow.c
+76 -0 compiler-rt/test/instrumentor-examples/precision_detailed.c
+68 -0 compiler-rt/lib/instrumentor-examples/precision-analysis/CMakeLists.txt
+66 -0 compiler-rt/test/instrumentor-examples/precision_mixed.c
+56 -0 compiler-rt/test/instrumentor-examples/simple_precision.c
+926 -0 (shown above), 5 files not shown
+1,003 -0 total (11 files)

LLVM/project 1ad5dfd compiler-rt/lib/instrumentor-examples instrumentor_runtime.h README.md, compiler-rt/lib/instrumentor-examples/flop-counter flop_counter_runtime.cpp README.md

[Instrumentor] Add runtime examples: [1/N] A flop counter

This adds an instrumentor-examples folder to compiler-rt to showcase
use cases of the Instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.
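A toy illustration of counting flops via instrumentation: the instrumented program calls a runtime hook once per floating-point operation, and the runtime accumulates a counter. `toy_count_flop`, `toy_flops`, and `axpyInstrumented` are hypothetical; the real hook names and signatures in `flop_counter_runtime.cpp` and the Instrumentor pass are not reproduced here, and the hook calls are written by hand rather than inserted by a pass.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical runtime state: one global counter, safe to bump from any thread.
static std::atomic<std::uint64_t> FlopCount{0};

// Hypothetical hook an instrumentation pass could call after each FP operation.
extern "C" void toy_count_flop() {
  FlopCount.fetch_add(1, std::memory_order_relaxed);
}

std::uint64_t toy_flops() { return FlopCount.load(std::memory_order_relaxed); }

// What instrumented code for `A * X + Y` might look like: two flops, two calls.
double axpyInstrumented(double A, double X, double Y) {
  double T = A * X;
  toy_count_flop();
  double R = T + Y;
  toy_count_flop();
  return R;
}
```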

Partially developed by Claude (AI), tested and verified by me.
Delta File
+295 -0 compiler-rt/lib/instrumentor-examples/instrumentor_runtime.h
+180 -0 compiler-rt/lib/instrumentor-examples/flop-counter/flop_counter_runtime.cpp
+107 -0 compiler-rt/lib/instrumentor-examples/flop-counter/README.md
+83 -4 llvm/lib/Transforms/IPO/Instrumentor.cpp
+74 -0 compiler-rt/test/instrumentor-examples/lit.cfg.py
+72 -0 compiler-rt/lib/instrumentor-examples/README.md
+811 -4 (shown above), 12 files not shown
+1,123 -5 total (18 files)

LLVM/project 617fad6 llvm/test/CodeGen/AMDGPU s-barrier-signal-var-gep.ll

Apply suggestion from @chinmaydd
Delta File
+0 -3 llvm/test/CodeGen/AMDGPU/s-barrier-signal-var-gep.ll
+0 -3 total (1 file)

LLVM/project 52e0005 clang/include/clang/Basic DiagnosticSemaKinds.td, clang/include/clang/Sema SemaOpenCL.h

[OpenCL] Warn if filter_mode is linear in read_image{i|ui} (#204086)

Per the OpenCL specification:
The read_image{i|ui} calls support a nearest filter only. The
filter_mode specified in sampler must be set to CLK_FILTER_NEAREST;
otherwise the values returned are undefined.

Warn users when they apply a linear filter accidentally.
Addresses https://github.com/intel/compute-runtime/issues/379#issuecomment-4592083032

Assisted-by: Claude Sonnet 4.6
Delta File
+80 -0 clang/test/SemaOpenCL/read-image-integer-linear-filter.cl
+48 -0 clang/lib/Sema/SemaOpenCL.cpp
+7 -0 clang/lib/Sema/SemaExpr.cpp
+2 -0 clang/include/clang/Basic/DiagnosticSemaKinds.td
+2 -0 clang/include/clang/Sema/SemaOpenCL.h
+139 -0 total (5 files)

LLVM/project 3bb61e8 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll, llvm/test/CodeGen/RISCV clmul.ll

Merge branch 'main' into users/ziqingluo/PR-179150798
Delta File
+25,784 -36,416 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+12,227 -23,140 llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+12,991 -3,310 llvm/test/MC/AMDGPU/gfx13_asm_vop3_dpp16.s
+11,856 -3,719 llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s
+4,004 -11,142 llvm/test/CodeGen/RISCV/clmul.ll
+6,940 -6,782 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+73,802 -84,509 (shown above), 2,934 files not shown
+198,728 -209,335 total (2,940 files)

LLVM/project c7e815b clang/lib/CodeGen CGExpr.cpp, clang/test/CodeGenHLSL/BasicFeatures OutArgLifetime.hlsl

[HLSL] Emit lifetime.start before copy-in for inout parameters (#191917)

For inout parameters, Clang was emitting lifetime.start after the
copy-in store that initializes the temporary. Per LLVM's lifetime
semantics, any access to memory outside its lifetime is undefined
behavior, so the copy-in store was technically UB and the value was
undefined after lifetime.start.

Move EmitLifetimeStart into EmitHLSLOutArgLValues so that it is emitted
before EmitInitializationToLValue, putting the copy-in store within the
lifetime of the temporary.

---------

Co-authored-by: Alexandre Isoard <alexandre.isoard at amd.com>
Co-authored-by: Deric C. <cheung.deric at gmail.com>
Delta File
+91 -0 clang/test/CodeGenHLSL/BasicFeatures/OutArgLifetime.hlsl
+5 -2 clang/lib/CodeGen/CGExpr.cpp
+96 -2 total (2 files)

LLVM/project 38df8cb lldb/test/API/macosx/deny-attach main.c TestDenyAttach.py, lldb/tools/debugserver/source/MacOSX MachProcess.mm

[lldb] Survive ptrace(PT_DENY_ATTACH) when attaching (#204688) (#205198)

A process can opt out of being debugged with ptrace(PT_DENY_ATTACH). The
XNU kernel enforces this by delivering SIGSEGV to the *attaching*
process while it is still inside the ptrace(PT_ATTACHEXC) syscall. This
means debugserver gets killed before it can inspect the result. LLDB
only sees the dropped connection ("error: attach failed: lost
connection").

The condition can't be detected up front: the target's P_LNOATTACH flag
is not exposed to userspace. To work around this, install a temporary
SIGSEGV handler around the ptrace(PT_ATTACHEXC) call in AttachForDebug
and siglongjmp back out if it fires, turning the fatal signal into an
EPERM that propagates to lldb as a clear message:

```
error: attach failed: cannot attach to process N because it has
disabled debugging via ptrace(PT_DENY_ATTACH)
```
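A minimal POSIX sketch of the guard pattern described above: install a temporary SIGSEGV handler, `siglongjmp` out of it if it fires, restore the old handler, and report EPERM. Here a caller-supplied callable stands in for the `ptrace(PT_ATTACHEXC)` call in `AttachForDebug`, and `runGuarded` is a hypothetical name. Because the jump buffer is a global, the sketch is neither thread-safe nor reentrant.

```cpp
#include <cassert>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>

namespace {
sigjmp_buf GuardJmp; // global: the sketch is neither thread-safe nor reentrant

void onSegv(int) { siglongjmp(GuardJmp, 1); }
} // namespace

// Run Fn() under a temporary SIGSEGV handler. If the signal fires inside Fn,
// jump back here and report EPERM instead of dying; otherwise return Fn().
template <typename F> int runGuarded(F Fn) {
  struct sigaction NewAct = {}, OldAct = {};
  NewAct.sa_handler = onSegv;
  sigemptyset(&NewAct.sa_mask);
  if (sigaction(SIGSEGV, &NewAct, &OldAct) != 0)
    return errno;
  int Result;
  if (sigsetjmp(GuardJmp, /*savemask=*/1) == 0)
    Result = Fn();
  else
    Result = EPERM; // landed here via siglongjmp from onSegv
  sigaction(SIGSEGV, &OldAct, nullptr); // restore the previous handler
  return Result;
}
```

A test can simulate the kernel's behavior by calling `raise(SIGSEGV)` inside the callable; the guard then returns EPERM instead of terminating the process.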

    [7 lines not shown]
Delta File
+87 -5 lldb/tools/debugserver/source/MacOSX/MachProcess.mm
+60 -0 lldb/test/API/macosx/deny-attach/main.c
+36 -0 lldb/test/API/macosx/deny-attach/TestDenyAttach.py
+3 -0 lldb/test/API/macosx/deny-attach/Makefile
+186 -5 total (4 files)

LLVM/project ff4bc6e clang/docs ReleaseNotes.rst, clang/lib/Sema SemaExpr.cpp

[Clang] Fix crash when comparing fixed point type with BitInt (#199912)

Fixes #196948

Add checks in `handleFixedPointConversion` that reject fixed-point/BitInt
comparisons.

Clang now emits an error instead of crashing.

---------

Co-authored-by: cry <2091136672 at foxmail.com>
Delta File
+6 -1 clang/test/SemaCXX/ext-int.cpp
+4 -0 clang/lib/Sema/SemaExpr.cpp
+1 -0 clang/docs/ReleaseNotes.rst
+11 -1 total (3 files)

LLVM/project febe8f0 clang/test/Analysis/Scalable/PointerFlow entity-name-no-conflict.cpp benign-entity-name-conflict.cpp

Rename 'benign-entity-name-conflict.cpp' to 'entity-name-no-conflict.cpp',
because the name conflict is a USR generation bug, even though the
erroneous behavior is benign in this example.
Delta File
+27 -0 clang/test/Analysis/Scalable/PointerFlow/entity-name-no-conflict.cpp
+0 -24 clang/test/Analysis/Scalable/PointerFlow/benign-entity-name-conflict.cpp
+27 -24 total (2 files)

LLVM/project bc4aadb lldb/source/Plugins/ScriptInterpreter/Lua LuaState.cpp

[lldb] Fix LuaState after #205210 (#205219)
Delta File
+2 -2 lldb/source/Plugins/ScriptInterpreter/Lua/LuaState.cpp
+2 -2 total (1 file)

LLVM/project 0b46f55 llvm/docs LangRef.md

Migrate 11 tables back from list-table to regular markdown tables
Delta File
+80 -202 llvm/docs/LangRef.md
+80 -202 total (1 file)

LLVM/project a6986f0 flang/lib/Semantics mod-file.cpp, flang/test/Semantics modfile84.f90

[flang][cuda][openacc] Emit an error when CUDA symbols are imported with CUDA disabled (#205207)
Delta File
+29 -0 flang/lib/Semantics/mod-file.cpp
+17 -0 flang/test/Semantics/modfile84.f90
+46 -0 total (2 files)

LLVM/project 2d0b2fb lldb/source/Plugins/Process/gdb-remote ProcessGDBRemote.cpp ProcessGDBRemote.h

Revert "[lldb][Windows] Remember server's primary stop thread on gdb-remote stops" (#205220)

Reverts llvm/llvm-project#203525 because it breaks TestRealDefinition.py
Delta File
+0 -9 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+0 -3 lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+0 -12 total (2 files)

LLVM/project 97394f0 llvm/docs LangRef.md

[docs] Rewrite LangRef.md as Markdown
Delta File
+14,624 -17,431 llvm/docs/LangRef.md
+14,624 -17,431 total (1 file)

LLVM/project c2bc2ac compiler-rt/lib/instrumentor-examples/flop-counter rt.h instrumentor_runtime.h, compiler-rt/test/instrumentor-examples lit.cfg.py

[Instrumentor] Add runtime examples: [1/N] A flop counter

This adds an instrumentor-examples folder to compiler-rt to showcase
use cases of the Instrumentor. The initial example is a program that,
via instrumentation, counts the number of flops performed.

Partially developed by Claude (AI), tested and verified by me.
Delta File
+295 -0 compiler-rt/lib/instrumentor-examples/flop-counter/rt.h
+295 -0 compiler-rt/lib/instrumentor-examples/flop-counter/instrumentor_runtime.h
+181 -0 compiler-rt/lib/instrumentor-examples/flop-counter/flop_counter_runtime.cpp
+107 -0 compiler-rt/lib/instrumentor-examples/flop-counter/README.md
+78 -4 llvm/lib/Transforms/IPO/Instrumentor.cpp
+70 -0 compiler-rt/test/instrumentor-examples/lit.cfg.py
+1,026 -4 (shown above), 13 files not shown
+1,406 -5 total (19 files)