LLVM/project 3210031 clang/test/Instrumentor StackUsageRT.cpp StackUsageRT.json, llvm/include/llvm/Transforms/IPO Instrumentor.h

[Instrumentor] Add Alloca and Function support; stack usage example

This adds support for alloca instrumentation and function pre/post
instrumentation. Alloca support follows load/store support directly.
Functions require special care to determine the insertion points.

Together, these changes showcase how the stack high watermark can be
profiled; see InstrumentorStackUsage.cpp.
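
The sketch below shows, in plain C++, how function pre/post hooks and alloca hooks could combine into a stack high-watermark profile. It is not the Instrumentor's runtime interface. The hook names (`hypo_function_pre`, `hypo_alloca`, `hypo_function_post`) are hypothetical. The sketch also assumes only alloca sizes are counted, not the real stack pointer, and that the post hook runs on every exit path. Guaranteeing that last point is why function insertion points need special care.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical runtime state: bytes reported by allocas in each active frame.
static std::vector<std::size_t> FrameBytes;
static std::size_t CurrentBytes = 0;
static std::size_t HighWatermark = 0;

// Hypothetical hook called on function entry ("pre" instrumentation).
void hypo_function_pre() { FrameBytes.push_back(0); }

// Hypothetical hook called for each alloca with its size in bytes.
void hypo_alloca(std::size_t Size) {
  assert(!FrameBytes.empty() && "alloca outside an instrumented function");
  FrameBytes.back() += Size;
  CurrentBytes += Size;
  if (CurrentBytes > HighWatermark)
    HighWatermark = CurrentBytes;
}

// Hypothetical hook called before every return ("post" instrumentation).
void hypo_function_post() {
  assert(!FrameBytes.empty() && "unbalanced function post hook");
  CurrentBytes -= FrameBytes.back();
  FrameBytes.pop_back();
}

std::size_t hypo_current_bytes() { return CurrentBytes; }
std::size_t hypo_high_watermark() { return HighWatermark; }
```

Consider a caller that allocates 16 bytes, calls a callee that allocates 64 bytes, and then allocates 8 more after the call returns. It peaks at 80 bytes, even though only 24 bytes are live just before it returns.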
Delta  File
+294 -7  llvm/lib/Transforms/IPO/Instrumentor.cpp
+120 -8  llvm/include/llvm/Transforms/IPO/Instrumentor.h
+60 -0  clang/test/Instrumentor/StackUsageRT.cpp
+59 -0  llvm/test/Instrumentation/Instrumentor/default_config.json
+57 -0  llvm/test/Instrumentation/Instrumentor/alloca_and_function.ll
+54 -0  clang/test/Instrumentor/StackUsageRT.json
+644 -15  2 files not shown
+683 -15  8 files

LLVM/project 5f47bf8 llvm/include/llvm/Transforms/IPO Instrumentor.h InstrumentorConfigFile.h, llvm/lib/Passes PassBuilderPipelines.cpp

[Instrumentor] Use the pass builder's FileSystem for reading files

In the IO sandbox, the old read calls caused CI to fail. This change
uses the PassBuilder's FileSystem to read files, the same way other
passes read files from disk (during CI).
Delta  File
+16 -5  llvm/lib/Transforms/IPO/InstrumentorConfigFile.cpp
+12 -1  llvm/lib/Transforms/IPO/Instrumentor.cpp
+7 -3  llvm/include/llvm/Transforms/IPO/Instrumentor.h
+2 -2  llvm/lib/Passes/PassBuilderPipelines.cpp
+1 -1  llvm/include/llvm/Transforms/IPO/InstrumentorConfigFile.h
+38 -12  5 files

LLVM/project c0792f3 clang/test/CXX/drs cwg9xx.cpp, clang/www cxx_dr_status.html

[clang][NFC] Mark CWG941 as implemented and add a test (#197202)

[CWG941](https://wg21.link/cwg941) allowed specializing deleted function
templates. Clang accepted this between 2.7 and 2.9, regressed and
started emitting redefinition errors between 3.0 and 3.8, then went back
to accepting in 3.9: https://godbolt.org/z/GKnf9je7j. I've marked it as
implemented since 3.9.
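
The following is a small example of the pattern CWG941 permits. A function template is deleted, and one explicit specialization provides a definition, so calls deduce to only the specialized type. The names `describe` and the doubling body are illustrative, not taken from the added test.

```cpp
#include <cassert>

// The primary template is deleted: any call that selects it is ill-formed.
template <typename T> int describe(T) = delete;

// CWG941: explicitly specializing a deleted function template is allowed.
template <> int describe<int>(int V) { return V * 2; }

// describe(1.0) would not compile: T = double selects the deleted primary.
```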
Delta  File
+28 -0  clang/test/CXX/drs/cwg9xx.cpp
+1 -1  clang/www/cxx_dr_status.html
+29 -1  2 files

LLVM/project 6008312 llvm/lib/Target/PowerPC PPCISelLowering.cpp PPCInstrInfo.td

Also add pattern for LWAT.
Delta  File
+0 -53  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+23 -10  llvm/lib/Target/PowerPC/PPCInstrInfo.td
+0 -11  llvm/lib/Target/PowerPC/PPCInstr64Bit.td
+23 -74  3 files

LLVM/project 2cb2dd4 llvm/test/Object zos-archive-read.test

Fix z/OS archive test failure on macOS (#197290)

Fixes failures introduced by #187110.
- https://lab.llvm.org/buildbot/#/builders/190/builds/42544
- https://lab.llvm.org/buildbot/#/builders/23/builds/19989
- https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8682002400978909393/+/u/clang/test/stdout

The original test hit a "file too small" error on macOS before reaching
the expected "truncated or malformed archive" error. This patch updates
the test to generate a valid archive and then truncate it to 28 bytes.
That is large enough to pass the initial size check but too small for
the 60-byte member header, so the test now hits the expected failure on
all platforms.
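
The arithmetic behind the 28-byte choice can be modeled as two checks. This is a hypothetical sketch, not the reader's actual code. It assumes an 8-byte global magic followed by fixed 60-byte member headers, as in the classic ar layout. It also uses the magic length as the initial size threshold, which is an assumption, since the real z/OS check is not shown here. After the magic, 28 - 8 = 20 bytes remain, which is fewer than 60.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: 8-byte global magic, then 60-byte member headers.
constexpr std::size_t MagicSize = 8;
constexpr std::size_t MemberHeaderSize = 60;

enum class ArchiveCheck { TooSmall, TruncatedHeader, HeaderFits };

// Hypothetical two-stage check: the whole-file size check first, then
// whether the bytes after the magic can hold the first member header.
ArchiveCheck checkArchiveSize(std::size_t FileSize) {
  if (FileSize < MagicSize)
    return ArchiveCheck::TooSmall;        // "file too small"-style error
  if (FileSize - MagicSize < MemberHeaderSize)
    return ArchiveCheck::TruncatedHeader; // "truncated or malformed archive"
  return ArchiveCheck::HeaderFits;
}
```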
Delta  File
+4 -2  llvm/test/Object/zos-archive-read.test
+4 -2  1 file

LLVM/project d6b1219 utils/bazel/llvm-project-overlay/libc BUILD.bazel

[Bazel] Port 8c187665e883e7c37ddff733ea50304d093dc9f4 (#197307)
Delta  File
+1 -0  utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+1 -0  1 file

LLVM/project ed442c5 llvm/test/TableGen sort.td

improve naming, sort defs in test

Created using spr 1.3.7
Delta  File
+35 -35  llvm/test/TableGen/sort.td
+35 -35  1 file

LLVM/project 857f558 llvm/test/TableGen sort.td

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta  File
+35 -35  llvm/test/TableGen/sort.td
+35 -35  1 file

LLVM/project ce6462c llvm/test/TableGen sort.td

improve naming, sort defs in test

Created using spr 1.3.7
Delta  File
+35 -35  llvm/test/TableGen/sort.td
+35 -35  1 file

LLVM/project b296ee7 llvm/lib/TableGen Record.cpp TGParser.cpp

clang-format

Created using spr 1.3.7
Delta  File
+15 -7  llvm/lib/TableGen/Record.cpp
+5 -4  llvm/lib/TableGen/TGParser.cpp
+20 -11  2 files

LLVM/project 6f0b2fa llvm/lib/TableGen Record.cpp TGParser.cpp

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.7

[skip ci]
Delta  File
+15 -7  llvm/lib/TableGen/Record.cpp
+5 -4  llvm/lib/TableGen/TGParser.cpp
+20 -11  2 files

LLVM/project ecd1baf llvm/lib/TableGen Record.cpp TGParser.cpp

clang-format

Created using spr 1.3.7
Delta  File
+15 -7  llvm/lib/TableGen/Record.cpp
+5 -4  llvm/lib/TableGen/TGParser.cpp
+20 -11  2 files

LLVM/project f595c61 llvm/lib/CodeGen RegisterScavenging.cpp, llvm/test/CodeGen/SystemZ scavenge-clobbered-reg.mir

[RegisterScavenging] Respect early-clobber defs when scavenging registers (#197120)

When scavenging registers backwards for virtual registers introduced
during frame index elimination, the register scavenger was ignoring
early-clobber constraints on the instruction using the scavenged
register. This could lead to assigning a virtual register to a physical
register marked as early-clobber output, violating the constraint that
early-clobber outputs cannot overlap with inputs.

This change inspects `RestoreAfter` to determine whether the scavenged
register will be used by the instruction pointed to by MBBI. If so, it
removes any registers that instruction defines as early-clobber from the
scavengeable set.

This also adds a test checking that such early-clobber defs are respected
in a case where they otherwise would not be.
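
The filtering step can be illustrated with a toy model in plain C++. `scavenge`, `InstrOperands`, and the `UsedByMI` flag are hypothetical stand-ins and not LLVM's `RegScavenger` API. In the real pass, the "used by MI" information comes from `RestoreAfter`. The model picks the lowest free register and, when the scavenged register feeds MI, first drops MI's early-clobber defs from the candidates.

```cpp
#include <bitset>
#include <cassert>
#include <optional>
#include <vector>

constexpr unsigned NumRegs = 16; // hypothetical register file size

// Hypothetical, simplified view of one instruction's operands.
struct InstrOperands {
  std::vector<unsigned> EarlyClobberDefs; // physregs defined early-clobber
};

// Returns the lowest-numbered available register. If the scavenged register
// will be read by MI, registers MI defines as early-clobber are excluded,
// because an early-clobber output must not overlap any input.
std::optional<unsigned> scavenge(std::bitset<NumRegs> Available,
                                 const InstrOperands &MI, bool UsedByMI) {
  if (UsedByMI)
    for (unsigned Reg : MI.EarlyClobberDefs)
      Available.reset(Reg);
  for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
    if (Available.test(Reg))
      return Reg;
  return std::nullopt; // nothing free: the real scavenger would spill
}
```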

co-authored-by: @uweigand

---------

Co-authored-by: Matt Arsenault <arsenm2 at gmail.com>
Delta  File
+42 -0  llvm/test/CodeGen/SystemZ/scavenge-clobbered-reg.mir
+10 -0  llvm/lib/CodeGen/RegisterScavenging.cpp
+52 -0  2 files

LLVM/project 6aff962 llvm/docs/TableGen ProgRef.rst, llvm/lib/TableGen Record.cpp TGParser.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta  File
+84 -0  llvm/test/TableGen/sort.td
+54 -1  llvm/lib/TableGen/Record.cpp
+20 -12  llvm/lib/TableGen/TGParser.cpp
+14 -0  llvm/docs/TableGen/ProgRef.rst
+1 -0  llvm/lib/TableGen/TGLexer.cpp
+1 -0  llvm/lib/TableGen/TGLexer.h
+174 -13  1 file not shown
+175 -13  7 files

LLVM/project e7b684b llvm/docs/TableGen ProgRef.rst, llvm/include/llvm/Target Target.td

[𝘀𝗽𝗿] initial version

Created using spr 1.3.7
Delta  File
+84 -0  llvm/test/TableGen/sort.td
+54 -1  llvm/lib/TableGen/Record.cpp
+20 -12  llvm/lib/TableGen/TGParser.cpp
+11 -11  llvm/test/TableGen/aarch64-apple-tuning-features.td
+14 -0  llvm/docs/TableGen/ProgRef.rst
+1 -1  llvm/include/llvm/Target/Target.td
+184 -25  3 files not shown
+187 -25  9 files

LLVM/project 56d6b3f llvm/docs/TableGen ProgRef.rst, llvm/include/llvm/TableGen Record.h

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.7

[skip ci]
Delta  File
+84 -0  llvm/test/TableGen/sort.td
+54 -1  llvm/lib/TableGen/Record.cpp
+20 -12  llvm/lib/TableGen/TGParser.cpp
+14 -0  llvm/docs/TableGen/ProgRef.rst
+1 -0  llvm/lib/TableGen/TGLexer.h
+1 -0  llvm/include/llvm/TableGen/Record.h
+174 -13  1 file not shown
+175 -13  7 files

LLVM/project bdb5d95 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 gep-user-scalable.ll

[SLP] Do not account scalable vectorized users when estimating geps cost

We should not try to widen the scalable users of GEPs: they are not
vectorized, and a scalable vector type cannot be widened.

Fixes #197132

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/197301
Delta  File
+39 -0  llvm/test/Transforms/SLPVectorizer/AArch64/gep-user-scalable.ll
+1 -1  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+40 -1  2 files

LLVM/project 5c80731 flang/lib/Optimizer/CodeGen CodeGen.cpp, flang/test/Fir/CUDA cuda-code-gen.mlir

[flang][cuda] Place box value kernel args in managed memory (#197116)

Example:
```fortran
type deviceArray
  integer, allocatable, dimension(:,:), device :: Arr
end type deviceArray
type(deviceArray), allocatable, dimension(:) :: DA

allocate(DA(2))
allocate(DA(1)%Arr(32,32))
call mykernel<<<1,32>>>(DA(1)%Arr, 32)  ! cudaErrorIllegalAddress
```

In this code, `DA(1)%Arr` is a device allocatable component inside a
managed derived type. The compiler loads the descriptor, reboxes it on
the host stack, and passes it to the kernel. Since `!fir.box` is lowered
to a pointer in LLVM IR, the kernel receives a host-stack pointer it
cannot dereference — causing `cudaErrorIllegalAddress`.

    [11 lines not shown]
Delta  File
+25 -6  flang/lib/Optimizer/CodeGen/CodeGen.cpp
+28 -0  flang/test/Fir/CUDA/cuda-code-gen.mlir
+53 -6  2 files

LLVM/project 171e2bd llvm/lib/CodeGen ShadowStackGCLowering.cpp

Revert "[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering" (#197297)

Reverts llvm/llvm-project#178436. I need to update the tests that I
added for that PR.
Delta  File
+86 -101  llvm/lib/CodeGen/ShadowStackGCLowering.cpp
+86 -101  1 file

LLVM/project 5e4a21a llvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVInstrInfoP.td, llvm/test/CodeGen/RISCV rvp-simd-64.ll rvp-unaligned-load-store.ll

[RISCV][P-ext] Add initial 64-bit support for RV32. (#197093)

Most operations are set to expand. A few operations that were easy to
support using isel patterns have been added. concat_vectors and
extract_subvector are supported in order to allow type legalization to
split 64-bit vectors into 32-bit vectors around the supported
operations.

Loads and stores are custom split into two i32 scalars or two v4i8/v2i16
vectors.

I've added new opcodes to build vectors from, and split vectors into,
two GPRs at function arguments and returns. These are similar to the
BuildPairF64 and SplitF64 nodes we use for RV32D soft float. Long term we
might want to use concat_vectors/build_vector and
extract_subvector/extract_vectorelt.
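
To show the data movement behind these split/build opcodes, the following plain C++ sketch splits a 64-bit value into the two 32-bit halves an RV32 register pair would carry and then reassembles them. The function names are hypothetical and are not the new SelectionDAG opcodes. It assumes LLVM's little-endian lane numbering, where lane i of a v8i8 occupies bits [8i, 8i+8).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical "split" step: one 64-bit value -> (lo, hi) 32-bit GPR values.
std::pair<std::uint32_t, std::uint32_t> splitToGPRPair(std::uint64_t V) {
  return {static_cast<std::uint32_t>(V), static_cast<std::uint32_t>(V >> 32)};
}

// Hypothetical "build" step: (lo, hi) 32-bit GPR values -> one 64-bit value.
std::uint64_t buildFromGPRPair(std::uint32_t Lo, std::uint32_t Hi) {
  return (static_cast<std::uint64_t>(Hi) << 32) | Lo;
}

// Reads byte lane 0-7 of a v8i8 held in a register pair: lanes 0-3 live in
// the low GPR (a v4i8) and lanes 4-7 in the high GPR.
std::uint8_t laneOf(std::uint32_t Lo, std::uint32_t Hi, unsigned Lane) {
  std::uint32_t Word = Lane < 4 ? Lo : Hi;
  return static_cast<std::uint8_t>(Word >> (8 * (Lane % 4)));
}
```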
Delta  File
+3,000 -975  llvm/test/CodeGen/RISCV/rvp-simd-64.ll
+248 -8  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+72 -72  llvm/test/CodeGen/RISCV/rvp-unaligned-load-store.ll
+40 -46  llvm/test/CodeGen/RISCV/calling-conv-p-ext-vector.ll
+84 -0  llvm/lib/Target/RISCV/RISCVInstrInfoP.td
+65 -7  llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+3,509 -1,108  5 files not shown
+3,565 -1,125  11 files

LLVM/project c924ecb llvm/lib/CodeGen ShadowStackGCLowering.cpp

[CodeGen] Use byte offsets and ptradd in ShadowStackGCLowering (#178436)

Replace typed struct GEPs with byte array allocation and ptradd
operations:

1. Track root offsets as byte offsets instead of building typed struct.
2. Use `ComputeFrameLayout` to compute byte offsets based on DataLayout,
   properly accounting for each root's size and alignment.
3. Allocate frame as `[FrameSize x i8]` byte array instead of typed
   struct.
4. Replace all CreateGEP operations with CreatePtrAdd using computed
   offsets.
5. Frame layout unchanged: `[Next ptr | Map ptr | Root 0 | Root 1 | ... | Root N]`
   where each root is placed at its computed aligned offset.
6. Zero out padding between roots with memset for deterministic frame
   contents for GC.
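
The layout steps above can be sketched in plain C++. `computeFrameLayoutSketch` and `RootInfo` are hypothetical and are not the pass's `ComputeFrameLayout`. Root sizes and alignments are passed in directly rather than queried from a DataLayout, alignments are assumed to be powers of two, and the frame header is taken to be two pointers (Next and Map).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-root description (in LLVM this would come from DataLayout).
struct RootInfo {
  std::uint64_t Size;
  std::uint64_t Align; // assumed to be a power of two
};

struct FrameLayout {
  std::vector<std::uint64_t> RootOffsets; // byte offset of each root
  std::uint64_t FrameSize;                // N in the [N x i8] frame alloca
};

static std::uint64_t alignTo(std::uint64_t Offset, std::uint64_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// [Next ptr | Map ptr | Root 0 | ... | Root N]: each root goes at the next
// offset that satisfies its alignment; skipped bytes are padding.
FrameLayout computeFrameLayoutSketch(const std::vector<RootInfo> &Roots,
                                     std::uint64_t PtrSize) {
  FrameLayout L;
  std::uint64_t Offset = 2 * PtrSize; // Next ptr + Map ptr header
  for (const RootInfo &R : Roots) {
    Offset = alignTo(Offset, R.Align);
    L.RootOffsets.push_back(Offset);
    Offset += R.Size;
  }
  L.FrameSize = Offset;
  return L;
}
```

As an example, take 8-byte pointers and roots with (size, align) of (8, 8), (4, 4), and (16, 16). The roots land at offsets 16, 24, and 32, and bytes 28 to 31 are the kind of inter-root padding the patch zeroes with memset.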

Benefits:
- Removes dependency on `getAllocatedType` for building frame struct

    [7 lines not shown]
Delta  File
+101 -86  llvm/lib/CodeGen/ShadowStackGCLowering.cpp
+101 -86  1 file

LLVM/project d48651f mlir/include/mlir/Dialect/AMDGPU/IR AMDGPUOps.td, mlir/lib/Dialect/AMDGPU/IR AMDGPUOps.cpp

[mlir][AMDGPU] Canonicalize masks on global_load_async_to_lds (#197280)

If the mask is always true, remove the mask operand (some patterns key
off the absence of a mask operand to know when they can be more
aggressive). If the mask is always false, delete the op, since it won't
write anything.
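
The decision logic of these two folds can be summarized in a plain C++ model. It does not use the MLIR rewrite-pattern API. `canonicalizeMask` and `MaskFold` are hypothetical. "Known" here means the mask is a compile-time constant, represented as `std::optional<bool>`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical outcome of canonicalizing one masked async load op.
enum class MaskFold { Keep, DropMask, EraseOp };

// HasMask: whether the op carries a mask operand.
// KnownMask: the mask's constant value, or std::nullopt if not constant.
MaskFold canonicalizeMask(bool HasMask, std::optional<bool> KnownMask) {
  if (!HasMask || !KnownMask)
    return MaskFold::Keep;
  // Always true: the mask is redundant; dropping it lets patterns that
  // key off "no mask operand" fire.
  if (*KnownMask)
    return MaskFold::DropMask;
  // Always false: the op writes nothing, so it can be erased.
  return MaskFold::EraseOp;
}
```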

AI: I described the patterns, Codex 5.5 wrote them
Delta  File
+29 -0  mlir/test/Dialect/AMDGPU/canonicalize.mlir
+25 -0  mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+1 -0  mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUOps.td
+55 -0  3 files

LLVM/project a9d11f9 clang/test/CXX/drs cwg7xx.cpp, clang/www cxx_dr_status.html

[clang][NFC] Mark CWG730 as implemented and add a test (#197186)

[CWG730](https://wg21.link/cwg730) clarifies that it's allowed to
specialize templates that are members of a non-template class. Clang
implements this since 2.7: https://godbolt.org/z/bWzb766rz
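
The code below is a short example of what CWG730 allows: an explicit specialization, at namespace scope, of a member function template of an ordinary (non-template) class. `Widget` and `kind` are illustrative names and are not taken from the added test.

```cpp
#include <cassert>

// A non-template class with a member function template.
struct Widget {
  template <typename T> static int kind(T) { return 0; }
};

// CWG730: explicitly specializing a member template of a non-template class.
template <> int Widget::kind<int>(int) { return 1; }
```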
Delta  File
+10 -0  clang/test/CXX/drs/cwg7xx.cpp
+1 -1  clang/www/cxx_dr_status.html
+11 -1  2 files

LLVM/project bbc3c08 llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 reused-reduction-revec.ll

[SLP]Disable reused reductions in revec mode for vector scalars

Reused reductions may require some special processing, but currently
this crashes the compiler. Disable reused reductions for vector scalars
in revec mode to fix the crash.

Fixes #196914

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/197291
Delta  File
+68 -0  llvm/test/Transforms/SLPVectorizer/AArch64/reused-reduction-revec.ll
+2 -1  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+70 -1  2 files

LLVM/project a2a3554 clang/test/CodeGen scoped-atomic-ops.c scoped-fence-ops.c, clang/test/CodeGenOpenCL builtins-amdgcn-gfx1250.cl amdgpu-abi-struct-arg-byref.cl

[clang] use QualType addrspace when making an alloca (#181390)

Instead of assuming that QualType is in default addrspace (or
compatible with it), actually use the addrspace declared by the
frontend. That removes needless dueling addrspacecast calls and
associated IR noise. Any callers that intend to discard the attributes
of the type (e.g. because they are casting an rvalue through memory)
need to now be explicit about that (e.g. by calling getUnqualifiedType).

This is part of a commit sequence intended to let WASM have distinct
pointer types for stack memory and local memory (attempting to emit an
addrspacecast between the two is invalid).

Assisted-By: Claude Sonnet 4.5 <noreply at anthropic.com>
Delta  File
+544 -713  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250.cl
+431 -406  clang/test/CodeGen/scoped-atomic-ops.c
+99 -137  clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
+88 -134  clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+126 -66  clang/test/CodeGen/scoped-fence-ops.c
+68 -106  clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
+1,356 -1,562  21 files not shown
+1,541 -1,820  27 files

LLVM/project 483ecf8 flang/lib/Semantics expression.cpp, flang/test/Semantics cuf14.cuf

[flang][cuda] Fix CUDA generic matching with omitted optional args (#197275)

Skip omitted optional arguments when computing CUDA address-space
matching distances, so that -gpu=unified overload resolution does not
compare expanded dummy-argument lists of different sizes. Adds a
regression test covering a unified-memory overload with optional extra
arguments.
Delta  File
+19 -0  flang/test/Semantics/cuf14.cuf
+4 -0  flang/lib/Semantics/expression.cpp
+23 -0  2 files

LLVM/project f1c30af llvm/lib/Transforms/Utils SimplifyCFG.cpp, llvm/test/Transforms/SimplifyCFG merge-cond-stores-debuginfo.ll

[SimplifyCFG] correct and move debug info for mergeConditionalStoreToAddress (#180789)

Previously, TryToSimplifyUncondBranchFromEmptyBlock and
SpeculatedStoreValue together turned the separate conditional stores
into a store of a single value. That store was then hoisted into an
unconditional store of that value, and the other store was removed as
dead code. This makes all linked stores use the new value, which is
still correct regardless of which branch is taken. However,
TryToSimplifyUncondBranchFromEmptyBlock cannot easily work out why the
values differ, or recover which one is correct, when performing the
conditional update. As a result, the debug info could end up with the
wrong value. This patch instead updates the debug info at the same time
to reflect that the merged store is equivalent, so that both become the
same info. Later passes therefore do not need to reconstruct how the
different stores map back to the new IR, since the debug info is now
correct whichever branch is taken.

And additionally, without `combineMetadataForCSE`, it was dropping the

    [8 lines not shown]
Delta  File
+78 -0  llvm/test/Transforms/SimplifyCFG/merge-cond-stores-debuginfo.ll
+14 -2  llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+92 -2  2 files

LLVM/project 7b209c4 llvm/test/CodeGen/AMDGPU llvm.amdgcn.image.sample.g16.encode.ll

Fix failing test after merge with VOP3 encoding
Delta  File
+107 -51  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.encode.ll
+107 -51  1 file

LLVM/project 939912e compiler-rt/lib/scudo/standalone/tests common_test.cpp map_test.cpp

[scudo] Move MemMap tests from common_test.cpp to map_test.cpp

The tests VerifyGetResidentPages, VerifyReleasePagesToOS, and Zeros test
MemMapT functionality and fit better in map_test.cpp where other MemMapT
tests reside.
Delta  File
+0 -98  compiler-rt/lib/scudo/standalone/tests/common_test.cpp
+98 -0  compiler-rt/lib/scudo/standalone/tests/map_test.cpp
+98 -98  2 files

LLVM/project 2efd530 llvm/include/llvm/Support InstructionCost.h, llvm/test/Transforms/LoopVectorize/AArch64 call-costs.ll

[Support] Always scale InstructionCost::Value (#178962)

This allows fractional InstructionCosts, down to a fixed granularity,
with little overhead. It will allow more accurate division results and
finer-grained TTI costing.

Before:
InstructionCost(2) / 4 = 0

After (with ScalingFactor 4):
InstructionCost(2) / 4 = 1 / 2

This also lowers the maximum value of InstructionCost, since the largest
value is now `std::numeric_limits<CostType>::max() / ScalingFactor`.
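
The scaling idea can be sketched as a small fixed-point wrapper. `ScaledCost` is a hypothetical stand-in, not LLVM's `InstructionCost`. It assumes `int64_t` storage and the ScalingFactor of 4 from the example above. It also omits details of the real class such as its invalid state and overflow handling. The stored value is the logical cost times ScalingFactor, so 2 / 4 keeps the result 1/2 instead of truncating to 0.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical fixed-point cost: Raw == logical cost * ScalingFactor.
class ScaledCost {
public:
  using CostType = std::int64_t;
  static constexpr CostType ScalingFactor = 4;

  // No overflow checking in this sketch.
  static ScaledCost fromInt(CostType C) { return ScaledCost(C * ScalingFactor); }

  // Divides the scaled value; D must be non-zero. Fractions finer than
  // 1/ScalingFactor are still truncated.
  ScaledCost operator/(CostType D) const { return ScaledCost(Raw / D); }

  double value() const { return static_cast<double>(Raw) / ScalingFactor; }
  CostType raw() const { return Raw; }

  // Largest logical cost whose scaled form still fits in CostType.
  static constexpr CostType maxLogical() {
    return std::numeric_limits<CostType>::max() / ScalingFactor;
  }

private:
  explicit ScaledCost(CostType R) : Raw(R) {}
  CostType Raw;
};
```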

Addresses #174429
Delta  File
+96 -16  llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll
+66 -15  llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+42 -23  llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll
+43 -6  llvm/test/Transforms/LoopVectorize/X86/cost-model-i386.ll
+19 -13  llvm/test/Transforms/LoopVectorize/RISCV/predicated-costs.ll
+19 -5  llvm/include/llvm/Support/InstructionCost.h
+285 -78  120 files not shown
+487 -261  126 files