LLVM/project aa3de78llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/AArch64 spillcost-call-between-operands.ll spillcost-loop-backedge.ll

[SLP] Fix spill-cost cache lookup and predecessor scan

A cached intra-block scan that stopped at a call or budget limit only
proves the sub-range below the stop point is call-free; do not reuse
the cached bit for queries whose First lies above it. Also switch the
cross-block predecessor scan to "exists a call-free backward path"
semantics, skip blocks strictly dominated by Root, and memoize only
the (Root, OpParent) key. Fixes a false-positive spill cost that was
blocking profitable vectorization.

Reviewers: RKSimon, hiraditya, bababuck

Pull Request: https://github.com/llvm/llvm-project/pull/192709
DeltaFile
+104-26llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+4-11llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-call-between-operands.ll
+0-5llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-loop-backedge.ll
+108-423 files

LLVM/project d593279llvm/lib/Target/NVPTX NVPTXISelLowering.cpp NVPTXISelLowering.h, llvm/test/CodeGen/NVPTX fp-contract-f32x2.ll

[NVPTX] Scalarize `contract FMUL v2f32` to enable FMA fusion (#192815)

SM100+ legalizes `FMUL v2f32`, blocking the scalar FADD->FMA combiner.
Scalarize it when `contract` (or `allowFMA()`) is set and every lane
feeds a single `contract` FADD.
DeltaFile
+62-24llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+29-0llvm/test/CodeGen/NVPTX/fp-contract-f32x2.ll
+10-0llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+101-243 files

LLVM/project ab57f94libcxx/utils/ci/images libcxx_next_runners.txt

[libcxx] Bump next runner set (#194211)

So that we can pick up changes in
49b0451ec690dfc76690c19032bdc97c2889000b.
DeltaFile
+1-1libcxx/utils/ci/images/libcxx_next_runners.txt
+1-11 files

LLVM/project a0e42c2clang/lib/AST/ByteCode Integral.h Interp.h, clang/test/AST/ByteCode functions.cpp

[clang][bytecode] Add new IntegralType for function addresses (#194206)

We used to use just `::Address` for functions, which later caused
problems because we cast the pointer to `ValueDecl*` and passed it to
`Program::getOrCreateGlobal()`, which of course doesn't work.
DeltaFile
+9-0clang/lib/AST/ByteCode/Integral.h
+9-0clang/test/AST/ByteCode/functions.cpp
+5-1clang/lib/AST/ByteCode/Interp.h
+2-0clang/lib/AST/ByteCode/Primitives.h
+25-14 files

LLVM/project 49b0451libcxx/utils/ci/docker docker-compose.yml

[libcxx] Bump base image version to most recent (#194209)

To pull in the changes from bd75c10199a159a20720f8ee5c00afebb033f46e.
DeltaFile
+2-2libcxx/utils/ci/docker/docker-compose.yml
+2-21 files

LLVM/project 32a9f63clang/lib/AST/ByteCode Interp.cpp Interp.h, clang/test/Sema static-init.c

[clang][bytecode] Fix some problems with ptr-to-int casts (#193988)

1) When doing integral casts on a pointer-casted-to-integral, check the
bitwidth we're casting _to_, not the one we're casting _from_.
2) When the pointer we're casting to an integral is a dummy pointer,
don't forget to check the bitwidth.
DeltaFile
+5-3clang/lib/AST/ByteCode/Interp.cpp
+1-1clang/lib/AST/ByteCode/Interp.h
+1-0clang/test/Sema/static-init.c
+7-43 files

LLVM/project 33f2036llvm/lib/Target/AArch64/MCTargetDesc AArch64MCAsmInfo.cpp AArch64MCTargetDesc.cpp, llvm/lib/Target/ARM/MCTargetDesc ARMMCAsmInfo.cpp

[MC] Add MCTargetOptions to MCAsmInfo constructor. NFC (#194200)

Since #180464 the canonical MCTargetOptions pointer is stored in
MCAsmInfo, but it is bound after construction via `setTargetOptions`
called from TargetRegistry::createMCAsmInfo.

Direct constructions in unit tests can leave the pointer null, leading
to a runtime assert failure. Add MCTargetOptions to every MCAsmInfo
subclass constructor, store it as a reference in MCAsmInfo, and remove
`setTargetOptions()`.
DeltaFile
+18-9llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
+12-6llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.h
+11-4llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCAsmInfo.cpp
+11-4llvm/lib/Target/ARM/MCTargetDesc/ARMMCAsmInfo.cpp
+7-7llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
+5-4llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCTargetDesc.cpp
+64-3478 files not shown
+231-15984 files

LLVM/project 63f2a6dllvm/include/llvm/ProfileData SampleProf.h, llvm/test/tools/llvm-profgen filter-build-id.test filter-build-id-unsymbolized.test

[𝘀𝗽𝗿] initial version

Created using spr 1.3.4
DeltaFile
+101-25llvm/tools/llvm-profgen/PerfReader.cpp
+33-0llvm/test/tools/llvm-profgen/filter-build-id.test
+32-0llvm/test/tools/llvm-profgen/filter-build-id-unsymbolized.test
+21-3llvm/include/llvm/ProfileData/SampleProf.h
+24-0llvm/test/tools/llvm-profgen/Inputs/buildid-unsymbolized.raw
+12-3llvm/tools/llvm-profgen/PerfReader.h
+223-311 files not shown
+234-317 files

LLVM/project 3641e28llvm/test/tools/llvm-profgen filter-build-id.test, llvm/test/tools/llvm-profgen/Inputs buildid-cs-noprobe.aggperfscript

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.4

[skip ci]
DeltaFile
+90-24llvm/tools/llvm-profgen/PerfReader.cpp
+33-0llvm/test/tools/llvm-profgen/filter-build-id.test
+12-3llvm/tools/llvm-profgen/PerfReader.h
+11-0llvm/test/tools/llvm-profgen/Inputs/buildid-cs-noprobe.aggperfscript
+146-274 files

LLVM/project bd75c10libcxx/utils/ci/docker linux-builder-base.dockerfile

[libcxx] Include python3-yaml and rsync in container (#194182)

rsync is needed for installing the kernel headers for the libc build.
The yaml python package is needed for libc's hdrgen. This means we no
longer have to install these utilities at runtime.

They should be small enough relative to the existing container image
size to not really have an impact in that regard.
DeltaFile
+2-0libcxx/utils/ci/docker/linux-builder-base.dockerfile
+2-01 files

LLVM/project c012265clang/lib/AST/ByteCode Compiler.cpp, clang/test/AST/ByteCode functions.cpp

[clang][bytecode] Reject functions with dependent return type (#194114)

This unfortunately crashes the current interpreter as well.
DeltaFile
+12-3clang/test/AST/ByteCode/functions.cpp
+3-0clang/lib/AST/ByteCode/Compiler.cpp
+15-32 files

LLVM/project 682cf72mlir/lib/Dialect/Arith/IR ArithOps.cpp, mlir/test/Dialect/Arith canonicalize.mlir

[mlir][arith] Fold subi(a, subi(a, b)) to b (#194134)

Add a folder for `arith.subi` that simplifies `subi(a, subi(a, b))` to
`b` using the algebraic identity `a - (a - b) = b`.
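
The identity can be checked outside MLIR with a small sketch. This is an illustration only, not the folder added in ArithOps.cpp: `subi` and `unfolded` are hypothetical helpers that model the subtraction with wrapping 32-bit unsigned arithmetic, assuming `uint32_t` is not narrower than `int`. Because the arithmetic is modular, the identity also holds when the inner subtraction wraps.

```cpp
#include <cstdint>

// Wrapping 32-bit subtraction, used here as a stand-in for integer
// subtraction on a fixed-width type (hypothetical helper).
constexpr uint32_t subi(uint32_t a, uint32_t b) { return a - b; }

// The unfolded pattern subi(a, subi(a, b)) (hypothetical helper).
constexpr uint32_t unfolded(uint32_t a, uint32_t b) {
  return subi(a, subi(a, b));
}

// a - (a - b) == b, including cases where the inner subtraction wraps.
static_assert(unfolded(7u, 3u) == 3u, "simple case");
static_assert(unfolded(0u, 1u) == 1u, "inner subtraction wraps");
static_assert(unfolded(0xFFFFFFFFu, 0x80000000u) == 0x80000000u,
              "large operands");
```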
DeltaFile
+10-0mlir/test/Dialect/Arith/canonicalize.mlir
+7-2mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+17-22 files

LLVM/project c2a9725mlir/include/mlir/Dialect/Math/IR MathOps.td, mlir/lib/Dialect/Math/IR MathOps.cpp

[mlir][math] Add constant folding for sincos/cbrt (#194130)

Adds constant folders for `math.sincos` and `math.cbrt`.
DeltaFile
+51-0mlir/test/Dialect/Math/canonicalize.mlir
+48-1mlir/lib/Dialect/Math/IR/MathOps.cpp
+6-3mlir/include/mlir/Dialect/Math/IR/MathOps.td
+105-43 files

LLVM/project a75e6a5mlir/lib/Dialect/XeGPU/Transforms XeGPUUnroll.cpp, mlir/test/Dialect/XeGPU xegpu-wg-to-sg.mlir xegpu-wg-to-sg-unify-ops.mlir

[MLIR][XeGPU] Remove offsets from create_nd_tdesc & remove update_nd_offset, move offsets to load/store/prefetch ops (#193330)

This PR removes the optional offsets/const_offsets operands on
xegpu.create_nd_tdesc and instead mandates offsets directly on the
consuming load, store, and prefetch ops. It also deprecates the
update_nd_offset op.
DeltaFile
+980-230mlir/test/Dialect/XeGPU/xegpu-wg-to-sg.mlir
+0-987mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+245-107mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-rr.mlir
+164-174mlir/test/Dialect/XeGPU/propagate-layout.mlir
+44-282mlir/lib/Dialect/XeGPU/Transforms/XeGPUUnroll.cpp
+106-147mlir/test/Dialect/XeGPU/xegpu-blocking.mlir
+1,539-1,92721 files not shown
+1,946-3,33627 files

LLVM/project e7164d4libclc CMakeLists.txt

[libclc] Only check the triple architecture for libclc (#194149)

Summary:
Previously, `nvptx64--` would reject `nvptx64-unknown-unknown`. There were
two options: either normalize all the triples in CMake, or just check the
architecture. I went with the latter because it makes it easier for
people to pass different values.
DeltaFile
+9-14libclc/CMakeLists.txt
+9-141 files

LLVM/project 0ccb181compiler-rt/lib/sanitizer_common sanitizer_redefine_builtins.h

[compiler-rt] Use asm .set only for Hexagon (#194160)

Two incompatible assembler syntaxes exist for symbol assignment:
```
  sym = val      -- accepted by most GNU assembler targets; rejected by
                    Hexagon, which interprets it as a mnemonic
  .set sym, val  -- accepted by Hexagon; rejected by Alpha, which
                    reserves .set for assembler mode flags
```
Switch all targets to `sym = val`, except Hexagon, which keeps `.set sym, val`.
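
As a rough illustration of the two spellings, the sketch below builds the assignment text for either case. `symbolAssignment` and its `isHexagon` flag are hypothetical and are not the macros in sanitizer_redefine_builtins.h; a runtime flag is used here only so both forms can be shown side by side.

```cpp
#include <string>

// Hypothetical helper: returns the assembler text that assigns `val` to
// `sym`. Hexagon gets the `.set` directive; every other target gets the
// plain `sym = val` form.
std::string symbolAssignment(bool isHexagon, const std::string &sym,
                             const std::string &val) {
  if (isHexagon)
    return ".set " + sym + ", " + val;
  return sym + " = " + val;
}
```

For example, `symbolAssignment(false, "a", "b")` yields `a = b`, while passing `true` yields `.set a, b`.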

Fixes: dbb03f8f606e ("[compiler-rt] Replace assignment w/.set directive
(#107667)")

---------

Co-authored-by: Vitaly Buka <vitalybuka at google.com>
DeltaFile
+17-5compiler-rt/lib/sanitizer_common/sanitizer_redefine_builtins.h
+17-51 files

LLVM/project b614c15llvm/include/llvm/MC TargetRegistry.h, llvm/include/llvm/MC/MCParser MCTargetAsmParser.h

[MC] Drop MCTargetOptions parameter from MCTargetAsmParser (#194120)

Since #180464, MCAsmInfo holds the canonical MCTargetOptions.
The MCTargetAsmParser::MCOptions member is a redundant by-value copy,
which may have inconsistent values (llvm-exegesis passes a temporary
MCTargetOptions(), but this probably doesn't matter in practice; other
in-tree uses are correct).

Remove the field in favor of getParser().getContext().getTargetOptions(),
and remove the MCTargetOptions parameter from the base ctor, all
subclass ctors, Target::createMCAsmParser, MCAsmParserCtorTy, and
RegisterMCAsmParser.
DeltaFile
+7-9llvm/include/llvm/MC/TargetRegistry.h
+7-6llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h
+5-4llvm/lib/Target/Mips/AsmParser/MipsAsmParser.cpp
+4-4llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
+3-4llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
+3-3llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+29-3028 files not shown
+76-8334 files

LLVM/project 8174442clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded vfncvtbf16.c, clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded vfncvtbf16.c

Add extra check for invariants

Created using spr 1.3.7
DeltaFile
+3,230-456llvm/test/CodeGen/WebAssembly/strided-int-mac.ll
+704-882llvm/test/CodeGen/RISCV/rvv/setcc-int-vp.ll
+472-472clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/overloaded/vfncvtbf16.c
+345-558llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-int-vp.ll
+280-280clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/zvfofp8min/policy/non-overloaded/vfncvtbf16.c
+236-281llvm/test/CodeGen/RISCV/rvv/vsrl-vp.ll
+5,267-2,929148 files not shown
+10,292-6,461154 files

LLVM/project 486e97aclang/include/clang/Sema Initialization.h

[clang][NFC] Fix typo in HLSL initialization comment (#194124)
DeltaFile
+1-1clang/include/clang/Sema/Initialization.h
+1-11 files

LLVM/project b5471ccllvm/lib/MC MCObjectStreamer.cpp, llvm/test/MC/AsmParser directive_fill.s

[MC] Always lower .fill to MCFillFragment (#194164)

Constant-count, constant-pattern .fill expands inline into the current
fragment via emitIntValue per byte, wasting both memory and time (a
redundant copy at MCAssembler.cpp). #50974 reports a 4s compile dropping
to 0.6s when the loop is removed.

Drop the inline path so .fill always becomes MCFillFragment.
This could not be done until commit 507efbcce03d (2023) allowed
label differences to be separated by a MCFillFragment.

In directive_fill.s, the parse time warning is now diagnosed by
MCAssembler.
DeltaFile
+5-16llvm/lib/MC/MCObjectStreamer.cpp
+1-1llvm/test/MC/AsmParser/directive_fill.s
+6-172 files

LLVM/project 4c7dc9clibc/src/__support/FPUtil BasicOperations.h, libc/src/__support/math CMakeLists.txt fmaximum_mag_numbf16.h

Reland "[libc][math] Refactor fmaximum_mag_num family to header-only" (#194194)

Reland #182169

---------

Co-authored-by: Muhammad Bassiouni <60100307+bassiounix at users.noreply.github.com>
DeltaFile
+47-2utils/bazel/llvm-project-overlay/libc/BUILD.bazel
+30-0libc/src/__support/math/CMakeLists.txt
+20-7libc/src/__support/FPUtil/BasicOperations.h
+26-0libc/src/__support/math/fmaximum_mag_numbf16.h
+25-0libc/src/__support/math/fmaximum_mag_num.h
+25-0libc/src/__support/math/fmaximum_mag_numf.h
+173-911 files not shown
+272-3017 files

LLVM/project de7c63ellvm/tools/llvm-profgen PerfReader.cpp ProfileGenerator.cpp

[llvm-profgen] Add --time-profgen (#191930)

Add `NamedRegionTimer`s to main profgen phases:
- Parse and aggregate trace (`parseAndAggregateTrace`)
- Unwind samples (`unwindSamples`)
- Generate profile (`ProfileGenerator::generateProfile`)
- Generate CS profile (`CSProfileGenerator::generateProfile`)
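
The general pattern of timing a phase with a scoped object can be sketched in standard C++. This is not LLVM's `NamedRegionTimer` and does not reproduce its reporting; `PhaseTimes` and `ScopedPhaseTimer` are hypothetical names used only to show the idea of accumulating wall-clock time per named phase.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator: seconds of wall-clock time per phase name.
struct PhaseTimes {
  std::map<std::string, double> seconds;
};

// Hypothetical RAII timer: starts on construction and adds the elapsed
// time to the named phase on destruction.
class ScopedPhaseTimer {
public:
  ScopedPhaseTimer(PhaseTimes &out, std::string name)
      : out_(out), name_(std::move(name)), start_(Clock::now()) {}

  ~ScopedPhaseTimer() {
    std::chrono::duration<double> elapsed = Clock::now() - start_;
    out_.seconds[name_] += elapsed.count();
  }

  ScopedPhaseTimer(const ScopedPhaseTimer &) = delete;
  ScopedPhaseTimer &operator=(const ScopedPhaseTimer &) = delete;

private:
  using Clock = std::chrono::steady_clock;
  PhaseTimes &out_;
  std::string name_;
  Clock::time_point start_;
};
```

Declaring a `ScopedPhaseTimer` at the top of a function such as a phase entry point records that phase's time when the function returns.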

Test Plan:
```
$ llvm-profgen --time-profgen ...

===-------------------------------------------------------------------------===
                                  llvm-profgen
===-------------------------------------------------------------------------===
  Total Execution Time: 2826.6549 seconds (2873.3410 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  1059.4929 ( 38.1%)   8.5146 ( 17.3%)  1068.0075 ( 37.8%)  1090.6604 ( 38.0%)  Generate CS profile

    [3 lines not shown]
DeltaFile
+11-0llvm/tools/llvm-profgen/PerfReader.cpp
+5-0llvm/tools/llvm-profgen/ProfileGenerator.cpp
+1-0llvm/tools/llvm-profgen/Options.h
+17-03 files

LLVM/project 75b450fbolt/test/X86 pre-aggregated-records.s, bolt/test/X86/Inputs pre-aggregated-bad-hex.txt pre-aggregated-bad-type.txt

[BOLT] Add tests for pre-aggregated parsing (#193843)

Extends e2e coverage of pre-aggregated profile parsing to match the
unit-test coverage added in #192390:

- R (Return) records, including the branch=0 fallback path that
  rewrites to the FT_EXTERNAL_RETURN sentinel.
- r (FT_EXTERNAL_RETURN) records.
- B and T records using the negative -1 hex form (#192391),
  which is parsed as the BR_ONLY/FT_ONLY sentinel.
- Error paths: invalid record type letter and malformed hex address
  (perf2bolt is expected to exit non-zero with a parser error).

The two error-path inputs are tiny raw files under Inputs/ since they
contain intentionally malformed records that link_fdata doesn't process.

Test Plan:
added bolt/test/X86/pre-aggregated-records.s
DeltaFile
+60-0bolt/test/X86/pre-aggregated-records.s
+1-0bolt/test/X86/Inputs/pre-aggregated-bad-hex.txt
+1-0bolt/test/X86/Inputs/pre-aggregated-bad-type.txt
+62-03 files

LLVM/project 71816eflibc/src/__support/FPUtil/generic add_sub.h, libc/src/__support/math fdim.h fdimf.h

[libc][math] Qualify fdim functions as constexpr (#194137)

Signed-off-by: udaykiriti <udaykiriti624 at gmail.com>
Co-authored-by: Muhammad Bassiouni <60100307+bassiounix at users.noreply.github.com>
DeltaFile
+8-0libc/test/shared/shared_math_constexpr_test.cpp
+6-0libc/test/shared/CMakeLists.txt
+5-1libc/src/__support/FPUtil/generic/add_sub.h
+3-1libc/src/__support/math/fdim.h
+3-1libc/src/__support/math/fdimf.h
+3-1libc/src/__support/math/fdimf16.h
+28-44 files not shown
+32-810 files

LLVM/project 24f4629lldb/test/API/commands/thread/backtrace TestThreadBacktraceRepeat.py

[lldb][test] Use assertIn in TestThreadBacktraceRepeat.py (NFC) (#194193)

I broke this test locally, and fixed the asserts to produce more useful
output upon failure.
DeltaFile
+7-8lldb/test/API/commands/thread/backtrace/TestThreadBacktraceRepeat.py
+7-81 files

LLVM/project 13e7958llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

rebase

Created using spr 1.3.4
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-05,383 files not shown
+1,085,016-125,6375,389 files

LLVM/project e55f02fllvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.4

[skip ci]
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-05,383 files not shown
+1,085,016-125,6375,389 files

LLVM/project 46154febolt/docs profiles.md, bolt/lib/Profile DataReader.cpp DataAggregator.cpp

[BOLT] Support negative hex in pre-aggregated profile (#192391)

Handle signed values in parseHexField by falling back to int64_t parsing
when uint64_t fails. This allows pre-aggregated profile tools to use -1
for BR_ONLY, -2 for FT_EXTERNAL_ORIGIN, -3 for FT_EXTERNAL_RETURN.

Guard the external address reset loop in parseAggregatedLBREntry to
preserve sentinel values (offsets >= FT_EXTERNAL_RETURN).

Add tests for -1/-2/-3 in parseHexField and T entries with -1,
ffffffffffffffff, and buildid:-1 as BR_ONLY.
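
The fallback described above can be sketched in standalone C++. `parseHexFieldSketch` is a hypothetical stand-in, not BOLT's `parseHexField`, whose signature and error handling may differ; it assumes C++17 `std::from_chars` and bare hex digits without a `0x` prefix.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical sketch: parse a hex field as uint64_t first; if that fails
// (for example because of a leading '-'), retry as int64_t and reinterpret
// the result as unsigned, so "-1" and "ffffffffffffffff" map to the same
// 64-bit value.
std::optional<uint64_t> parseHexFieldSketch(std::string_view field) {
  const char *begin = field.data();
  const char *end = field.data() + field.size();

  uint64_t unsignedValue = 0;
  auto res = std::from_chars(begin, end, unsignedValue, 16);
  if (res.ec == std::errc() && res.ptr == end)
    return unsignedValue;

  int64_t signedValue = 0;
  res = std::from_chars(begin, end, signedValue, 16);
  if (res.ec == std::errc() && res.ptr == end)
    return static_cast<uint64_t>(signedValue);

  return std::nullopt; // neither form parsed the whole field
}
```

With this shape, a tool can write `-1`, `-2`, or `-3` for the sentinels named above and the reader sees the same bit patterns as the all-ones style hex spellings.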
DeltaFile
+44-6bolt/docs/profiles.md
+40-0bolt/unittests/Profile/DataAggregator.cpp
+8-3bolt/lib/Profile/DataReader.cpp
+4-2bolt/lib/Profile/DataAggregator.cpp
+96-114 files

LLVM/project 53f7610llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

rebase

Created using spr 1.3.4
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-05,382 files not shown
+1,084,972-125,6315,388 files

LLVM/project 2954251llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

[𝘀𝗽𝗿] changes introduced through rebase

Created using spr 1.3.4

[skip ci]
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-05,382 files not shown
+1,084,972-125,6315,388 files