LLVM/project bd6e8a8polly/lib/External/isl isl_tab.c isl_map.c

[Polly] Update isl to isl-0.27-89-gdc16f8e3 (#188013)

Update isl to include

https://repo.or.cz/isl.git/commit/ee3677039011f2f87f3630f8b2a004f9e4944a08
which fixes #187216

Closes #187216

Thanks @skimo-openhub for the fix and @thapgua for the bug report.
DeltaFile
+5-3polly/lib/External/isl/isl_tab.c
+3-4polly/lib/External/isl/isl_map.c
+2-0polly/lib/External/isl/isl_tab_pip.c
+1-1polly/lib/External/isl/GIT_HEAD_ID
+11-84 files

LLVM/project 733bc34bolt/include/bolt/Profile DataAggregator.h, bolt/lib/Profile DataAggregator.cpp

[BOLT][Perf2bolt] Add support to generate pre-parsed perf data (#171144)

Adding a generator to Perf2bolt is the first step toward supporting
large end-to-end tests for Arm SPE. This functionality provides a unified
pre-parsed profile format that Perf2bolt is able to consume.

Why do the tests need an SPE profile in textual format?

* Collecting an Arm SPE profile with Linux Perf requires an Arm
development device that supports SPE.
* Decoding SPE data also requires a suitable version of Linux Perf.
* The minimum required version of Linux Perf is v6.15.

Providing a pre-generated textual profile format is an easier way to
bypass these technical difficulties.

The generator relies on the aggregator work to spawn the required
perf-script jobs based on the aggregation type, and merges the
    [12 lines not shown]
DeltaFile
+75-8bolt/include/bolt/Profile/DataAggregator.h
+82-1bolt/lib/Profile/DataAggregator.cpp
+157-92 files

LLVM/project f1d4ddallvm/lib/Target/AMDGPU SOPInstructions.td, llvm/test/CodeGen/AMDGPU fptosi-sat-vector.ll fptoui-sat-vector.ll

[AMDGPU] Use s_cvt_i32/u32_f32 instructions for saturated uniform conversions (#187711)

We attempt to select `s_cvt_i32/u32_f32` where possible, with some
considerations:

* For `f64`, default to `v_` instructions, as SALU has no `f64`
support.
* For `f16` to `i16`, select `v_cvt_i16/u16_f16`, which is consistent
with the behavior of non-saturating conversions. Since SALU has no
`s_cvt_i16_f16`, we could instead emit `s_cvt_f32_f16` followed by
`s_cvt_i32/u32_f32` to keep the computation in SALU. Happy to look into
it if that is beneficial.
* For clamping, ISel turns the min/max sequence into `v_med3` with a
`v0` destination, whereas GlobalISel keeps the min and max as `s_min`
and `s_max` and then moves the result into `v0`, since the lit tests
expect the return value in `v0` in both cases. This is unrelated to
this change, but I thought it worth highlighting.
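For reference, the LLVM LangRef defines `llvm.fptosi.sat` / `llvm.fptoui.sat` so that NaN converts to 0, out-of-range inputs saturate to the destination type's minimum or maximum, and in-range inputs truncate toward zero. The sketch below models only those semantics in plain C++ for `f32` sources; it is not the AMDGPU lowering. The `med3` helper is a hypothetical name that shows why a min/max clamp maps naturally onto a three-operand median instruction such as `v_med3`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Reference semantics of llvm.fptosi.sat.i32.f32 (sketch, not AMDGPU codegen):
// NaN -> 0, out-of-range -> saturate, otherwise truncate toward zero.
int32_t fptosi_sat_i32(float x) {
  if (std::isnan(x)) return 0;
  if (x >= 2147483648.0f) return std::numeric_limits<int32_t>::max();  // >= 2^31
  if (x <= -2147483648.0f) return std::numeric_limits<int32_t>::min(); // <= -2^31
  return static_cast<int32_t>(x);  // in range, so the cast is well defined
}

// Same idea for llvm.fptoui.sat.i32.f32: NaN and non-positive inputs -> 0.
uint32_t fptoui_sat_u32(float x) {
  if (std::isnan(x) || x <= 0.0f) return 0;
  if (x >= 4294967296.0f) return std::numeric_limits<uint32_t>::max(); // >= 2^32
  return static_cast<uint32_t>(x);
}

// Median of three (hypothetical helper modelling a med3-style instruction).
// With lo <= hi, med3(v, lo, hi) equals clamp(v, lo, hi).
int32_t med3(int32_t a, int32_t b, int32_t c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// A narrower saturating conversion expressed as convert-then-clamp.
int16_t fptosi_sat_i16(float x) {
  return static_cast<int16_t>(med3(fptosi_sat_i32(x),
                                   std::numeric_limits<int16_t>::min(),
                                   std::numeric_limits<int16_t>::max()));
}
```

The convert-then-clamp form of `fptosi_sat_i16` gives the same result as a direct saturating `f32`-to-`i16` conversion, because the 32-bit result already saturates and truncates, and the 16-bit range lies inside it.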
DeltaFile
+3,172-0llvm/test/CodeGen/AMDGPU/fptosi-sat-vector.ll
+2,919-0llvm/test/CodeGen/AMDGPU/fptoui-sat-vector.ll
+1,177-0llvm/test/CodeGen/AMDGPU/fptosi-sat-scalar.ll
+859-0llvm/test/CodeGen/AMDGPU/fptoui-sat-scalar.ll
+11-0llvm/lib/Target/AMDGPU/SOPInstructions.td
+8,138-05 files

LLVM/project dcaab6dclang/lib/AST/ByteCode Compiler.cpp EvalEmitter.cpp, clang/test/AST/ByteCode constexpr-steps.cpp

[clang][bytecode] Add source info to jump ops (#188003)

The attached test case otherwise results in a function with one jump op
but no source info at all.
DeltaFile
+28-28clang/lib/AST/ByteCode/Compiler.cpp
+8-4clang/lib/AST/ByteCode/EvalEmitter.cpp
+6-6clang/lib/AST/ByteCode/ByteCodeEmitter.cpp
+7-0clang/test/AST/ByteCode/constexpr-steps.cpp
+3-3clang/lib/AST/ByteCode/EvalEmitter.h
+3-3clang/lib/AST/ByteCode/ByteCodeEmitter.h
+55-441 files not shown
+56-457 files

LLVM/project 682aedbclang/lib/CIR/CodeGen/Targets AMDGPU.cpp, clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVMIR.cpp

add support for amdgpu-expand-waitcnt-profiling
DeltaFile
+44-32clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+16-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+1-4clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+61-363 files

LLVM/project 249b086llvm/lib/Transforms/Vectorize VPlanPatternMatch.h VPlanTransforms.cpp, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-replicate-extends.ll

[LV] Fix crash when extends are not widened in partial reduction matching (#187782)

Fixes https://github.com/llvm/llvm-project/pull/185821#issuecomment-4098933551
DeltaFile
+45-0llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-replicate-extends.ll
+16-2llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+3-3llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+64-53 files

LLVM/project 00f8ed1llvm/test/CodeGen/X86/GlobalISel prelegalizer-combiner-identity.mir prelegalizer-combiner-sub.mir

[X86][Gisel] add trivial arith tests for gisel x86-prelegalizer-combiner (#183544)
DeltaFile
+192-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-identity.mir
+108-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-sub.mir
+88-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-mul.mir
+41-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-div.mir
+41-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-rem.mir
+27-0llvm/test/CodeGen/X86/GlobalISel/prelegalizer-combiner-or.mir
+497-02 files not shown
+539-08 files

LLVM/project e3c287fllvm/lib/Target/AMDGPU SIInstrInfo.h, llvm/test/CodeGen/AMDGPU insert-skips-gfx1250.mir vgpr-set-msb-coissue.mir

[AMDGPU] Handle S_WAIT_XCNT in SIInstrInfo::isWaitcnt (#187726)

This affects the behavior of SIPreEmitPeephole and
AMDGPULowerVGPREncoding.
DeltaFile
+60-0llvm/test/CodeGen/AMDGPU/insert-skips-gfx1250.mir
+40-0llvm/test/CodeGen/AMDGPU/vgpr-set-msb-coissue.mir
+1-0llvm/lib/Target/AMDGPU/SIInstrInfo.h
+101-03 files

LLVM/project 94b222bllvm/include/llvm/CodeGen/GlobalISel LegalizerInfo.h

[GlobalISel] Add `widenScalarFor()` function (#187731)

The function is mentioned in `Legalizer.rst` but was missing. This
also fixes the asymmetry between `narrowScalarXXX()`, which has both
`narrowScalarFor()` and `narrowScalarIf()`, and `widenScalarXXX()`, which
only had `widenScalarIf()`.
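The `For`/`If` pairing can be sketched as follows: a `...For()` rule is essentially the `...If()` rule with a predicate that tests membership in a fixed list of types. Everything here is hypothetical: `ScalarTy`, `RuleSetSketch`, and the simplified signatures (a target bit width instead of a legalize mutation) are invented for illustration and do not match the real `LegalizeRuleSet` API.

```cpp
#include <algorithm>
#include <functional>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LLT: only a scalar bit width.
struct ScalarTy {
  unsigned Bits;
  bool operator==(const ScalarTy &O) const { return Bits == O.Bits; }
};

using Predicate = std::function<bool(ScalarTy)>;

// Hypothetical, heavily simplified rule set (not LegalizeRuleSet).
class RuleSetSketch {
  struct Rule { Predicate Pred; unsigned NewBits; };
  std::vector<Rule> Rules;

public:
  // General form: widen any type accepted by the predicate.
  RuleSetSketch &widenScalarIf(Predicate P, unsigned NewBits) {
    Rules.push_back({std::move(P), NewBits});
    return *this;
  }

  // Convenience form: widen exactly the listed types. It is just the
  // "If" form with a list-membership predicate.
  RuleSetSketch &widenScalarFor(std::initializer_list<ScalarTy> Types,
                                unsigned NewBits) {
    std::vector<ScalarTy> Listed(Types);
    return widenScalarIf(
        [Listed](ScalarTy T) {
          return std::find(Listed.begin(), Listed.end(), T) != Listed.end();
        },
        NewBits);
  }

  // First matching rule wins; unmatched types keep their width.
  unsigned apply(ScalarTy T) const {
    for (const Rule &R : Rules)
      if (R.Pred(T))
        return R.NewBits;
    return T.Bits;
  }
};
```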
DeltaFile
+8-0llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h
+8-01 files

LLVM/project 9fa53a8llvm/lib/Target/AArch64 AArch64ExpandPseudoInsts.cpp

[AArch64] Combine cases with the same code in `expandMOVImm` (NFC) (#187843)

Combine cases for `ORRWri`, `ORRXri`, `ANDXri` and `EORXri` in
`AArch64ExpandPseudoImpl::expandMOVImm`, because these cases are handled
with exactly the same code.
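Combining cases that share a body is the usual C++ idiom of stacking `case` labels over one body. A minimal sketch with a hypothetical `Opc` enum; the real code switches on AArch64 opcodes inside `expandMOVImm`, whose body is not shown here.

```cpp
// Hypothetical opcode enum; the names echo the commit, nothing else is real.
enum class Opc { ORRWri, ORRXri, ANDXri, EORXri, MOVZXi };

// Returns which expansion path an opcode takes (illustrative only).
int expansionPath(Opc O) {
  switch (O) {
  case Opc::ORRWri:
  case Opc::ORRXri:
  case Opc::ANDXri:
  case Opc::EORXri:
    // One shared body instead of four identical copies.
    return 1;
  case Opc::MOVZXi:
    return 2;
  }
  return 0; // not reached for valid enumerators
}
```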
DeltaFile
+2-19llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+2-191 files

LLVM/project 9426fc1clang/lib/CodeGen/TargetBuiltins ARM.cpp, clang/lib/Sema SemaARM.cpp

[AArch64] Fix _sys implementation and MRS/MSR Sema checks (#187290)

This patch fixes the lowering of the _sys builtin, which used to lower into
an invalid MSR S1... instruction. The fix adds a new sys LLVM intrinsic
and properly lowers it into the sys instruction and its aliases.

I also fixed the Sema checks for the _sys, _ReadStatusRegister and
_WriteStatusRegister builtins so that they correctly catch invalid
use cases.
DeltaFile
+126-0llvm/test/CodeGen/AArch64/aarch64-sys-intrinsic.ll
+21-12clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+10-16clang/test/CodeGen/arm64-microsoft-sys.c
+14-5llvm/lib/Target/AArch64/AArch64InstrFormats.td
+13-0clang/test/Sema/builtins-microsoft-arm64.c
+6-3clang/lib/Sema/SemaARM.cpp
+190-363 files not shown
+198-419 files

LLVM/project 2ab8924clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std map unordered_map, clang-tools-extra/test/clang-tidy/checkers/modernize use-emplace.cpp

[clang-tidy][NFC] Use universal containers mock (#186669)

The changes are large, but most of them are just copy-pasting and
creating mocks.
DeltaFile
+12-306clang-tools-extra/test/clang-tidy/checkers/modernize/use-emplace.cpp
+127-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/map
+126-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/unordered_map
+121-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/set
+113-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/unordered_set
+82-0clang-tools-extra/test/clang-tidy/checkers/Inputs/Headers/std/functional
+581-30622 files not shown
+1,027-65928 files

LLVM/project befad79libclc/clc/lib/generic/math clc_remainder.cl clc_remainder.inc

libclc: Implement remainder with remquo (#187999)

This fixes conformance failures for double, and for float
without -cl-denorms-are-zero. Reusing remquo avoids duplicating
most of the code, and optimizations are able to eliminate the
unused quo handling.
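The idea can be shown with the C++ standard library instead of libclc's OpenCL C: `std::remquo` returns the same value as `std::remainder` and additionally writes quotient bits through a pointer, so remainder can be defined by calling remquo and ignoring the quotient. `remainder_via_remquo` is a hypothetical name; whether the quotient work is actually optimized away depends on inlining, as the commit notes for libclc.

```cpp
#include <cmath>

// Sketch: remainder expressed through remquo. The quotient bits are
// computed but discarded; after inlining, an optimizer may drop that work.
template <typename T>
T remainder_via_remquo(T x, T y) {
  int quo; // written by remquo, never read
  return std::remquo(x, y, &quo);
}
```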
DeltaFile
+2-221libclc/clc/lib/generic/math/clc_remainder.cl
+13-0libclc/clc/lib/generic/math/clc_remainder.inc
+15-2212 files

LLVM/project 1a9fe17libclc/clc/include/clc/math remquo_decl.inc, libclc/clc/include/clc/shared binary_with_out_arg_scalarize.inc

libclc: Update remquo (#187998)

This was failing in the float case without -cl-denorms-are-zero
and failing for double. This now passes in all cases.

This was originally ported from rocm device libs in
8db45e4cf170cc6044a0afe7a0ed8876dcd9a863. This is mostly a port
of more recent changes, with a few modifications:

- Templatification, which almost, but not quite, enables
  vectorization; the outer branch and loop still prevent it.

- Merging the 3 types into one shared code path, instead of
  duplicating the code per type in 3 separate functions.
  There are only slight differences for the half case, which mostly
  evaluates as float.

- Splitting out the is_odd tracking, instead of deriving it from the
  accumulated quotient. This costs an extra register, but saves several
    [6 lines not shown]
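The quotient's parity matters because remainder and remquo round x/y to the nearest integer with ties to even, so parity decides the halfway cases. The sketch below is not the libclc code (which is only partly shown); it is the classic fdlibm-style strategy, templated so one code path serves both float and double: reduce |x| modulo 2|y| so that parity becomes visible, then subtract |y| at most twice. It assumes finite x and finite, nonzero y; NaN, infinity and zero-divisor handling is omitted, and `remainder_by_parity` is a hypothetical name.

```cpp
#include <cmath>
#include <limits>

// Assumes x finite and y finite, nonzero (special cases omitted).
template <typename T>
T remainder_by_parity(T x, T y) {
  T p = std::fabs(y);
  T r = std::fabs(x);
  // Reduce modulo 2p (fmod is exact): r lands in [0, 2p), and r >= p exactly
  // when the truncated quotient |x|/|y| is odd. Skip if 2p would overflow.
  if (p <= std::numeric_limits<T>::max() / 2)
    r = std::fmod(r, p + p);
  // Round the quotient to nearest, ties to even. Comparing r + r against p
  // avoids halving p, which could be inexact for subnormal p.
  if (r + r > p) {
    r -= p;            // exact by Sterbenz's lemma
    if (r + r >= p)
      r -= p;          // second step handles r >= 3p/2, including the tie
  }
  // The result takes the sign of x (this also covers a zero result).
  return std::signbit(x) ? -r : r;
}
```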
DeltaFile
+13-260libclc/clc/lib/generic/math/clc_remquo.inc
+158-0libclc/clc/lib/generic/math/clc_remquo_stret.inc
+82-0libclc/clc/include/clc/shared/binary_with_out_arg_scalarize.inc
+41-15libclc/clc/lib/generic/math/clc_remquo.cl
+22-12libclc/clc/include/clc/math/remquo_decl.inc
+316-2875 files

LLVM/project d6373b4mlir/include/mlir/Dialect/LLVMIR LLVMIntrinsicOps.td, mlir/test/Dialect/LLVMIR roundtrip.mlir

[mlir][LLVM] Add more `llvm.intr.experimental.constrained.*` ops (#187948)

Add additional "constrained" intrinsic ops. A rounding mode can be
specified for these ops.

Assisted by: claude-4.6-opus-high
DeltaFile
+105-0mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
+77-0mlir/test/Target/LLVMIR/Import/intrinsic.ll
+67-2mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
+49-0mlir/test/Dialect/LLVMIR/roundtrip.mlir
+298-24 files

LLVM/project ac795f0clang/lib/AST/ByteCode InterpBuiltin.cpp

[clang][bytecode] Create fewer pointers in __builtin_nan() (#187990)

Check the elements directly for initialization state and keep track of
whether we found a NUL byte.
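The scan described above can be sketched as a single pass over the array elements that checks each element's initialization state and stops at the first NUL. `ElemSketch` and `readNanArg` are hypothetical stand-ins for the interpreter's element access; treating "uninitialized before the NUL" and "no NUL found" as failures is an assumption about the error cases, not taken from the patch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one element of a constant-evaluated char array.
struct ElemSketch {
  bool Initialized;
  char Value;
};

// Collects characters up to the first NUL without materializing a pointer
// per element. Returns nullopt (assumed error cases) if an uninitialized
// element is read before the NUL, or if no NUL terminator is found.
std::optional<std::string> readNanArg(const std::vector<ElemSketch> &Elems) {
  std::string Out;
  bool FoundNul = false;
  for (const ElemSketch &E : Elems) {
    if (!E.Initialized)
      return std::nullopt;
    if (E.Value == '\0') {
      FoundNul = true;
      break; // elements after the terminator are never inspected
    }
    Out.push_back(E.Value);
  }
  if (!FoundNul)
    return std::nullopt;
  return Out;
}
```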
DeltaFile
+12-8clang/lib/AST/ByteCode/InterpBuiltin.cpp
+12-81 files

LLVM/project 8e53f91clang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/lib/CIR/CodeGen/Targets AMDGPU.cpp

[CIR][AMDGPU] Add AMDGPU-specific function attributes for HIP kernels
DeltaFile
+256-0clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+82-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+24-3clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+8-6clang/lib/CIR/CodeGen/CIRGenModule.cpp
+10-0clang/lib/CIR/CodeGen/TargetInfo.cpp
+5-0clang/lib/CIR/CodeGen/TargetInfo.h
+385-91 files not shown
+386-97 files

LLVM/project e60c11fclang/lib/CIR/CodeGen CIRGenModule.cpp TargetInfo.cpp, clang/lib/CIR/CodeGen/Targets AMDGPU.cpp

[CIR][AMDGPU] Add AMDGPU-specific function attributes for HIP kernels
DeltaFile
+256-0clang/lib/CIR/CodeGen/Targets/AMDGPU.cpp
+82-0clang/test/CIR/CodeGenHIP/amdgpu-attrs.hip
+24-3clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVMIR.cpp
+8-6clang/lib/CIR/CodeGen/CIRGenModule.cpp
+13-0clang/lib/CIR/CodeGen/TargetInfo.cpp
+5-0clang/lib/CIR/CodeGen/TargetInfo.h
+388-91 files not shown
+389-97 files

LLVM/project bdfb59blibclc/clc/include/clc/math remquo_decl.inc, libclc/clc/include/clc/shared binary_with_out_arg_scalarize.inc

Address comments
DeltaFile
+5-0libclc/clc/include/clc/shared/binary_with_out_arg_scalarize.inc
+0-4libclc/clc/include/clc/math/remquo_decl.inc
+5-42 files

LLVM/project 083b36blibclc/clc/lib/generic/math clc_remainder.cl clc_remainder.inc

libclc: Update remainder

Previously this was failing conformance without -cl-denorms-are-zero
in the float case, and always failing in the double case.
DeltaFile
+17-212libclc/clc/lib/generic/math/clc_remainder.cl
+171-0libclc/clc/lib/generic/math/clc_remainder.inc
+188-2122 files

LLVM/project 0a0e785libclc/clc/lib/generic/math clc_remainder.inc clc_remainder.cl

libclc: Implement remainder with remquo

This fixes conformance failures for double, and for float
without -cl-denorms-are-zero. Reusing remquo avoids duplicating
most of the code, and optimizations are able to eliminate the
unused quo handling.
DeltaFile
+2-160libclc/clc/lib/generic/math/clc_remainder.inc
+1-25libclc/clc/lib/generic/math/clc_remainder.cl
+3-1852 files

LLVM/project b7ee3b0libclc/clc/lib/generic/math clc_remquo_stret.inc clc_remquo.inc

Fix missing definitions
DeltaFile
+163-0libclc/clc/lib/generic/math/clc_remquo_stret.inc
+0-154libclc/clc/lib/generic/math/clc_remquo.inc
+28-1libclc/clc/lib/generic/math/clc_remquo.cl
+191-1553 files

LLVM/project 7c6996fllvm/include/llvm/CodeGen ValueTypes.h, llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp

[ValueType][NFC] Add widenIntegerElementType method (#187816)

Fixes #187805
DeltaFile
+9-20llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+9-1llvm/include/llvm/CodeGen/ValueTypes.h
+18-212 files

LLVM/project 85ab2a9llvm/include/llvm/CodeGen TargetInstrInfo.h, llvm/lib/CodeGen/AsmPrinter AsmPrinter.cpp

[AsmPrinter] Add generic support for verifying instruction sizes (#187703)

Many backends rely on TII reporting correct instruction sizes for
MIR-level branch relaxation passes. Reporting a size that is too small can
result in MC fixup failures (or silent miscompiles for unvalidated fixups).

Some time ago I added validation to the PPC asm printer to verify that
the TII instruction size matches the actually emitted size. This was
very helpful to systematically fix all incorrectly reported instruction
sizes.

However, the same problem also exists in many other backends, so this
moves the validation into AsmPrinter, controlled by a new
getInstSizeVerifyMode() hook in TII, which is disabled by default.

The intention here is to gradually enable this validation for more
backends (which requires fixing them first).
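The check can be sketched as comparing, per instruction, the size TII reports (for example via `getInstSizeInBytes()`) with the number of bytes the encoder actually emitted, gated by a mode that is off by default. `InstRecord`, `SizeVerifyMode`, and `verifySizes` are hypothetical; only the idea and the existence of a `getInstSizeVerifyMode()` hook come from the commit message, and that hook's real return type is not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical verification modes; disabled by default, as in the commit.
enum class SizeVerifyMode { Disabled, Enabled };

struct InstRecord {
  std::string Name;
  unsigned ReportedSize; // what the TII size query claims
  unsigned EmittedSize;  // bytes actually produced by the MC encoder
};

// Returns the names of instructions whose reported size is wrong. An
// under-reported size is the dangerous case for branch relaxation, but any
// mismatch is flagged, matching "TII size matches the emitted size".
std::vector<std::string> verifySizes(const std::vector<InstRecord> &Insts,
                                     SizeVerifyMode Mode = SizeVerifyMode::Disabled) {
  std::vector<std::string> Bad;
  if (Mode == SizeVerifyMode::Disabled)
    return Bad;
  for (const InstRecord &I : Insts)
    if (I.ReportedSize != I.EmittedSize)
      Bad.push_back(I.Name);
  return Bad;
}
```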
DeltaFile
+35-0llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+0-26llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+16-0llvm/include/llvm/CodeGen/TargetInstrInfo.h
+8-0llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+3-0llvm/lib/Target/PowerPC/PPCInstrInfo.h
+62-265 files

LLVM/project 0d6185ellvm/test/CodeGen/AMDGPU callee-frame-setup.ll

[AMDGPU] Update test to match comment. NFC (#187273)

The comment says there shouldn't be any free registers, so update the
inline assembly to clobber all non-preserved SGPRs.
DeltaFile
+68-18llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
+68-181 files

LLVM/project 31caa34llvm/lib/Target/LoongArch LoongArchISelLowering.cpp, llvm/lib/Target/RISCV RISCVISelLowering.cpp

[LoongArch][RISCV] Fix incorrect indexing of incoming byval arguments in tail call eligibility check
DeltaFile
+48-0llvm/test/CodeGen/LoongArch/issue187832.ll
+48-0llvm/test/CodeGen/RISCV/issue187832.ll
+2-2llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+2-2llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+100-44 files

LLVM/project bb86440clang-tools-extra/clang-tidy/bugprone DerivedMethodShadowingBaseMethodCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Correctly ignore function templates in derived-method-shadowing-base-method (#185741) (#185875)

This commit fixes a false positive in the
derived-method-shadowing-base-method clang-tidy check, as described in
[ticket 185741](https://github.com/llvm/llvm-project/issues/185741).

Fixes #185741

---------

Co-authored-by: Tom James <tom.james at siemens.com>
Co-authored-by: Zeyi Xu <mitchell.xu2 at gmail.com>
DeltaFile
+6-2clang-tools-extra/clang-tidy/bugprone/DerivedMethodShadowingBaseMethodCheck.cpp
+7-0clang-tools-extra/test/clang-tidy/checkers/bugprone/derived-method-shadowing-base-method.cpp
+4-0clang-tools-extra/docs/ReleaseNotes.rst
+17-23 files

LLVM/project de514fbbolt/include/bolt/Profile DataReader.h, bolt/include/bolt/Rewrite RewriteInstance.h

[BOLT] Remove some unused code (NFC) (#183880)

Remove some unused code in BOLT:
- `RewriteInstance::linkRuntime` is declared but not defined
- `BranchContext` typedef is never used
- `FuncBranchData::getBranch` is defined but never used
- `FuncBranchData::getDirectCallBranch` is defined but never used
DeltaFile
+0-29bolt/lib/Profile/DataReader.cpp
+0-10bolt/include/bolt/Profile/DataReader.h
+0-3bolt/include/bolt/Rewrite/RewriteInstance.h
+0-423 files

LLVM/project 3fa88f0mlir/lib/Dialect/Tensor/IR TensorOps.cpp, mlir/test/Dialect/Tensor canonicalize.mlir

[mlir][tensor] Fix empty tensor with cast encoding fold (#187963)

Fixes a TODO: the fold of an empty tensor with a cast could not fold the
encoding or attributes.
DeltaFile
+13-0mlir/test/Dialect/Tensor/canonicalize.mlir
+3-5mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+16-52 files

LLVM/project 66afa8fllvm/lib/Target/X86 X86ISelLoweringCall.cpp, llvm/test/CodeGen/X86 x86-fp80-ret-no-x87.ll

[X86] Emit user-friendly error for x86_fp80 with x87 disabled on x86_64 (#183932)

When compiling a function that uses `x86_fp80` on x86_64 with x87 disabled (`-mattr=-x87`), LLVM used to crash with a cryptic internal error. It now emits a user-friendly error instead.

Fixes #182450
DeltaFile
+13-0llvm/test/CodeGen/X86/x86-fp80-ret-no-x87.ll
+13-0llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+26-02 files