LLVM/project 658bed5 libc/src/__support/wctype perfect_hash_map.h lower_to_upper.h, libc/utils/wctype_utils/conversion hex_writer.py

[libc][wctype] Add perfect hash map for conversion functions
Delta  File
+876 -0  libc/src/__support/wctype/perfect_hash_map.h
+568 -0  libc/src/__support/wctype/lower_to_upper.h
+553 -0  libc/src/__support/wctype/upper_to_lower.h
+0 -400  libc/src/__support/wctype/lower_to_upper.inc
+0 -390  libc/src/__support/wctype/upper_to_lower.inc
+71 -1  libc/utils/wctype_utils/conversion/hex_writer.py
+2,068 -791  8 files not shown
+2,256 -797  14 files

LLVM/project 7c28aae libc/src/__support/CPP bit.h

reapply static
Delta  File
+1 -1  libc/src/__support/CPP/bit.h
+1 -1  1 file

LLVM/project f2e0e48 libc/src/__support/math ceill.h, libc/test/shared shared_math_test.cpp

link issue
Delta  File
+2 -13  libc/src/__support/math/ceill.h
+3 -5  libc/test/shared/shared_math_test.cpp
+5 -18  2 files

LLVM/project 1d03351 libc/src/__support/FPUtil bfloat16.h NearestIntegerOperations.h, libc/src/__support/FPUtil/generic add_sub.h

[libc][math] Qualify ceil functions to constexpr
Delta  File
+59 -7  libc/test/shared/shared_math_test.cpp
+13 -13  libc/src/__support/FPUtil/generic/add_sub.h
+11 -11  libc/src/__support/FPUtil/bfloat16.h
+8 -8  libc/src/__support/FPUtil/NearestIntegerOperations.h
+13 -1  libc/src/__support/math/ceill.h
+7 -7  libc/src/__support/FPUtil/comparison_operations.h
+111 -47  9 files not shown
+141 -72  15 files

LLVM/project 2b74b14 libc/src/__support/FPUtil PolyEval.h

misc
Delta  File
+4 -4  libc/src/__support/FPUtil/PolyEval.h
+4 -4  1 file

LLVM/project e6789f9 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPU.td, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[AMDGPU] Introduce ASYNC_CNT on GFX1250 (#185810)

Async operations transfer data between global memory and LDS. Their
progress is tracked by the ASYNC_CNT counter on GFX1250 and later
architectures. This change introduces the representation of that counter
in SIInsertWaitCnts. For now, the programmer must manually insert
s_wait_asyncnt instructions. Later changes will add compiler assistance
for generating the waits by including this counter in the asyncmark
instructions.

Assisted-by: Claude Sonnet 4.5

This is part of a stack:

- #185813
- #185810
Delta  File
+24 -9  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+10 -0  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+8 -1  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+6 -0  llvm/lib/Target/AMDGPU/AMDGPU.td
+48 -10  4 files

LLVM/project 895c281 llvm/lib/Target/AArch64/GISel AArch64RegisterBankInfo.cpp, llvm/test/CodeGen/AArch64 arm64-int-neon.ll

[AArch64][GlobalISel] Remove fallback for scalar usqadd/suqadd intrinsics (#187513)

Previously, GlobalISel was failing to select these intrinsics when given
scalar operands, as RegBankSelect would place these on GPR banks. Fixing
this enables GlobalISel to lower correctly, as in Instruction Selection
the intrinsic matches the SIMD patterns in AArch64InstrInfo.td.
Delta  File
+1 -5  llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+2 -0  llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
+3 -5  2 files

LLVM/project 4376bf2 clang-tools-extra/clang-tidy/performance FasterStringFindCheck.cpp, clang-tools-extra/test/clang-tidy/checkers/performance faster-string-find.cpp

[clang-tidy] Fix "effective" -> "efficient". (#187536)

"Effective" is the wrong word: Both overloads are effective; they do
what they're supposed to do. But the character overload does less work.
Delta  File
+1 -1  clang-tools-extra/test/clang-tidy/checkers/performance/faster-string-find.cpp
+1 -1  clang-tools-extra/clang-tidy/performance/FasterStringFindCheck.cpp
+2 -2  2 files

LLVM/project 4b17135 llvm/lib/Transforms/Vectorize VPlanTransforms.cpp VPlanPatternMatch.h, llvm/test/Transforms/LoopVectorize/AArch64 partial-reduce-fdot-product.ll

[LV] Simplify `matchExtendedReductionOperand()` (NFCI) (#185821)

This updates `matchExtendedReductionOperand` so the simple case of
`UpdateR(PrevValue, ext(...))` is matched first as an early exit. The
binop matching is then flattened to remove the extra layer of the
`MatchExtends` lambda.
Delta  File
+63 -75  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+58 -0  llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-fdot-product.ll
+4 -0  llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+125 -75  3 files

LLVM/project 78f267f clang/lib/AST/ByteCode InterpFrame.h InterpFrame.cpp

Reapply "[clang][bytecode] Allocate local variables in `InterpFrame` … (#187644)

…tail storage" (#187410)

This reverts commit bf1db77fc87ce9d2ca7744565321b09a5d23692f.

Avoid using an `InterpFrame` member after calling its destructor this
time. I hope that was the only problem.
Delta  File
+41 -15  clang/lib/AST/ByteCode/InterpFrame.h
+23 -21  clang/lib/AST/ByteCode/InterpFrame.cpp
+13 -15  clang/lib/AST/ByteCode/Function.h
+9 -15  clang/lib/AST/ByteCode/Compiler.cpp
+15 -7  clang/lib/AST/ByteCode/Context.cpp
+13 -6  clang/lib/AST/ByteCode/Interp.cpp
+114 -79  10 files not shown
+146 -116  16 files

LLVM/project 68e3556 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPU.td, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[AMDGPU] Introduce ASYNC_CNT on GFX1250

Async operations transfer data between global memory and LDS. Their progress is
tracked by the ASYNC_CNT counter on GFX1250 and later architectures. This change
introduces the representation of that counter in SIInsertWaitCnts. For now, the
programmer must manually insert s_wait_asyncnt instructions. Later changes will
add compiler assistance for generating the waits by including this counter in
the asyncmark instructions.

Assisted-by: Claude Sonnet 4.5
Delta  File
+24 -9  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+10 -0  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+8 -1  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+6 -0  llvm/lib/Target/AMDGPU/AMDGPU.td
+48 -10  4 files

LLVM/project ab28384 llvm/include/llvm/CodeGen ExpandMemCmp.h, llvm/include/llvm/Passes CodeGenPassBuilder.h

[ExpandMemCmp] Remove unused TM/TLI dependency (#187660)

This pass does not actually use TargetMachine/TargetLoweringInfo.
Delta  File
+15 -24  llvm/lib/CodeGen/ExpandMemCmp.cpp
+0 -4  llvm/include/llvm/CodeGen/ExpandMemCmp.h
+1 -1  llvm/include/llvm/Passes/CodeGenPassBuilder.h
+1 -1  llvm/lib/Passes/PassRegistry.def
+0 -1  llvm/test/tools/opt/no-target-machine.ll
+17 -31  5 files

LLVM/project da11265 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

add VH callback support for value deletion in uniformity
Delta  File
+83 -0  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+46 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+14 -0  llvm/include/llvm/ADT/GenericUniformityImpl.h
+4 -0  llvm/lib/IR/SSAContext.cpp
+4 -0  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+2 -0  llvm/lib/CodeGen/MachineSSAContext.cpp
+153 -0  2 files not shown
+155 -0  8 files

LLVM/project f18c8ae llvm/lib/CodeGen MachineSSAContext.cpp, llvm/lib/IR SSAContext.cpp

review: fix isNeverDivergent and separate VH callback for other follow-up
Delta  File
+0 -4  llvm/lib/IR/SSAContext.cpp
+0 -2  llvm/lib/CodeGen/MachineSSAContext.cpp
+0 -6  2 files

LLVM/project c912af8 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp

handle inside isFLAT; add missing comment for getBitWidth
Delta  File
+5 -5  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+1 -0  llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+6 -5  2 files

LLVM/project 5d503a3 llvm/include/llvm/ADT GenericUniformityImpl.h GenericSSAContext.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: add comment in isNeverDivergent and separate VH callback for other follow-up
Delta  File
+0 -83  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+0 -46  llvm/lib/Analysis/UniformityAnalysis.cpp
+4 -14  llvm/include/llvm/ADT/GenericUniformityImpl.h
+5 -0  llvm/include/llvm/ADT/GenericSSAContext.h
+0 -4  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+0 -1  llvm/unittests/Target/AMDGPU/CMakeLists.txt
+9 -148  1 file not shown
+9 -149  7 files

LLVM/project d97adc4 llvm/lib/Target/X86 X86ISelLowering.cpp, llvm/test/CodeGen/X86 bit-manip-i512.ll bit-manip-i256.ll

[X86] Perform i128/i256/i512 BITREVERSE on the FPU (#187502)

Bitcast the large scalar integer to a vXi64 vector, reverse the elements,
and then perform a per-element vXi64 bitreverse.

If we have SSSE3 or later, BITREVERSE expansion using PSHUFB is always
more efficient than performing it as a scalar sequence (no need for
mayFoldIntoVector check).

Fixes #187353
Delta  File
+780 -2,395  llvm/test/CodeGen/X86/bit-manip-i512.ll
+450 -1,161  llvm/test/CodeGen/X86/bit-manip-i256.ll
+228 -452  llvm/test/CodeGen/X86/bitreverse.ll
+324 -317  llvm/test/CodeGen/X86/bit-manip-i128.ll
+28 -5  llvm/lib/Target/X86/X86ISelLowering.cpp
+1,810 -4,330  5 files
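
The decomposition described in this commit message (split the wide integer into 64-bit lanes, reverse the lane order, then bit-reverse each lane) can be checked with a small model. This is only an illustrative sketch in Python, not the X86 lowering itself; the function names `bitreverse` and `bitreverse_via_lanes` are hypothetical and do not appear in the patch.

```python
def bitreverse(value, width):
    """Reference implementation: reverse the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bitreverse_via_lanes(value, width, lane_bits=64):
    """Reverse `width` bits by reversing lane order, then each lane's bits.

    Models the "bitcast to vXi64, reverse elements, per-element bitreverse"
    sequence; lane 0 is assumed to hold the least significant bits.
    """
    if width % lane_bits != 0:
        raise ValueError("width must be a multiple of lane_bits")
    mask = (1 << lane_bits) - 1
    # Step 1: "bitcast" the wide integer into a list of 64-bit lanes.
    lanes = [(value >> (i * lane_bits)) & mask
             for i in range(width // lane_bits)]
    # Step 2: reverse the order of the lanes.
    lanes.reverse()
    # Step 3: bit-reverse every lane independently.
    lanes = [bitreverse(lane, lane_bits) for lane in lanes]
    # Reassemble the lanes into one wide integer.
    return sum(lane << (i * lane_bits) for i, lane in enumerate(lanes))
```

For any 128-, 256-, or 512-bit input, `bitreverse_via_lanes(x, width)` should agree with `bitreverse(x, width)`, which is why the per-lane form is a valid replacement for a scalar sequence.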

LLVM/project ef75891 clang/include/clang/AST Decl.h

clang-format
Delta  File
+1 -3  clang/include/clang/AST/Decl.h
+1 -3  1 file

LLVM/project 689afb5 llvm/utils/release build_llvm_release.bat

Windows release build: Add checksum verification for downloaded source archives (#187113)

Add checksum verification for libxml2, zlib, and zstd source archives
via `cmake -E *sum` and `cmake -E compare_files` commands.

This also adds the following minor changes:
* Factor out libxml2 version into variable.
* Check `tar` return code.
Delta  File
+24 -5  llvm/utils/release/build_llvm_release.bat
+24 -5  1 file
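
The verification step described in this commit (hash the downloaded archive, then compare against a known value) can be sketched as follows. This is a hypothetical Python illustration, not the batch script from the patch, which uses `cmake -E` commands; the helper names `file_digest` and `verify_archive` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, read in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as archive:
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_archive(path, expected_hex, algorithm="sha256"):
    """Raise ValueError if the archive's digest differs from `expected_hex`."""
    actual = file_digest(path, algorithm)
    # Hex digests are compared case-insensitively.
    if actual.lower() != expected_hex.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_hex}, got {actual}"
        )
```

Failing loudly on a mismatch, rather than continuing the build, is what makes the check useful: a corrupted or tampered archive stops the release script before anything is unpacked.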

LLVM/project 69cd746 clang/tools/clang-fuzzer/handle-llvm handle_llvm.cpp, llvm/docs/CommandGuide llc.rst

[llc] Add -mtune option (#186998)

This patch adds a Clang-compatible -mtune option to llc, to enable
decoupled ISA and microarchitecture targeting, which is especially
important for backend development. For example, it makes it easy to
test the effect of a subtarget feature or scheduling model on codegen
across a variety of workloads in the IR corpus benchmark:
https://github.com/dtcxzyw/llvm-codegen-benchmark.

The implementation adds an isolated generic codegen flag to establish a
base for wider usage; the plan is to add it to `opt` as well in a
follow-up patch. `llc` then consumes the flag and sets `tune-cpu`
attributes on functions, which are in turn consumed by the backend.
Delta  File
+69 -0  llvm/test/tools/llc/mtune.ll
+31 -11  llvm/lib/CodeGen/CommandFlags.cpp
+17 -7  llvm/include/llvm/CodeGen/CommandFlags.h
+15 -9  llvm/tools/llc/llc.cpp
+11 -0  llvm/docs/CommandGuide/llc.rst
+2 -2  clang/tools/clang-fuzzer/handle-llvm/handle_llvm.cpp
+145 -29  3 files not shown
+149 -32  9 files

LLVM/project 4df2967 lldb/include/lldb/Utility Stream.h, lldb/source/Core UserSettingsController.cpp

[lldb] Implement llvm::formatv overload for Stream::operator << (#187462)

This will allow us to more conveniently use llvm::formatv in the
codebase.
Delta  File
+13 -1  lldb/include/lldb/Utility/Stream.h
+9 -0  lldb/unittests/Utility/StreamTest.cpp
+3 -5  lldb/source/Interpreter/OptionValueProperties.cpp
+4 -3  lldb/source/Target/TraceDumper.cpp
+6 -0  lldb/source/Utility/Stream.cpp
+1 -1  lldb/source/Core/UserSettingsController.cpp
+36 -10  6 files

LLVM/project d6d2289 llvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h, llvm/lib/Analysis UniformityAnalysis.cpp

add VH callback support for value deletion in uniformity
Delta  File
+83 -0  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+46 -0  llvm/lib/Analysis/UniformityAnalysis.cpp
+14 -0  llvm/include/llvm/ADT/GenericUniformityImpl.h
+4 -0  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+1 -0  llvm/include/llvm/ADT/GenericUniformityInfo.h
+1 -0  llvm/unittests/Target/AMDGPU/CMakeLists.txt
+149 -0  6 files

LLVM/project facc82d .github CODEOWNERS

[clang][cir] Adding myself in CODEOWNERS for CIRGenBuiltinAArch64.cpp (#187570)

This is to help with #185382 and to make sure that I don't miss any PRs.
Delta  File
+1 -0  .github/CODEOWNERS
+1 -0  1 file

LLVM/project 24060b6 llvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

review: fix isNeverDivergent and separate VH callback for other follow-up
Delta  File
+0 -83  llvm/unittests/Target/AMDGPU/UniformityAnalysisCallbackVHTest.cpp
+0 -46  llvm/lib/Analysis/UniformityAnalysis.cpp
+6 -16  llvm/include/llvm/ADT/GenericUniformityImpl.h
+0 -4  llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+0 -4  llvm/lib/IR/SSAContext.cpp
+0 -2  llvm/lib/CodeGen/MachineSSAContext.cpp
+6 -155  3 files not shown
+6 -158  9 files

LLVM/project 367d5ab libclc/clc/lib/generic/math clc_acos.inc clc_acos.cl

libclc: Update acos

This was originally ported from rocm device libs in
efeafa1bdaa715733fc100bcd9d21f93c7272368, merge in more
recent changes.
Delta  File
+114 -105  libclc/clc/lib/generic/math/clc_acos.inc
+2 -0  libclc/clc/lib/generic/math/clc_acos.cl
+116 -105  2 files

LLVM/project c8dd829 libclc/clc/lib/amdgpu CMakeLists.txt, libclc/clc/lib/amdgpu/math clc_amdgpu_cbrt.inc clc_cbrt.cl

libclc: Override cbrt for AMDGPU (#187560)
Delta  File
+78 -0  libclc/clc/lib/amdgpu/math/clc_amdgpu_cbrt.inc
+34 -0  libclc/clc/lib/amdgpu/math/clc_cbrt.cl
+1 -0  libclc/clc/lib/amdgpu/CMakeLists.txt
+113 -0  3 files

LLVM/project edbe827 libclc/clc/lib/amdgpu CMakeLists.txt, libclc/clc/lib/amdgpu/math clc_log2.cl clc_log.cl

libclc: Use log intrinsic for half and float cases for amdgpu (#187538)

This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker based overrides
for the generic implementations, rather than per source file.
Delta  File
+41 -0  libclc/clc/lib/amdgpu/math/clc_log2.cl
+41 -0  libclc/clc/lib/amdgpu/math/clc_log.cl
+41 -0  libclc/clc/lib/amdgpu/math/clc_log10.cl
+11 -0  libclc/clc/lib/amdgpu/math/clc_amdgpu_log.inc
+3 -0  libclc/clc/lib/amdgpu/CMakeLists.txt
+137 -0  5 files

LLVM/project a5de509 libclc/clc/lib/generic/math clc_log_base.h clc_log_base.inc

libclc: Rewrite log implementation as gentype inc file (#187537)

Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.
Delta  File
+0 -252  libclc/clc/lib/generic/math/clc_log_base.h
+243 -0  libclc/clc/lib/generic/math/clc_log_base.inc
+14 -10  libclc/clc/lib/generic/math/clc_log2.cl
+14 -10  libclc/clc/lib/generic/math/clc_log10.cl
+10 -0  libclc/clc/lib/generic/math/clc_log.cl
+281 -272  5 files

LLVM/project 441790b llvm/lib/Target/AArch64 AArch64SelectionDAGInfo.cpp, llvm/test/CodeGen/AArch64 mops-mmo-size.ll

[AArch64] Use an unknown size for memcpy ops with non-constant sizes. (#187445)

The previous value of 0 was allowing loads to move past the mops
operations where it is not valid. Use a LocationSize::afterPointer()
size instead.

The GISel lowering currently loses the MMO, which is fine as it should
be conservatively treated as a load/store to any location.
Delta  File
+28 -0  llvm/test/CodeGen/AArch64/mops-mmo-size.ll
+4 -4  llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp
+32 -4  2 files

LLVM/project 421bf13 libclc/clc/lib/generic/math clc_tanpi.inc clc_cospi.inc

libclc: Update trigpi functions (#187579)

These were originally ported from rocm device
libs in bc81ebefb7d9d9d71d20bfee2ce4cccb09701e9b.
Merge in more recent changes.
Delta  File
+62 -105  libclc/clc/lib/generic/math/clc_tanpi.inc
+3 -106  libclc/clc/lib/generic/math/clc_cospi.inc
+2 -104  libclc/clc/lib/generic/math/clc_sinpi.inc
+23 -62  libclc/clc/lib/generic/math/clc_sincos_helpers_fp64.inc
+50 -0  libclc/clc/lib/generic/math/clc_sincospi.inc
+27 -0  libclc/clc/lib/generic/math/clc_sincos_helpers.inc
+167 -377  15 files not shown
+310 -398  21 files