LLVM/project f7ea3f3 llvm/lib/Target/AMDGPU SIFixSGPRCopies.cpp, llvm/test/CodeGen/AMDGPU si-fix-sgpr-copies-av-constrain.mir fix-sgpr-copies-readfirstlane-av-register-regression.ll

AMDGPU: Constrain readfirstlane operand to vgpr_32

When inserting a readfirstlane, ensure the operand constraint
is respected. If the source register was an av_* class, the
verifier would fail.

Fixes regression after c7019c7eda6629ae99eb95aa1ee9e1f8249a4f49
Delta  File
+92 -0  llvm/test/CodeGen/AMDGPU/si-fix-sgpr-copies-av-constrain.mir
+52 -0  llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-readfirstlane-av-register-regression.ll
+14 -3  llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+158 -3  3 files

LLVM/project f42fcfa llvm/docs LangRef.rst, llvm/include/llvm/Transforms/Utils LoopUtils.h

Rethink fix: Don't convert 0 to 1.
Delta  File
+84 -80  llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll
+17 -38  llvm/lib/Transforms/Utils/LoopUtils.cpp
+34 -0  llvm/test/Transforms/LoopVectorize/vectorize-zero-estimated-trip-count.ll
+6 -21  llvm/include/llvm/Transforms/Utils/LoopUtils.h
+17 -9  llvm/docs/LangRef.rst
+14 -12  llvm/test/Transforms/LoopVectorize/branch-weights.ll
+172 -160  7 files not shown
+197 -187  13 files

LLVM/project 182c415 llvm/lib/Target/AMDGPU SIRegisterInfo.cpp SIRegisterInfo.h

AMDGPU: Remove getProperlyAlignedRC (#167993)

This is unused.
Delta  File
+0 -22  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+0 -5  llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+0 -27  2 files

LLVM/project e4e55ec clang/lib/CodeGen BackendUtil.cpp, cross-project-tests/veclib veclib-sincos.c

RuntimeLibcalls: Move VectorLibrary handling into TargetOptions

This fixes the -fveclib flag getting lost on its way to the backend.

Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.

Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it would
directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.

RuntimeLibraryAnalysis could follow the same trick that TargetLibraryInfo is
using in the future, but a lot more boilerplate changes are needed to thread
that analysis through codegen. Ideally this would come from an IR module flag,
and nothing would be in TargetOptions. For now, it's better for all of these
sorts of controls to be consistent.
Delta  File
+30 -29  clang/lib/CodeGen/BackendUtil.cpp
+0 -34  llvm/lib/IR/SystemLibraries.cpp
+24 -0  llvm/lib/CodeGen/CommandFlags.cpp
+21 -0  cross-project-tests/veclib/veclib-sincos.c
+9 -6  llvm/lib/Analysis/TargetLibraryInfo.cpp
+5 -3  llvm/tools/opt/optdriver.cpp
+89 -72  12 files not shown
+117 -86  18 files

LLVM/project e363d62 clang/lib/CodeGen BackendUtil.cpp

clang: Pass -vector-library flag when using -fveclib

Really this belongs in an IR module flag.
Delta  File
+30 -0  clang/lib/CodeGen/BackendUtil.cpp
+30 -0  1 files

LLVM/project 0b3d5ad llvm/lib/Analysis RuntimeLibcallInfo.cpp, llvm/lib/Transforms/Utils DeclareRuntimeLibcalls.cpp

DeclareRuntimeLibcalls: Use RuntimeLibraryAnalysis

Also add boilerplate to have a live instance when running
opt configured from CommandFlags / TargetOptions.
Delta  File
+17 -0  llvm/test/Transforms/Util/DeclareRuntimeLibcalls/codegen-opt-flags.ll
+7 -3  llvm/tools/opt/NewPMDriver.h
+7 -1  llvm/tools/opt/optdriver.cpp
+5 -3  llvm/tools/opt/NewPMDriver.cpp
+4 -1  llvm/lib/Transforms/Utils/DeclareRuntimeLibcalls.cpp
+3 -1  llvm/lib/Analysis/RuntimeLibcallInfo.cpp
+43 -9  1 files not shown
+44 -10  7 files

LLVM/project 9d6a9b9 offload/include device.h

Fix format
Delta  File
+1 -2  offload/include/device.h
+1 -2  1 files

LLVM/project b5444ac llvm/lib/Target/AMDGPU SIRegisterInfo.cpp SIRegisterInfo.h

AMDGPU: Remove getProperlyAlignedRC

This is unused.
Delta  File
+0 -22  llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+0 -5  llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+0 -27  2 files

LLVM/project cfad41c clang/include/clang/CIR MissingFeatures.h, clang/lib/CIR/CodeGen CIRGenFunction.cpp

[CIR] Upstream l-value emission for ExprWithCleanups (#167938)

This adds the necessary handler for emitting an l-value for an
ExprWithCleanups expression.
Delta  File
+86 -0  clang/test/CIR/CodeGen/temporary-materialization.cpp
+7 -0  clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+1 -0  clang/include/clang/CIR/MissingFeatures.h
+94 -0  3 files

LLVM/project b1262d1 mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td, mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp

[mlir][ROCDL] Refactor wmma intrinsics to use attributes not operands where possible (#167041)

The current implementation of the WMMA intrinsic ops as they are defined
in the ROCDL tablegen is incorrect. They represent as operands what
should be attributes such as `clamp`, `opsel`, `signA/signB`. This
change performs a refactoring to bring it in line with what we expect.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed at amd.com>
Delta  File
+137 -40  mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+70 -72  mlir/test/Target/LLVMIR/rocdl.mlir
+49 -36  mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+12 -12  mlir/test/Conversion/AMDGPUToROCDL/wmma-gfx11.mlir
+11 -11  mlir/test/Conversion/AMDGPUToROCDL/wmma-gfx1250.mlir
+10 -10  mlir/test/Conversion/AMDGPUToROCDL/wmma-gfx12.mlir
+289 -181  1 files not shown
+293 -186  7 files

LLVM/project b38a2db offload/include device.h omptarget.h, offload/libomptarget device.cpp

Add more fixes
Delta  File
+16 -2  offload/include/device.h
+0 -13  offload/libomptarget/device.cpp
+2 -3  offload/include/omptarget.h
+1 -1  offload/libomptarget/OpenMP/API.cpp
+19 -19  4 files

LLVM/project 630dfc9 compiler-rt/test/dfsan origin_endianness.c, llvm/lib/Transforms/Instrumentation DataFlowSanitizer.cpp

[dfsan] Fix Endianess issue (#162881)

Fix an endianness issue when reading the 4 shadow bytes that correspond to the
first origin pointer.
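
As a generic illustration of why a fix like this depends on byte order (this is not the actual DataFlowSanitizer change): when eight consecutive bytes are loaded as one 64-bit integer, the four lowest-addressed bytes end up in the low half of the value on a little-endian host and in the high half on a big-endian one. The helper names below are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: detect the host byte order at run time.
bool hostIsLittleEndian() {
  std::uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper: given a 64-bit value loaded from 8 consecutive bytes,
// return the bits that came from the 4 lowest-addressed bytes.
std::uint32_t firstFourBytes(std::uint64_t wide, bool littleEndian) {
  return littleEndian ? static_cast<std::uint32_t>(wide)
                      : static_cast<std::uint32_t>(wide >> 32);
}

// Hypothetical helper: the bits that came from the 4 highest-addressed bytes.
std::uint32_t lastFourBytes(std::uint64_t wide, bool littleEndian) {
  return littleEndian ? static_cast<std::uint32_t>(wide >> 32)
                      : static_cast<std::uint32_t>(wide);
}
```

Code that always truncates the wide value to 32 bits picks the right bytes only on little-endian targets, which is the kind of mistake an endianness fix has to address.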

---------

Co-authored-by: anoopkg6 <anoopkg6 at github.com>
Delta  File
+37 -0  compiler-rt/test/dfsan/origin_endianness.c
+10 -2  llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
+47 -2  2 files

LLVM/project cdbf243 clang/lib/Frontend CompilerInvocation.cpp

[clang] Add a TODO for output paths in invocation path visitation (#167983)

Pointed out in code review downstream:
https://github.com/swiftlang/llvm-project/pull/11816
Delta  File
+1 -0  clang/lib/Frontend/CompilerInvocation.cpp
+1 -0  1 files

LLVM/project 7237caa llvm/lib/Target/AArch64 AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 aarch64-load-ext.ll andorxor.ll

[AArch64] Optimize extending loads of small vectors

Reduces the total number of loads and the number of moves between SIMD
registers and general-purpose registers.
Delta  File
+198 -28  llvm/test/CodeGen/AArch64/aarch64-load-ext.ll
+119 -37  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+36 -45  llvm/test/CodeGen/AArch64/andorxor.ll
+36 -37  llvm/test/CodeGen/AArch64/vec3-loads-ext-trunc-stores.ll
+10 -17  llvm/test/CodeGen/AArch64/sub.ll
+10 -17  llvm/test/CodeGen/AArch64/add.ll
+409 -181  19 files not shown
+539 -364  25 files

LLVM/project 4fe79a7 compiler-rt/lib/sanitizer_common sanitizer_procmaps_mac.cpp, compiler-rt/test/asan/TestCases/Darwin asan-verify-module-map.cpp

[sanitizer-common] [Darwin] Fix overlapping dyld segment addresses (attempt 2) (#167800)

This re-lands #166005, which was reverted due to the issue described in
#167797.

There are 4 small changes:
- Fix LoadedModule leak by calling Clear() on the modules list
- Fix internal_strncpy calls that are not null-terminated
- Improve test to accept the dylib being loaded from a different path
than compiled `{{.*}}[[DYLIB]]`
- strcmp => internal_strncmp

This should not be merged until after #167797.


rdar://163149325
Delta  File
+66 -18  compiler-rt/lib/sanitizer_common/sanitizer_procmaps_mac.cpp
+25 -0  compiler-rt/test/asan/TestCases/Darwin/asan-verify-module-map.cpp
+91 -18  2 files

LLVM/project f93fcde clang/tools/offload-arch AMDGPUArchByHIP.cpp OffloadArch.cpp

[offload-arch] Fix amdgpu-arch crash on Windows with ROCm 7.1 (#167695)

The tool was crashing on Windows with ROCm 7.1 due to two issues: misuse
of hipDeviceGet which should not be used (it worked before by accident
but was undefined behavior), and ABI incompatibility from
hipDeviceProp_t struct layout changes between HIP versions where the
gcnArchName offset changed from 396 to 1160 bytes.

The fix removes hipDeviceGet and queries properties directly by device
index. It defines separate struct layouts for R0600 (HIP 6.x+) and R0000
(legacy) to handle the different memory layouts correctly.

An automatic API fallback mechanism tries R0600, then R0000, then the
unversioned API until one succeeds, ensuring compatibility across
different HIP runtime versions. A new --hip-api-version option allows
manually selecting the API version when needed.
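
The fallback order described above can be sketched generically as a chain of probes tried in sequence until one succeeds. This is a minimal sketch, not the code from the commit: `Probe`, `Detected` and `detectWithFallback` are hypothetical names, and each probe stands in for one HIP API variant (R0600, R0000, unversioned).

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical probe: returns an architecture name on success, or
// std::nullopt when this API variant is unavailable or fails.
using Probe = std::function<std::optional<std::string>()>;

// Hypothetical result: which API variant succeeded and what it reported.
struct Detected {
  std::string api;
  std::string arch;
};

// Try each probe in order and return the first successful result.
std::optional<Detected> detectWithFallback(
    const std::vector<std::pair<std::string, Probe>> &probes) {
  for (const auto &[name, probe] : probes) {
    if (std::optional<std::string> arch = probe())
      return Detected{name, *arch};
  }
  return std::nullopt; // every variant failed
}
```

Keeping the variants in an ordered list makes a manual override simple: selecting one API version amounts to passing a list with a single entry.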

Additional improvements include enhanced error handling with
hipGetErrorString, verbose logging throughout the detection process, and

    [3 lines not shown]
Delta  File
+172 -25  clang/tools/offload-arch/AMDGPUArchByHIP.cpp
+3 -1  clang/tools/offload-arch/OffloadArch.cpp
+175 -26  2 files

LLVM/project d719876 clang/lib/Format FormatToken.h UnwrappedLineParser.cpp, clang/unittests/Format FormatTestVerilog.cpp

[clang-format] Recognize Verilog DPI export and import (#165595)

The directives should not change the indentation level. Previously the
program erroneously added an indentation level when it saw the
`function` keyword.
Delta  File
+137 -45  clang/lib/Format/FormatToken.h
+31 -7  clang/lib/Format/UnwrappedLineParser.cpp
+10 -0  clang/unittests/Format/FormatTestVerilog.cpp
+2 -0  clang/lib/Format/UnwrappedLineParser.h
+180 -52  4 files

LLVM/project 7a0f7db polly/include/polly LinkAllPasses.h, polly/lib/Analysis ScopInfo.cpp DependenceInfo.cpp

[Polly] Introduce PhaseManager and remove LPM support (#125442) (#167560)

Reapply of a22d1c2225543aa9ae7882f6b1a97ee7b2c95574. Using this PR for
pre-merge CI.

Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.

Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases was never a good fit. The PhaseManager
resolves the following problems:

* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result, the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses or prior SCoP optimizations would have been forgotten. The LLVM

    [27 lines not shown]
Delta  File
+432 -0  polly/lib/Pass/PhaseManager.cpp
+273 -124  polly/lib/Support/RegisterPasses.cpp
+0 -228  polly/lib/Analysis/ScopInfo.cpp
+9 -211  polly/lib/Analysis/DependenceInfo.cpp
+22 -138  polly/lib/Exchange/JSONExporter.cpp
+0 -156  polly/include/polly/LinkAllPasses.h
+736 -857  1,140 files not shown
+2,775 -4,465  1,146 files

LLVM/project ec77765 offload/libomptarget device.cpp, offload/plugins-nextgen/amdgpu/dynamic_hsa hsa_ext_amd.h

Add fixes and improvements
Delta  File
+35 -29  offload/plugins-nextgen/common/src/PluginInterface.cpp
+22 -17  offload/plugins-nextgen/common/include/PluginInterface.h
+2 -2  offload/libomptarget/device.cpp
+0 -2  offload/tools/deviceinfo/llvm-offload-device-info.cpp
+0 -1  offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
+59 -51  5 files

LLVM/project 3a2de95 clang/lib/Format WhitespaceManager.cpp, clang/unittests/Format FormatTest.cpp

[clang-format] Align trailing comments for function parameters (#164458)

before

```C++
void foo(int   name, // name
         float name, // name
         int   name)   // name
{}
```

after

```C++
void foo(int   name, // name
         float name, // name
         int   name) // name
{}
```

    [5 lines not shown]
Delta  File
+10 -2  clang/lib/Format/WhitespaceManager.cpp
+8 -0  clang/unittests/Format/FormatTest.cpp
+18 -2  2 files

LLVM/project d9790e0 llvm/lib/Target/AMDGPU SIRegisterInfo.h SIRegisterInfo.td

[AMDGPU] Prioritize allocation of low 256 VGPR classes

If we have 1024 VGPRs available, we need to give allocation priority to
register classes whose operands can only use the low 256 registers.
That is notably the scale operands of V_WMMA_SCALE instructions.
Otherwise large tuples will be allocated first and take all the low
registers, so we would have to spill to make room for these
scale registers.

Allocation priority by itself does not eliminate spilling completely
in large kernels, although it helps to some degree. Increasing the spill
weight of a restricted class on top of it helps.
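
The ordering argument can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how the greedy register allocator works: registers are handed out consecutively from 0, nothing is ever freed, and a request that does not fit counts as a spill. `Request` and `countSpills` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical allocation request: a tuple of `size` consecutive registers,
// optionally restricted to the low part of the register file.
struct Request {
  int size;
  bool lowOnly;
};

// Toy bump allocator: returns how many requests could not be placed.
int countSpills(std::vector<Request> reqs, int totalRegs, int lowRegs,
                bool restrictedFirst) {
  if (restrictedFirst) {
    // Give priority to requests that can only live in the low registers.
    std::stable_sort(reqs.begin(), reqs.end(),
                     [](const Request &a, const Request &b) {
                       return a.lowOnly && !b.lowOnly;
                     });
  }
  int next = 0;   // first register not yet handed out
  int spills = 0; // requests that did not fit
  for (const Request &r : reqs) {
    int limit = r.lowOnly ? lowRegs : totalRegs;
    if (next + r.size <= limit)
      next += r.size;
    else
      ++spills;
  }
  return spills;
}
```

With 32 unrestricted 8-register tuples followed by two low-only single registers, a 1024-register file and a 256-register low region, the model reports two spills in arrival order and none when the restricted requests go first.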
Delta  File
+11 -0  llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+1 -1  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+12 -1  2 files

LLVM/project 388ef61 llvm/lib/CodeGen RegAllocGreedy.cpp RegAllocGreedy.h

[RegAllocGreedy] Use MCRegister instead of MCPhysReg. NFC (#167974)

Delta  File
+9 -11  llvm/lib/CodeGen/RegAllocGreedy.cpp
+2 -2  llvm/lib/CodeGen/RegAllocGreedy.h
+11 -13  2 files

LLVM/project e6b9805 clang/lib/CIR/CodeGen CIRGenExprCXX.cpp CIRGenFunction.cpp

[CIR][NFC] Add missing code markers for Dtor_VectorDeleting (#167969)

This adds some minimal code to mark locations where handling is needed
for Dtor_VectorDeleting type dtors, which were added in
https://github.com/llvm/llvm-project/pull/165598

This is not a comprehensive mark-up of the missing code, as some code
will be needed in places where the surrounding function has larger
missing pieces in CIR currently.

This fixes a warning for an uncovered switch case that was causing CI
builds to fail.
Delta  File
+7 -0  clang/lib/CIR/CodeGen/CIRGenExprCXX.cpp
+4 -1  clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+11 -1  2 files

LLVM/project 4e71530 llvm/lib/Transforms/Vectorize VPlanConstruction.cpp

[VPlan] Add findComputeReductionResult helper. (NFC)

Move utility to helper for re-use in follow-up patches.
Delta  File
+12 -7  llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+12 -7  1 files

LLVM/project 3ff3c4e clang/include/clang/CIR/Dialect/IR CIROps.td, clang/lib/CIR/CodeGen CIRGenBuiltinX86.cpp CIRGenExprScalar.cpp

[CIR] Upstream X86 builtin clflush, fence and pause (#167401)

This PR upstreams the intrinsics `_mm_prefetch`, `_mm_(l|m)fence`,
`_mm_pause` and `_mm_clflush` from the incubator repository.

Delta  File
+58 -0  clang/test/CIR/CodeGen/X86/sse2-builtins.c
+45 -0  clang/include/clang/CIR/Dialect/IR/CIROps.td
+36 -1  clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+28 -0  clang/test/CIR/CodeGen/X86/sse-builtins.c
+24 -0  clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+22 -0  clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+213 -1  2 files not shown
+217 -1  8 files

LLVM/project 36848a3 lldb/bindings/python python-typemaps.h python-typemaps.swig

[lldb] Remove bindings/python/python-typemaps.h (#167966)

The minimum supported SWIG version is 4.0 so there's no need for using a
separate file anymore.
Delta  File
+0 -19  lldb/bindings/python/python-typemaps.h
+12 -6  lldb/bindings/python/python-typemaps.swig
+12 -25  2 files

LLVM/project e781a5a llvm/lib/Transforms/Utils ProfileVerify.cpp, llvm/test/Transforms/PGOProfile profcheck-select.ll

[profcheck] Disable verification of selects on vector conditions.
Delta  File
+0 -30  llvm/utils/profcheck-xfail.txt
+20 -2  llvm/test/Transforms/PGOProfile/profcheck-select.ll
+10 -6  llvm/lib/Transforms/Utils/ProfileVerify.cpp
+30 -38  3 files

LLVM/project 513232f clang/lib/Tooling/DependencyScanning ModuleDepCollector.cpp, clang/test/ClangScanDeps modules-header-sharing.m

[clang][deps] Track VFS overlay files in file dependencies. (#167824)

rdar://164612831
Delta  File
+3 -0  clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+2 -1  clang/test/ClangScanDeps/modules-header-sharing.m
+5 -1  2 files

LLVM/project 1eebb8d flang/lib/Lower/OpenMP Utils.cpp ClauseProcessor.cpp, offload/test/offloading/fortran implicit-derived-enter-exit.f90

Address reviewer comments.
Delta  File
+65 -0  offload/test/offloading/fortran/implicit-derived-enter-exit.f90
+4 -6  flang/lib/Lower/OpenMP/Utils.cpp
+2 -5  flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+3 -0  flang/lib/Lower/OpenMP/OpenMP.cpp
+74 -11  4 files

LLVM/project a6edeed llvm/test/Transforms/LoopVectorize first-order-recurrence-tail-folding.ll use-scalar-epilogue-if-tp-fails.ll, llvm/test/Transforms/LoopVectorize/RISCV uniform-load-store.ll divrem.ll

Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)"

This reverts commit 62d1a080e69e3c5e98840e000135afa7c688a77b.

This appears to be causing some runtime failures on RISCV
https://lab.llvm.org/buildbot/#/builders/210/builds/5221
Delta  File
+111 -757  llvm/test/Transforms/LoopVectorize/first-order-recurrence-tail-folding.ll
+44 -179  llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll
+66 -141  llvm/test/Transforms/LoopVectorize/optsize.ll
+69 -57  llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll
+62 -39  llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
+40 -46  llvm/test/Transforms/LoopVectorize/pr43166-fold-tail-by-masking.ll
+392 -1,219  15 files not shown
+584 -1,626  21 files