LLVM/project c45dd43 libcxx/include/__locale_dir num.h, libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members get_long.pass.cpp

[libc++] Fix num_get base parsing (#170460)

This fixes two bugs reported in #121795 and adds regression tests.
Specifically, these bugs are in the base detection mechanism. The first
bug is that the out parameter isn't set when the stream only contains
zero and after that is the end of the stream. The second one is that we
don't consider `0` to be a number, and instead we only parse it as the
start of an octal literal.
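
A minimal sketch of the kind of input this affects, assuming base detection is requested by clearing `basefield` on the stream (which makes `num_get` behave like `scanf`'s `%i`). The helper name `parse_auto_base` is hypothetical and is not taken from the patch or its regression test.

```cpp
#include <ios>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parse a long with automatic base detection.
// With basefield cleared, a leading "0x" selects hex, a leading "0"
// selects octal, and anything else is read as decimal.
std::optional<long> parse_auto_base(const std::string& text) {
  std::istringstream in(text);
  in.unsetf(std::ios_base::basefield);
  long value = -1;  // sentinel, so an out parameter left unset would be visible
  in >> value;
  if (in.fail())
    return std::nullopt;
  return value;
}
```

Per the standard, `parse_auto_base("0")` should yield 0 rather than a failure or the sentinel, since a lone `0` followed by end of stream is a complete number.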
Delta  File
+96 -0  libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_long.pass.cpp
+2 -0  libcxx/include/__locale_dir/num.h
+98 -0  2 files

LLVM/project 6584e47 llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv vpmerge-sdnode.ll fixed-vectors-vpmerge.ll

[RISCV] Combine vmerge_vl allones -> vmv_v_v, vmv_v_v splat(x) -> vmv_v_x (#170539)

An upcoming patch aims to remove the last use of
@llvm.experimental.vp.splat in RISCVCodegenPrepare by replacing it with
a vp_merge of a regular splat.

A vp_merge will get lowered to vmerge_vl, and if we combine vmerge_vl of
a splat to vmv_v_x we can get the same behaviour as the vp.splat
intrinsic.

This adds the two combines needed. It was easier to do the combines on
_vl nodes rather than on vp_merge itself, since the types are already
legal for _vl nodes.
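
The idea can be illustrated with a scalar model. This is only a sketch of the `llvm.vp.merge` semantics as described in the LangRef, not of the RISC-V `_vl` nodes or of the combines themselves; `vp_merge` and `merge_splat_all_ones` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of llvm.vp.merge: lanes below evl take on_true where the mask
// is set; every other lane (mask clear, or at/after evl) keeps on_false.
std::vector<int> vp_merge(const std::vector<bool>& mask,
                          const std::vector<int>& on_true,
                          const std::vector<int>& on_false,
                          std::size_t evl) {
  std::vector<int> result(on_false);
  for (std::size_t i = 0; i < result.size() && i < evl; ++i)
    if (mask[i])
      result[i] = on_true[i];
  return result;
}

// Hypothetical helper: merge a splat of x under an all-ones mask.
// The first evl lanes all become x, which is what a splat limited to
// evl lanes would produce; the remaining lanes keep the passthru values.
std::vector<int> merge_splat_all_ones(int x, const std::vector<int>& passthru,
                                      std::size_t evl) {
  const std::vector<bool> all_ones(passthru.size(), true);
  const std::vector<int> splat(passthru.size(), x);
  return vp_merge(all_ones, splat, passthru, evl);
}
```

With an all-ones mask the mask no longer matters, so the merge reduces to a plain copy of the first `evl` lanes, and copying a splat reduces to broadcasting the scalar.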
Delta  File
+45 -0  llvm/test/CodeGen/RISCV/rvv/vpmerge-sdnode.ll
+45 -0  llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpmerge.ll
+38 -0  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+128 -0  3 files

LLVM/project cb8ce28 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU lds-dma-waits.ll

[AMDGPU][Waitcnts] Don't create a pending flat event for LDS DMA (#170263)

Flat instructions need a waitcnt(0) on both VMEM and LDS accesses, but
only when the instruction really is using flat addressing. The LDS DMA
instructions (on GFX9) have the FLAT flag set, but they have very clear
semantics. These instructions update only VM_CNT (on GFX9), and hence do
not need to be treated like actual flat instructions.
Delta  File
+7 -4  llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+2 -1  llvm/test/CodeGen/AMDGPU/lds-dma-waits.ll
+9 -5  2 files

LLVM/project 050d06f llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp

AMDGPU: Use correct chain when emitting error on a call

Return the input chain at the callsite, not the entry node
chain. Presumably this could cause issues somewhere.
Delta  File
+1 -1  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+1 -1  1 file

LLVM/project d364c0e libcxx/include/__random mersenne_twister_engine.h

[libc++][NFC] Inline mersenne_twister_engine functions into the class body (#170454)

Defining the functions outside the class makes things way harder to read
here, since the list of template arguments is incredibly long.
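
A toy illustration of the readability cost, assuming a much shorter parameter list than the real engine has. `toy_engine` and its members are hypothetical and unrelated to the libc++ implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of an engine with several template parameters.
template <class UIntType, std::size_t W, std::size_t N, UIntType A>
class toy_engine {
public:
  // Defined in the class body: no template header to repeat.
  UIntType step_inline(UIntType x) const {
    return (x >> 1) ^ ((x & 1) ? A : 0);
  }

  // Declared here, defined out of line below.
  UIntType step_out_of_line(UIntType x) const;
};

// Out-of-class definition: the full parameter list has to be spelled out
// again in the template header and in the qualified name.
template <class UIntType, std::size_t W, std::size_t N, UIntType A>
UIntType toy_engine<UIntType, W, N, A>::step_out_of_line(UIntType x) const {
  return (x >> 1) ^ ((x & 1) ? A : 0);
}
```

Both members compute the same value; only the amount of boilerplate differs, and it grows with every additional template parameter.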
Delta  File
+50 -154  libcxx/include/__random/mersenne_twister_engine.h
+50 -154  1 file

LLVM/project adb7275 llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64SVEInstrInfo.td, llvm/test/CodeGen/AArch64 sve-intrinsics-int-arith-undef.ll

[LLVM][CodeGen][SVE] Maintain existing predicate when lowering aarch64.sve.[s,u]abd.u intrinsics. (#170472)

Delta  File
+8 -4  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+0 -8  llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-undef.ll
+2 -2  llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+10 -14  3 files

LLVM/project 8e53a88 llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp DAGCombiner.cpp, llvm/test/CodeGen/AArch64 rem-by-const.ll sdiv-by-const-promoted-ops.ll

[DAGCombiner] Handle type-promoted constants in SDIV lowering (#169924)

Builds upon the solution proposed for #169491 and applies it to SDIV
as well.
Delta  File
+17 -72  llvm/test/CodeGen/AArch64/rem-by-const.ll
+77 -0  llvm/test/CodeGen/AArch64/sdiv-by-const-promoted-ops.ll
+5 -3  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+4 -2  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+103 -77  4 files

LLVM/project dd6e87b mlir/include/mlir/Dialect/LLVMIR NVVMOps.td

[MLIR][NVVM] Fix lowering logic after fddf7b05 (#170545)

Without this, the mapping fails when no result is specified.

See:
https://github.com/llvm/llvm-project/pull/169922#issuecomment-3605378445

To reproduce error on `main`:

```bash
mkdir -p build && cd build
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_TARGETS_TO_BUILD="host;NVPTX" \
  -DMLIR_ENABLE_CUDA_RUNNER=ON \
  -DMLIR_RUN_CUDA_TENSOR_CORE_TESTS=ON \
  -DMLIR_RUN_CUDA_SM90_TESTS=ON \
  -DMLIR_GPU_COMPILATION_TEST_FORMAT=fatbin \
  -DMLIR_INCLUDE_INTEGRATION_TESTS=ON \

    [6 lines not shown]
Delta  File
+2 -5  mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+2 -5  1 file

LLVM/project eb45efb llvm/lib/Target/LoongArch LoongArchMachineFunctionInfo.h

Address weining's comments
Delta  File
+2 -2  llvm/lib/Target/LoongArch/LoongArchMachineFunctionInfo.h
+2 -2  1 file

LLVM/project 37ea097 llvm/lib/Transforms/Vectorize VPlan.h LoopVectorize.cpp

[VPlan] Remove VPWidenRecipe constructor with no underlying instruction. NFCI (#166521)

My understanding is that a VPWidenRecipe should be used for recipes with
an exact underlying scalar instruction, and VPInstruction should be used
elsewhere e.g. for instructions generated as a part of the vectorization
process.

The only user of the VPWidenRecipe constructor that doesn't take an
underlying instruction is in adjustRecipesForReductions, but we can just
use VPInstruction there.
Delta  File
+3 -11  llvm/lib/Transforms/Vectorize/VPlan.h
+3 -3  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+6 -14  2 files

LLVM/project dd3b2a4 llvm/lib/CodeGen/GlobalISel IRTranslator.cpp

implement feedback
Delta  File
+5 -6  llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+5 -6  1 file

LLVM/project b4b369a llvm/lib/Target/AArch64 AArch64TargetTransformInfo.cpp, llvm/test/Transforms/InstCombine/AArch64 sve-intrinsic-opts-dup.ll

[LLVM][InstCombine][AArch64] sve.dup(V, all_active, S) ==> splat(S) (#170292)

Also refactors the rest of instCombineSVEDup to simplify the code.
Delta  File
+32 -7  llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-dup.ll
+15 -17  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+47 -24  2 files

LLVM/project 0458fe5 llvm/test/CodeGen/AMDGPU vgpr-lowering-gfx1250.mir

[AMDGPU] Improve VGPR lowering test. NFC (#170633)

Add asm comments checks for readability.
Delta  File
+65 -0  llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir
+65 -0  1 file

LLVM/project 73ef27c llvm/lib/CodeGen/SelectionDAG TargetLowering.cpp, llvm/test/CodeGen/AArch64 udiv-by-const-promoted-ops.ll

[DAGCombiner] Handle type-promoted constants in UDIV exact lowering (#169949)

Builds upon the solution proposed for
https://github.com/llvm/llvm-project/pull/169491 and applies it to UDIV
exact as well.
Delta  File
+21 -0  llvm/test/CodeGen/AArch64/udiv-by-const-promoted-ops.ll
+5 -3  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+26 -3  2 files

LLVM/project 0e11a92 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp VPlanHelpers.h, llvm/test/Transforms/LoopVectorize vplan-printing-metadata.ll

[VPlan] Implement printing VPIRMetadata. (#168385)

Implement printing for VPIRMetadata, using generic dyn_cast to
VPIRMetadata.

Depends on https://github.com/llvm/llvm-project/pull/166245 

PR: https://github.com/llvm/llvm-project/pull/168385
Delta  File
+21 -9  llvm/test/Transforms/LoopVectorize/vplan-printing-metadata.ll
+22 -0  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+20 -1  llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+10 -4  llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
+5 -0  llvm/lib/Transforms/Vectorize/VPlan.h
+78 -14  5 files

LLVM/project 0e517e1 llvm/lib/CodeGen/GlobalISel IRTranslator.cpp, llvm/test/CodeGen/AMDGPU callbr-intrinsics.ll

[AMDGPU][GlobalISel] Fix / workaround amdgcn.kill/.unreachable lowering

cf. https://github.com/llvm/llvm-project/pull/133907#issuecomment-3611354688
Delta  File
+19 -6  llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
+18 -4  llvm/test/CodeGen/AMDGPU/callbr-intrinsics.ll
+37 -10  2 files

LLVM/project f6cbd7a llvm/tools/llvm-profgen ProfiledBinary.cpp

[llvm-profgen] Fix warnings when building without asserts [NFC]

When building without asserts, we got:
 ../tools/llvm-profgen/ProfiledBinary.cpp:627:14: error: unused variable 'Err' [-Werror,-Wunused-variable]
   627 |         bool Err = MIA->evaluateBranch(Inst, Address, Size, Target);
       |              ^~~
 ../tools/llvm-profgen/ProfiledBinary.cpp:1172:14: error: unused variable 'TopProbe' [-Werror,-Wunused-variable]
  1172 |         auto TopProbe = TopLevelProbes.begin();
       |              ^~~~~~~~
 2 errors generated.

Add [[maybe_unused]] to the variables that are only used in asserts.
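
A small sketch of the pattern, assuming a C++17 compiler. The function `sum_checked` and its variable are hypothetical and do not come from ProfiledBinary.cpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function: `non_empty` is read only inside assert().
int sum_checked(const std::vector<int>& values) {
  // With -DNDEBUG the assert expands to nothing, leaving `non_empty` unused;
  // [[maybe_unused]] keeps -Wunused-variable quiet in that configuration.
  [[maybe_unused]] bool non_empty = !values.empty();
  assert(non_empty && "expected at least one value");

  int total = 0;
  for (int v : values)
    total += v;
  return total;
}
```

The attribute leaves the assert-enabled build unchanged, so the check still fires there.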
Delta  File
+3 -2  llvm/tools/llvm-profgen/ProfiledBinary.cpp
+3 -2  1 file

LLVM/project 11b1bd5 llvm/lib/Analysis DependenceAnalysis.cpp

[DA] Fix typo: Constan -> Constant (NFC) (#170636)

Delta  File
+5 -5  llvm/lib/Analysis/DependenceAnalysis.cpp
+5 -5  1 file

LLVM/project 0ba73fb llvm/docs ReleaseNotes.md, llvm/include/llvm-c Core.h

[llvm-c] Add LLVMConstFPFromBits() API (#164381)

This change adds the ability to create a 128-bit floating-point value
from two 64-bit integer values.
Some language frontends have already parsed a floating-point string into
a proper 128-bit quad value and need to get the LLVM value directly.
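
To show what "two 64-bit integer values" means for a quad, here is a sketch that builds the IEEE binary128 bit pattern of a `double` as a low and a high word. It assumes `double` is IEEE binary64 and handles only zero and normal values. `QuadBits` and `double_to_binary128_bits` are hypothetical names; the call into the new C API, including the word order it expects, is deliberately not shown.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical container for the two halves of a binary128 bit pattern.
struct QuadBits {
  std::uint64_t lo;  // low 64 bits of the 112-bit fraction
  std::uint64_t hi;  // sign (1 bit), exponent (15 bits), top 48 fraction bits
};

// Hypothetical helper: widen a double to binary128 bits.
// Subnormals, infinities and NaNs are not handled in this sketch.
QuadBits double_to_binary128_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);

  const std::uint64_t sign = bits >> 63;
  const std::uint64_t exponent = (bits >> 52) & 0x7ff;
  const std::uint64_t fraction = bits & 0xfffffffffffffULL;  // 52 bits

  QuadBits q{0, sign << 63};
  if (exponent == 0 && fraction == 0)
    return q;  // signed zero

  // Rebias the exponent from binary64 (1023) to binary128 (16383).
  const std::uint64_t quad_exponent = exponent - 1023 + 16383;
  q.hi |= quad_exponent << 48;
  q.hi |= fraction >> 4;  // top 48 of the 52 fraction bits
  q.lo = fraction << 60;  // remaining 4 bits land at the top of the low word
  return q;
}
```

For example, `1.0` maps to a high word of `0x3FFF000000000000` and a low word of `0`.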
Delta  File
+30 -0  llvm/unittests/IR/ConstantsTest.cpp
+8 -0  llvm/include/llvm-c/Core.h
+8 -0  llvm/lib/IR/Core.cpp
+1 -0  llvm/docs/ReleaseNotes.md
+47 -0  4 files

LLVM/project 8837e4d bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] PointerAuthCFIAnalyzer: return early if there is no work

Makes sure we do not divide by zero when calculating the percentage of
ignored functions.
Delta  File
+3 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+3 -0  1 file

LLVM/project 9561666 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp

[BOLT] Add comment about the chosen threshold
Delta  File
+10 -0  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+10 -0  1 file

LLVM/project 9cc76cb bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/AArch64 pacret-cfi-incorrect.s

[BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
Delta  File
+17 -10  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+1 -1  bolt/test/AArch64/pacret-cfi-incorrect.s
+18 -11  2 files

LLVM/project e8dcb85 bolt/lib/Passes PointerAuthCFIAnalyzer.cpp, bolt/test/runtime/AArch64 pacret-synchronous-unwind.cpp

[BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.
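
The threshold check can be sketched as follows, including a guard for the case where no functions were processed. The helper name and signature are hypothetical and are not BOLT's.

```cpp
#include <cstddef>

// Hypothetical helper: returns true when more than 10% of the functions
// were ignored, which is the condition for emitting the warning.
bool should_warn_about_ignored(std::size_t ignored, std::size_t total) {
  if (total == 0)
    return false;  // no work was done; also avoids dividing by zero
  const double percent =
      100.0 * static_cast<double>(ignored) / static_cast<double>(total);
  return percent > 10.0;
}
```

Exactly 10% does not trigger the warning in this sketch, matching the "more than 10%" wording.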

See also: #165215
Delta  File
+33 -0  bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
+8 -1  bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+41 -1  2 files

LLVM/project a25e367 bolt/docs PointerAuthDesign.md PacRetDesign.md, bolt/lib/Passes InsertNegateRAStatePass.cpp PointerAuthCFIFixup.cpp

[BOLT] Rename Pointer Auth DWARF rewriter passes (#164622)

Rename passes to names that better reflect their intent, 
and describe their relationship to each other.

InsertNegateRAStatePass renamed to PointerAuthCFIFixup,
MarkRAStates renamed to PointerAuthCFIAnalyzer.

Added the --print-<passname> flags for these passes.
Delta  File
+339 -0  bolt/unittests/Passes/PointerAuthCFIFixup.cpp
+0 -333  bolt/unittests/Passes/InsertNegateRAState.cpp
+0 -268  bolt/lib/Passes/InsertNegateRAStatePass.cpp
+268 -0  bolt/lib/Passes/PointerAuthCFIFixup.cpp
+240 -0  bolt/docs/PointerAuthDesign.md
+0 -235  bolt/docs/PacRetDesign.md
+847 -836  24 files not shown
+1,445 -1,421  30 files

LLVM/project 39e2f30 llvm/test/CodeGen/AMDGPU vgpr-lowering-gfx1250.mir

[AMDGPU] Improve VGPR lowering test. NFC

Add asm comments checks for readability.
Delta  File
+65 -0  llvm/test/CodeGen/AMDGPU/vgpr-lowering-gfx1250.mir
+65 -0  1 file

LLVM/project e5603da llvm/lib/Frontend/OpenMP OMPIRBuilder.cpp

[mlir][OpenMP] Fix assert in processing of dist_schedule (#170269)

When #152736 was initially merged, the assert that checks for the
chunksize when applying a static-chunked schedule was incorrect. While
it would not have changed the behaviour of the assert, the string
attached to it would have been emitted in cases where it was simplified.

This was raised here:
https://github.com/llvm/llvm-project/pull/152736#discussion_r2578314276

Testing for this was explored, but this assert is a last chance failure
point that should never be reached as applyWorkshareLoop decides the
`EffectiveScheduleType` based on the existence of `ChunkSize` or
`DistScheduleChunkSize`, so this will only trigger if there are issues
with that conversion, and unit testing already exists for
`applyWorkshareLoop`.
Delta  File
+2 -2  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+2 -2  1 file

LLVM/project ff89558 llvm/lib/Transforms/Vectorize VPlanRecipes.cpp

[VPlan] Fix opcode in LoadStore EVL recipe (#170594)

After #169885 lands, vp_load/vp_store are handled by
getMemIntrinsicInstrCost, so we can use the correct opcode here.
Delta  File
+2 -6  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+2 -6  1 file

LLVM/project 3c71ffa llvm/test/CodeGen/AMDGPU div_i128.ll, llvm/test/CodeGen/AMDGPU/GlobalISel lshr.ll frem.ll

rebase
Delta  File
+8 -22  llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll
+3 -3  llvm/test/CodeGen/AMDGPU/div_i128.ll
+0 -6  llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+11 -31  3 files

LLVM/project 52113cf llvm/lib/Target/AMDGPU AMDGPUISelLowering.cpp, llvm/test/CodeGen/AMDGPU llvm.exp10.ll

AMDGPU: Fix broken exp10 lowering for f16 (#170582)

This was calling the exp handling, so multiplying by the wrong
constant.
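
Why the constant matters can be seen from the identities exp10(x) = 2^(x * log2(10)) and exp(x) = 2^(x * log2(e)). The sketch below only illustrates that arithmetic in plain C++; the function names are hypothetical, and the actual AMDGPU lowering (its constants, precision handling and instruction selection) is not reproduced here.

```cpp
#include <cmath>

// exp10 expressed through exp2: scale the input by log2(10).
double exp10_via_exp2(double x) {
  return std::exp2(x * std::log2(10.0));
}

// exp expressed through exp2: scale the input by log2(e).
// Using this scale factor for an exp10 request computes e^x, not 10^x.
double exp_via_exp2(double x) {
  return std::exp2(x * std::log2(std::exp(1.0)));
}
```

For x = 2 the first function gives approximately 100, while the second gives approximately 7.389, which is the kind of discrepancy a wrong scale factor produces.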

GlobalISel is still broken, but missing the fast exp10 path.
This is tracked in https://github.com/llvm/llvm-project/issues/170576
Delta  File
+385 -126  llvm/test/CodeGen/AMDGPU/llvm.exp10.ll
+7 -3  llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+392 -129  2 files

LLVM/project 6c22e57 llvm/lib/Target/AMDGPU AMDGPUCombine.td

Remove duplicate combines
Delta  File
+3 -4  llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+3 -4  1 file