LLVM/project 146533elibc/src/__support/OSUtil/linux/syscall_wrappers mmap.h

[libc] Fix for SYS_mmap2 offset computation (#197413)

The comment implies that the offset argument is a multiple of page size,
but

[this](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/csky/kernel/syscall.c#L25)
[is](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/parisc/kernel/sys_parisc.c#L193)
[not](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/microblaze/kernel/sys_microblaze.c#L50)
[the](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/riscv/kernel/sys_riscv.c#L48)
[case](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/arm64/kernel/sys32.c#L47)
[for](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/sparc/kernel/sys_sparc_32.c#L113)
[almost](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/m68k/kernel/sys_m68k.c#L44)
[every](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/powerpc/kernel/syscalls.c#L56)
[architecture](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/sh/kernel/sys_sh.c#L46)
[supported](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/powerpc/kernel/syscalls.c#L56)
[by](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/mips/kernel/syscall.c#L76)
[linux](https://github.com/torvalds/linux/blob/1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524/arch/arm/kernel/entry-common.S#L410).
Most architectures use fixed 4 KiB units instead.
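
To illustrate the difference, here is a minimal sketch of the unit conversion, assuming the fixed 4096-byte unit described above. The helper `mmap2_offset_units` and the constant `kMmap2Unit` are hypothetical names and are not taken from the libc wrapper.

```cpp
#include <cstdint>
#include <optional>

// Assumption: the kernel interprets the mmap2 offset in fixed 4096-byte
// units, regardless of the system page size.
constexpr std::uint64_t kMmap2Unit = 4096;

// Hypothetical helper: converts a byte offset into mmap2 units, or returns
// std::nullopt when the offset is not aligned to the unit.
constexpr std::optional<std::uint64_t> mmap2_offset_units(
    std::uint64_t byte_offset) {
  if (byte_offset % kMmap2Unit != 0)
    return std::nullopt;
  return byte_offset / kMmap2Unit;
}
```

On a system with 64 KiB pages, a byte offset of 65536 maps to 16 units under this scheme, whereas dividing by the page size would give 1.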


    [16 lines not shown]
DeltaFile
+5-6libc/src/__support/OSUtil/linux/syscall_wrappers/mmap.h
+5-61 files

LLVM/project cc91763.github CODEOWNERS

Fix misspelling of 'llvm' (#197649)
DeltaFile
+1-1.github/CODEOWNERS
+1-11 files

LLVM/project 4141630clang/lib/Driver/ToolChains/Arch AArch64.cpp, clang/test/Driver aarch64-march.c

[AArch64][clang] Improve -march= error message with many feature flags (#197441)

When calling `clang` with a large number of feature flags, the entire
argument is printed as an error message if one of the feature flags is
invalid.

For example, before this change, when providing a large number of features
to `-march=` with one of them invalid, an error message such as this is
printed:
```
  clang: error: unsupported argument 'armv9.6a+sme2+sme2p1+sve2+sve2p1+profile
  +crypto+aes+sha2+sha3+sm4+memtag+ssbs+bf16+i8mm+dotprod+ls64+rcpc3+brbe+gcs
  +faminmax+fp8+fp8fma+fp8dot4+fp8dot2+sme-f8f32+the+lut+lsui+pops+occmo
  +rme-gpc3+d128+invalidfeature'
```
and a user doesn't know which of the `+feature` flags is actually invalid.

After this change, the following error message is printed:

    [2 lines not shown]
DeltaFile
+27-12clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+7-0clang/test/Driver/aarch64-march.c
+34-122 files

LLVM/project adb5802llvm/lib/Transforms/Vectorize LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize byte-type-function-variants.ll

[LV] Avoid crashing for vector calls with scalar byte types (#197417)

If a parameter to a vector function variant is uniform or linear, check
whether the type is SCEVable first. Byte types aren't, so they would
trigger an assertion. We could improve this later if needed.
DeltaFile
+212-0llvm/test/Transforms/LoopVectorize/byte-type-function-variants.ll
+4-2llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+216-22 files

LLVM/project 9a4faeellvm/lib/IR Constants.cpp, llvm/test/CodeGen/AArch64 neon-mov.ll

[LLVM][Constants] Remove the option to disable vector ConstantFP support. (#197427)

Removes the command line options:
  -use-constant-fp-for-fixed-length-splat
  -use-constant-fp-for-scalable-splat
DeltaFile
+12-24llvm/test/Transforms/InstCombine/load-store-forward.ll
+8-22llvm/lib/IR/Constants.cpp
+2-7llvm/test/Transforms/Attributor/nofpclass.ll
+4-4llvm/test/CodeGen/AArch64/neon-mov.ll
+4-4llvm/test/Transforms/InstCombine/extractelement.ll
+3-3llvm/test/CodeGen/PowerPC/vec_constants.ll
+33-6432 files not shown
+56-10738 files

LLVM/project b80e53dbolt/lib/Passes LongJmp.cpp, bolt/test/AArch64 long-jmp-hugify-fixup-out-of-range.s

[BOLT][AArch64] Account for hugify alignment in AArch64 long jump layout (#195272)

When --hugify is used for a PIE, the final section allocation in
RewriteInstance::mapCodeSections aligns the address after the last
non-cold text section before laying out the following sections:

  for (BinarySection *Section : CodeSections) {
    Address = alignTo(Address, Section->getAlignment());
    Section->setOutputAddress(Address);
    Address += Section->getOutputSize();

    if (opts::Hugify && !BC->HasFixedLoadAddress &&
        Section->getName() == LastNonColdSectionName)
      Address = alignTo(Address, Section->getAlignment());
  }

The AArch64 long-jump pass doesn't model that gap in its tentative
layout, so a CBZ could be considered in range during stub insertion and
later become out of range when JITLink applied the final layout.
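
The effect of the unmodelled gap can be sketched with a toy layout model. This is only an illustration under stated assumptions: `Section`, `layout`, and `cbz_in_range` are hypothetical names, not BOLT APIs, and the range check assumes the usual +/-1 MiB reach of an AArch64 CBZ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a code section; not BOLT's BinarySection.
struct Section {
  std::uint64_t size;
  std::uint64_t alignment;
  bool last_non_cold;
};

constexpr std::uint64_t align_to(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) / align * align;
}

// Mirrors the shape of the loop quoted above: when `hugify` is set, the
// address after the last non-cold section is aligned once more.
std::vector<std::uint64_t> layout(const std::vector<Section>& sections,
                                  std::uint64_t base, bool hugify) {
  std::vector<std::uint64_t> addresses;
  std::uint64_t address = base;
  for (const Section& s : sections) {
    address = align_to(address, s.alignment);
    addresses.push_back(address);
    address += s.size;
    if (hugify && s.last_non_cold)
      address = align_to(address, s.alignment);
  }
  return addresses;
}

// CBZ encodes a signed 19-bit word offset, i.e. roughly +/-1 MiB.
constexpr bool cbz_in_range(std::uint64_t from, std::uint64_t to) {
  constexpr std::int64_t kRange = 1 << 20;
  const std::int64_t distance =
      static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from);
  return distance >= -kRange && distance < kRange;
}
```

With a 2 MiB-aligned hot section followed by a small cold section, the cold section moves from 0x201000 to 0x400000 once the extra alignment is applied, so a branch that looked reachable in the tentative layout no longer is.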

    [5 lines not shown]
DeltaFile
+21-6bolt/test/AArch64/long-jmp-hugify-fixup-out-of-range.s
+5-0bolt/lib/Passes/LongJmp.cpp
+26-62 files

LLVM/project a2098f2llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal (#197583)
DeltaFile
+33-48llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+40-492 files

LLVM/project 2003bc2llvm/lib/Target/AMDGPU AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel zextload.ll legalize-sextload-global.mir

AMDGPU/GlobalISel: Legalize scalar extloads with large memory type

Add narrowScalar for scalar sext/zextload when the memory type is
larger than 32 bits. There is no narrowScalar implementation for
NarrowSize < MemSize (split load), but we don't want that anyway.
Narrowing the scalar to MemSize creates a large normal load followed by
an extension to dst.
DeltaFile
+52-0llvm/test/CodeGen/AMDGPU/GlobalISel/zextload.ll
+10-5llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sextload-global.mir
+8-5llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-zextload-global.mir
+7-0llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+77-104 files

LLVM/project 73a48a6llvm/lib/CodeGen/SelectionDAG LegalizeDAG.cpp

[LLVM][CodeGen] When expanding ISD::LRINT, non-deterministic results should be frozen. (#197435)
DeltaFile
+5-3llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+5-31 files

LLVM/project 652a978clang/test/AST ast-dump-templates.cpp, llvm/test/CodeGen/AArch64 bf16-v8-instructions.ll

Merge remote-tracking branch 'origin/main' into xteam-red-runtime
DeltaFile
+652-9,343clang/test/AST/ast-dump-templates.cpp
+7,584-740llvm/test/CodeGen/AArch64/bf16-v8-instructions.ll
+8,195-0llvm/test/MC/AMDGPU/gfx13_asm_vop3.s
+8,182-0llvm/test/MC/AMDGPU/gfx13_asm_vop3-fake16.s
+6,873-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Premium-sve-instructions.s
+6,862-0llvm/test/tools/llvm-mca/AArch64/Cortex/C1Nano-sve-instructions.s
+38,348-10,0838,257 files not shown
+437,282-165,1218,263 files

LLVM/project 0b53f85clang/lib/Driver/ToolChains/Arch AArch64.cpp, clang/test/Driver aarch64-march.c

[AArch64][clang] Improve -mcpu= and -mtune= error messages too

Similar to my previous change improving the error message for
`-march=` in #197441, this changes `-mcpu=` and `-mtune=` arguments
to only report the invalid feature flag, rather than the entire
string.

This is a much clearer error message for the user.
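
The idea of reporting only the offending flag can be sketched as follows. This is a simplified model, not the driver code: `first_invalid_feature` is a hypothetical helper, and the set of known features is passed in by the caller rather than taken from the real AArch64 target parser.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: scans a "base+feat1+feat2" style argument and
// returns the first feature that is not in `known`, or std::nullopt if
// every feature is recognised. The base name before the first '+' is
// not checked here.
std::optional<std::string> first_invalid_feature(
    std::string_view arg, const std::set<std::string>& known) {
  std::size_t plus = arg.find('+');
  while (plus != std::string_view::npos) {
    const std::size_t next = arg.find('+', plus + 1);
    const std::size_t length = next == std::string_view::npos
                                   ? std::string_view::npos
                                   : next - plus - 1;
    const std::string feature(arg.substr(plus + 1, length));
    if (known.count(feature) == 0)
      return feature;
    plus = next;
  }
  return std::nullopt;
}
```

A caller could then put just the returned feature name in the diagnostic instead of echoing the whole argument.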
DeltaFile
+34-26clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+13-0clang/test/Driver/aarch64-march.c
+47-262 files

LLVM/project 5a0e401clang/test/Driver aarch64-march.c

fixup! Adjust test comment
DeltaFile
+1-1clang/test/Driver/aarch64-march.c
+1-11 files

LLVM/project ccd7f4cllvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+33-48llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+40-492 files

LLVM/project a4acc5cllvm/test/CodeGen/X86 vector-reduce-umin.ll vector-reduce-umax.ll

[X86] Improve lowering of i32/i64 minmax reductions (#197578)

Allow 32-bit targets to correctly lower i64 ISD::VECREDUCE min/max nodes
via ReplaceNodeResults. This is necessary once we're finally ready for
#194473 and can remove combineMinMaxReduction entirely.

Improve handling of v2iXX reduction stages by consistently preferring
binop(extract(),extract()) scalarisation on SSE targets (if the vector
binop isn't legal).
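
As a rough model of what a staged reduction computes, the sketch below halves the vector until two elements remain and then finishes with a scalar operation on the two extracted elements. It models the semantics only, not the X86 lowering itself. The function name `reduce_umin` is hypothetical, and the input is assumed to be non-empty with a power-of-two length.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an unsigned-min reduction. Assumes a non-empty
// input whose length is a power of two.
std::uint64_t reduce_umin(std::vector<std::uint64_t> v) {
  // Vector stages: combine the low half with the high half.
  while (v.size() > 2) {
    const std::size_t half = v.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      v[i] = std::min(v[i], v[i + half]);
    v.resize(half);
  }
  // Final two-element stage: binop(extract(0), extract(1)) on scalars.
  return v.size() == 2 ? std::min(v[0], v[1]) : v[0];
}
```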
DeltaFile
+592-703llvm/test/CodeGen/X86/vector-reduce-umin.ll
+573-658llvm/test/CodeGen/X86/vector-reduce-umax.ll
+460-549llvm/test/CodeGen/X86/vector-reduce-smin.ll
+464-543llvm/test/CodeGen/X86/vector-reduce-smax.ll
+227-276llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
+210-244llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
+2,526-2,9735 files not shown
+2,886-3,39711 files

LLVM/project b3c0ae4openmp/device/src Reduction.cpp

add a comment about the self-reset
DeltaFile
+2-0openmp/device/src/Reduction.cpp
+2-01 files

LLVM/project 0521762compiler-rt/lib/builtins/arm extendsfdf2.S truncdfsf2.S

Update for rename of endian.h in a previous patch
DeltaFile
+1-1compiler-rt/lib/builtins/arm/extendsfdf2.S
+1-1compiler-rt/lib/builtins/arm/truncdfsf2.S
+2-22 files

LLVM/project e1b0f56llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,520 files not shown
+2,324,406-580,58319,526 files

LLVM/project 1cb92d8compiler-rt/lib/builtins/arm fcmp.h cmpsf2.S, compiler-rt/lib/builtins/arm/thumb1 fcmp.h cmpsf2.S

[compiler-rt][ARM] Optimized single-precision FP comparisons (#179925)

These comparison functions follow the same structure as the
double-precision ones in a prior commit: a header file contains the
main logic, and several entry points differ only in how they construct
the return value.

In this case, we have provided versions for Thumb1 as well as
Arm/Thumb2.
DeltaFile
+443-0compiler-rt/test/builtins/Unit/comparesf2new_test.c
+189-0compiler-rt/lib/builtins/arm/thumb1/fcmp.h
+176-0compiler-rt/lib/builtins/arm/fcmp.h
+56-0compiler-rt/lib/builtins/arm/cmpsf2.S
+56-0compiler-rt/lib/builtins/arm/unordsf2.S
+55-0compiler-rt/lib/builtins/arm/thumb1/cmpsf2.S
+975-04 files not shown
+1,147-110 files

LLVM/project b783371llvm/lib/Target/AMDGPU/Disassembler AMDGPUDisassembler.cpp, llvm/test/MC/AMDGPU literals.s

[AMDGPU] Fix disasm roundtrip for forced fp64 literal
DeltaFile
+33-48llvm/test/MC/AMDGPU/literals.s
+7-1llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+40-492 files

LLVM/project d566c8cclang/lib/Format UnwrappedLineParser.cpp, clang/unittests/Format FormatTest.cpp

[clang-format] Fix parsing of goto labels (#197538)

Fixes #196662.

---------

Co-authored-by: owenca <owenpiano at gmail.com>
DeltaFile
+41-18clang/unittests/Format/FormatTest.cpp
+3-3clang/lib/Format/UnwrappedLineParser.cpp
+44-212 files

LLVM/project 6293f16llvm/lib/TableGen Record.cpp

[TableGen] Simplify Record type checks. NFC. (#197450)
DeltaFile
+6-6llvm/lib/TableGen/Record.cpp
+6-61 files

LLVM/project 3fda43dllvm/test/CodeGen/AMDGPU llvm.amdgcn.permlane.ll llvm.amdgcn.permlane.bcast.ll

[AMDGPU] Update permlane_bcast/down/up/xor intrinsic to support more types (#197141)

Co-authored-by: Acim Maravic <Acim.Maravic at amd.com>
DeltaFile
+3,435-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.bcast.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.down.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.up.ll
+1,105-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.xor.ll
+0-440llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.gfx1250.ll
+7,855-4409 files not shown
+8,043-48015 files

LLVM/project 151cc5dllvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge from main
DeltaFile
+275,101-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+57,682-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+41,844-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills2.mir
+40,613-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills1.mir
+37,209-0llvm/test/CodeGen/AMDGPU/NextUseAnalysis/test_ers_multiple_spills3.mir
+597,128-019,514 files not shown
+2,316,298-580,03119,520 files

LLVM/project ad6050ellvm/test/Transforms/LoopVectorize gather-scatter.ll if-conversion-scalable.ll

Nits to harmonise tests
DeltaFile
+159-31llvm/test/Transforms/LoopVectorize/gather-scatter.ll
+65-53llvm/test/Transforms/LoopVectorize/if-conversion-scalable.ll
+224-842 files

LLVM/project 7ae1962clang-tools-extra/clangd CodeComplete.cpp, clang-tools-extra/clangd/unittests CodeCompleteTests.cpp

[clangd] Fix parens suppression in mid-identifier code-completion (#197249)

When completing in the middle of an existing identifier (e.g.
`fo^o<int>(42)`), the next-token check lexes the character immediately
after the cursor, which prevents parens suppression from kicking in.

After the fix, we go to the end of the current identifier first and only
then start lexing for the next token, which handles redundant parens
even when the cursor is mid-identifier.
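
The skip-then-lex idea can be sketched on plain strings. This is a simplified illustration rather than the clangd implementation: `next_token_start` is a hypothetical helper that works on characters instead of using the Clang lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

inline bool is_identifier_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: returns the first character of the token that
// follows the identifier under the cursor, or '\0' at end of input.
inline char next_token_start(std::string_view code, std::size_t cursor) {
  std::size_t i = cursor;
  // Step 1: move to the end of the identifier the cursor sits in.
  while (i < code.size() && is_identifier_char(code[i]))
    ++i;
  // Step 2: skip whitespace before the next token.
  while (i < code.size() &&
         std::isspace(static_cast<unsigned char>(code[i])))
    ++i;
  return i < code.size() ? code[i] : '\0';
}
```

For `foo<int>(42)` with the cursor at index 2, this yields `<` rather than the trailing `o` that a check starting directly at the cursor would see.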

This also fixes parens suppression in replace mode, which by design is
used mid-identifier.

Fixes https://github.com/clangd/clangd/issues/387
DeltaFile
+18-57clang-tools-extra/clangd/CodeComplete.cpp
+13-4clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
+31-612 files

LLVM/project c0ed919llvm/lib/Target/AMDGPU GCNPreRAOptimizations.cpp, llvm/test/CodeGen/AMDGPU optimize-ds-bvh-stack-pre-ra.ll

[AMDGPU][GCNPreRAOptimizations] Reduce BVH premature reuse (#197386)

Add implicit uses to ds_bvh_stack instructions to avoid reuse of VGPRs
allocated to bvh_intersect_ray results prior to ds_bvh_stack. This
reduces the likelihood of a premature s_wait_bvhcnt occurring due to
partial reallocation of unused bvh_intersect_ray result registers.
DeltaFile
+300-0llvm/test/CodeGen/AMDGPU/optimize-ds-bvh-stack-pre-ra.ll
+78-24llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
+378-242 files

LLVM/project d2af73ccompiler-rt/lib/builtins/arm dcmp.h unorddf2.S, compiler-rt/lib/builtins/arm/thumb1 dcmp.h cmpdf2.S

[compiler-rt][ARM] Optimized double-precision FP comparisons (#179924)

The structure of these comparison functions consists of a header file
containing the main code, and several `.S` files that include that
header with different macro definitions, so that they can use the same
procedure to determine the logical comparison result and then just
translate it into a return value in different ways.
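
The shared-core structure can be modelled in C++ as one routine that determines the logical result plus thin entry points that map it to a return value. This is only a sketch of the pattern: the real code is assembly driven by macros, and the names below (`Order`, `compare_core`, `le_style`, `ge_style`, `unord_style`) are hypothetical. The NaN return values follow the usual soft-float comparison conventions and are an assumption about these particular routines.

```cpp
#include <cmath>

enum class Order { Less, Equal, Greater, Unordered };

// Shared logic: determine the logical comparison result once.
inline Order compare_core(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return Order::Unordered;
  if (a < b)
    return Order::Less;
  if (a > b)
    return Order::Greater;
  return Order::Equal;
}

// "Less or equal" style entry point: unordered maps to +1, so a caller
// testing `result <= 0` sees false when a NaN is involved.
inline int le_style(double a, double b) {
  switch (compare_core(a, b)) {
  case Order::Less:
    return -1;
  case Order::Equal:
    return 0;
  default:
    return 1;
  }
}

// "Greater or equal" style entry point: unordered maps to -1, so a
// caller testing `result >= 0` sees false when a NaN is involved.
inline int ge_style(double a, double b) {
  switch (compare_core(a, b)) {
  case Order::Greater:
    return 1;
  case Order::Equal:
    return 0;
  default:
    return -1;
  }
}

// "Unordered" style entry point: nonzero only if either input is NaN.
inline int unord_style(double a, double b) {
  return compare_core(a, b) == Order::Unordered ? 1 : 0;
}
```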
DeltaFile
+619-0compiler-rt/test/builtins/Unit/comparedf2new_test.c
+231-0compiler-rt/lib/builtins/arm/thumb1/dcmp.h
+212-0compiler-rt/lib/builtins/arm/dcmp.h
+71-0compiler-rt/lib/builtins/arm/unorddf2.S
+64-0compiler-rt/lib/builtins/arm/cmpdf2.S
+61-0compiler-rt/lib/builtins/arm/thumb1/cmpdf2.S
+1,258-04 files not shown
+1,447-010 files

LLVM/project 4c88347clang/lib/Driver/ToolChains/Arch AArch64.cpp, clang/test/Driver aarch64-march.c

[AArch64][clang] Improve -mcpu= and -mtune= error messages too

Similar to my previous change improving the error message for
`-march=` in #197441, this changes `-mcpu=` and `-mtune=` arguments
to only report the invalid feature flag, rather than the entire
string.

This is a much clearer error message for the user.
DeltaFile
+34-26clang/lib/Driver/ToolChains/Arch/AArch64.cpp
+13-0clang/test/Driver/aarch64-march.c
+47-262 files

LLVM/project 7b09a4ellvm/lib/Transforms/Vectorize VPlanRecipes.cpp LoopVectorize.cpp, llvm/test/Transforms/LoopVectorize/AArch64 reduction-cost.ll

[LV] Fix the cost model for freeze instructions (#197188)

While working on a PR to add a cost model for VPDerivedIV recipes I
noticed that a loop in or_reduction_with_freeze:

test/Transforms/LoopVectorize/AArch64/reduction-cost.ll

stopped vectorising because the cost model decided it was no longer
worth it. However, the main cause of this was the incredibly high cost
(14) of freeze for VF=2. We were using the cost of a vector mul
instruction as a proxy for the freeze cost, which is incredibly bad for
an AArch64 target without SVE since the operation needs scalarising.

As far as I understand, the freeze instruction does not lead to any
actual code being generated and acts merely as a barrier to potentially
unsafe optimisations. As such, I've updated the cost model to return 0
instead.
DeltaFile
+6-3llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+3-3llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
+1-3llvm/test/Transforms/LoopVectorize/AArch64/reduction-cost.ll
+2-0llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+12-94 files

LLVM/project 20f4289llvm/test/Transforms/LoopVectorize/AArch64 reduction-small-size.ll

[LV][NFC] Generate full CHECK lines for reduction-small-size.ll (#197632)
DeltaFile
+161-38llvm/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll
+161-381 files