LLVM/project 5a7c1e0clang-tools-extra/clang-tidy/readability NonConstParameterCheck.cpp, clang-tools-extra/docs ReleaseNotes.rst

[clang-tidy] Fix false positive in readability-non-const-parameter with dependent array subscripts (#190936)

Fixes #60163

---------

Co-authored-by: Zeyi Xu <zeyi2 at nekoarch.cc>
DeltaFile
+16-0clang-tools-extra/test/clang-tidy/checkers/readability/non-const-parameter.cpp
+9-5clang-tools-extra/docs/ReleaseNotes.rst
+6-1clang-tools-extra/clang-tidy/readability/NonConstParameterCheck.cpp
+31-63 files

LLVM/project 2684dfallvm/include/llvm/Analysis UniformityAnalysis.h, llvm/lib/Analysis UniformityAnalysis.cpp

refactor: update variable names in uniformity analysis
DeltaFile
+15-15llvm/lib/Analysis/UniformityAnalysis.cpp
+4-4llvm/include/llvm/Analysis/UniformityAnalysis.h
+19-192 files

LLVM/project fcb8fbaorc-rt/unittests SessionTest.cpp

[orc-rt] Move RedundantAsyncShutdown to SessionTest suite. NFCI. (#191130)

RedundantAsyncShutdown is a Session test, not a ControllerAccess test.
DeltaFile
+16-16orc-rt/unittests/SessionTest.cpp
+16-161 files

LLVM/project d512d4allvm/include/llvm/ADT GenericUniformityImpl.h, llvm/lib/Analysis UniformityAnalysis.cpp

[UniformityAnalysis] Skip CycleAnalysis on targets without branch divergence (#189948)

UniformityAnalysis unconditionally computes CycleAnalysis even on
targets that don't care about divergence, causing measurable
compile-time overhead (see [#99878
(comment)](https://github.com/llvm/llvm-project/pull/175167#issuecomment-4156230947)).

---------

Co-authored-by: padivedi <padivedi at amd.com>
DeltaFile
+13-12llvm/lib/Analysis/UniformityAnalysis.cpp
+10-4llvm/include/llvm/ADT/GenericUniformityImpl.h
+23-162 files

LLVM/project d2b1229clang/include/clang/Analysis/Analyses PostOrderCFGView.h, clang/lib/Analysis PostOrderCFGView.cpp

Fix Clang+MLIR

Created using spr 1.3.8-wip
DeltaFile
+11-180llvm/include/llvm/ADT/PostOrderIterator.h
+0-51clang/include/clang/Analysis/Analyses/PostOrderCFGView.h
+26-15mlir/include/mlir/IR/Iterators.h
+33-5clang/lib/Analysis/PostOrderCFGView.cpp
+2-1llvm/include/llvm/Analysis/BlockFrequencyInfoImpl.h
+2-1llvm/include/llvm/Analysis/LoopIterator.h
+74-2534 files not shown
+78-25710 files

LLVM/project 4852657orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Remove Session::waitForShutdown. (#191124)

The existing implementation triggered Session shutdown and then blocked
on a std::future that would be unblocked by an on-shutdown callback that
waitForShutdown had installed. Since there is no guarantee that this
callback would be the last one run, the result was that waitForShutdown
only guaranteed that it would not return until the shutdown sequence had
started (rather than completed).

This could have been fixed, but the Session destructor is already
supposed to block until the Session can be safely destroyed, so a
"working" waitForShutdown would be effectively redundant. Since it was
also a potential footgun (calling it from an on-detach or on-shutdown
callback could deadlock) it was safer to just remove it entirely.

Some Session unit tests do rely on testing properties of the Session
after the shutdown sequence has started, so a new utility has been added
to SessionTest.cpp to support this.
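
The ordering problem described above can be sketched with a toy model. `MiniSession`, `WaitResult`, and `simulateWaitForShutdown` are hypothetical names, not the orc-rt API. A plain flag stands in for the std::future, and callbacks are assumed to run in registration order:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a session; NOT the orc-rt Session API.
class MiniSession {
public:
  void addOnShutdown(std::function<void()> CB) {
    Callbacks.push_back(std::move(CB));
  }

  // Runs callbacks in registration order (an assumption of this model).
  void shutdown() {
    for (auto &CB : Callbacks) {
      CB();
      ++NumCallbacksRun;
    }
  }

  std::size_t numCallbacks() const { return Callbacks.size(); }
  std::size_t numCallbacksRun() const { return NumCallbacksRun; }

private:
  std::vector<std::function<void()>> Callbacks;
  std::size_t NumCallbacksRun = 0;
};

struct WaitResult {
  bool WaiterUnblocked;              // models the std::future becoming ready
  std::size_t CallbacksStillPending; // callbacks not yet run at that moment
};

WaitResult simulateWaitForShutdown() {
  MiniSession S;
  WaitResult R{false, 0};

  // The callback a waitForShutdown-style helper would install.
  S.addOnShutdown([&] {
    R.WaiterUnblocked = true;
    // This callback is still running, so exclude it from the pending count.
    R.CallbacksStillPending = S.numCallbacks() - S.numCallbacksRun() - 1;
  });
  // A callback registered afterwards, e.g. one that releases resources.
  S.addOnShutdown([] {});

  S.shutdown();
  return R;
}
```

In this model the waiter is unblocked while one callback is still pending, so being unblocked only shows that the shutdown sequence has started.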
DeltaFile
+36-36orc-rt/unittests/SessionTest.cpp
+14-13orc-rt/lib/executor/Session.cpp
+6-10orc-rt/include/orc-rt/Session.h
+56-593 files

LLVM/project b743d7dclang/test/CIR/CodeGenHIP builtins-amdgcn.hip

update temp file name in test
DeltaFile
+2-2clang/test/CIR/CodeGenHIP/builtins-amdgcn.hip
+2-21 files

LLVM/project 27e6a4ellvm/docs AMDGPUUsage.rst

[AMDGPUUsage] Specify what one-as syncscopes do

This matches the currently implemented and (as far as I could determine)
intended semantics of these syncscopes.
The sync scope table is unchanged except for removing its indentation;
otherwise it would be rendered as part of the preceding note.
DeltaFile
+89-77llvm/docs/AMDGPUUsage.rst
+89-771 files

LLVM/project a8dfe0bllvm/docs LangRef.rst

[LangRef] Specify that syncscopes can affect the monotonic modification order

If a target specifies that atomics with mismatching syncscopes appear
non-atomic to each other, there is no point in requiring them to be ordered in
the monotonic modification order. Notably, the [AMDGPU target user
guide](https://llvm.org/docs/AMDGPUUsage.html#memory-scopes) has specified
syncscopes to relax the modification order for years.

So far, I haven't found an example where this less constrained ordering would
be observable (at least with the AMDGPU inclusive scope rules). Whenever a load
would be able to see two monotonic stores with non-inclusive scope, that's
considered a data race (i.e., the load would return `undef`), so it cannot be
used to observe the order of the stores.
DeltaFile
+13-9llvm/docs/LangRef.rst
+13-91 files

LLVM/project 497a161llvm/docs AMDGPUUsage.rst LangRef.rst

[LangRef][AMDGPU] Specify that syncscope can cause atomic operations to race

Targets should be able to specify that the syncscope of atomic operations
influences whether they participate in data races with each other.

For example, in AMDGPU, we want (and already implement) the load in the
following case to be in a data race (i.e., return `undef` according to the
current definition), because there is an atomic store with workgroup syncscope
executing in a different workgroup:

```
; workgroup 0:
  store atomic i32 1, ptr %p syncscope("workgroup") monotonic, align 4

; workgroup 1:
  store atomic i32 2, ptr %p syncscope("workgroup") monotonic, align 4
  load atomic i32, ptr %p syncscope("workgroup") monotonic, align 4
```


    [3 lines not shown]
DeltaFile
+4-1llvm/docs/AMDGPUUsage.rst
+2-1llvm/docs/LangRef.rst
+6-22 files

LLVM/project f518458llvm/docs LangRef.rst

Add an "(or stronger)" for clarity, improve wrapping.
DeltaFile
+10-10llvm/docs/LangRef.rst
+10-101 files

LLVM/project 9ba7745llvm/test/CodeGen/PowerPC masked-srem.ll masked-urem.ll, llvm/test/CodeGen/X86 masked-srem.ll masked-urem.ll

[IR] Add llvm.masked.{udiv,sdiv,urem,srem} intrinsics (#189705)

Because division by zero is undefined behaviour, when the loop
vectorizer encounters a div that's not unconditionally executed it needs
to replace its divisor with a non-zero value on any lane that wouldn't
have been executed in the scalar loop:

   %safedivisor = select <vscale x 2 x i1> %mask, <vscale x 2 x i64> %divisor, <vscale x 2 x i64> splat (i64 1)
   %res = udiv <vscale x 2 x i64> %dividend, %safedivisor

https://godbolt.org/z/jczc3ovbr

We need this for architectures like x86 where division by zero (or
overflow for sdiv/srem) can trap. But on AArch64 and RISC-V division
doesn't trap so we don't actually need to mask off any divisors. Not
only that, but there are also dedicated vector division instructions
that can be predicated.
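
As a rough lane-wise model of the select-then-udiv pattern shown above, the following sketch uses fixed-size arrays in place of scalable vectors. `udivWithSafeDivisor` is a hypothetical name; the results in inactive lanes are not meaningful, since the scalar loop would never have executed them:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-wise model of: select(mask, divisor, splat(1)) followed by udiv.
template <std::size_t N>
std::array<std::uint64_t, N>
udivWithSafeDivisor(const std::array<std::uint64_t, N> &Dividend,
                    const std::array<std::uint64_t, N> &Divisor,
                    const std::array<bool, N> &Mask) {
  std::array<std::uint64_t, N> Res{};
  for (std::size_t I = 0; I != N; ++I) {
    // Inactive lanes divide by 1, so a zero divisor there is never used.
    std::uint64_t SafeDivisor = Mask[I] ? Divisor[I] : 1;
    assert(SafeDivisor != 0 && "active lanes must have a non-zero divisor");
    Res[I] = Dividend[I] / SafeDivisor;
  }
  return Res;
}
```

The extra select is the cost being discussed: a target whose division cannot trap, or that has a predicated divide, would not need it.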

Originally we tried to optimize this on RISC-V by transforming `udiv x,

    [11 lines not shown]
DeltaFile
+722-0llvm/test/CodeGen/X86/masked-srem.ll
+720-0llvm/test/CodeGen/X86/masked-urem.ll
+718-0llvm/test/CodeGen/X86/masked-sdiv.ll
+716-0llvm/test/CodeGen/X86/masked-udiv.ll
+460-0llvm/test/CodeGen/PowerPC/masked-srem.ll
+458-0llvm/test/CodeGen/PowerPC/masked-urem.ll
+3,794-028 files not shown
+8,126-134 files

LLVM/project 9162f06llvm/lib/Passes PassBuilder.cpp, llvm/test/Other new-pm-lto-defaults.ll new-pm-defaults.ll

[Passes] Enable vectorizers at Oz (#190182)

The way this is handled right now is very inconsistent. When using
`-passes="default<Oz>"` (the code modified here), both vectorizers were
disabled. The clang frontend enables SLP at Oz but not LoopVectorize.
All the LTO backends enable both vectorizers at Oz.

I'm proposing here that `default<Oz>` should enable both vectorizers by
default. There seems to be a consensus that this is the right thing to
do for SLP (as both Clang and LTO backends do this). It's a bit less
clear for LoopVectorize, but given that the implementation already has
special handling for minsize functions (like switching to code size for
cost modelling and disabling various size-increasing transforms) I'm
inclined to think that we should also be enabling it at minsize.

This is part of trying to make optsize/minsize purely attribute based
and independent of the pipeline optimization level.
DeltaFile
+0-1,495llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll
+1,300-0llvm/test/Transforms/PhaseOrdering/X86/loop-vectorize-metadata.ll
+8-10llvm/test/Other/new-pm-lto-defaults.ll
+3-8llvm/test/Other/new-pm-defaults.ll
+4-6llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+2-5llvm/lib/Passes/PassBuilder.cpp
+1,317-1,5242 files not shown
+1,321-1,5328 files

LLVM/project f1aa984llvm/lib/Target/RISCV RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv vuitofp-vp.ll vsitofp-vp.ll

[RISCV] Remove codegen for vp_fp_to_{u,s}int, vp_{u,s}int_to_fp (#190576)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off 4 intrinsics from #179622.
DeltaFile
+105-190llvm/test/CodeGen/RISCV/rvv/vuitofp-vp.ll
+99-184llvm/test/CodeGen/RISCV/rvv/vsitofp-vp.ll
+84-135llvm/test/CodeGen/RISCV/rvv/vfptoui-vp.ll
+84-135llvm/test/CodeGen/RISCV/rvv/vfptosi-vp.ll
+5-192llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+72-119llvm/test/CodeGen/RISCV/rvv/fixed-vectors-uitofp-vp.ll
+449-95513 files not shown
+839-1,52319 files

LLVM/project 4b6e164clang/lib/CIR/Lowering/DirectToLLVM LowerToLLVM.cpp, clang/test/CIR/Lowering address-space.cir

Handle case when type and addrspace differ
DeltaFile
+10-4clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+7-0clang/test/CIR/Lowering/address-space.cir
+17-42 files

LLVM/project 87491a4clang/lib/Driver ModulesDriver.cpp, clang/test/Driver modules-driver-clang-modules-only.cpp modules-driver-manifest-input-args.cpp

Revert "[clang][ModulesDriver] Add support for Clang modules to -fmodules-driver" due to memory leak (#191122)

Reverts llvm/llvm-project#187606 due to a memory leak.
See
https://github.com/llvm/llvm-project/pull/187606#issuecomment-4212198373
DeltaFile
+0-127clang/test/Driver/modules-driver-clang-modules-only.cpp
+21-51clang/lib/Driver/ModulesDriver.cpp
+9-7clang/test/Driver/modules-driver-manifest-input-args.cpp
+30-1853 files

LLVM/project 4e832f1llvm/docs LangRef.rst

[LangRef] Allow monotonic & seq_cst accesses to inter-operate with other accesses

Currently, the LangRef says that atomic operations (which includes `unordered`
operations, which don't participate in the monotonic modification order) must
read a value from the modification order of monotonic operations.

In the following example, this means that the load does not have a store it
could read from, because all stores it may see do not participate in the
monotonic modification order:

```
; thread 0:
  store atomic i32 1, ptr %p unordered, align 4

; thread 1:
  store atomic i32 2, ptr %p unordered, align 4
  load atomic i32, ptr %p unordered, align 4
```


    [18 lines not shown]
DeltaFile
+17-15llvm/docs/LangRef.rst
+17-151 files

LLVM/project e2b44f1libc/docs CMakeLists.txt, libc/docs/headers index.rst

[libc][docs] Add fcntl.h POSIX header documentation (#188822)

Add YAML documentation for `fcntl.h` listing all functions and macros as
defined in POSIX.1-2024 (IEEE Std 1003.1-2024).

**Functions (6):** creat, fcntl, open, openat, posix_fadvise,
posix_fallocate

**Macros (51):** O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, O_CREAT, O_EXCL,
O_TRUNC, F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, AT_FDCWD,
POSIX_FADV_*, and more.

Part of #122006
DeltaFile
+121-0libc/utils/docgen/fcntl.yaml
+1-0libc/docs/CMakeLists.txt
+1-0libc/docs/headers/index.rst
+123-03 files

LLVM/project e9a36c0llvm/lib/CodeGen/SelectionDAG DAGCombiner.cpp, llvm/test/CodeGen/SystemZ vec-trunc-to-i16.ll

[CodeGen] Preserve big-endian trunc in concat_vectors (#190701)

A transform from `concat_vectors(trunc(scalar), undef)` to
`scalar_to_vector(scalar)` is only equivalent for little-endian targets.
On big-endian, that would put the extra upper bytes ahead of the desired
truncated bytes. This problem was seen on Rust s390x in [RHEL-147748].
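
A byte-level sketch of why the two forms differ, assuming lanes are numbered in memory order and using an i32 scalar truncated to i16. The helper names are hypothetical and this is not the DAGCombiner code:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class Endian { Little, Big };

// Lays out a 32-bit scalar as four bytes in the given byte order.
std::array<std::uint8_t, 4> storeU32(std::uint32_t V, Endian E) {
  std::array<std::uint8_t, 4> Bytes{};
  for (unsigned I = 0; I != 4; ++I) {
    unsigned Shift = (E == Endian::Little) ? 8 * I : 8 * (3 - I);
    Bytes[I] = static_cast<std::uint8_t>(V >> Shift);
  }
  return Bytes;
}

// Reads the 16-bit element that occupies the first two bytes.
std::uint16_t loadFirstU16(const std::array<std::uint8_t, 4> &Bytes, Endian E) {
  if (E == Endian::Little)
    return static_cast<std::uint16_t>(Bytes[0] | (Bytes[1] << 8));
  return static_cast<std::uint16_t>((Bytes[0] << 8) | Bytes[1]);
}

// Lane 0 after reinterpreting the whole scalar as 16-bit lanes.
std::uint16_t firstLaneOfScalar(std::uint32_t V, Endian E) {
  return loadFirstU16(storeU32(V, E), E);
}
```

For the value 0x11223344 the truncated result is 0x3344. The little-endian layout yields that in lane 0, while the big-endian layout yields the upper half, 0x1122.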

[RHEL-147748]: https://redhat.atlassian.net/browse/RHEL-147748

Assisted-by: Claude Code
(cherry picked from commit 5df89ae3da8b24804c17479ce74a930783db045e)
DeltaFile
+45-0llvm/test/CodeGen/SystemZ/vec-trunc-to-i16.ll
+3-1llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+48-12 files

LLVM/project 53c41b3llvm/lib/Target/PowerPC PPCISelLowering.cpp PPCISelLowering.h, llvm/test/CodeGen/PowerPC bitcast-truncate-vec-i1.ll

[PowerPC] Optimize bitcast(truncate) patterns using vbpermq (#181233)

Use vbpermq and vbpermd to efficiently pack i1 vector bits into scalar
integers, avoiding stack operations during type legalization.
Fixes https://github.com/llvm/llvm-project/issues/171879
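
For reference, a scalar sketch of what "packing i1 vector bits into a scalar integer" computes. `packMask` is a hypothetical name, and the sketch assumes lane 0 maps to the least significant bit (the little-endian convention); it does not model vbpermq itself:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs 16 boolean lanes into a 16-bit integer, lane I -> bit I.
std::uint16_t packMask(const std::array<bool, 16> &Lanes) {
  std::uint16_t Bits = 0;
  for (std::size_t I = 0; I != Lanes.size(); ++I)
    if (Lanes[I])
      Bits = static_cast<std::uint16_t>(Bits | (1u << I));
  return Bits;
}
```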

(cherry picked from commit 668938917493fe05c98d5b725f68dfd17ab8eb2f)
DeltaFile
+203-0llvm/test/CodeGen/PowerPC/bitcast-truncate-vec-i1.ll
+79-0llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+6-0llvm/lib/Target/PowerPC/PPCISelLowering.h
+288-03 files

LLVM/project 7c1805dllvm/lib/Target/RISCV RISCVISelLowering.cpp RISCVTargetTransformInfo.h, llvm/test/CodeGen/RISCV/rvv vcopysign-vp.ll vfabs-vp.ll

[RISCV] Remove codegen for vp_fabs, vp_fcopysign (#190592)

Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off 2 intrinsics from #179622.

The remaining sign bit intrinsic vp_fneg is expanded in #190589 since
other tests rely on it.
DeltaFile
+261-261llvm/test/CodeGen/RISCV/rvv/vcopysign-vp.ll
+215-240llvm/test/CodeGen/RISCV/rvv/vfabs-vp.ll
+151-176llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfabs-vp.ll
+102-145llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vcopysign-vp.ll
+3-8llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+0-2llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+732-8321 files not shown
+734-8327 files

LLVM/project a5ae639clang/test/CodeGen/X86 pr190962.ll

[X86] Fix No available targets failure (#191116)
DeltaFile
+1-0clang/test/CodeGen/X86/pr190962.ll
+1-01 files

LLVM/project ab2f55blld/COFF SymbolTable.cpp Config.h, lld/test/COFF export-all-conflict.test

[LLD] [COFF] Fix crashes for conflicting exports with -export-all-symbols (#190492)

Commit adcdc9cc3740adba3577b328fa3ba492cbccd3a5 (since LLD 17) added a
warning message if there are conflicting attempts to export a specific
symbol.

That commit missed one source of exports, from the LLD specific
-export-all-symbols flag (which only has an effect in mingw mode).

To trigger this case, one needs to have an export set by a def file,
combined with the -export-all-symbols flag (which attempts to export all
global symbols, despite explicit exports through embedded directives or
a def file).

To trigger the warning (and the previous crash), one would have to have
some difference between the export produced by -export-all-symbols and
the one from the def file. That difference could be e.g. that the def
file contained an explicit ordinal, or that the def file lacked a DATA
marking for a symbol that the automatic export of all symbols decides to

    [7 lines not shown]
DeltaFile
+18-0lld/test/COFF/export-all-conflict.test
+2-0lld/COFF/SymbolTable.cpp
+1-0lld/COFF/Config.h
+1-0lld/COFF/Driver.cpp
+22-04 files

LLVM/project 4575266llvm/lib/MC MCAssembler.cpp, llvm/test/MC/X86 align-branch-convergence.s

[MC] Track per-section inner relaxation iterations and add convergence test (#191121)

Count inner iterations (max across sections) instead of outer relaxOnce
calls. This more accurately reflects the work done during relaxation.

Add a test that verifies boundary alignment convergence may require
O(N) iterations where N is the number of BoundaryAlign fragments.
This will be fixed by #190318.
DeltaFile
+71-0llvm/test/MC/X86/align-branch-convergence.s
+8-5llvm/lib/MC/MCAssembler.cpp
+79-52 files

LLVM/project 0d42811llvm/lib/Target/AArch64 AArch64RegisterInfo.cpp

[AArch64] Avoid expensive getStrictlyReservedRegs calls in isAnyArgRegReserved (#190957)

`AArch64RegisterInfo::isAnyArgRegReserved` is used during call lowering
across all instruction selectors (SDAG, GISel, FastISel) to emit an
error if any of the arg registers (x0-x7) are reserved. This puts
`AArch64RegisterInfo::getStrictlyReservedRegs` which computes this in
the hot-path and it shows up in compile-time profiles since it's
computed for every call.

As the intent was to guard against using +reserve-x{0-7} with function
calls we can instead call `isXRegisterReserved` which is faster since
it's a simple BitVector lookup.
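
The idea can be sketched with a plain bitset. This is a simplified model with hypothetical types, not the AArch64RegisterInfo code: instead of computing the full reserved-register set on every call, only the eight argument registers are tested:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model: bit N set means register xN was reserved by the user.
using ReservedXRegs = std::bitset<32>;

// Checks only the argument registers x0-x7, one bit test each.
bool isAnyArgRegReserved(const ReservedXRegs &Reserved) {
  for (std::size_t Reg = 0; Reg != 8; ++Reg)
    if (Reserved.test(Reg))
      return true;
  return false;
}
```

Reserving a non-argument register such as x18 leaves the answer unchanged, and each query costs at most eight bit tests.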

Compile-time improves across all instruction selectors on CTMark:

             geomean
    SDAG     ~ -0.14%
    GISel    ~  -0.6%
    FastISel ~  -0.7% (measured locally)

    [12 lines not shown]
DeltaFile
+5-4llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+5-41 files

LLVM/project ada1a00llvm/test/MC/ARM thumb-ldr-stretch.s, llvm/test/MC/CSKY lrw-stretch.s

[test] Add MC relaxation stretch tests (#191118)

Verify:

- ARM tLDRpci instructions don't spuriously widen to t2LDRpci when
  upstream branches relax, which would push cbz targets out of range.
  This would catch the #184544 regression.
- CSKY lrw16 instructions don't spuriously widen to lrw32 when
  upstream branches relax. Similar to ARM.
DeltaFile
+66-0llvm/test/MC/ARM/thumb-ldr-stretch.s
+35-0llvm/test/MC/CSKY/lrw-stretch.s
+101-02 files

LLVM/project 07ce5cfllvm/include/llvm/ADT GenericUniformityImpl.h GenericUniformityInfo.h, llvm/lib/Analysis UniformityAnalysis.cpp

add callBackVH support in uniformity
DeltaFile
+45-0llvm/lib/Analysis/UniformityAnalysis.cpp
+43-0llvm/unittests/Target/AMDGPU/UniformityAnalysisTest.cpp
+14-0llvm/include/llvm/ADT/GenericUniformityImpl.h
+4-0llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+1-0llvm/include/llvm/ADT/GenericUniformityInfo.h
+107-05 files

LLVM/project 5359e80orc-rt/include/orc-rt Session.h, orc-rt/lib/executor Session.cpp

[orc-rt] Simplify notification service construction in Session. NFC. (#191113)

We can replace the addNotificationService method with a call to the
generic createService method that was introduced in 98ccac607a9ff.
DeltaFile
+2-9orc-rt/lib/executor/Session.cpp
+0-1orc-rt/include/orc-rt/Session.h
+2-102 files

LLVM/project 6d49460clang/lib/Driver/ToolChains MinGW.cpp, clang/test/Driver mingw.cpp

[Clang] [MinGW] Handle `-nolibc` argument (#182062)

This implementation differs from GCC, but is arguably more in line with
Unix systems, because it stops linking of default Win32 system
libraries.

On GCC it works like this:
```
❯ /ucrt64/bin/gcc -### /dev/null -nolibc 2>&1 | tr ' ' '\n' | rg '^\-l' | sort -u
-lgcc
-lgcc_eh
-lkernel32
-lmingw32
-lmingwex
-lmsvcrt

❯ /ucrt64/bin/gcc -### /dev/null 2>&1 | tr ' ' '\n' | rg '^\-l' | sort -u
-ladvapi32
-lgcc

    [21 lines not shown]
DeltaFile
+21-16clang/lib/Driver/ToolChains/MinGW.cpp
+10-0clang/test/Driver/mingw.cpp
+31-162 files

LLVM/project 16f02c0clang/test/CodeGen/X86 pr190962.ll, llvm/lib/Target/X86 X86InstrInfo.cpp X86InstrInfo.h

[X86][APX] Add copy instruction to LiveInterval of SrcReg (#191102)

Fixes: #190962
DeltaFile
+64-0clang/test/CodeGen/X86/pr190962.ll
+19-8llvm/lib/Target/X86/X86InstrInfo.cpp
+2-1llvm/lib/Target/X86/X86InstrInfo.h
+1-1llvm/lib/Target/X86/X86FastISel.cpp
+86-104 files