LLVM/project b160d1d llvm/include/llvm/CodeGen MachineRegionInfo.h, llvm/include/llvm/Passes MachinePassRegistry.def

[CodeGen][NewPM] Port machine-region-info to new pass manager (#203848)

- Make `MachineRegionInfo` movable, like `RegionInfo`.
- Add printer pass.
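The new pass manager returns analysis results by value from `run()`, so a result type has to be movable. Below is a minimal sketch of what that involves for a class that owns raw pointers. `RegionTree` and `Region` are hypothetical stand-ins for `MachineRegionInfo` and its regions, and the real `RegionInfo`/`MachineRegionInfo` move constructors may move more state than this.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a region node (not an LLVM type).
struct Region {
  std::string Name;
};

// Hypothetical analysis result loosely modeled on RegionInfo: it owns the
// top-level region through a raw pointer and maps block names to regions.
class RegionTree {
  Region *TopLevel = nullptr;
  std::map<std::string, Region *> BlockToRegion;

public:
  RegionTree() = default;
  explicit RegionTree(std::string Name) : TopLevel(new Region{std::move(Name)}) {
    BlockToRegion["entry"] = TopLevel;
  }
  ~RegionTree() { delete TopLevel; }

  // A copy would double-delete TopLevel, so copying stays disabled.
  RegionTree(const RegionTree &) = delete;
  RegionTree &operator=(const RegionTree &) = delete;

  // Moving transfers ownership and leaves the source empty but destructible.
  RegionTree(RegionTree &&Other) noexcept
      : TopLevel(std::exchange(Other.TopLevel, nullptr)),
        BlockToRegion(std::move(Other.BlockToRegion)) {
    Other.BlockToRegion.clear();
  }
  RegionTree &operator=(RegionTree &&Other) noexcept {
    if (this != &Other) {
      delete TopLevel;
      TopLevel = std::exchange(Other.TopLevel, nullptr);
      BlockToRegion = std::move(Other.BlockToRegion);
      Other.BlockToRegion.clear();
    }
    return *this;
  }

  const Region *getTopLevelRegion() const { return TopLevel; }
  const Region *getRegionFor(const std::string &BB) const {
    auto It = BlockToRegion.find(BB);
    return It == BlockToRegion.end() ? nullptr : It->second;
  }
};

// New-PM style: the analysis builds its result and returns it by value.
RegionTree runRegionAnalysis() {
  RegionTree Result("top");
  return Result; // moved (or elided), never copied
}
```

Deleting the copy operations while providing the move operations keeps single ownership of the region tree explicit.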
DeltaFile
+40-0 llvm/include/llvm/CodeGen/MachineRegionInfo.h
+33-0 llvm/lib/CodeGen/MachineRegionInfo.cpp
+3-2 llvm/include/llvm/Passes/MachinePassRegistry.def
+1-0 llvm/test/CodeGen/X86/machine-region-info.mir
+1-0 llvm/lib/Passes/PassBuilder.cpp
+78-2 5 files

LLVM/project f63d8d6 libc/src/__support/OSUtil/linux/syscall_wrappers prlimit.h CMakeLists.txt, libc/src/sys/resource/linux setrlimit.cpp getrlimit.cpp

[libc][NFC] wrap prlimit64 and refactor getrlimit/setrlimit (#204306)

Assisted-by: Automated tooling, human reviewed.
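The wrapper-plus-refactor pattern can be sketched with the raw Linux `prlimit64` system call: a pid of 0 targets the calling process, a null new limit turns it into a pure read (getrlimit), and a null old limit turns it into a pure write (setrlimit). This is a Linux-only sketch using glibc's `syscall()`, which returns -1 and sets `errno`. `prlimitWrapper`, `myGetrlimit`, `mySetrlimit`, and `KernelRLimit64` are hypothetical names; LLVM libc's actual wrapper in `prlimit.h` has its own signature and error type.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

// Mirrors the kernel's struct rlimit64: two 64-bit fields on every ABI.
struct KernelRLimit64 {
  std::uint64_t cur;
  std::uint64_t max;
};

// Hypothetical wrapper: returns 0 on success or a negative errno value, so
// each entrypoint decides how to report the error.
int prlimitWrapper(pid_t pid, int resource, const KernelRLimit64 *newLimit,
                   KernelRLimit64 *oldLimit) {
  // syscall() reads its integer arguments as long, so widen explicitly.
  long ret = syscall(SYS_prlimit64, static_cast<long>(pid),
                     static_cast<long>(resource), newLimit, oldLimit);
  return ret == -1 ? -errno : 0;
}

// getrlimit: pid 0 means "calling process", null new limit means "read only".
int myGetrlimit(int resource, KernelRLimit64 *out) {
  int err = prlimitWrapper(0, resource, nullptr, out);
  if (err < 0) {
    errno = -err;
    return -1;
  }
  return 0;
}

// setrlimit: pass the new limit and do not ask for the old one.
int mySetrlimit(int resource, const KernelRLimit64 &lim) {
  int err = prlimitWrapper(0, resource, &lim, nullptr);
  if (err < 0) {
    errno = -err;
    return -1;
  }
  return 0;
}
```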
DeltaFile
+38-0 libc/src/__support/OSUtil/linux/syscall_wrappers/prlimit.h
+10-7 libc/src/sys/resource/linux/setrlimit.cpp
+10-7 libc/src/sys/resource/linux/getrlimit.cpp
+15-0 libc/src/__support/OSUtil/linux/syscall_wrappers/CMakeLists.txt
+2-4 libc/src/sys/resource/linux/CMakeLists.txt
+75-18 5 files

LLVM/project d5918ce flang/include/flang/Optimizer/Dialect FIROps.td FIROps.h, flang/lib/Optimizer/Dialect FIROps.cpp

Revert "[flang][mem2reg] promote memory slots through declares" (#204332)

Reverts llvm/llvm-project#196975

This patch is causing regressions on some of our downstream testing. I
am not sure the logic here is directly to blame, but I'd rather revert
and investigate for now.
DeltaFile
+16-195 flang/test/Fir/mem2reg.mlir
+24-111 flang/lib/Optimizer/Dialect/FIROps.cpp
+3-10 flang/include/flang/Optimizer/Dialect/FIROps.td
+0-1 flang/include/flang/Optimizer/Dialect/FIROps.h
+43-317 4 files

LLVM/project 85ec3af flang/include/flang/Optimizer/Dialect FIROps.td FIROps.h, flang/lib/Optimizer/Dialect FIROps.cpp

Revert "[flang][mem2reg] promote memory slots through declares (#196975)"

This reverts commit c1ec4b3c79967ae5ef824f7194540f6529405a03.
DeltaFile
+16-195 flang/test/Fir/mem2reg.mlir
+24-111 flang/lib/Optimizer/Dialect/FIROps.cpp
+3-10 flang/include/flang/Optimizer/Dialect/FIROps.td
+0-1 flang/include/flang/Optimizer/Dialect/FIROps.h
+43-317 4 files

LLVM/project f8fea59 clang/lib/Basic LangOptions.cpp

[Clang][NFC] Change if-else to switch for OpenCL/HLSL version mapping (#204288)

Address https://github.com/llvm/llvm-project/pull/204043#discussion_r3419702862
DeltaFile
+34-15 clang/lib/Basic/LangOptions.cpp
+34-15 1 file

LLVM/project 1f9f4f8 lldb/test/API/api/listeners main.c, lldb/test/API/commands/expression/radar_9673664 main.c

[lldb] Remove several system header includes from tests (#204072)

System includes slow down test compilation and create unnecessary
dependencies on system header code.

This patch removes system headers from tests that do not test their
functionality. For the most part, this just removes the dummy 'printf'
we had in many tests.
DeltaFile
+11-10 lldb/test/API/commands/expression/weak_symbols/main.c
+1-8 lldb/test/API/functionalities/object-file/bin/hello.c
+1-8 lldb/test/API/functionalities/object-file/bin/hello.cpp
+3-5 lldb/test/API/commands/expression/radar_9673664/main.c
+1-7 lldb/test/API/python_api/breakpoint/main.c
+1-7 lldb/test/API/api/listeners/main.c
+18-45 22 files not shown
+38-121 28 files

LLVM/project 4d2342f llvm/docs LangRef.rst

[RFC][LangRef] Specify that the accessed bytes of concurrent atomics must be either disjoint or the same

So far, the LangRef hasn't been clear on the semantics of partially overlapping
concurrent atomics in LLVM IR (specifically: a set of accesses marked as
`atomic` that would be in a data race if they weren't `atomic` and not all of
them access the exact same set of bytes).

What loads read is defined in terms of individual bytes, but the memory
ordering constraints closely follow the C/C++ (and Java for
`unordered`) memory model, where partially overlapping atomics are not
possible. It's not obvious how concepts like C/C++'s per-location total
modification order for `monotonic` accesses map to accesses that can partially
overlap. While C/C++ relies on the modification order to ensure that atomics
cannot tear (i.e., an atomic read cannot return bytes from two or more
different atomic writes), our IR semantics (as written) currently do not
guarantee this in the presence of partially overlapping accesses.

This PR proposes a solution to this problem: It specifies that concurrent
overlapping atomics must access the exact same set of bytes to act atomically.

    [7 lines not shown]
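The proposed rule can be read as a predicate on the byte ranges of two concurrent atomic accesses: they must be either disjoint or identical. A minimal sketch, assuming a hypothetical `ByteRange` (offset and size in bytes) that stands for one access; `applyWrite` models memory byte by byte, mirroring how the LangRef defines what loads read.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of the bytes touched by one atomic access.
struct ByteRange {
  std::size_t Offset;
  std::size_t Size;
};

// True if the two half-open ranges [Offset, Offset + Size) share a byte.
bool overlaps(ByteRange A, ByteRange B) {
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}

// The proposed constraint: concurrent atomics must touch either disjoint
// byte sets or exactly the same byte set.
bool isAllowedConcurrentPair(ByteRange A, ByteRange B) {
  bool Same = A.Offset == B.Offset && A.Size == B.Size;
  return Same || !overlaps(A, B);
}

// Byte-level model of memory: a write stores its value into each byte.
using Memory = std::array<std::uint8_t, 8>;

void applyWrite(Memory &M, ByteRange R, std::uint8_t Value) {
  for (std::size_t I = 0; I < R.Size; ++I)
    M[R.Offset + I] = Value;
}
```

For example, a 4-byte write to bytes 0..3 and a 2-byte write to bytes 2..3 form a disallowed pair: once both have landed, a 4-byte read of bytes 0..3 sees two bytes from each write, which is exactly a torn value.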
DeltaFile
+6-2 llvm/docs/LangRef.rst
+6-2 1 file

LLVM/project 21f3248 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging (#202937)

It's 8 years old, only used by a handful of tests, and has not been
updated in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt
debugging, we should have a modern, generic system that works on any
waitcnt, not something specific to 3 GFX9 counters.
DeltaFile
+1-50 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-44 llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1-94 2 files

LLVM/project 68c947f llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUWaitcntUtils.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils (#202936)

Move really trivial functions into helpers to declutter InsertWaitCnt a
bit more. I had to move HardwareLimits into a different header, but it's
only used in InsertWaitCnt so it doesn't matter.
DeltaFile
+26-90 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+75-0 llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.cpp
+32-0 llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.h
+0-20 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+0-20 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+133-130 5 files

LLVM/project 1bea228 libc/src/unistd/linux ftruncate.cpp link.cpp

[libc][NFC] Migrate unistd entrypoints to syscall wrappers (#204176)

Migrated link, ftruncate, and getentropy entrypoints to use their
corresponding syscall wrappers instead of direct syscall_impl calls.
Updated CMake dependencies accordingly.

Assisted-by: Automated tooling, human reviewed.
DeltaFile
+4-18 libc/src/unistd/linux/ftruncate.cpp
+5-15 libc/src/unistd/linux/link.cpp
+6-8 libc/src/unistd/linux/getentropy.cpp
+3-11 libc/src/unistd/linux/CMakeLists.txt
+18-52 4 files

LLVM/project ec7235e clang/lib/CodeGen CGStmt.cpp, clang/test/CodeGenCXX noreturn-init-stmt.cpp

[clang][CodeGen] Fix crash on if/switch init-statement ending in noreturn (#201047)

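EmitStmt may call `ClearInsertionPoint()` to mark the following code as
dead, but EmitDecl is not prepared to handle a cleared insertion point.
Fix this by calling `EnsureInsertPoint()` in the transition from EmitStmt
to EmitDecl. The if/switch body cannot simply be treated as dead, because
it may contain a label that makes it reachable.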

Fixes #115514.
DeltaFile
+98-0 clang/test/CodeGenCXX/noreturn-init-stmt.cpp
+12-2 clang/lib/CodeGen/CGStmt.cpp
+110-2 2 files

LLVM/project 4995c6e clang/include/clang/Basic LangOptions.def

[LifetimeSafety] Mark lifetime safety LangOptions as `Benign` (#204316)

Without this, we cannot load modules built without lifetime safety.
Analysis options are in general benign and do not affect AST
construction.
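To make the effect concrete, here is a toy model of a module-load check driven by each option's `CompatibilityKind`. Only the three enumerator names come from Clang's `LangOptions` comment; `Option`, `canLoadModule`, and the option name used below are hypothetical. The rule that a `Compatible` option may differ only for explicitly built modules is our reading of that comment, not Clang's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class CompatibilityKind { NotCompatible, Compatible, Benign };

// Hypothetical record of one language option's value in a module and in
// the translation unit that imports it.
struct Option {
  std::string Name;
  CompatibilityKind Kind;
  int ModuleValue;
  int UserValue;
};

// Toy load check. Assumption: Compatible options may differ only for
// explicit modules; Benign options may always differ.
bool canLoadModule(const std::vector<Option> &Opts, bool ExplicitModule) {
  for (const Option &O : Opts) {
    if (O.ModuleValue == O.UserValue)
      continue;
    switch (O.Kind) {
    case CompatibilityKind::NotCompatible:
      return false;
    case CompatibilityKind::Compatible:
      if (!ExplicitModule)
        return false;
      break;
    case CompatibilityKind::Benign:
      break;
    }
  }
  return true;
}
```

In this model, a module built with an analysis option off can be imported by a user with it on only once the option is marked `Benign`.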

See doc:
```cpp
  /// For ASTs produced with different option value, signifies their level of
  /// compatibility.
  enum class CompatibilityKind {
    /// Does affect the construction of the AST in a way that does prevent
    /// module interoperability.
    NotCompatible,
    /// Does affect the construction of the AST in a way that doesn't prevent
    /// interoperability (that is, the value can be different between an
    /// explicit module and the user of that module).
    Compatible,
    /// Does not affect the construction of the AST in any way (that is, the

    [4 lines not shown]
```
DeltaFile
+4-4 clang/include/clang/Basic/LangOptions.def
+4-4 1 file

LLVM/project 0dda20c llvm/test/Transforms/LoopVectorize/AArch64 replicating-load-store-costs-apple.ll induction-costs.ll, llvm/test/Transforms/LoopVectorize/WebAssembly memory-interleave.ll

[LV] Add initial cost model for VPScalarIVSteps (#203347)

This PR currently only adds a cost model for integer types in
non-replicating regions in order to limit the scope of impact.
We can also support replicating regions, but that requires
looking for a recipe with an underlying value in the same
region in order to get a BasicBlock to pass in to the
getPredBlockCostDivisor function. This can be done in a future
PR.
DeltaFile
+18-130 llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll
+54-73 llvm/test/Transforms/LoopVectorize/AArch64/replicating-load-store-costs-apple.ll
+16-67 llvm/test/Transforms/LoopVectorize/X86/replicating-load-store-costs.ll
+30-43 llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
+46-17 llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+31-31 llvm/test/Transforms/LoopVectorize/WebAssembly/memory-interleave.ll
+195-361 7 files not shown
+299-409 13 files

LLVM/project b9f8eee llvm/include/llvm/BinaryFormat DXContainer.h

[DirectX][ObjectYAML][NFC] Remove unused function (#204019)

A small follow-up for #202761.
The `updateSize()` function added there is a rebase artifact and is never
actually used. This change removes it.
DeltaFile
+0-5 llvm/include/llvm/BinaryFormat/DXContainer.h
+0-5 1 file

LLVM/project 55ea182 llvm/lib/Transforms/IPO FunctionSpecialization.cpp, llvm/test/Transforms/FunctionSpecialization interposable.ll

[FuncSpec] Do not specialize interposable functions (#204314)

We cannot specialize interposable functions, because the definition we
see may not be the prevailing one. The prevailing definition can have
arbitrarily different behavior.

We *can* still specialize inexact definitions like linkonce_odr, similar
to inlining.
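The distinction can be sketched as a predicate over linkage. The `Linkage` enum below is a hypothetical mirror of a subset of LLVM's linkage kinds, and the interposable set (`linkonce`, `weak`, `extern_weak`, `common`) follows LLVM's `GlobalValue::isInterposableLinkage` as we understand it. The patch itself may phrase the check differently, for example via `Function::isInterposable()`, which also accounts for semantic interposition.

```cpp
#include <cassert>

// Hypothetical mirror of a subset of LLVM's linkage kinds.
enum class Linkage {
  External,
  Internal,
  Private,
  LinkOnceODR,
  WeakODR,
  LinkOnceAny,
  WeakAny,
  ExternalWeak,
  Common,
};

// Interposable: the linker or loader may pick a different definition, so
// the body we see may not be the one that runs.
bool isInterposable(Linkage L) {
  switch (L) {
  case Linkage::LinkOnceAny:
  case Linkage::WeakAny:
  case Linkage::ExternalWeak:
  case Linkage::Common:
    return true;
  default:
    return false;
  }
}

// ODR linkages are inexact, but every copy must behave the same, so
// specializing on the visible body stays sound, as with inlining.
bool mayBeSpecialized(Linkage L, bool HasBody) {
  return HasBody && !isInterposable(L);
}
```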
DeltaFile
+40-0 llvm/test/Transforms/FunctionSpecialization/interposable.ll
+3-0 llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+43-0 2 files

LLVM/project 6f73bc2 llvm/lib/Support Parallel.cpp

[llvm] Fix unused function warning in Parallel (#204114)

When llvm is built without threading support:
```
<...>/llvm-project/llvm/lib/Support/Parallel.cpp:230:13: warning: unused function 'isNested' [-Wunused-function]
  230 | static bool isNested() {
      |             ^~~~~~~~
```

The function is only used once, so I've put the code into the caller,
which is itself guarded with `#if LLVM_ENABLE_THREADS`.

Function added in 8daaa26efdda3802f73367d844b267bda3f84cbe / #189293.
DeltaFile
+2-9 llvm/lib/Support/Parallel.cpp
+2-9 1 file

LLVM/project 8c3d2e9 llvm/docs Passes.rst, llvm/include/llvm InitializePasses.h

[Passes] Remove deadarghaX0r pass (#204310)

This was a pass internally used by bugpoint. Bugpoint has been removed,
so remove the pass as well.
DeltaFile
+7-30 llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
+0-7 llvm/include/llvm/Transforms/IPO/DeadArgumentElimination.h
+0-6 llvm/docs/Passes.rst
+0-1 llvm/lib/Transforms/IPO/IPO.cpp
+0-1 llvm/include/llvm/InitializePasses.h
+7-45 5 files

LLVM/project 7d92d40 mlir/include/mlir/Dialect/Tosa/IR TosaComplianceData.h.inc TosaOps.td, mlir/lib/Dialect/Tosa/IR TosaOps.cpp

[mlir][tosa] Add row_gather operator (#202895)

Adds support for the row_gather operator defined by the TOSA
specification, see https://github.com/arm/tosa-specification/pull/60.

This includes:
- Operator definition
- Verification logic for the operator
- Output shape inference for the operator
- Validation checks to ensure compliance with the TOSA specification
including profile compliance and level checks.
- Canonicalization to replace row_gather with gather when row_count is
statically known to be 1.

It does not yet cover support for MXFP types. This will be added once
block scaled types are supported.
DeltaFile
+88-16 mlir/lib/Dialect/Tosa/IR/TosaOps.cpp
+63-0 mlir/test/Dialect/Tosa/verifier.mlir
+59-0 mlir/include/mlir/Dialect/Tosa/IR/TosaComplianceData.h.inc
+49-0 mlir/test/Dialect/Tosa/tosa-infer-shapes.mlir
+47-0 mlir/test/Dialect/Tosa/ops.mlir
+36-0 mlir/include/mlir/Dialect/Tosa/IR/TosaOps.td
+342-16 12 files not shown
+506-16 18 files

LLVM/project b59f965 lldb/test/API/lang/objc/hidden-ivars TestHiddenIvars.py

[lldb][test] Cleanup and modernize TestHiddenIvars.py (#202023)

This is a simple rewrite of the test. The patch improves three things:

* It replaces old expect tests with the new expect_* variants that no
longer rely on substring matching.

* It unifies the stripped/non-stripped checks, as we actually produce
identical SBValues in both cases (by fetching data from the Objective-C
runtime).

* It builds this test with a shared build directory. Our stripping logic
generates a new stripped binary in a subdirectory and doesn't touch the
shared build files. This also halves the test runtime to 6s.
DeltaFile
+79-187 lldb/test/API/lang/objc/hidden-ivars/TestHiddenIvars.py
+79-187 1 file

LLVM/project 045ec52 lldb/packages/Python/lldbsuite/test dotest.py dotest_args.py, lldb/test/API lit.cfg.py

[lldb][test] Only calculate LLDB python path once (#201327)

We spend about 70ms in each dotest invocation recalculating the path where
the LLDB module is. This patch changes this so that dotest calculates
this path once and passes it to every dotest invocation.

As a fallback, we still support inferring the location from LLDB as
before, but I would propose we drop this support in the future.
DeltaFile
+29-11 lldb/packages/Python/lldbsuite/test/dotest.py
+22-0 lldb/test/API/lit.cfg.py
+6-0 lldb/packages/Python/lldbsuite/test/dotest_args.py
+5-0 lldb/packages/Python/lldbsuite/test/configuration.py
+62-11 4 files

LLVM/project de84e7c llvm/lib/Target/Mips MipsSEISelLowering.cpp MipsMSAInstrInfo.td, llvm/test/CodeGen/Mips/msa f16-llvm-ir.ll

[MIPS] soft-promote `f16` also when using `+msa` (#204158)

Fixes https://github.com/llvm/llvm-project/issues/202808
Re-lands https://github.com/llvm/llvm-project/pull/203065

Make use of the default soft-promote mechanism for f16, rather than an
ad-hoc approach making f16 storage-only.

In theory you could leave it at that, but I added custom implementations
to make use of the instructions for `FP16_TO_FP` and `FP_TO_FP16`, and
manually apply the "fptoui to fptosi trick" which generates shorter
code.

I've now tested that, in combination with
https://github.com/llvm/llvm-project/pull/203390, this PR is able to
build and run the rust `std` test suite, which exercises both `f16` and
vectors a bunch. The tests all pass under `qemu` as well.

The last commit fixes an integer overflow bug that triggered UBSan and
led to an earlier revert of these changes.
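Assuming the "fptoui to fptosi trick" refers to the fact that every finite `f16` value has magnitude at most 65504, an unsigned conversion of a promoted `f16` value can reuse the signed truncation, skipping the compare-and-adjust sequence normally needed for inputs at or above 2^31 when only signed FP-to-int conversions are available. A plain C++ sketch, using `float` to hold the promoted half value; `fptouiGeneric` and `f16ToU32` are hypothetical names, not the MIPS lowering code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest finite IEEE half-precision value.
constexpr float kMaxHalf = 65504.0f;

// Generic fptoui to u32 built from signed truncation only. Assumes
// 0 <= V < 2^32 (fptoui is poison otherwise). Values >= 2^31 are shifted
// down first and the top bit is put back afterwards.
std::uint32_t fptouiGeneric(float V) {
  const float TwoPow31 = 2147483648.0f;
  if (V < TwoPow31)
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(V));
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(V - TwoPow31)) ^
         0x80000000u;
}

// For a value promoted from a finite f16, |V| <= 65504 < 2^31, so the
// signed truncation alone yields the same result with no range check.
std::uint32_t f16ToU32(float PromotedHalf) {
  assert(std::fabs(PromotedHalf) <= kMaxHalf && "not a finite f16 value");
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(PromotedHalf));
}
```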
DeltaFile
+966-1,105 llvm/test/CodeGen/Mips/msa/f16-llvm-ir.ll
+101-411 llvm/lib/Target/Mips/MipsSEISelLowering.cpp
+0-37 llvm/lib/Target/Mips/MipsMSAInstrInfo.td
+2-14 llvm/lib/Target/Mips/MipsSEISelLowering.h
+0-6 llvm/lib/Target/Mips/MipsRegisterInfo.td
+3-2 llvm/lib/Target/Mips/MipsISelLowering.cpp
+1,072-1,575 3 files not shown
+1,073-1,580 9 files

LLVM/project 7e69b16 llvm/include/llvm/IR IntrinsicInst.h, llvm/lib/Analysis InstructionSimplify.cpp

[InstCombine] Fold X == Identity ? Y : min/max(X, Y) (#202748)

Fixes #202576

Fold:

```llvm
select (X == -1), Y, umin(X, Y) -> umin(X, Y)
select (X == 0), Y, umax(X, Y) -> umax(X, Y)
select (X == SignedMax), Y, smin(X, Y) -> smin(X, Y)
select (X == SignedMin), Y, smax(Y, X) -> smax(X, Y)
```

And the inverted/commuted forms.

AI note: I used AI to help me read through the codebase and write the
tests.
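Each pattern relies on the compared constant being the identity element of the min/max operation: when X equals it, min/max(X, Y) is just Y, so both select arms agree. The identity is UINT_MAX for `umin`, 0 for `umax`, the signed maximum for `smin`, and the signed minimum for `smax`. A brute-force check at 8 bits, using a hypothetical helper `foldHoldsExhaustively` (not part of InstCombine) that covers only the basic, non-inverted forms:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

using U8 = std::uint8_t;
using S8 = std::int8_t;

U8 umin8(U8 A, U8 B) { return std::min(A, B); }
U8 umax8(U8 A, U8 B) { return std::max(A, B); }
S8 smin8(S8 A, S8 B) { return std::min(A, B); }
S8 smax8(S8 A, S8 B) { return std::max(A, B); }

// Checks, for every pair (X, Y) of T values, that
//   select (X == Identity), Y, MinMax(X, Y)  ==  MinMax(X, Y)
template <typename T, typename Op>
bool foldHoldsExhaustively(T Identity, Op MinMax) {
  constexpr int Lo = std::numeric_limits<T>::min();
  constexpr int Hi = std::numeric_limits<T>::max();
  for (int A = Lo; A <= Hi; ++A) {
    for (int B = Lo; B <= Hi; ++B) {
      T X = static_cast<T>(A), Y = static_cast<T>(B);
      T Selected = (X == Identity) ? Y : MinMax(X, Y);
      if (Selected != MinMax(X, Y))
        return false;
    }
  }
  return true;
}
```

Using the signed maximum as the constant for `smax` fails this check (X = 127, Y = 0 selects 0, while smax gives 127).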
DeltaFile
+57-0 llvm/test/Transforms/InstCombine/umin-icmp.ll
+23-0 llvm/include/llvm/IR/IntrinsicInst.h
+21-0 llvm/lib/Analysis/InstructionSimplify.cpp
+11-0 llvm/test/Transforms/InstCombine/smin-icmp.ll
+11-0 llvm/test/Transforms/InstCombine/umax-icmp.ll
+10-0 llvm/test/Transforms/InstCombine/smax-icmp.ll
+133-0 6 files

LLVM/project ec6e35f clang/include/clang/Basic LangOptions.def

lifetime-safety-is-benign
DeltaFile
+4-4 clang/include/clang/Basic/LangOptions.def
+4-4 1 file

LLVM/project d5b7e08 clang/include/clang/Basic LangOptions.def DiagnosticOptions.def, clang/include/clang/Options Options.td

frontend-opt-to-diags
DeltaFile
+4-4 clang/include/clang/Options/Options.td
+0-8 clang/include/clang/Basic/LangOptions.def
+6-0 clang/include/clang/Basic/DiagnosticOptions.def
+4-1 clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+2-2 clang/lib/Sema/SemaLifetimeSafety.h
+3-1 clang/lib/Analysis/LifetimeSafety/Checker.cpp
+19-16 1 file not shown
+21-18 7 files

LLVM/project 285ed05 llvm/lib/Target/AMDGPU/AsmParser AMDGPUAsmParser.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

AMDGPU: Refactor AMDGPUTargetID to not store MCSubtargetInfo

Store the triple string and GPUKind instead. The dependence
on checking AMDHSA seems like an anti-feature, but maintain the
behavior of not printing the modifiers for other OSes. Start
parsing the target ID instead of performing a direct string
comparison. Also improve test coverage for the treatment of the
environment component of the triple. The main behavioral change
is this will now produce normalized triples in the output and
diagnostics. Practically, this means all of the places that
currently emit "--" will be expanded into "-unknown-".

Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
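A simplified model of the visible output change, assuming a target ID of the form `<arch>-<vendor>-<os>-<environment>-<processor>` in which an empty environment shows up as `--`. It only fills empty triple components with `unknown`; it is not LLVM's `Triple::normalize`, which does considerably more. `splitDashes` and `normalizeTargetIDTriple` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on '-' keeping empty pieces ("a--b" -> {"a", "", "b"}).
std::vector<std::string> splitDashes(const std::string &S) {
  std::vector<std::string> Parts;
  std::size_t Start = 0;
  while (true) {
    std::size_t Pos = S.find('-', Start);
    Parts.push_back(S.substr(Start, Pos - Start));
    if (Pos == std::string::npos)
      break;
    Start = Pos + 1;
  }
  return Parts;
}

// The first four pieces form the triple; an empty one becomes "unknown".
// Everything after them (processor and features) is kept as-is.
std::string normalizeTargetIDTriple(const std::string &TargetID) {
  std::vector<std::string> Parts = splitDashes(TargetID);
  std::string Out;
  for (std::size_t I = 0; I < Parts.size(); ++I) {
    std::string Piece = Parts[I];
    if (I < 4 && Piece.empty())
      Piece = "unknown";
    if (I != 0)
      Out += '-';
    Out += Piece;
  }
  return Out;
}
```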
DeltaFile
+79-36 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+36-10 llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+27-1 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+17-0 llvm/test/MC/AMDGPU/amdgcn-target-directive-triple-env.s
+5-5 llvm/test/MC/AMDGPU/hsa-diag-v4.s
+4-4 llvm/test/MC/AMDGPU/isa-version-pal.s
+168-56 16 files not shown
+198-77 22 files

LLVM/project cb81b5d llvm/lib/Target/AMDGPU AMDGPUHWEvents.h SIInsertWaitcnts.cpp

[AMDGPU][InsertWaitCnts] Make HWEvent a BitMask

Follow up from comments on https://github.com/llvm/llvm-project/pull/202886

Make HWEvent a bitmask by default instead of having both the enum and a separate HWEventSet. This has the advantage of streamlining the code a bit and opening the possibility of adding "modifiers" to events, e.g. I imagine we could now fold "VMemType" into the Events.
We already do this with things like SMEM_GROUP. At least now it's baked into the design.

I opted for a bit more verbosity by taking inspiration from FastMathFlags (FMF): instead of exposing a raw enum, I wrap it in a class w/ helper functions. The downside is having to reimplement all the little bitwise ops, but the result is a cleaner, simpler interface than a raw enum (class) w/ many helper functions. I initially tried that, but I recoiled at the sight of things like `contains(A, B)`, which isn't very clear, while `A.contains(B)` is self-explanatory.

Since HWEvent is a bitmask, I also implemented a simple iterator over all set bits of the mask, which is useful because some APIs in InsertWaitCnt handle one event at a time.
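A minimal sketch of the FastMathFlags-style design described above: a class wrapping a raw mask, with `contains` as a member and an iterator that yields one set bit at a time. The class shape and the event names (`VMemAccess`, `LDSAccess`, `SMemAccess`) are hypothetical; the real events are defined in `AMDGPUHWEvents.def`.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a FastMathFlags-style wrapper around a raw event bitmask.
class HWEvent {
  std::uint32_t Mask = 0;
  constexpr explicit HWEvent(std::uint32_t M) : Mask(M) {}

public:
  constexpr HWEvent() = default;
  // Builds a single-event mask from a bit index.
  static constexpr HWEvent bit(unsigned Index) { return HWEvent(1u << Index); }

  constexpr bool empty() const { return Mask == 0; }
  // True if every event in Other is also set here.
  constexpr bool contains(HWEvent Other) const {
    return (Mask & Other.Mask) == Other.Mask;
  }
  constexpr HWEvent operator|(HWEvent O) const { return HWEvent(Mask | O.Mask); }
  constexpr HWEvent operator&(HWEvent O) const { return HWEvent(Mask & O.Mask); }
  HWEvent &operator|=(HWEvent O) {
    Mask |= O.Mask;
    return *this;
  }
  constexpr bool operator==(HWEvent O) const { return Mask == O.Mask; }

  // Iterates over the set bits, yielding one single-event mask at a time.
  class iterator {
    std::uint32_t Rest;

  public:
    constexpr explicit iterator(std::uint32_t R) : Rest(R) {}
    // Isolate the lowest set bit.
    constexpr HWEvent operator*() const { return HWEvent(Rest & (~Rest + 1u)); }
    // Clear the lowest set bit.
    iterator &operator++() {
      Rest &= Rest - 1u;
      return *this;
    }
    constexpr bool operator!=(iterator O) const { return Rest != O.Rest; }
  };
  constexpr iterator begin() const { return iterator(Mask); }
  constexpr iterator end() const { return iterator(0); }
};

// Hypothetical events; the real list lives in AMDGPUHWEvents.def.
constexpr HWEvent VMemAccess = HWEvent::bit(0);
constexpr HWEvent LDSAccess = HWEvent::bit(1);
constexpr HWEvent SMemAccess = HWEvent::bit(2);

// Handles one event at a time, as some InsertWaitCnt APIs expect.
int countEvents(HWEvent Events) {
  int N = 0;
  for (HWEvent E : Events) {
    (void)E;
    ++N;
  }
  return N;
}
```

The price of the wrapper is visible here: each bitwise operator has to be written by hand, but call sites read as `Pending.contains(VMemAccess)`.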
DeltaFile
+96-94 llvm/lib/Target/AMDGPU/AMDGPUHWEvents.h
+73-79 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+28-34 llvm/lib/Target/AMDGPU/AMDGPUHWEvents.def
+30-32 llvm/lib/Target/AMDGPU/AMDGPUHWEvents.cpp
+227-239 4 files

LLVM/project 415c765 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-debug.mir

[RFC][AMDGPU] Remove DebugCounter-based WaitCnt debugging

It's 8 years old, only used by a handful of tests, and has not been updated
in a while except for maintenance as far as I can see.

I don't mind keeping it in if there are users of it, but right now it
looks like a dead feature. If we want some more elaborate waitcnt debugging,
we should have a modern, generic system that works on any waitcnt, not
something specific to 3 GFX9 counters.
DeltaFile
+1-50 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+0-44 llvm/test/CodeGen/AMDGPU/waitcnt-debug.mir
+1-94 2 files

LLVM/project 4c01416 llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Add helper for getLimit
DeltaFile
+9-8 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+9-8 1 file

LLVM/project 3d923fc llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp

Comment
DeltaFile
+5-5 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+5-5 1 file

LLVM/project 97b49ca llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp AMDGPUWaitcntUtils.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

[NFC][AMDGPU][InsertWaitCnts] Move some simple functions into Utils

Move really trivial functions into helpers to declutter InsertWaitCnt a bit more.
I had to move HardwareLimits into a different header but it's only used in InsertWaitCnt so it doesn't matter.
DeltaFile
+21-86 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+75-0 llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.cpp
+32-0 llvm/lib/Target/AMDGPU/AMDGPUWaitcntUtils.h
+0-20 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+0-20 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+128-126 5 files