LLVM/project 6d73d5c flang/lib/Lower/OpenMP OpenMP.cpp

address review comments
Delta  File
+8 -6  flang/lib/Lower/OpenMP/OpenMP.cpp
+8 -6  1 file

LLVM/project b08a295 flang/lib/Lower/OpenMP OpenMP.cpp, flang/lib/Optimizer/OpenMP DoConcurrentConversion.cpp

[Flang][OpenMP] Add combined construct information

This patch adds the `omp.combined` attribute to OpenMP dialect
operations following changes to the `ComposableOpInterface`.

This attribute is added to operations representing non-innermost leaf
constructs of a combined construct and to standalone block-associated
constructs that can be combined with their parent construct.
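As a rough illustration of the first rule, the sketch below tags every leaf of a combined construct except the innermost one. `LeafOp`, `tagCombined` and `lowerTargetParallelDo` are hypothetical names; the real patch sets the `omp.combined` attribute on MLIR operations during lowering, the operation list is simplified (no `omp.loop_nest`), and the second rule (standalone block-associated constructs) is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the operations created while lowering the leaf
// constructs of one combined construct; not the real MLIR classes.
struct LeafOp {
  std::string name;      // e.g. "omp.target"
  bool combined = false; // stands in for the `omp.combined` attribute
};

// Tag every leaf except the innermost one. `leaves` must be ordered from the
// outermost construct to the innermost one.
void tagCombined(std::vector<LeafOp> &leaves) {
  for (std::size_t i = 0; i + 1 < leaves.size(); ++i)
    leaves[i].combined = true;
}

// Example: `!$omp target parallel do` yields target -> parallel -> wsloop.
std::vector<LeafOp> lowerTargetParallelDo() {
  std::vector<LeafOp> leaves = {{"omp.target"}, {"omp.parallel"}, {"omp.wsloop"}};
  tagCombined(leaves);
  return leaves;
}
```

For `target parallel do`, the target and parallel operations end up tagged, while the innermost worksharing-loop operation does not.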

Changes are made to the OpenMP lowering logic, as well as the
do-concurrent, workshare and workdistribute transformation passes.
Delta  File
+1,094 -0  flang/test/Lower/OpenMP/compound.f90
+56 -20  flang/lib/Lower/OpenMP/OpenMP.cpp
+6 -6  flang/test/Transforms/DoConcurrent/use_loop_bounds_in_body.f90
+5 -5  flang/test/Transforms/DoConcurrent/local_device.mlir
+4 -4  flang/test/Transforms/DoConcurrent/reduce_device.mlir
+6 -2  flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+1,171 -37  27 files not shown
+1,225 -71  33 files

LLVM/project 9e60e47 mlir/include/mlir/Dialect/OpenMP OpenMPOps.td, mlir/lib/Dialect/OpenMP/IR OpenMPDialect.cpp

[MLIR][OpenMP] Explicit tagging of combined constructs

Combined OpenMP constructs, such as `parallel do`, which represent
nests of constructs where each one contains a single other construct
without any other directives or statements in between, are currently not
marked in any way in the MLIR representation.

This works because they don't usually require any specific handling
other than what would be done for the included operations. However, the
handling of `target` regions needs to know whether it was part of a
combined construct in order to properly optimize for the SPMD case and
detect when certain clauses must be unconditionally evaluated on the
host.

So far, this has been achieved by having some MLIR pattern-matching
logic to infer whether a nest of operations could have potentially been
produced for a combined construct. This approach is error-prone,
computationally expensive, and cannot work in the general case.
On the other hand, a compiler frontend can easily tell the difference

    [10 lines not shown]
Delta  File
+137 -134  mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+123 -76  mlir/test/Dialect/OpenMP/invalid.mlir
+106 -0  mlir/test/Dialect/OpenMP/invalid-interface.mlir
+33 -33  mlir/test/Dialect/OpenMP/ops.mlir
+29 -33  mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+24 -24  mlir/test/Target/LLVMIR/openmp-teams-clauses-trunc-ext.mlir
+452 -300  35 files not shown
+565 -370  41 files

LLVM/project 0e5c89e flang/lib/Lower/OpenMP OpenMP.cpp

address review comments
Delta  File
+14 -13  flang/lib/Lower/OpenMP/OpenMP.cpp
+14 -13  1 file

LLVM/project 8f6cb73 cmake/Modules GetTripleCMakeSystemName.cmake

Handle more cases from the chart
Delta  File
+20 -3  cmake/Modules/GetTripleCMakeSystemName.cmake
+20 -3  1 file

LLVM/project 355735e cmake/Modules GetTripleCMakeSystemName.cmake

Handle mingw
Delta  File
+1 -1  cmake/Modules/GetTripleCMakeSystemName.cmake
+1 -1  1 file

LLVM/project 23d906e openmp/runtime/cmake LibompExports.cmake

[openmp] Fix export file paths (#202692)

The files omp_lib.h and omp-tools.h are the outputs of two
configure_file invocations which specify the full path of the outputs.
Use these full paths in LibompExports.cmake so they can actually be
found.
Delta  File
+2 -2  openmp/runtime/cmake/LibompExports.cmake
+2 -2  1 file

LLVM/project 6505f14 llvm/include/llvm/CodeGen UnreachableBlockElim.h, llvm/lib/Target/AMDGPU AMDGPU.h SIWholeQuadMode.h

[NPM] Make few more passes Required
Delta  File
+4 -4  llvm/lib/Target/AMDGPU/AMDGPU.h
+2 -2  llvm/include/llvm/CodeGen/UnreachableBlockElim.h
+1 -1  llvm/lib/Target/AMDGPU/SIWholeQuadMode.h
+1 -1  llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h
+1 -1  llvm/lib/Target/AMDGPU/SILowerWWMCopies.h
+1 -1  llvm/lib/Target/AMDGPU/SILowerSGPRSpills.h
+10 -10  13 files not shown
+23 -23  19 files

LLVM/project 88429ca cross-project-tests/debuginfo-tests/dexter/dex/dextIR StepIR.py, cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/debugging then_at_frame.cpp

[Dexter] Add at_frame_idx to check values in frames above current

This patch adds a new attribute for !and nodes, `at_frame_idx`, which
matches against frames above its parent node; for example, in the script:

```
!where {function: foo}:
  !where {function: bar}:
    !and {at_frame_idx: 1}:
      !value x: 0
```

The `!value x` node checks the value of 'x' in 'foo' while the debugger is
inside 'bar'. Use of this attribute comes with some restrictions: a !where
node can never be nested under a !and{at_frame_idx} node, and neither can
another !and{at_frame_idx} node.
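The nesting restriction can be checked with a simple recursive walk. The sketch below assumes a hypothetical, simplified `Node` tree rather than Dexter's real parser classes (which are written in Python); `validateAtFrame` is an illustrative name and the code only demonstrates the rule stated above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a parsed Dexter script node.
struct Node {
  std::string kind;              // "where", "and" or "value"
  std::optional<int> atFrameIdx; // set only for `!and {at_frame_idx: N}`
  std::vector<Node> children;
};

// Returns true if no `!where` node and no other `!and {at_frame_idx}` node
// appears anywhere below an `!and {at_frame_idx}` node.
bool validateAtFrame(const Node &node, bool insideAtFrame = false) {
  bool isAtFrame = node.kind == "and" && node.atFrameIdx.has_value();
  if (insideAtFrame && (node.kind == "where" || isAtFrame))
    return false;
  for (const Node &child : node.children)
    if (!validateAtFrame(child, insideAtFrame || isAtFrame))
      return false;
  return true;
}
```

A plain `!and` node without `at_frame_idx` is still allowed below an `!and {at_frame_idx}` node; only the two cases named above are rejected.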
Delta  File
+61 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/Inputs/rewrite_at_frame_expected.cpp
+60 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/debugging/then_at_frame.cpp
+49 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/rewriting/rewrite_at_frame.cpp
+46 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/evaluation/eval_at_frame.cpp
+26 -13  cross-project-tests/debuginfo-tests/dexter/dex/dextIR/StepIR.py
+33 -0  cross-project-tests/debuginfo-tests/dexter/feature_tests/scripts/parser/reject-bad-at_frame_idx.test
+275 -13  13 files not shown
+364 -53  19 files

LLVM/project a782830 llvm/docs AMDGPUUsage.rst

Clean-up docs
Delta  File
+3 -3  llvm/docs/AMDGPUUsage.rst
+3 -3  1 file

LLVM/project 6684278 . .git-blame-ignore-revs

Add "Split clang/lib/CodeGen/CGBuiltin.cpp" to .git-blame-ignore-revs (#203419)
Delta  File
+3 -0  .git-blame-ignore-revs
+3 -0  1 file

LLVM/project 663bcb3 clang/lib/CodeGen CodeGenFunction.h, clang/lib/CodeGen/TargetBuiltins ARM.cpp

[SVE] Replace unnecessary Intrinsic::aarch64_sve_ptrue construction. (#203349)

Prefer ConstantInt::getTrue() over sve.ptrue(31) when creating
all-active boolean vectors.
Delta  File
+24 -46  llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
+22 -30  clang/test/CodeGen/AArch64/sve-intrinsics/acle_sve_dupq.c
+14 -23  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+6 -14  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+1 -8  clang/lib/CodeGen/TargetBuiltins/ARM.cpp
+0 -1  clang/lib/CodeGen/CodeGenFunction.h
+67 -122  6 files

LLVM/project ae2ef21 llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll

AMDGPU/GlobalISel: RegBankLegalize rules for mfma_scale
Delta  File
+9,306 -1  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.32x32x64.f8f6f4.ll
+4,210 -1  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.scale.f32.16x16x128.f8f6f4.ll
+7 -0  llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+13,523 -2  3 files

LLVM/project 33c56b4 clang/cmake/modules ClangConfig.cmake.in, cmake/Modules GetTripleCMakeSystemName.cmake

runtimes: Pass CMAKE_SYSTEM_NAME based on target triple

Compute the cmake system name from the target triple, rather
than passing through the host's. This is primarily to stop
forwarding OSX-specific cmake variables.

This fixes build failures when trying to build gpu libc on mac
hosts. Previously it would fail on several issues, starting with
an unused argument -mmacos-version-min error, followed by other
errors caused by passing -isysroot.

Secondarily, restrict the cmake imported targets when cross compiling.
Without this, the amdgpu build prints many cmake warnings about the
target not supporting shared libraries.

Claude did most of the actual work, though it required quite a few
rounds of prodding to get it into the right place. In particular it
took care of handling all of the cmake platform recognized names from
the triple.
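A minimal sketch of the triple-to-system-name idea follows. The mapping table is an assumption chosen for illustration, not the contents of `GetTripleCMakeSystemName.cmake`, and `cmakeSystemNameForTriple` is a hypothetical name. It also assumes the full `arch-vendor-os[-env]` form, so a vendor-less triple such as `x86_64-linux-gnu` would be misclassified. Falling back to CMake's `Generic` system name for unrecognized OS components (for example GPU triples) is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: map the OS component of a target triple
// (arch-vendor-os[-env]) to a CMake system name. The table below is
// illustrative only and is not copied from GetTripleCMakeSystemName.cmake.
std::string cmakeSystemNameForTriple(const std::string &triple) {
  // Locate the third dash-separated component (the OS), if any.
  std::size_t first = triple.find('-');
  std::size_t second =
      first == std::string::npos ? std::string::npos : triple.find('-', first + 1);
  if (second == std::string::npos)
    return "Generic";
  std::size_t third = triple.find('-', second + 1);
  std::string os = triple.substr(
      second + 1, third == std::string::npos ? std::string::npos : third - second - 1);

  // OS components may carry a version suffix (e.g. "macosx14.0"), so match prefixes.
  auto startsWith = [&os](const char *prefix) { return os.rfind(prefix, 0) == 0; };
  if (startsWith("linux"))
    return "Linux";
  if (startsWith("darwin") || startsWith("macos"))
    return "Darwin";
  if (startsWith("windows") || startsWith("mingw"))
    return "Windows";
  if (startsWith("freebsd"))
    return "FreeBSD";
  // GPU and bare-metal triples (e.g. amdgcn-amd-amdhsa) fall back to "Generic".
  return "Generic";
}
```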

    [2 lines not shown]
Delta  File
+32 -37  llvm/cmake/modules/LLVMConfig.cmake.in
+65 -0  cmake/Modules/GetTripleCMakeSystemName.cmake
+25 -2  llvm/cmake/modules/LLVMExternalProjectUtils.cmake
+4 -1  clang/cmake/modules/ClangConfig.cmake.in
+0 -4  llvm/runtimes/CMakeLists.txt
+126 -44  5 files

LLVM/project 056b4a7 llvm/lib/Target/SPIRV SPIRVNonSemanticDebugHandler.cpp SPIRVNonSemanticDebugHandler.h, llvm/test/CodeGen/SPIRV/debug-info debug-type-vector-skipped.ll debug-type-vector.ll

Emit debug type vector (#200056)

This emits `DebugTypeVector` for HLSL `float4`-style vectors.

`partitionTypes()` separates vector `DICompositeType` nodes from basic
types so both can be visited in a single pass over the debug metadata. A
new `emitDebugTypeVector()` helper builds the `DebugTypeVector`
instruction and looks up the base-type register in `DebugTypeRegs`.

The helper skips four cases silently:

1. Absent or non-`DIBasicType` base type: only scalar element types are
supported for now.
2. Base type not yet emitted: the type was not reached during the
`DebugTypeBasic` pass.
3. Multiple subranges: `DebugTypeVector` models one-dimensional vectors
only (NSDI cannot encode multi-subrange types).
4. Non-constant subrange count: NSDI cannot represent variable-length
counts.
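The four skip conditions can be expressed as an ordered series of checks. This sketch uses hypothetical stand-in structs instead of LLVM's `DICompositeType`/`DISubrange` metadata, and a set of already-emitted base-type ids in place of `DebugTypeRegs`; `classify` and `VectorEmitDecision` are illustrative names. Treating a vector with no subrange at all like a non-constant count is an extra assumption not stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, simplified view of a vector DICompositeType; the real code
// works on LLVM debug metadata, not on these structs.
struct Subrange {
  std::optional<std::int64_t> constantCount; // empty if the count is not a constant
};
struct VectorTypeInfo {
  std::optional<int> baseTypeId; // id of the element type, if any
  bool baseIsBasicType = false;  // true if the element type is a DIBasicType
  std::vector<Subrange> subranges;
};

enum class VectorEmitDecision {
  Emit,
  SkipNonBasicBase,   // case 1: absent or non-DIBasicType base type
  SkipBaseNotEmitted, // case 2: base type not emitted by the basic-type pass
  SkipMultiDim,       // case 3: more than one subrange
  SkipNonConstCount   // case 4: non-constant (or, by assumption, missing) count
};

// Mirrors the four silent-skip cases, checked in the listed order.
VectorEmitDecision classify(const VectorTypeInfo &ty, const std::set<int> &emittedBaseTypes) {
  if (!ty.baseTypeId || !ty.baseIsBasicType)
    return VectorEmitDecision::SkipNonBasicBase;
  if (!emittedBaseTypes.count(*ty.baseTypeId))
    return VectorEmitDecision::SkipBaseNotEmitted;
  if (ty.subranges.size() > 1)
    return VectorEmitDecision::SkipMultiDim;
  if (ty.subranges.empty() || !ty.subranges.front().constantCount)
    return VectorEmitDecision::SkipNonConstCount;
  return VectorEmitDecision::Emit;
}
```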

    [2 lines not shown]
Delta  File
+66 -0  llvm/test/CodeGen/SPIRV/debug-info/debug-type-vector-skipped.ll
+66 -0  llvm/test/CodeGen/SPIRV/debug-info/debug-type-vector.ll
+49 -7  llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.cpp
+14 -1  llvm/lib/Target/SPIRV/SPIRVNonSemanticDebugHandler.h
+195 -8  4 files

LLVM/project 67aa8fa llvm/docs AMDGPUUsage.rst, llvm/include/llvm/Support AMDGPUAddrSpace.h

Fix docs
Delta  File
+2 -2  llvm/docs/AMDGPUUsage.rst
+1 -1  llvm/include/llvm/Support/AMDGPUAddrSpace.h
+3 -3  2 files

LLVM/project 85c8d8e llvm/docs AMDGPUUsage.rst, mlir/include/mlir/Dialect/LLVMIR ROCDLDialect.td

Address comments, fix rebase
Delta  File
+4 -4  llvm/docs/AMDGPUUsage.rst
+2 -0  mlir/include/mlir/Dialect/LLVMIR/ROCDLDialect.td
+6 -4  2 files

LLVM/project 005809e llvm/lib/Target/AMDGPU AMDGPULowerExecSync.cpp

clang-format
Delta  File
+1 -2  llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+1 -2  1 file

LLVM/project e7d5d3c llvm/lib/Target/AMDGPU AMDGPUMemoryUtils.cpp AMDGPUMemoryUtils.h

[NFC][AMDGPU] Generalize some LDS MemoryUtils

In preparation for upcoming work, I need some functions used by the LDS lowering
system to work on any GV. I removed the LDS specific queries inside these functions
and replaced them with functors passed by the caller, so these utility functions can be reused.

I also cleaned up a few things that weren't up to code, such as lowercase variable names.
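The pattern described here, replacing a hard-coded LDS query with a caller-supplied functor, looks roughly like the following. `GlobalVar` and `collectGlobals` are hypothetical; the real utilities operate on LLVM IR global variables, and their actual signatures are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical global-variable record; the real utilities operate on
// LLVM IR globals.
struct GlobalVar {
  std::string name;
  unsigned addrSpace;
};

// Instead of hard-coding an "is this LDS?" check, the caller supplies the
// predicate, so the same traversal can be reused for any kind of global.
std::vector<const GlobalVar *>
collectGlobals(const std::vector<GlobalVar> &globals,
               const std::function<bool(const GlobalVar &)> &shouldInclude) {
  std::vector<const GlobalVar *> result;
  for (const GlobalVar &gv : globals)
    if (shouldInclude(gv))
      result.push_back(&gv);
  return result;
}
```

A caller that wants the previous LDS-only behavior passes a predicate that checks for address space 3, AMDGPU's local (LDS) address space; other callers can select globals by any criterion.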
Delta  File
+30 -36  llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp
+37 -9  llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.h
+20 -17  llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+21 -7  llvm/lib/Target/AMDGPU/AMDGPULowerExecSync.cpp
+7 -6  llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp
+115 -75  5 files

LLVM/project 9090e13 llvm/lib/Target/AMDGPU SIISelLowering.cpp AMDGPULegalizerInfo.cpp, llvm/test/CodeGen/AMDGPU addrspacecast-barrier.ll s-barrier.ll

[RFC][AMDGPU] Add BARRIER address space

Add a new BARRIER address space for global variables that represent the barrier IDs in GFX12.5.

These barrier addresses just have values corresponding 1-1 to barrier IDs. They are still implemented on top of LDS, but the offsetting happens during an addrspacecast to generic, not whenever the barrier GV is used.

The motivation for this is to make the relation between LDS and barrier GVs explicit in the compiler. It does add a bit more complexity, but that complexity was already there, just hidden by pretending barrier GVs were actual LDS.
Delta  File
+442 -0  llvm/test/CodeGen/AMDGPU/addrspacecast-barrier.ll
+62 -45  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+54 -31  llvm/test/CodeGen/AMDGPU/s-barrier.ll
+52 -14  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+35 -31  llvm/test/CodeGen/AMDGPU/s-barrier-lowering.ll
+32 -32  llvm/test/CodeGen/AMDGPU/amdgpu-lower-exec-sync-and-module-lds.ll
+677 -153  42 files not shown
+1,107 -440  48 files

LLVM/project 56e520c clang/lib/CodeGen TargetInfo.h CodeGenModule.cpp, clang/lib/CodeGen/Targets AMDGPU.cpp SPIR.cpp

[NFCI][clang] Allow overriding any global variable address space

Allow the target to change the AS of a global variable at will, not just whenever Clang cannot assign one.
This enables the next patch that will specialize LDS GVs for barriers as a separate address space.
Delta  File
+10 -9  clang/lib/CodeGen/Targets/AMDGPU.cpp
+12 -6  clang/lib/CodeGen/TargetInfo.h
+7 -8  clang/lib/CodeGen/Targets/SPIR.cpp
+11 -2  clang/lib/CodeGen/CodeGenModule.cpp
+5 -6  clang/lib/CodeGen/TargetInfo.cpp
+6 -3  clang/lib/CodeGen/Targets/AVR.cpp
+51 -34  6 files

LLVM/project 3999e13 llvm/test/CodeGen/AMDGPU s-barrier-id-allocation.ll, mlir/include/mlir/Dialect/LLVMIR ROCDLOps.td

Fix MLIR
Delta  File
+21 -21  llvm/test/CodeGen/AMDGPU/s-barrier-id-allocation.ll
+6 -6  mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+4 -4  mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-barriers-gfx12.mlir
+2 -2  mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+1 -1  mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+34 -34  5 files

LLVM/project 2743310 llvm/test/CodeGen/AMDGPU llvm.amdgcn.cluster.id.ll

[AMDGPU] Regenerate cluster ID checks (#203494)
Delta  File
+24 -24  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cluster.id.ll
+24 -24  1 file

LLVM/project dbc255b llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64ISelDAGToDAG.cpp, llvm/lib/Target/AArch64/GISel AArch64PostLegalizerLowering.cpp AArch64GlobalISelUtils.h

[AArch64](NFC) Introduce unified `isLegalArithImmed()` and `isLegalCmpImmed()` (#203020)

Quick tidy-up to factor out some common helpers into
`AArch64AddressingModes.h`.
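For reference, the encoding rule such helpers check is that AArch64 ADD/SUB (immediate) accept a 12-bit unsigned value, optionally shifted left by 12. The sketch below restates that rule; the `...Sketch` names are hypothetical, and the signatures of the new helpers in `AArch64AddressingModes.h` may differ.

```cpp
#include <cassert>
#include <cstdint>

// ADD/SUB (immediate) encode a 12-bit unsigned value, optionally shifted
// left by 12 bits (LSL #12).
constexpr bool isLegalArithImmedSketch(std::uint64_t c) {
  return (c >> 12) == 0 || ((c & 0xFFF) == 0 && (c >> 24) == 0);
}

// CMP is an alias of SUBS and CMN of ADDS, so a negative constant can be
// compared by negating it and using CMN instead. The negation is done in
// unsigned arithmetic so that INT64_MIN does not overflow.
constexpr bool isLegalCmpImmedSketch(std::int64_t c) {
  std::uint64_t magnitude =
      c < 0 ? 0 - static_cast<std::uint64_t>(c) : static_cast<std::uint64_t>(c);
  return isLegalArithImmedSketch(magnitude);
}
```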
Delta  File
+12 -24  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+22 -0  llvm/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
+3 -9  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
+4 -7  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+0 -6  llvm/lib/Target/AArch64/GISel/AArch64GlobalISelUtils.h
+41 -46  5 files

LLVM/project 7f80165 compiler-rt/lib/builtins/arm addsf3.S

Clarify #(x << y) idiom
Delta  File
+13 -13  compiler-rt/lib/builtins/arm/addsf3.S
+13 -13  1 file

LLVM/project e22a2a9 compiler-rt/lib/builtins/arm addsf3.S, compiler-rt/lib/builtins/arm/thumb1 addsf3fast.S

Fix misaligned comments
Delta  File
+33 -33  compiler-rt/lib/builtins/arm/addsf3.S
+27 -25  compiler-rt/lib/builtins/arm/thumb1/addsf3fast.S
+60 -58  2 files

LLVM/project c3f9156 compiler-rt/lib/builtins/arm addsf3.S

Swap round #if !__thumb__
Delta  File
+17 -15  compiler-rt/lib/builtins/arm/addsf3.S
+17 -15  1 file

LLVM/project b5ec4f1 llvm/lib/Support UnicodeNameToCodepointGenerated.cpp, llvm/test/CodeGen/AMDGPU/NextUseAnalysis spill-vreg-many-lanes.mir acyclic-770bb.mir

Merge remote-tracking branch 'upstream/main' into arm-fp-faddsub
Delta  File
+275,101 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/spill-vreg-many-lanes.mir
+144,679 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/acyclic-770bb.mir
+38,494 -84,026  llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+57,682 -0  llvm/test/CodeGen/AMDGPU/NextUseAnalysis/double-nested-loops-complex-cfg.mir
+23,873 -20,923  llvm/lib/Support/UnicodeNameToCodepointGenerated.cpp
+22,388 -22,086  llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+562,217 -127,035  28,757 files not shown
+3,384,957 -1,212,109  28,763 files

LLVM/project da65d6a llvm/lib/Support KnownFPClass.cpp, llvm/test/Transforms/Attributor nofpclass-canonicalize.ll

[KnownFPClass] Fix canonicalize incorrectly dropping fcNegZero under positive-zero denormal mode (#202268)

The denormal mode only flushes *denormal* (subnormal) values; -0.0 is
not a denormal, and per LangRef canonicalize must preserve the sign of
zero (canonicalize(-0.0) == -0.0).

Alive2 (InstCombine fold of canonicalize on a {+/-0, nan} value):
  before (miscompiles -0.0 -> +0.0): https://alive2.llvm.org/ce/z/ZRK-sr
  after  (verifies):                 https://alive2.llvm.org/ce/z/L3tPu3
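The fix can be illustrated with a small model of `canonicalize` under the positive-zero denormal mode: only subnormal inputs are flushed, so `-0.0` comes back unchanged. This is a simplified sketch (NaN quieting is ignored), and `canonicalizePositiveZeroMode` is a hypothetical name, not part of `KnownFPClass`.

```cpp
#include <cassert>
#include <cmath>

// Simplified model of llvm.canonicalize on a double when the denormal mode
// flushes subnormals to +0.0 ("positive-zero"). NaN quieting is ignored.
double canonicalizePositiveZeroMode(double x) {
  if (std::fpclassify(x) == FP_SUBNORMAL)
    return 0.0; // only subnormals are flushed, always to +0.0 in this mode
  return x;     // zeros (including -0.0), normals and infinities pass through
}
```

Because `-0.0` is classified as a zero rather than a subnormal, its sign survives, which is why `fcNegZero` must stay in the result's possible classes.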
Delta  File
+18 -18  llvm/test/Transforms/Attributor/nofpclass-canonicalize.ll
+8 -2  llvm/lib/Support/KnownFPClass.cpp
+1 -1  llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-canonicalize.ll
+27 -21  3 files

LLVM/project 0a55578 amd/comgr/src/hotswap code-object-utils.cpp code-object-utils.h, amd/comgr/test-unit RaiserScaffoldingTest.cpp

[Comgr][hotswap] Address PR #2437 review comments

Reviewer feedback from chinmaydd and jmmartinez:

- readKernelDescriptor now returns Expected<KernelDescriptorFields> by
  value instead of writing through an out-parameter (jmmartinez), folding
  the byte read and field extraction into one function.
- Group the KD register fields into a KernelDescriptorFields struct stored
  as std::optional<KernelDescriptorFields> on KernelMeta, replacing the
  HasKernelDescriptor bool flag (jmmartinez). PrivateSegmentFixedSize now
  lives only in the descriptor struct, read authoritatively from .rodata.
- extractKernelMeta propagates a KD parse failure as an error rather than
  swallowing it into a partial success (ftynse/chinmaydd; martin-luecke
  agreed), so a successful KernelMeta always carries the descriptor.
- raiser.cpp reuses the shared kAMDGPUTriple from mc-state.h instead of a
  local duplicate constant (chinmaydd).
- Add TODOs flagging the non-thread-safe target init and the
  non-exhaustive stripEncoding suffix list (chinmaydd).
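The first two bullets describe a common refactoring: return the parsed fields by value, wrapped in an error-carrying type, instead of filling an out-parameter. The sketch below uses `std::variant` as a stand-in for `llvm::Expected` and parses only three fields; apart from `KernelDescriptorFields` and `PrivateSegmentFixedSize`, which appear above, the names and the chosen field subset are assumptions. The offsets follow the documented 64-byte AMDHSA kernel descriptor layout (little-endian, first three fields 32 bits wide).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical subset of the fields; the real KernelDescriptorFields struct
// lives in code-object-utils.h and may contain different members.
struct KernelDescriptorFields {
  std::uint32_t GroupSegmentFixedSize;   // byte offset 0
  std::uint32_t PrivateSegmentFixedSize; // byte offset 4
  std::uint32_t KernargSize;             // byte offset 8
};

// Stand-in for llvm::Expected<T>: either the fields or an error message.
using ExpectedKD = std::variant<KernelDescriptorFields, std::string>;

// Returns the parsed fields by value instead of writing through an
// out-parameter; a short buffer is reported as an error, not a partial result.
ExpectedKD readKernelDescriptorSketch(const std::vector<std::uint8_t> &bytes) {
  constexpr std::size_t kKernelDescriptorSize = 64;
  if (bytes.size() < kKernelDescriptorSize)
    return std::string("kernel descriptor truncated");
  // Assemble a little-endian 32-bit value starting at `offset`.
  auto readU32 = [&bytes](std::size_t offset) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i)
      value = (value << 8) | bytes[offset + i];
    return value;
  };
  return KernelDescriptorFields{readU32(0), readU32(4), readU32(8)};
}
```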

Assisted-by: Claude Opus 4.8 (1M context) <noreply at anthropic.com>
Delta  File
+41 -63  amd/comgr/src/hotswap/code-object-utils.cpp
+49 -52  amd/comgr/src/hotswap/code-object-utils.h
+6 -6  amd/comgr/test-unit/RaiserScaffoldingTest.cpp
+4 -5  amd/comgr/src/hotswap/raiser.cpp
+8 -0  amd/comgr/src/hotswap/mc-state.cpp
+1 -1  amd/comgr/src/hotswap/raiser.h
+109 -127  6 files