LLVM/project ed89f08llvm/include/llvm/Object ELFObjectFile.h, llvm/lib/DWP DWP.cpp

[llvm-dwp] Fix incorrect ELF OS/ABI in DWP output (#198486)

I received a report internally that
https://github.com/llvm/llvm-project/pull/192112 caused issues with
lldb.
LLDB was not able to load the DWP files because of the OS mismatch
between the binary and the DWP file.

Investigating, it turns out that the refactor caused DWPWriter to call
`ELFObjectFileBase::getOS()` which sets the output OS/ABI, but getOS()
returns `Triple::OSType`, not the raw `e_ident[EI_OSABI]` byte. These
enums have different numbering :( oops.

This caused certain tools that validate OS/ABI consistency between a
binary and its DWP to reject the debug info.
Fix by adding getEIdentOSABI() to ELFObjectFileBase (parallel to
getEIdentABIVersion()) and using it instead of getOS().
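
The mismatch can be illustrated with a minimal, self-contained sketch. The
`ELFOSABI_*` values are the ones defined by the ELF specification; `ToyOSType`,
`toOSType`, `buggyOutputOsAbi` and `fixedOutputOsAbi` are hypothetical names
used only for illustration, and the `ToyOSType` numbering is made up and does
not match `Triple::OSType`.

```cpp
#include <cstdint>

// Raw e_ident[EI_OSABI] values, as defined by the ELF specification.
enum ElfOsAbi : std::uint8_t {
  ELFOSABI_NONE = 0,
  ELFOSABI_GNU = 3, // also spelled ELFOSABI_LINUX
  ELFOSABI_FREEBSD = 9,
};

// Hypothetical stand-in for an abstract OS enum such as Triple::OSType.
// The numbering is illustrative only.
enum class ToyOSType { Unknown = 0, FreeBSD = 1, Linux = 2 };

// Hypothetical helper: map the raw header byte to the abstract OS.
ToyOSType toOSType(std::uint8_t OsAbi) {
  switch (OsAbi) {
  case ELFOSABI_GNU:
    return ToyOSType::Linux;
  case ELFOSABI_FREEBSD:
    return ToyOSType::FreeBSD;
  default:
    return ToyOSType::Unknown;
  }
}

// Buggy: writes the abstract enum's integer value into the header byte.
std::uint8_t buggyOutputOsAbi(std::uint8_t InputOsAbi) {
  return static_cast<std::uint8_t>(toOSType(InputOsAbi));
}

// Fixed: forwards the raw byte unchanged.
std::uint8_t fixedOutputOsAbi(std::uint8_t InputOsAbi) { return InputOsAbi; }
```

With these toy enums, a FreeBSD input (byte 9) comes out as 1 on the buggy
path and as 9 on the fixed path.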

Assisted-by: Claude
DeltaFile
+25-0llvm/test/tools/llvm-dwp/X86/osabi.test
+7-0llvm/include/llvm/Object/ELFObjectFile.h
+1-1llvm/lib/DWP/DWP.cpp
+33-13 files

LLVM/project 861baealldb/tools/lldb-dap/extension/src process-tree.ts

[lldb-dap] Add missing `arguments` field to LldbDapProcessEntry (#198597)

The TypeScript interface was missing the optional `arguments` field that
`parseListProcessesOutput` reads and `pick-process` displays, breaking
the extension build.
DeltaFile
+1-0lldb/tools/lldb-dap/extension/src/process-tree.ts
+1-01 files

LLVM/project 41c45a2clang-tools-extra/clangd/refactor/tweaks DefineOutline.cpp, clang-tools-extra/clangd/unittests/tweaks DefineOutlineTests.cpp

[clangd] Let DefineOutline tweak create a definition from scratch (#71950)

Fixes https://github.com/clangd/clangd/issues/445
DeltaFile
+72-11clang-tools-extra/clangd/refactor/tweaks/DefineOutline.cpp
+26-6clang-tools-extra/clangd/unittests/tweaks/DefineOutlineTests.cpp
+3-0clang-tools-extra/docs/ReleaseNotes.rst
+101-173 files

LLVM/project f037e17lldb/source/Commands CommandObjectTarget.cpp

[lldb] Don't require a real target for `target modules list -g` (#198594)

The `-g` flag lists the global module list, which doesn't need a target.
Switch to eCommandAllowsDummyTarget and error out explicitly in
DoExecute on the non-global paths when no real target is selected.

Fixes a regression introduced by #198429.
DeltaFile
+8-2lldb/source/Commands/CommandObjectTarget.cpp
+8-21 files

LLVM/project 70f8c7bllvm/test/CodeGen/AMDGPU atomic_optimizations_global_pointer.ll atomic_optimizations_local_pointer.ll, llvm/test/MachineVerifier/AMDGPU dpp-sgpr-src1.mir

[AMDGPU] Disable dpp src1 sgpr on gfx11 (#164241)

https://github.com/llvm/llvm-project/pull/67461 enabled SGPRs as src1 by
default for all dpp opcodes with manual checks for targets where this is
not supported. In that case, isOperandLegal checked if the second
operand is legal as src0.
https://github.com/llvm/llvm-project/pull/155595 disabled this check by
removing the calls to isOperandLegal, which resulted in SGPRs being used
as operands for src1 on gfx11. This PR reenables this check and fixes
the lit test.

---------

Co-authored-by: Paul Trojahn <paul.trojahn at amd.com>
DeltaFile
+83-79llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
+70-66llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
+26-0llvm/test/CodeGen/AMDGPU/si-fold-operands-gfx11.mir
+18-3llvm/test/CodeGen/AMDGPU/dpp_combine.ll
+20-0llvm/test/MachineVerifier/AMDGPU/dpp-sgpr-src1.mir
+7-5llvm/test/CodeGen/AMDGPU/dpp_combine_gfx11.mir
+224-1532 files not shown
+235-1618 files

LLVM/project b16e3a0libc/startup/linux/aarch64 tls.cpp

[libc] Remove broken __builtin_aarch64_wsr fallback in set_thread_ptr (#197295)

The fallback used __builtin_aarch64_wsr (32-bit) instead of
__builtin_aarch64_wsr64, truncating the 64-bit thread pointer value and
causing non-deterministic runtime crashes.

Modern GCC correctly warns about it and -Werror=conversion catches it.


```
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp: In function ‘bool __llvm_libc_22_1_5_::set_thread_ptr(uintptr_t)’:
/var/tmp/portage/llvm-runtimes/libc-22.1.5/work/libc/startup/linux/aarch64/tls.cpp:90:38: error: conversion from ‘uintptr_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
   90 |   __builtin_aarch64_wsr("tpidr_el0", val);
      |                                      ^~~
cc1plus: all warnings being treated as errors
```
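
A minimal sketch of the truncation, using plain functions instead of the real
builtins. `writeRegister32`, `writeRegister64`, `setThreadPtrNarrow` and
`setThreadPtrWide` are hypothetical names; they only model the width of the
value that reaches the register.

```cpp
#include <cstdint>

// Hypothetical stand-ins for a 32-bit and a 64-bit system-register write.
// Each returns the value that would actually reach the register.
std::uint64_t writeRegister32(std::uint32_t Val) { return Val; }
std::uint64_t writeRegister64(std::uint64_t Val) { return Val; }

// Passing a 64-bit thread pointer through the 32-bit path drops the top half.
std::uint64_t setThreadPtrNarrow(std::uint64_t ThreadPtr) {
  return writeRegister32(static_cast<std::uint32_t>(ThreadPtr));
}

// The 64-bit path preserves the full pointer value.
std::uint64_t setThreadPtrWide(std::uint64_t ThreadPtr) {
  return writeRegister64(ThreadPtr);
}
```

Whether the narrow path misbehaves depends on whether the pointer happens to
have bits set above bit 31, which is consistent with the crashes being
non-deterministic.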
DeltaFile
+2-2libc/startup/linux/aarch64/tls.cpp
+2-21 files

LLVM/project e13d9c2clang/lib/CIR/CodeGen CIRGenAtomic.cpp, clang/test/CIR/CodeGen atomic.c

[CIR] Implement atomic cmp exchange with non-const 'weak' lowering (#198546)

This was left as an NYI, but appears in self build!

This patch follows the existing solution in that we are doing the
branching of weak vs not-weak at the CIR level. This is necessary
because the LLVM intrinsics (and the CIR operations) take 'weak' as a
constant value.

Unlike classic-codegen, this patch uses an 'if' instead of a 'switch' on
the 'weak' value. This is mainly for readability (since it is a switch
over a bool!), but also because our 'switch' doesn't seem to support
'bool', so this would require an additional cast.
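
The same shape can be sketched at the source level with `std::atomic`, where
weakness is also fixed per call rather than passed as a value. This is an
analogy, not the CIR code; `compareExchange` is a hypothetical helper name.

```cpp
#include <atomic>

// 'Weak' is only known at run time, but each member function fixes the
// weakness at compile time, so branch once and call the matching variant.
bool compareExchange(std::atomic<int> &Obj, int &Expected, int Desired,
                     bool Weak) {
  if (Weak)
    return Obj.compare_exchange_weak(Expected, Desired);
  return Obj.compare_exchange_strong(Expected, Desired);
}
```

Because the weak form may fail spuriously, callers that pass `Weak = true`
would normally retry in a loop.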

As a future direction, we may wish to modify the CIR operations to take
'weak' and 'failure' value (both are constants in LLVM intrinsics!) as
non-constants, and handle the switch/if statement during lowering. This
would give us an opportunity to optimize the value out without having to
collapse the if/switch/etc, and minimize the size of the CIR. However,
as that is a larger direction, this patch skips that for now.
DeltaFile
+354-0clang/test/CIR/CodeGen/atomic.c
+35-3clang/lib/CIR/CodeGen/CIRGenAtomic.cpp
+389-32 files

LLVM/project bb95a8dclang/lib/CIR/CodeGen CIRGenExpr.cpp, clang/test/CIR/CodeGen builtin-call.cpp

[CIR] Fix assumption that 'curFn' is always a function in direct-call (#197766)

The code to do some checking with a builtin function tried to tell
whether it is being called inside of a function of the same name. This
isn't necessarily true (that it is in a function), since we generate
'global' ops as a curFn too. This patch just removes the assumption and
changes the condition to only happen when we're in a function.
DeltaFile
+77-0clang/test/CIR/CodeGen/builtin-call.cpp
+4-3clang/lib/CIR/CodeGen/CIRGenExpr.cpp
+81-32 files

LLVM/project 58a43dcmlir/lib/Dialect/SPIRV/IR SPIRVOps.cpp, mlir/lib/Target/SPIRV/Deserialization Deserializer.cpp

[mlir][SPIR-V] Support literal struct type in spirv.Constant (#198414)
DeltaFile
+29-1mlir/test/Dialect/SPIRV/IR/structure-ops.mlir
+21-3mlir/lib/Dialect/SPIRV/IR/SPIRVOps.cpp
+14-0mlir/test/Target/SPIRV/constant.mlir
+4-3mlir/lib/Target/SPIRV/Serialization/Serializer.cpp
+1-1mlir/lib/Target/SPIRV/Deserialization/Deserializer.cpp
+69-85 files

LLVM/project ace44dcllvm/test/CodeGen/AMDGPU memory-legalizer-local-nontemporal.ll shl_add.ll

[AMDGPU] Gate `S_LSHL[1-4]_ADD_U32` patterns on uniform results (#198508)

Like the other SOP2 patterns in this file, these scalar instructions
require the result to be uniform. Wrap them in `UniformBinFrag` so
divergent shl/add chains use `V_LSHL_ADD_U32`.
DeltaFile
+147-102llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
+233-0llvm/test/CodeGen/AMDGPU/shl_add.ll
+104-72llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll
+87-62llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll
+51-51llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+43-43llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+665-3308 files not shown
+748-40114 files

LLVM/project 987b0e3mlir/include/mlir/Dialect/AMDGPU/IR AMDGPUOps.td, mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp

[mlir][AMDGPU] Extend amdgpu.transpose_load for gfx1250 (#198354)

This commit adds support for gfx1250's ds_load_tr* instructions to
`amdgpu.transpose_load` since they're pretty close to the gfx950 ones.

---------

Co-authored-by: Codex <codex at openai.com>
DeltaFile
+97-35mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+41-0mlir/test/Conversion/AMDGPUToROCDL/transpose_load_gfx1250.mlir
+22-15mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+10-6mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPUOps.td
+5-5mlir/test/Conversion/AMDGPUToROCDL/transpose_load.mlir
+8-0mlir/test/Conversion/AMDGPUToROCDL/transpose_load_gfx1250_invalid.mlir
+183-612 files not shown
+192-628 files

LLVM/project ed14419mlir/include/mlir/IR BuiltinDialectBytecode.td BytecodeBase.td, mlir/test/Dialect/Builtin/Bytecode types.mlir

[mlirbc] Add missing encoding for float types (#191962)

Enable the encoding, but make it easy to disable so that reader-side updates can land first.
DeltaFile
+43-1mlir/include/mlir/IR/BuiltinDialectBytecode.td
+26-2mlir/test/Dialect/Builtin/Bytecode/types.mlir
+4-0mlir/include/mlir/IR/BytecodeBase.td
+73-33 files

LLVM/project d46cca0llvm/test/tools/dsymutil/ARM thumb.c, llvm/test/tools/dsymutil/X86 reproducer.test modules-pruning.cpp

[dsymutil] Add missing --linker {classic,parallel} in tests (#198568)

As I'm preparing to toggle the default, I found another set of tests
that don't explicitly pass the linker to dsymutil.
DeltaFile
+4-4llvm/test/tools/dsymutil/X86/reproducer.test
+5-0llvm/test/tools/dsymutil/X86/modules-pruning.cpp
+2-2llvm/test/tools/dsymutil/X86/remarks-linking-archive.text
+2-2llvm/test/tools/dsymutil/ARM/thumb.c
+2-2llvm/test/tools/dsymutil/X86/modules.m
+2-2llvm/test/tools/dsymutil/X86/odr-uniquing.cpp
+17-1215 files not shown
+32-2621 files

LLVM/project 43b66dfllvm/docs LangRef.rst

[IR] Explicitly note C standard library UB (#198562)

This language is to my understanding a bit outdated (if we're in a
freestanding environment, we should be handling things fine to my
knowledge, or at least I'm not aware of any outstanding issues reported
by people compiling for freestanding environments/different languages
which are somewhat prominent at this point). The language here dates
back to
68f971b1d67d51272f5c141fc9e4740e27e279f4 with some minor modifications
in 722212d1a0672ae18a23db58c4cfb7e38073abfa. Explicitly note the UB
aspect, as this came up recently when working on llubi in #190147 and I
do not think it hurts to note it explicitly.
DeltaFile
+2-2llvm/docs/LangRef.rst
+2-21 files

LLVM/project bd74b5bllvm/include/llvm/CodeGen MachineFunction.h, llvm/lib/CodeGen MachineFunction.cpp

[AMDGPU][MC] Replace shifted registers in CFI instructions

Change-Id: I0d99e9fe43ec3b6fecac20531119956dca2e4e5c
DeltaFile
+67-67llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir
+33-0llvm/lib/MC/MCDwarf.cpp
+15-15llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll
+10-0llvm/lib/CodeGen/MachineFunction.cpp
+4-4llvm/test/CodeGen/AMDGPU/debug-frame.ll
+4-0llvm/include/llvm/CodeGen/MachineFunction.h
+133-865 files not shown
+143-9011 files

LLVM/project 1df6d5fllvm/lib/Target/AMDGPU SIFrameLowering.cpp SIMachineFunctionInfo.h, llvm/test/CodeGen/AMDGPU amdgpu-spill-cfi-saved-regs.ll

[AMDGPU] Implement -amdgpu-spill-cfi-saved-regs

These spills need special CFI anyway, so implementing them directly
where CFI is emitted avoids the need to invent a mechanism to track them
from ISel.

Change-Id: If4f34abb3a8e0e46b859a7c74ade21eff58c4047
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+2,926-0llvm/test/CodeGen/AMDGPU/amdgpu-spill-cfi-saved-regs.ll
+12-0llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+10-0llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+9-0llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+2-0llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+2,959-05 files

LLVM/project 5a78b00llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll gfx-callable-argument-types.ll

[AMDGPU] Implement CFI for CSR spills

Introduce new SPILL pseudos to allow CFI to be generated for only CSR
spills, and to make the information accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.

Change-Id: I9b09646abd2ac4e56eddf5e9aeca1a5bebbd43dd
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+3,568-2,598llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,912-1,913llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll
+2,700-12llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+631-631llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+505-510llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+394-399llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+9,710-6,063108 files not shown
+14,825-9,526114 files

LLVM/project 2af9d2dllvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.960bit.ll

[AMDGPU] Use register pair for PC spill

Change-Id: Ibedeef926f7ff235a06de65a83087c151f66a416
DeltaFile
+4,331-4,331llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+1,742-1,740llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+1,562-1,560llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+1,462-1,460llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+1,238-1,236llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+1,030-1,028llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll
+11,365-11,35589 files not shown
+18,153-18,04495 files

LLVM/project e2acf26llvm/lib/Target/AMDGPU SIFrameLowering.cpp, llvm/test/CodeGen/AMDGPU debug-frame.ll eliminate-frame-index-v-add-u32.mir

[AMDGPU] Emit entry function Dwarf CFI

Entry functions represent the end of unwinding, as they are the
outer-most frame. This implies they can only have a meaningful
definition for the CFA, which AMDGPU defines using a memory location
description with a literal private address space address. The return
address is set to undefined as a sentinel value to signal the end of
unwinding.

Change-Id: I21580f6a24f4869ba32939c9c6332506032cc654
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+1,405-0llvm/test/CodeGen/AMDGPU/debug-frame.ll
+204-12llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-u32.mir
+134-6llvm/test/CodeGen/AMDGPU/eliminate-frame-index-v-add-co-u32.mir
+114-10llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-add-i32.mir
+42-5llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+34-0llvm/test/CodeGen/AMDGPU/entry-function-cfi.mir
+1,933-3322 files not shown
+2,044-5028 files

LLVM/project a4067dbllvm/test/CodeGen/AMDGPU accvgpr-spill-scc-clobber.mir pei-build-av-spill.mir

[AMDGPU] Implement CFI for non-kernel functions

This does not implement CSR spills other than those AMDGPU handles
during PEI. The remaining spills are handled in a subsequent patch.

Change-Id: I5e3a9a62cf9189245011a82a129790d813d49373
Co-authored-by: Scott Linder <scott.linder at amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu at amd.com>
DeltaFile
+5,568-0llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
+3,000-96llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
+2,208-72llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
+2,196-0llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir
+2,136-0llvm/test/CodeGen/AMDGPU/vgpr-spill-scc-clobber.mir
+1,671-1llvm/test/CodeGen/AMDGPU/debug-frame.ll
+16,779-16993 files not shown
+22,893-1,13299 files

LLVM/project fe270cfclang/lib/Driver/ToolChains Gnu.cpp, clang/test/Driver amdgpu-unwind.cl

[Clang] Default to async unwind tables for amdgcn

To avoid codegen changes when enabling debug-info (see
https://bugs.llvm.org/show_bug.cgi?id=37240) we want to
enable unwind tables by default.

There is some pessimization in post-prologepilog scheduling, and a
general solution to the problem of CFI_INSTRUCTION-as-scheduling-barrier
should be explored.

Change-Id: I83625875966928c7c4411cd7b95174dc58bda25a
DeltaFile
+26-0clang/test/Driver/amdgpu-unwind.cl
+1-0clang/lib/Driver/ToolChains/Gnu.cpp
+27-02 files

LLVM/project 431cd31

[MC][Dwarf] Add custom CFI pseudo-ops for use in AMDGPU

While these can be represented with .cfi_escape, using these pseudo-cfi
instructions makes .s/.mir files more readable, and it is necessary to
support updating registers in CFI instructions (something that the
AMDGPU backend requires).

Change-Id: I763d0cabe5990394670281d4afb5a170981e55d0
DeltaFile
+0-00 files

LLVM/project ab48ef0

[MIR] Error on signed integer in getUnsigned

Previously we effectively took the absolute value of the APSInt. Instead,
diagnose the unexpected negative value.
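
A rough sketch of the intended behavior, using a plain `long long` in place of
APSInt. `getUnsignedChecked` and its error strings are hypothetical and are not
the MIR parser's actual code; note that this sketch returns true on success.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: convert a signed value to 'unsigned', reporting an
// error for negative or out-of-range input instead of silently adjusting it.
// Returns true on success; on failure, Result is left unchanged.
bool getUnsignedChecked(long long Value, unsigned &Result,
                        std::string &Error) {
  if (Value < 0) {
    Error = "expected an unsigned integer, got a negative value";
    return false;
  }
  if (static_cast<unsigned long long>(Value) >
      std::numeric_limits<unsigned>::max()) {
    Error = "integer does not fit in 'unsigned'";
    return false;
  }
  Result = static_cast<unsigned>(Value);
  return true;
}
```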

Change-Id: I4efe961e7b29fdf1d5f97df12f8139aac12c9219
DeltaFile
+0-00 files

LLVM/project 156e9d9flang/lib/Semantics resolve-directives.cpp, flang/test/Semantics/OpenMP declare-simd-uniform.f90

[flang][OpenMP] Remove ompFlagsRequireMark from symbol resolution

The `ompFlagsRequireMark` set was there to make sure that we put the flags
from it on symbols even when no new symbols needed to be created.

Instead of doing that, we can just put the flag on the symbol every time.
There is no harm in having these flags, it's just extra information.
DeltaFile
+2-12flang/lib/Semantics/resolve-directives.cpp
+1-1flang/test/Semantics/OpenMP/declare-simd-uniform.f90
+3-132 files

LLVM/project c5e2c25llvm/lib/MC MCSchedule.cpp, llvm/lib/MCA InstrBuilder.cpp

[MC] Add -sched-model-reservation-station-scale-factor option (#195638)

This patch adds a new CLI option to the MC layer called
`-sched-model-reservation-station-scale-factor` that scales the buffer
size of all reservation stations (RS) for resources in the scheduling
model by a positive `float` factor. It is limited to scaling OOO
resources, and not special buffer sizes (-1,0,1), and similarly it
is only allowed to produce OOO resources.
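
The rule can be sketched as a small helper. `scaleBufferSize` is a hypothetical
name, and the rounding and the clamp to 2 are assumptions made for this sketch;
the patch may round differently or reject such factors instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper. BufferSize values -1, 0 and 1 are special and are
// returned unchanged; only OOO resources (BufferSize > 1) are scaled.
int scaleBufferSize(int BufferSize, float Factor) {
  assert(Factor > 0.0f && "scale factor must be positive");
  if (BufferSize <= 1)
    return BufferSize;
  // Assumption: round to nearest, then clamp to 2 so the scaled value never
  // collapses into one of the special sizes.
  long Scaled = std::lround(static_cast<double>(BufferSize) * Factor);
  return static_cast<int>(std::max(Scaled, 2L));
}
```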

This can be used, for example, to try to find headroom in post-RA
instruction scheduling for OOO cores: scale the RS size by 2 and observe
IPC. If it improves, the code may be sensitive to the schedule and we may
be able to do better.

Note: Currently, BufferSize for LSU resources defines the reservation
station (RS), but if present also the ld/st queue size, which just
points to the provided LSU resource. Thus, we currently scale them both
in lockstep, until we have an independent ld/st queue model.


    [18 lines not shown]
DeltaFile
+148-0llvm/test/tools/llvm-mca/reservation-station-scale-factor.s
+53-0llvm/lib/MC/MCSchedule.cpp
+5-5llvm/tools/llvm-mca/Views/SchedulerStatistics.cpp
+5-5llvm/lib/MCA/InstrBuilder.cpp
+3-3llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
+3-3llvm/tools/llvm-mca/Views/TimelineView.cpp
+217-166 files not shown
+227-2212 files

LLVM/project 7b227a2llvm/include/llvm/ADT SmallPtrSet.h, llvm/lib/Support SmallPtrSet.cpp

[SmallPtrSet] Drop tombstones in large mode (#197637)

SmallPtrSet uses quadratic probing with tombstone deletion in large
mode. Tombstones occupy a third bucket state and hurt lookups.

Switch to linear probing with deletion implemented using Knuth TAOCP 6.4
Algorithm R.  `erase` opens a hole at the removed slot, then walks
forward, sliding each following entry whose probe path crosses the hole
back into it (the hole moves with each slide).  The scan stops at the
next empty bucket, which is guaranteed to exist.
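
A toy illustration of tombstone-free deletion under linear probing. This is not
the SmallPtrSet code: `ToyProbeSet`, the identity hash, the fixed power-of-two
capacity and the use of -1 as the empty marker are all assumptions made to keep
the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int EmptyKey = -1; // assumption: keys are non-negative ints

class ToyProbeSet {
  std::vector<int> Buckets;
  std::size_t Mask;
  std::size_t Count = 0;

  // Toy hash: the home bucket is the key modulo the capacity.
  std::size_t home(int Key) const {
    return static_cast<std::size_t>(Key) & Mask;
  }

  // Returns the bucket holding Key, or the first empty bucket on its path.
  std::size_t probe(int Key) const {
    std::size_t I = home(Key);
    while (Buckets[I] != EmptyKey && Buckets[I] != Key)
      I = (I + 1) & Mask;
    return I;
  }

public:
  explicit ToyProbeSet(std::size_t CapacityPow2)
      : Buckets(CapacityPow2, EmptyKey), Mask(CapacityPow2 - 1) {}

  bool contains(int Key) const { return Buckets[probe(Key)] == Key; }

  void insert(int Key) {
    assert(Key >= 0 && "negative keys collide with the empty marker");
    std::size_t I = probe(Key);
    if (Buckets[I] == Key)
      return;
    // No growth in this sketch: always keep at least one bucket empty.
    assert(Count + 1 < Buckets.size() && "table full");
    Buckets[I] = Key;
    ++Count;
  }

  bool erase(int Key) {
    std::size_t Hole = probe(Key);
    if (Buckets[Hole] != Key)
      return false;
    Buckets[Hole] = EmptyKey;
    --Count;
    // Walk forward until the next empty bucket, sliding back every entry
    // whose probe path (home .. current slot) crosses the hole.
    for (std::size_t J = (Hole + 1) & Mask; Buckets[J] != EmptyKey;
         J = (J + 1) & Mask) {
      std::size_t H = home(Buckets[J]);
      // The entry can stay if its home lies cyclically in (Hole, J].
      bool Stays = Hole <= J ? (Hole < H && H <= J) : (Hole < H || H <= J);
      if (Stays)
        continue;
      Buckets[Hole] = Buckets[J];
      Buckets[J] = EmptyKey;
      Hole = J; // the hole moves with each slide
    }
    return true;
  }
};
```

After `erase`, every remaining key is still reachable from its home bucket
without crossing an empty slot, so lookups only need to distinguish empty and
occupied buckets.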

`remove_if` clears matches in a single pass then calls `Grow` at the
current size to restore the linear-probe invariant, O(N) total.
(Per-match Algorithm R erase would be O(N * cluster).)

My DenseMap experiments suggest that Robin Hood Hashing and Abseil Swiss
Table family (not good at small keys) are actually worse than the
baseline.

    [7 lines not shown]
DeltaFile
+30-46llvm/lib/Support/SmallPtrSet.cpp
+38-32llvm/include/llvm/ADT/SmallPtrSet.h
+3-2llvm/unittests/ADT/SmallPtrSetTest.cpp
+71-803 files

LLVM/project c7b9aa7flang/lib/Lower Bridge.cpp, flang/test/Lower/CUDA cuda-doconc.cuf

[CUF] Fix `do concurrent` PFT navigation in `cuf.kernel` lowering

When lowering a cuf kernel directive with an explicit loop depth (e.g.
`do(2)`), the code navigated `nestedLoops-1` levels deeper in the PFT
expecting nested regular do-loops. For a `do concurrent` construct with
multiple index variables (e.g. `do concurrent(i=1:k,j=1:n)`), the PFT
represents the entire construct as a single node, so the extra
navigation stepped off into a plain statement with no nested
evaluations and triggered an assertion in `getNestedEvaluations()`.

Fix by skipping the depth navigation loop for `do concurrent`, which is
always a single flat PFT construct regardless of how many index
variables it declares.

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
DeltaFile
+27-0flang/test/Lower/CUDA/cuda-doconc.cuf
+3-2flang/lib/Lower/Bridge.cpp
+30-22 files

LLVM/project 5d5220cflang/lib/Lower Bridge.cpp, flang/test/Lower/CUDA cuda-doconc.cuf

[CUF] Fix `do concurrent` IV type in `cuf.kernel` lowering (#198584)

Induction variables in a `do concurrent` construct inside a cuf kernel
directive were allocated with the `index` type instead of the
Fortran-declared `integer` type. This caused a type conversion failure
when the index variable was used in a context requiring a different
integer or real type (e.g. `real(i)`).

Fix by using `genType(*name.symbol)` to derive the allocation type from
the symbol's Fortran declaration.

PR stack:
- https://github.com/llvm/llvm-project/pull/198584 ◀️ 
- https://github.com/llvm/llvm-project/pull/198585

Co-authored-by: Claude Sonnet 4.6 <noreply at anthropic.com>
DeltaFile
+42-13flang/test/Lower/CUDA/cuda-doconc.cuf
+1-1flang/lib/Lower/Bridge.cpp
+43-142 files

LLVM/project 40fa628libc/config/linux/aarch64 entrypoints.txt, libc/config/linux/riscv entrypoints.txt

[libc] implement putwc (#196165)

Add putwc function and tests. Part 9/10.

Assisted by Gemini
DeltaFile
+148-0libc/test/src/wchar/putwc_test.cpp
+39-0libc/src/wchar/putwc.cpp
+28-0libc/src/wchar/putwc.h
+3-1libc/test/src/wchar/CMakeLists.txt
+1-1libc/config/linux/aarch64/entrypoints.txt
+1-1libc/config/linux/riscv/entrypoints.txt
+220-32 files not shown
+223-48 files

LLVM/project 7fc1635mlir/lib/Dialect/XeGPU/Transforms XeGPUPeepHoleOptimizer.cpp

[MLIR][XeGPU] Temporarily disable XeGPU peephole optimizer for CRI (#198031)

Temporarily disable the optimization pass until a fix is ready.
Support for CRI was added in a recent PR,
https://github.com/llvm/llvm-project/pull/197229,
but it had post-merge issues.
DeltaFile
+4-9mlir/lib/Dialect/XeGPU/Transforms/XeGPUPeepHoleOptimizer.cpp
+4-91 files