LLVM/project 9081ac2 llvm/lib/Target/DirectX DXILResourceAccess.cpp, llvm/test/CodeGen/DirectX/ResourceAccess handle-to-index.ll handle-cases.ll

[DirectX][ResourceAccess] Resolve resource handles at access (#182106)

This change resolves handles (or the corresponding pointers) that all point into
a unique global resource by propagating an index into that global
resource through control flow.

If a unique global resource can't be resolved, an error is reported
instead.

This specifically resolves all handles that point into the same global
resource array.
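The idea of propagating an index through control flow can be pictured with a toy model. This is a hypothetical sketch, not the pass's implementation:

- `Handle`, `Resolved`, and `resolve` are made-up names.
- A "merge" stands in for a phi-like join.
- Cycles (loops) are not handled.

The sketch rebuilds an index expression that mirrors the handle's merges. It gives up when the incoming handles do not share a single global.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model, not the DXIL pass's data structures.
// A handle is either a direct reference to element `index` of the global
// resource array `global`, or a merge (like a phi) of incoming handles.
struct Handle {
  std::string global;                   // used when `incoming` is empty
  int index = 0;                        // used when `incoming` is empty
  std::vector<const Handle *> incoming; // non-empty for merges
};

// A resolved access: one global resource plus an index expression that
// mirrors the control flow the handle came through.
struct Resolved {
  std::string global;
  std::string index;
};

// Returns std::nullopt (where the pass would report an error) if the handle
// does not point into a unique global resource. Assumes no cycles.
std::optional<Resolved> resolve(const Handle &h) {
  if (h.incoming.empty())
    return Resolved{h.global, std::to_string(h.index)};
  std::string global;
  std::string index = "phi(";
  for (std::size_t i = 0; i < h.incoming.size(); ++i) {
    std::optional<Resolved> in = resolve(*h.incoming[i]);
    if (!in || (i > 0 && in->global != global))
      return std::nullopt; // unresolvable, or a second global resource
    global = in->global;
    index += (i > 0 ? ", " : "") + in->index;
  }
  return Resolved{global, index + ")"};
}
```

Take a merge of handles to elements 0 and 2 of the same global `Buf`. It resolves to `Buf` with index `phi(0, 2)`. A merge of handles from two different globals fails, which corresponds to the error case.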

Resolves: https://github.com/llvm/llvm-project/issues/165288

Because it reports an error in that case, this change is also part of resolving
https://github.com/llvm/llvm-project/issues/179303.
DeltaFile
+247-4 llvm/lib/Target/DirectX/DXILResourceAccess.cpp
+156-0 llvm/test/CodeGen/DirectX/ResourceAccess/handle-to-index.ll
+117-0 llvm/test/CodeGen/DirectX/ResourceAccess/handle-cases.ll
+37-0 llvm/test/CodeGen/DirectX/ResourceAccess/non-unique.ll
+557-4 (4 files)

LLVM/project 640ba7b .github/workflows/containers/github-action-ci-tooling Dockerfile

[Github] Bump clang-format/clang-tidy to v22.1.0 (#184374)

Per the version policy for these tools, they are bumped at the beginning
and at the end of each release cycle.
DeltaFile
+2-2 .github/workflows/containers/github-action-ci-tooling/Dockerfile
+2-2 (1 file)

LLVM/project 5936c91 clang-tools-extra/clang-doc YAMLGenerator.cpp JSONGenerator.cpp

Format
DeltaFile
+2-4 clang-tools-extra/clang-doc/YAMLGenerator.cpp
+3-2 clang-tools-extra/clang-doc/JSONGenerator.cpp
+2-1 clang-tools-extra/clang-doc/MDGenerator.cpp
+1-1 clang-tools-extra/clang-doc/Representation.cpp
+8-8 (4 files)

LLVM/project 92a078b clang-tools-extra/clang-doc MDGenerator.cpp Generators.cpp, clang-tools-extra/unittests/clang-doc GeneratorTest.cpp ClangDocTest.cpp

[clang-doc] Improve complexity of Index construction

The existing implementation ends up with an O(N^2) algorithm due to
repeated linear scans during index construction. Switching to a
StringMap allows us to reduce this to O(N), since we no longer need to
search the vector.
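A minimal sketch of the difference, assuming a simplified index that only stores child names:

- **Linear version:** scans the vector on every insertion, so inserting N unique names costs O(N^2) comparisons overall.
- **Keyed version:** keeps a hash map from name to position alongside the vector.

It uses `std::unordered_map` as a stand-in for `llvm::StringMap` so that it compiles on its own. `findOrInsertLinear` and `ChildIndex` are hypothetical names, not clang-doc's code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// O(N) per insertion: scan every existing child before appending, so N
// unique insertions cost O(N^2) comparisons in total.
std::size_t findOrInsertLinear(std::vector<std::string> &children,
                               const std::string &name) {
  for (std::size_t i = 0; i < children.size(); ++i)
    if (children[i] == name)
      return i;
  children.push_back(name);
  return children.size() - 1;
}

// Expected O(1) per insertion: a hash map from name to position sits next to
// the vector, so N unique insertions cost O(N) on average.
struct ChildIndex {
  std::vector<std::string> children;                // insertion order
  std::unordered_map<std::string, std::size_t> pos; // name -> position

  std::size_t findOrInsert(const std::string &name) {
    auto [it, inserted] = pos.try_emplace(name, children.size());
    if (inserted)
      children.push_back(name);
    return it->second;
  }
};
```

Both versions return the same positions. Only the cost of finding an existing entry changes.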

The `BM_Index_Insertion` benchmark measures the time taken to insert N
unique records into the index.

| Scale (N Items) | Baseline (ns) | Patched (ns) | Speedup | Change |
|----------------:|--------------:|-------------:|--------:|-------:|
| 10              | 9,977         | 11,004       | 0.91x   | +10.3% |
| 64              | 69,249        | 69,166       | 1.00x   | -0.1%  |
| 512             | 1,932,714     | 525,877      | 3.68x   | -72.8% |
| 4,096           | 92,411,535    | 4,589,030    | 20.1x   | -95.0% |
| 10,000          | 577,384,945   | 12,998,039   | 44.4x   | -97.7% |

The patch delivers significant improvements to scalability. At 10,000

    [13 lines not shown]
DeltaFile
+71-17 clang-tools-extra/unittests/clang-doc/GeneratorTest.cpp
+21-10 clang-tools-extra/clang-doc/MDGenerator.cpp
+13-11 clang-tools-extra/clang-doc/Generators.cpp
+11-5 clang-tools-extra/clang-doc/JSONGenerator.cpp
+3-3 clang-tools-extra/clang-doc/YAMLGenerator.cpp
+2-2 clang-tools-extra/unittests/clang-doc/ClangDocTest.cpp
+121-48 (2 files not shown)
+124-51 (8 files)

LLVM/project b33c7db clang-tools-extra/clang-doc CMakeLists.txt, clang-tools-extra/clang-doc/benchmarks ClangDocBenchmark.cpp CMakeLists.txt

[clang-doc] Add basic benchmarks for library functionality (#182620)

clang-doc's performance is good, but we suspect it could be better. To
track this with more fidelity, we can add a set of Google Benchmark
benchmarks that exercise portions of the library. To start, we track the
high-level items that we already monitor via the TimeTrace functions and
give each of them its own microbenchmark. This should give us more
confidence that swapping out data structures or updating algorithms has a
positive performance impact.

Note that an LLM helped generate portions of the benchmarks and
parameterize them. Most of the internal logic was written by me, but
the LLM was used to handle boilerplate and adaptation to the harness.
DeltaFile
+234-0 clang-tools-extra/clang-doc/benchmarks/ClangDocBenchmark.cpp
+18-0 clang-tools-extra/clang-doc/benchmarks/CMakeLists.txt
+4-0 clang-tools-extra/clang-doc/CMakeLists.txt
+256-0 (3 files)

LLVM/project 779d76c llvm/lib/Target/AArch64 AArch64LoadStoreOptimizer.cpp AArch64PassRegistry.def, llvm/test/CodeGen/AArch64 stp-opt-with-renaming-ld3.mir

[AArch64] Add basic NPM support for LoadStoreOptimizer. (#184090)

This adds what, as far as I can tell, are the basics needed for NPM (new
pass manager) support, and ports the AArch64LoadStoreOpt pass to the NPM.
DeltaFile
+41-18 llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
+30-0 llvm/lib/Target/AArch64/AArch64PassRegistry.def
+10-2 llvm/lib/Target/AArch64/AArch64.h
+5-3 llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+1-0 llvm/test/CodeGen/AArch64/stp-opt-with-renaming-ld3.mir
+87-23 (5 files)

LLVM/project b44dba9 mlir CMakeLists.txt

[mlir] Install '.pdll' files along with the header files (#183855)

The CMake install configuration was not installing
`include/mlir/Transforms/DialectConversion.pdll`, which is required
by the installed PDLL compiler tools for interacting with the dialect
conversion infrastructure.
DeltaFile
+1-0 mlir/CMakeLists.txt
+1-0 (1 file)

LLVM/project 3e9de78 offload/test/api omp_virtual_func_multiple_inheritance_02.cpp omp_virtual_func_multiple_inheritance_01.cpp

Revert "[OpenMP][clang] Indirect and Virtual function call mapping from host …"

This reverts commit b23438661c1056bae385daba1501afb762d1e336.
DeltaFile
+0-403 offload/test/api/omp_virtual_func_multiple_inheritance_02.cpp
+0-400 offload/test/api/omp_virtual_func_multiple_inheritance_01.cpp
+0-322 offload/test/api/omp_indirect_func_struct.c
+0-153 offload/test/api/omp_virtual_func.cpp
+0-124 offload/test/api/omp_indirect_func_array.c
+0-95 offload/test/api/omp_indirect_func_basic.c
+0-1,497 (14 files not shown)
+1-1,808 (20 files)

LLVM/project bb2b957 llvm/lib/Target/ARM ARMFrameLowering.cpp ARMInstrThumb2.td, llvm/test/CodeGen/Thumb2 pacbti-m-bxaut.ll

[Thumb2] Use BXAUT instruction if available (#183056)

Generated a

  bxaut r12, lr, sp

instruction rather than

  aut r12, lr, sp
  bx lr

The bxaut instruction is available for Thumb2 code on the
armv8.1m-main architecture when PACBTI is enabled.

This change introduces a new pseudo instruction ARM::t2BXAUT_RET which
is similar to the existing pseudo instruction ARM::tBX_RET.

---------

Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
DeltaFile
+158-0 llvm/test/CodeGen/Thumb2/pacbti-m-bxaut.ll
+12-2 llvm/lib/Target/ARM/ARMFrameLowering.cpp
+4-0 llvm/lib/Target/ARM/ARMInstrThumb2.td
+1-1 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+175-3 (4 files)

LLVM/project 829da49 clang/include/clang/CIR/Dialect/Builder CIRBaseBuilder.h, clang/lib/CIR/CodeGen CIRGenBuiltinAArch64.cpp

[CIR][AArch64] Add lowering for vaba_* and vabd_* builtins (#183595)

Add CIR lowering for the following AdvSIMD (NEON) intrinsic groups:

* vabd_*  – Absolute difference

https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#absolute-difference

* vaba_*  – Absolute difference and accumulate

https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#absolute-difference-and-accumulate
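As a reference for what these intrinsics compute (not for how the CIR lowering emits them), here is a lane-wise scalar model of the signed 8-bit variants. It assumes the usual semantics:

- `vabd` yields |a - b| per lane, truncated to the element width.
- `vaba` adds that difference to an accumulator, with wraparound.

The `_ref` functions and the `int8x8` alias are hypothetical stand-ins for the real intrinsics and the `int8x8_t` type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the NEON int8x8_t vector type.
using int8x8 = std::array<std::int8_t, 8>;

// Lane-wise |a - b|, computed in int and truncated to 8 bits, so a
// difference of 255 reads back as -1 in a signed lane.
int8x8 vabd_s8_ref(const int8x8 &a, const int8x8 &b) {
  int8x8 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    int diff = std::abs(int(a[i]) - int(b[i]));
    r[i] = static_cast<std::int8_t>(static_cast<std::uint8_t>(diff));
  }
  return r;
}

// Lane-wise acc + |b - c|, wrapping modulo 2^8.
int8x8 vaba_s8_ref(const int8x8 &acc, const int8x8 &b, const int8x8 &c) {
  int8x8 d = vabd_s8_ref(b, c);
  int8x8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::int8_t>(static_cast<std::uint8_t>(acc[i] + d[i]));
  return r;
}
```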

Tests for these intrinsics were split out from:
  * "test/CodeGen/AArch64/neon-intrinsics.c"

and moved to:
  * "test/CodeGen/AArch64/neon/intrinsics.c"

The following helper hooks were adapted from the ClangIR project:

    [2 lines not shown]
DeltaFile
+460-0 clang/test/CodeGen/AArch64/neon/intrinsics.c
+0-364 clang/test/CodeGen/AArch64/neon-intrinsics.c
+135-0 clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp
+3-0 clang/include/clang/CIR/Dialect/Builder/CIRBaseBuilder.h
+598-364 (4 files)

LLVM/project a232b5b mlir/lib/Conversion/ShardToMPI ShardToMPI.cpp, mlir/lib/Dialect/Shard/Transforms Simplify.cpp Simplifications.cpp

[mlir][shard, mpi] Adding Shard/MPI reduce_scatter and simplification (#184189)

- introduces a simplify pass, which finds allgather + allslice patterns and
  replaces them with the equivalent `reduce_scatter`
- promotes the test pass `test-shard-optimizations` to a proper pass and adds
  - folding of allgather + allslice into `reduce_scatter`
- sanitizes the `shard.reduce_scatter` op
- adds a new `mpi.reduce_scatter_block` op
- lowers `shard.reduce_scatter` to MPI
- lowers `mpi.reduce_scatter_block` to LLVM
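For reference, a reduce-scatter reduces the ranks' buffers element-wise and leaves each rank with only its own block of the result. That is equivalent to an all-reduce followed by each rank keeping its slice.

The single-process simulation below illustrates this equivalence with a sum reduction. It is a hypothetical sketch, not the Shard or MPI lowering. It assumes every buffer has the same length, and that this length is divisible by the number of ranks.

```cpp
#include <cstddef>
#include <vector>

// inputs[r] is rank r's buffer in this single-process simulation.
using Buffers = std::vector<std::vector<int>>;

// Element-wise sum across all ranks: what an all-reduce leaves on every rank.
std::vector<int> allReduceSum(const Buffers &inputs) {
  std::vector<int> sum(inputs.at(0).size(), 0);
  for (const auto &buf : inputs)
    for (std::size_t i = 0; i < buf.size(); ++i)
      sum[i] += buf[i];
  return sum;
}

// Reduce-scatter computed directly: destination rank r sums only block r of
// every source rank's buffer.
Buffers reduceScatterSum(const Buffers &inputs) {
  const std::size_t ranks = inputs.size();
  const std::size_t block = inputs.at(0).size() / ranks;
  Buffers out(ranks, std::vector<int>(block, 0));
  for (std::size_t r = 0; r < ranks; ++r)
    for (const auto &buf : inputs)
      for (std::size_t j = 0; j < block; ++j)
        out[r][j] += buf[r * block + j];
  return out;
}
```

With two ranks holding `{1, 2, 3, 4}` and `{10, 20, 30, 40}`, rank 0 ends up with `{11, 22}` and rank 1 with `{33, 44}`. These are the two halves of the all-reduce result.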

---------

Co-authored-by: Copilot <175728472+Copilot at users.noreply.github.com>
DeltaFile
+280-227 mlir/test/Conversion/MPIToLLVM/mpitollvm.mlir
+262-0 mlir/test/Dialect/Shard/simplify.mlir
+185-0 mlir/lib/Dialect/Shard/Transforms/Simplify.cpp
+0-179 mlir/test/Dialect/Shard/simplifications.mlir
+152-2 mlir/lib/Conversion/ShardToMPI/ShardToMPI.cpp
+0-120 mlir/lib/Dialect/Shard/Transforms/Simplifications.cpp
+879-528 (21 files not shown)
+1,225-742 (27 files)

LLVM/project 5f8f1e2 clang/lib/CIR/Dialect/Transforms FlattenCFG.cpp, clang/test/CIR/Transforms flatten-try-op.cir

[CIR] Fix unreachable block generation in EH flattening (#184268)

The previous EH CFG flattening implementation would sometimes create
dispatch handlers in unreachable blocks. This seemed harmless until I started
implementing the code to lower the flattened CIR to an ABI-specific form,
and the handlers in those unreachable blocks weren't getting updated.

This change fixes the flattening code to avoid generating unreachable
blocks.
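A block is unreachable when no path from the entry block leads to it. As a generic illustration (not the CIR FlattenCFG code), reachability can be computed with a worklist walk over successor lists. The adjacency-list CFG and the `reachableFromEntry` name are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// succs[b] lists the successors of block b; block 0 is the entry block.
using Cfg = std::vector<std::vector<std::size_t>>;

// Marks every block reachable from the entry; unmarked blocks are unreachable.
std::vector<bool> reachableFromEntry(const Cfg &succs) {
  std::vector<bool> seen(succs.size(), false);
  if (succs.empty())
    return seen;
  std::vector<std::size_t> worklist{0};
  seen[0] = true;
  while (!worklist.empty()) {
    std::size_t b = worklist.back();
    worklist.pop_back();
    for (std::size_t s : succs[b])
      if (!seen[s]) {
        seen[s] = true;
        worklist.push_back(s);
      }
  }
  return seen;
}
```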
DeltaFile
+6-12 clang/test/CIR/Transforms/flatten-try-op.cir
+9-0 clang/lib/CIR/Dialect/Transforms/FlattenCFG.cpp
+15-12 (2 files)

LLVM/project f82f8cf lld/ELF SyntheticSections.cpp Config.h, lld/ELF/Arch X86.cpp X86_64.cpp

[ELF] Add TargetInfo::initTargetSpecificSections hook (#184292)

so that we can move target-specific synthetic section creation from
createSyntheticSections into per-target initTargetSpecificSections
overrides. This reduces target-specific code in the shared
SyntheticSections.cpp. The subsequent commits (split from
https://github.com/llvm/llvm-project/pull/184057) will move these
target-specific classes to Arch/ files.
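The hook follows the usual virtual-method pattern: shared code calls a virtual function whose default does nothing, and each target overrides it to add its own sections. The sketch below is a heavily simplified, hypothetical model. `Ctx`, `ExampleX86Target`, and the section names are made up, and lld's real `TargetInfo` and `createSyntheticSections` carry much more state.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins; not lld's real classes.
struct Ctx {
  std::vector<std::string> syntheticSections;
};

struct TargetInfo {
  virtual ~TargetInfo() = default;
  // Default: a target adds no extra synthetic sections.
  virtual void initTargetSpecificSections(Ctx &) {}
};

struct ExampleX86Target : TargetInfo {
  void initTargetSpecificSections(Ctx &ctx) override {
    ctx.syntheticSections.push_back("example-x86-section");
  }
};

// Shared code creates the common sections, then defers to the target hook
// instead of checking which target it is building for.
void createSyntheticSections(Ctx &ctx, TargetInfo &target) {
  ctx.syntheticSections.push_back("common-section");
  target.initTargetSpecificSections(ctx);
}
```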
DeltaFile
+2-22 lld/ELF/SyntheticSections.cpp
+8-0 lld/ELF/Arch/X86.cpp
+8-0 lld/ELF/Arch/X86_64.cpp
+7-0 lld/ELF/Arch/PPC64.cpp
+2-4 lld/ELF/Config.h
+6-0 lld/ELF/Arch/ARM.cpp
+33-26 (2 files not shown)
+41-26 (8 files)

LLVM/project 3f1d968 mlir/include/mlir/IR Region.h Operation.h, mlir/lib/Dialect/OpenACC/IR OpenACC.cpp

[mlir][IR] Add variadic `getParentOfType` overloads (#184071)

Add `getParentOfType` overloads that work with multiple types.
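One way to write such an overload is a variadic template that walks up the parent chain and stops at the first ancestor matching any of the given types, using a C++17 fold expression. The sketch below uses a hypothetical mini operation hierarchy and `dynamic_cast` instead of MLIR's `Operation` and casting utilities. Its names and signature are therefore assumptions, not the actual MLIR API.

```cpp
// Hypothetical mini "operation" hierarchy, not MLIR's API.
struct Op {
  Op *parent = nullptr;
  virtual ~Op() = default;
};
struct FuncOp : Op {};
struct LoopOp : Op {};
struct IfOp : Op {};

// Walks up the parent chain and returns the first ancestor whose dynamic type
// is any of Ts..., or nullptr if none matches.
template <typename... Ts>
Op *getParentOfAnyType(Op *op) {
  for (Op *p = op->parent; p; p = p->parent)
    if ((... || (dynamic_cast<Ts *>(p) != nullptr)))
      return p;
  return nullptr;
}
```

For an `IfOp` nested in a `LoopOp` inside a `FuncOp`, the two queries give different results:

- `getParentOfAnyType<FuncOp, LoopOp>` returns the loop, because it is the nearest match.
- `getParentOfAnyType<FuncOp>` skips past the loop to the function.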
DeltaFile
+7-23 mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
+11-0 mlir/include/mlir/IR/Region.h
+1-8 mlir/lib/Dialect/OpenACC/Transforms/LegalizeDataValues.cpp
+8-0 mlir/include/mlir/IR/Operation.h
+1-7 mlir/lib/Dialect/OpenACC/Utils/OpenACCUtils.cpp
+2-4 mlir/lib/Dialect/SparseTensor/Transforms/Utils/CodegenUtils.cpp
+30-42 (6 files)

LLVM/project 70f88eb llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll

Address comments

Created using spr 1.3.7
DeltaFile
+84,419-78,498 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+25,751-24,782 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663-20,281 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867-18,577 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+19,112-16,445 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+11,541-22,066 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+186,353-180,649 (667 files not shown)
+293,680-264,860 (673 files)

LLVM/project e68f696 .github/workflows spirv-tests.yml

[CI][SPIRV][NFC] Remove unnecessary mkdir from workflow (#184353)

The `CMake` command does the `mkdir` automatically.

Pointed out in https://github.com/llvm/llvm-project/pull/184174

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
DeltaFile
+0-1 .github/workflows/spirv-tests.yml
+0-1 (1 file)

LLVM/project 6cc42b3 libc/src/__support/GPU allocator.cpp

[libc] Various GPU allocator tweaks and optimizations (#184368)

Summary:
Some low-hanging-fruit tweaks, mostly preventing redundant loads and
unnecessary widening. Also includes some fixes, such as nullptr handling,
incorrect rounding, and oversized bitfields.
DeltaFile
+29-38 libc/src/__support/GPU/allocator.cpp
+29-38 (1 file)

LLVM/project d61b45c clang/lib/CodeGen CGAtomic.cpp, clang/test/CodeGen atomic-arm64.c atomic-ops.c

[Clang] Generate ptr and float atomics without integer casts (#183853)

Summary:
LLVM IR should support pointer and floating-point atomics for all operations
except compare-exchange. Previously, the code went through an integer
indirection for these cases. This PR changes the behavior to use atomics directly on the target
memory type.
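At the source level, these are atomic operations whose value type is a floating-point or pointer type, as in the C++ snippet below. The code itself is portable. The commit only affects what IR Clang emits for such operations: as described above, it uses atomics directly on the `float` or `ptr` type instead of going through an integer cast. Compare-exchange on `float` is deliberately left out, since the commit notes that this case still goes through integers. The function names are hypothetical.

```cpp
#include <atomic>

// Atomic load, store, and exchange on a floating-point value.
float atomicFloatDemo() {
  std::atomic<float> f{1.5f};
  f.store(2.5f);                // atomic store of a float
  float old = f.exchange(4.0f); // atomic exchange; returns the previous 2.5f
  return old + f.load();        // 2.5f + 4.0f
}

// Atomic exchange and load on a pointer value.
int *atomicPtrDemo(int *base) {
  std::atomic<int *> p{base};
  p.exchange(base + 1);         // atomic exchange of a pointer
  return p.load();
}
```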
DeltaFile
+13-13 clang/lib/CodeGen/CGAtomic.cpp
+4-4 clang/test/CodeGen/atomic-arm64.c
+3-3 clang/test/CodeGen/atomic-ops.c
+3-3 clang/test/CodeGen/big-atomic-ops.c
+2-2 clang/test/CodeGenOpenCL/atomic-ops.cl
+25-25 (5 files)

LLVM/project aef9627 llvm/lib/Target/SPIRV SPIRVCommandLine.cpp SPIRVCommandLine.h, llvm/lib/Target/SPIRV/MCTargetDesc SPIRVBaseInfo.h

Reapply "[SPIRV][NFCI] Use unordered data structures for SPIR-V extensions (#184162)

Reapply https://github.com/llvm/llvm-project/pull/183567 with minor
changes.

The problem that caused the revert was that we couldn't use the enum in
`DenseMap` directly because of a `TableGen` limitation, so I made the map
use the underlying type instead, but that caused some UB. I have since
[fixed](https://github.com/llvm/llvm-project/pull/183769) the `TableGen`
limitation, so now it just works.
DeltaFile
+160-173 llvm/lib/Target/SPIRV/SPIRVCommandLine.cpp
+7-12 llvm/lib/Target/SPIRV/SPIRVCommandLine.h
+4-6 llvm/lib/Target/SPIRV/SPIRVSubtarget.cpp
+3-5 llvm/lib/Target/SPIRV/SPIRVSubtarget.h
+3-0 llvm/lib/Target/SPIRV/MCTargetDesc/SPIRVBaseInfo.h
+1-1 llvm/lib/Target/SPIRV/SPIRVAPI.cpp
+178-197 (1 file not shown)
+179-197 (7 files)

LLVM/project 02b2a1e llvm/lib/Target/M68k/GISel M68kCallLowering.cpp

Fix `assignValueToReg` function's argument (#184354)

[PR#178198](https://github.com/llvm/llvm-project/pull/178198) changed
the arguments of `assignValueToReg`.

This PR fixes the experimental M68k target accordingly.
DeltaFile
+4-2 llvm/lib/Target/M68k/GISel/M68kCallLowering.cpp
+4-2 (1 file)

LLVM/project dd0a780 llvm CMakeLists.txt, openmp/runtime CMakeLists.txt

CMake fixes
DeltaFile
+8-8 openmp/runtime/cmake/arm64x.cmake
+2-2 llvm/CMakeLists.txt
+1-2 openmp/runtime/CMakeLists.txt
+11-12 (3 files)

LLVM/project 205a89a llvm/include/llvm/CodeGen Rematerializer.h

Remove useless argument
DeltaFile
+2-2 llvm/include/llvm/CodeGen/Rematerializer.h
+2-2 (1 file)

LLVM/project 938e87f clang-tools-extra/unittests/clang-tidy LexerUtilsTest.cpp, clang/test/SemaHLSL static_resources.hlsl

Address comments

Created using spr 1.3.7
DeltaFile
+216-0 lldb/test/API/functionalities/gdb_remote_client/TestBatchedBreakpointStepOver.py
+204-0 clang-tools-extra/unittests/clang-tidy/LexerUtilsTest.cpp
+170-1 lldb/source/Target/ThreadList.cpp
+138-0 clang/test/SemaHLSL/Resources/static_resources.hlsl
+0-138 clang/test/SemaHLSL/static_resources.hlsl
+135-0 clang/test/SemaHLSL/Resources/resource_binding_attr_error_udt.hlsl
+863-139 (230 files not shown)
+4,932-2,698 (236 files)

LLVM/project 358f477 clang/lib/CodeGen CGStmtOpenMP.cpp, clang/test/OpenMP parallel_for_codegen.cpp for_range_loop_codegen.cpp

[Clang] Fix clang crash for fopenmp statement(for) inside lambda function (#146772)

C++ range-for statements introduce implicit variables such as `__range`,
`__begin`, and `__end`. When such a loop appears inside an OpenMP
loop-based directive (e.g. `#pragma omp for`) within a lambda, these
implicit variables were not emitted before OpenMP privatization logic
ran.
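For context, the C++ standard specifies a range-based `for` loop as roughly equivalent to a block that does two things:

- declares the implicit `__range`, `__begin`, and `__end` variables;
- then runs an ordinary loop over them.

The sketch below writes that expansion out by hand for a `std::vector<int>`. The variables are renamed because identifiers starting with `__` are reserved. It shows the language rule, not Clang's internal representation.

```cpp
#include <vector>

// Hand-written expansion of `for (int x : v) sum += x;`.
int sumDesugared(std::vector<int> &v) {
  int sum = 0;
  {
    auto &&range_ = v;            // plays the role of __range
    auto begin_ = range_.begin(); // plays the role of __begin
    auto end_ = range_.end();     // plays the role of __end
    for (; begin_ != end_; ++begin_) {
      int x = *begin_;
      sum += x;
    }
  }
  return sum;
}
```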

OMPLoopScope assumes that loop-related variables are already present in
LocalDeclMap and temporarily overrides their addresses. Since the
range-for implicit variables had not yet been emitted, they were treated
as newly introduced entries and later erased during restore(), leading
to missing mappings and a crash during codegen.

Fix this by emitting the range-for implicit variables before OpenMP
privatization (setVarAddr/apply), ensuring that existing mappings are
correctly overridden and restored.

This fixes #146335
DeltaFile
+1,128-1,116 clang/test/OpenMP/parallel_for_codegen.cpp
+252-0 clang/test/OpenMP/for_range_loop_codegen.cpp
+14-13 clang/lib/CodeGen/CGStmtOpenMP.cpp
+1,394-1,129 (3 files)

LLVM/project e10655e llvm/test/CodeGen/X86 known-never-zero.ll

[X86] known-never-zero.ll - add sdiv/udiv vector test coverage for #183047 (#184350)

DeltaFile
+146-0 llvm/test/CodeGen/X86/known-never-zero.ll
+146-0 (1 file)

LLVM/project 43503c4 llvm/lib/Target/AArch64 AArch64ConditionOptimizer.cpp

[NFC][AArch64] isPureCmp is a duplicate of canAdjustCmp, so remove the duplicate (#183568)

Just delete the duplicate function.
DeltaFile
+1-19 llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
+1-19 (1 file)

LLVM/project 81396eb llvm/lib/Target/AMDGPU SIShrinkInstructions.cpp, llvm/test/CodeGen/AMDGPU v_swap_b16.ll v_swap_b32.mir

[AMDGPU] Generate more swaps (#184164)

Generate more swaps from:

```
   mov T, X
   ...
   mov X, Y
   ...
   mov Y, T
```
by being more careful about what use/defs of X, Y, T are allowed in
intervening code and allowing flexibility where the swap is inserted.
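As a rough illustration of the pattern, and of why intervening code matters, the sketch below finds `mov T, X; ...; mov X, Y; ...; mov Y, T` in a straight-line list of hypothetical instructions. It uses the most conservative rule: any intervening instruction that reads or writes X, Y, or T blocks the match. The commit relaxes exactly this kind of restriction, so treat the sketch as a baseline rather than the pass's actual logic. `Inst`, `touches`, `nextTouch`, and `findSwap` are made-up names.

```cpp
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <vector>

// Hypothetical straight-line instruction: a copy `mov dst, src`, or any other
// instruction described only by the registers it reads or writes.
struct Inst {
  bool isMov = false;
  int dst = -1, src = -1; // meaningful only for movs
  std::vector<int> regs;  // registers used or defined by non-mov instructions
};

bool touches(const Inst &in, int r) {
  if (in.isMov)
    return in.dst == r || in.src == r;
  for (int x : in.regs)
    if (x == r)
      return true;
  return false;
}

// Index of the first instruction after `from` touching any of `rs`, or
// code.size() if there is none.
std::size_t nextTouch(const std::vector<Inst> &code, std::size_t from,
                      std::initializer_list<int> rs) {
  for (std::size_t k = from + 1; k < code.size(); ++k)
    for (int r : rs)
      if (touches(code[k], r))
        return k;
  return code.size();
}

// Returns the index of `mov X, Y`, which `swap X, Y` could replace (making the
// trailing `mov Y, T` redundant), or std::nullopt if there is no match.
std::optional<std::size_t> findSwap(const std::vector<Inst> &code) {
  for (std::size_t i = 0; i < code.size(); ++i) {
    if (!code[i].isMov || code[i].dst == code[i].src)
      continue;
    const int t = code[i].dst, x = code[i].src;
    // The first later instruction touching X or T must be `mov X, Y`.
    std::size_t j = nextTouch(code, i, {x, t});
    if (j == code.size() || !code[j].isMov || code[j].dst != x)
      continue;
    const int y = code[j].src;
    if (y == x || y == t || nextTouch(code, i, {y}) < j)
      continue; // Y must not be touched between the first two movs.
    // The first instruction after `mov X, Y` touching X, Y or T must be `mov Y, T`.
    std::size_t k = nextTouch(code, j, {x, y, t});
    if (k < code.size() && code[k].isMov && code[k].dst == y && code[k].src == t)
      return j;
  }
  return std::nullopt;
}
```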

---------

Signed-off-by: John Lu <John.Lu at amd.com>
DeltaFile
+154-0 llvm/test/CodeGen/AMDGPU/v_swap_b16.ll
+62-52 llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+66-32 llvm/test/CodeGen/AMDGPU/v_swap_b32.mir
+15-29 llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
+2-2 llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll
+299-115 (5 files)

LLVM/project e570faa clang/lib/Driver/ToolChains HIPAMD.cpp, clang/test/Driver hip-toolchain-no-rdc.hip spirv-amd-toolchain.c

[SPIR-V][HIP] Disable SPV_KHR_untyped_pointers (#183530)

Support for SPV_KHR_untyped_pointers in the SPIR-V to LLVM translator is
incomplete and has a few known issues, so we should not rely on this
extension for SPIR-V generation.
DeltaFile
+1-1 clang/lib/Driver/ToolChains/HIPAMD.cpp
+1-1 clang/test/Driver/hip-toolchain-no-rdc.hip
+1-1 clang/test/Driver/spirv-amd-toolchain.c
+3-3 (3 files)

LLVM/project acb8a6d llvm/lib/Target/AArch64 AArch64InstrInfo.td, llvm/test/CodeGen/AArch64 neon-extractbitcast-mir.ll

[AArch64] Fix type mismatch in bitconvert + vec_extract patterns (#183549)

This patch fixes a mismatch in element width during isel of bitconvert +
vec_extract nodes. This resolves the issue reported on
[this](https://github.com/llvm/llvm-project/pull/172837) PR.
DeltaFile
+18-0 llvm/test/CodeGen/AArch64/neon-extractbitcast-mir.ll
+2-2 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+20-2 (2 files)

LLVM/project c9d065a llvm/test/CodeGen/X86 shift-i256.ll funnel-shift-i256.ll

[X86] Add i256 shift / funnel shift coverage to match i512 tests (#184346)

shift-i256.ll - added x86-64/x86-64-v2/x86-64-v3/x86-64-v4 coverage and retained the x86 test coverage
DeltaFile
+3,169-313 llvm/test/CodeGen/X86/shift-i256.ll
+2,056-0 llvm/test/CodeGen/X86/funnel-shift-i256.ll
+5,225-313 (2 files)