LLVM/project d4b2258flang/lib/Lower/OpenMP ClauseProcessor.cpp OpenMP.cpp, flang/test/Lower/OpenMP declare-simd.f90

Fix declare simd linear stride rescaling and arg_types verifier

1. Rescale constant linear steps from source-level element counts to byte
   strides in Flang's processLinear(). For reference-like parameters
   (pointers or non-VALUE dummy arguments) with Linear or LinearRef ABI
   kind, the step must be multiplied by the element size in bytes. This
   matches Clang's rescaling in CGOpenMPRuntime.cpp. Val and UVal kinds
   are not rescaled as they describe value changes, not pointer strides.
   Var-strides are also not rescaled as the value is an argument index.

2. Add a verifier check in DeclareSimdOp to ensure 'arg_types' length
   matches the number of function arguments, preventing out-of-bounds
   access during MLIR-to-LLVM IR translation.

Also restructure processLinear() to compute stepOperand per-variable
instead of appending the same operand for all objects in the clause,
enabling per-variable rescaling.
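
The rescaling rule in item 1 can be sketched as a small standalone function. This is an illustration only, not the code in processLinear(): the enum, the function name and the boolean parameters are hypothetical stand-ins for the conditions the message describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ABI kinds named in the commit message.
enum class LinearKind { Linear, LinearRef, Val, UVal };

// Returns the step to record for one variable of a linear clause.
// isReferenceLike: pointer or non-VALUE dummy argument (assumed flag).
// isVarStride: the step names another argument instead of a constant.
inline std::int64_t rescaleLinearStep(std::int64_t step, LinearKind kind,
                                      bool isReferenceLike, bool isVarStride,
                                      std::int64_t elemSizeInBytes) {
  assert(elemSizeInBytes > 0 && "element size must be positive");
  // A var-stride holds an argument index, so it is left untouched.
  if (isVarStride)
    return step;
  // Val and UVal describe value changes, not pointer strides.
  if (kind == LinearKind::Val || kind == LinearKind::UVal)
    return step;
  // Only reference-like parameters advance by a byte stride.
  if (!isReferenceLike)
    return step;
  return step * elemSizeInBytes;
}
```

Under these assumptions, a step of 2 on a reference-like parameter with 4-byte elements becomes a byte stride of 8, while the same step with the Val kind stays 2.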

Assisted with Copilot.
DeltaFile
+49-6flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+7-7flang/test/Lower/OpenMP/declare-simd.f90
+7-6mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+8-0mlir/test/Dialect/OpenMP/invalid.mlir
+7-0mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+2-2flang/lib/Lower/OpenMP/OpenMP.cpp
+80-212 files not shown
+83-248 files

LLVM/project 66f06f5mlir/lib/Dialect/OpenACC/Transforms ACCComputeLowering.cpp, mlir/test/Dialect/OpenACC acc-compute-lowering-compute.mlir

[mlir][acc] Sink constants into acc.compute_region when creating (#187777)

When converting OpenACC compute constructs to acc.compute_region, also
sink constants inside so they do not become live-ins.
DeltaFile
+32-0mlir/lib/Dialect/OpenACC/Transforms/ACCComputeLowering.cpp
+24-0mlir/test/Dialect/OpenACC/acc-compute-lowering-compute.mlir
+56-02 files

LLVM/project ecee1cbflang/include/flang/Semantics openmp-utils.h, flang/lib/Semantics openmp-utils.cpp check-omp-loop.cpp

[flang][OpenMP] Provide reasons for calculated depths

If the depth (either semantic or perfect) was limited by some factor,
include the reason for what caused the reduction.

Issue: https://github.com/llvm/llvm-project/issues/185287
DeltaFile
+68-31flang/lib/Semantics/openmp-utils.cpp
+63-0flang/test/Semantics/OpenMP/tile09.f90
+15-9flang/test/Semantics/OpenMP/do08.f90
+7-5flang/lib/Semantics/check-omp-loop.cpp
+7-2flang/test/Semantics/OpenMP/do13.f90
+2-2flang/include/flang/Semantics/openmp-utils.h
+162-496 files not shown
+174-4912 files

LLVM/project bd3b06bllvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.class.ll llvm.amdgcn.class.f16.ll

[AMDGPU][GlobalISel] Add RegBankLegalize rules for amdgcn.class (#178827)
DeltaFile
+71-32llvm/test/CodeGen/AMDGPU/llvm.amdgcn.class.ll
+33-15llvm/test/CodeGen/AMDGPU/llvm.amdgcn.class.f16.ll
+8-0llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+1-2llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.class.mir
+113-494 files

LLVM/project bb369f1libc/src/string/memory_utils op_x86.h, libc/src/string/memory_utils/x86_64 inline_memcpy.h

[libc][x86] Add Non-temporal code path for large memcpy (#187108)

Large memcpy calls are fairly rare, but are more common in ML workloads
(copying large matrices/tensors, often to/from the CPU host).

For large copies, NTA stores can provide performance advantages for both
memcpy itself and the rest of the workload (by reducing cache
pollution). Other runtimes already have an NTA path for large copies, so
add one to llvm-libc.

Internal whole-program load tests show a small but statistically
significant improvement of 0.1%. ML-specific benchmarks showed a 10-20%
performance gain, and fleetbench (https://github.com/google/fleetbench,
which has a more up-to-date version of the libc benchmarks) shows a ~3% gain
(ns/byte for distributions taken from various applications).

```
[Memcpy_0]_L1      0.01950n ± 3%   0.01900n ± 5%       ~ (p=0.390 n=20)
[Memcpy_0]_L2      0.02300n ± 0%   0.02300n ± 0%       ~ (p=0.256 n=20)
```

    [35 lines not shown]
DeltaFile
+34-8libc/src/string/memory_utils/x86_64/inline_memcpy.h
+10-0libc/src/string/memory_utils/op_x86.h
+44-82 files

LLVM/project 827ddb2llvm/test/CodeGen/AMDGPU waitcnt-wcg-attributes.mir

[AMDGPU][SIInsertWaitcnts] Add test functions in waitcnt-wcg-attributes.mir (#186504)

This patch adds two more functions for exercising the target-cpu
attribute.
DeltaFile
+44-3llvm/test/CodeGen/AMDGPU/waitcnt-wcg-attributes.mir
+44-31 files

LLVM/project dd30239llvm/lib/Target/AMDGPU SIInstrInfo.cpp, llvm/test/CodeGen/AMDGPU convergent.mir

[AMDGPU] Add basic verification for source modifiers (#186733)

Source modifiers (input modifiers) should always be immediates.
This commit makes the machine verifier reject non-immediate source modifiers.

Closes #182243
DeltaFile
+30-0llvm/test/MachineVerifier/AMDGPU/invalid-vop3-source-modifiers.mir
+10-10llvm/test/CodeGen/AMDGPU/convergent.mir
+1-1llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+41-113 files

LLVM/project 498dd13llvm/lib/Target/AMDGPU DSInstructions.td, llvm/test/MC/AMDGPU gfx13_asm_vds.s gfx13_asm_vds_alias.s

Add VDS encoding for gfx13 (#187693)

Co-authored-by: Jay Foad <jay.foad at amd.com>
DeltaFile
+1,987-0llvm/test/MC/AMDGPU/gfx13_asm_vds.s
+179-149llvm/lib/Target/AMDGPU/DSInstructions.td
+147-0llvm/test/MC/AMDGPU/gfx13_asm_vds_alias.s
+2,313-1493 files

LLVM/project 950eaaaclang/lib/Sema SemaLookup.cpp

[Clang] Use stable_sort for UnqualUsingDirectiveSet for determinism in ambiguity notes (#187750)

In SemaLookup.cpp, `UnqualUsingDirectiveSet::done()` uses `llvm::sort`
with a comparator that only checks the ancestor relationships. So, if
there are multiple "neighbor" namespaces, they are considered equal, and
thus `llvm::sort` may return the using directives in a non-deterministic
order.

This was observed as a test failure on clang/test/CXX/drs/cwg0xx.cpp at
line 220 after PR #187219 started verifying the diagnostics ordering.
The two "candidate found by name lookup" notes were emitted in the
opposite order from the test's expectations -- in some builds of Clang,
but not others.

Switching to `llvm::stable_sort` ensures that using-directives are
always traversed in a deterministic order, and thus the notes are emitted
deterministically.
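
A minimal sketch of why a stable sort helps when the comparator treats some elements as equivalent. It uses `std::stable_sort` rather than `llvm::stable_sort`, and the `Directive` struct and the depth-based comparator are hypothetical stand-ins for the ancestor check in SemaLookup.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a using-directive: a name plus a nesting depth.
struct Directive {
  std::string name;
  int depth;
};

// Sorts by depth only, so directives at the same depth compare equivalent.
// std::stable_sort keeps equivalent elements in their input order, which
// makes the result depend only on the input and not on the sort algorithm.
inline std::vector<std::string> sortedNames(std::vector<Directive> ds) {
  std::stable_sort(ds.begin(), ds.end(),
                   [](const Directive &a, const Directive &b) {
                     return a.depth < b.depth;
                   });
  std::vector<std::string> names;
  for (const Directive &d : ds)
    names.push_back(d.name);
  return names;
}
```

With an unstable sort, the two depth-1 entries in an input such as B(1), A(1), C(0) could come out in either order; with a stable sort they always stay in the order B, A.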
DeltaFile
+1-1clang/lib/Sema/SemaLookup.cpp
+1-11 files

LLVM/project cfc94a6flang/include/flang/Semantics openmp-utils.h, flang/lib/Semantics openmp-utils.cpp check-omp-loop.cpp

[flang][OpenMP] Introduce `WithReason<T>` for nest/sequence properties (#187563)

This helper class contains an optional value and a "reason" message. It
replaces the uses of std::pair<optional<...>, Reason>.
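
A rough sketch of what such a helper might look like. This is not the definition in openmp-utils.h: the member names, the use of `std::string` for the reason, and the `limitDepth` function are all assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical shape: an optional value plus a message explaining why the
// value is missing or was reduced. The real class may differ in every detail.
template <typename T> struct WithReason {
  std::optional<T> value;
  std::string reason;
  explicit operator bool() const { return value.has_value(); }
};

// Illustrative use: clamp a requested nest depth to what is available and
// record why the depth was reduced.
inline WithReason<int> limitDepth(int requested, int available) {
  assert(requested >= 0 && available >= 0);
  if (available < requested)
    return {available, "only " + std::to_string(available) +
                           " nested loops are available"};
  return {requested, ""};
}
```

Compared with a `std::pair<std::optional<...>, Reason>`, named members make call sites read as `result.value` and `result.reason` instead of `first` and `second`.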

Issue: https://github.com/llvm/llvm-project/issues/185287
DeltaFile
+73-36flang/lib/Semantics/openmp-utils.cpp
+34-8flang/include/flang/Semantics/openmp-utils.h
+18-19flang/lib/Semantics/check-omp-loop.cpp
+125-633 files

LLVM/project 74f88c8clang/lib/CodeGen CodeGenTypes.cpp, clang/test/CodeGen builtins-extended-image.c builtins-image-load.c

[Clang][AMDGPU] Lower __amdgpu_texture_t to <8 x i32> instead of ptr addrspace(0)
DeltaFile
+220-264clang/test/CodeGen/builtins-extended-image.c
+210-252clang/test/CodeGen/builtins-image-load.c
+140-168clang/test/CodeGen/builtins-image-store.c
+5-5clang/test/CodeGen/amdgpu-image-rsrc-type-debug-info.c
+7-2clang/lib/CodeGen/CodeGenTypes.cpp
+582-6915 files

LLVM/project 78b651allvm/lib/Target/RISCV RISCVSchedSiFive7.td, llvm/test/tools/llvm-mca/RISCV/SiFiveX100 floating-point.test zfhmin.test

[RISCV] Fix the pipe used by `fmv.x.<fp>/<fp>.x` in SiFive7 sched model (#187740)

These FP <-> Integer conversion instructions should use PipeA instead.
DeltaFile
+9-9llvm/test/tools/llvm-mca/RISCV/SiFiveX100/floating-point.test
+6-6llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+5-5llvm/test/tools/llvm-mca/RISCV/SiFiveX100/zfhmin.test
+20-203 files

LLVM/project 92187cdclang/test/OpenMP target_teams_distribute_parallel_for_simd_schedule_codegen.cpp teams_distribute_parallel_for_simd_schedule_codegen.cpp, libc/AOR_v20.02/math/test/traces sincosf.txt exp.txt

Rebase

Created using spr 1.3.7
DeltaFile
+0-31,999libc/AOR_v20.02/math/test/traces/sincosf.txt
+0-16,000libc/AOR_v20.02/math/test/traces/exp.txt
+5,294-4,814clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp
+5,238-4,758clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp
+4,350-4,098clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp
+4,004-3,524clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp
+18,886-65,1937,577 files not shown
+472,109-314,9597,583 files

LLVM/project 9b428dbllvm/test/Transforms/LoopVectorize pointer-induction-index-width-smaller-than-iv-width.ll

[NFC][LV] Fix what seems to be a typo in the test

The test was added in https://github.com/llvm/llvm-project/commit/4e9894498e166ef6b207c25e780db0b6f006cc89.

Alternative fixes would be:
* Remove the unused GEP, although it is not clear why we'd want to overwrite
  the stored `i64` with a `ptr` store.
* Keep this patch, but perform both GEPs with `i64` element type to
  reduce the diff. It's not clear if the scalarization caused by that
  type mismatch is intentional/relevant for the original change.
DeltaFile
+14-33llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll
+14-331 files

LLVM/project 63c9573llvm/test/Transforms/LoopStrengthReduce/X86 reuse-existing-phi.ll

[LSR] Add regression test for unnecessary phi introduction (#187751)

Test case for https://github.com/llvm/llvm-project/issues/187728
DeltaFile
+34-0llvm/test/Transforms/LoopStrengthReduce/X86/reuse-existing-phi.ll
+34-01 files

LLVM/project ca05871clang/lib/CodeGen CodeGenTypes.cpp, clang/test/CodeGen builtins-extended-image.c builtins-image-load.c

[Clang][AMDGPU] Lower __amdgpu_texture_t to <8 x i32> instead of ptr addrspace(0)
DeltaFile
+220-264clang/test/CodeGen/builtins-extended-image.c
+210-252clang/test/CodeGen/builtins-image-load.c
+140-168clang/test/CodeGen/builtins-image-store.c
+5-5clang/test/CodeGen/amdgpu-image-rsrc-type-debug-info.c
+6-2clang/lib/CodeGen/CodeGenTypes.cpp
+581-6915 files

LLVM/project a5c6dd7flang/include/flang/Semantics openmp-utils.h

Delete unused header
DeltaFile
+0-1flang/include/flang/Semantics/openmp-utils.h
+0-11 files

LLVM/project 688090bllvm/test/CodeGen/AMDGPU llvm.amdgcn.class.ll llvm.amdgcn.class.f16.ll

Rebased and updated tests
DeltaFile
+91-163llvm/test/CodeGen/AMDGPU/llvm.amdgcn.class.ll
+0-24llvm/test/CodeGen/AMDGPU/llvm.amdgcn.class.f16.ll
+91-1872 files

LLVM/project 6e3ac64flang/lib/Lower/OpenMP ClauseProcessor.cpp, flang/test/Lower/OpenMP declare-simd.f90

Fix declare simd linear stride rescaling and arg_types verifier

1. Rescale constant linear steps from source-level element counts to byte
   strides in Flang's processLinear(). For reference-like parameters
   (pointers or non-VALUE dummy arguments) with Linear or LinearRef ABI
   kind, the step must be multiplied by the element size in bytes. This
   matches Clang's rescaling in CGOpenMPRuntime.cpp. Val and UVal kinds
   are not rescaled as they describe value changes, not pointer strides.
   Var-strides are also not rescaled as the value is an argument index.

2. Add a verifier check in DeclareSimdOp to ensure 'arg_types' length
   matches the number of function arguments, preventing out-of-bounds
   access during MLIR-to-LLVM IR translation.

Also restructure processLinear() to compute stepOperand per-variable
instead of appending the same operand for all objects in the clause,
enabling per-variable rescaling.

Assisted with copilot.
DeltaFile
+49-6flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+6-6flang/test/Lower/OpenMP/declare-simd.f90
+8-0mlir/test/Dialect/OpenMP/invalid.mlir
+7-0mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+2-2mlir/test/Dialect/OpenMP/ops.mlir
+72-145 files

LLVM/project cc4461eflang/include/flang/Semantics openmp-utils.h, flang/lib/Semantics openmp-utils.cpp check-omp-loop.cpp

Remove handling of unsourced messages
DeltaFile
+3-19flang/lib/Semantics/openmp-utils.cpp
+3-12flang/include/flang/Semantics/openmp-utils.h
+3-3flang/lib/Semantics/check-omp-loop.cpp
+9-343 files

LLVM/project 5804f7dllvm/lib/CodeGen CodeGenPrepare.cpp, llvm/test/Transforms/CodeGenPrepare/AArch64 ptrauth.ll

[CGP][PAC] Flip PHI and blends when all immediate modifiers are the same

GVN PRE, SimplifyCFG and possibly other passes may hoist calls to the
`@llvm.ptrauth.blend` intrinsic, introducing multiple duplicate call
instructions hidden behind a PHI node. This prevents the instruction
selector from generating safer code by absorbing the address and
immediate modifiers into separate operands of the AUT, PAC, etc. pseudo
instructions.

This patch makes the CodeGenPrepare pass detect when a discriminator is
computed as a PHI node with all incoming values being blends with the
same immediate modifier. Each such discriminator value is replaced by a
single blend, whose address argument is computed by a PHI node.
DeltaFile
+142-0llvm/test/Transforms/CodeGenPrepare/AArch64/ptrauth.ll
+75-0llvm/lib/CodeGen/CodeGenPrepare.cpp
+217-02 files

LLVM/project 9431920llvm/test/tools/llvm-debuginfod-find headers-winhttp.test

[llvm] Silence llvm-debuginfod-find/headers-winhttp.test on Windows bots temporarily (#187753)

Windows bots are still failing after a3db68a97b2c321e and
d7dbba55bff52f342. This test is new, let's take it off while
we investigate.
DeltaFile
+1-1llvm/test/tools/llvm-debuginfod-find/headers-winhttp.test
+1-11 files

LLVM/project 07896d4clang/test/OpenMP target_firstprivate_codegen.cpp target_teams_distribute_simd_codegen.cpp

[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261)

Summary:
This PR changes the kernels emitted when targeting a CPU so that they
take a pointer to a struct of arguments.

The old handling emitted a standard function prototype. This
necessitated a target-specific ABI to call it because the signature
differed with the number of arguments. Instead, this PR emits a void
pointer to a naturally aligned struct, which is what APIs like `pthreads`
expect.

This allows us to remove all the complexity around launching host
kernels and just pass the argument list.
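
The calling convention described above can be illustrated with a small sketch. The struct layout, the function names and the doubling kernel are hypothetical and are not what Clang emits; the point is only that one `void *` signature covers any number of arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical argument aggregate for one kernel; fields are naturally aligned.
struct KernelArgs {
  const int *in;
  int *out;
  std::int64_t n;
};

// Every kernel has the same signature regardless of its argument count,
// so the launcher needs no per-signature ABI handling.
using KernelFn = void (*)(void *);

// Illustrative kernel: unpacks the aggregate and doubles each element.
inline void doubleKernel(void *raw) {
  auto *args = static_cast<KernelArgs *>(raw);
  for (std::int64_t i = 0; i < args->n; ++i)
    args->out[i] = 2 * args->in[i];
}

// Illustrative launcher: forwards the opaque pointer to the kernel.
inline void launch(KernelFn fn, void *args) {
  assert(fn != nullptr && args != nullptr);
  fn(args);
}
```

A caller fills in a `KernelArgs` value and passes its address to `launch`, much as a start routine receives its single `void *` argument in `pthread_create`.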
DeltaFile
+804-696clang/test/OpenMP/target_firstprivate_codegen.cpp
+774-570clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp
+762-566clang/test/OpenMP/target_parallel_for_simd_codegen.cpp
+587-403clang/test/OpenMP/target_teams_codegen.cpp
+391-289clang/test/OpenMP/target_teams_distribute_codegen.cpp
+367-273clang/test/OpenMP/target_parallel_for_codegen.cpp
+3,685-2,79734 files not shown
+7,515-5,38940 files

LLVM/project 60db764utils/bazel/llvm-project-overlay/clang BUILD.bazel

[Bazel] Port a2c0c43699917bb26a3eb20fefcbf29ff120ce70
DeltaFile
+15-0utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+15-01 files

LLVM/project 0ec6e1dclang/lib/CIR/CodeGen CIRGenModule.cpp CIRGenModule.h, clang/lib/CIR/Dialect/IR CIRDialect.cpp

[CIR] Address Space support for GlobalOps (#179082)

Related: https://github.com/llvm/llvm-project/issues/179278,
https://github.com/llvm/llvm-project/issues/160386

Extends cir.global to accept address space attributes. Globals can now
specify either `target_address_space(N)` or
`lang_address_space(offload_*)`. Address spaces are also preserved
throughout get_global ops.
DeltaFile
+68-21clang/lib/CIR/CodeGen/CIRGenModule.cpp
+46-0clang/test/CIR/Lowering/global-address-space.cir
+30-0clang/test/CIR/IR/address-space.cir
+23-2clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+20-0clang/test/CIR/IR/invalid-addrspace.cir
+16-4clang/lib/CIR/CodeGen/CIRGenModule.h
+203-279 files not shown
+270-5115 files

LLVM/project 4a5da64clang/lib/CIR/CodeGen CIRGenClass.cpp CIRGenExprScalar.cpp

[CIR][NFC] Minor cleanups to missing feature markers (#187754)

This fixes a few places where MissingFeatures asserts were incorrect,
extends the text of two errorNYI diagnostics to disambiguate them, and
fixes a typo in an adjacent comment.
DeltaFile
+3-2clang/lib/CIR/CodeGen/CIRGenClass.cpp
+2-2clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
+1-1clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+1-1clang/lib/CIR/CodeGen/CIRGenDecl.cpp
+7-64 files

LLVM/project cc0cccbclang/lib/Serialization ASTReader.cpp, clang/test/Modules merge-target-features.cpp

improve

Created using spr 1.3.7
DeltaFile
+28-28clang/lib/Serialization/ASTReader.cpp
+3-2clang/test/Modules/merge-target-features.cpp
+31-302 files

LLVM/project bc6a265offload/test/offloading/fortran implicit-record-field-mapping.f90 formatted-io.f90

[offload] Use flang-rt for test feature requirements (#187733)
DeltaFile
+1-5offload/test/offloading/fortran/implicit-record-field-mapping.f90
+1-1offload/test/offloading/fortran/formatted-io.f90
+1-1offload/test/offloading/fortran/io.f90
+3-73 files

LLVM/project c002a8allvm/test/CodeGen/AArch64 ptrauth-isel.ll

[AArch64][PAC] Precommit ptrauth-isel.ll tests on calls and tail calls
DeltaFile
+209-0llvm/test/CodeGen/AArch64/ptrauth-isel.ll
+209-01 files

LLVM/project aa6f819llvm/lib/Target/AArch64 AArch64ISelLowering.cpp AArch64InstrInfo.td, llvm/lib/Target/AArch64/GISel AArch64CallLowering.cpp

[AArch64][PAC] Rework discriminator analysis for calls and tail calls

Make use of fixupBlendComponents for AUTH_TCRETURN[_BTI] and for
BLRA[_RVMARKER] pseudos the same way it is done for AUT/PAC/AUTPAC.

This patch unifies discriminator analysis for DAGISel and GlobalISel
and improves cross-BB analysis in case of DAGISel.
DeltaFile
+18-41llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+23-16llvm/test/CodeGen/AArch64/ptrauth-isel.ll
+6-18llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
+3-1llvm/lib/Target/AArch64/AArch64InstrInfo.td
+2-2llvm/test/CodeGen/AArch64/ptrauth-call.ll
+52-785 files