LLVM/project 2b3e30d llvm/lib/CodeGen MachineInstr.cpp, llvm/test/CodeGen/AArch64 sme-streaming-checkvl.ll

[CodeGen] Treat hasOrderedMemoryRef as implying arbitrary loads or stores (#182000)

This prevents MachineSink from sinking loads past fences (or any other instruction marked as hasSideEffects).

Fixes: #181708
Delta File
+250 -5 llvm/test/CodeGen/AMDGPU/misched-remat-revert.ll
+44 -0 llvm/test/CodeGen/AMDGPU/machine-sink-fence.ll
+12 -13 llvm/test/CodeGen/AMDGPU/iglp-no-clobber.ll
+13 -6 llvm/test/CodeGen/AArch64/sme-streaming-checkvl.ll
+5 -5 llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll
+1 -2 llvm/lib/CodeGen/MachineInstr.cpp
+325 -31 6 files

LLVM/project d9d6b16 llvm/lib/Analysis ValueTracking.cpp, llvm/test/Transforms/Attributor nofpclass.ll

ValueTracking: Handle ConstantDataSequential in computeKnownFPClass (#184191)

Delta File
+16 -17 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-aggregates.ll
+18 -0 llvm/test/Transforms/Attributor/nofpclass.ll
+7 -0 llvm/lib/Analysis/ValueTracking.cpp
+41 -17 3 files

LLVM/project ff0220d flang/docs OpenACC.md, flang/lib/Semantics check-acc-structure.cpp check-acc-structure.h

[flang][acc] Allow orphaned acc cache directive (#184448)

While the spec allows the cache directive at the top of a loop body, the
directive has also been utilized at the top of an acc routine. This PR
removes the semantic check that rejects the cache directive outside of a
loop, allowing orphaned `!$acc cache` similar to CIR.

The OpenACC.md deviation document is updated to note this extension.
Delta File
+0 -10 flang/lib/Semantics/check-acc-structure.cpp
+1 -0 flang/docs/OpenACC.md
+0 -1 flang/lib/Semantics/check-acc-structure.h
+0 -1 flang/test/Semantics/OpenACC/acc-cache-validity.f90
+1 -12 4 files

LLVM/project 937bf9c utils/bazel/llvm-project-overlay/bolt BUILD.bazel

[bazel] Fix parse_headers in bolt (#184648)

Delta File
+2 -0 utils/bazel/llvm-project-overlay/bolt/BUILD.bazel
+2 -0 1 files

LLVM/project 56a5355 clang/lib/CodeGen CGDebugInfo.cpp, clang/test/DebugInfo/CXX debug-info-constexpr-array.cpp

[DebugInfo] Emit DW_AT_const_value for constexpr array static members (#182442)

Clang does not emit a `DW_AT_const_value` in DWARF for constexpr array
static members, while GCC does. This patch fixes that by handling array
`APValue`s in Clang and `ConstantDataSequential` in the backend.
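
For context, a declaration of the kind the title describes might look like the snippet below. `Palette`, `Values`, and `sumValues` are made-up names for illustration, and the snippet assumes C++17, where a `static constexpr` data member is implicitly inline. It only shows the source construct; it does not show what debug info any particular compiler emits.

```cpp
// A class with a constexpr array as a static data member (hypothetical names).
struct Palette {
  static constexpr int Values[3] = {1, 2, 3};
};

// The array is usable in constant expressions, so its contents are known at
// compile time; that is the value a debugger could be told about.
constexpr int sumValues() {
  int total = 0;
  for (int v : Palette::Values)
    total += v;
  return total;
}

static_assert(sumValues() == 6, "array contents are compile-time constants");
```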

Fixes #165220
Delta File
+192 -0 llvm/test/DebugInfo/X86/debug-info-constexpr-array.ll
+53 -0 clang/test/DebugInfo/CXX/debug-info-constexpr-array.cpp
+52 -0 clang/lib/CodeGen/CGDebugInfo.cpp
+12 -1 llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp
+309 -1 4 files

LLVM/project b80248a clang-tools-extra/clang-doc JSONGenerator.cpp MDMustacheGenerator.cpp, clang-tools-extra/clang-doc/assets/md namespace-template.mustache

[clang-doc] Add a Mustache Markdown generator (#177221)

Adds a Markdown generator that uses Mustache templates. This patch adds
the templates themselves and implements changes to the JSONGenerator to
allow for the creation of specific files needed by the MD tests like
`all-files.json`.

This backend should be considered experimental. It satisfies all the
same tests that the current MD backend is tested against, but those
don't seem to provide full coverage for all functionality inside that
backend. It also doesn't output everything provided by JSON. It doesn't
use the MD unittests because the Mustache templates must currently be
written to files.
Delta File
+127 -0 clang-tools-extra/test/clang-doc/basic-project.mustache.test
+118 -5 clang-tools-extra/test/clang-doc/namespace.cpp
+98 -17 clang-tools-extra/clang-doc/JSONGenerator.cpp
+115 -0 clang-tools-extra/clang-doc/MDMustacheGenerator.cpp
+63 -0 clang-tools-extra/clang-doc/assets/md/namespace-template.mustache
+45 -17 clang-tools-extra/clang-doc/tool/ClangDocMain.cpp
+566 -39 22 files not shown
+844 -74 28 files

LLVM/project c2db12d llvm/lib/MC XCOFFObjectWriter.cpp, llvm/test/CodeGen/PowerPC aix-reloc-sorting.ll

[AIX] Sort relocations in XCOFF object writer. (#180807)

Some relocations (like R_REF) are emitted at offset 0 within the CSECT. If other relocations have already been emitted, the relocations are no longer in increasing order and the linker reports an error. Sort the relocations before emitting them to fix the problem.
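
The idea can be sketched in isolation as below. This is not the XCOFF writer's code: `Reloc`, `sortRelocsByOffset`, and `offsetsNonDecreasing` are hypothetical names, and the use of a stable sort (to keep the original order among relocations at the same offset) is an assumption rather than something the commit message states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified relocation record.
struct Reloc {
  uint32_t offset; // position within the csect
  uint8_t kind;    // relocation kind, opaque in this sketch
};

inline bool byOffset(const Reloc &a, const Reloc &b) {
  return a.offset < b.offset;
}

// Order relocations by offset; ties keep their emission order (assumption).
inline void sortRelocsByOffset(std::vector<Reloc> &relocs) {
  std::stable_sort(relocs.begin(), relocs.end(), byOffset);
}

// The property the linker is described as requiring.
inline bool offsetsNonDecreasing(const std::vector<Reloc> &relocs) {
  return std::is_sorted(relocs.begin(), relocs.end(), byOffset);
}

// Usage: a late relocation at offset 0 breaks the ordering until we sort.
inline void demoRelocSort() {
  std::vector<Reloc> relocs = {{8, 1}, {16, 1}, {0, 2}};
  assert(!offsetsNonDecreasing(relocs));
  sortRelocsByOffset(relocs);
  assert(offsetsNonDecreasing(relocs));
}
```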
Delta File
+77 -0 llvm/test/CodeGen/PowerPC/aix-reloc-sorting.ll
+8 -3 llvm/lib/MC/XCOFFObjectWriter.cpp
+85 -3 2 files

LLVM/project 3028604 utils/bazel/llvm-project-overlay/clang BUILD.bazel

[bazel] Add target for `clang-nvlink-wrapper` (#184644)

Delta File
+30 -0 utils/bazel/llvm-project-overlay/clang/BUILD.bazel
+30 -0 1 files

LLVM/project 9105d9c lld/ELF/Arch Hexagon.cpp, lld/test/ELF hexagon-duplex-relocs.s

[lld][Hexagon] Fix findMaskR8 missing duplex support (#183936)

findMaskR8() lacked an isDuplex() check, unlike findMaskR6(),
findMaskR11(), and findMaskR16() which all handle duplex instructions.

When the assembler generates R_HEX_8_X on a duplex SA1_addi instruction
(e.g. `{ r0 = add(r0, ##target); memw(r1+#0) = r2 }`), the wrong mask
0x00001fe0 placed relocation bits at [12:5] instead of [25:20],
corrupting the low sub-instruction (e.g. memw became memb).

Add the isDuplex() check returning 0x03f00000, and add a comprehensive
test covering all duplex instruction x relocation type combinations
across findMaskR6, findMaskR8, findMaskR11, and findMaskR16.
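
As a rough illustration of why the mask matters, the sketch below scatters relocation bits into an instruction word using the two masks quoted above. The names `isDuplexSketch`, `findMaskR8Sketch`, and `scatterBits` are hypothetical stand-ins, not lld's code, and the duplex test (parse bits [15:14] both zero) is an assumption about the Hexagon encoding.

```cpp
#include <cassert>
#include <cstdint>

// Masks quoted in the commit message.
constexpr uint32_t kMaskR8Default = 0x00001fe0; // bits [12:5]
constexpr uint32_t kMaskR8Duplex = 0x03f00000;  // bits [25:20]

// Assumption: a word is a duplex when its parse bits [15:14] are both zero.
inline bool isDuplexSketch(uint32_t insn) { return (insn & 0xC000u) == 0; }

// Simplified mask selection: duplex words get the duplex mask.
inline uint32_t findMaskR8Sketch(uint32_t insn) {
  return isDuplexSketch(insn) ? kMaskR8Duplex : kMaskR8Default;
}

// Place the low bits of `value` into the set bit positions of `mask`,
// lowest mask bit first.
inline uint32_t scatterBits(uint32_t mask, uint32_t value) {
  assert(mask != 0 && "mask must select at least one bit");
  uint32_t result = 0;
  unsigned next = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    if ((mask >> bit) & 1u) {
      result |= ((value >> next) & 1u) << bit;
      ++next;
    }
  }
  return result;
}
```

With the default mask the scattered bits land in [12:5], which in a duplex word overlaps the low sub-instruction; with the duplex mask they land in [25:20] and leave those bits alone.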
Delta File
+40 -0 lld/test/ELF/hexagon-duplex-relocs.s
+2 -0 lld/ELF/Arch/Hexagon.cpp
+42 -0 2 files

LLVM/project 958c68b llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll

Merge branch 'main' into users/arsenm/valuetracking/handle-constant-data-sequential-computeKnownFPClass
Delta File
+84,419 -78,498 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+66,293 -29,491 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+25,751 -24,782 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663 -20,281 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867 -18,577 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+19,112 -16,445 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+241,105 -188,074 1,532 files not shown
+409,008 -294,502 1,538 files

LLVM/project af3cea5 llvm/test/Transforms/InstCombine simplify-demanded-fpclass-aggregates.ll

Drop fixme
Delta File
+0 -1 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-aggregates.ll
+0 -1 1 files

LLVM/project 55d4280 llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll aarch64-addv.ll

Update tests, remove regression test
Delta File
+0 -13 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+8 -21 3 files

LLVM/project 8778f23 llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Compile time regression test
Delta File
+13 -0 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+13 -0 1 files

LLVM/project 9fcff00 llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp, llvm/test/CodeGen/AArch64 peephole-insvigpr.mir fpclamptosat_vec.ll

[AArch64] Fold zero-high vector inserts in MI peephole optimisation

Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
    fmov d0, x0
    fmov d0, d0
which is not ideal: a single `fmov d0, x0` would suffice.
The redundant copy comes from the INSERT_SUBREG/INSvi64lane.

This peephole detects <2 x i64> vectors made of a zeroed upper lane and
a low lane produced by FMOVXDr/FMOVDr, then removes the redundant copy.

Also updates existing tests and adds MIR tests.
Delta File
+51 -0 llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47 -4 llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24 -24 llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+7 -8 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+137 -44 2 files not shown
+137 -46 8 files

LLVM/project 9cd054b llvm/lib/Target/AArch64 AArch64InstrFormats.td AArch64ISelLowering.cpp, llvm/test/CodeGen/AArch64 arm64-int-neon.ll arm64-vqadd.ll

[AArch64] Add lowering for misc NEON intrinsics (#183050)

This patch adds custom lowering for the following NEON intrinsics to
enable better codegen for convert and load/store operations:

- suqadd
- usqadd
- abs
- sqabs
- sqneg
Delta File
+144 -4 llvm/test/CodeGen/AArch64/arm64-int-neon.ll
+16 -16 llvm/test/CodeGen/AArch64/arm64-vqadd.ll
+16 -8 llvm/lib/Target/AArch64/AArch64InstrFormats.td
+11 -2 llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+8 -4 llvm/lib/Target/AArch64/AArch64InstrInfo.td
+5 -6 llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
+200 -40 6 files

LLVM/project b9f1199 llvm/lib/Transforms/InstCombine InstCombineSimplifyDemanded.cpp, llvm/test/Transforms/InstCombine simplify-demanded-fpclass-aggregates.ll

InstCombine: Support extractvalue in SimplifyDemandedFPClass (#184171)

Previously this only handled extractvalue of frexp.
Delta File
+67 -0 llvm/test/Transforms/InstCombine/simplify-demanded-fpclass-aggregates.ll
+6 -1 llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+73 -1 2 files

LLVM/project 94fa697 llvm/lib/Target/AMDGPU/MCTargetDesc AMDGPUTargetStreamer.cpp, llvm/lib/Target/AMDGPU/Utils AMDGPUBaseInfo.cpp AMDGPUBaseInfo.h

AMDGPU: Clean up print handling of AMDGPUTargetID

Provide a method that prints to raw_ostream and use it where applicable.
Delta File
+7 -5 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+8 -3 llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+9 -0 llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+24 -8 3 files

LLVM/project fb664ad llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll aarch64-addv.ll

Update tests, remove regression test
Delta File
+0 -13 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+8 -21 3 files

LLVM/project 34f1b86 llvm/lib/Target/AArch64 AArch64MIPeepholeOpt.cpp, llvm/test/CodeGen/AArch64 peephole-insvigpr.mir fpclamptosat_vec.ll

[AArch64] Fold zero-high vector inserts in MI peephole optimisation

Summary
This patch follows on from #178227.
The previous ISel fold lowers the 64-bit case to:
    fmov d0, x0
    fmov d0, d0
which is not ideal: a single `fmov d0, x0` would suffice.
The redundant copy comes from the INSERT_SUBREG/INSvi64lane.

This peephole detects <2 x i64> vectors made of a zeroed upper lane and
a low lane produced by FMOVXDr/FMOVDr, then removes the redundant copy.

Also updates existing tests and adds MIR tests.
Delta File
+51 -0 llvm/test/CodeGen/AArch64/peephole-insvigpr.mir
+47 -4 llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
+24 -24 llvm/test/CodeGen/AArch64/fpclamptosat_vec.ll
+7 -8 llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+6 -6 llvm/test/CodeGen/AArch64/aarch64-addv.ll
+2 -2 llvm/test/CodeGen/AArch64/bitcast-extend.ll
+137 -44 2 files not shown
+137 -46 8 files

LLVM/project 2383919 llvm/test/CodeGen/AArch64 aarch64-neonvector-tensorflow-regression.ll

Compile time regression test
Delta File
+13 -0 llvm/test/CodeGen/AArch64/aarch64-neonvector-tensorflow-regression.ll
+13 -0 1 files

LLVM/project b0b5834 llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp, llvm/test/CodeGen/X86 known-pow2.ll

[DAG] Improved handling of ISD::ROTL and ISD::ROTR in isKnownToBeAPowerOfTwo (#182744)

Fixes #181642
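
The arithmetic fact behind this kind of reasoning, presumably, is that a rotation only permutes bit positions, so a value with exactly one set bit still has exactly one set bit after rotating. The sketch below checks that on 32-bit integers; `isPowerOfTwo`, `rotl32`, `rotr32`, and `rotationsPreservePow2` are illustrative helpers, not SelectionDAG code, and the commit message does not spell out what the one-line change is.

```cpp
#include <cassert>
#include <cstdint>

// True when exactly one bit is set.
inline bool isPowerOfTwo(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Rotate left by amt (mod 32), avoiding an undefined shift by 32.
inline uint32_t rotl32(uint32_t x, unsigned amt) {
  amt &= 31u;
  return amt == 0 ? x : (x << amt) | (x >> (32u - amt));
}

// Rotate right by amt (mod 32).
inline uint32_t rotr32(uint32_t x, unsigned amt) {
  amt &= 31u;
  return amt == 0 ? x : (x >> amt) | (x << (32u - amt));
}

// Check that every rotation amount keeps the power-of-two status of x.
inline bool rotationsPreservePow2(uint32_t x) {
  for (unsigned amt = 0; amt < 32; ++amt) {
    if (isPowerOfTwo(rotl32(x, amt)) != isPowerOfTwo(x))
      return false;
    if (isPowerOfTwo(rotr32(x, amt)) != isPowerOfTwo(x))
      return false;
  }
  return true;
}

// Usage: the top bit wraps around to bit 0 and stays a power of two.
inline void demoRotate() {
  assert(rotl32(0x80000000u, 1) == 1u);
  assert(rotationsPreservePow2(64u));
}
```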
Delta File
+80 -0 llvm/test/CodeGen/X86/known-pow2.ll
+1 -1 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+81 -1 2 files

LLVM/project a14f9f8 mlir/include/mlir/Dialect/XeGPU/IR XeGPUAttrs.td, mlir/lib/Dialect/XeGPU/Transforms XeGPUWgToSgDistribute.cpp

[mlir][xegpu] Add support for accessing the default order of a layout. (#184451)

Currently, `getOrder` returns null if the user does not provide an
`order` in the xegpu layout. This behavior is undesirable when combined
with utility functions that operate on layouts (like `isTransposeOf`).
This PR introduces `getEffectiveOrder`, which always returns the true
order, even if the user omits it.
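
The general pattern (fall back to a default when an optional attribute is absent) can be sketched outside MLIR as below. `effectiveOrderSketch` is a hypothetical stand-in, not the XeGPU accessor, and the default used here (dimensions listed from last to first, e.g. `[1, 0]` for rank 2) is an assumption about what the layout treats as its implicit order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Return the explicit order if present; otherwise build a default order.
// Assumed default: [rank-1, ..., 1, 0].
inline std::vector<int64_t>
effectiveOrderSketch(const std::optional<std::vector<int64_t>> &order,
                     std::size_t rank) {
  if (order)
    return *order;
  std::vector<int64_t> result(rank);
  // Filling through reverse iterators writes 0 into the last slot,
  // 1 into the one before it, and so on.
  std::iota(result.rbegin(), result.rend(), int64_t{0});
  return result;
}

// Usage: callers never see a missing order.
inline void demoEffectiveOrder() {
  assert((effectiveOrderSketch(std::nullopt, 2) ==
          std::vector<int64_t>{1, 0}));
  assert((effectiveOrderSketch(std::vector<int64_t>{0, 1}, 2) ==
          std::vector<int64_t>{0, 1}));
}
```

A helper such as `isTransposeOf` can then compare two orders directly instead of special-casing the null result.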
Delta File
+46 -15 mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
+5 -4 mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops.mlir
+4 -4 mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
+0 -7 mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp
+3 -2 mlir/test/Dialect/XeGPU/xegpu-wg-to-sg-unify-ops-rr.mlir
+2 -2 mlir/test/Dialect/XeGPU/subgroup-distribute-unit.mlir
+60 -34 6 files

LLVM/project 87bb6e0 libsycl/include/sycl/__impl usm_functions.hpp, libsycl/src usm_functions.cpp

fix my comments

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova at intel.com>
Delta File
+2 -2 libsycl/include/sycl/__impl/usm_functions.hpp
+2 -2 libsycl/src/usm_functions.cpp
+4 -4 2 files

LLVM/project 18226e7 llvm/lib/Target/RISCV RISCVInstrInfoZvk.td RISCVISelLowering.cpp, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll clmul-sdnode.ll

[RISCV] Lower i8/i16/i32 scalable vector ISD::CLMUL/CLMULH with Zvbc32e. (#184465)

Delta File
+53,024 -7,001 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+15,172 -1,553 llvm/test/CodeGen/RISCV/rvv/clmul-sdnode.ll
+11 -3 llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
+6 -4 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+2 -0 llvm/lib/Target/RISCV/RISCVFeatures.td
+68,215 -8,561 5 files

LLVM/project 77f1480 llvm/lib/Target/SPIRV SPIRVPrepareFunctions.cpp

[SPIRV] Fix return value of runOnModule for SPIRVPrepareFunctions (#184636)

We need to return `true` if the module is changed by the pass, and we
forgot to do that in this new case.

This fixes a buildbot
[fail](https://lab.llvm.org/buildbot/#/builders/187/builds/17475).
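
The convention being restored can be sketched with a toy pass as below. `FakeFunction`, `FakeModule`, `rewriteIfNeeded`, and `runOnModuleSketch` are hypothetical names that model nothing specific in the SPIRV backend; the point is only that every path that mutates the module must feed into the returned flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for a module and its functions.
struct FakeFunction {
  std::string name;
  bool needsRewrite;
};
struct FakeModule {
  std::vector<FakeFunction> functions;
};

// Returns true only when the function was actually modified.
inline bool rewriteIfNeeded(FakeFunction &f) {
  if (!f.needsRewrite)
    return false;
  f.needsRewrite = false;
  return true;
}

// Accumulate the result of every transformation into one flag and return it.
inline bool runOnModuleSketch(FakeModule &m) {
  bool changed = false;
  for (FakeFunction &f : m.functions)
    changed |= rewriteIfNeeded(f);
  return changed;
}

// Usage: the first run changes the module, the second finds nothing to do.
inline void demoRunOnModule() {
  FakeModule m{{{"a", false}, {"b", true}}};
  assert(runOnModuleSketch(m));
  assert(!runOnModuleSketch(m));
}
```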

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
Delta File
+2 -1 llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+2 -1 1 files

LLVM/project 668d09b clang/lib/Frontend CompilerInstance.cpp, clang/test/Modules pch-config-macros.c

[clang][Modules] Fixing unexpected warnings triggered by a PCH and a module with config macros (#177078)

When a PCH is compiled with macro definitions on the command line, such
as `-DCONFIG1`, an unexpected warning can occur if the macro definitions
happen to belong to an imported module's config macros. The warning may
look like the following:
```
definition of configuration macro 'CONFIG1' has no effect on the import of 'Mod1'; pass '-DCONFIG1=...' on the command line to configure the module
```
while `-DCONFIG1` is clearly on the command line when `clang` compiles
the source that uses the PCH and the module.

The reason this can happen is a combination of two things:
1. The logic that checks for config macros is not aware of any command
line macros passed through the PCH
([here](https://github.com/llvm/llvm-project/blob/7976ac990000a58a7474269a3ca95e16aed8c35b/clang/lib/Frontend/CompilerInstance.cpp#L1562)).
2. `clang` _replaces_ the predefined macros on the command line with the
predefined macros from the PCH, which does not include any builtins

    [7 lines not shown]
Delta File
+93 -0 clang/test/Modules/pch-config-macros.c
+9 -4 clang/lib/Frontend/CompilerInstance.cpp
+5 -0 clang/test/Modules/Inputs/pch-config-macros/include/Mod1.h
+4 -0 clang/test/Modules/Inputs/pch-config-macros/include/module.modulemap
+111 -4 4 files

LLVM/project 4b3a924 utils/bazel/llvm-project-overlay/lldb BUILD.bazel, utils/bazel/llvm-project-overlay/lldb/source/Plugins BUILD.bazel

[bazel] Fix more parse_headers cases in lldb (#184534)

CI doesn't have a toolchain that supports this, so it isn't validated
there, but locally this fixes some issues if you're using a toolchain
that does. Because lldb has a number of circular deps, we mostly just
have to disable it for the header-only targets we create to avoid those
circular deps.
Delta File
+42 -2 utils/bazel/llvm-project-overlay/lldb/source/Plugins/BUILD.bazel
+39 -1 utils/bazel/llvm-project-overlay/lldb/BUILD.bazel
+81 -3 2 files

LLVM/project 8f8590e clang/lib/Driver/ToolChains AIX.cpp

[OpenMP][AIX] Add libpthreads for -fopenmp (#184629)

The compiler uses TLS for OpenMP thread‑private data, which results in
references to symbols such as `__tls_get_addr` in `libpthreads`.
Therefore, this PR adds `libpthreads` to the link command when
`-fopenmp` is specified.
Delta File
+2 -0 clang/lib/Driver/ToolChains/AIX.cpp
+2 -0 1 files

LLVM/project 39d5aea mlir/include/mlir/Dialect/OpenACC OpenACCUtilsLoop.h, mlir/lib/Dialect/OpenACC/Utils OpenACCUtilsLoop.cpp

[OpenACC] Replace terminators with scf.yield in wrapMultiBlockRegionWithSCFExecuteRegion (#184458)

When wrapping a multi-block region in `scf.execute_region`, replace
`func::ReturnOp` (if flag `convertFuncReturn` is set) and `acc::YieldOp`
in all the blocks with `scf.yield` so the region has a valid SCF
terminator.
Delta File
+154 -0 mlir/unittests/Dialect/OpenACC/OpenACCUtilsLoopTest.cpp
+20 -13 mlir/lib/Dialect/OpenACC/Utils/OpenACCUtilsLoop.cpp
+10 -6 mlir/include/mlir/Dialect/OpenACC/OpenACCUtilsLoop.h
+184 -19 3 files

LLVM/project a4207f3 llvm/test/CodeGen/AMDGPU amdgcn.bitcast.1024bit.ll amdgcn.bitcast.512bit.ll, llvm/test/CodeGen/RISCV/rvv clmulh-sdnode.ll

Merge branch 'main' into users/KseniyaTikhomirova/usm_3_alloc
Delta File
+84,419 -78,498 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll
+25,751 -24,782 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll
+23,663 -20,281 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll
+21,867 -18,577 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll
+13,685 -22,906 llvm/test/CodeGen/RISCV/rvv/clmulh-sdnode.ll
+19,112 -16,445 llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll
+188,497 -181,489 1,674 files not shown
+348,974 -290,232 1,680 files