LLVM/project 46b14f8 mlir/cmake/modules AddMLIRPython.cmake

[mlir][Python] don't build libnanobind if module only has "pure" extensions
Delta     File
+19-7     mlir/cmake/modules/AddMLIRPython.cmake
+19-7     1 files

LLVM/project 1a0f16f llvm/test/CodeGen/AArch64 stack-tagging-untag-placement.ll

test

Created using spr 1.3.6
Delta     File
+2-0      llvm/test/CodeGen/AArch64/stack-tagging-untag-placement.ll
+2-0      1 files

LLVM/project 7b401a0 llvm/lib/Target/AArch64 AArch64StackTagging.cpp, llvm/lib/Transforms/Instrumentation HWAddressSanitizer.cpp

[𝘀𝗽𝗿] initial version

Created using spr 1.3.6
Delta     File
+318-34   llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+28-21    llvm/lib/Transforms/Utils/MemoryTaggingSupport.cpp
+46-2     llvm/test/CodeGen/AArch64/stack-tagging-split-lifetime.ll
+2-6      llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
+2-6      llvm/lib/Target/AArch64/AArch64StackTagging.cpp
+4-2      llvm/test/CodeGen/AArch64/stack-tagging-untag-placement.ll
+400-71   2 files not shown
+405-72   8 files

LLVM/project c0fea1f llvm/lib/Transforms/Utils MemoryTaggingSupport.cpp, llvm/test/CodeGen/AArch64 stack-tagging-split-lifetime.ll stack-tagging-untag-placement.ll

[𝘀𝗽𝗿] changes to main this commit is based on

Created using spr 1.3.6

[skip ci]
Delta     File
+34-34    llvm/test/Instrumentation/HWAddressSanitizer/use-after-scope.ll
+46-2     llvm/test/CodeGen/AArch64/stack-tagging-split-lifetime.ll
+9-19     llvm/lib/Transforms/Utils/MemoryTaggingSupport.cpp
+4-2      llvm/test/CodeGen/AArch64/stack-tagging-untag-placement.ll
+2-1      llvm/test/CodeGen/AArch64/stack-tagging-ex-1.ll
+95-58    5 files

LLVM/project f90cbc6 offload/test/tools llvm-omp-device-info.c

[offload][lit] Enable llvm-omp-device-info.c on Intel GPUs (#175084)

It's XPASSing after https://github.com/llvm/llvm-project/pull/172946.

https://lab.llvm.org/staging/#/builders/225/builds/313

Signed-off-by: Nick Sarnie <nick.sarnie at intel.com>
Delta     File
+0-1      offload/test/tools/llvm-omp-device-info.c
+0-1      1 files

LLVM/project 541eb8b clang/include/clang/Basic DebugOptions.def, clang/include/clang/Options Options.td

Address comments
Delta     File
+25-23    clang/lib/CodeGen/CGDebugInfo.cpp
+4-0      clang/include/clang/Options/Options.td
+3-0      clang/include/clang/Basic/DebugOptions.def
+1-1      clang/test/DebugInfo/Generic/macro-info.c
+33-24    4 files

LLVM/project a651edf clang/lib/Headers gpuintrin.h, clang/test/Headers gpuintrin.c

[Clang] Make gpuintrin out of range grid dimension accessors match OpenCL (#174605)

Summary:
Currently these return an unreachable / invalid value if used out of
range. This PR changes this to match the OpenCL behavior to both give it
a defined value and make it easier to use in those contexts.
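
As a rough illustration of the OpenCL convention referred to above: OpenCL's
size-like work-item queries (such as `get_num_groups`) return 1 and its
index-like queries (such as `get_group_id`) return 0 when the dimension index
is out of range. The sketch below models that with a hypothetical `Grid`
struct and hypothetical accessor names; it is not the code in `gpuintrin.h`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a launch grid; all names here are illustrative.
struct Grid {
  uint32_t num_blocks[3]; // number of blocks per dimension
  uint32_t block_id[3];   // current block index per dimension
};

// Size-like accessor: an out-of-range dimension yields 1.
inline uint32_t num_blocks(const Grid &g, int dim) {
  return (dim >= 0 && dim < 3) ? g.num_blocks[dim] : 1u;
}

// Index-like accessor: an out-of-range dimension yields 0.
inline uint32_t block_id(const Grid &g, int dim) {
  return (dim >= 0 && dim < 3) ? g.block_id[dim] : 0u;
}
```

With these defaults, code that loops over more dimensions than the launch
uses still computes a valid product of sizes and a valid linear index.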
Delta     File
+61-54    clang/test/Headers/gpuintrin.c
+4-4      clang/lib/Headers/gpuintrin.h
+65-58    2 files

LLVM/project 03ad3d2 llvm/lib/CodeGen LocalStackSlotAllocation.cpp, llvm/test/CodeGen/AMDGPU local-stack-alloc-block-sp-reference.ll local-stack-alloc-add-references.gfx10.mir

[CodeGen] Consider imm offsets when sorting framerefs (#171012)

The LocalStackSlotAllocation pass disallows negative offsets with respect
to a base register, so it ends up introducing a new register for such
frame references. This patch makes the pass also consider the immediate
offset of an instruction when sorting frame refs, which avoids negative
offsets and maximizes reuse of the existing registers.
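
The idea can be sketched as follows, under simplifying assumptions: each
frame reference is reduced to a slot offset plus an immediate, and the base
register is placed at the first reference after sorting. `FrameRef`,
`effective`, `sort_refs`, and `offsets_from_base` are hypothetical names for
illustration, not the pass's actual data structures.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified frame reference (not the pass's real type).
struct FrameRef {
  int64_t slot_offset; // offset of the stack slot within the local block
  int64_t imm_offset;  // immediate offset encoded in the instruction
};

// Address actually touched by the instruction, relative to the local block.
inline int64_t effective(const FrameRef &r) {
  return r.slot_offset + r.imm_offset;
}

// Sort by effective offset so the first reference is the lowest address.
inline void sort_refs(std::vector<FrameRef> &refs) {
  std::stable_sort(refs.begin(), refs.end(),
                   [](const FrameRef &a, const FrameRef &b) {
                     return effective(a) < effective(b);
                   });
}

// Offsets of every reference relative to a base at the first reference.
inline std::vector<int64_t>
offsets_from_base(const std::vector<FrameRef> &sorted) {
  std::vector<int64_t> out;
  if (sorted.empty())
    return out;
  const int64_t base = effective(sorted.front());
  for (const FrameRef &r : sorted)
    out.push_back(effective(r) - base);
  return out;
}
```

Sorting on the slot offset alone could place a reference with a large
negative immediate after the base, producing a negative offset; sorting on
the sum keeps every offset from the base non-negative in this model.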
Delta     File
+27-15    llvm/lib/CodeGen/LocalStackSlotAllocation.cpp
+4-8      llvm/test/CodeGen/Thumb/frame-chain.ll
+6-5      llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+5-5      llvm/test/CodeGen/AMDGPU/local-stack-alloc-add-references.gfx10.mir
+3-6      llvm/test/CodeGen/AMDGPU/flat-scratch-alloca-issue-155902.ll
+1-1      llvm/test/CodeGen/AMDGPU/local-stack-alloc-sort-framerefs.mir
+46-40    6 files

LLVM/project 51f6c58 mlir/include/mlir/Bindings/Python IRCore.h

[mlir][Python] fix namespace shadowing on MSVC (#175077)

If you set `MLIR_PYTHON_BINDINGS_DOMAIN=mlir`, you get namespace nesting
like `mlir::python::mlir` and then `mlir::Twine` shadows `llvm::Twine`
(but only on MSVC). So prefix with `::llvm` to have the correct root
namespace.
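
A minimal sketch of the general hazard, using standard C++ name lookup
rather than the MSVC-specific behavior described above: inside a nested
namespace that reuses an outer name, a qualified name without a leading `::`
resolves against the nearest match, while a leading `::` anchors it at the
global namespace. The `name`, `unanchored`, and `anchored` functions are
hypothetical and exist only for this illustration.

```cpp
#include <string>

namespace mlir {
inline std::string name() { return "outer mlir"; }

namespace python {
namespace mlir { // nested namespace that reuses the outer name
inline std::string name() { return "nested mlir"; }

// `mlir` is looked up from the innermost scope outward, so this finds
// mlir::python::mlir first.
inline std::string unanchored() { return mlir::name(); }

// A leading `::` starts the lookup at the global namespace.
inline std::string anchored() { return ::mlir::name(); }
} // namespace mlir
} // namespace python
} // namespace mlir
```

Writing `::llvm::Twine` applies the same anchoring, so the name cannot be
captured by whatever enclosing namespace happens to be in scope.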

Co-authored-by: Abhishek Varma <abhvarma at amd.com>
Delta     File
+11-11    mlir/include/mlir/Bindings/Python/IRCore.h
+11-11    1 files

LLVM/project 3fd745f clang/lib/Basic DarwinSDKInfo.cpp

[clang][driver][darwin] Report bad SDKSettings as a fatal error rather than unreachable (#175073)

Fatal error is more appropriate than unreachable when the SDKSettings is
not in a recognized form (encountered in a few tests with incomplete
SDKSettings.json).
Delta     File
+3-1      clang/lib/Basic/DarwinSDKInfo.cpp
+3-1      1 files

LLVM/project 2ac0dbd clang/include/clang/Basic DiagnosticDriverKinds.td, clang/include/clang/Options Options.td

[SFrame][Retry] Add assembler option --gsframe (#165806)

This plumbs the option --gsframe through the various levels needed to
support it in the assembler.

This is the final step in assembler-level sframe support for x86. With
it in place, clang produces sframe-sections that successfully link with
gnu-ld.

LLD support is pending some discussion.

The previous PR (https://github.com/llvm/llvm-project/pull/165322) had a
bad merge, but the only comments were as below. Both done.

1. Fix some stray formatting.
2. Add tests that:
   * the option is passed on to cc1
   * the correct error is emitted when an unsupported platform is used

    [4 lines not shown]
Delta     File
+26-0     clang/lib/Driver/ToolChains/Clang.cpp
+24-0     clang/test/Driver/sframe.c
+7-0      clang/test/Misc/cc1as-sframe.s
+6-0      clang/tools/driver/cc1as_main.cpp
+5-0      clang/include/clang/Options/Options.td
+4-0      clang/include/clang/Basic/DiagnosticDriverKinds.td
+72-0     2 files not shown
+75-0     8 files

LLVM/project a1cfcc4 llvm/lib/CodeGen CodeGenTargetMachineImpl.cpp

[CodeGen] add RuntimeLibraryInfoWrapper pass to addPassesToEmitMC  (#174682)

Register RuntimeLibraryInfoWrapper with the pass manager, following the
change in 04c81a99735c, so that codegen in a JIT compiler using ORC JIT
works correctly.
In our downstream target, memcpy was lowered to a loop because
RuntimeLibraryInfo was missing.
Delta     File
+10-0     llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp
+10-0     1 files

LLVM/project a1c2882 llvm/include/llvm/ExecutionEngine/Orc Core.h, llvm/lib/ExecutionEngine/Orc Core.cpp

[ORC] Add JITDylibDefunct Error. (#174923)

This Error can be returned from operations on JITDylibs that cannot
proceed as the target JITDylib has been closed.

This patch uses the new error to replace an unsafe assertion in
JITDylib::define: If a JITDylib::define operation is run by an in-flight
task after the target JITDylib is closed it should error out rather than
asserting.
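
The shape of that change, an assertion replaced by a recoverable error, can
be sketched with a toy model. `ToyDylib` and its members are hypothetical
and are not the ORC API; the real code returns an `llvm::Error`, which this
sketch approximates with an optional message.

```cpp
#include <optional>
#include <set>
#include <string>

// Toy model only; these names are hypothetical and are not the ORC API.
class ToyDylib {
public:
  void close() { closed_ = true; }

  // Returns an error message on failure and std::nullopt on success.
  std::optional<std::string> define(const std::string &symbol) {
    if (closed_) {
      // Previously this situation would have tripped an assertion.
      return std::string("dylib is defunct");
    }
    symbols_.insert(symbol);
    return std::nullopt;
  }

  bool has(const std::string &symbol) const {
    return symbols_.count(symbol) != 0;
  }

private:
  bool closed_ = false;
  std::set<std::string> symbols_;
};
```

A task that is still in flight when the dylib is closed can then observe the
failure and unwind, instead of aborting the process.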

See also https://github.com/llvm/llvm-project/issues/174922
Delta     File
+15-1     llvm/include/llvm/ExecutionEngine/Orc/Core.h
+10-0     llvm/lib/ExecutionEngine/Orc/Core.cpp
+9-0      llvm/unittests/ExecutionEngine/Orc/CoreAPIsTest.cpp
+34-1     3 files

LLVM/project 3938502 mlir/include/mlir/Bindings/Python IRCore.h

[mlir][Python] fix namespace shadowing

Co-authored-by: Abhishek Varma <abhvarma at amd.com>
Delta     File
+11-11    mlir/include/mlir/Bindings/Python/IRCore.h
+11-11    1 files

LLVM/project e4b97eb llvm/test/MachineVerifier verify-inlineasmbr.mir test_g_memcpy.mir

[MachineVerifier](NFC)(TestOnly) Canonicalise top-level MachineVerifier tests (#172527)

Delta     File
+5-60     llvm/test/MachineVerifier/verify-inlineasmbr.mir
+17-21    llvm/test/MachineVerifier/test_g_memcpy.mir
+17-21    llvm/test/MachineVerifier/test_g_memmove.mir
+14-20    llvm/test/MachineVerifier/test_g_phi.mir
+15-19    llvm/test/MachineVerifier/test_g_memcpy_inline.mir
+11-15    llvm/test/MachineVerifier/test_g_bzero.mir
+79-156   80 files not shown
+429-713  86 files

LLVM/project 125a53c llvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/X86 multi-parent-instr-copyable-regular.ll

Revert "[SLP]Update deps for copyables operands, if the user is used several times in node"

This reverts commit 6e1acd061e74f44df6d53d54c78d1e50790456a8 to fix
crashes detected in  https://lab.llvm.org/buildbot/#/builders/25/builds/14678.
Delta     File
+0-85     llvm/test/Transforms/SLPVectorizer/X86/multi-parent-instr-copyable-regular.ll
+2-0      llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+2-85     2 files

LLVM/project 6c5535b llvm/lib/CodeGen/SelectionDAG SelectionDAG.cpp

[SelectionDAG] Unify ISD::LOAD handling in ComputeNumSignBits. NFC (#175060)

Range metadata was handled in an ISD::LOAD case in the main opcode
switch. Extending loads and constant pools were handled with special
code after the main switch. Move this code into the ISD::LOAD case of
the main switch.

There is one slight change here: I put the Op.getResNo() == 0 check
before the range handling. This should be more correct.
Delta     File
+47-48    llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+47-48    1 files

LLVM/project f59d120 lldb/source/Target Thread.cpp, lldb/unittests/Thread ThreadTest.cpp

[lldb] Keep the unexpected b/p state for suspended threads (#174264)

This fixes stepping out for a case when two threads reach the
stepping-out target breakpoint simultaneously, but a concurrent thread
executes the breakpoint first. The issue affects platforms with software
breakpoints. The scenario is as follows:

* The `step-out` command is executed for thread `A`.
* `ThreadPlanStepOut` creates a breakpoint at the target location.
* All threads are resumed, because the `step-out` command does not
  suspend other threads.
* Threads `A` and `B` reach the stepping-out address at the same time,
  but `B` executes the breakpoint instruction first.
* `SetThreadStoppedAtUnexecutedBP()` is called for thread `A`, and
  `SetThreadHitBreakpointSite()` is called for thread `B`.
* Thread `B` has no plans to stop at this location, so
  `ThreadPlanStepOverBreakpoint` is scheduled.
* The plan disables the breakpoint and resumes thread `B` with
  `eStateStepping`; for thread `A`, `ShouldResume(eStateSuspended)` is

    [8 lines not shown]
Delta     File
+54-0     lldb/unittests/Thread/ThreadTest.cpp
+10-1     lldb/source/Target/Thread.cpp
+64-1     2 files

LLVM/project 583ce49 offload/plugins-nextgen/level_zero/include L0Device.h, offload/plugins-nextgen/level_zero/src L0Device.cpp

[OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946)

Update information about devices provided by level zero plugin in order
to be more consistent with other plugins.
Delta     File
+24-1     offload/plugins-nextgen/level_zero/src/L0Device.cpp
+2-0      offload/plugins-nextgen/level_zero/include/L0Device.h
+26-1     2 files

LLVM/project 5ab966a llvm/lib/Transforms/Scalar LoopFuse.cpp, llvm/test/Transforms/LoopFusion pr166356.ll

[LoopFusion] Non-loop block must be the immediate successor of exit (#175034)

Loop fusion assumes the non-loop block of a guarded adjacent loop is the
immediate successor of its exit block. This patch ensures this condition
holds and fixes the crash in #166356.
Delta     File
+50-0     llvm/test/Transforms/LoopFusion/pr166356.ll
+6-5      llvm/lib/Transforms/Scalar/LoopFuse.cpp
+56-5     2 files

LLVM/project 5cf9208 llvm/lib/Target/AMDGPU AMDGPULowerKernelAttributes.cpp, llvm/test/CodeGen/AMDGPU implicit-arg-block-count.ll

[AMDGPU] Optimize block count calculations to the new ABI (#174112)

Summary:
We already have a way to get the block count using the old grid size
lookup and dividing it by the number of threads. We did not want to make
a new intrinsic to do the same thing, so this optimization pattern
matches on this usage to automatically optimize it to the new form. This
should improve performance of old kernels by converting branches into a
simple index lookup and removing the division.
Delta     File
+322-0    llvm/test/CodeGen/AMDGPU/implicit-arg-block-count.ll
+46-0     llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+368-0    2 files

LLVM/project 9c849f4 mlir/cmake/modules AddMLIRPython.cmake

[mlir][Python] don't export all symbols on MSVC for MLIRPythonSupport
Delta     File
+0-3      mlir/cmake/modules/AddMLIRPython.cmake
+0-3      1 files

LLVM/project 3a1a1b8 mlir/include/mlir/Bindings/Python IRCore.h

[mlir][Python] fix namespace shadowing
Delta     File
+11-11    mlir/include/mlir/Bindings/Python/IRCore.h
+11-11    1 files

LLVM/project f7b20ec llvm/lib/Target/AMDGPU AMDGPURegBankLegalizeRules.cpp, llvm/test/CodeGen/AMDGPU/GlobalISel regbankselect-umulh.mir regbankselect-smulh.mir

[AMDGPU][GlobalISel] Add RegBankLegalize support for G_UMULH, G_SMULH (#174555)

Delta     File
+7-5      llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-umulh.mir
+7-5      llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-smulh.mir
+6-0      llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+20-10    3 files

LLVM/project fdead4d bolt Maintainers.md Maintainers.txt, llvm Maintainers.md

[BOLT] Update maintainers list
Delta     File
+69-0     bolt/Maintainers.md
+0-29     bolt/Maintainers.txt
+1-1      llvm/Maintainers.md
+70-30    3 files

LLVM/project b86ac2a clang/include/clang/Basic BuiltinsSPIRVCommon.td, clang/test/CodeGenSPIRV/Builtins group.c

[SPIR-V] Add clang builtin for group-wide barrier (#175064)

Summary:
This adds a clang builtin for the existing group sync. I was considering
instead exposing a raw barrier operation and chaining it with a
`__scoped_atomic_thread_fence` but this seemed simpler. Right now this
implies a sequentially consistent memory fence. These semantics should
already match with what's implied with CUDA `__syncthreads`. I'm unsure
if there's a situation where we'd need more control. If we want more
control we'd probably just want to match it up with the scoped atomic
scopes.
Delta     File
+13-0     clang/test/CodeGenSPIRV/Builtins/group.c
+2-2      llvm/include/llvm/IR/IntrinsicsSPIRV.td
+1-0      clang/include/clang/Basic/BuiltinsSPIRVCommon.td
+16-2     3 files

LLVM/project c02da3d clang/include/clang/Basic BuiltinsSPIRVCommon.td, clang/lib/CodeGen/TargetBuiltins SPIR.cpp

[SPIR-V] Add clang builtin for subgroup shuffles (#174655)

Summary:
This is an attempt to begin filling out some missing pieces to allow
more generic compute code to use SPIR-V flavored builtins. This should
provide the basic shuffle operation. The next most important one is the
ballot, but I don't think we have an IR intrinsic for that yet.

I don't know SPIR-V very well so let me know if this is the proper
function with the proper semantic checks.
Delta     File
+36-0     clang/lib/Sema/SemaSPIRV.cpp
+12-0     clang/test/SemaSPIRV/BuiltIns/subgroup-errors.c
+10-2     clang/test/CodeGenSPIRV/Builtins/subgroup.c
+8-0      clang/lib/CodeGen/TargetBuiltins/SPIR.cpp
+1-0      clang/include/clang/Basic/BuiltinsSPIRVCommon.td
+67-2     5 files

LLVM/project 86b95b0 clang/lib/CodeGen CGCoroutine.cpp, clang/test/CodeGenCoroutines coro-attributes.cpp

[SampleProf] Handle coro wrapper function name canonicalization (#174881)

Fix an issue where `FunctionSamples::getCanonicalFnName` incorrectly
canonicalizes coro await suspend wrapper functions so that they collide
with the coro function itself. This causes the sample annotation to skip
the coro function. Canonicalization strips everything that comes after
the first dot (.), unless the function attribute
"sample-profile-suffix-elision-policy" is set to "selected", in which
case it strips only after the known suffixes. The wrapper function name
has the suffix ".__await_suspend_wrapper__" + await_kind. Add the
attribute to the wrapper function so that the suffix is not stripped.
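
The two policies can be sketched roughly as below. This is a simplified
model, not the body of `FunctionSamples::getCanonicalFnName`: the
`canonicalize` function is hypothetical, the list of known suffixes is an
assumption, and `await` is used as an example `await_kind`. It assumes
C++17 for `std::string_view`.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical helper; the suffix list below is an assumption.
inline std::string canonicalize(std::string_view name,
                                std::string_view policy) {
  if (policy == "selected") {
    // Strip only from a known suffix onward; unknown suffixes survive.
    static constexpr std::array<std::string_view, 3> kKnown = {
        ".llvm.", ".part.", ".__uniq."};
    std::string_view result = name;
    for (std::string_view suffix : kKnown) {
      const auto pos = result.rfind(suffix);
      if (pos != std::string_view::npos)
        result = result.substr(0, pos);
    }
    return std::string(result);
  }
  // Default policy: drop everything from the first dot onward.
  return std::string(name.substr(0, name.find('.')));
}
```

Under the default policy, `foo.__await_suspend_wrapper__await` collapses to
`foo` and collides with the coroutine itself; under "selected", the wrapper
keeps its full name because its suffix is not in the known list.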
Delta     File
+31-0     llvm/test/Transforms/SampleProfile/coro-annotate.ll
+4-0      clang/test/CodeGenCoroutines/coro-attributes.cpp
+2-0      llvm/test/Transforms/SampleProfile/Inputs/coro-annotate.prof
+1-0      clang/lib/CodeGen/CGCoroutine.cpp
+38-0     4 files

LLVM/project aba7d72 mlir/lib/Conversion/AMDGPUToROCDL AMDGPUToROCDL.cpp, mlir/test/Conversion/AMDGPUToROCDL amdgpu-to-rocdl.mlir

[mlir][amdgpu] gfx1250+ lower fat_raw_pointer_cast (#175047)

* numRecords are set to all 1s if out of bounds is not requested.
* set flags correctly to zero.
Delta     File
+45-26    mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+30-2     mlir/test/Conversion/AMDGPUToROCDL/amdgpu-to-rocdl.mlir
+75-28    2 files

LLVM/project 51c37f4 llvm/include/llvm/ProfileData SampleProf.h, llvm/lib/CodeGen PseudoProbeInserter.cpp

[CodeGen] Strip Coroutine suffixes when generating pseudo probe (#173834)

The CoroSplit pass now creates separate DWARF symbols with the `.resume`,
`.destroy`, and `.cleanup` suffixes
(https://github.com/llvm/llvm-project/pull/141889). However, pseudo probes
are created in an earlier pass (`SampleProfileProbePass`), before
CoroSplit. This creates a mismatch of function GUIDs between the original
function name and the function names with the coroutine suffixes during
CodeGen, when the AsmPrinter iterates through the `InlinedAt` chain and
generates the `InlineStack`.

This will create mismatched pseudo probes in the final binary, and
llvm-profgen will also fail when parsing the pseudo probe section. This
fix simply strips the coroutine suffixes from the inline callers' names,
so the CoroSplit changes will be transparent.
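
A minimal sketch of such suffix stripping, assuming the suffix sits at the
end of the name and that only the three suffixes listed above matter.
`strip_coro_suffix` is a hypothetical helper, not the function added by this
patch, and it assumes C++17 for `std::string_view`.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical helper: drop one trailing coroutine suffix, if present.
inline std::string strip_coro_suffix(std::string_view name) {
  static constexpr std::array<std::string_view, 3> kSuffixes = {
      ".resume", ".destroy", ".cleanup"};
  for (std::string_view suffix : kSuffixes) {
    const bool matches =
        name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (matches)
      return std::string(name.substr(0, name.size() - suffix.size()));
  }
  return std::string(name); // no coroutine suffix: leave the name unchanged
}
```

Because the stripped name equals the pre-split function name, a GUID
computed from it matches the one recorded when the probes were inserted.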
Delta     File
+340-0    llvm/test/Transforms/SampleProfile/pseudo-probe-coro-debug-fix.ll
+7-0      llvm/include/llvm/ProfileData/SampleProf.h
+5-0      llvm/lib/CodeGen/PseudoProbeInserter.cpp
+4-0      llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp
+1-0      llvm/lib/CodeGen/AsmPrinter/CMakeLists.txt
+357-0    5 files