[DingTalk] Add file/image sending support and fix 5 bugs in DingTalk adapter #9149

@chenyf1010

Feature Description

Add file and image sending support to the DingTalk platform adapter (gateway/platforms/dingtalk.py).

Currently the adapter only supports sending markdown text replies via the session webhook. It should also support sending:

  • Images (jpg, png, gif, bmp — up to 20MB)
  • Files (doc, docx, xls, xlsx, ppt, pptx, zip, pdf, rar — up to 20MB)
  • Voice messages (amr, mp3, wav — up to 2MB)
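A minimal sketch of how the adapter could pick a message type and enforce the limits listed above before uploading. The names `MEDIA_RULES` and `classify_media` are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption, not something confirmed by the DingTalk docs quoted here.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the limits above are binary megabytes

# Extension sets and size caps copied from the list above.
MEDIA_RULES = {
    "image": ({"jpg", "png", "gif", "bmp"}, 20 * MB),
    "file": ({"doc", "docx", "xls", "xlsx", "ppt", "pptx", "zip", "pdf", "rar"}, 20 * MB),
    "voice": ({"amr", "mp3", "wav"}, 2 * MB),
}


def classify_media(filename: str, size_bytes: int) -> str:
    """Return "image", "file" or "voice"; raise ValueError if unsupported or too large."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for kind, (extensions, max_bytes) in MEDIA_RULES.items():
        if ext in extensions:
            if size_bytes > max_bytes:
                raise ValueError(
                    f"{filename}: {size_bytes} bytes exceeds the {kind} limit of {max_bytes} bytes"
                )
            return kind
    raise ValueError(f"{filename}: unsupported extension {ext!r}")
```

Rejecting oversized or unsupported files locally would let the adapter fall back to a text reply instead of waiting for the upload to fail.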

Motivation

  • Users want to receive generated documents, images, and other files from the agent through DingTalk
  • The DingTalk API already supports these message types via session webhook
  • The media upload permission (/media/upload) is enabled by default for enterprise internal apps, so no additional permission is needed
  • Other platforms (Discord, Telegram) already support file sending

Proposed Solution

  1. Upload flow: Use DingTalk's /media/upload API to upload the file and obtain a media_id
  2. Send flow: Send image/file/voice message types via session webhook (same endpoint, different msgtype)
  3. Integration: Hook into the existing send() method or add a send_file() method

Session webhook already accepts these payloads:

// Image
{"msgtype": "image", "image": {"media_id": "@xxx"}}

// File
{"msgtype": "file", "file": {"media_id": "@xxx"}}
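The upload-then-send flow could be wired together roughly as follows. This is a sketch, not the adapter's actual code: `build_media_payload` and `send_media` are hypothetical names, and the `upload` and `post` callables stand in for the real HTTP calls to /media/upload and the session webhook. Only the image and file payloads quoted above are built; the voice payload is not shown in this issue, so it is left out rather than guessed.

```python
from typing import Callable, Dict

# Only the two payload shapes quoted above are covered.
_SUPPORTED_MSGTYPES = ("image", "file")


def build_media_payload(msgtype: str, media_id: str) -> Dict[str, object]:
    """Build the session-webhook body for an already-uploaded media_id."""
    if msgtype not in _SUPPORTED_MSGTYPES:
        raise ValueError(f"unsupported msgtype: {msgtype!r}")
    return {"msgtype": msgtype, msgtype: {"media_id": media_id}}


def send_media(
    path: str,
    msgtype: str,
    session_webhook: str,
    upload: Callable[[str, str], str],
    post: Callable[[str, Dict[str, object]], None],
) -> Dict[str, object]:
    """Upload the file, then send it through the session webhook.

    `upload(path, msgtype)` is assumed to return a media_id and
    `post(url, payload)` to deliver the JSON body; both are injected so the
    flow can be exercised without network access.
    """
    media_id = upload(path, msgtype)
    payload = build_media_payload(msgtype, media_id)
    post(session_webhook, payload)
    return payload
```

Injecting the two callables keeps the ordering logic separate from the HTTP client, so either a new send_file() method or the existing send() could reuse it.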

Bugs Found During Setup

Also found 5 bugs in the current DingTalk adapter:

Bug 1: start() should be start_forever()

start() is a coroutine function, so asyncio.to_thread(self._stream_client.start) only creates a coroutine object in the worker thread and never runs it. The stream client exits immediately. The adapter should call start_forever() instead.
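The behaviour can be reproduced without the SDK. In the sketch below, `FakeStreamClient` is a hypothetical stand-in that only mirrors the shape described above (an async start() and a blocking start_forever()); unlike the real client it runs once and returns.

```python
import asyncio
import inspect


class FakeStreamClient:
    """Hypothetical stand-in for the SDK's stream client."""

    def __init__(self) -> None:
        self.started = False

    async def start(self) -> None:
        self.started = True

    def start_forever(self) -> None:
        # Blocking entry point that drives the coroutine on its own event loop.
        asyncio.run(self.start())


async def demo():
    client = FakeStreamClient()

    # Buggy call: the worker thread only *calls* start(), which returns an
    # un-awaited coroutine object. Nothing inside start() has run yet.
    result = await asyncio.to_thread(client.start)
    got_coroutine = inspect.iscoroutine(result)
    started_after_bug = client.started
    result.close()  # avoid the "never awaited" warning

    # Fixed call: the blocking wrapper actually executes start().
    await asyncio.to_thread(client.start_forever)
    return got_coroutine, started_after_bug, client.started
```

Running `asyncio.run(demo())` shows that the first call hands back a coroutine object and leaves the client unstarted, while the second call starts it.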

Bug 2: process() must be async

The SDK's ChatbotHandler.process() is an async method, so the adapter's override must also be async. Otherwise raw_process fails with: object tuple can't be used in 'await' expression.
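A small reproduction of the failure mode, using hypothetical stand-in classes rather than the SDK itself. `BaseHandler.raw_process` is assumed to await `process()` as described above, and the `(200, "OK")` return value is illustrative only.

```python
import asyncio


class BaseHandler:
    """Hypothetical stand-in for the SDK's handler base class."""

    async def process(self, message):
        return 200, "not implemented"

    async def raw_process(self, message):
        # Awaiting the result of process() only works if process() is async.
        code, text = await self.process(message)
        return code, text


class SyncHandler(BaseHandler):
    # Bug: a plain function returns a tuple, which cannot be awaited.
    def process(self, message):
        return 200, "OK"


class AsyncHandler(BaseHandler):
    # Fix: declaring the override async makes it return an awaitable.
    async def process(self, message):
        return 200, "OK"
```

With these classes, `asyncio.run(SyncHandler().raw_process({}))` raises a TypeError, while the `AsyncHandler` version returns the tuple normally.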

Bug 3: _extract_text() wrong type handling

message.text is a TextContent object with a .content attribute, not a dict or str. The current code produces garbled output such as TextContent(content=hello) instead of extracting hello.
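One way the extraction could be made tolerant of all three shapes is sketched below. The `TextContent` dataclass here is a hypothetical stand-in for the SDK type, and `extract_text` is an illustrative helper, not the adapter's current `_extract_text()`.

```python
from dataclasses import dataclass


@dataclass
class TextContent:
    """Hypothetical stand-in for the SDK's TextContent (only .content is modelled)."""

    content: str


def extract_text(text) -> str:
    """Return the plain message text from an object, dict, or str."""
    # Object form: read the .content attribute instead of stringifying the object.
    content = getattr(text, "content", None)
    if isinstance(content, str):
        return content.strip()
    # Dict and str forms are kept as fallbacks for other payload shapes.
    if isinstance(text, dict):
        return str(text.get("content", "")).strip()
    if isinstance(text, str):
        return text.strip()
    return ""
```

Checking the attribute first means the object case no longer falls through to a str() conversion.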

Bug 4: CallbackMessage vs ChatbotMessage

The SDK passes a CallbackMessage to process(), not a ChatbotMessage. The handler needs to build the ChatbotMessage from CallbackMessage.data via ChatbotMessage.from_dict(data).
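The conversion step might look like the sketch below. Both classes are hypothetical stand-ins that model only what this issue mentions (a `data` dict on the callback and a `from_dict` constructor); the `senderNick` and `text` keys are illustrative and not taken from the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class CallbackMessage:
    """Hypothetical stand-in: the envelope the SDK passes to process()."""

    data: dict = field(default_factory=dict)


@dataclass
class ChatbotMessage:
    """Hypothetical stand-in: the parsed chatbot message."""

    sender_nick: str = ""
    text: str = ""

    @classmethod
    def from_dict(cls, d: dict) -> "ChatbotMessage":
        # Key names are illustrative assumptions.
        return cls(sender_nick=d.get("senderNick", ""), text=d.get("text", ""))


def to_chatbot_message(callback: CallbackMessage) -> ChatbotMessage:
    """Unwrap the callback envelope before touching chatbot fields."""
    return ChatbotMessage.from_dict(callback.data)
```

Doing the unwrap once at the top of process() keeps the rest of the handler working with a single message type.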

Bug 5: Webhook URL regex too strict

_DINGTALK_WEBHOOK_RE only matches api.dingtalk.com, but session webhooks use oapi.dingtalk.com. Fix: r'^https://(api|oapi)\.dingtalk\.com/'
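A quick check of the proposed pattern. The helper name `is_dingtalk_webhook` and the example URL paths are illustrative assumptions; only the regex itself comes from the fix above.

```python
import re

# Pattern proposed in the fix above: accept both api. and oapi. hosts.
_DINGTALK_WEBHOOK_RE = re.compile(r'^https://(api|oapi)\.dingtalk\.com/')


def is_dingtalk_webhook(url: str) -> bool:
    """Return True if the URL points at an accepted DingTalk host over HTTPS."""
    return bool(_DINGTALK_WEBHOOK_RE.match(url))
```

Because the pattern requires a `/` directly after `dingtalk.com`, look-alike hosts such as `oapi.dingtalk.com.example.org` are still rejected.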

Environment

  • Hermes Agent (latest)
  • DingTalk Stream Mode (dingtalk-stream SDK 0.24.3)
  • Docker on Linux
