Feature Description
Add file and image sending support to the DingTalk platform adapter (gateway/platforms/dingtalk.py).
Currently the adapter only supports sending markdown text replies via the session webhook. It should also support sending:
- Images (jpg, png, gif, bmp — up to 20MB)
- Files (doc, docx, xls, xlsx, ppt, pptx, zip, pdf, rar — up to 20MB)
- Voice messages (amr, mp3, wav — up to 2MB)
Motivation
- Users want to receive generated documents, images, and other files from the agent through DingTalk
- The DingTalk API already supports these message types via session webhook
- The media upload permission (/media/upload) is enabled by default for enterprise internal apps, so no additional permission is needed
- Other platforms (Discord, Telegram) already support file sending
Proposed Solution
- Upload flow: Use DingTalk's /media/upload API to upload files and obtain a media_id
- Send flow: Send image/file/voice message types via the session webhook (same endpoint, different msgtype)
- Integration: Hook into the existing send() method or add a send_file() method
Session webhook already accepts these payloads:
// Image
{"msgtype": "image", "image": {"media_id": "@xxx"}}
// File
{"msgtype": "file", "file": {"media_id": "@xxx"}}
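The selection and payload steps above could be sketched roughly as follows. This is only an illustration: `classify_media` and `build_media_payload` are hypothetical helper names, not existing adapter code, and the extension and size limits are copied from the lists in this issue. The voice payload is assumed to mirror the image/file shape shown above; DingTalk may require extra fields for voice, which is not verified here. The actual upload to /media/upload is left out.

```python
from pathlib import Path

MB = 1024 * 1024

# msgtype -> (allowed extensions, max size in bytes), taken from this issue.
_MEDIA_RULES = {
    "image": ({"jpg", "png", "gif", "bmp"}, 20 * MB),
    "file": ({"doc", "docx", "xls", "xlsx", "ppt", "pptx", "zip", "pdf", "rar"}, 20 * MB),
    "voice": ({"amr", "mp3", "wav"}, 2 * MB),
}


def classify_media(filename: str, size_bytes: int) -> str:
    """Return the msgtype for a file, or raise ValueError if it cannot be sent."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for msgtype, (extensions, limit) in _MEDIA_RULES.items():
        if ext in extensions:
            if size_bytes > limit:
                raise ValueError(f"{filename}: {size_bytes} bytes exceeds {msgtype} limit of {limit}")
            return msgtype
    raise ValueError(f"{filename}: unsupported extension {ext!r}")


def build_media_payload(msgtype: str, media_id: str) -> dict:
    """Build the session-webhook body, e.g. {"msgtype": "image", "image": {...}}.

    Assumption: voice uses the same shape as the image/file examples above.
    """
    return {"msgtype": msgtype, msgtype: {"media_id": media_id}}
```

A send_file() method could call `classify_media` before uploading, so that oversized or unsupported files are rejected without a network round trip.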
Bugs Found During Setup
Also found 5 bugs in the current DingTalk adapter:
Bug 1: start() should be start_forever()
start() is a coroutine function, so asyncio.to_thread(self._stream_client.start) only returns a coroutine object without ever running it, and the stream client exits immediately. The blocking start_forever() should be passed instead.
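The behaviour can be reproduced without the SDK. In the sketch below, `FakeStreamClient` is a hypothetical stand-in that only mimics the shape described above (an async start() and a blocking start_forever()); it is not the real dingtalk-stream client.

```python
import asyncio
import inspect


class FakeStreamClient:
    """Stand-in for the SDK client: async start(), blocking start_forever()."""

    def __init__(self) -> None:
        self.started = False

    async def start(self) -> None:
        self.started = True

    def start_forever(self) -> None:
        # Runs the coroutine on its own event loop inside the worker thread.
        asyncio.run(self.start())


async def demo() -> tuple:
    client = FakeStreamClient()

    # Bug: the worker thread calls start(), which only creates a coroutine object.
    result = await asyncio.to_thread(client.start)
    got_coroutine = inspect.iscoroutine(result)
    result.close()  # silence the "never awaited" warning
    started_after_start = client.started

    # Fix: hand the blocking entry point to the worker thread instead.
    await asyncio.to_thread(client.start_forever)
    return got_coroutine, started_after_start, client.started
```

Running `asyncio.run(demo())` shows that the first call returns an unexecuted coroutine, while the second actually runs the client.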
Bug 2: process() must be async
The SDK's ChatbotHandler.process() is an async method. The override must also be async, otherwise raw_process fails with object tuple can't be used in 'await' expression.
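A minimal reproduction of the failure mode, assuming the SDK's dispatcher awaits process() as described above. `raw_process`, `STATUS_OK`, and both handler classes here are simplified stand-ins, not the SDK's own code.

```python
import asyncio

STATUS_OK = 200  # stand-in for the SDK's acknowledgement status


class SyncHandler:
    def process(self, callback):
        # Bug: a plain method returns a tuple, which cannot be awaited.
        return STATUS_OK, "OK"


class AsyncHandler:
    async def process(self, callback):
        # Fix: a coroutine method can be awaited by the dispatcher.
        return STATUS_OK, "OK"


async def raw_process(handler, callback):
    """Stand-in dispatcher that awaits the handler, as the SDK is described to do."""
    return await handler.process(callback)


def try_handler(handler):
    """Run one dispatch and report either the result or the TypeError text."""
    try:
        return asyncio.run(raw_process(handler, {}))
    except TypeError as exc:
        return f"TypeError: {exc}"
```

`try_handler(SyncHandler())` yields a TypeError message about awaiting a tuple (the exact wording depends on the Python version), while `try_handler(AsyncHandler())` returns the acknowledgement tuple.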
Bug 3: _extract_text() wrong type handling
message.text is a TextContent object (has .content attr), not a dict or str. Current code produces garbled output like TextContent(content=hello) instead of extracting hello.
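One possible shape for the fix is sketched below. The `TextContent` dataclass is a stand-in for the SDK type (only the .content attribute mentioned above is assumed), and `extract_text` is a hypothetical free function rather than the adapter's actual _extract_text().

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TextContent:
    """Stand-in for the SDK's TextContent; only .content is assumed."""
    content: str


def extract_text(text: Any) -> str:
    """Return the plain message text from an object, dict, or str."""
    if text is None:
        return ""
    # Object form: read the .content attribute instead of calling str() on it.
    content = getattr(text, "content", None)
    if isinstance(content, str):
        return content.strip()
    # Keep tolerating the dict and str forms the current code expects.
    if isinstance(text, dict):
        return str(text.get("content", "")).strip()
    if isinstance(text, str):
        return text.strip()
    return ""
```

Checking the attribute first keeps the function tolerant of the older dict and str inputs while handling the object form correctly.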
Bug 4: CallbackMessage vs ChatbotMessage
The SDK passes CallbackMessage to process(), not ChatbotMessage. Need to extract from CallbackMessage.data via ChatbotMessage.from_dict(data).
Bug 5: Webhook URL regex too strict
_DINGTALK_WEBHOOK_RE only matches api.dingtalk.com, but session webhooks use oapi.dingtalk.com. Fix: r'^https://(api|oapi)\.dingtalk\.com/'
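The proposed pattern can be checked in isolation. The "old" pattern below is an assumption reconstructed from the description (the adapter's exact regex is not quoted in this issue), and the sample URLs are illustrative only.

```python
import re

# Assumed form of the current, too-strict pattern (not copied from the adapter).
OLD_WEBHOOK_RE = re.compile(r'^https://api\.dingtalk\.com/')

# Proposed fix from this issue: accept both api and oapi hosts.
NEW_WEBHOOK_RE = re.compile(r'^https://(api|oapi)\.dingtalk\.com/')

SAMPLE_URLS = [
    "https://oapi.dingtalk.com/robot/sendBySession?session=xxx",
    "https://api.dingtalk.com/v1.0/example",
    "https://evil.example.com/oapi.dingtalk.com/",
    "http://oapi.dingtalk.com/robot/sendBySession?session=xxx",
]


def accepted(pattern: re.Pattern, urls: list) -> list:
    """Return the URLs the pattern accepts."""
    return [url for url in urls if pattern.match(url)]
```

The anchored `^https://` prefix and escaped dots still reject look-alike hosts and plain-HTTP URLs.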
Environment
- Hermes Agent (latest)
- DingTalk Stream Mode (dingtalk-stream SDK 0.24.3)
- Docker on Linux