
[Feature][Zeta]Add built-in runtime dynamic MetadataProvider for datasource registration #10816

@chl-wxp

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Since the Metadata SPI was introduced, SeaTunnel has been able to resolve datasource and table metadata through a pluggable MetadataProvider.

Currently, this mechanism mainly targets external metadata systems such as Gravitino, DataHub, Atlas, or other catalog services. Zeta engine runtime scenarios, however, also have a lightweight requirement:

Users or external platforms may want to register datasource connection properties dynamically through the Zeta REST API, and then reference the registered datasource in job configurations by metadata_datasource_id.

Therefore, this issue proposes to add a built-in runtime dynamic MetadataProvider for the Zeta engine.

This provider is not designed to replace external metadata services. It is a lightweight runtime datasource metadata provider for REST-based integration with Zeta.

Motivation

In many platform-based or REST-based job submission scenarios, users do not want to repeatedly write datasource connection parameters in every SeaTunnel job configuration.

For example, without metadata datasource support, users need to write connection properties in every job:

source {
  Jdbc {
    url = "jdbc:mysql://127.0.0.1:3306/test"
    user = "root"
    password = "******"
    driver = "com.mysql.cj.jdbc.Driver"
    query = "select * from user"
  }
}

This has several problems:

  1. Datasource connection parameters are duplicated in many job configs.
  2. Sensitive information such as passwords may be exposed in job configs.
  3. External platforms need to generate complete connector configs every time.
  4. It is inconvenient for REST API-based job submission.
  5. It is unfriendly to low-code platforms and AI-generated job configurations.

With a built-in dynamic metadata provider, users can register datasource connection properties once through the Zeta REST API, and then reference the datasource by metadata_datasource_id in job configs.

Example:

source {
  Jdbc {
    metadata_datasource_id = "mysql_001"
    query = "select * from user"
  }
}

The dynamic MetadataProvider will resolve mysql_001 to the actual datasource connection properties.
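As a minimal sketch of that resolution step, assuming the registered properties and the connector block are both available as plain key/value maps, the merge could look like the following. The class and method names are hypothetical, and two behaviors are assumptions rather than part of this proposal: job-level options override registered ones on a key conflict, and metadata_datasource_id itself is dropped before the options reach the connector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not an existing SeaTunnel class.
class DatasourceOptionResolver {
    static final String ID_KEY = "metadata_datasource_id";

    /**
     * @param jobOptions options written in the connector block of the job config
     * @param registry   datasource_id -> registered connection properties
     * @return the options the connector would actually receive
     */
    static Map<String, Object> resolve(
            Map<String, Object> jobOptions,
            Map<String, Map<String, Object>> registry) {
        Object id = jobOptions.get(ID_KEY);
        if (id == null) {
            // No reference: the config is used exactly as written (existing behavior).
            return jobOptions;
        }
        Map<String, Object> registered = registry.get(id.toString());
        if (registered == null) {
            // Fail fast instead of starting a job with missing connection properties.
            throw new IllegalArgumentException("Unknown " + ID_KEY + ": " + id);
        }
        // Start from the registered connection properties...
        Map<String, Object> merged = new HashMap<>(registered);
        // ...then apply job-level options; on a key conflict the job config wins (assumed).
        merged.putAll(jobOptions);
        // The reference key itself is not a connector option (assumed).
        merged.remove(ID_KEY);
        return merged;
    }
}
```

A config without metadata_datasource_id is returned unchanged, which keeps direct connector configuration working as before.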

Proposal

Add a built-in dynamic metadata provider for the Zeta engine.

Example configuration in seatunnel.yaml:

seatunnel:
  metadata:
    enabled: true
    kind: dynamic
    dynamic:
      enable_storage: false

Configuration description:

| Option | Description |
|---|---|
| seatunnel.metadata.enabled | Whether to enable the metadata provider |
| seatunnel.metadata.kind | Metadata provider type. For this proposal, the value is dynamic |
| seatunnel.metadata.dynamic.enable_storage | Whether to enable local persistent storage. In the first version, only false is supported |

In the first version, enable_storage = false means datasource metadata is only available while the current Zeta runtime is running. After the Zeta engine restarts, the registered datasource metadata will be lost.
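A minimal sketch of how the engine might read and validate these options, assuming they arrive as a flattened key/value view of seatunnel.yaml. DynamicMetadataConfig and its defaults (provider disabled unless enabled is set) are hypothetical; the only rule taken from this proposal is that enable_storage = true is rejected in the first version.

```java
import java.util.Map;

// Hypothetical holder for the proposed seatunnel.metadata.* options.
record DynamicMetadataConfig(boolean enabled, String kind, boolean enableStorage) {

    static DynamicMetadataConfig from(Map<String, String> flat) {
        // Assumed default: the provider stays off unless explicitly enabled.
        boolean enabled = Boolean.parseBoolean(
                flat.getOrDefault("seatunnel.metadata.enabled", "false"));
        String kind = flat.get("seatunnel.metadata.kind");
        boolean enableStorage = Boolean.parseBoolean(
                flat.getOrDefault("seatunnel.metadata.dynamic.enable_storage", "false"));
        DynamicMetadataConfig config = new DynamicMetadataConfig(enabled, kind, enableStorage);
        config.validate();
        return config;
    }

    boolean usesDynamicProvider() {
        return enabled && "dynamic".equals(kind);
    }

    void validate() {
        // First version: metadata lives only in the running cluster, so persistent
        // storage cannot be switched on yet.
        if (usesDynamicProvider() && enableStorage) {
            throw new IllegalArgumentException(
                    "seatunnel.metadata.dynamic.enable_storage = true is not supported in the first version");
        }
    }
}
```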

REST API

Add a Zeta REST API endpoint to register datasource metadata dynamically.

Create datasource

POST /metadata/datasource

Request body example:

{
  "datasource_id": "mysql_001",
  "type": "jdbc",
  "properties": {
    "url": "jdbc:mysql://127.0.0.1:3306/test",
    "user": "root",
    "password": "******",
    "driver": "com.mysql.cj.jdbc.Driver"
  }
}

Field description:

| Field | Description |
|---|---|
| datasource_id | Unique datasource identifier. It will be referenced by metadata_datasource_id in job configs |
| type | Datasource or connector type, such as jdbc, mysql-cdc, postgres-cdc, or kafka |
| properties | Connector-specific connection properties, such as url, user, password, and driver |
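The request body maps naturally onto a small value type. The sketch below is hypothetical: the record name, the camelCase field names, and the rule that type is mandatory are assumptions, and JSON deserialization is left out.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical Java view of the POST /metadata/datasource request body.
record DatasourceRegistration(String datasourceId, String type, Map<String, Object> properties) {

    DatasourceRegistration {
        // datasource_id is referenced by metadata_datasource_id, so it must be a usable key.
        if (datasourceId == null || datasourceId.isBlank()) {
            throw new IllegalArgumentException("datasource_id is required");
        }
        // Assumption: type is mandatory so the provider knows which connector family it targets.
        if (type == null || type.isBlank()) {
            throw new IllegalArgumentException("type is required");
        }
        Objects.requireNonNull(properties, "properties is required");
        // Defensive, unmodifiable copy: later changes to the caller's map do not leak in.
        properties = Map.copyOf(properties);
    }
}
```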

After registering the datasource, users can reference it in job configuration:

source {
  Jdbc {
    metadata_datasource_id = "mysql_001"
    query = "select * from user"
  }
}

The dynamic metadata provider should resolve metadata_datasource_id = mysql_001 to the actual connector connection properties.

Scope of the first version

The first version should focus on the minimum complete workflow:

  1. Add a built-in MetadataProvider of kind dynamic.
  2. Support storing datasource metadata during Zeta runtime.
  3. Support registering datasource metadata through the Zeta REST API.
  4. Support resolving datasource connection properties by metadata_datasource_id.
  5. Do not support local persistent storage in the first version.

The main workflow is:

Register datasource through Zeta REST API
        ↓
Dynamic MetadataProvider stores datasource metadata
        ↓
Job config references datasource by metadata_datasource_id
        ↓
Connector gets actual connection properties
        ↓
Job runs normally

Runtime storage

Although the first version does not support persistent storage, it should not simply store datasource metadata in a local JVM HashMap.

Zeta is a distributed engine. If the REST request is handled by node A, but the job is scheduled or executed by node B, node B may not be able to access datasource metadata stored only in node A's local memory.

Therefore, the first version should store datasource metadata in Zeta runtime cluster memory, for example in a Hazelcast distributed map (IMap) or other existing runtime cluster storage.

The expected behavior is:

Datasource metadata is lost after the Zeta engine restarts,
but it should be visible to every node within the running Zeta cluster.
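One way to meet this requirement without tying the provider to a specific storage API is to code against java.util.concurrent.ConcurrentMap. Hazelcast's IMap extends ConcurrentMap, so inside the engine the store could be handed a cluster-wide map (for example from HazelcastInstance.getMap, with a map name chosen by the implementation), while unit tests pass a ConcurrentHashMap. RuntimeDatasourceStore is a hypothetical name, and rejecting a duplicate datasource_id on registration is an assumed policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentMap;

// Hypothetical runtime store. It depends only on ConcurrentMap, so a Hazelcast IMap
// (which extends ConcurrentMap) can be injected in the engine. Nothing is persisted:
// entries live only as long as the backing map does.
class RuntimeDatasourceStore {
    private final ConcurrentMap<String, Map<String, Object>> map;

    RuntimeDatasourceStore(ConcurrentMap<String, Map<String, Object>> map) {
        this.map = map;
    }

    /** Registers a datasource; returns false if the id is already taken (assumed policy). */
    boolean register(String datasourceId, Map<String, Object> properties) {
        // Copy into a HashMap, which is java.io.Serializable, because a distributed
        // map has to serialize the value to share it with other cluster members.
        return map.putIfAbsent(datasourceId, new HashMap<>(properties)) == null;
    }

    /** Looks up the properties that a metadata_datasource_id should resolve to. */
    Optional<Map<String, Object>> lookup(String datasourceId) {
        return Optional.ofNullable(map.get(datasourceId));
    }
}
```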

Non-goals

This issue does not aim to implement the following features in the first version:

  1. Local persistent storage.
  2. External metadata service integration.
  3. Full datasource management UI.
  4. Datasource permission model.
  5. Datasource version management.
  6. Credential encryption or secret management.
  7. Table schema metadata management.
  8. Metadata lineage management.

These features can be discussed and implemented in follow-up issues or PRs.

Open questions

1. Should the first version support GET, PUT (update), and DELETE APIs?

The minimum API required for the first version is:

POST /metadata/datasource

For a better debugging and management experience, we may also consider adding:

GET /metadata/datasource/{datasource_id}
DELETE /metadata/datasource/{datasource_id}
PUT /metadata/datasource/{datasource_id}

This can be discussed during implementation.
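If these extra endpoints are added, their semantics could follow the usual REST conventions. The sketch below only maps each call to the HTTP status it might return; the status codes, the conflict behavior for a repeated POST, and the class name are assumptions, and a local ConcurrentHashMap stands in for the cluster map purely to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical status-code semantics for the candidate datasource endpoints.
class DatasourceEndpoints {
    // Stand-in for the cluster-wide map used by the real provider.
    private final ConcurrentMap<String, Map<String, Object>> store = new ConcurrentHashMap<>();

    /** POST /metadata/datasource: 201 on create, 409 if the id already exists (assumed). */
    int post(String id, Map<String, Object> props) {
        return store.putIfAbsent(id, new HashMap<>(props)) == null ? 201 : 409;
    }

    /** GET /metadata/datasource/{datasource_id}: 200 if registered, 404 otherwise. */
    int get(String id) {
        return store.containsKey(id) ? 200 : 404;
    }

    /** PUT /metadata/datasource/{datasource_id}: replaces an existing entry only. */
    int put(String id, Map<String, Object> props) {
        return store.replace(id, new HashMap<>(props)) != null ? 200 : 404;
    }

    /** DELETE /metadata/datasource/{datasource_id}: 204 if removed, 404 otherwise. */
    int delete(String id) {
        return store.remove(id) != null ? 204 : 404;
    }
}
```

Keeping POST create-only and PUT update-only makes an accidental overwrite of a datasource that running jobs already reference an explicit action rather than a side effect.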

2. Should SeaTunnel Client support datasource registration?

Besides the REST API, we may also consider adding client-side commands in the future.

For example:

seatunnel.sh metadata datasource create --file datasource.json

or:

seatunnel.sh metadata datasource list
seatunnel.sh metadata datasource delete mysql_001

This is not required in the first version and can be implemented in a follow-up PR.

3. Should datasource_id be provided by user or generated by server?

For platform integration scenarios, a user-provided datasource_id is more predictable, because job configs can reference a stable datasource identifier directly.

For example:

metadata_datasource_id = "mysql_001"

Therefore, the first version may require users to provide datasource_id explicitly.

Compatibility

This proposal is based on the existing Metadata SPI.

It does not change the existing connector configuration behavior. Users can still configure datasource connection properties directly in job configs.

The dynamic metadata provider only provides an additional way to resolve datasource connection properties through metadata_datasource_id.

Benefits

This feature can bring the following benefits:

  1. Make Metadata SPI directly usable in Zeta runtime scenarios.
  2. Reduce duplicated datasource connection parameters in job configs.
  3. Improve the REST API-based job submission experience.
  4. Make SeaTunnel easier to integrate with external platforms.
  5. Provide a lightweight option without requiring external metadata services.
  6. Provide a foundation for future local storage, client commands, UI management, and permission control.
  7. Make it easier for low-code platforms or AI agents to generate SeaTunnel job configs.

Summary

This issue proposes to add a built-in runtime dynamic MetadataProvider for the Zeta engine.

The provider allows users to register datasource connection properties through the Zeta REST API and reference them in job configs by metadata_datasource_id.

This is not intended to replace external metadata systems such as Gravitino, DataHub, or Atlas. Instead, it provides a lightweight runtime datasource metadata provider for the Zeta engine, especially for REST API-based job submission and platform integration scenarios.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
