Description
Since the Metadata SPI was introduced, SeaTunnel can resolve datasource and table metadata through a pluggable MetadataProvider.
Currently, this mechanism mainly targets external metadata systems such as Gravitino, DataHub, Atlas, or other catalog services. However, Zeta engine runtime scenarios also have a more lightweight requirement:
Users or external platforms may want to register datasource connection properties dynamically through the Zeta REST API, and then reference the registered datasource in a job configuration by metadata_datasource_id.
Therefore, this issue proposes adding a built-in runtime dynamic MetadataProvider for the Zeta engine.
This provider is not designed to replace external metadata services. It is a lightweight runtime datasource metadata provider for Zeta REST-based integration scenarios.
Motivation
In many platform-based or REST-based job submission scenarios, users do not want to repeatedly write datasource connection parameters in every SeaTunnel job configuration.
For example, without metadata datasource support, users need to write connection properties in every job:
    source {
      Jdbc {
        url = "jdbc:mysql://127.0.0.1:3306/test"
        user = "root"
        password = "******"
        driver = "com.mysql.cj.jdbc.Driver"
        query = "select * from user"
      }
    }
This has several problems:
- Datasource connection parameters are duplicated in many job configs.
- Sensitive information such as passwords may be exposed in job configs.
- External platforms need to generate complete connector configs every time.
- It is inconvenient for REST API based job submission.
- It is unfriendly to low-code platforms and AI-generated job configurations.
With a built-in dynamic metadata provider, users can register datasource connection properties once through the Zeta REST API, and then reference the datasource by metadata_datasource_id in job configs.
Example:
    source {
      Jdbc {
        metadata_datasource_id = "mysql_001"
        query = "select * from user"
      }
    }
The dynamic MetadataProvider will resolve mysql_001 to the actual datasource connection properties.
Proposal
Add a built-in dynamic metadata provider for Zeta engine.
Example configuration in seatunnel.yaml:
    seatunnel:
      metadata:
        enabled: true
        kind: dynamic
        dynamic:
          enable_storage: false
Configuration description:
| Option | Description |
| --- | --- |
| seatunnel.metadata.enabled | Whether to enable the metadata provider |
| seatunnel.metadata.kind | Metadata provider type. For this proposal, the value is dynamic |
| seatunnel.metadata.dynamic.enable_storage | Whether to enable local persistent storage. In the first version, only false is supported |
In the first version, enable_storage = false means datasource metadata is only available during the current Zeta runtime. After Zeta engine restarts, the registered datasource metadata will be lost.
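As a rough illustration of how these options might be validated at startup, the sketch below checks a parsed seatunnel.yaml represented as a plain dictionary. The function name validate_metadata_config, the default of treating a missing enabled flag as false, and the error message are assumptions made for this example, not part of the existing Metadata SPI.

```python
def validate_metadata_config(config: dict) -> dict:
    """Return the effective metadata settings, or raise ValueError.

    Hypothetical helper: `config` is the parsed seatunnel.yaml content.
    """
    metadata = config.get("seatunnel", {}).get("metadata", {})

    # Assumption: a missing `enabled` flag means the provider is disabled.
    if not metadata.get("enabled", False):
        return {"enabled": False}

    kind = metadata.get("kind")
    if kind != "dynamic":
        # Other provider kinds are outside the scope of this sketch.
        return {"enabled": True, "kind": kind}

    dynamic = metadata.get("dynamic", {})
    if dynamic.get("enable_storage", False):
        # The first version only supports runtime (non-persistent) storage.
        raise ValueError(
            "seatunnel.metadata.dynamic.enable_storage = true "
            "is not supported in the first version"
        )

    return {"enabled": True, "kind": "dynamic", "enable_storage": False}
```

Rejecting enable_storage = true explicitly, rather than silently ignoring it, would make it obvious to users that registered datasources do not survive a restart.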
REST API
Add Zeta REST API to register datasource metadata dynamically.
Create datasource
POST /metadata/datasource
Request body example:
    {
      "datasource_id": "mysql_001",
      "type": "jdbc",
      "properties": {
        "url": "jdbc:mysql://127.0.0.1:3306/test",
        "user": "root",
        "password": "******",
        "driver": "com.mysql.cj.jdbc.Driver"
      }
    }
Field description:
| Field | Description |
| --- | --- |
| datasource_id | Unique datasource identifier. It will be referenced by metadata_datasource_id in job configs |
| type | Datasource or connector type, such as jdbc, mysql-cdc, postgres-cdc, kafka, etc. |
| properties | Connector-specific connection properties, such as url, user, password, driver, etc. |
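For illustration, a client could build the registration request as in the following sketch, which uses only the Python standard library. The base URL http://localhost:8080 and the helper name build_create_request are assumptions; the actual host, port, and any context path depend on how the Zeta REST service is deployed.

```python
import json
import urllib.request

# Assumption: address of the Zeta REST service; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def build_create_request(datasource_id, ds_type, properties, base_url=BASE_URL):
    """Build (but do not send) a POST /metadata/datasource request."""
    payload = {
        "datasource_id": datasource_id,
        "type": ds_type,
        "properties": properties,
    }
    return urllib.request.Request(
        url=f"{base_url}/metadata/datasource",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "mysql_001",
    "jdbc",
    {
        "url": "jdbc:mysql://127.0.0.1:3306/test",
        "user": "root",
        "password": "******",
        "driver": "com.mysql.cj.jdbc.Driver",
    },
)
# To actually send it against a running cluster:
#     urllib.request.urlopen(request)
```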
After registering the datasource, users can reference it in job configuration:
    source {
      Jdbc {
        metadata_datasource_id = "mysql_001"
        query = "select * from user"
      }
    }
The dynamic metadata provider should resolve metadata_datasource_id = mysql_001 to the actual connector connection properties.
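The resolution step can be pictured with the minimal sketch below. It is not the Metadata SPI itself: the function resolve_connector_options and the dictionary-based registry are hypothetical, and the rule that options written directly in the job config override registered properties is an assumption this proposal does not settle.

```python
def resolve_connector_options(options: dict, registry: dict) -> dict:
    """Expand metadata_datasource_id into concrete connection properties.

    `registry` maps datasource_id -> {"type": ..., "properties": {...}}.
    """
    datasource_id = options.get("metadata_datasource_id")
    if datasource_id is None:
        # No reference: keep the existing connector behavior unchanged.
        return dict(options)

    if datasource_id not in registry:
        raise KeyError(f"unknown metadata_datasource_id: {datasource_id}")

    resolved = dict(registry[datasource_id]["properties"])
    # Assumption: job-level options win over registered properties.
    for key, value in options.items():
        if key != "metadata_datasource_id":
            resolved[key] = value
    return resolved


example_registry = {
    "mysql_001": {
        "type": "jdbc",
        "properties": {
            "url": "jdbc:mysql://127.0.0.1:3306/test",
            "user": "root",
            "password": "******",
            "driver": "com.mysql.cj.jdbc.Driver",
        },
    }
}
example_options = {
    "metadata_datasource_id": "mysql_001",
    "query": "select * from user",
}
resolved_options = resolve_connector_options(example_options, example_registry)
```

After resolution, the connector sees the same keys it would have received from a fully written job config (url, user, password, driver, query), so connector code does not need to know where the properties came from.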
Scope of the first version
The first version should focus on the minimum complete workflow:
- Add a built-in dynamic type MetadataProvider.
- Support storing datasource metadata during Zeta runtime.
- Support registering datasource metadata through Zeta REST API.
- Support resolving datasource connection properties by metadata_datasource_id.
- Do not support local persistent storage in the first version.
The main workflow is:
Register datasource through Zeta REST API
↓
Dynamic MetadataProvider stores datasource metadata
↓
Job config references datasource by metadata_datasource_id
↓
Connector gets actual connection properties
↓
Job runs normally
Runtime storage
Although the first version does not support persistent storage, it should not simply store datasource metadata in a local JVM HashMap.
Zeta is a distributed engine. If the REST request is handled by node A, but the job is scheduled or executed by node B, node B may not be able to access datasource metadata stored only in node A's local memory.
Therefore, the first version should store datasource metadata in Zeta runtime cluster memory, for example by using Hazelcast distributed map or other existing runtime cluster storage.
The expected behavior is:
Datasource metadata is lost after Zeta engine restarts,
but it should be visible within the running Zeta cluster.
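The difference between node-local and cluster-visible storage can be shown with a toy simulation. This is not Hazelcast code: the Node class is hypothetical, and a single shared dictionary merely stands in for a distributed map that every node can read.

```python
class Node:
    """Toy model of a Zeta node with a local store and a shared cluster store."""

    def __init__(self, name, cluster_store):
        self.name = name
        self.local_store = {}               # visible to this node only
        self.cluster_store = cluster_store  # same object on every node

    def register_local(self, datasource_id, properties):
        self.local_store[datasource_id] = properties

    def register_cluster(self, datasource_id, properties):
        self.cluster_store[datasource_id] = properties

    def lookup_local(self, datasource_id):
        return self.local_store.get(datasource_id)

    def lookup_cluster(self, datasource_id):
        return self.cluster_store.get(datasource_id)


shared = {}  # stand-in for a distributed map
node_a = Node("A", shared)
node_b = Node("B", shared)

props = {"url": "jdbc:mysql://127.0.0.1:3306/test"}

# The REST request lands on node A and is stored only in its local memory.
node_a.register_local("mysql_001", props)
local_result_on_b = node_b.lookup_local("mysql_001")      # not found on B

# The same registration written to cluster-visible storage.
node_a.register_cluster("mysql_001", props)
cluster_result_on_b = node_b.lookup_cluster("mysql_001")  # found on B
```

In the simulation, node B cannot see the entry stored in node A's local memory, but it can read the entry written to the shared store, which is the behavior the first version needs.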
Non-goals
This issue does not aim to implement the following features in the first version:
- Local persistent storage.
- External metadata service integration.
- Full datasource management UI.
- Datasource permission model.
- Datasource version management.
- Credential encryption or secret management.
- Table schema metadata management.
- Metadata lineage management.
These features can be discussed and implemented in follow-up issues or PRs.
Open questions
1. Should the first version support GET, DELETE, and PUT (update) APIs?
The minimum API required for the first version is:
POST /metadata/datasource
For better debugging and management experience, we may also consider adding:
GET /metadata/datasource/{datasource_id}
DELETE /metadata/datasource/{datasource_id}
PUT /metadata/datasource/{datasource_id}
This can be discussed during implementation.
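To make that discussion concrete, one possible set of semantics for the four operations is sketched below as an in-memory registry. The class name DatasourceRegistry and the error behavior (rejecting a duplicate datasource_id on create, failing on update or delete of an unknown id) are assumptions for discussion, not decisions made in this issue.

```python
class DatasourceRegistry:
    """In-memory sketch of create/get/update/delete for datasource metadata."""

    def __init__(self):
        self._entries = {}

    def create(self, datasource_id, ds_type, properties):
        # Would back POST /metadata/datasource.
        if not datasource_id:
            raise ValueError("datasource_id must be provided by the caller")
        if datasource_id in self._entries:
            raise ValueError(f"datasource already exists: {datasource_id}")
        self._entries[datasource_id] = {
            "type": ds_type,
            "properties": dict(properties),
        }

    def get(self, datasource_id):
        # Would back GET /metadata/datasource/{datasource_id}.
        return self._entries[datasource_id]  # KeyError if unknown

    def update(self, datasource_id, properties):
        # Would back PUT /metadata/datasource/{datasource_id}.
        if datasource_id not in self._entries:
            raise KeyError(datasource_id)
        self._entries[datasource_id]["properties"] = dict(properties)

    def delete(self, datasource_id):
        # Would back DELETE /metadata/datasource/{datasource_id}.
        del self._entries[datasource_id]  # KeyError if unknown


registry = DatasourceRegistry()
registry.create("mysql_001", "jdbc", {"user": "root"})
registry.update("mysql_001", {"user": "etl"})
```

Whether a repeated POST with an existing datasource_id should fail, as sketched here, or overwrite the entry is one of the details that would need to be agreed on.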
2. Should SeaTunnel Client support datasource registration?
Besides REST API, we may also consider adding client-side commands in the future.
For example:
seatunnel.sh metadata datasource create --file datasource.json
or:
seatunnel.sh metadata datasource list
seatunnel.sh metadata datasource delete mysql_001
This is not required in the first version and can be implemented in a follow-up PR.
3. Should datasource_id be provided by the user or generated by the server?
For platform integration scenarios, a user-provided datasource_id is more predictable because job configs can reference a stable datasource identifier directly.
For example:
metadata_datasource_id = "mysql_001"
Therefore, the first version may require users to provide datasource_id explicitly.
Compatibility
This proposal is based on the existing Metadata SPI.
It does not change the existing connector configuration behavior. Users can still configure datasource connection properties directly in job configs.
The dynamic metadata provider only provides an additional way to resolve datasource connection properties through metadata_datasource_id.
Benefits
This feature can bring the following benefits:
- Make Metadata SPI directly usable in Zeta runtime scenarios.
- Reduce duplicated datasource connection parameters in job configs.
- Improve REST API based job submission experience.
- Make SeaTunnel easier to integrate with external platforms.
- Provide a lightweight option without requiring external metadata services.
- Provide a foundation for future local storage, client commands, UI management, and permission control.
- Make it easier for low-code platforms or AI agents to generate SeaTunnel job configs.
Summary
This issue proposes to add a built-in runtime dynamic MetadataProvider for Zeta engine.
The provider allows users to register datasource connection properties through Zeta REST API and reference them in job configs by metadata_datasource_id.
This is not intended to replace external metadata systems such as Gravitino, DataHub, or Atlas. Instead, it provides a lightweight runtime datasource metadata provider for Zeta engine, especially for REST API based job submission and platform integration scenarios.