Commit 27896e1

Merge pull request #2548 from Open-Earth-Foundation/on-5594-multilingual-support
ON-5594: Add multilingual explanations and translation safeguards
2 parents 63cca6e + df51d42 commit 27896e1

60 files changed

Lines changed: 10968 additions & 5510 deletions

hiap-meed/.env.example

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ HIAP_MEED_FREE_TEXT_EXCLUSIONS_MODEL=
 HIAP_MEED_EXPLANATIONS_ENABLED=true
 # model name used by the explanation stage
 HIAP_MEED_EXPLANATIONS_MODEL=
+# model name used for explanation translation
+HIAP_MEED_EXPLANATION_TRANSLATIONS_MODEL=
 
 # required for OpenAI-backed features
 OPENAI_API_KEY=
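
As a rough illustration of how the new variable might be consumed, the sketch below reads the explanation-related settings from the environment. The `ExplanationModelSettings` class, the `load_settings` function, the treatment of blank values as "not configured", and the default of `true` for the enabled flag are all assumptions made for this example; the commit does not show how the service actually loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExplanationModelSettings:
    """Hypothetical settings holder (not taken from the commit)."""

    explanations_enabled: bool
    explanations_model: Optional[str]
    translations_model: Optional[str]


def _blank_to_none(value: Optional[str]) -> Optional[str]:
    # Treat a missing or whitespace-only value as "not configured".
    value = (value or "").strip()
    return value or None


def load_settings(env: Mapping[str, str] = os.environ) -> ExplanationModelSettings:
    # Assumption: the flag defaults to "true", mirroring .env.example.
    enabled_raw = env.get("HIAP_MEED_EXPLANATIONS_ENABLED", "true")
    return ExplanationModelSettings(
        explanations_enabled=enabled_raw.strip().lower() == "true",
        explanations_model=_blank_to_none(env.get("HIAP_MEED_EXPLANATIONS_MODEL")),
        translations_model=_blank_to_none(
            env.get("HIAP_MEED_EXPLANATION_TRANSLATIONS_MODEL")
        ),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests.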

hiap-meed/README.md

Lines changed: 38 additions & 7 deletions
@@ -3,6 +3,7 @@
 `hiap-meed` is a synchronous FastAPI service that implements the MEED prioritization pipeline. It sits between the CityCatalyst frontend and the upstream Global API, fetching city context and action data before running a configurable scoring pipeline.
 
 See [`docs/service-architecture.md`](docs/service-architecture.md) for the full system diagram.
+See [`docs/prioritization-accuracy-initial-benchmark.md`](docs/prioritization-accuracy-initial-benchmark.md) for the planned validation mechanism of ranking quality.
 
 ## Repository layout
 
@@ -45,6 +46,7 @@ HIAP_MEED_FREE_TEXT_EXCLUSIONS_ENABLED=false
 HIAP_MEED_FREE_TEXT_EXCLUSIONS_MODEL=
 HIAP_MEED_EXPLANATIONS_ENABLED=true
 HIAP_MEED_EXPLANATIONS_MODEL=
+HIAP_MEED_EXPLANATION_TRANSLATIONS_MODEL=
 OPENAI_API_KEY=
 OPENAI_TIMEOUT_SECONDS=30
 OPENAI_MAX_RETRIES=3
@@ -68,7 +70,8 @@ Variables:
 - `OPENAI_API_KEY`: API key used by OpenAI-backed features
 - `OPENAI_TIMEOUT_SECONDS`: shared OpenAI client timeout in seconds (default `30`)
 - `HIAP_MEED_EXPLANATIONS_ENABLED`: global switch for post-ranking explanation calls
-- `HIAP_MEED_EXPLANATIONS_MODEL`: model name used when `createExplanations=true`
+- `HIAP_MEED_EXPLANATIONS_MODEL`: model name used for canonical explanation generation when `createExplanations=true`
+- `HIAP_MEED_EXPLANATION_TRANSLATIONS_MODEL`: model name used for explanation translation
 - `OPENAI_MAX_RETRIES`: shared OpenAI client retries (default `3`)
 
 ### 2. Install dependencies
@@ -92,6 +95,7 @@ Verify the service:
 - Health check: `curl http://localhost:8000/health`
 - OpenAPI docs: `http://localhost:8000/docs`
 - Prioritization endpoint: `POST /v1/prioritize`
+- Explanation translation endpoint: `POST /v1/explanations/translate`
 - Exclusion preview endpoint: `POST /v1/prioritize/exclusions/preview`
 
 ### External API contracts (modeled, integration pending)
@@ -124,7 +128,10 @@ Request body:
 - Single-city and multi-city payloads both use `requestData.cityDataList`.
 - Optional flag: `requestData.createExplanations` controls whether the post-ranking
   explanation stage is executed.
-- `requestData.requestedLanguages` is currently accepted as a list for frontend compatibility, but ranked-action explanations support only one returned language today. The backend uses the first list item as the explanation language and ignores the rest.
+- `requestData.requestedLanguages` controls which explanation languages the backend attempts to return.
+- Canonical explanation generation is always English.
+- If non-English languages are requested, the backend generates English once and then translates from English into each requested target language.
+- Response metadata reports `generated_languages` as the languages actually present in the returned explanation payload.
 
 Exclusions:
 
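
The "generate English once, then translate" flow described in the hunk above can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's implementation: `build_explanations`, `generate_english`, and `translate` are hypothetical names, the choice to return English only when `en` is among the requested languages is a guess, and a failed translation is modeled as `None` so that the language is simply left out of the result.

```python
from typing import Callable, Dict, List, Optional, Tuple

CANONICAL_LANGUAGE = "en"


def build_explanations(
    requested_languages: List[str],
    generate_english: Callable[[], str],
    translate: Callable[[str, str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (explanations keyed by language, generated_languages)."""
    # Exactly one canonical generation call, regardless of how many
    # languages were requested.
    canonical = generate_english()

    explanations: Dict[str, str] = {}
    for language in requested_languages:
        if language in explanations:
            continue  # skip languages that were already produced
        if language == CANONICAL_LANGUAGE:
            explanations[language] = canonical
            continue
        # Hypothetical translator: returns None when translation fails.
        translated = translate(canonical, language)
        if translated is not None:
            explanations[language] = translated

    # Only languages actually present in the payload are reported.
    generated_languages = list(explanations)
    return explanations, generated_languages
```

Deriving `generated_languages` from the dictionary keys, rather than from the request, means a failed translation is reflected in the metadata without extra bookkeeping.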
@@ -219,8 +226,9 @@ Response fields:
 - `alignment_score` (`float`)
 - `feasibility_score` (`float`)
 - `evidence_summary` (`object`): compact explainability snapshot from hard-filter/impact/alignment/feasibility evidence
-- `explanation` (`string | null`): optional qualitative explanation text when `createExplanations=true`
+- `explanations` (`object`): optional explanation texts keyed by language code when `createExplanations=true`
 - `metadata` (`object`): request IDs, timings, counts, and hard-filter evidence.
+- `warnings` (`string[]`): human-readable translation warnings when canonical English inputs appear non-English or mixed-language
 
 Ranking details:
 
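
To make the response-level fields concrete, the sketch below shows one way `generated_languages` (the union of languages across `ranked_actions[].explanations`) and the top-level `warnings` list could be derived from per-action data. The function names, the sorted ordering, the `action_id: message` warning format, and the shape of `per_action_warnings` are assumptions for illustration; the commit does not show the backend's aggregation code.

```python
from typing import Any, Dict, Iterable, List


def collect_generated_languages(ranked_actions: Iterable[Dict[str, Any]]) -> List[str]:
    """Union of language codes present in any action's `explanations` object."""
    languages = set()
    for action in ranked_actions:
        # `explanations` may be an empty object when generation failed open.
        languages.update((action.get("explanations") or {}).keys())
    return sorted(languages)  # sorting is an assumption, chosen for stable output


def aggregate_warnings(per_action_warnings: Dict[str, List[str]]) -> List[str]:
    """Flatten per-action language-check warnings into one de-duplicated list."""
    aggregated: List[str] = []
    for action_id, warnings in per_action_warnings.items():
        for warning in warnings:
            message = f"{action_id}: {warning}"  # hypothetical message format
            if message not in aggregated:
                aggregated.append(message)
    return aggregated
```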
@@ -234,11 +242,26 @@ Explanation stage behavior:
 
 - Explanations are generated only when `requestData.createExplanations=true`.
 - Explanations are generated from post-ranking evidence and do not change ranks.
-- The explanation stage currently supports one output language only. It resolves that language from the first item in `requestData.requestedLanguages`.
+- Explanations are always authored canonically in English.
+- Requested non-English explanations are translations of the canonical English text.
+- In response metadata, `generated_languages` is the response-level union of explanation languages actually returned across `ranked_actions[].explanations`.
 - Explanations receive the selected `cityStrategicPreferenceCoBenefitKeys` directly.
+- If translation detects that a canonical explanation labeled as English appears non-English or mixed-language, translation still returns results and adds a warning to logs and the API response.
+- That language-check warning is determined internally per action, then aggregated by the backend into the public top-level `warnings` list returned by the API.
 - The backend logs a warning if the final explanation prompt becomes unusually large.
 - If explanation generation fails or times out, the endpoint fails open and
-  returns normal ranking output with `explanation=null`.
+  returns normal ranking output with `explanations={}`.
+
+### 5. Call the explanation translation endpoint
+
+- The endpoint accepts the frontend envelope `ExplanationTranslationApiRequest`.
+- `requestData.sourceLanguage` must be `en`.
+- `requestData.targetLanguages` must contain only non-English target languages.
+- `requestData.rankedActions[*]` includes:
+  - `actionId`
+  - `canonicalExplanation`
+- The endpoint is stateless: the frontend sends the canonical English explanations it wants translated.
+- The endpoint returns only the requested target-language translations, not the original English text.
 
 Example JSON request bodies (using mock data from `data/`):
 
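
The request contract for `POST /v1/explanations/translate` listed in the hunk above can be expressed as a small validation routine. This is a dependency-free sketch, not the service's schema: the function name `validate_translation_request` and the error wording are hypothetical, and the rules that `targetLanguages` must be non-empty and that language codes are compared case-insensitively are assumptions beyond what the README states.

```python
from typing import Any, Dict, List

SOURCE_LANGUAGE = "en"


def validate_translation_request(request_data: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors: List[str] = []

    # The canonical source text is always English.
    if request_data.get("sourceLanguage") != SOURCE_LANGUAGE:
        errors.append("sourceLanguage must be 'en'")

    targets = request_data.get("targetLanguages") or []
    if not targets:
        # Assumption: an empty target list is rejected.
        errors.append("targetLanguages must not be empty")
    elif any(str(t).lower() == SOURCE_LANGUAGE for t in targets):
        errors.append("targetLanguages must contain only non-English languages")

    # Each ranked action carries its ID and the English text to translate.
    for index, action in enumerate(request_data.get("rankedActions") or []):
        for field in ("actionId", "canonicalExplanation"):
            if not action.get(field):
                errors.append(f"rankedActions[{index}].{field} is required")

    return errors
```

In a FastAPI service the same rules would more likely live in a request model; the plain-function form is used here only to keep the example self-contained.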
@@ -363,9 +386,10 @@ Example response:
 "matched_city_gpc_refs_count": 2
 }
 },
-"explanation": null
+"explanations": {}
 }
 ],
+"warnings": [],
 "metadata": {
 "internal_request_id": "d1db6269-4cf9-4d62-8f4c-8f4ce631fbd2",
 "frontend_request_id": "1234567890",
@@ -442,7 +466,14 @@ What each request run folder contains:
 - `llm/explanations_io.json`: explanation-stage LLM request/response artifact (only when explanations are generated successfully)
 - `llm/explanations_prompt.txt`: plain-text rendered user prompt with preserved newlines (only when explanations are generated successfully)
 - `llm/explanations_error.json`: explanation-stage failure artifact with request context and error (only when explanation generation fails)
-- Explanation artifacts and response metadata record both the original `requestedLanguages` list and the single resolved explanation language used for the run.
+- `llm/explanation_translations_io.json`: translation-stage LLM request/response artifact (only when translations are generated successfully)
+- `llm/explanation_translations_prompt.txt`: plain-text rendered translation prompt (only when translations are generated successfully)
+- `llm/explanation_translations_error.json`: translation-stage failure artifact with request context and error (only when translation fails)
+- Prioritization explanation artifacts and response metadata record the original `requestedLanguages`, canonical language `en`, generated languages actually returned in the response, and any translation warnings.
+- Explanation translation request folders additionally include:
+  - `llm/explanation_translations_io.json`
+  - `llm/explanation_translations_prompt.txt`
+- Explanation translation artifacts record the source language contract, requested target languages, and any LLM language-check warnings.
 - For the direct other-preference feature, the `alignment` step detail includes evidence such as `resolved_preferred_co_benefits`, `matched_preferred_co_benefits`, and mapping source fields
 - The active request flow does not emit dedicated LLM prompt/response artifact files for Alignment because direct co-benefit selections are deterministic
 - Exclusion preview request folders additionally include:
