fix(backend): preserve conversations when LLM returns empty title (#5668) by Surya-Midde · Pull Request #6675 · BasedHardware/omi

Surya-Midde · 2026-04-15T19:25:22Z

Summary

Fixes #5668 — P1 data-loss bug where valid conversations were silently discarded when the LLM returned an empty title.

Two-part fix, split across two commits for reviewability:

fix(backend) — root-cause signal fix. _get_conversation_obj was re-deriving discarded from structured.title == ''. The authoritative discarded bool is already correctly computed by _get_structured via should_discard_conversation, and is used correctly downstream for folder assignment and analytics. It was just being dropped at the _get_conversation_obj call boundary. This commit passes it explicitly and removes the fragile title-based proxy. Default value is False so any future caller that forgets to pass it preserves the conversation rather than silently dropping it.
feat(backend) — deterministic fallback title. Even with silent-discard protection, an empty title still renders as "" in the UI and breaks search indexing. When the main transcript path's LLM call returns empty title for a transcript ≥100 words, a deterministic fallback is applied (preferring overview first-sentence with a ≥2-word guard, then a locale-independent date label, then "Untitled Conversation"). No retry LLM call — intentionally cheap.

Scope

Fix (1) protects all 5 return paths of _get_structured from silent discard (main transcript, force-process/reprocess, external-integration audio/message/other).
Fix (2) is scoped to the main transcript path only. External integrations typically get titles from their source systems (Limitless, iOS Shortcuts, etc.); force-process preserves the existing title as an LLM hint. Those paths are protected from data loss by fix (1) but don't get cosmetic fallback titles — keeping the PR narrow.

Files

backend/utils/conversations/process_conversation.py — signature change, new helper, fallback trigger, caller update
backend/tests/unit/test_empty_title_no_discard.py — new — 8 pure unit tests
backend/test.sh — register new test file

Note on line numbers

The issue cites some stale line numbers (conversation.py:169-170, process_conversation.py:209). Since the issue was filed, Structured was extracted into its own module (models/structured.py) and the title == '' check is now at process_conversation.py:222. The bug itself is unchanged.

Implementation details worth knowing

Locale-independent date formatting: uses a hard-coded English month-abbreviation tuple rather than strftime('%b %d, %Y'), which is locale-sensitive (returns \"avr.\" on fr_FR.UTF-8). This makes the fallback and its tests deterministic on any developer's machine.
2-word minimum on overview first sentence: avoids noisy one-word titles like \"Dr\" (from "Dr. Smith met with...") or \"Scored 3\" (from "Scored 3.5 on...").
Pure unit tests: all 8 tests avoid mocking _get_structured. Stubbing of heavy transitive imports (Firestore, Pinecone, RAG, etc.) follows the same pattern as test_process_conversation_usage_context.py.

Out of scope

Retry LLM title generation (extra cost for an edge case)
Making Structured.title a required Pydantic field (would break ≥6 placeholder construction sites in transcribe.py, merge_conversations.py, conversation_processing.py)
Fallback titles for external-integration and force-process paths

Potential follow-up (for post-merge, not this PR)

Add a `logger.info(...)` at the fallback trigger site for observability — would let maintainers spot LLM regressions via log volume without waiting for user reports.

Test plan

`pytest tests/unit/test_empty_title_no_discard.py -v` → 8/8 passing
`pytest tests/unit/test_conversation_model_split.py -v` → unchanged, existing assertions (`Structured().title == ''` default) still pass
`bash test.sh` run locally (new test registered)
`black --line-length 120 --skip-string-normalization` applied

🤖 Generated with Claude Code

) The `_get_conversation_obj` helper re-derived `discarded` from `structured.title == ''`, which silently dropped valid conversations whenever the LLM returned an empty title. The authoritative `discarded` signal was already being computed correctly by `_get_structured` (via `should_discard_conversation`) but was being thrown away at the call boundary. This change passes the authoritative `discarded` bool explicitly to `_get_conversation_obj` and removes the fragile title-based proxy. Default value is `False` so any future caller that forgets to pass a value preserves the conversation rather than silently dropping it. Adds three regression tests covering: 1. empty title + discarded=False preserves the conversation 2. empty title + discarded=True legitimately marks it discarded 3. omitted discarded param defaults to False (safe default) Fixes #5668 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…onversations (#5668) Complements the signal-fix commit: even with silent-discard protection, an empty LLM title still renders as '' in the UI and breaks search indexing. This commit adds a deterministic fallback title generator invoked on the main transcript path when the LLM returns an empty title for a transcript >=100 words. Preference order for the fallback: 1. First sentence of the LLM-generated overview (if present and <=60 chars) 2. Date-stamped label ("Conversation on Apr 15, 2026") 3. Static fallback "Untitled Conversation" No retry LLM call — intentionally cheap. External integration and force-process paths are not affected (they have their own title sources) and stay out of scope for this PR. Adds four helper tests (all pure). Fixes #5668 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-15T19:28:20Z

Greptile Summary

This PR fixes issue #5668 where conversations were silently discarded when the LLM returned an empty title. The fix has two parts: (1) _get_conversation_obj now accepts an explicit discarded parameter instead of re-deriving it from structured.title == '', and (2) a deterministic fallback title is applied on the main transcript path when the LLM returns an empty title for a transcript ≥100 words. Eight new unit tests are included and registered in test.sh.

Confidence Score: 5/5

Safe to merge — the data-loss bug is correctly fixed across all code paths with no new regressions introduced.

All three branches of _get_conversation_obj correctly propagate the explicit discarded parameter. The only call site in the codebase has been updated. The Structured.title field defaults to '' so .strip() is safe. The fallback title logic is deterministic and locale-independent. The single remaining finding is a P2 docstring inaccuracy about the "Scored 3" edge case — it does not affect runtime behavior.

No files require special attention; the docstring inaccuracy in process_conversation.py lines 109–111 is cosmetic only.

Important Files Changed

Filename	Overview
backend/utils/conversations/process_conversation.py	Root-cause fix: replaces `discarded = structured.title == ''` with explicit parameter; adds `_fallback_title_for_empty_llm_response`; one P2 docstring inaccuracy about the "Scored 3" guard
backend/tests/unit/test_empty_title_no_discard.py	8 pure unit tests covering the discard-signal fix and fallback title helper; module stubbing pattern mirrors existing tests; all cases are correct
backend/test.sh	New test file registered in CI runner; no issues

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[process_conversation] --> B["_get_structured(uid, lang, conv, force_process)"]
    B --> C{conversation.source?}
    C -->|external_integration / workflow| D{text_source?}
    D -->|audio| E[get_transcript_structure]
    D -->|message| F[get_message_structure]
    D -->|other| G[summarize_experience_text]
    E & F & G --> H["return structured, discarded=False"]
    C -->|native transcript| I{force_process?}
    I -->|yes| J[get_reprocess_transcript_structure]
    J --> K["return structured, discarded=False"]
    I -->|no| L[should_discard_conversation]
    L -->|discarded=True| M["return Structured(emoji=...), True"]
    L -->|discarded=False| N[get_transcript_structure]
    N --> O{title empty AND transcript >= 100 words?}
    O -->|yes| P["_fallback_title_for_empty_llm_response()"]
    O -->|no| Q["return structured, False"]
    P --> Q
    H & K & M & Q --> R["_get_conversation_obj(uid, structured, conv, discarded=discarded)"]
    R --> S{isinstance?}
    S -->|CreateConversation| T["Conversation(discarded=discarded)"]
    S -->|ExternalIntegrationCreateConversation| U["Conversation(discarded=discarded)"]
    S -->|Conversation| V["conv.discarded = discarded"]
    T & U & V --> W[return Conversation]

_{Reviews (1): Last reviewed commit: "feat(backend): fallback title for empty ..." | Re-trigger Greptile}

greptile-apps · 2026-04-15T19:28:24Z

+      * The 2-word minimum avoids producing noisy one-word titles when overview
+        begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr")
+        or a decimal ("Scored 3.5..." → first_sentence=="Scored 3").


Misleading docstring — 2-word guard does not prevent "Scored 3"

The comment claims the 2-word minimum avoids "Scored 3" (from "Scored 3.5 on..."), but "Scored 3" is 2 words and passes the >= 2 check. Tracing the logic: "Scored 3.5 on...".split('.') → ['Scored 3', '5 on...'], so first_sentence = 'Scored 3' which has len(...split()) == 2 and len == 8 <= 60 — it is returned as the title. The guard only prevents 1-word extractions (e.g. "Dr" from "Dr. Smith..."). The decimal-split case should either be handled explicitly or removed from the docstring to avoid misleading future maintainers.

Suggested change

* The 2-word minimum avoids producing noisy one-word titles when overview

begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr")

or a decimal ("Scored 3.5..." → first_sentence=="Scored 3").

* The 2-word minimum avoids producing noisy one-word titles when overview

begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr").

Note: decimal-split cases ("Scored 3.5..." → first_sentence=="Scored 3")

still produce a 2-word fallback title; they are not filtered by this guard.

Surya-Midde · 2026-04-15T19:30:54Z

Closing in favor of #6676 — same commits, pushed from my primary GitHub account (@middesurya). Sorry for the noise!

github-actions · 2026-04-15T19:31:15Z

Hey @Surya-Midde 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

middesurya and others added 2 commits April 15, 2026 13:34

greptile-apps Bot reviewed Apr 15, 2026

View reviewed changes

Surya-Midde closed this Apr 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend): preserve conversations when LLM returns empty title (#5668)#6675

fix(backend): preserve conversations when LLM returns empty title (#5668)#6675
Surya-Midde wants to merge 2 commits intoBasedHardware:mainfrom
Surya-Midde:fix/5668-empty-llm-title-discard

Surya-Midde commented Apr 15, 2026

Uh oh!

greptile-apps Bot commented Apr 15, 2026

Uh oh!

greptile-apps Bot Apr 15, 2026

Uh oh!

Surya-Midde commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Surya-Midde commented Apr 15, 2026

Summary

Scope

Files

Note on line numbers

Implementation details worth knowing

Out of scope

Potential follow-up (for post-merge, not this PR)

Test plan

Uh oh!

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Surya-Midde commented Apr 15, 2026

Uh oh!

github-actions Bot commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants