fix(backend): preserve conversations when LLM returns empty title (#5668)#6675
fix(backend): preserve conversations when LLM returns empty title (#5668)#6675Surya-Midde wants to merge 2 commits intoBasedHardware:mainfrom
Conversation
) The `_get_conversation_obj` helper re-derived `discarded` from `structured.title == ''`, which silently dropped valid conversations whenever the LLM returned an empty title. The authoritative `discarded` signal was already being computed correctly by `_get_structured` (via `should_discard_conversation`) but was being thrown away at the call boundary. This change passes the authoritative `discarded` bool explicitly to `_get_conversation_obj` and removes the fragile title-based proxy. Default value is `False` so any future caller that forgets to pass a value preserves the conversation rather than silently dropping it. Adds three regression tests covering: 1. empty title + discarded=False preserves the conversation 2. empty title + discarded=True legitimately marks it discarded 3. omitted discarded param defaults to False (safe default) Fixes #5668 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…onversations (#5668) Complements the signal-fix commit: even with silent-discard protection, an empty LLM title still renders as '' in the UI and breaks search indexing. This commit adds a deterministic fallback title generator invoked on the main transcript path when the LLM returns an empty title for a transcript >=100 words. Preference order for the fallback: 1. First sentence of the LLM-generated overview (if present and <=60 chars) 2. Date-stamped label ("Conversation on Apr 15, 2026") 3. Static fallback "Untitled Conversation" No retry LLM call — intentionally cheap. External integration and force-process paths are not affected (they have their own title sources) and stay out of scope for this PR. Adds four helper tests (all pure). Fixes #5668 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR fixes issue #5668 where conversations were silently discarded when the LLM returned an empty title. The fix has two parts: (1) Confidence Score: 5/5Safe to merge — the data-loss bug is correctly fixed across all code paths with no new regressions introduced. All three branches of No files require special attention; the docstring inaccuracy in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[process_conversation] --> B["_get_structured(uid, lang, conv, force_process)"]
B --> C{conversation.source?}
C -->|external_integration / workflow| D{text_source?}
D -->|audio| E[get_transcript_structure]
D -->|message| F[get_message_structure]
D -->|other| G[summarize_experience_text]
E & F & G --> H["return structured, discarded=False"]
C -->|native transcript| I{force_process?}
I -->|yes| J[get_reprocess_transcript_structure]
J --> K["return structured, discarded=False"]
I -->|no| L[should_discard_conversation]
L -->|discarded=True| M["return Structured(emoji=...), True"]
L -->|discarded=False| N[get_transcript_structure]
N --> O{title empty AND transcript >= 100 words?}
O -->|yes| P["_fallback_title_for_empty_llm_response()"]
O -->|no| Q["return structured, False"]
P --> Q
H & K & M & Q --> R["_get_conversation_obj(uid, structured, conv, discarded=discarded)"]
R --> S{isinstance?}
S -->|CreateConversation| T["Conversation(discarded=discarded)"]
S -->|ExternalIntegrationCreateConversation| U["Conversation(discarded=discarded)"]
S -->|Conversation| V["conv.discarded = discarded"]
T & U & V --> W[return Conversation]
Reviews (1): Last reviewed commit: "feat(backend): fallback title for empty ..." | Re-trigger Greptile |
| * The 2-word minimum avoids producing noisy one-word titles when overview | ||
| begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr") | ||
| or a decimal ("Scored 3.5..." → first_sentence=="Scored 3"). |
There was a problem hiding this comment.
Misleading docstring — 2-word guard does not prevent "Scored 3"
The comment claims the 2-word minimum avoids "Scored 3" (from "Scored 3.5 on..."), but "Scored 3" is 2 words and passes the >= 2 check. Tracing the logic: "Scored 3.5 on...".split('.') → ['Scored 3', '5 on...'], so first_sentence = 'Scored 3' which has len(...split()) == 2 and len == 8 <= 60 — it is returned as the title. The guard only prevents 1-word extractions (e.g. "Dr" from "Dr. Smith..."). The decimal-split case should either be handled explicitly or removed from the docstring to avoid misleading future maintainers.
| * The 2-word minimum avoids producing noisy one-word titles when overview | |
| begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr") | |
| or a decimal ("Scored 3.5..." → first_sentence=="Scored 3"). | |
| * The 2-word minimum avoids producing noisy one-word titles when overview | |
| begins with an abbreviation ("Dr. Smith met..." → first_sentence=="Dr"). | |
| Note: decimal-split cases ("Scored 3.5..." → first_sentence=="Scored 3") | |
| still produce a 2-word fallback title; they are not filtered by this guard. |
|
Closing in favor of #6676 — same commits, pushed from my primary GitHub account (@middesurya). Sorry for the noise! |
|
Hey @Surya-Midde 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜 |
Summary
Fixes #5668 — P1 data-loss bug where valid conversations were silently discarded when the LLM returned an empty title.
Two-part fix, split across two commits for reviewability:
fix(backend)— root-cause signal fix._get_conversation_objwas re-derivingdiscardedfromstructured.title == ''. The authoritativediscardedbool is already correctly computed by_get_structuredviashould_discard_conversation, and is used correctly downstream for folder assignment and analytics. It was just being dropped at the_get_conversation_objcall boundary. This commit passes it explicitly and removes the fragile title-based proxy. Default value isFalseso any future caller that forgets to pass it preserves the conversation rather than silently dropping it.feat(backend)— deterministic fallback title. Even with silent-discard protection, an empty title still renders as""in the UI and breaks search indexing. When the main transcript path's LLM call returns empty title for a transcript ≥100 words, a deterministic fallback is applied (preferring overview first-sentence with a ≥2-word guard, then a locale-independent date label, then"Untitled Conversation"). No retry LLM call — intentionally cheap.Scope
_get_structuredfrom silent discard (main transcript, force-process/reprocess, external-integration audio/message/other).Files
backend/utils/conversations/process_conversation.py— signature change, new helper, fallback trigger, caller updatebackend/tests/unit/test_empty_title_no_discard.py— new — 8 pure unit testsbackend/test.sh— register new test fileNote on line numbers
The issue cites some stale line numbers (
conversation.py:169-170,process_conversation.py:209). Since the issue was filed,Structuredwas extracted into its own module (models/structured.py) and thetitle == ''check is now atprocess_conversation.py:222. The bug itself is unchanged.Implementation details worth knowing
strftime('%b %d, %Y'), which is locale-sensitive (returns\"avr.\"onfr_FR.UTF-8). This makes the fallback and its tests deterministic on any developer's machine.\"Dr\"(from "Dr. Smith met with...") or\"Scored 3\"(from "Scored 3.5 on...")._get_structured. Stubbing of heavy transitive imports (Firestore, Pinecone, RAG, etc.) follows the same pattern astest_process_conversation_usage_context.py.Out of scope
Structured.titlea required Pydantic field (would break ≥6 placeholder construction sites intranscribe.py,merge_conversations.py,conversation_processing.py)Potential follow-up (for post-merge, not this PR)
Test plan
🤖 Generated with Claude Code