[TAN-7519] Add error handling and retry logic for PDF import jobs by AchrafGoVocal · Pull Request #13615 · CitizenLabDotCo/citizenlab

AchrafGoVocal · 2026-04-10T14:25:32Z

Summary

Makes PDF imports resilient to transient LLM failures and gives admins a clearer view of what happened when things go wrong. Builds on top of the tracker-based progress work from TAN-7380.

Changes

Retry transient LLM errors (`IdeaPdfImportJob`)

perform_retries true with a RETRYABLE_ERRORS allowlist: RubyLLM::RateLimitError, OverloadedError, ServerError, ServiceUnavailableError. Everything else falls through to the non-retry path.
handle_error — retries allowlisted errors up to Que's maximum_retry_count, then stops. When retries are exhausted (or the error is non-retryable), the job:
- Counts unprocessed files (unprocessed_files_count) and pushes them through track_progress_and_complete!(remaining, remaining) so the tracker hits total and the frontend stops spinning.
- Reports the error via ErrorReporter with the phase_id in the extras.
- Calls expire so Que stops retrying.
Idempotent retries — added next if IdeaImport.exists?(file_id: file.id) at the top of the per-file loop so that when Que retries a batch, files that already succeeded on a previous attempt are skipped.
after_success only when work happened — skips the bulk_import_succeeded activity when the batch produced zero ideas (e.g. all files were already imported on a previous retry attempt).
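
The retry behaviour described above can be sketched in plain Ruby. This is a minimal illustration, not the job's actual code: the error classes, `MAX_RETRIES`, `run_with_retries`, and `import_batch` are hypothetical stand-ins for the RubyLLM errors, Que's retry limit, and the per-file loop.

```ruby
# Hypothetical stand-ins; the real job rescues RubyLLM error classes under Que.
class RateLimitError < StandardError; end
class ServerError < StandardError; end

RETRYABLE_ERRORS = [RateLimitError, ServerError].freeze
MAX_RETRIES = 3 # assumption: stands in for Que's maximum_retry_count

# Runs the block, retrying only allowlisted errors.
# Returns [:ok, attempts] on success, [:expired, attempts] otherwise.
def run_with_retries(max_retries: MAX_RETRIES)
  attempts = 0
  begin
    attempts += 1
    yield attempts
    [:ok, attempts]
  rescue *RETRYABLE_ERRORS
    retry if attempts <= max_retries
    [:expired, attempts] # retries exhausted
  rescue StandardError
    [:expired, attempts] # non-retryable: give up immediately
  end
end

# Imports each file at most once; ids already in `imported` are skipped,
# mirroring `next if IdeaImport.exists?(file_id: file.id)`.
def import_batch(file_ids, imported, fail_on: {})
  file_ids.each do |id|
    next if imported.include?(id)

    error = fail_on[id]&.shift # simulated transient failure, if any
    raise error if error

    imported << id
  end
end
```

With a batch of three files where the second fails once with a retryable error, the second attempt skips the first file and imports only the remaining two, so no file is imported twice.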

Misc backend fixes

IdeaXlsxImportJob — fixed track_progress_and_complete! being called with no arguments on the error path, so the failed chunk now properly advances the tracker by 1 with an error count of 1 (was silently stuck before).
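
To illustrate why the missing arguments left the tracker stuck, here is a rough in-memory sketch. The `ProgressTracker` class and its `(processed, errors)` signature are assumptions inferred from the description above; the real tracker is persisted and its API may differ.

```ruby
# Hypothetical in-memory tracker, for illustration only.
class ProgressTracker
  attr_reader :total, :progress, :error_count

  def initialize(total)
    @total = total
    @progress = 0
    @error_count = 0
    @completed = false
  end

  # Advances progress by `processed`, records `errors`,
  # and marks the tracker complete once progress reaches total.
  def track_progress_and_complete!(processed, errors = 0)
    @progress += processed
    @error_count += errors
    @completed = true if @progress >= @total
  end

  def completed?
    @completed
  end
end
```

In this sketch, a failed chunk that calls `track_progress_and_complete!(1, 1)` still moves `progress` toward `total`, so the last chunk can complete the tracker even when some chunks errored.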

Importer UI — per-job error counts

ImportStatus component — now takes an errorCount prop and, when the import has errors, shows "Imported N of M inputs. K could not be imported due to errors." instead of the generic "Errors occurred during the import". errorCount is wired up from latestJob?.attributes.error_count in ReviewSection.
Copy — existing errorImporting message is kept as a fallback; added a new errorImportingWithCounts message for the counted variant.

Changelog

Added

[TAN-7519] Added retry handling for transient LLM errors during PDF imports and surfaced per-job error counts in the importer UI.

Fixed

[TAN-7519] Fixed XLSX import failures not advancing the progress tracker.

For translators

notion-workspace · 2026-04-10T14:25:38Z

Better error handling and retry logic

cl-dev-bot · 2026-04-10T14:29:37Z

	Messages
📖	Changelog provided 🎉
📖	Notion issue: TAN-7519
📖	Run the e2e tests
📖	Check translation progress

Generated by 🚫 dangerJS against e5fac3a

amanda-anderson

FE looks good to me 👍

adessy

Getting there! 💪

adessy · 2026-04-22T14:04:20Z

+      QueJob
+        .where('id = :id OR data @> :data', id: root_job_id, data: { root_job_id: root_job_id }.to_json)


We could extract this as a private jobs method.

adessy · 2026-04-22T14:08:24Z

+    def job_errors
+      QueJob
+        .where('id = :id OR data @> :data', id: root_job_id, data: { root_job_id: root_job_id }.to_json)
+        .where(finished_at: nil) # Filter out jobs that are not finished yet since they might be retried and the error might be resolved in a retry.


expired_at is set on failing jobs once they run out of retries.

Suggested change

-        .where(finished_at: nil) # Filter out jobs that are not finished yet since they might be retried and the error might be resolved in a retry.
+        .where.not(expired_at: nil)
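
The difference between the two filters can be sketched with plain Ruby objects. The `Job` struct below is a hypothetical stand-in for `QueJob` rows, and the sample data assumes an expired job keeps `finished_at` unset.

```ruby
# Hypothetical stand-in for QueJob rows; only the discussed columns are modelled.
Job = Struct.new(:id, :finished_at, :expired_at, :last_error_message, keyword_init: true)

# Suggested filter: jobs whose retries ran out (expired_at is set).
def permanently_failed(jobs)
  jobs.reject { |job| job.expired_at.nil? }
end

# Earlier filter: every unfinished job, including ones still awaiting a retry.
def unfinished(jobs)
  jobs.select { |job| job.finished_at.nil? }
end

def sample_jobs
  now = Time.now
  [
    Job.new(id: 1, finished_at: now),                     # succeeded
    Job.new(id: 2, last_error_message: 'RateLimitError'), # failed once, retry pending
    Job.new(id: 3, expired_at: now, last_error_message: 'ServerError') # out of retries
  ]
end
```

Here `unfinished(sample_jobs)` also returns job 2, whose error may still be resolved by a retry, while `permanently_failed(sample_jobs)` returns only job 3.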

adessy · 2026-04-22T14:14:39Z

+      QueJob
+        .where('id = :id OR data @> :data', id: root_job_id, data: { root_job_id: root_job_id }.to_json)
+        .where(finished_at: nil) # Filter out jobs that are not finished yet since they might be retried and the error might be resolved in a retry.
+        .where.not(last_error_message: nil)


This is redundant with the filter in filter_map. In any case, if they are failed, there should be an error message.

Suggested change

-        .where.not(last_error_message: nil)

adessy · 2026-04-22T14:19:48Z

  )

  attribute :job_type, &:root_job_type
+  attribute :errors, &:job_errors


I still don't understand how those error messages get localized by the front-end, since they're essentially just the string representation of the Ruby error (e.to_s or similar, most likely).

I checked and it's something like: "#{e.class}: #{e.message}".slice(0, 500)
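
A small sketch of that format, to show what the front-end would actually receive. `format_job_error` is a hypothetical name used only for illustration.

```ruby
# Builds the stored error string: class name, message, truncated to `limit` characters.
def format_job_error(error, limit = 500)
  "#{error.class}: #{error.message}".slice(0, limit)
end
```

For `ArgumentError.new('bad input')` this yields `"ArgumentError: bad input"`, which is developer-facing English text rather than a translatable message key.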

adessy · 2026-04-22T14:36:03Z


    self.priority = 60
-    perform_retries false
+    perform_retries true


This can be removed. It's true by default.

adessy · 2026-04-23T09:38:07Z

Here's the code we drafted during our call this morning. It's not perfect, but I think it's about as good as we can get without reworking parts of the job system.

    private def handle_error(error)
      case error
      when *RETRYABLE_ERRORS then super
      else expire
      end
    end

    def expire
      error = que_target.que_error
      finalize_foobar(idea_import_files, import_user, phase, error)
      super
    end

    def finalize_foobar(idea_files, user, phase, error)
      SideFxBulkImportService.new.after_failure(user, phase, 'idea', 'pdf', error.to_s)
      remaining = count_missing_imports(idea_files)
      track_progress(remaining, remaining) if remaining > 0
      complete_if_done!
    end

    def count_missing_imports(idea_files)
      file_ids = idea_files.map(&:id)
      file_ids.size - IdeaImport.where(file_id: file_ids).count
    end

[TAN-7519] Add import progress tracking to the frontend

0ea8bc4

Translations updated by CI (extract-intl)

c74b2b2

AchrafGoVocal changed the title ~~[TAN-7519] Add import progress tracking to the frontend~~ [TAN-7519] Add error handling and retry logic for PDF import jobs Apr 10, 2026

Merge branch 'TAN-7380-formsync-better-progress-indicator' into TAN-7519-error-handling-and-retry-formsync

9d56fc0

AchrafGoVocal added this to the FormSync milestone Apr 13, 2026

AchrafGoVocal added 3 commits April 13, 2026 10:49

Merge branch 'TAN-7380-formsync-better-progress-indicator' of github.…

c483f68

…com:CitizenLabDotCo/citizenlab into TAN-7519-error-handling-and-retry-formsync

Merge branch 'TAN-7380-formsync-better-progress-indicator' of github.…

4f1a974

…com:CitizenLabDotCo/citizenlab into TAN-7519-error-handling-and-retry-formsync

[TAN-7380] Remove unused message

38cc010

AchrafGoVocal mentioned this pull request Apr 14, 2026

[TAN-7380] Formsync better progress indicator #13606

Open

Translations updated by CI (extract-intl)

8fae5a7

AchrafGoVocal requested a review from adessy April 14, 2026 07:53

AchrafGoVocal assigned amanda-anderson and AchrafGoVocal and unassigned amanda-anderson Apr 14, 2026

AchrafGoVocal requested a review from amanda-anderson April 14, 2026 08:43

AchrafGoVocal added the waiting for translations label Apr 14, 2026

AchrafGoVocal marked this pull request as ready for review April 14, 2026 08:43

amanda-anderson approved these changes Apr 14, 2026

AchrafGoVocal added 10 commits April 15, 2026 09:12

Merge branch 'TAN-7380-formsync-better-progress-indicator' of github.…

08972ae

…com:CitizenLabDotCo/citizenlab into TAN-7519-error-handling-and-retry-formsync

Merge branch 'TAN-7519-error-handling-and-retry-formsync' of github.c…

78590f8

…om:CitizenLabDotCo/citizenlab into TAN-7519-error-handling-and-retry-formsync

[TAN-7380] Fix missing error count in the hook

a3b1bcd

[TAN-7519] Include the job errors in the serializer

9ad411e

[TAN-7519] Use case statement for error handling in PDF import job

f8d07dd

Merge branch 'TAN-7380-formsync-better-progress-indicator' of github.…

bd79722

…com:CitizenLabDotCo/citizenlab into TAN-7519-error-handling-and-retry-formsync

[TAN-7519] Clarify arguments[0] usage in PDF import job

f1594d0

[TAN-7519] Move failure side effects into handle_error

2d938bf

[TAN-7519] Fix failing specs

043a6e4

[TAN-7519] Fix failing specs 2

27dc680

AchrafGoVocal added 2 commits April 21, 2026 14:17

[TAN-7519] Clean up deadcode

542d75f

[TAN-7519] Fix small spec issue

e5fac3a

adessy reviewed Apr 23, 2026

AchrafGoVocal wants to merge 19 commits into TAN-7380-formsync-better-progress-indicator from TAN-7519-error-handling-and-retry-formsync

4 participants
