Questions on Automatic Data Collection and Labeling in Data Drift #681
Replies: 1 comment
Good catch @qyy2003. You've found a real gap in the current coverage. You're right that Ch6 assumes human annotators and Ch14 doesn't address automated labeling for drift. This is an active research area. A few directions worth knowing about:

- **Programmatic labeling** (Snorkel-style): write heuristics or use knowledge bases to generate noisy labels, then learn to denoise them.
- **Self-training**: use the model's confident predictions as pseudo-labels for fine-tuning.
- **Active learning**: strategically pick the most informative samples for human review instead of labeling everything.
- **Foundation model distillation** (increasingly common): a big cloud model labels your edge data so you can fine-tune the small model.

The fundamental tension: you need labels to detect and fix drift, but drift means your existing model, which is your best automatic labeler, is becoming unreliable. Classic chicken-and-egg, and that's what makes it a rich research question.

Thanks for the thoughtful feedback. This is the kind of thing that helps us improve the book.
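To make the self-training and active-learning ideas concrete, here is a minimal sketch of how the two can be combined on unlabeled edge data. The `route_samples` function, the confidence threshold, and the `(id, class, confidence)` tuple format are all illustrative assumptions, not an API from the book:

```python
# Hypothetical sketch: split model predictions on unlabeled data between
# self-training (confident predictions kept as pseudo-labels) and active
# learning (the most uncertain samples queued for human review).
# Threshold and budget values are arbitrary assumptions for illustration.

def route_samples(predictions, pseudo_label_threshold=0.95, review_budget=2):
    """predictions: list of (sample_id, predicted_class, confidence).

    Returns:
      pseudo_labels: confident predictions reused as training labels
      review_queue:  ids of the most uncertain samples, for annotators
    """
    pseudo_labels = [
        (sid, cls) for sid, cls, conf in predictions
        if conf >= pseudo_label_threshold
    ]
    # Least-confident sampling: lowest-confidence items are assumed
    # most informative for a human to label.
    uncertain = sorted(predictions, key=lambda p: p[2])
    review_queue = [sid for sid, _, _ in uncertain[:review_budget]]
    return pseudo_labels, review_queue


preds = [
    ("a", "cat", 0.99),
    ("b", "dog", 0.55),
    ("c", "cat", 0.97),
    ("d", "dog", 0.61),
    ("e", "cat", 0.88),
]
pseudo, queue = route_samples(preds)
print(pseudo)  # [('a', 'cat'), ('c', 'cat')]  -> fine-tuning set
print(queue)   # ['b', 'd']                    -> human review
```

Note the chicken-and-egg problem shows up directly here: once drift sets in, the confidences feeding this routing come from the very model that is degrading, so the pseudo-labels get noisier exactly when you need them most.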
Data drift can degrade model performance. Small models deployed on mobile and edge devices suffer more than large foundation models.
A common mitigation strategy is to fine-tune and redeploy the model. However, relying on experts to manually collect and label data and then fine-tune models at regular intervals is impractical. Both model monitoring and fine-tuning require access to ground truth, raising the critical question: how can we automate data collection and, more importantly, labeling?
Note: The approach discussed in Chapter 6: Data Engineering still appears to require human annotators, and Chapter 14: On-Device Learning does not address this issue.