Hi @AberHu ,
Thank you very much for your work.
I am currently using CRD loss to train my face recognition model, since your work and the original paper show that CRD loss combined with KL divergence is the best-performing distillation method at the moment. However, I found that it requires two memory buffers, which makes it infeasible when the dataset is very large. So I wonder if there is a more memory-efficient way to implement this.
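To make the question concrete, here is a minimal sketch of the kind of memory-free variant I have in mind: using in-batch negatives (InfoNCE between student and teacher embeddings) instead of the two dataset-sized memory banks. All names here (`in_batch_crd`, `temperature`) are my own for illustration, not from your repo, and I'm not sure how much performance this would give up compared to the original memory-bank NCE:

```python
import torch
import torch.nn.functional as F

def in_batch_crd(f_s: torch.Tensor, f_t: torch.Tensor, temperature: float = 0.07):
    """Contrastive distillation with in-batch negatives (a sketch, not CRD itself).

    f_s: student embeddings, shape (B, d)
    f_t: teacher embeddings, shape (B, d)
    """
    f_s = F.normalize(f_s, dim=1)
    f_t = F.normalize(f_t, dim=1)
    # (B, B) similarity matrix: diagonal entries are positive student-teacher
    # pairs; off-diagonal entries serve as negatives drawn from the same batch,
    # so no memory buffer over the whole dataset is needed.
    logits = f_s @ f_t.t() / temperature
    labels = torch.arange(f_s.size(0), device=f_s.device)
    return F.cross_entropy(logits, labels)
```

Would something like this (or perhaps a fixed-size MoCo-style queue for more negatives) be a reasonable substitute in your experience?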
Hope for your reply. Thanks.