How to Fine-Tune Florence-2 for Object Detection Tasks

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. The model demonstrates strong zero-shot and fine-tuning capabilities across tasks such as captioning, object detection, grounding, and segmentation. You can learn more about the capabilities of the pre-trained Florence-2 model from our blog post.

Figure 1. Illustration showing the level of spatial hierarchy and semantic granularity expressed by each task. Source: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.

Like other pre-trained foundation models, Florence-2 may lack domain-specific knowledge. For example, it may perform poorly on medical or satellite imagery. In such cases, fine-tuning on a custom dataset is necessary. This tutorial will show you how to fine-tune Florence-2 on object detection datasets to improve model performance for your specific use case. Let's dive in!

Figure 2. The result of Florence-2 inference on a validation subset of the custom dataset before fine-tuning.

Figure 3. The result of Florence-2 inference on a validation subset of the custom dataset after fine-tuning.

Getting Started

Before we fine-tune the Florence-2 model on a custom detection dataset, we need to properly configure our environment. This tutorial is accompanied by a notebook that you can open in a separate tab and follow along.

Open the notebook that accompanies this guide.

Before we discuss the data format, model training, and evaluation, make sure your environment is GPU-accelerated. If you are using our Google Colab, ensure y...
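One way to confirm that the environment is GPU-accelerated is to query the NVIDIA driver from Python. The snippet below is a minimal sketch, not part of the accompanying notebook: the helper name `gpu_available` is hypothetical, and it assumes an NVIDIA GPU whose driver ships the `nvidia-smi` command-line tool. It uses only the standard library, so it can run before any deep learning packages are installed.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    Hypothetical helper for illustration; assumes an NVIDIA driver install.
    """
    # Without the nvidia-smi binary there is no usable NVIDIA driver.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers launch failures and timeouts.
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU available:", gpu_available())
```

If this reports `False` in a hosted notebook, the runtime is probably CPU-only and should be switched to a GPU runtime before training.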