SigLIP 2: A Better Multilingual Vision-Language Encoder

I tried running the zero-shot classification example and got ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (input_ids in this case) have excessive nesting (inputs type list where type int is expected). transformers version=4.49.0.dev0


I tried adding both padding=True and truncation=True, to no avail. I also tried padding="max_length".

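The ValueError itself is about ragged batches: when the tokenized class labels come out at different lengths, they cannot be stacked into one rectangular tensor. A minimal sketch of the failure mode and a manual right-padding fix (the `pad_batch` helper and the pad token id 0 are illustrative, not from the transformers source):

```python
# Tokenized labels of different lengths form a ragged batch, which is
# exactly what torch.tensor cannot turn into a single 2-D tensor.
batch = [[101, 2003, 102], [101, 2003, 1037, 3899, 102]]  # lengths 3 and 5

def pad_batch(input_ids, pad_token_id=0):
    """Right-pad every sequence to the longest one so the batch is rectangular."""
    max_len = max(len(seq) for seq in input_ids)
    return [seq + [pad_token_id] * (max_len - len(seq)) for seq in input_ids]

padded = pad_batch(batch)
# All rows now share one length, so tensor creation would succeed.
```

This is the same reason the example works when all labels happen to tokenize to the same length: the batch is already rectangular without any padding.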

EDIT:
It seems to work if my labels are all the same length. Doing some debugging, I see that in zero_shot_image_classification.py, the padding provided to the tokenizer is forced to max_length anyway here (L148-149):


padding = "max_length" if self.model.config.model_type == "siglip" else True
text_inputs = self.tokenizer(sequences, return_tensors=self.framework, padding=padding, **tokenizer_kwargs)
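The subtlety is that padding="max_length" is only honored when a maximum length is actually known; otherwise the tokenizer silently falls back to no padding, which is what produces the ragged batch. A simplified sketch of that interaction (these functions are stand-ins for the pipeline and tokenizer logic, not the actual transformers code):

```python
def choose_padding(model_type):
    # Mirrors the forced choice at L148-149 of zero_shot_image_classification.py
    return "max_length" if model_type == "siglip" else True

def effective_padding(padding, max_length=None, model_max_length=None):
    """What the tokenizer actually does with the requested padding strategy."""
    if padding == "max_length" and max_length is None and model_max_length is None:
        # "Asking to pad to max_length but no maximum length is provided..."
        return False  # falls back to no padding -> ragged outputs
    return padding

padding = choose_padding("siglip")                 # "max_length"
broken = effective_padding(padding)                # no max known: no padding
works = effective_padding(padding, max_length=64)  # explicit max: padding happens
```

So the pipeline requests max_length padding for SigLIP, but with a tokenizer that has no predefined maximum length the request degrades to no padding at all.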

And yet, if my labels have variable lengths, the outputs are not the same length, and so calling torch.tensor on that ultimately fails. I did spot this warning in my terminal as well:


Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.

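Given that warning, one possible workaround (assuming the tokenizer exposes the standard model_max_length attribute, e.g. tokenizer.model_max_length = 64 before building the pipeline) is to give it an explicit maximum length so that max_length padding becomes effective again. The stub class below is a stand-in for that behavior, not the real SiglipTokenizer:

```python
class StubTokenizer:
    """Stand-in tokenizer: pads to model_max_length only when one is set."""

    def __init__(self, model_max_length=None):
        self.model_max_length = model_max_length

    def __call__(self, batch_ids, padding=True):
        if padding == "max_length":
            if self.model_max_length is None:
                return batch_ids  # the warning's fallback: no padding at all
            target = self.model_max_length
        else:
            target = max(len(s) for s in batch_ids)
        return [s + [0] * (target - len(s)) for s in batch_ids]

tok = StubTokenizer()
ragged = tok([[1, 2], [1, 2, 3]], padding="max_length")  # still ragged

tok.model_max_length = 4  # analogous to setting tokenizer.model_max_length
fixed = tok([[1, 2], [1, 2, 3]], padding="max_length")   # rectangular batch
```

The other workaround, as noted above, is simply to use labels that all tokenize to the same length.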

