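Integrating Transformers with spaCy 3.1 can be error-prone, mainly because the API changed between spaCy 2.x and 3.x and because the installed package versions have to be compatible with each other. Below is an updated example showing how to use Transformers in spaCy 3.1.
First, make sure the required dependencies are installed: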
pip install spacy
pip install transformers
pip install spacy-transformers
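Since version mismatches are the usual source of trouble, it can help to print what is actually installed before going further. The snippet below is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration and is not part of spaCy.

```python
from importlib import metadata

# Distribution names as they are passed to pip.
PACKAGES = ("spacy", "transformers", "spacy-transformers")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the compatibility notes in the spacy-transformers release notes for the spaCy version you are using.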
The following example trains a small NER pipeline on top of a transformer in spaCy 3.1:
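import spacy
from spacy.tokens import DocBin
from spacy.training import Example
# Note: TransformersLanguage, TransformersWordPiecer and TransformersTok2Vec
# belong to the old spaCy 2.x integration and cannot be imported from
# spacy-transformers 1.x. Installing spacy-transformers registers the
# "transformer" factory, so no explicit import is required here.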
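# Name of the pretrained Hugging Face model to load
transformer_model = "bert-base-uncased"
# Create a blank English pipeline
nlp = spacy.blank("en")
# Add the transformer component. The model name goes inside the "model" block
# as "name" and is merged with the factory's default model config.
nlp.add_pipe("transformer", config={"model": {"name": transformer_model}})
# Add NER with a TransformerListener as its tok2vec layer, so that it reads the
# transformer output instead of training its own default CNN embeddings.
nlp.add_pipe(
    "ner",
    config={
        "model": {
            "tok2vec": {
                "@architectures": "spacy-transformers.TransformerListener.v1",
                "grad_factor": 1.0,
                "pooling": {"@layers": "reduce_mean.v1"},
            }
        }
    },
)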
# Prepare the training data: (text, {"entities": [(start_char, end_char, label)]})
train_data = [
    ("Apple is looking at buying U.K. startup for $1 billion", {"entities": [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]}),
    ("San Francisco considers banning sidewalk delivery robots", {"entities": [(0, 13, "GPE")]}),
]
# Create a DocBin to hold the annotated training docs
db = DocBin()
for text, annotations in train_data:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annotations["entities"]:
        # char_span returns None if the offsets do not match token boundaries
        span = doc.char_span(start, end, label=label)
        if span is None:
            print(f"Skipping entity: {text[start:end]}")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)
# Save the training data to disk
db.to_disk("./train.spacy")
# Load the training data back from disk
train_docs = list(DocBin().from_disk("./train.spacy").get_docs(nlp.vocab))
# Build the training examples: Example(predicted, reference), where the
# predicted doc is an unannotated copy and the reference doc carries the
# gold entities
train_examples = [Example(nlp.make_doc(doc.text), doc) for doc in train_docs]
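# Start training. nlp.begin_training() is deprecated in spaCy 3; nlp.initialize
# takes a callback returning the examples so that the NER labels can be inferred.
optimizer = nlp.initialize(lambda: train_examples)
for i in range(10):
    losses = {}
    nlp.update(train_examples, sgd=optimizer, losses=losses)
    print(f"Losses at iteration {i}: {losses}")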
# Save the trained pipeline
nlp.to_disk("./model")
# Load the pipeline and test it
nlp = spacy.load("./model")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
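One case the data-preparation loop above does not guard against is overlapping annotations: assigning overlapping spans to doc.ents raises a ValueError. The sketch below shows one possible policy, applied to the raw (start, end, label) tuples before any Doc is created: keep the longest span and, on ties, the one that starts first. The function name drop_overlapping is hypothetical; for Span objects, spaCy ships spacy.util.filter_spans, which applies a similar rule.

```python
def drop_overlapping(entities):
    """Keep the longest entity out of each group of overlapping entities.

    entities: iterable of (start_char, end_char, label) tuples.
    Longer spans win; ties go to the span that starts first.
    """
    # Rank by length (longest first), then by start offset.
    ranked = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept = []
    for start, end, label in ranked:
        # Accept the span only if it does not intersect any span kept so far.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    # Return the survivors in document order.
    return sorted(kept)
```

Running the annotations through a filter like this before the doc.char_span loop keeps the doc.ents assignment from failing on noisy data.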
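Explanation:
Make sure spacy, transformers and spacy-transformers are installed.
spacy.blank creates a new spaCy Language object.
The nlp.add_pipe method adds the Transformers component to the pipeline.
The training data is stored in a DocBin object.
The nlp.update method performs the model training.
Notes:
Make sure the installed versions of spacy, transformers and spacy-transformers are compatible with each other.
doc.char_span returns None when the offsets do not line up with token boundaries; that case needs appropriate error handling.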
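As one way to approach that error handling, some of the annotations that would make doc.char_span return None can be caught before tokenization. The following is a minimal sketch under that assumption; check_offsets is an illustrative name, and it only detects out-of-range offsets and stray whitespace at the span edges, not offsets that cut through the middle of a token.

```python
def check_offsets(text, entities):
    """Return a list of (entity, problem) pairs for suspicious annotations.

    entities: iterable of (start_char, end_char, label) tuples.
    """
    problems = []
    for start, end, label in entities:
        entity = (start, end, label)
        # Offsets must describe a non-empty slice inside the text.
        if not (0 <= start < end <= len(text)):
            problems.append((entity, "offsets out of range"))
            continue
        # Whitespace at either edge usually means the offsets are off by one.
        surface = text[start:end]
        if surface != surface.strip():
            problems.append((entity, "leading or trailing whitespace"))
    return problems
```

Because the check knows nothing about tokenization, the span is None test in the training-data loop is still needed; this only makes the more common annotation mistakes easier to spot.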