Recommended GitHub project: Morizeyao/GPT2-Chinese
Chinese version of GPT2 training code, using BERT tokenizer.
The author, Du Zeyao from the AINLP discussion group, provides Chinese GPT2 training code that uses BERT's tokenizer. It can generate poetry, news articles, and fiction, or train a general-purpose language model. Both character-level and word-segmentation modes are supported, as is training on large corpora. A Star is recommended; project link:
https://github.com/Morizeyao/GPT2-Chinese
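The character-level mode mentioned above reflects how BERT's tokenizer preprocesses Chinese: each CJK character is treated as its own token. A minimal sketch of that idea, using only the Python standard library (this is an illustration of the general technique, not code from the project itself; the function name is hypothetical):

```python
import re

def bert_style_char_split(text):
    # BERT-style preprocessing for Chinese: surround each CJK character
    # (here approximated by the Unified Ideographs block U+4E00-U+9FFF)
    # with spaces, then split on whitespace. Each Chinese character
    # becomes a separate token; non-CJK runs like Latin words stay whole.
    spaced = re.sub(r'([\u4e00-\u9fff])', r' \1 ', text)
    return spaced.split()

print(bert_style_char_split("写诗很有趣 fun"))
# → ['写', '诗', '很', '有', '趣', 'fun']
```

The real BERT tokenizer additionally applies WordPiece to the non-Chinese segments, but the character split shown here is why "字为单位" (character-level) training pairs naturally with a BERT vocabulary.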
The following is taken from the project's homepage.
@misc{GPT2-Chinese,
author = {Zeyao Du},
title = {GPT2-Chinese: Tools for training GPT2 model in Chinese language},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Morizeyao/GPT2-Chinese}},
}
Generated samples can be viewed on the project's GitHub page.