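CareGPT (关怀GPT) is a medical large language model. It also brings together dozens of publicly available medical fine-tuning datasets and openly available medical large language models, and covers LLM training, evaluation, and deployment, with the aim of speeding up the development of medical LLMs.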
Features:
Project address:
Follow the WeChat official account datayx and reply caregpt to get it.
conda create -n llm python=3.11
conda activate llm
python -m pip install -r requirements.txt
LLaMA model download: https://blog.csdn.net/u014297502/article/details/129829677
# Convert to HF format
python -m transformers.models.llama.convert_llama_weights_to_hf \
--input_dir path_to_llama_weights --model_size 7B --output_dir path_to_llama_model
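2. Data configuration
Dataset configuration and the PT, SFT, and RW data formats
dataset_info
If you use a custom dataset, be sure to provide its definition in the dataset_info.json file in the following format.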
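The prompt and response columns should be non-empty strings. The content of the query column is concatenated with the prompt column to form the model input. The history column should be a list in which each element is a pair of strings: the user request and the model reply, in that order.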
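The column rules above can be checked before training. The following is a minimal sketch, not the project's own loader: the function names validate_record and build_model_input, the example record, and the newline separator used to join prompt and query are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # prompt and response must be non-empty strings.
    for column in ("prompt", "response"):
        value = record.get(column)
        if not isinstance(value, str) or not value:
            problems.append(f"{column} must be a non-empty string")
    # query is optional here (an assumption), but must be a string if present.
    if not isinstance(record.get("query", ""), str):
        problems.append("query must be a string")
    # history is a list of (user request, model reply) string pairs.
    history = record.get("history") or []
    if not isinstance(history, list):
        problems.append("history must be a list")
    else:
        for i, turn in enumerate(history):
            is_pair = (
                isinstance(turn, (list, tuple))
                and len(turn) == 2
                and all(isinstance(part, str) for part in turn)
            )
            if not is_pair:
                problems.append(f"history[{i}] must be a pair of strings")
    return problems


def build_model_input(record: Dict[str, Any], separator: str = "\n") -> str:
    """Join prompt and query; the separator is an assumption, not the project's."""
    query = record.get("query", "")
    if query:
        return record["prompt"] + separator + query
    return record["prompt"]


# Hypothetical record, for illustration only.
example = {
    "prompt": "Summarize the patient's symptoms.",
    "query": "Fever and cough for three days.",
    "response": "The patient has had a fever and a cough for three days.",
    "history": [["Hello", "Hello, how can I help you?"]],
}
```

Calling validate_record(example) returns an empty list for a well-formed record, and each violated rule adds one message, so a whole dataset can be scanned and the problems reported together.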
.txt format, one unsupervised sample per line.
Machine learning (ML) is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks.
Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, agriculture, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
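A file in this one-sample-per-line format can be read with a few lines of Python. This is a sketch under stated assumptions rather than the project's actual loader: the function name iter_pretraining_samples is hypothetical, and skipping blank lines is a choice made here, not something the format description specifies.

```python
from typing import Iterable, Iterator


def iter_pretraining_samples(lines: Iterable[str]) -> Iterator[str]:
    """Yield one unsupervised sample per non-empty line."""
    for line in lines:
        text = line.strip()  # drop the trailing newline and surrounding spaces
        if text:  # assumption: blank lines are not samples
            yield text


# Usage with a file (the file name is hypothetical):
#     with open("pretrain.txt", encoding="utf-8") as handle:
#         samples = list(iter_pretraining_samples(handle))
```

Taking any iterable of lines rather than a path keeps the function easy to test and lets the same code handle an open file, a list, or a stream.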