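ART (Agent Reinforcement Trainer) is an open-source reinforcement learning framework focused on training agents that can handle multi-step tasks.
The project is actively maintained and continues to receive new releases.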
Launch the training cluster (SkyPilot):
./scripts/launch-cluster.sh
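Run the 2048 example:
import art
from art.local import LocalBackend  # assumption: import path of the LocalBackend component

# Assumption: the original snippet never defines `backend`; a LocalBackend is used here.
backend = LocalBackend()

model = art.TrainableModel(
    name="qwen2.5-14b-instruct",
    project="demo",
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
# Top-level await: run this inside a notebook cell or an async function.
await model.register(backend)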
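Main components:
TrainableModel: wrapper around a trainable model
LocalBackend: local training backend
Trajectory: data class for agent trajectories
RULER: automatic reward scoring system

import json

# llm_completion and parse_scores are assumed to be defined elsewhere.

async def ruler_score_group(group, model_name):
    """Score a group of trajectories automatically with an LLM."""
    prompt = build_scoring_prompt(group.trajectories)
    response = await llm_completion(prompt, model_name)
    return parse_scores(response)

def build_scoring_prompt(trajectories):
    """Build the scoring prompt."""
    payload = json.dumps([t.to_dict() for t in trajectories])
    return (
        "Evaluate the following trajectories and score each one from 0 to 1 "
        f"according to task completion:\n{payload}"
    )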
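# gather_trajectories, evaluate, and log_to_wandb are assumed to be defined elsewhere.
async def train_loop(model, env, test_env, steps=1000):
    """Collect, score, and train for `steps` iterations, evaluating every 10 steps."""
    for step in range(steps):
        # Collect a group of trajectories (expected to expose `.trajectories`)
        group = await gather_trajectories(model, env)
        # Score the group with RULER
        scored = await ruler_score_group(group, "o3")
        # Run one training step
        await model.train(scored)
        # Evaluate periodically
        if step % 10 == 0:
            eval_results = await evaluate(model, test_env)
            log_to_wandb(eval_results)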
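To follow the control flow of the training loop without a GPU or a judge model, here is a self-contained sketch in which every component is a stub. `StubModel`, `score_group`, `eval_every`, and the stubbed `gather_trajectories` and `evaluate` are hypothetical placeholders, not ART's API; only the collect, score, train, periodically-evaluate structure mirrors the loop above.

```python
import asyncio
import random


class StubModel:
    """Hypothetical model that only counts how many training steps it received."""

    def __init__(self) -> None:
        self.train_calls = 0

    async def train(self, scored_group: list) -> None:
        self.train_calls += 1


async def gather_trajectories(model: StubModel, rng: random.Random, size: int = 4) -> list:
    # Stand-in rollout: each "trajectory" is just a dict holding a random outcome.
    return [{"outcome": rng.random()} for _ in range(size)]


async def score_group(group: list) -> list:
    # Stand-in for RULER: reuse the outcome as the reward instead of calling an LLM.
    return [{**t, "reward": t["outcome"]} for t in group]


async def evaluate(model: StubModel) -> dict:
    return {"train_calls": model.train_calls}


async def train_loop(model: StubModel, steps: int = 25, eval_every: int = 10, seed: int = 0) -> list:
    rng = random.Random(seed)
    eval_log = []
    for step in range(steps):
        group = await gather_trajectories(model, rng)
        scored = await score_group(group)
        await model.train(scored)
        if step % eval_every == 0:
            # Evaluation runs at steps 0, eval_every, 2 * eval_every, ...
            eval_log.append((step, await evaluate(model)))
    return eval_log


def run_demo() -> list:
    return asyncio.run(train_loop(StubModel()))
```

With `steps=25` and `eval_every=10`, evaluation runs at steps 0, 10, and 20. Because the check sits at the end of the loop body, the step-0 evaluation already sees one completed training step.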
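async def deploy_to_cloud(model, step):
    """Deploy the model to a cloud inference service."""
    # deploy_model is not imported in this snippet; it is assumed to be in scope.
    deployment = await deploy_model(
        deploy_to="together",
        model=model,
        step=step,
        wait_for_completion=True,
    )
    return deployment.endpoint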
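Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
For infringement concerns, contact cloudcommunity@tencent.com to request removal.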