NAACL Simultaneous Translation Workshop: 千言 (Qianyan) - Machine Simultaneous Translation

Published 2021-01-05 14:40:55

Background

Simultaneous translation, which translates concurrently with the source speech, is widely used in scenarios such as international conferences, business negotiations, press briefings, legal proceedings and medical communication. It combines the AI technologies of machine translation (MT), automatic speech recognition (ASR) and text-to-speech synthesis (TTS), and has become a cutting-edge research field. As an emerging interdisciplinary field, simultaneous translation will face more challenges in the future.

To promote the development of simultaneous translation, Baidu, together with Google, Facebook, the University of Pennsylvania, Tsinghua University and other leading institutions and universities, successfully held the first workshop on automatic simultaneous translation at ACL 2020. The workshop featured 6 keynote speakers and attracted 94 registered participants. The Chinese-to-English simultaneous translation shared task released alongside the workshop attracted 227 participants. The task provided participants with an open dataset, the Baidu Speech Translation Corpus (BSTC), which covers talks in information technology, economics, culture, biology, art and other fields.

To further advance simultaneous translation technology, we will host the 2nd workshop on automatic simultaneous translation at NAACL 2021. It brings together researchers and practitioners in machine translation, speech processing and human interpretation to discuss the latest progress and the most pressing current challenges, including:

· Simultaneous translation paradigms: how to build high-quality, low-latency systems under the traditional pipeline (ASR-MT-TTS) or end-to-end (speech-to-speech) framework;
· Data resources: how to make efficient use of large, high-quality corpora for training simultaneous translation systems;
· Evaluation methods: how to evaluate translation quality and how to choose latency metrics;
· Computer Aided Interpretation (CAI): how to improve the efficiency and quality of human interpreters as quickly as possible.

Schedule

Time | Schedule
2020/12/28 00:00:00 | Registration opens; training data released for download
2021/01/31 23:59:59 | Registration closes
2021/02/20 00:00:00 | Evaluation portal opens (system submission begins)
2021/03/07 23:59:59 | Evaluation portal closes (system submission deadline)
2021/03/15 00:00:00 | System description papers due; workshop review begins
2021/04/15 00:00:00 | Notification of acceptance
2021/04/26 00:00:00 | Camera-ready papers due

Participants and requirements

Participants: The competition is open to the public, with no restrictions on age, identity or nationality. Individuals, universities, research institutions, enterprises and start-up teams in related fields may register. Staff of the organizing institutions who may have had advance access to the tasks and data may not take part. Other employees of the organizers may participate in the ranking but cannot receive any award.

Requirements: Contestants may enter as individuals or as teams. Each team may have at most 5 members. Teams may be formed freely across organizations, but each person may join only one team.

Entry method and rules

(1) All contestants must register on the Baidu Brain AI Studio platform.
(2) Contestants must ensure that the information submitted at registration is accurate and valid. Eligibility and prize payment are both based on the submitted information.
(3) After registering, contestants can form a team on the "My Team" page. Each team must designate a team leader and may have at most 5 members. Note: the list of team members cannot be changed after the registration deadline, so please choose teammates carefully.
(4) Team names must not violate Chinese laws and regulations or public order and good morals, and must not contain wording such as "Baidu Official" (百度官方), "PaddlePaddle Official" (飞桨官方), "paddle official" (paddle官方) or "official baseline" (官方baseline). If a team with a non-compliant name does not change it after receiving a warning from the organizer, the organizer has the right to disband the team.
(5) Each contestant may join only one team. If a contestant is found to have joined multiple teams by registering multiple accounts, the teams involved will be disqualified.
(6) Contestants must not use annotated data from any source other than the datasets provided by the organizer.
(7) Teams may upload their test-set predictions at any time during the competition. Each team has only one chance to submit, so please submit with care. The competition management system updates the leaderboard in real time.
(8) Competition QQ group: 744528691.

Awards

The competition has 3 tracks. Each track awards one first prize, one second prize and one third prize, as follows:

Award | Quantity | Bonus
First prize | |
Second prize | |
Third prize | |



Notes: (1) All amounts above are pre-tax. (2) Outstanding contestants will also have the opportunity to present in a live session or publish a paper at the NAACL Workshop.

Participant benefits

(1) Free computing power:

  • Baidu Brain AI Studio gives every contestant a free compute card worth 100 hours of Tesla V100 GPU time. After registering, you can find the application link for the card on the data download page (the datasets tab).
  • On each day that you run a project on AI Studio, you receive 10 hours of GPU time that day, up to a maximum of 70 hours per week.

(2) Official baseline:

  • PaddlePaddle (飞桨) will provide an official baseline that can be forked with one click. It is available on the data download page (the datasets tab) after registration.

Supplementary notes

Fair competition: Contestants must not plagiarize others' work, exchange answers or use multiple alternate accounts. Violations, once discovered, will result in cancellation of results and will be dealt with seriously.

Organization statement: The organizing committee reserves the right to adjust and modify the competition rules and arrangements, the right to determine and handle cheating, and the right to withhold or revoke awards from teams whose conduct affects the organization or fairness of the competition.

Baseline model: The baseline model is provided for reference, and contestants may choose to improve upon it. Contestants may not directly submit the baseline model's predictions. If a submission is highly similar to the baseline model's predictions, the results will be cancelled.

Intellectual property: The intellectual property rights of entries (including but not limited to algorithms and models) belong to the contestants. The organizing committee has the right to use the entries, related materials and team information in promotional materials, related publications, releases through designated and authorized media, browsing and downloading on the official website, exhibitions (including touring exhibitions) and similar activities. The organizing institutions have priority rights of cooperation.

Anti-cheating instructions

(1) Participants must not register multiple accounts. Violations, once discovered, will result in cancellation of results and will be dealt with seriously.
(2) Participants must not improve their ranking through improper means outside the scope of the technical skills being assessed, such as exploiting loopholes in the rules or technical vulnerabilities. Violations, once discovered, will result in cancellation of results and will be dealt with seriously.
(3) Submissions from anyone who has access to data related to the competition tasks will not be counted in the leaderboard or considered for awards.
(4) AI Studio will collect contestants' information, code, models and system reports for scoring, competition notifications and other competition-related matters.

