Published 2020-07-22 09:51:07
GitHub Archive Program: the journey of the world’s open source code to the Arctic

At GitHub Universe 2019[1], we introduced the GitHub Archive Program[2] along with the GitHub Arctic Code Vault[3]. Our mission is to preserve open source software for future generations by storing your code in an archive built to last a thousand years.

On February 2, 2020, we took a snapshot of all active public repositories on GitHub to archive in the vault. Over the last several months, our archive partner Piql[4] wrote 21 TB of repository data to 186 reels of piqlFilm (digital photosensitive archival film). Our original plan was for our team to fly to Norway and personally escort the world’s open source code to the Arctic, but as the world continues to endure a global pandemic, we had to adjust our plans. We stayed in close contact with our partners, waiting for the time when it was safe for them to travel to Svalbard. We’re happy to report that the code was successfully deposited in the Arctic Code Vault on July 8, 2020.

Join us as we follow the code in its journey to the Arctic, and take a look at a few other things we’ve been up to here at the GitHub Archive Program.

The journey of the world’s open source code to the Arctic Circle

Your code’s journey began at Piql’s facility in Drammen, Norway, where the boxes holding the 186 film reels were shipped to Oslo Airport and then loaded into the belly of the passenger plane that serves Svalbard. Svalbard, roughly 600 miles (1,000 km) north of the European mainland, just recently opened up[5] to visitors from countries within the Schengen Area and the European Economic Area.

The code landed in Longyearbyen, a town of a few thousand people on Svalbard, where the boxes were met by a local logistics company and taken into intermediate secure storage overnight. The next morning, they traveled to the decommissioned coal mine set in the mountain, and then to a chamber deep inside hundreds of meters of permafrost, where the code now resides, fulfilling its mission of preserving the world’s open source code for over 1,000 years.


Introducing the Arctic Code Vault Badge

Millions of developers around the world contributed to the open source software now stored in the Arctic Code Vault. To recognize and celebrate these contributions, we designed the Arctic Code Vault Badge, which is shown in the highlights section of a developer’s profile on GitHub. Hover over the badge to discover some of the repositories that developer contributed to.

An update from our Archive Program partners

Internet Archive

The Internet Archive[6] is a well-known, widely beloved non-profit digital library which provides free public access to collections of digitized materials. In partnership with the GitHub Archive Program, the Internet Archive (IA) commenced its ongoing archive of GitHub public repositories on April 13 of this year. At present, IA is using a two-pronged approach. First, its Wayback Machine is accessing and archiving raw GitHub data as WARCs, or Web ARChive files. As of this writing, IA has archived some 55 TB of data[7]. Second, IA aims to make entire archived GitHub repositories available via “git clone,” while also keeping repo comments, issues, and other metadata easily accessible on the web. This second initiative is well underway, and initial archiving is expected to commence this month.

Software Heritage Foundation

Software Heritage[8] is a non-profit, multi-stakeholder initiative launched by Inria in collaboration with UNESCO[9] with the goal of collecting, preserving, and sharing the source code of our software commons. They already archive more than 130 million projects[10], with their full development history, and we are delighted to announce that 100 million of these are from GitHub. Thanks to the collaboration announced at GitHub Universe 2019[11], the archival engine is being improved to keep pace with GitHub’s[12] growth. If the project you are interested in, or its latest version, is not archived yet, you do not need to wait: you can trigger its archival right now in a few clicks at https://save.softwareheritage.org[13].

Project Silica

Project Silica[14] is developing the first storage technology designed and built from the media up for cloud-scale storage of long-lived data. By leveraging recent discoveries in ultrafast laser optics, data is stored in quartz glass through a process that permanently changes the physical structure of the glass material. Quartz glass is a durable storage medium that offers unparalleled data lifetimes of upwards of tens of thousands of years. It is resilient to electromagnetic interference, water, and heat, making it the ideal storage medium for ensuring the world’s open source software is forever preserved for future generations. As a partner in the GitHub Archive Program[15], Project Silica is committed to driving storage innovation and to developing a sustainable and reliable storage technology for the world’s long-lived data. We’ve archived 6,000 of the world’s most popular repositories as a proof of concept for future archives.

What’s next?

Code, culture, history, and technology: The Tech Tree

Every reel of the archive includes a copy of the “Guide to the GitHub Code Vault” in five languages, written with input from GitHub’s community and available at the Archive Program’s own GitHub repository[16]. In addition, the archive will include a separate human-readable reel which documents the technical history and cultural context of the archive’s contents. We call this the Tech Tree.

Inspired by the Long Now’s Manual for Civilization[17], the Tech Tree will consist primarily of existing works, selected to provide a detailed understanding of modern computing, open source and its applications, modern software development, popular programming languages, etc. It will also include works which explain the many layers of technical foundations that make software possible: microprocessors, networking, electronics, semiconductors, and even pre-industrial technologies. This will allow the archive’s inheritors to better understand today’s world and its technologies, and may even help them recreate computers to use the archived software.

Encapsulating the world’s cultural context and technical history is a challenging prospect, and we expect the Tech Tree to evolve over time. We will soon publish an initial draft list of works selected for the Tech Tree to the Archive Program’s GitHub repository, along with, importantly, a request for community input. We look forward to incorporating ideas and suggestions from the GitHub community before the Tech Tree is added to the Arctic Code Vault.



