
Managing Data with Arrow

Original · Author: 生信探索 · Modified 2023-04-17 17:36:23
Included in the column: 生信探索

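In the earlier post "Data mining: time to update the TCGA data" (数据挖掘:是时候更新一下TCGA的数据了), I saved the TCGA data in Arrow format. The files are small and fast to read and write, and the format is supported across languages, including all three languages I mainly use (R, Python, and Julia).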

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
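To make "columnar" and "zero-copy" more concrete, here is a toy illustration that uses only Python's standard library. It is a sketch under our own assumptions and not Arrow's implementation. The three sample rows are made up, a typed `array` stands in for an Arrow column buffer, and a `memoryview` stands in for a zero-copy read of that buffer.

```python
from array import array

# Row-oriented: each record keeps all of its fields together (made-up sample rows).
rows = [
    (5.1, 3.5, "setosa"),
    (4.9, 3.0, "setosa"),
    (6.3, 3.3, "virginica"),
]

# Column-oriented: one numeric field becomes one contiguous typed buffer.
# This only loosely mimics an Arrow column; other fields would be built the same way.
sepal_length = array("d", (r[0] for r in rows))

# An analytic query scans only the column it needs, not whole records.
mean_length = sum(sepal_length) / len(sepal_length)

# A memoryview exposes the same bytes without copying them, and slicing the view
# is copy-free too, so a later write to the buffer is visible through the view.
view = memoryview(sepal_length)[1:]
sepal_length[1] = 5.0

print(f"mean sepal length: {mean_length:.4f}")
print(f"view covers {view.nbytes} bytes, values {view.tolist()}")
```

The view never owned a copy of the data, which is the property Arrow relies on when different tools share the same buffers without serializing them.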

Language Supported

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decisionmaking. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")
library(arrow)
# write the built-in iris data frame to iris.arrow, compressed with zstd
arrow::write_ipc_file(iris, "iris.arrow", compression = "zstd", compression_level = 1)
# read iris.arrow back as a data frame
iris <- arrow::read_ipc_file("iris.arrow")

Python

# conda install -y pandas pyarrow
import pandas as pd
# read iris.arrow (written by the R code above) as a pandas DataFrame
iris = pd.read_feather("iris.arrow")
# write iris back to iris.arrow, compressed with zstd
iris.to_feather("iris.arrow", compression="zstd", compression_level=1)

Julia

using Pkg
Pkg.add(["Arrow", "DataFrames"])

using Arrow, DataFrames
# read iris.arrow as a DataFrame
iris = Arrow.Table("iris.arrow") |> DataFrame
# write iris to iris.arrow with zstd compression, allowing up to 8 concurrent write tasks
Arrow.write("iris.arrow", iris, compress=:zstd, ntasks=8)

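Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

For infringement concerns, contact cloudcommunity@tencent.com to have the content removed.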