I have a very large set of ranges stored in GenomicRanges
format. Converted to a data.frame, an example looks like this:
file <- "seqnames start end width strand
chr1 2 5 4 *
chr2 3 7 5 *"
file<-read.table(text=file,header=T)
I want to expand each range into its individual positions, as in the example below:
file2 <- "seqnames Position
chr1 2
chr1 3
chr1 4
chr1 5
chr2 3
chr2 4
chr2 5
chr2 6
chr2 7"
file2 <- read.table(text=file2,header=T)
How can I do this?
Posted on 2017-02-22 13:02:27
If you are using Bioconductor GenomicRanges, then
> GPos(GRanges(c("chr1:2-5", "chr2:3-7")))
GPos object with 9 positions and 0 metadata columns:
seqnames pos strand
<Rle> <integer> <Rle>
[1] chr1 2 *
[2] chr1 3 *
[3] chr1 4 *
[4] chr1 5 *
[5] chr2 3 *
[6] chr2 4 *
[7] chr2 5 *
[8] chr2 6 *
[9] chr2 7 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
You may first need to build the GRanges object from the data.frame:
GRanges(file)
Posted on 2017-02-22 13:14:15
We can use data.table
library(data.table)
setDT(file)[, .(position = start:end), by = seqnames]
# seqnames position
# 1: chr1 2
# 2: chr1 3
# 3: chr1 4
# 4: chr1 5
# 5: chr2 3
# 6: chr2 4
# 7: chr2 5
# 8: chr2 6
# 9: chr2 7
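Note that grouping by seqnames alone relies on each chromosome appearing in exactly one row, as in the example; with several ranges per chromosome, start:end would only use the first range in each group. A minimal base R sketch that expands row by row instead is shown below. The example data frame `ranges` (with two ranges on chr1) and the variable names `widths` and `positions` are hypothetical, introduced only for illustration.

```r
# Hypothetical input: chr1 appears twice, so grouping by seqnames would not be enough.
ranges <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  start    = c(2L, 10L, 3L),
  end      = c(5L, 11L, 7L)
)

# Number of positions covered by each range (start and end are inclusive).
widths <- ranges$end - ranges$start + 1L

# Repeat each seqname once per position, and build the positions as
# offsets 1..width shifted to begin at each range's start.
positions <- data.frame(
  seqnames = rep(ranges$seqnames, times = widths),
  Position = sequence(widths) + rep(ranges$start, times = widths) - 1L
)

print(positions)
```

This avoids an explicit loop, so it should scale reasonably to large inputs, although the result has one row per covered base and can become very large.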
https://stackoverflow.com/questions/42392140