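A small command-line tool for processing FASTQ sequencing files. It is still under active development and has only a modest set of subcommands so far. It supports gzip-compressed input and output: if the output file name ends in .gz, the result is compressed automatically.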
https://github.com/sharkLoc/fqkit
cargo install fqkit
fqkit: A simple program for fastq file manipulation
Version: 0.2.17
Authors: sharkLoc <mmtinfo@163.com>
Usage: fqkit [OPTIONS] <COMMAND>
Commands:
  topn     get first N records from fastq file
  subfq    subsample sequences from big fastq file [aliases: sample]
  trim     trim fastq file
  search   search reads/motifs from fastq file
  stats    summary for fastq format file [aliases: stat]
  size     report the number sequences and bases
  plot     line plot for A T G C N percentage in read position
  fq2fa    translate fastq to fasta
  fq2sam   converts a fastq file to an unaligned SAM file
  flatten  flatten fastq sequences
  barcode  split barcode for PE reads
  remove   remove reads by read name
  reverse  get a reverse-complement of fastq file [aliases: rev]
  split    split interleaved fastq file
  merge    merge PE reads as interleaved fastq file
  split2   split fastq file by records number
  gcplot   get GC content result and plot
  view     view fastq file page by page
  help     Print this message or the help of the given subcommand(s)
Options:
  -h, --help     Print help
  -V, --version  Print version
Global Arguments:
  -v, --verbosity <VERBOSE>  control verbosity of logging, possible values: {error, warn, info, debug, trace} [default: debug]
Global FLAGS:
  -q, --quiet  be quiet and do not show extra information
Output the first N reads of a FASTQ file. The -n option sets the number of reads; -q turns off logging.
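To make the idea concrete, here is a minimal Python sketch of taking the first N records from a four-line-per-record FASTQ stream. It is an illustration only, not fqkit's implementation; the helper names `read_fastq` and `first_n` are hypothetical, and the sketch assumes well-formed input with exactly four lines per record.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes exactly four lines per record; an incomplete trailing record is dropped.
    """
    it = iter(lines)
    # Passing the same iterator four times makes zip consume four lines per step.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")

def first_n(lines, n):
    """Return the first n records without reading the rest of the input."""
    return list(islice(read_fastq(lines), n))

demo = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "GGCC\n", "+\n", "IIII\n",
    "@r3\n", "TTAA\n", "+\n", "IIII\n",
]
print(first_n(demo, 2))
```

Because `islice` stops as soon as N records have been produced, the rest of a large file is never parsed.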
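Randomly sample a specified number of reads from a FASTQ file using reservoir sampling. For very large files with a large sample size, the -r option reduces memory use at the cost of longer run time; -q turns off logging.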
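For readers unfamiliar with reservoir sampling, the sketch below shows the classic single-pass variant (often called Algorithm R) in Python. Which variant fqkit actually uses, and how -r changes its behavior, is not stated in this article, so treat this purely as an illustration; the function name `reservoir_sample` and the `seed` parameter are hypothetical.

```python
import random

def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random from an iterable in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir

sample = reservoir_sample(range(1000), 10, seed=42)
print(len(sample))
```

The total number of records does not need to be known in advance, but the sketch holds all k sampled records in memory, which is why memory grows with the sample size.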
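Search a FASTQ file for reads containing a target pattern/motif. The -p option specifies the pattern/motif (it must be uppercase) and accepts regular expressions; -q turns off logging.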
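The filtering step can be sketched in Python with the standard `re` module. This is a hedged illustration: fqkit is a Rust program, and its regular-expression syntax may differ from Python's in places. The function name `search_reads` and the example motif are hypothetical.

```python
import re

def search_reads(records, pattern):
    """Keep (name, sequence, quality) records whose sequence matches the pattern."""
    regex = re.compile(pattern)
    return [rec for rec in records if regex.search(rec[1])]

demo = [
    ("r1", "ACGTTTGACA", "IIIIIIIIII"),
    ("r2", "ACGTGACA", "IIIIIIII"),
]
# Hypothetical motif: a G, two or more T bases, then a G.
print(search_reads(demo, "GT{2,}G"))
```

The pattern is matched case-sensitively here, which mirrors the note above that the pattern should be given in uppercase.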
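Report basic statistics for a FASTQ file, including per-cycle (per-position) counts of each sequencing quality score.
summary.txt: summary of basic information: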
read average length: 126
read max length: 126
total gc content(%): 57.52
total read count: 2000
total base count: 252000
base A count: 53864 (21.37%)
base T count: 53136 (21.09%)
base G count: 70989 (28.17%)
base C count: 73967 (29.35%)
base N count: 44 (0.02%)
Number of base calls with quality value of 20 or higher (Q20+) (%) 237670 (94.31%)
Number of base calls with quality value of 30 or higher (Q30+) (%) 223461 (88.67%)
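The figures in this summary can be checked by hand: G plus C is 70989 + 73967 = 144956, and 144956 / 252000 gives 57.52%; likewise 237670 / 252000 gives 94.31% for Q20+. The Python sketch below computes the same kind of summary from parsed records. It is a minimal illustration, not fqkit's code: the function name `summarize` is hypothetical, and it assumes Phred+33 quality encoding and equal-length sequence and quality strings.

```python
from collections import Counter

def summarize(records, offset=33):
    """Summarize (name, sequence, quality) records; offset=33 assumes Phred+33."""
    bases = Counter()
    reads = q20 = q30 = 0
    for _name, seq, qual in records:
        reads += 1
        bases.update(seq.upper())
        for ch in qual:
            score = ord(ch) - offset
            q20 += score >= 20  # True counts as 1
            q30 += score >= 30
    total = sum(bases.values())

    def pct(n):
        # Percentage of all bases, rounded to two decimals; 0.0 for empty input.
        return round(100 * n / total, 2) if total else 0.0

    return {
        "reads": reads,
        "bases": total,
        "gc_pct": pct(bases["G"] + bases["C"]),
        "q20_pct": pct(q20),
        "q30_pct": pct(q30),
    }

demo = [("r1", "ACGN", "I5+!")]
print(summarize(demo))
```

As in the summary above, GC content here is computed over all bases, including N.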
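cycle.txt: per-cycle (per-position) counts of each sequencing quality score
Visualization of the stats command results; plots can be written in PNG or SVG format:
Adding the -s option also displays the proportion of A, T, G, C and N at each position in the terminal: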
Convert a FASTQ file to FASTA format
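The conversion itself is simple: keep the header (with @ replaced by >) and the sequence, and drop the separator and quality lines. A minimal Python sketch, assuming four lines per record and using the hypothetical name `fastq_to_fasta`, could look like this:

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of FASTQ lines (four lines per record)."""
    it = iter(lines)
    for header, seq, _plus, _qual in zip(it, it, it, it):
        # Replace the leading '@' with '>' and discard the quality information.
        yield ">" + header.rstrip("\n")[1:]
        yield seq.rstrip("\n")

demo = ["@r1 desc\n", "ACGT\n", "+\n", "IIII\n"]
print("\n".join(fastq_to_fasta(demo)))
```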
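Convert a FASTQ file to SAM format
Quickly count the reads and the number of each base in a FASTQ file
Split pooled-library sequencing data into individual samples by barcode sequence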
Remove reads from a FASTQ file by read name. The -n option specifies a file of read names, one per line, without the leading @ prefix.
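The name-based filter can be sketched as a set lookup. In this Python illustration the function name `remove_by_name` is hypothetical, and it assumes that a read name is the part of the header before the first space; how fqkit matches names exactly is not described here.

```python
def remove_by_name(records, names):
    """Drop (name, sequence, quality) records whose read name is listed in names.

    names is an iterable of lines, one read name per line, without the '@' prefix.
    """
    drop = {line.strip() for line in names if line.strip()}
    # Assumption: the read name is the header text up to the first space.
    return [rec for rec in records if rec[0].split(" ", 1)[0] not in drop]

demo = [("r1 1:N:0", "ACGT", "IIII"), ("r2", "GG", "II")]
print(remove_by_name(demo, ["r1\n"]))
```

Loading the names into a set keeps each lookup constant-time, so the filter stays a single pass over the reads.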
Merge paired-end (PE) reads alternately into a single interleaved FASTQ file
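Interleaving means writing record 1 of R1, then record 1 of R2, then record 2 of R1, and so on. A minimal Python sketch of that ordering follows; the function name `interleave` is hypothetical, and raising an error on unequal record counts is a design choice of this sketch, not documented fqkit behavior.

```python
from itertools import zip_longest

def interleave(reads1, reads2):
    """Yield records alternately from two paired iterables: R1, R2, R1, R2, ..."""
    for r1, r2 in zip_longest(reads1, reads2):
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        yield r1
        yield r2

fwd = [("a/1", "ACGT", "IIII"), ("b/1", "GGCC", "IIII")]
rev = [("a/2", "TTAA", "IIII"), ("b/2", "CCGG", "IIII")]
print([rec[0] for rec in interleave(fwd, rev)])
```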
The inverse of the merge command
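Splitting an interleaved file reverses that ordering: records at even positions go to R1 and records at odd positions go to R2. A short Python sketch, with the hypothetical name `deinterleave`, assuming the input holds complete pairs:

```python
def deinterleave(records):
    """Split interleaved records into (R1 list, R2 list)."""
    recs = list(records)
    if len(recs) % 2:
        raise ValueError("interleaved input must contain an even number of records")
    # Even indices are first mates, odd indices are second mates.
    return recs[0::2], recs[1::2]

mixed = ["a/1", "a/2", "b/1", "b/2"]
print(deinterleave(mixed))
```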
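Output GC content results for a FASTQ file and plot them
The -s option displays a histogram of the GC content distribution in the terminal
The -o option specifies the output GC content file, which lists the number and proportion of reads at each percentage from 0% to 100% GC content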
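The table described above can be sketched as a 101-bin histogram over per-read GC percentages. In this Python illustration the function name `gc_histogram` is hypothetical, and rounding each read's GC percentage to the nearest integer is an assumption; the binning rule fqkit uses is not stated in this article.

```python
def gc_histogram(seqs):
    """Return (gc_percent, read_count, proportion) rows for 0% through 100%."""
    counts = [0] * 101
    total = 0
    for seq in seqs:
        if not seq:
            continue  # skip empty reads to avoid dividing by zero
        gc = sum(base in "GCgc" for base in seq)
        # Assumption: bin by the GC percentage rounded to the nearest integer.
        counts[round(100 * gc / len(seq))] += 1
        total += 1
    return [(pct, n, (n / total if total else 0.0)) for pct, n in enumerate(counts)]

hist = gc_histogram(["GGCC", "ACGT", "AAAT", "ACGT"])
print([row for row in hist if row[1]])
```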
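Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, contact cloudcommunity@tencent.com for removal.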