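To do good work, one must first sharpen one's tools.
Continuing from the previous post: ANNOVAR, a powerful tool for genomic variant annotation
In the previous post we covered the basic usage of ANNOVAR and produced an annotation result. Today we take a closer look at the databases used for annotation and at how to read the results.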
~/software/annovar/table_annovar.pl homo_test.filter.vcf \
~/software/annovar/humandb/ -buildver hg38 \
-out annovar_test \
-remove \
-protocol refGene,cytoBand,exac03,avsnp150,dbnsfp42a,knownGene,clinvar_20221231 \
-operation gx,r,f,f,f,g,f \
-nastring . -polish --vcfinput
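Each name in -protocol is matched by position with one code in -operation: g for gene-based, r for region-based, f for filter-based annotation, and gx for gene-based annotation with cross-references. The sketch below prints the pairs for the command above so that a count mismatch is easy to spot. The function and variable names are made up for illustration and are not part of ANNOVAR.

```shell
#!/bin/sh
# Values copied from the table_annovar.pl command above.
protocol="refGene,cytoBand,exac03,avsnp150,dbnsfp42a,knownGene,clinvar_20221231"
operation="gx,r,f,f,f,g,f"

# Print each database next to its operation code; fail if the counts differ.
pair_protocol_operation() {
    awk -v p="$1" -v o="$2" 'BEGIN {
        np = split(p, P, ","); no = split(o, O, ",")
        if (np != no) { print "count mismatch: " np " vs " no; exit 1 }
        for (i = 1; i <= np; i++) print P[i] "\t" O[i]
    }'
}

pair_protocol_operation "$protocol" "$operation"
```

Here refGene is paired with gx, cytoBand with r, knownGene with g, and the remaining four databases with f.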
The -protocol parameter lists several annotation databases. So what are these databases?
refGene
refGeneWithVer
knownGene
ensGene
cytoBand
dbNSFP series (dbnsfp30a, dbnsfp31a, dbnsfp33a, dbnsfp35a, dbnsfp35c, dbnsfp41a, dbnsfp41c, dbnsfp42a, dbnsfp42c)
revel
avsnp series (avsnp142, avsnp144, avsnp147, avsnp150)
1000 Genomes (1000g2014oct, 1000g2015aug)
gnomAD (gnomad, gnomad211, gnomad30, gnomad312)
ExAC (exac03, exac03nonpsych, exac03nontcga)
Kaviar
ESP6500
InterVar
COSMIC (cosmic70)
Gene4Denovo (gene4denovo201907)
M-CAP
NCI60
ICGC28
ClinVar
ABraOM
GME
This is only a partial list. You can also build your own database; for details see: https://annovar.openbioinformatics.org/en/latest/user-guide/gene/#create-your-own-gene-definition-databases-for-non-human-species
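Every database named in -protocol has to be present in the humandb/ directory before annotation. The sketch below checks which ones are missing and prints a suggested download command for each, without running anything. It assumes ANNOVAR's usual file naming (for example hg38_refGene.txt), that cytoBand is fetched from UCSC, and that the others come from the ANNOVAR mirror via -webfrom annovar. Whether a given database is offered for hg38 should be confirmed on the ANNOVAR download page. missing_dbs is a hypothetical helper name.

```shell
#!/bin/sh
# List databases from a -protocol string whose <buildver>_<name>.txt file is
# missing in the database directory, and print a suggested download command.
# Assumption: files follow ANNOVAR's usual naming, e.g. humandb/hg38_refGene.txt.
missing_dbs() {
    dbdir=$1; buildver=$2; protocol=$3
    for db in $(printf '%s' "$protocol" | tr ',' ' '); do
        [ -f "$dbdir/${buildver}_${db}.txt" ] && continue
        # Assumption: cytoBand comes from UCSC, the rest from the ANNOVAR mirror.
        if [ "$db" = "cytoBand" ]; then src=""; else src="-webfrom annovar "; fi
        echo "annotate_variation.pl -buildver $buildver -downdb ${src}$db $dbdir"
    done
}

missing_dbs ~/software/annovar/humandb hg38 \
    "refGene,cytoBand,exac03,avsnp150,dbnsfp42a,knownGene,clinvar_20221231"
```

Printing the commands first keeps the large downloads (dbNSFP in particular) under manual control.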
Pick one site and look at its record in the VCF file
$ sed -n "3402p;3428p" homo_test.filter.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 1334174 . T C 6876.73 PASS AC=6;AF=1.00;AN=6;DP=253;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=60.00;QD=27.84;SOR=0.895 GT:AD:DP:GQ:PL
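The line numbers 3402 and 3428 only apply to this particular file. A hedged alternative is to select the header and the record by chromosome and position. The helper name vcf_site is made up for illustration, and the sketch assumes an uncompressed, tab-delimited VCF.

```shell
#!/bin/sh
# Print the #CHROM header line plus any record at the given chromosome/position.
# Assumes a plain-text (uncompressed), tab-delimited VCF.
vcf_site() {
    awk -F'\t' -v chr="$2" -v pos="$3" \
        '/^#CHROM/ || ($1 == chr && $2 == pos)' "$1"
}

# Usage with the file from this post (skipped if the file is absent).
if [ -f homo_test.filter.vcf ]; then
    vcf_site homo_test.filter.vcf chr1 1334174
fi
```

Matching on both columns avoids accidental hits on the same number elsewhere in the line, which a plain grep on the position cannot rule out.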
View its combined annotation in the xxx_multianno.txt file
$ grep -w 1334174 annovar_test.hg38_multianno.txt
chr1 1334174 1334174 T C exonic TAS1R3 . nonsynonymous SNV TAS1R3:NM_152228:exon6:c.T2269C:p.C757R 1p36.33 0.9613 0.9873 0.988 0.9128 0.9366 0.9721 0.9321 0.9239 rs307377 1.0 0.010 T 1.0 0.012 T 0.0 0.029 B 0.0 0.014 B 0.048 0.233 N 1 0.090 P-2.125 0.001 N -2.22 0.870 D 4.08 0.001 N 0.028 0.006 -0.948 0.413 T 0.000 0.000 T 0.000 0.000 T . . .0.134 0.368 . . . . . . 0.292 0.092 T 0.088 0.381 T -0.489 0.007 T -0.368 0.372 T 0.003 0.000 T0.218 0.028 T .; .; 0.468 0.088 6.166 0.506 0.044 0.083 0.143 N 0.024 0.015 N -1.491 0.019 -1.411 0.031 1.000 0.463 0.421 0.064 0 . . 4.59 -0.839 0.102 0.360 0.200 -0.238 0.086 0.075 0.222 0.102 0.195 1.084 0.015 GPCR family 3, C-terminal|GPCR family 3, C-terminal DVL1|ANKRD65|TAS1R3|ANKRD65|TAS1R3|INTS11|INTS11|TAS1R3|ANKRD65|ANKRD65|C1QTNF12|MXRA8|ANKRD65|ANKRD65|MXRA8|ANKRD65|ANKRD65|SCNN1D|DVL1|ANKRD65|ATAD3C|INTS11|ANKRD65|AL391244.1|ANKRD65|AL391244.2|ANKRD65|INTS11|TAS1R3|DVL1|TAS1R3 Adipose_Subcutaneous|Adipose_Subcutaneous|Artery_Aorta|Artery_Aorta|Artery_Tibial|Brain_Cerebellar_Hemisphere|Brain_Cerebellum|Brain_Cerebellum|Brain_Cortex|Brain_Frontal_Cortex_BA9|Cells_Cultured_fibroblasts|Colon_Sigmoid|Colon_Sigmoid|Colon_Transverse|Esophagus_Muscularis|Esophagus_Muscularis|Heart_Left_Ventricle|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Ovary|Skin_Not_Sun_Exposed_Suprapubic|Skin_Not_Sun_Exposed_Suprapubic|Skin_Sun_Exposed_Lower_leg|Spleen|Spleen|Thyroid|Thyroid|Thyroid|Whole_Blood exonic TAS1R3 . nonsynonymous SNV TAS1R3:ENST00000339381.6:exon6:c.T2269C:p.C757R . . ... 1 6876.73 101 chr1 1334174 . T C 6876.73 PASS AC=6;AF=1.00;AN=6;DP=253;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=60.00;QD=27.84;SOR=0.895 GT:AD:DP:GQ:PL
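table_annovar.pl normally writes a tab-delimited header as the first line of the multianno.txt file, so the column names can be paired with the values of a single record. A minimal sketch, assuming that header is present and that the position sits in the second column (Start). show_site is a hypothetical helper name. Any columns beyond the header, which can happen with -vcfinput depending on the ANNOVAR version, are labeled colN.

```shell
#!/bin/sh
# Print "column name <TAB> value" for every record whose Start equals the
# given position. Assumes line 1 of the file is the tab-delimited header.
show_site() {
    awk -F'\t' -v pos="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        $2 == pos {
            for (i = 1; i <= NF; i++)
                printf "%s\t%s\n", ((i in name) ? name[i] : "col" i), $i
        }' "$1"
}

# Usage with the file from this post (skipped if the file is absent).
if [ -f annovar_test.hg38_multianno.txt ]; then
    show_site annovar_test.hg38_multianno.txt 1334174
fi
```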
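The grep output above does not carry the header row, so it is hard to tell which column is which. The annotated multianno.vcf file is easier to read because every value is labeled with its key.
Extract the annotated record for the site:
$ grep -w 1334174 annovar_test.hg38_multianno.vcf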
chr1 1334174 . T C 6876.73 PASS AC=6;AF=1.00;AN=6;DP=253;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=60.00;QD=27.84;SOR=0.895;ANNOVAR_DATE=2020-06-08;Func.refGene=exonic;Gene.refGene=TAS1R3;GeneDetail.refGene=.;ExonicFunc.refGene=nonsynonymous_SNV;AAChange.refGene=TAS1R3:NM_152228:exon6:c.T2269C:p.C757R;cytoBand=1p36.33;ExAC_ALL=0.9613;ExAC_AFR=0.9873;ExAC_AMR=0.988;ExAC_EAS=0.9128;ExAC_FIN=0.9366;ExAC_NFE=0.9721;ExAC_OTH=0.9321;ExAC_SAS=0.9239;avsnp150=rs307377;SIFT_score=1.0;SIFT_converted_rankscore=0.010;SIFT_pred=T;SIFT4G_score=1.0;SIFT4G_converted_rankscore=0.012;SIFT4G_pred=T;Polyphen2_HDIV_score=0.0;Polyphen2_HDIV_rankscore=0.029;Polyphen2_HDIV_pred=B;Polyphen2_HVAR_score=0.0;Polyphen2_HVAR_rankscore=0.014;Polyphen2_HVAR_pred=B;LRT_score=0.048;LRT_converted_rankscore=0.233;LRT_pred=N;MutationTaster_score=1;MutationTaster_converted_rankscore=0.090;MutationTaster_pred=P;MutationAssessor_score=-2.125;MutationAssessor_rankscore=0.001;MutationAssessor_pred=N;FATHMM_score=-2.22;FATHMM_converted_rankscore=0.870;FATHMM_pred=D;PROVEAN_score=4.08;PROVEAN_converted_rankscore=0.001;PROVEAN_pred=N;VEST4_score=0.028;VEST4_rankscore=0.006;MetaSVM_score=-0.948;MetaSVM_rankscore=0.413;MetaSVM_pred=T;MetaLR_score=0.000;MetaLR_rankscore=0.000;MetaLR_pred=T;MetaRNN_score=0.000;MetaRNN_rankscore=0.000;MetaRNN_pred=T;M-CAP_score=.;M-CAP_rankscore=.;M-CAP_pred=.;REVEL_score=0.134;REVEL_rankscore=0.368;MutPred_score=.;MutPred_rankscore=.;MVP_score=.;MVP_rankscore=.;MPC_score=.;MPC_rankscore=.;PrimateAI_score=0.292;PrimateAI_rankscore=0.092;PrimateAI_pred=T;DEOGEN2_score=0.088;DEOGEN2_rankscore=0.381;DEOGEN2_pred=T;BayesDel_addAF_score=-0.489;BayesDel_addAF_rankscore=0.007;BayesDel_addAF_pred=T;BayesDel_noAF_score=-0.368;BayesDel_noAF_rankscore=0.372;BayesDel_noAF_pred=T;ClinPred_score=0.003;ClinPred_rankscore=0.000;ClinPred_pred=T;LIST-S2_score=0.218;LIST-S2_rankscore=0.028;LIST-S2_pred=T;Aloft_pred=.\x3b;Aloft_Confidence=.\x3b;CADD_raw=0.468;CADD_raw_rankscore=0.088;CADD_phred=6.166;DANN_score=0.506;DANN_rankscore=0.044;fathmm-MKL_coding_score=0.083;fathmm-MKL_coding_rankscore=0.143;fathmm-MKL_coding_pred=N;fathmm-XF_coding_score=0.024;fathmm-XF_coding_rankscore=0.015;fathmm-XF_coding_pred=N;Eigen-raw_coding=-1.491;Eigen-raw_coding_rankscore=0.019;Eigen-PC-raw_coding=-1.411;Eigen-PC-raw_coding_rankscore=0.031;GenoCanyon_score=1.000;GenoCanyon_rankscore=0.463;integrated_fitCons_score=0.421;integrated_fitCons_rankscore=0.064;integrated_confidence_value=0;LINSIGHT=.;LINSIGHT_rankscore=.;GERP++_NR=4.59;GERP++_RS=-0.839;GERP++_RS_rankscore=0.102;phyloP100way_vertebrate=0.360;phyloP100way_vertebrate_rankscore=0.200;phyloP30way_mammalian=-0.238;phyloP30way_mammalian_rankscore=0.086;phastCons100way_vertebrate=0.075;phastCons100way_vertebrate_rankscore=0.222;phastCons30way_mammalian=0.102;phastCons30way_mammalian_rankscore=0.195;SiPhy_29way_logOdds=1.084;SiPhy_29way_logOdds_rankscore=0.015;Interpro_domain=GPCR_family_3,_C-terminal|GPCR_family_3,_C-terminal;GTEx_V8_gene=DVL1|ANKRD65|TAS1R3|ANKRD65|TAS1R3|INTS11|INTS11|TAS1R3|ANKRD65|ANKRD65|C1QTNF12|MXRA8|ANKRD65|ANKRD65|MXRA8|ANKRD65|ANKRD65|SCNN1D|DVL1|ANKRD65|ATAD3C|INTS11|ANKRD65|AL391244.1|ANKRD65|AL391244.2|ANKRD65|INTS11|TAS1R3|DVL1|TAS1R3;GTEx_V8_tissue=Adipose_Subcutaneous|Adipose_Subcutaneous|Artery_Aorta|Artery_Aorta|Artery_Tibial|Brain_Cerebellar_Hemisphere|Brain_Cerebellum|Brain_Cerebellum|Brain_Cortex|Brain_Frontal_Cortex_BA9|Cells_Cultured_fibroblasts|Colon_Sigmoid|Colon_Sigmoid|Colon_Transverse|Esophagus_Muscularis|Esophagus_Muscularis|Heart_Left_Ventricle|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Ovary|Skin_Not_Sun_Exposed_Suprapubic|Skin_Not_Sun_Exposed_Suprapubic|Skin_Sun_Exposed_Lower_leg|Spleen|Spleen|Thyroid|Thyroid|Thyroid|Whole_Blood;Func.knownGene=exonic;Gene.knownGene=TAS1R3;GeneDetail.knownGene=.;ExonicFunc.knownGene=nonsynonymous_SNV;AAChange.knownGene=TAS1R3:ENST00000339381.6:exon6:c.T2269C:p.C757R;CLNALLELEID=.;CLNDN=.;CLNDISDB=.;CLNREVSTAT=.;CLNSIG=.;ALLELE_END GT:AD:DP:GQ:PL
Convert it into a more readable form, one key per line
$ grep -w 1334174 annovar_test.hg38_multianno.vcf | cut -f 8 | cut -d ";" -f 12- | tr ";" "\n"
ANNOVAR_DATE=2020-06-08
Func.refGene=exonic
Gene.refGene=TAS1R3
GeneDetail.refGene=.
ExonicFunc.refGene=nonsynonymous_SNV
AAChange.refGene=TAS1R3:NM_152228:exon6:c.T2269C:p.C757R
cytoBand=1p36.33
ExAC_ALL=0.9613
ExAC_AFR=0.9873
ExAC_AMR=0.988
ExAC_EAS=0.9128
ExAC_FIN=0.9366
ExAC_NFE=0.9721
ExAC_OTH=0.9321
ExAC_SAS=0.9239
avsnp150=rs307377
SIFT_score=1.0
SIFT_converted_rankscore=0.010
SIFT_pred=T
SIFT4G_score=1.0
SIFT4G_converted_rankscore=0.012
SIFT4G_pred=T
Polyphen2_HDIV_score=0.0
Polyphen2_HDIV_rankscore=0.029
Polyphen2_HDIV_pred=B
Polyphen2_HVAR_score=0.0
Polyphen2_HVAR_rankscore=0.014
Polyphen2_HVAR_pred=B
LRT_score=0.048
LRT_converted_rankscore=0.233
LRT_pred=N
MutationTaster_score=1
MutationTaster_converted_rankscore=0.090
MutationTaster_pred=P
MutationAssessor_score=-2.125
MutationAssessor_rankscore=0.001
MutationAssessor_pred=N
FATHMM_score=-2.22
FATHMM_converted_rankscore=0.870
FATHMM_pred=D
PROVEAN_score=4.08
PROVEAN_converted_rankscore=0.001
PROVEAN_pred=N
VEST4_score=0.028
VEST4_rankscore=0.006
MetaSVM_score=-0.948
MetaSVM_rankscore=0.413
MetaSVM_pred=T
MetaLR_score=0.000
MetaLR_rankscore=0.000
MetaLR_pred=T
MetaRNN_score=0.000
MetaRNN_rankscore=0.000
MetaRNN_pred=T
M-CAP_score=.
M-CAP_rankscore=.
M-CAP_pred=.
REVEL_score=0.134
REVEL_rankscore=0.368
MutPred_score=.
MutPred_rankscore=.
MVP_score=.
MVP_rankscore=.
MPC_score=.
MPC_rankscore=.
PrimateAI_score=0.292
PrimateAI_rankscore=0.092
PrimateAI_pred=T
DEOGEN2_score=0.088
DEOGEN2_rankscore=0.381
DEOGEN2_pred=T
BayesDel_addAF_score=-0.489
BayesDel_addAF_rankscore=0.007
BayesDel_addAF_pred=T
BayesDel_noAF_score=-0.368
BayesDel_noAF_rankscore=0.372
BayesDel_noAF_pred=T
ClinPred_score=0.003
ClinPred_rankscore=0.000
ClinPred_pred=T
LIST-S2_score=0.218
LIST-S2_rankscore=0.028
LIST-S2_pred=T
Aloft_pred=.\x3b
Aloft_Confidence=.\x3b
CADD_raw=0.468
CADD_raw_rankscore=0.088
CADD_phred=6.166
DANN_score=0.506
DANN_rankscore=0.044
fathmm-MKL_coding_score=0.083
fathmm-MKL_coding_rankscore=0.143
fathmm-MKL_coding_pred=N
fathmm-XF_coding_score=0.024
fathmm-XF_coding_rankscore=0.015
fathmm-XF_coding_pred=N
Eigen-raw_coding=-1.491
Eigen-raw_coding_rankscore=0.019
Eigen-PC-raw_coding=-1.411
Eigen-PC-raw_coding_rankscore=0.031
GenoCanyon_score=1.000
GenoCanyon_rankscore=0.463
integrated_fitCons_score=0.421
integrated_fitCons_rankscore=0.064
integrated_confidence_value=0
LINSIGHT=.
LINSIGHT_rankscore=.
GERP++_NR=4.59
GERP++_RS=-0.839
GERP++_RS_rankscore=0.102
phyloP100way_vertebrate=0.360
phyloP100way_vertebrate_rankscore=0.200
phyloP30way_mammalian=-0.238
phyloP30way_mammalian_rankscore=0.086
phastCons100way_vertebrate=0.075
phastCons100way_vertebrate_rankscore=0.222
phastCons30way_mammalian=0.102
phastCons30way_mammalian_rankscore=0.195
SiPhy_29way_logOdds=1.084
SiPhy_29way_logOdds_rankscore=0.015
Interpro_domain=GPCR_family_3,_C-terminal|GPCR_family_3,_C-terminal
GTEx_V8_gene=DVL1|ANKRD65|TAS1R3|ANKRD65|TAS1R3|INTS11|INTS11|TAS1R3|ANKRD65|ANKRD65|C1QTNF12|MXRA8|ANKRD65|ANKRD65|MXRA8|ANKRD65|ANKRD65|SCNN1D|DVL1|ANKRD65|ATAD3C|INTS11|ANKRD65|AL391244.1|ANKRD65|AL391244.2|ANKRD65|INTS11|TAS1R3|DVL1|TAS1R3
GTEx_V8_tissue=Adipose_Subcutaneous|Adipose_Subcutaneous|Artery_Aorta|Artery_Aorta|Artery_Tibial|Brain_Cerebellar_Hemisphere|Brain_Cerebellum|Brain_Cerebellum|Brain_Cortex|Brain_Frontal_Cortex_BA9|Cells_Cultured_fibroblasts|Colon_Sigmoid|Colon_Sigmoid|Colon_Transverse|Esophagus_Muscularis|Esophagus_Muscularis|Heart_Left_Ventricle|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Nerve_Tibial|Ovary|Skin_Not_Sun_Exposed_Suprapubic|Skin_Not_Sun_Exposed_Suprapubic|Skin_Sun_Exposed_Lower_leg|Spleen|Spleen|Thyroid|Thyroid|Thyroid|Whole_Blood
Func.knownGene=exonic
Gene.knownGene=TAS1R3
GeneDetail.knownGene=.
ExonicFunc.knownGene=nonsynonymous_SNV
AAChange.knownGene=TAS1R3:ENST00000339381.6:exon6:c.T2269C:p.C757R
CLNALLELEID=.
CLNDN=.
CLNDISDB=.
CLNREVSTAT=.
CLNSIG=.
ALLELE_END
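The full listing is long. To focus on a handful of keys, the INFO string can be filtered by exact key name. The sketch below uses awk with exact matching rather than a grep pattern, because names such as GERP++_RS contain regex metacharacters. pick_info is an illustrative name, not an ANNOVAR tool.

```shell
#!/bin/sh
# Read a semicolon-separated INFO string on stdin and print only the
# KEY=VALUE pairs whose key is in the comma-separated list given as $1.
pick_info() {
    tr ';' '\n' | awk -F= -v keys="$1" '
        BEGIN { n = split(keys, k, ","); for (i = 1; i <= n; i++) want[k[i]] = 1 }
        ($1 in want)'
}

# Small demonstration on values copied from the listing above.
printf '%s\n' 'ExAC_ALL=0.9613;SIFT_pred=T;CADD_phred=6.166;GERP++_RS=-0.839' |
    pick_info 'ExAC_ALL,GERP++_RS'
```

On the real file, the same helper can be placed at the end of the pipeline: grep -w 1334174 annovar_test.hg38_multianno.vcf | cut -f 8 | pick_info 'ExAC_ALL,SIFT_pred,CADD_phred'.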
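From this example we can read off the detailed annotation for the site:
Variant information
Func.refGene: exonic (the variant lies in an exonic region)
Gene.refGene: TAS1R3 (the variant is in the TAS1R3 gene)
ExonicFunc.refGene: nonsynonymous_SNV (a nonsynonymous single-nucleotide variant, i.e. it changes the encoded amino acid)
AAChange.refGene: TAS1R3:NM_152228:exon6:c.T2269C:p.C757R (the residue at position 757 of the protein changes from cysteine to arginine)
Population frequency
ExAC_ALL=0.9613 (allele frequency across all ExAC populations; at this site the alternate allele is the common one)
Scores and calls from functional prediction tools
SIFT_score=1.0, SIFT_pred=T (predicted tolerated)
Polyphen2_HDIV_score=0.0, Polyphen2_HDIV_pred=B (predicted benign)
MutationTaster_score=1, MutationTaster_pred=P (P stands for polymorphism_automatic, i.e. a known polymorphism regarded as harmless)
MutationAssessor_score=-2.125, MutationAssessor_pred=N (predicted neutral)
FATHMM_score=-2.22, FATHMM_pred=D (predicted deleterious)
Evolutionary conservation and gene function
CADD_phred=6.166 (a PHRED-scaled deleteriousness score, where 10 marks the top 10% and 20 the top 1% of most deleterious substitutions; 6.166 falls outside the top 10%)
GERP++_RS=-0.839 (a negative score, indicating little evolutionary constraint at this position)
Other databases and information
cosmic70 (not part of the -protocol list in this run, so it does not appear above; if it is added and a value is present, it provides cancer-related mutation information)
Annotation results from annotate_variation.pl based on the refGene database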
$ grep 1334174 homo_test.avinput.variant_function
exonic TAS1R3 chr1 1334174 1334174 T C 1 6876.73 101 chr1 1334174 . T C 6876.73 PASS AC=6;AF=1.00;AN=6;DP=253;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=60.00;QD=27.84;SOR=0.895 GT:AD:DP:GQ:PL
$ grep 1334174 homo_test.avinput.exonic_variant_function
line26 nonsynonymous SNV TAS1R3:NM_152228:exon6:c.T2269C:p.C757R, chr1 1334174 1334174 T C 1 6876.73 101 chr1 1334174 . T C6876.73 PASS AC=6;AF=1.00;AN=6;DP=253;ExcessHet=3.0103;FS=0.000;MLEAC=6;MLEAF=1.00;MQ=60.00;QD=27.84;SOR=0.895 GT:AD:DP:GQ:PL
These match the combined annotation results above, so we will not go through them again.
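In the .exonic_variant_function file, the second tab-delimited column holds the consequence (nonsynonymous SNV in the record above). As a quick overview of a whole sample, the consequences can be tallied. This is only a sketch: it assumes the file is tab-delimited, and count_exonic is a made-up name.

```shell
#!/bin/sh
# Count how often each consequence (column 2) occurs in an
# .exonic_variant_function file, most frequent first.
count_exonic() {
    awk -F'\t' '{ n[$2]++ } END { for (k in n) print n[k] "\t" k }' "$1" |
        sort -k1,1nr
}

# Usage with the file from this post (skipped if the file is absent).
if [ -f homo_test.avinput.exonic_variant_function ]; then
    count_exonic homo_test.avinput.exonic_variant_function
fi
```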
For more usage, see: https://annovar.openbioinformatics.org/en/latest/