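Use case:
I have a lot of protein structures to analyze lately,
so I went looking and found this library.
A quick look first; the IDE is still Jupyter Notebook.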
Official documentation:
https://biopython-cn.readthedocs.io/zh_CN/latest/
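Well, there is even a Chinese-language tutorial.
Introduction (copied from the official site):
The Biopython Project is an international group that develops tools for computational molecular biology in Python (http://www.python.org). Python is an object-oriented, interpreted, flexible language that is increasingly popular in computer science. It is easy to learn, has a clear syntax, and can easily be extended with modules written in C, C++, or FORTRAN.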
Example:
Use it directly to process structure files.
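# Reading structure files
# (1) Read an mmCIF file
# Create an mmCIF parser
from Bio.PDB.MMCIFParser import MMCIFParser
parser = MMCIFParser()
# Load the mmCIF file: the first argument is an ID of your choice, the second is the file path
structure = parser.get_structure('1fat', '1fat.cif')
# (2) Read a PDB-format file
# Create a PDB parser
from Bio.PDB.PDBParser import PDBParser
p = PDBParser()
# Load the PDB file
structure = p.get_structure('1d3z', '1D3Z.pdb')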
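To try the parser without downloading a file, here is a minimal sketch that assumes Biopython is installed. It formats a three-atom toy fragment into an in-memory buffer. The coordinates are made up and do not come from 1D3Z. `atom_line` is a hypothetical helper that fills the fixed PDB columns. `QUIET=True` silences the parser's construction warnings.

```python
import io

from Bio.PDB import PDBParser


def atom_line(serial, name, resname, chain, resseq, xyz, element):
    """Format one ATOM record in the fixed PDB columns (hypothetical helper)."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{0.0:>6.2f}          {element:>2}\n"
    )


# Toy coordinates for illustration only, not taken from any real entry.
pdb_text = "".join([
    atom_line(1, "N", "MET", "A", 1, (0.000, 0.000, 0.000), "N"),
    atom_line(2, "CA", "MET", "A", 1, (1.458, 0.000, 0.000), "C"),
    atom_line(3, "C", "MET", "A", 1, (2.009, 1.420, 0.000), "C"),
])

# QUIET=True suppresses PDBConstructionWarning messages.
parser = PDBParser(QUIET=True)
# get_structure accepts a file name or an open handle, so a StringIO works too.
structure = parser.get_structure("toy", io.StringIO(pdb_text))

atom_ids = [atom.get_id() for atom in structure.get_atoms()]
print(structure.get_full_id())  # ('toy',)
print(atom_ids)                 # ['N', 'CA', 'C']
```

With a real file, the same call takes a path instead of the buffer, as in `p.get_structure('1d3z', '1D3Z.pdb')`.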
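# Now we have the protein structure
# Let's look at how the structure is represented
# The overall layout of a Structure object follows the SMCRA (Structure/Model/Chain/Residue/Atom) architecture:
# a structure consists of models,
# a model consists of one or more chains,
# a chain consists of residues,
# and a residue consists of atoms.
# The official UML diagram shows this hierarchy; study it on your own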
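To make the SMCRA levels concrete, the sketch below builds a one-atom structure by hand from Bio.PDB's entity classes, assuming Biopython is installed. The ID "toy", the residue and the coordinates are made up. It then walks down the hierarchy by ID and back up with `get_parent()`.

```python
import numpy as np
from Bio.PDB.Atom import Atom
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Residue import Residue
from Bio.PDB.Structure import Structure

structure = Structure("toy")                  # S: hypothetical ID
model = Model(0)                              # M: the first model has ID 0
chain = Chain("A")                            # C
residue = Residue((" ", 1, " "), "MET", " ")  # R: ID = (hetero flag, resseq, icode)
# A: name, coord, bfactor, occupancy, altloc, fullname, serial_number
atom = Atom("CA", np.array([1.458, 0.0, 0.0], dtype="f"), 0.0, 1.0, " ", " CA ", 1, element="C")

# Each add() attaches a child and sets its parent
structure.add(model)
model.add(chain)
chain.add(residue)
residue.add(atom)

# Walk down by ID (an int residue key is expanded to (" ", 1, " "))...
same_atom = structure[0]["A"][1]["CA"]
# ...and back up with get_parent()
print(same_atom.get_parent().get_resname())  # MET
print(residue.get_full_id())                 # ('toy', 0, 'A', (' ', 1, ' '))
```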
# Next, a classic example
# Check the help documentation
help(structure)
# Visit every atom by walking down the hierarchy
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(atom)
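The four nested loops visit every atom. `Structure` also provides generator shortcuts, `get_residues()` and `get_atoms()`, that perform the same walk. Below is a minimal sketch on a hand-built toy structure, assuming Biopython is installed. The residues and zero coordinates are placeholders.

```python
import numpy as np
from Bio.PDB.Atom import Atom
from Bio.PDB.Chain import Chain
from Bio.PDB.Model import Model
from Bio.PDB.Residue import Residue
from Bio.PDB.Structure import Structure

# Hypothetical toy data: (resseq, resname, atom names); coordinates are placeholders.
TOY_RESIDUES = [(1, "MET", ["N", "CA", "C"]), (2, "GLN", ["N", "CA"])]

structure = Structure("toy")
model = Model(0)
chain = Chain("A")
structure.add(model)
model.add(chain)

serial = 1
for resseq, resname, atom_names in TOY_RESIDUES:
    residue = Residue((" ", resseq, " "), resname, " ")
    chain.add(residue)
    for name in atom_names:
        atom = Atom(name, np.zeros(3, dtype="f"), 0.0, 1.0, " ", f" {name:<3}", serial, element=name[0])
        residue.add(atom)
        serial += 1

# The explicit four-level walk, collected into a list
nested = [a for m in structure for c in m for r in c for a in r]
# The built-in generators do the same traversal
flat = list(structure.get_atoms())
resnames = [r.get_resname() for r in structure.get_residues()]

print(len(flat))  # 5
print(resnames)   # ['MET', 'GLN']
```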
# Then walk through the hierarchy level by level
structure.get_full_id()
('1d3z',)
model_list=structure.get_list()
model_list
[<Model id=0>,
<Model id=1>,
<Model id=2>,
<Model id=3>,
<Model id=4>,
<Model id=5>,
<Model id=6>,
<Model id=7>,
<Model id=8>,
<Model id=9>]
chain_list=model_list[0].get_list()
chain_list
[<Chain id=A>]
res_list=chain_list[0].get_list()
res_list
[<Residue MET het= resseq=1 icode= >,
<Residue GLN het= resseq=2 icode= >,
<Residue ILE het= resseq=3 icode= >,
<Residue PHE het= resseq=4 icode= >,
<Residue VAL het= resseq=5 icode= >,
<Residue LYS het= resseq=6 icode= >,
<Residue THR het= resseq=7 icode= >,
<Residue LEU het= resseq=8 icode= >,
<Residue THR het= resseq=9 icode= >,
<Residue GLY het= resseq=10 icode= >,
<Residue LYS het= resseq=11 icode= >,
<Residue THR het= resseq=12 icode= >,
<Residue ILE het= resseq=13 icode= >,
<Residue THR het= resseq=14 icode= >,
<Residue LEU het= resseq=15 icode= >,
<Residue GLU het= resseq=16 icode= >,
<Residue VAL het= resseq=17 icode= >,
<Residue GLU het= resseq=18 icode= >,
<Residue PRO het= resseq=19 icode= >,
<Residue SER het= resseq=20 icode= >,
<Residue ASP het= resseq=21 icode= >,
<Residue THR het= resseq=22 icode= >,
<Residue ILE het= resseq=23 icode= >,
<Residue GLU het= resseq=24 icode= >,
<Residue ASN het= resseq=25 icode= >,
<Residue VAL het= resseq=26 icode= >,
<Residue LYS het= resseq=27 icode= >,
<Residue ALA het= resseq=28 icode= >,
<Residue LYS het= resseq=29 icode= >,
<Residue ILE het= resseq=30 icode= >,
<Residue GLN het= resseq=31 icode= >,
<Residue ASP het= resseq=32 icode= >,
<Residue LYS het= resseq=33 icode= >,
<Residue GLU het= resseq=34 icode= >,
<Residue GLY het= resseq=35 icode= >,
<Residue ILE het= resseq=36 icode= >,
<Residue PRO het= resseq=37 icode= >,
<Residue PRO het= resseq=38 icode= >,
<Residue ASP het= resseq=39 icode= >,
<Residue GLN het= resseq=40 icode= >,
<Residue GLN het= resseq=41 icode= >,
<Residue ARG het= resseq=42 icode= >,
<Residue LEU het= resseq=43 icode= >,
<Residue ILE het= resseq=44 icode= >,
<Residue PHE het= resseq=45 icode= >,
<Residue ALA het= resseq=46 icode= >,
<Residue GLY het= resseq=47 icode= >,
<Residue LYS het= resseq=48 icode= >,
<Residue GLN het= resseq=49 icode= >,
<Residue LEU het= resseq=50 icode= >,
<Residue GLU het= resseq=51 icode= >,
<Residue ASP het= resseq=52 icode= >,
<Residue GLY het= resseq=53 icode= >,
<Residue ARG het= resseq=54 icode= >,
<Residue THR het= resseq=55 icode= >,
<Residue LEU het= resseq=56 icode= >,
<Residue SER het= resseq=57 icode= >,
<Residue ASP het= resseq=58 icode= >,
<Residue TYR het= resseq=59 icode= >,
<Residue ASN het= resseq=60 icode= >,
<Residue ILE het= resseq=61 icode= >,
<Residue GLN het= resseq=62 icode= >,
<Residue LYS het= resseq=63 icode= >,
<Residue GLU het= resseq=64 icode= >,
<Residue SER het= resseq=65 icode= >,
<Residue THR het= resseq=66 icode= >,
<Residue LEU het= resseq=67 icode= >,
<Residue HIS het= resseq=68 icode= >,
<Residue LEU het= resseq=69 icode= >,
<Residue VAL het= resseq=70 icode= >,
<Residue LEU het= resseq=71 icode= >,
<Residue ARG het= resseq=72 icode= >,
<Residue LEU het= resseq=73 icode= >,
<Residue ARG het= resseq=74 icode= >,
<Residue GLY het= resseq=75 icode= >,
<Residue GLY het= resseq=76 icode= >]
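Each residue's ID is the tuple (hetero flag, resseq, icode), which the repr above prints as `het=`, `resseq=` and `icode=`. Standard residues have a blank hetero flag. Biopython's PDBParser marks waters with "W" and other HETATM groups with "H_" plus the residue name. The sketch below relies on those conventions and assumes Biopython is installed. It drops non-standard residues and converts the remaining three-letter names to a one-letter sequence with `Bio.SeqUtils.seq1`. The toy chain, including its water, is hypothetical.

```python
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
from Bio.SeqUtils import seq1

# Hypothetical chain: three amino acids plus one water (hetero flag "W")
chain = Chain("A")
for het, resseq, resname in [(" ", 1, "MET"), (" ", 2, "GLN"), (" ", 3, "ILE"), ("W", 101, "HOH")]:
    chain.add(Residue((het, resseq, " "), resname, " "))

# Unpack the ID tuple shown as het= / resseq= / icode= in the repr
for residue in chain:
    het, resseq, icode = residue.get_id()
    print(residue.get_resname(), repr(het), resseq, repr(icode))

# Keep only standard residues (blank hetero flag), then convert 3-letter to 1-letter codes
protein = [r for r in chain if r.get_id()[0] == " "]
sequence = seq1("".join(r.get_resname() for r in protein))
print(sequence)  # MQI
```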
res=res_list[0]
res
<Residue MET het= resseq=1 icode= >
atom_list=res.get_list()
atom_list
[<Atom N>,
<Atom CA>,
<Atom C>,
<Atom O>,
<Atom CB>,
<Atom CG>,
<Atom SD>,
<Atom CE>,
<Atom H1>,
<Atom H2>,
<Atom H3>,
<Atom HA>,
<Atom HB2>,
<Atom HB3>,
<Atom HG2>,
<Atom HG3>,
<Atom HE1>,
<Atom HE2>,
<Atom HE3>]
atom_list[1]
<Atom CA>
atom_list[1].coord
array([ 51.653, -89.304, 8.833], dtype=float32)
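Because `atom.coord` is a plain NumPy float32 array, geometric quantities need nothing beyond NumPy. The sketch below computes one interatomic distance and lists every pair within a cutoff. The four CA coordinates are hypothetical and do not come from 1D3Z. Biopython also returns the distance when you subtract two `Atom` objects (`atom1 - atom2`), and it offers `NeighborSearch` for contact searches in larger structures.

```python
import numpy as np

# Hypothetical CA coordinates in angstroms; in practice, collect atom.coord values.
ca = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [3.8, 3.8, 0.0],
    [20.0, 0.0, 0.0],
], dtype=np.float32)


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return float(np.linalg.norm(a - b))


def contact_pairs(coords, cutoff):
    """Return index pairs (i, j), i < j, whose distance is <= cutoff."""
    diff = coords[:, None, :] - coords[None, :, :]  # shape (n, n, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # shape (n, n)
    i, j = np.where(np.triu(dist <= cutoff, k=1))    # upper triangle only
    return list(zip(i.tolist(), j.tolist()))


print(round(distance(ca[0], ca[1]), 3))  # 3.8
print(contact_pairs(ca, 4.0))            # [(0, 1), (1, 2)]
```

The all-pairs matrix is fine for a few hundred atoms. For a whole structure, a spatial index such as `NeighborSearch` scales better.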
GitHub link:
https://github.com/luskyqi1995/pubchem/blob/master/biopython_1.ipynb
And finally, a bunch of images to pad things out:
UML diagram: