
Speeding Up Python Code

Author: 华科云商小徐
Published 2024-12-03 16:54:18
Included in the column: 小徐学爬虫

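Speeding up Python code usually means improving computational performance and reducing run time. This article takes one slow script as an example and shows two ways to make it faster.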

1. Problem background

def novo(infile, seqList, out):
    uDic = dict()
    rDic = dict()
    nmDic = dict()

    with open(infile, 'r') as infile, open(seqList, 'r') as RADlist:
        samples = [line.strip() for line in RADlist]
        lines = [line.strip() for line in infile]

        # Create dictionaries with all the samples
        for i in samples:
            uDic[i.replace(" ", "")] = 0
            rDic[i.replace(" ", "")] = 0
            nmDic[i.replace(" ", "")] = 0

        for k in lines:
            l1 = k.split("\t")
            l2 = l1[0].split(";")
            l3 = l2[0].replace(">", "")
            if len(l1) < 2:
                continue
            # Each branch below scans every key to find the one equal to l3
            if l1[4] == "U":
                for k in uDic.keys():
                    if k == l3:
                        uDic[k] += 1

            if l1[4] == "R":
                for j in rDic.keys():
                    if j == l3:
                        rDic[j] += 1

            if l1[4] == "NM":
                for h in nmDic.keys():
                    if h == l3:
                        nmDic[h] += 1

    f = open(out, "w")
    f.write("Sample"+"\t"+"R"+"\t"+"U"+"\t"+"NM"+"\t"+"TOTAL"+"\t"+"%R"+"\t"+"%U"+"\t"+"%NM"+"\n")
    for i in samples:
        U = int()
        R = int()
        NM = int()
        # Again a full scan of each dictionary to fetch one value
        for k, j in uDic.items():
            if k == i:
                U = j
        for o, p in rDic.items():
            if o == i:
                R = p
        for y, u in nmDic.items():
            if y == i:
                NM = u
        TOTAL = int(U + R + NM)
        try:
            f.write(i+"\t"+str(R)+"\t"+str(U)+"\t"+str(NM)+"\t"+str(TOTAL)+"\t"+str(float(R) / TOTAL)+"\t"+str(float(U) / TOTAL)+"\t"+str(float(NM) / TOTAL)+"\n")
        except ZeroDivisionError:
            # TOTAL is 0 for samples that never appear in the input; skip them
            continue

    f.close()

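The function above reads a list of sample names from one text file, searches an input file for them, and writes to an output file how many times each sample appears with each status (U, R, NM). The problem is that it is very slow on large files: both input files are loaded into memory in full, and for every input line the code loops over all dictionary keys to find a single match, so the work grows with the number of lines multiplied by the number of samples.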
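
To make the cost of the key scan concrete, here is a minimal sketch that compares it with a direct dictionary lookup. The function names and the sample data are made up for illustration and are not part of the original script; the point is only that both approaches produce the same counts, while the lookup version does one hash lookup per name instead of a loop over every key.

```python
def count_by_scanning(samples, observed):
    """Mirror the original approach: scan every key for each observed name."""
    counts = {name: 0 for name in samples}
    for name in observed:
        for key in counts.keys():
            if key == name:
                counts[key] += 1
    return counts


def count_by_lookup(samples, observed):
    """Same result with one hash lookup per observed name."""
    counts = {name: 0 for name in samples}
    for name in observed:
        if name in counts:
            counts[name] += 1
    return counts


# Hypothetical data, made up for illustration only
samples = ["s1", "s2", "s3"]
observed = ["s2", "s1", "s2", "unknown", "s2"]

scan_result = count_by_scanning(samples, observed)
lookup_result = count_by_lookup(samples, observed)
print(scan_result == lookup_result)
```

With a few thousand samples and millions of input lines, the inner loop in the scanning version is where most of the time is likely to go.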

2. Solutions

Method 1

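One way to speed up the code is to iterate over the file line by line instead of reading the whole file into memory at once. This saves a lot of memory and lets the code handle larger files. The version below also replaces the three dictionaries and their key scans with a single Counter keyed by (status, sample), so each input line costs one lookup.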

import csv
from collections import Counter


def novo(infile, seqList, out):
    # Count (status, sample) pairs while streaming the input line by line
    counts = Counter()
    with open(infile, 'r') as handle:
        for line in handle:
            l1 = line.strip().split("\t")
            # Column 4 is read below, so skip lines with fewer than 5 columns
            if len(l1) < 5:
                continue
            l2 = l1[0].split(";")
            l3 = l2[0].replace(">", "")
            counts[(l1[4], l3)] += 1

    # Produce output
    types = ['R', 'U', 'NM']
    with open(seqList, 'r') as RADlist, open(out, 'w', newline='') as outfile:
        f = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        f.writerow(['Sample'] + types + ['TOTAL'] + ['%' + t for t in types])
        for sample in RADlist:
            sample = sample.strip()
            countrow = [counts[(t, sample)] for t in types]
            total = sum(countrow)
            # Skip samples with no hits to avoid dividing by zero
            if total == 0:
                continue
            f.writerow([sample] + countrow + [total] + [c / total for c in countrow])
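
The Counter in this method is indexed with (status, sample) tuples, and it needs no up-front initialization because a missing key reads as 0. The short sketch below illustrates that behavior with made-up pairs; the data is hypothetical. Note that the lookup uses square brackets: a Counter is not callable, so `counts((t, sample))` with parentheses raises a TypeError.

```python
from collections import Counter

# Hypothetical (status, sample) pairs, made up for illustration
pairs = [("U", "s1"), ("R", "s1"), ("U", "s1"), ("NM", "s2")]

counts = Counter()
for status, sample in pairs:
    counts[(status, sample)] += 1  # a missing key starts from 0

types = ["R", "U", "NM"]
row_s1 = [counts[(t, "s1")] for t in types]  # one count per status for s1
row_s3 = [counts[(t, "s3")] for t in types]  # s3 never appeared

print(row_s1)
print(row_s3)
```

Reading a missing key from a Counter returns 0 without inserting the key, so looking up samples that never occurred does not grow the counter.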

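Method 2

Another way to speed up the code is parallel processing. This takes advantage of a multi-core CPU by working on several tasks at the same time.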

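import csv
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def count_chunk(lines):
    # Count (status, sample) pairs in one chunk of input lines
    counts = Counter()
    for line in lines:
        l1 = line.strip().split("\t")
        # Column 4 is read below, so skip lines with fewer than 5 columns
        if len(l1) < 5:
            continue
        l3 = l1[0].split(";")[0].replace(">", "")
        counts[(l1[4], l3)] += 1
    return counts


def read_chunks(path, size=100000):
    # Yield lists of up to `size` lines (the chunk size is an arbitrary choice)
    with open(path, 'r') as handle:
        while True:
            chunk = list(islice(handle, size))
            if not chunk:
                break
            yield chunk


def novo(infile, seqList, out):
    # Count the chunks in worker processes, then merge the partial counters.
    # On platforms that spawn workers (e.g. Windows), call novo() from under
    # an `if __name__ == "__main__":` guard.
    counts = Counter()
    with ProcessPoolExecutor() as executor:
        for partial in executor.map(count_chunk, read_chunks(infile)):
            counts.update(partial)

    # Produce output
    types = ['R', 'U', 'NM']
    with open(seqList, 'r') as RADlist, open(out, 'w', newline='') as outfile:
        f = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        f.writerow(['Sample'] + types + ['TOTAL'] + ['%' + t for t in types])
        for sample in RADlist:
            sample = sample.strip()
            countrow = [counts[(t, sample)] for t in types]
            total = sum(countrow)
            # Skip samples with no hits to avoid dividing by zero
            if total == 0:
                continue
            f.writerow([sample] + countrow + [total] + [c / total for c in countrow])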
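
One common way to parallelize a count like this is to split the input lines into chunks, count each chunk separately, and add the partial results together. The sketch below shows only the merge step, run sequentially so it stays easy to check; the input lines are hypothetical and assume the 5-column, tab-separated layout that the article's parsing implies. In a parallel version, each call to `count_lines` could run in its own worker process.

```python
from collections import Counter


def count_lines(lines):
    """Count (status, sample) pairs using the article's parsing rules."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split("\t")
        # Index 4 is read below, so at least 5 columns are required
        if len(fields) < 5:
            continue
        sample = fields[0].split(";")[0].replace(">", "")
        counts[(fields[4], sample)] += 1
    return counts


# Hypothetical input lines, made up for illustration
lines = [
    ">s1;x\ta\tb\tc\tU\n",
    ">s1;y\ta\tb\tc\tR\n",
    ">s2;z\ta\tb\tc\tU\n",
    ">s1;w\ta\tb\tc\tU\n",
]

# Count two chunks independently, then merge the partial counters
chunks = [lines[:2], lines[2:]]
merged = Counter()
for chunk in chunks:
    merged.update(count_lines(chunk))

# A single pass over all lines gives the same totals
single_pass = count_lines(lines)
print(merged == single_pass)
```

Because `Counter.update` adds counts rather than replacing them, the order in which partial results arrive does not matter.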

With these approaches, the script can run substantially faster on large files.
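
How much faster depends on the data, so it is worth measuring on your own files. The helper below is a minimal sketch for that; `timed` is a hypothetical name introduced here, not something from the original script, and it simply wraps any callable with `time.perf_counter`.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Demonstration on a built-in so the example is self-contained
result, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f} s")
```

To compare versions of the script, the same helper could be called as `timed(novo, "input.txt", "samples.txt", "out.txt")`, where the three file names are placeholders for your own paths.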

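Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.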
