Speeding Up Python Code

Original

华科云商小徐

Published 2024-12-03 16:54:18

Included in the column: 小徐学爬虫

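Speeding up Python code usually means improving computational performance and reducing running time. The example below shows a slow script and two common ways to speed it up: streaming the input into a dictionary-based counter, and parallel processing.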

1. Problem Background

def novo(infile, seqList, out):
    uDic = dict()
    rDic = dict()
    nmDic = dict()

    # Read both files completely into memory
    with open(infile, 'r') as inputFile, open(seqList, 'r') as RADlist:
        samples = [line.strip() for line in RADlist]
        lines = [line.strip() for line in inputFile]

    # Create dictionaries with all the samples, every count starting at 0
    for i in samples:
        uDic[i.replace(" ", "")] = 0
        rDic[i.replace(" ", "")] = 0
        nmDic[i.replace(" ", "")] = 0

    for line in lines:
        l1 = line.split("\t")
        if len(l1) < 5:  # the type code (U, R or NM) is in column 5, l1[4]
            continue
        l2 = l1[0].split(";")
        l3 = l2[0].replace(">", "")
        # Slow part: every line loops over all sample keys instead of
        # looking l3 up directly in the dictionary
        if l1[4] == "U":
            for k in uDic.keys():
                if k == l3:
                    uDic[k] += 1

        if l1[4] == "R":
            for j in rDic.keys():
                if j == l3:
                    rDic[j] += 1

        if l1[4] == "NM":
            for h in nmDic.keys():
                if h == l3:
                    nmDic[h] += 1

    with open(out, "w") as f:
        f.write("Sample\tR\tU\tNM\tTOTAL\t%R\t%U\t%NM\n")
        for i in samples:
            U = R = NM = 0
            # Also slow: each sample scans every dictionary entry
            for k, j in uDic.items():
                if k == i:
                    U = j
            for o, p in rDic.items():
                if o == i:
                    R = p
            for y, u in nmDic.items():
                if y == i:
                    NM = u
            TOTAL = U + R + NM
            try:
                f.write(i + "\t" + str(R) + "\t" + str(U) + "\t" + str(NM) + "\t" + str(TOTAL) + "\t"
                        + str(R / TOTAL) + "\t" + str(U / TOTAL) + "\t" + str(NM / TOTAL) + "\n")
            except ZeroDivisionError:  # skip samples with no matching lines
                continue

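The Python code above reads a list of sample names from seqList and, for each sample, counts the lines in infile whose first field (the part before the first ";", with ">" removed) names that sample and whose fifth column is U, R or NM. It then writes the three counts, their total and each count's fraction of the total to out. The problem is that the code is very slow on large files: every input line loops over all sample keys instead of doing a single dictionary lookup, so the work grows with the number of lines times the number of samples, and the output step scans every dictionary again for each sample.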
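The bottleneck is easy to see in isolation. A minimal sketch, assuming the hits have already been parsed into plain sample names; `count_by_scan` and `count_by_lookup` are hypothetical names, the first mirroring the original nested loop and the second replacing it with a direct `in` check on the dictionary:

```python
def count_by_scan(sample_names, hit_names):
    # Pattern from the original code: for every hit, compare it with
    # every key, so the work is (number of hits) x (number of samples)
    counts = {name: 0 for name in sample_names}
    for hit in hit_names:
        for key in counts.keys():
            if key == hit:
                counts[key] += 1
    return counts


def count_by_lookup(sample_names, hit_names):
    # Direct lookup: one hash-table probe per hit
    counts = {name: 0 for name in sample_names}
    for hit in hit_names:
        if hit in counts:  # hits for unknown samples are ignored, as before
            counts[hit] += 1
    return counts


samples = ["s1", "s2", "s3"]
hits = ["s1", "s3", "s1", "unknown"]
print(count_by_lookup(samples, hits))  # {'s1': 2, 's2': 0, 's3': 1}
```

Both functions return the same counts; the difference only shows up in running time, which can be compared on larger inputs with the standard `timeit` module.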

2. Solutions

Method 1

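One way to speed up the code is to read the input file line by line through the file iterator instead of loading it into memory all at once. This saves a lot of memory and lets the code handle larger files. Counting with a Counter keyed by (type, sample) also means each line costs one dictionary update rather than a loop over every sample.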
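Because `Counter` accepts any iterable, a generator that parses one line at a time can feed it directly, so no list of all lines is ever built. A minimal sketch: `iter_keys` is a hypothetical helper, `io.StringIO` stands in for the real input file (both are iterated line by line), and the column layout (sample name before the first ";" with ">" removed, type code in column 5) is taken from the code above.

```python
import io
from collections import Counter


def iter_keys(lines):
    """Hypothetical helper: yield a (type, sample) key per usable line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # column 5 holds the type code (U, R, NM)
            continue
        sample = fields[0].split(";")[0].replace(">", "")
        yield fields[4], sample


# io.StringIO stands in for open(infile); both yield one line at a time
data = io.StringIO(
    ">s1;x\ta\tb\tc\tU\n"
    ">s2;y\ta\tb\tc\tR\n"
    ">s1;z\ta\tb\tc\tU\n"
    "short\tline\n"
)
counts = Counter(iter_keys(data))
print(counts[("U", "s1")], counts[("R", "s2")], counts[("NM", "s1")])  # 2 1 0
```

Reading a missing key from a `Counter` returns 0 without inserting it, which is why samples with no hits need no special initialisation.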

from collections import Counter
import csv


def novo(infile, seqList, out):
    # Count: read the input line by line, one Counter update per line
    counts = Counter()
    with open(infile, 'r') as inputFile:
        for line in inputFile:
            l1 = line.strip().split("\t")
            if len(l1) < 5:  # need at least 5 columns to read l1[4]
                continue
            l3 = l1[0].split(";")[0].replace(">", "")
            counts[(l1[4], l3)] += 1

    # Produce output
    types = ['R', 'U', 'NM']
    with open(seqList, 'r') as RADlist, open(out, 'w', newline='') as outfile:
        f = csv.writer(outfile, delimiter='\t')
        f.writerow(['Sample'] + types + ['TOTAL'] + ['%' + t for t in types])
        for sample in RADlist:
            sample = sample.strip()
            countrow = [counts[(t, sample)] for t in types]  # missing keys count as 0
            total = sum(countrow)
            if total == 0:  # skip samples with no hits, as the original did
                continue
            f.writerow([sample] + countrow + [total] + [c / total for c in countrow])

Method 2

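Another way to speed up the code is parallel processing, which uses multiple CPU cores to handle several tasks at the same time. It only pays off when each process works on a different part of the data and the cost of sending data between processes is small compared with the work done on it.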
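Splitting the work so that each process handles a different chunk, then merging the partial counts, can be sketched as below. This is an illustration under assumptions: `split_into_chunks`, `count_chunk` and `count_by_chunks` are hypothetical names, the keys are assumed to be already-parsed (type, sample) tuples, and the built-in `map` stands in for a process pool so the sketch runs anywhere.

```python
from collections import Counter
from itertools import islice


def split_into_chunks(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_chunk(keys):
    """Work for one worker: count only the keys in its own chunk."""
    return Counter(keys)


def count_by_chunks(keys, size):
    total = Counter()
    # Built-in map keeps this sketch runnable anywhere; a process pool's
    # map(count_chunk, ...) would run each chunk in a separate process
    for partial in map(count_chunk, split_into_chunks(keys, size)):
        total.update(partial)  # Counter.update adds counts, it does not replace
    return total


keys = [("U", "s1"), ("R", "s2"), ("U", "s1"), ("NM", "s3"), ("U", "s2")]
merged = count_by_chunks(keys, size=2)
print(merged == Counter(keys))  # True
```

To actually use several cores, `count_chunk` can be passed to `ProcessPoolExecutor().map` in place of `map`, called from inside an `if __name__ == "__main__":` block; the merge step stays the same.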

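from concurrent.futures import ProcessPoolExecutor
from collections import Counter
from itertools import islice
import csv


# Count one chunk of lines; this runs inside a worker process
def count_chunk(lines):
    counts = Counter()
    for line in lines:
        l1 = line.strip().split("\t")
        if len(l1) < 5:
            continue
        l3 = l1[0].split(";")[0].replace(">", "")
        counts[(l1[4], l3)] += 1
    return counts


# Split the input file into chunks of `size` lines so that each worker
# counts a different part of the file
def read_chunks(path, size=100000):
    with open(path, 'r') as inputFile:
        while True:
            chunk = list(islice(inputFile, size))
            if not chunk:
                break
            yield chunk


def novo(infile, seqList, out):
    # Count chunks in parallel, then merge the per-chunk Counters
    # (executor.map submits every chunk up front, so memory use still
    # grows with the file size)
    counts = Counter()
    with ProcessPoolExecutor() as executor:
        for partial in executor.map(count_chunk, read_chunks(infile)):
            counts.update(partial)

    # Produce output
    types = ['R', 'U', 'NM']
    with open(seqList, 'r') as RADlist, open(out, 'w', newline='') as outfile:
        f = csv.writer(outfile, delimiter='\t')
        f.writerow(['Sample'] + types + ['TOTAL'] + ['%' + t for t in types])
        for sample in RADlist:
            sample = sample.strip()
            countrow = [counts[(t, sample)] for t in types]
            total = sum(countrow)
            if total == 0:
                continue
            f.writerow([sample] + countrow + [total] + [c / total for c in countrow])

# Call novo(...) from inside an `if __name__ == "__main__":` block, which
# ProcessPoolExecutor needs on platforms that start workers with "spawn"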

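With these methods, code like this can run considerably faster. Method 1 removes the main bottleneck, the per-line scan over all sample keys; whether Method 2 adds a further gain depends on the file size and the number of CPU cores.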

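Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.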

python
