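Resource overview
This script calculates the N50, the number of sequences, and the shortest and longest sequence lengths for the gene sequences in a FASTA file. Copy the script into the directory that contains the FASTA file and run it with: python cal_N50.py
When the prompt "Enter your fasta/fa name: " appears, type the name of a FASTA file in the current directory and press Enter.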
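As a rough illustration of the statistics described above, the following is a minimal sketch that reads record lengths from FASTA text using only the standard library. It is not the script distributed here, which relies on Biopython; the function name `fasta_lengths` and the two-record sample input are hypothetical and exist only for this example.

```python
import io


def fasta_lengths(handle):
    """Yield the length of each record in a FASTA-formatted text stream."""
    length = None  # stays None until the first '>' header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence lines may be wrapped, so add up every line of the record.
            length += len(line)
    if length is not None:
        yield length


# Hypothetical two-record input, used only for illustration.
demo = io.StringIO(">contig1\nACGTAC\nGT\n>contig2\nNNAC\n")
lengths = list(fasta_lengths(demo))

# Sequence count, shortest length, longest length.
print(len(lengths), min(lengths), max(lengths))
```

For real data, passing an open file object instead of the `io.StringIO` sample should behave the same way, assuming the file is plain-text FASTA.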
Code snippet and file information
# GC_N50.py
# Requires Python 3 and Biopython.
from Bio import SeqIO

print('Python and Biopython needed for running this script!')
print('script for calculating N50 of assembly')
fasta = input('Enter your fasta/fa name: ')

# Running totals: total bases, per-sequence lengths, and base counts
baseSum, Length = 0, []
ValueSum, N50 = 0, 0
no_c, no_g, no_a, no_t, no_n = 0, 0, 0, 0, 0

for record in SeqIO.parse(fasta, 'fasta'):
    baseSum += len(record.seq)
    Length.append(len(record.seq))
    seq = record.seq.lower()
    no_c += seq.count('c')
    no_g += seq.count('g')
    no_a += seq.count('a')
    no_t += seq.count('t')
    no_n += seq.count('n')

# N50 calculation: walk the lengths from longest to shortest until the
# running sum reaches half of the total number of bases
N50_pos = baseSum / 2.0
Length.sort(reverse=True)
for value in Length:
    ValueSum += value
    if N50_pos <= ValueSum:
        N50 = value
        break

print('Sequences NO.:' + '\t' + str(len(Length)))
print('Sequences Min.:' + '\t' + str(min(Length)))
print('Sequences Max.:' + '\t' + str(max(Length)))
print('N50: ' + str(N50))
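The N50 loop above can be checked by hand on a small set of lengths. The sketch below wraps the same idea in a function and also shows one way the base counters tallied by the script (which the snippet never prints) could be turned into a GC fraction. The function names `n50` and `gc_fraction`, the example numbers, and the choice to leave N bases out of the GC denominator are all assumptions made for this illustration, not part of the original script.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    half_total = sum(lengths) / 2.0
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0


def gc_fraction(no_g, no_c, no_a, no_t):
    """Return (G + C) / (A + C + G + T); N bases are excluded by assumption."""
    called = no_g + no_c + no_a + no_t
    return (no_g + no_c) / called if called else 0.0


# Hypothetical lengths: the total is 290, so the half-way point is 145.
# The running sum goes 80, then 150, which passes 145 at the 70-base sequence.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))

# Hypothetical base counts: 30 G, 20 C, 25 A, 25 T.
print(gc_fraction(30, 20, 25, 25))
```

Returning 0 for an empty input mirrors the script's initial value of N50; raising an error instead would be an equally reasonable choice.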
Attribute   Size       Date        Time   Name
----------- ---------  ----------  -----  ----
File        278        2020-11-17  08:48  使用方法.txt
Directory   0          2020-11-17  08:48  __MACOSX\
File        210        2020-11-17  08:48  __MACOSX\._使用方法.txt
File        885        2020-11-16  23:52  计算N50的python脚本.py
File        613        2020-11-16  23:52  __MACOSX\._计算N50的python脚本.py