Machine Learning Letter Classification - Python

Size: 210KB

File type: .rar

Published: 2023-08-04
Language: Python
Tags: machine learning, letter classification, python, Data


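Resource Description

Open http://archive.ics.uci.edu/ml/, click "view all data sets" to open the full list of data sets, click "Instances" to sort them by number of instances in descending order, and pick those whose task is Classification. Our group finally chose the "Letter Recognition Data Set".

II. Data Analysis

The Letter Recognition data set contains 20,000 instances, each described by 16 features, and every feature takes an integer value. It was donated on January 1, 1991 and is mainly used for classification experiments. The goal is to identify which of the 26 letters of the English alphabet is shown in a rectangular image made of black and white pixels. The images are based on 20 different fonts and were randomly distorted to produce 20,000 simulated instances. Each instance was converted into 16 primitive numerical features; 10,000 instances are used for training and the other 10,000 for letter prediction. Because every sample carries an explicit class label, this is a supervised learning task.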
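The record layout described above (one class letter plus 16 integer features) and the 10,000/10,000 split can be sketched as follows. This is only an illustration: `parse_letter_line` and `split_half` are hypothetical helper names, the comma separator and label-first field order are assumptions based on the UCI file layout, and the sample rows are made up rather than taken from the data set.

```python
def parse_letter_line(line, sep=","):
    """Split one record into (label, features); the label is assumed to be the first field."""
    fields = line.strip().split(sep)
    label = fields[0]
    features = [int(value) for value in fields[1:]]
    return label, features


def split_half(records):
    """Return (first half, second half), mirroring a 10,000/10,000 split of 20,000 rows."""
    mid = len(records) // 2
    return records[:mid], records[mid:]


# Made-up rows in the assumed layout: a letter followed by 16 integers.
sample_lines = [
    "A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0",
    "B,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1",
    "C,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3",
    "D,9,8,7,6,5,4,3,2,1,0,1,2,3,4,5,6",
]

records = [parse_letter_line(line) for line in sample_lines]
train_set, test_set = split_half(records)
print(len(train_set), len(test_set))
```

On the real data the same split would yield two lists of 10,000 records each, which is consistent with the two similarly sized text files listed in the archive.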


Code Snippet and File Information

from numpy import *
import string

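# Parse a data file: the first field of each line is the class label,
# the remaining fields are integer features.
# The ',' separator is assumed: the quoted character was lost in the
# source listing, and the UCI file is comma-separated.
def loadDataSet(filename):
    # number of fields per line (label + features), read from the first line
    with open(filename) as fr:
        numFeat = len(fr.readline().split(','))
    dataMat = []
    labelMat = []
    with open(filename) as fr:
        for line in fr:
            curLine = line.strip('\n').split(',')
            lineArr = [int(curLine[i]) for i in range(1, numFeat)]
            dataMat.append(lineArr)
            labelMat.append(curLine[0])
    return dataMat, labelMat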
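'''
purpose: classify data by comparing one feature to a threshold
'''
def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    retArray = ones((shape(dataMatrix)[0], 1))
    # The comparison expressions were lost in the source listing; the usual
    # stump rule is assumed: samples on the chosen side of the threshold get -1.
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray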
'''
purpose: build a single-level decision tree (decision stump, the weak classifier)
input:  dataArr: data set; classLabels: class labels; D: sample weights
output: bestStump: the stump with the lowest weighted error rate
        minError: that lowest weighted error rate
        bestClasEst: the class labels predicted by the best stump
'''
def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr)
    labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0
    # dictionary that stores the best stump found so far
    bestStump = {}
    bestClasEst = mat(zeros((m, 1)))
    minError = inf  # initialise the error to +infinity
    for i in range(n):  # loop over all dimensions
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):  # loop over thresholds in the current dimension
            for inequal in ['lt', 'gt']:  # try both "less than" and "greater than"
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))  # 1 marks a misclassified sample
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T * errArr  # total error weighted by D
                # print("split: dim %d, thresh %.2f, thresh inequal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError))
                if weightedError < minError:  # keep the stump with the lowest weighted error
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClasEst

# purpose: the complete AdaBoost algorithm
# input parameters:
#   dataArr: data set
#   classLabels: class labels
#   numIt: number of iterations (the only parameter the user must specify)
# output parameters:
#   weakClassArr: seve
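The listing breaks off at the header of the AdaBoost routine, so the author's training loop is not shown. As a rough sketch of what a single boosting round usually looks like in textbook AdaBoost with labels of +1 and -1 (this is an assumption about the approach, not a reconstruction of the missing code), the example below applies a stump to toy data and reweights the samples. `stump_predict` and `boosting_round` are hypothetical names, the toy data is invented, and plain Python lists are used instead of NumPy to keep the example self-contained.

```python
import math


def stump_predict(rows, dim, thresh, ineq):
    """Return +1.0 or -1.0 per row: rows on the 'ineq' side of thresh get -1.0."""
    preds = []
    for row in rows:
        if ineq == "lt":
            on_side = row[dim] <= thresh
        else:
            on_side = row[dim] > thresh
        preds.append(-1.0 if on_side else 1.0)
    return preds


def boosting_round(preds, labels, weights):
    """One textbook AdaBoost update; returns (alpha, normalised new weights)."""
    # weighted error: total weight of the misclassified samples
    err = sum(w for p, y, w in zip(preds, labels, weights) if p != y)
    err = max(err, 1e-16)  # guard against division by zero for a perfect stump
    alpha = 0.5 * math.log((1.0 - err) / err)
    # correct samples are scaled by exp(-alpha), wrong ones by exp(+alpha)
    raw = [w * math.exp(-alpha * y * p) for p, y, w in zip(preds, labels, weights)]
    total = sum(raw)
    return alpha, [r / total for r in raw]


# Toy one-feature data set (invented for illustration).
rows = [[1], [2], [3], [4]]
labels = [-1.0, -1.0, 1.0, -1.0]
weights = [0.25] * 4  # start from uniform weights

preds = stump_predict(rows, dim=0, thresh=2, ineq="lt")
alpha, new_weights = boosting_round(preds, labels, weights)
print(alpha, new_weights)
```

In this toy run only the last sample is misclassified, so its weight grows while the others shrink, which is what pushes the next stump to focus on it. Handling 26 letters would need some multi-class scheme on top of this binary step; the source does not show which one the author used.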

Attribute        Size     Date       Time   Name
----------- ---------  ---------- -----  ----
     File      356180  2016-11-24 20:38  traindata.txt
     File        7150  2016-11-26 22:02  TreeAdaBoost.py
     File       36042  2017-03-18 09:31  文档.docx
     File      356383  2016-11-24 20:39  testdata.txt
----------- ---------  ---------- -----  ----
               755755                    4 files
