Python data mining: classification, clustering, regression, and association algorithm code with samples

Size: 337KB

File type: .zip

Coins: 2

Downloads: 0

Published: 2021-05-09
Language: Python
Tags: clustering, classification, regression, association, python

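Resource description

The archive covers association-rule mining (Apriori), classification (BP, AdaBoost, KNN), clustering (k-means, k-medoids, CLARANS), and regression (linear regression). Each program ships with its own sample data and runs once the required packages are installed.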

Code snippet and file information

import numpy as np

def loadSimData():
    datMat = np.matrix([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat, classLabels

def stumpClassify(dataMatrix, dimen, thresholdValue, thresholdIneq):
    # Start with every sample labeled +1, then flip one side of the threshold to -1.
    returnArray = np.ones((np.shape(dataMatrix)[0], 1))
    if thresholdIneq == 'lt':
        returnArray[dataMatrix[:, dimen] <= thresholdValue] = -1
    else:
        returnArray[dataMatrix[:, dimen] > thresholdValue] = -1
    return returnArray
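
The thresholding rule in stumpClassify can be tried on the five sample points from loadSimData. The block below is a minimal sketch that uses plain NumPy arrays instead of np.matrix; the name stump_classify and the threshold 1.3 are illustrative choices, not part of the archive.

```python
import numpy as np

def stump_classify(data, dimen, threshold_value, threshold_ineq):
    """Label samples +1 or -1 by thresholding a single feature column."""
    data = np.asarray(data, dtype=float)
    predictions = np.ones(data.shape[0])
    if threshold_ineq == 'lt':
        # 'lt': samples at or below the threshold become -1.
        predictions[data[:, dimen] <= threshold_value] = -1.0
    else:
        # any other value: samples above the threshold become -1.
        predictions[data[:, dimen] > threshold_value] = -1.0
    return predictions

# The five sample points returned by loadSimData.
data = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])

# Split on feature 0 at 1.3, once in each direction.
lt_pred = stump_classify(data, 0, 1.3, 'lt')
gt_pred = stump_classify(data, 0, 1.3, 'gt')
print(lt_pred)
print(gt_pred)
```

The two directions give mirror-image labelings, which is why the stump search tries both 'lt' and 'gt' for every threshold.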

def buildStump(dataArray, classLabels, D):
    # np.asmatrix replaces the np.mat alias, which NumPy 2.0 removed.
    dataMatrix = np.asmatrix(dataArray)
    labelMat = np.asmatrix(classLabels).T
    m, n = np.shape(dataMatrix)
    stepNum = 10.0; bestStump = {}; bestClassEst = np.asmatrix(np.zeros((m, 1)))
    minError = np.inf
    for i in range(n):  # every feature
        rangeMin = dataMatrix[:, i].min(); rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / stepNum
        for j in range(-1, int(stepNum) + 1):  # every candidate threshold
            for thresholdIneq in ['lt', 'gt']:  # both split directions
                thresholdValue = rangeMin + float(j) * stepSize
                predictClass = stumpClassify(dataMatrix, i, thresholdValue, thresholdIneq)
                errArray = np.asmatrix(np.ones((m, 1)))
                errArray[predictClass == labelMat] = 0
                # Weighted error: total weight of the misclassified samples (scalar).
                weightError = float((D.T * errArray)[0, 0])
                # print("split: dim %d, thresh: %.2f, thresholdIneq: %s, weightError: %.3f" % (i, thresholdValue, thresholdIneq, weightError))
                if weightError < minError:
                    minError = weightError
                    bestClassEst = predictClass.copy()
                    bestStump['dimen'] = i
                    bestStump['thresholdValue'] = thresholdValue
                    bestStump['thresholdIneq'] = thresholdIneq
    return bestClassEst, minError, bestStump
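
buildStump ranks candidate stumps by weighted error, so the same stump can look good or bad depending on the weight vector D. The sketch below illustrates this with plain arrays; weighted_error and the skewed weight vector are hypothetical names and values chosen for the example.

```python
import numpy as np

def weighted_error(predictions, labels, weights):
    """Total weight of the samples whose prediction differs from the label."""
    miss = (predictions != labels).astype(float)
    return float(weights @ miss)

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
# Predictions of the stump "feature 0 <= 1.3 -> -1" on the sample data;
# only the first sample is misclassified.
pred = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])

uniform = np.full(5, 0.2)                                # every sample counts equally
skewed = np.array([0.5, 0.125, 0.125, 0.125, 0.125])     # first sample emphasized

err_uniform = weighted_error(pred, labels, uniform)
err_skewed = weighted_error(pred, labels, skewed)
print(err_uniform)  # 0.2
print(err_skewed)   # 0.5
```

Once the weight on the misclassified sample grows, this stump is no better than chance, which pushes the next search toward a different split.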

def adaBoostTrainDS(dataArray, classLabels, numIt=40):
    weakClass = []  # list of weak classifiers; stores each base classifier (bestStump)
    m, n = np.shape(dataArray)
    D = np.asmatrix(np.ones((m, 1)) / m)  # start from uniform sample weights
    aggClassEst = np.asmatrix(np.zeros((m, 1)))
    for i in range(numIt):
        print("i:", i)
        bestClassEst, minError, bestStump = buildStump(dataArray, classLabels, D)  # step 1: find the best decision stump
        print("D.T:", D.T)
        alpha = float(0.5 * np.log((1 - minError) / max(minError, 1e-16)))  # step 2: compute alpha (the max guards against division by zero)
        print("alpha:", alpha)
        bestStump['alpha'] = alpha
        weakClass.append(bestStump)  # step 3: add the base classifier to the list of weak classifiers
        print("classEst:", bestClassEst)
        expon = np.multiply(-1 * alpha * np.asmatrix(classLabels).T, bestClassEst)
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()  # step 4: update the weights; normalizing keeps D a probability distribution
        aggClassEst += alpha * bestClassEst  # step 5: update the aggregated class estimate
        print("aggClassEst:", aggClassEst.T)
        print(np.sign(aggClassEst) != np.asmatrix(classLabels).T)
        aggError = np.multiply(np.sign(aggClassEst) != np.asmatrix(classLabels).T, np.ones((m, 1)))
        print("aggError", aggError)
        aggErrorRate = aggError.sum() / m
        print("total error:", aggErrorRate)
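
Steps 2 and 4 of the loop above can be followed by hand for a single round. The sketch below assumes the first stump misclassifies only the first sample under uniform weights (error 0.2) and uses plain arrays; the variable names are illustrative.

```python
import numpy as np

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
pred = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])   # assumed stump output: one mistake
D = np.full(5, 0.2)                             # uniform starting weights

# Weighted error of the stump under D.
error = float(D @ (pred != labels).astype(float))

# Step 2: classifier weight; the max() guard avoids division by zero.
alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))

# Step 4: labels * pred is +1 for correct samples and -1 for mistakes,
# so correct samples shrink by exp(-alpha) and mistakes grow by exp(alpha).
D_new = D * np.exp(-alpha * labels * pred)
D_new /= D_new.sum()                            # renormalize to a distribution

print(alpha)
print(D_new)
```

With error 0.2, alpha equals 0.5 * ln(4) = ln(2), so the misclassified sample's weight doubles before normalization and ends at 0.5, while each correctly classified sample drops to 0.125.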

Attribute        Size     Date       Time   Name
----------- ---------  ---------- -----  ----
  Directory           0  2019-01-02 11:12  Adboost\
       File        3994  2019-01-02 11:12  Adboost\__init__.py
  Directory           0  2019-01-02 11:12  Apriori\
       File        3392  2019-01-02 11:12  Apriori\__init__.py
  Directory           0  2019-01-02 11:13  BP\
       File         136  2019-01-02 11:13  BP\demo.weights
       File        5434  2019-01-02 11:12  BP\__init__.py
  Directory           0  2019-01-02 10:54  Clarans\
       File        1188  2017-12-13 05:42  Clarans\.gitignore
  Directory           0  2019-01-02 10:33  Clarans\data\
       File       54220  2017-12-13 05:42  Clarans\data\clarans.png
       File       10804  2019-01-02 11:07  Clarans\data\data.txt
       File       14564  2019-01-02 11:09  Clarans\data\output.txt
       File       31859  2017-12-13 05:42  Clarans\data\sample_data.png
       File       35484  2017-12-13 05:42  Clarans\data\sample_output.png
       File      104990  2017-12-13 05:42  Clarans\data\sample_polygons_data.png
       File       90728  2017-12-13 05:42  Clarans\data\sample_polygons_output.png
       File        2497  2017-12-13 05:42  Clarans\generate_data.py
       File       10291  2017-12-13 05:42  Clarans\model.py
       File        5815  2017-12-13 05:42  Clarans\README.md
       File         139  2017-12-13 05:42  Clarans\requirements.txt
       File        1257  2019-01-02 10:54  Clarans\run_clarans.py
       File         492  2017-12-13 05:42  Clarans\show_data.py
       File        2756  2017-12-13 05:42  Clarans\utils.py
  Directory           0  2019-01-02 10:51  Clarans\__pycache__\
       File        6422  2019-01-02 10:51  Clarans\__pycache__\model.cpython-36.pyc
       File        4543  2019-01-02 10:51  Clarans\__pycache__\utils.cpython-36.pyc
  Directory           0  2019-01-02 11:15  Kmeans\
       File        4264  2019-01-02 11:04  Kmeans\kmeans.py
       File        1181  2019-01-02 11:15  Kmeans\test.py
       File        1755  2019-01-02 11:01  Kmeans\testSet.txt
............ 8 more file entries omitted here
