Resource overview
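This code generates plots and metrics for a dataset. The first part plots the correlation matrix of the data. The second part plots the missing values in the dataset. The third part reduces the data from many dimensions to two with PCA, clusters it, and shows the result as a 2-D scatter plot. The fourth part runs several classification algorithms on the dataset and outputs their classification accuracy: random forest, naive Bayes (MultinomialNB), decision tree, KNN, support vector machine, and a neural network (MLPClassifier). That is everything the code does. If you need plots for analyzing a dataset, this code is a good fit.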
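The snippet below is cut off before the classification step, so the fourth part is not shown. The imports of `cross_val_score` and `StratifiedKFold` suggest stratified k-fold cross-validation; what follows is a minimal sketch under that assumption, not the author's code. It scores the six classifier types listed above on scikit-learn's built-in iris data (a stand-in dataset only). The function name `compare_classifiers`, the 5 folds, and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, n_splits=5, seed=42):
    """Hypothetical helper: mean stratified k-fold accuracy per classifier."""
    models = {
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # MultinomialNB requires non-negative (count-like) features
        "MultinomialNB": MultinomialNB(),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "SVC": SVC(kernel="linear"),
        "MLP": MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 5),
                             max_iter=2000, random_state=1),
    }
    # Shuffled folds that keep the class proportions of y in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
            for name, model in models.items()}


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

`MultinomialNB` raises an error on negative feature values (for example after standardization); `GaussianNB` is the usual substitute in that case.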
Code snippet and file info
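# Note: if the data has missing values, fill them with NA first; the code only recognizes NA as a missing value.
# Keep the input dataset in CSV format.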
import pandas as pd
import numpy as np
import os
import os.path
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import seaborn as sns
import missingno as msno
from sklearn.decomposition import PCA
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
# max_features='sqrt' is what 'auto' meant for classifiers ('auto' was removed in scikit-learn 1.3;
# min_impurity_split was removed in 1.0)
rf = RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                            max_depth=30, max_features='sqrt', max_leaf_nodes=None,
                            min_impurity_decrease=0.0,
                            min_samples_leaf=1, min_samples_split=6,
                            min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=None,
                            oob_score=False, random_state=42, verbose=0,
                            warm_start=False)
NB = MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
tree = DecisionTreeClassifier(max_depth=30)
Knn = KNeighborsClassifier()
svc = SVC(gamma='auto', kernel='linear')
mlp = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 5), random_state=1)
def plot_make(dataset, name):
    # Pearson correlation matrix of the columns, drawn as a heatmap and saved as <name>.png
    ds_corr = dataset.corr(method='pearson', min_periods=1)
    f, ax = plt.subplots(figsize=(14, 10))
    sns.heatmap(ds_corr, cmap='RdBu', linewidths=0.05, ax=ax)
    ax.set_title('Correlation between features in ' + name)
    f.savefig(name + '.png', dpi=100, bbox_inches='tight')
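def Noise_found(dataset, name):
    dataset = np.array(dataset)
    X = np.delete(dataset, -1, axis=1)  # drop the last column (assumed to be the label)
    # y = dataset[:, -1]
    for i in range(X.shape[0]):  # rows
        for j in range(X.shape[1]):  # columns
            if X[i][j] == 'NA':
                X[i][j] = np.nan  # mark the cell as missing
    # preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
    from sklearn.impute import SimpleImputer
    imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')  # simple imputation first
    imp.fit(X)
    X = imp.transform(X)
    pca = PCA(n_components=2)
    reduced_X = pca.fit_transform(X)
    k1 = KMeans(n_clusters=2)  # split the points into 2 clusters
    k1.fit(reduced_X)
    kc1 = k1.clus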
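The listing above is cut off inside `Noise_found`, right after the KMeans fit. As a hedged sketch of the third part (not a reconstruction of the missing lines), the hypothetical function `pca_kmeans_scatter` below imputes missing values, projects the features to two dimensions with PCA, clusters them with KMeans, and saves a scatter plot colored by cluster with the centroids marked. It assumes every column except the last is a numeric feature. Note that `pandas.read_csv` already turns the string `NA` (and empty fields) into `NaN` by default, so the manual replacement loop is not needed when the data is loaded this way.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write to a file, open no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer


def pca_kmeans_scatter(df, name, n_clusters=2, out_dir=".", seed=42):
    """Hypothetical helper: impute, reduce to 2-D, cluster, save a scatter plot.

    Assumption: every column except the last is a numeric feature; the last
    column is treated as the label and dropped, as Noise_found does.
    """
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    # Replace each NaN with the most frequent value of its column
    X = SimpleImputer(missing_values=np.nan, strategy="most_frequent").fit_transform(X)
    reduced_X = PCA(n_components=2).fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(reduced_X)

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(reduced_X[:, 0], reduced_X[:, 1], c=km.labels_, s=15)
    ax.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
               c="red", marker="x", s=100, label="cluster centers")
    ax.set_title("PCA + KMeans clusters in " + name)
    ax.legend()
    fig.savefig(os.path.join(out_dir, name + "_pca_kmeans.png"),
                dpi=100, bbox_inches="tight")
    plt.close(fig)
    return reduced_X, km.labels_


if __name__ == "__main__":
    # read_csv parses the string "NA" as NaN by default
    csv = io.StringIO("a,b,c,label\n1,2,3,x\n2,NA,4,y\n8,9,NA,x\n9,8,7,y\n1,3,2,x\n")
    df = pd.read_csv(csv)
    reduced, labels = pca_kmeans_scatter(df, "demo", out_dir=tempfile.mkdtemp())
    print(reduced.shape, labels)
```

Fixing `random_state` and `n_init` makes the clustering reproducible between runs; the original `KMeans(n_clusters=2)` call sets neither.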
Related resources
- Python for data analysis (2nd edition, Chinese translation, code
- DataV.GeoAtlas nationwide GeoJSON data for provinces, cities, districts, and counties (JSON)
- Data Science from Scratch First Principles wit
- Learning Data Mining With Python book code and data
- Data Science Fundamentals for Python and Mongo
- 数据挖掘课程设计.rar (data mining course project)
- Introduction to Data Science - A Python Approa
- Original English edition - Bayesian Analysis with Python 1st
- Python Data Analysis Cookbook by Ivan Idris
- Learning Data Mining with Python - Second Edit
- Python for Everybody: Exploring Data in Python
- Problem Solving in Data Structures and Algorit
- Fundamentals of Python Data Structures (no watermark)
- Data Structures and Algorithms in Python (no watermark)
- Data Structures and Algorithms in Python (text version)
- problem-solving-with-algorithms-and-data-struc
- Problem Solving with Algorithms and DataStruct
- data-science-using-python-r
- Python for Data Analysis (2nd), Chinese, with bookmarks
- TBCNN source code
- Python for Data Analysis 2nd Edition, English, high resolution
- Python for Data Analysis, 2nd edition (English
- Learning IPython for Interactive Computing and
- Learning data mining with python (Chinese edition)
- Python for Data Analysis 2nd Edition (final version)
- Python for Data Analysis 2nd Edition.pdf
- metadata.txt - population_data.json
- Web Scraping with Python_Collecting Data from