Resource overview
Titanic dataset from the Kaggle platform, plus source code with comments

Code snippet and file information
```python
import numpy as np
import pandas as pd
# The snippet uses the TensorFlow 1.x graph API; on TensorFlow 2.x it runs
# through the compat.v1 module with v2 behavior disabled.
import tensorflow.compat.v1 as tf
from sklearn.model_selection import train_test_split

tf.disable_v2_behavior()

################################
# Preparing Data
################################
# read data from file
data = pd.read_csv('train.csv')
# fill NaN values (e.g. missing Age) with 0
data = data.fillna(0)
# convert ['male', 'female'] values of Sex to [1, 0]
data['Sex'] = data['Sex'].apply(lambda s: 1 if s == 'male' else 0)
# 'Survived' is the label of one class;
# add 'Deceased' as the other class so each label is one-hot encoded
data['Deceased'] = data['Survived'].apply(lambda s: 1 - s)
# select features and labels for training
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
dataset_X = data[['Sex', 'Age', 'Pclass', 'SibSp', 'Parch', 'Fare']].to_numpy()
dataset_Y = data[['Deceased', 'Survived']].to_numpy()
# split into training data (80%) and validation data (20%)
X_train, X_val, y_train, y_val = train_test_split(dataset_X, dataset_Y,
                                                  test_size=0.2,
                                                  random_state=42)
```
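The preparation steps above (fill missing values with 0, encode Sex as 1/0, build the two-column label [Deceased, Survived], hold out 20% for validation) can be sketched without pandas or scikit-learn. This is a minimal illustration, assuming a CSV with the same column names as train.csv; the three rows in `SAMPLE_CSV` are hypothetical, and the helper names `to_number`, `prepare` and `split` are made up for this sketch. The shuffle uses Python's `random` module, so it will not pick the same rows as `train_test_split(..., random_state=42)`.

```python
import csv
import io
import math
import random

# Hypothetical rows using a subset of train.csv's column names.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare
1,0,3,male,22,1,0,7.25
2,1,1,female,38,1,0,71.2833
3,1,3,female,,0,0,7.925
"""

FEATURES = ['Sex', 'Age', 'Pclass', 'SibSp', 'Parch', 'Fare']


def to_number(cell):
    """Mimic data.fillna(0): an empty cell becomes 0.0."""
    return float(cell) if cell != '' else 0.0


def prepare(text):
    """Return (features, labels); each label is one-hot [Deceased, Survived]."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        row['Sex'] = '1' if row['Sex'] == 'male' else '0'  # male -> 1, female -> 0
        features.append([to_number(row[name]) for name in FEATURES])
        survived = int(row['Survived'])
        labels.append([1 - survived, survived])
    return features, labels


def split(features, labels, test_size=0.2, seed=42):
    """Shuffle, then hold out ceil(test_size * n) rows for validation.

    scikit-learn also rounds the test size up, but its shuffle differs,
    so the chosen rows will not match train_test_split(random_state=42).
    """
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_val = math.ceil(test_size * len(features))
    val, train = order[:n_val], order[n_val:]
    return ([features[i] for i in train], [features[i] for i in val],
            [labels[i] for i in train], [labels[i] for i in val])


features, labels = prepare(SAMPLE_CSV)
print(features[2])  # [0.0, 0.0, 3.0, 0.0, 0.0, 7.925]: missing Age became 0.0
print(labels)       # [[1, 0], [0, 1], [0, 1]]
X_train, X_val, y_train, y_val = split(features, labels)
print(len(X_train), len(X_val))  # 2 1
```

Encoding the label as two columns rather than one is what lets the model use a 2-way softmax and compare `argmax` indices for accuracy.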
```python
################################
# Constructing Dataflow Graph
################################
# create symbolic variables (placeholders) for a batch of features and labels
X = tf.placeholder(tf.float32, shape=[None, 6])
y = tf.placeholder(tf.float32, shape=[None, 2])
# weights and bias are the variables to be trained
weights = tf.Variable(tf.random_normal([6, 2]), name='weights')
bias = tf.Variable(tf.zeros([2]), name='bias')
y_pred = tf.nn.softmax(tf.matmul(X, weights) + bias)
# Minimize cost using cross entropy
# NOTE: add an epsilon (1e-10) when calculating log(y_pred),
# otherwise log(0) gives -inf
cross_entropy = -tf.reduce_sum(y * tf.log(y_pred + 1e-10), axis=1)
cost = tf.reduce_mean(cross_entropy)
# use a gradient descent optimizer (learning rate 0.001) to minimize cost
train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
# calculate accuracy: fraction of samples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_pred, 1))
acc_op = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```
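The graph computes a softmax over two logits, a cross-entropy with a 1e-10 epsilon, and accuracy by comparing `argmax` indices. The same arithmetic for a handful of samples can be written in plain Python to show why the epsilon matters: when a predicted probability is exactly 0, the log is undefined (`math.log(0.0)` raises `ValueError`, while TensorFlow returns -inf), whereas log(0 + 1e-10) is about -23.03. This is an illustrative sketch, not TensorFlow's implementation; the logits and predictions below are made up.

```python
import math

EPSILON = 1e-10  # the same constant the graph adds before taking the log


def softmax(logits):
    """Softmax with the max logit subtracted first to avoid overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label, probs, eps=EPSILON):
    """-sum(y * log(p + eps)) for one one-hot label, mirroring cross_entropy."""
    return -sum(y * math.log(p + eps) for y, p in zip(label, probs))


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(labels, predictions):
    """Fraction of rows whose predicted class matches the label, like acc_op."""
    hits = sum(argmax(y) == argmax(p) for y, p in zip(labels, predictions))
    return hits / len(labels)


probs = softmax([2.0, 0.5])  # hypothetical logits for [Deceased, Survived]
print(probs)
print(cross_entropy([1, 0], probs))       # small: the label matches the likelier class
print(cross_entropy([0, 1], [1.0, 0.0]))  # about 23.03; without eps, math.log(0.0) raises ValueError
print(accuracy([[1, 0], [0, 1]], [[0.9, 0.1], [0.6, 0.4]]))  # 0.5
```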
```python
################################
# Training and Evaluating the model
################################
# use a session to run the calculation
with tf.Session() as sess:
    # variables have to be initialized before anything else runs
    tf.global_variables_initializer().run()
    # training loop
    for epoch in range(10):
        total_loss = 0.
        for i in range(len(X_train)):
            # prepare feed data for a single sample and run one training step
            feed_dict = {X: [X_train[i]], y: [y_train[i]]}
            _, loss = sess.run([train_op, cost], feed_dict=feed_dict)
            # debug output (disabled): per-sample input, index and prediction
            # print(X_train[i])
            # print('number:' + str(i))
            # print(sess.run(y_pred, feed_dict=feed_dict))
            total_loss += loss
        # display loss per epoch
        print('Epoch: %04d, total loss=%.9f' % (epoch + 1, total_loss))
    # accuracy on the validation set, calculated by TensorFlow
    accuracy = sess.run(acc_op, feed_dict={X: X_val, y: y_val})
    print('Accuracy on validation set: %.9f' % accuracy)
```
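Each `sess.run([train_op, cost], ...)` call above feeds a single passenger, so training is per-sample stochastic gradient descent with learning rate 0.001. For softmax plus cross-entropy, the gradient of the loss with respect to the logits is `p - y`, so one step applies `weights[j][k] -= lr * x[j] * (p[k] - y[k])` and `bias[k] -= lr * (p[k] - y[k])`. Below is a minimal pure-Python sketch of that update, assuming the 1e-10 epsilon is ignored in the gradient and using zero-initialized weights and one hypothetical passenger instead of `tf.random_normal` and the real data; `sgd_epoch` is a name made up for this sketch.

```python
import math


def softmax(logits):
    """Softmax with the max logit subtracted first to avoid overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_epoch(samples, labels, weights, bias, lr=0.001):
    """One pass of per-sample gradient descent on softmax cross-entropy.

    weights: 6x2 list of lists (features x classes); bias: 2 floats.
    Both are updated in place. Returns the summed loss, like total_loss.
    The gradient uses p - y, i.e. it ignores the 1e-10 epsilon.
    """
    total_loss = 0.0
    for x, y in zip(samples, labels):
        logits = [sum(x[j] * weights[j][k] for j in range(len(x))) + bias[k]
                  for k in range(len(bias))]
        p = softmax(logits)
        total_loss += -sum(yk * math.log(pk + 1e-10) for yk, pk in zip(y, p))
        delta = [pk - yk for pk, yk in zip(p, y)]  # d(loss)/d(logits)
        for j in range(len(x)):
            for k in range(len(bias)):
                weights[j][k] -= lr * x[j] * delta[k]
        for k in range(len(bias)):
            bias[k] -= lr * delta[k]
    return total_loss


# Hypothetical passenger [Sex, Age, Pclass, SibSp, Parch, Fare], label [Deceased, Survived]
sample, label = [1, 22, 3, 1, 0, 7.25], [1, 0]
weights = [[0.0, 0.0] for _ in range(6)]  # zeros instead of tf.random_normal
bias = [0.0, 0.0]
first = sgd_epoch([sample], [label], weights, bias)   # loss before any update
second = sgd_epoch([sample], [label], weights, bias)  # loss after one update
print(first, second)  # the loss on this sample drops after one step
```

Because the features are fed unscaled, the size of each step on the logits grows with the squared magnitude of the input (for example a large `Fare`), which is worth keeping in mind when reading the per-epoch loss.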
Type        Size (bytes)  Date        Time   Name
----------  ------------  ----------  -----  ----
File        26694         2018-04-27  11:42  Titannic\Taitan_onehot.csv
File        3914          2018-04-27  12:41  Titannic\example1.py
File        5864          2018-04-25  22:00  Titannic\example2.py
File        3258          2018-04-25  16:32  Titannic\gender_submission.csv
File        28629         2018-04-25  16:32  Titannic\test.csv
File        61194         2018-04-25  16:32  Titannic\train.csv
Directory   0             2018-04-27  13:18  Titannic\