News Classification: Text Classification

Size: 54.35MB

File type: .rar

Published: 2022-12-14
Language: Other
Tags: CNN RNN

Resource overview

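Uses two deep-learning models, a CNN and an RNN, to classify news articles and predict their category. Intended only as practice material for beginners.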

Code snippet and file information

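# coding: utf-8

import tensorflow as tf  # uses the TensorFlow 1.x API (tf.placeholder, tf.layers, tf.contrib)


class TCNNConfig(object):
    """CNN configuration parameters."""

    embedding_dim = 64  # each character is embedded as a 64-dimensional vector
    seq_length = 600  # sequence length: each sentence is 600 characters long
    num_classes = 10  # number of classes
    num_filters = 256  # number of convolution filters
    kernel_size = 5  # convolution kernel size
    vocab_size = 5000  # vocabulary size (number of distinct characters)

    hidden_dim = 128  # number of units in the fully connected layer

    dropout_keep_prob = 0.5  # dropout keep probability
    learning_rate = 1e-3  # learning rate (0.001)

    batch_size = 64  # number of samples per training batch
    num_epochs = 10000  # total number of training epochs

    print_per_batch = 100  # print results every this many batches
    save_per_batch = 10  # write to TensorBoard every this many batches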


class TextCNN(object):
    """Text classification with a CNN model."""

    def __init__(self, config):
        self.config = config

        # The three inputs that must be fed to the graph
        self.input_x = tf.placeholder(tf.int32, [None, self.config.seq_length], name='input_x')  # character ids, shape = (number of sentences, 600)
        self.input_y = tf.placeholder(tf.float32, [None, self.config.num_classes], name='input_y')  # one-hot class labels, shape = (number of sentences, 10)
        self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')  # dropout keep probability

        self.cnn()

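    def cnn(self):
        """Build the CNN model."""
        # Character embedding lookup
        with tf.device('/cpu:0'):  # embedding table: 5000 rows x 64 columns, one row per character
            embedding = tf.get_variable('embedding', [self.config.vocab_size, self.config.embedding_dim])  # shape = (5000, 64)
            embedding_inputs = tf.nn.embedding_lookup(embedding, self.input_x)  # picks the rows indexed by input_x; shape = (number of sentences, 600, 64)

        with tf.name_scope("cnn"):
            # CNN layer; embedding_inputs is a 3-D tensor
            conv = tf.layers.conv1d(embedding_inputs, self.config.num_filters, self.config.kernel_size, name='conv')
            # 1-D convolution with 256 filters of width 5: input (?, 600, 64), output (?, 596, 256), since (600 - 5) / 1 + 1 = 596
            # Global max pooling: reduce_max takes the maximum along the given axis. For a batch of 64 sentences,
            # conv holds 596 positions per sentence and 256 features per position (one per width-5 filter).
            gmp = tf.reduce_max(conv, reduction_indices=[1], name='gmp')  # max over axis 1 (positions); shape = (?, 256). reduction_indices is the deprecated name for axis; with no axis given, reduce_max reduces over all dimensions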

        with tf.name_scope("score"):
            # Fully connected layer followed by dropout and ReLU; gmp is the input, hidden_dim the output size
            fc = tf.layers.dense(gmp, self.config.hidden_dim, name='fc1')  # shape = (?, 128), i.e. (64, 128) for a batch of 64
            fc = tf.contrib.layers.dropout(fc, self.keep_prob)  # shape = (?, 128)
            fc = tf.nn.relu(fc)  # shape = (?, 128)

            # Classifier
            self.logits = tf.layers.dense(fc, self.config.num_classes, name='fc2')  # shape = (?, 10)
            self.y_pred_cls = tf.argmax(tf.nn.softmax(self.logits), 1)  # predicted class, shape = (?); argmax along axis 1, i.e. over the 10 class scores of each row

        with tf.name_scope("optimize"):
            # Loss function: cross-entropy
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=self.input_y)  # shape = (?)
            self.loss = tf.reduce_mean(cross_entropy)  # scalar, shape = ()
            # Optimizer
            self.optim = tf.train.AdamOptimizer(learning_rate=self.config.learning_rate).minimize(self.loss)

        with tf.name_scope("accuracy"):
            # Accuracy
            correct_pred = tf.equal(tf.argmax(self.input_y, 1), self.y_pred_cls)  # shape = (?)
            self.acc = tf.reduce_mean(tf.cast(correct_pred, tf.float32))  # scalar, shape = ()
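
The shape comments in the cnn scope above can be checked by hand. The following is a minimal sketch in plain Python, not part of the archive: the helper names conv1d_valid_length and global_max_pool are hypothetical, and it assumes the conv1d defaults of stride 1 and no padding.

```python
def conv1d_valid_length(seq_length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding ('valid')."""
    return (seq_length - kernel_size) // stride + 1


def global_max_pool(feature_map):
    """Max over the position axis: (positions, channels) -> (channels,)."""
    return [max(channel) for channel in zip(*feature_map)]


# Values taken from TCNNConfig in the snippet above.
seq_length, kernel_size = 600, 5
conv_length = conv1d_valid_length(seq_length, kernel_size)
print(conv_length)

# Toy feature map with 3 positions and 2 channels (made-up numbers).
toy_feature_map = [[1.0, 4.0], [3.0, 2.0], [0.5, 5.0]]
pooled = global_max_pool(toy_feature_map)
print(pooled)
```

Pooling over positions is what makes the vector fed to the dense layer independent of where in the sentence a filter fired: each of the 256 filters contributes only its strongest response.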

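The optimize and accuracy scopes above combine a softmax cross-entropy loss with an argmax comparison. As a rough illustration only, here is a plain-Python sketch of the same arithmetic on a made-up batch of two samples and three classes; the function names and the numbers are assumptions for this example, not code from the archive.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up batch: 2 samples, 3 classes.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # counterpart of self.loss

preds = [argmax(row) for row in batch_logits]  # counterpart of self.y_pred_cls
truth = [argmax(row) for row in batch_labels]
accuracy = sum(p == t for p, t in zip(preds, truth)) / len(preds)  # counterpart of self.acc
print(mean_loss, preds, accuracy)
```

The second sample is deliberately misclassified, so it dominates the mean loss and halves the accuracy.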
Attr            Size  Date       Time   Name
----------- ---------  ---------- -----  ----
     File         96  2018-04-25 15:32  text-classification-cnn-rnn\.gitignore
     File    6127232  2018-08-30 10:55  text-classification-cnn-rnn\checkpoints\textcnn\best_validation.data-00000-of-00001
     File       1554  2018-08-30 10:55  text-classification-cnn-rnn\checkpoints\textcnn\best_validation.index
     File     474291  2018-08-30 10:55  text-classification-cnn-rnn\checkpoints\textcnn\best_validation.meta
     File         87  2018-08-30 10:55  text-classification-cnn-rnn\checkpoints\textcnn\checkpoint
     File       3518  2018-08-29 18:38  text-classification-cnn-rnn\cnn_model.py
     File   27508648  2018-04-30 23:12  text-classification-cnn-rnn\data\cnews\cnews.test.txt
     File  130089129  2018-04-30 23:13  text-classification-cnn-rnn\data\cnews\cnews.train.txt
     File   11788178  2018-04-30 23:15  text-classification-cnn-rnn\data\cnews\cnews.val.txt
     File      24784  2018-08-29 13:59  text-classification-cnn-rnn\data\cnews\cnews.vocab.txt
     File      19782  2018-04-30 23:12  text-classification-cnn-rnn\data\cnews\cnews.vocab1.txt
     File       5437  2018-08-29 17:02  text-classification-cnn-rnn\data\cnews_loader.py
     File          0  2018-04-25 15:32  text-classification-cnn-rnn\data\__init__.py
     File       4167  2018-08-29 17:05  text-classification-cnn-rnn\data\__pycache__\cnews_loader.cpython-36.pyc
     File        150  2018-05-07 23:57  text-classification-cnn-rnn\data\__pycache__\__init__.cpython-36.pyc
     File       1745  2018-04-25 15:32  text-classification-cnn-rnn\helper\cnews_group.py
     File        440  2018-04-25 15:32  text-classification-cnn-rnn\helper\copy_data.sh
     File          0  2018-04-25 15:32  text-classification-cnn-rnn\helper\__init__.py
     File     185646  2018-04-25 15:32  text-classification-cnn-rnn\images\acc_loss.png
     File     137770  2018-04-25 15:32  text-classification-cnn-rnn\images\acc_loss_rnn.png
     File      60095  2018-04-25 15:32  text-classification-cnn-rnn\images\cnn_architecture.png
     File      57085  2018-04-25 15:32  text-classification-cnn-rnn\images\rnn_architecture.png
     File        586  2018-05-12 23:29  text-classification-cnn-rnn\New Project #2.wpr
     File      38443  2018-07-14 01:10  text-classification-cnn-rnn\New Project #2.wpu
     File       1890  2018-04-25 15:32  text-classification-cnn-rnn\predict.py
     File         24  2018-04-25 15:32  text-classification-cnn-rnn\requirements.txt
     File       4344  2018-08-31 18:05  text-classification-cnn-rnn\rnn_model.py
     File       7396  2018-08-30 09:48  text-classification-cnn-rnn\run_cnn.py
     File      81109  2018-08-29 16:51  text-classification-cnn-rnn\tensorboard\textcnn\events.out.tfevents.1535532668.LAPTOP-1V3PUQTB
     File     100376  2018-08-29 17:29  text-classification-cnn-rnn\tensorboard\textcnn\events.out.tfevents.1535533727.LAPTOP-1V3PUQTB

............ (17 more file entries omitted here)
