A complete text-classification project using a CNN (TensorFlow)

Size: 464 KB

File type: .zip

Release date: 2021-06-13
Language: Other
Tags: TensorFlow, CNN, deep learning

Description

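Complete project code, taken from a website, for a text-classification task implemented with TensorFlow.
It includes all the code for training, running, and evaluating the model.
The archive also contains a corpus of labeled movie reviews (data\rt-polaritydata).
Works out of the box.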

Code snippet and file information

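import re

import numpy as np


def clean_str(string):
    """
    Tokenization/string cleaning for all datasets except for SST.
    Original taken from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py
    """
    # Replace every character other than letters, digits and ( ) , ! ? ' ` with a space
    string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
    # Split English clitics off their host word ("it's" -> "it 's", "don't" -> "do n't")
    string = re.sub(r"\'s", " 's", string)
    string = re.sub(r"\'ve", " 've", string)
    string = re.sub(r"n\'t", " n't", string)
    string = re.sub(r"\'re", " 're", string)
    string = re.sub(r"\'d", " 'd", string)
    string = re.sub(r"\'ll", " 'll", string)
    # Surround punctuation with spaces so each mark becomes its own token.
    # Plain " ( ", " ) ", " ? " are used as replacements: the escaped " \( " form
    # would leave a literal backslash in the output.
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    string = re.sub(r"\(", " ( ", string)
    string = re.sub(r"\)", " ) ", string)
    string = re.sub(r"\?", " ? ", string)
    # Collapse runs of two or more whitespace characters into a single space
    string = re.sub(r"\s{2,}", " ", string)
    return string.strip().lower()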
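clean_str only normalizes text; a CNN needs each sentence as a fixed-length row of integer word ids. The following is a minimal sketch of that step in plain Python/NumPy, not necessarily what train.py in this archive does. The helper names build_vocab and texts_to_ids, reserving id 0 for padding and unknown words, and padding to the longest training sentence are all hypothetical choices.

```python
import numpy as np


def build_vocab(texts):
    """Hypothetical helper: assign ids to tokens in order of first appearance.

    Id 0 is reserved for padding and for words not seen during training.
    """
    vocab = {"<pad>": 0}
    for text in texts:
        for token in text.split():  # clean_str output is space-separated
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def texts_to_ids(texts, vocab, max_len):
    """Hypothetical helper: map sentences to a (len(texts), max_len) id matrix.

    Longer sentences are truncated; shorter ones are right-padded with 0.
    """
    ids = np.zeros((len(texts), max_len), dtype=np.int64)
    for row, text in enumerate(texts):
        for col, token in enumerate(text.split()[:max_len]):
            ids[row, col] = vocab.get(token, 0)  # unknown words map to 0
    return ids


train_texts = ["the movie was great !", "it is n't good"]
vocab = build_vocab(train_texts)
max_len = max(len(t.split()) for t in train_texts)
print(texts_to_ids(train_texts, vocab, max_len))
# [[1 2 3 4 5]
#  [6 7 8 9 0]]
print(texts_to_ids(["a great movie"], vocab, max_len))
# [[0 4 2 0 0]]
```

At evaluation time the same vocab and max_len have to be reused, so that each id refers to the same word it did during training.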


def load_data_and_labels(positive_data_file, negative_data_file):
    """
    Loads MR polarity data from files, splits the data into words and generates labels.
    Returns split sentences and labels.
    """
    # Load data from files (one example per line); `with` closes each file
    with open(positive_data_file, "r", encoding="utf-8") as f:
        positive_examples = [s.strip() for s in f]
    with open(negative_data_file, "r", encoding="utf-8") as f:
        negative_examples = [s.strip() for s in f]
    # Split by words: clean_str leaves the tokens separated by single spaces
    x_text = positive_examples + negative_examples
    x_text = [clean_str(sent) for sent in x_text]
    # Generate one-hot labels: [0, 1] = positive, [1, 0] = negative
    positive_labels = [[0, 1] for _ in positive_examples]
    negative_labels = [[1, 0] for _ in negative_examples]
    y = np.concatenate([positive_labels, negative_labels], 0)
    return [x_text, y]
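load_data_and_labels returns all positive examples followed by all negative ones, so the rows should be shuffled before a development set is held out; otherwise the hold-out would contain only negative reviews. A minimal sketch, assuming the features are already a NumPy array with one row per example; the helper name shuffle_and_split, the 10% default hold-out and the seed are hypothetical values, not taken from this project.

```python
import numpy as np


def shuffle_and_split(x, y, dev_fraction=0.1, seed=10):
    """Hypothetical helper: shuffle x and y with one shared permutation, then
    hold out the last dev_fraction of the rows as a development set."""
    x = np.asarray(x)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    x, y = x[perm], y[perm]  # same permutation keeps rows paired
    dev_size = max(1, int(dev_fraction * len(y)))
    return x[:-dev_size], x[-dev_size:], y[:-dev_size], y[-dev_size:]


# Labels in the same one-hot layout as load_data_and_labels:
# column 1 = positive, column 0 = negative.
positive_labels = [[0, 1] for _ in range(3)]
negative_labels = [[1, 0] for _ in range(2)]
y = np.concatenate([positive_labels, negative_labels], 0)
x = np.arange(5).reshape(5, 1)  # stand-in for the padded id matrix

x_train, x_dev, y_train, y_dev = shuffle_and_split(x, y, dev_fraction=0.2)
print(x_train.shape, x_dev.shape)  # (4, 1) (1, 1)
# argmax turns a one-hot row back into a class index: 1 = positive, 0 = negative
print(np.argmax(y, axis=1))  # [1 1 1 0 0]
```

The same argmax decoding applies to model predictions when computing accuracy against these labels.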


def batch_iter(data, batch_size, num_epochs, shuffle=True):
    """
    Generates a batch iterator for a dataset.
    """
    data = np.array(data)
    data_size = len(data)
    # Ceiling division: the last batch of each epoch may be smaller than batch_size
    num_batches_per_epoch = (data_size - 1) // batch_size + 1
    for epoch in range(num_epochs):
        # Shuffle the data at each epoch
        if shuffle:
            shuffle_indices = np.random.permutation(np.arange(data_size))
            shuffled_data = data[shuffle_indices]
        else:
            shuffled_data = data
        for batch_num in range(num_batches_per_epoch):
            start_index = batch_num * batch_size
            end_index = min((batch_num + 1) * batch_size, data_size)
            yield shuffled_data[start_index:end_index]
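batch_iter starts with np.array(data). If it is given a list of (sentence ids, label) tuples whose two parts have different lengths, that call builds a ragged array, which NumPy 1.24 and later reject with a ValueError unless dtype=object is passed. A minimal alternative sketch, assuming x and y are already NumPy arrays with one row per example: permute row indices once per epoch and slice x and y with the same indices, which keeps each sentence aligned with its label. The function name paired_batch_iter and the seed parameter are hypothetical.

```python
import numpy as np


def paired_batch_iter(x, y, batch_size, num_epochs, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs, reshuffling row indices every epoch.

    Hypothetical alternative to batch_iter: indices are permuted and x and y
    are sliced separately instead of packing (x, y) tuples into one array.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    data_size = len(x)
    # Ceiling division, as in batch_iter: the last batch may be smaller
    num_batches_per_epoch = (data_size - 1) // batch_size + 1
    rng = np.random.default_rng(seed)
    for _ in range(num_epochs):
        indices = rng.permutation(data_size) if shuffle else np.arange(data_size)
        for batch_num in range(num_batches_per_epoch):
            batch_idx = indices[batch_num * batch_size:(batch_num + 1) * batch_size]
            yield x[batch_idx], y[batch_idx]


x = np.arange(10).reshape(10, 1)
y = np.eye(2, dtype=int)[np.arange(10) % 2]  # alternating one-hot labels
sizes = [len(xb) for xb, _ in paired_batch_iter(x, y, batch_size=4, num_epochs=2, seed=0)]
print(sizes)  # [4, 4, 2, 4, 4, 2]
```

With 10 examples and batch_size=4, each epoch yields ceil(10 / 4) = 3 batches, and the last one holds the remaining 2 examples.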

Type   Size (bytes)  Date        Time   Name
-----  ------------  ----------  -----  ----
Dir               0  2019-12-12  11:22  data\
Dir               0  2018-07-21  13:36  data\rt-polaritydata\
File         612506  2018-07-21  13:36  data\rt-polaritydata\rt-polarity.neg
File         626395  2018-07-21  13:36  data\rt-polaritydata\rt-polarity.pos
File           2472  2018-07-21  13:36  data_helpers.py
File           3738  2018-07-21  13:36  eval.py
File           2280  2018-07-21  13:36  README.md
File           3776  2018-07-21  13:36  text_cnn.py
File           9073  2018-07-21  13:36  train.py
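text_cnn.py itself is not reproduced on this page. Assuming it follows the single-layer design of Yoon Kim's CNN_sentence (the repository credited in the clean_str docstring), a forward pass embeds the word ids, slides convolution filters of several widths over the sentence, applies ReLU and max-over-time pooling, concatenates the pooled features and feeds them to a softmax layer. The NumPy sketch below illustrates only that inference path with random weights; the function name, the shapes and the filter widths 3, 4, 5 are illustrative assumptions, not values read from this project.

```python
import numpy as np


def text_cnn_forward(token_ids, embedding, filters, biases, w_out, b_out):
    """Inference-time forward pass of a one-layer CNN sentence classifier.

    token_ids: (seq_len,) ints; embedding: (vocab_size, dim);
    filters[k]: (width_k, dim, num_filters); biases[k]: (num_filters,);
    w_out: (len(filters) * num_filters, num_classes); b_out: (num_classes,).
    Dropout is omitted because it is only applied during training.
    """
    emb = embedding[token_ids]  # (seq_len, dim)
    pooled = []
    for w, b in zip(filters, biases):
        width = w.shape[0]
        if len(token_ids) < width:
            raise ValueError("sentence is shorter than a filter width; pad it first")
        # Every window of `width` consecutive word vectors ("valid" convolution)
        windows = np.stack([emb[i:i + width] for i in range(len(token_ids) - width + 1)])
        conv = np.einsum("twd,wdf->tf", windows, w) + b  # (positions, num_filters)
        pooled.append(np.maximum(conv, 0.0).max(axis=0))  # ReLU, then max over time
    features = np.concatenate(pooled)  # (len(filters) * num_filters,)
    logits = features @ w_out + b_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
vocab_size, dim, num_filters, num_classes = 20, 8, 4, 2
filter_sizes = (3, 4, 5)  # illustrative widths
embedding = rng.normal(size=(vocab_size, dim))
filters = [rng.normal(size=(k, dim, num_filters)) for k in filter_sizes]
biases = [np.zeros(num_filters) for _ in filter_sizes]
w_out = rng.normal(size=(len(filter_sizes) * num_filters, num_classes))
b_out = np.zeros(num_classes)

probs = text_cnn_forward(np.array([1, 5, 7, 2, 9, 3]), embedding, filters, biases, w_out, b_out)
print(probs.shape)  # (2,)
```

Because max-over-time pooling reduces each filter to a single number, the feature vector always has len(filters) * num_filters entries whatever the sentence length. The sentence only needs to be at least as long as the widest filter, which padding to a fixed length guarantees.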
