CNN卷积神经网络实现语音识别.zip (speech recognition with a CNN, convolutional neural network)

Size: 5KB

File type: .zip

Published: 2021-06-04
Language: Python
Tags: cnn, speech recognition


Resource description

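Goal: implement speech recognition with a CNN (convolutional neural network).
Steps:
(1) Preprocessing. Trim the silence at the start and end of each recording so it does not interfere with later steps, then split the audio into frames; adjacent frames usually overlap.
(2) Feature extraction. Mel-frequency cepstral coefficients (MFCC) turn each frame's waveform into a multidimensional vector that carries the sound's information.
(3) CNN model training. With the features extracted, TensorFlow is used to build and train the model.
(4) Model validation.
Objective: classify the audio recordings; for example, if the data are recordings of spoken digits, the model should output the corresponding digit.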
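Step (1) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the listing's code (the snippet passes raw audio straight to python_speech_features' `mfcc`, which does its own framing). The helpers `trim_silence` and `frame_signal` are hypothetical, and the amplitude threshold `threshold_ratio` is an assumed stand-in for a real voice-activity detector.

```python
import numpy as np


def trim_silence(signal, threshold_ratio=0.05):
    """Cut leading and trailing samples quieter than threshold_ratio * peak amplitude.

    Hypothetical helper: a plain amplitude threshold, not a real voice-activity detector.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal
    peak = np.max(np.abs(signal))
    if peak == 0.0:
        return signal[:0]  # the whole clip is silence
    loud = np.flatnonzero(np.abs(signal) >= threshold_ratio * peak)
    return signal[loud[0]:loud[-1] + 1]


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, one frame every hop samples.

    Frames overlap whenever hop < frame_len; the tail is zero-padded so the last frame is full.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n = len(signal)
    if n <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((n - frame_len) / hop))
    padded_len = (num_frames - 1) * hop + frame_len
    padded = np.concatenate([signal, np.zeros(padded_len - n)])
    # Row k holds samples [k * hop, k * hop + frame_len)
    idx = hop * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx]
```

With 16 kHz audio, `frame_len=400` and `hop=160` give 25 ms frames every 10 ms, which match the defaults of python_speech_features' `mfcc` (`winlen=0.025`, `winstep=0.01`); one second of audio then yields 99 overlapping frames.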


Code snippet and file information


# -*- coding: utf-8 -*-

import os

import numpy as np
import scipy.io.wavfile as wav
import sklearn.preprocessing
import tensorflow as tf  # TensorFlow 1.x API (tf.truncated_normal)
from python_speech_features import mfcc, delta

path_film = os.path.abspath('.')
path = path_film + "/data/xunlian/"  # training set
test_path = path_film + "/data/test_data/"
isnot_test_path = path_film + "/data/isnot_test_path/"

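# One-hot encoding maps the values of a discrete feature into Euclidean space
# Global one-hot encoder (None means "create a new one")
label_binarizer = None


def def_one_hot(x):
    if label_binarizer is None:
        binarizer = sklearn.preprocessing.LabelBinarizer()
    else:
        binarizer = label_binarizer
    # Fit on every class 0..max(x) so each digit gets a fixed column
    # (with only two classes, LabelBinarizer returns a single 0/1 column)
    binarizer.fit(range(max(x) + 1))
    y = binarizer.transform(x)
    return y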

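def read_wav_path(path):
    """Read file locations: return the full paths and bare names of the files in `path`."""
    # List the directory once so both lists share one order and skip subdirectories
    names = sorted(x for x in os.listdir(path) if os.path.isfile(os.path.join(path, x)))
    map_path = [os.path.join(path, x) for x in names]
    map_relative = names
    return map_path, map_relative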

def def_wav_read_mfcc(file_name):
    """Get the MFCC coefficients of a .wav file (one row per frame)."""
    fs, audio = wav.read(file_name)
    processed_audio = mfcc(audio, samplerate=fs, nfft=512)
    return processed_audio

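def find_matrix_max_shape(audio):
    """Get the shape (height, width) to which every input matrix is padded."""
    h, l = 0, 0
    for a in audio:
        rows, cols = np.array(a).shape
        if rows > h:
            h = rows
        if cols > l:
            l = cols
    # The height is hard-coded to 700 frames instead of the measured maximum h,
    # presumably so training and test matrices share one shape
    return 700, l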


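def matrix_make_up(audio):
    """Zero-pad (or truncate) every MFCC matrix to the same h x l shape."""
    h, l = find_matrix_max_shape(audio)
    new_audio = []
    for aa in audio:
        # float32, not int8: int8 would truncate (and could overflow) the MFCC values
        zeros_matrix = np.zeros([h, l], np.float32)
        a, b = np.array(aa).shape
        a, b = min(a, h), min(b, l)  # clips longer than h frames are cut off
        zeros_matrix[:a, :b] = aa[:a, :b]
        new_audio.append(zeros_matrix)
    return new_audio, h, l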

def read_wav_matrix(path):
    map_path, map_relative = read_wav_path(path)
    audio = []
    labels = []
    for idx, folder in enumerate(map_path):
        processed_audio_delta = def_wav_read_mfcc(folder)
        audio.append(processed_audio_delta)
        # The label is the number before the first "_" in the file name: "<label>_<rest>.wav"
        labels.append(int(map_relative[idx].split(".")[0].split("_")[0]))

    x_data, h, l = matrix_make_up(audio)
    x_data = np.array(x_data)
    # One-hot encoding of the label of every recording in the folder
    x_label = np.array(def_one_hot(labels))
    return x_data, x_label, h, l

def weight_variable(shape, name):
    """Initialize weights."""
    initial = tf.truncated_normal(shape, stddev=0.01)  # truncated normal distribution
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    """Initialize biases to a small positive constant."""
    initial = tf.constant(0.01, shape=shape)
    return tf.Variable(initial, name=name)


def conv2d(x, W):
    """Convolution layer."""
    # x: input tensor of shape [batch, in_height, in_width, in_channels]
    #    [images per training batch, image height, image width, channels]
    # W: filter/kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]
    #    [kernel height, kernel width, input channels, number of kernels]
    # strides[0] = strides[3] = 1; strides[1] is the step along the height axis,
    # strides[2] the step along the width axis
    # padding: either "SAME" or "VALID"
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

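def max_pool_2x2(x):
    """Pooling layer."""
    # Assumed completion (the listing's preview is cut off here): 2x2 max pooling, stride 2
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')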
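A worked example of the tensor shapes, as a sketch: the height 700 appears hard-coded in `find_matrix_max_shape`, and 13 columns is the default `numcep` of `mfcc`, so each clip becomes a 700 × 13 matrix. With `padding='SAME'`, TensorFlow's output size is ceil(input / stride). The helpers below assume `max_pool_2x2` is the usual 2 × 2, stride-2, 'SAME' pooling (its body is cut off in the listing) and that the model stacks `conv2d` / `max_pool_2x2` pairs; `same_padding_shape`, `feature_map_shape` and the number of blocks are hypothetical, since the model definition is not shown.

```python
import math


def same_padding_shape(height, width, stride):
    """Output size of a TensorFlow conv/pool op with padding='SAME': ceil(input / stride)."""
    return math.ceil(height / stride), math.ceil(width / stride)


def feature_map_shape(height, width, num_blocks):
    """Height x width after num_blocks hypothetical [conv2d -> max_pool_2x2] pairs."""
    for _ in range(num_blocks):
        height, width = same_padding_shape(height, width, stride=1)  # conv2d, strides=[1, 1, 1, 1]
        height, width = same_padding_shape(height, width, stride=2)  # assumed 2x2 pool, stride 2
    return height, width


for blocks in (1, 2, 3):
    print(blocks, feature_map_shape(700, 13, blocks))
# prints:
# 1 (350, 7)
# 2 (175, 4)
# 3 (88, 2)
```

Because stride-1 'SAME' convolution keeps the size, only the pooling layers shrink the map, rounding odd sizes up. After two blocks, a fully connected layer would see 175 × 4 × (number of kernels) inputs per clip.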

Type  Size (bytes)  Date        Time   Name
----  ------------  ----------  -----  ------------------
File          9694  2019-07-02  15:26  speechRecogined.py
File          2993  2019-07-02  09:26  test.py
