A speech recognition system typically consists of multiple modules and files, each responsible for a different function. Below is the directory structure of a typical speech recognition system, a detailed explanation of the main files and their roles, and related code examples.

### Directory Structure

```
speech_recognition/
├── data/
│   ├── audio/
│   │   └── example.wav
│   └── transcripts/
│       └── example.txt
├── models/
│   ├── model.h5
│   └── weights.h5
├── preprocess/
│   ├── feature_extraction.py
│   └── data_loader.py
├── train/
│   ├── train.py
│   └── config.json
├── evaluate/
│   ├── evaluate.py
│   └── metrics.py
├── infer/
│   ├── inference.py
│   └── utils.py
├── requirements.txt
└── README.md
```

### Main Files and Their Roles

#### `data/`
- **audio/**: Stores the audio files.
  - `example.wav`: A sample audio file.
- **transcripts/**: Stores the corresponding transcripts.
  - `example.txt`: The transcript of the sample audio.

#### `models/`
- **model.h5**: The full trained model (architecture plus weights, as written by `model.save`).
- **weights.h5**: The trained model weights alone.

#### `preprocess/`
- **feature_extraction.py**: Extracts features (e.g., MFCCs or Mel spectrograms) from audio files.
- **data_loader.py**: Loads and preprocesses the data.

#### `train/`
- **train.py**: The main training script.
- **config.json**: The configuration file holding the training parameters (a sample is shown after `train.py` below).

#### `evaluate/`
- **evaluate.py**: The main script for evaluating model performance.
- **metrics.py**: Defines the evaluation metric functions (a sketch is given at the end of this article).

#### `infer/`
- **inference.py**: The main inference script.
- **utils.py**: Helper functions, e.g., for loading the model and handling input and output.

#### `requirements.txt`
- Lists the Python libraries and versions the project depends on (a sample also appears at the end of this article).

#### `README.md`
- The project documentation: what the project does, how to use it, and so on.

### Code Examples

#### `preprocess/feature_extraction.py`

```python
import librosa
import numpy as np

def extract_features(file_path, n_mfcc=13):
    # Load the audio (librosa resamples to 22,050 Hz by default)
    # and compute n_mfcc MFCC coefficients per frame.
    y, sr = librosa.load(file_path)
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Transpose to shape (n_frames, n_mfcc).
    return mfccs.T

if __name__ == "__main__":
    file_path = 'data/audio/example.wav'
    features = extract_features(file_path)
    print(features)
```

#### `preprocess/data_loader.py`

```python
import os
import numpy as np
# Absolute import so this module also works when imported as
# preprocess.data_loader; run scripts from the project root.
from preprocess.feature_extraction import extract_features

def load_data(data_dir, label_dir):
    # label_dir is the path to a transcript file with one label per line;
    # sort the audio files so the audio/label pairing is stable.
    audio_files = sorted(os.path.join(data_dir, f) for f in os.listdir(data_dir))
    with open(label_dir, 'r') as f:
        labels = [line.strip() for line in f]

    features = [extract_features(audio_file) for audio_file in audio_files]
    # Note: np.array over variable-length feature sequences yields a ragged
    # object array; a real pipeline would pad or bucket the sequences first.
    return np.array(features), np.array(labels)

if __name__ == "__main__":
    data_dir = 'data/audio'
    label_dir = 'data/transcripts/example.txt'
    features, labels = load_data(data_dir, label_dir)
    print(features.shape, labels.shape)
```

#### `train/train.py`

```python
import json
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.model_selection import train_test_split
from preprocess.data_loader import load_data

def build_model(input_shape):
    model = Sequential()
    model.add(LSTM(128, input_shape=input_shape, return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(10, activation='softmax'))  # Assuming 10 classes for simplicity
    return model

if __name__ == "__main__":
    with open('train/config.json', 'r') as f:
        config = json.load(f)

    data_dir = config['data_dir']
    label_dir = config['label_dir']

    # y must be one-hot encoded to match the categorical_crossentropy loss.
    X, y = load_data(data_dir, label_dir)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    model = build_model((X_train.shape[1], X_train.shape[2]))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train,
              validation_data=(X_val, y_val),
              epochs=config['epochs'],
              batch_size=config['batch_size'])

    model.save('models/model.h5')
    model.save_weights('models/weights.h5')
```
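#### `train/config.json`

`train.py` reads `data_dir`, `label_dir`, `epochs`, and `batch_size` from this file. A minimal configuration consistent with those keys might look like the following; the epoch count and batch size are illustrative placeholders, not values from a real project:

```json
{
  "data_dir": "data/audio",
  "label_dir": "data/transcripts/example.txt",
  "epochs": 20,
  "batch_size": 32
}
```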
#### `evaluate/evaluate.py`

```python
import numpy as np
from keras.models import load_model
from sklearn.metrics import accuracy_score
from preprocess.data_loader import load_data

def evaluate_model(model_path, data_dir, label_dir):
    model = load_model(model_path)
    X, y = load_data(data_dir, label_dir)  # y is assumed to be one-hot encoded
    y_pred = model.predict(X)
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true = np.argmax(y, axis=1)
    accuracy = accuracy_score(y_true, y_pred_classes)
    return accuracy

if __name__ == "__main__":
    model_path = 'models/model.h5'
    data_dir = 'data/audio'
    label_dir = 'data/transcripts/example.txt'
    accuracy = evaluate_model(model_path, data_dir, label_dir)
    print(f"Model Accuracy: {accuracy}")
```

#### `infer/inference.py`

```python
import numpy as np
from keras.models import load_model
from preprocess.feature_extraction import extract_features
from infer.utils import decode_predictions

def predict(audio_path, model_path):
    model = load_model(model_path)
    features = extract_features(audio_path)
    features = np.expand_dims(features, axis=0)  # Add batch dimension
    prediction = model.predict(features)
    return decode_predictions(prediction)

if __name__ == "__main__":
    audio_path = 'data/audio/example.wav'
    model_path = 'models/model.h5'
    prediction = predict(audio_path, model_path)
    print(f"Predicted Transcript: {prediction}")
```

#### `infer/utils.py`

```python
import numpy as np

def decode_predictions(predictions):
    # Assuming a simple mapping from index to character for demonstration purposes
    index_to_char = {i: chr(97 + i) for i in range(26)}  # Lowercase letters a-z
    pred_indices = np.argmax(predictions, axis=-1)
    return ''.join(index_to_char[idx] for idx in pred_indices)
```

The above is the directory structure and code of a simple speech recognition system. Real projects are usually more complex, involving additional preprocessing steps, more sophisticated model architectures, more detailed evaluation metrics, and so on.
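#### `evaluate/metrics.py`

`metrics.py` appears in the directory tree but was not shown above. As one illustration of the "more detailed evaluation metrics" just mentioned, here is a minimal sketch of a word error rate (WER) function such a file might contain; the function name and signature are assumptions for demonstration, not part of the original project:

```python
import numpy as np

def word_error_rate(reference, hypothesis):
    # WER = (substitutions + deletions + insertions) / number of reference
    # words, computed with standard Levenshtein dynamic programming over words.
    ref = reference.split()
    hyp = hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # One substitution out of three reference words: WER of about 0.33.
    print(word_error_rate("the cat sat", "the cat sit"))
```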
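#### `requirements.txt`

Based on the imports used in the scripts above, a plausible `requirements.txt` would list at least the following; pulling the Keras API in via TensorFlow is an assumption, and any version pins would depend on the project:

```
librosa
numpy
tensorflow
scikit-learn
```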