Intelligent Audio Indexer Based on Semantic and Non-semantic Information (ART/323CP)

ART/323CP
Platform
22/04/2021 - 21/04/2022
5,779.893

Dr Luke Yunzhao LU

Independent Commission Against Corruption (Sponsor)


ICAC keeps a massive archive of recordings of interviews, complaints, telephone calls, and so on. These recordings are unstructured data, making it difficult to search for and retrieve information for investigation purposes. This project will apply speech recognition to mixed-language audio and extend research on acoustic scene classification, speech indexing, and voiceprint indexing, enabling ICAC to search for and retrieve information efficiently using multiple criteria. The intelligent audio indexer research will build topic-adaptive models with acoustic scene classification technology for semantic indexing, and a speaker recognition model for voiceprint indexing. To support this research, ICAC will provide 100 hours of preliminary training data for training the machine learning models.

The acoustic scene classification research is designed in three steps: preparing audio data, transforming the audio into Mel-spectrograms in the time-frequency domain, and training a deep neural network with convolutional layers and a fully connected output layer serving as a global classifier. The global classifier is then used to assign an incoming audio clip to a specific topic.

The voiceprint indexing research is designed as preparing audio data, encoding audio features into speaker embeddings, and training a time-delay neural network with i-vectors to build a scoring model.

The project deliverables will include speech transcription based on topic-adaptive models with acoustic scene classification, an audio indexer based on semantic and non-semantic (voiceprint) information, and the related server service as a total solution.
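The acoustic scene classification pipeline described above (audio to Mel-spectrogram, then a CNN with a fully connected softmax output as the global classifier) can be sketched as follows. This is a minimal NumPy illustration, not the project's implementation: the FFT/filterbank parameters, the number of topic classes, and the random, untrained network weights are all assumptions for demonstration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Transform a waveform into a log Mel-spectrogram (time-frequency field)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft//2 + 1)
    # Triangular Mel filterbank, equally spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(mag @ fb.T + 1e-10)                # (n_frames, n_mels)

rng = np.random.default_rng(0)

def cnn_forward(mel, n_classes=4):
    """Toy stand-in for the trained CNN: one convolution over time, ReLU,
    global average pooling, and a fully connected softmax output
    (the 'global classifier'). Weights are random, for shape illustration only."""
    n_mels = mel.shape[1]
    conv_w = rng.normal(0.0, 0.1, (8, 3, n_mels))    # 8 filters, width 3 frames
    fc_w = rng.normal(0.0, 0.1, (8, n_classes))
    feats = [np.maximum(0.0, np.einsum('fkm,km->f', conv_w, mel[t:t + 3]))
             for t in range(mel.shape[0] - 2)]
    pooled = np.mean(feats, axis=0)                  # global average pool
    logits = pooled @ fc_w
    p = np.exp(logits - logits.max())
    return p / p.sum()                               # topic probabilities
```

In a real system the convolutional weights would be learned from the labelled training data (such as the 100 hours ICAC provides), and the argmax over the output probabilities would give the predicted topic for the incoming audio.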
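The voiceprint indexing step (encode audio to speaker embeddings, then score candidates to retrieve a speaker) can likewise be sketched at the interface level. The project text specifies a time-delay neural network with i-vectors for the scoring model; the sketch below substitutes a simple cosine-similarity score over precomputed embedding vectors so the enrol/search flow is runnable. The class name, threshold, and embedding dimensionality are illustrative assumptions.

```python
import numpy as np

def cosine_score(a, b):
    """Similarity between two speaker embeddings (stand-in for the
    TDNN/i-vector scoring model named in the project description)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

class VoiceprintIndex:
    """Hypothetical voiceprint index: maps speaker IDs to embeddings and
    retrieves the best-scoring enrolled speaker for a query embedding."""

    def __init__(self):
        self.entries = {}                     # speaker id -> unit embedding

    def enroll(self, speaker_id, embedding):
        self.entries[speaker_id] = embedding / np.linalg.norm(embedding)

    def search(self, embedding, threshold=0.5):
        # Score the query against every enrolled voiceprint.
        scores = {sid: cosine_score(embedding, e)
                  for sid, e in self.entries.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best, scores[best]         # matched speaker and score
        return None, scores[best]             # no speaker above threshold
```

In the deliverable system, the embeddings would come from the trained speaker recognition model, and a search like this would let investigators filter the archive by who is speaking as one of the multiple retrieval criteria.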