Intelligent Audio Indexer Based on Semantic and Non-semantic Information (ART/323CP)

ART/323CP
Platform
22/04/2021 - 21/04/2022
5,779.893

Dr Luke Yunzhao LU

Independent Commission Against Corruption (Sponsor)


ICAC keeps a massive archive of recordings of interviews, complaints, telephone calls, and so on. These recordings are unstructured data, making it difficult to search for and retrieve information for investigation purposes. This project will apply speech recognition to mixed-language audio and extend research on acoustic scene classification, speech indexing, and voiceprint indexing, enabling ICAC to search for and retrieve information efficiently using multiple criteria. The intelligent audio indexer research will build topic-adaptive models with acoustic scene classification technology for semantic indexing, and a speaker recognition model for voiceprint indexing. To support this research, ICAC will provide 100 hours of preliminary training data for training the machine learning models.

The acoustic scene classification research is designed in three steps: preparing audio data, transforming the audio into Mel-spectrograms in the time-frequency domain, and training a deep neural network with convolutional layers and a fully connected output layer serving as a global classifier. The global classifier is then used to assign an incoming audio clip to a specific topic.

The voiceprint indexing research is designed as preparing audio data, encoding audio features into speaker embeddings, and training a time-delay neural network with i-vectors to build a scoring model.

The project deliverables will include speech transcription based on topic-adaptive models with acoustic scene classification, an audio indexer based on semantic and non-semantic (voiceprint) information, and the related server service as a total solution.
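The acoustic scene classification pipeline described above (audio to Mel-spectrogram, then a CNN with a fully connected softmax output as the global classifier) can be sketched as follows. This is a minimal NumPy illustration, not the project's implementation: the FFT/filterbank parameters, the number of topic classes, and the random, untrained network weights are all assumptions for demonstration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Transform a waveform into a log Mel-spectrogram (time-frequency field)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft//2 + 1)
    # Triangular Mel filterbank, equally spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(mag @ fb.T + 1e-10)                # (n_frames, n_mels)

rng = np.random.default_rng(0)

def cnn_forward(mel, n_classes=4):
    """Toy stand-in for the trained CNN: one convolution over time, ReLU,
    global average pooling, and a fully connected softmax output
    (the 'global classifier'). Weights are random, for shape illustration only."""
    n_mels = mel.shape[1]
    conv_w = rng.normal(0.0, 0.1, (8, 3, n_mels))    # 8 filters, width 3 frames
    fc_w = rng.normal(0.0, 0.1, (8, n_classes))
    feats = [np.maximum(0.0, np.einsum('fkm,km->f', conv_w, mel[t:t + 3]))
             for t in range(mel.shape[0] - 2)]
    pooled = np.mean(feats, axis=0)                  # global average pool
    logits = pooled @ fc_w
    p = np.exp(logits - logits.max())
    return p / p.sum()                               # topic probabilities
```

In a real system the convolutional weights would be learned from the labelled training data (such as the 100 hours ICAC provides), and the argmax over the output probabilities would give the predicted topic for the incoming audio.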
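The voiceprint indexing step (encode audio to speaker embeddings, then score candidates to retrieve a speaker) can likewise be sketched at the interface level. The project text specifies a time-delay neural network with i-vectors for the scoring model; the sketch below substitutes a simple cosine-similarity score over precomputed embedding vectors so the enrol/search flow is runnable. The class name, threshold, and embedding dimensionality are illustrative assumptions.

```python
import numpy as np

def cosine_score(a, b):
    """Similarity between two speaker embeddings (stand-in for the
    TDNN/i-vector scoring model named in the project description)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

class VoiceprintIndex:
    """Hypothetical voiceprint index: maps speaker IDs to embeddings and
    retrieves the best-scoring enrolled speaker for a query embedding."""

    def __init__(self):
        self.entries = {}                     # speaker id -> unit embedding

    def enroll(self, speaker_id, embedding):
        self.entries[speaker_id] = embedding / np.linalg.norm(embedding)

    def search(self, embedding, threshold=0.5):
        # Score the query against every enrolled voiceprint.
        scores = {sid: cosine_score(embedding, e)
                  for sid, e in self.entries.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best, scores[best]         # matched speaker and score
        return None, scores[best]             # no speaker above threshold
```

In the deliverable system, the embeddings would come from the trained speaker recognition model, and a search like this would let investigators filter the archive by who is speaking as one of the multiple retrieval criteria.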