
A Programmable Automated Tabular-Data Generation method for Machine Learning (ARD/330)

Project Title:
A Programmable Automated Tabular-Data Generation method for Machine Learning (ARD/330)
Project Reference:
ARD/330
Project Type:
Seed
Project Period:
10/12/2024 - 09/12/2025
Funds Approved (HK$’000):
2,796.800
Project Coordinator:
Dr Jitao OU
Deputy Project Coordinator:
/
Deliverable:
Research Group:
/
Sponsor:
Description:

The availability of high-quality, diverse datasets has been key to the rapid progress of data-driven applications and machine learning. However, collecting and sharing real-world data presents significant challenges related to cost, collection bias and data privacy. In response, synthetic data (artificially generated data) has gained traction as a promising solution. By creating synthetic datasets that capture the statistical properties and patterns of real data, researchers can experiment with diverse datasets, simulate rare or extreme scenarios, and improve model generalization without compromising data privacy.

This project aims to explore technology for artificial data generation and to develop an integrated data generation software tool with programmable features. The tool will generate tabular and time-series data that closely resembles real data while protecting sensitive information. It will provide an integrated, low-cost solution for organizations that want to use synthetic data for innovative, privacy-preserving machine learning modeling, and it will benefit a wide range of stakeholders, including technical solution vendors, industry data holders, model auditing service agencies, and the public.

Co-Applicant:
/
Keywords:
/