The primary and foreign keys for each table in the synthesized database and their relationships are displayed below.
Table Name | Primary Key | Foreign Keys |
---|---|---|
Branch_Expenses | ID | Branch_ID |
Branches | Branch_ID | |
Detailed_Patients_Visits | ID | Patients_visits_ID |
Doctors | Doctor_ID | Specialization_ID |
Doctors_Contacts | ID | Doctor_ID |
Patients | Patient_ID | |
Patients_Visits | Patients_visits_ID | Doctor_ID, Patient_ID, Visit_ID |
Specializations | Specialization_ID | Branch_ID |
Visits_Type | Visit_ID | |
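
The relationships in the table above can be captured as a plain mapping for quick checks. The sketch below is illustrative only: `FOREIGN_KEYS` and `root_tables` are hypothetical names, not part of any Gretel API, and the values are transcribed from the table.

```python
# Foreign keys per table, transcribed from the table above.
# An empty list means the table references no other table.
FOREIGN_KEYS = {
    "Branch_Expenses": ["Branch_ID"],
    "Branches": [],
    "Detailed_Patients_Visits": ["Patients_visits_ID"],
    "Doctors": ["Specialization_ID"],
    "Doctors_Contacts": ["Doctor_ID"],
    "Patients": [],
    "Patients_Visits": ["Doctor_ID", "Patient_ID", "Visit_ID"],
    "Specializations": ["Branch_ID"],
    "Visits_Type": [],
}


def root_tables(foreign_keys):
    """Return the tables that have no foreign keys, sorted by name."""
    return sorted(table for table, fks in foreign_keys.items() if not fks)


print(root_tables(FOREIGN_KEYS))
```

With these values, the tables without foreign keys are Branches, Patients, and Visits_Type.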
For each table, individual and cross-table Gretel Synthetic Reports are generated, each of which includes a Synthetic Data Quality Score (SQS). The individual Synthetic Report evaluates the statistical accuracy of a synthetic table against the real-world data it is based on, treating the table as stand-alone. The individual SQS therefore does not account for statistical correlations across related tables. The cross-table Synthetic Report evaluates the statistical accuracy of a table's synthetic data while taking into account the correlations across related tables, so the cross-table SQS reflects the accuracy of the table in the context of the database as a whole. More information about the Gretel Synthetic Report and Synthetic Data Quality Score is available here.
Synthetic Data Quality Scores

For each table, individual and cross-table synthetic data quality scores (SQS) are computed and displayed below.
Table Name | Individual SQS | Cross-table SQS |
---|---|---|
Branches | 68 Good | None Unavailable |
Specializations | 100 Excellent | 57 Moderate |
Branch_Expenses | 34 Poor | 40 Moderate |
Doctors_Contacts | 68 Good | 65 Good |
Detailed_Patients_Visits | 71 Good | 45 Moderate |
Patients_Visits | 72 Good | 39 Poor |
Doctors | 69 Good | 64 Good |
Patients | 66 Good | None Unavailable |
Visits_Type | 100 Excellent | None Unavailable |
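
To see where accounting for cross-table correlations lowers the score the most, the two columns above can be compared programmatically. This is a minimal sketch with values transcribed from the table; `SCORES` and `score_drop` are hypothetical names, and `None` stands for an unavailable cross-table score.

```python
# (individual SQS, cross-table SQS) per table, transcribed from the table above.
# None marks a cross-table score reported as unavailable.
SCORES = {
    "Branches": (68, None),
    "Specializations": (100, 57),
    "Branch_Expenses": (34, 40),
    "Doctors_Contacts": (68, 65),
    "Detailed_Patients_Visits": (71, 45),
    "Patients_Visits": (72, 39),
    "Doctors": (69, 64),
    "Patients": (66, None),
    "Visits_Type": (100, None),
}


def score_drop(scores):
    """Return (table, individual - cross_table) pairs, largest drop first.

    Tables without a cross-table score are skipped.
    """
    drops = [
        (table, individual - cross)
        for table, (individual, cross) in scores.items()
        if cross is not None
    ]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


for table, drop in score_drop(SCORES):
    print(f"{table}: {drop:+d}")
```

With these values, Specializations shows the largest drop (43 points), and Branch_Expenses is the only table whose cross-table score is higher than its individual score.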
The Synthetic Data Quality Score is an estimate of how well the generated synthetic data maintains the same statistical properties as the original dataset. In this sense, the Synthetic Data Quality Score can be viewed as a utility score or a confidence score as to whether scientific conclusions drawn from the synthetic dataset would be the same if one were to have used the original dataset instead. If you do not require statistical symmetry, as might be the case in a testing or demo environment, a lower score may be just as acceptable.
If your Synthetic Data Quality Score isn't as high as you'd like it to be, read here for a multitude of ideas for improving your model.