5 datasets found

My Complete Genome
kaggle.com
zip
Updated May 15, 2021
Cite
Zeeshan-ul-hassan Usmani (2021). My Complete Genome [Dataset]. https://www.kaggle.com/datasets/zusmani/mygenome/discussion
Available download formats: zip (32869133 bytes)
Dataset updated
May 15, 2021
Authors
Zeeshan-ul-hassan Usmani
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data

Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer next-generation sequencing of human genomes, ranging from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (using Illumina HumanOmniExpress-24) to obtain my DNA’s phenotype SNPs. I am sharing the entire raw dataset here with the international research community for the following reasons:

I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering my entire raw DNA data for the world to use for research without worrying about privacy. I call it a copyleft dataset.

Most of the available test datasets for research come from the Western world, and we don’t see much from developing countries. I decided to share my data to bridge the gap, and I expect others to follow the trend.

I would be the happiest man on earth if a life can be saved, knowledge can be gained, an idea can be explored, or a fact can be found using my DNA data. Please use it the way you will.

Content

Name: Zeeshan-ul-hassan Usmani

Age: 38 Years

Country of Birth: Pakistan

Country of Ancestors: India (Uttar Pradesh - UP)

File: GenomeZeeshanUsmani.csv

Size: 15 MB

Sources: 23andMe Personalized Genome Report

The research community is still progressively working in this domain and it is agreed upon by professionals that genomics is still in its infancy. You now have the chance to explore this novel domain via the dataset and become one of the few genomics early adopters.

The dataset is a complete genome extracted from www.23andme.com and is represented as a sequence of SNPs encoded with the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletions), I (base insertions), and '_' or '-' if the SNP for a particular location is not accessible. It contains chromosomes 1-22, X, Y, and mitochondrial DNA.

A complete list of the exact SNPs (base pairs) available and their dataset index can be found at https://api.23andme.com/res/txt/snps.b4e00fe1db50.data

For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes

For a more detailed understanding of the dataset content, please see the description at https://api.23andme.com/docs/reference/#genotypes
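
As a rough illustration of the symbol encoding described above, the following Python sketch tallies allele symbols and counts no-call SNPs per chromosome. The column names (rsid, chromosome, position, genotype) and the sample rows are assumptions made for illustration and are not taken from GenomeZeeshanUsmani.csv; adjust them to the actual file header.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in an ASSUMED layout (rsid, chromosome, position, genotype).
# The rows are invented placeholders, not values from the real file.
SAMPLE = """rsid,chromosome,position,genotype
rs0000001,1,1000,AA
rs0000002,1,2000,AG
rs0000003,2,1500,--
rs0000004,X,3000,C
rs0000005,MT,400,T
rs0000006,2,2500,DI
"""

# Symbols the description lists for an inaccessible SNP.
NO_CALL_SYMBOLS = {"-", "_"}


def summarize(handle):
    """Count allele symbols overall and no-call SNPs per chromosome."""
    symbols = Counter()
    no_calls = Counter()
    for row in csv.DictReader(handle):
        genotype = row["genotype"].strip()
        symbols.update(genotype)  # counts each character: A, C, G, T, D, I, -, _
        if any(ch in NO_CALL_SYMBOLS for ch in genotype):
            no_calls[row["chromosome"]] += 1
    return symbols, no_calls


symbols, no_calls = summarize(io.StringIO(SAMPLE))
print(dict(symbols))
print(dict(no_calls))
```

To try it on the real file, pass an open file handle to summarize() instead of the in-memory sample, after checking that the header matches the assumed column names.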

Acknowledgements

Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPS Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”

Useful Links

You may use the following human genome database sites for help:

GenBank - https://www.ncbi.nlm.nih.gov/genbank/

The Human Genome Project - https://www.genome.gov/hgp/

Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov

Complete Genomics - http://www.completegenomics.com/public-data/

Inspiration

Some ideas worth exploring:

Is the individual in question more susceptible to cancer?

Does he tend to gain weight?

Where is his place of origin?

Which genes determine certain biological features (cancer susceptibility, fat generation rate, hair color, etc.)?

How do these phenotype SNPs compare with other similar datasets from the Western world?

What would be the likely cause of death for this person?

What are the most likely diseases/illnesses this person is going to face in his lifetime?

What is unique about this dataset?

What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry, and body makeup?

Sample Reports

Please check out the following reports to understand what can be done with this data:

Ancestry – https://www.23andme.com/published-report/eeb4f9bbd6b5474f/?share_id=f6c5562848e84586

Weight Report - https://you.23andme.com/published/reports/65c9af9f8223456d/?share_id=0126f129e4f3458b
Family of Five - Genome Dataset
kaggle.com
zip
Updated Mar 7, 2021
Cite
Zeeshan-ul-hassan Usmani (2021). Family of Five - Genome Dataset [Dataset]. https://www.kaggle.com/zusmani/family-genome-dataset
Available download formats: zip (27521639 bytes)
Dataset updated
Mar 7, 2021
Authors
Zeeshan-ul-hassan Usmani
Description
Context

Complete Genome of a Family of five - Two Parents, Three Siblings (Genome Phenotype SNPs Raw Data)

Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer next-generation sequencing of human genomes, ranging from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (using Illumina HumanOmniExpress-24) to obtain this family's phenotype SNPs. I am sharing the entire raw dataset of the family of five (father, mother, and three brothers) here with the international research community for the following reasons:

I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering this family's entire raw DNA data for the world to use for research without worrying about privacy.

Most of the available test datasets for research come from the Western world, and we don’t see much from developing countries. I decided to share this data to bridge the gap, and I expect others to follow the trend.

I would be the happiest man on earth if a life can be saved, knowledge can be gained, an idea can be explored, or a fact can be found using this DNA dataset. Please use it the way you will.

Content

Family Origin: Pakistani

Country of Grandparents/Ancestors: India (Kerana, Uttar Pradesh - UP)

Files: Father, Mother, Child 1, Child 2, Child 3 (All CSVs)

Size: 75 MB

Sources: 23andMe Personalized Genome Reports

The research community is still progressively working in this domain and it is agreed upon by professionals that genomics is still in its infancy. You now have the chance to explore this novel domain via this dataset and become one of the few genomics early adopters.

The dataset is a complete genome extracted from www.23andme.com and is represented as a sequence of SNPs encoded with the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletions), I (base insertions), and '_' or '-' if the SNP for a particular location is not accessible. It contains chromosomes 1-22, X, Y, and mitochondrial DNA.

A complete list of the exact SNPs (base pairs) available and their dataset index can be found at https://api.23andme.com/res/txt/snps.b4e00fe1db50.data

For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes

For a more detailed understanding of the dataset content, please see the description at https://api.23andme.com/docs/reference/#genotypes

Acknowledgements

Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Family of Five Genomic Dataset by 23andMe, Kaggle Dataset Repository, March 7, 2021.”

Useful Links

You may use the following human genome database sites for help:

GenBank - https://www.ncbi.nlm.nih.gov/genbank/

The Human Genome Project - https://www.genome.gov/hgp/

Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov

Complete Genomics - http://www.completegenomics.com/public-data/

Inspiration

Some ideas worth exploring:

Are any individuals in the dataset more susceptible to cancer?

Does he/she tend to gain weight?

Where is his/her place of origin?

Which genes determine certain biological features (cancer susceptibility, fat generation rate, hair color, etc.)?

How do these phenotype SNPs compare with other similar datasets from the Western world?

How do the family members differ in genomic makeup? Which traits are silent, and which ones are dominant?

What would be the likely cause of death for any given person?

What are the most likely diseases/illnesses this family is going to face in their lifetimes?

What is unique about this dataset?

Can you compare the genomes within this family and see which diseases will have less or more impact on a given family member?

Can you delineate recombination sites precisely, identify sequence errors, or find rare SNPs?

What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry, and body makeup?
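
A possible starting point for the family comparison and sequence-error questions above is a Mendelian consistency check: at an autosomal SNP, a child's two alleles should be obtainable by taking one allele from each parent. The sketch below assumes unphased two-letter genotypes such as AG; the function name and the example genotypes are hypothetical and are not drawn from the dataset.

```python
from itertools import product

NO_CALL_SYMBOLS = {"-", "_"}


def mendelian_consistent(father, mother, child):
    """Check one SNP in a father/mother/child trio.

    Returns True if the child's unphased genotype can be formed from one
    paternal and one maternal allele, False if it cannot, and None if any
    genotype is a no-call or is not two characters long (for example
    single-allele calls on X, Y, or mitochondrial DNA).
    """
    for genotype in (father, mother, child):
        if len(genotype) != 2 or set(genotype) & NO_CALL_SYMBOLS:
            return None
    # Every allele pair the parents could transmit, order-normalized.
    possible = {"".join(sorted(pair)) for pair in product(father, mother)}
    return "".join(sorted(child)) in possible


# Hypothetical genotypes for illustration only.
print(mendelian_consistent("AG", "GG", "AG"))
print(mendelian_consistent("AG", "GG", "AA"))
print(mendelian_consistent("--", "GG", "AG"))
```

Sites where the check returns False are candidates for closer inspection; a genotyping error is one possible explanation, so a single inconsistent site should not be over-interpreted.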
Data from: openSNP
rrid.site
neuinfo.org
Updated Jan 29, 2022
Cite
(2022). openSNP [Dataset]. http://identifiers.org/RRID:SCR_001636
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_001636 https://identifiers.org/RRID:SCR_001636/resolver?q=*&i=rrid
Dataset updated
Jan 29, 2022
Description
Database of raw data from people who have shared their direct-to-consumer (DTC) genetic results from 23andMe, deCODEme or FamilyTreeDNA. Logged-in users can search the database for users with specific phenotypes and mass-download all corresponding SNP datasets. This allows you to get datasets such as all genotyping files of openSNP users that have Alzheimer's and the corresponding control group.

They are currently working on providing API access. You can also use JSON to access openSNP data, and there are some other ways: if you want to automate the file downloads for a given phenotype, the RSS feeds could help you. Inside the RSS XML there are 2 flags you could use to automatically create correct genotype groups: one gives you the variation of this user at the phenotype you are looking at, and the other gives you the download link.

If you were genotyped by 23andMe, deCODEme or FamilyTreeDNA (contact them regarding others), you can upload the raw genotype data, which you can download from your DTC test provider. The data will then be openly available for the world to see and download. They also parse these SNPs and annotate them. For annotation they include the manually curated SNPedia and find Open Access primary publications which appear in the journals of The Public Library of Science (PLoS), an Open Access publishing group. Additionally, they screen Mendeley, a crowd-sourced repository of scientific publications. You can also publish some of your phenotypes so that some day it might become possible to associate some SNPs with phenotypes. You can also share your knowledge about SNPs and phenotypes with other users and socialize.
OpenSNP data-freeze of 5,393 (19.10.2020)
data.niaid.nih.gov
Updated Oct 19, 2021
Cite
Lu; Tzovaras; Gough (2021). OpenSNP data-freeze of 5,393 (19.10.2020) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5047472
Dataset updated
Oct 19, 2021
Dataset provided by
MRC Laboratory of Molecular Biology
Authors
Lu; Tzovaras; Gough
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The open-source GenomePrep toolkit, developed on the goodwill of open genome data, addresses the problem of processing raw DTC DNA data in the context of present-day technology: genotype arrays. GenomePrep outputs DNA data files in homogeneous formats (23andMe-like or VCF), which enable further research analysis. A single combined data-freeze of OpenSNP genomes that passed checks is available here.

For more information, visit https://supfam.mrc-lmb.cam.ac.uk/GenomePrep/

Global At Home DNA Test Kits Market Research Report: By Test Type (Ancestry...

wiseguyreports.com

Updated Aug 22, 2025

Cite

(2025). Global At Home DNA Test Kits Market Research Report: By Test Type (Ancestry Testing, Health Testing, Paternity Testing, Nutritional Testing), By Application (Personal Use, Clinical Use, Research Use), By Distribution Channel (Online Retail, Pharmaceutical Stores, Supermarkets, Specialty Stores), By Age Group (Children, Adults, Seniors) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/at-home-dna-tests-kits-market

Dataset updated

Aug 22, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Aug 1, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	1,000 (USD Million)
MARKET SIZE 2025	1,200 (USD Million)
MARKET SIZE 2035	3,500 (USD Million)
SEGMENTS COVERED	Test Type, Application, Distribution Channel, Age Group, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	increasing consumer awareness, technological advancements, rise in health consciousness, direct-to-consumer sales, regulatory challenges
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Map My Genome, Orig3n, Gene by Gene, Vitagene, MyHeritage, Living DNA, Helix, Genomind, Everlywell, Genetic Technologies, 23andMe, DNAfit, FamilyTreeDNA, Strands, Ancestry
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Personalized health insights, Ancestry exploration services, Genetic disease risk assessment, Enhanced marketing strategies, Partnerships with healthcare providers
COMPOUND ANNUAL GROWTH RATE (CAGR)	11.7% (2025 - 2035)
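
A compound annual growth rate is conventionally computed as (end / start)^(1 / years) - 1. The sketch below applies that formula to the 2025 and 2035 market sizes listed above; the function name is illustrative, and because the listed sizes are rounded, the result need not match the reported 11.7% exactly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD million, taken from the table above (2025 -> 2035).
rate = cagr(1200, 3500, 10)
print(f"{rate:.1%}")
```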

My Complete Genome

6,000 Base-Pairs of Phenotype SNPs - Complete Raw Data


50 scholarly articles cite this dataset (View in Google Scholar)
