5 datasets found
  1. My Complete Genome

    • kaggle.com
    zip
    Updated May 15, 2021
    Cite
    Zeeshan-ul-hassan Usmani (2021). My Complete Genome [Dataset]. https://www.kaggle.com/datasets/zusmani/mygenome/discussion
    Available download formats
    zip (32869133 bytes)
    Dataset updated
    May 15, 2021
    Authors
    Zeeshan-ul-hassan Usmani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data

    Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer next-generation sequencing of human genomes, ranging from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (using Illumina HumanOmniExpress-24) for my DNA’s phenotype SNPs. I am sharing the entire raw dataset here with the international research community for the following reasons:

    1. I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering my entire raw DNA data for the world to use for research without worrying about privacy. I call it a copyleft dataset.

    2. Most of the available test datasets for research come from the Western world, and we don’t see much from developing countries. I decided to share my data to bridge the gap, and I expect others to follow the trend.

    3. I would be the happiest man on earth if a life can be saved, knowledge can be gained, an idea can be explored, or a fact can be found using my DNA data. Please use it any way you wish.

    Content

    Name: Zeeshan-ul-hassan Usmani

    Age: 38 Years

    Country of Birth: Pakistan

    Country of Ancestors: India (Uttar Pradesh - UP)

    File: GenomeZeeshanUsmani.csv

    Size: 15 MB

    Sources: 23andMe Personalized Genome Report

    Research in this domain is ongoing, and professionals agree that genomics is still in its infancy. You now have the chance to explore this novel domain via the dataset and become one of the few early adopters in genomics.

    The dataset is a complete genome extracted from www.23andme.com. It is represented as a sequence of SNPs encoded with the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletions), I (base insertions), and '_' or '-' if the SNP at a particular location is not available. It covers chromosomes 1-22, X, Y, and mitochondrial DNA.
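The symbol scheme described above can be sketched as a small tally over the file. This is a minimal illustration, not the author's tooling: the column names (`rsid`, `chromosome`, `position`, `genotype`), the comma delimiter, and the sample rows are assumptions, so check the header of the actual CSV before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: the real file's header names and delimiter may differ.
SAMPLE = """rsid,chromosome,position,genotype
rs0000001,1,1000,AA
rs0000002,1,2000,CT
rs0000003,2,3000,--
rs0000004,X,4000,G
rs0000005,MT,5000,T
rs0000006,7,6000,DI
"""

def classify(genotype):
    """Label a genotype string using the symbols listed in the description."""
    if not genotype or set(genotype) <= {"-", "_"}:
        return "no-call"
    if set(genotype) & {"D", "I"}:
        return "indel"
    if set(genotype) <= set("ACGT"):
        return "snp"
    return "other"

def summarize(handle):
    """Count call types overall and rows per chromosome."""
    kinds, per_chrom = Counter(), Counter()
    for row in csv.DictReader(handle):
        kinds[classify(row["genotype"].strip())] += 1
        per_chrom[row["chromosome"].strip()] += 1
    return kinds, per_chrom

kinds, per_chrom = summarize(io.StringIO(SAMPLE))
print(dict(kinds))
print(dict(per_chrom))
```

To run it on the real file, replace `io.StringIO(SAMPLE)` with `open("GenomeZeeshanUsmani.csv", newline="")`, adjusting the column names if the header differs.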

    A complete list of the exact SNPs (base pairs) available and their dataset index can be found at https://api.23andme.com/res/txt/snps.b4e00fe1db50.data

    For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes

    For a more detailed understanding of the dataset content, see the description at https://api.23andme.com/docs/reference/#genotypes

    Acknowledgements

    Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPS Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”

    Useful Links

    You may use the following human genome database sites for help:

    Inspiration

    Some ideas worth exploring:

    • Is the individual in question more susceptible to cancer?
    • Does he tend to gain weight?
    • Where is his place of origin?
    • Which gene determines a certain biological feature (cancer susceptibility, fat generation rate, hair color, etc.)?
    • How do these phenotype SNPs compare with other similar datasets from the Western world?
    • What would be the likely cause of death for this person?
    • What are the most likely diseases/illnesses this person is going to face in his lifetime?
    • What is unique about this dataset?
    • What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry, and body makeup?
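Most of the questions above start from the same step: look up the genotype at a marker of interest and count copies of a given allele. The sketch below shows only that step. The rsIDs and alleles are placeholders, not real trait associations, and the column names are assumed rather than taken from the file.

```python
import csv
import io

# Hypothetical excerpt; marker IDs and alleles are placeholders, not known associations.
SAMPLE = """rsid,chromosome,position,genotype
rs0000001,1,1000,AA
rs0000002,1,2000,CT
rs0000003,2,3000,--
"""

def load_genotypes(handle):
    """Map each rsID to its genotype string."""
    return {row["rsid"]: row["genotype"] for row in csv.DictReader(handle)}

def allele_count(genotype, allele):
    """Copies of `allele` in a genotype, or None for a missing or no-call entry."""
    if genotype is None or set(genotype) & {"-", "_"}:
        return None
    return genotype.count(allele)

genotypes = load_genotypes(io.StringIO(SAMPLE))
# Placeholder panel: rsID -> allele to count.
markers = {"rs0000002": "T", "rs0000003": "A", "rs9999999": "G"}
report = {rsid: allele_count(genotypes.get(rsid), allele)
          for rsid, allele in markers.items()}
print(report)
```

A `None` entry means the marker is either absent from the array or was not called, which is worth keeping distinct from a count of zero.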

    Sample Reports

    Please check out the following reports to understand what can be done with this data:

    Ancestry – https://www.23andme.com/published-report/eeb4f9bbd6b5474f/?share_id=f6c5562848e84586

    Weight Report - https://you.23andme.com/published/reports/65c9af9f8223456d/?share_id=0126f129e4f3458b

  2. Family of Five - Genome Dataset

    • kaggle.com
    zip
    Updated Mar 7, 2021
    Cite
    Zeeshan-ul-hassan Usmani (2021). Family of Five - Genome Dataset [Dataset]. https://www.kaggle.com/zusmani/family-genome-dataset
    Available download formats
    zip (27521639 bytes)
    Dataset updated
    Mar 7, 2021
    Authors
    Zeeshan-ul-hassan Usmani
    Description

    Context

    Complete Genome of a Family of five - Two Parents, Three Siblings (Genome Phenotype SNPs Raw Data)

    Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer next-generation sequencing of human genomes, ranging from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (using Illumina HumanOmniExpress-24) for this family’s phenotype SNPs. I am sharing the entire raw dataset of the family of five (father, mother, and three brothers) here with the international research community for the following reasons:

    I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering this family’s entire raw DNA data for the world to use for research without worrying about privacy.

    Most of the available test datasets for research come from the Western world, and we don’t see much from developing countries. I decided to share this data to bridge the gap, and I expect others to follow the trend.

    I would be the happiest man on earth if a life can be saved, knowledge can be gained, an idea can be explored, or a fact can be found using this DNA dataset. Please use it any way you wish.

    Content

    Family Origin: Pakistani

    Country of Grandparents/Ancestors: India (Kerana, Uttar Pradesh - UP)

    Files: Father, Mother, Child 1, Child 2, Child 3 (All CSVs)

    Size: 75 MB

    Sources: 23andMe Personalized Genome Reports

    Research in this domain is ongoing, and professionals agree that genomics is still in its infancy. You now have the chance to explore this novel domain via this dataset and become one of the few early adopters in genomics.

    The dataset is a complete genome extracted from www.23andme.com. It is represented as a sequence of SNPs encoded with the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletions), I (base insertions), and '_' or '-' if the SNP at a particular location is not available. It covers chromosomes 1-22, X, Y, and mitochondrial DNA.
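With five files in the same format, a natural first comparison is genotype concordance between two family members. The sketch below assumes each file has already been loaded into a dictionary mapping rsID to genotype string. The dictionaries shown are made-up examples, and no-calls ('-' or '_') are skipped as described above.

```python
def is_called(genotype):
    """True when the genotype is present and contains no no-call symbol."""
    return bool(genotype) and not (set(genotype) & {"-", "_"})

def concordance(a, b):
    """Fraction of shared, called markers where two people have the same genotype.

    Allele order is ignored, so "CT" and "TC" count as identical.
    Returns None when there is nothing to compare.
    """
    shared = [r for r in a.keys() & b.keys()
              if is_called(a[r]) and is_called(b[r])]
    if not shared:
        return None
    same = sum(1 for r in shared if sorted(a[r]) == sorted(b[r]))
    return same / len(shared)

# Made-up genotypes for two siblings (hypothetical rsIDs).
child1 = {"rs1": "AA", "rs2": "CT", "rs3": "--", "rs4": "GG"}
child2 = {"rs1": "AA", "rs2": "TC", "rs3": "CC", "rs4": "AG", "rs5": "TT"}

rate = concordance(child1, child2)
print(rate)
```

Here `rs3` is dropped because one sibling has a no-call and `rs5` because it appears in only one file, so the rate is computed over three markers.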

    A complete list of the exact SNPs (base pairs) available and their dataset index can be found at https://api.23andme.com/res/txt/snps.b4e00fe1db50.data

    For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes

    For a more detailed understanding of the dataset content, see the description at https://api.23andme.com/docs/reference/#genotypes

    Acknowledgements

    Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Family of Five Genomic Dataset by 23andMe, Kaggle Dataset Repository, March 7, 2021.”

    Useful Links

    You may use the following human genome database sites for help:

    GenBank - https://www.ncbi.nlm.nih.gov/genbank/

    The Human Genome Project - https://www.genome.gov/hgp/

    Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov

    Complete Genomics - http://www.completegenomics.com/public-data/

    Inspiration

    Some ideas worth exploring:

    Are any individuals in the dataset more susceptible to cancer?

    Does he/she tend to gain weight?

    Where is his/her place of origin?

    Which gene determines a certain biological feature (cancer susceptibility, fat generation rate, hair color, etc.)?

    How do these phenotype SNPs compare with other similar datasets from the Western world?

    How do the family members differ in genomic makeup? Which traits are silent, and which are dominant?

    What would be the likely cause of death for any given person?

    What are the most likely diseases/illnesses this family is going to face in their lifetime?

    What is unique about this dataset?

    Can you compare the genomes within this family and see which diseases will have less or more impact on a given family member?

    Can you delineate recombination sites precisely, identify sequence errors, or find rare SNPs?

    What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry, and body makeup?
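One way to approach the question about identifying errors is a Mendelian consistency check on each parent-parent-child trio: at an autosomal SNP, the child's genotype should be formed from one allele of each parent. The sketch below is a simplified illustration with made-up genotypes and hypothetical rsIDs. It ignores the X, Y, and mitochondrial markers, which follow different inheritance rules, and a flagged site may reflect a genotyping error rather than biology.

```python
from itertools import product

def is_diploid_snp(genotype):
    """True for a two-letter genotype made only of A, C, G, T."""
    return len(genotype) == 2 and set(genotype) <= set("ACGT")

def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {"".join(sorted(p + m)) for p, m in product(father, mother)}
    return "".join(sorted(child)) in possible

def find_inconsistencies(father, mother, child):
    """Return rsIDs (autosomal markers assumed) where the trio breaks Mendelian rules."""
    flagged = []
    for rsid in sorted(father.keys() & mother.keys() & child.keys()):
        trio = (father[rsid], mother[rsid], child[rsid])
        if all(is_diploid_snp(g) for g in trio) and not mendelian_consistent(*trio):
            flagged.append(rsid)
    return flagged

# Made-up trio; rs4 is skipped because the father's call is missing.
father = {"rs1": "AA", "rs2": "CT", "rs3": "GG", "rs4": "--"}
mother = {"rs1": "AG", "rs2": "CC", "rs3": "GG", "rs4": "AA"}
child = {"rs1": "AG", "rs2": "TT", "rs3": "GG", "rs4": "AA"}

flagged = find_inconsistencies(father, mother, child)
print(flagged)
```

In this example only `rs2` is flagged: the mother can pass on only C, so a TT child is not possible from these parental calls.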

  3. Data from: openSNP

    • rrid.site
    • neuinfo.org
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). openSNP [Dataset]. http://identifiers.org/RRID:SCR_001636
    Dataset updated
    Jan 29, 2022
    Description

    A database of raw data from people who have shared their direct-to-consumer (DTC) genetic test results from 23andMe, deCODEme, or FamilyTreeDNA. Logged-in users can search the database for users with specific phenotypes and mass-download all corresponding SNP datasets. This makes it possible to obtain, for example, all genotyping files of openSNP users who have Alzheimer's disease, along with the corresponding control group. They are currently working on providing API access. openSNP data can also be accessed as JSON and in some other ways: if you want to automate file downloads for a given phenotype, the RSS feeds could help. Inside the RSS XML there are 2 flags you could use to automatically create correct genotype groups: one gives the variation of the user at the phenotype you are looking at, and the other gives the download link.

    If you were genotyped by 23andMe, deCODEme, or FamilyTreeDNA (contact them regarding others), you can upload the raw genotype data that you downloaded from your DTC test provider. The data will then be openly available for the world to see and download. They also parse these SNPs and annotate them. For annotation, they include the manually curated SNPedia and find Open Access primary publications that appear in the journals of the Public Library of Science (PLoS), an Open Access publishing group. Additionally, they screen Mendeley, a crowd-sourced repository of scientific publications. You can also publish some of your phenotypes, so that some day it might become possible to associate some SNPs with phenotypes. You can also share your knowledge about SNPs and phenotypes and socialize with other users.

  4. OpenSNP data-freeze of 5,393 (19.10.2020)

    • data.niaid.nih.gov
    Updated Oct 19, 2021
    Cite
    Lu; Tzovaras; Gough (2021). OpenSNP data-freeze of 5,393 (19.10.2020) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5047472
    Dataset updated
    Oct 19, 2021
    Dataset provided by
    MRC Laboratory of Molecular Biology
    Authors
    Lu; Tzovaras; Gough
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The open-source GenomePrep toolkit, developed on the goodwill of open genome data, addresses the problem of processing raw DTC DNA data in the context of the present: genotype arrays. The outputs of GenomePrep are DNA data files in homogeneous formats (23andMe-like or VCF), which enable further research analysis. A single combined data freeze of the OpenSNP genomes that passed checks is available.

    For more information, visit https://supfam.mrc-lmb.cam.ac.uk/GenomePrep/

  5. Global At Home DNA Test Kits Market Research Report: By Test Type (Ancestry...

    • wiseguyreports.com
    Updated Aug 22, 2025
    + more versions
    Cite
    (2025). Global At Home DNA Test Kits Market Research Report: By Test Type (Ancestry Testing, Health Testing, Paternity Testing, Nutritional Testing), By Application (Personal Use, Clinical Use, Research Use), By Distribution Channel (Online Retail, Pharmaceutical Stores, Supermarkets, Specialty Stores), By Age Group (Children, Adults, Seniors) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/at-home-dna-tests-kits-market
    Dataset updated
    Aug 22, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 1, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 1,000 (USD Million)
    MARKET SIZE 2025: 1,200 (USD Million)
    MARKET SIZE 2035: 3,500 (USD Million)
    SEGMENTS COVERED: Test Type, Application, Distribution Channel, Age Group, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increasing consumer awareness, technological advancements, rise in health consciousness, direct-to-consumer sales, regulatory challenges
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Map My Genome, Orig3n, Gene by Gene, Vitagene, MyHeritage, Living DNA, Helix, Genomind, Everlywell, Genetic Technologies, 23andMe, DNAfit, FamilyTreeDNA, Strands, Ancestry
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Personalized health insights, Ancestry exploration services, Genetic disease risk assessment, Enhanced marketing strategies, Partnerships with healthcare providers
    COMPOUND ANNUAL GROWTH RATE (CAGR): 11.7% (2025 - 2035)
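For readers who want to relate the listed market sizes to the growth rate, the standard compound annual growth rate formula can be sketched as follows. The report does not state how its 11.7% figure was derived, so the choice of the 2025 and 2035 values as endpoints is an assumption. Computed this way, the result lands near, but not exactly on, the listed rate.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Assumed endpoints: market size 2025 and 2035, in USD million, from the listing.
rate = cagr(1200, 3500, 2035 - 2025)
print(f"{rate:.1%}")
```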

My Complete Genome

6,000 Base-Pairs of Phenotype SNPs - Complete Raw Data

50 scholarly articles cite this dataset (View in Google Scholar)