100+ datasets found
  1. Wikipedia notable people

    • kaggle.com
    zip
    Updated Jun 15, 2023
    Cite
    Konrad Banachewicz (2023). Wikipedia notable people [Dataset]. https://www.kaggle.com/datasets/konradb/wikipedia-notable-people
    Explore at:
    Available download formats: zip (268529204 bytes)
    Dataset updated
    Jun 15, 2023
    Authors
    Konrad Banachewicz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    From the original paper:

    A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each retrieved information. For some variables, Wikipedia adds 15% more information when missing in Wikidata. We find very few errors in the part of the database that contains the most documented individuals but nontrivial error rates in the bottom of the notability distribution, due to sparse information and classification errors or ambiguity. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human being having ever lived), including a third who are not present in the English edition of Wikipedia.

  2. Human Tracking & Object Detection Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2023
    + more versions
    Cite
    Unique Data (2023). Human Tracking & Object Detection Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/people-tracking
    Explore at:
    Available download formats: zip (46156442 bytes)
    Dataset updated
    Jul 27, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    People Tracking & Object Detection dataset

    The dataset comprises annotated video frames from a camera positioned in a public space. Each individual in the camera's view was tracked using the rectangle tool in the Computer Vision Annotation Tool (CVAT).

    The dataset was created on the basis of the Real-Time Traffic Video Dataset.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2Fc5a8dc4f63fe85c64a5fead10fad3031%2Fpersons_gif.gif?generation=1690705558283123&alt=media

    Dataset Structure

    • The images directory houses the original video frames, serving as the primary source of raw data.
    • The annotations.xml file provides the detailed annotation data for the images.
    • The boxes directory contains frames that visually represent the bounding box annotations, showing the locations of the tracked individuals within each frame. These images can be used to understand how the tracking has been implemented and to visualize the marked areas for each individual.

    Data Format

    The annotations are rectangular bounding boxes placed around each individual. Each bounding box annotation contains the position (xtl, ytl, xbr, ybr coordinates) of the respective box within the frame.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F4f274551e10db2754c4d8a16dff97b33%2Fcarbon%20(10).png?generation=1687776281548084&alt=media
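As a rough illustration of the box format described above, the sketch below reads xtl, ytl, xbr, ybr attributes from `box` elements in an XML string. The sample XML is hypothetical and only imitates a CVAT-style export; the actual layout of this dataset's annotations.xml (element nesting, extra attributes) is an assumption and may differ.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Box(NamedTuple):
    xtl: float  # x of the top-left corner
    ytl: float  # y of the top-left corner
    xbr: float  # x of the bottom-right corner
    ybr: float  # y of the bottom-right corner

    @property
    def width(self) -> float:
        return self.xbr - self.xtl

    @property
    def height(self) -> float:
        return self.ybr - self.ytl


def read_boxes(xml_text: str) -> list[Box]:
    """Collect every <box> element, wherever it is nested in the document."""
    root = ET.fromstring(xml_text)
    return [
        Box(*(float(el.attrib[key]) for key in ("xtl", "ytl", "xbr", "ybr")))
        for el in root.iter("box")
    ]


# Hypothetical sample, NOT taken from the dataset; element names are assumed.
SAMPLE = """<annotations>
  <track id="0" label="person">
    <box frame="0" xtl="10.0" ytl="20.0" xbr="60.0" ybr="140.0"/>
    <box frame="1" xtl="12.5" ytl="21.0" xbr="62.5" ybr="141.0"/>
  </track>
</annotations>"""

boxes = read_boxes(SAMPLE)
for box in boxes:
    print(box, box.width, box.height)
```

Searching for `box` elements at any depth keeps the sketch independent of whether boxes are grouped per track or per image, which the description does not specify.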

    👉 Legally sourced datasets and carefully structured for AI training and model development. Explore samples from our dataset of 95,000+ human images & videos - Full dataset

    🚀 You can learn more about our high-quality unique datasets here

    keywords: multiple people tracking, human detection dataset, object detection dataset, people tracking dataset, tracking human object interactions, human Identification tracking dataset, people detection annotations, detecting human in a crowd, human trafficking dataset, deep learning object tracking, multi-object tracking dataset, labeled web tracking dataset, large-scale object tracking dataset

  3. HmtDB - Human Mitochondrial DataBase

    • neuinfo.org
    • scicrunch.org
    • +1more
    Updated May 16, 2018
    + more versions
    Cite
    (2018). HmtDB - Human Mitochondrial DataBase [Dataset]. http://identifiers.org/RRID:SCR_007713
    Explore at:
    Dataset updated
    May 16, 2018
    Description

    A human mitochondrial resource aimed at supporting population genetics and mitochondrial disease studies. It consists of a database of Human Mitochondrial Genomes annotated with population and variability data, the latter estimated through the application of a new approach based on site-specific nucleotidic and aminoacidic variability calculation (SiteVar and MitVarProt programs). The goals of HmtDB are: to collect and integrate the publicly available human mitochondrial genome data; to produce and provide the scientific community with site-specific nucleotidic and aminoacidic variability data estimated on all the collected human mitochondrial genome sequences; to allow any researcher to analyse their own human mitochondrial sequences (both complete and partial mitochondrial genomes) in order to automatically detect the nucleotidic variants compared to the revised Cambridge Reference Sequence (rCRS) and to predict their haplogroup paternity. HmtDB's first release contains 1255 human mitochondrial genomes derived from public databases (GenBank and MitoKor). The genomes have been stored and analysed as a whole dataset and grouped in continent-specific subsets (AF: Africa, AM: America, AS: Asia, EU: Europe, OC: Oceania). The multialignment and site-variability analysis tools included in HmtDB are clustered in two Work Flows: the Variability Generation Work Flow (VGWF) and the Classification Work Flow (CWF), which are applied to human mitochondrial genomes stored in the database and to newly sequenced genomes submitted by the user, respectively. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.

  4. Human Exposure Database System (HEDS)

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Dec 4, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) - National Exposure Research Laboratory (NERL) (2020). Human Exposure Database System (HEDS) [Dataset]. https://catalog.data.gov/dataset/human-exposure-database-system-heds
    Explore at:
    Dataset updated
    Dec 4, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The Human Exposure Database System (HEDS) provides public access to data sets, documents, and metadata from EPA on human exposure. It is primarily intended for scientists involved in human exposure studies or work requiring such data.

  5. Human Images Dataset - Men and Women

    • kaggle.com
    zip
    Updated Aug 3, 2024
    Cite
    mahsa sanaei (2024). Human Images Dataset - Men and Women [Dataset]. https://www.kaggle.com/datasets/snmahsa/human-images-dataset-men-and-women
    Explore at:
    Available download formats: zip (725552194 bytes)
    Dataset updated
    Aug 3, 2024
    Authors
    mahsa sanaei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description:

    This dataset includes two folders of images of people. One folder contains images of men, and the other contains images of women. The images include faces, upper bodies, and full bodies. This dataset can be used for various projects like gender recognition, human identification, and image classification.

    Use Cases:

    1. Gender Recognition: For algorithms that recognize gender based on images.
    2. Human Identification: To improve models for identifying humans in images and videos.
    3. Image Classification: For classifying images into categories of men and women.
  6. Human Mortality Database

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jun 20, 2014
    Cite
    (2014). Human Mortality Database [Dataset]. http://identifiers.org/RRID:SCR_002370
    Explore at:
    Dataset updated
    Jun 20, 2014
    Description

    A database providing detailed mortality and population data to those interested in the history of human longevity. For each country, the database includes calculated death rates and life tables by age, time, and sex, along with all of the raw data (vital statistics, census counts, population estimates) used in computing these quantities. Data are presented in a variety of formats with regard to age groups and time periods. The main goal of the database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. New data series are continually added to this collection. However, the database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included are relatively wealthy and for the most part highly industrialized. The database replaces an earlier NIA-funded project, known as the Berkeley Mortality Database.

    • Dates of Study: 1751-present
    • Study Features: Longitudinal, International
    • Sample Size: 37 countries or areas

  7. Persons Removed from Doing Business Database

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Nov 29, 2021
    Cite
    data.cityofnewyork.us (2021). Persons Removed from Doing Business Database [Dataset]. https://catalog.data.gov/dataset/persons-removed-from-doing-business-database
    Explore at:
    Dataset updated
    Nov 29, 2021
    Dataset provided by
    data.cityofnewyork.us
    Description

    This report includes a list of entities or individuals that should not be listed in the Database because the entity does not participate in transactions covered by LL 34, or the individual does not hold one of the positions noted above.

  8. Human Life-Table Database

    • neuinfo.org
    • rrid.site
    • +2more
    Cite
    Human Life-Table Database [Dataset]. http://identifiers.org/RRID:SCR_006248
    Explore at:
    Description

    A collection of population life tables covering a multitude of countries and many years. Most of the HLD life tables are life tables for national populations, which have been officially published by national statistical offices. Some of the HLD life tables refer to certain regional or ethnic sub-populations within countries. Parts of the HLD life tables are non-official life tables produced by researchers. Life tables describe the extent to which a generation of people (i.e. life table cohort) dies off with age. Life tables are the most ancient and important tool in demography. They are widely used for descriptive and analytical purposes in demography, public health, epidemiology, population geography, biology and many other branches of science. HLD includes the following types of data:

    • complete life tables in text format;
    • abridged life tables in text format;
    • references to statistical publications and other data sources;
    • scanned copies of the original life tables as they were published.

    Three scientific institutions are jointly developing the HLD: the Max Planck Institute for Demographic Research (MPIDR) in Rostock, Germany, the Department of Demography at the University of California at Berkeley, USA and the Institut national d'études démographiques (INED) in Paris, France. The MPIDR is responsible for maintaining the database.

  9. Human Face Database - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Nov 25, 2024
    Cite
    (2024). Human Face Database - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/human-face-database
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    A human face dataset used for evaluating image alignment techniques, containing altered and deformed images of human faces for testing alignment accuracy.

  10. Human Brain Connectivity Database

    • neuinfo.org
    • dknet.org
    • +3more
    Updated Jan 29, 2022
    Cite
    (2022). Human Brain Connectivity Database [Dataset]. http://identifiers.org/RRID:SCR_001594
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Preliminary database of neuroanatomical connectivity reports specifically for the human brain, which have been manually curated. It includes details (based on manual literature curation) of tract tracing or related connectivity studies conducted in human brain tissue. This database and user interface will be expanded and improved in the near future.

  11. Human Gene and Protein Database (HGPD)

    • neuinfo.org
    • scicrunch.org
    • +1more
    Updated Nov 23, 2008
    Cite
    (2008). Human Gene and Protein Database (HGPD) [Dataset]. http://identifiers.org/RRID:SCR_002889
    Explore at:
    Dataset updated
    Nov 23, 2008
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 4, 2023. The Human Gene and Protein Database presents SDS-PAGE patterns and other information on human genes and proteins. The HGPD was constructed from full-length cDNAs. For conversion to Gateway entry clones, we first determined an open reading frame (ORF) region in each cDNA meeting the criteria. Those ORF regions were PCR-amplified utilizing selected resource cDNAs as templates. All the details of the construction and utilization of entry clones will be published elsewhere. Amino acid and nucleotide sequences of an ORF for each cDNA and sequence differences of Gateway entry clones from source cDNAs are presented in the GW: Gateway Summary window. Utilizing those clones with a very efficient cell-free protein synthesis system featuring wheat germ, we have produced a large number of human proteins in vitro. Expressed proteins were detected in almost all cases. Proteins in both total and supernatant fractions are shown in the PE: Protein Expression window. In addition, we have also successfully expressed proteins in HeLa cells and determined subcellular localizations of human proteins. These biological data are presented on the frame of cDNA clusters in the Human Gene and Protein Database. To build the basic frame of HGPD, sequences of FLJ full-length cDNAs and others deposited in public databases (Human ESTs, RefSeq, Ensembl, MGC, etc.) are assembled onto the genome sequences (NCBI Build 35 (UCSC hg17)). The majority of analysis data for cDNA sequences in HGPD are shared with the FLJ Human cDNA Database (http://flj.hinv.jp/) constructed as a human cDNA sequence analysis database focusing on mRNA varieties caused by variations in transcription start site (TSS) and splicing.

  12. The Therapeutic Drug Target Database Human SwissProt

    • johnsnowlabs.com
    csv
    Cite
    John Snow Labs, The Therapeutic Drug Target Database Human SwissProt [Dataset]. https://www.johnsnowlabs.com/marketplace/the-therapeutic-drug-target-database-human-swissprot/
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset is a selection of The Therapeutic Target Database (release 4.3.02, 18th Oct 2013) protein IDs for successful targets. The web page states 388 targets, but these reduce to 345 human Swiss-Prot accessions.

  13. US EPA The Consolidated Human Activity Database (CHAD)

    • catalog.data.gov
    • data.ca.gov
    • +1more
    Updated Nov 27, 2024
    Cite
    California Environmental Protection Agency (2024). US EPA The Consolidated Human Activity Database (CHAD) [Dataset]. https://catalog.data.gov/dataset/us-epa-the-consolidated-human-activity-database-chad
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Environmental Protection Agency
    Description

    The Consolidated Human Activity Database (CHAD) is a resource for learning about human exposure and health studies and predictive models.

  14. Human Transcriptome Database for Alternative Splicing

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human alternative splicing (AS) analysis. It contains:

    • Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
        • H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
        • H-Inv all transcripts (resource summary): Published human mRNA dataset
        • Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
    • RASVs affecting protein functions such as protein motif, GO, subcellular localization signal and transmembrane domain
    • Conserved RASVs compared with mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)

  15. Armenia in European database on human and technical resources for health

    • data.opendata.am
    Updated Jun 16, 2023
    Cite
    (2023). Armenia in European database on human and technical resources for health [Dataset]. https://data.opendata.am/dataset/human-and-technical-resources-for-health
    Explore at:
    Dataset updated
    Jun 16, 2023
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    Armenia
    Description

    HlthRes-DB provides a wide range of statistics on human and technical resources for health and offers data on non-monetary health care resources collected through the joint work of the Statistical Office of the European Union (Eurostat), the Organisation for Economic Co-operation and Development (OECD) and WHO/Europe. It contains indicators on human resources for health. Data on Armenia cover 84 indicators, 1980-2019

  16. Selfies & ID Images Dataset, 95,000 files

    • kaggle.com
    zip
    Updated Aug 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KUCEV ROMAN (2023). Selfies & ID Images Dataset, 95,000 files [Dataset]. https://www.kaggle.com/datasets/tapakah68/selfies-id-images-dataset
    Explore at:
    Available download formats: zip (975731811 bytes)
    Dataset updated
    Aug 1, 2023
    Authors
    KUCEV ROMAN
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Selfies, ID Images Face Dataset

    5 591 sets, each of which includes 2 photos of a person from their documents and 13 selfies. 571 sets of Hispanics and 3512 sets of Caucasians.

    The document photos contain only a photo of the person. All personal information from the document is hidden.

    💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on roman@kucev.com to buy the dataset

    Metadata for the full dataset:

    • assignment_id - unique identifier of the media file
    • worker_id - unique identifier of the person
    • age - age of the person
    • true_gender - gender of the person
    • country - country of the person
    • ethnicity - ethnicity of the person
    • photo_1_extension, photo_2_extension, …, photo_15_extension - photo extensions in the dataset
    • photo_1_resolution, photo_2_resolution, …, photo_15_resolution - photo resolution in the dataset

    Content

    The dataset includes 2 folders:

    • 18_sets_Caucasians - images of Caucasian people
    • 11_sets_Hispanics - images of Hispanic people

    Each folder contains one subfolder for every person in the dataset. Files are named "ID_1", "ID_2" for ID images and "Selfie_1",..."Selfie_13" for selfies.

    https://sun9-53.userapi.com/impg/dOFVs6YsLexi-rM0LBud5rc6bVsCQPq5bIvrnA/S-3MRJPo-IE.jpg?size=2560x1054&quality=95&sign=16fc124e8f61d43a371cf4f0712f6a14&type=album
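The folder layout described above (one subfolder per person, holding "ID_*" and "Selfie_*" files) could be indexed roughly as follows. This is only a sketch: the per-person folder name "person_0" and the ".jpg" extension are hypothetical, and the demo runs against a temporary directory rather than the real download.

```python
from pathlib import Path
import tempfile


def collect_sets(group_dir: Path) -> dict[str, dict[str, list[str]]]:
    """Map each person folder to its ID images and selfies (sorted file names)."""
    sets = {}
    for person in sorted(p for p in group_dir.iterdir() if p.is_dir()):
        files = sorted(f.name for f in person.iterdir() if f.is_file())
        sets[person.name] = {
            "ids": [n for n in files if n.startswith("ID_")],
            "selfies": [n for n in files if n.startswith("Selfie_")],
        }
    return sets


# Demo on a throwaway directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    group = Path(tmp) / "18_sets_Caucasians"
    person = group / "person_0"  # hypothetical per-person folder name
    person.mkdir(parents=True)
    for name in ["ID_1", "ID_2"] + [f"Selfie_{i}" for i in range(1, 14)]:
        (person / f"{name}.jpg").touch()  # ".jpg" is an assumed extension
    demo = collect_sets(group)

print({name: {k: len(v) for k, v in s.items()} for name, s in demo.items()})
```

Matching on the file-name prefix rather than the full name avoids depending on the extension, which the metadata lists per photo and which may vary.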

    💴 Buy the Dataset: This is just an example of the data. Leave a request on roman@kucev.com to discuss your requirements, learn about the price and buy the dataset.

    keywords: biometric system, biometric dataset, face recognition database, face recognition dataset, face detection dataset, facial analysis, object detection dataset, deep learning datasets, computer vision dataset, human images dataset, human faces dataset, machine learning, image-to-image, re-identification, id photos, selfies and paired id, photos, id verification models, passport, id card image, digital photo-identification

  17. BrainWeb - Simulated Brain Database

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). BrainWeb - Simulated Brain Database [Dataset]. http://identifiers.org/RRID:SCR_003263
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database of human brain images derived from a realistic phantom and generated using a sophisticated MRI simulator. Custom simulations may be generated to match a user's selected parameters. The goal is to aid validation of computer-aided quantitative analysis of medical image data. The SBD contains a set of realistic MRI data volumes produced by an MRI simulator. These data can be used by the neuroimaging community to evaluate the performance of various image analysis methods in a setting where the truth is known. The SBD contains simulated brain MRI data based on two anatomical models: normal and multiple sclerosis (MS). For both of these, full 3-dimensional data volumes have been simulated using three sequences (T1-, T2-, and proton-density- (PD-) weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity. These data are available for viewing in three orthogonal views (transversal, sagittal, and coronal), and for downloading.

  18. Database of Human Hemoglobin Variants and Thalassemias

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Jun 5, 2023
    + more versions
    Cite
    (2023). Database of Human Hemoglobin Variants and Thalassemias [Dataset]. http://identifiers.org/RRID:SCR_007084
    Explore at:
    Dataset updated
    Jun 5, 2023
    Description

    HbVar is a relational database of information about hemoglobin variants and mutations that cause thalassemia. The initial data came from Syllabi authored by Prof. Titus H.J. Huisman, Mrs. Marianne F.H. Carver, Dr. Erol Baysal, and Prof. Georgi D. Efremov. This information was converted to a database, and now new entries are added and old entries are corrected by curators. HbVar results from a collaboration among several investigators at Penn State University (USA), INSERM Creteil (France), and Boston University Medical Center (USA). Visit our query page or summary page to see the types of information available.

  19. Scottish Drug Misuse Database (SDMD) People in Treatment - Dataset -...

    • ckan.publishing.service.gov.uk
    Updated Mar 7, 2013
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2013). Scottish Drug Misuse Database (SDMD) People in Treatment - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/scottish_drug_misuse_database_sdmd_people_in_treatment
    Explore at:
    Dataset updated
    Mar 7, 2013
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Scotland
    Description

    Information on individuals presenting to drug treatment services, their journey through treatment and, using person 'follow up' data, an exploration of treatment outcomes.

    • Source agency: ISD Scotland (part of NHS National Services Scotland)
    • Designation: Official Statistics not designated as National Statistics
    • Language: English
    • Alternative title: SDMD People in Treatment

  20. Data from: Visible Human Project

    • healthdata.gov
    • datadiscovery.nlm.nih.gov
    • +3more
    csv, xlsx, xml
    Updated Mar 1, 2023
    + more versions
    Cite
    datadiscovery.nlm.nih.gov (2023). Visible Human Project [Dataset]. https://healthdata.gov/NIH/Visible-Human-Project/krti-uwg9
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    The NLM Visible Human Project® has created publicly-available complete, anatomically detailed, three-dimensional representations of a human male body and a human female body. Specifically, the VHP provides a public-domain library of cross-sectional cryosection, CT, and MRI images obtained from one male cadaver and one female cadaver. The Visible Man data set was publicly released in 1994 and the Visible Woman in 1995.

    https://www.nlm.nih.gov/research/visible/visible_human.html


Wikipedia notable people

A Brief History of Human Time - Cross-verified Dataset

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
