https://www.lseg.com/en/policies/website-disclaimer
People data provides complete people information and gives the ability to link individual information to organizations and roles.
HmtDB is a human mitochondrial resource aimed at supporting population genetics and mitochondrial disease studies. It consists of a database of human mitochondrial genomes annotated with population and variability data, the latter estimated with a new approach based on site-specific nucleotide and amino acid variability calculation (the SiteVar and MitVarProt programs). The goals of HmtDB are: to collect and integrate publicly available human mitochondrial genome data; to produce and provide the scientific community with site-specific nucleotide and amino acid variability data estimated on all the collected human mitochondrial genome sequences; and to allow any researcher to analyse their own human mitochondrial sequences (both complete and partial mitochondrial genomes) in order to automatically detect nucleotide variants relative to the revised Cambridge Reference Sequence (rCRS) and to predict their haplogroup assignment. HmtDB's first release contains 1255 human mitochondrial genomes derived from public databases (GenBank and MitoKor). The genomes have been stored and analysed as a whole dataset and grouped in continent-specific subsets (AF: Africa, AM: America, AS: Asia, EU: Europe, OC: Oceania). The multialignment and site-variability analysis tools included in HmtDB are clustered in two workflows: the Variability Generation Work Flow (VGWF), applied to the human mitochondrial genomes stored in the database, and the Classification Work Flow (CWF), applied to newly sequenced genomes submitted by the user.
The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.
The Consolidated Human Activity Database (CHAD) is a resource for learning about human exposure and health studies and predictive models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Human Presence Database is a dataset for object detection tasks - it contains Person annotations for 285 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EOG
Copy of https://www.kaggle.com/datasets/kisoibo/countries-databasesqlite
Updated the name of the table from 'countries of the world' to 'countries', for ease of writing queries.
Info about the dataset:
| Table | Total Rows | Total Columns |
| --- | --- | --- |
| countries of the world | 0 | 20 |

Columns: Country, Region, Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Infant mortality (per 1000 births), GDP ($ per capita), Literacy (%), Phones (per 1000), Arable (%), Crops (%), Other (%), Climate, Birthrate, Deathrate, Agriculture, Industry, Service
Acknowledgements Source: All these data sets are made up of data from the US government. Generally they are free to use if you use the data in the US. If you are outside of the US, you may need to contact the US Govt to ask. Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission." https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html
When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.
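The grouping and weighting mentioned above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the rows and figures below are made up, and the demo table carries just four of the twenty columns. Column names that contain spaces or symbols, such as `GDP ($ per capita)`, need double quotes in SQLite.

```python
import sqlite3

# Hypothetical sample rows (country, region, population, GDP per capita).
# The real table has 20 columns and many more countries.
SAMPLE_ROWS = [
    ("Aland", "NORTH", 1_000_000, 20_000),
    ("Borduria", "NORTH", 3_000_000, 10_000),
    ("Costaguana", "SOUTH", 2_000_000, 5_000),
]


def build_demo_db():
    """Create an in-memory stand-in for the renamed 'countries' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE countries ('
        '"Country" TEXT, "Region" TEXT, "Population" INTEGER, '
        '"GDP ($ per capita)" REAL)'
    )
    conn.executemany("INSERT INTO countries VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def weighted_gdp_by_region(conn):
    """Group countries by region and weight GDP per capita by population."""
    query = '''
        SELECT "Region",
               SUM("Population") AS total_population,
               SUM("Population" * "GDP ($ per capita)") / SUM("Population")
                   AS weighted_gdp
        FROM countries
        GROUP BY "Region"
        ORDER BY "Region"
    '''
    return conn.execute(query).fetchall()


conn = build_demo_db()
result = weighted_gdp_by_region(conn)
for region, population, gdp in result:
    print(region, population, round(gdp, 1))
```

Against the actual SQLite file, the same query should work after replacing `build_demo_db()` with `sqlite3.connect(...)` pointed at the downloaded database, provided the column names match those listed above.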
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This multi-camera surveillance dataset, the SAIVT-SoftBio database, was captured from an existing surveillance network to enable the evaluation of person recognition and re-identification models in a real-life multi-camera surveillance environment.
The dataset consists of 150 people moving through a building environment, recorded by eight surveillance cameras. Each camera captures data at 25 frames per second, at a resolution of 704 x 576 pixels, and is calibrated using Tsai’s method. The placement of cameras is a real-life surveillance setup, and cameras have been placed to provide maximal coverage of the space (with some overlap) and observation of the entrances to the building. The dataset was collected in an uncontrolled manner, so subjects can travel any route through the building. Thus, the vast majority of subjects will only pass through a subset of the camera network and that subset varies from person to person. This provides a highly unconstrained environment in which to test person re-identification models.
The frames are recorded from when the subject enters the building through one of the three main doorways visible in Camera 4, Camera 7 and Camera 5/8, until they leave observation either by exiting the building or by entering a lecture theatre. Any frames that are significantly occluded have been omitted.
XML files are used to store information about the database so that different evaluations can easily be performed on whichever subset of the dataset fits the desired criteria. For each subject, an XML file summarises the camera views and frame information, which can be used to select subjects that fit the desired evaluation conditions (e.g. only subjects that exist in specific cameras or locations can be selected).
The overall dataset is also summarised in an XML file, which provides information on the camera calibration data for each subject.
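A minimal sketch of selecting subjects by camera from per-subject XML summaries is shown below. The element and attribute names (`subject`, `camera`, `id`, `first_frame`, `last_frame`) and the frame numbers are assumptions made for illustration; the actual SAIVT-SoftBio XML schema is not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-subject summaries; the real schema may use other names.
SUBJECT_XML = {
    "subject_001": """<subject id="001">
        <camera id="4" first_frame="120" last_frame="310"/>
        <camera id="7" first_frame="400" last_frame="520"/>
    </subject>""",
    "subject_002": """<subject id="002">
        <camera id="4" first_frame="90" last_frame="200"/>
    </subject>""",
    "subject_003": """<subject id="003">
        <camera id="4" first_frame="15" last_frame="75"/>
        <camera id="5" first_frame="80" last_frame="140"/>
        <camera id="7" first_frame="150" last_frame="260"/>
    </subject>""",
}


def cameras_for(xml_text):
    """Return the set of camera ids listed in one subject summary."""
    root = ET.fromstring(xml_text)
    return {int(cam.get("id")) for cam in root.findall("camera")}


def select_subjects(summaries, required_cameras):
    """Keep only subjects that appear in every required camera."""
    required = set(required_cameras)
    return sorted(
        name
        for name, xml_text in summaries.items()
        if required <= cameras_for(xml_text)
    )


selected = select_subjects(SUBJECT_XML, [4, 7])
print(selected)
```

The subset test (`required <= cameras_for(...)`) reflects the point that most subjects pass through only part of the camera network, so an evaluation usually starts by filtering on the cameras it needs.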
A database providing detailed mortality and population data to those interested in the history of human longevity. For each country, the database includes calculated death rates and life tables by age, time, and sex, along with all of the raw data (vital statistics, census counts, population estimates) used in computing these quantities. Data are presented in a variety of formats with regard to age groups and time periods. The main goal of the database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. New data series are continually added to this collection. However, the database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included are relatively wealthy and for the most part highly industrialized. The database replaces an earlier NIA-funded project, known as the Berkeley Mortality Database.
* Dates of Study: 1751-present
* Study Features: Longitudinal, International
* Sample Size: 37 countries or areas
The Genetic Association Database is an archive of human genetic association studies of complex diseases and disorders. The goal of this database is to allow the user to rapidly identify medically relevant polymorphisms from the large volume of polymorphism and mutational data, in the context of standardized nomenclature. The data come from published scientific papers. Study data are recorded in the context of official human gene nomenclature with additional molecular reference numbers and links. The database is gene centered: each record is a record of a gene or marker, so a study that investigated 6 genes for a particular disorder yields 6 records. Anyone may view this database and anyone may submit records. You do not have to be an author on the original study to submit a record. All submitted records will be reviewed before inclusion in the archive.
Both genetic and environmental factors contribute to human diseases. Most common diseases are influenced by a large number of genetic and environmental factors, most of which individually have only a modest effect on the disease. Though genetic contributions are relatively well characterized for some monogenetic diseases, there has been no effort at curating the extensive list of environmental etiological factors. From a comprehensive search of the MeSH annotation of MEDLINE articles, they identified 3,342 environmental etiological factors associated with 3,159 diseases. They also identified 1,100 genes associated with 1,034 complex diseases from the NIH Genetic Association Database (GAD), a database of genetic association studies. 863 diseases have both genetic and environmental etiological factors available. Integrating genetic and environmental factors results in the etiome, which they define as the comprehensive compendium of disease etiology.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
“The Global Human Settlement Layer Urban Centres Database (GHS-UCDB) is the most complete database on cities to date, publicly released as an open and free dataset. The database represents the global status on Urban Centres in 2015 by offering cities location, their extent (surface, shape), and describing each city with a set of geographical, socio-economic and environmental attributes, many of them going back 25 or even 40 years in time.”
Additional information: The Urban Centres are defined by specific cut-off values on resident population and built-up surface share in a 1x1 km uniform global grid. See ghs_stat_ucdb2015mt_globe_r2019a_v1_0_web_1.pdf for more information. Views of this layer are used in web maps for the ArcGIS Living Atlas of the World.
Source: Global Human Settlement - Urban Centre database R2019A - European Commission | Last accessed: 25.04.2025
Data as of: 2019
RegDB is used for Visible-Infrared Re-ID which handles the cross-modality matching between the daytime visible and night-time infrared images. The dataset contains images of 412 people. It includes 10 color and 10 thermal images for each person.
A collection of population life tables covering a multitude of countries and many years. Most of the HLD life tables are life tables for national populations, which have been officially published by national statistical offices. Some of the HLD life tables refer to certain regional or ethnic sub-populations within countries. Parts of the HLD life tables are non-official life tables produced by researchers. Life tables describe the extent to which a generation of people (i.e. life table cohort) dies off with age. Life tables are the most ancient and important tool in demography. They are widely used for descriptive and analytical purposes in demography, public health, epidemiology, population geography, biology and many other branches of science. HLD includes the following types of data:
* complete life tables in text format;
* abridged life tables in text format;
* references to statistical publications and other data sources;
* scanned copies of the original life tables as they were published.
Three scientific institutions are jointly developing the HLD: the Max Planck Institute for Demographic Research (MPIDR) in Rostock, Germany, the Department of Demography at the University of California at Berkeley, USA and the Institut national d'études démographiques (INED) in Paris, France. The MPIDR is responsible for maintaining the database.
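To make the idea of a cohort "dying off with age" concrete, here is a minimal sketch of the core life table columns. It assumes single-year age intervals and uses the standard relation l(x+1) = l(x) · (1 − q(x)), where q(x) is the probability of dying between ages x and x+1. The `qx` values and the radix of 100,000 are made up for illustration and are not taken from the HLD.

```python
def life_table(qx, radix=100_000):
    """Build survivors (lx) and deaths (dx) from death probabilities (qx).

    qx[i] is the probability of dying between exact ages i and i + 1.
    radix is the size of the hypothetical starting cohort at age 0.
    """
    rows = []
    survivors = float(radix)
    for age, q in enumerate(qx):
        deaths = survivors * q  # dx: deaths within this age interval
        rows.append({"age": age, "qx": q, "lx": survivors, "dx": deaths})
        survivors -= deaths  # lx for the next age
    return rows


# Hypothetical death probabilities for a tiny four-age example;
# the final value of 1.0 closes the table (everyone left dies).
example_qx = [0.01, 0.002, 0.5, 1.0]
table = life_table(example_qx)
for row in table:
    print(row["age"], round(row["lx"]), round(row["dx"]))
```

Published life tables usually carry further columns (person-years lived, life expectancy) and, in abridged form, multi-year age groups; those are left out of this sketch.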
A large body of research has demonstrated that land use and urban form can have a significant effect on transportation outcomes. People who live and/or work in compact neighborhoods with a walkable street grid and easy access to public transit, jobs, stores, and services are more likely to have several transportation options to meet their everyday needs. As a result, they can choose to drive less, which reduces their emissions of greenhouse gases and other pollutants compared to people who live and work in places that are not location efficient. Walking, biking, and taking public transit can also save people money and improve their health by encouraging physical activity. The Smart Location Database summarizes several demographic, employment, and built environment variables for every census block group (CBG) in the United States. The database includes indicators of the commonly cited “D” variables shown in the transportation research literature to be related to travel behavior. The Ds include residential and employment density, land use diversity, design of the built environment, access to destinations, and distance to transit. SLD variables can be used as inputs to travel demand models, baseline data for scenario planning studies, and combined into composite indicators characterizing the relative location efficiency of CBG within U.S. metropolitan regions. This update features the most recent geographic boundaries (2019 Census Block Groups) and new and expanded sources of data used to calculate variables. Entirely new variables have been added and the methods used to calculate some of the SLD variables have changed. More information on the National Walkability index: https://www.epa.gov/smartgrowth/smart-location-mapping More information on the Smart Location Calculator: https://www.slc.gsa.gov/slc/
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 4, 2023. The Human Gene and Protein Database (HGPD) presents SDS-PAGE patterns and other information on human genes and proteins. The HGPD was constructed from full-length cDNAs. For conversion to Gateway entry clones, we first determined an open reading frame (ORF) region in each cDNA meeting the criteria. Those ORF regions were PCR-amplified utilizing selected resource cDNAs as templates. All the details of the construction and utilization of entry clones will be published elsewhere. Amino acid and nucleotide sequences of an ORF for each cDNA and sequence differences of Gateway entry clones from source cDNAs are presented in the GW: Gateway Summary window. Utilizing those clones with a very efficient cell-free protein synthesis system featuring wheat germ, we have produced a large number of human proteins in vitro. Expressed proteins were detected in almost all cases. Proteins in both total and supernatant fractions are shown in the PE: Protein Expression window. In addition, we have also successfully expressed proteins in HeLa cells and determined subcellular localizations of human proteins. These biological data are presented on the frame of cDNA clusters in the Human Gene and Protein Database. To build the basic frame of HGPD, sequences of FLJ full-length cDNAs and others deposited in public databases (Human ESTs, RefSeq, Ensembl, MGC, etc.) are assembled onto the genome sequences (NCBI Build 35 (UCSC hg17)). The majority of analysis data for cDNA sequences in HGPD are shared with the FLJ Human cDNA Database (http://flj.hinv.jp/), constructed as a human cDNA sequence analysis database focusing on mRNA varieties caused by variations in transcription start site (TSS) and splicing.
To accelerate the process of tumor antigen discovery, we generated a publicly available Human Potential Tumor Associated Antigen database (HPtaa) with pTAAs identified by in silico computation. 3518 potential targets have been included in the database, which is freely available to academic users. It successfully identified 41 of 82 known Cancer-Testis antigens, 6 of 18 differentiation antigens, 2 of 2 oncofetal antigens, and 7 of 12 FDA-approved cancer markers that have a Gene ID, and should therefore provide a good platform for the identification of cancer target genes. This database utilizes expression data from various expression platforms, including carefully chosen publicly available microarray expression data, GEO SAGE data, and Unigene expression data. In addition, other relevant databases required for TAA discovery, such as CGAP, CCDS, and the gene ontology database, were also incorporated. In order to integrate the different expression platforms, various strategies and algorithms have been developed. Known tumor antigens are gathered from the literature and serve as training sets. A total tumor specificity penalty was computed for each gene from the positive clue penalty for differential expression in human cancers, the corresponding differential ratio, and the normal tissue restriction penalty. We hope this database will help with the process of cancer immunome identification and thus help improve the diagnosis and treatment of human carcinomas.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This document consists of the corpus of image databases examined for race and gender information as published in:
Morgan Klaus Scheuerman, Kandrea Wade, Caitlin Lustig, and Jed R. Brubaker. 2020. How We’ve Taught Algorithms to See Identity: Constructing Race and Gender in Image Databases for Facial Analysis. Proc. ACM Hum.-Comput. CSCW.
This code book includes:
Whether race/gender is present implicitly (as descriptive, but not annotated) or explicitly (annotated/labeled).
What categories or descriptions of race/gender are used.
Whether those categories/descriptions use underlying source material to justify or motivate their descriptions of race/gender.
Whether explicitly annotated databases describe the process of annotating race/gender.
This dataset was created by WinstonSDodson
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bottom-up proteomics approaches rely on database searches that compare experimental values of peptides to theoretical values derived from protein sequences in a database. While the human body can produce millions of distinct antibodies, current databases for human antibodies such as UniProtKB are limited to only 1095 sequences (as of January 2024). This limitation may hinder the identification of new antibodies using bottom-up proteomics. Therefore, extending the databases is an important task for discovering new antibodies.
Herein, we adopted an extensive collection of antibody sequences from the Observed Antibody Space to conduct efficient database searches in publicly available proteomics data, with a focus on SARS-CoV-2. Thirty million heavy-chain antibody sequences from 146 SARS-CoV-2 patients in the Observed Antibody Space were digested in silico to obtain 18 million unique peptides. These peptides were then used to create six databases (DB1-DB6) for bottom-up proteomics. We used those databases to search for antibody peptides in publicly available SARS-CoV-2 human plasma samples in the Proteomics Identification Database (PRIDE), and we consistently found new antibody peptides in those samples. The database searches were performed with the FragPipe software.
Table 1. Database information. In addition to human SARS-CoV-2 antibody peptides, every database also contains human protein sequences from the UniProt database and contaminants from the cRAP database.

| File | Database | Number of human SARS-CoV-2 antibody peptides |
| --- | --- | --- |
| DB1.fasta | DB1 | 100 |
| DB2.fasta | DB2 | 1,000 |
| DB3.fasta | DB3 | 10,000 |
| DB4.fasta | DB4 | 100,000 |
| DB5.fasta | DB5 | 1,000,000 |
| DB6.fasta | DB6 | 10,000,000 |
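The in silico digestion step described above can be illustrated with a short sketch. The description does not name the enzyme or its settings, so this example assumes trypsin-like cleavage (after K or R, but not before P), no missed cleavages, and a minimum peptide length of six residues. The input sequences are made-up strings, not real antibody sequences.

```python
import re


def tryptic_digest(sequence, min_length=6):
    """Split a protein sequence after K or R unless the next residue is P.

    Assumes trypsin-like specificity with no missed cleavages; peptides
    shorter than min_length are discarded.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


def unique_peptides(sequences, min_length=6):
    """Digest many sequences and return the sorted set of distinct peptides."""
    seen = set()
    for seq in sequences:
        seen.update(tryptic_digest(seq, min_length))
    return sorted(seen)


# Hypothetical toy sequences chosen to show the K-before-P exception
# and a peptide shared between two inputs.
toy_sequences = ["AAAKPGGGRCCCCCCK", "DDDDDDRCCCCCCK"]
peptides = unique_peptides(toy_sequences)
print(peptides)
```

Deduplicating with a set is what reduces many input sequences to a smaller list of unique peptides; at the scale of tens of millions of sequences, a streaming or on-disk approach would likely be needed instead of holding everything in memory.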
Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their own genetic datasets in order to search for genetic relatives. The statement that a user's uploaded genome shares one or more segments in common with that of a target person in the database---that is, that the two genomes share one or more regions identical by state (IBS)---reveals some information about the genotypes of the target person, particularly if the chromosomal locations of IBS matches are shared with the uploader. Here, we describe several methods by which an adversary who wants to learn the genotypes of people in the database can do so by uploading multiple datasets. Depending on the methods used for IBS matching and the information about IBS segments returned to the user, substantial information about users' genotypes can be revealed with a few hundred uploaded datasets. For examp...