53 datasets found
  1. African Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). African Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-african
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the African Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 10,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of individuals across African countries:

    Geographical Representation: Participants from countries including Kenya, Malawi, Nigeria, Ethiopia, Benin, Somalia, Uganda, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify African faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  2. Human Stampedes (1800 - 2021)

    • kaggle.com
    Updated Dec 13, 2021
    Cite
    Shivam Bansal (2021). Human Stampedes (1800 - 2021) [Dataset]. https://www.kaggle.com/datasets/shivamb/human-stampede
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 13, 2021
    Dataset provided by
    Kaggle
    Authors
    Shivam Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this Dataset: Human Stampedes

    Crushes often occur during religious pilgrimages and large entertainment events, as they tend to involve dense crowds, with people closely surrounded on all sides. Human stampedes and crushes also occur as people try to get away from a perceived danger, as in a case where a noxious gas was released in crowded premises.

    Content

    The dataset contains notable human stampede and crush events, along with meta information such as location, total deaths, and a description.

    Interesting Analysis Ideas

    • Perform Exploratory Analysis to identify key topics and themes from the event descriptions
    • Use NLP and Visualizations to generate a dashboard of historical stampedes
  3. The dataset of the Global Collections survey of natural history collections

    • zenodo.org
    bin, pdf, txt, zip
    Updated Jul 16, 2024
    Cite
    Matt Woodburn; Robert J. Corrigan; Nicholas Drew; Cailin Meyer; Vincent S. Smith; Sarah Vincent (2024). The dataset of the Global Collections survey of natural history collections [Dataset]. http://doi.org/10.5281/zenodo.6985399
    Explore at:
    Available download formats: pdf, bin, zip, txt
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matt Woodburn; Robert J. Corrigan; Nicholas Drew; Cailin Meyer; Vincent S. Smith; Sarah Vincent
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    From 2016 to 2018, we surveyed the world’s largest natural history museum collections to begin mapping this globally distributed scientific infrastructure. The resulting dataset includes 73 institutions across the globe. It has:

    • Basic institution data for the 73 contributing institutions, including estimated total collection sizes, geographic locations (to the city) and latitude/longitude, and Research Organization Registry (ROR) identifiers where available.

    • Resourcing information, covering the numbers of research, collections and volunteer staff in each institution.

    • Indicators of the presence and size of collections within each institution broken down into a grid of 19 collection disciplines and 16 geographic regions.

    • Measures of the depth and breadth of individual researcher experience across the same disciplines and geographic regions.

    This dataset contains the data (raw and processed) collected for the survey, and specifications for the schema used to store the data. It includes:

    1. A diagram of the MySQL database schema.
    2. A SQL dump of the MySQL database schema, excluding the data.
    3. A SQL dump of the MySQL database schema with all data. This may be imported into an instance of MySQL Server to create a complete reconstruction of the database.
    4. Raw data from each database table in CSV format.
    5. A set of more human-readable views of the data in CSV format. These correspond to the database tables, but foreign keys are substituted for values from the linked tables to make the data easier to read and analyse (a loading sketch follows this list).
    6. A text file containing the definitions of the size categories used in the collection_unit table.
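
    As an illustration, the human-readable CSV views can be loaded directly with pandas. This is only a sketch: the file and column names below are hypothetical placeholders, so substitute the actual names from the downloaded archive.

```python
# Minimal sketch, assuming one of the human-readable CSV views has been extracted
# from the Zenodo archive. The file and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("collection_unit_view.csv")   # hypothetical view filename

# Inspect the structure before analysing
print(df.columns.tolist())
print(df.head())

# Example aggregation: number of collection-unit records per institution
# (column name assumed; adjust to the real header)
if "institution_name" in df.columns:
    counts = df.groupby("institution_name").size().sort_values(ascending=False)
    print(counts.head(10))
```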

    The global collections data may also be accessed at https://rebrand.ly/global-collections. This is a preliminary dashboard, constructed and published using Microsoft Power BI, that enables the exploration of the data through a set of visualisations and filters. The dashboard consists of three pages:

    Institutional profile: Enables the selection of a specific institution and provides summary information on the institution and its location, staffing, total collection size, collection breakdown and researcher expertise.

    Overall heatmap: Supports an interactive exploration of the global picture, including a heatmap of collection distribution across the discipline and geographic categories, and visualisations that demonstrate the relative breadth of collections across institutions and correlations between collection size and breadth. Various filters allow the focus to be refined to specific regions and collection sizes.

    Browse: Provides some alternative methods of filtering and visualising the global dataset to look at patterns in the distribution and size of different types of collections across the global view.

  4. South Asian Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). South Asian Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-south-asian
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    South Asia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the South Asian Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 10,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of individuals across South Asian countries:

    Geographical Representation: Participants from countries including India, Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan, Maldives, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify South Asian faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  5. Data from: HISDAC-ES: Historical Settlement Data Compilation for Spain...

    • figshare.com
    • portalinvestigacion.udc.gal
    • +2more
    zip
    Updated Aug 17, 2023
    + more versions
    Cite
    Johannes H. Uhl; Dominic Royé; Keith Burghardt; José Antonio Aldrey Vázquez; Manuel Borobio Sanchiz; Stefan Leyk (2023). HISDAC-ES: Historical Settlement Data Compilation for Spain (1900-2020) [Dataset]. http://doi.org/10.6084/m9.figshare.22009643.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    figshare
    Authors
    Johannes H. Uhl; Dominic Royé; Keith Burghardt; José Antonio Aldrey Vázquez; Manuel Borobio Sanchiz; Stefan Leyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain
    Description

    The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020, at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    UPDATE 08-2023: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions

    Datasets are available in three spatial reference systems:

    HISDAC-ES_All_LAEA.zip: Raster data in Lambert Azimuthal Equal Area (LAEA), covering all Spanish territory.
    HISDAC-ES_IbericPeninsula_UTM30.zip: Raster data in UTM Zone 30N, covering the Iberian Peninsula plus Ceuta and Melilla.
    HISDAC-ES_CanaryIslands_REGCAN.zip: Raster data in REGCAN-95, covering the Canary Islands only.
    HISDAC-ES_MunicipAggregates.zip: Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    Araba province: https://geo.araba.eus/WFS_Katastroa?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
    Bizkaia province: https://web.bizkaia.eus/es/inspirebizkaia
    Gipuzkoa province: https://b5m.gipuzkoa.eus/web5000/es/utilidades/inspire/edificios/
    Navarra region: https://inspire.navarra.es/services/BU/wfs
    Other regions: http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml

    Data source of municipality polygons: Centro Nacional de Información Geográfica (https://centrodedescargas.cnig.es/CentroDescargas/index.jsp)

    Technical notes: gridded data

    File nomenclature: ./region_projection_theme/hisdac_es_theme_variable_version_resolution[m][_year].tif

    Regions:
    all: complete territory of Spain
    can: Canary Islands only
    ibe: Iberian Peninsula + Ceuta + Melilla

    Projections:
    laea: Lambert azimuthal equal area (EPSG:3035)
    regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    utm: ETRS89 / UTM zone 30N (EPSG:25830)

    Themes:
    evolution / evol: multi-temporal physical measurements
    landuse: multi-temporal building counts per land use (i.e., building function) class
    physical / phys: physical building characteristics in 2020
    temporal / temp: temporal characteristics (construction year statistics)

    Variables (evolution):
    budens: building density (count per grid cell area)
    bufa: building footprint area
    deva: developed area (any grid cell containing at least one building)
    resbufa: residential building footprint area
    resbia: residential building indoor area

    Variables (physical):
    bia: building indoor area
    bufa: building footprint area
    bunits: number of building units
    dwel: number of dwellings

    Variables (temporal):
    mincoy: minimum construction year per grid cell
    maxcoy: maximum construction year per grid cell
    meancoy: mean construction year per grid cell
    medcoy: median construction year per grid cell
    modecoy: mode (most frequent) construction year per grid cell
    varcoy: variety of construction years per grid cell

    Variable (landuse): counts of buildings per grid cell and land use type.

    Municipality-level data

    hisdac_es_municipality_stats_multitemporal_longform_v1.csv: Zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: Zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    hisdac_es_municipality_stats_completeness_v1.csv: Missingness rates (in %) of the building attributes per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.

    Column names for the completeness statistics tables:
    NATCODE: national municipality identifier*
    num_total: number of buildings per municipality
    perc_bymiss: percentage of buildings with missing built year (construction year)
    perc_lumiss: percentage of buildings with missing landuse attribute
    perc_luother: percentage of buildings with landuse type "other"
    perc_num_floors_miss: percentage of buildings without a valid number-of-floors attribute
    perc_num_dwel_miss: percentage of buildings without a valid number-of-dwellings attribute
    perc_num_bunits_miss: percentage of buildings without a valid number-of-building-units attribute
    perc_offi_area_miss: percentage of buildings without a valid official area (building indoor area, BIA) attribute
    perc_num_dwel_and_num_bunits_miss: percentage of buildings missing both the number-of-dwellings and number-of-building-units attributes

    The same statistics are available as a GeoPackage file including municipality polygons in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows: for NATCODE 34 01 04 04001, country = 34, comunidad autónoma (CA_CODE) = 01, province (PROV_CODE) = 04, LAU code = 04001 (province + municipality code).
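
    As a quick illustration of working with the gridded surfaces, the sketch below opens a single GeoTIFF with rasterio and sums one BUFA layer. The file path merely follows the documented nomenclature and is a hypothetical example; check the actual file names inside the downloaded archive.

```python
# Minimal sketch: reading one HISDAC-ES gridded surface (GeoTIFF) with rasterio.
# The path below is only an illustrative name built from the documented pattern
# region_projection_theme/hisdac_es_theme_variable_version_resolution[_year].tif.
import rasterio
import numpy as np

path = "all_laea_evolution/hisdac_es_evol_bufa_v1_100m_1950.tif"  # hypothetical example

with rasterio.open(path) as src:
    bufa = src.read(1)          # building footprint area per 100 m grid cell
    nodata = src.nodata
    print("CRS:", src.crs, "resolution:", src.res)

# Total building footprint area for the covered territory in that year
valid = bufa != nodata if nodata is not None else np.isfinite(bufa)
print("Total BUFA (map units):", bufa[valid].sum())
```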

  6. Data from: LANGUAGE – PROLIFIC SOURCE OF HUMAN MIND AND SPEECH

    • commons.datacite.org
    • figshare.com
    Updated Apr 10, 2020
    Cite
    Southern Caucasus Media Group (2020). LANGUAGE – PROLIFIC SOURCE OF HUMAN MIND AND SPEECH [Dataset]. http://doi.org/10.6084/m9.figshare.12110181.v1
    Explore at:
    Dataset updated
    Apr 10, 2020
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Figshare (http://figshare.com/)
    figshare
    Authors
    Southern Caucasus Media Group
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The paper addresses different aspects of language and the facets that influence the social way of thinking and speaking in human behaviour. As we argue, generating new words and other fundamental changes in linguistics, as the main part of language development, make a profound impact on the components of culture and language simultaneously. The article also reviews the strong power of human words and speech, which gives an unforgettable sense of human evolution. The central issue is how important word expression is, and how the features of the words expressed by human beings reveal their attitude and the way they perceive the whole world. Language has therefore played a prominent role in human history. Furthermore, the things we use in our everyday lives rely on specialized knowledge or skills that the human race produces through language. The information behind these was historically coded in verbal communication, and with the advent of writing it could be stored and become increasingly complex and sophisticated. As the paper underlines, the evolution of the human species depends on language, and if we carefully accept every change in language with the correct approach, we truly become the inheritors of an unusual and invaluable heritage.

  7. Native American Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Native American Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-native-american
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Native American Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 5,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of Native American individuals across North America:

    Geographical Representation: Participants from countries including the USA, Canada, Mexico, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify Native American faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  8. Location Data | GLOBAL HISTORICAL (2018 - Present) | Precise Mobile Location...

    • datarade.ai
    .csv
    Updated May 31, 2022
    Cite
    Veraset (2022). Location Data | GLOBAL HISTORICAL (2018 - Present) | Precise Mobile Location Data [Dataset]. https://datarade.ai/data-products/historical-geospatial-movement-data-170-countries-veraset
    Explore at:
    Available download formats: .csv
    Dataset updated
    May 31, 2022
    Dataset authored and provided by
    Veraset
    Area covered
    United States
    Description

    Veraset 'Movement' (GPS footfall data from mobile devices) offers real-time insights into footfall traffic patterns globally, covering the US and 170 other countries from 2018 to the present day.

    This dataset covers more than 170 countries and comprises billions of pseudonymous GPS signals daily, making it one of the cleanest mobile location datasets available.

    Veraset provides the most reliable, compliant, commercially available Location Data dataset on the market, drawing on raw GPS data from tier-1 apps, SDKs, and aggregators of mobile devices to provide customers with accurate, up-to-the-minute information on human movement.

    Prioritizing compliance and privacy, it serves as a foundation for advanced analytics and strategic planning across various industries.

    Our work has been used by Fortune 500 companies, leading institutions, and top brands that need reliable geospatial data.

    Veraset’s Movement (raw Location Data) product is the best choice for anyone building products or models powered by historical raw location data.

    Uses for Location Data:

    Infrastructure Planning
    Route Optimization and Human Migration Patterning
    Public Transit Optimization
    Placement and Targeting
    Advertising and Attribution
    Segmentation and Audience Building
    Competitive Analysis

    For up-to-date schema, visit: https://www.veraset.com/docs/movement

  9. New global dataset on historical water-related conflict and cooperation...

    • data.niaid.nih.gov
    Updated Jul 15, 2024
    Cite
    Zahra Kalantari (2024). New global dataset on historical water-related conflict and cooperation events [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7465152
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Elisie Kåresdotter
    Zahra Kalantari
    Haozhi Pan
    Gustav Skoog
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The water-related conflict and cooperation events database was created as part of a larger project, "The missing link: how does the climate affect human conflicts and collaborations through water?", whose goal is to increase understanding of how people and the climate affect water flows and how, in turn, these changes affect cooperation and conflicts over water. The project is supported by Formas, project 2017-00608.

    The database includes a collection of cooperation and conflict events between 1951 and 2019. The Transboundary Freshwater Dispute Database (TFDD, 2010) and the WCC (Pacific Institute, Oakland, CA, 2022) were used as data on water-related acts of cooperation and conflict over time. As TFDD cooperation data end in 2008, cooperation events were extended following a methodology similar to that used in the creation of TFDD. Further, geographic locations and regional classifications were added to all events, which can be used to create visualizations and to extract subsets of the database for different parts of the world. The database methodology flow chart included in the files gives a brief overview of the steps taken to prepare and process the data into the database.

    The openly available scientific article in Science of the Total Environment (STOTEN) highlights findings based on this dataset and gives further explanation of the database. The article can be found here: https://doi.org/10.1016/j.scitotenv.2023.161555.

  10. Data from: Networking for Historical Justice: The Application of Graph...

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    Pan, Keyao (2023). Networking for Historical Justice: The Application of Graph Database Management Systems to Network Analysis Projects and the Case Study of the Reparation Movement for Japanese Colonial and Wartime Atrocities [Dataset]. http://doi.org/10.7910/DVN/CZ4PBO
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Pan, Keyao
    Description

    This is an ongoing project to digitize reparation lawsuits against Japanese colonial and wartime atrocities (most famously the "comfort women" system and Nanjing Massacre) into a graph database. Information about the lawsuits is taken from publicly available sources such as the 日本戦後補償裁判総覧 (http://justice.skr.jp/souran/souran-jp-web.htm), digitized, processed, and exported as cypher codes executable by graph database management or processing systems such as Neo4j. The database seeks to not only preserve historical materials produced in this transnational movement but also aid academic research and teaching of it. The project explores the applicability of graph database management systems to network analysis research and teaching in the field of digital humanities. By inputting the data about lawsuits and lawyers in the movement into a graph database, the project demonstrates the advantages of managing network data in graph database structure over relational database structure, which is the mainstream in network analysis research, in terms of scalability, modifiability, intuitive visibility, and query efficiency. The all-plain.cypher file can be loaded into graph database systems like Neo4j (https://sandbox.neo4j.com) to generate the database. The all data Kineviz-graphxr DATE.graphxr file can be loaded into the web-based graph visualization and processing tool GraphXR(https://graphxr.kineviz.com/register) with an account.
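
    A minimal loading sketch, assuming a locally running Neo4j instance and that all-plain.cypher holds semicolon-separated statements; the connection details below are hypothetical placeholders, and the Neo4j sandbox or GraphXR route described above works without any code.

```python
# Minimal sketch (assumptions flagged in comments): execute the statements in
# all-plain.cypher against a Neo4j instance using the official Python driver.
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"          # hypothetical endpoint
driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))  # hypothetical credentials

# Assumes the file contains semicolon-separated Cypher statements
with open("all-plain.cypher", encoding="utf-8") as fh:
    statements = [s.strip() for s in fh.read().split(";") if s.strip()]

with driver.session() as session:
    for stmt in statements:
        session.run(stmt)

driver.close()
```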

  11. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
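
    A quick arithmetic check of the storage figures, assuming the quoted 19.2 percent CAGR compounds annually from the 2020 installed base of 6.7 zettabytes:

```python
# Back-of-the-envelope check, assuming simple annual compounding of the quoted CAGR.
base_zb = 6.7      # installed storage base in 2020, zettabytes
cagr = 0.192       # compound annual growth rate over 2020-2025
years = 5

projected_2025 = base_zb * (1 + cagr) ** years
print(f"Projected installed storage base in 2025: {projected_2025:.1f} ZB")  # ~16.1 ZB
```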

  12. Metfaces Image Dataset

    • kaggle.com
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Metfaces Image Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/metfaces-image-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Metfaces Image Dataset

    Metropolitan Museum of Art Faces Image Dataset

    By huggan (From Huggingface) [source]

    About this dataset

    Researchers and developers can leverage this dataset to explore and analyze facial representations depicted in different artistic styles throughout history. These images represent a rich tapestry of human expressions, cultural diversity, and artistic interpretations, providing ample opportunities for leveraging computer vision techniques.

    By utilizing this extensive dataset during model training, machine learning practitioners can enhance their algorithms' ability to recognize and interpret facial elements accurately. This is particularly beneficial in applications such as face recognition systems, emotion detection algorithms, portrait analysis tools, or even historical research endeavors focusing on portraiture.

    How to use the dataset

    • Downloading the Dataset:

      Start by downloading the dataset from Kaggle's website. The dataset file is named train.csv, which contains the necessary image data for training your models.

    • Exploring the Data:

      Once you have downloaded and extracted the dataset, it's time to explore its contents. Load the train.csv file into your preferred programming environment or data analysis tool to get an overview of its structure and columns.

    • Understanding the Columns:

      The main column of interest in this dataset is called image. This column contains links or references to specific images in the Metropolitan Museum of Art's collection, showcasing different faces captured within them.

    • Accessing Images from URLs or References:

      To access each image associated with their respective URLs or references, you can write code or use libraries that support web scraping or download functionality. Each row under the image column will provide you with a URL or reference that can be used to fetch and download that particular image.

    • Preprocessing and Data Augmentation (Optional):

      Depending on your use case, you might need to perform various preprocessing techniques on these images before using them as input for your machine learning models. Preprocessing steps may include resizing, cropping, normalization, color space conversions, etc.

    • Training Machine Learning Models:

      Once you have preprocessed any necessary data, it's time to start training your machine learning models using this image dataset as training samples.

    • Analysis and Evaluation:

      After successfully training your model(s), evaluate their performance using validation datasets if available. You can also make predictions on unseen images, measure accuracy, and analyze the results to gain insights or adjust your models accordingly.

    • Additional Considerations:

      Remember to give appropriate credit to the Metropolitan Museum of Art for providing this image dataset when using it in research papers or other publications. Additionally, be aware of any licensing restrictions or terms of use associated with the images themselves.
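
    A minimal sketch of the download-and-access steps above, assuming the image column holds direct URLs; the output directory and the 20-row limit are illustrative choices.

```python
# Minimal sketch: load train.csv and fetch the images referenced in the "image"
# column. Assumes the column holds direct URLs (it may instead hold references).
import os
import pandas as pd
import requests

df = pd.read_csv("train.csv")
os.makedirs("metfaces_images", exist_ok=True)

for idx, url in df["image"].head(20).items():   # limit to 20 for a quick test
    try:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(os.path.join("metfaces_images", f"{idx}.jpg"), "wb") as fh:
            fh.write(resp.content)
    except requests.RequestException as exc:
        print(f"Skipping row {idx}: {exc}")
```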

    Research Ideas

    • Facial recognition: This dataset can be used to train machine learning models for facial recognition systems. By using the various images of faces from the Metropolitan Museum of Art, the models can learn to identify and differentiate between different individuals based on their facial features.
    • Emotion detection: The images in this dataset can be utilized for training models that can detect emotions on human faces. This could be valuable in applications such as market research, where understanding customer emotional responses to products or advertisements is crucial.
    • Cultural analysis: With a diverse range of historical faces from different times and regions, this dataset could be employed for cultural analysis and exploration. Machine learning algorithms can identify common visual patterns or differences among different cultures, shedding light on the evolution of human appearances across time and geography

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv | Column name | Description ...

  13. Data and Code for: Automated Linking of Historical Data

    • openicpsr.org
    delimited
    Updated Mar 1, 2021
    Cite
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Pérez (2021). Data and Code for: Automated Linking of Historical Data [Dataset]. http://doi.org/10.3886/E133781V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 1, 2021
    Dataset provided by
    American Economic Association
    Authors
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1850 - 1940
    Area covered
    United States; Norway
    Description

    The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
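
    As a toy illustration of the general idea of automated record linkage (not the authors' algorithms or their code), the snippet below links records across two census waves by exact birth-year blocking plus a name-similarity threshold; the field names and the 0.9 cutoff are hypothetical.

```python
# Toy illustration only: exact birth-year blocking + name-similarity threshold,
# using only the standard library. Real linking methods are far more involved.
from difflib import SequenceMatcher

census_1900 = [{"id": 1, "name": "Ole Hansen", "birth_year": 1862}]
census_1910 = [{"id": 7, "name": "Ole Hanson", "birth_year": 1862},
               {"id": 8, "name": "Olaf Hansen", "birth_year": 1870}]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

links = []
for rec_a in census_1900:
    # Block on birth year, then keep the most similar name above the cutoff
    candidates = [r for r in census_1910 if r["birth_year"] == rec_a["birth_year"]]
    best = max(candidates, key=lambda r: similarity(rec_a["name"], r["name"]), default=None)
    if best and similarity(rec_a["name"], best["name"]) >= 0.9:
        links.append((rec_a["id"], best["id"]))

print(links)   # [(1, 7)]: "Ole Hansen" vs "Ole Hanson" has ratio 0.9
```

    Tightening or loosening the similarity cutoff trades the false positive rate against the match rate, which is exactly the frontier the study describes.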

  14. GUI for a historical database

    • figshare.com
    txt
    Updated Jun 29, 2018
    Cite
    Don de Lange (2018). GUI for a historical database [Dataset]. http://doi.org/10.6084/m9.figshare.6429452.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 29, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Don de Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this research, the characteristics of a usable Graphical User Interface (GUI) are determined in the context of a historical database. A GUI is an interface that enables users to directly interact with the content the GUI is built upon and the functionalities the GUI offers. The historical database concerns former German citizens residing in the Netherlands who went through the process of removing their Enemy of the State status. This status was given by the Dutch government in the aftermath of WWII as retribution for the German atrocities during the war. The operation ended due to resistance amongst Dutch citizens, after which those affected could have their Enemy of the State status removed. The mockup GUI incorporated the following usability characteristics: giving users the information they seek with justification, clear and useful functionalities, simplicity of use, and a structured layout. The mockup GUI was evaluated by average internet users, who tested it interactively and reviewed their experience using usability statements. The mockup GUI was evaluated as good, so the given characteristics make the GUI usable.

  15. Bluesky Social Dataset Dataset

    • paperswithcode.com
    Updated Apr 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Bluesky Social Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/bluesky-social-dataset
    Explore at:
    Dataset updated
    Apr 28, 2024
    Description

    Bluesky Social Dataset

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.

    This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.

    Dataset

    Here is a description of the dataset files.

    followers.csv.gz: This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v).
    posts.tar.gz: This compressed folder contains data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line.
    interactions.csv.gz: This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers and represents a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    graphs.tar.gz: This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    feed_posts.tar.gz: This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Posts are stored as JSON-formatted lines. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
    feed_bookmarks.csv: This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp.
    feed_post_likes.tar.gz: This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
    scripts.tar.gz: A collection of Python scripts, including the ones originally used to crawl the data and to perform experiments. These scripts are detailed in a document released within the folder.
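
    A minimal sketch of loading the follower edge list described above into a directed graph with networkx; for the full multi-million-user graph a more memory-frugal representation may be preferable.

```python
# Minimal sketch: build the directed follower graph from followers.csv.gz,
# where each row "u,v" means user u follows user v.
import csv
import gzip
import networkx as nx

G = nx.DiGraph()
with gzip.open("followers.csv.gz", mode="rt", encoding="utf-8") as fh:
    for u, v in csv.reader(fh):
        G.add_edge(int(u), int(v))   # directed edge: u follows v

print(G.number_of_nodes(), "users,", G.number_of_edges(), "follow edges")
```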

    Citation

    If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data. (2024) arXiv:2404.18984

    Acknowledgments: This work is supported by:

    the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu); SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021; EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

  16. Post metadata.

    • plos.figshare.com
    xls
    Updated Nov 5, 2024
    + more versions
    Cite
    Andrea Failla; Giulio Rossetti (2024). Post metadata. [Dataset]. http://doi.org/10.1371/journal.pone.0310330.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Failla; Giulio Rossetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.

  17. Polish Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Polish Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/polish-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Polish Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Polish language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Polish. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Polish people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains the question with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short phrases, single sentences, and paragraph types of answers. The answer contains text strings, numerical values, date and time formats as well. Such diversity strengthens the Language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Polish Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
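
    A minimal loading sketch based on the annotation fields listed above; the file name is hypothetical and the JSON release is assumed to be a list of records.

```python
# Minimal sketch: load the JSON release and filter on documented annotation fields.
# The file name and the top-level list structure are assumptions.
import json

with open("polish_open_ended_qa.json", encoding="utf-8") as fh:
    records = json.load(fh)

hard_instruction = [
    r for r in records
    if r.get("complexity") == "hard" and r.get("prompt_type") == "instruction"
]
print(f"{len(hard_instruction)} hard instruction-type prompts out of {len(records)} records")
```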

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the question and answers in Polish are grammatically accurate without any word or grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Polish Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  18. DEMETER2 data

    • figshare.com
    txt
    Updated Apr 9, 2020
    + more versions
    Cite
    Cancer Data Science (2020). DEMETER2 data [Dataset]. http://doi.org/10.6084/m9.figshare.6025238.v6
    Explore at:
    Available download formats: txt
    Dataset updated
    Apr 9, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cancer Data Science
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer cell line genetic dependencies estimated using the DEMETER2 model. DEMETER2 is applied to three large-scale RNAi screening datasets: the Broad Institute Project Achilles, Novartis Project DRIVE, and the Marcotte et al. breast cell line dataset. The model is also applied to generate a combined dataset of gene dependencies covering a total of 712 unique cancer cell lines. For more information visit https://depmap.org/R2-D2/. Visit the Cancer Dependency Map portal at https://depmap.org to explore related datasets. Email questions to depmap@broadinstitute.org.

    This dataset includes gene dependencies estimated using the DEMETER2 model, the raw input datasets used to fit the models, and associated metadata. See the README file for more details about the dataset contents and version history.

    Version history (see README for more details):

    v1: Initial data release
    v2: Removed a small number of non-human genes (e.g. GFP, RFP) from the shRNA-to-gene mapping; updated cell line names to be consistent with DepMap names, according to the following map (old -> new):
    v3: Added estimated seed effect matrices
    v4: Added RNAseq and mutation data files used in the analysis for the manuscript
    v5: Fixed a minor bug with the Marcotte LFC data that caused hairpins targeting multiple genes to appear multiple times in the LFC matrix. This created bias in the seed effect estimates for those hairpins, causing very minor differences in the resulting model parameters.
    v6: Added tables with shRNA quality metrics for the Achilles and DRIVE data

  19. P

    Data from: AmsterTime Dataset

    • paperswithcode.com
    Updated Apr 13, 2023
    Cite
    Burak Yildiz; Seyran Khademi; Ronald Maria Siebes; Jan van Gemert (2023). AmsterTime Dataset [Dataset]. https://paperswithcode.com/dataset/amstertime
    Explore at:
    Dataset updated
    Apr 13, 2023
    Authors
    Burak Yildiz; Seyran Khademi; Ronald Maria Siebes; Jan van Gemert
    Description

    The AmsterTime dataset offers a collection of 2,500 well-curated images that match present-day street-view scenes to historical archival images of the city of Amsterdam. The image pairs capture the same place with different cameras, viewpoints, and appearances. Unlike existing benchmark datasets, AmsterTime is crowdsourced directly through a GIS navigation platform (Mapillary). All matching pairs are then checked by a human expert, which both confirms the correct matches and provides a reference measure of human competence on the Visual Place Recognition (VPR) task.

    The properties of the dataset are summarized as follows:

    1,200+ license-free images from the Amsterdam City Archive, representing urban places in the city of Amsterdam, captured over the past century by many photographers.
    All archival queries are matched with street-view images from Mapillary.
    All matches are verified by architectural historians and Amsterdam inhabitants.
    Image pairs consist of an archival and a street-view photo capturing the same place with different cameras, time lags, structural changes, occlusion, viewpoints, appearance, and illumination.
    The dataset exhibits a domain shift between query and gallery due to the significant difference between scanned archival and street-view images.

    Two sub-tasks are created on the dataset:

    Verification is a binary classification (auxiliary) task: given a pair of archival and street-view images, decide whether they show the same place. All of the crowdsourced image pairs are labeled positive, and an equal number of negative samples is generated by randomly pairing archival and street-view images, giving a total of 2,462 pairs for the verification task.

    Retrieval is the main task, corresponding to VPR, in which a given query image is matched against a set of gallery images. For this task, the AmsterTime dataset offers 1,231 query images, and the leave-one-out set serves as the gallery for each query.
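    For orientation, here is a minimal, hedged sketch of how retrieval performance might be scored on such a query/gallery split. It is not the official evaluation code: the feature extractor, the recall@k metric, and the array names (query_feats, gallery_feats, ground_truth) are assumptions for illustration.

        import numpy as np

        def recall_at_k(query_feats, gallery_feats, ground_truth, k=5):
            """Fraction of queries whose correct gallery index is in the top-k matches."""
            sims = query_feats @ gallery_feats.T          # cosine similarity (L2-normalised inputs)
            topk = np.argsort(-sims, axis=1)[:, :k]       # best k gallery indices per query
            hits = [gt in row for gt, row in zip(ground_truth, topk)]
            return float(np.mean(hits))

        # Toy usage with random features; replace with real image descriptors.
        rng = np.random.default_rng(0)
        q = rng.normal(size=(1231, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
        g = rng.normal(size=(1231, 128)); g /= np.linalg.norm(g, axis=1, keepdims=True)
        print(recall_at_k(q, g, np.arange(1231), k=5))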

  20. p

    CARD 2.0 - Dataset - Pandora

    • pandora.earth
    Updated Mar 11, 2025
    + more versions
    Cite
    (2025). CARD 2.0 - Dataset - Pandora [Dataset]. https://pandora.earth/gl_ES/dataset/card-2-0
    Explore at:
    Dataset updated
    Mar 11, 2025
    Description

    The Canadian Archaeological Radiocarbon Database (CARD) is a compilation of radiocarbon measurements that indicate the ages of samples, primarily from archaeological sites in North America. CARD also includes samples from paleontological and geological contexts, and coverage is slowly expanding into Central and South America. These data represent a significant investment and resource for researchers interested in human history and its context.

    CARD was created by Dr. Richard "Dick" Morlan of the Canadian Museum of History (formerly the Canadian Museum of Civilization), and its existence is a product of his genius and labour. In July 2014, the Canadian Museum of History (CMH) and the Laboratory of Archaeology (LOA) at the University of British Columbia formed a partnership to revise and update the CARD platform. The current version of CARD (2.0) adds useful new features, including unlimited batch uploading/downloading of data and spatial/map visualization. However, the core of CARD remains the 14C dates painstakingly submitted by researchers across the world and compiled by Dick. We hope that this revision maintains the relationship that Dick established in one of the first crowd-sourced, big-data endeavours: CARD provides utility and comprehensiveness, and in exchange researchers provide us with dates. See the HELP tab for more information and instructions on using CARD.

    Our efforts to update and upgrade CARD are just beginning, and we are moving in two directions: increasing the quantity and quality of CARD data, and improving the functionality of the CARD platform. We are looking for partners to assist us in both. CARD data contains some errors and is in some cases incomplete, and we are engaged in a long-term process of scrubbing the data. We are also developing new functional tools to make CARD more valuable to researchers, including the generation of heat maps of date concentrations over time and the selection of data via a map interface. We also have longer-term plans to add calibration sockets with existing calibration services. If you have 14C data, please upload it to CARD. If you are interested in getting involved in the expansion and development of CARD, please email us at admin@card.anth.ubc.ca.

    Radiocarbon assessment has an effective range of about 250 to 50,000+ years, and as a result most of the samples in CARD are associated with Indigenous archaeological sites. These represent a significant resource for aboriginal history. CARD fuzzes location data for public visitors to the database at a 1:2,000,000 scale; accessing CARD's full capabilities requires a security account, available only to researchers at accredited institutions. As Dick wrote in CARD 1.0, "The long term future of this database will depend upon whether or not the archaeological community finds it truly useful." We hope you do.
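    As background to the uncalibrated dates stored in such databases, the sketch below shows the standard conversion from fraction modern carbon (F14C) to a conventional radiocarbon age using the Libby mean life of 8,033 years. It is purely illustrative and not part of CARD, and calendar-age calibration against a curve such as IntCal is not shown.

        import math

        def conventional_radiocarbon_age(f14c):
            """Conventional 14C age in years BP from a fraction-modern value."""
            return -8033.0 * math.log(f14c)

        # One Libby half-life of decay corresponds to roughly 5,568 years BP.
        print(round(conventional_radiocarbon_age(0.5)))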
