53 datasets found
  1. African Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). African Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-african
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the African Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 10,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of individuals across African countries:

    Geographical Representation: Participants from countries including Kenya, Malawi, Nigeria, Ethiopia, Benin, Somalia, Uganda, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify African faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  2. Human Stampedes (1800 - 2021)

    • kaggle.com
    Updated Dec 13, 2021
    Cite
    Shivam Bansal (2021). Human Stampedes (1800 - 2021) [Dataset]. https://www.kaggle.com/datasets/shivamb/human-stampede
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 13, 2021
    Dataset provided by
    Kaggle
    Authors
    Shivam Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this Dataset: Human Stampedes

    Crushes often occur during religious pilgrimages and large entertainment events, as they tend to involve dense crowds, with people closely surrounded on all sides. Human stampedes and crushes also occur as people try to get away from a perceived danger, as in a case where a noxious gas was released in crowded premises.

    Content

    The dataset contains notable human stampede and crush events, along with meta information such as location, total deaths, and a description.

    Interesting Analysis Ideas

    • Perform Exploratory Analysis to identify key topics and themes from the event descriptions
    • Use NLP and Visualizations to generate a dashboard of historical stampedes
  3. The dataset of the Global Collections survey of natural history collections

    • zenodo.org
    bin, pdf, txt, zip
    Updated Jul 16, 2024
    Cite
    Matt Woodburn; Robert J. Corrigan; Nicholas Drew; Cailin Meyer; Vincent S. Smith; Sarah Vincent (2024). The dataset of the Global Collections survey of natural history collections [Dataset]. http://doi.org/10.5281/zenodo.6985399
    Explore at:
    Available download formats: pdf, bin, zip, txt
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matt Woodburn; Robert J. Corrigan; Nicholas Drew; Cailin Meyer; Vincent S. Smith; Sarah Vincent
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    From 2016 to 2018, we surveyed the world’s largest natural history museum collections to begin mapping this globally distributed scientific infrastructure. The resulting dataset includes 73 institutions across the globe. It has:

    • Basic institution data for the 73 contributing institutions, including estimated total collection sizes, geographic locations (to the city) and latitude/longitude, and Research Organization Registry (ROR) identifiers where available.

    • Resourcing information, covering the numbers of research, collections and volunteer staff in each institution.

    • Indicators of the presence and size of collections within each institution broken down into a grid of 19 collection disciplines and 16 geographic regions.

    • Measures of the depth and breadth of individual researcher experience across the same disciplines and geographic regions.

    This dataset contains the data (raw and processed) collected for the survey, and specifications for the schema used to store the data. It includes:

    1. A diagram of the MySQL database schema.
    2. A SQL dump of the MySQL database schema, excluding the data.
    3. A SQL dump of the MySQL database schema with all data. This may be imported into an instance of MySQL Server to create a complete reconstruction of the database.
    4. Raw data from each database table in CSV format.
    5. A set of more human-readable views of the data in CSV format. These correspond to the database tables, but foreign keys are substituted for values from the linked tables to make the data easier to read and analyse (a loading sketch follows this list).
    6. A text file containing the definitions of the size categories used in the collection_unit table.
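
    As an illustration, the human-readable CSV views can be loaded directly with pandas. This is only a sketch: the file and column names below are hypothetical placeholders, so substitute the actual names from the downloaded archive.

```python
# Minimal sketch, assuming one of the human-readable CSV views has been extracted
# from the Zenodo archive. The file and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("collection_unit_view.csv")   # hypothetical view filename

# Inspect the structure before analysing
print(df.columns.tolist())
print(df.head())

# Example aggregation: number of collection-unit records per institution
# (column name assumed; adjust to the real header)
if "institution_name" in df.columns:
    counts = df.groupby("institution_name").size().sort_values(ascending=False)
    print(counts.head(10))
```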

    The global collections data may also be accessed at https://rebrand.ly/global-collections. This is a preliminary dashboard, constructed and published using Microsoft Power BI, that enables the exploration of the data through a set of visualisations and filters. The dashboard consists of three pages:

    Institutional profile: Enables the selection of a specific institution and provides summary information on the institution and its location, staffing, total collection size, collection breakdown and researcher expertise.

    Overall heatmap: Supports an interactive exploration of the global picture, including a heatmap of collection distribution across the discipline and geographic categories, and visualisations that demonstrate the relative breadth of collections across institutions and correlations between collection size and breadth. Various filters allow the focus to be refined to specific regions and collection sizes.

    Browse: Provides some alternative methods of filtering and visualising the global dataset to look at patterns in the distribution and size of different types of collections across the global view.

  4. South Asian Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). South Asian Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-south-asian
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    South Asia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the South Asian Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 10,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of individuals across South Asian countries:

    Geographical Representation: Participants from countries including India, Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan, Maldives, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify South Asian faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  5. Data from: HISDAC-ES: Historical Settlement Data Compilation for Spain...

    • figshare.com
    • portalinvestigacion.udc.gal
    • +2more
    zip
    Updated Aug 17, 2023
    + more versions
    Cite
    Johannes H. Uhl; Dominic Royé; Keith Burghardt; José Antonio Aldrey Vázquez; Manuel Borobio Sanchiz; Stefan Leyk (2023). HISDAC-ES: Historical Settlement Data Compilation for Spain (1900-2020) [Dataset]. http://doi.org/10.6084/m9.figshare.22009643.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    figshare
    Authors
    Johannes H. Uhl; Dominic Royé; Keith Burghardt; José Antonio Aldrey Vázquez; Manuel Borobio Sanchiz; Stefan Leyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain
    Description

    The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020, at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    UPDATE 08-2023: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions

    Datasets are available in three spatial reference systems:

    HISDAC-ES_All_LAEA.zip: Raster data in Lambert Azimuthal Equal Area (LAEA), covering all Spanish territory.
    HISDAC-ES_IbericPeninsula_UTM30.zip: Raster data in UTM Zone 30N, covering the Iberian Peninsula plus Ceuta and Melilla.
    HISDAC-ES_CanaryIslands_REGCAN.zip: Raster data in REGCAN-95, covering the Canary Islands only.
    HISDAC-ES_MunicipAggregates.zip: Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    Araba province: https://geo.araba.eus/WFS_Katastroa?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
    Bizkaia province: https://web.bizkaia.eus/es/inspirebizkaia
    Gipuzkoa province: https://b5m.gipuzkoa.eus/web5000/es/utilidades/inspire/edificios/
    Navarra region: https://inspire.navarra.es/services/BU/wfs
    Other regions: http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml

    Data source of municipality polygons: Centro Nacional de Información Geográfica (https://centrodedescargas.cnig.es/CentroDescargas/index.jsp)

    Technical notes: gridded data

    File nomenclature: ./region_projection_theme/hisdac_es_theme_variable_version_resolution[m][_year].tif

    Regions:
    all: complete territory of Spain
    can: Canary Islands only
    ibe: Iberian Peninsula + Ceuta + Melilla

    Projections:
    laea: Lambert azimuthal equal area (EPSG:3035)
    regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    utm: ETRS89 / UTM zone 30N (EPSG:25830)

    Themes:
    evolution / evol: multi-temporal physical measurements
    landuse: multi-temporal building counts per land use (i.e., building function) class
    physical / phys: physical building characteristics in 2020
    temporal / temp: temporal characteristics (construction year statistics)

    Variables (evolution):
    budens: building density (count per grid cell area)
    bufa: building footprint area
    deva: developed area (any grid cell containing at least one building)
    resbufa: residential building footprint area
    resbia: residential building indoor area

    Variables (physical):
    bia: building indoor area
    bufa: building footprint area
    bunits: number of building units
    dwel: number of dwellings

    Variables (temporal):
    mincoy: minimum construction year per grid cell
    maxcoy: maximum construction year per grid cell
    meancoy: mean construction year per grid cell
    medcoy: median construction year per grid cell
    modecoy: mode (most frequent) construction year per grid cell
    varcoy: variety of construction years per grid cell

    Variable (landuse): counts of buildings per grid cell and land use type.

    Municipality-level data

    hisdac_es_municipality_stats_multitemporal_longform_v1.csv: Zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: Zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    hisdac_es_municipality_stats_completeness_v1.csv: Missingness rates (in %) of the building attributes per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.

    Column names for the completeness statistics tables:
    NATCODE: national municipality identifier*
    num_total: number of buildings per municipality
    perc_bymiss: percentage of buildings with missing built year (construction year)
    perc_lumiss: percentage of buildings with missing landuse attribute
    perc_luother: percentage of buildings with landuse type "other"
    perc_num_floors_miss: percentage of buildings without a valid number-of-floors attribute
    perc_num_dwel_miss: percentage of buildings without a valid number-of-dwellings attribute
    perc_num_bunits_miss: percentage of buildings without a valid number-of-building-units attribute
    perc_offi_area_miss: percentage of buildings without a valid official area (building indoor area, BIA) attribute
    perc_num_dwel_and_num_bunits_miss: percentage of buildings missing both the number-of-dwellings and number-of-building-units attributes

    The same statistics are available as a GeoPackage file including municipality polygons in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows: for NATCODE 34 01 04 04001, country = 34, comunidad autónoma (CA_CODE) = 01, province (PROV_CODE) = 04, LAU code = 04001 (province + municipality code).
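
    As a quick illustration of working with the gridded surfaces, the sketch below opens a single GeoTIFF with rasterio and sums one BUFA layer. The file path merely follows the documented nomenclature and is a hypothetical example; check the actual file names inside the downloaded archive.

```python
# Minimal sketch: reading one HISDAC-ES gridded surface (GeoTIFF) with rasterio.
# The path below is only an illustrative name built from the documented pattern
# region_projection_theme/hisdac_es_theme_variable_version_resolution[_year].tif.
import rasterio
import numpy as np

path = "all_laea_evolution/hisdac_es_evol_bufa_v1_100m_1950.tif"  # hypothetical example

with rasterio.open(path) as src:
    bufa = src.read(1)          # building footprint area per 100 m grid cell
    nodata = src.nodata
    print("CRS:", src.crs, "resolution:", src.res)

# Total building footprint area for the covered territory in that year
valid = bufa != nodata if nodata is not None else np.isfinite(bufa)
print("Total BUFA (map units):", bufa[valid].sum())
```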

  6. Data from: LANGUAGE – PROLIFIC SOURCE OF HUMAN MIND AND SPEECH

    • commons.datacite.org
    • figshare.com
    Updated Apr 10, 2020
    Cite
    Southern Caucasus Media Group (2020). LANGUAGE – PROLIFIC SOURCE OF HUMAN MIND AND SPEECH [Dataset]. http://doi.org/10.6084/m9.figshare.12110181.v1
    Explore at:
    Dataset updated
    Apr 10, 2020
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Figshare (http://figshare.com/)
    figshare
    Authors
    Southern Caucasus Media Group
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The paper addresses different aspects of language and the facets that influence the social way of thinking and speaking in human behaviour. As we argue, generating new words and other fundamental changes in linguistics, as the main part of language development, make a profound impact on the components of culture and language simultaneously. The article also reviews the strong power of human words and speech, which gives an unforgettable sense of human evolution. The central issue is how important word expression is, and how the features of the words expressed by human beings reveal their attitude and the way they perceive the whole world. Language has therefore played a prominent role in human history. Furthermore, the things we use in our everyday lives rely on specialized knowledge or skills that the human race produces through language. The information behind these was historically coded in verbal communication, and with the advent of writing it could be stored and become increasingly complex and sophisticated. As the paper underlines, the evolution of the human species depends on language, and if we carefully accept every change in language with the correct approach, we truly become the inheritors of an unusual and invaluable heritage.

  7. Native American Facial Timeline Dataset | Facial Images from Past

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Native American Facial Timeline Dataset | Facial Images from Past [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-historical-native-american
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Native American Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises more than 5,000 images, divided into participant-wise sets, with each set including:

    Historical Images: 22 high-quality historical images per individual, spanning a 10-year timeline.
    Enrollment Image: One modern high-quality image for reference.

    Diversity and Representation

    The dataset includes contributions from a diverse network of Native American individuals across North America:

    Geographical Representation: Participants from countries including the USA, Canada, Mexico, and more.
    Demographics: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    File Format: The dataset contains images in JPEG and HEIC formats.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each image set is accompanied by detailed metadata for each participant, including:

    Participant Identifier
    File Name
    Age at the time of capture
    Gender
    Country
    Demographic Information
    File Format

    This metadata is essential for training models that can accurately recognize and identify Native American faces across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Age Prediction Models: Training models to accurately predict the age of individuals based on facial features.
    Generative AI Models: Training generative AI models to create realistic and diverse synthetic facial images.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

  8. Location Data | GLOBAL HISTORICAL (2018 - Present) | Precise Mobile Location...

    • datarade.ai
    .csv
    Updated May 31, 2022
    Cite
    Veraset (2022). Location Data | GLOBAL HISTORICAL (2018 - Present) | Precise Mobile Location Data [Dataset]. https://datarade.ai/data-products/historical-geospatial-movement-data-170-countries-veraset
    Explore at:
    Available download formats: .csv
    Dataset updated
    May 31, 2022
    Dataset authored and provided by
    Veraset
    Area covered
    United States
    Description

    Veraset 'Movement' (GPS footfall data from mobile devices) offers real-time insights into footfall traffic patterns globally, covering the US and 170 other countries from 2018 to the present day.

    This dataset covers more than 170 countries and comprises billions of pseudonymous GPS signals daily, making it one of the cleanest mobile location datasets available.

    Veraset provides the most reliable, compliant, commercially available Location Data dataset on the market, drawing on raw GPS data from tier-1 apps, SDKs, and aggregators of mobile devices to provide customers with accurate, up-to-the-minute information on human movement.

    Prioritizing compliance and privacy, it serves as a foundation for advanced analytics and strategic planning across various industries.

    Our work has been used by Fortune 500 companies, leading institutions, and top brands that need reliable geospatial data.

    Veraset’s Movement (raw Location Data) product is the best choice for anyone building products or models powered by historical raw location data.

    Uses for Location Data:

    Infrastructure Planning
    Route Optimization and Human Migration Patterning
    Public Transit Optimization
    Placement and Targeting
    Advertising and Attribution
    Segmentation and Audience Building
    Competitive Analysis

    For up-to-date schema, visit: https://www.veraset.com/docs/movement

  9. New global dataset on historical water-related conflict and cooperation...

    • data.niaid.nih.gov
    Updated Jul 15, 2024
    Cite
    Zahra Kalantari (2024). New global dataset on historical water-related conflict and cooperation events [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7465152
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Elisie Kåresdotter
    Zahra Kalantari
    Haozhi Pan
    Gustav Skoog
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The water-related conflict and cooperation events database was created as part of a larger project, "The missing link: how does the climate affect human conflicts and collaborations through water?", whose goal is to increase understanding of how people and the climate affect water flows and how, in turn, these changes affect cooperation and conflicts over water. The project is supported by Formas, project 2017-00608.

    The database includes a collection of cooperation and conflict events between 1951 and 2019. The Transboundary Freshwater Dispute Database (TFDD, 2010) and the WCC (Pacific Institute, Oakland, CA, 2022) were used as data on water-related acts of cooperation and conflict over time. As TFDD cooperation data end in 2008, cooperation events were extended following a methodology similar to that used in the creation of TFDD. Further, geographic locations and regional classifications were added to all events, which can be used to create visualizations and to extract subsets of the database for different parts of the world. The database methodology flow chart included in the files gives a brief overview of the steps taken to prepare and process the data into the database.

    The openly available scientific article in Science of the Total Environment (STOTEN) highlights findings based on this dataset and gives further explanation of the database. The article can be found here: https://doi.org/10.1016/j.scitotenv.2023.161555.

  10. Data from: Networking for Historical Justice: The Application of Graph...

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    Pan, Keyao (2023). Networking for Historical Justice: The Application of Graph Database Management Systems to Network Analysis Projects and the Case Study of the Reparation Movement for Japanese Colonial and Wartime Atrocities [Dataset]. http://doi.org/10.7910/DVN/CZ4PBO
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Pan, Keyao
    Description

    This is an ongoing project to digitize reparation lawsuits against Japanese colonial and wartime atrocities (most famously the "comfort women" system and Nanjing Massacre) into a graph database. Information about the lawsuits is taken from publicly available sources such as the 日本戦後補償裁判総覧 (http://justice.skr.jp/souran/souran-jp-web.htm), digitized, processed, and exported as cypher codes executable by graph database management or processing systems such as Neo4j. The database seeks to not only preserve historical materials produced in this transnational movement but also aid academic research and teaching of it. The project explores the applicability of graph database management systems to network analysis research and teaching in the field of digital humanities. By inputting the data about lawsuits and lawyers in the movement into a graph database, the project demonstrates the advantages of managing network data in graph database structure over relational database structure, which is the mainstream in network analysis research, in terms of scalability, modifiability, intuitive visibility, and query efficiency. The all-plain.cypher file can be loaded into graph database systems like Neo4j (https://sandbox.neo4j.com) to generate the database. The all data Kineviz-graphxr DATE.graphxr file can be loaded into the web-based graph visualization and processing tool GraphXR(https://graphxr.kineviz.com/register) with an account.
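
    A minimal loading sketch, assuming a locally running Neo4j instance and that all-plain.cypher holds semicolon-separated statements; the connection details below are hypothetical placeholders, and the Neo4j sandbox or GraphXR route described above works without any code.

```python
# Minimal sketch (assumptions flagged in comments): execute the statements in
# all-plain.cypher against a Neo4j instance using the official Python driver.
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"          # hypothetical endpoint
driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))  # hypothetical credentials

# Assumes the file contains semicolon-separated Cypher statements
with open("all-plain.cypher", encoding="utf-8") as fh:
    statements = [s.strip() for s in fh.read().split(";") if s.strip()]

with driver.session() as session:
    for stmt in statements:
        session.run(stmt)

driver.close()
```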

  11. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
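
    A quick arithmetic check of the storage figures, assuming the quoted 19.2 percent CAGR compounds annually from the 2020 installed base of 6.7 zettabytes:

```python
# Back-of-the-envelope check, assuming simple annual compounding of the quoted CAGR.
base_zb = 6.7      # installed storage base in 2020, zettabytes
cagr = 0.192       # compound annual growth rate over 2020-2025
years = 5

projected_2025 = base_zb * (1 + cagr) ** years
print(f"Projected installed storage base in 2025: {projected_2025:.1f} ZB")  # ~16.1 ZB
```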

  12. Metfaces Image Dataset

    • kaggle.com
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Metfaces Image Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/metfaces-image-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Metfaces Image Dataset

    Metropolitan Museum of Art Faces Image Dataset

    By huggan (From Huggingface) [source]

    About this dataset

    Researchers and developers can leverage this dataset to explore and analyze facial representations depicted in different artistic styles throughout history. These images represent a rich tapestry of human expressions, cultural diversity, and artistic interpretations, providing ample opportunities for leveraging computer vision techniques.

    By utilizing this extensive dataset during model training, machine learning practitioners can enhance their algorithms' ability to recognize and interpret facial elements accurately. This is particularly beneficial in applications such as face recognition systems, emotion detection algorithms, portrait analysis tools, or even historical research endeavors focusing on portraiture.

    How to use the dataset

    • Downloading the Dataset:

      Start by downloading the dataset from Kaggle's website. The dataset file is named train.csv, which contains the necessary image data for training your models.

    • Exploring the Data:

      Once you have downloaded and extracted the dataset, it's time to explore its contents. Load the train.csv file into your preferred programming environment or data analysis tool to get an overview of its structure and columns.

    • Understanding the Columns:

      The main column of interest in this dataset is called image. This column contains links or references to specific images in the Metropolitan Museum of Art's collection, showcasing different faces captured within them.

    • Accessing Images from URLs or References:

      To access each image associated with their respective URLs or references, you can write code or use libraries that support web scraping or download functionality. Each row under the image column will provide you with a URL or reference that can be used to fetch and download that particular image.

    • Preprocessing and Data Augmentation (Optional):

      Depending on your use case, you might need to perform various preprocessing techniques on these images before using them as input for your machine learning models. Preprocessing steps may include resizing, cropping, normalization, color space conversions, etc.

    • Training Machine Learning Models:

      Once you have preprocessed any necessary data, it's time to start training your machine learning models using this image dataset as training samples.

    • Analysis and Evaluation:

      After successfully training your model(s), evaluate their performance using validation datasets if available. You can also make predictions on unseen images, measure accuracy, and analyze the results to gain insights or adjust your models accordingly.

    • Additional Considerations:

      Remember to give appropriate credit to the Metropolitan Museum of Art for providing this image dataset when using it in research papers or other publications. Additionally, be aware of any licensing restrictions or terms of use associated with the images themselves.
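
    A minimal sketch of the download-and-access steps above, assuming the image column holds direct URLs; the output directory and the 20-row limit are illustrative choices.

```python
# Minimal sketch: load train.csv and fetch the images referenced in the "image"
# column. Assumes the column holds direct URLs (it may instead hold references).
import os
import pandas as pd
import requests

df = pd.read_csv("train.csv")
os.makedirs("metfaces_images", exist_ok=True)

for idx, url in df["image"].head(20).items():   # limit to 20 for a quick test
    try:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(os.path.join("metfaces_images", f"{idx}.jpg"), "wb") as fh:
            fh.write(resp.content)
    except requests.RequestException as exc:
        print(f"Skipping row {idx}: {exc}")
```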

    Research Ideas

    • Facial recognition: This dataset can be used to train machine learning models for facial recognition systems. By using the various images of faces from the Metropolitan Museum of Art, the models can learn to identify and differentiate between different individuals based on their facial features.
    • Emotion detection: The images in this dataset can be utilized for training models that can detect emotions on human faces. This could be valuable in applications such as market research, where understanding customer emotional responses to products or advertisements is crucial.
    • Cultural analysis: With a diverse range of historical faces from different times and regions, this dataset could be employed for cultural analysis and exploration. Machine learning algorithms can identify common visual patterns or differences among different cultures, shedding light on the evolution of human appearances across time and geography

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv | Column name | Description ...

  13. Data and Code for: Automated Linking of Historical Data

    • openicpsr.org
    delimited
    Updated Mar 1, 2021
    Cite
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Pérez (2021). Data and Code for: Automated Linking of Historical Data [Dataset]. http://doi.org/10.3886/E133781V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 1, 2021
    Dataset provided by
    American Economic Association
    Authors
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1850 - 1940
    Area covered
    United States; Norway
    Description

    The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
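
    As a toy illustration of the general idea of automated record linkage (not the authors' algorithms or their code), the snippet below links records across two census waves by exact birth-year blocking plus a name-similarity threshold; the field names and the 0.9 cutoff are hypothetical.

```python
# Toy illustration only: exact birth-year blocking + name-similarity threshold,
# using only the standard library. Real linking methods are far more involved.
from difflib import SequenceMatcher

census_1900 = [{"id": 1, "name": "Ole Hansen", "birth_year": 1862}]
census_1910 = [{"id": 7, "name": "Ole Hanson", "birth_year": 1862},
               {"id": 8, "name": "Olaf Hansen", "birth_year": 1870}]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

links = []
for rec_a in census_1900:
    # Block on birth year, then keep the most similar name above the cutoff
    candidates = [r for r in census_1910 if r["birth_year"] == rec_a["birth_year"]]
    best = max(candidates, key=lambda r: similarity(rec_a["name"], r["name"]), default=None)
    if best and similarity(rec_a["name"], best["name"]) >= 0.9:
        links.append((rec_a["id"], best["id"]))

print(links)   # [(1, 7)]: "Ole Hansen" vs "Ole Hanson" has ratio 0.9
```

    Tightening or loosening the similarity cutoff trades the false positive rate against the match rate, which is exactly the frontier the study describes.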

  14. GUI for a historical database

    • figshare.com
    txt
    Updated Jun 29, 2018
    Cite
    Don de Lange (2018). GUI for a historical database [Dataset]. http://doi.org/10.6084/m9.figshare.6429452.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 29, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Don de Lange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this research, the characteristics of a usable Graphical User Interface (GUI) are determined in the context of a historical database. A GUI is an interface that enables users to directly interact with the content the GUI is built upon and the functionalities the GUI offers. The historical database concerns former German citizens residing in the Netherlands who went through the process of removing their Enemy of the State status. This status was given by the Dutch government in the aftermath of WWII as retribution for the German atrocities during the war. The operation ended due to resistance amongst Dutch citizens, after which those affected could have their Enemy of the State status removed. The mockup GUI incorporated the following usability characteristics: giving users the information they seek with justification, clear and useful functionalities, simplicity of use, and a structured layout. The mockup GUI was evaluated by average internet users, who tested it interactively and reviewed their experience using usability statements. The mockup GUI was evaluated as good, so the given characteristics make the GUI usable.

  15. Bluesky Social Dataset Dataset

    • paperswithcode.com
    Updated Apr 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Bluesky Social Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/bluesky-social-dataset
    Explore at:
    Dataset updated
    Apr 28, 2024
    Description

    Bluesky Social Dataset

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.

    This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.

    Dataset

    Here is a description of the dataset files.

    followers.csv.gz: This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v).
    posts.tar.gz: This compressed folder contains data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line.
    interactions.csv.gz: This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers and represents a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    graphs.tar.gz: This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    feed_posts.tar.gz: This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Posts are stored as JSON-formatted lines. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
    feed_bookmarks.csv: This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp.
    feed_post_likes.tar.gz: This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
    scripts.tar.gz: A collection of Python scripts, including the ones originally used to crawl the data and to perform experiments. These scripts are detailed in a document released within the folder.
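
    A minimal sketch of loading the follower edge list described above into a directed graph with networkx; for the full multi-million-user graph a more memory-frugal representation may be preferable.

```python
# Minimal sketch: build the directed follower graph from followers.csv.gz,
# where each row "u,v" means user u follows user v.
import csv
import gzip
import networkx as nx

G = nx.DiGraph()
with gzip.open("followers.csv.gz", mode="rt", encoding="utf-8") as fh:
    for u, v in csv.reader(fh):
        G.add_edge(int(u), int(v))   # directed edge: u follows v

print(G.number_of_nodes(), "users,", G.number_of_edges(), "follow edges")
```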

    Citation

    If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data. (2024) arXiv:2404.18984

    Acknowledgments: This work is supported by:

    the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu); SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021; EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

  16. Post metadata.

    • plos.figshare.com
    xls
    Updated Nov 5, 2024
    + more versions
    Cite
    Andrea Failla; Giulio Rossetti (2024). Post metadata. [Dataset]. http://doi.org/10.1371/journal.pone.0310330.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Failla; Giulio Rossetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.

  17. Polish Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Polish Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/polish-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Polish Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Polish language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Polish. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Polish people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains the question with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short phrases, single sentences, and paragraph types of answers. The answer contains text strings, numerical values, date and time formats as well. Such diversity strengthens the Language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Polish Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
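
    A minimal loading sketch based on the annotation fields listed above; the file name is hypothetical and the JSON release is assumed to be a list of records.

```python
# Minimal sketch: load the JSON release and filter on documented annotation fields.
# The file name and the top-level list structure are assumptions.
import json

with open("polish_open_ended_qa.json", encoding="utf-8") as fh:
    records = json.load(fh)

hard_instruction = [
    r for r in records
    if r.get("complexity") == "hard" and r.get("prompt_type") == "instruction"
]
print(f"{len(hard_instruction)} hard instruction-type prompts out of {len(records)} records")
```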

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the question and answers in Polish are grammatically accurate without any word or grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Polish Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  18. DEMETER2 data

    • figshare.com
    txt
    Updated Apr 9, 2020
    + more versions
    Cite
    Cancer Data Science (2020). DEMETER2 data [Dataset]. http://doi.org/10.6084/m9.figshare.6025238.v6
    Explore at:
    Available download formats: txt
    Dataset updated
    Apr 9, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cancer Data Science
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer cell line genetic dependencies estimated using the DEMETER2 model. DEMETER2 is applied to three large-scale RNAi screening datasets: the Broad Institute Project Achilles, Novartis Project DRIVE, and the Marcotte et al. breast cell line dataset. The model is also applied to generate a combined dataset of gene dependencies covering a total of 712 unique cancer cell lines. For more information visit https://depmap.org/R2-D2/. Visit the Cancer Dependency Map portal at https://depmap.org to explore related datasets. Email questions to depmap@broadinstitute.org.

    This dataset includes gene dependencies estimated using the DEMETER2 model, the raw input datasets used to fit the models, and associated metadata. See the README file for more details about the dataset contents and version history.

    Version history (see README for more details):

    v1: Initial data release
    v2: Removed a small number of non-human genes (e.g. GFP, RFP) from the shRNA-to-gene mapping; updated cell line names to be consistent with DepMap names, according to the following map (old -> new):
    v3: Added estimated seed effect matrices
    v4: Added RNAseq and mutation data files used in the analysis for the manuscript
    v5: Fixed a minor bug with the Marcotte LFC data that caused hairpins targeting multiple genes to appear multiple times in the LFC matrix. This created bias in the seed effect estimates for those hairpins, causing very minor differences in the resulting model parameters.
    v6: Added tables with shRNA quality metrics for the Achilles and DRIVE data

  19. P

    Data from: AmsterTime Dataset

    • paperswithcode.com
    Updated Apr 13, 2023
    Cite
    Burak Yildiz; Seyran Khademi; Ronald Maria Siebes; Jan van Gemert (2023). AmsterTime Dataset [Dataset]. https://paperswithcode.com/dataset/amstertime
    Explore at:
    Dataset updated
    Apr 13, 2023
    Authors
    Burak Yildiz; Seyran Khademi; Ronald Maria Siebes; Jan van Gemert
    Description

    The AmsterTime dataset offers a collection of 2,500 well-curated images that match present-day street-view scenes to historical archival images of the city of Amsterdam. The image pairs capture the same place with different cameras, viewpoints, and appearances. Unlike existing benchmark datasets, AmsterTime is crowdsourced directly through a GIS navigation platform (Mapillary). All matching pairs are then checked by a human expert, which both confirms the correct matches and provides a reference measure of human competence on the Visual Place Recognition (VPR) task.

    The properties of the dataset are summarized as follows:

    1,200+ license-free images from the Amsterdam City Archive, representing urban places in the city of Amsterdam, captured over the past century by many photographers.
    All archival queries are matched with street-view images from Mapillary.
    All matches are verified by architectural historians and Amsterdam inhabitants.
    Image pairs consist of an archival and a street-view photo capturing the same place with different cameras, time lags, structural changes, occlusion, viewpoints, appearance, and illumination.
    The dataset exhibits a domain shift between query and gallery due to the significant difference between scanned archival and street-view images.

    Two sub-tasks are created on the dataset:

    Verification is a binary classification (auxiliary) task: given a pair of archival and street-view images, decide whether they show the same place. All of the crowdsourced image pairs are labeled positive, and an equal number of negative samples is generated by randomly pairing archival and street-view images, giving a total of 2,462 pairs for the verification task.

    Retrieval is the main task, corresponding to VPR, in which a given query image is matched against a set of gallery images. For this task, the AmsterTime dataset offers 1,231 query images, and the leave-one-out set serves as the gallery for each query.
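    For orientation, here is a minimal, hedged sketch of how retrieval performance might be scored on such a query/gallery split. It is not the official evaluation code: the feature extractor, the recall@k metric, and the array names (query_feats, gallery_feats, ground_truth) are assumptions for illustration.

        import numpy as np

        def recall_at_k(query_feats, gallery_feats, ground_truth, k=5):
            """Fraction of queries whose correct gallery index is in the top-k matches."""
            sims = query_feats @ gallery_feats.T          # cosine similarity (L2-normalised inputs)
            topk = np.argsort(-sims, axis=1)[:, :k]       # best k gallery indices per query
            hits = [gt in row for gt, row in zip(ground_truth, topk)]
            return float(np.mean(hits))

        # Toy usage with random features; replace with real image descriptors.
        rng = np.random.default_rng(0)
        q = rng.normal(size=(1231, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
        g = rng.normal(size=(1231, 128)); g /= np.linalg.norm(g, axis=1, keepdims=True)
        print(recall_at_k(q, g, np.arange(1231), k=5))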

  20. p

    CARD 2.0 - Dataset - Pandora

    • pandora.earth
    Updated Mar 11, 2025
    + more versions
    Cite
    (2025). CARD 2.0 - Dataset - Pandora [Dataset]. https://pandora.earth/gl_ES/dataset/card-2-0
    Explore at:
    Dataset updated
    Mar 11, 2025
    Description

    The Canadian Archaeological Radiocarbon Database (CARD) is a compilation of radiocarbon measurements that indicate the ages of samples, primarily from archaeological sites in North America. CARD also includes samples from paleontological and geological contexts, and coverage is slowly expanding into Central and South America. These data represent a significant investment and resource for researchers interested in human history and its context.

    CARD was created by Dr. Richard "Dick" Morlan of the Canadian Museum of History (formerly the Canadian Museum of Civilization), and its existence is a product of his genius and labour. In July 2014, the Canadian Museum of History (CMH) and the Laboratory of Archaeology (LOA) at the University of British Columbia formed a partnership to revise and update the CARD platform. The current version of CARD (2.0) adds useful new features, including unlimited batch uploading/downloading of data and spatial/map visualization. However, the core of CARD remains the 14C dates painstakingly submitted by researchers across the world and compiled by Dick. We hope that this revision maintains the relationship that Dick established in one of the first crowd-sourced, big-data endeavours: CARD provides utility and comprehensiveness, and in exchange researchers provide us with dates. See the HELP tab for more information and instructions on using CARD.

    Our efforts to update and upgrade CARD are just beginning, and we are moving in two directions: increasing the quantity and quality of CARD data, and improving the functionality of the CARD platform. We are looking for partners to assist us in both. CARD data contains some errors and is in some cases incomplete, and we are engaged in a long-term process of scrubbing the data. We are also developing new functional tools to make CARD more valuable to researchers, including the generation of heat maps of date concentrations over time and the selection of data via a map interface. We also have longer-term plans to add calibration sockets with existing calibration services. If you have 14C data, please upload it to CARD. If you are interested in getting involved in the expansion and development of CARD, please email us at admin@card.anth.ubc.ca.

    Radiocarbon assessment has an effective range of about 250 to 50,000+ years, and as a result most of the samples in CARD are associated with Indigenous archaeological sites. These represent a significant resource for aboriginal history. CARD fuzzes location data for public visitors to the database at a 1:2,000,000 scale; accessing CARD's full capabilities requires a security account, available only to researchers at accredited institutions. As Dick wrote in CARD 1.0, "The long term future of this database will depend upon whether or not the archaeological community finds it truly useful." We hope you do.
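    As background to the uncalibrated dates stored in such databases, the sketch below shows the standard conversion from fraction modern carbon (F14C) to a conventional radiocarbon age using the Libby mean life of 8,033 years. It is purely illustrative and not part of CARD, and calendar-age calibration against a curve such as IntCal is not shown.

        import math

        def conventional_radiocarbon_age(f14c):
            """Conventional 14C age in years BP from a fraction-modern value."""
            return -8033.0 * math.log(f14c)

        # One Libby half-life of decay corresponds to roughly 5,568 years BP.
        print(round(conventional_radiocarbon_age(0.5)))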
