100+ datasets found
  1. Movies Performance and Feature Statistics

    • kaggle.com
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Movies Performance and Feature Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/movies-performance-and-feature-statistics
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Movies Performance and Feature Statistics

    Analyzing Box Office Performance, Rating and Audience Reactions

    By Yashwanth Sharaff [source]

    About this dataset

    This dataset contains essential characteristics of a variety of movies: basic information such as each movie's title, budget, MPAA rating, release date, genre, runtime and summary, plus performance indicators such as gross revenue and rating count. With this data set we can better understand the film industry and explore how different features and performance metrics relate to one another and to a movie's success. It can also help identify which features are the strongest indicators of a high-grossing feature film.

    How to use the dataset

    To get the most out of this data set, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used for further searches on streaming services and on websites that provide detailed information about movies. The 'MPAA Rating' column lists the movie's Motion Picture Association (MPAA) rating, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned) or R (Under 17 Requires Accompanying Parent or Guardian). The 'Budget' column gives an approximate idea of how much a production cost, and the 'Gross' column gives its earnings from theatrical release. The 'Release Date' column records when each film was released or is scheduled to be released. The 'Genre' column gives the type of movie, 'Runtime' gives its length in minutes, and 'Rating Count' gives the number of people who have rated or reviewed it. Finally, the 'Summary' column gives a brief overview of the film, which is worth reading before watching, especially with children in the family.

    So go ahead - start exploring this interesting dataset today!

    Research Ideas

    • Creating a box office prediction model using budget, genre, release date and MPAA rating
    • Using the summary data to create a sentiment analysis tool for movie reviews
    • Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: movies.csv

    | Column name  | Description |
    |:-------------|:------------|
    | Title        | The title of the movie. (String) |
    | MPAA Rating  | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
    | Budget       | The budget of the movie in US dollars. (Integer) |
    | Gross        | The gross revenue of the movie in US dollars. (Integer) |
    | Release Date | The date the movie was released. (Date) |
    | Genre        | The genre of the movie. (String) |
    | Runtime      | The length of the movie in minutes. (Integer) |
    | Rating Count | The number of ratings the movie has received. (Integer) |
    | Summary      | A brief summary of the movie. (String) |
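
As a minimal sketch of working with these columns, the snippet below parses rows shaped like movies.csv with Python's csv module and computes a gross-to-budget ratio per title. The two sample rows are invented placeholders, not values from the dataset, and the date format shown is an assumption. Treating Budget and Gross as plain integers follows the column descriptions only; the real file may need extra cleaning.

```python
import csv
import io

# Invented sample rows that follow the documented column names;
# they are placeholders, not records from movies.csv.
SAMPLE = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG,1000000,3500000,2001-05-04,Comedy,95,1200,A placeholder summary.
Example Film B,R,2000000,1000000,2003-10-17,Drama,120,800,Another placeholder summary.
"""


def gross_to_budget(rows):
    """Return {title: gross / budget}, skipping rows with unusable numbers."""
    ratios = {}
    for row in rows:
        try:
            budget = int(row["Budget"])
            gross = int(row["Gross"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows whose numbers cannot be parsed
        if budget > 0:
            ratios[row["Title"]] = gross / budget
    return ratios


ratios = gross_to_budget(csv.DictReader(io.StringIO(SAMPLE)))
# Print titles from the highest ratio to the lowest.
for title, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title}: {ratio:.2f}")
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument could be swapped for an open file handle on movies.csv.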

    Acknowledgements

    If you use this dataset in your research, please credit Yashwanth Sharaff.

  2. Data from: Surface Ocean CO2 Atlas Database Version 2021 (SOCATv2021) (NCEI...

    • catalog.data.gov
    • cmr.earthdata.nasa.gov
    • +2 more
    Updated Nov 1, 2025
    Cite
    (Point of Contact) (2025). Surface Ocean CO2 Atlas Database Version 2021 (SOCATv2021) (NCEI Accession 0235360) [Dataset]. https://catalog.data.gov/dataset/surface-ocean-co2-atlas-database-version-2021-socatv2021-ncei-accession-0235360
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    This dataset consists of the Surface Ocean CO2 Atlas Version 2021 (SOCATv2021) data product files. The Surface Ocean CO2 Atlas (SOCAT) documents the increase in surface ocean CO2 (carbon dioxide), a critical measure as the oceans are taking up one quarter of the global CO2 emissions from human activity. SOCAT version 2021 has 30.6 million quality-controlled surface ocean fCO2 (fugacity of CO2) observations with an estimated accuracy of better than 5 µatm and a WOCE flag of 2 (good) from 1957 to 2020 for the global oceans and coastal seas. In addition, 2.1 million values with an estimated accuracy of 5 to 10 µatm are available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an estimated accuracy of better than 2 µatm, flags of C and D for an accuracy of better than 5 µatm and a flag of E for an accuracy of better than 10 µatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3 to 2021. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer (www.socat.info). All data sets, where data quality has been deemed acceptable, have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 µatm (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas.

    The gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. The SOCAT Data Use Statement asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT-based data products are used for quantification of the ocean carbon sink, to estimate ocean acidification, for evaluation of biogeochemical sensor data and to evaluate climate models (CMIP). Since 2013 SOCAT products inform the annual Global Carbon Budget. The annual SOCAT releases are made by the SOCAT scientific community as a Voluntary Commitment for United Nations Sustainable Development Goal 14.3 (Reduce Ocean Acidification) (#OceanAction20464). More broadly the SOCAT releases contribute to UN SDG 13 (Climate Action) and SDG 14 (Life Below Water), and to the UN Decade of Ocean Science for Sustainable Development. Hundreds of peer-reviewed scientific publications and high-impact reports cite SOCAT.

    The SOCAT community-led synthesis product is a key step in the value chain based on in situ inorganic carbon measurements of the oceans, which provides policy makers with essential information on ocean CO2 uptake in climate negotiations. The global need for accurate knowledge of ocean CO2 uptake and its variation (including ocean acidification) makes sustained funding for in situ surface ocean CO2 observations imperative.
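
The flag rules in this description can be sketched in a few lines of Python. The mapping from data set flag to accuracy bound and the inclusion rule (data set flags A to D with a WOCE flag of 2) are taken from the text above. The record layout and the sample values are invented for illustration and do not reflect the actual SOCAT file format.

```python
# Accuracy bound (in µatm) implied by each data set flag, per the description.
ACCURACY_BOUND_UATM = {"A": 2, "B": 2, "C": 5, "D": 5, "E": 10}


def in_main_synthesis(dataset_flag, woce_flag):
    """True if a value meets the stated criteria for the main synthesis files."""
    return dataset_flag in {"A", "B", "C", "D"} and woce_flag == 2


# Hypothetical records: (data set flag, WOCE flag, fCO2 in µatm). Values are invented.
records = [
    ("A", 2, 350.1),
    ("D", 2, 372.4),
    ("E", 2, 360.0),  # flag E: only available via the additional data products
    ("B", 3, 341.7),  # WOCE flag 3 (questionable)
    ("C", 4, 390.2),  # WOCE flag 4 (bad)
]

main = [r for r in records if in_main_synthesis(r[0], r[1])]
print(len(main), "of", len(records), "records qualify")
for flag, woce, fco2 in main:
    print(flag, woce, fco2, f"(accuracy better than {ACCURACY_BOUND_UATM[flag]} µatm)")
```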

  3. Reddit: /r/Art

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    The Devastator (2022). Reddit: /r/Art [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-online-art-trends-with-reddit-posting/discussion?sort=undefined
    Available download formats: zip (84621 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/Art

    Examining Content by Title, Score, ID, URL, Comments, Create Date, and Timestamp

    By Reddit [source]

    About this dataset

    This dataset explores the art community on Reddit through the posts on /r/Art. By examining the titles, scores, IDs, URLs, comment counts, creation dates and timestamps associated with each post, researchers can gain insight into how art enthusiasts share their work and build networks on the platform. Analyzing this data can show which topics attract more attention from viewers and how members interact with one another in online discussions. The dataset can also be used to explore larger issues that shape art communities today, from production trends to consumption patterns. Overall, it is a useful resource for analyzing digital spaces where art is circulated and discussed, and for understanding how ideas are created and promoted throughout creative networks.

    How to use the dataset

    This dataset is a useful source of information on online art trends, built from Reddit posts related to art. This guide discusses how you can use it to gather insights about how art is shared and discussed on the web.
    Start by familiarizing yourself with the columns included in the dataset. Each post has a title, a score (number of upvotes), a URL, a comment count, a creation date and a timestamp. Taken individually or compared across posts, these values show which content is most discussed or most favored within the Reddit community.
    After exploring these general features, move on to more specific components such as the content each post links to (including images) and the creation dates. Over longer periods, these variables can help uncover patterns in how the community interacts with certain types of content; over shorter intervals, they give context on which topics are trending at a given moment.
    Finally, you can examine titles for words and phrases that appear often among posts discussing similar types of artwork or other media. Identifying keywords shared across different groups of posts, and relating them to scores and comment counts, can give a fuller picture of what kind of engagement each group attracts and how submissions are received. Good luck!
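
The title analysis described above can be sketched with Python's standard library. The column names `title`, `score`, `url` and `comms_num` come from the Art.csv column list; the sample rows, the example.com URLs and the small stop-word set are invented for illustration.

```python
import csv
import io
import re
from collections import Counter

# Invented sample rows using the documented column names; not real posts.
SAMPLE = """title,score,url,comms_num
"Sunset over the bay, oil on canvas",120,https://example.com/1,14
"Portrait study, oil on canvas",45,https://example.com/2,3
"Sunset sketch, digital",300,https://example.com/3,27
"""

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "on", "over", "a", "an", "of", "in"}


def title_word_counts(rows):
    """Count lowercase words across post titles, ignoring stop words."""
    counts = Counter()
    for row in rows:
        words = re.findall(r"[a-z']+", row["title"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


counts = title_word_counts(csv.DictReader(io.StringIO(SAMPLE)))
print(counts.most_common(3))
```

Weighting each word by the post's `score` instead of counting occurrences would be one way to relate vocabulary to reception.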

    Research Ideas

    • Analyzing topics and themes within art posts to determine what content is most popular.
    • Examining the score of art posts to determine how the responding audience engages with each piece.
    • Comparing across different subreddits to explore the ‘meta-discourse’ of topics that appear in multiple forums or platforms

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Art.csv

    | Column name | Description |
    |:------------|:------------|
    | title       | The title of the post. (String) |
    | score       | The number of upvotes the post has received. (Integer) |
    | url         | The URL of the post. (String) |
    | comms_num   | ... |

  4. MPEP_DUTCH

    • huggingface.co
    Updated Apr 4, 2024
    Cite
    Data Is Better Together (2024). MPEP_DUTCH [Dataset]. https://huggingface.co/datasets/data-is-better-together/MPEP_DUTCH
    Dataset updated
    Apr 4, 2024
    Dataset authored and provided by
    Data Is Better Together
    Description

    Dataset Card for MPEP_DUTCH

    This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

      Dataset Summary
    

    This dataset contains:

    A dataset configuration file conforming to the Argilla dataset format named argilla.yaml. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/data-is-better-together/MPEP_DUTCH.

  5. imdb_reviews

    • tensorflow.org
    • kaggle.com
    Updated Sep 20, 2024
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Dataset updated
    Sep 20, 2024
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split; each example is a dict of tensors.
    ds = tfds.load('imdb_reviews', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  6. Global Roads Open Access Data Set, Version 1 (gROADSv1)

    • earthdata.nasa.gov
    • dataverse.harvard.edu
    • +5 more
    Updated May 16, 2013
    Cite
    ESDIS (2013). Global Roads Open Access Data Set, Version 1 (gROADSv1) [Dataset]. http://doi.org/10.7927/H4VD6WCT
    Dataset updated
    May 16, 2013
    Dataset authored and provided by
    ESDIS
    Description

    The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.

  7. Data Science Stack Exchange Dataset

    • kaggle.com
    zip
    Updated Jul 11, 2022
    Cite
    Aneesh Tickoo (2022). Data Science Stack Exchange Dataset [Dataset]. https://www.kaggle.com/datasets/aneeshtickoo/data-science-stack-exchange
    Available download formats: zip (91829637 bytes)
    Dataset updated
    Jul 11, 2022
    Authors
    Aneesh Tickoo
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stack Exchange is a network of question-and-answer websites on topics in diverse fields, each site covering a specific topic, where questions, answers, and users are subject to a reputation award process. The reputation system allows the sites to be self-moderating.

    The dataset here is specific to one site in the Stack Exchange network, Data Science Stack Exchange. It is distributed over multiple files and contains information on posts about data science that can be used for language processing, along with data such as which posts users like most. A lot of analysis can be done on this dataset.

  8. Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Apr 15, 2015
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan [Dataset]. https://microdata.worldbank.org/index.php/catalog/1117
    Dataset updated
    Apr 15, 2015
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2011
    Area covered
    Afghanistan
    Description

    Abstract

    Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

    The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

    Geographic coverage

    National Coverage.

    Analysis unit

    Individual

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

    Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

    Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of year.

    The sample size in Afghanistan was 1,000 individuals. Gender-matched sampling was used during the final stage of selection.
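
To illustrate selection with probabilities proportional to population size, here is a minimal sketch of systematic PPS sampling over primary sampling units. The source does not say which PPS algorithm Gallup uses, so the systematic variant is an assumption chosen for simplicity, and the cluster sizes are invented.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be picked more than once.
    """
    total = sum(sizes)
    step = total / n                  # sampling interval
    start = rng.uniform(0, step)      # random start within the first interval
    bounds = list(accumulate(sizes))  # cumulative size at the end of each unit
    picks, idx = [], 0
    for k in range(n):
        point = start + k * step
        # Advance to the unit whose cumulative range contains this point.
        while idx < len(bounds) - 1 and bounds[idx] <= point:
            idx += 1
        picks.append(idx)
    return picks


# Hypothetical cluster populations; the numbers are invented for illustration.
cluster_sizes = [1000, 10, 10, 10]
picks = pps_systematic(cluster_sizes, n=2, rng=random.Random(0))
print(picks)
```

With these sizes the first cluster is always selected, because it is larger than the sampling interval; the small clusters are selected only occasionally, in proportion to their share of the total.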

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

    Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.

  9. Inspire data set BPL “Good morning mat III”

    • gimi9.com
    • data.europa.eu
    Cite
    Inspire data set BPL “Good morning mat III” [Dataset]. https://gimi9.com/dataset/eu_7577aaaf-97c6-4c8f-aa76-a93a442a2d74
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Development plan “Good Morningmatt III” of the municipality of Iffezheim, transformed according to INSPIRE and based on an XPlanung dataset in version 5.0.

  10. Milling Wear - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Aug 28, 2007
    Cite
    nasa.gov (2007). Milling Wear - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/milling-wear
    Dataset updated
    Aug 28, 2007
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Experiments on a milling machine for different speeds, feeds, and depth of cut. Records the wear of the milling insert, VB. The data set was provided by the UC Berkeley Emergent Space Tensegrities (BEST) Lab.

  11. Database populated with European diversification experiences

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jul 29, 2020
    Cite
    Zenodo (2020). Database populated with European diversification experiences [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3966842?locale=pl
    Available download formats: unknown (250)
    Dataset updated
    Jul 29, 2020
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Europe
    Description

    The EU Horizon 2020 project DiverIMPACTS aims to promote the realisation of the full potential of crop diversification through rotation, multicropping and intercropping by demonstrating technical, economic and environmental benefits for farmers, along the value chain and for society at large, and by providing innovations that can remove existing barriers and lock-ins of practical diffusion. DiverIMPACTS does so by combining findings from several participatory case studies with a set of field experiments across Europe, and translating these into strategies, recommendations and fit-for-purpose tools developed with and for farmers, advisors and other actors along the value chain. To first gain a good overview of the current situation, i.e. the existing success stories and challenges of crop diversification in Europe, Work Package 1 (WP 1) identified and analysed factors of success and failure associated with a variety of crop diversification experiences (CDEs) outside those already represented in the consortium (see Deliverable 1.1). WP 1 thus makes sure that the rich experience with crop diversification initiatives across Europe (e.g. from other Horizon 2020 projects) is taken into account for developing strategies, recommendations and tools. Deliverable 1.1 provided i) a list of key drivers (ex ante occurrence of market opportunities, environmental constraints, availability of enabling advisory services, land and workforce availability etc.) to be further considered in WP3 and WP5; and ii) a comprehensive and exhaustive description of the links between key factors and CDE types. This analysis is the basis for consolidating or updating the tentative typology of crop diversification situations used for setting up DiverIMPACTS (case studies), and was used for selecting experiences for more detailed investigations in T1.2. It also complements the identification and characterisation of lock-ins and barriers to crop diversification, and supports overcoming them.

    During the process of collecting, cleaning and analysing the survey data, a Database of European diversification experiences was created. Altogether, 128 valid responses from 15 European countries were received in T1.1 and included in the database (mainly from the project countries Belgium, France, Germany, Hungary, Italy, the Netherlands, Poland, Romania, Sweden, Switzerland, and UK, but also from Denmark, Finland, Luxembourg and Spain). The database is stored in original and back-up form in a tabular .csv format that can be opened in Excel, on the Sharepoint system of the project and now on Zenodo, under the restricted WP1 area. A further .csv file was created to store the metadata of the database. This file gives a better overview of the questions and sub-questions that were asked in the survey and the type of answer that could be provided to each of them (e.g. factor, Yes-No selection or character). Using the metadata and the database, a selection was made of personal data fields (e.g. email addresses and names of people) that cannot be published with open access and need special attention and data handling. These variables were removed from the original database, and a public version of the database was created that can be shared with third parties. Links to the data files will be shared hereafter. Developing a Shiny application in R was chosen as a solution to visualize the public data and make it possible for Partners and all interested parties to interactively view the survey results. The Shiny application is shared as an R package and is freely accessible on the internet. Users can download the application and the public data in order to visualize them on their own computer. A remote solution, facilitating the consultation of the data, will be installed in CRA-W, where the open data analyses module will be hosted.

    A short user guide and tutorial is part of this deliverable to help interested parties use the Shiny interface. The chosen approach, linking R scripts, R packages and data files, will be useful in the future in order to continuously extend the database and to update the application (new graphs, new functions responding to the demand of the main users). The release of the application will be shared using modern information and communication technologies: project website, newsletter, blogs, Twitter and other social networks. The main deliverable (D1.2), which is public, is available here: 10.5281/zenodo.3966852
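
The step of deriving a public version from the database and its metadata file can be sketched as follows. The project itself works in R; this sketch uses Python only for illustration. The metadata layout (a `personal` flag per variable), the variable names and the sample responses are all hypothetical, since the description does not give the actual file structure.

```python
import csv
import io

# Hypothetical metadata: one row per survey variable, with a personal-data flag.
METADATA = """variable,answer_type,personal
respondent_name,character,yes
email,character,yes
country,factor,no
uses_intercropping,Yes-No,no
"""

# Hypothetical survey responses; names and values are invented.
DATABASE = """respondent_name,email,country,uses_intercropping
Jane Doe,jane@example.org,Belgium,Yes
John Roe,john@example.org,Poland,No
"""


def public_version(database_csv, metadata_csv):
    """Return the database as CSV text with personal-data columns removed."""
    personal = {
        row["variable"]
        for row in csv.DictReader(io.StringIO(metadata_csv))
        if row["personal"] == "yes"
    }
    reader = csv.DictReader(io.StringIO(database_csv))
    keep = [name for name in reader.fieldnames if name not in personal]
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns that are not kept.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


public_csv = public_version(DATABASE, METADATA)
print(public_csv)
```

Driving the column selection from the metadata file, rather than hard-coding names, means a newly added personal field only needs to be flagged once.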

  12. Temu and Fast Fashion E-Receipt Data | Consumer Transaction Data | Asia,...

    • datarade.ai
    .json, .xml, .csv
    Updated Mar 3, 2024
    + more versions
    Cite
    Measurable AI (2024). Temu and Fast Fashion E-Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data | 23+ Countries [Dataset]. https://datarade.ai/data-products/temu-and-fast-fashion-email-receipt-data-consumer-transacti-measurable-ai
    Available download formats
    .json, .xml, .csv
    Dataset updated
    Mar 3, 2024
    Dataset authored and provided by
    Measurable AI
    Area covered
    India, Latin America, Brazil, United States of America, Japan, Mexico, Colombia, Chile, Argentina
    Description

    The Measurable AI Temu & Fast Fashion E-Receipt Dataset is a leading source of email receipts and transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

    We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

    Use Cases

    Our clients leverage our datasets to produce actionable consumer insights such as:

    - Market share analysis
    - User behavioral traits (e.g. retention rates)
    - Average order values
    - Promotional strategies used by the key players

    Several of our clients also use our datasets for forecasting and understanding industry trends better.

    Coverage

    - Asia (Japan, Thailand, Malaysia, Vietnam, Indonesia, Singapore, Hong Kong, Philippines)
    - EMEA (Spain, United Arab Emirates, Saudi Arabia, Qatar)
    - Latin America (Brazil, Mexico, Colombia, Argentina)

    Granular Data

    Itemized, high-definition data per transaction level with metrics such as:

    - Order value
    - Items ordered
    - No. of orders per user
    - Delivery fee
    - Service fee
    - Promotions used
    - Geolocation data and more
    - Email ID (can work out user overlap with peers and loyalty)

    Aggregate Data

    - Weekly/monthly order volume
    - Revenue delivered in aggregate form, with historical data dating back to 2018

    Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

    Our dataset is GDPR compliant, contains no PII and is aggregated and anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.

  13. Crash Data

    • catalog.data.gov
    • data.townofcary.org
    • +2more
    Updated Nov 22, 2025
    Cite
    Cary (2025). Crash Data [Dataset]. https://catalog.data.gov/dataset/crash-data
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    Cary
    Description

    This dataset contains crash information from the last five years to the current date. The data is based on the National Incident Based Reporting System (NIBRS). The data is dynamic, allowing for additions, deletions and modifications at any time, resulting in more accurate information in the database. Due to ongoing and continuous data entry, the numbers of records in subsequent extractions are subject to change.

    About Crash Data

    The Cary Police Department strives to make crash data as accurate as possible, but there is no avoiding the introduction of errors into this process, which relies on data furnished by many people and that cannot always be verified. As the data is updated on this site there will be instances of adding new incidents and updating existing data with information gathered through the investigative process.

    Not surprisingly, crash data becomes more accurate over time, as new crashes are reported and more information comes to light during investigations.

    This dynamic nature of crash data means that content provided here today will probably differ from content provided a week from now. Likewise, content provided on this site will probably differ somewhat from crime statistics published elsewhere by the Town of Cary, even though they draw from the same database.

    About Crash Locations

    Crash locations reflect the approximate locations of the crash. Certain crashes may not appear on maps if there is insufficient detail to establish a specific, mappable location.

  14. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. 
    For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments:
    - Word file with variables description
    - Rdata file with the data set (for R language)

    Appendix 1. The SET questionnaire used for this paper.

    Evaluation survey of the teaching staff of [university name]

    Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.

    Questions (each answered on the scale 1 2 3 4 5):
    1. I learnt a lot during the course.
    2. I think that the knowledge acquired during the course is very useful.
    3. The professor used activities to make the class more engaging.
    4. If it was possible, I would enroll for the course conducted by this lecturer again.
    5. The classes started on time.
    6. The lecturer always used time efficiently.
    7. The lecturer delivered the class content in an understandable and efficient way.
    8. The lecturer was available when we had doubts.
    9. The lecturer treated all students equally regardless of their race, background and ethnicity.
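    The averaging behind SET_score_avg(j,k,n) can be sketched as a group-by over (teacher, course, question) triplets. This is only an illustration: the published data set already contains the averaged scores, and the raw answers, names and the `set_score_avg` helper below are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw answers: (teacher id j, course id k, question n, Likert answer 1-5).
# These rows are invented for illustration; they are not taken from the data set.
raw_answers = [
    ("John Smith", "Calculus", 2, 5),
    ("John Smith", "Calculus", 2, 4),
    ("John Smith", "Calculus", 2, 3),
    ("John Smith", "Calculus", 1, 4),
    ("Jane Doe", "Algebra", 2, 2),
]


def set_score_avg(answers):
    """Average the Likert answers for every (j, k, n) triplet."""
    grouped = defaultdict(list)
    for teacher_id, course_id, question, answer in answers:
        grouped[(teacher_id, course_id, question)].append(answer)
    # One output entry per triplet, mirroring one row of the data set.
    return {triplet: mean(values) for triplet, values in grouped.items()}


scores = set_score_avg(raw_answers)
print(scores[("John Smith", "Calculus", 2)])
```

    A triplet appears in the result only if at least one student answered that question, which matches the note that some (j,k) pairs have fewer than nine rows.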

  15. Campaign Finance Summary

    • catalog.data.gov
    • data.wa.gov
    • +1more
    Updated Nov 22, 2025
    + more versions
    Cite
    data.wa.gov (2025). Campaign Finance Summary [Dataset]. https://catalog.data.gov/dataset/campaign-finance-summary
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    data.wa.gov
    Description

    This data set contains a summary of information about candidate campaigns and political committees by election year. For candidate campaigns and single-year/election committees, a single record is provided that covers all activity of the campaign for the given election year. Information for continuing political committees is summarized by calendar/reporting year. The data set covers the prior 16 years plus the current election year. The data are compiled from the campaign reports deposit (C3), campaign summary reports (C4), campaign registrations (C1/C1pc) and candidate declarations and elections data provided to the PDC by the Washington Secretary of State. Records are updated in near real-time, typically less than 2 minutes from the time the campaign submits new data. This dataset is a best effort by the PDC to provide a complete set of records as described herewith. The PDC provides access to the original reports for the purpose of record verification. Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements. CONDITION OF RELEASE: This publication and/or referenced documents constitutes a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.

  16. Global Financial Inclusion (Global Findex) Database 2011 - Cameroon

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Apr 15, 2015
    + more versions
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Cameroon [Dataset]. https://microdata.worldbank.org/index.php/catalog/1143
    Dataset updated
    Apr 15, 2015
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2011
    Area covered
    Cameroon
    Description

    Abstract

    Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

    The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

    Geographic coverage

    National Coverage.

    Analysis unit

    Individual

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above. The sample is nationally representative.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

    Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.
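    The choice between probability-proportional-to-size selection and simple random sampling described above can be sketched as follows. This is a simplified illustration, not Gallup's procedure: the sampling units, population figures and function names are invented, and the weighted draw is done with replacement, whereas real designs often use systematic selection without replacement.

```python
import random

# Hypothetical primary sampling units (PSUs) with invented population sizes.
psu_population = {"PSU-A": 12000, "PSU-B": 3000, "PSU-C": 500, "PSU-D": 4500}


def draw_psus(populations, n_draws, seed=0):
    """Draw PSUs with probability proportional to population size (with replacement)."""
    rng = random.Random(seed)
    names = list(populations)
    weights = [populations[name] for name in names]
    return rng.choices(names, weights=weights, k=n_draws)


def draw_psus_simple(names, n_draws, seed=0):
    """Fallback when no population information exists: simple random sampling."""
    rng = random.Random(seed)
    return rng.sample(list(names), k=n_draws)


print(draw_psus(psu_population, 10))
print(draw_psus_simple(psu_population, 2))
```

    With these invented weights, PSU-A is expected in about 60 percent of the weighted draws (12,000 of 20,000), while the fallback gives every unit the same chance.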

    Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of year.

    The sample size in Cameroon was 1,000 individuals.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

    Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.

  17. Color GeoTIFF Image of the 10-m Interpolated Bathymetric Grid of the...

    • catalog.data.gov
    • search.dataone.org
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Color GeoTIFF Image of the 10-m Interpolated Bathymetric Grid of the Northern Part of National Oceanic and Atmospheric Administration (NOAA) Survey H11044 off Milford, Connecticut (H11044N_MB10M_UTM18.TIF, UTM Zone 18, WGS84) [Dataset]. https://catalog.data.gov/dataset/color-geotiff-image-of-the-10-m-interpolated-bathymetric-grid-of-the-northern-part-of-nati
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Area covered
    Connecticut, Milford
    Description

    During 2001 the NOAA Ship RUDE completed charting survey H11044 that covered a roughly 293 km² area of the sea floor in north-central Long Island Sound, off Milford, Connecticut. Although 100 percent coverage was achieved with sidescan sonar for charting purposes, only reconnaissance (spaced line) bathymetry was acquired with shallow-water multibeam and single-beam systems. Therefore, further processing was conducted at the USGS's Woods Hole Science Center to provide bathymetric datasets with more continuous coverage. This project produced grids and GeoTIFF imagery of the combined and interpolated shallow-water multibeam and single-beam bathymetry generated from the northern part of this data set. Anthropogenic wastes, toxic chemicals, and changes in land-use patterns resulting from residential, commercial, and recreational development have stressed the environment of the Sound, causing degradation and potential loss of benthic habitats. Detailed maps of the sea floor are needed to help evaluate the extent of adverse impacts and to help manage resources wisely in the future. Therefore, in a continuing effort to better understand Long Island Sound, we have interpolated and gridded shallow-water multibeam and single-beam bathymetric data within specific areas of special interest.

  18. Cadastral PLSS Standardized Data - PLSSReferenceGrid, SE - Version 1.1

    • catalog.data.gov
    • datadiscoverystudio.org
    • +4more
    Updated Dec 2, 2020
    + more versions
    Cite
    (Point of Contact) (2020). Cadastral PLSS Standardized Data - PLSSReferenceGrid, SE - Version 1.1 [Dataset]. https://catalog.data.gov/dataset/cadastral-plss-standardized-data-plssreferencegrid-se-version-1-1
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    (Point of Contact)
    Description

    This feature class is part of the Cadastral National Spatial Data Infrastructure (NSDI) CADNSDI publication data set for rectangular and non-rectangular Public Land Survey System (PLSS) data. The metadata description in the Cadastral Reference System Feature Data Set more fully describes the entire data set. The PLSS Reference Grid is a generalized data set providing the Township and First Divisions of the PLSS as a separate feature class to support data requests, mapping and indexing. The spatial location, position and attributes of this feature class are the same as those in the primary data sets from which this data is built. These data are often used for map sheet layouts and general location reference.

  19. North American Rail Network Lines

    • geodata.bts.gov
    • geodata.colorado.gov
    • +8more
    Updated Jul 1, 1995
    + more versions
    Cite
    U.S. Department of Transportation: ArcGIS Online (1995). North American Rail Network Lines [Dataset]. https://geodata.bts.gov/datasets/usdot::north-american-rail-network-lines/about
    Dataset updated
    Jul 1, 1995
    Dataset authored and provided by
    U.S. Department of Transportation: ArcGIS Online
    Description

    The North American Rail Network (NARN) Rail Lines dataset was created in 2016 and was updated on July 18, 2025 from the Federal Railroad Administration (FRA) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The NARN Rail Lines dataset is a database that provides ownership, trackage rights, type, passenger, STRACNET, and geographic reference for North America's railway system at 1:24,000 or better within the United States. The data set covers all 50 States, the District of Columbia, Mexico, and Canada. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1528950

  20. National Tunnel Inventory

    • catalog.data.gov
    • geodata.bts.gov
    • +2more
    Updated Sep 5, 2025
    + more versions
    Cite
    Federal Highway Administration (FHWA) (Point of Contact) (2025). National Tunnel Inventory [Dataset]. https://catalog.data.gov/dataset/national-tunnel-inventory1
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    Federal Highway Administration https://highways.dot.gov/
    Description

    The National Tunnel Inventory dataset was compiled on September 02, 2025 and published on August 26, 2025 from the Federal Highway Administration (FHWA) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The National Tunnel Inventory (NTI) is a collection of information (database) describing more than 500 of the Nation's tunnels located on public roads, including Interstate Highways, U.S. highways, State and county roads, as well as publicly accessible tunnels on Federal lands. The inventory data present a complete picture of the location, description, and classification data for each tunnel, as well as any load rating and inspection information. The Specifications for the National Tunnel Inventory (SNTI) contains a detailed description of each data element including coding instructions and attribute definitions. The Coding Manual is published for each year of data collection; the manual is available at: https://doi.org/10.21949/1519104. For additional questions regarding regulations for the National Tunnel Inventory or the Coding Guide please contact the National Bridge and Tunnel Inventory team at NBTIS_Support@dot.gov. For questions on the geospatial component of the dataset, contact the NTAD team at NTAD@dot.gov. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529051

Cite
The Devastator (2023). Movies Performance and Feature Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/movies-performance-and-feature-statistics

Movies Performance and Feature Statistics

Analyzing Box Office Performance, Rating and Audience Reactions

Dataset updated
Jan 16, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description

By Yashwanth Sharaff [source]

About this dataset

This dataset contains essential characteristics of a variety of movies, including basic information such as the movie's title and budget, as well as performance indicators like the movie's MPAA rating, gross revenue, release date, genre, runtime, rating count and summary. With this data set we can better understand the film industry and uncover insights into how different features and performance metrics relate to one another and to a movie's success. The movies dataset also helps you make informed decisions about which features are key indicators of a high-grossing feature film.

How to use the dataset

To get the most out of this data set, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used for further searching on popular streaming services and on websites dedicated to detailed information about movies. The 'MPAA Rating' column lists the movie's Motion Picture Association (MPAA) rating, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned) or R (Under 17 Requires Accompanying Parent or Guardian). The 'Budget' column gives an approximate idea of how much a production cost, while the 'Gross' column gives its earnings if it was released in theaters. The 'Release Date' column gives the date on which each film was released or is due to be released. The 'Genre', 'Runtime' and 'Rating Count' columns cover the type of movie, its length, and the number of people who have rated or reviewed it, whether on IMDb, on other streaming services, or in print media such as newspapers. Last but not least, the 'Summary' field gives an overview of what to expect from the film, so take it into account before watching anything, especially if there are children in your family.

So go ahead - start exploring this interesting dataset today!

Research Ideas

  • Creating a box office prediction model using budget, genre, release date and MPAA rating
  • Using the summary data to create a sentiment analysis tool for movie reviews
  • Building a recommendation engine for users based on their prior ratings and on what other users with similar tastes have rated highly

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

See the dataset description for more information.

Columns

File: movies.csv

| Column name  | Description                                                                    |
|:-------------|:-------------------------------------------------------------------------------|
| Title        | The title of the movie. (String)                                               |
| MPAA Rating  | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget       | The budget of the movie in US dollars. (Integer)                               |
| Gross        | The gross revenue of the movie in US dollars. (Integer)                        |
| Release Date | The date the movie was released. (Date)                                        |
| Genre        | The genre of the movie. (String)                                               |
| Runtime      | The length of the movie in minutes. (Integer)                                  |
| Rating Count | The number of ratings the movie has received. (Integer)                        |
| Summary      | A brief summary of the movie. (String)                                         |
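A minimal sketch of reading a file with these columns using Python's standard library might look like the following. The two sample rows, the date format and the helper names `load_movies` and `gross_to_budget` are assumptions made for illustration; the real movies.csv may contain missing or non-numeric values that need extra handling.

```python
import csv
import io

# Two invented rows using the column names listed above; they are not real records.
SAMPLE_CSV = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG-13,1000000,3500000,2001-05-04,Drama,110,1200,A placeholder summary.
Example Film B,R,2000000,1000000,2003-11-21,Thriller,95,800,Another placeholder summary.
"""

INTEGER_COLUMNS = ("Budget", "Gross", "Runtime", "Rating Count")


def load_movies(handle):
    """Parse rows and convert the columns documented as Integer to int."""
    rows = []
    for row in csv.DictReader(handle):
        for column in INTEGER_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


def gross_to_budget(row):
    """Ratio of gross revenue to budget, a rough profitability indicator."""
    return row["Gross"] / row["Budget"]


movies = load_movies(io.StringIO(SAMPLE_CSV))
for movie in movies:
    print(movie["Title"], movie["Genre"], gross_to_budget(movie))
```

To try the same functions on a downloaded copy, open the file with `open("movies.csv", newline="")` and pass the handle to `load_movies`.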

Acknowledgements

If you use this dataset in your research, please credit Yashwanth Sharaff.
