91 datasets found
  1. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    British Indian Ocean Territory, Northern Mariana Islands, Moldova (Republic of), Bangladesh, Isle of Man, Canada, Andorra, Taiwan, Tunisia, Nepal
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  2. d

    data.govt.nz website usage statistics Nov 09 - Oct 11 - Dataset -...

    • catalogue.data.govt.nz
    Updated Oct 11, 2009
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2009). data.govt.nz website usage statistics Nov 09 - Oct 11 - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/data-govt-nz-website-usage-statistics-nov-09-oct-11
    Explore at:
    Dataset updated
    Oct 11, 2009
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Raw website usage statistics for data.govt.nz including unique visitors, page views, click-thoughs to data hosting websites, cumulative number of dataset listing pages, and 25 most viewed datasets per month.

  3. d

    NYC Benefits Platform: Benefits and Programs Multilingual Dataset

    • catalog.data.gov
    • data.cityofnewyork.us
    • +3more
    Updated Feb 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.cityofnewyork.us (2025). NYC Benefits Platform: Benefits and Programs Multilingual Dataset [Dataset]. https://catalog.data.gov/dataset/benefits-and-programs-multilingual-dataset
    Explore at:
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    data.cityofnewyork.us
    Area covered
    New York
    Description

    This dataset provides benefit, program, and resource information for over 80 health and human services available to NYC residents in all eleven local law languages. The data is kept up-to-date, including the most recent applications, eligibility requirements, and application dates. Information in this dataset is used on ACCESS NYC, Generation NYC, and Growing Up NYC. Reach out to products@nycopportunity.nyc.gov if you have any questions about this dataset. "This data makes it easier for NYC residents to discover and be aware of multiple benefits they may be eligible for. NYC Opportunity Product team works with 15+ government agencies to collect and update this data. Each record in the dataset represents a benefit or program. Blank fields are NULL values in this dataset. The data can be used to develop new websites or directory resources to help residents to discover benefits they need. The English-only version of the data is available at https://data.cityofnewyork.us/Social-Services/NYC-Benefits-Platform-Benefits-and-Programs-Datase/kvhd-5fmu."

  4. Job Postings Dataset for Labour Market Research and Insights

    • datarade.ai
    Updated Sep 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Oxylabs
    Area covered
    Jamaica, Togo, Luxembourg, Kyrgyzstan, Tajikistan, Zambia, Switzerland, Sierra Leone, Anguilla, British Indian Ocean Territory
    Description

    Introducing Job Posting Datasets: Uncover labor market insights!

    Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

    Job Posting Datasets Source:

    1. Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

    2. Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

    3. StackShare: Access StackShare datasets to make data-driven technology decisions.

    Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

    Choose your preferred dataset delivery options for convenience:

    Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

    Why Choose Oxylabs Job Posting Datasets:

    1. Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

    2. Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

    3. Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

  5. Requirements data sets (user stories)

    • zenodo.org
    • data.mendeley.com
    txt
    Updated Jan 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fabiano Dalpiaz; Fabiano Dalpiaz (2025). Requirements data sets (user stories) [Dataset]. http://doi.org/10.17632/7zbk8zsd8y.1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Mendeley Ltd.
    Authors
    Fabiano Dalpiaz; Fabiano Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of 22 data set of 50+ requirements each, expressed as user stories.

    The dataset has been created by gathering data from web sources and we are not aware of license agreements or intellectual property rights on the requirements / user stories. The curator took utmost diligence in minimizing the risks of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss the possibility of removal of that dataset [see Zenodo's policies]

    The data sets have been originally used to conduct experiments about ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light

    This collection has been originally published in Mendeley data: https://data.mendeley.com/datasets/7zbk8zsd8y/1

    Overview of the datasets [data and links added in December 2024]

    The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.

    Public administration and transparency

    g02-federalspending.txt (2018) originates from early data in the Federal Spending Transparency project, which pertain to the website that is used to share publicly the spending data for the U.S. government. The website was created because of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains a system called DAIMS or Data Broker, which stands for DATA Act Information Model Schema. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted in GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal spending related websites, including many more projects than the one described in the shared collection, can be found here.

    g03-loudoun.txt (2018) is a set of extracted requirements from a document, by the Loudoun County Virginia, that describes the to-be user stories and use cases about a system for land management readiness assessment called Loudoun County LandMARC. The source document can be found here and it is part of the Electronic Land Management System and EPlan Review Project - RFP RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.

    g04-recycling.txt(2017) concerns a web application where recycling and waste disposal facilities can be searched and located. The application operates through the visualization of a map that the user can interact with. The dataset has obtained from a GitHub website and it is at the basis of a students' project on web site design; the code is available (no license).

    g05-openspending.txt (2018) is about the OpenSpending project (www), a project of the Open Knowledge foundation which aims at transparency about how local governments spend money. At the time of the collection, the data was retrieved from a Trello board that is currently unavailable. The sample focuses on publishing, importing and editing datasets, and how the data should be presented. Currently, OpenSpending is managed via a GitHub repository which contains multiple sub-projects with unknown license.

    g11-nsf.txt (2018) refers to a collection of user stories referring to the NSF Site Redesign & Content Discovery project, which originates from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website. The user stories can be found as closed Issues.

    (Research) data and meta-data management

    g08-frictionless.txt (2016) regards the Frictionless Data project, which offers an open source dataset for building data infrastructures, to be used by researchers, data scientists, and data engineers. Links to the many projects within the Frictionless Data project are on GitHub (with a mix of Unlicense and MIT license) and web. The specific set of user stories has been collected in 2016 by GitHub user @danfowler and are stored in a Trello board.

    g14-datahub.txt (2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has Apache License 2.0). DataHub is a data discovery platform which has been developed over multiple years. The specific data set is an initial set of user stories, which we can date back to 2013 thanks to a comment therein.

    g16-mis.txt (2015) is a collection of user stories that pertains a repository for researchers and archivists. The source of the dataset is a public Trello repository. Although the user stories do not have explicit links to projects, it can be inferred that the stories originate from some project related to the library of Duke University.

    g17-cask.txt (2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework which can be used for distributed processing of large datasets. The user stories are extracted from a document that includes requirements regarding dataset management for Cask 4.0, which includes the scenarios, user stories and a design for the implementation of these user stories. The raw data is available in the following environment.

    g18-neurohub.txt (2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis and collaboration platform for researchers in neuroscience to collect, store, and share data with colleagues or with the research community. The user stories were collected at a time NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.

    g22-rdadmp.txt (2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains a collection of user stories that were created by asking the community to suggest functionality that should part of a website that manages data management plans. Each user story is stored as an issue on the GitHub's page.

    g23-archivesspace.txt (2012-2013) refers to ArchivesSpace: an open source, web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and
    born digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. ArchivesSpace is open source and its

  6. Machine Learning Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jun 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset updated
    Jun 19, 2024
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.

  7. Developer Community and Code Datasets

    • datarade.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs, Developer Community and Code Datasets [Dataset]. https://datarade.ai/data-products/developer-community-and-code-datasets-oxylabs
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset authored and provided by
    Oxylabs
    Area covered
    Philippines, El Salvador, Bahamas, Tuvalu, Marshall Islands, Guyana, Saint Pierre and Miquelon, United Kingdom, South Sudan, Djibouti
    Description

    Unlock the power of ready-to-use data sourced from developer communities and repositories with Developer Community and Code Datasets.

    Data Sources:

    1. GitHub: Access comprehensive data about GitHub repositories, developer profiles, contributions, issues, social interactions, and more.

    2. StackShare: Receive information about companies, their technology stacks, reviews, tools, services, trends, and more.

    3. DockerHub: Dive into data from container images, repositories, developer profiles, contributions, usage statistics, and more.

    Developer Community and Code Datasets are a treasure trove of public data points gathered from tech communities and code repositories across the web.

    With our datasets, you'll receive:

    • Usernames;
    • Companies;
    • Locations;
    • Job Titles;
    • Follower Counts;
    • Contact Details;
    • Employability Statuses;
    • And More.

    Choose from various output formats, storage options, and delivery frequencies:

    • Get datasets in CSV, JSON, or other preferred formats.
    • Opt for data delivery via SFTP or directly to your cloud storage, such as AWS S3.
    • Receive datasets either once or as per your agreed-upon schedule.

    Why choose our Datasets?

    1. Fresh and accurate data: Access complete, clean, and structured data from scraping professionals, ensuring the highest quality.

    2. Time and resource savings: Let us handle data extraction and processing cost-effectively, freeing your resources for strategic tasks.

    3. Customized solutions: Share your unique data needs, and we'll tailor our data harvesting approach to fit your requirements perfectly.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is trusted by Fortune 500 companies and adheres to GDPR and CCPA standards.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Empower your data-driven decisions with Oxylabs Developer Community and Code Datasets!

  8. eCommerce events history in electronics store

    • kaggle.com
    Updated Mar 29, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Michael Kechinov (2021). eCommerce events history in electronics store [Dataset]. https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-electronics-store/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 29, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Michael Kechinov
    Description

    About

    This file contains behavior data for 5 months (Oct 2019 – Feb 2020) from a large electronics online store.

    Each row in the file represents an event. All events are related to products and users. Each event is like many-to-many relation between products and users.

    Data collected by Open CDP project. Feel free to use open source customer data platform.

    More datasets

    Checkout another datasets:

    1. https://www.kaggle.com/mkechinov/ecommerce-behavior-data-from-multi-category-store
    2. https://www.kaggle.com/mkechinov/ecommerce-purchase-history-from-electronics-store
    3. https://www.kaggle.com/mkechinov/ecommerce-events-history-in-cosmetics-shop
    4. https://www.kaggle.com/mkechinov/ecommerce-purchase-history-from-jewelry-store
    5. https://www.kaggle.com/mkechinov/ecommerce-events-history-in-electronics-store - you're reading it right now
    6. [NEW] https://www.kaggle.com/datasets/mkechinov/direct-messaging

    How to read it

    There are different types of events. See below.

    Semantics (or how to read it):

    User user_id during session user_session added to shopping cart (property event_type is equal cart) product product_id of brand brand of category category_code (category_code) with price price at event_time

    File structure

    PropertyDescription
    event_timeTime when event happened at (in UTC).
    event_typeOnly one kind of event: purchase.
    product_idID of a product
    category_idProduct's category ID
    category_codeProduct's category taxonomy (code name) if it was possible to make it. Usually present for meaningful categories and skipped for different kinds of accessories.
    brandDowncased string of brand name. Can be missed.
    priceFloat price of a product. Present.
    user_idPermanent user ID.
    ** user_session**Temporary user's session ID. Same for each user's session. Is changed every time user come back to online store from a long pause.

    Event types

    Events can be:

    • view - a user viewed a product
    • cart - a user added a product to shopping cart
    • remove_from_cart - a user removed a product from shopping cart
    • purchase - a user purchased a product

    Multiple purchases per session

    A session can have multiple purchase events. It's ok, because it's a single order.

    Many thanks

    Thanks to REES46 Marketing Platform for this dataset.

    Using datasets in your works, books, education materials

    You can use this dataset for free. Just mention the source of it: link to this page and link to REES46 Marketing Platform.

  9. Number of internet users worldwide 2014-2029

    • statista.com
    • flwrdeptvarieties.store
    Updated Jan 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users in was forecast to continuously increase between 2024 and 2029 by in total 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion users and therefore a new peak in 2029. Notably, the number of internet users of was continuously increasing over the past years.Depicted is the estimated number of individuals in the country or region at hand, that use the internet. As the datasource clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of internet users in countries like the Americas and Asia.

  10. DeepCube: Post-processing and annotated datasets of social media data

    • zenodo.org
    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis; Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. http://doi.org/10.5281/zenodo.10731637
    Explore at:
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis; Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

    2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the Deepcube project. More specifically, these include:

    • The UC2 dataset containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with the climate induced migration in Africa. This dataset contains in total 5,695,253 social media posts collected from the Twitter platform, based on the initial version of search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and started from 26 June, 2021 till March, 2023.
    • The UC5 dataset containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to the sustainable and environmentally-friendly tourism. This dataset contains in total 58,143 social media posts collected from the Twitter and Instagram platform (12,881 collected from Twitter and 45,262 collected from Instagram), based on the initial version of search criteria relevant to UC5 defined by MURMURATION SAS, focused on the regions of Brasil and started from 26 June, 2021 till March, 2023.

    1 Annotated dataset: An additional anottated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    • The UC2 dataset contain the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with the climate induced migration in Africa. This dataset contains in total 1721 annotated (412 relevant and 1309 irrelevant) by social media posts collected from the Twitter platform, focused on the region of Somalia and started from 1 January, 2010 till 31 December, 2022.

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA.

    INFALIA, being a spin-off of the CERTH institute and a partner of a research EU project, releases this dataset containing Tweets IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.

  11. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bankhttp://worldbank.org/
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  12. AI Training Data Market will grow at a CAGR of 23.50% from 2024 to 2031.

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jan 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cognitive Market Research (2025). AI Training Data Market will grow at a CAGR of 23.50% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/ai-training-data-market-report
    Explore at:
    pdf,excel,csv,pptAvailable download formats
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Ai Training Data market size is USD 1865.2 million in 2023 and will expand at a compound annual growth rate (CAGR) of 23.50% from 2023 to 2030.

    The demand for Ai Training Data is rising due to the rising demand for labelled data and diversification of AI applications.
    Demand for Image/Video remains higher in the Ai Training Data market.
    The Healthcare category held the highest Ai Training Data market revenue share in 2023.
    North American Ai Training Data will continue to lead, whereas the Asia-Pacific Ai Training Data market will experience the most substantial growth until 2030.
    

    Market Dynamics of AI Training Data Market

    Key Drivers of AI Training Data Market

    Rising Demand for Industry-Specific Datasets to Provide Viable Market Output
    

    A key driver in the AI Training Data market is the escalating demand for industry-specific datasets. As businesses across sectors increasingly adopt AI applications, the need for highly specialized and domain-specific training data becomes critical. Industries such as healthcare, finance, and automotive require datasets that reflect the nuances and complexities unique to their domains. This demand fuels the growth of providers offering curated datasets tailored to specific industries, ensuring that AI models are trained with relevant and representative data, leading to enhanced performance and accuracy in diverse applications.

    In July 2021, Amazon and Hugging Face, a provider of open-source natural language processing (NLP) technologies, have collaborated. The objective of this partnership was to accelerate the deployment of sophisticated NLP capabilities while making it easier for businesses to use cutting-edge machine-learning models. Following this partnership, Hugging Face will suggest Amazon Web Services as a cloud service provider for its clients.

    (Source: about:blank)

    Advancements in Data Labelling Technologies to Propel Market Growth
    

    The continuous advancements in data labelling technologies serve as another significant driver for the AI Training Data market. Efficient and accurate labelling is essential for training robust AI models. Innovations in automated and semi-automated labelling tools, leveraging techniques like computer vision and natural language processing, streamline the data annotation process. These technologies not only improve the speed and scalability of dataset preparation but also contribute to the overall quality and consistency of labelled data. The adoption of advanced labelling solutions addresses industry challenges related to data annotation, driving the market forward amidst the increasing demand for high-quality training data.

    In June 2021, Scale AI and MIT Media Lab, a Massachusetts Institute of Technology research centre, began working together. To help doctors treat patients more effectively, this cooperation attempted to utilize ML in healthcare.

    www.ncbi.nlm.nih.gov/pmc/articles/PMC7325854/

    Restraint Factors Of AI Training Data Market

    Data Privacy and Security Concerns to Restrict Market Growth
    

    A significant restraint in the AI Training Data market is the growing concern over data privacy and security. As the demand for diverse and expansive datasets rises, so does the need for sensitive information. However, the collection and utilization of personal or proprietary data raise ethical and privacy issues. Companies and data providers face challenges in ensuring compliance with regulations and safeguarding against unauthorized access or misuse of sensitive information. Addressing these concerns becomes imperative to gain user trust and navigate the evolving landscape of data protection laws, which, in turn, poses a restraint on the smooth progression of the AI Training Data market.

    How did COVID–19 impact the Ai Training Data market?

    The COVID-19 pandemic has had a multifaceted impact on the AI Training Data market. While the demand for AI solutions has accelerated across industries, the availability and collection of training data faced challenges. The pandemic disrupted traditional data collection methods, leading to a slowdown in the generation of labeled datasets due to restrictions on physical operations. Simultaneously, the surge in remote work and the increased reliance on AI-driven technologies for various applications fueled the need for diverse and relevant training data. This duali...

  13. d

    Lake and Wetland Public Access Sites - Dataset - data.govt.nz - discover and...

    • catalogue.data.govt.nz
    Updated Aug 23, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2020). Lake and Wetland Public Access Sites - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/lake-and-wetland-public-access-sites
    Explore at:
    Dataset updated
    Aug 23, 2020
    Description

    The data provides information on recreation and amenities at lake and wetland access sites. The information describes the site and its amenities and suggests recreational uses. The public access sites are open to the public without obtaining prior landowner’s permission.Almost, every public access site has been visited by an Environment Canterbury staff member. During the site visit, information is collected that describes the site and photos are taken. The access site is usually located where it is appropriate to leave a vehicle, such as an informal parking area, or picnic area. Distances to the lake or wetland are approximate and may vary depending on water levels.The purpose of the data is to provide information on Canterbury’s public access sites, the data helps answer questions from the public relating to recreation and access. In addition, the data also helps meet some of the requirements set out by the Canterbury Water Management Strategy for Recreation and Amenity targets.The information is recorded in a consistent manner with consistent standards applied during its collection. The categories used to describe access are:Foot and vehicle access: pedestrian and vehicle access.Foot access: pedestrian access onlyFoot and vehicle access over private property: pedestrian and vehicle access, available over private property, permitted by the land occupier or ownerFoot access over private property: pedestrian only access, available over private property, permitted by the land occupier or owner.The categories used to describe the quality of the road or track as described below:Good: well maintained, easily passable.Average: may require some care when driving a road car but still easily passable.Poor: difficult to drive along care and skill required, urgently in need of maintenance.4WD: four-wheel drive recommended for track. For more information about the structure and content of this layer, please refer to the Data Dictionary.Caveats – conditions or limitations of the layerEvery effort is made to identify public access points through site visits and checking the data by utilising maps, plans, and the digital cadastral database (DCDB).Every effort is made to survey all access points along riparian margins unless access is in a remote, hard to reach area, or is unlikely to be frequently visited by the public.The information is only accurate to the time of the survey, refer to ‘Date surveyed’ attribute for each access point

  14. Most popular websites in the Netherlands 2015

    • ssh.datastations.nl
    • datacatalogue.cessda.eu
    csv, xlsx, zip
    Updated May 9, 2017
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DANS Data Station Social Sciences and Humanities (2017). Most popular websites in the Netherlands 2015 [Dataset]. http://doi.org/10.17026/dans-x6h-6qqt
    Explore at:
    zip(15855), xlsx(144965), csv(138294)Available download formats
    Dataset updated
    May 9, 2017
    Dataset provided by
    Data Archiving and Networked Services
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Netherlands
    Dataset funded by
    NWO
    Description

    This dataset contains a list of 3654 Dutch websites that we considered the most popular websites in 2015. This list served as whitelist for the Newstracker Research project in which we monitored the online web behaviour of a group of respondents.The research project 'The Newstracker' was a subproject of the NWO-funded project 'The New News Consumer: A User-Based Innovation Project to Meet Paradigmatic Change in News Use and Media Habits'.For the Newstracker project we aimed to understand the web behaviour of a group of respondents. We created custom-built software to monitor their web browsing behaviour on their laptops and desktops (please find the code in open access at https://github.com/NITechLabs/NewsTracker). For reasons of scale and privacy we created a whitelist with websites that were the most popular websites in 2015. We manually compiled this list by using data of DDMM, Alexa and own research. The dataset consists of 5 columns:- the URL- the type of website: We created a list of types of websites and each website has been manually labeled with 1 category- Nieuws-regio: When the category was 'News', we subdivided these websites in the regional focus: International, National or Local- Nieuws-onderwerp: Furthermore, each website under the category News was further subdivided in type of news website. For this we created an own list of news categories and manually coded each website- Bron: For each website we noted which source we used to find this website.The full description of the research design of the Newstracker including the set-up of this whitelist is included in the following article: Kleppe, M., Otte, M. (in print), 'Analysing & understanding news consumption patterns by tracking online user behaviour with a multimodal research design', Digital Scholarship in the Humanities, doi 10.1093/llc/fqx030.

  15. Dataset used for HTTPS traffic classification using packet burst statistics

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Apr 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tropkova Zdena; Hynek Karel; Hynek Karel; Cejka Tomas; Cejka Tomas; Tropkova Zdena (2022). Dataset used for HTTPS traffic classification using packet burst statistics [Dataset]. http://doi.org/10.5281/zenodo.4911551
    Explore at:
    csvAvailable download formats
    Dataset updated
    Apr 11, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Tropkova Zdena; Hynek Karel; Hynek Karel; Cejka Tomas; Cejka Tomas; Tropkova Zdena
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We are publishing a dataset we created for the HTTPS traffic classification.

    Since the data were captured mainly in the real backbone network, we omitted IP addresses and ports. The datasets consist of calculated from bidirectional flows exported with flow probe Ipifixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and time. For more information, please visit ipfixprobe repository (Ipifixprobe).

    During our research, we divided HTTPS into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.

    We have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. We also used several popular websites that primarily focus on the audience in our country. The identified traffic classes and their representatives are provided below:

    • Live Video Stream Twitch, Czech TV, YouTube Live
    • Video Player DailyMotion, Stream.cz, Vimeo, YouTube
    • Music Player AppleMusic, Spotify, SoundCloud
    • File Upload/Download FileSender, OwnCloud, OneDrive, Google Drive
    • Website and Other Traffic Websites from Alexa Top 1M list


  16. Military Installations, Ranges, and Training Areas (MIRTA) DoD Sites -...

    • hifld-geoplatform.hub.arcgis.com
    • disasters-geoplatform.hub.arcgis.com
    • +4more
    Updated Jul 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GeoPlatform ArcGIS Online (2023). Military Installations, Ranges, and Training Areas (MIRTA) DoD Sites - Points [Dataset]. https://hifld-geoplatform.hub.arcgis.com/datasets/military-installations-ranges-and-training-areas-mirta-dod-sites-points
    Explore at:
    Dataset updated
    Jul 31, 2023
    Dataset provided by
    Authors
    GeoPlatform ArcGIS Online
    Area covered
    North Pacific Ocean, Pacific Ocean
    Description

    The dataset depicts the authoritative locations of the most commonly known Department of Defense (DoD) sites, installations, ranges, and training areas world-wide. These sites encompass land which is federally owned or otherwise managed. This dataset was created from source data provided by the four Military Service Component headquarters and was compiled by the Defense Installation Spatial Data Infrastructure (DISDI) Program within the Office of the Assistant Secretary of Defense for Energy, Installations, and Environment. Only sites reported in the BSR or released in a map supplementing the Foreign Investment Risk Review Modernization Act of 2018 (FIRRMA) Real Estate Regulation (31 CFR Part 802) were considered for inclusion. This list does not necessarily represent a comprehensive collection of all Department of Defense facilities. For inventory purposes, installations are comprised of sites, where a site is defined as a specific geographic location of federally owned or managed land and is assigned to military installation. DoD installations are commonly referred to as a base, camp, post, station, yard, center, homeport facility for any ship, or other activity under the jurisdiction, custody, control of the DoD.While every attempt has been made to provide the best available data quality, this data set is intended for use at mapping scales between 1:50,000 and 1:3,000,000. For this reason, boundaries in this data set may not perfectly align with DoD site boundaries depicted in other federal data sources. Maps produced at a scale of 1:50,000 or smaller which otherwise comply with National Map Accuracy Standards, will remain compliant when this data is incorporated. Boundary data is most suitable for larger scale maps; point locations are better suited for mapping scales between 1:250,000 and 1:3,000,000.If a site is part of a Joint Base (effective/designated on 1 October, 2010) as established under the 2005 Base Realignment and Closure process, it is attributed with the name of the Joint Base. All sites comprising a Joint Base are also attributed to the responsible DoD Component, which is not necessarily the pre-2005 Component responsible for the site.

  17. Hazard Mitigation Assistance Projects by NFIP CRS Communities

    • catalog.data.gov
    Updated Dec 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FEMA/Resilience/Federal Insurance Directorate (2024). Hazard Mitigation Assistance Projects by NFIP CRS Communities [Dataset]. https://catalog.data.gov/dataset/hazard-mitigation-assistance-projects-by-nfip-crs-communities-nemis
    Explore at:
    Dataset updated
    Dec 8, 2024
    Dataset provided by
    Federal Emergency Management Agencyhttp://www.fema.gov/
    Description

    This dataset contains funded projects under FEMA's Hazard Mitigation Assistance (HMA) grant programs by communities participating in the National Flood Insurance Program (NFIP) Community Rating System (CRS). The Hazard Mitigation Assistance Projects by NFIP CRS Communities dataset can be joined to the OpenFEMA Hazard Mitigation Assistance Funded Project dataset by the Project Identifier field. Note, not all projects in the Hazard Mitigation Assistance Funded Project dataset will associate to an NFIP CRS Community. For more information on the NFIP CRS Program, visit https://www.fema.gov/flood-insurance/rules-legislation/community-rating-system.rnrnFEMA administers three programs that provide funding for eligible mitigation planning and projects that reduces disaster losses and protect life and property from future disaster damages. The three programs are the Hazard Mitigation Grant Program (HMGP), Flood Mitigation Assistance (FMA) grant program, and Building Resilient Infrastructure and Communities (BRIC) grant program. This dataset also contains data from the HMA grant programs that were eliminated by the Disaster Recovery Reform Act of 2018 (Pre-Disaster Mitigation (PDM) grant program) and Biggert Water Flood Insurance Reform Act of 2012 (Repetitive Flood Claims (RFC) grant program and Severe Repetitive Loss (SRL) grant program). For more information on the Hazard Mitigation Assistance grant programs, please visit: https://www.fema.gov/grants/mitigation. rnrnThis is raw, unedited data from FEMA's National Emergency Management Information System (NEMIS) and Mitigation eGrants Systems and is dependent on Regional entry, as such it is subject to a small percentage of human error and delayed entry of plan information. The data is updated from authoritative sources and has a minimum 24 hour delay. This dataset is not derived from FEMA's official financial system and is not intended to be used for any official federal financial reporting. Due to differences in reporting periods, status of obligations and how business rules are applied, the financial information in this dataset may differ slightly from official publication on public websites such as usaspending.gov. rnrnPlease note that jurisdictions may participate in multiple plans. rnrnCitation: The Agency's preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.rnrnPlace name may differ from official naming standard referenced in update organization documents (i.e. Tribal name under BIA list or other authoritative source Village of, City of, etc).rnrnIf you have media inquiries about this dataset please email the FEMA News Desk FEMA-News-Desk@dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open government program please contact the OpenFEMA team via email OpenFEMA@fema.dhs.gov.

  18. National Hydrography Dataset Plus Version 2.1

    • resilience.climate.gov
    • oregonwaterdata.org
    • +1more
    Updated Aug 16, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Esri (2022). National Hydrography Dataset Plus Version 2.1 [Dataset]. https://resilience.climate.gov/maps/4bd9b6892530404abfe13645fcb5099a
    Explore at:
    Dataset updated
    Aug 16, 2022
    Dataset authored and provided by
    Esrihttp://esri.com/
    Area covered
    North Pacific Ocean, Pacific Ocean
    Description

    The National Hydrography Dataset Plus (NHDplus) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US EPA Office of Water and the US Geological Survey, the NHDPlus provides mean annual and monthly flow estimates for rivers and streams. Additional attributes provide connections between features facilitating complicated analyses. For more information on the NHDPlus dataset see the NHDPlus v2 User Guide.Dataset SummaryPhenomenon Mapped: Surface waters and related features of the United States and associated territories not including Alaska.Geographic Extent: The United States not including Alaska, Puerto Rico, Guam, US Virgin Islands, Marshall Islands, Northern Marianas Islands, Palau, Federated States of Micronesia, and American SamoaProjection: Web Mercator Auxiliary Sphere Visible Scale: Visible at all scales but layer draws best at scales larger than 1:1,000,000Source: EPA and USGSUpdate Frequency: There is new new data since this 2019 version, so no updates planned in the futurePublication Date: March 13, 2019Prior to publication, the NHDPlus network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the NHDPlus Area and Waterbody feature classes were merged under a single schema.Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, On or Off Network (flowlines only), Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original NHDPlus dataset. No data values -9999 and -9998 were converted to Null values for many of the flowline fields.What can you do with this layer?Feature layers work throughout the ArcGIS system. Generally your work flow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.ArcGIS OnlineAdd this layer to a map in the map viewer. The layer is limited to scales of approximately 1:1,000,000 or larger but a vector tile layer created from the same data can be used at smaller scales to produce a webmap that displays across the full range of scales. The layer or a map containing it can be used in an application. Change the layer’s transparency and set its visibility rangeOpen the layer’s attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table and show selected records allows you to view the selected records in the table.Apply filters. For example you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute. Change the layer’s style and symbologyAdd labels and set their propertiesCustomize the pop-upUse as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams and the extract data tool can be used to create copies of portions of the data.ArcGIS ProAdd this layer to a 2d or 3d map. Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class. Change the symbology and the attribute field used to symbolize the dataOpen table and make interactive selections with the mapModify the pop-upsApply Definition Queries to create sub-sets of the layerThis layer is part of the ArcGIS Living Atlas of the World that provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.Questions?Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.

  19. o

    Education Attainment and Enrollment around the World - Dataset - Data...

    • data.opendata.am
    Updated Jul 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/dcwb0038973
    Explore at:
    Dataset updated
    Jul 7, 2023
    Area covered
    World
    Description

    Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.In Mali in 2012 only 34 percent of 15 to 19 year olds in the poorest quintile have completed grade 1 whereas 80 percent of the richest quintile have done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on, in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.

  20. d

    Data for multiple linear regression models for estimating Escherichia coli...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Data for multiple linear regression models for estimating Escherichia coli (E. coli) concentrations or the probability of exceeding the bathing-water standard at recreational sites in Ohio and Pennsylvania as part of the Great Lakes NowCast, 2019 [Dataset]. https://catalog.data.gov/dataset/data-for-multiple-linear-regression-models-for-estimating-escherichia-coli-e-coli-concentr
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    Pennsylvania, The Great Lakes
    Description

    Site-specific multiple linear regression models were developed for one beach in Ohio (three discrete sampling sites) and one beach in Pennsylvania to estimate concentrations of Escherichia coli (E. coli) or the probability of exceeding the bathing-water standard for E. coli in recreational waters used by the public. Traditional culture-based methods are commonly used to estimate concentrations of fecal indicator bacteria, such as E. coli; however, results are obtained 18 to 24 hours post sampling and do not accurately reflect current water-quality conditions. Beach-specific mathematical models use environmental and water-quality variables that are easily and quickly measured as surrogates to estimate concentrations of fecal-indicator bacteria or to provide the probability that a State recreational water-quality standard will be exceeded. When predictive models are used for beach closure or advisory decisions, they are referred to as “nowcasts”. Software designed for model development by the U.S. Environmental Protection Agency (Virtual Beach) was used. The selected model for each beach was based on a combination of explanatory variables including, most commonly, turbidity, water temperature, change in lake level over 24 hours, and antecedent rainfall. Model results are used by managers to report water-quality conditions to the public through the Great Lakes NowCast in 2019 (https://pa.water.usgs.gov/apps/nowcast/). Model performance in 2019 (sensitivity, specificity, and accuracy) was compared to using the previous day's E. coli concentration (persistence method).

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Organization logo

Company Datasets for Business Profiling

Explore at:
.json, .xml, .csv, .xlsAvailable download formats
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
British Indian Ocean Territory, Northern Mariana Islands, Moldova (Republic of), Bangladesh, Isle of Man, Canada, Andorra, Taiwan, Tunisia, Nepal
Description

Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

  • Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

  • Company name;
  • Size;
  • Founding date;
  • Location;
  • Industry;
  • Revenue;
  • Employee count;
  • Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

  • Fresh and accurate data collected and parsed by our expert web scraping team.
  • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
  • A customized approach tailored to your specific business needs.
  • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

  • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
  • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
  • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
  • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

Search
Clear search
Close search
Google apps
Main menu