100+ datasets found
  1. My First Data Science Project

    • kaggle.com
    Updated Jan 19, 2023
    Cite
    NIYIBIGIRA Geredi (2023). My First Data Science Project [Dataset]. https://www.kaggle.com/datasets/niyibigirageredi/my-first-data-science-project/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NIYIBIGIRA Geredi
    Description

    Dataset

    This dataset was created by NIYIBIGIRA Geredi

    Contents

  2. Data Science Group Project Dataset

    • universe.roboflow.com
    zip
    Updated Dec 4, 2024
    Cite
    Anthony (2024). Data Science Group Project Dataset [Dataset]. https://universe.roboflow.com/anthony-pygfy/data-science-group-project/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    Anthony
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Ball Bounding Boxes
    Description

    Data Science Group Project

    ## Overview
    
    Data Science Group Project is a dataset for object detection tasks - it contains Ball annotations for 200 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. List of Citizen Science Projects in the Netherlands

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf
    Updated Jul 19, 2024
    Cite
    Annemarie Timmers; Peter Lugtig (2024). List of Citizen Science Projects in the Netherlands [Dataset]. http://doi.org/10.5281/zenodo.4724570
    Explore at:
    Available download formats: csv, bin, pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Annemarie Timmers; Peter Lugtig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Netherlands
    Description

    This dataset contains citizen science projects in the Netherlands. Citizen science is scientific work that citizens completely or partially conduct themselves on a voluntary basis. Professional researchers often provide instructions and guidance, but they are not the main collectors of the data. Alongside project names, the dataset includes supplementary information regarding a description of the project, its organizers, location, duration, data accessibility, and links to the main project information and data if applicable.

    The CSV and XLSX files provide an up-to-date list of both closed and ongoing citizen science projects in the Netherlands, categorized by discipline. Anyone can add or update a citizen science project, alongside its supplementary information, using the main GitHub repository (https://github.com/sodascience/awesome-citizen-science-nl). It is only possible to add projects conducted in the Netherlands, and we encourage other countries to create such a list themselves. We provide further descriptions of the supplementary information and their input format in the codebook, which can be found on GitHub and in this repository. We update the dataset as additional projects are added and approved.

    This project received support from the SoDa team of ODISSEI (https://odissei-data.nl/en/using-soda/) to make the dataset openly available via GitHub and publish the data on Zenodo.

  4. agile project dataset 2024

    • kaggle.com
    Updated Feb 20, 2025
    Cite
    digro k (2025). agile project dataset 2024 [Dataset]. https://www.kaggle.com/datasets/digrok/agile-project-dataset-2024
    Explore at:
    Croissant
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    digro k
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset Description: 200 Agile Software Projects

    Overview

    This dataset contains records of 200 Agile software development projects. It includes various performance metrics related to Agile methodologies, measuring their effectiveness in project success, risk mitigation, time efficiency, and cost savings. The dataset is designed for analysis of AI-driven automation in Agile software teams.

    Dataset Variables

    1. Agile Effectiveness (Likert scale: 2 to 5): Measures how well Agile methodologies enhance project management processes.
    2. Risk Mitigation (Likert scale: 2 to 5): Captures the effectiveness of Agile in identifying and reducing risks throughout the project lifecycle.
    3. Management Satisfaction (Likert scale: 2 to 5): Represents how satisfied management is with the outcomes of Agile-implemented projects.
    4. Supply Chain Improvement (Likert scale: 2 to 5): Evaluates the impact of Agile practices on optimizing supply chain processes.
    5. Time Efficiency (Likert scale: 2 to 5): Measures improvements in time management within Agile projects.
    6. Cost Savings (%) (Range: 10% to 48%): Quantifies the percentage of cost savings achieved due to Agile methodologies.
    7. Project Success (Binary: 0 = Failure, 1 = Success): Indicates whether the project was considered successful.

    Usage

    This dataset is useful for:

    • Evaluating the impact of AI automation on Agile workflows.
    • Understanding factors contributing to Agile project success.
    • Analyzing cost savings and efficiency improvements in Agile teams.
    • Building machine learning models to predict project success based on Agile metrics.
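
    Before any modelling, the documented variable ranges can be checked record by record. The following is a minimal sketch, not the dataset author's code: the column names (`agile_effectiveness`, `cost_savings_pct`, and so on) are hypothetical, since the listing does not give the actual CSV headers, and only the ranges come from the description.

```python
# Hypothetical column names; the listing does not state the real CSV headers.
LIKERT_FIELDS = [
    "agile_effectiveness",
    "risk_mitigation",
    "management_satisfaction",
    "supply_chain_improvement",
    "time_efficiency",
]


def validate_record(record):
    """Return a list of range problems found in one project record."""
    problems = []
    # Likert variables are documented as ranging from 2 to 5.
    for field in LIKERT_FIELDS:
        value = record.get(field)
        if not isinstance(value, int) or not 2 <= value <= 5:
            problems.append(f"{field}: expected an integer from 2 to 5, got {value!r}")
    # Cost savings is documented as a percentage between 10 and 48.
    savings = record.get("cost_savings_pct")
    if not isinstance(savings, (int, float)) or not 10 <= savings <= 48:
        problems.append(f"cost_savings_pct: expected 10 to 48, got {savings!r}")
    # Project success is documented as binary: 0 = failure, 1 = success.
    if record.get("project_success") not in (0, 1):
        problems.append("project_success: expected 0 or 1")
    return problems


example = {
    "agile_effectiveness": 4,
    "risk_mitigation": 3,
    "management_satisfaction": 5,
    "supply_chain_improvement": 2,
    "time_efficiency": 4,
    "cost_savings_pct": 27.5,
    "project_success": 1,
}
print(validate_record(example))
```

    Returning a list of problems rather than raising on the first one makes it easier to report every out-of-range value in a file in one pass.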

  5. Data Scientist Survey Project

    • kaggle.com
    Updated Nov 12, 2017
    Cite
    Manqiong (2017). Data Scientist Survey Project [Dataset]. https://www.kaggle.com/datasets/manqiong/data-scientist-survey-project/suggestions
    Explore at:
    Croissant
    Dataset updated
    Nov 12, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Manqiong
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Manqiong

    Released under Database: Open Database, Contents: © Original Authors

    Contents

  6. Grainger products dataset

    • crawlfeeds.com
    csv, zip
    Updated Mar 19, 2025
    Cite
    Crawl Feeds (2025). Grainger products dataset [Dataset]. https://crawlfeeds.com/datasets/grainger-products-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock the full potential of your data-driven projects with our comprehensive Grainger products dataset. This meticulously curated dataset includes detailed information on a wide range of products available on Grainger, one of the leading industrial supply companies.

    This dataset is perfect for eCommerce platforms, market analysis, competitive analysis, product comparison, and more. Leverage the power of high-quality, structured data to enhance your business strategies and decision-making processes.

    Versions:

    The latest available version of the Grainger dataset has 1.2 million records and was last extracted in January 2025.

    Reach out to contact@crawlfeeds.com

    Use Cases:

    • eCommerce Platforms: Integrate detailed product information to enhance your product listings.
    • Market Analysis: Analyze product trends, pricing, and competition in the industrial supply market.
    • Inventory Management: Utilize SKUs and unique identifiers for efficient inventory tracking.
    • Data-Driven Projects: Incorporate rich product data into your data science and machine learning models.

    Explore the vast collection of Grainger products and elevate your business insights with this high-quality dataset.

  7. Spain 2019 Data for Data Science Project

    • kaggle.com
    Updated Dec 16, 2022
    Cite
    Juan Sebastian Moreno (2022). Spain 2019 Data for Data Science Project [Dataset]. https://www.kaggle.com/datasets/juansebastianmoreno/spain-2019-data-for-data-science-project/code
    Explore at:
    Croissant
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Juan Sebastian Moreno
    Description

    Dataset

    This dataset was created by Juan Sebastian Moreno

    Contents

  8. Data_science Project Dataset

    • universe.roboflow.com
    zip
    Updated Apr 9, 2024
    Cite
    Ankur (2024). Data_science Project Dataset [Dataset]. https://universe.roboflow.com/ankur-ucrzr/data_science-project/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 9, 2024
    Dataset authored and provided by
    Ankur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fractures 2LEG Bounding Boxes
    Description

    Data_Science Project

    ## Overview
    
    Data_Science Project is a dataset for object detection tasks - it contains Fractures 2LEG annotations for 1,035 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. CS Track database - Dataset

    • data.niaid.nih.gov
    Updated Nov 28, 2022
    + more versions
    Cite
    TIDE-UPF (2022). CS Track database - Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7310095
    Explore at:
    Dataset updated
    Nov 28, 2022
    Dataset authored and provided by
    TIDE-UPF
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the main dataset, which consists of a list of all relevant details of the CS Track database. It contains the following information from the CS Track database:

    • the titles of the Citizen Science (CS) projects
    • the date the data were extracted
    • the language of the CS project information
    • the URL(s) of the website(s) from which the CS project information was extracted (it might be useful to consult this data for other studies developed in the CS Track consortium)
    • the full list of assignments for research areas and SDGs for each CS project

  10. Science Project Dataset

    • universe.roboflow.com
    zip
    Updated Aug 27, 2024
    Cite
    SciVerse (2024). Science Project Dataset [Dataset]. https://universe.roboflow.com/sciverse/science-project
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    SciVerse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Hand Sign Bounding Boxes
    Description

    Science Project

    ## Overview
    
    Science Project is a dataset for object detection tasks - it contains Hand Sign annotations for 1,802 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. Data from: [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User...

    • data.niaid.nih.gov
    • recerca.uoc.edu
    Updated Nov 24, 2022
    + more versions
    Cite
    H. Ulrich Hoppe (2022). [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects - Graph Files [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7356425
    Explore at:
    Dataset updated
    Nov 24, 2022
    Dataset provided by
    Ishari Amarasinghe
    H. Ulrich Hoppe
    Nicolás Felipe Gutiérrez-Páez
    Simon Krukowski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Explanation/Overview:

    Corresponding graph files of the extracted Zooniverse networks described in D3.3 (can be found here), which are the result of our research that culminated in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for the conference CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632) here. Usernames have been anonymised.

    The graph files are in .gexf (graph exchange XML format) and .gml (graph modeling language) formats which can be used by common graph/network-analysis and visualisation tools such as Gephi.

    Purpose:

    The purpose of this dataset is to provide the basis for possible further examinations of the network structure, involving additional (not yet analysed) features such as the content of the comments etc.

    Relatedness:

    The data of the different projects was derived from the forums of 7 Zooniverse projects based on similar discussion board features. The projects are: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.

    Content:

    The dataset contains distinct graph files for each of the analysed projects. For each graph file, there are nodes and edges and their associated attributes (i.e., each edge can have an attribute). For the edges, apart from source and target, we have as attributes:

    • weight
    • project_title
    • body (i.e., text)
    • created_at
    • userRoles
    • discussion_title
    • discussion_id
    • user_id
    • board_title
    • relation
    • target_role

    For the nodes, the attributes are:

    • user_id
    • userRoles
    • degree_reply (i.e., degree for the reply relation)
    • in_degree_reply
    • out_degree_reply
    • degree_comment
    • in_degree_comment
    • out_degree_comment
    • degree_total
    • in_degree_total
    • out_degree_total
    • target_role

    Grouping:

    Each graph file represents all the comments for the respective project across its lifespan irrespective of any time slices. Edges represent comments and nodes represent users. While the different boards are still contained within the data, all boards occur in the data.
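
    The per-node degree attributes listed above can in principle be recomputed from the edge list. The sketch below illustrates that relationship under stated assumptions rather than reproducing the authors' procedure: it assumes the `relation` edge attribute takes the values "reply" and "comment", that each edge counts once regardless of `weight`, and that a degree is the sum of in-degree and out-degree. The function name `node_degrees` and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Assumed values of the `relation` edge attribute (inferred from the
# degree_reply / degree_comment node attributes, not stated in the listing).
RELATIONS = ("reply", "comment")


def node_degrees(edges):
    """Compute in/out/total degrees per user from (source, target, relation) edges."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, target, relation in edges:
        counts[source][f"out_degree_{relation}"] += 1
        counts[target][f"in_degree_{relation}"] += 1
        counts[source]["out_degree_total"] += 1
        counts[target]["in_degree_total"] += 1

    result = {}
    for user, user_counts in counts.items():
        row = {}
        for kind in RELATIONS + ("total",):
            in_deg = user_counts[f"in_degree_{kind}"]
            out_deg = user_counts[f"out_degree_{kind}"]
            row[f"in_degree_{kind}"] = in_deg
            row[f"out_degree_{kind}"] = out_deg
            # Assumption: degree is in-degree plus out-degree.
            row[f"degree_{kind}"] = in_deg + out_deg
        result[user] = row
    return result


sample_edges = [
    ("u1", "u2", "reply"),
    ("u2", "u1", "reply"),
    ("u1", "u3", "comment"),
]
for user, row in sorted(node_degrees(sample_edges).items()):
    print(user, row["degree_reply"], row["degree_comment"], row["degree_total"])
```

    For the actual files, a graph library that reads .gexf or .gml would supply the edge list; the anonymised user IDs would take the place of the sample names used here.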

  12. Data from: The iratebirds Citizen Science Project: a Dataset on Birds’...

    • figshare.com
    docx
    Updated May 30, 2023
    Cite
    Anna Haukka; Aleksi Lehikoinen; Stefano Mammola; William Morris; Andrea Santangeli (2023). The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans [Dataset]. http://doi.org/10.6084/m9.figshare.20170082.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anna Haukka; Aleksi Lehikoinen; Stefano Mammola; William Morris; Andrea Santangeli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The iratebirds database contains comprehensive data on the visual aesthetic attractiveness of birds, as seen by humans, for bird taxonomic units (following the eBird/Clements integrated checklist v. 2019). The data were collected with the iratebirds.app website citizen science project, where users rated the appearance of birds on a linear scale from 1-10. The ratings were based on photographs of the birds available from the Macaulay Library database. Each rating score of a bird species or subspecies is based on several photographs of the same bird species. The application code is openly available on GitHub: https://github.com/luomus/iratebirds. The application was spread during August 2020 – April 2021, globally, to as wide an audience as possible using social media, traditional media, collaborators and email lists.

    The iratebirds database is based on 408 207 ratings from 6 212 users. It consists of raw visual aesthetic attractiveness rating data as well as complementary data from an online survey that sourced demographic information from a subset of 2 785 users who scored the birds. The online survey gives information on these users’ birding skills, nature connectedness, profession, home country, age and gender. On top of these, the data scores for birds’ visual aesthetic attractiveness to humans have been modelled with hierarchical models to obtain overall average scores for the bird species and subspecies. More details on the data are found in this file’s section “Methodological information” as well as in the publication Haukka, A. et al. (2023), The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans, Scientific Data.

    The full database "iratebirds_raw_data_taxonomy_photoinfo_ratings_survey_251022.csv" includes all the data related to the photographs scored (e.g. place and location of the photograph, and its quality), the species and subspecies names (following the eBird/Clements integrated checklist v. 2019), the raw scores made by the users, details of the users (e.g. language used), and internal user ID, and for the users who took the online survey, also detailed information about their demography, e.g. home country and other information related to their knowledge of and connection to nature and birds.

    The modeled rating scores database "iratebirds_final_predictions_average_fullmodel_subsetmodel_151122.csv" includes visual aesthetic attractiveness of birds, as perceived by humans, calculated in three different ways. The most appropriate score can be chosen by the user according to the specific research needs, but in general we recommend using the scores from the full model (ii). The three different measures are i) raw visual aesthetic attractiveness for each bird species (or subspecies), ii) full model: visual aesthetic attractiveness corrected for language group of the scorer and the quality of the photo scored, iii) subset model: visual aesthetic attractiveness corrected as in ii) plus other user-specific factors (related to bird and nature knowledge and connections, home country, age, and gender). The file also gives information on how many photos were used for scoring each bird and how many users have scored the species. The latter subset model iii) represents only a subset of all the species. The data on visual aesthetic attractiveness are also available at the species and the sex within-species level, for the sexually dichromatic species, in the file "iratebirds_pred_ratings_species_and_sex_level_120123.csv".
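
    Of the three measures, only the raw score (i) lends itself to a short illustration. The sketch below assumes the raw score is a plain mean of the individual 1-10 ratings per species, which the description does not confirm; the function name `raw_mean_scores` and the `(species, score)` input layout are hypothetical, and the model-corrected measures (ii) and (iii) are not reproduced.

```python
from collections import defaultdict
from statistics import mean


def raw_mean_scores(ratings):
    """Return {species: (mean score, number of ratings)} from (species, score) pairs."""
    by_species = defaultdict(list)
    for species, score in ratings:
        # Ratings are described as lying on a linear scale from 1 to 10.
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range for {species}: {score!r}")
        by_species[species].append(score)
    # Assumption: the raw measure is an unweighted mean of the ratings.
    return {
        species: (mean(scores), len(scores))
        for species, scores in by_species.items()
    }


sample_ratings = [
    ("Pica pica", 7),
    ("Pica pica", 9),
    ("Corvus corax", 4),
]
print(raw_mean_scores(sample_ratings))
```

    Keeping the rating count next to each mean mirrors the description's note that the file reports how many users scored each species.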

    All database files are given both as .csv and .xlsx files. The data and code to reproduce the analyses, figures and tables presented in Haukka et al. 2023, The iratebirds citizen science project: a dataset of birds’ visual aesthetic attractiveness to humans (Scientific Data, doi: https://doi.org/10.1038/s41597-023-02169-0), are included in the files 'iratebirds_raw_data_taxonomy_photoinfo_ratings_survey_251022.csv', 'Haukka_et_al_Scientific_Data_modelling.R', 'Haukka_et_al_Scientific_Data_Figure.R' and 'Haukka_et_al_Scientific_Data_Tables.R'. Detailed information on data processing and models can be found in the same publication.

  13. NATCOOP dataset

    • heidata.uni-heidelberg.de
    csv, docx, pdf, tsv +1
    Updated Jan 27, 2022
    Cite
    Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess (2022). NATCOOP dataset [Dataset]. http://doi.org/10.11588/DATA/GV8NBL
    Explore at:
    Available download formats: docx(90179), pdf(432619), csv(3441765), docx(499022), tsv(86553), pdf(473493), pdf(856157), pdf(467245), docx(101203), pdf(351653), pdf(576588), pdf(200225), pdf(124038), type/x-r-syntax(14339), pdf(345323), pdf(69467), docx(43108), pdf(268168), docx(493800), docx(25110), docx(43036), pdf(270379), pdf(77960), pdf(464499), pdf(392748), docx(42158), pdf(374488), docx(498354), pdf(282466), pdf(482954), pdf(302513), pdf(513748), pdf(126342), docx(33772), tsv(2313475), pdf(441389), pdf(92836), pdf(392718)
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    heiDATA
    Authors
    Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/GV8NBL

    Time period covered
    Jan 1, 2017 - Jan 1, 2021
    Dataset funded by
    European Commission
    Description

    The NATCOOP project set out to study how nature shapes the preferences and incentives of economic agents and how this in turn affects common-pool resource management. Imagine a group of fishermen targeting a species that requires a lot of teamwork to harvest. Do these fishers become more social over time compared to fishers that work in a more solitary manner? If so, does this have implications for how the fishery should be managed? To study this, the NATCOOP team travelled to Chile and Tanzania and collected data using surveys and economic experiments. These two very different countries have a large population of small-scale fishermen, and both host several distinct types of fisheries. Over the course of five field trips, the project team surveyed more than 2500 fishermen, with each field trip contributing to the main research question by measuring fishermen’s preferences for cooperation and risk. Additionally, each field trip aimed to answer another smaller research question that was focused on either risk taking or cooperation behavior in the fisheries. The data from both surveys and experiments are now publicly available and can be freely studied by other researchers, resource managers, or interested citizens. Overall, the NATCOOP dataset contains participants’ responses to a plethora of survey questions and their actions during incentivized economic experiments. It is available in both .dta and .csv formats, and its use is recommended with statistical software such as R or Stata. For those unaccustomed to statistical analysis, we included a video tutorial on how to use the data set in the open-source program R.

  14. Dataset - Understanding the software and data used in the social sciences

    • eprints.soton.ac.uk
    Updated Mar 30, 2023
    Cite
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna (2023). Dataset - Understanding the software and data used in the social sciences [Dataset]. http://doi.org/10.5281/zenodo.7785710
    Explore at:
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Zenodo
    Authors
    Chue Hong, Neil; Aragon, Selina; Antonioletti, Mario; Walker, Johanna
    Description

    This is a repository for a UKRI Economic and Social Research Council (ESRC) funded project to understand the software used to analyse social sciences data. Any software produced has been made available under a BSD 2-Clause license and any data and other non-software derivative is made available under a CC-BY 4.0 International License. Note that the software that analysed the survey is provided for illustrative purposes - it will not work on the decoupled anonymised data set. Exceptions to this are:

    • Data from the UKRI ESRC is mostly made available under a CC BY-NC-SA 4.0 Licence.
    • Data from Gateway to Research is made available under an Open Government Licence (Version 3.0).

    Contents

    • Survey data & analysis: esrc_data-survey-analysis-data.zip
    • Other data: esrc_data-other-data.zip
    • Transcripts: esrc_data-transcripts.zip
    • Data Management Plan: esrc_data-dmp.zip

    Survey data & analysis

    The survey ran from 3rd February 2022 to 6th March 2023 during which 168 responses were received. Of these responses, three were removed because they were supplied by people from outside the UK without a clear indication of involvement with the UK or associated infrastructure. A fourth response was removed because two responses came from the same person, which leaves 164 responses in the data. The survey responses, Question (Q) Q1-Q16, have been decoupled from the demographic data, Q17-Q23. Questions Q24-Q28 are for follow-up and have been removed from the data. The institutions (Q17) and funding sources (Q18) have been provided in a separate file as this could be used to identify respondents. Q17, Q18 and Q19-Q23 have all been independently shuffled.

    The data has been made available as Comma Separated Values (CSV) with the question number as the header of each column and the encoded responses in the column below. To see what the question and the responses correspond to you will have to consult the survey-results-key.csv which decodes the question and responses accordingly. A pdf copy of the survey questions is available on GitHub.

    The survey data has been decoupled into:

    • survey-results-key.csv - maps a question number and the responses to the actual question values.
    • q1-16-survey-results.csv - the non-demographic component of the survey responses (Q1-Q16).
    • q19-23-demographics.csv - the demographic part of the survey (Q19-Q21, Q23).
    • q17-institutions.csv - the institution/location of the respondent (Q17).
    • q18-funding.csv - funding sources within the last 5 years (Q18).

    Please note the code that has been used to do the analysis will not run with the decoupled survey data.

    Other data files included

    • CleanedLocations.csv - normalised version of the institutions that the survey respondents volunteered.
    • DTPs.csv - information on the UKRI Doctoral Training Partnerships (DTPs) scraped from the UKRI DTP contacts web page in October 2021.
    • projectsearch-1646403729132.csv.gz - data snapshot from the UKRI Gateway to Research released on the 24th February 2022 made available under an Open Government Licence.
    • locations.csv - latitude and longitude for the institutions in the cleaned locations.
    • subjects.csv - research classifications for the ESRC projects for the 24th February data snapshot.
    • topics.csv - topic classification for the ESRC projects for the 24th February data snapshot.

    Interview transcripts

    The interview transcripts have been anonymised and converted to markdown so that they are easier to process in general.

    List of interview transcripts: 1269794877.md 1578450175.md 1792505583.md 2964377624.md 3270614512.md 40983347262.md 4288358080.md 4561769548.md 4938919540.md 5037840428.md 5766299900.md 5996360861.md 6422621713.md 6776362537.md 7183719943.md 7227322280.md 7336263536.md 75909371872.md 7869268779.md 8031500357.md 9253010492.md

    Data Management Plan

    The study's Data Management Plan is provided in PDF format and shows the different data sets used throughout the duration of the study and where they have been deposited, as well as how long the SSI will keep these records.
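
    The description says the survey CSVs hold encoded responses under question-number headers, with survey-results-key.csv mapping codes back to their meaning. A minimal sketch of that decoding step follows. The layout of the key file is not given in the listing, so the `question`, `code` and `label` column names are assumptions, as are the function names and the sample values.

```python
import csv
import io


def load_key(key_file):
    """Build {question: {code: label}} from a key file (assumed column names)."""
    key = {}
    for row in csv.DictReader(key_file):
        key.setdefault(row["question"], {})[row["code"]] = row["label"]
    return key


def decode_responses(results_file, key):
    """Replace each encoded response with its label; keep unknown codes as-is."""
    decoded = []
    for row in csv.DictReader(results_file):
        decoded.append(
            {question: key.get(question, {}).get(code, code)
             for question, code in row.items()}
        )
    return decoded


# Made-up sample content standing in for the real files.
sample_key = "question,code,label\nQ1,1,Yes\nQ1,2,No\nQ2,1,Python\n"
sample_results = "Q1,Q2\n1,1\n2,9\n"

key = load_key(io.StringIO(sample_key))
for decoded_row in decode_responses(io.StringIO(sample_results), key):
    print(decoded_row)
```

    With the real files, `open("survey-results-key.csv", newline="")` and `open("q1-16-survey-results.csv", newline="")` would replace the in-memory samples, after adjusting the column names to the actual key layout.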

  15. Python for Data Science-Uber Drive Project

    • kaggle.com
    zip
    Updated May 26, 2021
    + more versions
    Cite
    Athisya Nadar (2021). Python for Data Science-Uber Drive Project [Dataset]. https://www.kaggle.com/athisyanadar/python-for-data-scienceuber-drive-project
    Explore at:
    Available download formats: zip (59869 bytes)
    Dataset updated
    May 26, 2021
    Authors
    Athisya Nadar
    Description

    Dataset

    This dataset was created by Athisya Nadar

    Contents

    It contains the following files:

  16. Catalog of Ocean Data Science Initiatives

    • data.qdr.syr.edu
    pdf, txt, xlsx
    Updated May 26, 2022
    Cite
    Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell (2022). Catalog of Ocean Data Science Initiatives [Dataset]. http://doi.org/10.5064/F6ZQWQJS
    Explore at:
    Available download formats: xlsx(344302), pdf(81722), txt(4514), pdf(222143)
    Dataset updated
    May 26, 2022
    Dataset provided by
    Qualitative Data Repository
    Authors
    Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell
    License

    https://qdr.syr.edu/policies/qdr-standard-access-conditions

    Dataset funded by
    National Science Foundation, Human-Environment and Geographical Sciences Program
    Description

    Project Overview

    This dataset is a catalog of oceans data science initiatives (ODSIs). We define an ODSI as an initiative that mobilizes (often geospatial and temporal) big data and/or novel data sources about the oceans with an express goal of informing or improving conditions in the oceans. ODSI identification began in Jan 2020. Additional ODSIs will continue to be added. We identified more than 150 ODSIs and populated the catalog with data gathered from ODSI websites describing key features of their work including 1) the data infrastructure, 2) their organizational structure, 3) the ocean worlds, or ontologies, they create, and 4) the (explicit or implicit) policy and governance ‘solutions’ and relations they promote. The ODSIs in the catalog are global and regional in scope and aim to enhance understanding around three topical concerns: fisheries extraction, biodiversity conservation, and enhancing basic scientific knowledge.

    Data overview

    For 100 ODSIs, we created metadata about the data architecture, organizational governance, and world-making practices such as their stated purpose, theory of change, and problem/solution framing. For a subset of 30 ODSIs, we created metadata about their policy and governance stances and practices. All metadata was created based on a textual analysis of their websites and public communications.

    Data collection overview

    Sampling strategy: We began with a purposive sample of ODSIs based on the research team’s prior knowledge of and participation in global and regional ODSIs. This sample allowed us to pilot and refine our metadata catalog approach. We then used a combination of keyword searches on Google using search terms such as ‘ocean data’, ‘marine data’ and ‘fisheries data’. Adopting a snowball sampling method, we reviewed the websites of ODSIs that came up in our initial search to find references to additional ODSIs.

    To determine if an entity was an ODSI, we reviewed web pages for information on purpose, goals, objectives, mission, values (usually in tabs labeled ‘About’, ‘Goals’ or ‘Objectives’) and we looked for links to ‘data’ or ‘data products.’ Entities were selected for our catalog based on two criteria: 1) their stated purpose, goals, objectives, mission, values indicated a commitment to advancing ocean science and data and 2) if they focused on regional or global scales. We selected and categorized ODSIs according to three broad focal areas in global and regional oceans governance: fisheries extraction, biodiversity conservation, and basic ocean science development.

    Shared data organization

    This catalog is comprised of three files. 'Havice_ODSIC.pdf' provides a list of each ODSI included in the catalog, and a permalink to the webpage used to populate catalog metadata categories. 'Havice_ODSIC-CodingScheme.pdf' provides a list of code descriptions for the catalog metadata. 'Havice_ODSIC-Metadata.xlsx' is the full catalog with populated metadata.

  17. ghtorrent-projects Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Jul 17, 2021
    Cite
    Marios Papachristou (2021). ghtorrent-projects Dataset [Dataset]. http://doi.org/10.5281/zenodo.5111043
    Explore at:
    Available download formats: txt, bin
    Dataset updated
    Jul 17, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marios Papachristou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A hypergraph dataset mined from the GHTorrent project is presented. The dataset contains two files

    1. project_members.txt: Contains GitHub projects with at least 2 contributors and the corresponding contributors (as a hyperedge). The format of the data is:

    2. num_followers.txt: Contains all GitHub users and their number of followers.

    The artifact also contains the SQL queries used to obtain the data from GHTorrent (schema).

  18. Data from: MODIRISK: Monitoring of Mosquito Vectors of Disease (inventory)

    • gbif.org
    • data.europa.eu
    Updated May 2, 2022
    Cite
    Veerle Versteirt; Wouter Dekoninck; Wim Van Bortel; Dimitri Brosens (2022). MODIRISK: Monitoring of Mosquito Vectors of Disease (inventory) [Dataset]. http://doi.org/10.15468/4fidg2
    Explore at:
    Dataset updated
    May 2, 2022
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Belgian Biodiversity Platform
    Authors
    Veerle Versteirt; Wouter Dekoninck; Wim Van Bortel; Dimitri Brosens
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2007 - Dec 31, 2011
    Area covered
    Description

    MODIRISK aims at studying biodiversity of mosquitoes and monitoring/predicting its changes, and hence actively prepares to address issues on the impact of biodiversity change with particular reference to invasive species and the risk to introduce new pathogens. This is essential in the perspective of the ongoing global changes creating suitable conditions for the spread of invasive species and the (re)emergence of vector-borne diseases in Europe. The main strengths of the project in the context of sustainable development are the link between biodiversity and health-environment, and its contribution to the development of tools to better describe the spatial distribution of mosquito biodiversity. MODIRISK addresses key topics of the global initiative Diversitas, which was one of the main drivers of the 'Research programme Science for a Sustainable Development' (SSD). This dataset contains the monitoring data.

    The project was coordinated by the Institute of Tropical Medicine (http://www.itg.be/E) in Antwerp.

  19. EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Explore at:
    Dataset updated
    Oct 16, 2023
    Description

    Abstract: In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: the Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, an ultrasonic sensor, a water level detection sensor, a pH sensor meter, a soil moisture sensor, a heart rate sensor, a flame sensor, etc. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Commercial use is permitted only with the approval of the lead author, Dr Mohamed Amine Ferrag, who has asserted his rights under copyright.

    The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, authors must cite the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document corresponds to one attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document corresponds to one attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform

    from google.colab import files

    !pip install -q kaggle
    # Upload kaggle.json (the Kaggle API token) when prompted
    files.upload()
    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/
    !chmod 600 ~/.kaggle/kaggle.json
    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
    !unzip DNN-EdgeIIoT-dataset.csv.zip
    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the dataset's CSV file into a Pandas DataFrame

    import pandas as pd
    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3: Exploring some of the DataFrame's contents

    df.head(5)
    print(df['Attack_type'].value_counts())

    Step 4: Dropping data (columns, duplicated rows, NaN, null values)

    from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                    "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                    "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                    "tcp.dstport", "udp.port", "mqtt.msg"]

    df.drop(drop_columns, axis=1, inplace=True)
    df.dropna(axis=0, how='any', inplace=True)
    df.drop_duplicates(subset=None, keep="first", inplace=True)
    df = shuffle(df)
    df.isna().sum()
    print(df['Attack_type'].value_counts())
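The effect of the Step 4 cleaning can be checked on a toy frame before running it on the full CSV. The sketch below is only an illustration: the values are made up, only one of the dropped columns is represented, and `DataFrame.sample` is swapped in for `sklearn.utils.shuffle` so the example depends on pandas alone.

```python
import pandas as pd

# Toy stand-in for the raw data; all values are invented for illustration.
toy = pd.DataFrame({
    "frame.time": ["t0", "t1", "t2", "t3"],   # identifier-like column to drop
    "tcp.len": [10.0, 10.0, None, 30.0],       # one missing value
    "Attack_type": ["Normal", "Normal", "Normal", "MITM"],
})

cleaned = (
    toy.drop(columns=["frame.time"])           # remove unwanted columns
       .dropna(axis=0, how="any")              # remove rows with any NaN
       .drop_duplicates(keep="first")          # remove repeated rows
       .sample(frac=1.0, random_state=0)       # shuffle rows reproducibly
       .reset_index(drop=True)
)

print(cleaned["Attack_type"].value_counts())
```

Dropping the timestamp column first matters here: the two `Normal` rows only become duplicates once `frame.time` is gone.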

    Step 5: Categorical data encoding (dummy encoding)

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn import preprocessing

    def encode_text_dummy(df, name):
        # Replace column `name` with one indicator column per distinct value
        dummies = pd.get_dummies(df[name])
        for x in dummies.columns:
            dummy_name = f"{name}-{x}"
            df[dummy_name] = dummies[x]
        df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df, 'http.request.method')
    encode_text_dummy(df, 'http.referer')
    encode_text_dummy(df, "http.request.version")
    encode_text_dummy(df, "dns.qry.name.len")
    encode_text_dummy(df, "mqtt.conack.flags")
    encode_text_dummy(df, "mqtt.protoname")
    encode_text_dummy(df, "mqtt.topic")
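To see what the dummy encoding in Step 5 does to a column, it may help to run the same helper on a small frame. This is a minimal sketch: the helper is restated so the block is self-contained, and the toy values (including the `tcp.len` column) are invented for illustration rather than taken from the dataset.

```python
import pandas as pd

def encode_text_dummy(df, name):
    """Replace column `name` with one indicator column per distinct value."""
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        df[f"{name}-{x}"] = dummies[x]
    df.drop(name, axis=1, inplace=True)

# Toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "http.request.method": ["GET", "POST", "GET", "0"],
    "tcp.len": [10, 20, 30, 40],
})

encode_text_dummy(toy, "http.request.method")
print(sorted(toy.columns))
```

Each distinct value becomes its own column named `<column>-<value>`, so a column with many distinct values (for example free-text topics) can widen the frame considerably.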

    Step 6: Creation of the preprocessed dataset

    df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
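Step 5 imports `train_test_split` and `StandardScaler`, but the listed steps stop before using them. One plausible continuation, shown here on synthetic data, is a stratified split followed by scaling fitted on the training portion only. This is a sketch under assumptions: the feature column names, the label values, the 80/20 ratio and the random seed are all hypothetical, and only the `Attack_type` column name comes from the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed frame (hypothetical features/labels).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(loc=5.0, scale=2.0, size=100),
    "Attack_type": ["Normal"] * 60 + ["DDoS_UDP"] * 40,
})

X = df.drop(columns=["Attack_type"]).to_numpy(dtype=float)
y = df["Attack_type"].to_numpy()

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on training data only, then reuse it on the test data,
# so no test-set statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```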

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  20. [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User...

    • data.niaid.nih.gov
    • recerca.uoc.edu
    • +1 more
    Updated Nov 25, 2022
    Cite
    Simon Krukowski (2022). [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7331437
    Explore at:
    Dataset updated
    Nov 25, 2022
    Dataset provided by
    Ishari
    H. Ulrich
    Nicolás Felipe
    Simon Krukowski
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Corresponding dataset for the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632). Usernames have been anonymised.

    The structure of the dataset is as follows:

    Annotations

    List of annotations made per day for each of the analysed projects.

    annotations.csv

    Comments

    Total list of comments with several data fields (i.e., comment id, text, reply_user_id)

    comments.csv

    Rolechanges

    List of roles per user to determine number of role changes

    478_rolechanges.csv

    1104_rolechanges.csv

    ...

    Totalnetworkdata

    Network data (edge and node sets) for the given projects (without time slices).

    Edges

    478_edges.csv

    1104_edges.csv

    Nodes

    478_nodes.csv

    1104_nodes.csv

    Trajectories

    Network data (edge and node sets) for the given projects and all time slices (Q1 2016 - Q4 2021)

    478

    Edges

    edges_4782016_q1.csv

    edges_4782016_q2.csv

    edges_4782016_q3.csv

    edges_4782016_q4.csv

    ...

    Nodes

    nodes_4782016_q1.csv

    nodes_4782016_q4.csv

    nodes_4782016_q3.csv

    nodes_4782016_q2.csv

    ...

    1104

    Edges

    ...

    Nodes

    ...

    ...
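The trajectory files appear to follow a `<kind>_<project><year>_q<quarter>.csv` pattern (for example `edges_4782016_q1.csv`). A small helper can enumerate the expected names for one project across the stated Q1 2016 to Q4 2021 range. This is a sketch under assumptions: the listing elides most file names, so the pattern for later years and for project 1104 is extrapolated from the few names shown, and `trajectory_files` is a hypothetical helper, not part of the dataset.

```python
def trajectory_files(project_id, kind, first_year=2016, last_year=2021):
    """List expected per-quarter file names for one project.

    Assumes the naming pattern seen in the listing, e.g. edges_4782016_q1.csv;
    names outside the listed examples are extrapolated, not confirmed.
    """
    if kind not in ("edges", "nodes"):
        raise ValueError("kind must be 'edges' or 'nodes'")
    return [
        f"{kind}_{project_id}{year}_q{quarter}.csv"
        for year in range(first_year, last_year + 1)
        for quarter in range(1, 5)
    ]

files = trajectory_files(478, "edges")
print(files[:4])
```

Checking the generated names against the actual archive contents before loading would catch any deviation from the assumed pattern.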
