86 datasets found
  1. Catalog of Ocean Data Science Initiatives

    • data.qdr.syr.edu
    pdf, txt, xlsx
    Updated May 26, 2022
    Cite
    Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell (2022). Catalog of Ocean Data Science Initiatives [Dataset]. http://doi.org/10.5064/F6ZQWQJS
    Available download formats: xlsx (344302), pdf (81722), txt (4514), pdf (222143)
    Dataset updated
    May 26, 2022
    Dataset provided by
    Qualitative Data Repository
    Authors
    Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell
    License

    https://qdr.syr.edu/policies/qdr-standard-access-conditions

    Dataset funded by
    National Science Foundation, Human-Environment and Geographical Sciences Program
    Description

    Project Overview

    This dataset is a catalog of oceans data science initiatives (ODSIs). We define an ODSI as an initiative that mobilizes (often geospatial and temporal) big data and/or novel data sources about the oceans with an express goal of informing or improving conditions in the oceans. ODSI identification began in January 2020, and additional ODSIs will continue to be added. We identified more than 150 ODSIs and populated the catalog with data gathered from ODSI websites describing key features of their work, including 1) the data infrastructure, 2) their organizational structure, 3) the ocean worlds, or ontologies, they create, and 4) the (explicit or implicit) policy and governance ‘solutions’ and relations they promote. The ODSIs in the catalog are global and regional in scope and aim to enhance understanding around three topical concerns: fisheries extraction, biodiversity conservation, and enhancing basic scientific knowledge.

    Data overview

    For 100 ODSIs, we created metadata about the data architecture, organizational governance, and world-making practices such as their stated purpose, theory of change, and problem/solution framing. For a subset of 30 ODSIs, we created metadata about their policy and governance stances and practices. All metadata was created based on a textual analysis of their websites and public communications.

    Data collection overview

    Sampling strategy: We began with a purposive sample of ODSIs based on the research team’s prior knowledge of and participation in global and regional ODSIs. This sample allowed us to pilot and refine our metadata catalog approach. We then combined keyword searches on Google, using search terms such as ‘ocean data’, ‘marine data’, and ‘fisheries data’, with a snowball sampling method: we reviewed the websites of ODSIs that came up in our initial search to find references to additional ODSIs.

    To determine if an entity was an ODSI, we reviewed web pages for information on purpose, goals, objectives, mission, and values (usually in tabs labeled ‘About’, ‘Goals’, or ‘Objectives’), and we looked for links to ‘data’ or ‘data products’. Entities were selected for our catalog based on two criteria: 1) their stated purpose, goals, objectives, mission, and values indicated a commitment to advancing ocean science and data, and 2) they focused on regional or global scales. We selected and categorized ODSIs according to three broad focal areas in global and regional oceans governance: fisheries extraction, biodiversity conservation, and basic ocean science development.

    Shared data organization

    This catalog comprises three files. 'Havice_ODSIC.pdf' provides a list of each ODSI included in the catalog and a permalink to the webpage used to populate catalog metadata categories. 'Havice_ODSIC-CodingScheme.pdf' provides a list of code descriptions for the catalog metadata. 'Havice_ODSIC-Metadata.xlsx' is the full catalog with populated metadata.

  2. Website Analytics

    • catalog.data.gov
    • data.brla.gov
    • +2 more
    Updated Jul 5, 2025
    Cite
    data.brla.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics-89ba5
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    data.brla.gov
    Description

    Web traffic statistics for several City-Parish websites: brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data, etc. Information provided by Google Analytics.

  3. Word cloud for data science

    • scidb.cn
    Updated Apr 3, 2023
    Cite
    Lili Zhang (2023). Word cloud for data science [Dataset]. http://doi.org/10.57760/sciencedb.07847
    Dataset updated
    Apr 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Lili Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes a .txt file and a .ipynb file. The raw data were captured from Web of Science as retrieval records on 24 February 2023. Restricting the records to published articles entitled "data science" yielded 3490 records with abstracts. The Python code for the word cloud analysis is also shared. This package provides supporting details for the paper "Looking Back to the Future: A Glimpse at Twenty Years of Data Science," submitted to the Data Science Journal.

  4. Data-Science-Book

    • kaggle.com
    Updated Aug 20, 2022
    Cite
    Md Waquar Azam (2022). Data-Science-Book [Dataset]. http://doi.org/10.34740/kaggle/dsv/4096198
    Dataset updated
    Aug 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset holds a list of more than 200 books on data science and related topics. The list was built from Amazon, a popular website that provides book ratings and the details listed below.

    There are 6 columns:

    1. Book_name: the book title

    2. Publisher: name of the publisher or writer

    3. Buyers: number of customers who purchased the book

    4. Cover_type: type of cover used to protect the book

    5. stars: rating out of 5 stars

    6. Price

    Inspiration

    I’d like to call on my fellow Kagglers to use Machine Learning and Data Science to help me explore these ideas:

    • What is the best-selling book?

    • Find any hidden patterns if you can

    • EDA of the dataset

  5. Austin_Survey_for_MDCOR_Analyses

    • data.mendeley.com
    Updated Nov 14, 2022
    Cite
    Manuel Gonzalez Canche (2022). Austin_Survey_for_MDCOR_Analyses [Dataset]. http://doi.org/10.17632/nb7yvhjvzk.1
    Dataset updated
    Nov 14, 2022
    Authors
    Manuel Gonzalez Canche
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Austin
    Description

    The city of Austin has administered a community survey for the years 2015, 2016, 2017, 2018, and 2019 (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7) to “assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City’s ongoing planning process.” To access this dataset directly from the city of Austin’s website, follow this link: https://cutt.ly/VNqq5Kd. Although we downloaded the dataset analyzed in this study from the former link, the city of Austin is interested in continuing to administer this survey, so the data we used for this analysis and the data hosted on the city of Austin’s website may differ in the following years. Accordingly, to ensure the replication of our findings, we recommend that researchers download and analyze the dataset we employed in our analyses, which can be accessed at the following link: https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv.

    Replication Features or Variables

    The community survey data has 10,684 rows and 251 columns. Of these columns, our analyses rely on the following three indicators, which are taken verbatim from the survey: “ID”, “Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?”, and “Do you own or rent your home?”
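
    The three indicators named above can be pulled out of the wide survey table with a few lines of code. The following is a minimal sketch using Python's standard `csv` module; the helper name `select_columns` and the two sample rows are hypothetical stand-ins, not values from the survey, and a real run would open the downloaded Community_Survey.csv instead of the in-memory buffer.

```python
import csv
import io

# Column names quoted in the description above.
KEEP = [
    "ID",
    "Q25 - If there was one thing you could share with the Mayor regarding "
    "the City of Austin (any comment, suggestion, etc.), what would it be?",
    "Do you own or rent your home?",
]

def select_columns(handle, keep=KEEP):
    """Read a CSV file object and keep only the named columns of each row."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in keep} for row in reader]

# Hypothetical two-row sample standing in for the real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(KEEP + ["Some other column"])
writer.writerow(["1", "More parks, please", "Own", "x"])
writer.writerow(["2", "", "Rent", "y"])
buffer.seek(0)

subset = select_columns(buffer)
```

    Building the sample with `csv.writer` keeps the quoting correct even though the Q25 column name itself contains commas.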

  6. Website Analytics

    • catalog.data.gov
    • data.nola.gov
    • +4 more
    Updated Jun 28, 2025
    Cite
    data.nola.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    data.nola.gov
    Description

    This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.

  7. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    csv
    Updated Jul 20, 2024
    + more versions
    Cite
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 outbreak of Measles [Dataset]. http://doi.org/10.5281/zenodo.11711230
    Available download formats: csv
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 15, 2024
    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” Proceedings of the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, USA, 29 June - 4 July 2024. (Accepted as a Late Breaking Paper; preprint available at: https://doi.org/10.48550/arXiv.2406.07693)

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.

  8. International Journal of Engineering and Advanced Technology Publication fee...

    • researchhelpdesk.org
    Updated Jun 25, 2022
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/publication-fee/552/international-journal-of-engineering-and-advanced-technology
    Dataset updated
    Jun 25, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk - International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open access, double-blind, peer-reviewed international journal. It aims to publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT. Aim of IJEAT Journal: disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields. provide a platform for publishing results and research with a strong empirical component. bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research. seek original and unpublished research papers based on theoretical or experimental works for publication globally. publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. impart a platform for publishing results and research with a strong empirical component. create a bridge for a significant gap between research and practice by promoting the publication of original, novel, industry-relevant research. 
solicit original and unpublished research papers, based on theoretical or experimental works. Scope of IJEAT International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topic includes but not limited to: 1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions 2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing 3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing 4. 
Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents 5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices 6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security 7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna 8. 
Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition 9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind 10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics 
Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial 
intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level

  9. Data Scientist Role in-2020

    • kaggle.com
    Updated Jul 29, 2020
    Cite
    Vikas Bhadoria (2020). Data Scientist Role in-2020 [Dataset]. https://www.kaggle.com/vikasbhadoria/data-scientist-role-in2020/discussion
    Dataset updated
    Jul 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vikas Bhadoria
    Description

    Abstract

    Are you looking for a career transition to data science? Looking for a job in data science? This dataset can help you.😃

    Content

    Using this data, one can try to find the skills and trends that are most sought after in the industry right now for data scientists. The data covers only data science jobs in India. It was gathered from Naukri.com, a top job-hunting website in India that almost every job aspirant uses these days. Selenium with Python was used for web scraping. The scraped data consists of these 5 important features (columns): Job Roles, Company name, Experience required, Location, and Key Skills.

    Answers that you can find here:

    • What are the top skills companies are looking for?
    • What is the most desired experience level in the industry?
    • What are the companies that are actively offering jobs in this field?
    • What are the locations that have more openings?
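
    Questions like these reduce to frequency counts over a column. The following is a minimal sketch for the top-skills question, assuming each posting stores its key skills as one delimited string; the delimiter, the sample rows, and the helper name `top_skills` are hypothetical and not taken from the dataset.

```python
from collections import Counter

def top_skills(skill_strings, n=3, sep="|"):
    """Count how many postings mention each skill and return the n most common."""
    counts = Counter()
    for raw in skill_strings:
        # Normalise case and whitespace; a set counts each skill once per posting.
        skills = {s.strip().lower() for s in raw.split(sep) if s.strip()}
        counts.update(skills)
    return counts.most_common(n)

# Hypothetical sample of the "Key Skills" column.
rows = [
    "Python | SQL | Machine Learning",
    "python | Statistics",
    "SQL | Python",
]
result = top_skills(rows, n=2)
```

    The same counting pattern applies to the location, company, and experience columns by passing each value as a single-item string.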
  10. Data and supplementary information for 'Source related argumentation found...

    • openicpsr.org
    delimited
    Updated Jul 7, 2020
    + more versions
    Cite
    Ralph Barnes; Zoë Neumann; Samuel Draznin-Nagy (2020). Data and supplementary information for 'Source related argumentation found in science websites: a quantitative study' [Dataset]. http://doi.org/10.3886/E120216V2
    Available download formats: delimited
    Dataset updated
    Jul 7, 2020
    Dataset provided by
    Montana State University
    Authors
    Ralph Barnes; Zoë Neumann; Samuel Draznin-Nagy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    NA
    Description


    This is the summary for online materials that supplement:
    Source related argumentation found in science websites: a quantitative study
    by Ralph M. Barnes, Zoë Noel Neumann, and Samuel Draznin-Nagy.

    The raw data is presented in two CSV files. The two file names are:
    raw data for text
    raw data for source codes

    The coding rubrics are presented in three CSV files. The three file names are:
    initial rubric
    action rubric
    positive rubric

    For all CSV files, brief column labels can be found in row #2. More detailed column labels can be found in row #1.
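
    A file laid out this way has two header rows, which most CSV readers do not expect by default. The following is a minimal sketch of one way to read it with Python's standard `csv` module, using the brief labels from row 2 as keys and keeping the detailed labels from row 1 as a lookup. The function name and the sample content are hypothetical and are not taken from the dataset's files.

```python
import csv
import io

def read_two_header_csv(handle):
    """Read a CSV whose row 1 holds detailed labels and row 2 holds brief labels.

    Returns (label_lookup, records): label_lookup maps each brief label to its
    detailed label, and records is a list of dicts keyed by brief label.
    """
    reader = csv.reader(handle)
    detailed = next(reader)  # row 1: detailed column labels
    brief = next(reader)     # row 2: brief column labels
    records = [dict(zip(brief, row)) for row in reader]
    return dict(zip(brief, detailed)), records

# Hypothetical sample standing in for one of the CSV files.
sample = io.StringIO(
    "Identifier of the web page,Number of source-related arguments\n"
    "page_id,n_args\n"
    "p1,3\n"
    "p2,0\n"
)
labels, records = read_two_header_csv(sample)
```

    Values are returned as strings; any numeric conversion is left to the caller because the actual column types are not described here.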

    One additional table can be found in a pdf file named:
    Supplemental table S1. Source search strings


  11. Data Science in Biomedicine - Web of Science datasets

    • zenodo.org
    txt
    Updated Mar 31, 2020
    Cite
    Yovaninna Alarcón Soto (2020). Data Science in Biomedicine - Web of Science datasets [Dataset]. http://doi.org/10.5281/zenodo.3735063
    Available download formats: txt
    Dataset updated
    Mar 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yovaninna Alarcón Soto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from the Web of Science search for the number of publications associated with the topics "Data Science", "Big Data" and "Cloud Computing" from 2004 to 2019 in 9 different countries.

  12. TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR -...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 16, 2024
    Cite
    TagX (2024). TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR - CCPA Compliant [Dataset]. https://datarade.ai/data-products/tagx-web-browsing-clickstream-data-300k-users-north-america-tagx
    Available download formats: .json, .csv, .xls
    Dataset updated
    Sep 16, 2024
    Dataset authored and provided by
    TagX
    Area covered
    United States
    Description

    TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and EU

    Unique Insights into Online User Behavior

    TagX Web Browsing clickstream Data offers an unparalleled window into the digital lives of 1 million users across North America and the European Union. This comprehensive dataset stands out in the market due to its breadth, depth, and stringent compliance with data protection regulations.

    What Makes Our Data Unique?

    • Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies.
    • Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments.
    • GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards.
    • Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors.
    • Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.

    Data Sourcing: Ethical and Transparent

    Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt-in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:

    • Voluntary user participation through clear opt-in mechanisms
    • Regular audits of data collection methods to ensure ongoing compliance
    • Collaboration with privacy experts to implement best practices in data anonymization
    • Continuous monitoring of regulatory landscapes to adapt our processes as needed

    Primary Use Cases and Verticals

    TagX Web Browsing clickstream Data serves a multitude of industries and use cases, including but not limited to:

    Digital Marketing and Advertising:

    • Audience segmentation and targeting
    • Campaign performance optimization
    • Competitor analysis and benchmarking

    E-commerce and Retail:

    • Customer journey mapping
    • Product recommendation enhancements
    • Cart abandonment analysis

    Media and Entertainment:

    • Content consumption trends
    • Audience engagement metrics
    • Cross-platform user behavior analysis

    Financial Services:

    • Risk assessment based on online behavior
    • Fraud detection through anomaly identification
    • Investment trend analysis

    Technology and Software:

    • User experience optimization
    • Feature adoption tracking
    • Competitive intelligence

    Market Research and Consulting:

    • Consumer behavior studies
    • Industry trend analysis
    • Digital transformation strategies

    Integration with Broader Data Offering TagX Web Browsing clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:

    • Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints.
    • Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey.
    • Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts.
    • Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.

    By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives.

    Data Quality and Scale

    We pride ourselves on delivering high-quality, reliable data at scale:

    • Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions.
    • Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency.
    • Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage.
    • Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies.
    • Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.

    Empowering Data-Driven Decision Making

    In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...

  13. Article: web scraping in data science

    • kaggle.com
    Updated Dec 20, 2022
    Cite
    Rania Tarek Fleifel (2022). Article: web scraping in data science [Dataset]. https://www.kaggle.com/datasets/raniatarekfleifel/articleupload-web-scraping-in-data-science
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rania Tarek Fleifel
    Description

    Dataset

    This dataset was created by Rania Tarek Fleifel

    Contents

  14. Web Analytics Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Web Analytics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/web-analytics-market-global-industry-analysis
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Web Analytics Market Outlook



    According to our latest research, the global web analytics market size was valued at USD 8.4 billion in 2024, reflecting robust growth driven by the increasing adoption of digital platforms across industries. The market is projected to expand at a compound annual growth rate (CAGR) of 17.2% from 2025 to 2033, reaching an estimated USD 36.8 billion by 2033. This significant upsurge is primarily attributed to the escalating demand for actionable insights, data-driven decision-making, and the proliferation of online consumer activity. As per the latest research, enterprises worldwide are leveraging advanced web analytics tools to enhance customer engagement, improve marketing strategies, and drive business outcomes.
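The growth figures quoted above can be sanity-checked with a short compound-growth calculation. The sketch below is only an illustration: it assumes nine annual compounding periods from the 2024 base to 2033, which the summary does not state, and the function names are hypothetical. Under that assumption the stated 17.2% rate gives roughly USD 35 billion, slightly below the quoted USD 36.8 billion, so the report may use a different base year or rounding convention.

```python
# Compound annual growth: end = start * (1 + rate) ** years

def project(start: float, rate: float, years: int) -> float:
    """Project a value forward at a constant annual growth rate."""
    return start * (1.0 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Back out the annual growth rate implied by a start and end value."""
    return (end / start) ** (1.0 / years) - 1.0


base_2024 = 8.4        # USD billion, from the report summary
target_2033 = 36.8     # USD billion, from the report summary
stated_cagr = 0.172    # 17.2%, from the report summary
years = 2033 - 2024    # assumption: nine compounding periods

projected = project(base_2024, stated_cagr, years)
implied = implied_cagr(base_2024, target_2033, years)

print(f"Projected 2033 value at the stated CAGR: USD {projected:.1f} billion")
print(f"CAGR implied by the two endpoints: {implied:.1%}")
```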




    One of the principal growth factors fueling the web analytics market is the exponential increase in digitalization and internet penetration. Organizations across various sectors are rapidly transitioning their operations online, resulting in a surge of data generation through multiple digital touchpoints. This digital transformation has heightened the need for sophisticated web analytics solutions that can process vast volumes of data, extract meaningful patterns, and provide actionable insights. Moreover, the rise in e-commerce activities, coupled with the growing popularity of social media platforms, has created a fertile environment for the adoption of web analytics, enabling businesses to track consumer behavior, measure campaign effectiveness, and optimize user experiences.




    Another critical driver for the web analytics market is the integration of artificial intelligence (AI) and machine learning (ML) technologies. These advanced technologies are revolutionizing the way organizations analyze web data by enabling predictive analytics, real-time reporting, and personalized recommendations. AI-powered web analytics tools can automatically identify trends, anomalies, and customer preferences, empowering businesses to make data-driven decisions faster and more accurately. Furthermore, the increasing focus on omnichannel marketing strategies and the need to unify customer data across different platforms have further accelerated the demand for comprehensive web analytics solutions.




    The regulatory landscape and growing emphasis on data privacy and compliance are also shaping the web analytics market. With the implementation of stringent data protection regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, organizations are compelled to adopt web analytics tools that ensure data security and privacy. This has led to the development of privacy-centric analytics platforms that offer enhanced data governance features, enabling businesses to comply with global regulatory requirements while still deriving valuable insights from web data. The ability to balance data-driven innovation with privacy considerations is becoming a key differentiator for vendors in this dynamic market.




    From a regional perspective, North America continues to dominate the web analytics market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The region’s leadership is attributed to the presence of major technology providers, a mature digital ecosystem, and high levels of investment in analytics infrastructure. However, Asia Pacific is expected to witness the fastest growth during the forecast period, driven by the rapid adoption of digital technologies, expanding internet user base, and increasing investments in e-commerce and digital marketing. The growing awareness among businesses in emerging economies about the benefits of web analytics is further propelling market growth in this region.





    Component Analysis



    The web analytics market by component is bifurcated into software and services, with each segment playing a pivotal role in market expansion. The software segment holds the lion’s share of the market, driven by the continuous evolution of analytics plat

  15. Identifiers for the 21st century: How to design, provision, and reuse...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Julie A. McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal K. Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon C. Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; Maria Jesus Martin; Johanna R. McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone; Murat Sariyar; Jacky L. Snoep; Stian Soiland-Reyes; Natalie J. Stanford; Neil Swainston; Nicole Washington; Alan R. Williams; Sarala M. Wimalaratne; Lilly M. Winfree; Katherine Wolstencroft; Carole Goble; Christopher J. Mungall; Melissa A. Haendel; Helen Parkinson (2023). Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data [Dataset]. http://doi.org/10.1371/journal.pbio.2001414
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Biology
    Authors
    Julie A. McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal K. Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon C. Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; Maria Jesus Martin; Johanna R. McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone; Murat Sariyar; Jacky L. Snoep; Stian Soiland-Reyes; Natalie J. Stanford; Neil Swainston; Nicole Washington; Alan R. Williams; Sarala M. Wimalaratne; Lilly M. Winfree; Katherine Wolstencroft; Carole Goble; Christopher J. Mungall; Melissa A. Haendel; Helen Parkinson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

  16. WebDS

    • huggingface.co
    Cite
    WebDS, WebDS [Dataset]. https://huggingface.co/datasets/yamhm/WebDS
    Explore at:
    Authors
    WebDS
    Description

    WebDS: A Benchmark for Web-based Data Science

    WebDS is the first end-to-end benchmark designed for evaluating agents on real-world web-based data science workflows. It contains 870 tasks across 29 containerized websites spanning 10 domains, including economics, health, climate, and scientific research. Agents are tested on:

    • Multi-hop web navigation
    • Structured and unstructured data processing
    • Tool usage (e.g., Python scripts, visualization tools)
    • Downstream task completion (e.g.…

    See the full description on the dataset page: https://huggingface.co/datasets/yamhm/WebDS.

  17. ‘Coursera Course Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Coursera Course Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-coursera-course-dataset-839a/86aaffe7/?iid=003-735&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Coursera Course Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/siddharthm1698/coursera-course-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This is a dataset I generated during a hackathon for a project. I scraped the data from the official Coursera website. Our project aims to help any new learner find the right course by answering just a few questions. It is an intelligent course recommendation system, so we had to scrape data from a few educational websites. This is the data scraped from the Coursera website. For the project visit: https://github.com/Siddharth1698/Coursu . Please do show your support by following us. I have just started learning data science and hope this dataset will be helpful to someone for his/her personal purposes. The scraping code is here: https://github.com/Siddharth1698/Coursera-Course-Dataset Article about the dataset generation: https://medium.com/analytics-vidhya/web-scraping-and-coursera-8db6af45d83f

    Content

    This dataset contains 6 main columns and data on 890 courses. The detailed description:

    1. course_title: Contains the course title.
    2. course_organization: The organization conducting the course.
    3. course_Certificate_type: The certification types available for the course.
    4. course_rating: The rating associated with each course.
    5. course_difficulty: The difficulty or level of the course.
    6. course_students_enrolled: The number of students enrolled in the course.

    Inspiration

    This is just one of my first scraped datasets. Follow my GitHub for more: https://github.com/Siddharth1698

    --- Original source retains full ownership of the source dataset ---

  18. Website Categorisation Dataset

    • opendatabay.com
    .undefined
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Website Categorisation Dataset [Dataset]. https://www.opendatabay.com/data/dataset/42ebfeae-a971-4d33-af3d-41401587cd49
    Explore at:
    Available download formats: .undefined
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Website Analytics & User Experience
    Description

    This dataset provides a collection of website URLs and their corresponding cleaned text content, which have been categorised into various topics. It is designed to facilitate website classification tasks, offering valuable insights for web analytics and user experience analysis. The data was created by extracting and cleaning text from different websites, then assigning categories based on this content.

    Columns

    • index: An identifier for each row in the dataset.
    • website_url: The URL link of the website.
    • cleaned_website_text: The cleaned text content extracted from the website URL.
    • Category: The assigned category of the URL.

    Distribution

    The dataset comprises 1408 rows of data. It is typically available in a CSV file format. The categories present in the dataset include 'Education' (8%), 'Business/Corporate' (8%), and 'Other' (84%), reflecting a diverse range of website types. There are 1375 unique website URLs and 1407 unique categories.
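The distribution figures above (row count, unique URLs, category shares) are the kind of summary that can be recomputed from the CSV itself. The following is a minimal sketch using only the Python standard library; it relies on the column names listed under Columns, while the sample rows and the `summarise` helper are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the dataset card.
SAMPLE_CSV = """index,website_url,cleaned_website_text,Category
0,https://example.com/a,online courses and tutorials,Education
1,https://example.org/b,company services and contact,Business/Corporate
2,https://example.net/c,latest travel guides,Travel
3,https://example.com/a,online courses and tutorials,Education
"""


def summarise(csv_file):
    """Return the row count, unique URL count, and share of rows per category."""
    rows = list(csv.DictReader(csv_file))
    counts = Counter(row["Category"] for row in rows)
    total = len(rows)
    return {
        "rows": total,
        "unique_urls": len({row["website_url"] for row in rows}),
        "shares": {name: count / total for name, count in counts.items()},
    }


summary = summarise(io.StringIO(SAMPLE_CSV))
print(summary["rows"], summary["unique_urls"])
for name, share in sorted(summary["shares"].items()):
    print(f"{name}: {share:.0%}")
```

To run this against the real data, pass an opened file (for example `open(path, newline="", encoding="utf-8")`) instead of the in-memory sample.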

    Usage

    This dataset is ideal for various applications, including:

    • Website classification: Training models to automatically assign categories to new websites.
    • Website analytics: Understanding the topical distribution of websites.
    • User experience studies: Analysing website content for improved user engagement.
    • Data visualisation: Creating visual representations of website categories.
    • Natural Language Processing (NLP) tasks: Developing and testing NLP models for text extraction and categorisation.
    • Multiclass classification problems: Serving as a foundation for building complex classification algorithms.

    Coverage

    The dataset offers global coverage, encompassing websites from various regions.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Beginner data scientists and analysts looking to practice classification, NLP, and data visualisation.
    • Machine learning engineers developing and testing multiclass classification models.
    • Researchers interested in web content analysis and automatic categorisation.
    • Developers building applications that require website categorisation capabilities.

    Dataset Name Suggestions

    • Website Categorisation Dataset
    • Web Content Classification
    • URL Classification Data
    • Cleaned Website Text Categories
    • Web Page Classification Repository

    Attributes

    Original Data Source: Website Classification

  19. Data Science Career Opportunities (USA)

    • opendatabay.com
    .undefined
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Data Science Career Opportunities (USA) [Dataset]. https://www.opendatabay.com/data/ai-ml/6d1c5965-8fb2-4749-a8bd-f1c40861b401
    Explore at:
    Available download formats: .undefined
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States, Data Science and Analytics
    Description

    This dataset provides valuable insights into the US data science job market, containing detailed job listings scraped from the Indeed web portal on 20th November 2022. It is ideal for those seeking to understand job trends, analyse salary expectations, or develop skills in data analysis, machine learning, and natural language processing. The dataset's purpose is to offer a snapshot of available positions across various data science roles, including data scientists, machine learning engineers, and business analysts. It serves as a rich resource for exploratory data analysis, feature engineering, and predictive modelling tasks.

    Columns

    • Title: The job title of the listed position.
    • Company: The hiring company posting the job.
    • Location: The geographic location of the job within the US.
    • Rating: The rating associated with the job or company.
    • Date: Indicates how long the job had been posted prior to 20th November 2022.
    • Salary: The salary information provided in US Dollars ($). Please note that many entries in this column may be missing as salary details are often not disclosed in job listings.
    • Description: A brief summary description of the job.
    • Links: The direct link to the original job posting on the Indeed platform.
    • Descriptions: The full-length description of the job, encompassing all details found in the complete job posting.

    Distribution

    This dataset is provided as a single data file, typically in CSV format. It comprises 1200 rows (records) and 9 distinct columns. The file name is data_science_jobs_indeed_us.csv.

    Usage

    This dataset is perfectly suited for a variety of analytical tasks and applications:

    • Data Cleaning and Preparation: Practise handling missing values, especially in the 'Salary' column.
    • Exploratory Data Analysis (EDA): Discover trends in job titles, company types, and locations.
    • Feature Engineering: Extract new features from the 'Descriptions' column, such as required skills, education levels, or experience.
    • Classification and Clustering: Develop models for salary prediction, or perform skill clustering analysis to guide curriculum development.
    • Text Processing and Natural Language Processing (NLP): Analyse job descriptions to identify common skill demands or industry buzzwords.

    Coverage

    The dataset's geographic scope is limited to job postings within the United States. All data was collected on 20th November 2022, with the 'Date' column providing information on how long each job had been active before this date. The dataset covers a wide range of data science positions, including roles such as data scientist, machine learning engineer, data engineer, business analyst, and data science manager. It is important to note the presence of many missing entries in the 'Salary' column, reflecting common data availability challenges in job listings.
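Because many 'Salary' entries are missing, a first cleaning step is usually to measure how many rows lack a value. The sketch below shows one way to do that with the standard library. It uses the nine column names listed above, but the sample rows and the `salary_missing_rate` helper are hypothetical, and treating blank or whitespace-only cells as missing is an assumption about how the file encodes absent salaries.

```python
import csv
import io

# Hypothetical sample rows; only the header matches the dataset card.
SAMPLE_CSV = """Title,Company,Location,Rating,Date,Salary,Description,Links,Descriptions
Data Scientist,Acme Corp,"New York, NY",4.1,Posted 3 days ago,$120000 a year,Short summary,https://example.com/job/1,Full text one
ML Engineer,Example Inc,Remote,3.8,Posted 10 days ago,,Short summary,https://example.com/job/2,Full text two
Business Analyst,Sample LLC,"Austin, TX",,Posted 1 day ago,  ,Short summary,https://example.com/job/3,Full text three
"""


def salary_missing_rate(csv_file):
    """Return (missing_count, total_rows, missing_fraction) for the Salary column."""
    total = 0
    missing = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # Assumption: an empty or whitespace-only cell means no salary was disclosed.
        if not (row.get("Salary") or "").strip():
            missing += 1
    fraction = missing / total if total else 0.0
    return missing, total, fraction


missing, total, fraction = salary_missing_rate(io.StringIO(SAMPLE_CSV))
print(f"{missing} of {total} rows have no salary ({fraction:.0%})")
```

For the real file, the same function could be called on `open("data_science_jobs_indeed_us.csv", newline="", encoding="utf-8")`, assuming the file is in the working directory.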

    License

    CC0

    Who Can Use It

    This dataset is an excellent resource for:

    • Aspiring Data Scientists and Machine Learning Engineers: To sharpen their data cleaning, EDA, and model deployment skills.
    • Educators and Curriculum Developers: To inform and guide the development of relevant data science and analytics courses based on real-world job market demands.
    • Job Seekers: To understand the current landscape of data science roles, required skills, and potential salary ranges.
    • Researchers and Analysts: To glean insights into labour market trends in the data science domain.
    • Human Resources Professionals: To benchmark job roles, skill requirements, and compensation within the industry.

    Dataset Name Suggestions

    • Indeed US Data Science Job Insights
    • US Data Science Job Market Analysis
    • Data Professional Job Postings (Indeed USA)
    • Data Science Career Opportunities (USA)

    Attributes

    Original Data Source: Data Science Job Postings (Indeed USA)

  20. Data Science and Machine Learning Service Market By Key Players (ZS, Amazon...

    • marketresearchstore.com
    pdf
    Updated Jun 22, 2025
    Cite
    Market Research Store (2025). Data Science and Machine Learning Service Market By Key Players (ZS, Amazon Web Services, Bigml, Hewlett-Packard Enterprise Development); Global Report by Size, Share, Industry Analysis, Growth Trends, Regional Outlook, and Forecast 2024-2032 [Dataset]. https://www.marketresearchstore.com/market-insights/data-science-and-machine-learning-service-market-797270
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    Market Research Store
    License

    https://www.marketresearchstore.com/privacy-statement

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    [Keywords] Market include Microsoft, DataScience.com, Bigml, ZS, International Business Machine

Cite
Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell (2022). Catalog of Ocean Data Science Initiatives [Dataset]. http://doi.org/10.5064/F6ZQWQJS

Catalog of Ocean Data Science Initiatives

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: xlsx(344302), pdf(81722), txt(4514), pdf(222143)
Dataset updated
May 26, 2022
Dataset provided by
Qualitative Data Repository
Authors
Lauren Alexandra Drakopulos; Elizabeth Havice; Katie Crisp; Ana Zurita Posas; Lisa M. Campbell
License

https://qdr.syr.edu/policies/qdr-standard-access-conditions

Dataset funded by
National Science Foundation, Human-Environment and Geographical Sciences Program
Description

Project Overview

This dataset is a catalog of oceans data science initiatives (ODSIs). We define an ODSI as an initiative that mobilizes (often geospatial and temporal) big data and/or novel data sources about the oceans with an express goal of informing or improving conditions in the oceans. ODSI identification began in Jan 2020. Additional ODSIs will continue to be added. We identified more than 150 ODSIs and populated the catalog with data gathered from ODSI websites describing key features of their work including 1) the data infrastructure 2) their organizational structure, 3) the ocean worlds, or ontologies, they create, and 4) the (explicit or implicit) policy and governance ‘solutions’ and relations they promote. The ODSIs in the catalog are global and regional in scope and aim to enhance understanding around three topical concerns: fisheries extraction, biodiversity conservation, and enhancing basic scientific knowledge.

Data overview

For 100 ODSIs, we created metadata about the data architecture, organizational governance, and world-making practices such as their stated purpose, theory of change, and problem/solution framing. For a subset of 30 ODSIs, we created metadata about their policy and governance stances and practices. All metadata was created based on a textual analysis of their websites and public communications.

Data collection overview

Sampling strategy: We began with a purposive sample of ODSIs based on the research team’s prior knowledge of and participation in global and regional ODSIs. This sample allowed us to pilot and refine our metadata catalog approach. We then used a combination of keyword searches on Google using search terms such as ‘ocean data’ ‘marine data’ and ‘fisheries data’. Adopting a snowball sampling method, we reviewed the websites of ODSIs that came up in our initial search to find references to additional ODSIs.
To determine if an entity was an ODSI, we reviewed web pages for information on purpose, goals, objectives, mission, and values (usually in tabs labeled ‘About’, ‘Goals’, or ‘Objectives’) and we looked for links to ‘data’ or ‘data products.’ Entities were selected for our catalog based on two criteria: 1) their stated purpose, goals, objectives, mission, and values indicated a commitment to advancing ocean science and data and 2) they focused on regional or global scales. We selected and categorized ODSIs according to three broad focal areas in global and regional oceans governance: fisheries extraction, biodiversity conservation, and basic ocean science development.

Shared data organization

This catalog is comprised of three files. 'Havice_ODSIC.pdf' provides a list of each ODSI included in the catalog, and a permalink to the webpage used to populate catalog metadata categories. 'Havice_ODSIC-CodingScheme.pdf' provides a list of code descriptions for the catalog metadata. 'Havice_ODSIC-Metadata.xlsx' is the full catalog with populated metadata.
