100+ datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

  2. PISA Data Analysis Manual: SAS, Second Edition

    • catalog.data.gov
    Updated Mar 30, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Department of State (2021). PISA Data Analysis Manual: SAS, Second Edition [Dataset]. https://catalog.data.gov/dataset/pisa-data-analysis-manual-sas-second-edition
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of Statehttp://state.gov/
    Description

    The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performance in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database. It also includes worked examples providing full syntax in SAS

  3. H

    Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  4. cases study1 example for google data analytics

    • kaggle.com
    zip
    Updated Apr 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mohammed hatem (2023). cases study1 example for google data analytics [Dataset]. https://www.kaggle.com/datasets/mohammedhatem/cases-study1-example-for-google-data-analytics
    Explore at:
    zip(25278847 bytes)Available download formats
    Dataset updated
    Apr 22, 2023
    Authors
    mohammed hatem
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    In the way of my journey to earn the google data analytics certificate I will practice real world example by following the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Picking the Bellabeat example.

  5. Best Database Types for Data Analytics by Industry

    • blog.devart.com
    html
    Updated Mar 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Devart (2025). Best Database Types for Data Analytics by Industry [Dataset]. https://blog.devart.com/best-database-for-data-analytics.html
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Devart
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Industry, Database Type, Common Databases
    Description

    A guide to choosing the most suitable database types for data analytics across different industries, including examples of common databases.

  6. Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America...

    • technavio.com
    pdf
    Updated Jan 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Data Analytics Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, and UK), Middle East and Africa (UAE), APAC (China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/data-analytics-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jan 11, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description

    Snapshot img

    Data Analytics Market Size 2025-2029

    The data analytics market size is forecast to increase by USD 288.7 billion, at a CAGR of 14.7% between 2024 and 2029.

    The market is driven by the extensive use of modern technology in company operations, enabling businesses to extract valuable insights from their data. The prevalence of the Internet and the increased use of linked and integrated technologies have facilitated the collection and analysis of vast amounts of data from various sources. This trend is expected to continue as companies seek to gain a competitive edge by making data-driven decisions. However, the integration of data from different sources poses significant challenges. Ensuring data accuracy, consistency, and security is crucial as companies deal with large volumes of data from various internal and external sources. Additionally, the complexity of data analytics tools and the need for specialized skills can hinder adoption, particularly for smaller organizations with limited resources. Companies must address these challenges by investing in robust data management systems, implementing rigorous data validation processes, and providing training and development opportunities for their employees. By doing so, they can effectively harness the power of data analytics to drive growth and improve operational efficiency.

    What will be the Size of the Data Analytics Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    Request Free SampleIn the dynamic and ever-evolving the market, entities such as explainable AI, time series analysis, data integration, data lakes, algorithm selection, feature engineering, marketing analytics, computer vision, data visualization, financial modeling, real-time analytics, data mining tools, and KPI dashboards continue to unfold and intertwine, shaping the industry's landscape. The application of these technologies spans various sectors, from risk management and fraud detection to conversion rate optimization and social media analytics. ETL processes, data warehousing, statistical software, data wrangling, and data storytelling are integral components of the data analytics ecosystem, enabling organizations to extract insights from their data. Cloud computing, deep learning, and data visualization tools further enhance the capabilities of data analytics platforms, allowing for advanced data-driven decision making and real-time analysis. Marketing analytics, clustering algorithms, and customer segmentation are essential for businesses seeking to optimize their marketing strategies and gain a competitive edge. Regression analysis, data visualization tools, and machine learning algorithms are instrumental in uncovering hidden patterns and trends, while predictive modeling and causal inference help organizations anticipate future outcomes and make informed decisions. Data governance, data quality, and bias detection are crucial aspects of the data analytics process, ensuring the accuracy, security, and ethical use of data. Supply chain analytics, healthcare analytics, and financial modeling are just a few examples of the diverse applications of data analytics, demonstrating the industry's far-reaching impact. Data pipelines, data mining, and model monitoring are essential for maintaining the continuous flow of data and ensuring the accuracy and reliability of analytics models. The integration of various data analytics tools and techniques continues to evolve, as the industry adapts to the ever-changing needs of businesses and consumers alike.

    How is this Data Analytics Industry segmented?

    The data analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments. ComponentServicesSoftwareHardwareDeploymentCloudOn-premisesTypePrescriptive AnalyticsPredictive AnalyticsCustomer AnalyticsDescriptive AnalyticsOthersApplicationSupply Chain ManagementEnterprise Resource PlanningDatabase ManagementHuman Resource ManagementOthersGeographyNorth AmericaUSCanadaEuropeFranceGermanyUKMiddle East and AfricaUAEAPACChinaIndiaJapanSouth KoreaSouth AmericaBrazilRest of World (ROW)

    By Component Insights

    The services segment is estimated to witness significant growth during the forecast period.The market is experiencing significant growth as businesses increasingly rely on advanced technologies to gain insights from their data. Natural language processing is a key component of this trend, enabling more sophisticated analysis of unstructured data. Fraud detection and data security solutions are also in high demand, as companies seek to protect against threats and maintain customer trust. Data analytics platforms, including cloud-based offerings, are driving innovatio

  7. Data from: Teaching and Learning Data Visualization: Ideas and Assignments

    • tandf.figshare.com
    • figshare.com
    txt
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deborah Nolan; Jamis Perrett (2023). Teaching and Learning Data Visualization: Ideas and Assignments [Dataset]. http://doi.org/10.6084/m9.figshare.1627940.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francishttps://taylorandfrancis.com/
    Authors
    Deborah Nolan; Jamis Perrett
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article discusses how to make statistical graphics a more prominent element of the undergraduate statistics curricula. The focus is on several different types of assignments that exemplify how to incorporate graphics into a course in a pedagogically meaningful way. These assignments include having students deconstruct and reconstruct plots, copy masterful graphs, create one-minute visual revelations, convert tables into “pictures,” and develop interactive visualizations, for example, with the virtual earth as a plotting canvas. In addition to describing the goals and details of each assignment, we also discuss the broader topic of graphics and key concepts that we think warrant inclusion in the statistics curricula. We advocate that more attention needs to be paid to this fundamental field of statistics at all levels, from introductory undergraduate through graduate level courses. With the rapid rise of tools to visualize data, for example, Google trends, GapMinder, ManyEyes, and Tableau, and the increased use of graphics in the media, understanding the principles of good statistical graphics, and having the ability to create informative visualizations is an ever more important aspect of statistics education. Supplementary materials containing code and data for the assignments are available online.

  8. Survival Analysis - Example Data

    • kaggle.com
    zip
    Updated Apr 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lambros Georgiades (2023). Survival Analysis - Example Data [Dataset]. https://www.kaggle.com/datasets/lambrosgeorgiades/sa-data
    Explore at:
    zip(565 bytes)Available download formats
    Dataset updated
    Apr 27, 2023
    Authors
    Lambros Georgiades
    Description

    This is a made-up dataset in the context of a test group vs placebo group study that is used in a report introducing the Kaplan-Meier estimation and the Cox proportional hazards model.

  9. Big data and business analytics revenue worldwide 2015-2022

    • statista.com
    Updated Aug 17, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2021). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
    Explore at:
    Dataset updated
    Aug 17, 2021
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    The global big data and business analytics (BDA) market was valued at ***** billion U.S. dollars in 2018 and is forecast to grow to ***** billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around ** billion U.S. dollars, and business services will account for the remainder. Big data High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate **** ZBs of data in 2025. Business analytics Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around **** billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud.

  10. Understanding and Managing Missing Data.pdf

    • figshare.com
    pdf
    Updated Jun 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ibrahim Denis Fofanah (2025). Understanding and Managing Missing Data.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.29265155.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Ibrahim Denis Fofanah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document provides a clear and practical guide to understanding missing data mechanisms, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Through real-world scenarios and examples, it explains how different types of missingness impact data analysis and decision-making. It also outlines common strategies for handling missing data, including deletion techniques and imputation methods such as mean imputation, regression, and stochastic modeling.Designed for researchers, analysts, and students working with real-world datasets, this guide helps ensure statistical validity, reduce bias, and improve the overall quality of analysis in fields like public health, behavioral science, social research, and machine learning.

  11. Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Feb 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description

    Snapshot img

    Data Science Platform Market Size 2025-2029

    The data science platform market size is valued to increase USD 763.9 million, at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the data science platform market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 48% growth during the forecast period.
    By Deployment - On-premises segment was valued at USD 38.70 million in 2023
    By Component - Platform segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 1.00 million
    Market Future Opportunities: USD 763.90 million
    CAGR : 40.2%
    North America: Largest market in 2023
    

    Market Summary

    The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
    According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
    

    What will be the Size of the Data Science Platform Market during the forecast period?

    Get Key Insights on Market Forecast (PDF) Request Free Sample

    How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?

    The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      On-premises
      Cloud
    
    
    Component
    
      Platform
      Services
    
    
    End-user
    
      BFSI
      Retail and e-commerce
      Manufacturing
      Media and entertainment
      Others
    
    
    Sector
    
      Large enterprises
      SMEs
    
    
    Application
    
      Data Preparation
      Data Visualization
      Machine Learning
      Predictive Analytics
      Data Governance
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        UK
    
    
      Middle East and Africa
    
        UAE
    
    
      APAC
    
        China
        India
        Japan
    
    
      South America
    
        Brazil
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.

    In the dynamic and evolving the market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.

    Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.

    API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.

    Request Free Sample

    The On-premises segment was valued at USD 38.70 million in 2019 and showed

  12. A data analysis framework for biomedical big data: Application on mesoderm...

    • plos.figshare.com
    txt
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Benjamin Ulfenborg; Alexander Karlsson; Maria Riveiro; Caroline Améen; Karolina Åkesson; Christian X. Andersson; Peter Sartipy; Jane Synnergren (2023). A data analysis framework for biomedical big data: Application on mesoderm differentiation of human pluripotent stem cells [Dataset]. http://doi.org/10.1371/journal.pone.0179613
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Benjamin Ulfenborg; Alexander Karlsson; Maria Riveiro; Caroline Améen; Karolina Åkesson; Christian X. Andersson; Peter Sartipy; Jane Synnergren
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The development of high-throughput biomolecular technologies has resulted in generation of vast omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, where the main challenges relate to the analysis and interpretation of data into new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics, and apply it for analyzing transcriptomics time series data from early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled at eleven consecutive days. The gene expression data was analyzed using the five-stage analysis framework proposed in this study, including data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response were strongly related to embryonic- and mesendoderm development, for example CER1 and NODAL. Pluripotency genes, such as NANOG and SOX2, exhibited substantial downregulation shortly after onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction were observed around day five and six. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators of the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. The proposed analysis framework can be used to structure data analysis in future research, both in stem cell differentiation, and more generally, in biomedical big data analytics.

  13. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Isle of Man, Moldova (Republic of), Andorra, British Indian Ocean Territory, Canada, Nepal, Taiwan, Tunisia, Northern Mariana Islands, Bangladesh
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  14. Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN

    • ckan.americaview.org
    Updated Sep 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ckan.americaview.org (2022). Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN [Dataset]. https://ckan.americaview.org/dataset/open-source-spatial-analytics-r
    Explore at:
    Dataset updated
    Sep 10, 2022
    Dataset provided by
    CKANhttps://ckan.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up. This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course. Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material. After completing this course you will be able to: prepare, manipulate, query, and generally work with data in R. perform data summarization, comparisons, and statistical tests. create quality graphs, map layouts, and interactive web maps to visualize data and findings. present your research, methods, results, and code as web pages to foster reproducible research. work with spatial data in R. analyze vector and raster geospatial data to answer a question with a spatial component. make spatial models and predictions using regression and machine learning. code in the R language at an intermediate level.

  15. PISA 2003 Data Analysis Manual SPSS

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Mar 30, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Department of State (2021). PISA 2003 Data Analysis Manual SPSS [Dataset]. https://catalog.data.gov/dataset/pisa-2003-data-analysis-manual-spss
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of Statehttp://state.gov/
    Description

    This publication provides all the information required to understand the PISA 2003 educational performance database and perform analyses in accordance with the complex methodologies used to collect and process the data. It enables researchers to both reproduce the initial results and to undertake further analyses. The publication includes introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values; worked examples providing full syntax in SPSS®; and a comprehensive description of the OECD PISA 2003 international database. The PISA 2003 database includes micro-level data on student educational performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. A similar manual is available for SAS users.

  16. H

    Current Population Survey (CPS)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 30, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anthony Damico (2013). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics ( bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups b y state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be t reated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts: 2005-2012 asec - download all microdata.R down load the fixed-width file containing household, family, and person records import by separating this file into three tables, then merge 'em together at the person-level download the fixed-width file containing the person-level replicate weights merge the rectangular person-level file with the replicate weights, then store it in a sql database create a new variable - one - in the data table 2012 asec - analysis examples.R connect to the sql database created by the 'download all microdata' progr am create the complex sample survey object, using the replicate weights perform a boatload of analysis examples replicate census estimates - 2011.R connect to the sql database created by the 'download all microdata' program create the complex sample survey object, using the replicate weights match the sas output shown in the png file below 2011 asec replicate weight sas output.png statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page the bureau of labor statistics' current population survey page the current population survey's wikipedia article notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current populat ion survey to talk about america, subract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  17. Google Data Analytics Capstone Project

    • kaggle.com
    zip
    Updated Nov 13, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NANCY CHAUHAN (2021). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/nancychauhan199/google-case-study-pdf
    Explore at:
    zip(284279 bytes)Available download formats
    Dataset updated
    Nov 13, 2021
    Authors
    NANCY CHAUHAN
    Description

    Case Study: How Does a Bike-Share Navigate Speedy Success?¶

    Introduction

    Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.

    Scenario

    You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations. Characters and teams ● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day. ● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels. ● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them. ● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

    About the company

    In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends

    Three questions will guide the future marketing program:

    How do annual members and casual riders use Cyclistic bikes differently? Why would casual riders buy Cyclistic annual memberships? How can Cyclistic use digital media to influence casual riders to become members? Moreno has assigned you the first question to answer: How do annual members and casual rid...

  18. m

    Raw data outputs 1-18

    • bridges.monash.edu
    • researchdata.edu.au
    xlsx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

  19. t

    Dropbox S-1 Analysis - The King of Freemium - Data Analysis

    • tomtunguz.com
    Updated Feb 24, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tomasz Tunguz (2018). Dropbox S-1 Analysis - The King of Freemium - Data Analysis [Dataset]. https://tomtunguz.com/dropbox-s-1/
    Explore at:
    Dataset updated
    Feb 24, 2018
    Dataset provided by
    Theory Ventures
    Authors
    Tomasz Tunguz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyze Dropbox's path to $1.1B revenue through freemium: 500M users, 2% conversion, 90% self-serve, and 35% CAGR. Key SaaS metrics unpacked.

  20. n

    Matlab example for Local Enrichment Analysis (LEA) analysis with real data

    • data-staging.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Aug 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Berend Snijder; Yannik Severin (2022). Matlab example for Local Enrichment Analysis (LEA) analysis with real data [Dataset]. http://doi.org/10.5061/dryad.2jm63xssk
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 29, 2022
    Dataset provided by
    ETH Zurich
    Authors
    Berend Snijder; Yannik Severin
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Phenotypic plasticity is essential to the immune system, yet the factors that shape it are not fully understood. Here, we comprehensively analyze immune cell phenotypes including morphology across human cohorts by single-round multiplexed immunofluorescence, automated microscopy, and deep learning. Using the uncertainty of convolutional neural networks to cluster the phenotypes of 8 distinct immune cell subsets, we find that the resulting maps are influenced by donor age, gender, and blood pressure, revealing distinct polarization and activation-associated phenotypes across immune cell classes. We further associate T-cell morphology to transcriptional state based on their joint donor variability, and validate an inflammation-associated polarized T-cell morphology, and an age-associated loss of mitochondria in CD4+ T-cells. Taken together, we show that immune cell phenotypes reflect both molecular and personal health information, opening new perspectives into the deep immune phenotyping of individual people in health and disease. Methods This dataset accompanies the manuscript "Multiplexed high-throughput immune cell imaging reveals molecular health-associated phenotypes" by Yannik Severin et al., Science Advances, 2022. It includes: - knnlea.m: Matlab function for the presented Local Enrichment Analysis method - LEA_Example_Data.mat containing data from the manuscript to reproduce a LEA analysis - LEA_Example_Script.mat that runs through the analysis steps - README.txt

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Organization logo

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

Explore at:
txtAvailable download formats
Dataset updated
Dec 4, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kingsley Okoye; Samira Hosseini
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

Search
Clear search
Close search
Google apps
Main menu