100+ datasets found

Amazon data science challenge - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Amazon data science challenge - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/amazon-data-science-challenge
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Amazon data science challenge.
Data from: Data Science Problems
github.com
opendatalab.com
Updated Feb 8, 2022
Cite
(2022). Data Science Problems [Dataset]. https://github.com/microsoft/DataScienceProblems
Dataset updated
Feb 8, 2022
License
https://github.com/microsoft/DataScienceProblems/blob/main/LICENSE.txt
Description
Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of notebooks in this benchmark also include data dependencies, so this benchmark not only can test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant (https://arxiv.org/abs/2201.12901) for more details about state of the art results and other properties of the dataset.
Amazon data science challenge
catalog.data.gov
data.wu.ac.at
Updated Apr 11, 2025
Cite
Dashlink (2025). Amazon data science challenge [Dataset]. https://catalog.data.gov/dataset/amazon-data-science-challenge
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Amazon data science challenge.
Data Science Challenge by Coursera
kaggle.com
zip
Updated Feb 27, 2025
Cite
Bisma Ridho Pambudi (2025). Data Science Challenge by Coursera [Dataset]. https://www.kaggle.com/datasets/bismaridho/data-science-challenge-by-coursera
Available download formats: zip (25124511 bytes)
Dataset updated
Feb 27, 2025
Authors
Bisma Ridho Pambudi
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Bisma Ridho Pambudi

Released under CC0: Public Domain

Contents
2018 Data Science Bowl challenge dataset - Dataset - LDM
service.tib.eu
Updated Dec 3, 2024
Cite
(2024). 2018 Data Science Bowl challenge dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/2018-data-science-bowl-challenge-dataset
Dataset updated
Dec 3, 2024
Description
The 2018 Data Science Bowl challenge dataset is used for nuclei cell image segmentation.
Gemma-Data Science Agent- Instruct- Dataset
kaggle.com
zip
Updated Apr 2, 2024
Cite
ian cecil akoto (2024). Gemma-Data Science Agent- Instruct- Dataset [Dataset]. https://www.kaggle.com/datasets/ianakoto/gemma-data-science-agent-instruct-dataset
Available download formats: zip (9680013 bytes)
Dataset updated
Apr 2, 2024
Authors
ian cecil akoto
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Overview

This dataset contains question-answer pairs with context extracted from Kaggle solution write-ups and discussion forums. The dataset was created to facilitate fine-tuning Gemma, an AI model, for data scientist assistant tasks such as question answering and providing data science assistance.

Dataset Details

Columns:
* Question: The question generated based on the context extracted from Kaggle solution write-ups and discussion forums.
* Answer: The corresponding answer to the generated question.
* Context: The context extracted from Kaggle solution write-ups and discussion forums, which serves as the basis for generating questions and answers.
* Subtitle: Subtitle or additional information related to the Kaggle competition or topic.
* Title: Title of the Kaggle competition or topic.

Sources and Inspiration

Sources:

* Meta Kaggle: The dataset was sourced from Meta Kaggle, an official Kaggle platform where users discuss competitions, kernels, datasets, and more.
* Kaggle Solution Write-ups: Solution write-ups submitted by Kaggle users were utilized as a primary source of context for generating questions and answers.
* Discussion Forums: Discussion threads on Kaggle forums were used to gather additional insights and context for the dataset.

Inspiration:

The dataset was inspired by the need for a specialized dataset tailored for fine-tuning Gemma, an AI model designed for data scientist assistant tasks. The goal was to create a dataset that captures the essence of real-world data science problems discussed on Kaggle, enabling Gemma to provide accurate and relevant assistance to data scientists and Kaggle users.

Dataset Specifics

* Total Records: [Specify the total number of question-answer pairs in the dataset]
* Format: CSV (Comma Separated Values)
* Size: [Specify the size of the dataset in MB or GB]
* License: [Specify the license under which the dataset is distributed, e.g., CC BY-SA 4.0]
* Download Link: [Provide a link to download the dataset]

Acknowledgments

We acknowledge Kaggle and its community for providing valuable data science resources and discussions that contributed to the creation of this dataset. We appreciate the efforts of Gemma and Langchain in fine-tuning AI models for data scientist assistant tasks, enabling enhanced productivity and efficiency in the field of data science.
Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...
technavio.com
pdf
Updated Feb 8, 2025
Cite
Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
Available download formats: pdf
Dataset updated
Feb 8, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Area covered
United States
Description
Data Science Platform Market Size 2025-2029

The data science platform market size is projected to increase by USD 763.9 million, at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the data science platform market.

Major Market Trends & Insights

* North America dominated the market and accounted for a 48% growth during the forecast period.
* By Deployment - On-premises segment was valued at USD 38.70 million in 2023
* By Component - Platform segment accounted for the largest market revenue share in 2023

Market Size & Forecast

* Market Opportunities: USD 1.00 million
* Market Future Opportunities: USD 763.90 million
* CAGR: 40.2%
* North America: Largest market in 2023

Market Summary

The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations. According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.

What will be the Size of the Data Science Platform Market during the forecast period?

How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?

The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

* Deployment: On-premises, Cloud
* Component: Platform, Services
* End-user: BFSI, Retail and e-commerce, Manufacturing, Media and entertainment, Others
* Sector: Large enterprises, SMEs
* Application: Data Preparation, Data Visualization, Machine Learning, Predictive Analytics, Data Governance, Others
* Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan), South America (Brazil), Rest of World (ROW)

By Deployment Insights

The on-premises segment is estimated to witness significant growth during the forecast period.

In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.

Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.

API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.

The On-premises segment was valued at USD 38.70 million in 2019 and showed
Data from: A large-scale comparative analysis of Coding Standard conformance...
figshare.com
application/x-gzip
Updated Oct 4, 2021
Cite
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.12377237.v3
Dataset updated
Oct 4, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

* results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
* notebooks_out.tar.gz: Tables and figures generated by notebooks.
* source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.

The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories
Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680
Preprint: https://arxiv.org/abs/2007.08978
International Journal of Data Science and Analytics Abstract & Indexing -...
researchhelpdesk.org
Updated Jan 16, 2024
+ more versions
Cite
Research Help Desk (2024). International Journal of Data Science and Analytics Abstract & Indexing - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/abstract-and-indexing/418/international-journal-of-data-science-and-analytics
Dataset updated
Jan 16, 2024
Dataset authored and provided by
Research Help Desk
Description
International Journal of Data Science and Analytics Abstract & Indexing - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.
2025 Kaggle Machine Learning & Data Science Survey
kaggle.com
Updated Jan 28, 2025
Cite
Hina Ismail (2025). 2025 Kaggle Machine Learning & Data Science Survey [Dataset]. https://www.kaggle.com/datasets/sonialikhan/2025-kaggle-machine-learning-and-data-science-survey
Available formats: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 28, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hina Ismail
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview

Welcome to Kaggle's second annual Machine Learning and Data Science Survey ― and our first-ever survey data challenge.

This year, as last year, we set out to conduct an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. The survey was live for one week in October, and after cleaning the data we finished with 23,859 responses, a 49% increase over last year!

There's a lot to explore here. The results include raw numbers about who is working with data, what’s happening with machine learning in different industries, and the best ways for new data scientists to break into the field. We've published the data in as raw a format as possible without compromising anonymization, which makes it an unusual example of a survey dataset.

Challenge

This year Kaggle is launching the first Data Science Survey Challenge, where we will be awarding a prize pool of $28,000 to kernel authors who tell a rich story about a subset of the data science and machine learning community.

In our second year running this survey, we were once again awed by the global, diverse, and dynamic nature of the data science and machine learning industry. This survey data EDA provides an overview of the industry on an aggregate scale, but it also leaves us wanting to know more about the many specific communities comprised within the survey. For that reason, we’re inviting the Kaggle community to dive deep into the survey datasets and help us tell the diverse stories of data scientists from around the world.

The challenge objective: tell a data story about a subset of the data science community represented in this survey, through a combination of both narrative text and data exploration. A “story” could be defined any number of ways, and that’s deliberate. The challenge is to deeply explore (through data) the impact, priorities, or concerns of a specific group of data science and machine learning practitioners. That group can be defined in the macro (for example: anyone who does most of their coding in Python) or the micro (for example: female data science students studying machine learning in masters programs). This is an opportunity to be creative and tell the story of a community you identify with or are passionate about!

Submissions will be evaluated on the following:

* Composition - Is there a clear narrative thread to the story that’s articulated and supported by data? The subject should be well defined, well researched, and well supported through the use of data and visualizations.
* Originality - Does the reader learn something new through this submission? Or is the reader challenged to think about something in a new way? A great entry will be informative, thought provoking, and fresh all at the same time.
* Documentation - Are your code, kernel, and additional data sources well documented so a reader can understand what you did? Are your sources clearly cited? A high quality analysis should be concise and clear at each step so the rationale is easy to follow and the process is reproducible.

To be valid, a submission must be contained in one kernel, made public on or before the submission deadline. Participants are free to use any datasets in addition to the Kaggle Data Science survey, but those datasets must also be publicly available on Kaggle by the deadline for a submission to be valid.

While the challenge is running, Kaggle will also give a Weekly Kernel Award of $1,500 to recognize excellent kernels that are public analyses of the survey. Weekly Kernel Awards will be announced every Friday between 11/9 and 11/30.

How to Participate

To make a submission, complete the submission form. Only one submission will be judged per participant, so if you make multiple submissions we will review the last (most recent) entry.

No submission is necessary for the Weekly Kernels Awards. To be eligible, a kernel must be public and use the 2018 Data Science Survey as a data source.

Timeline

All dates are 11:59PM UTC

Submission deadline: December 3rd

Winners announced: December 10th

Weekly Kernels Award prize winners announcements: November 9th, 16th, 23rd, and 30th

All kernels are evaluated after the deadline.

Rules

To be eligible to win a prize in either of the above prize tracks, you must be:

* a registered account holder at Kaggle.com;
* the older of 18 years old or the age of majority in your jurisdiction of residence; and
* not a resident of Crimea, Cuba, Iran, Syria, North Korea, or Sudan

Your kernels will only be eligible to win if they have been made public on kaggle.com by the above deadline. All prizes are awarded at the discretion of Kaggle. Kaggle reserves the right to cancel or modify prize criteria.

Unfortunately employees, interns, contractors, officers and directors of Kaggle Inc., and their parent companies, are not eligible to win any prizes.

Survey Methodology ...
Riga Data Science Club
kaggle.com
zip
Updated Mar 29, 2021
Cite
Dmitry Yemelyanov (2021). Riga Data Science Club [Dataset]. https://www.kaggle.com/datasets/dmitryyemelyanov/rigadsclub
Available download formats: zip (494849 bytes)
Dataset updated
Mar 29, 2021
Authors
Dmitry Yemelyanov
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Riga
Description
Context

Riga Data Science Club is a non-profit organisation to share ideas, experience and build machine learning projects together. A data science community should know its own data, so this is a dataset about ourselves: our website analytics, social media activity, slack statistics and even meetup transcriptions!

Content

Dataset is split up in several folders by the context:
* linkedin - company page visitor, follower and post stats
* slack - messaging and member activity
* typeform - new member responses
* website - website visitors by country, language, device, operating system, screen resolution
* youtube - meetup transcriptions

Inspiration

Let's make Riga Data Science Club better! We expect this data to bring lots of insights on how to improve.

"Know your c̶u̶s̶t̶o̶m̶e̶r̶ member"
* Explore member interests by analysing sign-up survey (typeform) responses
* Explore messaging patterns in Slack to understand how members are retained and when they are lost

Social media intelligence
* Define LinkedIn posting strategy based on historical engagement data
* Define target user profile based on LinkedIn page attendance data

Website
* Define website localisation strategy based on data about visitor countries and languages
* Define website responsive design strategy based on data about visitor devices, operating systems and screen resolutions

Have some fun
* NLP analysis of meetup transcriptions: word frequencies, question answering, something else?
Online Data Science Training Programs Market Analysis, Size, and Forecast...
technavio.com
pdf
Updated Feb 12, 2025
Cite
Technavio (2025). Online Data Science Training Programs Market Analysis, Size, and Forecast 2025-2029: North America (Mexico), Europe (France, Germany, Italy, and UK), Middle East and Africa (UAE), APAC (Australia, China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/online-data-science-training-programs-market-industry-analysis
Available download formats: pdf
Dataset updated
Feb 12, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Description
Online Data Science Training Programs Market Size 2025-2029

The online data science training programs market size is forecast to increase by USD 8.67 billion, at a CAGR of 35.8% between 2024 and 2029.

The market is experiencing significant growth due to the increasing demand for data science professionals in various industries. The job market offers lucrative opportunities for individuals with data science skills, making online training programs an attractive option for those seeking to upskill or reskill. Another key driver in the market is the adoption of microlearning and gamification techniques in data science training. These approaches make learning more engaging and accessible, allowing individuals to acquire new skills at their own pace. Furthermore, the availability of open-source learning materials has democratized access to data science education, enabling a larger pool of learners to enter the field. However, the market also faces challenges, including the need for continuous updates to keep up with the rapidly evolving data science landscape and the lack of standardization in online training programs, which can make it difficult for employers to assess the quality of graduates. Companies seeking to capitalize on market opportunities should focus on offering up-to-date, high-quality training programs that incorporate microlearning and gamification techniques, while also addressing the challenges of continuous updates and standardization. By doing so, they can differentiate themselves in a competitive market and meet the evolving needs of learners and employers alike.

What will be the Size of the Online Data Science Training Programs Market during the forecast period?

The online data science training market continues to evolve, driven by the increasing demand for data-driven insights and innovations across various sectors. Data science applications, from computer vision and deep learning to natural language processing and predictive analytics, are revolutionizing industries and transforming business operations. Industry case studies showcase the impact of data science in action, with big data and machine learning driving advancements in healthcare, finance, and retail. Virtual labs enable learners to gain hands-on experience, while data scientist salaries remain competitive and attractive. Cloud computing and data science platforms facilitate interactive learning and collaborative research, fostering a vibrant data science community. Data privacy and security concerns are addressed through advanced data governance and ethical frameworks. Data science libraries, such as TensorFlow and Scikit-Learn, streamline the development process, while data storytelling tools help communicate complex insights effectively. Data mining and predictive analytics enable organizations to uncover hidden trends and patterns, driving innovation and growth. The future of data science is bright, with ongoing research and development in areas like data ethics, data governance, and artificial intelligence. Data science conferences and education programs provide opportunities for professionals to expand their knowledge and expertise, ensuring they remain at the forefront of this dynamic field.

How is this Online Data Science Training Programs Industry segmented?

The online data science training programs industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

* Type: Professional degree courses, Certification courses
* Application: Students, Working professionals
* Language: R programming, Python, Big ML, SAS, Others
* Method: Live streaming, Recorded
* Program Type: Bootcamps, Certificates, Degree Programs
* Geography: North America (US, Mexico), Europe (France, Germany, Italy, UK), Middle East and Africa (UAE), APAC (Australia, China, India, Japan, South Korea), South America (Brazil), Rest of World (ROW)

By Type Insights

The professional degree courses segment is estimated to witness significant growth during the forecast period.

The market encompasses various segments catering to diverse learning needs. The professional degree course segment holds a significant position, offering comprehensive and in-depth training in data science. This segment's curriculum covers essential aspects such as statistical analysis, machine learning, data visualization, and data engineering. Delivered by industry professionals and academic experts, these courses ensure a high-quality education experience. Interactive learning environments, including live lectures, webinars, and group discussions, foster a collaborative and engaging experience. Data science applications, including deep learning, computer vision, and natural language processing, are integral to the market's growth. Data analysis, a crucial application, is gaining traction due to the increasing demand for data-driven decisio
Mo(Wa)²TER Data Science Workshop Material
dataverse.harvard.edu
Updated Sep 9, 2024
Cite
Amanda S. Hering; Kathryn B. Newhart; Derek Weix (2024). Mo(Wa)²TER Data Science Workshop Material [Dataset]. http://doi.org/10.7910/DVN/PKLIOC
Available formats: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/PKLIOC
Dataset updated
Sep 9, 2024
Dataset provided by
Harvard Dataverse
Authors
Amanda S. Hering; Kathryn B. Newhart; Derek Weix
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
Dataset funded by
National Science Foundation
Description
These are the materials developed for the Mo(Wa)²TER Data Science workshop, which is designed for upper level and graduate students in environmental engineering or industry professionals in the water and wastewater treatment (W/WWT) fields. Working through this material will improve a learner’s data analysis and programming skills with the free R language and will focus exclusively on problems arising in W/WWT. Training in basic R coding, data cleaning, visualization, data analysis, statistical modeling, and machine learning are provided. Real W/WWT examples and exercises are given with each topic to strengthen and deepen comprehension. These materials aim to equip students with the skills to handle data science challenges in their future careers. Materials were developed over three offerings of this workshop in 2021, 2022, and 2023. At the time of publication, all code runs, but we provide no guarantees on future versions of R or packages used in this workshop.
International Journal of Data Science and Analytics Acceptance Rate -...
researchhelpdesk.org
Updated Apr 30, 2022
Cite
Research Help Desk (2022). International Journal of Data Science and Analytics Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/418/international-journal-of-data-science-and-analytics
Explore at:
Dataset updated
Apr 30, 2022
Dataset authored and provided by
Research Help Desk
Description
International Journal of Data Science and Analytics Acceptance Rate - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.
International Journal of Data Science and Analytics Impact Factor 2024-2025...
researchhelpdesk.org
Updated Feb 23, 2022
Cite
Research Help Desk (2022). International Journal of Data Science and Analytics Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/418/international-journal-of-data-science-and-analytics
Explore at:
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Research Help Desk
Description
International Journal of Data Science and Analytics Impact Factor 2024-2025 - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.
DataScience-Instruct-500K
huggingface.co
Updated Oct 21, 2025
Cite
RUC-DataLab (2025). DataScience-Instruct-500K [Dataset]. https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K
Explore at:
Dataset updated
Oct 21, 2025
Dataset authored and provided by
RUC-DataLab
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Authors: Shaolei Zhang, Ju Fan*, Meihao Fan, Guoliang Li, Xiaoyong Du

DeepAnalyze is the first agentic LLM for autonomous data science. It can autonomously complete a wide range of data-centric tasks without human intervention, supporting: 🛠 Entire data science pipeline: Automatically perform any data science tasks such as data preparation, analysis, modeling, visualization, and report generation. 🔍… See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K.
Data Science Collaboration Platform Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 11, 2025
Cite
Archive Market Research (2025). Data Science Collaboration Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/data-science-collaboration-platform-18259
Explore at:
Available download formats: ppt, pdf, doc
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global data science collaboration platform market is projected to grow from USD 13,860 million in 2025 to USD XX million by 2033, at a CAGR of XX% during the forecast period. The increasing demand for data science collaboration platforms is primarily driven by the growing adoption of artificial intelligence (AI) and machine learning (ML) technologies, which require effective collaboration among data scientists, data engineers, and business stakeholders. The cloud-based segment is expected to account for the largest share of the market due to its flexibility, scalability, and cost-effectiveness. Large enterprises are expected to remain the primary end-users of data science collaboration platforms due to their complex data science workflows and the need for efficient collaboration across teams. The market is highly competitive, with key players including Databricks, Google, Microsoft, Kaggle, DataRobot, IBM, and Alteryx. The market is expected to witness significant growth in the Asia Pacific region due to the increasing adoption of data science technologies and the presence of a large population of data scientists. The Middle East & Africa region is also expected to experience significant growth due to government initiatives to promote digital transformation. However, the lack of skilled data scientists and the high cost of implementation may pose challenges to the growth of the market. Overall, the data science collaboration platform market is expected to continue growing steadily over the forecast period, driven by the increasing demand for data science technologies and the need for effective collaboration among data scientists.
Data Science Services Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 9, 2025
Cite
Data Insights Market (2025). Data Science Services Report [Dataset]. https://www.datainsightsmarket.com/reports/data-science-services-1960009
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Jan 9, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global data science services market is projected to experience significant growth, reaching a value of 73060 million by 2033, expanding at a CAGR of 18.2% from 2025 to 2033. The surge in data generation, the increasing adoption of artificial intelligence (AI) and machine learning (ML), and the growing need for data-driven decision-making in various industries are major factors driving market growth. Additionally, the increasing demand for cloud-based data science services and the rise of data science-as-a-service (DSaaS) offerings are further contributing to market expansion. Key market trends include the increasing adoption of data science services by small and medium-sized enterprises (SMEs) and the growing demand for data scientists with specialized skills. The market is segmented into different applications and types, with data collection and data cleaning being the most prominent segments. North America holds a dominant share of the market, followed by Europe and Asia Pacific. Key players in the market include EY, Deloitte, KPMG, McKinsey & Company, and Boston Consulting Group, among others. These companies offer a range of data science services, including data analytics, data visualization, and predictive modeling. The market is expected to face challenges such as data privacy and security concerns, as well as the shortage of qualified data science professionals. However, ongoing advancements in technology, the growing adoption of AI and ML, and the increasing awareness of the benefits of data science services are expected to drive continued growth in the market.
Top challenges for big data analytics implementation in companies worldwide...
statista.com
Cite
Statista, Top challenges for big data analytics implementation in companies worldwide 2017 [Dataset]. https://www.statista.com/statistics/933143/worldwide-big-data-implementation-problems/
Explore at:
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017
Area covered
Worldwide
Description
The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around ** percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.
Grand Challenges: Science, Engineering, and Societal Advances, Requiring...
catalog.data.gov
s.cnmilf.com
Updated May 14, 2025
Cite
NCO NITRD (2025). Grand Challenges: Science, Engineering, and Societal Advances, Requiring Networking and Information Technology Research and Development [Dataset]. https://catalog.data.gov/dataset/grand-challenges-science-engineering-and-societal-advances-requiring-networking-and-inform
Explore at:
Dataset updated
May 14, 2025
Dataset provided by
NCO NITRD
Description
...the U.S. Government makes critical decisions about appropriate investments in IT R and D to help society forward both socially and economically. To inform that decision-making, in July of 2003, a group of leading Government technical program managers who participate in the Networking and Information Technology Research and Development NITRD Program completed their formulation of 16 illustrative science, engineering, and societal grand challenges...
