100+ datasets found
  1. Amazon data science challenge - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Amazon data science challenge - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/amazon-data-science-challenge
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Amazon data science challenge.

  2. Data from: Data Science Problems

    • github.com
    • opendatalab.com
    Updated Feb 8, 2022
    Cite
    (2022). Data Science Problems [Dataset]. https://github.com/microsoft/DataScienceProblems
    Explore at:
    Dataset updated
    Feb 8, 2022
    License

    https://github.com/microsoft/DataScienceProblems/blob/main/LICENSE.txt

    Description

    Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of notebooks in this benchmark also include data dependencies, so this benchmark not only can test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant (https://arxiv.org/abs/2201.12901) for more details about state of the art results and other properties of the dataset.

  3. Amazon data science challenge

    • catalog.data.gov
    • data.wu.ac.at
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Amazon data science challenge [Dataset]. https://catalog.data.gov/dataset/amazon-data-science-challenge
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Amazon data science challenge.

  4. Data Science Challenge by Coursera

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    Bisma Ridho Pambudi (2025). Data Science Challenge by Coursera [Dataset]. https://www.kaggle.com/datasets/bismaridho/data-science-challenge-by-coursera
    Explore at:
    Available download formats: zip (25124511 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    Bisma Ridho Pambudi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Bisma Ridho Pambudi

    Released under CC0: Public Domain

    Contents

  5. 2018 Data Science Bowl challenge dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    (2024). 2018 Data Science Bowl challenge dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/2018-data-science-bowl-challenge-dataset
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    The 2018 Data Science Bowl challenge dataset is used for nuclei cell image segmentation.

  6. Gemma-Data Science Agent- Instruct- Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2024
    Cite
    ian cecil akoto (2024). Gemma-Data Science Agent- Instruct- Dataset [Dataset]. https://www.kaggle.com/datasets/ianakoto/gemma-data-science-agent-instruct-dataset
    Explore at:
    Available download formats: zip (9680013 bytes)
    Dataset updated
    Apr 2, 2024
    Authors
    ian cecil akoto
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Overview

    This dataset contains question-answer pairs with context extracted from Kaggle solution write-ups and discussion forums. The dataset was created to facilitate fine-tuning Gemma, an AI model, for data scientist assistant tasks such as question answering and providing data science assistance.

    Dataset Details

    Columns:

      Question: The question generated based on the context extracted from Kaggle solution write-ups and discussion forums.
      Answer: The corresponding answer to the generated question.
      Context: The context extracted from Kaggle solution write-ups and discussion forums, which serves as the basis for generating questions and answers.
      Subtitle: Subtitle or additional information related to the Kaggle competition or topic.
      Title: Title of the Kaggle competition or topic.

    Sources and Inspiration

    Sources:

      Meta Kaggle: The dataset was sourced from Meta Kaggle, an official Kaggle platform where users discuss competitions, kernels, datasets, and more.
      Kaggle Solution Write-ups: Solution write-ups submitted by Kaggle users were utilized as a primary source of context for generating questions and answers.
      Discussion Forums: Discussion threads on Kaggle forums were used to gather additional insights and context for the dataset.

    Inspiration:

    The dataset was inspired by the need for a specialized dataset tailored for fine-tuning Gemma, an AI model designed for data scientist assistant tasks. The goal was to create a dataset that captures the essence of real-world data science problems discussed on Kaggle, enabling Gemma to provide accurate and relevant assistance to data scientists and Kaggle users.

    Dataset Specifics

      Total Records: [Specify the total number of question-answer pairs in the dataset]
      Format: CSV (Comma Separated Values)
      Size: [Specify the size of the dataset in MB or GB]
      License: [Specify the license under which the dataset is distributed, e.g., CC BY-SA 4.0]
      Download Link: [Provide a link to download the dataset]

    Acknowledgments

    We acknowledge Kaggle and its community for providing valuable data science resources and discussions that contributed to the creation of this dataset. We appreciate the efforts of Gemma and Langchain in fine-tuning AI models for data scientist assistant tasks, enabling enhanced productivity and efficiency in the field of data science.
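
    A minimal sketch of reading such a CSV and checking that the five columns named in the description are present. The listing does not name the file, so the sample below is an in-memory stand-in with invented contents, and the exact header capitalization is an assumption.

```python
import csv
import io

# Column names as given in the dataset description (capitalization assumed).
EXPECTED_COLUMNS = {"Question", "Answer", "Context", "Subtitle", "Title"}

def load_qa_rows(handle):
    """Read question-answer rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)

# Tiny in-memory stand-in for the real file (contents invented for illustration).
sample = io.StringIO(
    "Title,Subtitle,Context,Question,Answer\n"
    "Some competition,Some subtitle,Some write-up text,What was used?,A model\n"
)
rows = load_qa_rows(sample)
print(rows[0]["Question"])
```

    With the downloaded file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.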

  7. Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Feb 8, 2025
    Cite
    Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description

    Data Science Platform Market Size 2025-2029

    The data science platform market size is projected to increase by USD 763.9 million, at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the data science platform market.

    Major Market Trends & Insights

    North America dominated the market and accounted for 48% of the growth during the forecast period.
    By Deployment - On-premises segment was valued at USD 38.70 million in 2023
    By Component - Platform segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 1.00 million
    Market Future Opportunities: USD 763.90 million
    CAGR : 40.2%
    North America: Largest market in 2023
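
    For reference, a compound annual growth rate (CAGR) compounds once per year, so a rate r sustained for n years multiplies the starting value by (1 + r)^n. A minimal sketch using the 40.2% figure and the 2024 to 2029 span quoted above; the function name is illustrative, and because the listing gives no base-year market size, only the growth multiple is computed.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which a value grows at a constant annual rate."""
    return (1.0 + cagr) ** years

# 40.2% per year over the five years from 2024 to 2029.
multiple = growth_multiple(0.402, 5)
print(f"{multiple:.2f}x over 5 years")
```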
    

    Market Summary

    The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
    According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
    

    What will be the Size of the Data Science Platform Market during the forecast period?

    How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?

    The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      On-premises
      Cloud
    
    
    Component
    
      Platform
      Services
    
    
    End-user
    
      BFSI
      Retail and e-commerce
      Manufacturing
      Media and entertainment
      Others
    
    
    Sector
    
      Large enterprises
      SMEs
    
    
    Application
    
      Data Preparation
      Data Visualization
      Machine Learning
      Predictive Analytics
      Data Governance
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        UK
    
    
      Middle East and Africa
    
        UAE
    
    
      APAC
    
        China
        India
        Japan
    
    
      South America
    
        Brazil
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.

    In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.

    Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.

    API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.

    The On-premises segment was valued at USD 38.70 million in 2019 and showed

  8. Data from: A large-scale comparative analysis of Coding Standard conformance...

    • figshare.com
    application/x-gzip
    Updated Oct 4, 2021
    Cite
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

    results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
    notebooks_out.tar.gz: Tables and figures generated by notebooks.
    source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.

    The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories
    Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680
    Preprint: https://arxiv.org/abs/2007.08978

  9. International Journal of Data Science and Analytics Abstract & Indexing -...

    • researchhelpdesk.org
    Updated Jan 16, 2024
    Cite
    Research Help Desk (2024). International Journal of Data Science and Analytics Abstract & Indexing - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/abstract-and-indexing/418/international-journal-of-data-science-and-analytics
    Explore at:
    Dataset updated
    Jan 16, 2024
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Data Science and Analytics Abstract & Indexing - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.

  10. 2025 Kaggle Machine Learning & Data Science Survey

    • kaggle.com
    Updated Jan 28, 2025
    Cite
    Hina Ismail (2025). 2025 Kaggle Machine Learning & Data Science Survey [Dataset]. https://www.kaggle.com/datasets/sonialikhan/2025-kaggle-machine-learning-and-data-science-survey
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hina Ismail
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview Welcome to Kaggle's second annual Machine Learning and Data Science Survey ― and our first-ever survey data challenge.

    This year, as last year, we set out to conduct an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. The survey was live for one week in October, and after cleaning the data we finished with 23,859 responses, a 49% increase over last year!

    There's a lot to explore here. The results include raw numbers about who is working with data, what’s happening with machine learning in different industries, and the best ways for new data scientists to break into the field. We've published the data in as raw a format as possible without compromising anonymization, which makes it an unusual example of a survey dataset.

    Challenge This year Kaggle is launching the first Data Science Survey Challenge, where we will be awarding a prize pool of $28,000 to kernel authors who tell a rich story about a subset of the data science and machine learning community.

    In our second year running this survey, we were once again awed by the global, diverse, and dynamic nature of the data science and machine learning industry. This survey data EDA provides an overview of the industry on an aggregate scale, but it also leaves us wanting to know more about the many specific communities comprised within the survey. For that reason, we’re inviting the Kaggle community to dive deep into the survey datasets and help us tell the diverse stories of data scientists from around the world.

    The challenge objective: tell a data story about a subset of the data science community represented in this survey, through a combination of both narrative text and data exploration. A “story” could be defined any number of ways, and that’s deliberate. The challenge is to deeply explore (through data) the impact, priorities, or concerns of a specific group of data science and machine learning practitioners. That group can be defined in the macro (for example: anyone who does most of their coding in Python) or the micro (for example: female data science students studying machine learning in masters programs). This is an opportunity to be creative and tell the story of a community you identify with or are passionate about!

    Submissions will be evaluated on the following:

    Composition - Is there a clear narrative thread to the story that’s articulated and supported by data? The subject should be well defined, well researched, and well supported through the use of data and visualizations.
    Originality - Does the reader learn something new through this submission? Or is the reader challenged to think about something in a new way? A great entry will be informative, thought provoking, and fresh all at the same time.
    Documentation - Are your code, and kernel, and additional data sources well documented so a reader can understand what you did? Are your sources clearly cited? A high quality analysis should be concise and clear at each step so the rationale is easy to follow and the process is reproducible.

    To be valid, a submission must be contained in one kernel, made public on or before the submission deadline. Participants are free to use any datasets in addition to the Kaggle Data Science survey, but those datasets must also be publicly available on Kaggle by the deadline for a submission to be valid.

    While the challenge is running, Kaggle will also give a Weekly Kernel Award of $1,500 to recognize excellent kernels that are public analyses of the survey. Weekly Kernel Awards will be announced every Friday between 11/9 and 11/30.

    How to Participate To make a submission, complete the submission form. Only one submission will be judged per participant, so if you make multiple submissions we will review the last (most recent) entry.

    No submission is necessary for the Weekly Kernels Awards. To be eligible, a kernel must be public and use the 2018 Data Science Survey as a data source.

    Timeline All dates are 11:59PM UTC

    Submission deadline: December 3rd

    Winners announced: December 10th

    Weekly Kernels Award prize winners announcements: November 9th, 16th, 23rd, and 30th

    All kernels are evaluated after the deadline.

    Rules To be eligible to win a prize in either of the above prize tracks, you must be:

    a registered account holder at Kaggle.com;
    the older of 18 years old or the age of majority in your jurisdiction of residence; and
    not a resident of Crimea, Cuba, Iran, Syria, North Korea, or Sudan.

    Your kernels will only be eligible to win if they have been made public on kaggle.com by the above deadline. All prizes are awarded at the discretion of Kaggle. Kaggle reserves the right to cancel or modify prize criteria.

    Unfortunately employees, interns, contractors, officers and directors of Kaggle Inc., and their parent companies, are not eligible to win any prizes.

    Survey Methodology ...

  11. Riga Data Science Club

    • kaggle.com
    zip
    Updated Mar 29, 2021
    Cite
    Dmitry Yemelyanov (2021). Riga Data Science Club [Dataset]. https://www.kaggle.com/datasets/dmitryyemelyanov/rigadsclub
    Explore at:
    Available download formats: zip (494849 bytes)
    Dataset updated
    Mar 29, 2021
    Authors
    Dmitry Yemelyanov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Riga
    Description

    Context

    Riga Data Science Club is a non-profit organisation to share ideas, experience and build machine learning projects together. A data science community should know its own data, so this is a dataset about ourselves: our website analytics, social media activity, slack statistics and even meetup transcriptions!

    Content

    Dataset is split up in several folders by the context:

    * linkedin - company page visitor, follower and post stats
    * slack - messaging and member activity
    * typeform - new member responses
    * website - website visitors by country, language, device, operating system, screen resolution
    * youtube - meetup transcriptions

    Inspiration

    Let's make Riga Data Science Club better! We expect this data to bring lots of insights on how to improve.

    "Know your c̶u̶s̶t̶o̶m̶e̶r̶ member"

    * Explore member interests by analysing sign-up survey (typeform) responses
    * Explore messaging patterns in Slack to understand how members are retained and when they are lost

    Social media intelligence

    * Define LinkedIn posting strategy based on historical engagement data
    * Define target user profile based on LinkedIn page attendance data

    Website

    * Define website localisation strategy based on data about visitor countries and languages
    * Define website responsive design strategy based on data about visitor devices, operating systems and screen resolutions

    Have some fun

    * NLP analysis of meetup transcriptions: word frequencies, question answering, something else?
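
    As one illustration of the word-frequency idea, here is a minimal sketch using only the Python standard library. The transcript string is invented for the example; the layout of the files in the youtube folder is not described in the listing, so loading them is left out.

```python
import re
from collections import Counter

def word_frequencies(text: str, top_n: int = 3):
    """Return the top_n most frequent lowercase words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Invented stand-in for a meetup transcription.
transcript = "Data science is fun. Data is everywhere, and science helps."
print(word_frequencies(transcript, top_n=3))
```

    A real analysis would likely also drop common stop words such as "is" and "and" before counting.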

  12. Online Data Science Training Programs Market Analysis, Size, and Forecast...

    • technavio.com
    pdf
    Updated Feb 12, 2025
    Cite
    Technavio (2025). Online Data Science Training Programs Market Analysis, Size, and Forecast 2025-2029: North America (Mexico), Europe (France, Germany, Italy, and UK), Middle East and Africa (UAE), APAC (Australia, China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/online-data-science-training-programs-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description

    Online Data Science Training Programs Market Size 2025-2029

    The online data science training programs market size is forecast to increase by USD 8.67 billion, at a CAGR of 35.8% between 2024 and 2029.

    The market is experiencing significant growth due to the increasing demand for data science professionals in various industries. The job market offers lucrative opportunities for individuals with data science skills, making online training programs an attractive option for those seeking to upskill or reskill. Another key driver in the market is the adoption of microlearning and gamification techniques in data science training. These approaches make learning more engaging and accessible, allowing individuals to acquire new skills at their own pace. Furthermore, the availability of open-source learning materials has democratized access to data science education, enabling a larger pool of learners to enter the field. However, the market also faces challenges, including the need for continuous updates to keep up with the rapidly evolving data science landscape and the lack of standardization in online training programs, which can make it difficult for employers to assess the quality of graduates. Companies seeking to capitalize on market opportunities should focus on offering up-to-date, high-quality training programs that incorporate microlearning and gamification techniques, while also addressing the challenges of continuous updates and standardization. By doing so, they can differentiate themselves in a competitive market and meet the evolving needs of learners and employers alike.

    What will be the Size of the Online Data Science Training Programs Market during the forecast period?

    The online data science training market continues to evolve, driven by the increasing demand for data-driven insights and innovations across various sectors. Data science applications, from computer vision and deep learning to natural language processing and predictive analytics, are revolutionizing industries and transforming business operations. Industry case studies showcase the impact of data science in action, with big data and machine learning driving advancements in healthcare, finance, and retail. Virtual labs enable learners to gain hands-on experience, while data scientist salaries remain competitive and attractive. Cloud computing and data science platforms facilitate interactive learning and collaborative research, fostering a vibrant data science community. Data privacy and security concerns are addressed through advanced data governance and ethical frameworks. Data science libraries, such as TensorFlow and Scikit-Learn, streamline the development process, while data storytelling tools help communicate complex insights effectively. Data mining and predictive analytics enable organizations to uncover hidden trends and patterns, driving innovation and growth. The future of data science is bright, with ongoing research and development in areas like data ethics, data governance, and artificial intelligence. Data science conferences and education programs provide opportunities for professionals to expand their knowledge and expertise, ensuring they remain at the forefront of this dynamic field.

    How is this Online Data Science Training Programs Industry segmented?

    The online data science training programs industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Type

      Professional degree courses
      Certification courses

    Application

      Students
      Working professionals

    Language

      R programming
      Python
      Big ML
      SAS
      Others

    Method

      Live streaming
      Recorded

    Program Type

      Bootcamps
      Certificates
      Degree Programs

    Geography

      North America

        US
        Mexico

      Europe

        France
        Germany
        Italy
        UK

      Middle East and Africa

        UAE

      APAC

        Australia
        China
        India
        Japan
        South Korea

      South America

        Brazil

      Rest of World (ROW)

    By Type Insights

    The professional degree courses segment is estimated to witness significant growth during the forecast period.

    The market encompasses various segments catering to diverse learning needs. The professional degree course segment holds a significant position, offering comprehensive and in-depth training in data science. This segment's curriculum covers essential aspects such as statistical analysis, machine learning, data visualization, and data engineering. Delivered by industry professionals and academic experts, these courses ensure a high-quality education experience. Interactive learning environments, including live lectures, webinars, and group discussions, foster a collaborative and engaging experience. Data science applications, including deep learning, computer vision, and natural language processing, are integral to the market's growth. Data analysis, a crucial application, is gaining traction due to the increasing demand for data-driven decisio

  13. Mo(Wa)²TER Data Science Workshop Material

    • dataverse.harvard.edu
    Updated Sep 9, 2024
    Cite
    Amanda S. Hering; Kathryn B. Newhart; Derek Weix (2024). Mo(Wa)²TER Data Science Workshop Material [Dataset]. http://doi.org/10.7910/DVN/PKLIOC
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Amanda S. Hering; Kathryn B. Newhart; Derek Weix
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Dataset funded by
    National Science Foundation
    Description

    These are the materials developed for the Mo(Wa)²TER Data Science workshop, which is designed for upper level and graduate students in environmental engineering or industry professionals in the water and wastewater treatment (W/WWT) fields. Working through this material will improve a learner’s data analysis and programming skills with the free R language and will focus exclusively on problems arising in W/WWT. Training in basic R coding, data cleaning, visualization, data analysis, statistical modeling, and machine learning are provided. Real W/WWT examples and exercises are given with each topic to strengthen and deepen comprehension. These materials aim to equip students with the skills to handle data science challenges in their future careers. Materials were developed over three offerings of this workshop in 2021, 2022, and 2023. At the time of publication, all code runs, but we provide no guarantees on future versions of R or packages used in this workshop.

  14. International Journal of Data Science and Analytics Acceptance Rate -...

    • researchhelpdesk.org
    Updated Apr 30, 2022
    Cite
    Research Help Desk (2022). International Journal of Data Science and Analytics Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/418/international-journal-of-data-science-and-analytics
    Explore at:
    Dataset updated
    Apr 30, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Data Science and Analytics Acceptance Rate - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.

  15. International Journal of Data Science and Analytics Impact Factor 2024-2025...

    • researchhelpdesk.org
    Updated Feb 23, 2022
    Cite
    Research Help Desk (2022). International Journal of Data Science and Analytics Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/418/international-journal-of-data-science-and-analytics
    Explore at:
    Dataset updated
    Feb 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Data Science and Analytics Impact Factor 2024-2025 - ResearchHelpDesk - International Journal of Data Science and Analytics - Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation. The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations.

  16. DataScience-Instruct-500K

    • huggingface.co
    Updated Oct 21, 2025
    + more versions
    Cite
    RUC-DataLab (2025). DataScience-Instruct-500K [Dataset]. https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset authored and provided by
    RUC-DataLab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

    Authors: Shaolei Zhang, Ju Fan*, Meihao Fan, Guoliang Li, Xiaoyong Du

    DeepAnalyze is the first agentic LLM for autonomous data science. It can autonomously complete a wide range of data-centric tasks without human intervention, supporting: 🛠 Entire data science pipeline: Automatically perform any data science tasks such as data preparation, analysis, modeling, visualization, and report generation. 🔍… See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K.

  17. Data Science Collaboration Platform Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 11, 2025
    Cite
    Archive Market Research (2025). Data Science Collaboration Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/data-science-collaboration-platform-18259
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data science collaboration platform market is projected to grow from USD 13,860 million in 2025 to USD XX million by 2033, at a CAGR of XX% during the forecast period. The increasing demand for data science collaboration platforms is primarily driven by the growing adoption of artificial intelligence (AI) and machine learning (ML) technologies, which require effective collaboration among data scientists, data engineers, and business stakeholders. The cloud-based segment is expected to account for the largest share of the market due to its flexibility, scalability, and cost-effectiveness. Large enterprises are expected to remain the primary end-users of data science collaboration platforms due to their complex data science workflows and the need for efficient collaboration across teams. The market is highly competitive, with key players including Databricks, Google, Microsoft, Kaggle, DataRobot, IBM, and Alteryx. The market is expected to witness significant growth in the Asia Pacific region due to the increasing adoption of data science technologies and the presence of a large population of data scientists. The Middle East & Africa region is also expected to experience significant growth due to government initiatives to promote digital transformation. However, the lack of skilled data scientists and the high cost of implementation may pose challenges to the growth of the market. Overall, the data science collaboration platform market is expected to continue growing steadily over the forecast period, driven by the increasing demand for data science technologies and the need for effective collaboration among data scientists.

  18. Data Science Services Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 9, 2025
    Cite
    Data Insights Market (2025). Data Science Services Report [Dataset]. https://www.datainsightsmarket.com/reports/data-science-services-1960009
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jan 9, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data science services market is projected to experience significant growth, reaching a value of 73060 million by 2033, expanding at a CAGR of 18.2% from 2025 to 2033. The surge in data generation, the increasing adoption of artificial intelligence (AI) and machine learning (ML), and the growing need for data-driven decision-making in various industries are major factors driving market growth. Additionally, the increasing demand for cloud-based data science services and the rise of data science-as-a-service (DSaaS) offerings are further contributing to market expansion. Key market trends include the increasing adoption of data science services by small and medium-sized enterprises (SMEs) and the growing demand for data scientists with specialized skills. The market is segmented into different applications and types, with data collection and data cleaning being the most prominent segments. North America holds a dominant share of the market, followed by Europe and Asia Pacific. Key players in the market include EY, Deloitte, KPMG, McKinsey & Company, and Boston Consulting Group, among others. These companies offer a range of data science services, including data analytics, data visualization, and predictive modeling. The market is expected to face challenges such as data privacy and security concerns, as well as the shortage of qualified data science professionals. However, ongoing advancements in technology, the growing adoption of AI and ML, and the increasing awareness of the benefits of data science services are expected to drive continued growth in the market.

  19. Top challenges for big data analytics implementation in companies worldwide...

    • statista.com
    Cite
    Statista, Top challenges for big data analytics implementation in companies worldwide 2017 [Dataset]. https://www.statista.com/statistics/933143/worldwide-big-data-implementation-problems/
    Explore at:
    Dataset authored and provided by
    Statista: http://statista.com/
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around ** percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.

  20. Grand Challenges: Science, Engineering, and Societal Advances, Requiring...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated May 14, 2025
    + more versions
    Cite
    NCO NITRD (2025). Grand Challenges: Science, Engineering, and Societal Advances, Requiring Networking and Information Technology Research and Development [Dataset]. https://catalog.data.gov/dataset/grand-challenges-science-engineering-and-societal-advances-requiring-networking-and-inform
    Explore at:
    Dataset updated
    May 14, 2025
    Dataset provided by
    NCO NITRD
    Description

    ...the U.S. Government makes critical decisions about appropriate investments in IT R and D to help society forward both socially and economically. To inform that decision-making, in July of 2003, a group of leading Government technical program managers who participate in the Networking and Information Technology Research and Development NITRD Program completed their formulation of 16 illustrative science, engineering, and societal grand challenges...
