100+ datasets found
  1. Data from: Data Mining Project Dataset

    • kaggle.com
    zip
    Updated Dec 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mark Dobres (2020). Data Mining Project Dataset [Dataset]. https://www.kaggle.com/markdobres/data-mining-project-dataset
    Explore at:
    zip(1552418617 bytes)Available download formats
    Dataset updated
    Dec 10, 2020
    Authors
    Mark Dobres
    Description

    Dataset

    This dataset was created by Mark Dobres

    Contents

  2. Data Mining Project 1

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Will Newt (2024). Data Mining Project 1 [Dataset]. https://www.kaggle.com/datasets/willnewt/data-mining-project-1/data
    Explore at:
    zip(6058765 bytes)Available download formats
    Dataset updated
    Jan 29, 2024
    Authors
    Will Newt
    Description

    Dataset

    This dataset was created by Will Newt

    Contents

  3. d

    Data-Mining-Final-Project-Data

    • search.dataone.org
    Updated Sep 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Anderson, Ty Julian
    Description

    Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.

  4. u

    Data from: The use of project portfolios in effective strategy execution to...

    • researchdata.up.ac.za
    zip
    Updated May 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Pretoria
    Authors
    Palesa Agnes Ramashala
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Qualitative data gathered from interviews that were conducted with case organisations. The data is analysed using a qualitative data analysis tool (AtlasTi) to code and generate network diagrams. Software such as Atlas.ti 8 Windows will be a great advantage to use in order to view these results. Interviews were conducted with four case organisations. The details of the responses from the respondents from case organisations are captured. The data gathered during the interview sessions is captured in a tabular form and graphs were also created to identify trends. Also in this study is desktop review of the case organisations that formed part of the study. The desktop study was done using published annual reports over a period of more than seven years. The analysis was done given the scope of the project and its constructs.

  5. Data from: Data Mining Project

    • kaggle.com
    zip
    Updated Nov 30, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oscar NG (2018). Data Mining Project [Dataset]. https://www.kaggle.com/oscar321a/data-mining-project
    Explore at:
    zip(8083512 bytes)Available download formats
    Dataset updated
    Nov 30, 2018
    Authors
    Oscar NG
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Oscar NG

    Released under CC0: Public Domain

    Contents

  6. R

    Data Mining Kel 11 Dataset

    • universe.roboflow.com
    zip
    Updated Oct 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Mining (2025). Data Mining Kel 11 Dataset [Dataset]. https://universe.roboflow.com/data-mining-mtwls/data-mining-kel-11-zp4xe
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 29, 2025
    Dataset authored and provided by
    Data Mining
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Beras
    Description

    Data Mining Kel 11

    ## Overview
    
    Data Mining Kel 11 is a dataset for classification tasks - it contains Beras annotations for 59,785 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  7. Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project

    • tandf.figshare.com
    xls
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Taylor & Francishttps://taylorandfrancis.com/
    Authors
    P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originated from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized a part of solving these challenges, respectively. Toward this direction, we focus on the development of a complete methodology for the ATHLOS Project – funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models to the prediction power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.

  8. Data from: Data Mining Project

    • kaggle.com
    zip
    Updated May 14, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Khanh Vương (2022). Data Mining Project [Dataset]. https://www.kaggle.com/khanhvng/data-mining-project
    Explore at:
    zip(69155672 bytes)Available download formats
    Dataset updated
    May 14, 2022
    Authors
    Khanh Vương
    Description

    Dataset

    This dataset was created by Khanh Vương

    Contents

  9. Data from: A large-scale comparative analysis of Coding Standard conformance...

    • figshare.com
    application/x-gzip
    Updated Oct 4, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
    Explore at:
    application/x-gzipAvailable download formats
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.notebooks_out.tar.gz: Tables and figures generated by notebooks.source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositoriesPublished in ESEM 2020: https://doi.org/10.1145/3382494.3410680Preprint: https://arxiv.org/abs/2007.08978

  10. Data from: DATA MINING THE GALAXY ZOO MERGERS

    • data.nasa.gov
    • gimi9.com
    • +3more
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). DATA MINING THE GALAXY ZOO MERGERS [Dataset]. https://data.nasa.gov/dataset/data-mining-the-galaxy-zoo-mergers
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    DATA MINING THE GALAXY ZOO MERGERS STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications on merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status based solely on discovering those automated pipeline-generated attributes in the astronomical database that correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to represent the most influential of the attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research, however, there is a clear indication of which attributes contribute so that a direction for further study is apparent.

  11. Retrospective data mining project of Student Subject Experience Surveys from...

    • researchdata.edu.au
    Updated 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Monica Short; Environmental and Social Justice Research Group (2023). Retrospective data mining project of Student Subject Experience Surveys from WEL418 [Dataset]. https://researchdata.edu.au/retrospective-mining-project-surveys-wel418/2923246
    Explore at:
    Dataset updated
    2023
    Dataset provided by
    Charles Sturt Universityhttp://csu.edu.au/
    Authors
    Monica Short; Environmental and Social Justice Research Group
    Time period covered
    2014 - Jun 17, 2022
    Description

    This data is the set of responses to Student Subject Experience Surveys from WEL418 case management for two academics, Katrina Gersbach and Dr Monica Short for the sessions that they taught in the period 2014-17th June 2022.

  12. R

    Data Mining Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ilham project (2023). Data Mining Dataset [Dataset]. https://universe.roboflow.com/ilham-project/data-mining-n52lu/model/1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    ilham project
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Uangrupiah Bounding Boxes
    Description

    Data Mining

    ## Overview
    
    Data Mining is a dataset for object detection tasks - it contains Uangrupiah annotations for 692 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
    
  13. m

    Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Explore at:
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.

  14. d

    Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry...

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Nov 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Davis Energy (2023). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. https://catalog.data.gov/dataset/community-scale-attic-retrofit-and-home-energy-upgrade-data-mining-hot-dry-climate
    Explore at:
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    Davis Energy
    Description

    Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofitting remains a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking. The U.S. Department of Energy's Building America research team, Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused strategy for implementation and a low-cost retrofit program that focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis, California, community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs. ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects, because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program used a strategy that focused on attics-including air sealing, duct sealing, and attic insulation-as a low-cost entry for homeowners to complete home retrofits. The price was kept below $4,000 after incentives; both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy. All five homeowners were concerned about cost, comfort, and energy savings and included additional measures in their retrofits. The low-cost, one-measure strategy did not increase the uptake among homeowners, even in a well-educated, affluent community such as Davis. This project has two primary components. One is to complete attic retrofits on a community scale in the hot-dry climate on Davis, CA. Sufficient data will be collected on these projects to include them in the BAFDR. Additionally, ARBI is working with contractors to obtain building and utility data from a large set of retrofit projects in CA (hot-dry). These projects are to be uploaded into the BAFDR.

  15. DATA MINING

    • kaggle.com
    zip
    Updated Dec 3, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    chimaralavamshireddy (2021). DATA MINING [Dataset]. https://www.kaggle.com/chimaralavamshireddy/data-mining
    Explore at:
    zip(901512 bytes)Available download formats
    Dataset updated
    Dec 3, 2021
    Authors
    chimaralavamshireddy
    License

    https://www.usa.gov/government-works/https://www.usa.gov/government-works/

    Description

    Dataset

    This dataset was created by chimaralavamshireddy

    Released under U.S. Government Works

    Contents

  16. Knowledge Graph: tyrolean mining documents 15th and 16th century

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1more
    bin
    Updated Sep 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
    Explore at:
    binAvailable download formats
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” ( Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515) stored by the Tyrolean Regional Archive, Innsbruck (Austria). The user of the KG may explore the montanistic network and relations between people, claims and mines in the late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

    The ontology used to represent the claims is CIDOC CRM, an ISO certified ontology for Cultural Heritage documentation. Supported by the Karma tool the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a Triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

    Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the university of Innsbruck and funded by go!digital next generation programme of the Austrian Academy of Sciences.

    Citeable Transcripts of the historical documents are online available:
    Hs. 37 DOI: 10.5281/zenodo.6274562
    Hs. 1587 DOI: 10.5281/zenodo.6274928

  17. A high-level dynamic analysis approach for studying global process plant...

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dennis Travagini Cremonese; Bhaskar Karanth; Giorgio de Tomi (2023). A high-level dynamic analysis approach for studying global process plant availability and production time in the early stages of mining projects [Dataset]. http://doi.org/10.6084/m9.figshare.7519172.v1
    Explore at:
    jpegAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELOhttp://www.scielo.org/
    Authors
    Dennis Travagini Cremonese; Bhaskar Karanth; Giorgio de Tomi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract In the early stage of front-end studies of a Mining Project, the global availability (i.e. number of hours a plant is available for production) and production (number of hours a plant is actually operated with material) time of the process plant are normally assumed based on the experience of the study team. Understanding and defining the availability hours at the early stages of the project are important for the future stages of the project, as drastic changes in work hours will impact the economics of the project at that stage. An innovative high-level dynamic modeling approach has been developed to assist in the rapid evaluation of assumptions made by the study team. This model incorporates systems or equipment that are commonly used in mining projects from mine to product stockyard discharge after the processing plant. It includes subsystems that will simulate all the component handling, and major process plant systems required for a mining project. The output data provided by this high-level dynamic simulation approach will enhance the confidence level of engineering carried out during the early stage of the project. This study discusses the capabilities of the approach, and a test case compared with standard techniques used in mining project front-end studies.

  18. S

    Electronic Medical Record Data-Mining

    • simtk.org
    Updated Sep 26, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jonathan Chen (2017). Electronic Medical Record Data-Mining [Dataset]. https://simtk.org/frs/?group_id=892
    Explore at:
    data/images/video(5 MB), application/x-zip-compressed(1 MB), source code(1 MB)Available download formats
    Dataset updated
    Sep 26, 2017
    Dataset provided by
    Stanford
    Authors
    Jonathan Chen
    Description

    EMR data-mining code such as association rules for order recommendations and outcome predictions and order set evaluation



    This project includes the following software/data packages:

    • Order Sets and Topic Models : Application code and support script to reproduce topic model and order set prediction evaluations as published in JAMIA 2016 manuscript.
    • ICU DNR : Data underlying paper: "Reversals and Limitations of High-Intensity, Life-Sustaining Treatments" regarding clinical factors associated with DNR and Comfort Care orders in the ICU
    • Item Association Code PSB 2016

  19. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Explore at:
    zip(59313797 bytes)Available download formats
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and delete many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data has both lyft and uber but it is still a cleaned version from the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the csv into R, here is the code for how you do this:

    This loads the file into R

    df<-read.csv('uber.csv')

    The next codes is to subset the data into specific car types. The example below only has Uber 'Black' car types.

    df_black<-subset(uber_df, uber_df$name == 'Black')

    This next portion of code will be to load it into R. First, we must write this dataframe into a csv file on our computer in order to load it into R.

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in you working directory. If you are not familiar with your working directory. Run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  20. project.json

    • figshare.com
    txt
    Updated Oct 3, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sophia Tintori (2019). project.json [Dataset]. http://doi.org/10.6084/m9.figshare.9933785.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Oct 3, 2019
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Sophia Tintori
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Configuration file for DrEdGE website

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Mark Dobres (2020). Data Mining Project Dataset [Dataset]. https://www.kaggle.com/markdobres/data-mining-project-dataset
Organization logo

Data from: Data Mining Project Dataset

Related Article
Explore at:
zip(1552418617 bytes)Available download formats
Dataset updated
Dec 10, 2020
Authors
Mark Dobres
Description

Dataset

This dataset was created by Mark Dobres

Contents

Search
Clear search
Close search
Google apps
Main menu