100+ datasets found
  1. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats
    xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, MitoTracker Red CMXRos area and intensity (3 h and 24 h incubations with both compounds), MitoSOX oxidation (3 h incubation with these compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples precluded a full, robust statistical analysis of the results; nevertheless, it allowed the identification of relevant hidden patterns and trends.
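
    The class layout and the standardization step described above can be sketched in plain Python. This is a minimal illustration, not the authors' Orange workflow: the class label strings are hypothetical, the toy rows are made-up numbers rather than the published measurements, and it assumes a population (not sample) standard deviation, which the source does not specify.

```python
from collections import Counter
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score every feature column: (x - column mean) / column population std."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


def is_balanced(labels):
    """True when every class label occurs the same number of times."""
    return len(set(Counter(labels).values())) == 1


# Hypothetical label strings for the 9 classes listed above, 4 samples each.
CLASSES = ["Control",
           "6-OHDA 6.25 uM", "6-OHDA 12.5 uM", "6-OHDA 25 uM", "6-OHDA 50 uM",
           "rotenone 0.03 uM", "rotenone 0.06 uM", "rotenone 0.125 uM", "rotenone 0.25 uM"]
labels = [c for c in CLASSES for _ in range(4)]
print(len(labels), is_balanced(labels))  # 36 True

# Toy numeric rows (not the real measurements) to show the standardization step.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standardize_columns(toy))
```

    After standardization each column has mean 0 and unit spread, so features measured on very different scales (fluorescence intensities versus rates) contribute comparably to Euclidean distances in the clustering step.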

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (rows) to the instances (columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. Both rows and columns are grouped by two-way hierarchical clustering using Euclidean distance and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC).
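
    The gain-ratio criterion named in the decision-tree settings can be illustrated with the textbook C4.5 definition. This is a from-scratch sketch, not Orange's implementation; Orange's treatment of numeric features, thresholds and edge cases may differ, and the feature values and labels below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio for splitting `labels` on a discrete feature.

    information gain  = H(labels) - sum_v |S_v|/|S| * H(labels in S_v)
    split information = H(feature_values)
    gain ratio        = information gain / split information
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical numeric feature, binarized at a threshold as a tree would do.
x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
y = ["Control", "Control", "Control", "rotenone", "rotenone", "rotenone"]
print(gain_ratio([v <= 0.5 for v in x], y))  # 1.0
```

    Dividing by the split information penalizes splits that fragment the data into many small branches, which raw information gain tends to favor.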

  2. Healthcare Payments Data Snapshot

    • data.chhs.ca.gov
    • data.ca.gov
    • +3 more
    csv, pdf, zip
    Updated Nov 7, 2025
    Cite
    Department of Health Care Access and Information (2025). Healthcare Payments Data Snapshot [Dataset]. https://data.chhs.ca.gov/dataset/healthcare-payments-data-snapshot
    Available download formats
    zip, pdf (458278), csv (907195), csv (107962), csv (1023), pdf (218738), csv (769), pdf (245152), csv (4432152), csv (1003)
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset contains data for the Healthcare Payments Data (HPD) Snapshot visualization. The Enrollment data file contains counts of claims and encounter data collected for California's statewide HPD Program. It includes counts of enrollment records, service records from medical and pharmacy claims, and the number of individuals represented across these records. Aggregate counts are grouped by payer type (Commercial, Medi-Cal, or Medicare), product type, and year. The Medical data file contains counts of medical procedures from medical claims and encounter data in HPD. Procedures are categorized using claim line procedure codes and grouped by year, type of setting (e.g., outpatient, laboratory, ambulance), and payer type. The Pharmacy data file contains counts of drug prescriptions from pharmacy claims and encounter data in HPD. Prescriptions are categorized by name and drug class using the reported National Drug Code (NDC) and grouped by year, payer type, and whether the drug dispensed is branded or a generic.
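
    The grouping described above (counts rolled up by payer type, product type, and year) amounts to a keyed sum. A minimal sketch, assuming a flat list of records; the field names `payer_type`, `product_type`, `year`, and `count` and the sample values are hypothetical and are not taken from the published CSV files.

```python
from collections import defaultdict


def aggregate_counts(records, keys=("payer_type", "product_type", "year")):
    """Sum each record's `count`, grouped by the tuple of the given key fields."""
    totals = defaultdict(int)
    for rec in records:
        totals[tuple(rec[k] for k in keys)] += rec["count"]
    return dict(totals)


# Hypothetical rows; field names and values are illustrative, not the HPD schema.
rows = [
    {"payer_type": "Commercial", "product_type": "HMO", "year": 2021, "count": 120},
    {"payer_type": "Commercial", "product_type": "HMO", "year": 2021, "count": 30},
    {"payer_type": "Medi-Cal", "product_type": "Managed Care", "year": 2021, "count": 75},
]
print(aggregate_counts(rows))
print(aggregate_counts(rows, keys=("payer_type",)))
```

    Passing a shorter `keys` tuple rolls the same records up to a coarser level, e.g. totals by payer type alone.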

  3. Test Depeca Data Dataset

    • universe.roboflow.com
    zip
    Updated Jun 24, 2025
    Cite
    polyps detection (2025). Test Depeca Data Dataset [Dataset]. https://universe.roboflow.com/polyps-detection-tvgdu/test-depeca-data
    Available download formats
    zip
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    polyps detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Polyps Bounding Boxes
    Description

    Test Depeca Data

    ## Overview
    
    Test Depeca Data is a dataset for object detection tasks; it contains polyp annotations for 460 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. 311 Data

    • catalog.data.gov
    • gimi9.com
    • +3 more
    Updated Jan 24, 2023
    Cite
    City of Pittsburgh (2023). 311 Data [Dataset]. https://catalog.data.gov/dataset/311-data
    Dataset updated
    Jan 24, 2023
    Dataset provided by
    City of Pittsburgh
    Description

    This data set shows 311 service requests in the City of Pittsburgh. This data is collected from the request intake software used by the 311 Response Center in the Department of Innovation & Performance. Requests are collected from phone calls, tweets, emails, a form on the City website, and through the 311 mobile application. For more information, see the 311 Data User Guide. If you are unable to download the 311 Data table due to a 504 Gateway Timeout error, use this link instead: https://tools.wprdc.org/downstream/76fda9d0-69be-4dd5-8108-0de7907fc5a4

    NOTE: The data feed for this dataset is broken as of December 21st, 2022. We're working on restoring it.

  5. My NASA Data

    • data.nasa.gov
    • catalog.data.gov
    • +2 more
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). My NASA Data [Dataset]. https://data.nasa.gov/dataset/my-nasa-data
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    MY NASA DATA (MND) is a tool that allows anyone to make use of satellite data that was previously unavailable. Through MND’s Live Access Server (LAS), a multitude of charts, plots and graphs can be generated using a wide variety of constraints. This site provides a large number of lesson plans on a wide variety of topics, all designed with students in mind. Not only can you use our lesson plans, you can also use the LAS to improve the ones you are currently implementing in your classroom.

  6. BOREAS TF-10 NSA-Fen Tower Flux and Meteorological Data - Dataset - NASA...

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). BOREAS TF-10 NSA-Fen Tower Flux and Meteorological Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/boreas-tf-10-nsa-fen-tower-flux-and-meteorological-data-ad471
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The BOREAS TF-10 team collected tower flux and meteorological data at two sites, a fen and a young jack pine forest, near Thompson, Manitoba, Canada, as part of BOREAS. A preliminary data set was assembled in August 1993 while field testing the instrument packages, and at both sites data were collected from 15-Aug to 31-Aug. The main experimental period was in 1994, when continuous data were collected from 08-Apr to 23-Sept at the fen site. A very limited experiment was run in the spring/summer of 1995, when the fen site tower was operated from 08-Apr to 14-Jun in support of a hydrology experiment in an adjoining, feeder basin. Upon examination of the 1994 data set, it became clear that the behavior of the heat, water, and carbon dioxide fluxes throughout the whole growing season was an important scientific question, and that the 1994 data record was not sufficiently long to capture the character of the seasonal behavior of the fluxes. Thus, the fen site was operated in 1996 in order to collect data from spring melt to autumn freeze-up. Data were collected from 29-Apr to 05-Nov at the fen site. All variables are presented as 30-minute averages.
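
    Presenting variables as 30-minute averages means collapsing higher-frequency readings into half-hour bins. A minimal sketch, assuming bins are labeled by their start time and that a plain arithmetic mean is used; the source does not state the raw sampling rate or the bin-labeling convention, the sketch applies to scalar meteorological readings rather than to how the flux values themselves were derived, and the readings below are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def half_hour_averages(samples):
    """Average (timestamp, value) pairs into 30-minute bins keyed by bin start time."""
    bins = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to :00 or :30 of its hour.
        start = ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)
        bins[start].append(value)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


# Hypothetical readings (e.g. air temperature in deg C), not BOREAS data.
readings = [
    (datetime(1994, 6, 1, 12, 0), 14.0),
    (datetime(1994, 6, 1, 12, 10), 15.0),
    (datetime(1994, 6, 1, 12, 29), 16.0),
    (datetime(1994, 6, 1, 12, 30), 18.0),
    (datetime(1994, 6, 1, 12, 45), 20.0),
]
for start, avg in half_hour_averages(readings).items():
    print(start.strftime("%H:%M"), avg)
```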

  7. Maine Snow Survey Data

    • mgs-maine.opendata.arcgis.com
    • hub.arcgis.com
    • +1 more
    Updated Jan 5, 2018
    Cite
    State of Maine (2018). Maine Snow Survey Data [Dataset]. https://mgs-maine.opendata.arcgis.com/datasets/maine-snow-survey-data
    Dataset updated
    Jan 5, 2018
    Dataset authored and provided by
    State of Maine
    Description

    The Maine Geological Survey and the USGS coordinate the collection of snow measurements each winter for the Maine River Flow Advisory Commission's flood prediction report. These measurements are sent to MGS monthly in January and February and weekly in March, April and May as long as there is snow on the ground. The dataset contains all the raw snow survey measurements (depth (inches), water content (inches), and density), their locations, data quality, and other qualitative comments or observations. These measurements are used to create the statewide snow survey maps.
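
    The three raw measurements are related: in snow surveys, density is conventionally computed as water content divided by depth. A minimal sketch, assuming that convention and expressing density as a percentage; the dataset may instead report a fraction (e.g. 0.25), and the sample values are hypothetical.

```python
def snow_density_percent(depth_in, water_content_in):
    """Snow density as a percentage: water content / snow depth * 100.

    Both inputs are in inches, matching the units listed for the survey.
    """
    if depth_in <= 0:
        raise ValueError("snow depth must be positive")
    return 100.0 * water_content_in / depth_in


# Hypothetical measurement: 20 in of snow holding 5 in of water.
print(snow_density_percent(20.0, 5.0))  # 25.0
```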

  8. Dataset for Crop Pest and Disease Detection

    • data.mendeley.com
    Updated Apr 26, 2023
    Cite
    Patrick Mensah Kwabena (2023). Dataset for Crop Pest and Disease Detection [Dataset]. http://doi.org/10.17632/bwh3zbpkpv.1
    Dataset updated
    Apr 26, 2023
    Authors
    Patrick Mensah Kwabena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The application of Artificial Intelligence (AI) in the agricultural sector has become increasingly evident. The main goals of AI in agriculture are to improve crop yield, control crop pests and diseases, and reduce cost. The agricultural sector in developing countries faces severe challenges in the form of disease and pest infestation, the knowledge gap between farmers and technology, and a lack of storage facilities, among others. To help address some of these challenges, this work presents crop pest and disease datasets sourced from local farms in Ghana. The dataset is presented in two parts: the raw images, which consist of 24,881 images (6,549 cashew, 7,508 cassava, 5,389 maize, and 5,435 tomato), and the augmented images, which are further split into training and test sets and consist of 102,976 images (25,811 cashew, 26,330 cassava, 23,657 maize, and 27,178 tomato), categorized into 22 classes. All images are de-identified, validated by expert plant virologists, and freely available to the research community.
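
    The per-crop counts quoted above can be checked against the stated totals with a few lines of arithmetic. The numbers are copied from the description; the per-crop ratio is only a rough indicator, since the source does not say how many augmented variants were generated per raw image.

```python
# Per-crop image counts as stated in the description.
raw = {"Cashew": 6549, "Cassava": 7508, "Maize": 5389, "Tomato": 5435}
augmented = {"Cashew": 25811, "Cassava": 26330, "Maize": 23657, "Tomato": 27178}

raw_total = sum(raw.values())
augmented_total = sum(augmented.values())
print(raw_total, augmented_total)  # 24881 102976

# Rough augmentation factor per crop (augmented count / raw count).
for crop in raw:
    print(f"{crop}: {augmented[crop] / raw[crop]:.2f}x")
```

    Both sums match the stated totals, and the factors differ by crop, so the augmented set is not a uniform multiple of the raw set.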

  9. Sample Leads Dataset

    • kaggle.com
    zip
    Updated Jun 24, 2022
    Cite
    ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
    Available download formats
    zip (22640 bytes)
    Dataset updated
    Jun 24, 2022
    Authors
    ThatSean
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I have modified this dataset to support an upcoming Towards Data Science article walking through the process. The link will be shared once the article is published.

  10. Data Inventory

    • data.virginia.gov
    • s.cnmilf.com
    • +3 more
    Updated Sep 27, 2024
    Cite
    Fairfax County (2024). Data Inventory [Dataset]. https://data.virginia.gov/dataset/data-inventory
    Available download formats
    arcgis geoservices rest api, geojson, csv, html, zip, kml
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    County of Fairfax
    Authors
    Fairfax County
    Description

    List and description of datasets available on Open Data for Fairfax County, Virginia

  11. Data from: Predicting Long-term Dynamics of Soil Salinity and Sodicity on a...

    • data.mendeley.com
    • geokur-dmp.geo.tu-dresden.de
    Updated Nov 26, 2020
    Cite
    Amirhossein Hassani (2020). Predicting Long-term Dynamics of Soil Salinity and Sodicity on a Global Scale [Dataset]. http://doi.org/10.17632/v9mgbmtnf2.1
    Dataset updated
    Nov 26, 2020
    Authors
    Amirhossein Hassani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset globally (excluding frigid/polar zones) quantifies the different facets of variability in surface soil (0 – 30 cm) salinity and sodicity for the period between 1980 and 2018. This is realised by developing 4-D predictive models of Electrical Conductivity of saturated soil Extract (ECe) and soil Exchangeable Sodium Percentage (ESP) as indicators of soil salinity and sodicity. These machine learning-based models make predictions for ECe and ESP at different times, locations, and depths, and by extracting meaningful statistics from those predictions, different facets of variability in surface soil salinity and sodicity are quantified. The dataset includes 10 maps documenting different aspects of soil salinity and sodicity variations, and the auxiliary data required to generate those maps. Users are referred to the corresponding "READ_ME" file for more information about this dataset.

  12. data

    • huggingface.co
    Updated Oct 26, 2025
    Cite
    Ulvi Shukurzade (2025). data [Dataset]. https://huggingface.co/datasets/shukuzade/data
    Dataset updated
    Oct 26, 2025
    Authors
    Ulvi Shukurzade
    Description

    shukuzade/data dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Police Transparency - Calls for Service - All Data (Dataset)

    • safe-and-secure-communities-tempegov.hub.arcgis.com
    • data-academy.tempe.gov
    • +6 more
    Updated Mar 25, 2025
    Cite
    City of Tempe (2025). Police Transparency - Calls for Service - All Data (Dataset) [Dataset]. https://safe-and-secure-communities-tempegov.hub.arcgis.com/items/d2937ee4e83140559d94080237a6e84c
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Calls for Service dataset includes police service requests for which patrol officers, traffic officers, bike officers and, on occasion, detectives are dispatched for a public safety response. It also includes self-initiated calls for service, where an officer witnesses a violation or suspicious activity and responds. This item consolidates all records.

    Why the Datasets are Organized into Separate Layers

    In January 2022, the Tempe Police Department completed a major transition in how crime data is reported, moving from the FBI Uniform Crime Report program to the enhanced National Incident-Based Reporting System, or NIBRS. NIBRS is now the required reporting method for the FBI. The Uniform Crime Report (UCR) Program's traditional Summary Reporting System (SRS) was limited in comparison to NIBRS, which offers more detailed data collection that provides a deeper understanding of crime and its circumstances. NIBRS captures a wider range of details on crime incidents and can reflect separate offenses occurring during the same event, including information on victims, known offenders, relationships between victims and offenders, arrestees, and property involved in the crimes. With greater specificity in reporting offenses, NIBRS provides more accurate and detailed crime-related information, helps give context to specific crime issues, and affords greater analytic capability. Below is the link to Tempe-specific NIBRS reports. Use the drop-down filters to select Tempe PD, the year, and the type of report.

    Because of these differences, trends and numbers from the two systems should not be directly compared. That’s why we treat 2022 and later (NIBRS) separately from 2021 and earlier (UCR). To make the older data easier to browse, we grouped the data from 2021 and earlier into year ranges instead of showing it all at once. This improves performance and loading speed given the large number of records.

    For detailed guidance on interpreting calls for service data, as well as data scope and limitations, please refer to the User Guide.

    Data Dictionary

    Additional Information
    Contact Email: PD_DataRequest@tempe.gov
    Contact Phone: N/A
    Link: N/A
    Data Source: Versaterm Informix RMS
    Data Source Type: Informix and/or SQL Server
    Preparation Method: Automated process
    Publish Frequency: Daily
    Publish Method: Automatic

  14. VLM-3R-DATA

    • huggingface.co
    Updated Jun 15, 2025
    Cite
    JIAN ZHANG (2025). VLM-3R-DATA [Dataset]. https://huggingface.co/datasets/Journey9ni/VLM-3R-DATA
    Dataset updated
    Jun 15, 2025
    Authors
    JIAN ZHANG
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Journey9ni/VLM-3R-DATA dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. County of San Diego-DEH HMD Hazardous Waste and Materials Data - Datasets -...

    • civicdata.com
    Updated Sep 11, 2024
    Cite
    (2024). County of San Diego-DEH HMD Hazardous Waste and Materials Data - Datasets - CivicData.com [Dataset]. https://civicdata.com/dataset/deh_hmd_hupfp_inventory_16965
    Dataset updated
    Sep 11, 2024
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    San Diego County
    Description

    County of San Diego-DEH HMD Hazardous Waste...CSV

  16. Data sets

    • figshare.com
    xlsx
    Updated Aug 21, 2020
    Cite
    McKay Cavanaugh (2020). Data sets [Dataset]. http://doi.org/10.6084/m9.figshare.12783944.v1
    Available download formats
    xlsx
    Dataset updated
    Aug 21, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    McKay Cavanaugh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All raw data sets

  17. PG&E: Energy Usage Data

    • openenergyhub.ornl.gov
    Updated Nov 7, 2025
    Cite
    (2025). PG&E: Energy Usage Data [Dataset]. https://openenergyhub.ornl.gov/explore/dataset/pg-and-e-energy-usage-data/
    Dataset updated
    Nov 7, 2025
    Description

    Note: data is continuously updated. PG&E provides non-confidential, aggregated usage data that are available to the public and updated on a quarterly basis. These public datasets consist of monthly consumption aggregated by ZIP code and by customer segment: Residential, Commercial, Industrial and Agricultural. The public datasets must meet the standards for aggregating and anonymizing customer data pursuant to CPUC Decision 14-05-016, as follows: a minimum of 100 Residential customers; a minimum of 15 Non-Residential customers, with no single Non-Residential customer in each sector accounting for more than 15% of the total consumption. If the aggregation standard is not met, the consumption will be combined with a neighboring ZIP code until the aggregation requirements are met.
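
    The CPUC aggregation thresholds quoted above can be expressed as a simple check, with neighbor merging as a fallback. This is a sketch of the rule as summarized on this page, not PG&E's actual procedure: the order in which neighboring ZIP codes are merged, the placeholder ZIP names, and the usage values are all hypothetical, and it assumes the 15% test applies to each non-residential segment separately.

```python
def meets_aggregation_standard(segment, customer_usage):
    """Check one ZIP/segment group against the thresholds summarized above.

    segment: "Residential" or a non-residential segment name.
    customer_usage: per-customer consumption values in the group.
    """
    n = len(customer_usage)
    if segment == "Residential":
        return n >= 100
    total = sum(customer_usage)
    if n < 15 or total <= 0:
        return False
    # No single non-residential customer may exceed 15% of total consumption.
    return max(customer_usage) <= 0.15 * total


def aggregate_with_neighbors(segment, usage_by_zip, neighbors, start_zip):
    """Merge neighboring ZIP codes into start_zip until the standard is met.

    neighbors maps a ZIP to a hypothetical, pre-ordered list of adjacent ZIPs.
    Returns (merged ZIP list, whether the standard is met).
    """
    merged = [start_zip]
    usage = list(usage_by_zip[start_zip])
    for nz in neighbors.get(start_zip, []):
        if meets_aggregation_standard(segment, usage):
            break
        merged.append(nz)
        usage.extend(usage_by_zip[nz])
    return merged, meets_aggregation_standard(segment, usage)


# Hypothetical placeholder ZIPs: 60 + 50 residential customers.
usage_by_zip = {"ZIP_A": [1.0] * 60, "ZIP_B": [1.0] * 50}
neighbors = {"ZIP_A": ["ZIP_B"]}
print(aggregate_with_neighbors("Residential", usage_by_zip, neighbors, "ZIP_A"))
# (['ZIP_A', 'ZIP_B'], True)
```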

  18. WMT18 data - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). WMT18 data - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/wmt18-data
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper is the WMT18 data.

  19. Kartdata Svalbard 1:100 000 (S100 Kartdata) / Map Data

    • data.npolar.no
    pdf, png, zip
    Updated Dec 4, 2014
    Cite
    Melvær, Yngve (yngve.melvar@npolar.no) (2014). Kartdata Svalbard 1:100 000 (S100 Kartdata) / Map Data [Dataset]. http://doi.org/10.21334/npolar.2014.645336c7
    Available download formats
    pdf, png, zip
    Dataset updated
    Dec 4, 2014
    Dataset provided by
    Norwegian Polar Data Centre
    Authors
    Melvær, Yngve (yngve.melvar@npolar.no)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    http://spdx.org/licenses/CC0-1.0

    Time period covered
    Aug 15, 2013
    Area covered
    Description

    The most detailed map dataset covering the entire land area of Svalbard. The product's content largely corresponds to the map series Svalbard 1:100 000. The product is updated several times a year.

    Quality

    Parts of the map data are of older date and not suited for navigation. Data quality is indicated at object level in the map datasets (the SOSI attributes "målemetode" (measuring method) and "nøyaktighet" (accuracy)). Elevation at point and node level is present only in the SOSI files.

  20. eval-grounding-data

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    mlfoundations-cua-dev (2025). eval-grounding-data [Dataset]. https://huggingface.co/datasets/mlfoundations-cua-dev/eval-grounding-data
    Dataset updated
    Jul 27, 2025
    Dataset authored and provided by
    mlfoundations-cua-dev
    Description

    mlfoundations-cua-dev/eval-grounding-data dataset hosted on Hugging Face and contributed by the HF Datasets community

Orange dataset table

3 scholarly articles cite this dataset (according to Google Scholar).