Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM for 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
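The description does not say which standardization was applied. A minimal sketch, assuming a per-feature z-score (zero mean, unit standard deviation), is shown below; the function name and the toy values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Z-score each feature column: (x - column mean) / column std."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        # Constant columns (std == 0) are mapped to 0.0 to avoid dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]

# Hypothetical toy matrix: 3 samples x 2 features (not real measurements).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize_columns(toy)
scaled_columns = list(zip(*scaled))
```

After this step every column of `scaled` has mean 0 and population standard deviation 1, so features measured on different scales contribute comparably to Euclidean distances.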
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to the instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
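To make the splitting criterion concrete, the sketch below computes the gain ratio (information gain divided by the split information) for a candidate split and applies two of the stopping settings listed above. It is a plain-Python illustration, not Orange's implementation; the function names, the interpretation of the stopping settings, and the toy labels are assumptions.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(labels, branches):
    """Gain ratio of a split; branches[i] is the branch that sample i falls into."""
    total = len(labels)
    groups = {}
    for label, branch in zip(labels, branches):
        groups.setdefault(branch, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(branches)  # penalizes splits with many small branches
    return info_gain / split_info if split_info > 0 else 0.0

def should_stop(labels, min_split=5, majority=0.95):
    """Assumed reading of the settings: too few samples, or a dominant class."""
    top = max(Counter(labels).values())
    return len(labels) < min_split or top / len(labels) >= majority

# Hypothetical toy node with two classes of four samples each.
toy_labels = ["control"] * 4 + ["treated"] * 4
perfect_split = ["left"] * 4 + ["right"] * 4   # separates the classes fully
useless_split = ["left", "right"] * 4          # each branch keeps a 50/50 mix
perfect_ratio = gain_ratio(toy_labels, perfect_split)
useless_ratio = gain_ratio(toy_labels, useless_split)
```

A split that separates the classes completely scores highest, while a split that leaves each branch with the original class mix scores zero, which is why the tree prefers the former.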
This dataset contains data for the Healthcare Payments Data (HPD) Snapshot visualization. The Enrollment data file contains counts of claims and encounter data collected for California's statewide HPD Program. It includes counts of enrollment records, service records from medical and pharmacy claims, and the number of individuals represented across these records. Aggregate counts are grouped by payer type (Commercial, Medi-Cal, or Medicare), product type, and year. The Medical data file contains counts of medical procedures from medical claims and encounter data in HPD. Procedures are categorized using claim line procedure codes and grouped by year, type of setting (e.g., outpatient, laboratory, ambulance), and payer type. The Pharmacy data file contains counts of drug prescriptions from pharmacy claims and encounter data in HPD. Prescriptions are categorized by name and drug class using the reported National Drug Code (NDC) and grouped by year, payer type, and whether the drug dispensed is branded or generic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Test Depeca Data is a dataset for object detection tasks - it contains Polyps annotations for 460 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This data set shows 311 service requests in the City of Pittsburgh. This data is collected from the request intake software used by the 311 Response Center in the Department of Innovation & Performance. Requests are collected from phone calls, tweets, emails, a form on the City website, and through the 311 mobile application. For more information, see the 311 Data User Guide. If you are unable to download the 311 Data table due to a 504 Gateway Timeout error, use this link instead: https://tools.wprdc.org/downstream/76fda9d0-69be-4dd5-8108-0de7907fc5a4 NOTE: The data feed for this dataset is broken as of December 21st, 2022. We're working on restoring it.
MY NASA DATA (MND) is a tool that allows anyone to make use of satellite data that was previously unavailable. Through MND's Live Access Server (LAS), a multitude of charts, plots and graphs can be generated using a wide variety of constraints. This site provides a large number of lesson plans on a wide variety of topics, all with students in mind. Not only can you use our lesson plans, you can also use the LAS to improve the ones that you are currently implementing in your classroom.
The BOREAS TF-10 team collected tower flux and meteorological data at two sites, a fen and a young jack pine forest, near Thompson, Manitoba, Canada, as part of BOREAS. A preliminary data set was assembled in August 1993 while field testing the instrument packages, and at both sites data were collected from 15-Aug to 31-Aug. The main experimental period was in 1994, when continuous data were collected from 08-Apr to 23-Sept at the fen site. A very limited experiment was run in the spring/summer of 1995, when the fen site tower was operated from 08-Apr to 14-Jun in support of a hydrology experiment in an adjoining feeder basin. Upon examination of the 1994 data set, it became clear that the behavior of the heat, water, and carbon dioxide fluxes throughout the whole growing season was an important scientific question, and that the 1994 data record was not sufficiently long to capture the character of the seasonal behavior of the fluxes. Thus, the fen site was operated in 1996 in order to collect data from spring melt to autumn freeze-up. Data were collected from 29-Apr to 05-Nov at the fen site. All variables are presented as 30-minute averages.
The Maine Geological Survey (MGS) and the USGS coordinate the collection of snow measurements each winter for the Maine River Flow Advisory Commission's flood prediction report. These measurements are sent to MGS monthly in January and February and weekly in March, April and May as long as there is snow on the ground. The dataset contains all the raw snow survey measurements (depth (inches), water content (inches), and density), their locations, data quality, and other qualitative comments or observations. These measurements are used to create the snow survey statewide maps.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The application of Artificial Intelligence (AI) has been evident in the agricultural sector recently. The main goal of AI in agriculture is to improve crop yield, control crop pests/diseases, and reduce cost. The agricultural sector in developing countries faces severe challenges in the form of disease and pest infestation, the knowledge gap between farmers and technology, and a lack of storage facilities, among others. To help address some of these challenges, this work presents crop pest/disease datasets sourced from local farms in Ghana. The dataset is presented in two parts: the raw images, which consist of 24,881 images (6,549 Cashew, 7,508 Cassava, 5,389 Maize, and 5,435 Tomato), and the augmented images, which are further split into train and test sets and consist of 102,976 images (25,811 Cashew, 26,330 Cassava, 23,657 Maize, and 27,178 Tomato), categorized into 22 classes. All images are de-identified, validated by expert plant virologists, and freely available for use by the research community.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I modified this dataset to support an upcoming Towards Data Science article that walks through the process. A link will be shared once the article is published.
List and description of datasets available on Open Data for Fairfax County, Virginia
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset quantifies, globally (excluding frigid/polar zones), the different facets of variability in surface soil (0–30 cm) salinity and sodicity for the period between 1980 and 2018. This is realised by developing 4-D predictive models of Electrical Conductivity of saturated soil Extract (ECe) and soil Exchangeable Sodium Percentage (ESP) as indicators of soil salinity and sodicity. These machine learning-based models make predictions for ECe and ESP at different times, locations, and depths, and by extracting meaningful statistics from those predictions, different facets of variability in the surface soil salinity and sodicity are quantified. The dataset includes 10 maps documenting different aspects of soil salinity and sodicity variations, and the auxiliary data required to generate those maps. Users are referred to the corresponding "READ_ME" file for more information about this dataset.
shukuzade/data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Calls for Service dataset includes police service requests for which patrol officers, traffic officers, bike officers and, on occasion, detectives are dispatched for a public safety response. It also includes self-initiated calls for service, where an officer witnesses a violation or suspicious activity and responds to it. This item is a consolidation of all records.
Why the Datasets are Organized into Separate Layers
In January of 2022, the Tempe Police Department completed a major transition in how crime data is reported, moving from the FBI Uniform Crime Report program to the enhanced National Incident-Based Reporting System, or NIBRS. NIBRS is now the required reporting method for the FBI. The Uniform Crime Report (UCR) Program's traditional Summary Reporting System (SRS) was limited in comparison to NIBRS, which offers more detailed data collection that provides a deeper understanding of crime and its circumstances. NIBRS captures a wider range of details on crime incidents and can reflect separate offenses occurring during the same event, including information on victims, known offenders, relationships between victims and offenders, arrestees, and property involved in the crimes. With greater specificity in reporting offenses, NIBRS provides more accurate and detailed crime-related information, and helps give context to specific crime issues while affording greater analytic capability. Below is the link to Tempe-specific NIBRS reports. Use the drop-down filters to select Tempe PD, the year, and the type of report. Because of these differences, trends and numbers between the two systems should not be directly compared. That's why we treat 2022 and later (NIBRS) separately from 2021 and earlier (UCR). To make the older data easier to browse, we grouped the data from 2021 and earlier into year ranges instead of showing it all at once. This helps with performance and loading speed, given the large number of records.
For detailed guidance on interpreting calls for service data, as well as data scope and limitations, please refer to the User Guide.
Data Dictionary
Additional Information
Contact Email: PD_DataRequest@tempe.gov
Contact Phone: N/A
Link: N/A
Data Source: Versaterm Informix RMS
Data Source Type: Informix and/or SQL Server
Preparation Method: Automated process
Publish Frequency: Daily
Publish Method: Automatic
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Journey9ni/VLM-3R-DATA dataset hosted on Hugging Face and contributed by the HF Datasets community
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
County of San Diego-DEH HMD Hazardous Waste...CSV
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All raw data sets
Note: data is continuously updated. PG&E provides non-confidential, aggregated usage data that are available to the public and updated on a quarterly basis. These public datasets consist of monthly consumption aggregated by ZIP code and by customer segment: Residential, Commercial, Industrial and Agricultural. The public datasets must meet the standards for aggregating and anonymizing customer data pursuant to CPUC Decision 14-05-016, as follows: a minimum of 100 Residential customers; a minimum of 15 Non-Residential customers, with no single Non-Residential customer in each sector accounting for more than 15% of the total consumption. If the aggregation standard is not met, the consumption will be combined with a neighboring ZIP code until the aggregation requirements are met.
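The aggregation standard described above can be read as a simple per-ZIP, per-segment check. The sketch below is one hedged interpretation of that rule, not PG&E's actual procedure; the function name, the input shape (a list of per-customer consumption values), and the toy numbers are assumptions. How a neighboring ZIP code is chosen for combining is not stated in the description and is left out.

```python
def meets_aggregation_standard(segment, customer_usage):
    """Check one ZIP code and segment against the thresholds quoted above.

    customer_usage: per-customer consumption values (hypothetical input shape).
    """
    count = len(customer_usage)
    if segment == "Residential":
        return count >= 100
    # Non-Residential: Commercial, Industrial or Agricultural.
    if count < 15:
        return False
    total = sum(customer_usage)
    if total <= 0:
        return True  # no consumption, so no single customer can dominate
    # No single customer may account for more than 15% of the total.
    return max(customer_usage) <= 0.15 * total

# Hypothetical toy inputs, not real usage data.
residential_ok = meets_aggregation_standard("Residential", [500.0] * 100)
residential_small = meets_aggregation_standard("Residential", [500.0] * 99)
commercial_ok = meets_aggregation_standard("Commercial", [1.0] * 15)
commercial_dominated = meets_aggregation_standard("Commercial", [100.0] + [1.0] * 19)
```

A ZIP code that fails the check would, per the description, have its consumption merged with a neighboring ZIP code and be checked again until the requirements are met.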
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
http://spdx.org/licenses/CC0-1.0
The most detailed map dataset with full coverage of the land area of Svalbard. The content of the product largely corresponds to the map series Svalbard 1:100 000. The product is updated several times a year.
Parts of the map data are of older date and are not suitable for navigation. Data quality is indicated at object level in the map datasets (the SOSI attributes "målemetode" (measuring method) and "nøyaktighet" (accuracy)). Elevation at point and node level is given only in the SOSI files.
mlfoundations-cua-dev/eval-grounding-data dataset hosted on Hugging Face and contributed by the HF Datasets community