Models and external data of the 3rd-place efficiency solution for the https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data competition.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-process-external-data for links to external data and processing code.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-train for the training code that generated the models.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-inference for the inference code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset globally (excluding frigid/polar zones) quantifies the different facets of variability in surface soil (0–30 cm) salinity and sodicity for the period between 1980 and 2018. This is realised by developing 4-D predictive models of Electrical Conductivity of saturated soil Extract (ECe) and soil Exchangeable Sodium Percentage (ESP) as indicators of soil salinity and sodicity. These machine learning-based models make predictions for ECe and ESP at different times, locations, and depths; by extracting meaningful statistics from those predictions, the different facets of variability in surface soil salinity and sodicity are quantified. The dataset includes 10 maps documenting different aspects of soil salinity and sodicity variations, plus the auxiliary data required to generate those maps. Users are referred to the corresponding "READ_ME" file for more information about this dataset.
Development plan “Field Path No. 129 — Construction Line” of the city of Großbottwar, transformed in accordance with INSPIRE and based on an XPlanung dataset in version 5.0.
Data from the "Resistance Against Manipulative AI: key factors and possible actions" article
"Coal Fields of the Conterminous United States" is a digital representation of James Trumbull's "Coal Fields of the United States" (sheet 1, 1960), which is an adaptation of previous maps by Averitt (1942) and Campbell(1908). It is intended to be the first in a series of open file reports that will eventually result in an I-series map that conforms to the U.S. Geological Survey mapping standards. For this edition, coal boundaries were digitized from Trumbull and plotted to represent as closely as possible the original map. In addition, the Gulf Province was updated using generalized boundaries of coal bearing formations digitized from various state geological maps.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains soil analysis data with properties such as pH, organic matter (OM), and salinity (EC); major elements (N, P, K, Mg); and some microelements (Fe, Zn, Mn, Cu, B) with a significant impact on plant nutrition.
Agricultural Soil
Panagiotis Tziachris
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IFHEADS01 - Family Units. Published by the Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Notes: As of June 2020 this dataset has been static for several years. Recent versions of NHD High Res may be more detailed than this dataset for some areas, while this dataset may still be more detailed than NHD High Res in other areas. This dataset is considered authoritative as used by CDFW for particular tracking purposes but may not be current or comprehensive for all streams in the state.
National Hydrography Dataset (NHD) high resolution NHDFlowline features for California were originally dissolved on common GNIS_ID or StreamLevel* attributes and routed from mouth to headwater in meters. The results are measured polyline features representing entire streams. Routes on these streams are measured upstream, i.e., the measure at the mouth of a stream is zero and at the upstream end the measure matches the total length of the stream feature. Using GIS tools, a user of this dataset can retrieve the distance in meters upstream from the mouth at any point along a stream feature.** CA_Streams_v3 Update Notes: This version includes over 200 stream modifications and additions resulting from update requests by CDFW staff and others***. New locator fields from the USGS Watershed Boundary Dataset (WBD) have been added for v3 to enhance users' ability to search for or extract subsets of California Streams by hydrologic area. *See the Source Citation section of this metadata for further information on NHD, WBD, NHDFlowline, GNIS_ID and StreamLevel. **See the Data Quality section of this metadata for further explanation of stream feature development. ***Some current NHD data has not yet been included in CA_Streams; the effort to synchronize CA_Streams with NHD is ongoing.
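The upstream-measure lookup described above can be reproduced with common open-source tools. Below is a minimal sketch using GeoPandas and Shapely, assuming the streams layer has been exported to a file in a meter-based projected coordinate system; the file name, feature selection, and coordinates are hypothetical, not part of the dataset.

    import geopandas as gpd
    from shapely.geometry import Point

    # Hypothetical export of the CA_Streams_v3 layer in a meter-based CRS.
    streams = gpd.read_file("CA_Streams_v3.shp")
    stream = streams.iloc[0].geometry  # one dissolved, routed stream feature

    # Hypothetical point of interest, in the same CRS as the layer.
    point = Point(123456.0, 654321.0)

    # project() returns the distance along the line to the position
    # nearest the point; on a mouth-to-headwater routed feature this
    # corresponds to the upstream measure in meters.
    measure_m = stream.project(point)
    print(f"Distance upstream from the mouth: {measure_m:.1f} m")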
The table Vital sign data 2012 is part of the dataset Baltimore Vital Signs Data, available at https://redivis.com/datasets/bp7s-5nnxzmn8t. It contains 56 rows across 215 variables.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The General Services Administration's data.json harvest source. This file contains the metadata for GSA's public data listing shown on data.gov, as defined by the Project Open Data metadata schema.
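Per the Project Open Data metadata schema, a data.json file carries its datasets in a top-level "dataset" array. A minimal sketch of reading such a file, with a hypothetical URL standing in for the actual harvest source location:

    import json
    import urllib.request

    URL = "https://example.gov/data.json"  # hypothetical harvest source URL

    with urllib.request.urlopen(URL) as resp:
        catalog = json.load(resp)

    # Each entry in the "dataset" array describes one public dataset.
    for ds in catalog.get("dataset", []):
        print(ds.get("title"))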
The dataset represents a compilation of user interaction data generated by users who participated in the project's pilot activities in Patras, Greece. Data was generated by users in the SMARTBUY app and includes information about users, stores, product categories, professions, and events.
The dataset comprises the following data:
- users: user account data for the Patras pilot users
- occupation: all possible occupations that the pilot users could choose from
- stores: stores which participated in the Patras pilot
- sel_products_cat: products uploaded to the SMARTBUY platform by retailers
- events: geo-stamped and time-stamped descriptions of a user interaction event (for instance, "user_id 67 rated product_id 722 with rating 4 at location x1 at datetime y1", or "user_id 91 denoted product_id 78 as favorite at location x2 at datetime y2")
- event_types: all possible event types captured by the SMARTBUY platform ('Product searches', 'Product views', 'Featured product', 'Products near you views', 'Product photos browsed', 'Product ratings', 'Clicks on Read More button to read product reviews', 'Clicks on Open map button', 'Clicks on Send this info by email button', 'Products denoted as Favorite')
Privacy-sensitive information such as user names, retailer owner names, store names, and searched keywords has been anonymized.
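A minimal sketch of how the events and event_types tables might be joined for analysis; the file names and the shared key column are assumptions, not the dataset's documented layout:

    import pandas as pd

    events = pd.read_csv("events.csv")            # hypothetical file name
    event_types = pd.read_csv("event_types.csv")  # hypothetical file name

    # Resolve each event's type label via an assumed "event_type_id" key,
    # then count interactions per event type.
    labeled = events.merge(event_types, on="event_type_id", how="left")
    print(labeled.groupby("event_type").size().sort_values(ascending=False))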
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This code was tested with MATLAB R2015a on Ubuntu 14.04 and on Mac OS X 10.9.5.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It has never been easier to solve database-related problems with a SQL-style query language, and the following gives you an opportunity to see how I worked out some of the relationships within the data using the Panoply.io tool.
I was able to load the coronavirus dataset and create a submittable, reusable result. I hope it helps you work in a data warehouse environment.
The following is a list of SQL queries performed on the dataset attached below, with the final output stored in the Exports folder.

Query 1

    SELECT "Province/State" AS "Region", Deaths, Recovered, Confirmed
    FROM "public"."coronavirus_updated"
    WHERE Recovered > (Deaths / 2) AND Deaths > 0

Description: How can we estimate where the coronavirus has infiltrated but patients are recovering effectively? We can view those places by selecting regions where the recovered count exceeds half the death toll.
Query 2

    SELECT country,
           SUM(confirmed) AS "Confirmed Count",
           SUM(Recovered) AS "Recovered Count",
           SUM(Deaths) AS "Death Toll"
    FROM "public"."coronavirus_updated"
    WHERE Recovered > (Deaths / 2) AND Confirmed > 0
    GROUP BY country

Description: This aggregates confirmed, recovered, and death counts per country, restricted to countries where the recovered count exceeds half the death toll.
Query 3

    SELECT country AS "Countries where Coronavirus has reached"
    FROM "public"."coronavirus_updated"
    WHERE confirmed > 0
    GROUP BY country

Description: The coronavirus epidemic has infiltrated multiple countries, and the only way to stay safe is to know which countries have confirmed cases. Here is a list of those countries.
Query 4

    SELECT country,
           SUM(suspected) AS "Suspected Cases under potential CoronaVirus outbreak"
    FROM "public"."coronavirus_updated"
    WHERE suspected > 0 AND deaths = 0 AND confirmed = 0
    GROUP BY country
    ORDER BY SUM(suspected) DESC

Description: The coronavirus is spreading at an alarming rate. Knowing which countries are newly exposed is important, because timely measures taken there could prevent casualties. Here is a list of countries with suspected cases but no confirmed cases or virus-related deaths.
Query 5

    SELECT country,
           SUM(suspected) AS "Coronavirus uncontrolled spread count and human life loss",
           100 * SUM(suspected) / (SELECT SUM(suspected)
                                   FROM "public"."coronavirus_updated")
               AS "Global suspected Exposure of Coronavirus in percentage"
    FROM "public"."coronavirus_updated"
    WHERE suspected > 0 AND deaths = 0
    GROUP BY country
    ORDER BY SUM(suspected) DESC

Description: The coronavirus is gaining ground in particular countries, but how do we measure that? We can measure it as each country's share of suspected patients worldwide, among countries that do not yet have any coronavirus-related deaths. The following query produces that list.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This formatted dataset originates from raw data files from the Institute for Health Metrics and Evaluation Global Burden of Disease study (GBD2017). It is population-weighted worldwide data on male and female cohorts aged 15-69 years, including body mass index (BMI), cardiovascular disease (CVD), and associated dietary, metabolic, and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of BMI, CVD, and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model BMI and CVD risk factors and health outcomes. We derived a BMI multiple regression risk factor formula that satisfied all nine Bradford Hill causality criteria for epidemiology research. We found that animal products and added fats are negatively correlated with CVD early deaths worldwide, but positively correlated with CVD early deaths at high intakes. We interpret this as showing that optimal cardiovascular outcomes come with moderate (not low and not high) intakes of animal foods and added fats.
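A minimal sketch of the kind of multiple regression described above, using statsmodels; the file and column names are illustrative assumptions, not the authors' actual variables or their derived formula:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("gbd2017_formatted.csv")  # hypothetical file name

    # Hypothetical risk-factor columns regressed against a CVD outcome.
    X = sm.add_constant(df[["animal_products", "added_fats", "smoking"]])
    y = df["cvd_early_deaths"]

    model = sm.OLS(y, X).fit()
    print(model.summary())  # coefficients, p-values, R-squared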
For questions, please email davidkcundiff@gmail.com. Thanks.
This dataset was created by MaXiaokai
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This series of fifteen CDs was produced by JPL's Science Digital Data Preservation Task (SDDPT) by migrating the original Mariner Ten image EDRs from old, deteriorating magnetic tapes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
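For readers who want to approximate this workflow outside Orange, here is a minimal scikit-learn sketch with synthetic stand-in data; scikit-learn offers entropy rather than gain ratio and has no majority-stop rule, so the match to the settings above is approximate:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Synthetic stand-in for the 36 standardized samples: 11 features,
    # 9 classes with 4 samples each (not the real measurements).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(36, 11))
    y = np.repeat(np.arange(9), 4)

    tree = DecisionTreeClassifier(criterion="entropy",  # approximates gain ratio
                                  min_samples_leaf=2,
                                  min_samples_split=5,
                                  random_state=0)

    # Stratified cross-validation, as in the study; each fold keeps
    # one sample of every class in the test split.
    cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    scores = cross_validate(tree, X, y, cv=cv,
                            scoring=["accuracy", "f1_macro", "roc_auc_ovr"])
    print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})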
MY NASA DATA (MND) is a tool that allows anyone to make use of satellite data that was previously unavailable. Through the use of MND's Live Access Server (LAS), a multitude of charts, plots and graphs can be generated using a wide variety of constraints. This site provides a large number of lesson plans on a wide variety of topics, all with students in mind. Not only can you use our lesson plans, you can use the LAS to improve the ones that you are currently implementing in your classroom.
This data set contains small-scale base GIS data layers compiled by the National Park Service Servicewide Inventory and Monitoring Program and Water Resources Division for use in a Baseline Water Quality Data Inventory and Analysis Report that was prepared for the park. The report presents the results of surface water quality data retrievals for the park from six of the United States Environmental Protection Agency's (EPA) national databases: (1) Storage and Retrieval (STORET) water quality database management system; (2) River Reach File (RF3) Hydrography; (3) Industrial Facilities Discharges; (4) Drinking Water Supplies; (5) Water Gages; and (6) Water Impoundments. The small-scale GIS data layers were used to prepare the maps included in the report that depict the locations of water quality monitoring stations, industrial discharges, drinking water intakes, water gages, and water impoundments. The data layers included in the maps (and this dataset) vary depending on availability, but generally include roads, hydrography, political boundaries, USGS 7.5-minute quadrangle outlines, hydrologic units, trails, and others as appropriate. The scales of each layer vary depending on data source but are generally 1:100,000.
This data set shows 311 service requests in the City of Pittsburgh. This data is collected from the request intake software used by the 311 Response Center in the Department of Innovation & Performance. Requests are collected from phone calls, tweets, emails, a form on the City website, and through the 311 mobile application. For more information, see the 311 Data User Guide. If you are unable to download the 311 Data table due to a 504 Gateway Timeout error, use this link instead: https://tools.wprdc.org/downstream/76fda9d0-69be-4dd5-8108-0de7907fc5a4 NOTE: The data feed for this dataset is broken as of December 21st, 2022. We're working on restoring it.
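A minimal sketch of pulling the table from the mirror link above with pandas, assuming that endpoint serves the data as CSV:

    import pandas as pd

    # Mirror link quoted above; assumed to return the 311 table as CSV.
    URL = "https://tools.wprdc.org/downstream/76fda9d0-69be-4dd5-8108-0de7907fc5a4"

    requests_311 = pd.read_csv(URL)
    print(requests_311.head())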