Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Collection of two datasets from the UCI website that can be used for structure learning tasks. It includes datasets on:
Air Quality
US Census 1990
Size: Two datasets of sizes 9471*17 and 2458285*68, respectively
Number of features: 15-68
Ground truth: No
Type of Graph: No ground truth
More information about the datasets is contained in the dataset_description.html files.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Introduction: The dataset used for this experiment consists of real-world data acquired from the UCI Machine Learning Repository [13]. The title of the dataset is 'Communities and Crime Unnormalized'. It combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. The dataset contains a total of 147 attributes and 2216 instances.
The per capita crime variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values).
The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The crime attributes (N=18) that could be predicted are the 8 crimes considered 'Index Crimes' by the FBI (Murders, Rape, Robbery, ....), per capita (actually per 100,000 population) versions of each, and Per Capita Violent Crimes and Per Capita Nonviolent Crimes.
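As a minimal sketch of how a rate per 100,000 population of the kind described above can be computed (the function name and the example counts are hypothetical and are not taken from the dataset):

```python
def rate_per_100k(crime_count: int, population: int) -> float:
    """Return the number of crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that round figures stay exact in floating point.
    return crime_count * 100_000 / population


# Hypothetical community: 45 robberies and 30,000 residents.
example_rate = rate_per_100k(45, 30_000)
print(example_rate)
```

Because the crime variables use the 1995 FBI population values, the same population source would need to be used for the denominator when recomputing these rates.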
Predictive variables: 125; non-predictive variables: 4; potential goal/response variables: 18
http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized
U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files),
U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)
U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)
U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)
Data available in the dataset may not act as a complete source of information for identifying factors that contribute to more violent and non-violent crimes as many relevant factors may still be missing.
However, I would like to try to answer the following questions.
Did the number of vacant and occupied houses, and the length of time the houses were vacant, contribute to any significant change in violent and non-violent crime rates in communities?
How has unemployment changed the crime rate (violent and non-violent) in the communities?
Were people from a particular age group more vulnerable to crime?
Does ethnicity play a role in crime rate?
Has education played a role in bringing down the crime rate?
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data was obtained from the following website: https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
Sakar, C. and Kastro, Yomi. (2018). Online Shoppers Purchasing Intention Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5F88Q.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coupled Model Intercomparison Project Phase 6 (CMIP6) datasets. These data include all datasets published for 'CMIP6.PAMIP.UCI.E3SM-1-0.pdSST-pdSICSIT' with the full Data Reference Syntax following the template 'mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version'.
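The Data Reference Syntax template above can be read as a list of field names for the dot-separated components of an identifier. A minimal sketch, assuming identifiers contain no dots inside a component (the helper name `parse_drs` is hypothetical and is not part of any CMIP6 tool):

```python
# Field names come from the Data Reference Syntax template quoted above.
DRS_TEMPLATE = (
    "mip_era.activity_id.institution_id.source_id.experiment_id."
    "member_id.table_id.variable_id.grid_label.version"
)


def parse_drs(dataset_id: str) -> dict:
    """Map each dot-separated component of an identifier to its DRS field name.

    Identifiers shorter than the template (such as an experiment-level
    identifier) fill only the leading fields.
    """
    fields = DRS_TEMPLATE.split(".")
    parts = dataset_id.split(".")
    if len(parts) > len(fields):
        raise ValueError("identifier has more components than the template")
    return dict(zip(fields, parts))


parsed = parse_drs("CMIP6.PAMIP.UCI.E3SM-1-0.pdSST-pdSICSIT")
for field, value in parsed.items():
    print(f"{field}: {value}")
```

The identifier quoted above stops at the experiment level, so only the first five fields are populated; the remaining fields (member_id through version) distinguish the individual datasets published under it.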
The E3SM 1.0 (Energy Exascale Earth System Model) climate model, released in 2018, includes the following components: aerosol: MAM4 with resuspension, marine organics, and secondary organics (same grid as atmos), atmos: EAM (v1.0, cubed sphere spectral-element grid; 5400 elements with p=3; 1 deg average grid spacing; 90 x 90 x 6 longitude/latitude/cubeface; 72 levels; top level 0.1 hPa), atmosChem: Troposphere specified oxidants for aerosols. Stratosphere linearized interactive ozone (LINOZ v2) (same grid as atmos), land: ELM (v1.0, cubed sphere spectral-element grid; 5400 elements with p=3; 1 deg average grid spacing; 90 x 90 x 6 longitude/latitude/cubeface; satellite phenology mode), MOSART (v1.0, 0.5 degree latitude/longitude grid), ocean: MPAS-Ocean (v6.0, oEC60to30 unstructured SVTs mesh with 235160 cells and 714274 edges, variable resolution 60 km to 30 km; 60 levels; top grid cell 0-10 m), seaIce: MPAS-Seaice (v6.0, same grid as ocean). The model was run by the Department of Earth System Science, University of California Irvine, Irvine, CA 92697, USA (UCI) in native nominal resolutions: aerosol: 100 km, atmos: 100 km, atmosChem: 100 km, land: 100 km, ocean: 50 km, seaIce: 50 km.
Project: These data have been generated as part of the internationally-coordinated Coupled Model Intercomparison Project Phase 6 (CMIP6; see also GMD Special Issue: http://www.geosci-model-dev.net/special_issue590.html). The simulation data provides a basis for climate research designed to answer fundamental science questions and serves as a resource for authors of the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6).
CMIP6 is a project coordinated by the Working Group on Coupled Modelling (WGCM) as part of the World Climate Research Programme (WCRP). Phase 6 builds on previous phases executed under the leadership of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and relies on the Earth System Grid Federation (ESGF) and the Centre for Environmental Data Analysis (CEDA) along with numerous related activities for implementation. The original data is hosted and partially replicated on a federated collection of data nodes, and most of the data relied on by the IPCC is being archived for long-term preservation at the IPCC Data Distribution Centre (IPCC DDC) hosted by the German Climate Computing Center (DKRZ).
The project includes simulations from about 120 global climate models and around 45 institutions and organizations worldwide. Project website: https://pcmdi.llnl.gov/CMIP6.
https://www.gnu.org/licenses/gpl-3.0.html
Data for joblib example on compression, to make sure we can always serve it. Please don't use this data but refer to the original website: http://kdd.ics.uci.edu/databases/kddcup99/task.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coupled Model Intercomparison Project Phase 6 (CMIP6) datasets. These data include all datasets published for 'CMIP6.PAMIP.NCAR.CESM1-WACCM-SC.futSST-pdSIC' with the full Data Reference Syntax following the template 'mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version'.
The Community Earth System Model 1, with the Whole Atmosphere Community Climate Model and Specified Chemistry climate model, released in 2011, includes the following components: aerosol: MOZART-specified (same grid as atmos), atmos: WACCM4 (1.9x2.5 finite volume grid; 144 x 96 longitude/latitude; 66 levels; top level 5.9e-06 mb), atmosChem: MOZART-specified (same grid as atmos), land: CLM4.0, ocean: POP2 (320 x 384 longitude/latitude; 60 levels; top grid cell 0-10 m), ocnBgchem: BEC (same grid as ocean), seaIce: CICE4 (same grid as ocean). The model was run by the Department of Earth System Science, University of California Irvine, Irvine, CA 92697, USA (UCI) in native nominal resolutions: aerosol: 250 km, atmos: 250 km, atmosChem: 250 km, land: 250 km, ocean: 100 km, ocnBgchem: 100 km, seaIce: 100 km.
From UCI Dataset Repository. From the website:
Source: Clara Higuera Department of Software Engineering and Artificial Intelligence, Faculty of Informatics and the Department of Biochemistry and Molecular Biology, Faculty of Chemistry, University Complutense, Madrid, Spain. Email: clarahiguera '@' ucm.es
Katheleen J. Gardiner, creator and owner of the protein expression data, is currently with the Linda Crnic Institute for Down Syndrome, Department of Pediatrics, Department of Biochemistry and Molecular Genetics, Human Medical Genetics and Genomics, and Neuroscience Programs, University of Colorado, School of Medicine, Aurora, Colorado, USA. Email: katheleen.gardiner '@' ucdenver.edu
Krzysztof J. Cios is currently with the Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA, and IITiS Polish Academy of Sciences, Poland. Email: kcios '@' vcu.edu
Data Set Information: The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of cortex. There are 38 control mice and 34 trisomic mice (Down syndrome), for a total of 72 mice. In the experiments, 15 measurements were registered of each protein per sample/mouse. Therefore, for control mice, there are 38x15, or 570 measurements, and for trisomic mice, there are 34x15, or 510 measurements. The dataset contains a total of 1080 measurements per protein. Each measurement can be considered as an independent sample/mouse.
The eight classes of mice are described based on features such as genotype, behavior and treatment. According to genotype, mice can be control or trisomic. According to behavior, some mice have been stimulated to learn (context-shock) and others have not (shock-context). To assess the effect of the drug memantine in recovering the ability to learn in trisomic mice, some mice have been injected with the drug and others have not.
Classes:
c-CS-s: control mice, stimulated to learn, injected with saline (9 mice)
c-CS-m: control mice, stimulated to learn, injected with memantine (10 mice)
c-SC-s: control mice, not stimulated to learn, injected with saline (9 mice)
c-SC-m: control mice, not stimulated to learn, injected with memantine (10 mice)
t-CS-s: trisomy mice, stimulated to learn, injected with saline (7 mice)
t-CS-m: trisomy mice, stimulated to learn, injected with memantine (9 mice)
t-SC-s: trisomy mice, not stimulated to learn, injected with saline (9 mice)
t-SC-m: trisomy mice, not stimulated to learn, injected with memantine (9 mice)
The aim is to identify subsets of proteins that are discriminant between the classes.
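The counts quoted above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch built from the per-class mouse counts and the 15 measurements per mouse stated in the description; the variable names are illustrative:

```python
MEASUREMENTS_PER_MOUSE = 15

# Number of mice per class, as listed in the class descriptions.
mice_per_class = {
    "c-CS-s": 9, "c-CS-m": 10, "c-SC-s": 9, "c-SC-m": 10,
    "t-CS-s": 7, "t-CS-m": 9, "t-SC-s": 9, "t-SC-m": 9,
}

# The first character of each label encodes the genotype (c = control, t = trisomy).
control_mice = sum(n for label, n in mice_per_class.items() if label.startswith("c-"))
trisomic_mice = sum(n for label, n in mice_per_class.items() if label.startswith("t-"))

control_measurements = control_mice * MEASUREMENTS_PER_MOUSE
trisomic_measurements = trisomic_mice * MEASUREMENTS_PER_MOUSE
total_measurements = control_measurements + trisomic_measurements

print(control_mice, trisomic_mice, control_mice + trisomic_mice)
print(control_measurements, trisomic_measurements, total_measurements)
```

The per-class counts sum to the 38 control and 34 trisomic mice given in the description, and the measurement totals agree with the stated 570, 510 and 1080.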
Attribute Information:
1 Mouse ID
2..78 Values of expression levels of 77 proteins; the names of proteins are followed by “_n”, indicating that they were measured in the nuclear fraction. For example: DYRK1A_n
79 Genotype: control (c) or trisomy (t)
80 Treatment type: memantine (m) or saline (s)
81 Behavior: context-shock (CS) or shock-context (SC)
82 Class: c-CS-s, c-CS-m, c-SC-s, c-SC-m, t-CS-s, t-CS-m, t-SC-s, t-SC-m
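Each class label combines the genotype, behavior and treatment codes listed above, in that order. A minimal sketch of decoding a label into its three components (the `MouseClass` type and `parse_class` function are hypothetical helpers, not part of the dataset):

```python
from typing import NamedTuple

# Code tables taken from the attribute descriptions (attributes 79 to 81).
GENOTYPES = {"c": "control", "t": "trisomy"}
BEHAVIORS = {"CS": "context-shock", "SC": "shock-context"}
TREATMENTS = {"m": "memantine", "s": "saline"}


class MouseClass(NamedTuple):
    genotype: str
    behavior: str
    treatment: str


def parse_class(label: str) -> MouseClass:
    """Decode a class label such as 't-CS-m' into its three components."""
    genotype, behavior, treatment = label.split("-")
    return MouseClass(GENOTYPES[genotype], BEHAVIORS[behavior], TREATMENTS[treatment])


print(parse_class("t-CS-m"))
```

An unknown code raises a `KeyError`, which makes malformed labels visible instead of silently mapping them to a default.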
Relevant Papers:
The posted data were analyzed by: Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome. PLoS ONE 10(6): e0129126. doi: 10.1371/journal.pone.0129126
The data are a subset of the data analyzed by: Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Stasko M, et al. (2015) Protein Dynamics Associated with Failed and Rescued Learning in the Ts65Dn Mouse Model of Down Syndrome. PLoS ONE 10(3): e0119491.
This is the replication Stata code and log file for the Journal of Politics research note, "The Gender Readings Gap in Political Science Graduate Training," by Heidi Hardt, Amy Erica Smith, Hannah June Kim and Philippe Meister. For our searchable database, see our website here: http://gradtraining.socsci.uci.edu/