23 datasets found

Exploratory Data Analytics and Descriptive Statistics
paper.erudition.co.in
html
Updated Jun 1, 2021
Cite
Einetic (2021). Exploratory Data Analytics and Descriptive Statistics [Dataset]. https://paper.erudition.co.in/makaut/bachelor-in-business-administration-2020-2021/5/data-analytics-skills-for-managers
Explore at:
Available download formats: html
Dataset updated
Jun 1, 2021
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Exploratory Data Analytics and Descriptive Statistics of Data Analytics Skills for Managers, 5th Semester, Bachelor in Business Administration 2020-2021
Black Friday Sales EDA
kaggle.com
Updated Oct 29, 2022
Cite
Rushikesh Konapure (2022). Black Friday Sales EDA [Dataset]. https://www.kaggle.com/datasets/rishikeshkonapure/black-friday-sales-eda
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 29, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rushikesh Konapure
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset History

A retail company “ABC Private Limited” wants to understand the customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city type, stay in the current city), product details (productid and product category) and Total purchase amount from last month.

Now, they want to build a model to predict the purchase amount of customers against various products which will help them to create a personalized offer for customers against different products.

Tasks to perform

The Purchase column is the target variable. Perform univariate analysis, and bivariate analysis with respect to Purchase.

"Masked" in the column description means the column has already been converted from categorical values to numerical values.

The points below are only meant to get you started with the dataset; you do not have to follow the same sequence.

DATA PREPROCESSING

Check the basic statistics of the dataset

Check for missing values in the data

Check for unique values in data

Perform EDA

Purchase Distribution

Check for outliers

Analysis by Gender, Marital Status, occupation, occupation vs purchase, purchase by city, purchase by age group, etc

Drop unnecessary fields

Convert categorical data into integers using the map function (e.g., the 'Gender' column)

Missing value treatment

Rename columns

Fill NaN values

Map range variables into integers (e.g., the 'Age' column)

Data Visualisation

visualize individual column

Age vs Purchased

Occupation vs Purchased

Productcategory1 vs Purchased

Productcategory2 vs Purchased

Productcategory3 vs Purchased

City category pie chart

check for more possible plots

All the Best!!
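The preprocessing steps listed above can be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the sample rows are invented, the column names (`Gender`, `Age`, `Product_Category_2`, `Purchase`) and the age brackets are assumed rather than taken from the dataset files, and filling missing categories with 0 is one possible choice among several.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions,
# not taken from the actual dataset files.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F"],
    "Age": ["0-17", "26-35", "55+", "26-35"],
    "Product_Category_2": [None, 6.0, 14.0, None],
    "Purchase": [8370, 15200, 1422, 7969],
})

# Basic statistics, missing values, and unique values per column
summary = df["Purchase"].describe()
missing = df.isnull().sum()
unique_counts = df.nunique()

# Convert a categorical column to integers with map
df["Gender"] = df["Gender"].map({"F": 0, "M": 1})

# Map age ranges to ordered integer codes (brackets are assumed)
age_codes = {"0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
             "46-50": 4, "51-55": 5, "55+": 6}
df["Age"] = df["Age"].map(age_codes)

# Fill NaN values; using 0 to mean "no category" is an assumption
df["Product_Category_2"] = df["Product_Category_2"].fillna(0).astype(int)

# Bivariate view: mean purchase per gender code
purchase_by_gender = df.groupby("Gender")["Purchase"].mean()

# Outlier check on the target with the 1.5 * IQR rule
q1, q3 = df["Purchase"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Purchase"] < q1 - 1.5 * iqr) |
              (df["Purchase"] > q3 + 1.5 * iqr)]
```

With a real file, the first step would be replaced by a `pd.read_csv` call on the downloaded CSV; the rest of the sketch is independent of how the frame is loaded.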
Data from: Supplementary Material for "Sonification for Exploratory Data...
search.datacite.org
pub.uni-bielefeld.de
Updated Feb 5, 2019
Cite
Thomas Hermann (2019). Supplementary Material for "Sonification for Exploratory Data Analysis" [Dataset]. http://doi.org/10.4119/unibi/2920448
Explore at:
Unique identifier
https://doi.org/10.4119/unibi/2920448
Dataset updated
Feb 5, 2019
Dataset provided by
DataCite (https://www.datacite.org/)
Bielefeld University
Authors
Thomas Hermann
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Sonification for Exploratory Data Analysis

#### Chapter 8: Sonification Models

In Chapter 8 of the thesis, 6 sonification models are presented to give some examples for the framework of Model-Based Sonification, developed in Chapter 7. Sonification models determine the rendering of the sonification and possible interactions. The "model in mind" helps the user to interpret the sound with respect to the data.

##### 8.1 Data Sonograms

Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.

* Table 8.2, page 87: Sound examples for Data Sonograms
  File:
  Iris dataset: started in plot (a) at S0 (b) at S1 (c) at S2
  10d noisy circle dataset: started in plot (c) at S0 (mean) (d) at S1 (edge)
  10d Gaussian: plot (d) started at S0
  3 clusters: Example 1
  3 clusters: invisible columns used as output variables: Example 2
  Description: Data Sonogram sound examples for synthetic datasets and the Iris dataset
  Duration: about 5 s

##### 8.2 Particle Trajectory Sonification Model

This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.

* Sound example: page 93, PTSM-Ex-1 Audification of 1 particle in the potential of phi(x).
* Sound example: page 93, PTSM-Ex-2 Audification of a sequence of 15 particles in the potential of a dataset with 2 clusters.
* Sound example: page 94, PTSM-Ex-3 Audification of 25 particles simultaneously in a potential of a dataset with 2 clusters.
* Sound example: page 94, PTSM-Ex-4 Audification of 25 particles simultaneously in a potential of a dataset with 1 cluster.
* Sound example: page 95, PTSM-Ex-5 sigma-step sequence for a mixture of three Gaussian clusters
* Sound example: page 95, PTSM-Ex-6 sigma-step sequence for a Gaussian cluster
* Sound example: page 96, PTSM-Iris-1 Sonification for the Iris Dataset with 20 particles per step.
* Sound example: page 96, PTSM-Iris-2 Sonification for the Iris Dataset with 3 particles per step.
* Sound example: page 96, PTSM-Tetra-1 Sonification for a 4d tetrahedron clusters dataset.

##### 8.3 Markov chain Monte Carlo Sonification

The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p, particularly concerning the modes of p by sound.

* Sound Example: page 105, MCMC-Ex-1 McMC Sonification, stabilization of amplitudes.
* Sound Example: page 106, MCMC-Ex-2 Trajectory Audification for 100 McMC steps in 3 cluster dataset
* McMC Sonification for Cluster Analysis, dataset with three clusters, page 107
  * Stream 1 MCMC-Ex-3.1
  * Stream 2 MCMC-Ex-3.2
  * Stream 3 MCMC-Ex-3.3
  * Mix MCMC-Ex-3.4
* McMC Sonification for Cluster Analysis, dataset with three clusters, T = 0.002s, page 107
  * Stream 1 MCMC-Ex-4.1 (stream 1)
  * Stream 2 MCMC-Ex-4.2 (stream 2)
  * Stream 3 MCMC-Ex-4.3 (stream 3)
  * Mix MCMC-Ex-4.4
* McMC Sonification for Cluster Analysis, density with 6 modes, T = 0.008s, page 107
  * Stream 1 MCMC-Ex-5.1 (stream 1)
  * Stream 2 MCMC-Ex-5.2 (stream 2)
  * Stream 3 MCMC-Ex-5.3 (stream 3)
  * Mix MCMC-Ex-5.4
* McMC Sonification for the Iris dataset, page 108
  * MCMC-Ex-6.1
  * MCMC-Ex-6.2
  * MCMC-Ex-6.3
  * MCMC-Ex-6.4
  * MCMC-Ex-6.5
  * MCMC-Ex-6.6
  * MCMC-Ex-6.7
  * MCMC-Ex-6.8

##### 8.4 Principal Curve Sonification

Principal Curve Sonification represents data by synthesizing the soundscape while a virtual listener moves along the principal curve of the dataset through the model space.

* Noisy Spiral dataset, PCS-Ex-1.1, page 113
* Noisy Spiral dataset with variance modulation PCS-Ex-1.2, page 114
* 9d tetrahedron cluster dataset (10 clusters) PCS-Ex-2, page 114
* Iris dataset, class label used as pitch of auditory grains PCS-Ex-3, page 114

##### 8.5 Data Crystallization Sonification Model

* Table 8.6, page 122: Sound examples for Crystallization Sonification for 5d Gaussian distribution
  File: DCS started at center, in tail, from far outside
  Description: DCS for dataset sampled from N{0, I_5} excited at different locations
  Duration: 1.4 s
* Mixture of 2 Gaussians, page 122
  * DCS started at point A DCS-Ex1A
  * DCS started at point B DCS-Ex1B
* Table 8.7, page 124: Sound examples for DCS on variation of the harmonics factor
  File: h_omega = 1, 2, 3, 4, 5, 6
  Description: DCS for a mixture of two Gaussians with varying harmonics factor
  Duration: 1.4 s
* Table 8.8, page 124: Sound examples for DCS on variation of the energy decay time
  File: tau_(1/2) = 0.001, 0.005, 0.01, 0.05, 0.1, 0.2
  Description: DCS for a mixture of two Gaussians varying the energy decay time tau_(1/2)
  Duration: 1.4 s
* Table 8.9, page 125: Sound examples for DCS on variation of the sonification time
  File: T = 0.2, 0.5, 1, 2, 4, 8
  Description: DCS for a mixture of two Gaussians on varying the duration T
  Duration: 0.2s -- 8s
* Table 8.10, page 125: Sound examples for DCS on variation of model space dimension
  File: selected columns of the dataset: (x0) (x0,x1) (x0,...,x2) (x0,...,x3) (x0,...,x4) (x0,...,x5)
  Description: DCS for a mixture of two Gaussians varying the dimension
  Duration: 1.4 s
* Table 8.11, page 126: Sound examples for DCS for different excitation locations
  File: starting point: C0, C1, C2
  Description: DCS for a mixture of three Gaussians in 10d space with different rank(S) = {2,4,8}
  Duration: 1.9 s
* Table 8.12, page 126: Sound examples for DCS for the mixture of a 2d distribution and a 5d cluster
  File: condensation nucleus in (x0,x1)-plane at: (-6,0)=C1, (-3,0)=C2, (0,0)=C0
  Description: DCS for a mixture of a uniform 2d and a 5d Gaussian
  Duration: 2.16 s
* Table 8.13, page 127: Sound examples for DCS for the cancer dataset
  File: condensation nucleus in (x0,x1)-plane at: benign 1, benign 2, malignant 1, malignant 2
  Description: DCS for a mixture of a uniform 2d and a 5d Gaussian
  Duration: 2.16 s

##### 8.6 Growing Neural Gas Sonification

* Table 8.14, page 133: Sound examples for GNGS Probing
  File:
  Cluster C0 (2d): a, b, c
  Cluster C1 (4d): a, b, c
  Cluster C2 (8d): a, b, c
  Description: GNGS for a mixture of 3 Gaussians in 10d space
  Duration: 1 s
* Table 8.15, page 134: Sound examples for GNGS for the noisy spiral dataset
  File:
  (a) GNG with 3 neurons 1, 2
  (b) GNG with 20 neurons end, middle, inner end
  (c) GNG with 45 neurons outer end, middle, close to inner end, at inner end
  (d) GNG with 150 neurons outer end, in the middle, inner end
  (e) GNG with 20 neurons outer end, in the middle, inner end
  (f) GNG with 45 neurons outer end, in the middle, inner end
  Description: GNG probing sonification for 2d noisy spiral dataset
  Duration: 1 s
* Table 8.16, page 136: Sound examples for GNG Process Monitoring Sonification for different data distributions
  File:
  Noisy spiral with 1 rotation: sound
  Noisy spiral with 2 rotations: sound
  Gaussian in 5d: sound
  Mixture of 5d and 2d distributions: sound
  Description: GNG process sonification examples
  Duration: 5 s

#### Chapter 9: Extensions

In this chapter, two extensions for Parameter Mapping
ERA5 Reanalysis Monthly Means
data.ucar.edu
rda.ucar.edu
grib
Updated Aug 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
European Centre for Medium-Range Weather Forecasts (2024). ERA5 Reanalysis Monthly Means [Dataset]. http://doi.org/10.5065/D63B5XW1
Explore at:
Available download formats: grib
Unique identifier
https://doi.org/10.5065/D63B5XW1
Dataset updated
Aug 4, 2024
Dataset provided by
Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
Authors
European Centre for Medium-Range Weather Forecasts
Time period covered
Jan 1, 2008 - Dec 31, 2017
Description
Please note: Please use ds633.1 to access RDA maintained ERA-5 Monthly Mean data, see ERA5 Reanalysis (Monthly Mean 0.25 Degree Latitude-Longitude Grid), RDA dataset ds633.1. This dataset is no longer being updated, and web access has been removed. After many years of research and technical preparation, the production of a new ECMWF climate reanalysis to replace ERA-Interim is in progress. ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, which started with the FGGE reanalyses produced in the 1980s, followed by ERA-15, ERA-40 and most recently ERA-Interim. ERA5 will cover the period January 1950 to near real time, though the first segment of data to be released will span the period 2010-2016. ERA5 is produced using high-resolution forecasts (HRES) at 31 kilometer resolution (one fourth the spatial resolution of the operational model) and a 62 kilometer resolution ten member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF's Integrated Forecast System (IFS) with 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in ERA-Interim). Surface or single level data are also available, containing 2D parameters such as precipitation, 2 meter temperature, top of atmosphere radiation and vertical integrals over the entire atmosphere. The IFS is coupled to a soil model, the parameters of which are also designated as surface parameters, and an ocean wave model. Generally, the data is available at an hourly frequency and consists of analyses and short (18 hour) forecasts, initialized twice daily from analyses at 06 and 18 UTC. Most analyses parameters are also available from the forecasts. There are a number of forecast parameters, e.g. mean rates and accumulations, that are not available from the analyses. Together, the hourly analysis and twice daily forecast parameters form the basis of the monthly...
Data from: Evaluating the Use of Uncertainty Visualisations for Imputations...
osf.io
Updated Aug 26, 2024
Cite
Abhraneel Sarma (2024). Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots [Dataset]. https://osf.io/q4y5r
Explore at:
Dataset updated
Aug 26, 2024
Dataset provided by
Center for Open Science (https://cos.io/)
Authors
Abhraneel Sarma
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This repository contains supplementary materials for the paper, Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots

Abstract: Most real-world datasets contain missing values, yet most exploratory data analysis (EDA) systems only support visualising data points with complete cases. This omission may lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but they introduce additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different types of uncertainty visualisation techniques: no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, which are a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users’ bias and precision in performing two tasks (estimating average and detecting trend) and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations may reduce bias but at the cost of decreasing precision. When estimating trend, only hypothetical outcome plots may lead to a small probability of reducing bias while increasing precision. Participants in every uncertainty representation were less certain about their response when compared to the baseline. The findings point towards potential trade-offs in using uncertainty encodings for datasets with a large number of missing values.
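To make the "mean" and "95% confidence interval" conditions in the abstract concrete, here is a minimal sketch of mean imputation with a normal-approximation interval around the imputed value. It is not the study's procedure: the function name, the sample values, and the use of z = 1.96 with the standard error of the mean are all assumptions chosen for illustration.

```python
import math
import statistics

def mean_impute_with_interval(values, z=1.96):
    """Replace None entries with the observed mean and return an
    approximate 95% interval for that mean (normal approximation)."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    # Standard error of the mean from the sample standard deviation
    sem = statistics.stdev(observed) / math.sqrt(len(observed))
    interval = (mean - z * sem, mean + z * sem)
    imputed = [mean if v is None else v for v in values]
    return imputed, interval

# Hypothetical y-values of a scatterplot with two missing points
y = [2.0, None, 4.0, 6.0, None, 8.0]
imputed, (low, high) = mean_impute_with_interval(y)
```

A chart could then draw the imputed points at the mean and use `low` and `high` as the ends of an error bar, which is one simple way to show that those points are estimates rather than observations.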
Iterative Imputation of Jane St train.csv
kaggle.com
Updated Nov 29, 2020
Cite
tpmeli (2020). Iterative Imputation of Jane St train.csv [Dataset]. https://www.kaggle.com/tpmeli/iterative-imputation-of-jane-st-traincsv/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 29, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
tpmeli
Description
I will be sharing all of my missing data exploration here:

https://www.kaggle.com/tpmeli/missing-data-exploration-mean-iterative-more
Data from: Exploratory investigation of historical decorative laminates by...
zenodo.org
Updated Apr 25, 2023
Cite
An Jacquemain; Klara Retko; Lea Legan; Polonca Ropret; Friederike Waentig; Vincent Cattersel (2023). Exploratory investigation of historical decorative laminates by means of vibrational spectroscopic techniques [Dataset]. http://doi.org/10.5281/zenodo.7862015
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7862015
Dataset updated
Apr 25, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
An Jacquemain; Klara Retko; Lea Legan; Polonca Ropret; Friederike Waentig; Vincent Cattersel
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains the data used for the publication entitled "Exploratory investigation of historical decorative laminates by means of vibrational spectroscopic techniques".
Data from: Exploratory Twitter hashtag analysis of movie premieres in the...
portalcientificovalencia.univeuropea.com
Updated 2024
Cite
Yeste, Víctor (2024). Exploratory Twitter hashtag analysis of movie premieres in the USA [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed1aea56d4af0485dad
Explore at:
Dataset updated
2024
Authors
Yeste, Víctor
Area covered
United States
Description
This work is an exploratory, quantitative, non-experimental study with an inductive inference type and a longitudinal follow-up. It analyzes movie data and tweets published by users using the official Twitter hashtags of movie premieres the week before, the same week, and the week after each release date.

The scope of the study is the collection of movies released in February 2022 in the USA, and the object of the study includes them and the tweets that refer to the film in the 3 closest weeks to their premiere dates. The collected tweets were classified by the week they were published, using a time dimension called timepoint. The week before the release date has been designated as timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. Another dimension that has been considered is whether the movie has domestic production: if one of the countries of origin is the United States, the movie is designated as domestic.

The chosen variables are organized in two data tables, one for the movies and one for the collected tweets.

Variables related to the movies:

id: Internal id of the movie
name: Title of the movie
hashtag: Official hashtag of the movie
countries: List of countries of the movie, separated by a semicolon
mpaa: Film ratings system by the Motion Picture Association of America. It is a completely voluntary rating system and ratings have no legal standing. The current ratings include G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian) and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
release_date: Release date of the movie in the format YYYY-MM-DD
opening_grosses: Amount in US dollars that the movie obtained on the opening date (the first week after the release date)
opening_theaters: Number of US theaters that released the movie on the opening date (the first week after the release date)
rating_avg: Average rating of the movie

Variables related to the tweets:

id: Internal id of the tweet
status_id: Twitter id of the tweet
movie_id: Internal id of the movie
timepoint: Week number related to the movie premiere that the tweet was published on. “1” is the week before the movie release, “2” is the week after the movie release, and “3” is the second week after the movie release.
author_id: Twitter id of the author of the tweet
created_at: Date and time of the tweet, with format “YYYY-MM-DD HH:MM:SS”
quote_count: Number of the tweet’s quotes
reply_count: Number of the tweet’s replies
retweet_count: Number of the tweet’s retweets
like_count: Number of the tweet’s likes
sentiment: Sentiment analysis of the tweet’s content with a range from -1 (negative) to 1 (positive)

This dataset has contributed to the elaboration of the book chapters:

Yeste, Víctor; Calduch-Losa, Ángeles (2022). Genre classification of movie releases in the USA: Exploring data with Twitter hashtags. In Narrativas emergentes para la comunicación digital (pp. 1012-1044). Dykinson, S. L.
Yeste, Víctor; Calduch-Losa, Ángeles (2022). Exploratory Twitter hashtag analysis of movie premieres in the USA. In Desafíos audiovisuales de la tecnología y los contenidos en la cultura digital (pp. 169-187). McGraw-Hill Interamericana de España S.L.
Yeste, Víctor; Calduch-Losa, Ángeles (2022). ANOVA to study movie premieres in the USA and online conversation on Twitter. The case of rating average using data from official Twitter hashtags. In El mapa y la brújula. Navegando por las metodologías de investigación en comunicación (pp. 151-168). Editorial Fragua.
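The timepoint and domestic dimensions described above could be derived along these lines. This is a hedged sketch, not the author's code: it follows the definition in the study description (week before, release week, week after), assumes each week is a 7-day window counted from the release date, and assumes the `countries` field spells the country as "United States"; the function names and example dates are hypothetical.

```python
from datetime import date
from typing import Optional

def timepoint(release_date: date, tweet_date: date) -> Optional[int]:
    """Return 1, 2 or 3 for the week before, of, or after the release,
    or None when the tweet falls outside those three weeks.
    Assumption: weeks are 7-day windows counted from the release date."""
    offset = (tweet_date - release_date).days
    if -7 <= offset < 0:
        return 1
    if 0 <= offset < 7:
        return 2
    if 7 <= offset < 14:
        return 3
    return None

def is_domestic(countries: str, home: str = "United States") -> bool:
    """True when the semicolon-separated countries field lists the home
    country. The exact spelling used in the data is an assumption."""
    return home in [c.strip() for c in countries.split(";")]

# Hypothetical movie released on 2022-02-11
release = date(2022, 2, 11)
before = timepoint(release, date(2022, 2, 8))
same_week = timepoint(release, date(2022, 2, 11))
after = timepoint(release, date(2022, 2, 20))
```

If the dataset instead uses calendar weeks (for example Monday to Sunday), the offsets would need to be computed from the start of the release week rather than from the release date itself.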
Data from: Wrist-worn sensor validation for heart rate variability and...
data.mendeley.com
data.niaid.nih.gov
Updated Jun 21, 2023
Cite
Simone Costantini (2023). Wrist-worn sensor validation for heart rate variability and electrodermal activity detection in a stressful driving environment [Dataset]. http://doi.org/10.17632/npnv4tsbg7.1
Explore at:
Unique identifier
https://doi.org/10.17632/npnv4tsbg7.1
Dataset updated
Jun 21, 2023
Authors
Simone Costantini
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset helps assess the accuracy of the Empatica 4 (E4) wristband for the detection of heart rate variability (HRV) and electrodermal activity (EDA) metrics in stress-inducing conditions and growing-risk driving scenarios. HRV and EDA signals were recorded over six experimental conditions (i.e., Baseline, Video Clip, Scream, No Risk Driving, Low-Risk Driving, and High-Risk Driving) by means of two measurement systems: the E4 device and a gold standard system. The raw quality of the physiological signals was enhanced by means of robust semi-automatic reconstruction algorithms. HRV time-domain parameters showed high accuracy in motion-free experimental conditions, while HRV frequency-domain parameters reported sufficient accuracy in almost every experimental condition.

Folder 01 contains both HRV and EDA parameters for every experimental condition, according to the Gold Standard measurement system and the Empatica 4 device, in two separate Excel files.

Folder 02 contains supplementary material on the assessment of signal quality.

Folder 03 contains the Bland-Altman plot for each HRV and EDA parameter and for each condition (one .png file per parameter), and an Excel file that summarizes the numerical outcomes of the Bland-Altman analyses.
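For readers unfamiliar with Bland-Altman analysis, the numerical outcomes it usually reports are the bias (mean difference between the two systems) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below computes those quantities for invented heart-rate values; the function name and the data are hypothetical, and it is not the authors' analysis code.

```python
import statistics

def bland_altman(reference, device, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement between
    paired measurements from a reference system and a test device."""
    diffs = [d - r for r, d in zip(reference, device)]
    bias = statistics.fmean(diffs)   # mean difference
    sd = statistics.stdev(diffs)     # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired readings (e.g., mean heart rate in bpm)
gold_standard = [60.0, 62.0, 64.0, 66.0]
wristband = [61.0, 61.0, 66.0, 66.0]
bias, lower, upper = bland_altman(gold_standard, wristband)
```

In a Bland-Altman plot, each pair is drawn at (mean of the two readings, difference), with horizontal lines at `bias`, `lower`, and `upper`.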
Data from: Exploratory Research on the Impact of the Growing Oil Industry in...
gimi9.com
s.cnmilf.com
Cite
Exploratory Research on the Impact of the Growing Oil Industry in North Dakota and Montana on Domestic Violence, Dating Violence, Sexual Assault, and Stalking, 2000-2015 [Dataset]. https://gimi9.com/dataset/data-gov_3b52792d42c345dc455bcde14b2a752051363cac
Explore at:
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

This study used secondary analysis of data from several different sources to examine the impact of increased oil development on domestic violence, dating violence, sexual assault, and stalking (DVDVSAS) in the Bakken region of Montana and North Dakota. Distributed here is the code used for the secondary analysis data; the data are not available through other public means. Please refer to the User Guide distributed with this study for a list of instructions on how to obtain all other data used in this study.

This collection contains a secondary analysis of the Uniform Crime Reports (UCR). UCR data serve as periodic nationwide assessments of reported crimes not available elsewhere in the criminal justice system. Each year, participating law enforcement agencies contribute reports to the FBI either directly or through their state reporting programs. Distributed here are the codes used to create the datasets and perform the secondary analysis. Please refer to the User Guide, distributed with this study, for more information.

This collection contains a secondary analysis of the National Incident Based Reporting System (NIBRS), a component part of the Uniform Crime Reporting Program (UCR) and an incident-based reporting system for crimes known to the police. For each crime incident coming to the attention of law enforcement, a variety of data were collected about the incident. These data included the nature and types of specific offenses in the incident, characteristics of the victim(s) and offender(s), types and value of property stolen and recovered, and characteristics of persons arrested in connection with a crime incident. NIBRS collects data on each single incident and arrest within 22 offense categories, made up of 46 specific crimes called Group A offenses. In addition, there are 11 Group B offense categories for which only arrest data were reported. NIBRS data on different aspects of crime incidents such as offenses, victims, offenders, arrestees, etc., can be examined as different units of analysis. Distributed here are the codes used to create the datasets and perform the secondary analysis. Please refer to the User Guide, distributed with this study, for more information.

The collection includes 17 SPSS syntax files. Qualitative data collected for this study are not available as part of the data collection at this time.
Mean and standard deviation of SI by BMI group and maternal age.
figshare.com
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Anderson Borovac-Pinheiro; Filipe Moraes Ribeiro; Sirlei Siani Morais; Rodolfo Carvalho Pacagnella (2023). Mean and standard deviation of SI by BMI group and maternal age. [Dataset]. http://doi.org/10.1371/journal.pone.0217907.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0217907.t003
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Anderson Borovac-Pinheiro; Filipe Moraes Ribeiro; Sirlei Siani Morais; Rodolfo Carvalho Pacagnella
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Mean and standard deviation of SI by BMI group and maternal age.
Data from: The effects of exploratory behavior on physical activity in a...
zenodo.org
datadryad.org
bin
Updated Oct 21, 2022
Cite
Cairsty DePasquale (2022). The effects of exploratory behavior on physical activity in a common animal model of human disease, zebrafish (Danio rerio) [Dataset]. http://doi.org/10.5061/dryad.c2fqz61c8
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5061/dryad.c2fqz61c8
Dataset updated
Oct 21, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cairsty DePasquale
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Zebrafish (Danio rerio) are widely accepted as a multidisciplinary vertebrate model for neurobehavioral and clinical studies, and more recently have become established as a model for exercise physiology and behavior. Individual differences in activity level (e.g., exploration) have been characterized in zebrafish, however, how different levels of exploration correspond to differences in motivation to engage in swimming behavior has not yet been explored. We screened individual zebrafish in two tests of exploration: the open field and novel tank diving tests. The fish were then exposed to a tank in which they could choose to enter a compartment with a flow of water (as a means of testing voluntary motivation to exercise). After a 2-day habituation period, behavioral observations were conducted. We used correlative analyses to investigate the robustness of the different exploration tests. Due to the complexity of dependent behavioral variables, we used machine learning to determine the personality variables that were best at predicting swimming behavior. Our results show that contrary to our predictions, the correlation between novel tank diving test variables and open field test variables was relatively weak. Novel tank diving variables were more correlated with themselves than open field variables were to each other. Males exhibited stronger relationships between behavioral variables than did females. In terms of swimming behavior, fish that spent more time in the swimming zone spent more time actively swimming, however, swimming behavior was inconsistent across the time of the study. All relationships between swimming variables and exploration tests were relatively weak, though novel tank diving test variables had stronger correlations. 
Machine learning showed that three novel tank diving variables (entries top/bottom, movement rate, average top entry duration) and one open field variable (proportion of time spent frozen) were the best predictors of swimming behavior, demonstrating that the novel tank diving test is a powerful tool to investigate exploration. Increased knowledge about how individual differences in exploration may play a role in swimming behavior in zebrafish is fundamental to their utility as a model of exercise physiology and behavior.
ERA5 Reanalysis Model Level Data
data.ucar.edu
rda.ucar.edu
netcdf
Updated Mar 8, 2025
Cite
European Centre for Medium-Range Weather Forecasts (2025). ERA5 Reanalysis Model Level Data [Dataset]. http://doi.org/10.5065/XV5R-5344
Explore at:
Available download formats: netcdf
Unique identifier
https://doi.org/10.5065/XV5R-5344
Dataset updated
Mar 8, 2025
Dataset provided by
Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
Authors
European Centre for Medium-Range Weather Forecasts
Time period covered
Jan 1, 1979 - Dec 31, 2024
Area covered
Description
After many years of research and technical preparation, the production of a new ECMWF climate reanalysis to replace ERA-Interim is in progress. ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, which started with the FGGE reanalyses produced in the 1980s, followed by ERA-15, ERA-40 and most recently ERA-Interim. ERA5 will cover the period January 1950 to near real time. ERA5 is produced using high-resolution forecasts (HRES) at 31 kilometer resolution (one fourth the spatial resolution of the operational model) and a 62 kilometer resolution ten member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF's Integrated Forecast System (IFS) with 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in ERA-Interim). Surface or single level data are also available, containing 2D parameters such as precipitation, 2 meter temperature, top of atmosphere radiation and vertical integrals over the entire atmosphere. The IFS is coupled to a soil model, the parameters of which are also designated as surface parameters, and an ocean wave model. Generally, the data is available at an hourly frequency and consists of analyses and short (12 hour) forecasts, initialized twice daily from analyses at 06 and 18 UTC. Most analyses parameters are also available from the forecasts. There are a number of forecast parameters, for example mean rates and accumulations, that are not available from the analyses. Improvements to ERA5, compared to ERA-Interim, include use of HadISST.2, reprocessed ECMWF climate data records (CDR), and implementation of RTTOV11 radiative transfer. Variational bias corrections have not only been applied to satellite radiances, but also ozone retrievals, aircraft observations, surface pressure, and radiosonde profiles. Please note: DECS is producing a CF 1.6 compliant netCDF-4/HDF5 version of ERA5...
ERA5 Reanalysis (Monthly Mean 0.25 Degree Latitude-Longitude Grid)
oidc.rda.ucar.edu
data.ucar.edu
+1more
Updated Nov 5, 2019
European Centre for Medium-Range Weather Forecasts (2019). ERA5 Reanalysis (Monthly Mean 0.25 Degree Latitude-Longitude Grid) [Dataset]. http://doi.org/10.5065/P8GT-0R61
Unique identifier
https://doi.org/10.5065/P8GT-0R61
Dataset updated
Nov 5, 2019
Dataset provided by
University Corporation for Atmospheric Research
Authors
European Centre for Medium-Range Weather Forecasts
Time period covered
Dec 31, 1978 - Dec 31, 2022
Area covered
Earth
Description
For RDA ERA5 monthly mean data prior to 1979, please see ds633.5: ERA5 monthly mean back extension 1950-1978 (Preliminary version) [https://rda.ucar.edu/datasets/ds633.5/] After many years of research and technical preparation, the production of a new ECMWF climate reanalysis to replace ERA-Interim is in progress. ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, which started with the FGGE reanalyses produced in the 1980s, followed by ERA-15, ERA-40 and most recently ERA-Interim. ERA5 will cover the period January 1950 to near real time.

ERA5 is produced using high-resolution forecasts (HRES) at 31 kilometer resolution (one fourth the spatial resolution of the operational model) and a 62 kilometer resolution ten member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF's Integrated Forecast System (IFS) with 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in ERA-Interim). Surface or single level data are also available, containing 2D parameters such as precipitation, 2 meter temperature, top of atmosphere radiation and vertical integrals over the entire atmosphere. The IFS is coupled to a soil model, the parameters of which are also designated as surface parameters, and an ocean wave model. Generally, the data is available at an hourly frequency and consists of analyses and short (12 hour) forecasts, initialized twice daily from analyses at 06 and 18 UTC. Most analyses parameters are also available from the forecasts. There are a number of forecast parameters, e.g. mean rates and accumulations, that are not available from the analyses.

Improvements to ERA5, compared to ERA-Interim, include use of HadISST.2, reprocessed ECMWF climate data records (CDR), and implementation of RTTOV11 radiative transfer. Variational bias corrections have not only been applied to satellite radiances, but also ozone retrievals, aircraft observations, surface pressure, and radiosonde profiles.
What AB 2644 Means for Geothermal Exploratory Projects in California
data.wu.ac.at
Updated Dec 29, 2015
(2015). What AB 2644 Means for Geothermal Exploratory Projects in California [Dataset]. https://data.wu.ac.at/odso/geothermaldata_org/ZWUxOGFiY2EtOTBkNi00NTVkLWFlYjMtMjk2NjA5MzYzNzlj
Dataset updated
Dec 29, 2015
Description
No Publication Abstract is Available
cylistic_trip_data
kaggle.com
zip
Updated Jan 31, 2022
Tracy Nguyen (2022). cylistic_trip_data [Dataset]. https://www.kaggle.com/trnguyen1510/cylistic-trip-data
Available download formats: zip (204750591 bytes)
Dataset updated
Jan 31, 2022
Authors
Tracy Nguyen
Description
Context

Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study.

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Content

The datasets contain the previous 12 months of Cyclistic trip data. The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer business questions.

Acknowledgements

This data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

Inspiration

Research question: How do annual members and casual riders use Cyclistic bikes differently?
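
One way to start on this research question is to compare average ride length by rider type. The sketch below is only an illustration: the column names (`started_at`, `ended_at`, `member_casual`), the timestamp format, and the four sample rows are assumptions made up for the example and are not taken from the dataset description.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up sample rows for illustration only. The column names and the
# timestamp format are assumptions, not documented facts about the dataset.
SAMPLE_CSV = """ride_id,started_at,ended_at,member_casual
a1,2021-06-01 08:00:00,2021-06-01 08:10:00,member
a2,2021-06-01 09:00:00,2021-06-01 09:20:00,member
b1,2021-06-05 14:00:00,2021-06-05 14:45:00,casual
b2,2021-06-06 15:00:00,2021-06-06 15:15:00,casual
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def mean_ride_minutes(csv_file):
    """Return a dict mapping rider type to mean ride length in minutes."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        start = datetime.strptime(row["started_at"], TIME_FORMAT)
        end = datetime.strptime(row["ended_at"], TIME_FORMAT)
        minutes = (end - start).total_seconds() / 60
        totals[row["member_casual"]] += minutes
        counts[row["member_casual"]] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


summary = mean_ride_minutes(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run the same function on a real trip file, pass an open file object instead of the in-memory sample, after checking the actual column names in the download.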
Proposal of process optimazation and human capital factors as means of value...
data.mendeley.com
Updated Sep 30, 2019
Flavio Andrade (2019). Proposal of process optimazation and human capital factors as means of value generation in organizations [Dataset]. http://doi.org/10.17632/f3g6ythk5h.2
Unique identifier
https://doi.org/10.17632/f3g6ythk5h.2
Dataset updated
Sep 30, 2019
Authors
Flavio Andrade
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data come from an MSc survey study and record the valuation of 19 variables intended to capture both process optimization and human capital factors that support an organizational strategy.
Data from: Determinants of emotional distress in neonatal healthcare...
data.niaid.nih.gov
zenodo.org
Updated Dec 24, 2022
Gagliardi Luigi (2022). Determinants of emotional distress in neonatal healthcare professionals: an exploratory analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7079092
Dataset updated
Dec 24, 2022
Dataset provided by
Provenzi Livio
Gagliardi Luigi
Merusi Ilaria
Ciotti Sabina
Grumi Serena
Nazzari Sarah
Description
This database includes the raw data linked with the paper “Determinants of emotional distress in neonatal healthcare professionals: an exploratory analysis”. This study is part of the “Staff and Parental Adjustment to COVID-19 Epidemics – Neonatal Experience in Tuscany” (SPACE-NET) multicenter project. In this paper, we report data on potential predictors of emotional distress of healthcare professionals who work in neonatal wards (NWs) and neonatal intensive care units (NICUs).

Procedures - Healthcare professionals of seven level-3 and six level-2 neonatal units in Tuscany (Italy) were invited to complete an online survey. Emotional distress (i.e., anxiety, depression, psychosomatic, post-traumatic stress symptoms and emotional exhaustion), Behavioral Inhibition System (BIS) and Behavioral Approach System (BAS) sensitivity, coping strategies and safety culture were assessed through well-validated, self-reported questionnaires.

Analytical plan - Differences in mean levels of personality, coping and safety between professionals from NICUs or NWs were determined by Student’s t tests. Forward stepwise multivariate regression analyses were performed to identify significant predictors of Emotional Distress for the total sample and separately for professionals from NWs and NICUs. Furthermore, we performed a two-step cluster analysis to exploratorily identify specific profiles of professionals in terms of personality, coping strategies and safety culture and their relationship with emotional distress.

Findings in brief - Greater BIS/BAS sensitivity, avoidance coping strategies and a sub-dimension of safety culture (i.e., stress recognition) were all associated with greater risk of emotional distress, whereas job satisfaction emerged as a protective factor. Personnel in neonatal wards and NICUs showed different associations between personality, coping and safety culture.
KID-F (K-pop Idol Dataset - Female)
kaggle.com
Updated Aug 5, 2022
Dongkyu Kim (2022). KID-F (K-pop Idol Dataset - Female) [Dataset]. https://www.kaggle.com/datasets/vkehfdl1/kidf-kpop-idol-dataset-female
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 5, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dongkyu Kim
Description
Description

K-pop Idol Dataset - Female (KID-F) is the first dataset of K-pop idol high quality face images. It consists of about 6,000 high quality face images at 512x512 resolution and identity labels for each image.

We collected about 90,000 images of female K-pop idols, cropped the face from each image, and then classified the faces to pick out the high quality ones. As a result, there are about 6,000 high quality face images in this dataset.

There are 300 test datasets for a benchmark. There are no duplicate images between the test and train images. Some identities in the test images do not appear in the train images (that is, some test images show identities that are new to the trained model). Each test image has a degraded pair. You can use these degraded test images for testing face super resolution performance.

We also provide identity labels for each image.

You can use this dataset for training face super resolution models.

Agreement

The use of this software is RESTRICTED to non-commercial research and educational purposes.

All images of the KID-F dataset are obtained from the internet and are not the property of EDA (PCEO-AI-CLUB). EDA is not responsible for the content or the meaning of these images.

You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data.

You agree not to further copy, publish or distribute any portion of the KID-F dataset, except that copies of the dataset may be made for internal use at a single site within the same organization.

EDA reserves the right to terminate your access to the CelebA dataset at any time.
ERA5 Reanalysis
oidc.rda.ucar.edu
data.ucar.edu
+1more
Updated Sep 5, 2017
European Centre for Medium-Range Weather Forecasts (2017). ERA5 Reanalysis [Dataset]. http://doi.org/10.5065/D6X34W69
Unique identifier
https://doi.org/10.5065/D6X34W69
Dataset updated
Sep 5, 2017
Dataset provided by
University Corporation for Atmospheric Research
Authors
European Centre for Medium-Range Weather Forecasts
Time period covered
Jan 1, 2002 - Feb 1, 2019
Area covered
Description
Please note: Please use ds633.0 to access RDA maintained ERA-5 data, see ERA5 Reanalysis (0.25 Degree Latitude-Longitude Grid) [https://rda.ucar.edu/datasets/ds633.0], RDA dataset ds633.0. This dataset is no longer being updated, and web access has been removed.

After many years of research and technical preparation, the production of a new ECMWF climate reanalysis to replace ERA-Interim is in progress. ERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, which started with the FGGE reanalyses produced in the 1980s, followed by ERA-15, ERA-40 and most recently ERA-Interim. ERA5 will cover the period January 1950 to near real time, though the first segment of data to be released will span the period 2010-2016.

ERA5 is produced using high-resolution forecasts (HRES) at 31 kilometer resolution (one fourth the spatial resolution of the operational model) and a 62 kilometer resolution ten member 4D-Var ensemble of data assimilation (EDA) in CY41r2 of ECMWF's Integrated Forecast System (IFS) with 137 hybrid sigma-pressure (model) levels in the vertical, up to a top level of 0.01 hPa. Atmospheric data on these levels are interpolated to 37 pressure levels (the same levels as in ERA-Interim). Surface or single level data are also available, containing 2D parameters such as precipitation, 2 meter temperature, top of atmosphere radiation and vertical integrals over the entire atmosphere. The IFS is coupled to a soil model, the parameters of which are also designated as surface parameters, and an ocean wave model. Generally, the data is available at an hourly frequency and consists of analyses and short (18 hour) forecasts, initialized twice daily from analyses at 06 and 18 UTC. Most analyses parameters are also available from the forecasts. There are a number of forecast parameters, e.g. mean rates and accumulations, that are not available from the analyses.

Improvements to ERA5, compared to ERA-Interim, include use of HadISST.2, reprocessed ECMWF climate data records (CDR), and implementation of RTTOV11 radiative transfer. Variational bias corrections have not only been applied to satellite radiances, but also ozone retrievals, aircraft observations, surface pressure, and radiosonde profiles.

NCAR's Data Support Section (DSS) is performing and supplying a grid transformed version of ERA5, in which variables originally represented as spectral coefficients or archived on a reduced Gaussian grid are transformed to a regular 1280 longitude by 640 latitude N320 Gaussian grid. In addition, DSS is also computing horizontal winds (u-component, v-component) from spectral vorticity and divergence where these are available. Finally, the data is reprocessed into single parameter time series.
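
The grid dimensions quoted above follow from the Gaussian number: a regular Gaussian grid N has 2N latitude rows and 4N evenly spaced longitude points. The short sketch below works through that arithmetic for N320. The variable names are illustrative, and starting the longitudes at 0 degrees east is an assumption, not something stated in the description. The Gaussian latitudes themselves are not evenly spaced and are not computed here.

```python
# Arithmetic for a regular N320 Gaussian grid.
# A regular Gaussian grid N has 2*N latitude rows and 4*N longitude points.
GAUSSIAN_NUMBER = 320

n_lat = 2 * GAUSSIAN_NUMBER           # number of latitude rows
n_lon = 4 * GAUSSIAN_NUMBER           # number of longitude points per row
lon_spacing_deg = 360.0 / n_lon       # constant spacing in longitude

# Assumption for illustration: the first longitude sits at 0 degrees east.
longitudes = [i * lon_spacing_deg for i in range(n_lon)]

print(n_lon, n_lat, lon_spacing_deg)
```

The resulting longitude spacing is slightly coarser than the 0.25 degree latitude-longitude grid named in the related dataset titles.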

Please note: As of November 2017, DSS is also producing a CF 1.6 compliant netCDF-4/HDF5 version of ERA5 for CISL RDA at NCAR. The netCDF-4/HDF5 version is the de facto RDA ERA5 online data format. The GRIB1 data format is only available via NCAR's High Performance Storage System (HPSS). We encourage users to evaluate the netCDF-4/HDF5 version for their work, and to use the currently existing GRIB1 files as a reference and basis of comparison. To ease this transition, there is a one-to-one correspondence between the netCDF-4/HDF5 and GRIB1 files, with as much GRIB1 metadata as possible incorporated into the attributes of the netCDF-4/HDF5 counterpart.
