Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
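To illustrate how different distributions can produce the same bar graph, here is a minimal Python sketch. It is not taken from the article: the sample values and the helper name `rescale` are hypothetical, chosen only to show three small samples that share a mean and standard deviation while differing in shape.

```python
import statistics


def rescale(values, target_mean, target_sd):
    """Shift and scale values so they have the requested mean and sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [target_mean + (v - mean) * target_sd / sd for v in values]


# Hypothetical samples (n = 8 each) with very different shapes.
symmetric = rescale([1, 2, 3, 4, 5, 6, 7, 8], 10.0, 2.0)
bimodal = rescale([1, 1, 1, 1, 8, 8, 8, 8], 10.0, 2.0)
outlier = rescale([1, 1, 1, 1, 1, 1, 1, 8], 10.0, 2.0)

for name, sample in [("symmetric", symmetric), ("bimodal", bimodal), ("outlier", outlier)]:
    # A bar graph with error bars would show only these two numbers.
    summary = (round(statistics.fmean(sample), 3), round(statistics.stdev(sample), 3))
    # A univariate scatterplot would show every individual point.
    points = [round(v, 2) for v in sorted(sample)]
    print(name, summary, points)
```

All three samples print the same mean and SD, so a bar graph with error bars would look identical for each, while the individual points differ clearly.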
This is the new (GPM-formatted) TRMM product. It replaces the old TRMM_3A25, 3A26.

Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07.

This is the GPM-like formatted TRMM Precipitation Radar (PR) monthly gridded data, first released with the "V8" TRMM reprocessing. The TRMM radar Level 3 grids are now consistent with the GPM Dual-frequency Precipitation Radar (DPR). The closest ancestor of this dataset was the monthly radar statistics 3A25.

This product consists of monthly statistics of the PR measurements at 0.25x0.25 degrees, and monthly histograms and statistics at 5x5 degrees, horizontal resolution.

The objective of the algorithm is to calculate various daily statistics from the level 2 PR output products. Four types of statistics are calculated:
1. Probabilities of occurrence (count values)
2. Means and standard deviations
3. Histograms
4. Correlation coefficients

In all cases, the statistics are conditioned on the presence of rain or some other quantity, such as the presence of stratiform rain or the presence of a bright-band. For example, to compute the unconditioned mean rain rate, the conditional mean must be multiplied by the probability of rain, which, in turn, is calculated from the ratio of rain counts to the total number of observations in the box of interest.

The grids are in the Planetary Grid 2 structure matching the Dual-frequency PR on the core GPM observatory, which covers latitudes 67S to 67N. The low resolution 5x5 deg grid covers 70S to 70N. Areas beyond ±40 degrees latitude are padded with empty grid cells.
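The conversion from a conditional to an unconditioned mean described above can be sketched in Python as follows. The function name, argument names, and numbers are hypothetical and do not correspond to actual variable names in the product files.

```python
def unconditional_mean_rain_rate(conditional_mean, rain_counts, total_counts):
    """Multiply the conditional mean by the probability of rain in a grid box.

    conditional_mean: mean rain rate over rainy observations only
    rain_counts:      number of observations with rain in the box
    total_counts:     total number of observations in the box
    """
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    probability_of_rain = rain_counts / total_counts
    return conditional_mean * probability_of_rain


# Hypothetical grid box: 50 rainy observations out of 1000,
# with a conditional mean rain rate of 4.0 (in the product's rate units).
example = unconditional_mean_rain_rate(conditional_mean=4.0, rain_counts=50, total_counts=1000)
print(example)
```

A box with no rainy observations yields an unconditioned mean of zero regardless of the stored conditional mean.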
Figures containing a histogram of the frequency of effect sizes on AG and BG herbivores and a funnel plot of effect size and sample sizes indicating absence of publication bias.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Raw gaze, interview, and other data from 50 secondary school students (grades 10-12) solving statistical graph tasks: estimating or comparing the mean from histograms, case-value plots, (stacked) dotplots, and horizontal histograms. The dataset also contains some processed data. Furthermore, it contains all relevant information needed to reproduce or replicate the data collection process, for example the design of the data collection, HTML files with the webpages that were used, letters to participants, sizes and screenshots of AOIs, heatmaps, and static gaze plots. Also included: transcripts, legends, and an overview of tasks. (Note: these data are not processed for a specific article.)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Molecular simulations of water adsorption in porous materials often converge slowly due to sampling bottlenecks that follow from hydrogen bonding and, in many cases, the formation of water clusters. These effects may be exacerbated in metal–organic framework (MOF) adsorbents, due to the presence of pore spaces (cages) that promote the formation of discrete-size clusters and hydrophobic effects (if present), among other reasons. In Grand Canonical Monte Carlo (MC) simulations, these sampling challenges are typically manifested by low MC acceptance ratios, a tendency for the simulation to become stuck in a particular loading state (i.e., macrostates), and the persistence of specific clusters for long periods of the simulation. We present simulation strategies to address these sampling challenges, by applying flat-histogram MC (FHMC) methods and specialized MC move types to simulations of water adsorption. FHMC, in both Transition-matrix and Wang–Landau forms, drives the simulation to sample relevant macrostates by incorporating weights that are self-consistently adjusted throughout the simulation and generate the macrostate probability distribution (MPD). Specialized MC moves, based on aggregation-volume bias and configurational bias methods, separately address low acceptance ratios for basic MC trial moves and specifically target water molecules in clusters; in turn, the specialized MC moves improve the efficiency of generating new configurations which is ultimately reflected in improved statistics collected by FHMC. The combined strategies are applied to study the adsorption of water in CuBTC and ZIF-8 at 300 K, through examination of the MPD and the adsorption isotherm generated by histogram reweighting. A key result is the appearance of nontrivial oscillations in the MPD, which we show to be associated with water clusters in the adsorption system. 
Additionally, we show that the probabilities of certain clusters become similar in value near the boundaries of the isotherm hysteresis loop, indicating a strong connection between cluster formation/destruction and the thermodynamic limits of stability. For a hydrophobic MOF, the FHMC results show that the phase transition from low density to high density is suppressed to water pressure far above the bulk-fluid saturation pressure; this is consistent with results presented elsewhere. We also compare our FHMC simulation isotherm to one measured by a different technique but with ostensibly the same molecular interactions and comment on observed differences and the need for follow-up work. The simulation strategies presented here can be applied to the simulation of water in other MOFs using heuristic guidelines laid out in our text, which should facilitate the more consistent and efficient simulation of water adsorption in porous materials in future applications.
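The histogram reweighting step mentioned above can be illustrated with a minimal Python sketch. It uses the standard grand-canonical relation ln Π(N; μ') = ln Π(N; μ) + β(μ' − μ)N + const and is not the authors' implementation. The function names and the toy distribution are hypothetical, and β and μ are assumed to be in units where βμ is dimensionless.

```python
import math


def reweight_ln_prob(ln_prob, beta, mu_old, mu_new):
    """Reweight a macrostate distribution ln_prob[N] from mu_old to mu_new.

    Returns the normalized log-probabilities at the new chemical potential.
    """
    shifted = [lp + beta * (mu_new - mu_old) * n for n, lp in enumerate(ln_prob)]
    # Normalize with a log-sum-exp to avoid overflow for large N.
    peak = max(shifted)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in shifted))
    return [s - log_norm for s in shifted]


def mean_loading(ln_prob):
    """Average number of molecules <N> for a normalized ln_prob[N]."""
    return sum(n * math.exp(lp) for n, lp in enumerate(ln_prob))


# Toy macrostate distribution over N = 0..4 (flat before normalization).
base = reweight_ln_prob([0.0] * 5, beta=1.0, mu_old=0.0, mu_new=0.0)
higher = reweight_ln_prob(base, beta=1.0, mu_old=0.0, mu_new=0.5)

print(mean_loading(base), mean_loading(higher))
```

Repeating the reweighting over a range of chemical potentials and recording the mean loading at each one traces out an isotherm from a single macrostate distribution.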
https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes statistics about durations between two consecutive subtitles in 5,000 top-ranked IMDB movies. The dataset can be used to understand how dialogue is used in films and to develop tools to improve the watching experience. This notebook contains the code and data that were used to create this dataset.
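As a rough illustration of the quantity being summarized, the Python sketch below computes the gaps between consecutive subtitles and their share of the runtime. It is not the notebook's code: the function names and cue times are hypothetical, and it assumes each subtitle cue is given as a (start, end) pair in seconds.

```python
def subtitle_gaps(cues):
    """Return the silent gaps (seconds) between consecutive subtitle cues.

    cues: iterable of (start_seconds, end_seconds) pairs.
    Overlapping or touching cues contribute a gap of 0.0.
    """
    ordered = sorted(cues)
    return [
        max(0.0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def gap_fraction(cues, runtime_seconds):
    """Fraction of the runtime spent in gaps between consecutive subtitles."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return sum(subtitle_gaps(cues)) / runtime_seconds


# Hypothetical cues for a 20-second clip.
cues = [(1.0, 3.0), (5.0, 6.5), (6.5, 9.0)]
gaps = subtitle_gaps(cues)
fraction = gap_fraction(cues, runtime_seconds=20.0)
print(gaps, fraction)
```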
Dataset statistics:
Dataset use cases:
Data Analysis:
The next histogram shows the distribution of movie runtimes in minutes. The mean runtime is 99.903 minutes, the maximum runtime is 877 minutes, and the median runtime is 98.5 minutes.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3228936%2F5c78e4866f203dfe5f7a7f55e41f69d0%2Ffig%201.png?generation=1696861842737260&alt=media
Figure 1: Histogram of the runtime in minutes
The next histogram shows the distribution of the percentage of gaps (duration between two consecutive subtitles) out of all the movie runtime. The mean percentage of gaps is 0.187, the maximum percentage of gaps is 0.033, and the median percentage of gaps is 327.586.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3228936%2F235453706269472da11082f080b1f41d%2Ffig%202.png?generation=1696862163125288&alt=media
Figure 2: Histogram of the percentage of gaps (duration between two consecutive subtitles) out of all the movie runtime
The next histogram shows the distribution of the total movie's subtitle duration (seconds) between two consecutive subtitles. The mean subtitle duration is 4,837.089 seconds and the median subtitle duration is 2,906.435 seconds.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3228936%2F234d31e3abaf6c4d174f494bf5cb86fa%2Ffig%203.png?generation=1696862309880510&alt=media
Figure 3: Histogram of the total movie's subtitle duration (seconds) between two consecutive subtitles
Example use case:
The Dynamic Adjustment of Playback Speed (DAPS), a VLC extension, can be used to save time while watching movies by increasing the playback speed between dialogues. However, it is essential to choose appropriate settings for the extension, as increasing the playback speed can affect the overall tone and impact of the film.
The dataset of 5,000 top-ranked movie subtitle durations can be used to help users choose the appropriate settings for the DAPS extension. For example, users who are watching a fast-paced action movie may want to set a higher minimum duration between subtitles before speeding up, while users who are watching a slow-paced drama movie may want to set a lower minimum duration.
Additionally, users can use the dataset to understand how the different settings of the DAPS extension impact the overall viewing experience. For example, users can experiment with different settings to see how they affect the pacing of the movie and the overall impact of the dialogue scenes.
Conclusion
This dataset is a valuable resource for researchers and developers who are interested in understanding and improving the use of dialogue in movies or in tools for watching movies.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The tasks (called items in the study) are the first 6 histogram tasks and all 6 case-value plot tasks (hence, the first 12 tasks from the data in dataset 1_Raw_Data_Students). This dataset contains all data needed for reproducing the results described in the qualitative article belonging to it, including, for example, the codebook, the coding of transcripts, and the RStudio file for calculating accuracy and precision. It also contains detailed coding results, including second coder results. Note that the raw data of this project, as well as the design of the project, materials, and so on, are in the dataset 1_Raw_Data_Students. The latter dataset is needed for replicating the whole eye-tracking study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The normal distribution of residential land prices in 2014–2017. (ZIP)