29 datasets found

Reddit: /r/news
kaggle.com
zip
Updated Dec 17, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Reddit: /r/news [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-popularity-and-user-engagement-trends/discussion
Explore at:
zip(146481 bytes)Available download formats
Dataset updated
Dec 17, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Reddit: /r/news

Exploring Topics, Scores, and Engagement

By Reddit [source]

About this dataset

This dataset provides an in-depth look into learning what communities find important and engaging in the news. With this data, researchers can discover trends related to user engagement and popular topics within subreddits. By examining the “score” and “comms_num” columns, our researchers will be able to pinpoint which topics are most liked, discussed or shared within the various subreddits. Researchers may also gain insights into not only how popular a topic is but how it is growing over time. Additionally, by exploring the body column of our dataset, researchers can understand more about which types of news stories drive conversation within particular subreddits—providing an opportunity for deeper analysis of that subreddit’s diverse community dynamics

The dataset includes eight columns: title, score, id, url, comms_num created**body and timestamp** which can help us identify key insights into user engagement among popular subreddits. With this data we may also determine relationships between topics of discussion and their impact on user engagement allowing us to create a better understanding surrounding issue-based conversations online as well as uncover emerging trends in online news consumption habits

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset is useful for those who are looking to gain insight into the popularity and user engagement of specific subreddits. The data includes 8 different columns including title, score, id, url, comms_num, created, body and timestamp. This can provide valuable information about how users view and interact with particular topics across various subreddits.

In this guide we’ll look at how you can use this dataset to uncover trends in user engagement on topics within specific subreddits as well as measure the overall popularity of these topics within a subreddit.

1) Analyzing Score: By analyzing the “score” column you can determine which news stories are popular in a particular subreddit and which ones aren't by looking at how many upvotes each story has received. With this data you will be able to determine trends in what types of stories users preferred within a particular subreddit over time.

2) Analyzing Comms_Num: Similarly to analyzing the score column you can analyze the “comms_num” column to see which news stories had more engagement from users by tracking number of comments received on each post. Knowing these points can provide insight into what types of stories tend to draw more comment activity from users in certain subreddits from one day or an extended period of time such tracking post activity for multiple weeks or months at once 3) Analyzing Body: Additionally by looking at the “body” column for each post researchers can gain a better understanding which kinds of topics/news draw attention among specific Reddit communities.. With that complete picture researchers have access not only to data measuring Reddit buzz but also access topic discussion/comments helping generate further insights into why certain posts might be popular or receive more comments than others

Overallthis dataset provides valuable insights about user engagedment related specifically topics trending accross subsbreddits allowing anyone interested reseraching such things easier way access those insights all one place

Research Ideas

Grouping news topics within particular subreddits and assessing the overall popularity of those topics in terms of scores/user engagement.

Correlating user engagement with certain news topics to understand how they influence discussion or reactions on a subreddit.

Examining the potential correlation between score and the actual body content of a given post to assess what types of content are most successful in gaining interest from users and creating positive engagement for posts

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: news.csv | Column name | Description ...
U
Water column sample data from predefined locations of the West Florida...
data.usgs.gov
search.dataone.org
+2more
Updated Feb 15, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Geological Survey (2014). Water column sample data from predefined locations of the West Florida Shelf: USGS Cruise 11BHM03 [Dataset]. https://data.usgs.gov/datacatalog/data/USGS:94b95e3f-fe33-40d3-885f-54c70ead5714
Explore at:
Dataset updated
Feb 15, 2014
Dataset provided by
United States Geological Survey
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Time period covered
Sep 20, 2011 - Sep 28, 2011
Area covered
Florida
Description
The United States Geological Survey (USGS) is conducting a study on the effects of climate change on ocean acidification within the Gulf of Mexico; dealing specifically with the effect of ocean acidification on marine organisms and habitats. To investigate this, the USGS participated in two cruises in the West Florida Shelf and northern Gulf of Mexico regions aboard the R/V Weatherbird II, a ship of opportunity lead by Dr. Kendra Daly, of the University of South Florida (USF). The cruises occurred September 20 - 28 and November 2 - 4, 2011. Both left from and returned to Saint Petersburg, Florida, but followed different routes (see Trackline). On both cruises the USGS collected data pertaining to pH, dissolved inorganic carbon (DIC), and total alkalinity in discrete samples. Discrete surface samples were taken during transit approximatly hourly on both cruises, 95 in September were collected over a span of 2127 km, and 7 over a trackline of 732 km line on the November cruise. Along wit ...
b
Video Plankton Recorder data (formatted with taxa displayed in single...
bco-dmo.org
search.dataone.org
csv
Updated Jul 31, 2012
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carin J. Ashjian (2012). Video Plankton Recorder data (formatted with taxa displayed in single column); from R/V Columbus Iselin and R/V Endeavor cruises CI9407, EN259, EN262 in the Gulf of Maine and Georges Bank from 1994-1995 [Dataset]. https://www.bco-dmo.org/dataset/3685
Explore at:
csv(370.26 MB)Available download formats
Dataset updated
Jul 31, 2012
Dataset provided by
Biological and Chemical Data Management Office
Authors
Carin J. Ashjian
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Gulf of Maine
Variables measured
lat, lon, sal, temp, year, fluor, press, taxon, flvolt, abund_L, and 9 more
Measurement technique
Video Plankton Recorder
Description
This dataset includes ALL the abundance values, zero and non-zero. Taxonomic groups are diplayed in the 'taxon' column, rather than in separate columns, with abundances in the 'abund_L' column. For the original presentation of the data, see VPR_ashjian_orig. For a version of the data with only non-zero data, see VPR_ashjian_nonzero. In the 'nonzero' dataset, values of 0 in the abund_L column (taxon abundance) have been removed.

Methodology
The following information was extracted from C.J. Ashjian et al., Deep- Sea Research II 48(2001) 245-282 . An in-depth discussion of the data and sampling methods can be found there.

The Video Plankton Recorder was towed at 2 m/s, collecting data from the surface to the bottom (towyo). The VPR was equipped with 2-4 cameras, temperature and conductivity probes, fluorometer and transmissometer. Environmental data was collected at 0.25 Hz (CI9407) or 0.5 Hz (EN259, EN262). Video images were recorded at 60 fields per second (fps).

Video tapes were analyzed for plankton abundances using a semi-automated method discussed in Davis, C.S. et al., Deep-Sea Research II 43 (1996) 1946-1970. In-focus images were extracted from the video tapes and identified by hand to particle type, taxon, or species. Plankton and particle observations were merged with environmental and navigational data by binning the observations for each category into the time intervals at which the environmental data were collected (again see above Davis citation). Concentrations were calculated utilizing the total volume (liters) imaged during that period. For less-abundant categories, usually only a single organism was observed during each time interval so that the resulting concentrations are close to presence or absence data rather than covering a range of values.
Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Explore at:
pptxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column excel file as shown in Powerpoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import in R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in Powerpoint slide and paste it in the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See Powerpoint slide for an example.

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

7 Display the graph in a separate window. Dot colors indicate

replicates

graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

https://cran.r-project.org/

http://ggplot2.org/
d
Numerical code and data for the stellar structure and dynamical instability...
datadryad.org
search.dataone.org
+1more
zip
Updated May 10, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Arun Mathew; Malay K. Nandy (2021). Numerical code and data for the stellar structure and dynamical instability analysis of generalised uncertainty white dwarfs [Dataset]. http://doi.org/10.5061/dryad.dncjsxkzt
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.dncjsxkzt
Dataset updated
May 10, 2021
Dataset provided by
Dryad
Authors
Arun Mathew; Malay K. Nandy
Time period covered
Apr 28, 2021
Description
There is a total of 17 datasets to produce all the Figures in the article. There are mainly two different data files: GUP White Dwarf Mass-Radius (GUPWD_M-R) data and GUP White Dwarf Profile (GUPWD_Profile) data.

The file GUPWD_M-R gives only the Mass-Radius relation with Radius (km) in the first column and Mass (solar mass) in the second.

On the other hand GUPWD_Profile provides the complete profile with following columns.

column 1: Dimensionless central Fermi Momentum $\xi_c$ column 2: Central Density $\rho_c$ ( Log10 [$\rho_c$ g cm$^{-3}$] ) column 3: Radius $R$ (km) column 4: Mass $M$ (solar mass) column 5: Square of fundamental frequency $\omega_0^2$ (sec$^{-2}$)

=====================================================================================

Figure 1 (a) gives Mass-Radius (M-R) curves for $\beta_0=10^{42}$, $10^{41}$ and $10^{40}$. The filenames of the corresponding dataset are

GUPWD_M-R[Beta0=E42].dat GUPWD_M-R[Beta0=E41].dat GUPWD_M-R[Beta0...
Tennessee Eastman Process Simulation Dataset
kaggle.com
zip
Updated Feb 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sergei Averkiev (2020). Tennessee Eastman Process Simulation Dataset [Dataset]. https://www.kaggle.com/averkij/tennessee-eastman-process-simulation-dataset
Explore at:
zip(1370814903 bytes)Available download formats
Dataset updated
Feb 9, 2020
Authors
Sergei Averkiev
Description
Intro

This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017.

Content

Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.

Each dataframe contains 55 columns:

Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions).

Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).

Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.

Columns 4 to 55 contain the process variables; the column names retain the original variable names.

Acknowledgements

This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.

User Agreement

By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms.

The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission.

In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights.

Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law.

When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work.

This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.
H
Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...
dataverse.harvard.edu
dataone.org
Updated Jul 6, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook (2017). Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation [Dataset]. http://doi.org/10.7910/DVN/6C3JR1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/6C3JR1
Dataset updated
Jul 6, 2017
Dataset provided by
Harvard Dataverse
Authors
Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
Description
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website. Description This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files. Each dataframe contains 55 columns: Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions). Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping). Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively. Columns 4 to 55 contain the process variables; the column names retain the original variable names. Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
w
Dataset of ISBN of books by Jason R. Hackworth
workwithdata.com
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of ISBN of books by Jason R. Hackworth [Dataset]. https://www.workwithdata.com/datasets/books?col=book%2Cisbn&f=1&fcol0=author&fop0=%3D&fval0=Jason+R.+Hackworth
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered where the author is Jason R. Hackworth. It features 2 columns including ISBN.
R Package History on CRAN
kaggle.com
zip
Updated Jul 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
Explore at:
zip(5637913 bytes)Available download formats
Dataset updated
Jul 18, 2022
Authors
Heads or Tails
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you're installing an R package in the standard way then it is provided by one of the CRAN mirrors.

The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

Content

I'm providing 2 csv tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those function were slightly adjusted and formatted. I might add further related tables over time.

See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

These are the tables contained in this dataset:

cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

I will update this dataset on a roughly monthly cadence by checking which packages have newer version in the overview table, and then replacing

Column Description

Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focus features. All columns are formatted as strings, except for the boolean feature needs_compilation, but the date_published can be read as a ymd date:

package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.

version: current version.

depends: package depends on which other packages.

imports: package imports which other packages.

licence: the licence under which the package is distributed (e.g. GPL versions)

needs_compilation: boolean feature describing whether the package needs to be compiled.

author: package author.

bug_reports: where to send bugs.

url: where to read more.

date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.

description: relatively detailed description of what the package is doing.

title: the title and tagline of the package.

Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of them have a match in this table, and can be matched by package and version. All columns are strings, and the date can again be parsed as a ymd date:

package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.

version: historical or current package version. Also joins. Secondary sorting column within each package name.

date: when this version was published. Should sort in the same way as the version does.

repository: on CRAN or in the Archive.

Acknowledgements

All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...
g
CTD data collected onboard R/V Weatherbird II cruise WB-0717 in the Gulf of...
data.griidc.org
search.dataone.org
+1more
Updated Feb 12, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patrick Schwing (2019). CTD data collected onboard R/V Weatherbird II cruise WB-0717 in the Gulf of Mexico from 2017-07-19 to 2017-07-30 [Dataset]. http://doi.org/10.7266/n7-q3pt-vd56
Explore at:
Unique identifier
https://doi.org/10.7266/n7-q3pt-vd56
Dataset updated
Feb 12, 2019
Dataset provided by
GRIIDC
Authors
Patrick Schwing
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered

Description
This dataset includes CTD and environmental data for nine stations collected onboard R/V Weatherbird II cruise WB-0717 in the Gulf of Mexico from 2017-07-19 to 2017-07-30. The dataset includes 16 profiles/casts of temperature, salinity, conductivity, chlorophyll-a and colored dissolved organic matter fluorescence, turbidity, oxygen saturation, sound velocity, altimetry and Photosynthetically Available Radiation (PAR). R/V Weatherbird II cruise WB-0717 was led by chief scientist Dr. Steve Murawski. The cruise objectives were to evaluate the extent of fish disease and to characterize the distribution and fate of oiled sediment at sites on the continental shelf and slope of the northern Gulf of Mexico.
d
Data from: Data and code from: Stem borer herbivory dependent on...
catalog.data.gov
agdatacommons.nal.usda.gov
+2more
Updated Sep 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data and code from: Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-stem-borer-herbivory-dependent-on-interactions-of-sugarcane-variety-ass-1e076
Explore at:
Dataset updated
Sep 2, 2025
Dataset provided by
Agricultural Research Service
Description
This dataset contains all the data and code needed to reproduce the analyses in the manuscript: Penn, H. J., & Read, Q. D. (2023). Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage. Pest Management Science. https://doi.org/10.1002/ps.7843 Included are two .Rmd notebooks containing all code required to reproduce the analyses in the manuscript, two .html file of rendered notebook output, three .csv data files that are loaded and analyzed, and a .zip file of intermediate R objects that are generated during the model fitting and variable selection process. Notebook files 01_boring_analysis.Rmd: This RMarkdown notebook contains R code to read and process the raw data, create exploratory data visualizations and tables, fit a Bayesian generalized linear mixed model, extract output from the statistical model, and create graphs and tables summarizing the model output including marginal means for different varieties and contrasts between crop years. 02_trait_covariate_analysis.Rmd: This RMarkdown notebook contains R code to read raw variety-level trait data, perform feature selection based on correlations between traits, fit another generalized linear mixed model using traits as predictors, and create graphs and tables from that model output including marginal means by categorical trait and marginal trends by continuous trait. HTML files These HTML files contain the rendered output of the two RMarkdown notebooks. They were generated by Quentin Read on 2023-08-30 and 2023-08-15. 01_boring_analysis.html 02_trait_covariate_analysis.html CSV data files These files contain the raw data. To recreate the notebook output the CSV files should be at the file path project/data/ relative to where the notebook is run. Columns are described below. BoredInternodes_26April2022_no format.csv: primary data file with sugarcane borer (SCB) damage Columns A-C are the year, date, and location. All location values are the same. Column D identifies which experiment the data point was collected from. Column E, Stubble, indicates the crop year (plant cane or first stubble) Column F indicates the variety Column G indicates the plot (integer ID) Column H indicates the stalk within each plot (integer ID) Column I, # Internodes, indicates how many internodes were on the stalk Columns J-AM are numbered 1-30 and indicate whether SCB damage was observed on that internode (0 if no, 1 if yes, blank cell if that internode was not present on the stalk) Column AN indicates the experimental treatment for those rows that are part of a manipulative experiment Column AO contains notes variety_lookup.csv: summary information for the 16 varieties analyzed in this study Column A is the variety name Column B is the total number of stalks assessed for SCB damage for that variety across all years Column C is the number of years that variety is present in the data Column D, Stubble, indicates which crop years were sampled for that variety ("PC" if only plant cane, "PC, 1S" if there are data for both plant cane and first stubble crop years) Column E, SCB resistance, is a categorical designation with four values: susceptible, moderately susceptible, moderately resistant, resistant Column F is the literature reference for the SCB resistance value Select_variety_traits_12Dec2022.csv: variety-level traits for the 16 varieties analyzed in this study Column A is the variety name Column B is the SCB resistance designation as an integer Column C is the categorical SCB resistance designation (see above) Columns D-I are continuous traits from year 1 (plant cane), including sugar (Mg/ha), biomass or aboveground cane production (Mg/ha), TRS or theoretically recoverable sugar (g/kg), stalk weight of individual stalks (kg), stalk population density (stalks/ha), and fiber content of stalk (percent). Columns J-O are the same continuous traits from year 2 (first stubble) Columns P-V are categorical traits (in some cases continuous traits binned into categories): maturity timing, amount of stalk wax, amount of leaf sheath wax, amount of leaf sheath hair, tightness of leaf sheath, whether leaf sheath becomes necrotic with age, and amount of collar hair. ZIP file of intermediate R objects To recreate the notebook output without having to run computationally intensive steps, unzip the archive. The fitted model objects should be at the file path project/ relative to where the notebook is run. intermediate_R_objects.zip: This file contains intermediate R objects that are generated during the model fitting and variable selection process. You may use the R objects in the .zip file if you would like to reproduce final output including figures and tables without having to refit the computationally intensive statistical models. binom_fit_intxns_updated_only5yrs.rds: fitted brms model object for the main statistical model binom_fit_reduced.rds: fitted brms model object for the trait covariate analysis marginal_trends.RData: calculated values of the estimated marginal trends with respect to year and previous damage marginal_trend_trs.rds: calculated values of the estimated marginal trend with respect to TRS marginal_trend_fib.rds: calculated values of the estimated marginal trend with respect to fiber content Resources in this dataset:Resource Title: Sugarcane borer damage data by internode, 1993-2021. File Name: BoredInternodes_26April2022_no format.csvResource Title: Summary information for the 16 sugarcane varieties analyzed. File Name: variety_lookup.csvResource Title: Variety-level traits for the 16 sugarcane varieties analyzed. File Name: Select_variety_traits_12Dec2022.csvResource Title: RMarkdown notebook 2: trait covariate analysis. File Name: 02_trait_covariate_analysis.RmdResource Title: Rendered HTML output of notebook 2. File Name: 02_trait_covariate_analysis.htmlResource Title: RMarkdown notebook 1: main analysis. File Name: 01_boring_analysis.RmdResource Title: Rendered HTML output of notebook 1. File Name: 01_boring_analysis.htmlResource Title: Intermediate R objects. File Name: intermediate_R_objects.zip
a
dfsav
rstudio-pubs-static.s3.amazonaws.com
Updated Aug 5, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). dfsav [Dataset]. https://rstudio-pubs-static.s3.amazonaws.com/796695_ba5065250da44efa9263b28dbe1b7b12.html
Explore at:
Dataset updated
Aug 5, 2021
Variables measured
I, J, K, L, M, N, O, P, Q, S, and 201 more
Description
The dataset has N=1135 rows and 211 columns. 2 rows have no missing values on any column.

Table of variables

This table contains variable names, labels, and number of missing values. See the complete codebook for more.

[truncated]

Note

This dataset was automatically described using the codebook R package (version 0.9.2).
b
Water column analytical data from the Chile Triple Junction collected on two...
bco-dmo.org
search.dataone.org
csv
Updated Mar 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christopher R. German; Tamara Baumberger; Mak A. Saito (2022). Water column analytical data from the Chile Triple Junction collected on two cruises aboard the R/V Melville during February-March 2010 and April 2012 [Dataset]. http://doi.org/10.26008/1912/bco-dmo.871203.1
Explore at:
csv(13.00 KB)Available download formats
Unique identifier
https://doi.org/10.26008/1912/bco-dmo.871203.1
Dataset updated
Mar 28, 2022
Dataset provided by
Biological and Chemical Data Management Office
Authors
Christopher R. German; Tamara Baumberger; Mak A. Saito
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 25, 2010 - Apr 26, 2012
Area covered

Variables measured
CH4, Cast, Date, TDFe, TDMn, Depth, Bottle, Latitude, Time_UTC, Cruise_ID, and 3 more
Measurement technique
Gas Chromatograph, Mass Spectrometer, Centrifuge, Inductively Coupled Plasma Mass Spectrometer, CTD Sea-Bird SBE 911plus
Description
The hydrothermal plume samples reported here were acquired aboard R/V Melville during two short cruises of opportunity conducted in 2010 (MV1003) and 2012 (MV1205). Surveys along the axis of the ridge-crest were conducted using the ship's Seabird 911+ CTD rosette. A combination of tow-yo, vertical casts, and "pogo" stations were employed. An ultra-short baseline (USBL) navigation beacon was attached to the CTD-rosette for all deployments to ensure that we could navigate precisely where all samples and ancillary data were collected, as well as their sample depths.
Z
Data from: Lower complexity of motor primitives ensures robust control of...
data.niaid.nih.gov
Updated Jun 18, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Santuz, Alessandro; Ekizos, Antonis; Kunimasa, Yoko; Kijima, Kota; Ishikawa, Masaki; Arampatzis, Adamantios (2022). Lower complexity of motor primitives ensures robust control of high-speed human locomotion [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3764760
Explore at:
Dataset updated
Jun 18, 2022
Dataset provided by
Osaka University of Health and Sport Sciences
Humboldt-Universität zu Berlin
Humboldt-Universität zu Berlin, Dalhousie University
Authors
Santuz, Alessandro; Ekizos, Antonis; Kunimasa, Yoko; Kijima, Kota; Ishikawa, Masaki; Arampatzis, Adamantios
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Walking and running are mechanically and energetically different locomotion modes. For selecting one or another, speed is a parameter of paramount importance. Yet, both are likely controlled by similar low-dimensional neuronal networks that reflect in patterned muscle activations called muscle synergies. Here, we investigated how humans synergistically activate muscles during locomotion at different submaximal and maximal speeds. We analysed the duration and complexity (or irregularity) over time of motor primitives, the temporal components of muscle synergies. We found that the challenge imposed by controlling high-speed locomotion forces the central nervous system to produce muscle activation patterns that are wider and less complex relative to the duration of the gait cycle. The motor modules, or time-independent coefficients, were redistributed as locomotion speed changed. These outcomes show that robust locomotion control at challenging speeds is achieved by modulating the relative contribution of muscle activations and producing less complex and wider control signals, whereas slow speeds allow for more irregular control.

In this supplementary data set we made available: a) the metadata with anonymized participant information, b) the raw EMG, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via NMF and f) the code to process the data, including the scripts to calculate the Higuchi's fractal dimension (HFD) of motor primitives. In total, 180 trials from 30 participants are included in the supplementary data set.

The file “metadata.dat” is available in ASCII and RData format and contains:

Code: the participant’s code

Group: the experimental group in which the participant was involved (G1 = walking and submaximal running; G2 = submaximal and maximal running)

Sex: the participant’s sex (M or F)

Speeds: the type of locomotion (W for walking or R for running) and speed at which the recordings were conducted in 10*[m/s]

Age: the participant’s age in years

Height: the participant’s height in [cm]

Mass: the participant’s body mass in [kg]

PB: 100 m-personal best time (for G2).

The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the following trials include less than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28(29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17). All the other trials consist of 30 gait cycles. Trials are named like “P20_R_20,” where the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicate the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P03_R_30”.

Old versions not compatible with the R package musclesyneRgies

The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with as many rows as the available number of gait cycles and two columns. The first column named “touchdown” contains the touchdown incremental times in seconds. The second column named “stance” contains the duration of each stance phase of the right foot in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P20_R_20,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicate the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). Please note that the following trials include less than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28(29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17).

The files containing the raw, filtered and the normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with as many rows as the amount of recorded data points and 13 columns. The first column named “time” contains the incremental time in seconds. The remaining 12 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P03_R_30”, where the characters “RAW_EMG” indicate that the trial contains raw emg data, the characters “P03” indicate the participant number (in this example the 3rd), the character “R” indicate the locomotion type (see above), and the numbers “30” indicate the locomotion speed (see above). The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P03_R_30”.

The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided in motor primitives and motor modules and are presented as direct output of the factorisation and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P12_W_07”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P12” indicate the participant number (in this example the 12th), the character “W” indicate the locomotion type (see above), and the numbers “07” indicate the speed (see above). Motor modules are data frames with 12 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P22_R_20”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P22” indicate the participant number (in this example the 22nd), the character “W” indicates the locomotion type (see above), and the numbers “20” indicate the speed (see above). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.

The files containing the HFD calculated from motor primitives are available in RData format, in the file named “HFD.RData”. HFD results are presented in a list of lists containing, for each trial, 1) the HFD, and 2) the interval time k used for the calculations. HFDs are presented as one number (mean HFD of the primitives for that trial), as are the interval times k. Trials are named like “HFD_P01_R_95”, where the characters “HFD” indicate that the trial contains HFD data, the characters “P01” indicate the participant number (in this example the 1st), the character “R” indicates the locomotion type (see above), and the numbers “95” indicate the speed (see above).

All the code used for the pre-processing of EMG data, the extraction of muscle synergies and the calculation of HFD is available in R format. Explanatory comments are profusely present throughout the script “muscle_synergies.R”.
Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...
search.datacite.org
doi.org
+1more
Updated 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021
Explore at:
Unique identifier
https://doi.org/10.3886/e102263v5-10021
Dataset updated
2018
Dataset provided by
DataCitehttps://www.datacite.org/
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Jacob Kaplan
Description
Version 5 release notes:
Removes support for SPSS and Excel data.Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.Adds a column that indicates the number of months reported. This is generated summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime with have a value of NA for every arrest column for that crime.Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these column includes the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.Order rows by year (descending) and ORI.Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possible incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.

To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, If you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrests for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.

I created 9 arrest categories myself. The categories are:
Total Male JuvenileTotal Female JuvenileTotal Male AdultTotal Female AdultTotal MaleTotal FemaleTotal JuvenileTotal AdultTotal ArrestsAll of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set need fewer columns, I include all offenses.

As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files, eight which contain different crimes and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:

Index Crimes
MurderRapeRobberyAggravated AssaultBurglaryTheftMotor Vehicle TheftArsonAlcohol CrimesDUIDrunkenness
LiquorDrug CrimesTotal DrugTotal Drug SalesTotal Drug PossessionCannabis PossessionCannabis SalesHeroin or Cocaine PossessionHeroin or Cocaine SalesOther Drug PossessionOther Drug SalesSynthetic Narcotic PossessionSynthetic Narcotic SalesGrey Collar and Property CrimesForgeryFraudStolen PropertyFinancial CrimesEmbezzlementTotal GamblingOther GamblingBookmakingNumbers LotterySex or Family CrimesOffenses Against the Family and Children
Other Sex Offenses
ProstitutionRapeViolent CrimesAggravated AssaultMurderNegligent ManslaughterRobberyWeapon Offenses
Other CrimesCurfewDisorderly ConductOther Non-trafficSuspicion
VandalismVagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
r
Inequality measures based on election data 1871 and 1892 for Swedish...
researchdata.se
demo.researchdata.se
Updated Apr 30, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sara Moricz (2019). Inequality measures based on election data 1871 and 1892 for Swedish municipalities [Dataset]. http://doi.org/10.5878/cw7b-g897
Explore at:
(429925)Available download formats
Unique identifier
https://doi.org/10.5878/cw7b-g897
Dataset updated
Apr 30, 2019
Dataset provided by
Lund University
Authors
Sara Moricz
Time period covered
1871
Area covered
Sweden
Description
The data contains inequality measures at the municipality-level for 1892 and 1871, as estimated in the PhD thesis "Institutions, Inequality and Societal Transformations" by Sara Moricz. The data also contains the source publications: 1) tabel 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892) 2) tabel 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871)

moricz_inequality_agriculture.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the agricultural sample (2222 in total). Each column is a variable.

R71muncipality_id: a unique identifier for the municipalities in the R1871 publication (the municipality name can be obtained from the source data) R92muncipality_id: a unique identifier for the municipalities in the R1892 publication (the municipality name can be obtained from the source data) agriTop1_1871: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1871 agriTop1_1892: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1892 highestFarm_1871: a cardinal measure of the top 1 person share in the agricultural sector for 1871 highestFarm_1871: a cardinal measure of the top 1 person share in the agricultural sector for 1892

moricz_inequality_industry.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the industrial sample (1328 in total). Each column is a variable.

R71muncipality_id: see above description R92muncipality_id: see above description indTop1_1871: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1871 indTop1_1892: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1892

moricz_R1892_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the adherent codebook moricz_R1892_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892). SCB provides the scanned publication on their website. Dollar Typing Service typed and delivered the data in 2015. All numerical variables but two have been checked. This is easy to do since nearly all columns should sum up to another column. For “Folkmangd” (population) the numbers have been corrected against U1892. The highest estimate of errors in the variables is 0.005 percent (0.5 promille), calculated at cell level. The two numerical variables which have not been checked is “hogsta_fyrk_jo“ and “hogsta_fyrk_ov“, as this cannot much be compared internally in the data. According to my calculations as the worst case scenario, I have measurement errors of 0.0043 percent (0.43 promille) in those variables.

moricz_R1871_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the adherent codebook moricz_R1871_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871). SCB provides the scanned publication on their website. Dollar Typing Service typed and delivered the data in 2015. The variables have been checked for accuracy, which is feasible since columns and rows should sum. The variables that most likely carry mistakes are “hogsta_fyrk_al” and “hogsta_fyrk_jo”.
o
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
search.datacite.org
Updated May 18, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2017 [Dataset]. http://doi.org/10.3886/E103500V5
Explore at:
Unique identifier
https://doi.org/10.3886/E103500V5
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1991 - 2017
Area covered
United States
Description
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.Version 5 release notes:Adds data in the following formats: SPSS, SAS, and Excel.Changes project name to avoid confusing this data for the ones done by NACJD.Adds data for 1991.Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013 causing there to be two columns and zero values for years with the wrong label.All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. Version 4 release notes: Adds data for 2017.Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column which is the total population in that agency. Version 3 release notes: Adds data for 2016.Order rows by year (descending) and ORI.Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code. The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim religious victim, etc.). The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
b
Water column GeoFish and bottle pH data from Leg 2 (Hilo, HI to Papeete,...
bco-dmo.org
search.dataone.org
csv
Updated May 6, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gregory A. Cutter; Nicole R. Buckley (2021). Water column GeoFish and bottle pH data from Leg 2 (Hilo, HI to Papeete, French Polynesia) of the US GEOTRACES Pacific Meridional Transect (PMT) cruise (GP15, RR1815) on R/V Roger Revelle from October to November 2018 [Dataset]. http://doi.org/10.26008/1912/bco-dmo.838173.2
Explore at:
csv(24.11 KB)Available download formats
Unique identifier
https://doi.org/10.26008/1912/bco-dmo.838173.2
Dataset updated
May 6, 2021
Dataset provided by
Biological and Chemical Data Management Office
Authors
Gregory A. Cutter; Nicole R. Buckley
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 25, 2018 - Nov 21, 2018
Area covered

Variables measured
Event_ID, Sample_ID, Station_ID, End_Date_UTC, End_Latitude, End_Time_UTC, Sample_Depth, End_Longitude, Start_Date_UTC, Start_Latitude, and 10 more
Measurement technique
Spectrophotometer, GO-FLO Bottle, GeoFish Towed near-Surface Sampler
Description
Water column fish and bottle pH data
Water column temperature profiles at station PS104_13-2 colected during RV...
doi.pangaea.de
search.dataone.org
html, tsv
Updated Jan 25, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ricarda Dziadek (2019). Water column temperature profiles at station PS104_13-2 colected during RV POLARSTERN cruise PS104, 2017 [Dataset]. http://doi.org/10.1594/PANGAEA.897721
Explore at:
html, tsvAvailable download formats
Unique identifier
https://doi.org/10.1594/PANGAEA.897721
Dataset updated
Jan 25, 2019
Dataset provided by
PANGAEA
Authors
Ricarda Dziadek
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 19, 2017 - Feb 20, 2017
Area covered

Variables measured
DEPTH, water, Temperature, water
Description
Temperature profiles of the water column, extracted from Miniaturized Temperature data-Logger (MTL) measurements. Depth calculated from winch data.
Data from: Humans exploit robust locomotion by improving the stability of...
zenodo.org
explore.openaire.eu
bin
Updated Jun 17, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alessandro Santuz; Alessandro Santuz; Leon Brüll; Antonis Ekizos; Antonis Ekizos; Arno Schroll; Nils Eckardt; Nils Eckardt; Armin Kibele; Armin Kibele; Michael Schwenk; Michael Schwenk; Adamantios Arampatzis; Adamantios Arampatzis; Leon Brüll; Arno Schroll (2022). Humans exploit robust locomotion by improving the stability of control signals [Dataset]. http://doi.org/10.5281/zenodo.2687682
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.2687682
Dataset updated
Jun 17, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Alessandro Santuz; Alessandro Santuz; Leon Brüll; Antonis Ekizos; Antonis Ekizos; Arno Schroll; Nils Eckardt; Nils Eckardt; Armin Kibele; Armin Kibele; Michael Schwenk; Michael Schwenk; Adamantios Arampatzis; Adamantios Arampatzis; Leon Brüll; Arno Schroll
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that adding constraints to locomotion (e.g. rough terrain, age-related impairments, etc.) makes movements less stable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time. Surprisingly, we found that perturbations and older age force the central nervous system to produce muscle activation patterns that are more stable. These outcomes show that robust locomotion in challenging settings is achieved by increasing the stability of control signals, whereas easier tasks allow for more unstable control.

How to use the data set

This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the Maximum Lyapunov Exponents of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.

The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:

Code: the participant’s code

Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)

Group: the group to which the participant was assigned (see above for the details)

Sex: the participant’s sex (M or F)

Speed: the speed at which the recordings were conducted in [m/s] (two values separated by a comma mean that recordings were done at two different speeds, i.e. walking and running)

Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)

Height: the participant’s height in [cm]

Mass: the participant’s body mass in [kg].

The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.

The files containing the raw, filtered and the normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining thirteen columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OG_02”, where the characters “RAW_EMG” indicate that the trial contains raw emg data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P0053_OG_02”.

The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided in motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), ), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd) ), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.

The files containing the MLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “MLE.RData”. MLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the MLE, and 3) the value of the R² between the divergence curve and its linear interpolation made using the specified amount of points. The divergences are presented as a one-dimensional vector. MLE are one number like the R² value. Trials are named like “MLE_P0081_EW_01”, where the characters “MLE” indicate that the trial contains MLE data, the characters “P0081” indicate the participant number (in this example the 81st) ), the characters “EW” indicate the locomotion type (see above), and the numbers “01” indicate the trial number (in this case the 1st).

All the code used for the preprocessing of EMG data, the extraction of muscle synergies and the calculation of MLE is available in R (R Found. for Stat. Comp.) format. Explanatory comments are profusely present throughout the scripts (“SYNS.R”, which is the script to extract synergies, “fun_NMF.R”, which contains the NMF function, “MLE.R”, which is the script to calculate the MLE of motor primitives and “fun_MLE.R”, which contains the MLE function).

Facebook

Twitter

Click to copy link

Link copied

Cite

The Devastator (2022). Reddit: /r/news [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-popularity-and-user-engagement-trends/discussion

Reddit: /r/news

Exploring Topics, Scores, and Engagement

Explore at:

zip(146481 bytes)Available download formats

Dataset updated

Dec 17, 2022

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Reddit: /r/news

Exploring Topics, Scores, and Engagement

By Reddit [source]

About this dataset

This dataset provides an in-depth look into learning what communities find important and engaging in the news. With this data, researchers can discover trends related to user engagement and popular topics within subreddits. By examining the “score” and “comms_num” columns, our researchers will be able to pinpoint which topics are most liked, discussed or shared within the various subreddits. Researchers may also gain insights into not only how popular a topic is but how it is growing over time. Additionally, by exploring the body column of our dataset, researchers can understand more about which types of news stories drive conversation within particular subreddits—providing an opportunity for deeper analysis of that subreddit’s diverse community dynamics

The dataset includes eight columns: title, score, id, url, comms_num created**body and timestamp** which can help us identify key insights into user engagement among popular subreddits. With this data we may also determine relationships between topics of discussion and their impact on user engagement allowing us to create a better understanding surrounding issue-based conversations online as well as uncover emerging trends in online news consumption habits

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset is useful for those who are looking to gain insight into the popularity and user engagement of specific subreddits. The data includes 8 different columns including title, score, id, url, comms_num, created, body and timestamp. This can provide valuable information about how users view and interact with particular topics across various subreddits.

In this guide we’ll look at how you can use this dataset to uncover trends in user engagement on topics within specific subreddits as well as measure the overall popularity of these topics within a subreddit.

1) Analyzing Score: By analyzing the “score” column you can determine which news stories are popular in a particular subreddit and which ones aren't by looking at how many upvotes each story has received. With this data you will be able to determine trends in what types of stories users preferred within a particular subreddit over time.

2) Analyzing Comms_Num: Similarly to analyzing the score column you can analyze the “comms_num” column to see which news stories had more engagement from users by tracking number of comments received on each post. Knowing these points can provide insight into what types of stories tend to draw more comment activity from users in certain subreddits from one day or an extended period of time such tracking post activity for multiple weeks or months at once 3) Analyzing Body: Additionally by looking at the “body” column for each post researchers can gain a better understanding which kinds of topics/news draw attention among specific Reddit communities.. With that complete picture researchers have access not only to data measuring Reddit buzz but also access topic discussion/comments helping generate further insights into why certain posts might be popular or receive more comments than others

Overallthis dataset provides valuable insights about user engagedment related specifically topics trending accross subsbreddits allowing anyone interested reseraching such things easier way access those insights all one place

Research Ideas

Grouping news topics within particular subreddits and assessing the overall popularity of those topics in terms of scores/user engagement.

Correlating user engagement with certain news topics to understand how they influence discussion or reactions on a subreddit.

Examining the potential correlation between score and the actual body content of a given post to assess what types of content are most successful in gaining interest from users and creating positive engagement for posts

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: news.csv | Column name | Description ...

Clear search

Close search

Google apps

Main menu

Reddit: /r/news

Reddit: /r/news

Exploring Topics, Scores, and Engagement

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Water column sample data from predefined locations of the West Florida...

Video Plankton Recorder data (formatted with taxa displayed in single...

Petre_Slide_CategoricalScatterplotFigShare.pptx

7 Display the graph in a separate window. Dot colors indicate

Numerical code and data for the stellar structure and dynamical instability...

Tennessee Eastman Process Simulation Dataset

Intro

Content

Acknowledgements

User Agreement

Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...

Dataset of ISBN of books by Jason R. Hackworth

R Package History on CRAN

Context

Content

Column Description

Acknowledgements

CTD data collected onboard R/V Weatherbird II cruise WB-0717 in the Gulf of...

Data from: Data and code from: Stem borer herbivory dependent on...

dfsav

Table of variables

Note

Water column analytical data from the Chile Triple Junction collected on two...

Data from: Lower complexity of motor primitives ensures robust control of...

Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...

Inequality measures based on election data 1871 and 1892 for Swedish...

moricz_inequality_agriculture.csv

moricz_inequality_industry.csv

moricz_R1892_source_data.csv

moricz_R1871_source_data.csv

Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

Water column GeoFish and bottle pH data from Leg 2 (Hilo, HI to Papeete,...

Water column temperature profiles at station PS104_13-2 colected during RV...

Data from: Humans exploit robust locomotion by improving the stability of...

Reddit: /r/news

Exploring Topics, Scores, and Engagement

Reddit: /r/news

Exploring Topics, Scores, and Engagement

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns