Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered where the books is The economics of immigration : selected papers of Barry R. Chiswick. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript: "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". This repository also provides the R codes and datasets necessary to run the analyses described in the manuscript.
The R datasets with suffix "_a" have anonymized spatial coordinates to preserve confidentiality; the preliminary preparation of the data is therefore not provided in the public code. These datasets, all geolocated and required for the analyses, are:
Attack_sf_a.RData: 19,302 analyzed wolf attacks on sheep
ID: unique ID of the attack
DATE: date of the attack
PASTURE: the related pasture ID from "Pasture_sf_a" where the attack is located
STATUS: column resulting from the preparation and the attribution of attacks to pastures (part 2.2.4 of the manuscript); not shown here to respect confidentiality
Pasture_sf_a.RData: 4987 analyzed pastures grazed by sheep
ID: unique ID of the pasture
CODE: Official code in the pastoral census
FLOCK_SIZE: maximum annual number of sheep grazing in the pasture
USED_MONTHS: months for which the pasture is grazed by sheep
Removal_sf_a.RData: 232 analyzed single wolf removal or groups of wolf removals
ID: unique ID of the removal
OVERLAP: whether the removal is a single removal ("non-interacting" in the manuscript, "NO" here) or part of a group ("interacting" in the manuscript; "SIMULTANEOUS" for removals occurring during the same operation, "NON-SIMULTANEOUS" otherwise)
DATE_MIN: date of the single removal or date of the first removal of a group
DATE_MAX: date of the single removal or date of the last removal of a group
CLASS: administrative type of the removal according to definitions from 2.1 part of the manuscript
SEX: sex or sexes of the removed wolves if known
AGE: age class of the removed wolves, if known
BREEDER: breeding status of the removed female wolves ("Yes" for breeding females, "No" for non-breeding females). Necropsied males are "No" by default; NA indicates removed individuals that were not found.
SEASON: season of the removal, as defined in part 2.3.4 of the manuscript
MASSIF: mountain range attributed to the removal, as defined in part 2.3.4 of the manuscript
Area_to_exclude_sf_a.RData: one row for each mountain range, corresponding to the area where removal controls of the mountain range could not be sampled, as defined in part 2.3.6 of the manuscript
These datasets were used to run the following analysis codes:
Code 1: The file Kernel_wolf_culling_attacks_p.R contains the before-after analyses.
We start by delimiting the spatio-temporal buffer for each row of the "Removal_sf_a.RData" dataset.
We identify the attacks from "Attack_sf_a.RData" within each buffer, giving the data frame "Buffer_df" (one row per attack)
We select the pastures from "Pasture_sf_a.RData" within each buffer, giving the data frame "Buffer_sf" (one row per removal)
We calculate the spatial correction
We spatially slice each buffer into 200 rings, giving the data frame "Ring_sf" (one row per ring)
We add the total pastoral area of the ring of the attack ("SPATIAL_WEIGHT"), for each attack of each buffer, within Buffer_df ("Buffer_df.RData")
We calculate the pastoral correction
We create the pastoral matrix for each removal, giving a matrix of 200 rows (one for each ring) and 180 columns (one for each day, from 90 days before to 90 days after the removal date), with the total pastoral area in use by sheep in each corresponding cell of the matrix (one element per removal, "Pastoral_matrix_lt.RData")
We simulate, for each removal, the random distribution of the attacks from "Buffer_df.RData" according to "Pastoral_matrix_lt.RData". The process is done 100 times (one element per simulation, "Buffer_simulation_lt.RData"); a minimal sketch of this step is given after this list.
We estimate the attack intensities
We classify the removals into 20 subsets, according to part 2.3.4 of the manuscript ("Variables_lt.RData") (one element per subset)
We perform, for each subset, the kernel estimations with the observed attacks ("Kernel_lt.RData"), with the simulated attacks ("Kernel_simulation_lt.RData") and we correct the first kernel computations with the second ("Kernel_controlled_lt.RData") (one element per subset).
We calculate, for each subset, the trend of attack intensities that compares the total attack intensity before and after the removals (part 2.3.5 of the manuscript), giving "Trends_intensities_df.RData" (one row per subset)
We calculate, for each subset, the trend of attack intensities along the spatial axis, three times, once for each time analysis scale. This gives "Shift_df" (one row per ring and per time analysis scale).
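As an illustration of the simulation step mentioned above, here is a minimal sketch in R; the objects below are hypothetical stand-ins for one element of "Pastoral_matrix_lt.RData" and the attacks of one buffer.

```r
# Minimal sketch: redistribute the attacks of one buffer at random over the
# 200 rings x 180 days grid, with cell probabilities proportional to the
# pastoral area in use. All values here are made up.
set.seed(1)

pastoral_matrix <- matrix(runif(200 * 180, min = 0, max = 5),
                          nrow = 200, ncol = 180)    # rings x days
n_attacks <- 50                                      # hypothetical attack count in this buffer

cell_id <- sample(length(pastoral_matrix), size = n_attacks,
                  replace = TRUE, prob = as.vector(pastoral_matrix))

# Map linear cell indices back to (ring, day) positions (R matrices are column-major)
simulated <- data.frame(
  RING = (cell_id - 1) %% nrow(pastoral_matrix) + 1,
  DAY  = (cell_id - 1) %/% nrow(pastoral_matrix) + 1   # position 1..180 in the 180-day window
)
head(simulated)
```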
Code 2: The file Control_removals_p.R contains the control-impact analyses.
It starts with the simulation of 100 removal control sets ("Control_sf_lt_a.RData") from the real set of removals ("Removal_sf_a.RData"), which is done with the function "Control_fn" (l. 92).
The rest of the analysis follows the same process as in the first code, "Kernel_wolf_culling_attacks_p.R", in order to apply the before-after analyses to each control set. All objects have the same structure as before, except that each is now a list with one element per control set. These objects have "control" in their names (not to be confused with "controlled", which refers to the pastoral correction already applied in the first code).
The code is then applied again, from l. 92 to l. 433, this time to the real set of removals (l. 121), with "Simulated = FALSE" (l. 119). We could not simply reuse the results from the first code because the set of removals is here restricted to removals attributed to mountain ranges. There are two resulting objects: "Kernel_real_lt.RData" (observed real trends) and "Kernel_controlled_real_lt.RData" (real trends corrected for pastoral use).
The part of the code from line 439 to 524 relates to the calculations of the trends (for the real set and the control sets), as in the first code, giving "Trends_intensities_real_df.RData" and "Trends_intensities_control_lt.RData".
The part of the code from line 530 to 588 relates to the calculation of the 95% confidence intervals and the means of the intensity trends for each subset, based on the results of the 100 control sets (Trends_intensities_mean_control_df.RData, Trends_intensities_CImin_control_df.RData and Trends_intensities_CImax_control_df.RData). These are used to test the significance of the real trends. This comparison is done right after, l. 595-627, and gives the data frame "Trends_comparison_df.RData".
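A minimal sketch in R of that comparison, using hypothetical stand-ins for the real trends and the 100 control-set trends.

```r
# Minimal sketch: for each subset, compare the real trend to the 2.5% and 97.5%
# quantiles of the trends obtained from the 100 control sets.
# 'real_trend' and 'control_trends' are made-up stand-ins for the objects above.
set.seed(2)
n_subsets      <- 20
real_trend     <- rnorm(n_subsets)                  # one trend value per subset
control_trends <- replicate(100, rnorm(n_subsets))  # rows: subsets, columns: control sets

trends_comparison <- data.frame(
  SUBSET       = seq_len(n_subsets),
  REAL         = real_trend,
  CONTROL_MEAN = rowMeans(control_trends),
  CI_MIN       = apply(control_trends, 1, quantile, probs = 0.025),
  CI_MAX       = apply(control_trends, 1, quantile, probs = 0.975)
)
trends_comparison$SIGNIFICANT <- with(trends_comparison, REAL < CI_MIN | REAL > CI_MAX)
head(trends_comparison)
```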
Code 3: The file Figures.R produces part of the figures from the manuscript:
"Dataset map": figure 1
"Buffer": figure 2 (then pasted in powerpoint)
"Kernel construction": figure 5 (then pasted in powerpoint)
"Trend distributions": figure 7
"Kernels": part of figures 10 and S2
"Attack shifts": figure 9 and S1
"Significant": figure 8
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is The wit & wisdom of Tommy Dewar : a selection from the speeches of Sir Thomas R. Dewar, whisky baron. It features 7 columns including author, publication date, language, and book publisher.
This file contains the data set used to develop a random forest model that predicts background specific conductivity for stream segments in the contiguous United States. This Excel-readable file contains 56 columns of parameters evaluated during development. The data dictionary provides the definitions of the abbreviations and the measurement units. Each row is a unique sample labelled R**, which indicates the NHD Hydrologic Unit, followed (after an underscore) by an up to 7-digit COMID and (after another underscore) the sequential sample month. To develop models that make stream-specific predictions across the contiguous United States, we used the StreamCat data set and process (Hill et al. 2016; https://github.com/USEPA/StreamCat). The StreamCat data set is based on a network of stream segments from NHD+ (McKay et al. 2012). These stream segments drain an average area of 3.1 km2 and thus define the spatial grain size of this data set. The data set consists of minimally disturbed sites representing the natural variation in environmental conditions that occurs in the contiguous 48 United States. More than 2.4 million SC observations were obtained from STORET (USEPA 2016b), state natural resource agencies, the U.S. Geological Survey (USGS) National Water Information System (NWIS) (USGS 2016), and data used in Olson and Hawkins (2012) (Table S1). Data include observations made between 1 January 2001 and 31 December 2015, and are thus coincident with Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data (https://modis.gsfc.nasa.gov/data/). Each observation was related to the nearest stream segment in the NHD+, and data were limited to one observation per stream segment per month; SC observations with ambiguous locations and repeat measurements along a stream segment in the same month were discarded. Using estimates of anthropogenic stress derived from the StreamCat database (Hill et al. 2016), segments were selected with minimal amounts of human activity (Stoddard et al. 2006) using criteria developed for each Level II Ecoregion (Omernik and Griffith 2014). Segments were considered potentially minimally stressed where watersheds had 0-0.5% impervious surface, 0-5% urban, 0-10% agriculture, and population densities of 0.8-30 people/km2 (Table S3). Watersheds whose observations had large residuals in initial models were identified and inspected for evidence of other human activities not represented in StreamCat (e.g., mining, logging, grazing, or oil/gas extraction). Observations were removed from disturbed watersheds and from watersheds with a tidal influence or unusual geologic conditions such as hot springs. About 5% of SC observations in each National Rivers and Streams Assessment (NRSA) region were then randomly selected as independent validation data. The remaining observations became the large training data set for model calibration. This dataset is associated with the following publication: Olson, J., and S. Cormier. Modeling spatial and temporal variation in natural background specific conductivity. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 53(8): 4316-4325, (2019).
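As an illustration of the site-screening and validation split described above, here is a minimal sketch in R; the file name and the column names (pct_imperv, pct_urban, pct_agric, pop_density, nrsa_region) are assumptions, since the actual abbreviations are defined in the data dictionary.

```r
# Minimal sketch of the minimally-disturbed screening and the ~5% per-region
# validation hold-out described above. All file and column names are assumed;
# the real abbreviations are given in the data dictionary.
sites <- read.csv("background_sc_samples.csv")

minimally_disturbed <- subset(
  sites,
  pct_imperv  >= 0   & pct_imperv  <= 0.5 &   # 0 - 0.5% impervious surface
  pct_urban   >= 0   & pct_urban   <= 5   &   # 0 - 5% urban
  pct_agric   >= 0   & pct_agric   <= 10  &   # 0 - 10% agriculture
  pop_density >= 0.8 & pop_density <= 30      # 0.8 - 30 people/km2
)

# Randomly hold out about 5% of observations per NRSA region as validation data
set.seed(42)
val_idx <- unlist(lapply(
  split(seq_len(nrow(minimally_disturbed)), minimally_disturbed$nrsa_region),
  function(i) i[sample.int(length(i), ceiling(0.05 * length(i)))]
))
validation <- minimally_disturbed[val_idx, ]
training   <- minimally_disturbed[-val_idx, ]
```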
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow the business and to give customers suggestions on itemsets, so that we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.
Association Rules are most useful when you want to discover associations between different objects in a set, i.e., to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between the items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(computer mouse) = 0.08/0.10 = 0.80; lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
[Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]
First, we need to load the required libraries; each library is briefly described below.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Screenshot: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]
Next, we clean the data frame by removing missing values.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
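A minimal sketch in R of the full pipeline described above, using the arules package; the column names "BillNo" and "Itemname" are assumptions and may differ from those in Assignment-1_Data.xlsx.

```r
# Minimal sketch: read the workbook, drop missing values, convert invoices to
# transactions, and mine association rules. Column names are assumptions.
library(readxl)
library(arules)

retail <- read_excel("Assignment-1_Data.xlsx")
retail <- retail[complete.cases(retail), ]                  # remove missing values

# One transaction per invoice: the set of distinct items bought together
trans <- as(lapply(split(retail$Itemname, retail$BillNo), unique), "transactions")

# Mine rules with illustrative support/confidence thresholds
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))

inspect(head(sort(rules, by = "lift"), 10))                 # strongest rules by lift
```

The support and confidence thresholds above are illustrative; they would normally be tuned to the size of the transaction database.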
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent decades, the unfavorable solubility of novel therapeutic agents has been considered an important challenge in the pharmaceutical industry. Supercritical carbon dioxide (SCCO2) is known as a green, cost-effective, high-performance, and promising solvent for improving the low solubility of drugs with the aim of enhancing their therapeutic effects. The main objective of this study is to develop and refine several predictive models based on artificial intelligence (AI) to estimate the optimized value of Oxaprozin solubility in the SCCO2 system. In this paper, three different models were developed on a solubility dataset. Pressure (bar) and temperature (K) are the two inputs for each vector, and each vector has one output (solubility). The selected models are NU-SVM, Linear-SVM, and Decision Tree (DT). Models were optimized through hyper-parameter tuning and assessed using standard metrics. In terms of the R-squared metric, NU-SVM, Linear-SVM, and DT have scores of 0.994, 0.854, and 0.950, respectively, with RMSE values of 3.0982E-05, 1.5024E-04, and 1.1680E-04. Based on these evaluations, NU-SVM was considered the most precise method, and the optimal values it predicts can be summarized as (T = 336.05 K, P = 400.0 bar, solubility = 0.00127).
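The paper's implementation is not reproduced here; the following is a minimal sketch in R of this kind of model comparison, using e1071 (nu-SVR and linear SVR) and rpart (decision tree) as stand-ins and a hypothetical data frame with columns T_K, P_bar, and solubility.

```r
# Minimal sketch: fit nu-SVM, linear SVM, and decision-tree regressors on a
# (hypothetical) solubility data set and compare RMSE and R-squared in-sample.
library(e1071)   # nu-SVR and eps-SVR
library(rpart)   # decision tree

sol <- read.csv("oxaprozin_solubility.csv")   # hypothetical file: T_K, P_bar, solubility

nu_svm  <- svm(solubility ~ T_K + P_bar, data = sol, type = "nu-regression",  kernel = "radial")
lin_svm <- svm(solubility ~ T_K + P_bar, data = sol, type = "eps-regression", kernel = "linear")
dt      <- rpart(solubility ~ T_K + P_bar, data = sol)

metrics <- function(obs, pred) {
  c(RMSE = sqrt(mean((obs - pred)^2)),
    R2   = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}
sapply(list(NU_SVM = nu_svm, Linear_SVM = lin_svm, DT = dt),
       function(m) metrics(sol$solubility, predict(m, sol)))
```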
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, they staggered them such that Winter Games occur on a four year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
The file athlete_events.csv contains 271,116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are: ID, Name, Sex, Age, Height, Weight, Team, NOC, Games, Year, Season, City, Sport, Event, and Medal.
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
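For example, here is a minimal sketch in R of one such question, the share of female athlete-event entries per Games, using the Sex, Year, and Season columns listed above.

```r
# Minimal sketch: share of female athlete-event entries per Games year and season.
athletes <- read.csv("athlete_events.csv", stringsAsFactors = FALSE)

female_share <- aggregate(Sex ~ Year + Season, data = athletes,
                          FUN = function(s) mean(s == "F"))
names(female_share)[3] <- "female_share"
female_share[order(female_share$Year), ]
```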
The lack of publicly available National Football League (NFL) data sources has been a major obstacle to the creation of modern, reproducible research in football analytics. While clean play-by-play data is available via open-source software packages in other sports (e.g., nhlscrapr for hockey, PitchF/x data in baseball, and Basketball Reference for basketball), equivalent datasets are not freely available for researchers interested in the statistical analysis of the NFL. To solve this issue, a group of Carnegie Mellon University statistical researchers including Maksim Horowitz, Ron Yurko, and Sam Ventura built and released nflscrapR, an R package which uses an API maintained by the NFL to scrape, clean, parse, and output clean datasets at the individual play, player, game, and season levels. Using the data output by the package, the trio went on to develop reproducible methods for building expected points and win probability models for the NFL. The outputs of these models are included in this dataset and can be accessed using the nflscrapR package.
The dataset made available on Kaggle contains all the regular season plays from the 2009-2016 NFL seasons. The dataset has 356,768 rows and 100 columns. Each play is broken down into great detail containing information on: game situation, players involved, results, and advanced metrics such as expected point and win probability values. Detailed information about the dataset can be found at the following web page, along with more NFL data: https://github.com/ryurko/nflscrapR-data.
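A minimal sketch in R of loading and inspecting the play-by-play file; the file name below is an assumption and should be adjusted to the file shipped with the Kaggle dataset.

```r
# Minimal sketch: load the play-by-play data and check its dimensions.
# The file name is an assumption; adjust it to the Kaggle download.
pbp <- read.csv("NFL_play_by_play_2009_2016.csv", stringsAsFactors = FALSE)

dim(pbp)          # expected: 356768 rows and 100 columns
head(names(pbp))  # column names are documented in the nflscrapR-data repository
```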
This dataset was compiled by Ron Yurko, Sam Ventura, and myself. Special shout-out to Ron for improving our current expected points and win probability models and compiling this dataset. All three of us are proud founders of the Carnegie Mellon Sports Analytics Club.
This dataset is meant both to grow and to bring together the sports analytics community by providing clean and easily accessible NFL data that has never before been available on this scale for free.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
About this file The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.
Here is a description of the columns in the dataset:
category: The category of products sold in the superstore.
city: The city where the order was placed.
country: The country in which the superstore is located.
customer_id: A unique identifier for each customer.
customer_name: The name of the customer who placed the order.
discount: The discount applied to the order.
market: The market or region where the superstore operates.
ji_lu_shu: An unknown or unspecified column.
order_date: The date when the order was placed.
order_id: A unique identifier for each order.
order_priority: The priority level of the order.
product_id: A unique identifier for each product.
product_name: The name of the product.
profit: The profit generated from the order.
quantity: The quantity of products ordered.
region: The region where the order was placed.
row_id: A unique identifier for each row in the dataset.
sales: The total sales amount for the order.
segment: The customer segment (e.g., consumer, corporate, or home office).
ship_date: The date when the order was shipped.
ship_mode: The shipping mode used for the order.
shipping_cost: The cost of shipping for the order.
state: The state or region within the country.
sub_category: The sub-category of products within the main category.
year: The year in which the order was placed.
market2: Another column related to market information.
weeknum: The week number when the order was placed.
This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
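As a quick illustration, here is a minimal sketch in R that loads the transformed CSV (the file name is assumed) and summarizes the documented sales and profit columns by category.

```r
# Minimal sketch: total sales and profit per product category.
# The CSV file name is an assumption; use the transformed file linked above.
superstore <- read.csv("global_superstore.csv", stringsAsFactors = FALSE)

by_category <- aggregate(cbind(sales, profit) ~ category, data = superstore, FUN = sum)
by_category[order(-by_category$profit), ]
```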