This Excel spreadsheet is the result of merging, at the port level, several in-house fisheries databases with other demographic databases such as the U.S. Census. The fisheries databases used include port listings, weighout (dealer) landings, permit information on homeports and owners' cities of residence, dealer permit information, and logbook records. Port names were consolidated in line with USGS and Census conventions, and typographical errors, non-conventional spellings, and other issues were corrected. Each row is a community; some data may be confidential, since not all communities have 3 or more entities for the various variables.
The files and workflow allow you to replicate the study titled "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus". This study used the CEToM corpus (https://cetom.univie.ac.at/) (Tocharian) to analyze the life-world of the elites of an extinct society situated in the Tarim Basin in modern northwestern China. To acquire the raw data needed for steps 1 & 2, please contact Melanie Malzahn (melanie.malzahn@univie.ac.at). We conducted a mixed-methods study consisting of close reading, content analysis, and multiple correspondence analysis (MCA). The Excel file titled "fragments_architecture_combined.xlsx" allows for replication of the MCA and corresponds to the third step of the workflow outlined below. We used the following programming languages and packages to prepare and analyze the data. Data preparation and merging were done in Python (version 3.9.10) with the packages pandas (1.5.3), numpy (1.24.3), gensim (4.3.1), BeautifulSoup4 (4.12.2), pyasn1 (0.4.8), and langdetect (1.0.9), plus the standard-library modules os and re. Multiple correspondence analyses were conducted in R (version 4.3.2) with the packages FactoMineR (2.9), factoextra (1.0.7), readxl (1.4.3), tidyverse (2.0.0), ggplot2 (3.4.4), and psych (2.3.9). After requesting the necessary files, please open the scripts in the order outlined below and execute them to replicate the analysis. Preparatory step: create a folder for the Python and R scripts downloadable in this repository. Open the file 0_create folders.py and declare a root folder in line 19. This first script will generate the following folders: "tarim-brahmi_database" = folder containing Tocharian dictionaries and Tocharian text fragments.
"dictionaries" = contains tocharian A and tocharian B vocabularies, including linguistic features such as translations, meanings, part of speech tags etc. A full overview of the words is provided on https://cetom.univie.ac.at/?words. "fragments" = contains tocharian text fragments as xml-files. "word_corpus_data" = folder will contain excel-files of the corpus data after the first step. "Architectural_terms" = This folder contains the data on the architectural terms used in the dataset (e.g. dwelling, house). "regional_data" = This folder contains the data on the findsports (tocharian and modern chinese equivalent, e.g. Duldur-Akhur & Kucha). "mca_ready_data" = This is the folder, in which the excel-file with the merged data will be saved. Note that the prepared file named "fragments_architecture_combined.xlsx" can be saved into this directory. This allows you to skip steps 1 &2 and reproduce the MCA of the content analysis based on the third step of our workflow (R-Script titled 3_conduct_MCA.R). First step - run 1_read_xml-files.py: loops over the xml-files in folder dictionaries and identifies a) word metadata, including language (Tocharian A or B), keywords, part of speech, lemmata, word etymology, and loan sources. Then, it loops over the xml-textfiles and extracts a text id number, langauge (Tocharian A or B), text title, text genre, text subgenre, prose type, verse type, material on which the text is written, medium, findspot, the source text in tocharian, and the translation where available. After successful feature extraction, the resulting pandas dataframe object is exported to the word_corpus_data folder. Second step - run 2_merge_excel_files.py: merges all excel files (corpus, data on findspots, word data) and reproduces the content analysis, which was based upon close reading in the first place. Third step - run 3_conduct_MCA.R: recodes, prepares, and selects the variables necessary to conduct the MCA. 
It then produces the descriptive values before conducting the MCA, identifying typical texts per dimension, and exporting the PNG files uploaded to this repository.
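The merge in step 2 can be sketched in pandas. The miniature frames and column names below are invented stand-ins for illustration, not the real schema of the files in word_corpus_data and regional_data:

```python
import pandas as pd

# Hypothetical stand-ins for the corpus table and the findspot table.
corpus = pd.DataFrame({"text_id": [1, 2],
                       "findspot": ["Duldur-Akhur", "Sangim"],
                       "language": ["TB", "TA"]})
regions = pd.DataFrame({"findspot": ["Duldur-Akhur", "Sangim"],
                        "modern_equivalent": ["Kucha", "Turfan"]})

# Left-join so every fragment keeps its row even without regional metadata.
merged = corpus.merge(regions, on="findspot", how="left")
```

A `how="left"` join preserves all fragments; the merged frame would then be written to the mca_ready_data folder for the R script.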
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making it too large to open in Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software to merge them. We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
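The cross-table joins described above are routine in any of these tools. As a minimal illustration in pandas, the table and column names below are assumptions, not the real IPGOD schema:

```python
import pandas as pd

# Invented miniature versions of two IPGOD-style tables joined on an
# application identifier (column names are hypothetical).
applications = pd.DataFrame({"appl_id": [101, 102],
                             "ip_right": ["patent", "trademark"]})
applicants = pd.DataFrame({"appl_id": [101, 102],
                           "state": ["VIC", "NSW"]})

# Join applicant attributes onto each application record.
merged = applications.merge(applicants, on="appl_id", how="inner")
```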
IP Australia also provides free trials of the IP Data Platform, a cloud-based analytics platform with the capability to work with large intellectual property datasets such as IPGOD through the web browser, without installing any software.
The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.
Due to changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a cleaned and merged version of the original UCI Online Retail and Online Retail II datasets. It contains transaction data from a UK-based online retailer, covering a period from December 2009 to December 2011.
The original UCI Online Retail II dataset contains two separate sheets:
- Year 2009–2010
- Year 2010–2011
These have been merged with the original UCI Online Retail dataset to create a unified and continuous dataset.
Cleaned or derived fields include:
- quantity
- price
- customer_id
- total_price column (quantity × price)
- is_cancelled column based on invoice format or return flag
- invoicedate formatting

| Column | Description |
|---|---|
| invoice | Invoice number (returns start with 'C') |
| stockcode | Product code |
| description | Description of product |
| quantity | Number of items purchased |
| invoicedate | Date and time of invoice |
| price | Unit price in GBP |
| customer_id | Unique identifier for each customer |
| country | Customer's country |
| is_cancelled | Boolean flag for cancelled transactions |
| total_price | Computed total (quantity × price) for each line item |
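The two derived columns are straightforward to reproduce in pandas; a minimal sketch using the column names from the table (with invented sample rows):

```python
import pandas as pd

# Two sample line items; in the UCI data, cancelled invoices start with 'C'.
df = pd.DataFrame({
    "invoice": ["536365", "C536379"],
    "quantity": [6, -1],
    "price": [2.55, 27.50],
})

df["total_price"] = df["quantity"] * df["price"]          # quantity × price
df["is_cancelled"] = df["invoice"].str.startswith("C")    # invoice format flag
```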
| File | Type | Description |
|---|---|---|
| online_retail_cleaned.csv | Data | Cleaned and merged retail transactions from 2009–2011 |
| rfm_final_score.csv | Output | Final RFM scores for each customer with segment labels |
| Retail_Data_Analysis_Dashboard.xlsx | Excel | Interactive Excel dashboard with KPIs, CLV, monthly trends |
| Retail_Data_Analysis_Dashboard.png | Image | Visual preview of the Excel dashboard |
| RFM_Segmentation.sql | SQL | SQL logic to calculate RFM scores and assign segments |
| Cohort_Analysis_on_Customer.sql | SQL | Cohort analysis based on acquisition month |
| Cohort_Analysis_on_Revenue.sql | SQL | Cohort revenue tracking over time |
In addition to the cleaned dataset, this dataset includes complete analysis artifacts. These files are provided in .xlsx and .sql formats and can be used for further business analysis or modeling.
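RFM_Segmentation.sql computes the scores in SQL; an equivalent pandas sketch (invented sample customers, quintile scores 1–5) might look like:

```python
import pandas as pd

# Hypothetical per-customer aggregates; the real values come from the
# cleaned transaction file.
rfm = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "recency_days": [5, 40, 120, 300, 10],
    "frequency": [20, 5, 2, 1, 15],
    "monetary": [900.0, 150.0, 60.0, 20.0, 700.0],
})

# Lower recency is better, so its quintile labels run in reverse.
rfm["R"] = pd.qcut(rfm["recency_days"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5,
                   labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["rfm_score"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
```

Segment labels (e.g. "Champions" for high R/F/M) can then be assigned by binning the combined score, mirroring the SQL logic.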
Original datasets: - UCI Online Retail II: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
This version was cleaned and merged by: Md Shah Nawaj
retail, ecommerce, customer segmentation, transactions, time series, data cleaning, rfm, python, pandas, online retail
http://data.europa.eu/eli/dec/2011/833/oj
The Eastern Partnership Risk Analysis Network (EaP-RAN) performs monthly exchanges of statistical data and information on the most recent irregular migration trends. This information is compiled at the level of the Frontex Risk Analysis Unit (RAU) and analysed in cooperation with the regional partners on a quarterly and annual basis. The annual reports offer a more in-depth analysis of developments and phenomena that impact the regional and common borders, while the quarterly reports provide regular updates and identify emerging trends in order to maintain situational awareness. Both types of reports are aimed at supporting strategic and operational decision making.
The Eastern Partnership Quarterly statistical overview is focused on quarterly developments for the seven key indicators of irregular migration: (1) detections of illegal border-crossing between BCPs; (2) detections of illegal border-crossing at BCPs; (3) refusals of entry; (4) detections of illegal stay; (5) asylum applications; (6) detections of facilitators; and (7) detections of fraudulent documents.
The backbone of this overview is the monthly statistics provided within the framework of the EaP-RAN (Armenia, Azerbaijan, Belarus, Georgia, Moldova and Ukraine) and reference-period statistics from common border sections of the neighbouring EU Member States and Schengen Associated Countries (Norway, Finland, Estonia, Latvia, Lithuania, Poland, Slovakia, Hungary and Romania). The data are processed, checked for errors and merged into an Excel database for further analysis.
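The monthly-to-quarterly roll-up described above can be sketched as a simple group-and-sum; the countries follow the text, but the indicator column name and values are invented:

```python
import pandas as pd

# Invented monthly reports for one indicator across two partner countries.
monthly = pd.DataFrame({
    "country": ["Ukraine", "Ukraine", "Ukraine",
                "Moldova", "Moldova", "Moldova"],
    "month": ["2023-01", "2023-02", "2023-03"] * 2,
    "refusals_of_entry": [10, 12, 8, 3, 4, 5],
})

# Aggregate the three monthly submissions into a quarterly total per country.
quarterly = monthly.groupby("country", as_index=False)["refusals_of_entry"].sum()
```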
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These data sets are the analysis files used in the article "Identification of Hub Genes Associated With Cervical Cancer by Integrated Bioinformatics Analysis", obtained from the public databases GEO and TCGA respectively. There are 6 Excel files, described below:
GSE63514.xlsx: expression data for the tumor and control groups in dataset GSE63514.
GSE9750.xlsx: expression data for the tumor and control groups in dataset GSE9750.
GEO-merge.xlsx: datasets GSE63514 and GSE9750 merged and standardized to obtain the total expression data.
Clinical.xlsx: clinical data of CESC patients in the TCGA database.
TCGA_RPKM.xlsx: FPKM expression data of CESC patients in the TCGA database, transformed into RPKM expression data with R software.
Genelist.xlsx: differentially expressed genes, genes of the most relevant modules in WGCNA, and a list of four-part intersection genes.
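The merge underlying GEO-merge.xlsx joins the two expression matrices on their shared gene identifiers (standardization, which the authors also apply, is omitted here). Gene symbols and values below are invented for illustration:

```python
import pandas as pd

# Toy expression matrices for the two GEO series (one sample column each).
gse63514 = pd.DataFrame({"gene": ["TP53", "MKI67", "CDKN2A"],
                         "s1": [5.2, 7.1, 6.3]})
gse9750 = pd.DataFrame({"gene": ["TP53", "CDKN2A", "PCNA"],
                        "s2": [4.9, 6.8, 7.4]})

# Inner join keeps only genes measured in both datasets.
merged = gse63514.merge(gse9750, on="gene", how="inner")
```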
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: GULOU 鼓楼区 (350102), TAIJIANG 台江区 (350103), CANGSHAN 仓山区 (350104), MAWEI 马尾区 (350105), JINAN 晋安区 (350111), MINHOU 闽侯县 (350121), LIANJIANG 连江县 (350122), LUOYUAN 罗源县 (350123), MINQING 闽清县 (350124), YONGTAI 永泰县 (350125), PINGTAN 平潭县 (350128), FUQING 福清市 (350181), CHANGLE 长乐市 (350182), GULANGYU 鼓浪屿区 (350202), SIMING 思明区 (350203), KAIYUAN 开元区 (350204), XINGLIN 杏林区 (350205), HULI 湖里区 (350206), JIMEI 集美区 (350211), TONGAN 同安区 (350212), CHENGXIANG 城厢区 (350302), HANJIANG 涵江区 (350303), PUTIAN 莆田县 (350321), XIANYOU 仙游县 (350322), MEILIE 梅列区 (350402), SANYUAN 三元区 (350403), MINGXI 明溪县 (350421), QINGLIU 清流县 (350423), NINGHUA 宁化县 (350424), DATIAN 大田县 (350425), YOUXI 尤溪县 (350426), SHAXIAN 沙县 (350427), JIANGLE 将乐县 (350428), TAINING 泰宁县 (350429), JIANNING 建宁县 (350430), YONGAN 永安市 (350481), LICHENG 鲤城区 (350502), FENGZE 丰泽区 (350503), LUOJIANG 洛江区 (350504), QUANGANG 泉港区 (350505), HUIAN 惠安县 (350521), ANXI 安溪县 (350524), YONGCHUN 永春县 (350525), DEHUA 德化县 (350526), SHISHI 石狮市 (350581), JINJIANG 晋江市 (350582), NANAN 南安市 (350583), XIANGCHENG 芗城区 (350602), LONGWEN 龙文区 (350603), YUNXIAO 云霄县 (350622), ZHANGPU 漳浦县 (350623), ZHAOAN 诏安县 (350624), CHANGTAI 长泰县 (350625), DONGSHAN 东山县 (350626), NANJING 南靖县 (350627), PINGHE 平和县 (350628), HUAAN 华安县 (350629), LONGHAI 龙海市 (350681), YANPING 延平区 (350702), SHUNCHANG 顺昌县 (350721), PUCHENG 浦城县 (350722), GUANGZE 光泽县 (350723), SONGXI 松溪县 (350724), ZHENGHE 政和县 (350725), SHAOWU 邵武市 (350781), WUYISHAN 武夷山市 (350782), JIANOU 建瓯市 (350783), JIANYANG 建阳市 (350784), XINLUO 新罗区 (350802), CHANGTING 长汀县 (350821), YONGDING 永定县 (350822), SHANGHANG 上杭县 (350823), WUPING 武平县 (350824), LIANCHENG 连城县 (350825), ZHANGPING 漳平市 (350881), JIAOCHENG 蕉城区 (350902), XIAPU 霞浦县 (350921), GUTIAN 古田县 (350922), PINGNAN 屏南县 
(350923), SHOUNING 寿宁县 (350924), ZHOUNING 周宁县 (350925), ZHERONG 柘荣县 (350926), FUAN 福安市 (350981), FUDING 福鼎市 (350982).
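The per-category merge across counties can be sketched as a loop that reads one spreadsheet per county, tags each with its county code, and concatenates. The file-naming pattern and columns below are hypothetical; the in-memory frame stands in for `pd.read_excel` on a real file:

```python
import pandas as pd

frames = []
for code, name in [("350102", "GULOU"), ("350103", "TAIJIANG")]:
    # df = pd.read_excel(f"{code}_{name}.xls")   # hypothetical real per-county file
    df = pd.DataFrame({"indicator": ["population"], "value": [100]})  # stand-in
    df["county_code"] = code   # keep the county's origin in the merged file
    frames.append(df)

merged = pd.concat(frames, ignore_index=True)
```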
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice. Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip
Included here are:
- A link to the repeated cross-sections data
- Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
- Replication R script for the repeated cross sections cleaning and analysis
- Replication R script for the panel data cleaning and analysis
- An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set for the essay "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:
- The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary compressed in CCITT Fax 4 format. 219 TUs are divided into two parts and 14 into three parts; the original data therefore consists of 480 partial recordings.
- The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, the visualization of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
- The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
- The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the work, the correctness rating of the results, and the assignment to the presented sources of error.
The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU): Microfilm metadata - TA (partial recording) - Number: "> 1"; Document metadata - Object part: "130 (Wehrwangen, Wehrpfeiler)" - Object ID no.: "213 (weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00"
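The stitching pipeline itself is feature-based; as a toy illustration of only the seam-merging idea (not the actual alignment step), two partial scans that overlap by a known strip can be blended by averaging the overlap:

```python
import numpy as np

# Invented miniature "scans": uniform arrays standing in for two halves of
# a plan that share a 2-column overlap strip.
left = np.ones((4, 6), dtype=float)   # left partial recording
right = np.full((4, 6), 3.0)          # right partial recording
overlap = 2                           # columns shared by both scans

blended = np.hstack([
    left[:, :-overlap],
    (left[:, -overlap:] + right[:, :overlap]) / 2,  # average along the seam
    right[:, overlap:],
])
```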
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SMARTDEST DATASET WP3 v1.0 includes data at sub-city level for 7 cities: Amsterdam, Barcelona, Edinburgh, Lisbon, Ljubljana, Turin, and Venice. It is made up of information extracted from public sources at the local level (mostly city council open data portals) or volunteered geographic information, that is, geospatial content generated by non-professionals using mapping systems available on the Internet (e.g., Geofabrik). Details on data sources and variables are included in a 'metadata' spreadsheet in the Excel file. The same Excel file contains 5 additional spreadsheets. The first, labelled #1, was used to analyze the determinants of the geographical spread of tourism supply in the SMARTDEST case study cities (main document D3.3, section 4.1). The second (labelled #2) offers the information needed to replicate the analysis of tourism-led population decline reported in section 4.3. The spreadsheets named #3-AMS, #4-BCN, and #5-EDI refer to data sources and variables used in the follow-up analyses discussed in section 5.1, which dig into the causes of depopulation in Amsterdam, Barcelona, and Edinburgh, respectively. The column 'row' can be used to merge the Excel file with the shapefile 'db_task3.3_SmartDest'. Data are available at the buurt level in Amsterdam (an administrative unit roughly corresponding to a neighbourhood), at the census tract level in Barcelona and Ljubljana, for data zones in Edinburgh, statistical zones in Turin, and località in Venice.
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.
Version 3 release notes:
- Adds data in the following formats: Excel.
- Changes project name to avoid confusing this data with the data sets done by NACJD.
Version 2 release notes:
- Adds data for 2017.
- Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.
Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. for robbery it is broken down into subcategories including highway robbery, bank robbery, gas station robbery). The majority of the data relates to theft, which is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported). All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data. There may be inaccuracies in the data, particularly in the group of columns starting with "auto".
To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto", as they are common data-entry error values (e.g. larger than the agency's population, or much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value". For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative-number indicators are not included in the FBI's codebook for this data but are present in the data; I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data. To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) codes and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
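The sentinel-value cleaning described above (done in R by the author) can be sketched in pandas; the column names here are invented examples of the "offenses"/"auto" and "value" groups:

```python
import numpy as np
import pandas as pd

# Common data-entry error values that become NA in the targeted columns.
error_values = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
                10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
                90000, 100000, 99942]

df = pd.DataFrame({"offenses_theft": [12, 1000, 7],
                   "value_theft": [500, 1000, 250]})

# Only columns starting with "offenses" or "auto" are cleaned; "value"
# columns are left untouched, as in the original procedure.
target_cols = [c for c in df.columns if c.startswith(("offenses", "auto"))]
df[target_cols] = df[target_cols].replace(error_values, np.nan)
```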
Version 5 release notes:
Removes support for SPSS and Excel data. Changes the crimes that are stored in each file: there are more files now, with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported, generated by summing the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime; they may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these column includes the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Adds data for 2016. Orders rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
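The months-reported indicator described in the version 5 notes boils down to counting unique reporting months per agency. A pandas sketch with invented agency IDs and months:

```python
import pandas as pd

# Toy arrest records: agency AG1 reports in two distinct months
# (January twice, May once); AG2 reports in one month.
arrests = pd.DataFrame({
    "ori": ["AG1", "AG1", "AG1", "AG2"],
    "month": [1, 1, 5, 3],
})

# Count unique months with any report, per agency.
months = (arrests.groupby("ori")["month"].nunique()
          .rename("number_of_months_reported")
          .reset_index())
```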
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported"; in other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.
To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
I created 9 arrest categories myself. The categories are:
- Total Male Juvenile
- Total Female Juvenile
- Total Male Adult
- Total Female Adult
- Total Male
- Total Female
- Total Juvenile
- Total Adult
- Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
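Building totals by summing the sex-age columns can be sketched as below; the specific column names are invented (the real data has one column per sex-age cell):

```python
import pandas as pd

# Hypothetical sex-age arrest columns for a single agency-year row.
df = pd.DataFrame({"male_under_10": [1], "male_aged_22": [4],
                   "female_under_10": [0], "female_aged_22": [2]})

# Sum across all columns of each sex to get the derived totals.
male_cols = [c for c in df.columns if c.startswith("male")]
female_cols = [c for c in df.columns if c.startswith("female")]
df["total_male"] = df[male_cols].sum(axis=1)
df["total_female"] = df[female_cols].sum(axis=1)
df["total_arrests"] = df["total_male"] + df["total_female"]
```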
As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files, eight which contain different crimes and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:
Index Crimes: Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes: DUI, Drunkenness, Liquor
Drug Crimes: Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes: Forgery, Fraud, Stolen Property
Financial Crimes: Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes: Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes: Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes: Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple: This data set has every crime and only the arrest categories that I created (see above).
Data licence Germany – Attribution – Version 2.0 https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Here you can download the data underlying our household application as PDF, Excel, CSV, or XML documents. You may use the data, e.g. reproduce, distribute, process, and merge it with other data, including for commercial purposes.
The legally binding document is always the official document on the federal budget. Therefore, if you use data from this application, please always point your users to these documents.
https://www.gnu.org/licenses/gpl-3.0.html
About Cyclistic
Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. It aims to make bike-share more inclusive for people with disabilities and riders who can't use a standard two-wheeled bike. In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 geotracked bicycles locked into a network of 692 stations across Chicago. Bikes can be unlocked from one station and returned to any other station in the system at any time.
Problem Statement
The marketing team aims to convert casual riders into annual members. To do so, they need to understand how annual members use the service differently from casual riders, and how often each group uses it.
Solution
For this project, our team chose Excel to carry out and present the analysis. Following the Ask phase, we prepared the data according to what the client was asking for, then processed it to make it clean, organized, and easy to access, and finally analyzed it to get the results.
As per the requirement of our client, they wanted to increase the number of their annual members. To do so, they wanted to know: how do annual members and casual riders use Cyclistic bikes differently?
With the company's requirement in hand, it was time to Prepare and Process the data. For this analysis we were told to use only the previous 12 months of Cyclistic trip data. The data has been made available online by Motivate International Inc. We checked the integrity and credibility of the data by making sure that the online source through which it is available is safe and secure.
While preparing the data, we started by downloading the files to our machine. We saved the files and unzipped them, then created subfolders for the .csv and .xls sheets. Before further analysis we cleaned the data, using the Filter option on the required columns to check for NULLs or any data that was not supposed to be there.
While cleaning the data, we found that in some of the monthly files the started_at and ended_at columns had the custom format mm:ss.0. For consistency with all the other spreadsheets we changed the custom format to m/d/yy h:mm. We also found that some spreadsheets had data from other months, but on further analysis we figured out that those rides started in that month and ended in the next month, so the data did belong in that worksheet.
After cleaning the data, we created 2 new columns in each worksheet to perform our calculations:
a) ride_length
b) day_of_week
To create the ride_length column we subtracted the started_at column from the ended_at column, which gave us the length of each ride for every day of the month. To create day_of_week we used the WEEKDAY function. After cleaning the data on a monthly basis, it was time to merge all 12 months into a single spreadsheet. After merging the whole data into a new sheet, it was time to Analyze! Before analyzing, our team made sure one more time that the data was properly organized and formatted and that there were no errors in it. To analyze the data we ran a few calculations to get a better sense of the data layout: a) mean of ride_length, b) max of ride_length, c) mode of day_of_week.
To find the mean of ride_length we used the AVERAGE function, giving an overview of how long rides usually last. Using MAX we found the longest ride length. Last but not least, with the MODE function we calculated the most frequent day of the week on which riders used the service.
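For readers working outside Excel, the two helper columns and the summary statistics translate directly to pandas (the timestamps below are invented sample rides; note pandas counts Monday as 0, whereas Excel's WEEKDAY defaults to Sunday = 1):

```python
import pandas as pd

# Two invented trips standing in for the monthly trip files.
trips = pd.DataFrame({
    "started_at": pd.to_datetime(["2021-06-01 08:00", "2021-06-05 14:30"]),
    "ended_at": pd.to_datetime(["2021-06-01 08:25", "2021-06-05 15:10"]),
})

trips["ride_length"] = trips["ended_at"] - trips["started_at"]  # ended - started
trips["day_of_week"] = trips["started_at"].dt.dayofweek         # Monday = 0

mean_ride = trips["ride_length"].mean()   # AVERAGE equivalent
max_ride = trips["ride_length"].max()     # MAX equivalent
```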
To support the question asked by our client and to identify trends and relationships, we made a Pivot Table in Excel so we could present our insights to the client in an accessible way. The Pivot Table makes it clear that annual members use the service more than casual riders, and it also gives a good picture of how often annual members use it. Using the Pivot Table, we found that the total number of rides is higher for annual members than for casual riders. We also found that the average ride length is longer for casual riders than for annual members, meaning casual riders ride for longer periods of time, but annual members use the service more often than casual riders.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: FURONG 芙蓉区 (430102), TIANXIN 天心区 (430103), YUELU 岳麓区 (430104), KAIFU 开福区 (430105), YUHUA 雨花区 (430111), CHANGSHA 长沙县 (430121), WANGCHENG 望城县 (430122), NINGXIANG 宁乡县 (430124), LIUYANG 浏阳市 (430181), HETANG 荷塘区 (430202), LUSONG 芦淞区 (430203), SHIFENG 石峰区 (430204), TIANYUAN 天元区 (430211), ZHUZHOU 株洲县 (430221), YOU 攸县 (430223), CHALING 茶陵县 (430224), YANLING 炎陵县 (430225), LILING 醴陵市 (430281), YUHU 雨湖区 (430302), YUETANG 岳塘区 (430304), XIANGTAN 湘潭县 (430321), XIANGXIANG 湘乡市 (430381), SHAOSHAN 韶山市 (430382), JIANGDONG 江东区 (430402), CHENGNAN 城南区 (430403), CHENGBEI 城北区 (430404), HENGYANG JIAOQU 衡阳郊区 (430411), NANYUE 南岳区 (430412), HENGYANG 衡阳县 (430421), HENGNAN 衡南县 (430422), HENGSHAN 衡山县 (430423), HENGDONG 衡东县 (430424), QIDONG 祁东县 (430426), LEIYANG 耒阳市 (430481), CHANGNING 常宁市 (430482), SHUANGQING 双清区 (430502), DAXIANG 大祥区 (430503), BEITA 北塔区 (430511), SHAODONG 邵东县 (430521), XINSHAO 新邵县 (430522), SHAOYANG 邵阳县 (430523), LONGHUI 隆回县 (430524), DONGKOU 洞口县 (430525), SUINING 绥宁县 (430527), XINNING 新宁县 (430528), CHENGBUMIAOZUZIZHIXIAN 城步苗族自治县 (430529), WUGANG 武冈市 (430581), YUEYANGLOU 岳阳楼区 (430602), YUNXI 云溪区 (430603), JUNSHAN 君山区 (430611), YUEYANG 岳阳县 (430621), HUARONG 华容县 (430623), XIANGYIN 湘阴县 (430624), PINGJIANG 平江县 (430626), MILUO 汨罗市 (430681), LINXIANG 临湘市 (430682), WULING 武陵区 (430702), DINGCHENG 鼎城区 (430703), ANXIANG 安乡县 (430721), HANSHOU 汉寿县 (430722), LI 澧县 (430723), LINLI 临澧县 (430724), TAOYUAN 桃源县 (430725), SHIMEN 石门县 (430726), JINSHI 津市市 (430781), YONGDING 永定区 (430802), WULINGYUAN 武陵源区 (430811), CILI 慈利县 (430821), SANGZHI 桑植县 (430822), ZIYANG 资阳区 (430902), HESHAN 赫山区 (430903), NAN 南县 (430921), TAOJIANG 桃江县 (430922), ANHUA 安化县 (430923), YUANJIANG 沅江市 (430981), BEIHU 北湖区 (431002), SUXIAN 苏仙区 (431003), GUIYANG 
桂阳县 (431021), YIZHANG 宜章县 (431022), YONGXING 永兴县 (431023), JIAHE 嘉禾县 (431024), LINWU 临武县 (431025), RUCHENG 汝城县 (431026), GUIDONG 桂东县 (431027), ANREN 安仁县 (431028), ZIXING 资兴市 (431081), ZHISHAN 芝山区 (431102), LENGSHUITAN 冷水滩区 (431103), QIYANG 祁阳县 (431121), DONGAN 东安县 (431122), SHUANGPAI 双牌县 (431123), DAO 道县 (431124), JIANGYONG 江永县 (431125), NINGYUAN 宁远县 (431126), LANSHAN 蓝山县 (431127), XINTIAN 新田县 (431128), JIANGHUAYAOZUZIZHIXIAN 江华瑶族自治县 (431129), HECHENG 鹤城区 (431202), ZHONGFANG 中方县 (431221), YUANLING 沅陵县 (431222), CHENXI 辰溪县 (431223), XUPU 溆浦县 (431224), HUITONG 会同县 (431225), MAYANGMIAOZUZIZHIXIAN 麻阳苗族自治县 (431226), XINHUANGDONGZUZIZHIXIAN 新晃侗族自治县 (431227), ZHIJIANGDONGZUZIZHIXIAN 芷江侗族自治县 (431228), JINGZHOUMIAOZUDONGZUZIZHIXIAN 靖州苗族侗族自治县 (431229), TONGDAODONGZUZIZHIXIAN 通道侗族自治县 (431230), HONGJIANG 洪江市 (431281), LOUXING 娄星区 (431302), SHUANGFENG 双峰县 (431321), XINHUA 新化县 (431322), LENGSHUIJIANG 冷水江市 (431381), LIANYUAN 涟源市 (431382), JISHOU 吉首市 (433101), LUXI 泸溪县 (433122), FENGHUANG 凤凰县 (433123), HUAYUAN 花垣县 (433124), BAOJING 保靖县 (433125), GUZHANG 古丈县 (433126), YONGSHUN 永顺县 (433127), LONGSHAN 龙山县 (433130).
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.
Version 4 release notes: Adds data for 2018.
Version 3 release notes: Adds data in the following formats: Excel. Changes project name to avoid confusing this data with the ones done by NACJD.
Version 2 release notes: Adds data for 2017. Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.
Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g., robbery is broken down into subcategories including highway robbery, bank robbery, and gas station robbery). The majority of the data relates to theft. Theft is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g., money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported). All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document file available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data. There may be inaccuracies in the data, particularly in the group of columns starting with "auto."
To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto" as they are common data entry error values (e.g. are larger than the agency's population, are much larger than other crimes or months in same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value." For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative number indicators are not included in the FBI's codebook for this data but are present in the data. I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data. To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
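The actual cleaning was done in R (see the linked repository); purely as an illustration of the rule described, a pandas sketch of the targeted NA replacement might look like this (column names are hypothetical):

```python
import pandas as pd

# Common data-entry error values named in the description.
ERROR_VALUES = {1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
                10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
                90000, 100000, 99942}

def clean_error_values(df: pd.DataFrame) -> pd.DataFrame:
    """Replace known error values with NA, but only in columns whose
    names start with 'offenses' or 'auto' (never the 'value' columns)."""
    out = df.copy()
    targets = [c for c in out.columns
               if c.startswith("offenses") or c.startswith("auto")]
    for col in targets:
        # mask() sets matching entries to NaN, leaving the rest untouched
        out[col] = out[col].mask(out[col].isin(ERROR_VALUES))
    return out

demo = pd.DataFrame({
    "offenses_theft": [12, 2000, 7],
    "value_theft": [2000, 150, 90],  # "value" columns are left as-is by design
})
cleaned = clean_error_values(demo)
print(cleaned)
```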
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: XUANWU 玄武区 (320102), BAIXIA 白下区 (320103), QINHUAI 秦淮区 (320104), JIANYE 建邺区 (320105), GULOU 鼓楼区 (320106), XIAGUAN 下关区 (320107), PUKOU 浦口区 (320111), DACHANG 大厂区 (320112), XIXIA 栖霞区 (320113), YUHUATAI 雨花台区 (320114), JIANGNING 江宁县 (320121), JIANGPU 江浦县 (320122), LIUHE 六合县 (320123), LISHUI 溧水县 (320124), GAOCHUN 高淳县 (320125), CHONGAN 崇安区 (320202), NANCHANG 南长区 (320203), BEITANG 北塘区 (320204), WUXI JIAOQU 无锡市郊区 (320211), MASHAN 马山区 (320212), JIANGYIN 江阴市 (320281), YIXING 宜兴市 (320282), XISHAN 锡山市 (320283), GULOU 鼓楼区 (320302), YUNLONG 云龙区 (320303), JIULI 九里区 (320304), JIAWANG 贾汪区 (320305), QUANSHAN 泉山区 (320311), FENGXIAN 丰县 (320321), PEIXIAN 沛县 (320322), TONGSHAN 铜山县 (320323), SUINING 睢宁县 (320324), XINYI 新沂市 (320381), PIZHOU 邳州市 (320382), TIANNING 天宁区 (320402), ZHONGLOU 钟楼区 (320404), QISHUYAN 戚墅堰区 (320405), CHANGZHOU JIAOQU 常州市郊区 (320411), LIYANG 溧阳市 (320481), JINTAN 金坛市 (320482), WUJIN 武进市 (320483), CANGLANG 沧浪区 (320502), PINGJIANG 平江区 (320503), JINCHANG 金阊区 (320504), HUQIU 虎丘区 (320505), CHANGSHU 常熟市 (320581), ZHANGJIAGANG 张家港市 (320582), KUNSHAN 昆山市 (320583), WUJIANG 吴江市 (320584), TAICANG 太仓市 (320585), WUXIAN 吴县市 (320586), CHONGCHUAN 崇川区 (320602), GANGZHA 港闸区 (320611), HAIAN 海安县 (320621), RUDONG 如东县 (320623), QIDONG 启东市 (320681), RUGAO 如皋市 (320682), TONGZHOU 通州市 (320683), HAIMEN 海门市 (320684), LIANYUN 连云区 (320703), YUNTAI 云台区 (320704), XINPU 新浦区 (320705), HAIZHOU 海州区 (320706), GANYU 赣榆县 (320721), DONGHAI 东海县 (320722), GUANYUN 灌云县 (320723), GUANNAN 灌南县 (320724), QINGHE 清河区 (320802), QINGPU 清浦区 (320811), HUAIYIN 淮阴县 (320821), LIANSHUI 涟水县 (320826), HONGZE 洪泽县 (320829), XUYI 盱眙县 (320830), JINHU 金湖县 (320831), HUAIAN 淮安市 (320882), YANCHENG CHENGQU 盐城城区 (320902), XIANGSHUI 响水县 (320921), BINHAI 滨海县 
(320922), FUNING 阜宁县 (320923), SHEYANG 射阳县 (320924), JIANHU 建湖县 (320925), YANDU 盐都县 (320928), DONGTAI 东台市 (320981), DAFENG 大丰市 (320982), GUANGLING 广陵区 (321002), YANGZHOU JIAOQU 扬州市郊区 (321011), BAOYING 宝应县 (321023), HANJIANG 邗江县 (321027), YIZHENG 仪征市 (321081), GAOYOU 高邮市 (321084), JIANGDU 江都市 (321088), JINGKOU 京口区 (321102), RUNZHOU 润州区 (321111), DANTU 丹徒县 (321121), DANYANG 丹阳市 (321181), YANGZHONG 扬中市 (321182), JURONG 句容市 (321183), HAILING 海陵区 (321202), GAOGANG 高港区 (321203), XINGHUA 兴化市 (321281), JINGJIANG 靖江市 (321282), TAIXING 泰兴市 (321283), JIANGYAN 姜堰市 (321284), SUCHENG 宿城区 (321302), SUYU 宿豫县 (321321), SHUYANG 沭阳县 (321322), SIYANG 泗阳县 (321323), SIHONG 泗洪县 (321324).
https://creativecommons.org/publicdomain/zero/1.0/
2008 population and demographic census data for Israel, at the level of settlements and below.
Data is provided at the sub-settlement level (i.e., neighborhoods). Variable names (in Hebrew and English) and a data dictionary are provided in XLS files. 2008 statistical area names are provided (along with top roads/neighborhoods per settlement). The Excel data needs cleaning/merging from multiple sub-pages.
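One way to approach the multi-sub-page merging is to load every sheet of a workbook and concatenate them; this is a minimal sketch, not the CBS-specific cleaning (sheet layouts and column names will need workbook-specific handling):

```python
import pandas as pd

def merge_sheets(path) -> pd.DataFrame:
    """Load every sheet of an Excel workbook and stack them into one table."""
    # sheet_name=None returns a dict mapping sheet name -> DataFrame
    sheets = pd.read_excel(path, sheet_name=None)
    frames = []
    for name, df in sheets.items():
        df = df.dropna(how="all")    # drop fully empty spreadsheet rows
        df["source_sheet"] = name    # keep provenance for later checks
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```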
Data from Israel Central Bureau of Statistics (CBS): http://www.cbs.gov.il/census/census/pnimi_page.html?id_topic=12
Photo by Me (Dan Ofer).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: CHANGAN 长安区 (130102), QIAODONG 桥东区 (130103), QIAOXI 桥西区 (130104), XINHUA 新华区 (130105), SHIJIAZHUANG JIAOQU 石家庄市郊区 (130106), JINGXINGKUANGQU 井陉矿区 (130107), JINGXING 井陉县 (130121), ZHENGDING 正定县 (130123), LUANCHENG 栾城县 (130124), XINGTANG 行唐县 (130125), LINGSHOU 灵寿县 (130126), GAOYI 高邑县 (130127), SHENZE 深泽县 (130128), ZANHUANG 赞皇县 (130129), WUJI 无极县 (130130), PINGSHAN 平山县 (130131), YUANSHI 元氏县 (130132), ZHAOXIAN 赵县 (130133), XINJI 辛集市 (130181), GAOCHENG 藁城市 (130182), JINZHOU 晋州市 (130183), XINLE 新乐市 (130184), LUQUAN 鹿泉市 (130185), LUNAN 路南区 (130202), LUBEI 路北区 (130203), GUYE 古冶区 (130204), KAIPING 开平区 (130205), XINQU 新区 (130206), FENGRUN 丰润县 (130221), LUANXIAN 滦县 (130223), LUANNAN 滦南县 (130224), LETING 乐亭县 (130225), QIANXI 迁西县 (130227), YUTIAN 玉田县 (130229), TANGHAI 唐海县 (130230), ZUNHUA 遵化市 (130281), FENGNAN 丰南市 (130282), QIANAN 迁安市 (130283), HAIGANG 海港区 (130302), SHANHAIGUAN 山海关区 (130303), BEIDAIHE 北戴河区 (130304), QINGLONGMANZUZIZHIXIANXIAN 青龙满族自治县 (130321), CHANGLI 昌黎县 (130322), FUNING 抚宁县 (130323), LULONG 卢龙县 (130324), HANSHAN 邯山区 (130402), CONGTAI 丛台区 (130403), FUXING 复兴区 (130404), FENGFENGKUANGQU 峰峰矿区 (130406), HANDAN 邯郸县 (130421), LINZHANG 临漳县 (130423), CHENGAN 成安县 (130424), DAMING 大名县 (130425), SHEXIAN 涉县 (130426), CIXIAN 磁县 (130427), FEIXIANG 肥乡县 (130428), YONGNIAN 永年县 (130429), QIUXIAN 邱县 (130430), JIZE 鸡泽县 (130431), GUANGPING 广平县 (130432), GUANTAO 馆陶县 (130433), WEIXIAN 魏县 (130434), QUZHOU 曲周县 (130435), WUAN 武安市 (130481), QIAODONG 桥东区 (130502), QIAOXI 桥西区 (130503), XINGTAI 邢台县 (130521), LINCHENG 临城县 (130522), NEIQIU 内丘县 (130523), BAIXIANG 柏乡县 (130524), LONGYAO 隆尧县 (130525), RENXIAN 任县 (130526), NANHE 南和县 (130527), NINGJIN 宁晋县 (130528), JULU 巨鹿县 (130529), XINHE 新河县 (130530), GUANGZONG 广宗县 
(130531), PINGXIANG 平乡县 (130532), WEIXIAN 威县 (130533), QINGHE 清河县 (130534), LINXI 临西县 (130535), NANGONG 南宫市 (130581), SHAHE 沙河市 (130582), XINSHI 新市区 (130602), BEISHI 北市区 (130603), NANSHI 南市区 (130604), MANCHENG 满城县 (130621), QINGYUAN 清苑县 (130622), LAISHUI 涞水县 (130623), FUPING 阜平县 (130624), XUSHUI 徐水县 (130625), DINGXING 定兴县 (130626), TANGXIAN 唐县 (130627), GAOYANG 高阳县 (130628), RONGCHENG 容城县 (130629), LAIYUAN 涞源县 (130630), WANGDU 望都县 (130631), ANXIN 安新县 (130632), YIXIAN 易县 (130633), QUYANG 曲阳县 (130634), LIXIAN 蠡县 (130635), SHUNPING 顺平县 (130636), BOYE 博野县 (130637), XIONGXIAN 雄县 (130638), ZHUOZHOU 涿州市 (130681), DINGZHOU 定州市 (130682), ANGUO 安国市 (130683), GAOBEIDIAN 高碑店市 (130684), QIAODONG 桥东区 (130702), QIAOXI 桥西区 (130703), XUANHUA 宣化区 (130705), XIAHUAYUAN 下花园区 (130706), XUANHUA 宣化县 (130721), ZHANGBEI 张北县 (130722), KANGBAO 康保县 (130723), GUYUAN 沽源县 (130724), SHANGYI 尚义县 (130725), WEIXIAN 蔚县 (130726), YANGYUAN 阳原县 (130727), HUAIAN 怀安县 (130728), WANQUAN 万全县 (130729), HUAILAI 怀来县 (130730), ZHUOLU 涿鹿县 (130731), CHICHENG 赤城县 (130732), CHONGLI 崇礼县 (130733), SHUANGQIAO 双桥区 (130802), SHUANGLUAN 双滦区 (130803), YINGSHOUYINGZIKUANGQU 鹰手营子矿区 (130804), CHENGDE 承德县 (130821), XINGLONG 兴隆县 (130822), PINGQUAN 平泉县 (130823), LUANPING 滦平县 (130824), LONGHUA 隆化县 (130825), FENGNINGMANZUZIZHIXIAN 丰宁满族自治县 (130826), KUANCHENGMANZUZIZHIXIAN 宽城满族自治县 (130827), WEICHANGMANZUMENGGUZUZIZHIXIAN 围场满族蒙古族自治县 (130828), XINHUA 新华区 (130902), YUNHE 运河区 (130903), CANGXIAN 沧县 (130921), QINGXIAN 青县 (130922), DONGGUANG 东光县 (130923), HAIXING 海兴县 (130924), YANSHAN 盐山县 (130925), SUNING 肃宁县 (130926), NANPI 南皮县 (130927), WUQIAO 吴桥县 (130928), XIANXIAN 献县 (130929), MENGCUNHUIZUZIZHIXIAN 孟村回族自治县 (130930), BOTOU 泊头市 (130981), RENQIU 任丘市 (130982), HUANGHUA 黄骅市 (130983), HEJIAN 河间市 (130984), ANCI 安次区 (131002), GUANGYANG 广阳区 (131003), GUAN 固安县 (131022), YONGQING 永清县 (131023), XIANGHE 香河县 (131024), DACHENG 大城县 (131025), WENAN 文安县 (131026), DACHANGHUIZUZIZHIXIAN 大厂回族自治县 (131028), BAZHOU 霸州市 (131081), SANHE 三河市 (131082), TAOCHENG 
桃城区 (131102), ZAOQIANG 枣强县 (131121), WUYI 武邑县 (131122), WUQIANG 武强县 (131123), RAOYANG 饶阳县 (131124), ANPING 安平县 (131125), GUCHENG 故城县 (131126), JINGXIAN 景县 (131127), FUCHENG 阜城县 (131128), JIZHOU 冀州市 (131181), SHENZHOU 深州市 (131182).