Welcome to my Kickstarter case study! In this project I'm trying to understand what the success factors of a Kickstarter campaign are, analyzing a publicly available dataset from Web Robots. The analysis will follow the data analysis roadmap: ASK, PREPARE, PROCESS, ANALYZE, SHARE and ACT.
ASK
Three questions will guide my analysis:
1. Does the campaign duration influence the success of the project?
2. Does the chosen funding goal influence it?
3. Which category of campaign is the most likely to succeed?
PREPARE
I'm using the Kickstarter datasets publicly available on Web Robots. The data are scraped by a bot once a month and published as a set of CSV files. Each table contains:
- backers_count: number of people who contributed to the campaign
- blurb: a short, catchy text description of the project
- category: the label categorizing the campaign (technology, art, etc.)
- country
- created_at: date and time the campaign was created
- deadline: date and time the campaign ends
- goal: amount of money to be collected
- launched_at: date and time the campaign was launched
- name: name of the campaign
- pledged: amount of money collected
- state: success or failure of the campaign
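Before any cleaning, a single monthly CSV can be opened directly in R to verify these columns; the file name below is only a placeholder for one of the scraped files:
first_file <- read.csv("Kickstarter_2023-11.csv")  # hypothetical file name, standing in for one scraped CSV
names(first_file)         # lists all available columns, including the ones described above
str(first_file$deadline)  # the dates are stored as Unix timestamps (plain integers)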
Each monthly scrape produces a large number of CSVs, so for an initial analysis I decided to focus on three months: November 2023, December 2023 and January 2024. I downloaded the zipped files which, once unzipped, contained respectively 7 CSVs (November 2023), 8 CSVs (December 2023) and 8 CSVs (January 2024). Each month's files were placed in a dedicated folder.
A first look at the spreadsheets makes it clear that some cleaning and modification is needed: dates and times are stored as Unix timestamps, several columns are not useful for the scope of my analysis, and the monetary amounts use different currencies (some US$, some GB£, etc.) that need to be brought to a common one. In general, though, I have all the data I need to answer my initial questions, identify trends and make predictions.
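As a quick sanity check before the full cleaning, a single record can be converted by hand. The snippet below is a minimal sketch with made-up example values: it shows how a Unix timestamp becomes a readable datetime and how an amount pledged in a local currency is expressed in US$, assuming the usd_exchange_rate column holds the local-to-US$ rate.
example_launched_at <- 1700000000           # made-up Unix timestamp (seconds since 1970-01-01)
as.POSIXct(example_launched_at, origin = "1970-01-01", tz = "UTC")
# "2023-11-14 22:13:20 UTC"
example_pledged <- 500                      # made-up amount pledged in the campaign's own currency
example_usd_exchange_rate <- 1.25           # made-up exchange rate to US$
example_pledged * example_usd_exchange_rate # 625, the pledged amount expressed in US$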
PROCESS
I decided to use R to clean and process the data. For each month I set up a new working directory in the corresponding folder and loaded the necessary libraries:
library(tidyverse)  # tidyverse already attaches dplyr, tidyr and ggplot2 (and, from v2.0.0, lubridate)
library(lubridate)  # the explicit calls below are therefore redundant but harmless
library(ggplot2)
library(dplyr)
library(tidyr)
I then wrote a general R script that searches for the CSV files in the folder, opens each one as a separate variable and collects them all in a list of data frames:
csv_files <- list.files(pattern = "\\.csv$")  # all CSV files in the working directory
data_frames <- list()
for (file in csv_files) {
  variable_name <- sub("\\.csv$", "", file)   # strip the .csv extension to get the variable name
  assign(variable_name, read.csv(file))       # read the file into its own variable
  data_frames[[variable_name]] <- get(variable_name)  # and store it in the list as well
}
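The assign()/get() pattern works, but it also creates one global variable per file besides the list. A shorter alternative that builds the same named list (a sketch, not what I used above) would be:
# Same named list of data frames, one element per CSV, without the extra global variables
csv_files <- list.files(pattern = "\\.csv$")
data_frames <- setNames(lapply(csv_files, read.csv), sub("\\.csv$", "", csv_files))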
Next, I converted some columns to numeric values, because I was running into type errors when trying to merge all the CSVs into a single comprehensive file.
data_frames <- lapply(data_frames, function(df) {
  # coerce these columns to numeric so their type is consistent across all files before merging
  df$converted_pledged_amount <- as.numeric(df$converted_pledged_amount)
  df$usd_exchange_rate <- as.numeric(df$usd_exchange_rate)
  df$usd_pledged <- as.numeric(df$usd_pledged)
  return(df)
})
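Before binding the files together, it can be worth confirming that the converted columns now have the same class in every file; a quick check along these lines should return "numeric" for each loaded CSV:
# Class of each converted column across all loaded CSVs (all should be "numeric" before merging)
sapply(data_frames, function(df) class(df$converted_pledged_amount))
sapply(data_frames, function(df) class(df$usd_exchange_rate))
sapply(data_frames, function(df) class(df$usd_pledged))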
In each folder I then ran a command to merge the CSVs into a single data frame (one for November 2023, one for December 2023 and one for January 2024):
all_nov_2023 <- bind_rows(data_frames)
all_dec_2023 <- bind_rows(data_frames)
all_jan_2024 <- bind_rows(data_frames)
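As a side note, bind_rows() can also record which CSV each row came from; a possible variation (not used in my final files) is:
# Optional variation: ".id" adds a column filled with the names of the list elements,
# so each row keeps a reference to the CSV it was read from
all_nov_2023 <- bind_rows(data_frames, .id = "source_file")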
After merging, I converted the Unix timestamps in the created_at, launched_at and deadline columns into readable datetime columns (created, launched and setted_deadline) and dropped all the rows where any of these fields was set to 0. I also extracted from the category column only the category slug of the campaign, discarding the rest of the information, which is not needed for the scope of my analysis. The resulting table was then saved:
filtered_dec_2023 <- all_dec_2023 %>%  # the names were changed according to the month being processed
  select(blurb, backers_count, category, country, created_at, launched_at, deadline,
         currency, usd_exchange_rate, goal, pledged, state) %>%
  filter(created_at != 0 & deadline != 0 & launched_at != 0) %>%          # drop rows with missing dates
  mutate(category_slug = sub('.*?"slug":"(.*?)".*', '\\1', category)) %>% # keep only the category slug
  mutate(created = as.POSIXct(created_at, origin = "1970-01-01")) %>%     # Unix timestamps -> datetimes
  mutate(launched = as.POSIXct(launched_at, origin = "1970-01-01")) %>%
  mutate(setted_deadline = as.POSIXct(deadline, origin = "1970-01-01")) %>%
  select(-category, -deadline, -launched_at, -created_at) %>%             # drop the raw columns
  relocate(created, launched, setted_deadline, .before = goal)
write.csv(filtered_dec_2023, "filtered_dec_2023.csv", row.names = FALSE)
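With the dates readable and the exchange rate kept, the variables behind my initial questions can later be derived directly from the cleaned table. The sketch below (assuming usd_exchange_rate converts the original currency into US$) adds the campaign duration in days and the goal expressed in US$:
# Sketch: derive the variables behind the ASK questions
# - duration_days: number of days between launch and deadline
# - goal_usd: funding goal converted to US$ (assumes usd_exchange_rate is the local -> US$ rate)
filtered_dec_2023 <- filtered_dec_2023 %>%
  mutate(duration_days = as.numeric(difftime(setted_deadline, launched, units = "days")),
         goal_usd = goal * usd_exchange_rate)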
The three generated files were then merged into one comprehensive CSV called "kickstarter_cleaned" which was further modified, converting a...