3 datasets found
  1. Kickastarter Campaigns

    • kaggle.com
    zip
    Updated Jan 25, 2024
    Cite
    Alessio Cantara (2024). Kickastarter Campaigns [Dataset]. https://www.kaggle.com/datasets/alessiocantara/kickastarter-project/discussion
    Available download formats: zip (2233314 bytes)
    Dataset updated
    Jan 25, 2024
    Authors
    Alessio Cantara
    Description

    Welcome to my Kickstarter case study! In this project I’m trying to understand what the success factors of a Kickstarter campaign are, analyzing a publicly available dataset from Web Robots. The analysis will follow the data analysis roadmap: ASK, PREPARE, PROCESS, ANALYZE, SHARE and ACT.

    ASK

    Three questions will guide my analysis:

    1. Does campaign duration influence the success of a project?
    2. Does the chosen funding goal?
    3. Which campaign category is most likely to be successful?

    PREPARE

    I’m using the Kickstarter datasets publicly available on Web Robots. The data are scraped by a bot once a month and published as a set of CSV files. Each table contains:

    • backers_count : number of people who contributed to the campaign
    • blurb : a captivating text description of the project
    • category : the label categorizing the campaign (technology, art, etc.)
    • country
    • created_at : date and time of campaign creation
    • deadline : date and time when the campaign ends
    • goal : amount to be collected
    • launched_at : date and time of campaign launch
    • name : name of the campaign
    • pledged : amount of money collected
    • state : success or failure of the campaign

    Each monthly scrape produces a large number of CSVs, so for an initial analysis I decided to focus on three months: November 2023, December 2023, and January 2024. I downloaded the zipped files, which once unzipped contained 7 CSVs (November 2023), 8 CSVs (December 2023), and 8 CSVs (January 2024). Each month's files were placed in a dedicated folder.

    A first look at the spreadsheets makes it clear that some cleaning and modification is needed: for example, dates and times are stored as Unix timestamps, several columns are not relevant to the scope of my analysis, and currencies need to be standardized (some amounts are in US$, some in GB£, etc.). In general, I have all the data I need to answer my initial questions, identify trends, and make predictions.

    PROCESS

    I decided to use R to clean and process the data. For each month I set up a new working environment in its own folder and loaded the necessary libraries:

    library(tidyverse)
    library(lubridate)
    library(ggplot2)
    library(dplyr)
    library(tidyr)

    I then wrote a general R script that searches for CSV files in the folder, opens each one as a separate variable, and collects them in a list of data frames for later merging:

    csv_files <- list.files(pattern = "\\.csv$")
    data_frames <- list()
    
    # Read each CSV into its own variable (named after the file) and also
    # collect all of them in the data_frames list for later merging
    for (file in csv_files) {
      variable_name <- sub("\\.csv$", "", file)
      assign(variable_name, read.csv(file))
      data_frames[[variable_name]] <- get(variable_name)
    }
    

    Next, I converted some columns to numeric values because I was running into type errors when trying to merge all the CSVs into a single comprehensive data frame.

    # Coerce columns that sometimes load as character so that bind_rows()
    # does not fail on type mismatches between the monthly CSVs
    data_frames <- lapply(data_frames, function(df) {
      df$converted_pledged_amount <- as.numeric(df$converted_pledged_amount)
      df$usd_exchange_rate <- as.numeric(df$usd_exchange_rate)
      df$usd_pledged <- as.numeric(df$usd_pledged)
      return(df)
    })
    

    In each folder I then ran a command to merge that month's CSVs into a single data frame (one for November 2023, one for December 2023, and one for January 2024):

    # each of these was run in the corresponding month's folder,
    # with that month's data_frames list loaded
    all_nov_2023 <- bind_rows(data_frames)
    all_dec_2023 <- bind_rows(data_frames)
    all_jan_2024 <- bind_rows(data_frames)
    

    After merging I converted the Unix timestamps into readable datetimes for the columns "created", "launched" and "deadline", and removed all the rows where these values were set to 0. I also extracted the category slug from the "category" column so that it shows only the campaign's category, without information that is unnecessary for the scope of my analysis. The final table was then saved.

    filtered_dec_2023 <- all_dec_2023 %>% # adapted for each considered month
      select(blurb, backers_count, category, country, created_at, launched_at, deadline, currency, usd_exchange_rate, goal, pledged, state) %>%
      filter(created_at != 0 & deadline != 0 & launched_at != 0) %>%           # drop rows with missing timestamps
      mutate(category_slug = sub('.*?"slug":"(.*?)".*', '\\1', category)) %>%  # keep only the category slug
      mutate(created = as.POSIXct(created_at, origin = "1970-01-01")) %>%      # Unix timestamp -> datetime
      mutate(launched = as.POSIXct(launched_at, origin = "1970-01-01")) %>%
      mutate(setted_deadline = as.POSIXct(deadline, origin = "1970-01-01")) %>%
      select(-category, -deadline, -launched_at, -created_at) %>%
      relocate(created, launched, setted_deadline, .before = goal)
    
    write.csv(filtered_dec_2023, "filtered_dec_2023.csv", row.names = FALSE)
    
    

    The three generated files were then merged into one comprehensive CSV called "kickstarter_cleaned" which was further modified, converting a...
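
    As a concrete illustration of this final merge, a minimal R sketch might look like the following. The November and January file names are assumed to follow the December pattern above, and the USD conversion at the end is only an assumption based on the currency note in PREPARE, not the author's actual step.

    # Sketch only: assumed file names follow the "filtered_<month>_<year>.csv"
    # pattern used above; tidyverse is already loaded.
    filtered_files <- c("filtered_nov_2023.csv",
                        "filtered_dec_2023.csv",
                        "filtered_jan_2024.csv")
    
    kickstarter_cleaned <- filtered_files %>%
      lapply(read.csv) %>%
      bind_rows() %>%
      # Assumption: standardize currencies by converting pledged amounts to USD
      # using the usd_exchange_rate column kept in the filtered tables
      mutate(pledged_usd = pledged * usd_exchange_rate)
    
    write.csv(kickstarter_cleaned, "kickstarter_cleaned.csv", row.names = FALSE)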

  2. Data from: A Phanerozoic gridded dataset for palaeogeographic reconstructions

    • zenodo.org
    • portalcientifico.uvigo.gal
    • +1more
    zip
    Updated May 29, 2024
    + more versions
    Cite
    Lewis A. Jones; Lewis A. Jones; Mathew Domeier; Mathew Domeier (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions [Dataset]. http://doi.org/10.5281/zenodo.11384745
    Available download formats: zip
    Dataset updated
    May 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lewis A. Jones; Lewis A. Jones; Mathew Domeier; Mathew Domeier
    License

    https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Time period covered
    May 29, 2024
    Description

    This repository provides access to five pre-computed reconstruction files as well as the static polygons and rotation files used to generate them. This set of palaeogeographic reconstruction files provide palaeocoordinates for three global grids at H3 resolutions 2, 3, and 4, which have an average cell spacing of ~316 km, ~119 km, and ~45 km, respectively. Grids were reconstructed at a temporal resolution of one million years throughout the entire Phanerozoic (540–0 Ma). The reconstruction files are stored as comma-separated-value (CSV) files which can be easily read by almost any spreadsheet program (e.g. Microsoft Excel and Google Sheets) or programming language (e.g. Python, Julia, and R). In addition, R Data Serialization (RDS) files—a common format for saving R objects—are also provided as lighter (and compressed) alternatives to the CSV files. The structure of the reconstruction files follows a wide-form data frame structure to ease indexing. Each file consists of three initial index columns relating to the H3 cell index (i.e. the 'H3 address'), present-day longitude of the cell centroid, and the present-day latitude of the cell centroid. The subsequent columns provide the reconstructed longitudinal and latitudinal coordinate pairs for their respective age of reconstruction in ascending order, indicated by a numerical suffix. Each row contains a unique spatial point on the Earth's continental surface reconstructed through time. NA values within the reconstruction files indicate points which are not defined in deeper time (i.e. either the static polygon does not exist at that time, or it is outside the temporal coverage as defined by the rotation file).

    The following five Global Plate Models are provided (abbreviation, temporal coverage, reference) within the GPMs folder:

    • WR13, 0–550 Ma, (Wright et al., 2013)
    • MA16, 0–410 Ma, (Matthews et al., 2016)
    • TC16, 0–540 Ma, (Torsvik and Cocks, 2016)
    • SC16, 0–1100 Ma, (Scotese, 2016)
    • ME21, 0–1000 Ma, (Merdith et al., 2021)

    In addition, the H3 grids for resolutions 2, 3, and 4 are provided within the grids folder. Finally, we also provide two scripts (python and R) within the code folder which can be used to generate reconstructed coordinates for user data from the reconstruction files.
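
    To give a sense of how these files can be used outside the provided scripts, here is a small R sketch; the file path and the column names (h3_address, lng_<age>, lat_<age>) are assumptions based on the structure described above, not the dataset's documented schema.

    # Sketch only: file and column names are assumed from the description above.
    recon <- read.csv("WR13/reconstruction_res3.csv")   # hypothetical path
    
    age <- 100                          # reconstruction age in Ma
    lon_col <- paste0("lng_", age)      # assumed naming of the age-suffixed columns
    lat_col <- paste0("lat_", age)
    
    # Keep only cells defined at this age (NA marks points with no reconstruction)
    coords_100Ma <- recon[!is.na(recon[[lon_col]]),
                          c("h3_address", lon_col, lat_col)]
    head(coords_100Ma)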

    For access to the code used to generate these files:

    https://github.com/LewisAJones/PhanGrids

    For more information, please refer to the article describing the data:

    Jones, L.A. and Domeier, M.M. (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions.

    For any additional queries, contact:

    Lewis A. Jones (lewisa.jones@outlook.com) or Mathew M. Domeier (mathewd@uio.no)

    If you use these files, please cite:

    Jones, L.A. and Domeier, M.M. 2024. A Phanerozoic gridded dataset for palaeogeographic reconstructions. DOI: 10.5281/zenodo.10069221

    References

    1. Matthews, K. J., Maloney, K. T., Zahirovic, S., Williams, S. E., Seton, M., & Müller, R. D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226–250. https://doi.org/10.1016/j.gloplacha.2016.10.002.
    2. Merdith, A. S., Williams, S. E., Collins, A. S., Tetley, M. G., Mulder, J. A., Blades, M. L., Young, A., Armistead, S. E., Cannon, J., Zahirovic, S., & Müller, R. D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214, 103477. https://doi.org/10.1016/j.earscirev.2020.103477.
    3. Scotese, C. R. (2016). Tutorial: PALEOMAP paleoAtlas for GPlates and the paleoData plotter program: PALEOMAP Project, Technical Report.
    4. Torsvik, T. H., & Cocks, L. R. M. (2017). Earth history and palaeogeography. Cambridge University Press. https://doi.org/10.1017/9781316225523.
    5. Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: Integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10, 1529–1541. https://doi.org/10.5194/bg-10-1529-2013.
  3. Demographic and Health Survey 1996 - Zambia

    • catalog.ihsn.org
    • microdata.worldbank.org
    Updated Jul 6, 2017
    + more versions
    Cite
    Central Statistical Office (2017). Demographic and Health Survey 1996 - Zambia [Dataset]. https://catalog.ihsn.org/catalog/2475
    Dataset updated
    Jul 6, 2017
    Dataset authored and provided by
    Central Statistical Office
    Time period covered
    1996 - 1997
    Area covered
    Zambia
    Description

    Abstract

    The 1996 Zambia Demographic and Health Survey (ZDHS) is a nationally representative survey conducted by the Central Statistical Office at the request of the Ministry of Health, with the aim of gathering reliable information on fertility, childhood and maternal mortality rates, maternal and child health indicators, contraceptive knowledge and use, and knowledge and prevalence of sexually transmitted diseases (STDs) including AIDS. The survey is a follow-up to the Zambia DHS survey carried out in 1992.

    The primary objectives of the ZDHS are:

    • To collect up-to-date information on fertility, infant and child mortality and family planning;
    • To collect information on health-related matters such as breastfeeding, antenatal care, children's immunisations and childhood diseases;
    • To assess the nutritional status of mothers and children;
    • To support dissemination and utilisation of the results in planning, managing and improving family planning and health services in the country; and
    • To enhance the survey capabilities of the institutions involved in order to facilitate the implementation of surveys of this type in the future.

    SUMMARY OF FINDINGS

    FERTILITY

    • Fertility Trends. The 1996 ZDHS survey results indicate that the level of fertility in Zambia is continuing to decline.
    • Fertility Differentials. Some women are apparently leading the fertility decline. Moreover, women who have received some secondary education have the lowest level of fertility.
    • Age at First Birth. Childbearing begins early in Zambia, with over one-third of women becoming mothers by the time they reach age 18 and around two-thirds having had a child by the time they reach age 20.
    • Birth Intervals. The majority of Zambian children (81 percent) are born after a "safe" birth interval (24 or more months apart), with 36 percent born at least 36 months after a prior birth. Nevertheless, 19 percent of non-first births occur less than 24 months after the preceding birth. The overall median birth interval is 32 months.
    • Fertility Preferences. Survey data indicate that there is a strong desire for children and a preference for large families in Zambian society.
    • Unplanned Fertility. Despite the increasing level of contraceptive use, ZDHS data indicate that unplanned pregnancies are still common.

    FAMILY PLANNING

    • Increasing Use of Contraception. The contraceptive prevalence rate in Zambia has increased significantly over the past five years, rising from 15 percent in 1992 to 26 percent in 1996.
    • Differentials in Family Planning Use. Differentials in current use of family planning by province are large.
    • Source of Contraception. Six in ten users obtain their methods from public sources, 24 percent use non-governmental medical sources, and shops and friends account for the remaining 13 percent. Government health centres (41 percent) and government hospitals (16 percent) are the most common sources of contraceptive methods.
    • Knowledge of Contraceptive Methods. Knowledge of contraceptive methods is nearly universal, with 96 percent of all women and men knowing at least one method of family planning.
    • Family Planning Messages. One reason for the increase in level of contraceptive awareness is that family planning messages are prevalent.
    • Unmet Need for Family Planning. ZDHS data show that there is a considerable unmet need for family planning services in Zambia.

    MATERNAL AND CHILD HEALTH

    • Maternal Health Care. ZDHS data show some encouraging results regarding maternal health care, as well as some areas in which improvements could be made. Results show that most Zambian mothers receive antenatal care, 3 percent from a doctor and 93 percent from a nurse or trained midwife.
    • High Childhood Mortality. One of the more disturbing findings from the survey is that child survival has not improved over the past few years.
    • Childhood Vaccination Coverage. Vaccination coverage against the most common childhood illnesses has increased recently.
    • Childhood Health. ZDHS data indicate that Zambian mothers are reasonably well-informed about childhood illnesses and that a high proportion are treated appropriately.
    • Breastfeeding Practices. The ZDHS results indicate that breastfeeding is almost universally practised in Zambia, with a median duration of 20 months.
    • Knowledge and Behaviour Regarding AIDS. Survey results indicate that virtually all respondents had heard of AIDS. Common sources of information were friends/relatives, the radio, and health workers. The vast majority of respondents--80 percent of women and 94 percent of men--say they have changed their behaviour in order to avoid contracting AIDS, mostly by restricting themselves to one sexual partner.

    Geographic coverage

    The 1996 Zambia Demographic and Health Survey (ZDHS) is a nationally representative survey. The sample was designed to produce reliable estimates for the country as a whole, for the urban and the rural areas separately, and for each of the nine provinces in the country.

    Analysis unit

    • Household
    • women age 15-49
    • Men age 15-59
    • Children under five years

    Universe

    The survey covered all de jure household members (usual residents): all women of reproductive age (15-49 years) in the total sample of households, men aged 15-59, and children under age 5 resident in the household.

    Kind of data

    Sample survey data

    Sampling procedure

    The 1996 ZDHS covered the population residing in private households in the country. The design for the ZDHS called for a representative probability sample of approximately 8,000 completed individual interviews with women between the ages of 15 and 49. It is designed principally to produce reliable estimates for the country as a whole, for the urban and the rural areas separately, and for each of the nine provinces in the country. In addition to the sample of women, a sub-sample of about 2,000 men between the ages of 15 and 59 was also designed and selected to allow for the study of AIDS knowledge and other topics.

    SAMPLING FRAME

    Zambia is divided administratively into nine provinces and 57 districts. For the Census of Population, Housing and Agriculture of 1990, the whole country was demarcated into census supervisory areas (CSAs). Each CSA was in turn divided into standard enumeration areas (SEAs) of approximately equal size. For the 1992 ZDHS, this frame of about 4,200 CSAs and their corresponding SEAs served as the sampling frame. The measure of size was the number of households obtained during a quick count operation carried out in 1987. These same CSAs and SEAs were later updated with new measures of size which are the actual numbers of households and population figures obtained in the census. The sample for the 1996 ZDHS was selected from this updated CSA and SEA frame.

    CHARACTERISTICS OF THE SAMPLE

    The sample for ZDHS was selected in three stages. At the first stage, 312 primary sampling units corresponding to the CSAs were selected from the frame of CSAs with probability proportional to size, the size being the number of households obtained from the 1990 census. At the second stage, one SEA was selected, again with probability proportional to size, within each selected CSA. An updating of the maps as well as a complete listing of the households in the selected SEAs was carried out. The list of households obtained was used as the frame for the third-stage sampling in which households were selected for interview. Women between the ages of 15 and 49 were identified in these households and interviewed. Men between the ages of 15 and 59 were also interviewed, but only in one-fourth of the households selected for the women's survey.

    SAMPLE ALLOCATION

    The provinces, stratified by urban and rural areas, were the sampling strata. There were thus 18 strata. The proportional allocation would result in a completely self-weighting sample but would not allow for reliable estimates for at least three of the nine provinces, namely Luapula, North-Western and Western. Results of other demographic and health surveys show that a minimum sample of 800-1,000 women is required in order to obtain estimates of fertility and childhood mortality rates at an acceptable level of sampling errors. It was decided to allocate a sample of 1,000 women to each of the three largest provinces, and a sample of 800 women to the two smallest provinces. The remaining provinces got samples of 850 women. Within each province, the sample was distributed approximately proportionally to the urban and rural areas.

    STRATIFICATION AND SYSTEMATIC SELECTION OF CLUSTERS

    A cluster is the ultimate area unit retained in the survey. In the 1992 ZDHS and the 1996 ZDHS, the cluster corresponds exactly to an SEA selected from the CSA that contains it. In order to decrease sampling errors of comparisons over time between 1992 and 1996--it was decided that as many as possible of the 1992 clusters be retained. After carefully examining the 262 CSAs that were included in the 1992 ZDHS, locating them in the updated frame and verifying their SEA composition, it was decided to retain 213 CSAs (and their corresponding SEAs). This amounted to almost 70 percent of the new sample. Only 99 new CSAs and their corresponding SEAs were selected.

    As in the 1992 ZDHS, stratification of the CSAs was only geographic. In each stratum, the CSAs were listed by districts ordered geographically. The procedure for selecting CSAs in each stratum consisted of: (1) calculating the sampling interval I for the stratum; (2) calculating the cumulated size of each CSA; (3) calculating the series of sampling numbers R, R+I, R+2I, ..., R+(a-1)I, where R is a random number between 1 and I and a is the number of CSAs to be selected; (4) comparing each sampling number with the cumulated sizes.
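
    For illustration, a minimal R sketch of this kind of systematic PPS selection (with made-up household counts for one stratum; not the survey's actual implementation) could look like:

    # Illustrative sketch of systematic PPS selection, not the survey's code.
    set.seed(1)
    sizes <- c(120, 340, 210, 95, 410, 180, 260)   # hypothetical household counts per CSA
    a     <- 3                                     # number of CSAs to select
    
    I <- sum(sizes) / a                            # (1) sampling interval
    cum_sizes <- cumsum(sizes)                     # (2) cumulated sizes
    R <- runif(1, min = 1, max = I)                # random start between 1 and I
    sampling_numbers <- R + (0:(a - 1)) * I        # (3) R, R+I, ..., R+(a-1)I
    
    # (4) a CSA is selected when a sampling number falls in its cumulated-size range
    selected <- sapply(sampling_numbers, function(s) which(cum_sizes >= s)[1])
    selected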

    The reasons for not


