Comprehensive YouTube channel statistics for Fun Quiz Questions, featuring 249,000 subscribers and 100,309,230 total views. This dataset includes detailed performance metrics such as subscriber growth, video views, engagement rates, and estimated revenue. The channel operates in the Music category and is based in CA. Track 453 videos with daily and monthly performance data, including view counts, subscriber changes, and earnings estimates. Analyze growth trends, engagement patterns, and compare performance against similar channels in the same category.
Embarking on a new research endeavor can be a daunting task. User guides, books, and published articles are written for an audience that already has some background experience in the field. Undergraduate students like you, who are at the very beginning of their research careers, often struggle to make sense of these documents. Furthermore, students like you often attempt to do so while balancing heavy course loads. Thus, I have written this document to help ease the burden so that you have more time to ponder the interesting scientific questions instead of digging through pages upon pages of documentation. I assume that you already have some basic familiarity with R before starting this project. I also assume that you have a well devised plan for your experimental design, including the variables you want to collect, the sampling scheme, and of course the questions of interest. Finally, I assume that you have taken a basic course in statistics and have a research mentor that can assist you with more advanced statistical methods. This is not an exhaustive user manual, but rather a guide to help you get started on your journey with geometric morphometrics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
As the amount of data from genome wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these problems.
Methodology/Principal Findings
We introduce a new statistic, the imputation quality score (IQS). In order to differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1 M array, we extracted those SNPs that were also on the Illumina 550 K array and imputed them to the full set of the 1 M SNPs. As expected, the average IQS value drops dramatically with a decrease in minor allele frequency, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into “cases” and “controls”, we extracted the Illumina 550 K SNPs from the cases and imputed the remaining Illumina 1 M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (λ = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS […] 0.99 demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.
Conclusions/Significance
IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
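The idea of adjusting concordance for chance can be illustrated with a short sketch. It uses a Cohen's-kappa-style calculation on hard genotype calls, which is an assumption made for illustration: the abstract does not give the exact IQS formula, and the function name and example genotypes below are hypothetical.

```python
from collections import Counter


def chance_adjusted_concordance(true_genotypes, imputed_genotypes):
    """Kappa-style agreement between two equal-length lists of genotype calls.

    Illustrative only: a simplified stand-in for a chance-adjusted
    concordance measure, not the published IQS definition.
    """
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_genotypes)
    # Raw concordance: share of subjects whose imputed call matches the true call.
    observed = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes)) / n

    # Agreement expected by chance, from the marginal genotype frequencies.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    expected = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / n**2

    if expected == 1.0:
        return float("nan")  # undefined when only one genotype is present
    return (observed - expected) / (1.0 - expected)


# Hypothetical rare variant: imputation always calls the common genotype.
true_calls = ["AA"] * 98 + ["AB"] * 2
imputed_calls = ["AA"] * 100
rare_score = chance_adjusted_concordance(true_calls, imputed_calls)
```

In this made-up example the raw concordance is 98 percent, yet the chance-adjusted value is zero, which is the kind of low-MAF situation the abstract says traditional measures handle poorly.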
The Cambodia Socio-Economic Survey (CSES) asks questions to a countrywide sample of households and household members about housing conditions, education, economic activities, household production and income, household level and structure of consumption, health, victimization, etc. There are also questions related to people in the labour force, e.g. labour force participation.
Poverty reduction is a major commitment by the Royal Government of Cambodia. Accurate statistical information about the living standards of the population and the extent of poverty is an essential instrument to assist the Government in diagnosing the problems, in designing effective policies for reducing poverty and in monitoring and evaluating the progress of poverty reduction. The Millennium Development Goals (MDGs) have been adopted by the Royal Government of Cambodia, and a National Strategic Development Plan (NSDP) has been developed. The MDGs are also incorporated into the “Rectangular Strategy of Cambodia”.
Cambodia is still a predominantly rural and agricultural society. The vast majority of the population subsist in households whose members are self-employed in agriculture. The level of living is determined by the household's command over labour and resources for own-production: land and livestock for agricultural activities; equipment and tools for fishing, forestry and construction activities; and income-earning activities in the informal and formal sectors. The CSES aims to estimate household income and consumption/expenditure as well as a number of other household and individual characteristics.
The main objective of the survey is to collect statistical information about living conditions of the Cambodian population and the extent of poverty. The survey can be used for identifying problems and making decisions based on statistical data.
The main user is the Royal Government of Cambodia (RGC), as the survey supports monitoring the National Strategic Development Plan (NSDP) through different socio-economic indicators. Other users are university researchers, analysts, international organizations (e.g. the World Bank) and NGOs. The World Bank has published a report on poverty profile and social indicators using CSES 2007 data. In this regard, the CSES continues to serve all stakeholders as an essential instrument for diagnosing problems and designing effective policies. The CSES micro data at NIS are available for research and analysis by external researchers after approval by the Senior Minister of Planning. Many interesting research questions could be put to the data; NIS welcomes new research based on CSES data.
General Objectives: CSES 2012 will continue the work started through CSES 2004 and the annual CSES 2007 and 2008 and would primarily aim at producing information needed for planning and policy making for reduction of poverty in Cambodia. Reduction of poverty has been given high priority in Cambodia's National Strategic Development Plan (NSDP 2009-2013). In addition to this, the survey data help in various other ways in developmental planning and policy making in the country. They would also prove useful for the production of National Accounts in Cambodia.
A long-term objective of the entire project is to build national capability in NIS for conducting socio-economic surveys and for utilizing survey data for planning for national development and social welfare.
Specific Objectives:
Among specific objectives, the following deserve special mention:
1) Obtain data on infrastructural facilities in villages, especially facilities for schooling and health care and associated problems.
2) Obtain data on retail prices of selected food, non-food and medicine items prevailing in the villages.
3) Collect data on utilization of education, housing and land ownership.
4) Collect data on household assets and outstanding loans.
5) Collect data on household's construction activities.
6) Collect information on maternal health, child health/care.
7) Collect information on health care seeking and expenditure of the household members related to illness, injury and disability.
8) Collect information on economic activities including the economic activities for children aged between 5 and 17 years.
9) Collect information on victimization by the household.
10) Collect information on the presence of the household members.
National Phnom Penh / Other Urban / Other Rural
All resident households in Cambodia
Sample survey data [ssd]
The sampling design in the CSES 2012 survey is a three-stage design. In stage one a sample of villages is selected, in stage two an Enumeration Area (EA) is selected from each village selected in stage one, and in stage three a sample of households is selected from each EA selected in stage two.
Stage 1: A random sample of PSUs was selected from each stratum. The sampling method was systematic PPS (PPS=sampling with probability proportional to size). The size measure used was the number of households in the PSU according to the sampling frame.
Stage 2: One EA was selected by Simple Random Sampling (SRS), in each village selected in stage 1.
Stage 3: In each selected EA a sample of 10 households was selected. The selection of households was done in the field by the supervisors/interviewers. All households in selected EAs were listed by the enumerator. The sample of households was then selected from the list by systematic sampling with a random start (the start value controlled by NIS).
For the details of sample selection please refer to the document "Process Description: Design and Select the Sample for CSES 2012"
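As a rough illustration of the selection methods named in stages 1 and 3, the sketch below implements generic systematic PPS selection and systematic sampling with a random start. It is not the NIS procedure: the function names, village sizes and household list are hypothetical, and the random start is drawn in code here, whereas the survey description says the start value was controlled by NIS.

```python
import random


def systematic_pps(sizes, n_sample, rng):
    """Systematic PPS: return indices of selected units (e.g. villages).

    sizes[i] is the size measure of unit i, such as its household count.
    A unit larger than the sampling step can be selected more than once.
    """
    step = sum(sizes) / n_sample
    start = rng.random() * step          # random start in [0, step)
    selected = []
    idx = 0
    upper = sizes[0]                     # cumulative size up to and including unit idx
    for k in range(n_sample):
        target = start + k * step
        # Advance to the unit whose cumulative-size interval contains the target.
        while target >= upper and idx < len(sizes) - 1:
            idx += 1
            upper += sizes[idx]
        selected.append(idx)
    return selected


def systematic_sample(listing, n_sample, rng):
    """Systematic sample with a random start from a listed frame (e.g. households in an EA)."""
    step = len(listing) / n_sample
    start = rng.random() * step
    last = len(listing) - 1
    return [listing[min(int(start + k * step), last)] for k in range(n_sample)]


rng = random.Random(42)                              # fixed seed so the example is repeatable
village_sizes = [120, 80, 300, 50, 450]              # hypothetical household counts
chosen_villages = systematic_pps(village_sizes, 2, rng)
households = list(range(1, 41))                      # hypothetical listing of 40 households
chosen_households = systematic_sample(households, 10, rng)
```

Larger villages cover a longer stretch of the cumulative size scale, so they are more likely to contain one of the evenly spaced selection points; that is what makes the selection proportional to size.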
Face-to-face [f2f]
Three different questionnaires or forms were used in the survey:
Form 1: Household listing sheets to be used in the sampling procedure in the enumeration areas.
Form 2: Village questionnaire answered by the village leader about economy and infrastructure, crop production, health, education, retail prices and sales prices of agriculture, employment and wages, and recruitment of children for work outside the village.
Form 3: Household questionnaire with questions for each household member, including modules on migration, education and literacy, housing conditions, crop production, household liabilities, durable goods, construction activities, nutrition, fertility and child care, child feeding and vaccination, health of children, mortality, current economic activity, health and illness, smoking, HIV/AIDS awareness, and victimization.
The interviewer is responsible for completing Form 1 and Form 3 with respondents. Supervisors are asked to canvass Form 2. If a supervisor is absent for any reason, interviewers may also be asked to help complete Form 2.
The NIS team commenced their work of checking and coding at the beginning of February, after the first month of fieldwork was completed. Supervisors from the field delivered questionnaires to NIS. The Sida project expert and the NIS Survey Manager helped resolve issues that became apparent when reviewing questionnaires on delivery.
The CSES 2012 enjoyed a response rate of almost 100 percent. The high response rate, together with close and systematic fieldwork supervision by the core group members, contributed greatly to achieving high-quality survey results.
To provide a basis for assessing the reliability or precision of CSES estimates, the magnitude of sampling error in the survey data was estimated. Since most of the estimates from the survey are in the form of weighted ratios, variances for ratio estimates were computed.
The Coefficients of Variation (CV) on national level estimates are generally below 4 percent. The exception is the CV for total value of assets where there are rather high CVs especially in the urban areas, which should be expected.
The CVs are somewhat higher in the urban and rural domains but still generally below 7 percent. For the five zones, the average CVs are in the range 5 to 13 percent with a few exceptions where the CVs are above 20 percent. For provinces the CVs for food consumption are 9 percent on average.
The sample take within Primary Sampling Units (PSU) was set to 10 households per PSU in the CSES 1999. When data on variances became available, it was possible to make crude calculations of the optimal sample take within PSU. Calculations on some of the central estimates in the CSES 1999 show that the design effects in most cases are in the range 1 to 5.
Intra-cluster correlation coefficients have been calculated based on the design effects. These correlation coefficients are somewhat high. The reason is that the characteristics that are measured tend to be concentrated (clustered) within the PSUs. The optimal sample size within PSUs under different assumptions on cost ratios and intra-cluster correlation coefficients was then calculated. The cost ratio is the average cost for adding a village to the sample divided by the average cost of including an extra household in the sample. In the CSES, it was chosen to adopt a fairly low cost ratio due to the fact that the interview time per household is long. Under this assumption the optimal sample size is probably around 10 households per village for many of the CSES indicators.
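The relationship between design effect, intra-cluster correlation and optimal sample take can be sketched with two standard textbook approximations: deff = 1 + (m - 1) * rho, and an optimal take of sqrt(cost_ratio * (1 - rho) / rho) under a simple two-stage cost model. The text does not say which formulas NIS actually used, so treat these as assumptions; the input numbers below are illustrative, not CSES results.

```python
import math


def intra_cluster_correlation(deff, cluster_take):
    """Solve deff = 1 + (m - 1) * rho for rho, given the sample take m per PSU."""
    return (deff - 1.0) / (cluster_take - 1.0)


def optimal_cluster_take(cost_ratio, rho):
    """Optimal households per PSU under a simple two-stage cost model.

    cost_ratio = (average cost of adding a village) /
                 (average cost of adding a household), as defined in the text.
    """
    return math.sqrt(cost_ratio * (1.0 - rho) / rho)


# Illustrative inputs only (not taken from the CSES reports).
rho = intra_cluster_correlation(deff=2.8, cluster_take=10)
best_take = optimal_cluster_take(cost_ratio=25.0, rho=rho)
```

With these made-up inputs, rho comes out at about 0.2 and the optimal take at about 10 households. A lower cost ratio or a higher intra-cluster correlation pushes the optimum toward fewer households per village.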
The statistic shows the percentage growth in the number of crowdfunding platforms worldwide from 2008 to 2012. In 2008, the number of crowdfunding platforms worldwide increased by 38 percent in comparison to the previous year. Growth continued in each of the following years, and the growth rate reached 60 percent in 2012. The total number of crowdfunding platforms worldwide as of April 2012 amounted to 342; the estimated number by the end of December 2012 was 536.
The growth of crowdfunding platforms
Crowdfunding or crowd sourcing is the collective effort of a number of individuals who pool together resources, usually online, in order to support the efforts of individuals or organizations wishing to get their project off the ground.
After the launch of ArtistShare, the site that is often billed as being the first crowdfunding site, the growth in the number of platforms has been constant. The trend began to show in the United States towards the mid-2000s and then in Europe. The increases in funding volume over the past few years are a clear sign of the growing prevalence of this method and raise some interesting questions about the true extent of its potential.
In the world after the 2008 economic crisis, a world of fragile and uncertain growth and austerity, it has been proven that it is often extremely difficult for small and medium-sized businesses to procure capital loans from banks. This driver of economic growth is, at least to some extent, being helped along by those who support crowdfunding campaigns online.
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset that I built by scraping the United States Department of Labor's Bureau of Labor Statistics. I was looking for county-level unemployment data and realized that there was a data source for this, but the data set itself didn't exist yet, so I decided to write a scraper and build it out myself.
This data represents the Local Area Unemployment Statistics from 1990-2016, broken down by state and month. The data itself is pulled from this mapping site:
https://data.bls.gov/map/MapToolServlet?survey=la&map=county&seasonal=u
Further, the ever-evolving and ever-improving codebase that pulled this data is available here:
https://github.com/jayrav13/bls_local_area_unemployment
Of course, a huge shoutout to bls.gov and their open and transparent data. I've certainly been inspired to dive into US-related data recently and having this data open further enables my curiosities.
I was excited about building this data set out because I was pretty sure something similar didn't exist - curious to see what folks can do with it once they run with it! A curious question I had was surrounding Unemployment vs 2016 Presidential Election outcome down to the county level. A comparison can probably lead to interesting questions and discoveries such as trends in local elections that led to their most recent election outcome, etc.
Version 1 of this is a massive JSON blob, normalized by year / month / state. I intend to transform this into a CSV in the future as well.
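For anyone who wants a CSV before an official one exists, a flattening step might look like the sketch below. The nesting (year, then month, then state, then a mapping of county to rate) is an assumption based only on the "year / month / state" description; the real JSON layout may differ, and the sample record is made up.

```python
import csv
import io
import json

FIELDS = ["year", "month", "state", "county", "rate"]


def flatten(blob):
    """Yield one flat row per county from a year -> month -> state -> county mapping."""
    for year, months in blob.items():
        for month, states in months.items():
            for state, counties in states.items():
                for county, rate in counties.items():
                    yield {"year": year, "month": month, "state": state,
                           "county": county, "rate": rate}


def to_csv(blob):
    """Return the flattened data as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten(blob))
    return buffer.getvalue()


# Made-up sample in the assumed layout; the value is not a real BLS figure.
sample = json.loads('{"2016": {"January": {"New Jersey": {"Essex County": 6.1}}}}')
csv_text = to_csv(sample)
```

If the actual file nests its keys differently, only `flatten` needs to change; the writer stays the same.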
http://opendatacommons.org/licenses/dbcl/1.0/
Overwatch is a 6v6 team-based FPS (first-person shooter) with great variety among its playable heroes. Overwatch League (OWL) is the professional esports league of Overwatch. When watching the OWL matches this year, I found the power rankings and predictive statistics by IBM Watson extremely intriguing, so I decided to bring the datasets to Kaggle. I really want to replicate the predictions and rankings myself, then test them with the Stats Lab.
The datasets include players, head-to-head match-ups, and maps. The player historical statistics should contain OWL games from 2018 until now. They are centered on each player: the hero picked, team name, performance, match IDs, etc.
Overwatch League Stats Lab has updated and downloadable csv files. And here are some interesting and inspiring questions to look at: https://overwatchleague.com/en-us/news/23303225.
Other than the power rankings and outcome predictions, I plan to look at teamfight stats, first elimination, and first death to compare the team's power.
For Players:
1. Match Report dashboard
2. Rate Ranks dashboard: Who led the league in solo kills/10 mins in 2018 as Junkrat? (min. 60 mins played)
3. Career Totals dashboard
4. Single Records dashboard
For Heroes:
1. Which 4 heroes did one play for 10% or more of his time on assault map attack rounds in the season?
2. Which hero increased in usage from 8% at the end of Stage 4 of 2018 to over 45% in the inaugural season playoffs?
For Matches:
1. Which team played the most matches that ended in a 3-2 score during the 2021 regular season?
2. Which team is entering the 2021 season on a 7-map loss streak?
3. Which team has the fastest completion time on Rialto?
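A rate question such as "solo kills/10 mins with a minimum of 60 minutes played" could be approached along these lines. The record layout, field names and player values are hypothetical; the Stats Lab CSV files would need to be aggregated into per-player totals first.

```python
def per_10_min(stat_total, minutes_played):
    """Convert a cumulative stat into a per-10-minutes rate."""
    return stat_total / minutes_played * 10


def rate_leader(rows, hero, stat, min_minutes=60.0):
    """Return (player, rate) for the best per-10-min rate on a hero, or None.

    Players below the minimum playtime are excluded, so that a short
    stretch of play cannot top the ranking.
    """
    eligible = [r for r in rows
                if r["hero"] == hero and r["minutes_played"] >= min_minutes]
    if not eligible:
        return None
    best = max(eligible, key=lambda r: per_10_min(r[stat], r["minutes_played"]))
    return best["player"], per_10_min(best[stat], best["minutes_played"])


# Hypothetical aggregated season totals (not real OWL data).
rows = [
    {"player": "player_a", "hero": "Junkrat", "solo_kills": 30, "minutes_played": 120.0},
    {"player": "player_b", "hero": "Junkrat", "solo_kills": 12, "minutes_played": 30.0},
    {"player": "player_c", "hero": "Junkrat", "solo_kills": 20, "minutes_played": 100.0},
]
leader = rate_leader(rows, hero="Junkrat", stat="solo_kills")
```

Here `player_b` has the highest raw rate but falls under the 60-minute threshold, so the leader is `player_a`.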
Most publicly available football (soccer) statistics are limited to aggregated data such as Goals, Shots, Fouls, Cards. When assessing performance or building predictive models, this simple aggregation, without any context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from this simple count of shots will assess the two teams as equal.
A football game generates many more events, and it is important and interesting to take into account the context in which those events were generated. This dataset should keep sports analytics enthusiasts awake for long hours, as the number of questions that can be asked is huge.
This dataset is a result of a very tiresome effort of webscraping and integrating different data sources. The central element is the text commentary. All the events were derived by reverse engineering the text commentary, using regex. Using this, I was able to derive 11 types of events, as well as the main player and secondary player involved in those events and many other statistics. In case I've missed extracting some useful information, you are gladly invited to do so and share your findings. The dataset provides a granular view of 9,074 games, totaling 941,009 events from the biggest 5 European football (soccer) leagues: England, Spain, Germany, Italy, France from 2011/2012 season to 2016/2017 season as of 25.01.2017. There are games that have been played during these seasons for which I could not collect detailed data. Overall, over 90% of the played games during these seasons have event data.
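To give a feel for how events can be recovered from text commentary with regex, here is a minimal sketch for shot events. The commentary wording, the pattern and the placeholder names are assumptions for illustration; they are not the author's actual expressions, which are not shown in this description.

```python
import re

# Assumed commentary shape:
# "Attempt <outcome>. <player> (<team>) ... from <location> ... [Assisted by <player>.]"
SHOT_PATTERN = re.compile(
    r"Attempt (?P<outcome>missed|saved|blocked)\. "
    r"(?P<player>[^()]+?) \((?P<team>[^()]+)\)"
    r".*?from (?P<location>outside the box|the centre of the box)"
    r"(?:.*?Assisted by (?P<assist>[^.()]+))?"
)


def parse_shot(commentary):
    """Return a dict describing a shot event, or None if the line is not a shot."""
    match = SHOT_PATTERN.search(commentary)
    return match.groupdict() if match else None


# Placeholder commentary lines (not taken from the dataset).
long_range = parse_shot(
    "Attempt missed. Player A (Team X) right footed shot from outside the box "
    "is too high. Assisted by Player B."
)
close_range = parse_shot(
    "Attempt saved. Player C (Team Y) header from the centre of the box is saved."
)
not_a_shot = parse_shot("Foul by Player D (Team Y).")
```

Capturing the shot location alongside the main and secondary player is what lets the long-range and inside-the-box cases be told apart, instead of collapsing both into a single shot count.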
The dataset is organized in 3 files:
I have used this data to:
There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:
And many many more...
https://creativecommons.org/publicdomain/zero/1.0/
I’m from Brazil, the country of soccer; we’re famous for being the only country to win five FIFA World Cups. But another sport that is very popular here is volleyball. I’m a volleyball maniac: I started to play when I was 10 and never stopped. I even tried to go pro, but my height didn’t allow it.
I started an engineering course, and there I became a Data Science maniac. Then an idea came to mind: why not play with both my passions? That is the story behind this dataset.
This dataset contains men's and women's stats from the FIVB 2020 Preliminary Round. The data is subdivided into specific datasets: Best Attackers, Best Blockers, Best Diggers, Best Receivers, Best Scorers, Best Servers and Best Setters, each with its own specialized stats.
I wouldn’t be here without the help of other people’s materials; I would like to acknowledge Luis Felipe Bueno for his Medium article, which was my starting point for making this dataset. Luis Article
I hope to help fellow Volleyball and Data Science maniacs like me have a little fun, so I would like to see answers to some questions:
1. Based on this dataset, what would be the Men's and Women's Dream Teams? *
2. Which are the 5 countries with the most players in the highest ranks?
*obs: (2 - Setters, 4 - Hitters, 2 - Opposite Hitters, 4 - Middle Blockers, 2 - Liberos)