Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ranked data is commonly used in research across many fields of study including medicine, biology, psychology, and economics. One common statistic used for analyzing ranked data is Kendall’s τ coefficient, a nonparametric measure of rank correlation which describes the strength of the association between two monotonic continuous or ordinal variables. While the mathematics involved in calculating Kendall’s τ is well-established, there are relatively few graphing methods available to visualize the results. Here, we describe several alternative and complementary visualization methods and provide an interactive app for graphing Kendall’s τ. The resulting graphs provide a visualization of rank correlation which helps display the proportion of concordant and discordant pairs. Moreover, these methods highlight other key features of the data which are not represented by Kendall’s τ alone but may nevertheless be meaningful, such as longer monotonic chains and the relationship between discrete pairs of observations. We demonstrate the utility of these approaches through several examples and compare our results to other visualization methods.
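As a concrete illustration of the pairwise definition, here is a minimal Python sketch of Kendall's τ-a (the tie-free variant) that counts concordant and discordant pairs directly; in practice, scipy.stats.kendalltau provides a production implementation with tie handling.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Assumes no ties; tie handling (tau-b) is omitted for brevity."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Perfectly concordant rankings give tau = 1; fully reversed give -1.
print(kendall_tau([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(kendall_tau([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The concordant and discordant counts computed here are exactly the quantities the visualization methods above aim to display.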
By Granger Huntress [source]
This dataset provides a comprehensive look at the world of men's professional tennis throughout the Open Era. Every year, a new crop of tennis players has emerged to challenge long-standing traditions, while others have continued to maintain their place near the top. Through this dataset you will uncover which players reached or maintained their ranking positions in the record books, and how they navigated changing eras in men's professional tennis. Dive into what makes these successful athletes stand out year after year, with data covering first name, birthdate, country of origin, handedness, the date range of records kept, and, most importantly, ATP year-end rankings. Whether you want a snapshot view of long-term trends or deeper insight into why top players succeed, this collection is an invaluable resource for exploring men's ATP rankings throughout the Open Era.
This dataset is a comprehensive source of men's year-end rankings during the Open Era. Each record includes information on the player's ranking, name, birthdate, country of origin, and handedness. This dataset can be used to study the trends in professional tennis throughout this time period and analyze how they have changed over time.
To use this dataset effectively, first explore the data by reviewing some basic statistics, such as total players by country or average ranking across years. Summarizing the data gives a quick understanding of what it contains and of any patterns that may be present.
Before deeper analysis, also check for missing values and outliers that could affect your results if ignored or handled inappropriately. Understanding these potential issues up front can save you from misinterpreting results later on.
Once an overview of the dataset has been established and potential issues have been addressed, you can begin a more detailed exploration of the questions it can answer about professional tennis during this period, such as: How did various nations perform over different years? Who was consistently ranked among the top 10 players? Are there any trends associated with handedness? Answering questions like these requires choosing analysis techniques appropriate to the available variables, such as correlations or linear regression. Visualizations can also help make sense of the complex multivariable relationships that may exist between parameters, so include them whenever possible. This way you can draw accurate conclusions about both individual players and the overall ranking statistics across the years covered by the Open Era.
- Analyzing the global trends in men's tennis in the Open Era over time by examining shifts in countries represented at each year-end ranking.
- Examining whether a player's handedness (right or left) is associated with success against different opponents throughout the Open Era.
- Tracking and predicting future player rankings based on birthdates, country, and other relevant factors that influence performance.
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: ltdPlayerMaster.csv

| Column name | Description |
|:------------|:------------|
| FIRST | First name of the player. (String) |
| LAST | Last name of the player. (String) |
| HAND | Handedness of the player (Right or Left). ... |
Amazon Best Seller data contains information about the best-selling products on Amazon; this information is very useful for monitoring the best-selling products in various categories and sub-categories.
A. Use cases/applications possible with the data:
Competition Monitoring: Amazon's Best Sellers data covers the best-selling goods on Amazon, which reveals a lot about top e-commerce trends. Competing directly with these items might be challenging, but the Best Sellers list can be a source of inspiration for new products and help e-commerce merchants stay ahead of the game. Getting your item onto the Best Sellers list and keeping it there is one of the most reliable strategies to ensure sales for your company. Once a product makes the Best Sellers list, e-commerce businesses increasingly use web scraping to keep track of new items and adjust their own products to compete.
New Product Launch: Amazon Best Sellers Data is critical when it comes to launching a new product or repositioning existing products. Indeed, Amazon's best seller rank data can be used as a guide, indicating when you and your products are on the right track.
How does it work?
https://spdx.org/licenses/CC0-1.0.html
This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we proposed a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by ß* (B), and the bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (ß*), to compare their performance with the WiBB method in ranking predictor importance under various scenarios. We further applied it to an empirical dataset in the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that the WiBB method outperformed the ß* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance, and hence in reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy addition to the statistical toolbox.
Methods: To simulate independent datasets (size = 1000), we adopted the approach of Galipaud et al. (2014) with custom modifications of the data.simulation function, which used the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference between the correlation coefficients of consecutive predictors, ∆r = 0.1, 0.2, 0.3. These three levels of ∆r yielded three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0). We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total) for later LM fitting. For GLM fitting, we modified the simulation procedure with additional steps, converting the continuous response into binary data O (e.g., occurrence data having 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, the relative sum of weights (SWi) and the standardized beta (ß*), to evaluate the ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was assembled from their occurrence coordinates and corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors for their geographical distributions.
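The simulation step can be mirrored in Python with NumPy's multivariate normal sampler (the paper used rmvnorm in R). The sketch below uses the preset correlations for ∆r = 0.2 and assumes mutually uncorrelated predictors, which keeps the correlation matrix positive definite but may differ from the authors' exact setup.

```python
import numpy as np

# Preset correlations of y with x1..x4 for the ∆r = 0.2 scenario.
r = np.array([0.6, 0.4, 0.2, 0.0])

# Correlation matrix for (y, x1, x2, x3, x4); predictors are assumed
# uncorrelated with each other (an illustrative simplification).
corr = np.eye(5)
corr[0, 1:] = r
corr[1:, 0] = r

rng = np.random.default_rng(42)
data = rng.multivariate_normal(mean=np.zeros(5), cov=corr, size=1000)
y, X = data[:, 0], data[:, 1:]

# Empirical correlations should be close to the preset values.
emp = [np.corrcoef(y, X[:, j])[0, 1] for j in range(4)]
print(np.round(emp, 2))
```

With n = 1000 the sample correlations land within a few hundredths of the targets, mirroring the "known correlation structure" the paper relies on for benchmarking.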
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution selection plays a key role in parametric statistics. Choosing, from a set of candidate distributions, the one that best fits the sample data is a recurring task in a large number of statistical analyses. The choice of a misspecified and/or poorly fitted distribution can lead to unreliable results and conclusions. Although distribution selection has been widely studied with regard to goodness-of-fit procedures, less attention has been given to how the sampling design impacts the chance of choosing the most suitable distribution. In this work we address the performance of ranked set sampling (RSS), extreme ranked set sampling (ERSS), and mixture ranked set sampling (MRSS) in distribution selection. In particular, we focus on count data, an important and very common class of random variables. A comprehensive simulation study was carried out, accounting for five distributions: Poisson, negative binomial, geometric, zero-inflated Poisson, and zero-inflated negative binomial. Results show that RSS and its extensions performed better than simple random sampling (SRS) under perfect ranking. When ranking errors are present, relative efficiency depends on how accurate the ranking criterion is, and the different RSS-based designs also differ in terms of their robustness. Findings are reinforced by additional simulations based on two real data sets.
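The basic mechanics of balanced RSS under perfect ranking can be sketched in a few lines of Python: for each judgment rank i, screen a set of m units, sort them, and retain only the i-th order statistic. The population draw below is a placeholder, not one of the count distributions studied in the paper.

```python
import random

def rss_sample(draw, m, cycles, rng):
    """Balanced ranked set sample under perfect ranking: for each
    judgment rank i (0..m-1), draw a set of m units, sort them, and
    keep only the i-th order statistic."""
    sample = []
    for _ in range(cycles):
        for i in range(m):
            ranked_set = sorted(draw(rng) for _ in range(m))
            sample.append(ranked_set[i])
    return sample

rng = random.Random(7)
# Placeholder discrete population; the paper fits Poisson-family models.
sample = rss_sample(lambda r: r.randint(0, 20), m=3, cycles=4, rng=rng)
print(len(sample))  # 12 retained observations, from 36 screened units
```

Note the design's cost structure: m units are screened (ranked) for every one measured, which is why ranking accuracy drives the efficiency comparisons reported above.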
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Wilcoxon rank sum test for two independent samples and the Kruskal–Wallis rank test for the one-way model with k independent samples are very competitive robust alternatives to the two-sample t-test and k-sample F-test when the underlying data have tails longer than the normal distribution. However, these positives for rank methods do not extend as readily to methods for making all pairwise comparisons used to reveal where the differences in location may exist. Here, we show that the closed method of Marcus et al. applied to ranks is quite powerful for both small and large samples and better than any methods suggested in the list of applied nonparametric texts found in the recent study by Richardson. In addition, we show that the closed method applied to means is even more powerful than the classical Tukey–Kramer method applied to means, which itself is very competitive for nonnormal data with moderately long tails and small samples.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
I started playing Overwatch towards the end of Season 2 and I thought it would be interesting to start collecting data from my own ranked games. The goal was to maintain a dataset of Overwatch ranked data that I could analyze to better understand my own gameplay, and also to see how skill rating (SR) changes as a function of, for example, win/loss streaks and medals.
I wanted to make the dataset available to the Overwatch community as there are very few publicly available datasets containing ranked data, presumably because it's quite time-consuming to collect.
Moving forward, I want to find a way to include all player-viewable data from individual matches. This will hopefully include things like SR of all players in the match, characters played, global performance data for individual characters (for comparison against global averages) etc. However, Blizzard does not make this easy as I don't believe there's a public API for accessing this type of data...
The data is from my own ranked matches (1500-2600 SR) and the values are taken from screenshots before and after each match. The values were manually typed into a spreadsheet and include variables such as number of medals, eliminations, healing etc.
The data contains separate CSV files for each competitive season. I have also included an aggregated data file that combines all seasons into a single CSV.
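Combining per-season files like these into a single table is straightforward with pandas. This sketch uses toy in-memory frames (the real files and column names will differ); it shows how seasons missing a field end up with NaN in the combined table, matching the note below that not all seasons contain all fields.

```python
import pandas as pd

# Toy stand-ins for two season CSVs; real columns may differ.
s1 = pd.DataFrame({"sr": [2310, 2335], "result": ["win", "loss"]})
s2 = pd.DataFrame({"sr": [2350], "result": ["win"], "medals": [3]})
s1["season"] = 2
s2["season"] = 3

# Union of columns: seasons missing a field (here `medals`) get NaN.
all_seasons = pd.concat([s1, s2], ignore_index=True, sort=False)
print(all_seasons.shape)  # (3, 4)
```

With real data, replacing the toy frames with `pd.read_csv` calls over the per-season files reproduces the aggregated CSV.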
You can find a full description of all variables here, which also contains some notes about the quality of the data.
<img src="https://www.simonho.ca/wp-content/uploads/2018/04/overwatch_data.png" alt="Overwatch data preview">
Not all season data contains all fields, and I may end up adding more fields over time, but currently you can find the following in the dataset:
Later seasons also include the following from the post-match summary screen:
I think there's a lot that we can learn about how the game works from this type of data. A few examples:
I would love to get your thoughts on what additional variables might be worth collecting data on, and the best way to collect it. Are there APIs for accessing some of the more difficult to obtain metrics?
I'd also like to find a way for others to contribute their own gameplay data. Right now the dataset is very limited by the roles I play in the game, the division/tier I play in etc. Let me know if you have any ideas about the best way to incorporate ranked data directly from the community.
This dataset is maintained by me in my spare time. You can find my original blog post detailing the background and publication of the dataset here.
The County Health Rankings, a collaboration between the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute, measure the health of nearly all counties in the nation and rank them within states. This feature layer contains 2020 County Health Rankings data for nation, state, and county levels. The Rankings are compiled using county-level measures from a variety of national and state data sources. Some example measures are:
- adult smoking
- physical inactivity
- flu vaccinations
- child poverty
- driving alone to work

To see a full list of variables, as well as their definitions and descriptions, explore the Fields information by clicking the Data tab here in the Item Details. These measures are standardized and combined using scientifically-informed weights.

"By ranking the health of nearly every county in the nation, County Health Rankings & Roadmaps (CHR&R) illustrates how where we live affects how well and how long we live. CHR&R also shows what each of us can do to create healthier places to live, learn, work, and play – for everyone."

Some new features of the 2020 Rankings data compared to previous versions:
- More race/ethnicity categories, including Asian/Pacific Islander and American Indian/Alaska Native
- Reliability flags to mark an estimate as unreliable
- 5 new variables: math scores, reading scores, juvenile arrests, suicides, and traffic volume

Data Processing Notes:
- Data downloaded March 2020
- Slight modifications made to the source data are as follows:
  - The string " raw value" was removed from field labels/aliases so that auto-generated legends and pop-ups would only have the measure's name, not "(measure's name) raw value", and strings such as "(%)", "rate", or "per 100,000" were added depending on the type of measure.
  - Percentage and prevalence fields were multiplied by 100 to make them easier to work with in the map.
  - For demographic variables only, the word "numerator" was removed and the word "population" was added where appropriate.
- Fields dropped from the analytic data file: year, all fields ending in "_cihigh" and "_cilow", and any variables that are not listed in the sources and years documentation.
- The analytic data file was then merged with state-specific ranking files so that all county rankings and subrankings are included in this layer.
This table includes Spearman correlation statistics for associations between pairs of analytes.
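Spearman correlation is simply the Pearson correlation of the rank vectors, so each table entry can be recomputed from raw analyte measurements. A minimal dependency-free sketch (using average ranks for ties) is below; in practice scipy.stats.spearmanr also reports p-values.

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

Any strictly monotonic relationship between two analytes yields rho = 1 (or -1), which is what makes the statistic suitable for non-linear but ordered associations.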
Market leader Facebook was the first social network to surpass one billion registered accounts and currently sits at more than three billion monthly active users. Meta Platforms owns four of the biggest social media platforms, all with more than one billion monthly active users each: Facebook (core platform), WhatsApp, Messenger, and Instagram. In the third quarter of 2023, Facebook reported around four billion monthly core Family product users.

The United States and China account for the most high-profile social platforms

Most top-ranked social networks with more than 100 million users originated in the United States, but services like the Chinese social networks WeChat and QQ, or the video-sharing app Douyin, have also garnered mainstream appeal in their respective regions due to local context and content. Douyin's popularity has led to the platform releasing an international version of its network, TikTok.

How many people use social media?

The leading social networks are usually available in multiple languages and enable users to connect with friends or people across geographical, political, or economic borders. In 2025, social networking sites are estimated to reach 5.44 billion users, and these figures are still expected to grow as mobile device usage and mobile social networks increasingly gain traction in previously underserved markets.
https://creativecommons.org/publicdomain/zero/1.0/
Welcome to the "Milk, Cheese, and Eggs Prices Dataset" on Kaggle! This dataset provides comprehensive information about the prices of milk, cheese, and eggs across various countries in the year 2017. Understanding the pricing trends of these essential food items is crucial for economists, policymakers, and consumers alike, as they form the foundation of many diets worldwide. This dataset is a valuable resource for researchers, analysts, and anyone interested in studying global food pricing dynamics.
The dataset consists of the following columns:
1. **Countries**: The names of the different countries included in the dataset. This is a categorical variable.
2. **Milk, Cheese, and Eggs Prices, 2017**: The average prices of milk, cheese, and eggs in each respective country for the year 2017, provided as numerical values, typically in the local currency per unit (e.g., per liter, per kilogram). This is a numerical variable.
3. **Global Rank**: The global ranking of each country based on the average prices of milk, cheese, and eggs in 2017. A lower rank suggests lower prices, while a higher rank indicates higher prices. This is a numerical variable.
4. **Available Data**: Indicates whether complete data is available for each country, as binary values (e.g., "Yes" or "No"). This is a categorical variable.
Column Features:

1. **Countries** (Categorical): The names of countries included in the dataset. Example: "United States," "France," "India."
2. **Milk, Cheese, and Eggs Prices, 2017** (Numerical): The average prices of milk, cheese, and eggs in each respective country for 2017. Example: 2.50 (the average price in the local currency per unit).
3. **Global Rank** (Numerical): The global ranking of each country based on the average prices of milk, cheese, and eggs in 2017. Example: 5 (the country's rank).
4. **Available Data** (Categorical): Indicates whether complete data is available for a specific country. Example: "Yes" or "No."
This dataset allows you to explore and analyze the pricing disparities of milk, cheese, and eggs across different countries, identify trends, and gain insights into the global food market. Researchers can use it for comparative studies, and policymakers can use it to inform decisions related to food affordability and accessibility.
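As a quick sanity check on the Global Rank column, the ranking can be reproduced from the price column with pandas. The prices below are illustrative only, and the tie-handling rule (`method="min"`) is an assumption, since the dataset description does not specify how tied prices are ranked.

```python
import pandas as pd

# Hypothetical prices; local-currency-per-unit values are illustrative only.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Price2017": [1.10, 2.45, 0.95, 2.45],
})

# Per the description, a lower Global Rank corresponds to lower prices,
# so rank ascending by price; tied prices share the minimum rank here.
df["GlobalRank"] = df["Price2017"].rank(method="min", ascending=True).astype(int)
print(df.sort_values("GlobalRank").to_string(index=False))
```

Comparing a recomputed rank against the dataset's Global Rank column is an easy way to spot countries flagged "No" under Available Data.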
Feel free to perform your analyses, build predictive models, or generate visualizations to extract meaningful insights from this dataset. We encourage you to share your findings with the Kaggle community and contribute to a better understanding of global food pricing dynamics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We start with a simple introduction to topological data analysis where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly on persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Overview
This dataset compiles a decade (2016–2025) of official National Institutional Ranking Framework (NIRF) data, published by the Government of India (Ministry of Education). It includes rankings across all major categories such as Engineering, Management, Universities, Colleges, Medical, Law, Architecture, and Overall.
All data has been web scraped directly from the official NIRF India website (https://www.nirfindia.org) to maintain accuracy and consistency. This dataset provides a longitudinal view of India's higher education performance, enabling detailed trend analysis, performance comparison, and educational insights.
Dataset Summary
Each year (2016–2025) contains rankings with detailed parameters for every institution. The typical columns include:

| Column Name | Description |
| ----------- | ----------- |
| Institute Name | Full name of the institution |
| Category | Type of ranking (Engineering, Management, etc.) |
| Rank | Official NIRF rank of the institution |
| Score | Overall score assigned by NIRF |
| City | Location of the institution |
| State | State in which the institution is located |
| TLR (Teaching, Learning & Resources) | Measures teaching quality and infrastructure |
| RP (Research and Professional Practice) | Reflects research output and innovation |
| GO (Graduation Outcomes) | Evaluates student results and placements |
| OI (Outreach & Inclusivity) | Captures diversity and social inclusivity |
| PR (Perception) | Public and academic perception score |
| Year | Year of ranking (2016–2025) |
Why This Dataset Matters
Example Use Cases
- Comparing IITs, IIMs, NITs, and private universities over 10 years
- Measuring correlation between NIRF scores and research output
- Visualizing rank progression of the top 100 institutions
- Clustering institutions by category or score
- Predicting rank changes using regression or time-series analysis
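The rank-progression use case reduces to a pivot on the documented columns. A minimal pandas sketch is below; the institutions and ranks are made up for illustration.

```python
import pandas as pd

# Toy slice using the dataset's documented columns (values illustrative).
df = pd.DataFrame({
    "Institute Name": ["Inst X", "Inst X", "Inst Y", "Inst Y"],
    "Category": ["Engineering"] * 4,
    "Year": [2024, 2025, 2024, 2025],
    "Rank": [3, 1, 7, 9],
})

# Year-over-year rank change per institution (negative = improved rank).
wide = df.pivot(index="Institute Name", columns="Year", values="Rank")
change = wide[2025] - wide[2024]
print(change.to_dict())  # {'Inst X': -2, 'Inst Y': 2}
```

The same pivot extends naturally to all ten years, giving a per-institution trajectory suitable for the trend analysis and time-series modeling mentioned above.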
Data Source

All data extracted from: 🔗 Official NIRF India Website: https://www.nirfindia.org
Electoral rules can affect who wins, who loses, and how voters feel about the electoral process. Most cities select office holders through plurality rule, but an alternative, ranked choice voting (RCV), has become increasingly popular. RCV requires voters to rank candidates, instead of simply selecting their most preferred candidate. Observers debate whether RCV will cure a variety of electoral ills or undermine representation. We test the effect of RCV on voters' choices and perceptions of representation using survey experiments with large, representative samples of respondents. We find that candidates of color are significantly penalized in both plurality and RCV elections, with no significant difference between the rule types. However, providing respondents with candidates' partisan affiliation significantly increases support for candidates of color.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Figures show: (1) construction of a rank clock, in which the rank-order abundance of each species in a community sample is plotted at each time interval in a clockwise temporal direction; (2) a rank clock display of a 20-yr data set with 25 species in which no species turnover (gains or losses) or changes in rank abundance occur over time; and (3) a conceptual diagram illustrating the dynamics of two species in linear and rank clock format.
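The computation underlying a rank clock is small: re-rank the species by abundance at each time interval. A dependency-free sketch with toy abundances is below (rank 1 = most abundant); the polar, clockwise plotting itself is omitted.

```python
# Toy abundances: keys are species, lists are counts per time interval.
abundance = {
    "sp1": [50, 30, 10],
    "sp2": [20, 40, 60],
    "sp3": [10, 10, 30],
}

n_steps = len(next(iter(abundance.values())))
ranks_over_time = {sp: [] for sp in abundance}
for t in range(n_steps):
    # Sort species by descending abundance at time t; rank 1 = most abundant.
    ordered = sorted(abundance, key=lambda sp: -abundance[sp][t])
    for rank, sp in enumerate(ordered, start=1):
        ranks_over_time[sp].append(rank)

print(ranks_over_time)
```

On a rank clock, each species' rank series would then be plotted against time on a polar (clockwise) axis, so rank reversals appear as crossing spirals.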
https://opensource.org/license/MIT
Algorithm (.php) for retrieving the co-citation set of a scholarly output by DOI and calculating its CPR. Configuration, database operations, and input-sanitizing code are omitted. Also included are example data and statistical analyses used in Seppänen et al. (2020). For context, see Seppänen et al. (2020), "Co-Citation Percentile Rank and JYUcite: a new network-standardized output-level citation influence metric": https://oscsolutions.cc.jyu.fi/jyucite
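The precise CPR definition is given in Seppänen et al. (2020); as a rough, generic illustration of a percentile rank within a co-citation set (not necessarily the authors' exact formula), one might write:

```python
def percentile_rank(value, cohort):
    """Percentile rank of `value` within `cohort`:
    (count strictly below + half of the ties) / N * 100."""
    below = sum(1 for c in cohort if c < value)
    ties = sum(1 for c in cohort if c == value)
    return 100.0 * (below + 0.5 * ties) / len(cohort)

# Hypothetical citation counts of a co-citation set; target paper has 12.
cocited = [3, 7, 12, 12, 20, 31]
print(percentile_rank(12, cocited))  # 50.0
```

The appeal of a network-standardized metric of this kind is that the cohort is the output's own co-citation set, so the rank is normalized against directly comparable works rather than a whole journal or field.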
The County Health Rankings, a collaboration between the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute, measure the health of nearly all counties in the nation and rank them within states. This feature layer contains 2022 County Health Rankings data for nation, state, and county levels. The Rankings are compiled using county-level measures from a variety of national and state data sources. Some example measures are:
- adult smoking
- physical inactivity
- flu vaccinations
- child poverty
- driving alone to work

To see a full list of variables, as well as their definitions and descriptions, explore the Fields information by clicking the Data tab here in the Item Details. These measures are standardized and combined using scientifically-informed weights.

"By ranking the health of nearly every county in the nation, County Health Rankings & Roadmaps (CHR&R) illustrates how where we live affects how well and how long we live. CHR&R also shows what each of us can do to create healthier places to live, learn, work, and play – for everyone."

Counties are ranked within their state on both health outcomes and health factors. Counties with a lower (better) health outcomes ranking than health factors ranking may see the health of their county decline in the future, as factors today can result in outcomes later. Conversely, counties with a lower (better) factors ranking than outcomes ranking may see the health of their county improve in the future.

Some new variables in the 2022 Rankings data compared to previous versions:
- COVID-19 age-adjusted mortality
- School segregation
- School funding adequacy
- Gender pay gap
- Childcare cost burden
- Childcare centers
- Living wage (while the Living wage measure was introduced to the CHR&R dataset in 2022 from the Living Wage Calculator, it is not available in the Living Atlas dataset; users interested in the most up-to-date living wage data can look it up on the Living Wage Calculator website)

Data Processing Notes:
- Data downloaded April 2022
- Slight modifications made to the source data are as follows:
  - The string " raw value" was removed from field labels/aliases so that auto-generated legends and pop-ups would only have the measure's name, not "(measure's name) raw value", and strings such as "(%)", "rate", or "per 100,000" were added depending on the type of measure.
  - Percentage and prevalence fields were multiplied by 100 to make them easier to work with in the map.
  - Ratios were set to null if negative to make them easier to work with in the map.
  - For demographic variables, the word "numerator" was removed and the word "population" was added where appropriate.
- Fields dropped from the analytic data file: year, all fields ending in "_cihigh" and "_cilow", and any variables that are not listed in the sources and years documentation.
- The analytic data file was then merged with state-specific ranking files so that all county rankings and subrankings are included in this layer.
- 2010 US boundaries were used as the data contain 2010 US census geographies, for a total of 3,142 counties.
South African policymakers are endeavouring to ensure that the poor have better access to financial services. However, a lack of understanding of the financial needs of poor households impedes a broad strategy to attend to this need.
The Financial Diaries study addresses this knowledge gap by examining financial management in rural and urban households. The study is a year-long household survey based on fortnightly interviews in Diepsloot (Gauteng), Langa (Western Cape) and Lugangeni (Eastern Cape). In total, 160 households were involved in this pioneering study which promises to offer important insights into how poor people manage their money as well as the context in which poor people make financial decisions. The study paints a rich picture of the texture of financial markets in townships, highlighting the prevalence of informal financial products, the role of survivalist business and the contribution made by social grants. The Financial Diaries dataset includes highly detailed, daily cash flow data on income, expenditure and financial flows on both a household and individual basis.
Langa in Cape Town, Diepsloot in Johannesburg and Lugangeni, a rural village in the Eastern Cape
Units of analysis in the Financial Diaries Study 2003-2004 include households and individuals
Sample survey data [ssd]
To create the sampling frame for the Financial Diaries, the researchers echoed the method used in Rutherford (2002) and Ruthven (2002): a participatory wealth ranking (PWR). Within South Africa, the participatory wealth ranking method is used by the Small Enterprise Foundation (SEF), a prominent NGO microlender based in the rural Limpopo Province. Simanowitz (1999) compared the PWR method to the Visual Indicator of Poverty (VIP) and found that the VIP test was at best 70% consistent with the PWR tests. At times one third of the households defined as the poorest by the VIP test were actually among the richest according to the PWR. The PWR method was also implicitly assessed in van der Ruit, May and Roberts (2001) by comparing it to the Principal Components Analysis (PCA) used by CGAP as a means to assess client poverty. They found that three quarters of those defined as poor by the PCA were also defined as poor by the PWR. The researchers closely followed the SEF manual to conduct the wealth rankings, and consulted with SEF on adapting the method to urban areas.
The first step was to consult with community leaders and ask how they would divide their community. Within each type of area, representative neighbourhoods of about 100 households each were randomly chosen. Townships in South Africa are organised by street, with each street or zone having its own street committee. The street committees are meant to know everyone on their street and to serve as stewards of all activity within the street. Each street committee in each area was invited to a central meeting and asked to map their area and give a roster of household names. Following the mapping, each area was visited and the maps and rosters were checked by going door to door with the street committee.
Two reference groups were then selected from the street committee and senior members of the community, with between four and eight people in each group. Each reference group was first asked to indicate how they define a poor household versus one that is well off. This discussion had a dual purpose. First, it relayed information about what each community believes is rich or poor. Second, it started the reference group thinking about which households belong under which heading.
Following this discussion, each reference group ranked each household in the neighbourhood according to their perceived wealth. The SEF methodology of wealth ranking is de-normalised in that reference groups are invited to put households into as many different wealth piles as they feel is appropriate. Only households that were known by both reference groups were kept in the sample.
The SEF guidelines were used to assign a score to each household in a particular pile. Scores were created by dividing 100 by the number of piles and multiplying by the pile's rank, counted upward from the wealthiest pile. This means that if the poorest pile was number 1 out of six piles, every household in that pile was assigned a score of 100, representing 100% poverty. Every household in the wealthiest pile, pile number 6, received a score of 16.7, and every household in pile 5 received a score of 33.3. The scores from the two reference groups were averaged for the distribution.
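The scoring rule above can be sketched as follows; the pile numbering (pile 1 = poorest) follows the six-pile example in the text.

```python
def pile_score(pile_number, n_piles):
    """Score a pile under the SEF rule described above.

    Pile 1 is the poorest; scores run from 100 (poorest)
    down to 100 / n_piles (wealthiest).
    """
    return round(100.0 / n_piles * (n_piles - pile_number + 1), 1)

# With six piles, scores step down from 100.0 to 16.7:
scores = [pile_score(p, 6) for p in range(1, 7)]
# scores -> [100.0, 83.3, 66.7, 50.0, 33.3, 16.7]
```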
One way of assessing how good the results are is to analyse how consistent the rankings were between the two reference groups. According to the SEF methodology, a result is consistent if the scores from the two reference groups differ by no more than 25 points, inconsistent if the difference is between 26 and 50 points, and unreliable if the difference is above 50 points. SEF uses both consistent and inconsistent rankings, as long as the average across the two reference groups is used; this would mean that 91% of the sample could be used. However, because only two reference groups were used, only consistent households were considered for the final sample selection.
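The SEF consistency thresholds translate directly into code; a small sketch, taking each reference group's score for the same household as input:

```python
def consistency(score_a, score_b):
    """Classify a household's ranking by the gap between
    the two reference groups' scores (SEF thresholds)."""
    diff = abs(score_a - score_b)
    if diff <= 25:
        return "consistent"
    elif diff <= 50:
        return "inconsistent"
    return "unreliable"
```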
To test this further, the number of times that the reference groups put a household in the exact same category was counted. The extent of agreement at either end of the wealth spectrum between the two reference groups was also assessed. This result would be unbiased by how many categories the reference groups put households into.
Following the example used in India and Bangladesh, the sample was divided into three wealth categories depending on the household's overall score. Distinguishing three categories of wealth allowed a ranking of wealth similar to those used in Bangladesh and India, while keeping the sample from being over-stratified. A sample of 60 households was then drawn randomly from each area. Drawing the sample in proportion to each wealth ranking's share of the population would likely have left too few wealthier households to draw conclusions from, so the researchers drew equally from each ranking.
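The equal-draw design above can be sketched as follows; the household counts and category names here are invented for illustration:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical neighbourhood: categories are unevenly represented.
households = (
    [("poor", i) for i in range(90)]
    + [("middle", i) for i in range(60)]
    + [("wealthy", i) for i in range(30)]
)

# Draw equally from each category rather than proportionally,
# so wealthier households are not under-represented: 3 x 20 = 60.
per_category = 20
sample = []
for category in ("poor", "middle", "wealthy"):
    members = [h for h in households if h[0] == category]
    sample.extend(random.sample(members, per_category))
```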
Face-to-face [f2f]
County Health Ranking and Roadmap
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
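The example query itself does not appear on this page. The sketch below shows what querying such a dataset over Splitgraph's SQL-over-HTTP interface might look like; the endpoint URL, repository name, and column names are all assumptions rather than details taken from this page, so consult the Splitgraph documentation for the actual API.

```python
import json
import urllib.request

# Assumed endpoint and repository/table names, for illustration only.
endpoint = "https://data.splitgraph.com/sql/query/ddn"
sql = (
    "SELECT county_name, health_outcomes_rank "
    'FROM "example-org/county-health-rankings".rankings '
    "LIMIT 10"
)

request = urllib.request.Request(
    endpoint,
    data=json.dumps({"sql": sql}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send the query and return JSON rows;
# the network call is omitted here.
```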
See the Splitgraph documentation for more information.
GIC created the habitat cores model using the National Land Cover Database (NLCD) 2019 land cover data (the most recent land cover available when this project began). The NLCD provides nation-wide data on land cover and land cover change at the Landsat Thematic Mapper (TM) 30-meter resolution (30 x 30 meter pixels of analysis) and is appropriate for mapping rural landscapes.
To be considered a habitat core, the native landscape must encompass more than 100 acres of intact area. This acreage standard is based on studies evaluating the minimum acreage for terrestrial species to survive and thrive. For example, interior forest-dwelling birds such as cerulean warblers need 100 acres of interior forest habitat for adequate foraging and nesting. Large, intact forest cores are less impacted by disturbances and can better support area-sensitive and extinction-prone species because they retain larger populations, and their habitat is less likely to degrade through time (Ewers et al 2006). Forest fragments or woodlands less than 100 acres (known as patches) were also mapped to aid in identifying corridors or pathways for species to migrate across the landscape. These fragments, while not ideal habitat for larger species, can provide quality refugia for some species. Fragments can also act as stepping stones, allowing species to move across the landscape while minimizing their exposure to predators and other disturbances. 2019 NLCD land cover types such as forests and wetlands were then evaluated to determine their intactness by identifying features that fragment them, such as roads, buildings, transmission corridors, large rivers, and so on. These features bisect the landscape into smaller units (see maps). If an area is bisected too often, it does not contain a large enough habitat area to support interior nesting species and thus is too small to function as a habitat core.
To ensure that there is enough interior habitat, GIC's analysts first subtract (clip out) the outer edge for a distance of 300 feet, so that potentially disturbed area is not counted as interior habitat. Edge areas are more likely to contain invasive species, to suffer from wind impacts leading to dryness and blowdowns, and to attract opportunistic predators such as domestic cats and dogs. In the final map of intact habitats, this edge area is added back in, but does not count towards the 100-acre minimum core size.

The next step in the process is to divide the acreage into quintiles or "natural breaks." This sorts the cores by size, the most important element contributing to species abundance: bigger landscapes can generally support more species. However, other landscape factors, such as surface waters, also contribute to species abundance. Thus, in addition to geometry and extent, habitat cores are ranked based on additional environmental attributes. Assigning attributes to each core allows specific high-quality and high-value habitat to be identified and prioritized during strategy development. Not all habitats will be protected, and resources for management or conservation are usually limited. Ranking habitat cores by their quality allows land-use planners, agency officials, and landowners or site managers to prioritize the landscapes that provide the highest value for species.
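As a toy illustration of the two thresholds above (the 300-foot edge clip and the 100-acre interior minimum), the sketch below assumes a simplified square patch; a real implementation would use GIS geometry operations on the NLCD raster rather than this simplification.

```python
ACRE_SQ_FT = 43_560    # square feet per acre
EDGE_FT = 300          # inward clip distance for the disturbed edge
MIN_CORE_ACRES = 100   # minimum interior area to count as a core

def is_habitat_core(side_ft):
    """Toy check for a square patch: clip a 300 ft edge on all
    sides, then test whether the remaining interior area is at
    least 100 acres."""
    interior_side = side_ft - 2 * EDGE_FT
    if interior_side <= 0:
        return False  # the edge clip consumes the whole patch
    return interior_side ** 2 / ACRE_SQ_FT >= MIN_CORE_ACRES

# A patch ~2,700 ft on a side keeps ~101 interior acres and qualifies;
# a 2,000 ft patch keeps only ~45 interior acres and does not.
```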
The rankings use landscape-based environmental and ecological attributes. Examples of environmental attribute data used to rank cores include the number of wetlands found within a core; the presence of rare, threatened or endangered species; species richness; soil diversity; the length of stream miles; and topography. These factors all influence the diversity of plants, insects, animals and other biota within a forest or even a wetland core. Core Ranking is represented in the Habitat Core layer. To access it, download the Habitat Core Layer and view the “Score Weight” attribute field.