This dataset was created by Shiva Vashishtha
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Access and clean an open source herbarium dataset using Excel or RStudio.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
https://creativecommons.org/publicdomain/zero/1.0/
EA Sports FIFA 21 is a popular video game that simulates football matches. Often, data collected from this game might be messy, containing inconsistencies, missing values, and various formatting issues.
For this project, I will attempt to clean, organize, and prepare this messy FIFA 21 data for analysis using only Excel. Although it could be done somewhat faster with Python, R, or another programming language, the challenge at hand is to use Excel.
Observations (rows) = 18,980
Unique attributes (columns) = 76
The column 'Loan Date End' has 17,966 blanks.
'Value', 'Wage', 'Release Clause', and 'Hits' contain '0' values, which can be counted with:
=COUNTIF(A1:A18980; "=0")
Spaces were replaced with underscores using:
=SUBSTITUTE(A1; " "; "_")
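The same quick audit can be sketched in Python with pandas; the toy values below are hypothetical stand-ins for the FIFA 21 sheet.

```python
import pandas as pd

# Toy stand-in for two columns of the FIFA 21 sheet (hypothetical values).
df = pd.DataFrame({
    "Loan Date End": [None, "2021-06-30", None],
    "Hits": [0, 12, 0],
})

blanks = df["Loan Date End"].isna().sum()  # blanks per column
zeros = (df["Hits"] == 0).sum()            # zero values, like COUNTIF "=0"
print(blanks, zeros)
```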
At first glance the height column looked like it needed a simple formula to turn a string ending in 'cm' into a number expressing height in centimeters, but some values turned out to be given in feet and inches, written with an apostrophe for feet and a double quote for inches. This called for a more intricate formula. The 'IF' formula checks whether the string contains 'cm'. If it does, the 'cm' suffix is stripped and the number is kept. Otherwise, the digits before the apostrophe (feet) are multiplied by 12, the digits between the apostrophe and the double quote (inches) are added to them, and the total in inches is converted to centimeters.
=IF(ISNUMBER(FIND("cm";$O2)); VALUE(SUBSTITUTE($O2; "cm"; "")); ROUND((LEFT($O2; FIND("'"; $O2) - 1) * 12 + MID($O2; FIND("'"; $O2) + 1; FIND(""""; $O2) - FIND("'"; $O2) - 1)) * 2,54;0))
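The same conversion can be sketched in Python; the sample heights below are hypothetical, and the 2.54 cm-per-inch factor mirrors the Excel formula.

```python
import re

def height_to_cm(raw: str) -> int:
    """Convert a height like "185cm" or 6'2" to centimeters."""
    raw = raw.strip()
    if raw.endswith("cm"):
        return round(float(raw[:-2]))
    # feet'inches" form: feet before the apostrophe, inches before the quote
    m = re.fullmatch(r"(\d+)'(\d+)\"", raw)
    feet, inches = int(m.group(1)), int(m.group(2))
    return round((feet * 12 + inches) * 2.54)

print(height_to_cm("185cm"))   # -> 185
print(height_to_cm("6'2\""))   # -> 188
```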
Weight was given in both 'kg' and 'lbs'. Values in 'kg' are converted to numbers directly; values in 'lbs' are divided by 2.205 to convert them to kilograms. The result is rounded to zero decimal places.
=ROUND(IF(ISNUMBER(FIND("kg";$P2));VALUE(SUBSTITUTE($P2;"kg";""))*1;IF(ISNUMBER(FIND("lbs";$P2));VALUE(SUBSTITUTE($P2;"lbs";""))/2,205;0));0)
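A Python sketch of the weight conversion, with hypothetical sample values; as in the formula, pounds are divided by 2.205 and the fallback is 0.

```python
def weight_to_kg(raw: str) -> int:
    """Convert a weight like "75kg" or "165lbs" to whole kilograms."""
    raw = raw.strip()
    if raw.endswith("kg"):
        return round(float(raw[:-2]))
    if raw.endswith("lbs"):
        return round(float(raw[:-3]) / 2.205)
    return 0  # mirrors the formula's fallback of 0

print(weight_to_kg("75kg"))    # -> 75
print(weight_to_kg("165lbs"))  # -> 75
```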
A new column named 'WithClub10Years' is added to the right of 'Joined'. It shows whether the player has been at the same club for at least 10 years.
=IF(YEAR(NOW())-YEAR(T2)>=10; "10 Years"; "")
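The tenure flag can be sketched in Python as below; the dates are hypothetical, and, like the Excel formula, only the year components are compared.

```python
from datetime import date

def with_club_10_years(joined: date, today: date) -> str:
    """Flag players whose join year is at least 10 years before today's year."""
    return "10 Years" if today.year - joined.year >= 10 else ""

print(with_club_10_years(date(2004, 7, 1), date(2021, 1, 1)))  # -> 10 Years
print(with_club_10_years(date(2015, 7, 1), date(2021, 1, 1)))  # -> (empty)
```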
The monetary figures (in euros) were converted to plain numbers: the 'M' and 'K' suffixes were removed and the remaining figure multiplied by 1,000,000 or 1,000 respectively. The decimal delimiter was changed from '.' to ',' so the values calculate correctly in a European locale.
=IF(ISNUMBER(FIND("M"; Z2)); VALUE(SUBSTITUTE(Z2; "M"; ""))*1000000; IF(ISNUMBER(FIND("K"; Z2)); VALUE(SUBSTITUTE(Z2; "K"; ""))*1000; Z2*1))
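A Python sketch of the monetary conversion; the euro strings below are hypothetical samples, and the currency symbol is stripped before parsing.

```python
def money_to_number(raw: str) -> float:
    """Convert strings like "€103.5M" or "€560K" to plain euro amounts."""
    raw = raw.strip().lstrip("€")
    if raw.endswith("M"):
        return float(raw[:-1]) * 1_000_000
    if raw.endswith("K"):
        return float(raw[:-1]) * 1_000
    return float(raw)

print(money_to_number("€103.5M"))  # -> 103500000.0
print(money_to_number("€560K"))    # -> 560000.0
```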
The rating values included star symbols. The stars were removed and the string converted to a number.
=LEFT(BO2; 1)
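In Python this is a one-liner; the "4 ★" sample is hypothetical, and, like =LEFT(BO2; 1), only the leading digit is kept.

```python
def stars_to_number(raw: str) -> int:
    """Keep only the leading digit of a star rating like "4 ★"."""
    return int(raw.strip()[0])

print(stars_to_number("4 ★"))  # -> 4
```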
Conclusion
The clean dataset is now ready for further analysis, such as exploring player statistics, team performance, or other insights that can provide a deeper understanding of the FIFA 21 game.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset was obtained by web scraping a Wikipedia page; the code is linked below: https://www.kaggle.com/amruthayenikonda/simple-web-scraping-using-pandas
This dataset can be used to practice data cleaning and manipulation, for example dropping unwanted columns, handling null values, removing symbols, etc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyze 950 Clean Excel import shipments to India from Italy till Mar-26. Import data includes buyers, suppliers, pricing, quantity, and contacts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover new & profitable Clean Excel buyers & suppliers. Access 2,289 export-import shipment records till Dec-25, with 52 importers & 33 exporters.
In the beginning, the case was just company data that offered decision-makers no useful information. After collecting revenues and expenses over the months, we needed answers to a number of questions in order to make important decisions based on data rather than intuition.
The Questions:-
About Rev. & Exp.
- What are the total sales and profit for the whole period? How many products were sold in total? What is the net profit?
- In which month was the highest revenue achieved? And within that month, which day had the largest revenue?
- In which month were the highest expenses incurred? And within that month, which day had the largest expenses?
- What is the extent of the change in expenditures from month to month?
- What is the percentage change in net profit over the months?
About Distribution
- What is the number of products sold each month in the largest state?
- What are the top 3 states buying products during the two years?
Comparison
- Between Sales Method by Sales?
- Between Men and Women’s Product by Sales?
- Between Retailer by Profit?
What I did:
- Understood the data
- Preprocessed and cleaned the data
- Solved the problems found during cleaning, such as missing data or wrong data types
- Queried the data and made calculations such as COGS with Power Query (Excel)
- Modeled the data and created measures with Power Pivot (Excel)
- After processing and preparation, built pivot tables to answer the questions
- Finally, built a dashboard with Power BI to visualize the results
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).
csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module: one uncleaned version and one cleaned version.
The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file containing the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.
R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts with the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.
Example starter code: The set of starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
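The merge demonstrated in example_0 can be sketched outside R as well; the toy frames below are hypothetical stand-ins for two cleaned modules, keyed on the standard NHANES participant identifier SEQN.

```python
import pandas as pd

# Toy stand-ins for two cleaned NHANES modules (hypothetical values),
# keyed by the participant ID "SEQN".
demographics = pd.DataFrame({"SEQN": [1, 2, 3], "age": [34, 51, 29]})
chemicals = pd.DataFrame({"SEQN": [1, 3], "lead_ug_dl": [1.2, 0.8]})

# Left merge keeps every participant, with NaN where a module lacks them.
merged = demographics.merge(chemicals, on="SEQN", how="left")
print(merged)
```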
https://spdx.org/licenses/CC0-1.0.html
Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum,'' the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.
Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.
Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.
The surveys administered during the fall 2018 and spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files.
The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.
The data files whose name includes survey contain raw data from pre-workshop surveys and the data files whose name includes assessment contain raw data from the post-workshop assessment survey.
The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively.
The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean.
The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created to evaluate students’ performance in the most recent school examination. The goal is to help the school administration understand overall academic achievement, examine score distribution across grades, and identify student groups that may need additional academic support to improve learning outcomes.
The dataset provides detailed student result records, including subjects, scores, grades, and performance categories. It serves as a practical resource for educators, analysts, and data learners who wish to explore educational data using Excel or data analytics tools.
Tool Used: Microsoft Excel Spreadsheet
Data Frame Process: This analysis followed the Google Data Analytics data-phase approach, which involves:
Ask: Define the key questions and objectives
Prepare: Organize and clean the student result data
Process: Perform calculations and structure the data in Excel
Analyze: Evaluate performance trends and identify weak areas
Share: Present findings using tables, charts, and summaries
Act: Provide actionable recommendations to improve student outcomes
http://opendatacommons.org/licenses/dbcl/1.0/
The Orders database contains information on the following variables.
• Continuous variables: Row ID, Order ID, Order Date, Ship Date, Customer ID, Product ID, Sales, Quantity, Discount, Profit, Shipping Cost
• Categorical variables: Ship Mode, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Category, Subcategory, Product Name, Order Priority
The purpose of this project:
1. Use descriptive statistics to assess sales performance across the various segments, markets, product categories, and subcategories;
2. Use diagnostic analytics to understand the statistical significance of the factors that influence sales;
3. Use predictive analytics (regression) to understand the strength of the relationship between sales and its drivers, and generate a regression formula to predict sales;
4. Develop a sales forecasting model based on the insights.
Descriptive analytics
Descriptive statistics for sales
Frequency distribution for sales
Around 44,500 transactions have a value >= USD 500.
Sales values across markets
We see an increase in sales across all markets and throughout 2012-2015.
We have high sales volumes in the USCA and LATAM markets:
• USCA: USD 757,108 in 2015;
• LATAM: USD 706,632 in 2015.
Sales across product categories
Office supplies were the most sold product category in 2012-2015. Technology was the least sold product category by quantity; however, the Technology category yields high sales value.
Further analysis of profitable products reveals that phones and copiers demonstrate high sales.
Sales across segments
The data reveals that there are high sales in the Consumer segment across all product categories.
Diagnostic analytics
Two sample T-test
Using a t-test, we can evaluate how sales differ across segments, regions, and product types; the t-test lets us assess the statistical significance of differences between sales samples.
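A two-sample (Welch's) t statistic can be sketched in pure Python; the sales figures below are hypothetical, not drawn from the Orders data.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    va, vb = variance(a), variance(b)  # sample variances
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

# Hypothetical per-order sales samples from two markets.
usca = [120.0, 150.0, 130.0, 170.0, 160.0]
latam = [100.0, 110.0, 95.0, 120.0, 105.0]
print(round(welch_t(usca, latam), 2))  # -> 3.91
```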
The two-sample t-test of sales numbers across markets showed statistically significant sales differences for the USCA and LATAM markets (p-values < 0.05).
The two-sample t-test of sales numbers across product categories showed statistically significant sales differences for the Office Supplies and Technology categories (p-values < 0.05).
Pearson correlation: correlating the continuous variables in the dataset lets us see the relationships between sales, quantity sold, shipping cost, and profit.
This is a dataset of finances, also available in Power BI for practice. Use this dataset to practice Power BI.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of:
1. soildatato sharing: Excel table showing visual detection (1) or no detection (0) by 15 consumers of three types of food soils on cutting boards or countertops.
2. visualdetection: Excel table showing data for 13 consumers doing visual detection (scale: clean = 1 to dirty = 4) of kitchen surfaces, and swabs used at kitchen surfaces.
3. survivalpathogensdrysoil: Excel table showing the fate of Salmonella, Campylobacter, and total counts when dried in 3 types of food soils and water.
Data used to evaluate potential downstream impacts of the NorthMet Mine, provided by the USEPA Office of Research and Development for USEPA Region 5's use, including a characterization of stream specific conductivity (SC) levels, least-disturbed background SC, and SC levels that may exceed the Fond du Lac Band's water quality standards and adversely affect aquatic life, including brook trout (Salvelinus fontinalis), lake sturgeon (Acipenser fulvescens), and benthic macroinvertebrates.
Keywords: conductivity, St. Louis River, benthic invertebrates, mining
The attached Excel pedigree includes:
- _Datasets: data files uploaded to EPA Science Hub and/or the Environmental Data Set Gateway
- _R: clean R scripts used to generate document figures and tables
- _Tables_Figures: files generated from the R scripts and used in the Region 5 memo
- 20220325 R Code and Data: all additional files used for this project, including original files, intermediate files, extra output files, and extra functions
The "_R" folder contains four subfolders. Each subfolder has several R scripts, input and output files, and an R project file. Users can run the R scripts directly from each subfolder by installing R, RStudio, and the associated R packages.
Data dictionary: see the DataDictionary tab in the Excel file.
Datasets: Simplified language is used in the text to identify parent datasets. Source and file names are retained in this pedigree in their original form so that the R scripts retain functionality.
• Thingvold et al. (1975-1977)
• Griffith (1998-2009)
• Predicted background (2000-2015)
• Water Quality Portal (1996-2021)
• Water Quality Portal Less Disturbed (1996-2021)
• Minnesota Pollution Control Agency (MPCA) (1996-2013)
• Mid-Atlantic Highlands (1990-2014)
This dataset is associated with the following publication: Cormier, S., and Y. Wang. Appendix C: ORD Specific Conductance Memo, from Susan Cormier to Tera Fong. March 15, 2022. Assessment of effects of increased ion concentrations in the St. Louis River Watershed with special attention to potential mining influence and the jurisdiction of the Fond du Lac Band of Lake Superior Chippewa. U.S. Environmental Protection Agency, Washington, DC, USA, 2022.
A Knowledge, Attitudes, and Practices (KAP) survey was conducted in Ajuong Thok and Pamir Refugee Camps in November 2018 to determine the current Water, Sanitation, and Hygiene (WASH) conditions as well as hygiene attitudes and practices within the households (HHs) surveyed. The assessment utilized a systematic random sampling method, and a total of 1,040 HHs (520 HHs in each location) were surveyed using mobile data collection (MDC) within a period of 10 days. Data was cleaned and analyzed in Excel. The summary of the results is presented in this report.
The findings showed that the overall average number of liters of water per person per day was 21, in both Ajuong Thok and Pamir Camps, which was slightly higher than the recommended Office of the United Nations High Commissioner for Refugees (UNHCR) minimum standard of at least 20 liters of water available per person per day. This is a slight improvement from the 19.5 liters reported the previous year. The average HH size was six people. Women comprised 83.2% of the surveyed respondents and males 16.8%. Almost all the respondents were refugees, constituting 99.6%. The refugees were aware of the key health and hygiene practices, possibly as a result of routine health and hygiene messages delivered to them by Samaritan´s Purse (SP), Africa Humanitarian Action (AHA) and International Rescue Committee (IRC). Most refugees had knowledge about keeping water containers clean, washing hands during critical times, safe excreta disposal and disease prevention.
Ajuong Thok and Pamir Refugee Camps
Households
All households in Ajuong Thok and Pamir Refugee Camps
Sample survey data [ssd]
Households were selected using systematic random sampling. Enumerators systematically walked through each row in each block of the camps, in such a way as to give each HH a chance to be selected. For each block, the enumerators began at one corner and went row by row, systematically using the sampling interval (SI) to select HHs. The first HH sampled in each block was determined by selecting a random number between 1 and the SI, (6 in Ajuong Thok and 7 in Pamir). After selecting the first HH, the SI was used to identify the next respondent HH. The female head of the household was the preferred respondent. If she was not available, another adult (over 15 years of age) with knowledge of the HH´s WASH practices was surveyed. If no one qualified to answer the survey, the HH was replaced systematically using the SI.
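The selection procedure above can be sketched as follows; the household IDs and block size are hypothetical, and the sampling interval of 6 matches the Ajuong Thok value.

```python
import random

def systematic_sample(households, interval, seed=None):
    """Pick every `interval`-th household after a random start in [1, interval]."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)       # first sampled HH in the block
    return households[start - 1::interval]

# Hypothetical block of 30 households, sampling interval (SI) = 6.
block = [f"HH{i:03d}" for i in range(1, 31)]
print(systematic_sample(block, 6))
```

Every household has an equal chance of selection because the random start is uniform over the interval.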
Face-to-face [f2f]
The survey questionnaire used to collect the data consists of the following sections: - Demographics - Water - Sanitation - Hygiene - NFI Distribution
The data collected was uploaded to a server at the end of each day. IFormBuilder generated a Microsoft (MS) Excel spreadsheet dataset which was then cleaned and analyzed using MS Excel.
Given that SP is currently implementing a WASH program in Ajuong Thok and Pamir, the assessment data collected in these camps will not only serve as the endline for UNHCR 2018 programming but also as the baseline for 2019 programming.
Data was anonymized through decoding and local suppression.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The CEPS EurLex dataset
The dataset contains 142,036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives, and 35,798 decisions, in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.
Brief description:
- The dataset is organised in tabular format, with each law representing one row and the columns representing 23 variables.
- The full text of 134,633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from Eur-lex.eu via the HTML pages, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
- 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link', and 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
- Given its size, the dataset was uploaded in different batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend, however, the use of the CSV files, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.
Caveats:
- The Eur-lex.eu website does not consistently provide data for all the variables. In addition, the HTML documents were not always cleanly formatted, and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
- Not all (older) laws were available in English, especially since Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.
Other:
- For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
- An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
- The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735
Habitat loss and fragmentation are among the biggest threats facing wildlife today. Understanding the role of wildlife pathways in connecting resource areas is key to maintaining landscape connectivity, reducing the impacts of habitat loss, and helping address human-wildlife conflict. In this study, we used sign surveys and camera trapping to understand the fine-scale movement of elephants moving between a protected area and an agricultural zone in the Masai Mara, Kenya. We used generalised linear models to determine the factors driving high frequency of pathway use by elephants. Our results showed strong seasonal trends in pathway use, with peaks coinciding with the dry season. However, no correlations between rainfall and pathway use were found. Temporal patterns of pathway use indicate that elephants use risk-avoidance strategies by moving between the two areas at times of low human disturbance. Spatial analysis revealed that the most frequently used pathways were closer to farms, saltlicks and for...
We identified active pathways along the escarpment with the assistance of local rangers and farmers (Figure 2). We assumed pathways were in use if the path was devoid of vegetation (Blake and Inkamba-Nkulu, 2004), marked with elephant dung or footprints, and showed signs of elephant browsing on the bordering vegetation (Von Gerhardt et al., 2014). Pathways that did not show any of these signs were not included in this study. We then mapped each pathway using a Garmin Etrek30 Global Positioning System (GPS). The GPS track was taken from the bottom of the escarpment on the border of the Masai Mara to the top of the escarpment. The end of the pathway was determined by the point at which the pathway widened and became open habitat. Habitat type was also recorded on each pathway using a classification system from Kindt et al. (2011). As each pathway went through a number of different habitats, we used a GPS to record the coordinate at which there was a change in habitat type. To determine s...
Elephant pathway use in a human-dominated landscape
https://doi.org/10.5061/dryad.ns1rn8q20
Data includes the final clean Excel sheets containing all the variable data that was imported into R for analysis. This data was used for Spearman’s Rank Correlation tests, a linear model and descriptive statistics.
The files 'SURVEY A_results' and 'SURVEY B_results' are Excel spreadsheets with a summary of the camera trap images from the pathways. Each row is one camera trap image with the processed data of the date, time, photo label, elephant group type, number of elephants and whether the elephants were traveling up or down the pathway.
The file 'Data_Analysis_1' is an Excel spreadsheet that has all the data used in the paper's models. This dataset has the different pathway-use variables that were tested, for example distance to farmland, slope, etc.
The file 'conflict' is an Excel spreadsheet wit...