Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Homininos_DataSet(1).csv is the original file. Homininos_DataSet.csv already has the categorical values encoded.
Exploring Human Evolution Through a Comprehensive Dataset
Introduction:
In this dataset, we delve into the fascinating story of human evolution. With 720 rows and 28 columns, this dataset covers a wide range of characteristics of different hominids, from the earliest generally accepted ancestors to modern Homo sapiens. This comprehensive compilation aims to facilitate the search for relationships between various key variables, thereby providing a more complete and detailed understanding of human evolution.
Objectives:
The main objective of this dataset is to facilitate the exploration and understanding of human evolution from a broader and more detailed perspective. Some specific objectives include:
- Seeking relationships between important columns of the dataset.
- Understanding human evolution considering the collected data.
- Investigating the possible linearity of evolution over time.
- Analyzing potential relationships between brain size, developed technologies, diet, and physiological modifications over time.
Significance:
This dataset is crucial for advancing our understanding of human evolution and history. It provides a solid foundation for research in various fields, from anthropology and evolutionary biology to archaeology and genetics. By allowing us to examine relationships and patterns among different variables, this dataset helps us trace the course of human evolution and gain a better understanding of our place in the tree of life.
Conclusions:
In summary, this comprehensive dataset provides us with a valuable tool for exploring human evolution in depth. With its numerous rows and columns, it allows us to delve into the complexity and diversity of our evolutionary history. By analyzing and understanding the collected data, we can gain new insights into how we have come to be what we are today and how our species has evolved over time.
This dataset not only expands our knowledge of human evolution but also inspires us to continue researching and discovering more about our shared past as a species.
I studied Biological Anthropology for 4 years at the National University of La Plata, and I had the opportunity to compile these data from classes and books such as Carbonell's "Homínidos: las primeras ocupaciones de los continentes," published in 2005.
INFO About Columns:
Genus & Species: (categorical) This column contains the genus and specific name of the species. It provides taxonomic information about each hominid included in the dataset, allowing for precise identification.
Time : (categorical) This column indicates the time period during which each hominid species lived. It helps to establish chronological context and understand the temporal distribution of different hominid groups.
Location: (categorical) This column records the continent location where each hominid species lived.
Zone: (categorical) Describes either east, west, south or north of the continent
Current Country: (categorical) Records the modern-day country associated with the location where each hominid species lived, facilitating geographical comparisons.
Habitat: (categorical) This column describes the typical habitat or environment inhabited by each hominid species. It provides information about the ecological niche and adaptation strategies of different hominids throughout history.
Cranial Capacity: (numeric) This column provides data on the cranial capacity of each hominid species. Cranial capacity is a key indicator of brain size and can offer insights into cognitive abilities and evolutionary trends.
Height: (numeric) Describes the average height or stature of each hominid species
Incisor Size: (categorical) Indicates the size of the incisors in each hominid species
Jaw Shape: (categorical) Describes the shape or morphology of the jaw in each hominid species
Torus Supraorbital: (categorical) Specifies the shape or morphology of a supraorbital torus in each hominid species
Prognathism: (categorical) Indicates the degree of facial prognathism or protrusion in each hominid species
Foramen Mágnum Position: (categorical) Describes the position of the foramen magnum in each hominid species
Canine Size: (categorical) Indicates the size of the canines in each hominid species
Canines Shape: (categorical) Describes the shape of the canines in each hominid species, providing information about their dietary adaptations and social behavior.
Tooth Enamel: (categorical) Specifies the characteristics of tooth enamel in each hominid species, which may indicate aspects of dietary ecology and dental health.
Tecno: (categorical) Records the presence or absence of technological advancements
Tecno Type: (categorical) Describes the specific type or style of technology associated with each hom...
The Human Know-How Dataset describes 211,696 human activities from many different domains. These activities are decomposed into 2,609,236 entities (each with an English textual label). These entities represent over two million actions and half a million pre-requisites. Actions are interconnected both according to their dependencies (temporal/logical orders between actions) and decompositions (decomposition of complex actions into simpler ones). This dataset has been integrated with DBpedia (259,568 links).
For more information see:
- The project website: http://homepages.inf.ed.ac.uk/s1054760/prohow/index.htm
- The data is also available on datahub: https://datahub.io/dataset/human-activities-and-instructions
----------------------------------------------------------------
* Quickstart: if you want to experiment with the most high-quality data before downloading all the datasets, download the file '9of11_knowhow_wikihow', and optionally files 'Process - Inputs', 'Process - Outputs', 'Process - Step Links' and 'wikiHow categories hierarchy'.
* Data representation based on the PROHOW vocabulary: http://w3id.org/prohow#
  Data extracted from existing web resources is linked to the original resources using the Open Annotation specification.
* Data Model: an example of how the data is represented within the datasets is available in the attached Data Model PDF file. The attached example represents a simple set of instructions, but instructions in the dataset can have more complex structures. For example, instructions could have multiple methods, steps could have further sub-steps, and complex requirements could be decomposed into sub-requirements.
----------------------------------------------------------------
Statistics:
* 211,696: number of instructions. From wikiHow: 167,232 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 44,464 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
* 2,609,236: number of RDF nodes within the instructions. From wikiHow: 1,871,468 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 737,768 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
* 255,101: number of process inputs linked to 8,453 distinct DBpedia concepts (dataset Process - Inputs)
* 4,467: number of process outputs linked to 3,439 distinct DBpedia concepts (dataset Process - Outputs)
* 376,795: number of step links between 114,166 different sets of instructions (dataset Process - Step Links)
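As a quick consistency check on the statistics above, the per-source counts can be summed and compared with the stated totals. This is only a sketch: the numbers are copied from the description, and the variable names are illustrative rather than part of the dataset.

```python
# Counts copied from the statistics in the description above.
instructions = {"wikiHow": 167_232, "Snapguide": 44_464}
rdf_nodes = {"wikiHow": 1_871_468, "Snapguide": 737_768}

total_instructions = sum(instructions.values())
total_rdf_nodes = sum(rdf_nodes.values())

# Share of all instructions contributed by each source, as a percentage.
shares = {src: 100 * n / total_instructions for src, n in instructions.items()}

print(total_instructions)  # 211696
print(total_rdf_nodes)     # 2609236
for src, pct in shares.items():
    print(f"{src}: {pct:.1f}% of instructions")
```

Both sums match the totals quoted in the description (211,696 instructions and 2,609,236 RDF nodes).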
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered to the book The first people : from the earliest primates to homo sapiens : where and how our ancestors lived. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
https://www.nist.gov/open/license
The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations. Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
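To illustrate the idea of scoring a model against consensus labels and their variance, here is a minimal sketch. It assumes a Gaussian log-likelihood centred on the mean of the expert labels; that functional form, the function name `consensus_log_likelihood`, and the example numbers are all assumptions made for illustration, not the method or data of the manuscript.

```python
import math
from statistics import mean, variance


def consensus_log_likelihood(prediction, expert_labels):
    """Gaussian log-likelihood of a prediction under the expert consensus.

    Assumption: the consensus is summarised by the sample mean and sample
    variance of the expert labels (at least two labels are required).
    """
    mu = mean(expert_labels)
    var = variance(expert_labels)
    if var == 0:
        raise ValueError("experts agree exactly; variance is zero")
    return -0.5 * math.log(2 * math.pi * var) - (prediction - mu) ** 2 / (2 * var)


# Hypothetical labels from five experts for the start of a phase transformation.
expert_labels = [410.0, 425.0, 440.0, 415.0, 430.0]

score_a = consensus_log_likelihood(424.0, expert_labels)  # prediction at the consensus mean
score_b = consensus_log_likelihood(470.0, expert_labels)  # prediction far from the consensus

print(score_a > score_b)  # True
```

Under this assumed form, a model is penalised less for deviating from the consensus when the experts themselves disagree more, which is the intuition behind evaluating against labels with quantified uncertainty.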
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES
PLACE OF BIRTH - DP02
Universe - Total population
Survey-Program - American Community Survey 5-year estimates
Years - 2020, 2021, 2022
People not reporting a place of birth were assigned the state or country of birth of another family member, or were allocated the response of another individual with similar characteristics. People born outside the United States were asked to report their place of birth according to current international boundaries. Since numerous changes in boundaries of foreign countries have occurred in the last century, some people may have reported their place of birth in terms of boundaries that existed at the time of their birth or emigration, or in accordance with their own national preference.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
In this dataset, we delve into the fascinating story of human evolution. With 12000 rows and 28 columns, this dataset covers a wide range of characteristics of different hominids, from the earliest generally accepted ancestors to modern Homo sapiens. This comprehensive compilation aims to facilitate the search for relationships between various key variables, thereby providing a more complete and detailed understanding of human evolution.
Objectives: The objective is to predict either the genus and species or whether the hominid was bipedal. A further objective is to avoid overfitting the model, because several models show signs of overfitting.
About the Data:
Genus & Species: (categorical) This column contains the genus and specific name of the species. It provides taxonomic information about each hominid included in the dataset, allowing for precise identification.
Time : (categorical) This column indicates the time period during which each hominid species lived. It helps to establish chronological context and understand the temporal distribution of different hominid groups.
Location: (categorical) This column records the continent location where each hominid species lived.
Zone: (categorical) Describes either east, west, south or north of the continent
Current Country: (categorical) Records the modern-day country associated with the location where each hominid species lived, facilitating geographical comparisons.
Habitat: (categorical) This column describes the typical habitat or environment inhabited by each hominid species. It provides information about the ecological niche and adaptation strategies of different hominids throughout history.
Cranial Capacity: (numeric) This column provides data on the cranial capacity of each hominid species. Cranial capacity is a key indicator of brain size and can offer insights into cognitive abilities and evolutionary trends.
Height: (numeric) Describes the average height or stature of each hominid species
Incisor Size: (categorical) Indicates the size of the incisors in each hominid species
Jaw Shape: (categorical) Describes the shape or morphology of the jaw in each hominid species
Torus Supraorbital: (categorical) Specifies the shape or morphology of a supraorbital torus in each hominid species
Prognathism: (categorical) Indicates the degree of facial prognathism or protrusion in each hominid species
Foramen Mágnum Position: (categorical) Describes the position of the foramen magnum in each hominid species
Canine Size: (categorical) Indicates the size of the canines in each hominid species
Canines Shape: (categorical) Describes the shape of the canines in each hominid species, providing information about their dietary adaptations and social behavior.
Tooth Enamel: (categorical) Specifies the characteristics of tooth enamel in each hominid species, which may indicate aspects of dietary ecology and dental health.
Tecno: (categorical) Records the presence or absence of technological advancements
Tecno Type: (categorical) Describes the specific type or style of technology associated with each hominid species
Biped: (categorical) Indicates whether each hominid species exhibited bipedal locomotion, a key characteristic distinguishing humans from other primates.
Arms: (categorical) Describes the morphology or characteristics of the arms in each hominid species, offering insights into their locomotor adaptations and manual dexterity.
Foots: (categorical) Specifies the morphology or characteristics of the feet in each hominid species, providing information about their locomotor adaptations and foot anatomy.
Diet: (categorical) Characterizes the dietary habits or preferences of each hominid species
Sexual Dimorphism: (categorical) Indicates the degree of sexual dimorphism
Hip: (categorical) Describes the size of the hip in each hominid species
Vertical Front: (categorical) Specifies the presence or absence of verticality or curvature of the frontal bone in each hominid species, providing information about their cranial morphology.
Anatomy: (categorical) Provides additional information about the anatomical features or characteristics of each hominid species, aiding in comprehensive morphological analyses.
Migrated: (categorical) Indicates whether each hominid species exhibited migration or movement to different geographical areas, offering insights into their dispersal patterns and population dynamics.
Skeleton: (categorical) Describes additional information about anatomy
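Most of the columns listed above are categorical, and the stated objective is to predict a target such as Biped without overfitting. The following is a minimal sketch of two common preparation steps: integer-encoding a categorical column and holding out a test split so that overfitting can be detected. The rows, category values, and helper names (`encode_column`, `holdout_split`) are hypothetical and are not taken from the CSV files.

```python
import random


def encode_column(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows; the real category values may differ.
rows = [
    {"Diet": "omnivore", "Biped": "yes"},
    {"Diet": "herbivore", "Biped": "no"},
    {"Diet": "omnivore", "Biped": "yes"},
    {"Diet": "carnivore", "Biped": "yes"},
    {"Diet": "herbivore", "Biped": "no"},
]

diet_codes, diet_map = encode_column([row["Diet"] for row in rows])
train, test = holdout_split(rows, test_fraction=0.2, seed=42)

print(diet_codes)             # [0, 1, 0, 2, 1]
print(len(train), len(test))  # 4 1
```

A model that scores much better on `train` than on `test` is showing the overfitting the objectives warn about.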
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The event-Human 3.6m is a synthetic conversion of the Human 3.6m dataset (H36m). H36m is an existing benchmark video dataset for Human Pose Estimation. This has been cropped optimally and converted to event streams. The final resolution of the samples is 640x480. The Ground Truth is adjusted accordingly. Code for this conversion is available at https://github.com/event-driven-robotics/hpe-core
The dataset is split into multiple zip parts. To use it, download all the parts. On Linux systems, concatenate them with the command
cat h36m.z* > eh36m.zip
then unzip normally.
S9 and S11 are the test subjects; the rest are used for training. A .py file is included to demonstrate reading a sample and its ground truth (GT) from the dataset. If you use this dataset in your project, please cite the following paper:
@inproceedings{goyal2023moveenet,
  title={MoveEnet: Online High-Frequency Human Pose Estimation with an Event Camera},
  author={Goyal, Gaurvi and Di Pietro, Franco and Carissimi, Nicolo and Glover, Arren and Bartolozzi, Chiara},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4023--4032},
  year={2023}
}
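The train/test convention described above (S9 and S11 held out for testing) can be applied with a small helper like the one below. This is a sketch only: the subject list is an assumption based on the subject IDs commonly used with the original H36m, and the helper name `split_of` is hypothetical. The .py reader shipped with the dataset remains the reference for the actual file layout.

```python
TEST_SUBJECTS = {"S9", "S11"}  # stated in the dataset description


def split_of(subject):
    """Return 'test' for held-out subjects and 'train' for all others."""
    return "test" if subject in TEST_SUBJECTS else "train"


# Assumed subject IDs; check them against the downloaded folders.
subjects = ["S1", "S5", "S6", "S7", "S8", "S9", "S11"]

by_split = {"train": [], "test": []}
for subject in subjects:
    by_split[split_of(subject)].append(subject)

print(by_split["train"])  # ['S1', 'S5', 'S6', 'S7', 'S8']
print(by_split["test"])   # ['S9', 'S11']
```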
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book The meaning of human existence. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an update of a prior dataset publication containing baseline and 5-year follow-up data from the PERU MIGRANT Study (PEru's Rural to Urban MIGRANTs Study). The PERU MIGRANT Study was designed to investigate the magnitude of differences between rural-to-urban migrant and non-migrant groups in specific cardiovascular risk factors. Three groups were selected: i) Rural, people who have always lived in a rural environment; ii) Rural-urban, people who migrated from rural to urban areas; and, iii) Urban, people who have always lived in an urban environment.
PERU MIGRANT Study protocol, instruments and variables are described in full in: Miranda JJ, Gilman RH, García HH, Smeeth L. The effect on cardiovascular risk factors of migration from rural to urban areas in Peru: PERU MIGRANT Study. BMC Cardiovasc Disord 2009;9:23.
PERU MIGRANT Study baseline dataset is available at: https://figshare.com/articles/PERU_MIGRANT_Study_Baseline_dataset/3125005
Main findings of the baseline study: Miranda JJ, Gilman RH, Smeeth L. Differences in cardiovascular risk factors in rural, urban and rural-to-urban migrants in Peru. Heart 2011;97(10):787-96.
Main findings of the 5-yr follow-up study:
Carrillo-Larco RM, Bernabé-Ortiz A, Pillay TD, Gilman RH, Sanchez JF, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Obesity risk in rural, urban and rural-to-urban migrants: prospective results of the PERU MIGRANT study. Int J Obes (Lond) 2016;40(1):181-5.
Bernabe-Ortiz A, Sanchez JF, Carrillo-Larco RM, Gilman RH, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Rural-to-urban migration and risk of hypertension: longitudinal results of the PERU MIGRANT study. J Hum Hypertens 2017;31(1):22-28.
Lazo-Porras M, Bernabe-Ortiz A, Málaga G, Gilman RH, Acuña-Villaorduña A, Cardenas-Montero D, Smeeth L, Miranda JJ. Low HDL cholesterol as a cardiovascular risk factor in rural, urban, and rural-urban migrants: PERU MIGRANT cohort study. Atherosclerosis 2016;246:36-43.
Burroughs Pena MS, Bernabé-Ortiz A, Carrillo-Larco RM, Sánchez JF, Quispe R, Pillay TD, Málaga G, Gilman RH, Smeeth L, Miranda JJ. Migration, urbanisation and mortality: 5-year longitudinal analysis of the PERU MIGRANT study. J Epidemiol Community Health 2015;69(7):715-8.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Pre-existing conditions of people who died due to COVID-19, broken down by country, broad age group, and place of death occurrence, usual residents of England and Wales.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
There is growing recognition that human-provided food resources are becoming increasingly available to animals across the globe (Oro et al., 2013). The food resources that are wasted by humans have influenced predators’ ecology and behavior and can indirectly affect their co-occurring species, leading to mostly negative ecological effects (Newsome et al., 2015). However, large increases have been found in the abundances of terrestrial mammalian predators such as coyotes (Canis latrans), cats (Felis catus) and red foxes (Vulpes vulpes), which are associated with their access to waste foods provided by humans (Denny et al., 2002; Fedriani et al., 2001; Shapira et al., 2008). Therefore, under anthropogenic global change, where human activities are continually expanding, spatially explicit data on waste foods are essential for assessing the ecological effects of anthropogenic food subsidies on species occurrences and abundances.
The repository contains a global dataset consisting of four different variables to depict anthropogenic food waste index: household food waste (tons/year), food service food waste (tons/year), retail food waste (tons/year), and total human-provided food waste (tons/year). To produce the dataset, I first allocated the food waste estimates (kg/capita/year) to 30 arc-second grid cells for each county. The food waste estimates for 2021 were generated by normalizing different food waste measurements to a single metric (i.e., kg/capita/year), accounting for known biases or different scopes of measurement, and aggregating a series of studies or observations if multiple observations existed in a geographic entity of interest (United Nations Environment Programme 2021). The food waste estimates were then multiplied by the estimated population count for 2021 produced by Sims et al. 2022. The data files were produced as global rasters at 30 arc-second (~1km at the equator) resolution in geotiff format under WGS 84 geographical coordinate system.
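The per-cell calculation described above (a per-capita estimate multiplied by a population count) can be sketched as follows. The per-capita rates and the population value are illustrative placeholders, not values read from the rasters. The sketch also assumes metric tons (1,000 kg) and that the total is the sum of the three sectors, neither of which is stated explicitly in the description.

```python
KG_PER_TON = 1000.0  # assumption: "tons" means metric tons


def cell_food_waste_tons(kg_per_capita_per_year, population):
    """Food waste for one grid cell, in tons per year."""
    return kg_per_capita_per_year * population / KG_PER_TON


# Illustrative per-capita estimates (kg/capita/year) for the country a cell falls in.
per_capita = {"household": 74.0, "food_service": 32.0, "retail": 15.0}
population_in_cell = 2500  # illustrative population count for one 30 arc-second cell

waste = {
    sector: cell_food_waste_tons(rate, population_in_cell)
    for sector, rate in per_capita.items()
}
# Assumption: total human-provided food waste is the sum of the three sectors.
waste["total"] = sum(waste.values())

print(waste)
# {'household': 185.0, 'food_service': 80.0, 'retail': 37.5, 'total': 302.5}
```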
Keywords: Anthropogenic food subsidies, human-provided food wastes, household food waste, food service food waste, retail food waste, food availability, anthropogenic global changes, human activities
Reference:
United Nations Environment Programme (2021). Food Waste Index Report 2021. Nairobi.
Denny, E., Yaklovlevich, P., Eldridge, M.D.B. & Dickman, C.R. (2002) Social and genetic analysis of a population of free-living cats (Felis catus L.) exploiting a resource-rich habitat. Wildlife Research, 29, 405–413.
Fedriani, J.M., Fuller, T.K. & Sauvajot, R.M. (2001) Does availability of anthropogenic food enhance densities of omnivorous mammals? An example with coyotes in southern California. Ecography, 24, 325–331.
Newsome, T. M., Dellinger, J. A., Pavey, C. R., Ripple, W. J., Shores, C. R., Wirsing, A. J., & Dickman, C. R. (2015). The ecological effects of providing resource subsidies to predators. Global Ecology and Biogeography, 24, 1-11.
Oro, D., Genovart, M., Tavecchia, G., Fowler, M. S., & Martínez‐Abraín, A. (2013). Ecological and evolutionary implications of food subsidies from humans. Ecology letters, 16(12), 1501-1514.
Shapira, I., Sultan, H. & Shanas, U. (2008) Agricultural farming alters predator–prey interactions in nearby natural habitats. Animal Conservation, 11, 1–8.
Sims, K., Reith, A., Bright, E., McKee, J., & Rose, A. (2022). LandScan Global 2021 [Data set]. Oak Ridge National Laboratory. https://doi.org/10.48690/1527702.
By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address this contrast between those with estimated availability but no actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is vastly important -- it determines who gets included in public funding opportunities dedicated solely toward closing this digital divide gap. The implications can be huge: millions around this country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.
This dataset includes aggregated information about locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US. Please review the license terms before using these data, so that you adhere to the stipulations of Microsoft's Open Use Of Data Agreement v1.0, for professional and educational use alike.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps calculated using [7].
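To make the three throughput columns concrete, here is a minimal sketch of how such summaries could be derived from per-device measurements. The 25 Mbps and 3 Mbps thresholds come from the column descriptions. The function name `summarize_throughput` and the sample values are hypothetical, and the actual calculation used to produce the dataset is not given in the description.

```python
def summarize_throughput(mbps_per_device):
    """Average throughput plus the share of fast (> 25 Mbps) and slow (< 3 Mbps) devices."""
    n = len(mbps_per_device)
    if n == 0:
        raise ValueError("no devices to summarize")
    fast = sum(1 for x in mbps_per_device if x > 25)
    slow = sum(1 for x in mbps_per_device if x < 3)
    return {
        "average_mbps": sum(mbps_per_device) / n,
        "percent_fast": 100 * fast / n,
        "percent_slow": 100 * slow / n,
    }


# Hypothetical measurements for the devices in one zip code.
samples = [1.5, 2.0, 10.0, 30.0, 56.5]
summary = summarize_throughput(samples)

print(summary)
# {'average_mbps': 20.0, 'percent_fast': 40.0, 'percent_slow': 40.0}
```

Note that the two percentages need not sum to 100, since devices between 3 and 25 Mbps fall into neither group.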
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that improve connectivity options for rural communities.
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: broadband_data_2020October.csv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book D.H. Lawrence and human existence. It features 7 columns, including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chimpanzees are our closest living relatives and have been extensively used in research into the evolution of humans. Although chimpanzees and humans share many of the same cognitive abilities, how they compare in solving spatial tasks is unclear to date. Therefore this study used a human physical simulation method that resembles foraging patterns of chimpanzees to enable comparing these spatiotemporal cognitive abilities. Furthermore, this study aimed to interpret animal movement and spatiotemporal cognitive abilities by relating revisit intervals to cognitive processes such as learning and memory. For this, two variables, constancy and contingency, have been used to reflect search efficiency, and their values were used to make inferences about the cognitive abilities of humans and chimpanzees. Ultimately, this study investigated how the average patterns in revisit constancy and contingency relate to the spatiotemporal cognitive abilities of chimpanzees, and how this compares to those of humans. These results are highly valuable in addressing the aforementioned existing knowledge gaps, and the novel simulation method additionally provides a great perspective for future research into animal movement. This dataset contains the data obtained from the human foraging experiment that was conducted for the Bachelor's thesis: "Using Recursive Movement Data to Study Animal Cognition: Assessing a New Method to Compare Spatiotemporal Intelligence of Humans and Chimpanzees".
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.
Section 1 - Ask:
A. Guiding Questions:
Who are the key stakeholders and what are their goals for the data analysis project?
What is the business task that this data analysis project is attempting to solve?
B. Key Tasks:
Identify key stakeholders and their goals for the data analysis project
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc.
-Bellabeats marketing analytics team. I am a member of this team.
Identify the business task.
*The business task is:
-As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.
Section 2 - Prepare: A. Guiding Questions: Where is the data stored and organized? Are there any problems with the data? How does the data help answer the business question?
B. Key Tasks: Research and communicate the source of the data, and how it is stored/organized to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk reportedly (see credibility section directly below) between 03/12/2016 thru 05/12/2016. *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents into my local laptop and decided to use 2 documents for the purposes of this project as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDaymerged.csv -dailyActivitymerged.csv Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. 
Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual IDs in the dailyActivity_merged dataset. *Due to the small number of participants (...
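The two credibility checks described above (how many distinct IDs are present, and which dates the records actually span) can be sketched in a few lines of Python. This is only an illustration: the column names `Id` and `ActivityDate`, the `%m/%d/%Y` date format, and the sample rows are assumptions made for the example, not values taken from the case study.

```python
import csv
import io
from datetime import datetime


def summarize_activity(rows):
    """Return (distinct Id count, earliest date, latest date) for daily rows."""
    ids = set()
    dates = []
    for row in rows:
        ids.add(row["Id"])
        # Assumed date format: month/day/year, e.g. 4/12/2016.
        dates.append(datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date())
    return len(ids), min(dates), max(dates)


# Made-up sample standing in for dailyActivity_merged.csv (illustrative only).
sample_csv = io.StringIO(
    "Id,ActivityDate,TotalSteps\n"
    "1001,4/12/2016,13162\n"
    "1001,4/13/2016,10735\n"
    "1002,4/12/2016,8163\n"
    "1003,5/12/2016,5012\n"
)

n_users, first_day, last_day = summarize_activity(csv.DictReader(sample_csv))
print(n_users, first_day, last_day)
```

Run against the real file, the same function would show whether the ID count and date range agree with what the metadata reports.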
We include a description of the data sets in the metadata as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.
Format:
Abstract
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained.
While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
File format: R workspace file.
Metadata (including data dictionary)
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate).
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
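The per-week standardization described above (subtract that week's median exposure, then divide by that week's IQR) can be illustrated with a short Python sketch. The authors' code is in R and is not reproduced here; the function name, the sample exposures, and the quartile convention (`method="inclusive"`) are assumptions made for this example, and other quartile conventions give slightly different IQRs.

```python
from statistics import median, quantiles


def standardize_week(exposures):
    """Subtract the median and divide by the IQR for one week's exposures."""
    # Quartile convention is an assumption; conventions differ between tools.
    q1, _, q3 = quantiles(exposures, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(exposures)
    return [(value - med) / iqr for value in exposures]


# Made-up exposures for one week across five simulated individuals.
week = [8.0, 10.0, 12.0, 14.0, 16.0]
z_week = standardize_week(week)
print(z_week)
```

Applying the function to each week's exposures in turn would give one standardized column per exposure period, which is the layout the description gives for z.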
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.
This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Connecticut State Department of Education (CSDE) to support local district decision-making on the level of in-person, hybrid (blended), and remote learning models for Pre K-12 education.
Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever and cough or shortness of breath or difficulty breathing or the presence of coronavirus diagnosis code and excludes patients with influenza-like illness. All data are preliminary.
These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).
These metrics were adapted from recommendations by the Harvard Global Institute and supplemented by existing DPH measures.
For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
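The stability argument for 14-day over 7-day windows can be seen in a small Python sketch that compares trailing averages on the same noisy daily series. The daily counts below are invented for illustration and are not DPH data, and this sketch averages raw counts rather than reproducing DPH's actual rate calculation.

```python
def trailing_average(values, window):
    """Average of the last `window` values at each index with a full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up daily case counts for a small town over three weeks.
daily_cases = [3, 9, 1, 12, 0, 7, 10, 2, 8, 4, 11, 0, 9, 8, 5, 13, 1, 6, 10, 3, 7]

avg_7 = trailing_average(daily_cases, 7)
avg_14 = trailing_average(daily_cases, 14)

# Range of each smoothed series: a smaller range means a more stable metric.
spread_7 = max(avg_7) - min(avg_7)
spread_14 = max(avg_14) - min(avg_14)
print(round(spread_7, 3), round(spread_14, 3))
```

On this sample the 14-day series varies less than the 7-day series, which is the behavior the note describes for small populations such as individual towns.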
With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Existing studies on talking video generation have predominantly focused on single-person monologues or isolated facial animations, limiting their applicability to realistic multi-human interactions. To bridge this gap, we introduce MIT, a large-scale dataset specifically designed for multi-human talking video generation. To this end, we develop an automatic pipeline that collects and annotates multi-person conversational videos. The resulting dataset comprises 12 hours of high-resolution footage, each featuring two to four speakers, with fine-grained annotations of body poses and speech interactions. It captures natural conversational dynamics in multi-speaker scenarios, offering a rich resource for studying interactive visual behaviors. To demonstrate the potential of MIT, we further propose CovOG, a baseline model for this novel task. It integrates a Multi-Human Pose Encoder (MPE) to handle varying numbers of speakers by aggregating individual pose embeddings, and an Interactive Audio Driver (IAD) to modulate head dynamics based on speaker-specific audio features. Together, these components showcase the feasibility and challenges of generating realistic multi-human talking videos, establishing MIT as a valuable benchmark for future research. The dataset and code will be made publicly available.
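The abstract does not say how the MPE aggregates pose embeddings, so the following Python sketch only illustrates the general idea of turning a variable number of per-speaker vectors into one fixed-size vector. Mean pooling is an assumed stand-in chosen for simplicity, not the CovOG method, and the function name and sample vectors are hypothetical.

```python
def aggregate_pose_embeddings(embeddings):
    """Mean-pool a variable-length list of per-speaker embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one speaker embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must share one dimension")
    count = len(embeddings)
    # Average each coordinate over the speakers (assumed aggregation rule).
    return [sum(vec[k] for vec in embeddings) / count for k in range(dim)]


# Hypothetical 3-dimensional embeddings for scenes with two and four speakers.
two_speakers = [[1.0, 0.0, 2.0], [3.0, 4.0, 2.0]]
four_speakers = two_speakers + [[0.0, 0.0, 0.0], [0.0, 4.0, 4.0]]

pooled_two = aggregate_pose_embeddings(two_speakers)
pooled_four = aggregate_pose_embeddings(four_speakers)
print(pooled_two, pooled_four)
```

Both outputs have the same length regardless of how many speakers went in, which is the property a downstream generator needs when scenes contain two to four people.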
DATA MINING THE GALAXY ZOO MERGERS
STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER
Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications on merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status by discovering which automated pipeline-generated attributes in the astronomical database correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to be the most influential attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research; however, there is a clear indication of which attributes contribute, so that a direction for further study is apparent.
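To make the information gain analysis concrete, here is a minimal Python sketch that scores how much a categorical attribute reduces the entropy of the merger labels. It is not the authors' pipeline: the attribute values and labels are invented, and C4.5 itself normalizes this quantity into a gain ratio and thresholds continuous attributes, neither of which is shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up example: one attribute separates the classes, the other does not.
labels = ["merger", "merger", "non-merger", "non-merger"]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(gain_informative, gain_uninformative)
```

Ranking database attributes by a score of this kind is one way to see which of them track the volunteers' classifications most closely.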
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A thorough analysis of the existing human action recognition datasets demonstrates that only a few human-robot interaction (HRI) datasets are available that target real-world applications, all of which are adapted to home settings. Therefore, given the shortage of datasets for industrial tasks, we aim to provide the community with a dataset created in a laboratory setting that includes actions commonly performed within manufacturing and service industries. In addition, the proposed dataset meets the requirements of deep learning algorithms for the development of intelligent learning models for action recognition and imitation in HRI applications.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Homininos_DataSet(1).csv is the original file. Homininos_DataSet.csv already has the categorical values encoded.
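The description does not say how the categorical values were encoded in the second file. As a rough illustration of what such an encoding can look like, the Python sketch below maps each distinct category to an integer code; the function name, the sorted-order rule, and the sample habitat values are all assumptions, not the author's scheme.

```python
def encode_categories(values):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Made-up values standing in for one categorical column.
habitat = ["forest", "savannah", "forest", "mixed"]
codes, mapping = encode_categories(habitat)
print(codes)
print(mapping)
```

Keeping the returned mapping alongside the encoded column makes it possible to translate codes back to their original labels, which is why comparing the two CSV files is useful when interpreting the encoded one.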
Exploring Human Evolution Through a Comprehensive Dataset
Introduction:
In this dataset, we delve into the fascinating story of human evolution. With 720 rows and 28 columns, this dataset covers a wide range of characteristics of different hominids, from the earliest generally accepted ancestors to modern Homo sapiens. This comprehensive compilation aims to facilitate the search for relationships between various key variables, thereby providing a more complete and detailed understanding of human evolution.
Objectives:
The main objective of this dataset is to facilitate the exploration and understanding of human evolution from a broader and more detailed perspective. Some specific objectives include:
Seeking relationships between important columns of the dataset.
Understanding human evolution considering the collected data.
Investigating the possible linearity of evolution over time.
Analyzing potential relationships between brain size, developed technologies, diet, and physiological modifications over time.
Significance:
This dataset is crucial for advancing our understanding of human evolution and history. It provides a solid foundation for research in various fields, from anthropology and evolutionary biology to archaeology and genetics. By allowing us to examine relationships and patterns among different variables, this dataset helps us trace the course of human evolution and gain a better understanding of our place in the tree of life.
Conclusions:
In summary, this comprehensive dataset provides us with a valuable tool for exploring human evolution in depth. With its numerous rows and columns, it allows us to delve into the complexity and diversity of our evolutionary history. By analyzing and understanding the collected data, we can gain new insights into how we have come to be what we are today and how our species has evolved over time.
This dataset not only expands our knowledge of human evolution but also inspires us to continue researching and discovering more about our shared past as a species.
I studied Biological Anthropology for 4 years at the National University of La Plata, and I had the opportunity to compile these data from classes and books such as Carbonell's "Homínidos: las primeras ocupaciones de los continentes," published in 2005.
INFO About Columns:
Genus & Species: (categorical) This column contains the genus and specific name of the species. It provides taxonomic information about each hominid included in the dataset, allowing for precise identification.
Time: (categorical) This column indicates the time period during which each hominid species lived. It helps to establish chronological context and understand the temporal distribution of different hominid groups.
Location: (categorical) This column records the continent where each hominid species lived.
Zone: (categorical) Indicates the part of the continent (east, west, south, or north) where each hominid species lived.
Current Country: (categorical) Records the modern-day country associated with the location where each hominid species lived, facilitating geographical comparisons.
Habitat: (categorical) This column describes the typical habitat or environment inhabited by each hominid species. It provides information about the ecological niche and adaptation strategies of different hominids throughout history.
Cranial Capacity: (numeric) This column provides data on the cranial capacity of each hominid species. Cranial capacity is a key indicator of brain size and can offer insights into cognitive abilities and evolutionary trends.
Height: (numeric) Describes the average height or stature of each hominid species.
Incisor Size: (categorical) Indicates the size of the incisors in each hominid species.
Jaw Shape: (categorical) Describes the shape or morphology of the jaw in each hominid species.
Torus Supraorbital: (categorical) Specifies the shape or morphology of the supraorbital torus in each hominid species.
Prognathism: (categorical) Indicates the degree of facial prognathism or protrusion in each hominid species.
Foramen Mágnum Position: (categorical) Describes the position of the foramen magnum in each hominid species.
Canine Size: (categorical) Indicates the size of the canines in each hominid species.
Canines Shape: (categorical) Describes the shape of the canines in each hominid species, providing information about their dietary adaptations and social behavior.
Tooth Enamel: (categorical) Specifies the characteristics of tooth enamel in each hominid species, which may indicate aspects of dietary ecology and dental health.
Tecno: (categorical) Records the presence or absence of technological advancements
Tecno Type: (categorical) Describes the specific type or style of technology associated with each hom...