100+ datasets found
  1. Dataset of books series that contain The first people : from the earliest...

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of books series that contain The first people : from the earliest primates to homo sapiens : where and how our ancestors lived [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=The+first+people+:+from+the+earliest+primates+to+homo+sapiens+:+where+and+how+our+ancestors+lived&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered to the book The first people : from the earliest primates to homo sapiens : where and how our ancestors lived. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  2. PLACE OF BIRTH - DP02_MAN_P - Dataset - CKAN

    • portal.tad3.org
    Updated Nov 18, 2024
    + more versions
    Cite
    (2024). PLACE OF BIRTH - DP02_MAN_P - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/place-of-birth-dp02_man_p
    Explore at:
    Dataset updated
    Nov 18, 2024
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES
    PLACE OF BIRTH - DP02
    Universe - Total population
    Survey-Program - American Community Survey 5-year estimates
    Years - 2020, 2021, 2022

    People not reporting a place of birth were assigned the state or country of birth of another family member, or were allocated the response of another individual with similar characteristics. People born outside the United States were asked to report their place of birth according to current international boundaries. Since numerous changes in boundaries of foreign countries have occurred in the last century, some people may have reported their place of birth in terms of boundaries that existed at the time of their birth or emigration, or in accordance with their own national preference.

  3. Pre-existing conditions of people who died due to coronavirus (COVID-19),...

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jul 21, 2023
    + more versions
    Cite
    Office for National Statistics (2023). Pre-existing conditions of people who died due to coronavirus (COVID-19), England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/preexistingconditionsofpeoplewhodiedduetocovid19englandandwales
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Pre-existing conditions of people who died due to COVID-19, broken down by country, broad age group, and place of death occurrence, usual residents of England and Wales.

  4. ‘WHO national life expectancy ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Oct 30, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘WHO national life expectancy ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-who-national-life-expectancy-c4c7/d31e495e/?iid=008-942&v=presentation
    Explore at:
    Dataset updated
    Oct 30, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘WHO national life expectancy ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mmattson/who-national-life-expectancy on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy on a national level. There is an existing Kaggle data set that explored this, but that information was corrupted. Part of the problem solving process is to step back periodically and ask "does this make sense?" Without reasonable data, it is harder to notice mistakes in my analysis code (as opposed to unusual behavior due to the data itself). I wanted to make a similar data set, but with reliable information.

    This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (like air pollution) were left off due to limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, to explore the differences.

    Content

    A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies, and provided better health and well-being. They provide public data collected from many sources to identify and monitor factors that are important to reach this goal. This set was primarily made using GHO (Global Health Observatory) and UNESCO (United Nations Educational Scientific and Culture Organization) information. The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place, for the user to decide how to deal with it.

    Three notebooks are provided for my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.

    Inspiration

    There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators".

    - How are the GHO and UNESCO life expectancies calculated, and what is causing the difference? That could also be asked for Gross National Income (GNI) and mortality features.

    - How does the life expectancy after age 60 compare to the life expectancy at birth? Is the relationship with the features in this data set different for those two targets?

    - What other indicators on the servers might be interesting to use? Some of the GHO indicators are different studies with different coverage. Can they be combined to make a more useful and robust data feature?

    - Unraveling the correlations between the features would take significant work.

    --- Original source retains full ownership of the source dataset ---

  5. Country data on COVID-19

    • kaggle.com
    Updated Aug 6, 2023
    Cite
    Carla Oliveira (2023). Country data on COVID-19 [Dataset]. https://www.kaggle.com/datasets/carlaoliveira/country-data-on-covid19/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2023
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Carla Oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data is in CSV format and includes all historical data on the pandemic up to 03/01/2023, following a 1-line format per country and date.

    In the pre-processing of these data, missing data were checked. It was observed, for example, that new_cases was missing where the total number of cases had not changed, and that most of the missing data related to vaccination, for which no data existed at the beginning of the pandemic. Therefore, to resolve these cases, values containing “NaN” were replaced by zero. Some of these features were combined to generate new features. This process of creating new features (data) from existing data, aiming to improve the data before applying machine learning algorithms, is called feature engineering. The new features created were:

    - Vaccination rate (vaccination_ratio): total number of people who received at least one dose of vaccine divided by the population at risk. This dose number was chosen because it has a higher correlation with new deaths.

    - Prevalence: existing cases of the disease at a given time divided by the population at risk of having the disease. Formula: COVID-19 cases ÷ Population at risk × 100. Example: 168,331 ÷ 210,000,000 × 100 = 0.08.

    - Incidence: new cases of the disease in a defined population during a specific period (one day, for example) divided by the population at risk. Formula: New COVID-19 cases in one day ÷ (Population − Total cases) × 100. Example: 5,632 ÷ 209,837,301 × 100 = 0.0026.
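    The prevalence and incidence formulas in this description can be reproduced with a short Python sketch. This is only an illustration under stated assumptions, not the dataset author's code: the function names are hypothetical, and the prior case total of 162,699 is inferred from the denominator in the incidence example (210,000,000 − 209,837,301). Evaluated without truncation, the incidence example comes to roughly 0.0027, which the description reports as 0.0026.

    ```python
    def prevalence(total_cases: int, population_at_risk: int) -> float:
        """Existing cases as a percentage of the population at risk."""
        return total_cases / population_at_risk * 100


    def incidence(new_cases: int, population: int, prior_total_cases: int) -> float:
        """New cases as a percentage of the people still at risk.

        The population at risk is taken to be the population minus the
        cases already recorded (an interpretation of the description's formula).
        """
        return new_cases / (population - prior_total_cases) * 100


    # Figures from the description; 162_699 is inferred, not stated in the source.
    example_prevalence = prevalence(168_331, 210_000_000)
    example_incidence = incidence(5_632, 210_000_000, 162_699)

    print(f"prevalence: {example_prevalence:.2f}%")
    print(f"incidence:  {example_incidence:.4f}%")
    ```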

  6. HumAID Dataset

    • paperswithcode.com
    Updated Apr 6, 2021
    Cite
    Firoj Alam; Umair Qazi; Muhammad Imran; Ferda Ofli (2021). HumAID Dataset [Dataset]. https://paperswithcode.com/dataset/humaid
    Explore at:
    Dataset updated
    Apr 6, 2021
    Authors
    Firoj Alam; Umair Qazi; Muhammad Imran; Ferda Ofli
    Description

    Social networks are widely used for information consumption and dissemination, especially during time-critical events such as natural disasters. Despite its significantly large volume, social media content is often too noisy for direct use in any application. Therefore, it is important to filter, categorize, and concisely summarize the available content to facilitate effective consumption and decision-making. To address such issues automatic classification systems have been developed using supervised modeling approaches, thanks to the earlier efforts on creating labeled datasets. However, existing datasets are limited in different aspects (e.g., size, contains duplicates) and less suitable to support more advanced and data-hungry deep learning models.

    HumAID is a large-scale dataset for crisis informatics research with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019. The annotations in the provided datasets consist of the following humanitarian categories. The dataset consists only of English tweets and is the largest dataset for crisis informatics so far.

    Humanitarian categories:

    - Caution and advice
    - Displaced people and evacuations
    - Don't know can't judge
    - Infrastructure and utility damage
    - Injured or dead people
    - Missing or found people
    - Not humanitarian
    - Other relevant information
    - Requests or urgent needs
    - Rescue volunteering or donation effort
    - Sympathy and support

  7. Anomaly Detection with Text Mining

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). Anomaly Detection with Text Mining [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-with-text-mining
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We will illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.

  8. CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE

    • data.ct.gov
    • catalog.data.gov
    application/rdfxml +5
    Updated Aug 5, 2021
    Cite
    CT DPH (2021). CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County-14-d/e4bh-ax24
    Explore at:
    Available download formats: application/rdfxml, xml, tsv, json, csv, application/rssxml
    Dataset updated
    Aug 5, 2021
    Dataset authored and provided by
    CT DPH
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    Connecticut
    Description

    NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.

    This dataset includes the leading and secondary metrics identified by the Connecticut Department of Health (DPH) and the Department of Education (CSDE) to support local district decision-making on the level of in-person, hybrid (blended), and remote learning model for Pre K-12 education.

    Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever and cough or shortness of breath or difficulty breathing or the presence of coronavirus diagnosis code and excludes patients with influenza-like illness. All data are preliminary.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    These metrics were adapted from recommendations by the Harvard Global Institute and supplemented by existing DPH measures.

    For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

  9. Against the medicalisation of companion animals: a multispecies ethnography...

    • figshare.mq.edu.au
    • researchdata.edu.au
    docx
    Updated Aug 29, 2024
    Cite
    Kat Fletcher (2024). Against the medicalisation of companion animals: a multispecies ethnography of care and companionship. (Dataset) [Dataset]. http://doi.org/10.25949/26264885.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    Macquarie University
    Authors
    Kat Fletcher
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This data was collected as part of a Master of Research in Anthropology exploring the medicalisation of companion animals in Australia. Ethnographic data was collected along with interviews. The data set contains the transcripts of interviews conducted with individuals involved in animal-assisted therapy, who have a service animal, or believe their companion animals significantly impact their mental health. Further research outputs by Katherine Fletcher use this data to discuss the intersubjective relationships between humans and their companion animals, the complexities of these relationships, and the ethical considerations of using another lively social creature as an instrument of therapy. The data has been de-identified. Unspecified consent was obtained by the research participants for data to be used in future research projects.

    Data context: This data was collected in various locations of the Central Coast NSW Australia. This data was collected to answer a research question: What is the lived experience of people who utilise their relationship with companion species to better their mental health? This was an anthropological project and utilised ethnographic methods, observations and phenomenological theories.

    Abstract of masters thesis: In modern Australian society, viewing negative human experiences as pathological events is becoming increasingly common. Experiences of suffering, trauma, stress, poverty, neurodivergence and anxiety are conceptualised in terms of clinical disorders, treatable through medical interventions. Human interactions with companion animals have also been medicalised through animal-assisted therapy, service animals, and individually proclaimed ‘emotional support animals’. In the process, interactions with non-humans have been commodified, researched, and medically sanctioned for their utility for human mental health. Yet, while these companions play a hugely significant role in human lives, their relationship to human health is more complicated and ambiguous than clinical psychological models allow. Medical literature often reduces the agency, individualism, and contextual behaviour of non-human species in favour of finding a statistically significant connection between the reduction of pathological symptoms and various human-animal interactions. Critiquing this project, this thesis explores the experience of living, healing, and suffering with non-human companions through a phenomenological lens. It looks beyond clinical psychological models to explore the intersubjective encounters, lively responses, and inevitable conflicts between different social species. Interactions with companion species can allow people to view themselves from different perspectives and undergo animal-motivated self alteration. However, multispecies healing experiences are subjective. They involve an intersection of particular bonds between entities, the interests of the animals, and the attitudes and expectations of humans. Our shared mortality and sociality bind humans and our companion animals together but also create conflicts, existential suffering, and change.

    Data processing declaration: This data has been processed to ensure the anonymity of the research participants. Details such as names, ages, and locations have been omitted or changed. Some data has been intentionally left out as it cannot be anonymised and still reflect a true research context. This data includes details surrounding medical diagnosis, medication regimes, and specific event details. Please be aware that if you intend to use this data in your research, these anonymising details may change certain aspects of your research results. The words of participants remain as unchanged as possible to ensure the sentiment of their words and their experiences can be used accurately within other research projects.

  10. United States Existing Home Sales

    • tradingeconomics.com
    • ar.tradingeconomics.com
    • +12more
    csv, excel, json, xml
    Updated May 22, 2025
    Cite
    TRADING ECONOMICS (2025). United States Existing Home Sales [Dataset]. https://tradingeconomics.com/united-states/existing-home-sales
    Explore at:
    Available download formats: csv, json, xml, excel
    Dataset updated
    May 22, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1968 - Jun 30, 2025
    Area covered
    United States
    Description

    Existing Home Sales in the United States decreased to 3930 Thousand in June from 4040 Thousand in May of 2025. This dataset provides the latest reported value for - United States Existing Home Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

  11. Data from: An open presurgery MRI dataset of people with epilepsy and focal...

    • openneuro.org
    Updated Feb 24, 2025
    + more versions
    Cite
    Fabiane Schuch; Lennart Walger; Matthias Schmitz; Bastian David; Tobias Bauer; Antonia Harms; Laura Fischbach; Freya Schulte; Martin Schidlowski; Johannes Reiter; Felix Bitzer; Randi von Wrede; Attila Rácz; Tobias Baumgartner; Valeri Borger; Matthias Schneider; Achim Flender; Albert Becker; Hartmut Vatter; Bernd Weber; Louisa Specht-Riemenschneider; Alexander Radbruch; Rainer Surges; Theodor Rüber (2025). An open presurgery MRI dataset of people with epilepsy and focal cortical dysplasia type II [Dataset]. http://doi.org/10.18112/openneuro.ds004199.v1.0.6
    Explore at:
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    OpenNeuro https://openneuro.org/
    Authors
    Fabiane Schuch; Lennart Walger; Matthias Schmitz; Bastian David; Tobias Bauer; Antonia Harms; Laura Fischbach; Freya Schulte; Martin Schidlowski; Johannes Reiter; Felix Bitzer; Randi von Wrede; Attila Rácz; Tobias Baumgartner; Valeri Borger; Matthias Schneider; Achim Flender; Albert Becker; Hartmut Vatter; Bernd Weber; Louisa Specht-Riemenschneider; Alexander Radbruch; Rainer Surges; Theodor Rüber
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We present an MRI dataset of 85 people with epilepsy due to focal cortical dysplasia (FCD) type II and 85 healthy controls. In epilepsy imaging, automated detection of FCDs plays a vital role because FCDs often escape conventional MRI analysis. Accurate recognition of FCDs is essential for affected patients. Surgical resection of the dysplastic cortex is associated with a high success rate, and a substantial number of patients subsequently become seizure-free. We hope this dataset's publication will improve computer-aided FCD detection by enabling the validation of existing algorithms or aiding the development of new approaches. Our dataset includes MRI data from T1 and FLAIR weighted images, manually labeled lesion masks, and selected clinical features.

  12. Synthetic vehicle trajectory dataset for the metropolitan city of Los...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 2, 2022
    Cite
    Chrysovalantis Anastasiou; Seon Ho Kim; Cyrus Shahabi (2022). Synthetic vehicle trajectory dataset for the metropolitan city of Los Angeles using DDTG [Dataset]. http://doi.org/10.5061/dryad.4j0zpc8gf
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 2, 2022
    Dataset provided by
    Dryad
    Authors
    Chrysovalantis Anastasiou; Seon Ho Kim; Cyrus Shahabi
    Time period covered
    Nov 22, 2022
    Area covered
    Los Angeles
    Description

    All methods are described in the paper "Generation of Synthetic Urban Vehicle Trajectories", IEEE BigData 2022.

  13. Data from: PANACEA dataset - Heterogeneous COVID-19 Claims

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 15, 2022
    Cite
    Miguel Arana-Catania; Elena Kochkina; Arkaitz Zubiaga; Maria Liakata; Rob Procter; Yulan He (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. http://doi.org/10.5281/zenodo.6493847
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Miguel Arana-Catania; Elena Kochkina; Arkaitz Zubiaga; Maria Liakata; Rob Procter; Yulan He
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The peer-reviewed publication for this dataset has been presented in the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.

    This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

    The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

    The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).

    The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

    The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using Spacy.

    The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

    The data sources used are:

    - The CoronaVirusFacts/DatosCoronaVirus Alliance Database. https://www.poynter.org/ifcn-covid-19-misinformation/

    - CoAID dataset (Cui and Lee, 2020) https://github.com/cuilimeng/CoAID

    - MM-COVID (Li et al., 2020) https://github.com/bigheiniu/MM-COVID

    - CovidLies (Hossain et al., 2020) https://github.com/ucinlp/covid19-data

    - TREC Health Misinformation track https://trec-health-misinfo.github.io/

    - TREC COVID challenge (Voorhees et al., 2021; Roberts et al., 2020) https://ir.nist.gov/covidSubmit/data.html

    The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

    The entries in the dataset contain the following information:

    - Claim. Text of the claim.

    - Claim label. The labels are: False, and True.

    - Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

    - Original information source. Information about which general information source was used to obtain the claim.

    - Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.

    Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

    References

    - Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y.. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022 https://arxiv.org/abs/2205.02596

    - Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at trec-3. Nist Special Publication Sp,109:109.

    - Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “is this document relevant?. . . probably” a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

    - Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

    - Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

    - Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

    - Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

    - Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

    - Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

    - Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

    - Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

    - Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.

  14. d

    Dataset of underserved microtransit users in the Sacramento area, California...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 30, 2024
    Cite
    Yan Xing; Susan Pike; Susan Handy; Yunshi Wang (2024). Dataset of underserved microtransit users in the Sacramento area, California [Dataset]. http://doi.org/10.5061/dryad.r7sqv9smh
    Explore at:
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Yan Xing; Susan Pike; Susan Handy; Yunshi Wang
    Description

    Transportation-disadvantaged populations often face significant challenges in meeting their basic travel needs. Microtransit, a technology-enabled transit mobility solution, can potentially address these issues by providing on-demand, affordable, and flexible services. However, the extent to which microtransit serves underserved populations and the factors influencing their adoption remain unclear. This research focuses on SmaRT Ride, a microtransit pilot program operated by the Sacramento Regional Transit (SacRT) in the Sacramento area. From early February to the end of May 2024, online and intercept surveys were conducted among underserved populations to understand their travel behavior. After data cleaning, 180 valid responses were collected. Descriptive analysis of the data shows that SmaRT Ride has significantly improved transportation access for these communities. Furthermore, logistic regressions were employed to explore factors influencing the willingness to adopt microtransit a...

    Reaching underserved communities, especially for surveys, is challenging due to socioeconomic and language barriers. To improve our sample and gather more responses to our survey, we used contacts from existing datasets, worked with local food banks, and identified transit stops and other intercept survey site recommendations from stakeholders such as SacRT. Additionally, multiple survey recruitment methods (online, in-person, and telephone/text message) were used to accommodate different preferences and access levels of underserved individuals.

    Title: Dataset of underserved microtransit users in the Sacramento area, California

    Access this dataset on Dryad: https://doi.org/10.5061/dryad.r7sqv9smh

    Dataset contents:

    - Variables/Features: 273 variables. Detailed descriptions of these variables are provided in the attached "Variable description" Excel file.

    - Number of Entries: 180 cases

    - Time Frame: Data was collected from early February to the end of May 2024

    - Format: SPSS (.sav)

    Description: This dataset contains underserved populations' opinions, daily travel patterns, and use of SmaRT Ride, a microtransit pilot program operated by the Sacramento Regional Transit (SacRT) in the Sacramento area. Sampling methods used to reach underserved communities included obtaining email addresses from existing datasets for online surveys and conducting intercept surveys at food distribution sites associated with food banks, busy transit stops, and other locations recommended by stakeholders such as SacRT. Rea...

  15. Z

    Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment...

    • data.niaid.nih.gov
    Updated Jan 6, 2023
    Cite
    Mihael Mohorčič (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
    Explore at:
    Dataset updated
    Jan 6, 2023
    Dataset provided by
    Andrej Hrovat
    Aleš Simončič
    Miha Mohorčič
    Mihael Mohorčič
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of these is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

    Related dataset

    The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

    The following information about each received PR is collected:

    - MAC address

    - Supported data rates

    - Extended supported rates

    - HT capabilities

    - Extended capabilities

    - Data under extended tag and vendor specific tag

    - Interworking

    - VHT capabilities

    - RSSI

    - SSID

    - Timestamp when PR was received

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

    PR_IE_data = {
        'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
        'HT_CAP': DATA_htcap,
        'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
        'VHT_CAP': DATA_vhtcap,
        'INTERWORKING': DATA_inter,
        'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
        'VENDOR_SPEC': {
            VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...},
            VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...}
            ...
        }
    }

    Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
    IE fields that are missing from the captured PR are not included in PR_IE_data.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure makes it possible to store only 'TIME' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the data already stored for the same MAC address in the current scan time interval. If identical IE data from the same MAC address is already stored, only the values for the keys 'TIME' and 'RSSI' are appended. If identical IE data from the same MAC address has not yet been received, the PR_data structure of the new PR is appended to the 'PROBE_REQs' key for that MAC address. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
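    The merging rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' collection script: the function name `add_probe_request`, the in-memory `store` dictionary, the skipping of repeated SSIDs, and the sample values are all hypothetical.

```python
def add_probe_request(store, mac, ssid, time, rssi, ie_data):
    """Merge one probe request into the store for the current scan interval.

    `store` maps a MAC address to an entry shaped like
    {'MAC': ..., 'SSIDs': [...], 'PROBE_REQs': [...]}.
    """
    entry = store.setdefault(mac, {"MAC": mac, "SSIDs": [], "PROBE_REQs": []})

    # Assumption: an SSID already recorded for this MAC is not stored twice.
    if ssid not in entry["SSIDs"]:
        entry["SSIDs"].append(ssid)

    # Identical IE data from the same MAC: only TIME and RSSI are appended.
    for pr_data in entry["PROBE_REQs"]:
        if pr_data["DATA"] == ie_data:
            pr_data["TIME"].append(time)
            pr_data["RSSI"].append(rssi)
            return entry

    # IE data not seen before for this MAC: append a new PR_data structure.
    entry["PROBE_REQs"].append({"TIME": [time], "RSSI": [rssi], "DATA": ie_data})
    return entry


# Made-up sample values, for illustration only.
store = {}
add_probe_request(store, "aa:bb:cc:dd:ee:ff", "net1", 1.0, -60, {"HT_CAP": "01"})
add_probe_request(store, "aa:bb:cc:dd:ee:ff", "net2", 2.5, -62, {"HT_CAP": "01"})
add_probe_request(store, "aa:bb:cc:dd:ee:ff", "net1", 3.0, -70, {"HT_CAP": "02"})
```

    In this sketch the first two requests share one PR_data entry with two timestamps, while the third, which carries different IE data, becomes a second entry under 'PROBE_REQs'.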

    Folder structure

    For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each holding the samples from that device.

    The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected between 23rd of September 2022 00:00 local time and 24th of September 2022 00:00 local time.

    Files are mapped to locations as follows:

    - 1.json -> location 1

    - 2.json -> location 2

    - 3.json -> location 3

    - 4.json -> location 4
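    As a rough illustration of the naming scheme, the following Python sketch parses a folder name into its UTC start and end times and converts them to local time. The function name `parse_folder_name` is hypothetical, and the fixed UTC+2 offset is an assumption (Central European Summer Time, which applied in Catania in September 2022).

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time in Catania during the recording was CEST (UTC+2).
LOCAL_TZ = timezone(timedelta(hours=2))
FOLDER_TIME_FORMAT = "%Y-%m-%dT%H-%M-%S"


def parse_folder_name(name):
    """Return the (start, end) of a folder name as timezone-aware UTC datetimes."""
    start_text, end_text = name.split("_")
    start = datetime.strptime(start_text, FOLDER_TIME_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_text, FOLDER_TIME_FORMAT).replace(tzinfo=timezone.utc)
    return start, end


start, end = parse_folder_name("2022-09-22T22-00-00_2022-09-23T22-00-00")
print(start.astimezone(LOCAL_TZ).isoformat())  # 2022-09-23T00:00:00+02:00
print(end.astimezone(LOCAL_TZ).isoformat())    # 2022-09-24T00:00:00+02:00
```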

    Environments description

    The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongle) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

    Four Raspberry Pis were used:

    - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)

    - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo

    - location 3 -> northernmost window in the building of Via Etnea near Piazza Università

    - location 4 -> first window to the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the coverage areas of the devices would span both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

    Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data are not recorded in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day, which shows up as missing data of up to a few minutes.

     Location 1 - Piazza del Duomo - Chierici
    

    The gateway device (RPi) is located on the second-floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location is constant and undisturbed, and the dataset appears to have complete coverage.

     Location 2 - Via Etnea - Piazza del Duomo
    

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.

     Location 3 - Via Etnea - Piazza Università
    

    Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present. This device appears to have been collecting data throughout the whole dataset period.

     Location 4 - Piazza Università
    

    The device at this location is wirelessly connected to the access point. It was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment. The internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of the Resiloc project with the help of the City of Catania and project partners.

  16. Physical Exercise Recognition Dataset

    • kaggle.com
    Updated Feb 16, 2023
    Cite
    Muhannad Tuameh (2023). Physical Exercise Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/muhannadtuameh/exercise-recognition
    Explore at:
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhannad Tuameh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Note:

    Because this dataset has been used in a competition, we had to hide some of the data to prepare the test dataset for the competition. Thus, in the previous version of the dataset, only the train.csv file existed.

    Content

    This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).

    Collection Process

    About 500 videos of people doing the exercises were used to collect this data. The videos come from the Countix dataset, which contains YouTube links to several human activity videos. Using a simple Python script, the videos of the 5 physical exercises were downloaded. From every video, at least 2 frames were manually extracted. The extracted frames represent the terminal positions of the exercise.

    Processing Data

    For every frame, the MediaPipe framework is used to apply pose estimation, which detects the skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see figure below). Visit the MediaPipe Pose Classification page for more details.

    Figure: 33 pose landmarks (https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png)

  17. a

    2010 Census Tracts Profile

    • opendata.atlantaregional.com
    • hub.arcgis.com
    Updated Feb 1, 2017
    + more versions
    Cite
    Fulton County, Georgia - GIS (2017). 2010 Census Tracts Profile [Dataset]. https://opendata.atlantaregional.com/documents/969730e0c21247b98a3d2628133a2dcb
    Explore at:
    Dataset updated
    Feb 1, 2017
    Dataset authored and provided by
    Fulton County, Georgia - GIS
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The 2010 Census Blocks with Demographic Profile dataset was produced by joining the U.S. Census Bureau's 2010 TIGER/Line File-derived Census Blocks for Fulton County with selected 2010 Summary File 1 data fields. The result is a census block boundary layer attributed with some of the more commonly used demographics such as total population, population by race, population by age group, median age, and housing and household characteristics. Because the dataset was derived from the TIGER/Line File Census Blocks, the U.S. Census Bureau's metadata for that dataset is provided below. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement.
Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.

  18. Z

    Dataset of knee joint contact force peaks and corresponding subject...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 9, 2023
    Cite
    Lavikainen, Jere Joonatan (2023). Dataset of knee joint contact force peaks and corresponding subject characteristics from 4 open datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7253457
    Explore at:
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    Stenroth, Lauri
    Lavikainen, Jere Joonatan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data from overground walking trials of 166 subjects with several trials per subject (approximately 2900 trials total).

    DATA ORIGINS & LICENSE INFORMATION

    The data comes from four existing open datasets collected by others:

    Schreiber & Moissenet, A multimodal dataset of human gait at different walking speeds established on injury-free adult participants

    article: https://www.nature.com/articles/s41597-019-0124-4

    dataset: https://figshare.com/articles/dataset/A_multimodal_dataset_of_human_gait_at_different_walking_speeds/7734767

    Fukuchi et al., A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals

    article: https://peerj.com/articles/4640/

    dataset: https://figshare.com/articles/dataset/A_public_data_set_of_overground_and_treadmill_walking_kinematics_and_kinetics_of_healthy_individuals/5722711

    Horst et al., A public dataset of overground walking kinetics and full-body kinematics in healthy adult individuals

    article: https://www.nature.com/articles/s41598-019-38748-8

    dataset: https://data.mendeley.com/datasets/svx74xcrjr/3

    Camargo et al., A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions

    article: https://www.sciencedirect.com/science/article/pii/S0021929021001007

    dataset (3 links): https://data.mendeley.com/datasets/fcgm3chfff/1 https://data.mendeley.com/datasets/k9kvm5tn3f/1 https://data.mendeley.com/datasets/jj3r5f9pnf/1

    In this dataset, those datasets are referred to as the Schreiber, Fukuchi, Horst, and Camargo datasets, respectively. The Schreiber, Fukuchi, Horst, and Camargo datasets are licensed under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

    We have modified the datasets by analyzing the data with musculoskeletal simulations & analysis software (OpenSim). In this dataset, we publish modified data as well as some of the original data.

    STRUCTURE OF THE DATASET

    The dataset contains two kinds of text files: those starting with "predictors_" and those starting with "response_".

    Predictors comprise 12 text files, each describing the input (predictor) variables we used to train artificial neural networks to predict knee joint loading peaks. Responses similarly comprise 12 text files, each describing the response (outcome) variables that we trained and evaluated the network on. The file names are of the form "predictors_X" for predictors and "response_X" for responses, where X describes which response (outcome) variable is predicted with them. X can be:

    - loading_response_both: the maximum of the first peak of stance for the sum of the loading of the medial and lateral compartments

    - loading_response_lateral: the maximum of the first peak of stance for the loading of the lateral compartment

    - loading_response_medial: the maximum of the first peak of stance for the loading of the medial compartment

    - terminal_extension_both: the maximum of the second peak of stance for the sum of the loading of the medial and lateral compartments

    - terminal_extension_lateral: the maximum of the second peak of stance for the loading of the lateral compartment

    - terminal_extension_medial: the maximum of the second peak of stance for the loading of the medial compartment

    - max_peak_both: the maximum of the entire stance phase for the sum of the loading of the medial and lateral compartments

    - max_peak_lateral: the maximum of the entire stance phase for the loading of the lateral compartment

    - max_peak_medial: the maximum of the entire stance phase for the loading of the medial compartment

    - MFR_common: the medial force ratio for the entire stance phase

    - MFR_LR: the medial force ratio for the first peak of stance

    - MFR_TE: the medial force ratio for the second peak of stance

    The predictor text files are organized as comma-separated values. Each row corresponds to one walking trial. A single subject typically has several trials. The column labels are DATASET_INDEX,SUBJECT_INDEX,KNEE_ADDUCTION,MASS,HEIGHT,BMI,WALKING_SPEED,HEEL_STRIKE_VELOCITY,AGE,GENDER.

    DATASET_INDEX describes which original dataset the trial is from, where {1=Schreiber, 2=Fukuchi, 3=Horst, 4=Camargo}

    SUBJECT_INDEX is the index of the subject in the original dataset. If you use this column, you will have to rewrite these to avoid duplicates (e.g., several datasets probably have subject "3").

    KNEE_ADDUCTION is the knee adduction-abduction angle (positive for adduction, negative for abduction) of the subject in static pose, estimated from motion capture markers.

    MASS is the mass of the subject in kilograms

    HEIGHT is the height of the subject in millimeters

    BMI is the body mass index of the subject

    WALKING_SPEED is the mean walking speed of the subject during the trial

    HEEL_STRIKE_VELOCITY is the mean of the velocities of the subject's pelvis markers at the instant of heel strike

    AGE is the age of the subject in years

    GENDER is an integer/boolean where {1=male, 0=female}

    The response text files contain one floating-point value per row, describing the knee joint contact force peak for the trial in newtons (or the medial force ratio). Each row corresponds to one walking trial. The rows in predictor and response text files match each other (e.g., row 7 describes the same trial in both predictors_max_peak_medial.txt and response_max_peak_medial.txt).
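    The row-matching and subject-index caveats above can be illustrated with a short Python sketch. It is not part of the published dataset: the function name `load_trials`, the `SUBJECT_KEY` and `RESPONSE` fields, and the sample rows are made up for illustration. The sketch also assumes every predictor column is numeric, and it tolerates an optional header row because the description does not state whether the files include one.

```python
import csv
import io

COLUMNS = [
    "DATASET_INDEX", "SUBJECT_INDEX", "KNEE_ADDUCTION", "MASS", "HEIGHT",
    "BMI", "WALKING_SPEED", "HEEL_STRIKE_VELOCITY", "AGE", "GENDER",
]


def load_trials(predictor_text, response_text):
    """Pair each predictor row with the response value on the same row."""
    rows = [row for row in csv.reader(io.StringIO(predictor_text)) if row]
    # Assumption: a header row may or may not be present; skip it if found.
    if rows and rows[0][0].strip() == COLUMNS[0]:
        rows = rows[1:]
    responses = [float(value) for value in response_text.split()]
    if len(rows) != len(responses):
        raise ValueError("predictor and response files must have matching rows")

    trials = []
    for values, response in zip(rows, responses):
        trial = dict(zip(COLUMNS, (float(value) for value in values)))
        # SUBJECT_INDEX repeats across datasets, so pair it with DATASET_INDEX
        # to obtain a key that is unique across all four source datasets.
        trial["SUBJECT_KEY"] = (int(trial["DATASET_INDEX"]), int(trial["SUBJECT_INDEX"]))
        trial["RESPONSE"] = response
        trials.append(trial)
    return trials


# Made-up sample rows: subject "3" appears in two different source datasets.
predictors = (
    "1,3,2.5,70,1750,22.86,1.2,0.9,30,1\n"
    "2,3,-1.0,60,1650,22.04,1.1,0.8,25,0\n"
)
responses = "1500.0\n1400.0\n"
trials = load_trials(predictors, responses)
```

    In practice the two strings would be read from a matching pair of files, such as predictors_max_peak_medial.txt and response_max_peak_medial.txt.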

    See our journal article "Prediction of Knee Joint Compartmental Loading Maxima Utilizing Simple Subject Characteristics and Neural Networks" (https://doi.org/10.1007/s10439-023-03278-y) for more information.

    Questions & other contacts: jere.lavikainen@uef.fi

  19. P

    Data from: FindingEmo Dataset

    • paperswithcode.com
    Updated May 29, 2025
    Cite
    Laurent Mertens; Elahe' Yargholi; Hans Op de Beeck; Jan Van den Stock; Joost Vennekens (2025). FindingEmo Dataset [Dataset]. https://paperswithcode.com/dataset/findingemo
    Explore at:
    Dataset updated
    May 29, 2025
    Authors
    Laurent Mertens; Elahe' Yargholi; Hans Op de Beeck; Jan Van den Stock; Joost Vennekens
    Description

    FindingEmo is an image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.

  20. i

    COVID-19 Vaccination Demographics by County and District

    • hub.mph.in.gov
    + more versions
    Cite
    COVID-19 Vaccination Demographics by County and District [Dataset]. https://hub.mph.in.gov/dataset/covid-19-vaccinations-demographics-by-county-and-district
    Explore at:
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: 11/1/2023: Publication of the COVID data will be delayed because of technical difficulties.

    Note: 9/20/2023: With the end of the federal emergency and reporting requirements continuing to evolve, the Indiana Department of Health will no longer publish and refresh the COVID-19 datasets after November 15, 2023 - one final dataset publication will continue to be available.

    Vaccination demographics data by county/region, by race, by ethnicity, by gender, and by age. Fields with less than 5 results have been marked as suppressed.

    Note: 3/22/2023: Due to a technical issue updates are delayed for COVID data. New files will be published as soon as they are available.

    Historical Changes:

    - 1/5/2023: Due to a technical issue the COVID datasets were not updated on 1/4/23. Updates will be published as soon as they are available.

    - 9/29/22: Due to a technical difficulty, the weekly COVID datasets were not generated yesterday. They will be updated with current data today - 9/29 - and may result in a temporary discrepancy with the numbers published on the dashboard until the normal weekly refresh resumes 10/5.

    - 9/27/2022: As of 9/28, the Indiana Department of Health (IDOH) is moving to a weekly COVID update for the dashboard and all associated datasets to continue to provide trend data that is applicable and usable for our partners and the public. This is to maintain alignment across the nation as states move to weekly updates.

    - 8/19/2022: The first and second dose columns are being removed as of 8/22/22 as the Health department has transitioned to reporting on Fully/Partially vaccinated. The final historical file including these columns from 8/19 will continue to be available.

    - 2/10/2022: Data was not published on 2/9/2022 due to a technical issue, but updated data was released 2/10/2022.

    - 10/13/2021: This dataset now includes columns for new and total booster shots administered. Please see the data dictionary for additional details.

    - 08/06/2021: There are updates today to county-level vaccination rates to reflect a correction to records that were assigned to the wrong location based on ZIP code.

    - 06/23/2021: COVID Hub files will no longer be updated on Saturdays. The normal refresh of these files has been changed to Mon-Fri.

    - 06/10/2021: COVID Hub files will no longer be updated on Sundays. The normal refresh of these files has been changed to Mon-Sat.

    - 06/07/2021: Today’s new counts include doses newly reported to the Indiana Department of Health on Saturday and Sunday.

    - 06/03/2021: Individuals are able to update their personal and demographic information during the vaccination registration process. Today’s data reflects changes made by individuals to their race, ethnicity, or county of residence over the course of their vaccination series.

    - 05/13/2021: The 12-15 year-old age group has been added into the dataset as of today.

    - 05/06/2021: On Monday 5/3, individuals classified as "Unknown" county of residence were inadvertently converted to "Out of State." These individuals have been corrected in today's dataset.

    - 03/11/2021: This dataset has been updated to include totals and newly administered single dose vaccination data. Additionally the existing age groups have been further stratified into a 16-19 year old age group, and 5 year groups for 20-79 year olds.
