15 datasets found
  1. Demographics of Upper-Middle Class Citizens in Gachibowli, Hyderabad, India

    • data.mendeley.com
    Updated Dec 15, 2019
    Cite
    Praagna Shrikrishna Sriram (2019). Demographics of Upper-Middle Class Citizens in Gachibowli, Hyderabad, India [Dataset]. http://doi.org/10.17632/k55rb6zk3v.1
    Explore at:
    Dataset updated
    Dec 15, 2019
    Authors
    Praagna Shrikrishna Sriram
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hyderabad, India, Gachibowli
    Description

    This dataset describes the demographics of upper-middle-class people living in Gachibowli, Hyderabad, India, and attempts, through various methods of statistical analysis, to establish relationships among several of these demographic details.

  2. Middle Eastern Facial Images with Occlusion Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Middle Eastern Facial Images with Occlusion Dataset [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-occlusion-middle-east
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Middle Eastern Human Face with Occlusion Dataset, meticulously curated to enhance face recognition models and support the development of advanced occlusion detection systems, biometric identification systems, KYC models, and other facial recognition technologies.

    Facial Image Data

    This dataset comprises over 3,000 human facial images, divided into participant-wise sets with each set including:

    Occluded Images: 5 different high-quality facial images per individual occluded through various accessories such as masks, caps, sunglasses, or a combination of these accessories.
    Normal Images: One image without any accessories.

    Diversity and Representation

    The dataset includes contributions from a diverse network of individuals across Middle Eastern countries:

    Geographical Representation: Participants from countries including Egypt, Jordan, Saudi Arabia, UAE, Tunisia, and more.
    Demographics: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    File Format: The dataset contains images in JPEG and HEIC file format.

    Quality and Conditions

    To ensure high utility and robustness, all images are captured under varying conditions:

    Lighting Conditions: Images are taken in different lighting environments to ensure variability and realism.
    Backgrounds: A variety of backgrounds are available to enhance model generalization.
    Device Quality: Photos are taken using the latest mobile devices to ensure high resolution and clarity.

    Metadata

    Each facial image set is accompanied by detailed metadata for each participant, including:

    Unique Identifier
    File Name
    Age
    Gender
    Country
    Demographic Information
    Occlusion Type
    File Format

    This metadata is essential for training models that can accurately recognize and identify human faces with occlusions across different demographics and conditions.

    Usage and Applications

    This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:

    Facial Recognition Models: Improving the accuracy and reliability of facial recognition systems.
    KYC Models: Streamlining the identity verification processes for financial and other services.
    Biometric Identity Systems: Developing robust biometric identification solutions.
    Occlusion Identification: Enhancing models to accurately identify faces with occlusions.

    Secure and Ethical Collection

    Data Security: Data was securely stored and processed within our platform, ensuring data security and confidentiality.
    Ethical Guidelines: The biometric data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
    Participant Consent: All participants were informed of the purpose of collection and potential use of the data, as agreed through written consent.

    Updates and Customization

    We understand the evolving nature of AI and machine

  3. Income of individuals by age group, sex and income source, Canada, provinces...

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +2more
    Updated May 1, 2025
    + more versions
    Cite
    Government of Canada, Statistics Canada (2025). Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas [Dataset]. http://doi.org/10.25318/1110023901-eng
    Explore at:
    Dataset updated
    May 1, 2025
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.

  4. High income tax filers in Canada

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1more
    Updated Oct 28, 2024
    + more versions
    Cite
    Government of Canada, Statistics Canada (2024). High income tax filers in Canada [Dataset]. http://doi.org/10.25318/1110005501-eng
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values, regardless of selected geography; for example, the number of Nova Scotians in the top 1% is calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table, namely market, total, and after-tax income, both with and without capital gains.
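
    The national-threshold rule described above can be illustrated with a minimal sketch. The records, region codes, and function names here are hypothetical, and the percentile is computed with Python's `statistics.quantiles`, which may differ from the method Statistics Canada actually uses.

```python
from statistics import quantiles

def national_threshold(incomes, percentile):
    """Income at the given national percentile (1-99), via linear interpolation."""
    cut_points = quantiles(incomes, n=100, method="inclusive")
    return cut_points[percentile - 1]

def count_above(filers, region, threshold):
    """Count filers in one region whose income exceeds the NATIONAL threshold."""
    return sum(1 for r, income in filers if r == region and income > threshold)

# Hypothetical (region, total income) records, for illustration only.
filers = [
    ("NS", 40_000), ("NS", 250_000),
    ("ON", 60_000), ("ON", 900_000),
    ("BC", 75_000),
]

# The threshold is computed once over all filers, regardless of geography.
threshold = national_threshold([income for _, income in filers], 60)
ns_count = count_above(filers, "NS", threshold)
on_count = count_above(filers, "ON", threshold)
```

    The key design point is that the threshold is never recomputed per region: a province with few high earners simply gets a small count.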

  5. Single-earner and dual-earner census families by number of children

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +2more
    Updated Jun 27, 2024
    Cite
    Government of Canada, Statistics Canada (2024). Single-earner and dual-earner census families by number of children [Dataset]. http://doi.org/10.25318/1110002801-eng
    Explore at:
    Dataset updated
    Jun 27, 2024
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Families of tax filers; Single-earner and dual-earner census families by number of children (final T1 Family File; T1FF).

  6. Tucson Equity Priority Index (TEPI): Ward 6 Census Block Groups

    • teds.tucsonaz.gov
    Updated Feb 4, 2025
    + more versions
    Cite
    City of Tucson (2025). Tucson Equity Priority Index (TEPI): Ward 6 Census Block Groups [Dataset]. https://teds.tucsonaz.gov/maps/cotgis::tucson-equity-priority-index-tepi-ward-6-census-block-groups
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset authored and provided by
    City of Tucson
    Area covered
    Description

    For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

    What is the Tucson Equity Priority Index (TEPI)?

    The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset’s features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

    How is social vulnerability measured?

    The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:

    Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
    Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
    Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
    Renter Cost Burdened: Renters who spend more than 30% of their income on rent
    No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
    No Vehicle Access: Households without automobile, van, or truck access
    High School Education or Less: Those whose highest level of educational attainment is a High School diploma, equivalency, or less
    Limited English Ability: Those whose ability to speak English is "Less Than Well."
    People of Color: Those who identify as anything other than Non-Hispanic White
    Disability: Households with one or more physical or cognitive disabilities
    Age: Groups that tend to have higher levels of vulnerability, including children (those below 18), and seniors (those 65 and older)

    An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

    How are the variables combined?

    These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100.

    The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:

    High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
    Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
    Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
    Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
    Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
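
    The pipeline described above (percentile normalization, equal-weight mean aggregation through sub-indices and thematic indices, then five classes) can be sketched as follows. This is an illustration under stated assumptions, not the City of Tucson's implementation: the indicator names, the grouping in `structure`, the sample values, and the exact percentile-rank formula are all hypothetical.

```python
from statistics import mean

LABELS = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]

def percentile_rank(values):
    """Rescale values to 0-100 by rank among all features (assumed formula)."""
    n = len(values)
    if n == 1:
        return [0.0]
    return [100.0 * sum(v < x for v in values) / (n - 1) for x in values]

def classify(score):
    """Map a 0-100 score to one of the five 20-point classes."""
    return LABELS[min(int(score // 20), 4)]

def tepi_scores(raw, structure):
    """Equal-weight means: indicators -> sub-indices -> thematic indices -> overall."""
    norm = {name: percentile_rank(vals) for name, vals in raw.items()}
    n_features = len(next(iter(raw.values())))
    scores = []
    for i in range(n_features):
        thematic = []
        for sub_indices in structure.values():
            subs = [mean(norm[ind][i] for ind in inds) for inds in sub_indices.values()]
            thematic.append(mean(subs))
        scores.append(mean(thematic))
    return scores

# Hypothetical grouping and values for five block groups.
structure = {
    "Economic": {"income": ["poverty", "unemployment"]},
    "Personal": {"access": ["no_vehicle", "no_insurance"]},
}
raw = {
    "poverty": [2, 5, 9, 14, 30],
    "unemployment": [1, 3, 4, 8, 12],
    "no_vehicle": [0, 2, 6, 7, 20],
    "no_insurance": [3, 4, 10, 11, 25],
}

# Re-rank the combined score so each class holds an equal share of features.
overall = percentile_rank(tepi_scores(raw, structure))
classes = [classify(s) for s in overall]
```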

  7. A Gold Standard Corpus for Activity Information (GoSCAI)

    • zenodo.org
    Updated May 30, 2025
    Cite
    Zenodo (2025). A Gold Standard Corpus for Activity Information (GoSCAI) [Dataset]. http://doi.org/10.5281/zenodo.15528545
    Explore at:
    Dataset updated
    May 30, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Description

    A Gold Standard Corpus for Activity Information

    Dataset Title: A Gold Standard Corpus for Activity Information (GoSCAI)

    Dataset Curators: The Epidemiology & Biostatistics Section of the NIH Clinical Center Rehabilitation Medicine Department

    Dataset Version: 1.0 (May 16, 2025)

    Dataset Citation and DOI: NIH CC RMD Epidemiology & Biostatistics Section. (2025). A Gold Standard Corpus for Activity Information (GoSCAI) [Data set]. Zenodo. doi: 10.5281/zenodo.15528545

    EXECUTIVE SUMMARY

    This data statement is for a gold standard corpus of de-identified clinical notes that have been annotated for human functioning information based on the framework of the WHO's International Classification of Functioning, Disability and Health (ICF). The corpus includes 484 notes from a single institution within the United States written in English in a clinical setting. This dataset was curated for the purpose of training natural language processing models to automatically identify, extract, and classify information on human functioning at the whole-person, or activity, level.

    CURATION RATIONALE

    This dataset is curated to be a publicly available resource for the development and evaluation of methods for the automatic extraction and classification of activity-level functioning information as defined in the ICF. The goals of data curation are to 1) create a corpus of a size that can be manually deidentified and annotated, 2) maximize the density and diversity of functioning information of interest, and 3) allow public dissemination of the data.

    LANGUAGE VARIETIES

    Language Region: en-US

    Prose Description: English as written by native and bilingual English speakers in a clinical setting

    LANGUAGE USER DEMOGRAPHIC

    The language users represented in this dataset are medical and clinical professionals who work in a research hospital setting. These individuals hold professional degrees corresponding to their respective specialties. Specific demographic characteristics of the language users such as age, gender, or race/ethnicity were not collected.

    ANNOTATOR DEMOGRAPHIC

    The annotator group consisted of five people, 33 to 76 years old, including four females and one male. Socioeconomically, they came from the middle and upper-middle income classes. Regarding first language, three annotators had English as their first language, one had Chinese, and one had Spanish. Proficiency in English, the language of the data being annotated, was native for three of the annotators and bilingual for the other two. The annotation team included clinical rehabilitation domain experts with backgrounds in occupational therapy, physical therapy, and individuals with public health and data science expertise. Prior to annotation, all annotators were trained on the specific annotation process using established guidelines for the given domain, and annotators were required to achieve a specified proficiency level prior to annotating notes in this corpus.

    LINGUISTIC SITUATION AND TEXT CHARACTERISTICS

    The notes in the dataset were written as part of clinical care within a U.S. research hospital between May 2008 and November 2019. These notes were written by health professionals asynchronously following the patient encounter to document the interaction and support continuity of care. The intended audience of these notes were clinicians involved in the patients' care. The included notes come from nine disciplines - neuropsychology, occupational therapy, physical medicine (physiatry), physical therapy, psychiatry, recreational therapy, social work, speech language pathology, and vocational rehabilitation. The notes were curated to support research on natural language processing for functioning information between 2018 and 2024.

    PREPROCESSING AND DATA FORMATTING

    The final corpus was derived from a set of clinical notes extracted from the hospital electronic medical record (EMR) for the purpose of clinical research. The original data consist of character-based digital content. We work in ASCII 8 or UNICODE encoding, and therefore part of our pre-processing includes running encoding detection and transformation from encodings such as Windows-1252 or ISO-8859 format to our preferred format.
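
    A minimal sketch of the kind of encoding transformation described above is shown here. It is not the curators' pipeline: it swaps real encoding detection for a simple try-and-fall-back strategy, and the function name and fallback order are assumptions.

```python
def to_unicode_text(raw: bytes, fallbacks=("cp1252", "iso-8859-1")) -> str:
    """Decode bytes as UTF-8 if valid, else try legacy single-byte encodings."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # e.g. cp1252 leaves a few byte values undefined
    raise ValueError("could not decode input with any known encoding")

# A note exported as Windows-1252 becomes a normal Python (Unicode) string,
# which can then be written back out in the preferred encoding.
legacy_bytes = "café".encode("cp1252")
text = to_unicode_text(legacy_bytes)
utf8_bytes = text.encode("utf-8")
```

    Trying UTF-8 first is a common heuristic because legacy text containing non-ASCII characters rarely happens to be valid UTF-8; a dedicated detection library would be more robust for mixed inputs.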

    On the larger corpus, we applied sampling to match our curation rationale. Given the resource constraints of manual annotation, we set out to create a dataset of 500 clinical notes, which would exclude notes over 10,000 characters in length.

    To promote density and diversity, we used five note characteristics as sampling criteria. We used the text length as expressed in number of characters. Next, we considered the discipline group, which is derived from note type metadata and describes which discipline a note originated from: occupational and vocational therapy (OT/VOC), physical therapy (PT), recreation therapy (RT), speech and language pathology (SLP), social work (SW), or miscellaneous (MISC, including psychiatry, neurology and physiatry). These disciplines were selected for collecting the larger corpus because their notes are likely to include functioning information. Existing information extraction tools were used to obtain annotation counts in four areas of functioning and provided a note’s annotation count, annotation density (annotation count divided by text length), and domain count (number of domains with at least 1 annotation).

    We used stratified sampling across the 6 discipline groups to ensure discipline diversity in the corpus. Because of low availability, 50 notes were sampled from SLP with relaxed criteria, and 90 notes each from the 5 other discipline groups with stricter criteria. Sampled SLP notes were those with the highest annotation density that had an annotation count of at least 5 and a domain count of at least 2. Other notes were sampled by highest annotation count and lowest text length, with a minimum annotation count of 15 and minimum domain count of 3.
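
    The sampling criteria above can be sketched as follows. The note records, field names, and function names are hypothetical; only the thresholds (10,000-character limit, minimum annotation and domain counts, and the two ranking rules) come from the description.

```python
MAX_CHARS = 10_000  # notes over this length are excluded

def annotation_density(note):
    """Annotation count divided by text length in characters."""
    return note["annotation_count"] / note["text_length"]

def sample_group(notes, n, min_annotations, min_domains, by_density=False):
    """Pick up to n notes from one discipline group using the stated criteria."""
    eligible = [
        note for note in notes
        if note["text_length"] <= MAX_CHARS
        and note["annotation_count"] >= min_annotations
        and note["domain_count"] >= min_domains
    ]
    if by_density:
        # Relaxed rule (used for SLP): highest annotation density first.
        eligible.sort(key=annotation_density, reverse=True)
    else:
        # Strict rule: highest annotation count, then shortest text.
        eligible.sort(key=lambda x: (-x["annotation_count"], x["text_length"]))
    return eligible[:n]

# Hypothetical notes for two discipline groups.
pt_notes = [
    {"id": "a", "text_length": 2000, "annotation_count": 30, "domain_count": 3},
    {"id": "b", "text_length": 1500, "annotation_count": 30, "domain_count": 4},
    {"id": "c", "text_length": 900, "annotation_count": 10, "domain_count": 3},
    {"id": "d", "text_length": 12000, "annotation_count": 80, "domain_count": 4},
]
slp_notes = [
    {"id": "e", "text_length": 1000, "annotation_count": 6, "domain_count": 2},
    {"id": "f", "text_length": 500, "annotation_count": 5, "domain_count": 2},
]

pt_sample = sample_group(pt_notes, n=90, min_annotations=15, min_domains=3)
slp_sample = sample_group(slp_notes, n=50, min_annotations=5, min_domains=2, by_density=True)
```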

    The notes in the resulting sample included certain types of PHI and PII. To prepare for public dissemination, all sensitive or potentially identifying information was manually annotated in the notes and replaced with substituted content to ensure readability and enough context needed for machine learning without exposing any sensitive information. This de-identification effort was manually reviewed to ensure no PII or PHI exposure and to correct any resulting readability issues. Notes about pediatric patients were excluded. No attempt was made to sample multiple notes from the same patient. No metadata is provided to group notes other than by note type, discipline, or discipline group. The dataset is not organized beyond the provided metadata, but publications about models trained on this dataset should include information on the train/test splits used.

    All notes were sentence-segmented and tokenized using the spaCy en_core_web_lg model with additional rules for sentence segmentation customized to the dataset. Notes are stored in an XML format readable by the GATE annotation software (https://gate.ac.uk/family/developer.html), which stores annotations separately in annotation sets.

    CAPTURE QUALITY

    As the clinical notes were extracted directly from the EMR in text format, the capture quality was determined to be high. The clinical notes did not have to be converted from other data formats, which means this dataset is free from noise introduced by conversion processes such as optical character recognition.

    LIMITATIONS

    Because of the effort required to manually deidentify and annotate notes, this corpus is limited in terms of size and representation. The curation decisions skewed note selection towards specific disciplines and note types to increase the likelihood of encountering information on functioning. Some subtypes of functioning occur infrequently in the data, or not at all. The deidentification of notes was done in a manner to preserve natural language as it would occur in the notes, but some information is lost, e.g. on rare diseases.

    METADATA

    Information on the manual annotation process is provided in the annotation guidelines for each of the four domains:

    - Communication & Cognition (https://zenodo.org/records/13910167)

    - Mobility (https://zenodo.org/records/11074838)

    - Self-Care & Domestic Life (SCDL) (https://zenodo.org/records/11210183)

    - Interpersonal Interactions & Relationships (IPIR) (https://zenodo.org/records/13774684)

    Inter-annotator agreement was established on development datasets described in the annotation guidelines prior to the annotation of this gold standard corpus.

    The gold standard corpus consists of 484 documents, which include 35,147 sentences in total. The distribution of annotated information is provided in the table below.

    Domain | Number of Annotated Sentences | % of All Sentences | Mean Number of Annotated Sentences per Document
    Communication & Cognition | 6033 | 17.2% |

  8. Slopes of the three income groups (low, middle & high) and their associated...

    • figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Lazar Ilic; M. Sawada (2023). Slopes of the three income groups (low, middle & high) and their associated 95% confidence intervals (CI) obtained via non-parametric bootstrapping of individual income data. [Dataset]. http://doi.org/10.1371/journal.pone.0251430.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Lazar Ilic; M. Sawada
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Slopes of the three income groups (low, middle & high) and their associated 95% confidence intervals (CI) obtained via non-parametric bootstrapping of individual income data.

  9. Integrated Agent-based Modelling and Simulation of Transportation Demand and...

    • data.niaid.nih.gov
    Updated Jun 19, 2024
    Cite
    Sprei, Frances (2024). Integrated Agent-based Modelling and Simulation of Transportation Demand and Mobility Patterns in Sweden [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10648077
    Explore at:
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Yeh, Sonia
    Tozluoğlu, Çağlar
    Liao, Yuan
    Sprei, Frances
    Dhamal, Swapnil
    Ghosh, Kaniska
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sweden
    Description

    About

    The Synthetic Sweden Mobility (SySMo) model provides a simplified yet statistically realistic microscopic representation of the real population of Sweden. The agents in this synthetic population contain socioeconomic attributes, household characteristics, and corresponding activity plans for an average weekday. This agent-based modelling approach derives the transportation demand from the agents’ planned activities using various transport modes (e.g., car, public transport, bike, and walking).

    This open data repository contains four datasets:

    (1) Synthetic Agents,

    (2) Activity Plans of the Agents,

    (3) Travel Trajectories of the Agents, and

    (4) Road Network (EPSG: 3006)

    (OpenStreetMap data were retrieved on August 28, 2023, from https://download.geofabrik.de/europe.html, and GTFS data were retrieved on September 6, 2023 from https://samtrafiken.se/)

    The database can serve as input to assess the potential impacts of new transportation technologies, infrastructure changes, and policy interventions on the mobility patterns of the Swedish population.

    Methodology

    This dataset contains 10.2 million statistically simulated agents representing the population of Sweden, their socio-economic characteristics, and their activity plans for an average weekday. To prepare the data for the MATSim simulation, we randomly divided all the agents into 10 batches. Each batch's agents are then simulated in MATSim using a multi-modal network that combines road networks and public transit data in Sweden, built with the package pt2matsim (https://github.com/matsim-org/pt2matsim).

    The agents' daily activity plans along with the road network serve as the primary inputs in the MATSim environment which ensures iterative replanning while aiming for a convergence on optimal activity plans for all the agents. Subsequently, the individual mobility trajectories of the agents from the MATSim simulation are retrieved.

    The activity plans of the individual agents extracted from the MATSim simulation output data are then further processed. All agents with negative utility score and negative activity time corresponding to at least one activity are filtered out as the ‘infeasible’ agents. The dataset ‘Synthetic Agents’ contains all synthetic agents regardless of their ‘feasibility’ (0=excluded & 1=included in plans and trajectories). In the other datasets, only agents with feasible activity plans are included.
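
    The feasibility rule above can be sketched as a simple filter. The record layout and function name are hypothetical, and the sketch reads the rule literally: an agent is infeasible when its utility score is negative and at least one of its activities has a negative duration.

```python
def is_infeasible(score, activity_minutes):
    """Literal reading of the rule: negative score AND some negative activity time."""
    return score < 0 and any(t < 0 for t in activity_minutes)

# Hypothetical agents: (agent id, utility score, activity durations in minutes).
agents = [
    (1, 120.5, [480.0, 60.0, 600.0]),
    (2, -35.0, [480.0, -15.0, 600.0]),
    (3, -5.0, [300.0, 90.0]),
]

# feasibility flag as used in the 'Synthetic Agents' dataset: 1 = feasible, 0 = infeasible
feasibility = {pid: 0 if is_infeasible(score, acts) else 1 for pid, score, acts in agents}
feasible_ids = [pid for pid, flag in feasibility.items() if flag == 1]
```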

    The simulation setup adheres to the MATSim 13.0 benchmark scenario, with slight adjustments. The strategy for replanning integrates BestScore (60%), TimeAllocationMutator (30%), and ReRoute (10%); the percentages denote the proportion of agents utilizing these strategies. In each iteration of the simulation, the agents adopt these strategies to adjust their activity plans. The "BestScore" strategy retains the plan with the highest score from the previous iteration, selecting the most successful strategy an agent has employed up until that point. The "TimeAllocationMutator" modifies the end times of activities by introducing random shifts within a specified range, allowing for the exploration of different schedules. The "ReRoute" strategy enables agents to alter their current routes, potentially optimizing travel based on updated information or preferences. These strategies are detailed further in the work of W. Axhausen et al. (2016), which provides comprehensive insights into their implementation and impact within the context of transport simulation modeling.
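
    To make the strategy proportions concrete, here is a toy sketch in Python. It is not MATSim code (MATSim is a Java framework with its own configuration); the function names, the shift range, and the random seed are assumptions used only to illustrate weighted strategy assignment and a random end-time shift.

```python
import random
from collections import Counter

# Proportions of agents using each replanning strategy, as stated above.
STRATEGY_WEIGHTS = {"BestScore": 0.6, "TimeAllocationMutator": 0.3, "ReRoute": 0.1}

def assign_strategies(agent_ids, rng):
    """Draw one strategy per agent for a single iteration, using the weights."""
    names = list(STRATEGY_WEIGHTS)
    weights = list(STRATEGY_WEIGHTS.values())
    picks = rng.choices(names, weights=weights, k=len(agent_ids))
    return dict(zip(agent_ids, picks))

def mutate_end_time(end_seconds, max_shift, rng):
    """Shift an activity end time randomly within +/- max_shift, kept inside one day."""
    shifted = end_seconds + rng.randint(-max_shift, max_shift)
    return max(0, min(86399, shifted))

rng = random.Random(0)  # fixed seed so the illustration is repeatable
assignment = assign_strategies(range(10_000), rng)
shares = {name: count / 10_000 for name, count in Counter(assignment.values()).items()}
new_end = mutate_end_time(17 * 3600, max_shift=1800, rng=rng)
```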

    Data Description

    (1) Synthetic Agents

    This dataset contains all agents in Sweden and their socioeconomic characteristics.

    The attribute ‘feasibility’ has two categories: feasible agents (73%), and infeasible agents (27%). Infeasible agents are agents with negative utility score and negative activity time corresponding to at least one activity.

    File name: 1_syn_pop_all.parquet

    Column | Description | Data type | Unit
    PId | Agent ID | Integer | -
    Deso | Zone code of Demographic statistical areas (DeSO) [1] | String |
    kommun | Municipality code | Integer |
    marital | Marital Status (single/ couple/ child) | String |
    sex | Gender (0 = Male, 1 = Female) | Integer |
    age | Age | Integer |
    HId | A unique identifier for households | Integer |
    HHtype | Type of households (single/ couple/ other) | String |
    HHsize | Number of people living in the households | Integer |
    num_babies | Number of children less than six years old in the household | Integer |
    employment | Employment Status (0 = Not Employed, 1 = Employed) | Integer |
    studenthood | Studenthood Status (0 = Not Student, 1 = Student) | Integer |
    income_class | Income Class (0 = No Income, 1 = Low Income, 2 = Lower-middle Income, 3 = Upper-middle Income, 4 = High Income) | Integer |
    num_cars | Number of cars owned by an individual | Integer |
    HHcars | Number of cars in the household | Integer |
    feasibility | Status of the individual (1=feasible, 0=infeasible) | Integer |

    [1] https://www.scb.se/vara-tjanster/oppna-data/oppna-geodata/deso--demografiska-statistikomraden/

    (2) Activity Plans of the Agents

    The dataset contains the car agents’ (agents that use cars on the simulated day) activity plans for a simulated average weekday.

    File name: 2_plans_i.parquet, i = 0, 1, 2, ..., 8, 9. (10 files in total)

    Column | Description | Data type | Unit
    act_purpose | Activity purpose (work/ home/ school/ other) | String | -
    PId | Agent ID | Integer | -
    act_end | End time of activity (0:00:00 – 23:59:59) | String | hour:minute:second
    act_id | Activity index of each agent | Integer | -
    mode | Transport mode to reach the activity location | String | -
    POINT_X | Coordinate X of activity location (SWEREF99TM) | Float | metre
    POINT_Y | Coordinate Y of activity location (SWEREF99TM) | Float | metre
    dep_time | Departure time (0:00:00 – 23:59:59) | String | hour:minute:second
    score | Utility score of the simulation day as obtained from MATSim | Float | -
    trav_time | Travel time to reach the activity location | String | hour:minute:second
    trav_time_min | Travel time in decimal minute | Float | minute
    act_time | Activity duration in decimal minute | Float | minute
    distance | Travel distance between the origin and the destination | Float | km
    speed | Travel speed to reach the activity location | Float | km/h
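
    The plan columns mix hour:minute:second strings with decimal-minute floats, so a small conversion helper is useful when working with them. The sketch below is an assumption about how the two representations relate (for example, that `trav_time_min` is `trav_time` expressed in minutes and that speed is distance over travel time); the dataset documentation does not state the exact derivation, and the function names are hypothetical.

```python
def hms_to_minutes(hms: str) -> float:
    """Convert an 'hour:minute:second' string such as '7:30:00' to decimal minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60

def speed_kmh(distance_km: float, trav_time: str) -> float:
    """Average speed in km/h from a distance in km and a travel-time string."""
    minutes = hms_to_minutes(trav_time)
    if minutes == 0:
        return 0.0  # avoid division by zero for zero-length trips
    return distance_km / (minutes / 60)

trav_time_min = hms_to_minutes("1:15:30")
speed = speed_kmh(10.0, "0:30:00")
```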

    (3) Travel Trajectories of the Agents

    This dataset contains the driving trajectories of all the agents on the road network, and the public transit vehicles used by these agents, including buses, ferries, trams etc. The files are produced by MATSim simulations and organised into 10 ‘*.parquet’ files (representing different batches of simulation) corresponding to each plan file.

    File name: 3_events_i.parquet, i = 0, 1, 2, ..., 8, 9. (10 files in total)

    Column | Description | Data type | Unit
    time | Time in second in a simulation day (0-86399) | Integer | second
    type | Event type defined by MATSim simulation* | String |
    person | Agent ID | Integer |
    link | Nearest road link consistent with the road network | String |
    vehicle | Vehicle ID identical to person | Integer |
    from_node | Start node of the link | Integer |
    to_node | End node of the link | Integer |

    * One typical episode of MATSim simulation events: Activity ends (actend) -> Agent’s vehicle enters traffic (vehicle enters traffic) -> Agent’s vehicle moves from previous road segment to its next connected one (left link) -> Agent’s vehicle leaves traffic for activity (vehicle leaves traffic) -> Activity starts (actstart)
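
    The episode structure above suggests how trips can be recovered from an event stream. The following is a minimal sketch: the event tuples are hypothetical, and it assumes the `type` column stores the event names exactly as spelled in the parentheses above, which should be checked against the actual files.

```python
from collections import defaultdict

def links_per_trip(events):
    """Group time-sorted (time, type, person) events into (start, end, links) trips."""
    open_trips = {}
    trips = defaultdict(list)
    for time, event_type, person in events:
        if event_type == "vehicle enters traffic":
            open_trips[person] = {"start": time, "links": 0}
        elif event_type == "left link" and person in open_trips:
            open_trips[person]["links"] += 1
        elif event_type == "vehicle leaves traffic" and person in open_trips:
            trip = open_trips.pop(person)
            trips[person].append((trip["start"], time, trip["links"]))
    return dict(trips)

# One hypothetical episode for agent 1, following the sequence described above.
events = [
    (25200, "actend", 1),
    (25210, "vehicle enters traffic", 1),
    (25260, "left link", 1),
    (25320, "left link", 1),
    (25400, "vehicle leaves traffic", 1),
    (25400, "actstart", 1),
]
trips = links_per_trip(events)
```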

    (4) Road Network

    This dataset contains the road network.

    File name: 4_network.shp

    | Column | Description | Data type | Unit |
    |---|---|---|---|
    | length | The length of road link | Float | metre |
    | freespeed | Free speed | Float | km/h |
    | capacity | Number of vehicles | Integer | - |
    | permlanes | Number of lanes | Integer | - |
    | oneway | Whether the segment is one-way (0=no, 1=yes) | Integer | - |
    | modes | Transport mode | String | - |
    | from_node | Start node of the link | Integer | - |
    | to_node | End node of the link | Integer | - |
    | geometry | LINESTRING (SWEREF99TM) | geometry | metre |
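One common use of the length and freespeed columns is to estimate the free-flow travel time of a link. The sketch below assumes the units listed in the table above (length in metres, freespeed in km/h). The function name is hypothetical and the calculation is a generic unit conversion, not a procedure taken from the data document.

```python
def free_flow_seconds(length_m: float, freespeed_kmh: float) -> float:
    """Time in seconds to traverse a link at its free speed."""
    if freespeed_kmh <= 0:
        raise ValueError("freespeed must be positive")
    speed_m_per_s = freespeed_kmh * 1000 / 3600  # km/h -> m/s
    return length_m / speed_m_per_s


# A 500 m link at 60 km/h takes roughly 30 seconds.
print(round(free_flow_seconds(500.0, 60.0), 1))
```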

    Additional Notes

    This research is funded by the RISE Research Institutes of Sweden, the Swedish Research Council for Sustainable Development (Formas, project number 2018-01768), and Transport Area of Advance, Chalmers.

    Contributions

    YL designed the simulation, analyzed the simulation data, and, along with CT, executed the simulation. CT, SD, FS, and SY conceptualized the model (SySMo), with CT and SD further developing the model to produce agents and their activity plans. KG wrote the data document. All authors reviewed, edited, and approved the final document.

  10. Dataset used for analysis.

    • plos.figshare.com
    application/csv
    Updated Apr 1, 2024
    So Yeon Joyce Kong; Ankit Acharya; Omkar Basnet; Solveig Haukås Haaland; Rejina Gurung; Øystein Gomo; Fredrik Ahlsson; Øyvind Meinich-Bache; Anna Axelin; Yuba Nidhi Basula; Sunil Mani Pokharel; Hira Subedi; Helge Myklebust; Ashish KC (2024). Dataset used for analysis. [Dataset]. http://doi.org/10.1371/journal.pdig.0000471.s007
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Apr 1, 2024
    Dataset provided by
    PLOS Digital Health
    Authors
    So Yeon Joyce Kong; Ankit Acharya; Omkar Basnet; Solveig Haukås Haaland; Rejina Gurung; Øystein Gomo; Fredrik Ahlsson; Øyvind Meinich-Bache; Anna Axelin; Yuba Nidhi Basula; Sunil Mani Pokharel; Hira Subedi; Helge Myklebust; Ashish KC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: This study aims to assess the acceptability of a novel technology, MAchine Learning Application (MALA), among the mothers of newborns who required resuscitation.

    Setting: This study took place at Bharatpur Hospital, which is the second-largest public referral hospital with 13 000 deliveries per year in Nepal.

    Design: This is a cross-sectional survey.

    Data collection and analysis: Data collection took place from January 21 to February 13, 2022. Self-administered questionnaires on acceptability (ranged 1–5 scale) were collected from participating mothers. The acceptability of the MALA system, which included video and audio recordings of the newborn resuscitation, was examined among mothers according to their age, parity, education level and technology use status using a stratified analysis.

    Results: The median age of 21 mothers who completed the survey was 25 years (range 18–37). Among them, 11 mothers (52.4%) completed their bachelor’s or master’s level of education, 13 (61.9%) delivered first child, 14 (66.7%) owned a computer and 16 (76.2%) carried a smartphone. Overall acceptability was high that all participating mothers positively perceived the novel technology with video and audio recordings of the infant’s care during resuscitation. There was no statistical difference in mothers’ acceptability of MALA system, when stratified by mothers’ age, parity, or technology usage (p>0.05). When the acceptability of the technology was stratified by mothers’ education level (up to higher secondary level vs. bachelor’s level or higher), mothers with Bachelor’s degree or higher more strongly felt that they were comfortable with the infant’s care being video recorded (p = 0.026) and someone using a tablet when observing the infant’s care (p = 0.046). Compared with those without a computer (n = 7), mothers who had a computer at home (n = 14) more strongly agreed that they were comfortable with someone observing the resuscitation activity of their newborns (71.4% vs. 14.3%) (p = 0.024).

    Conclusion: The novel technology using video and audio recordings for newborn resuscitation was accepted by mothers in this study. Its application has the potential to improve resuscitation quality in low-and-middle income settings, given proper informed consent and data protection measures are in place.

  11. Global Development Indicators (2000-2020)

    • kaggle.com
    Updated May 11, 2025
    Michael Matta (2025). Global Development Indicators (2000-2020) [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/global-development-indicators-2000-2020/code
    Explore at:
    Dataset updated
    May 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Michael Matta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Global Economic, Environmental, Health, and Social indicators Ready for Analysis

    📝 Description

    This comprehensive dataset merges global economic, environmental, technological, and human development indicators from 2000 to 2020. Sourced and transformed from multiple public datasets via Google BigQuery, it is designed for advanced exploratory data analysis, machine learning, policy modeling, and sustainability research.

    Curated by combining and transforming data from the Google BigQuery Public Data program, this dataset offers a harmonized view of global development across more than 40 key indicators spanning over two decades (2000–2020). It supports research across multiple domains such as:

    • Economic Growth
    • Climate Sustainability
    • Digital Transformation
    • Public Health
    • Human Development
    • Resilience and Governance

    📅 Temporal Coverage

    • Years: 2000–2020
    • Includes calculated features:

      • years_since_2000
      • years_since_century
      • is_pandemic_period (binary indicator for pandemic periods)

    🌍 Geographic Scope

    • Countries: Global (identified by ISO country codes)
    • Regions and Income Groups included for aggregated analysis

    📊 Key Feature Groups

    • Economic Indicators:

      • GDP (USD), GDP per capita
      • FDI, inflation, unemployment, economic growth index
    • Environmental Indicators:

      • CO₂ emissions, renewable energy use
      • Forest area, green transition score, CO₂ intensity
    • Technology & Connectivity:

      • Internet usage, mobile subscriptions
      • Digital readiness score, digital connectivity index
    • Health & Education:

      • Life expectancy, child mortality
      • School enrollment, healthcare capacity, health development ratio
    • Governance & Resilience:

      • Governance quality, global resilience
      • Human development composite, ecological preservation

    🔍 Use Cases

    • Trend analysis over time
    • Country-level comparisons
    • Modeling development outcomes
    • Predictive analytics on sustainability or human development
    • Correlation and clustering across multiple indicators

    ⚠️ Note on Missing Region and Income Group Data

    Approximately 18% of the entries in the region and income_group columns are null. This is primarily due to the inclusion of aggregate regions (e.g., Arab World, East Asia & Pacific, Africa Eastern and Southern) and non-country classifications (e.g., Early-demographic dividend, Central Europe and the Baltics). These entries represent groups of countries with diverse income levels and geographic characteristics, making it inappropriate or misleading to assign a single region or income classification. In some cases, the data source may have intentionally left these fields blank to avoid oversimplification or due to a lack of standardized classification.
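Because rows with a missing region or income_group mostly represent aggregates rather than single countries, it can help to separate them before doing country-level comparisons. The sketch below works on plain dictionaries and uses the column names from this description. The function name and the sample rows are hypothetical, and treating every row with a missing value as unclassified is an assumption.

```python
from typing import Dict, List, Tuple


def split_by_classification(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate rows with both region and income_group from the rest."""
    classified, unclassified = [], []
    for row in rows:
        if row.get("region") is None or row.get("income_group") is None:
            unclassified.append(row)  # likely an aggregate such as "Arab World"
        else:
            classified.append(row)
    return classified, unclassified


# Hypothetical sample rows for illustration only.
rows = [
    {"country_name": "Sweden", "region": "Europe", "income_group": "high income"},
    {"country_name": "Arab World", "region": None, "income_group": None},
]
countries, others = split_by_classification(rows)
print(len(countries), len(others))  # 1 1
```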

    📋 Column Descriptions

    • year: Year of the recorded data, representing a time series for each country.
    • country_code: Unique code assigned to each country (ISO-3166 standard).
    • country_name: Name of the country corresponding to the data.
    • region: Geographical region of the country (e.g., Africa, Asia, Europe).
    • income_group: Income classification based on Gross National Income (GNI) per capita (low, lower-middle, upper-middle, high income).
    • currency_unit: Currency used in the country (e.g., USD, EUR).
    • gdp_usd: Gross Domestic Product (GDP) in USD (millions or billions).
    • population: Total population of the country for the given year.
    • gdp_per_capita: GDP divided by population (economic output per person).
    • inflation_rate: Annual rate of inflation (price level rise).
    • unemployment_rate: Percentage of the labor force unemployed but seeking employment.
    • fdi_pct_gdp: Foreign Direct Investment (FDI) as a percentage of GDP.
    • co2_emissions_kt: Total CO₂ emissions in kilotons (kt).
    • energy_use_per_capita: Energy consumption per person (kWh).
    • renewable_energy_pct: Percentage of energy consumption from renewable sources.
    • forest_area_pct: Percentage of total land area covered by forests.
    • electricity_access_pct: Percentage of the population with access to electricity.
    • life_expectancy: Average life expectancy at birth.
    • child_mortality: Deaths of children under 5 per 1,000 live births.
    • school_enrollment_secondary: Percentage of population enrolled in secondary education.
    • health_expenditure_pct_gdp: Percentage of GDP spent on healthcare.
    • hospital_beds_per_1000: Hospital beds per 1,000 people.
    • physicians_per_1000: Physicians (doctors) per 1,000 people.
    • internet_usage_pct: Percentage of population with internet access.
    • **mobile_subscriptions_per_10...
  12. Vital Signs: Jobs by Wage Level - Metro

    • data.bayareametro.gov
    application/rdfxml +5
    Updated Jan 18, 2019
    (2019). Vital Signs: Jobs by Wage Level - Metro [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Jobs-by-Wage-Level-Metro/bt32-8udw
    Explore at:
    Available download formats: csv, tsv, application/rssxml, application/rdfxml, xml, json
    Dataset updated
    Jan 18, 2019
    Description

    VITAL SIGNS INDICATOR Jobs by Wage Level (EQ1)

    FULL MEASURE NAME Distribution of jobs by low-, middle-, and high-wage occupations

    LAST UPDATED January 2019

    DESCRIPTION Jobs by wage level refers to the distribution of jobs by low-, middle- and high-wage occupations. In the San Francisco Bay Area, low-wage occupations have a median hourly wage of less than 80% of the regional median wage; median wages for middle-wage occupations range from 80% to 120% of the regional median wage, and high-wage occupations have a median hourly wage above 120% of the regional median wage.

    DATA SOURCE California Employment Development Department OES (2001-2017) http://www.labormarketinfo.edd.ca.gov/data/oes-employment-and-wages.html

    American Community Survey (2001-2017) http://api.census.gov

    CONTACT INFORMATION vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator) Jobs are determined to be low-, middle-, or high-wage based on the median hourly wage of their occupational classification in the most recent year. Low-wage jobs are those that pay below 80% of the regional median wage. Middle-wage jobs are those that pay between 80% and 120% of the regional median wage. High-wage jobs are those that pay above 120% of the regional median wage. Regional median hourly wages are estimated from the American Community Survey and are published on the Vital Signs Income indicator page. For the national context analysis, occupation wage classifications are unique to each metro area. A low-wage job in New York, for instance, may be a middle-wage job in Miami. For the Bay Area in 2017, the median hourly wage for low-wage occupations was less than $20.86 per hour. For middle-wage jobs, the median ranged from $20.86 to $31.30 per hour; and for high-wage jobs, the median wage was above $31.30 per hour.
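The 80% and 120% cut-offs described above can be expressed as a small classification function. This is a sketch only: the function name is hypothetical, the regional median in the example is an illustrative value rather than a published figure, and assigning wages that fall exactly on a cut-off to the middle category is an assumption.

```python
def wage_level(median_hourly_wage: float, regional_median: float) -> str:
    """Classify an occupation as low-, middle-, or high-wage."""
    low_cutoff = 0.8 * regional_median
    high_cutoff = 1.2 * regional_median
    if median_hourly_wage < low_cutoff:
        return "low"
    if median_hourly_wage > high_cutoff:
        return "high"
    return "middle"  # 80% to 120% of the regional median


# Illustrative regional median of $25.00 per hour (not a published figure).
print(wage_level(18.0, 25.0))  # low
print(wage_level(25.0, 25.0))  # middle
print(wage_level(35.0, 25.0))  # high
```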

    Occupational employment and wage information comes from the Occupational Employment Statistics (OES) program. Regional and subregional data is published by the California Employment Development Department. Metro data is published by the Bureau of Labor Statistics. The OES program collects data on wage and salary workers in nonfarm establishments to produce employment and wage estimates for some 800 occupations. Data from non-incorporated self-employed persons are not collected, and are not included in these estimates. Wage estimates represent a three-year rolling average.

    Due to changes in reporting during the analysis period, subregion data from the EDD OES have been aggregated to produce geographies that can be compared over time. West Bay is San Mateo, San Francisco, and Marin counties. North Bay is Sonoma, Solano and Napa counties. East Bay is Alameda and Contra Costa counties. South Bay is Santa Clara County from 2001-2004 and Santa Clara and San Benito counties from 2005-2017.

    Due to changes in occupation classifications during the analysis period, all occupations have been reassigned to 2010 SOC codes. For pre-2009 reporting years, all employment in occupations that were split into two or more 2010 SOC occupations are assigned to the first 2010 SOC occupation listed in the crosswalk table provided by the Census Bureau. This method assumes these occupations always fall in the same wage category, and sensitivity analysis of this reassignment method shows this is true in most cases.

    In order to use OES data for time series analysis, several steps were taken to handle missing wage or employment data. For some occupations, such as airline pilots and flight attendants, no wage information was provided and these were removed from the analysis. Other occupations did not record a median hourly wage (mostly due to irregular work hours) but did record an annual average wage. Nearly all these occupations were in education (i.e. teachers). In this case, a 2080 hour-work year was assumed and [annual average wage/2080] was used as a proxy for median income. Most of these occupations were classified as high-wage, thus dispelling concern of underestimating a median wage for a teaching occupation that requires less than 2080 hours of work a year (equivalent to 12 months fulltime). Finally, the OES has missing employment data for occupations across the time series. To make the employment data comparable between years, gaps in employment data for occupations are ‘filled-in’ using linear interpolation if there are at least two years of employment data found in OES. Occupations with less than two years of employment data were dropped from the analysis. Over 80% of interpolated cells represent missing employment data for just one year in the time series. While this interpolating technique may impact year-over-year comparisons, the long-term trends represented in the analysis generally are accurate.
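The two gap-handling steps described above, the 2080-hour proxy and the linear interpolation of missing employment years, can be sketched as follows. The function names are hypothetical. Filling only gaps that lie between two observed years (leaving leading and trailing gaps empty) is an assumption, since the notes do not say how the ends of a series were treated.

```python
from typing import List, Optional


def hourly_proxy(annual_average_wage: float, hours_per_year: int = 2080) -> float:
    """Approximate an hourly wage from an annual average wage."""
    return annual_average_wage / hours_per_year


def fill_employment_gaps(
    series: List[Optional[float]],
) -> Optional[List[Optional[float]]]:
    """Linearly interpolate interior gaps in a yearly series.

    Returns None when fewer than two years are observed, mirroring the
    rule that such occupations are dropped from the analysis.
    """
    known = [i for i, value in enumerate(series) if value is not None]
    if len(known) < 2:
        return None
    filled = list(series)
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


print(hourly_proxy(62400))  # 30.0
print(fill_employment_gaps([100.0, None, 120.0, None, None, 150.0]))
# [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
```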

  13. United States US: Poverty Headcount Ratio at $5.50 a Day: 2011 PPP: % of...

    • ceicdata.com
    CEICdata.com, United States US: Poverty Headcount Ratio at $5.50 a Day: 2011 PPP: % of Population [Dataset]. https://www.ceicdata.com/en/united-states/poverty/us-poverty-headcount-ratio-at-550-a-day-2011-ppp--of-population
    Explore at:
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 1979 - Dec 1, 2016
    Area covered
    United States
    Description

    United States US: Poverty Headcount Ratio at $5.50 a Day: 2011 PPP: % of Population data was reported at 2.000 % in 2016. This stayed constant from the previous number of 2.000 % for 2013. United States US: Poverty Headcount Ratio at $5.50 a Day: 2011 PPP: % of Population data is updated yearly, averaging 1.500 % from Dec 1979 (Median) to 2016, with 11 observations. The data reached an all-time high of 2.000 % in 2016 and a record low of 1.200 % in 1986. United States US: Poverty Headcount Ratio at $5.50 a Day: 2011 PPP: % of Population data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Poverty.

    Poverty headcount ratio at $5.50 a day is the percentage of the population living on less than $5.50 a day at 2011 international prices. As a result of revisions in PPP exchange rates, poverty rates for individual countries cannot be compared with poverty rates reported in earlier editions.

    World Bank, Development Research Group. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are from the Luxembourg Income Study database. For more information and methodology, please see PovcalNet (http://iresearch.worldbank.org/PovcalNet/index.htm).

    The World Bank’s internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than one thousand six hundred household surveys across 164 countries in six regions and 25 other high income countries (industrialized economies). While income distribution data are published for all countries with data available, poverty data are published for low- and middle-income countries and countries eligible to receive loans from the World Bank (such as Chile) and recently graduated countries (such as Estonia) only. The aggregated numbers for low- and middle-income countries correspond to the totals of 6 regions in PovcalNet, which include low- and middle-income countries and countries eligible to receive loans from the World Bank (such as Chile) and recently graduated countries (such as Estonia). See PovcalNet (http://iresearch.worldbank.org/PovcalNet/WhatIsNew.aspx) for definitions of geographical regions and industrialized countries.
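The definition of the headcount ratio given above (the percentage of the population living on less than $5.50 a day) can be illustrated with a short calculation. This is a toy sketch: the function name and the incomes are hypothetical, and it ignores the survey weighting and PPP conversion that real estimates rely on.

```python
from typing import Sequence


def poverty_headcount_ratio(
    daily_incomes: Sequence[float], poverty_line: float = 5.50
) -> float:
    """Percentage of people whose daily income is below the poverty line."""
    if not daily_incomes:
        raise ValueError("at least one income is required")
    poor = sum(1 for income in daily_incomes if income < poverty_line)
    return 100.0 * poor / len(daily_incomes)


# Hypothetical daily incomes, assumed to be in 2011 PPP dollars.
print(poverty_headcount_ratio([3.0, 6.0, 20.0, 50.0]))  # 25.0
```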

  14. Milwaukee Area Renters Study (MARS)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 6, 2016
    Matthew Desmond (2016). Milwaukee Area Renters Study (MARS) [Dataset]. http://doi.org/10.7910/DVN/BLUU3U
    Explore at:
    Dataset updated
    Nov 6, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Desmond
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Milwaukee
    Description

    Designed to collect new data related to housing, poverty, and urban life, the Milwaukee Area Renters Study (MARS) is an in-person survey of 1,086 households in Milwaukee. One person per household, usually an adult leaseholder, was interviewed. The MARS instrument was comprised of more than 250 unique items and administered in-person in English and Spanish. The University of Wisconsin Survey Center supervised data collection, which took place between 2009 and 2011. The MARS sample was limited to renters. Nationwide, the majority of low-income families live in rental housing, and most receive no federal housing assistance. Except in exceptional cities with very high housing costs, the rental population is comprised of some upper- and middle-class households who prefer renting and most of the cities’ low-income households who are excluded both from public housing and homeownership. To focus on urban renters in the private market, then, is to focus on the lived experience of most low-income families living in cities. MARS was funded by the John D. and Catherine T. MacArthur Foundation, through its “How Housing Matters” initiative.

  15. Outputs generated in the last 5 years.

    • plos.figshare.com
    xls
    Updated Nov 20, 2024
    Naomi Waithira; Mavuto Mukaka; Evelyne Kestelyn; Keitcheya Chotthanawathit; Dung Nguyen Thi Phuong; Hoa Nguyen Thanh; Anne Osterrieder; Trudie Lang; Phaik Yeong Cheah (2024). Outputs generated in the last 5 years. [Dataset]. http://doi.org/10.1371/journal.pgph.0003392.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Naomi Waithira; Mavuto Mukaka; Evelyne Kestelyn; Keitcheya Chotthanawathit; Dung Nguyen Thi Phuong; Hoa Nguyen Thanh; Anne Osterrieder; Trudie Lang; Phaik Yeong Cheah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sharing holds promise to accelerate innovative discoveries through artificial intelligence (AI) and traditional analytics. However, it remains unclear whether these prospects translate into tangible benefits in improving health care and scientific progress. In this cross-sectional study, we investigate current data reuse practices and explore ways to enhance the use of existing data in clinical research, focusing on low- and middle-income countries. 643 clinical researchers and data professionals participated in the study. 55.5% analysed clinical trial data. 75.3% of data users analysed data from observational studies obtained mainly through personal requests or downloads from publicly available sources. Data was mainly used to influence the design of new studies or in pooled and individual patient-level data meta-analyses. Key benefits realised were career progression and academic qualification, with more gains reported by users affiliated with high-income and upper-middle-income countries (p = 0.046, chi = 8.0). Scientific progress through publications and collaborations was associated with gender (p = 0.012, chi = 10.9), with males more likely to contribute. Benefits to the public although minimal, were associated with career seniority (p = 0.001, chi = 18.8), with works by senior researchers being more likely to influence health policy or treatment guidelines. Although 54% of the respondents accessed at least 3 datasets in the past 5 years, 79.4% of data users encountered difficulty finding relevant data for planned analyses. Researchers affiliated with low and middle income institutions reported more difficulty interpreting data (p = 0.012, chi = 25.7), while challenges with language were regionally influenced (p = 0.000, chi = 51.3) and more commonly reported by researchers in Latin America and South and East Asia institutions. 
    While the utilisation of shared data is lower than expected, focused efforts to enrich existing data with extensive metadata using standard terminologies can enhance data findability. Investment in training programmes, building professional networks, and mentorship in data science may improve the quality of data generated and increase researchers’ ability to use existing datasets.


