75 datasets found
  1. Cora Dataset

    • linkagelibrary.icpsr.umich.edu
    • openicpsr.org
    • +3more
    delimited
    Updated Apr 2, 2019
    Cite
    Mahin Ramezani (2019). Cora Dataset [Dataset]. http://doi.org/10.3886/E109167V2
    Available download formats: delimited
    Dataset updated
    Apr 2, 2019
    Dataset provided by
    Texas A&M University
    Authors
    Mahin Ramezani
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cora data contains bibliographic records of machine learning papers that have been manually clustered into groups that refer to the same publication. Originally, Cora was prepared by Andrew McCallum, and his versions of this data set are available on his Data web page. The data is also hosted here. Note that various versions of the Cora data set have been used by many publications in record linkage and entity resolution over the years.

  2. Data from: Dataset of the manuscript "What is local research? Towards a...

    • zenodo.org
    • produccioncientifica.ugr.es
    bin
    Updated Nov 20, 2024
    Cite
    Victoria Di Césare; Nicolas Robinson-Garcia (2024). Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods" [Dataset]. http://doi.org/10.5281/zenodo.14190851
    Available download formats: bin
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Victoria Di Césare; Nicolas Robinson-Garcia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.

  3. CFSR Process as Defined by Regulation (45 CFR 1355.31-37)

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 30, 2025
    Cite
    Administration for Children and Families (2025). CFSR Process as Defined by Regulation (45 CFR 1355.31-37) [Dataset]. https://catalog.data.gov/dataset/cfsr-process-as-defined-by-regulation-45-cfr-1355-31-37
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This graphic presents the timeline for selected steps in the Child and Family Services Review process. Metadata-only record linking to the original dataset. Open original dataset below.

  4. CCWIS Data and Reporting Requirements Presentation

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    Cite
    Administration for Children and Families (2025). CCWIS Data and Reporting Requirements Presentation [Dataset]. https://catalog.data.gov/dataset/ccwis-data-and-reporting-requirements-presentation
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This DSS presentation describes the Comprehensive Child Welfare Information System (CCWIS) Data Requirements as defined by Federal Regulation 45 CFR 1355.52(b) and CCWIS Reporting Requirements as defined by Federal Regulation at 45 CFR 1355.52(c) and provides examples. Metadata-only record linking to the original dataset. Open original dataset below.

  5. Health regions: boundaries and correspondence with census geography, 2007...

    • dataone.org
    • borealisdata.ca
    Updated Feb 22, 2024
    Cite
    Statistics Canada (2024). Health regions: boundaries and correspondence with census geography, 2007 [Canada] [Excel files, digital mapping files] [Dataset]. http://doi.org/10.5683/SP3/7JF5RN
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Borealis
    Authors
    Statistics Canada
    Area covered
    Canada
    Description

    This issue describes in detail the health region limits as of December 2007 and their correspondence with the 2006 and 2001 Census geography. Health regions are defined by the provinces and represent administrative areas or regions of interest to health authorities. This product contains correspondence files (linking health regions to census geographic codes) and digital boundary files. User documentation provides an overview of health regions, sources, methods, limitations and product description (file format and layout). In addition to the geographic files, this product also includes 2006 Census data (basic profile) for health regions. For current Health Regions data, refer to Statistics Canada.

  6. Synthetic Administrative Data: Census 1991, 2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Feb 21, 2024
    Cite
    Shlomo, N, University of Manchester; Kim, M, University of Manchester (2024). Synthetic Administrative Data: Census 1991, 2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-856310
    Dataset updated
    Feb 21, 2024
    Authors
    Shlomo, N, University of Manchester; Kim, M, University of Manchester
    Area covered
    United Kingdom
    Description

    We create a synthetic administrative dataset to be used in the development of the R package for calculating quality indicators for administrative data (see: https://github.com/sook-tusk/qualadmin) that mimic the properties of a real administrative dataset according to specifications by the ONS. Taking over 1 million records from a synthetic 1991 UK census dataset, we deleted records, moved records to a different geography and duplicated records to a different geography according to pre-specified proportions for each broad ethnic group (White, Non-white) and gender (males, females). The final size of the synthetic administrative data was 1033664 individuals.
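
    The delete, move, and duplicate steps described above can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function name `perturb_records`, the field names (`ethnic_group`, `gender`, `geography`), and the rate values are all hypothetical, and the sketch applies per-record probabilities rather than exact pre-specified proportions. It also assumes at least two geographies exist.

```python
import random


def perturb_records(records, rates, geographies, seed=0):
    """Delete, move, or duplicate records with group-specific probabilities.

    `rates` maps (ethnic_group, gender) to a dict with hypothetical keys
    "delete", "move" and "duplicate", each a probability in [0, 1].
    """
    rng = random.Random(seed)
    result = []
    for rec in records:
        r = rates[(rec["ethnic_group"], rec["gender"])]
        # Candidate geographies other than the record's current one.
        others = [g for g in geographies if g != rec["geography"]]
        u = rng.random()
        if u < r["delete"]:
            continue  # record is dropped (undercoverage)
        if u < r["delete"] + r["move"]:
            # record appears only in a different geography
            result.append({**rec, "geography": rng.choice(others)})
            continue
        result.append(dict(rec))  # record is kept unchanged
        if u < r["delete"] + r["move"] + r["duplicate"]:
            # an extra copy appears in a different geography (overcoverage)
            result.append({**rec, "geography": rng.choice(others)})
    return result
```

    Drawing a single uniform number per record makes the three outcomes mutually exclusive, so the three rates for a group should sum to at most 1.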

    National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS) as they are undergoing transformations in their statistical systems to make more use of administrative data for future censuses and population statistics. Administrative data are defined as secondary data sources since they are produced by other agencies as a result of an event or a transaction relating to administrative procedures of organisations, public administrations and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether it is a single data source or integrated with other data sources such as survey data. For a single administrative data, one of the main sources of error is coverage and representation to the target population of interest. This is particularly relevant when administrative data is delivered over time, such as tax data for maintaining the Business Register. For sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess if the administrative data is representative to the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data. 
Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. Using administrative data to build a spine, survey data can be integrated using record linkage and statistical matching approaches on a set of common matching variables. This will be the topic for sub-project 2, which will be split into several topics of research. The first topic is whether adding statistical predictions and correlation structures improves the linkage and data integration. The second topic is to research a mass imputation framework for imputing missing target variables in the statistical register where the missing data may be due to multiple underlying mechanisms. Therefore, the third topic will aim to improve the mass imputation framework to mitigate against possible measurement errors, for example by adding benchmarks and other constraints into the approaches. On completion of a statistical register, estimates for key target variables at local areas can easily be aggregated. However, it is essential to also measure the precision of these estimates through mean square errors and this will be the fourth topic of the sub-project. Finally, this new way of producing official statistics is compared to the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates - a key question under investigation by survey statisticians in the last decade.

  7. How are complex needs defined?

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 5, 2025
    Cite
    Administration for Children and Families (2025). How are complex needs defined? [Dataset]. https://data.virginia.gov/dataset/how-are-complex-needs-defined
    Available download formats: html
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    ACF Children's Bureau resource

    Metadata-only record linking to the original dataset. Open original dataset below.

  8. CCWIS Design, Software Repository, and CCWIS Options Presentation

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 5, 2025
    Cite
    Administration for Children and Families (2025). CCWIS Design, Software Repository, and CCWIS Options Presentation [Dataset]. https://data.virginia.gov/dataset/ccwis-design-software-repository-and-ccwis-options-presentation
    Available download formats: html
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This DSS presentation describes the Comprehensive Child Welfare Information System (CCWIS) Design Requirements, as defined by Federal Regulations 45 CFR 1355.53, the CCWIS Software Repository, as defined by Federal Regulations 45 CFR 1355.52(h), and CCWIS Options, as defined by Federal Regulations 45 CFR 1355.54.

    Metadata-only record linking to the original dataset. Open original dataset below.

  9. Data_Sheet_1_Utilizing Text Mining, Data Linkage and Deep Learning in Police...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Feb 17, 2021
    Cite
    Cabral, Rina Carines; Butler, Tony; Han, Soyeon Caren; Poon, Josiah; Karystianis, George (2021). Data_Sheet_1_Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000901508
    Dataset updated
    Feb 17, 2021
    Authors
    Cabral, Rina Carines; Butler, Tony; Han, Soyeon Caren; Poon, Josiah; Karystianis, George
    Description

    Family and Domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, with one woman being murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and victim are lacking. The New South Wales Police Force attends thousands of FDV events each year and records details as fixed fields (e.g., demographic information for individuals involved in the event) and as text narratives which describe abuse types, victim injuries, threats, including the mental health status for POIs and victims. This information within the narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information from 492,393 FDV event narratives (abuse types, victim injuries, mental illness mentions), we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types (“hands-on,” “hands-off,” “Apprehended Domestic Violence Order (ADVO) breach”). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for “ADVO breach” in a multilabel classification setup while the binary classification setup generated similar results. “Hands-off” offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups. 
“Hands-on” offenses benefitted least from the contextual information gained through BERT embeddings in which MLP with categorical embeddings outperformed it in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). The encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase the performance which can assist those working on FDV and law enforcement to improve outcomes and better manage FDV events.

  10. CCWIS Data Quality Plans

    • healthdata.gov
    • data.virginia.gov
    • +1more
    csv, xlsx, xml
    Updated Sep 4, 2025
    Cite
    (2025). CCWIS Data Quality Plans [Dataset]. https://healthdata.gov/d/8vfj-9tst
    Available download formats: csv, xml, xlsx
    Dataset updated
    Sep 4, 2025
    Description

    This DSS presentation describes the CCWIS data quality requirements, as defined by Federal Regulations 45 CFR 1355.52. In addition, this presentation provides guidance on biennial reviews and on how to compose CCWIS data quality plans.

    Metadata-only record linking to the original dataset. Open original dataset below.

  11. Additional file 1: of Determinants of first-time utilization of long-term...

    • springernature.figshare.com
    xlsx
    Updated May 30, 2023
    Cite
    Laurentius Slobbe; Albert Wong; Robert Verheij; Hans Oers; Johan Polder (2023). Additional file 1: of Determinants of first-time utilization of long-term care services in the Netherlands: an observational record linkage study [Dataset]. http://doi.org/10.6084/m9.figshare.c.3872056_D1.v1
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Laurentius Slobbe; Albert Wong; Robert Verheij; Hans Oers; Johan Polder
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Netherlands
    Description

    Selection and definition of chronic diseases used in study. Diseases are defined using the ICPC-1 classification for primary care. (XLSX 12 kb)

  12. Estimated probability of linkage between individuals of different groups, ,...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 9, 2014
    Cite
    De Gruttola, Victor; Novitsky, Vladimir; Carnegie, Nicole Bohme; Wang, Rui (2014). Estimated probability of linkage between individuals of different groups, , from Mochudi data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001206042
    Dataset updated
    Jan 9, 2014
    Authors
    De Gruttola, Victor; Novitsky, Vladimir; Carnegie, Nicole Bohme; Wang, Rui
    Area covered
    Mochudi
    Description

    Rates given are per 1000 pairs. A link in this analysis is defined by a difference between sequences in less than 10% of available sites.
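
    The link definition above can be expressed as a small check. This is a minimal sketch, not the authors' code: it assumes the two sequences are already aligned to the same length, and it assumes that "available sites" means positions where both sequences have a called base. The function names and the set of missing-data characters are hypothetical.

```python
def fraction_differing(seq_a, seq_b, missing="-N?"):
    """Fraction of differing sites among sites available in both sequences.

    Returns None when no site is available in both sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only positions where neither sequence has a missing-data character.
    available = [(a, b) for a, b in zip(seq_a, seq_b)
                 if a not in missing and b not in missing]
    if not available:
        return None
    differing = sum(1 for a, b in available if a != b)
    return differing / len(available)


def is_linked(seq_a, seq_b, threshold=0.10):
    """True when the sequences differ in less than `threshold` of available sites."""
    frac = fraction_differing(seq_a, seq_b)
    return frac is not None and frac < threshold
```

    Because the comparison is strict, a pair that differs in exactly 10% of available sites is not counted as linked.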

  13. The Permanency Innovations Initiative: A New Way to Build Capacity in Child...

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    Cite
    Administration for Children and Families (2025). The Permanency Innovations Initiative: A New Way to Build Capacity in Child Welfare [Dataset]. https://catalog.data.gov/dataset/the-permanency-innovations-initiative-a-new-way-to-build-capacity-in-child-welfare
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    PII grantees provide examples of how they have strengthened their organizations' abilities to implement and test new interventions and change their efforts due to their involvement in PII. This video highlights how PII has contributed to a shift in the way capacity is defined and built within the Children's Bureau. Metadata-only record linking to the original dataset. Open original dataset below.

  14. Common Pitfalls and How to Avoid Them

    • catalog.data.gov
    • data.virginia.gov
    • +1more
    Updated Sep 7, 2025
    Cite
    Administration for Children and Families (2025). Common Pitfalls and How to Avoid Them [Dataset]. https://catalog.data.gov/dataset/common-pitfalls-and-how-to-avoid-them
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    Through the webinar series “Back to Basics”, the Division of State Systems within the Children’s Bureau is offering a venue for information sharing and discussion. The fourth webinar of the series, “Common Pitfalls and How to Avoid Them”, reviews nine common pitfalls associated with building and maintaining child welfare information systems: lack of user involvement, poorly defined requirements, scope creep, lack of a change control system, poor testing, under-allocated resources, system performance issues, lack of a high-functioning user help desk, and poor vendor relationships. Metadata-only record linking to the original dataset. Open original dataset below.

  15. Impact on mortality of being seropositive for hepatitis C virus antibodies...

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Hélio Ranes de Menezes Filho; Ana Luiza de Souza Bierrenbach; Maria Ligia Damato Capuani; Alfredo Mendrone Jr.; Adele Schwartz Benzaken; Soraia Mafra Machado; Marielena Vogel Saivish; Ester Cerdeira Sabino; Steven Sol Witkin; Maria Cássia Mendes-Corrêa (2023). Impact on mortality of being seropositive for hepatitis C virus antibodies among blood donors in Brazil: A twenty-year study [Dataset]. http://doi.org/10.1371/journal.pone.0226566
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hélio Ranes de Menezes Filho; Ana Luiza de Souza Bierrenbach; Maria Ligia Damato Capuani; Alfredo Mendrone Jr.; Adele Schwartz Benzaken; Soraia Mafra Machado; Marielena Vogel Saivish; Ester Cerdeira Sabino; Steven Sol Witkin; Maria Cássia Mendes-Corrêa
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Introduction: Hepatitis C virus (HCV) infection is a major health problem associated with considerable risk of mortality in different regions of the world. The purpose of this study was to investigate the contribution of HCV infection on all-cause and liver-related mortality, in a large cohort of blood donors in Brazil. Methods: This is a retrospective cohort study of blood donors from 1994 to 2013, at Fundação Pró-Sangue—Hemocentro de São Paulo (FPS). This cohort included 2,892 and 5,784 HCV antibody seropositive and seronegative donors, respectively. Records from the FPS database and the Mortality Information System (SIM: a national database in Brazil) were linked through a probabilistic record linkage (RL). Mortality outcomes were defined based on ICD-10 (10th International Statistical Classification of Diseases and Related Health Problems) codes listed as the cause of death on the death certificate. Hazard ratios (HRs) were estimated for outcomes using Cox multiple regression models. Results: When all causes of death were considered, RL identified 209 deaths (7.2%) among seropositive blood donors and 190 (3.3%) among seronegative blood donors. Donors seropositive for HCV infection had a 2.5 times higher risk of death due to all causes (95% CI: 1.76–2.62; p

  16. Dataset: The plural interpretability of German linking elements...

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    csv
    Updated Aug 15, 2021
    Cite
    (2021). Dataset: The plural interpretability of German linking elements ("Morphology") [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7422
    Available download formats: csv
    Dataset updated
    Aug 15, 2021
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset accompanies a paper to be published in "Morphology" (JOMO, Springer). Under the present DOI, all data generated for this research as well as all scripts used are stored. The paper itself is not CC-licensed, refer to Springer's "Morphology" website for details! Abstract: In this paper, we take a closer theoretical and empirical look at the linking elements in German N1+N2 compounds which are identical to the plural marker of N1 (such as -er with umlaut, as in Häus-er-meer 'sea of houses'). Various perspectives on the actual extent of plural interpretability of these pluralic linking elements are expressed in the literature. We aim to clarify this question by empirically examining to what extent there may be a relationship between plural form and meaning which informs in which sorts of compounds pluralic linking elements appear. Specifically, we investigate whether pluralic linking elements occur especially frequently in compounds where a plural meaning of the first constituent is induced either externally (through plural inflection of the entire compound) or internally (through a relation between the constituents such that N2 forces N1 to be conceptually plural, as in the example above). The results of a corpus study using the DECOW16A corpus and a split-100 experiment show that in the internal but not external plural meaning conditions, a pluralic linking element is preferred over a non-pluralic one, though there is considerable inter-speaker variability, and limitations imposed by other constraints on linking element distribution also play a role. However, we show the overall tendency that German language users do use pluralic linking elements as cues to the plural interpretation of N1+N2 compounds. Our interpretation does not reference a specific morphological framework. Instead, we view our data as strengthening the general approach of probabilistic morphology.

  17. Guide to Data-Driven Decision Making

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). Guide to Data-Driven Decision Making [Dataset]. https://data.virginia.gov/dataset/guide-to-data-driven-decision-making
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This guide explains data-driven decision making (DDDM), a process for deciding on a course of action based on data. It describes the requirements of DDDM and the steps for applying DDDM concepts to organizations and service systems.

    Metadata-only record linking to the original dataset. Open original dataset below.

  18. Jyutping Project - Raw Data and Clean Data

    • rdr.ucl.ac.uk
    application/csv
    Updated Aug 19, 2024
    Cite
    Joseph Lam (2024). Jyutping Project - Raw Data and Clean Data [Dataset]. http://doi.org/10.5522/04/26504347.v1
    Available download formats: application/csv
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    University College London
    Authors
    Joseph Lam
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Raw and clean data for the Jyutping project, submitted to the International Journal of Epidemiology. All data were openly available at the time of scraping. I only retained Chinese names and Hong Kong Government romanised English names. This project aims to describe the problem of non-standardised romanisation and its impact on data linkage. The included data allow researchers to replicate my process of extracting Jyutping and Pinyin from Chinese characters. A fair amount of manual screening and review was required, so the code itself is not fully automated. The code is stored on my personal GitHub, https://github.com/Jo-Lam/Jyutping_project/tree/main. Please cite this data resource: doi:10.5522/04/26504347

  19. Replication data for: Transforming Women's Work: New England Lives in the...

    • dataverse.harvard.edu
    Updated Sep 28, 2023
    Cite
    Thomas Dublin (2023). Replication data for: Transforming Women's Work: New England Lives in the Industrial Revolution [Dataset]. http://doi.org/10.7910/DVN/0CTHPO
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Thomas Dublin
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1820 - 1900
    Area covered
    New England, United States
    Description

    This study explores the transformation of women's wage work in New England between 1820 and 1900 with distinct studies of rural outwork, cotton textile manufacturing, boot and shoemaking, domestic service, needle trades, and teaching. Appendices in TRANSFORMING WOMEN'S WORK (Cornell University Press, 1994) discuss in detail how the datasets were constructed. Beginning with either employment or census records, Thomas Dublin employed nominal record linkage to assemble life course data on groups of women workers in New England.

  20. Data from: The health, work, and retirement study: representing experiences...

    • tandf.figshare.com
    pdf
    Updated Nov 24, 2024
    Cite
    Joanne Allen; Fiona M. Alpass; Andy Towers; Brendan Stevenson; Ágnes Szabó; Mary Breheny; Christine Stephens (2024). The health, work, and retirement study: representing experiences of later life in Aotearoa New Zealand [Dataset]. http://doi.org/10.6084/m9.figshare.20384472.v1
    Available download formats: pdf
    Dataset updated
    Nov 24, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Joanne Allen; Fiona M. Alpass; Andy Towers; Brendan Stevenson; Ágnes Szabó; Mary Breheny; Christine Stephens
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Older adults represent a large and growing section of Aotearoa New Zealand's population. Longitudinal research on experiences of later life enables understanding of both the capabilities with which people are ageing, and their determinants. The Health, Work, and Retirement (HWR) study has to date conducted eight biennial longitudinal postal surveys of health and well-being with older people (n = 11,601 respondents; 49.4% of Māori descent). Survey data are linked at the individual-level to other modes of data collection, including cognitive assessments, life course history interviews, and national health records. This article describes the HWR study and its potential to support our understanding of ageing in Aotearoa New Zealand. We present an illustrative analysis of data collected to date, using indicators of physical health-related functional ability from n = 10,728 adults aged 55–80 to describe mean trajectories of physical ability with age, by birth cohort and gender. As the original participant cohort recruited in 2006 reach ages 71–86 in 2022, future directions for study include expanding the study's core longitudinal measures to include follow-up assessments of cognitive functioning to understand factors predicting cognitive decline, and linkage to national datasets to identify population-level profiles of risk for conditions such as frailty.
