Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PATRON is a human ethics approved program of research incorporating an enduring de-identified repository of Primary Care data facilitating research and knowledge generation. PATRON is a part of the 'Data for Decisions' initiative of the Department of General Practice, University of Melbourne. 'Data for Decisions' is a research initiative in partnership with general practices. It is an exciting undertaking that makes possible primary care research projects to increase knowledge and improve healthcare practices and policy.
Principal Researcher: Jon Emery
Data Custodian: Lena Sanci
Data Steward: Douglas Boyle
Manager: Rachel Canaway
More information about Data for Decisions and utilising PATRON data is available from the Data for Decisions website.
ABSTRACT
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate TB records, as well as the characteristics of discordant pairs.
METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, based on combinations of fragments of the key variables, with or without modification (Soundex or substring). Each rule was formed by three or more fragments. The probabilistic approach required a cutoff point for the score, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System – Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated for the accuracy analysis.
RESULTS: Accuracy ranged from 87.2% to 95.2% for sensitivity and 99.8% to 99.9% for specificity for probabilistic and deterministic record linkage, respectively. Missing values for the key variables and low similarity scores for name and date of birth were mainly responsible for the failure to identify records of the same individual with the techniques used.
CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User need and experience should be considered when choosing the best technique to be used.
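The accuracy step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function name, the toy scores, and the cutoff value are hypothetical, and a pair is treated as a link only when its score is strictly above the cutoff, following the abstract's wording.

```python
def sensitivity_specificity(pairs, cutoff):
    """Classify scored record pairs at a cutoff and compare with manual review.

    pairs  -- iterable of (score, is_true_match) tuples, where is_true_match
              is the manual-review verdict (hypothetical input format)
    cutoff -- pairs scoring strictly above this value are classified as links
    """
    tp = fp = tn = fn = 0
    for score, is_true_match in pairs:
        predicted_link = score > cutoff
        if predicted_link and is_true_match:
            tp += 1
        elif predicted_link and not is_true_match:
            fp += 1
        elif not predicted_link and is_true_match:
            fn += 1
        else:
            tn += 1
    # Sensitivity: share of true duplicates that were linked.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of non-duplicates that were left unlinked.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up scores and review verdicts, for illustration only.
reviewed_pairs = [
    (12.5, True),
    (9.1, True),
    (7.8, False),
    (6.0, True),
    (3.2, False),
    (1.1, False),
]

sens, spec = sensitivity_specificity(reviewed_pairs, cutoff=7.0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Repeating the calculation over a range of cutoffs yields the points of a ROC curve, which is one way a cutoff such as the one described above could be chosen.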
Distribution of prisoner vital status on the basis of record linkage and known vital status.
Data support a paper of this title:
A Geotemporospatial and Causal Inference Epidemiological Exploration of Substance and Cannabinoid Exposure as Drivers of Rising US Pediatric Cancer Rates
Data represent a compilation of data inputs from several sources, including the National Cancer Institute SEER*Stat National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database: NPCR and SEER Incidence – U.S. Cancer Statistics Public Use Research Database, 2019 submission (2001-2017), United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Released June 2020. Available at www.cdc.gov/cancer/public-use program; the National Survey on Drug Use and Health conducted by the Substance Abuse and Mental Health Services Administration; and the US Census Bureau.
Data also include inverse probability weights for cannabis exposure.
Data also include their geospatial linkage network constructed for all US states which makes Alaska and Hawaii spatially connected to the contiguous USA.
Data also include the R script used to conduct and prepare the analysis.
This dataset includes:
1. Search terms used to identify codes that may represent a history of illicit opioid use
2. Codelist for identifying people with a history of illicit opioid use
3. Age- and sex-distribution of patients by product and clinical codes
4. Number of patients currently in the cohort
5. Age of patients at cohort entry
6. Internal validation based on hospital admissions for opioid dependence
The ADAPTOR project used data from the 45 and Up Study of >267 357 New South Wales residents aged ≥45 years, randomly sampled from the Services Australia Medicare enrolment database. Participants aged 80 years and older and residing in rural/remote areas were overrepresented in the sample. There were a small number (
Calculation of sensitivity and specificity for probabilistic matching without manual review, not including address variables and using an ETS dataset that only includes non-UK born individuals.
PcBaSe Sweden is a data base for clinical epidemiological prostate cancer research based on linkages between the National Prostate Cancer Register (NPCR) of Sweden, a nationwide population-based quality database, and other nationwide registries. In the period 1996-2023, 246 500 cases have been registered in NPCR with detailed data on tumour characteristics and primary treatment, available at https://statistik.incanet.se/npcr/. In addition, there are five controls per case.
By use of the individually unique person identity number, the NPCR has been linked to the Swedish National Cancer Register, the Cause of Death Register, the Prescribed Drug Register, the National Patient Register, the Acute Myocardial Infarction Register, the Register of the Total Population, the Longitudinal Integration database for health insurance and labour market studies (LISA), the Multi-Generation Register and several other population-based registers.
Reference: Van Hemelrijck M, Garmo H, Wigertz A, Nilsson P, Stattin P. Cohort Profile Update: The National Prostate Cancer Register of Sweden and Prostate Cancer data Base - a refined prostate cancer trajectory. Int J Epidemiol. 2016 Feb;45(1):73-82.
Purpose:
To provide a platform for prostate cancer research. The data base allows for population-based observational studies with case-control, cohort, or longitudinal case only design that can be used for studies of pertinent issues of clinical importance.
This dataset classifies confirmed yellow fever cases by their method of confirmation, including epidemiological linkage, clinical symptoms, and laboratory test results, aiding in disease surveillance, outbreak investigation, and public health response.
The Australian and New Zealand Massive Transfusion Registry (ANZ-MTR) Updates Meeting was held on the 4th of August 2022.
During this meeting, Dr Kim Huynh provided an overview of how the National Transfusion Dataset (NTD) was established. The presentation details the ways in which NTD builds upon the existing ANZ-MTR and Transfusion Database (TD) pilot.
Distribution of study events according to both actual event and observed event (by linkage).
This collection provides a complete list of convict names and sufficient biographical data to enable unambiguous identification of convicts who were disembarked from convict ship "Blenheim" at Van Diemen's Land on 1837-07-16.
This includes, where known, an estimation of the year of birth, place of birth, where and when convicted, the sentence, the date of arrival in the colony and the convict's age on arrival. The brief convict biographical data provided in this collection serves as an index into the far more extensive set of life course events which are recorded in the prosopography database built by the Founders and Survivors project.
Basic details for this ship:
* ship name (as known in VDL records): Blenheim
* sailed date: 1837-03-15 from Woolwich
* arrival date: 1837-07-16
* population (per Bateson's The Convict Ships): Embarked: 210 Men; Deaths: 6 Men; Landed: 204 (VDL) Men
Data for convicts listed in this collection comes from the source which has been determined by Founders and Survivors to form the "base population" for this ship. Further information as to the methodology and the linkage of multiple sources is detailed in the narrative format of the collection. The matching and linkage of additional sources about Tasmanian convicts is the subject of ongoing research. This collection may be republished regularly, and in additional formats and with specific user interfaces, to enable public participation in the quality of convict matching and linkage -- see for example the EXPERIMENTAL linkage.htm format for this collection. Linkage for ships arriving at Norfolk Island and Port Phillip is incomplete.
This ship's prosopography index is published in a directory named "362.41" (the ship's project id). Three different file formats are provided:
-- (default; suitable for web browsing) HTML: world wide web hypertext markup language format which provides a "narrative" view of the collection (index.htm); and
-- (structured prosopography: persons and events) XML / TEIp5: Text Encoding Initiative (version p5) XML format which provides the underlying XML database for this collection (index.xml); and
-- (not yet available) simple list of convict names in a flat file, tab delimited, suitable for Excel, Stata, SPSS or database usage (index.tab).
See notes below.
This collection provides a complete list of convict names and sufficient biographical data to enable unambiguous identification of convicts who were disembarked from convict ship "Surrey (3)" at Van Diemen's Land on 1833-04-07.
This includes, where known, an estimation of the year of birth, place of birth, where and when convicted, the sentence, the date of arrival in the colony and the convict's age on arrival. The brief convict biographical data provided in this collection serves as an index into the far more extensive set of life course events which are recorded in the prosopography database built by the Founders and Survivors project.
Basic details for this ship:
* ship name (as known in VDL records): Surrey (3)
* sailed date: 1832-12-04 from Downs
* arrival date: 1833-04-07
* population (per Bateson's The Convict Ships): Embarked: ?204 Men; Deaths: 1; Landed: 204 (VDL) Men
Data for convicts listed in this collection comes from the source which has been determined by Founders and Survivors to form the "base population" for this ship. Further information as to the methodology and the linkage of multiple sources is detailed in the narrative format of the collection. The matching and linkage of additional sources about Tasmanian convicts is the subject of ongoing research. This collection may be republished regularly, and in additional formats and with specific user interfaces, to enable public participation in the quality of convict matching and linkage -- see for example the EXPERIMENTAL linkage.htm format for this collection. Linkage for ships arriving at Norfolk Island and Port Phillip is incomplete.
This ship's prosopography index is published in a directory named "362.03" (the ship's project id). Three different file formats are provided:
-- (default; suitable for web browsing) HTML: world wide web hypertext markup language format which provides a "narrative" view of the collection (index.htm); and
-- (structured prosopography: persons and events) XML / TEIp5: Text Encoding Initiative (version p5) XML format which provides the underlying XML database for this collection (index.xml); and
-- (not yet available) simple list of convict names in a flat file, tab delimited, suitable for Excel, Stata, SPSS or database usage (index.tab).
See notes below.
Chi squared test, not including missing data for each variable other than NHS number.
*At least one social risk factor including drug use, homelessness, alcohol misuse/abuse, prison.
Descriptive analysis of case notifications dataset for records with and without an NHS number.
Chi squared test, not including missing data for each variable other than NHS number.
*It was not possible to calculate the exact age for these records as the date of their laboratory result was not recorded, but date of birth was available for all records.
Descriptive analysis of laboratory dataset for records with and without an NHS number.
Supplementary Material 1
a. The top two most significant two-point results within each model and family subset as well as any maximum multipoint LOD score exceeding 2 are included.
b. When two markers are listed, the first corresponds to the marker used for the two-point result shown. The second corresponds to the closest marker included in the multipoint analysis.
c. LOD scores exceeding 2 are bold and LOD scores exceeding 3 are bold and italicized. For the parametric model, HLOD scores are shown.
d. Empirical p-values less than 0.05 are bold.
Abbreviations: CTD: connective tissue disorder, NPL: nonparametric linkage, LOD: logarithm of the odds, Emp: empirical, CW: chromosome-wide, GW: genome-wide, N/A: not applicable.
Pairwise linkage disequilibrium.
Whole genome sequencing (WGS) is increasingly used for epidemiological investigations of pathogens. While SNP variant calling is currently considered the most suitable method, the choice of a representative reference genome and the isolate dependency of results limit standardization and affect resolution in an unknown manner. Whole or core genome Multi Locus Sequence Typing (wg-, cg-MLST) represents an attractive alternative. Here, we assess the accuracy of wg- and cg-MLST by comparing results of four Pseudomonas aeruginosa datasets for which epidemiological and genomic data were previously described. Three datasets included 155 isolates from three different sequence types (ST) of P. aeruginosa collected in our ICUs over a 5-year period. The fourth dataset consisted of 10 isolates from an investigation of P. aeruginosa contaminated hand soap. All isolates were previously analyzed by a core SNP approach. In this study, wg- and cg-MLST were performed in BioNumerics™ using a scheme developed by Applied-Maths. Correlation between SNP calling and wg- or cg-MLST results was evaluated by calculating linear regressions and their correlation coefficients (R²) between the number of SNPs and the number of allele differences in pairwise comparisons of isolates. The number of SNPs and allele differences between isolates with close epidemiological linkage varied between 0–26 and 0–13, respectively. When compared to core-SNP calling, a higher coefficient of correlation was obtained with cgMLST (R² of 0.92–0.99) than with wgMLST (0.78–0.99). In one dataset, a putative homologous recombination of a large DNA fragment (202 loci) was identified among these isolates, affecting its phylogeny, but with no impact on the epidemiological analysis of outbreak isolates. In conclusion, we showed that the P. aeruginosa wgMLST scheme in BioNumerics™ is as discriminatory as the core-SNP calling approach and apparently useful for outbreak investigations.
We also showed that epidemiologically linked isolates showed fewer than 26 SNPs or 13 allele differences. These are important figures for the distinction between outbreak and non-outbreak isolates when interpreting WGS results. However, as P. aeruginosa is highly recombinant, a cgMLST approach is preferable, and attention should be paid to possible recombination of large DNA fragments.
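The regression step described in this abstract (pairwise SNP counts against pairwise allele differences, summarized by R²) can be sketched as below. This is only an illustrative least-squares fit in plain Python: the function name and the four data points are hypothetical and were chosen to lie on a perfect line, so they say nothing about the study's actual values.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared) for a simple linear regression.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical pairwise distances between isolates (made-up values).
snp_differences = [0, 4, 10, 26]
allele_differences = [0, 2, 5, 13]

slope, intercept, r2 = linear_fit_r2(snp_differences, allele_differences)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f}")
```

In practice each point would be one pair of isolates, so a dataset of n isolates contributes n(n-1)/2 points to the fit.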
Lung cancer is the number one cancer-related cause of death in Sweden and worldwide. In most countries, five-year survival estimates vary between 10% and 20% with evidence of improved survival over time. Over the last decades, the management of lung cancer has changed including the introduction of national guidelines, new diagnostic procedures and treatments. This study aimed to investigate temporal trends in lung cancer survival both overall and in subgroups defined by established prognostic factors (i.e., sex, stage, histopathology and smoking history). We estimated one-, two-, and five-year relative survival, and excess mortality, in patients diagnosed with squamous cell carcinoma or adenocarcinoma of the lung between 1995 and 2016 in Sweden. We used population-based information available in a national lung cancer research database (LCBaSe) generated by cross-linkage between the Swedish National Lung Cancer Register and several Swedish health and sociodemographic registers. We included 36,935 patients diagnosed with squamous cell carcinoma or adenocarcinoma of the lung between 1995 and 2016. The overall one-, two- and five-year survival estimates increased between 1995 and 2016, from 38% to 53%, 21% to 37%, and 14% to 24%, respectively. Over the study period, we also found improved survival in subgroups, for example in patients with stages III-IV disease, patients with adenocarcinoma, and never-smokers. The excess mortality decreased over the study period, both overall and in all subgroups. Lung cancer survival increased over time in the overall lung cancer population. Of special note was evidence of improved survival in patients with stage IV disease. Our results corroborate a previously observed global trend of improved survival in patients with lung cancer.
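For readers unfamiliar with the two measures named in this abstract, the sketch below shows their textbook definitions: relative survival as the ratio of observed survival in the patient group to the survival expected in a comparable general population, and excess mortality as the difference between observed and expected mortality rates. The function names and numbers are hypothetical, and the study's actual estimators are assumed to be more elaborate than this.

```python
def relative_survival(observed_survival, expected_survival):
    """Ratio of observed survival to expected (background) survival.

    Both arguments are proportions in (0, 1] over the same follow-up period.
    """
    if not 0 < expected_survival <= 1:
        raise ValueError("expected_survival must be in (0, 1]")
    if not 0 <= observed_survival <= 1:
        raise ValueError("observed_survival must be in [0, 1]")
    return observed_survival / expected_survival


def excess_mortality_rate(observed_rate, expected_rate):
    """Mortality rate in the patient group beyond the background rate."""
    return observed_rate - expected_rate


# Made-up illustrative inputs, not figures from the study.
print(relative_survival(0.45, 0.90))
print(excess_mortality_rate(0.30, 0.05))
```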