90 datasets found
  1. Guidelines for describing a microbiome data analysis

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 18, 2024
    Cite
    Amy Willis; David Clausen (2024). Guidelines for describing a microbiome data analysis [Dataset]. http://doi.org/10.5061/dryad.q2bvq83vc
    Available download formats: zip
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Dryad
    Authors
    Amy Willis; David Clausen
    Description

    These guidelines were drafted by the authors.

  2. Data from: Using decision trees to understand structure in missing data

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    txt, zip
    Updated May 31, 2022
    Cite
    Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen (2022). Data from: Using decision trees to understand structure in missing data [Dataset]. http://doi.org/10.5061/dryad.j4f19
    Available download formats: txt, zip
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Objectives: Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting: Data taken from employees at 3 different industrial sites in Australia. Participants: 7915 observations were included. Materials and methods: The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the 'rpart' and 'gbm' packages for CART and BRT analyses, respectively, from the statistical software 'R'. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
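The core idea in this description, treating "is this value missing?" as the outcome a tree tries to predict, can be illustrated with a single split. The sketch below is not the authors' analysis (which used the 'rpart' and 'gbm' packages in R); it is a pure-Python, one-level stand-in for a CART split chosen by Gini impurity. The field names `visits` and `fev1` and the toy rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows, predictor, target):
    """Return (threshold, weighted_gini) for the split on `predictor` that best
    separates rows where `target` is missing (None) from rows where it is observed."""
    # Missingness indicator: 1 if the target value is missing, else 0.
    missing = [1 if r[target] is None else 0 for r in rows]
    values = sorted({r[predictor] for r in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [m for r, m in zip(rows, missing) if r[predictor] <= threshold]
        right = [m for r, m in zip(rows, missing) if r[predictor] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: the test result is missing for workers with few visits.
rows = [
    {"visits": 1, "fev1": None},
    {"visits": 2, "fev1": None},
    {"visits": 3, "fev1": 3.1},
    {"visits": 4, "fev1": 2.9},
    {"visits": 5, "fev1": 3.4},
]
threshold, impurity = best_split(rows, "visits", "fev1")
```

A full CART would apply this search recursively over all predictors; a single split is enough to show how a tree can expose a variable and value associated with missingness.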

  3. GLO climate data stats summary

    • data.gov.au
    • researchdata.edu.au
    • +1 more
    zip
    Updated Apr 13, 2022
    + more versions
    Cite
    Bioregional Assessment Program (2022). GLO climate data stats summary [Dataset]. https://data.gov.au/data/dataset/afed85e0-7819-493d-a847-ec00a318e657
    Available download formats: zip (8810)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Various climate variables summary for all 15 subregions based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids. Including

    1. Time series mean annual BAWAP rainfall from 1900 - 2012.

    2. Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month

    3. Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    4. Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

    As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
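The seven summary values (a) to (g) listed in item 3 can be sketched for one variable and one time period as follows. This is an illustration only: the metadata does not say whether the standard deviation is the sample or population form, nor how the trend was estimated, so the sample standard deviation and an ordinary least-squares slope per year are assumptions here, and the function name `summarize` and the toy values are hypothetical.

```python
import statistics


def summarize(years, values):
    """Summary statistics for one variable over one time period.

    Assumptions (not stated in the source): sample standard deviation,
    and trend taken as the least-squares slope of value against year.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    year_mean = statistics.mean(years)
    sxy = sum((x - year_mean) * (y - mean) for x, y in zip(years, values))
    sxx = sum((x - year_mean) ** 2 for x in years)
    return {
        "average": mean,
        "maximum": max(values),
        "minimum": min(values),
        "average_plus_stddev": mean + sd,
        "average_minus_stddev": mean - sd,
        "stddev": sd,
        "trend": sxy / sxx,  # units of the variable per year
    }


# Hypothetical toy series: four years of annual values.
stats = summarize([1981, 1982, 1983, 1984], [10.0, 12.0, 14.0, 16.0])
```

Running the same function once per variable and per time period would give the 17 by 8 grid of summaries the description refers to.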

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv

    Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

    Source data: annual BILO rainfall

    P_PET_monthly_BA_SYB_GLO.csv

    long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

    Climatology_Trend_BA_SYB_GLO.csv

    Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

    Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
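A correlation coefficient in the range -1 to 1 between a rainfall series and a driver index can be computed as sketched below. The metadata does not name the coefficient used, so the Pearson form is an assumption, and the toy index and rainfall values are hypothetical rather than taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical seasonal values: a driver index and rainfall for the same years.
driver_index = [-1.2, 0.3, 0.8, 1.5, -0.4]
rainfall_mm = [210.0, 260.0, 275.0, 310.0, 240.0]
r = pearson(driver_index, rainfall_mm)
```

Repeating the calculation for each of the four drivers (BLK, DMI, SAM, SOI) and each season (DJF, MAM, JJA, SON) would yield a 4 by 4 table of coefficients.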

    Dataset History

    Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.


    Dataset Citation

    Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.

    Dataset Ancestors

  4. MEDLINE/PubMed Baseline Statistics: Misc Report

    • catalog.data.gov
    • data.virginia.gov
    • +2 more
    Updated Jun 19, 2025
    + more versions
    Cite
    National Library of Medicine (2025). MEDLINE/PubMed Baseline Statistics: Misc Report [Dataset]. https://catalog.data.gov/dataset/2023-medline-pubmed-baseline-misc-report
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    A file containing all Misc Baseline Reports for 2018-2023 in their original format is available in the Attachments section below. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
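The per-element figures named in this description can be sketched as below. This is a hypothetical illustration, not the National Library of Medicine's procedure: records are modelled as dictionaries mapping an element name to a list of string occurrences, the toy element names and values are invented, and the minimum/average/maximum occurrences are assumed to be taken over records that contain the element at least once.

```python
def element_stats(records):
    """Per-element counts and size statistics over a list of records."""
    per_element = {}
    for record in records:
        for name, occurrences in record.items():
            if not occurrences:
                continue  # element absent from this record
            counts, lengths = per_element.setdefault(name, ([], []))
            counts.append(len(occurrences))
            lengths.extend(len(value) for value in occurrences)
    report = {}
    for name, (counts, lengths) in per_element.items():
        report[name] = {
            "citations": len(counts),      # records containing the element
            "occurrences": sum(counts),    # total occurrences of the element
            "occurrences_per_record": (
                min(counts), sum(counts) / len(counts), max(counts)),
            "occurrence_length": (
                min(lengths), sum(lengths) / len(lengths), max(lengths)),
        }
    return report


# Hypothetical toy records with invented element names and values.
records = [
    {"Author": ["A", "BB"], "Keyword": ["x"]},
    {"Author": ["CCC"]},
]
report = element_stats(records)
```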

  5. Central Statistical Office Dataset

    • live.european-language-grid.eu
    • data.europa.eu
    xml
    Updated Sep 9, 2022
    + more versions
    Cite
    (2022). Central Statistical Office Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/18867
    Available download formats: xml
    Dataset updated
    Sep 9, 2022
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    Two Polish-English publications of the Polish Central Statistical Office in the XLIFF format: 1. "Statistical Yearbook of the Republic of Poland 2015" is the main summary publication of the Central Statistical Office, including a comprehensive set of statistical data describing the condition of the natural environment, the socio-economic and demographic situation of Poland, and its position in Europe and in the world. 2. "Women in Poland" contains statistical information regarding women's place and participation in socio-economic life of the country including international comparisons. The texts were aligned at the level of translation segments (mostly sentences and short paragraphs) and manually verified.

  6. MEDLINE/PubMed Baseline Statistics: Min/Max Report

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    • +2 more
    Updated Feb 3, 2025
    + more versions
    Cite
    National Library of Medicine (2025). MEDLINE/PubMed Baseline Statistics: Min/Max Report [Dataset]. https://catalog.data.gov/dataset/2023-medline-pubmed-baseline-min-max-report
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    National Library of Medicine
    Description

    A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.

  7. Statistical methods to model and evaluate physical activity programs, using...

    • plos.figshare.com
    doc
    Updated Jun 2, 2023
    Cite
    S. S. M. Silva; Madawa W. Jayawardana; Denny Meyer (2023). Statistical methods to model and evaluate physical activity programs, using step counts: A systematic review [Dataset]. http://doi.org/10.1371/journal.pone.0206763
    Available download formats: doc
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    S. S. M. Silva; Madawa W. Jayawardana; Denny Meyer
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Physical activity reduces the risk of noncommunicable diseases and is therefore an essential component of a healthy lifestyle. Regular engagement in physical activity can produce immediate and long term health benefits. However, physical activity levels are not as high as might be expected. For example, according to the global World Health Organization (WHO) 2017 statistics, more than 80% of the world’s adolescents are insufficiently physically active. In response to this problem, physical activity programs have become popular, with step counts commonly used to measure program performance. Analysing step count data and the statistical modeling of this data is therefore important for evaluating individual and program performance. This study reviews the statistical methods that are used to model and evaluate physical activity programs, using step counts.

    Methods: Adhering to PRISMA guidelines, this review systematically searched for relevant journal articles which were published between January 2000 and August 2017 in any of three databases (PubMed, PsycINFO and Web of Science). Only the journal articles which used a statistical model in analysing step counts for a healthy sample of participants, enrolled in an intervention involving physical exercise or a physical activity program, were included in this study. In these programs the activities considered were natural elements of everyday life rather than special activity interventions.

    Results: This systematic review was able to identify 78 unique articles describing statistical models for analysing step counts obtained through physical activity programs. General linear models and generalized linear models were the most popular methods used followed by multilevel models, while structural equation modeling was only used for measuring the personal and psychological factors related to step counts. Surprisingly no use was made of time series analysis for analysing step count data. The review also suggested several strategies for the personalisation of physical activity programs.

    Conclusions: Overall, it appears that the physical activity levels of people involved in such programs vary across individuals depending on psychosocial, demographic, weather and climatic factors. Statistical models can provide a better understanding of the impact of these factors, allowing for the provision of more personalised physical activity programs, which are expected to produce better immediate and long-term outcomes for participants. It is hoped that this review will identify the statistical methods which are most suitable for this purpose.

  8. Digital data sets describing metropolitan areas in the conterminous US

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Oct 5, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Digital data sets describing metropolitan areas in the conterminous US [Dataset]. https://catalog.data.gov/dataset/digital-data-sets-describing-metropolitan-areas-in-the-conterminous-us
    Dataset updated
    Oct 5, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Contiguous United States, United States
    Description

    This data set describes metropolitan areas in the conterminous United States, developed from U.S. Bureau of the Census boundaries of Consolidated Metropolitan Statistical Areas (CMSA) and Metropolitan Statistical Areas (MSA), that have been processed to extract the largest contiguous urban area within each MSA or CMSA.

  9. Law Enforcement Assistance Administration Profile Data, [1968-1978]

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    + more versions
    Cite
    Bureau of Justice Statistics (2025). Law Enforcement Assistance Administration Profile Data, [1968-1978] [Dataset]. https://catalog.data.gov/dataset/law-enforcement-assistance-administration-profile-data-1968-1978-e48da
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Bureau of Justice Statistics (http://bjs.ojp.gov/)
    Description

    The Law Enforcement Assistance Administration File (PROFILE) System was designed for the automated storage and retrieval of information describing programs sponsored by the Bureau of Justice Statistics. The two types of data elements used to describe the projects in this file are basic data and program descriptors. The basic data elements include the title of the grant, information regarding the location of the grantee and the project, critical funding dates, the government level and type of grantee, financial data, the name of the project director, indication of the availability of reports, and identification numbers. The program descriptor elements form the program classification system and describe the key characteristics of the program. Key characteristics include subject of the program, primary and secondary activity, whether the program covered a juvenile or adult problem, and what specific crimes, clients, staff, program strategies, agencies, equipment, or research methods were to be used or would be affected by the project.

  10. School District Data Book (SDDB), 1990: [United States] - Archival Version

    • search.gesis.org
    Updated Feb 26, 2021
    + more versions
    Cite
    United States Department of Education. National Center for Education Statistics (2021). School District Data Book (SDDB), 1990: [United States] - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR02953
    Dataset updated
    Feb 26, 2021
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    GESIS search
    Authors
    United States Department of Education. National Center for Education Statistics
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de435696

    Area covered
    United States
    Description

    Abstract (en): The School District Data Book (SDDB) is an education database and information system. It contains an extensive set of data on children, their households, and the nation's school systems. Under the sponsorship of the National Center for Education Statistics, the Bureau of the Census has produced special tabulation files using the basic record files of the 1990 Census of Population and Housing by school district. These tabulation files contain aggregated data describing attributes of children and households in school districts. Data are organized by seven types of tabulation records: (1) characteristics of all households, (2) characteristics of all persons, (3) characteristics of households with children, (4) characteristics of parents living with children, (5) children's household characteristics, (6) children's parents' characteristics, and (7) children's own characteristics.

    ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.

    All public elementary and secondary education agencies in operation during 1990-1991 in the 50 states and the District of Columbia.

    2006-10-27 Variable names were corrected in SAS and SPSS setup files. The processing note in the codebook was also updated to reflect the corrections.

    2006-01-12 All files were removed from dataset 139 and flagged as study-level files, so that they will accompany all downloads.

    2006-01-12 All files were removed from dataset 138 and flagged as study-level files, so that they will accompany all downloads.

    2006-01-12 All files were removed from dataset 137 and flagged as study-level files, so that they will accompany all downloads.

    2002-05-29 Seventeen additional datasets (Parts 140-156) were added to the collection, including data for two states previously not covered -- Vermont and Washington -- and additional data for Arkansas, California, Illinois, Massachusetts, Michigan, Minnesota, New Jersey, Pennsylvania, and Texas.

    (1) Some states have multiple data files because they have large numbers of cases. (2) Two data files are not included in this release. They are: Washington, Part 3, and Wisconsin, Part 4.

  11. MontoloStats

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Lieber, Sven (2020). MontoloStats [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3343052
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Lieber, Sven
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    MontoloStats is a dataset containing RDF DataCube-based statistics of Montolo concepts, described using the W3C RDF DataCube and PROV-based MontoloVoc vocabulary.

    Ontologies which are built with the RDF framework consist of concepts and relationships between these concepts. Additionally, several restrictions in the form of axioms can be defined, using terms of the RDFS and OWL vocabulary. To understand how current ontologies are modeled we created Montolo.

    MontoloVoc is an OWL and RDFS-based vocabulary initially describing concepts regarding restrictions, and it is RDF Data Cube-based so that statistics regarding restrictions can be described. The restriction concepts entail abstract Restriction Types, such as disjoint classes or reflexive properties, and different restriction type Expressions for each type, e.g. owl:disjointWith or owl:AllDisjointClasses for the restriction type disjoint classes. Information regarding the use of restriction types is published as the MontoloStats dataset, described using the MontoloVoc vocabulary and currently covering 660 LOV and 565 BioPortal ontologies.

  12. Data from: Natality Detail File, 1993: [United States]

    • icpsr.umich.edu
    • archive.ciser.cornell.edu
    ascii, sas, spss +1
    Updated Mar 28, 2008
    + more versions
    Cite
    United States Department of Health and Human Services. National Center for Health Statistics (2008). Natality Detail File, 1993: [United States] [Dataset]. http://doi.org/10.3886/ICPSR06847.v1
    Available download formats: ascii, spss, stata, sas
    Dataset updated
    Mar 28, 2008
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States Department of Health and Human Services. National Center for Health Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/6847/terms

    Time period covered
    1993
    Area covered
    United States
    Description

    This collection provides information on live births in the United States during calendar year 1993. The natality data in this file are a component of the vital statistics collection effort maintained by the federal government. Geographic variables describing residence of births include the state, county, city, population, division and state subcode, Standard Metropolitan Statistical Area (SMSA), and metropolitan/nonmetropolitan county. Other variables include the race and sex of the child, the age of the mother, mother's education, place of delivery, person in attendance, and live birth order. The natality tabulations in the documentation include live births by age of mother, live-birth order and race of child, live births by marital status of mother, age of mother, and race of child, and live births by attendant and place of delivery.

  13. MEDLINE/PubMed Baseline Statistics: Min/Max Report

    • s.cnmilf.com
    Updated Jun 19, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.

  14. SYD ALL climate data statistics summary

    • demo.dev.magda.io
    • devweb.dga.links.com.au
    • +1 more
    zip
    Updated Jun 27, 2022
    Cite
    Bioregional Assessment Program (2022). SYD ALL climate data statistics summary [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-624b1c2c-f93b-4baa-907a-cde9d60b25bb
    Available download formats: zip
    Dataset updated
    Jun 27, 2022
    Dataset provided by
    Bioregional Assessment Program
    License

    Attribution 2.5 (CC BY 2.5) (https://creativecommons.org/licenses/by/2.5/)
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv

    Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

    Source data: annual BILO rainfall on \wron\Project\BA\BA_N_Sydney\Working\li036_Lingtao_LI\Grids\BILO_Rain_Ann\

    P_PET_monthly_BA_SYB_GLO.csv

    long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

    Climatology_Trend_BA_SYB_GLO.csv

    Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

    Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CMAR, Hobart. As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

    Dataset History

    Dataset was created from various BILO source data, including Monthly BILO rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET (calculated by Randall Donohue), Correlation coefficient data from James Risbey

    Dataset Citation

    Bioregional Assessment Programme (XXXX) SYD ALL climate data statistics summary. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/b0a6ccf1-395d-430e-adf1-5068f8371dea.

    Dataset Ancestors

    Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012

  15. Health and Disease Indicator Reports Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Health and Disease Indicator Reports Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/health-and-disease-indicator-reports-data-package/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    Health indicators are quantifiable characteristics of a population which researchers use as supporting evidence for describing the health of a population. The researchers use a survey methodology to gather information about certain people, use statistics in an attempt to generalize the information collected to the entire population, then use the statistical analysis to make a statement about the health of a population. Health indicators are often used by governments to guide health care policy.

  16. Environmental data associated to particular health events example dataset

    • zenodo.org
    bin, csv, html
    Updated Mar 7, 2023
    Cite
    Albert Navarro-Gallinad; Fabrizio Orlandi; Declan O'Sullivan (2023). Environmental data associated to particular health events example dataset [Dataset]. http://doi.org/10.5281/zenodo.6817101
    Available download formats: html, bin, csv
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Albert Navarro-Gallinad; Fabrizio Orlandi; Declan O'Sullivan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data represent an example output for environmental data (i.e. climate and pollution) linked with individual events through location and time. The linkage is the result of a semantic query that integrates environmental data within an area relevant to the event and selects a period of data before the event.

    The resulting event-environmental linked data contains:

    • The data for analysis as a data table (.csv) and graph (.ttl)
    • The metadata describing the linkage process and the data (.csv and .ttl)
    • The interactive report to explore the (meta)data (.html)

    The graph files are ready to be shared and published as Findable, Accessible, Interoperable and Reusable (FAIR) data, including the necessary information to be reused by other researchers in different contexts.

  17. Mental Health and Learning Disabilities Statistics

    • digital.nhs.uk
    csv, pdf, xls
    Updated Dec 22, 2015
    Cite
    (2015). Mental Health and Learning Disabilities Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-and-learning-disabilities-statistics
    Available download formats: csv (13.2 kB), xls (485.4 kB), pdf (179.7 kB), pdf (578.3 kB), csv (7.2 MB), csv (2.4 MB), pdf (98.5 kB), xls (494.6 kB)
    Dataset updated
    Dec 22, 2015
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Sep 1, 2015 - Oct 31, 2015
    Area covered
    England
    Description

    This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (September 2015). This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England. The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care: a single spell of care may include inputs from either or both types of service. The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged. This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time. For patients, researchers, agencies, and the wider public it aims to provide up to date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis. The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.

    This release also includes a note about the new experimental data file and the issuing of the ISN for the Mental Health Services Dataset (MHSDS). During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users' views on the existing reports and what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below. Please note: The Monthly MHLDS Report published in February will cover November final data and December provisional data and will be the last publication from MHLDDS. Data for January 2016 will be published under the new name of Mental Health Services Monthly Statistics, with a first release of provisional data planned for March 2016. A Methodological Change paper describing changes to these monthly reports will be issued in the New Year.

  18. Flash Eurobarometer FL525 : Monitoring the level of financial literacy in...

    • data.europa.eu
    excel xlsx, zip
    Updated Jul 18, 2023
    Cite
    Directorate-General for Communication (2023). Flash Eurobarometer FL525 : Monitoring the level of financial literacy in the EU [Dataset]. https://data.europa.eu/data/datasets/s2953_fl525_eng?locale=de
    Available download formats: excel xlsx, zip
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Directorate-General Communication
    Authors
    Directorate-General for Communication
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Area covered
    European Union
    Description

    The results show that 18% of EU citizens display a high level of financial literacy, 64% a medium level, and the remaining 18% a low level. There are, however, wide differences across Member States. In only four Member States, more than one quarter of citizens score highly in financial literacy (the Netherlands, Sweden, Denmark and Slovenia). The results also point to the need for financial education to target in particular women, younger people, people with lower income and with lower level of general education who tend to be on average less financially literate than other groups.

    Processed data

    Processed data files for the Eurobarometer surveys are published in .xlsx format.

    • Volume A "Countries/EU" The file contains frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of (weighted) replies for each country or territory and for (weighted) EU results.
    • Volume AP "Trends" The file contains shifts compared to the previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each country or territory foreseen in Volume A and for (weighted) results.
    • Volume AA "Groups of countries" The file contains (labelled) frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of (weighted) replies for groups of countries specified by the managing unit on the part of the EC.
    • Volume AAP "Trends of groups of countries" The file contains shifts compared to the previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each group of countries foreseen in Volume AA and for (weighted) results.
    • Volume B "EU/socio-demographics" The file contains (labelled) frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies for the EU as a whole (weighted) and cross-tabulated by some 20 sociodemographic, socio-political or other variables, depending on the request from the managing unit on the part of the EC or the managing department of the other contracting authorities.
    • Volume BP "Trends of EU/socio-demographics" The file contains shifts compared to the previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each country or territory foreseen in Volume B above and for (weighted) results.
    • Volume C "Country/socio-demographics" The file contains (labelled) weighted frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies for each country or territory surveyed separately and cross-tabulated by some 20 socio-demographic, socio-political or other variables (including a regional breakdown).

    For SPSS files and questionnaires, please contact GESIS - Leibniz Institute for the Social Sciences: https://www.gesis.org/eurobarometer

  19. Repository Analytics and Metrics Portal (RAMP) 2018 data

    • data.niaid.nih.gov
    • dataone.org
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
    Available download formats: zip
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "Methods" section below. Note that the RAMP data model changed in August 2018, and two sets of documentation are provided to describe data collection and processing before and after the change.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
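    The clickThrough field in the list above is a simple ratio, which can be sketched as follows. The function name and the sample row are hypothetical, and returning 0.0 for a row with no impressions is an assumption; the documentation does not say how such rows are handled.

```python
def click_through(clicks: int, impressions: int) -> float:
    """Return the number of clicks divided by the number of impressions."""
    if impressions == 0:
        # Assumption: a row without impressions gets a rate of 0.0.
        return 0.0
    return clicks / impressions


# Hypothetical row using the field names from the list above.
row = {"url": "https://repository.example.edu/item/1.pdf", "impressions": 40, "clicks": 5}
rate = click_through(row["clicks"], row["impressions"])
```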
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
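    The documentation does not give the exact rule RAMP uses to tell content files from HTML pages. One plausible approach, sketched here purely as an assumption, is to look at the file extension of the URL path. The extension list and the function name are hypothetical; only PDF and CSV are named in the description.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of content-file extensions; the source names only PDF and CSV.
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}


def citable_content(url: str) -> str:
    """Return "Yes" if the URL path ends in a content-file extension, else "No"."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return "Yes" if extension in CONTENT_EXTENSIONS else "No"


flags = [
    citable_content("https://repository.example.edu/bitstream/1/2/thesis.PDF?sequence=1"),
    citable_content("https://repository.example.edu/handle/1/2"),
]
```

    An extension check of this kind would miss content files served from URLs without a file extension, so it should be read as an approximation.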

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
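    The two steps above can be sketched in a few lines. The field names (date, citableContent, clicks) come from the documentation; the function name, the sample rows, and the assumption that dates are stored as ISO strings (YYYY-MM-DD) are illustrative only.

```python
from datetime import date


def citable_content_downloads(rows, start, end):
    """Sum clicks on rows flagged as citable content within [start, end]."""
    total = 0
    for row in rows:
        row_date = date.fromisoformat(row["date"])  # assumption: ISO date strings
        if start <= row_date <= end and row["citableContent"] == "Yes":
            total += int(row["clicks"])
    return total


# Hypothetical rows, for illustration only.
rows = [
    {"date": "2018-01-03", "citableContent": "Yes", "clicks": "4"},
    {"date": "2018-01-03", "citableContent": "No", "clicks": "9"},
    {"date": "2018-01-20", "citableContent": "Yes", "clicks": "2"},
    {"date": "2018-02-01", "citableContent": "Yes", "clicks": "7"},
]
ccd = citable_content_downloads(rows, date(2018, 1, 1), date(2018, 1, 31))
```

    For January 2018 only the first and third rows qualify, so the total is 6 clicks.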
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
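    Since index names can change over time for the same repository, per-repository totals are best keyed on repository_id. The sketch below reads CSV rows with Python's standard csv module and sums clicks on citable content per repository. The sample data, including the index values and repository identifiers, is invented for illustration and contains only a subset of the documented columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; note that repo_a appears under two different index names.
SAMPLE = """url,clicks,citableContent,index,repository_id
https://a.example.edu/1.pdf,3,Yes,a_2017,repo_a
https://a.example.edu/2.pdf,2,Yes,a_2018,repo_a
https://a.example.edu/page,5,No,a_2018,repo_a
https://b.example.edu/1.csv,1,Yes,b_2018,repo_b
"""


def ccd_by_repository(csv_file):
    """Return clicks on citable content, summed per repository_id."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if row["citableContent"] == "Yes":
            totals[row["repository_id"]] += int(row["clicks"])
    return dict(totals)


totals = ccd_by_repository(io.StringIO(SAMPLE))
```

    Grouping on the index column instead would split repo_a into two separate totals, which is the situation the repository_id field is meant to avoid.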
    

    Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.
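    Assuming every monthly file follows the same pattern as the example above (four-digit year, two-digit month, then the fixed suffix), the twelve filenames for 2018 can be generated as in this sketch. The helper name is hypothetical.

```python
def ramp_filename(year: int, month: int) -> str:
    """Build a monthly filename in the form YYYY-MM_RAMP_all.csv."""
    return f"{year:04d}-{month:02d}_RAMP_all.csv"


# One filename per calendar month of 2018.
filenames_2018 = [ramp_filename(2018, month) for month in range(1, 13)]
```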

    Data Collection from August 19, 2018 Onward

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped by the country from which the user submitted the corresponding search and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository

  20. Montolo

    • data.niaid.nih.gov
    Updated Aug 21, 2020
    Cite
    Lieber, Sven (2020). Montolo [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3343312
    Dataset updated
    Aug 21, 2020
    Dataset authored and provided by
    Lieber, Sven
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Montolo is a knowledge graph which describes concepts related to RDF data models.

    Currently it contains concepts related to restrictions: ontological axioms and data constraints. The concepts are described using the W3C Data Cube and W3C PROV compliant Montolo-Voc vocabulary. Statistical datasets which refer to descriptions in Montolo are MontoloStats (owl axioms) and MontoloSHACLStats (SHACL constraints).

    Additionally, descriptions of Restriction Type Expressions in Montolo are aligned with the Astrea Knowledge Graph.
