90 datasets found

d
Guidelines for describing a microbiome data analysis
datadryad.org
data.niaid.nih.gov
zip
Updated Oct 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amy Willis; David Clausen (2024). Guidelines for describing a microbiome data analysis [Dataset]. http://doi.org/10.5061/dryad.q2bvq83vc
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.q2bvq83vc
Dataset updated
Oct 18, 2024
Dataset provided by
Dryad
Authors
Amy Willis; David Clausen
Description
These guidelines were drafted by the authors.
Data from: Using decision trees to understand structure in missing data
zenodo.org
data.niaid.nih.gov
+2more
txt, zip
Updated May 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen; Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen (2022). Data from: Using decision trees to understand structure in missing data [Dataset]. http://doi.org/10.5061/dryad.j4f19
Explore at:
txt, zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.j4f19
Dataset updated
May 31, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen; Nicholas J. Tierney; Fiona A. Harden; Maurice J. Harden; Kerrie L. Mengersen
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objectives: Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting: Data taken from employees at 3 different industrial sites in Australia. Participants: 7915 observations were included. Materials and methods: The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the 'rpart' and 'gbm' packages for CART and BRT analyses, respectively, from the statistical software 'R'. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
d
GLO climate data stats summary
data.gov.au
researchdata.edu.au
+1more
zip
Updated Apr 13, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bioregional Assessment Program (2022). GLO climate data stats summary [Dataset]. https://data.gov.au/data/dataset/afed85e0-7819-493d-a847-ec00a318e657
Explore at:
zip(8810)Available download formats
Dataset updated
Apr 13, 2022
Dataset authored and provided by
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

Various climate variables summary for all 15 subregions based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids. Including

Time series mean annual BAWAP rainfall from 1900 - 2012.

Long term average BAWAP rainfall and Penman Potentail Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month

Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods for each of the 8 meteorological variables have calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

There are 4 csv files here:

BAWAP_P_annual_BA_SYB_GLO.csv

Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

Source data: annual BILO rainfall

P_PET_monthly_BA_SYB_GLO.csv

long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

Climatology_Trend_BA_SYB_GLO.csv

Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods for each of the 8 meteorological variables have calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend

Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

Dataset History

Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.

BAWAP_P_annual_BA_SYB_GLO.csv

Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

Source data: annual BILO rainfall

P_PET_monthly_BA_SYB_GLO.csv

long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

Climatology_Trend_BA_SYB_GLO.csv

Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods for each of the 8 meteorological variables have calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend

Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

Dataset Citation

Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.

Dataset Ancestors

Derived From Natural Resource Management (NRM) Regions 2010

Derived From Bioregional Assessment areas v03

Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012

Derived From Bioregional Assessment areas v01

Derived From Bioregional Assessment areas v02

Derived From GEODATA TOPO 250K Series 3

Derived From NSW Catchment Management Authority Boundaries 20130917

Derived From Geological Provinces - Full Extent

Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
d
MEDLINE/PubMed Baseline Statistics: Misc Report
catalog.data.gov
data.virginia.gov
+2more
Updated Jun 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Library of Medicine (2025). MEDLINE/PubMed Baseline Statistics: Misc Report [Dataset]. https://catalog.data.gov/dataset/2023-medline-pubmed-baseline-misc-report
Explore at:
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
A file containing all Misc Baseline Reports for 2018-2023 in their original format is available in the Attachments section below. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed are available. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
E
Central Statistical Office Dataset
live.european-language-grid.eu
data.europa.eu
xml
Updated Sep 9, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Central Statistical Office Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/18867
Explore at:
xmlAvailable download formats
Dataset updated
Sep 9, 2022
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
Two Polish-English publications of the Polish Central Statistical Office in the XLIFF format: 1. "Statistical Yearbook of the Republic of Poland 2015" is the main summary publication of the Central Statistical Office, including a comprehensive set of statistical data describing the condition of the natural environment, the socio-economic and demographic situation of Poland, and its position in Europe and in the world. 2. "Women in Poland" contains statistical information regarding women's place and participation in socio-economic life of the country including international comparisons. The texts were aligned at the level of translation segments (mostly sentences and short paragraphs) and manually verified.
d
MEDLINE/PubMed Baseline Statistics: Min/Max Report
catalog.data.gov
datadiscovery.nlm.nih.gov
+2more
Updated Feb 3, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Library of Medicine (2025). MEDLINE/PubMed Baseline Statistics: Min/Max Report [Dataset]. https://catalog.data.gov/dataset/2023-medline-pubmed-baseline-min-max-report
Explore at:
Dataset updated
Feb 3, 2025
Dataset provided by
National Library of Medicine
Description
A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed are available. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
f
Statistical methods to model and evaluate physical activity programs, using...
plos.figshare.com
doc
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
S. S. M. Silva; Madawa W. Jayawardana; Denny Meyer (2023). Statistical methods to model and evaluate physical activity programs, using step counts: A systematic review [Dataset]. http://doi.org/10.1371/journal.pone.0206763
Explore at:
docAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0206763
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
S. S. M. Silva; Madawa W. Jayawardana; Denny Meyer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BackgroundPhysical activity reduces the risk of noncommunicable diseases and is therefore an essential component of a healthy lifestyle. Regular engagement in physical activity can produce immediate and long term health benefits. However, physical activity levels are not as high as might be expected. For example, according to the global World Health Organization (WHO) 2017 statistics, more than 80% of the world’s adolescents are insufficiently physically active. In response to this problem, physical activity programs have become popular, with step counts commonly used to measure program performance. Analysing step count data and the statistical modeling of this data is therefore important for evaluating individual and program performance. This study reviews the statistical methods that are used to model and evaluate physical activity programs, using step counts.MethodsAdhering to PRISMA guidelines, this review systematically searched for relevant journal articles which were published between January 2000 and August 2017 in any of three databases (PubMed, PsycINFO and Web of Science). Only the journal articles which used a statistical model in analysing step counts for a healthy sample of participants, enrolled in an intervention involving physical exercise or a physical activity program, were included in this study. In these programs the activities considered were natural elements of everyday life rather than special activity interventions.ResultsThis systematic review was able to identify 78 unique articles describing statistical models for analysing step counts obtained through physical activity programs. General linear models and generalized linear models were the most popular methods used followed by multilevel models, while structural equation modeling was only used for measuring the personal and psychological factors related to step counts. Surprisingly no use was made of time series analysis for analysing step count data. The review also suggested several strategies for the personalisation of physical activity programs.ConclusionsOverall, it appears that the physical activity levels of people involved in such programs vary across individuals depending on psychosocial, demographic, weather and climatic factors. Statistical models can provide a better understanding of the impact of these factors, allowing for the provision of more personalised physical activity programs, which are expected to produce better immediate and long-term outcomes for participants. It is hoped that this review will identify the statistical methods which are most suitable for this purpose.
d
Digital data sets describing metropolitan areas in the conterminous US
catalog.data.gov
data.usgs.gov
+2more
Updated Oct 5, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Digital data sets describing metropolitan areas in the conterminous US [Dataset]. https://catalog.data.gov/dataset/digital-data-sets-describing-metropolitan-areas-in-the-conterminous-us
Explore at:
Dataset updated
Oct 5, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Contiguous United States, United States
Description
This data set describes metropolitan areas in the conterminous United States, developed from U.S. Bureau of the Census boundaries of Consolidated Metropolitan Statistical Areas (CMSA) and Metropolitan Statistical Areas (MSA), that have been processed to extract the largest contiguous urban area within each MSA or CMSA.
Law Enforcement Assistance Administration Profile Data, [1968-1978]
catalog.data.gov
icpsr.umich.edu
Updated Mar 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bureau of Justice Statistics (2025). Law Enforcement Assistance Administration Profile Data, [1968-1978] [Dataset]. https://catalog.data.gov/dataset/law-enforcement-assistance-administration-profile-data-1968-1978-e48da
Explore at:
Dataset updated
Mar 12, 2025
Dataset provided by
Bureau of Justice Statisticshttp://bjs.ojp.gov/
Description
The Law Enforcement Assistance Administration File (PROFILE) System was designed for the automated storage and retrieval of information describing programs sponsored by the Bureau of Justice Statistics. The two types of data elements used to describe the projects in this file are basic data and program descriptors. The basic data elements include the title of the grant, information regarding the location of the grantee and the project, critical funding dates, the government level and type of grantee, financial data, the name of the project director, indication of the availability of reports, and identification numbers. The program descriptor elements form the program classification system and describe the key characteristics of the program. Key characteristics include subject of the program, primary and secondary activity, whether the program covered a juvenile or adult problem, and what specific crimes, clients, staff, program strategies, agencies, equipment, or research methods were to be used or would be affected by the project.
School District Data Book (SDDB), 1990: [United States] - Archival Version
search.gesis.org
Updated Feb 26, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Department of Education. National Center for Education Statistics (2021). School District Data Book (SDDB), 1990: [United States] - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR02953
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR02953
Dataset updated
Feb 26, 2021
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
GESIS search
Authors
United States Department of Education. National Center for Education Statistics
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de435696https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de435696
Area covered
United States
Description
Abstract (en): The School District Data Book (SDDB) is an education database and information system. It contains an extensive set of data on children, their households, and the nation's school systems. Under the sponsorship of the National Center for Education Statistics, the Bureau of the Census has produced special tabulation files using the basic record files of the 1990 Census of Population and Housing by school district. These tabulation files contain aggregated data describing attributes of children and households in school districts. Data are organized by seven types of tabulation records: (1) characteristics of all households, (2) characteristics of all persons, (3) characteristics of households with children, (4) characteristics of parents living with children, (5) children's household characteristics, (6) children's parents' characteristics, and (7) children's own characteristics. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.. All public elementary and secondary education agencies in operation during 1990-1991 in the 50 states and the District of Columbia. 2006-10-27 Variable names were corrected in SAS and SPSS setup files. The processing note in the codebook was also updated to reflect the corrections.2006-01-12 All files were removed from dataset 139 and flagged as study-level files, so that they will accompany all downloads.2006-01-12 All files were removed from dataset 138 and flagged as study-level files, so that they will accompany all downloads.2006-01-12 All files were removed from dataset 137 and flagged as study-level files, so that they will accompany all downloads.2002-05-29 Seventeen additional datasets (Parts 140-156) were added to the collection, including data for two states previously not covered -- Vermont and Washington -- and additional data for Arkansas, California, Illinois, Massachusetts, Michigan, Minnesota, New Jersey, Pennsylvania, and Texas. (1) Some states have multiple data files because they have large numbers of cases. (2) Two data files are not included in this release. They are: Washington, Part 3, and Wisconsin, Part 4.
Z
MontoloStats
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lieber, Sven (2020). MontoloStats [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3343052
Explore at:
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Lieber, Sven
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
MontoloStats is a dataset containing RDF DataCube-based statistics of Montolo concepts described using the W3C RDF DataCube and PROV-based MontoloVoc vocabulary

Ontologies which are built with the RDF framework consist of concepts and relationship between these concepts. Additionally several restrictions in the form of axioms can be defined, using terms of the RDFS and OWL vocabulary. To understand how current ontologies are modeled we created Montolo.

MontoloVoc is an OWL and RDFS-based vocabulary initially describing concepts regarding restrictions, and it is RDF Data Cube-based so that statistics regarding restrictions can be described. The restriction concepts entail abstract Restriction Types, such as disjoint classes or reflexive properties, and different restriction type Expressions for each type, e.g. owl:disjointWith or owl:AllDisjointClasses for the restriction type disjoint classes. Information regarding the use of restriction types are published as MontoloStats dataset, described using the MontoloVoc vocabulary and currently covering 660 LOV and 565 BioPortal ontologies.
Data from: Natality Detail File, 1993: [United States]
icpsr.umich.edu
archive.ciser.cornell.edu
ascii, sas, spss +1
Updated Mar 28, 2008
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Department of Health and Human Services. National Center for Health Statistics (2008). Natality Detail File, 1993: [United States] [Dataset]. http://doi.org/10.3886/ICPSR06847.v1
Explore at:
ascii, spss, stata, sasAvailable download formats
Unique identifier
https://doi.org/10.3886/ICPSR06847.v1
Dataset updated
Mar 28, 2008
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
United States Department of Health and Human Services. National Center for Health Statistics
License
https://www.icpsr.umich.edu/web/ICPSR/studies/6847/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/6847/terms
Time period covered
1993
Area covered
United States
Description
This collection provides information on live births in the United States during calendar year 1993. The natality data in this file are a component of the vital statistics collection effort maintained by the federal government. Geographic variables describing residence of births include the state, county, city, population, division and state subcode, Standard Metropolitan Statistical Area (SMSA), and metropolitan/nonmetropolitan county. Other variables include the race and sex of the child, the age of the mother, mother's education, place of delivery, person in attendance, and live birth order. The natality tabulations in the documentation include live births by age of mother, live-birth order and race of child, live births by marital status of mother, age of mother, and race of child, and live births by attendant and place of delivery.
c
MEDLINE/PubMed Baseline Statistics: Min/Max Report
s.cnmilf.com
Updated Jun 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The citation is currently not available for this dataset.
Explore at:
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records. MEDLINE/PubMed annual statistical reports are based upon the data elements in the baseline versions of MEDLINE®/PubMed are available. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
m
SYD ALL climate data statistics summary
demo.dev.magda.io
devweb.dga.links.com.au
+1more
zip
Updated Jun 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bioregional Assessment Program (2022). SYD ALL climate data statistics summary [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-624b1c2c-f93b-4baa-907a-cde9d60b25bb
Explore at:
zipAvailable download formats
Dataset updated
Jun 27, 2022
Dataset provided by
Bioregional Assessment Program
License
Attribution 2.5 (CC BY 2.5)https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
Description
Abstract The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The …Show full descriptionAbstract The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. There are 4 csv files here: BAWAP_P_annual_BA_SYB_GLO.csv Desc: Time series mean annual BAWAP rainfall from 1900 - 2012. Source data: annual BILO rainfall on \wron\Project\BA\BA_N_Sydney\Working\li036_Lingtao_LI\Grids\BILO_Rain_Ann\ P_PET_monthly_BA_SYB_GLO.csv long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month Climatology_Trend_BA_SYB_GLO.csv Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods for each of the 8 meteorological variables have calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CMAR, Hobart. As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009). Dataset History Dataset was created from various BILO source data, including Monthly BILO rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET (calculated by Randall Donohue), Correlation coefficient data from James Risbey Dataset Citation Bioregional Assessment Programme (XXXX) SYD ALL climate data statistics summary. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/b0a6ccf1-395d-430e-adf1-5068f8371dea. Dataset Ancestors Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Health and Disease Indicator Reports Data Package
johnsnowlabs.com
csv
Updated Jan 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs (2021). Health and Disease Indicator Reports Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/health-and-disease-indicator-reports-data-package/
Explore at:
csvAvailable download formats
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Description
Health indicators are quantifiable characteristics of a population which researchers use as supporting evidence for describing the health of a population. The researchers use a survey methodology to gather information about certain people, use statistics in an attempt to generalize the information collected to the entire population, then use the statistical analysis to make a statement about the health of a population. Health indicators are often used by governments to guide health care policy.
Environmental data associated to particular health events example dataset
zenodo.org
bin, csv, html
Updated Mar 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Albert Navarro-Gallinad; Albert Navarro-Gallinad; Fabrizio Orlandi; Fabrizio Orlandi; Declan O'Sullivan; Declan O'Sullivan (2023). Environmental data associated to particular health events example dataset [Dataset]. http://doi.org/10.5281/zenodo.6817101
Explore at:
html, bin, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6817101
Dataset updated
Mar 7, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Albert Navarro-Gallinad; Albert Navarro-Gallinad; Fabrizio Orlandi; Fabrizio Orlandi; Declan O'Sullivan; Declan O'Sullivan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data represents and example output for environmental data (i.e. climate and pollution) linked with individual events through location and time. The linkage is the result of a semantic query that integrates environmental data within an area relevant to the event and selects a period of data before the event.

The resulting event-environmental linked data contains:

The data for analysis as a data table (.csv) and graph (.ttl)

The metadata describing the linkage process and the data (.csv and .ttl)

The interactive report to explore the (meta)data (.html)

The graph files are ready to be shared and published as Findable, Accessible, Interoperable and Reusable (FAIR) data, including the necessary information to be reused by other researchers in different contexts.
d
Mental Health and Learning Disabilities Statistics
digital.nhs.uk
csv, pdf, xls
Updated Dec 22, 2015
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2015). Mental Health and Learning Disabilities Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-and-learning-disabilities-statistics
Explore at:
csv(13.2 kB), xls(485.4 kB), pdf(179.7 kB), pdf(578.3 kB), csv(7.2 MB), csv(2.4 MB), pdf(98.5 kB), xls(494.6 kB)Available download formats
Dataset updated
Dec 22, 2015
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Sep 1, 2015 - Oct 31, 2015
Area covered
England
Description
This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (September 2015). This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England. The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care - a single spell of care may include inputs from either of both types of service. The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged. This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time. For patients, researchers, agencies, and the wider public it aims to provide up to date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis. The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page. This release also includes a note about the new experimental data file and the issuing of the ISN for the Mental Health Services Dataset (MHSDS). During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users views on the existing reports and what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below. Please note: The Monthly MHLDS Report published in February will cover November final data and December provisional data and will be the last publication from MHLDDS. Data for January 2016 will be published under the new name of Mental Health Services Monthly Statistics, with a first release of provisional data planned for March 2016. A Methodological Change paper describing changes to these monthly reports will be issued in the New Year.
Flash Eurobarometer FL525 : Monitoring the level of financial literacy in...
data.europa.eu
excel xlsx, zip
Updated Jul 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Directorate-General for Communication (2023). Flash Eurobarometer FL525 : Monitoring the level of financial literacy in the EU [Dataset]. https://data.europa.eu/data/datasets/s2953_fl525_eng?locale=de
Explore at:
excel xlsx, zipAvailable download formats
Dataset updated
Jul 18, 2023
Dataset provided by
Directorate-General Communication
Authors
Directorate-General for Communication
License
http://data.europa.eu/eli/dec/2011/833/ojhttp://data.europa.eu/eli/dec/2011/833/oj
Area covered
European Union
Description
The results show that 18% of EU citizens display a high level of financial literacy, 64% a medium level, and the remaining 18% a low level. There are, however, wide differences across Member States. In only four Member States, more than one quarter of citizens score highly in financial literacy (the Netherlands, Sweden, Denmark and Slovenia). The results also point to the need for financial education to target in particular women, younger people, people with lower income and with lower level of general education who tend to be on average less financially literate than other groups.

Processed data

Processed data files for the Eurobarometer surveys are published in .xlsx format.

Volume A "Countries/EU" The file contains frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of (weighted) replies for each country or territory and for (weighted) EU results.

Volume AP "Trends" The file compares to previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each country or territory foreseen in Volume A and for (weighted) results.

Volume AA "Groups of countries" The file contains (labelled) frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of (weighted) replies for groups of countries specified by the managing unit on the part of the EC.

Volume AAP "Trends of groups of countries" The file contains shifts compared to the previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each groups of countries foreseen in Volume AA and for (weighted) results.

Volume B "EU/socio-demographics" The file contains (labelled) frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies for the EU as a whole (weighted) and cross-tabulated by some 20 sociodemographic, socio-political or other variables, depending on the request from the managing unit on the part of the EC or the managing department of the other contracting authorities.

Volume BP "Trends of EU/socio-demographics" The file contains shifts compared to the previous poll in (weighted) frequencies and means (or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies); shifts for each country or territory foreseen in Volume B above)and for (weighted) results.

Volume C "Country/socio-demographics" The file contains (labelled) weighted frequencies and means or other synthetic indicators including elementary bivariate statistics describing distribution patterns of replies for each country or territory surveyed separately and cross-tabulated by some 20 socio-demographic, socio-political or other variables (including a regional breakdown).

For SPSS files and questionnaires, please contact GESIS - Leibniz Institute for the Social Sciences: https://www.gesis.org/eurobarometer
n
Repository Analytics and Metrics Portal (RAMP) 2018 data
data.niaid.nih.gov
dataone.org
+2more
zip
Updated Jul 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.ffbg79cvp
Dataset updated
Jul 27, 2021
Dataset provided by
University of New Mexico
Montana State University
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance use data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "methods" section below. Note that the RAMP data model changed in August, 2018 and two sets of documentation are provided to describe data collection and processing before and after the change.

Methods

RAMP Data Documentation – January 1, 2017 through August 18, 2018

Data Collection

RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. country: The country from which the corresponding search originated. device: The device used for the search. date: The date of the search.

Following data processing describe below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes." Sum the value of the "clicks" field on these rows.

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

The data in these CSV files include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. country: The country from which the corresponding search originated. device: The device used for the search. date: The date of the search. citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No. index: The Elasticsearch index corresponding to page click data for a single IR. repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.

Data Collection from August 19, 2018 Onward

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. date: The date of the search.

Following data processing describe below, on ingest into RAMP a additional field, citableContent, is added to the page level data.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated. device: The device used for the search. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository
Z
Montolo
data.niaid.nih.gov
Updated Aug 21, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lieber, Sven (2020). Montolo [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3343312
Explore at:
Dataset updated
Aug 21, 2020
Dataset authored and provided by
Lieber, Sven
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Montolo is a knowledge graph which describes concepts related to RDF data models.

Currently it contains concepts related to restrictions: ontological axioms and data constraints. The concepts are described using the W3C Data Cube and W3C PROV compliant Montolo-Voc vocabulary. Statistical datasets which refer to descriptions in Montolo are MontoloStats (owl axioms) and MontoloSHACLStats (SHACL constraints).

Additionally, descriptions of Restriction Type Expressions in Montolo are aligned with the Astrea Knowledge Graph.