32 datasets found
  1. Dataset Freshness Report for data.maryland.gov

    • data.wu.ac.at
    csv, json, xml
    Updated Aug 12, 2015
    Cite
    Department of Information Technology (DoIT) (2015). Dataset Freshness Report for data.maryland.gov [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/OHlwYS1jOWQ5
    Available download formats: csv, json, xml
    Dataset updated
    Aug 12, 2015
    Dataset provided by
    Department of Information Technology (DoIT)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Maryland
    Description

    This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
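
    The freshness rule above can be sketched in Python. This is a minimal illustration, not the report's actual implementation (which, per the description, uses JavaScript). The function name `is_fresh` and the frequency labels are assumptions; only the 7-day and 31-day limits come from the text.

```python
from datetime import date, timedelta
from typing import Optional

# Maximum allowed age in days per update frequency. The weekly and monthly
# limits come from the description; the label spellings are assumed.
MAX_AGE_DAYS = {"weekly": 7, "monthly": 31}


def is_fresh(last_updated: date, frequency: str, today: date) -> Optional[bool]:
    """Return True/False for a known frequency, or None if no limit is known."""
    limit = MAX_AGE_DAYS.get(frequency.lower())
    if limit is None:
        return None  # frequency not covered by this sketch
    return (today - last_updated) <= timedelta(days=limit)


if __name__ == "__main__":
    report_day = date(2015, 8, 12)
    print(is_fresh(date(2015, 8, 6), "weekly", report_day))
    print(is_fresh(date(2015, 8, 1), "weekly", report_day))
```

    Frequencies without a stated limit return None rather than a guess, so a caller can flag them for manual review.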

    This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included in the report, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on date of last data update and update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.

    This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.

  2. Dataset Freshness Report: Breakout by Agency

    • opendata.maryland.gov
    application/rdfxml +5
    Updated Aug 8, 2025
    Cite
    MD Department of Information Technology (2025). Dataset Freshness Report: Breakout by Agency [Dataset]. https://opendata.maryland.gov/Administrative/Dataset-Freshness-Report-Breakout-by-Agency/mb32-u83y
    Available download formats: csv, application/rdfxml, tsv, json, application/rssxml, xml
    Dataset updated
    Aug 8, 2025
    Dataset authored and provided by
    MD Department of Information Technology
    Description

    This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.

    This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included in the report, so that users can see whether new reports are consistently being uploaded). Generation of this report uses the Socrata Open Data API to retrieve metadata on date of last data update and update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.

    This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.

  3. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note that the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. 
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
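
    The overlap prioritization described above (GAP Status Code first, then public access, then load order) can be illustrated as a sort key. This is only a sketch of the ordering logic, not the USGS script named in the description, which operates on geodatabase geometries. The `Designation` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Public access priority in the order stated in the description:
# Closed, Restricted, Open, Unknown.
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}


@dataclass
class Designation:
    """Hypothetical record for one overlapping protected-area designation."""
    name: str
    gap_status: int     # GAP Status Code; a lower code wins (1 over 2)
    public_access: str  # one of the ACCESS_RANK keys
    load_order: int     # position in the inventory; earlier records win


def priority_key(d: Designation):
    # Tuples compare element by element, so later fields only break ties.
    return (d.gap_status, ACCESS_RANK[d.public_access], d.load_order)


def pick_winner(overlapping: List[Designation]) -> Designation:
    """Return the designation that would be kept where the inputs overlap."""
    return min(overlapping, key=priority_key)


if __name__ == "__main__":
    forest = Designation("National Forest", 2, "Open", 1)
    wilderness = Designation("Wilderness", 1, "Open", 2)
    print(pick_winner([forest, wilderness]).name)
```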

  4. Dry Well Reporting System Data

    • data.ca.gov
    • data.cnra.ca.gov
    • +2more
    csv
    Updated Aug 11, 2025
    Cite
    California Department of Water Resources (2025). Dry Well Reporting System Data [Dataset]. https://data.ca.gov/dataset/dry-well-reporting-system-data
    Available download formats: csv
    Dataset updated
    Aug 11, 2025
    Dataset authored and provided by
    California Department of Water Resources (http://www.water.ca.gov/)
    Description

    In California, water systems serving one (1) to 15 households are regulated at the county level. Counties vary in their practices, but rarely do counties collect data regularly from these systems. Even where data is collected, it is entirely voluntary. A review of well permit information suggests there are over 1 million such water systems in California.

    In early 2014, a cross-agency Work Group created an easily accessible reporting system to get more systematic data on which parts of the state had households at risk of water supply shortages. The initial motivation for local water supply systems to report shortage information was to obtain statewide drought assistance. The reporting system receives ongoing reports of shortages from local, state, federal and non-governmental organizations, and tracks their status to resolution. While several counties have developed their own tracking mechanisms, this data is manually entered into the reporting system.

    The cross-agency team, led by DWR, seeks to verify and update the data submitted. However, due to the volunteer nature of the reporting and limitations on reporting agencies, collected data are undoubtedly under-representative of all shortages to have occurred. In addition, reports are received from multiple sources and there are occasionally errors and omissions that can create duplicate entries, non-household water supply reporting, and under-reporting. For example, missing information or no data for a given county does not necessarily mean that there are no household water shortages in the county, rather only that none have been reported to the State.

  5. NSDUH 2018 Sample Design Report

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 31, 2025
    Cite
    Substance Abuse and Mental Health Services Administration (2025). NSDUH 2018 Sample Design Report [Dataset]. https://catalog.data.gov/dataset/nsduh-2018-sample-design-report
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (http://www.samhsa.gov/)
    Description

    This report details the 2018 sample design for the NSDUH and covers the design overview, target population, and stages of sample selection. The design overview describes how the sample design remains consistent with NSDUH’s designs since 1991 and has extended coverage of the sample since then to include additional resident populations. The 2018 target population for this report comprises a civilian, noninstitutionalized population aged 12 years or older residing within the 50 states and the District of Columbia. There are three stages of sample selection that are explained in terms of how to select and aggregate the appropriate census tracts and segments based on state sampling regions (SSRs) for each state.

  6. Data from: Compliance with mandatory reporting of clinical trial results on...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Jan 4, 2012
    Cite
    Andrew P. Prayle; Matthew N. Hurley; Alan R. Smyth (2012). Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study [Dataset]. http://doi.org/10.5061/dryad.j512f21p
    Available download formats: zip
    Dataset updated
    Jan 4, 2012
    Dataset provided by
    Dryad
    Authors
    Andrew P. Prayle; Matthew N. Hurley; Alan R. Smyth
    Time period covered
    Dec 13, 2011
    Area covered
    United States
    Description

    • clinicaltrials.gov_search: This is the complete original dataset.
    • identify completed trials: This is the R script which, when run on "clinicaltrials.gov_search.txt", will produce a .csv file which lists all the completed trials.
    • FDA_table_with_sens: This is the final dataset after cross-referencing the trials. An explanation of the variables is included in the supplementary file "2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in the dataset".
    • analysis_after_FDA_categorization_and_sens: This R script reproduces the analysis from the paper, including the tables and statistical tests. The comments should make it self-explanatory.
    • 2011-11-02 prayle hurley smyth supplementary file 1 STROBE checklist: This is a STROBE checklist for the study.
    • 2011-10-31 Prayle Hurley Smyth Supplementary file 2 examples of categorization: This is a supplementary file which illustrates some of the decisions which had to be made when categorizing trials.
    • 2011-10-31 Prayle Hurley Smyth Supplementary file 3 variables in th...

  7. Data from: PISA 2006 Technical Report

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). PISA 2006 Technical Report [Dataset]. https://catalog.data.gov/dataset/pisa-2006-technical-report
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    United States Department of State (http://state.gov/)
    Description

    The PISA 2006 Technical Report describes the methodology underlying the PISA 2006 survey. It examines additional features related to the implementation of the project at a level of detail that allows researchers to understand and replicate its analyses. The reader will find a wealth of information on the test and sample design, methodologies used to analyse the data, technical features of the project and quality control mechanisms.

  8. ckanext-statsresources

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-statsresources [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-statsresources
    Dataset updated
    Jun 4, 2025
    Description

    The ckanext-statsresources extension for CKAN provides the ability to generate and publish various statistics and reports from a CKAN instance as resources within datasets. It leverages the ckanext-report extension to create these reports and allows administrators to configure which reports are generated, in what format, and for which datasets. This extension automates the process of creating and updating statistical resources, making it easier to provide insights into the CKAN instance's data.

    Key Features:

    • Automated Report Generation: Automatically generates statistical reports based on a configurable map of report names, formats, dataset IDs, and resource titles.
    • Multiple Format Support: Supports exporting reports in various formats, as determined by the underlying ckanext-report extension (for example, JSON and CSV).
    • Configurable Report Options: Allows customization of report content by specifying options such as including or excluding private and draft datasets.
    • Command-Line Interface (CLI): Provides paster commands to list available statistical resources and generate/update the corresponding resources.
    • Access Control: Can be configured to limit access to the generated report pages to sysadmins only.
    • Dependency on ckanext-report: Built upon ckanext-report to handle underlying report creation, so ckanext-report needs to be installed first.

    Use Cases:

    • Dataset Usage Monitoring: Regularly generate reports on dataset creation dates, providing insight into the growth of the CKAN instance's data catalog.

  9. medreport_text_1000

    • huggingface.co
    Updated Aug 5, 2025
    Cite
    Young-Wouk Kim (2025). medreport_text_1000 [Dataset]. https://huggingface.co/datasets/wouk1805/medreport_text_1000
    Dataset updated
    Aug 5, 2025
    Authors
    Young-Wouk Kim
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MedReport - Reports Dataset

    Dataset Description

    This dataset contains medical audio transcriptions and the corresponding structured reports.

    Columns

    • input: Audio transcription
    • output: Structured medical report
    • sample_id: Example identifier

    Statistics

    • Total examples: 1000
    • License: Apache License 2.0
    • Created: 2025-08-05

    Usage

    Loading the dataset

    from datasets import load_dataset

    # Load the dataset
    full_dataset =…

    See the full description on the dataset page: https://huggingface.co/datasets/wouk1805/medreport_text_1000.

  10. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data:...

    • openicpsr.org
    Updated Jun 5, 2017
    Cite
    Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2020 [Dataset]. http://doi.org/10.3886/E100707V17
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    1960 - 2020
    Area covered
    United States
    Description

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

    Version 17 release notes:

    • Adds data for 2020.
    • Please note that the FBI has retired UCR data ending in 2020 data so this will be the last Offenses Known and Clearances by Arrest data they release.
    • Changes .rda files to .rds.
    • Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

    Version 16 release notes:

    • Changes release notes description, does not change data.

    Version 15 release notes:

    • Adds data for 2019.
    • Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

    Version 14 release notes:

    • Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
    • As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
    • Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always so please check if you intend to use arson data.
    • Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
    • For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.

    Version 13 release notes:

    • Adds 2018 data
    • New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018 so unfounded columns for 2018 are all NA.

    Version 12 release notes:

    • Adds population 1-3 columns - if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which counties the agency is in. The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion!
    • Fixes bug in the crosswalk data that is merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug!
    • Adds a last_month_reported column which says which month was reported last. This is actually how the FBI defines number_of_months_reported so is a more accurate representation of that. Removes the number_of_months_reported variable as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead.
    • Adds a number_of_months_missin

  11. Reporting the limits of detection (LOD) and quantification (LOQ) for...

    • data.usgs.gov
    • datasets.ai
    • +3more
    Cite
    Christopher Merkes; Katy Klymus; Michael Allison; Caren Goldberg; Caren Helbin; Margaret E.; Craig Jackson; Richard Lance; Anna Mangan; Emy Monroe; Antoineete Piagio; Joel Stokdyk; Chris Wilson; Cathy Richter, Reporting the limits of detection (LOD) and quantification (LOQ) for environmental DNA assays: Data [Dataset]. http://doi.org/10.5066/P9AKHU1R
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Christopher Merkes; Katy Klymus; Michael Allison; Caren Goldberg; Caren Helbin; Margaret E.; Craig Jackson; Richard Lance; Anna Mangan; Emy Monroe; Antoineete Piagio; Joel Stokdyk; Chris Wilson; Cathy Richter
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Nov 1, 2017 - May 1, 2019
    Description

    This data set was collected to provide examples and aid in developing a standardized way of determining LOD and LOQ for eDNA assays and has 3 data files. GEDWG_LOD_DATA3.csv is raw qPCR data from multiple labs running multiple standards of known concentration for eDNA assays they regularly use. Comparison-Data.csv is the merged data output from running a generic LOD/LOQ calculator script multiple times with different LOD model settings. The generic LOD/LOQ calculator script is available at: https://github.com/cmerkes/qPCR_LOD_Calc, and details about the multiple settings used are commented in the analysis script available at: https://github.com/cmerkes/LOD_Analysis

  12. Water Point Source Sample Results

    • opendata.maryland.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +2more
    application/rdfxml +5
    Updated Jul 11, 2014
    Cite
    Maryland Department of the Environment (2014). Water Point Source Sample Results [Dataset]. https://opendata.maryland.gov/Energy-and-Environment/Water-Point-Source-Sample-Results/eqs6-savc
    Available download formats: csv, application/rssxml, xml, json, tsv, application/rdfxml
    Dataset updated
    Jul 11, 2014
    Dataset authored and provided by
    Maryland Department of the Environment
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    Discharge monitoring reports (DMRs) and monthly operating reports (MORs) for the 58 major wastewater treatment plants (WWTPs), 215 minor WWTPs, and 10 major industrial point sources within the State of Maryland are stored in MDE’s point source database. This data set contains results for nutrients such as phosphorus and nitrogen that may impact water quality in the Chesapeake Bay. This dataset covers the time period from 2008 to 2013.

  13. A vigiPoint characterisation of female versus male reports in VigiBase, the...

    • zenodo.org
    • search.dataone.org
    • +1more
    bin
    Updated Jun 2, 2022
    Cite
    Sarah Watson; Ola Caster (2022). A vigiPoint characterisation of female versus male reports in VigiBase, the WHO global database of individual case safety reports [Dataset]. http://doi.org/10.5061/dryad.8cz8w9gk1
    Available download formats: bin
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo
    Authors
    Sarah Watson; Ola Caster
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    General information

    This data is supplementary material to the paper by Watson et al. on sex differences in global reporting of adverse drug reactions [1]. Readers are referred to this paper for a detailed description of the context in which the data was generated. Anyone intending to use this data for any purpose should read the publicly available information on the VigiBase source data [2, 3]. The conditions specified in the caveat document [3] must be adhered to.

    Source dataset

    The dataset published here is based on analyses performed in VigiBase, the WHO global database of individual case safety reports [4]. All reports entered into VigiBase from its inception in 1967 up to 2 January 2018 with patient sex coded as either female or male have been included, except suspected duplicate reports [5]. In total, the source dataset contained 9,056,566 female and 6,012,804 male reports.

    Statistical analysis

    The characteristics of the female reports were compared to those of the male reports using a method called vigiPoint [6]. This is a method for comparing two or more sets of reports (here female and male reports) on a large set of reporting variables, and highlighting any feature in which the sets differ in a statistically and clinically relevant manner. For example, patient age group is a reporting variable, and the different age groups 0 - 27 days, 28 days - 23 months et cetera are features within this variable. The statistical analysis is based on shrinkage log odds ratios computed as a comparison between the two sets of reports for each feature, including all reports without missing information for the variable under consideration. The specific output from vigiPoint is defined precisely below. Here, the results for 18 different variables with a total of 44,486 features are presented. 74 of these features were highlighted as so-called vigiPoint key features, suggesting a statistically and clinically significant difference between female and male reports in VigiBase.

    Description of published dataset

    The dataset is provided in the form of a MS Excel spreadsheet (.xlsx file) with nine columns and 44,486 rows (excluding the header), each corresponding to a specific feature. Below follows a detailed description of the data included in the different columns.

    Variable: This column indicates the reporting variable to which the specific feature belongs. Eight of these variables are described in the original publication by Watson et al.: country of origin, geographical region of origin, type of reporter, patient age group, MedDRA SOC, ATC level 2 of reported drugs, seriousness, and fatality [1]. The remaining 10 are described here:

    • MedDRA HLGT (high-level group term), MedDRA HLT (high-level term) and MedDRA PT (preferred term) are defined analogously to the MedDRA SOC (system organ class) [1], only at lower levels of the MedDRA (Medical Dictionary for Regulatory Activities) hierarchy. Here, MedDRA version 20.1 has been used.
    • ATC level 3 of reported drugs is defined analogously to the variable ATC level 2 of reported drugs [1], only one step further down in the ATC (Anatomical Therapeutic Chemical) hierarchy.
    • The vigiGrade completeness score is a measure of how complete each report is with respect to certain report fields useful for causality assessment [7]. The completeness score has been dichotomised into two features, 'Above or equal to 0.8' and 'Below 0.8'. The maximum possible score for an individual report is 1.0.
    • The date of VigiBase entry is simply the time when a report was entered into VigiBase. This variable is divided into 14 features that are either individual years or ranges of years.
    • The number of reported drugs is the number of unique drugs that are coded on a report as either suspected, interacting, or concomitant. A drug is here defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. The variable is divided into four features: 'One drug', 'Two drugs', '3-5 drugs', and 'More than 5 drugs'.
    • The number of reported MedDRA PTs is the number of unique MedDRA preferred terms that are coded as events on a report. This variable is divided into four features in exactly the same way as the reported drugs.
    • A reported drug is a drug coded on a report as either suspected, interacting, or concomitant. As above, a drug is defined as an entry at the preferred base (i.e. substance) level of the WHODRUG terminology. This variable has almost 23,000 features, one for each drug that occurs in at least one female or one male report.
    • The type of report indicates the type of individual case report. The vast majority belongs to the feature 'Spontaneous', but there are four other possible features for this variable.
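
    Two of the binned variables above can be sketched as small Python helpers. This is an illustration only: the feature labels are quoted from the list, while the function names are hypothetical and the source does not say how the authors implemented the binning.

```python
def completeness_feature(score: float) -> str:
    """Dichotomise a vigiGrade completeness score (maximum 1.0) at 0.8."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("completeness score must lie between 0 and 1")
    return "Above or equal to 0.8" if score >= 0.8 else "Below 0.8"


def drug_count_feature(n_drugs: int) -> str:
    """Map the number of unique reported drugs to one of the four features."""
    if n_drugs < 1:
        raise ValueError("a report must list at least one drug")
    if n_drugs == 1:
        return "One drug"
    if n_drugs == 2:
        return "Two drugs"
    if n_drugs <= 5:
        return "3-5 drugs"
    return "More than 5 drugs"


if __name__ == "__main__":
    print(completeness_feature(0.8))
    print(drug_count_feature(4))
```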

    The Variable column can be useful for filtering the data, for example if one is interested in one or a few specific variables.

    Feature: This column contains each of the 44,486 included features. The vast majority should be self-explanatory, or else they have been explained above, or in the original paper [1].

    Female reports and Male reports: These columns show the number of female and male reports, respectively, for which the specific feature is present.

    Proportion among female reports and Proportion among male reports: These columns show the proportions within the female and male reports, respectively, for which the specific feature is present. Comparing these crude proportions is the simplest and most intuitive way to contrast the female and male reports, and a useful complement to the specific vigiPoint output.

    Odds ratio: The odds ratio is a basic measure of association between the classification of reports into female and male reports and a given reporting feature, and hence can be used to compare female and male reports with respect to this feature. It is formally defined as a / (bc / d), where

    • a is the number of female reports with the feature
    • b is the number of female reports without the feature (excluding reports where the variable is missing)
    • c is the number of male reports with the feature
    • d is the number of male reports without the feature (excluding reports where the variable is missing).

    This crude odds ratio can also be computed as (pfemale / (1-pfemale)) / (pmale / (1-pmale)), where pfemale and pmale are the proportions described earlier. If the odds ratio is above 1, the feature is more common among the female than the male reports; if below 1, the feature is less common among the female than the male reports. Note that the odds ratio can be mathematically undefined, in which case it is missing in the published data.
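The two formulations of the crude odds ratio described above can be checked against each other with a minimal sketch. The function names and the counts (30, 70, 20, 80) are illustrative assumptions, not values from the published data; an undefined ratio is returned as None, mirroring the missing values mentioned in the text.

```python
from typing import Optional


def crude_odds_ratio(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Odds ratio a / (b*c / d); None when it is mathematically undefined."""
    # d == 0 makes b*c/d undefined; b == 0 or c == 0 makes the denominator zero.
    if b == 0 or c == 0 or d == 0:
        return None
    return a / (b * c / d)


def odds_ratio_from_proportions(p_female: float, p_male: float) -> Optional[float]:
    """Same quantity, computed from the crude proportions."""
    if p_female >= 1 or p_male >= 1 or p_male <= 0:
        return None
    return (p_female / (1 - p_female)) / (p_male / (1 - p_male))


# Hypothetical counts: 30 of 100 female reports and 20 of 100 male reports
# carry the feature.
or_counts = crude_odds_ratio(a=30, b=70, c=20, d=80)
or_props = odds_ratio_from_proportions(p_female=30 / 100, p_male=20 / 100)
```

Both calls give the same value (above 1, so the feature would be more common among the female reports in this made-up example).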

    vigiPoint score: This score is defined based on an odds ratio with added statistical shrinkage, defined as (a + k) / ((bc / d) + k), where k is 1% of the total number of female reports, or about 9,000. While the shrinkage adds robustness to the measure of association, it makes interpretation more difficult, which is why the crude proportions and unshrunk odds ratios are also presented. Further, 99% credibility intervals are computed for the shrinkage odds ratios, and these intervals are transformed onto a log2 scale [6]. The vigiPoint score is then defined as the lower endpoint of the interval, if that endpoint is above 0; as the higher endpoint of the interval, if that endpoint is below 0; and otherwise as 0. The vigiPoint score is useful for sorting the features from strongest positive to strongest negative associations, and/or to filter the features according to some user-defined criteria.

    vigiPoint key feature: Features are classified as vigiPoint key features if their vigiPoint score is either above 0.5 or below -0.5. The specific threshold of 0.5 is arbitrary, but chosen to identify features where the two sets of reports (here female and male reports) differ in a clinically significant way.
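The shrinkage odds ratio, the score rule, and the key-feature threshold can be sketched as follows. This is only an illustration of the definitions given above: the function names are hypothetical, and the computation of the 99% credibility interval is not described in this text, so the log2-scale interval endpoints are taken as inputs rather than derived.

```python
def shrunk_odds_ratio(a: float, b: float, c: float, d: float, k: float) -> float:
    """Odds ratio with shrinkage: (a + k) / ((b*c / d) + k)."""
    return (a + k) / ((b * c / d) + k)


def vigipoint_score(log2_lower: float, log2_upper: float) -> float:
    """Score from the log2-scale credibility interval endpoints.

    Lower endpoint if it is above 0, upper endpoint if it is below 0,
    otherwise 0 (the interval contains 0).
    """
    if log2_lower > 0:
        return log2_lower
    if log2_upper < 0:
        return log2_upper
    return 0.0


def is_key_feature(score: float, threshold: float = 0.5) -> bool:
    """Key feature if the score is above 0.5 or below -0.5."""
    return score > threshold or score < -threshold


# Hypothetical interval endpoints, for illustration only.
positive = vigipoint_score(0.7, 1.2)    # interval entirely above 0
negative = vigipoint_score(-1.3, -0.6)  # interval entirely below 0
neutral = vigipoint_score(-0.2, 0.3)    # interval contains 0
```

Sorting features by this score places the strongest positive associations first and the strongest negative associations last, as the description suggests.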

    References

    1. Watson S, Caster O, Rochon PA, den Ruijter H. Reported adverse drug reactions in women and men: Aggregated evidence from globally collected individual case reports during half a decade. EClinicalMedicine 2019.
    2. Uppsala Monitoring Centre. Guideline for using VigiBase data in studies.
    3. Uppsala Monitoring Centre. Caveat document: Statement of reservations, limitations, and conditions relating to data released from VigiBase, the WHO global database of individual case safety reports (ICSRs).
    4. Lindquist M. VigiBase, the WHO Global ICSR Database System: Basic Facts. The Drug Information Journal 2008; 42(5): 409-19.
    5. Norén GN, Orre R, Bate A, Edwards IR. Duplicate detection in adverse drug reaction surveillance. Data Mining and Knowledge Discovery 2007; 14(3): 305-28.
    6. Juhlin K, Star K, Norén GN. A method for data-driven exploration to pinpoint key features in medical data and facilitate expert review. Pharmacoepidemiology and Drug Safety 2017; 26(10):

  14. COVID-19 Coronavirus data - weekly (from 17 December 2020)

    • data.europa.eu
    csv, excel xlsx, html +3
    Updated Dec 17, 2020
    Cite
    European Centre for Disease Prevention and Control (2020). COVID-19 Coronavirus data - weekly (from 17 December 2020) [Dataset]. https://data.europa.eu/data/datasets/covid-19-coronavirus-data-weekly-from-17-december-2020?locale=en
    Explore at:
    html, csv, json, unknown, xml, excel xlsx (available download formats)
    Dataset updated
    Dec 17, 2020
    Dataset authored and provided by
    European Centre for Disease Prevention and Control
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a weekly situation update on COVID-19, the epidemiological curve and the global geographical distribution (EU/EEA and the UK, worldwide).

    Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has collected the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process was carried out on a daily basis until 14/12/2020. See the discontinued daily dataset: COVID-19 Coronavirus data - daily. ECDC’s decision to discontinue daily data collection is based on the fact that the daily number of cases reported or published by countries is frequently subject to retrospective corrections, delays in reporting and/or clustered reporting of data for several days. Therefore, the daily number of cases may not reflect the true number of cases at EU/EEA level at a given day of reporting. Consequently, day-to-day variations in the number of cases do not constitute a valid basis for policy decisions.

    ECDC continues to monitor the situation. Every week between Monday and Wednesday, a team of epidemiologists screen up to 500 relevant sources to collect the latest figures for publication on Thursday. The data screening is followed by ECDC’s standard epidemic intelligence process for which every single data entry is validated and documented in an ECDC database. An extract of this database, complete with up-to-date figures and data visualisations, is then shared on the ECDC website, ensuring a maximum level of transparency.

    ECDC receives regular updates from EU/EEA countries through the Early Warning and Response System (EWRS), The European Surveillance System (TESSy), the World Health Organization (WHO) and email exchanges with other international stakeholders. This information is complemented by screening up to 500 sources every day to collect COVID-19 figures from 196 countries. This includes websites of ministries of health (43% of the total number of sources), websites of public health institutes (9%), websites from other national authorities (ministries of social services and welfare, governments, prime minister cabinets, cabinets of ministries, websites on health statistics and official response teams) (6%), WHO websites and WHO situation reports (2%), and official dashboards and interactive maps from national and international institutions (10%). In addition, ECDC screens social media accounts maintained by national authorities on for example Twitter, Facebook, YouTube or Telegram accounts run by ministries of health (28%) and other official sources (e.g. official media outlets) (2%). Several media and social media sources are screened to gather additional information which can be validated with the official sources previously mentioned. Only cases and deaths reported by the national and regional competent authorities from the countries and territories listed are aggregated in our database.

    Disclaimer: National updates are published at different times and in different time zones. This, and the time ECDC needs to process these data, might lead to discrepancies between the national numbers and the numbers published by ECDC. Users are advised to use all data with caution and awareness of their limitations. Data are subject to retrospective corrections; corrected datasets are released as soon as processing of updated national data has been completed.

    If you reuse or enrich this dataset, please share it with us.

  15. California Water Rights Measurement Devices (Reported in Annual Report)

    • catalog.data.gov
    • data.ca.gov
    • +1 more
    Updated Mar 30, 2024
    Cite
    California State Water Resources Control Board (2024). California Water Rights Measurement Devices (Reported in Annual Report) [Dataset]. https://catalog.data.gov/dataset/california-water-rights-measurement-devices-reported-in-annual-report
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    California State Water Resources Control Board
    Area covered
    California
    Description

    This list includes detailed information about the measurement devices and measurement methods associated with the diversion and storage of water, as reported annually for water rights and stored in the State Water Resources Control Board's "Electronic Water Rights Information Management System" (EWRIMS) database. All water right holders are required to submit an annual report including information related to the measurement devices and measurement methods associated with the diversion or storage of water. Each row corresponds to a unique combination of annual report, water right ID, and measurement device ID, together with its associated data. This file is in flat file format and may not include all information associated with a water right, such as all uses and seasons or the amounts reported used for every month. Other information may be available in the associated flat files for each category. Examples of annual report templates are provided as supporting information.

  16. COVID Impact Survey - Public Data

    • data.world
    csv, zip
    Updated Oct 16, 2024
    Cite
    The Associated Press (2024). COVID Impact Survey - Public Data [Dataset]. https://data.world/associatedpress/covid-impact-survey-public-data
    Explore at:
    csv, zip (available download formats)
    Dataset updated
    Oct 16, 2024
    Authors
    The Associated Press
    Description

    Overview

    The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.

    Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).

    The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.

    The survey is focused on three core areas of research:

    • Physical Health: Symptoms related to COVID-19, relevant existing conditions and health insurance coverage.
    • Economic and Financial Health: Employment, food security, and government cash assistance.
    • Social and Mental Health: Communication with friends and family, anxiety and volunteerism. (Questions based on those used on the U.S. Census Bureau’s Current Population Survey.)

    Using this Data - IMPORTANT

    This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!

    Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
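For readers working outside the linked queries, R, or SPSS, the idea of weighting can be sketched in a few lines of Python. This is a minimal illustration, not the AP's or NORC's procedure: the weight column name "weight", the answer labels, and the weight values are assumptions made up for the example; only the variable name soc5c comes from the survey documentation.

```python
from collections import defaultdict


def weighted_shares(rows, variable, weight_key="weight"):
    """Share of each answer, using survey weights instead of raw counts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[variable]] += row[weight_key]
    grand_total = sum(totals.values())
    return {answer: total / grand_total for answer, total in totals.items()}


# Hypothetical respondents; answer labels and weights are illustrative only.
respondents = [
    {"soc5c": "None of the days", "weight": 1.5},
    {"soc5c": "None of the days", "weight": 0.5},
    {"soc5c": "1-2 days", "weight": 2.0},
]

shares = weighted_shares(respondents, "soc5c")
```

In this made-up sample, two of three respondents answered "None of the days", but their combined weight is only half of the total, which is why raw counts should not be reported.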

    Queries

    If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".

    Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.

    Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.

    The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."

    Margin of Error

    The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:

    • At least twice the margin of error, you can report there is a clear difference.
    • At least as large as the margin of error, you can report there is a slight or apparent difference.
    • Less than or equal to the margin of error, you can report that the respondents are divided or there is no difference.

    A Note on Timing

    Survey results will generally be posted under embargo on Tuesday evenings. The data is available for release at 1 p.m. ET Thursdays.
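The margin-of-error rule of thumb listed under "Margin of Error" can be sketched as a small helper. The function name is hypothetical and the margins used below are placeholders; the real margins are in the attached methods statement. The listed rules overlap when the difference exactly equals the margin of error, so this sketch assumes that case counts as "no difference".

```python
def describe_difference(pct_a: float, pct_b: float, margin_of_error: float) -> str:
    """Classify the gap between two percentages relative to the margin of error."""
    diff = abs(pct_a - pct_b)
    if diff >= 2 * margin_of_error:
        return "clear difference"
    # Assumption: a difference exactly equal to the margin is treated as no difference.
    if diff > margin_of_error:
        return "slight or apparent difference"
    return "divided or no difference"


# 66% vs 52% is the Louisiana/California example quoted earlier;
# the margins of error (5 and 10 points) are made up for illustration.
with_small_margin = describe_difference(66, 52, margin_of_error=5)
with_large_margin = describe_difference(66, 52, margin_of_error=10)
```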

    About the Data

    The survey data will be provided under embargo in both comma-delimited and statistical formats.

    Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)

    Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.

    Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
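Raking (also known as iterative proportional fitting) repeatedly rescales respondent weights so that the weighted share of each category matches a known population share, one variable at a time. The sketch below is a generic illustration of that technique, not NORC's implementation; the function name, the two variables, their categories, and the target shares are all assumptions chosen for the example.

```python
def rake(rows, targets, max_iter=100, tol=1e-9):
    """Return one weight per row so weighted margins match the target shares.

    rows:    list of dicts, one per respondent
    targets: {variable: {category: population share}}, shares summing to 1
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand_total = sum(weights)
            # Rescale each row so its category hits the target share.
            for i, row in enumerate(rows):
                factor = target[row[var]] * grand_total / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already match; stop iterating
            break
    return weights


# Hypothetical sample in which older men are over-represented.
rows = [
    {"age": "18-44", "gender": "F"},
    {"age": "18-44", "gender": "M"},
    {"age": "45+", "gender": "F"},
    {"age": "45+", "gender": "M"},
    {"age": "45+", "gender": "M"},
]
targets = {
    "age": {"18-44": 0.5, "45+": 0.5},
    "gender": {"F": 0.5, "M": 0.5},
}
weights = rake(rows, targets)
```

After raking, the weighted shares for both age and gender are approximately 50/50 even though the raw sample is not.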

    Data for the regional estimates are collected using a multi-mode address-based sampling (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.

    Attribution

    Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.

    AP Data Distributions

    ​To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

  17. Family food datasets

    • gov.uk
    Updated Oct 17, 2024
    Cite
    Department for Environment, Food & Rural Affairs (2024). Family food datasets [Dataset]. https://www.gov.uk/government/statistical-data-sets/family-food-datasets
    Explore at:
    Dataset updated
    Oct 17, 2024
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Environment, Food & Rural Affairs
    Description

    These family food datasets contain more detailed information than the ‘Family Food’ report and mainly provide statistics from 2001 onwards. The UK household purchases and the UK household expenditure spreadsheets include statistics from 1974 onwards. These spreadsheets are updated annually when a new edition of the ‘Family Food’ report is published.

    The ‘purchases’ spreadsheets give the average quantity of food and drink purchased per person per week for each food and drink category. The ‘nutrient intake’ spreadsheets give the average nutrient intake (eg energy, carbohydrates, protein, fat, fibre, minerals and vitamins) from food and drink per person per day. The ‘expenditure’ spreadsheets give the average amount spent in pence per person per week on each type of food and drink. Several different breakdowns are provided in addition to the UK averages including figures by region, income, household composition and characteristics of the household reference person.

    UK (updated with new FYE 2023 data)

    countries and regions (CR) (updated with FYE 2022 data)

    equivalised income decile group (EID) (updated with FYE 2022 data)

  18. Small example of a frequency table with patterns and diaries.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Johan de Rooi; Sarah K. Nørgaard; Morten A. Rasmussen; Klaus Bønnelykke; Hans Bisgaard; Age K. Smilde (2023). Small example of a frequency table with patterns and diaries. [Dataset]. http://doi.org/10.1371/journal.pone.0207177.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Johan de Rooi; Sarah K. Nørgaard; Morten A. Rasmussen; Klaus Bønnelykke; Hans Bisgaard; Age K. Smilde
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Small example of a frequency table with patterns and diaries.

  19. Spinal Cord Images - Spine MRI Dataset

    • kaggle.com
    Updated Feb 21, 2024
    Cite
    Training Data (2024). Spinal Cord Images - Spine MRI Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/spinal-cord-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Kaggle
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spine MRI Dataset, Fracture Detection, Anomaly Detection & Segmentation

    The dataset consists of .dcm files containing MRI scans of the spine of a person with several dystrophic changes, such as osteochondrosis, spondyloarthrosis, hemangioma, smoothed physiological lordosis, osteophytes and aggravated defects. The images are labeled by doctors and accompanied by a report in PDF format.

    The dataset includes 9 studies made from different angles, which provide a comprehensive understanding of several dystrophic changes and are useful for training spine anomaly classification algorithms. Each scan includes detailed imaging of the spine, including the vertebrae, discs, nerves, and surrounding tissues.

    MRI study angles in the dataset

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F62acce9c1d60720bdd396e036718f406%2FFrame%2084.png?generation=1708543957118470&alt=media

    💴 For Commercial Usage: Full version of the dataset includes 20,000 spine studies of people with different conditions, leave a request on TrainingData to buy the dataset

    Types of diseases and conditions in the full dataset:

    • Degeneration of discs
    • Osteophytes
    • Osteochondrosis
    • Hemangioma
    • Disk extrusion
    • Spondylitis
    • AND MANY OTHER CONDITIONS

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2Fd2f21b9ac7dc26a3554e4647db47df57%2F3.gif?generation=1708543677763656&alt=media

    Researchers and healthcare professionals can use this dataset to study spinal conditions and disorders, such as herniated discs, spinal stenosis, scoliosis, and fractures. The dataset can also be used to develop and evaluate new imaging techniques, computer algorithms for image analysis, and artificial intelligence models for automated diagnosis.

    OTHER MEDICAL SPINE MRI DATASETS:

    💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

    Content

    The dataset includes:

    • ST000001: includes subfolders with 9 studies. Each study includes MRI-scans in .dcm and .jpg formats,
    • DICOMDIR: includes information about the patient's condition and links to access files,
    • Spine_MRI_2.pdf: includes medical report, provided by the radiologist,
    • .csv file: includes id of the studies and the number of files

    Medical reports include the following data:

    • Patient's demographic information,
    • Description of the case,
    • Preliminary diagnosis,
    • Recommendations on the further actions

    All patients consented to the publication of data

    Medical data might be collected in accordance with your requirements.

    TrainingData provides high-quality data annotation tailored to your needs

    keywords: visual, label, positive, negative, symptoms, clinically, sensory, varicella, syndrome, predictors, diagnosed, rsna cervical, image train, segmentations meta, spine train, mri spine scans, spinal imaging, radiology dataset, neuroimaging, medical imaging data, image segmentation, lumbar spine mri, thoracic spine mri, cervical spine mri, spine anatomy, spinal cord mri, orthopedic imaging, radiologist dataset, mri scan analysis, spine mri dataset, machine learning medical imaging, spinal abnormalities, image classification, neural network spine scans, mri data analysis, deep learning medical imaging, mri image processing, spine tumor detection, spine injury diagnosis, mri image segmentation, spine mri classification, artificial intelligence in radiology, spine abnormalities detection, spine pathology analysis, mri feature extraction, tomography, cloud

  20. Police Data: Crime Reports

    • data.somervillema.gov
    • s.cnmilf.com
    • +1 more
    csv, xlsx, xml
    Updated Aug 2, 2025
    Cite
    Somerville PD (2025). Police Data: Crime Reports [Dataset]. https://data.somervillema.gov/w/aghs-hqvg/default?cur=Rfva9GyGNMT
    Explore at:
    xml, xlsx, csv (available download formats)
    Dataset updated
    Aug 2, 2025
    Dataset authored and provided by
    Somerville PD
    License

    Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset contains crime reports from the City of Somerville Police Department's records management system from 2017 to present. Each data point represents an incident, which may involve multiple offenses (the most severe offense is provided here).

    Incidents deemed sensitive by enforcement agencies are included in the data set but are stripped of time or location information to protect the privacy of victims. For these incidents, only the year of the offense is provided.

    This data set is refreshed daily with data appearing with a one-month delay (for example, crime reports from 1/1 will appear on 2/1). If a daily update does not refresh, please email data@somervillema.gov.

Cite
Department of Information Technology (DoIT) (2015). Dataset Freshness Report for data.maryland.gov [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/OHlwYS1jOWQ5

Dataset Freshness Report for data.maryland.gov

Explore at:
csv, json, xml (available download formats)
Dataset updated
Aug 12, 2015
Dataset provided by
Department of Information Technology (DoIT)
License

U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically

Area covered
Maryland
Description

This dataset shows whether each dataset on data.maryland.gov has been updated recently enough. For example, datasets containing weekly data should be updated at least every 7 days. Datasets containing monthly data should be updated at least every 31 days. This dataset also shows a compendium of metadata from all data.maryland.gov datasets.
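The freshness rule described above can be sketched as a simple date comparison. This is only an illustration under stated assumptions: the function name, the frequency labels "weekly" and "monthly", and the example dates are hypothetical, and the actual report is generated with JavaScript against the portal's metadata rather than with this code. Only the 7-day and 31-day limits come from the description.

```python
from datetime import date

# Maximum allowed age in days per update frequency (limits from the description).
MAX_AGE_DAYS = {"weekly": 7, "monthly": 31}


def is_fresh(last_updated: date, frequency: str, today: date) -> bool:
    """True if the dataset was updated recently enough for its frequency."""
    age_in_days = (today - last_updated).days
    return age_in_days <= MAX_AGE_DAYS[frequency]


# Hypothetical datasets checked on the report date, August 12, 2015.
report_date = date(2015, 8, 12)
weekly_ok = is_fresh(date(2015, 8, 6), "weekly", report_date)     # 6 days old
weekly_stale = is_fresh(date(2015, 8, 1), "weekly", report_date)  # 11 days old
monthly_ok = is_fresh(date(2015, 7, 12), "monthly", report_date)  # 31 days old
```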

This report was created by the Department of Information Technology (DoIT) on August 12, 2015. New reports will be uploaded daily (this report is itself included in the report, so that users can see whether new reports are consistently being uploaded each week). Generation of this report uses the Socrata Open Data API to retrieve metadata on the date of the last data update and the update frequency. Analysis and formatting of the metadata use JavaScript, jQuery, and AJAX.

This report will be used during meetings of the Maryland Open Data Council to curate datasets for maintenance and make sure the Open Data Portal's data stays up to date.
