27 datasets found
  1. Address Standardization

    • hub.arcgis.com
    • sdiinnovation-geoplatform.hub.arcgis.com
    Updated Jul 25, 2022
    Cite
    Address Standardization [Dataset]. https://hub.arcgis.com/content/6c8e054fbdde4564b3b416eacaed3539
    Dataset authored and provided by
    Esri (http://esri.com/)
    Description

    This deep learning model transforms incorrect and non-standard addresses into standardized addresses. Address standardization is the process of formatting and correcting addresses in accordance with global standards. A standardized address includes all the required address elements (street number, apartment number, street name, city, state, and postal code) in the format used by the standard postal service.

    An address may be non-standard because of incomplete details (a missing street name or zip code), invalid information (a nonexistent address), incorrect information (typos, misspellings, inconsistent abbreviations), or inaccurate information (a wrong house number or street name). These errors make it difficult to locate a destination. Note that standardization does not guarantee that an address is valid; it only converts the address into the correct format. This deep learning model is trained on address data provided by openaddresses.io and can be used to standardize addresses from 10 different countries.
    
    
    
      Using the model
    
    
          Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.
    
    
    
    Fine-tuning the model
    This model can be fine-tuned using the Train Deep Learning Model tool. Follow the guide to fine-tune this model.

    Input
        Text (non-standard address) on which address standardization will be performed.
    
        Output
        Text (standard address)
    
        Supported countries
        This model supports addresses from the following countries:
    
          AT – Austria
          AU – Australia
          CA – Canada
          CH – Switzerland
          DK – Denmark
          ES – Spain
          FR – France
          LU – Luxembourg
          SI – Slovenia
          US – United States
    
        Model architecture
        This model uses the T5-base architecture implemented in Hugging Face Transformers.
        Accuracy metrics
        This model has an accuracy of 90.18 percent.
    
    Training data
    The model has been trained on openly licensed data from openaddresses.io.

    Sample results
        Here are a few results from the model.
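    As a rough illustration of how a T5-based standardizer of this kind might be invoked, here is a minimal Python sketch using Hugging Face Transformers. The checkpoint name is hypothetical; the actual model is distributed through ArcGIS rather than the Hugging Face Hub.

        from transformers import AutoTokenizer, T5ForConditionalGeneration

        # Hypothetical checkpoint name; Esri's model ships through ArcGIS instead.
        model_name = "example/address-standardization-t5"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = T5ForConditionalGeneration.from_pretrained(model_name)

        raw_address = "742 evrgreen terace, springfeld IL"
        inputs = tokenizer(raw_address, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=64)
        print(tokenizer.decode(output_ids[0], skip_special_tokens=True))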
    
  2. Example subjects for Mobilise-D data standardization

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 11, 2022
    Cite
    Del Din, Silvia (2022). Example subjects for Mobilise-D data standardization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7185428
    Dataset provided by
    Cereatti, Andrea
    Caruso, Marco
    Gazit, Eran
    Kluge, Felix
    Reggi, Luca
    Bonci, Tecla
    Rochester, Lynn
    Palmerini, Luca
    Paraschiv-Ionescu, Anisoara
    Mazzà, Claudia
    on behalf of the Mobilise-D consortium
    Micó-Amigo, Encarna
    Del Din, Silvia
    Kirk, Cameron
    Salis, Francesca
    Hansen, Clint
    Ullrich, Martin
    Soltani, Abolfazl
    D'Ascanio, Ilaria
    Bertuletti, Stefano
    Küderle, Arne
    Hiden, Hugo
    Chiari, Lorenzo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and please cite it if using these data.

    The code to standardize an example subject (for the ICICLE dataset) and to open the standardized MATLAB files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
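    For readers without MATLAB, a minimal sketch of opening such a standardized .mat file from Python with SciPy follows; the file name here is hypothetical, and the repository above contains the consortium's actual loading code.

        from scipy.io import loadmat

        # Load a standardized MATLAB file; squeeze_me/struct_as_record turn MATLAB
        # structs into Python objects whose attributes mirror the struct fields.
        data = loadmat("example_subject.mat", squeeze_me=True, struct_as_record=False)
        print([key for key in data if not key.startswith("__")])  # top-level variables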

  3. Data from: A concentration-based approach to data classification for...

    • tandf.figshare.com
    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    A concentration-based approach to data classification for choropleth mapping [Dataset]. https://tandf.figshare.com/articles/dataset/A_concentration_based_approach_to_data_classification_for_choropleth_mapping/1456086
    Available download formats: txt
    Dataset provided by
    Taylor & Francis
    Authors
    Robert G. Cromley; Shuowei Zhang; Natalia Vorotyntseva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The choropleth map is a device used for the display of socioeconomic data associated with an areal partition of geographic space. Cartographers emphasize the need to standardize any raw count data by an area-based total before displaying the data in a choropleth map. The standardization process converts the raw data from an absolute measure into a relative measure. However, there is recognition that the standardizing process does not enable the map reader to distinguish between low–low and high–high numerator/denominator differences. This research uses concentration-based classification schemes using Lorenz curves to address some of these issues. A test data set of nonwhite birth rate by county in North Carolina is used to demonstrate how this approach differs from traditional mean–variance-based systems such as the Jenks’ optimal classification scheme.
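    As a rough sketch of the concentration-based idea (not the authors' exact algorithm), the snippet below sorts areas by rate, builds the cumulative numerator share of a Lorenz curve, and cuts classes at equal increments of that share; the test values are invented.

        import numpy as np

        def lorenz_breaks(numerator, denominator, n_classes=5):
            # Sort areas by rate, accumulate the numerator's share, and assign
            # each area to the n-tile of cumulative share it falls in.
            rate = numerator / denominator
            order = np.argsort(rate)
            cum_share = np.cumsum(numerator[order]) / numerator.sum()
            bounds = np.linspace(0, 1, n_classes + 1)[1:-1]
            classes = np.searchsorted(bounds, cum_share, side="right")
            out = np.empty_like(classes)
            out[order] = classes
            return out  # class index 0..n_classes-1 per area

        births = np.array([120., 340., 80., 560., 210.])   # invented counts
        pop = np.array([1000., 2500., 900., 3000., 1500.])  # invented denominators
        print(lorenz_breaks(births, pop, n_classes=3))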

  4. IT Policies and Standards - NASA Enterprise Architecture Procedures

    • catalog.data.gov
    • data.nasa.gov
    • +2 more
    Updated Dec 6, 2023
    + more versions
    Cite
    National Aeronautics and Space Administration (2023). IT Policies and Standards - NASA Enterprise Architecture Procedures [Dataset]. https://catalog.data.gov/dataset/it-policies-and-standards-nasa-enterprise-architecture-procedures
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The documents contained in this dataset reflect NASA's comprehensive IT policy in compliance with Federal Government laws and regulations.

  5. Purchase Order Data

    • catalog.data.gov
    • data.ca.gov
    Updated Nov 27, 2024
    + more versions
    Cite
    California Department of General Services (2024). Purchase Order Data [Dataset]. https://catalog.data.gov/dataset/purchase-order-data
    Dataset provided by
    California Department of General Services
    Description

    The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.

    Data Limitations: Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event, but this affects the formatting of the file. The source system Bidsync is being deprecated, and these issues will be resolved as state systems transition to Fi$cal.

    Data Collection Methodology: The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department, and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line-item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed, and extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table (a rough sketch of this join appears below). Once the scrubbing and standardization process is complete, the data is uploaded into the SQL Server database.

    Secondary/Related Resources:
    State Contract Manual (SCM) vol. 2: http://www.dgs.ca.gov/pd/Resources/publications/SCM2.aspx
    State Contract Manual (SCM) vol. 3: http://www.dgs.ca.gov/pd/Resources/publications/SCM3.aspx
    Buying Green: http://www.dgs.ca.gov/buyinggreen/Home.aspx
    United Nations Standard Products and Services Code: http://www.unspsc.org/
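    To make the joining step concrete, here is a hedged pandas sketch of the described flow; the table and column names are invented, not the actual SCPRS/eSCPRS schema.

        import pandas as pd

        # Reference tables (hypothetical minimal columns).
        suppliers = pd.DataFrame({"supplier_id": [1], "supplier_name": ["ACME CORP."]})
        departments = pd.DataFrame({"dept_id": [10], "dept_name": ["General Services"]})
        unspsc = pd.DataFrame({"unspsc_code": ["44120000"], "category": ["Office supplies"]})
        orders = pd.DataFrame({
            "po_id": ["PO-001"], "supplier_id": [1], "dept_id": [10],
            "unspsc_code": ["44120000"], "amount": [" 1234.50 "],
        })

        # Convert raw fields to the correct types and strip extraneous characters.
        orders["amount"] = orders["amount"].str.strip().astype(float)

        # Join the three reference tables onto the purchase order table.
        final = (orders.merge(suppliers, on="supplier_id")
                       .merge(departments, on="dept_id")
                       .merge(unspsc, on="unspsc_code"))
        print(final)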

  6. Benchmarking of NIST LWC Finalists on Microcontrollers

    • s.cnmilf.com
    • datasets.ai
    • +2 more
    Updated May 9, 2023
    + more versions
    Cite
    National Institute of Standards and Technology (2023). Benchmarking of NIST LWC Finalists on Microcontrollers [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/benchmarking-of-nist-lwc-finalists-on-microcontrollers
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Software benchmarking study of finalists in NIST's lightweight cryptography standardization process. This data set includes the results on several microcontrollers, as well as the benchmarking framework used.

  7. Purchase Order Data | gimi9.com

    • gimi9.com
    Updated Oct 28, 2015
    Cite
    (2015). Purchase Order Data | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_purchase-order-data
    Description

    The description for this entry duplicates that of entry 5 (Purchase Order Data, California Department of General Services); see that entry for the data limitations, collection methodology, and related resources.

  8. GC HR and Pay - Data Standards

    • ouvert.canada.ca
    • open.canada.ca
    • +1 more
    docx
    Updated Mar 22, 2025
    Cite
    Public Services and Procurement Canada (2025). GC HR and Pay - Data Standards [Dataset]. https://ouvert.canada.ca/data/dataset/67c81048-e230-47f2-ad9a-38bc68f3b51e
    Available download formats: docx
    Dataset provided by
    Public Services and Procurement Canada (http://www.pwgsc.gc.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Transforming human resources (HR) and pay for the Government of Canada into an integrated, flexible, and modern ecosystem is a complex challenge. To support these activities, Human Capital Management (HCM) within Public Services and Procurement Canada (PSPC) is working to update the processes, standards, and rules that govern HR and Pay data. The HR and Pay Data Standards will enable trustworthy, high-quality data to move easily throughout the enterprise as needed, supporting improved insights, better decision-making, and more streamlined business processes. These Data Standards focus on core employee data attributes within the Single Employee Profile (SEP), such as first and last names, date of birth, first official language, preferred language, home address, mailing address, province of residence, marital status, personal contact information (email, phone), security clearance, PRI, sex at birth, and other important HR and Pay data. These data standards are required above and beyond the GC enterprise data reference standards for the HR and Pay data domain. Progress in implementing these data standards is tracked through the Unified Actions for Pay (UAP) Measure 2.

  9. Data from: FLiPPR: A Processor for Limited Proteolysis (LiP) Mass...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated May 24, 2024
    + more versions
    Cite
    Edgar Manriquez-Sandoval; Joy Brewer; Gabriela Lule; Samanta Lopez; Stephen D. Fried (2024). FLiPPR: A Processor for Limited Proteolysis (LiP) Mass Spectrometry Data Sets Built on FragPipe [Dataset]. http://doi.org/10.1021/acs.jproteome.3c00887.s003
    Available download formats: xlsx
    Dataset provided by
    ACS Publications
    Authors
    Edgar Manriquez-Sandoval; Joy Brewer; Gabriela Lule; Samanta Lopez; Stephen D. Fried
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Here, we present FLiPPR, or FragPipe LiP (limited proteolysis) Processor, a tool that facilitates the analysis of data from limited proteolysis mass spectrometry (LiP-MS) experiments following primary search and quantification in FragPipe. LiP-MS has emerged as a method that can provide proteome-wide information on protein structure and has been applied to a range of biological and biophysical questions. Although LiP-MS can be carried out with standard laboratory reagents and mass spectrometers, analyzing the data can be slow and poses unique challenges compared to typical quantitative proteomics workflows. To address this, we leverage FragPipe and then process its output in FLiPPR. FLiPPR formalizes a specific data imputation heuristic that carefully uses missing data in LiP-MS experiments to report on the most significant structural changes. Moreover, FLiPPR introduces a data merging scheme and a protein-centric multiple hypothesis correction scheme, enabling processed LiP-MS data sets to be more robust and less redundant. These improvements strengthen statistical trends when previously published data are reanalyzed with the FragPipe/FLiPPR workflow. We hope that FLiPPR will lower the barrier for more users to adopt LiP-MS, standardize statistical procedures for LiP-MS data analysis, and systematize output to facilitate eventual larger-scale integration of LiP-MS data.
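    FLiPPR's exact statistics are described in the paper; as a generic illustration of a protein-centric multiple-hypothesis correction, the sketch below applies a textbook Benjamini-Hochberg correction separately within each protein's set of p-values (protein IDs and p-values are invented).

        import numpy as np

        def benjamini_hochberg(pvals):
            # Standard BH: rank-scale p-values, then enforce monotonicity
            # from the largest p-value downward.
            p = np.asarray(pvals, dtype=float)
            n = len(p)
            order = np.argsort(p)
            ranked = p[order] * n / np.arange(1, n + 1)
            q = np.minimum.accumulate(ranked[::-1])[::-1]
            out = np.empty(n)
            out[order] = np.clip(q, 0, 1)
            return out

        peptides_by_protein = {
            "P12345": [0.001, 0.020, 0.300],
            "Q67890": [0.004, 0.650],
        }
        for protein, pvals in peptides_by_protein.items():
            print(protein, benjamini_hochberg(pvals))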

  10. DataSheet1_Recommendations from the COST action CA17116 (SPRINT) for the...

    • frontiersin.figshare.com
    docx
    Updated Nov 14, 2023
    Cite
    Aleksandar Janev; Asmita Banerjee; Adelheid Weidinger; Jure Dimec; Brane Leskošek; Antonietta Rosa Silini; Tina Cirman; Susanne Wolbank; Taja Železnik Ramuta; Urška Dragin Jerman; Assunta Pandolfi; Roberta Di Pietro; Michela Pozzobon; Bernd Giebel; Günther Eissner; Polonca Ferk; Ingrid Lang-Olip; Francesco Alviano; Olga Soritau; Ornella Parolini; Mateja Erdani Kreft (2023). DataSheet1_Recommendations from the COST action CA17116 (SPRINT) for the standardization of perinatal derivative preparation and in vitro testing.docx [Dataset]. http://doi.org/10.3389/fbioe.2023.1258753.s001
    Available download formats: docx
    Dataset provided by
    Frontiers
    Authors
    Aleksandar Janev; Asmita Banerjee; Adelheid Weidinger; Jure Dimec; Brane Leskošek; Antonietta Rosa Silini; Tina Cirman; Susanne Wolbank; Taja Železnik Ramuta; Urška Dragin Jerman; Assunta Pandolfi; Roberta Di Pietro; Michela Pozzobon; Bernd Giebel; Günther Eissner; Polonca Ferk; Ingrid Lang-Olip; Francesco Alviano; Olga Soritau; Ornella Parolini; Mateja Erdani Kreft
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many preclinical studies have shown that birth-associated tissues, cells, and their secreted factors, otherwise known as perinatal derivatives (PnD), possess various biological properties that make them suitable therapeutic candidates for the treatment of numerous pathological conditions. Nevertheless, in the field of PnD research there is a lack of critical evaluation of the PnD standardization process, from preparation to in vitro testing, an issue that may ultimately delay clinical translation. In this paper, we present the PnD e-questionnaire, developed to assess the current state of the art of methods used in the published literature for the procurement, isolation, culturing, preservation, and characterization of PnD in vitro. Furthermore, we propose a consensus for the scientific community on the minimal criteria that should be reported to facilitate standardization, reproducibility, and transparency of data in PnD research. Lastly, based on the data from the PnD e-questionnaire, we recommend providing adequate information on the characterization of the PnD. The PnD e-questionnaire is now freely available to the scientific community in order to guide researchers on the minimal criteria that should be clearly reported in their manuscripts. This review is a collaborative effort from the COST SPRINT action (CA17116), which aims to guide future research to facilitate the translation of basic research findings on PnD into clinical practice.

  11. Data from: ANALYTICAL PERSPECTIVES ON SETTING ENVIRONMENTAL STANDARDS

    • cloud.csiss.gmu.edu
    • data.wu.ac.at
    pdf
    Updated Aug 8, 2019
    Cite
    Energy Data Exchange (2019). ANALYTICAL PERSPECTIVES ON SETTING ENVIRONMENTAL STANDARDS [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/analytical-perspectives-on-setting-environmental-standards
    Available download formats: pdf (13012065 bytes)
    Dataset provided by
    Energy Data Exchange
    Description

    Natural scientists, engineers, economists, political scientists, and policy analysts tend to perceive the process of health, safety, and environmental standard setting in radically different ways. Each of these five perspectives has some validity and value: the standard-setting process is so multi-faceted that, like sculpture, it can best be understood when viewed from several vantage points. In this report, I first view the standard-setting process from the analytical vantage points of natural scientists, engineers, economists, political scientists, and policy analysts, in turn.

  12. MCNA - Population Points with T/D Standards

    • data.ca.gov
    • data.chhs.ca.gov
    • +4 more
    Updated Mar 1, 2023
    Cite
    MCNA - Population Points with T/D Standards [Dataset]. https://data.ca.gov/dataset/mcna-population-points-with-t-d-standards
    Available download formats: zip, kml, html, ArcGIS GeoServices REST API, csv, geojson
    Dataset authored and provided by
    California Department of Health Care Services (http://www.dhcs.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Updated 10/6/2022: Points that were initially included in the Time/Distance analysis but were found to have no significant or year-round population have been removed. The layer of removed points (MCNA - Removed Population Points) is also available for viewing.

    The Network Adequacy Standards Representative Population Points feature layer contains 97,694 points spread across California that were created from USPS postal delivery route data and US Census data. Each population point also contains the variables for the Time and Distance Standards of the county that the point is within. These standards differ by county because of the county "type", which is based on the population density of the county: Rural (<50 people/sq mile), Small (51-200 people/sq mile), Medium (201-599 people/sq mile), and Dense (>600 people/sq mile). The Time and Distance data is divided out by provider type, Adult and Pediatric separately, so that the Time or Distance analysis can be performed with greater detail for the provider types listed below (a small classification sketch follows the list).
    • Hospitals
    • OB/GYN Specialty
    • Adult Cardiology/Interventional Cardiology
    • Adult Dermatology
    • Adult Endocrinology
    • Adult ENT/Otolaryngology
    • Adult Gastroenterology
    • Adult General Surgery
    • Adult Hematology
    • Adult HIV/AIDS/Infectious Disease
    • Adult Mental Health Outpatient Services
    • Adult Nephrology
    • Adult Neurology
    • Adult Oncology
    • Adult Ophthalmology
    • Adult Orthopedic Surgery
    • Adult PCP
    • Adult Physical Medicine and Rehabilitation
    • Adult Psychiatry
    • Adult Pulmonology
    • Pediatric Cardiology/Interventional Cardiology
    • Pediatric Dermatology
    • Pediatric Endocrinology
    • Pediatric ENT/Otolaryngology
    • Pediatric Gastroenterology
    • Pediatric General Surgery
    • Pediatric Hematology
    • Pediatric HIV/AIDS/Infectious Disease
    • Pediatric Mental Health Outpatient Services
    • Pediatric Nephrology
    • Pediatric Neurology
    • Pediatric Oncology
    • Pediatric Ophthalmology
    • Pediatric Orthopedic Surgery
    • Pediatric PCP
    • Pediatric Physical Medicine and Rehabilitation
    • Pediatric Psychiatry
    • Pediatric Pulmonology
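
    A minimal sketch of the county-type assignment by population density, using the thresholds quoted above; how boundary values between the quoted ranges (e.g., exactly 600 people/sq mile) are handled is not specified in the description and is assumed here.

        def county_type(density: float) -> str:
            # Thresholds as quoted above; boundary handling is an assumption.
            if density < 50:
                return "Rural"
            elif density <= 200:
                return "Small"
            elif density < 600:
                return "Medium"
            else:
                return "Dense"

        for d in (12, 150, 400, 2500):
            print(d, county_type(d))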
  13. Living Standards Survey 1995-1997 - China

    • microdata.fao.org
    Updated Nov 8, 2022
    + more versions
    Cite
    Research Centre for Rural Economy (2022). Living Standards Survey 1995 -1997 - China [Dataset]. https://microdata.fao.org/index.php/catalog/1533
    Dataset provided by
    World Bank (http://worldbank.org/)
    Research Centre for Rural Economy
    Time period covered
    1995 - 1997
    Area covered
    China
    Description

    Abstract

    The China Living Standards Survey (LSS) consists of one household survey and one community (village) survey, conducted in Hebei and Liaoning Provinces (northern and northeast China) in July 1995 and July 1997, respectively. Five villages were selected from each of three sample counties in each province (six were selected in Liaoyang County of Liaoning Province because of an administrative area change). About 880 farm households were selected from the thirty-one sample villages for the household survey. The same thirty-one villages formed the sample for the community survey. This document provides information on the content of the different questionnaires, the survey design and implementation, data processing activities, and the different available data sets.

    Geographic coverage

    Regional

    Analysis unit

    Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The China LSS sample is not a rigorous random sample drawn from a well-defined population. Instead, it is only a rough approximation of the rural population in Hebei and Liaoning provinces in northeastern China. The reason is that part of the motivation for the survey was to compare current conditions with conditions that existed in Hebei and Liaoning in the 1930s. Because of this, three counties in Hebei and three counties in Liaoning were selected as "primary sampling units" because data had been collected from those six counties by the Japanese occupation government in the 1930s. Within each of these six counties (xian), five villages (cun) were selected, for an overall total of 30 villages (in fact, an administrative change in one village led to 31 villages being selected). In each county a "main village" was selected that had in fact been surveyed in the 1930s. Because of the interest in these villages, 50 households were selected from each of these six villages (one for each of the six counties). In addition, four other villages were selected in each county. These other villages were not drawn randomly but were selected so as to "represent" variation within the county. Within each of these villages, 20 households were selected for interviews. Thus, the intended sample size was 780 households, 130 from each county.

    Unlike county and village selection, the selection of households within each village was done according to standard sample selection procedures. In each village, a list of all households in the village was obtained from village leaders. An "interval" was calculated as the number of households in the village divided by the number of households desired for the sample (50 for main villages and 20 for other villages). A random number was drawn between 1 and the interval number; this was used as a starting point. The interval was then added to this number to get a second number, then added to the second number to get a third number, and so on. The resulting set of numbers identified the households to select, in terms of their order on the list (a sketch of this procedure appears below). In fact, the number of households in the sample is 785, as opposed to 780. Most of this difference is due to a village in which 24 households were interviewed, as opposed to the goal of 20.
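    A minimal Python sketch of this interval-selection procedure (village size and sample size invented):

        import math
        import random

        def select_households(households, n_desired):
            # Interval = list length / desired sample; random start in [1, interval],
            # then repeated addition of the interval; positions are 1-based.
            interval = len(households) / n_desired
            start = random.randint(1, math.floor(interval))
            positions = [round(start + k * interval) for k in range(n_desired)]
            return [households[p - 1] for p in positions]

        village = [f"household_{i:03d}" for i in range(1, 241)]  # 240 households
        print(select_households(village, 20))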

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    (a) DATA ENTRY

    All responses obtained from the household interviews were recorded in the household questionnaires. These were then entered into the computer, in the field, using data entry programs written in BASIC. The data produced by the data entry program were in the form of household files, i.e., one data file for all of the data in one household/community questionnaire; thus, for the household survey there were about 880 data files. These data files were processed at the University of Toronto and the World Bank to produce datasets in statistical software formats, each of which contained information for all households for a subset of variables. The subset of variables chosen corresponded to data entry screens, so these files are hereafter referred to as "screen files". For the household survey component, 66 data files were created.

    Members of the survey team checked and corrected data by checking the questionnaires for the original recorded information. We emphasize that correction here refers to checking questionnaires in case of errors in skip patterns, incorrect values, or outlying values, and changing values if and only if data in the computer were different from those in the questionnaires. The personnel in charge of data preparation were given specific instructions not to change data even if values in the questionnaires were clearly incorrect. We have no reason to believe that these instructions were not followed, and every reason to believe that the data resulting from these checks and corrections are accurate and of the highest quality possible.

    (b) DATA EDITING

    The screen files were then brought to World Bank headquarters in Washington, D.C. and uploaded to a mainframe computer, where they were converted to "standard" LSMS formats by merging datasets to produce separate datasets for each section, with variable names corresponding to the questionnaires. In some cases this has meant a single dataset for a section, while in others it has meant retaining "screen" datasets with just the variable names changed.

    Linking Parts of the Household Survey
    Each household has a unique identification number, contained in the variable HID. Values for this variable range from 10101 to 60520. The first digit is the code for the six counties in which data were collected, and the second and third digits identify the village within each county. Finally, the last two digits of HID contain the household number within the village (a decoding sketch appears below). Data for households from different parts of the survey can be merged using the HID variable, which appears in each dataset of the household survey. To link information for an individual, use should be made of both the household identification number, HID, and the person identification number, PID. A child in the household can be linked to the parents, if the parents are household members, through the parents' id codes in Section 01B. For parents who are not in the household, information is collected on the parent's schooling, main occupation, and whether he/she is currently alive. Household members can be linked with their non-resident children through the parents' id codes in Section 01C.

    Linking the Household to the Community Data
    The community data have a somewhat different set of identifying variables than the household data. Each community dataset has four identifying variables: province (code 7 for Hebei and code 8 for Liaoning); county (six two-digit codes, of which the first digit represents the province and the second digit represents the three counties in each province); township (3-digit code: the first two digits are the county code and the third digit is the township); and village (4-digit code: the first two digits are the county code, the third digit is the township, and the fourth digit is the village).

    Constructed Data Set
    Researchers at the World Bank and the University of Toronto have created a data set with information on annual household expenditures, region codes, etc. This constructed data set is made available for general use with the understanding that the description below is the only documentation that will be provided. Any manipulation of the data requires assumptions to be made and, as much as possible, those assumptions are explained below. Except where noted, the data set has been created using only the original (raw) data sets. A researcher could construct a somewhat different data set by incorporating different assumptions.

    Aggregate Expenditure, TOTEXP
    The dataset TOTEXP contains variables for total household annual expenditures (for the year 1994) and variables for the different components of total household expenditures: food expenditures, non-food expenditures, use value of consumer durables, etc. These, along with the algorithm used to calculate household expenditures, are detailed in Appendix D. The dataset also contains the variable HID, which can be used to match this dataset to the household-level data set. Note that all of the expenditure variables are totals for the household; that is, they are not in per capita terms. Researchers will have to divide these variables by household size to get per capita numbers. The household size variable is included in the data set.
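    A small sketch of decoding HID into its county, village, and household components, per the digit layout described above:

        def parse_hid(hid: int) -> dict:
            # Digit 1 = county (1-6), digits 2-3 = village, digits 4-5 = household.
            return {
                "county": hid // 10000,
                "village": (hid // 100) % 100,
                "household": hid % 100,
            }

        print(parse_hid(10101))  # first county, village 01, household 01
        print(parse_hid(60520))  # sixth county, village 05, household 20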

  14. Data from: Standardizing Protein Corona Characterization in Nanomedicine: A...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Aug 3, 2024
    Cite
    Ali Akbar Ashkarran; Hassan Gharibi; Seyed Majed Modaresi; Amir Ata Saei; Morteza Mahmoudi (2024). Standardizing Protein Corona Characterization in Nanomedicine: A Multicenter Study to Enhance Reproducibility and Data Homogeneity [Dataset]. http://doi.org/10.1021/acs.nanolett.4c02076.s002
    Available download formats: xlsx
    Dataset provided by
    ACS Publications
    Authors
    Ali Akbar Ashkarran; Hassan Gharibi; Seyed Majed Modaresi; Amir Ata Saei; Morteza Mahmoudi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We recently revealed significant variability in protein corona characterization across various proteomics facilities, indicating that data sets are not comparable between independent studies. This heterogeneity mainly arises from differences in sample preparation protocols, mass spectrometry workflows, and raw data processing. To address this issue, we developed standardized protocols and unified sample preparation workflows, distributing uniform protein corona digests to several top-performing proteomics centers from our previous study. We also examined the influence of using similar mass spectrometry instruments on data homogeneity and standardized database search parameters and data processing workflows. Our findings reveal a remarkable stepwise improvement in protein corona data uniformity, increasing overlaps in protein identification from 11% to 40% across facilities using similar instruments and through a uniform database search. We identify the key parameters behind data heterogeneity and provide recommendations for designing experiments. Our findings should significantly advance the robustness of protein corona analysis for diagnostic and therapeutics applications.
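    As a toy illustration of what an "overlap in protein identification" figure measures, the snippet below computes the proteins common to all facilities as a fraction of the union; the actual metric in the study may be defined differently, and the IDs are invented.

        # Protein IDs reported by each facility (invented).
        facility_ids = {
            "A": {"P1", "P2", "P3", "P4"},
            "B": {"P2", "P3", "P5"},
            "C": {"P2", "P3", "P4", "P6"},
        }
        common = set.intersection(*facility_ids.values())  # seen by every facility
        union = set.union(*facility_ids.values())          # seen by any facility
        print(f"overlap: {len(common) / len(union):.0%}")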

  15. Pilot of the Open Contracting Data Standard (250 contract records) |...

    • gimi9.com
    Updated Dec 23, 2021
    + more versions
    Cite
    (2021). Pilot of the Open Contracting Data Standard (250 contract records) | gimi9.com [Dataset]. https://gimi9.com/dataset/ca_60f22648-c173-446f-aa8a-4929d75d63e3
    Description

    This dataset includes the results of the pilot activity that Public Services and Procurement Canada undertook as part of Canada's 2018-2020 National Action Plan on Open Government. The purpose is to demonstrate the usage and implementation of the Open Contracting Data Standard (OCDS). OCDS is an international data standard that is used to standardize how contracting data and documents can be published in an accessible, structured, and repeatable way. OCDS uses a standard language for contracting data that can be understood by all users.

    What procurement data is included in the OCDS pilot? Procurement data included as part of this pilot is a cross-section of at least 250 contract records for a variety of contracts, including major projects.

    Methodology and lessons learned: The Lessons Learned Report documents the methodology used and the lessons learned during the process of compiling the pilot data.
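    For orientation, here is a hedged Python sketch of the general shape of an OCDS release; all field values are invented, and the authoritative field list is the OCDS schema at https://standard.open-contracting.org/.

        import json

        release = {
            "ocid": "ocds-xxxxxx-000-00001",  # globally unique open contracting ID (invented)
            "id": "000-00001-contract-2019",
            "date": "2019-03-01T00:00:00Z",
            "tag": ["contract"],
            "initiationType": "tender",
            "buyer": {"name": "Public Services and Procurement Canada"},
            "contracts": [
                {"id": "C-1", "value": {"amount": 150000.00, "currency": "CAD"}}
            ],
        }
        print(json.dumps(release, indent=2))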

  16. Data from: Standardized Streamflow Index time series for 369 rivers across...

    • heidata.uni-heidelberg.de
    zip
    Updated Oct 23, 2020
    Cite
    Erik Tijdeman; Kerstin Stahl; Lena M. Tallaksen (2020). Standardized Streamflow Index time series for 369 rivers across Europe [Dataset]. http://doi.org/10.11588/DATA/PFDJI1
    Available download formats: zip (24721473 bytes)
    Dataset provided by
    heiDATA
    Authors
    Erik Tijdeman; Kerstin Stahl; Lena M. Tallaksen
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/PFDJI1

    Time period covered
    Jan 1, 1965 - Jan 1, 2009
    Area covered
    Europe
    Dataset funded by
    Research Network Water by the Ministry of Science, Research, and the Arts of the German Federal State of Baden‐Württemberg
    Description

    There are many ways to characterize the streamflow drought hazard. Recently, the use of anomaly indices such as the Standardized Streamflow Index (SSI), a probability-index-based approach adopted from the climatological community, has increased in popularity. The SSI can be calculated based on various probability distributions that can be fitted using different methods, and to date there is no consensus on which method to use. This data set contains SSI time series of 369 rivers located across Europe, derived with seven different probability distributions and two fitting methods (the sketch below shows the basic computation). These data were used to investigate the sensitivity of the SSI, and of drought characteristics derived from SSI time series, to the distribution and fitting method used. The dataset also contains ensembles of SSI time series derived from resampled data. These resampled SSI time series were used to investigate the sensitivity of the SSI to various sample properties as well as to estimate its uncertainty.
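    A minimal sketch of the basic SSI-style computation: fit a probability distribution to streamflow, then map the fitted CDF through the inverse standard normal. A gamma fit on synthetic flows is shown; the dataset itself explores seven distributions and two fitting methods.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        flow = rng.gamma(shape=2.0, scale=30.0, size=480)  # synthetic monthly flows

        # Fit a gamma distribution by maximum likelihood, location fixed at 0.
        a, loc, scale = stats.gamma.fit(flow, floc=0)
        cdf = stats.gamma.cdf(flow, a, loc=loc, scale=scale)
        ssi = stats.norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))  # avoid infinite tails
        print(ssi[:5].round(2))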

  17. Data_Sheet_1_Best Practice Data Standards for Discrete Chemical...

    • figshare.com
    • frontiersin.figshare.com
    txt
    Updated Jun 4, 2023
    + more versions
    Cite
    Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue (2023). Data_Sheet_1_Best Practice Data Standards for Discrete Chemical Oceanographic Observations.csv [Dataset]. http://doi.org/10.3389/fmars.2021.705638.s001
    Available download formats: txt
    Dataset provided by
    Frontiers
    Authors
    Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Effective data management plays a key role in oceanographic research as cruise-based data, collected from different laboratories and expeditions, are commonly compiled to investigate regional to global oceanographic processes. Here we describe new and updated best practice data standards for discrete chemical oceanographic observations, specifically those dealing with column header abbreviations, quality control flags, missing value indicators, and standardized calculation of certain properties. These data standards have been developed with the goals of improving the current practices of the scientific community and promoting their international usage. These guidelines are intended to standardize data files for data sharing and submission into permanent archives. They will facilitate future quality control and synthesis efforts and lead to better data interpretation. In turn, this will promote research in ocean biogeochemistry, such as studies of carbon cycling and ocean acidification, on regional to global scales. These best practice standards are not mandatory. Agencies, institutes, universities, or research vessels can continue using different data standards if it is important for them to maintain historical consistency. However, it is hoped that they will be adopted as widely as possible to facilitate consistency and to achieve the goals stated above.
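    As a toy illustration of two of the conventions such standards govern, the snippet below reads a small CSV with a missing-value indicator and filters on a per-variable quality-control flag; the indicator value (-999) and flag scheme (2 = acceptable) are assumptions for illustration, not the published standard.

        import io
        import pandas as pd

        csv = io.StringIO(
            "STATION,CTDPRS,OXYGEN,OXYGEN_FLAG_W\n"
            "1,5.0,210.3,2\n"
            "1,100.0,-999,9\n"  # missing oxygen value, flagged accordingly
        )
        df = pd.read_csv(csv, na_values=["-999"])  # missing-value indicator -> NaN
        good_oxygen = df[df["OXYGEN_FLAG_W"] == 2]  # keep only acceptable values
        print(good_oxygen)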

  18. Data from: A Standardized European Hexagon Gridded Dataset Based on...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 6, 2023
    Cite
    Hyun Woo Kim (2023). A Standardized European Hexagon Gridded Dataset Based on OpenStreetMap POIs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7854524
    Dataset provided by
    Hyun Woo Kim
    Dakota McCarty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Point of interest (POI) data refers to information about the location and type of amenities, services, and attractions within a geographic area. This data is used in urban studies research to better understand the dynamics of a city, assess community needs, and identify opportunities for economic growth and development. POI data is beneficial because it provides a detailed picture of the resources available in a given area, which can inform policy decisions and improve the quality of life for residents. This paper presents a large-scale, standardized POI dataset from OpenStreetMap (OSM) for the European continent. The dataset's standardization and gridding make it more efficient for advanced modeling, reducing 7,218,304 data points to 988,575 without significant resolution loss and making it suitable for a broader range of models with lower computational demands. The resulting dataset can be used to conduct advanced analyses, examine POI spatial distributions, and carry out comparative regional studies, improving understanding of economic activity and attractions and, by extension, of economic health, growth potential, and cultural opportunities. The paper describes the materials and methods used in generating the dataset, including OSM data retrieval, processing, standardization, and hexagonal grid generation (a toy gridding sketch follows). The dataset can be used independently or integrated with other relevant datasets for more comprehensive spatial distribution studies in future research.
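    As a toy version of the gridding step, the sketch below bins projected x/y points into pointy-top hexagonal cells using standard axial-coordinate math; the published dataset's actual grid, cell size, and coordinate reference system will differ.

        import math
        from collections import Counter

        def hex_cell(x, y, size):
            """Map a projected point to axial (q, r) coords of a pointy-top hex grid."""
            q = (math.sqrt(3) / 3 * x - y / 3) / size
            r = (2 / 3 * y) / size
            # Round fractional axial coords to the nearest hex via cube rounding.
            s = -q - r
            rq, rr, rs = round(q), round(r), round(s)
            dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
            if dq > dr and dq > ds:
                rq = -rr - rs
            elif dr > ds:
                rr = -rq - rs
            return (rq, rr)

        points = [(10.2, 4.1), (10.5, 4.3), (90.0, 55.0)]  # invented coordinates
        counts = Counter(hex_cell(x, y, size=5.0) for x, y in points)
        print(counts)  # POI count per hex cell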

  19. RAW DATA ON THERMAL STABILITY, LIGAND BINDING AND ALLERGENICITY OF Mus m...

    • data.mendeley.com
    • search.datacite.org
    Updated Feb 16, 2020
    Cite
    Elena Ferrari (2020). RAW DATA ON THERMAL STABILITY, LIGAND BINDING AND ALLERGENICITY OF Mus m 1.0102 ALLERGEN AND ITS SELECTED CYSTEINE MUTANTS [Dataset]. http://doi.org/10.17632/yscyvryz8t.1
    Authors
    Elena Ferrari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data refer to the allergen Mus m 1.0102 and its cysteine mutants MM-C138A, MM-C157A, and MM-C138,157A, and describe protein fold stability, ligand-binding ability, and allergenic potential. They were obtained by means of: 1) a Dynamic Light Scattering-based thermal stability assay, 2) a fluorescence-based ligand-binding assay, and 3) a basophil degranulation test.

    Analysis of the raw data produced the temperatures corresponding to the onset of protein unfolding, the dissociation constants for the N-phenyl-1-naphthylamine ligand, and the profiles of β-hexosaminidase release from RBL cells sensitized with the serum of selected allergic patients and incubated with increasing protein concentrations. The data highlight the enhanced thermal stability of the MM-C138A mutant, without a relevant modification of its binding function or in vitro allergenicity. The data contribute to the standardization of the recombinant allergen, focused on its potential use in immunotherapy and diagnostic applications.

  20. Publishing standards—data.qld.gov.au

    • data.qld.gov.au
    csv, html, xlsx
    Updated Mar 25, 2025
    Cite
    Open Data Administration (data requests) (2025). Publishing standards—data.qld.gov.au [Dataset]. https://www.data.qld.gov.au/dataset/publishing-standards-data-qld-gov-au
    Available download formats: html (1 byte), xlsx (38.5 KiB), xlsx (33.5 KiB), xlsx (36.5 KiB), csv (401 bytes), csv (407 bytes), xlsx (37 KiB)
    Dataset authored and provided by
    Open Data Administration (data requests)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Queensland Government, Queensland
    Description

    A set of guides and standards for Queensland Government open data portal (https://www.data.qld.gov.au) publishers. This includes portal process guides and relevant open data file creation information.
