61 datasets found
  1. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

    The goals of this study were to:

    • establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
    • compare how much data is in institutional vs. domain-specific vs. federal platforms
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN.

    We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

    We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.

    General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

    Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA employees published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

    Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:

    • Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
    • There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    • Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:

    • Resource Title: Journals. File Name: Journals.csv
    • Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    • Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    • Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    • Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    • Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    • Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
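
    The evaluation step in the description above (share of the collection per search term at the 1% and 5% levels, plus the 100-hit and 500-hit cut-offs) could be sketched roughly as follows. This is only an illustration: the `TermResult` type, the function name, and the example counts are hypothetical and do not come from the dataset's files.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    """Hit count for one search term in one repository (hypothetical structure)."""
    term: str
    hits: int


def evaluate_repository(total_records: int, results: list[TermResult]) -> list[dict]:
    """Flag each search term against the thresholds named in the description."""
    rows = []
    for r in results:
        # Share of the whole collection that this term's results represent.
        share = r.hits / total_records if total_records else 0.0
        rows.append({
            "term": r.term,
            "hits": r.hits,
            "share_pct": round(100 * share, 2),
            "at_least_1pct": share >= 0.01,
            "at_least_5pct": share >= 0.05,
            "over_100_hits": r.hits > 100,
            "over_500_hits": r.hits > 500,
        })
    return rows


# Made-up numbers, for illustration only.
example = evaluate_repository(
    total_records=10_000,
    results=[TermResult("agriculture", 650), TermResult("soil", 90)],
)
for row in example:
    print(row)
```

    Whether the original study used inclusive or exclusive comparisons at each threshold is an assumption here; the description only says "at least" for the percentages and "greater than" for the counts.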

  2. August 2025 data-update for "Updated science-wide author databases of...

    • elsevier.digitalcommonsdata.com
    Updated Sep 19, 2025
    Cite
    John P.A. Ioannidis (2025). August 2025 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.8
    Dataset updated
    Sep 19, 2025
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long impact and for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given, and data on retracted papers (based on Retraction Watch database) as well as citations to/from retracted papers have been added. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to end of citation year 2024. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a

  3. Unified Data Repository Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Unified Data Repository Market Research Report 2033 [Dataset]. https://dataintelo.com/report/unified-data-repository-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Unified Data Repository Market Outlook



    According to our latest research, the unified data repository market size reached USD 8.4 billion in 2024 on a global scale. The market is witnessing robust momentum, driven by the exponential growth of enterprise data and the need for streamlined data management solutions. The market is projected to expand at a notable CAGR of 14.7% during the forecast period, with the total value anticipated to reach USD 26.2 billion by 2033. This significant growth trajectory is underpinned by the increasing adoption of cloud-based solutions, the proliferation of big data analytics, and a growing emphasis on regulatory compliance and data governance across various industries.
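
    The projection in the paragraph above rests on standard compound-growth arithmetic, which can be checked with a short sketch. The function names are hypothetical, and which year serves as the base and how many compounding periods apply are assumptions, since the excerpt does not state them.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate implied by a start value, an end value, and a period count."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report excerpt: USD 8.4 billion (2024), 14.7% CAGR, 2033 horizon.
base_2024 = 8.4
for periods in (8, 9):  # assumption: the forecast may compound over 8 or 9 periods
    projected = project_value(base_2024, 0.147, periods)
    print(f"{periods} periods: USD {projected:.1f} billion")

# Growth rate implied by the quoted start and end values over 9 periods.
print(f"implied CAGR: {100 * implied_cagr(8.4, 26.2, 9):.1f}%")
```

    Running both directions makes it easy to see how sensitive the 2033 figure is to the choice of base year.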




    One of the primary growth factors propelling the unified data repository market is the relentless surge in data volumes generated by organizations across all sectors. With the proliferation of digital transformation initiatives, enterprises are experiencing unprecedented data growth, originating from diverse sources such as IoT devices, customer interactions, business operations, and social media. Managing, integrating, and extracting value from this deluge of data has become a strategic imperative. Unified data repositories offer a centralized platform that enables organizations to consolidate disparate data silos, improve data accessibility, and enhance decision-making capabilities. As businesses increasingly recognize the value of data-driven insights, the demand for robust unified data repository solutions is set to accelerate further.




    Another critical driver for the unified data repository market is the growing need for compliance with stringent data protection and privacy regulations. Regulatory frameworks such as GDPR in Europe, CCPA in California, and other local data governance mandates require organizations to maintain high levels of data integrity, security, and transparency. Unified data repositories facilitate centralized control and monitoring of data assets, ensuring that organizations can efficiently manage data lineage, access controls, and audit trails. This capability not only helps mitigate compliance risks but also fosters trust among stakeholders and customers. Consequently, sectors such as BFSI, healthcare, and government are increasingly investing in unified data repository solutions to uphold regulatory standards and safeguard sensitive information.




    Technological advancements and the integration of artificial intelligence (AI) and machine learning (ML) capabilities are further enhancing the value proposition of unified data repositories. Modern solutions are equipped with advanced analytics, automated data classification, and intelligent data integration features that empower organizations to derive actionable insights from their data assets. The ability to seamlessly integrate with existing IT infrastructure and support multi-cloud deployments is also a key differentiator. These technological innovations are enabling organizations to unlock new business opportunities, optimize operational efficiency, and gain a competitive edge in the digital economy. As a result, the unified data repository market is experiencing heightened adoption across both large enterprises and small and medium-sized enterprises (SMEs).




    From a regional perspective, North America continues to dominate the unified data repository market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the high concentration of technology-driven enterprises, early adoption of advanced data management solutions, and a mature regulatory environment. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, expanding IT infrastructure, and increasing investments in cloud technologies. Europe remains a significant market, driven by stringent data protection regulations and strong demand from the BFSI and healthcare sectors. The Middle East & Africa and Latin America are also witnessing steady growth, supported by rising awareness of data management best practices and ongoing digital transformation initiatives.



    Component Analysis



    The unified data repository market is segmented by component into software, hardware, and services, each playing a crucial role in the overall ecosystem. The software segment holds the largest share, driven by the widespread adoption of advanced data management platforms that enable seamless integration, storage, and retriev

  4. Cloud based Repository Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 29, 2025
    Cite
    Data Insights Market (2025). Cloud based Repository Service Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-based-repository-service-1408951
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 29, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Cloud-based Repository Service market is experiencing robust growth, driven by the increasing adoption of cloud computing across diverse sectors. The market's expansion is fueled by several key factors. Firstly, the rising need for secure data storage and efficient data management solutions across industries like banking, healthcare, and retail is significantly boosting demand. Secondly, the inherent scalability and cost-effectiveness of cloud-based solutions compared to on-premise infrastructure are major attractions for businesses of all sizes. Furthermore, advancements in data security technologies and robust backup services are enhancing trust and reliability, propelling market growth. While the initial investment might seem higher, the long-term cost savings associated with reduced infrastructure maintenance and operational expenses make cloud repositories a compelling option. The integration services offered within this market further streamline workflows and data accessibility, improving efficiency and productivity. We project a substantial market size, conservatively estimating it at $150 billion in 2025, with a CAGR of 15% predicted through 2033, indicating a significant market opportunity.

    The market segmentation reveals strong performance across various application areas. Banking and financial services exhibit high adoption rates due to the stringent regulatory compliance and data security requirements. Healthcare, with its growing volume of sensitive patient data, is also a major contributor. Retail, automotive, and education sectors are showing increasing interest in cloud-based repositories for better data management and analytics. The service types within the market (integration services, data security, and backup services) all show strong demand, reflecting the multifaceted nature of the market. Geographic growth is predominantly in North America and Europe, driven by early adoption and robust technological infrastructure. However, Asia-Pacific is expected to experience rapid growth in the coming years due to increasing digitalization and cloud adoption in developing economies like India and China. Competitive pressures among established players like IBM, Google, and emerging cloud service providers will continue to shape the market landscape, further driving innovation and affordability.

  5. The Hydra-in-a-Box Survey on Digital Repositories

    • purl.stanford.edu
    Updated Jun 8, 2016
    Cite
    Hannah Frost; Gary Geisler; Mark A. Matienzo (2016). The Hydra-in-a-Box Survey on Digital Repositories [Dataset]. https://purl.stanford.edu/jk292fy8802
    Dataset updated
    Jun 8, 2016
    Authors
    Hannah Frost; Gary Geisler; Mark A. Matienzo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Hydra-in-a-Box is a collaborative project funded by the Institute for Museum and Library Services' National Digital Platform program to design, build, and deliver digital repository software that supports networked resources and services for digital collections. The project team conducted a web-based survey of the digital library, archives, and museum community in July 2015. The purpose of the survey was to gather information about the digital repository solutions that institutions are currently using, the types and sizes of content they are managing, their likes and dislikes with current software, and what features they’d like to see in future repository software. The team is using the survey data to better understand the current landscape of repository solutions in use by libraries, archives, and museums, and to inform the Hydra-in-a-Box product design process. For purposes of this survey, a repository is defined as: a system or service used intentionally to manage digital resources (files and metadata) for discovery, access, and/or preservation. A repository is not the same as a file system or a back-up of a file system. A repository may be open source or proprietary. A repository may be operated locally or by a third-party service provider.

  6. E-Commerce Data

    • kaggle.com
    zip
    Updated Aug 17, 2017
    Cite
    Carrie (2017). E-Commerce Data [Dataset]. https://www.kaggle.com/datasets/carrie1/ecommerce-data
    Available download formats: zip (7548686 bytes)
    Dataset updated
    Aug 17, 2017
    Authors
    Carrie
    Description

    Context

    Typically e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found by the title "Online Retail".

    Content

    "This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."

    Acknowledgements

    Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.

    Image from stocksnap.io.

    Inspiration

    Analyses for this dataset could include time series, clustering, classification and more.

  7. Sharing Data and Research Products in the HydroShare Repository to Enhance...

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated May 23, 2025
    Cite
    Jeffery S. Horsburgh (2025). Sharing Data and Research Products in the HydroShare Repository to Enhance Transparency and Reproducibility of Scientific Research [Dataset]. https://www.hydroshare.org/resource/8a1d39b94fe041efa7588eb26c9c2ea2
    Available download formats: zip (89.1 MB)
    Dataset updated
    May 23, 2025
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HydroShare is a web-based repository and hydrologic information system operated by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) that enables users to share, collaborate around, and publish data, models, code, and applications associated with water related research. This seminar will focus on the capabilities of the HydroShare repository and functionality related to submitting, sharing, and publishing data and research products. It will cover HydroShare’s resource data model, describing HydroShare resources with metadata, and some best practices for depositing data and research products in HydroShare. It will also cover how information technology and best practices can enhance the transparency, reproducibility, and trust in the findings of water-related research by making hydrologic information more findable, accessible, interoperable and reusable (FAIR), and through linked computational systems simplifying the workflows needed for hydrologic modeling and analysis.

    This presentation was delivered on December 7, 2021 at the USAID Center for Excellence for Water at Alexandria University Webinar Series.

  8. ARL IR Metadata Documentation Website Review Data

    • databank.illinois.edu
    Cite
    Ayla Stein Kenfield, ARL IR Metadata Documentation Website Review Data [Dataset]. http://doi.org/10.13012/B2IDB-7323993_V1
    Authors
    Ayla Stein Kenfield
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spreadsheet with data about whether or not the indicated institutional repository website provides metadata documentation. See readme file for more information.

  9. Summary of supplemental data links by type.

    • plos.figshare.com
    xls
    Updated Jun 5, 2024
    Cite
    Kristin A. Briney (2024). Summary of supplemental data links by type. [Dataset]. http://doi.org/10.1371/journal.pone.0304781.t002
    Available download formats: xls
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kristin A. Briney
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To determine where data is shared and what data is no longer available, this study analyzed data shared by researchers at a single university. 2166 supplemental data links were harvested from the university’s institutional repository and web scraped using R. All links that failed to scrape or could not be tested algorithmically were tested for availability by hand. Trends in data availability by link type, age of publication, and data source were examined for patterns. Results show that researchers shared data in hundreds of places. About two-thirds of links to shared data were in the form of URLs and one-third were DOIs, with several FTP links and links directly to files. A surprising 13.4% of shared URL links pointed to a website homepage rather than a specific record on a website. After testing, 5.4% of the 2166 supplemental data links were found to be no longer available. DOIs were the type of shared link that was least likely to disappear with a 1.7% loss, with URL loss at 5.9% averaged over time. Links from older publications were more likely to be unavailable, with a data disappearance rate estimated at 2.6% per year, as well as links to data hosted on journal websites. The results support best practice guidance to share data in a data repository using a permanent identifier.
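
    The study itself scraped links using R; the sketch below is a separate illustration, in Python, of how links might be sorted into the types the description mentions (DOI, URL, FTP, direct file, homepage) and how a per-type loss share could be tallied. The regular expression, the file-extension list, and both function names are assumptions made for this example, not the study's method, and no network requests are made.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed DOI shape: optional resolver prefix, then "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE
)
# Assumed list of extensions that indicate a link directly to a file.
FILE_EXTENSIONS = (".csv", ".xls", ".xlsx", ".zip", ".pdf", ".txt")


def classify_link(link: str) -> str:
    """Return a coarse link type: doi, ftp, file, homepage, url, or other."""
    link = link.strip()
    if DOI_PATTERN.match(link):
        return "doi"
    parsed = urlparse(link)
    if parsed.scheme == "ftp":
        return "ftp"
    if parsed.scheme in ("http", "https"):
        if parsed.path.lower().endswith(FILE_EXTENSIONS):
            return "file"
        # A bare host with no path or query points at a site's homepage.
        if parsed.path in ("", "/") and not parsed.query:
            return "homepage"
        return "url"
    return "other"


def unavailable_share(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Given (link, is_available) pairs, return the share of lost links per type."""
    totals, lost = Counter(), Counter()
    for link, available in records:
        kind = classify_link(link)
        totals[kind] += 1
        if not available:
            lost[kind] += 1
    return {kind: lost[kind] / totals[kind] for kind in totals}
```

    Availability flags would have to come from a separate check (automated or by hand, as in the study); this sketch only groups and summarizes them.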

  10. Data from: Development of Data Dictionary for neonatal intensive care unit:...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Dec 27, 2020
    Cite
    Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun (2020). Development of Data Dictionary for neonatal intensive care unit: advancement towards a better critical care unit [Dataset]. http://doi.org/10.5061/dryad.zkh18936f
    Available download formats: zip
    Dataset updated
    Dec 27, 2020
    Dataset provided by
    CHIL
    Indraprastha Institute of Information Technology Delhi
    Apollo Cradle For Women & Children
    Lucile Packard Children's Hospital
    Sir Ganga Ram Hospital
    Post Graduate Institute of Medical Education and Research
    KLKH
    Ewha Womans University
    UCSF Benioff Children's Hospital
    Authors
    Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: Critical care units (CCUs) with wide use of various monitoring devices generate massive data. To utilize the valuable information from these devices, data are collected and stored using systems like Clinical Information System (CIS), Laboratory Information Management System (LIMS), etc. These systems are proprietary in nature, allow limited access to their databases and have vendor-specific clinical implementations. In this study we focus on developing an open source web-based meta-data repository for CCU representing stay of patient with relevant details.

    Methods: After developing the web-based open source repository we analyzed prospective data from two sites for four months for data quality dimensions (completeness, timeliness, validity, accuracy and consistency), morbidity and clinical outcomes. We used a regression model to highlight the significance of practice variations linked with various quality indicators. Results: Data dictionary (DD) with 1447 fields (90.39% categorical and 9.6% text fields) is presented to cover clinical workflow of NICU. The overall quality of 1795 patient days data with respect to standard quality dimensions is 87%. The data exhibit 82% completeness, 97% accuracy, 91% timeliness and 94% validity in terms of representing CCU processes. The data scores only 67% in terms of consistency. Furthermore, quality indicator and practice variations are strongly correlated (p-value < 0.05).

    Results: Data dictionary (DD) with 1555 fields (89.6% categorical and 11.4% text fields) is presented to cover clinical workflow of a CCU. The overall quality of 1795 patient days data with respect to standard quality dimensions is 87%. The data exhibit 82% completeness, 97% accuracy, 91% timeliness and 94% validity in terms of representing CCU processes. The data scores only 67% in terms of consistency. Furthermore, quality indicators and practice variations are strongly correlated (p-value < 0.05).

    Conclusion: This study documents DD for standardized data collection in CCU. This provides robust data and insights for audit purposes and pathways for CCU to target practice improvements leading to specific quality improvements.
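
    Two of the data quality dimensions named above, completeness and validity, can be illustrated with a minimal sketch. The definitions used here (share of non-empty fields, and share of filled categorical values that fall in an allowed set) are assumptions for illustration; the study's own scoring rules are not given in this description, and the field names below are hypothetical.

```python
def completeness(records: list[dict], fields: list[str]) -> float:
    """Share of expected field slots that hold a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for rec in records for f in fields if rec.get(f) not in (None, "")
    )
    return filled / total


def validity(records: list[dict], allowed: dict[str, set]) -> float:
    """Share of filled categorical values that belong to their allowed set."""
    checked = 0
    valid = 0
    for rec in records:
        for field, choices in allowed.items():
            value = rec.get(field)
            if value in (None, ""):
                continue  # missing values count against completeness, not validity
            checked += 1
            if value in choices:
                valid += 1
    return valid / checked if checked else 0.0


# Hypothetical records and code lists, for illustration only.
sample = [
    {"sex": "F", "resp_support": "CPAP"},
    {"sex": "", "resp_support": "unknown-code"},
]
print(completeness(sample, ["sex", "resp_support"]))
print(validity(sample, {"sex": {"F", "M"}, "resp_support": {"CPAP", "HFNC"}}))
```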

  11. Online Retail Transaction Data

    • kaggle.com
    zip
    Updated Dec 21, 2023
    Cite
    The Devastator (2023). Online Retail Transaction Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-retail-transaction-data
    Explore at:
    Available download formats: zip (9098240 bytes)
    Dataset updated
    Dec 21, 2023
    Authors
    The Devastator
    Description

    Online Retail Transaction Data

    UK Online Retail Sales and Customer Transaction Data

    By UCI [source]

    About this dataset

    Comprehensive Dataset on Online Retail Sales and Customer Data

    Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The scope of the data spans vastly, from granular details about each product sold to extensive customer data sets from different countries.

    This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.

    The available attributes within this dataset offer valuable pieces of information:

    • InvoiceNo: Invoice numbers are six-digit integers uniquely assigned to every transaction logged in this system. Invoice numbers that begin with 'c' signify cancellations, which adds another dimension for purchase pattern analysis.

    • StockCode: Stock codes are 5-digit integers that identify specific items in the inventory system, allowing easy identification of and distinction between products.

    • Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.

    • Quantity: These figures ascertain the volume of each product per transaction – important figures that can help understand buying trends better.

    • InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.

    • UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.

    Finally,

    • Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.

    This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.

    Access to such records can ensure enriching explorations or formulating insightful hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory or studying transactional trends over time or spotting cancellation patterns - this dataset is apt for multiple forms of retail analysis

    How to use the dataset

    1. Sales Analysis:

    Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine it with InvoiceNo information to understand sales over individual transactions.

    2. Product Analysis:

    Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.

    3. Customer Segmentation:

    If you apply specific business logic to the transactions (such as calculating total amounts), you could then use standard machine learning methods or RFM (Recency, Frequency, Monetary) segmentation, combined with 'CustomerID', to understand customer behavior better. Grouping invoice numbers (which stand for separate transactions) per client will give insights about your clients as well.

    4. Geographical Analysis:

    The Country column enables analysts to study purchase patterns across different geographical locations.
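    The analyses outlined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code shipped with the dataset: the sample rows are made up, the timestamp format is assumed, and the helper names (is_cancellation, revenue_by, rfm) are hypothetical. Only the column names come from the description.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; only the column names follow the dataset description.
rows = [
    {"InvoiceNo": "536365", "Quantity": 6, "UnitPrice": 2.55, "CustomerID": "17850",
     "InvoiceDate": "2010-12-01 08:26", "Country": "United Kingdom"},
    {"InvoiceNo": "536365", "Quantity": 2, "UnitPrice": 3.00, "CustomerID": "17850",
     "InvoiceDate": "2010-12-01 08:26", "Country": "United Kingdom"},
    {"InvoiceNo": "536370", "Quantity": 10, "UnitPrice": 1.50, "CustomerID": "12583",
     "InvoiceDate": "2010-12-01 08:45", "Country": "France"},
    {"InvoiceNo": "C536379", "Quantity": -1, "UnitPrice": 27.50, "CustomerID": "14527",
     "InvoiceDate": "2010-12-01 09:41", "Country": "United Kingdom"},
]


def is_cancellation(invoice_no):
    """Cancellations are marked with a leading 'c' (checked case-insensitively)."""
    return str(invoice_no).lower().startswith("c")


def revenue_by(rows, key):
    """Sum Quantity * UnitPrice per value of `key`, skipping cancellations."""
    totals = defaultdict(float)
    for r in rows:
        if not is_cancellation(r["InvoiceNo"]):
            totals[r[key]] += r["Quantity"] * r["UnitPrice"]
    return dict(totals)


def rfm(rows, as_of):
    """Per customer: (recency in days, number of distinct invoices, total spend)."""
    last_seen, invoices, spend = {}, defaultdict(set), defaultdict(float)
    for r in rows:
        if is_cancellation(r["InvoiceNo"]):
            continue
        cid = r["CustomerID"]
        # Assumed timestamp format; adjust to match the actual file.
        when = datetime.strptime(r["InvoiceDate"], "%Y-%m-%d %H:%M")
        last_seen[cid] = max(last_seen.get(cid, when), when)
        invoices[cid].add(r["InvoiceNo"])
        spend[cid] += r["Quantity"] * r["UnitPrice"]
    return {cid: ((as_of - last_seen[cid]).days, len(invoices[cid]), round(spend[cid], 2))
            for cid in last_seen}


print(revenue_by(rows, "InvoiceNo"))      # sales per transaction
print(revenue_by(rows, "Country"))        # geographical analysis
print(rfm(rows, datetime(2010, 12, 11)))  # simple RFM table
```

    Skipping cancellations before summing is a design choice for this sketch; an analysis of cancellation patterns would instead keep those rows and group on them.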

    Practical applications

    Understand what products sell best where - It can help drive tailored marketing strategies. Anomalies detection – Identify unusual behaviors that might lead frau...

  12. Data from: Data repository of Predictors of Social Response to COVID-19...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Feb 18, 2022
    Cite
    Dalky, Heyam; Khraisat, Adam; khalifeh, Anas; Abu-hammad, Sawsan; Hamdan-Mansour, Ayman (2022). Data repository of Predictors of Social Response to COVID-19 among Health Care Workers Caring for Individuals with Confirmed COVID-19 in Jordan [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5893033
    Explore at:
    Dataset updated
    Feb 18, 2022
    Dataset provided by
    Assistant Professor, Department of Maternal and Child Nursing, Faculty of Nursing, Jordan University of Science & Technology
    Clinical Instructor, School of Nursing, Health Science Division Higher College of Technology, United Arab Emirates
    Professor, Psychiatric Nursing, School of Nursing, the University of Jordan
    Associate Professor, Psychiatric Mental Health, Faculty of Nursing, Jordan University of Science & Technology
    Zarqa University, Faculty of Nursing: Zarqa, Zarqa, JO
    Authors
    Dalky, Heyam; Khraisat, Adam; khalifeh, Anas; Abu-hammad, Sawsan; Hamdan-Mansour, Ayman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The outbreak of COVID-19 forced public health authorities around the world to call for national emergency plans. Public responses, in the form of social discrimination and stigmatizing behaviors, are increasingly being observed against individuals with confirmed COVID-19 and healthcare workers (HCWs) caring for those individuals. Hence, this study aimed to investigate the perception of social discrimination and coping strategies, and to explore predictors of social discrimination and coping toward COVID-19 among HCWs and individuals with confirmed COVID-19. This study used a cross-sectional descriptive-comparative design to collect data from a convenience sample of 105 individuals with confirmed COVID-19 and 109 HCWs using a web-based survey format. In this study, individuals with confirmed COVID-19 reported a higher level of social discrimination than HCWs (t = 2.62, p < .01), while HCWs reported a higher level of coping with COVID-19 than individuals with COVID-19 (t = -3.91, p < .001). Educational level, age, monthly income, and taking over-the-counter medication were predictors of social discrimination and coping with COVID-19 among HCWs and individuals with confirmed COVID-19. In conclusion, the findings showed that individuals with confirmed COVID-19 were more likely to face social discrimination and that HCWs cope better with COVID-19 than individuals with confirmed COVID-19.

  13. Mobile-First Website Design for Emergency Home Service Companies

    • caseysseo.com
    txt
    Updated Aug 26, 2025
    Cite
    Casey Miller (2025). Mobile-First Website Design for Emergency Home Service Companies [Dataset]. https://caseysseo.com/mobile-first-website-design-for-emergency-home-service-companies
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 26, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Colorado Springs
    Variables measured
    Mobile Abandonment Rate, Mobile Page Load Speed Recommendation, Recommended Minimum Touch Target Size, Recommended Minimum Phone Number Font Size, Mobile Search Percentage During Emergencies, Increase in Mobile-Optimized Emergency Websites, Mobile Search Percentage for Emergency Services, Increase in Emergency Calls from Prominent Phone Number
    Measurement technique
    Field testing with mobile devices, Customer surveys, Analysis of industry benchmarks and performance data
    Description

    This dataset provides insights and best practices for designing mobile-first websites for emergency home service companies, with a focus on improving user experience, speed, and conversions during critical situations.

  14. Point-of-Interest (POI) Data | Global Coverage | 250M Business Listings Data...

    • datarade.ai
    .json, .csv, .xls
    Updated Jan 30, 2022
    Cite
    Quadrant (2022). Point-of-Interest (POI) Data | Global Coverage | 250M Business Listings Data with Custom On-Demand Attributes [Dataset]. https://datarade.ai/data-products/quadrant-point-of-interest-poi-data-business-listings-dat-quadrant
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Jan 30, 2022
    Dataset authored and provided by
    Quadrant
    Area covered
    France
    Description

    We seek to mitigate the challenges with web-scraped and off-the-shelf POI data, and provide tailored, complete, and manually verified datasets with Geolancer. Our goal is to help represent the physical world accurately for applications and services dependent on precise POI data, and offer a reliable basis for geospatial analysis and intelligence.

    Our POI database is powered by our proprietary POI collection and verification platform, Geolancer, which provides manually verified, authentic, accurate, and up-to-date POI datasets.

    Enrich your geospatial applications with a contextual layer of comprehensive and actionable information on landmarks, key features, business areas, and many more granular, on-demand attributes. We offer on-demand data collection and verification services that fit unique use cases and business requirements. Using our advanced data acquisition techniques, we build and offer tailormade POI datasets. Combined with our expertise in location data solutions, we can be a holistic data partner for our customers.

    KEY FEATURES

    • Our proprietary, industry-leading manual verification platform Geolancer delivers up-to-date, authentic data points

    • POI-as-a-Service with on-demand verification and collection in 170+ countries leveraging our network of 1M+ contributors

    • Customise your feed by specific refresh rate, location, country, category, and brand based on your specific needs

    • Data Noise Filtering Algorithms normalise and de-dupe POI data that is ready for analysis with minimal preparation

    DATA QUALITY

    Quadrant’s POI data are manually collected and verified by Geolancers. Our network of freelancers maps cities and neighborhoods, adding and updating POIs in our proprietary Geolancer app on their smartphones. Compared to other methods, this process guarantees accuracy and promises a healthy stream of POI data. This method of data collection also steers clear of infringement on users’ privacy and sale of their location data. These purpose-built apps do not store, collect, or share any data other than the physical location (without tying context back to an actual human being and their mobile device).

    USE CASES

    The main goal of POI data is to identify a place of interest, establish its accurate location, and help businesses understand the happenings around that place to make better, well-informed decisions. POI can be essential in assessing competition, improving operational efficiency, planning the expansion of your business, and more.

    It can be used by businesses to power their apps and platforms for last-mile delivery, navigation, mapping, logistics, and more. Combined with mobility data, POI data can be employed by retail outlets to monitor traffic to one of their sites or of their competitors. Logistics businesses can save costs and improve customer experience with accurate address data. Real estate companies use POI data for site selection and project planning based on market potential. Governments can use POI data to enforce regulations, monitor public health and well-being, plan public infrastructure and services, and more. A few common and widespread use cases of POI data are:

    • Navigation and mapping for digital marketplaces and apps.
    • Logistics for online shopping, food delivery, last-mile delivery, and more.
    • Improving operational efficiency for rideshare and transportation platforms.
    • Demographic and human mobility studies for market consumption and competitive analysis.
    • Market assessment, site selection, and business expansion.
    • Disaster management and urban mapping for public welfare.
    • Advertising and marketing deployment and ROI assessment.
    • Real-estate mapping for online sales and renting platforms.

    ABOUT GEOLANCER

    Quadrant's POI-as-a-Service is powered by Geolancer, our industry-leading manual verification project. Geolancers, equipped with a smartphone running our proprietary app, manually add and verify POI data points, ensuring accuracy and authenticity. Geolancer helps data buyers acquire data with the update frequency suited for their specific use case.

  15. Data associated with: The Maryland Food System Map

    • archive.data.jhu.edu
    • datasetcatalog.nlm.nih.gov
    Updated Jun 15, 2023
    Cite
    Johns Hopkins Center for a Livable Future (2023). Data associated with: The Maryland Food System Map [Dataset]. http://doi.org/10.7281/T1/QUDBC6
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Johns Hopkins Research Data Repository
    Authors
    Johns Hopkins Center for a Livable Future
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Maryland
    Description

    This dataset contains geospatial data, code, and documentation relevant to the Maryland Food System Map, a web mapping application maintained by the Johns Hopkins Center for a Livable Future between 2012 and 2023. Approximately 500 geospatial data layers that were featured on the application have been preserved here for use in future analyses of the food system in Maryland. The code behind the application has also been preserved in this dataset and can be used to better understand how the application worked and to develop similar applications in the future. The documentation provides more information about the Maryland Food System Map, including both the history of the application and how it was used. There is also metadata about when and where the data for data layers were obtained.

  16. Coronavirus COVID-19 Global Cases by the Center for Systems Science and...

    • github.com
    • systems.jhu.edu
    • +1more
    Cite
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [Dataset]. https://github.com/CSSEGISandData/COVID-19
    Explore at:
    Dataset provided by
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
    Area covered
    Global
    Description

    2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
    https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

    • Confirmed Cases by Country/Region/Sovereignty
    • Confirmed Cases by Province/State/Dependency
    • Deaths
    • Recovered

    Downloadable data:
    https://github.com/CSSEGISandData/COVID-19

    Additional Information about the Visual Dashboard:
    https://systems.jhu.edu/research/public-health/ncov

  17. AIBOM Repositories Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). AIBOM Repositories Market Research Report 2033 [Dataset]. https://dataintelo.com/report/aibom-repositories-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AIBOM Repositories Market Outlook



    According to our latest research, the global AIBOM Repositories market size reached USD 2.7 billion in 2024, reflecting robust adoption across industries. The market is expected to grow at a CAGR of 20.1% from 2025 to 2033, projecting a substantial increase to approximately USD 15.9 billion by 2033. This remarkable growth trajectory is primarily driven by the escalating integration of artificial intelligence and business operations management (AIBOM) solutions, which are transforming data governance, automation, and analytics across diverse sectors.
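    The relationship between a base value, a compound annual growth rate, and a projected value can be sketched as follows. This is a minimal illustration, not the report's methodology: the function names are hypothetical, and the number of compounding periods (nine years, 2024 to 2033) is an assumption, so the results need not reproduce the rounded figures quoted above.

```python
def project(value, rate, periods):
    """Compound `value` forward at annual growth `rate` for `periods` years."""
    return value * (1 + rate) ** periods


def implied_cagr(start, end, periods):
    """Annual growth rate that takes `start` to `end` over `periods` years."""
    return (end / start) ** (1 / periods) - 1


# Figures quoted in the description; the period count of 9 is an assumption.
base_2024 = 2.7    # USD billion
target_2033 = 15.9  # USD billion
print(f"Projected value: {project(base_2024, 0.201, 9):.2f} billion USD")
print(f"Implied CAGR:    {implied_cagr(base_2024, target_2033, 9):.1%}")
```

    Changing the base year or the number of periods shifts both results, which is worth keeping in mind when comparing market-size figures across reports.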




    The primary growth factor for the AIBOM Repositories market is the rapid digitization of business processes, which has intensified the need for centralized, intelligent repositories that can manage and orchestrate vast volumes of structured and unstructured data. Organizations are increasingly leveraging AIBOM repositories to enhance operational efficiency, ensure compliance, and unlock actionable insights. As digital transformation initiatives accelerate, especially in sectors such as healthcare, finance, and manufacturing, the demand for scalable, secure, and AI-driven repositories is surging. Furthermore, the proliferation of big data and the growing need for real-time data processing are compelling enterprises to invest in advanced repository solutions that offer seamless integration with AI and machine learning frameworks.




    Another critical driver is the growing emphasis on data security, privacy, and regulatory compliance. With the advent of stringent data protection laws such as GDPR and CCPA, organizations are under mounting pressure to ensure robust data governance and auditability. AIBOM repositories, particularly those offering hybrid and private deployment options, are emerging as indispensable tools for organizations seeking to maintain control over sensitive information while enabling collaboration and innovation. The ability to automate access controls, monitor data usage, and generate compliance reports is positioning AIBOM repositories as a cornerstone of modern enterprise IT strategies.




    The market is also benefiting from the increasing adoption of cloud-based deployment models, which offer unparalleled scalability, flexibility, and cost-efficiency. As organizations shift towards hybrid and multi-cloud environments, the need for repositories that can seamlessly operate across on-premises and cloud infrastructures is becoming paramount. Cloud-based AIBOM repositories are enabling enterprises to rapidly deploy, scale, and manage their data assets without the burden of extensive capital investments in physical infrastructure. This trend is particularly pronounced among small and medium enterprises (SMEs), which are leveraging cloud repositories to level the playing field with larger competitors.




    Regionally, North America continues to dominate the AIBOM Repositories market, accounting for over 38% of global revenue in 2024, driven by the early adoption of AI technologies and the presence of leading technology providers. However, Asia Pacific is emerging as the fastest-growing region, with a projected CAGR of 23.4% through 2033, fueled by rapid digitalization, expanding IT infrastructure, and increasing investments in AI research and development. Europe, Latin America, and the Middle East & Africa are also witnessing steady growth, supported by rising awareness of data management best practices and the proliferation of industry-specific regulations.



    Repository Type Analysis



    The repository type segment of the AIBOM Repositories market is categorized into public repositories, private repositories, and hybrid repositories, each catering to distinct organizational needs. Public repositories, which are accessible to a broader audience, are gaining traction among academic and research institutions as well as open-source communities. These repositories facilitate collaboration, knowledge sharing, and innovation by providing a centralized platform for storing and distributing AI models, datasets, and business operation frameworks. The open nature of public repositories accelerates the pace of AI development and democratizes access to cutting-edge tools, but also introduces challenges related to data security and intellectual property protection.




    Private repositories, on the other hand, are designed t

  18. PolarHub: A service-oriented cyberinfrastructure portal to support sustained...

    • arcticdata.io
    • dataone.org
    Updated May 20, 2020
    Cite
    Wenwen Li (2020). PolarHub: A service-oriented cyberinfrastructure portal to support sustained polar sciences [Dataset]. http://doi.org/10.18739/A2K649T2G
    Explore at:
    Dataset updated
    May 20, 2020
    Dataset provided by
    Arctic Data Center
    Authors
    Wenwen Li
    Time period covered
    Jan 1, 2013 - Jan 1, 2016
    Area covered
    Description

    This project develops components of a polar cyberinfrastructure (CI) to support researchers and users in data discovery and access. The main goal is to provide tools that enable better access to polar data and information, allowing users to spend more time on analysis and research and significantly less time on discovery and searching. A large-scale web crawler, PolarHub, was developed to continuously mine the Internet to discover dispersed polar data. Besides identifying polar data in major data repositories, PolarHub is also able to bring individual hidden resources forward, increasing the discoverability of polar data. The quality of data resources is assessed inside PolarHub, providing a key tool not only for identifying issues but also for connecting the research community with optimal data resources.

    The current PolarHub system supports seven types of geospatial data and processing services that are compliant with OGC (Open Geospatial Consortium) standards:

    • OGC Web Map Service (WMS): a standard protocol for serving (over the Internet) georeferenced map images that a map server generates using data from a GIS database.
    • OGC Web Feature Service (WFS): provides an interface allowing requests for geographical features across the web using platform-independent calls.
    • OGC Web Coverage Service (WCS): an interface standard that defines Web-based retrieval of coverages, that is, digital geospatial information representing space/time-varying phenomena.
    • OGC Web Map Tile Service (WMTS): a standard protocol for serving pre-rendered georeferenced map tiles over the Internet.
    • OGC Sensor Observation Service (SOS): a web service to query real-time sensor data and sensor data time series; it is part of the Sensor Web. The offered sensor data comprises descriptions of the sensors themselves, which are encoded in the Sensor Model Language (SensorML), and the measured values in the Observations and Measurements (O&M) encoding format.
    • OGC Web Processing Service (WPS): an interface standard that provides rules for standardizing inputs and outputs (requests and responses) for invoking geospatial processing services, such as polygon overlay, as a web service.
    • OGC Catalog Service for the Web (CSW): a standard for exposing a catalogue of geospatial records in XML on the Internet (over HTTP). The catalogue is made up of records that describe geospatial data (e.g. KML), geospatial services (e.g. WMS), and related resources.
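    OGC web services of the kinds listed above can typically be probed with a GetCapabilities request, which returns an XML description of what the endpoint offers. The sketch below only builds such a request URL in key-value form; it is an illustration, not PolarHub's crawler code, and the endpoint https://example.org/ows and the helper name capabilities_url are hypothetical.

```python
from urllib.parse import urlencode


def capabilities_url(base_url, service, version=None):
    """Build a key-value GetCapabilities request URL for an OGC service."""
    params = {"service": service, "request": "GetCapabilities"}
    if version:
        params["version"] = version
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
for service in ("WMS", "WFS", "WCS", "WMTS", "SOS", "WPS", "CSW"):
    print(capabilities_url("https://example.org/ows", service))
```

    A crawler could fetch each URL and parse the returned XML to decide which service type, if any, an endpoint actually exposes.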

    PolarHub has three main functions: (1) visualization and metadata viewing of geospatial data services; (2) user-guided real-time data crawling; and (3) data filtering and search from PolarHub data repository.

  19. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Silicon Orchard Lab, Bangladesh
    Independent University, Bangladesh
    University of Memphis, USA
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM and compared several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done by Jelodar et al. [9], using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspaper sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some of the criteria provided as guidelines to the human data collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, economical genres are to be more prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
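    The pre-processing steps listed above could be sketched as follows. This is not the authors' script: the stop-word list is a tiny illustrative set, lowercasing is an added assumption, "non-English alphanumeric characters" is interpreted as anything other than ASCII letters, digits, and whitespace, and lemmatization is left as an optional callable because it needs a third-party lemmatizer.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "on", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]+")


def preprocess(text, lemmatize=None):
    """Apply the four steps in order and return the remaining tokens."""
    text = URL_RE.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)  # 2. remove non-English alphanumeric characters
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # 3. stop words
    if lemmatize is not None:           # 4. lemmatize, if a lemmatizer is supplied
        tokens = [lemmatize(t) for t in tokens]
    return tokens


print(preprocess("The virus spread in Minnesota: see https://example.org/news!"))
```

    Passing the lemmatizer in as a function keeps the sketch free of external dependencies while leaving room to plug in whichever lemmatizer is preferred.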

    The primary data statistics of the two dataset are shown in Table 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
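    Regenerating JSON from a CSV file, as described above, can be sketched with the standard library alone. This is an illustration rather than the repository's actual script; the function name csv_to_json and the column names headline and body are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), ensure_ascii=False, indent=2)


# Hypothetical two-column sample; the real files may use different columns.
sample = 'headline,body\n"Virus spreads","Officials urged residents to wear masks."\n'
print(csv_to_json(sample))
```

    Keeping the CSV as the single edited source and always regenerating the JSON from it avoids the two formats drifting apart.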

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and the machine translation used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT [14], a bidirectional Transformer encoder that performs strongly on classification tasks and masked-word prediction, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are typically used as pre-trained models because training them carries a huge computational cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].

    Keyword extraction is an information extraction process and a sub-task of NLP that extracts essential words and phrases from a text. TextRank[ 18 ] is an efficient keyword extraction technique that builds a graph of words, calculates a weight for each word, and selects the words with the highest weights.
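
    The graph-based weighting behind TextRank can be sketched in plain Python as follows. This is a simplified illustration, not the authors' script: the stopword list, the co-occurrence window of 2, the damping factor of 0.85, and the sample text are all assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "and", "was", "across", "said", "a", "of", "in", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_graph(words, window=2):
    """Link each word to the words that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(text, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on the co-occurrence graph."""
    graph = build_graph(tokenize(text), window)
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        # Each word receives a share of the score of every neighbour.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample text, not taken from the dataset.
sample = (
    "The virus spread across the state. The virus hurt the economy, "
    "and the economy hurt jobs. Officials said the virus response was slow."
)
keywords = textrank(sample)
```

    Words that co-occur with many other words accumulate the highest scores, so `keywords[0]` is the most central word of the sample.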

    Word clouds are a useful visualization technique for understanding the overall 'talk of the topic'. The clustered words give a quick understanding of the content.
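
    A word cloud is driven by word frequencies, so the counting step can be sketched with the standard library alone. The article records, the stopword list, and the function name `monthly_frequencies` below are hypothetical and only illustrate grouping counts by month.

```python
import re
from collections import Counter, defaultdict

# Assumed stopword list and sample records; the real dataset has its own fields.
STOPWORDS = {"the", "as", "for", "more", "and"}
articles = [
    {"month": "February", "text": "China reports outbreak. China locks down cities."},
    {"month": "March", "text": "Markets fall as the virus spreads; markets brace for more."},
]


def monthly_frequencies(records, stopwords=STOPWORDS, min_length=3):
    """Count word occurrences per month, skipping stopwords and short tokens."""
    counts = defaultdict(Counter)
    for record in records:
        words = re.findall(r"[a-z]+", record["text"].lower())
        counts[record["month"]].update(
            w for w in words if len(w) >= min_length and w not in stopwords
        )
    return counts


frequencies = monthly_frequencies(articles)
```

    Per-month counters like these could then be handed to a rendering library. The wordcloud package, for instance, offers `WordCloud.generate_from_frequencies` for this purpose.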

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can make a few observations:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, this concern appeared to intensify.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords, such as 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', and 'Diagnosed', from the news reports and built a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID-19 cases, which rose gradually from February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used VADER Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiment scores ranged from -0.5 to -0.9. The VADER sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: 'China', 'Government', 'Masks', 'Economy', 'Crisis', 'Theft', 'Stock market', 'Jobs', 'Election', 'Missteps', 'Health', 'Response'. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
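
    The keyword-driven number extraction described in this section could be sketched with a regular expression, as below. This is an assumption about how such a script might work, not the authors' code: the pattern, the keyword subset, the three-word gap allowed between a number and its keyword, and the sample sentence are all hypothetical.

```python
import re

# Assumed subset of the keywords listed in the text.
KEYWORDS = ("deaths", "died", "infected", "infections", "quarantined", "diagnosed")

# A number (with optional thousands separators), then up to three intervening
# words, then one of the keywords.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,3}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return (keyword, number) pairs found in the text, in reading order."""
    return [
        (keyword.lower(), int(number.replace(",", "")))
        for number, keyword in PATTERN.findall(text)
    ]


# Hypothetical sentence in the style of a news report.
report = (
    "Officials reported 1,200 new infections and 35 deaths on Monday; "
    "400 people were quarantined."
)
counts = extract_counts(report)
```

    Summing such pairs per month and per newspaper would give a series comparable in spirit to the one plotted in Figure 4. A pattern this simple will also pick up unrelated numbers that happen to sit near a keyword.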

  20. Data Policy

    • fairsharing.org
    Updated Jun 28, 2017
    Cite
    University of Oxford, Dept. of Engineering Science, Data Readiness Group (2017). Data Policy [Dataset]. https://fairsharing.org/
    Dataset updated
    Jun 28, 2017
    Dataset authored and provided by
    University of Oxford, Dept. of Engineering Science, Data Readiness Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    A manually curated registry of data policies from research funders, journal publishers, societies, and other organisations. These are linked to the databases and standards that they recommend for use.

Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81

Data from: Inventory of online public databases and repositories holding agricultural data in 2017

Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description

United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to:

• establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
• compare how much data is in institutional vs. domain-specific vs. federal platforms
• determine which repositories are recommended by top journals that require or recommend the publication of supporting data
• ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

• Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
• There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
• Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

• Resource Title: Journals. File Name: Journals.csv
• Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
• Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
• Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
• Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
• Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
• Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
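
The evaluation criteria described above (share of the collection at the 1% and 5% levels, and result counts above 100 and 500) can be expressed as a small calculation. The following is a minimal sketch under stated assumptions: the function name `evaluate_term`, the returned field names, and the example counts are hypothetical and do not reflect the columns of the published CSV files.

```python
def evaluate_term(term, hits, total_datasets):
    """Flag a search term against the share and result-count thresholds."""
    share = hits / total_datasets if total_datasets else 0.0
    return {
        "term": term,
        "hits": hits,
        "percent_of_collection": round(100 * share, 2),
        "at_least_1_percent": share >= 0.01,
        "at_least_5_percent": share >= 0.05,
        "over_100_results": hits > 100,
        "over_500_results": hits > 500,
    }


# Hypothetical counts, used only to exercise the thresholds.
large_share = evaluate_term("agriculture", hits=600, total_datasets=10_000)
small_share = evaluate_term("soil", hits=120, total_datasets=20_000)
```

In this example the first term meets every threshold, while the second returns more than 100 results but makes up less than 1% of the repository. This mirrors the distinction the study draws between large repositories and those with a significant portion of ag data.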
