14 datasets found
  1. Data Base for the Modern Web

    • paper.erudition.co.in
    html
    Updated Dec 3, 2025
    Cite
    Einetic (2025). Data Base for the Modern Web [Dataset]. https://paper.erudition.co.in/makaut/btech-in-electronics-and-instrumentation-engineering/8/big-data-analysis
    Available download formats: html
    Dataset updated
    Dec 3, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Data Base for the Modern Web of Big Data Analysis, 8th Semester, Applied Electronics and Instrumentation Engineering

  2. Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A...

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Heather Cribbs; Gabriel Gardner (2023). Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study" [Dataset]. http://doi.org/10.6084/m9.figshare.19071578.v1
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Heather Cribbs; Gabriel Gardner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.

  3. Data from: Current and projected research data storage needs of Agricultural...

    • datasets.ai
    • agdatacommons.nal.usda.gov
    • +2 more
    33, 53, 8
    Updated Mar 30, 2024
    Cite
    Department of Agriculture (2024). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://datasets.ai/datasets/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Available download formats: 53, 33, 8
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    United States Department of Agriculture (http://usda.gov/)
    Authors
    Department of Agriculture
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.

    We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival.

    To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
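
    The per-person calculation described above can be sketched as follows. This is a minimal illustration, not the working group's actual procedure: the range labels and their upper bounds are hypothetical placeholders (the real answer options are in Appendix A), and only the rule itself, the high end of the reported range divided by 1 or by G, comes from the description.

```python
from dataclasses import dataclass

# Hypothetical range labels mapped to their upper bound in TB; the actual
# survey's answer options are listed in Appendix A and may differ.
RANGE_HIGH_END_TB = {
    "under 1 TB": 1.0,
    "1 to 10 TB": 10.0,
    "10 to 100 TB": 100.0,
}


@dataclass
class Response:
    reported_range: str   # storage range selected in the survey
    group_size: int = 1   # G; 1 for an individual response


def per_person_storage_tb(response: Response) -> float:
    """High end of the reported range divided by the number of people covered."""
    if response.group_size < 1:
        raise ValueError("group_size must be at least 1")
    return RANGE_HIGH_END_TB[response.reported_range] / response.group_size


individual = Response("1 to 10 TB")
group = Response("10 to 100 TB", group_size=8)
print(per_person_storage_tb(individual))  # 10.0
print(per_person_storage_tb(group))       # 12.5
```

    For Big Data users the description says actual or estimated values were used instead of the range ceiling, so a fuller version would bypass the lookup table for those respondents.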


    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions.

      File Name: Appendix A.pdf

      Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.

      Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/


    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.

      File Name: Machine-readable survey response data.csv

      Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).


    • Resource Title: Responses from ARS Researcher Data Storage Survey.

      File Name: Data Storage Survey Data for public release.xlsx

      Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.

      Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

  4. NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 &...

    • registry.opendata.aws
    Updated Apr 4, 2025
    Cite
    NOAA (2025). NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17, 18 & 19 [Dataset]. https://registry.opendata.aws/noaa-goes/
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description



    NEW GOES-19 Data!! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distribution from GOES-16 will be turned off. GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.

    NEW GOES 16 Reprocess Data!! The reprocessed GOES-16 ABI L1b data mitigates systematic data issues (including data gaps and image artifacts) seen in the Operational products, and improves the stability of both the radiometric and geometric calibration over the course of the entire mission life. These data were produced by recomputing the L1b radiance products from input raw L0 data using improved calibration algorithms and look-up tables, derived from data analysis of the NIST-traceable, on-board sources. In addition, the reprocessed data products contain enhancements to the L1b file format, including limb pixels and pixel timestamps, while maintaining compatibility with the operational products. The datasets currently available span the operational life of GOES-16 ABI, from early 2018 through the end of 2024. The Reprocessed L1b dataset shows improvement over the Operational L1b products but may still contain data gaps or discrepancies. Please provide feedback to Dan Lindsey (dan.lindsey@noaa.gov) and Gary Lin (guoqing.lin-1@nasa.gov). More information can be found in the GOES-R ABI Reprocess User Guide.


    NOTICE: As of January 10th, 2023, GOES-18 assumed the GOES-West position and all data files are deemed both operational and provisional, so no ‘preliminary, non-operational’ caveat is needed. GOES-17 is now offline and has been shifted to approximately 105 degrees West, where it will be in on-orbit storage. GOES-17 data will no longer flow into the GOES-17 bucket. Operational GOES-West products can be found in the GOES-18 bucket.

    GOES satellites (GOES-16, GOES-17, GOES-18 & GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface. The satellites orbit high enough to allow for a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" for severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track storm movements. SUVI products are available in both NetCDF and FITS.

  5. Table_3_Hotspot and Frontier Analysis of Exercise Training Therapy for Heart...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Yan Wang; Yuhong Jia; Molin Li; Sirui Jiao; Henan Zhao (2023). Table_3_Hotspot and Frontier Analysis of Exercise Training Therapy for Heart Failure Complicated With Depression Based on Web of Science Database and Big Data Analysis.pdf [Dataset]. http://doi.org/10.3389/fcvm.2021.665993.s003
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Yan Wang; Yuhong Jia; Molin Li; Sirui Jiao; Henan Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Exercise training has been extensively studied in heart failure (HF) and psychological disorders, which have been shown to worsen each other. However, our understanding of how exercise simultaneously protects the heart and brain of HF patients is still in its infancy. The purpose of this study was to take advantage of big data techniques to explore hotspots and frontiers of mechanisms that protect the heart and brain simultaneously through exercise training.

    Methods: We studied the scientific publications on related research between January 1, 2003 and December 31, 2020 from the WoS Core Collection. Research hotspots were assessed through open-source software, CiteSpace, Pajek, and VOSviewer. Big data analysis and visualization were carried out using R, Cytoscape and Origin.

    Results: From 2003 to 2020, the study on HF, depression, and exercise simultaneously was the lowest of all research sequences (two-way ANOVAs, p < 0.0001). Its linear regression coefficient r was 0.7641. The result of hotspot analysis of related keyword-driven research showed that inflammation and stress (including oxidative stress) were the common mechanisms. Through further analyses, we noted that inflammation, stress, oxidative stress, apoptosis, reactive oxygen species, cell death, and the mechanisms related to mitochondrial biogenesis/homeostasis could be regarded as the primary mechanism targets to study the simultaneous intervention of exercise on the heart and brain of HF patients with depression.

    Conclusions: Our findings demonstrate the potential mechanism targets by which exercise interferes with both the heart and brain for HF patients with depression. We hope that they can boost the attention of other researchers and clinicians, and open up new avenues for designing more novel potential drugs to block the heart-brain axis vicious circle.

  6. Enterprise Data Warehouse (EDW) Market Analysis, Size, and Forecast...

    • technavio.com
    pdf
    Updated May 15, 2025
    Cite
    Technavio (2025). Enterprise Data Warehouse (EDW) Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), APAC (China, India, Japan, and South Korea), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/enterprise-data-warehouse-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    May 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    United States
    Description

    Enterprise Data Warehouse (EDW) Market Size 2025-2029

    The enterprise data warehouse (EDW) market size is forecast to increase by USD 43.12 billion, at a CAGR of 28% from 2024 to 2029. Data explosion across industries will drive the enterprise data warehouse (EDW) market.

    Major Market Trends & Insights

    APAC dominated the market and accounted for a 32% growth during the forecast period.
    By Product Type - Information and analytical processing segment was valued at USD 4.38 billion in 2023
    By Deployment - Cloud based segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 857.82 million
    Market Future Opportunities: USD 43116.60 million
    CAGR: 28%
    APAC: Largest market in 2023
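
    The relationship between a CAGR and a forecast increase follows the standard compound-growth formula, increase = base × ((1 + CAGR)^years − 1). The sketch below is only an arithmetic illustration: it assumes the 28% rate compounds annually over the five years from 2024 to 2029 and that USD 43,116.60 million is the increase over that whole period. The base-year size it derives is implied by those assumptions and is not a figure reported in the listing.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` annually for `years` years."""
    return (1.0 + cagr) ** years


def implied_base(increase: float, cagr: float, years: int) -> float:
    """Base value B such that B * ((1 + cagr) ** years - 1) == increase."""
    return increase / (growth_multiple(cagr, years) - 1.0)


CAGR = 0.28                 # from the listing
YEARS = 2029 - 2024         # assumed number of compounding periods
INCREASE_MUSD = 43116.60    # from the listing, USD million

base = implied_base(INCREASE_MUSD, CAGR, YEARS)
print(f"growth multiple: {growth_multiple(CAGR, YEARS):.3f}")
print(f"implied base size: USD {base:,.0f} million")
```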
    

    Market Summary

    The market is a dynamic and ever-evolving landscape, characterized by continuous innovation and adaptation to industry demands. Core technologies, such as cloud computing and big data analytics, are driving the market's growth, enabling organizations to manage and analyze vast amounts of data more effectively. In terms of applications, business intelligence and data mining are leading the way, providing valuable insights for strategic decision-making. Service types, including consulting, implementation, and support, are essential components of the EDW market. According to recent reports, the consulting segment is expected to dominate the market due to the increasing demand for expert advice in implementing and optimizing EDW solutions. However, data security concerns remain a significant challenge, with regulations like GDPR and HIPAA driving the need for robust security measures. Despite these challenges, the market continues to expand, with data explosion across industries fueling the demand for EDW solutions. For instance, the healthcare sector is projected to witness a compound annual growth rate (CAGR) of 15.3% between 2021 and 2028. Furthermore, the market is witnessing a significant focus on new solution launches, with major players like Microsoft, IBM, and Oracle introducing advanced EDW offerings to meet the evolving needs of businesses.

    What will be the Size of the Enterprise Data Warehouse (EDW) Market during the forecast period?

    How is the Enterprise Data Warehouse (EDW) Market Segmented and what are the key trends of market segmentation?

    The enterprise data warehouse (EDW) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Product Type: Information and analytical processing; Data mining
    • Deployment: Cloud based; On-premises
    • Sector: Large enterprises; SMEs
    • End-user: BFSI; Healthcare and pharmaceuticals; Retail and E-commerce; Telecom and IT; Others
    • Geography: North America (US, Canada); Europe (France, Germany, Italy, UK); APAC (China, India, Japan, South Korea); Rest of World (ROW)

    By Product Type Insights

    The information and analytical processing segment is estimated to witness significant growth during the forecast period.

    The market is experiencing significant growth, with data replication strategies becoming increasingly sophisticated to ensure capacity planning models accommodate expanding data volumes. ETL tool selection and business intelligence platforms are crucial components, enabling query optimization strategies and disaster recovery planning. Data warehouse migration, data profiling methods, and real-time data ingestion are essential for maintaining a competitive edge. Data warehouse automation, data quality metrics, and data warehouse modernization are ongoing priorities, with data cleansing techniques and dimensional modeling techniques essential for ensuring data accuracy. Data warehousing architecture, performance monitoring tools, and high availability solutions are integral to ensuring scalability and availability. Audit trail management, data lineage tracking, and data warehouse maintenance are critical for maintaining data security and compliance. Data security protocols and data encryption methods are essential for protecting sensitive information, while data virtualization techniques and access control mechanisms facilitate self-service business intelligence tools. ETL process optimization and data governance policies are key to streamlining operations and ensuring data consistency. The IT, BFSI, education, healthcare, and retail sectors are driving market growth, with information processing and analytical processing becoming increasingly important. The construction of web-based accessing tools integrated with web browsers is a current trend, enabling users to access data warehouses easily. According to recent studies, the market for data warehousing solutions is projected to grow by 18.5%, while the adoption of cloud data warehou

  7. Data from: Corpus of Resolutions: UN Security Council (CR-UNSC)

    • zenodo.org
    • nde-dev.biothings.io
    • +1 more
    pdf, zip
    Updated Jul 5, 2024
    Cite
    Seán Fobbe; Lorenzo Gasbarri; Niccolò Ridi (2024). Corpus of Resolutions: UN Security Council (CR-UNSC) [Dataset]. http://doi.org/10.5281/zenodo.11212056
    Available download formats: zip, pdf
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Seán Fobbe; Lorenzo Gasbarri; Niccolò Ridi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    The Corpus of Resolutions: UN Security Council (CR-UNSC) collects and presents for the first time in human- and machine-readable form all resolutions, drafts, and meeting records of the UN Security Council, including detailed metadata, as published by the UN Digital Library and revised by the authors.

    The United Nations Security Council (UNSC) is the most influential of the principal UN organs. Composed of five permanent and ten non-permanent members, its functioning is constrained by the political context in which it operates. During the Cold War, the complex political relationships between the permanent members and their veto powers significantly affected the capacity of the UNSC to address violations of international peace and security, with only 646 resolutions passed from 1946 to 1989. Since the 1990s, the activity of the UN Security Council has increased dramatically and produced 2721 resolutions up to the end of 2023. The length, complexity and thematic breadth of the resolutions have also increased, prompting calls to redefine it as a quasi-legislative body.

    Under Articles 24 and 25 of the UN Charter, member states have conferred upon the UNSC the "primary responsibility for the maintenance of international peace and security" and have agreed "to accept and carry out" its decisions. The discharge of this function is carried out through the powers bestowed upon it under Chapter VI of the UN Charter, "Pacific Settlement of Disputes", Chapter VII, "Action with Respect to Threats to the Peace, Breaches of the Peace, and Acts of Aggression", Chapter VIII, "Regional Arrangements", and Chapter XII, "International Trusteeship System".

    Under the peace and security mandate, its areas of activity cover disarmament, pacific settlement of disputes, enforcement, and, until 1994, strategic areas in a trusteeship agreement. Its functions also pertain to the correct working of the United Nations, covering issues of membership, the appointment of the Secretary General, the elections of judges of the International Court of Justice (ICJ), the calling of special and emergency sessions of the General Assembly, the amendment of the Charter and of the ICJ Statute.

    Please refer to the Codebook for a detailed explanation of the dataset and instructions on how to make use of it.

    Updates

    The CR-UNSC will be updated at least once per year.

    In case of serious errors an update will be provided at the earliest opportunity and a highlighted advisory issued on the Zenodo page of the current version. Minor errors will be documented in the GitHub issue tracker and fixed with the next scheduled release.

    The CR-UNSC is versioned according to the day of the last run of the data pipeline, in the ISO format YYYY-MM-DD. Its initial release version is 2024-05-03.

    Notifications regarding new and updated data sets will be published on my academic website at www.seanfobbe.com or on the Fediverse at @seanfobbe@fediscience.org

    Changelog

    • New variant: EN_TXT_BEST containing a write-out of the English resolution texts equivalent to the CSV file text variable
    • New diagrams: bar charts of top M49 regions and sub-regions of countries mentioned in resolution texts
    • Fixed naming mix-up of BIBTEX and GRAPHML zip archives
    • Fixed whitespace character detection in citation extraction (adds ca. 10% more citations)
    • Fixed improper merging of weights in citation network
    • Fixed "cannot xtfrm data frames" warning
    • Improve REGEX detection for certain geographic entities
    • Improve Codebook (headings, citation network docs)

    Key Metrics

    Version: 2024-05-19

    Scope: UNSC Resolutions from 1 (1946) up to and including 2722 (2024)

    Tokens: 3,704,016 (English resolution texts)

    Languages: English, French, Spanish, Arabic, Chinese, Russian

    Features

    • 82 Variables
    • Resolution texts in all six official UN languages (English, French, Spanish, Arabic, Chinese, Russian)
    • Draft texts of resolutions in English
    • Meeting record texts in English
    • URLs to draft texts in all other languages (French, Spanish, Arabic, Chinese, Russian)
    • URLs to meeting record texts in all other languages (French, Spanish, Arabic, Chinese, Russian)
    • Citation data as GraphML (UNSC-to-UNSC resolutions and UNSC-to-UNGA resolutions)
    • Bibliographic database in BibTeX/OSCOLA format for e.g. Zotero, Endnote and Jabref
    • Extensive Codebook to explain the uses of the dataset
    • Compilation Report and Quality Assurance Report explain construction and validation of the data set
    • Publication quality diagrams for teaching, research and all other purposes (PDF for printing, PNG for web)
    • Open and platform independent file formats (CSV, PDF, TXT, GraphML)
    • Software version controlled with Docker
    • Publication of full data set (Open Data)
    • Publication of full source code (Open Source)
    • Data published under Public Domain waiver (CC Zero 1.0)
    • Source Code is Free Software published under the GNU General Public License Version 3 (GNU GPL v3)
    • Secure cryptographic signatures for all files in version of record (SHA2-256 and SHA3-512)
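
    The hash values mentioned in the last item can be checked with Python's standard library. The following is a minimal sketch, assuming the published SHA2-256 or SHA3-512 value is available as a hexadecimal string; how and where the authors publish those values is not stated here, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: Path, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

    Passing `algorithm="sha3_512"` covers the second hash family named in the list; a mismatch on either means the downloaded file differs from the version of record.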

    Recommended Variants

    Traditional Scholars

    ALL_PDF_Resolutions

    EN_TXT_BEST

    BIBTEX_OSCOLA

    Quantitative Scholars

    ALL_CSV_FULL

    EN_TXT_BEST

    CITATIONS_GRAPHML

    Please refer to the Codebook for details on each variant. The ZIP archives include texts in all languages, unless noted in the filename.

    We strongly recommend using the CSV files for quantitative analysis, but if you find CSV hard to use and want to analyze only the text of resolutions, the EN_TXT_BEST variant is a mix of expert-revised OCR and born digital texts equivalent to the "text" variable in the CSV file.
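
    As a starting point for quantitative work on the CSV variant, the sketch below reads the "text" variable with the standard library and counts tokens. It is an illustration under assumptions: the CSV is taken to be UTF-8 with a header row, tokenization is a naive whitespace split (so counts will not necessarily match the token figure under Key Metrics), and the Codebook remains the authority on the actual column layout.

```python
import csv
from collections import Counter
from pathlib import Path

# Long resolution texts can exceed the csv module's default field size limit.
csv.field_size_limit(10**9)


def token_counts(csv_path: Path, text_column: str = "text") -> Counter:
    """Count whitespace-separated, lower-cased tokens across one text column."""
    counts: Counter = Counter()
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts.update((row.get(text_column) or "").lower().split())
    return counts
```

    Calling `token_counts(path).most_common(20)` on the unpacked CSV file gives a quick sanity check of the vocabulary before moving to a proper NLP pipeline.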

    Compilation Report and Quality Assurance Report

    With every compilation of the full data set, an extensive Compilation Report and detailed Quality Assurance Report are created and published in PDF format.

    The Compilation Report includes the source code for the pipeline architecture, comments and explanations of design decisions, relevant computational results, exact timestamps and a table of contents with clickable internal hyperlinks to each section.

    The Quality Assurance Report contains a count of all hard tests and expectations, additional visualizations and documented test results for all soft tests that require further interpretation.

    The Compilation Report, Quality Assurance Report and Source Code are published under the following DOI: https://zenodo.org/doi/10.5281/zenodo.7319783

    Attribution and Copyright

    This data is derived from the United Nations Digital Library at https://digitallibrary.un.org. Records were accessed and downloaded on 13 and 26 March 2024, with additional work on revisions and corrections up to and including the date given as the version number.

    Pursuant to UN Administrative Instruction ST/AI/189/Add.9/Rev.2 of 17 September 1987 all official records and United Nations Documents (including resolutions, compilations of resolutions, drafts and meeting records) are in the public domain. We wish to honor the letter and spirit of this UN policy. To ensure the widest possible distribution of official UN documents and to promote the international rule of law we waive any copyright that might have accrued by creating the dataset under a Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

    Disclaimer

    This data set is an academic initiative and is not associated with or endorsed by the United Nations or any of its constituent organs and organizations.

    Author Websites

    Personal Website of Seán Fobbe

    Personal Website of Lorenzo Gasbarri

    Personal Website of Niccolò Ridi

    Contact

    Did you discover any errors? Do you have suggestions on how to improve the data set? You can either post these to the Issue Tracker on GitHub or contact Seán Fobbe via https://seanfobbe.com/contact/

  8. Z by HP Unlocked Challenge 2 - Text Analysis

    • kaggle.com
    zip
    Updated Feb 11, 2022
    Cite
    Ken Jee (2022). Z by HP Unlocked Challenge 2 - Text Analysis [Dataset]. https://www.kaggle.com/datasets/kenjee/z-by-hp-unlocked-challenge-2-text-analysis
    Available download formats: zip (97197 bytes)
    Dataset updated
    Feb 11, 2022
    Authors
    Ken Jee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Z by HP Unlocked Challenge 2

    Z by HP Unlocked Challenge 2 - Topic Modeling - Link to video & submission: https://www.hp.com/us-en/workstations/industries/data-science/unlocked-challenge.html

    The Task

    Summarize the main topics presented in the articles into the most relevant topic groups with text and visuals. This can be done using NLP tools especially Latent Dirichlet Allocation (LDA) and can provide insight into the relevant content within the articles without having to read through all of them. Be sure to share your work on your favorite platform with #DataUnlockedWithZ

    What is Unlocked?

    Unlocked is an action-packed interactive film made by Z by HP for data scientists. Sharpen your skills and solve the data driven mystery here: https://www.hp.com/us-en/workstations/industries/data-science/unlocked-challenge.html

    The Data

    The Data is pretty straightforward and consists of text files within the challenge2-articles folder. Each text file will follow the format challenge2-articleXXX.txt where X is a number. There should be 144 total articles to summarize.
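
    Loading the articles by the naming pattern described above might look like the following sketch. The folder location and the UTF-8 encoding are assumptions; the pattern itself is taken from the description, with XXX read as a run of digits.

```python
import re
from pathlib import Path

# Pattern taken from the description: challenge2-articleXXX.txt, X being digits.
ARTICLE_NAME = re.compile(r"challenge2-article(\d+)\.txt")


def load_articles(folder: Path) -> dict[int, str]:
    """Map article number to text for every file matching the naming pattern."""
    articles = {}
    for path in sorted(folder.iterdir()):
        match = ARTICLE_NAME.fullmatch(path.name)
        if match:
            articles[int(match.group(1))] = path.read_text(encoding="utf-8", errors="replace")
    return articles
```

    With the archive unpacked, `load_articles(Path("challenge2-articles"))` should return one entry per article, which can be compared against the 144 articles the description mentions before any topic model is fitted.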

    Where to Start

    Feel free to follow along with the Jupyter notebook or investigate and create your own topic model.

    LDA Models/Visualizations

    pyLDAvis

    pyLDAvis Tutorials:

    • https://neptune.ai/blog/pyldavis-topic-modelling-exploration-tool-that-every-nlp-data-scientist-should-know
    • https://nbviewer.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb

    Research Papers:

    • http://vis.stanford.edu/files/2012-Termite-AVI.pdf
    • https://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf

    Sklearn Implementation

    Inspired by: https://nbviewer.org/github/bmabey/pyLDAvis/blob/master/notebooks/sklearn.ipynb

  9. Opportunity Insights real time Economic Tracker US

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Cite
    Douglas K.G. Araujo (2025). Opportunity Insights real time Economic Tracker US [Dataset]. https://www.kaggle.com/douglaskgaraujo/opportunity-insights-real-time-economic-tracker-us
    Explore at:
    zip (191171339 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    Douglas K.G. Araujo
    Area covered
    United States
    Description

    Context

    This dataset is created as part of Opportunity Insights' Economic Tracker, which you can access directly (and with a very nice user interface) at https://www.tracktherecovery.org. The objective of this data is to allow everyone to track the state of the US economy in "real time" - in other words, the data is made available almost as soon as it comes in, and thus there is very little lag between the dataset and the current date. Another interesting feature is that the dataset is very granular, meaning that it can inform you about the economy even at the county or city level, and in some cases with breakdowns by economic sector or by income bracket.

    Content

    In short, the data originates from big companies that offer payment services, job web portals, etc. These companies share their usage data with Opportunity Insights (in a way that protects user privacy, as you can read in detail on OI's website), and OI in turn matches that data with what would correspond to official statistics. Thus, the dataset can reasonably be interpreted as a sort of tracker for the economy, even though OI does not draw on the same multitude of sources as the official statistical agencies. Further information about the content, including the specific files, can be found at the README page below.

    Acknowledgements

    This dataset is fully attributed to Opportunity Insights, and all credit is theirs. If you use it, please make sure to cite it adequately by pointing to their website (as above) and to the following paper: "How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data", by Raj Chetty, John Friedman, Nathaniel Hendren, Michael Stepner, and the Opportunity Insights Team. June 2020. Available at: https://opportunityinsights.org/wp-content/uploads/2020/05/tracker_paper.pdf

  10. Saudi Arabia IT Market Analysis, Size, and Forecast 2025-2029

    • technavio.com
    pdf
    Updated Jan 23, 2025
    Cite
    Technavio (2025). Saudi Arabia IT Market Analysis, Size, and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/it-market-size-in-saudi-arabia-industry-size-analysis
    Explore at:
    pdf
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Saudi Arabia
    Description

    Saudi Arabia IT Market Size 2025-2029

    The Saudi Arabia IT market size is forecast to increase by USD 5.6 billion, at a CAGR of 6.9% from 2024 to 2029. The launch of the National Transformation Program will drive the Saudi Arabia IT market.

    Major Market Trends & Insights

    By Component - Hardware segment was valued at USD 5.81 billion in 2022
    By End-user - Government segment accounted for the largest market revenue share in 2022
    

    Market Size & Forecast

    Market Opportunities: USD 76.74 billion
    Market Future Opportunities: USD 5.60 billion
    CAGR from 2024 to 2029: 6.9%
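
    The relationship between the quoted growth rate and the quoted increase can be illustrated with a short calculation. This is only a sketch under stated assumptions: it treats the USD 5.6 billion figure as the absolute increase between 2024 and 2029 and assumes five annual compounding periods. The implied base and end values it prints are derived for illustration and are not figures from the report; the function names are hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


def implied_base(increment: float, cagr: float, years: int) -> float:
    """Starting value for which compounding at `cagr` adds `increment` over `years`."""
    return increment / (growth_multiple(cagr, years) - 1.0)


# Figures quoted in the summary; the 5-year horizon (2024 -> 2029) is an assumption.
CAGR = 0.069
YEARS = 5
INCREMENT_USD_BN = 5.6

multiple = growth_multiple(CAGR, YEARS)
base = implied_base(INCREMENT_USD_BN, CAGR, YEARS)

print(f"growth multiple over {YEARS} years: {multiple:.3f}")
print(f"implied 2024 base (illustrative only): USD {base:.2f} billion")
print(f"implied 2029 value (illustrative only): USD {base + INCREMENT_USD_BN:.2f} billion")
```

    A 6.9% CAGR compounds to a growth multiple of roughly 1.40 over five years, which is why a fixed increase implies a specific starting size under these assumptions.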
    

    Market Summary

    The market is a dynamic and evolving landscape, characterized by the adoption of core technologies and applications such as cloud computing, artificial intelligence, and the Internet of Things (IoT). According to recent reports, cloud computing is expected to dominate the market, with a projected 30% market share by 2025. This growth is driven by the Saudi Arabian government's national transformation program, which aims to digitize various sectors and enhance public services through e-governance. However, the market faces challenges such as the increasing threat of cyber crimes and the need for regulatory compliance. Despite these hurdles, opportunities abound, including the growing demand for IT services in sectors like healthcare, finance, and education. The IT industry in Saudi Arabia is poised for significant growth, offering promising prospects for service providers and technology companies.

    What will be the Size of the Saudi Arabia IT Market during the forecast period?

    How is the Saudi Arabia IT market segmented, and what are the key trends of market segmentation?

    The Saudi Arabia IT industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023, for the following segments.

    Component: Hardware, Software, Services
    End-user: Government, IT and Telecommunication, BFSI, Oil and gas, Retail and E-commerce, Others
    Deployment Type: On-Premises, Cloud-Based, Hybrid
    Geography

    By Component Insights

    The hardware segment is estimated to witness significant growth during the forecast period.

    In Saudi Arabia, the IT market is undergoing continuous evolution as the country embraces advanced technologies for business growth. Companies are investing in robust IT infrastructure, including cybersecurity infrastructure, network security protocols, and IT infrastructure management, to ensure data security and efficiency. The digital transformation initiatives have led to the adoption of customer relationship management and compliance management systems, enabling better business intelligence and data warehousing solutions. The mobile application development sector is thriving, with an increasing number of enterprises adopting enterprise mobility for enhanced productivity. The software development lifecycle, including software testing methodologies and application performance monitoring, is being optimized through network optimization techniques and system integration services. Furthermore, cloud computing services, including business continuity planning and risk management frameworks, are essential for business resilience. Machine learning algorithms and artificial intelligence applications are also gaining popularity, enhancing operational efficiency and driving innovation. These trends reflect the dynamic nature of the Saudi Arabian IT market, offering significant opportunities for businesses to streamline their workflows and stay competitive.

    The Hardware segment was valued at USD 5.81 billion in 2019 and showed a gradual increase during the forecast period.

    Market Dynamics

    Our researchers analyzed the data with 2024 as the base year, along with the key drivers, trends, and challenges. A holistic analysis of drivers will help companies refine their marketing strategies to gain a competitive advantage.

    The market is experiencing robust growth, driven by the implementation of cloud-based solutions and the design and deployment of secure networks. Companies are focusing on the development of scalable web applications and the integration of enterprise resource planning systems to streamline operations and enhance productivity. The management of IT infrastructure services and optimization of database performance are also key priorities, as businesses seek to improve IT service management and cybersecurity measures. Moreover, the adoption of agile development and DevOps practices, along with the application of machine learning models and analysis of big data analytics, is gaining traction. The development of mobile applications and implementation of digital transformation strategies are also significan

  11. Datasheet4_Social media and internet search data to inform drug utilization:...

    • figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Roman Keller; Alessandra Spanu; Milo Alan Puhan; Antoine Flahault; Christian Lovis; Margot Mütsch; Raphaelle Beau-Lejdstrom (2023). Datasheet4_Social media and internet search data to inform drug utilization: A systematic scoping review.pdf [Dataset]. http://doi.org/10.3389/fdgth.2023.1074961.s004
    Explore at:
    pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Roman Keller; Alessandra Spanu; Milo Alan Puhan; Antoine Flahault; Christian Lovis; Margot Mütsch; Raphaelle Beau-Lejdstrom
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Drug utilization is currently assessed through traditional data sources such as big electronic medical records (EMRs) databases, surveys, and medication sales. Social media and internet data have been reported to provide more accessible and more timely access to medications' utilization.

    Objective: This review aims at providing evidence comparing web data on drug utilization to other sources before the COVID-19 pandemic.

    Methods: We searched Medline, EMBASE, Web of Science, and Scopus until November 25th, 2019, using a predefined search strategy. Two independent reviewers conducted screening and data extraction.

    Results: Of 6,563 (64%) deduplicated publications retrieved, 14 (0.2%) were included. All studies showed positive associations between drug utilization information from web and comparison data using very different methods. A total of nine (64%) studies found positive linear correlations in drug utilization between web and comparison data. Five studies reported association using other methods: One study reported similar drug popularity rankings using both data sources. Two studies developed prediction models for future drug consumption, including both web and comparison data, and two studies conducted ecological analyses but did not quantitatively compare data sources. According to the STROBE, RECORD, and RECORD-PE checklists, overall reporting quality was mediocre. Many items were left blank as they were out of scope for the type of study investigated.

    Conclusion: Our results demonstrate the potential of web data for assessing drug utilization, although the field is still in a nascent period of investigation. Ultimately, social media and internet search data could be used to get a quick preliminary quantification of drug use in real time. Additional studies on the topic should use more standardized methodologies on different sets of drugs in order to confirm these findings. In addition, currently available checklists for study quality of reporting would need to be adapted to these new sources of scientific information.

  12. CEDAR Overview BD2K 2016.pdf

    • figshare.com
    pdf
    Updated Aug 5, 2023
    Cite
    Debra Willrett; John Campbell; Kei-Hoi Cheung; Michel Dumontier; Kim A. Durante; Attila L. Egyedi; Olivier Gevaert; Rafael Gonçalves; Alejandra Gonzales-Bertran; John Graybeal; Purvesh Khatri; Steven H. Kleinstein; Mark Musen; Csongor I. Nyulas; Maryam Panahiazar; Philippe Rocca-Serra; Marcos Martínez-Romero; Susanna-Assunta Sansone; Ravi D. Shankar; Martin J. O'Connor (2023). CEDAR Overview BD2K 2016.pdf [Dataset]. http://doi.org/10.6084/m9.figshare.4240241.v2
    Explore at:
    pdf
    Dataset updated
    Aug 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Debra Willrett; John Campbell; Kei-Hoi Cheung; Michel Dumontier; Kim A. Durante; Attila L. Egyedi; Olivier Gevaert; Rafael Gonçalves; Alejandra Gonzales-Bertran; John Graybeal; Purvesh Khatri; Steven H. Kleinstein; Mark Musen; Csongor I. Nyulas; Maryam Panahiazar; Philippe Rocca-Serra; Marcos Martínez-Romero; Susanna-Assunta Sansone; Ravi D. Shankar; Martin J. O'Connor
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Online biomedical repositories contain a wealth of freely available data submitted by the research community, but reusing this data in further studies requires well-annotated associated metadata. There is a growing set of community-developed standards for creating this metadata, often in the form of templates; still, the difficulties of working with these standards are significant. The Center for Expanded Data Annotation and Retrieval (CEDAR) is building an end-to-end system to ease the authoring of metadata and the templates. This system targets the creation of higher quality metadata to facilitate data discovery, interoperability, and reuse. With our public release in September 2016, we now support many new features which make authoring easier.

    Template and Metadata Repository: We developed a standardized representation of metadata and the templates that describe them, together with Web-based services to store, search, and share these resources. Templates created using CEDAR technology are stored in our openly accessible community repository, and can now be shared with other people and groups. Researchers can search for templates to annotate their studies, and share their metadata with others. We’ve now added Web-based interfaces and REST APIs to facilitate access to templates, and all the metadata collected using those templates.

    Template and Metadata Editor: We developed highly interactive Web-based tools to simplify the process of authoring metadata and templates. The Template Editor allows users to create, search, and author templates. An upgraded feature provides interoperation with ontologies: interactive look-up services linked to NCBO’s BioPortal (bioportal.bioontology.org) let template authors find ontology terms to annotate fields in their templates and to define possible values of fields, including creating new terms and value sets. The Metadata Editor, which creates a forms-based acquisition interface from a template, has been redesigned so users can more easily populate metadata based on the template fields.

  13. TED Talk Transcripts (2006 - 2021)

    • kaggle.com
    zip
    Updated Jan 8, 2022
    Cite
    Ramshankar Yadhunath (2022). TED Talk Transcripts (2006 - 2021) [Dataset]. https://www.kaggle.com/datasets/thedatabeast/ted-talk-transcripts-2006-2021
    Explore at:
    zip (16927003 bytes)
    Dataset updated
    Jan 8, 2022
    Authors
    Ramshankar Yadhunath
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset was created as part of a Quantitative Text Analysis project I had to complete during my Master's degree. I wanted to explore the content of TED talks (being a big TED fan myself). There were datasets available at the time, but most of them were outdated. Also, I wanted to try my hand at web scraping! So, this data took shape 😄

    Content

    The complete details of the data acquisition process are available here: https://deepnote.com/@ramshankar-yadhunath/Scraping-TED-fRqC4ebhTRaNrtcOSrIXMQ. I would highly recommend having a read if you are interested in using web scraping as a data acquisition technique.

    Acknowledgements

    A big shoutout to the work by @rounakbanik with https://www.kaggle.com/rounakbanik/ted-talks. That was my starting point. Also, Vishal Gupta's https://github.com/The-Gupta/TED-Scraper is a useful resource.

  14. BigBasket Entire Product List (~28K datapoints)

    • kaggle.com
    zip
    Updated Jun 22, 2022
    Cite
    SJ (2022). BigBasket Entire Product List (~28K datapoints) [Dataset]. https://www.kaggle.com/datasets/surajjha101/bigbasket-entire-product-list-28k-datapoints
    Explore at:
    zip (6336602 bytes)
    Dataset updated
    Jun 22, 2022
    Authors
    SJ
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    E-commerce (electronic commerce) is the activity of buying or selling products electronically through online services or over the Internet. E-commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection systems. E-commerce is in turn driven by the technological advances of the semiconductor industry, and is the largest sector of the electronics industry.

    BigBasket is the largest online grocery supermarket in India. It was launched around 2011 and has been expanding its business since then. Although new competitors such as Blinkit have gained a foothold in the country, BigBasket has not lost ground, thanks to its ever-expanding customer base and shoppers' shift to online buying.
