76 datasets found
  1. Data from: Big Data Machine Learning Benchmark on Spark

    • ieee-dataport.org
    Updated Jun 6, 2019
    Jairson Rodrigues (2019). Big Data Machine Learning Benchmark on Spark [Dataset]. https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
    Dataset updated
    Jun 6, 2019
    Authors
    Jairson Rodrigues
    Description

    net traffic

  2. Innovating the Data Ecosystem: An Update of the Federal Big Data Research...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated May 14, 2025
    NCO NITRD (2025). Innovating the Data Ecosystem: An Update of the Federal Big Data Research and Development Strategic Plan [Dataset]. https://catalog.data.gov/dataset/innovating-the-data-ecosystem-an-update-of-the-federal-big-data-research-and-development-s
    Dataset updated
    May 14, 2025
    Dataset provided by
    NCO NITRD
    Description

    This document, Innovating the Data Ecosystem: An Update of the Federal Big Data Research and Development Strategic Plan, updates the 2016 Federal Big Data Research and Development Strategic Plan. It revisits the vision and the research and development strategies for big data laid out in the 2016 plan through six strategy areas (enhance the reusability and integrity of data; enable innovative, user-driven data science; develop and enhance the robustness of the federated ecosystem; prioritize privacy, ethics, and security; develop necessary expertise and diverse talent; and enhance U.S. leadership in the international context), with the aims of increasing the value and reusability of data and improving responsiveness to federal policies on data sharing and management.

  3. Big Data and Business Analytics Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Dataintelo (2024). Big Data and Business Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/big-data-and-business-analytics-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Big Data and Business Analytics Market Outlook

    In 2023, the global Big Data and Business Analytics market was estimated at approximately $274 billion; with a projected compound annual growth rate (CAGR) of 12.4%, it is expected to reach around $693 billion by 2032. This growth is driven by rising demand for data-driven decision-making across industries, which use insights from large data sets to improve business efficiency, optimize operations, and drive innovation. The increasing adoption of Internet of Things (IoT) devices, together with the exponential growth of data generated daily, further increases the need for advanced analytics solutions that can process and interpret this information effectively.
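    The growth figures above follow the standard compound-growth relation, end value = start value × (1 + CAGR)^years. The Python sketch below (function names are our own) recovers the rate implied by the quoted $274 billion and $693 billion figures. Under this formula the two figures imply roughly 12.3% a year over eight periods (2024 to 2032, the report's stated coverage) but roughly 10.9% over nine periods starting from 2023, so the quoted 12.4% appears to be measured over the forecast window; that reading is our inference, not a statement from the report.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `periods` years."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value: float, rate: float, periods: int) -> float:
    """Value reached by compounding start_value at `rate` for `periods` years."""
    return start_value * (1.0 + rate) ** periods


if __name__ == "__main__":
    base, target = 274.0, 693.0  # market size in USD billions, as quoted in the report
    for periods in (8, 9):
        print(f"{periods} periods: implied CAGR {cagr(base, target, periods):.1%}")
    print(f"274 compounded at 12.4% for 8 periods: {project(base, 0.124, 8):.0f}")
```

    The same two helpers can be used to sanity-check any start value, end value, and rate quoted together in a market summary.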

    A critical growth factor in the Big Data and Business Analytics market is the increasing reliance on data to gain a competitive edge. Organizations are now more than ever looking to uncover hidden patterns, correlations, and insights from the data they collect to make informed decisions. This trend is especially prominent in industries such as retail, where understanding consumer behavior can lead to personalized marketing strategies, and in healthcare, where data analytics can improve patient outcomes through precision medicine. Moreover, the integration of big data analytics with artificial intelligence and machine learning technologies is enabling more accurate predictions and real-time decision-making, further enhancing the value proposition of these analytics solutions.

    Another key driver of market growth is continuous technological innovation in data analytics tools and platforms. Companies are increasingly investing in advanced analytics capabilities, such as predictive, prescriptive, and real-time analytics, to gain deeper insight into their operations and market environments. User-friendly, self-service analytics tools are also democratizing data access within organizations, allowing employees at all levels to use data in their day-to-day decisions. This democratization reduces reliance on specialized data scientists and thereby accelerates the adoption of big data analytics across business functions.

    The increasing emphasis on regulatory compliance and data privacy is also driving growth in the Big Data and Business Analytics market. Strict regulations, such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, require organizations to manage and analyze data responsibly. This is prompting businesses to invest in robust analytics solutions that not only help them comply with these regulations but also ensure data integrity and security. Additionally, as data breaches and cybersecurity threats continue to rise, organizations are turning to analytics solutions to identify potential vulnerabilities and mitigate risks effectively.

    Regionally, North America remains a dominant player in the Big Data and Business Analytics market, benefiting from the presence of major technology companies and a high rate of digital adoption. The Asia Pacific region, however, is emerging as a significant growth area, driven by rapid industrialization, urbanization, and increasing investments in digital transformation initiatives. Europe also showcases a robust market, fueled by stringent data protection regulations and a strong focus on innovation. Meanwhile, the markets in Latin America and the Middle East & Africa are gradually gaining momentum as organizations in these regions are increasingly recognizing the value of data analytics in enhancing business outcomes and driving economic growth.

    Component Analysis

    The Big Data and Business Analytics market is segmented by components into software, services, and hardware, each playing a crucial role in the ecosystem. Software components, which include data management and analytics tools, are at the forefront, offering solutions that facilitate the collection, analysis, and visualization of large data sets. The software segment is driven by a demand for scalable solutions that can handle the increasing volume, velocity, and variety of data. As organizations strive to become more data-centric, there is a growing need for advanced analytics software that can provide actionable insights from complex data sets, leading to enhanced decision-making capabilities.

    In the services segment, businesses are increasingly seeking consultation, implementation, and support services to effective

  4. Data from: USHAP: Big Data Seamless 1 km Ground-level PM2.5 Dataset for the...

    • iro.uiowa.edu
    • data.niaid.nih.gov
    Updated May 1, 2023
    + more versions
    Jing Wei; Jun Wang; Zhanqing Li (2023). USHAP: Big Data Seamless 1 km Ground-level PM2.5 Dataset for the United States [Dataset]. https://iro.uiowa.edu/esploro/outputs/dataset/USHAP-Big-Data-Seamless-1-km/9984702835302771
    Dataset updated
    May 1, 2023
    Dataset provided by
    Zenodo
    Authors
    Jing Wei; Jun Wang; Zhanqing Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2023
    Area covered
    United States
    Description

    USHAP (USHighAirPollutants) is one of a series of long-term, full-coverage, high-resolution, and high-quality datasets of ground-level air pollutants for the United States. It is generated from big data (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution. This is the big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) ground-level PM2.5 dataset for the United States from 2000 to 2020. Our daily PM2.5 estimates agree well with ground measurements, with an average cross-validation coefficient of determination (CV-R2) of 0.82 and a normalized root-mean-square error (NRMSE) of 0.40. All the data will be made public online once our paper is accepted; if you want to use the USHighPM2.5 dataset for related scientific research, please contact us (Email: weijing_rs@163.com; weijing@umd.edu).

    Wei, J., Wang, J., Li, Z., Kondragunta, S., Anenberg, S., Wang, Y., Zhang, H., Diner, D., Hand, J., Lyapustin, A., Kahn, R., Colarco, P., da Silva, A., and Ichoku, C. Long-term mortality burden trends attributed to black carbon and PM2.5 from wildfire emissions across the continental USA from 2000 to 2020: a deep learning modelling study. The Lancet Planetary Health, 2023, 7, e963–e975. https://doi.org/10.1016/S2542-5196(23)00235-8

    More air quality datasets of different air pollutants can be found at: https://weijing-rs.github.io/product.html
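    The accuracy figures above (CV-R2 and NRMSE) are standard regression metrics. As a minimal sketch in Python, assuming R2 is computed as 1 - SS_res/SS_tot and that NRMSE is the RMSE divided by the mean of the observations (the description does not say how the error is normalized), the metrics for one set of held-out predictions can be computed as below. In a cross-validated setting they would be evaluated on the predictions for the withheld folds; the fold splitting itself is not shown, and the sample values are made up.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the data (e.g., µg m-3)."""
    _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean observation (one common convention, assumed here)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


if __name__ == "__main__":
    obs = [10.0, 20.0, 30.0, 40.0]   # hypothetical ground-station PM2.5 values
    pred = [12.0, 18.0, 33.0, 39.0]  # hypothetical model estimates
    print(f"R2={r2(obs, pred):.3f} RMSE={rmse(obs, pred):.2f} NRMSE={nrmse(obs, pred):.3f}")
```

    Some studies instead report R2 as the squared Pearson correlation between observations and predictions, which can differ from the definition used here when predictions are biased.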

  5. Data from: Large Marine Ecosystems Status and Trends - Summary for Policy...

    • kiribati-data.sprep.org
    • americansamoa-data.nocache.eightyoptions.com.au
    • +13 more
    pdf
    Updated Feb 20, 2025
    Secretariat of the Pacific Regional Environment Programme (2025). Large Marine Ecosystems Status and Trends - Summary for Policy Makers [Dataset]. https://kiribati-data.sprep.org/dataset/large-marine-ecosystems-status-and-trends-summary-policy-makers
    Available download formats: pdf (17413777)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Worldwide, -194.06576156616 -33.784560184557, POLYGON ((-217.76367902756 -1.2962868457321, -222.451171875 -2.0771946573915)), -133.38867902756 17.438389936736, -109.16994094849 -13.917874207681
    Description

    The water systems of the world (aquifers, lakes, rivers, large marine ecosystems, and the open ocean) sustain the biosphere and underpin the health and socioeconomic well-being of the world’s population. Many of these systems are shared by two or more nations. Recognizing the value of trans-boundary water systems, and the reality that many of them continue to be over-exploited, degraded, and managed in fragmented ways, the Global Environment Facility (GEF) initiated the Trans-boundary Waters Assessment Programme (TWAP). The Programme aims to provide a baseline assessment to identify and evaluate changes in these water systems caused by human activities and natural processes, as well as the consequences these changes may have for the human populations that depend on them.

  6. Big Bend National Park Tract and Boundary Data

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Jun 4, 2024
    + more versions
    National Park Service (2024). Big Bend National Park Tract and Boundary Data [Dataset]. https://catalog.data.gov/dataset/big-bend-national-park-tract-and-boundary-data
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    National Park Service (http://www.nps.gov/)
    Description

    These ESRI shapefiles contain National Park Service tract and boundary data created by the Land Resources Division. Tracts are numbered and created by the regional cartographic staff at the Land Resources Program Centers and are associated with the Land Status Maps. The data should be used to display properties that NPS owns and properties in which NPS holds some type of interest, such as scenic easements or rights-of-way.

  7. The Federal Big Data Research and Development Strategic Plan

    • datasets.ai
    • s.cnmilf.com
    • +2 more
    33
    Updated Aug 9, 2024
    Networking and Information Technology Research and Development, Executive Office of the President (2024). The Federal Big Data Research and Development Strategic Plan [Dataset]. https://datasets.ai/datasets/the-federal-big-data-research-and-development-strategic-plan
    Available download formats: 33
    Dataset updated
    Aug 9, 2024
    Authors
    Networking and Information Technology Research and Development, Executive Office of the President
    Description

    Summary: This Plan is an important milestone in the Administration's Big Data Research and Development (R&D) Initiative.

  8. GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over...

    • zenodo.org
    nc, pdf, zip
    Updated May 23, 2025
    + more versions
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu (2025). GlobalHighPM₂.₅: Global Daily Seamless 1 km Ground-Level PM₂.₅ Dataset over Land (2017–Present) [Dataset]. http://doi.org/10.5281/zenodo.10800980
    Available download formats: nc, zip, pdf
    Dataset updated
    May 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jing Wei; Zhanqing Li; Alexei Lyapustin; Jun Wang; Oleg Dubovik; Joel Schwartz; Lin Sun; Chi Li; Song Liu; Tong Zhu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 11, 2022
    Description

    GlobalHighPM2.5 is part of a series of long-term, seamless, global, high-resolution, and high-quality datasets of air pollutants over land (i.e., GlobalHighAirPollutants, GHAP). It is generated from big data sources (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, taking into account the spatiotemporal heterogeneity of air pollution.

    This dataset contains the input data, analysis code, and generated dataset used in the following article. If you use the GlobalHighPM2.5 dataset in your scientific research, please cite the following reference (Wei et al., NC, 2023):

    Input Data

    Relevant raw data for each figure (compiled into a single sheet within an Excel document) in the manuscript.

    Code

    Python scripts for replicating and plotting the analysis results in the manuscript, as well as code for converting data formats.

    Generated Dataset

    This is the first big data-derived seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) global ground-level PM2.5 dataset over land from 2017 to the present. The dataset is of high quality, with cross-validation coefficients of determination (CV-R2) of 0.91, 0.97, and 0.98 and root-mean-square errors (RMSEs) of 9.20, 4.15, and 2.77 µg m⁻³ at the daily, monthly, and annual scales, respectively.

    Due to data volume limitations, all data for each year (including daily data) are accessible at:

    • 2022: GlobalHighPM2.5 (2022)
    • 2021: GlobalHighPM2.5 (2021)
    • 2020: GlobalHighPM2.5 (2020)
    • 2019: GlobalHighPM2.5 (2019)
    • 2018: GlobalHighPM2.5 (2018)
    • 2017: GlobalHighPM2.5 (2017)

    Continuously updated.

    More GHAP datasets for different air pollutants are available at: https://weijing-rs.github.io/product.html

  9. Big Data Healthcare Market Size, Outlook, Trends & Global Report 2030

    • mordorintelligence.com
    pdf, excel, csv, ppt
    Updated Jun 30, 2025
    Mordor Intelligence (2025). Big Data Healthcare Market Size, Outlook, Trends & Global Report 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/big-data-healthcare
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Big Data in Healthcare Market report is segmented by component (software, services), deployment (on-premise, cloud), analytics type (descriptive, predictive, and prescriptive analytics), application (financial analytics and more), end user (healthcare providers and more), and geography (North America, Europe, Asia-Pacific, and more). Market forecasts are provided in terms of value (USD).

  10. Data from: Big data em saúde do trabalhador: o quão distantes estamos? (Big data in workers' health: how far are we?)

    • data.scielo.org
    txt, xlsx
    Updated Sep 29, 2024
    Thales Fagundes (2024). Big data em saúde do trabalhador: o quão distantes estamos? [Dataset]. http://doi.org/10.48331/SCIELODATA.7FNOXU
    Available download formats: xlsx (31713), txt (2125)
    Dataset updated
    Sep 29, 2024
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Thales Fagundes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: To identify strategies and challenges in the use of big data and artificial intelligence (AI) in occupational health, as well as practices and obstacles in their implementation.

    Methods: A scoping review using terms related to occupational health, big data, and AI in four databases (MEDLINE, EMBASE, BVS, and SciELO), covering articles in Portuguese, Spanish, and English published up to 2022. Studies that used large databases and AI for analyses related to occupational health were included. Articles were selected independently by two researchers, with conflicts resolved by consensus.

    Results: Of 505 articles identified, 16 were selected. The low number may be associated with the scarcity of data that address workers' health systemically, taking into account demographic, technological, socioeconomic, and environmental factors. The selected studies showed that big data and AI have good potential to support occupational health by identifying health indicators and enabling accurate predictions. Implementation faces challenges such as data storage and ethical issues.

    Conclusion: Big data and AI can be useful tools for analyzing complex interactions among variables, with the aim of improving the identification of health determinants and of record data on work environments and the individuals exposed to them.

    Keywords: big data; occupational diseases; artificial intelligence; machine learning; algorithms; workers' health.

  11. Louisville Metro KY - Environmental Health Bulk Data - Inspections

    • catalog.data.gov
    • datasets.ai
    • +4 more
    Updated Apr 13, 2023
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Environmental Health Bulk Data - Inspections [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-environmental-health-bulk-data-inspections-f462a
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    Attributes of routine and complaint-driven inspections performed by the Louisville Metro Department of Public Health and Wellness. The EstablishmentID column can be joined to the EstablishmentID column in the Establishments Table to show the attributes of the establishment when a regulated establishment is involved.

    Data dictionary:
    • InspectionID: system ID
    • EstablishmentID: permit number
    • RequestID: associated request or complaint ID, if applicable
    • EHSNumber: ID number of the inspector
    • CountyID: system ID
    • InspectionDate: date of the inspection
    • InspectionType: type of inspection
    • IsFollowUpInsp: whether the inspection is a follow-up inspection
    • R_F_InspIDDesc: designation of food or retail type for certain food establishments that require two types of inspections
    • req_section: not used
    • GradeID: letter grade for the inspection, if applicable
    • score: inspection score
    • InspTimeHours
    • InspTimeMins
    • sample_attendance: not used
    • SampleAttendance: not used
    • NextInspDate
    • ACTION_CODE_DESC: system code of the action taken (text)
    • IsComplaintResolved: if complaint-related, whether the complaint is resolved

    Contact: Gerald Kaforski (gerald.kaforski@louisvilleky.gov)
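    The data dictionary notes that EstablishmentID joins to the Establishments Table. A minimal Python sketch of that join, using only the standard library, follows. It assumes both tables have been exported to CSV; the sample rows, the permit-number format, and the EstablishmentName column are hypothetical, since the Establishments Table schema is not described here. A left join keeps inspections that involve no regulated establishment, which is why unmatched rows get None.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical CSV extracts. InspectionID, EstablishmentID, InspectionDate, and score
# come from the data dictionary; EstablishmentName is an assumed Establishments column.
INSPECTIONS_CSV = """InspectionID,EstablishmentID,InspectionDate,score
1001,E-17,2023-01-05,95
1002,E-42,2023-01-06,88
1003,,2023-01-07,
"""
ESTABLISHMENTS_CSV = """EstablishmentID,EstablishmentName
E-17,Example Diner
E-42,Sample Market
"""


def read_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(inspections: List[Dict[str, str]],
              establishments: List[Dict[str, str]],
              key: str = "EstablishmentID") -> List[Dict[str, Optional[str]]]:
    """Attach establishment attributes to each inspection, keeping unmatched inspections."""
    by_id = {row[key]: row for row in establishments}
    extra_cols = [c for c in (establishments[0] if establishments else {}) if c != key]
    joined: List[Dict[str, Optional[str]]] = []
    for inspection in inspections:
        match = by_id.get(inspection[key], {})  # empty when no regulated establishment is involved
        merged: Dict[str, Optional[str]] = dict(inspection)
        merged.update({col: match.get(col) for col in extra_cols})
        joined.append(merged)
    return joined


if __name__ == "__main__":
    rows = left_join(read_csv(INSPECTIONS_CSV), read_csv(ESTABLISHMENTS_CSV))
    for row in rows:
        print(row["InspectionID"], row["EstablishmentID"] or "-", row["EstablishmentName"])
```

    Building a dictionary keyed on EstablishmentID keeps the join linear in the number of rows; for very large exports a database or dataframe library would be the more practical choice.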

  12. Data from: A Toolbox for Surfacing Health Equity Harms and Biases in Large...

    • springernature.figshare.com
    application/csv
    Updated Sep 24, 2024
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal (2024). A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models [Dataset]. http://doi.org/10.6084/m9.figshare.26133973.v1
    Available download formats: application/csv
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary material and data for Pfohl and Cole-Lewis et al., "A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models" (2024).

    We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.

    We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al. [3].

    • Mixed MMQA-OMAQ is composed of the 140-question subset of MultiMedQA questions described in [1,2], with an additional 100 questions from OMAQ (described below). The 140 MultiMedQA questions comprise 100 from HealthSearchQA, 20 from LiveQA [4], and 20 from MedicationQA [5]. In the data presented here, we do not reproduce the text of the questions from LiveQA and MedicationQA. For LiveQA, we instead use identifiers that correspond to those in the original dataset. For MedicationQA, we use "MedicationQA_N" to refer to the N-th row of MedicationQA (0-indexed).
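    The MedicationQA_N convention maps directly to a row position in the original MedicationQA file. A small Python helper (the function name is ours, not part of the release) that turns such an identifier into a 0-based row index, and returns None for identifiers that follow another scheme, such as the LiveQA ones, might look like this:

```python
import re
from typing import Optional

# "MedicationQA_N" designates the N-th row (0-indexed) of the original MedicationQA data.
_MEDICATIONQA_ID = re.compile(r"MedicationQA_(\d+)")


def medicationqa_row_index(identifier: str) -> Optional[int]:
    """Return the 0-based MedicationQA row index, or None if the identifier uses another scheme."""
    match = _MEDICATIONQA_ID.fullmatch(identifier.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    medicationqa_rows = ["first question", "second question"]  # stand-in for the original rows
    idx = medicationqa_row_index("MedicationQA_1")
    if idx is not None:
        print(medicationqa_rows[idx])
```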

    A limited number of data elements described in the paper are not included here. The following elements are excluded:

    1. The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.

    2. The free-text comments written by raters during the ratings process.

    3. Demographic information associated with the consumer raters (only age group information is included).

    References

    1. Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).

    2. Singhal, K., Azizi, S., Tu, T. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2

    3. Omiye, J.A., Lester, J.C., Spichak, S. et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z

    4. Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.

    5. Abacha, Asma Ben, et al. "Bridging the gap between consumers’ medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25-29.

    Description of files and sheets

    1. Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Ratings were missing for five instances in Mixed MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.

    2. Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded as two binary columns indicating which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as in ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FBRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.

    3. Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated; instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.

    4. Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].

    5. Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.

    6. Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.

    7. Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.

    8. Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.

    9. TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.

    10. Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.

    11. Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.

    12. HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].

    13. Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.

    14. Omiye et al. [other_datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
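    Because bias_presence in ratings_independent.csv takes one of three documented values, a per-dataset tally is a natural first summary. The Python sketch below uses the standard csv module; the name of the column that identifies the source dataset (assumed here to be dataset) and the sample rows are hypothetical, so check the actual header before use. Mixed MMQA-OMAQ items are triple-rated per rater group, so these counts are per rating, not per question.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping

BIAS_LEVELS = ("No bias", "Minor bias", "Severe bias")  # values documented for bias_presence


def tally_bias(rows: Iterable[Mapping[str, str]], group_col: str = "dataset") -> Dict[str, Counter]:
    """Count bias_presence labels per group; group_col is an assumed column name."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        label = row["bias_presence"].strip()
        if label not in BIAS_LEVELS:
            raise ValueError(f"unexpected bias_presence value: {label!r}")
        counts[row[group_col]][label] += 1
    return counts


def any_bias_rate(counter: Counter) -> float:
    """Share of ratings marked Minor bias or Severe bias."""
    total = sum(counter.values())
    return (total - counter["No bias"]) / total if total else 0.0


# Hypothetical rows; the real file has more columns (e.g., inaccuracy_for_some_axes).
SAMPLE_CSV = """dataset,bias_presence
OMAQ,No bias
OMAQ,Minor bias
EHAI,Severe bias
EHAI,No bias
EHAI,No bias
"""

if __name__ == "__main__":
    counts = tally_bias(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for name, counter in sorted(counts.items()):
        print(name, dict(counter), f"any bias: {any_bias_rate(counter):.0%}")
```

    Raising on unexpected labels, rather than silently counting them, makes encoding differences in the real file visible immediately.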

    Version history

    Version 2: Updated to include ratings and generated model outputs; dataset files were updated to include unique IDs associated with each question.

    Version 1: Contained datasets of questions without ratings. Consistent with v1 available as a preprint on arXiv (https://arxiv.org/abs/2403.12025).

    WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.

    NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.

  13. Data for: A Prioritization-based Analysis of Open Data Portals: The Case...

    • data.mendeley.com
    Updated Oct 16, 2018
    Di Wang (2018). Data for: A Prioritization-based Analysis of Open Data Portals: The Case study of Chinese Local Governments [Dataset]. http://doi.org/10.17632/ykdbpdmspy.1
    Dataset updated
    Oct 16, 2018
    Authors
    Di Wang
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Area covered
    China
    Description

    We used the Analytic Hierarchy Process (AHP) to derive the priorities of all factors in the evaluation framework for open government data (OGD) portals. The results of the AHP process are shown in the uploaded PDF file. We collected 2635 open government datasets in 15 subject categories (local statistics, health, education, cultural activity, transportation, map, public safety, policies and legislation, weather, environment quality, registration, credit records, international trade, budget and spend, and government bid) from 9 OGD portals in China (Beijing, Zhejiang, Shanghai, Guangdong, Guizhou, Sichuan, Xinjiang, Hong Kong and Taiwan). These datasets were used to evaluate the portals in our study. Records of the quality and open-access status of these datasets can be found in the uploaded Excel file.
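
    As a rough illustration of how AHP turns pairwise judgments into factor priorities, the sketch below approximates the principal eigenvector of a reciprocal comparison matrix by power iteration and reports Saaty's consistency index. The comparison matrix and the function names are hypothetical; the study's actual judgments and resulting priorities are in the uploaded PDF file, and the authors may have used a different derivation (for example, the geometric-mean approximation).

```python
from typing import List

Matrix = List[List[float]]


def ahp_priorities(matrix: Matrix, iterations: int = 100) -> List[float]:
    """Approximate AHP priority weights as the principal eigenvector of a
    reciprocal pairwise comparison matrix (power iteration, sum-normalised)."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the current weight vector by the comparison matrix.
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        # Renormalise so the priorities sum to 1.
        weights = [x / total for x in product]
    return weights


def consistency_index(matrix: Matrix, weights: List[float]) -> float:
    """Saaty's consistency index CI = (lambda_max - n) / (n - 1); needs n >= 2."""
    n = len(matrix)
    if n < 2:
        raise ValueError("consistency index needs at least two factors")
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    # Estimate lambda_max as the mean ratio (A w)_i / w_i.
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1)


if __name__ == "__main__":
    # Hypothetical judgments for three portal-evaluation factors:
    # factor A is 2x as important as B and 4x as important as C.
    comparisons = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    w = ahp_priorities(comparisons)
    print("priorities:", [round(x, 3) for x in w])
    print("consistency index:", round(consistency_index(comparisons, w), 6))
```

    A perfectly consistent matrix such as the one above yields a consistency index of zero; real survey judgments usually produce a small positive value that is then compared against Saaty's random index.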

  14. MOHHS TB Mass Screening Information

    • rmi-data.sprep.org
    • pacific-data.sprep.org
    doc
    Updated Oct 17, 2023
    MOHHS (2023). MOHHS TB Mass Screening Information [Dataset]. https://rmi-data.sprep.org/dataset/mohhs-tb-mass-screening-information
    Available download formats: doc (1284548)
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    UNEPCISPac5
    Ministry of Health and Human Services
    Authors
    MOHHS
    License

    https://pacific-data.sprep.org/resource/shared-data-license-agreement

    Area covered
    Marshall Islands
    Description

    Ministry of Health TB Mass Screening Information.

  15. GeoThermalCloud: Cloud Fusion of Big Data and Multi-Physics Models using...

    • gdr.openei.org
    • data.openei.org
    • +2 more
    code, text_document
    Updated Apr 4, 2022
    Bulbul Ahmmed (2022). GeoThermalCloud: Cloud Fusion of Big Data and Multi-Physics Models using Machine Learning for Discovery, Exploration and Development of Hidden Geothermal Resources [Dataset]. http://doi.org/10.15121/1869828
    Available download formats: code, text_document
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    Office of Energy Efficiency and Renewable Energy http://energy.gov/eere
    Geothermal Data Repository
    Stanford University
    Authors
    Bulbul Ahmmed
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Geothermal exploration and production are challenging, expensive, and risky. The GeoThermalCloud uses machine learning to predict the location of hidden geothermal resources. This submission includes a training dataset for the GeoThermalCloud neural network (Machine Learning for Discovery, Exploration, and Development of Hidden Geothermal Resources).

  16. Hadoop Big Data Analytics Market - Share, Trends & Industry Forecast

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Jun 16, 2024
    Mordor Intelligence (2024). Hadoop Big Data Analytics Market - Share, Trends & Industry Forecast [Dataset]. https://www.mordorintelligence.com/industry-reports/hadoop-big-data-analytics-market
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 16, 2024
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Hadoop Big Data Analytics Market report segments the industry into Solution (Data Discovery and Visualization (DDV), Advanced Analytics (AA)), End User (BFSI, Retail, IT and Telecom, Healthcare and Life Sciences, Manufacturing, Media and Entertainment, Other End Users), and Geography (North America, Europe, Asia-Pacific, Latin America, Middle-East and Africa).

  17. Data from: A global review of species-specific shark-fin-to-body-mass ratios...

    • rmi-data.sprep.org
    • americansamoa-data.nocache.eightyoptions.com.au
    • +13 more
    pdf
    Updated Feb 20, 2025
    Secretariat of the Pacific Regional Environment Programme (2025). A global review of species-specific shark-fin-to-body-mass ratios and relevant legislation [Dataset]. https://rmi-data.sprep.org/dataset/global-review-species-specific-shark-fin-body-mass-ratios-and-relevant-legislation
    Available download formats: pdf (658404)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme https://www.sprep.org/
    License

    Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    A global review of species-specific shark-fin-to-body-mass ratios and relevant legislation

  18. Global Big Data Exchange Market Future Outlook 2025-2032

    • statsndata.org
    excel, pdf
    Updated May 2025
    Stats N Data (2025). Global Big Data Exchange Market Future Outlook 2025-2032 [Dataset]. https://www.statsndata.org/report/big-data-exchange-market-7383
    Available download formats: pdf, excel
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Big Data Exchange market has emerged as a pivotal component in today's data-driven landscape, fundamentally reshaping how organizations manage, analyze, and utilize vast amounts of data. As businesses recognize the immense value hidden within their data reservoirs, the need for efficient data exchange frameworks

  19. Big Data as a Service (BDaaS) Market Analysis North...

    • technavio.com
    Updated Dec 20, 2023
    Technavio (2023). Big Data as a Service (BDaaS) Market Analysis North America,APAC,Europe,South America,Middle East and Africa - US,Canada,China,Germany,UK - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/big-data-as-a-service-market-industry-analysis
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, Germany, United Kingdom, China, United States, Global
    Description

    Big Data as a Service Market Size 2024-2028

    The big data as a service market size is forecast to increase by USD 41.20 billion at a CAGR of 28.45% between 2023 and 2028.

    The market is experiencing significant growth due to the increasing volume of data and the rising demand for advanced data insights. Machine learning algorithms and artificial intelligence are driving product quality and innovation in this sector. Hybrid cloud solutions are gaining popularity, offering the benefits of both private and public cloud platforms for optimal data storage and scalability. Industry standards for data privacy and security are increasingly important, as large amounts of data pose unique risks. The BDaaS market is expected to continue its expansion, providing valuable data insights to businesses across various industries.
    

    What will be the Big Data as a Service Market Size During the Forecast Period?

    Big Data as a Service (BDaaS) has emerged as a game-changer in the business world, enabling organizations to harness big data without extensive in-house infrastructure and expertise. The service model bundles data management, analytics, and visualization tools so that businesses can derive valuable insights from their data. Key components driving market growth include Business Intelligence (BI), Data Science, Data Quality, and Data Security. BI gives organizations the ability to analyze data and gain insights for informed decisions.

    Data Science, on the other hand, focuses on extracting meaningful patterns and trends from large datasets using advanced algorithms. Data Quality ensures that the data being analyzed is accurate, complete, and consistent, while Data Security safeguards sensitive data from cybersecurity threats and data breaches. BDaaS also offers data pipelines that enable seamless data integration and data lifecycle management. Further components include Network Analysis, Real-time Analytics, and Predictive Analytics, which provide actionable insights in real time and help businesses anticipate future trends, as well as Data Mining, Machine Learning Algorithms, and Data Visualization Tools.

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Type
      Data analytics-as-a-service
      Hadoop-as-a-service
      Data-as-a-service

    Deployment
      Public cloud
      Hybrid cloud
      Private cloud

    Geography
      North America
        Canada
        US
      APAC
        China
      Europe
        Germany
        UK
      South America
      Middle East and Africa
    

    By Type Insights

    The data analytics-as-a-service segment is estimated to witness significant growth during the forecast period.
    

    Big Data as a Service (BDaaS) is a significant market, highlighted by the availability of Hadoop-as-a-Service solutions, and data analytics-as-a-service (DAaaS) is one of its main segments. These offerings let businesses access essential datasets on demand without the burden of expensive infrastructure. DAaaS solutions facilitate real-time data analysis, helping organizations make informed decisions. The DAaaS landscape is expanding rapidly as companies recognize its value in enriching their internal data. Integrating DAaaS with big data systems extends analytics capabilities, and organizations can leverage diverse datasets to gain a competitive edge, driving the growth of the global BDaaS market. In the context of digital transformation, cloud computing, IoT, and 5G technologies, BDaaS solutions support efficient resource utilization.

    However, regulatory scrutiny poses challenges, necessitating stringent data security measures. Retail and other industries stand to benefit significantly from BDaaS, particularly with distributed computing solutions. DAaaS adoption is a strategic investment for businesses seeking to capitalize on the power of external data for valuable insights.

    The Data analytics-as-a-Service segment was valued at USD 2.59 billion in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 35% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    Big Data as a Service Market analysis, North America is experiencing signif

  20. Big Data and Data Engineering Services Market Report | Global Forecast From...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 12, 2024
    Dataintelo (2024). Big Data and Data Engineering Services Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/big-data-and-data-engineering-services-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 12, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Big Data and Data Engineering Services Market Outlook

    The global market size for Big Data and Data Engineering Services was valued at approximately USD 45.6 billion in 2023 and is expected to reach USD 136.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 13.2% during the forecast period. This robust growth is primarily driven by the increasing volume of data being generated across industries, advancements in data analytics technologies, and the rising importance of data-driven decision-making. Enterprises of all sizes are progressively leveraging big data solutions to gain strategic insights and maintain competitive advantage, thereby fueling market growth.
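
    The reported figures can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is illustrative only: the helper names are hypothetical, and it assumes a nine-year span from the 2023 base value to the 2032 target, whereas the report may use a different base year or unrounded values. With the rounded endpoints, the implied rate comes out at roughly 13.0%, in the same range as the stated 13.2%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Rounded endpoints quoted in the report (USD billion); a 2023 -> 2032 span is assumed.
    implied = cagr(45.6, 136.8, 2032 - 2023)
    print(f"implied CAGR: {implied:.2%}")
    # Compound the 2023 value at the stated 13.2% over the same span for comparison.
    print(f"2032 value at 13.2%: {project(45.6, 0.132, 9):.1f}")
```

    Because a tripling of market size is very sensitive to the number of compounding years, checking which base year a report uses is usually the first step when its stated CAGR and endpoints do not match exactly.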

    One of the pivotal growth factors for the Big Data and Data Engineering Services market is the exponential rise in data generation. With the advent of the Internet of Things (IoT), social media, and digital interactions, the volume of data generated daily is staggering. This data, if harnessed effectively, can provide invaluable insights into consumer behaviors, market trends, and operational efficiencies. Companies are increasingly investing in data engineering services to streamline and manage this data effectively. Additionally, the adoption of advanced analytics and machine learning techniques is enabling organizations to derive actionable insights, further driving the market's expansion.

    Another significant growth driver is the technological advancements in data processing and analytics. The development of sophisticated data engineering tools and platforms has made it easier to collect, store, and analyze large datasets. Cloud computing has played a crucial role in this regard, offering scalable and cost-effective solutions for data management. The integration of artificial intelligence (AI) and machine learning (ML) in data analytics is enhancing the ability to predict trends and make informed decisions, thereby contributing to the market's growth. Furthermore, continuous innovations in data security and privacy measures are instilling confidence among businesses to invest in big data solutions.

    The increasing emphasis on regulatory compliance and data governance is also propelling the market forward. Industries such as BFSI, healthcare, and government are subject to stringent regulatory requirements for data management and protection. Big Data and Data Engineering Services are essential in ensuring compliance with these regulations by maintaining data accuracy, integrity, and security. The implementation of data governance frameworks is becoming a top priority for organizations to mitigate risks associated with data breaches and ensure ethical data usage. This regulatory landscape is creating a conducive environment for the adoption of comprehensive data engineering services.

    Regionally, North America dominates the Big Data and Data Engineering Services market, owing to the presence of major technology companies, high adoption of advanced analytics, and significant investments in R&D. However, the Asia Pacific region is expected to exhibit the highest growth rate due to rapid digital transformation, increasing internet penetration, and growing awareness about the benefits of data-driven decision-making among businesses. Europe also represents a significant market share, driven by the strong presence of industrial and technological sectors that rely heavily on data analytics.

    Service Type Analysis

    Data Integration is a critical component of Big Data and Data Engineering Services, encompassing the process of combining data from different sources to provide a unified view. This service type is instrumental for organizations aiming to harness data from various departments, applications, and geographies. The increasing complexity of data landscapes, characterized by disparate data sources and formats, necessitates efficient data integration solutions. Companies are investing heavily in data integration technologies to consolidate their data, improve accessibility, and enhance the quality of insights derived from analytical processes. This segment's growth is further fueled by advancements in integration tools that support real-time data processing and seamless connectivity.

    Data Quality services ensure the accuracy, completeness, and reliability of data, which is essential for effective decision-making. Poor data quality can lead to misinformed decisions, operational inefficiencies, and regulatory non-compliance. As organizations increasingly recognize the criticality of data quality, there is a growing demand for robust data quality solutions. These services include da

Jairson Rodrigues (2019). Big Data Machine Learning Benchmark on Spark [Dataset]. https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark

Data from: Big Data Machine Learning Benchmark on Spark

Related Article
2 scholarly articles cite this dataset
Dataset updated
Jun 6, 2019
Authors
Jairson Rodrigues
Description

net traffic
