42 datasets found
  1. Data Analytics Software Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 4, 2025
    Cite
    Archive Market Research (2025). Data Analytics Software Report [Dataset]. https://www.archivemarketresearch.com/reports/data-analytics-software-558003
    Explore at:
    doc, pdf, ppt (available download formats)
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Analytics Software market is experiencing robust growth, driven by the increasing adoption of cloud-based solutions, the expanding volume of big data, and the rising demand for data-driven decision-making across industries. The market, valued at approximately $150 billion in 2025, is projected to exhibit a compound annual growth rate (CAGR) of 15% during the forecast period of 2025-2033. Several key factors fuel this expansion. Businesses increasingly recognize the strategic importance of data analytics in optimizing operations, enhancing customer experiences, and gaining a competitive edge. The shift toward cloud-based solutions offers scalability, cost-effectiveness, and accessibility, bringing data analytics within reach of a broader range of businesses, from SMEs to large enterprises. Advancements in artificial intelligence (AI) and machine learning (ML) are also being integrated into data analytics platforms, providing more sophisticated insights and predictive capabilities. The market is further segmented by deployment model (on-premise vs. cloud-based) and user type (SMEs vs. large enterprises), reflecting the diverse needs and adoption rates across business segments.

    While the market presents substantial opportunities, certain challenges persist. Data security and privacy concerns remain paramount, requiring robust security measures and compliance with evolving regulations. The complexity of implementing and managing data analytics solutions can also pose a barrier to entry, demanding skilled professionals and substantial investment in infrastructure and training. Despite these challenges, the long-term outlook for the Data Analytics Software market remains highly positive, driven by continuous technological innovation, growing data volumes, and the increasing strategic importance of data-driven decision-making.

    The market's evolution will continue to be shaped by the ongoing integration of AI and ML, the expansion of cloud-based offerings, and the growing demand for advanced analytics capabilities. This dynamic landscape will present both challenges and opportunities for existing players and new entrants alike.
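Figures like the base value and CAGR above imply a concrete end-of-period market size; a minimal compound-growth sketch, using only the report's stated numbers ($150B in 2025, 15% over the eight years to 2033), not independent data:

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Project a market size forward by compounding annual growth.

    base_value: value in the base year (e.g. USD billions)
    cagr: compound annual growth rate as a decimal (15% -> 0.15)
    years: number of years to project forward
    """
    return base_value * (1.0 + cagr) ** years

# Report's stated figures: ~$150B in 2025, 15% CAGR over 2025-2033 (8 years)
projected_2033 = project_market_size(150.0, 0.15, 8)
print(f"Implied 2033 market size: ${projected_2033:.0f}B")
```

Compounding $150B at 15% for eight years yields roughly $459B, the kind of end-of-period figure such reports typically quote.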

  2. Data from: Current and projected research data storage needs of Agricultural...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2 more
    pdf
    Updated Nov 30, 2023
    + more versions
    Cite
    Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
    Explore at:
    pdf (available download formats)
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Ag Data Commons
    Authors
    Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing; it is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey that is the basis for an internal report; while the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to forward it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
    Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. We therefore requested more detail from "Big Data users," the 47 respondents who indicated more than 10 TB (up to 100 TB) or over 100 TB of total current data (Q5); all other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data is considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range, divided by 1 for an individual response or by G, the number of individuals, for a group response. For Big Data users we used the actual reported values or estimated likely values.
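The per-person calculation described above (high end of the reported range, divided by the number of people the response covers) is simple enough to sketch. The function and example values below are illustrative only; the survey's actual range bins are listed in Appendix A:

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Per-person storage estimate as described in the survey methodology:
    the high end of the reported storage range, divided by the number of
    respondents the response covers (1 for an individual response, G for
    a group response)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size

# Hypothetical examples (not actual survey responses):
print(per_person_storage_tb(10.0))                 # individual reporting up to 10 TB
print(per_person_storage_tb(100.0, group_size=5))  # group of 5 reporting up to 100 TB
```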

    Resources in this dataset:

    - Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    - Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file of raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
    - Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

  3. Graphite//LFP synthetic V vs. Q dataset (>700,000 unique curves)

    • data.mendeley.com
    • narcis.nl
    Updated Mar 12, 2021
    + more versions
    Cite
    Matthieu Dubarry (2021). Graphite//LFP synthetic V vs. Q dataset (>700,000 unique curves) [Dataset]. http://doi.org/10.17632/bs2j56pn7y.2
    Explore at:
    Dataset updated
    Mar 12, 2021
    Authors
    Matthieu Dubarry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.

    The V vs. Q dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths. Each path was simulated with at most 0.85% increases for each. The training dataset therefore contains more than 700,000 unique voltage vs. capacity curves.

    Four variables are included; see the read-me file for details and an example of how to use them:

    - Cell info: information on the setup of the mechanistic model
    - Qnorm: normalized capacity scale for all voltage curves
    - pathinfo: index of the simulated conditions for all voltage curves
    - volt: voltage data; each column corresponds to the voltage simulated under the conditions of the corresponding line in pathinfo
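As a rough illustration of how the volt and Qnorm variables pair up (each column of volt is one curve over the shared normalized capacity axis), here is a minimal loader. The CSV layout is hypothetical; the dataset's read-me documents the real file format:

```python
import csv
import io

def load_curves(volt_file, qnorm_file):
    """Read voltage curves (one per column of volt) against the shared
    normalized capacity axis (Qnorm). The file layout is an assumption
    based on the description above; consult the dataset's read-me for
    the authoritative format."""
    rows = [list(map(float, r)) for r in csv.reader(volt_file)]
    curves = list(zip(*rows))  # transpose so each entry is one V vs. Q curve
    qnorm = [float(r[0]) for r in csv.reader(qnorm_file)]
    return qnorm, curves

# Tiny illustrative example: two curves sampled at three capacity points.
volt = io.StringIO("3.4,3.5\n3.3,3.4\n3.0,3.1\n")
qnorm = io.StringIO("0.0\n0.5\n1.0\n")
q, curves = load_curves(volt, qnorm)
print(len(curves), len(q))  # -> 2 3
```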

  4. How Big Data Applications Drive Disruptive Innovation: Evidence from China’s...

    • scidb.cn
    Updated Jun 6, 2025
    Cite
    Jia-Hui Zhang (2025). How Big Data Applications Drive Disruptive Innovation: Evidence from China’s Manufacturing Firms [Dataset]. http://doi.org/10.57760/sciencedb.25655
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jia-Hui Zhang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    China
    Description

    This study selects A-share-listed manufacturing firms from 2014 to 2023 as the research sample. Data on big data applications are extracted from firm annual reports published on the official websites of the Shenzhen Stock Exchange and the Shanghai Stock Exchange. To ensure the validity and robustness of the constructed indicators, the measurement of disruptive innovation draws on patent data from the China National Intellectual Property Administration (CNIPA), covering the period from 2000 to 2023; the specific measurement methodology is detailed in Section 3.2. Additional firm-level data are primarily obtained from the China Stock Market & Accounting Research (CSMAR) database and Wind Information Co., Ltd. (Wind). The data were processed as follows: (1) firms designated as Special Treatment (ST; firms that have exhibited financial distress for two consecutive years), *ST (firms that have reported consecutive losses for three years or face the risk of trading suspension), or Particular Transfer (PT) were excluded; (2) financial institutions were removed; (3) firms with substantial missing values for key variables were excluded; (4) to mitigate the influence of extreme values on the empirical results, selected variables, such as market-oriented disruptive innovation, technology-oriented disruptive innovation, managerial myopia, and government intervention, were winsorized at the 1st and 99th percentiles. After applying the above criteria, a total of 21,203 valid firm-year observations were retained for analysis.
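Winsorizing at the 1st and 99th percentiles, as described above, clips extreme values to the percentile boundaries rather than dropping the observations. A minimal sketch using nearest-rank percentiles; the study's exact percentile convention is not stated, so treat this as illustrative:

```python
def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clip values at the given percentiles (here the 1st and 99th),
    the standard treatment for extreme values. Nearest-rank percentiles
    are used; the study's exact convention may differ."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[round(lower_pct * (n - 1))]  # lower percentile boundary
    hi = ordered[round(upper_pct * (n - 1))]  # upper percentile boundary
    return [min(max(v, lo), hi) for v in values]

data = list(range(1, 101)) + [10_000]  # one extreme outlier
clipped = winsorize(data)
print(max(clipped))  # -> 100 (the outlier is pulled down to the 99th percentile)
```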

  5. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Explore at:
    txt (available download formats)
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to the computational sciences. However, the desired outcomes have not been consistently achieved. This study analyzes the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods

    This article examines how learners perform, and their perspectives, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, students and professionals without computational thinking skills, who developed a data analytics project in the context of a short Data Analytics session. Our case study examined the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results

    The results show the transformative potential of approaches that integrate advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings also suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion

    It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT; however, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects and thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  6. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • data.mendeley.com
    • data.niaid.nih.gov
    • +2 more
    Updated Jun 24, 2024
    + more versions
    Cite
    Nirmalya Thakur (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.17632/rs6jnrjfsx.1
    Explore at:
    Dataset updated
    Jun 24, 2024
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
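The sentiment attributes described above map each VADER compound score to one of three classes (positive, negative, neutral). A minimal sketch of that mapping using VADER's commonly recommended ±0.05 cutoff; the dataset authors' exact thresholds are an assumption:

```python
def sentiment_class(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to the three sentiment
    classes used in the dataset. The +/-0.05 cutoff is VADER's commonly
    recommended convention, assumed here, not stated by the authors."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(sentiment_class(0.62))   # -> positive
print(sentiment_class(-0.40))  # -> negative
print(sentiment_class(0.01))   # -> neutral
```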

  7. Data from: A consensus compound/bioactivity dataset for data-driven drug...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Mar 2, 2022
    Cite
    Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6398019
    Explore at:
    Dataset updated
    Mar 2, 2022
    Authors
    Laura Isigkeit; Apirat Chaikuad; Daniel Merk
    Description

    This is the updated version of the dataset from 10.5281/zenodo.6320761.

    Information

    The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. We have therefore combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS, and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,648 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence, obtained by comparing data from different sources. We have unified the source databases, brought them into a common format, and combined them, easing generic use in multiple applications such as chemogenomics and data-driven drug design. The consensus dataset provides increased target coverage and contains a higher number of molecules than the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction, and its increased chemical and bioactivity coverage may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks, with flags for divergent data from different sources, may help data selection and further accurate curation.
    This dataset belongs to the publication: https://doi.org/10.3390/molecules27082513

    Structure and content of the dataset

    Dataset structure (columns): ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) ... | Mean PC (0) ... | Mean B (0) ... | Mean I (0) ... | Mean PD (0) ... | Activity check annotation | Ligand names | Canonical SMILES C ... | Structure check (Tanimoto) | Source

    The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV file and a compressed CSV file. Except for the canonical SMILES columns, all columns are filled with the datatype 'string'; the canonical SMILES columns use the SMILES format. We recommend the File Reader node for using the dataset in KNIME: with this node the data types of the columns can be adjusted exactly, and only this node can read the compressed format.

    Column content:

    - ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifiers of the databases
    - Target: biological target of the molecule, expressed as the HGNC gene symbol
    - Activity type: for example, pIC50
    - Assay type: simplification/classification of the assay into cell-free, cellular, functional, and unspecified
    - Unit: unit of the bioactivity measurement
    - Mean columns of the databases: mean of the bioactivity values, or activity comments, denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in the ChEMBL database
    - Activity check annotation: a bioactivity check performed by comparing values from the different sources, providing automated activity validation for additional confidence. Possible flags: no comment (bioactivity values are within one log unit); check activity data (bioactivity values are not within one log unit); only one data point (only one value was available, so no comparison and no range were calculated); no activity value (no precise numeric activity value was available); no log-value could be calculated (no negative decadic logarithm could be calculated, e.g. because the reported unit was not a compound concentration)
    - Ligand names: all unique names contained in the five source databases
    - Canonical SMILES columns: molecular structure of the compound from each database
    - Structure check (Tanimoto): denotes matching or differing compound structures across the source databases. Possible flags: match (molecule structures are the same between different sources); no match (the structures differ; we calculated the Jaccard-Tanimoto similarity coefficient from Morgan fingerprints to reveal true differences between sources and reported the minimum value); 1 structure (no comparison possible, because only one structure was available); no structure (no comparison possible, because no structure was available)
    - Source: which databases the data come from
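The Mean columns pack a value and its occurrence count into one string, e.g. `7.5 *(15)` for a value seen 15 times in ChEMBL. A hedged parser for that encoding; the exact cell formatting in the released CSV may differ slightly, so this is a sketch:

```python
import re

# Pattern for the "value *(count)" encoding described above, e.g. "7.5 *(15)".
# Activity comments (non-numeric cells) are returned unparsed as None.
MEAN_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\*\((\d+)\)\s*$")

def parse_mean_cell(cell: str):
    """Return (mean, count) for a parseable Mean cell, or None for
    activity comments and other non-numeric entries."""
    m = MEAN_RE.match(cell)
    if not m:
        return None
    return float(m.group(1)), int(m.group(2))

print(parse_mean_cell("7.5 *(15)"))  # -> (7.5, 15)
print(parse_mean_cell("inactive"))   # -> None
```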

  8. Speed in MR/m and Peak memory (in GB per process) for querying database...

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    José M. Abuín; Nuno Lopes; Luís Ferreira; Tomás F. Pena; Bertil Schmidt (2023). Speed in MR/m and Peak memory (in GB per process) for querying database AFS31RS90 and dataset KAL_D in Big Data cluster. [Dataset]. http://doi.org/10.1371/journal.pone.0239741.t006
    Explore at:
    xls (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    José M. Abuín; Nuno Lopes; Luís Ferreira; Tomás F. Pena; Bertil Schmidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Speed in MR/m and Peak memory (in GB per process) for querying database AFS31RS90 and dataset KAL_D in Big Data cluster.

  9. Data Monetization Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Data Monetization Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-monetization-market-global-industry-analysis
    Explore at:
    csv, pdf, pptx (available download formats)
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Monetization Market Outlook

    According to our latest research, the global data monetization market size reached USD 3.6 billion in 2024, demonstrating robust momentum driven by the increasing adoption of data-driven business models across multiple sectors. The market is expected to register a CAGR of 18.2% from 2025 to 2033, propelling the market to an estimated USD 18.8 billion by 2033. This remarkable growth trajectory is primarily attributed to the surging demand for actionable business intelligence, the proliferation of big data analytics, and the strategic imperative for enterprises to unlock new revenue streams from their data assets.

    One of the most significant growth factors for the data monetization market is the exponential increase in data generation across industries such as BFSI, healthcare, retail, and telecommunications. As organizations collect vast volumes of structured and unstructured data from customer interactions, transactions, and IoT devices, the imperative to derive value from these data sets has never been greater. The evolution of advanced analytics, machine learning, and artificial intelligence has enabled enterprises to analyze, segment, and commercialize their data, either by improving internal processes or by creating new data-centric products and services. This shift is further bolstered by the growing recognition among C-level executives that data is a strategic asset, capable of driving innovation, enhancing customer experiences, and unlocking new growth opportunities.

    Another critical driver is the increasing regulatory focus on data privacy and compliance, which, paradoxically, is fostering innovation in data monetization strategies. With regulations such as GDPR and CCPA setting stringent guidelines for data usage, organizations are investing in secure data platforms and consent management tools to ensure compliance while still extracting value from their data. This has led to the emergence of privacy-preserving data monetization models, such as data anonymization and federated learning, which enable organizations to monetize data without compromising customer trust or violating regulatory mandates. The convergence of regulatory compliance and data monetization is thus creating a fertile ground for technology providers to offer differentiated solutions tailored to industry-specific needs.

    The proliferation of cloud computing and the rise of data marketplaces are also catalyzing the growth of the data monetization market. Cloud platforms provide scalable infrastructure and advanced analytics capabilities, enabling organizations of all sizes to store, process, and monetize their data efficiently. Furthermore, the emergence of data marketplaces and data exchanges is democratizing access to third-party data, allowing businesses to buy, sell, or trade data assets seamlessly. This trend is particularly pronounced among small and medium enterprises (SMEs), which can now participate in the data economy without the need for substantial upfront investments in IT infrastructure. As a result, the data monetization ecosystem is becoming increasingly dynamic, with new business models and value chains emerging at a rapid pace.

    From a regional perspective, North America continues to dominate the data monetization market owing to its mature digital infrastructure, high adoption of advanced analytics, and a strong culture of innovation. The presence of leading technology vendors and a large base of data-driven enterprises further strengthens the region's position. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, expanding internet penetration, and a burgeoning start-up ecosystem. Europe, with its focus on data privacy and regulatory compliance, is also witnessing significant investments in secure data monetization platforms. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by increasing awareness and government-led digital transformation initiatives.


  10. Vitamin D deficiency and SARS‑CoV‑2 infection: D-COVID study

    • zenodo.org
    Updated Sep 6, 2022
    Cite
    Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto (2022). Vitamin D deficiency and SARS‑CoV‑2 infection: D-COVID study [Dataset]. http://doi.org/10.5281/zenodo.7053208
    Explore at:
    Dataset updated
    Sep 6, 2022
    Dataset provided by
    Zenodo
    Authors
    Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    D-COVID is a project to study the association between COVID-19 infection and vitamin D deficiency in patients of a tertiary university hospital. To investigate the clinical evolution and prognosis of patients with COVID-19 and vitamin D deficiency, several queries were run against a database containing plain-text apparitions of certain terms in medical reports, as well as structured data. These apparitions were detected using NLP technology and then saved individually in a database. The presented dataset is a bounded version of that database, containing only the data relevant to the associated study. As for the structure of the dataset, each row represents an apparition of a term in plain text, and the columns contain additional information:

    - reportdate: date of the report where the term appears.

    - admission_days: structured data; days in hospitalization, if any.

    - patient_id: anonymized patient identifier

    - sex: patient sex (1=male, 2=female)

    - birthdate: patient birthdate

    - service: service where the report was generated

    - report_type: type of report generated (discharge report, note, etc.)

    - record: unique identifier for the report itself

    - term: the term that was read (NLP takes synonyms and acronyms into account)

    - exitus: medical exitus, if available in structured data; it may also be found in plain text in the previous column.
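    A minimal sketch of reading and decoding rows with this layout (the column names come from the data dictionary above; the sample values, delimiter, and in-memory file are invented for illustration, and the real dataset may ship in a different format):

    ```python
    import csv
    import io

    # In-memory sample mirroring the documented columns (values are invented;
    # the real dataset may use a different delimiter or file layout).
    SAMPLE = """patient_id,sex,reportdate,admission_days,term,exitus
    p001,1,2020-04-02,12,vitamin D deficiency,0
    p002,2,2020-04-05,,hypovitaminosis D,1
    """

    SEX_LABELS = {"1": "male", "2": "female"}  # coding per the data dictionary

    def load_term_occurrences(fh):
        """Decode the coded fields described in the data dictionary."""
        rows = []
        for row in csv.DictReader(fh):
            row["sex"] = SEX_LABELS.get(row["sex"].strip(), "unknown")
            # admission_days is structured data and may be empty (no hospitalization)
            days = row["admission_days"].strip()
            row["admission_days"] = int(days) if days else None
            rows.append(row)
        return rows

    rows = load_term_occurrences(io.StringIO(SAMPLE))
    print(rows[0]["sex"], rows[1]["admission_days"])  # male None
    ```

    Each decoded row then corresponds to one plain-text occurrence of a term, as described above.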

  11. d

    Competitive Intelligence Data -Food & Beverage Industry - USA

    • datarade.ai
    Updated Nov 11, 2022
    Cite
    Predik Data-driven (2022). Competitive Intelligence Data -Food & Beverage Industry - USA [Dataset]. https://datarade.ai/data-products/competitive-intelligence-data-for-food-beverage-industry-predik-data-driven
    Explore at:
    .json, .csv, .xml, .xlsAvailable download formats
    Dataset updated
    Nov 11, 2022
    Dataset authored and provided by
    Predik Data-driven
    Area covered
    United States
    Description

    Competitive intelligence monitoring goes beyond your sales team. Our CI solutions also bring powerful insights to your production, logistics, operation & marketing departments.

    Why should you use our competitive intelligence data?

    1. Increase visibility: Our geolocation approach allows us to “get inside” any facility in the US, providing visibility in places where other solutions do not reach.
    2. In-depth 360º analysis: Perform a unique and in-depth analysis of competitors, suppliers and customers.
    3. Powerful insights: We use alternative data and big data methodologies to peel back the layers of any private or public company.
    4. Uncover your blind spots against leading competitors: Understand the complete business environment of your competitors, from third-tier suppliers to main investors.
    5. Identify business opportunities: Analyze your competitors' strategic shifts and identify unnoticed business opportunities and possible threats or disruptions.
    6. Keep track of your competitors' influence around any specific area: Maintain constant monitoring of your competitors' actions and their impact on specific market areas.

    How are other companies using our CI solution?

    1. Enriched data intelligence: Our market intelligence data brings you key insights from different angles.
    2. Due diligence: Our data provides the required panorama to evaluate a company's cross-company relations and decide whether or not to proceed with an acquisition.
    3. Risk assessment: Our CI approach allows you to anticipate potential disruptions by understanding behavior in all the supply chain tiers.
    4. Supply chain analysis: Our advanced geolocation approach allows you to visualize and map an entire supply chain network.
    5. Insights discovery: Our relationship-identifier algorithms generate data matrix networks that uncover new and unnoticed insights within a specific market, consumer segment, competitors' influence, logistics shifts, and more.

    From "digital" to the real field: Most competitive intelligence companies focus their solutions analysis on social shares, review sites, and sales calls. Our competitive intelligence strategy consists on tracking the real behavior of your market on the field, so that you can answer questions like: -What uncovered need does my market have? -How much of a threat is my competition? -How is the market responding to my competitor´s offer? -How my competitors are changing? -Am I losing or winning market?

  12. f

    Using Virtuoso as an alternate triple store for a VIVO instance

    • vivo.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Paul Albert; Eliza Chan; Prakesh Adekkanattu; Mohammad Mansour (2023). Using Virtuoso as an alternate triple store for a VIVO instance [Dataset]. http://doi.org/10.6084/m9.figshare.2002032.v2
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    VIVO
    Authors
    Paul Albert; Eliza Chan; Prakesh Adekkanattu; Mohammad Mansour
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: For some time, the VIVO instance for Weill Cornell Medical College (WCMC) had struggled with both unacceptable page load times and unreliable uptime. With some individual profiles containing upwards of 800 publications, WCMC VIVO has relatively large profiles, but no profile was so large that it could account for this performance. The WCMC VIVO Implementation Team explored a number of options for improving performance, including caching, better hardware, query optimization, limiting user access to large pages, using another instance of Tomcat, throttling bots, and blocking IPs issuing too many requests. But none of these avenues were fruitful.

    Analysis of triple stores: With the 1.7 version, VIVO ships with the Jena SDB triple store, but the SDB version of Jena is no longer supported by its developers. In April, we reviewed various published analyses and benchmarks suggesting there were alternatives to Jena, such as Virtuoso, that perform better than even Jena's successor, TDB. In particular, the Berlin SPARQL Benchmark v. 3.1 [1] showed that Virtuoso had the strongest performance compared with the other data stores measured, including BigData, BigOwlim, and Jena TDB. In addition, Virtuoso is used on dbpedia.org, which serves up 3 billion triples, compared with only 12 million in WCMC's VIVO site. Whereas Jena SDB stores its triples in a MySQL database, Virtuoso manages its triples in a binary file. The software is available in open source and commercial editions.

    Configuration: In late 2014, we installed Virtuoso on a local machine and loaded data from our production VIVO. Some queries completed in about 10% of the time as compared to our production VIVO. However, we noticed that the listview queries invoked whenever profile pages are loaded were still slow. After soliciting feedback from members of both the Virtuoso and VIVO communities, we modified these queries to rely on the OPTIONAL instead of the UNION construct. This modification, which wasn't possible in a Jena SDB environment, reduced by eight-fold the number of queries that the application makes of the triple store. About four or five additional steps were required for VIVO and Virtuoso to work optimally with one another; these are documented in the VIVO Duraspace wiki.

    Results: On March 31, WCMC launched Virtuoso in its production environment. According to our instance of New Relic, VIVO has an average page load of about four seconds and 99% uptime, both of which are dramatic improvements. There are opportunities for further tuning: the four-second average includes pages such as the visualizations, as well as pages served up to logged-in users, which are slower than other types of pages.

    [1] http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/#comparison
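    The UNION-to-OPTIONAL rewrite can be sketched schematically. The queries below are invented stand-ins, not VIVO's actual listview queries, shown here only to illustrate the shape of the change:

    ```python
    # Schematic sketch of the rewrite described above (query shapes invented).

    # Before: one UNION branch per property pattern; the store must
    # evaluate each branch as a separate sub-query.
    union_form = """
    SELECT ?pub ?title ?year WHERE {
      ?person ?p ?pub .
      { ?pub rdfs:label ?title }
      UNION
      { ?pub ex:year ?year }
    }
    """

    # After: OPTIONAL keeps a single join tree and simply leaves
    # unmatched variables unbound, which the store can plan as one query.
    optional_form = """
    SELECT ?pub ?title ?year WHERE {
      ?person ?p ?pub .
      OPTIONAL { ?pub rdfs:label ?title }
      OPTIONAL { ?pub ex:year ?year }
    }
    """

    print("UNION" in optional_form)  # False
    ```

    The two forms are not equivalent in general (OPTIONAL returns a row even when neither pattern matches), but for listview-style display queries that difference is what makes the single-query plan possible.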

  13. d

    Smart Triage Jinja Data De-identification

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    + more versions
    Cite
    Mawji, Alishah (2023). Smart Triage Jinja Data De-identification [Dataset]. http://doi.org/10.5683/SP3/MSTH98
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Mawji, Alishah
    Description

    This dataset contains de-identified data with an accompanying data dictionary and the R script for de-identification procedures.

    Objective(s): To demonstrate application of a risk-based de-identification framework using the Smart Triage dataset as a clinical example.

    Data Description: This dataset contains the de-identified version of the Smart Triage Jinja dataset with the accompanying data dictionary and R script for de-identification procedures.

    Limitations: Utility of the de-identified dataset has only been evaluated with regard to use for the development of prediction models based on a need for hospital admission.

    Abbreviations: NA

    Ethics Declaration: The study was reviewed by the institutional review boards at the University of British Columbia in Canada (ID: H19-02398; H20-00484), the Makerere University School of Public Health in Uganda, and the Uganda National Council for Science and Technology.

  14. d

    Grand Challenges, Big Data, Fuzzy Data, and Digital Archaeology: Integrating...

    • search.dataone.org
    Updated Dec 22, 2018
    Cite
    Rabinowitz, Adam (University of Texas at Austin) (2018). Grand Challenges, Big Data, Fuzzy Data, and Digital Archaeology: Integrating information about the past into the Planet Texas 2050 data platform (PowerPoint slides) [Dataset]. http://doi.org/10.6067/XCV8447171
    Explore at:
    Dataset updated
    Dec 22, 2018
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Rabinowitz, Adam (University of Texas at Austin)
    Area covered
    Description

    As our generation and collection of quantitative digital data increase, so do our ambitions for extracting new insights and knowledge from those data. In recent years, those ambitions have manifested themselves in so-called “Grand Challenge” projects coordinated by academic institutions. These projects are often broadly interdisciplinary and attempt to address major issues facing the world in the present and the future through the collection and integration of diverse types of scientific data. In general, however, disciplines that focus on the past are underrepresented in this environment – in part because these grand challenges tend to look forward rather than back, and in part because historical disciplines tend to produce qualitative, incomplete data that are difficult to mesh with the more continuous quantitative data sets provided by scientific observation. Yet historical information is essential for our understanding of long-term processes, and should thus be incorporated into our efforts to solve present and future problems. Archaeology, an inherently interdisciplinary field of knowledge that bridges the gap between the quantitative and the qualitative, can act as a connector between the study of the past and data-driven attempts to address the challenges of the future. To do so, however, we must find new ways to integrate the results of archaeological research into the digital platforms used for the modeling and analysis of much bigger data.

    Planet Texas 2050 is a grand challenge project recently launched by The University of Texas at Austin. Its central goal is to understand the dynamic interactions between water supply, urbanization, energy use, and ecosystems services in Texas, a state that will be especially affected by climate change and population mobility by the middle of the 21st century. Like many such projects, one of the products of Planet Texas 2050 will be an integrated data platform that will make it possible to model various scenarios and help decision-makers project the results of present policies or trends into the future. Unlike other such projects, however, PT2050 incorporates data collected from past societies, primarily through archaeological inquiry. We are currently designing a data integration and modeling platform that will allow us to bring together quantitative sensor data related to the present environment with “fuzzier” data collected in the course of research in the social sciences and humanities. Digital archaeological data, from LiDAR surveys to genomic information to excavation documentation, will be a central component of this platform. In this paper, I discuss the conceptual integration between scientific “big data” and “medium-sized” archaeological data in PT2050; the process that we are following to catalogue data types, identify domain-specific ontologies, and understand the points of intersection between heterogeneous datasets of varying resolution and precision as we construct the data platform; and how we propose to incorporate digital data from archaeological research into integrated modeling and simulation modules.

  15. C

    National Hydrography Data - NHD and 3DHP

    • data.cnra.ca.gov
    • data.ca.gov
    • +3more
    Updated Jul 1, 2025
    + more versions
    Cite
    California Department of Water Resources (2025). National Hydrography Data - NHD and 3DHP [Dataset]. https://data.cnra.ca.gov/dataset/national-hydrography-dataset-nhd
    Explore at:
    pdf, csv(12977), zip(73817620), pdf(3684753), website, zip(13901824), pdf(4856863), web videos, zip(578260992), pdf(1436424), zip(128966494), pdf(182651), zip(972664), zip(10029073), zip(1647291), pdf(1175775), zip(4657694), pdf(1634485), zip(15824984), zip(39288832), arcgis geoservices rest api, pdf(437025), pdf(9867020)Available download formats
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    California Department of Water Resources
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The USGS National Hydrography Dataset (NHD) downloadable data collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes information about naturally occurring and constructed bodies of surface water (lakes, ponds, and reservoirs), paths through which water flows (canals, ditches, streams, and rivers), and related entities such as point features (springs, wells, stream gages, and dams). The information encoded about these features includes classification and other characteristics, delineation, geographic name, position and related measures, a "reach code" through which other information can be related to the NHD, and the direction of water flow. The network of reach codes delineating water and transported material flow allows users to trace movement in upstream and downstream directions. In addition to this geographic information, the dataset contains metadata that supports the exchange of future updates and improvements to the data. The NHD supports many applications, such as making maps, geocoding observations, flow modeling, data maintenance, and stewardship. For additional information on NHD, go to https://www.usgs.gov/core-science-systems/ngp/national-hydrography.
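    The upstream/downstream tracing that reach codes enable can be illustrated with a toy flow network. The codes and the dict-based representation below are invented for illustration; real NHD flow relations come from the flowline features and flow tables:

    ```python
    # Toy flow network: reach code -> next reach code downstream.
    # (Codes invented; real NHD reach codes are 14-digit identifiers.)
    downstream = {
        "1801": "1802",
        "1802": "1803",
        "1803": None,  # outlet: nothing further downstream
    }

    def trace_downstream(reach):
        """Follow the flow network from a reach to the outlet."""
        path = [reach]
        while downstream.get(reach):
            reach = downstream[reach]
            path.append(reach)
        return path

    print(trace_downstream("1801"))  # ['1801', '1802', '1803']
    ```

    Tracing upstream works the same way over the inverted mapping; this is the movement-tracing capability the reach-code network provides.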

    DWR was the steward for NHD and Watershed Boundary Dataset (WBD) in California. We worked with other organizations to edit and improve NHD and WBD, using the business rules for California. California's NHD improvements were sent to USGS for incorporation into the national database. The most up-to-date products are accessible from the USGS website. Please note that the California portion of the National Hydrography Dataset is appropriate for use at the 1:24,000 scale.

    For additional derivative products and resources, including the major features in geopackage format, please go to this page: https://data.cnra.ca.gov/dataset/nhd-major-features. Archives of previous statewide extracts of the NHD going back to 2018 may be found at https://data.cnra.ca.gov/dataset/nhd-archive.

    In September 2022, USGS officially notified DWR that the NHD would become static as USGS resources will be devoted to the transition to the new 3D Hydrography Program (3DHP). 3DHP will consist of LiDAR-derived hydrography at a higher resolution than NHD. Upon completion, 3DHP data will be easier to maintain, based on a modern data model and architecture, and better meet the requirements of users that were documented in the Hydrography Requirements and Benefits Study (2016). The initial releases of 3DHP include NHD data cross-walked into the 3DHP data model. It will take several years for the 3DHP to be built out for California. Please refer to the resources on this page for more information.

    The FINAL, STATIC version of the National Hydrography Dataset for California was published for download by USGS on December 27, 2023. This dataset can no longer be edited by the state stewards. The next generation of national hydrography data is the USGS 3D Hydrography Program (3DHP).

    Questions about the California stewardship of these datasets may be directed to nhd_stewardship@water.ca.gov.

  16. Big data services revenue in Asia-Pacific (excl. Japan) 2012-2017

    • statista.com
    Updated Oct 30, 2014
    Cite
    Statista (2014). Big data services revenue in Asia-Pacific (excl. Japan) 2012-2017 [Dataset]. https://www.statista.com/statistics/496266/big-data-services-revenue-asia-pacific/
    Explore at:
    Dataset updated
    Oct 30, 2014
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    2012 - 2014
    Area covered
    Asia–Pacific
    Description

    This statistic depicts the revenue generated by the big data services market in the Asia Pacific (excluding Japan) from 2012 to 2014, as well as a forecast of revenue from 2015 to 2017. In 2014, revenues associated with the big data services market in the Asia Pacific amounted to *** million U.S. dollars. 'Big data' refers to data sets that are too large or too complex for traditional data processing applications. Additionally, the term is often used to refer to the technologies that enable predictive analytics or other methods of extracting value from data.

  17. Z

    Data from: Caravan - A global community dataset for large-sample hydrology

    • data.niaid.nih.gov
    • biorxiv.org
    • +2more
    Updated Jan 16, 2025
    + more versions
    Cite
    Gudmundsson, Lukas (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
    Explore at:
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Nearing, Grey
    Kratzert, Frederik
    Matias, Yossi
    Gauch, Martin
    Klotz, Daniel
    Nevo, Sella
    Gilon, Oren
    Gudmundsson, Lukas
    Erickson, Tyler
    Shalev, Guy
    Addor, Nans
    Hassidim, Avinatan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

    Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global, open-source community resource that will grow over time.

    If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, to acknowledge the work that went into creating those datasets and that made Caravan possible in the first place.

    All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

    Change Log:

    23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

    24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

    15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

    1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).

    16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

    10 May 2023: Version 1.1 - No data change, just update data description.

    17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

    16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e., basins smaller than 100 km² or larger than 2000 km²). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.

    16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
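    The Version 0.4 note above (replacing the CAMELS US sentinel value -999 with NaN) is the kind of cleanup any consumer of sentinel-coded hydrology data needs before analysis. A minimal sketch, with invented values (real Caravan time series ship as csv or netcdf files):

    ```python
    # -999 marks a missing observation in CAMELS US; convert it to NaN so
    # downstream statistics do not treat it as a real (negative) discharge.
    NODATA = -999.0

    def sentinel_to_nan(values, nodata=NODATA):
        """Replace the missing-value sentinel with NaN."""
        return [float("nan") if v == nodata else v for v in values]

    streamflow = [1.2, -999.0, 0.8]  # invented sample series
    cleaned = sentinel_to_nan(streamflow)
    print(cleaned[1] != cleaned[1])  # True: NaN compares unequal to itself
    ```

    Without this step, a single sentinel would drag averages far below zero, which is exactly the bug the changelog entry describes fixing.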

  18. P

    Phoenix Data Center Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 21, 2025
    Cite
    Market Report Analytics (2025). Phoenix Data Center Market Report [Dataset]. https://www.marketreportanalytics.com/reports/phoenix-data-center-market-88991
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Apr 21, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Phoenix data center market exhibits robust growth, driven by the increasing demand for cloud computing, big data analytics, and the expansion of digital services across diverse sectors. The market's Compound Annual Growth Rate (CAGR) of 18.50% from 2019 to 2024 suggests a significant upward trajectory, projected to continue into the forecast period (2025-2033). This expansion is fueled by several key factors. Firstly, the rising adoption of colocation services by enterprises seeking enhanced scalability and reduced IT infrastructure management costs is a major catalyst. Secondly, the growing prominence of hyperscale data centers, requiring massive capacity and power, significantly contributes to market growth. Thirdly, government initiatives promoting digital transformation and the expansion of 5G networks further accelerate market expansion. While regulatory hurdles and security concerns present some restraints, the overall market outlook remains positive. The segmentation by DC size (small to mega), tier type (Tier 1-4), and absorption (utilized vs. non-utilized) provides valuable insights into market dynamics. The utilized segment, further broken down by colocation type (retail, wholesale, hyperscale) and end-user (cloud & IT, media & entertainment, etc.), reveals the diverse applications driving demand. Key players like Digital Realty Trust, DataBank, and CyrusOne are strategically positioned to capitalize on this growth, competing based on factors such as location, connectivity, and service offerings. Regional variations exist, with North America and Asia Pacific likely to dominate due to strong technological advancements and growing digital economies. The projected market size for 2025, based on the provided CAGR and considering industry averages for similar markets, is estimated to be in the range of $10-15 billion (USD). 
    This figure is a reasonable estimate given the market's high growth rate and the significant investments made by major players in recent years. The continued growth into 2033 will significantly increase the total market valuation. The market share among major players is likely dynamic, influenced by factors such as strategic acquisitions, expansion into new markets, and technological innovation. Furthermore, the non-utilized segment holds potential for future growth as demand expands and previously unallocated capacity becomes operational. Continuous monitoring of evolving industry trends and technological advancements is essential for predicting precise future market values.

    Recent developments include:

    June 2024 - Microsoft has acquired more than 280 acres in the El Mirage area of Phoenix, Arizona. Microsoft purchased the additional property in El Mirage to support data center construction that's already underway in the area, Bowen Wallace, corporate vice president for Microsoft Datacenters and Americas, said in a statement to BJ.

    March 2024 - Metro Phoenix prevails as the second-largest data center market in the nation, bolstered by its availability and affordability of power and its available land for new development. During the second half of 2023, the market completed leases for 748 MW of new data center capacity and has 2.8 million square feet (703 MW) of capacity under construction. This ranks second only to national data center leader Northern Virginia, with 1.6 GW of transactions completed during the same time period and more than 13 million square feet (1,339 MW) under construction.

    Notable trends are: Cloud computing is anticipated to hold a significant share.

  19. m

    AI & Big Data Global Surveillance Index (2022 updated)

    • data.mendeley.com
    Updated Feb 17, 2022
    + more versions
    Cite
    Steven Feldstein (2022). AI & Big Data Global Surveillance Index (2022 updated) [Dataset]. http://doi.org/10.17632/gjhf5y4xjp.2
    Explore at:
    Dataset updated
    Feb 17, 2022
    Authors
    Steven Feldstein
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This index compiles empirical data on AI and big data surveillance use for 179 countries around the world between 2012 and 2022— although the bulk of the sources stem from between 2017 and 2022. The index does not distinguish between legitimate and illegitimate uses of AI and big data surveillance. Rather, the purpose of the research is to show how new surveillance capabilities are transforming governments’ ability to monitor and track individuals or groups. Last updated February 2022.

    This index addresses three primary questions: Which countries have documented AI and big data public surveillance capabilities? What types of AI and big data public surveillance technologies are governments deploying? And which companies are involved in supplying this technology?

    The index measures AI and big data public surveillance systems deployed by state authorities, such as safe cities, social media monitoring, or facial recognition cameras. It does not assess the use of surveillance in private spaces (such as privately-owned businesses in malls or hospitals), nor does it evaluate private uses of this technology (e.g., facial recognition integrated in personal devices). It also does not include AI and big data surveillance used in Automated Border Control systems that are commonly found in airport entry/exit terminals. Finally, the index includes a list of frequently mentioned companies – by country – which source material indicates provide AI and big data surveillance tools and services.

    All reference source material used to build the index has been compiled into an open Zotero library, available at https://www.zotero.org/groups/2347403/global_ai_surveillance/items. The index includes detailed information for seventy-seven countries where open source analysis indicates that governments have acquired AI and big data public surveillance capabilities. The index breaks down AI and big data public surveillance tools into the following categories: smart city/safe city, public facial recognition systems, smart policing, and social media surveillance.

    The findings indicate that at least seventy-seven out of 179 countries are actively using AI and big data technology for public surveillance purposes:

    • Smart city/safe city platforms: fifty-five countries
    • Public facial recognition systems: sixty-eight countries
    • Smart policing: sixty-one countries
    • Social media surveillance: thirty-six countries

  20. z

    Data for Medical Data Science Shortcourse

    • zenodo.org
    gz, sas7bdat
    Updated Aug 27, 2019
    Cite
    Kevin Kunzmann (2019). Data for Medical Data Science Shortcourse [Dataset]. http://doi.org/10.5281/zenodo.3379064
    Explore at:
    gz, sas7bdatAvailable download formats
    Dataset updated
    Aug 27, 2019
    Authors
    Kevin Kunzmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a .csv version of the World Bank data on Health, Nutrition and Population (cf. https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics), together with derived data sets for training purposes.

