This graph presents the results of a 2014 worldwide survey conducted by Accenture on what is considered to be part of big data. That year, 60 percent of respondents felt that advanced analytics or analysis were part of big data.
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey that is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to pass it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. We therefore requested more detail from "Big Data users," the 47 respondents who indicated more than 10 TB of total current data (the 10 to 100 TB or over 100 TB ranges, Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.

We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival.

To calculate per-person storage needs we used the high end of the reported range, divided by 1 for an individual response or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
According to a 2023 global survey, an increasing share of businesses believe they are making effective use of data. Over ************** of respondents said that they were driving innovation with data, while **** considered their businesses to be competing on data and analytics.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract The era of big data is now a reality for businesses and individuals. In recent years, the academic literature exploring this field has grown rapidly. This article aims to identify the main fields and features of the published papers about big data analytics. The methodological approach was a bibliometric study on the ISI Web of Science platform, with a focus on big data management issues. It was possible to identify five distinct groups within the published papers: evolution of big data; management, business and strategy; human behavior and the social and cultural aspects; data mining and knowledge generation; and the Internet of Things. It was possible to conclude that big data is an emerging theme that is not yet consolidated. There is wide variation in the terms used, which influences bibliographic searches. Therefore, as a complementary contribution of this research, the main keywords used in such articles were identified, which supports bibliometric searches in future studies.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Profiling of individuals based on inborn, acquired, and assigned characteristics is central for decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is the focus of researchers and physicians, and other types are regarded as undesired side effects, for example in connection with health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we will focus on genetic profiling, define related notions, as legal and subject-matter definitions frequently differ, and discuss the ethical and legal challenges.
According to the source's data, the market value of big data analytics in Italy increased steadily over the period considered, growing from 790 million euros in 2015 to approximately 1.8 billion euros in 2020.
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/AGU4QE
WIDEa is R-based software aiming to provide users with a range of functionalities to explore, manage, clean and analyse "big" environmental and (in/ex situ) experimental data. These functionalities are the following:
1. Loading/reading different data types: basic (called normal), temporal, and infrared spectra of the mid/near region (called IR), with frequency (wavenumber) used as the unit (in cm-1);
2. Interactive data visualization from a multitude of graph representations: 2D/3D scatter-plot, box-plot, hist-plot, bar-plot, correlation matrix;
3. Manipulation of variables: concatenation of qualitative variables, transformation of quantitative variables by generic functions in R;
4. Application of mathematical/statistical methods;
5. Creation/management of data (named flag data) considered atypical;
6. Study of normal distribution model results for different strategies: calibration (checking assumptions on residuals), validation (comparison between measured and fitted values). The model form can be more or less complex: mixed effects, main/interaction effects, weighted residuals.
This statistic presents the market value of the big data analytics sector in Italy from 2015 to 2018. According to the data, the market value of big data analytics increased steadily over the period considered, growing from 790 million euros in 2015 to approximately 1.4 billion euros in 2018.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A class of discrete-time models of infectious disease spread, referred to as individual-level models (ILMs), are typically fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. These models quantify probabilistic outcomes regarding the risk of infection of susceptible individuals due to various susceptibility and transmissibility factors, including their spatial distance from infectious individuals. The infectious pressure from infected individuals exerted on susceptible individuals is intrinsic to these ILMs. Unfortunately, quantifying this infectious pressure for data sets containing many individuals can be computationally burdensome, leading to a time-consuming likelihood calculation and, thus, computationally prohibitive MCMC-based analysis. This problem worsens when using data augmentation to allow for uncertainty in infection times. In this paper, we develop sampling methods that can be used to calculate a fast, approximate likelihood when fitting such disease models. A simple random sampling approach is initially considered followed by various spatially-stratified schemes. We test and compare the performance of our methods with both simulated data and data from the 2001 foot-and-mouth disease (FMD) epidemic in the U.K. Our results indicate that substantial computation savings can be obtained—albeit, of course, with some information loss—suggesting that such techniques may be of use in the analysis of very large epidemic data sets.
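To make the sampling idea concrete, here is a minimal Python sketch (not the authors' code; the distance kernel, its parameters, and the toy data are illustrative assumptions): the infectious pressure on a susceptible individual sums a distance kernel over all infectious individuals, and a simple random sample of those individuals, scaled by N/n, approximates that sum.

```python
import numpy as np

rng = np.random.default_rng(42)

def infection_prob(sus_xy, inf_xy, beta=0.5, alpha=2.0, sample_frac=None):
    # P(infection) = 1 - exp(-pressure), where "pressure" sums a
    # power-law distance kernel over all infectious individuals.
    # If sample_frac is set, estimate the sum from a simple random
    # sample of infectious individuals, scaled up by N/n.
    if sample_frac is not None:
        n = max(1, int(sample_frac * len(inf_xy)))
        idx = rng.choice(len(inf_xy), size=n, replace=False)
        pressure = (len(inf_xy) / n) * beta * np.sum(
            np.linalg.norm(inf_xy[idx] - sus_xy, axis=1) ** -alpha)
    else:
        pressure = beta * np.sum(
            np.linalg.norm(inf_xy - sus_xy, axis=1) ** -alpha)
    return 1.0 - np.exp(-pressure)

# Toy comparison: one susceptible at the origin, 10,000 infectious individuals.
inf_xy = rng.uniform(1, 100, size=(10_000, 2))
print(infection_prob(np.zeros(2), inf_xy))                   # full sum
print(infection_prob(np.zeros(2), inf_xy, sample_frac=0.1))  # 10% sample
```

The sampled version touches only a tenth of the pairwise distances, which is the source of the computational savings (and of the information loss) described above.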
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Current and projected research data storage needs of Agricultural Research Service researchers in 2016’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/e2b7daf0-c8fe-4c68-b62d-891360ba8f96 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.
The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey that is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.
From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to pass it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. We therefore requested more detail from "Big Data users," the 47 respondents who indicated more than 10 TB of total current data (the 10 to 100 TB or over 100 TB ranges, Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.
We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival.
To calculate per-person storage needs we used the high end of the reported range, divided by 1 for an individual response or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
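Read literally, that calculation can be sketched in a few lines of Python (a hypothetical illustration, not part of the report; the range bounds and group sizes below are made up):

```python
# Each response: (high end of reported storage range in TB, group size G).
# G = 1 for an individual response.
responses = [
    (10, 1),   # individual reporting the "1 to 10 TB" range
    (100, 5),  # group of 5 reporting the "10 to 100 TB" range
    (1, 1),    # individual reporting the "up to 1 TB" range
]

per_person_tb = [high / g for high, g in responses]
print(per_person_tb)                            # [10.0, 20.0, 1.0]
print(sum(per_person_tb) / len(per_person_tb))  # mean per-person need in TB
```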
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive includes experimental data associated with the paper:
On the Timed Analysis of Big-Data Applications. Accepted in Proceedings of NASA Formal Methods (NFM 2018).
Marconi, F., Quattrocchi, G., Baresi, L., Bersani, M.M., Rossi, M. 2018.
Specifically, it includes detailed data regarding the verification tasks reported in Section 4 (Implementation and Validation of the Model).
In reference to Table 1 of the paper, the archive is organized in the following way: there is one folder for each case study (sort_by_key, pagerank, kmeans) and, within each of these folders, one subfolder for each configuration considered in the paper.
Here we report an overview of the verification tasks performed. The names of the tables correspond to the codes of the settings and to the names of the folders, while the id of each entry corresponds to the folder name of that experiment.
## SortByKey Experiments
### sort_by_key_C12_T100_rec260000000
| Application | Cores | Deadline | Outcome | Verification Time | id |
| --- | --- | --- | --- | --- | --- |
| sort_by_key | 12 | 91000 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91000_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91100 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91100_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91200 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91200_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91300 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91300_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91360 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91360_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91370 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91370_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91380 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91380_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91381 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91381_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91382 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91382_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91383 | timeout | None | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91383_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91384 | sat | 3.81 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91384_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91385 | sat | 3.34 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91385_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91386 | sat | 3.52 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91386_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91387 | sat | 3.42 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91387_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91388 | sat | 3.4 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91388_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91389 | sat | 3.43 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91389_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91390 | sat | 2.37 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91390_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91400 | sat | 3.38 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91400_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91500 | sat | 16.52 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91500_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91600 | sat | 6.8 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91600_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91700 | sat | 12.03 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91700_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91800 | sat | 5.62 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91800_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 91900 | sat | 5.54 | C1_t100_c12_c12_t100_nr260000000_tb20_no_l_d91900_tc_12_8_n_rounds_by1_t_task |
The minimum SAT deadline is 91384.
### sort_by_key_C12_T100_rec280000000
| Application | Cores | Deadline | Outcome | Verification Time | id |
| --- | --- | --- | --- | --- | --- |
| sort_by_key | 12 | 98200 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98200_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98300 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98300_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98400 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98400_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98402 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98402_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98403 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98403_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98404 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98404_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98405 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98405_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98406 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98406_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98407 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98407_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98408 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98408_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98409 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98409_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98410 | timeout | None | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98410_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98420 | sat | 3.48 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98420_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98430 | sat | 3.57 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98430_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98440 | sat | 6.68 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98440_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98450 | sat | 7.1 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98450_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98460 | sat | 37.07 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98460_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98470 | sat | 10.33 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98470_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98480 | sat | 18.14 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98480_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98490 | sat | 10.78 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98490_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 98500 | sat | 3.43 | C1_t100_c12_c12_t100_nr280000000_tb20_no_l_d98500_tc_12_8_n_rounds_by1_t_task |
The minimum SAT deadline is 98420.
### sort_by_key_C12_T100_rec300000000
| Application | Cores | Deadline | Outcome | Verification Time | id |
| --- | --- | --- | --- | --- | --- |
| sort_by_key | 12 | 105200 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105200_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105300 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105300_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105400 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105400_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105420 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105420_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105430 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105430_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105440 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105440_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105441 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105441_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105442 | timeout | None | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105442_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105443 | sat | 3.33 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105443_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105444 | sat | 3.35 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105444_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105445 | sat | 3.3 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105445_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105446 | sat | 3.31 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105446_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105447 | sat | 3.37 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105447_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105448 | sat | 3.37 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105448_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105449 | sat | 3.3 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105449_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105450 | sat | 3.02 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105450_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105460 | sat | 3.03 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105460_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105470 | sat | 4.52 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105470_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105480 | sat | 4.54 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105480_tc_12_8_n_rounds_by1_t_task |
| sort_by_key | 12 | 105490 | sat | 9.48 | C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105490_tc_12_8_n_rounds_by1_t_task |
The minimum SAT deadline is 105443.
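The minimum SAT deadline reported after each table can be recomputed from the experiment folder names, which encode the deadline as the `_d<value>_` token. A short Python sketch (a hypothetical helper, not part of the released archive):

```python
import re

def min_sat_deadline(results):
    # results: (outcome, experiment-folder-name) pairs, as in the tables above.
    # The deadline is encoded in the folder name as the "_d<value>_" token.
    deadlines = []
    for outcome, folder in results:
        m = re.search(r"_d(\d+)_", folder)
        if outcome == "sat" and m:
            deadlines.append(int(m.group(1)))
    return min(deadlines) if deadlines else None

results = [
    ("timeout", "C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105442_tc_12_8_n_rounds_by1_t_task"),
    ("sat",     "C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105443_tc_12_8_n_rounds_by1_t_task"),
    ("sat",     "C1_t100_c12_c12_t100_nr300000000_tb20_no_l_d105450_tc_12_8_n_rounds_by1_t_task"),
]
print(min_sat_deadline(results))  # 105443
```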
### sort_by_key_C22_T100_rec260000000
| Application | Cores | Deadline | Outcome | Verification Time | id |
| --- | --- | --- | --- | --- | --- |
| sort_by_key | 22 | 70000 | timeout | None | C2_t100_c22_c22_t100_nr260000000_tb20_no_l_d70000_tc_22_10_n_rounds_by1_t_task |
| sort_by_key | 22 | 70500 | timeout | None | C2_t100_c22_c22_t100_nr260000000_tb20_no_l_d70500_tc_22_10_n_rounds_by1_t_task |
https://dataintelo.com/privacy-and-policy
The global consumer network attached storage (NAS) market is expected to expand at a substantial CAGR during the forecast period, 2020–2026. Network attached storage devices are hard disk storage devices that can be connected to any network and allow multiple computers in a network to share the same storage space simultaneously. NAS devices are assigned an IP address and are accessed by clients (PCs or laptops) via a server that acts as a gateway to the data. Network attached storage generally uses multiple disks to store data and provides file-level storage and access to data. Globally, growing demand for speedy data transfer, cost-effective storage systems, and efficient data backup systems drives the growth of the consumer NAS market. The deployment of network attached storage in small and medium businesses is expected to drive the consumer NAS industry. Demand has also increased due to the product's easy installation and low cost.
Governments across the globe are taking initiatives and focusing on the digitization of data, which has increased the adoption of NAS architecture worldwide; this trend is expected to continue during the forecast period. IT enterprises are focusing on cloud, big data, and artificial intelligence, which is expected to drive the global consumer NAS market during the forecast period. Large enterprises and SMEs have huge volumes of data to store, which has increased demand for consumer NAS in recent times. Mobile usage has increased across the globe, and the data generated on these devices needs to be stored; this, in turn, will drive the NAS market in the coming years. The global consumer network attached storage (NAS) market is segmented by end-user, design type, and region.
| Attributes | Details |
| --- | --- |
| Base Year | 2019 |
| Historic Data | 2015–2018 |
| Forecast Period | 2020–2026 |
| Regional Scope | Asia Pacific, Europe, North America, the Middle East & Africa, and Latin America |
| Report Coverage | Company Share, Market Analysis and Size, Competitive Landscape, Growth Factors, Trends, and Revenue Forecast |
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We provide four crystalline materials datasets that contain both numerical and categorical features of materials.
References:
[1] K. Takahashi, L. Takahashi, J. D. Baran, and Y. Tanaka, "Descriptors for predicting the lattice constant of body centered cubic crystal", The Journal of Chemical Physics 146, 204104 (2017).
[2] D.-N. Nguyen, T.-L. Pham, V.-C. Nguyen, T.-D. Ho, T. Tran, K. Takahashi, and H.-C. Dam, "Committee machine that votes for similarity between materials", IUCrJ 5, 830-840 (2018).
[3] Y. Xu, M. Yamazaki, and P. Villars, "Inorganic materials database for exploring the nature of material", Japanese Journal of Applied Physics 50, 11RH02 (2011).
[4] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, "Big data of materials science: critical role of the descriptor", Physical Review Letters 114, 105503 (2015).
[5] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, et al., "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation", APL Materials 1, 011002 (2013).
https://www.marketreportanalytics.com/privacy-policy
The France data center storage market, valued at €1.33 billion in 2025, is projected to experience steady growth, driven by the increasing adoption of cloud computing, big data analytics, and the expanding digital economy within the country. The Compound Annual Growth Rate (CAGR) of 1.04% over the forecast period (2025-2033) indicates a consistent, albeit moderate, expansion. Key drivers include the rising demand for high-performance computing, particularly within the IT & telecommunication, BFSI (Banking, Financial Services, and Insurance), and government sectors. The shift towards all-flash and hybrid storage solutions, offering faster speeds and enhanced efficiency compared to traditional storage, represents a significant trend. While the market faces certain restraints, such as the relatively high initial investment costs associated with advanced storage technologies and potential cybersecurity concerns, the overall growth trajectory remains positive. The market segmentation reveals a diverse landscape, with Network Attached Storage (NAS), Storage Area Network (SAN), and Direct Attached Storage (DAS) dominating the storage technology segment. Major players like Dell, Hewlett Packard Enterprise, NetApp, and others are competing fiercely to cater to the growing needs of various end users across different sectors. The historical period (2019-2024) likely witnessed a similar growth pattern, with the market adapting to evolving technological advancements and digital transformation initiatives within France's business landscape.

The market's steady growth is expected to be fueled by government initiatives promoting digitalization and by increasing investments in data center infrastructure by enterprises seeking improved data management and disaster recovery capabilities. The competitive landscape is characterized by both established players and emerging technology providers offering innovative solutions. Continuous technological advancements are expected to lead to further segmentation within storage types and technologies, creating opportunities for specialized service providers and further fueling market expansion. However, the market needs to address potential challenges related to data privacy and security regulations, ensuring data sovereignty and compliance to maintain trust and drive adoption.

Recent developments include: May 2023: Pure Storage Inc. made significant strides by expanding its flash-based platform into the data center with the introduction of FlashBlade//E. This innovative solution addresses the storage of approximately 80% of data currently stored on disk-based systems, which is considered "non-hot" or primary data and is predominantly low-cost in nature. June 2023: Huawei unveiled its cutting-edge data center data infrastructure architecture known as F2F2X (Flash-to-Flash-to-Anything). This architecture serves as a robust data foundation specifically designed to assist financial institutions in navigating the challenges posed by new data, new applications, and new resilience requirements.

Key drivers for this market are: Expansion of IT Infrastructure to Increase Market Growth, Increased Investments in Hyperscale Data Centers to Increase Market Growth. Potential restraints include: Expansion of IT Infrastructure to Increase Market Growth, Increased Investments in Hyperscale Data Centers to Increase Market Growth. Notable trends are: IT & Telecommunication Segment to Hold Major Share in the Market.
http://dcat-ap.de/def/licenses/cc-by
This data management plan was created for large Collaborative Research Centers (CRCs). The data generated in such centers is considered big, broad, and heterogeneous: it may range from surveys, lab experiments, simulations, and data models to software code, hardware designs, and real-world objects. This plan has been implemented for the work of TRR277 AMC (a CRC funded by the German Research Foundation (DFG)) since mid-2020. Since then, it has been offered as a template of online fillable forms in TUM Workbench. TUM Workbench provides some of the basic properties, e.g. title, unique ID, and project/work package associations, as well as versioning and logging functions. The Guidelines and Hints section of this DMP has been through several updates to help users. The latest draft, version 3.2.1, dated September 02, 2022, is being published in its actual document form.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deep artificial neural networks are feed-forward architectures capable of very impressive performance in diverse domains. Indeed, stacking multiple layers allows a hierarchical composition of local functions, providing efficient compact mappings. Compared to the brain, however, such architectures are closer to a single pipeline and require huge amounts of data, while concrete cases for either human or machine learning systems are often restricted to not-so-big data sets. Furthermore, interpretability of the obtained results is a key issue: since deep learning applications are increasingly present in society, it is important that the underlying processes be accessible and understandable to everyone. In order to address these challenges, in this contribution we analyze how considering prototypes in a rather generalized sense (with respect to the state of the art) makes it possible to work reasonably with small data sets while providing an interpretable view of the obtained results. Some mathematical interpretation of this proposal is discussed. Sensitivity to hyperparameters is a key issue for reproducible deep learning results and is carefully considered in our methodology. Performance and limitations of the proposed setup are explored in detail, under different hyperparameter sets, in a way analogous to how biological experiments are conducted. We obtain a rather simple architecture that is easy to explain and which, combined with a standard method, allows us to target both performance and interpretability.
Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Please cite our paper if you publish material based on these datasets:
G. Khodabandelou, V. Gauthier, M. El-Yacoubi, M. Fiore, "Estimation of Static and Dynamic Urban Populations with Mobile Network Metadata", in IEEE Trans. on Mobile Computing, 2018 (in Press). 10.1109/TMC.2018.2871156
Abstract
Communication-enabled devices that are physically carried by individuals are today pervasive, which opens unprecedented opportunities for collecting digital metadata about the mobility of large populations. In this paper, we propose a novel methodology for the estimation of people density at metropolitan scales, using subscriber presence metadata collected by a mobile operator. We show that our approach suits the estimation of static population densities, i.e., of the distribution of dwelling units per urban area contained in traditional censuses. Specifically, it achieves higher accuracy than that granted by previous equivalent solutions. In addition, our approach enables the estimation of dynamic population densities, i.e., the time-varying distributions of people in a conurbation. Our results build on significant real-world mobile network metadata and relevant ground-truth information in multiple urban scenarios.
Dataset Columns
This dataset covers one month of data, taken during April 2015, for three Italian cities: Rome, Milan, and Turin. The raw data was provided during the Telecom Italia Big Data Challenge (http://www.telecomitalia.com/tit/en/innovazione/archivio/big-data-challenge-2015.html).
1. grid_id: the coordinates of the grid cell, which can be retrieved with the shapefile of a given city
2. date: format Y-M-D H:M:S
3. landuse_label: the land use label computed through the method described in [2]
4. presence: presence data of a given grid id, as provided by the Telecom Italia Big Data Challenge
5. population: census population of a given grid block, as defined by the Istituto nazionale di statistica (ISTAT, https://www.istat.it/en/censuses) in 2011
6. estimation: dynamic population density estimation (in persons), the result of the method described in [1]
7. area: surface of the grid cell considered, in km^2
8. geometry: the shape of the area considered, in the EPSG:3003 coordinate system (only with quilt)
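As a usage illustration (a hypothetical snippet: the file name rome.csv is made up, while the column names are those listed above), the estimated dynamic density can be compared with the static census density per grid cell:

```python
import pandas as pd

# Assumes the columns listed above have been exported to a CSV file;
# "rome.csv" is a made-up file name.
df = pd.read_csv("rome.csv", parse_dates=["date"])

# People per km^2: estimated dynamic population divided by the cell area.
df["est_density"] = df["estimation"] / df["area"]
# Static 2011 census density for the same cell.
df["census_density"] = df["population"] / df["area"]

print(df.groupby("grid_id")[["est_density", "census_density"]].mean().head())
```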
Note
Due to legal constraints, we cannot directly share the original data from the Telecom Italia Big Data Challenge that we used to build this dataset.
Easy access to this dataset with quilt
Install the dataset repository:
$ quilt install vgauthier/DynamicPopEstimate
Use the dataset with a pandas DataFrame
>>> from quilt.data.vgauthier import DynamicPopEstimate
>>> import pandas as pd
>>> df = pd.DataFrame(DynamicPopEstimate.rome())
Use the dataset with a GeoPandas GeoDataFrame
>>> from quilt.data.vgauthier import DynamicPopEstimate
>>> import geopandas as gpd
>>> gdf = gpd.GeoDataFrame(DynamicPopEstimate.rome())
References
[1] G. Khodabandelou, V. Gauthier, M. El-Yacoubi, M. Fiore, "Population estimation from mobile network traffic metadata", in proc of the 17th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), pp. 1 - 9, 2016.
[2] A. Furno, M. Fiore, R. Stanica, C. Ziemlicki, and Z. Smoreda, "A tale of ten cities: Characterizing signatures of mobile traffic in urban areas," IEEE Transactions on Mobile Computing, Volume: 16, Issue: 10, 2017.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files should be considered additional material for the paper "An Intelligent Transportation System to Control Air Pollution and Road Traffic in Cities Integrating CEP and Colored Petri nets". Their aim is to show how to reproduce the observed data by using the developed Colored Petri nets.
NEW GOES-19 Data!! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distribution from GOES-16 will be turned off. GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.
NEW GOES 16 Reprocess Data!! The reprocessed GOES-16 ABI L1b data mitigates systematic data issues (including data gaps and image artifacts) seen in the Operational products, and improves the stability of both the radiometric and geometric calibration over the course of the entire mission life. These data were produced by recomputing the L1b radiance products from input raw L0 data using improved calibration algorithms and look-up tables, derived from data analysis of the NIST-traceable, on-board sources. In addition, the reprocessed data products contain enhancements to the L1b file format, including limb pixels and pixel timestamps, while maintaining compatibility with the operational products. The datasets currently available span the operational life of GOES-16 ABI, from early 2018 through the end of 2024. The Reprocessed L1b dataset shows improvement over the Operational L1b products but may still contain data gaps or discrepancies. Please provide feedback to Dan Lindsey (dan.lindsey@noaa.gov) and Gary Lin (guoqing.lin-1@nasa.gov). More information can be found in the GOES-R ABI Reprocess User Guide.
NOTICE: As of January 10th 2023, GOES-18 assumed the GOES-West position and all data files are deemed both operational and provisional, so no ‘preliminary, non-operational’ caveat is needed. GOES-17 is now offline and has shifted to approximately 105 degrees West, where it will be in on-orbit storage. GOES-17 data will no longer flow into the GOES-17 bucket. Operational GOES-West products can be found in the GOES-18 bucket.
GOES satellites (GOES-16, GOES-17, GOES-18 & GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface. The satellites orbit high enough to allow for a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" for severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track their movements. SUVI products are available in both NetCDF and FITS formats.
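For programmatic access to these data via NODD, here is a hedged Python sketch: the bucket name noaa-goes18 and the ABI-L1b-RadC/&lt;year&gt;/&lt;day-of-year&gt;/&lt;hour&gt;/ path layout are assumptions based on the per-satellite buckets mentioned above, not a documented contract of this dataset.

```python
import s3fs

# Anonymous access to the NOAA Open Data Dissemination (NODD) S3 archive.
fs = s3fs.S3FileSystem(anon=True)

# List ABI Level-1b CONUS radiance files for one hour of data, assuming the
# layout <bucket>/<product>/<year>/<day-of-year>/<hour>/.
for path in fs.ls("noaa-goes18/ABI-L1b-RadC/2024/001/00/")[:5]:
    print(path)
```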
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.
For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, the degradation was chosen to follow equation (1).
%degradation = a × cycle + (exp(b × cycle) − 1)   (1)
Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles, to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was calculated with equation (2) when LAM_NE > PT.
%LLI = %LLI + p8 × (LAM_PE − PT)   (2)
where PT was calculated with equation (3) from [60].
PT = 100 − ((100 − LAM_PE) / (100 × LR_ini − LAM_PE)) × (100 − OFS_ini − LLI)   (3)
Varying all these parameters accounted for close to 130,000 individual duty cycles, with one voltage curve for every 200 cycles (one every 10 cycles for cycles 1-200).
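For reference, equation (1) is straightforward to evaluate; a minimal Python sketch follows (the a and b values below are hypothetical, not those of Table S1):

```python
import math

def pct_degradation(cycle, a, b):
    # Equation (1): %degradation = a * cycle + (exp(b * cycle) - 1)
    return a * cycle + (math.exp(b * cycle) - 1)

# Hypothetical parameters for one degradation mode.
a, b = 0.01, 0.001
for cycle in (0, 1000, 2000, 3000):
    print(cycle, round(pct_degradation(cycle, a, b), 2))
```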
This dataset requires the associated V vs. Q dataset (10.17632/bs2j56pn7y.2) to be functional. See the read-me for details on variables and examples.