The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
The global big data technology market size was valued at approximately $162 billion in 2023 and is projected to reach around $471 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 12.6% during the forecast period. The growth of this market is primarily driven by the increasing demand for data analytics and insights to enhance business operations, coupled with advancements in AI and machine learning technologies.
One of the principal growth factors of the big data technology market is the rapid digital transformation across various industries. Businesses are increasingly recognizing the value of data-driven decision-making processes, leading to the widespread adoption of big data analytics. Additionally, the proliferation of smart devices and the Internet of Things (IoT) has led to an exponential increase in data generation, necessitating robust big data solutions to analyze and extract meaningful insights. Organizations are leveraging big data to streamline operations, improve customer engagement, and gain a competitive edge.
Another significant growth driver is the advent of advanced technologies like artificial intelligence (AI) and machine learning (ML). These technologies are being integrated into big data platforms to enhance predictive analytics and real-time decision-making capabilities. AI and ML algorithms excel at identifying patterns within large datasets, which can be invaluable for predictive maintenance in manufacturing, fraud detection in banking, and personalized marketing in retail. The combination of big data with AI and ML is enabling organizations to unlock new revenue streams, optimize resource utilization, and improve operational efficiency.
Moreover, regulatory requirements and data privacy concerns are pushing organizations to adopt big data technologies. Governments worldwide are implementing stringent data protection regulations, like the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States. These regulations necessitate robust data management and analytics solutions to ensure compliance and avoid hefty fines. As a result, organizations are investing heavily in big data platforms that offer secure and compliant data handling capabilities.
As organizations continue to navigate the complexities of data management, the role of Big Data Professional Services becomes increasingly critical. These services offer specialized expertise in implementing and managing big data solutions, ensuring that businesses can effectively harness the power of their data. Professional services encompass a range of offerings, including consulting, system integration, and managed services, tailored to meet the unique needs of each organization. By leveraging the knowledge and experience of big data professionals, companies can optimize their data strategies, streamline operations, and achieve their business objectives more efficiently. The demand for these services is driven by the growing complexity of big data ecosystems and the need for seamless integration with existing IT infrastructure.
Regionally, North America holds a dominant position in the big data technology market, primarily due to the early adoption of advanced technologies and the presence of key market players. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by increasing digitalization, the rapid growth of industries such as e-commerce and telecommunications, and supportive government initiatives aimed at fostering technological innovation.
The big data technology market is segmented into software, hardware, and services. The software segment encompasses data management software, analytics software, and data visualization tools, among others. This segment is expected to witness substantial growth due to the increasing demand for data analytics solutions that can handle vast amounts of data. Advanced analytics software, in particular, is gaining traction as organizations seek to gain deeper insights and make data-driven decisions. Companies are increasingly adopting sophisticated data visualization tools to present complex data in an easily understandable format, thereby enhancing decision-making processes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 3 rows and is filtered where the books is Beginning big data with Power BI and Excel 2013 : big data processing and analysis using Power BI in Excel 2013. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was for the group of age 5 to 9 years years with a population of 77 (15.28%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.40%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.
IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform
The following pages can help you gain the understanding of the intellectual property administration and processes in Australia to help your analysis on the dataset.
Due to the changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The global spreadsheet software market is experiencing robust growth, driven by the increasing adoption of cloud-based solutions and the rising demand for data analysis tools across various industries. The market, estimated at $50 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching approximately $150 billion by the end of the forecast period. This growth is fueled by several key factors. Firstly, the increasing reliance on data-driven decision-making across businesses, irrespective of size, necessitates efficient data management and analysis capabilities provided by spreadsheet software. Secondly, the proliferation of cloud-based spreadsheet applications offers enhanced collaboration, accessibility, and scalability, making them attractive to organizations of all sizes. Finally, continuous advancements in features like advanced analytics, data visualization, and integration with other business applications enhance the overall utility and appeal of these tools. Major players like Microsoft, Google, and Zoho are continuously innovating, adding new features and improving user experience to maintain their market leadership. However, the market also faces challenges. Security concerns related to data storage and access in cloud-based solutions, and the need for continuous training and upskilling to leverage advanced features, pose limitations to wider adoption. Despite these challenges, the long-term outlook for the spreadsheet software market remains positive. The increasing digitization of businesses and the expanding adoption of big data analytics will propel demand for sophisticated spreadsheet tools. The emergence of niche players focusing on specific industry needs and specialized functionalities will also contribute to market expansion. Competition will remain fierce among established players and newcomers, prompting innovation and improvement in the overall product offerings. The market will witness consolidation through mergers and acquisitions, and a shift towards subscription-based models, further driving market growth and shaping the competitive landscape. The geographic distribution of the market will see continued growth in developing economies, driven by increasing internet penetration and smartphone adoption.
The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.
For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.
For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.
For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.
Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.AbstractOver a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS became widely accepted internationally and has become the cornerstone of OSHA’s Hazard Communication Standard. Despite this progress, today we observe that there are inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research uses an extension of the “chemical fingerprints” used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is the sources for GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific’s website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich’s website has only one pictogram. A chemical information tool, which identifies such discrepancies within a specific chemical inventory, can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderate large research lab or small chemistry department as a whole (between 1000 and 3000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.Most chemists are familiar with programs such as excel and google sheets which are spreadsheet programs that are used by many chemists daily. Though a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the data collected rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions will be used to converts GHS information into binary strings of data called “bitstrings”. This approach is also used when comparing chemical structures. The binary approach make data analysis more manageable, as GHS information comes in a variety of formats such as pictures or alphanumeric strings which are difficult to compare on their face. Bitstrings generated using the GHS information can be compared using an operator such as the tanimoto coefficent to yield values from 0 for strings that have no similarity to 1 for strings that are the same. Once a particular set of information is analyzed the hope is the same techniques could be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach the same techniques with minor modifications could be used to tackle more GHS information such as pictograms.Intellectual Merit. This research indicates that the use of the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings that are obtained from the non-numeric entity of 2D structure. This structural fingerprint allows comparison of 2D structure through the use of the tanimoto coefficient. The use of this structural fingerprint can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the tanimoto coefficient.Broader Impact. Extension of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other bits of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to be able to compare large dataset using spreadsheet programs such as excel and not need a large programming background. Development of this technique will also benefit the Chemical Health and Safety community and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heterogenous Big dataset is presented in this proposed work: electrocardiogram (ECG) signal, blood pressure signal, oxygen saturation (SpO2) signal, and the text input. This work is an extension version for our relevant formulating of dataset that presented in [1] and a trustworthy and relevant medical dataset library (PhysioNet [2]) was used to acquire these signals. The dataset includes medical features from heterogenous sources (sensory data and non-sensory). Firstly, ECG sensor’s signals which contains QRS width, ST elevation, peak numbers, and cycle interval. Secondly: SpO2 level from SpO2 sensor’s signals. Third, blood pressure sensors’ signals which contain high (systolic) and low (diastolic) values and finally text input which consider non-sensory data. The text inputs were formulated based on doctors diagnosing procedures for heart chronic diseases. Python software environment was used, and the simulated big data is presented along with analyses.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population by year. The dataset can be utilized to understand the population trend of Excel.
The dataset constitues the following datasets
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel household income by gender. The dataset can be utilized to understand the gender-based income distribution of Excel income.
The dataset will have the following datasets when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Excel income distribution by gender. You can refer the same here
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
The global spreadsheets software market is experiencing robust growth, driven by increasing digitalization across industries and the rising adoption of cloud-based solutions. The market, estimated at $20 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 8% from 2025 to 2033, reaching approximately $35 billion by 2033. This growth is fueled by several factors, including the expanding need for data analysis and visualization across SMEs and large enterprises, the increasing availability of user-friendly and feature-rich spreadsheet software, and the growing preference for collaborative tools that facilitate seamless teamwork. The market is segmented by operating system (Windows, Macintosh, Linux, Others) and user type (SMEs, Large Enterprises). While Microsoft Excel maintains a dominant market share, the rise of cloud-based alternatives like Google Sheets and collaborative platforms is challenging this dominance, fostering competition and innovation. Furthermore, the increasing integration of spreadsheets with other business intelligence tools further enhances their utility and fuels demand. Geographic expansion, particularly in developing economies with rising internet penetration, also contributes significantly to market expansion. However, factors such as the high initial investment in software licenses and the need for specialized training can restrain market growth, particularly for smaller businesses with limited budgets and technical expertise. The increasing complexity of data analysis necessitates enhanced security features and data protection measures, which add cost and require ongoing investment. Moreover, the emergence of advanced analytical tools and specialized data visualization software presents a competitive challenge, demanding continuous innovation and adaptation from existing spreadsheet software providers. Nevertheless, the overall market outlook remains positive, driven by sustained demand from diverse industries and technological advancements within the software landscape.
https://www.skyquestt.com/privacy/https://www.skyquestt.com/privacy/
Global Business Intelligence and Analytics Software Market size was valued at USD 21.56 Billion in 2022 and is poised to grow from USD 23.38 Billion in 2023 to USD 44.78 Billion by 2031, growing at a CAGR of 8.46% in the forecast period (2024-2031).
The utilisation of machine learning techniques in the analysis of big data has become a standard practice for forecasting the outcomes of various sectors of the economy. This scientific article analyses changes in electricity consumption volumes in the Republic of Kazakhstan from 2002 to 2022. The study examines the dynamics of growth or reduction in consumption volumes. Data was prepared for analysis and model construction by employing outlier assessment methods. The work considers the Holt-Winters, SARIMA, and LSTM models, comparing their advantages and disadvantages. Accuracy metrics for forecasts and the Dickey-Fuller test for stationarity are also assessed. The application of forecasting models yielded diverse output data for predictive values, and it also examined the quality of training neural networks for analysing big data, specifically electricity consumption volumes. As an alternative, recurrent neural networks (RNNs) were utilised. The study revealed that this category of neural networks achieves results that closely approximate actual data and ensures precise training of neural networks for forecasting tasks. Moreover, the article explores the bidirectional RNN, a variation of recurrent networks that offers superior problem-solving quality in some instances. Recurrent networks process input sequences sequentially, and any alteration in data sequence can completely transform the representation extracted from the sequence. Hence, they excel in tasks where sequence order matters, such as temperature forecasting.
The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl..., SLA data was downlaoded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data has not been processed in any way, but additional columns have been added to the datset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data., , There are two additional documents associated with this publication. One is a word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an excel document that contains the SLA data that was downloaded from TRY and all associated metadata.
Missing data codes: NA and N/A
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The Key-Value Database (KVD) market, currently valued at $841 million in 2025, is projected to experience robust growth, driven by the increasing demand for high-performance, scalable data storage solutions across various industries. The 7.5% CAGR from 2025 to 2033 indicates a substantial market expansion, fueled by several key factors. The rising adoption of cloud computing and big data analytics necessitates efficient data management systems, with KVDs offering speed and scalability unmatched by traditional relational databases. Furthermore, the growing popularity of real-time applications, such as IoT devices and online gaming, significantly contributes to the market's expansion, as KVDs excel in handling large volumes of rapidly changing data. The proliferation of mobile devices and the expansion of the digital economy further reinforces the market's growth trajectory. Competition among major players like AWS, Azure, Redis, and others fosters innovation and drives down costs, making KVD solutions accessible to a broader range of businesses. However, potential security concerns and the need for specialized expertise to manage and maintain KVD systems could act as minor restraints on market growth. The market segmentation, while not explicitly detailed, likely includes variations based on deployment model (cloud, on-premise), database type (NoSQL, in-memory), industry verticals (finance, healthcare, e-commerce), and geographic regions. Companies are constantly innovating to enhance features like data consistency, fault tolerance, and data encryption. The forecast period (2025-2033) presents significant opportunities for existing players to consolidate their market share and for new entrants to disrupt the market with innovative technologies. Continued focus on improving performance, security, and ease of use will be critical to success within this rapidly evolving landscape. The historical period (2019-2024) showcases a period of growing adoption, providing a solid foundation for the anticipated growth during the forecast period.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
https://i.imgur.com/cUFuMeU.png" alt="">
The dataset provided here is a simulated example and was generated using the online platform found at Mockaroo. This web-based tool offers a service that enables the creation of customizable Synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel