Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Excel across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was 5 to 9 years, with a population of 77 (15.28%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.40%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel Population by Age. You can refer to it here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population by year. The dataset can be utilized to understand the population trend of Excel.
The dataset comprises the following datasets
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
https://dataintelo.com/privacy-and-policy
The global big data technology market size was valued at approximately $162 billion in 2023 and is projected to reach around $471 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 12.6% during the forecast period. The growth of this market is primarily driven by the increasing demand for data analytics and insights to enhance business operations, coupled with advancements in AI and machine learning technologies.
One of the principal growth factors of the big data technology market is the rapid digital transformation across various industries. Businesses are increasingly recognizing the value of data-driven decision-making processes, leading to the widespread adoption of big data analytics. Additionally, the proliferation of smart devices and the Internet of Things (IoT) has led to an exponential increase in data generation, necessitating robust big data solutions to analyze and extract meaningful insights. Organizations are leveraging big data to streamline operations, improve customer engagement, and gain a competitive edge.
Another significant growth driver is the advent of advanced technologies like artificial intelligence (AI) and machine learning (ML). These technologies are being integrated into big data platforms to enhance predictive analytics and real-time decision-making capabilities. AI and ML algorithms excel at identifying patterns within large datasets, which can be invaluable for predictive maintenance in manufacturing, fraud detection in banking, and personalized marketing in retail. The combination of big data with AI and ML is enabling organizations to unlock new revenue streams, optimize resource utilization, and improve operational efficiency.
Moreover, regulatory requirements and data privacy concerns are pushing organizations to adopt big data technologies. Governments worldwide are implementing stringent data protection regulations, like the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States. These regulations necessitate robust data management and analytics solutions to ensure compliance and avoid hefty fines. As a result, organizations are investing heavily in big data platforms that offer secure and compliant data handling capabilities.
As organizations continue to navigate the complexities of data management, the role of Big Data Professional Services becomes increasingly critical. These services offer specialized expertise in implementing and managing big data solutions, ensuring that businesses can effectively harness the power of their data. Professional services encompass a range of offerings, including consulting, system integration, and managed services, tailored to meet the unique needs of each organization. By leveraging the knowledge and experience of big data professionals, companies can optimize their data strategies, streamline operations, and achieve their business objectives more efficiently. The demand for these services is driven by the growing complexity of big data ecosystems and the need for seamless integration with existing IT infrastructure.
Regionally, North America holds a dominant position in the big data technology market, primarily due to the early adoption of advanced technologies and the presence of key market players. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by increasing digitalization, the rapid growth of industries such as e-commerce and telecommunications, and supportive government initiatives aimed at fostering technological innovation.
The big data technology market is segmented into software, hardware, and services. The software segment encompasses data management software, analytics software, and data visualization tools, among others. This segment is expected to witness substantial growth due to the increasing demand for data analytics solutions that can handle vast amounts of data. Advanced analytics software, in particular, is gaining traction as organizations seek to gain deeper insights and make data-driven decisions. Companies are increasingly adopting sophisticated data visualization tools to present complex data in an easily understandable format, thereby enhancing decision-making processes.
This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII information has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions are listed in the metadata information within the dataset.

Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.

Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content that is subject to a small percentage of human error.

Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations, and application of business rules, this financial information may differ slightly from official publications on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.

Citation: The Agency's preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.

Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate the data. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize the data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.

If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.

This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.
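In that spirit, here is a minimal R sketch of loading and summarizing the extract outside a spreadsheet. The local file name and the disasterNumber grouping column are assumptions based on typical OpenFEMA extracts, not values guaranteed by this page:

```r
# Minimal sketch: load the large IHP registrations extract with data.table,
# which copes with multi-gigabyte CSVs far better than spreadsheet software.
# The file name and the disasterNumber column are assumptions.
library(data.table)

ihp <- fread("IndividualsAndHouseholdsProgramValidRegistrations.csv")

# Example: count valid registrants per disaster declaration
ihp[, .N, by = disasterNumber][order(-N)]
```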
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data: it records all the transactions that happened over a period of time. The retailer will use the results to grow the business and to offer customers itemset suggestions, so we will be able to increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to find associations between different objects in a set; it works well for finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
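To make the arithmetic concrete, here is a small R sketch that reproduces these numbers:

```r
# Toy example: 100 customers, 10 bought a mouse, 9 a mouse mat, 8 both.
n_total <- 100
n_mouse <- 10
n_mat   <- 9
n_both  <- 8

support    <- n_both / n_total               # P(mouse & mat)     = 0.08
confidence <- support / (n_mouse / n_total)  # support / P(mouse) = 0.80
lift       <- confidence / (n_mat / n_total) # confidence / P(mat) ~ 8.9

c(support = support, confidence = confidence, lift = lift)
```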
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each library below.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
After that, we will clean our data frame and remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all the items bought together on one invoice are grouped into a single transaction.
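As a condensed sketch of this workflow with the readxl and arules packages (the column names Invoice and Item, and the support/confidence thresholds, are my illustrative assumptions, not values fixed by the assignment):

```r
library(readxl)   # read the .xlsx source file
library(arules)   # transactions class and apriori()

# Load the raw invoice data; "Invoice" and "Item" column names are assumptions
raw <- read_excel("Assignment-1_Data.xlsx")
raw <- raw[complete.cases(raw), ]   # remove rows with missing values

# Group items by invoice: each invoice becomes one transaction
# (unique() guards against duplicated items within an invoice)
baskets <- lapply(split(raw$Item, raw$Invoice), unique)
trans   <- as(baskets, "transactions")

# Mine association rules; thresholds are illustrative, not prescribed
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "lift"), 10))
```

Sorting by lift surfaces the itemset suggestions with the strongest associations first.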
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract: Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich's website has only one pictogram. A chemical information tool that identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1000 and 3000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings"; the same approach is used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques with minor modifications could be used to tackle more GHS information, such as pictograms.

Intellectual Merit: This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure.
This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The approach can be extended to safety fingerprints, which are created by converting a non-numeric entity such as GHS information into a binary bit string and comparing the data using the Tanimoto coefficient.

Broader Impact: Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information, such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel, without needing a strong programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and by providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
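As a sketch of the underlying comparison (independent of any particular spreadsheet layout), the Tanimoto coefficient of two bitstrings reduces to a one-line function in R. The nine-bit pictogram fingerprints below are illustrative, echoing the sodium hydroxide example above:

```r
# Tanimoto (Jaccard) coefficient between two binary bitstrings:
# shared "on" bits divided by bits that are "on" in either string.
tanimoto <- function(a, b) sum(a & b) / sum(a | b)

# Hypothetical 9-bit pictogram fingerprints (one bit per GHS pictogram)
# for one chemical as reported by two different SDS sources
fisher <- c(1, 1, 0, 0, 0, 0, 0, 0, 0)  # two pictograms
sigma  <- c(1, 0, 0, 0, 0, 0, 0, 0, 0)  # one pictogram

tanimoto(fisher, sigma)  # 0.5: the two sources only partially agree
```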
Microsoft Excel-based data-reduction and visualization tools (using Visual Basic for Applications) have been developed that allow users to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed to allow their study. The ability to analyze large data sets can yield responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems, and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
https://creativecommons.org/publicdomain/zero/1.0/
If this dataset is useful, an upvote is appreciated. This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the paper source for more details).
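To illustrate the note about G1 and G2, a brief R sketch (assuming the standard semicolon-separated student-mat.csv file from this dataset) compares predicting G3 with and without the earlier period grades:

```r
# Compare predicting the final grade G3 with and without G1/G2.
# student-mat.csv is the Mathematics file; it is semicolon-separated.
d <- read.csv("student-mat.csv", sep = ";")

with_grades    <- lm(G3 ~ G1 + G2 + studytime + absences, data = d)
without_grades <- lm(G3 ~ studytime + absences + failures, data = d)

# G1/G2 dominate: expect a much higher R^2 for the first model
summary(with_grades)$r.squared
summary(without_grades)$r.squared
```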
https://dataintelo.com/privacy-and-policy
The global document databases market size was valued at approximately USD 3.5 billion in 2023 and is projected to reach around USD 8.2 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 9.7% over the forecast period. This impressive growth can be attributed to the increasing demand for more flexible and scalable database solutions that can handle diverse data types and structures.
One of the primary growth factors for the document databases market is the rising adoption of NoSQL databases. Traditional relational databases often struggle with the unstructured data generated by modern applications, social media, and IoT devices. NoSQL databases, such as document databases, offer a more flexible and scalable solution to handle this data, which has led to their increased adoption across various industry verticals. Additionally, the growing popularity of microservices architecture in application development also drives the need for document databases, as they provide the necessary agility and performance.
Another significant growth factor is the increasing volume of data generated globally. With the exponential growth of data, organizations require robust and efficient database management systems to store, process, and analyze vast amounts of information. Document databases excel in managing large volumes of semi-structured and unstructured data, making them an ideal choice for enterprises looking to harness the power of big data analytics. Furthermore, advancements in cloud computing have made it easier for organizations to deploy and scale document databases, further driving their adoption.
The rise of artificial intelligence (AI) and machine learning (ML) technologies is also propelling the growth of the document databases market. AI and ML applications require databases that can handle complex data structures and provide quick access to large datasets for training and inference purposes. Document databases, with their schema-less design and ability to store diverse data types, are well-suited for these applications. As more organizations incorporate AI and ML into their operations, the demand for document databases is expected to grow significantly.
Regionally, North America holds the largest market share for document databases, driven by the presence of major technology companies and a high adoption rate of advanced database solutions. Europe is also a significant market, with growing investments in digital transformation initiatives. The Asia Pacific region is anticipated to witness the highest growth rate during the forecast period, fueled by rapid technological advancements and increasing adoption of cloud-based solutions in countries like China, India, and Japan. Latin America and the Middle East & Africa are also experiencing growth, albeit at a slower pace, due to increasing digitalization efforts and the need for efficient data management solutions.
NoSQL databases, a subset of document databases, have gained significant traction over the past decade. They are designed to handle unstructured and semi-structured data, making them highly versatile and suitable for a wide range of applications. Unlike traditional relational databases, NoSQL databases do not require a predefined schema, allowing for greater flexibility and scalability. This has led to their adoption in industries such as retail, e-commerce, and social media, where the volume and variety of data are constantly changing.
The key advantage of NoSQL databases is their ability to scale horizontally. Traditional relational databases often face challenges when scaling up, as they require more powerful hardware and complex configurations. In contrast, NoSQL databases can easily scale out by adding more servers to the database cluster. This makes them an ideal choice for applications that experience high traffic and require real-time data processing. Companies like Amazon, Facebook, and Google have already adopted NoSQL databases to manage their massive data workloads, setting a precedent for other organizations to follow.
Another driving factor for the adoption of NoSQL databases is their performance in handling large datasets. NoSQL databases are optimized for read and write operations, making them faster and more efficient than traditional relational databases. This is particularly important for applications that require real-time analytics and immediate data access. For instance, e-commerce platforms use NoSQL databases to provide personalized recommendations to users based on their purchase history.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains files related to Sant et al. Nature Genetics 2025 which were too large to include as part of the publication. Below, we describe each file and its contents.
1. Simulated datasets and associated parameters
Simulated_Data_Parameters.xlsx - This file contains the parameters used to create the simulated datasets mentioned below. Briefly, using the R package Splatter, we generated 100 simulated datasets representing 1, 5, 10, or 20 distinct ground-truth cell populations, ranging from 500 to 25,000 cells. To assess how various aspects of snRNA-seq datasets affect CHOIR’s performance, we used five of the simulated datasets produced with Splatter as the baseline to generate 105 additional simulated datasets in which we incrementally reduced the prevalence of rare cell populations, the degree of differential expression, or the library size. Additionally, we generated 10 simulated datasets with multiple batches, with either balanced or imbalanced batch sizes, and 5 simulated datasets using Splatter’s simulation of cell differentiation trajectories. To ensure that our results were not dependent on the software used for data simulation, we also generated 10 datasets with the simulation method scDesign3 from real subsampled PBMC cell populations.
Simulated_Datasets.tar.gz - This tar.gz archive contains the 230 simulated datasets which were used for benchmarking of clustering tools for single-cell analysis in Sant et al. Nature Genetics 2025. The individual datasets have been stored as Seurat objects and combined into a single tar.gz file.
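For orientation, here is a minimal Splatter sketch of the kind of groups-based simulation described above; the parameter values are illustrative stand-ins, not the ones recorded in Simulated_Data_Parameters.xlsx:

```r
library(splatter)  # Bioconductor simulator used for these datasets
library(Seurat)

# Illustrative parameters only; the actual values are recorded in
# Simulated_Data_Parameters.xlsx.
params <- newSplatParams(batchCells = 2000, nGenes = 10000)

# Simulate 5 equally sized ground-truth cell populations
sim <- splatSimulate(params, method = "groups",
                     group.prob = rep(0.2, 5), verbose = FALSE)

# The archived datasets are stored as Seurat objects; a conversion sketch
seu <- CreateSeuratObject(counts = counts(sim))
```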
2. Source data and results for real-world datasets
SourceData1_RealData.xlsx - This Excel file contains the parameters used, the metrics obtained, the cell labels obtained, and any relevant single-cell-resolution results from the analyses of the following real-world datasets: snMultiome human retina (Wang et al. Cell Genomics 2022), atlas-scale snRNA-seq of human brain (Siletti et al. Science 2023), scRNA-seq of mixed cell lines (Kinker et al. Nature Genetics 2020), CITE-seq of human PBMCs (Hao et al. Cell 2021), and sci-Space of mouse embryo (Srivatsan et al. Science 2021).
3. Source data and results for simulated datasets
SourceData2_SimulatedData.xlsx - This Excel file contains the parameters used, the metrics obtained, and the cell labels obtained for all simulated datasets analyzed in Sant et al. Nature Genetics 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the household distribution across 16 income brackets among four distinct age groups in Excel: Under 25 years, 25-44 years, 45-64 years, and over 65 years. The dataset highlights the variation in household income, offering valuable insights into economic trends and disparities within different age categories, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel median household income by age. You can refer to it here.
MealMe provides comprehensive grocery and retail SKU-level product data, including real-time pricing, from the top 100 retailers in the USA and Canada. Our proprietary technology ensures accurate and up-to-date insights, empowering businesses to excel in competitive intelligence, pricing strategies, and market analysis.
Retailers Covered: MealMe’s database includes detailed SKU-level data and pricing from leading grocery and retail chains such as Walmart, Target, Costco, Kroger, Safeway, Publix, Whole Foods, Aldi, ShopRite, BJ’s Wholesale Club, Sprouts Farmers Market, Albertsons, Ralphs, Pavilions, Gelson’s, Vons, Shaw’s, Metro, and many more. Our coverage spans the most influential retailers across North America, ensuring businesses have the insights needed to stay competitive in dynamic markets.
Key Features:
SKU-Level Granularity: Access detailed product-level data, including product descriptions, categories, brands, and variations.
Real-Time Pricing: Monitor current pricing trends across major retailers for comprehensive market comparisons.
Regional Insights: Analyze geographic price variations and inventory availability to identify trends and opportunities.
Customizable Solutions: Tailored data delivery options to meet the specific needs of your business or industry.
Use Cases:
Competitive Intelligence: Gain visibility into pricing, product availability, and assortment strategies of top retailers like Walmart, Costco, and Target.
Pricing Optimization: Use real-time data to create dynamic pricing models that respond to market conditions.
Market Research: Identify trends, gaps, and consumer preferences by analyzing SKU-level data across leading retailers.
Inventory Management: Streamline operations with accurate, real-time inventory availability.
Retail Execution: Ensure on-shelf product availability and compliance with merchandising strategies.
Industries Benefiting from Our Data:
CPG (Consumer Packaged Goods): Optimize product positioning, pricing, and distribution strategies.
E-commerce Platforms: Enhance online catalogs with precise pricing and inventory information.
Market Research Firms: Conduct detailed analyses to uncover industry trends and opportunities.
Retailers: Benchmark against competitors like Kroger and Aldi to refine assortments and pricing.
AI & Analytics Companies: Fuel predictive models and business intelligence with reliable SKU-level data.
Data Delivery and Integration:
MealMe offers flexible integration options, including APIs and custom data exports, for seamless access to real-time data. Whether you need large-scale analysis or continuous updates, our solutions scale with your business needs.
Why Choose MealMe?
Comprehensive Coverage: Data from the top 100 grocery and retail chains in North America, including Walmart, Target, and Costco.
Real-Time Accuracy: Up-to-date pricing and product information ensures a competitive edge.
Customizable Insights: Tailored datasets align with your specific business objectives.
Proven Expertise: Trusted by diverse industries for delivering actionable insights.
MealMe empowers businesses to unlock their full potential with real-time, high-quality grocery and retail data. For more information or to schedule a demo, contact us today!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using Excel Data Analysis Tools and the BigML machine learning platform, we tested correlations in biopsy data for breast cancer and created a model that helps distinguish between benign and malignant tumors. A dataset of oncology patients was used to analyze links between 10 indicators collected by biopsy of non-cancerous and cancerous tumours. The created model can be used as a future medical science tool and can be made available to specially trained histology nurses in rural areas. A model that can detect cancer at early stages is especially important given that detecting cancer at stage IV gives patients a survival rate of only about 22% [1].
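For readers working outside Excel or BigML, a minimal R analogue of such a classifier can be sketched with the classic Wisconsin biopsy dataset shipped in the MASS package (nine biopsy indicators rather than the ten described above; a stand-in, not the study's actual data or model):

```r
library(MASS)  # ships the classic Wisconsin `biopsy` dataset

# Stand-in for the study's data: 9 biopsy indicators (V1-V9) and a
# benign/malignant label; rows with missing values are dropped.
d <- na.omit(biopsy[, c(paste0("V", 1:9), "class")])

# Logistic regression: probability of malignancy given the indicators
model <- glm(class ~ ., data = d, family = binomial)

# Simple confusion table at an illustrative 0.5 threshold
p <- predict(model, type = "response")
table(predicted = ifelse(p > 0.5, "malignant", "benign"),
      actual = d$class)
```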
Note: Data files will be made available upon manuscript publication.

This dataset contains all code and data needed to reproduce the analyses in the manuscript: IDENTIFICATION OF A KEY TARGET FOR ELIMINATION OF NITROUS OXIDE, A MAJOR GREENHOUSE GAS. Blake A. Oakley (1), Trevor Mitchell (2), Quentin D. Read (3), Garrett Hibbs (1), Scott E. Gold (2), Anthony E. Glenn (2). (1) Department of Plant Pathology, University of Georgia, Athens, GA, USA. (2) Toxicology and Mycotoxin Research Unit, U.S. National Poultry Research Center, United States Department of Agriculture-Agricultural Research Service, Athens, GA, USA. (3) Southeast Area, United States Department of Agriculture-Agricultural Research Service, Raleigh, NC, USA. The citation will be updated upon acceptance of the manuscript.

Brief description of study aims
Denitrification is a chemical process that releases nitrous oxide (N2O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in Fusarium. Three experiments were conducted for this study. (1) The N2O comparative experiment compares denitrification rates, as measured by N2O production, of a variety of Fusarium spp. strains with and without the NOR1 gene. (2) The N2O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit comparing N2O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether there is a pleiotropic effect of the NOR1 gene. In this portion of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene.

Code
All code is included in a .zip archive generated from a private git repository on 2022-10-13 and archived as part of this dataset. The code is contained in R scripts and RMarkdown notebooks. There are two components to the analysis: the denitrification analysis (comprising parts 1 and 2 described above) and the Bioscreen growth analysis (part 3). The scripts for each are listed and described below.

Analysis of results of denitrification experiments (parts 1 and 2):
NOR1_denitrification_analysis.Rmd: The R code to analyze the experimental data comparing nitrous oxide emissions is all contained in a single RMarkdown notebook. This script analyzes the results from the comparative study and the substrate study.
n2o_subgroup_figures.R: R script to create additional figures using the output from the RMarkdown notebook.

Analysis of results of Bioscreen growth assay (part 3):
bioscreen_analysis.Rmd: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It could be run as is. However, the model-fitting portion was run on a high-performance computing cluster with the following scripts:
bioscreen_fit_simpler.R: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.
job_bssimple.sh: Job submission shell script used to submit the model-fitting R job to be run on the USDA SciNet high-performance computing cluster.

Additional scripts developed as part of the analysis, but not required to reproduce the analyses in the manuscript, are in the deprecated/ folder. Also note the files nor1-denitrification.Rproj (RStudio project file) and gtstyle.css (stylesheet for formatting the tables in the notebooks) are included.
Data
Data required to run the analysis scripts are archived in this dataset, other than strain_lookup.csv, a lookup table of strain abbreviations and full names included in the code repository for convenience. They should be placed in a folder or symbolic link called project within the unzipped code repository directory.
N2O_data_2022-08-03/N2O_Comparative_Study_Trial_(n)(date range).xlsx: These are the data from the N2O comparative study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
N2O_data_2022-08-03/Nitrogen_Substrate_Study_Trial(n)(date range).xlsx: These are the data from the N2O substrate study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
Outliers_NOR1_2022/Bioscreen_NOR1_Fungal_Growth_Assay(substrate)(oxygen level)_Outliers_BAO(date).xlsx: These are the raw Bioscreen data files in MS Excel format. The format of each file name includes the substrate (minimal medium with nitrite or nitrate and lysine), oxygen level (hypoxia or normoxia), and date of the run. This repository includes code to process these files, but the processed data are also included on Ag Data Commons, so it is not necessary to run the data processing portion of the code.
clean_data/bioscreen_clean_data.csv: This is an intermediate output file in CSV format generated by bioscreen_analysis.Rmd. It includes all the data from the Bioscreen assays in a clean, analysis-ready format.
http://opendatacommons.org/licenses/dbcl/1.0/
This is a compiled dataset comprising data from various companies' 10-K annual reports and balance sheets. The data is longitudinal (panel) data from 2009 to 2022(/23) and also includes a few bankrupt companies to help investigate contributing factors. The names of the companies are given according to their stock tickers. Companies are divided into specific categories.
We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts. The proposed network is built starting from two distinct data sources:
the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets);
the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).
By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset, and authors with no match (e.g., because they are not part of an Italian university) have been discarded. The remaining authors compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one paper they have co-authored.

Data cleaning and enrichment through data integration: networking the Italian academia
https://doi.org/10.5061/dryad.wpzgmsbwj
Manuscript published in Scientific Data with DOI .
This repository contains two main data files:
edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);
Coauthorship_Network_AGG.graphml, the full network in GraphML format;
along with several supplementary data files, listed below, useful only to build the network (i.e., for reproducibility only):
University-City-match.xlsx, an Excel file that maps the name of a university against the city where its headquarters is located;
Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.
The `Coauthorship_Networ...
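A short R sketch for loading the two main files (assuming they have been downloaded to the working directory; igraph's GraphML reader preserves the semantic node and edge attributes described above):

```r
library(igraph)

# Full network in GraphML format, with node/edge attributes included
g <- read_graph("Coauthorship_Network_AGG.graphml", format = "graphml")

# Basic sanity checks against the figures reported above
vcount(g)  # expected: 38,220 nodes
ecount(g)  # expected: 507,050 edges

# The CSV edge list (mainly temporal co-authorship information)
edges <- read.csv("edge_data_AGG.csv")
head(edges)
```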
Success.ai’s Ecommerce Store Data for the APAC E-commerce Sector provides a reliable and accurate dataset tailored for businesses aiming to connect with e-commerce professionals and organizations across the Asia-Pacific region. Covering roles and businesses involved in online retail, marketplace management, logistics, and digital commerce, this dataset includes verified business profiles, decision-maker contact details, and actionable insights.
With access to continuously updated, AI-validated data and over 700 million global profiles, Success.ai ensures your outreach, market analysis, and partnership strategies are effective and data-driven. Backed by our Best Price Guarantee, this solution helps you excel in one of the world’s fastest-growing e-commerce markets.
Why Choose Success.ai’s Ecommerce Store Data?
Verified Profiles for Precision Engagement
Comprehensive Coverage of the APAC E-commerce Sector
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Comprehensive E-commerce Business Profiles
Advanced Filters for Precision Campaigns
Regional and Sector-specific Insights
AI-Driven Enrichment
Strategic Use Cases:
Marketing Campaigns and Outreach
Partnership Development and Vendor Collaboration
Market Research and Competitive Analysis
Recruitment and Talent Acquisition
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.
The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to predict is called failure, with binary value 0 for non-failure and 1 for failure.
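A minimal R sketch of a baseline for this task; the file name "predictive_maintenance.csv" and the 0.5 decision threshold are assumptions, and any identifier or date columns should be dropped before fitting (none are assumed here):

```r
# Baseline classifier sketch for the device-failure task.
# File name and threshold are assumptions, not part of the dataset spec.
d <- read.csv("predictive_maintenance.csv")

# Logistic regression on all remaining columns; failure is coded 0/1
model <- glm(failure ~ ., data = d, family = binomial)
p <- predict(model, type = "response")

# Confusion table at an illustrative 0.5 threshold; in practice the
# threshold is tuned to balance false positives against false negatives
table(predicted = as.integer(p > 0.5), actual = d$failure)
```

Because device failures are typically rare, handling class imbalance (weighting or resampling) and tuning the decision threshold usually matter more here than the choice of model.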
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel household income by gender. The dataset can be utilized to understand the gender-based income distribution in Excel.
The dataset includes the following datasets, when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Excel income distribution by gender. You can refer to it here.