Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Excel across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was 5 to 9 years, with a population of 77 (15.28%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.40%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel Population by Age. You can refer to it here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population by year. The dataset can be utilized to understand the population trend of Excel.
The dataset comprises the following datasets
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
https://dataintelo.com/privacy-and-policy
The global big data technology market size was valued at approximately $162 billion in 2023 and is projected to reach around $471 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 12.6% during the forecast period. The growth of this market is primarily driven by the increasing demand for data analytics and insights to enhance business operations, coupled with advancements in AI and machine learning technologies.
One of the principal growth factors of the big data technology market is the rapid digital transformation across various industries. Businesses are increasingly recognizing the value of data-driven decision-making processes, leading to the widespread adoption of big data analytics. Additionally, the proliferation of smart devices and the Internet of Things (IoT) has led to an exponential increase in data generation, necessitating robust big data solutions to analyze and extract meaningful insights. Organizations are leveraging big data to streamline operations, improve customer engagement, and gain a competitive edge.
Another significant growth driver is the advent of advanced technologies like artificial intelligence (AI) and machine learning (ML). These technologies are being integrated into big data platforms to enhance predictive analytics and real-time decision-making capabilities. AI and ML algorithms excel at identifying patterns within large datasets, which can be invaluable for predictive maintenance in manufacturing, fraud detection in banking, and personalized marketing in retail. The combination of big data with AI and ML is enabling organizations to unlock new revenue streams, optimize resource utilization, and improve operational efficiency.
Moreover, regulatory requirements and data privacy concerns are pushing organizations to adopt big data technologies. Governments worldwide are implementing stringent data protection regulations, like the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States. These regulations necessitate robust data management and analytics solutions to ensure compliance and avoid hefty fines. As a result, organizations are investing heavily in big data platforms that offer secure and compliant data handling capabilities.
As organizations continue to navigate the complexities of data management, the role of Big Data Professional Services becomes increasingly critical. These services offer specialized expertise in implementing and managing big data solutions, ensuring that businesses can effectively harness the power of their data. Professional services encompass a range of offerings, including consulting, system integration, and managed services, tailored to meet the unique needs of each organization. By leveraging the knowledge and experience of big data professionals, companies can optimize their data strategies, streamline operations, and achieve their business objectives more efficiently. The demand for these services is driven by the growing complexity of big data ecosystems and the need for seamless integration with existing IT infrastructure.
Regionally, North America holds a dominant position in the big data technology market, primarily due to the early adoption of advanced technologies and the presence of key market players. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by increasing digitalization, the rapid growth of industries such as e-commerce and telecommunications, and supportive government initiatives aimed at fostering technological innovation.
The big data technology market is segmented into software, hardware, and services. The software segment encompasses data management software, analytics software, and data visualization tools, among others. This segment is expected to witness substantial growth due to the increasing demand for data analytics solutions that can handle vast amounts of data. Advanced analytics software, in particular, is gaining traction as organizations seek to gain deeper insights and make data-driven decisions. Companies are increasingly adopting sophisticated data visualization tools to present complex data in an easily understandable format, thereby enhancing decision-making processes.
This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII information has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions are listed in the metadata information within the dataset.

Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.

Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content that is subject to a small percentage of human error.

Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations, and application of business rules, this financial information may differ slightly from official publications on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.

Citation: The Agency's preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.

Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate the data. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize the data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.

If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.

This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.
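In that spirit, here is a minimal R sketch of loading and summarizing the extract outside a spreadsheet. The local file name and the disasterNumber grouping column are assumptions based on typical OpenFEMA extracts, not values guaranteed by this page:

```r
# Minimal sketch: load the large IHP registrations extract with data.table,
# which copes with multi-gigabyte CSVs far better than spreadsheet software.
# The file name and the disasterNumber column are assumptions.
library(data.table)

ihp <- fread("IndividualsAndHouseholdsProgramValidRegistrations.csv")

# Example: count valid registrants per disaster declaration
ihp[, .N, by = disasterNumber][order(-N)]
```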
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data: it records all the transactions that happened over a period of time. The retailer will use the results to grow the business and to offer customers itemset suggestions, so we will be able to increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to find associations between different objects in a set; it works well for finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
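To make the arithmetic concrete, here is a small R sketch that reproduces these numbers:

```r
# Toy example: 100 customers, 10 bought a mouse, 9 a mouse mat, 8 both.
n_total <- 100
n_mouse <- 10
n_mat   <- 9
n_both  <- 8

support    <- n_both / n_total               # P(mouse & mat)     = 0.08
confidence <- support / (n_mouse / n_total)  # support / P(mouse) = 0.80
lift       <- confidence / (n_mat / n_total) # confidence / P(mat) ~ 8.9

c(support = support, confidence = confidence, lift = lift)
```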
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each library below.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
After that, we will clean our data frame and remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all the items bought together on one invoice are grouped into a single transaction.
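As a condensed sketch of this workflow with the readxl and arules packages (the column names Invoice and Item, and the support/confidence thresholds, are my illustrative assumptions, not values fixed by the assignment):

```r
library(readxl)   # read the .xlsx source file
library(arules)   # transactions class and apriori()

# Load the raw invoice data; "Invoice" and "Item" column names are assumptions
raw <- read_excel("Assignment-1_Data.xlsx")
raw <- raw[complete.cases(raw), ]   # remove rows with missing values

# Group items by invoice: each invoice becomes one transaction
# (unique() guards against duplicated items within an invoice)
baskets <- lapply(split(raw$Item, raw$Invoice), unique)
trans   <- as(baskets, "transactions")

# Mine association rules; thresholds are illustrative, not prescribed
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "lift"), 10))
```

Sorting by lift surfaces the itemset suggestions with the strongest associations first.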
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.

Abstract: Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and has become the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich's website has only one pictogram. A chemical information tool that identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1000 and 3000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.

Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings"; the same approach is used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques with minor modifications could be used to tackle more GHS information, such as pictograms.

Intellectual Merit: This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure.
This structural fingerprint allows comparison of 2D structures through the use of the Tanimoto coefficient. The approach can be extended to safety fingerprints, which are created by converting a non-numeric entity such as GHS information into a binary bit string and comparing the data using the Tanimoto coefficient.

Broader Impact: Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other pieces of GHS information, such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel, without needing a strong programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of GHS information available and by providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
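As a sketch of the underlying comparison (independent of any particular spreadsheet layout), the Tanimoto coefficient of two bitstrings reduces to a one-line function in R. The nine-bit pictogram fingerprints below are illustrative, echoing the sodium hydroxide example above:

```r
# Tanimoto (Jaccard) coefficient between two binary bitstrings:
# shared "on" bits divided by bits that are "on" in either string.
tanimoto <- function(a, b) sum(a & b) / sum(a | b)

# Hypothetical 9-bit pictogram fingerprints (one bit per GHS pictogram)
# for one chemical as reported by two different SDS sources
fisher <- c(1, 1, 0, 0, 0, 0, 0, 0, 0)  # two pictograms
sigma  <- c(1, 0, 0, 0, 0, 0, 0, 0, 0)  # one pictogram

tanimoto(fisher, sigma)  # 0.5: the two sources only partially agree
```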
Microsoft Excel-based data-reduction and visualization tools (using Visual Basic for Applications) have been developed that allow users to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed to allow their study. The ability to analyze large data sets can yield responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems, and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
https://creativecommons.org/publicdomain/zero/1.0/
If this dataset is useful, an upvote is appreciated. This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the paper source for more details).
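To illustrate the note about G1 and G2, a brief R sketch (assuming the standard semicolon-separated student-mat.csv file from this dataset) compares predicting G3 with and without the earlier period grades:

```r
# Compare predicting the final grade G3 with and without G1/G2.
# student-mat.csv is the Mathematics file; it is semicolon-separated.
d <- read.csv("student-mat.csv", sep = ";")

with_grades    <- lm(G3 ~ G1 + G2 + studytime + absences, data = d)
without_grades <- lm(G3 ~ studytime + absences + failures, data = d)

# G1/G2 dominate: expect a much higher R^2 for the first model
summary(with_grades)$r.squared
summary(without_grades)$r.squared
```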
https://dataintelo.com/privacy-and-policy
The global document databases market size was valued at approximately USD 3.5 billion in 2023 and is projected to reach around USD 8.2 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 9.7% over the forecast period. This impressive growth can be attributed to the increasing demand for more flexible and scalable database solutions that can handle diverse data types and structures.
One of the primary growth factors for the document databases market is the rising adoption of NoSQL databases. Traditional relational databases often struggle with the unstructured data generated by modern applications, social media, and IoT devices. NoSQL databases, such as document databases, offer a more flexible and scalable solution to handle this data, which has led to their increased adoption across various industry verticals. Additionally, the growing popularity of microservices architecture in application development also drives the need for document databases, as they provide the necessary agility and performance.
Another significant growth factor is the increasing volume of data generated globally. With the exponential growth of data, organizations require robust and efficient database management systems to store, process, and analyze vast amounts of information. Document databases excel in managing large volumes of semi-structured and unstructured data, making them an ideal choice for enterprises looking to harness the power of big data analytics. Furthermore, advancements in cloud computing have made it easier for organizations to deploy and scale document databases, further driving their adoption.
The rise of artificial intelligence (AI) and machine learning (ML) technologies is also propelling the growth of the document databases market. AI and ML applications require databases that can handle complex data structures and provide quick access to large datasets for training and inference purposes. Document databases, with their schema-less design and ability to store diverse data types, are well-suited for these applications. As more organizations incorporate AI and ML into their operations, the demand for document databases is expected to grow significantly.
Regionally, North America holds the largest market share for document databases, driven by the presence of major technology companies and a high adoption rate of advanced database solutions. Europe is also a significant market, with growing investments in digital transformation initiatives. The Asia Pacific region is anticipated to witness the highest growth rate during the forecast period, fueled by rapid technological advancements and increasing adoption of cloud-based solutions in countries like China, India, and Japan. Latin America and the Middle East & Africa are also experiencing growth, albeit at a slower pace, due to increasing digitalization efforts and the need for efficient data management solutions.
NoSQL databases, a subset of document databases, have gained significant traction over the past decade. They are designed to handle unstructured and semi-structured data, making them highly versatile and suitable for a wide range of applications. Unlike traditional relational databases, NoSQL databases do not require a predefined schema, allowing for greater flexibility and scalability. This has led to their adoption in industries such as retail, e-commerce, and social media, where the volume and variety of data are constantly changing.
The key advantage of NoSQL databases is their ability to scale horizontally. Traditional relational databases often face challenges when scaling up, as they require more powerful hardware and complex configurations. In contrast, NoSQL databases can easily scale out by adding more servers to the database cluster. This makes them an ideal choice for applications that experience high traffic and require real-time data processing. Companies like Amazon, Facebook, and Google have already adopted NoSQL databases to manage their massive data workloads, setting a precedent for other organizations to follow.
Another driving factor for the adoption of NoSQL databases is their performance in handling large datasets. NoSQL databases are optimized for read and write operations, making them faster and more efficient than traditional relational databases. This is particularly important for applications that require real-time analytics and immediate data access. For instance, e-commerce platforms use NoSQL databases to provide personalized recommendations to users based on their purchase history.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains files related to Sant et al. Nature Genetics 2025 which were too large to include as part of the publication. Below, we describe each file and its contents.
1. Simulated datasets and associated parameters
Simulated_Data_Parameters.xlsx - This file contains the parameters used to create the simulated datasets mentioned below. Briefly, using the R package Splatter, we generated 100 simulated datasets representing 1, 5, 10, or 20 distinct ground-truth cell populations, ranging from 500 to 25,000 cells. To assess how various aspects of snRNA-seq datasets affect CHOIR’s performance, we used five of the simulated datasets produced with Splatter as the baseline to generate 105 additional simulated datasets in which we incrementally reduced the prevalence of rare cell populations, the degree of differential expression, or the library size. Additionally, we generated 10 simulated datasets with multiple batches, with either balanced or imbalanced batch sizes, and 5 simulated datasets using Splatter’s simulation of cell differentiation trajectories. To ensure that our results were not dependent on the software used for data simulation, we also generated 10 datasets with the simulation method scDesign3 from real subsampled PBMC cell populations.
Simulated_Datasets.tar.gz - This tar.gz archive contains the 230 simulated datasets which were used for benchmarking of clustering tools for single-cell analysis in Sant et al. Nature Genetics 2025. The individual datasets have been stored as Seurat objects and combined into a single tar.gz file.
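For orientation, here is a minimal Splatter sketch of the kind of groups-based simulation described above; the parameter values are illustrative stand-ins, not the ones recorded in Simulated_Data_Parameters.xlsx:

```r
library(splatter)  # Bioconductor simulator used for these datasets
library(Seurat)

# Illustrative parameters only; the actual values are recorded in
# Simulated_Data_Parameters.xlsx.
params <- newSplatParams(batchCells = 2000, nGenes = 10000)

# Simulate 5 equally sized ground-truth cell populations
sim <- splatSimulate(params, method = "groups",
                     group.prob = rep(0.2, 5), verbose = FALSE)

# The archived datasets are stored as Seurat objects; a conversion sketch
seu <- CreateSeuratObject(counts = counts(sim))
```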
2. Source data and results for real-world datasets
SourceData1_RealData.xlsx - This Excel file contains the parameters used, the metrics obtained, the cell labels obtained, and any relevant single-cell-resolution results from the analyses of the following real-world datasets: snMultiome human retina (Wang et al. Cell Genomics 2022), atlas-scale snRNA-seq of human brain (Siletti et al. Science 2023), scRNA-seq of mixed cell lines (Kinker et al. Nature Genetics 2020), CITE-seq of human PBMCs (Hao et al. Cell 2021), and sci-Space of mouse embryo (Srivatsan et al. Science 2021).
3. Source data and results for simulated datasets
SourceData2_SimulatedData.xlsx - This Excel file contains the parameters used, the metrics obtained, and the cell labels obtained for all simulated datasets analyzed in Sant et al. Nature Genetics 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the household distribution across 16 income brackets among four distinct age groups in Excel: Under 25 years, 25-44 years, 45-64 years, and over 65 years. The dataset highlights the variation in household income, offering valuable insights into economic trends and disparities within different age categories, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Excel median household income by age. You can refer to it here.
MealMe provides comprehensive grocery and retail SKU-level product data, including real-time pricing, from the top 100 retailers in the USA and Canada. Our proprietary technology ensures accurate and up-to-date insights, empowering businesses to excel in competitive intelligence, pricing strategies, and market analysis.
Retailers Covered: MealMe’s database includes detailed SKU-level data and pricing from leading grocery and retail chains such as Walmart, Target, Costco, Kroger, Safeway, Publix, Whole Foods, Aldi, ShopRite, BJ’s Wholesale Club, Sprouts Farmers Market, Albertsons, Ralphs, Pavilions, Gelson’s, Vons, Shaw’s, Metro, and many more. Our coverage spans the most influential retailers across North America, ensuring businesses have the insights needed to stay competitive in dynamic markets.
Key Features:
SKU-Level Granularity: Access detailed product-level data, including product descriptions, categories, brands, and variations.
Real-Time Pricing: Monitor current pricing trends across major retailers for comprehensive market comparisons.
Regional Insights: Analyze geographic price variations and inventory availability to identify trends and opportunities.
Customizable Solutions: Tailored data delivery options to meet the specific needs of your business or industry.
Use Cases:
Competitive Intelligence: Gain visibility into pricing, product availability, and assortment strategies of top retailers like Walmart, Costco, and Target.
Pricing Optimization: Use real-time data to create dynamic pricing models that respond to market conditions.
Market Research: Identify trends, gaps, and consumer preferences by analyzing SKU-level data across leading retailers.
Inventory Management: Streamline operations with accurate, real-time inventory availability.
Retail Execution: Ensure on-shelf product availability and compliance with merchandising strategies.
Industries Benefiting from Our Data:
CPG (Consumer Packaged Goods): Optimize product positioning, pricing, and distribution strategies.
E-commerce Platforms: Enhance online catalogs with precise pricing and inventory information.
Market Research Firms: Conduct detailed analyses to uncover industry trends and opportunities.
Retailers: Benchmark against competitors like Kroger and Aldi to refine assortments and pricing.
AI & Analytics Companies: Fuel predictive models and business intelligence with reliable SKU-level data.
Data Delivery and Integration:
MealMe offers flexible integration options, including APIs and custom data exports, for seamless access to real-time data. Whether you need large-scale analysis or continuous updates, our solutions scale with your business needs.
Why Choose MealMe?
Comprehensive Coverage: Data from the top 100 grocery and retail chains in North America, including Walmart, Target, and Costco.
Real-Time Accuracy: Up-to-date pricing and product information ensures a competitive edge.
Customizable Insights: Tailored datasets align with your specific business objectives.
Proven Expertise: Trusted by diverse industries for delivering actionable insights.
MealMe empowers businesses to unlock their full potential with real-time, high-quality grocery and retail data. For more information or to schedule a demo, contact us today!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using Excel Data Analysis Tools and the BigML machine learning platform, we tested correlations in biopsy data for breast cancer and created a model that helps distinguish between benign and malignant tumors. A dataset of oncology patients was used to analyze links between 10 indicators collected by biopsy of non-cancerous and cancerous tumours. The created model can be used as a future medical science tool and can be made available to specially trained histology nurses in rural areas. A model that can detect cancer at early stages is especially important given that detecting cancer at stage IV gives patients a survival rate of only about 22% [1].
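For readers working outside Excel or BigML, a minimal R analogue of such a classifier can be sketched with the classic Wisconsin biopsy dataset shipped in the MASS package (nine biopsy indicators rather than the ten described above; a stand-in, not the study's actual data or model):

```r
library(MASS)  # ships the classic Wisconsin `biopsy` dataset

# Stand-in for the study's data: 9 biopsy indicators (V1-V9) and a
# benign/malignant label; rows with missing values are dropped.
d <- na.omit(biopsy[, c(paste0("V", 1:9), "class")])

# Logistic regression: probability of malignancy given the indicators
model <- glm(class ~ ., data = d, family = binomial)

# Simple confusion table at an illustrative 0.5 threshold
p <- predict(model, type = "response")
table(predicted = ifelse(p > 0.5, "malignant", "benign"),
      actual = d$class)
```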
Note: Data files will be made available upon manuscript publication.

This dataset contains all code and data needed to reproduce the analyses in the manuscript: IDENTIFICATION OF A KEY TARGET FOR ELIMINATION OF NITROUS OXIDE, A MAJOR GREENHOUSE GAS. Blake A. Oakley (1), Trevor Mitchell (2), Quentin D. Read (3), Garrett Hibbs (1), Scott E. Gold (2), Anthony E. Glenn (2). (1) Department of Plant Pathology, University of Georgia, Athens, GA, USA. (2) Toxicology and Mycotoxin Research Unit, U.S. National Poultry Research Center, United States Department of Agriculture-Agricultural Research Service, Athens, GA, USA. (3) Southeast Area, United States Department of Agriculture-Agricultural Research Service, Raleigh, NC, USA. The citation will be updated upon acceptance of the manuscript.

Brief description of study aims
Denitrification is a chemical process that releases nitrous oxide (N2O), a potent greenhouse gas. The NOR1 gene is part of the denitrification pathway in Fusarium. Three experiments were conducted for this study. (1) The N2O comparative experiment compares denitrification rates, as measured by N2O production, of a variety of Fusarium spp. strains with and without the NOR1 gene. (2) The N2O substrate experiment compares denitrification rates of selected strains on different growth media (substrates). For parts 1 and 2, linear models are fit comparing N2O production between strains and/or substrates. (3) The Bioscreen growth assay tests whether there is a pleiotropic effect of the NOR1 gene. In this portion of the analysis, growth curves are fit to assess differences in growth rate and carrying capacity between selected strains with and without the NOR1 gene.

Code
All code is included in a .zip archive generated from a private git repository on 2022-10-13 and archived as part of this dataset. The code is contained in R scripts and RMarkdown notebooks. There are two components to the analysis: the denitrification analysis (comprising parts 1 and 2 described above) and the Bioscreen growth analysis (part 3). The scripts for each are listed and described below.

Analysis of results of denitrification experiments (parts 1 and 2):
NOR1_denitrification_analysis.Rmd: The R code to analyze the experimental data comparing nitrous oxide emissions is all contained in a single RMarkdown notebook. This script analyzes the results from the comparative study and the substrate study.
n2o_subgroup_figures.R: R script to create additional figures using the output from the RMarkdown notebook.

Analysis of results of Bioscreen growth assay (part 3):
bioscreen_analysis.Rmd: This RMarkdown notebook contains all R code needed to analyze the results of the Bioscreen assay comparing growth of the different strains. It could be run as is. However, the model-fitting portion was run on a high-performance computing cluster with the following scripts:
bioscreen_fit_simpler.R: R script containing only the model-fitting portion of the Bioscreen analysis, fit using the Stan modeling language interfaced with R through the brms and cmdstanr packages.
job_bssimple.sh: Job submission shell script used to submit the model-fitting R job to be run on the USDA SciNet high-performance computing cluster.

Additional scripts developed as part of the analysis, but not required to reproduce the analyses in the manuscript, are in the deprecated/ folder. Also note the files nor1-denitrification.Rproj (RStudio project file) and gtstyle.css (stylesheet for formatting the tables in the notebooks) are included.
Data
Data required to run the analysis scripts are archived in this dataset, other than strain_lookup.csv, a lookup table of strain abbreviations and full names included in the code repository for convenience. They should be placed in a folder or symbolic link called project within the unzipped code repository directory.
N2O_data_2022-08-03/N2O_Comparative_Study_Trial_(n)(date range).xlsx: These are the data from the N2O comparative study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
N2O_data_2022-08-03/Nitrogen_Substrate_Study_Trial(n)(date range).xlsx: These are the data from the N2O substrate study, where n is the trial number from 1-3 and date range is the begin and end date of the trial.
Outliers_NOR1_2022/Bioscreen_NOR1_Fungal_Growth_Assay(substrate)(oxygen level)_Outliers_BAO(date).xlsx: These are the raw Bioscreen data files in MS Excel format. The format of each file name includes the substrate (minimal medium with nitrite or nitrate and lysine), oxygen level (hypoxia or normoxia), and date of the run. This repository includes code to process these files, but the processed data are also included on Ag Data Commons, so it is not necessary to run the data processing portion of the code.
clean_data/bioscreen_clean_data.csv: This is an intermediate output file in CSV format generated by bioscreen_analysis.Rmd. It includes all the data from the Bioscreen assays in a clean, analysis-ready format.
http://opendatacommons.org/licenses/dbcl/1.0/
This is a compiled dataset comprising data from various companies' 10-K annual reports and balance sheets. The data is longitudinal (panel) data from 2009 to 2022(/23) and also includes a few bankrupt companies to help investigate contributing factors. The names of the companies are given according to their stock tickers. Companies are divided into specific categories.
We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts. The proposed network is built starting from two distinct data sources:
the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets);
the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).
By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset, and authors with no match (e.g., because they are not part of an Italian university) have been discarded. The remaining authors compose the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one paper they have co-authored.

Data cleaning and enrichment through data integration: networking the Italian academia
https://doi.org/10.5061/dryad.wpzgmsbwj
Manuscript published in Scientific Data with DOI .
This repository contains two main data files:
edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);
Coauthorship_Network_AGG.graphml, the full network in GraphML format;
along with several supplementary data files, listed below, useful only to build the network (i.e., for reproducibility only):
University-City-match.xlsx, an Excel file that maps the name of a university against the city where its headquarters is located;
Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.
The `Coauthorship_Networ...
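A short R sketch for loading the two main files (assuming they have been downloaded to the working directory; igraph's GraphML reader preserves the semantic node and edge attributes described above):

```r
library(igraph)

# Full network in GraphML format, with node/edge attributes included
g <- read_graph("Coauthorship_Network_AGG.graphml", format = "graphml")

# Basic sanity checks against the figures reported above
vcount(g)  # expected: 38,220 nodes
ecount(g)  # expected: 507,050 edges

# The CSV edge list (mainly temporal co-authorship information)
edges <- read.csv("edge_data_AGG.csv")
head(edges)
```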
Success.ai’s Ecommerce Store Data for the APAC E-commerce Sector provides a reliable and accurate dataset tailored for businesses aiming to connect with e-commerce professionals and organizations across the Asia-Pacific region. Covering roles and businesses involved in online retail, marketplace management, logistics, and digital commerce, this dataset includes verified business profiles, decision-maker contact details, and actionable insights.
With access to continuously updated, AI-validated data and over 700 million global profiles, Success.ai ensures your outreach, market analysis, and partnership strategies are effective and data-driven. Backed by our Best Price Guarantee, this solution helps you excel in one of the world’s fastest-growing e-commerce markets.
Why Choose Success.ai’s Ecommerce Store Data?
Verified Profiles for Precision Engagement
Comprehensive Coverage of the APAC E-commerce Sector
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Comprehensive E-commerce Business Profiles
Advanced Filters for Precision Campaigns
Regional and Sector-specific Insights
AI-Driven Enrichment
Strategic Use Cases:
Marketing Campaigns and Outreach
Partnership Development and Vendor Collaboration
Market Research and Competitive Analysis
Recruitment and Talent Acquisition
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.
The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to predict is called failure, with binary value 0 for non-failure and 1 for failure.
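A minimal R sketch of a baseline for this task; the file name "predictive_maintenance.csv" and the 0.5 decision threshold are assumptions, and any identifier or date columns should be dropped before fitting (none are assumed here):

```r
# Baseline classifier sketch for the device-failure task.
# File name and threshold are assumptions, not part of the dataset spec.
d <- read.csv("predictive_maintenance.csv")

# Logistic regression on all remaining columns; failure is coded 0/1
model <- glm(failure ~ ., data = d, family = binomial)
p <- predict(model, type = "response")

# Confusion table at an illustrative 0.5 threshold; in practice the
# threshold is tuned to balance false positives against false negatives
table(predicted = as.integer(p > 0.5), actual = d$failure)
```

Because device failures are typically rare, handling class imbalance (weighting or resampling) and tuning the decision threshold usually matter more here than the choice of model.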
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel household income by gender. The dataset can be utilized to understand the gender-based income distribution in Excel.
The dataset includes the following datasets, when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Excel income distribution by gender. You can refer to it here.