Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The field of cancer research is overall ambiguous to the general population and apart from medical news, not a lot is known of the proceedings. This study aims to provide some clarity towards cancer research, especially towards the correlations between research of different types of cancer. The amount of research papers pertaining to different types of cancers is compared against mortality and diagnosis rates to determine the amount of research attention towards a type of cancer in relation to its overall importance or danger level to the general population. This is achieved through the use of many computational tools such as Python, R, and Microsoft Excel. Python is used to parse through the JSON files and extract the abstract and Altmetric score onto a single CSV file. R is used to iterate through the rows of the CSV files and count the appearance of each type of cancer in the abstract. As well as this, R creates the histograms describing Altmetric scores and file frequency. Microsoft Excel is used to provide further data analysis and find correlations between Altmetrics data and Canadian Cancer Society data. The analysis from these tools revealed that breast cancer was the most researched cancer by a large margin with nearly 1,700 papers. Although there were a large number of cancer research papers, the Altmetric scores revealed that most of these papers did not gain significant attention. By comparing these results to Canadian Cancer Society data, it was uncovered that Breast Cancer was receiving research attention that was not merited. There were four times more breast cancer research papers than the second most researched cancer, prostate cancer. This was despite the fact that breast cancer was fourth in mortality and third in new cases among all cancers. Inversely, lung cancer was underrepresented with only 401 research papers in spite of being the deadliest cancer in Canada.
Facebook
TwitterMicrosoft Excel based (using Visual Basic for Applications) data-reduction and visualization tools have been developed that allow to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted through and graphed to allow their study. The ability to analyze large data sets can yield responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems and data inconsistencies can be quickly queried and graphed. The application of these newly developed tools to data from The Geysers geothermal field is illustrated. A copy of these tools may be requested by contacting the authors.
Facebook
TwitterExcel spreadsheet tool that can be used to produce predicted costs for large pipe relining job, based on the project's final regression model.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive collection of consumer behavior data that can be used for various market research and statistical analyses. It includes information on purchasing patterns, demographics, product preferences, customer satisfaction, and more, making it ideal for market segmentation, predictive modeling, and understanding customer decision-making processes.
The dataset is designed to help researchers, data scientists, and marketers gain insights into consumer purchasing behavior across a wide range of categories. By analyzing this dataset, users can identify key trends, segment customers, and make data-driven decisions to improve product offerings, marketing strategies, and customer engagement.
Key Features: Customer Demographics: Understand age, income, gender, and education level for better segmentation and targeted marketing. Purchase Behavior: Includes purchase amount, frequency, category, and channel preferences to assess spending patterns. Customer Loyalty: Features like brand loyalty, engagement with ads, and loyalty program membership provide insights into long-term customer retention. Product Feedback: Customer ratings and satisfaction levels allow for analysis of product quality and customer sentiment. Decision-Making: Time spent on product research, time to decision, and purchase intent reflect how customers make purchasing decisions. Influences on Purchase: Factors such as social media influence, discount sensitivity, and return rates are included to analyze how external factors affect purchasing behavior.
Columns Overview: Customer_ID: Unique identifier for each customer. Age: Customer's age (integer). Gender: Customer's gender (categorical: Male, Female, Non-binary, Other). Income_Level: Customer's income level (categorical: Low, Middle, High). Marital_Status: Customer's marital status (categorical: Single, Married, Divorced, Widowed). Education_Level: Highest level of education completed (categorical: High School, Bachelor's, Master's, Doctorate). Occupation: Customer's occupation (categorical: Various job titles). Location: Customer's location (city, region, or country). Purchase_Category: Category of purchased products (e.g., Electronics, Clothing, Groceries). Purchase_Amount: Amount spent during the purchase (decimal). Frequency_of_Purchase: Number of purchases made per month (integer). Purchase_Channel: The purchase method (categorical: Online, In-Store, Mixed). Brand_Loyalty: Loyalty to brands (1-5 scale). Product_Rating: Rating given by the customer to a purchased product (1-5 scale). Time_Spent_on_Product_Research: Time spent researching a product (integer, hours or minutes). Social_Media_Influence: Influence of social media on purchasing decision (categorical: High, Medium, Low, None). Discount_Sensitivity: Sensitivity to discounts (categorical: Very Sensitive, Somewhat Sensitive, Not Sensitive). Return_Rate: Percentage of products returned (decimal). Customer_Satisfaction: Overall satisfaction with the purchase (1-10 scale). Engagement_with_Ads: Engagement level with advertisements (categorical: High, Medium, Low, None). Device_Used_for_Shopping: Device used for shopping (categorical: Smartphone, Desktop, Tablet). Payment_Method: Method of payment used for the purchase (categorical: Credit Card, Debit Card, PayPal, Cash, Other). Time_of_Purchase: Timestamp of when the purchase was made (date/time). Discount_Used: Whether the customer used a discount (Boolean: True/False). Customer_Loyalty_Program_Member: Whether the customer is part of a loyalty program (Boolean: True/False). Purchase_Intent: The intent behind the purchase (categorical: Impulsive, Planned, Need-based, Wants-based). Shipping_Preference: Shipping preference (categorical: Standard, Express, No Preference). Payment_Frequency: Frequency of payment (categorical: One-time, Subscription, Installments). Time_to_Decision: Time taken from consideration to actual purchase (in days).
Use Cases: Market Segmentation: Segment customers based on demographics, preferences, and behavior. Predictive Analytics: Use data to predict customer spending habits, loyalty, and product preferences. Customer Profiling: Build detailed profiles of different consumer segments based on purchase behavior, social media influence, and decision-making patterns. Retail and E-commerce Insights: Analyze purchase channels, payment methods, and shipping preferences to optimize marketing and sales strategies.
Target Audience: Data scientists and analysts looking for consumer behavior data. Marketers interested in improving customer segmentation and targeting. Researchers are exploring factors influencing consumer decisions and preferences. Companies aiming to improve customer experience and increase sales through data-driven decisions.
This dataset is available in CSV format for easy integration into data analysis tools and platforms such as Python, R, and Excel.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al. which is used in the example of the article. (XLSX 611 kb)
Facebook
TwitterMarket basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.
Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">
First, we need to load required libraries. Shortly I describe all libraries.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">
Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png">
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">
After we will clear our data frame, will remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">
To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Facebook
TwitterThis research used econometric techniques to evaluate 73 relining jobs for large diameter steel pipe interiors to identify major cost drivers for such relining jobs and to specify a regression model for predicting future relining job costs. Reclamation’s inventory of 121 unrelined penstocks was evaluated with the final model to predict preliminary-level costs for future relining work. An app tool was developed using Microsoft Power Apps as an end-user interface for predicting relining costs based on the final regression model. A subsequent Microsoft Excel tool was developed to share with the study’s contribution partner agencies: Metropolitan Water District, Central Arizona Project, Denver Water, and BC Hydro.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.
IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform
The following pages can help you gain the understanding of the intellectual property administration and processes in Australia to help your analysis on the dataset.
Due to the changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
Facebook
Twitter
According to our latest research, the global graph database market size in 2024 stands at USD 2.92 billion, with a robust compound annual growth rate (CAGR) of 21.6% projected from 2025 to 2033. By the end of 2033, the market is expected to reach approximately USD 21.1 billion. The rapid expansion of this market is primarily driven by the rising need for advanced data analytics, real-time big data processing, and the growing adoption of artificial intelligence and machine learning across various industry verticals. As organizations continue to seek innovative solutions to manage complex and interconnected data, the demand for graph database technologies is accelerating at an unprecedented pace.
One of the most significant growth factors for the graph database market is the exponential increase in data complexity and volume. Traditional relational databases often struggle to efficiently handle highly connected data, which is becoming more prevalent in modern business environments. Graph databases excel at managing relationships between data points, making them ideal for applications such as fraud detection, social network analysis, and recommendation engines. The ability to visualize and query data relationships in real-time provides organizations with actionable insights, enabling faster and more informed decision-making. This capability is particularly valuable in sectors like BFSI, healthcare, and e-commerce, where understanding intricate data connections can lead to substantial competitive advantages.
Another key driver fueling market growth is the widespread digital transformation initiatives undertaken by enterprises worldwide. As businesses increasingly migrate to cloud-based infrastructures and adopt advanced analytics tools, the need for scalable and flexible database solutions becomes paramount. Graph databases offer seamless integration with cloud platforms, supporting both on-premises and cloud deployment models. This flexibility allows organizations to efficiently manage growing data workloads while ensuring security and compliance. Additionally, the proliferation of IoT devices and the surge in unstructured data generation further amplify the demand for graph database solutions, as they are uniquely equipped to handle dynamic and heterogeneous data sources.
The integration of artificial intelligence and machine learning with graph databases is also a pivotal growth factor. AI-driven analytics require robust data models capable of uncovering hidden patterns and relationships within vast datasets. Graph databases provide the foundational infrastructure for such applications, enabling advanced features like predictive analytics, anomaly detection, and personalized recommendations. As more organizations invest in AI-powered solutions to enhance customer experiences and operational efficiency, the adoption of graph database technologies is expected to surge. Furthermore, continuous advancements in graph processing algorithms and the emergence of open-source graph database platforms are lowering entry barriers, fostering innovation, and expanding the marketÂ’s reach.
Graph Analytics is becoming an essential component in the realm of graph databases, offering powerful tools to analyze and visualize complex data relationships. As organizations strive to extract deeper insights from their data, graph analytics enables them to uncover hidden patterns and trends that are not easily detectable with traditional analytics methods. This capability is particularly beneficial for sectors such as finance, healthcare, and retail, where understanding intricate connections can lead to more informed strategic decisions. By leveraging graph analytics, businesses can enhance their predictive modeling, optimize operations, and ultimately drive competitive advantage in a data-driven world.
From a regional perspective, North America currently dominates the graph database market, owing to the early adoption of advanced technologies and the presence of major industry players. However, the Asia Pacific region is anticipated to witness the highest growth rate over the forecast period, driven by rapid digitalization, increasing investments in IT infrastructure, and the rising demand for data-driven decision-making across emerging economies. Europe also holds a significant share, supported by stringent dat
Facebook
Twitter
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spreadsheets targeted at the analysis of GHS safety fingerprints.AbstractOver a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS became widely accepted internationally and has become the cornerstone of OSHA’s Hazard Communication Standard. Despite this progress, today we observe that there are inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. In order to assess the magnitude of this problem, this research uses an extension of the “chemical fingerprints” used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is the sources for GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific’s website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma Aldrich’s website has only one pictogram. A chemical information tool, which identifies such discrepancies within a specific chemical inventory, can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderate large research lab or small chemistry department as a whole (between 1000 and 3000 chemical entities) so that labelling expectations within these universes can be established as consistently as possible.Most chemists are familiar with programs such as excel and google sheets which are spreadsheet programs that are used by many chemists daily. Though a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the data collected rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions will be used to converts GHS information into binary strings of data called “bitstrings”. This approach is also used when comparing chemical structures. The binary approach make data analysis more manageable, as GHS information comes in a variety of formats such as pictures or alphanumeric strings which are difficult to compare on their face. Bitstrings generated using the GHS information can be compared using an operator such as the tanimoto coefficent to yield values from 0 for strings that have no similarity to 1 for strings that are the same. Once a particular set of information is analyzed the hope is the same techniques could be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach the same techniques with minor modifications could be used to tackle more GHS information such as pictograms.Intellectual Merit. This research indicates that the use of the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings that are obtained from the non-numeric entity of 2D structure. This structural fingerprint allows comparison of 2D structure through the use of the tanimoto coefficient. The use of this structural fingerprint can be extended to safety fingerprints, which can be created by converting a non-numeric entity such as GHS information into a binary bit string and comparing data through the use of the tanimoto coefficient.Broader Impact. Extension of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but could be further applied to other bits of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to be able to compare large dataset using spreadsheet programs such as excel and not need a large programming background. Development of this technique will also benefit the Chemical Health and Safety community and Chemical Information communities by better defining the quality of GHS information available and providing a scalable and transferable tool to manipulate this information to meet a variety of other organizational needs.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The target company's hydraulic modelling package uses Innovyze InfoworksTM. This product enables third party integration through API’s and Ruby scripts when the ICM Exchange service is enabled. As a result, the research looked at opportunities to exploit scripting in order to run the chosen optimisation strategy. The first approach initially investigated the use of a CS-script tool that would export the results tables directly from the Innovyze InfoworksTM environment into CSV format workbooks. From here the data could then be inspected, with the application of mathematical tooling to optimise the pump start parameters before returning these back into the model and rerunning. Note, the computational resource the research obtained to deploy the modelling and analysis tools comprised the following specification. Hardware
Dell Poweredge R720
Intel Xeon Processor E5-2600 v2
2x Processor Sockets
32GB Memory random access memory (RAM) – 1866MT/s Virtual Machine
Hosted on VMWare Hypervisor v6.0.
Windows Server 2012R2.
Microsoft Excel 64bit.
16 virtual-central-processing-units (V-CPU’s).
Full provision of 32GB RAM – 1866MT/s.
were highlighted in the first round of data exports as, even with a dedicated
Issues server offering 16-V-CPUs, and the specification as shown above, the Excel frontend environment was unable to process the very large data matrices being generated. There were regular failings of the Excel executable which led to an overall inability to inspect the data let alone run calculations on the matrices. When considering the five- second sample over 31 days this resulted in matrices in the order of [44x535682] per model run, with the calculations in (14-19) needing to be applied on a per cell basis.
Facebook
TwitterTypically e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, The UCI Machine Learning Repository has made this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found by the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Image from stocksnap.io.
Analyses for this dataset could include time series, clustering, classification and more.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The field of cancer research is overall ambiguous to the general population and apart from medical news, not a lot is known of the proceedings. This study aims to provide some clarity towards cancer research, especially towards the correlations between research of different types of cancer. The amount of research papers pertaining to different types of cancers is compared against mortality and diagnosis rates to determine the amount of research attention towards a type of cancer in relation to its overall importance or danger level to the general population. This is achieved through the use of many computational tools such as Python, R, and Microsoft Excel. Python is used to parse through the JSON files and extract the abstract and Altmetric score onto a single CSV file. R is used to iterate through the rows of the CSV files and count the appearance of each type of cancer in the abstract. As well as this, R creates the histograms describing Altmetric scores and file frequency. Microsoft Excel is used to provide further data analysis and find correlations between Altmetrics data and Canadian Cancer Society data. The analysis from these tools revealed that breast cancer was the most researched cancer by a large margin with nearly 1,700 papers. Although there were a large number of cancer research papers, the Altmetric scores revealed that most of these papers did not gain significant attention. By comparing these results to Canadian Cancer Society data, it was uncovered that Breast Cancer was receiving research attention that was not merited. There were four times more breast cancer research papers than the second most researched cancer, prostate cancer. This was despite the fact that breast cancer was fourth in mortality and third in new cases among all cancers. Inversely, lung cancer was underrepresented with only 401 research papers in spite of being the deadliest cancer in Canada.