Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Five files, one of which is a ZIP archive, containing data that support the findings of this study. PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017 - 2019. Excel file "CCIHE2018-PublicDataFile" contains Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018. CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017 - 2019. CSV file "clean_data" contains the cleaned data from Primo Analytics which was used for all subsequent analysis such as charting and import into SPSS for statistical testing. ZIP archive file "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS. This archive includes things such as normality check, descriptives, and Kruskal-Wallis H-test results.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow the business: by suggesting itemsets to customers we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association Rules are most useful when you want to discover associations between different objects in a set, i.e. to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
[Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]
First, we need to load the required libraries. I briefly describe each library below.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Screenshot: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]
Next, we will clean our data frame by removing missing values.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
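The original walkthrough continues in R with the arules package. Purely as an illustrative equivalent (not the author's code), the same steps can be sketched in Python with pandas and mlxtend; the column names BillNo and Itemname are assumptions about the layout of Assignment-1_Data.xlsx:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Load the transaction data (column names are assumed; adjust to the actual file).
df = pd.read_excel("Assignment-1_Data.xlsx")
df = df.dropna(subset=["BillNo", "Itemname"])  # drop rows with a missing invoice or item

# One row per invoice, one column per item; True if the item appears on that invoice.
basket = pd.crosstab(df["BillNo"], df["Itemname"]).astype(bool)

# Mine frequent itemsets and derive association rules (support, confidence, lift).
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
print(rules.sort_values("lift", ascending=False).head())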
Assembled from 196 references, this database records a total of 3,861 cases of historical dam failures around the world and represents the largest compilation of dam failures recorded to date (17-02-2020). The database records historical dam failures regardless of the type of dam (e.g. man-made dam, tailings dam, temporary dam, natural dam, etc.), the type of structure (e.g. concrete dam, embankment dam, etc.), the type of failure (e.g. piping failure, overtopping failure, etc.) or the properties of the dams (e.g. dam height, reservoir capacity, etc.). Through this process, a set of 45 variables (which together compose the dataset) has been used, where possible, available and relevant, to record information about each failure (e.g. dam descriptions, dam properties, breach dimensions, etc.). Coupled with Excel's functionalities (e.g., in Excel 2016: customizable screen visualization, individual search of specific cases, data filters, pivot tables, etc.), the database file can easily be adapted to the needs of the user (e.g. research field, dam type, dam failure type, etc.) and opens the door to various fields of research (e.g. hydrology, hydraulics and dam safety). The dataset also allows any user to optimize the verification process, to identify duplicates and to put the recorded historical dam failures back into context. Overall, this investigation aimed to standardize the collection of data on historical dam failures and to facilitate international data collection by setting guidelines. The sharing method (i.e. provided through this link) not only represents a considerable asset for a wide audience (e.g. researchers, dam owners, etc.) but also paves the way for the field of dam safety in the current era of "Big Data". Updated versions will be deposited (at this DOI) at undetermined frequencies in order to keep the recorded data up to date over the years.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset provides atomic coordinates for metal-organic frameworks (MOFs), enabling high-throughput computational screening of MOFs in a broad range of scenarios. The dataset is derived from the Cambridge Structural Database (CSD) and from across the internet, and offers an array of useful parameters, such as accessible surface area (ASA), non-accessible surface area (NASA), largest cavity diameter (LCD), pore limiting diameter (PLD) and more. The results yielded by this dataset may prove very helpful in assessing the potential of MOFs as prospective materials for chemical separations, transformations and functional nanoporous materials. This can bring about improvements in many industries and help devise better products for consumers worldwide. If errors are found in this data, there is a feedback form available which can be used to report your findings. We appreciate your interest in our project and hope you will make good use of this data!
This guide will introduce you to the CoRE MOF 2019 dataset and explain how to properly use it for high-throughput computational screenings. It will provide you with the necessary background information and knowledge for successful use of this dataset.
The CoRE MOF 2019 Dataset contains atomic coordinates for metal-organic frameworks (MOFs) which can be used as inputs for simulation software packages, enabling high-throughput computational screening of these MOFs. This dataset is derived from both the Cambridge Structural Database (CSD) and World Wide Web sources, providing powerful data on which MOF systems are suitable for potential applications in chemical separations, transformations, and functional nanoporous materials.
In order to make efficient use of this dataset, it is important that you familiarize yourself with all available columns. The columns contain information about a given MOF system such as LCD (largest cavity diameter), PLD (pore limiting diameter), LFPD (largest sphere along the free path), ASA (accessible surface area), NASA (non-accessible surface area), and void fraction (AV_VF). Additionally, there is useful metadata such as public availability status, CSD overlap references in the CoRE or CCDC databases, DOI details if available, etc. To get a full list of these features, please refer to the provided documentation or codebook on the Kaggle website, or to your own research.
Once you are familiar with the column specifications, download the database file from Kaggle. The file can be opened as CSV or in MS Excel, where each row represents a single MOF and each column holds the corresponding parameter value (integer, float or boolean). A single row therefore gives every recorded piece of information about a particular framework, such as the surface area accessible to guest molecules (m^2). With this information you can compare two frameworks directly, without the preprocessing or manual calculations typically required when comparing values across different datasets holding the same type of information. After checking that no data were lost during formatting, it is usually better to analyse the entire set at once rather than looping over individual rows, even though row-by-row comparisons may appear simpler at first.
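As an illustration only (not part of the original guide), a whole-set analysis of this kind can be sketched with pandas; the file name is a placeholder and the column labels (PLD, ASA, LCD) follow the descriptions above but should be checked against the actual download:

import pandas as pd

# Load the CoRE MOF table (file name is a placeholder for the Kaggle download).
mofs = pd.read_csv("core_mof_2019.csv")

# Vectorized screening over the entire set: keep MOFs whose pore limiting diameter
# and accessible surface area exceed illustrative thresholds.
candidates = mofs[(mofs["PLD"] > 6.0) & (mofs["ASA"] > 1000.0)]

# Rank the candidates by largest cavity diameter and inspect the top hits.
print(candidates.sort_values("LCD", ascending=False).head(10))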
- Create an open source library of automated SIM simulations for MOFs, which can be used to generate results quickly and accurately.
- Update the existing Porous Materials Database (PMD) software with additional data fields that leverage insights from this dataset, allowing users to easily search and filter MOFs by specific structural characteristics.
- Develop a web-based interface that allows researchers to visualize different MOF structures using realistic 3D images derived from the atomic data provided in the dataset
If you use this dataset in your research, please credit the original authors. Data Source
...
The last stream within the NESP 5.5 project involved an online survey to obtain aesthetic ratings of an additional 3,500 images downloaded from Flickr, in order to improve the Artificial Intelligence (AI)-based system for recognising and assessing the beauty of natural scenes that had been developed in the previous NESP 3.2.3 project. Despite some earlier investment in this research area, there is still a need to improve the tools we use to measure the aesthetic beauty of marine landscapes. This research drew on images publicly available on the Internet (in particular through the photo-sharing site Flickr) to build a large dataset of GBR images for the assessment of aesthetic value. Building on earlier work in NESP TWQ Hub Project 3.2.3, we conducted a survey focused on collecting beauty scores for an additional large number of GBR images (n = 3500). This dataset consists of one dataset report, two Word files and one Excel file containing the aesthetic ratings collected, which were used to improve the accuracy of the aesthetic monitoring AI system.
Methods: The third research stream was an online survey in which 1,585 Australians rated the aesthetic beauty of 3,500 GBR underwater pictures downloaded and selected from Flickr. Flickr is an image hosting service and one of the main sources of images for our project. We downloaded images and their metadata (including coordinates where available) based on keyword filters such as "Great Barrier Reef". The Flickr API is available to outside developers for non-commercial use (commercial use is possible by prior arrangement). To ensure a much larger and more diverse supply of photographs, we developed a Python-based application using the Flickr API that allowed us to download Flickr images by keyword (e.g. "Great Barrier Reef"; see https://www.flickr.com). The focus of this research was on underwater images, which had to be filtered from the downloaded Flickr photos. From the collected images we identified an additional 3,020 relevant images with coral and fish content, out of a total of approximately 55,000 downloaded images. Matt Curnock, a CSIRO expert, also provided 100 images from his private collection taken at the GBR and consented to their use in our research. In total, 3,120 images were selected and renamed to be rated in a survey by Australian participants (see the two files "Image modification" and "Matt image rename" in the AI folder for further details).
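The download script itself is not part of this dataset; purely as an illustration of this kind of keyword-based harvesting, a minimal sketch with the flickrapi Python package might look like the following (the API key, secret and chosen fields are assumptions, and the real application also stored the image files and metadata):

import flickrapi

API_KEY = "your-api-key"        # placeholder: obtained from Flickr's developer portal
API_SECRET = "your-api-secret"  # placeholder

flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET, format="parsed-json")

# Search public photos by keyword and request geo metadata and an original-size URL.
result = flickr.photos.search(text="Great Barrier Reef", extras="geo,url_o", per_page=100, page=1)

for photo in result["photos"]["photo"]:
    # url_o and coordinates may be absent if the owner restricts them.
    print(photo.get("title"), photo.get("url_o"), photo.get("latitude"), photo.get("longitude"))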
The survey was created and launched in April 2020 using the Qualtrics survey service. After giving consent to participate in the online survey, each respondent was randomly shown 50 images of the GBR and rated the aesthetics of the GBR scenery on a 10-point scale (1 = very ugly/unpleasant to 10 = very beautiful/pleasant). In total, 1,585 complete and valid questionnaires were recorded. The aesthetic rating results were exported to an Excel file and used to improve the accuracy of the computer algorithm recognising and assessing the beauty of natural scenes, which had been developed in the previous NESP 3.2.3 project.
Further information can be found here: Stantic, B. and Mandal, R. (2020) Aesthetic Assessment of the Great Barrier Reef using Deep Learning. Report to the National Environmental Science Program. Reef and Rainforest Research Centre Limited, Cairns (30pp.). Available at https://nesptropical.edu.au/wp-content/uploads/2020/11/NESP-TWQ-Project-5.5-Technical-Report-3.pdf
Format: The AI DATASET has one dataset report, one Excel file showing the aesthetic ratings of all images, and two Word files showing how the images downloaded from the Flickr website and provided by Matt Curnock (CSIRO) were renamed and used for aesthetic ratings and AI development. The aesthetic rating results were later used to improve the accuracy of the AI aesthetic monitoring system for the GBR.
References: Murray, N., Marchesotti, L. & Perronnin, F. (2012). AVA: A Large-Scale Database for Aesthetic Visual Analysis. Available (09/10/17): http://refbase.cvc.uab.es/files/MMP2012a.pdf
Data Location: This dataset is filed in the eAtlas enduring data repository at: data\custodian\2019-2022-NESP-TWQ-5\5.5_Measuring-aesthetics
https://creativecommons.org/publicdomain/zero/1.0/
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consist of content added to Netflix from 2008 to 2021; the oldest title dates from 1925 and the newest from 2021. The dataset is cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and further explanation of the process are provided in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
ORDER BY show_id DESC;
--No duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
The nulls in the director column are about 30% of the whole column, so I will not delete them; instead I will use another column to populate them. To populate the director column, we first want to find out whether there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate the NULL director rows using their matching movie_cast records.
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;
--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--When I was doing this, I found a less complex and faster way to populate a column which I will use next
Just like the director column, I will not delete the nulls in country. Since the country column is related to the director and movie columns, we are going to populate the country column using the director column.
--Populate the country using the director column
SELECT COALESCE(nt.country,nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id
AND netflix.country IS NULL;
--Confirm whether any rows still have a NULL country that the update could not fill
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the rest of the NULLs in country as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
The date_added column has only 10 nulls out of over 8,000 rows, so deleting them will not affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix
WHERE date_added IS NULL;
--DELETE nulls
DELETE F...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SYstem for iNTegrating Human dimensions, Ecosystem Services and Economic Assessment for Sustainability. CSIRO has developed this Shiny application to store metadata information relevant to datasets found in the Great Barrier Reef region which are primarily ecosystem service focused. The Shiny app allows users to filter the records describing the catalog of datasets by ES category (Provisioning, Regulating & Cultural) and Ecosystem service. It also allows users to filter by GBR User type (First Nations, Government, Household and Industry). Users can also filter by various components in the Ecosystem Service Value Chain (ESVC) (e.g. use, measures, and derived values) to understand the various datasets which have been used to construct an ESVC.
Lineage: The SEABORNE (Sustainable UsE And Benefits fOR mariNE) project has consolidated and synthesised existing information about who is using the Reef, how it is being used and what the benefits are from this use. CSIRO's research on the Great Barrier Reef (GBR) has been identified as a Category 4 Mission for the organisation, with well-established investors and collaborators, an internal coordination architecture, and delivering impact from a large portfolio of research across several Business Units. This project is one of several key strategic outcomes of the Great Barrier Reef Platform. SEABORNE began in November 2021, with the project team initially developing and sourcing a list of potential datasets relevant to the research question. An Excel spreadsheet was initially trialled as a data entry form for users; however, we encountered problems with dropdown fields not allowing multiple selections of values, due to the version of Excel and VBA programming. As part of the SCCPs (ERRFP-1322) we developed a Shiny (R) dashboard that allows the database to be filtered and searched in a user-friendly manner. We exported the data from the MS Access database as a CSV and used this in the Shiny app. The app also reads spatial extent data from CSIRO DAP and the GBRMPA catalogue (an online JSON file) and displays it on a Leaflet map. Each record in the metadata database has a UniqueID and pertains to a dataset which has been used or considered in the SEABORNE project. This tool gives researchers a summary of what is available, particularly in the GBR in relation to ecosystem services: where to get the data, what the data is about, the quality of the data, and who to contact to acquire it. It is NOT a data warehouse, nor is it a data portal to download data from.
By Throwback Thursday [source]
The dataset includes data on Christianity, Islam, Judaism, Buddhism, Hinduism, Sikhism, Shintoism, Baha'i Faith, Taoism, Confucianism, Jainism and various other syncretic and animist religions. For each religion or denomination category, it provides both the total population count and the percentage representation in relation to the overall population.
Additionally:
- Columns labeled with Population provide numeric values representing the total number of individuals belonging to a particular religion or denomination.
- Columns labeled with Percent represent numerical values indicating the percentage of individuals belonging to a specific religion or denomination within a given population.
- Columns that begin with ** indicate primary categories (e.g., Christianity), while columns that do not have this prefix refer to subcategories (e.g., Christianity - Roman Catholics).
In addition to providing precise data about specific religions or denominations globally across multiple years, this dataset also records information about geographical locations by including state or country names under StateNme.
This comprehensive dataset is valuable for researchers seeking information on global religious trends and can be used for analysis in fields such as sociology, anthropology and cultural studies, among others.
Introduction:
Understanding the Columns:
Year: Represents the year in which the data was recorded.
StateNme: Represents the name of the state or country for which data is recorded.
Population: Represents the total population of individuals.
Total Religious: Represents the total percentage and population of individuals who identify as religious, regardless of specific religion.
Non Religious: Represents the percentage and population of individuals who identify as non-religious or atheists.
Identifying Specific Religions: The dataset includes columns for different religions such as Christianity, Judaism, Islam, Buddhism, Hinduism, etc. Each religion is further categorized into specific denominations or types within that religion (e.g., Roman Catholics within Christianity). You can find relevant information about these religions by focusing on specific columns related to each one.
Analyzing Percentages vs. Population: Some columns provide percentages while others provide actual population numbers for each category. Depending on your analysis requirement, you can choose either column type for your calculations and comparisons.
Accessing Historical Data: The dataset includes records from multiple years allowing you to analyze trends in religious populations over time. You can filter data based on specific years using Excel filters or programming languages like Python.
Filtering Data by State/Country: If you are interested in understanding religious populations in a particular state or country, use filters to focus on that region's data only.
Example - Extracting Information:
Let's say you want to analyze Hinduism's growth globally from 2000 onwards:
- Identify Relevant Columns:
  - Year: to filter data from 2000 onwards.
  - Hindu - Total (Percent): to analyze the percentage of individuals identifying as Hindus globally.
- Filter Data:
  - Set a filter on the Year column and select values greater than or equal to 2000.
  - Look for rows where Hindu - Total (Percent) has values.
- Analyze Results: You can now visualize and calculate the growth of Hinduism worldwide after filtering out irrelevant data. Use statistical methods or graphical representations like line charts to understand trends over time.
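For readers working in Python rather than Excel, the same steps can be sketched with pandas; the file name below is a placeholder and the column labels follow the descriptions above:

import pandas as pd
import matplotlib.pyplot as plt

# Load the religion dataset (file name is a placeholder).
df = pd.read_csv("world_religion_populations.csv")

# Keep records from 2000 onwards that have a Hindu percentage recorded.
hindu = df[(df["Year"] >= 2000) & (df["Hindu - Total (Percent)"].notna())]

# Average the percentage across states/countries per year (a simple, unweighted trend)
# and plot it as a line chart.
trend = hindu.groupby("Year")["Hindu - Total (Percent)"].mean()
trend.plot(kind="line", title="Hindu share of population, 2000 onwards")
plt.show()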
Conclusion: This guide has provided you with an overview of how to use the Rel
- Comparing religious populations across different countries: With data available for different states and countries, this dataset allows for comparisons of religious populations across regions. Researchers can analyze how different religions are distributed geographically and compare their percentages or total populations across various locations.
- Studying the impact of historical events on religious demographics: Since the dataset includes records categorized by year, it can be used to study how historical events such as wars, migration, or political changes have influenced religious demographics over time. By comparing population numbers before and after specific events, resea...
This dataset was obtained from Reddit user u/jwolle1 on https://www.reddit.com/r/datasets/comments/cj3ipd/jeopardy_dataset_with_349000_clues/
Notes:
- 349,641 clues in TSV format. Source: They prefer not to be named. DM for info.
- I made one large complete dataset and also individual datasets for each season. The season files are small enough to open with Excel.
- I tried to clean up all the formatting and encoding issues so there is minimal , \u201c, etc.
- I tried to filter out all the impossible audio and video clues.
- I included Alex's comments when he reads the categories at the beginning of each round.
- I included a column that specifies whether a clue was a Daily Double or not (yes or no).
- I made a note when clues come from special episodes (Teen Tournament, Celebrity Jeopardy, etc.). I was on the fence about including this but I decided it was the best way to find relatively easy or difficult clues.
- I organized the data into chronological order from 1984 to present (July 2019, end of Season 35). Each category is grouped together so you can read it from top to bottom.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset tracks food insecurity across different demographics from 4/23/2020 to 8/23/2021. It contains fields such as Race, Education, Sex, State, Income, etc. If you're looking for a dataset to examine Covid-19's impact on food insecurity for different demographics, then here you are!
This data is from the United States Census Bureau's Pulse Survey. The Pulse Survey is a frequently updated survey designed to collect data on how people's lives have been impacted by the coronavirus. Specifically, this dataset is a cleaned-up version of the "Food Sufficiency for Households, in the Last 7 Days, by Select Characteristics" tables.
The original form of this data can be found at: https://www.census.gov/programs-surveys/household-pulse-survey/data.html
The original form of this data was split into 36 Excel files containing ~67 sheets each. The data was in a non-tidy format, and the questions were also not entirely standardized. This dataset is my attempt to combine all these different files, tidy the data up, and merge slightly different questions together.
The large number of NAs is a consequence of how messy the data was originally and of forcing it into a tidy format. Just filter the NAs out for the question you want to analyze and you'll be fine.
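As a quick illustration (the file name and question column below are assumptions, not taken from the original description), filtering to one question and dropping its NAs with pandas looks like this:

import pandas as pd

# Load the combined Pulse Survey food-insecurity table (file name is a placeholder).
df = pd.read_csv("pulse_food_insecurity.csv")

# Pick the question of interest and drop rows where it was not asked or answered.
col = "Sometimes not enough to eat"  # illustrative column name
subset = df.dropna(subset=[col])[["State", "Race", col]]

# Summarize the remaining responses by demographic group.
print(subset.groupby("Race")[col].describe())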
The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
[Image: https://i.imgur.com/6UEqejq.png]
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
https://creativecommons.org/publicdomain/zero/1.0/
This is the Car Sales dataset, which includes information about different cars. This dataset was taken from Analytixlabs for the purpose of prediction. In this we have to do two things:
First, we have to see which features have the most impact on car sales and report the results.
Secondly, we have to train a classifier to predict car sales and check the accuracy of the prediction.