https://creativecommons.org/publicdomain/zero/1.0/
The reference for the dataset and the dashboard was the YouTube channel codebasics. I have used a fictitious company called Atlix, whose Sales Director wants the sales data in a proper format that can support decision making.
We have a total of 5 tables, namely customers, products, markets, date & transactions. The data is exported from MySQL to Tableau.
In Tableau, inner joins were used.
In the transactions table, we notice that some sales amount figures are either negative or zero while the sales qty is 1 or more. This cannot be right. Therefore, we filter the sales amount in Tableau so that the minimum sales amount is 1.
When the currency column from the transactions table was grouped in MySQL, both ‘USD’ and ‘INR’ showed up. Sales data cannot mix two currencies. This was rectified by converting the USD sales amounts into INR at the latest exchange rate of Rs. 81.
We make the above change in Tableau by creating a new calculated field called ‘Normalised Sales Amount’: IF [Currency] = ‘USD’ THEN [Sales Amount] * 81 ELSE [Sales Amount] END.
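As a cross-check outside Tableau, here is a minimal pandas sketch of the same check and normalisation; the file and column names (transactions.csv, currency, sales_amount) are assumptions for illustration, not part of the original project:

import pandas as pd

tx = pd.read_csv("transactions.csv")           # hypothetical export of the transactions table
print(tx.groupby("currency").size())           # reveals both 'USD' and 'INR' rows
tx = tx[tx["sales_amount"] >= 1]               # keep only rows with a sales amount of at least 1
tx["normalised_sales_amount"] = tx["sales_amount"].where(
    tx["currency"] != "USD",                   # INR amounts stay as they are
    tx["sales_amount"] * 81)                   # USD amounts converted at Rs. 81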
Conclusion: The dashboard is interactive, with filters. For example, clicking on Mumbai under “Sales by Markets” changes the other charts as well, so that they show results pertaining only to Mumbai. The same can be done by year, month, customer, product, etc. A parameter with a filter has also been created for top customers and top products; it produces a slider that can be used to view, say, the top 10 customers and products and adjust the cut-off accordingly.
The following information can be passed on to the sales team or director.
Total Sales: Revenue from Jun’17 to Feb’20 was INR 12.83 million. There was a drop of 57% in sales revenue from 2018 to 2019. The year 2020 has not been considered as it accounts for only 2 months of data.
Markets: Mumbai, the top-performing market with 51% of total sales, saw a drop in sales of almost 64% from 2018 to 2019.
Top Customers: Path was in 2nd position by sales in 2018, accounting for 19% of total sales, after Electricalslytical, which accounted for 21%. But in 2019, Electricalslytical and Path were the 2nd and 4th highest customers by sales. By targeting these markets and customers with new ideas such as promotions and discounts, we can look to reverse the trend of decreasing sales.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description:
The myusabank.csv dataset contains daily financial data for a fictional bank (MyUSA Bank) over a two-year period. It includes various key financial metrics such as interest income, interest expense, average earning assets, net income, total assets, shareholder equity, operating expenses, operating income, market share, and stock price. The data is structured to simulate realistic scenarios in the banking sector, including outliers, duplicates, and missing values for educational purposes.
Potential Student Tasks:
- Data Cleaning and Preprocessing
- Exploratory Data Analysis (EDA)
- Calculating Key Performance Indicators (KPIs)
- Building Tableau Dashboards
- Forecasting and Predictive Modeling
- Business Insights and Reporting
Educational Goals:
The dataset aims to provide hands-on experience in data preprocessing, analysis, and visualization within the context of banking and finance. It encourages students to apply data science techniques to real-world financial data, enhancing their skills in data-driven decision-making and strategic analysis.
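For the KPI task, a minimal pandas sketch using standard banking ratio definitions; the column names are assumptions about myusabank.csv, not confirmed by the source:

import pandas as pd

df = pd.read_csv("myusabank.csv")                                  # column names below are assumed
nim = (df["interest_income"] - df["interest_expense"]) / df["average_earning_assets"]  # net interest margin
roa = df["net_income"] / df["total_assets"]                        # return on assets
roe = df["net_income"] / df["shareholder_equity"]                  # return on equity
efficiency = df["operating_expenses"] / df["operating_income"]     # efficiency ratio
print(pd.DataFrame({"NIM": nim, "ROA": roa, "ROE": roe, "Efficiency": efficiency}).describe())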
INTRODUCTION: As California’s homeless population continues to grow at an alarming rate, large metropolitan regions like the San Francisco Bay Area face unique challenges in coordinating efforts to track and improve homelessness. As an interconnected region of nine counties with diverse community needs, identifying homeless population trends across San Francisco Bay Area counties can help direct efforts more effectively throughout the region, and inform initiatives to improve homelessness at the city, county, and metropolitan level.

OBJECTIVES: The primary objective of this research is to compare the annual Point-in-Time (PIT) counts of homelessness across San Francisco Bay Area counties between the years 2018-2022. The secondary objective is to compare the annual PIT counts of homelessness among different age groups in each of the nine San Francisco Bay Area counties between the years 2018-2022.

METHODS: Two datasets were used to conduct research. The first dataset (Dataset 1) contains Point-in-Time (PIT) homeless counts published by the U.S. Department of Housing and Urban Development. Dataset 1 was cleaned using Microsoft Excel and uploaded to Tableau Desktop Public Edition 2022.4.1 as a CSV file. The second dataset (Dataset 2) was published by Data SF and contains shapefiles of geographic boundaries of San Francisco Bay Area counties. Both datasets were joined in Tableau Desktop Public Edition and all data analysis was conducted using Tableau visualizations in the form of bar charts, highlight tables, and maps.

RESULTS: Alameda, San Francisco, and Santa Clara counties consistently reported the highest annual count of people experiencing homelessness across all 5 years between 2018-2022. Alameda, Napa, and San Mateo counties showed the largest increase in homelessness between 2018 and 2022. Alameda County showed a significant increase in homeless individuals under the age of 18.

CONCLUSIONS: Results from this research reveal both stark and fluctuating differences in homeless counts among San Francisco Bay Area counties over time, suggesting that a regional approach focused on collaboration across counties and coordination of services could prove beneficial for improving homelessness throughout the region. Results suggest that more immediate efforts should focus on the counties of Alameda, San Francisco, Santa Clara, and San Mateo. Changes in homelessness during the COVID-19 pandemic years of 2020-2022 point to an urgent need to support Contra Costa County.
Sorting table based on the Arcateg™ repository. Agreement drafted in French and signed by hand on 03/12/2019 by the Director of ADEM and the Director of ANLux.
History of administration:
The management of jobseekers in the Grand Duchy of Luxembourg dates back to the end of the 19th century with the creation of labour exchanges in Luxembourg, Esch-sur-Alzette and Diekirch. There was no clear policy on the employment and management of the unemployed before the Act of 2 May 1913 governing the action of employment offices. After a short-lived Central Placement Office, created under the German occupation on 13 July 1940, it was not until the end of the Second World War that the first lasting central state administration for employment management emerged. The National Labour Office, which took over the management of the labour exchanges, was entrusted with this task by the Grand-Ducal Decree of 30 June 1945. The Office was replaced in 1976 by the Employment Administration (ADEM), which was reformed in 2012 and renamed the Employment Development Agency.
Principal missions:
ADEM is the public employment service of the Grand Duchy of Luxembourg whose mission is to promote employment by strengthening the capacity to steer employment policy in coordination with economic and social policy. ADEM’s clients are jobseekers and employers. In the context of career guidance, ADEM also advises secondary school pupils.
In order to carry out this task, the Agency shall have the following powers:
- Accompanying, advising, guiding and helping people looking for a job
- Contributing to the security of employees' career paths
- Coordinating and organising the training of jobseekers with a view to increasing their professional skills, in collaboration with bodies which have vocational training in their remit
- Prospecting the labour market, collecting job vacancies, and assisting and advising employers in their recruitment
- Ensuring that job vacancies and applications are matched
- Ensuring the enforcement of legislation concerning the prevention of unemployment, the reduction of unemployment, the granting of unemployment benefits and employment support
- Intervening in the retraining and re-employment of the workforce
- Contributing to the implementation of legislation on the restoration of full employment
- Organising apprenticeship placements for young people and adults
- Providing career guidance for the integration or reintegration of young people and adults into working life
- Contributing to the development and management of youth employment measures
- Promoting female employment, in particular as regards access to employment
- Providing guidance, training, rehabilitation, professional integration and reintegration, and follow-up for employees with disabilities and employees with reduced working capacity
- Monitoring and analysing the situation and developments on the labour market
- Ensuring technical relations with similar foreign and international services
Regulatory references:
Versions and updates:
The following is published in the dataset:
- The first version, signed on 03/12/2019
Infos migrations no 20 - La population étrangère [The foreign population] (DSED)
Deterministic and stochastic methods are two approaches to modeling the crude oil and bottled water markets. Forecasting market prices directly affects energy producers and water users. Two software tools, Tableau and Python, are used to model and visualize both markets with the aim of estimating possible future prices. The role of this software is to provide an optimal alternative between the two methods (deterministic versus stochastic). The prices predicted in Tableau rest on a deterministic basis—global optimization and time series—while a Monte Carlo simulation, as a stochastic method, is modeled in Python. The purpose of the project is, first, to predict the price of crude oil and bottled water with a stochastic method (Monte Carlo simulation) and a deterministic one (Tableau software) and, second, to compare the prices in a case study of Crude Oil Prices: West Texas Intermediate (WTI) and U.S. bottled water.

1. Introduction

Predicting stocks and stock price indices is challenging due to the uncertainties involved. We can analyze from different aspects: how investors perform before investing in a stock, or the evaluation of stocks by studying statistics generated by market activity, such as past prices and volumes. Data analysis attempts to identify stock patterns and trends that may help estimate future prices. Initially, classical regression (deterministic) methods were used to predict stock trends; later, uncertainty-based (stochastic) methods were used for forecasting as well. In "Deterministic versus stochastic volatility: implications for option pricing models" (1997), Paul Brockman & Mustafa Chowdhury investigated whether stock return volatility is deterministic or stochastic. They reported that "Results reported herein add support to the growing literature on preference-based stochastic volatility models and generally reject the notion of deterministic volatility" (p. 499). For this argument, we model and forecast historical data with two software tools (Tableau and Python).

For its forecast feature, Tableau automatically chooses the best of up to eight models, namely the one that generates the highest-quality forecast. According to the Tableau manual, Tableau assesses forecast quality by optimizing the smoothing of each model. The optimization is global. The core of the feature is a taxonomy of exponential smoothing that evaluates the best eight models when enough data is available. The real-world data-generating process is part of the forecast feature and supports the deterministic method. Tableau's forecast feature therefore illustrates the best possible future price deterministically (time series and prices).

The Monte Carlo simulation (MCS) is modeled in Python and predicts the floating stock market index. Forecasting the stock market with Monte Carlo methods means solving problems mathematically by generating suitable random numbers and observing the fraction of the numbers that obeys some property or properties. The method is used to obtain numerical solutions to problems too complicated to solve analytically. It randomly generates thousands of series representing potential outcomes for possible returns. The variable price is therefore based on random draws between possible spot prices between 2002-2016, which represents the stochastic method.
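A minimal NumPy sketch of the Monte Carlo approach described above—bootstrapping thousands of price paths from historical returns; the file name, column name, and horizon are assumptions for illustration:

import numpy as np
import pandas as pd

prices = pd.read_csv("wti_monthly.csv")["price"].to_numpy()   # hypothetical WTI price series, 2002-2016
returns = prices[1:] / prices[:-1] - 1                        # historical simple returns
rng = np.random.default_rng(42)
draws = rng.choice(returns, size=(10_000, 12))                # 10,000 simulated one-year return paths
paths = prices[-1] * np.cumprod(1 + draws, axis=1)            # price series from the last observed spot
print(np.percentile(paths[:, -1], [5, 50, 95]))               # 5th/50th/95th percentile price in 12 months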
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles—smoking, alcohol abuse, obesity, and a lack of exercise—have been linked to cancer incidence and mortality, but analyzing those links is hard. Correlation analysis between cancer and lifestyle, together with prediction of cancer incidence and mortality over the next several years, can be used to guide people toward healthy lives and to target medical and financial resources. This paper covers two key research areas.

Data preprocessing and sample expansion design: using experimental analysis and comparison, this study chooses the best cubic spline interpolation technique to expand the original data from 32 entry points to 420 entry points, converting annual data into monthly data to solve the problem of data insufficient for correlation analysis and prediction. Factor analysis is possible because the data sources indicate changing factors.

TSA-LSTM two-stage attention design: Tableau, a popular tool with advanced visualization functions, simplifies this paper's study, but testing indicates that it cannot analyze and predict this paper's time-series data. The TSA-LSTM optimization model therefore builds on LSTM. Beginning with input-feature attention, the model's attention mechanism guarantees that the encoder converges to a subset of input-sequence features during the prediction of output-sequence features; as a result, the model's natural learning trend and prediction quality are enhanced. The second stage, time-performance attention, lets us choose network features and improve forecasts based on real-time performance.

After validating the data source with factor correlation analysis and predicting trends with the TSA-LSTM model, we find that most cancers have overlapping risk factors: excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer, and a poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to the visual tests. Based on the 2021 data, cancer incidence is expected to climb 18-21% between 2020 and 2025. Long-term projection accuracy is 98.96%, and smoking and obesity may be the main cancer causes.
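A minimal SciPy sketch of the sample expansion step—cubic spline interpolation from 32 annual entry points to 420 entry points; the series itself is illustrative, not the paper's data:

import numpy as np
from scipy.interpolate import CubicSpline

years = np.arange(1990, 2022)                       # 32 annual entry points (years assumed)
incidence = np.linspace(100.0, 180.0, years.size)   # hypothetical annual incidence values
spline = CubicSpline(years, incidence)
monthly_t = np.linspace(years[0], years[-1], 420)   # expand to 420 entry points
monthly = spline(monthly_t)                         # smooth monthly series for correlation analysis
print(monthly.shape)                                # (420,)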
This dataset contains data from Roscommon County Council’s Annual Budget. The budget is comprised of Tables A to F and Appendix 1. Each table is represented by a separate data file.
Dataset Name: Budget 2015: Table F,
Dataset Publisher: Roscommon County Council,
Dataset Language: English,
Date of Creation: November 2014,
Last Updated: November 2014,
Update Frequency: Annual.
The published annual budget document can be viewed at http://www.roscommoncoco.ie/en/Services/Finance/Annual_Budget/Annual-Budget-2015.pdf
Table F provides a breakdown of Expenditure to Sub-Service level of the Expenditure and Income to Income Source per Council Division contained in Table A.
In the published Annual Budget document, Table F is published as a separate table for each Division.
Section 1 of Table F contains Expenditure broken down by ‘Division’, ‘Service’ and ‘Sub-Service’,
Section 2 of Table F contains Income broken down by ‘Division’, ‘Income Type’ and ‘Income Source’,
Data fields for Table F are as follows –
Doc : Table Reference,
Heading : Indicates sections in the Table - Table F is comprised of two sections: Expenditure and Income. Heading = 1 for all Expenditure records; Heading = 2 for all Income records,
Ref : Division Reference,
Ref_Desc : Division Description,
Ref1 : Service Reference for all Expenditure records (i.e. Heading = 1) or Income Type for all Income records (i.e. Heading = 2),
https://creativecommons.org/publicdomain/zero/1.0/
Context
Tourism and travel account for more than 10% of worldwide GDP and are trending toward an even larger share of the global pie. At the same time, the industry generates huge volumes of data, and taking advantage of it can help businesses stand out from the crowd.
Content
The dataset provides reservations data for two consecutive seasons (2021 - 2023) of a luxury hotel.
Source
ChatGPT 3.5 (OpenAI) is the main creator of the dataset. Minor adjustments were performed by myself to ensure that the dataset contains the desired fields and values.
Inspiration
- How effectively is the hotel performing across key metrics?
- How are bookings distributed across different channels (e.g., Booking Platform, Phone, Walk-in, and Website)?
- What is the current occupancy rate and how does it compare to the same period last year?
- What are the demographics of the current guests (e.g., nationality)?
- What is the average daily rate (ADR) per room?
These are examples of interesting questions that could be answered by analyzing this dataset.
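For two of these questions, a minimal pandas sketch using the standard ADR definition (room revenue divided by room nights sold); the file and column names are assumptions about the dataset:

import pandas as pd

res = pd.read_csv("reservations.csv")                       # hypothetical export; column names assumed
adr = res["room_revenue"].sum() / res["nights"].sum()       # ADR = room revenue / room nights sold
channel_mix = res["channel"].value_counts(normalize=True)   # share of bookings per channel
print(f"Average daily rate: {adr:.2f}")
print(channel_mix)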
If you are interested, please have a look at the Tableau dashboard that I have created to help answer the above questions. Tableau dashboard: https://public.tableau.com/app/profile/dimitris.angelides/viz/HotelExecutiveDashboards/HotelExecutiveSummaryReport?publish=yes
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About Datasets:
Domain: Finance
Project: Bank loan of customers
Datasets: Finance_1.xlsx & Finance_2.xlsx
Dataset Type: Excel Data
Dataset Size: Each Excel file has 39k+ records
KPIs:
1. Year wise loan amount stats
2. Grade and sub grade wise revol_bal
3. Total payment for verified status vs. total payment for non-verified status
4. State wise loan status
5. Month wise loan status
6. Get more insights based on your understanding of the data
Process:
1. Understanding the problem
2. Data collection
3. Data cleaning
4. Exploring and analyzing the data
5. Interpreting the results
The workbook built on this data includes a bar chart, text, a stacked bar chart, a dashboard, horizontal bars, a donut chart, an area chart, a treemap, slicers, a table, and an image.
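For KPIs 1 and 3 above, a minimal pandas sketch; the column names (issue_date, loan_amount, total_payment, verification_status) are assumptions about the Excel files:

import pandas as pd

loans = pd.concat([pd.read_excel("Finance_1.xlsx"),
                   pd.read_excel("Finance_2.xlsx")])                        # ~39k records each
loans["year"] = pd.to_datetime(loans["issue_date"]).dt.year                 # column name assumed
print(loans.groupby("year")["loan_amount"].agg(["count", "sum", "mean"]))   # KPI 1: year wise loan amount stats
print(loans.groupby("verification_status")["total_payment"].sum())          # KPI 3: payment by verification status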
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
View the data
For best results:
- View the dashboard in full screen.
- Use Chrome or Firefox as your browser.

Read the data

Data views
There are two views with this dashboard. You can toggle between them by clicking the button on the top right of the dashboard. The views are:
- Crime summary view
- Crime details view

Viewing modes
There are two modes for viewing this dashboard. You can toggle between them by clicking the button. The modes are:
- Dark
- Light

Search the data

Crime summary view
The search options allow you to select:
- Location: Options are citywide, each of the precincts, each of the wards, or each of the neighborhoods.
- Select Crime: Select a type of crime to display.
- Select Chart: Select a way to display the crime data.

Crime detail view
The search options allow you to select:
- Date range: Select a custom date range.
- Location: Options are citywide, each of the precincts, each of the wards, or each of the neighborhoods.
- Select Type: Select a type of crime.
- Select Categories: Select one or more categories of crime to display.
- Select Details: Select one or more details to filter the data displayed.
- Select Chart: Select a way to display the crime data.

View dashboard data definitions and detailed directions
View the open data set
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sorting table of the Institut national de la statistique et des études économiques, based on the Arcateg™ repository. Agreement drafted in French and signed electronically on 29/07/2020 by the Director of STATEC and the Director of ANLux.

History of administration:
The various bodies responsible for statistics in the Grand Duchy of Luxembourg took shape over the course of the 20th century. The Office de la statistique was created in 1923, succeeding the Commission permanente de statistique established at the beginning of the 20th century. The Office de la statistique générale followed it by the Grand-Ducal Decree of 2 August 1945. It was tasked with carrying out all the statistical work needed to inform the public authorities, in particular on the demographic, economic and societal situation of the country. It had the power to centralise all statistical information as well as the power to authorise general surveys. A Service d'études et de documentation économique was also created in 1945, with the mission of studying problems relating to the structure and organisation of the country's economy, in particular questions of orientation and readjustment. Born from the merger of the Office de la statistique générale and the Service d'études et de documentation économique, the Service central de la statistique et des études économiques, more commonly known as STATEC, was founded in 1962. Its missions were completed and clarified by the Law of 14 July 1971 reorganising the Service central de la statistique et des études économiques. It was only through the Law of 10 July 2011 that STATEC defined its current missions and became the Institut national de la statistique et des études économiques.

Principal missions:
- Building a statistical information system accessible to the public, in particular on the structure and activity of the country, by producing statistics on demographic, economic, social and environmental phenomena through surveys or the use of administrative files, and by centralising the statistical data that public bodies hold by virtue of their attributions
- Establishing the national accounts, global or sectoral
- Establishing, with the Banque centrale du Luxembourg, the balance of payments and the financial accounts, and guaranteeing their methodological coherence in accordance with European and international rules
- Establishing and managing a "Centrale des bilans" made up of data from the annual accounts of companies, and publishing the resulting information
- Carrying out the censuses of population, housing and buildings
- Conducting studies and analyses in the field of statistical methodology and statistical procedures, and publishing the results
- Gathering general documentation concerning statistics, as well as demographic, economic and social theories and facts
- Representing Luxembourg as the national statistical authority vis-à-vis foreign, Community and international statistical authorities

Regulatory references:
- Amended Law of 10 July 2011 organising the Institut national de la statistique et des études économiques

Versions and updates:
The following is published in the dataset:
- The first version, signed on 29/07/2020
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sorting table of the Centre socio-éducatif de l'État, based on the Arcateg™ repository. Agreement drafted in French and signed electronically on 05/07/2022 by the Director of the CSEE and the Director of ANLux.

History of administration:
The care and judicial treatment of minors has been a concern of society, political circles and governments for several centuries. A legal differentiation of penalties according to the age of the convicted person can be found from the French Revolution onwards. Title V of the Penal Code of 1791 introduced "the influence of the age of convicted persons on the nature and duration of penalties", depending on whether the age of sixteen had been reached. The Napoleonic Penal Code of 1810 took up this differentiation of penalties according to the same age criterion in its Articles 66 and 67. Article 110 of the Royal Grand-Ducal Decree of 6 February 1873 also indicated a differentiation of treatment according to age, specifying that "school attendance is compulsory for all detainees of both sexes held in the house of correction and for all other convicted persons under sixteen years of age." Particular attention was thus paid to the education of detained minors; nevertheless, no framework for juvenile justice had yet been clearly established. The beginning of the 20th century was marked by a hardening of the measures taken against minors who committed offences, but also by better protection of minors, with the first juvenile justice arrangements establishing specific courts and judges. The Law of 2 August 1939 on child protection illustrates this dual approach. Chapter 1 is devoted to the forfeiture of parental authority. Chapter 2 establishes the measures to be taken for minors brought to justice; its various articles define the function of the juvenile judge, the measures he is authorised to take, and the procedures in matters of punishment. This law thus laid the foundations of the protection and justice of minors. The care of minors was then under the authority of the Ministry of Justice. The law was amended by the Law of 27 October 1958 and then repealed by the Law of 12 November 1971 on the protection of youth, after which responsibility for the protection and justice of minors fell to the Ministry of the Family. The early 1990s were marked by a reform of the institutions. The Law of 12 July 1991 organising the State socio-educational centres set out for the first time in a legal text the missions and organisation of the establishments responsible for receiving young people. The education houses for boys at Dreiborn and for girls at Schrassig then took the name of State socio-educational centres, under the triple responsibility of the Ministers of the Family, of Justice and of National Education. That law was repealed by the Law of 16 June 2004 reorganising the State socio-educational centre, still in force and amended by the Law of 29 August 2017.

Principal missions:
The Law of 16 June 2004 gives the CSEE the mission of receiving minors entrusted to it by decision of the judicial authorities under the provisions of the law on the protection of youth or any other legal provisions. This general reception mission comprises:
- Socio-educational reception: the physical reception of young people, enabling their social and educational care
- Therapeutic assistance: the medical and psychological care of young people
- Socio-educational teaching: education in its social and pedagogical aspects
- The mission of preservation and custody: ensuring the protection and supervision of young people and of their actions

Regulatory references:
- Law of 29 August 2017 amending: the amended Law of 16 June 2004 reorganising the State socio-educational centre; the amended Law of 29 June 2005 fixing the staff establishment of secondary and technical secondary schools; the amended Law of 23 July 1952 concerning military organisation; Article 32 of Book I of the Social Security Code
- Law of 16 June 2004 reorganising the State socio-educational centre
- Law of 12 July 1991 organising the State socio-educational centres

Versions and updates:
The following is published in the dataset:
- The first version, signed on 05/07/2022
Licence Ouverte / Open Licence 1.0 https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
License information was derived automatically
Admission au séjour [Admission for residence] (CICI 2011 - The orientations of immigration policy - Seventh report prepared pursuant to Article L. 111-10 of the Code on the Entry and Residence of Foreigners and the Right of Asylum)
The population of Metro Vancouver (20110729 Regional Growth Strategy Projections: Population, Housing and Employment 2006-2041 file) will have increased greatly by 2040, and finding a new source of reservoirs for drinking water (2015 Water Consumption Statistics file) will be essential. Drinking water supply needs to be optimized and estimated (Data Mining file) with the aim of developing the region. The three current water reservoirs for Metro Vancouver are Capilano, Seymour, and Coquitlam, from which treated water is supplied to customers. The linear optimization (LP) model (Optimization, Sensitivity Report file) determines the amount of drinking water for each reservoir and region. The B.C. government has a specific strategy for the growing population up to 2040 that leads toward this goal. In addition, a new source of drinking water needs to be estimated and monitored to anticipate feasible water sources (wells) until 2040; the government will therefore have to decide how much groundwater is used. The goal of the project has two steps: (1) an optimization model for the three water reservoirs, and (2) estimating the new source of water out to 2040.

The data analysis process for the project uses six software tools—Trifacta Wrangler, AMPL, Excel Solver, ArcGIS, SQL, and Tableau:
1. Trifacta Wrangler cleans the data (Data Mining file).
2. AMPL and Excel Solver optimize drinking water consumption for Metro Vancouver (data in the Optimization and Sensitivity Report file).
3. ArcMap combines the raw data with the results of the water reservoir optimization and the population estimate to 2040 (GIS Map for Tableau file).
4. The source of drinking water for Metro Vancouver until 2040 is visualized, estimated, and optimized with SQL in Tableau (export tableau data file).
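A minimal SciPy sketch of the reservoir allocation LP, with two illustrative demand regions; all costs, capacities, and demands here are hypothetical placeholders (the real figures live in the Optimization and Sensitivity Report file):

import numpy as np
from scipy.optimize import linprog

# Decision variables x = [C1, C2, S1, S2, Q1, Q2]: water sent from
# Capilano, Seymour, and Coquitlam to regions 1 and 2 (units arbitrary).
cost = np.array([1.0, 1.4, 1.2, 0.9, 1.3, 1.1])   # hypothetical unit supply costs
capacity = [370, 385, 360]                        # hypothetical reservoir limits
demand = [500, 450]                               # hypothetical regional demands

A_ub = np.array([[1, 1, 0, 0, 0, 0],              # Capilano shipments <= its capacity
                 [0, 0, 1, 1, 0, 0],              # Seymour
                 [0, 0, 0, 0, 1, 1]])             # Coquitlam
A_eq = np.array([[1, 0, 1, 0, 1, 0],              # region 1 demand met exactly
                 [0, 1, 0, 1, 0, 1]])             # region 2 demand met exactly
result = linprog(cost, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=demand)
print(result.x.reshape(3, 2), result.fun)         # allocation matrix and total cost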
https://datacatalog.worldbank.org/public-licenses?fragment=cc
The Social Sustainability global database and its visualization dashboard are global public goods produced by the Social Sustainability and Inclusion Global Practice of The World Bank Group. They feature 85 leading indicators of inclusion, resilience, social cohesion, and process legitimacy, for 222 countries, disaggregated by population group and analyzed spatially and over time. In addition, the dashboard allows the user to overlay the indicators in the geospatial platform of the World Bank Group.
Data Sources, technical note
The database and dashboard draw from 18 publicly available data sources comprising the Barometers, the World Values Survey, the European Values Study, the Global Monitoring Database, ACLED, and the World Development Indicators, among others. The full list of data sources can be found here, and the technical note used for its construction can be accessed here.
Disaggregation
Population group disaggregation can be performed by: gender (female/male); age (15-24 years vs 25+ years, or 15-29 years, 30-59 years, 60+ years); location (urban/rural); ethnicity and religion (major group/others). Analysis over time can be performed for two waves: 2015-2018 and 2019-2022. Spatial analysis can be performed at the first administrative level (ADM1).
Analysis
The dashboard offers four main analytical tools: country profile, country benchmarking, regional benchmarking and associations. The country profile option allows the user to examine the social profile of a country in a given wave. The benchmarking options allow comparisons of a country in two different time periods or across regions or globally. The associations function offers the user the option to examine patterns between two indicators.
Open data
The SSI database and dashboard, being global public goods, follow the open data and reproducibility policies of the World Bank Group. The STATA code, codebook, and technical note used for constructing the indicators are available here on GitHub.
Links to resources
https://creativecommons.org/publicdomain/zero/1.0/
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of content added to Netflix from 2008 to 2021; the oldest title dates back to 1925 and the newest to 2021. The dataset was cleaned with PostgreSQL and visualized with Tableau. The purpose of this dataset is to test my data cleaning and visualization skills. The cleaned data can be found below, and the Tableau dashboard can be found here.
We are going to:
1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns
Extra steps and more explanation on the process are given in the code comments.
--View dataset
SELECT *
FROM netflix;
--The show_id column is the unique id for the dataset, therefore we are going to check for duplicates
SELECT show_id, COUNT(*)
FROM netflix
GROUP BY show_id
ORDER BY show_id DESC;
--No duplicates
--Check null values across columns
SELECT COUNT(*) FILTER (WHERE show_id IS NULL) AS showid_nulls,
COUNT(*) FILTER (WHERE type IS NULL) AS type_nulls,
COUNT(*) FILTER (WHERE title IS NULL) AS title_nulls,
COUNT(*) FILTER (WHERE director IS NULL) AS director_nulls,
COUNT(*) FILTER (WHERE movie_cast IS NULL) AS movie_cast_nulls,
COUNT(*) FILTER (WHERE country IS NULL) AS country_nulls,
COUNT(*) FILTER (WHERE date_added IS NULL) AS date_added_nulls,
COUNT(*) FILTER (WHERE release_year IS NULL) AS release_year_nulls,
COUNT(*) FILTER (WHERE rating IS NULL) AS rating_nulls,
COUNT(*) FILTER (WHERE duration IS NULL) AS duration_nulls,
COUNT(*) FILTER (WHERE listed_in IS NULL) AS listed_in_nulls,
COUNT(*) FILTER (WHERE description IS NULL) AS description_nulls
FROM netflix;
We can see that there are NULLS.
director_nulls = 2634
movie_cast_nulls = 825
country_nulls = 831
date_added_nulls = 10
rating_nulls = 4
duration_nulls = 3
Nulls make up about 30% of the director column, so I will not delete them; I will rather find another column to use to populate it. To populate the director column, we want to find out if there is a relationship between the movie_cast column and the director column.
-- Below, we find out if some directors are likely to work with particular cast
WITH cte AS
(
SELECT title, CONCAT(director, '---', movie_cast) AS director_cast
FROM netflix
)
SELECT director_cast, COUNT(*) AS count
FROM cte
GROUP BY director_cast
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
With this, we can now populate the NULL director rows using their record with movie_cast:
UPDATE netflix
SET director = 'Alastair Fothergill'
WHERE movie_cast = 'David Attenborough'
AND director IS NULL ;
--Repeat this step to populate the rest of the director nulls
--Populate the rest of the NULL in director as "Not Given"
UPDATE netflix
SET director = 'Not Given'
WHERE director IS NULL;
--When I was doing this, I found a less complex and faster way to populate a column which I will use next
Just like the director column, I will not delete the nulls in country. Since the country column is related to director and movie, we are going to populate the country column using the director column.
--Populate the country using the director column
SELECT COALESCE(nt.country,nt2.country)
FROM netflix AS nt
JOIN netflix AS nt2
ON nt.director = nt2.director
AND nt.show_id <> nt2.show_id
WHERE nt.country IS NULL;
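--Now run the same self-join as an UPDATE, copying the country from
--another title by the same director into the NULL rows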
UPDATE netflix
SET country = nt2.country
FROM netflix AS nt2
WHERE netflix.director = nt2.director and netflix.show_id <> nt2.show_id
AND netflix.country IS NULL;
--Confirm whether any rows still have a NULL country (directors with no known country)
SELECT director, country, date_added
FROM netflix
WHERE country IS NULL;
--Populate the rest of the NULL in country as "Not Given"
UPDATE netflix
SET country = 'Not Given'
WHERE country IS NULL;
The date_added column has only 10 nulls out of over 8,000 rows; deleting them will not materially affect our analysis or visualization.
--Show date_added nulls
SELECT show_id, date_added
FROM netflix
WHERE date_added IS NULL;
--DELETE nulls
DELETE FROM netflix
WHERE date_added IS NULL;
This dataset shows different breakdowns of London's resident population by nationality. The data comes from ONS' Annual Population Survey (APS). The APS has a sample of around 320,000 people in the UK (around 28,000 in London); as such, all figures must be treated with some caution. 95% confidence intervals are provided. Numbers have been rounded to the nearest thousand, and figures for smaller populations have been suppressed. Two files are available to download: Nationality - Borough, which shows nationality estimates in broad groups such as European Union, South East Asia, North Africa, etc., broken down to borough level; and Detailed Nationality - London, which shows nationality estimates for specific countries such as France, Bangladesh, Nigeria, etc., for London as a whole. A Tableau visualisation tool is also available. Country of Birth data can be found here: https://data.london.gov.uk/dataset/country-of-birth. Nationality refers to that stated by the respondent during the interview; country of birth is the country in which they were born. An individual's nationality may change, but their country of birth cannot. This means that country of birth gives a more robust estimate of change over time.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains financial information on 1500 companies across 8 different industries, scraped from companiesmarketcap.com in May 2024. It includes each company's name, industry, country, number of employees, market cap, revenue, earnings, etc.
The dataset contains 2 files with the same column names. The scraped_company_data.csv file is further transformed and cleaned to produce the final transformed_company_data.csv file.
The website companiesmarketcap.com was used to scrape this dataset. Please include citations for this dataset if you use it in your own research.
The dataset can be used to find industries with the highest average market value, most profitable industries, most growth-oriented sectors, etc. More interesting insights can be found in this README file.
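As an example of the first question, a minimal pandas sketch ranking industries by average market value; the column names industry and marketcap are assumptions about the file:

import pandas as pd

companies = pd.read_csv("transformed_company_data.csv")   # cleaned file from this dataset
print(companies.groupby("industry")["marketcap"].mean()   # average market cap per industry
               .sort_values(ascending=False))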