Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Percentage of individuals using the internet for internet banking (electronic payments/transactions, looking up account information, etc.), within the last 3 months prior to the survey. Expressed as a percentage of all individuals aged between 16 and 74 years old surveyed. Data based on the annual EU survey on the use of Information and Communication Technologies (ICT) in households and by individuals.
https://cdla.io/sharing-1-0/
This dataset simulates a set of key economic, social, and environmental indicators for 20 countries over the period from 2010 to 2019. The dataset is designed to reflect typical World Bank metrics, which are used for analysis, policy-making, and forecasting. It includes the following variables:
- Country Name: The country for which the data is recorded.
- Year: The specific year of the observation (from 2010 to 2019).
- GDP (USD): Gross Domestic Product in billions of US dollars, indicating the economic output of a country.
- Population: The total population of the country in millions.
- Life Expectancy (in years): The average life expectancy at birth for the country’s population.
- Unemployment Rate (%): The percentage of the total labor force that is unemployed but actively seeking employment.
- CO2 Emissions (metric tons per capita): The per capita carbon dioxide emissions, reflecting environmental impact.
- Access to Electricity (% of population): The percentage of the population with access to electricity, representing infrastructure development.

| Column | Description | Data Type | Range | Example |
|---|---|---|---|---|
| Country | Name of the country for which the data is recorded. | String | | "United States", "India", "Brazil" |
| Year | The year in which the data is observed. | Integer | 2010 to 2019 | 2012, 2015 |
| GDP (USD) | The Gross Domestic Product of the country in billions of US dollars, indicating the economic output. | Float (billions of USD) | | 14200.56 (represents 14,200.56 billion USD) |
| Population | The total population of the country in millions. | Float (millions of people) | | 331.42 (represents 331.42 million people) |
| Life Expectancy (in years) | The average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their life. | Float (years) | Typically between 50 and 85 years | 78.5 years |
| Unemployment Rate (%) | The percentage of the total labor force that is unemployed but actively seeking employment. | Float (percentage) | Typically between 2% and 25% | 6.25% |
| CO2 Emissions (metric tons per capita) | The amount of carbon dioxide emissions per person in the country, measured in metric tons. | Float (metric tons) | Typically between 0.5 and 20 metric tons per capita | 4.32 metric tons per capita |
| Access to Electricity (%) | The percentage of the population with access to electricity. | Float (percentage) | Typically between 50% and 100% | 95.7% |
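The documented types and ranges lend themselves to a light sanity check before analysis. The sketch below is only an illustration: the dictionary keys and the `check_row` helper are hypothetical names, not column names taken from the data file, and the "typical" ranges are treated as soft bounds that flag a value rather than reject it.

```python
# Hypothetical keys; the bounds are the "typical" ranges from the column descriptions.
TYPICAL_RANGES = {
    "year": (2010, 2019),
    "life_expectancy": (50.0, 85.0),
    "unemployment_rate": (2.0, 25.0),
    "co2_per_capita": (0.5, 20.0),
    "electricity_access": (50.0, 100.0),
}


def check_row(row):
    """Return the names of fields that are missing or outside the documented range."""
    flagged = []
    for name, (low, high) in TYPICAL_RANGES.items():
        value = row.get(name)
        if value is None or not (low <= value <= high):
            flagged.append(name)
    return flagged


# Example row assembled from the example values in the descriptions above.
example = {
    "country": "United States",
    "year": 2015,
    "gdp_billion_usd": 14200.56,
    "population_millions": 331.42,
    "life_expectancy": 78.5,
    "unemployment_rate": 6.25,
    "co2_per_capita": 4.32,
    "electricity_access": 95.7,
}
print(check_row(example))  # []
```

A flagged field is not necessarily an error, since the ranges are described as typical rather than strict.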
https://creativecommons.org/publicdomain/zero/1.0/
Term deposits are a major source of income for a bank. A term deposit is a cash investment held at a financial institution. Your money is invested for an agreed rate of interest over a fixed amount of time, or term. The bank has various outreach plans to sell term deposits to their customers such as email marketing, advertisements, telephonic marketing, and digital marketing.
Telephonic marketing campaigns remain one of the most effective ways to reach people. However, they require a large investment, because large call centers are hired to execute them. It is therefore crucial to identify in advance the customers most likely to convert, so that they can be specifically targeted by phone.
The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no'). The data folder contains two datasets:
bank client data:
1 - age (numeric)
2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
related with the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: "yes","no")
Missing Attribute Values: None
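Two conventions in the attribute list usually need handling before modelling: the "yes"/"no" binary fields and the `pdays` sentinel of -1. A minimal sketch, assuming each row has already been read into a dictionary keyed by the attribute names above; the `encode_record` helper and the sample row are invented for illustration.

```python
# Binary attributes documented as "yes"/"no", including the target y.
YES_NO_FIELDS = ("default", "housing", "loan", "y")


def encode_record(record):
    """Return a copy with yes/no fields as booleans and pdays == -1 as None."""
    encoded = dict(record)
    for field in YES_NO_FIELDS:
        if field in encoded:
            encoded[field] = encoded[field] == "yes"
    # -1 is the documented sentinel for "client was not previously contacted".
    if encoded.get("pdays") == -1:
        encoded["pdays"] = None
    return encoded


# Invented sample row, not taken from the dataset.
sample = {"age": 41, "job": "technician", "default": "no", "housing": "yes",
          "loan": "no", "pdays": -1, "previous": 0, "poutcome": "unknown", "y": "no"}
print(encode_record(sample))
```

Mapping the sentinel to `None` keeps it from being read as a numeric distance of -1 days; whether to impute it or add a separate "previously contacted" flag is a modelling choice.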
This dataset is publicly available for research. It was drawn from the UCI Machine Learning Repository by random sampling, with a few additional columns.
Please add this citation if you use this dataset for any further analysis.
S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
The full dataset was described and analyzed in:
Created by: Paulo Cortez (Univ. Minho) and Sérgio Moro (ISCTE-IUL) @ 2012. Thanks to Berkin Kaplanoğlu for helping with the proper column descriptions.
https://creativecommons.org/publicdomain/zero/1.0/
The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann.
The dataset comes from a German bank in 2016 and was collected by Professor Hofmann of the University of California.
In this dataset, each entry represents a person who takes a credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes.
The original dataset required extensive cleaning and variable selection due to its complicated system of categories and symbols. Several columns were simply ignored because they were viewed as unimportant or their descriptions were obscure. The selected attributes are:
The objective of this analysis is to segment the German bank's customers based on the various factors (variables) available in their database.
The library makes use of the following packages:
Conclusion.
The analysis found that the optimal number of clusters was four, characterized as follows:
Cluster 0 - high mean credit amount, long duration, younger customers
Cluster 1 - low mean credit amount, short duration, younger customers
Cluster 2 - low mean credit amount, short duration, older customers
Cluster 3 - high mean credit amount, medium duration, older customers
Segmenting bank customers through clustering techniques offers significant benefits for both the bank itself and its various stakeholders. Here are some key advantages:
For Banks:
New Product Development: By analyzing the needs and preferences of different segments, banks can develop new products and services that cater to their specific requirements, increasing customer loyalty and driving revenue growth.
For Stakeholders:
Improved Customer Experience: Segmented communication and personalized offerings lead to a more satisfying and relevant experience for customers, boosting overall satisfaction and trust in the bank.
Increased Value Perception: By providing products and services aligned with their needs, customers perceive greater value from the bank's offerings, leading to strengthened relationships and increased loyalty.
Enhanced Financial Inclusion: Customer segmentation can help banks identify underserved segments and develop strategies to offer them tailored financial products and services, promoting greater financial inclusion.
Improved Regulatory Compliance: By understanding customer behavior and risk profiles better, banks can better comply with regulations and mitigate potential regulatory risks.
Overall, customer segmentation via clustering empowers banks to make data-driven decisions, optimize their operations, and deliver a more personalized and satisfying experience for their customers. This ultimately leads to increased profitability, stronger stakeholder relationships, and a competitive advantage in the market.
Some additional examples of how customer segmentation can benefit other stakeholders:
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
The online banking penetration rate in Australia was forecast to increase continuously between 2024 and 2029 by a total of 4.1 percentage points. After the fifteenth consecutive year of increase, online banking penetration is estimated to reach 71.28 percent, a new peak, in 2029. Notably, the online banking penetration rate has increased continuously over the past years. Shown is the estimated percentage of the total population in a given region or country that makes use of online banking. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
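The two figures quoted for Australia imply a starting value for 2024. The short calculation below is a sketch under the assumption that the 4.1 percentage points cover exactly the span from 2024 to the 2029 estimate; the `implied_baseline` helper is a hypothetical name, and the result is derived arithmetic, not a value published in the source.

```python
def implied_baseline(final_value, total_increase):
    """Subtract a total increase in percentage points from the final rate."""
    return round(final_value - total_increase, 2)


# 2029 estimate and total forecast increase, as quoted in the text.
print(implied_baseline(71.28, 4.1))  # 67.18
```

Percentage points are subtracted directly; treating 4.1 as a relative (percent) change would give a different and incorrect baseline.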
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Germany DE: Internet Users: Individuals: % of Population data was reported at 93.500 % in 2024. This records an increase from the previous number of 92.500 % for 2023. Germany DE: Internet Users: Individuals: % of Population data is updated yearly, averaging 75.200 % from Dec 1990 (Median) to 2024, with 35 observations. The data reached an all-time high of 93.500 % in 2024 and a record low of 0.126 % in 1990. Germany DE: Internet Users: Individuals: % of Population data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Germany – Table DE.World Bank.WDI: Telecommunication. Internet users are individuals who have used the Internet (from any location) in the last 3 months. The Internet can be used via a computer, mobile phone, personal digital assistant, games machine, digital TV etc. Source: International Telecommunication Union (ITU) World Telecommunication/ICT Indicators Database. Aggregation method: weighted average. Please cite the International Telecommunication Union for third-party use of these data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains synthetic data representing the operations of a banking collections department. It includes information about customers, their loan details, payment histories, and risk assessments. The dataset is designed for data analysis, machine learning, and visualization tasks.
Column Description
| Column Name | Description |
|---|---|
| Customer_ID | A unique identifier for each customer (e.g., CUST00001). |
| Name | The name of the customer (fictional). |
| Account_Number | A unique account number associated with the customer. |
| Account_Type | The type of account held by the customer (Savings, Current, Credit). |
| Loan_Type | The type of loan taken by the customer (Home Loan, Car Loan, Personal Loan). |
| Loan_Amount | The total loan amount issued to the customer. |
| Outstanding_Amount | The remaining balance yet to be paid by the customer. |
| EMI_Amount | The monthly installment amount for the loan. |
| Due_Date | The date on which the EMI was due. |
| Payment_Status | Status of the EMI payment (Paid, Missed, Partially Paid). |
| Collection_Agent | The name of the agent responsible for collecting dues. |
| Last_Payment_Date | The date when the last payment was made, or null if no payment was made. |
| Payment_Delay_Days | The number of days by which the payment was delayed (0 if on time). |
| Region | The geographical region of the customer (North, South, East, West). |
| Contact_Number | The contact number of the customer (fictional). |
| Email | The email address of the customer (fictional). |
| Customer_Score | A score representing the customer’s creditworthiness (300 to 850). |
| Risk_Level | Categorical field indicating the customer’s risk level (Low, Medium, High). |
Customer_ID:
A unique identifier for each customer (e.g., CUST00001).
Name:
The name of the customer (fictional).
Account_Number:
A unique account number associated with the customer.
Account_Type:
The type of account held by the customer. Possible values: Savings, Current, Credit.
Loan_Type:
The type of loan taken by the customer. Possible values: Home Loan, Car Loan, Personal Loan.
Loan_Amount:
The total loan amount issued to the customer, in the range of $5,000 to $500,000.
Outstanding_Amount:
The remaining balance yet to be paid by the customer.
EMI_Amount:
The monthly installment amount calculated based on the outstanding amount and tenure.
Due_Date:
The date on which the EMI was due.
Payment_Status:
Status of the EMI payment. Possible values: Paid, Missed, Partially Paid.
Collection_Agent:
The name of the agent responsible for collecting dues from the customer.
Last_Payment_Date:
The date when the last payment was made. It may be null if no payment was made.
Payment_Delay_Days:
The number of days by which the payment was delayed. Zero if payments are made on time.
Region:
The geographical region of the customer. Possible values: North, South, East, West.
Contact_Number:
The contact number of the customer (fictional).
Email:
The email address of the customer (fictional).
Customer_Score:
A score representing the customer’s creditworthiness, ranging from 300 to 850.
Risk_Level:
A categorical variable indicating the customer’s risk level. Possible values: Low, Medium, High.
This dataset is ideal for:
- Predicting payment defaults using machine learning.
- Analyzing customer credit behavior.
- Visualizing loan repayment patterns by region or loan type.
- Assessing the effectiveness of collection agents.
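As one illustration of the region and agent use cases, the sketch below computes the share of missed payments per group. It assumes the rows have been loaded into dictionaries keyed by the column names above; the `missed_rate_by` helper and the four sample records are invented for the example and do not come from the dataset.

```python
from collections import defaultdict


def missed_rate_by(records, key):
    """Share of records with Payment_Status == 'Missed', grouped by `key`."""
    totals = defaultdict(int)
    missed = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if rec["Payment_Status"] == "Missed":
            missed[group] += 1
    return {group: missed[group] / totals[group] for group in totals}


# Invented sample records using the documented column names and values.
records = [
    {"Region": "North", "Collection_Agent": "Agent A", "Payment_Status": "Paid"},
    {"Region": "North", "Collection_Agent": "Agent B", "Payment_Status": "Missed"},
    {"Region": "South", "Collection_Agent": "Agent A", "Payment_Status": "Partially Paid"},
    {"Region": "South", "Collection_Agent": "Agent B", "Payment_Status": "Missed"},
]
print(missed_rate_by(records, "Region"))  # {'North': 0.5, 'South': 0.5}
```

Passing `"Collection_Agent"` or `"Loan_Type"` as the key gives the same breakdown per agent or per loan type.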
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data set aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
The users used this SSO to access sensitive data provided by the online service, e.g., cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce the results obtained on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.
WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.
Overview
The data set contains the following features related to each login attempt on the SSO:
| Feature | Data Type | Description | Range or Example |
|---|---|---|---|
| IP Address | String | IP address belonging to the login attempt | 0.0.0.0 - 255.255.255.255 |
| Country | String | Country derived from the IP address | US |
| Region | String | Region derived from the IP address | New York |
| City | String | City derived from the IP address | Rochester |
| ASN | Integer | Autonomous system number derived from the IP address | 0 - 600000 |
| User Agent String | String | User agent string submitted by the client | Mozilla/5.0 (Windows NT 10.0; Win64; ... |
| OS Name and Version | String | Operating system name and version derived from the user agent string | Windows 10 |
| Browser Name and Version | String | Browser name and version derived from the user agent string | Chrome 70.0.3538 |
| Device Type | String | Device type derived from the user agent string | (mobile, desktop, tablet, bot, unknown) [1] |
| User ID | Integer | Identification number related to the affected user account | [Random pseudonym] |
| Login Timestamp | Integer | Timestamp related to the login attempt | [64 Bit timestamp] |
| Round-Trip Time (RTT) [ms] | Integer | Server-side measured latency between client and server | 1 - 8600000 |
| Login Successful | Boolean | True: Login was successful, False: Login failed | (true, false) |
| Is Attack IP | Boolean | IP address was found in known attacker data set | (true, false) |
| Is Account Takeover | Boolean | Login attempt was identified as account takeover by incident response team of the online service | (true, false) |
Data Creation
As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and timely order between the features.
The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated from publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.
The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical to those in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.
Regarding the Data Values
Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.
You can recognize them by the following values:
ASNs with values >= 500,000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
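The two markers above are enough to flag the unrealistic entries programmatically. A minimal sketch using Python's standard `ipaddress` module; the `is_placeholder` helper and its argument names are hypothetical and not part of the data set or its tooling.

```python
import ipaddress

# Markers for unrealistic entries, as listed above.
PLACEHOLDER_NETWORK = ipaddress.ip_network("10.0.0.0/8")
PLACEHOLDER_ASN_MIN = 500_000


def is_placeholder(ip, asn):
    """True if the IP lies in 10.0.0.0/8 or the ASN is at or above 500,000."""
    in_reserved_block = ipaddress.ip_address(ip) in PLACEHOLDER_NETWORK
    return in_reserved_block or asn >= PLACEHOLDER_ASN_MIN


print(is_placeholder("10.1.2.3", 1234))     # True
print(is_placeholder("192.0.2.10", 64500))  # False
```

Whether to drop or keep flagged rows depends on the study; according to the text, they do not affect the risk scores of the Freeman et al. (2016) model.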
Study Reproduction
Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.
The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.
See RESULTS.md for more details.
Ethics
By using the SSO service, the users agreed to the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.
The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.
Publication
You can find more details on our conducted study in the following journal article:
Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security
Bibtex
@article{Wiefling_Pump_2022,
author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
journal = {{ACM} {Transactions} on {Privacy} and {Security}},
doi = {10.1145/3546069},
publisher = {ACM},
year = {2022}
}
License
This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069
[1] A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.
The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.
The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from nationally representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.
The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.
National coverage
Individual
Observation data/ratings [obs]
In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.
In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.
The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).
For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.
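The threshold rules for the phone-based economies can be written down compactly. The sketch below only restates the 20 percent and 80 percent thresholds from the text; the function names, the string labels, and the use of fractions between 0 and 1 are assumptions for illustration, not Gallup's actual procedure or code.

```python
def sampling_frame(landline_share):
    """Dual-frame RDD at 20 percent landline presence or more, else mobile-only RDD."""
    return "dual-frame RDD" if landline_share >= 0.20 else "mobile-only RDD"


def within_household_selection(phone_type, penetration):
    """Respondent selection rule for a reached phone number.

    Only landline respondents in economies with at least 80 percent phone
    penetration get a further random selection step.
    """
    if phone_type == "landline" and penetration >= 0.80:
        return "latest birthday or household enumeration"
    return "no further selection"


print(sampling_frame(0.35))                            # dual-frame RDD
print(sampling_frame(0.05))                            # mobile-only RDD
print(within_household_selection("landline", 0.90))    # latest birthday or household enumeration
print(within_household_selection("mobile", 0.90))      # no further selection
```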
Sample size for France is 1000.
Landline and mobile telephone
Questionnaires are available on the website.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.
The population share with mobile internet access in India was forecast to increase continuously between 2024 and 2029 by a total of 25 percentage points. After the fifteenth consecutive year of increase, mobile internet penetration is estimated to reach 73.62 percent, a new peak, in 2029. Notably, the population share with mobile internet access has increased continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of this project was to create a structured dataset which can be used to train computer vision models to detect electronic waste devices, i.e., e-waste or Waste Electrical and Electronic Equipment (WEEE). Due to the often-subjective differences between e-waste and functioning electronic devices, a model trained on this dataset could also be used to detect electronic devices in general. However, it must be noted that for the purposes of e-waste recognition, this dataset does not differentiate between different brands or models of the same type of electronic devices, e.g. smartphones, and it also includes images of damaged equipment.
The structure of this dataset is based on the UNU-KEYS classification (Wang et al., 2012; Forti et al., 2018). Each class in this dataset has a tag containing its corresponding UNU-KEY. This dataset structure has the following benefits:
1. It allows the user to easily classify e-waste devices regardless of which e-waste definition their country or organization uses, thanks to the correlation between the UNU-KEYS and other classifications such as the HS-codes or the EU-6 categories defined in the WEEE directive.
2. It helps dataset contributors focus on adding e-waste devices with higher priority compared to arbitrarily chosen devices. This is because electronic devices in the same UNU-KEY category have similar function, average weight and life-time distribution as well as comparable material composition, both in terms of hazardous substances and valuable materials, and related end-of-life attributes (Forti et al., 2018).
3. It gives dataset contributors a clear goal of which electronic devices still need to be added and a clear understanding of their progress in the seemingly endless task of creating an e-waste dataset.
This dataset contains annotated images of e-waste from every UNU-KEY category. According to Forti et al., 2018, there are a total of 54 UNU-KEY e-waste categories.
At the time of writing (22 April 2024), the dataset has 19613 annotated images and 77 classes. The dataset has mixed bounding-box and polygon annotations. Each class of the dataset represents one type of electronic device. Different models of the same type of device belong to the same class. For example, different brands of smartphones are labelled as "Smartphone", regardless of their make or model. Several classes can belong to the same UNU-KEY category and therefore have the same tag. For example, the classes "Smartphone" and "Bar-Phone" both belong to the UNU-KEY category "0306 - Mobile Phones". The images in the dataset are anonymized, meaning that no people were annotated and images containing visible faces were removed.
The dataset was almost entirely built by cloning annotated images from the following open-source Roboflow datasets: [1]-[91]. Some of the images in the dataset were acquired from the Wikimedia Commons website. Those images were chosen to have an unrestrictive license, i.e., they belong to the public domain. They were manually annotated and added to the dataset.
This work was done as part of the PhD of Dimitar Iliev, student at the Faculty of German Engineering and Industrial Management at the Technical University of Sofia, Bulgaria and in collaboration with the Faculty of Computer Science at Otto-von-Guericke-University Magdeburg, Germany.
If you use this dataset in a research paper, please cite it using the following BibTeX:
@article{iliev2024EwasteDataset,
author = "Iliev, Dimitar and Marinov, Marin and Ortmeier, Frank",
title = "A proposal for a new e-waste image dataset based on the unu-keys classification",
journal = "XXIII-rd International Symposium on Electrical Apparatus and Technologies SIELA 2024",
year = 2024,
volume = "23",
number = "to appear",
pages = {to appear},
note = {under submission}
}
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction
This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, given as UNIX timestamps. It is built from the blockchain covering the period from 3 January 2009 to 25 January 2021. The blockchain was extracted using the bitcoin-etl Python package (https://github.com/blockchain-etl/bitcoin-etl). The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as the addresses of popular Bitcoin users provided by https://www.walletexplorer.com/
[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
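For readers unfamiliar with the common-input heuristic, the sketch below shows the general idea: all addresses that appear together as inputs of one transaction are assumed to belong to the same entity, and these groups are merged transitively with a union-find structure. This is a generic illustration, not the dataset authors' pipeline; the function `cluster_addresses` and the toy transactions are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Minimal union-find (disjoint set) with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Group addresses into entities from per-transaction input address lists."""
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # register single-input transactions too
        for addr in inputs[1:]:
            uf.union(first, addr)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


# Toy example: input address lists of four hypothetical transactions.
toy_transactions = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
entities = cluster_addresses(toy_transactions)
print(sorted(sorted(e) for e in entities))
```

In the toy example, A, B and C end up in one entity because B is shared between the first two transactions.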
Dataset Description
Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021
Overview:
This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a long period, from the inception of Bitcoin to January 2021. It offers several temporal resolutions and representations to facilitate analysis of the Bitcoin transaction network as a temporal graph.
All dates are derived from block UNIX timestamps and are expressed in the GMT time zone.
Contents:
The dataset is distributed across the following compressed archives:
All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. The files can be read with, for example, the pyspark Python package.
orbitaal-stream_graph.tar.gz:
The root directory is STREAM_GRAPH/
Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes).
The stream graph is divided into 13 files, one for each year
File format is parquet
Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory STREAM_GRAPH/EDGES/
orbitaal-snapshot-all.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).
File format is parquet
Name format is orbitaal-snapshot-all.snappy.parquet.
These files are in the subdirectory SNAPSHOT/EDGES/ALL/
orbitaal-snapshot-year.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at yearly resolution
File format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory SNAPSHOT/EDGES/year/
orbitaal-snapshot-month.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at monthly resolution
File format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month
These files are in the subdirectory SNAPSHOT/EDGES/month/
orbitaal-snapshot-day.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at daily resolution
File format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day
These files are in the subdirectory SNAPSHOT/EDGES/day/
orbitaal-snapshot-hour.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at hourly resolution
File format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour
These files are in the subdirectory SNAPSHOT/EDGES/hour/
orbitaal-nodetable.tar.gz:
The root directory is NODE_TABLE/
Contains two files in parquet format, the first one gives information related to nodes present in stream graphs and snapshots such as period of activity and associated global Bitcoin balance, and the other one contains the list of all associated Bitcoin addresses.
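The snapshot naming patterns above can be parsed mechanically. The following is a minimal sketch, assuming the file names follow the stated patterns exactly and that [ID] is written as plain digits (whether it is zero-padded is not stated). The helper `parse_snapshot_name` and the example file names are hypothetical and were constructed from the patterns, not taken from the archives.

```python
import re
from typing import Optional

# Matches the yearly, monthly, daily and hourly snapshot name patterns.
_SNAPSHOT_NAME = re.compile(
    r"^orbitaal-snapshot-date-"
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2}))?"
    r"(?:-(?P<day>\d{2}))?"
    r"(?:-(?P<hour>\d{2}))?"
    r"-file-id-(?P<id>\d+)\.snappy\.parquet$"
)


def parse_snapshot_name(filename: str) -> Optional[dict]:
    """Return the date parts and file ID as integers, or None if no match."""
    match = _SNAPSHOT_NAME.match(filename)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}


# Hypothetical file names built from the documented patterns.
daily = parse_snapshot_name("orbitaal-snapshot-date-2016-07-09-file-id-2747.snappy.parquet")
yearly = parse_snapshot_name("orbitaal-snapshot-date-2016-file-id-8.snappy.parquet")
other = parse_snapshot_name("orbitaal-snapshot-all.snappy.parquet")
print(daily, yearly, other)
```

Because the description states that increasing [ID] corresponds to chronological order, sorting a directory listing by the parsed `id` value should give files in time order.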
Small samples in CSV format
orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv
These two CSV files contain stream graph representations around a halving event that occurred in 2016.
orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv
These two CSV files contain daily snapshot representations around a halving event that occurred in 2016.
A. SUMMARY
This dataset includes data on a variety of substance use services funded by the San Francisco Department of Public Health (SFDPH). This dataset only includes Drug Medi-Cal-certified residential treatment, withdrawal management, and methadone treatment. Other private non-Drug Medi-Cal treatment providers may operate in the city. Withdrawal management discharges are inclusive of anyone who left withdrawal management after admission and may include someone who left before completing withdrawal management.
This dataset also includes naloxone distribution from the SFDPH Behavioral Health Services Naloxone Clearinghouse and the SFDPH-funded Drug Overdose Prevention and Education program. Both programs distribute naloxone to various community-based organizations, which then distribute naloxone to their program participants. Programs may also receive naloxone from other sources. Data from these other sources is not included in this dataset.
Finally, this dataset includes the number of clients on medications for opioid use disorder (MOUD). The number of people who were treated with methadone at a Drug Medi-Cal-certified Opioid Treatment Program (OTP) by year is populated by the San Francisco Department of Public Health (SFDPH) Behavioral Health Services Quality Management (BHSQM) program. OTPs in San Francisco are required to submit patient billing data in an electronic medical record system called Avatar. BHSQM calculates the number of people who received methadone annually based on Avatar data. Only data from Drug Medi-Cal-certified OTPs were included in this dataset.
The number of people who receive buprenorphine by year is populated from the Controlled Substance Utilization Review and Evaluation System (CURES), administered by the California Department of Justice. All licensed prescribers in California are required to document controlled substance prescriptions in CURES. The Center on Substance Use and Health calculates the total number of people who received a buprenorphine prescription annually based on CURES data. Formulations of buprenorphine that are prescribed only for pain management are excluded.
People may receive buprenorphine and methadone in the same year, so you cannot add the Buprenorphine Clients by Year and Methadone Clients by Year data together to get the total number of unique people receiving medications for opioid use disorder. For more information on where to find treatment in San Francisco, visit findtreatment-sf.org.
B. HOW THE DATASET IS CREATED
This dataset is created by copying the data into this dataset from the SFDPH Behavioral Health Services Quality Management Program, the California Controlled Substance Utilization Review and Evaluation System (CURES), and the Office of Overdose Prevention.
C. UPDATE PROCESS
Residential Substance Use Treatment, Withdrawal Management, Methadone, and Naloxone data are updated quarterly with a 45-day delay. Buprenorphine data are updated quarterly and when the state makes this data available, usually at a 5-month delay.
D. HOW TO USE THIS DATASET
Throughout the year this dataset may include partial year data for methadone and buprenorphine treatment. As both methadone and buprenorphine are used as long-term treatments for opioid use disorder, many people on treatment at the end of one calendar year will continue into the next. For this reason, doubling (methadone) or quadrupling (buprenorphine) partial year data will not accurately project year-end totals.
E. RELATED DATASETS
Overdose-Related 911 Responses by Emergency Medical Services
Unintentional Overdose Death Rates by Race/Ethnicity
Preliminary Unintentional Drug Overdose Deaths
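To illustrate why scaling partial year counts overstates year-end totals for long-term treatments, the sketch below counts unique clients over hypothetical quarters in which most clients continue from one quarter to the next. All numbers are invented for illustration and do not come from the SFDPH data.

```python
# Hypothetical client IDs active in each quarter (illustrative numbers only).
quarters = [
    set(range(0, 1000)),    # Q1
    set(range(60, 1100)),   # Q2
    set(range(120, 1200)),  # Q3
    set(range(180, 1300)),  # Q4
]


def unique_clients(periods):
    """Count distinct clients seen in any of the given periods."""
    return len(set().union(*periods))


mid_year = unique_clients(quarters[:2])  # unique clients in the first half
naive_projection = 2 * mid_year          # doubling the half-year count
year_end = unique_clients(quarters)      # actual unique clients for the year
print(mid_year, naive_projection, year_end)
```

In this made-up scenario the half-year count is 1100 and the year-end count is 1300, while doubling would suggest 2200, because the same people are counted again in later quarters.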
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical stock price data for major banks from the year 2014 to 2024. The dataset includes daily stock prices, trading volume, and other relevant financial metrics for prominent banks. The stock prices are provided in IDR (Indonesian Rupiah) currency.
PT Bank Central Asia Tbk (BBCA.JK), more commonly known as Bank Central Asia (BCA), is one of Indonesia's largest privately owned banks. Founded in 1955, BCA provides a range of banking services including consumer banking, corporate banking, investment banking, and asset management. It has a widespread presence throughout Indonesia, with numerous branches and ATMs.
Dataset Variables:
Data Sources: The dataset is compiled from reliable financial sources, including stock exchanges, financial news websites, and reputable financial data providers. Data cleaning and preprocessing techniques have been applied to ensure accuracy and consistency. More info: https://finance.yahoo.com/quote/BBCA.JK/history/
Use Case: This dataset can be utilized for various purposes, including financial analysis, stock market forecasting, algorithmic trading strategies, and academic research. Researchers, analysts, and data scientists can explore the trends, patterns, and relationships within the data to derive valuable insights into the performance of the banking sector over the specified period. Additionally, this dataset can serve as a benchmark for evaluating the performance of machine learning models and quantitative trading strategies in the banking industry.
By Dataquest
Explore the world of consumer finance with this dataset from the Consumer Financial Protection Bureau. This data set includes a rich compilation of detailed bank and credit card customer complaints and provides an invaluable insight into customer experiences in the financial sector. With over [number] records spanning across [date stream], this data set is ideal for researchers, policymakers, financial institutions and anyone looking to learn more about consumer finance.
For each record in the dataset, you'll find details such as date received, product name, issue category, consumer complaint narrative, company response to customer enquiries, state of origin of the complaint (where appropriate), and tags associated with the complaint. You can also see how promptly the company responded using the Timely Response? field, or whether the customer disputed the company's reply using the Consumer Disputed? field. Combining these features with deeper analysis can help businesses create better consumer experiences and build explainable models of the root causes behind issues such as disputes or late responses, which could ultimately lead to industry-wide policy changes that benefit customers. Additional records are available at https://www.consumerfinance.gov/data-research/consumer-complaints/#download-the-data. This dataset is described as released into the public domain, meaning everyone is free to access it.
It is important to note that some fields are optional, so missing values are expected in those fields, depending on how much information had been reported at the time of collection. It can be useful to list the unrecorded information separately for comparison, if relevant to your research.
The data points in this dataset can help you explore how experiences differ across factors such as age, and can also inform more specific questions, such as identifying racial disparities in access to newer technologies like mobile banking applications versus traditional products like checks or savings accounts. Combined with other sources of information, the data should let you build a comprehensive picture of both differences in individual experiences and broader trends affecting large groups of people at local and national levels. These findings could then be used to encourage positive change in the institutions that provide these services, along with continued evaluation of whether improvement has actually occurred.
- Identifying states and specific areas with the highest number of financial complaints to target education and awareness initiatives.
- Analyzing trends in complaint investigations to improve customer service response times and accuracy rates.
- Developing a machine learning model that can accurately predict if a company will respond to a financial complaint in a timely manner
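The first idea above, ranking states by complaint volume, can be sketched with the standard library alone. The example assumes the CSV has a `State` column (the description mentions the state of origin, but the exact column header is an assumption) and uses a few invented rows in place of the real file; `complaints_by_state` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def complaints_by_state(csv_file) -> Counter:
    """Count complaints per state, skipping rows where the state is missing."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        state = (row.get("State") or "").strip()
        if state:
            counts[state] += 1
    return counts


# Invented sample rows standing in for Bank_Account_or_Service_Complaints.csv.
sample_rows = [
    "Date received,Product,Issue,State",
    "2016-01-04,Bank account or service,Deposits and withdrawals,CA",
    "2016-01-05,Bank account or service,Account opening,NY",
    "2016-01-05,Bank account or service,Deposits and withdrawals,CA",
    "2016-01-06,Bank account or service,Problems caused by my funds being low,",
    "2016-01-07,Bank account or service,Account opening,TX",
    "2016-01-08,Bank account or service,Deposits and withdrawals,CA",
    "2016-01-09,Bank account or service,Account opening,NY",
]
counts = complaints_by_state(io.StringIO("\n".join(sample_rows)))
print(counts.most_common(2))
```

With the real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`.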
If you use this dataset in your research, please credit the original authors.
Unknown License - Please check the dataset description for more information.
File: Bank_Account_or_Service_Complaints.csv

| Column name | Description |
|:---------------------------------|:------------------------------------------------------------------------------------|
| Date received | The date the complaint was received by the CFPB. (Date) |
| Product | The type of financial product or service the complaint is related to. (Text) |
| Sub-product | The sub-category of the product the complaint is related to. (Text) |
| Issue | The issue the consumer is complaining about. (Text) |
| Sub-issue | The sub-category of the issue the consumer is complaining about. (Text) |
| Consumer complaint narrative | The narrative of the complaint provided by the consumer. (Text) |
| Company public response | The public response from the company regarding the complaint. (Text) |

...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
High-throughput sequencing has become ubiquitous in biomedical sciences. As new technologies emerge and sequencing costs decline, the diversity and volume of available data increases exponentially, and successfully navigating the data becomes more challenging. Though datasets are often hosted by public repositories, scientists must rely on inconsistent annotation to identify and interpret meaningful data. Moreover, the experimental heterogeneity and wide-ranging quality of high-throughput biological data means that even data with desired cell lines, tissue types, or molecular targets may not be readily interpretable or integrated. We have developed ORSO (Online Resource for Social Omics) as an easy-to-use web application to connect life scientists with genomics data. In ORSO, users interact within a data-driven social network, where they can favorite datasets and follow other users. In addition to more than 30,000 datasets hosted from major biomedical consortia, users may contribute their own data to ORSO, facilitating its discovery by other users. Leveraging user interactions, ORSO provides a novel recommendation system to automatically connect users with hosted data. In addition to social interactions, the recommendation system considers primary read coverage information and annotated metadata. Similarities used by the recommendation system are presented by ORSO in a graph display, allowing exploration of dataset associations. The topology of the network graph reflects established biology, with samples from related systems grouped together. We tested the recommendation system using an RNA-seq time course dataset from differentiation of embryonic stem cells to cardiomyocytes. The ORSO recommendation system correctly predicted early data point sources as embryonic stem cells and late data point sources as heart and muscle samples, resulting in recommendation of related datasets. 
By connecting scientists with relevant data, ORSO provides a critical new service that facilitates wide-ranging research interests.
The Australia Telescope National Facility (ATNF) Pulsar Catalog is a catalog of known pulsars compiled by R.N. Manchester et al. and is descended from the pulsar database used for the paper "Catalog of 558 Pulsars" by J.H. Taylor, R.N. Manchester and A.G. Lyne 1993, ApJS, 88, 529-568. The current catalog has been supplemented by inclusion of published data from more recent radio surveys, in particular, the Parkes Multibeam (PM) Pulsar Survey (Manchester et al. 2001, MNRAS, 328, 17-35) [available at the HEASARC as the PMPULSAR table] and the Swinburne Intermediate Latitude Pulsar Survey (Edwards et al. 2001, MNRAS, 326, 358-374), both made using the ATNF Parkes 64-m radio telescope. Binary parameters for known binary pulsars are also included, as well as all available astrometric and spin parameter information for all pulsars. The catalog includes all published rotation-powered pulsars. Two separate small subsets of pulsars detected only at high energies are also included in the current table: the first group comprises X-ray and gamma-ray pulsars which are apparently powered by spin-down energy, but which have not been detected at radio wavelengths, while the second group contains anomalous X-ray pulsars (AXPs) and soft-gamma-ray repeaters (SGRs) for which coherent pulsations have been detected. Accretion-powered pulsars such as Her X-1 and the recently discovered X-ray millisecond pulsars such as SAX J1808.4-3658 are not included in this table, however. Many people have contributed to the compilation of the data contained in this catalog and the database that it was derived from. The authors particularly thank Andrew Lyne of the University of Manchester, Jodrell Bank Observatory, David Nice of Princeton University, and Russell Edwards, then at Swinburne University of Technology. They also acknowledge the efforts of Warwick University students Adam Goode and Steven Thomas, who compiled and checked a recent version of the database.
The original (summer 2003) database at the ATNF website was compiled with the invaluable assistance of Maryam Hobbs, while the ATNF web interface was designed and constructed by Albert Teoh, a Summer Vacation Scholar at the ATNF in 2002/2003. The authors would appreciate it if anyone making use of this catalog in a publication acknowledges the source of their information by quoting the ATNF Pulsar Catalog website address, http://www.atnf.csiro.au/research/pulsar/psrcat/. This database table was initially created by the HEASARC in January 2002. It was revised in March 2002, in June 2003, and again in January 2014. It is based on the table obtained from http://www.atnf.csiro.au/research/pulsar/psrcat/expert.html.
Changes to the catalog are logged at http://www.atnf.csiro.au/research/pulsar/psrcat/catalogueHistory.html.
The HEASARC table will be updated on a weekly basis whenever the original ATNF database table is updated. This is a service provided by NASA HEASARC.
During a January 2024 global survey among marketers, nearly 60 percent reported plans to increase their organic use of YouTube for marketing purposes in the following 12 months. LinkedIn and Instagram followed, respectively mentioned by 57 and 56 percent of the respondents intending to use them more. According to the same survey, Facebook was the most important social media platform for marketers worldwide.