Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description: Dataset of porosity data in Powder Bed Fusion - Laser Beam of Ti-6Al-4V obtained via X-ray Micro Computed Tomography. This work was conducted on an EOS M290. The coupons in this dataset were fabricated at 150 W and 280 W.

Contents:
poredf.csv: A CSV file with pore measurements for each sample scanned.
parameters.csv: A CSV file containing the process parameters and extreme value statistics (EVS) parameters for each sample scanned.

WARNING: parameters.csv is too large to open in Excel. Saving it in Excel will cause data loss.
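Since parameters.csv exceeds Excel's limits, a safer route is to read it programmatically. A minimal sketch with pandas, reading in chunks so the file never has to fit in memory at once; the column names below are invented for illustration and are not the dataset's actual schema:

```python
import io
import pandas as pd

# Hypothetical stand-in for parameters.csv; these column names are
# invented, not the dataset's real schema.
sample = io.StringIO(
    "sample_id,laser_power_w,evs_shape,evs_scale\n"
    "S01,150,0.21,14.2\n"
    "S02,280,0.18,11.9\n"
)

# Reading in chunks keeps memory bounded and sidesteps Excel's row
# limit entirely; with a real file, pass the path instead of StringIO.
params = pd.concat(pd.read_csv(sample, chunksize=1), ignore_index=True)
print(len(params))
```

Because nothing is ever re-saved through Excel, this also avoids the data-loss warning above.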
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables; one raw data table plus ten related meta data tables. For further information please see our associated data paper.
This data consists of several elements:
BioTIMESQL_02_04_2018.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.
BioTIMEQuery_02_04_2018.csv - the data file; although too large to view in Excel, it can be read into several software applications such as R or various database packages.
BioTIMEMetadata_02_04_2018.csv - file containing the meta data for all studies.
BioTIMECitations_02_04_2018.csv - file containing the citation list for all studies.
BioTIMECitations_02_04_2018.xlsx - file containing the citation list for all studies (some special characters are not supported in the csv format).
BioTIMEInteractions_02_04_2018.Rmd - an R Markdown file providing a brief overview of how to interact with the database and associated .csv files (this will not work until file paths and database connections have been added/updated).
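As a sketch of working with the large query file outside Excel, the CSV can be streamed in chunks and aggregated on the fly. The column names below are invented stand-ins, not BioTIME's actual schema:

```python
import io
import pandas as pd

# Invented miniature of BioTIMEQuery_02_04_2018.csv; real column
# names come from the metadata tables, not from this sketch.
csv_text = io.StringIO(
    "STUDY_ID,YEAR,GENUS_SPECIES,ABUNDANCE\n"
    "10,1995,Parus major,12\n"
    "10,1996,Parus major,15\n"
    "22,1996,Gadus morhua,4\n"
)

# chunksize streams the file instead of loading it all at once.
chunks = pd.read_csv(csv_text, chunksize=2)
records_per_study = pd.concat(chunks).groupby("STUDY_ID").size()
print(records_per_study.to_dict())
```

The same pattern works for any per-study or per-year summary without ever materialising the full table.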
Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.
To cite the data paper use the following:
Dornelas M, Antão LH, Moyes F, Bates AE, Magurran AE, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol Biogeogr. 2018;27:760-786. https://doi.org/10.1111/geb.12729
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Business roles at AgroStar require a baseline of analytical skills, and it is also critical that we are able to explain complex concepts in a simple way to a variety of audiences. This test is structured so that someone with the baseline skills needed to succeed in the role should be able to complete this in under 4 hours without assistance.
Use the data in the included sheet to address the following scenario...
Since its inception, AgroStar has been leveraging an assisted-marketplace model. Given that the market potential is huge and that the target customer appreciates a physical store nearby, we have decided to explore the offline retail model to drive growth. The primary objective is to capture a larger wallet share for AgroStar among existing customers.
Assume you are back in time, in August 2018, and you have been asked to determine the location (taluka) of the first AgroStar offline retail store.
1. What are the key factors you would use to determine the location? Why?
2. What taluka (across three states) would you look to open in? Why?
-- (1) Please mention any assumptions you have made and the underlying thought process
-- (2) Please treat the assignment as standalone (it should be self-explanatory to someone who reads it), but we will have a follow-up discussion with you in which we will walk through your approach to this assignment.
-- (3) Mention any data that may be missing that would make this study more meaningful
-- (4) Kindly conduct your analysis within the spreadsheet; we would like to see the working sheet. If you face any issues due to the file size, kindly download this file and share an Excel sheet with us
-- (5) If you would like to append a word document/presentation to summarize, please go ahead.
-- (6) In case you use any external data source/article, kindly share the source.
The file CDNOW_master.txt contains the entire purchase history up to the end of June 1998 of the cohort of 23,570 individuals who made their first-ever purchase at CDNOW in the first quarter of 1997. This CDNOW dataset was first used by Fader and Hardie (2001).
Each record in this file, 69,659 in total, comprises four fields: the customer's ID, the date of the transaction, the number of CDs purchased, and the dollar value of the transaction.
CustID = CDNOW_master(:,1); % customer id
Date   = CDNOW_master(:,2); % transaction date
Quant  = CDNOW_master(:,3); % number of CDs purchased
Spend  = CDNOW_master(:,4); % dollar value (excl. S&H)
See "Notes on the CDNOW Master Data Set" (http://brucehardie.com/notes/026/) for details of how the 1/10th systematic sample (http://brucehardie.com/datasets/CDNOW_sample.zip) used in many papers was created.
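For readers working in Python rather than MATLAB, a hedged sketch of loading the four-field records and computing a per-customer total follows; the two customers and their transactions below are invented, not taken from the file:

```python
import io
import pandas as pd

# Invented records in the CDNOW_master.txt layout: customer ID,
# transaction date (yyyymmdd), number of CDs, dollar value.
raw = io.StringIO(
    "1 19970101 1 11.77\n"
    "1 19970318 2 20.00\n"
    "2 19970102 3 35.50\n"
)
df = pd.read_csv(raw, sep=r"\s+", names=["cust_id", "date", "quant", "spend"])
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# Total spend per customer -- a typical first step for repeat-buying models.
spend = df.groupby("cust_id")["spend"].sum()
print(spend.to_dict())
```

With the real file, replace the StringIO object with the path to CDNOW_master.txt.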
Reference:
Fader, Peter S. and Bruce G. S. Hardie (2001), "Forecasting Repeat Sales at CDNOW: A Case Study," Interfaces, 31 (May-June), Part 2 of 2, S94-S107.
I have merged all three datasets into one file and performed some feature engineering.
Available Data: You will be given anonymized user gameplay data in the form of 3 csv files.
Fields in the data are as described below:
Gameplay_Data.csv contains the following fields:
* Uid: Alphanumeric unique Id assigned to user
* Eventtime: DateTime on which user played the tournament
* Entry_Fee: Entry Fee of tournament
* Win_Loss: ‘W’ if the user won that particular tournament, ‘L’ otherwise
* Winnings: How much money the user won in the tournament (0 for ‘L’)
* Tournament_Type: Type of tournament user played (A / B / C / D)
* Num_Players: Number of players that played in this tournament
Wallet_Balance.csv contains the following fields:
* Uid: Alphanumeric unique Id assigned to user
* Timestamp: DateTime at which user's wallet balance is given
* Wallet_Balance: User's wallet balance at the given timestamp
Demographic.csv contains the following fields:
* Uid: Alphanumeric unique Id assigned to user
* Installed_At: Timestamp at which user installed the app
* Connection_Type: User's internet connection type (Ex: Cellular / Dial Up)
* Cpu_Type: Cpu type of device that the user is playing with
* Network_Type: Network type in encoded form
* Device_Manufacturer: Ex: Realme
* ISP: Internet Service Provider. Ex: Airtel
* Country
* Country_Subdivision
* City
* Postal_Code
* Language: Language that user has selected for gameplay
* Device_Name
* Device_Type
Build a basic recommendation system which is able to rank/recommend relevant tournaments and entry prices to the user. The main objectives are:
1. A user should not have to scroll too much before selecting a tournament of their preference.
2. We would like the user to play as high an entry-fee tournament as possible.
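As a toy illustration of the ranking objective (a trivial frequency baseline, not the expected solution), tournament types can be ordered by the user's own play history with overall popularity as a tie-breaker; the four rows below are invented in the Gameplay_Data.csv shape:

```python
import pandas as pd

# Invented gameplay history in the Gameplay_Data.csv shape.
plays = pd.DataFrame({
    "Uid": ["u1", "u1", "u1", "u2"],
    "Tournament_Type": ["A", "A", "B", "C"],
    "Entry_Fee": [10, 25, 10, 50],
    "Win_Loss": ["W", "L", "W", "L"],
})

def rank_tournaments(history: pd.DataFrame, uid: str) -> list:
    """Rank tournament types by how often this user has played them,
    breaking ties by overall popularity -- a baseline ranker only."""
    user = history[history["Uid"] == uid]["Tournament_Type"].value_counts()
    overall = history["Tournament_Type"].value_counts()
    # Small popularity term breaks ties and covers unseen types.
    ranked = user.reindex(overall.index, fill_value=0) + 0.001 * overall
    return ranked.sort_values(ascending=False).index.tolist()

print(rank_tournaments(plays, "u1"))
```

A real solution would also use wallet balance and demographics to pick the entry fee; this sketch only addresses objective 1 (surfacing preferred tournament types first).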
The heat pump monitoring datasets are a key output of the Electrification of Heat Demonstration (EoH) project, a government-funded heat pump trial assessing the feasibility of heat pumps across the UK’s diverse housing stock. These datasets are provided in both cleansed and raw form and allow analysis of the initial performance of the heat pumps installed in the trial. From the datasets, insights such as heat pump seasonal performance factor (a measure of the heat pump's efficiency), heat pump performance during the coldest day of the year, and half-hourly performance to inform peak demand can be gleaned.
For the second edition (December 2024), the data were updated to include cleaned performance data collected between November 2020 and September 2023. The only documentation currently available with the study is the Excel data dictionary. Reports and other contextual information can be found on the Energy Systems Catapult website.
The EoH project was funded by the Department for Business, Energy and Industrial Strategy. Since 2023, it has been covered by the new Department for Energy Security and Net Zero.
Data availability
This study comprises the open-access cleansed data from the EoH project and a summary dataset, available in four zipped files (see the 'Access Data' tab). Users must download all four zip files to obtain the full set of cleansed data and accompanying documentation.
When unzipped, the full cleansed data comprises 742 CSV files. Most of the individual CSV files are too large to open in Excel. Users should ensure they have sufficient computing facilities to analyse the data.
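For orientation, the headline metric mentioned above, the seasonal performance factor, is total heat output divided by total electricity input over a period. A minimal sketch, assuming invented column names (the real schema is defined in the Excel data dictionary):

```python
import io
import pandas as pd

# One hypothetical half-hourly monitoring file; these column names
# are invented, not the EoH files' actual schema.
csv_text = io.StringIO(
    "Timestamp,Heat_Output_kWh,Electricity_Input_kWh\n"
    "2021-01-01 00:00,1.2,0.4\n"
    "2021-01-01 00:30,1.5,0.5\n"
)
df = pd.read_csv(csv_text, parse_dates=["Timestamp"])

# Seasonal performance factor: total heat out / total electricity in.
spf = df["Heat_Output_kWh"].sum() / df["Electricity_Input_kWh"].sum()
print(round(spf, 2))
```

Looping the same calculation over the 742 per-property CSVs would give a distribution of performance across the trial's housing stock.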
The UKDS also holds an accompanying study, SN 9049 Electrification of Heat Demonstration Project: Heat Pump Performance Raw Data, 2020-2023, which is available only to registered UKDS users. This contains the raw data from the EoH project. Since the data are very large, only the summary dataset is available to download; an order must be placed for FTP delivery of the remaining raw data. Other studies in the set include SN 9209, which comprises 30-minute interval heat pump performance data, and SN 9210, which includes daily heat pump performance data.
The Python code used to cleanse the raw data and then perform the analysis is accessible via the Energy Systems Catapult GitHub: https://github.com/ES-Catapult/electrification_of_heat
Terms and conditions: https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows) but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste them into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity
The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included.
The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
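The linking described above can be sketched as a join on the shared codes; the miniature tables below are invented and the real column names differ:

```python
import pandas as pd

# Invented miniature of the monthly prescribing file (linked to the
# practice and BNF lookup files by shared codes).
scripts = pd.DataFrame({
    "practice_code": ["A1", "A1", "B2"],
    "bnf_code": ["0101", "0202", "0101"],
    "items": [5, 3, 7],
})
practices = pd.DataFrame({
    "practice_code": ["A1", "B2"],
    "practice_name": ["High St Surgery", "Park Lane Practice"],
})

# Join prescribing rows to practice details on the practice code;
# the BNF lookup file would be joined the same way on bnf_code.
merged = scripts.merge(practices, on="practice_code", how="left")
print(merged["practice_name"].tolist())
```

A left join keeps every prescribing row even if a code is missing from the lookup file, which makes gaps in the linkage easy to spot.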
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
A risk manager email list can play a key role in your business growth. Risk managers protect companies from financial and operational problems, ensure workplace safety, and find ways to improve business stability. They often work in industries like insurance, banking, and corporate finance, but reaching them directly is not always easy. Our risk manager email list helps you connect faster: with our verified lists, you can contact professionals in manufacturing, healthcare, finance, and retail. The data is accurate, up to date, and human-verified, and we check the database regularly to maintain quality, so you can send offers, updates, or proposals without worrying about wrong contacts.
The risk manager email list is affordable and easy to use in any CRM system, helping you save time, reduce marketing costs, and target the right audience effectively. Whether you run a small company or a large enterprise, accurate contacts can bring better leads, higher engagement, and increased profits. It is available now at List to Data. The risk manager email database allows you to reach risk managers all over the world, expand your business, and find new clients. We verify all the information we collect to keep the database reliable, which makes your marketing efforts more effective. With the email addresses in this database, you can directly inform risk managers about your products or services, and you can get the database in either Excel or CSV format to use the data according to your specific needs. Purchasing our dataset will help your business grow faster; we are committed to providing you with the best possible service.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The aim of this study is to gain insights into the attitudes of the population towards big data practices and the factors influencing them. To this end, a nationwide survey (N = 1,331), representative of the population of Germany, addressed attitudes about selected big data practices exemplified by four scenarios that may have a direct impact on personal lifestyle: price discrimination in retail, credit scoring, differentiation in health insurance, and differentiation in employment. Attitudes about the scenarios were set in relation to demographic characteristics, personal value orientations, knowledge about computers and the internet, and general attitudes about privacy and data protection. Another focus of the study is the institutional framework of privacy and data protection, because the realization of benefits or risks of big data practices for the population also depends on knowledge about the rights the institutional framework provides to the population and the actual use of those rights. As results, several challenges to the framework posed by big data practices were confirmed, in particular for the elements of informed consent with privacy policies, purpose limitation, and the individuals' rights to request information about the processing of personal data and to have these data corrected or erased.

Technical remarks:

TYPE OF SURVEY AND METHODS
The data set includes responses to a survey conducted by professionally trained interviewers of a social and market research company in the form of computer-aided telephone interviews (CATI) from 2017-02 to 2017-04. The target population was inhabitants of Germany aged 18 years and over, who were randomly selected using the sampling approaches ADM eASYSAMPLe (based on the Gabler-Häder method) for landline connections and eASYMOBILe for mobile connections.

The 1,331 completed questionnaires comprise 44.2 percent mobile and 55.8 percent landline phone respondents. Most questions offered a 5-point rating scale (Likert-like) anchored, for instance, with 'Fully agree' to 'Do not agree at all', or 'Very uncomfortable' to 'Very comfortable'. Responses by the interviewees were weighted to obtain a representation of the entire German population (variable 'gewicht' in the data sets). To this end, standard weighting procedures were applied to reduce differences between the sample and the entire population with regard to known rates of response and non-response depending on household size, age, gender, educational level, and place of residence.

RELATED PUBLICATION AND FURTHER DETAILS
The questionnaire, analysis and results will be published in the corresponding report (main text in English; questionnaire in Appendix B, in the German of the interviews with an English translation). The report will be available as an open access publication at KIT Scientific Publishing (https://www.ksp.kit.edu/). Reference: Orwat, Carsten; Schankin, Andrea (2018): Attitudes towards big data practices and the institutional framework of privacy and data protection - A population survey, KIT Scientific Report 7753, Karlsruhe: KIT Scientific Publishing.

FILE FORMATS
The data set of responses was saved for the KITopen repository in 2018-11 in the following file formats: comma-separated values (.csv), tab-separated values (.dat), Excel (.xls), Excel 2007 or newer (.xlsx), and SPSS Statistics (.sav). The questionnaire is saved in the following file formats: comma-separated values (.csv), Excel (.xls), Excel 2007 or newer (.xlsx), and Portable Document Format (.pdf).

PROJECT AND FUNDING
The survey is part of the project Assessing Big Data (ABIDA) (from 2015-03 to 2019-02), which receives funding from the Federal Ministry of Education and Research (BMBF), Germany (grant no. 01IS15016A-F). http://www.abida.de
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Five files, one of which is a ZIP archive, containing data that support the findings of this study.
- PDF file "IA screenshots CSU Libraries search config": screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for years 2017-2019.
- Excel file "CCIHE2018-PublicDataFile": Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses from 2018.
- CSV file "2017-2019_RAW": the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017-2019.
- CSV file "clean_data": the cleaned data from Primo Analytics, used for all subsequent analysis such as charting and import into SPSS for statistical testing.
- ZIP archive "NonparametricStatisticalTestsFromSPSS": 23 SPSS files (.spv format) reporting the results of testing conducted in SPSS, including normality checks, descriptives, and Kruskal-Wallis H-test results.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
HelpSteer is an open-source dataset designed to support AI alignment through fair, team-oriented annotation. It provides 37,120 samples, each containing a prompt and response along with five human-annotated attributes scored between 0 and 4, with higher scores indicating better quality. Combining machine learning and natural language processing methods with expert annotation, HelpSteer aims to establish a standardized set of values for measuring alignment between human and machine interactions. With responses rated for correctness, coherence, complexity, helpfulness and verbosity, HelpSteer helps organizations build reliable AI models that produce more accurate results and an improved user experience.
How to Use HelpSteer: An Open-Source AI Alignment Dataset
HelpSteer is an open-source dataset designed to help researchers create models with AI Alignment. The dataset consists of 37,120 different samples each containing a prompt, a response and five human-annotated attributes used to measure these responses. This guide will give you a step-by-step introduction on how to leverage HelpSteer for your own projects.
Step 1 - Choosing the Data File
HelpSteer contains two data files - one for training and one for validation. To start exploring the dataset, select the file you would like to use by downloading both train.csv and validation.csv from the Kaggle page linked above, or get them from the Google Drive repository attached here: [link]. All samples in each file consist of 7 columns describing a single response: prompt (given), response (submitted), helpfulness, correctness, coherence, complexity and verbosity, all with values between 0 and 4, where higher means better in the respective category.
Step 2 - Exploratory Data Analysis (EDA)
Once you have your file loaded into your workspace or favorite software environment (e.g. libraries like Pandas/NumPy, or even Microsoft Excel), explore it further by running some basic EDA commands that summarize each feature's distribution within the data set, and note potential trends or points of interest. For example: which traits polarize the responses the most? Are there outliers that might signal something interesting? Plotting these results often provides great insight into patterns across the dataset, which can be used later during the modeling phase, also known as feature engineering.
Step 3 - Data Preprocessing
The hypotheses you form during EDA about which features matter most for accurately estimating the attribute scores of unknown responses should guide your preprocessing: clean up missing entries and handle outliers before starting any modelling efforts with this data set. If you are unsure about specific attributes' domains or allowed value ranges, refer back to the Kaggle page description section; having the correct numerical ranges at hand makes the modelling workload lighter later on. Do not rush this stage: aiming for high accuracy too quickly on poorly prepared data leads to weak results at deployment.
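The EDA step above can be sketched with a per-attribute summary; the few rows below are invented in the train.csv shape (prompt and response text shortened):

```python
import pandas as pd

# Invented rows mimicking the train.csv layout; attribute scores
# run 0-4 per the dataset description.
df = pd.DataFrame({
    "prompt": ["p1", "p2", "p3"],
    "response": ["r1", "r2", "r3"],
    "helpfulness": [4, 2, 3],
    "correctness": [4, 1, 3],
    "coherence": [3, 2, 4],
    "complexity": [1, 2, 2],
    "verbosity": [2, 3, 1],
})

# Basic EDA: mean, spread and range of each annotated attribute.
attrs = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
summary = df[attrs].agg(["mean", "std", "min", "max"])
print(summary.loc["mean"].round(2).to_dict())
```

On the full 37,120 samples, the same `agg` call reveals which attributes are skewed or polarized before any modelling begins.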
- Designating and measuring conversational AI engagement goals: Researchers can utilize the HelpSteer dataset to design evaluation metrics for AI engagement systems.
- Identifying conversational trends: By analyzing the annotations and data in HelpSteer, organizations can gain insights into what makes conversations more helpful, cohesive, complex or consistent across datasets or audiences.
- Training Virtual Assistants: Train artificial intelligence algorithms on this dataset to develop virtual assistants that respond effectively to customer queries with helpful answers.
If you use this dataset in your research, please credit the original authors. Data Source
**License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/pu...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset accompanies the research paper titled "The Gesture-Related Personality of a Virtual Character". It includes both the dataset and the codebook, provided in CSV format. These files can be opened using Microsoft Excel or any compatible open-source office software.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, Cryptodownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. While these datasets used various time formats (e.g., minutes, hours, days), the daily format was used to integrate the datasets in this research study. The integrated cryptocurrency historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more, were uploaded to this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was market capitalization, a subject matter expert (a professional trader) also guided the initial selection by analyzing various indicators such as Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, Stochastic Oscillator and Ichimoku Cloud. The primary features of this dataset, used as the decision-making criteria of the CLUS-MCDA II approach, are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap and Weighted Price values.
The available Excel and CSV files in this data set are just part of the integrated data; the other databases, datasets and API references used in this study are as follows:
[1] https://finance.yahoo.com/
[2] https://coinmarketcap.com/historical/
[3] https://cryptodatadownload.com/
[4] https://kaggle.com/philmohun/cryptocurrency-financial-data
[5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
[6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
[7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
[8] https://min-api.cryptocompare.com/
[9] https://p.nomics.com/cryptocurrency-bitcoin-api
[10] https://www.coinapi.io/
[11] https://www.coingecko.com/en/api
[12] https://cryptowat.ch/
[13] https://www.alphavantage.co/
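As an illustration of one of the decision-making criteria above, the 7-day percentage change can be derived directly from daily close prices; the price series below is invented:

```python
import pandas as pd

# Invented daily close prices for one coin over eight days.
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 112.0, 120.0],
    index=pd.date_range("2021-10-01", periods=8, freq="D"),
)

# 7-day percentage change: price today vs. price seven days earlier.
pct_7d = close.pct_change(periods=7) * 100
print(round(pct_7d.iloc[-1], 1))
```

The same one-liner with `periods=1` gives the 24-hour change, the other % Change feature listed in the dataset description.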
This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDAII Project: https://aimaghsoodi.github.io/CLUSMCDA-R-Package/ https://github.com/Aimaghsoodi/CLUS-MCDA-II https://github.com/azadkavian/CLUS-MCDA