12 datasets found
  1. Dataset of X-Ray Micro Computed Tomography Measurements of Porosity for Nominal and Low Power Coupons Fabricated by Powder Bed Fusion-Laser Beam

    • kilthub.cmu.edu
    txt
    Updated Jun 25, 2025
    + more versions
    Cite
    Justin Miner; Sneha Prabha Narra (2025). Dataset of X-Ray Micro Computed Tomography Measurements of Porosity for Nominal and Low Power Coupons Fabricated by Powder Bed Fusion-Laser Beam [Dataset]. http://doi.org/10.1184/R1/28152209.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Justin Miner; Sneha Prabha Narra
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Dataset of porosity data in Powder Bed Fusion - Laser Beam of Ti-6Al-4V, obtained via X-ray Micro Computed Tomography. This work was conducted on an EOS M290. The coupons in this dataset were fabricated at 150 W and 280 W.

    Contents:
    • poredf.csv: a CSV file with pore measurements for each sample scanned.
    • parameters.csv: a CSV file containing the process parameters and extreme value statistics (EVS) parameters for each sample scanned.

    WARNING: parameters.csv is too large to open in Excel. Saving it in Excel will cause data loss.
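
Because these CSVs exceed Excel's limits, streaming them row by row avoids both the size limit and the data-loss risk. A minimal Python sketch; the column names `sample_id` and `pore_volume` are assumptions for illustration, not the file's real headers:

```python
import csv
import io

def largest_pore_per_sample(fh):
    """Stream a pore-measurement CSV and keep only the largest pore
    volume seen per sample, so the file never loads fully into memory."""
    largest = {}
    for row in csv.DictReader(fh):
        sid = row["sample_id"]
        vol = float(row["pore_volume"])
        if vol > largest.get(sid, float("-inf")):
            largest[sid] = vol
    return largest

# Tiny inline stand-in for poredf.csv (real columns may differ).
sample = io.StringIO("sample_id,pore_volume\nA1,120.5\nA1,98.0\nB2,240.25\n")
print(largest_pore_per_sample(sample))  # {'A1': 120.5, 'B2': 240.25}
```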

  2. BioTIME

    • zenodo.org
    bin, csv
    Updated Jun 24, 2021
    + more versions
    Cite
    BioTIME Consortium (2021). BioTIME [Dataset]. http://doi.org/10.5281/zenodo.1095628
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    BioTIME Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables: one raw data table plus ten related metadata tables. For further information, please see our associated data paper.

    This data consists of several elements:

    • BioTIMESQL_06_12_2017.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.
    • BioTIMEQuery_06_12_2017.csv - the main data file; although too large to view in Excel, it can be read into software such as R or various database packages.
    • BioTIMEMetadata_06_12_2017.csv - file containing the metadata for all studies.
    • BioTIMECitations_06_12_2017.csv - file containing the citation list for all studies.
    • BioTIMECitations_06_12_2017.xlsx - file containing the citation list for all studies (some special characters are not supported in the CSV format).
    • BioTIMEInteractions_06_12_2017.Rmd - an R Markdown page providing a brief overview of how to interact with the database and associated .csv files (this will not work until field paths and database connections have been added/updated).

    Please note: users of this material should cite the associated data paper in addition to the DOI listed here.
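
Since the query CSV is too large for Excel, one option is to stream it. A hedged Python sketch; the column names `STUDY_ID` and `GENUS_SPECIES` are assumptions about the schema, so check the metadata table for the real layout:

```python
import csv
import io

def species_richness_per_study(fh):
    """Stream the query CSV and count distinct species per study,
    without holding the full file in memory."""
    seen = {}
    for row in csv.DictReader(fh):
        seen.setdefault(row["STUDY_ID"], set()).add(row["GENUS_SPECIES"])
    return {study: len(species) for study, species in seen.items()}

# Inline stand-in rows; the real file has many more columns.
fh = io.StringIO(
    "STUDY_ID,GENUS_SPECIES\n"
    "10,Parus major\n"
    "10,Parus major\n"
    "10,Turdus merula\n"
    "57,Gadus morhua\n"
)
print(species_richness_per_study(fh))  # {'10': 2, '57': 1}
```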

  3. BioTIME

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 24, 2021
    Cite
    BioTIME Consortium (2021). BioTIME [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1095627
    Explore at:
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Centre for Biological Diversity and Scottish Oceans Institute, School of Biology, University of St. Andrews, St. Andrews, UK
    Authors
    BioTIME Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables: one raw data table plus ten related metadata tables. For further information, please see our associated data paper.

    This data consists of several elements:

    BioTIMESQL_02_04_2018.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.

    BioTIMEQuery_02_04_2018.csv - the main data file; although too large to view in Excel, it can be read into software such as R or various database packages.

    BioTIMEMetadata_02_04_2018.csv - file containing the metadata for all studies.

    BioTIMECitations_02_04_2018.csv - file containing the citation list for all studies.

    BioTIMECitations_02_04_2018.xlsx - file containing the citation list for all studies (some special characters are not supported in the CSV format).

    BioTIMEInteractions_02_04_2018.Rmd - an R Markdown page providing a brief overview of how to interact with the database and associated .csv files (this will not work until field paths and database connections have been added/updated).

    Please note: users of this material should cite the associated data paper in addition to the DOI listed here.

    To cite the data paper use the following:

    Dornelas M, Antão LH, Moyes F, Bates AE, Magurran AE, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol Biogeogr. 2018;27:760-786. https://doi.org/10.1111/geb.12729

  4. Job Interview Assignments test

    • kaggle.com
    zip
    Updated Apr 18, 2023
    Cite
    Aman Anand (2023). Job Interview Assignments test [Dataset]. https://www.kaggle.com/datasets/yekahaaagayeham/job-interview-assignments-test
    Explore at:
    Available download formats: zip (37601318 bytes)
    Dataset updated
    Apr 18, 2023
    Authors
    Aman Anand
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Task 1

    Business roles at AgroStar require a baseline of analytical skills, and it is also critical that we are able to explain complex concepts in a simple way to a variety of audiences. This test is structured so that someone with the baseline skills needed to succeed in the role should be able to complete this in under 4 hours without assistance.

    Use the data in the included sheet to address the following scenario...

    Since its inception, AgroStar has been leveraging an assisted marketplace model. Given that the market potential is huge and that the target customer appreciates a physical store nearby, we have taken a call to explore the offline retail model to drive growth. The primary objective is to get a larger wallet share for AgroStar among existing customers.

    Assume you are back in time, in August 2018, and you have been asked to determine the location (taluka) of the first AgroStar offline retail store. 1. What are the key factors you would use to determine the location? Why? 2. Which taluka (across the three states) would you look to open in? Why?

    Guidelines:

    (1) Please mention any assumptions you have made and the underlying thought process.
    (2) Please treat the assignment as standalone (it should be self-explanatory to someone who reads it), but we will have a follow-up discussion with you in which we will walk through your approach to this assignment.
    (3) Mention any data that may be missing that would make this study more meaningful.
    (4) Kindly conduct your analysis within the spreadsheet; we would like to see the working sheet. If you face any issues due to the file size, kindly download this file and share an Excel sheet with us.
    (5) If you would like to append a Word document/presentation to summarize, please go ahead.
    (6) In case you use any external data source/article, kindly share the source.

    Task 4 Cohort

    The file CDNOW_master.txt contains the entire purchase history up to the end of June 1998 of the cohort of 23,570 individuals who made their first-ever purchase at CDNOW in the first quarter of 1997. This CDNOW dataset was first used by Fader and Hardie (2001).

    Each record in this file, 69,659 in total, comprises four fields: the customer's ID, the date of the transaction, the number of CDs purchased, and the dollar value of the transaction.

    CustID = CDNOW_master(:,1); % customer id
    Date   = CDNOW_master(:,2); % transaction date
    Quant  = CDNOW_master(:,3); % number of CDs purchased
    Spend  = CDNOW_master(:,4); % dollar value (excl. S&H)

    See "Notes on the CDNOW Master Data Set" (http://brucehardie.com/notes/026/) for details of how the 1/10th systematic sample (http://brucehardie.com/datasets/CDNOW_sample.zip) used in many papers was created.
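
The MATLAB lines above just name the four columns. An equivalent parse in Python - a sketch over the described whitespace-delimited format, with invented rows rather than real CDNOW records:

```python
import io
from collections import defaultdict

def spend_per_customer(fh):
    """Sum the dollar value of transactions per customer ID from a
    whitespace-delimited file: id, date, quantity, spend."""
    totals = defaultdict(float)
    for line in fh:
        cust, _date, _quant, spend = line.split()
        totals[cust] += float(spend)
    return dict(totals)

# Invented rows in the described format, not real CDNOW records.
raw = io.StringIO(
    "00001 19970101 1 10.50\n"
    "00001 19970118 2 24.25\n"
    "00002 19970102 1 12.00\n"
)
print(spend_per_customer(raw))  # {'00001': 34.75, '00002': 12.0}
```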

    Reference:

    Fader, Peter S. and Bruce G. S. Hardie (2001), "Forecasting Repeat Sales at CDNOW: A Case Study," Interfaces, 31 (May-June), Part 2 of 2, S94-S107.

    Task 6 Zupee.csv

    I have merged all three datasets into one file and also did some feature engineering.
    Available Data: You will be given anonymized user gameplay data in the form of 3 CSV files. Fields in the data are as described below.

    Gameplay_Data.csv contains the following fields:
    • Uid: Alphanumeric unique ID assigned to the user
    • Eventtime: DateTime at which the user played the tournament
    • Entry_Fee: Entry fee of the tournament
    • Win_Loss: 'W' if the user won that particular tournament, 'L' otherwise
    • Winnings: How much money the user won in the tournament (0 for 'L')
    • Tournament_Type: Type of tournament the user played (A / B / C / D)
    • Num_Players: Number of players in this tournament

    Wallet_Balance.csv contains the following fields:
    • Uid: Alphanumeric unique ID assigned to the user
    • Timestamp: DateTime at which the user's wallet balance is given
    • Wallet_Balance: User's wallet balance at the given timestamp

    Demographic.csv contains the following fields:
    • Uid: Alphanumeric unique ID assigned to the user
    • Installed_At: Timestamp at which the user installed the app
    • Connection_Type: User's internet connection type (e.g. Cellular / Dial Up)
    • Cpu_Type: CPU type of the device the user is playing on
    • Network_Type: Network type in encoded form
    • Device_Manufacturer: e.g. Realme
    • ISP: Internet service provider, e.g. Airtel
    • Country
    • Country_Subdivision
    • City
    • Postal_Code
    • Language: Language the user has selected for gameplay
    • Device_Name
    • Device_Type

    Build a basic recommendation system that is able to rank/recommend relevant tournaments and entry prices to the user. The main objectives are:
    1. A user should not have to scroll too much before selecting a tournament of their preference.
    2. We would like the user to play as high an entry-fee tournament as possible.
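
A minimal sketch of objective 2 (nudge users toward the highest entry fee they can afford), assuming the field names from the description above; this is an illustration, not a full recommender:

```python
def rank_entry_fees(history, wallet_balance):
    """Rank the entry fees a user has played: keep only fees the user
    can currently afford, then order by fee descending so the highest
    affordable entry fee is recommended first."""
    fees = {row["Entry_Fee"] for row in history}
    return sorted((f for f in fees if f <= wallet_balance), reverse=True)

# Invented gameplay rows mimicking Gameplay_Data.csv fields.
history = [
    {"Uid": "u1", "Entry_Fee": 10, "Win_Loss": "W"},
    {"Uid": "u1", "Entry_Fee": 50, "Win_Loss": "L"},
    {"Uid": "u1", "Entry_Fee": 100, "Win_Loss": "W"},
]
print(rank_entry_fees(history, wallet_balance=60))  # [50, 10]
```

A real system would also weight by tournament type preference and recency, but the affordability filter and fee-descending order capture the stated objectives.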

  5. Electrification of Heat Demonstration Project, 2020-2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Dec 19, 2024
    + more versions
    Cite
    Energy Systems Catapult (2024). Electrification of Heat Demonstration Project, 2020-2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-9050-2
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Energy Systems Catapult
    Area covered
    United Kingdom
    Description

    The heat pump monitoring datasets are a key output of the Electrification of Heat Demonstration (EoH) project, a government-funded heat pump trial assessing the feasibility of heat pumps across the UK’s diverse housing stock. These datasets are provided in both cleansed and raw form and allow analysis of the initial performance of the heat pumps installed in the trial. From the datasets, insights such as heat pump seasonal performance factor (a measure of the heat pump's efficiency), heat pump performance during the coldest day of the year, and half-hourly performance to inform peak demand can be gleaned.
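
The seasonal performance factor mentioned above is total heat delivered divided by total electricity consumed over a period. A sketch with invented column names and readings (the Excel data dictionary has the real field names):

```python
def seasonal_performance_factor(readings):
    """SPF = total heat output / total electricity input over a period."""
    heat = sum(r["heat_output_kwh"] for r in readings)
    elec = sum(r["electricity_input_kwh"] for r in readings)
    return heat / elec

# Invented half-hourly readings, not trial data.
readings = [
    {"heat_output_kwh": 30.0, "electricity_input_kwh": 10.0},
    {"heat_output_kwh": 24.0, "electricity_input_kwh": 8.0},
]
print(seasonal_performance_factor(readings))  # 3.0
```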

    For the second edition (December 2024), the data were updated to include cleaned performance data collected between November 2020 and September 2023. The only documentation currently available with the study is the Excel data dictionary. Reports and other contextual information can be found on the Energy Systems Catapult website.

    The EoH project was funded by the Department of Business, Energy and Industrial Strategy. From 2023, it is covered by the new Department for Energy Security and Net Zero.

    Data availability

    This study comprises the open-access cleansed data from the EoH project and a summary dataset, available in four zipped files (see the 'Access Data' tab). Users must download all four zip files to obtain the full set of cleansed data and accompanying documentation.

    When unzipped, the full cleansed data comprises 742 CSV files. Most of the individual CSV files are too large to open in Excel. Users should ensure they have sufficient computing facilities to analyse the data.

    The UKDS also holds an accompanying study, SN 9049 Electrification of Heat Demonstration Project: Heat Pump Performance Raw Data, 2020-2023, which is available only to registered UKDS users. This contains the raw data from the EoH project. Since the data are very large, only the summary dataset is available to download; an order must be placed for FTP delivery of the remaining raw data. Other studies in the set include SN 9209, which comprises 30-minute interval heat pump performance data, and SN 9210, which includes daily heat pump performance data.

    The Python code used to cleanse the raw data and then perform the analysis is available from the Energy Systems Catapult GitHub: https://github.com/ES-Catapult/electrification_of_heat

  6. GP Practice Prescribing Presentation-level Data - July 2014

    • digital.nhs.uk
    csv, zip
    Updated Oct 31, 2014
    + more versions
    Cite
    (2014). GP Practice Prescribing Presentation-level Data - July 2014 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
    Explore at:
    Available download formats: csv (1.4 GB), zip (257.7 MB), csv (1.7 MB), csv (275.8 kB)
    Dataset updated
    Oct 31, 2014
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Jul 1, 2014 - Jul 31, 2014
    Area covered
    United Kingdom
    Description

    Warning: Large file size (over 1 GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

    Once PowerPivot has been installed, follow the instructions below to load a large file. Note that it may take at least 20 to 30 minutes to load one monthly file.

    1. Start Excel as normal.
    2. Click on the PowerPivot tab.
    3. Click on the PowerPivot Window icon (top left).
    4. In the PowerPivot Window, click on the "From Other Sources" icon.
    5. In the Table Import Wizard, scroll to the bottom and select Text File.
    6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.

    Once the data has been imported you can view it in a spreadsheet.

    What does the data cover?

    General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

    • the total number of items prescribed and dispensed
    • the total net ingredient cost
    • the total actual cost
    • the total quantity

    The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
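
The two lookups described above (practice detail by practice code, presentation name by BNF code) amount to simple keyed joins. A sketch with made-up codes and names, not real NHS records:

```python
# Made-up rows illustrating the three linked files; real field names
# and codes in the published data differ.
prescribing = [{"practice_code": "P00001", "bnf_code": "B001", "items": 12}]
practices = {"P00001": "Example Surgery, Example Town"}
bnf_names = {"B001": "Example Chemical Name"}

# Enrich each prescribing row via the two lookup tables.
enriched = [
    {
        **row,
        "practice_detail": practices.get(row["practice_code"], "unknown"),
        "presentation_name": bnf_names.get(row["bnf_code"], "unknown"),
    }
    for row in prescribing
]
print(enriched[0]["practice_detail"])  # Example Surgery, Example Town
```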

  7. Risk Manager Email List

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Risk Manager Email List [Dataset]. https://listtodata.com/risk-manager-email-list
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Åland Islands, Myanmar, Aruba, Panama, Burkina Faso, Iceland, Equatorial Guinea, Serbia, Pitcairn, South Africa
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    A risk manager email list can play a key role in your business growth. Risk managers protect companies from financial and operational problems, ensure workplace safety, and find ways to improve business stability. They often work in industries like insurance, banking, and corporate finance, but reaching them directly is not always easy. Our risk manager email list lets you connect faster with professionals in manufacturing, healthcare, finance, and retail. The data is accurate, up to date, and human-verified, and we check the database regularly to maintain quality, so you can send offers, updates, or proposals without worrying about wrong contacts.

    The risk manager email list is affordable and easy to use in any CRM system, helping you save time, reduce marketing costs, and target the right audience effectively. Whether you run a small company or a large enterprise, it can bring better leads, higher engagement, and increased profits. We verify all the information we collect, which makes your marketing efforts more effective, and you can use the email addresses to inform risk managers directly about your products or services. The database is delivered in either Excel or CSV format, so it is simple to use the data according to your specific needs. It is available now at List to Data.

  8. Data from: Data set for the population survey “attitudes towards big data practices and the institutional framework of privacy and data protection”

    • service.tib.eu
    • radar-service.eu
    • +1more
    Updated Nov 28, 2024
    Cite
    (2024). Data set for the population survey “attitudes towards big data practices and the institutional framework of privacy and data protection” [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1151
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Abstract: The aim of this study is to gain insights into the attitudes of the population towards big data practices and the factors influencing them. To this end, a nationwide survey (N = 1,331), representative of the population of Germany, addressed attitudes about selected big data practices, exemplified by four scenarios that may have a direct impact on personal lifestyle: price discrimination in retail, credit scoring, differentiation in health insurance, and differentiation in employment. Attitudes about the scenarios were set in relation to demographic characteristics, personal value orientations, knowledge about computers and the internet, and general attitudes about privacy and data protection. Another focus of the study is the institutional framework of privacy and data protection, because the realization of benefits or risks of big data practices for the population also depends on knowledge of the rights the institutional framework provides to the population and the actual use of those rights. As results, several challenges for the framework posed by big data practices were confirmed, in particular for the elements of informed consent with privacy policies, purpose limitation, and the individuals' rights to request information about the processing of personal data and to have these data corrected or erased.

    TYPE OF SURVEY AND METHODS

    The data set includes responses to a survey conducted by professionally trained interviewers of a social and market research company in the form of computer-aided telephone interviews (CATI) from 2017-02 to 2017-04. The target population was inhabitants of Germany aged 18 years and over, who were randomly selected using the sampling approaches ADM eASYSAMPLe (based on the Gabler-Häder method) for landline connections and eASYMOBILe for mobile connections. The 1,331 completed questionnaires comprise 44.2 percent mobile and 55.8 percent landline phone respondents. Most questions offered a 5-point rating scale (Likert-like) anchored, for instance, with 'Fully agree' to 'Do not agree at all', or 'Very uncomfortable' to 'Very comfortable'. Responses were weighted to obtain a representation of the entire German population (variable 'gewicht' in the data sets). To this end, standard weighting procedures were applied to reduce differences between the sample and the entire population with regard to known rates of response and non-response depending on household size, age, gender, educational level, and place of residence.

    RELATED PUBLICATION AND FURTHER DETAILS

    The questionnaire, analysis and results will be published in the corresponding report (main text in English; the questionnaire in Appendix B in the German of the interviews, with an English translation). The report will be available as an open-access publication at KIT Scientific Publishing (https://www.ksp.kit.edu/). Reference: Orwat, Carsten; Schankin, Andrea (2018): Attitudes towards big data practices and the institutional framework of privacy and data protection - A population survey, KIT Scientific Report 7753, Karlsruhe: KIT Scientific Publishing.

    FILE FORMATS

    The data set of responses was saved for the KITopen repository in 2018-11 in the following file formats: comma-separated values (.csv), tab-separated values (.dat), Excel (.xls), Excel 2007 or newer (.xlsx), and SPSS Statistics (.sav). The questionnaire is saved in the following file formats: comma-separated values (.csv), Excel (.xls), Excel 2007 or newer (.xlsx), and Portable Document Format (.pdf).

    PROJECT AND FUNDING

    The survey is part of the project Assessing Big Data (ABIDA) (2015-03 to 2019-02), which received funding from the Federal Ministry of Education and Research (BMBF), Germany (grant no. 01IS15016A-F). http://www.abida.de
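
Applying the 'gewicht' weight variable means weighting each response rather than counting heads. A sketch with invented responses; the answer coding (5 = 'Fully agree') is an assumption for illustration:

```python
def weighted_share(rows, predicate):
    """Share of weighted respondents matching a predicate, using the
    per-respondent weight to approximate the full population."""
    total = sum(r["gewicht"] for r in rows)
    matched = sum(r["gewicht"] for r in rows if predicate(r))
    return matched / total

# Invented rows; assume 5 = 'Fully agree' on the 5-point scale.
rows = [
    {"antwort": 5, "gewicht": 0.5},
    {"antwort": 2, "gewicht": 1.5},
    {"antwort": 4, "gewicht": 1.0},
]
print(weighted_share(rows, lambda r: r["antwort"] >= 4))  # 0.5
```

Note the unweighted share here would be 2/3; the weights pull it down because the disagreeing respondent stands in for a larger population slice.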

  9. Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study"

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Heather Cribbs; Gabriel Gardner (2023). Data for "To Pre-Filter, or Not to Pre-Filter, That Is the Query: A Multi-Campus Big Data Study" [Dataset]. http://doi.org/10.6084/m9.figshare.19071578.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Heather Cribbs; Gabriel Gardner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Five files, one of which is a ZIP archive, containing data that support the findings of this study.

    • PDF file "IA screenshots CSU Libraries search config" contains screenshots captured from the Internet Archive's Wayback Machine for all 24 CalState libraries' homepages for the years 2017-2019.
    • Excel file "CCIHE2018-PublicDataFile" contains 2018 Carnegie Classifications data from the Indiana University Center for Postsecondary Research for all of the CalState campuses.
    • CSV file "2017-2019_RAW" contains the raw data exported from Ex Libris Primo Analytics (OBIEE) for all 24 CalState libraries for calendar years 2017-2019.
    • CSV file "clean_data" contains the cleaned data from Primo Analytics, which was used for all subsequent analysis such as charting and import into SPSS for statistical testing.
    • ZIP archive "NonparametricStatisticalTestsFromSPSS" contains 23 SPSS files [.spv format] reporting the results of testing conducted in SPSS, including normality checks, descriptives, and Kruskal-Wallis H-test results.

  10. HelpSteer: AI Alignment Dataset

    • kaggle.com
    zip
    Updated Nov 22, 2023
    Cite
    The Devastator (2023). HelpSteer: AI Alignment Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/helpsteer-ai-alignment-dataset
    Explore at:
    Available download formats: zip (16614333 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    HelpSteer: AI Alignment Dataset

    Real-World Helpfulness Annotated for AI Alignment

    By Huggingface Hub [source]

    About this dataset

    HelpSteer is an open-source dataset designed to support AI alignment through fair, team-oriented annotation. The dataset provides 37,120 samples, each containing a prompt and response along with five human-annotated attributes rated between 0 and 4, with higher values indicating better quality. Combining machine learning and natural language processing methods with expert annotation, HelpSteer aims to provide a standardized set of values for measuring alignment between human and machine interactions. With responses rated for correctness, coherence, complexity, helpfulness and verbosity, HelpSteer sets out to help organizations build reliable AI models that produce more accurate results and a better user experience.



    How to use the dataset

    How to Use HelpSteer: An Open-Source AI Alignment Dataset

    HelpSteer is an open-source dataset designed to help researchers create models with AI Alignment. The dataset consists of 37,120 different samples each containing a prompt, a response and five human-annotated attributes used to measure these responses. This guide will give you a step-by-step introduction on how to leverage HelpSteer for your own projects.

    Step 1 - Choosing the Data File

    HelpSteer contains two data files - one for training and one for validation. To start exploring the dataset, select the file you would like to use by downloading train.csv and validation.csv from the Kaggle page linked above, or get them from the Google Drive repository attached here: [link]. All samples in each file consist of 7 columns with information about a single response: prompt (given), response (submitted), helpfulness, correctness, coherence, complexity and verbosity, all with values between 0 and 4, where higher means better in the respective category.

    Step 2 - Exploratory Data Analysis (EDA)

    Once you have your file loaded into your workspace or favorite software environment (suggested libraries include Pandas/NumPy, or even Microsoft Excel), explore it further by running basic EDA commands that summarize each feature's distribution, and note potential trends or points of interest - e.g. which traits polarize responses the most? Are there outliers that might signal something interesting? Plotting these results often provides great insight into patterns across the dataset, which can be used later in the modeling phase (also known as feature engineering).

    Step 3 - Data Preprocessing

    Your interpretation of the raw data during EDA should produce hypotheses about which features matter most when estimating attribute scores of unknown responses. Preprocessing - such as cleaning up missing entries and handling outliers - is therefore highly recommended before any modelling efforts with this data set. Refer back to the Kaggle page description if unsure about the allowed value ranges of specific attributes; having the correct numerical ranges ready makes the modelling workload lighter when building predictive models. Do not rush this stage: skipping it in pursuit of quick accuracy tends to produce poor results at model deployment due to low data quality.
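
A first EDA step might be per-attribute summary statistics across the five 0-4 scores. A minimal sketch on two invented rows (not actual HelpSteer samples):

```python
# The five annotated attributes, each scored 0-4 (higher is better).
ATTRS = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def attribute_means(rows):
    """Mean score per attribute across all rows."""
    return {a: sum(r[a] for r in rows) / len(rows) for a in ATTRS}

rows = [
    {"helpfulness": 3, "correctness": 4, "coherence": 4, "complexity": 1, "verbosity": 2},
    {"helpfulness": 1, "correctness": 2, "coherence": 4, "complexity": 3, "verbosity": 2},
]
print(attribute_means(rows))
```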

    Research Ideas

    • Designating and measuring conversational AI engagement goals: Researchers can utilize the HelpSteer dataset to design evaluation metrics for AI engagement systems.
    • Identifying conversational trends: by analyzing the annotations and data in HelpSteer, organizations can gain insight into what makes conversations more helpful, coherent, complex, or consistent across datasets or audiences.
    • Training virtual assistants: train artificial intelligence algorithms on this dataset to develop virtual assistants that respond effectively to customer queries with helpful answers.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/pu...

  11. The Gesture-Related Personality of a Virtual Character

    • figshare.com
    csv
    Updated Oct 28, 2025
    Chun Yang Su; Chun Heng Ho (2025). The Gesture-Related Personality of a Virtual Character [Dataset]. http://doi.org/10.6084/m9.figshare.30404356.v1
    Explore at:
    csv (available download format)
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chun Yang Su; Chun Heng Ho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the research paper titled The Gesture-Related Personality of a Virtual Character. It includes both the dataset and the codebook, provided in CSV format. These files can be opened with Microsoft Excel or any compatible open-source office software.

  12. m

    Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...

    • data.mendeley.com
    Updated Oct 29, 2021
    + more versions
    Abtin Ijadi Maghsoodi (2021). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm [Dataset]. http://doi.org/10.17632/37nb83jwtd.1
    Explore at:
    Dataset updated
    Oct 29, 2021
    Authors
    Abtin Ijadi Maghsoodi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, and several Kaggle datasets. Because these sources report time at various granularities (e.g., minutes, hours, days), the daily format was used to integrate the datasets in this research study. The integrated historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more cryptocurrencies, were uploaded to this online Mendeley data repository. Although Market Capitalization was the primary criterion for including the mentioned cryptocurrencies, a subject-matter expert (a professional trader) also guided the initial selection by analyzing indicators such as the Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, the Stochastic Oscillator, and the Ichimoku Cloud. The primary features of this dataset, used as the decision-making criteria of the CLUS-MCDA II approach, are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap, and Weighted Price values.
    The available Excel and CSV files in this data set are only part of the integrated data; the other databases, datasets, and API references used in this study are as follows:
    [1] https://finance.yahoo.com/
    [2] https://coinmarketcap.com/historical/
    [3] https://cryptodatadownload.com/
    [4] https://kaggle.com/philmohun/cryptocurrency-financial-data
    [5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
    [6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
    [7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
    [8] https://min-api.cryptocompare.com/
    [9] https://p.nomics.com/cryptocurrency-bitcoin-api
    [10] https://www.coinapi.io/
    [11] https://www.coingecko.com/en/api
    [12] https://cryptowat.ch/
    [13] https://www.alphavantage.co/
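The daily integration described above can be sketched with pandas (using hypothetical minute-level OHLCV data as a stand-in for a real exchange export; the real sources mix minute, hour, and day granularities):

```python
import pandas as pd

# Two days of synthetic minute-level price data.
idx = pd.date_range("2021-01-01", periods=2880, freq="min")
minute = pd.DataFrame({"Open": 1.0, "High": 2.0, "Low": 0.5,
                       "Close": 1.5, "Volume": 10.0}, index=idx)

# Aggregate every series to the daily format used for integration.
daily = minute.resample("1D").agg({"Open": "first", "High": "max",
                                   "Low": "min", "Close": "last",
                                   "Volume": "sum"})
```

Each source would be resampled this way before joining on the daily timestamp.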

    This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II projects: https://aimaghsoodi.github.io/CLUSMCDA-R-Package/ https://github.com/Aimaghsoodi/CLUS-MCDA-II https://github.com/azadkavian/CLUS-MCDA

