100+ datasets found
  1. Dataset of books series that contain Bright ideas and how to have them

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of books series that contain Bright ideas and how to have them [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Bright+ideas+and+how+to+have+them&j=1&j0=books
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered where the book is Bright ideas and how to have them. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  2. Michigan Public Policy Survey Public Use Datasets

    • openicpsr.org
    delimited, spss +1
    Updated Aug 19, 2016
    + more versions
    Cite
    Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Public Use Datasets [Dataset]. http://doi.org/10.3886/E100132V30
    Available download formats
    spss, delimited, stata
    Dataset updated
    Aug 19, 2016
    Dataset authored and provided by
    Center for Local, State, and Urban Policy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Michigan
    Description

    The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan.

    The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations.

    The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables.

    Out of a commitment to promoting public knowledge of Michigan local governance, the Center for Local, State, and Urban Policy is releasing public use datasets. In order to protect respondent confidentiality, CLOSUP has divided the data collected in each wave of the survey into separate datasets focused on different topics that were covered in the survey. Each dataset contains only variables relevant to that subject, and the datasets cannot be linked together. Variables have also been omitted or recoded to further protect respondent confidentiality. For researchers looking for a more extensive release of the MPPS data, restricted datasets are available through openICPSR's Virtual Data Enclave.

    Please note: additional waves of MPPS public use datasets are being prepared, and will be available as part of this project as soon as they are completed. For information on accessing MPPS public use and restricted datasets, please visit the MPPS data access page: http://closup.umich.edu/mpps-download-datasets

  3. The Human Know-How Dataset

    • dtechtive.com
    • find.data.gov.scot
    pdf, zip
    Updated Apr 29, 2016
    Cite
    (2016). The Human Know-How Dataset [Dataset]. http://doi.org/10.7488/ds/1394
    Available download formats
    pdf(0.0582 MB), zip(19.67 MB), zip(0.0298 MB), zip(9.433 MB), zip(13.06 MB), zip(0.2837 MB), zip(5.372 MB), zip(69.8 MB), zip(20.43 MB), zip(5.769 MB), zip(14.86 MB), zip(19.78 MB), zip(43.28 MB), zip(62.92 MB), zip(92.88 MB), zip(90.08 MB)
    Dataset updated
    Apr 29, 2016
    Description

    The Human Know-How Dataset describes 211,696 human activities from many different domains. These activities are decomposed into 2,609,236 entities (each with an English textual label). These entities represent over two million actions and half a million pre-requisites. Actions are interconnected both according to their dependencies (temporal/logical orders between actions) and decompositions (decomposition of complex actions into simpler ones). This dataset has been integrated with DBpedia (259,568 links).

    For more information see:

    • The project website: http://homepages.inf.ed.ac.uk/s1054760/prohow/index.htm
    • The data is also available on datahub: https://datahub.io/dataset/human-activities-and-instructions

    • Quickstart: if you want to experiment with the most high-quality data before downloading all the datasets, download the file '9of11_knowhow_wikihow', and optionally files 'Process - Inputs', 'Process - Outputs', 'Process - Step Links' and 'wikiHow categories hierarchy'.
    • Data representation is based on the PROHOW vocabulary (http://w3id.org/prohow#). Data extracted from existing web resources is linked to the original resources using the Open Annotation specification.
    • Data Model: an example of how the data is represented within the datasets is available in the attached Data Model PDF file. The attached example represents a simple set of instructions, but instructions in the dataset can have more complex structures. For example, instructions could have multiple methods, steps could have further sub-steps, and complex requirements could be decomposed into sub-requirements.

    Statistics:

    • 211,696: number of instructions. From wikiHow: 167,232 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 44,464 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    • 2,609,236: number of RDF nodes within the instructions. From wikiHow: 1,871,468 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 737,768 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    • 255,101: number of process inputs linked to 8,453 distinct DBpedia concepts (dataset Process - Inputs).
    • 4,467: number of process outputs linked to 3,439 distinct DBpedia concepts (dataset Process - Outputs).
    • 376,795: number of step links between 114,166 different sets of instructions (dataset Process - Step Links).

  4. A Dataset of Scientific Topics

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    Updated Apr 23, 2022
    Cite
    (2022). A Dataset of Scientific Topics [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/18328
    Dataset updated
    Apr 23, 2022
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The automatic extraction of topics is a standard technique for summarizing text corpora from various domains (e.g., news articles, transport or logistic reports, scientific publications) that has several applications. Since, in many cases, topics are subject to continuous change there is the need to monitor the evolution of a set of topics of interest, as the corresponding corpora are updated. The evolution of scientific topics, in particular, is of great interest for researchers, policy makers, fund managers, and other professionals/engineers in the research and academic community. In this dataset, we provide a set of topics for scientific publications gathered from Crossref. The topics have been produced by performing a topic modeling analysis on two distinct sets of publications, each coming from a different time period. Acknowledgements: This research was partially funded by project ENIRISST under grant agreement No. MIS 5027930 (co-financed by Greece and the EU through the European Regional Development Fund).

  5. Dataset of books in the 101 ideas series

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books in the 101 ideas series [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_series&fop0=%3D&fval0=101+ideas&j=1&j0=book_series
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 6 rows and is filtered where the book series is 101 ideas. It features 9 columns including author, publication date, language, and book publisher.

  6. WikiTableQuestions (Semi-structured Tables Q&A)

    • kaggle.com
    Updated Nov 27, 2022
    Cite
    The Devastator (2022). WikiTableQuestions (Semi-structured Tables Q&A) [Dataset]. https://www.kaggle.com/datasets/thedevastator/investigation-of-semi-structured-tables-wikitabl
    Dataset updated
    Nov 27, 2022
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Investigation of Semi-Structured Tables: WikiTableQuestions

    A Dataset of Complex Questions on Semi-Structured Wikipedia Tables

    About this dataset

    The WikiTableQuestions dataset poses complex questions about the contents of semi-structured Wikipedia tables. Beyond merely testing a model's knowledge retrieval capabilities, these questions require an understanding of both the natural language used and the structure of the table itself in order to provide a correct answer. This makes the dataset an excellent testing ground for AI models that aim to replicate or exceed human-level intelligence.

    How to use the dataset

    To use the WikiTableQuestions dataset, you first need to understand its structure. The dataset comprises two types of files: questions and answers. The questions are in natural language and are designed to test a model's ability to understand the table structure, understand the natural language question, and reason about the answer. The answers are in a list format, and provide additional information about each table that can be used to answer the questions.

    To start working with the WikiTableQuestions dataset, you will need to download both the questions and answers files. Once you have downloaded both files, you can begin working with the dataset by loading it into a pandas dataframe. From there, you can begin exploring the data and developing your own models for answering the questions.
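
    As a minimal sketch of the loading step described above, the snippet below reads one table-like CSV into rows and answers a simple counting question over it. It uses Python's standard csv module so that it runs without extra installs (pandas.read_csv would be the usual equivalent for the dataframe workflow mentioned above). The sample content, the column names Year, Winner and Score, and the helper load_table are hypothetical and are not taken from the dataset.

```python
import csv
import io


def load_table(source):
    """Read one CSV table into a list of row dictionaries keyed by header."""
    return list(csv.DictReader(source))


# Hypothetical stand-in for a downloaded file such as 0.csv;
# the real column names and contents may differ.
sample = io.StringIO(
    "Year,Winner,Score\n"
    "2001,Alpha,3\n"
    "2002,Beta,5\n"
    "2003,Alpha,4\n"
)
rows = load_table(sample)

# A question such as "how many times did Alpha win?" becomes a filter over rows.
alpha_wins = sum(1 for row in rows if row["Winner"] == "Alpha")
print(alpha_wins)
```

    With a real file, the StringIO object would be replaced by an open file handle, for example open("0.csv", newline="").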

    Happy Kaggling!

    Research Ideas

    • The WikiTableQuestions dataset can be used to train a model to answer complex questions about semi-structured Wikipedia tables.

    • The WikiTableQuestions dataset can be used to train a model to understand the structure of semi-structured Wikipedia tables.

    • The WikiTableQuestions dataset can be used to train a model to understand the natural language questions and reason about the answers.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: 0.csv

    File: 1.csv

    File: 10.csv

    File: 11.csv

    File: 12.csv

    File: 14.csv

    File: 15.csv

    File: 17.csv

    File: 18.csv

  7. BBC Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Aug 6, 2025
    Cite
    Bright Data (2025). BBC Datasets [Dataset]. https://brightdata.com/products/datasets/bbc
    Available download formats
    .json, .csv, .xlsx
    Dataset updated
    Aug 6, 2025
    Dataset authored and provided by
    Bright Data, https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.

    Dataset Features

    • Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.
    • Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.
    • Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.
    • Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.
    • Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.

    Customizable Subsets for Specific Needs

    Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.

    Popular Use Cases

    • Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.
    • Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.
    • Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.
    • Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.
    • Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.
    • Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.

    Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.

  8. Activities of Daily Living Object Dataset

    • figshare.com
    bin
    Updated Nov 28, 2024
    Cite
    Md Tanzil Shahria; Mohammad H Rahman (2024). Activities of Daily Living Object Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27263424.v3
    Available download formats
    bin
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Md Tanzil Shahria; Mohammad H Rahman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Activities of Daily Living Object Dataset

    Overview

    The ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.

    Data Sources and Licensing

    The dataset comprises images and annotations sourced from four publicly available datasets:

    COCO Dataset
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    License Link: https://creativecommons.org/licenses/by/4.0/
    Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.

    Open Images Dataset
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    License Link: https://creativecommons.org/licenses/by/4.0/
    Citation: Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.

    LVIS Dataset
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    License Link: https://creativecommons.org/licenses/by/4.0/
    Citation: Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.

    Roboflow Universe
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    License Link: https://creativecommons.org/licenses/by/4.0/
    Citation: The following repositories from Roboflow Universe were used in compiling this dataset:

    • Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).
    • Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).
    • Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).
    • Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).
    • New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).
    • Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).
    • Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).
    • Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).
    • One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).
    • jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).
    • jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).
    • Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).
    • Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).
    • Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).
    • TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).
    • Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024).
    • Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).
    • Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).
    • Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).
    • Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).
    • MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).
    • Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).
    • Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).
    • Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).
    • Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).
    • Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).
    • in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).
    • Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).
    • Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).
    • University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).
    • EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).
    • Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).
    • workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).
    • Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).
    • WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).
    • AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).
    • Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).

    All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.

    Redistribution Permission: As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset. We have complied with the license terms by:

    • Providing appropriate attribution to the original creators.
    • Including links to the CC BY 4.0 license.
    • Indicating any changes made to the original material.

    Dataset Structure

    The dataset includes:

    • Images: High-quality images featuring ADL objects suitable for robotic manipulation.
    • Annotations: Bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.

    Classes

    The dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.

    Format

    • Images: JPEG format.
    • Annotations: Text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.

    How to Use the Dataset

    • Download the Dataset
    • Unpack the Dataset: unzip ADL_Object_Dataset.zip

    How to Cite This Dataset

    If you use this dataset in your research, please cite our paper:

    @article{shahria2024activities, title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset}, author={Shahria, Md Tanzil and Rahman, Mohammad H.}, journal={Sensors}, volume={24}, number={23}, pages={7566}, year={2024}, publisher={MDPI}}

    License

    This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
    License Link: https://creativecommons.org/licenses/by/4.0/
    By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.

    Acknowledgments

    We gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:

    • COCO Dataset: We thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.
    • Open Images Dataset: We express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images under the CC BY 4.0 license.
    • LVIS Dataset: We appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.
    • Roboflow Universe:
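
    The YOLO Darknet annotation format mentioned in this description can be sketched as follows. In that convention each label line holds a class index followed by the box centre, width and height, all normalised to the image size. The helper name parse_yolo_line, the sample label line and the 640 by 480 image size are illustrative assumptions, not values taken from this dataset.

```python
def parse_yolo_line(line, img_w, img_h):
    """Convert one YOLO Darknet label line to a pixel-space box.

    Expected line layout: "<class_id> <x_center> <y_center> <width> <height>",
    with the four box values normalised to the range 0..1.
    Returns (class_id, x_min, y_min, x_max, y_max) in pixels.
    """
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line and image size, for illustration only.
box = parse_yolo_line("3 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(box)
```

    The class index would be looked up in classes.txt to obtain the object name, assuming that file lists one class per line in index order.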

  9. Physiological signals during activities for daily life: Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 29, 2022
    Cite
    Eduardo Gutierrez Maestro (2022). Physiological signals during activities for daily life: Dataset [Dataset]. http://doi.org/10.5281/zenodo.6391454
    Available download formats
    zip
    Dataset updated
    Mar 29, 2022
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Eduardo Gutierrez Maestro
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset used in this work comprises data from four participants, two men and two women. Each of them wore the Empatica E4 wearable device for a total of 15 days. They wore the device during the day, and at night we asked participants to charge it and load the data into an external memory unit. During these days, participants were asked to answer EMA questionnaires, which are used to label our data. However, some participants could not complete the full experiment, and some days were discarded due to data corruption. Specific demographic information, total sampling days and total number of EMA answers can be found in Table I.

                        Participant 1   Participant 2   Participant 3   Participant 4
    Age                 67              55              60              63
    Gender              Male            Female          Male            Female
    Final Valid Days    9               15              12              13
    Total EMAs          42              57              64              46

    Table I. Summary of participants' collected data.

    This dataset provides three different types of labels. Activeness and happiness are two of these labels. They are the answers to the EMA questionnaires that participants reported during their daily activities, given as numbers between 0 and 4.
    These labels are used to interpolate the mental well-being state according to [1]. We report in our dataset a total of eight emotional states: (1) pleasure, (2) excitement, (3) arousal, (4) distress, (5) misery, (6) depression, (7) sleepiness, and (8) contentment.

    The data we provide in this repository consist of two types of files:

    • CSV files: These files contain physiological signals recorded during the data collection process. The first line of each CSV file gives the timestamp at which sampling started. The second line gives the sampling frequency used for gathering the signal. From the third line until the end of the file, one can find the sampled datapoints.
    • Excel files: These files contain the labels obtained from EMA answers, together with the timestamp at which each answer was registered. Labels for pleasure, activeness and mood can be found in these files.

    NOTE: Files are numbered according to each specific sampling day. For example, ACC1.csv corresponds to the signal ACC for sampling day 1. The same applies to the Excel files.
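
    The CSV layout described above (start timestamp on the first line, sampling frequency on the second, datapoints from the third line onward) can be read with a short sketch like the one below. It assumes the timestamp is in seconds and the frequency in Hz, and that multi-channel files repeat the header value in every column so that the first column is enough. The function name read_signal and the sample values are hypothetical.

```python
import csv
import io


def read_signal(source):
    """Parse a signal CSV: line 1 = start timestamp, line 2 = sampling rate, rest = samples."""
    reader = csv.reader(source)
    start = float(next(reader)[0])   # assumed to be seconds
    rate = float(next(reader)[0])    # assumed to be Hz
    samples = [[float(v) for v in row] for row in reader if row]
    # Reconstruct one timestamp per sample from the start time and the rate.
    times = [start + i / rate for i in range(len(samples))]
    return start, rate, times, samples


# Hypothetical single-channel content standing in for a file such as ACC1.csv.
sample = io.StringIO("1600000000.0\n4.0\n0.10\n0.20\n0.30\n")
start, rate, times, samples = read_signal(sample)
print(times[-1] - start)
```

    Reconstructed timestamps of this kind are what would let each sample be aligned with the timestamped EMA labels in the Excel files.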

    Code and a tutorial on how to label the data and extract features can be found in this repository: https://github.com/edugm94/temporal-feat-emotion-prediction

    References:

    [1] J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, p. 1161, 1980.

  10. Data from: Count-Based Morgan Fingerprint: A More Efficient and...

    • acs.figshare.com
    xlsx
    Updated Jul 5, 2023
    Cite
    Shifa Zhong; Xiaohong Guan (2023). Count-Based Morgan Fingerprint: A More Efficient and Interpretable Molecular Representation in Developing Machine Learning-Based Predictive Regression Models for Water Contaminants’ Activities and Properties [Dataset]. http://doi.org/10.1021/acs.est.3c02198.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Shifa Zhong; Xiaohong Guan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model’s predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and have a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a “ContaminaNET” platform to deploy these C-MF-based models for free use.
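
    The difference between a binary and a count-based representation can be illustrated with a toy example. The sketch below is not the Morgan algorithm (which derives fragment identifiers from atom environments) and is not the authors' code; it assumes a molecule has already been reduced to a list of hypothetical integer fragment identifiers and only shows how the two vector types differ.

```python
def fold_fragments(fragment_ids, n_bits=8, use_counts=False):
    """Fold fragment identifiers into a fixed-length vector.

    Binary mode records presence (0/1); count mode records how many
    fragments map to each position.
    """
    vector = [0] * n_bits
    for fragment_id in fragment_ids:
        position = fragment_id % n_bits
        if use_counts:
            vector[position] += 1
        else:
            vector[position] = 1
    return vector


# Hypothetical identifiers: fragment 3 occurs three times, fragment 5 once.
fragments = [3, 3, 3, 5]
binary_fp = fold_fragments(fragments)
count_fp = fold_fragments(fragments, use_counts=True)
print(binary_fp)  # [0, 0, 0, 1, 0, 1, 0, 0]
print(count_fp)   # [0, 0, 0, 3, 0, 1, 0, 0]
```

    The binary vector cannot distinguish one occurrence of a group from three, whereas the count vector keeps that information for the downstream model.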

  11. ASCC Activities of Daily Living Dataset

    • ieee-dataport.org
    Updated Dec 5, 2023
    Cite
    Fei Liang (2023). ASCC Activities of Daily Living Dataset [Dataset]. https://ieee-dataport.org/documents/ascc-activities-daily-living-dataset
    Explore at:
    Dataset updated
    Dec 5, 2023
    Authors
    Fei Liang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Monitoring activities of daily living (ADLs) is essential in elderly care because it gives caregivers information about daily activities. In human daily life

  12. Time-Motion Data Set of Construction Work

    • zenodo.org
    Updated Oct 2, 2024
    Cite
    Olli Seppänen; Christopher Görsch (2024). Time-Motion Data Set of Construction Work [Dataset]. http://doi.org/10.5281/zenodo.13867877
    Explore at:
    Dataset updated
    Oct 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Olli Seppänen; Christopher Görsch
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This open-access dataset provides a detailed time-motion study of construction work, specifically focusing on MEP (Mechanical, Electrical, and Plumbing) activities. The dataset is intended to facilitate research and analysis to improve operational efficiency and safety within the construction industry. It includes anonymized and pseudonymized data, ensuring privacy while still offering valuable insights into worker activities.

    Contents: (1) Time-motion study dataset: Captures categorized work activities by MEP workers at a second-to-second level. (2) Description of work activities: Provides detailed classifications of the tasks performed, allowing for in-depth analysis.

    This dataset has been made publicly available under the CC-BY-SA license, encouraging reuse and redistribution with proper attribution and share-alike terms. By downloading the dataset, users acknowledge and agree to comply with the terms outlined above.

    Funding and Support: This work has been supported by the “Hukka LVI- ja sähkötöissä” (Waste in Plumbing and Electrical Work) project, funded by STUL (Electrical Contractor Association), LVI-TU (HVAC Contractor Association), and STTA (Electrical Employers Union) from Finland.

    This comprehensive dataset offers valuable resources for research and analysis purposes. For further information or collaboration inquiries, feel free to reach out to discuss data collection methods and potential research partnerships: olli.seppanen@aalto.fi & christopher.gorsch@vtt.fi.

  13. Data from: Nurse Care Activities Datasets: In laboratory and in real field

    • ieee-dataport.org
    Updated Jan 4, 2021
    Cite
    Paula Lago (2021). Nurse Care Activities Datasets: In laboratory and in real field [Dataset]. https://ieee-dataport.org/open-access/nurse-care-activities-datasets-laboratory-and-real-field
    Explore at:
    Dataset updated
    Jan 4, 2021
    Authors
    Paula Lago
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nurse Care Activity Recognition

  14. Evaluating Health Home Care Quality

    • kaggle.com
    Updated Jan 23, 2023
    Cite
    The Devastator (2023). Evaluating Health Home Care Quality [Dataset]. https://www.kaggle.com/datasets/thedevastator/evaluating-health-home-care-quality
    Explore at:
    Dataset updated
    Jan 23, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Evaluating Health Home Care Quality

    CMS Core Set and Health Home SPA Measures

    By Health Data New York [source]

    About this dataset

    This dataset provides comprehensive measures to evaluate the quality of medical services provided to Medicaid beneficiaries by Health Homes, including the Centers for Medicare & Medicaid Services (CMS) Core Set and Health Home State Plan Amendment (SPA). This allows us to gain insight into how well these health homes are performing in terms of delivering high-quality care. Our data sources include the Medicaid Data Mart, QARR Member Level Files, and New York State Delivery System Reform Incentive Program (DSRIP) Data Warehouse. With this data set you can explore essential indicators such as rates for indicators within the scope of Core Set Measures, sub domains, domains and measure descriptions; age categories used; denominators of each measure; level of significance for each indicator; and more. By understanding more about Health Home Quality Measures from this resource you can help make informed decisions about evidence-based health practices while also promoting better patient outcomes.

    How to use the dataset

    This dataset contains measures that evaluate the quality of care delivered by Health Homes for the Centers for Medicare & Medicaid Services (CMS). With this dataset, you can get an overview of how a health home is performing in terms of quality. You can use this data to compare different health homes and their respective service offerings.

    The data used to create this dataset was collected from the Medicaid Data Mart, QARR Member Level Files, and New York State Delivery System Reform Incentive Program (DSRIP) Data Warehouse sources.

    In order to use this dataset effectively, you should start by looking at the columns provided. These include: Measurement Year; Health Home Name; Domain; Sub Domain; Measure Description; Age Category; Denominator; Rate; Level of Significance; Indicator. Each column provides valuable insight into how a particular health home is performing in various measurements of healthcare quality.

    When examining this data, it is important to remember that many variables are included in any given measure and that changes may have occurred over time due to varying factors such as population or financial resources available for healthcare delivery. Changes in policy may also affect performance over time, so it is important to take these things into account when evaluating the performance of any given health home from one year to the next or when comparing different health homes on a specific measure or set of indicators over time.

    Research Ideas

    • Using this dataset, state governments can evaluate the effectiveness of their health home programs by comparing the performance across different domains and subdomains.
    • Healthcare providers and organizations can use this data to identify areas for improvement in quality of care provided by health homes and strategies to reduce disparities between individuals receiving care from health homes.
    • Researchers can use this dataset to analyze how variations in cultural context, geography, demographics or other factors impact delivery of quality health home services across different locations.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: health-home-quality-measures-beginning-2013-1.csv

    | Column name           | Description                                         |
    |:----------------------|:----------------------------------------------------|
    | Measurement Year      | The year in which the data was collected. (Integer) |
    | Health Home Name      | The name of the health home. (String)               |
    | Domain                | The domain of the measure. (String)                 |
    | Sub Domain            | The sub domain of the measure. (String)             |
    | Measure Description   | A description of the measure. (String)              |
    | Age Category          | The age category of the patient. (String)           |
    | Denominator           | The denominator of the measure. (Integer)           |
    | Rate                  | The rate of the measure. (Float)                    |
    | Level of Significance | The level of significance of the measure. (String)  |
    | Indicator             | The indicator of the measure. (String)              |
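
    A comparison of health homes along the lines suggested above might start as follows. This is only a sketch: the rows are invented placeholders that reuse a subset of the listed column names, and averaging the Rate column across different measures is a crude simplification used here just to show the grouping step.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows using column names from the table above; all values are invented.
sample = io.StringIO(
    "Measurement Year,Health Home Name,Domain,Rate\n"
    "2013,Home A,Domain X,80.0\n"
    "2013,Home A,Domain Y,60.0\n"
    "2013,Home B,Domain X,90.0\n"
)

# Collect every reported rate under the health home it belongs to.
rates = defaultdict(list)
for row in csv.DictReader(sample):
    rates[row["Health Home Name"]].append(float(row["Rate"]))

mean_rate = {name: sum(values) / len(values) for name, values in rates.items()}
print(mean_rate)
```

    With the real file, the in-memory sample would be replaced by an opened file handle, and the grouping would more sensibly be done per measure and year.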

    Acknowledgements

    ...

  15. International Data & Economic Analysis (IDEA)

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). International Data & Economic Analysis (IDEA) [Dataset]. https://catalog.data.gov/dataset/international-data-economic-analysis-idea
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    International Data & Economic Analysis (IDEA) is USAID's comprehensive source of economic and social data and analysis. IDEA brings together over 12,000 data series from over 125 sources into one location for easy access by USAID and its partners through the USAID public website. The data are broken down by countries, years and the following sectors: Economy, Country Ratings and Rankings, Trade, Development Assistance, Education, Health, Population, and Natural Resources. IDEA regularly updates the database as new data become available. Examples of IDEA sources include the Demographic and Health Surveys, STATcompiler; UN Food and Agriculture Organization, Food Price Index; IMF, Direction of Trade Statistics; Millennium Challenge Corporation; and World Bank, World Development Indicators. The database can be queried by navigating to the site displayed in the Home Page field below.

  16. Time Diary Study (CAPS-DIARY module)

    • dataverse-staging.rdmc.unc.edu
    • datasearch.gesis.org
    Updated May 18, 2009
    Cite
    UNC Dataverse (2009). Time Diary Study (CAPS-DIARY module) [Dataset]. https://dataverse-staging.rdmc.unc.edu/dataset.xhtml?persistentId=hdl:1902.29/CAPS-DIARY
    Explore at:
    Available download formats: tsv, txt, application/x-sas-transport, application/x-spss-por, text/x-sas-syntax
    Dataset updated
    May 18, 2009
    Dataset provided by
    UNC Dataverse
    License

    https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=hdl:1902.29/CAPS-DIARY

    Description

    The purpose of this project is to determine how college students distribute their activities in time (with a particular focus on academic and athletic activities) and to examine the factors that influence such distributions.

    Each R reported once about each of the seven days of the week and an additional time about either Saturday or Sunday. Rs were told the week before they were to report which day was assigned and were given a report form to complete during that day. They entered the information from that form when they returned the next week.

    The activity codes included were: 0: Sleeping. 1: Attending classes. 2: Studying or preparing classroom assignments. 3: Working at a job (including CAPS). 4: Cooking, home chores, laundry, grocery shopping. 5: Errands, non-grocery shopping, gardening, animal care. 6: Eating. 7: Bathing, getting dressed, etc. 8: Sports, exercising, other physical activities. 9: Playing competitive games (cards, darts, videogames, frisbee, chess, Trivial Pursuit, etc.). 10: Participating in UNC-sponsored organizations (student government, band, sorority, etc.). 11: Listening to the radio. 12: Watching TV. 13: Reading for pleasure (not studying or reading for class). 14: Going to a movie. 15: Attending a cultural event (such as a play, concert, or museum). 16: Attending a sports event as a spectator. 17: Partying. 18: Religious activities. 19: Conversation. 20: Travel. 21: Resting. 22: Doing other things.

    DIARY1-8: These datasets contain a matrix of activities by times for a particular day. Included are time period, activity code (see above), # of friends present, and # of others present. (Rs were allowed to report doing two activities at once. In these cases they were also asked to report the % of time during the time period affected which was allocated to the first of the two activities listed.)

    THE DIARY DATASETS ARE STORED IN RAW FORM. SUMMARY FILES, CALLED TIMEREP, CONTAIN MOST SUMMARY INFORMATION WHICH MIGHT BE USED IN ANALYSES. THE DIARY DATASETS CAN BE LISTED TO ALLOW UNIQUE CODING OF THE ORIGINAL DATA.

    TIMEREP: The TIMEREP dataset is a summary file which gives the amount of time spent on each activity during each of the eight reporting periods and also includes more detailed information about many of the activities from follow-up questions which were asked if the respondent reported having engaged in certain activities. Data from additional questions asked of every respondent after each diary entry are also included: contact with family members, number of alcoholic drinks consumed during the 24 hour period reported on, number of friends and others present while drinking, number of cigarettes smoked on day reported about, and number of classes skipped on day reported about. Follow-up questions include detail about kind of physical activity or sports participation, kind of university organization, kind of radio program listened to and place of listening, kind of TV program watched and place of watching, kind of reading material read and topic, alcohol consumed while partying and place of partying, conversation topics, kind of travel, and activities included in the 'other' category.

    Special processing is required to put the dataset into SAS format. See spec for details.
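
    The rule for two simultaneous activities (a reported percentage of the time period allocated to the first activity) can be turned into per-activity totals as sketched below. The tuple layout `(minutes, first_code, second_code, pct_first)` is hypothetical and does not reflect the actual DIARY file structure; only the activity codes come from the description above.

```python
from collections import defaultdict

# A few of the activity codes listed in the description.
ACTIVITY_NAMES = {0: "Sleeping", 1: "Attending classes", 6: "Eating", 12: "Watching TV"}


def minutes_per_activity(entries):
    """Sum minutes per activity code.

    Each entry is (minutes, first_code, second_code, pct_first); second_code is
    None when only one activity was reported. This tuple layout is hypothetical.
    """
    totals = defaultdict(float)
    for minutes, first, second, pct_first in entries:
        if second is None:
            totals[first] += minutes
        else:
            # Split the period between the two activities by the reported percentage.
            totals[first] += minutes * pct_first / 100
            totals[second] += minutes * (100 - pct_first) / 100
    return dict(totals)


# Invented diary entries for illustration.
entries = [
    (480, 0, None, 100),  # one activity for the whole period
    (60, 6, 12, 50),      # an hour split evenly between eating and TV
    (30, 12, None, 100),
]
totals = minutes_per_activity(entries)
for code, minutes in sorted(totals.items()):
    print(ACTIVITY_NAMES[code], minutes)
```

    A summary of this kind resembles what the description attributes to the TIMEREP files, although how those files were actually derived is not stated.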

  17. Activities of daily living of several individuals

    • data.4tu.nl
    • datasetcatalog.nlm.nih.gov
    • +2more
    zip
    Updated Nov 3, 2015
    Cite
    Timo Sztyler; J. (Josep) Carmona (2015). Activities of daily living of several individuals [Dataset]. http://doi.org/10.4121/uuid:01eaba9f-d3ed-4e04-9945-b8b302764176
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 3, 2015
    Dataset provided by
    University of Mannheim, Germany
    Authors
    Timo Sztyler; J. (Josep) Carmona
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    This dataset comprises event logs (XES = Extensible Event Stream) regarding the activities of daily living performed by several individuals. The event logs were derived from sensor data collected in different scenarios. The activities include, e.g., sleeping, meal preparation, and washing. The event logs show the different behavior of people in their own homes but also common patterns. The attached event logs were created with Fluxicon Disco (http://fluxicon.com/disco/).
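
    XES logs are XML documents organised as a log containing traces, which in turn contain events. The sketch below reads such a structure with the Python standard library. The sample log is hand-written for illustration and is not taken from the dataset; the keys `concept:name` and `time:timestamp` are the standard XES attribute keys, but whether every file in this dataset carries them is an assumption.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written log in the XES layout (log > trace > event); not taken from the dataset.
XES_SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="day-1"/>
    <event>
      <string key="concept:name" value="sleeping"/>
      <date key="time:timestamp" value="2015-01-01T07:00:00.000+01:00"/>
    </event>
    <event>
      <string key="concept:name" value="meal preparation"/>
      <date key="time:timestamp" value="2015-01-01T08:00:00.000+01:00"/>
    </event>
  </trace>
</log>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}event'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(xml_text):
    """Return one list of (activity, timestamp) pairs per trace."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue
        events = []
        for event in trace:
            if local_name(event.tag) != "event":
                continue
            attrs = {child.get("key"): child.get("value") for child in event}
            events.append((attrs.get("concept:name"), attrs.get("time:timestamp")))
        traces.append(events)
    return traces


traces = read_traces(XES_SAMPLE)
print(traces)
```

    For larger logs a dedicated process-mining library would usually be more convenient; the standard-library version is shown only to make the file structure concrete.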

  18. Dataset of books in the Career ideas for kids series series

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of books in the Career ideas for kids series series [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_series&fop0=%3D&fval0=Career+ideas+for+kids+series&j=1&j0=book_series
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 2 rows and is filtered where the book series is Career ideas for kids series. It features 9 columns including author, publication date, language, and book publisher.

  19. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 1, 2023
    Cite
    Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Explore at:
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Haak, Fabian
    Schaer, Philipp
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search, as described in:

    Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship, left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions at the respective positions returned by the search engines at the given time of search "datetime". We scraped our data from a US server saved in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
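
    The figure of up to 270 suggestions per topic and search engine follows from the query expansion just described: the root term plus 26 letter extensions gives 27 query inputs, each returning up to ten suggestions. A minimal sketch of that expansion follows; the function name `query_inputs` is hypothetical and this is not the authors' scraper.

```python
import string

SUGGESTIONS_PER_INPUT = 10  # suggestions retrieved per query input, as stated above


def query_inputs(root_term):
    """Return the root term followed by the root term extended with each letter a-z."""
    return [root_term] + [f"{root_term} {letter}" for letter in string.ascii_lowercase]


inputs = query_inputs("democrats")
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT
print(inputs[:3])
print(len(inputs), max_suggestions)
```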

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.

  20. English Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the English Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content:

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the English language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native English speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity:

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats:

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled English Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
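
    The annotation details listed above suggest a record shape along the following lines. This is an illustrative sketch only: the key names, the example values, and the use of a boolean for rich text presence are all hypothetical, since the description names the fields but not their exact spelling or types.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CotRecord:
    # Hypothetical key names derived from the annotation details listed above.
    id: str
    prompt: str
    prompt_type: str
    prompt_complexity: str
    prompt_category: str
    domain: str
    response: str
    rationale: str
    response_type: str
    rich_text_present: bool


# Invented example record; not taken from the dataset.
record = CotRecord(
    id="example-0001",
    prompt="What is 15% of 80?",
    prompt_type="instruction",
    prompt_complexity="easy",
    prompt_category="percentages",
    domain="mathematics",
    response="12",
    rationale="15% of 80 is 0.15 * 80 = 12.",
    response_type="numerical",
    rich_text_present=False,
)
print(json.dumps(asdict(record), indent=2))
```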

    Quality and Accuracy:

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The English version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy English Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
