6 datasets found
  1. CORD-19 Dataset v2020

    • kaggle.com
    Updated Oct 18, 2020
    Cite
    SMLRA-KJSCE (2020). CORD-19 Dataset v2020 [Dataset]. https://www.kaggle.com/datasets/smlrakjsce/cord19-dataset-v2020/discussion
    Dataset updated
    Oct 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SMLRA-KJSCE
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Open-ended track in which your team can build anything using the dataset we provide.

    Dataset Description

    In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

    Call to Action

    We are issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high-priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This gives the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

    Many of the questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions. We are maintaining a summary of the community's contributions.

    Acknowledgements

    We wouldn't be here without the help of others. The dataset is a subset of the dataset available at AI2's Semantic Scholar: https://pages.semanticscholar.org/coronavirus-research. This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.

    Dataset

    The dataset is in tar.gz format and can be downloaded from https://drive.google.com/file/d/15SV8_Nc1HECN9uaplDSQx7H1yKFR4F_Z/view?usp=sharing

    Submissions

    A notebook and its output results are expected as appropriate submissions.
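Since the description says the dataset ships as a tar.gz archive, a minimal sketch of inspecting such an archive before extracting it may be useful. The helper names `list_archive_members` and `build_demo_archive` are hypothetical, and the demo archive is built in memory with placeholder file names; the real archive's layout is not described on this page and is not assumed here.

```python
import io
import tarfile


def list_archive_members(fileobj, limit=10):
    """Return up to `limit` regular-file names from a gzip-compressed tar archive."""
    names = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if len(names) >= limit:
                break
            if member.isfile():
                names.append(member.name)
    return names


def build_demo_archive():
    """Build a tiny in-memory tar.gz (placeholder contents) so the sketch runs offline."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in [("demo/a.json", "{}"), ("demo/b.json", "{}")]:
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_archive_members(build_demo_archive()))
```

For a downloaded file, the same function could be called with an open binary file handle, for example `open(path, "rb")`, where `path` is wherever the archive was saved locally.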

  2. Data from: The InnoGraph Artificial Intelligence Taxonomy

    • zenodo.org
    csv
    Updated Apr 1, 2025
    Cite
    Vladimir Alexiev; Boyan Bechev (2025). The InnoGraph Artificial Intelligence Taxonomy [Dataset]. http://doi.org/10.5281/zenodo.15113095
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vladimir Alexiev; Boyan Bechev
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The AI Innovation Taxonomy is a structured vocabulary of 7,490 distinct AI-related concepts, systematically categorized to cover various aspects of Artificial Intelligence. Each concept within the ontology is associated with a unique identifier, preferred labels, alternate labels, broader concepts, and detailed descriptions, facilitating precise semantic annotations and topic categorization. Sample topics include widely recognized areas such as Natural Language Processing, Artificial Intelligence, Machine Translation, Knowledge Representation and Reasoning, Computational Linguistics, Data Mining, Data Science, Text Mining, and Textual Entailment. This ontology provides a robust semantic foundation for accurately annotating, filtering, and categorizing AI-related content, thus supporting consistent and effective topic extraction methodologies.

    The AI Innovation Taxonomy is developed as part of the research project enrichMyData, specifically the InnoGraph business case that builds a holistic knowledge graph of innovation based on Artificial Intelligence (AI), and more generally of the global “hitech” ecosystem. It has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070284.

    Publication: The InnoGraph Artificial Intelligence Taxonomy: A Key to Unlocking AI-Related Entities and Content. Alexiev, V.; Bechev, B.; and Osytsin, A. White paper (Technical Report). Ontotext Corp, December 2023.

  3. Data from: Inventory of Mineral Properties in Chelan County, Washington

    • data.wu.ac.at
    pdf
    Updated May 17, 2013
    Cite
    Arizona Geological Survey (2013). Inventory of Mineral Properties in Chelan County, Washington [Dataset]. https://data.wu.ac.at/odso/data_gov/Y2I1ZTM1YzctNDQzMi00MGNiLWIyYzktMzNlY2RmMzVhMzU4
    Dataset updated
    May 17, 2013
    Dataset provided by
    Arizona Geological Survey
    Description

    Inventory of Mineral Properties in Chelan County, Washington, Report of Investigations 9. Since late in the last century reports dealing with mineral resources and individual mineral properties in many parts of Chelan County have been written and many of these have been published in technical papers, mining journals, and in the publications of several State and Federal agencies. This summary report is a compilation of all such information available to this office, with additional data obtained from field investigations. Sources of information have not been indicated for individual properties, but reference is here made to the bibliography on page 57 for such sources. It should be noted however, that the bibliography does not include references to articles in mining periodicals.

  4. White Wine Quality

    • kaggle.com
    Updated Sep 28, 2020
    Cite
    Piyush Agnihotri (2020). White Wine Quality [Dataset]. https://www.kaggle.com/piyushagni5/white-wine-quality/metadata
    Dataset updated
    Sep 28, 2020
    Dataset provided by
    Kaggle
    Authors
    Piyush Agnihotri
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Content

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol

    Output variable (based on sensory data):
    12 - quality (score between 0 and 10)
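Because the description notes that the quality classes are ordered and unbalanced, a quick first step is to count how often each score occurs. The sketch below is only an illustration: `quality_distribution` is a hypothetical helper, the comma delimiter is an assumption about the file, and the sample rows are invented for the demo rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def quality_distribution(rows):
    """Count rows per `quality` score and return the counts ordered by score."""
    counts = Counter(int(row["quality"]) for row in rows)
    return dict(sorted(counts.items()))


# Invented rows for illustration only; these are not values from the dataset.
SAMPLE = """pH,quality
3.0,6
3.3,6
3.2,5
3.1,6
3.4,8
"""

if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE))
    print(quality_distribution(reader))
```

With the real file, the same function could be fed a `csv.DictReader` over the opened CSV, passing `delimiter=";"` to the reader if the file turns out to be semicolon-separated.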

    Acknowledgements

    This dataset is also available from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white vinho verde wine samples from the north of Portugal, please visit that link.

    Please include this citation if you plan to use this database:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Inspiration

    We Kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!

  5. Replication package for the paper: "Technical Debt's State of Practice on...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv
    Updated Jan 24, 2020
    Cite
    Eliakim Gama; Emmanuel Sávio S. Freire; Matheus Paixao; Mariela Inés Cortés (2020). Replication package for the paper: "Technical Debt's State of Practice on Stack Overflow: a Preliminary Study" [Dataset]. http://doi.org/10.5281/zenodo.3383148
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eliakim Gama; Emmanuel Sávio S. Freire; Matheus Paixao; Mariela Inés Cortés
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the replication package for the paper "Technical Debt’s State of Practice on Stack Overflow: a Preliminary Study", published (in Portuguese) in the preliminary results track of SBQS, the Brazilian Symposium on Software Quality.

    We provide the data for all steps of our methodology and final analysis. Each file is numbered, indicating the order in which they were produced in our study.

  6. Mineral Industry Report 1975

    • canwin-datahub.ad.umanitoba.ca
    • data.urbandatacentre.ca
    • +3more
    html
    Updated Oct 30, 2024
    Cite
    Government of Yukon (2024). Mineral Industry Report 1975 [Dataset]. https://canwin-datahub.ad.umanitoba.ca/data/dataset/33597bac-f3f5-4020-58f2-8dc80394314f
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Government of Yukon
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    This report is a review of the Yukon mineral industry for 1975 by the Geology Section, Northern Natural Resources and Environment Branch, Department of Indian and Northern Affairs. It includes descriptions of work conducted on mineral claims by individuals and mineral exploration companies and operating summaries of the several producing mines in the Yukon. It also contains technical papers on select properties. Information in this report was obtained from visits to mineral properties, from personal communication with individuals and from technical reports, trade journals, newspapers, publications of the Geological Survey of Canada and the monthly reports of the District Mining Recorders. A list of assessment reports, both confidential and those available for inspection, is included in the list of Technical Reports. In this report, activities of the mineral industry are divided into lode mining and exploration, coal mining and exploration and placer mining.



CORD-19 Dataset v2020

CORD-19 Dataset with only 2020 research papers
