6 datasets found

CORD-19 Dataset v2020
kaggle.com
Updated Oct 18, 2020
Cite
SMLRA-KJSCE (2020). CORD-19 Dataset v2020 [Dataset]. https://www.kaggle.com/datasets/smlrakjsce/cord19-dataset-v2020/discussion
Dataset updated
Oct 18, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SMLRA-KJSCE
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Open-Ended track where your team can build anything using the dataset provided by us

Dataset Description In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

Call to Action We are issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This allows the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

Many of the questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions. We are maintaining a summary of the community's contributions.

Acknowledgements We wouldn't be here without the help of others. The dataset is a subset of the dataset available at AI2's Semantic Scholar (https://pages.semanticscholar.org/coronavirus-research). This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.

Dataset The dataset is in tar.gz format and can be downloaded from https://drive.google.com/file/d/15SV8_Nc1HECN9uaplDSQx7H1yKFR4F_Z/view?usp=sharing
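
Since the dataset ships as a tar.gz archive, one way to peek at its contents before extracting everything is sketched below, using only Python's standard library. The archive file name in the usage comment is hypothetical, and the internal layout of the archive is not described on this page, so the function only lists whatever regular files it finds.

```python
import tarfile


def list_archive_members(archive_path, limit=10):
    """Return up to `limit` regular-file names stored in a .tar.gz archive."""
    names = []
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links, and other non-file entries.
            if member.isfile():
                names.append(member.name)
                if len(names) >= limit:
                    break
    return names


# Usage (hypothetical file name, assumed to be the downloaded archive):
# print(list_archive_members("cord19_2020.tar.gz"))
```

Listing a handful of members first is a cheap way to learn the directory structure without unpacking a large archive to disk.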

Submissions Notebook and Output results are expected as appropriate submissions.
Data from: The InnoGraph Artificial Intelligence Taxonomy
zenodo.org
csv
Updated Apr 1, 2025
Cite
Vladimir Alexiev; Boyan Bechev (2025). The InnoGraph Artificial Intelligence Taxonomy [Dataset]. http://doi.org/10.5281/zenodo.15113095
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.15113095
Dataset updated
Apr 1, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vladimir Alexiev; Boyan Bechev
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The AI Innovation Taxonomy is a structured vocabulary of 7,490 distinct AI-related concepts, systematically categorized to cover various aspects of Artificial Intelligence. Each concept within the ontology is associated with a unique identifier, preferred labels, alternate labels, broader concepts, and detailed descriptions, facilitating precise semantic annotations and topic categorization. Sample topics include widely recognized areas such as Natural Language Processing, Artificial Intelligence, Machine Translation, Knowledge Representation and Reasoning, Computational Linguistics, Data Mining, Data Science, Text Mining, and Textual Entailment. This ontology provides a robust semantic foundation for accurately annotating, filtering, and categorizing AI-related content, thus supporting consistent and effective topic extraction methodologies.

The AI Innovation Taxonomy is developed as part of the research project enrichMyData, specifically the InnoGraph business case that builds a holistic knowledge graph of innovation based on Artificial Intelligence (AI), and more generally of the global “hitech” ecosystem. It has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070284.

Publication: The InnoGraph Artificial Intelligence Taxonomy: A Key to Unlocking AI-Related Entities and Content. Alexiev, V.; Bechev, B.; and Osytsin, A. White paper (Technical Report). Ontotext Corp, December 2023.
Data from: Inventory of Mineral Properties in Chelan County, Washington
data.wu.ac.at
pdf
Updated May 17, 2013
Cite
Arizona Geological Survey (2013). Inventory of Mineral Properties in Chelan County, Washington [Dataset]. https://data.wu.ac.at/odso/data_gov/Y2I1ZTM1YzctNDQzMi00MGNiLWIyYzktMzNlY2RmMzVhMzU4
Available download formats: pdf
Dataset updated
May 17, 2013
Dataset provided by
Arizona Geological Survey
Description
Inventory of Mineral Properties in Chelan County, Washington, Report of Investigations 9. Since late in the last century reports dealing with mineral resources and individual mineral properties in many parts of Chelan County have been written and many of these have been published in technical papers, mining journals, and in the publications of several State and Federal agencies. This summary report is a compilation of all such information available to this office, with additional data obtained from field investigations. Sources of information have not been indicated for individual properties, but reference is here made to the bibliography on page 57 for such sources. It should be noted however, that the bibliography does not include references to articles in mining periodicals.
White Wine Quality
kaggle.com
Updated Sep 28, 2020
Cite
Piyush Agnihotri (2020). White Wine Quality [Dataset]. https://www.kaggle.com/piyushagni5/white-wine-quality/metadata
Dataset updated
Sep 28, 2020
Dataset provided by
Kaggle
Authors
Piyush Agnihotri
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Content

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):
12 - quality (score between 0 and 10)
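
The class imbalance mentioned under Context can be checked by counting how many wines fall under each quality score. The following is a minimal sketch using only Python's standard library; it assumes the CSV has a header row with a column named quality and uses a semicolon delimiter, neither of which is confirmed on this page, so both are exposed as parameters. The sample rows are made up for illustration and are not real measurements.

```python
import csv
import io
from collections import Counter


def quality_distribution(csv_file, delimiter=";", column="quality"):
    """Count how many wines received each quality score, ordered by score."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    counts = Counter(int(row[column]) for row in reader)
    return dict(sorted(counts.items()))


# Made-up sample rows (illustrative only, not taken from the dataset).
sample = io.StringIO(
    "alcohol;pH;quality\n"
    "9.5;3.20;6\n"
    "10.1;3.10;6\n"
    "8.8;3.30;5\n"
    "12.4;3.05;8\n"
    "9.9;3.15;6\n"
)
print(quality_distribution(sample))  # → {5: 1, 6: 3, 8: 1}
```

With a real file, the same function can be called on an open file handle, for example inside `with open(path, newline="") as f:`, where the path is whatever the downloaded file is named.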

Acknowledgements

This dataset is also available from the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white vinho verde wine samples from the north of Portugal, please visit the above link.

Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Inspiration

We Kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!

Relevant papers

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
Replication package for the paper: "Technical Debt's State of Practice on...
zenodo.org
explore.openaire.eu
+1 more
csv
Updated Jan 24, 2020
Cite
Eliakim Gama; Emmanuel Sávio S. Freire; Matheus Paixao; Mariela Inés Cortés (2020). Replication package for the paper: "Technical Debt's State of Practice on Stack Overflow: a Preliminary Study" [Dataset]. http://doi.org/10.5281/zenodo.3383148
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3383148
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Eliakim Gama; Emmanuel Sávio S. Freire; Matheus Paixao; Mariela Inés Cortés
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the replication package for the paper "Technical Debt’s State of Practice on Stack Overflow: a Preliminary Study", published (in Portuguese) in the preliminary results track of SBQS, the Brazilian Symposium on Software Quality.

We provide the data for all steps of our methodology and final analysis. Each file is numbered, indicating the order in which they were produced in our study.
Mineral Industry Report 1975
canwin-datahub.ad.umanitoba.ca
data.urbandatacentre.ca
+3 more
html
Updated Oct 30, 2024
Cite
Government of Yukon (2024). Mineral Industry Report 1975 [Dataset]. https://canwin-datahub.ad.umanitoba.ca/data/dataset/33597bac-f3f5-4020-58f2-8dc80394314f
Available download formats: html
Dataset updated
Oct 30, 2024
Dataset provided by
Government of Yukon
License
Open Government Licence - Canada 2.0, https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
This report is a review of the Yukon mineral industry for 1975 by the Geology Section, Northern Natural Resources and Environment Branch, Department of Indian and Northern Affairs. It includes descriptions of work conducted on mineral claims by individuals and mineral exploration companies and operating summaries of the several producing mines in the Yukon. It also contains technical papers on select properties. Information in this report was obtained from visits to mineral properties, from personal communication with individuals and from technical reports, trade journals, newspapers, publications of the Geological Survey of Canada and the monthly reports of the District Mining Recorders. A list of assessment reports, both confidential and those available for inspection, is included in the list of Technical Reports. In this report, activities of the mineral industry are divided into lode mining and exploration, coal mining and exploration and placer mining.
CORD-19 Dataset v2020

CORD-19 Dataset with only 2020 research papers
