http://opendatacommons.org/licenses/dbcl/1.0/
Dataset Description: In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
Call to Action: We are issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This gives the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.
Many of the questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights on these questions. We are maintaining a summary of the community's contributions.
Acknowledgements: We wouldn't be here without the help of others. The dataset is a subset of the dataset available at AI2's Semantic Scholar (https://pages.semanticscholar.org/coronavirus-research). This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, IBM, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.
Dataset: The dataset is in tar.gz format and can be downloaded from https://drive.google.com/file/d/15SV8_Nc1HECN9uaplDSQx7H1yKFR4F_Z/view?usp=sharing
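Because the description does not say how the tar.gz archive is organised internally, a first step after downloading is to inspect it. The following is a minimal sketch using Python's standard `tarfile` module; the function name `summarize_archive` and the local file name in the usage note are assumptions for illustration, not part of the dataset.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path: str) -> Counter:
    """Count the regular files in a .tar.gz archive, grouped by file extension."""
    counts: Counter = Counter()
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                # Files without an extension are grouped under "(none)".
                suffix = PurePosixPath(member.name).suffix or "(none)"
                counts[suffix] += 1
    return counts
```

Calling `summarize_archive("cord19.tar.gz")` (a placeholder path) returns a count of files per extension, which indicates what kinds of files the archive holds before anything is extracted.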
Submissions: A notebook and its output results are expected as submissions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The AI Innovation Taxonomy is a structured vocabulary of 7,490 distinct AI-related concepts, systematically categorized to cover various aspects of Artificial Intelligence. Each concept within the ontology is associated with a unique identifier, preferred labels, alternate labels, broader concepts, and detailed descriptions, facilitating precise semantic annotations and topic categorization. Sample topics include widely recognized areas such as Natural Language Processing, Artificial Intelligence, Machine Translation, Knowledge Representation and Reasoning, Computational Linguistics, Data Mining, Data Science, Text Mining, and Textual Entailment. This ontology provides a robust semantic foundation for accurately annotating, filtering, and categorizing AI-related content, thus supporting consistent and effective topic extraction methodologies.
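The concept structure described above (identifier, preferred label, alternate labels, broader concepts, description) can be sketched as a small record type with a label index for annotation. This is only an illustration: the field names, the `ex:` identifiers in the usage note, and the naive substring matching are assumptions, not the published schema or the project's annotation method.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """One taxonomy entry; field names are illustrative, not the published schema."""
    identifier: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # identifiers of broader concepts
    description: str = ""


def build_label_index(concepts: List[Concept]) -> Dict[str, str]:
    """Map each lower-cased preferred or alternate label to a concept identifier."""
    index: Dict[str, str] = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            # If two concepts share a label, the first one seen keeps it.
            index.setdefault(label.lower(), concept.identifier)
    return index


def annotate(text: str, index: Dict[str, str]) -> List[str]:
    """Return sorted identifiers of concepts whose label occurs in the text.

    Uses naive case-insensitive substring matching, so short labels can
    also match inside longer words.
    """
    lowered = text.lower()
    return sorted({cid for label, cid in index.items() if label in lowered})
```

For example, with two hypothetical concepts `Concept("ex:nlp", "Natural Language Processing", ["NLP"])` and `Concept("ex:tm", "Text Mining", broader=["ex:nlp"])`, annotating a sentence that mentions "text mining" and "NLP" yields both identifiers. A production annotator would need tokenization and disambiguation rather than substring matching.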
The AI Innovation Taxonomy was developed as part of the research project enrichMyData, specifically the InnoGraph business case that builds a holistic knowledge graph of innovation based on Artificial Intelligence (AI), and more generally of the global “hitech” ecosystem. The project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070284.
Publication: The InnoGraph Artificial Intelligence Taxonomy: A Key to Unlocking AI-Related Entities and Content. Alexiev, V.; Bechev, B.; and Osytsin, A. White paper (Technical Report). Ontotext Corp, December 2023.
Inventory of Mineral Properties in Chelan County, Washington, Report of Investigations 9. Since late in the last century, reports dealing with mineral resources and individual mineral properties in many parts of Chelan County have been written, and many of these have been published in technical papers, mining journals, and the publications of several State and Federal agencies. This summary report is a compilation of all such information available to this office, with additional data obtained from field investigations. Sources of information have not been indicated for individual properties, but reference is here made to the bibliography on page 57 for such sources. It should be noted, however, that the bibliography does not include references to articles in mining periodicals.
http://opendatacommons.org/licenses/dbcl/1.0/
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
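The class imbalance mentioned above can be checked with a simple count before choosing a modelling approach. The following is a minimal sketch, assuming the data are stored as a delimited text file with a `quality` column; the `;` delimiter, the function names, and the cut-offs that group scores into poor, normal, and excellent are illustrative assumptions, not part of the dataset description.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping, TextIO


def quality_band(score: int, poor_max: int = 4, excellent_min: int = 7) -> str:
    """Map a 0-10 quality score to an ordered band.

    The default cut-offs are hypothetical; choose ones that suit the analysis.
    """
    if score <= poor_max:
        return "poor"
    if score >= excellent_min:
        return "excellent"
    return "normal"


def read_rows(handle: TextIO, delimiter: str = ";") -> List[Dict[str, str]]:
    """Parse an open delimited file into dict rows; the ';' default is an assumption."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def band_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count wines per quality band from rows that contain a 'quality' field."""
    return Counter(quality_band(int(row["quality"])) for row in rows)
```

Opening a local copy of the file, passing the handle to `read_rows`, and then calling `band_counts` on the result shows how many wines fall into each band. A heavily skewed count suggests using stratified splits or class weights when the task is treated as classification.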
This dataset is also available from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white vinho verde wine samples from the north of Portugal, please visit that link.
Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
We Kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper "Technical Debt’s State of Practice on Stack Overflow: a Preliminary Study", published (in Portuguese) in the preliminary results track of SBQS, the Brazilian Symposium on Software Quality.
We provide the data for all steps of our methodology and final analysis. Each file is numbered, indicating the order in which they were produced in our study.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This report is a review of the Yukon mineral industry for 1975 by the Geology Section, Northern Natural Resources and Environment Branch, Department of Indian and Northern Affairs. It includes descriptions of work conducted on mineral claims by individuals and mineral exploration companies, and operating summaries of the several producing mines in the Yukon. It also contains technical papers on select properties. Information in this report was obtained from visits to mineral properties, from personal communication with individuals, and from technical reports, trade journals, newspapers, publications of the Geological Survey of Canada, and the monthly reports of the District Mining Recorders. A list of assessment reports, both confidential and those available for inspection, is included in the list of Technical Reports. In this report, activities of the mineral industry are divided into lode mining and exploration, coal mining and exploration, and placer mining.