The testing dataset used at TRECVID for the DSDI task in 2020-2022. The dataset includes public videos, ground truth, and features for the DSDI task. As the task is continuing, the dataset will be updated continually. There are 32 features across 5 main categories (Environment, Vehicles, Water, Infrastructure, Damage). All videos are low-altitude airborne footage from natural disaster events.
https://creativecommons.org/publicdomain/zero/1.0/
The provided dataset was extracted from Yahoo Finance using pandas and the Yahoo Finance library (yfinance) in Python. It covers stock market indices of the world's leading economies. The code generated data from Jan 01, 2003 to Jun 30, 2023, more than 20 years. There are 18 CSV files: the dataset covers 16 different stock market indices from 7 different countries. Below is the list of countries along with the number of indices extracted through the library; the remaining two CSV files contain the annualized return and the compound annual growth rate (CAGR) computed from the extracted data.
[Figure: Number of indices extracted per country]
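As a rough sketch of the extraction step described above (assuming the yfinance package; the ticker map and output file naming are illustrative, not the exact code used):

```python
import yfinance as yf

START, END = "2003-01-01", "2023-06-30"
# Illustrative subset of the 16 indices; symbols are Yahoo Finance ticker codes.
tickers = {"Sensex": "^BSESN", "Nasdaq": "^IXIC", "SP500": "^GSPC"}

for name, symbol in tickers.items():
    # auto_adjust=False keeps the Adj Close column used for return calculations
    df = yf.download(symbol, start=START, end=END, auto_adjust=False)
    df["Year"], df["Month"], df["Day"] = df.index.year, df.index.month, df.index.day
    df.to_csv(f"{name}.csv")  # one CSV per index, as in the dataset
```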
This dataset is useful for research purposes, particularly for comparative analyses of capital market performance, and can be used alongside other economic indicators.
There are 18 distinct CSV files associated with this dataset. The first 16 contain the index data, and the last two contain the annualized return for each year and the CAGR of each index. If a column is blank, the index was launched in later years; for instance, BSE500 (India) was launched in 2007, so earlier values are blank, and similarly China_Top300 was launched in 2021, so earlier fields are blank too.
The extraction process applies different criteria: in the 16 index CSV files all columns are included, and Adj Close is used to calculate the annualized return. The algorithm extracts data based on the index name (the ticker code assigned by Yahoo Finance) and the start and end dates.
The annualized return and CAGR have been calculated and are illustrated in the images below, along with the machine-readable CSV files attached to the dataset.
To extract the data provided in the attachment, various criteria were applied:
Content filtering: The data was filtered based on several attributes, including the index name and the start and end dates. This filtering ensured that only relevant data meeting the specified criteria was retained.
Collaborative filtering: Another technique used was collaborative filtering via Yahoo Finance, which relies on index similarity. This approach involves finding indices that are similar to a given index and extending the dataset's scope to other countries or economies. By leveraging this method, the algorithm identifies and extracts data based on similarities between indices.
Of the last two CSV files, one contains the annualized return, which was calculated from the Adj Close column and stored in a new DataFrame. Below is an image of the annualized returns of all indices (if unreadable, a machine-readable CSV is attached to the dataset).
As far as the annualized rate of return is concerned, Indian stock market indices lead most of the time, followed by US, Canadian, and Japanese indices.
[Figure: Annualized return of each index]
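A minimal sketch of that annualized-return computation (the input file name is an illustrative assumption; Adj Close is the column named above):

```python
import pandas as pd

df = pd.read_csv("Sensex.csv", parse_dates=["Date"], index_col="Date")
# Last traded price of each calendar year, then year-over-year percent change.
year_end = df.groupby(df.index.year)["Adj Close"].last()
yearly_return = (year_end.pct_change() * 100).rename("Yearly_Return")
print(yearly_return.dropna())
```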
The best-performing index based on compound growth is the Sensex (India), which comprises the top 30 companies, at 15.60%, followed by the Nifty500 (India) at 11.34% and the Nasdaq (USA) at 10.60%.
The worst-performing index is China_Top300; however, it launched in 2021 (post-pandemic), so it cannot be examined meaningfully at this stage due to limited data availability. Furthermore, UK and Russian indices are also among the five worst performers.
[Figure: CAGR of each index]
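For reference, CAGR is the constant annual growth rate that takes the starting price to the ending price; a sketch under the same illustrative file-name assumption:

```python
import pandas as pd

df = pd.read_csv("Sensex.csv", parse_dates=["Date"], index_col="Date")
years = (df.index[-1] - df.index[0]).days / 365.25
cagr = (df["Adj Close"].iloc[-1] / df["Adj Close"].iloc[0]) ** (1 / years) - 1
print(f"{cagr:.2%}")  # the description reports 15.60% for the Sensex
```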
Geography: Stock Market Index of the World Top Economies
Time period: Jan 01, 2003 – June 30, 2023
Variables: Stock Market Index Title, Open, High, Low, Close, Adj Close, Volume, Year, Month, Day, Yearly_Return and CAGR
File Type: CSV file
This is not financial advice; due diligence is required for each investment decision.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Indexing Magic Cards is a dataset for object detection tasks - it contains Magic Cards annotations for 297 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please use the MESINESP2 corpus (the second edition of the shared task), since it has a higher level of curation and quality and is organized by document type (scientific articles, patents, and clinical trials).
Introduction
The Mesinesp (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) development set has a total of 750 records indexed manually by seven experienced medical literature indexers. Indexing is done using DeCS codes, roughly the Spanish equivalent of MeSH terms. Records were distributed so that each article was annotated by at least two different human indexers.
The data annotation process consisted of two steps:
Manual indexing step. DeCS codes were manually assigned to each record following the DeCS manual indexing guidelines.
Manual validation and consensus. The joint set of manually indexed DeCS codes generated by both indexers was manually revised and corrected.
Inter-annotator agreement over these annotations was measured using the Jaccard index.
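For reference, a minimal sketch of the Jaccard index between two annotators' DeCS code sets (how the organizers aggregated pairwise scores is not stated here):

```python
def jaccard(codes_a: set, codes_b: set) -> float:
    # Size of the intersection over the size of the union of the two code sets.
    if not codes_a and not codes_b:
        return 1.0
    return len(codes_a & codes_b) / len(codes_a | codes_b)

print(jaccard({"6893", "4345"}, {"6893", "28567"}))  # 0.33...
```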
Records consist mainly of medical literature abstracts and titles from the IBECS and LILACS databases.
Zip structure
The zip file contains two different development sets:
Official development set, which is the union of the annotations, with an agreement of macro = 0.6568 and micro = 0.6819. This set is composed of all the distinct (unique) DeCS codes added by any annotator for each document; and
Core-descriptors development set, which is the intersection of the annotations, with an agreement of macro = 1.0 and micro = 1.0. This set is composed of the DeCS codes shared by two or more annotators for each document.
Corpus format
Each dataset is a JSON object with a single key named "articles", which contains a list of documents. The raw format of the file is one line per document, plus two additional lines (the first and the last) that enclose the list of documents. The expected data types are as follows:
{"articles":[ {"abstractText":str,"db":str,"decsCodes":list,"id":str,"journal":str,"title":str,"year":int}, ... ]}
To clarify, the order of appearance of the fields in each document is as follows (note that this example is pretty-printed for readability):
{ "articles": [ { "abstractText": "Content of the abstract", "db": "Name of the source database", "decsCodes": [ "code1", "code2", "code3" ], "id": "Id of the document", "journal": "Name of the journal", "title": "Title of the document", "year": 2019 } ] }
Note: The fields "db", "journal" and "year" might be null.
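A minimal loading sketch for this format (the file name is an illustrative assumption):

```python
import json

with open("development_set.json", encoding="utf-8") as f:
    articles = json.load(f)["articles"]

for doc in articles[:3]:
    # "db", "journal" and "year" may be null, i.e. None after parsing.
    print(doc["id"], doc["title"], len(doc["decsCodes"]))
```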
Copyright (c) 2020 Secretaría de Estado de Digitalización e Inteligencia Artificial
This database is not owned by me; I uploaded it merely to make importing it into Kaggle kernels more convenient. I take no responsibility for maintaining this dataset, and all rights are reserved by the original author(s).
All data is downloaded from https://datacatalog.worldbank.org/dataset/human-capital-index
For documentation, please see https://datacatalog.worldbank.org/dataset/human-capital-index
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset "PLANTAS" (“Historia de las plantas”, Vol.1) were written using a quill-pen by Bernardo de Cienfuegos, one of the most outstanding Spanish botanists in the XVII century. The book was writing mainly in Spanish, but a significant number of words and full sentences are in Latin and many other languages. The originals of PLANTAS are currently available at the "Biblioteca Nacional de España", and a digital reproduction of it can be found at the "Biblioteca Digital Hispánica" (http://bdh-rd.bne.es/viewer.vm?id=0000140162). In this dataset, only the first volume of PLANTAS (Mss 3357, with 1,035 pages and around 20,000 handwritten text lines) was considered.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Name: City Happiness Index
Dataset Description:
This dataset and the related code were prepared entirely by Emirhan BULUT and are original and exclusive. The dataset includes crucial features and measurements from various cities around the world, focusing on factors that may affect each city's overall happiness score. By analyzing these factors, we aim to gain insights into the living conditions and satisfaction of the population in urban environments.
The dataset consists of the following features:
With these features, the dataset aims to analyze and understand the relationship between various urban factors and the happiness of a city's population. The developed Deep Q-Network model, PIYAAI_2, is designed to learn from this data to provide accurate predictions in future scenarios. Using Reinforcement Learning, the model is expected to improve its performance over time as it learns from new data and adapts to changes in the environment.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A CSV dataset containing the number of references of each bibliographic entity identified by an OMID in the OpenCitations Index (https://opencitations.net/index). The dataset is based on the latest release of the OpenCitations Index (https://opencitations.net/download), November 2023. The size of the zipped archive is 0.35 GB, while the size of the unzipped CSV file is 1.7 GB. The CSV dataset contains the reference count of 71,805,806 bibliographic entities. The first column (omid) lists the entities, while the second column (references) indicates the corresponding number of references.
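Given the 1.7 GB size, the file is most comfortably scanned in chunks; a sketch (the file name is an illustrative assumption, while the omid and references column names come from the description above):

```python
import pandas as pd

top_chunks = []
for chunk in pd.read_csv("omid_reference_counts.csv", chunksize=1_000_000):
    # Keep only each chunk's ten largest reference counts to bound memory use.
    top_chunks.append(chunk.nlargest(10, "references"))
top10 = pd.concat(top_chunks).nlargest(10, "references")
print(top10[["omid", "references"]])
```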
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dallas Fed Manufacturing Shipments Index in the United States increased to 15.10 points in November from 5.80 points in October of 2025. This dataset includes a chart with historical data for the United States Dallas Fed Manufacturing Shipments Index.
An Environmental Quality Index (EQI) for all counties in the United States for the time period 2000-2005 was developed which incorporated data from five environmental domains: air, water, land, built, and socio-demographic. The EQI was developed in four parts: domain identification; data source identification and review; variable construction; and data reduction using principal components analysis (PCA). The methods applied provide a reproducible approach that capitalizes almost exclusively on publicly available data sources. The primary goal in creating the EQI is to use it as a composite environmental indicator for research on human health. A series of peer-reviewed manuscripts utilized the EQI in examining health outcomes. This dataset is not publicly accessible because this series of papers is considered human health research, not to be loaded onto ScienceHub. It can be accessed through the following means: the EQI data can be accessed at https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: EQI data, metadata, formats, and data dictionary all available at the website. This dataset is associated with the following publications: Gray, C., L. Messer, K. Rappazzo, J. Jagai, S. Grabich, and D. Lobdell. The association between physical inactivity and obesity is modified by five domains of environmental quality in U.S. adults: A cross-sectional study. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(8): e0203301, (2018). Patel, A., J. Jagai, L. Messer, C. Gray, K. Rappazzo, S. DeflorioBarker, and D. Lobdell. Associations between environmental quality and infant mortality in the United States, 2000-2005. Archives of Public Health. BioMed Central Ltd, London, UK, 76(60): 1, (2018). Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).
Providence neighborhood codes with related descriptions. Useful for cross-referencing tax assessment and collection data.
A coastal vulnerability index (CVI) was used to map the relative vulnerability of the coast to future sea-level rise within Channel Islands National Park in California. The CVI ranks the following in terms of their physical contribution to sea-level rise-related coastal change: geomorphology, regional coastal slope, rate of relative sea-level rise, historical shoreline change rates, mean tidal range and mean significant wave height. The rankings for each input variable were combined and an index value calculated for 1-minute grid cells covering the park. The CVI highlights those regions where the physical effects of sea-level rise might be the greatest. This approach combines the coastal system's susceptibility to change with its natural ability to adapt to changing environmental conditions, yielding a quantitative, although relative, measure of the park's natural vulnerability to the effects of sea-level rise. The CVI and the data contained within this dataset provide an objective technique for evaluation and long-term planning by scientists and park managers.
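The description does not spell out how the six rankings are combined; USGS CVI studies commonly use the square root of the product of the rankings divided by their number, sketched below under that assumption:

```python
from math import prod, sqrt

def cvi(rankings: list) -> float:
    # sqrt((a * b * c * d * e * f) / n); an assumption based on the usual
    # USGS formulation, since this description omits the exact formula.
    return sqrt(prod(rankings) / len(rankings))

# Illustrative 1-5 rankings for geomorphology, coastal slope, relative
# sea-level rise rate, shoreline change rate, tidal range, and wave height:
print(round(cvi([4, 2, 3, 3, 1, 2]), 2))  # 4.9
```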
Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources’ impact over time. Our dataset differentiates the context in which citations occur to distinguish between ‘awareness’ and ‘usage’, and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.
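The published u-Index formula is defined in the accompanying materials rather than in this summary; as an illustrative proxy for the usage-to-awareness relationship it describes:

```python
def usage_awareness_ratio(usage_citations: int, awareness_citations: int) -> float:
    # Illustrative proxy only: the summary says a high ratio of usage to
    # awareness citations yields a high u-Index, but this ratio is not
    # the published metric itself.
    if awareness_citations == 0:
        return float(usage_citations)
    return usage_citations / awareness_citations
```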
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Historical Index of Ethnic Fractionalization (HIEF) dataset contains an ethnic fractionalization index for 165 countries across all continents, covering the period 1945-2013 at annual resolution. The ethnic fractionalization index corresponds to the probability that two randomly drawn individuals within a country are not from the same ethnic group. The dataset is a natural extension of previous ethnic fractionalization indices and allows users to compare developments in ethnic fractionalization over time. Applications of HIEF pertain to patterns of ethnic diversity across countries and over time.
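The quoted definition translates directly into the standard fractionalization formula, one minus the sum of squared group shares; a minimal sketch:

```python
def ethnic_fractionalization(shares: list) -> float:
    # Probability that two randomly drawn individuals are NOT from the same
    # ethnic group: 1 - sum of squared group population shares.
    assert abs(sum(shares) - 1.0) < 1e-9, "group shares must sum to 1"
    return 1.0 - sum(p * p for p in shares)

print(ethnic_fractionalization([0.6, 0.3, 0.1]))  # 0.54
```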
Notice: Due to funding limitations, this data set was recently changed to a “Basic” Level of Service. Learn more about what this means for users and how you can share your story here: Level of Service Update for Data Products. The Sea Ice Index provides a quick look at Arctic- and Antarctic-wide changes in sea ice. It is a source for consistent, up-to-date sea ice extent and concentration images, in PNG format, and data values, in GeoTIFF and ASCII text files, from November 1978 to the present. Sea Ice Index images also depict trends and anomalies in ice cover calculated using a 30-year reference period of 1981 through 2010. The images and data are produced in a consistent way that makes the Index time series appropriate for use when looking at long-term trends in sea ice cover. Both monthly and daily products are available; however, monthly products are better for long-term trend analysis because errors in the daily product tend to be averaged out in the monthly product and because day-to-day variations are often the result of short-term weather.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are a total of 5 datasets:
sp500_data
sp500_newFeatures_data
sp500_lagged_data
nasdaq_lagged_data
hsi_lagged_data
The first dataset contains 34 years of data, from 1990 to 2023, for the S&P 500 stock index. It has been preprocessed and is used for training and testing. The second dataset transforms the initial dataset by adding new features derived from it. The third dataset is a different transformation of the first, consisting mostly of lagged features. The fourth dataset contains 10 years of data (2014-2023) for the NASDAQ index, following the same lagged-feature format as the third. The fifth dataset has 10 years of data (2014-2023) for the HSI stock index and also follows the same feature format as the third dataset. All five datasets were used in research to predict tomorrow's closing price based on today's financial features, as sketched below.
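A minimal sketch of the lagged-feature format described above (the file name, column names, and lag depth are illustrative assumptions):

```python
import pandas as pd

df = pd.read_csv("sp500_data.csv", parse_dates=["Date"], index_col="Date")
# Lag today's close by 1-5 trading days to form the feature columns.
for lag in range(1, 6):
    df[f"Close_lag_{lag}"] = df["Close"].shift(lag)
df["Target"] = df["Close"].shift(-1)  # tomorrow's closing price
df = df.dropna()
```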
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please use the MESINESP2 corpus (the second edition of the shared task), since it has a higher level of curation and quality and is organized by document type (scientific articles, patents, and clinical trials).
The MESINESP (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) Challenge was held in May-June 2020, and as a result of strong participation and the manual annotation of an evaluation dataset, two additional datasets are now released:
1) "all_annotations_withIDsv3.tsv" contains a tab-separated file with all manual annotations (both validated and non-validated) of the evaluation dataset prepared for the competition. It contains the following fields:
annotatorName: Human annotator ID
documentId: Document ID in the source database
decsCode: The DeCS code that was added or validated
timestamp: When the code was added
validated: Whether the code had been validated by another annotator at that point
SpanishTerm: The Spanish descriptor corresponding to the DeCS code
mesinespId: The internal document ID in the distributed evaluation file
dataset: Whether the record belongs to the evaluation or the test set
source: The database the document was taken from
Example:
annotatorName documentId decsCode timestamp validated SpanishTerm mesinespId dataset source
A7 biblio-1001069 6893 2020-01-17T11:27:07.000Z false caballos mesinesp-dev-671 dev LILACS
A7 biblio-1001069 4345 2020-01-17T11:27:12.000Z false perros mesinesp-dev-671 dev LILACS
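A minimal sketch for working with this file (the field names come from the list above):

```python
import pandas as pd

ann = pd.read_csv("all_annotations_withIDsv3.tsv", sep="\t")
validated = ann[ann["validated"]]  # keep only annotations validated by another annotator
codes_per_doc = ann.groupby("documentId")["decsCode"].nunique()
print(codes_per_doc.describe())
```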
2) A "Silver Standard" created from the 24 system runs submitted by 6 participating teams. It contains each of the submitted DeCS code for each document in the test set, as well as other information that can help ascertain reliability and source for anyone that wants to use this dataset to enrich their training data. It contains more that 5.8 million datapoints, and is structured as follows
SubmissionName: Alias of the team that submitted the run
REALdocumentId: The real id of the document
mesinespId: The mesinesp assigned id in the evaluation dataset
docSource: The source database
decsCode: The DeCS code assigned to it by the team's system
SpanishTerm: The Spanish descriptor of the DeCS code
MiF: The Micro-f1 scored by that system's run
MiR: The Micro-Recall scored by that system's run
MiP: The Micro-Precision scored by that system's run
Acc: The Accuracy scored by that system's run
consensus: The number of runs where that DeCS code was assigned to this document by the participating teams (max. is 24)
Example:
SubmissionName REALdocumentId mesinespId docSource decsCode SpanishTerm MiF MiR MiP Acc consensus
AN ibc-177565 mesinesp-evaluation-00001 IBECS 28567 riesgo 0.2054 0.1930 0.2196 0.1198 4
AN ibc-177565 mesinesp-evaluation-00001 IBECS 15335 trabajo 0.2054 0.1930 0.2196 0.1198 4
AN ibc-177565 mesinesp-evaluation-00001 IBECS 33182 conocimiento 0.2054 0.1930 0.2196 0.1198 7
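A sketch of using the consensus column to pull higher-confidence labels for training-data enrichment (the file name and threshold are illustrative assumptions):

```python
import pandas as pd

silver = pd.read_csv("silver_standard.tsv", sep="\t")
# Keep codes assigned by at least half of the 24 submitted runs.
reliable = silver[silver["consensus"] >= 12]
labels = reliable.groupby("REALdocumentId")["decsCode"].apply(set)
print(labels.head())
```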
For citation and a detailed description of the Challenge, please cite: Anastasios, Nentidis and Anastasia, Krithara and Konstantinos, Bougiatiotis and Martin, Krallinger and Carlos, Rodriguez-Penagos and Marta, Villegas and Georgios, Paliouras. Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering (2020). Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020). Thessaloniki, Greece, September 22--25
Citation
@inproceedings{durusan2019overview,
  title={Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering},
  author={Anastasios, Nentidis and Anastasia, Krithara and Konstantinos, Bougiatiotis and Martin, Krallinger and Carlos, Rodriguez-Penagos and Marta, Villegas and Georgios, Paliouras},
  booktitle={Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece, September 22--25, 2020, Proceedings},
  volume={12260},
  year={2020},
  organization={Springer}
}
Copyright (c) 2020 Secretaría de Estado de Digitalización e Inteligencia Artificial
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The AI Global Index Dataset is a comprehensive index that benchmarks 62 countries based on their level of AI investment, innovation, and implementation, including seven key indicators (human resources, infrastructure, operational environment, research, development, government strategy, commercialization) and general information by country (region, cluster, income group, political system).
2) Data Utilization
(1) The AI Global Index Dataset has the following characteristics:
• The dataset consists of a total of 13 columns, with 5 categorical variables (region, cluster, etc.) and 8 numerical variables (scores for each indicator), covering 62 countries.
• The seven key indicators are grouped into three pillars: implementation (human resources, infrastructure, operational environment), innovation (research, development), and investment (government strategy, commercialization), and assess each country's overall AI ecosystem capabilities along multiple dimensions.
(2) The AI Global Index Dataset can be used for:
• Global AI leadership pattern analysis: correlation analysis between the seven indicators can identify AI strengths and weaknesses by country and support group comparisons by region and income level.
• Machine learning-based predictive models: the data can serve data science education and applications such as predicting country-level index scores through regression or classifying AI development types through clustering, as sketched below.
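A sketch of the clustering use case mentioned above (the file name and the exact indicator column headers are illustrative assumptions):

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("ai_global_index.csv")
pillars = ["Human Resources", "Infrastructure", "Operational Environment",
           "Research", "Development", "Government Strategy", "Commercialization"]
# Group the 62 countries into illustrative AI-development types.
km = KMeans(n_clusters=4, n_init=10, random_state=0)
df["ai_cluster"] = km.fit_predict(df[pillars])
print(df.groupby("ai_cluster").size())
```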
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was compiled as part of a study on Barriers and Opportunities in the Discoverability and Indexing of Student-led Academic Journals. The list of student journals and their details was compiled from public sources. The list is used to identify the presence of Canadian student journals in Google Scholar as well as in select indexes and databases: DOAJ, Scopus, Web of Science, Medline, Erudit, ProQuest, and HeinOnline. Additionally, each journal's publishing platform is recorded for a correlational analysis against Google Scholar indexing results. For further details, see the README.