Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please use the MESINESP2 corpus (the second edition of the shared task), since it has a higher level of curation and quality and is organized by document type (scientific articles, patents and clinical trials).
Introduction
The Mesinesp (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) development set has a total of 750 records indexed manually by seven experienced medical literature indexers. Indexing is done using DeCS codes, a sort of Spanish equivalent to MeSH terms. Records were distributed so that each article was annotated by at least two different human indexers.
The data annotation process consisted of two steps:
1. Manual indexing step. DeCS codes were manually assigned to each record following the DeCS manual indexing guidelines.
2. Manual validation and consensus. The combined set of DeCS codes assigned by both indexers was manually reviewed and corrected.
Agreement between the annotations was then measured using the Jaccard index.
Records consist mainly of medical literature abstracts and titles from the IBECS and LILACS databases.
Zip structure
The zip file contains two different development sets:
- Official development set, which has the union of the annotations, with an agreement of macro = 0.6568 and micro = 0.6819. This set is composed of all the different (unique) DeCS codes that have been added by any annotator for each document; and
- Core-descriptors development set, which has the intersection of the annotations, with an agreement of macro = 1.0 and micro = 1.0. This set is composed of the common DeCS codes that have been added by two or more annotators for each document.
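As a rough illustration of how Jaccard agreement figures like these could be computed, here is a minimal sketch. The source does not define "macro" and "micro", so the definitions used here are assumptions: macro is taken as the mean of per-document Jaccard scores, and micro as the pooled intersection size divided by the pooled union size. The document IDs and codes are made up for the example.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; two empty sets count as full agreement."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agreement(ann_a: Dict[str, Set[str]], ann_b: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (macro, micro) Jaccard agreement over documents annotated by both indexers.

    Assumed definitions (not confirmed by the source):
      macro = mean of the per-document Jaccard scores
      micro = total intersection size / total union size across documents
    """
    doc_ids = sorted(ann_a.keys() & ann_b.keys())
    if not doc_ids:
        raise ValueError("no documents annotated by both indexers")
    macro = sum(jaccard(ann_a[d], ann_b[d]) for d in doc_ids) / len(doc_ids)
    inter = sum(len(ann_a[d] & ann_b[d]) for d in doc_ids)
    union = sum(len(ann_a[d] | ann_b[d]) for d in doc_ids)
    micro = inter / union if union else 1.0
    return macro, micro


# Hypothetical annotations from two indexers (IDs and codes are illustrative only).
indexer_1 = {"doc1": {"D001", "D002", "D003"}, "doc2": {"D010"}}
indexer_2 = {"doc1": {"D002", "D003", "D004"}, "doc2": {"D010"}}

macro, micro = agreement(indexer_1, indexer_2)

# Union and intersection of the codes for one document, mirroring the
# "official" and "core-descriptors" sets described above.
official_doc1 = indexer_1["doc1"] | indexer_2["doc1"]
core_doc1 = indexer_1["doc1"] & indexer_2["doc1"]
```

In this toy example the two indexers share two of four distinct codes on the first document and agree fully on the second, so the macro and micro figures differ.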
Corpus format
Each dataset is a JSON object with one single key named "articles", which contains a list of documents. In the raw file, each document occupies one line, and two additional lines (the first and the last) enclose the list. The expected data types are as follows:
{"articles":[ {"abstractText":str,"db":str,"decsCodes":list,"id":str,"journal":str,"title":str,"year":int}, ... ]}
To clarify, the order of appearance of the fields in each document is as follows (note that this example is pretty-printed for readability):
{
  "articles": [
    {
      "abstractText": "Content of the abstract",
      "db": "Name of the source database",
      "decsCodes": [
        "code1",
        "code2",
        "code3"
      ],
      "id": "Id of the document",
      "journal": "Name of the journal",
      "title": "Title of the document",
      "year": 2019
    }
  ]
}
Note: The fields "db", "journal" and "year" might be null.
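The format described above could be read along the following lines. This is a minimal sketch using only the Python standard library; the sample string and the `parse_articles` helper are illustrative, and the check that every field is present is an assumption based on the schema shown, not a documented guarantee.

```python
import json
from typing import Any, Dict, List

# Field names are taken from the corpus description; per the note above,
# "db", "journal" and "year" may be null.
EXPECTED_FIELDS = ("abstractText", "db", "decsCodes", "id", "journal", "title", "year")
NULLABLE_FIELDS = {"db", "journal", "year"}


def parse_articles(raw: str) -> List[Dict[str, Any]]:
    """Parse the corpus JSON and run basic sanity checks on each document."""
    articles = json.loads(raw)["articles"]
    for art in articles:
        for field in EXPECTED_FIELDS:
            if field not in art:
                raise KeyError(f"document {art.get('id')!r} is missing field {field!r}")
            if art[field] is None and field not in NULLABLE_FIELDS:
                raise ValueError(f"field {field!r} is null in document {art.get('id')!r}")
    return articles


# Hypothetical sample in the raw layout: one document per line,
# enclosed by a first and a last line.
sample = """{"articles":[
{"abstractText":"Content of the abstract","db":null,"decsCodes":["code1","code2"],"id":"doc-1","journal":null,"title":"Title of the document","year":2019}
]}"""

articles = parse_articles(sample)

# Map each document id to its set of DeCS codes for later comparison.
codes_by_id = {art["id"]: set(art["decsCodes"]) for art in articles}
```

To read an actual file, the same function can be applied to the file contents, for example `parse_articles(open(path, encoding="utf-8").read())`, where `path` points to one of the development sets.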
Copyright (c) 2020 Secretaría de Estado de Digitalización e Inteligencia Artificial
https://www.marketreportanalytics.com/privacy-policy
Unlock the power of real-time data! Explore the booming real-time index database market, projected to reach $32 billion by 2033. Discover key trends, leading companies (Elastic, AWS, Splunk), and regional insights in this comprehensive market analysis.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Name: City Happiness Index
Dataset Description:
This dataset and the related code are original work prepared entirely by Emirhan BULUT. The dataset includes key features and measurements from various cities around the world, focusing on factors that may affect the overall happiness score of each city. By analyzing these factors, we aim to gain insights into the living conditions and satisfaction of the population in urban environments.
The dataset consists of the following features:
With these features, the dataset aims to analyze and understand the relationship between various urban factors and the happiness of a city's population. The developed Deep Q-Network model, PIYAAI_2, is designed to learn from this data to provide accurate predictions in future scenarios. Using Reinforcement Learning, the model is expected to improve its performance over time as it learns from new data and adapts to changes in the environment.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data contains Index match, index match Advance
This dataset contains the full historical record of the S&P 500 index (^GSPC), downloaded via the Yahoo Finance API using the yfinance Python library.
The dataset includes:
- Date: Trading date
- Open, High, Low, Close: Daily price levels
- Volume: Daily trading volume
Period covered: Dec 30, 1927 – Aug 31, 2025
Frequency: Daily
⚠️ Disclaimer: This dataset is provided for educational and research purposes only. Redistribution or commercial use may be subject to Yahoo Finance’s Terms of Service
Data sourced from Yahoo Finance. Provided for educational and research purposes only. Redistribution may be restricted.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dallas Fed Manufacturing Shipments Index in the United States increased to 15.10 points in November from 5.80 points in October of 2025. This dataset includes a chart with historical data for the United States Dallas Fed Manufacturing Shipments Index.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Indexing Magic Cards is a dataset for object detection tasks - it contains Magic Cards annotations for 297 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
A monthly measure of the volume of services performed by the for-hire transportation sector. The index covers the activities of local mass transit, intercity passenger rail, and passenger air transportation.
An Environmental Quality Index (EQI) for all counties in the United States for the period 2000-2005 was developed, incorporating data from five environmental domains: air, water, land, built, and socio-demographic. The EQI was developed in four parts: domain identification; data source identification and review; variable construction; and data reduction using principal components analysis (PCA). The methods applied provide a reproducible approach that relies almost exclusively on publicly available data sources. The primary goal in creating the EQI is to use it as a composite environmental indicator for research on human health. A series of peer-reviewed manuscripts has used the EQI to examine health outcomes.
This dataset is not publicly accessible because: This series of papers are considered Human health research - not to be loaded onto ScienceHub. It can be accessed through the following means: The EQI data can be accessed at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: EQI data, metadata, formats, and data dictionary all available at website.
This dataset is associated with the following publications:
- Gray, C., L. Messer, K. Rappazzo, J. Jagai, S. Grabich, and D. Lobdell. The association between physical inactivity and obesity is modified by five domains of environmental quality in U.S. adults: A cross-sectional study. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(8): e0203301, (2018).
- Patel, A., J. Jagai, L. Messer, C. Gray, K. Rappazzo, S. Deflorio-Barker, and D. Lobdell. Associations between environmental quality and infant mortality in the United States, 2000-2005. Archives of Public Health. BioMed Central Ltd, London, UK, 76(60): 1, (2018).
- Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. Environmental Research. Elsevier B.V., Amsterdam, NETHERLANDS, 166: 529-536, (2018).
A coastal vulnerability index (CVI) was used to map the relative vulnerability of the coast to future sea-level rise within Channel Islands National Park in California. The CVI ranks the following in terms of their physical contribution to sea-level rise-related coastal change: geomorphology, regional coastal slope, rate of relative sea-level rise, historical shoreline change rates, mean tidal range and mean significant wave height. The rankings for each input variable were combined and an index value calculated for 1-minute grid cells covering the park. The CVI highlights those regions where the physical effects of sea-level rise might be the greatest. This approach combines the coastal system's susceptibility to change with its natural ability to adapt to changing environmental conditions, yielding a quantitative, although relative, measure of the park's natural vulnerability to the effects of sea-level rise. The CVI and the data contained within this dataset provide an objective technique for evaluation and long-term planning by scientists and park managers.
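The description does not say how the six rankings were combined. One commonly used form in coastal vulnerability assessments takes the square root of the product of the ranks divided by the number of variables; the sketch below assumes that form and a 1 to 5 ranking scale, both of which are assumptions rather than details taken from this dataset. The rank values are invented for illustration.

```python
import math
from typing import Mapping


def coastal_vulnerability_index(ranks: Mapping[str, int]) -> float:
    """Combine per-variable ranks as sqrt(product of ranks / number of variables).

    Assumed formula and scale (1 = very low ... 5 = very high); neither is
    confirmed by the dataset description.
    """
    if not ranks:
        raise ValueError("at least one ranked variable is required")
    for name, rank in ranks.items():
        if not 1 <= rank <= 5:
            raise ValueError(f"rank for {name!r} must be between 1 and 5")
    return math.sqrt(math.prod(ranks.values()) / len(ranks))


# Hypothetical ranks for one grid cell, one entry per variable named above.
cell_ranks = {
    "geomorphology": 3,
    "regional_coastal_slope": 4,
    "relative_sea_level_rise_rate": 2,
    "shoreline_change_rate": 3,
    "mean_tidal_range": 1,
    "mean_significant_wave_height": 5,
}

cvi_value = coastal_vulnerability_index(cell_ranks)
```

Because the ranks are multiplied, a single high-ranked variable raises the index for the whole cell, which is why the result is a relative rather than an absolute measure.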
This dataset shows the concentration of cyanobacteria (cells/ml) in freshwater bodies and estuaries of Ohio and Florida, derived from 300x300 meter MEdium Resolution Imaging Spectrometer (MERIS) satellite imagery. This dataset was produced through partnership with the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), the United States Geological Survey (USGS), and the United States Environmental Protection Agency (USEPA). This cyanobacteria dataset was derived using the European Space Agency (ESA) Envisat satellite and MERIS instrument. MERIS is a 68.5 degree field-of-view nadir-pointing imaging spectrometer which measures the solar radiation reflected by the Earth in 15 spectral bands (visible and near-infrared). MERIS imagery was used to identify long-wavelength spectral bands (from the red through near-infrared portion of the spectrum) to locate algal blooms within freshwaters and estuaries of the continental United States.
This dataset is associated with the following publication: Urquhart, E., B. Schaeffer, R. Stumpf, K. Loftin, and J. Wedell. A method for examining temporal changes in cyanobacterial harmful algal bloom spatial extent using satellite remote sensing. Harmful Algae. Elsevier B.V., Amsterdam, NETHERLANDS, 67: 144-152, (2017).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are a total of 5 datasets:
- sp500_data
- sp500_newFeatures_data
- sp500_lagged_data
- nasdaq_lagged_data
- hsi_lagged_data
The first dataset contains 34 years of data, from 1990 to 2023, for the S&P 500 stock index. This dataset has been preprocessed and is used for training and testing. The second dataset extends the first with new features derived from it. The third dataset is a different transformation of the first, consisting mostly of lagged features. The fourth dataset contains 10 years of data (2014-2023) for the NASDAQ index, in the same lagged-feature format as the third dataset. The fifth dataset contains 10 years of data (2014-2023) for the HSI stock index, also in the same format as the third dataset.
All five datasets were used in a research project to predict tomorrow's closing price from today's financial features.
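To make the idea of lagged features concrete, here is a minimal sketch of how rows pairing today's and earlier closing prices with tomorrow's close might be built. The column names (`close_lag_k`, `target_next_close`), the number of lags, and the price values are all hypothetical; the actual features in the five files are not described in the source.

```python
from typing import Dict, List


def build_lagged_rows(closes: List[float], n_lags: int = 3) -> List[Dict[str, float]]:
    """Build one row per day with the last `n_lags` closes and the next day's close.

    close_lag_0 is "today", close_lag_1 is the previous trading day, and so on.
    The target is the following day's close, so the final day produces no row.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for t in range(n_lags - 1, len(closes) - 1):
        row = {f"close_lag_{k}": closes[t - k] for k in range(n_lags)}
        row["target_next_close"] = closes[t + 1]
        rows.append(row)
    return rows


# Hypothetical daily closing prices, oldest first.
closes = [100.0, 101.0, 103.0, 102.0, 105.0]
rows = build_lagged_rows(closes, n_lags=3)
```

Each row uses only information available up to "today", which keeps the target (tomorrow's close) out of the inputs.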
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mohit Gupta
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CFNAI Employment Index in the United States increased to -0.07 points in August from -0.10 points in July of 2025. This dataset includes a chart with historical data for the United States CFNAI Employment Index.
A FAISS index for Wikipedia documents chunked from an early August 2023 Wikipedia dump, with FAISS doc IDs matching the doc IDs in these two pre-chunked databases:
https://www.kaggle.com/datasets/donkeys/wikipedia-202308-64tk/data
https://www.kaggle.com/datasets/donkeys/wikipedia-202308-chunks-256tk-sqlite
See the accompanying notebook for example code. The index can be used to look up the entries most similar to a given query, and the returned ID values can be used to retrieve the closest matching documents along with their chunks, which can in turn be used for finer-grained similarity search.
The embedding model used to build the index: https://www.kaggle.com/datasets/donkeys/bge-small-en/data
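The lookup flow described above (search for nearest vectors, then use the returned IDs to fetch text from a database) can be sketched without the real index. The example below deliberately does not use FAISS: a brute-force cosine similarity search stands in for it, and an in-memory SQLite table with a hypothetical `chunks(doc_id, content)` schema stands in for the pre-chunked databases, whose actual schema is not described here. The vectors and texts are toy data.

```python
import math
import sqlite3
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Brute-force stand-in for an index search: return (doc_id, score) pairs.

    The position of a vector in `vectors` plays the role of the doc ID.
    """
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings; list position i corresponds to doc_id i in the table below.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

# Hypothetical chunk store keyed by the same IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (doc_id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [(0, "chunk about cats"), (1, "chunk about dogs"), (2, "chunk about pets")],
)

# Step 1: similarity search returns IDs. Step 2: IDs retrieve the text.
hits = top_k([1.0, 0.1], vectors, k=2)
ids = [doc_id for doc_id, _ in hits]
texts = [
    conn.execute("SELECT content FROM chunks WHERE doc_id = ?", (doc_id,)).fetchone()[0]
    for doc_id in ids
]
```

With the real data, the search step would be performed by the FAISS index and the query vector would come from the embedding model linked above; the second step, mapping IDs back to stored chunks, keeps the same shape.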
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The AI Global Index Dataset is a comprehensive index that benchmarks 62 countries on their level of AI investment, innovation, and implementation. It includes seven key indicators (human resources, infrastructure, operational environment, research, development, government strategy, commercialization) and general information by country (region, cluster, income group, political system).
2) Data Utilization
(1) Characteristics of the AI Global Index Dataset:
• The dataset consists of 13 columns, with 5 categorical variables (regions, clusters, etc.) and 8 numerical variables (scores for each indicator), covering 62 countries.
• The seven key indicators are grouped into three pillars: implementation (human resources, infrastructure, operational environment), innovation (R&D), and investment (government strategy, commercialization). Together they assess each country's overall AI ecosystem capabilities along multiple dimensions.
(2) Uses of the AI Global Index Dataset:
• Global AI leadership pattern analysis: Correlation analysis between the seven indicators can identify AI strengths and weaknesses by country and support group comparisons by region and income level.
• Machine learning-based predictive models: The dataset can be used for data science education and application, such as predicting country-specific index values through regression analysis or classifying AI development types through clustering.
This data set contains vector polygons representing the boundaries of all hardcopy cartographic products produced as part of the Environmental Sensitivity Index (ESI) for Alabama. This data set comprises a portion of the ESI data for Alabama. ESI data characterize the marine and coastal environments and wildlife by their sensitivity to spilled oil. The ESI data include information for three main components: shoreline habitats, sensitive biological resources, and human-use resources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data used for the development of the Index Index model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes.
This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in green, red, NIR, SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article.
Resources in this dataset:
Resource Title: README. File Name: LAI_train_samples_CONUS_README.txt. Resource Description: Description and metadata of the main dataset. Resource Software Recommended: Notepad, url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab
Resource Title: LAI_training_samples_CONUS. File Name: LAI_train_samples_CONUS_v0.1.1.csv. Resource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tier 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel.
Contact: Yanghui Kang (kangyanghui@gmail.com)
Column description
UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
Landsat_ID: Landsat image ID
Date: Landsat image date in "YYYYMMDD"
Latitude: Latitude (WGS84) of the MODIS LAI pixel center
Longitude: Longitude (WGS84) of the MODIS LAI pixel center
MODIS_LAI: MODIS LAI value in "m2/m2"
MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
MODIS_LAI_sat: 0 - MODIS Main (RT) method used without saturation; 1 - MODIS Main (RT) method used with saturation
NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
NLCD_frequency: Percentage of the area covered by the majority class from NLCD
Biome: Biome type code mapped from NLCD (see below for more information)
Blue: Landsat surface reflectance in the blue band
Green: Landsat surface reflectance in the green band
Red: Landsat surface reflectance in the red band
Nir: Landsat surface reflectance in the near infrared band
Swir1: Landsat surface reflectance in the shortwave infrared 1 band
Swir2: Landsat surface reflectance in the shortwave infrared 2 band
Sun_zenith: Solar zenith angle from the Landsat image metadata. This is a scene-level value.
Sun_azimuth: Solar azimuth angle from the Landsat image metadata. This is a scene-level value.
NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
GCI: Green Chlorophyll Index = Nir/Green - 1
Biome code
1 - Deciduous Forest
2 - Evergreen Forest
3 - Mixed Forest
4 - Shrubland
5 - Grassland/Pasture
6 - Cropland
7 - Woody Wetland
8 - Herbaceous Wetland
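Two of the index columns listed above can be recomputed from the band columns, which gives a quick consistency check on a loaded row. The sketch below uses the GCI formula stated in the column description and the standard NDVI definition, (Nir - Red) / (Nir + Red); EVI and NDWI are left out because the exact variants used for this dataset are not stated. The sample row values are invented, and the biome lookup simply restates the code table above.

```python
from typing import Dict

# Biome codes as listed in the "Biome code" table.
BIOME_NAMES: Dict[int, str] = {
    1: "Deciduous Forest",
    2: "Evergreen Forest",
    3: "Mixed Forest",
    4: "Shrubland",
    5: "Grassland/Pasture",
    6: "Cropland",
    7: "Woody Wetland",
    8: "Herbaceous Wetland",
}


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (Nir - Red) / (Nir + Red)."""
    total = nir + red
    if total == 0:
        raise ValueError("Nir + Red must be non-zero")
    return (nir - red) / total


def gci(nir: float, green: float) -> float:
    """Green Chlorophyll Index as given in the column description: Nir/Green - 1."""
    if green == 0:
        raise ValueError("Green must be non-zero")
    return nir / green - 1.0


# Hypothetical row using the column names from the description (values are made up).
row = {"Biome": 6, "Green": 0.08, "Red": 0.05, "Nir": 0.40}

row_ndvi = ndvi(row["Nir"], row["Red"])
row_gci = gci(row["Nir"], row["Green"])
row_biome = BIOME_NAMES[row["Biome"]]
```

Both indices are ratios of band values, so they give the same result whether reflectance is stored as a fraction or as a scaled integer, as long as all bands share the same scale.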
Reference Dataset: All data was accessed through Google Earth Engine.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment.
MODIS Version 6 Leaf Area Index/FPAR 8-Day L4 Global 500m
Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006
Landsat 5/7/8 Collection 1 Surface Reflectance
Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey.
Masek, J.G., Vermote, E.F., Saleous N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030.
Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008.
National Land Cover Dataset (NLCD)
Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George, A new generation of the United States National Land Cover Database—Requirements, research priorities, design, and implementation strategies: ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123, at https://doi.org/10.1016/j.isprsjprs.2018.09.006
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CFNAI Sales, Orders and Inventories Index in the United States increased to 0 percent in August from -0.02 percent in July of 2025. This dataset includes a chart with historical data for the United States CFNAI Sales, Orders and Inventories Index.