https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/
CQADupstack-Webmasters-PL An MTEB dataset Massive Text Embedding Benchmark
CQADupStack: A Stack Exchange Question Duplicate Pairs Dataset
Task category t2t
Domains Written, Web
Reference https://huggingface.co/datasets/clarin-knext/cqadupstack-webmasters-pl
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb
task = mteb.get_tasks(["CQADupstack-Webmasters-PL"]) evaluator =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CQADupstack-Webmasters-PL.
This data contains information about people involved in a crash and if any injuries were sustained. This dataset should be used in combination with the traffic Crash and Vehicle dataset. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been an occupant in a motor vehicle, but may have been a pedestrian, bicyclist, or using another non-motor vehicle mode of transportation. Injuries reported are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle dataset using the “CRASH_RECORD_ID” field. A vehicle can have multiple occupants and hence have a one to many relationship between Vehicle and Person dataset. However, a pedestrian is a “unit” by itself and have a one to one relationship between the Vehicle and Person table. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in SR1050 form. The current version of the SR1050 instructions manual with detailed information on each data elements is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
HotpotQA-PL An MTEB dataset Massive Text Embedding Benchmark
HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems.
Task category t2t
Domains Web, Written
Reference https://hotpotqa.github.io/
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb
task =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/HotpotQA-PL.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Windsor Place by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Windsor Place across both sexes and to determine which sex constitutes the majority.
Key observations
There is a majority of female population, with 61.19% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender :
Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Windsor Place Population by Race & Ethnicity. You can refer the same here
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Langdon Place population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Langdon Place across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2022, the population of Langdon Place was 865, a 0.23% decrease year-by-year from 2021. Previously, in 2021, Langdon Place population was 867, a decline of 0.57% compared to a population of 872 in 2020. Over the last 20 plus years, between 2000 and 2022, population of Langdon Place decreased by 123. In this period, the peak population was 1,091 in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Langdon Place Population by Year. You can refer the same here
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Retail Analytics: Store owners can use the model to track the number of customers visiting their stores during different times of the day or seasons, which can help in workforce and resource allocation.
Crowd Management: Event organizers or public authorities can utilize the model to monitor crowd sizes at concerts, festivals, public gatherings or protests, aiding in security and emergency planning.
Smart Transportation: The model can be integrated into public transit systems to count the number of passengers in buses or trains, providing real-time occupancy information and assisting in transportation planning.
Health and Safety Compliance: During times of pandemics or emergencies, the model can be used to count the number of people in a location, ensuring compliance with restrictions on gathering sizes.
Building Security: The model can be adopted in security systems to track how many people enter and leave a building or a particular area, providing useful data for access control.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L. Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E., Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES PLACE OF BIRTH - DP02 Universe - Total population Survey-Program - American Community Survey 5-year estimates Years - 2020, 2021, 2022 People not reporting a place of birth were assigned the state or country of birth of another family member, or were allocated the response of another individual with similar characteristics. People born outside the United States were asked to report their place of birth according to current international boundaries. Since numerous changes in boundaries of foreign countries have occurred in the last century, some people may have reported their place of birth in terms of boundaries that existed at the time of their birth or emigration, or in accordance with their own national preference.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
AmsterTime dataset offers a collection of 2,500 well-curated images matching the same scene from a street view matched to historical archival image data from Amsterdam city. The image pairs capture the same place with different cameras, viewpoints, and appearances. Unlike existing benchmark datasets, AmsterTime is directly crowdsourced in a GIS navigation platform (Mapillary). In turn, all the matching pairs are verified by a human expert to verify the correct matches and evaluate the human competence in the Visual Place Recognition (VPR) task for further references.
The properties of the dataset are summarized as:
Two sub-tasks are created on the dataset:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The number of employed persons in Netherlands increased to 9833 Thousand in May of 2025 from 9824 Thousand in April of 2025. This dataset provides the latest reported value for - Netherlands Employed Persons - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakes get additional ownership in the token overtime via network fees, newly minted tokens, or other such reward mechanisms.
Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database
A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.
Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.
This Dataset is a collection of records of 3000+ Different Cryptocurrencies. * Top 395+ from 2021 * Top 3000+ from 2023
https://i.imgur.com/qGVJaHl.png" alt="">
This Data is collected from: https://finance.yahoo.com/. If you want to learn more, you can visit the Website.
Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/
This dataset provides information about the number of programs that have received Agency Review funding, how many of those programs had defined measurable outcome goals (DMOG) specified in the agency's funding request applications, and how many programs achieved their DMOG.The Agency Review process was developed to distribute human services funds to non-profit agencies. Agency Review funds come from the City of Tempe General Revenue Fund, Federal Community Development Block Grants, and water utility customer donations through Tempe’s Help to Others.This page provides data for the Human Services Grant performance measure.Identifies the people served as a result of the Agency Review grant funding to non-profit agencies.The performance measure dashboard is available at 3.10 Human Services Grants.Additional InformationSource: e-CImpactContact: Octavia HarrisContact E-Mail: octavia_harris@tempe.govData Source Type: ExcelPreparation Method: Data downloaded from e-CImpact, then compiled in a spreadsheet to establish yes/no fields for aggregate calculations by population servedPublish Frequency: AnnualPublish Method: ManualData Dictionary
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Bilingual (EN-PL) COVID-19-related corpus acquired from the website (https://www.voltairenet.org/) of Voltaire Network (1st May 2020).
SCIDOCS-PL An MTEB dataset Massive Text Embedding Benchmark
SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation.
Task category t2t
Domains None
Reference https://allenai.org/data/scidocs
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb
task = mteb.get_tasks(["SCIDOCS-PL"])… See the full description on the dataset page: https://huggingface.co/datasets/mteb/SCIDOCS-PL.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bilingual (EN-PL) corpus acquired from website (https://ec.europa.eu/*coronavirus-response) of the EU portal (20th May 2020)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Elmwood Place by race. It includes the population of Elmwood Place across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Elmwood Place across relevant racial categories.
Key observations
The percent distribution of Elmwood Place population by race (across all racial categories recognized by the U.S. Census Bureau): 87.33% are white, 4.71% are Black or African American, 0.95% are some other race and 7.01% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Elmwood Place Population by Race & Ethnicity. You can refer the same here
NQ-PL An MTEB dataset Massive Text Embedding Benchmark
Natural Questions: A Benchmark for Question Answering Research
Task category t2t
Domains None
Reference https://ai.google.com/research/NaturalQuestions/
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb
task = mteb.get_tasks(["NQ-PL"]) evaluator = mteb.MTEB(task)
model = mteb.get_model(YOUR_MODEL) evaluator.run(model)
To learn… See the full description on the dataset page: https://huggingface.co/datasets/mteb/NQ-PL.
Mortality Rates for Lake County, Illinois. Explanation of field attributes: Average Age of Death – The average age at which a people in the given zip code die. Cancer Deaths – Cancer deaths refers to individuals who have died of cancer as the underlying cause. This is a rate per 100,000. Heart Disease Related Deaths – Heart Disease Related Deaths refers to individuals who have died of heart disease as the underlying cause. This is a rate per 100,000. COPD Related Deaths – COPD Related Deaths refers to individuals who have died of chronic obstructive pulmonary disease (COPD) as the underlying cause. This is a rate per 100,000.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset accompanies the following publication, please cite this publication if you use this dataset:
Fischer, T. and Milford, M., 2022. How Many Events Do You Need? Event-Based Visual Place Recognition Using Sparse But Varying Pixels. IEEE Robotics and Automation Letters, 7(4), pp.12275-12282.
@article{FischerRAL2022ICRA2023,
title={How Many Events do You Need? Event-based Visual Place Recognition Using Sparse But Varying Pixels},
author={Tobias Fischer and Michael Milford},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={4},
pages={12275--12282},
year={2022},
doi={10.1109/LRA.2022.3216226},
}
The dataset contains seven sequences of recordings. For each recording, the following files are made available:
A rosbag (*.bag) file with the following contents:
/dvs/events (type: dvs_msgs/EventArray) with the event stream, see https://github.com/uzh-rpg/rpg_dvs_ros
/dvs/camera_info (type: sensor_msgs/CameraInfo) with the camera info of the DAVIS frame camera
/dvs/image_raw (type: sensor_msgs/Image) with the DAVIS frame camera images
/dvs/imu (sensor_msgs/Imu) with the IMU data of the event camera
A parquet file that can be read with pandas, which is converted from the bag file, with a denoising algorithm applied.
A zip file containing the DAVIS frame camera images. Once extracted, the images have the timestamp as their filename.
Please see the associated code repository (https://github.com/Tobias-Fischer/sparse-event-vpr) for manually annotated ground-truth information.
The dataset contains weekly working hours for various European countries spanning from 2014 to 2023.
Foto von Milad Fakurian auf Unsplash
https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/
CQADupstack-Webmasters-PL An MTEB dataset Massive Text Embedding Benchmark
CQADupStack: A Stack Exchange Question Duplicate Pairs Dataset
Task category t2t
Domains Written, Web
Reference https://huggingface.co/datasets/clarin-knext/cqadupstack-webmasters-pl
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import mteb
task = mteb.get_tasks(["CQADupstack-Webmasters-PL"]) evaluator =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CQADupstack-Webmasters-PL.