The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents the quantitative raw data that was collected under the H2020 RRI2SCALE project for D1.4 – “Large scale regional citizen surveys report”. The dataset includes the answers provided by almost 8,000 participants from 4 pilot European regions (Kriti, Vestland, Galicia, and Overijssel) regarding the general public's views, concerns, and moral issues about the current and future trajectories of their RTD&I ecosystem. The original survey questionnaire was created by White Research SRL and disseminated to the regions through supporting pilot partners. Data collection took place from June 2020 to September 2020 through 4 different waves – one for each region. Following a consortium vote during the kick-off meeting, it was decided that, rather than using resource-intensive methods that would render data collection unduly expensive, the quotas would be filled through online panels run by survey companies in each region. For the statistical analysis of the data and the conclusions drawn from it, see the "Large scale regional citizen surveys report" (D1.4).
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Version 162 of the dataset. NOTES: Data for 3/15 - 3/18 was not extracted due to unexpected and unannounced downtime of our university infrastructure. We will try to backfill those days by the next release. FUTURE CHANGES: Due to the imminent paywalling of Twitter's API access, this might be the last full update of this dataset. If API access is not blocked, we will stop updates for this dataset with release 165, a bit more than 3 years after our initial release. It's been a joy seeing all the work that uses this resource, and we are glad that so many found it useful.
The dataset files full_dataset.tsv.gz and full_dataset_clean.tsv.gz have been split into 1 GB parts using the Linux utility split, so make sure to join the parts before unzipping. We had to make this change because we had huge issues uploading files larger than 2 GB (hence the delay in the dataset releases). The peer-reviewed publication for this dataset has now been published in Epidemiologia, an MDPI journal, and can be accessed here: https://doi.org/10.3390/epidemiologia2030024. Please cite this when using the dataset.
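As a convenience, here is a minimal Python sketch of reassembling and decompressing the parts; it assumes the parts carry split's default suffixes (full_dataset.tsv.gz.aa, .ab, ...), so adjust the glob pattern to match the actual release file names:

```python
import glob
import gzip
import shutil

# Join the 1 GB parts back into a single archive. The ".aa", ".ab", ...
# suffixes are split's defaults and an assumption here; list the release
# files first and adjust the pattern if they differ.
with open("full_dataset.tsv.gz", "wb") as joined:
    for part in sorted(glob.glob("full_dataset.tsv.gz.*")):
        with open(part, "rb") as chunk:
            shutil.copyfileobj(chunk, joined)

# Decompress the joined archive to recover the TSV.
with gzip.open("full_dataset.tsv.gz", "rb") as src:
    with open("full_dataset.tsv", "wb") as dst:
        shutil.copyfileobj(src, dst)
```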
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (1,395,222,801 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (361,748,721 unique tweets). There are several practical reasons for us to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files. For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/
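For a quick first look at the companion files, a minimal pandas sketch; the exact column layouts are not documented in this description, so inspect the first rows before relying on specific column names:

```python
import pandas as pd

# Column layouts are not documented here, so peek at the head first.
terms = pd.read_csv("frequent_terms.csv")
print(terms.head(10))

daily = pd.read_csv("full_dataset-statistics.tsv", sep="\t")
print(daily.head())
```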
More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter, together with our pre-print about the dataset (https://arxiv.org/abs/2004.03688).
As always, the tweets distributed here are only tweet identifiers (with date and time added), since Twitter's terms and conditions allow redistribution of Twitter data in this form ONLY for research purposes. They need to be hydrated before they can be used.
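One common hydration route is the twarc library; below is a minimal sketch, assuming twarc v1, valid API credentials, and that the identifiers have been extracted to ids.txt (one ID per line):

```python
import json
from twarc import Twarc  # pip install twarc

# Placeholder credentials: substitute your own Twitter API keys.
t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# ids.txt is assumed to hold one tweet identifier per line, taken from
# the identifier column of full_dataset.tsv.
with open("ids.txt") as ids, open("hydrated.jsonl", "w") as out:
    for tweet in t.hydrate(ids):  # yields full tweet JSON objects
        out.write(json.dumps(tweet) + "\n")
```

Note that hydration is subject to the same API access changes mentioned above.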
This is a test collection for passage and document retrieval, produced in the TREC 2023 Deep Learning track. The Deep Learning Track studies information retrieval in a large-training-data regime: the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Certain machine learning methods, such as those based on deep learning, are known to require very large datasets for training. The lack of such large-scale datasets has been a limitation in developing these methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in previous years aimed to provide large-scale datasets to TREC and to create a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

As in previous years, one of the main goals of the track in 2023 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Large cross-sectional household surveys are common for measuring indicators of neglected tropical disease control programs. As an alternative to standard paper-based data collection, we utilized novel paperless technology to collect data electronically from over 12,000 households in Ethiopia.

Methodology: We conducted a needs assessment to design an Android-based electronic data collection and management system. We then evaluated the system by reporting results of a pilot trial and comparisons of two large-scale surveys, one with traditional paper questionnaires and the other with tablet computers, including accuracy, person-time days, and costs incurred.

Principal Findings: The electronic data collection system met core functions in household surveys and overcame constraints identified in the needs assessment. Pilot data recorders took 264 sec (standard deviation (SD) 152 sec) and 260 sec (SD 122 sec) per person registered to complete household surveys using paper and tablets, respectively (P = 0.77). Data recorders felt a lack of connection with the interviewee during the first days of using electronic devices, but preferred to collect data electronically in future surveys. Electronic data collection saved time by giving results immediately, obviating the need for double data entry and cross-correcting. The proportion of identified data entry errors in disease classification did not differ between the two data collection methods. Geographic coordinates collected using the tablets were more accurate than coordinates transcribed on a paper form. The cost of the equipment required for electronic data collection was approximately the same as the cost incurred for data entry of the paper questionnaires, and repeated use of the electronic equipment may increase cost savings.

Conclusions/Significance: Conducting a needs assessment and pilot testing allowed the design to specifically match the functionality required for surveys. Electronic data collection using an Android-based technology was suitable for a large-scale health survey, saved time, provided more accurate geo-coordinates, and was preferred by recorders over standard paper-based questionnaires.
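For readers who want to reproduce this kind of comparison from summary statistics alone, here is a sketch of a two-sample t-test with SciPy; the per-arm sample sizes below are placeholders, since the abstract does not report them:

```python
from scipy.stats import ttest_ind_from_stats

# Seconds per registered person (mean, SD) for paper vs. tablet surveys.
# The sample sizes are assumptions for illustration only.
stat, p = ttest_ind_from_stats(
    mean1=264, std1=152, nobs1=100,  # paper questionnaires (n assumed)
    mean2=260, std2=122, nobs2=100,  # tablet computers (n assumed)
    equal_var=False,                 # Welch's t-test
)
print(f"t = {stat:.2f}, p = {p:.2f}")
```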
Index to BGS geological map 'Standards': manuscript and published maps for Great Britain produced by the Survey on County Series (1:10560) and National Grid (1:10560 & 1:10000) Ordnance Survey base maps. 'Standards' are the best interpretation of the geology at the time they were produced. The Oracle index was set up in 1988; current holdings are over 41,000 maps. There are entries for all registered maps, but not all fields are complete on all entries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
See dataset details: https://github.com/logpai/Loghub-2.0
The datasets are freely available for research or academic work, subject to the following condition: For any usage or distribution of the LogPub datasets, please refer to the LogPub repository URL (https://github.com/logpai/Loghub-2.0) and cite the LogPub paper (A Large-scale Evaluation for Log Parsing Techniques: How Far are We?) where applicable.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of the network parameter data.
Count of the total number of Large-Scale Loach (Paramisgurnus dabryanus) observed in a two-hour period.
This data consists of 12 semi-structured, participant-led, qualitative biographical interviews conducted during the scoping phase of the ESRC UK Voices Pilot Project. The aim of the interviews was to test a Topic Guide for producing a large-scale, general-purpose qualitative dataset that provides insights into people's experiences and life in the UK. It is linked to an additional dataset of 38 interviews conducted during the larger pilot.
UK Voices developed a methodology for building a general-use qualitative interview dataset to provide insights into how the UK population experiences and navigates accelerated social changes, including climate change, political polarisation, and inequality.
The project piloted methods for large-scale qualitative data collection and analysis to enhance the UK's social science research infrastructure. Funded by the ESRC and running from October 2024 to June 2025, the project was organised into two main work packages. The data in this deposit was collected during work package 1, which first tested qualitative interview techniques for gathering in-depth data from a broad sample of the population, refining methods for large-scale qualitative research. This built on existing projects, such as the American Voices Project, to develop a methodology tailored to the UK context.
The second work package explored the use of generative AI and Natural Language Processing (NLP) tools to streamline the analysis of the extensive qualitative data collected. By leveraging these tools to assist in identifying and analysing key sections of text, this phase addressed some of the challenges regarding scaling qualitative research, which is often limited to small numbers of participants. The project ultimately aimed to create a flexible research platform that merges qualitative methods with innovative software tools, enabling more efficient analysis and broader exploration of critical social issues. The findings from this pilot have been shared with the wider social science community through reports, workshops, and conferences, laying the groundwork for future large-scale and cross-national qualitative research.
The UK Voices Pilot Project explored how large-scale qualitative data could be collected and analysed to deepen our understanding of how people in the UK are experiencing and responding to rapid social change. It sought to evaluate both the feasibility of developing a representative qualitative resource and the potential of using Artificial Intelligence (AI), specifically Natural Language Processing (NLP) and Large Language Models (LLMs), to support qualitative data analysis. The project was structured around two interconnected work packages.

Work Package 1 (WP1) focused on testing a biographical interview protocol and exploring how such an approach could be scaled nationally. During a two-stage process, the first led by LSE and the second by the National Centre for Social Research (NatCen), the team conducted 51 interviews across a diverse sample. The flexible biographical approach, beginning with the question “Can you tell me your life story?”, proved successful in eliciting detailed, reflective narratives. However, the pilot also revealed challenges relating to recruitment and response rates, particularly during the panel-based phase led by NatCen. These findings are crucial for informing future sampling strategies, interviewer training, and fieldwork planning. Despite recruitment difficulties, the interviews produced rich data relevant to a wide range of social science research questions, demonstrating that this form of data collection is both meaningful and achievable at scale with appropriate design and resourcing.

Work Package 2 (WP2) focused on developing and evaluating an interface designed to support researchers working with large corpora of qualitative interview data. The tool, QualQuest, was iteratively developed and tested using two datasets: 12 UK Voices scoping interviews and 73 transcripts from the Welfare at a Social Distance (WASD) Project. Early attempts to integrate LLM-based summarisation were found to be unreliable. A Retrieval Augmented Generation (RAG) architecture was subsequently adopted, enabling the more accurate retrieval of thematically relevant direct quotations in response to natural language queries (a sketch of this retrieval step appears below). Structured testing showed that the final version performed well in terms of recall but less so in terms of precision. QualQuest proved especially useful in helping researchers identify relevant transcripts and thematic content quickly, though false positives remained an area for future refinement.

Outreach and knowledge exchange activities included convening regular meetings with an international advisory board (IAB), which has now evolved into a global network of researchers working on large-scale qualitative data projects. We also presented our work at academic events and held workshops with researchers from a range of fields and with...
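For illustration, a minimal sketch of the retrieval step in a RAG pipeline of this kind, assuming the sentence-transformers library and treating transcript excerpts as retrieval units; QualQuest's actual implementation is not published here, so the names and data below are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Stand-in passages for interview transcript excerpts.
passages = [
    "I worry about how the cost of living has changed since the pandemic.",
    "Moving to the coast completely changed how I think about community.",
    "Politics feels much more divided than when I was growing up.",
]
query = "experiences of rising living costs"

model = SentenceTransformer("all-MiniLM-L6-v2")
passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

# With normalized embeddings, cosine similarity reduces to a dot product.
# The top-ranked excerpts would be passed to an LLM as grounded quotations.
scores = passage_vecs @ query_vec
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {passages[idx]}")
```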
According to our latest research, the global 3D Map Data Collection Sensor market size reached USD 4.2 billion in 2024, with a robust year-on-year growth rate. The market is anticipated to expand at a CAGR of 13.7% from 2025 to 2033, culminating in a forecasted market value of USD 13.1 billion by 2033. The major growth driver for this market is the increasing demand for high-resolution geospatial data across industries such as automotive, urban planning, and environmental monitoring, propelled by advancements in sensor technologies and the proliferation of autonomous systems worldwide.
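As a quick arithmetic check, the forecast is consistent with the compound-growth formula; a short sketch using the figures above:

```python
# value_end = value_start * (1 + CAGR) ** years
start_value = 4.2        # USD billion, 2024
cagr = 0.137
years = 2033 - 2024      # nine years of compounding

projected = start_value * (1 + cagr) ** years
print(f"Projected 2033 market size: USD {projected:.1f} billion")
# ~13.3; the small gap to the stated USD 13.1 billion is attributable to
# rounding and year-convention differences in the source figures.
```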
The primary growth factor fueling the 3D Map Data Collection Sensor market is the rapid adoption of autonomous vehicles and advanced driver-assistance systems (ADAS) in the automotive sector. As automotive manufacturers strive to enhance safety and navigation capabilities, the integration of LiDAR, radar, and high-definition cameras has become indispensable. These sensors are critical for real-time 3D mapping, object detection, and environmental perception, enabling vehicles to operate autonomously with greater accuracy and reliability. Additionally, the surge in demand for electric vehicles and connected mobility solutions further amplifies the need for sophisticated 3D mapping technologies, driving sustained investment and innovation in sensor development.
Another significant growth catalyst is the widespread application of 3D mapping in urban planning, construction, and infrastructure management. Governments and private enterprises are increasingly leveraging 3D map data collection sensors for smart city initiatives, land surveying, and construction project management. These sensors enable accurate spatial data acquisition, facilitating efficient planning, design, and monitoring of urban environments. The integration of aerial and mobile platforms with advanced sensor arrays allows for rapid, large-scale data collection, supporting infrastructure development and environmental sustainability goals. As urbanization accelerates globally, the demand for precise 3D mapping solutions is expected to rise exponentially.
Technological advancements in sensor miniaturization, data processing, and cloud-based analytics are also propelling the market forward. The evolution of compact, high-performance sensors has made it feasible to deploy 3D map data collection systems across diverse platforms, including unmanned aerial vehicles (UAVs), terrestrial vehicles, and handheld devices. Enhanced data fusion techniques and artificial intelligence-driven analytics are enabling real-time processing and interpretation of vast geospatial datasets, unlocking new use cases in agriculture, disaster management, and environmental monitoring. These innovations are reducing operational costs, improving data accuracy, and expanding the accessibility of 3D mapping technologies to a broader spectrum of end-users.
Regionally, North America continues to dominate the 3D Map Data Collection Sensor market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The presence of leading technology companies, robust R&D investments, and early adoption of autonomous solutions are key factors contributing to the region's market leadership. Meanwhile, the Asia Pacific region is witnessing the fastest growth, driven by rapid urbanization, infrastructure development, and increasing investments in smart city projects. Emerging markets in Latin America and the Middle East & Africa are also exhibiting promising growth potential, supported by government initiatives and expanding industrial applications.
The 3D Map Data Collection Sensor market is segmented by sensor type into LiDAR, radar, camera, GNSS, ultrasonic, and others, each playing a pivotal role in the acquisition of spatial data. LiDAR sensors have emerged as the most prominent segment due to their exceptional ability to generate high-resolution, acc
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Large-Scale Dataset for Fish Segmentation and Classification
Authors: O. Ulucan, D. Karakaya, M. Turkan
Department of Electrical and Electronics Engineering, Izmir University of Economics, Izmir, Turkey
Corresponding author: M. Turkan
Contact information: mehmet.turkan@ieu.edu.tr
General Introduction
This dataset contains 9 different seafood types collected from a supermarket in Izmir, Turkey for a university-industry collaboration project at Izmir University of Economics; the work was published in ASYU 2020. The dataset includes image samples of gilt head bream, red sea bream, sea bass, red mullet, horse mackerel, black sea sprat, striped red mullet, trout, and shrimp.
If you use this dataset in your work, please consider citing:
@inproceedings{ulucan2020large,
  title={A Large-Scale Dataset for Fish Segmentation and Classification},
  author={Ulucan, Oguzhan and Karakaya, Diclehan and Turkan, Mehmet},
  booktitle={2020 Innovations in Intelligent Systems and Applications Conference (ASYU)},
  pages={1--5},
  year={2020},
  organization={IEEE}
}
Purpose of the work
This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and to compare common segmentation, feature extraction, and classification algorithms (Semantic Segmentation, Convolutional Neural Networks, Bag of Features). All of the experimental results demonstrate the usability of our dataset for the purposes mentioned above.
Data Gathering Equipment and Data Augmentation
Images were collected via 2 different cameras, a Kodak Easyshare Z650 and a Samsung ST60; the resolutions of the resulting images are 2832 x 2128 and 1024 x 768, respectively.

Before the segmentation, feature extraction, and classification process, the images were resized to 590 x 445 while preserving the aspect ratio. After resizing, the images and their ground-truth labels were augmented (by flipping and rotating).

At the end of the augmentation process, the total number of images for each class reached 2000: 1000 RGB fish images and 1000 pair-wise ground truth labels.
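As an illustration of the augmentation step described above, a short Pillow sketch; the exact flip directions and rotation angles used by the authors are not specified, so those below are assumptions:

```python
from pathlib import Path
from PIL import Image, ImageOps  # pip install Pillow

def augment(src_path: Path, out_dir: Path) -> None:
    # Write flipped and rotated variants of one image; the specific
    # transforms are illustrative, not the authors' exact choices.
    img = Image.open(src_path)
    variants = {
        "mirror": ImageOps.mirror(img),   # horizontal flip
        "flip": ImageOps.flip(img),       # vertical flip
        "rot90": img.rotate(90, expand=True),
        "rot180": img.rotate(180),
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, variant in variants.items():
        variant.save(out_dir / f"{src_path.stem}_{name}.png")
```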
Description of the dataset
The dataset contains 9 different seafood types. For each class, there are 1000 augmented images and their pair-wise augmented ground truths. Each class can be found in the "Fish_Dataset" folder with its ground truth labels. All images for each class are ordered from "00000.png" to "01000.png".

For example, if you want to access the ground truth images of the shrimp in the dataset, the path to follow is "Fish->Shrimp->Shrimp GT".
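Here is a short sketch of iterating over image/ground-truth pairs under this layout; it assumes the RGB images sit in a folder named after the class, next to the "<class> GT" folder, as in the example above:

```python
from pathlib import Path
from PIL import Image

root = Path("Fish_Dataset")
cls = "Shrimp"

# Pair each RGB image with the ground-truth mask of the same file name.
for img_path in sorted((root / cls / cls).glob("*.png")):
    gt_path = root / cls / f"{cls} GT" / img_path.name
    image = Image.open(img_path)
    mask = Image.open(gt_path)
    # ... feed (image, mask) into a segmentation pipeline ...
```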
Water features that relate to the interior of the country. A single point that describes a feature's location. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
Index to the BGS collection of large scale or large format plans of all types, including those relating to mining activity, such as abandonment plans and site investigations. The Plans Database Index was set up c.1983 as a digital index to the collections of Land Survey Plans and Plans of Abandoned Mines. There are entries for all registered plans, but not all the index fields are complete, as this depends on the nature of the original plan. The index covers the whole of Great Britain.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HPC-ODA is a collection of datasets acquired on production HPC systems, representative of several real-world use cases in the field of Operational Data Analytics (ODA) for the improvement of reliability and energy efficiency. The datasets are composed of monitoring sensor data acquired from the components of different HPC systems, depending on the specific use case. Two tools with proven lightweight overhead were used to acquire the data in HPC-ODA: the DCDB and LDMS monitoring frameworks.
The aim of HPC-ODA is to provide several vertical slices (here named segments) of the monitoring data available in a large-scale HPC installation. The segments all have different granularities, in terms of data sources and time scale, and provide several use cases on which models and approaches to data processing can be evaluated. While having a production dataset from a whole HPC system - from the infrastructure down to the CPU core level - at a fine time granularity would be ideal, this is often not feasible due to the confidentiality of the data, as well as the sheer amount of storage space required. HPC-ODA includes 6 different segments:
Power Consumption Prediction: a fine-granularity dataset that was collected from a single compute node in a HPC system. It contains both node-level data as well as per-CPU core metrics, and can be used to perform regression tasks such as power consumption prediction.
Fault Detection: a medium-granularity dataset that was collected from a single compute node while it was subjected to fault injection. It contains only node-level data, as well as the labels for both the applications and faults being executed on the HPC node in time. This dataset can be used to perform fault classification.
Application Classification: a medium-granularity dataset that was collected from 16 compute nodes in a HPC system while running different parallel MPI applications. Data is at the compute node level, separated for each of them, and is paired with the labels of the applications being executed. This dataset can be used for tasks such as application classification.
Infrastructure Management: a coarse-granularity dataset containing cluster-wide data from a HPC system, about its warm water cooling system as well as power consumption. The data is at the rack level, and can be used for regression tasks such as outlet water temperature or removed heat prediction.
Cross-architecture: a medium-granularity dataset that is a variant of the Application Classification one, and shares the same ODA use case. Here, however, single-node configurations of the applications were executed on three different compute node types with different CPU architectures. This dataset can be used to perform cross-architecture application classification, or performance comparison studies.
DEEP-EST Dataset: this medium-granularity dataset was collected on the modular DEEP-EST HPC system and consists of three parts. These were collected on 16 compute nodes each, while running several MPI applications under different warm-water cooling configurations. This dataset can be used for CPU and GPU temperature prediction, or for thermal characterization.
The HPC-ODA dataset collection includes a readme document containing all necessary usage information, as well as a lightweight Python framework to carry out the ODA tasks described for each dataset.
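As an example of the kind of ODA task the Power Consumption Prediction segment supports, a regression sketch with scikit-learn; the file name and column names are illustrative assumptions (the real schema is described in the bundled readme):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Illustrative schema: "power" as the regression target and the remaining
# sensor readings as features is an assumption for this sketch.
df = pd.read_csv("power_segment.csv")
X = df.drop(columns=["power"])
y = df["power"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```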
Topographic features whose primary characteristics relate to a single or group of buildings and associated facilities functioning together as a unit. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Roads are one of the most transforming linear infrastructures in human-dominated landscapes, with animal road-kills as their most studied impact. There is therefore a need to gather road-kill data, and in this sense citizen science is gaining popularity as an easy and cheap source of data collection that allows large-scale studies that may otherwise be unattainable. However, citizen science projects that focus on road-kills tend to be geographically localised; there is therefore a debate about whether large-scale data collected by citizen scientists can identify spatial and temporal road-kill patterns and thus be used as a reliable conservation tool. We aim to assess whether citizen science data contained in the Spanish Atlas of Terrestrial Mammals (henceforth “Atlas”) can be as valuable and accurate as road-kill surveys undertaken by experts in detecting road-kill hotspots and establishing road-kill rates for different species of carnivores.

Using linear models, we compared species richness, diversity, and abundance of road-killed carnivores between Atlas data and our own road-kill survey database. We also compared (per species) the observed road-kills in our road survey with the expected road-kills based on species abundance from the Atlas. Our linear models did not find a significant relation between the road-kill data and the Atlas data, suggesting that data from the Atlas are unsuitable for determining road-kill patterns in our study area. This could be due to the lack of control over sampling effort in the Atlas data, and to the fact that the Atlas has a sampling scope that is not fitted for road mortality studies. When we compared observed road-kills (per species) with those expected based on Atlas abundance, we found that some species are road-killed more (or less) than expected. This may be due to ecological or behavioural traits that make some species more (or less) prone to being road-killed.

To summarise, our findings suggest that occurrence in Atlas data does not mirror road-kill patterns, likely due both to several biases in Atlas data and to species-specific responses to roads. Thus, to study road-kill rates and patterns, we suggest the use of classical road-kill surveys, unless correcting approaches are applied to citizen science datasets. This is especially important when the study aims to determine species-specific road-kill patterns.
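For illustration, a sketch of the core per-species comparison as a linear model in statsmodels; the species and numbers below are made up, standing in for the study's survey counts and Atlas-derived abundances:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up per-species table: survey road-kill counts vs. Atlas abundance.
df = pd.DataFrame({
    "species": ["red fox", "badger", "stone marten", "genet", "weasel"],
    "survey_roadkills": [34, 12, 21, 5, 3],
    "atlas_abundance": [0.62, 0.35, 0.28, 0.11, 0.09],
})

# If Atlas occurrence mirrored road-kill patterns, the slope would be
# clearly positive and significant; the study reports no such relation.
model = smf.ols("survey_roadkills ~ atlas_abundance", data=df).fit()
print(model.summary())
```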
Version 25 of the dataset. We have refactored the full_dataset.tsv and full_dataset_clean.tsv files (since version 20) to include two additional columns: language and place country code (when available). This change adds language and country code for ALL the tweets in the dataset, not only clean tweets. With this change we have removed the clean_place_country.tar.gz and clean_languages.tar.gz files. While refactoring the dataset-generating code we also found a small bug that made some retweets not be counted properly, hence the extra increase in available tweets.
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (651,611,876 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (154,646,580 unique tweets). There are several practical reasons for us to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files. For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/
More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter, together with our pre-print about the dataset (https://arxiv.org/abs/2004.03688).
As always, the tweets distributed here are only tweet identifiers (with date and time added), since Twitter's terms and conditions allow redistribution of Twitter data in this form ONLY for research purposes. They need to be hydrated before they can be used.
Privacy policy: https://dataintelo.com/privacy-and-policy
According to our latest research, the global market size for Portable Traffic Data Collection Systems reached USD 1.62 billion in 2024, and is anticipated to expand at a CAGR of 8.1% from 2025 to 2033. By the end of the forecast period, the market is projected to achieve a value of USD 3.23 billion by 2033. The primary growth factor driving this market is the increasing demand for real-time and accurate traffic data to support smart city initiatives, urban mobility planning, and enhanced road safety measures, as per our latest research and industry analysis.
One of the most significant growth drivers for the Portable Traffic Data Collection Systems market is the rapid urbanization and the consequent rise in vehicular density across major cities globally. As urban centers continue to expand, the need for efficient traffic management has become paramount. Governments and urban planners are increasingly relying on advanced traffic data collection to optimize signal timings, reduce congestion, and improve road safety. The ability of portable systems to be quickly deployed and relocated makes them ideal for dynamic and temporary data collection needs, such as during roadworks, special events, or in areas experiencing sudden changes in traffic flow. Furthermore, the integration of these systems with smart city frameworks and intelligent transportation systems (ITS) has amplified their adoption, as municipalities strive for data-driven decision-making to address urban mobility challenges.
Technological advancements are also propelling the market forward. Innovations in sensor technologies, data analytics, wireless communication, and cloud-based platforms have significantly enhanced the accuracy, reliability, and flexibility of portable traffic data collection systems. Modern systems now offer real-time data transmission, remote monitoring, and seamless integration with existing traffic management infrastructure. This has enabled stakeholders to access actionable insights quickly and efficiently, supporting proactive interventions and policy formulation. The shift towards video-based and radar-based solutions, in particular, is driven by their ability to provide granular data on vehicle speed, classification, and count, further fueling market growth. As the cost of these technologies continues to decline, their adoption is expected to increase across both developed and developing regions.
Another key factor contributing to market expansion is the increased focus on sustainable transportation and environmental monitoring. Portable traffic data collection systems are being leveraged to assess the impact of traffic on air quality, noise pollution, and carbon emissions. This data is critical for designing low-emission zones, promoting public transportation, and implementing congestion pricing schemes. Additionally, the growing trend of public-private partnerships in the transportation sector has spurred investments in advanced traffic monitoring solutions. Private companies, research institutes, and urban planning organizations are collaborating to develop innovative applications, further diversifying the market landscape. As regulatory frameworks evolve to mandate comprehensive traffic data collection for infrastructure projects, the demand for portable solutions is expected to witness sustained growth.
Regionally, North America currently dominates the Portable Traffic Data Collection Systems market, accounting for the largest share in 2024. This leadership position is attributed to the early adoption of smart traffic management technologies, substantial government investments in infrastructure modernization, and the presence of leading technology providers. Europe follows closely, driven by stringent regulatory requirements for road safety and environmental monitoring. The Asia Pacific region, however, is poised for the fastest growth during the forecast period, fueled by rapid urbanization, increasing vehicle ownership, and large-scale smart city projects in countries such as China, India, and Japan. Latin America and the Middle East & Africa are also witnessing growing adoption, supported by urban development initiatives and cross-border transportation projects.
The Product Type segment of the Portable Traffic Data Collection Systems market is highly diversified, encompassing radar-based sys