This dataset contains the historical Unidata Internet Data Distribution (IDD) Global Observational Data, derived from real-time Global Telecommunications System (GTS) reports distributed via the IDD. Reports include surface station (SYNOP) reports at 3-hour intervals, upper air (RAOB) reports at 3-hour intervals, surface station (METAR) reports at 1-hour intervals, and marine surface (BUOY) reports at 1-hour intervals. Selected variables found in all report types include pressure, temperature, wind speed, and wind direction. Data may be available at mandatory or significant levels from 1000 millibars to 1 millibar, and at surface levels. Online archives are populated daily with reports generated two days prior to the current date.
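As a rough illustration of the reporting cadence and archive lag described above, the sketch below lists the nominal report times for each report type on one day and the most recent date expected in the online archive. It assumes each reporting cycle starts at 00 UTC and that the two-day lag is counted in whole calendar days; real stations do not always report on schedule. The names `REPORT_INTERVAL_HOURS`, `latest_archived_date` and `nominal_report_times`, and the example date, are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Nominal reporting intervals in hours, as stated in the dataset description.
REPORT_INTERVAL_HOURS = {"SYNOP": 3, "RAOB": 3, "METAR": 1, "BUOY": 1}

# Archives hold reports generated two days before the current date.
ARCHIVE_LAG_DAYS = 2


def latest_archived_date(today: date) -> date:
    """Most recent report date expected in the online archive."""
    return today - timedelta(days=ARCHIVE_LAG_DAYS)


def nominal_report_times(report_type: str, day: date) -> list[datetime]:
    """Nominal UTC report times for one day, assuming each cycle starts at 00 UTC."""
    step = REPORT_INTERVAL_HOURS[report_type]
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return [midnight + timedelta(hours=h) for h in range(0, 24, step)]


archive_day = latest_archived_date(date(2024, 3, 10))
print(archive_day)  # 2024-03-08
for report_type in REPORT_INTERVAL_HOURS:
    times = nominal_report_times(report_type, archive_day)
    print(report_type, len(times), "nominal reports, first at", times[0].strftime("%H:%M UTC"))
```

A 3-hour interval gives 8 nominal reports per day and a 1-hour interval gives 24; actual counts in the archive will depend on which stations reported.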
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network and loading data for a real-world distribution network in the North-East of England.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Number of in situ measurements obtained from instruments carried aboard oceanographic research and merchant ships, presented as an annual distribution. The spatial and temporal coverage of nitrate data in the Gulf of Mexico is not uniform, and most of the historical data were collected over the continental shelf near shallow intertidal areas (<200 m depth).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data for the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three datasets of hourly population distribution are provided: for regular workdays (Mon–Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the largest mobile network operator in Finland. The mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data on land cover, buildings and a time use survey. The data were validated against population register data from Statistics Finland for night-time hours and against a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and to examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.
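The core idea of dasymetric interpolation can be illustrated with a basic, single-variable version: each source zone's count is split among its intersections with target grid cells in proportion to an ancillary weight, and the pieces are then summed per cell. This is a simplified stand-in, not the multi-temporal method used for this dataset (which combines land cover, buildings and a time use survey); the function name, zone IDs, counts and weights below are hypothetical.

```python
from collections import defaultdict


def dasymetric_allocate(source_counts, pieces):
    """Allocate source-zone counts to target cells in proportion to ancillary weights.

    source_counts: {source_zone_id: count}, e.g. phone users per network zone.
    pieces: (source_zone_id, target_cell_id, weight) for every intersection of a
        source zone with a target cell; weight is an ancillary measure such as
        building floor area inside the intersection (hypothetical here).
    Returns {target_cell_id: allocated_count}. Counts of zones whose total
    weight is zero are not allocated.
    """
    zone_weight = defaultdict(float)
    for zone, _cell, weight in pieces:
        zone_weight[zone] += weight

    allocated = defaultdict(float)
    for zone, cell, weight in pieces:
        if zone_weight[zone] > 0:
            allocated[cell] += source_counts.get(zone, 0) * weight / zone_weight[zone]
    return dict(allocated)


# Made-up example: zone A (100 users) covers cells c1 and c2; zone B (50) covers c2 and c3.
counts = {"A": 100, "B": 50}
pieces = [("A", "c1", 30.0), ("A", "c2", 10.0), ("B", "c2", 5.0), ("B", "c3", 5.0)]
grid_counts = dasymetric_allocate(counts, pieces)
print(grid_counts)  # {'c1': 75.0, 'c2': 50.0, 'c3': 25.0}
```

Because each zone's weights are normalized to sum to one, the total count is preserved (150 in the example); the choice of ancillary weight is what makes the result dasymetric rather than simply area-weighted.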
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area among grid cells during a one-hour period. In total, there are 24 fields formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00–00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).
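Because each Hx column holds percentages that sum to 100 over all cells, a quick sanity check and a conversion to approximate head counts can be sketched with Python's standard library. The sketch assumes the CSV has a header row with YKR_ID and H0 to H23 as documented. The total population figure, the toy two-cell table and the function names are hypothetical, since the dataset itself provides only shares.

```python
import csv
import io
import math

# Column names as documented: H0 (00:00-00:59) ... H23 (23:00-23:59).
HOURS = [f"H{h}" for h in range(24)]


def read_shares(csv_text):
    """Parse a dynamic-population table into {YKR_ID: [share_H0, ..., share_H23]}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["YKR_ID"]: [float(row[h]) for h in HOURS] for row in reader}


def shares_sum_to_100(shares, tol=0.01):
    """True if, for every hour, the shares over all cells add up to about 100.

    A loose tolerance allows for rounding in the published values.
    """
    totals = [sum(values[i] for values in shares.values()) for i in range(24)]
    return all(math.isclose(total, 100.0, abs_tol=tol) for total in totals)


def to_head_counts(shares, hour, total_population):
    """Turn percentage shares for one hour into estimated head counts per cell.

    total_population is a hypothetical input, not part of the dataset.
    """
    return {cell: values[hour] * total_population / 100.0 for cell, values in shares.items()}


# Toy two-cell table with made-up IDs and shares (60% / 40% in every hour).
header = ",".join(["YKR_ID"] + HOURS)
toy_csv = "\n".join([header, ",".join(["1"] + ["60"] * 24), ",".join(["2"] + ["40"] * 24)])

shares = read_shares(toy_csv)
print(shares_sum_to_100(shares))  # True
counts_at_8 = to_head_counts(shares, 8, 1_000_000)  # H8 = 08:00-08:59
print(counts_at_8)  # {'1': 600000.0, '2': 400000.0}
```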
In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
License: Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the profile data used to produce Fig. 1 in Chávez-Solís, E.M., Solís, C., Simões, N. et al. Distribution patterns, carbon sources and niche partitioning in cave shrimps (Atyidae: Typhlatya). Sci Rep 10, 12812 (2020). https://doi.org/10.1038/s41598-020-69562-2.
https://www.datainsightsmarket.com/privacy-policy
The Big Data Processing and Distribution Software market is experiencing robust growth, driven by the exponential increase in data volume across industries and the rising need for efficient data management and analytics. The market, estimated at $50 billion in 2025, is projected to grow at a compound annual growth rate (CAGR) of 15% from 2025 to 2033, reaching approximately $150 billion by 2033. Key growth drivers include the increasing adoption of cloud-based solutions, the proliferation of Internet of Things (IoT) devices generating massive data streams, and growing demand for real-time analytics and data-driven decision-making in sectors such as finance, healthcare, and retail. Large enterprises lead adoption, followed by a rapidly growing segment of small and medium-sized enterprises (SMEs) that use cloud-based solutions for cost-effectiveness and scalability. The competitive landscape includes both established players such as Google, Amazon Web Services, and Microsoft and emerging niche providers offering specialized solutions. North America currently holds a significant share, while regions such as Asia-Pacific show exceptional growth potential, driven by rapid digitalization and increasing investment in data infrastructure.
However, the market also faces restraints: the complexity of data integration and management, the high cost of implementing and maintaining big data solutions, and the need for skilled professionals to manage and analyze the data effectively. Ensuring data security and compliance with evolving regulations is a further challenge. Despite these hurdles, the overall outlook remains positive, supported by continuous technological advances, growing data generation, and wider recognition of the value of data-driven insights.
The shift towards cloud-based solutions continues to be a significant trend, facilitating easier access, scalability, and reduced infrastructure costs. The market's future hinges on the continued development of innovative solutions addressing security, scalability, and ease of use, catering to the diverse needs of various industry segments and geographical locations.
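The growth figures can be checked with the standard compound-growth formula, value_end = value_start × (1 + CAGR)^years. With eight compounding years from 2025 to 2033, $50 billion growing at 15% gives roughly $153 billion, consistent with the stated "approximately $150 billion"; conversely, growing from exactly $50 billion to exactly $150 billion would imply a CAGR of about 14.7%. The helper names below are hypothetical.

```python
def project(start_value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2033 - 2025  # eight compounding periods
projected_2033 = project(50, 0.15, years)  # USD billions
rate_for_150 = implied_cagr(50, 150, years)
print(f"projected 2033 size: {projected_2033:.1f}, CAGR for 50 -> 150: {rate_for_150:.2%}")
```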
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient of scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization are missing for IBM SPSS Statistics, the most commonly reported statistical software package for data analysis. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available. The syntax allows the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs, and for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on the graph type (e.g., pure dot plot, box plot, or line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. Example applications of the syntax are illustrated in a tutorial-like fashion, along with fictitious datasets accompanying this contribution. It is hoped that the syntax collection will provide researchers, students, teachers, and others working with SPSS with a valuable tool for moving towards more transparency in data visualization.
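The kind of graph described above, raw data shown together with the mean, median, SD and 95% CI, can also be sketched outside SPSS. The Python sketch below is not the SPSS syntax from this collection: it computes the summary statistics with the standard library and draws a crude text dot plot. One assumption to note is that the 95% CI here uses the normal approximation (mean ± 1.96·SD/√n), which is narrower than the usual t-based interval for small samples. The group names, values and function names are made up.

```python
import statistics
from statistics import NormalDist


def summarize(values, level=0.95):
    """Mean, median, sample SD and a normal-approximation confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * sd / n ** 0.5
    return {"n": n, "mean": mean, "median": statistics.median(values),
            "sd": sd, "ci": (mean - half_width, mean + half_width)}


def text_dot_plot(groups, width=40):
    """One row per group: 'o' marks raw values, 'M' marks the group mean.

    Values that fall into the same character position overlap, so this is
    only a rough stand-in for a jittered dot plot.
    """
    lo = min(min(v) for v in groups.values())
    hi = max(max(v) for v in groups.values())
    span = (hi - lo) or 1.0
    rows = []
    for name, values in groups.items():
        row = [" "] * (width + 1)
        for v in values:
            row[round((v - lo) / span * width)] = "o"
        row[round((statistics.mean(values) - lo) / span * width)] = "M"
        rows.append(f"{name:>8} |{''.join(row)}|")
    return "\n".join(rows)


groups = {"control": [4.1, 5.0, 5.2, 6.3, 7.0], "treated": [5.9, 6.4, 7.8, 8.1, 9.5]}
for name, values in groups.items():
    print(name, summarize(values))
print(text_dot_plot(groups))
```

Putting every raw value on the same axis as the mean is the point the abstract makes: two groups with similar summaries can still differ visibly in spread or shape.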
Water-quality data for groundwater samples collected from 4,824 sites, and ancillary data and information on sampled wells and principal aquifers, were used to assess the occurrence and distribution of strontium in U.S. groundwater from 32 principal aquifers. This data release includes one tab-delimited text file detailing these data. Table 1. Chemical data from the U.S. Geological Survey National Water Information System and ancillary data considered for assessment of strontium concentration in U.S. groundwater.
This paper offers a scalable and robust distributed algorithm for decision-tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in such large distributed systems using standard centralized algorithms can be very communication-expensive and impractical because of the synchronization requirements. The problem becomes even more challenging in the distributed stream monitoring scenario where the decision tree needs to be updated in response to changes in the data distribution. This paper presents an alternate solution that works in a completely asynchronous manner in distributed environments and offers low communication overhead, a necessity for scalability. It also seamlessly handles changes in data and peer failures. The paper presents extensive experimental results to corroborate the theoretical claims.
The water depth and temperature data were collected from multiple ships by the United Kingdom Hydrographic Office. The originator's analog bathythermograph (XBT) data were submitted on a diskette containing 3 files in NODEF-1 format. The data have been converted by NODC and are now available online. See the accompanying documentation for file format information.
The Clinical Questions Collection is a repository of questions collected between 1991 and 2003 from healthcare providers in clinical settings across the country. The questions were submitted by investigators who wished to share their data with other researchers. This dataset is no longer updated with new content. The collection is used in developing approaches to clinical and consumer-health question answering, as well as in researching the information needs of clinicians and the language they use to express those needs. All files are formatted in XML.
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Distribution Centre. While all reasonable steps have been taken to ensure the accuracy, completeness and reliability of the information provided, Enemalta assumes no responsibility for any errors, inaccuracies or missing information. In no event shall Enemalta be liable for any direct, indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information being provided.
Organizations in the services industry were the most common targets of leaks of confidential data in the database format in Russia in 2023, having accounted for 28 percent of the total. The second-largest share was occupied by retail and e-commerce companies, at 26 percent of data theft cases.
This dataset provides an overview of the U.S. Environmental Protection Agency’s (EPA’s) research results from investigating water quality monitoring sensor technologies that might be used to serve as a real-time contamination warning system (CWS) when a contaminant is introduced into a drinking water distribution system.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
How humans efficiently operate in a world with massive amounts of data that need to be processed, stored, and recalled has long been an unsettled question. Our physical and social environment needs to be represented in a structured way, which could be achieved by reducing input to latent variables in the form of probability distributions, as proposed by influential, probabilistic accounts of cognition and perception. However, few studies have investigated the neural processes underlying the brain’s potential ability to represent a probability distribution’s complex, global features. Here, we presented participants with a sequence of tones that formed a normal or a bimodal distribution. Using a novel, single-trial EEG analysis, we demonstrate a neural response that indexes the likelihood of an item, given previously presented items, and corresponds to the experienced tones’ distribution. Our results indicate that the adult human brain can build a representation of the complex, global pattern of a probability distribution and offer a novel tool for an in-depth understanding of related neural mechanics.
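The notion of "the likelihood of an item, given previously presented items" can be made concrete with a simple model. The sketch below is an assumption, not the single-trial EEG analysis used in the study: it estimates each tone's likelihood from the earlier tones with a Gaussian kernel density estimate. For a bimodal sequence, a tone at one of the modes scores high while a tone midway between the modes scores low even though it lies near the overall mean, which is the kind of global distribution feature a mean-and-variance summary would miss. Tone frequencies, spreads, the kernel bandwidth and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def likelihood(tone, previous, bandwidth=50.0):
    """Gaussian kernel density estimate of `tone` given earlier tones (hypothetical model)."""
    kernels = [NormalDist(mu, bandwidth) for mu in previous]
    return sum(k.pdf(tone) for k in kernels) / len(kernels)


def running_likelihoods(tones, bandwidth=50.0):
    """Likelihood of each tone from the second onward, given all tones before it."""
    return [likelihood(tones[i], tones[:i], bandwidth) for i in range(1, len(tones))]


def bimodal_tones(n, modes=(500.0, 1500.0), spread=100.0, seed=0):
    """Made-up tone frequencies (Hz) drawn from an equal mixture of two normals."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(modes), spread) for _ in range(n)]


tones = bimodal_tones(200)
at_mode = likelihood(500.0, tones)
at_centre = likelihood(1000.0, tones)  # near the overall mean, but between the modes
print(f"at mode: {at_mode:.2e}, between modes: {at_centre:.2e}")
print(running_likelihoods(tones)[:5])
```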
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes the data visualization scripts that are part of the second chapter in the PhD thesis Landscapes of Trade, the used (open) data, and resulting plots. There is also one figure of Chapter 1 and one figure of Chapter 7 included. Proprietary data used to calculate some of the numbers in Chapter 2 are not included in this repository.
The set includes two zipped work folders:
The folder 'Datavisualization' includes: a README file, two R scripts to produce plots and numbers used in the publication, along with underlying data folder and export folder.
The folder 'Gateway Factor' includes: a README file, two R scripts to process the data and produce the regression analysis shown in Chapter 2, along with underlying data folder and export folder.
This dataset provides information about the number of properties, residents, and average property values for Global Distribution Way cross streets in Louisville, KY.
Effective September 27, 2023, this dataset will no longer be updated. Similar data are accessible from wonder.cdc.gov. This visualization provides data that can be used to illustrate potential differences in the burden of deaths due to COVID-19 by race and ethnicity.