GOOD is a systematic graph OOD benchmark, which provides carefully designed data environments for distribution shifts.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Summary of basic properties of empirical distributions that are interesting for data mining.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided: for regular workdays (Mon–Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated against population register data from Statistics Finland for night-time hours and against a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and to examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).
In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
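The join and the column-sum property described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' workflow: it assumes the CSV files are comma-delimited and that each GeoJSON feature stores its identifier under `properties["YKR_ID"]`. The sample data and the function names `load_population`, `hourly_totals` and `join_to_grid` are made up for this example.

```python
import csv
import io

def load_population(csv_text):
    """Parse a population table into {YKR_ID: {"H0": float, ...}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    table = {}
    for row in reader:
        cell_id = int(row["YKR_ID"])
        table[cell_id] = {k: float(v) for k, v in row.items() if k != "YKR_ID"}
    return table

def hourly_totals(table):
    """Sum every Hx column over all grid cells; each total should be close to 100."""
    totals = {}
    for shares in table.values():
        for hour, value in shares.items():
            totals[hour] = totals.get(hour, 0.0) + value
    return totals

def join_to_grid(geojson, table):
    """Copy the hourly shares into the properties of the matching grid feature."""
    for feature in geojson["features"]:
        cell_id = feature["properties"]["YKR_ID"]
        feature["properties"].update(table.get(cell_id, {}))
    return geojson

# Tiny invented sample (two hours, three cells) standing in for the real files.
SAMPLE_CSV = "YKR_ID,H0,H1\n1,50.0,20.0\n2,30.0,30.0\n3,20.0,50.0\n"
SAMPLE_GRID = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"YKR_ID": i}, "geometry": None}
        for i in (1, 2, 3)
    ],
}

population = load_population(SAMPLE_CSV)
totals = hourly_totals(population)
joined = join_to_grid(SAMPLE_GRID, population)
```

With the real files, the CSV text would come from one of the three tables and the grid from `json.load` on the GeoJSON file. Checking that every total is near 100 is a quick way to confirm the table was parsed with the right delimiter.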
License: Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Percentage of iPhones assembled by contract manufacturers.

License: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/AGU4QE
WIDEa is R-based software that aims to provide users with a range of functionalities to explore, manage, clean and analyse "big" environmental and (in/ex situ) experimental data. These functionalities are the following:
1. Loading/reading different data types: basic (called normal), temporal, and infrared spectra of the mid/near region (called IR) with frequency (wavenumber) used as the unit (in cm-1);
2. Interactive data visualization from a multitude of graph representations: 2D/3D scatter-plot, box-plot, hist-plot, bar-plot, correlation matrix;
3. Manipulation of variables: concatenation of qualitative variables, transformation of quantitative variables by generic functions in R;
4. Application of mathematical/statistical methods;
5. Creation/management of data (named flag data) considered as atypical;
6. Study of normal distribution model results for different strategies: calibration (checking assumptions on residuals), validation (comparison between measured and fitted values). The model form can be more or less complex: mixed effects, main/interaction effects, weighted residuals.

The Employee Department Distribution Dataset offers an insightful glimpse into the organizational structure of a company, showcasing the distribution of employees across various departments. With a diverse array of departments including Engineering, Business Development, Sales, Services, Product Management, Accounting, Legal, Marketing, Human Resources, Training, Auditing, Support, and Research and Development, this dataset presents a comprehensive overview of workforce allocation within the organization.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
In this supplemental material, we provide supplemental information (PDF document with derivations of the results presented in the paper and two additional use cases) and the supplementary video for uncertainty-aware spectral analysis. We model an uncertain time series as a multivariate Gaussian process. We propagate the uncertainty and explicitly compute the probability distribution in the spectral domain. In the video, we use our interactive visual analysis tool to analyze these distributions. This material complements the paper on Uncertainty-aware Spectral Visualization.
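As a sketch of why the spectral distribution can be computed explicitly: the discrete Fourier transform is a linear map, so a Gaussian time series remains Gaussian in the spectral domain. The notation below (x for the time series of length N, μ and Σ for its mean and covariance, F for the DFT matrix) is assumed here for illustration, and the derivation in the supplemental PDF may be organized differently.

```latex
x \sim \mathcal{N}(\mu, \Sigma), \qquad
X = F x, \qquad
F_{jk} = e^{-2\pi i\, jk / N}
\;\Longrightarrow\;
\mathbb{E}[X] = F\mu, \qquad
\operatorname{Cov}(X) = F\,\Sigma\,F^{\mathsf{H}}
```

Because X is complex-valued, a full description of its distribution also needs the pseudo-covariance F Σ F^T alongside the covariance above.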

Software tool for visualizing research papers from computational psychiatry as a two-dimensional map. It shows the distribution of papers along neuroscientific, psychiatric, and computational dimensions to enable anyone to find niche research and deepen their understanding of the field. Database for visualizing research papers.

License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically.
This research investigates the impact of fracture geometry and topology on the connectivity and flow properties of stochastic fracture networks. The data were generated by our in-house C++ code. In the shared data, you can find:
1. The data to reproduce Fig. 7 and 8 in the paper. Those data are averaged over 6000 random realizations for each case;
2. The data to plot 2D fracture networks and their graph representation;
3. The data to plot 3D fracture networks and their graph representation;
4. The data to plot the pressure distribution in 2D fracture networks;
5. The data to plot the pressure distribution in 3D fracture networks;
6. The corresponding Matlab code to help you visualize the results.
For better visualization, the system sizes of the 2D and 3D fracture networks were modified. Those modifications are for visualization purposes only and do not change the results reported in the paper.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
This file presents the pore size distribution of a SiC ceramic structure, obtained by the mercury intrusion porosimetry method.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Context
The dataset tabulates the population of Bay View by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Bay View. The dataset can be used to understand the population distribution of Bay View by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Bay View. Additionally, it can be used to see how the gender ratio changes from birth to the oldest age group, and to compare the male-to-female ratio across age groups in Bay View.
Key observations
Largest age group (population): Male # 60-64 years (60) | Female # 65-69 years (54). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Bay View Population by Gender.

Date: December 2020
Summary: The Pacific lamprey has inhabited the rivers, streams and coastal waters of the west for 350 million years (BPA, 2005). These data describe areas of known distribution (this dataset is limited to Oregon, Washington and Idaho) currently and historically used by Pacific Lamprey. Pacific Lamprey (Entosphenus tridentatus) are native to the Pacific coast from Baja California to the Bering Sea. They are anadromous, migrating from their natal stream to the ocean and back, but unlike salmon, Pacific Lamprey spend well over half of their total life cycle in freshwater. Distribution data are mapped at a 1:24,000 scale and are based on the StreamNet Hydrography dataset.
Type: ArcGIS Feature Layer View for Pacific Lamprey StoryMap. Only streams for Pacific Lamprey are shown; unknown lamprey streams are not shown in this view.
Reason: Public outreach
Scale: Web
Source: Columbia River Fish and Wildlife Conservation Office curated the data from multiple sources (see metadata).

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
This dataset tracks the annual distribution of students across grade levels in Creek View Elementary School.

**Dataset Overview**
The Titanic dataset is a widely used benchmark dataset for machine learning and data science tasks. It contains information about passengers who boarded the RMS Titanic in 1912, including their age, sex, social class, and whether they survived the sinking of the ship. The dataset is divided into two main parts:
train.csv: This file contains information about 891 passengers used for training machine learning models. It includes the following features:
PassengerId: A unique identifier for each passenger
Survived: Whether the passenger survived (1) or not (0)
Pclass: The passenger's social class (1 = Upper, 2 = Middle, 3 = Lower)
Name: The passenger's name
Sex: The passenger's sex (Male or Female)
Age: The passenger's age
SibSp: The number of siblings or spouses aboard the ship
Parch: The number of parents or children aboard the ship
Ticket: The passenger's ticket number
Fare: The passenger's fare
Cabin: The passenger's cabin number
Embarked: The port where the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)
test.csv: This file contains information about 418 passengers who were not used for training. It includes the same features as train.csv, but does not include the Survived label. The goal of a machine learning model is to predict whether or not each passenger in test.csv survived.
**Data Preparation**
Before using the Titanic dataset for machine learning tasks, it is important to perform some data preparation steps. These steps may include:
Handling missing values: Some of the features in the dataset have missing values. These values can be imputed or removed, depending on the specific task.
Encoding categorical variables: Some of the features in the dataset are categorical variables, such as Pclass, Sex, and Embarked. These variables need to be encoded numerically before they can be used by machine learning algorithms.
Scaling numerical variables: Some of the features in the dataset are numerical variables, such as Age and Fare. These variables may need to be scaled so that they share a comparable range.
**Data Visualization**
Data visualization can be a useful tool for exploring the Titanic dataset and gaining insights into the data. Some common data visualization techniques that can be used with the Titanic dataset include:
Histograms: Histograms can be used to visualize the distribution of numerical variables, such as Age and Fare.
Scatter plots: Scatter plots can be used to visualize the relationship between two numerical variables.
Box plots: Box plots can be used to visualize the distribution of a numerical variable across different categories, such as Pclass and Sex.
**Machine Learning Tasks**
The Titanic dataset can be used for a variety of machine learning tasks, including:
Classification: The most common task is to use the train.csv file to train a machine learning model to predict whether or not each passenger in the test.csv file survived.
Regression: The dataset can also be used to train a machine learning model to predict the fare of a passenger based on their other features.
Anomaly detection: The dataset can also be used to identify anomalies, such as passengers who are outliers in terms of their age, social class, or other features.
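The three preparation steps named in this entry (imputing missing values, encoding categorical variables, scaling numerical variables) can be sketched with the Python standard library alone. This is one possible approach under stated assumptions, not a prescribed pipeline: median imputation, one-hot encoding and min-max scaling are choices made for this example, the rows are invented rather than taken from train.csv, and the helper names are hypothetical.

```python
from statistics import median

# Invented rows using column names from the description; not real passenger records.
rows = [
    {"Pclass": 3, "Sex": "Male", "Age": 22.0, "Fare": 7.25, "Embarked": "S"},
    {"Pclass": 1, "Sex": "Female", "Age": 38.0, "Fare": 71.28, "Embarked": "C"},
    {"Pclass": 3, "Sex": "Female", "Age": None, "Fare": 7.92, "Embarked": "S"},
    {"Pclass": 2, "Sex": "Male", "Age": 30.0, "Fare": 13.0, "Embarked": "Q"},
]

def impute_median(rows, column):
    """Replace missing (None) values in a column with the median of the known ones."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = median(known)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per observed category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1 if value == c else 0
    return categories

def min_max_scale(rows, column):
    """Rescale a numerical column linearly onto the interval [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    for r in rows:
        r[column] = (r[column] - lo) / span

fill_age = impute_median(rows, "Age")
sex_categories = one_hot(rows, "Sex")
port_categories = one_hot(rows, "Embarked")
min_max_scale(rows, "Age")
min_max_scale(rows, "Fare")
```

In practice, the imputation value and the scaling bounds would be computed on train.csv and then reused unchanged on test.csv, so that no information from the test set leaks into preparation.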

We list the distribution of the 15 main hub gene degrees in the two graph prototypes. Here, is the number of genes with neighbors.

License: https://www.datainsightsmarket.com/privacy-policy
The graph technology market is experiencing robust growth, driven by the increasing need for advanced data analytics and the rising adoption of artificial intelligence (AI) and machine learning (ML) applications. The market's expansion is fueled by the ability of graph databases to handle complex, interconnected data more efficiently than traditional relational databases. This is particularly crucial in industries like finance (fraud detection, risk management), healthcare (patient relationship mapping, drug discovery), and e-commerce (recommendation systems, personalized marketing). Key trends include the move towards cloud-based graph solutions, the integration of graph technology with other data management systems, and the development of more sophisticated graph algorithms for advanced analytics. While challenges remain, such as the need for skilled professionals and the complexity of implementing graph databases, the overall market outlook remains positive, with a projected Compound Annual Growth Rate (CAGR), conservatively estimated here at 25%, for the forecast period 2025-2033. This growth will be driven by ongoing digital transformation initiatives across various sectors, leading to an increased demand for efficient data management and analytics capabilities. We can expect to see continued innovation in both open-source and commercial graph database solutions, further fueling the market's expansion.
The competitive landscape is characterized by a mix of established players like Oracle, IBM, and Microsoft, alongside emerging innovative companies such as Neo4j, TigerGraph, and Amazon Web Services. These companies are constantly vying for market share through product innovation, strategic partnerships, and acquisitions. The presence of both open-source and proprietary solutions caters to a diverse range of needs and budgets.
The market segmentation, while not explicitly detailed, likely includes categories based on deployment (cloud, on-premise), database type (property graph, RDF), and industry vertical. The regional distribution will likely show strong growth in North America and Europe, reflecting the higher adoption of advanced technologies in these regions, followed by a steady rise in Asia-Pacific and other developing markets. Looking ahead, the convergence of graph technology with other emerging technologies like blockchain and the Internet of Things (IoT) promises to unlock even greater opportunities for growth and innovation in the years to come.

Concerning the five selected segments, the segment ***** years has the largest population by age with ***** percent. Contrastingly, ***** years is ranked last, with ***** percent. Their difference, compared to ***** years, lies at ***** percentage points. Find other insights concerning similar markets and segments, such as a ranking of subsegments in China regarding share in the segment Video Streaming (SVoD) and a ranking of subsegments in the Philippines regarding share in the segment Video Streaming (SVoD). The Statista Market Insights cover a broad range of additional markets.

License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
This bar chart displays books by ISBN using the aggregation count. The data is filtered where the author is Lynn Rice-See. The data is about books.

License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically.
Funko Pop (2019–2022): Trends in Pricing, Licenses & Fan Interest
What the data says
1,482 products from 2019–2022
Cleaned in Kaggle (Python/Pandas), visualized in Tableau
Dashboard worksheets: Top Licenses, Price Distribution, Monthly Releases (by product type), License vs. Interest (treemap).
Data story for analysts
Pricing:
The average list price is $14.25. The distribution is right-skewed: most items cluster between $10–$15, with a thin tail above $30–$50 (special editions/jumbo formats).
The $10–$15 band is the core price anchor, and outliers likely reflect format premiums or limited releases.
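A quick way to check a claim of this kind is to compare the mean with the median and to measure how many items fall inside the core band. The sketch below uses an invented price list, not the dataset, and the helper `share_in_band` is a hypothetical name. A mean above the median is only a rough indicator of right skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical list prices in USD, invented for illustration; not the dataset.
prices = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 15.0, 15.0, 30.0, 50.0]

def share_in_band(values, low, high):
    """Fraction of values that fall inside the inclusive band [low, high]."""
    in_band = [v for v in values if low <= v <= high]
    return len(in_band) / len(values)

avg = mean(prices)
mid = median(prices)
core_share = share_in_band(prices, 10.0, 15.0)

# A few expensive items pull the mean above the median in a right-skewed sample.
right_skewed = avg > mid
```

On the real data, the same comparison would show how far the premium tail lifts the average away from the typical item.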
Licenses & demand signals:
The portfolio is dominated by big, evergreen IPs. The Top Licenses view shows NBA at #1, followed by The Nightmare Before Christmas, Five Nights at Freddy’s, Star Wars, Harry Potter, NFL, The Mandalorian, Batman, Funko (original), and My Hero Academia.
The License vs. Interest treemap concentrates large tiles around NBA, Disney/Nightmare Before Christmas, Star Wars, DC, Video Games, and anime. This suggests sustained collector attention around sports and long-running franchises.
Release cadence & mix:
Monthly Releases (stacked by product type) reveal seasonal spikes aligned with major franchise drops and holiday windows, while baseline months hover near the dashboard’s average line.
Pop! figures lead in volume, while other formats (Pop! Jumbo, Pins, Plush, Keychains, Vinyl GOLD, Apparel, Board Games, Action Figures) appear as smaller but sometimes higher-priced niches.
How to use this analysis
Pricing strategy: Treat $10–$15 as the reference band and test premiums only for large tile licenses or special formats.
Licensing & inventory: Prioritize replenishment and variant bets for top tiles like NBA, Star Wars, and DC.
Calendar planning: Align drops with monthly peaks and piggyback on franchise events to ride natural demand waves.
Next steps: Add time series features (moving averages by license), and join to secondary data (Google Trends, release schedules, convention dates) to explain peak months and forecast tile growth.
Reproducibility & caveats
Counts represent SKUs, not sales volume.
Prices are listed prices; promotions and secondary market dynamics are out of scope.
Notebook includes cleaning steps (type standardization, outlier checks) and an export for Tableau.
Explore the dashboard to filter by license or product type, compare months against the average line, and use the treemap to trace where fan interest concentrates across the catalog.

License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically.
Visualizing cholesterol (CL) fluctuation in plasma membranes is a crucially important yet challenging task in cell biology. Here, we proposed a new imaging strategy based on permeability changes of plasma membranes triggered by different CL contents to result in controllable spatial distribution of single fluorescent probes (SF-probes) in subcellular organelles. Three spatial distribution-controllable SF-probes (PMM-Me, PMM-Et, and PMM-Bu) for imaging CL fluctuation in plasma membranes were rationally developed. These SF-probes target plasma membranes and mitochondria at normal CL levels, while they display solely staining in plasma membranes and mitochondria at increased and decreased CL levels, respectively. These polarity-sensitive probes also show distinct emission colors with fluorescence peaks of 575 and 620 nm in plasma membranes and mitochondria, respectively. Thus, the CL fluctuation in plasma membranes can be clearly visualized by means of the spatially distributed and two-color emissive SF-probes.