This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:
-How often do people visit a location? (daily, monthly, absolute, and averages).
-What types of places do they visit? (parks, schools, hospitals, etc.)
-What social characteristics do people at a given POI have? Breakdown by type: residents, workers, visitors.
-What is their mobility like during night hours & day hours?
-What is the frequency of visits, partitioned by day of the week and hour of the day?
Extra insights -Visitors' relative income level. -Visitors' preferences, as derived from their visits to shopping, parks, sports facilities, churches, among others.
Overview & Key Concepts Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process is compliant with applicable privacy laws.
We clean and process these massive datasets with a number of complex, compute-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations (a minimal sketch of this estimate appears after the attribute list below).
Night base of the device: we calculate the approximate location where the device spends the night, which is usually its home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually its work location.
Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.
Category of visited POI: for each observation that can be attributed to a POI, we also include a standardized location category (park, hospital, among others). Coverage: Worldwide.
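To make the device-speed attribute concrete, here is a minimal sketch of how such an estimate could be computed from consecutive pings. This is an illustration only, not the provider's actual pipeline; the column names, thresholds, and mode labels are assumptions.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def add_speed(pings: pd.DataFrame) -> pd.DataFrame:
    """Estimate speed (km/h) from consecutive pings of the same device.

    Assumed columns: device_id, timestamp (datetime), lat, lon.
    """
    pings = pings.sort_values(["device_id", "timestamp"]).copy()
    grp = pings.groupby("device_id")
    dist_km = haversine_km(grp["lat"].shift(), grp["lon"].shift(), pings["lat"], pings["lon"])
    dt_h = (pings["timestamp"] - grp["timestamp"].shift()).dt.total_seconds() / 3600.0
    pings["speed_kmh"] = dist_km / dt_h
    # Rough mode labels; real thresholds would be provider-specific.
    pings["mode"] = pd.cut(pings["speed_kmh"], bins=[-np.inf, 1, 8, np.inf],
                           labels=["stationary", "pedestrian", "vehicle"])
    return pings
```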
Delivery schemas We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements of the data. It helps identify who visited a specific POI and characterize consumer behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). The entire visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful for quickly understanding the types of consumers in a particular area and for creating cohorts of users.
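As a rough illustration of how the visitation stream relates to audience profiles, a per-visit table can be rolled up into one row per device per month with visit counts by category. The column names below are assumptions, not the provider's actual schema.

```python
import pandas as pd

# Hypothetical visitation stream: one row per attributable visit.
visits = pd.DataFrame({
    "device_id": ["a", "a", "b", "b", "b"],
    "month": ["2024-01"] * 5,
    "poi_category": ["park", "hospital", "park", "park", "school"],
})

# Audience-profile-style rollup: one row per device per month,
# with visit counts aggregated by POI category.
profiles = (
    visits.groupby(["device_id", "month", "poi_category"])
    .size()
    .unstack("poi_category", fill_value=0)
    .reset_index()
)
print(profiles)
```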
https://www.pioneerdatahub.co.uk/data/data-request-process/
Background Acute compartment syndrome (ACS) is an emergency orthopaedic condition wherein a rapid rise in compartmental pressure compromises blood perfusion to the tissues, leading to ischaemia and muscle necrosis. This serious condition is often misdiagnosed or associated with significant diagnostic delay, and can lead to limb amputation and death.
The most common cause of ACS is high-impact trauma, especially fractures of the lower limbs, which account for 40% of ACS cases. ACS is a challenge to diagnose and treat effectively, with differing clinical thresholds being utilised, which can result in unnecessary fasciotomy. The highly granular synthetic data for over 900 patients with ACS provide the following key parameters to support critical research into this condition:
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Scope: Enabling data-driven research and machine learning models towards improving the diagnosis of Acute compartment syndrome. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes highly granular patient demographics, physiological parameters, muscle biomarkers, blood biomarkers and co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings and admissions), presenting complaint, lab analysis results (eGFR, troponin, CRP, INR, ABG glucose), systolic and diastolic blood pressures, procedures and surgery details.
Available supplementary data: ACS cohort, Matched controls; ambulance, OMOP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
This dataset offers valuable insights into the demographic profile of a specific population, with data on factors such as age, income, and gender distribution, as well as number of homes and spending habits categorized into major expenditure categories such as food, transportation, and healthcare.
The data is geocoded using geohash7 cells (approximately 152.9 m x 152.4 m), providing a fine-grained representation of the population distribution.
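For context, a precision-7 geohash corresponds to a cell of roughly 153 m on a side. The sketch below shows how a coordinate could be mapped to its geohash7 cell, assuming the third-party pygeohash package; the example coordinate is illustrative.

```python
import pygeohash as pgh  # assumed third-party dependency

# Map an illustrative coordinate to its geohash7 cell (~152.9 m x 152.4 m).
cell = pgh.encode(19.4326, -99.1332, precision=7)
print(cell)  # 7-character geohash; all points falling in the same cell share it
```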
This information is a valuable resource for companies, researchers, and policymakers looking to gain a deeper understanding of the economic and social landscape of a community.
Utilizing this data, they can make informed decisions related to resource allocation, planning, and policy development, and tailor initiatives to effectively address the challenges and opportunities facing the population.
The dataset can be provided by country, state, municipality, colony, zone, polygon, etc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data indicates by calendar year the What Works Cities certification level achieved by the City of Tempe. Certification helps cities benchmark their progress and develop a roadmap for using data and evidence to deliver results for residents. This data table supports the Data-Driven Governance performance measure. The performance measure page is available at 3.35 Data-Driven Governance.
Additional Information: (pending)
Source: Excel
Contact (author): Stephanie Deitrick
Contact E-Mail (author): Stephanie_Deitrick@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel
Preparation Method: Manual
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
This paper provides a review of three different advanced machine learning algorithms for anomaly detection in continuous data streams from a ground-test firing of a subscale Solid Rocket Motor (SRM). This study compares Orca, one-class support vector machines, and the Inductive Monitoring System (IMS) for anomaly detection on the data streams. We measure the performance of the algorithms with respect to the detection horizon for situations where fault information is available. These algorithms have also been studied by the present authors (and other co-authors) as applied to liquid propulsion systems. The trade space between these algorithms is explored for both types of propulsion systems.
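The abstract does not include code, but as a hedged sketch of one of the three techniques it names (one-class support vector machines), anomaly detection on a synthetic multichannel sensor stream with scikit-learn could look roughly like this; the data, features, and thresholds are purely illustrative and are not the paper's setup.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for nominal sensor data (e.g., pressure, strain channels).
train_nominal = rng.normal(0.0, 1.0, size=(2000, 4))
# Test stream with an injected drift in one channel after sample 300.
test = rng.normal(0.0, 1.0, size=(600, 4))
test[300:, 2] += 4.0

scaler = StandardScaler().fit(train_nominal)
model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(scaler.transform(train_nominal))

# +1 = nominal, -1 = anomalous; the first flagged index is a crude detection horizon.
labels = model.predict(scaler.transform(test))
first_alarm = int(np.argmax(labels == -1)) if (labels == -1).any() else -1
print("first anomalous sample index:", first_alarm)
```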
This dataset represents 9,097 basin boundaries (rdews_gages.shp) of select U.S. Geological Survey (USGS) active and historical stream gages derived from the published datasets of stream gage basins (Wieczorek, 2006), GAGESII (Falcone, 2011), and delineated from digital elevation models found in the NHDPlus version 1 data suite (NHDPlus, 2006). These basins were created to assist in spatial processing of model inputs for the USGS Data-Driven Drought Prediction Project of the Drought Science Program within the Water Resources Mission Area's Water Resource Availability Program.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset belonging to the paper: Data-Driven Machine Learning-Informed Framework for Model Predictive Control in Vehicles
labeled_seed.csv: Processed and labeled data of all maneuvers combined into a single file, sorted by label
raw_track_session.csv: Untouched CSV file from Racebox track session
unlabeled_exemplar.csv: Processed but unlabeled data of street and track data
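A minimal loading and inspection sketch for the three files, assuming pandas is available; the column names (including a literal "label" column) are assumptions, since the schema is not documented here.

```python
import pandas as pd

labeled = pd.read_csv("labeled_seed.csv")          # processed, labeled maneuvers (sorted by label)
raw = pd.read_csv("raw_track_session.csv")         # untouched Racebox track session export
unlabeled = pd.read_csv("unlabeled_exemplar.csv")  # processed but unlabeled street/track data

for name, df in [("labeled_seed", labeled), ("raw_track_session", raw), ("unlabeled_exemplar", unlabeled)]:
    print(name, df.shape, list(df.columns)[:8])

# If the label column is literally named "label" (an assumption), inspect class balance:
if "label" in labeled.columns:
    print(labeled["label"].value_counts())
```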
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 6 rows and is filtered to the book The joy of Dreamweaver MX : recipes for data-driven Web sites. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
These tabular data sets represent daily climate metrics processed from 4-kilometer snow water equivalent (SWE) raster data in millimeters (Broxton and others, 2019) for the period of record 10-01-1981 through 09-30-2020 and compiled for three spatial components: 1) select United States Geological Survey stream gage basins (Staub and Wieczorek, 2023), 2) individual reach flowline catchments of the Upper and Lower Colorado (ucol) portions of the Geospatial Fabric for the National Hydrologic Model, version 1.1 (nhgfv11; Bock and others, 2020), and 3) the upstream watersheds of each individual nhgfv11 flowline catchment. Flowline reach catchment information characterizes data at the local scale and was processed using the Python tool set gdptools (McDonald, 2021). Reach catchments accumulated upstream through the river network characterize cumulative upstream conditions. Network-accumulated values were computed using the published Python software package Xstrm (Wieferich and others).
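The upstream network accumulation described above can be illustrated with a toy sketch: each catchment carries a local value, and the accumulated value sums everything upstream. This is only the idea, not the Xstrm or gdptools API, and real workflows typically use area-weighted aggregation rather than plain sums.

```python
from functools import lru_cache

# Toy flowline network: each catchment ID maps to its immediate upstream catchments.
upstream = {
    "A": [],          # headwater
    "B": [],          # headwater
    "C": ["A", "B"],  # A and B drain into C
    "D": ["C"],       # C drains into D (outlet)
}
# Local (incremental) SWE-derived value for each catchment, e.g. mean SWE in mm.
local_value = {"A": 120.0, "B": 80.0, "C": 40.0, "D": 10.0}

@lru_cache(maxsize=None)
def accumulated(catchment: str) -> float:
    """Local value plus the accumulated values of all upstream catchments."""
    return local_value[catchment] + sum(accumulated(u) for u in upstream[catchment])

print({c: accumulated(c) for c in upstream})  # outlet D accumulates 250.0
```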
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For research evaluation, a new subject classification scheme must often be applied to a pre-classified bibliographic database. Adopting a new subject classification scheme is generally labor-intensive and time-consuming, so an effective and efficient approach is necessary. Hence, we propose an approach for applying a new subject classification scheme to a subject-classified database using a data-driven correspondence between the new and existing schemes. In this paper, we define a subject classification model of the bibliographic database comprising a topological space. We then present our approach based on this model, in which forming a compact topological space is required for the new subject classification scheme. To form the space, a correspondence between the two subject classification schemes, obtained from a research project database, is utilized as data. As a case study, we applied our approach to a practical example: a tool used worldwide as a proprietary benchmark for research evaluation based on a citation database, to which we attempted to add a new subject classification derived from a research project database.
https://dataintelo.com/privacy-and-policy
The Database Platform as a Service (DBPaaS) market is poised for substantial growth, with a market size that was valued at USD 9.5 billion in 2023 and is projected to reach USD 25.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% during the forecast period. This remarkable growth is driven by factors such as the increasing adoption of cloud-based solutions, the surge in data generation across various sectors, and the need for scalable and efficient database management systems. Furthermore, the growing demand for real-time data analytics to derive actionable insights and the rising trend of digital transformation across industries are further propelling the market's expansion.
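For readers who want to sanity-check figures like these, the stated base-year value, projection, and CAGR are related by simple compound growth over the nine-year forecast horizon:

```python
# Compound growth: value_end = value_start * (1 + cagr) ** years
start, end, years = 9.5, 25.3, 2032 - 2023   # USD billion, 9-year horizon
implied_cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR: {implied_cagr:.1%}")                            # ~11.5%
print(f"value implied by 11.5% CAGR: {start * 1.115 ** years:.1f}")   # ~25.3
```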
One of the critical growth drivers of the DBPaaS market is the widespread embrace of cloud technology across businesses of all sizes. As organizations increasingly migrate their operations to the cloud, the demand for flexible and cost-effective database management solutions has surged. DBPaaS allows companies to manage databases without the need for complex on-premises infrastructure, enabling them to focus more on their core business objectives. This cloud-first approach is particularly appealing to small and medium enterprises (SMEs) that may lack the resources to maintain robust IT infrastructures, thereby fueling market growth across this segment.
Moreover, the acceleration of digital transformation initiatives across various industries is another pivotal factor influencing the growth of the DBPaaS market. Industries such as BFSI, healthcare, IT and telecommunications, and retail are increasingly relying on digital solutions to optimize their operations, improve customer experiences, and gain competitive advantages. As these sectors generate vast amounts of data, the need for efficient and scalable database management systems becomes paramount. DBPaaS offers these industries the agility and scalability required to handle their data needs effectively, thereby contributing significantly to market expansion.
The ongoing advancements in real-time data analytics and the increasing importance of data-driven decision-making are also boosting the DBPaaS market. Organizations today are keen on leveraging big data and analytics to enhance business operations and customer satisfaction. DBPaaS solutions provide the necessary infrastructure and tools to manage and analyze large datasets efficiently, allowing businesses to derive insights that can drive strategic initiatives. The ability to access real-time data analytics is crucial for industries like retail and BFSI, where timely decisions can significantly impact performance and profitability.
As the DBPaaS market continues to evolve, the concept of a Database Private Cloud is gaining traction among organizations seeking enhanced security and control over their data. Unlike public cloud solutions, a Database Private Cloud offers dedicated resources and infrastructure, ensuring higher levels of data privacy and compliance with industry regulations. This model is particularly appealing to sectors such as healthcare and BFSI, where data sensitivity and confidentiality are paramount. By opting for a Database Private Cloud, businesses can maintain greater oversight of their data environments, tailoring their database management strategies to meet specific security and operational requirements. This approach not only enhances data protection but also allows for more customized and efficient database solutions, aligning with the growing demand for secure cloud-based services.
Regionally, North America dominates the DBPaaS market due to the early adoption of innovative technologies and the presence of major cloud service providers. The region's mature IT infrastructure, coupled with a strong focus on digital transformation across verticals, creates a conducive environment for DBPaaS growth. Meanwhile, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. Factors such as increasing investments in cloud infrastructure, rapid economic development, and the rising uptake of cloud services by SMEs in countries like India and China contribute to this regional surge. Europe also demonstrates steady growth, driven by stringent data protection regulations that encourage cloud adoption and database management solutions.
The DBPaaS market is segmented based on service types into managed services and pr
Digital light processing (DLP) vat photopolymerization (VP) additive manufacturing (AM) uses patterned UV light to selectively cure a liquid photopolymer into a solid layer. Subsequent layers are printed onto preceding layers to eventually form a desired three-dimensional (3D) part. This data set characterizes the 3D geometry of a single layer of voxels (volume pixels) printed with photomasks assigned random intensity levels at every pixel. The masks are computer generated, then printed onto a glass cover slide. The geometry of the printed voxels is characterized by laser scanning confocal microscopy. The data were originally curated to train image-to-image U-net machine learning models to predict voxel-scale geometry given arbitrary photomasks, as described in the publication "A Data-Driven Approach to Complex Voxel Predictions in Grayscale Digital Light Processing Additive Manufacturing Using U-nets and Generative Adversarial Networks". Data are provided raw (native microscope format and photomask image) and processed into aligned mask-print training pairs. A total of 1500 8 pixel × 8 pixel (i.e. 96 000 pixel interactions) training pairs are provided. Jupyter notebooks for various steps in the process are also provided.
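As a small illustration of the randomized photomasks described above (not the dataset's actual generation or packaging code), an 8 × 8 mask with a random intensity level at every pixel can be generated as follows; in a training pair, such a mask would be coupled with the measured voxel geometry of the corresponding print.

```python
import numpy as np

# Generate an 8 x 8 grayscale photomask with a random intensity at every pixel,
# mirroring the randomized masks described above (8-bit levels are an assumption).
rng = np.random.default_rng(0)
mask = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
print(mask)
```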
https://dataintelo.com/privacy-and-policy
The global database management services market size was estimated at USD 20.5 billion in 2023 and is projected to reach USD 40.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 7.6% during the forecast period. A significant growth factor propelling this market is the increasing digital transformation initiatives across various industries, which drive the need for robust database management solutions.
One of the primary growth drivers for the database management services market is the exponential growth of data generated globally. Enterprises are increasingly digitizing their operations, generating massive volumes of data that need efficient management. Furthermore, the proliferation of cloud computing has made the storage and management of data more flexible and scalable, fueling the adoption of cloud-based database management services. Another critical aspect is the advent of big data analytics, which demands advanced database management systems to handle and process large datasets effectively.
The increasing adoption of advanced technologies such as artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT) is also contributing significantly to the market's growth. These technologies require robust database management systems to store and analyze the vast amounts of data they generate. Businesses are recognizing the value of data-driven insights for making informed decisions, thereby accelerating the demand for sophisticated database management services. Additionally, regulatory requirements for data storage and management are becoming more stringent, compelling organizations to adopt advanced database management systems to ensure compliance.
The growing trend of remote work and the need for real-time data access also play a crucial role in the market's expansion. With more employees working remotely, the demand for seamless and secure data access has surged, leading to a higher need for effective database management solutions. Moreover, the rise of e-commerce and online services has led to an increased demand for efficient and scalable database management systems to handle customer data, transactions, and other critical information.
From a regional perspective, North America holds a significant share of the database management services market, primarily due to the presence of major technology companies and early adoption of advanced technologies. The Asia-Pacific region is expected to witness the highest growth rate during the forecast period, driven by rapid industrialization, increasing digitalization, and growing investments in IT infrastructure. Europe and Latin America are also experiencing steady growth, with organizations in these regions increasingly adopting database management solutions to enhance operational efficiency and drive business growth.
Database management services can be segmented by service type into consulting, implementation, maintenance, and support. Consulting services involve providing expert advice and strategies for database management tailored to an organization’s specific needs. As businesses strive to integrate more sophisticated data solutions, the demand for consulting services is expected to grow. Consultants help identify the most suitable database management systems, optimize existing infrastructure, and ensure that data policies comply with regulatory standards, thus driving the segment's growth.
Implementation services encompass the deployment of database management systems and solutions within an organization. This segment is poised for significant growth as companies move towards modernizing their IT infrastructures. Implementation services ensure seamless integration of new systems with existing technologies, minimizing disruption and enhancing data accessibility and security. With the rise of cloud computing, implementation services are increasingly focused on migrating on-premises databases to cloud-based solutions, which offers scalability and cost-efficiency.
Maintenance services involve the ongoing management and upkeep of database systems to ensure their optimal performance. This includes regular updates, security patches, and troubleshooting to prevent downtime and data loss. As businesses become more reliant on data-driven operations, the importance of maintenance services cannot be overstated. These services ensure that databases remain functional, secure, and efficient, thereby supporting continuous business operations and data availabilit
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4812
This repository contains the Matlab code and generated data for the manuscript "Data-driven geometric parameter optimization for PD-GMRES", which uses a quadtree approach to optimize parameters for the iterative solver PD-GMRES. It includes hardware-specific data to allow for reproducibility of our results. Our calculations were performed using MATLAB R2019a and should be reproducible up to and including version R2022a. A change in version R2022b leads to different numerical behavior. However, the code does run on newer Matlab versions. Further information is contained in the README.
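The archived code is Matlab, but the role of the restart length that PD-GMRES adapts can be illustrated with plain restarted GMRES in SciPy; this is ordinary fixed-restart GMRES on a toy system, not the PD-GMRES strategy or the paper's test matrices.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Small nonsymmetric tridiagonal test system.
n = 500
A = diags([-1.0, 2.5, -1.2], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Restarted GMRES: m (the restart length) is exactly the kind of parameter
# that PD-GMRES varies adaptively between outer cycles.
for m in (10, 30, 50):
    x, info = gmres(A, b, restart=m, maxiter=2000)
    residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    print(f"restart={m}: info={info}, relative residual={residual:.2e}")
```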
Accurate rainfall-runoff modelling is particularly challenging due to complex nonlinear relationships between various factors such as rainfall characteristics, soil properties, land use, and temporal lags. Recently, with improvements to computation systems and resources, data-driven models have shown good performance for runoff forecasting. However, the relative performance of common data-driven models at small temporal resolutions is still unclear. This study presents an application of data-driven models using artificial neural network, support vector regression, and long short-term memory approaches and distributed forcing data for runoff predictions from 2010 to 2019 in the Russian River basin, California, USA. These models were used to predict hourly runoff with 1–6 hours of lead time using precipitation, soil moisture, baseflow, and land surface temperature datasets provided by the North American Land Data Assimilation System. The predicted results were evaluated in terms of seasonal and event-based performance using various statistical metrics. The results showed that the long short-term memory and support vector regression models outperform the artificial neural network model for hourly runoff forecasting, and the predictive performance of the models was greater during the wet seasons than during the dry seasons. In addition, a comparison of the data-driven model results with the National Water Model, a fully distributed physics-based hydrologic model, showed that the long short-term memory and support vector regression models provide comparable performance. The results demonstrate that data-driven models for hourly runoff forecasting are sufficiently predictive and useful in areas where observation systems are not available.
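As a hedged sketch of the general setup (not the authors' exact models, forcings, or hyperparameters), hourly runoff forecasting with lagged forcing features and support vector regression could be assembled like this; the synthetic data and lag/lead choices are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Hypothetical hourly table: precipitation, soil moisture, baseflow, surface temperature, runoff.
rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "precip": rng.gamma(0.3, 2.0, n),
    "soil_moisture": rng.uniform(0.1, 0.4, n),
    "baseflow": rng.uniform(0.5, 2.0, n),
    "lst": rng.normal(15, 8, n),
})
df["runoff"] = 0.6 * df["precip"].rolling(6, min_periods=1).sum() + 0.5 * df["baseflow"] + rng.normal(0, 0.1, n)

lead = 6            # predict runoff 6 hours ahead
lags = range(0, 6)  # use the last 6 hours of forcings as features
X = pd.concat(
    [df[["precip", "soil_moisture", "baseflow", "lst"]].shift(k).add_suffix(f"_lag{k}") for k in lags],
    axis=1,
)
y = df["runoff"].shift(-lead)
mask = X.notna().all(axis=1) & y.notna()
X, y = X[mask], y[mask]

split = int(0.8 * len(X))
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01))
model.fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])
print("test RMSE:", mean_squared_error(y.iloc[split:], pred) ** 0.5)
```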
This metadata record describes outputs from 12 configurations of long short-term memory (LSTM) models which were used to predict streamflow drought occurrence at 384 stream gage locations in the Colorado River Basin region. The models were trained on data from 01-Oct-1981 to 31-Mar-2005 and validated over the period of record spanning 01-Apr-2005 to 31-Mar-2014. The models use explanatory variable inputs described in Wieczorek (2023) (doi.org/10.5066/P98IG8LO) to predict daily streamflow and streamflow percentiles as described in Simeone (2022) (doi.org/10.5066/P92FAASD). Separate models were trained to predict daily streamflow and streamflow percentiles. Two types of percentiles were modeled: (1) fixed-threshold percentiles that are based on comparing all streamflow throughout the year, and (2) variable-threshold percentiles that compare streamflow separately for each day of the year (using a moving 30-day window). Separate models were trained for predicting at lead times of 0, 7 and 14 days ahead. Details on methods and model configurations can be found in Hamshaw and others (2023). The comma-separated files are grouped by target variables and lead times as listed in the table below and include model output for the validation period (01-Apr-2005 to 31-Mar-2014). This metadata record also includes model code (see Readme.txt within the CRB_NN_model_archive.zip for more details) and a model performance metrics file (model_validation_performance_metrics_by_gage.csv).
| Data File | Prediction target variable | Forecast lead time | Model Configurations |
| --- | --- | --- | --- |
| streamflow_model_predictions_0day_ahead.csv | Daily Streamflow (mm/day) | 0 days | Streamflow-0d, PUB-Streamflow-0d |
| streamflow_model_predictions_7day_ahead.csv | Daily Streamflow (mm/day) | 7 days | Streamflow-7d |
| streamflow_model_predictions_14day_ahead.csv | Daily Streamflow (mm/day) | 14 days | Streamflow-14d |
| percentile_fixed_model_predictions_0day_ahead.csv | Fixed Percentile | 0 days | Fixed-0d, PUB-Fixed-0d, Q-to-Fixed-0d |
| percentile_fixed_model_predictions_7day_ahead.csv | Fixed Percentile | 7 days | Fixed-7d |
| percentile_fixed_model_predictions_14day_ahead.csv | Fixed Percentile | 14 days | Fixed-14d |
| percentile_variable_model_predictions_0day_ahead.csv | Variable Percentile | 0 days | Variable-0d, PUB-Variable-0d, Q-to-Variable-0d |
| percentile_variable_model_predictions_7day_ahead.csv | Variable Percentile | 7 days | Variable-7d |
| percentile_variable_model_predictions_14day_ahead.csv | Variable Percentile | 14 days | Variable-14d |
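To make the two percentile targets concrete, the sketch below computes a fixed-threshold percentile (relative to all flows in the record) and a variable-threshold percentile (relative to flows in a 30-day window around the same day of year) for a synthetic daily series; it follows the general idea described above, not the exact Simeone (2022) implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

# Hypothetical daily streamflow series for one gage.
idx = pd.date_range("1981-10-01", "2014-03-31", freq="D")
rng = np.random.default_rng(42)
q = pd.Series(np.exp(rng.normal(0, 1, len(idx))) * (1 + np.sin(2 * np.pi * idx.dayofyear / 365)), index=idx)

def fixed_percentile(series, date):
    """Percentile of flow on `date` relative to all flows in the record."""
    return percentileofscore(series.values, series[date])

def variable_percentile(series, date, window=30):
    """Percentile of flow on `date` relative to flows within a moving window
    of +/- window/2 days around the same day of year, across all years."""
    doy = pd.Timestamp(date).dayofyear
    diff = (series.index.dayofyear - doy) % 365
    in_window = (diff <= window // 2) | (diff >= 365 - window // 2)
    return percentileofscore(series[in_window].values, series[date])

d = "2010-07-15"
print("fixed-threshold percentile:   ", round(fixed_percentile(q, d), 1))
print("variable-threshold percentile:", round(variable_percentile(q, d), 1))
```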
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as from non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files use the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
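As a hedged illustration of working with the stated naming convention and NPZ packaging, a minimal sketch might look like this; the file name is purely illustrative, and no assumptions are made about the array key names inside each archive.

```python
import numpy as np

# Parse the stated convention {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip;
# this file name is illustrative, not an actual item in the release.
fname = "NAIP_4_001.zip"
datasource, n_classes, version = fname[: -len(".zip")].split("_")
print(datasource, int(n_classes), version)

# Each image / annotation / label mask is packaged as an NPZ archive; np.load
# detects the format regardless of extension, so just list whatever it contains.
with np.load(fname) as data:
    print("arrays in archive:", list(data.keys()))
```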
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In a resource-constrained world with a growing population and growing demand for energy, goods, and services, with commensurate environmental impacts, we need to understand how these trends relate to various aspects of economic activity. 7see-GB is a computational model that links energy demand through to final economic consumption, and is used to explore decadal scenarios for the UK macroeconomy. This dataset includes two published models (*.vpm) from the source model 7see-GB, version 5-10 (22Apr15). They show how the results were created for the paper 'A Robust Data-driven Macro-socioeconomic-energy Model'. The source model was developed in Vensim(r) (5.8b) and these published models can be viewed with the Vensim Reader, as provided with this dataset. There are instructions on how to navigate the published models and inspect the variables shown in the paper. The .exe and .dmg files are free 'Model Reader' executables for Windows/OSX which allow a user to run the model without buying the Vensim simulator.
This dataset contains datatables from a USDA-sponsored big data project focused on collecting...
Among various approaches for implementing prognostic algorithms, data-driven algorithms are popular in industry due to their intuitive nature and relatively fast development cycle. However, no matter how easy it may seem, there are several pitfalls that one must watch out for while developing a data-driven prognostic algorithm. One such pitfall is the uncertainty inherent in the system. At each processing step, uncertainties compound and, if not carefully managed throughout the various steps of the algorithm, can grow beyond control in the predictions. This paper presents analysis from our preliminary development of a data-driven algorithm for predicting end of discharge of Li-ion batteries using constant-load experiment data, and the challenges faced when applying these algorithms to randomized variable loading profiles, as is the case in realistic applications. Lessons learned during the development phase are presented.