https://spdx.org/licenses/CC0-1.0.html
Colour patterns and their visual backgrounds consist of a mosaic of patches that vary in colour, brightness, size, shape and position. Most studies of crypsis, aposematism, sexual selection, or other forms of signalling concentrate on one or two patch classes (colours), either ignoring the rest of the colour pattern, or analysing the patches separately. We summarize methods of comparing colour patterns making use of known properties of bird eyes. The methods are easily modifiable for other animal visual systems. We present a new statistical method to compare entire colour patterns rather than comparing multiple pairs of patches. Unlike previous methods, the new method detects differences in the relationships among the colours, not just differences in colours. We present tests of the method's ability to detect a variety of kinds of differences between natural colour patterns and provide suggestions for analysis.
This dataset features over 160,000 high-quality images of patterns sourced from photographers worldwide. Designed to support AI and machine learning applications, it provides a diverse and richly annotated collection of pattern imagery.
Key Features:
1. Comprehensive Metadata: The dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Additionally, each image is pre-annotated with object and scene detection metadata, making it ideal for tasks like classification, detection, and segmentation. Popularity metrics, derived from engagement on our proprietary platform, are also included.
2. Unique Sourcing Capabilities: The images are collected through a proprietary gamified platform for photographers. Competitions focused on pattern photography ensure fresh, relevant, and high-quality submissions. Custom datasets can be sourced on-demand within 72 hours, allowing for specific requirements such as particular pattern types (e.g., geometric, organic, textile) or stylistic preferences to be met efficiently.
3. Global Diversity: Photographs have been sourced from contributors in over 100 countries, ensuring a vast array of visual patterns captured in various cultural, architectural, and natural contexts. The images feature varied environments, including fabric textures, wallpapers, cityscapes, fractals, and abstract art, offering a rich visual spectrum for training and analysis.
4. High-Quality Imagery: The dataset includes images with resolutions ranging from standard to high-definition to meet the needs of various projects. Both professional and amateur photography styles are represented, offering a mix of artistic and practical perspectives suitable for a variety of applications.
5. Popularity Scores: Each image is assigned a popularity score based on its performance in GuruShots competitions. This unique metric reflects how well the image resonates with a global audience, offering an additional layer of insight for AI models focused on user preferences or engagement trends.
6. AI-Ready Design: This dataset is optimized for AI applications, making it ideal for training models in tasks such as pattern recognition, style classification, and image generation. It is compatible with a wide range of machine learning frameworks and workflows, ensuring seamless integration into your projects.
7. Licensing & Compliance: The dataset complies fully with data privacy regulations and offers transparent licensing for both commercial and academic use.
Use Cases:
1. Training AI systems for visual pattern recognition and classification.
2. Enhancing fashion and interior design models through textile and decorative pattern analysis.
3. Building datasets for generative models and style transfer applications.
4. Supporting research in visual perception, cultural studies, and computational aesthetics.
This dataset offers a comprehensive, diverse, and high-quality resource for training AI and ML models, tailored to deliver exceptional performance for your projects. Customizations are available to suit specific project needs. Contact us to learn more!
The National Airspace System (NAS) is an ever-changing and complex engineering system. As the Next Generation Air Transportation System (NextGen) is developed, there will be an increased emphasis on safety and operational and environmental efficiency. Current operations in the NAS are monitored using a variety of data sources, including data from flight recorders, radar track data, weather data, and other massive data collection systems. Although numerous technologies exist to monitor the frequency of known but undesirable behaviors in the NAS, there are currently few methods that can analyze the large repositories to discover new and previously unknown events in the NAS. Having a tool to discover events that have implications for safety or incidents of operational importance increases the awareness of such scenarios in the community and helps to broaden the overall safety of the NAS, whereas monitoring only the frequency of known events can provide mitigations only for already established problems. This paper discusses a novel approach for discovering operationally significant events in the NAS that are currently not monitored and have potential safety and/or efficiency implications using radar-track data. This paper will discuss the discovery algorithm and describe in detail some flights of interest with comments from subject matter experts who are familiar with the operations in the airspace that was studied.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. The "dingxiang_datas.xls" file contains all the original data crawled from the DingXiang forum, together with the word segmentation result for each medical record.
2. The "pmi_new_words.txt" file lists the new medical words found by calculating mutual information.
3. The "association_rules" folder contains the association rules mined from the dataset, with the h-confidence threshold set to 0.3 and the support threshold set to 0.0001.
4. The "network_communities.csv" file describes the complication communities.
Note: a "d" tag means the word belongs to the disease description vocabulary, and a "z" or "s" tag marks a word from the symptom description vocabulary.
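To make the two thresholds concrete, here is a minimal sketch of how support and h-confidence could be computed for a candidate itemset. It assumes the common definition of h-confidence (the itemset's support divided by the largest support of any single item in it); the toy records and function names are hypothetical and are not taken from the dataset or its authors' code.

```python
def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for record in records if items <= set(record))
    return hits / len(records)


def h_confidence(itemset, records):
    """Itemset support divided by the largest single-item support (assumed definition)."""
    largest = max(support([item], records) for item in itemset)
    return support(itemset, records) / largest if largest > 0 else 0.0


def passes_thresholds(itemset, records, min_support=0.0001, min_h_confidence=0.3):
    """Apply the thresholds quoted in the description (0.0001 and 0.3)."""
    return (support(itemset, records) >= min_support
            and h_confidence(itemset, records) >= min_h_confidence)


# Hypothetical toy records; each record is the set of terms in one medical record.
records = [
    {"fever", "cough"},
    {"fever", "cough", "headache"},
    {"fever"},
    {"headache"},
]

print(support(["fever", "cough"], records))       # share of records with both terms
print(h_confidence(["fever", "cough"], records))  # support relative to the most common term
```

A high h-confidence means the terms rarely appear without each other, which is why it is often preferred over plain confidence when item frequencies are very uneven.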
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: What are the chances for various temperature conditions next month? A: Shaded areas show where average temperature has an increased chance of being warmer or cooler than usual. The darker the shading, the greater the chance for the indicated condition. White areas have equal chances for average temperatures that are below, near, or above the long-term average for the month. Q: What data do experts use to develop these forecasts? A: Climate scientists base future climate outlooks on current patterns in the ocean and atmosphere. They examine projections from climate and weather models and consider recent trends. They also check historical records to see what temperature conditions resulted from similar patterns in the past. Q: What do the colors mean? A: Colors on the map show experts’ level of confidence in their forecasts for above- or below-average temperatures. Each location on the map has some chance to experience average temperatures that rank in the bottom, middle, or top of records from the previous three decades. White areas have equal chances for all three conditions. Colors show where the odds for one of the conditions are higher than for the other two. A common mistake is to interpret these maps as predicted temperatures. However, dark orange or red areas are not predicted to be warmer than light orange areas. The darker orange areas simply have a higher likelihood for above-average temperatures than the lighter orange areas do. Similarly, dark blue areas are not predicted to be cooler than light blue areas. Keep in mind that outlooks show the most likely condition for each region, not the only possible outcome. You can visit the Data Snapshots interface to view previous temperature outlooks and compare them to monthly temperature observations. Q: Why do these data matter? A: Energy companies want to know how much energy people will need in the next month. Temperature outlooks can inform them when they should prepare to meet high demand for energy. 
Outlooks can also help them choose the best time to schedule maintenance procedures. Forestry managers also check temperature outlooks. When they see increased chances for warmer-than-usual weather, they prepare for more wildfires. Managers in agricultural industries also want to know if temperatures are likely to be warmer or cooler than usual. This information can help them optimize food production. Q: How did you produce these snapshots? A: Data Snapshots are derivatives of existing data products: to meet the needs of a broad audience, we present the source data in a simplified visual style. NOAA's Climate Prediction Center (CPC) produces the source images for monthly temperature outlooks. To produce our images, we run a set of scripts that access mapping layers from CPC, re-project them into desired projections at various sizes, and output them with a custom color bar.
Additional information: CPC issues monthly outlooks one-half month before the beginning of the month of interest. On the day before the new month begins, experts update the outlook for the upcoming month. Each monthly outlook in Data Snapshots shows the date the outlook was issued. Outlooks that include Alaska are available: while displaying an outlook of interest, click the Download button, select Full Resolution Assets, and then click OK.
References: One-Month to Three-Month Climate Outlooks. http://www.cpc.ncep.noaa.gov/products/forecasts/ Current Outlook Discussion http://www.cpc.ncep.noaa.gov/products/predictions/long_range/fxus07.html
Source: https://www.climate.gov/maps-data/data-snapshots/data-source/temperature-monthly-outlook
This upload includes two additional files:
* Temperature - Monthly Outlook _NOAA Climate.gov.pdf is a screenshot of the main Climate.gov site for these snapshots (https://www.climate.gov/maps-data/data-snapshots/data-so
PATTERN is a node classification task generated with Stochastic Block Models, which are widely used to model communities in social networks by modulating the intra- and extra-community connections, thereby controlling the difficulty of the task. PATTERN tests the fundamental graph task of recognizing specific predetermined subgraphs.
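As a rough illustration of the generative idea, the sketch below samples an undirected graph from a Stochastic Block Model with one probability for edges inside a community and another for edges between communities. It covers only the SBM step, not the embedding of the predetermined subgraph; the function name, block sizes, and probabilities are hypothetical choices, not the benchmark's actual settings.

```python
import random


def sample_sbm(block_sizes, p_intra, p_extra, seed=0):
    """Sample node labels and an undirected edge set from a Stochastic Block Model.

    p_intra: edge probability for two nodes in the same community.
    p_extra: edge probability for two nodes in different communities.
    """
    rng = random.Random(seed)
    # Node i gets the index of the community it belongs to.
    labels = [block for block, size in enumerate(block_sizes) for _ in range(size)]
    edges = set()
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = p_intra if labels[i] == labels[j] else p_extra
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges


# Hypothetical settings: three communities, dense inside, sparse between.
labels, edges = sample_sbm([20, 20, 20], p_intra=0.5, p_extra=0.05)
print(len(labels), len(edges))
```

Moving `p_extra` closer to `p_intra` blurs the community structure, which is the sense in which the two probabilities control how hard the task is.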
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UI Dark Patterns and Where to Find Them: A Study on Mobile Applications and User Perception
This dataset contains:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
These two scripts can be used to pull all of the water quality and flow data from the USGS "Current Conditions" page for a given state. You will first need to build a list of URLs using the URL script, and then you can use the second script to pull all of the data from each URL. The scraper is configured to use the maximum available date range for each variable of interest (e.g. discharge, conductivity), meaning that file sizes can range from very small to very large. If you find that your data is being cut off, you'll need to extend the "timeout" parameter within the URL request call (currently set to 199 s) so that the full extent of the data can load. The USGS prefers that automated data retrieval take place between 12 am and 6 am. Please adhere to these guidelines or your connection may be blocked. To facilitate friendly scraping, I've added a start time parameter to the second script. Set this variable ('then') to the time you want to start running the scraper. Another common issue is the parameter code dictionary. The USGS maintains a large list of parameters assigned to 5-digit codes. If a code is not present in the dictionary as is (i.e. dictionary 'd'), the URL will be saved to Broken_links.csv and data will not be collected. To remedy this, follow these URLs in your browser, find the new variables/codes, and then add them to the dictionary.
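The start-time and dictionary behaviour described above could look roughly like the sketch below. It is not the actual script: the names `then` and `d` and the file name Broken_links.csv follow the description, while the helper functions, the example URLs, and the unknown code "99999" are hypothetical. No request is made here; the real scripts perform the download with their own timeout setting.

```python
import csv
import datetime as dt


def seconds_until(start, now):
    """Seconds from `now` until the next occurrence of wall-clock time `start`."""
    target = dt.datetime.combine(now.date(), start)
    if target <= now:
        target += dt.timedelta(days=1)  # start time already passed today
    return (target - now).total_seconds()


def split_by_known_codes(url_codes, d):
    """Separate URLs whose parameter code is in dictionary `d` from those that are not."""
    known, broken = [], []
    for url, code in url_codes:
        (known if code in d else broken).append(url)
    return known, broken


def save_broken_links(urls, path="Broken_links.csv"):
    """Write one URL per row so the missing codes can be looked up by hand."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows([url] for url in urls)


# 'then' is the desired start time; 00:30 falls inside the 12 am to 6 am window.
then = dt.time(0, 30)
# A tiny stand-in for the parameter code dictionary 'd'.
d = {"00060": "discharge", "00095": "specific conductance"}

# Hypothetical (URL, parameter code) pairs.
url_codes = [("https://example.org/site-a", "00060"),
             ("https://example.org/site-b", "99999")]
known, broken = split_by_known_codes(url_codes, d)
print(seconds_until(then, dt.datetime.now()), known, broken)
```

In a full run the script would sleep for `seconds_until(then, ...)` seconds, fetch each URL in `known`, and pass `broken` to `save_broken_links`.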
This layer contains data on the number of establishments, total employment, and total annual payroll for 20 selected 4- and 5-digit North American Industry Classification System (NAICS) codes. This is shown by county and state boundaries. The full CBP data set (available at census.gov) is updated annually to contain the most currently released CBP data. This layer is symbolized to show the total number of establishments depicted by size, and the average annual pay per employee, depicted by color.
Current Vintage: 2017
CBP Table: CB1700CBP
Data downloaded from: Census Bureau's API for County Business Patterns
Date of API call: June 1, 2019
The United States Census Bureau's County Business Patterns Program (CBP):
About this Program Data Technical Documentation News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census Bureau and CBP when using this data.
Data Processing Notes: Boundaries come from the US Census Bureau TIGER geodatabases. Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census Bureau. These are Census Bureau boundaries with water and/or coastlines clipped for cartographic purposes. For census tracts, the water cutouts are derived from a subset of the 2010 AWATER (Area Water) boundaries offered by TIGER. For state and county boundaries, the water and coastlines are derived from the coastlines of the 500k TIGER Cartographic Boundary Shapefiles. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters). The States layer contains 56 records: all US states, Washington D.C., Puerto Rico, and U.S. Island Areas. Blank values represent industries where there either were no businesses in that industry and that geography OR industries where the data had to be withheld to avoid disclosing data for individual companies. Users should visit data.census.gov or Census Business Builder for more details on these withheld records.
https://www.gnu.org/licenses/agpl.txt
Business rules are an important part of the requirements of software systems that are meant to support an organization. These rules describe the operations, definitions, and constraints that apply to the organization. Within the software system, business rules are often translated into constraints on the values that are required or allowed for data, called data constraints. Business rules are subject to frequent changes, which in turn require changes to the corresponding data constraints in the software. The ability to efficiently and precisely identify where data constraints are implemented in the source code is essential for performing such necessary changes.
In this paper, we introduce Lasso, the first technique that automatically retrieves the method and line of code where a given data constraint is enforced. Lasso is based on traceability link recovery approaches and leverages results from recent research that identified line-of-code level implementation patterns for data constraints. We implement three versions of Lasso that can retrieve data constraint implementations when they are implemented with any one of 13 frequently occurring patterns. We evaluate the three versions on a set of 299 data constraints from 15 real-world Java systems, and find that they improve method-level link recovery by 30%, 70%, and 163%, in terms of true positives within the first 10 results, compared to their text-retrieval-based baseline. More importantly, the Lasso variants correctly identify the line of code implementing the constraint inside the methods for 68% of the 299 constraints.
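The evaluation measure quoted above (true positives within the first 10 results, and the relative improvement over a baseline) can be sketched as follows. This is only an illustration of the arithmetic; the function names, method names, and counts are hypothetical and do not come from the paper.

```python
def hits_at_k(ranked_methods, relevant_methods, k=10):
    """Count relevant methods that appear among the first k ranked results."""
    return sum(1 for method in ranked_methods[:k] if method in relevant_methods)


def relative_improvement(baseline_hits, improved_hits):
    """Percentage gain of the improved count over the baseline count."""
    return 100.0 * (improved_hits - baseline_hits) / baseline_hits


# Hypothetical ranking for one constraint: 12 candidate methods, 2 truly relevant.
ranking = [f"method_{i}" for i in range(12)]
relevant = {"method_3", "method_11"}
print(hits_at_k(ranking, relevant))   # only method_3 falls inside the top 10

# Hypothetical totals across all constraints for a baseline and a variant.
print(relative_improvement(10, 13))
```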
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: All Sectors: County Business Patterns, including ZIP Code Business Patterns, by Legal Form of Organization and Employment Size Class for the U.S., States, and Selected Geographies: 2023
Table ID: CBP2023.CB2300CBP
Survey/Program: Economic Surveys
Year: 2023
Dataset: ECNSVY Business Patterns County Business Patterns
Source: U.S. Census Bureau, 2023 Economic Surveys, Business Patterns
Release Date: 2025-06-26
Release Schedule: County Business Patterns (CBP) data, including ZIP Code Business Patterns (ZBP) data, are released annually around the month of June. For more information about CBP data releases, see County Business Patterns Updates.
Dataset Universe: The dataset universe consists of all establishments that are in operation for at least some part of 2023, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees, and are classified in one of nineteen in-scope sectors defined by the 2017 North American Industry Classification System (NAICS). For more information, see County Business Patterns Methodology.

Methodology
Data Items and Other Identifying Records:
Number of establishments
Annual payroll ($1,000)
First-quarter payroll ($1,000)
Number of employees (during the pay period including March 12)
Noise range for annual payroll, first-quarter payroll, and number of employees during the pay period including March 12
Definitions of data items can be found in the table by clicking on the column header and selecting “Column Notes” or by accessing the County Business Patterns Glossary.
Unit(s) of Observation: The units for CBP are employer establishments with paid employees extracted from the Business Register, Census Bureau's source of information on employer establishments. An establishment is a single physical location at which business is conducted or services or industrial operations are performed. An establishment is not necessarily equivalent to a company or enterprise, which may consist of one or more establishments. For more information, see County Business Patterns Methodology.
Geography Coverage: The data are shown at the U.S., State, County, Metropolitan and Micropolitan Statistical Areas, Combined Statistical Area, 5-digit ZIP code, and Congressional District levels. Also available are data for the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, and the U.S. Virgin Islands) at the state and county equivalent levels. Four additional employment-size classes (1,000 to 1,499 employees, 1,500 to 2,499 employees, 2,500 to 4,999 employees, and 5,000 or more employees) are available at the CSA, MSA, and county-levels. For information about geographic classification, see Program Methodology.
Industry Coverage: The data are shown at the 2- through 6-digit NAICS code levels for all sectors with published data, and for NAICS code 00 (Total for all sectors). ZBP data by employment size class, shown at the 2- through 6-digit NAICS code levels, only contains data on the number of establishments. ZBP data shown for NAICS code 00 (Total across all sectors) contains data on the number of establishments, total employment, first quarter payroll, and annual payroll. For information about industry coverage, see Program Methodology.
Business Characteristics: Data are classified by Legal Form of Organization (U.S. and state level only) and employment size category of the establishment (1,000 to 1,499 employees, 1,500 to 2,499 employees, 2,500 to 4,999 employees, and 5,000 or more employees). Definitions of data items can be found in the table by clicking on the column header and selecting “Column Notes” or by accessing the County Business Patterns Glossary.
Sampling: There is no sampling done for County Business Patterns. CBP data are derived from a complete tabulation of all establishments on the Census Bureau’s Business Register that meet the in-scope criteria for being included in CBP. For more information about methodology and data limitations, see County Business Patterns Methodology.
Confidentiality: The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7503949, Disclosure Review Board (DRB) approval number: CBDRB-FY25-0158). Beginning with reference year 2007, CBP and ZBP data are released using the Noise Infusion disclosure avoidance methodology to protect confidentiality. To comply with disclosure avoidance guidelines, data rows with fewer than three contributing establishments are not presented. In accordance with U.S. Code, Title 13, Section 9, no data are published that would disclose the operations of an individual employer. For more information on the coverage, disclosure avoidance, and methodology of the CBP and ZBP data products see Program Methodology.
Technical Documentation/Methodology: For detailed information, see Program Methodology.
Weigh...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
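To show what "direct" summary statistics look like in practice, here is a minimal sketch that reduces one spectrum to a handful of numbers. The abstract does not define PRE, so the version below is an assumption: the Shannon entropy (base 2) of the spectrum after scaling it to sum to one, for non-negative intensities. X4 is omitted because its definition is not given here. All names and the example spectra are hypothetical.

```python
import math


def summary_statistics(spectrum):
    """Summarize one spectrum (a list of non-negative intensities) with simple statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    variance = sum((x - mean) ** 2 for x in spectrum) / (n - 1)  # sample variance
    total = sum(spectrum)
    # Assumed PRE: entropy of the spectrum treated as a probability distribution.
    probs = [x / total for x in spectrum if x > 0]
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "one_norm": sum(abs(x) for x in spectrum),
        "range": max(spectrum) - min(spectrum),
        "ssq": sum(x * x for x in spectrum),
        "pre": -sum(p * math.log2(p) for p in probs),
    }


# Hypothetical spectra: a flat one and one with a single sharp peak.
flat = summary_statistics([1.0, 1.0, 1.0, 1.0])
peak = summary_statistics([0.0, 0.0, 4.0, 0.0])
print(flat["pre"], peak["pre"])
```

Under this assumed definition, a flat spectrum gives the largest entropy and a single sharp peak gives the smallest, even though both spectra here share the same mean and 1-norm.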
This data set provides fine-granular statistics on trading traffic generated by six global exchanges over the course of two days in February 2019 for a set of representative feeds and recorded by the systems of vwd Vereinigte Wirtschaftsdienste GmbH (now known as Infront Financial Technology GmbH).
Please note that these numbers represent only limited market segments of the actual exchange and the measured feeds might provide different products and instrument types.
The exchanges are identified as AU = Sydney, FFM = Frankfurt am Main (GER), HK = Hong Kong (CN), Q = NASDAQ (USA), TK = Tokyo (JPN), UK = London (UK).
Please see the Zenodo page https://doi.org/10.5281/zenodo.6381970 for details on syntax etc.
https://www.usa.gov/government-works
Major business patterns within Riverside County. Not all industries are captured in this dataset; it is meant to represent table CB1300A11 created by the US Census Bureau. See https://factfinder.census.gov/ for more data.
This dataset can be updated via the Census API using this workspace: data.countyofriverside.us -County Business... - 8mkp-gyar - FMEv2016.fmw
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: What are the chances that total precipitation will be below, near, or above average next month? A: Colors show where total precipitation has an increased chance of being higher or lower than usual during the next month. The darker the shading, the greater the chance for the indicated condition. White areas have equal chances for precipitation totals that are below, near, or above the long-term average (median) for the month. Q: How do experts develop these forecasts? A: Climate scientists base future climate outlooks on current patterns in the ocean and atmosphere. They examine projections from climate and weather models and consider recent trends. They also check historical records to see how much precipitation fell when patterns were similar in the past. Q: What do the colors mean? A: Colors on the map show experts’ level of confidence in their forecasts for above or below median precipitation totals. Each location on the map has some chance to receive precipitation that ranks in the bottom, middle, or top third of records from the previous three decades. White areas have equal chances for each condition. Colors show where the odds for one of the three conditions are higher than for the other two. A common mistake is to interpret these maps as predictions of precipitation amounts. However, dark green areas are not predicted to receive more precipitation than light green areas. The dark green areas simply have a higher likelihood of receiving above median amounts of rain than the light green areas do. Similarly, dark brown areas are not predicted to receive less rain than light brown areas. Keep in mind that outlooks show the most likely condition for each region, not the only possible outcome. Q: Why do these data matter? A: Water managers, farmers, and forestry officials have an intense interest in precipitation outlooks. They use them to help make decisions about water resources, irrigation, and fire-fighting resources. 
Flood forecasters also use these outlooks. They want to know as early as possible if an area is likely to receive more precipitation than usual. Q: How did you produce these snapshots? A: Data Snapshots are derivatives of existing data products: to meet the needs of a broad audience, we present the source data in a simplified visual style. NOAA's Climate Prediction Center (CPC) produces the source images for monthly precipitation outlooks. To produce our images, we run a set of scripts that access map layers from CPC, re-project them into desired projections at various sizes, and output them with a custom color bar.
Additional information: CPC issues monthly outlooks one-half month before the beginning of the month of interest. On the day before the new month begins, experts update the outlook for the upcoming month. Each monthly outlook in Data Snapshots shows the date the outlook was issued. Outlooks that include Alaska are available: while displaying an outlook of interest, click the Download button, select Full Resolution Assets, and then click OK.
References: One-Month to Three-Month Climate Outlooks. http://www.cpc.ncep.noaa.gov/products/forecasts/ Current Outlook Discussion http://www.cpc.ncep.noaa.gov/products/predictions/long_range/fxus07.html
Source: https://www.climate.gov/maps-data/data-snapshots/data-source/precipitation-monthly-outlook
This upload includes two additional files:
* Precipitation - Monthly Outlook _NOAA Climate.gov.pdf is a screenshot of the main Climate.gov site for these snapshots (https://www.climate.gov/maps-data/data-snapshots/data-source/precipitation-monthly-outlook)
* Cimate_gov_ Data Snapshots.pdf is a screenshot of the data download page for the full-resolution files.
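The three-category logic behind these outlooks can be illustrated with a small sketch. It ranks a monthly total against a record of past totals to decide whether it falls in the bottom, middle, or top third, and it reports which condition an outlook favors from three probabilities. This is a simplified reading of the explanation above, not CPC's procedure; the function names, the 30-value record, and the probabilities are hypothetical.

```python
def tercile_category(value, history):
    """Place `value` in the bottom, middle, or top third of the historical record."""
    below = sum(1 for past in history if past < value)
    share = below / len(history)
    if share < 1 / 3:
        return "below"
    if share >= 2 / 3:
        return "above"
    return "near"


def favored_condition(p_below, p_near, p_above, tol=1e-9):
    """Return the condition with the highest odds, or 'equal chances' (white areas)."""
    odds = {"below": p_below, "near": p_near, "above": p_above}
    if max(odds.values()) - min(odds.values()) <= tol:
        return "equal chances"
    return max(odds, key=odds.get)


# Hypothetical 30-year record of monthly precipitation totals (arbitrary units).
history = list(range(1, 31))
print(tercile_category(5, history), tercile_category(15, history), tercile_category(25, history))

# Darker shading means higher odds for a category, not a larger predicted amount.
print(favored_condition(1 / 3, 1 / 3, 1 / 3))
print(favored_condition(0.2, 0.3, 0.5))
```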
https://spdx.org/licenses/CC0-1.0.html
Social interactions are a defining behavioural trait of social animals. Discovering characteristic patterns in the display of such behaviour is one of the fundamental endeavours in behavioural biology and psychology, as this promises to facilitate the general understanding, classification, prediction and even automation of social interactions. We present a novel approach to study characteristic patterns, including both sequential and synchronous actions in social interactions. The key concept in our analysis is to represent social interactions as sequences of behavioural states and to focus on changes in behavioural states shown by individuals rather than on the duration for which they are displayed. We extend techniques from data mining and bioinformatics to detect frequent patterns in these sequences and to assess how these patterns vary across individuals or changes in interaction tasks. To illustrate our approach and to demonstrate its potential, we apply it to novel data on a simple physical interaction, where one person hands a cup to another person. Our findings advance the understanding of handover interactions, a benchmark scenario for social interactions. More generally, we suggest that our approach permits a general perspective for studying social interactions.
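The core representation described above (sequences of behavioural states, with attention on changes of state rather than durations) can be sketched in a few lines. The example collapses repeated samples into state changes and counts contiguous patterns that occur in several sequences. It is a simplified stand-in for the data mining techniques the authors extend; the state names, function names, and thresholds are hypothetical.

```python
from collections import Counter


def state_changes(samples):
    """Collapse runs of repeated states so that only the changes remain."""
    changes = []
    for state in samples:
        if not changes or changes[-1] != state:
            changes.append(state)
    return changes


def frequent_patterns(sequences, length, min_count):
    """Contiguous patterns of `length` states found in at least `min_count` sequences."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each pattern is counted once per sequence.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        counts.update(seen)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical handover recordings sampled at a fixed rate.
giver_a = state_changes(["reach", "reach", "reach", "hold", "hold", "release"])
giver_b = state_changes(["reach", "hold", "hold", "hold", "retract"])
print(giver_a, giver_b)
print(frequent_patterns([giver_a, giver_b], length=2, min_count=2))
```

Because durations are discarded in the first step, a slow and a fast handover with the same order of states yield the same sequence.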
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Release Date: 2024-06-27

Release Schedule:
The County Business Patterns (CBP) data, including ZIP Code Business Patterns (ZBP) data, in this file were released on June 27, 2024.

Key Table Information:
Beginning with reference year 2007, CBP and ZBP data are released using the Noise disclosure methodology to protect confidentiality. See Program Methodology for complete information on the coverage and methodology of the CBP and ZBP data series.
Includes only establishments with payrolls.
Four employment-size classes (1,000 to 1,499 employees, 1,500 to 2,499 employees, 2,500 to 4,999 employees, and 5,000 or more employees) are only available at the CSA, MSA, and county-levels.
ZBP data by employment size class, shown at the 2-6 digit NAICS code levels only contains data on the number of establishments. ZBP data shown for NAICS code 00 (Total for all sectors) contains data on the number of establishments, total employment, first quarter payroll, and annual payroll.
For additional details regarding Congressional Districts, please see Program Methodology.

Data Items and Other Identifying Records:
This table contains data classified by Legal Form of Organization (U.S. and state level only) and employment size category of the establishment.
Number of establishments
Annual payroll ($1,000)
First-quarter payroll ($1,000)
Number of employees during the pay period including March 12
Noise range for annual payroll, first-quarter payroll, and number of employees during the pay period including March 12

Geography Coverage:
The data are shown at the U.S., State, County, Metropolitan/Micropolitan Statistical Areas, Combined Statistical Areas, 5-digit ZIP code, and Congressional District levels. Also available are data for the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, and the U.S. Virgin Islands) at the state and county equivalent levels.

Industry Coverage:
The data are shown at the 2- through 6-digit NAICS code levels for all sectors with published data, and for NAICS code 00 (Total for all sectors).

FTP Download:
Download the entire table at: https://www2.census.gov/programs-surveys/cbp/data/2022/CB2200CBP.zip

API Information:
County Business Patterns (CBP) data are housed in the County Business Patterns (CBP) API. For more information, see CBP and ZBP APIs.

Methodology:
In accordance with U.S. Code, Title 13, Section 9, no data are published that would disclose the operations of an individual employer. The data are subject to nonsampling error such as errors of self-classification, as well as errors of response, nonreporting and coverage. Data users who create their own estimates using data from this file should cite the U.S. Census Bureau as the source of the original data only.
To comply with disclosure avoidance guidelines, data rows with fewer than three contributing establishments are not presented. For detailed information about the methods used to collect and produce statistics, see Program Methodology.

Symbols:
D - Withheld to avoid disclosing data for individual companies; data are included in higher level totals (used prior to 2017)
G - Low noise; cell value was changed by less than 2 percent by the application of noise
H - Moderate noise; cell value was changed by 2 percent or more but less than 5 percent by the application of noise
J - High noise; cell value was changed by 5 percent or more by the application of noise
N - Not available or not comparable
S - Withheld because estimates did not meet publication standards
X - Not applicable
r - Revised (represented as superscript)
For a complete list of symbols, see County Business Patterns Glossary.

Source:
U.S. Census Bureau, 2022 County Business Patterns.
For more information about County Business Patterns, see the County Business Patterns website.

Contact Information:
U.S. Census Bureau
Economy-Wide Statistics Division
Business Statistics Branch
(301)763-2580
ewd.county.business.patterns@census.gov
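The noise symbols in the CBP description (G, H, J) each correspond to a range of percentage change applied to a cell, and several other symbols (D, N, S, X) mean that no usable value is shown. A minimal Python sketch of how a data user might encode that list follows. The names NOISE_FLAGS, NO_VALUE_FLAGS, noise_bounds and has_value are hypothetical helpers, not part of any Census Bureau tool, and the lower bound of 0 for G is an assumption, since the source only says "less than 2 percent".

```python
from typing import Optional, Tuple

# Percent change applied by noise infusion, taken from the symbol list.
# Each entry is (lower bound inclusive, upper bound exclusive).
# None as the upper bound means "no stated upper limit".
# Assumption: the lower bound for G is 0 (the source says "less than 2 percent").
NOISE_FLAGS = {
    "G": (0.0, 2.0),   # low noise
    "H": (2.0, 5.0),   # moderate noise
    "J": (5.0, None),  # high noise
}

# Symbols that indicate a withheld, unavailable, or inapplicable cell.
NO_VALUE_FLAGS = {"D", "N", "S", "X"}


def noise_bounds(flag: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return the percent-change bounds for a noise flag, or None for other symbols."""
    return NOISE_FLAGS.get(flag.strip())


def has_value(flag: str) -> bool:
    """Return False when the symbol means the cell carries no usable value."""
    return flag.strip() not in NO_VALUE_FLAGS


if __name__ == "__main__":
    for flag in ["G", "H", "J", "D", "S"]:
        print(flag, noise_bounds(flag), has_value(flag))
```

Keeping the bounds in a lookup table rather than in branching logic makes it straightforward to extend the mapping if the full glossary lists further symbols.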
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is part of a challenge about identifying behavioural patterns related to building occupancy using sensor data. It is part of the Talking Buildings initiative, which is about collaborative learning and social AI in urban settings for energy sustainability.
The challenge is organized by TNO in collaboration with the Hanze university of applied sciences, DFKI, and AIMZ. It is supported by the European Tailor-network project.
The focus is on sensor data from a smart building located on a university campus. The building is a so-called multi-tenant building, which means that it is used by different types of organizations. The file floorplan_id.pdf shows a floorplan of the building. The building consists of three floors. Each floor contains different types of rooms. There are rooms used for lectures, rooms for project meetings and rooms that can be used for demonstrations of innovative projects.
Rooms are grouped in zones. For example, the second floor is called a lecture zone. Some of the zones, such as the main entrance hall, the stairs, the canteen and the toilets, are public zones. These zones can be accessed by everyone. A few of them, such as the demonstration area on the ground floor, are used as a walk-through to other zones. There are also private zones, which can be accessed only by a single organization. An example of this is room 0.44.
Each zone contains one or more sensors which provide data such as temperature, movement, light, or CO2. Sensors are housed in a box that is mounted on a ceiling or wall. Each box is in fact a multi-sensor, connected to a network to provide multiple types of data at once. For instance, a device called a room-sensor provides temperature as well as light, movement and CO2, while so-called sound-sensors measure sound intensity as well as light, CO2 and movement. There are also single-type sensors, such as door-passage-sensors, which measure only the open or closed state of a door. The boxes hang throughout the building, but are not evenly spread. The floorplan shows which (multi-)sensor hangs in which room. Details are listed in the sensor-overview (see section Sensor Overview).
For the challenge the following are available: a floorplan, a sensor-overview, and the sensor data.
The sensor data, i.e. the measured values, use an anonymised id for the purpose of the challenge. The mapping to the floorplan can be deduced via the sensor-overview (except for some 'missing' sensors, since finding those is part of the challenge). The data will be available through a number of .csv files.
The sensor-overview contains information about the sensor boxes. The type of a box can be derived from its sensorId (e.g. sound). Remember that this sensorId refers to the box, so apart from sound it also measures temperature, humidity and light. The second column in the sensor-overview contains information about its location. The last column contains the so-called challenge id, which is an anonymized id. The challenge id is used in the provided measurement data files. In the challenge one has to find the corresponding sensorId.
Sometimes topological information about sensors in a building is incomplete or outdated. Sensors may have been re-mounted, rooms may have been split, or the administration may be imprecise. In such situations one can have data coming from sensors captured somewhere in the measurement database without knowing where those sensors are located.
In the challenge we mimic this situation: 5 sensors have deliberately been left out of the sensorplan. Their locations are known to the organizer, so approaches can be verified. In fact, during the preparation of this challenge the location of one sensor (named Thomas) was genuinely unknown, so participants may be able to help the building owner here. As can be seen in the sensor-overview in section 2.2, a few sensors are marked as unknown with respect to their position in the floorplan. The assignment is to compare the data from these sensors with data from other sensors and find out in which room each of the unidentified sensors might be mounted. A possible direction towards a solution is to derive time-based patterns and relate them to the type of room in which a sensor is mounted. By clustering these patterns one might find the location of the unidentified sensors.
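The suggested direction, deriving time-based patterns and relating them to room types, can be sketched as follows. This is a minimal illustration only: it uses nearest-profile matching by Pearson correlation rather than a full clustering algorithm, and the input layout (pairs of ISO timestamp and value per sensor) and the function names hourly_profile, pearson and best_match are assumptions, since the column layout of the .csv files is not described here.

```python
from datetime import datetime
from math import sqrt


def hourly_profile(readings):
    """Average value per hour of day (24 numbers) from (ISO timestamp, value) pairs."""
    sums = [0.0] * 24
    counts = [0] * 24
    for timestamp, value in readings:
        hour = datetime.fromisoformat(timestamp).hour
        sums[hour] += value
        counts[hour] += 1
    # Hours without any reading get 0.0; a real analysis may prefer to skip them.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_match(unknown_profile, known_profiles):
    """Return the label of the known profile most correlated with the unknown one."""
    return max(known_profiles, key=lambda label: pearson(unknown_profile, known_profiles[label]))


if __name__ == "__main__":
    # Synthetic, made-up movement profiles for two room types.
    known = {
        "lecture room": [1.0 if 9 <= h < 17 else 0.0 for h in range(24)],
        "canteen": [1.0 if 11 <= h < 14 else 0.0 for h in range(24)],
    }
    unknown = [0.9 if 11 <= h < 14 else 0.1 for h in range(24)]
    print(best_match(unknown, known))
```

Correlation compares the shape of the daily pattern rather than its absolute level, which helps when sensors of the same type are calibrated differently. Once profiles are available for all sensors, the same 24-value vectors could be fed to a clustering method instead.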
The question to be answered in this challenge is
Although the data from building sensors indicates a notion of occupancy, there is neither a ground truth nor a unit of which we c...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset offers valuable insights into yearly domestic water consumption across various Lower Super Output Areas (LSOAs) or Data Zones, accompanied by the count of water meters within each area. It is instrumental for analysing residential water use patterns, facilitating water conservation efforts, and guiding infrastructure development and policy making at a localised level.
Key Definitions
Aggregation
The process of summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
AMR Meter
Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from a water meter remotely and periodically.
Dataset
Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Data Zone
Data zones are the key geography for the dissemination of small area statistics in Scotland.
Dumb Meter
A dumb meter or analogue meter is read manually. It does not have any external connectivity.
Granularity
Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID
Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
LSOA
Lower Layer Super Output Areas (LSOA) are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.
Open Data Triage
The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
Schema
Structure for organising and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Smart Meter
A smart meter is an electronic device that records information and communicates it to the consumer and the supplier. It differs from automatic meter reading (AMR) in that it enables two-way communication between the meter and the supplier.
Units
Standard measurements used to quantify and compare different physical quantities.
Water Meter
Water metering is the practice of measuring water use. Water meters measure the volume of water used by residential and commercial building units that are supplied with water by a public water supply system.
Data History
Data Origin
Domestic consumption data is recorded using water meters. The consumption recorded is then sent back to water companies. This dataset is extracted from the water companies.
Data Triage Considerations
This section discusses the careful handling of data to maintain anonymity and addresses the challenges associated with data updates, such as identifying household changes or meter replacements.
Identification of Critical Infrastructure
This aspect is not applicable for the dataset, as the focus is on domestic water consumption and does not contain any information that reveals critical infrastructure details.
Commercial Risks and Anonymisation
Individual Identification Risks
There is a potential risk of identifying individuals or households if the consumption data is updated irregularly (e.g., every 6 months) and an out-of-cycle update occurs (e.g., after 2 months), which could signal a change in occupancy or ownership. Such patterns need careful handling to avoid accidental exposure of sensitive information.
Meter and Property Association
Challenges arise in maintaining historical data integrity when meters are replaced but the property remains the same. Ensuring continuity in the data without revealing personal information is crucial.
Interpretation of Null Consumption
Instances of null consumption could be misunderstood as a lack of water use, whereas they might simply indicate missing data. Distinguishing between these scenarios is vital to prevent misleading conclusions.
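The distinction between missing data and genuine zero use can be made explicit when summarising readings. The following Python sketch counts the two cases separately; the function name summarise_consumption and the use of None to represent a null reading are assumptions made for illustration, not part of the published schema.

```python
def summarise_consumption(readings):
    """Sum consumption while keeping missing readings apart from true zeros.

    readings: iterable of numbers, with None standing for a null (missing) value.
    """
    total = 0.0
    observed = 0   # readings that carry a value, including zero
    zero = 0       # readings that are present and equal to zero
    missing = 0    # null readings: no information, not "no water used"
    for value in readings:
        if value is None:
            missing += 1
            continue
        observed += 1
        total += value
        if value == 0:
            zero += 1
    return {"total": total, "observed": observed, "zero": zero, "missing": missing}


if __name__ == "__main__":
    print(summarise_consumption([10.0, None, 0.0, 5.5]))
```

Reporting the missing count alongside the total lets a reader judge how much of an apparently low figure is due to gaps in the data rather than low water use.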
Meter Re-reads
The dataset must account for instances where meters are read multiple times for accuracy.
Joint Supplies & Multiple Meters per Household
Special consideration is required for households with multiple meters as well as multiple households that share a meter as this could complicate data aggregation.
Schema Consistency with the Energy Industry:
In formulating the schema for the domestic water consumption dataset, careful consideration was given to the potential risks to individual privacy. This evaluation included examining the frequency of data updates, the handling of property and meter associations, interpretations of null consumption, meter re-reads, joint supplies, and the presence of multiple meters within a single household as described above.
After a thorough assessment of these factors and their implications for individual privacy, it was decided to align the dataset's schema with the standards established within the energy industry. This decision was influenced by the energy sector's experience and established practices in managing similar risks associated with smart meters. This ensures a high level of data integrity and privacy protection.
Schema
The dataset schema is aligned with those used in the energy industry, which has encountered similar challenges with smart meters. However, it is important to note that the energy industry has a much higher density of meter distribution, especially smart meters.
Aggregation to Mitigate Risks
The dataset employs an elevated level of data aggregation to minimise the risk of individual identification. This approach is crucial in maintaining the utility of the dataset while ensuring individual privacy. The aggregation level is carefully chosen to remove identifiable risks without excluding valuable data, thus balancing data utility with privacy concerns.
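As an illustration of this kind of aggregation, the sketch below rolls meter-level records up to one row per area, giving total consumption and the count of distinct meters, which matches the two quantities the overview says are published. The record layout (area code, meter id, consumption), the function name aggregate_by_area and the example codes are all hypothetical; the publisher's actual processing is not described in this document.

```python
from collections import defaultdict


def aggregate_by_area(meter_records):
    """Aggregate (area_code, meter_id, consumption) records to area level.

    Returns {area_code: {"total_consumption": float, "meter_count": int}}.
    Meter ids are dropped from the output, so no individual meter is exposed.
    """
    totals = defaultdict(float)
    meters = defaultdict(set)
    for area_code, meter_id, consumption in meter_records:
        totals[area_code] += consumption
        meters[area_code].add(meter_id)  # a set, so re-reads do not inflate the count
    return {
        area: {"total_consumption": totals[area], "meter_count": len(meters[area])}
        for area in totals
    }


if __name__ == "__main__":
    # Made-up records; area codes and meter ids are placeholders.
    records = [
        ("AREA_A", "m1", 100.0),
        ("AREA_A", "m2", 80.0),
        ("AREA_A", "m1", 20.0),
        ("AREA_B", "m3", 55.5),
    ]
    print(aggregate_by_area(records))
```

Counting distinct meter ids rather than rows keeps the meter count stable when a meter contributes several readings in a year.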
Data Freshness
Users should be aware that this dataset reflects historical consumption patterns and does not represent real-time data.
Publish Frequency
Annually
Data Triage Review Frequency
An annual review is conducted to ensure the dataset's relevance and accuracy, with adjustments made based on specific requests or evolving data trends.
Data Specifications
For the domestic water consumption dataset, the data specifications are designed to ensure comprehensiveness and relevance, while maintaining clarity and focus. The specifications for this dataset include:
· Each dataset encompasses recordings of domestic water consumption as measured and reported by the data publisher. It excludes commercial consumption.
· Where it is necessary to estimate consumption, this is calculated based on actual meter readings.
· Meters of all types (smart, dumb, AMR) are included in this dataset.
· The dataset is updated and published annually.
· Historical data may be made available to facilitate trend analysis and comparative studies, although it is not mandatory for each dataset release.
Context
Users are cautioned against using the dataset for immediate operational decisions regarding water supply management. The data should be interpreted considering potential seasonal and weather-related influences on water consumption patterns.
The geographical data provided does not pinpoint locations of water meters within an LSOA.
The dataset aims to cover a broad spectrum of households, from single-meter homes to those with multiple meters, to accurately reflect the diversity of water use within an LSOA.
Supplementary Information
Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset.
Ofwat guidance on water meters
https://www.ofwat.gov.uk/wp-content/uploads/2015/11/prs_lft_101117meters.pdf
https://www.marketresearchforecast.com/privacy-policy
The Geospatial Imagery Analytics market size was valued at USD 11.88 billion in 2023 and is projected to reach USD 83.39 billion by 2032, exhibiting a CAGR of 32.1% during the forecast period. Geospatial analytics gathers, manipulates, and displays geographic information system (GIS) data and imagery, including GPS and satellite photographs. Geospatial data analytics relies on geographic coordinates and specific identifiers such as street address and zip code. Geospatial visualization enables businesses to better understand complex information and make informed decisions: by visualizing data in a spatial context, they can quickly see patterns and trends and assess the impact of different variables. The field encompasses several techniques and algorithms, such as spatial interpolation, spatial regression, spatial clustering, and spatial autocorrelation analysis, which help extract insights from various geospatial data sources. The growing adoption of location-based services in various industries, including agriculture, defense, and urban planning, is driving the demand for geospatial imagery analytics.
Recent developments include:
August 2023: onX, a digital navigation company, partnered with Planet Labs PBC, a satellite imagery provider, to introduce a new feature called 'Recent Imagery'. This feature offers onX app users updated satellite imagery maps every two weeks, enhancing the user experience across the onX Hunt, onX Offroad, and onX Backcountry apps. This frequent data update helps outdoor enthusiasts access real-time information for safer and more informed outdoor activities.
August 2023: Quant Data & Analytics, a provider of data products and enterprise solutions for real estate and retail, partnered with Satellogic Inc. to utilize Satellogic's high-resolution satellite imagery to enhance property technology in Saudi Arabia and the Gulf region.
April 2023: Astraea, a spatiotemporal data and analytics platform, introduced a new ordering service that grants customers scalable access to top-tier commercial satellite imagery from providers such as Planet Labs PBC and others.
May 2022: Satellogic Inc. established a partnership with UP42. This geospatial developer platform enables direct access to Satellogic's satellite tasking capabilities, including high-resolution multispectral and wide-area hyperspectral imagery, through the UP42 API-based platform.
April 2022: TomTom International BV, a geolocation tech company, broadened its partnership with Maxar Technologies, a space solution provider. This expansion involves integrating high-resolution global satellite imagery from Maxar's Vivid imagery base maps into TomTom's product lineup, enhancing their visualization solutions for customers.
Key drivers for this market are: Growing Demand for Location-based Insights across Diverse Industries to Fuel Market Growth. Potential restraints include: Complexity and Cost Associated with Data Acquisition and Processing May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.