The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png.
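Each centerline file codes its road sections with the indicator values 0 to 3 described above. The sketch below turns one lane's per-point indicator values into contiguous section runs. It assumes the values have already been read, in centerline order, from that lane's indicator column. The function name `section_runs` and the sample values are hypothetical.

```python
from itertools import groupby

# Indicator codes as documented for the centerline files.
SECTION_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}


def section_runs(codes):
    """Collapse per-point indicator codes along one lane centerline into
    contiguous (label, first_index, last_index) runs."""
    runs = []
    index = 0
    for code, group in groupby(codes):
        length = sum(1 for _ in group)
        runs.append((SECTION_TYPES[code], index, index + length - 1))
        index += length
    return runs


# Hypothetical indicator values for one lane; in practice they would come from
# that lane's column in I-90-moving-Run_X-geometry-with-ramps.csv.
print(section_runs([0, 0, 1, 1, 1, 0, 3, 3]))
# [('no ramp', 0, 1), ('on-ramp', 2, 4), ('no ramp', 5, 5), ('weaving segment', 6, 7)]
```

An unknown code raises a `KeyError`, so values outside the documented 0 to 3 range surface immediately instead of being mislabeled.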
This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter. The helicopter followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4-km-long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter returned to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle, ensuring continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day.
As part of this dataset, the following files were provided:
We created a synthetic administrative dataset that mimics the properties of a real administrative dataset according to specifications by the ONS. It is intended for use in developing the R package for calculating quality indicators for administrative data (see: https://github.com/sook-tusk/qualadmin). Taking over 1 million records from a synthetic 1991 UK census dataset, we deleted records, moved records to a different geography, and duplicated records to a different geography according to pre-specified proportions for each broad ethnic group (White, Non-white) and gender (males, females). The final size of the synthetic administrative data was 1,033,664 individuals.
National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS), which is transforming its statistical systems to make more use of administrative data for future censuses and population statistics. Administrative data are defined as secondary data sources because they are produced by other agencies as a result of an event or a transaction relating to the administrative procedures of organisations, public administrations and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs, and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether it is a single data source or integrated with other data sources such as survey data.
For a single administrative data source, one of the main sources of error is coverage and representation of the target population of interest. This is particularly relevant when administrative data are delivered over time, such as tax data for maintaining the Business Register. For sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess whether the administrative data are representative of the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data. Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. With administrative data used to build a spine, survey data can be integrated using record linkage and statistical matching approaches on a set of common matching variables. This will be the subject of sub-project 2, which will be split into several research topics. The first topic is whether adding statistical predictions and correlation structures improves the linkage and data integration. The second topic is to research a mass imputation framework for imputing missing target variables in the statistical register, where the missing data may be due to multiple underlying mechanisms. The third topic will aim to improve the mass imputation framework to mitigate possible measurement errors, for example by adding benchmarks and other constraints into the approaches. Once a statistical register is complete, estimates for key target variables at local areas can easily be aggregated. However, it is essential to also measure the precision of these estimates through mean square errors, and this will be the fourth topic of the sub-project.
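A simple way to see which sub-groups are missing or over-covered is to compare subgroup counts in the administrative data against a reference population. The sketch below computes such coverage ratios. It is an illustrative indicator only and is not necessarily one of the indicators implemented in the qualadmin package. The record layout (tuples of sex and ethnic group) and the sample counts are hypothetical.

```python
from collections import Counter


def coverage_ratios(admin, reference, key):
    """Ratio of administrative to reference counts for each subgroup.

    Values below 1 suggest under-coverage of that subgroup; values above 1
    suggest over-coverage (for example, from duplicated records).
    """
    admin_counts = Counter(key(r) for r in admin)
    ref_counts = Counter(key(r) for r in reference)
    # Counter returns 0 for subgroups absent from the administrative data.
    return {group: admin_counts[group] / n for group, n in ref_counts.items()}


# Hypothetical records: (sex, ethnic group).
reference = [("M", "White")] * 40 + [("F", "White")] * 40 + [("M", "Non-white")] * 20
admin = [("M", "White")] * 38 + [("F", "White")] * 41 + [("M", "Non-white")] * 15

print(coverage_ratios(admin, reference, key=lambda r: r[1]))  # by ethnic group
print(coverage_ratios(admin, reference, key=lambda r: r[0]))  # by sex
```

Passing a `key` function keeps the same routine usable for any single characteristic or for combinations such as `lambda r: (r[0], r[1])`.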
Finally, this new way of producing official statistics is compared to the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates, a key question investigated by survey statisticians over the last decade.
This is a synthetic administrative dataset with only 6 variables, designed to enable the calculation of quality indicators in the R package at https://github.com/sook-tusk/qualadmin (see also the user manual). The dataset was created from a 1991 synthetic UK census dataset containing over 1 million records by deleting, moving and duplicating records across geographies according to pre-specified proportions within broad ethnic group and gender. The geography variable includes 6 local authorities, which are completely anonymized and labelled 1, 2, ..., 6. The other variables are (number of categories in parentheses): sex (2), age groups (14), ethnic groups (5) and employment (3). The final size of the synthetic administrative data is 1,033,664 individuals. The variables are described in the data dictionary uploaded with the data.
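The perturbation step described above deletes, moves, and duplicates census records according to proportions set within ethnic group and sex. The sketch below illustrates that step under stated assumptions. The proportions in `RATES` are hypothetical placeholders, not the ONS-specified values. Each record is assumed to be a dict with `geog`, `ethnic` and `sex` keys. A record is either deleted, moved to another geography, or kept, and a kept record may additionally be duplicated into a different geography.

```python
import random

# Hypothetical per-group proportions; the actual ONS specification is not reproduced here.
RATES = {
    ("White", "Male"): {"delete": 0.04, "move": 0.02, "duplicate": 0.01},
    ("White", "Female"): {"delete": 0.03, "move": 0.02, "duplicate": 0.01},
    ("Non-white", "Male"): {"delete": 0.10, "move": 0.05, "duplicate": 0.03},
    ("Non-white", "Female"): {"delete": 0.08, "move": 0.04, "duplicate": 0.02},
}


def _other_geography(rng, geographies, current):
    """Pick a geography different from the record's current one."""
    return rng.choice([g for g in geographies if g != current])


def make_admin_data(census, rates, geographies, seed=1):
    """Perturb census records into a synthetic administrative dataset."""
    rng = random.Random(seed)
    admin = []
    for rec in census:
        rate = rates[(rec["ethnic"], rec["sex"])]
        u = rng.random()
        if u < rate["delete"]:
            continue  # under-coverage: record missing from the admin data
        if u < rate["delete"] + rate["move"]:
            # misplacement: record counted in the wrong area
            admin.append({**rec, "geog": _other_geography(rng, geographies, rec["geog"])})
            continue
        admin.append(dict(rec))
        if rng.random() < rate["duplicate"]:
            # over-coverage: an extra copy registered in another area
            admin.append({**rec, "geog": _other_geography(rng, geographies, rec["geog"])})
    return admin


# Small illustrative census: 6 anonymized local authorities, 500 records per cell.
census = [
    {"geog": g, "ethnic": e, "sex": s}
    for g in range(1, 7)
    for e in ("White", "Non-white")
    for s in ("Male", "Female")
    for _ in range(500)
]
admin = make_admin_data(census, RATES, geographies=list(range(1, 7)))
print(len(census), len(admin))
```

Deletion and moving share a single uniform draw, which keeps them mutually exclusive. Duplication uses a separate draw, so it applies only to records that were kept in place.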
This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (two-year period) when winter wheat (Triticum aestivum L.) was grown for grain at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, calibrated to NIST standards (Howell et al., 1995). Each lysimeter was in the center of a 4.44 ha square field on which wheat was also grown (Evett et al., 2000). The two fields were contiguous and arranged with one directly north of the other. See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Wheat was planted in autumn and grown over the winter in 1989-1990, 1991-1992, and 1992-1993. The agronomic calendar for each of the three growing seasons lists, by date, the agronomic practices applied, severe weather, and activities (e.g., planting, thinning, fertilization, pesticide application, lysimeter maintenance, harvest) in and on the lysimeters that could influence crop growth, water use, and lysimeter data. Irrigation was by a linear move sprinkler system equipped with pressure-regulated, low-pressure sprays (mid-elevation spray application, MESA). Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis, as determined by soil profile water content readings made with a field-calibrated (Evett and Steiner, 1995) neutron probe from 0.10- to 2.4-m depth in the field. The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications.
The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-min intervals, and the 5-min change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-min intervals. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-min intervals. The instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. The Bushland weighing lysimeter research program was described by Evett et al. (2016), and lysimeter design is described by Marek et al. (1988). Important conventions concerning the date-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". The datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes the conventions and symbols used, and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs.
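The ET calculation described above is essentially a lysimeter water balance applied to 5-min records and summed to 15-min totals. The sketch below shows that calculation under several assumptions. A gain in storage is positive. Precipitation, irrigation, and dew/frost accumulation are positive inputs. Drainage and runoff are zero or already removed from the storage change. The authoritative sign conventions are in the "Conventions for Bushland, TX, Weighing Lysimeter Datasets" resource. The function name and sample values are hypothetical.

```python
def lysimeter_et(d_storage, precip, irrig, dew_frost, steps_per_period=3):
    """Water-balance ET (mm) from 5-min lysimeter series, summed per period.

    All inputs are equal-length sequences of 5-min amounts in mm.
    ET_5min = P + I + dew/frost - delta_storage (assumed sign convention).
    With steps_per_period=3, the output holds 15-min totals.
    """
    n = len(d_storage)
    if not (len(precip) == len(irrig) == len(dew_frost) == n):
        raise ValueError("all series must have the same length")
    if n % steps_per_period:
        raise ValueError("series length must be a multiple of steps_per_period")
    et_5min = [p + i + f - ds for ds, p, i, f in zip(d_storage, precip, irrig, dew_frost)]
    return [sum(et_5min[k:k + steps_per_period]) for k in range(0, n, steps_per_period)]


# Hypothetical 30 minutes of data: steady drying, then a 0.5 mm shower.
d_storage = [-0.10, -0.12, -0.08, 0.45, -0.05, -0.05]
precip = [0.0, 0.0, 0.0, 0.50, 0.0, 0.0]
zeros = [0.0] * 6
print(lysimeter_et(d_storage, precip, zeros, zeros))
```

In the second 15-min period, the shower raises storage, but adding it back as an input leaves a positive ET of about 0.05 mm in that interval.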
The six datasets are as follows: Agronomic Calendars for the Bushland, Texas Winter Wheat Datasets; Growth and Yield Data for the Bushland, Texas Winter Wheat Datasets; Weighing Lysimeter Data for The Bushland, Texas Winter Wheat Datasets; Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments; Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Winter Wheat Datasets; and Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas. See the README for descriptions of each dataset. The soil is a Pullman series fine, mixed, superactive, thermic Torrertic Paleustoll. Soil properties are given in the resource titled "Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets". The land slope in the lysimeter fields is <0.3% and the topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm of Class A pan evaporation per year, and winds are typically from the south and southwest. The climate is semi-arid, with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods (Evett et al., 2016) and have focused on winter wheat ET (Howell et al., 1995, 1997, 1998) and crop coefficients (Howell et al., 2006; Schneider and Howell, 1997, 2001) that have been used by ET networks for irrigation management.
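Crop coefficients and crop water productivity, mentioned above, are both simple ratios. The sketch below uses the common definitions: the crop coefficient is crop ET divided by reference ET, and water productivity is yield per unit of water consumed. It also shows how a coefficient is applied to a reference ET in ET-based scheduling. The function names and example values are hypothetical, and the datasets themselves may define these quantities with additional detail (for example, grass vs. alfalfa reference ET).

```python
def crop_coefficient(et_crop_mm, et_ref_mm):
    """Crop coefficient Kc = ETc / ETref for the same period."""
    if et_ref_mm <= 0:
        raise ValueError("reference ET must be positive")
    return et_crop_mm / et_ref_mm


def estimated_crop_et(kc, et_ref_mm):
    """ET-based scheduling: estimate crop ET from a Kc and a reference ET."""
    return kc * et_ref_mm


def water_productivity(yield_kg_per_ha, et_mm):
    """Crop water productivity in kg/m^3 (1 mm of water over 1 ha = 10 m^3)."""
    return yield_kg_per_ha / (et_mm * 10.0)


# Hypothetical values: lysimeter ETc of 7.2 mm on a day with 8.0 mm reference ET.
kc = crop_coefficient(7.2, 8.0)
print(kc, estimated_crop_et(kc, 6.0), water_productivity(6000.0, 600.0))
```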
The data have utility for developing, calibrating, and testing simulation models of crop ET, growth, and yield (Evett et al., 1994; Kang et al., 2009), and have been used by several universities for testing and calibrating models of ET that use satellite and/or weather data. Resources in this dataset: Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. Resource Description: The file gives the UTM coordinates of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields. Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program. Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets. Resource Title: Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets. File Name: Bushland_TX_soil_properties.xlsx.
Resource Description: Soil properties useful for simulation modeling and for describing the soil are given for the Pullman soil series at the USDA, ARS, Conservation & Production Research Laboratory, Bushland, TX, USA. For each soil layer, the following are given: soil horizon designation and texture according to USDA Soil Taxonomy, bulk density, porosity, water content at field capacity (33 kPa) and permanent wilting point (1500 kPa), percent sand, percent silt, percent clay, percent organic matter, pH, and van Genuchten-Mualem characteristic curve parameters describing the soil hydraulic properties. A separate table describes the soil horizon thicknesses, designations, and textures according to USDA Soil Taxonomy. Another table describes important aspects of the soil hydrologic and rooting behavior. Resource Title: README - Bushland Texas Winter Wheat collection. File Name: README_Bushland_winter_wheat_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Winter Wheat collection.
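The van Genuchten-Mualem parameters listed in the soil properties resource are typically used with the standard closed-form model. The water content is θ(h) = θr + (θs − θr)·Se, where Se = [1 + (α|h|)^n]^(−m) and m = 1 − 1/n. The Mualem conductivity is K(Se) = Ks·Se^L·[1 − (1 − Se^(1/m))^m]², with pore-connectivity L = 0.5. The sketch below assumes that standard form. The parameter values in the example are hypothetical and are not the Pullman values from the resource, and the resource's own units for α and Ks should be checked before use.

```python
def effective_saturation(h, alpha, n):
    """Se = [1 + (alpha*|h|)^n]^(-m), with m = 1 - 1/n (Mualem constraint)."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def vg_theta(h, theta_r, theta_s, alpha, n):
    """Volumetric water content at pressure head h (same length unit as 1/alpha)."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)


def mualem_k(h, ks, alpha, n, l=0.5):
    """Unsaturated hydraulic conductivity (units of ks) at pressure head h."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(h, alpha, n)
    return ks * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


def kpa_to_cm_head(kpa):
    """Convert suction in kPa to cm of water head (1 kPa is about 10.197 cm)."""
    return kpa * 10.197


# Hypothetical layer parameters: alpha in 1/cm, ks in cm/day.
params = dict(theta_r=0.10, theta_s=0.50, alpha=0.01, n=1.3)
theta_fc = vg_theta(kpa_to_cm_head(33), **params)    # field capacity (33 kPa)
theta_wp = vg_theta(kpa_to_cm_head(1500), **params)  # wilting point (1500 kPa)
print(round(theta_fc, 3), round(theta_wp, 3))
```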
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic GOAL 15: Life on Land (5-year moving average) and country Suriname. Indicator Definition: SDG Goal 15 data availability. Source: UN Global SDG Indicators Database
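The series above is reported as a 5-year moving average. The database description does not say whether the window is trailing or centered. The sketch below assumes a trailing window over consecutive annual values, leaving the first four years undefined.

```python
def moving_average(values, window=5):
    """Trailing moving average; positions with fewer than `window` values are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough years yet for a full window
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


print(moving_average([1, 2, 3, 4, 5, 6]))  # [None, None, None, None, 3.0, 4.0]
```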
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic GOAL 15: Life on Land (5-year moving average) and country Vanuatu. Indicator Definition: SDG Goal 15 data availability. Source: UN Global SDG Indicators Database
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains real, anonymized sales data from a fast-moving consumer goods (FMCG) company in Latin America. It includes over 25 million records of daily sales transactions across thousands of unique customers, products, and distribution routes. The dataset has been carefully cleaned, standardized, and anonymized to remove any personally identifiable information, while preserving key structures that enable advanced analytics and machine learning tasks. This dataset is ideal for:
No synthetic data was generated: these are real-world patterns from an operational context. Use it to test scalable data science pipelines, feature engineering, or business intelligence dashboards.
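With over 25 million records, a pipeline over this dataset may not want to load everything into memory at once. The sketch below streams CSV rows and keeps only running totals per key. The dataset's actual file layout and column names are not given here, so the column names used in the example (`route_id`, `units`) are hypothetical.

```python
import csv
from collections import defaultdict


def totals_by(lines, key_col, value_col):
    """Stream CSV lines (e.g. an open file) and sum value_col for each key_col value.

    Only one row is held in memory at a time, plus one running total per key.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


# Inline example; with a real file use:
#   with open("sales.csv", newline="", encoding="utf-8") as f:
#       totals_by(f, "route_id", "units")
sample = ["route_id,units", "R1,2", "R2,5", "R1,3"]
print(totals_by(sample, "route_id", "units"))
```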
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic GOAL 6: Clean Water and Sanitation (5-year moving average) and country Eswatini. Indicator Definition: SDG Goal 6 data availability. Source: UN Global SDG Indicators Database
This dataset contains detailed data on all NBA players from the 2023/24 season.
First launched by the U.S. Department of Housing and Urban Development (HUD) and Department of Transportation (DOT) in November 2013, the Location Affordability Index (LAI) provides ubiquitous, standardized household housing and transportation cost estimates for all 50 states and the District of Columbia. Because what is affordable is different for everyone, users can choose among eight household profiles—which vary by household income, size, and number of commuters—and see the impact of the built environment on affordability in a given location while holding household demographics constant.
Version 3 updates the constituent data sets with 2012-2016 American Community Survey data and makes several methodological tweaks, most notably moving to modeling at the Census tract level rather than at the block group level. As with Version 2, the inputs to the simultaneous equation model (SEM) include six endogenous variables (housing costs, car ownership, and transit usage for both owners and renters) and 18 exogenous variables, with vehicle miles traveled still modeled separately due to data limitations. To learn more about the Location Affordability Index (v.3), visit https://www.hudexchange.info/programs/location-affordability-index/. For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov. Date of Coverage: 2012-2016. Data Dictionary: DD_Location Affordability Indev v.3.0. See also: LAI Version 3 Data and Methodology; LAI Version 3 Technical Documentation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Understanding urban mobility patterns is constrained by our limited capabilities to extract and visualize spatio-temporal regularities from large amounts of mobility data. Moving flocks, defined as groups of people traveling together over a predefined time duration, can reveal collective moving patterns at aggregated spatio-temporal scales, thereby facilitating the discovery of urban mobility structure and travel demand patterns. In this study, we extend classical trajectory-oriented flock mining algorithms to discover moving flocks of transit passengers, accounting for the constraints of multi-modal transit networks. We develop a map-centered visual analytics approach by integrating the flock mining algorithm with interactive visualization designs of discovered flocks. Novel interactive visualizations are designed and implemented to support the exploration and analysis of discovered moving flocks at different spatial and temporal scales. The visual analytics approach is evaluated using a real-world smart card dataset collected in Shenzhen City, China, validating its applicability in capturing and mapping dynamic mobility patterns over a large metropolitan area.
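To make the flock definition concrete, the sketch below applies a deliberately simplified version of it. It is not the authors' extended flock mining algorithm. Each passenger's trajectory is assumed to be a sequence of discrete transit steps, such as (boarding stop, alighting stop, time bin) tuples derived from smart card records; this step format is hypothetical. A flock is reported whenever at least `min_size` passengers share the same `min_len` consecutive steps.

```python
from collections import defaultdict


def find_flocks(trajectories, min_size, min_len):
    """Group passengers who share `min_len` consecutive identical steps.

    trajectories: dict mapping passenger id -> list of hashable steps.
    Returns {shared step window: set of passenger ids} for windows with at
    least `min_size` passengers.
    """
    groups = defaultdict(set)
    for pid, steps in trajectories.items():
        for i in range(len(steps) - min_len + 1):
            groups[tuple(steps[i:i + min_len])].add(pid)
    return {w: members for w, members in groups.items() if len(members) >= min_size}


# Hypothetical steps: (from stop, to stop, 15-min time bin).
trips = {
    "a": [("S1", "S2", 8), ("S2", "S3", 8)],
    "b": [("S1", "S2", 8), ("S2", "S3", 8)],
    "c": [("S1", "S2", 8), ("S2", "S4", 8)],
}
print(find_flocks(trips, min_size=2, min_len=2))
```

Raising `min_len` corresponds to a longer required co-travel duration, while raising `min_size` requires larger groups. These are the two parameters a moving-flock definition typically fixes.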
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article "Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022", published on 11 July 2023: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022
Latest edition information
For the second edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article "Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022", published on 11 July 2023: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022
This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (year) when maize (Zea mays, L., also known as corn in the United States) was grown for grain at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Maize was grown for grain on between two and four large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The four fields were contiguous and arranged in four quadrants, labeled northeast (NE), southeast (SE), northwest (NW), and southwest (SW). See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Maize was grown on only the NE and SE fields in 1989 and 1990, and on all four fields in 1994, 2013, 2016, and 2018. Irrigation was by a linear move sprinkler system in 1989, 1990, and 1994, and the system was equipped with various application technologies, such as high-pressure impact sprinklers, low-pressure spray applications, and low-energy precision applicators (LEPA). In 2013, 2016, and 2018, two lysimeters and their respective fields were irrigated using subsurface drip irrigation (SDI), and two lysimeters and their respective fields were irrigated by a linear move sprinkler system equipped with spray applicators. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis, as determined by soil profile water content readings made with a neutron probe from 0.10- to 2.4-m depth in the field. The number and spacing of neutron probe reading locations changed through the years (additional sites were added), which is one reason why subsidiary datasets and data dictionaries are needed.
The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-minute intervals, and the 5-minute change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-minute intervals. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-minute intervals. The instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. Important conventions concerning the date-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". The datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes the conventions and symbols used, and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs. There is a dictionary tab for every data tab, and the name of each dictionary tab contains the name of the corresponding data tab. Tab names are unique, so if individual tabs were saved to CSV files, each CSV file in the entire collection would have a different name.
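The tab-naming rules above can be checked programmatically before tabs are exported to CSV. The sketch below works on lists of tab names. With real files, those lists could be obtained from `openpyxl.load_workbook(path, read_only=True).sheetnames` or pandas `pd.ExcelFile(path).sheet_names`. The pairing rule pairs a data tab with the single other tab whose name contains it, which is one reading of the convention stated above. The tab names in the example are hypothetical.

```python
from collections import Counter


def duplicate_tab_names(tabs_by_file):
    """Return tab names that occur more than once across all workbooks.

    An empty result means every tab could be saved as a uniquely named CSV.
    """
    counts = Counter(name for names in tabs_by_file.values() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def pair_data_and_dictionary_tabs(tab_names):
    """Map each data tab to the single other tab whose name contains it."""
    pairs = {}
    for data_tab in tab_names:
        matches = [t for t in tab_names if t != data_tab and data_tab in t]
        if len(matches) == 1:
            pairs[data_tab] = matches[0]
    return pairs


# Hypothetical tab names for one workbook.
tabs = ["NE_ET_1990", "NE_ET_1990_dict", "SE_ET_1990", "SE_ET_1990_dict"]
print(pair_data_and_dictionary_tabs(tabs))
print(duplicate_tab_names({"a.xlsx": tabs, "b.xlsx": ["NE_WX_1990", "NE_WX_1990_dict"]}))
```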
The six datasets, according to their titles, are as follows: Agronomic Calendars for the Bushland, Texas Maize for Grain Datasets; Growth and Yield Data for the Bushland, Texas Maize for Grain Datasets; Weighing Lysimeter Data for The Bushland, Texas Maize for Grain Datasets; Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments; Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Maize for Grain Datasets; and Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas. See the README for descriptions of each dataset. The land slope is <1% and the topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm of Class A pan evaporation per year, and winds are typically from the south and southwest. The climate is semi-arid, with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods and have focused on maize ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks for irrigation management.
The data have utility for testing simulation models of crop ET, growth, and yield, and have been used by the Agricultural Model Intercomparison and Improvement Project (AgMIP), by OpenET, and by many others for testing and calibrating models of ET that use satellite and/or weather data. Resources in this dataset: Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. Resource Description: The file gives the UTM coordinates of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields. Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program. Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets. Resource Title: README - Bushland Texas Maize for Grain collection. File Name: README_Bushland_maize_for_grain_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Maize for Grain collection.
This raster dataset of Core Mapper Moving Window Averages is an intermediary modeling product that was produced by the Core Mapper tool (Shirk and McRae 2013) in the process of developing habitat cores for use in our coastal marten connectivity model. It is derived from another dataset (HabitatSurface) and was produced using the Core Mapper parameters defined in the Lineage section of the accompanying geospatial metadata record. More specifically, it is a calculated dataset in which a circular moving window with a 977 m radius was applied to the habitat surface to calculate the average habitat value around each pixel (this window size was derived from the estimated average size of a female marten's home range of 300 hectares). Of note, the set of habitat cores that came from this Core Mapper tool received additional modifications; see the report or the metadata record for PrimaryModel_HabitatCores for details. Refer to the HabitatSurface and PrimaryModel_HabitatCores metadata records for additional context. We derived the habitat cores using a tool within Gnarly Landscape Utilities called Core Mapper (Shirk and McRae 2015). To develop a Habitat Surface for input into Core Mapper, we started by assigning each 30 m pixel on the modeled landscape a habitat value equal to its GNN OGSI value (range = 0-100). In areas with serpentine soils that support habitat potentially suitable for coastal marten, we assigned a minimum habitat value of 31, which is equivalent to the 33rd percentile of OGSI 80 pixels in the marten’s historical range (for general details on our incorporation of serpentine soils, see the report section titled "Data Layers - Serpentine Soils"; for specific details on the development of this serpentine dataset, see the metadata record for the ResistancePostProcessing_Serpentine data layer, which was used to make these modifications to the habitat surface). Pixels with an OGSI value >31.0 retained their normal habitat value.
Our intention was to allow the modified serpentine pixels to be more easily incorporated into habitat cores if there were higher-value OGSI pixels in the vicinity, but not to have them form the entire basis of a core. As a parameter of the Core Mapper tool, we also excluded pixels with a habitat value <1.0 from inclusion in habitat cores. We then used Core Mapper to define a moving window and calculate the average habitat value within a 977 m radius around each pixel (derived from the estimated average size of a female marten’s home range of 300 ha). Pixels with an average habitat value ≥36.0 were then incorporated into habitat cores. This is an abbreviated and incomplete description of the dataset. Please refer to the spatial metadata for a more thorough description of the methods used to produce this dataset, and a discussion of any assumptions or caveats that should be taken into consideration. Additional data for this project (including the Habitat Surface referenced above and the Habitat Cores used in our connectivity model) can be found at: https://www.fws.gov/arcata/shc/marten
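The 977 m window radius follows from treating the 300 ha home range as a circle: r = √(3,000,000 m² / π) ≈ 977 m, or about 32.6 pixels at 30 m resolution. The sketch below strings together the steps described above: the serpentine floor of 31, the circular moving-window mean, the ≥36.0 core threshold, and the exclusion of pixels below 1.0. It is not Core Mapper's implementation. Edge handling (averaging only pixels that fall inside the raster) and all function names are assumptions for illustration.

```python
import math

import numpy as np


def window_radius_m(home_range_ha):
    """Radius (m) of a circle whose area equals the home range (1 ha = 10,000 m^2)."""
    return math.sqrt(home_range_ha * 10_000 / math.pi)


def habitat_surface(ogsi, serpentine, floor=31.0):
    """OGSI as habitat value, raised to at least `floor` on serpentine pixels."""
    return np.where(serpentine, np.maximum(ogsi, floor), ogsi).astype(float)


def circular_mean(surface, radius_cells):
    """Mean of all pixels within radius_cells of each pixel (in-raster pixels only)."""
    rows, cols = surface.shape
    r = int(radius_cells)
    total = np.zeros(surface.shape)
    count = np.zeros(surface.shape)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx > radius_cells ** 2:
                continue  # offset lies outside the circular window
            # Output pixels whose neighbour at (dy, dx) lies inside the raster.
            y0, y1 = max(0, -dy), min(rows, rows - dy)
            x0, x1 = max(0, -dx), min(cols, cols - dx)
            if y0 >= y1 or x0 >= x1:
                continue
            total[y0:y1, x0:x1] += surface[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
            count[y0:y1, x0:x1] += 1
    return total / count


def habitat_cores(surface, radius_cells, threshold=36.0, min_value=1.0):
    """Core pixels: window mean >= threshold and own habitat value >= min_value."""
    return (circular_mean(surface, radius_cells) >= threshold) & (surface >= min_value)


radius_cells = window_radius_m(300) / 30.0  # 30 m pixels, about 32.6 cells
print(round(window_radius_m(300), 1), round(radius_cells, 1))
```

The double loop over window offsets adds shifted copies of the whole raster, so its cost grows with the window area rather than with the number of pixels times the window area in pure Python.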
As of July 2, 2024, the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.
A. SUMMARY
This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
B. HOW THE DATASET IS CREATED
As of January 1, 2023, COVID-19 deaths are defined as deaths of persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition aligns with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.
Data on the population characteristics of COVID-19 deaths are from:
* Case reports
* Medical records
* Electronic lab reports
* Death certificates
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.
Data notes on each population characteristic type are listed below.
Race/ethnicity
* We include all race/ethnicity categories that are collected for COVID-19 cases.
Gender
* The City collects information on gender identity using these guidelines.
C. UPDATE PROCESS
Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.
Dataset will not update on the business day following any federal holiday.
D. HOW TO USE THIS DATASET
Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.
New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
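The filtering and running-total logic described above can be sketched with the Python standard library. This is a hypothetical illustration: the snake_case keys (characteristic_type, characteristic_group, date, new_deaths) stand in for the dataset's "Characteristic Type", "Characteristic Group", date, and new-deaths columns, whose exact field names are not given here, and dates are assumed to be ISO strings so that they sort chronologically.

```python
from collections import defaultdict
from itertools import accumulate


def filter_by_type(rows, characteristic_type):
    """Keep only rows for one topic area, e.g. a single characteristic type."""
    return [r for r in rows if r["characteristic_type"] == characteristic_type]


def cumulative_by_group(rows):
    """Return {group: [(date, new_deaths, cumulative_deaths), ...]} in date order."""
    by_group = defaultdict(list)
    for r in rows:
        by_group[r["characteristic_group"]].append((r["date"], r["new_deaths"]))
    result = {}
    for group, items in by_group.items():
        items.sort(key=lambda item: item[0])  # ISO dates sort chronologically as strings
        dates = [d for d, _ in items]
        new = [n for _, n in items]
        # The running total of new deaths gives the cumulative count up to each date.
        result[group] = list(zip(dates, new, accumulate(new)))
    return result
```

Because earlier dates can be revised as more information arrives, recomputing the running total this way is best treated as a consistency check against the published cumulative column rather than a replacement for it.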
This data may not be immediately available for more recent deaths. Data updates as more information becomes available.
To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.
E. CHANGE LOG
Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5) https://creativecommons.org/licenses/by-nc-nd/2.5/
License information was derived automatically
NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See the Technical Report for details on how to use the package.
This database includes 3 repositories:
Each image can be used for classification (shape/color) or regression (radius/area) tasks.
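NADA's own generator API is not described here, so the following is only a hypothetical, standard-library sketch of the idea: shape and color labels are drawn from weighted (non-uniform) categorical distributions, and the radius is drawn from a log-normal whose median depends on the shape, so the parameters are jointly rather than independently distributed. The shape vocabulary, the weights, and the reading of "radius" as a circumradius are all assumptions. Shape and color serve as classification labels; radius and area serve as regression targets.

```python
import math
import random

# Hypothetical shape vocabulary; "radius" is taken to be the circumradius.
SIDES = {"triangle": 3, "square": 4, "hexagon": 6}


def shape_area(shape: str, radius: float) -> float:
    """Area of a circle, or of a regular n-gon with the given circumradius."""
    if shape == "circle":
        return math.pi * radius ** 2
    n = SIDES[shape]
    return 0.5 * n * radius ** 2 * math.sin(2 * math.pi / n)


def sample_shapes(n, rng, shape_weights, color_weights, median_radius):
    """Draw n labelled samples from a non-uniform joint distribution.

    Shape and color come from weighted categorical distributions; the radius is
    log-normal with a shape-dependent median, so radius and shape are correlated.
    """
    shapes, s_w = zip(*shape_weights.items())
    colors, c_w = zip(*color_weights.items())
    samples = []
    for _ in range(n):
        shape = rng.choices(shapes, weights=s_w)[0]
        color = rng.choices(colors, weights=c_w)[0]
        radius = rng.lognormvariate(math.log(median_radius[shape]), 0.25)
        samples.append({"shape": shape, "color": color,
                        "radius": radius, "area": shape_area(shape, radius)})
    return samples
```

Passing an explicit `random.Random(seed)` keeps a generated dataset reproducible.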
All datasets can be modified and adapted to the user's research question using the included open-source data generator.
Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. This dataset includes demographic data for 22 countries from 1960 to 2018, including Sri Lanka, Bangladesh, Pakistan, India, Maldives, etc. Data fields include: country, year, population ratio, male ratio, female ratio, and population density (per km²). Source: (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Report (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme.
Periodicity: Annual
Statistical Concept and Methodology: Population estimates are usually based on national population censuses. Estimates for the years before and after the census are interpolations or extrapolations based on demographic models. Errors and undercounting occur even in high-income countries. In developing countries, errors may be substantial because of limits in the transport, communications, and other resources required to conduct and analyze a full census. The quality and reliability of official demographic data are also affected by public trust in the government, government commitment to full and accurate enumeration, confidentiality and protection against misuse of census data, and census agencies' independence from political influence. Moreover, comparability of population indicators is limited by differences in the concepts, definitions, collection procedures, and estimation methods used by national statistical agencies and other organizations that collect the data.
The currentness of a census and the availability of complementary data from surveys or registration systems are objective ways to judge demographic data quality. Some European countries' registration systems offer complete information on population in the absence of a census. The United Nations Statistics Division monitors the completeness of vital registration systems. Some developing countries have made progress over the last 60 years, but others still have deficiencies in civil registration systems. International migration is the only other factor besides birth and death rates that directly determines a country's population growth. Estimating migration is difficult. At any time, many people are located outside their home country as tourists, workers, or refugees, or for other reasons. Standards for the duration and purpose of international moves that qualify as migration vary, and estimates require information on flows into and out of countries that is difficult to collect. Population projections, starting from a base year, are projected forward using assumptions of mortality, fertility, and migration by age and sex through 2050, based on the UN Population Division's World Population Prospects database medium variant.
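As a simplified illustration of interpolating or extrapolating between censuses, the sketch below assumes a constant annual growth rate between two census counts. This is a deliberate stand-in: the source describes estimates based on demographic models (which account for fertility, mortality, and migration), not this formula, and the function name is hypothetical.

```python
def interpolate_population(year, census_a, census_b):
    """Constant-growth-rate estimate for `year` from two (year, population) census counts.

    Years between the censuses are interpolated; years outside are extrapolated.
    A fractional year (e.g. 2005.5) can be used for a midyear estimate.
    """
    (y0, p0), (y1, p1) = census_a, census_b
    if y0 == y1:
        raise ValueError("the two censuses must be in different years")
    annual_growth = (p1 / p0) ** (1.0 / (y1 - y0))
    return p0 * annual_growth ** (year - y0)
```

For example, with 100,000 people counted in 2000 and 121,000 in 2010, the estimate for 2005 is about 110,000, reflecting a constant growth rate rather than a straight-line average.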
The Interstellar Boundary Explorer (IBEX) has operated in space since 2008, updating our knowledge of the outer heliosphere and its interaction with the local interstellar medium. Start time: 2008-12-25. There are currently 15 releases of IBEX-Hi and/or IBEX-Lo data covering the years from 2009 to 2018. This data set is derived from the Release 14 three-year IBEX-Hi map data, with two-year overlaps of adjacent maps (2009-2011, 2010-2012, and so forth through 2015-2017), from ram-direction fluxes with Compton-Getting (cg) corrections for spacecraft motion but without survival-probability (sp) corrections for Energetic Neutral Atoms (ENAs) between 1 and 100 AU. The data set parameters include line-of-sight (LOS) integrated pressures computed separately from the Globally Distributed Flux (GDF) and the Ribbon Flux, plus the Total Flux pressure obtained by summing the GDF and Ribbon LOS pressures. Additionally, there are signal-to-noise ratios for the GDF, Ribbon, and Total LOS pressures. Finally, there are power-law slope values for the GDF differential flux and signal-to-noise ratios of the slope. The IBEX Release 14 data are archived as fully citable data. Please consult IBEX team publications and personnel for further details on production, processing, and usage of these data. The data consist of ram-direction sky maps in Solar Ecliptic Longitude (east and west) and Latitude angles for the above parameters. Details of the data and the science enabled by Release 14 are given in the following journal publication: Schwadron, N. A., et al. 2018, Time Dependence of the IBEX Ribbon and the Globally Distributed Energetic Neutral Atom Flux Using the First 9 Years of Observations, DOI: 10.3847/1538-4365/aae48e.
The following codes are used to define data set types in the multiple IBEX data releases:

Code     Code definition
-------  ------------------------------------------------------------
cg       Compton-Getting corrections have been applied to the data to account for the speed of the spacecraft relative to the direction of arrival of the ENAs
nocg     No Compton-Getting corrections
sp       Survival probability corrections have been applied to the data to account for the loss of ENAs due to radiation pressure, photoionization, and ionization via charge exchange with solar wind protons as they stream through the heliosphere. This correction scales the data out from IBEX at 1 AU to approximately 100 AU. In the original data this mode is denoted as Tabular.
noSP     No survival probability corrections have been applied to the data
omni     Data from all directions
ram      Data collected while the spacecraft was ramming into the incoming ENAs
antiram  Data collected while the spacecraft was moving away from the incoming ENAs

This particular data set is denoted in the original ASCII files as:

Directory Name  File Content Description
--------------  ------------------------------------------------------------
GDFPressure     Globally Distributed Flux Line-of-Sight Integrated Pressure in pdyne-au/cm^2
GDFSlope        Power-law slope of the differential flux spectrum for the Globally Distributed Flux
GDFSlopeSN      Signal/Noise ratio of the GDF differential flux power-law slope, where noise represents uncertainty
GDFSN           Globally Distributed Flux Signal/Noise, where Noise is defined as the uncertainty and the Signal is the GDF Line-of-Sight integrated pressure
RibbonPressure  Ribbon Line-of-Sight Integrated Pressure in pdyne-au/cm^2
RibbonSN        Ribbon Signal/Noise, where Noise is defined as the uncertainty and the Signal is the Ribbon Line-of-Sight integrated pressure
TotPressure     Total Line-of-Sight Integrated Pressure in ENA maps, including both the GDF and Ribbon, in pdyne-au/cm^2
TotSN           Total Pressure Signal-to-Noise, where noise represents uncertainty and signal represents the Total LOS integrated pressure
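The relationships between these products can be sketched as follows. This is a hypothetical illustration, not the IBEX processing pipeline: the total pressure is the sum of the GDF and Ribbon LOS pressures and the signal-to-noise ratio is signal divided by uncertainty, as the descriptions above state, but combining the two uncertainties in quadrature (i.e. assuming they are independent) and fitting the power-law slope with an unweighted least-squares line in log-log space are assumptions made for this example.

```python
import math


def total_pressure_sn(gdf, ribbon, gdf_sigma, ribbon_sigma):
    """Total LOS pressure (GDF + Ribbon) and its signal-to-noise ratio.

    Assumes the GDF and Ribbon uncertainties are independent, so they add in quadrature.
    """
    total = gdf + ribbon
    sigma = math.hypot(gdf_sigma, ribbon_sigma)
    return total, total / sigma


def power_law_slope(energies, fluxes):
    """Unweighted least-squares slope of log(flux) vs log(energy), i.e. j ~ E**slope."""
    xs = [math.log(e) for e in energies]
    ys = [math.log(f) for f in fluxes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

With this sign convention, a spectrum that falls with energy yields a negative slope; the release files may report the spectral index with the opposite sign.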
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
This dataset is the definitive version of the annually released meshblock boundaries as at 1 January 2024, as defined by Stats NZ. This version contains 57,539 meshblocks, including 16 with empty or null geometries (non-digitised meshblocks).
Stats NZ maintains an annual meshblock pattern for collecting and producing statistical data. This allows data to be compared over time.
A meshblock is the smallest geographic unit for which statistical data is collected and processed by Stats NZ. A meshblock is a defined geographic area, which can vary in size from part of a city block to a large area of rural land. The optimal size for a meshblock is 30–60 dwellings (containing approximately 60–120 residents).
Each meshblock borders on another to form a network covering all of New Zealand, including coasts and inlets, and extending out to the 200-mile exclusive economic zone (EEZ); meshblocks are digitised to the 12-mile (19.3 km) limit. Meshblocks are added together to build up larger geographic areas such as statistical area 1 (SA1), statistical area 2 (SA2), statistical area 3 (SA3), and urban rural (UR). They are also used to define electoral districts, territorial authorities, and regional councils.
Meshblock boundaries generally follow road centrelines, cadastral property boundaries, or topographical features such as rivers. Expanses of water in the form of lakes and inlets are defined separately from land.
Meshblock maintenance
Meshblock boundaries are amended by:
Reasons for meshblock splits and nudges can include:
· to maintain meshblock criteria rules
· to improve the size balance of meshblocks in areas where there has been population growth
· to maintain alignment to cadastre and other geographic features
· Stats NZ requests for boundary changes so that statistical geography boundaries can be moved
· external requests for boundary changes so that administrative or electoral boundaries can be moved
· to separate land and water. Mainland, inland water, islands, inlets, and oceanic areas are defined separately
Meshblock changes are made throughout the year. A major release is made at 1 January each year with ad hoc releases available to users at other times.
While meshblock boundaries are continually under review, 'freezes' on changes to the boundaries are applied periodically. Such 'freezes' are imposed at the time of population censuses and during periods of intense electoral activity, for example, prior to and during general and local body elections.
Meshblock numbering
Meshblocks are not named and have seven-digit codes.
When meshblocks are split, each new meshblock is given a new code. The original meshblock codes no longer exist within that version and future versions of the meshblock classification. Meshblock codes do not change when a meshblock boundary is nudged.
Meshblocks that existed prior to 2015 and have not changed are numbered from 0000100 to 3210003. Meshblocks created from 2015 onwards are numbered from 4000000.
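The numbering rules above can be checked mechanically. The sketch below is a hypothetical helper, not a Stats NZ tool: it validates that a code has seven digits and reports which documented numbering range it falls in. Being in a range does not prove that a given code currently exists, and codes are kept as strings so that leading zeros (for example 0000100) are preserved.

```python
import re


def meshblock_era(code: str) -> str:
    """Classify a seven-digit meshblock code by the documented numbering ranges."""
    if not re.fullmatch(r"[0-9]{7}", code):
        raise ValueError(f"meshblock codes have seven digits: {code!r}")
    value = int(code)
    if 100 <= value <= 3210003:
        return "pre-2015 numbering"  # existed before 2015 and unchanged since
    if value >= 4000000:
        return "2015-onwards numbering"  # created from 2015 onwards (e.g. by a split)
    return "outside documented ranges"
```

Reading codes from a CSV as integers would silently drop the leading zeros, which is why the helper insists on a string.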
Digitised and non-digitised meshblocks
The digital geographic boundaries are defined and maintained by Stats NZ.
Meshblocks cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, offshore oil rigs, and Ross Dependency. The following 16 meshblocks are not held in digitised form.
Meshblock / Location (statistical area 2 name)
For more information please refer to the Statistical standard for geographic areas 2023.
High definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.
Digital Data
Digital boundary data became freely available on 1 July 2007.
This dataset represents point locations of cities and towns in Arizona. The data contains point locations for incorporated cities, Census Designated Places, and populated places. Several data sets were used as inputs to construct this data set. A subset of the Geographic Names Information System (GNIS) national dataset for the state of Arizona was used for the base location of most of the points. Polygon files of Census Designated Places (CDPs) from the U.S. Census Bureau, and an incorporated city boundary database developed and maintained by the Arizona State Land Department, were also used for reference during development. Every incorporated city is represented by a point, originally derived from GNIS. Some of these points were moved based on local knowledge of the GIS Analyst constructing the data set. Some of the CDP points were also moved, and while most of the Census Bureau's CDPs have one point location in this data set, some inconsistencies were allowed to facilitate the use of the data for mapping purposes. Population estimates were derived from data collected during the 2010 Census. During development, an additional attribute field was added to provide extra functionality to users of this data. This field, named 'DEF_CAT', stands for definition category and allows users to easily view, and create custom layers or datasets from, this file. For example, new layers may be created to include only incorporated cities (DEF_CAT = Incorporated), incorporated cities and Census Designated Places (DEF_CAT = Incorporated OR DEF_CAT = CDP), or all places that are neither CDPs nor incorporated (DEF_CAT = Other). This data is current as of February 2012. At this time, there is no planned maintenance or update process for this dataset. This data was created to serve as base information for use in GIS systems for a variety of planning, reference, and analysis purposes. This data does not represent a legal record.
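The DEF_CAT selections described above amount to simple attribute filters. The sketch below assumes the features have already been loaded as dictionaries (for example from a GIS export); the NAME field and the sample records are hypothetical, while the DEF_CAT values Incorporated, CDP, and Other come from the description.

```python
def select_places(features, *categories):
    """Keep features whose DEF_CAT value is one of the given categories."""
    wanted = set(categories)
    return [f for f in features if f["DEF_CAT"] in wanted]


# Hypothetical records standing in for the point features.
places = [
    {"NAME": "Place A", "DEF_CAT": "Incorporated"},
    {"NAME": "Place B", "DEF_CAT": "CDP"},
    {"NAME": "Place C", "DEF_CAT": "Other"},
]

incorporated = select_places(places, "Incorporated")                 # DEF_CAT = Incorporated
incorporated_and_cdp = select_places(places, "Incorporated", "CDP")  # Incorporated OR CDP
other = select_places(places, "Other")                               # DEF_CAT = Other
```

In a desktop GIS the same selections would typically be written as attribute queries on the DEF_CAT field.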
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Japanese Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
This visual speech dataset contains 1,000 Japanese-language videos, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.
Each video was recorded following extensive guidelines to maintain quality and diversity.
The dataset provides comprehensive metadata for each video recording and participant:
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png.
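One way to work with the centerline ramp indicators is to collapse each lane's per-point codes into contiguous sections, which makes it easy to locate where on-ramps, off-ramps, or weaving segments begin and end. The sketch below is hypothetical: it assumes the indicator column for one lane has already been read into a list in the order of the centerline points (the exact CSV layout is documented in the centerline data dictionary, not here). It also assumes the coordinates follow the usual image convention, with y increasing downward from the top-left origin, and flips y for plotting in a conventional y-up frame.

```python
from itertools import groupby

# Ramp indicator codes as listed in the dataset description.
SECTION_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}


def section_runs(indicators):
    """Collapse a lane's per-point indicators into (section type, first index, last index) runs."""
    runs = []
    start = 0
    for code, group in groupby(indicators):
        length = sum(1 for _ in group)
        runs.append((SECTION_TYPES[code], start, start + length - 1))
        start += length
    return runs


def image_to_y_up(x_m, y_m):
    """Flip y so that it increases upward (assumes image-style y-down coordinates)."""
    return x_m, -y_m
```

The returned indices can then be used to slice the matching x and y centerline coordinates for each section.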
This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day.
As part of this dataset, the following files were provided: