50 datasets found
  1. Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories

    • data.transportation.gov
    • odgavaprod.ogopendata.com
    • +2 more
    Updated Nov 4, 2024
    Cite
    (2024). Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories [Dataset]. https://data.transportation.gov/Automobiles/Third-Generation-Simulation-Data-TGSIM-I-90-I-94-M/6a6e-vfvi
    Available download formats: application/rss+xml, json, xml, application/rdf+xml, csv, tsv
    Dataset updated
    Nov 4, 2024
    Area covered
    Interstate 90
    Description

    The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). Each centerline file provides the x and y coordinates (in meters) that mark each lane centerline, with the origin of the reference image located at its top-left corner. Additionally, each centerline file uses an indicator variable for each lane to define the following types of road sections: 0 = no ramp, 1 = on-ramp, 2 = off-ramp, and 3 = weaving segment. The number attached to each column header is the numerical ID assigned to the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the archive titled “Annotation on Regions.zip”. The northbound lanes are shown visually, from left to right, in I90_94_moving_lane1.png through I90_94_moving_lane6.png.
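
As a rough illustration of how a centerline file could be read, the Python sketch below groups each row's x, y, and ramp values by lane ID and maps the ramp indicator to the labels listed above. The exact header spelling is an assumption (here `x1`, `y1`, `ramp1`, optionally `x_1`), and `load_centerlines` is a hypothetical helper, so check the centerline data dictionary before relying on it.

```python
import csv
import io
import re

# Ramp indicator codes as listed in the dataset description.
RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}

# ASSUMPTION: headers are a letter prefix plus lane ID, e.g. "x1" or "x_1".
HEADER = re.compile(r"^(x|y|ramp)_?(\d+)$", re.IGNORECASE)


def load_centerlines(csv_text):
    """Return {lane_id: [(x_m, y_m, ramp_label), ...]} for one centerline file."""
    lanes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect this row's x, y, and ramp values per lane ID.
        points = {}
        for column, value in row.items():
            if column is None or value is None or not value.strip():
                continue  # skip blank cells and ragged rows
            match = HEADER.match(column.strip())
            if match is None:
                continue  # skip columns that are not lane coordinates
            kind, lane_id = match.group(1).lower(), int(match.group(2))
            points.setdefault(lane_id, {})[kind] = value
        # Keep only lanes that have all three values in this row.
        for lane_id, p in points.items():
            if all(k in p for k in ("x", "y", "ramp")):
                label = RAMP_TYPES[int(float(p["ramp"]))]
                lanes.setdefault(lane_id, []).append(
                    (float(p["x"]), float(p["y"]), label))
    return lanes
```

Blank cells are skipped, which assumes that lanes shorter than the full segment are padded with empty values.
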

    This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4-km-long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter returned to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle, ensuring continuous data collection. The segment was selected to study mandatory and discretionary lane changing as well as last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have a speed limit of 88 km/h (55 mph). The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day.

    As part of this dataset, the following files were provided:

    • I90_94_moving_final.csv contains the numerical data to be used for analysis, which include vehicle-level trajectory data at 0.1-second intervals. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided along with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 m.
    • I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X.
    • I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments). In total, the centerline files define six northbound lanes.
    • Annotation on Regions.zip, which includes images that visually map lanes (I90_94_moving_lane1.png through I90_94_moving_lane6.png) to their associated numerical lane IDs.
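
The pixel-to-meter factor and the 0.1-second sampling interval can be combined to sanity-check the reported speeds. This is a minimal sketch, assuming one vehicle's positions have already been extracted in time order; `pixels_to_meters` and `finite_difference_speeds` are hypothetical helpers, not part of the dataset.

```python
import math

PIXEL_TO_METER = 0.3  # from the description: 1 pixel = 0.3 m
TIME_STEP_S = 0.1     # trajectories are sampled every 0.1 s


def pixels_to_meters(value_px):
    """Convert a pixel distance to meters using the documented factor."""
    return value_px * PIXEL_TO_METER


def finite_difference_speeds(xs_m, ys_m, dt=TIME_STEP_S):
    """Approximate speed (m/s) between consecutive samples of one vehicle.

    xs_m and ys_m are positions in meters ordered by time, one sample per dt.
    Returns len(xs_m) - 1 values, one per pair of consecutive samples.
    """
    if len(xs_m) != len(ys_m):
        raise ValueError("x and y series must have the same length")
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for x0, y0, x1, y1 in zip(xs_m, ys_m, xs_m[1:], ys_m[1:])
    ]
```

For example, a vehicle that advances 10 pixels per sample moves 3 m every 0.1 s, or about 30 m/s.
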
  2. Synthetic Administrative Data: Census 1991, 2023 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 11, 2024
    Cite
    (2024). Synthetic Administrative Data: Census 1991, 2023 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/6f71c471-1b89-5932-b354-700afb58cb5c
    Dataset updated
    Oct 11, 2024
    Description

    We created a synthetic administrative dataset that mimics the properties of a real administrative dataset, according to specifications by the ONS, for use in developing an R package that calculates quality indicators for administrative data (see: https://github.com/sook-tusk/qualadmin). Taking over 1 million records from a synthetic 1991 UK census dataset, we deleted records, moved records to a different geography, and duplicated records to a different geography according to pre-specified proportions for each broad ethnic group (White, Non-white) and gender (males, females). The final size of the synthetic administrative data was 1033664 individuals.

National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS), which is transforming its statistical systems to make more use of administrative data for future censuses and population statistics. Administrative data are defined as secondary data sources since they are produced by other agencies as a result of an event or a transaction relating to the administrative procedures of organisations, public administrations, and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs, and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether it is a single data source or integrated with other data sources such as survey data.
For a single administrative data source, one of the main sources of error is coverage and representation of the target population of interest. This is particularly relevant when administrative data are delivered over time, such as tax data used to maintain the Business Register. For sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess whether the administrative data are representative of the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data. Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. With administrative data used to build a spine, survey data can be integrated using record linkage and statistical matching approaches on a set of common matching variables. This will be the topic of sub-project 2, which will be split into several research topics. The first topic is whether adding statistical predictions and correlation structures improves linkage and data integration. The second topic is a mass imputation framework for imputing missing target variables in the statistical register, where the missing data may be due to multiple underlying mechanisms. The third topic will aim to improve the mass imputation framework to mitigate possible measurement errors, for example by adding benchmarks and other constraints to the approaches. Once a statistical register is complete, estimates for key target variables in local areas can easily be aggregated. However, it is also essential to measure the precision of these estimates through mean square errors, and this will be the fourth topic of the sub-project.
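
One simple way to express the coverage question is the ratio of administrative counts to reference (for example census) counts within each sub-group. The sketch below only illustrates that idea and is not the set of indicators implemented in qualadmin; the record fields and the `coverage_ratios` helper are hypothetical.

```python
from collections import Counter


def coverage_ratios(admin_records, reference_records, key):
    """Admin count / reference count for each sub-group found in the reference.

    A ratio below 1 suggests under-coverage of that sub-group; above 1
    suggests over-coverage (for example duplicates or misplaced records).
    """
    admin_counts = Counter(key(r) for r in admin_records)
    reference_counts = Counter(key(r) for r in reference_records)
    # Counter returns 0 for groups absent from the administrative data.
    return {group: admin_counts[group] / n for group, n in reference_counts.items()}
```

Passing a composite key such as `lambda r: (r["geo"], r["sex"])` gives ratios per geography and sex cell.
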
Finally, this new way of producing official statistics is compared with the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates, a key question investigated by survey statisticians over the last decade. This is a synthetic administrative dataset with only 6 variables, designed to enable the calculation of quality indicators in the R package https://github.com/sook-tusk/qualadmin (see also the user manual). The dataset was created from a 1991 synthetic UK census dataset containing over 1 million records by deleting, moving, and duplicating records across geographies according to pre-specified proportions within broad ethnic group and gender. The geography variable includes 6 local authorities, which are completely anonymized and labelled 1, 2, ..., 6. Other variables are (number of categories in parentheses): sex (2), age groups (14), ethnic groups (5), and employment (3). The final size of the synthetic administrative data is 1033664 individuals. The variables are described in the data dictionary uploaded with the data.
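
The deletion, moving, and duplication steps used to build the dataset can be sketched as a per-record random draw. This is a hypothetical reconstruction for illustration, not the authors' code: the proportions, the `geo` field, and the `perturb_records` function are assumptions.

```python
import random


def perturb_records(records, proportions, geographies, group_key, seed=0):
    """Delete, move, or duplicate records by group-specific proportions.

    records: list of dicts with a "geo" key plus the grouping fields.
    proportions: {group: (p_delete, p_move, p_duplicate)}, summing to <= 1.
    geographies: all geography codes (at least two), e.g. [1, 2, 3, 4, 5, 6].
    group_key: function mapping a record to its group, e.g. (ethnicity, sex).
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    result = []
    for record in records:
        p_delete, p_move, p_duplicate = proportions[group_key(record)]
        u = rng.random()
        if u < p_delete:
            continue  # record is missing from the administrative data
        other_geos = [g for g in geographies if g != record["geo"]]
        if u < p_delete + p_move:
            # Record appears only in a different geography.
            result.append({**record, "geo": rng.choice(other_geos)})
        elif u < p_delete + p_move + p_duplicate:
            # Record appears in its own geography and again in another one.
            result.append(dict(record))
            result.append({**record, "geo": rng.choice(other_geos)})
        else:
            result.append(dict(record))
    return result
```
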

  3. The Bushland, Texas, Winter Wheat Datasets

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). The Bushland, Texas, Winter Wheat Datasets [Dataset]. https://catalog.data.gov/dataset/the-bushland-texas-winter-wheat-datasets-7e83c
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Bushland, Texas
    Description

    This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (two-year period) when winter wheat (Triticum aestivum L.) was grown for grain at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, calibrated to NIST standards (Howell et al., 1995). Each lysimeter was in the center of a 4.44 ha square field on which wheat was also grown (Evett et al., 2000). The two fields were contiguous and arranged with one directly north of the other. See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Wheat was planted in autumn and grown over the winter in 1989-1990, 1991-1992, and 1992-1993. Agronomic calendars for each of the three growing seasons list, by date, the agronomic practices applied, severe weather, and activities (e.g., planting, thinning, fertilization, pesticide application, lysimeter maintenance, harvest) in and on the lysimeters that could influence crop growth, water use, and lysimeter data. Irrigation was by a linear move sprinkler system equipped with pressure-regulated low-pressure sprays (mid-elevation spray application, MESA). Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis, as determined by soil profile water content readings made with a field-calibrated (Evett and Steiner, 1995) neutron probe from 0.10- to 2.4-m depth in the field. The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications.
The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-min intervals, and the 5-min change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-min intervals. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-min intervals. Instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. The Bushland weighing lysimeter research program was described by Evett et al. (2016), and lysimeter design is described by Marek et al. (1988). Important conventions concerning the data-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled, "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs. 
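
The water-balance step can be sketched as follows. This assumes ET = P + I + D - dS for each 5-min interval (precipitation, irrigation, dew/frost accumulation, and change in storage, all in mm), ignores drainage and runoff, and treats dew/frost as an input like precipitation; the authoritative conventions are in the "Conventions for Bushland, TX, Weighing Lysimeter Datasets" resource. The function names are hypothetical.

```python
def et_per_interval(delta_storage_mm, precip_mm, irrigation_mm, dew_frost_mm):
    """Water-balance ET (mm) for one 5-min interval.

    ASSUMPTION: ET = P + I + D - dS with dS = storage(end) - storage(start);
    drainage and runoff are ignored in this sketch.
    """
    return precip_mm + irrigation_mm + dew_frost_mm - delta_storage_mm


def aggregate_to_15_min(values_5min):
    """Sum each run of three consecutive 5-min values into one 15-min total."""
    if len(values_5min) % 3 != 0:
        raise ValueError("expected a whole number of 15-min periods")
    return [sum(values_5min[i:i + 3]) for i in range(0, len(values_5min), 3)]
```

A storage loss of 0.05 mm in each of three 5-min intervals with no inputs corresponds to 0.15 mm of ET for that 15-min period.
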
The six datasets are as follows:

    • Agronomic Calendars for the Bushland, Texas Winter Wheat Datasets
    • Growth and Yield Data for the Bushland, Texas Winter Wheat Datasets
    • Weighing Lysimeter Data for The Bushland, Texas Winter Wheat Datasets
    • Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments
    • Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Winter Wheat Datasets
    • Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas

See the README for descriptions of each dataset. The soil is a Pullman series fine, mixed, superactive, thermic Torrertic Paleustoll. Soil properties are given in the resource titled "Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets". The land slope in the lysimeter fields is <0.3% and topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm Class A pan evaporation per year, and winds are typically from the south and southwest. The climate is semi-arid, with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods (Evett et al., 2016) and have focused on winter wheat ET (Howell et al., 1995, 1997, 1998) and crop coefficients (Howell et al., 2006; Schneider and Howell, 1997, 2001) that have been used by ET networks for irrigation management.
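
Crop coefficients relate measured crop ET to a reference ET, which is how they feed into ET-based irrigation scheduling. A minimal sketch, assuming daily totals in mm; the `effective_rain_mm` adjustment and both function names are hypothetical, and the reference ET method itself is not specified here.

```python
def crop_coefficients(et_crop_mm, et_ref_mm):
    """Daily Kc = ETc / ETref; None where reference ET is zero or missing."""
    return [etc / etr if etr else None for etc, etr in zip(et_crop_mm, et_ref_mm)]


def irrigation_requirement(kc, et_ref_mm, effective_rain_mm=0.0):
    """Estimate crop water need (mm) as Kc * ETref, net of effective rain."""
    return max(kc * et_ref_mm - effective_rain_mm, 0.0)
```
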
The data have utility for developing, calibrating, and testing simulation models of crop ET, growth, and yield (Evett et al., 1994; Kang et al., 2009), and have been used by several universities for testing and calibrating models of ET that use satellite and/or weather data.

Resources in this dataset:

    • Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. Resource Description: The file gives the UTM latitude and longitude of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields.
    • Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program.
    • Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets.
    • Resource Title: Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets. File Name: Bushland_TX_soil_properties.xlsx. Resource Description: Soil properties useful for simulation modeling and for describing the soil are given for the Pullman soil series at the USDA, ARS, Conservation & Production Research Laboratory, Bushland, TX, USA. For each soil layer, soil horizon designation and texture according to USDA Soil Taxonomy, bulk density, porosity, water content at field capacity (33 kPa) and permanent wilting point (1500 kPa), percent sand, percent silt, percent clay, percent organic matter, pH, and van Genuchten-Mualem characteristic curve parameters describing the soil hydraulic properties are given. A separate table describes the soil horizon thicknesses, designations, and textures according to USDA Soil Taxonomy. Another table describes important aspects of the soil hydrologic and rooting behavior.
    • Resource Title: README - Bushland Texas Winter Wheat collection. File Name: README_Bushland_winter_wheat_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Winter Wheat collection.
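
For readers using the van Genuchten-Mualem parameters in a model, the standard closed forms are sketched below with m = 1 - 1/n and a pore-connectivity exponent l = 0.5 (a common default, assumed here rather than taken from the Bushland file). Parameter values in the checks are illustrative only, not Pullman soil values, and h and alpha must use consistent length units.

```python
def vg_water_content(h, theta_r, theta_s, alpha, n):
    """van Genuchten volumetric water content at pressure head h.

    h <= 0 is unsaturated; h >= 0 returns the saturated content theta_s.
    """
    if h >= 0:
        return theta_s
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * abs(h)) ** n) ** m


def mualem_conductivity(se, k_sat, n, l=0.5):
    """Mualem-van Genuchten hydraulic conductivity at effective saturation se.

    se = (theta - theta_r) / (theta_s - theta_r), between 0 and 1.
    """
    m = 1.0 - 1.0 / n
    return k_sat * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2
```
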

  4. GOAL 15: Life on Land (5 year moving average) - Suriname

    • macro-rankings.com
    Updated Dec 31, 2004
    Cite
    macro-rankings (2004). GOAL 15: Life on Land (5 year moving average) - Suriname [Dataset]. https://www.macro-rankings.com/suriname/goal-15-life-on-land-(5-year-moving-average)
    Available download formats: excel, csv
    Dataset updated
    Dec 31, 2004
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Suriname
    Description

    Time series data for the statistic GOAL 15: Life on Land (5 year moving average) and country Suriname. Indicator Definition: SDG Goal 15 data availability. Source: UN Global SDG Indicators Database
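
A 5-year moving average can be computed as below. This sketch assumes a trailing window over consecutive annual values with no gaps; whether macro-rankings uses a trailing or centered window, and how it treats missing years, is not stated here.

```python
def trailing_moving_average(values, window=5):
    """Trailing moving average; None until `window` values are available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window:i + 1]
            result.append(sum(chunk) / window)
    return result
```
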

  5. GOAL 15: Life on Land (5 year moving average) - Vanuatu

    • macro-rankings.com
    Updated Dec 31, 2004
    Cite
    macro-rankings (2004). GOAL 15: Life on Land (5 year moving average) - Vanuatu [Dataset]. https://www.macro-rankings.com/vanuatu/goal-15-life-on-land-(5-year-moving-average)
    Available download formats: excel, csv
    Dataset updated
    Dec 31, 2004
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Vanuatu
    Description

    Time series data for the statistic GOAL 15: Life on Land (5 year moving average) and country Vanuatu. Indicator Definition: SDG Goal 15 data availability. Source: UN Global SDG Indicators Database

  6. Real Anonymized Sales Dataset

    • kaggle.com
    Updated Jul 8, 2025
    Cite
    Daniel Perez (2025). Real Anonymized Sales Dataset [Dataset]. https://www.kaggle.com/datasets/danielperez067178/real-anonymized-sales-dataset
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Daniel Perez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains real, anonymized sales data from a fast-moving consumer goods (FMCG) company in Latin America. It includes over 25 million records of daily sales transactions across thousands of unique customers, products, and distribution routes. The dataset has been carefully cleaned, standardized, and anonymized to remove any personally identifiable information, while preserving key structures that enable advanced analytics and machine learning tasks. This dataset is ideal for:

    • Market segmentation
    • Portfolio optimization
    • Recommender systems
    • Clustering and unsupervised learning practice
    • Retail and supply chain analytics case studies

    No synthetic data was generated; these are real-world patterns from an operational context. Use it to test scalable data science pipelines, feature engineering, or business intelligence dashboards.

  7. GOAL 6: Clean Water and Sanitation (5 year moving average) - Eswatini

    • macro-rankings.com
    Updated Dec 31, 2004
    Cite
    macro-rankings (2004). GOAL 6: Clean Water and Sanitation (5 year moving average) - Eswatini [Dataset]. https://www.macro-rankings.com/eswatini/goal-6-clean-water-and-sanitation-(5-year-moving-average)
    Available download formats: excel, csv
    Dataset updated
    Dec 31, 2004
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Eswatini
    Description

    Time series data for the statistic GOAL 6: Clean Water and Sanitation (5 year moving average) and country Eswatini. Indicator Definition: SDG Goal 6 data availability. Source: UN Global SDG Indicators Database

  8. NBA Players Stats 23/24

    • kaggle.com
    Updated Aug 20, 2024
    Cite
    orkunaktas4 (2024). NBA Players Stats 23/24 [Dataset]. https://www.kaggle.com/datasets/orkunaktas/nba-players-stats-2324/data?select=nba-player-data.csv
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Kaggle
    Authors
    orkunaktas4
    Description

    This dataset contains detailed data on all NBA players from the 2023/24 season.

    • Player: The name of the player.
    • Nation: The player's nationality.
    • Pos: The player's position (e.g., guard, forward, center).
    • Age: The player's age.
    • MP (Minutes Played): The total number of minutes the player has played.
    • Starts: The number of games the player started.
    • Min (Minutes): The total number of minutes played by the player (similar to MP).
    • 90s (90s Played): The equivalent number of 90-minute matches played by the player (e.g., 1.5 = 135 minutes).
    • Gls (Goals): The total number of goals scored by the player.
    • Ast (Assists): The total number of assists made by the player.
    • G+A (Goals + Assists): The total number of goals and assists combined.
    • G-PK (Goals - Penalty Kicks): The total number of goals scored excluding penalty kicks.
    • PK (Penalty Kicks): The number of penalty goals scored by the player.
    • PKatt (Penalty Kicks Attempted): The number of penalty kicks attempted by the player.
    • CrdY (Yellow Cards): The number of yellow cards received by the player.
    • CrdR (Red Cards): The number of red cards received by the player.
    • xG (Expected Goals): The expected number of goals from the player's shots.
    • npxG (Non-Penalty Expected Goals): The expected goals excluding penalties.
    • xAG (Expected Assists): The expected number of assists from the player's passes.
    • npxG+xAG (Non-Penalty xG + xAG): The total of non-penalty expected goals and expected assists.
    • PrgC (Progressive Carries): The number of times the player carried the ball forward.
    • PrgP (Progressive Passes): The number of passes made by the player that moved the ball forward.
    • PrgR (Progressive Runs): The number of times the player made runs forward with the ball.
    • Gls (Goals): (Repeated, already defined) The total number of goals scored.
    • Ast (Assists): (Repeated, already defined) The total number of assists made.
    • G+A (Goals + Assists): (Repeated, already defined) The total number of goals and assists combined.
    • G-PK (Goals - Penalty Kicks): (Repeated, already defined) The total number of goals scored excluding penalty kicks.
    • G+A-PK (Goals + Assists - Penalty Kicks): The total number of goals and assists minus penalty goals.
    • xG (Expected Goals): (Repeated, already defined) The expected number of goals from the player's shots.
    • xAG (Expected Assists): (Repeated, already defined) The expected number of assists from the player's passes.
    • xG+xAG (Expected Goals + Expected Assists): The total expected goals and assists.
    • npxG (Non-Penalty Expected Goals): (Repeated, already defined) The expected goals excluding penalties.
    • npxG+xAG (Non-Penalty xG + Expected Assists): The total of non-penalty expected goals and expected assists.
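
Several columns above are defined in terms of others, so they can be recomputed as a consistency check. A minimal sketch, assuming each row is a dict keyed by the column abbreviations shown; `derived_stats` is a hypothetical helper, and rounding `90s` to one decimal is an assumption.

```python
def derived_stats(row):
    """Recompute derived columns from the base columns of one player row."""
    return {
        "90s": round(row["Min"] / 90, 1),  # e.g. 135 minutes -> 1.5
        "G+A": row["Gls"] + row["Ast"],
        "G-PK": row["Gls"] - row["PK"],
        "G+A-PK": row["Gls"] + row["Ast"] - row["PK"],
        "npxG+xAG": row["npxG"] + row["xAG"],
    }
```

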
  9. Location Affordability Index v.3

    • hudgis-hud.opendata.arcgis.com
    • data.lojic.org
    • +2 more
    Updated Jan 24, 2025
    Cite
    Department of Housing and Urban Development (2025). Location Affordability Index v.3 [Dataset]. https://hudgis-hud.opendata.arcgis.com/datasets/location-affordability-index-v-3
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    United States Department of Housing and Urban Development (http://www.hud.gov/)
    Authors
    Department of Housing and Urban Development
    Description

    First launched by the U.S. Department of Housing and Urban Development (HUD) and Department of Transportation (DOT) in November 2013, the Location Affordability Index (LAI) provides ubiquitous, standardized household housing and transportation cost estimates for all 50 states and the District of Columbia. Because what is affordable is different for everyone, users can choose among eight household profiles—which vary by household income, size, and number of commuters—and see the impact of the built environment on affordability in a given location while holding household demographics constant.

    Version 3 updates the constituent data sets with 2012-2016 American Community Survey data and makes several methodological tweaks, most notably moving to modeling at the Census tract level rather than at the block group level. As with Version 2, the inputs to the simultaneous equation model (SEM) include six endogenous variables (housing costs, car ownership, and transit usage, each for both owners and renters) and 18 exogenous variables, with vehicle miles traveled still modeled separately due to data limitations. To learn more about the Location Affordability Index (v.3), visit https://www.hudexchange.info/programs/location-affordability-index/. For questions about the spatial attribution of this dataset, please contact GISHelpdesk@hud.gov.

    Date of Coverage: 2012-2016
    Data Dictionary: DD_Location Affordability Index v.3.0
    LAI Version 3 Data and Methodology
    LAI Version 3 Technical Documentation
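
The index is ultimately about how housing and transportation costs compare with household income. The arithmetic for a single household profile is sketched below; the input values and the `cost_burdens` helper are hypothetical, and the SEM that produces the LAI cost estimates is not reproduced.

```python
def cost_burdens(housing_cost, transport_cost, income):
    """Housing, transportation, and combined costs as shares of income.

    All three inputs must cover the same period (e.g. annual dollars).
    """
    if income <= 0:
        raise ValueError("income must be positive")
    return {
        "housing": housing_cost / income,
        "transportation": transport_cost / income,
        "housing_plus_transportation": (housing_cost + transport_cost) / income,
    }
```

Holding the household profile fixed and varying only the location-specific cost inputs mirrors how the index isolates the effect of the built environment.
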

  10. Data from: Interactive visual analytics of moving passenger flocks using...

    • tandf.figshare.com
    Updated Feb 28, 2024
    Cite
    Tong Zhang; Wei He; Jing Huang; Zhenxuan He; Jing Li (2024). Interactive visual analytics of moving passenger flocks using massive smart card data [Dataset]. http://doi.org/10.6084/m9.figshare.19329069.v1
    Available download formats: gif
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Tong Zhang; Wei He; Jing Huang; Zhenxuan He; Jing Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Understanding urban mobility patterns is constrained by our limited capability to extract and visualize spatio-temporal regularities from large amounts of mobility data. Moving flocks, defined as groups of people traveling together over a predefined time duration, can reveal collective movement patterns at aggregated spatio-temporal scales, thereby facilitating the discovery of urban mobility structure and travel demand patterns. In this study, we extend classical trajectory-oriented flock mining algorithms to discover moving flocks of transit passengers, accounting for the constraints of multi-modal transit networks. We develop a map-centered visual analytics approach by integrating the flock mining algorithm with interactive visualization designs of the discovered flocks. Novel interactive visualizations are designed and implemented to support the exploration and analysis of discovered moving flocks at different spatial and temporal scales. The visual analytics approach is evaluated using a real-world smart card dataset collected in Shenzhen City, China, validating its applicability in capturing and mapping dynamic mobility patterns over a large metropolitan area.
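
To make the flock definition concrete, the sketch below treats passengers as co-located when they are at the same station in the same time bin, and reports groups of at least `min_size` passengers who stay together for `min_duration` consecutive bins. This is a simplified illustration, not the authors' algorithm: it ignores spatial radii, transfers, and other multi-modal network constraints, and the input format and `find_flocks` name are hypothetical.

```python
from collections import defaultdict


def find_flocks(trajectories, min_size=3, min_duration=3):
    """Naive flock detection on discretized transit trajectories.

    trajectories: {passenger_id: {time_bin: station_id}}.
    Returns {(start_bin, stations): sorted passenger ids} for each window of
    `min_duration` consecutive bins shared by at least `min_size` passengers.
    """
    windows = defaultdict(set)
    for pid, positions in trajectories.items():
        for start in positions:
            bins = range(start, start + min_duration)
            if all(b in positions for b in bins):
                # Identical station sequences over the window share one key.
                stations = tuple(positions[b] for b in bins)
                windows[(start, stations)].add(pid)
    return {
        key: sorted(pids)
        for key, pids in windows.items()
        if len(pids) >= min_size
    }
```
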

  11. Quarterly Labour Force Survey Household Dataset, October - December, 2022

    • beta.ukdataservice.ac.uk
    Updated 2023
    Cite
    Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, October - December, 2022 [Dataset]. http://doi.org/10.5255/ukda-sn-9064-2
    Dataset updated
    2023
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Authors
    Office For National Statistics
    Description
    Background
    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    Household datasets
    Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they have been produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset instead.
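Because HSERIALP is available in the main quarterly file, person records can be grouped into households there as well. The sketch below is a hypothetical illustration: `econ_status` stands in for whichever economic-activity variable is used, its coding (1 = in employment) is an assumption, and the working/mixed/workless split ignores any age restrictions an official household classification would apply.

```python
from collections import defaultdict


def household_work_status(rows):
    """Classify households from person-level rows.

    rows: dicts with 'HSERIALP' (household identifier) and a hypothetical
    'econ_status' field, assumed to be coded 1 for people in employment.
    Returns {HSERIALP: 'working' | 'mixed' | 'workless'}.
    """
    employed_flags = defaultdict(list)
    for row in rows:
        employed_flags[row["HSERIALP"]].append(row["econ_status"] == 1)

    status = {}
    for household, flags in employed_flags.items():
        if all(flags):
            status[household] = "working"   # every member employed
        elif any(flags):
            status[household] = "mixed"     # some, but not all, employed
        else:
            status[household] = "workless"  # nobody employed
    return status


# Hypothetical person records from three households.
people = [
    {"HSERIALP": 101, "econ_status": 1},
    {"HSERIALP": 101, "econ_status": 1},
    {"HSERIALP": 102, "econ_status": 1},
    {"HSERIALP": 102, "econ_status": 3},
    {"HSERIALP": 103, "econ_status": 2},
]
print(household_work_status(people))
# {101: 'working', 102: 'mixed', 103: 'workless'}
```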

    Change to coding of missing values for household series
    From 1996 to 2013, all missing values in the household datasets were set to a single '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets, and it was necessary to code the missing values into one new combined category ('-10') to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the '-8' and '-9' categories have been reinstated.
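When pooling household files from both coding regimes, one option is to treat all three codes as missing. A minimal sketch over lists of dicts; the column name `var_x` and the example rows are hypothetical, and collapsing -8 and -9 together discards the distinction between them that the later files preserve.

```python
# Missing-value codes in the household series: a combined -10 in the
# 1996-2013 files, separate -8 and -9 codes from 2013 onwards.
MISSING_CODES = {-8, -9, -10}


def recode_missing(value):
    """Return None for any household-series missing-value code."""
    return None if value in MISSING_CODES else value


def harmonise_missing(rows, columns):
    """Copy rows, replacing missing-value codes with None in the given columns."""
    return [
        {key: recode_missing(val) if key in columns else val for key, val in row.items()}
        for row in rows
    ]


# Hypothetical rows from two different periods.
rows = [{"year": 2005, "var_x": -10}, {"year": 2016, "var_x": -9}, {"year": 2016, "var_x": 4}]
print(harmonise_missing(rows, {"var_x"}))
```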

    LFS Documentation
    The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

    Additional data derived from the QLFS
    The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

    End User Licence and Secure Access QLFS Household datasets
    Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

    Changes to variables in QLFS Household EUL datasets
    In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.
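Because second and last jobs are coded only to the 3-digit industry group, a like-for-like comparison with the main job means aggregating the main job's 4-digit class up to its group. A minimal sketch, assuming codes are held as integers whose leading three digits form the group (for example, SIC 2007 class 62.01 stored as 6201); the storage format and function names are assumptions, not documented LFS conventions.

```python
def class_to_group(industry_class):
    """Aggregate a 4-digit industry class code to its 3-digit group.

    Assumes the class is stored as an integer with the group in its leading
    digits, e.g. class 62.01 -> 6201 -> group 620 (i.e. 62.0).
    """
    if not 0 <= industry_class <= 9999:
        raise ValueError(f"not a 4-digit class code: {industry_class}")
    return industry_class // 10  # drop the final (class-level) digit


def same_industry_group(main_job_class, second_job_group):
    """Compare a main-job class with a second-job group at group level."""
    return class_to_group(main_job_class) == second_job_group


print(class_to_group(6201))            # 620
print(same_industry_group(6201, 620))  # True
```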

    Review of imputation methods for LFS Household data - changes to missing values
    A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity) using the LFS donor imputation method. This method is primarily intended to ensure that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015, larger numbers of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families that also includes personal characteristic variables covering this time period, it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
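Applying that advice consistently across every period in a pooled series amounts to a simple filter. A minimal sketch over a list of dicts; the field name is taken from the text, its exact capitalisation in a given file is an assumption, and since the source does not define the other ioutcome codes, only the value 3 is treated specially.

```python
def drop_outcome_3(rows, field="ioutcome"):
    """Remove records whose outcome code is 3 (apply to every period pooled)."""
    return [row for row in rows if row.get(field) != 3]


# Hypothetical records from one quarter.
quarter = [{"ioutcome": 1}, {"ioutcome": 3}, {"ioutcome": 2}]
print(drop_outcome_3(quarter))  # [{'ioutcome': 1}, {'ioutcome': 2}]
```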

    Occupation data for 2021 and 2022 data files

    The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023, Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022

    Latest edition information

    For the second edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023, Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022

  12. The Bushland, Texas Maize for Grain Datasets

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Agricultural Research Service (2025). The Bushland, Texas Maize for Grain Datasets [Dataset]. https://catalog.data.gov/dataset/the-bushland-texas-maize-for-grain-datasets-a4eb2
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Bushland, Texas
    Description

    This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (year) when maize (Zea mays L., also known as corn in the United States) was grown for grain at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Maize was grown for grain on between two and four large precision weighing lysimeters, each in the center of a 4.44 ha square field. The four fields were contiguous and arranged in four quadrants, which were labeled northeast (NE), southeast (SE), northwest (NW), and southwest (SW). See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Maize was grown on only the NE and SE fields in 1989 and 1990, and on all four fields in 1994, 2013, 2016, and 2018. Irrigation was by a linear-move sprinkler system in 1989, 1990, and 1994, which was equipped with various application technologies such as high-pressure impact sprinklers, low-pressure spray applicators, and low energy precision applicators (LEPA). In 2013, 2016, and 2018, two lysimeters and their respective fields were irrigated using subsurface drip irrigation (SDI), and two lysimeters and their respective fields were irrigated by a linear-move sprinkler system equipped with spray applicators. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis, as determined by soil profile water content readings made with a neutron probe from 0.10- to 2.4-m depth in the field. The number and spacing of neutron probe reading locations changed through the years (additional sites were added), which is one reason why subsidiary datasets and data dictionaries are needed.
The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-minute intervals, and the 5-minute change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-minute intervals. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-minute intervals. The instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. Important conventions concerning the date-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used, and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs. There is a dictionary tab for every data tab. The name of the dictionary tab contains the name of the corresponding data tab. Tab names are unique so that if individual tabs were saved to CSV files, each CSV file in the entire collection would have a different name.
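The lysimeter ET calculation described above follows a water-balance pattern: water added (precipitation, irrigation, dew and frost) minus the measured gain in storage is attributed to ET. The sketch below is a simplified, hypothetical illustration, not the Bushland processing code: it assumes all terms are in mm, that a positive storage change means the lysimeter gained water, and it ignores drainage and runoff. The datasets' actual sign conventions are given in the "Conventions" resource; the function names and values are illustrative.

```python
def et_interval(delta_storage, precip, irrigation, dew_frost):
    """ET (mm) for one 5-minute interval from a simplified water balance.

    delta_storage > 0 means the lysimeter gained water. Drainage and runoff
    are ignored in this sketch.
    """
    return precip + irrigation + dew_frost - delta_storage


def to_15min(values_5min):
    """Sum consecutive groups of three 5-minute values into 15-minute totals."""
    if len(values_5min) % 3:
        raise ValueError("need a whole number of 15-minute periods")
    return [sum(values_5min[i:i + 3]) for i in range(0, len(values_5min), 3)]


# Hypothetical 5-minute series (mm) with a small rain event in the fifth interval.
d_storage = [-0.05, -0.04, -0.06, -0.03, 0.20, -0.02]
rain = [0.0, 0.0, 0.0, 0.0, 0.25, 0.0]
et_5 = [et_interval(ds, p, 0.0, 0.0) for ds, p in zip(d_storage, rain)]
print([round(v, 2) for v in to_15min(et_5)])  # [0.15, 0.1]
```

Note how the rain event raises storage by 0.20 mm while 0.25 mm fell, so the balance still attributes 0.05 mm to ET in that interval.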
The six datasets, according to their titles, are as follows:

    • Agronomic Calendars for the Bushland, Texas Maize for Grain Datasets
    • Growth and Yield Data for the Bushland, Texas Maize for Grain Datasets
    • Weighing Lysimeter Data for The Bushland, Texas Maize for Grain Datasets
    • Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments
    • Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Maize for Grain Datasets
    • Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas

See the README for descriptions of each dataset. The land slope is <1% and topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm Class A pan evaporation per year, and winds are typically from the South and Southwest. The climate is semi-arid with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period the pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods, and have focused on maize ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks for irrigation management.
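Crop coefficients of the kind mentioned above are conventionally the ratio of measured crop ET to a reference ET for the same period, Kc = ETc / ETref; in ET-based scheduling the relationship is inverted to estimate crop water use as Kc × ETref. A minimal sketch with hypothetical daily values in mm/day; which reference ET formulation (e.g., grass or alfalfa reference) applies is not specified here and is left as an assumption.

```python
def crop_coefficients(etc, etref):
    """Daily crop coefficients Kc = ETc / ETref (None where ETref is not positive)."""
    if len(etc) != len(etref):
        raise ValueError("ETc and ETref series must be the same length")
    return [c / r if r > 0 else None for c, r in zip(etc, etref)]


def estimate_crop_et(kc, etref_forecast):
    """Estimate crop ET (mm) from a crop coefficient and a reference ET value."""
    return kc * etref_forecast


# Hypothetical daily lysimeter ETc and reference ET (mm/day).
kc = crop_coefficients([6.0, 7.2, 4.0], [8.0, 8.0, 0.0])
print(kc)  # [0.75, 0.9, None]
print(estimate_crop_et(0.9, 7.0))
```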
The data have utility for testing simulation models of crop ET, growth, and yield and have been used by the Agricultural Model Intercomparison and Improvement Project (AgMIP), by OpenET, and by many others for testing and calibrating models of ET that use satellite and/or weather data.

Resources in this dataset:

    • Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. Resource Description: The file gives the UTM latitude and longitude of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields.
    • Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program.
    • Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets.
    • Resource Title: README - Bushland Texas Maize for Grain collection. File Name: README_Bushland_maize_for_grain_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Maize for Grain collection.

  13. Core mapper moving window averages (primary model) - A landscape...

    • catalog.data.gov
    Updated Feb 21, 2025
    U.S. Fish and Wildlife Service (2025). Core mapper moving window averages (primary model) - A landscape connectivity analysis for the coastal marten (Martes caurina humboldtensis) [Dataset]. https://catalog.data.gov/dataset/core-mapper-moving-window-averages-primary-model-a-landscape-connectivity-analysis-for-the
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service
    Description

    This raster dataset of Core Mapper Moving Window Averages is an intermediary modeling product that was produced by the Core Mapper tool (Shirk and McRae 2013) in the process of developing habitat cores for use in our coastal marten connectivity model. It is derived from another dataset (HabitatSurface), and was produced using the Core Mapper parameters defined in the Lineage section of the accompanying geospatial metadata record. More specifically, it is a calculated dataset in which a 977m moving window was used on the habitat surface to calculate the average habitat value within a 977m radius around each pixel (this moving window size was derived from the estimated average size of a female marten's home range of 300 hectares). Of note, the set of habitat cores that came from this Core Mapper tool received additional modifications; see the report or the metadata record for PrimaryModel_HabitatCores for details. Refer to the HabitatSurface and PrimaryModel_HabitatCores metadata records for additional context. We derived the habitat cores using a tool within Gnarly Landscape Utilities called Core Mapper (Shirk and McRae 2015). To develop a Habitat Surface for input into Core Mapper, we started by assigning each 30m pixel on the modeled landscape a habitat value equal to its GNN OGSI value (range = 0-100). In areas with serpentine soils that support habitat potentially suitable for coastal marten, we assigned a minimum habitat value of 31, which is equivalent to the 33rd percentile of OGSI 80 pixels in the marten’s historical range (for general details on our incorporation of serpentine soils, see the report section titled "Data Layers - Serpentine Soils"; for specific details on the development of this serpentine dataset, see the metadata record for the ResistancePostProcessing_Serpentine data layer, which was used to make these modifications to the habitat surface). Pixels with an OGSI value >31.0 retained their normal habitat value.
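The 977 m window radius is consistent with treating the 300 ha home range as a circle of equal area, r = sqrt(A / π). The description does not spell out this derivation, so the following check is a reconstruction rather than the authors' documented method; the function names are illustrative.

```python
import math


def equal_area_radius_m(area_ha):
    """Radius (m) of a circle whose area equals `area_ha` hectares."""
    area_m2 = area_ha * 10_000  # 1 ha = 10,000 m^2
    return math.sqrt(area_m2 / math.pi)


def radius_in_pixels(radius_m, pixel_size_m=30):
    """Express a radius in pixels of the 30 m grid described above."""
    return radius_m / pixel_size_m


r = equal_area_radius_m(300)
print(round(r))                       # 977
print(round(radius_in_pixels(r), 1))  # 32.6
```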
Our intention was to allow the modified serpentine pixels to be more easily incorporated into habitat cores if there were higher value OGSI pixels in the vicinity, but not to have them form the entire basis of a core. As a parameter of the Core Mapper tool, we also excluded pixels with a habitat value <1.0 from inclusion in habitat cores. We then used Core Mapper to define a moving window and calculate the average habitat value within a 977m radius around each pixel (derived from the estimated average size of a female marten’s home range of 300 ha). Pixels with an average habitat value ≥36.0 were then incorporated into habitat cores. This is an abbreviated and incomplete description of the dataset. Please refer to the spatial metadata for a more thorough description of the methods used to produce this dataset, and a discussion of any assumptions or caveats that should be taken into consideration. Additional data for this project (including the Habitat Surface referenced above and the Habitat Cores used in our connectivity model) can be found at: https://www.fws.gov/arcata/shc/marten
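The habitat-surface and core rules described above (a serpentine floor of 31, exclusion of values below 1.0, and a window-average threshold of 36.0) can be sketched on a small grid. This is a hypothetical pure-Python illustration, not Core Mapper: it uses a circular window measured in pixels, averages only over cells that fall inside the grid near the edges (an assumption), and omits the additional core modifications the report describes.

```python
SERPENTINE_FLOOR = 31.0  # minimum habitat value assigned to serpentine pixels
EXCLUDE_BELOW = 1.0      # pixels below this value cannot join a core
CORE_THRESHOLD = 36.0    # minimum moving-window average for core membership


def habitat_value(ogsi, serpentine):
    """Habitat value from OGSI (0-100); serpentine pixels get a floor of 31."""
    return max(ogsi, SERPENTINE_FLOOR) if serpentine else ogsi


def window_average(grid, radius_px):
    """Mean of all cells within `radius_px` of each cell (circular window)."""
    rows, cols = len(grid), len(grid[0])
    offsets = [(dy, dx)
               for dy in range(-radius_px, radius_px + 1)
               for dx in range(-radius_px, radius_px + 1)
               if dy * dy + dx * dx <= radius_px * radius_px]
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            vals = [grid[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < rows and 0 <= x + dx < cols]
            out[y][x] = sum(vals) / len(vals)
    return out


def core_mask(grid, radius_px):
    """True where a pixel is eligible and its window average meets the threshold."""
    avg = window_average(grid, radius_px)
    return [[grid[y][x] >= EXCLUDE_BELOW and avg[y][x] >= CORE_THRESHOLD
             for x in range(len(grid[0]))]
            for y in range(len(grid))]


# Tiny hypothetical surface; at 30 m pixels a 977 m window is about 33 pixels.
surface = [[40, 40, 40],
           [40, 0.5, 40],
           [40, 40, 40]]
print(core_mask(surface, radius_px=1))
# [[True, False, True], [False, False, False], [True, False, True]]
```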

  14. [Archived] COVID-19 Deaths by Population Characteristics Over Time

    • healthdata.gov
    • data.sfgov.org
    Updated Apr 8, 2025
    data.sfgov.org (2025). [Archived] COVID-19 Deaths by Population Characteristics Over Time [Dataset]. https://healthdata.gov/dataset/-Archived-COVID-19-Deaths-by-Population-Characteri/hs5f-amst
    Available download formats: csv, json, xml, application/rssxml, tsv, application/rdfxml
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    data.sfgov.org
    Description

    As of July 2, 2024, the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.

    A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.

    Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.

    B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.

    Data on the population characteristics of COVID-19 deaths are from:
    • Case reports
    • Medical records
    • Electronic lab reports
    • Death certificates

    Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.

    To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.

    Data notes on each population characteristic type are listed below.

    Race/ethnicity: We include all race/ethnicity categories that are collected for COVID-19 cases.

    Gender: The City collects information on gender identity using these guidelines.

    C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.

    Dataset will not update on the business day following any federal holiday.

    D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

    This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.

    New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
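The relationship between the new and cumulative columns can be reproduced after filtering on a characteristic type. A minimal sketch; the snake_case keys are hypothetical stand-ins for the dataset's actual column headers (such as "Characteristic Type" and "Characteristic Group"), the group values are invented, and dates are assumed to be ISO 8601 strings so that they sort chronologically.

```python
from collections import defaultdict


def cumulative_by_group(rows, characteristic_type):
    """Running death totals per characteristic group, in date order.

    rows: dicts with hypothetical keys 'characteristic_type',
    'characteristic_group', 'date' (ISO 8601 string) and 'new_deaths'.
    Returns (group, date, cumulative_deaths) tuples.
    """
    selected = sorted(
        (r for r in rows if r["characteristic_type"] == characteristic_type),
        key=lambda r: r["date"],  # ISO dates sort chronologically as strings
    )
    running = defaultdict(int)
    result = []
    for r in selected:
        group = r["characteristic_group"]
        running[group] += r["new_deaths"]
        result.append((group, r["date"], running[group]))
    return result


rows = [
    {"characteristic_type": "Age Group", "characteristic_group": "18-29",
     "date": "2020-04-02", "new_deaths": 2},
    {"characteristic_type": "Age Group", "characteristic_group": "18-29",
     "date": "2020-04-01", "new_deaths": 1},
    {"characteristic_type": "Gender", "characteristic_group": "Female",
     "date": "2020-04-01", "new_deaths": 3},
]
print(cumulative_by_group(rows, "Age Group"))
# [('18-29', '2020-04-01', 1), ('18-29', '2020-04-02', 3)]
```

Because totals for earlier dates can be revised, a cumulative series recomputed this way is only as current as the snapshot it is built from.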

    This data may not be immediately available for more recent deaths. Data updates as more information becomes available.

    To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.

    E. CHANGE LOG

    • 9/11/2023 - on this date, we began using an updated definition of a COVID-19 death to align with the California Department o

  15. NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep...

    • zenodo.org
    Updated Apr 16, 2025
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio (2025). NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep learning models [Dataset]. http://doi.org/10.5281/zenodo.15194187
    Available download formats: zip, text/x-python
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio
    License

    Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5): https://creativecommons.org/licenses/by-nc-nd/2.5/
    License information was derived automatically

    Time period covered
    Dec 18, 2024
    Description

    NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See the Technical Report for details on how to use the provided package.

    This database includes 3 repositories:

    • NADA_Dis: Is the model able to correctly characterize/Disentangle a complex latent space?
      The repository contains 3x100,000 synthetic black and white images to test the ability of the models to correctly define a proper latent space (e.g., autoencoders) and disentangle it. The first 100,000 images contain 4 shapes and uniform parameter space distributions, while the other images have a more complex underlying distribution (truncated Gaussian and correlated marginal variables).

    • NADA_OOD: Does the model identify Out-Of-Distribution images?
      The repository contains 100,000 training images (4 different shapes with 3 possible colors located in the upper left corner of the canvas) and 6x100,000 increasingly different sets of images (changing the color class balance, reducing the radius of the shape, moving the shape to the lower left corner) providing increasingly challenging out-of-distribution images.
      This can help to test not only the capability of a model, but also methods that produce reliability estimates and should correctly classify OOD elements as "unreliable" as they are far from the original distributions.

    • NADA_AlEp: Does the model distinguish between different types (Aleatoric/Epistemic) of uncertainties?
      The repository contains 5x100,000 images with different types of noise/uncertainty:
      • NADA_AlEp_0_Clean: Dataset free of noise, to be used as a possible training set.
      • NADA_AlEp_1_White_Noise: Epistemic white noise dataset. Each image is perturbed with an amount of white noise randomly sampled from 0% to 90%.
      • NADA_AlEp_2_Deformation: Dataset with Epistemic deformation noise. Each image is deformed by a random amount uniformly sampled between 0% and 90%. 0% corresponds to the original image, while 100% is a full deformation to the circumscribing circle.
      • NADA_AlEp_3_Label: Dataset with label noise. Formally, 20% of Triangles of a given color are misclassified as a Square with a random color (among Blue, Orange, and Brown) and vice versa (Squares to Triangles). Label noise introduces aleatoric uncertainty because it is inherent in the data and cannot be reduced.
      • NADA_AlEp_4_Combined: Combined dataset with all previous sources of uncertainty.
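The label-noise rule for NADA_AlEp_3_Label can be illustrated with a short sketch. This is a hypothetical re-implementation for clarity, not code from the NADA generator: labels are assumed to be (shape, color) pairs, the shape name "Circle" in the usage example is invented, and the 20% is applied separately to each (shape, color) cell by sampling a rounded count of images without replacement.

```python
import random

COLORS = ["Blue", "Orange", "Brown"]
SWAP = {"Triangle": "Square", "Square": "Triangle"}


def add_label_noise(labels, rate=0.2, seed=0):
    """Return a copy of `labels` with triangle/square label noise.

    For each color, a fraction `rate` of Triangles is relabeled as a Square
    with a random color, and vice versa. Other shapes are left unchanged.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    for shape, other in SWAP.items():
        for color in COLORS:
            # Select from the original labels so no image is flipped twice.
            idx = [i for i, (s, c) in enumerate(labels) if s == shape and c == color]
            for i in rng.sample(idx, round(rate * len(idx))):
                noisy[i] = (other, rng.choice(COLORS))
    return noisy


labels = ([("Triangle", "Blue")] * 10 + [("Square", "Orange")] * 10
          + [("Circle", "Brown")] * 5)
noisy = add_label_noise(labels)
print(sum(s == "Square" for s, _ in noisy[:10]))      # 2
print(sum(s == "Triangle" for s, _ in noisy[10:20]))  # 2
```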

    Each image can be used for classification (shape/color) or regression (radius/area) tasks.

    All datasets can be modified and adapted to the user's research question using the included open source data generator.

  16. Global population survey data set (1950-2018)

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    Updated Sep 3, 2020
    Wen DONG (2020). Global population survey data set (1950-2018) [Dataset]. https://data.tpdc.ac.cn/en/data/ece5509f-2a2c-4a11-976e-8d939a419a6c
    Available download formats: zip
    Dataset updated
    Sep 3, 2020
    Dataset provided by
    TPDC
    Authors
    Wen DONG
    Description

    "Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. This dataset includes demographic data of 22 countries from 1960 to 2018, including Sri Lanka, Bangladesh, Pakistan, India, Maldives, etc. Data fields include: country, year, population ratio, male ratio, female ratio, population density (per km²). Source: (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Report (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme. Periodicity: Annual. Statistical Concept and Methodology: Population estimates are usually based on national population censuses. Estimates for the years before and after the census are interpolations or extrapolations based on demographic models. Errors and undercounting occur even in high-income countries. In developing countries errors may be substantial because of limits in the transport, communications, and other resources required to conduct and analyze a full census. The quality and reliability of official demographic data are also affected by public trust in the government, government commitment to full and accurate enumeration, confidentiality and protection against misuse of census data, and census agencies' independence from political influence. Moreover, comparability of population indicators is limited by differences in the concepts, definitions, collection procedures, and estimation methods used by national statistical agencies and other organizations that collect the data.
The currentness of a census and the availability of complementary data from surveys or registration systems are objective ways to judge demographic data quality. Some European countries' registration systems offer complete information on population in the absence of a census. The United Nations Statistics Division monitors the completeness of vital registration systems. Some developing countries have made progress over the last 60 years, but others still have deficiencies in civil registration systems. International migration is the only other factor besides birth and death rates that directly determines a country's population growth. Estimating migration is difficult. At any time, many people are located outside their home country as tourists, workers, or refugees or for other reasons. Standards for the duration and purpose of international moves that qualify as migration vary, and estimates require information on flows into and out of countries that is difficult to collect. Population projections start from a base year and are carried forward using assumptions of mortality, fertility, and migration by age and sex through 2050, based on the UN Population Division's World Population Prospects database medium variant."
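Two of the ideas above, interpolating between censuses and projecting with births, deaths, and migration, can be made concrete with deliberately simple models. The sketch assumes constant exponential growth between two census counts for interpolation, and applies the demographic balancing identity (population change = births - deaths + immigration - emigration) with constant crude rates for projection. These are illustrative stand-ins, not the demographic models or the age- and sex-specific projections described in the text; all names and numbers are hypothetical.

```python
def interpolate_population(t, t0, p0, t1, p1):
    """Population at time t assuming constant exponential growth between
    census counts p0 (at t0) and p1 (at t1); t outside [t0, t1] extrapolates."""
    return p0 * (p1 / p0) ** ((t - t0) / (t1 - t0))


def project_population(pop, years, birth_rate, death_rate, net_migration_rate):
    """Project forward with the balancing identity and constant crude rates
    (per person per year). Returns the population path including the base year."""
    path = [pop]
    for _ in range(years):
        births = pop * birth_rate
        deaths = pop * death_rate
        net_migrants = pop * net_migration_rate  # immigrants minus emigrants
        pop = pop + births - deaths + net_migrants
        path.append(pop)
    return path


# Hypothetical censuses: 1,000,000 in 2000 and 1,210,000 in 2010.
print(round(interpolate_population(2005, 2000, 1_000_000, 2010, 1_210_000)))  # 1100000
print([round(p) for p in project_population(1000, 2, 0.02, 0.01, 0.0)])      # [1000, 1010, 1020]
```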

  17. IBEX High Energy Neutral Atom Imager (ENA-Hi) Data Release-14, Compton...

    • s.cnmilf.com
    • data.nasa.gov
    Updated Jun 28, 2025
    NASA Space Physics Data Facility (SPDF) Coordinated Data Analysis Web (CDAWeb) Data Services (2025). IBEX High Energy Neutral Atom Imager (ENA-Hi) Data Release-14, Compton Getting corrected, not Survival Probability corrected, Ram direction, West Ecliptic Global Distributed Flux and Flux Power Law Slope Maps, Level H3 (H3), three year average Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/ibex-high-energy-neutral-atom-imager-ena-hi-data-release-14-compton-getting-corrected-not-
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Interstellar Boundary Explorer (IBEX) has operated in space since 2008, updating our knowledge of the outer heliosphere and its interaction with the local interstellar medium. Start time: 2008-12-25. There are currently 15 releases of IBEX-Hi and/or IBEX-Lo data covering the years 2009 to 2018. This data set is derived from the Release 14 three-year IBEX-Hi maps, in which adjacent maps overlap by two years (2009-2011, 2010-2012, and so forth through 2015-2017). It uses ram-direction fluxes with Compton-Getting (cg) corrections for spacecraft motion, but without survival probability (sp) corrections for Energetic Neutral Atoms (ENAs) between 1 and 100 AU. The data set parameters include line-of-sight (LOS) integrated pressures computed separately for the Globally Distributed Flux (GDF) and the Ribbon Flux, plus the Total Flux obtained by summing the GDF and Ribbon LOS pressures. Additionally, there are signal-to-noise ratios for the GDF, Ribbon, and Total LOS pressures. Finally, there are power-law slope values for the GDF differential flux and signal-to-noise ratios of the slope. The IBEX Release 14 data are archived as fully citable data. Please consult IBEX team publications and personnel for further details on production, processing, and usage of these data. The data consist of ram-direction sky maps in Solar Ecliptic longitude (east and west) and latitude angles for the above parameters. Details of the data and enabled science from Release 14 are given in the following journal publication: Schwadron, N. A., et al. 2018, Time Dependence of the IBEX Ribbon and the Globally Distributed Energetic Neutral Atom Flux Using the First 9 Years of Observations, DOI: 10.3847/1538-4365/aae48e.
    The following codes are used to define data set types in the multiple IBEX data releases:

    • cg: Compton-Getting corrections have been applied to the data to account for the speed of the spacecraft relative to the direction of arrival of the ENAs.
    • nocg: no Compton-Getting corrections.
    • sp: survival probability corrections have been applied to the data to account for the loss of ENAs due to radiation pressure, photoionization, and ionization via charge exchange with solar wind protons as they stream through the heliosphere. This correction scales the data out from IBEX at 1 AU to approximately 100 AU. In the original data this mode is denoted as Tabular.
    • noSP: no survival probability corrections have been applied to the data.
    • omni: data from all directions.
    • ram: data collected when the spacecraft was ramming into the incoming ENAs.
    • antiram: data collected when the spacecraft was moving away from the incoming ENAs.

    This particular data set is denoted in the original ASCII files as follows (directory name: file content description):

    • GDFPressure: Globally Distributed Flux line-of-sight integrated pressure, in pdyne-AU/cm^2.
    • GDFSlope: power-law slope of the differential flux spectrum for the Globally Distributed Flux.
    • GDFSlopeSN: signal-to-noise ratio of the GDF differential flux power-law slope, where noise represents uncertainty.
    • GDFSN: Globally Distributed Flux signal-to-noise ratio, where noise is defined as the uncertainty and the signal is the GDF line-of-sight integrated pressure.
    • RibbonPressure: Ribbon line-of-sight integrated pressure, in pdyne-AU/cm^2.
    • RibbonSN: Ribbon signal-to-noise ratio, where noise is defined as the uncertainty and the signal is GDF line-of-sight integrated pressure.
    • TotPressure: total pressure in ENA maps including both the GDF and Ribbon; line-of-sight integrated pressure in pdyne-AU/cm^2.
    • TotSN: total pressure signal-to-noise ratio, where noise represents uncertainty and signal represents the total LOS integrated pressure.
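The GDFSlope parameter describes a power law fitted to the differential flux spectrum. A minimal sketch of such a fit, assuming j(E) = A * E**gamma and an ordinary least-squares line in log-log space, is shown below. The energies and fluxes are invented, and whether the released GDFSlope uses this sign convention or its negation is an assumption to check against IBEX team documentation. The helper `signal_to_noise` simply divides a value by its uncertainty, matching the S/N definitions above.

```python
import math


def power_law_slope(energies, fluxes):
    """Least-squares slope of log(flux) against log(energy).

    Assumes j(E) = A * E**gamma, so log j = log A + gamma * log E and the
    fitted slope is gamma (negative for a spectrum that falls with energy).
    """
    if len(energies) != len(fluxes) or len(energies) < 2:
        raise ValueError("need at least two matching (energy, flux) pairs")
    xs = [math.log(e) for e in energies]
    ys = [math.log(j) for j in fluxes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("energies must not all be equal")
    return cov / var


def signal_to_noise(value, uncertainty):
    """Signal divided by its uncertainty."""
    return value / uncertainty


# Invented example: an exact power law with gamma = -1.5.
energies_kev = [1.0, 2.0, 4.0]
fluxes = [100.0 * e ** -1.5 for e in energies_kev]
slope = power_law_slope(energies_kev, fluxes)
```

Fitting in log-log space turns the power law into a straight line, so the slope falls out of a plain linear regression; with real, noisy map pixels the fit would normally be weighted by the flux uncertainties, which this sketch omits.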

  18. Meshblock 2024

    • datafinder.stats.govt.nz
    csv, dwg, geodatabase +6
    Updated Nov 27, 2023
    Stats NZ (2023). Meshblock 2024 [Dataset]. https://datafinder.stats.govt.nz/layer/115225-meshblock-2024/
    Available download formats: dwg, geodatabase, geopackage / sqlite, pdf, shapefile, mapinfo mif, csv, kml, mapinfo tab
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Statistics New Zealand (http://www.stats.govt.nz/)
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    This dataset is the definitive version of the annually released meshblock boundaries as at 1 January 2024, as defined by Stats NZ. This version contains 57,539 meshblocks, including 16 with empty or null geometries (non-digitised meshblocks).

    Stats NZ maintains an annual meshblock pattern for collecting and producing statistical data. This allows data to be compared over time.

    A meshblock is the smallest geographic unit for which statistical data is collected and processed by Stats NZ. A meshblock is a defined geographic area, which can vary in size from part of a city block to a large area of rural land. The optimal size for a meshblock is 30–60 dwellings (containing approximately 60–120 residents).

    Each meshblock borders on another to form a network covering all of New Zealand, including coasts and inlets, and extending out to the 200-mile exclusive economic zone (EEZ). The network is digitised to the 12-mile (19.3 km) limit. Meshblocks are added together to build up larger geographic areas such as statistical area 1 (SA1), statistical area 2 (SA2), statistical area 3 (SA3), and urban rural (UR). They are also used to define electoral districts, territorial authorities, and regional councils.
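Because meshblocks nest within larger areas, counts for an area such as SA2 can be built by summing its meshblocks. The sketch below assumes you already have a meshblock-to-SA2 lookup; the codes, area labels, and dwelling counts are hypothetical placeholders, not real Stats NZ values.

```python
from collections import defaultdict


def aggregate_by_area(counts_by_meshblock, meshblock_to_area):
    """Sum meshblock-level counts up to a larger geography such as SA2.

    Raises KeyError if a meshblock has no entry in the lookup, so gaps in
    the lookup table are not silently dropped.
    """
    totals = defaultdict(int)
    for mb_code, count in counts_by_meshblock.items():
        totals[meshblock_to_area[mb_code]] += count
    return dict(totals)


# Hypothetical codes and dwelling counts, for illustration only.
dwellings = {"4000001": 40, "4000002": 55, "4000003": 32}
mb_to_sa2 = {"4000001": "SA2_A", "4000002": "SA2_A", "4000003": "SA2_B"}
sa2_totals = aggregate_by_area(dwellings, mb_to_sa2)  # {"SA2_A": 95, "SA2_B": 32}
```

Failing loudly on a missing lookup entry is a deliberate choice: a silently skipped meshblock would undercount the larger area without any warning.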

    Meshblock boundaries generally follow road centrelines, cadastral property boundaries, or topographical features such as rivers. Expanses of water in the form of lakes and inlets are defined separately from land.

    Meshblock maintenance

    Meshblock boundaries are amended by:

    1. Splitting – subdividing a meshblock into two or more meshblocks.
    2. Nudging – shifting a boundary to a more appropriate position.

    Reasons for meshblock splits and nudges can include:

    · to maintain meshblock criteria rules

    · to improve the size balance of meshblocks in areas where there has been population growth

    · to maintain alignment to cadastre and other geographic features

    · Stats NZ requests for boundary changes so that statistical geography boundaries can be moved

    · external requests for boundary changes so that administrative or electoral boundaries can be moved

    · to separate land and water (mainland, inland water, islands, inlets, and oceanic areas are defined separately)

    Meshblock changes are made throughout the year. A major release is made at 1 January each year with ad hoc releases available to users at other times.

    While meshblock boundaries are continually under review, 'freezes' on boundary changes are applied periodically. Such freezes are imposed at the time of population censuses and during periods of intense electoral activity, for example, prior to and during general and local body elections.

    Meshblock numbering

    Meshblocks are not named and have seven-digit codes.

    When meshblocks are split, each new meshblock is given a new code. The original meshblock codes no longer exist within that version and future versions of the meshblock classification. Meshblock codes do not change when a meshblock boundary is nudged.

    Meshblocks that existed prior to 2015 and have not changed are numbered from 0000100 to 3210003. Meshblocks created from 2015 onwards are numbered from 4000000.
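The numbering rules above can be applied mechanically. The sketch below, with hypothetical helper names, first restores leading zeros (which are easily lost if codes are stored as numbers) and then classifies a code by range. It reads the stated ranges literally; since nudges do not change codes, a code in the lower range indicates only that the meshblock has not been split since before 2015.

```python
def normalize_code(value) -> str:
    """Restore leading zeros lost when meshblock codes are stored as numbers."""
    return str(value).strip().zfill(7)


def meshblock_era(code: str) -> str:
    """Classify a seven-digit meshblock code using the documented ranges."""
    if len(code) != 7 or not code.isdigit():
        raise ValueError(f"expected a seven-digit code, got {code!r}")
    n = int(code)
    if 100 <= n <= 3210003:   # 0000100 to 3210003: pre-2015, unchanged
        return "pre-2015"
    if n >= 4000000:          # 4000000 upwards: created 2015 onwards
        return "2015 onwards"
    return "outside the documented ranges"


print(meshblock_era(normalize_code(16901)))  # pre-2015
```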

    Digitised and non-digitised meshblocks

    The digital geographic boundaries are defined and maintained by Stats NZ.

    Meshblocks cover the land area of New Zealand, the water area to the 12-mile limit, the Chatham Islands, Kermadec Islands, sub-Antarctic islands, offshore oil rigs, and Ross Dependency. The following 16 meshblocks are not held in digitised form.

    Meshblock / Location (statistical area 2 name)

    • 0016901 / Oceanic Kermadec Islands
    • 0016902 / Kermadec Islands
    • 1588000 / Oceanic Oil Rig Taranaki
    • 3166401 / Oceanic Campbell Island
    • 3166402 / Campbell Island
    • 3166600 / Oceanic Oil Rig Southland
    • 3166710 / Oceanic Auckland Islands
    • 3166711 / Auckland Islands
    • 3195000 / Ross Dependency
    • 3196001 / New Zealand Economic Zone
    • 3196002 / Oceanic Bounty Islands
    • 3196003 / Bounty Islands
    • 3196004 / Oceanic Snares Islands
    • 3196005 / Snares Island
    • 3196006 / Oceanic Antipodes Islands
    • 3196007 / Antipodes Island

    For more information please refer to the Statistical standard for geographic areas 2023.

    High definition version

    This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.

    Digital Data

    Digital boundary data became freely available on 1 July 2007.

  19. City Points

    • azgeo-open-data-agic.hub.arcgis.com
    • hub.arcgis.com
    Updated May 4, 2020
    AZGeo Data Hub (2020). City Points [Dataset]. https://azgeo-open-data-agic.hub.arcgis.com/datasets/azgeo::city-points/about
    Dataset updated
    May 4, 2020
    Dataset authored and provided by
    AZGeo Data Hub
    Description

    This dataset represents point locations of cities and towns in Arizona. It contains point locations for incorporated cities, Census Designated Places (CDPs), and other populated places. Several data sets were used as inputs to construct it. A subset of the Geographic Names Information System (GNIS) national dataset for the state of Arizona provided the base location for most of the points. Polygon files of the Census Designated Places from the U.S. Census Bureau, and an incorporated city boundary database developed and maintained by the Arizona State Land Department, were also used for reference during development. Every incorporated city is represented by a point, originally derived from GNIS. Some of these points were moved based on the local knowledge of the GIS analyst constructing the data set. Some of the CDP points were also moved, and while most Census Bureau CDPs have one point location in this data set, some inconsistencies were allowed to facilitate the use of the data for mapping purposes. Population estimates were derived from data collected during the 2010 Census. During development, an attribute field named 'DEF_CAT' (short for definition category) was added so that users can easily view the data and create custom layers or datasets from this file. For example, new layers may be created to include only incorporated cities (DEF_CAT = Incorporated), both incorporated cities and Census Designated Places (DEF_CAT = Incorporated OR DEF_CAT = CDP), or only places that are neither CDPs nor incorporated (DEF_CAT = Other). This data is current as of February 2012. At this time, there is no planned maintenance or update process for this dataset. This data was created to serve as base information for use in GIS systems for a variety of planning, reference, and analysis purposes. This data does not represent a legal record.
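The DEF_CAT queries described above amount to a simple attribute filter. The sketch below assumes the attribute table has been loaded as a list of dictionaries; only the `DEF_CAT` field and its values (Incorporated, CDP, Other) come from the description, while the `name` field and the place names are hypothetical.

```python
def filter_by_category(points, *categories):
    """Keep only the points whose DEF_CAT value is one of `categories`."""
    wanted = set(categories)
    return [p for p in points if p.get("DEF_CAT") in wanted]


# Hypothetical attribute rows for illustration.
points = [
    {"name": "Place A", "DEF_CAT": "Incorporated"},
    {"name": "Place B", "DEF_CAT": "CDP"},
    {"name": "Place C", "DEF_CAT": "Other"},
]
incorporated = filter_by_category(points, "Incorporated")
incorporated_or_cdp = filter_by_category(points, "Incorporated", "CDP")
```

Passing several categories reproduces an `OR` query such as `DEF_CAT = Incorporated OR DEF_CAT = CDP`; rows with no `DEF_CAT` value are simply excluded.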

  20. Audio Visual Speech Dataset: Japanese

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    FutureBee AI (2022). Audio Visual Speech Dataset: Japanese [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/japanese-visual-speech-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Japanese Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1,000 Japanese-language videos, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different prefectures of Japan.
    Regions: Ensures a balanced representation of accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    Each video was recorded following extensive guidelines to maintain quality and diversity.

    Recording Details:
    File Duration: Videos typically last 30 seconds to 3 minutes.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
    Device: Recent Android and iOS devices were used for recording.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: The participant's face remains in focus and within the frame for the full duration of each video.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions. The dataset contains videos expressing the following human emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each emotion, participants answered a specific question designed to elicit that emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:


Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories

Available download formats: application/rssxml, json, xml, application/rdfxml, csv, tsv
Dataset updated
Nov 4, 2024
Area covered
Interstate 90
Description

The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png.
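Each lane's ramp indicator runs along its centerline points, so it can be collapsed into contiguous road sections. The sketch below assumes one lane's ramp column has already been read into a list of integers; the actual column names should be taken from the centerline data dictionary, and `ramp_segments` is a hypothetical helper.

```python
# Codes as documented for the centerline files.
RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}


def ramp_segments(indicators):
    """Collapse per-point ramp indicators into (start, end, label) runs.

    start and end are inclusive indices into the lane's centerline points.
    """
    segments = []
    start = 0
    for i in range(1, len(indicators) + 1):
        # Close the current run at the end of the list or when the code changes.
        if i == len(indicators) or indicators[i] != indicators[start]:
            segments.append((start, i - 1, RAMP_TYPES[indicators[start]]))
            start = i
    return segments


print(ramp_segments([0, 0, 1, 1, 1, 0, 3]))
```

Run-length grouping keeps the section boundaries as point indices, which can then be joined back to the x and y coordinates of the same centerline file.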

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter returned to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle, ensuring continuous data collection. The segment was selected to study mandatory and discretionary lane changing as well as last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 km/h (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day.

As part of this dataset, the following files were provided:

  • I90_94_moving_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at 0.1-second intervals. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided along with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 meters.
  • I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X.
  • I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments). In total, the centerline files define six northbound lanes.
  • Annotation on Regions.zip includes images that visually map lanes (I90_94_moving_lane1.png through I90_94_moving_lane6.png) to their associated numerical lane IDs.
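The stated 0.3 m/pixel factor and 0.1 s sampling interval are enough to sanity-check the provided speeds. The sketch below assumes you have already extracted one vehicle's successive x and y positions in meters (the CSV's actual column names are not given here), and the helper names are hypothetical. Finite differences amplify position noise, so treat the result as a check rather than a replacement for the supplied speed column.

```python
import math

METERS_PER_PIXEL = 0.3  # conversion factor stated in the dataset description
TIME_STEP_S = 0.1       # sampling interval stated in the dataset description


def pixels_to_meters(pixels: float) -> float:
    """Convert an image-space distance in pixels to meters."""
    return pixels * METERS_PER_PIXEL


def finite_difference_speeds(xs, ys, dt=TIME_STEP_S):
    """Approximate speeds (m/s) between consecutive positions given in meters."""
    points = list(zip(xs, ys))
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# Hypothetical positions: 0.5 m per 0.1 s step, i.e. about 5 m/s.
speeds = finite_difference_speeds([0.0, 0.3, 0.6], [0.0, 0.4, 0.8])
```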