This dataset shows daily citywide counts of persons tested by nucleic acid amplification tests (NAAT, also known as a molecular test; e.g. a PCR test) for SARS-CoV-2, counts of persons with positive tests, and the percent positivity. Also included is a calculation of the average percent positivity over a 7-day period. NAATs work through direct detection of the virus’s genetic material, and typically involve collecting a nasal swab. These tests are highly accurate and recommended for diagnosing current COVID-19 infection. After specimen collection, molecular tests are processed in a laboratory, and results are electronically reported to the New York State (NYS) Electronic Clinical Laboratory Reporting System (ECLRS). Test results for NYC residents are then sent electronically to NYC DOHMH. There is typically a lag of a few days between when a specimen is collected and when a result is reported to NYC DOHMH. Data is sourced from electronic laboratory reporting from NYS ECLRS. All identifying health information is excluded from the dataset.
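As a rough illustration of how the percent positivity and its 7-day average relate to the daily counts, the sketch below pools tests and positives over a trailing window. The pooled definition, the function names, and the sample counts are assumptions made for illustration; the description above does not say whether the published average is pooled or a mean of daily percentages.

```python
from collections import deque


def percent_positivity(positive, tested):
    """Percent of tested persons with a positive result; None if nobody was tested."""
    if tested == 0:
        return None
    return 100.0 * positive / tested


def rolling_positivity(daily, window=7):
    """Pooled positivity over a trailing window of days.

    `daily` is a list of (tested, positive) tuples in date order.
    Days without a full window behind them get None.
    """
    out = []
    buf = deque(maxlen=window)  # keeps only the most recent `window` days
    for tested, positive in daily:
        buf.append((tested, positive))
        if len(buf) < window:
            out.append(None)
            continue
        total_tested = sum(t for t, _ in buf)
        total_positive = sum(p for _, p in buf)
        out.append(percent_positivity(total_positive, total_tested))
    return out


# Made-up counts, for illustration only (not taken from the dataset).
sample = [(1000, 50), (1200, 60), (900, 45), (1100, 55),
          (1000, 50), (800, 40), (1000, 50), (2000, 200)]
result = rolling_positivity(sample)
```

Because of the reporting lag mentioned above, the most recent days in any such series are likely to be incomplete, so the last few window values should be read with caution.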
This dataset contains the collection and maintenance of crime data for incidents that occur in New York City public schools.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘NYC Social Media Usage’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nyc-social-media-usage on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The Demographic Reports is a compilation of population, household, and housing unit estimates and forecasts; market value estimates; residential development activity estimates; and industrial and commercial gross floor area estimates. Various geographic arrangements are used to present these data, such as supervisor districts, towns, planning districts, human services regions, ZIP Codes, sewer sheds, and census tracts. These small area estimates and forecasts are produced on an annual basis. The methodology used for estimating and forecasting housing units, households, and population is included. The methodologies used to estimate market value, residential development, and gross floor area are contained in their respective sections. In addition to the small area estimates and forecasts, state and federal data on Fairfax County are collected and summarized, and special studies and quantitative research are conducted by the unit.
If you use this dataset in your research, please credit John Snow Labs
--- Original source retains full ownership of the source dataset ---
This dataset contains the list of NYC (New York City) Properties under DOB (Department of Buildings) jurisdiction.
The dataset comes from CouncilStat, which is used by many NYC Council district offices to enter and track constituent cases that can range from issues around affordable housing to potholes and pedestrian safety. This dataset aggregates the information that individual staff have input. However, district staff handle a wide range of complex issues. Each office uses the program differently and thus records cases differently, so comparisons between accounts may be difficult. Not all offices use the program. For more information, see http://labs.council.nyc/districts/data/
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset comes from CouncilStat, which is used by many NYC Council district offices to enter and track constituent cases that can range from issues around affordable housing to potholes and pedestrian safety. This dataset aggregates the information that individual staff have input. However, district staff handle a wide range of complex issues. Each office uses the program differently and thus records cases differently, so comparisons between accounts may be difficult. Not all offices use the program. For more information, see http://labs.council.nyc/districts/data/
The NYC Health and Nutritional Examination Survey (NYC HANES), modeled on the National Health and Nutrition Examination Survey (NHANES), is a population-based, cross-sectional study with data collected from physical examinations, clinical and laboratory tests, as well as face-to-face interviews and audio computer-assisted self-interviews (ACASI). It was first conducted in 2004 by the NYC Department of Health and Mental Hygiene (DOHMH) with 1,999 respondents, and again in 2013-2014 through a partnership between the City University of New York School of Public Health and NYC DOHMH, yielding 1,524 respondents. Participants in both rounds of collection are non-institutionalized adults 20 years or older selected through three-stage cluster sampling. NYC HANES focuses on chronic conditions such as diabetes, high blood pressure, high cholesterol, and depression.
NYC HANES data consists of the following 6 files: SPfile (Study Participant File); CAPI (Computer Assisted Personal Interview); ACASI (Audio Computer-Assisted Self-Interview); CIDI (Composite International Diagnostic Interview); EXAM; and LABS. Each data file (downloadable in SAS) is accompanied by a variable list, data documentation, and codebook.
The purpose of this data package is to offer demographic data for U.S. cities. The data come from multiple sources, the most important being the U.S. Census Bureau's American Community Survey; in this case, the data was organized by the Big Cities Health Coalition (BCHC). Other sources are the New York City Department of City Planning and Department of Parks and Recreation, whose data is available through NYC Open Data.
This dataset shows daily confirmed and probable cases of COVID-19 in New York City by date of specimen collection. Total cases are calculated as the sum of daily confirmed and probable cases. Seven-day averages of confirmed, probable, and total cases are also included in the dataset. A person is classified as a confirmed COVID-19 case if they test positive with a nucleic acid amplification test (NAAT, also known as a molecular test; e.g. a PCR test). A probable case is a person who has no positive molecular test on record and meets one of the following criteria: a) tested positive with an antigen test, b) had symptoms and an exposure to a confirmed COVID-19 case, or c) died with a cause of death listed as COVID-19 or similar. As of June 9, 2021, people who meet the definition of a confirmed or probable COVID-19 case >90 days after a previous positive test (date of first positive test) or probable COVID-19 onset date will be counted as a new case. Prior to June 9, 2021, new cases were counted ≥365 days after the first date of specimen collection or clinical diagnosis. Any person with a residence outside of NYC is not included in counts. Data is sourced from electronic laboratory reporting from the New York State Electronic Clinical Laboratory Reporting System to the NYC Health Department. All identifying health information is excluded from the dataset.
These data are used to evaluate the overall number of confirmed and probable cases by day (seven day average) to track the trajectory of the pandemic. Cases are classified by the date that the case occurred. NYC COVID-19 data include people who live in NYC. Any person with a residence outside of NYC is not included.
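The rule for counting a repeat infection as a new case can be sketched as a small date comparison. This is only an illustration of the thresholds quoted above: the function name is hypothetical, and the choice to pick the rule based on the date the count is made (`counted_on`) is an assumption, since the description does not say which date decides between the two thresholds.

```python
from datetime import date

# Date on which the counting rule changed, as stated in the description.
RULE_CHANGE = date(2021, 6, 9)


def counts_as_new_case(previous_positive, new_event, counted_on):
    """Return True if `new_event` would be counted as a new case.

    previous_positive: date of the earlier positive test or onset
    new_event: date of the later positive test or probable onset
    counted_on: date the count is made (assumed to select the rule)
    """
    gap = (new_event - previous_positive).days
    if counted_on >= RULE_CHANGE:
        return gap > 90   # strictly more than 90 days
    return gap >= 365     # at least 365 days


# A 91-day gap counts under the current rule but not under the earlier one.
current_rule = counts_as_new_case(date(2021, 1, 1), date(2021, 4, 2), date(2021, 7, 1))
earlier_rule = counts_as_new_case(date(2021, 1, 1), date(2021, 4, 2), date(2021, 5, 1))
```

Note the different comparison operators: the current rule is a strict inequality (>90 days), while the earlier rule was inclusive (≥365 days).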
The Metro Region Explorer is an interactive map showing population, housing, and employment trends within the tri-state New York City metropolitan region, and sharing key insights about how the region has changed from 2000 to today. Developed in collaboration between DCP Planning Labs and DCP Regional Planning, this tool will be maintained as part of our ongoing commitment to helping the public access and understand information about planning issues affecting NYC and the metro region. Check back for new data additions and map updates. To let us know how this app could be better, add a GitHub issue or send a tweet to @NYCPlanningLabs. If you have questions about the data and analysis, send an email to regional@planning.nyc.gov
In New York City (NYC), more than 200,000 motor vehicle collisions happen every year. This means a collision happens somewhere in NYC about every 3 minutes. To reduce collisions, the key contributing factors need to be identified. This project uses NYC collision data and historical weather data to identify the worst location (by total number of collisions) and the worst weather condition (by frequency of collisions).
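The interval quoted above follows from dividing the minutes in a year by the annual collision count. A minimal check, using the 200,000 figure from the text as a lower bound (the function name is hypothetical):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def minutes_between_collisions(collisions_per_year):
    """Average number of minutes between collisions at a given annual count."""
    return MINUTES_PER_YEAR / collisions_per_year


interval = minutes_between_collisions(200_000)
print(f"{interval:.2f} minutes between collisions")
```

Since the text says "more than 200,000", the true average interval is at most this value, which rounds to the "about every 3 minutes" stated above.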
This scene contains the relative heat severity for every pixel for every city in the United States, from this source layer. This 30-meter raster was derived from Landsat 8 imagery band 10 (ground-level thermal sensor) from the summers of 2018 and 2019.
Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.
The purpose of this scene is to show where certain areas of cities are hotter than the average temperature for that same city as a whole. Severity is measured on a scale of 1 to 5, with 1 being a relatively mild heat area (slightly above the mean for the city), and 5 being a severe heat area (significantly above the mean for the city). The absolute heat above mean values are classified into these 5 classes using the Jenks Natural Breaks classification method, which seeks to reduce the variance within classes and maximize the variance between classes. Knowing where areas of high heat are located can help a city government plan for mitigation strategies.
This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don’t have the ability to conduct their own hyper local temperature survey. Where local data is available, it may be more accurate than this dataset.
Dataset Summary
This dataset was developed using proprietary Python code developed at The Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.
What can you do with this layer?
This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input.
Other Sources of Heat Island Information
Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:
EPA’s Heat Island Resource Center: https://www.epa.gov/heat-islands/heat-island-resources
Dr. Ladd Keith, University of Arizona: https://www.laddkeith.com/
Dr. Ben McMahan, University of Arizona: https://www.climas.arizona.edu/about/people/ben-mcmahan
Dr. Jeremy Hoffman, Science Museum of Virginia: https://jeremyscotthoffman.com/about-me-shift#about
Dr. Hunter Jones, NOAA: https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/971/CPOs-Hunter-Jones-delivers-keynote-on-Climate-and-Extreme-Heat-at-Design-for-Risk-Reduction-Symposium-in-NYC
Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency: https://youtu.be/sAHlqGDU0_4
Disclaimer/Feedback
With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so The Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries (see figure below). The Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Pete.Aniello@tpl.org with feedback.
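To make the severity classification more concrete, here is a minimal sketch of a natural-breaks style classification in one dimension. It is not The Trust for Public Land's code, which the description says is proprietary. It uses a plain dynamic-programming search for the partition that minimizes the total within-class sum of squared deviations, which is the criterion the Jenks method is described as optimizing. The function names and the sample "degrees above the city mean" values are made up for illustration, and the O(k·n²) search is only practical for small samples, not for a full raster.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for an optimal 1-D partition.

    Minimizes the total within-class sum of squared deviations by
    dynamic programming over the sorted values.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover class upper bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


def classify(value, bounds):
    """Map a value to a 1-based class index using the class upper bounds."""
    for idx, upper in enumerate(bounds, start=1):
        if value <= upper:
            return idx
    return len(bounds)  # values above the last bound fall in the top class


# Made-up "degrees above the city mean" values, for illustration only.
above_mean = [0.1, 0.2, 0.3, 1.0, 1.1, 2.0, 2.1, 2.2, 3.5, 3.6, 5.0, 5.1]
bounds = jenks_breaks(above_mean, 5)
severity = [classify(v, bounds) for v in above_mean]
```

With well-separated clusters like these, the breaks fall in the gaps between clusters, which is the behavior that "reduce the variance within classes and maximize the variance between classes" describes.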
This dataset shows the closure status of parks and playgrounds due to COVID-19 in New York City (NYC). The data is provided by the National Hospital Care Survey (NHCS). The dataset has been provided by the Department of Parks and Recreation (DPR).
This dataset shows the New York City (NYC) street tree census data for the year 2015 provided by the Department of Parks and Recreation (DPR).
This dataset lists the monthly counts of households, recipients, and expenditures for SNAP (Supplemental Nutrition Assistance Program). Data is from the New York State Office of Temporary and Disability Assistance and NYC Open Data.
Notice: this is not the latest Heat Island Severity image service.
This layer contains the relative heat severity for every pixel for every city in the United States, including Alaska, Hawaii, and Puerto Rico. Heat Severity is a reclassified version of the Heat Anomalies raster, which is also published on this site. This data is generated from 30-meter Landsat 8 imagery band 10 (ground-level thermal sensor) from the summer of 2023.
To explore previous versions of the data, visit the links below:
Heat Severity - USA 2022
Heat Severity - USA 2021
Heat Severity - USA 2020
Heat Severity - USA 2019
Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.
The purpose of this layer is to show where certain areas of cities are hotter than the average temperature for that same city as a whole. Severity is measured on a scale of 1 to 5, with 1 being a relatively mild heat area (slightly above the mean for the city), and 5 being a severe heat area (significantly above the mean for the city). The absolute heat above mean values are classified into these 5 classes using the Jenks Natural Breaks classification method, which seeks to reduce the variance within classes and maximize the variance between classes. Knowing where areas of high heat are located can help a city government plan for mitigation strategies.
This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don’t have the ability to conduct their own hyper local temperature survey. Where local data is available, it may be more accurate than this dataset.
Dataset Summary
This dataset was developed using proprietary Python code developed at Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time.
What can you do with this layer?
This layer has query, identify, and export image services available. Since it is served as an image service, it is not necessary to download the data; the service itself is data that can be used directly in any Esri geoprocessing tool that accepts raster data as input.
In order to click on the image service and see the raw pixel values in a map viewer, you must be signed in to ArcGIS Online, then Enable Pop-Ups and Configure Pop-Ups.
Using the Urban Heat Island (UHI) Image Services
The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.
In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms “tpl” and “uhi”. The results returned will be the UHI image services. Right click on a result and select “Add to current map” from the context menu. When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, and select “None”, then “OK”. Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input.
Below is a screenshot of ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp (a classified renderer can also be used):
A typical operation at this point is to clip out your area of interest. To do this, add your polygon shapefile or feature class to the map view, and use the Clip Raster tool to export your area of interest as a geoTIFF raster (file extension ".tif"). In the environments tab for the Clip Raster tool, click the dropdown for "Extent" and select "Same as Layer:", and select the name of your polygon. If you then need to convert the output raster to a polygon shapefile or feature class, run the Raster to Polygon tool, and select "Value" as the field.
Other Sources of Heat Island Information
Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:
EPA’s Heat Island Resource Center
Dr. Ladd Keith, University of Arizona
Dr. Ben McMahan, University of Arizona
Dr. Jeremy Hoffman, Science Museum of Virginia
Dr. Hunter Jones, NOAA
Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency
Disclaimer/Feedback
With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries (see figure below). Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.
This dataset shows COVID-19 outcomes by testing cohort. It shows cases, hospitalizations, and deaths in New York City (NYC). The data is provided by the Department of Health and Mental Hygiene (DOHMH).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, existing research often neglects integrating comprehensive perspectives on spatial dynamics, temporal trends, and other contextual views due to the limitations of existing mobility datasets. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries and cover both transportation and epidemic domains at the administrative area level. Our experiments with nine baseline methods reveal the significant impact of different model backbones on the proposed six datasets. We provide a valuable resource for advancing urban mobility research, and our dataset collection is available at DOI 10.5281/zenodo.14590709.
DLinear, NLinear
SegRNN
Informer, Reformer, PatchTST
TimesNet
STGCN, MPNNLSTM
MOBINS_Results.pdf, available via the GitHub link, reports the detailed benchmark results of MOBINS with MAE, MSE, and standard deviation.
DLinear: https://github.com/cure-lab/LTSF-Linear
NLinear: https://github.com/cure-lab/LTSF-Linear
SegRNN: https://github.com/lss-1138/SegRNN
Informer: https://github.com/zhouhaoyi/Informer2020
Reformer: https://github.com/lucidrains/reformer-pytorch
PatchTST: https://github.com/yuqinie98/PatchTST
TimesNet: https://github.com/thuml/TimesNet
STGCN: https://github.com/hazdzz/STGCN
MPNNLSTM: https://github.com/geopanag/pandemic_tgnn
Dataset | Locations | Spatial node units | Edges | Domain | Daily Movements | Daily Amounts | Time interval | Time Range | Frames | Target dimension |
---|---|---|---|---|---|---|---|---|---|---|
Transportation | Seoul | 128 | 290 | Station-based administrative area | SmartCard:2.68M | In/Out-flow:4.02M | 1 hour | 01/01/2022-12/31/2023 | 17520 | 16640 |
Transportation | Busan | 60 | 121 | Station-based administrative area | SmartCard:0.63M | In/Out-flow:0.75M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3720 |
Transportation | Daegu | 61 | 123 | Station-based administrative area | SmartCard:0.10M | In/Out-flow:0.34M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3843 |
Transportation | NYC | 5 | 12 | Borough | Taxi:0.10M | Ridership:3.03M | 1 hour | 02/01/2022-03/31/2024 | 17280 | 30 |
Epidemic | Korea | 16 | 45 | City&Province | SmartCards:13.41M | Infection:25834 | 1 day | 01/20/2020-08/31/2023 | 1320 | 272 |
Epidemic | NYC | 5 | 12 | Borough | Taxi:2418 | Infection:2038 | 1 day | 03/01/2020-12/31/2023 | 1401 | 30 |
csv format
datasets in every environment: each dataset has three components.
SPATIAL_NETWORK.csv: ( n * n ), where n = # of nodes
NODE_TIME_SERIES_FEATURES.csv: ( t * p ) * ( n * d ), where t = # of timestamps in a day, p = total period, and d = # of variables from time series
OD_MOVEMENTS.csv: ( t * p ) * ( n, n )
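The shape notes above can be turned into a quick sanity check before loading the files. The sketch below reads them literally, under several assumptions that are not stated in the description: p is taken to be the number of days in the period, the ( n, n ) OD matrix is assumed to be flattened into n * n columns per timestamp, and d = 2 for Transportation-Seoul (one in-flow and one out-flow variable per node). The function name is hypothetical.

```python
def expected_shapes(n, t, p, d):
    """Expected (rows, columns) of each CSV, reading the shape notes literally.

    n: number of nodes
    t: timestamps per day
    p: number of days in the period (assumed meaning of "total period")
    d: number of time-series variables per node
    """
    frames = t * p
    return {
        "SPATIAL_NETWORK.csv": (n, n),
        "NODE_TIME_SERIES_FEATURES.csv": (frames, n * d),
        # Assumes each (n, n) OD matrix is flattened into n * n columns.
        "OD_MOVEMENTS.csv": (frames, n * n),
    }


# Transportation-Seoul: 128 nodes, hourly data, 01/01/2022-12/31/2023 (730 days).
# d=2 is an assumption (in-flow and out-flow per node).
seoul = expected_shapes(n=128, t=24, p=730, d=2)
total_columns = seoul["NODE_TIME_SERIES_FEATURES.csv"][1] + seoul["OD_MOVEMENTS.csv"][1]
```

For Seoul this gives 17,520 rows, which matches the Frames column of the table, and 256 + 16,384 = 16,640 value columns, which matches the Target dimension column. Whether that sum is how the authors define the target dimension is not stated here, so treat the match as an observation only.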
The metadata is provided in MOBINS_Meta.pdf, available via the GitHub link.
Each file contains information about a single node or a node pair, which is abstracted for simplicity by describing only the i-th node. We omit the detailed description in metadata for Transportation-[Busan, Daegu] because the CSV file structures are identical to the metadata for Transportation_Seoul, differing only in the number of nodes, which is unique to each dataset. Transportation_NYC follows a similar structure, with the exception of the variable for node time-series features (ridership).
Each file contains information about a single node or a node pair, which is abstracted for simplicity by describing only the i-th node. Both datasets share a consistent structure in terms of node time-series features, OD movements, and spatial networks.
Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-NYC datasets are released under a CC BY-NC 4.0 International License.
Epidemic-Korea datasets are released under a CC BY-NC-ND 4.0 International License.
The MOBINS dataset collection consists of mobility networked time-series data for forecasting tasks in two domains: Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-[Korea, NYC]. Each dataset comprises three key components: (1) OD movements, (2) a spatial network, and (3) time series. These datasets capture the temporal evolution of OD movements and time series within a fixed spatial network. OD movements represent the volume of movements between pairs of nodes, while time series denotes the time-varying features within each node. These datasets provide a comprehensive understanding of mobility patterns, exhibiting high correlation and synergy between OD movements and time series.
All datasets in the MOBINS are collected from reliable sources, including government agencies, local governments, public transportation operators, and smart card companies. These sources provide publicly accessible data downloads based on their administrative systems. The source data from smart transit card information systems is accessed through API calls at the administrative area level, such as neighborhoods or provinces, to align the spatial resolution of the time series.
The use of data available on the Korea Public Data Portal is either unrestricted or covered by the CC BY license. For sources without a specific license indication, we obtained responses about the uses for research through inquiries via phone or email. Additionally, data from the Korea Disease Control and Prevention Agency was used without numerical value modifications after obtaining permission.
Each dataset in the MOBINS collection is derived from different sources for OD movements and time series. To ensure consistent spatial and temporal resolution, we align these two sources using Python. In the Transportation-[Seoul, Busan, Daegu] datasets, we use 'station-based administrative areas' as spatial node units, treating stations within the same administrative area as a single node. For the Transportation-NYC dataset, we use boroughs as spatial node units to align the spatial resolution between taxi zones and stations. In the Epidemic-Korea dataset, the source infection case data is collected at the city and province levels. Hence, we use OD movements based on the city and province levels to match spatial resolution. Similarly, for the Epidemic-NYC dataset, we use corresponding OD movements at the borough level to maintain consistent spatial node units. After the spatial