These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the spatial locations used in the analysis from being identified.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code

Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
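The weekly standardization described in this entry (subtract the week's median, then divide by the week's IQR) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `standardize_by_week`, the list-of-rows layout, and the choice of the "inclusive" quantile definition are all assumptions made for illustration.

```python
from statistics import median, quantiles


def standardize_by_week(exposures):
    """Standardize each week (column) by its median and interquartile range.

    `exposures` is a list of rows, one per individual, each holding one
    value per exposure week. The layout is an assumption for this sketch.
    """
    n_weeks = len(exposures[0])
    standardized = [[0.0] * n_weeks for _ in exposures]
    for week in range(n_weeks):
        column = [row[week] for row in exposures]
        # Quartiles of this week's exposures; the quantile definition
        # ("inclusive") is an assumption, the source does not specify one.
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        if iqr == 0:
            raise ValueError(f"week {week} has zero IQR; cannot standardize")
        center = median(column)
        for i, value in enumerate(column):
            standardized[i][week] = (value - center) / iqr
    return standardized


# Hypothetical exposures for five individuals over two weeks.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z = standardize_by_week(raw)
```

Because only the standardized values are released, the original medians and IQRs cannot be recovered from `z` alone, which is the protective property the entry describes.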
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: Falcon distribution data
The fitted slopes of the E3C/E2C data distributions as a function of jet pt are used to illustrate the dependency...
https://www.usa.gov/government-works/
This dataset contains detailed information about the locations and operational status of grocery stores in Washington, spanning multiple years. It includes both spatial and temporal data, offering a comprehensive view of how grocery stores are distributed and have evolved over time. Below is a breakdown of the columns included in the dataset:
X, Y: Geographic coordinates (longitude and latitude, respectively) representing the store's location in the dataset.
STORENAME: The name of the grocery store.
ADDRESS: The physical address of the grocery store.
ZIPCODE: The ZIP code of the store’s location.
PHONE: The contact phone number for the store.
WARD: The local government ward in which the store is located.
SSL: A unique identifier or code related to the store; in District of Columbia datasets this abbreviation normally stands for the Square, Suffix, and Lot property identifier.
NOTES: Additional comments or information about the store.
PRESENT: Temporal indicators showing the presence (likely open or closed) of each store across various years. These columns provide insights into the longevity and temporal trends of grocery store operations.
GIS_ID: A unique identifier for geographic information system (GIS) data.
XCOORD, YCOORD: Coordinates (likely more specific) used for spatial data analysis, providing the exact location of the store.
MAR_ID: A unique identifier that most likely refers to the District of Columbia's Master Address Repository (MAR) address record for the store.
GLOBALID: A global unique identifier for the store data.
CREATOR: The individual or system that created the data entry.
CREATED: Timestamp showing when the data entry was created.
EDITOR: The individual or system that edited the data entry.
EDITED: Timestamp showing when the data entry was last edited.
SE_ANNO_CAD_DATA: Specific annotation or data related to CAD (computer-aided design), possibly linked to store location details.
OBJECTID: A unique identifier for the object or record within the dataset.
This dataset is invaluable for urban planners, policymakers, and business stakeholders looking to improve food access and urban infrastructure.
Find details of Daisy Distribution buyer/importer data in the US (United States), with product description, price, shipment date, quantity, imported products list, major US port names, overseas supplier/exporter names, etc., at sear.co.in.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
These data files provide species abundance and distribution data from analysis of diver and ROV images for the eastern Long Island Sound Phase IIB study area. The files are in Excel spreadsheet format. Funding was provided by the Long Island Sound Cable Fund Seafloor Habitat Mapping Initiative administered cooperatively by the EPA Long Island Sound Study and the Connecticut Department of Energy and Environmental Protection (DEEP).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Ocean View by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Ocean View. The dataset can be utilized to understand the population distribution of Ocean View by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Ocean View. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and to compare the male-to-female ratio across age groups for Ocean View.
Key observations
Largest age group (population): Male # 75-79 years (253) | Female # 75-79 years (268). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Ocean View Population by Gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lake View by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Lake View. The dataset can be utilized to understand the population distribution of Lake View by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Lake View. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and to compare the male-to-female ratio across age groups for Lake View.
Key observations
Largest age group (population): Male # 30-34 years (252) | Female # 35-39 years (433). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lake View Population by Gender.
Find details of Dollar Tree Distribution Center buyer/importer data in the US (United States), with product description, price, shipment date, quantity, imported products list, major US port names, overseas supplier/exporter names, etc., at sear.co.in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The historical dataset for Lake View Elementary School is provided by PublicSchoolReview and contains statistics on the following metric: Distribution of Students by Grade Trends.
Find details of Rb Distribution Inc buyer/importer data in the US (United States), with product description, price, shipment date, quantity, imported products list, major US port names, overseas supplier/exporter names, etc., at sear.co.in.
Using the Box-Cox quantile regression model, we analyse the size distribution of firms in Portuguese manufacturing during the 1980s. Specifically, we estimate the effect of selected industry attributes on the location, scale, skewness and kurtosis of the conditional size distributions of firms. We find that industry attributes affect the size of firms in the same direction across the distribution, but the effects of these variables are typically much greater at the largest quantiles. Over time the distribution shifted towards smaller firms, due mainly to the way the economy responds to industry characteristics rather than to changes of the level of these characteristics. The prediction of lognormality, implied by Gibrat's Law, is soundly rejected by the observed distribution of firm sizes. However, we found that, at least in 1983, lognormality is a reasonable description of the conditional size distribution.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Bay View by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Bay View. The dataset can be utilized to understand the population distribution of Bay View by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Bay View. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and to compare the male-to-female ratio across age groups for Bay View.
Key observations
Largest age group (population): Male # 15-19 years (67) | Female # 55-59 years (73). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Bay View Population by Gender.
Distributions of scores for nonresDNN in 6b data and the background prediction after background-only fits to the observed data. The...
This dataset details vehicle types and ages for each transit agency reporting to the NTD in the 2022, 2023, and 2024 report years. Non-dedicated fleets do not report Year of Manufacture and are thus excluded from the Age Distribution table.
Agencies do not report Useful Life Benchmark for non-dedicated fleets or fleets for which the agency does not have capital replacement responsibility. These fleets are excluded from calculations of the percentage of vehicles meeting or exceeding their useful life.
In versions of the data tables from before 2014, you can find data on vehicles in the file called "Age Distribution of Active Vehicle Inventory."
In years 2014-2021, you can find this data in the "Vehicles" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
By Michael Tauberg [source]
This comprehensive dataset spans a substantial sampling of movies from the last five decades, giving insight into the financial and creative successes of Hollywood film productions. Containing various production details such as director, actors, editing team, budget, and overall gross revenue, it can be used to understand how different elements come together to make a movie successful. With information covering all aspects of movie-making – from country of origin to soundtrack composer – this collection offers an unparalleled opportunity for a data-driven dive into the world of cinematic storytelling.
The columns are the main factors for analyzing the data in depth. They range from general information, such as the year, name, and language of a movie, to more specific details, such as the directors and editors on its production team. A good first step is to understand what kind of data exists and to get familiar with the different columns.
Good luck exploring!
- Analyzing the correlations between budget, gross revenue, and number of awards or nominations won by a movie. Movie-makers and studios can use this data to understand what factors have an impact on the success of a movie and make better creative decisions accordingly.
- Studying the trend of movies from different countries over time to understand how popular genres are changing over time across regions and countries; this data could be used by international film producers to identify potential opportunities for co-productions with other countries or regions.
- Identifying unique topics for films (based on writers, directors, music, etc.) that had not been explored in previous decades. Studios can use this data to find unique stories or ideas for new films, which often succeed commercially because of their novelty with audiences.
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: movies_1970_2018.csv

| Column name | Description |
|:-------------------|:----------------------------------------------------------|
| year | Year the movie was released. (Integer) |
| wiki_ref | Reference to the Wikipedia page for the movie. (String) |
| wiki_query | Query used to search for the movie on Wikipedia. (String) |
| producer | Name of the producer of the movie. (String) |
| distributor | Name of the distributor of the movie. (String) |
| name | Name of the movie. (String) |
| country | Country of origin of the movie. (String) |
| director | Name of the director of the movie. (String) |
| cinematography | Name of the cinematographer of the movie. (String) |
| editing | Name of the editor of the movie. (String) |
| studio | Name of the studio that produced the movie. (String) |
| budget | Budget of the movie. (Integer) |
| gross | Gross box office receipts of the movie. (Integer) |
| runtime | Length of the movie in minutes. (Integer) |
| music | Name of the composer of the movie's soundtrack. (String) |
| writer | Name of the writer of the movie. (String) |
| starring | Names of the actors in the movie. (String) |
| language | Language of the movie. (String) |
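As a first exploration of the `budget` and `gross` columns listed for movies_1970_2018.csv, the following Python sketch computes a gross-to-budget ratio per movie. The three sample rows are invented placeholders rather than rows from the real file, and the helper name `gross_to_budget` is hypothetical; only the column names come from the listing.

```python
import csv
import io

# Invented sample rows that reuse the documented column names.
# They are placeholders, not records from movies_1970_2018.csv.
SAMPLE = """year,name,budget,gross
1999,Hypothetical Film A,1000000,5000000
2005,Hypothetical Film B,2000000,1000000
2010,Hypothetical Film C,,3000000
"""


def gross_to_budget(rows):
    """Return {movie name: gross / budget}, skipping rows with unusable numbers."""
    ratios = {}
    for row in rows:
        try:
            budget = int(row["budget"])
            gross = int(row["gross"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-integer budget/gross
        if budget <= 0:
            continue  # avoid division by zero or negative budgets
        ratios[row["name"]] = gross / budget
    return ratios


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ratios = gross_to_budget(rows)
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened handle to the CSV. Whether every row has integer `budget` and `gross` values is not stated in the listing, which is why unusable rows are skipped rather than treated as errors.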
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This filtered view displays only the Points of Distribution (PODs) that are currently open in Montgomery County. PODs are critical locations where essential supplies such as water, tarps, meals (MREs), blankets, and more are distributed to residents during emergencies. The view is accessible to the public and serves to provide real-time information about active POD locations during crises.
This dataset provides supporting information for the species distribution data used in the associated manuscript. Collections of five non-native fish species were made by a number of institutions, and several capture techniques were used. This dataset also includes number of individuals of each species captured at each locality.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Resetting a stochastic process has been shown to expedite the completion time of some complex task, such as finding a target for the first time. Here we consider the cost of resetting by associating a cost to each reset, which is a function of the distance travelled during the reset event. We compute the Laplace transform of the joint probability of first passage time $t_f$, number of resets $N$ and resetting cost $C$, and use this to study the statistics of the total cost. We show that in the limit of zero resetting rate the mean cost is finite for a linear cost function, vanishes for a sub-linear cost function and diverges for a super-linear cost function. This result contrasts with the case of no resetting where the cost is always zero. For the case of an exponentially increasing cost function we show that the mean cost diverges at a finite resetting rate. We explain this by showing that the distribution of the cost has a power-law tail with continuously varying exponent that depends on the resetting rate. The dataset is related to the upcoming paper John C. Sunil, Richard A. Blythe, Martin R. Evans and Satya N. Majumdar (in submission), 'The Cost of Stochastic Resetting'.
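To make the quantities in this abstract concrete (first passage time $t_f$, number of resets $N$, and total cost $C$), here is a rough Monte Carlo sketch in Python. It is not the authors' method, which works with Laplace transforms. The geometry (reset to the origin, absorbing target at `target`), the time step, the diffusion constant, and the linear cost function are all assumptions chosen for illustration.

```python
import random


def simulate_once(rng, reset_rate, target=1.0, diffusion=0.5,
                  dt=1e-3, cost=abs, max_steps=1_000_000):
    """Simulate one diffusive trajectory with Poissonian resetting to x = 0.

    Returns (first_passage_time, number_of_resets, total_cost), or None if
    the target is not reached within max_steps. Each reset is charged
    cost(distance travelled during the reset), i.e. cost(|x - 0|).
    """
    x = 0.0
    t = 0.0
    n_resets = 0
    total_cost = 0.0
    step_sd = (2.0 * diffusion * dt) ** 0.5  # std. dev. of one diffusive step
    for _ in range(max_steps):
        if rng.random() < reset_rate * dt:
            total_cost += cost(x)  # cost depends on the distance to the reset point
            x = 0.0
            n_resets += 1
        else:
            x += rng.gauss(0.0, step_sd)
        t += dt
        if x >= target:
            return t, n_resets, total_cost
    return None


rng = random.Random(0)
runs = [simulate_once(rng, reset_rate=1.0) for _ in range(200)]
finished = [r for r in runs if r is not None]
mean_cost = sum(c for _, _, c in finished) / len(finished)
print(f"{len(finished)} runs finished, mean total cost ~ {mean_cost:.3f}")
```

Passing a different `cost` callable (for example a sub-linear or super-linear function of the distance) gives a crude numerical way to explore how the mean cost depends on the cost function, subject to the discretization error of the fixed time step.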
Winter Steelhead Distribution June 2012 Version

This dataset depicts observation-based stream-level geographic distribution of anadromous winter-run steelhead trout, Oncorhynchus mykiss irideus (O. mykiss), in California. It was developed for the express purpose of assisting with steelhead recovery planning efforts. The distributions reported in this dataset were derived from a subset of the data contained in the Aquatic Species Observation Database (ASOD), a Microsoft Access multi-species observation data capture application. ASOD is an ongoing project designed to capture as complete a set of statewide inland aquatic vertebrate species observation information as possible. Please note: A separate distribution is available for summer-run steelhead. Contact information is the same as for the above.

ASOD Observation data were used to develop a network of stream segments. These lines are developed by "tracing down" from each observation to the sea using the flow properties of USGS National Hydrography Dataset (NHD) High Resolution hydrography. Lastly these lines, representing stream segments, were assigned a value of either Anad Present (Anadromous present). The end result (i.e., this layer) consists of a set of lines representing the distribution of steelhead based on observations in the Aquatic Species Observation Database.

This dataset represents stream reaches that are known or believed to be used by steelhead based on steelhead observations. Thus, it contains only positive steelhead occurrences. The absence of distribution on a stream does not necessarily indicate that steelhead do not utilize that stream. Additionally, steelhead may not be found in all streams or reaches each year. This is due to natural variations in run size, water conditions, and other environmental factors.
The information in this data set should be used as an indicator of steelhead presence/suspected presence at the time of the observation as indicated by the 'Late_Yr' (Latest Year) field attribute. The line features in the dataset may not represent the maximum extent of steelhead on a stream; rather, it is important to note that this distribution most likely underestimates the actual distribution of steelhead. This distribution is based on observations found in the ASOD database. The individual observations may not have occurred at the upper extent of anadromous occupation. In addition, no attempt was made to capture every observation of O. mykiss, and so it should not be assumed that this dataset is complete for each stream.

The distribution dataset was built solely from the ASOD observational data. No additional data (habitat mapping, barriers data, gradient modeling, etc.) were utilized to either add to or validate the data. It is very possible that an anadromous observation in this dataset has been recorded above (upstream of) a barrier as identified in the Passage Assessment Database (PAD). In the near future, we hope to perform a comparative analysis between this dataset and the PAD to identify and resolve all such discrepancies. Such an analysis will add rigor to and help validate both datasets.

This dataset has recently undergone a review. Data source contributors as well as CDFG fisheries biologists were given the opportunity to review it and suggest edits or additions. Data contributors were notified and invited to review and comment on the handling of the information that they provided. The distribution was then posted to an intranet mapping application, and CDFG biologists were provided an opportunity to review and comment on the dataset. During this review, biologists were also encouraged to add new observation data. The resulting final distribution contains their suggestions and additions.
Please refer to "Use Constraints" section below.
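The "tracing down" step described in this entry (following the flow network from each observation to the sea and marking every segment passed) can be sketched in Python as a simple downstream walk. This is an illustration only: the dictionary-based network, the segment names, and the function `trace_down` are hypothetical and do not reflect the actual NHD flow attributes or the ASOD tooling.

```python
def trace_down(downstream, observed_segments):
    """Mark every segment between each observation and the sea.

    `downstream` maps a segment ID to the next segment downstream, or to
    None when the segment drains to the sea. This flat mapping is an
    assumed stand-in for the NHD flow properties.
    """
    present = set()
    for segment in observed_segments:
        current = segment
        # Stop at the sea, or as soon as we join a path that was already
        # traced from an earlier observation.
        while current is not None and current not in present:
            present.add(current)
            current = downstream.get(current)
    return present


# Hypothetical network: A and D flow into B, B and E flow into C, C reaches the sea.
network = {"A": "B", "B": "C", "C": None, "D": "B", "E": "C"}
anad_present = trace_down(network, ["A", "D"])
```

In this toy network, segment E is never marked because no observation lies on or upstream of it, which mirrors the entry's caveat that the absence of a line does not mean steelhead are absent from that stream.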