Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
{# General information# The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29)# --------------------------------------------------------------------------------------------------------# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)# -------------------------------------------------------------------------------------------------------- # Data collection and how the individual variables were derived is described in: #Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016. # Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015. # Data are available as Rdata file # Missing values are NA. # --------------------------------------------------------------------------------------------------------# For better readability the subsections of the script can be collapsed # --------------------------------------------------------------------------------------------------------}{# Description of the method # 1 - data are visualized in an interactive actogram with time of day on x-axis and one panel for each day of data # 2 - red rectangle indicates the active field, clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via custom made function) saved in the csv file. For this data extraction I recommend, to click always on the bottom line of the red rectangle, as there is always data available due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if greenish vertical bar appears and if new line of data appears in R console). # 3 - to extract incubation bouts, first click in the new plot has to be start of incubation, then next click depict end of incubation and the click on the same stop start of the incubation for the other sex. If the end and start of incubation are at different times, the data will be still extracted, but the sex, logger and bird_ID will be wrong. These need to be changed manually in the csv file. Similarly, the first bout for a given plot will be always assigned to male (if no data are present in the csv file) or based on previous data. Hence, whenever a data from a new plot are extracted, at a first mouse click it is worth checking whether the sex, logger and bird_ID information is correct and if not adjust it manually. # 4 - if all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction. # 5 - If you wish to end extraction before going through all the rectangles, just press "escape". }{# Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData dfr-- contains raw data on signal strength from radio tag attached to the rump of female and male, and information about when the birds where captured and incubation stage of the nest1. who: identifies whether the recording refers to female, male, capture or start of hatching2. datetime_: date and time of each recording3. logger: unique identity of the radio tag 4. signal_: signal strength of the radio tag5. sex: sex of the bird (f = female, m = male)6. nest: unique identity of the nest7. day: datetime_ variable truncated to year-month-day format8. time: time of day in hours9. datetime_utc: date and time of each recording, but in UTC time10. cols: colors assigned to "who"--------------------------------------------------------------------------------------------------------m-- contains metadata for a given nest1. sp: identifies species (RUTU = Ruddy turnstone)2. nest: unique identity of the nest3. year_: year of observation4. IDfemale: unique identity of the female5. IDmale: unique identity of the male6. lat: latitude coordinate of the nest7. lon: longitude coordinate of the nest8. hatch_start: date and time when the hatching of the eggs started 9. scinam: scientific name of the species10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)11. logger: type of device used to record incubation (IT - radio tag)12. sampling: mean incubation sampling interval in seconds--------------------------------------------------------------------------------------------------------s-- contains metadata for the incubating parents1. year_: year of capture2. species: identifies species (RUTU = Ruddy turnstone)3. author: identifies the author who measured the bird4. nest: unique identity of the nest5. caught_date_time: date and time when the bird was captured6. recapture: was the bird capture before? (0 - no, 1 - yes)7. sex: sex of the bird (f = female, m = male)8. bird_ID: unique identity of the bird9. logger: unique identity of the radio tag --------------------------------------------------------------------------------------------------------}
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The dataset contain lung x-ray image including:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15315323%2F8041ddd2485bfe9cdf2ba1f9d96bd7e5%2F6_Class_Img.jpg?generation=1741951756137022&alt=media" alt="">
The dataset we use is compiled from many reputable sources including: Dataset 1 [1]: This dataset includes four classes of diseases: COVID-19, viral pneumonia, bacterial pneumonia, and normal. It has multiple versions, and we are currently using the latest version (version 4). Previous studies, such as those by Hariri et al. [18] and Ahmad et al. [20], have also utilized earlier versions of this dataset. Dataset 2 [2]: This dataset is from the National Institutes of Health (NIH) Chest X-Ray Dataset, which contains over 100,000 chest X-ray images from over 30,000 patients. It includes 14 disease classes, including conditions like atelectasis, consolidation, and infiltration. For this study, we have selected 2,550 chest X-ray images specifically from the Emphysema class. Dataset 3 [3]: This is the COVQU dataset, which we have extended to include two additional classes: COVID-19 and viral pneumonia. This dataset has been widely used in previous studies by M.E.H. Chowdhury et al. [4] and Rahman T et al. [5], establishing its reputation as a reliable resource.
In addition, we also publish a modified dataset that aims to remove image regions that do not contain lungs (abdomen, arms, etc.).
References: [1] U. Sait, K. G. Lal, S. P. Prajapati, R. Bhaumik, T. Kumar, S. Shivakumar, K. Bhalla, Curated dataset for covid-19 posterior-anterior chest radiography images (x-rays)., Mendeley Data V4 (2022). doi:10.17632/9xkhgts2s6.4. [2] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers, Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases (2017) 3462–3471. doi:10.1109/CVPR.2017.369. [3] A. M. Tahir, M. E. Chowdhury, A. Khandakar, T. Rahman, Y. Qiblawey, U. Khurshid, S. Kiranyaz, N. Ibtehaz, M. S. Rahman, S. Al-Maadeed,S. Mahmud, M. Ezeddin, K. Hameed, T. Hamid, Covid-19 infection localization and severity grading from chest x-ray images, Computers in Biology and Medicine 139 (2021) 105002. URL: https://www.sciencedirect.com/science/article/pii/S0010482521007964. doi:https://doi.org/10.1016/j.compbiomed.2021.105002. [4] M. E. Chowdhury, T. Rahman, A. Khandakar, R. Mazhar, M. A. Kadir, Z. B. Mahbub, K. R. Islam, M. S. Khan, A. Iqbal, N. A. Emadi, M. B. I. Reaz, M. T. Islam, Can ai help in screening viral and covid-19 pneumonia?, IEEE Access 8 (2020) 132665–132676. doi:10.1109/ACCESS.2020.3010287. [5] T. Rahman, A. Khandakar, Y. Qiblawey, A. Tahir, S. Kiranyaz, S. B. A. Kashem, M. T. Islam, S. A. Maadeed, S. M. Zughaier, M. S. Khan, M. E. Chowdhury, Exploring the effect of image enhancement techniques on covid-19 detection using chest x-ray images, Computers in Biology and Medicine 132 (2021). doi:10.1016/j.compbiomed.2021.104319.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Global cropland extent annual for 2000-2022 based on the Potapov et al. (2021). Cropland defined as: land used for annual and perennial herbaceous crops for human consumption, forage (including hay), and biofuel. Perennial woody crops, permanent pastures, and shifting cultivation are excluded from the definition. The original 30-m resolution data (0/1 values) was interpolated from time-series 2003, 2007, 2011, 2015, 2019 to annual values 2000 to 2022 using linear interpolation. All values shown are in principle fractions 0-100%. The 30-m and 100-m resoluton images (COGs) are too large for Zenodo but you can access them from URLs in the filenames_openlandmap_cropland.txt file. See for example (drop the URL in QGIS):
Disclaimer: linear interpolation has limited accuracy and is basically only used to gap-fill the missing years. The remaining missing values in the maps can be ALL consider to be 0 value for cropland. A more detailed up-to-date cropland map of the world is provided by van Tricht et al., (2023), however only single year (2021) has been mapped at 10-m resolution within the WorldCereal project.
The temporal interpolation was implemented using terra package ii.e. using the following fuction:
library(terra)
y.l = c(2003, 2007, 2011, 2015, 2019)
out.years = 2000:2022
i = parallel::mclapply(y.l, function(x){system(paste0('gdal_translate Global_cropland_', x, '.vrt Global_cropland_', x, '.tif -co TILED=YES -co BIGTIFF=YES -co COMPRESS=DEFLATE -co ZLEVEL=9 -co BLOCKXSIZE=1024 -co BLOCKYSIZE=1024 -co NUM_THREADS=8 -co SPARSE_OK=TRUE -a_nodata 255 -scale 0 1 0 100 -ot Byte'))}, mc.cores = length(y.l))
## land mask at 1 deg (100x100km) ----
x = parallel::mclapply(y.l, function(x){system(paste0("gdal_translate Global_cropland_", x, ".tif Global_cropland_", x, "_1d.tif -tr 1 1 -r average -co BIGTIFF=YES -ot Byte -co NUM_THREADS=10"))}, mc.cores = length(y.l))
## 2 hrs
g1 = terra::rast(paste0("Global_cropland_", y.l, "_1d.tif"))
gs = sum(g1, na.rm=TRUE)
plot(gs)
gs.p <- as.polygons(gs, values = TRUE, extent=FALSE, dissolve=FALSE, na.rm=TRUE)
## Input layers:
r = terra::rast(paste0("Global_cropland_", y.l, ".tif"))
int.mc = function(r, tile, y.l, out.years=2000:2022){
bb = paste(as.vector(ext(tile)), collapse = ".")
if(any(!file.exists(paste0("./tmp/", out.years, "/Global_cropland_", out.years, "_", bb, ".tif")))){
r.t = terra::crop(r, ext(tile))
## each tile is 16M pixels
r.x = as.data.frame(r.t, xy=TRUE, na.rm=FALSE)
rs = rowSums(r.x[,-c(1:2)], na.rm=TRUE)
## if sum is == 0 means no cropland throughout the time-series
sel = which(rs>0)
## extract complete values:
r.x0 = r.x[sel,-c(1:2)]
r.x0[is.na(r.x0)] = 0
## interpolate between values:
t1s = as.data.frame(t(apply(r.x0, 1, function(y){ try( approx(y.l, as.vector(y), xout=out.years, rule=2)$y ) })))
## write to GeoTIFFs
t1s$x <- r.x$x[sel]; t1s$y <- r.x$y[sel]
## convert to RasterLayer:
r.x = rast(t1s[,c("x","y",paste0("V", 1:length(out.years)))], type="xyz", crs="+proj=longlat +datum=WGS84 +no_defs")
for(j in 1:length(out.years)){
writeRaster(r.x[[j]], filename=paste0("./tmp/", out.years[j], "/Global_cropland_", out.years[j], "_", bb, ".tif"), gdal=c("COMPRESS=DEFLATE"), datatype='INT1U', NAflag=0, overwrite=FALSE)
}
}
}
## test it:
#int.mc(r, tile=gs.p[1000], y.l)
## run in parallel ----
## takes 12 hrs... 1TB RAM
i = parallel::mclapply(sample(1:length(gs.p)), function(x){try( int.mc(r, tile=gs.p[x], y.l) )}, mc.cores = 70)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Since its inception, the DHS has produced Fiscal Yearly ERO and detention statistics reports that provide information on detention and Alternatives to Detention (ATD) statistics. This dataset compiles the Ice Detention Data, Average in Custody Length of Stay (ICLOS) and Detainees, and Facilities tabs of the FY19, FY20, FY21, and FY22 DHS-ICE reports (referenced below), into a multi-year dataset.
Data Clean-up Process: To assist in the analysis and clear understanding of the data, and fix inaccuracies and data silences in the original data, additional columns were added to the facilities tab. Additionally, columns that were not used in the analysis and where outside of the scope of work, were removed from the new dataset. Rationale for the additional columns and the columns removed is as follow:
New Operator Category column: In attempting to understand the type of DHS facilities and how they may have changed through the years, an Operator Category column was added, where every facility is categorized as A) Country or City Facility (public), B) Privately Operated Facility, and C) Unknown operator. New Gender Modified Column: Upon reviewing the total number of female and male detainees for each facility it was noted that the original Male/Female column, which categorized each facility by the gender of the detainees, was often not reflective of the detainee population. To fix this error, a new "Gender Modified" column was added and the facilities were categorized as A) Male, B) Female, and C) Female and Male depending on the Total Male and Total Female columns of the reports. The new Gender Modified column is based on the ADP numbers of female vs. male detainees. If there were 0 female detainees, the facility was categorized as Male. If there were 0 Male ADP detainees, the facility was categorized as Female. If there were 1 or more detainees for both the female and male totals, the facility was categorized as Female and Male. New FY19-22 Active columns and New Ice Facilities ACTIVE 19-22 tab: Since the project attempts to do a multi-year analysis, five new columns were added to indicate if a facility was active FY19-22, Active FY19, Active FY20, Active FY21, and Active FY22. A facility that was active in a particular year was marked as Y. A facility that was Active all four years was marked as Y in the Active FY19-22 column. Certain modeling and hypothesis testing methodologies required that only the statistics for the facilities that were active FY19-22 be used. For this purpose, a second Tab named “Ice Facilities ACTIVE 19-22” was created. Removal of second to last inspection columns: The original DHS reports included additional columns with second to last inspection type, standard, and date. For the scope of this project, these columns were removed from each year and only the Last Inspection type, standard, and date were left and used for this project. New Percentage of Criminality columns: To calculate the percentage of criminality among detainees new Percentage of Criminality columns were added for each fiscal year. The percentages were calculated by the formula: (ADP: Criminality: Male Crim + ADP: Criminality: Female Crim) / (ADP Total Male + FY: Total Female) NA cells: to indicate that no information was provided by ICE in the original reports or to indicate that the particular facility was not active for a particular year, NA was added in the corresponding cells. Additional data analysis in R changed the NA to nulls.
Version 5 release notes:
Removes support for SPSS and Excel data.Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.Adds a column that indicates the number of months reported. This is generated summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime with have a value of NA for every arrest column for that crime.Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these column includes the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.Order rows by year (descending) and ORI.Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possible incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, If you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrests for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.
To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
I created 9 arrest categories myself. The categories are:
Total Male JuvenileTotal Female JuvenileTotal Male AdultTotal Female AdultTotal MaleTotal FemaleTotal JuvenileTotal AdultTotal ArrestsAll of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set need fewer columns, I include all offenses.
As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files, eight which contain different crimes and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:
Index Crimes
MurderRapeRobberyAggravated AssaultBurglaryTheftMotor Vehicle TheftArsonAlcohol CrimesDUIDrunkenness
LiquorDrug CrimesTotal DrugTotal Drug SalesTotal Drug PossessionCannabis PossessionCannabis SalesHeroin or Cocaine PossessionHeroin or Cocaine SalesOther Drug PossessionOther Drug SalesSynthetic Narcotic PossessionSynthetic Narcotic SalesGrey Collar and Property CrimesForgeryFraudStolen PropertyFinancial CrimesEmbezzlementTotal GamblingOther GamblingBookmakingNumbers LotterySex or Family CrimesOffenses Against the Family and Children
Other Sex Offenses
ProstitutionRapeViolent CrimesAggravated AssaultMurderNegligent ManslaughterRobberyWeapon Offenses
Other CrimesCurfewDisorderly ConductOther Non-trafficSuspicion
VandalismVagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Asterisk indicates datasets with p-value < 0.05.
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
We used this dataset to assess the strength of isolation due to geographic and macroclimatic distance across island and mainland systems, comparing published measurements of phenotypic traits and neutral genetic diversity for populations of plants and animals worldwide. The dataset includes 112 studies of 108 species (72 animals and 36 plants) in 868 island populations and 760 mainland populations, with population-level taxonomic and biogeographic information, totalling 7438 records. Methods Description of methods used for collection/generation of data: We searched the ISI Web of Science in March 2017 for comparative studies that included data on phenotypic traits and/or neutral genetic diversity of populations on true islands and on mainland sites in any taxonomic group. Search terms were 'island' and ('mainland' or 'continental') and 'population*' and ('demograph*' or 'fitness' or 'survival' or 'growth' or 'reproduc*' or 'density' or 'abundance' or 'size' or 'genetic diversity' or 'genetic structure' or 'population genetics') and ('plant*' or 'tree*' or 'shrub*or 'animal*' or 'bird*' or 'amphibian*' or 'mammal*' or 'reptile*' or 'lizard*' or 'snake*' or 'fish'), subsequently refined to the Web of Science categories 'Ecology' or 'Evolutionary Biology' or 'Zoology' or 'Genetics Heredity' or 'Biodiversity Conservation' or 'Marine Freshwater Biology' or 'Plant Sciences' or 'Geography Physical' or 'Ornithology' or 'Biochemistry Molecular Biology' or 'Multidisciplinary Sciences' or 'Environmental Sciences' or 'Fisheries' or 'Oceanography' or 'Biology' or 'Forestry' or 'Reproductive Biology' or 'Behavioral Sciences'. The search included the whole text including abstract and title, but only abstracts and titles were searchable for older papers depending on the journal. The search returned 1237 papers which were distributed among coauthors for further scrutiny. First paper filter To be useful, the papers must have met the following criteria: Overall study design criteria: Include at least two separate islands and two mainland populations; Eliminate studies comparing populations on several islands where there were no clear mainland vs. island comparisons; Present primary research data (e.g., meta-analyses were discarded); Include a field study (e.g., experimental studies and ex situ populations were discarded); Can include data from sub-populations pooled within an island or within a mainland population (but not between islands or between mainland sites); Island criteria: Island populations situated on separate islands (papers where all information on island populations originated from a single island were discarded); Can include multiple populations recorded on the same island, if there is more than one island in the study; While we accepted the authors' judgement about island vs. mainland status, in 19 papers we made our own judgement based on the relative size of the island or position relative to the mainland (e.g. Honshu Island of Japan, sized 227 960 km² was interpreted as mainland relative to islands less than 91 km²); Include islands surrounded by sea water but not islands in a lake or big river; Include islands regardless of origin (continental shelf, volcanic); Taxonomic criteria: Include any taxonomic group; The paper must compare populations within a single species; Do not include marine species (including coastline organisms); Databases used to check species delimitation: Handbook of Birds of the World (www.hbw.com/); International Plant Names Index (https://www.ipni.org/); Plants of the World Online(https://powo.science.kew.org/); Handbook of the Mammals of the World; Global Biodiversity Information Facility (https://www.gbif.org/); Biogeographic criteria: Include all continents, as well as studies on multiple continents; Do not include papers regarding migratory species; Only include old / historical invasions to islands (>50 yrs); do not include recent invasions; Response criteria: Do not include studies which report community-level responses such as species richness; Include genetic diversity measures and/or individual and population-level phenotypic trait responses; The first paper filter resulted in 235 papers which were randomly reassigned for a second round of filtering. Second paper filter In the second filter, we excluded papers that did not provide population geographic coordinates and population-level quantitative data, unless data were provided upon contacting the authors or could be obtained from figures using DataThief (Tummers 2006). We visually inspected maps plotted for each study separately and we made minor adjustments to the GPS coordinates when the coordinates placed the focal population off the island or mainland. For this study, we included only responses measured at the individual level, therefore we removed papers referring to demographic performance and traits such as immunity, behaviour and diet that are heavily reliant on ecosystem context. We extracted data on population-level mean for two broad categories of response: i) broad phenotypic measures, which included traits (size, weight and morphology of entire body or body parts), metabolism products, physiology, vital rates (growth, survival, reproduction) and mean age of sampled mature individuals; and ii) genetic diversity, which included heterozygosity,allelic richness, number of alleles per locus etc. The final dataset includes 112 studies and 108 species. Methods for processing the data: We made minor adjustments to the GPS location of some populations upon visual inspection on Google Maps of the correct overlay of the data point with the indicated island body or mainland. For each population we extracted four climate variables reflecting mean and variation in temperature and precipitation available in CliMond V1.2 (Kritikos et al. 2012) at 10 minutes resolution: mean annual temperature (Bio1), annual precipitation (Bio12), temperature seasonality (CV) (Bio4) and precipitation seasonality (CV) (Bio15) using the "prcomp function" in the stats package in R. For populations where climate variables were not available on the global climate maps mostly due to small island size not captured in CliMond, we extracted data from the geographically closest grid cell with available climate values, which was available within 3.5 km away from the focal grid cell for all localities. We normalised the four climate variables using the "normalizer" package in R (Vilela 2020), and we performed a Principal Component Analysis (PCA) using the psych package in R (Revelle 2018). We saved the loadings of the axes for further analyses. References:
Bruno Vilela (2020). normalizer: Making data normal again.. R package version 0.1.0. Kriticos, D.J., Webber, B.L., Leriche, A., Ota, N., Macadam, I., Bathols, J., et al.(2012). CliMond: global high-resolution historical and future scenario climate surfaces for bioclimatic modelling. Methods Ecol. Evol., 3, 53--64. Revelle, W. (2018) psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA, https://CRAN.R-project.org/package=psych Version = 1.8.12. Tummers, B. (2006). DataThief III. https://datathief.org/
The 5-year goal of the “Model America” concept was to generate a model of every building in the United States. This data repository delivers on that goal. Oak Ridge National Laboratory (ORNL) has developed the Automatic Building Energy Modeling (AutoBEM) software suite to process multiple types of data, extract building-specific descriptors, generate building energy models, and simulate them on High Performance Computing (HPC) resources. For more information, see AutoBEM-related publications (bit.ly/AutoBEM). There were 125,714,640 buildings detected in the United States and this dataset contains 122,930,327 (97.8%) buildings which resulted in a successful simulation. Future, annual updates have been proposed that may include additional buildings, data improvements, or other algorithmic enhancements. This dataset of 122.9 million buildings includes: Models (state_county.zip) – OpenStudio (v3.1.0) and EnergyPlus (v9.4) building energy models. Please note that the download requires the free Globus Connect Personal (https://www.globus.org/globus-connect-personal); Each model has approximately 3,000 building input descriptors that can be extracted. Please see the EnergyPlus (v9.4) 2,784-page Input/Output Reference Guide (https://energyplus.net/sites/all/modules/custom/nrel_custom/pdfs/pdfs_v9.4.0/InputOutputReference.pdf) for everything that can be retrieved or simulated from these models. These models were derived from the following metadata, which is not included in this dataset: 1. ID - unique building ID 2. County - county name 3. State - state name 4. CZ - ASHRAE Climate Zone designation 5. Clim_Zone - text label of climate zone 6. est_year - estimated year of construction 7. est_commercial - estimated building type (0=residential, 1=commercial) 8. Centroid - building center location in latitude/longitude (from Footprint2D) 9. Footprint2D - building polygon of 2D footprint (lat1/lon1_lat2/lon2_...) 10. Height - building height (meters) 11. Area2D - footprint area (ft2) 12. BuildingType - DOE prototype building designation (IECC=residential) as implemented by OpenStudio-standards 13. WWR_surfaces - percent of each facade (pair of points from Footprint2D) covered by fenestration/windows (average 14.5% for residential, 40% for commercial buildings) 14. NumFloors - number of floors (above-grade) 15. Area - estimate of total conditioned floor area (ft2) 16. Standard - building vintage. These models are made free and openly available in hopes of stimulating any simulation-informed use case. Data is provided as-is with no warranties, express or implied, regarding fitness for a particular purpose. We wish to thank our sponsors which include Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD), U.S. Dept. of Energy’s (DOE) Building Technologies Office (BTO), Office of Electricity (OE), Biological and Environmental Research (BER), and National Nuclear Security Administration (NNSA). This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. Please cite as: New, Joshua R., Adams, Mark, Bass, Brett, Berres, Anne, and Clinton, Nicholas (2021). “Model America - data and models of every U.S. building. [Data set].” Constellation, doi.ccs.ornl.gov/ui/doi/339, April 14, 2021
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
{# General information# The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29)# --------------------------------------------------------------------------------------------------------# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)# -------------------------------------------------------------------------------------------------------- # Data collection and how the individual variables were derived is described in: #Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016. # Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015. # Data are available as Rdata file # Missing values are NA. # --------------------------------------------------------------------------------------------------------# For better readability the subsections of the script can be collapsed # --------------------------------------------------------------------------------------------------------}{# Description of the method # 1 - data are visualized in an interactive actogram with time of day on x-axis and one panel for each day of data # 2 - red rectangle indicates the active field, clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via custom made function) saved in the csv file. For this data extraction I recommend, to click always on the bottom line of the red rectangle, as there is always data available due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if greenish vertical bar appears and if new line of data appears in R console). # 3 - to extract incubation bouts, first click in the new plot has to be start of incubation, then next click depict end of incubation and the click on the same stop start of the incubation for the other sex. If the end and start of incubation are at different times, the data will be still extracted, but the sex, logger and bird_ID will be wrong. These need to be changed manually in the csv file. Similarly, the first bout for a given plot will be always assigned to male (if no data are present in the csv file) or based on previous data. Hence, whenever a data from a new plot are extracted, at a first mouse click it is worth checking whether the sex, logger and bird_ID information is correct and if not adjust it manually. # 4 - if all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction. # 5 - If you wish to end extraction before going through all the rectangles, just press "escape". }{# Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData dfr-- contains raw data on signal strength from radio tag attached to the rump of female and male, and information about when the birds where captured and incubation stage of the nest1. who: identifies whether the recording refers to female, male, capture or start of hatching2. datetime_: date and time of each recording3. logger: unique identity of the radio tag 4. signal_: signal strength of the radio tag5. sex: sex of the bird (f = female, m = male)6. nest: unique identity of the nest7. day: datetime_ variable truncated to year-month-day format8. time: time of day in hours9. datetime_utc: date and time of each recording, but in UTC time10. cols: colors assigned to "who"--------------------------------------------------------------------------------------------------------m-- contains metadata for a given nest1. sp: identifies species (RUTU = Ruddy turnstone)2. nest: unique identity of the nest3. year_: year of observation4. IDfemale: unique identity of the female5. IDmale: unique identity of the male6. lat: latitude coordinate of the nest7. lon: longitude coordinate of the nest8. hatch_start: date and time when the hatching of the eggs started 9. scinam: scientific name of the species10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)11. logger: type of device used to record incubation (IT - radio tag)12. sampling: mean incubation sampling interval in seconds--------------------------------------------------------------------------------------------------------s-- contains metadata for the incubating parents1. year_: year of capture2. species: identifies species (RUTU = Ruddy turnstone)3. author: identifies the author who measured the bird4. nest: unique identity of the nest5. caught_date_time: date and time when the bird was captured6. recapture: was the bird capture before? (0 - no, 1 - yes)7. sex: sex of the bird (f = female, m = male)8. bird_ID: unique identity of the bird9. logger: unique identity of the radio tag --------------------------------------------------------------------------------------------------------}