CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores the relationship between housing and weather conditions across North America in 2012. It covers climate variables such as temperature, wind speed, humidity, pressure and visibility, alongside housing parameters such as longitude, latitude, median income, median house value and ocean proximity. Analyzing the two data sets together can help show which factors are associated with the value and comfort of residential areas throughout North America.
This dataset offers plenty of insights into the effects of weather and housing on North American regions. To explore these relationships, you can perform data analysis on the variables provided.
First, start by examining descriptive statistics (i.e., mean, median, mode). This can help show you the general trend and distribution of each variable in this dataset. For example, what is the most common temperature in a given region? What is the average wind speed? How does this vary across different regions? By looking at descriptive statistics, you can get an initial idea of how various weather conditions and housing attributes interact with one another.
Next, explore correlations between variables. Are certain weather variables correlated with specific housing attributes? Is there a link between wind speeds and median house value? Or between humidity and ocean proximity? Analyzing correlations allows for deeper insights into how different aspects may influence one another for a given region or area. These correlations may also inform broader patterns that are present across multiple North American regions or countries.
Finally, use visualizations to further investigate the relationship between climate and housing attributes in North America in 2012. Graphs make trends such as seasonal variations or changes over time easier to see, so they are useful for interpreting large amounts of data quickly and for providing context beyond what the numbers alone can show.
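As a rough illustration of the first two steps above, the sketch below computes descriptive statistics and pairwise correlations with pandas. The column names are taken from the Weather.csv description; the six rows of values are made up for the example and are not from the dataset. With the real file, the DataFrame would presumably come from something like `pd.read_csv("Weather.csv")` instead.

```python
import pandas as pd

# Hypothetical sample rows: column names follow the Weather.csv description,
# but the values are invented purely for illustration.
weather = pd.DataFrame({
    "Temp_C": [-1.8, 0.5, 12.3, 21.7, 24.1, 8.9],
    "Dew Point Temp_C": [-3.9, -2.1, 6.4, 15.2, 17.8, 4.0],
    "Rel Hum_%": [86, 82, 67, 66, 68, 71],
    "Wind Speed_km/h": [4, 15, 20, 9, 7, 24],
})

# Step 1: descriptive statistics (count, mean, std, min, quartiles, max).
summary = weather.describe()
medians = weather.median()

# Step 2: pairwise Pearson correlations between all numeric columns.
correlations = weather.corr()

print(summary.loc[["mean", "std"]])
print(medians)
print(correlations.round(2))
```

A correlation matrix like this only flags linear association; it says nothing about which variable drives the other.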
- Analyzing the effect of climate change on housing markets across North America. By looking at temperature and weather trends in combination with housing values, researchers can better understand how climate change may be impacting certain regions differently than others.
- Investigating the relationship between median income, house values and ocean proximity in coastal areas. Understanding how ocean proximity plays into housing prices may help inform real estate investment decisions and urban planning initiatives related to coastal development.
- Using differences in weather patterns across climates to determine optimal seasonal rental prices for property owners. By analyzing changes in temperature, wind speed, humidity, pressure and visibility from season to season, an investor could gain insight into seasonal market trends and price rentals or Airbnb listings accordingly.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: Weather.csv
| Column name | Description |
|:---|:---|
| Date/Time | Date and time of the observation. (Date/Time) |
| Temp_C | Temperature in Celsius. (Numeric) |
| Dew Point Temp_C | Dew point temperature in Celsius. (Numeric) |
| Rel Hum_% | Relative humidity in percent. (Numeric) |
| Wind Speed_km/h | Wind speed in kilometers per hour. (Numeric) |
| Visibility_km | Visibilit... |
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
You will find three datasets containing heights of the high school students.
All heights are in inches.
The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.
| Height Statistics (inches) | Boys | Girls |
|---|---|---|
| Mean | 67 | 62 |
| Standard Deviation | 2.9 | 2.2 |
There are 500 measurements for each gender.
Here are the datasets:
hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which of the values are for boys and which ones are for girls.
hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.
hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights
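For orientation, here is a minimal sketch of how data with these properties could be simulated and arranged into the three layouts. It is not the author's generator (that is at the link above): the seed, the use of NumPy, and the shuffling of the combined column are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
n = 500                         # measurements per gender, as stated above

# Heights in inches, drawn from the normal distributions in the table.
boys = rng.normal(loc=67.0, scale=2.9, size=n)
girls = rng.normal(loc=62.0, scale=2.2, size=n)

# hs_heights.csv layout: one unlabeled column (shuffling is an assumption).
combined = rng.permutation(np.concatenate([boys, girls]))

# hs_heights_pair.csv layout: boys in the first column, girls in the second.
pair = np.column_stack([boys, girls])

# hs_heights_flag.csv layout: is_girl flag, then the height.
is_girl = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
flagged = np.column_stack([is_girl, np.concatenate([boys, girls])])

print(combined.shape, pair.shape, flagged.shape)
print(round(boys.mean(), 2), round(girls.mean(), 2))
```

The single-column layout is the interesting one for exercises such as mixture modelling, since the group labels have to be inferred.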
Image by Gillian Callison from Pixabay
Data for Figure 3.39 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.39 shows the observed and simulated Pacific Decadal Variability (PDV).

---------------------------------------------------
How to cite this dataset
---------------------------------------------------
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

---------------------------------------------------
Figure subpanels
---------------------------------------------------
The figure has six panels. Files are not separated according to the panels.

---------------------------------------------------
List of data provided
---------------------------------------------------
pdv.obs.nc contains
- Observed SST anomalies associated with the PDV pattern
- Observed PDV index time series (unfiltered)
- Observed PDV index time series (low-pass filtered)
- Taylor statistics of the observed PDV patterns
- Statistical significance of the observed SST anomalies associated with the PDV pattern

pdv.hist.cmip6.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns based on CMIP6 historical simulations.

pdv.hist.cmip5.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns based on CMIP5 historical simulations.

pdv.piControl.cmip6.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns based on CMIP6 piControl simulations.

pdv.piControl.cmip5.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns based on CMIP5 piControl simulations.

---------------------------------------------------
Data provided in relation to figure
---------------------------------------------------
Panel a:
- ipo_pattern_obs_ref in pdv.obs.nc: shading
- ipo_pattern_obs_signif (dataset = 1) in pdv.obs.nc: cross markers

Panel b:
- Multimodel ensemble mean of ipo_model_pattern in pdv.hist.cmip6.nc: shading, with their sign agreement for hatching

Panel c:
- tay_stats (stat = 0, 1) in pdv.obs.nc: black dots
- tay_stats (stat = 0, 1) in pdv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
- tay_stats (stat = 0, 1) in pdv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

Panel d:
- Lag-1 autocorrelation of tpi in pdv.obs.nc: black horizontal lines in left
  - ERSSTv5: dataset = 1
  - HadISST: dataset = 2
  - COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Lag-10 autocorrelation of tpi_lp in pdv.obs.nc: black horizontal lines in right
  - ERSSTv5: dataset = 1
  - HadISST: dataset = 2
  - COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel e:
- Standard deviation of tpi in pdv.obs.nc: black horizontal lines in left
  - ERSSTv5: dataset = 1
  - HadISST: dataset = 2
  - COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Standard deviation of tpi_lp in pdv.obs.nc: black horizontal lines in right
  - ERSSTv5: dataset = 1
  - HadISST: dataset = 2
  - COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel f:
- tpi_lp in pdv.obs.nc: black curves
  - ERSSTv5: dataset = 1
  - HadISST: dataset = 2
  - COBE-SST2: dataset = 3
- tpi_lp in pdv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
- tpi_lp in pdv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. SST stands for Sea Surface Temperature.

---------------------------------------------------
Notes on reproducing the figure from the provided data
---------------------------------------------------
Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied.

Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.

---------------------------------------------------
Sources of additional information
---------------------------------------------------
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the report component containing the figure (Chapter 3)
- Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
- Link to the code for the figure, archived on Zenodo
- Link to the figure on the IPCC AR6 website
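The two reproduction notes (inverse-ensemble-size weighting and Fisher z-averaging of correlations) can be sketched in NumPy as below. This is an illustration only, not the archived figure code: the function names and the toy `ensemble_assign` array are hypothetical, and combining the member weights with the Fisher z-average in a single step is an assumption about how the two notes fit together.

```python
import numpy as np

def member_weights(ensemble_assign):
    """Weight each member by 1 / (number of members from the same model)."""
    ensemble_assign = np.asarray(ensemble_assign)
    _, inverse, counts = np.unique(
        ensemble_assign, return_inverse=True, return_counts=True
    )
    return 1.0 / counts[inverse]

def weighted_mean(values, weights):
    """Weighted average over ensemble members."""
    values = np.asarray(values, dtype=float)
    return np.sum(weights * values) / np.sum(weights)

def fisher_mean_correlation(r, weights):
    """Average correlations in z-space (arctanh), then transform back (tanh)."""
    z = np.arctanh(np.asarray(r, dtype=float))
    return np.tanh(weighted_mean(z, weights))

# Toy example: model 1 has three members, model 2 one, model 3 two.
ensemble_assign = [1, 1, 1, 2, 3, 3]
w = member_weights(ensemble_assign)

index_std = [1.0, 1.0, 1.0, 4.0, 2.0, 2.0]       # made-up per-member values
pattern_corr = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # made-up correlations

print(w)
print(weighted_mean(index_std, w))
print(fisher_mean_correlation(pattern_corr, w))
```

With these weights every model contributes equally regardless of how many members it submitted, which is why the toy mean equals the plain average of the three per-model values.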
See Materials and Methods section for description of angle B. Group means shown without confidence intervals are those for which sample size is too small to derive 95% confidence intervals (n < 8). See Table 1 for institutional abbreviations.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset contains zonal-mean atmospheric diagnostics computed from reanalysis datasets on pressure levels. Primary variables include temperature, geopotential height, and the three-dimensional wind field. Advanced diagnostics include zonal covariance terms that can be used to compute, for instance, eddy kinetic energy and eddy fluxes. Terms from the primitive zonal-mean momentum equation and the transformed Eulerian momentum equation are also provided.
This dataset was produced to facilitate the comparison of reanalysis datasets for the collaborators of the SPARC Reanalysis Intercomparison Project (S-RIP). The dataset is substantially smaller than the full three-dimensional reanalysis fields and uses unified numerical methods. The dataset includes all global reanalyses available at the time of its development and will be extended to new reanalysis products in the future.
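To show how zonal covariance terms relate to eddy quantities, the sketch below uses the standard identity [a'b'] = [ab] - [a][b], where square brackets denote the zonal mean and primes the deviation from it. The wind fields are random toy arrays and the variable and function names are hypothetical; the dataset's own variable names and grids are not assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed for a repeatable toy field
n_lat, n_lon = 4, 36

# Toy zonal (u) and meridional (v) wind on a (latitude, longitude) grid.
u = 10.0 + rng.normal(size=(n_lat, n_lon))
v = rng.normal(size=(n_lat, n_lon))

def zonal_mean(field):
    """Average over longitude (the last axis)."""
    return field.mean(axis=-1)

def zonal_covariance(a, b):
    """[a'b'] = [ab] - [a][b]: zonal mean of the product of eddy parts."""
    return zonal_mean(a * b) - zonal_mean(a) * zonal_mean(b)

# Eddy momentum flux [u'v'] and horizontal eddy kinetic energy
# 0.5 * ([u'u'] + [v'v']), one value per latitude.
eddy_momentum_flux = zonal_covariance(u, v)
eddy_kinetic_energy = 0.5 * (zonal_covariance(u, u) + zonal_covariance(v, v))

print(eddy_momentum_flux)
print(eddy_kinetic_energy)
```

Because only zonal means of products are needed, these eddy terms can be recovered from a zonal-mean archive without the full three-dimensional fields.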
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from BILO Gridded Climate Data provided by the CSIRO. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
Summaries of various climate variables for all 15 subregions, including:
- Time series of mean annual Bureau of Meteorology Australian Water Availability Project (BAWAP) rainfall from 1900 to 2012.
- Long-term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) for each month, from Jan 1981 to Dec 2012.
- Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net Radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
- Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CSIRO Marine and Atmospheric Research (CMAR), Hobart. As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
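The seven statistics (a) to (g) listed above for each variable and time period could be computed along the following lines. This is a sketch under stated assumptions, not the Programme's method: the source does not say how the trend or the standard deviation were calculated, so the least-squares slope and the sample (ddof=1) standard deviation used here are guesses, and the function name and toy series are hypothetical.

```python
import numpy as np

def summarise(years, values):
    """Return the seven summary statistics for one variable and one period."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    average = values.mean()
    stddev = values.std(ddof=1)                    # assumption: sample std
    slope, _intercept = np.polyfit(years, values, deg=1)  # assumption: OLS trend
    return {
        "average": average,
        "maximum": values.max(),
        "minimum": values.min(),
        "average_plus_stddev": average + stddev,
        "average_minus_stddev": average - stddev,
        "stddev": stddev,
        "trend": slope,                            # units per year
    }

# Toy annual Tavg series for 1981-2012 with a built-in 0.05 per-year rise.
years = np.arange(1981, 2013)
tavg = 20.0 + 0.05 * (years - 1981)

stats = summarise(years, tavg)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Applying the same function to each of the 17 periods and 8 variables would give the 17 x 8 x 7 table of values described in the list.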
Dataset was generated using various source data:
Annual BAWAP rainfall
Monthly BAWAP rainfall
Monthly Penman PET
Monthly BAWAP rainfall
Monthly Penman PET
Monthly BAWAP Tair
Monthly BAWAP Tmax
Monthly BAWAP Tmin
Monthly VPD
Actual vapour pressure is measured at 9:00 am; the saturated vapour pressure is calculated from Tmax and Tmin.
Monthly Rn
Monthly Wind
This dataset is created by CLW Ecohydrological Time Series Remote Sensing Team. See http://www-data.iwis.csiro.au/ts/climate/wind/mcvicar_etal_grl2008/.
Bioregional Assessment Programme (2013) Mean climate variables for all subregions. Bioregional Assessment Derived Dataset. Viewed 12 March 2019, http://data.bioregionalassessments.gov.au/dataset/3f568840-0c77-4f74-bbf3-6f82d189a1fc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Southern View, IL, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
Figure: Mean household income by quintiles in Southern View, IL (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Southern View median household income.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset contains zonal-mean model-generated and diagnosed heating rates as potential temperature tendencies on pressure levels. The model-generated heating rates consist of total heating rates due to parameterized physics along with heating rates due to long-wave and short-wave radiative transfer, as generated during the model forecast step. The diagnosed heating rates are calculated from the zonal-mean atmospheric diagnostics (Zonal-mean reanalyses on pressure levels dataset) according to the zonal-mean thermodynamic equation. All heating rates are provided 6-hourly on identical horizontal and vertical grids as the dynamical variables included in Zonal-mean reanalyses on pressure levels dataset. However, the time axis of this dataset lags that of Zonal-mean reanalyses on pressure levels dataset by three hours.
This dataset was produced to facilitate the comparison of reanalysis datasets for the collaborators of the SPARC Reanalysis Intercomparison Project (S-RIP). The dataset is substantially smaller than the full three-dimensional reanalysis fields and uses unified numerical methods. The dataset includes all global reanalyses available at the time of its development and will be extended to new reanalysis products in the future.
The Highway-Runoff Database (HRDB) was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration (FHWA), to provide planning-level information for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway runoff on the Nation’s receiving waters. The HRDB was assembled by using a Microsoft Access database application to facilitate use of the data and to calculate runoff-quality statistics with methods that properly handle censored-concentration data. This data release provides highway-runoff data, including information about monitoring sites, precipitation, runoff, and event-mean concentrations of water-quality constituents. The dataset was compiled from 37 studies as documented in 113 scientific or technical reports. The dataset includes data from 242 highway sites across the country. It includes data from 6,837 storm events with dates ranging from April 1975 to November 2017. These data therefore span more than 40 years; vehicle emissions and background sources of highway-runoff constituents have changed markedly during this time. For example, some of the early data are affected by use of leaded gasoline, phosphorus-based detergents, and industrial atmospheric deposition. The dataset includes 106,441 concentration values with data for 414 different water-quality constituents. This dataset was assembled from various sources, and the original data were collected and analyzed by using various protocols. Where possible, the USGS worked with State departments of transportation and the original researchers to obtain, document, and verify the data that were included in the HRDB. This new version (1.1.0) of the database contains software updates to provide data-quality information within the Graphical User Interface (GUI), calculate statistics for multiple sites in batch mode, and output additional statistics. However, inclusion in this dataset does not constitute endorsement by the USGS or the FHWA. People who use these data are responsible for ensuring that the data are complete and correct and that they are suitable for their intended purposes.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Noble gas data from Taylor Glacier and EPICA Dome C (EDC) for mean ocean temperature reconstruction during the Last Interglacial. Also includes trace gas measurements of d18Oatm, CO2, and CH4 from Taylor Glacier for chronology construction.
This data set contains 1971-2000 mean annual precipitation estimates for west-central Nevada. This is a raster data set developed using the precipitation-zone method, which uses elevation-based regression equations to estimate mean annual precipitation for defined precipitation zones (Lopes and Medina, 2007). This data set is based on the 30-meter National Elevation Dataset.
Reference Cited
Lopes, T.J., and Medina, R.L., 2007, Precipitation Zones of West-Central Nevada: Journal of Nevada Water Resources Association, v. 4, no. 2, p. 21.
The BOREAS AFM-06 team from the National Oceanic and Atmospheric Administration Environment Technology Laboratory (NOAA/ETL) operated a 915 MHz wind/Radio Acoustic Sounding System (RASS) profiler system in the Southern Study Area (SSA) near the Old Jack Pine (OJP) tower from 21-May-1994 to 20-Sep-1994. The data set provides temperature profiles at 15 heights, containing the variables of virtual temperature, vertical velocity, the speed of sound, and w-bar.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
The repository includes the data and R script for performing an analysis of among- and within-individual differences in the timing of first nesting attempts of the year in natal and pre-breeding environmental conditions (see reference). The data come from a long-term study of the demography of Savannah sparrows (Passerculus sandwichensis) breeding on Kent Island, New Brunswick, Canada (44.58°N, 66.76°W). Climate data were taken from an Environment and Climate Change Canada weather station at the airport in Saint John, NB (45.32°N, 65.89°W; https://www.climate.weather.gc.ca).

Datasets
(1) SAVS_all_nests_samp.csv: contains summary information for all nest attempts observed for all females included in the analysis (i.e., including both first-of-year and subsequent lay dates).
(2) SAVS_first_nest_per_year_samp.csv: contains detailed information on the first nesting attempt by each female Savannah sparrow monitored in the population over the course of the study (1987-2019, excluding the years 2005-2007; see Methods: Study site and field sampling in reference).
(3) mean_daily_temperature.csv: contains mean daily temperature records from the ECCC weather station at Saint John, NB (see above). These mean daily temperatures were used in a climate sensitivity analysis to determine the optimum pre-breeding window on Kent Island.
(4) SAVS_annual_summary.csv: contains annual summaries of average lay dates, breeding density, reproductive output, etc.

Variables
- female.id = factor; unique aluminum band number (USGS or Canadian Wildlife Service) assigned to each female
- rain.categorical = binary (0 = low rainfall; 1 = high rainfall); groups females into low (81-171 mm) and high (172-378 mm) natal rainfall groups, based on the natal environmental conditions observed in each year (see Methods: Statistical analysis in reference)
- year = integer (1987-2019); study year. The population of Savannah sparrows on Kent Island has been monitored since 1987 (excluding three years, 2005-2007)
- nest.id = factor; an alpha-numeric code assigned to each nest; unique within years (the combination of year and nest.id would create a unique identifier for each nest)
- fledglings = integer; number of offspring fledged from a nest
- total.fledglings = integer; the total number of fledglings reared by a given female over the course of her lifetime
- nest.attempts = integer; the total number of nest attempts per female (the number of nests over which the total number of fledglings is divided; includes both successful and unsuccessful clutches)
- hatch.yday = integer; day of the year on which the first egg hatched in a given nest
- lay.ydate = integer; day of the year on which the first egg was laid in a given nest
- lay.caldate = date (dd/mm/yyyy); calendar date on which the first egg in a given nest was laid
- nestling.year = integer; the year in which the female/mother of a given nest was born
- nestling.density = integer; the density of adult breeders in the year in which a given female (associated with a particular nest) was born
- total.nestling.rain = numeric; cumulative rainfall (in mm) experienced by a female during the nestling period in her natal year of life (01 June to 31 July; see Methods: Temperature and precipitation data in reference)
- years.experience = integer; number of previous breeding years per female in a particular year
- density.total = integer; total number of adult breeders in the study site in a particular year
- MCfden = numeric; mean-centred female density
- MCbfden = numeric; mean-centred between-female density
- MCwfden = numeric; mean-centred within-female density
- mean.t.window = numeric; mean temperature during the identified pre-breeding window (03 May to 26 May; see Methods: Climate sensitivity analysis in reference)
- MCtemp = numeric; mean-centred temperature during the optimal pre-breeding window
- MCbtemp = numeric; mean-centred between-female temperature during the optimal pre-breeding window
- MCwtemp = numeric; mean-centred within-female temperature during the optimal pre-breeding window
- female.age = integer; age (in years) of a given female in a given year
- MCage = numeric; mean-centred female age
- MCbage = numeric; mean-centred between-female age
- MCwage = numeric; mean-centred within-female age
- mean_temp_c = numeric; mean daily temperature in °C
- meanLD = numeric; mean lay date (in days of the year) across all first nest attempts in a given year
- sdLD = numeric; standard deviation in lay date (in days of the year) across all first nest attempts in a given year
- seLD = numeric; standard error in lay date (in days of the year) across all first nest attempts in a given year
- meanTEMP = numeric; mean temperature (in °C) during the breeding period in a given year
- records = integer; number of first nest attempts from each year included in the analysis
- total.nestling.precip = numeric; total rainfall (in mm) during the nestling period (01 June to 31 July) in a given year
- total.breeding.precip = numeric; total rainfall (in mm) during the breeding period (15 April to 31 July) in a given year
- density.total = integer; total density of adult breeders on the study site in a given year
- total.fledglings = integer; total number of offspring fledged by all breeders in the study site in a given year
- cohort.fecundity = numeric; average number of offspring per breeder in a given year

Code
code for Burant et al. - SAVS lay date plasticity analysis.R
The R script provided includes all the code required to import the data and perform the statistical analyses presented in the manuscript. These include:
- t-tests investigating the effects of natal conditions (rain.categorical) on female age, nest attempts, and reproductive success
- linear models of changes in temperature, precipitation, reproductive success, and population density over time, and lay dates in response to female age, density, etc.
- a climate sensitivity analysis to identify the optimal pre-breeding window on Kent Island
- mixed effects models investigating how lay dates respond to changes in within- and between-female age, density, and temperature

See readme.rtf for a list of datasets and variables.
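The between- and within-female variables (for example MCbtemp and MCwtemp) look like the usual within-subject centring used in mixed models, where each observation is split into an individual's mean and the deviation from that mean. The sketch below shows that decomposition in pandas. It is an assumption about how the variables were derived, not a transcription of the R script: the toy values are invented, and whether the grand mean was taken over observations (as here) or over females is not stated in the source.

```python
import pandas as pd

# Toy data: two females observed in several years (values are made up).
obs = pd.DataFrame({
    "female.id": ["A", "A", "A", "B", "B"],
    "mean.t.window": [8.0, 9.0, 10.0, 11.0, 13.0],
})

grand_mean = obs["mean.t.window"].mean()
female_mean = obs.groupby("female.id")["mean.t.window"].transform("mean")

# Overall mean-centred temperature.
obs["MCtemp"] = obs["mean.t.window"] - grand_mean
# Between-female component: each female's own mean, centred on the grand mean.
obs["MCbtemp"] = female_mean - grand_mean
# Within-female component: yearly deviation from that female's own mean.
obs["MCwtemp"] = obs["mean.t.window"] - female_mean

print(obs)
```

Under this decomposition MCtemp equals MCbtemp plus MCwtemp, so a mixed model can estimate separate slopes for differences among females and for plastic responses within a female.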
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset contains 5,000 student records exploring the relationship between study hours and academic performance.
This dataset was generated using R.
# Set seed for reproducibility
set.seed(42)
# Define number of observations (students)
n <- 5000
# Generate study hours (independent variable)
# Uniform distribution between 0 and 12 hours
study_hours <- runif(n, min = 0, max = 12)
# Create relationship between study hours and grade
# Base grade: 40 points
# Each study hour adds an average of 5 points
theoretical_grade <- 40 + 5 * study_hours
# Add normal noise (mean = 0, standard deviation = 10) to make it realistic
noise <- rnorm(n, mean = 0, sd = 10)
# Calculate final grade
grade <- theoretical_grade + noise
# Limit grades between 0 and 100
grade <- pmin(pmax(grade, 0), 100)
# Create the dataframe
dataset <- data.frame(
  student_id = 1:n,
  study_hours = round(study_hours, 2),
  grade = round(grade, 2)
)
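Because the grades are built as 40 + 5 × study hours plus noise, a least-squares line fitted to the data should come out with a slope close to 5 and an intercept close to 40; the slope is expected to land slightly below 5 because grades are clipped at 100. The following Python sketch illustrates that check. It re-simulates the recipe with Python's own random number generator, so the individual values will not match the R output, and the function names `simulate` and `fit_line` are illustrative only.

```python
import random


def simulate(n=5000, seed=42):
    """Re-create the recipe above: uniform hours, linear grade, Gaussian noise, clipping."""
    rng = random.Random(seed)
    hours = [rng.uniform(0, 12) for _ in range(n)]
    grades = [min(max(40 + 5 * h + rng.gauss(0, 10), 0), 100) for h in hours]
    return hours, grades


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


hours, grades = simulate()
slope, intercept = fit_line(hours, grades)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The same check in R would be a one-line `lm(grade ~ study_hours, data = dataset)` call.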
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was derived by the Bioregional Assessment Programme from 'Mean climate variables for all subregions' and 'fPAR derived from MODIS for BA subregions'. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
These are charts of climate statistics and MODIS data for each BA subregion. There are six 600dpi PNG files per subregion, with the naming convention BA-[regioncode]-[subregioncode]-[chartname].png. The charts, according to their filename, are: rain (time-series of rainfall; Figure 1), P-PET (average monthly precipitation and potential evapotranspiration; Figure 2), 5line (assorted monthly statistics; Figure 3), trend (monthly long-term trends; Figure 4) and fPAR (fraction of photosynthetically available radiation - an indication of biomass; Figure 5).
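The naming convention can be unpacked programmatically. The sketch below is one possible way to do it in Python and is not part of the packaged BA-ClimateCharts.py script. It assumes that region and subregion codes contain no hyphens (the chart name P-PET does), and the codes `REG` and `SUB` in the usage line are placeholders, not real subregion codes.

```python
import re

# Chart names and descriptions as listed in the dataset description.
CHARTS = {
    "rain": "time-series of rainfall",
    "P-PET": "average monthly precipitation and potential evapotranspiration",
    "5line": "assorted monthly statistics",
    "trend": "monthly long-term trends",
    "fPAR": "fraction of photosynthetically available radiation",
}

# BA-[regioncode]-[subregioncode]-[chartname].png
# Assumption: the two codes contain no hyphens, so everything after the
# third hyphen belongs to the chart name.
FILENAME = re.compile(
    r"^BA-(?P<region>[^-]+)-(?P<subregion>[^-]+)-(?P<chart>.+)\.png$"
)


def parse_chart_filename(name):
    """Return (region code, subregion code, chart name) for a chart file."""
    match = FILENAME.match(name)
    if match is None or match["chart"] not in CHARTS:
        raise ValueError(f"not a recognised chart file name: {name!r}")
    return match["region"], match["subregion"], match["chart"]


# Placeholder codes, for illustration only.
region, subregion, chart = parse_chart_filename("BA-REG-SUB-P-PET.png")
print(region, subregion, chart, "->", CHARTS[chart])
```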
This version was created on 18 November 2014, using data that accounted for a modified boundary for the Gippsland Basin bioregion and the combination of two subregions to form the Sydney Basin bioregion.
These charts were generated to be included in the Contextual Report (geography) for each subregion.
These charts were generated using MatPlotLib 1.3.0 in Python 2.7.5 (Anaconda distribution v1.7.0 32-bit).
The script for generating these plots is BA-ClimateCharts.py, and is packaged with the dataset. This script only collects the data and draws the charts; it does not perform any analysis. The data are charted as they appear in the parent datasets (see Lineage). A Word document (BA-ClimateGraphs-ReadMe) is also included. This document includes examples of, and approved captions for, each chart.
Bioregional Assessment Programme (2014) Charts of climate statistics and MODIS data for all Bioregional Assessment subregions. Bioregional Assessment Derived Dataset. Viewed 14 June 2018, http://data.bioregionalassessments.gov.au/dataset/8a1c5f43-b150-4357-aa25-5f301b1a02e1.
Derived From Mean climate variables for all subregions
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From fPar derived from MODIS for BA subregions
This dataset includes monthly gridded temperature anomalies on a global 2.5 x 2.5 degree grid derived from Microwave Sounding Unit (MSU) and Advanced Microwave Sounding Unit (AMSU) radiance data since December 1978. In addition, there are monthly regional anomalies and monthly mean annual cycle temperatures. All products are derived for four bulk layers of the atmosphere: the Lower Troposphere (TLT), Mid-Troposphere (TMT), Tropopause (TTP) and Lower Stratosphere (TLS). Version 6.0 is the latest UAH version archived at NOAA and is updated monthly. It utilizes the linear calibration equation with hot-target correction for the MSU series (TIROS-N through NOAA-14) rather than other non-linear calibration equations. Gridded values of absolute temperature are calculated from a polynomial fit in the vertical coordinate of all view angle temperatures binned into each grid over a month. The selected temperature is calculated from a prescribed view-angle where it intersects the polynomial fit of the temperature vs. view-angle relationship for each grid. The diurnal adjustment is completely empirical, calculated by comparing a diurnally-drifting spacecraft against one that is not drifting during their overlap comparison period (for a.m. spacecraft, NOAA-15 vs. (non-drifting) AQUA, and for p.m., NOAA-18 vs. (non-drifting) NOAA-19 during 4 years). The calculated diurnal relationship of temperature change vs. time of day is then applied to all drifting satellites. The Lower Troposphere is calculated from a linear combination of TMT, TTP and TLS rather than from a linear combination of view-angles from the single channel (MSU2 or AMSU5) as was done in versions 5.6 and earlier. A new bulk layer centered on the Tropopause was added in version 6.0. These products were converted from the native text file format to netCDF-4 following CF metadata conventions, and they are accompanied by algorithm documentation, data flow diagram and source code for the NOAA CDR Program.
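The relationship between absolute temperatures, the mean annual cycle and the anomalies can be illustrated with a small Python sketch. This is not the UAH processing code: the numbers are made up, and using the whole toy series as the base period is an assumption made only to keep the example short.

```python
def mean_annual_cycle(series):
    """Average value for each calendar month.

    `series` is a list of (month, value) pairs with month in 1..12.
    """
    totals = {}
    counts = {}
    for month, value in series:
        totals[month] = totals.get(month, 0.0) + value
        counts[month] = counts.get(month, 0) + 1
    return {month: totals[month] / counts[month] for month in totals}


def anomalies(series, cycle):
    """Subtract the calendar-month mean from every monthly value."""
    return [(month, value - cycle[month]) for month, value in series]


# Made-up absolute temperatures (K) for January and July of two years.
toy = [(1, 250.0), (7, 260.0), (1, 252.0), (7, 258.0)]
cycle = mean_annual_cycle(toy)
print(cycle)
print(anomalies(toy, cycle))
```

Removing the annual cycle in this way is what makes months from different seasons comparable in a single anomaly series.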
🔍 Dataset Overview:
🐟 Species: Name of the fish species (e.g., Anabas testudineus)
📏 Length: Length of the fish (in centimeters)
⚖️ Weight: Weight of the fish (in grams)
🧮 W/L Ratio: Weight-to-length ratio of the fish
🧠 Steps to Build the Prediction Model:
📋 Data Preprocessing:
1 - Handle Missing Values: Check for and handle any missing values appropriately using methods like:
Imputation (mean/median for numeric data)
Row or column removal (if data is too sparse)
2 - Convert Data Types: Ensure numerical columns (Length, Weight, W/L Ratio) are in the correct numeric format.
3 - Handle Categorical Variables: Convert the Species column into numerical format using:
One-Hot Encoding
Label Encoding
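The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference pipeline: the rows are made up, "Species B" is a placeholder name, and the helper names `impute_median` and `one_hot` are hypothetical. In practice a library such as pandas or scikit-learn would usually do this work.

```python
from statistics import median

# Made-up rows; None marks a missing value.
rows = [
    {"Species": "Anabas testudineus", "Length": 10.5, "Weight": 20.0},
    {"Species": "Species B", "Length": None, "Weight": 35.0},
    {"Species": "Anabas testudineus", "Length": 12.0, "Weight": None},
]


def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed ones."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for category in categories:
            encoded[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded_rows.append(encoded)
    return encoded_rows


clean = impute_median(impute_median(rows, "Length"), "Weight")
for r in clean:
    r["W/L Ratio"] = r["Weight"] / r["Length"]  # weight-to-length ratio
encoded = one_hot(clean, "Species")
print(encoded[1])
```

One-hot encoding avoids implying an order between species, which label encoding would do for models that treat the codes as numbers.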
🎯 Feature Selection:
1 - Correlation Analysis: Use correlation heatmaps or statistical tests to identify features most related to the target variable (e.g., Weight).
2 - Feature Importance: Use tree-based models (like Random Forest) to determine which features are most predictive.
🔍 Model Selection:
1 - Algorithm Choice: Choose suitable machine learning algorithms such as:
Linear Regression
Decision Tree Regressor
Random Forest Regressor
Gradient Boosting Regressor
2 - Model Comparison: Evaluate each model using metrics like:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R-squared (R²)
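For reference, the three metrics listed above can be written out directly. The sketch below uses made-up true and predicted weights; libraries such as scikit-learn provide equivalent functions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up weights (g) and predictions, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

MAE and MSE are in the units of the target (grams and grams squared here), while R² is unitless, so it is the easier one to compare across datasets.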
🚀 Model Training and Evaluation:
1 - Train the Model: Split the dataset into training and testing sets (e.g., 80/20 split). Train the selected model(s) on the training set.
2 - Evaluate the Model: Use the test set to assess model performance and fine-tune as necessary using grid search or cross-validation.
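A minimal sketch of the split, train and evaluate loop is given below, using a one-variable linear regression of weight on length. The data are made up and lie exactly on a straight line, so the test error is essentially zero; real fish measurements will not behave this neatly. The helper names are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out `test_fraction` of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up (length in cm, weight in g) pairs on the line weight = 3 * length + 2.
data = [(float(length), 3.0 * length + 2.0) for length in range(5, 15)]

train, test = train_test_split(data)  # 80/20 split
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluate on the held-out rows only.
test_mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
print(len(train), len(test), round(test_mae, 6))
```

Keeping the test rows out of the fit is what makes the reported error an estimate of performance on unseen fish.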
This dataset and workflow are useful for exploring biometric relationships in fish and building regression models to predict weight based on length or species. Great for marine biology, aquaculture analytics, and educational projects.
🐠 Happy modeling! 👍 Please upvote if you found this helpful!
https://www.kaggle.com/code/abdelrahman16/fish-clustering-diverse-techniques
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 prediction has been essential in aiding the prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, which is compiled from 5 data sources from Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part includes machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand what factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
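The idea behind a two-part model can be illustrated with a deliberately stripped-down Python sketch. It is not the authors' model: both parts are intercept-only stand-ins (no covariates, no logistic classifier, no XGBoost), and the counts are made up. Part one estimates the probability that the outcome is non-zero, part two estimates the mean of the non-zero outcomes, and the prediction is their product. RMSE and MAE are computed as in the abstract.

```python
import math


def fit_two_part(y):
    """Intercept-only two-part fit: P(y > 0) and the mean of y given y > 0."""
    positives = [v for v in y if v > 0]
    p_nonzero = len(positives) / len(y)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return p_nonzero, mean_positive


def predict_expected(p_nonzero, mean_positive):
    """Expected outcome: probability of a non-zero value times its mean size."""
    return p_nonzero * mean_positive


def rmse(y, prediction):
    return math.sqrt(sum((v - prediction) ** 2 for v in y) / len(y))


def mae(y, prediction):
    return sum(abs(v - prediction) for v in y) / len(y)


# Made-up, highly skewed counts with many zeros.
counts = [0, 0, 0, 0, 0, 0, 2, 5, 13, 80]
p_nonzero, mean_positive = fit_two_part(counts)
prediction = predict_expected(p_nonzero, mean_positive)
print(p_nonzero, mean_positive, prediction)
print(round(rmse(counts, prediction), 2), round(mae(counts, prediction), 2))
```

With covariates, each part would be replaced by a fitted model, but the way the two parts are combined stays the same.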
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Figure SPM.5 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).
Figure SPM.5 shows changes in annual mean surface temperatures, precipitation, and total column soil moisture.
How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:
IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.
Figure subpanels
The figure has four panels with 11 maps. All data is provided, except for panel a1.
List of data provided
This dataset contains:
The data is given for global warming levels (GWLs), namely +1.0°C (temperature only), +1.5°C, +2.0°C, and +4.0°C.
Data provided in relation to figure
Panel a:
- Data file: Panel_a2_Simulated_temperature_change_at_1C.nc, simulated annual mean temperature change (°C) at 1°C global warming relative to 1850-1900 (right).
Panel b:
- Data file: Panel_b1_Simulated_temperature_change_at_1_5C.nc, simulated annual mean temperature change (°C) at 1.5°C global warming relative to 1850-1900 (left).
- Data file: Panel_b2_Simulated_temperature_change_at_2C.nc, simulated annual mean temperature change (°C) at 2.0°C global warming relative to 1850-1900 (center).
- Data file: Panel_b3_Simulated_temperature_change_at_4C.nc, simulated annual mean temperature change (°C) at 4.0°C global warming relative to 1850-1900 (right).
Panel c:
- Data file: Panel_c1_Simulated_precipitation_change_at_1_5C.nc, simulated annual mean precipitation change (%) at 1.5°C global warming relative to 1850-1900 (left).
- Data file: Panel_c2_Simulated_precipitation_change_at_2C.nc, simulated annual mean precipitation change (%) at 2.0°C global warming relative to 1850-1900 (center).
- Data file: Panel_c3_Simulated_precipitation_change_at_4C.nc, simulated annual mean precipitation change (%) at 4.0°C global warming relative to 1850-1900 (right).
Panel d:
- Data file: Figure_SPM5_d1_cmip6_SM_tot_change_at_1_5C.nc, simulated annual mean total column soil moisture change (standard deviation) at 1.5°C global warming relative to 1850-1900 (left).
- Data file: Figure_SPM5_d2_cmip6_SM_tot_change_at_2C.nc, simulated annual mean total column soil moisture change (standard deviation) at 2.0°C global warming relative to 1850-1900 (center).
- Data file: Figure_SPM5_d3_cmip6_SM_tot_change_at_4C.nc, simulated annual mean total column soil moisture change (standard deviation) at 4.0°C global warming relative to 1850-1900 (right).
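Since the file names do not follow a single pattern (panels a to c use a `Panel_` prefix, panel d a `Figure_SPM5_` prefix), a small lookup table can help when scripting against the data. The Python sketch below simply restates the files listed above; the variable keys (`"temperature"`, `"precipitation"`, `"soil_moisture"`) and the function names are illustrative choices, not part of the dataset.

```python
# (variable, global warming level in °C) -> data file, as listed above.
FILES = {
    ("temperature", "1.0"): "Panel_a2_Simulated_temperature_change_at_1C.nc",
    ("temperature", "1.5"): "Panel_b1_Simulated_temperature_change_at_1_5C.nc",
    ("temperature", "2.0"): "Panel_b2_Simulated_temperature_change_at_2C.nc",
    ("temperature", "4.0"): "Panel_b3_Simulated_temperature_change_at_4C.nc",
    ("precipitation", "1.5"): "Panel_c1_Simulated_precipitation_change_at_1_5C.nc",
    ("precipitation", "2.0"): "Panel_c2_Simulated_precipitation_change_at_2C.nc",
    ("precipitation", "4.0"): "Panel_c3_Simulated_precipitation_change_at_4C.nc",
    ("soil_moisture", "1.5"): "Figure_SPM5_d1_cmip6_SM_tot_change_at_1_5C.nc",
    ("soil_moisture", "2.0"): "Figure_SPM5_d2_cmip6_SM_tot_change_at_2C.nc",
    ("soil_moisture", "4.0"): "Figure_SPM5_d3_cmip6_SM_tot_change_at_4C.nc",
}


def data_file(variable, warming_level):
    """Return the file name for one variable at one global warming level."""
    return FILES[(variable, warming_level)]


def variables_at(warming_level):
    """List the variables available at a given global warming level."""
    return sorted(var for var, level in FILES if level == warming_level)


print(data_file("soil_moisture", "2.0"))
print(variables_at("1.0"))  # temperature only
```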
Sources of additional information
The following weblink is provided in the Related Documents section of this catalogue record:
The Met Office Hadley Centre's mean sea level pressure (MSLP) data set, HadSLP2, is a unique combination of monthly globally-complete fields of land and marine pressure observations on a 5 degree latitude-longitude grid from 1850 to 2004. This product is also available in an updated form using NCEP/NCAR reanalysis fields, giving the near real time product, HadSLP2r.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This comprehensive dataset explores the relationship between housing and weather conditions across North America in 2012. Through a range of climate variables such as temperature, wind speed, humidity, pressure and visibility, it provides insights into the weather-influenced environment of numerous regions. Housing parameters such as longitude, latitude, median income, median house value and ocean proximity further enhance our understanding of how distinct climates play a part in area real estate valuations. Analyzing these two data sets together helps in understanding what factors can dictate the value and comfort level offered by residential areas throughout North America.
This dataset offers plenty of insights into the effects of weather and housing on North American regions. To explore these relationships, you can perform data analysis on the variables provided.
First, start by examining descriptive statistics (e.g., mean, median, mode). These show the general trend and distribution of each variable in this dataset. For example, what is the most common temperature in a given region? What is the average wind speed? How does this vary across different regions? By looking at descriptive statistics, you can get an initial idea of how various weather conditions and housing attributes interact with one another.
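As a minimal sketch of this first step, the Python standard library is enough to summarise a column. The values below are made up; in practice they would be read from Weather.csv, and the column names `Temp_C` and `Wind Speed_km/h` are taken from the column listing for that file.

```python
from statistics import mean, median, mode

# Made-up observations, for illustration only.
columns = {
    "Temp_C": [-1.8, -1.5, -1.5, 0.2, 3.4],
    "Wind Speed_km/h": [4, 7, 7, 9, 13],
}


def describe(values):
    """Mean, median and most common value of one numeric column."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


summary = {name: describe(values) for name, values in columns.items()}
for name, stats in summary.items():
    print(name, stats)
```

For continuous measurements the mode is only informative when values repeat, for example after rounding.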
Next, explore correlations between variables. Are certain weather variables correlated with specific housing attributes? Is there a link between wind speeds and median house value? Or between humidity and ocean proximity? Analyzing correlations allows for deeper insights into how different aspects may influence one another for a given region or area. These correlations may also inform broader patterns that are present across multiple North American regions or countries.
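One common way to quantify such links is the Pearson correlation coefficient. The sketch below computes it by hand on made-up numbers; the pairing of temperature with relative humidity is only an example, and a correlation on its own says nothing about which variable, if either, drives the other.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Made-up values, for illustration only.
temp_c = [1.0, 2.0, 3.0, 4.0]
rel_hum = [80.0, 60.0, 40.0, 20.0]  # falls steadily as temperature rises
print(pearson(temp_c, rel_hum))

noisy = [2.0, 1.0, 4.0, 3.0]        # rises overall, but not monotonically
print(pearson(temp_c, noisy))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.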
Finally, use visualizations to further investigate the relationship between climate and housing attributes in North America in 2012. Graphs make trends such as seasonal variations or changes over time easier to see, which helps when interpreting large amounts of data quickly and provides context beyond what the numbers alone can convey about relationships within this dataset.
- Analyzing the effect of climate change on housing markets across North America. By looking at temperature and weather trends in combination with housing values, researchers can better understand how climate change may be impacting certain regions differently than others.
- Investigating the relationship between median income, house values and ocean proximity in coastal areas. Understanding how ocean proximity plays into housing prices may help inform real estate investment decisions and urban planning initiatives related to coastal development.
- Utilizing differences in weather patterns across different climates to determine optimal seasonal rental prices for property owners. By analyzing changes in temperature, wind speed, humidity, pressure and visibility from season to season, an investor could gain insights into seasonal market trends to maximize profits from rentals or Airbnb listings over time.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: Weather.csv
| Column name | Description |
|:---------------------|:-----------------------------------------------|
| Date/Time | Date and time of the observation. (Date/Time) |
| Temp_C | Temperature in Celsius. (Numeric) |
| Dew Point Temp_C | Dew point temperature in Celsius. (Numeric) |
| Rel Hum_% | Relative humidity in percent. (Numeric) |
| Wind Speed_km/h | Wind speed in kilometers per hour. (Numeric) |
| Visibility_km | Visibilit...