License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is taken from the large National Health and Nutrition Examination Survey (NHANES) conducted by the National Center for Health Statistics. From the original Body Measures data set (P_BMX), we selected only adult observations: Height (BMXHT), Weight (BMXWT) and Body Mass Index (BMXBMI). To select only adults, the BMDBMIC variable was used, since it is defined only for children aged 2 to 19 years.
The full materials are freely available on the agency's website. Use of the Materials, including any links to Materials on the CDC, ATSDR, or HHS Web Sites, does not imply endorsement by the CDC, ATSDR, HHS, or the US Government of you, your company, product, facility, service, or enterprise.
Columns contain data for males and females aged 20 to 150 years: Weight (kg), Standing Height (cm), BMI (kg/m²).
Detailed description: https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/P_BMX.htm
This dataset allows you to study the relationship between height and weight.
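As a quick sanity check on the BMXBMI column, BMI can be recomputed from the other two columns with the standard definition BMI = weight (kg) / height (m)². The sketch below assumes a row is available as a Python dict keyed by the NHANES variable names; the numeric values are made up for illustration.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and standing height (cm)."""
    height_m = height_cm / 100.0  # convert cm to m
    return weight_kg / (height_m ** 2)

# Hypothetical row using the NHANES column names BMXWT, BMXHT, BMXBMI
row = {"BMXWT": 80.0, "BMXHT": 180.0, "BMXBMI": 24.7}
computed = bmi(row["BMXWT"], row["BMXHT"])
print(round(computed, 1))  # 24.7
# BMXBMI is reported to one decimal place, so compare with a small tolerance
print(abs(computed - row["BMXBMI"]) < 0.05)  # True
```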
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Age, height and weight raw data
License: Norge digitalt license (https://www.geonorge.no/Geodataarbeid/Norge-digitalt/Avtaler-og-maler/Norge-digitalt-lisens/)
Elevation data provide a detailed description of terrain and surface. The data are produced as point clouds from airborne laser scanning or from image matching of aerial photographs. The point clouds have varying density (points/m²) for different purposes. From the point clouds, gridded elevation models of the terrain (DTM) and of the surface (DOM) are generated.
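A rough illustration of how a point cloud becomes gridded elevation models: bin the points into square cells, keep the highest point per cell as a surface value and the lowest as a crude terrain value. This is a simplification, not the producer's method; operational DTMs are built from points classified as ground and interpolated across gaps. The points and cell size below are hypothetical, and coordinates are assumed to be in metres.

```python
import math

def grid_surface_and_terrain(points, cell_size):
    """Bin (x, y, z) points into square cells of `cell_size` metres.

    Returns two dicts keyed by (column, row) cell index:
    the highest z per cell (a crude surface model, like DOM) and
    the lowest z per cell (a rough stand-in for the terrain, like DTM).
    """
    surface = {}
    terrain = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        surface[cell] = max(z, surface.get(cell, -math.inf))
        terrain[cell] = min(z, terrain.get(cell, math.inf))
    return surface, terrain

# Hypothetical points: a tree top and the ground in one cell, bare ground in the next
points = [(0.2, 0.3, 101.0), (0.7, 0.1, 115.5), (1.4, 0.5, 100.2), (1.8, 0.9, 100.4)]
surface, terrain = grid_surface_and_terrain(points, cell_size=1.0)
# Height above ground per cell = surface minus terrain
heights = {cell: round(surface[cell] - terrain[cell], 2) for cell in surface}
print(heights)  # {(0, 0): 14.5, (1, 0): 0.2}
```

The per-cell difference between the two models approximates the height of objects such as vegetation or buildings above the ground.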
This is the famous Galton data on the heights of parents and their children (the data from which the term "regression" comes). The data are in the public domain and can be used for teaching or any other purpose. They are archived here in part to explore how Dataverse works.
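"Regression" refers to Galton's observation that children of unusually tall or short parents tend to be closer to the average than their parents. In an ordinary least-squares fit of child height on mid-parent height, this shows up as a slope below 1. The sketch uses made-up numbers, not values from this dataset.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (mid-parent height, child height) pairs in inches, not Galton's values
parents = [64, 66, 68, 70, 72]
children = [66, 67, 68, 69, 70]
intercept, slope = least_squares_line(parents, children)
print(intercept, slope)  # 34.0 0.5
```

In this toy example, a child of parents 4 inches above the mean is predicted to be only 2 inches above it.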
Monthly and annual summaries of global sea surface height anomalies above the mean sea surface, averaged from raw data. The raw data are available from Oct. 2, 1992 to May 14, 2016, at 5-day intervals, with a spatial resolution of 0.17 degrees (latitude) × 0.17 degrees (longitude). The dataset contains the fully corrected heights, in meters.
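The description does not say how the averaging is weighted. This sketch assumes a plain arithmetic mean of all 5-day values whose date falls in each calendar month, for a single grid cell with hypothetical anomaly values.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

def monthly_means(samples):
    """Average (date, value) pairs into {(year, month): mean value}."""
    groups = defaultdict(list)
    for day, value in samples:
        groups[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}

# Hypothetical 5-day anomalies (metres) for one grid cell,
# starting on the first raw-data date, Oct. 2, 1992.
start = date(1992, 10, 2)
anomalies = [0.01, 0.03, 0.02, 0.04, 0.00, 0.02, 0.05]
samples = [(start + timedelta(days=5 * i), a) for i, a in enumerate(anomalies)]
monthly = monthly_means(samples)
print(sorted(monthly))  # [(1992, 10), (1992, 11)]
```

An annual summary can be built the same way by grouping on the year alone.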
This dataset was created by Shameer Rao.
This dataset provides global rasters of relative height metrics for vegetation from Global Ecosystem Dynamics Investigation (GEDI) L2A data and Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) L3A ATL08 data at 100-, 200-, 500-, and 1000-m spatial resolutions. The metrics include the relative heights RH98, RH90, RH75, and RH50, that is, the heights relative to the ground at which the 98th, 90th, 75th, and 50th percentiles of returned energy, respectively, are reached. These metrics provide measures of vegetation canopy height and structure. The different relative height metrics were intercalibrated over the overlap area (50 to 52 degrees N). GEDI data were collected from 2019 to 2022, and ICESat-2 data from 2019 to 2021. The data are provided in cloud-optimized GeoTIFF format.
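The relative height definition above can be sketched on a discretized waveform. This assumes the waveform is already given as bins with heights referenced to the ground and that energy is accumulated upward from the lowest bin; real GEDI and ICESat-2 processing also detects the ground and interpolates between bins. The waveform values are hypothetical.

```python
def relative_heights(heights, energies, percentiles=(50, 75, 90, 98)):
    """Height above ground at which cumulative returned energy first reaches p%.

    heights:  bin heights above the ground (m), sorted from lowest to highest.
    energies: returned energy per bin, same length, non-negative.
    Energy is accumulated upward from the lowest bin.
    """
    total = sum(energies)
    targets = sorted(percentiles)
    result = {}
    cumulative = 0.0
    i = 0
    for h, e in zip(heights, energies):
        cumulative += e
        # Compare cumulative/total >= p/100 without dividing (avoids float noise)
        while i < len(targets) and cumulative * 100 >= targets[i] * total:
            result[targets[i]] = h
            i += 1
    return result

# Hypothetical waveform: strong ground return at 0 m, canopy returns above it
heights = [0, 5, 10, 15, 20, 25]
energies = [40, 5, 10, 20, 20, 5]
print(relative_heights(heights, energies))  # {50: 10, 75: 15, 90: 20, 98: 25}
```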
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
These are height measurements of individually known female African elephants in the Samburu and Buffalo Springs National Reserves. These data were used in the manuscript "Orphaning stunts growth in African Elephants", currently under review. The first Excel worksheet, titled "GW.growth.curve", gives the median of the height measurements taken by author George Wittemyer from each elephant on a single date. These medians were used to create a von Bertalanffy growth curve on which we structured the Bayesian analysis that addressed our main hypotheses. The second worksheet, titled "All.data", shows all measurements taken by either author. The third worksheet summarizes author Jenna Parker's measurements, including which individuals were excluded from the main analysis because we are unsure of their exact birthdates.
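The von Bertalanffy growth curve is commonly written as H(t) = H∞(1 − e^(−k(t − t0))), where H∞ is the asymptotic height, k a growth-rate constant and t0 the theoretical age at zero height. The parameterization used in the manuscript is not given here, so this is a sketch of the standard form with made-up parameter values.

```python
import math

def von_bertalanffy(age, h_inf, k, t0):
    """Von Bertalanffy growth curve: H(age) = h_inf * (1 - exp(-k * (age - t0))).

    h_inf: asymptotic height, k: growth-rate constant (1/year),
    t0: theoretical age at which height would be zero.
    """
    return h_inf * (1.0 - math.exp(-k * (age - t0)))

# Hypothetical parameters (cm, 1/year, years); not the manuscript's estimates
for age in (0, 5, 10, 20, 40):
    print(age, round(von_bertalanffy(age, h_inf=250.0, k=0.2, t0=-3.0), 1))
```

Because the curve is a function of age, uncertainty in an individual's birthdate translates directly into uncertainty in its expected height.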
License: Creative Commons Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
We collected data on tree height and diameter at breast height for 26,259 individuals of 102 species in Japan from 86 published sources. We hope that this dataset will be widely used for forest management and research.
The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth's carbon cycle and biodiversity. The GEDI instrument produces high-resolution laser ranging observations of the three-dimensional structure of the Earth. GEDI is attached to the International Space Station (ISS) and collects data globally between 51.6° N and 51.6° S latitudes at the highest resolution and densest sampling of any light detection and ranging (lidar) instrument in orbit to date. Each GEDI Version 2 granule encompasses one-fourth of an ISS orbit and includes georeferenced metadata to allow for spatial querying and subsetting.
The GEDI instrument was removed from the ISS and placed into storage on March 17, 2023. No data were acquired during the hibernation period from March 17, 2023, to April 24, 2024. GEDI has since been reinstalled on the ISS and resumed operations as of April 26, 2024.
The purpose of the GEDI Level 2A Geolocated Elevation and Height Metrics product (GEDI02_A) is to provide waveform interpretation and extracted products from each GEDI01_B received waveform, including ground elevation, canopy top height, and relative height (RH) metrics. The methodology for generating the GEDI02_A product datasets is adapted from the Land, Vegetation, and Ice Sensor (LVIS) algorithm. The GEDI02_A product is provided in HDF5 format and has a spatial resolution (average footprint) of 25 meters.
The GEDI02_A data product contains 156 layers for each of the eight beams, including ground elevation, canopy top height, relative return energy metrics (e.g., canopy vertical structure), and many other interpreted products from the return waveforms.
Additional information for the layers can be found in the GEDI Level 2A Dictionary.
Known issues:
- Data acquisition gaps: GEDI data acquisitions were suspended on December 19, 2019 (2019 Day 353) and resumed on January 8, 2020 (2020 Day 8).
- Incorrect Reference Ground Track (RGT) number in the filename for select GEDI files: GEDI Science Data Products for six orbits on August 7, 2020, and November 12, 2021, had the incorrect RGT number in the filename. There is no impact on the science data, but users should reference this document for the correct RGT numbers.
Section 8 of the User Guide provides additional information on known issues.
Improvements/changes from previous versions:
- Metadata has been updated to include spatial coordinates.
- Granule size has been reduced from one full ISS orbit (~5.83 GB) to four segments per orbit (~1.48 GB).
- The filename has been updated to include the segment number and version number.
- Improved geolocation for an orbital segment.
- Added elevation from the SRTM digital elevation model for comparison.
- Modified the method to predict an optimum algorithm setting group per laser shot.
- Added additional land cover datasets related to phenology, urban infrastructure, and water persistence.
- Added the selected_mode_flag dataset to the root beam group using the selected algorithm.
- Removed shots when the laser is not firing.
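The known-issues note pairs calendar dates with day-of-year numbers (e.g., 2019 Day 353). A small sketch for converting a year and 1-based day of year to a calendar date, which reproduces both pairs quoted in the note:

```python
from datetime import date, timedelta

def from_day_of_year(year, doy):
    """Convert a year and 1-based day of year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

print(from_day_of_year(2019, 353))  # 2019-12-19
print(from_day_of_year(2020, 8))    # 2020-01-08
```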
The BOREAS AFM-06 team from the National Oceanic and Atmospheric Administration Environmental Technology Laboratory (NOAA/ETL) operated a 915-MHz wind profiler with a Radio Acoustic Sounding System (RASS) in the Southern Study Area (SSA) near the Old Jack Pine (OJP) site. This data set provides boundary layer height information over the site. The data were collected from 21-May-1994 to 20-Sep-1994.
The SWOT Level 2 KaRIn Low Rate Sea Surface Height Basic Data Product from the Surface Water and Ocean Topography (SWOT) mission provides global sea surface height and significant wave height observations derived from low rate (LR) measurements of the Ka-band Radar Interferometer (KaRIn). SWOT launched on December 16, 2022, from Vandenberg Space Force Base in California into a 1-day repeat orbit for the "calibration" or "fast-sampling" phase of the mission, which was completed in early July 2023. After the calibration phase, SWOT entered a 21-day repeat orbit in August 2023 to start the "science" phase of the mission, which is expected to continue through 2025. The L2 sea surface height data product is distributed as one netCDF-4 file per pass (half-orbit) covering the full KaRIn swath width, which spans 10 to 60 km on each side of the nadir track. Sea surface height, sea surface height anomaly, wind speed, significant wave height, and related parameters are provided on a geographically fixed, swath-aligned 2 km × 2 km grid (Basic, Expert, Windwave). The sea surface height data are also provided on a finer 250 m × 250 m "native" grid with minimal smoothing applied (Unsmoothed).
Please note that this collection contains SWOT Version C science data products. This collection is a sub-collection of its parent (https://podaac.jpl.nasa.gov/dataset/SWOT_L2_LR_SSH_2.0) and provides the "Basic" file from each L2 SSH product, which contains a limited set of variables and is aimed at the general user.
License: https://data.gov.tw/license
Provides average height data for students aged 6 to 15.
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Introduction
This dataset contains all buildings in Germany with their footprint polygon and height. It is a partial dump of the ETHOS.BUILDA database (version v7_20240429). ETHOS.BUILDA is a database containing building-level data for the German building stock. It is based on various data sources that are combined and enriched with machine learning approaches to generate one consistent and complete building dataset.
ETHOS.BUILDA is made available under the Open Database License (ODbL). The licenses of the contents of the database depend on the data source. The sources of the building attributes and information on the type of processing that was done to assign the information from the raw data to the building in ETHOS.BUILDA are provided for each individual data point.
Data structure and file overview
Building data are provided per federal state; the files are named according to the NUTS-1 region names. The building data have the following fields:
| field name | description |
|---|---|
| ID | unique identifier of the building |
| source | the source of the building footprint |
| footprint | footprint polygon in WKT format, EPSG:3035 |
| height_m | value: height of the building in m; source: source of the height data; lineage: height assignment method |
The abbreviations used in the "source" and "lineage" entries of individual data points are mapped to their descriptions in sources.csv and lineages.csv. There is no entry for the source "v7_model.json" in sources.csv, as this refers to the internally trained machine learning model and not to an external dataset.
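Because footprints are WKT polygons in EPSG:3035 (ETRS89 / LAEA Europe, an equal-area projection in metres), footprint area can be computed directly in planar coordinates and combined with the height value, for example to approximate a building's volume as footprint area × height. How the value/source/lineage parts of height_m are laid out in the files is not specified here, so this sketch takes the height as a plain number; the footprint and height are hypothetical.

```python
import re

def polygon_area_m2(wkt):
    """Exterior-ring area of a 2D WKT POLYGON via the shoelace formula.

    Only 'POLYGON ((x y, ...))' is handled; interior rings (holes) and
    MULTIPOLYGONs are not. Coordinates must be in a projected CRS in
    metres, such as EPSG:3035, for the result to be in square metres.
    """
    match = re.search(r"POLYGON\s*\(\(([^)]*)\)", wkt)
    if match is None:
        raise ValueError("expected a WKT POLYGON")
    coords = [tuple(float(v) for v in pair.split()) for pair in match.group(1).split(",")]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical 10 m x 20 m footprint and a hypothetical height_m value
footprint = ("POLYGON ((4321000 3210000, 4321010 3210000, 4321010 3210020, "
             "4321000 3210020, 4321000 3210000))")
height_m = 9.5
area = polygon_area_m2(footprint)
print(area, area * height_m)  # 200.0 1900.0
```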
Acknowledgements
This work was supported by the Helmholtz Association under the program "Energy System Design".
License: U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
This dataset is a categorical mapping of estimated mean building heights, by Census block group, in shapefile format for the conterminous United States. The data were derived from the NASA Shuttle Radar Topography Mission, which collected “first return” (top of canopy and buildings) radar data at 30-m resolution in February 2000 aboard the Space Shuttle Endeavour. These data were processed to estimate building heights nationally and then aggregated to block group boundaries. The block groups were then categorized into six classes, ranging from “Low” to “Very High”, based on breakpoints derived from the mean and standard deviation of the data. The data were evaluated in several ways, including comparison with a reference dataset of 85,000 buildings in the city of San Francisco, both for accuracy assessment and to provide contextual definitions for the categories.
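The exact breakpoints are not given. This sketch assumes five breakpoints at the mean plus −1, −0.5, 0, +0.5 and +1 population standard deviations, which yields six ordered classes numbered 1 (lowest, "Low") to 6 (highest, "Very High"). The multiples and the input values are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def classify_by_sd(values, multiples=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Assign each value a class 1..len(multiples)+1 using mean + k*sd breakpoints.

    A value equal to a breakpoint falls into the higher class.
    """
    m = mean(values)
    sd = pstdev(values)
    breaks = [m + k * sd for k in multiples]
    return [bisect_right(breaks, v) + 1 for v in values]

# Hypothetical block-group mean heights (m); mean 5, population SD 2
print(classify_by_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # [1, 3, 3, 3, 4, 4, 6, 6]
```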
No description is available. Visit https://dataone.org/datasets/farshid25.54.1 for complete metadata about this dataset.
License: Creative Commons Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Models were fit using auxiliary information that included lidar data from 20 acquisitions in Oregon and climate data. Measurements in plots of the Forest Inventory and Analysis program (FIA) were used to obtain plot-level ground observations for predictive modeling. Tree and transect measurements in FIA plots were used to obtain plot-level values of AGB and DWB, respectively. To obtain plot-level values of CBD, CH, CBH and CFL, tree measurements in FIA plots were processed with FuelCalc. Plot-level auxiliary variables were obtained by intersecting the auxiliary information layers with the FIA plots. The predictive models were random forest models to which a parametric component was added to model the error variance; the error variance was modeled as a power function of the predicted value. A separate model was fit for each variable, and the resulting models were used to obtain maps of synthetic predictions for all areas covered by the 20 lidar acquisitions. The modeled error variance was used to generate uncertainty maps for the predictions of each response variable. Model accuracy was assessed globally (for the entire dataset) and separately for each of the 20 lidar acquisitions included in the dataset.
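If the error variance is a power function of the prediction, Var(e) = a·ŷ^b, then the standard deviation mapped for each pixel is √(a·ŷ^b). The coefficients a and b are fit separately for each variable and are not given here, so the values below are hypothetical.

```python
import math

def error_sd(prediction, a, b):
    """Standard deviation of model error when Var(error) = a * prediction**b."""
    if prediction < 0:
        raise ValueError("the power variance model assumes non-negative predictions")
    return math.sqrt(a * prediction ** b)

# Hypothetical coefficients applied to a hypothetical AGB prediction of 100 Mg/ha
print(error_sd(100.0, a=4.0, b=1.0))  # 20.0
```

Applying this function to every pixel of a prediction map yields a matching map of error standard deviations in the same units.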
Results from the accuracy assessment can be found in Appendix A and Appendix B of Mauro et al. (2021).
Each variable has two associated maps. These maps are named using the following convention where VARIABLE is the acronym for each variable (AGB, DWB, CBD, CH, CBH or CFL):
There are two additional rasters. The first, year.tif, is needed to obtain the reference year of each lidar acquisition. The second, forest_mask.tif, provides a forest vs. non-forest mask: forested areas are coded as 1 and non-forested areas as NoData. This mask is a resampled subset of the PALSAR JAXA 2014 ‘New global 25m-resolution PALSAR mosaic and forest/non-forest map (2007-2010) - version 1’ from the Japan Aerospace Exploration Agency Earth Observation Research Center (www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm). Its reference year is 2009. Models to predict forest attributes were created using ground observations in forested areas, so for many applications it is advisable to use the provided mask to exclude non-forested areas from analyses. This can be done, for example, by multiplying the desired raster by the forest mask. Exceptions may occur in relatively open forested lands where the mask eliminates areas that actually sustain forest; in those areas, an ad hoc forest mask might be more appropriate.
Reference year: year.tif
Forest mask: forest_mask.tif
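Multiplying a raster by the 1/NoData forest mask keeps forested values unchanged and turns non-forested cells into NoData. A minimal sketch, assuming both rasters are already aligned on the same grid and held as nested lists with `None` standing in for NoData; in practice a raster library would read the GeoTIFFs.

```python
def apply_forest_mask(values, mask, nodata=None):
    """Multiply a raster by a 1/NoData forest mask, cell by cell.

    Rasters are nested lists of equal shape; NoData is represented by `nodata`.
    Cells where either input is NoData become NoData.
    """
    out = []
    for row_v, row_m in zip(values, mask):
        out.append([
            nodata if (v == nodata or m == nodata) else v * m
            for v, m in zip(row_v, row_m)
        ])
    return out

# Hypothetical AGB values (Mg/ha) and forest mask on a 2 x 2 grid
agb = [[120.0, 85.5], [40.0, 10.0]]
forest = [[1, None], [1, None]]
print(apply_forest_mask(agb, forest))  # [[120.0, None], [40.0, None]]
```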
UNITS:
For a given variable, both predictions and standard deviation of model errors have the same units. These units are:
Variable (Abbreviation): Units
Above ground biomass (AGB): Mg/ha
Downed wood biomass (DWB): Mg/ha
Canopy bulk density (CBD): kg/m³ (kilograms per cubic meter)
Canopy height (CH): m
Canopy base height (CBH): m
Canopy fuel load (CFL): Mg/ha
COORDINATE REFERENCE SYSTEM:
The reference system for all maps is EPSG:5070 (NAD83 / Conus Albers).
USAGE
These data are made freely available to the public and the scientific community in the belief that their wide dissemination will lead to greater understanding and new scientific insights.
Please include the following citation in any publication that uses these data:
Mauro, F., Hudak, A.T., Fekety, P.A., Frank, B., Temesgen, H., Bell, D.M., Gregory, M.J., McCarley, T.R., 2021. Regional Modeling of Forest Fuels and Structural Attributes Using Airborne Laser Scanning Data in Oregon. Remote Sensing 13. https://doi.org/10.3390/rs13020261
License: Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
NBA data from 1996 to 2024 covering players' physical attributes, biographical information, basic and advanced stats, and positions.
There are no missing values, but some preprocessing may be needed depending on the task.
The data were gathered from nba.com and Basketball Reference, starting with the 1996/97 season and running through the latest season, 2023/24.
The data offer many options for EDA and ML: analyzing how physical attributes change by position, how the number of 3-point shots changed over the years, and how the number of foreign players increased; or using machine learning to predict a player's points, rebounds and assists, predict a player's position, cluster players, etc.
The issue with the data was that player height and weight were available only in the imperial system (whole inches and pounds), so a scatterplot of heights and weights looked poor: there were only about 20 distinct height values and about 150 distinct weight values, which is very coarse for a dataset of 13,000 players. I created a script that assigns each player a random height between the two bounding values (for example, between 200.66 cm and 203.2 cm, which correspond to 6-7 and 6-8 in the imperial system), but in such a way that 80% of the values fall in the range of a 5 to 35% increase, which still preserves the integrity of the data (the average height of the whole dataset increased by less than 1 cm). I did the same for weight: since consecutive whole-pound values differ by about 0.45 kg, I assigned each player a random weight within ±0.22 kg of his original weight. This changed the average weight of the whole dataset by about 0.09 kg, which is negligible.
Unfortunately, the NBA doesn't provide the data in cm and kg. Although this approach is not perfect in terms of accuracy, it is still much better than having only 20 distinct heights in a dataset of 13,000 players.
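The unit conversion and jitter idea can be sketched as follows. The author's exact scheme (80% of values within a 5 to 35% increase) is not fully specified, so this sketch substitutes a plain uniform spread over the inch above the recorded height; the ±0.22 kg spread for weight follows the description. Function names and the seed are illustrative.

```python
import random

CM_PER_INCH = 2.54
KG_PER_POUND = 0.45359237  # exact definition of the international pound

def height_cm(feet, inches):
    """Convert a feet-inches height (e.g. 6-7) to centimetres."""
    return (12 * feet + inches) * CM_PER_INCH

def jittered_height_cm(feet, inches, rng):
    """Spread a whole-inch height uniformly over the following inch (in cm).

    Simplification: the author's skewed distribution is not reproduced.
    """
    return height_cm(feet, inches) + rng.random() * CM_PER_INCH

def jittered_weight_kg(pounds, rng, spread=0.22):
    """Convert pounds to kg and add uniform noise within +/- spread kg."""
    return pounds * KG_PER_POUND + rng.uniform(-spread, spread)

rng = random.Random(42)  # fixed seed for reproducibility
print(round(height_cm(6, 7), 2), round(height_cm(6, 8), 2))  # 200.66 203.2
print(jittered_height_cm(6, 7, rng), jittered_weight_kg(220, rng))
```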
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains three files with the heights of high school students.
All heights are in inches.
The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.
| Height Statistics (inches) | Boys | Girls |
|---|---|---|
| Mean | 67 | 62 |
| Standard Deviation | 2.9 | 2.2 |
There are 500 measurements for each gender.
Here are the datasets:
hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which of the values are for boys and which ones are for girls.
hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.
hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
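The three layouts above can be reproduced with a short simulation. This is an independent sketch, not the author's script (which is linked below); it draws 500 heights per group from normal distributions with the means and standard deviations in the table, and the column header names and seeds are assumptions.

```python
import csv
import io
import random

def simulate_heights(n_per_group=500, seed=0):
    """Draw boys' and girls' heights (inches) from normal distributions."""
    rng = random.Random(seed)
    boys = [rng.gauss(67, 2.9) for _ in range(n_per_group)]
    girls = [rng.gauss(62, 2.2) for _ in range(n_per_group)]
    return boys, girls

def to_csv(rows, header):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

boys, girls = simulate_heights()
# hs_heights.csv layout: one unlabelled column, boys and girls mixed
mixed = boys + girls
random.Random(1).shuffle(mixed)
single = to_csv([[h] for h in mixed], ["height"])
# hs_heights_pair.csv layout: boys in column 1, girls in column 2
pair = to_csv(zip(boys, girls), ["boys", "girls"])
# hs_heights_flag.csv layout: is_girl flag, then the height
flag = to_csv([[0, h] for h in boys] + [[1, h] for h in girls], ["is_girl", "height"])
```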
To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights
Image by Gillian Callison from Pixabay
Gage height values from U.S. Geological Survey (USGS) streamflow-gaging station 09152500 for the specified period are provided in comma-separated values (CSV) format. Values cover February 1 through September 30 of each year from 2016 through 2019.