Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for WORLD reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This simulated dataset is a corrupted segment from the Social Security Death Master File (SSDMF) available at https://ssdmf.info/. There are 11 original datasets, dsxo, where x runs from 1 to 11 and the suffix o stands for original. The sizes (number of original records) of these datasets are as follows:

| dataset | size |
|:-------:|:----:|
| ds1o | 10K |
| ds2o | 20K |
| ds3o | 40K |
| ds4o | 80K |
| ds5o | 120K |
| ds6o | 160K |
| ds7o | 200K |
| ds8o | 400K |
| ds9o | 600K |
| ds10o | 800K |
| ds11o | 1M |

These original records are then corrupted via a modified version of the dsgen Python script by Peter Christen. The modified/corrupted files are saved as dsxm, where the suffix m stands for modified.

The modified records plus four original replicates are concatenated and mixed up (by the Linux command tool shuf). The resultant datasets are named dsx.0 (dsx.1) before (after) shuffling. So, the sizes of these datasets are as follows:

| dataset | size |
|:-------:|:----:|
| ds1.1 | 50K |
| ds2.1 | 100K |
| ds3.1 | 200K |
| ds4.1 | 400K |
| ds5.1 | 600K |
| ds6.1 | 800K |
| ds7.1 | 1M |
| ds8.1 | 2M |
| ds9.1 | 3M |
| ds10.1 | 4M |
| ds11.1 | 5M |

Furthermore, each dataset is split into two halves to serve as input for record linkage algorithms. For example, ds1.1 is split into ds1.1.1 & ds1.1.2.
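The construction described above (modified records plus four original replicates, shuffled, then halved) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_linkage_inputs`, the toy records, and the use of Python's `random` module in place of the `shuf` tool are all hypothetical and not taken from the dataset authors' scripts.

```python
import random

def build_linkage_inputs(original, modified, replicates=4, seed=0):
    """Illustrative only: mimic the dsx.0 -> dsx.1 -> two-halves pipeline."""
    # dsx.0: modified records followed by `replicates` copies of the originals
    combined = list(modified) + list(original) * replicates
    # dsx.1: shuffled copy (random.shuffle stands in for the Linux `shuf` tool)
    shuffled = combined[:]
    random.Random(seed).shuffle(shuffled)
    # dsx.1.1 and dsx.1.2: the two halves used as record-linkage inputs
    mid = len(shuffled) // 2
    return combined, shuffled, shuffled[:mid], shuffled[mid:]

# Toy stand-ins for ds1o and ds1m (10 records each instead of 10K)
original = [f"rec{i}" for i in range(10)]
modified = [f"rec{i}~" for i in range(10)]

ds0, ds1, half1, half2 = build_linkage_inputs(original, modified)
print(len(ds0), len(half1), len(half2))  # 50 25 25
```

With n original records, the combined file has n + 4n = 5n records, which matches the fivefold sizes in the second table (for example, 10K becomes 50K).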
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: country name added.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
This data set contains the Magellan FMAP, a full-resolution (75 meters/pixel) global mosaic produced by the U.S. Geological Survey from Magellan F-BIDR data. The complete dataset consists of 340 quadrangles in Sinusoidal equal-area projection. Quadrangles extend approximately 12 degrees in latitude, except for those between 84 and 90 degrees North and South. Quadrangles near the equator extend 12 degrees in longitude; elsewhere the longitudinal extent is increased to maintain a roughly constant number of samples.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Countries of the World’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fernandol/countries-of-the-world on 12 November 2021.
--- Dataset description provided by original source is as follows ---
World fact sheet, fun to link with other datasets.
Information on population, region, area size, infant mortality and more.
Source: All these data sets are made up of data from the US government. Generally they are free to use if you use the data in the US. If you are outside of the US, you may need to contact the US Govt to ask.
Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission."
https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html
When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.
--- Original source retains full ownership of the source dataset ---
This dataset contains Commercial (Comm) Radio Occultation (RO) raw data from Spire Global Subsidiary. Radio occultation is an established method for remote sounding of the atmosphere. The technique uses an instrument in low-Earth orbit (LEO) to track radio signals from Global Navigation Satellite System (GNSS) transmitters as they rise or set through the atmosphere. The occulting atmosphere refracts, or bends, the radio signals, and given the precise positions of both satellites, the bending angle can be deduced from the time delay of the signal. Collecting these measurements for a full occultation through the atmosphere provides a vertical profile of bending angles, from which profiles of physical quantities such as temperature, humidity, and ionospheric electron density can be retrieved. These data primarily feed numerical weather prediction (NWP) models that support weather forecasts, and also support space weather analysis and prediction at NOAA.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
Developed by SOLARGIS and provided by the Global Solar Atlas (GSA), this data resource contains terrain elevation above sea level (ELE) in [m a.s.l.] covering the globe. Data is provided in a geographic spatial reference (EPSG:4326). The resolution (pixel size) of solar resource data (GHI, DIF, GTI, DNI) is 9 arcsec (nominally 250 m), PVOUT and TEMP 30 arcsec (nominally 1 km) and OPTA 2 arcmin (nominally 4 km).
The data is hyperlinked under 'resources' with the following characteristics:
ELE - GISdata (GeoTIFF)
Data format: GEOTIFF
File size : 826.8 MB
There are two temporal representations of solar resource and PVOUT data available:
• Longterm yearly/monthly average of daily totals (LTAym_AvgDailyTotals)
• Longterm average of yearly/monthly totals (LTAym_YearlyMonthlyTotals)
Both types of data are equivalent; you can select the summarization you prefer. The relation between the datasets is described by simple equations:
• LTAy_YearlyTotals = LTAy_DailyTotals * 365.25
• LTAy_MonthlyTotals = LTAy_DailyTotals * Number_of_Days_In_The_Month
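As a small worked illustration of the two equations above, the following Python sketch converts a long-term average daily total into yearly and monthly totals. The function names and the sample value of 4.0 are hypothetical, and choosing a specific calendar year to obtain the number of days in a month (which matters for February) is an assumption not stated in the source.

```python
import calendar

def yearly_total(avg_daily_total):
    # LTAy_YearlyTotals = LTAy_DailyTotals * 365.25
    return avg_daily_total * 365.25

def monthly_total(avg_daily_total, year, month):
    # LTAy_MonthlyTotals = LTAy_DailyTotals * Number_of_Days_In_The_Month
    # Assumption: days in the month are taken from a concrete calendar year.
    days_in_month = calendar.monthrange(year, month)[1]
    return avg_daily_total * days_in_month

daily = 4.0  # hypothetical average daily total for one pixel
print(yearly_total(daily))            # 1461.0
print(monthly_total(daily, 2021, 1))  # 124.0 (31 days in January)
print(monthly_total(daily, 2021, 2))  # 112.0 (28 days in February 2021)
```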
*For individual country or regional data downloads please see: https://globalsolaratlas.info/download (use the drop-down menu to select country or region of interest)
*For data provided in AAIGrid please see: https://globalsolaratlas.info/download/world.
For more information and terms of use, please read the metadata provided in PDF and XML format for each data layer in the download file. For other data formats, resolutions, or time aggregations, please visit the Solargis website. Data can be used for visualization, further processing, and geo-analysis in all mainstream GIS software with raster data processing capabilities (such as the open-source QGIS, commercial ESRI ArcGIS products, and others).
The International Satellite Cloud Climatology Project (ISCCP) focuses on the distribution and variation of cloud radiative properties to improve the understanding of the effects of clouds on climate, the radiation budget, and the long-term global hydrologic cycle. The ISCCP H-Series Climate Data Record consists of several parts: (1) ISCCP H-Series dataset, (2) ISCCP-Basic H-Series, and (3) Ancillary and input datasets.
ISCCP H-Series data: The full ISCCP dataset consists of netCDF files containing various derived cloud parameters. The H-Series data includes several products:
* HXS (H-series pixel level single satellite, not in netCDF)
* HXG (H-series pixel level gridded)
* HGG (H-series Gridded Global)
* HGH (H-series gridded monthly by hour)
* HGM (H-series Gridded Monthly)
The netCDF files are not structured with CF-standard names. Data variables are unitless and rely on data tables that are needed to represent each geophysical variable. Keeping ISCCP H-Series in this native format ensures that existing "power users" will be able to continue using the data.
ISCCP Basic H-Series: ISCCP Basic files contain a subset of the cloud variables and products available in the full ISCCP dataset. The subset consists of remapped, calibrated, and subsetted variables following CF conventions. In addition, the netCDF files follow full netCDF CF and ACDD Conventions. These files are intended to be used by new and/or less advanced users who may want to use cloud data but do not need the full ISCCP dataset.
Ancillary and input data: Ancillary and input data used in the production of ISCCP are also archived. These consist of B1U geostationary satellite data, B1U and GAC calibration/HBT tables, and ancillary files for nnHIRS, AEROSOL, SNOWICE, and OZONE datasets.
A computerized data set of demographic, economic and social data for 227 countries of the world. Information presented includes population, health, nutrition, mortality, fertility, family planning and contraceptive use, literacy, housing, and economic activity data. Tabular data are broken down by such variables as age, sex, and urban/rural residence. Data are organized as a series of statistical tables identified by country and table number. Each record consists of the data values associated with a single row of a given table. There are 105 tables with data for 208 countries. The second file is a note file, containing the text of notes associated with various tables. These notes provide information such as definitions of categories (e.g., urban/rural) and how various values were calculated.
The IDB was created in the U.S. Census Bureau's International Programs Center (IPC) to help IPC staff meet the needs of organizations that sponsor IPC research. The IDB provides quick access to specialized information, with emphasis on demographic measures, for individual countries or groups of countries. The IDB combines data from country sources (typically censuses and surveys) with IPC estimates and projections to provide information dating back as far as 1950 and as far ahead as 2050. Because the IDB is maintained as a research tool for IPC sponsor requirements, the amount of information available may vary by country. As funding and research activity permit, the IPC updates and expands the data base content.
Types of data include:
* Population by age and sex
* Vital rates, infant mortality, and life tables
* Fertility and child survivorship
* Migration
* Marital status
* Family planning
Data characteristics:
* Temporal: Selected years, 1950-present, projected demographic data to 2050.
* Spatial: 227 countries and areas.
* Resolution: National population, selected data by urban/rural residence, selected data by age and sex.
Sources of data include:
* U.S. Census Bureau
* International projects (e.g., the Demographic and Health Survey)
* United Nations agencies
Links:
* ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/08490
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
This data set contains the Magellan Global Vector Data Record (GVDR), a sorted collection of scattering and emission measurements from the Magellan Mission. The sorting is into a grid of equal area 'pixels' distributed regularly about the planet. For data acquired from the same pixel but in different observing geometries, there is a second level of sorting to accommodate the different geometrical conditions. The 'pixel' dimension is 18.225 km. The GVDR is presented in Sinusoidal Equal Area (equatorial), Mercator (equatorial), and Polar Stereographic (polar) projections.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json: { "codebase_version": "v2.1", "robot_type": "so101_follower", "total_episodes": 6, "total_frames": 5786, "total_tasks": 1, "total_videos": 0, "total_chunks": 1, "chunks_size": 1000, "fps": 30, "splits": { "train": "0:6" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/vcpatino/record-test.
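The `data_path` value in `meta/info.json` is a Python-style format template. The sketch below shows how such a template could resolve to a per-episode parquet path. The helper name `episode_parquet_path` is hypothetical, and deriving `episode_chunk` as `episode_index // chunks_size` is an assumption about the layout, not something stated in the metadata excerpt.

```python
def episode_parquet_path(data_path, episode_index, chunks_size):
    """Fill the data_path template for one episode (illustrative helper)."""
    # Assumption: episodes are grouped into chunks of `chunks_size` episodes.
    episode_chunk = episode_index // chunks_size
    return data_path.format(episode_chunk=episode_chunk,
                            episode_index=episode_index)

# Values copied from the meta/info.json excerpt above
data_path = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
chunks_size = 1000
total_episodes = 6

paths = [episode_parquet_path(data_path, i, chunks_size)
         for i in range(total_episodes)]
print(paths[0])   # data/chunk-000/episode_000000.parquet
print(paths[-1])  # data/chunk-000/episode_000005.parquet
```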
Cross-national research on the causes and consequences of income inequality has been hindered by the limitations of existing inequality datasets: greater coverage across countries and over time is available from these sources only at the cost of significantly reduced comparability across observations. The goal of the Standardized World Income Inequality Database (SWIID) is to overcome these limitations. A custom missing-data algorithm was used to standardize the United Nations University's World Income Inequality Database and data from other sources; data collected by the Luxembourg Income Study served as the standard. The SWIID provides comparable Gini indices of gross and net income inequality for 192 countries for as many years as possible from 1960 to the present along with estimates of uncertainty in these statistics. By maximizing comparability for the largest possible sample of countries and years, the SWIID is better suited to broadly cross-national research on income inequality than previously available sources: it offers coverage double that of the next largest income inequality dataset, and its record of comparability is three to eight times better than those of alternate datasets.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json: { "codebase_version": "v2.1", "robot_type": "so100_follower", "total_episodes": 3, "total_frames": 1757, "total_tasks": 1, "total_videos": 3, "total_chunks": 1, "chunks_size": 1000, "fps": 30, "splits": { "train": "0:3" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/U-RIL/record-Corners.
The SIRIS measurement objectives were: 1) to measure reflectance spectra of a bright and dark target at each GRSFE site for potential use in AVIRIS calibration; 2) characterize selected sites at the GRSFE modeling site (Lunar Lake, NV) using visible/infrared reflectance; 3) reflectance spectra for selected endmember materials at each of the GRSFE sites; 4) inter-instrument calibration.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
In preparation for the concerted international study of Comet Halley, the IHW conducted a trial run with observations of Comet Crommelin, largely during February and March of 1984.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the version 3.2 CYGNSS level 3 science data record which provides the average wind speed and mean square slope (MSS) on a 0.2x0.2 degree latitude by longitude equirectangular grid obtained from the Delay Doppler Mapping Instrument aboard the CYGNSS satellite constellation. The Level 2 Delay Doppler Map (DDM) data are used in the direct processing of the average wind speed and MSS data that are binned on the Level 3 grid. A subset of DDM data used in the direct processing of the average wind speed and MSS is co-located inside of the Level 2 data files. A single netCDF-4 data file is produced for each day of operation with an approximate 6 day latency. This version supersedes Version 3.1; https://doi.org/10.5067/CYGNS-L3X31. The reported sample locations are determined by the specular points corresponding to the Delay Doppler Maps (DDMs).
The v3.2 L3 gridded wind speed product inherits the v3.2 L2 FDS data as input at the same temporal and spatial resolution as the Level 2 data, sampled on consistent 0.2 by 0.2 degree latitude by longitude grid cells. The L3 gridding algorithm is unchanged. Range Corrected Gain (RCG) has been added to the L3 netcdf files as a new data field.
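To make the idea of binning samples onto 0.2 by 0.2 degree cells concrete, here is a minimal Python sketch that averages point samples per grid cell. It is not the CYGNSS L3 gridding algorithm: the cell-indexing convention, the function names, and the sample values are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.2  # grid spacing in degrees, as in the description above

def cell_index(lat, lon, cell=CELL_DEG):
    # Assumed convention: rows count up from 90 S, columns from 0 E
    row = math.floor((lat + 90.0) / cell)
    col = math.floor((lon % 360.0) / cell)
    return row, col

def grid_mean(samples, cell=CELL_DEG):
    """Average (lat, lon, value) samples that fall in the same grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in samples:
        acc = sums[cell_index(lat, lon, cell)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical wind speed samples in m/s at specular point locations
samples = [
    (10.05, 20.05, 4.0),
    (10.15, 20.15, 6.0),   # same cell as the first sample
    (10.25, 20.05, 10.0),  # next cell to the north
]
print(grid_mean(samples))  # {(500, 100): 5.0, (501, 100): 10.0}
```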
The CYGNSS is a NASA Earth System Science Pathfinder Mission that is intended to collect the first frequent space‐based measurements of surface wind speeds in the inner core of tropical cyclones. Made up of a constellation of eight micro-satellites, the observatories provide nearly gap-free Earth coverage using an orbital inclination of approximately 35° from the equator, with a mean (i.e., average) revisit time of seven hours and a median revisit time of three hours. This inclination allows CYGNSS to measure ocean surface winds between approximately 38° N and 38° S latitude. This range includes the critical latitude band for tropical cyclone formation and movement.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json: { "codebase_version": "v2.1", "robot_type": "so101_follower", "total_episodes": 1, "total_frames": 1795, "total_tasks": 1, "total_videos": 1, "total_chunks": 1, "chunks_size": 1000, "fps": 30, "splits": { "train": "0:1" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path":… See the full description on the dataset page: https://huggingface.co/datasets/nkmurst/record-test3.
World Weather Records (WWR) is an archived publication and digital data set. WWR is meteorological data from locations around the world. Through most of its history, WWR has been a publication, first published in 1927. Data includes monthly mean values of pressure, temperature, precipitation, and where available, station metadata notes documenting observation practices and station configurations. In recent years, data were supplied by National Meteorological Services of various countries, many of which became members of the World Meteorological Organization (WMO). The First Issue included data from earliest records available at that time up to 1920. Data have been collected for periods 1921-30 (2nd Series), 1931-40 (3rd Series), 1941-50 (4th Series), 1951-60 (5th Series), 1961-70 (6th Series), 1971-80 (7th Series), 1981-90 (8th Series), 1991-2000 (9th Series), and 2001-2011 (10th Series). The most recent Series 11 continues, insofar as possible, the record of monthly mean values of station pressure, sea-level pressure, temperature, and monthly total precipitation for stations listed in previous volumes. In addition to these parameters, mean monthly maximum and minimum temperatures have been collected for many stations and are archived in digital files by NCEI. New stations have also been included. In contrast to previous series, the 11th Series is available for the partial decade, so as to limit waiting period for new records. It begins in 2010 and is updated yearly, extending into the entire decade.
This dataset contains information about various attributes of a set of fruits, providing insights into their characteristics. The dataset includes details such as fruit ID, size, weight, sweetness, crunchiness, juiciness, ripeness, acidity, and quality.
- A_id: Unique identifier for each fruit
- Size: Size of the fruit
- Weight: Weight of the fruit
- Sweetness: Degree of sweetness of the fruit
- Crunchiness: Texture indicating the crunchiness of the fruit
- Juiciness: Level of juiciness of the fruit
- Ripeness: Stage of ripeness of the fruit
- Acidity: Acidity level of the fruit
- Quality: Overall quality of the fruit
- Fruit Classification: Develop a classification model to categorize fruits based on their features.
- Quality Prediction: Build a model to predict the quality rating of fruits using various attributes.
The dataset was generously provided by an American agriculture company. The data has been scaled and cleaned for ease of use.