91 datasets found

List of all countries with their 2 digit codes (ISO 3166-1)
datahub.io
Updated Aug 29, 2017
Cite
(2017). List of all countries with their 2 digit codes (ISO 3166-1) [Dataset]. https://datahub.io/core/country-list
Dataset updated
Aug 29, 2017
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
ISO 3166-1-alpha-2 English country names and code elements. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.
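A lookup from alpha-2 code to English short name could be sketched as follows. The `Name` and `Code` header names and the inline sample rows are assumptions made for illustration; they should be checked against the CSV actually published on datahub.io.

```python
import csv
import io

# Illustrative sample only; the "Name,Code" header is an assumption about the file layout.
SAMPLE_CSV = """Name,Code
Afghanistan,AF
Albania,AL
Luxembourg,LU
"""


def load_country_codes(text):
    """Return a dict mapping ISO 3166-1 alpha-2 codes to English short names."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Code"].strip().upper(): row["Name"].strip() for row in reader}


codes = load_country_codes(SAMPLE_CSV)
print(codes["LU"])
```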
List_of_countries_by_population_in_1800
kaggle.com
zip
Updated Jul 17, 2020
Cite
Mathurin Aché (2020). List_of_countries_by_population_in_1800 [Dataset]. https://www.kaggle.com/datasets/mathurinache/list-of-countries-by-population-in-1800
Available download formats: zip (355 bytes)
Dataset updated
Jul 17, 2020
Authors
Mathurin Aché
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is extracted from https://en.wikipedia.org/wiki/List_of_countries_by_population_in_1800.
Countries and territories Named Authority List
data.europa.eu
rdf xml, xml, zip
Updated Dec 3, 2021
Cite
Publications Office of the European Union (2021). Countries and territories Named Authority List [Dataset]. https://data.europa.eu/data/datasets/country?locale=en
Available download formats: xml, rdf xml, zip
Dataset updated
Dec 3, 2021
Dataset provided by
Publications Office of the European Union (http://op.europa.eu/)
European Union
Authors
Publications Office of the European Union
License
http://data.europa.eu/eli/dec/2011/833/oj
Description
Countries and territories is a controlled vocabulary that lists concepts associated with names of countries and territories. It is a corporate reference data asset covered by the Corporate Reference Data Management policy of the European Commission. It provides codes and names of geospatial and geopolitical entities in all official EU languages and is the result of a combination of multiple relevant standards, created to serve the requirements and use cases of the EU institutions services. Its main scope is to support documentary metadata activities. The codes of the concepts included are correlated with the ISO 3166 international standard. The authority code relies where possible on the ISO 3166-1 alpha-3 code. Additional user-assigned alpha-3 codes have been used to cover entities that are not included in the ISO 3166-1 standard. The corporate list contains mappings with the ISO 3166-1 two-letter codes, the Interinstitutional Style Guide codes and with other internal and external identifiers including ISO 3166-1 numeric, ISO 3166-3, UNSD M49, UNSD Geoscheme, IBAN, TIR, IANA domain. For the names of countries and territories, the corporate list synchronises with the Interinstitutional Style Guide (ISG, Section 7.1 and Annexes A5 and A6) and with the IATE terminology database. Membership and classification properties provide possibilities to group concepts, e.g., UN, EU, EEA, EFTA, Schengen area, Euro area, NATO, OECD, UCPM, ENP-EAST, ENP-SOUTH, EU candidate countries and potential candidates. Countries and territories is maintained by the Publications Office of the European Union and disseminated on the EU Vocabularies website. Regular updates are foreseen based on its stakeholders’ needs. Downloads in human-readable formats (.csv, .html) are also available.
Country Codes
public.opendatasoft.com
data.smartidf.services
csv, excel, geojson +1
Updated Aug 25, 2015
Cite
(2015). Country Codes [Dataset]. https://public.opendatasoft.com/explore/dataset/countries-codes/
Available download formats: geojson, json, excel, csv
Dataset updated
Aug 25, 2015
License
https://en.wikipedia.org/wiki/Public_domain
Description
Country codes: ISO 2, ISO 3, UN, LANG, LABEL (EN, FR, SP)

CY-Bench: A comprehensive benchmark dataset for subnational crop yield...

zenodo.org
explore.openaire.eu

zip

Updated Sep 25, 2024

Cite

Dilli Paudel; Hilmy Baja; Ron van Bree; Michiel Kallenberg; Stella Ofori-Ampofo; Aike Potze; Pratishtha Poudel; Abdelrahman Saleh; Weston Anderson; Malte von Bloh; Andres Castellano; Oumnia Ennaji; Raed Hamed; Rahel Laudien; Donghoon Lee; Inti Luna; Dainius Masiliūnas; Michele Meroni; Janet Mumo Mutuku; Siyabusa Mkuhlani; Jonathan Richetti; Alex C. Ruane; Ritvik Sahajpal; Guanyuan Shuai; Vasileios Sitokonstantinou; Rogerio de Souza Noia Junior; Amit Kumar Srivastava; Robert Strong; Lily-belle Sweet; Petar Vojnović; Allard de Wit; Maximilian Zachow; Ioannis N. Athanasiadis (2024). CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting [Dataset]. http://doi.org/10.5281/zenodo.13798797

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.13798797

Dataset updated

Sep 25, 2024

Dataset provided by

AgML (https://www.agml.org/)

Authors

License

https://joinup.ec.europa.eu/page/eupl-text-11-12

Description

CY-Bench: A comprehensive benchmark dataset for sub-national crop yield forecasting

Overview

CY-Bench is a dataset and benchmark for subnational crop yield forecasting, with coverage of major crop growing countries of the world for maize and wheat. By subnational, we mean the administrative level where yield statistics are published. When statistics are available for multiple levels, we pick the highest resolution. The dataset combines sub-national yield statistics with relevant predictors, such as growing-season weather indicators, remote sensing indicators, evapotranspiration, soil moisture indicators, and static soil properties. CY-Bench has been designed and curated by agricultural experts, climate scientists, and machine learning researchers from the AgML Community, with the aim of facilitating model intercomparison across the diverse agricultural systems around the globe in conditions as close as possible to real-world operationalization. Ultimately, by lowering the barrier to entry for ML researchers in this crucial application area, CY-Bench will facilitate the development of improved crop forecasting tools that can be used to support decision-makers in food security planning worldwide.

* Crops : Wheat & Maize
* Spatial Coverage : Wheat (29 countries), Maize (38).
See CY-Bench paper appendix for the list of countries.
* Temporal Coverage : Varies. See country-specific data

Data

Data format

The benchmark data is organized as a collection of CSV files, with each file representing a specific category of variable for a particular country. Each CSV file is named according to the category and the country it pertains to, facilitating easy identification and retrieval. The data within each CSV file is structured in tabular format, where rows represent observations and columns represent different predictors related to a category of variable.

Data content

All data files are provided as .csv.

| Data | Description | Variables (units) | Temporal Resolution | Data Source (Reference) |
| --- | --- | --- | --- | --- |
| crop_calendar | Start and end of growing season | sos (day of the year), eos (day of the year) | Static | World Cereal (Franch et al, 2022) |
| fpar | Fraction of absorbed photosynthetically active radiation | fpar (%) | Dekadal (3 times a month; 1-10, 11-20, 21-31) | European Commission's Joint Research Centre (EC-JRC, 2024) |
| ndvi | Normalized difference vegetation index | - | Approximately weekly | MOD09CMG (Vermote, 2015) |
| meteo | Temperature, precipitation (prec), radiation, potential evapotranspiration (et0), climatic water balance (= prec - et0) | tmin (C), tmax (C), tavg (C), prec (mm), et0 (mm), cwb (mm), rad (J m-2 day-1) | Daily | AgERA5 (Boogaard et al, 2022), FAO-AQUASTAT for et0 (FAO-AQUASTAT, 2024) |
| soil_moisture | Surface soil moisture, rootzone soil moisture | ssm (kg m-2), rsm (kg m-2) | Daily | GLDAS (Rodell et al, 2004) |
| soil | Available water capacity, bulk density, drainage class | awc (c m-1), bulk_density (kg dm-3), drainage class (category) | Static | WISE Soil database (Batjes, 2016) |
| yield | End-of-season yield | yield (t ha-1) | Yearly | Various country or region specific sources (see crop_statistics_... in https://github.com/BigDataWUR/AgML-CY-Bench/tree/main/data_preparation) |
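
Two conventions from the table lend themselves to a small worked example: the dekadal split of a month (days 1-10, 11-20, 21-31) and the climatic water balance, cwb = prec - et0. The sketch below is illustrative only; the function names are hypothetical, and the date string follows the YYYYMMdd form that the file listing in this description gives for the `date` column.

```python
from datetime import datetime


def parse_date(value):
    """Parse a date written as YYYYMMdd, e.g. '20200521'."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def dekad_of_month(day):
    """Return 1, 2 or 3 for days 1-10, 11-20 and 21-31 respectively."""
    if not 1 <= day <= 31:
        raise ValueError(f"day out of range: {day}")
    # Days 21-31 all fall in the third dekad, so cap the result at 3.
    return min((day - 1) // 10 + 1, 3)


def climatic_water_balance(prec_mm, et0_mm):
    """cwb = prec - et0, with both inputs in mm."""
    return prec_mm - et0_mm


d = parse_date("20200521")
print(d, dekad_of_month(d.day), climatic_water_balance(12.5, 4.0))
```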

Folder structure

The CY-Bench dataset is structured first by crop type and then by country. For each country, the folder name is the ISO 3166-1 alpha-2 two-character code. A separate .csv file is available for each predictor dataset and for the crop calendar, as shown below. The csv files are named to reflect the corresponding variable, crop type, and country, e.g. **variable_croptype_country.csv**.
```
CY-Bench
│
└─── maize
│ │
│ └─── AO
│ │ -- crop_calendar_maize_AO.csv
│ │ -- fpar_maize_AO.csv
│ │ -- meteo_maize_AO.csv
│ │ -- ndvi_maize_AO.csv
│ │ -- soil_maize_AO.csv
│ │ -- soil_moisture_maize_AO.csv
│ │ -- yield_maize_AO.csv
│ │
│ └─── AR
│ -- crop_calendar_maize_AR.csv
│ -- fpar_maize_AR.csv
│ -- ...
│
└─── wheat
│ │
│ └─── AR
│ │ -- crop_calendar_wheat_AR.csv
│ │ -- fpar_wheat_AR.csv
│ │ ...
```
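
The naming scheme above can be turned into a small path helper. This is a minimal sketch, assuming the archive has been unpacked into a local folder called `CY-Bench`; the function names are hypothetical and are not part of the CY-Bench code base.

```python
from pathlib import Path

# Variable prefixes as they appear in the folder listing above.
VARIABLES = (
    "crop_calendar", "fpar", "meteo", "ndvi",
    "soil", "soil_moisture", "yield",
)


def cybench_file(root, crop, country, variable):
    """Build <root>/<crop>/<country>/<variable>_<crop>_<country>.csv."""
    if variable not in VARIABLES:
        raise ValueError(f"unknown variable: {variable}")
    return Path(root) / crop / country / f"{variable}_{crop}_{country}.csv"


def split_name(filename):
    """Split 'soil_moisture_maize_AO.csv' into (variable, crop, country)."""
    # The variable itself may contain underscores, so split from the right.
    variable, crop, country = Path(filename).stem.rsplit("_", 2)
    return variable, crop, country


print(cybench_file("CY-Bench", "maize", "AO", "fpar").as_posix())
print(split_name("soil_moisture_maize_AO.csv"))
```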

Example : CSV data content for maize in country X

```
X
└─── crop_calendar_maize_X.csv
│ -- crop_name (name of the crop)
│ -- adm_id (unique identifier for a subnational unit)
│ -- sos (start of crop season)
│ -- eos (end of crop season)
│
└─── fpar_maize_X.csv
│ -- crop_name
│ -- adm_id
│ -- date (in the format YYYYMMdd)
│ -- fpar
│
└─── meteo_maize_X.csv
│ -- crop_name
│ -- adm_id
│ -- date (in the format YYYYMMdd)
│ -- tmin (minimum temperature)
│ -- tmax (maximum temperature)
│ -- prec (precipitation)
│ -- rad (radiation)
│ -- tavg (average temperature)
│ -- et0 (potential evapotranspiration)
│ -- cwb (climatic water balance)
│
└─── ndvi_maize_X.csv
│ -- crop_name
│ -- adm_id
│ -- date (in the format YYYYMMdd)
│ -- ndvi
│
└─── soil_maize_X.csv
│ -- crop_name
│ -- adm_id
│ -- awc (available water capacity)
│ -- bulk_density
│ -- drainage_class
│
└─── soil_moisture_maize_X.csv
│ -- crop_name
│ -- adm_id
│ -- date (in the format YYYYMMdd)
│ -- ssm (surface soil moisture)
│ -- rsm (rootzone soil moisture)
│
└─── yield_maize_X.csv
│ -- crop_name
│ -- country_code
│ -- adm_id
│ -- harvest_year
│ -- yield
│ -- harvest_area
│ -- production
```

Data access

The full dataset can be downloaded directly from Zenodo or using the `zenodo_get` library.
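
A scripted download might look like the sketch below. It assumes the `zenodo_get` command-line tool is installed and on the PATH, and that it accepts the DOI as a positional argument; the wrapper functions and the `cybench_data` folder name are hypothetical. Files are written to the working directory passed to the subprocess rather than through any tool-specific option.

```python
import shutil
import subprocess
from pathlib import Path

CY_BENCH_DOI = "10.5281/zenodo.13798797"  # DOI given in the dataset record above


def zenodo_get_command(doi):
    """Build the command line: the tool name followed by the DOI."""
    return ["zenodo_get", doi]


def download(doi, output_dir="."):
    """Run zenodo_get inside output_dir; raises if the tool is missing or fails."""
    if shutil.which("zenodo_get") is None:
        raise RuntimeError("zenodo_get was not found on PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(zenodo_get_command(doi), cwd=output_dir, check=True)


# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(zenodo_get_command(CY_BENCH_DOI)))
# download(CY_BENCH_DOI, "cybench_data")  # uncomment to fetch the archive
```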

License and citation

We kindly ask all users of CY-Bench to properly respect licensing and citation conditions of the datasets included.

GDP by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated Jun 29, 2011
Cite
TRADING ECONOMICS (2011). GDP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/gdp
Available download formats: csv, json, xml, excel
Dataset updated
Jun 29, 2011
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
COVID-19 useful features by country
kaggle.com
Updated May 3, 2020
Cite
Ouassim Adnane (2020). COVID-19 useful features by country [Dataset]. https://www.kaggle.com/ishivinal/covid19-useful-features-by-country/activity
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 3, 2020
Dataset provided by
Kaggle
Authors
Ouassim Adnane
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

This dataset provides features by country, with country names created to match the COVID19 Global Forecasting (Week 4) challenge. If you found this helpful, an upvote would be very much appreciated. Let me know if you find any mistakes, so I can correct them.

Content

The dataset consists of one main CSV file, Countries_usefulFeatures.csv, which contains 12 columns; see the descriptions below for more detailed information.

Column Description

Country_Region: Name of the country

Population_Size: the population size 2018 stats

Tourism: International tourism, number of arrivals 2018

Date_FirstFatality: Date of the first Fatality of the COVID-19

Date_FirstConfirmedCase: Date of the first confirmed case of the COVID-19

Latitude

Longitude

Mean_Age: mean age of the population 2018 stats

Lockdown_Date: date of the lockdown

Lockdown_Type: type of the lockdown

Country_Code: 3 digit country code

Acknowledgements

Data is collected from:
Median age by country since 1950

International tourism, number of arrivals

Population Size

Lockdown Date

Lockdown dates by country

other government websites
Luxembourgish Country Border - 5k Coordinates
data.public.lu
csv
Updated May 8, 2023
Cite
Pit Schneider (2023). Luxembourgish Country Border - 5k Coordinates [Dataset]. https://data.public.lu/en/datasets/luxembourgish-country-border-5k-coordinates/
Available download formats: csv (110022)
Dataset updated
May 8, 2023
Dataset authored and provided by
Pit Schneider
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Luxembourgish country border expressed as a CSV list of 5000 coordinates: First list entry contains northmost coordinates. Last list entry (row 5001) is identical to first entry. List sequence follows border in a clockwise way. All coordinates have a precision of seven decimal digits. Data was manually derived from Apple Maps, thus not representing legal/official border data.
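The stated properties (closed ring, clockwise order, start at the northernmost point) can be checked with a few lines of code. The sketch below is illustrative: it assumes each row holds a (longitude, latitude) pair, which is not confirmed by the description, and it uses a tiny hypothetical diamond instead of the real 5001-row file. Orientation is tested with the shoelace sum, treating the coordinates as planar.

```python
def is_closed(ring):
    """A closed ring repeats its first point as its last point."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def signed_area(ring):
    """Shoelace sum over (lon, lat) pairs; a negative value means clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(ring):
    return signed_area(ring) < 0


def starts_at_northmost(ring):
    """True when the first point has the largest latitude in the ring."""
    return ring[0][1] == max(lat for _, lat in ring)


# Hypothetical diamond: north, east, south, west, back to north.
ring = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0), (0.0, 1.0)]
print(is_closed(ring), is_clockwise(ring), starts_at_northmost(ring))
```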
country_list.csv and country_period_validation.csv files used in the...
figshare.com
txt
Updated Jun 1, 2023
Cite
Guy Abel (2023). country_list.csv and country_period_validation.csv files used in the bilateral international migration flow estimates by sex [Dataset]. http://doi.org/10.6084/m9.figshare.18737768.v4
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.18737768.v4
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Guy Abel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary data files for Abel & Cohen (2022), including:
1. country_list.csv, with country codes, country names, first and last period covered, and availability of reported data used for the validation exercise.
2. country_period_validation.csv, with types and sources of reported migration statistics for each country and period in each of the collections used for the validation exercise.
Population figures for countries, regions (e.g. Asia) and the world
datahub.io
Updated Aug 29, 2017
Cite
(2017). Population figures for countries, regions (e.g. Asia) and the world [Dataset]. https://datahub.io/core/population
Dataset updated
Aug 29, 2017
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Area covered
Asia, World
Description
Population figures for countries, regions (e.g. Asia) and the world. Data comes originally from World Bank and has been converted into standard CSV.
Geographical names index
gov.uk
s3.amazonaws.com
Updated Mar 25, 2024
Cite
Foreign, Commonwealth & Development Office (2024). Geographical names index [Dataset]. https://www.gov.uk/government/publications/geographical-names-and-information
Dataset updated
Mar 25, 2024
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Foreign, Commonwealth & Development Office
Description
These are the British English-language names and descriptive terms for sovereign countries, UK Crown Dependencies and UK Overseas Territories, as well as their citizens. ‘Sovereign’ means that they are independent states, recognised under international law.

The Foreign, Commonwealth & Development Office (FCDO) approved these names. The FCDO leads on geographical names for the UK government, working closely with the Permanent Committee on Geographical Names.

In these lists:

‘country’ is the 2-letter ISO 3166-1 code (https://www.iso.org/iso-3166-country-codes.html) for the country

the ‘name’ is the FCDO-approved name for the country

the full ‘official name’ is also provided for use when the formal version of a country’s name is needed

citizen names in the lists are not the legal names for the citizen, they do not relate to the citizen’s ethnicity

All UK government departments and other public bodies must use the approved country and territory names in these datasets. Using these names ensures consistency and clarity across public and internal communications, guidance and services.

You can also view the Welsh language version of the geographical names index on GOV.WALES: international place-names (https://www.gov.wales/bydtermcymru/international-place-names).
Data from: The Tropical Andes Biodiversity Hotspot: A Comprehensive Dataset...
knb.ecoinformatics.org
dataone.org
Updated May 30, 2024
Cite
Pablo Jarrín-V.; Mario H Yánez-Muñoz (2024). The Tropical Andes Biodiversity Hotspot: A Comprehensive Dataset for the Mira-Mataje Binational Basins [Dataset]. http://doi.org/10.5063/F14F1P6H
Unique identifier
https://doi.org/10.5063/F14F1P6H
Dataset updated
May 30, 2024
Dataset provided by
Knowledge Network for Biocomplexity
Authors
Pablo Jarrín-V.; Mario H Yánez-Muñoz
Time period covered
Jun 11, 2022 - Jun 11, 2023
Area covered

Description
We present a flora and fauna dataset for the Mira-Mataje binational basins. This is an area shared between southwestern Colombia and northwestern Ecuador, where both the Chocó and Tropical Andes biodiversity hotspots converge. Information from 120 sources was systematized in the Darwin Core Archive (DwC-A) standard and geospatial vector data format for geographic information systems (GIS) (shapefiles). Sources included natural history museums, published literature, and citizen science repositories across 18 countries. The resulting database has 33,460 records from 5,281 species, of which 1,083 are endemic and 680 threatened. The diversity represented in the dataset is equivalent to 10% of the total plant species and 26% of the total terrestrial vertebrate species in the hotspots. It corresponds to 0.07% of their total area. The dataset can be used to estimate and compare biodiversity patterns with environmental parameters and provide value to ecosystems, ecoregions, and protected areas. The dataset is a baseline for future assessments of biodiversity in the face of environmental degradation, climate change, and accelerated extinction processes. The data has been formally presented in the manuscript entitled "The Tropical Andes Biodiversity Hotspot: A Comprehensive Dataset for the Mira-Mataje Binational Basins" in the journal "Scientific Data". To maintain DOI integrity, this version will not change after publication of the manuscript and therefore we cannot provide further references on volume, issue, and DOI of manuscript publication.
- Data format 1: The .rds file extension saves a single object to be read in R and provides better compression, serialization, and integration within the R environment, than simple .csv files. The description of file names is in the original manuscript.
-- m_m_flora_2021_voucher_ecuador.rds
-- m_m_flora_2021_observation_ecuador.rds
-- m_m_flora_2021_total_ecuador.rds
-- m_m_fauna_2021_ecuador.rds
- Data format 2: The .csv file has been encoded in UTF-8, and is an ASCII file with text separated by commas. The description of file names is in the original manuscript.
-- m_m_flora_fauna_2021_all.zip. This file includes all biodiversity datasets.
-- m_m_flora_2021_voucher_ecuador.csv
-- m_m_flora_2021_observation_ecuador.csv
-- m_m_flora_2021_total_ecuador.csv
-- m_m_fauna_2021_ecuador.csv
- Data format 3: We consolidated a shapefile for the basin containing layers for vegetation ecosystems and the total number of occurrences, species, and endemic and threatened species for each ecosystem.
-- biodiversity_measures_mira_mataje.zip. This file includes the .shp file and accessory geomatic files.
- A set of 3D shaded-relief map representations of the data in the shapefile can be found at https://doi.org/10.6084/m9.figshare.23499180.v4
Three taxonomic data tables were used in our technical validation of the presented dataset. These three files are:
1) the_catalog_of_life.tsv (Source: Bánki, O. et al. Catalogue of life checklist (version 2024-03-26). https://doi.org/10.48580/dfz8d (2024))
2) world_checklist_of_vascular_plants_names.csv (we are also including ancillary tables "world_checklist_of_vascular_plants_distribution.csv", and "README_world_checklist_of_vascular_plants_.xlsx") (Source: Govaerts, R., Lughadha, E. N., Black, N., Turner, R. & Paton, A. The World Checklist of Vascular Plants is a continuously updated resource for exploring global plant diversity. Sci. Data 8, 215, 10.1038/s41597-021-00997-6 (2021).)
3) world_flora_online.csv (Source: The World Flora Online Consortium et al. World flora online plant list December 2023, 10.5281/zenodo.10425161 (2023).)
Country, Regional and World GDP (Gross Domestic Product)
datahub.io
Updated Aug 29, 2017
Cite
(2017). Country, Regional and World GDP (Gross Domestic Product) [Dataset]. https://datahub.io/core/gdp
Dataset updated
Aug 29, 2017
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Area covered
World
Description
Country, regional and world GDP in current US Dollars ($). Regional means collections of countries e.g. Europe & Central Asia. Data is sourced from the World Bank and turned into a standard normalized CSV.
List_of_countries_by_wheat_exports
kaggle.com
Updated Jul 17, 2020
Cite
Mathurin Aché (2020). List_of_countries_by_wheat_exports [Dataset]. https://www.kaggle.com/mathurinache/list-of-countries-by-wheat-exports/code
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 17, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mathurin Aché
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is extracted from https://en.wikipedia.org/wiki/List_of_countries_by_wheat_exports.
Addresses RÚIAN data distributed by the country in the CSV format
data.gov.cz
Updated Feb 23, 2024
Cite
Český úřad zeměměřický a katastrální (2024). Addresses RÚIAN data distributed by the country in the CSV format [Dataset]. https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A9-sady%2F00025712%2F3eac0278ad025b9a9015465571fdb907
Dataset updated
Feb 23, 2024
Dataset authored and provided by
Český úřad zeměměřický a katastrální
Description
The dataset contains a list of address points for the whole Czech Republic in CSV format. For each address point, the following attributes are specified: address point code, municipality code and name, code and name of town district (for territorially structured statutory cities only), code and name of Prague city district (for Prague only), municipality part code and name, street code and name (if specified), type of building object (with description/registration house number), house number, orientation number (if specified), character of orientation number (if specified), postal code, Y and X coordinates of the address point (in the JTSK coordinate system) and the date of validity. The dataset is provided as Open Data (licence CC-BY 4.0). Data is based on RÚIAN (Register of Territorial Identification, Addresses and Real Estates). Data covers the whole territory of the Czech Republic. Data is provided in a compressed form (ZIP archive). The file is created on the first day of each month with data valid as of the last day of the previous month. More in Act No. 111/2009 Coll., on the Basic Registers, and in Decree No. 359/2011 Coll., on the Basic Register of Territorial Identification, Addresses and Real Estates.
Film Circulation dataset
zenodo.org
data.niaid.nih.gov
bin, csv, png
Updated Jul 12, 2024
Cite
Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
Available download formats: csv, png, bin
Unique identifier
https://doi.org/10.5281/zenodo.7887672
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Skadi Loist; Evgenia (Zhenya) Samoilova
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

Please cite this when using the dataset.

Detailed description of the dataset:

1 Film Dataset: Festival Programs

The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
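
The relationship between the long and wide tables can be illustrated with a toy reduction: one row per film, keeping the first sample festival where the film appeared. This is a sketch under stated assumptions, not the authors' procedure. Only the variable name `fest` comes from the description above; the `film_id`, `year` and `month` columns and all sample values are hypothetical.

```python
# Long format: the same film appears once per sample festival.
LONG_ROWS = [
    {"film_id": "F1", "fest": "Frameline", "year": 2017, "month": 6},
    {"film_id": "F1", "fest": "Berlinale", "year": 2017, "month": 2},
    {"film_id": "F2", "fest": "Frameline", "year": 2017, "month": 6},
]


def first_festival_per_film(rows):
    """Collapse long rows to one entry per film, keeping the earliest festival."""
    wide = {}
    # Sort chronologically so the first festival seen for a film is the earliest.
    for row in sorted(rows, key=lambda r: (r["year"], r["month"])):
        wide.setdefault(row["film_id"], row["fest"])
    return wide


wide = first_festival_per_film(LONG_ROWS)
print(wide)
```

With these sample rows, film F1 is attributed to Berlinale (February) rather than Frameline (June), matching the example given in the description.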

2 Survey Dataset

The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

3 IMDb & Scripts

The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to one crew member. The file also contains a binary gender value assigned to directors based on their first names using the GenderizeR application.

The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found in the advanced search, potentially using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches each film in the core dataset with suggested films on the IMDb search page and records matching scores. Matching was based on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA (optimal string alignment) algorithm is used to match titles that may have typos or minor variations.
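To make the two title-matching measures concrete, here is a minimal Python sketch of an OSA distance and a cosine similarity over character q-grams, together with the +/- one year rule. It is not the authors' R code: the q-gram size of 2 and the function names are assumptions for illustration, and the source does not state which thresholds were applied to the scores.

```python
import math
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters, where no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between character q-gram count vectors (q=2 is assumed)."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    dot = sum(count * grams_b[gram] for gram, count in grams_a.items())
    norm_a = math.sqrt(sum(v * v for v in grams_a.values()))
    norm_b = math.sqrt(sum(v * v for v in grams_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one string is shorter than q
    return dot / (norm_a * norm_b)


def year_matches(core_year: int, imdb_year: int) -> bool:
    """Production years match if they differ by at most one year."""
    return abs(core_year - imdb_year) <= 1


print(osa_distance("the wokr", "the work"))
print(round(cosine_similarity("the work", "the works"), 3))
print(year_matches(2013, 2014))
```

A swapped pair of adjacent letters counts as a single OSA edit, which is why this measure suits titles with typos, while the q-gram cosine score stays high when two titles share most of their character sequences.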

The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match (perfect match on title, year, and director); b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and flags them for a manual check.

The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
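The second-pass idea can be sketched in Python as follows. This is an illustration only and not a translation of the R script: the function names, the use of None for missing data, and the stand-in scraper are all hypothetical.

```python
def retry_skipped(results, scrape):
    """Attempt one more scrape for every film whose earlier result is missing.

    `results` maps film IDs to scraped data, or to None where scraping failed.
    Returns an updated copy plus the IDs that are still missing.
    """
    updated = dict(results)
    still_missing = []
    for film_id, data in results.items():
        if data is not None:
            continue  # already scraped successfully
        try:
            updated[film_id] = scrape(film_id)
        except ConnectionError:
            still_missing.append(film_id)
    return updated, still_missing


# Hypothetical stand-in for a real scraper: film 3 keeps failing.
def fake_scrape(film_id):
    if film_id == 3:
        raise ConnectionError("simulated network failure")
    return {"title": f"Film {film_id}"}


first_pass = {1: {"title": "Film 1"}, 2: None, 3: None}
second_pass, still_missing = retry_skipped(first_pass, fake_scrape)
print(still_missing)
```

Films that fail again on the second pass are more likely to have a genuine data problem than a transient connection error, so they are returned separately for inspection.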

The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

4 Festival Library Dataset

The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
[Eco-Movement] EV Charging Station DC Hardware Data - CSV updated daily
datarade.ai
.csv
Updated Feb 26, 2021
Eco-Movement (2021). [Eco-Movement] EV Charging Station DC Hardware Data - CSV updated daily [Dataset]. https://datarade.ai/data-products/eco-movement-ev-charge-point-data-complete-coverage-of-euro-eco-movement
Available download formats: .csv
Dataset updated
Feb 26, 2021
Dataset authored and provided by
Eco-Movement
Area covered
Liechtenstein, Réunion, Slovenia, Netherlands, Turkey, Chile, Isle of Man, Lithuania, Guadeloupe, Monaco
Description
Eco-Movement is the leading source for EV charging station data. We offer full coverage of all (semi)public EV chargers across Europe, North & Latin America, Oceania, and ever more additional countries. Our real-time database now contains about 1,000,000 unique plugs. Eco-Movement is a specialised B2B data provider focusing 100% on EV charging station data quality and enrichment. Hundreds of quality checks are performed through our proprietary quality dashboard, IT architecture and AI. With the highest quality on the market, we are the trusted choice of mobility industry leaders such as Google, Tesla, Bloomberg, and the European Commission’s EAFO portal.

Eco-Movement integrates data from 3000+ direct connections with EV Charge Point Operators into a uniform, accurate and complete database. We have an unparalleled set of charge point related attributes, all available on individual charging plug level: from Geolocation to Max Power and from Operator to Hardware and Pricing details. Simple, reliable, and up-to-date: The Eco-Movement database is refreshed every day.

Whether you are in need of insights, building new products or conducting research, high quality data is more important than ever. Our online Data Retrieval Platform is the easy solution to all your EV Charging Station related data needs. Our DC Hardware Data is a unique dataset developed by Eco-Movement, providing hardware information at the level of individual DC charging stations. This report is for your organisation if you want to gain access to accurate data on the manufacturer and model of charging stations, for example as an essential input for your R&D strategy or competitive analysis.

The hardware report includes full geolocation, operator/brand, and technical information for each individual station, as well as two specific hardware attributes: DC Hardware Manufacturer and DC Hardware Model. This report is available for all countries in our database (see full list of territories below). The price of the data is dependent on the geographies chosen, the length of the subscription, and the intended use.

Check out our other Data Offerings available, and gain more valuable market insights on EV charging directly from the experts.

ALSO AVAILABLE We also offer EV Charging Station Location & Tariffs Data via API (JSON) or online download (CSV). Get detailed insights on Charging Station Locations as well as the prices paid at individual chargers, whether payment is done directly to the CPO or with one of the 200+ eMSP products in our database.

ABOUT US Eco-Movement's mission is providing the EV ecosystem with the best and most relevant Charging Station information. Based in Utrecht, the Netherlands, Eco-Movement is completely independent from other industry players. We are an active and trusted player in the EV ecosystem and the exclusive source for European Commission charging infrastructure data (EAFO).
Replication data for "High life satisfaction reported among small-scale...
dataverse.csuc.cat
csv, txt
Updated Feb 7, 2024
Eric Galbraith; Victoria Reyes Garcia (2024). Replication data for "High life satisfaction reported among small-scale societies with low incomes" [Dataset]. http://doi.org/10.34810/data904
Available download formats: csv(1620), csv(7829), txt(7017), csv(227502)
Unique identifier
https://doi.org/10.34810/data904
Dataset updated
Feb 7, 2024
Dataset provided by
CORA.Repositori de Dades de Recerca
Authors
Eric Galbraith; Victoria Reyes Garcia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2021 - Oct 24, 2023
Area covered
Darjeeling, India; Ba, Fiji; Bulgan soum, Mongolia; Bassari country, Senegal; Laprak, Nepal; Puna, Argentina; Mafia Island, United Republic of Tanzania; Shangri-la, China; Western highlands, Guatemala; Kumbungu, Ghana
Dataset funded by
European Commission
Description
This dataset was created in order to document self-reported life evaluations among small-scale societies that exist on the fringes of mainstream industrialized societies. The data were produced as part of the LICCI project, through fieldwork carried out by LICCI partners. The data include individual responses to a life satisfaction question, and household asset values. Data from Gallup World Poll and the World Values Survey are also included, as used for comparison.

TABULAR DATA-SPECIFIC INFORMATION

1. File name: LICCI_individual.csv
Number of rows and columns: 2814, 7
Variable list:
Variable names: User, Site, village
Description: identification of investigator and location
Variable name: Well.being.general
Description: numerical score for life satisfaction question
Variable names: HH_Assets_US, HH_Assets_USD_capita
Description: estimated value of representative assets in the household of respondent, total and per capita (accounting for number of household inhabitants)

2. File name: LICCI_bySite.csv
Number of rows and columns: 19, 8
Variable list:
Variable names: Site, N
Description: site name and number of respondents at the site
Variable names: SWB_mean, SWB_SD
Description: mean and standard deviation of life satisfaction score
Variable names: HHAssets_USD_mean, HHAssets_USD_sd
Description: site mean and standard deviation of household asset value
Variable names: PerCapAssets_USD_mean, PerCapAssets_USD_sd
Description: site mean and standard deviation of per capita asset value

3. File name: gallup_WVS_GDP_pk.csv
Number of rows and columns: 146, 8
Variable list:
Variable names: Happiness Score, Whisker-high, Whisker-low
Description: from Gallup World Poll as documented in World Happiness Report 2022.
Variable name: GDP-PPP2017
Description: Gross Domestic Product per capita for year 2020 at PPP (constant 2017 international $). Accessed May 2022.
Variable name: pk
Description: Produced capital per capita for year 2018 (in 2018 US$) for available countries, as estimated by the World Bank (accessed February 2022).
Variable names: WVS7_mean, WVS7_std
Description: Results of Question 49 in the World Values Survey, Wave 7.
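The relationship between the individual-level and site-level files can be illustrated with a short Python sketch that derives N, SWB_mean and SWB_SD per site from rows shaped like LICCI_individual.csv. The column names come from the variable list above; the sample values are invented, and the use of the sample (n-1) standard deviation is an assumption, since the description does not say which estimator was used.

```python
import statistics
from collections import defaultdict

# Invented rows using the column names of LICCI_individual.csv.
rows = [
    {"Site": "A", "Well.being.general": 7},
    {"Site": "A", "Well.being.general": 9},
    {"Site": "B", "Well.being.general": 6},
    {"Site": "B", "Well.being.general": 8},
    {"Site": "B", "Well.being.general": 10},
]


def summarise_by_site(records):
    """Group life satisfaction scores by site and compute N, mean and SD."""
    scores = defaultdict(list)
    for rec in records:
        scores[rec["Site"]].append(float(rec["Well.being.general"]))
    summary = {}
    for site, values in scores.items():
        summary[site] = {
            "N": len(values),
            "SWB_mean": statistics.mean(values),
            # Assumption: sample SD, which is undefined for a single respondent.
            "SWB_SD": statistics.stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


summary = summarise_by_site(rows)
for site in sorted(summary):
    print(site, summary[site])
```

To run this against the real file, the rows could be read with csv.DictReader from the standard library instead of being defined inline.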
List_of_countries_by_traffic-related_death_rate
kaggle.com
Updated Jul 17, 2020
Mathurin Aché (2020). List_of_countries_by_traffic-related_death_rate [Dataset]. https://www.kaggle.com/mathurinache/list-of-countries-by-traffic-related-death-rate/discussion
Dataset updated
Jul 17, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mathurin Aché
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is extracted from https://en.wikipedia.org/wiki/List_of_countries_by_traffic-related_death_rate. Context: There's a story behind every dataset and here's your opportunity to share yours. Content: What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too. Acknowledgements: We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research. Inspiration: Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Key generic technology prediction in patent citation using graph neural...
dataone.org
data.niaid.nih.gov
+1 more
Updated Jun 5, 2024
M. L. Ding (2024). Key generic technology prediction in patent citation using graph neural networks [Dataset]. http://doi.org/10.5061/dryad.nk98sf803
Unique identifier
https://doi.org/10.5061/dryad.nk98sf803
Dataset updated
Jun 5, 2024
Dataset provided by
Dryad Digital Repository
Authors
M. L. Ding
Time period covered
Jan 11, 2024
Description
With the rapid advancement of the Fourth Industrial Revolution, international competition in technology and industry is intensifying. However, in the era of big data and large-scale science, making accurate judgments about the key areas of technology and innovative trends has become exceptionally difficult. This paper constructs a patent indicator evaluation system based on the dimensions of key and generic patent citation, integrates graph neural network modeling to predict key common technologies, and confirms the effectiveness of the method using the field of genetic engineering as an example. According to the LDA topic model, the main technical R&D directions in genetic engineering are genetic analysis and detection technologies, the application of microorganisms in industrial production, virology research involving vaccine development and immune responses, high-throughput sequencing and analysis technologies in genomics, targeted drug design and molecular therapeutic strategies...

These datasets were obtained from the Incopat patent database for cited patents (2013-2022) in the field of genetic engineering. Details for the datasets are provided in the README file. This directory contains the selection of the patent datasets.

1) Table of key generic indicators for nodes (partial 1).csv
This file consists of 10 indicators of patents: technical coverage, patent families, patent family citation, patent cooperation, enterprise-enterprise cooperation, industry-university-research cooperation, claims, citation frequency, layout countries, and layout countries.

2) Table of key generic indicators for nodes (partial 2).csv
This file consists of 10 indicators of patents: technical convergence, cited countries, inventors, citations, homologous countries/areas, degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and PageRank.

3) patent.content
The content file contains descriptions of the patents in the following format:

This README file was generated on 2023-11-25 by Mingli Ding.

GENERAL INFORMATION

Author Information

Investigators Contact Information

Name: Mingli Ding; Wangke Yu; Shuhua Wang

Institution: Jingdezhen Ceramic University

Address: Jingdezhen, Jiangxi, China

Email: mlding1@163.com

Date of data collection: 2013-2022

DATA & FILE OVERVIEW

File List:

A) Table of key generic indicators for nodes (partial 1).csv

B) Table of key generic indicators for nodes (partial 2).csv

C) patent.content

D) patent.cites

E) Graph neural network modeling highest accuracy for different dimensions.csv

F) Prediction effects of key generic technologies.csv

DATA-SPECIFIC INFORMATION FOR: Table of key generic indicators for nodes (partial 1).csv

Number of variables: 10

Number of cases/rows: 72489

Variable List:

technical coverage: number ...

