28 datasets found
  1. Modern value of basic Roman numerals

    • statista.com
    Updated Sep 4, 2019
    Cite
    Statista (2019). Modern value of basic Roman numerals [Dataset]. https://www.statista.com/statistics/1046921/modern-value-basic-roman-numerals/
    Explore at:
    Dataset updated
    Sep 4, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Roman Empire, Europe
    Description

    Although there are thousands of languages and over a hundred alphabets in use in the world today, most cultures use the same system for displaying numbers. This system is the Arabic numeral system, also called Hindu-Arabic numerals, which uses ten digits (1, 2, 3, 4, 5, 6, 7, 8, 9 and 0) to display numerical values. This system was developed by Indian mathematicians over 1,500 years ago, and it spread west through Persian and Arab cultures. The work of the Italian mathematician Fibonacci made this system known in Europe, and it proved popular with traders as it was much easier to use when making basic calculations. This system then went global through sixteenth-century colonialism and globalization, replacing many of the previously existing numerical systems, such as the Babylonian and Mayan systems (now considered complicated to use when making calculations).

    Roman numerals. Another system that was replaced by Arabic numerals was the system of Roman numerals. This system was used from as early as 500 BCE until the late Middle Ages (which ended around 1500 CE). Like the Arabic system, the Roman system used a set number of digits in different combinations to represent different numerical values. The Roman system uses seven individual digits as the foundation of all numbers. The Roman digits with their Arabic values are: I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, and M = 1,000.

    Combinations. The major difference between Roman and Arabic numerals is the way in which the digits are combined to create different numbers. In the Roman system, displaying larger numbers requires more individual digits than in the Arabic system, making it more complicated. In modern mathematics, the value of a digit depends on its position relative to the decimal point, whereas in Roman numerals the digit I can come before or after other digits to show whether it is more or less than the other numerals, which means that the digit with the lowest value is not always the furthest to the right (for example, IX = 9, whereas XI = 11). If one of the basic numerals is shown with a line above it, the number has been multiplied by a thousand; for example, X with a line above it is equal to 10,000.
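
    The subtractive rule described above (a smaller digit written before a larger one is subtracted, so IX = 9 while XI = 11) is easy to express in code. The following is a minimal illustrative sketch in Python; it is not part of the dataset itself.

    # Illustrative sketch: convert a Roman numeral to its Arabic (decimal) value
    # using the seven basic digits listed in the description.
    ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

    def roman_to_arabic(numeral: str) -> int:
        """Sum digit values; a digit smaller than its right-hand neighbour is subtracted."""
        total = 0
        for i, ch in enumerate(numeral):
            value = ROMAN_VALUES[ch]
            if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
                total -= value  # subtractive notation, e.g. the I in IX
            else:
                total += value
        return total

    assert roman_to_arabic("IX") == 9
    assert roman_to_arabic("XI") == 11
    assert roman_to_arabic("MCMXCIV") == 1994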

  2. UV-Index in Münster | gimi9.com

    • gimi9.com
    Cite
    UV-Index in Münster | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_89258d87-8d8d-450c-aba7-967b54bdb350
    Explore at:
    Description

    The dataset "UV Index" provides information on exposure to ultraviolet (UV) radiation in the city of Münster. The UV index is a standard measure of the strength of sunburn-producing ultraviolet radiation at a specific location and time. Main characteristics of the dataset:

    • Data type: numerical values representing the UV index. A value of 0 means "no exposure"; values greater than 11 mean "extreme exposure".
    • The dataset is updated daily to reflect the current UV radiation values.
    • Available in CSV and JSON formats for analysis and integration into applications.
    • Data source: German Weather Service (DWD)
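
    Because the data are published as daily CSV/JSON files of numerical UV-index values, a small hedged sketch of reading and bucketing such a file is shown below; the file and column names ("uv_index_muenster.csv", "date", "uv_index") are assumptions, not documented field names.

    import csv

    def uv_category(value: float) -> str:
        """Map a UV-index value to the usual WHO exposure categories."""
        if value < 3:
            return "low"
        if value < 6:
            return "moderate"
        if value < 8:
            return "high"
        if value < 11:
            return "very high"
        return "extreme"

    # Hypothetical file/column names; the actual schema is defined by the Münster open-data portal.
    with open("uv_index_muenster.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            print(row["date"], uv_category(float(row["uv_index"])))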

  3. Conceptualizing relationships among hyporheic exchange, storage, and water...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 12, 2022
    Cite
    Geoffrey Poole; S. Katie Fogg; Scott O'Daniel; Ann Marie Reinhold; Samuel Carlson; Elizabeth Mohr; Hayley Oakland (2022). Conceptualizing relationships among hyporheic exchange, storage, and water age: data represented in published figures [Dataset]. http://doi.org/10.5061/dryad.m905qfv2q
    Explore at:
    zip
    Available download formats
    Dataset updated
    Jan 12, 2022
    Dataset provided by
    Montana State University
    Confederated Tribes of the Umatilla Indian Reservation
    Authors
    Geoffrey Poole; S. Katie Fogg; Scott O'Daniel; Ann Marie Reinhold; Samuel Carlson; Elizabeth Mohr; Hayley Oakland
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Hyporheic exchange is a key driver of ecosystem processes in streams, yet stream ecologists often fail to leverage detailed conceptual models developed by engineers and hydrologists describing the relationship between water storage, water balance, and water age (time elapsed since a conceptual parcel of water entered the hyporheic zone) in hyporheic zones. In a companion paper (G.C. Poole et al. Hyporheic Hydraulic Geometry: Conceptualizing relationships among hyporheic exchange, storage, and water age, published in PLoS ONE; doi:10.1371/journal.pone.0262080), we provide visualizations of these relationships in an effort to allow non-hydrologists to grasp four primary concepts along with associated research and management implications: 1) the rate of hyporheic exchange, size of the hyporheic zone, and hyporheic water age are inexorably linked; 2) such linkages can be leveraged to build understanding of hyporheic processes; 3) the age distribution of hyporheic water and hyporheic discharge is heavily skewed toward young water ages -- at any temporal scale of observation (minutes, hours, days, or months) older hyporheic water is rare relative to younger water; 4) the age distribution of water discharged from any hyporheic zone is not the same as the age distribution of water stored within that hyporheic zone. The data set presented here represents the numerical values represented by the figures published in the companion paper.

    Methods Data used to support "nautilus plots" in the companion paper were calculated using the equations described in the companion paper.

    Data describing the behavior of a conservative tracer in an annular flume were derived from a salt slug release into an experimental flume, as described in the companion paper.

    Data describing the relationship between hyporheic water temperature, water age, and day of year represent a synthetic data set built on relationships between hyporheic water age and temperature along an idealized hyporheic flow path using relationships described in: Helton AM, Poole GC, Payn RA, Izurieta C, Stanford JA. Scaling flow path processes to fluvial landscapes: An integrated field and model assessment of temperature and dissolved oxygen dynamics in a river-floodplain-aquifer system. Journal of Geophysical Research: Biogeosciences. 2012;117(G4). doi:10.1029/2012JG002025.

    Data comparing stream temperature, mean temperature of aquifer discharge, and mean temperature of stored hyporheic water were calculated using equations published in the companion paper.

  4. CANCER DATA - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 7, 2024
    + more versions
    Cite
    (2024). CANCER DATA - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/cancer-data
    Explore at:
    Dataset updated
    Oct 7, 2024
    Description

    Breast Cancer Data Set. This dataset contains the characteristics of patients diagnosed with cancer: a unique ID for each patient, the type of cancer (diagnosis), the visual characteristics of the cancer, and the average values of these characteristics. 📚 The main features of the dataset are as follows:

    • id: represents a unique ID for each patient.
    • diagnosis: indicates the type of cancer. This property can take the values "M" (malignant) or "B" (benign).
    • radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean: represent the mean values of the cancer's visual characteristics.

    There are also several categorical features where patients in the dataset are labeled with numerical values. You can examine them in the Chart area.
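
    A short pandas sketch of how the listed columns might be inspected; the file name "cancer-data.csv" is an assumption, while the column names follow the description above.

    import pandas as pd

    # Hypothetical file name; columns follow the description (id, diagnosis, *_mean features).
    df = pd.read_csv("cancer-data.csv")

    # Encode the diagnosis label: M = malignant, B = benign.
    df["diagnosis_code"] = df["diagnosis"].map({"M": 1, "B": 0})

    # Per-class averages of the visual characteristics.
    mean_cols = [c for c in df.columns if c.endswith("_mean")]
    print(df.groupby("diagnosis")[mean_cols].mean())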

  5. Data from: Black Friday Sale

    • kaggle.com
    zip
    Updated Dec 24, 2022
    + more versions
    Cite
    ReLearn (2022). Black Friday Sale [Dataset]. https://www.kaggle.com/rajeshrampure/black-friday-sale
    Explore at:
    zip (5744184 bytes)
    Available download formats
    Dataset updated
    Dec 24, 2022
    Authors
    ReLearn
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset History

    A retail company “ABC Private Limited” wants to understand the customer purchase behavior (specifically, purchase amount) against various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city type, stay_in_current_city), product details (product_id and product category), and the total purchase amount from last month.

    Now, they want to build a model to predict the purchase amount of customers against various products which will help them to create a personalized offer for customers against different products.

    Tasks to perform

    The Purchase column is the target variable; perform univariate analysis and bivariate analysis with respect to Purchase.

    “Masked” in the column description means the values have already been converted from categorical to numerical.

    Below mentioned points are just given to get you started with the dataset, not mandatory to follow the same sequence.

    DATA PREPROCESSING

    • Check basic statistics of the dataset

    • Check for missing values in the data

    • Check for unique values in data

    Perform EDA

    • Purchase Distribution

    • Check for outliers

    • Analysis by Gender, Marital Status, occupation, occupation vs purchase, purchase by city, purchase by age group, etc

    • Drop unnecessary fields

    • Convert categorical data into integers using the map function (e.g. the 'Gender' column)

    • Missing value treatment

    • Rename columns

    • Fill nan values

    • Map range variables into integers (e.g. the 'Age' column)

    Data Visualisation

    • visualize an individual column
    • Age vs Purchased
    • Occupation vs Purchased
    • Product_Category_1 vs Purchased
    • Product_Category_2 vs Purchased
    • Product_Category_3 vs Purchased
    • City category pie chart
    • Check for more possible plots
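
    A hedged pandas sketch of the preprocessing steps listed above; the file name and exact column spellings are assumptions based on the commonly shared Kaggle version of this dataset.

    import pandas as pd

    # Hypothetical file name; column names assume the usual Kaggle Black Friday schema.
    df = pd.read_csv("black_friday_train.csv")

    # Basic statistics, missing values, unique values.
    print(df.describe())
    print(df.isna().sum())
    print(df.nunique())

    # Map categorical/range variables to integers, as suggested above.
    df["Gender"] = df["Gender"].map({"F": 0, "M": 1})
    age_order = {"0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
                 "46-50": 4, "51-55": 5, "55+": 6}
    df["Age"] = df["Age"].map(age_order)

    # Fill NaN values in the sparse product-category columns and drop an unneeded field.
    df[["Product_Category_2", "Product_Category_3"]] = (
        df[["Product_Category_2", "Product_Category_3"]].fillna(0)
    )
    df = df.drop(columns=["User_ID"])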

    All the Best!!

  6. Z

    TrackML Throughput Phase

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated May 10, 2021
    Cite
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, david; Gligorov, Vladimir; Basara, Laurent; Estrade, Victor; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Hushchyn, Mikhail; Ustyuzhanin, Andrey; Moyse, Edward; Germain, Cecile; Guyon, Isabelle (2021). TrackML Throughput Phase [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730156
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    CERN
    California Institute of Technology
    INRIA
    LBNL
    School of Data Analysis
    University of Massachusetts
    CNRS
    University of Geneva
    Authors
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, david; Gligorov, Vladimir; Basara, Laurent; Estrade, Victor; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Hushchyn, Mikhail; Ustyuzhanin, Andrey; Moyse, Edward; Germain, Cecile; Guyon, Isabelle
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Original source from Codalab : https://competitions.codalab.org/competitions/20112

    The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in a collision between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements or hits for each event into tracks, sets of hits that belong to the same initial particle. A solution must uniquely associate each hit to one track. The training dataset contains the recorded hits, their ground truth counterpart and their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.

    Once unzipped, the dataset is provided as a set of plain .csv files. Each event has four associated files that contain hits, hit cells, particles, and the ground truth association between them. The common prefix, e.g. event000000010, is always event followed by 9 digits.

    event000000000-hits.csv

    event000000000-cells.csv

    event000000000-particles.csv

    event000000000-truth.csv

    event000000001-hits.csv

    event000000001-cells.csv

    event000000001-particles.csv

    event000000001-truth.csv

    Event hits

    The hits file contains the following values for each hit/entry:

    hit_id: numerical identifier of the hit inside the event.

    x, y, z: measured x, y, z position (in millimeter) of the hit in global coordinates.

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    The volume/layer/module id could in principle be deduced from x, y, z. They are given here to simplify detector-specific data handling.

    Event truth

    The truth file contains the mapping between hits and generating particles and the true particle state at each measured hit. Each entry maps one hit to one particle.

    hit_id: numerical identifier of the hit as defined in the hits file.

    particle_id: numerical identifier of the generating particle as defined in the particles file. A value of 0 means that the hit did not originate from a reconstructible particle, but e.g. from detector noise.

    tx, ty, tz: true intersection point in global coordinates (in millimeters) between the particle trajectory and the sensitive surface.

    tpx, tpy, tpz: true particle momentum (in GeV/c) in the global coordinate system at the intersection point. The corresponding vector is tangent to the particle trajectory at the intersection point.

    weight: per-hit weight used for the scoring metric; the total sum of weights within one event equals one.

    Event particles

    The particles file contains the following values for each particle/entry:

    particle_id: numerical identifier of the particle inside the event.

    vx, vy, vz: initial position or vertex (in millimeters) in global coordinates.

    px, py, pz: initial momentum (in GeV/c) along each global axis.

    q: particle charge (as multiple of the absolute electron charge).

    nhits: number of hits generated by this particle.

    All entries contain the generated information or ground truth.

    Event hit cells

    The cells file contains the constituent active detector cells that comprise each hit. The cells can be used to refine the hit to track association. A cell is the smallest granularity inside each detector module, much like a pixel on a screen, except that depending on the volume_id a cell can be a square or a long rectangle. It is identified by two channel identifiers that are unique within each detector module and encode the position, much like column/row numbers of a matrix. A cell can provide signal information that the detector module has recorded in addition to the position. Depending on the detector type only one of the channel identifiers is valid, e.g. for the strip detectors, and the value might have different resolution.

    hit_id: numerical identifier of the hit as defined in the hits file.

    ch0, ch1: channel identifier/coordinates unique within one module.

    value: signal value information, e.g. how much charge a particle has deposited.
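
    A brief sketch of reading one event and joining hits to their ground-truth particles via hit_id (plain pandas; the official trackml helper library is not assumed here):

    import pandas as pd

    # Read the per-event files described above (prefix for the first event).
    prefix = "event000000000"
    hits = pd.read_csv(f"{prefix}-hits.csv")
    truth = pd.read_csv(f"{prefix}-truth.csv")
    particles = pd.read_csv(f"{prefix}-particles.csv")

    # Join hits to their generating particle; particle_id == 0 marks noise hits.
    hits_truth = hits.merge(truth, on="hit_id")
    noise = hits_truth["particle_id"] == 0
    print(f"{noise.sum()} noise hits out of {len(hits_truth)}")

    # Group the remaining hits into ground-truth tracks.
    tracks = hits_truth[~noise].groupby("particle_id")["hit_id"].apply(list)
    print(tracks.head())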

    Additional detector geometry information

    The detector is built from silicon slabs (or modules, rectangular or trapezoidal), arranged in cylinders and disks, which measure the position (or hits) of the particles that cross them. The detector modules are organized into detector groups or volumes identified by a volume id. Inside a volume they are further grouped into layers identified by a layer id. Each layer can contain an arbitrary number of detector modules, the smallest geometrically distinct detector object, each identified by a module_id. Within each group, detector modules are of the same type and have, e.g., the same granularity. All simulated detector modules are so-called semiconductor sensors that are built from thin silicon sensor chips. Each module can be represented by a two-dimensional, planar, bounded sensitive surface. These sensitive surfaces are subdivided into regular grids that define the detector cells, the smallest granularity within the detector.

    Each module has a different position and orientation described in the detectors file. A local, right-handed coordinate system is defined on each sensitive surface such that the first two coordinates u and v are on the sensitive surface and the third coordinate w is normal to the surface. The orientation and position are defined by the following transformation

    pos_xyz = rotation_matrix * pos_uvw + translation

    that transforms a position described in local coordinates u,v,w into the equivalent position x,y,z in global coordinates using a rotation matrix and a translation vector (cx,cy,cz).

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    cx, cy, cz: position of the local origin in the global coordinate system (in millimeter).

    rot_xu, rot_xv, rot_xw, rot_yu, ...: components of the rotation matrix to rotate from local u,v,w to global x,y,z coordinates.

    module_t: half thickness of the detector module (in millimeter).

    module_minhu, module_maxhu: the minimum/maximum half-length of the module boundary along the local u direction (in millimeter).

    module_hv: the half-length of the module boundary along the local v direction (in millimeter).

    pitch_u, pitch_v: the size of detector cells along the local u and v direction (in millimeter).

    There are two different module shapes in the detector, rectangular and trapezoidal. The pixel detector (with volume_id = 7,8,9) is fully built from rectangular modules, and so are the cylindrical barrels in volume_id = 13,17. The remaining layers are made out of disks that need trapezoidal shapes to cover the full disk.
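
    The local-to-global transformation above is a plain affine map. A hedged sketch follows; the geometry file name ("detectors.csv") and the rot_z* column names are assumptions implied by the column listing.

    import numpy as np
    import pandas as pd

    detectors = pd.read_csv("detectors.csv")  # assumed file name for the geometry table

    def local_to_global(module_row, u, v, w=0.0):
        """Apply pos_xyz = rotation_matrix * pos_uvw + translation for one module row."""
        rot = np.array([
            [module_row["rot_xu"], module_row["rot_xv"], module_row["rot_xw"]],
            [module_row["rot_yu"], module_row["rot_yv"], module_row["rot_yw"]],
            [module_row["rot_zu"], module_row["rot_zv"], module_row["rot_zw"]],
        ])
        translation = np.array([module_row["cx"], module_row["cy"], module_row["cz"]])
        return rot @ np.array([u, v, w]) + translation

    # Example: global position of a point on the first module's surface.
    print(local_to_global(detectors.iloc[0], u=1.0, v=2.0))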

  7. ERA5 monthly averaged data on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Nov 6, 2025
    Cite
    ECMWF (2025). ERA5 monthly averaged data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.f17050d7
    Explore at:
    grib
    Available download formats
    Dataset updated
    Nov 6, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

    Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

    ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

    ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). In case serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. If this occurs, users are notified.

    The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub-sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 monthly mean data on single levels from 1940 to present".
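
    Monthly-mean single-level fields can be retrieved programmatically from the Climate Data Store with the cdsapi client. The sketch below is hedged: the dataset identifier and request keys follow common CDS conventions but should be checked against the catalogue entry, and a CDS account with ~/.cdsapirc credentials is required.

    import cdsapi

    c = cdsapi.Client()

    # Assumed request keys; adjust variables, years and output format to the catalogue entry.
    c.retrieve(
        "reanalysis-era5-single-levels-monthly-means",
        {
            "product_type": "monthly_averaged_reanalysis",
            "variable": ["2m_temperature", "total_precipitation"],
            "year": "2020",
            "month": [f"{m:02d}" for m in range(1, 13)],
            "time": "00:00",
            "format": "grib",
        },
        "era5_monthly_2020.grib",
    )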

  8. Vocalizations of the squirrel family

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Jul 6, 2020
    Cite
    Sasha Newar; Jeff Bowman (2020). Vocalizations of the squirrel family [Dataset]. http://doi.org/10.5061/dryad.vt4b8gtpm
    Explore at:
    zip
    Available download formats
    Dataset updated
    Jul 6, 2020
    Dataset provided by
    Ministry of Natural Resources and Forestry
    Trent University
    Authors
    Sasha Newar; Jeff Bowman
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The dataset Squirrel_Calls is a collection of vocal records (defined as primary literature that numerically describes the vocalization of at least 1 squirrel species) where each row corresponds to a single call type of one species. The details of the row include a summary of the literature metadata, categorical descriptions of the call and the caller as well as numerical values of the call frequencies. The dataset Squirrel_Ecological_Traits is a corresponding set of ecological traits for all the species listed in the Squirrel_Calls dataset. The traits listed (mass, time partitioning, gliding capabilities, habitat, and sociality) reflect hypotheses and predictions explored in the associated article. At the end of this document, there is a complete list of the literature references used to assemble these datasets. Squirrel_Script is the R script used to produce the statistics and models used in the corresponding paper. Squirrel_Tree is a nexus file compiling the data of 1000 trees downloaded from VertLife.org which were subsetted from their published mammalian supertree. The nexus file was used in the R script.

    Methods We developed a database beginning with a list of publications that described the vocalizations of squirrels. The minimum requirement for each publication was the description of at least one call with either a spectrographic analysis or numerical data, though the majority of publications described multiple calls per species or described multiple species per publication (493 calls from 72 species represented in 89 publications). The databases used to search for these publications were Google Scholar, JSTOR, Web of Science, and Wiley Online Library. We used the keywords acoustics, acoustic repertoire, calls, frequency, Hz, vocalizations, and ultrasound paired with Sciuridae, squirrel, or an exhaustive list of currently valid and invalid genera (the most updated nomenclature was taken from the Integrated Taxonomic Information System http://www.itis.gov/). For each call described in the selected publication, the following characteristics were taken: the fundamental frequency (F0: the mean frequency of the primary vibrational frequency of the vocal membrane; kHz), dominant frequency (FDom: the frequency with the greatest energy, power or amplitude; kHz), minimum frequency (FMin: the minimum frequency of the fundamental frequency; kHz), maximum frequencies (FMax: the maximum frequency of the fundamental frequency (or of harmonic on which FDom is measured); kHz), and the highest visible harmonic (FHarm: mean frequency of the highest complete harmonic visible on the spectrograph; kHz).

    Once our review of vocalization publications was complete, we searched for the body mass (g), diel activity pattern (diurnal or nocturnal), social complexity, and habitat openness of the dominant habitat (open or closed) of each species from the relevant vocalization papers. If not provided, other resources including Mammalian Species accounts, PanTHERIA (Jones et al. 2009), and the Animal Diversity Web (Myers et al. 2020) were reviewed. Both male and female body masses were initially recorded, but male body size could not be found for Spermophilus taurensis. Male and female body mass were strongly correlated (r = 0.98, p < 0.001), therefore female body mass was chosen to represent squirrel body size. Because we could only assign an adult female body mass to all species, calls that are exclusively produced by males or pups were removed from the dataset before analysis. We pooled all other calls (calls produced by both sexes or females only as well as calls produced by juveniles and adults) as there is little evidence to suggest that juveniles and adults produce acoustically distinct calls across the family (Matrosova et al. 2007, 2011; Schneiderová 2012; Volodina, Matrosova, and Volodin 2010; but see: Nikol’skii 2007). While the initial database included a five-tiered social classification ranging from solitary to colonial (based on the social grades of ground squirrels described by Matějů et al. (2016)), social classes were reduced to social or solitary living to reduce model parameters. Species that exhibit dynamic social structures, such as flying squirrels that engage in social nesting to a greater extent during one portion of the year (Garroway, Bowman, and Wilson 2013), were treated as socially living. Two subspecies (Marmota baibacina centralis and Tamias dorsalis dorsalis) could not be used in the subsequent analyses because ecological data and body mass-specific to each subspecies could not be found; similarly, the species Spermophilus pallidicauda could not be included as body mass for either sex could not be found.

    Phylogeny

    VertLife, an online resource that allows the user to extract pruned trees from vertebrate supertrees, was used to produce 100 pruned trees from the Mammalian supertree (Upham, Esselstyn, and Jetz 2019). Three subspecies had to be incorporated under their parent species, so branch tips were broken in two and subspecies were treated as equivalent to parent species, with branch lengths identical between the parent and subspecies (the addition of a subspecies did not create any polytomies in the tree). Three species are represented by subspecies only: Sciurus aberti kaibensis, Sciurus niger rufiventer, and Callosciurus erythraeus thaiwanensis.

    Statistical Analysis

    Phylogenetic generalized least square (PGLS) modelling was used to account for the variation in acoustic repertoire that may be explained by phylogenetic relatedness. PGLS models produce a lambda parameter, λ, that represents the degree to which the variance of traits is explained by the phylogenetic relationships in the model. The λ parameter varies between 0 and 1, with 0 representing no phylogenetic trace and 1 representing absolute Brownian motion (Freckleton, Harvey, and Pagel 2002; Martin et al. 2016).

    PGLS modelling restricts each species to a single observation (i.e., no subsampling of species permitted). Therefore, the numerous data entries per species had to be reduced. For the fundamental, dominant, maximum, and highest harmonic frequencies, the absolute maximum value for each characteristic reported among all publications was chosen. Likewise, for minimum frequency, the absolute minimum reported frequency was chosen. We use maximum and minimum values rather than the median for a more rigorous test of our hypothesis about method limits.
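
    The per-species reduction described here (one row per species, absolute maxima for most frequency characteristics and the absolute minimum for the minimum frequency) is straightforward in pandas. Below is a hedged sketch with assumed column names ("species", "F0", "FDom", "FMax", "FHarm", "FMin"); the actual headers are defined in the Squirrel_Calls file.

    import pandas as pd

    calls = pd.read_csv("Squirrel_Calls.csv")  # assumed CSV export of the dataset

    per_species = calls.groupby("species").agg(
        F0=("F0", "max"),        # fundamental frequency: absolute maximum across publications
        FDom=("FDom", "max"),    # dominant frequency
        FMax=("FMax", "max"),    # maximum frequency
        FHarm=("FHarm", "max"),  # highest visible harmonic
        FMin=("FMin", "min"),    # minimum frequency: absolute minimum
    )
    print(per_species.head())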

    Body mass and all frequency characteristics were log-transformed to achieve normal distributions. Additive models were built for each frequency type (β0 + body mass (βMass) + diel activity pattern (βDiel) + sociality (βSociality) + habitat openness (βOpen) + method limits (βLim)) using the caper package in R (ver 3.6.2). We reported the test statistics of the regression to evaluate significance and effect size (F-statistic, p-value, and adjusted R2).

  9. 2000-2015: Monthly Means of Daily Means Wind Speed TIFFs for the Lake...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Oct 11, 2016
    Cite
    Harvard Dataverse (2016). 2000-2015: Monthly Means of Daily Means Wind Speed TIFFs for the Lake Victoria Region [Dataset]. http://doi.org/10.7910/DVN/FQBKX5
    Explore at:
    Dataset updated
    Oct 11, 2016
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    2000-2015: Monthly Means of Daily Means Wind Speed TIFFs for the Lake Victoria Region

    Reference System: SR-ORG:14

    Developed for: NSF Award Number: 1518532, CNH-L: The Potential for Aquaculture in Lake Victoria and Implications for Wild Fisheries and Fish Commodity Markets

    File Naming Convention: All TIFFs are named under the following naming convention: YYYY_MMWS, where YYYY refers to the year of the data, MM refers to the month, and WS refers to Wind Speed. This wind speed data set was developed for the purposes of National Science Foundation (NSF) Coupled Natural and Human systems research.

    Data Origin: The original dataset was downloaded from http://apps.ecmwf.int/datasets/data/interim-full-moda/levtype=sfc/. The original data was ERA-Interim, Monthly Means of Daily Means, developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). Data was downloaded as single-component 10 meter Wind Speed data packaged as NetCDF. Data values are measured in m/s, are monthly means of daily means, and represent surface-level conditions. The origin data was derived and developed from reanalysis. ECMWF, the data provider and developer, defines their method of reanalysis as follows: “Reanalysis (as well as analysis) is a process by which model information and observations of many different sorts are combined in an optimal way to produce a consistent, global best estimate of the various atmospheric, wave and oceanographic parameters.” http://www.ecmwf.int/en/how-are-data-obtained-do-they-come-observations-or-have-they-been-derived-numerical-models

    Extent downloaded (degrees): N: 2, W: 27, S: -5, E: 36. Resolution downloaded (degrees): 0.125 x 0.125 (approx. 14 km at the equator).

    Data Development/Processing: These data were downloaded individually as monthly 10 meter Wind Speed in NetCDF format, then converted to TIFF using the following GDAL script:

    for %A in ("C:\temp*.nc") do gdal_translate -of GTiff -ot FLOAT32 -a_srs "+init=epsg:4326" -unscale -co "COMPRESS=PACKBITS" "%A" "%A.tif"

    The converted TIFF data was validated against the parent NetCDF file for correct cell size and pixel values. To use this data, cite: http://onlinelibrary.wiley.com/doi/10.1002/qj.828/abstract
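
    The batch conversion above can equally be scripted through the GDAL Python bindings. A hedged sketch mirroring the same options (the source directory is assumed from the description):

    import glob
    from osgeo import gdal

    # Mirror the gdal_translate options used above: GeoTIFF, Float32, EPSG:4326,
    # unscaling of packed NetCDF values, PACKBITS compression.
    for src in glob.glob(r"C:\temp\*.nc"):
        gdal.Translate(
            src + ".tif",
            src,
            format="GTiff",
            outputType=gdal.GDT_Float32,
            outputSRS="EPSG:4326",
            unscale=True,
            creationOptions=["COMPRESS=PACKBITS"],
        )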

  10. Wine Quality Data Set (Red & White Wine)

    • kaggle.com
    zip
    Updated Nov 3, 2021
    Cite
    ruthgn (2021). Wine Quality Data Set (Red & White Wine) [Dataset]. https://www.kaggle.com/datasets/ruthgn/wine-quality-data-set-red-white-wine
    Explore at:
    zip (100361 bytes)
    Available download formats
    Dataset updated
    Nov 3, 2021
    Authors
    ruthgn
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Set Information

    This data set contains records related to red and white variants of the Portuguese Vinho Verde wine. It contains information from 1599 red wine samples and 4898 white wine samples. Input variables in the data set consist of the type of wine (either red or white wine) and metrics from objective tests (e.g. acidity levels, pH values, ABV, etc.), while the target/output variable is a numerical score based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Due to privacy and logistic issues, there is no data about grape types, wine brand, and wine selling price.

    This data set is a combined version of the two separate files (distinct red and white wine data sets) originally shared in the UCI Machine Learning Repository.

    The following are some existing data sets on Kaggle from the same source (with notable differences from this data set): - Red Wine Quality (contains red wine data only) - Wine Quality (combination of red and white wine data but with some values randomly removed) - Wine Quality (red and white wine data not combined)

    Contents

    Input variables:

    1 - type of wine: type of wine (categorical: 'red', 'white')

    (continuous variables based on physicochemical tests)

    2 - fixed acidity: The acids that naturally occur in the grapes used to ferment the wine and carry over into the wine. They mostly consist of tartaric, malic, citric or succinic acid that mostly originate from the grapes used to ferment the wine. They also do not evaporate easily. (g / dm^3)

    3 - volatile acidity: Acids that evaporate at low temperatures—mainly acetic acid which can lead to an unpleasant, vinegar-like taste at very high levels. (g / dm^3)

    4 - citric acid: Citric acid is used as an acid supplement which boosts the acidity of the wine. It's typically found in small quantities and can add 'freshness' and flavor to wines. (g / dm^3)

    5 - residual sugar: The amount of sugar remaining after fermentation stops. It's rare to find wines with less than 1 gram/liter. Wines with a residual sugar level greater than 45 grams/liter are considered sweet. On the other end of the spectrum, a wine that does not taste sweet is considered dry. (g / dm^3)

    6 - chlorides: The amount of chloride salts (sodium chloride) present in the wine. (g / dm^3)

    7 - free sulfur dioxide: The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine. All else constant, the higher the free sulfur dioxide content, the stronger the preservative effect. (mg / dm^3)

    8 - total sulfur dioxide: The amount of free and bound forms of S02; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine. (mg / dm^3)

    9 - density: The density of wine juice depending on the percent alcohol and sugar content; it's typically similar but higher than that of water (wine is 'thicker'). (g / cm^3)

    10 - pH: A measure of the acidity of wine; most wines are between 3-4 on the pH scale. The lower the pH, the more acidic the wine is; the higher the pH, the less acidic the wine. (The pH scale technically is a logarithmic scale that measures the concentration of free hydrogen ions floating around in your wine. Each point of the pH scale is a factor of 10. This means a wine with a pH of 3 is 10 times more acidic than a wine with a pH of 4)

    11 - sulphates: Amount of potassium sulphate as a wine additive which can contribute to sulfur dioxide gas (S02) levels; it acts as an antimicrobial and antioxidant agent. (g / dm^3)

    12 - alcohol: How much alcohol is contained in a given volume of wine (ABV). Wine generally contains between 5–15% of alcohols. (% by volume)

    Output variable:

    13 - quality: score between 0 (very bad) and 10 (very excellent) by wine experts
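
    A short sketch of loading the combined file and fitting a baseline model on the variables listed above; the file name is an assumption, and scikit-learn is used purely for illustration.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Assumed file name; 'type' and 'quality' columns as described above.
    wine = pd.read_csv("wine-quality-red-and-white.csv")
    wine["type"] = wine["type"].map({"red": 0, "white": 1})

    X = wine.drop(columns=["quality"])
    y = wine["quality"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    print("R^2 on held-out samples:", model.score(X_test, y_test))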

    Acknowledgements

    Source: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Data credit goes to UCI. Visit their website to access the original data set directly: https://archive.ics.uci.edu/ml/datasets/wine+quality

    Context

    So much about wine making remains elusive—taste is very subjective, making it extremely challenging to predict exactly how consumers will react to a certain bottle of wine. There is no doubt that winemakers, connoisseurs, and scientists have greatly contributed their expertise to ...

  11. Supporting information – data for plots.

    • figshare.com
    xlsx
    Updated Nov 21, 2025
    Cite
    Maria Nadinskaia; Kseniya Gulyaeva; Aleksandr Sukhinin; Alla Sedova; Polina Boykova; Ilya Izmailov; Ksenia Pokidova; Egor Kuzmin; Artem Venediktov; Igor Meglinski; Gennadii Piavchenko (2025). Supporting information – data for plots. [Dataset]. http://doi.org/10.1371/journal.pone.0337178.s001
    Explore at:
    xlsx
    Available download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Maria Nadinskaia; Kseniya Gulyaeva; Aleksandr Sukhinin; Alla Sedova; Polina Boykova; Ilya Izmailov; Ksenia Pokidova; Egor Kuzmin; Artem Venediktov; Igor Meglinski; Gennadii Piavchenko
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file provides the numerical data from the experimental study. Sheet “Blood ammonia” includes data on ammonia measurement, with post hoc sample size re-estimation with respect to these results. Sheet “Mass or organs” presents data from weighing the liver and spleen immediately after the animals were sacrificed, with post hoc sample size re-estimation for this parameter. Sheet “Morphometric values” shows the means of numerical data for the records of cell counts in the morphometric study on histological slides of skeletal muscles, spleen, and cerebral cortex. Sheet “Pilot, sample size estimation” provides details about the initial calculations for the primary definition of the sample size. (XLSX)
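
    Since the file is a multi-sheet XLSX, the sheets named above can be read individually with pandas; the file name below is inferred from the DOI and is an assumption, while the sheet names are copied from the description.

    import pandas as pd

    sheets = pd.read_excel(
        "pone.0337178.s001.xlsx",  # assumed local file name for the S1 supporting-information file
        sheet_name=["Blood ammonia", "Mass or organs",
                    "Morphometric values", "Pilot, sample size estimation"],
    )
    for name, frame in sheets.items():
        print(name, frame.shape)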

  12. Population by current activity status, educational attainment level and NUTS...

    • ec.europa.eu
    Updated Nov 26, 2025
    Cite
    Eurostat (2025). Population by current activity status, educational attainment level and NUTS 2 region [Dataset]. http://doi.org/10.2908/CENS_11AED_R2
    Explore at:
    application/vnd.sdmx.genericdata+xml;version=2.1, tsv, json, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.data+csv;version=1.0.0, application/vnd.sdmx.data+csv;version=2.0.0
    Available download formats
    Dataset updated
    Nov 26, 2025
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2011
    Area covered
    Sachsen-Anhalt, Övre Norrland, Severna i Yugoiztochna Bulgaria, Hrvatska, Liechtenstein, Merseyside (NUTS 2021), Südösterreich, Alentejo (NUTS 2021), European Free Trade Association, Noord-Holland
    Description

    The 2011 Population and Housing Census marks a milestone in census exercises in Europe. For the first time, European legislation defined in detail a set of harmonised high-quality data from the population and housing censuses conducted in the EU Member States. As a result, the data from the 2011 round of censuses offer exceptional flexibility to cross-tabulate different variables and to provide geographically detailed data.

    EU Member States have developed different methods to produce these census data. The national differences reflect the specific national situations in terms of data source availability, as well as the administrative practices and traditions of that country.

    The EU census legislation respects this diversity. The Regulation of the European Parliament and of the Council on population and housing censuses (Regulation (EC) No 763/2008) is focussed on output harmonisation rather than input harmonisation. Member States are free to assess for themselves how to conduct their 2011 censuses and which data sources, methods and technology should be applied given the national context. This gives the Member States flexibility, in line with the principles of subsidiarity and efficiency, and with the competences of the statistical institutes in the Member States.

    However, certain important conditions must be met in order to achieve the objective of comparability of census data from different Member States and to assess the data quality:

    Regulation (EC) No 1201/2009 contains definitions and technical specifications for the census topics (variables) and their breakdowns that are required to achieve Europe-wide comparability.

    The specifications are based closely on international recommendations and have been designed to provide the best possible information value. The census topics include geographic, demographic, economic and educational characteristics of persons, international and internal migration characteristics as well as household, family and housing characteristics.

    Regulation (EU) No 519/2010 requires the data outputs that Member States transmit to the Eurostat to comply with a defined programme of statistical data (tabulation) and with set rules concerning the replacement of statistical data. The content of the EU census programme serves major policy needs of the European Union. Regionally, there is a strong focus on the NUTS 2 level. The data requirements are adapted to the level of regional detail. The Regulation does not require transmission of any data that the Member States consider to be confidential.

    The statistical data must be completed by metadata that will facilitate interpretation of the numerical data, including country-specific definitions plus information on the data sources and on methodological issues. This is necessary in order to achieve the transparency that is a condition for valid interpretation of the data.

    Users of output-harmonised census data from the EU Member States need to have detailed information on the quality of the censuses and their results.

    Regulation (EU) No 1151/2010 therefore requires transmission of a quality report containing a systematic description of the data sources used for census purposes in the Member States and of the quality of the census results produced from these sources. A comparably structured quality report for all EU Member States will support the exchange of experience from the 2011 round and become a reference for future development of census methodology (EU legislation on the 2011 Population and Housing Censuses - Explanatory Notes).

    In order to ensure proper transmission of the data and metadata and provide user-friendly access to this information, a common technical format is set for transmission for all Member States and for the Commission (Eurostat). The Regulation therefore requires the data to be transmitted in a harmonised structure and in the internationally established SDMX format from every Member State. In order to achieve this harmonised transmission, a new system has been developed – the CENSUS HUB.

    The Census Hub is a conceptually new system used for the dissemination of the 2011 Census. It is based on the concept of data sharing, where a group of partners (Eurostat on one hand and National Statistical Institutes on the other) agree to provide access to their data according to standard processes, formats and technologies.

    The Census Hub is a readily accessible system that provides the following functions:

    • Data providers (the NSIs) can make data available directly from their systems through a querying system. In parallel,
    • Data users browse the hub to define a dataset of interest via the above structural metadata and retrieve the dataset from the NSIs.

    From the data management point of view, the hub is based on agreed hypercubes (data-sets in the form of multi-dimensional aggregations). The hypercubes are not sent to the central system. Instead the following process operates:

    1. a user defines a dataset through the web interface of the central hub and requests it;
    2. the central hub translates the user request in one or more queries and sends them to the related NSIs’ systems;
    3. NSIs’ systems process the query and send the result to the central hub in a standard format;
    4. the central hub puts together all the results sent by the NSI systems and presents them in a user-specified format.
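
    Because the hypercubes are disseminated in SDMX, this table can in principle also be pulled over Eurostat's SDMX 2.1 REST API. The sketch below is a hedged illustration: the endpoint pattern and the availability of this particular census cube through the dissemination API are assumptions; CENS_11AED_R2 is the dataset code from the DOI above.

    import requests

    # Assumed SDMX 2.1 dissemination endpoint; verify the dataset code and parameters on the Eurostat site.
    url = ("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
           "CENS_11AED_R2?format=TSV&compressed=false")
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()

    # The TSV response carries one series per row; dimension values are comma-separated in the first column.
    with open("cens_11aed_r2.tsv", "w", encoding="utf-8") as f:
        f.write(resp.text)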

  13. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated, as was the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table.

    The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added or changed, most of the subsequent tables are automatically recalculated and the graphs are updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship.

    The dataset includes not only tabular data, but also charts that provide data visualization. It contains not only actual but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in the risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and following the second wave of the pandemic to check the reliability of earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.

  14. Population by group of country of birth, current activity status and NUTS 2...

    • ec.europa.eu
    Updated Oct 10, 2025
    Cite
    Eurostat (2025). Population by group of country of birth, current activity status and NUTS 2 region [Dataset]. http://doi.org/10.2908/CENS_11COBA_R2
    Explore at:
    application/vnd.sdmx.data+csv;version=2.0.0, json, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.data+csv;version=1.0.0, application/vnd.sdmx.genericdata+xml;version=2.1, tsv
    Available download formats
    Dataset updated
    Oct 10, 2025
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2011
    Area covered
    Macroregiunea Trei, Cumbria (NUTS 2021), Nordjylland, Västsverige, Sud-Ouest (FR) (NUTS 2013), Noord-Holland, Severozapaden Planning Region, Comunidad Foral de Navarra, Calabria, Prov. West-Vlaanderen
    Description

    The 2011 Population and Housing Census marks a milestone in census exercises in Europe. For the first time, European legislation defined in detail a set of harmonised high-quality data from the population and housing censuses conducted in the EU Member States. As a result, the data from the 2011 round of censuses offer exceptional flexibility to cross-tabulate different variables and to provide geographically detailed data.

    EU Member States have developed different methods to produce these census data. The national differences reflect the specific national situations in terms of data source availability, as well as the administrative practices and traditions of that country.

    The EU census legislation respects this diversity. The Regulation of the European Parliament and of the Council on population and housing censuses (Regulation (EC) No 763/2008) is focussed on output harmonisation rather than input harmonisation. Member States are free to assess for themselves how to conduct their 2011 censuses and which data sources, methods and technology should be applied given the national context. This gives the Member States flexibility, in line with the principles of subsidiarity and efficiency, and with the competences of the statistical institutes in the Member States.

    However, certain important conditions must be met in order to achieve the objective of comparability of census data from different Member States and to assess the data quality:

    Regulation (EC) No 1201/20092 contains definitions and technical specifications for the census topics (variables) and their breakdowns that are required to achieve Europe-wide comparability.

    The specifications are based closely on international recommendations and have been designed to provide the best possible information value. The census topics include geographic, demographic, economic and educational characteristics of persons, international and internal migration characteristics as well as household, family and housing characteristics.

    Regulation (EU) No 519/2010 requires the data outputs that Member States transmit to the Eurostat to comply with a defined programme of statistical data (tabulation) and with set rules concerning the replacement of statistical data. The content of the EU census programme serves major policy needs of the European Union. Regionally, there is a strong focus on the NUTS 2 level. The data requirements are adapted to the level of regional detail. The Regulation does not require transmission of any data that the Member States consider to be confidential.

    The statistical data must be completed by metadata that will facilitate interpretation of the numerical data, including country-specific definitions plus information on the data sources and on methodological issues. This is necessary in order to achieve the transparency that is a condition for valid interpretation of the data.

    Users of output-harmonised census data from the EU Member States need to have detailed information on the quality of the censuses and their results.

    Regulation (EU) No 1151/2010) therefore requires transmission of a quality report containing a systematic description of the data sources used for census purposes in the Member States and of the quality of the census results produced from these sources. A comparably structured quality report for all EU Member States will support the exchange of experience from the 2011 round and become a reference for future development of census methodology (EU legislation on the 2011 Population and Housing Censuses - Explanatory Notes ).

    In order to ensure proper transmission of the data and metadata and provide user-friendly access to this information, a common technical format is set for transmission for all Member States and for the Commission (Eurostat). The Regulation therefore requires the data to be transmitted in a harmonised structure and in the internationally established SDMX format from every Member State. In order to achieve this harmonised transmission, a new system has been developed – the CENSUS HUB.

    The Census Hub is a conceptually new system used for the dissemination of the 2011 Census. It is based on the concept of data sharing, where a group of partners (Eurostat on one hand and National Statistical Institutes on the other) agree to provide access to their data according to standard processes, formats and technologies.

    The Census Hub is a readily-accessible system that provides the following functions:

    • Data providers (the NSIs) make data available directly from their systems through a querying system.
    • In parallel, data users browse the hub to define a dataset of interest via the above structural metadata and retrieve the dataset from the NSIs.

    From the data management point of view, the hub is based on agreed hypercubes (data-sets in the form of multi-dimensional aggregations). The hypercubes are not sent to the central system. Instead the following process operates:

    1. a user defines a dataset through the web interface of the central hub and requests it;
    2. the central hub translates the user request into one or more queries and sends them to the related NSIs’ systems;
    3. NSIs’ systems process the query and send the result to the central hub in a standard format;
    4. the central hub puts together all the results sent by the NSI systems and presents them in a user-specified format.
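
    The sketch below illustrates this hub-and-spoke flow in Python. It is purely illustrative: the endpoint URLs, query parameters and helper names are hypothetical placeholders, and the real Census Hub exchanges SDMX messages with the NSIs' own dissemination services.

    from concurrent.futures import ThreadPoolExecutor
    import requests

    # Hypothetical SDMX REST endpoints of a few NSIs (placeholders, not real URLs).
    NSI_ENDPOINTS = {
        "DE": "https://nsi.example-de.eu/sdmx/data",
        "FR": "https://nsi.example-fr.eu/sdmx/data",
    }

    def query_nsi(country, hypercube, cells):
        """Steps 2-3: send the translated query to one NSI and return its result."""
        response = requests.get(
            f"{NSI_ENDPOINTS[country]}/{hypercube}",
            params=cells,  # e.g. {"sex": "M", "age": "Y20-24"}
            headers={"Accept": "application/vnd.sdmx.data+csv;version=1.0.0"},
            timeout=60,
        )
        response.raise_for_status()
        return country, response.text

    def fetch_dataset(hypercube, cells, countries):
        """Steps 1 and 4: fan the user request out to the NSIs and merge the answers."""
        with ThreadPoolExecutor() as pool:
            results = pool.map(lambda c: query_nsi(c, hypercube, cells), countries)
        return dict(results)  # country -> data fragment, ready for presentation

    # Example: request one hypercube for two Member States.
    # data = fetch_dataset("CENS_11AN_R2", {"age": "Y20-24"}, ["DE", "FR"])
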
  15. Population by current activity status, NACE Rev. 2 activity and NUTS 2...

    • ec.europa.eu
    + more versions
    Cite
    Eurostat, Population by current activity status, NACE Rev. 2 activity and NUTS 2 region [Dataset]. http://doi.org/10.2908/CENS_11AN_R2
    Explore at:
    application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.data+csv;version=1.0.0, json, application/vnd.sdmx.genericdata+xml;version=2.1, application/vnd.sdmx.data+csv;version=2.0.0, tsv
    Available download formats
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2011
    Area covered
    Comunidad Foral de Navarra, Podkarpackie (NUTS 2013), Ciudad de Ceuta, Scotland (NUTS 2021), Principado de Asturias, Syddanmark, Östra Sverige, Saarland, Macroregiunea Patru, Dytiki Makedonia (NUTS 2010)
    Description

    The 2011 Population and Housing Census marks a milestone in census exercises in Europe. For the first time, European legislation defined in detail a set of harmonised high-quality data from the population and housing censuses conducted in the EU Member States. As a result, the data from the 2011 round of censuses offer exceptional flexibility to cross-tabulate different variables and to provide geographically detailed data.

    EU Member States have developed different methods to produce these census data. The national differences reflect the specific national situations in terms of data source availability, as well as the administrative practices and traditions of that country.

    The EU census legislation respects this diversity. The Regulation of the European Parliament and of the Council on population and housing censuses (Regulation (EC) No 763/2008) is focussed on output harmonisation rather than input harmonisation. Member States are free to assess for themselves how to conduct their 2011 censuses and which data sources, methods and technology should be applied given the national context. This gives the Member States flexibility, in line with the principles of subsidiarity and efficiency, and with the competences of the statistical institutes in the Member States.

    However, certain important conditions must be met in order to achieve the objective of comparability of census data from different Member States and to assess the data quality:

    Regulation (EC) No 1201/2009 contains definitions and technical specifications for the census topics (variables) and their breakdowns that are required to achieve Europe-wide comparability.

    The specifications are based closely on international recommendations and have been designed to provide the best possible information value. The census topics include geographic, demographic, economic and educational characteristics of persons, international and internal migration characteristics as well as household, family and housing characteristics.

    Regulation (EU) No 519/2010 requires the data outputs that Member States transmit to Eurostat to comply with a defined programme of statistical data (tabulation) and with set rules concerning the replacement of statistical data. The content of the EU census programme serves major policy needs of the European Union. Regionally, there is a strong focus on the NUTS 2 level. The data requirements are adapted to the level of regional detail. The Regulation does not require transmission of any data that the Member States consider to be confidential.

    The statistical data must be complemented by metadata that will facilitate interpretation of the numerical data, including country-specific definitions plus information on the data sources and on methodological issues. This is necessary in order to achieve the transparency that is a condition for valid interpretation of the data.

    Users of output-harmonised census data from the EU Member States need to have detailed information on the quality of the censuses and their results.

    Regulation (EU) No 1151/2010 therefore requires transmission of a quality report containing a systematic description of the data sources used for census purposes in the Member States and of the quality of the census results produced from these sources. A comparably structured quality report for all EU Member States will support the exchange of experience from the 2011 round and become a reference for future development of census methodology (EU legislation on the 2011 Population and Housing Censuses - Explanatory Notes).

    In order to ensure proper transmission of the data and metadata and provide user-friendly access to this information, a common technical format is set for transmission for all Member States and for the Commission (Eurostat). The Regulation therefore requires the data to be transmitted in a harmonised structure and in the internationally established SDMX format from every Member State. In order to achieve this harmonised transmission, a new system has been developed – the CENSUS HUB.

    The Census Hub is a conceptually new system used for the dissemination of the 2011 Census. It is based on the concept of data sharing, where a group of partners (Eurostat on one hand and National Statistical Institutes on the other) agree to provide access to their data according to standard processes, formats and technologies.

    The Census Hub is a readily-accessible system that provides the following functions:

    • Data providers (the NSIs) make data available directly from their systems through a querying system.
    • In parallel, data users browse the hub to define a dataset of interest via the above structural metadata and retrieve the dataset from the NSIs.

    From the data management point of view, the hub is based on agreed hypercubes (data-sets in the form of multi-dimensional aggregations). The hypercubes are not sent to the central system. Instead the following process operates:

    1. a user defines a dataset through the web interface of the central hub and requests it;
    2. the central hub translates the user request into one or more queries and sends them to the related NSIs’ systems;
    3. NSIs’ systems process the query and send the result to the central hub in a standard format;
    4. the central hub puts together all the results sent by the NSI systems and presents them in a user-specified format.
  16. Monthly Global Temperature Projections 2040-2069

    • climatedataportal.metoffice.gov.uk
    Updated Aug 23, 2022
    + more versions
    Cite
    Met Office (2022). Monthly Global Temperature Projections 2040-2069 [Dataset]. https://climatedataportal.metoffice.gov.uk/datasets/monthly-global-temperature-projections-2040-2069
    Explore at:
    Dataset updated
    Aug 23, 2022
    Dataset authored and provided by
    Met Office (http://www.metoffice.gov.uk/)
    Area covered
    Description

    What does the data show?

    This data shows the monthly averages of surface temperature (°C) for 2040-2069 using a combination of the CRU TS (v. 4.06) and UKCP18 global RCP2.6 datasets. The RCP2.6 scenario is an aggressive mitigation scenario where greenhouse gas emissions are strongly reduced.

    The data combines a baseline (1981-2010) value from CRU TS (v. 4.06) with an anomaly from UKCP18 global, where the anomaly is the change in temperature at 2040-2069 relative to 1981-2010.

    The data is provided on the WGS84 grid which measures approximately 60km x 60km (latitude x longitude) at the equator.

    Limitations of the data

    We recommend the use of multiple grid cells or an average of grid cells around a point of interest to help users get a sense of the variability in the area. This will provide a more robust set of values for informing decisions based on the data.

    What are the naming conventions and how do I explore the data?

    This data contains a field for each month’s average over the period. They are named 'tas' (temperature at surface), the month, and ‘upper’, ‘median’ or ‘lower’. E.g. ‘tas Mar Lower’ is the average of the daily average temperatures in March throughout 2040-2069, in the second lowest ensemble member.

    To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578

    Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas Jan Median’ values.

    What do the ‘median’, ‘upper’, and ‘lower’ values mean?

    Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models is run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.

    To select which ensemble members to use, the monthly averages of surface temperature for the period 2040-2069 were calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.

    The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.

    This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.
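
    As a rough illustration, the following Python sketch (not Met Office code; the ensemble size and values are invented) ranks a toy ensemble independently per grid cell and extracts the second lowest, central and second highest members in the way described above.

    import numpy as np

    def summarise_ensemble(members):
        """members: array of shape (n_members, n_cells) of monthly mean temperatures."""
        ranked = np.sort(members, axis=0)    # rank members independently per grid cell
        lower = ranked[1]                    # second lowest ensemble member
        upper = ranked[-2]                   # second highest ensemble member
        median = np.median(members, axis=0)  # central value of the ensemble
        return lower, median, upper

    # Toy ensemble of 12 members over 4 grid cells.
    rng = np.random.default_rng(0)
    toy = 15 + rng.normal(0, 1.5, size=(12, 4))
    lo, med, up = summarise_ensemble(toy)
    print(up - lo)  # a wide spread indicates larger projection uncertainty
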

    Data source

    CRU TS v. 4.06 - (downloaded 12/07/22)

    UKCP18 v.20200110 (downloaded 17/08/22)

    Useful links

    • Further information on CRU TS
    • Further information on the UK Climate Projections (UKCP)
    • Further information on understanding climate data within the Met Office Climate Data Portal

  17. Monthly Global Min Temperature Projections 2070-2099

    • climatedataportal.metoffice.gov.uk
    Updated Aug 23, 2022
    + more versions
    Cite
    Met Office (2022). Monthly Global Min Temperature Projections 2070-2099 [Dataset]. https://climatedataportal.metoffice.gov.uk/maps/TheMetOffice::monthly-global-min-temperature-projections-2070-2099
    Explore at:
    Dataset updated
    Aug 23, 2022
    Dataset authored and provided by
    Met Office (http://www.metoffice.gov.uk/)
    Area covered
    Description

    What does the data show?

    This data shows the monthly averages of minimum surface temperature (°C) for 2070-2099 using a combination of the CRU TS (v. 4.06) and UKCP18 global RCP2.6 datasets. The RCP2.6 scenario is an aggressive mitigation scenario where greenhouse gas emissions are strongly reduced.

    The data combines a baseline (1981-2010) value from CRU TS (v. 4.06) with an anomaly from UKCP18 global, where the anomaly is the change in temperature at 2070-2099 relative to 1981-2010.
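
    In other words, each projected value is simply the 1981-2010 baseline plus the corresponding anomaly for that grid cell. A minimal sketch, with invented numbers standing in for the CRU TS baseline and the UKCP18 anomaly:

    import numpy as np

    baseline_1981_2010 = np.array([2.4, 5.1, 9.0])   # toy CRU TS monthly means (°C)
    anomaly_2070_2099 = np.array([1.1, 1.3, 1.2])    # toy UKCP18 change relative to 1981-2010 (°C)

    projection_2070_2099 = baseline_1981_2010 + anomaly_2070_2099
    print(projection_2070_2099)  # [ 3.5  6.4 10.2]
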

    The data is provided on the WGS84 grid which measures approximately 60km x 60km (latitude x longitude) at the equator.

    Limitations of the data

    We recommend the use of multiple grid cells or an average of grid cells around a point of interest to help users get a sense of the variability in the area. This will provide a more robust set of values for informing decisions based on the data.

    What are the naming conventions and how do I explore the data?

    This data contains a field for each month’s average over the period. They are named 'tmin' (temperature minimum), the month, and ‘upper’, ‘median’ or ‘lower’. E.g. ‘tmin Mar Lower’ is the average of the daily minimum temperatures in March throughout 2070-2099, in the second lowest ensemble member.

    To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578

    Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tmin Jan Median’ values.

    What do the ‘median’, ‘upper’, and ‘lower’ values mean?

    Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models is run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.

    To select which ensemble members to use, the monthly averages of minimum surface temperature for the period 2070-2099 were calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.

    The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.

    This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.

    Data source

    CRU TS v. 4.06 - (downloaded 12/07/22)

    UKCP18 v.20200110 (downloaded 17/08/22)

    Useful links

    • Further information on CRU TS
    • Further information on the UK Climate Projections (UKCP)
    • Further information on understanding climate data within the Met Office Climate Data Portal

  18. TrackML Particle Tracking Challenge

    • data.niaid.nih.gov
    Updated May 10, 2021
    Cite
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, David; Gligorov, Vladimir; Estrade, Victor; Basara, Laurent; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Ustyuzhanin, Andrey; Hushchyn, Mikhail; Moyse, Edward; Germain, Cecile; Guyon, Isabelle (2021). TrackML Particle Tracking Challenge [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4730166
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    CERN
    California Institute of Technology
    INRIA
    LBNL
    School of Data Analysis
    University of Massachusetts
    CNRS
    University of Geneva
    Authors
    Salzburger, Andreas; Innocente, Vincenzo; vlimant, jean-roch; rousseau, David; Gligorov, Vladimir; Estrade, Victor; Basara, Laurent; Calafiura, Paolo; Farell, Steven; Gray, Heather; Golling, Tobias; Kiehn, Moritz; Amrouche, Sabrina; Ustyuzhanin, Andrey; Hushchyn, Mikhail; Moyse, Edward; Germain, Cecile; Guyon, Isabelle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Original source from Kaggle: https://www.kaggle.com/c/trackml-particle-identification/data

    The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in a collision between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements or hits for each event into tracks, sets of hits that belong to the same initial particle. A solution must uniquely associate each hit to one track. The training dataset contains the recorded hits, their ground truth counterpart and their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.

    Once unzipped, the dataset is provided as a set of plain .csv files. Each event has four associated files that contain hits, hit cells, particles, and the ground truth association between them. The common prefix, e.g. event000000010, is always event followed by 9 digits.

    event000000000-hits.csv

    event000000000-cells.csv

    event000000000-particles.csv

    event000000000-truth.csv

    event000000001-hits.csv

    event000000001-cells.csv

    event000000001-particles.csv

    event000000001-truth.csv
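
    A minimal pandas sketch for reading one event, assuming the unzipped layout above (the event number in the example is a placeholder):

    import pandas as pd

    def load_event(prefix):
        """prefix: path prefix such as 'event000000010' (without the '-hits.csv' suffix)."""
        hits = pd.read_csv(f"{prefix}-hits.csv")
        cells = pd.read_csv(f"{prefix}-cells.csv")
        particles = pd.read_csv(f"{prefix}-particles.csv")
        truth = pd.read_csv(f"{prefix}-truth.csv")
        return hits, cells, particles, truth

    hits, cells, particles, truth = load_event("event000000010")
    print(len(hits), "hits,", len(particles), "particles")
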

    Event hits

    The hits file contains the following values for each hit/entry:

    hit_id: numerical identifier of the hit inside the event.

    x, y, z: measured x, y, z position (in millimeter) of the hit in global coordinates.

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    The volume/layer/module id could in principle be deduced from x, y, z. They are given here to simplify detector-specific data handling.

    Event truth

    The truth file contains the mapping between hits and generating particles and the true particle state at each measured hit. Each entry maps one hit to one particle.

    hit_id: numerical identifier of the hit as defined in the hits file.

    particle_id: numerical identifier of the generating particle as defined in the particles file. A value of 0 means that the hit did not originate from a reconstructible particle, but e.g. from detector noise.

    tx, ty, tz: true intersection point in global coordinates (in millimeters) between the particle trajectory and the sensitive surface.

    tpx, tpy, tpz: true particle momentum (in GeV/c) in the global coordinate system at the intersection point. The corresponding vector is tangent to the particle trajectory at the intersection point.

    weight: per-hit weight used for the scoring metric; the total sum of weights within one event equals one.
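
    As a small illustration (the file name is an example), the truth file can be used to group hits into their ground-truth tracks and to check the weight normalisation:

    import pandas as pd

    truth = pd.read_csv("event000000010-truth.csv")

    # particle_id == 0 marks noise hits that do not belong to any reconstructible particle.
    tracks = (
        truth[truth["particle_id"] != 0]
        .groupby("particle_id")["hit_id"]
        .apply(list)
    )
    print(tracks.head())          # hit_ids of the first few true tracks
    print(truth["weight"].sum())  # should be (approximately) 1.0
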

    Event particles

    The particles file contains the following values for each particle/entry:

    particle_id: numerical identifier of the particle inside the event.

    vx, vy, vz: initial position or vertex (in millimeters) in global coordinates.

    px, py, pz: initial momentum (in GeV/c) along each global axis.

    q: particle charge (as multiple of the absolute electron charge).

    nhits: number of hits generated by this particle.

    All entries contain the generated information or ground truth.

    Event hit cells

    The cells file contains the constituent active detector cells that comprise each hit. The cells can be used to refine the hit-to-track association. A cell is the smallest granularity inside each detector module, much like a pixel on a screen, except that depending on the volume_id a cell can be a square or a long rectangle. It is identified by two channel identifiers that are unique within each detector module and encode the position, much like column/row numbers of a matrix. A cell can provide signal information that the detector module has recorded in addition to the position. Depending on the detector type, only one of the channel identifiers is valid (e.g. for the strip detectors) and the value might have a different resolution.

    hit_id: numerical identifier of the hit as defined in the hits file.

    ch0, ch1: channel identifier/coordinates unique within one module.

    value: signal value information, e.g. how much charge a particle has deposited.

    Additional detector geometry information

    The detector is built from silicon slabs (or modules, rectangular or trapezoidal), arranged in cylinders and disks, which measure the position (or hits) of the particles that cross them. The detector modules are organized into detector groups or volumes identified by a volume id. Inside a volume they are further grouped into layers identified by a layer id. Each layer can contain an arbitrary number of detector modules, the smallest geometrically distinct detector object, each identified by a module_id. Within each group, detector modules are of the same type and have, e.g., the same granularity. All simulated detector modules are so-called semiconductor sensors that are built from thin silicon sensor chips. Each module can be represented by a two-dimensional, planar, bounded sensitive surface. These sensitive surfaces are subdivided into regular grids that define the detector cells, the smallest granularity within the detector.

    Each module has a different position and orientation described in the detectors file. A local, right-handed coordinate system is defined on each sensitive surface such that the first two coordinates u and v are on the sensitive surface and the third coordinate w is normal to the surface. The orientation and position are defined by the following transformation

    pos_xyz = rotation_matrix * pos_uvw + translation

    that transforms a position described in local coordinates u,v,w into the equivalent position x,y,z in global coordinates using a rotation matrix and a translation vector (cx,cy,cz).

    volume_id: numerical identifier of the detector group.

    layer_id: numerical identifier of the detector layer inside the group.

    module_id: numerical identifier of the detector module inside the layer.

    cx, cy, cz: position of the local origin in the global coordinate system (in millimeter).

    rot_xu, rot_xv, rot_xw, rot_yu, ...: components of the rotation matrix to rotate from local u,v,w to global x,y,z coordinates.

    module_t: half thickness of the detector module (in millimeter).

    module_minhu, module_maxhu: the minimum/maximum half-length of the module boundary along the local u direction (in millimeter).

    module_hv: the half-length of the module boundary along the local v direction (in millimeter).

    pitch_u, pitch_v: the size of detector cells along the local u and v direction (in millimeter).
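
    A numpy sketch of this transformation for a single module, using the columns above (the file name, and the rot_z* names completing the third row of the rotation matrix, follow the same pattern but are assumptions here):

    import numpy as np
    import pandas as pd

    detectors = pd.read_csv("detectors.csv")
    module = detectors.iloc[0]

    rotation = np.array([
        [module.rot_xu, module.rot_xv, module.rot_xw],
        [module.rot_yu, module.rot_yv, module.rot_yw],
        [module.rot_zu, module.rot_zv, module.rot_zw],
    ])
    translation = np.array([module.cx, module.cy, module.cz])

    def local_to_global(pos_uvw):
        """pos_xyz = rotation_matrix @ pos_uvw + translation (all in millimeters)."""
        return rotation @ np.asarray(pos_uvw) + translation

    # Example: a point at local coordinates (u, v) on the module surface (w = 0).
    print(local_to_global([1.2, -3.4, 0.0]))
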

    There are two different module shapes in the detector, rectangular and trapezoidal. The pixel detector (with volume_id = 7, 8, 9) is fully built from rectangular modules, and so are the cylindrical barrels in volume_id = 13, 17. The remaining layers are made out of disks that need trapezoidal shapes to cover the full disk.

  19. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Explore at:
    zip (3418660 bytes)
    Available download formats
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

    The dataset's main files are main_train.csv and main_test.csv (alongside the socratic_test.csv file mentioned above). Each file has the same two columns, question and answer, so every row holds one grade school math question paired with a worked, multi-step solution. These columns can be used with text-analysis models such as ELMo or BERT to explore different ways of representing the problems for question answering and other natural language processing tasks, or as training material for models that must reason over numerical data.

    To use this dataset efficiently, first read through its documentation so that you know how each column is defined and formatted, then study the examples that best suit your purpose, whether that is an education-research experiment, an analytics report, or a predictive-modelling project. Starting from a clear understanding of the available variables keeps the rest of the research journey focused on the questions you actually want to answer.
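
    A minimal loading sketch (file and column names as described above):

    import pandas as pd

    train = pd.read_csv("main_train.csv")  # columns: question, answer
    print(train.shape)

    example = train.iloc[0]
    print(example["question"])
    print(example["answer"])               # worked, step-by-step solution text
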

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  20. Estimation performance of the adopted oscillator ensemble models applied on...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 3, 2023
    Cite
    Rajdeep Dutta; Siyu Isaac Parker Tian; Zhe Liu; Madhavkrishnan Lakshminarayanan; Selvaraj Venkataraj; Yuanhang Cheng; Daniil Bash; Vijila Chellappan; Tonio Buonassisi; Senthilnath Jayavelu (2023). Estimation performance of the adopted oscillator ensemble models applied on 100 samples of type A films, picked randomly from the fully-synthetic data set. [Dataset]. http://doi.org/10.1371/journal.pone.0276555.t003
    Explore at:
    xls
    Available download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rajdeep Dutta; Siyu Isaac Parker Tian; Zhe Liu; Madhavkrishnan Lakshminarayanan; Selvaraj Venkataraj; Yuanhang Cheng; Daniil Bash; Vijila Chellappan; Tonio Buonassisi; Senthilnath Jayavelu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EEd: R2-score between original and estimated thickness values; mEEn: median of R2-scores between original and estimated n(λ) arrays; mEEk: median of R2-scores between original and estimated k(λ) arrays; mEER: mean of R2-scores between original and estimated R(λ) arrays; mEET: mean of R2-scores between original and estimated T(λ) arrays; SR: success rate = number of successful occasions (nss) / total occasions (ns); mFE: average number of function evaluations (iterations × population size).
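
    For readers who want to reproduce these metrics on their own estimates, here is a small sketch under the definitions above (the arrays, counts and run settings are invented placeholders, not values from the paper):

    import numpy as np
    from sklearn.metrics import r2_score

    # Toy example: true vs estimated n(λ) arrays for two samples.
    true_n = [np.array([2.0, 2.1, 2.2]), np.array([1.9, 2.0, 2.1])]
    est_n = [np.array([2.0, 2.1, 2.3]), np.array([1.8, 2.0, 2.1])]

    mEEn = np.median([r2_score(t, e) for t, e in zip(true_n, est_n)])  # median per-sample R2

    n_success, n_total = 87, 100       # successful occasions vs total occasions
    SR = n_success / n_total           # success rate

    iterations, population_size = 200, 30
    mFE = iterations * population_size  # number of function evaluations for one run
    print(mEEn, SR, mFE)
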
