83 datasets found

Data from: Selection of Pairings Reaching Evenly Across the Data (SPREAD): a...
data.niaid.nih.gov
datadryad.org
zip
Updated Aug 20, 2015
Cite
Kolea Zimmerman; Daniel Levitis; Ethan Addicott; Anne Pringle (2015). Selection of Pairings Reaching Evenly Across the Data (SPREAD): a simple algorithm to design maximally informative fully crossed mating experiments [Dataset]. http://doi.org/10.5061/dryad.8br20
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8br20
Dataset updated
Aug 20, 2015
Dataset provided by
Bates College
University of Wisconsin–Madison
Harvard University
Authors
Kolea Zimmerman; Daniel Levitis; Ethan Addicott; Anne Pringle
License
https://spdx.org/licenses/CC0-1.0.html
Description
We present a novel algorithm for the design of crossing experiments. The algorithm identifies a set of individuals (a "crossing-set") from a larger pool of potential crossing-sets by maximizing the diversity of traits of interest, for example, maximizing the range of genetic and geographic distances between individuals included in the crossing-set. To calculate diversity, we use the mean nearest neighbor distance of crosses plotted in trait space. We implement our algorithm on a real dataset of Neurospora crassa strains, using the genetic and geographic distances between potential crosses as a two-dimensional trait space. In simulated mating experiments, crossing-sets selected by our algorithm provide better estimates of underlying parameter values than randomly chosen crossing-sets.
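The diversity score described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' SPREAD code: each crossing-set is fully crossed, so its crosses are all pairs of its individuals; each cross is placed in a two-dimensional trait space by looking up its genetic and geographic distance in hypothetical dictionaries keyed by unordered pairs; both axes are assumed to be on comparable scales already; and the candidate crossing-sets are supplied by the caller. All function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]
PairTable = Dict[FrozenSet[str], float]  # hypothetical: distance per unordered pair of individuals


def cross_points(individuals: Sequence[str], genetic: PairTable,
                 geographic: PairTable) -> List[Point]:
    """Place every pairwise cross of a fully crossed set in 2-D trait space."""
    return [(genetic[frozenset(pair)], geographic[frozenset(pair)])
            for pair in combinations(individuals, 2)]


def mean_nearest_neighbor_distance(points: Sequence[Point]) -> float:
    """Average Euclidean distance from each cross to its closest other cross."""
    if len(points) < 2:
        raise ValueError("need at least two crosses")
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    return sum(nearest) / len(nearest)


def select_crossing_set(candidates: Sequence[Sequence[str]], genetic: PairTable,
                        geographic: PairTable) -> Sequence[str]:
    """Pick the candidate whose crosses are most evenly spread (highest score)."""
    return max(candidates, key=lambda s: mean_nearest_neighbor_distance(
        cross_points(s, genetic, geographic)))
```

Because the score averages nearest-neighbor distances, a set whose crosses cluster in one region of trait space scores low even if one or two crosses lie far away. In practice the two axes would usually be rescaled first so that neither dominates the Euclidean distance.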

E-commerce Sales Prediction Dataset

kaggle.com

Updated Dec 14, 2024

Cite

Nevil Dhinoja (2024). E-commerce Sales Prediction Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10197264

Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Unique identifier

https://doi.org/10.34740/kaggle/dsv/10197264

Dataset updated

Dec 14, 2024

Dataset provided by

Kaggle

Authors

Nevil Dhinoja

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

E-commerce Sales Prediction Dataset

This repository contains a comprehensive and clean dataset for predicting e-commerce sales, tailored for data scientists, machine learning enthusiasts, and researchers. The dataset is crafted to analyze sales trends, optimize pricing strategies, and develop predictive models for sales forecasting.

📂 Dataset Overview

The dataset includes 1,000 records across the following features:

Column Name	Description
Date	The date of the sale (01-01-2023 onward).
Product_Category	Category of the product (e.g., Electronics, Sports, Other).
Price	Price of the product (numerical).
Discount	Discount applied to the product (numerical).
Customer_Segment	Buyer segment (e.g., Regular, Occasional, Other).
Marketing_Spend	Marketing budget allocated for sales (numerical).
Units_Sold	Number of units sold per transaction (numerical).

📊 Data Summary

General Properties

Date:
- Range: 01-01-2023 to 12-31-2023.
- Contains 1,000 unique values without missing data.

Product_Category:
- Categories: Electronics (21%), Sports (21%), Other (58%).
- Most common category: Electronics (21%).

Price:
- Range: From 244 to 999.
- Mean: 505, Standard Deviation: 290.
- Most common price range: 14.59 - 113.07.

Discount:
- Range: From 0.01% to 49.92%.
- Mean: 24.9%, Standard Deviation: 14.4%.
- Most common discount range: 0.01 - 5.00%.

Customer_Segment:
- Segments: Regular (35%), Occasional (34%), Other (31%).
- Most common segment: Regular.

Marketing_Spend:
- Range: From 2.41k to 10k.
- Mean: 4.91k, Standard Deviation: 2.84k.

Units_Sold:
- Range: From 5 to 57.
- Mean: 29.6, Standard Deviation: 7.26.
- Most common range: 24 - 34 units sold.

📈 Data Visualizations

The dataset is suitable for creating the following visualizations:
1. Price Distribution: Histogram to show the spread of prices.
2. Discount Distribution: Histogram to analyze promotional offers.
3. Marketing Spend Distribution: Histogram to understand marketing investment patterns.
4. Customer Segment Distribution: Bar plot of customer segments.
5. Price vs Units Sold: Scatter plot to show pricing effects on sales.
6. Discount vs Units Sold: Scatter plot to explore the impact of discounts.
7. Marketing Spend vs Units Sold: Scatter plot for marketing effectiveness.
8. Correlation Heatmap: Identify relationships between features.
9. Pairplot: Visualize pairwise feature interactions.

💡 How the Data Was Created

The dataset is synthetically generated to mimic realistic e-commerce sales trends. Below are the steps taken for data generation:

Feature Engineering:
- Identified key attributes such as product category, price, discount, and marketing spend, typically observed in e-commerce data.
- Generated dependent features like units sold based on logical relationships.
Data Simulation:
- Python Libraries: Used NumPy and Pandas to generate and distribute values.
- Statistical Modeling: Ensured feature distributions aligned with real-world sales data patterns.
Validation:
- Verified data consistency with no missing or invalid values.
- Ensured logical correlations (e.g., higher discounts → increased units sold).
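The generation and validation steps above might look like the following stdlib-only sketch. The description names NumPy and Pandas but does not publish its generator, so everything here is an assumption: the value ranges are loosely based on the Data Summary, the category lists collapse the remaining categories into "Other", and the linear coefficients and noise level in the hypothetical `generate_rows` are chosen only so that higher discounts and higher marketing spend raise units sold.

```python
import csv
import io
import random
from datetime import date, timedelta

CATEGORIES = ["Electronics", "Sports", "Other"]  # "Other" stands in for the remaining categories
SEGMENTS = ["Regular", "Occasional", "Other"]


def generate_rows(n: int = 1000, seed: int = 42) -> list[dict]:
    """Simulate n sales records in which units sold depend on price, discount and spend."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    rows = []
    for _ in range(n):
        price = round(rng.uniform(10, 1000), 2)
        discount = round(rng.uniform(0, 50), 2)        # percent
        marketing = round(rng.uniform(100, 10_000), 2)
        # Hypothetical linear relationship plus Gaussian noise
        expected = 20 - 0.005 * price + 0.3 * discount + 0.0005 * marketing
        units = max(1, round(rng.gauss(expected, 5)))
        rows.append({
            "Date": (start + timedelta(days=rng.randrange(365))).strftime("%m-%d-%Y"),
            "Product_Category": rng.choice(CATEGORIES),
            "Price": price,
            "Discount": discount,
            "Customer_Segment": rng.choice(SEGMENTS),
            "Marketing_Spend": marketing,
            "Units_Sold": units,
        })
    return rows


def validate(rows: list[dict]) -> None:
    """Validation step: raise ValueError on missing or out-of-range values."""
    for i, row in enumerate(rows):
        if any(value is None or value == "" for value in row.values()):
            raise ValueError(f"row {i}: missing value")
        if row["Price"] <= 0 or row["Units_Sold"] < 1 or not 0 <= row["Discount"] <= 50:
            raise ValueError(f"row {i}: invalid numeric value")


def to_csv_text(rows: list[dict]) -> str:
    """Serialize rows with the column names used in the dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Saving `to_csv_text(generate_rows())` as `ecommerce_sales.csv` would let the regression example run against simulated data, but the fitted coefficients would then reflect these hypothetical settings rather than the published dataset.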

Note: The dataset is synthetic and not sourced from any real-world e-commerce platform.

🛠 Example Usage: Sales Prediction Model

Here’s an example of building a predictive model using Linear Regression:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
df = pd.read_csv('ecommerce_sales.csv')

# Feature selection
X = df[['Price', 'Discount', 'Marketing_Spend']]
y = df['Units_Sold']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')

1995 IFREMER Cartopep Acoustic Survey data - Habitat map
gis.ices.dk
ogc:wfs, ogc:wms +1
Updated Jan 20, 2014
Cite
IFREMER (2014). 1995 IFREMER Cartopep Acoustic Survey data - Habitat map [Dataset]. https://gis.ices.dk/geonetwork/srv/api/records/b92bf5a1-10bf-4818-a96c-e248da106372
Available download formats: ogc:wfs, www:link-1.0-http--link, ogc:wms
Dataset updated
Jan 20, 2014
Dataset provided by
Joint Nature Conservation Committee
Authors
IFREMER
Time period covered
Jan 1, 2014 - Jun 1, 2014
Description
Interpretation of multibeam bathymetry and backscatter data from the Cartopep campaign (1995). The multibeam bathymetry data were processed with the Caraibes software (v3.9), and a 30 m grid was created. The data were acquired in 1995, when multibeam systems were first emerging onto the commercial market; the number of beams per ping was limited, so data density is low compared with more modern data. Data editing was done only on data that were significantly in error. The multibeam backscatter data were also processed in the Caraibes software package and produced two mosaics at 50 m resolution; no higher-resolution backscatter data were available. Correlation with ground-truth data (sediment samples and photographic imagery samples, located predominantly toward the top of the slope and on the seamount summit and flanks) allowed basic interpretation of the bathymetry and backscatter data near the sample locations. To aid interpretation, the sediment samples had been divided into Folk categories, and habitat types (and, in some instances, faunal communities) had been determined for the photographic imagery samples. Characterisation was spread over the whole area and divided into polygon regions by finding interpretive boundaries either on the backscatter imagery (such as texture or contrast changes) or on the bathymetric layers of slope, rugosity and relief.
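Two of the bathymetric layers used to draw the polygon boundaries, slope and relief, can be derived from a gridded depth model such as the 30 m grid described above. A minimal sketch, assuming depths are stored as a list of rows in metres on a regular grid: slope uses central differences over each cell's four neighbours, relief is the depth range within a 3×3 window, and edge cells are left as `None`. These are generic terrain-analysis definitions, not necessarily the ones implemented in Caraibes.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def slope_degrees(depth: Grid, cell: float = 30.0) -> List[List[Optional[float]]]:
    """Slope (degrees) at interior cells from central differences; edges are None."""
    rows, cols = len(depth), len(depth[0])
    out: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (depth[r][c + 1] - depth[r][c - 1]) / (2 * cell)
            dzdy = (depth[r + 1][c] - depth[r - 1][c]) / (2 * cell)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


def local_relief(depth: Grid) -> List[List[Optional[float]]]:
    """Maximum minus minimum depth in each interior cell's 3x3 neighbourhood."""
    rows, cols = len(depth), len(depth[0])
    out: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [depth[i][j] for i in range(r - 1, r + 2) for j in range(c - 1, c + 2)]
            out[r][c] = max(window) - min(window)
    return out
```

The sign convention (depth positive downward or elevation positive upward) does not affect either layer, since slope uses the gradient magnitude and relief uses a range.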
LANDFIRE.HI_110FBFM40
catalog.data.gov
Updated Nov 11, 2021
Cite
U.S. Geological Survey (2021). LANDFIRE.HI_110FBFM40 [Dataset]. https://catalog.data.gov/dataset/landfire-hi-110fbfm40
Dataset updated
Nov 11, 2021
Dataset provided by
U.S. Geological Survey
Description
The LANDFIRE fuel data describe the composition and characteristics of both surface fuel and canopy fuel. Specific products include fire behavior fuel models, canopy bulk density (CBD), canopy base height (CBH), canopy cover (CC), canopy height (CH), and fuel loading models (FLMs). These data may be used within models to predict the behavior and effects of wildland fire. These data are useful for strategic fuel treatment prioritization and tactical assessment of fire behavior and effects. DATA SUMMARY: These fire behavior fuel models represent distinct distributions of fuel loadings found among surface fuel components (live and dead), size classes and fuel types. The fuel models are described by the most common fire-carrying fuel type (grass, brush, timber litter or slash), loading and surface area-to-volume ratio by size class and component, fuelbed depth and moisture of extinction. Further detail can be found in Scott and Burgan (2005) and Rothermel (1983). This data layer contains a complete set of fire behavior fuel models for use with Rothermel's fire spread models. Characteristics of the new fuel model set, its development and its relationship to the original set of 13 fire behavior fuel models can be found in Scott and Burgan (2005). In fire behavior fuel models, canopy characteristics are used to compute shading, wind reduction factors, spotting distances, crown fuel volume and spread characteristics of crown fires, and to incorporate the effects of ladder fuels for transitions from a surface to a crown fire. Canopy characteristics refer to the tree canopy. Where there are tree canopies, i.e., existing vegetation types that are forest and woodland, LANDFIRE has attributed the grid with canopy characteristics, with some exceptions. There will be no canopy characteristics in fuel types where the tree canopy is considered part of the surface fuel and the surface fire behavior fuel model is chosen as such.
This is because LANDFIRE assumes the potential burnable biomass in the tree canopy has been accounted for in the surface fuel model parameters. For example, young or short conifer stands where the trees are represented by a shrub-type fuel model will not have canopy characteristics. Field plot data contributed either directly or indirectly to this LANDFIRE National data product. Go to http://www.landfire.gov/participate_acknowledgements.php for more information regarding contributors of field plot data. REFRESH 2008 (lf_1.1.0): Refresh 2008 (lf_1.1.0) used 2001 data as a launching point to incorporate disturbance and its severity, both managed and natural, that occurred on the landscape after 2001. Examples of disturbance include fire, vegetation management, weather, and insects and disease. The final disturbance data used in Refresh 2008 (lf_1.1.0) are the result of several efforts that include data derived in part from remotely sensed land change methods, Monitoring Trends in Burn Severity (MTBS), and the LANDFIRE Refresh events data call. Vegetation growth was modeled where both disturbance and non-disturbance occurred. For details on methods, see Process Description for LANDFIRE Refresh 2008 (lf_1.1.0).
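Cells in an FBFM40 layer such as this one store integer codes for the 40 Scott and Burgan (2005) fuel models. The sketch below decodes those codes into model labels and fire-carrying fuel types and tallies the share of each type. It assumes the numbering commonly used in LANDFIRE FBFM40 products (91-99 non-burnable, 101-109 grass, 121-124 grass-shrub, 141-149 shrub, 161-165 timber-understory, 181-189 timber litter, 201-204 slash-blowdown); check it against the layer's own attribute table before relying on it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed FBFM40 numbering: code = base + model number within the group
FBFM40_GROUPS = {
    "NB": (90, "non-burnable", {1, 2, 3, 8, 9}),
    "GR": (100, "grass", set(range(1, 10))),
    "GS": (120, "grass-shrub", set(range(1, 5))),
    "SH": (140, "shrub", set(range(1, 10))),
    "TU": (160, "timber-understory", set(range(1, 6))),
    "TL": (180, "timber litter", set(range(1, 10))),
    "SB": (200, "slash-blowdown", set(range(1, 5))),
}


def decode(code: int) -> Tuple[str, str]:
    """Return (model label, fuel type) for one cell value, e.g. 102 -> ('GR2', 'grass')."""
    for prefix, (base, fuel_type, numbers) in FBFM40_GROUPS.items():
        if code - base in numbers:
            return f"{prefix}{code - base}", fuel_type
    raise ValueError(f"not a recognised FBFM40 code: {code}")


def fuel_type_shares(codes: Iterable[int]) -> Dict[str, float]:
    """Fraction of cells falling in each fire-carrying fuel type."""
    counts = Counter(decode(c)[1] for c in codes)
    total = sum(counts.values())
    return {fuel_type: n / total for fuel_type, n in counts.items()}
```

Because each group's model numbers stay below 10 and the bases are at least 10 apart, a code can match at most one group; unexpected values raise an error instead of being silently misclassified.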
Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends,...
datarade.ai
.json, .csv
Updated Aug 14, 2024
Cite
Dataplex (2024). Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-consumer-behavior-data-2-1m-subred-dataplex
Available download formats: .json, .csv
Dataset updated
Aug 14, 2024
Dataset authored and provided by
Dataplex
Area covered
Cuba, Saint Barthélemy, Tunisia, Cocos (Keeling) Islands, Togo, Netherlands, Lithuania, Belize, Burkina Faso, Croatia
Description
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

Dataset Overview:

This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

Sourced Directly from Reddit:

All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

Key Features:

Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.

User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.

Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.

AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

Use Cases:

Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.

Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.

Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.

Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

Data Quality and Reliability:

The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

Integration and Usability:

The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

User-Friendly Structure and Metadata:

The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

Ideal For:

Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.

Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.

Researchers: Explore consumer behavior data of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...
Landfire 13 Anderson Fire Behavior Fuel Models Version 140 (CONUS) (Image...
usfs.hub.arcgis.com
Updated Jan 8, 2021
Cite
U.S. Forest Service (2021). Landfire 13 Anderson Fire Behavior Fuel Models Version 140 (CONUS) (Image Service) [Dataset]. https://usfs.hub.arcgis.com/datasets/8bb24481cce1424ea7ab98524e6dc412
Dataset updated
Jan 8, 2021
Dataset provided by
U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
Authors
U.S. Forest Service
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The LANDFIRE fuel data describe the composition and characteristics of both surface fuel and canopy fuel. Specific products include fire behavior fuel models, canopy bulk density (CBD), canopy base height (CBH), canopy cover (CC), canopy height (CH), and fuel loading models (FLMs). These data may be used within models to predict the behavior and effects of wildland fire. These data are useful for strategic fuel treatment prioritization and tactical assessment of fire behavior and effects. DATA SUMMARY: Thirteen typical surface fuel arrangements or "collections of fuel properties" (Anderson 1982) were described to serve as input for Rothermel's mathematical surface fire behavior and spread model (Rothermel 1972). These fire behavior fuel models represent distinct distributions of fuel loadings found among surface fuel components (live and dead), size classes and fuel types. The fuel models are described by the most common fire-carrying fuel type (grass, brush, timber litter or slash), loading and surface area-to-volume ratio by size class and component, fuelbed depth and moisture of extinction. This dataset can be used as input to models of fire spread characteristics. In fire behavior fuel models, canopy characteristics are used to compute shading, wind reduction factors, spotting distances, crown fuel volume and spread characteristics of crown fires, and to incorporate the effects of ladder fuels for transitions from a surface to a crown fire. Canopy characteristics refer to the tree canopy. Where there are tree canopies, i.e., existing vegetation types that are forest and woodland, LANDFIRE has attributed the grid with canopy characteristics, with some exceptions. There will be no canopy characteristics in fuel types where the tree canopy is considered part of the surface fuel and the surface fire behavior fuel model is chosen as such.
This is because LANDFIRE assumes the potential burnable biomass in the tree canopy has been accounted for in the surface fuel model parameters. For example, young or short conifer stands where the trees are represented by a shrub-type fuel model will not have canopy characteristics. Field plot data contributed either directly or indirectly to this LANDFIRE National data product. Go to https://landfire.gov/participate_refdata_sub.php for more information regarding contributors of field plot data. LANDFIRE 2014 (lf_1.4.0) used LANDFIRE 2012 (lf_1.3.0) data as a launching point to incorporate disturbance and its severity, both managed and natural, that occurred on the landscape after 2012. Examples of disturbance include fire, vegetation management, wind, and insects and disease. The disturbance data used in the update are the result of several efforts that include data derived in part from remotely sensed land change methods, Monitoring Trends in Burn Severity (MTBS), and the LANDFIRE events data call. Vegetation growth was modeled where disturbance occurred.
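Anderson's 13 models fall into four groups that match the fire-carrying fuel types named above: grass (models 1-3), chaparral and shrub (4-7), timber litter (8-10) and logging slash (11-13). A minimal sketch that turns a list of cell values into area per group, assuming values 1-13 are the Anderson model numbers, that every other value is non-burnable or unclassified, and that cells are 30 m on a side (the cell size is an assumption; read it from the image service's metadata).

```python
from collections import Counter
from typing import Dict, Iterable

# Anderson (1982) model numbers grouped by fire-carrying fuel type
ANDERSON_GROUPS = {
    "grass": range(1, 4),
    "chaparral and shrub": range(4, 8),
    "timber litter": range(8, 11),
    "logging slash": range(11, 14),
}


def anderson_group(value: int) -> str:
    """Fuel group for one cell value; anything outside 1-13 is reported as 'other'."""
    for group, models in ANDERSON_GROUPS.items():
        if value in models:
            return group
    return "other"


def area_by_group(cells: Iterable[int], cell_size_m: float = 30.0) -> Dict[str, float]:
    """Total area in hectares covered by each fuel group (1 ha = 10,000 m²)."""
    counts = Counter(anderson_group(v) for v in cells)
    return {group: n * cell_size_m ** 2 / 10_000 for group, n in counts.items()}
```

With 30 m cells each cell covers 0.09 ha, so a group's area is simply its cell count times 0.09.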
Household Budget Survey 2008 - Greece
catalog.ihsn.org
Updated Mar 29, 2019
Cite
Population Statistics and Labour Market Statistics (2019). Household Budget Survey 2008 - Greece [Dataset]. https://catalog.ihsn.org/index.php/catalog/7729
Dataset updated
Mar 29, 2019
Dataset provided by
Hellenic Statistical Authority (http://statistics.gr/)
General Directorate of Statistical Surveys
Population Statistics and Labour Market Statistics
Time period covered
2008
Area covered
Greece
Description
Abstract

The Household Budget Survey (HBS) is a national survey that collects information from a representative sample of households on household composition, members' employment status and living conditions, focusing mainly on members' expenditure on goods and services and on their income. The expenditure information collected from households is very detailed: it is not collected by broad expenditure categories such as "food", "clothing - footwear" or "health", but separately for each item, for example white bread, fresh whole milk, fresh beef, men's footwear, women's footwear, services of medical analysis laboratories, pharmaceutical products, etc.

The main purpose of the HBS is to determine the household expenditure pattern in detail in order to revise the Consumer Price Index. Moreover, the HBS is the most appropriate source to:
- Complete the available statistical data for the estimation of total private consumption;
- Study household expenditures and their structure in relation to income and other economic, social and demographic characteristics;
- Analyze changes in household living conditions in comparison with previous surveys;
- Study the relationship between household purchases and receipts in kind;
- Study low-income limits in the different socio-economic categories and population groups;
- Study changes in the nutritional habits of households.

Geographic coverage

National coverage

Analysis unit

Households,

Individuals.

Kind of data

Sample survey data [ssd]

Frequency of data collection

Data collection is continuous, spread throughout the reference year.

Sampling procedure

A two-stage stratified area sample was used for the Household Budget Survey 2008. The primary sampling units are areas (one or more unified building blocks), and the ultimate sampling units, selected within each sampled area, are private households. It was estimated that about 4,000 questionnaires would be completed, a number equal to approximately 1/1000 of the households in the whole of Greece.
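The two-stage design can be illustrated with a short sketch: within each stratum a random sample of areas is drawn (first stage), then a random sample of households within each selected area (second stage). The frame structure, the fixed per-stratum sample sizes and the use of simple random sampling without replacement at both stages are assumptions made for illustration; the survey description does not state the selection probabilities. Each household also gets a design weight equal to the inverse of its overall selection probability.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical frame: stratum -> area id -> list of household ids
Frame = Dict[str, Dict[str, List[str]]]


def two_stage_sample(frame: Frame, areas_per_stratum: int, households_per_area: int,
                     seed: int = 0) -> List[Tuple[str, str, str, float]]:
    """Return (stratum, area, household, design weight) for each sampled household."""
    rng = random.Random(seed)
    sample = []
    for stratum, areas in frame.items():
        # Stage 1: simple random sample of areas within the stratum
        chosen_areas = rng.sample(sorted(areas), min(areas_per_stratum, len(areas)))
        for area in chosen_areas:
            households = areas[area]
            # Stage 2: simple random sample of households within the selected area
            k = min(households_per_area, len(households))
            chosen = rng.sample(households, k)
            # Weight = 1 / (P(area selected) * P(household selected | area))
            weight = (len(areas) / len(chosen_areas)) * (len(households) / k)
            sample.extend((stratum, area, hh, weight) for hh in chosen)
    return sample
```

When all areas in a stratum contain the same number of households, the weights of that stratum's sampled households add up exactly to the stratum's household count, which is a quick check that the two stage probabilities were combined correctly.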

Mode of data collection

Face-to-face [f2f]
Data from: Testing and Estimation of Social Network Dependence With Time to...
tandf.figshare.com
txt
Updated Feb 15, 2024
Cite
Lin Su; Wenbin Lu; Rui Song; Danyang Huang (2024). Testing and Estimation of Social Network Dependence With Time to Event Data [Dataset]. http://doi.org/10.6084/m9.figshare.8132456.v4
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.8132456.v4
Dataset updated
Feb 15, 2024
Dataset provided by
Taylor & Francis
Authors
Lin Su; Wenbin Lu; Rui Song; Danyang Huang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Events now spread rapidly along social networks. We are interested in whether people's responses to an event are affected by their friends' characteristics. For example, how soon will a person start playing a game given that his or her friends like it? Studying social network dependence is an emerging research area. In this work, we propose a novel latent spatial autocorrelation Cox model to study social network dependence with time-to-event data. The proposed model introduces a latent indicator to characterize whether a person's survival time might be affected by his or her friends' features. We first propose a score-type test for detecting the existence of social network dependence. If it exists, we further develop an EM-type algorithm to estimate the model parameters. The performance of the proposed test and estimators is illustrated by simulation studies and an application to a time-to-event dataset about playing a popular mobile game from one of the largest online social network platforms. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Data from: Hydrochemical Atlas of the Arctic Ocean
search.dataone.org
doi.pangaea.de
Updated Jan 5, 2018
Cite
Nikiforov, Sergey L; Colony, Roger; Timokhov, Leonid; Arctic and Antarctic Research Institute of the Russian Federal Service for Hydrometeorology and Environmental Monitoring, International Arctic Research Center, University of Alaska, Fairbanks, St. Petersburg (2018). Hydrochemical Atlas of the Arctic Ocean [Dataset]. http://doi.org/10.1594/PANGAEA.691332
Unique identifier
https://doi.org/10.1594/PANGAEA.691332
Dataset updated
Jan 5, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Nikiforov, Sergey L; Colony, Roger; Timokhov, Leonid; Arctic and Antarctic Research Institute of the Russian Federal Service for Hydrometeorology and Environmental Monitoring, International Arctic Research Center, University of Alaska, Fairbanks, St. Petersburg
Time period covered
Apr 16, 1948 - Sep 17, 2000
Area covered
Arctic Ocean
Description
Introduction: The chemical composition of water determines its physical properties and the character of the processes occurring in it: freezing temperature, volume of evaporation, density, color, transparency, filtration capacity, etc. The presence of chemical elements in solution gives water special physical properties that significantly influence its circulation, creates the conditions necessary for flora and fauna to develop and live, and gives ocean waters chemical features that differ radically from those of land waters (Alekin & Liakhin, 1984). Hydrochemical information helps to determine elements of water circulation and convection depth, makes it easier to distinguish water masses, and adds knowledge of the climatic variability of ocean conditions. Hydrochemical information is a necessary part of biological research. The chemical composition of water can be the governing characteristic that determines whether, and within what limits, marine objects, both stationary and moving, can be used in sea water. Hydrochemistry studies the dynamics of chemical composition, i.e., the processes of its formation, and the hydrochemical conditions of water bodies (Alekin & Liakhin 1984). The hydrochemical processes in the Arctic Ocean are the least known. Some information on these processes can be found in scattered publications. A generalizing study of hydrochemical conditions in the Arctic Ocean, based on expeditions conducted in 1948-1975, was carried out by Rusanov et al. (1979). The "Atlas of the World Ocean: the Arctic Ocean" contains a special section, "Hydrochemistry" (Gorshkov, 1980). This section gives typical vertical profiles, transects and maps for depths of 0, 100, 300, 500, 1000, 2000 and 3000 m for the following parameters: dissolved oxygen, phosphate, silicate, pH and alkaline-chlorine coefficient. The maps were constructed using data from expeditions conducted in 1948-1975.
The illustrations reflect the main features of the distribution of hydrochemical elements over a multi-year period and present a static image of hydrochemical conditions. The distribution of hydrochemical elements at the ocean surface is given for two seasons, winter and summer; mean annual fields are given for the other depths. The aim of the present Atlas is to describe hydrochemical conditions in the Arctic Ocean on the basis of a larger body of hydrochemical information for 1948-2000, using up-to-date methods of analysis and electronic forms of presenting hydrochemical information. The characteristics most widely determined in water samples were used as hydrochemical indices: dissolved oxygen, phosphate, silicate, pH, total alkalinity, nitrite and nitrate. An important characteristic of water salt composition, salinity, was considered in the Oceanographic Atlas of the Arctic Ocean (1997, 1998). The hydrochemical characteristics are presented more broadly in this Hydrochemical Atlas than in the former Atlas (Gorshkov, 1980). Maps of the climatic distribution of hydrochemical elements were constructed for all standard depths, and seasonal variability of the hydrochemical parameters is given not only for the surface but also for the underlying standard depths down to and including 400 m. Statistical characteristics of the hydrochemical elements are given for the first time. Detailed accuracy estimates of the initial data and of map construction are also given in the Atlas. Calculated root-mean-square deviations and maximum and minimum values of the parameters show the limits of their variability over the analyzed observation period. Therefore, the Atlas not only summarizes investigations of chemical statics but also demonstrates some elements of chemical dynamics.
Digital arrays of hydrochemical elements at the nodes of a regular grid are a new form of presenting characteristics in the Atlas. The Atlas uses the same grid and the same boxes as were used to create the US-Russian climatic Oceanographic Atlas, which allows the hydrochemical and oceanographic information of these Atlases to be combined. The first block of the digital arrays contains climatic characteristics calculated from direct observational data. These characteristics were not calculated in regions without observations, so the arrays have gaps for those regions. The other block of gridded climatic information was obtained by objective analysis of observational data. The objective analysis procedure yielded climatic estimates of the hydrochemical characteristics for the whole Arctic Ocean, including regions not covered by observations. Data of the objective analysis can be wide... Visit https://dataone.org/datasets/8f096d0c7e2f5962ce4828cc6ea59572 for complete metadata about this dataset.
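The first block of digital arrays (climatic characteristics computed directly from observations, with gaps where no observations exist) can be approximated by simple box averaging. A minimal sketch, assuming regular latitude/longitude boxes of a chosen size in degrees, which is a hypothetical stand-in for the Atlas's actual grid: each observation is assigned to a box, and each box gets its count, mean, root-mean-square deviation about the mean, minimum and maximum, or `None` if it has no data. The objective analysis used to fill the gaps is not reproduced.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Box = Tuple[int, int]


def box_index(lat: float, lon: float, size_deg: float) -> Box:
    """Index of the lat/lon box containing a point (hypothetical regular grid)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def box_statistics(observations: Iterable[Tuple[float, float, float]],
                   boxes: Iterable[Box], size_deg: float = 1.0
                   ) -> Dict[Box, Optional[dict]]:
    """Per-box n, mean, RMS deviation, min and max; None marks a box with no data."""
    values: Dict[Box, List[float]] = defaultdict(list)
    for lat, lon, value in observations:
        values[box_index(lat, lon, size_deg)].append(value)
    result: Dict[Box, Optional[dict]] = {}
    for box in boxes:
        v = values.get(box)
        if not v:
            result[box] = None  # no observations: leave a gap
            continue
        mean = sum(v) / len(v)
        sd = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
        result[box] = {"n": len(v), "mean": mean, "sd": sd, "min": min(v), "max": max(v)}
    return result
```

The deviation here divides by n (a population RMS deviation); a sample standard deviation would divide by n - 1 instead.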
Data from: Seed size, seed dispersal traits, and plant dispersion patterns...
data.niaid.nih.gov
datadryad.org
zip
Updated Mar 10, 2023
Yvette Ortega; Dean Pearson; Jane Tuthill (2023). Seed size, seed dispersal traits, and plant dispersion patterns for native and introduced grassland plants [Dataset]. http://doi.org/10.5061/dryad.2z34tmpr2
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.2z34tmpr2
Dataset updated
Mar 10, 2023
Dataset provided by
US Forest Service
University of Montana
Authors
Yvette Ortega; Dean Pearson; Jane Tuthill
License
https://spdx.org/licenses/CC0-1.0.html
Description
Most terrestrial plants disperse by seeds, yet the relationship between seed mass, seed dispersal traits, and plant dispersion is poorly understood. We quantified seed traits for 48 species of native and introduced plants from grasslands of western Montana, USA, to investigate the relationships between seed traits and plant dispersion patterns. Additionally, because the linkage between dispersal traits and dispersion patterns might be stronger for actively dispersing species, we compared these patterns between native and introduced plants. Finally, we evaluated the efficacy of a global trait database, the TRY plant traits database, versus locally collected data for examining these questions. This archive contains the species-level data used in analyses, including species metadata (origin, growth form), mean values of measured seed traits (size metrics and type of dispersal structures), two metrics of dispersion (local and broad scales, respectively) derived from grassland surveys in the study region, and information on the seed mass data accessed from the TRY traits database. Note that the TRY seed mass data themselves could not be included in the archive, but they can be acquired directly from the TRY plant traits database (https://www.try-db.org/TryWeb/Home.php).
Methods
Our study took place in semi-arid grasslands of the Intermountain Region in western Montana, U.S.A. The native system is dominated primarily by bluebunch wheatgrass (Pseudoroegneria spicata), with other grasses and a great variety of forbs diversifying the system, but it is heavily invaded by exotics. We identified our study species, comprising 23 native and 25 exotic species, to reflect a range of dispersion patterns, using data from 620 1-m² vegetation plots at 31 grassland sites spread over 20,000 km² of western Montana. Plant dispersion patterns were defined at the local scale as the proportion of plots occupied within a site and at the broad scale as the proportion of sites occupied, per species.
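The two dispersion metrics can be sketched in Python as below. The input layout, the function name `dispersion_metrics`, and the choice to average the local-scale proportion over the sites where a species occurs are assumptions; the description does not say how per-site proportions were aggregated into one value per species.

```python
from collections import defaultdict


def dispersion_metrics(plot_records, species):
    """Return (local, broad) dispersion for one species.

    plot_records: iterable of (site_id, plot_id, set_of_species_present).
    local: proportion of plots occupied within a site, averaged over the
           sites where the species occurs (an assumed aggregation rule).
    broad: proportion of surveyed sites where the species occurs.
    """
    plots_per_site = defaultdict(int)
    occupied_per_site = defaultdict(int)
    for site, _plot, present in plot_records:
        plots_per_site[site] += 1
        if species in present:
            occupied_per_site[site] += 1
    occupied_sites = [s for s in plots_per_site if occupied_per_site[s] > 0]
    broad = len(occupied_sites) / len(plots_per_site)
    if not occupied_sites:
        return 0.0, broad
    local = sum(occupied_per_site[s] / plots_per_site[s]
                for s in occupied_sites) / len(occupied_sites)
    return local, broad
```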
For each species, we collected at least 50 seeds from each of 10 plants at each of 3 locations in Missoula and Lake Counties, Montana, in either 2020 or 2021. Collection locations were chosen opportunistically based on species presence and hence differed by species. Although these locations did not align with the sites surveyed for species dispersion per se, they were generally drawn from the central portion of the study area. Seeds were stored in a laboratory under ambient conditions until measurements were taken, at which point they were cleaned by hand and sorted, based primarily on visual characteristics, to remove potentially non-viable seeds. To determine the mean seed mass per species, we weighed a fixed number of samples (three or four) from each of the three locations. The number of seeds weighed per sample was set per species to ensure a total mass >1.5 mg, the minimum reading needed for 2% accuracy according to the balance specifications. For 32 of our 48 species, only 10 seeds were needed to reach this minimum. For the remaining species, we increased the number of seeds per sample in increments of 10 (range 20–150 seeds/sample) until the minimum mass was reached. Seed mass included the entire diaspore (e.g., endosperm, seed coat, awns, and dispersal appendages) so that all species could be treated in the same way (e.g., dispersal appendages such as wings would have been very difficult to remove from small-seeded species). Although the inclusion of dispersal appendages potentially biases seed mass estimates for this subset of species, this bias should be small relative to the large variation in seed mass across species. Indeed, estimates for three exotic species with pappuses (Lactuca serriola, Taraxacum officinale, and Tragopogon dubius) showed that these structures increased seed mass measures by <12%.
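The rule for sizing each weighed sample (start at 10 seeds and add 10 at a time until the sample exceeds 1.5 mg) can be sketched as follows, together with a per-seed mean. This assumes a rough per-seed mass estimate is available in advance, whereas in practice the count was presumably adjusted by weighing; the function names, the 150-seed cap (taken from the reported range), and the unweighted averaging across samples are illustrative assumptions.

```python
def seeds_per_sample(mean_seed_mass_mg, min_total_mg=1.5, start=10, step=10,
                     max_seeds=150):
    """Smallest count in start, start+step, ... whose expected mass exceeds min_total_mg."""
    n = start
    while n * mean_seed_mass_mg <= min_total_mg:  # strict ">" as in ">1.5 mg"
        n += step
        if n > max_seeds:
            raise ValueError("minimum mass not reached within max_seeds")
    return n


def mean_seed_mass(samples):
    """samples: list of (total_mass_mg, n_seeds); returns mean per-seed mass in mg."""
    per_seed = [mass / n for mass, n in samples]
    return sum(per_seed) / len(per_seed)
```

For example, a species averaging 0.1 mg per seed needs 20 seeds per sample, since 10 seeds would weigh only about 1.0 mg.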
For the remaining measurements, we used a ProgRes C10 camera (Jenoptik, CCD/CMOS) to image 20 seeds per species drawn from the 3 sampling locations (n=6 from two locations and n=8 from the third, chosen randomly). From the images we obtained the following measurements for each seed using ImageJ software (Rasband 1997-2018): seed length (maximum), seed width (maximum), and seed surface area. These seed measurements excluded dispersal structures. Mean values per species for all seed size measurements are included in the species-level dataset archived here. Finally, we inspected seeds to determine whether each species possessed dispersal structures, including pappuses, awns, wings, or plumes. For smaller-seeded species, we did this using the seed images and also checked the literature to ensure that dispersal structures were not missed. To compare our empirical seed measures with those available in online trait databases, we used the TRY plant trait database (accessed 22 September – 7 October 2022), a global database integrating ~700 datasets, including other major collective databases. This database included seed mass data for 44 of our 48 species but contained insufficient data to evaluate the other seed traits we measured (length, width, and surface area), covering only 2–40% of our study species for these traits. Importantly, 63% of the 831 seed mass records obtained from the TRY database could not be used in analyses because they were duplicates resulting from the consolidation of many datasets with common sources. See the publication for a full description of our process for identifying duplicate values. The remaining seed mass values from the TRY database were averaged to generate the mean estimate used in analyses. See the archive for sample size information per species.
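A minimal sketch of the TRY post-processing (drop duplicate records, then average the rest per species) is shown below. The duplicate key used here (species, original source, and rounded value) and the function name are hypothetical stand-ins; the authors' actual procedure for identifying duplicates is described in the publication.

```python
from collections import defaultdict


def try_species_means(records, ndigits=3):
    """records: iterable of (species, seed_mass_mg, original_source).

    Records sharing species, original source and rounded value are treated as
    one record (a hypothetical duplicate rule). Returns
    {species: (mean_mass_mg, n_unique_records)}.
    """
    seen = set()
    values = defaultdict(list)
    for species, mass, source in records:
        key = (species, source, round(mass, ndigits))
        if key in seen:
            continue  # skip a duplicate of a record already kept
        seen.add(key)
        values[species].append(mass)
    return {sp: (sum(v) / len(v), len(v)) for sp, v in values.items()}
```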
Employment and Unemployment Survey 2014, Economic Research Forum (ERF)...
catalog.ihsn.org
Updated Jun 26, 2017
Economic Research Forum (2017). Employment and Unemployment Survey 2014, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/6954
Dataset updated
Jun 26, 2017
Dataset provided by
Department of Statistics
Economic Research Forum
Time period covered
2014
Area covered
Jordan
Description
Abstract

The cleaned and harmonized version of the survey data produced and published by the Economic Research Forum represents 100% of the original survey data collected by the Department of Statistics of the Hashemite Kingdom of Jordan.

The Department of Statistics (DOS) carried out four rounds of the 2014 Employment and Unemployment Survey (EUS), in February, May, August, and November 2014. The survey rounds covered a total sample of about fifty-three thousand households nationwide. Households were selected using a stratified multi-stage cluster sampling design.

Notably, DOS used new technology for data collection and processing: data were collected with an electronic questionnaire on a handheld device (PDA) instead of a paper questionnaire.

The main objectives of the survey are:

To identify the demographic, social and economic characteristics of the population and manpower.

To identify the occupational structure and economic activity of the employed persons, as well as their employment status.

To identify the reasons behind the desire of the employed persons to search for a new or additional job.

To measure the economic activity participation rates (the number of economically active population divided by the population of 15+ years old).

To identify the different characteristics of the unemployed persons.

To measure unemployment rates (the number of unemployed persons divided by the number of economically active population of 15+ years old) according to the various characteristics of the unemployed, and the changes that might take place in this regard.

To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.

To identify changes over time in the above-mentioned variables.
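The participation and unemployment rates defined in the objectives above can be computed from person-level records as in this Python sketch. The record fields (`age`, `status`, `weight`) and the status labels are hypothetical, and survey weights are assumed to be available; this is not the DOS or ERF code.

```python
def labour_rates(persons, min_age=15):
    """Return (participation rate, unemployment rate), both in percent.

    persons: iterable of dicts with 'age', 'status' ('employed', 'unemployed'
    or 'inactive') and 'weight' (survey weight). Field names are hypothetical.
    """
    population = active = unemployed = 0.0
    for p in persons:
        if p["age"] < min_age:
            continue  # both rates are defined on the population aged 15+
        population += p["weight"]
        if p["status"] in ("employed", "unemployed"):
            active += p["weight"]  # economically active = employed + unemployed
        if p["status"] == "unemployed":
            unemployed += p["weight"]
    participation = 100.0 * active / population
    unemployment = 100.0 * unemployed / active
    return participation, unemployment
```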

The raw survey data provided by the statistical agency were cleaned and harmonized by the Economic Research Forum as part of a major project, started in 2009, in which extensive efforts have been made to acquire, clean, harmonize, preserve, and disseminate microdata from existing labor force surveys in several Arab countries.

Geographic coverage

The sample is representative at the national level (Kingdom), at the governorate level, and for the three regions (Central, North, and South).

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey covered a national sample of households and all individuals permanently residing in surveyed households.

Kind of data

Sample survey data [ssd]

Sampling procedure

Survey Frame

The sample is based on the frame provided by the 2004 Population and Housing Census. The Kingdom was divided into strata: each city with a population of 100,000 or more was treated as a large city, and there are 6 such cities. Each governorate (excluding the 6 large cities) was divided into urban and rural areas; the remaining urban areas of each governorate formed an independent stratum, and the rural areas likewise formed an independent stratum. The total number of strata was 30.

Because socio-economic characteristics vary considerably in large cities in particular, and in urban areas in general, each large-city and urban stratum was divided into four sub-strata according to the socio-economic characteristics provided by the Population and Housing Census, in order to obtain homogeneous strata.

The frame excludes the population living in remote areas (most of whom are nomads). In addition, it does not include collective dwellings such as hotels, hospitals, work camps, prisons, and the like.

Sample Design

The sample was designed using a two-stage stratified cluster sampling method. The main sample was designed in 2009, based on data from the 2004 Population and Housing Census, for carrying out household surveys. The sample is representative at the Kingdom, rural, urban, regional, and governorate levels. The total sample size for each round was 1,336 PSUs (clusters). These units were distributed among the urban, rural, and large-city strata of each governorate according to the weight of persons and households and to the variance within each stratum. The number of units was adjusted slightly to make it a multiple of 8; the number of clusters for four rounds was 53432.

The main sample consists of 40 replicates, each containing 167 primary sampling units (PSUs). Eight replicates of the main sample were used for each round. Within each stratum, PSUs were ordered by geographic and then by socio-economic characteristics to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, PSUs were selected by systematic sampling with probability proportional to size (PPS), using the number of households in each PSU (cluster) as its size. In the second stage, the listings of the blocks in the PSUs selected in the first stage were updated, and a constant number of households (10) was selected from each PSU by random systematic sampling as the final sampling units.
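The two selection stages can be sketched as follows: systematic PPS selection of PSUs using household counts as sizes, then random systematic selection of 10 households from a PSU's updated listing. The function names are hypothetical, and the random start can be passed in explicitly so that a draw is reproducible. With systematic PPS, a PSU larger than the sampling interval can be selected more than once.

```python
import bisect
import random
from itertools import accumulate


def pps_systematic(sizes, n, start=None):
    """First stage: return indices of n PSUs selected with probability proportional to size.

    sizes: household counts per PSU, already ordered within the stratum.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = random.uniform(0, interval)  # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    # PSU i covers [cumulative[i-1], cumulative[i]); bisect_right finds the PSU
    # whose range contains each selection point start + k * interval.
    return [bisect.bisect_right(cumulative, start + k * interval) for k in range(n)]


def systematic_sample(num_households, n=10, start=None):
    """Second stage: return 0-based positions of n households in a PSU listing."""
    interval = num_households / n
    if start is None:
        start = random.uniform(0, interval)
    return [int(start + j * interval) for j in range(n)]
```

Ordering PSUs by geography and socio-economic characteristics before the systematic pass is what gives the implicit stratification (good spread) mentioned above.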

Sampling notes

Note that the sample does not represent the non-Jordanian population, because it is based on households living in conventional dwellings; that is, it does not cover collective households living in collective dwellings. The non-Jordanian households covered by the survey are therefore either private households or collective households living in conventional dwellings. In Jordan, a large number of non-Jordanian workers are known to live in groups and spend most of their time at their workplaces, so they are unlikely to be found at their residences during the daytime, when the survey data are collected. Furthermore, most of them live at their workplaces, such as workshops, shops, guard posts, or construction sites, and such places are not classified as occupied dwellings for household sampling purposes. For all these reasons, household surveys cannot fully cover this population.

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

The questionnaire was designed electronically on the PDA and revised by the DOS technical staff. It was finalized upon completion of the training program. The questionnaire is divided into main topics, each containing a clear and consistent group of questions, and designed in a way that facilitates the electronic data entry and verification. The questionnaire includes the characteristics of household members in addition to the identification information, which reflects the administrative as well as the statistical divisions of the Kingdom.

Cleaning operations

Raw Data

A tabulation plan was drawn up based on previous Employment and Unemployment Surveys, and the required programs were prepared and tested. Once all prior data processing steps were completed, the survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for data consistency, and the final report, containing detailed tabulations and the survey methodology, was prepared.

Harmonized Data

The SPSS package is used to clean and harmonize the datasets.

The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.

All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.

A country-specific program is written for each dataset to generate, compute, recode, rename, format, and label the harmonized variables.

A post-harmonization cleaning process is then conducted on the data.

Harmonized data are saved at both the household and individual levels in SPSS and then converted to Stata for dissemination.
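ERF performs the merge, rename, and recode steps above in SPSS. Purely to illustrate that logic, here is a Python sketch over plain dicts; the `hh_id` key, variable names, and code mappings are hypothetical, and a real country-specific program would cover far more variables and checks.

```python
def harmonize(individuals, households, rename, recode):
    """Attach household variables to each person, then rename and recode.

    individuals: list of dicts, each with an 'hh_id' key.
    households: dict mapping hh_id -> dict of household-level variables.
    rename: dict old_name -> harmonized_name.
    recode: dict harmonized_name -> {raw_code: harmonized_value}.
    """
    harmonized = []
    for person in individuals:
        # Person-level values win if a variable name appears at both levels.
        row = {**households[person["hh_id"]], **person}
        out = {}
        for old_name, value in row.items():
            new_name = rename.get(old_name, old_name)
            codes = recode.get(new_name, {})
            out[new_name] = codes.get(value, value)  # unmapped codes pass through
        harmonized.append(out)
    return harmonized
```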

Response rate

The results of the fieldwork indicated that all sample households were visited. The number of successfully completed interviews was 48436, that is 90.8 percent of the total sample households.

Among the reasons for unsuccessful interviews (despite three callbacks), 1.8 percent of the dwellings were closed at the time of the visit.

The findings also indicate a response rate of 95.5 percent, obtained by dividing the number of completed questionnaires by the number of expected completed interviews, that is, after excluding vacant dwellings.
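The two reported rates (90.8 and 95.5 percent) differ only in their denominators. The sketch below uses made-up counts, since the number of vacant dwellings is not given here, and assumes that "expected completed interviews" means sampled dwellings minus vacant ones.

```python
def response_rates(completed, sampled, vacant):
    """Return (completion rate over all sampled dwellings,
    response rate over non-vacant dwellings), both in percent."""
    overall = 100.0 * completed / sampled
    response = 100.0 * completed / (sampled - vacant)  # vacant dwellings excluded
    return overall, response
```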

More information on the distribution of interviews by region, governorate, and visit result is available in Table (E) on page 4 of the annual report, provided among the disseminated survey materials as a file named "Jordan 2014- Annual report (English).pdf".

Sampling error estimates

Sampling errors were calculated
Customer Shopping Trends Dataset
kaggle.com
Updated Oct 5, 2023
Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant).
Dataset updated
Oct 5, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description
Context

The Customer Shopping Preferences Dataset offers insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes, including age, gender, purchase history, preferred payment methods, and frequency of purchases. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. Note, however, that this is a synthetic dataset created for beginners to learn data analysis and machine learning.

Content

This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

Dataset Glossary (Column-wise)

Customer ID - Unique identifier for each customer

Age - Age of the customer

Gender - Gender of the customer (Male/Female)

Item Purchased - The item purchased by the customer

Category - Category of the item purchased

Purchase Amount (USD) - The amount of the purchase in USD

Location - Location where the purchase was made

Size - Size of the purchased item

Color - Color of the purchased item

Season - Season during which the purchase was made

Review Rating - Rating given by the customer for the purchased item

Subscription Status - Indicates if the customer has a subscription (Yes/No)

Shipping Type - Type of shipping chosen by the customer

Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)

Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)

Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction

Payment Method - Customer's most preferred payment method

Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
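A typed reader for the columns in the glossary above might look like this Python sketch. It assumes the CSV headers match the glossary names exactly and that the Yes/No columns hold those literal strings; only a subset of columns is parsed, and the `Purchase` class and `parse_purchases` function are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Yes/No columns are mapped to booleans (assumed literal values).
YES_NO = {"Yes": True, "No": False}


@dataclass
class Purchase:
    customer_id: int
    age: int
    gender: str
    category: str
    amount_usd: float
    review_rating: float
    subscribed: bool
    discount_applied: bool
    promo_code_used: bool
    previous_purchases: int
    frequency: str


def parse_purchases(csv_text):
    """Parse CSV text whose header row uses the glossary column names."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        Purchase(
            customer_id=int(r["Customer ID"]),
            age=int(r["Age"]),
            gender=r["Gender"],
            category=r["Category"],
            amount_usd=float(r["Purchase Amount (USD)"]),
            review_rating=float(r["Review Rating"]),
            subscribed=YES_NO[r["Subscription Status"]],
            discount_applied=YES_NO[r["Discount Applied"]],
            promo_code_used=YES_NO[r["Promo Code Used"]],
            previous_purchases=int(r["Previous Purchases"]),
            frequency=r["Frequency of Purchases"],
        )
        for r in rows
    ]
```

Typing the Yes/No flags as booleans up front avoids repeated string comparisons when filtering, for example, subscribers who used a promo code.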

Structure of the Dataset

[Image: dataset structure, https://i.imgur.com/6UEqejq.png]

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
Plan Foncier Rural Impact Evaluation 2018 - Benin
microdata.worldbank.org
catalog.ihsn.org
Updated Feb 16, 2021
Thea Hilhorst (2021). Plan Foncier Rural Impact Evaluation 2018 - Benin [Dataset]. https://microdata.worldbank.org/index.php/catalog/3850
Dataset updated
Feb 16, 2021
Dataset provided by
Daniel Ali Ayalew
Klaus Deininger
Thea Hilhorst
Time period covered
2018
Area covered
Benin
Description
Abstract

The PFR activities to be evaluated at endline consist mainly of the demarcation and registration of land parcels held under customary tenure, as a Titre Foncier or an Attestation de Détention Coutumière (ADC). The impact evaluation aims to quantify and analyse the impact of these interventions on productivity and food security, disaggregated by target group and gender.

The research questions to be answered after the endline data collection are:

1) Do PFRs (or ADCs) contribute to a perception of greater land tenure security? 2) Does improved tenure security lead to growth in agricultural investment and/or changes in land management? 3) Do PFRs improve access to land and rights over land among marginalised groups (women, youth, and migrants)? 4) Do PFRs lead to an increased number of land transactions? 5) Does increased land security address existing constraints on land markets and lead to a more efficient allocation of land resources, and thereby to increased productivity? 6) Do property rights and improved user rights result in better access to credit, possibly allowing income diversification and thus increasing household welfare? 7) Do the new arrangements put in place during PFR implementation facilitate the resolution of land conflicts, or even prevent such conflicts from emerging?

Geographic coverage

The clusters were spread across the communes of Bembéréké, Sinendé and Kalalé in the north and Tchaourou in the south of the department of Borgou.

Analysis unit

Villages

Households

Kind of data

Sample survey data [ssd]

Sampling procedure

The impact evaluation consists of data collection, disaggregated by gender and youth, at baseline (before the start of the intervention) in both treatment and control villages. Endline data will be collected at least two growing seasons after documentation is issued to farmers.

The sample consisted of 2,968 households drawn from 26 treatment villages, selected for the implementation of a Plan Foncier Rural (PFR, or rural land-holding plan), and 27 control villages that did not benefit from a PFR.

The treatment villages were assigned by the ProPFR team in geographic clusters. The assignment of control villages followed this geographic clustering, also using further village level data with the aim of finding similar villages to maximize comparability. These clusters were spread across the communes of Bembéréké, Sinendé and Kalalé in the north and Tchaourou in the south of the department of Borgou.

Villages were selected from 11 geographical clusters of villages facing similar issues, allowing easier logistical planning for the rollout of the PFRs.

Villages selected for the programme had the following characteristics:
• bordering or near a classified national forest;
• at high risk of land grabbing;
• presence of another GIZ-supported SEWOH project;
• agropastoral areas (in particular, where transhumance or cattle-driving corridors are present).

Selected villages should not, however, have any of the following:
• a border with Nigeria, within the band of increased security;
• an MCA intervention with a PFR;
• serious past conflict that could block the realisation of a PFR, or where a PFR might reignite past conflicts.

These characteristics, together with the implementing team's wish to select villages in clusters for practical reasons, presented the first challenge in selecting suitable comparison villages to measure the impact of the ProPFR programme. Clustering meant that comparison villages had to be near the clusters to be comparable. However, a geographic discontinuity could not be exploited: in northern Benin most people live in the village centre rather than being spread evenly, with sufficient density, up to the village boundary, and village boundaries are not clearly defined.

The second challenge in selecting comparison villages arose from a change in village definitions in 2013, when Benin went from 3,758 to 5,290 villages, a change often referred to as the “nouveau découpage”. Some old villages were split, but there are no clearly defined boundaries for the new set of villages. ProPFR selected from among the new villages, so the control villages also had to be selected from this list. Because the last census was conducted before this redefinition, no readily usable data existed for matching villages to those selected for ProPFR.

Given this lack of data on the characteristics of village residents, Geographical Information Systems (GIS) data were used to match each treatment (PFR) village to a control village. Villages included in the MCA’s earlier wave of PFRs were excluded from our study because the effects of the two programmes (MCA vs. ProPFR) would be difficult to separate. A 20 km buffer was drawn around each PFR village and the union of these buffers constructed for each cluster; other villages within this area were considered potential control villages. Of the selection criteria, only proximity to a national forest could be derived from GIS data, so where a treatment village was close to a national forest, we attempted to match it with a control village also close to one. Villages were additionally matched on proximity to a main road (as classified in the OpenStreetMap road shapefiles) and on the number of buildings in the village's central agglomeration. Main roads serve as a proxy for access to markets and thereby, potentially, income levels.
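A village lies inside the union of 20 km buffers exactly when it is within 20 km of at least one treatment village, so the candidate search can be sketched with a great-circle distance check instead of polygon operations. The function names, input layout, and the spherical-Earth haversine distance are assumptions; the study's GIS workflow is not described in that detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def candidate_controls(treated, others, excluded, radius_km=20.0):
    """Names of non-excluded villages inside the union of buffers around treated villages.

    treated: list of (lat, lon) for the PFR villages of one cluster.
    others: dict name -> (lat, lon) for all other villages.
    excluded: names to drop (e.g. villages from the earlier MCA wave).
    """
    return sorted(
        name for name, point in others.items()
        if name not in excluded
        and any(haversine_km(point, t) <= radius_km for t in treated)
    )
```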

The size of a village and the amount of land usable around it will be influenced by the size of its population and by the presence of national forests. This strategy is similar to Coarsened Exact Matching (CEM; see Blackwell et al., 2009), in which key characteristics (possibly continuous variables) are reduced to a small number of categories and matched exactly. In our selection, one control village was selected for each treatment village based on the key characteristics: proximity to a national forest (within 5 km), proximity to a main road (within 1 km), and a similar number of buildings (within 1 km of the central point).

For a small number of villages, we faced an issue of common support, meaning there were no exact matches on the key characteristics. In this case other nearby villages were selected which fulfilled as many of these characteristics as possible. Data were collected on a wide range of variables following the theory of change, which states that the improvements in institutions and the PFRs may lead to improved perceived land tenure security and improved access to land for women and young men through the activities carried out by the ProPFR team. This perceived land tenure security is often seen as key to agricultural investments and thereby food security in the long term, as it allows long-term planning. The issuing of official documentation provides collateral for a loan should households wish to borrow and invest in productive activities or smooth consumption.
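The coarsened matching and the fallback for villages without common support can be sketched together in Python. The 5 km and 1 km thresholds come from the text; the building-count bins, the field names, the tie-breaking by name, and the rule that a control is not reused are illustrative assumptions, not the study's exact procedure.

```python
def coarsen(village, building_bins=(0, 50, 150, 400)):
    """Reduce a village to (near_forest, near_road, building_class).

    village: dict with 'forest_km', 'road_km' and 'buildings'
    (buildings within 1 km of the central point). Bins are hypothetical.
    """
    near_forest = village["forest_km"] <= 5.0
    near_road = village["road_km"] <= 1.0
    building_class = sum(village["buildings"] >= edge for edge in building_bins)
    return near_forest, near_road, building_class


def match_controls(treated, candidates):
    """One control per treated village, without reuse.

    Prefer an exact match on the coarsened key; otherwise take the candidate
    agreeing on the most coarsened characteristics (ties broken by name).
    """
    available = dict(candidates)
    matches = {}
    for name, village in treated.items():
        key = coarsen(village)
        ranked = sorted(
            available,
            key=lambda c: (-sum(x == y for x, y in zip(key, coarsen(available[c]))), c),
        )
        if ranked:
            best = ranked[0]
            matches[name] = best
            del available[best]
    return matches
```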

Mode of data collection

Face-to-face [f2f]

Research instrument

The Survey comprised two questionnaires namely:

Household questionnaire: comprising 14 modules and 7 rosters. The modules cover household members, employment and enterprises, durable goods, housing, census of non-agricultural plots, agricultural plots, land donations, land sales, land losses, perceptions of land tenure, participation in PFR, loans, food security, and young men and women.

Community (village) questionnaire: The community survey was administered in each village through small group interviews to collect information on the villages' socio-economic characteristics, local land tenure structures and practices, and local prices of agricultural inputs and production. The questionnaire was organized in 9 modules: characteristics of the survey participants, land tenure, land use, land market, land conflicts, other village structures and interventions, agriculture, PFR, and village chief. The characteristics of the participants were recorded in a separate roster.

The household survey was administered first to the household head, with additional modules answered by the wife of the household head (or the female household head) and by a young man (defined as an unmarried man aged 18–35).

Cleaning operations

Various consistency checks were performed to ensure data quality, including systematic reports of contradictory answers and extreme values. Two main issues were reported during data collection. The first concerns the sampling of buildings: pre-selected buildings that turned out not to be housing had to be replaced, and just under 500 households required replacement. Most of the replaced buildings were not residential and therefore not eligible for the survey; they were replaced by the next building in the random order of buildings. Thanks to this robust replacement protocol, the number of buildings for which nobody could be found for surveying was very low (23).
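The replacement rule (visit buildings in one random order and replace an ineligible building with the next one in that order) can be sketched as below. The function name, the `is_eligible` callback, and the seeded shuffle are hypothetical; the survey's actual field protocol is only summarized in the text.

```python
import random


def draw_households(buildings, needed, is_eligible, seed=None):
    """Return (selected buildings, number of ineligible buildings skipped).

    buildings: iterable of building IDs; is_eligible: callable returning True
    for residential buildings where a household can be interviewed.
    """
    order = list(buildings)
    random.Random(seed).shuffle(order)  # one fixed random visiting order
    selected, replaced = [], 0
    for building in order:
        if len(selected) == needed:
            break
        if is_eligible(building):
            selected.append(building)
        else:
            replaced += 1  # replaced by the next building in the order
    return selected, replaced
```

If too few eligible buildings exist, the function simply returns fewer than `needed`, which a real protocol would flag for follow-up.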

The second issue concerns the refusal of the village of Sombouan 2 to participate in the survey. Despite several attempts, this village had to be excluded. The data were also checked for missing information in required variables and sections; any problems found were reported back to the supervisors, who made the corrections.

Response rate

The response rate for
Household Survey on Information and Communications Technology, 2014 - West...
pcbs.gov.ps
Updated Jan 28, 2020
Palestinian Central Bureau of statistics (2020). Household Survey on Information and Communications Technology, 2014 - West Bank and Gaza [Dataset]. https://www.pcbs.gov.ps/PCBS-Metadata-en-v5.2/index.php/catalog/465
Dataset updated
Jan 28, 2020
Dataset provided by
Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
Authors
Palestinian Central Bureau of Statistics
Time period covered
2014
Area covered
Gaza Strip, West Bank, Gaza
Description
Abstract

As part of PCBS' efforts to provide official Palestinian statistics on the different aspects of life in Palestinian society, and because of the widespread use of computers, the Internet, and mobile phones among Palestinians and the important role they may play in spreading knowledge and culture and in shaping public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.

The main objective of this survey is to provide statistical data on information and communication technology in Palestine, in particular on:

· the prevalence of computers and access to the Internet;
· the penetration and purposes of technology use.

Geographic coverage

Palestine (West Bank and Gaza Strip), type of locality (urban, rural, refugee camps), and governorate

Analysis unit

Household; persons aged 10 years and over.

Universe

All Palestinian households and individuals whose usual place of residence is in Palestine, with a focus on persons aged 10 years and over, in 2014.

Kind of data

Sample survey data [ssd]

Sampling procedure

Sampling Frame: The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These enumeration areas were used in the first phase as Preliminary Sampling Units in selecting the survey sample.

Sample Size: The total sample size of the survey was 7,268 households, of which 6,000 responded.

Sample Design: The sample is a stratified clustered systematic random sample. The design comprised three phases:

Phase I: Random sample of 240 enumeration areas.
Phase II: Selection of 25 households from each enumeration area selected in Phase I, using systematic random selection.
Phase III: Selection of one individual (aged 10 years or over) in the field from each selected household; Kish tables were used to ensure unbiased selection.
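Phases II and III can be sketched as below. This is a hedged illustration, not PCBS's procedure: the helper names are hypothetical, the systematic draw uses a fractional interval with a random start, and the respondent is picked by a simple random choice among eligible members as a stand-in, since Kish tables themselves are not reproduced here.

```python
import random


def systematic_sample(units, k, rng):
    """Systematic random sample of k units from an ordered list.

    Uses the (possibly fractional) interval len(units) / k and a random
    start in [0, interval); every unit has selection probability k / len(units).
    """
    n = len(units)
    if k >= n:
        return list(units)
    interval = n / k
    start = rng.random() * interval
    # min() guards against a floating-point index landing exactly on n
    return [units[min(int(start + i * interval), n - 1)] for i in range(k)]


def select_respondent(members, rng, min_age=10):
    """Pick one eligible member (aged min_age or over) at random.

    Simple random choice is a stand-in here; it does not reproduce Kish tables.
    """
    eligible = [m for m in members if m["age"] >= min_age]
    return rng.choice(eligible) if eligible else None


rng = random.Random(42)
listed = [f"hh{i:03d}" for i in range(124)]  # an EA of about 124 households
households = systematic_sample(listed, 25, rng)
respondent = select_respondent([{"age": 8}, {"age": 35}, {"age": 12}], rng)
print(len(households), respondent)
```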

Sample Strata: The sample was stratified by: 1- Governorate (16 governorates, J1). 2- Type of locality (urban, rural and camps).

Sampling deviation

-

Mode of data collection

Face-to-face [f2f]

Research instrument

The survey questionnaire consists of identification data, quality controls and three main sections:

Section I: Data on household members, including identification fields and the demographic and social characteristics of household members, such as relationship to the head of household, sex, date of birth and age.

Section II: Household data, including information on computers, access to the Internet, and possession of various media and computer equipment. This section also covers topics related to computer and Internet use, household supervision of children (5-17 years old) while they use the computer and Internet, and protective measures taken by the household at home.

Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.

Cleaning operations

Preparation of Data Entry Program: This stage included preparing the data entry programs in Microsoft Access, defining data entry control rules to avoid errors, and writing validation queries to examine the data after electronic capture.

Data Entry: Data entry started on 8 May 2014 and ended on 23 June 2014. It took place at the main PCBS office and in field offices, using 28 data-entry clerks.

Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing questionnaires before data entry to check for field errors, using a data entry application that rejects invalid entries, and then examining the data with frequency and cross tables. These checks were intended to ensure error-free data; anomalous values were cleaned and inspected to ensure consistency between the different questions on the questionnaire.
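The frequency and cross-table checks mentioned above can be sketched with the standard library. The field names `hh_internet` and `uses_internet_home` and the consistency rule are hypothetical examples chosen for illustration; they are not taken from the PCBS questionnaire.

```python
from collections import Counter


def frequency_table(records, field):
    """Count how often each value of `field` occurs."""
    return Counter(r.get(field) for r in records)


def cross_table(records, row_field, col_field):
    """Count each (row value, column value) combination."""
    return Counter((r.get(row_field), r.get(col_field)) for r in records)


def internet_rule(record):
    """Hypothetical consistency rule: a person cannot report home Internet
    use if the household reports having no Internet access."""
    return not (record["hh_internet"] == "no" and record["uses_internet_home"] == "yes")


def flag_inconsistent(records, rule):
    """Return the indices of records that violate `rule`."""
    return [i for i, r in enumerate(records) if not rule(r)]


records = [
    {"hh_internet": "no", "uses_internet_home": "yes"},
    {"hh_internet": "yes", "uses_internet_home": "yes"},
    {"hh_internet": "no", "uses_internet_home": "no"},
]
print(frequency_table(records, "hh_internet"))
print(cross_table(records, "hh_internet", "uses_internet_home"))
print(flag_inconsistent(records, internet_rule))
```

A non-zero count in an impossible cell of the cross table points to the records that need to be checked against the paper questionnaire.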

Response rate

Response rate = 79%

Sampling error estimates

Data quality has many aspects, spanning everything from the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.

Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:

Statistical Errors: Data from this survey may be affected by statistical errors because a sample, not a complete enumeration, was used. Therefore, some differences can be expected compared with the true values that a census would produce. Variances were calculated for the most important indicators.

Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.
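The report does not give its variance formula. As a rough illustration of why governorate-level estimates are noisier than national ones, the sketch below uses the textbook standard error of a proportion inflated by a design effect. The design effect of 2, the 30% indicator and the governorate subsample of 400 households are assumptions for illustration; only the 6,000 responding households come from the text.

```python
import math


def standard_error(p, n, deff=1.0):
    """Approximate standard error of an estimated proportion p from n respondents.

    Simple-random-sampling variance p(1 - p) / n, inflated by a design effect
    (deff) for clustering; no finite-population correction is applied.
    """
    return math.sqrt(deff * p * (1 - p) / n)


def coefficient_of_variation(p, n, deff=1.0):
    """Standard error relative to the estimate itself."""
    return standard_error(p, n, deff) / p


# Hypothetical inputs: a 30% indicator, design effect 2, the full sample
# versus an illustrative governorate subsample of 400 households.
national = coefficient_of_variation(0.3, 6000, deff=2.0)
governorate = coefficient_of_variation(0.3, 400, deff=2.0)
print(f"national CV {national:.3f}, governorate CV {governorate:.3f}")
```

Since the coefficient of variation scales with 1/sqrt(n), a subsample 15 times smaller has a CV about sqrt(15), or roughly 3.9, times larger.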

Non-Statistical Errors: Non-statistical errors are possible at all stages of the project, during data collection or processing. They include non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, considerable effort went into training the field workers intensively. They were trained in how to carry out the interview and what to discuss and what to avoid, and the training course combined practical and theoretical sessions. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refusals. Data entry staff were trained on the data entry program, which was tested before data entry began.

The sources of non-statistical errors can be summarized as:
1. Some households were not at home and could not be interviewed, and some households refused to be interviewed.
2. In a few cases, errors occurred because of the way interviewers asked the questions, or because respondents misunderstood some of the questions.
Data from: DC3 Miscellaneous NSF/NCAR GV-HIAPER Data
data.nasa.gov
datasets.ai
Updated Apr 1, 2025
nasa.gov (2025). DC3 Miscellaneous NSF/NCAR GV-HIAPER Data [Dataset]. https://data.nasa.gov/dataset/dc3-miscellaneous-nsf-ncar-gv-hiaper-data-270ca
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
DC3_Miscellaneous_NSF-GV-HIAPER_Data are miscellaneous data collected onboard the DC-8 aircraft during the Deep Convective Clouds and Chemistry (DC3) field campaign. This product features data from the Global Forecast System (GFS) model. Data collection for this product is complete.

The Deep Convective Clouds and Chemistry (DC3) field campaign sought to understand the dynamical, physical, and lightning processes of deep, mid-latitude continental convective clouds and to define the impact of these clouds on upper tropospheric composition and chemistry. DC3 was conducted from May to June 2012 with a base location of Salina, Kansas. Observations were conducted in northeastern Colorado, west Texas to central Oklahoma, and northern Alabama to provide a wide geographic sample of storm types and boundary layer compositions, as well as to sample convection.

DC3 had two primary science objectives. The first was to investigate storm dynamics and physics, lightning and its production of nitrogen oxides, cloud hydrometeor effects on wet deposition of species, surface emission variability, and chemistry in anvil clouds. Observations related to this objective focused on the early stages of active convection. The second objective was to investigate changes in upper tropospheric chemistry and composition after active convection. Observations related to this objective focused on the 12-48 hours following convection. This objective also served to explore seasonal change of upper tropospheric chemistry.

In addition to the NSF/NCAR Gulfstream-V (GV) aircraft, DC3 used the NASA DC-8 to provide in-situ measurements of the convective storm inflow and remotely sensed measurements used for flight planning and column characterization. DC3 used ground-based radar networks spread across its observation area to measure the physical and kinematic characteristics of storms. Additional sampling strategies relied on lightning mapping arrays, radiosondes, and precipitation collection. Lastly, DC3 used data collected from various satellite instruments, focusing on measurements from CALIOP onboard CALIPSO and CPL onboard CloudSat. In addition to providing an extensive set of data on deep, mid-latitude continental convective clouds and their impacts on upper tropospheric composition and chemistry, DC3 improved the models used to predict convective transport. DC3 improved knowledge of convection and chemistry, and provided information necessary for understanding the processes relating to ozone in the upper troposphere.
Spectral dataset of daylights and surface properties of natural objects...
zenodo.org
bin, csv
Updated Aug 28, 2024
Takuma Morimoto; Cong Zhang; Kazuho Fukuda; Keiji Uchikawa (2024). Spectral dataset of daylights and surface properties of natural objects measured in Japan [Dataset]. http://doi.org/10.5281/zenodo.5217752
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.5217752
Dataset updated
Aug 28, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Takuma Morimoto; Cong Zhang; Kazuho Fukuda; Keiji Uchikawa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Japan
Description
This is a spectral dataset of natural objects and daylights collected in Japan.

We collected 359 natural objects and measured the reflectance of all of them and the transmittance of 75 leaves. We also measured daylights from dawn till dusk on four different days using a white plate placed (i) under direct sun and (ii) in a cast shadow (359 measurements in total). Separately, we measured daylights at five different locations (including a sports ground, a space between tall buildings and a forest) with minimal time gaps between sites, to reveal how the surrounding environment influences the spectral composition of daylight reaching the ground (118 measurements in total).

If you use this dataset in your research, please cite the following publication.

Morimoto, T., Zhang, C., Fukuda, K., & Uchikawa, K. (2022). Spectral measurement of daylights and surface properties of natural objects in Japan. Optics Express, 30(3), 3183. https://doi.org/10.1364/OE.441063

The dataset contains the following Excel spreadsheets and CSV files:

(A) Surface properties of natural objects

(A-1) Reflectance_ver1-2.xlsx and .csv

(A-2) Transmittance_FrontSideUp_ver1-2.xlsx and .csv

(A-3) Transmittance_BackSideUp_ver1-2.xlsx and .csv

(B) Daylight measurements

(B-1) Daylight_TimeLapse_v1-2.xlsx and .csv

(B-2) Daylight_DifferentLocations_v1-2.xlsx and .csv

Data description

(A) Surface properties

(A-1) Reflectance_ver1-2.xlsx and .csv

This file contains surface spectral reflectance data (380 - 780 nm, 5 nm step) of 359 natural objects, including 200 flowers, 113 leaves, 23 fruits, 6 vegetables, 8 barks, and 9 stones measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.

For the analysis presented in the paper, we identified reflectance pairs whose Pearson correlation coefficient across 401 spectral channels exceeded 0.999 and removed one reflectance from each pair. The column 'Used in analysis' indicates whether each sample was used in the analysis (TRUE = used, FALSE = not used).
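The near-duplicate filter described above can be sketched as follows. The description does not say which member of a pair was dropped, so this sketch assumes a simple greedy pass in input order: a spectrum is flagged FALSE if it correlates above 0.999 with one already kept. When several spectra form a chain of high correlations, this can differ from strictly removing one member per pair. Spectra are plain lists of reflectance values; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat spectrum has undefined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def used_in_analysis(spectra, threshold=0.999):
    """Greedy near-duplicate filter returning one TRUE/FALSE flag per spectrum.

    A spectrum is kept unless its correlation with an already kept spectrum
    exceeds `threshold`.
    """
    kept, flags = [], []
    for s in spectra:
        duplicate = any(pearson(s, k) > threshold for k in kept)
        flags.append(not duplicate)
        if not duplicate:
            kept.append(s)
    return flags


a = [0.10, 0.20, 0.40, 0.80]
b = [0.20, 0.40, 0.80, 1.60]  # a scaled copy of a: correlation 1
c = [0.90, 0.10, 0.50, 0.30]
print(used_in_analysis([a, b, c]))
```

Pearson correlation ignores overall scale, so a uniformly brighter copy of a reflectance curve is treated as a duplicate.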

At the time of collection, we recorded the scientific names of flowers, leaves and barks from name boards provided by the Tokyo Institute of Technology, where the samples were collected. Where no name board was available, we used a smartphone app that automatically identifies the scientific name from an input image (PictureThis - Plant Identifier, developed by Glority Global Group Ltd.). The names of 2 flowers and 9 stones that could not be identified by either method were left blank.

(A-2) Transmittance_FrontSideUp_v1-2.xlsx and .csv

This file contains surface spectral transmittance data (380 - 780 nm, 5 nm step) for 75 leaves measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.

For these data, transmittance was measured with the front side of each leaf facing up (light was transmitted from the back side of the leaf). These are the data presented in the associated article.

(A-3) Transmittance_BackSideUp_v1-2.xlsx and .csv

Spectral transmittance data of the same leaves presented in (A-2).

For these data, transmittance was measured with the back side of each leaf facing up (light was transmitted from the front side of the leaf).

(B) Daylight measurements

(B-1) Daylight_TimeLapse_ver1-2.xlsx and .csv

This file contains daylight spectra from sunrise to sunset on four different days (2013/11/20, 2013/12/24, 2014/07/03 and 2014/10/27), measured by a spectrophotometer (SR-LEDW, Topcon, Tokyo, Japan) over 380 nm to 780 nm in 1 nm steps. We measured the light reflected from a white calibration plate placed either in direct sunlight or in a cast shadow.

The column 'Cloud cover' gives a visual estimate of the percentage of the sky covered by cloud at the time of each measurement. The column 'Red lamp' indicates whether an aircraft warning lamp at the measurement site was on (circle) or off (blank).

(B-2) Daylight_DifferentLocations_ver1-2.xlsx and .csv

This file includes daylight spectra measured at five different sites within the Suzukakedai Campus of Tokyo Institute of Technology with minimal time gaps on 2014/07/08, using a spectroradiometer (IM-1000, Topcon) over 380 nm to 780 nm in 1 nm steps. The instrument was oriented either towards the sun or towards the zenith sky. When it was oriented towards the sun, spectra were measured in two ways: (i) with a black cylinder covering the photodetector and (ii) without the cylinder.

The column 'Cylinder' indicates whether the black cylinder was used (circle) or not (cross). The column 'Cloud cover' shows the visual estimate of percentage of cloud cover at the time of each measurement. The column 'Sun hidden in clouds' denotes whether the measurement was taken when the sun was covered by clouds (circle) or not (blank).
Labor Force Survey, LFS 2013-2014 - Yemen
erfdataportal.com
Updated Oct 15, 2017
ILO Regional Office for Arab States (2017). Labor Force Survey, LFS 2013-2014 - Yemen [Dataset]. http://www.erfdataportal.com/index.php/catalog/132
Dataset updated
Oct 15, 2017
Dataset provided by
International Labour Organization (http://www.ilo.org/)
Economic Research Forum
Central Statistical Organization
Time period covered
2013 - 2014
Area covered
Yemen
Description
Abstract

THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL STATISTICAL ORGANIZATION OF YEMEN (CSO)

The primary objective of LFS 2013-2014 was to provide current data on employment and unemployment at the national and governorate levels, using the preliminary version of the new standards concerning statistics of work, employment and labour underutilization adopted by the 19th International Conference of Labour Statisticians (Geneva, October 2013).

The survey was designed to meet five main measurement objectives:
1- To provide current data on the number of employed, unemployed and underemployed persons and their demographic and social characteristics, including the extent of women's participation in economic activity, with a view to future policies for expanding their participation in the labour market.
2- To collect data on the qualifications of the labour force and the participation of the youth population in training programmes, and other data needed to improve employers' performance through knowledge of the skill levels available to them.
3- To measure the volume and characteristics of labour migration of Yemenis outside the country.
4- To provide information on wages and employment-related income in different occupations, branches of economic activity and sectors of employment.
5- To collect appropriate data for evaluating the microfinance projects funded through the Social Fund for Development.

Given the extent and diversity of the data requirements, the survey was spread over a one-year period and built around its five objectives. The core labour force survey was conducted throughout the four quarters of the survey period and included the measurement of income from employment alongside the conventional items of data collection. Data on qualifications and participation in training were collected in the third quarter, and data on labour migration in the second quarter. Data collection on microfinance was undertaken as a separate survey over the four quarters.

Geographic coverage

Survey operations were carried out in all governorates except parts where recent events have disturbed the normal course of economic activity. In these circumstances, special procedures were used for compensation, either through the replacement of those areas with other areas having otherwise similar characteristics in the respective strata or through the adjustment of the sampling weights for missing values. There were 14 such cases, 5 each in quarters 1 and 4, and 2 each in quarters 2 and 3.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The labour force survey covered the civilian, non-institutional, settled population. It excluded certain areas with difficult access or low population density and, in particular, the nomad population; displaced persons who are homeless; the population living in public housing (boarding houses, hotels, prisons, hospitals, etc.); and individuals enlisted in the Armed Forces who reside permanently in camps and do not spend most days of the year with their families. Marine crews, expatriates outside the country and other categories of persons on remote islands were likewise excluded.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample design of the labour force survey of Yemen 2013-2014 is a two-stage stratified sample of enumeration areas in the first stage of sampling and a fixed number of sample households at the second stage of sampling. The resulting sample is spread evenly over the four quarters of the survey period.

Accordingly, the Central Statistical Organization (CSO) drew a stratified sample of census enumeration areas, recomposed as primary sampling units (PSUs). Sample selection was made with probability proportional to the number of households as determined in the 2004 population census. In the second stage of sampling, after relisting of the sample enumeration areas, a fixed number of households (16 sample households) were drawn as clusters with equal probability from each sample enumeration area. The strata consist of the urban and rural areas of the 21 governorates in Yemen.
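The first stage, selecting enumeration areas with probability proportional to their census household counts, can be sketched with systematic PPS sampling within one stratum. The text does not say which PPS method CSO used, so systematic PPS is an assumption here, as are the household counts and the `pps_systematic` helper name.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes[i] is the measure of size of unit i (e.g. census household count).
    Returns selected unit indices in list order. A unit larger than the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval
    points = [start + i * interval for i in range(n)]
    selected, cumulative, j = [], 0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # every selection point falling inside this unit's cumulative range picks it
        while j < n and points[j] < cumulative:
            selected.append(index)
            j += 1
    # floating-point safety: a point rounding onto the total belongs to the last unit
    selected.extend([len(sizes) - 1] * (n - j))
    return selected


# Hypothetical household counts for the EAs of one stratum
census_households = [120, 80, 300, 50, 150, 200, 100]
print(pps_systematic(census_households, 3, random.Random(3)))
```

In practice this would be run separately in each urban and rural stratum of each governorate.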

According to the sample design, urban areas are oversampled and rural areas undersampled. This is because a relatively larger sample size is required in urban areas, where heterogeneity is greater than in rural areas. Also, because transportation and field operations cost relatively more in rural areas, it is more cost-effective to undersample rural areas relative to urban areas. The differential sampling rates are then corrected through the sample weights so that the final results accurately reflect the overall employment pattern.
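How the sample weights undo differential sampling rates can be illustrated with the standard base design weight for a two-stage design: the inverse of the product of the EA selection probability and the within-EA household selection probability. This is a textbook sketch, not CSO's weighting program; the actual weights may include non-response and other adjustments not shown, and all numbers below are hypothetical.

```python
def design_weight(n_eas_in_stratum, census_hh_in_ea, census_hh_in_stratum,
                  listed_hh_in_ea, hh_per_ea=16):
    """Base design weight for a household under a two-stage design.

    Stage 1: EA drawn with probability proportional to its census household count.
    Stage 2: hh_per_ea households drawn with equal probability from the fresh
    listing (all households are taken if the EA has fewer than hh_per_ea).
    The weight is the inverse of the overall selection probability.
    """
    p_ea = min(1.0, n_eas_in_stratum * census_hh_in_ea / census_hh_in_stratum)
    p_hh = min(hh_per_ea, listed_hh_in_ea) / listed_hh_in_ea
    return 1.0 / (p_ea * p_hh)


def weighted_rate(values, weights):
    """Weighted mean of 0/1 indicators, i.e. a weighted proportion."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical numbers: an oversampled urban EA and an undersampled rural EA
urban_w = design_weight(40, 150, 20000, 160)
rural_w = design_weight(10, 100, 20000, 120)
print(urban_w, rural_w)
```

Each sampled rural household here stands for more households than an urban one, which offsets the lower rural sampling rate when rates are computed with `weighted_rate`.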

The cluster of 16 households in each sample enumeration area was drawn after a fresh listing of all households living in the sample enumeration area at the time of listing. This procedure updates the census information, which dates back to 2004. The listing operations are carried out in each quarter before survey interviewing. The updated lists are sent to CSO in Sana'a for data entry and for selection of sample households, which are then transmitted to the survey team in each area. Instructions were given that sample households that could not be found in the field, were absent, or refused to be interviewed should not be substituted with other households, as this may introduce bias in the results. Instructions were also given that where a sample enumeration area was found to contain fewer than the required 16 households in a quarter, all households in the enumeration area should be taken into the sample.
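The second-stage rules above (equal-probability selection of 16 households from the fresh listing, take-all when fewer than 16 are listed, no substitution) can be sketched as below. Simple random sampling without replacement is assumed here; the text does not say whether CSO used a random or systematic draw, and `select_households` is a hypothetical name.

```python
import random


def select_households(listed_households, m=16, rng=None):
    """Equal-probability selection of m households from a fresh EA listing.

    If the listing has m households or fewer, all of them are taken
    (the take-all rule). Selected households are never substituted;
    absent or refusing households remain recorded as non-response.
    """
    rng = rng or random.Random()
    if len(listed_households) <= m:
        return list(listed_households)
    return rng.sample(listed_households, m)


rng = random.Random(7)
quarter_listing = [f"hh{i}" for i in range(40)]
print(select_households(quarter_listing, rng=rng))
print(select_households(["a", "b", "c"], rng=rng))  # fewer than 16: take all
```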

The total sample size was determined on the basis of the requirement to produce national estimates of the unemployment rate with a 1.5% margin of error, assuming an overall non-response rate of 15% and a design effect of 3. For the determination of the national sample size, the expected unemployment rate was set at 15%, and the expected number of sample households needed to reach one person of working age (15 years old and over) in the labour force was set at 0.6.
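The text lists the inputs but not the formula. A common textbook calculation is sketched below under loudly stated assumptions: the 1.5% margin is an absolute half-width at about 95% confidence (z = 1.96), and the 0.6 factor is read as labour-force members per sampled household, so persons are divided by 0.6 (under the other reading, persons would be multiplied by 0.6 instead). The result is illustrative only and is not presented as the survey's actual sample size; the allocation across governorates is described in the report.

```python
import math


def lfs_sample_size(p=0.15, margin=0.015, deff=3.0, nonresponse=0.15,
                    lf_per_household=0.6, z=1.96):
    """Households needed to estimate a rate p within +/- margin.

    Assumptions: margin is an absolute half-width at about 95% confidence
    (z = 1.96), and lf_per_household is read as labour-force members per
    sampled household.
    """
    persons = deff * z ** 2 * p * (1 - p) / margin ** 2  # labour-force persons needed
    households = persons / lf_per_household             # convert persons to households
    return math.ceil(households / (1 - nonresponse))    # inflate for non-response


households = lfs_sample_size()
print(households, math.ceil(households / 16))  # households, EAs of 16 households
```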

A more detailed description of the allocation of sample across governorates is provided in the report document available among external resources in English.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire of the Yemen LFS 2013-2014 was designed on the basis of the ILO model LFS questionnaire (version A) and other national LFS questionnaires used in the region. The draft questionnaire was field tested with six households in Sana’a, each member of the field staff interviewing one sample household in his or her area. The experience gained in the field test was reviewed and led to some modifications of the draft questionnaire.

Apart from the cover page and the back page, the core LFS questionnaire contains 52 questions. There are 11 questions on the social and demographic characteristics of the household members in the household roster. In the individual questionnaire addressed to the working-age population (15 years of age or older), there are 3 questions to identify employed persons and 19 questions on their employment characteristics, including time-related underemployment, followed by 8 additional questions on income from employment. The individual questionnaire also includes 5 questions to identify the unemployed and the potential labour force, and 5 follow-up questions on unemployment characteristics.

Cleaning operations

----> Raw Data

Data processing involved data entry, coding, editing and tabulation of the survey results. Data entry was carried out in parallel with the interviewing of sample households. It was conducted at the Central Statistical Organization headquarters in Sana'a, where all data processing operations except tabulation were centralized.

The supervisory staff of the data entry operations were responsible for editing the questionnaires before data entry. Editing at this stage involved reviewing the filled-in contents of each questionnaire, including ensuring that no block of information was missing for household members aged 15 years and over and that occupation, branch of economic activity and other variables were coded correctly.

The data files were further processed at ILO headquarters in Geneva. They were first converted into a single file with 86,778 records and augmented with several fields, in particular, the sampling weights (“weight”) and the key derived variables: employed (E), unemployed (U), time-related underemployment (TRU), potential labour force (PLF) as well as other derived variables such as informal sector employment (IS) and informal employment (IE).
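With the derived flags and the `weight` field described above, the headline labour underutilization indicators defined by the 19th ICLS (LU1 to LU4) can be computed as weighted ratios. The sketch assumes each record carries 0/1 values for E, U, TRU and PLF, with TRU a subset of E; the actual coding in the ILO file may differ, and the records below are hypothetical.

```python
def labour_indicators(records):
    """Weighted LU1-LU4 from per-person 0/1 flags and a 'weight' field."""
    totals = {"E": 0.0, "U": 0.0, "TRU": 0.0, "PLF": 0.0}
    for r in records:
        for key in totals:
            totals[key] += r["weight"] * r[key]
    labour_force = totals["E"] + totals["U"]
    extended = labour_force + totals["PLF"]  # labour force plus potential labour force
    return {
        "LU1": totals["U"] / labour_force,                      # unemployment rate
        "LU2": (totals["TRU"] + totals["U"]) / labour_force,    # TRU + unemployment
        "LU3": (totals["U"] + totals["PLF"]) / extended,        # unemployment + PLF
        "LU4": (totals["TRU"] + totals["U"] + totals["PLF"]) / extended,  # composite
    }


people = [
    {"weight": 10, "E": 1, "U": 0, "TRU": 1, "PLF": 0},
    {"weight": 10, "E": 1, "U": 0, "TRU": 0, "PLF": 0},
    {"weight": 5, "E": 0, "U": 1, "TRU": 0, "PLF": 0},
    {"weight": 5, "E": 0, "U": 0, "TRU": 0, "PLF": 1},
]
print(labour_indicators(people))
```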

----> Harmonized Data

The SPSS package is used to clean and harmonize the datasets.

The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.

All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.

A country-specific program is generated
Global Invasive and Alien Traits and Records (GIATAR) dataset
zenodo.org
zip
Updated Mar 19, 2025
Ariel Saffer; Thom Worm (2025). Global Invasive and Alien Traits and Records (GIATAR) dataset [Dataset]. http://doi.org/10.5281/zenodo.15042321
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15042321
Dataset updated
Mar 19, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ariel Saffer; Thom Worm
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Time period covered
Jul 30, 2024
Description
Monitoring and managing the global spread of invasive and alien species requires accurate spatiotemporal records of species presence and information about the biological characteristics of species of interest, including life cycle information, biotic and abiotic constraints and pathways of spread. The Global Invasive and Alien Traits and Records (GIATAR) dataset provides consolidated, dated records of invasive and alien presence at the country scale, combined with a suite of biological information about pests of interest, in a standardized, machine-readable format. We provide dated presence records for 46,666 alien taxa in 249 countries, constituting 827,300 country-taxon pairs, joined with additional biological information for thousands of taxa. GIATAR is designed to be quickly updatable with future data and easy to integrate into ongoing research on global patterns of alien species movement, using the provided scripts to query and analyze the data.
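The idea of consolidating dated presence records into country-taxon pairs, each with its earliest record, can be sketched as below. The `(taxon, country, year)` tuple layout and the helper names are hypothetical and do not mirror GIATAR's actual tables or its published Python and R query functions.

```python
def first_records(records):
    """Earliest recorded year for each (taxon, country) pair.

    records: iterable of (taxon, country, year) tuples; this layout is
    hypothetical and does not mirror GIATAR's actual tables.
    """
    first = {}
    for taxon, country, year in records:
        key = (taxon, country)
        if key not in first or year < first[key]:
            first[key] = year
    return first


def summarize(first):
    """Counts of distinct taxa, countries and country-taxon pairs."""
    taxa = {taxon for taxon, _ in first}
    countries = {country for _, country in first}
    return len(taxa), len(countries), len(first)


raw = [
    ("taxon_A", "country_1", 1990),
    ("taxon_A", "country_1", 1995),  # later duplicate record of the same pair
    ("taxon_A", "country_2", 1999),
    ("taxon_B", "country_1", 2012),
]
pairs = first_records(raw)
print(pairs, summarize(pairs))
```

The three counts returned by `summarize` correspond to the kind of totals quoted above (taxa, countries, country-taxon pairs).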

This publication includes:

GIATAR dataset files (dataset)

Functions in Python and R to join tables and query data (query_functions)

Tutorials and example queries in Python and R (tutorials)

For more information, please refer to the publication:

Saffer, Ariel, Thom Worm, Yu Takeuchi, and Ross Meentemeyer. “GIATAR: A Spatio-Temporal Dataset of Global Invasive and Alien Species and Their Traits.” Scientific Data 11, no. 1 (September 11, 2024): 991. https://doi.org/10.1038/s41597-024-03824-w.

Changes in this version (March 19, 2025):

Removed base folder from folder structure

Included additional files used to update the database

Latest records as of March 9 - 10, 2025

Updated species list from EPPO as of February 26, 2025

For continuous updates to code, please refer to our Github repository: https://github.com/ncsu-landscape-dynamics/GIATAR-dataset
STEMMUS SCOPE emulator train test example data of 2014
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Jul 3, 2024
Fakhereh (Sarah) Alidoost; Qianqian Han (2024). STEMMUS SCOPE emulator train test example data of 2014 [Dataset]. http://doi.org/10.5281/zenodo.12623257
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.12623257
Dataset updated
Jul 3, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Fakhereh (Sarah) Alidoost; Qianqian Han
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

The "csv" file contains land-atmosphere variables and latent heat flux (LEtot) simulated by STEMMUS-SCOPE (a soil-plant model), version 1.5.0; see the GitHub repository STEMMUS-SCOPE. The data cover 19 Fluxnet sites for the year 2014 at hourly intervals. For more information, see the EcoExtreML project.

These data were used as training pairs to develop an emulator based on a random forest regression algorithm; the emulator is provided as the "onnx" file. The target variable is latent heat flux (LEtot) and the features are land-atmosphere variables. For more information about the emulator, see the GitHub repository STEMMUS-SCOPE Emulator.

The model and data were used to create a tutorial on applying an explainability method, such as Kernel SHAP, with the DIANNA package. For more information see Deep Insight and Neural Network Analysis (DIANNA) project.
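Kernel SHAP approximates Shapley values by sampling feature coalitions and fitting a weighted linear model. For intuition, the sketch below computes exact Shapley values by enumerating every coalition, which is feasible only for a handful of features. It does not use DIANNA or the STEMMUS-SCOPE emulator: `toy_model` is a hypothetical linear stand-in, and absent features are replaced by a single baseline vector (an assumption; SHAP tools usually average over background data).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition are replaced by their baseline values.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy stand-in for an emulator: three hypothetical input features
def toy_model(z):
    return 2.0 * z[0] + 3.0 * z[1] - z[2]


print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For a linear model each Shapley value reduces to the coefficient times the feature's change from the baseline, and the values sum to f(x) minus f(baseline), which is what Kernel SHAP's estimates should approach.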
GAL Hydrochemistry Formations QC for TDS v02 Surfaces
researchdata.edu.au
devweb.dga.links.com.au
Updated Mar 29, 2016
Bioregional Assessment Program (2016). GAL Hydrochemistry Formations QC for TDS v02 Surfaces [Dataset]. https://researchdata.edu.au/gal-hydrochemistry-formations-v02-surfaces/2994238
Dataset updated
Mar 29, 2016
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

This dataset was derived by the Bioregional Assessment Programme. The parent datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

This dataset contains raster representations of Total Dissolved Solid (TDS) measurement trends in groundwater samples for each hydrogeological formation in the Galilee Basin subregion.

The dataset also contains supplementary polygon Feature Classes for each formation, to be used in the visualisation of the rasters. For each formation this includes:

a) A rectangular data extent polygon feature class, created from the distribution of data points for each formation and used to define the extent of each raster

b) Data extent mask - further defines the extent of data distribution as well as the spatial extent of the formation, used to visualise the TDS trends for each formation only within the formation boundary and near the spread of point data.

Purpose

Provides a visual representation for use in maps, of TDS measurement trends in groundwater for each hydrogeological formation in the Galilee Basin subregion.

Dataset History

The raster layers within this dataset were created using the 'Topo to Raster' interpolation method in ArcGIS. Topo to Raster uses an iterative finite difference interpolation technique. This method is preferred for map and visualisation purposes, especially in sparse data regions, as surface continuity is not compromised at a global level. This results in raster layers with smooth surfaces and trends for any level of data density, and surface continuity between areas of varying density.

Raster layers and polygon Feature Classes were created from the source point Feature Classes (dataset: GAL Hydrochemistry Formations QC for TDS v02 GIS - GUID: 109a21cd-a167-4320-84be-ab56cfc12cee)

Formation Data Extent polygons: An arbitrary rectangular polygon was created around the extent of points contained in each source point Feature Class

Formation Data Extent Mask: a hole was clipped from the Formation Data Extent polygon. The eastern boundary of each hole was traced from the equivalent formation polygon in the Galilee Groundwater Model, Hydrogeological Formation Extents v01 dataset (GUID: 5afbf7f1-1ee0-444b-9f77-dbad8d8de95b), while the western, northern and southern extents were defined by the distribution of point data or by the Galilee subregion boundary (Bioregional Assessment areas v03, GUID: 96dbf469-5463-4f4d-8fad-4214c97e5aac).

Topo to Raster parameters

Input feature data = respective point feature class from source dataset

Field = TDS

Type = Point Elevation

Output cell size = 0.001

Output extent = Formation data extent polygon Feature Class

Smallest z value to be used in interpolation = smallest TDS value of input point Feature Class

Largest z value to be used in interpolation = largest TDS value of input point Feature Class

Drainage enforcement = NO_ENFORCE

Primary type of input data = SPOT

All other parameters were left at their defaults.
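Three of the Topo to Raster settings listed above depend on each formation's input points: the output extent, and the smallest and largest z values (the minimum and maximum TDS). The sketch below gathers the listed settings into a plain dictionary per formation and reports the grid size implied by the 0.001 cell size. The dictionary keys and function name are hypothetical labels rather than ArcGIS parameter names, the cell size is assumed to be in the same units as the extent coordinates, and the code does not call the Topo to Raster tool itself.

```python
import math


def topo_to_raster_settings(points, extent, cell_size=0.001):
    """Collect the Topo to Raster settings listed above for one formation.

    points: list of (x, y, tds) tuples from the source point Feature Class.
    extent: (xmin, ymin, xmax, ymax) of the formation data extent polygon.
    Keys are illustrative labels, not ArcGIS parameter names.
    """
    tds = [v for _, _, v in points]
    xmin, ymin, xmax, ymax = extent
    # Round before ceil so floating-point noise does not add an extra row or column.
    ncols = math.ceil(round((xmax - xmin) / cell_size, 6))
    nrows = math.ceil(round((ymax - ymin) / cell_size, 6))
    return {
        "field": "TDS",
        "input_type": "Point Elevation",
        "cell_size": cell_size,
        "extent": extent,
        "min_z": min(tds),  # smallest TDS value of the input points
        "max_z": max(tds),  # largest TDS value of the input points
        "drainage_enforcement": "NO_ENFORCE",
        "primary_data_type": "SPOT",
        "grid_shape": (nrows, ncols),
    }
```

For example, made-up points spanning a 0.5 by 0.5 extent give a grid of 500 by 500 cells at the 0.001 cell size.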

Dataset Citation

Bioregional Assessment Programme (XXXX) GAL Hydrochemistry Formations QC for TDS v02 Surfaces. Bioregional Assessment Derived Dataset. Viewed 11 April 2016, http://data.bioregionalassessments.gov.au/dataset/ff165a41-f7f3-4922-870e-6837fd40f228.

Dataset Ancestors

Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204

Derived From GAL Hydrochemistry Formations QC for TDS v02

Derived From GAL Aquifer Formation Extents v01

Derived From GAL Aquifer Formation Extents v02

Derived From Carmichael Coal Mine and Rail Project Environmental Impact Statement

Derived From QLD Hydrochemistry QA QC GAL v02

Derived From Bioregional Assessment areas v03

Derived From GAL Hydrochemistry Formations QC for TDS v01

Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores v3 03122014

Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)

Derived From RPS Galilee Hydrogeological Investigations - Appendix tables B to F (original)

Derived From GEODATA TOPO 250K Series 3

Derived From NSW Catchment Management Authority Boundaries 20130917

Derived From Geological Provinces - Full Extent

Derived From Phanerozoic OZ SEEBASE v2 GIS

Derived From QLD Department of Natural Resources and Mining Groundwater Database Extract 20131111

Derived From QLD DNRM Hydrochemistry with QA/QC

Derived From GAL Hydrochemistry Formations QC for TDS v02 GIS

Derived From Galilee Groundwater Model, Hydrogeological Formation Extents v01

Derived From Queensland petroleum exploration data - QPED

Derived From Natural Resource Management (NRM) Regions 2010

Derived From Three-dimensional visualisation of the Great Artesian Basin - GABWRA

Derived From QLD Department of Natural Resources and Mining Groundwater Database Extract 20142808

Derived From Bioregional Assessment areas v01

Derived From Bioregional Assessment areas v02

Derived From Queensland Geological Digital Data - Detailed state extent, regional. November 2012

Derived From Geoscience Australia, 1 second SRTM Digital Elevation Model (DEM)




Data from: Selection of Pairings Reaching Evenly Across the Data (SPREAD): a...

E-commerce Sales Prediction Dataset


📂 Dataset Overview

📊 Data Summary

General Properties

📈 Data Visualizations

💡 How the Data Was Created

🛠 Example Usage: Sales Prediction Model

Written in python

1995 IFREMER Cartopep Acoustic Survey data - Habitat map

LANDFIRE.HI_110FBFM40

Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends,...

Landfire 13 Anderson Fire Behavior Fuel Models Version 140 (CONUS) (Image...

Household Budget Survey 2008 - Greece

Abstract

Geographic coverage

Analysis unit

Kind of data

Frequency of data collection

Sampling procedure

Mode of data collection

Data from: Testing and Estimation of Social Network Dependence With Time to...

Data from: Hydrochemical Atlas of the Arctic Ocean

Data from: Seed size, seed dispersal traits, and plant dispersion patterns...

Employment and Unemployment Survey 2014, Economic Research Forum (ERF)...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Survey Frame

Sample Design

Sampling notes

Mode of data collection

Research instrument

Cleaning operations

Raw Data

Harmonized Data

Response rate

Sampling error estimates

Customer Shopping Trends Dataset

Context

Content

Dataset Glossary (Column-wise)

Structure of the Dataset

Acknowledgement

Plan Foncier Rural Impact Evaluation 2018 - Benin

Abstract

Geographic coverage

Analysis unit

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Household Survey on Information and Communications Technology, 2014 - West...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Sampling deviation

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

Data from: DC3 Miscellaneous NSF/NCAR GV-HIAPER Data

Spectral dataset of daylights and surface properties of natural objects...

Labor Force Survey, LFS 2013-2014 - Yemen

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure