83 datasets found
  1. Data from: Selection of Pairings Reaching Evenly Across the Data (SPREAD): a...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Aug 20, 2015
    Cite
    Kolea Zimmerman; Daniel Levitis; Ethan Addicott; Anne Pringle (2015). Selection of Pairings Reaching Evenly Across the Data (SPREAD): a simple algorithm to design maximally informative fully crossed mating experiments [Dataset]. http://doi.org/10.5061/dryad.8br20
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 20, 2015
    Dataset provided by
    Bates College
    University of Wisconsin–Madison
    Harvard University
    Authors
    Kolea Zimmerman; Daniel Levitis; Ethan Addicott; Anne Pringle
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    We present a novel algorithm for the design of crossing experiments. The algorithm identifies a set of individuals (a "crossing-set") from a larger pool of potential crossing-sets by maximizing the diversity of traits of interest, for example, maximizing the range of genetic and geographic distances between individuals included in the crossing-set. To calculate diversity, we use the mean nearest neighbor distance of crosses plotted in trait space. We implement our algorithm on a real dataset of Neurospora crassa strains, using the genetic and geographic distances between potential crosses as a two-dimensional trait space. In simulated mating experiments, crossing-sets selected by our algorithm provide better estimates of underlying parameter values than randomly chosen crossing-sets.
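
    The diversity measure described above, the mean nearest neighbor distance of crosses in trait space, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coordinates, and the simplification of scoring each candidate set directly as a list of cross coordinates are all assumptions made for illustration.

```python
import numpy as np

def mean_nearest_neighbor_distance(points):
    """Mean distance from each point to its nearest other point."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all crosses in trait space.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore the zero distance from each point to itself.
    np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1).mean()

def select_spread_set(candidate_sets):
    """Return (index, score) of the candidate with the largest mean nearest neighbor distance."""
    scores = [mean_nearest_neighbor_distance(s) for s in candidate_sets]
    best = int(np.argmax(scores))
    return best, scores[best]

# Hypothetical trait-space coordinates, e.g. scaled genetic and geographic distances.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread_out = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
best_index, best_score = select_spread_set([clustered, spread_out])
```

    Here the second candidate is preferred because its crosses are spaced more evenly across the trait space; in practice the candidates would be drawn from the larger pool of potential crossing-sets.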

  2. E-commerce Sales Prediction Dataset

    • kaggle.com
    Updated Dec 14, 2024
    Cite
    Nevil Dhinoja (2024). E-commerce Sales Prediction Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10197264
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 14, 2024
    Dataset provided by
    Kaggle
    Authors
    Nevil Dhinoja
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    E-commerce Sales Prediction Dataset

    This repository contains a comprehensive and clean dataset for predicting e-commerce sales, tailored for data scientists, machine learning enthusiasts, and researchers. The dataset is crafted to analyze sales trends, optimize pricing strategies, and develop predictive models for sales forecasting.

    📂 Dataset Overview

    The dataset includes 1,000 records across the following features:

    Column Name       Description
    Date              The date of the sale (01-01-2023 onward).
    Product_Category  Category of the product (e.g., Electronics, Sports, Other).
    Price             Price of the product (numerical).
    Discount          Discount applied to the product (numerical).
    Customer_Segment  Buyer segment (e.g., Regular, Occasional, Other).
    Marketing_Spend   Marketing budget allocated for sales (numerical).
    Units_Sold        Number of units sold per transaction (numerical).

    📊 Data Summary

    General Properties

    Date:
      • Range: 01-01-2023 to 12-31-2023.
      • Contains 1,000 unique values without missing data.

    Product_Category:
      • Categories: Electronics (21%), Sports (21%), Other (58%).
      • Most common category: Electronics (21%).

    Price:
      • Range: From 244 to 999.
      • Mean: 505, Standard Deviation: 290.
      • Most common price range: 14.59 - 113.07.

    Discount:
      • Range: From 0.01% to 49.92%.
      • Mean: 24.9%, Standard Deviation: 14.4%.
      • Most common discount range: 0.01 - 5.00%.

    Customer_Segment:
      • Segments: Regular (35%), Occasional (34%), Other (31%).
      • Most common segment: Regular.

    Marketing_Spend:
      • Range: From 2.41k to 10k.
      • Mean: 4.91k, Standard Deviation: 2.84k.

    Units_Sold:
      • Range: From 5 to 57.
      • Mean: 29.6, Standard Deviation: 7.26.
      • Most common range: 24 - 34 units sold.

    📈 Data Visualizations

    The dataset is suitable for creating the following visualizations:

      1. Price Distribution: Histogram to show the spread of prices.
      2. Discount Distribution: Histogram to analyze promotional offers.
      3. Marketing Spend Distribution: Histogram to understand marketing investment patterns.
      4. Customer Segment Distribution: Bar plot of customer segments.
      5. Price vs Units Sold: Scatter plot to show pricing effects on sales.
      6. Discount vs Units Sold: Scatter plot to explore the impact of discounts.
      7. Marketing Spend vs Units Sold: Scatter plot for marketing effectiveness.
      8. Correlation Heatmap: Identify relationships between features.
      9. Pairplot: Visualize pairwise feature interactions.

    💡 How the Data Was Created

    The dataset is synthetically generated to mimic realistic e-commerce sales trends. Below are the steps taken for data generation:

    1. Feature Engineering:

      • Identified key attributes such as product category, price, discount, and marketing spend, typically observed in e-commerce data.
      • Generated dependent features like units sold based on logical relationships.
    2. Data Simulation:

      • Python Libraries: Used NumPy and Pandas to generate and distribute values.
      • Statistical Modeling: Ensured feature distributions aligned with real-world sales data patterns.
    3. Validation:

      • Verified data consistency with no missing or invalid values.
      • Ensured logical correlations (e.g., higher discounts → increased units sold).

    Note: The dataset is synthetic and not sourced from any real-world e-commerce platform.
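
    The generation steps above could look roughly like the following sketch. It is not the author's script: the value ranges, the coefficients linking discount, marketing spend, and price to units sold, and the noise level are all assumptions chosen only to illustrate the approach with NumPy and Pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible
n = 1000

# Independent features; ranges and category shares are illustrative assumptions.
price = rng.uniform(10, 1000, size=n).round(2)
discount = rng.uniform(0, 50, size=n).round(2)
marketing = rng.uniform(100, 10000, size=n).round(2)
categories = rng.choice(["Electronics", "Sports", "Other"], size=n, p=[0.21, 0.21, 0.58])
segments = rng.choice(["Regular", "Occasional", "Other"], size=n, p=[0.35, 0.34, 0.31])

# Dependent feature: assumed linear relationship plus noise, so that higher
# discounts and marketing spend raise units sold and higher prices lower them.
noise = rng.normal(0, 5, size=n)
units = 20 + 0.3 * discount + 0.0005 * marketing - 0.005 * price + noise
units = np.clip(np.round(units), 1, None).astype(int)

df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=n, freq="D"),
    "Product_Category": categories,
    "Price": price,
    "Discount": discount,
    "Customer_Segment": segments,
    "Marketing_Spend": marketing,
    "Units_Sold": units,
})

# Validation step: no missing values and the intended positive correlation.
has_missing = bool(df.isna().any().any())
discount_corr = df["Discount"].corr(df["Units_Sold"])
```

    The two values computed at the end mirror the validation step: `has_missing` checks for gaps and `discount_corr` checks that discounts and units sold move together.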

    🛠 Example Usage: Sales Prediction Model

    Here’s an example of building a predictive model using Linear Regression:

    Written in python

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    
    # Load the dataset
    df = pd.read_csv('ecommerce_sales.csv')
    
    # Feature selection
    X = df[['Price', 'Discount', 'Marketing_Spend']]
    y = df['Units_Sold']
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Model training
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    
    # Evaluation
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    print(f'Mean Squared Error: {mse:.2f}')
    print(f'R-squared: {r2:.2f}')
    
  3. 1995 IFREMER Cartopep Acoustic Survey data - Habitat map

    • gis.ices.dk
    ogc:wfs, ogc:wms +1
    Updated Jan 20, 2014
    + more versions
    Cite
    IFREMER (2014). 1995 IFREMER Cartopep Acoustic Survey data - Habitat map [Dataset]. https://gis.ices.dk/geonetwork/srv/api/records/b92bf5a1-10bf-4818-a96c-e248da106372
    Explore at:
    Available download formats: ogc:wfs, www:link-1.0-http--link, ogc:wms
    Dataset updated
    Jan 20, 2014
    Dataset provided by
    Joint Nature Conservation Committee
    Authors
    IFREMER
    Time period covered
    Jan 1, 2014 - Jun 1, 2014
    Area covered
    Description

    Interpretation of Multibeam Bathymetry and Backscatter data from the Cartopep campaign (1995). Multibeam bathymetry data were processed with the Caraibes software (v3.9) and a 30m grid was created. The data was acquired in 1995, when multibeam systems were first emerging onto the commercial market. The number of beams per ping was limited and thus the data density is low in comparison to more modern data. Data editing was only done on data that was significantly in error. The multibeam backscatter data was also processed in the Caraibes software package and produced 2 mosaics at 50m resolution. The backscatter data did not have higher resolution data. Correlation with ground truth data, consisting of sediment samples and photographic imagery samples, predominantly toward the top of slope and seamount summit and flanks, allowed basic interpretation of the bathymetry and backscatter data near the sample locations. To aid interpretation, the sediment samples had been divided into FOLK categories, and habitat type determinations, as well as faunal communities in some instances, had been made for the photographic imagery samples. Characterisation was spread over the whole area and divided into polygon regions by finding interpretive boundaries on either backscatter imagery (such as texture changes or contrast changes) or on the bathymetric layers of slope, rugosity and relief.

  4. LANDFIRE.HI_110FBFM40

    • catalog.data.gov
    Updated Nov 11, 2021
    + more versions
    Cite
    U.S. Geological Survey (2021). LANDFIRE.HI_110FBFM40 [Dataset]. https://catalog.data.gov/dataset/landfire-hi-110fbfm40
    Explore at:
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    U.S. Geological Survey
    Description

    The LANDFIRE fuel data describe the composition and characteristics of both surface fuel and canopy fuel. Specific products include fire behavior fuel models, canopy bulk density (CBD), canopy base height (CBH), canopy cover (CC), canopy height (CH), and fuel loading models (FLMs). These data may be implemented within models to predict the behavior and effects of wildland fire. These data are useful for strategic fuel treatment prioritization and tactical assessment of fire behavior and effects. DATA SUMMARY: These fire behavior fuel models represent distinct distributions of fuel loadings found among surface fuel components (live and dead), size classes and fuel types. The fuel models are described by the most common fire carrying fuel type (grass, brush, timber litter or slash), loading and surface area-to-volume ratio by size class and component, fuelbed depth and moisture of extinction. Further detail can be found in Scott and Burgan (2005) and Rothermel (1983). This data layer contains a complete set of fire behavior fuel models for use with Rothermel's fire spread models. Characteristics of the new fuel model set, its development and its relationship to the original set of 13 fire behavior fuel models can be found in Burgan (2005). In fire behavior fuel models, canopy characteristics are used to compute shading, wind reduction factors, spotting distances, crown fuel volume, spread characteristics of crown fires and incorporate the effects of ladder fuels for transitions from a surface to crown fire. Canopy characteristics refer to the tree canopy. Where there are tree canopies, i.e. existing vegetation types that are forest and woodland, LANDFIRE has attributed the grid with canopy characteristics with some exceptions. There will be no canopy characteristics in fuel types where the tree canopy is considered a part of the surface fuel and the surface fire behavior fuel model is chosen as such. 
    This is because LANDFIRE assumes the potential burnable biomass in the tree canopy has been accounted for in the surface fuel model parameters. For example, young or short conifer stands where the trees are represented by a shrub type fuel model will not have canopy characteristics. Field plot data contributed either directly or indirectly to this LANDFIRE National data product. Go to http://www.landfire.gov/participate_acknowledgements.php for more information regarding contributors of field plot data. REFRESH 2008 (lf_1.1.0): Refresh 2008 (lf_1.1.0) used 2001 data as a launching point to incorporate disturbance and its severity, both managed and natural, which occurred on the landscape after 2001. Specific examples of disturbance are: fire, vegetation management, weather, and insect and disease. The final disturbance data used in Refresh 2008 (lf_1.1.0) is the result of several efforts that include data derived in part from remotely sensed land change methods, Monitoring Trends in Burn Severity (MTBS), and the LANDFIRE Refresh events data call. Vegetation growth was modeled where both disturbance and non-disturbance occur. For details on methods, see Process Description for LANDFIRE Refresh 2008 (lf_1.1.0).

  5. Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends,...

    • datarade.ai
    .json, .csv
    Updated Aug 14, 2024
    + more versions
    Cite
    Dataplex (2024). Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-consumer-behavior-data-2-1m-subred-dataplex
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Aug 14, 2024
    Dataset authored and provided by
    Dataplex
    Area covered
    Cuba, Saint Barthélemy, Tunisia, Cocos (Keeling) Islands, Togo, Netherlands, Lithuania, Belize, Burkina Faso, Croatia
    Description

    The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

    Dataset Overview:

    This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

    2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide: - Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions. - Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions. - Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

    Sourced Directly from Reddit:

    All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

    Key Features:

    • Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.
    • User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.
    • Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.
    • AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

    Use Cases:

    • Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.
    • Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.
    • Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.
    • Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

    Data Quality and Reliability:

    The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

    Integration and Usability:

    The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

    User-Friendly Structure and Metadata:

    The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

    Ideal For:

    • Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.
    • Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.
    • Researchers: Explore consumer behavior data of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

    This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...

  6. Landfire 13 Anderson Fire Behavior Fuel Models Version 140 (CONUS) (Image...

    • usfs.hub.arcgis.com
    Updated Jan 8, 2021
    + more versions
    Cite
    U.S. Forest Service (2021). Landfire 13 Anderson Fire Behavior Fuel Models Version 140 (CONUS) (Image Service) [Dataset]. https://usfs.hub.arcgis.com/datasets/8bb24481cce1424ea7ab98524e6dc412
    Explore at:
    Dataset updated
    Jan 8, 2021
    Dataset provided by
    U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
    Authors
    U.S. Forest Service
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Description

    The LANDFIRE fuel data describe the composition and characteristics of both surface fuel and canopy fuel. Specific products include fire behavior fuel models, canopy bulk density (CBD), canopy base height (CBH), canopy cover (CC), canopy height (CH), and fuel loading models (FLMs). These data may be implemented within models to predict the behavior and effects of wildland fire. These data are useful for strategic fuel treatment prioritization and tactical assessment of fire behavior and effects. DATA SUMMARY: Thirteen typical surface fuel arrangements or "collections of fuel properties" (Anderson 1982) were described to serve as input for Rothermel's mathematical surface fire behavior and spread model (Rothermel 1972). These fire behavior fuel models represent distinct distributions of fuel loadings found among surface fuel components (live and dead), size classes and fuel types. The fuel models are described by the most common fire carrying fuel type (grass, brush, timber litter or slash), loading and surface area-to-volume ratio by size class and component, fuelbed depth and moisture of extinction. This dataset can be used for fire spread related characteristics models. In fire behavior fuel models, canopy characteristics are used to compute shading, wind reduction factors, spotting distances, crown fuel volume, spread characteristics of crown fires and incorporate the effects of ladder fuels for transitions from a surface to crown fire. Canopy characteristics refer to the tree canopy. Where there are tree canopies, i.e. existing vegetation types that are forest and woodland, LANDFIRE has attributed the grid with canopy characteristics with some exceptions. There will be no canopy characteristics in fuel types where the tree canopy is considered a part of the surface fuel and the surface fire behavior fuel model is chosen as such.
    This is because LANDFIRE assumes the potential burnable biomass in the tree canopy has been accounted for in the surface fuel model parameters. For example, young or short conifer stands where the trees are represented by a shrub type fuel model will not have canopy characteristics. Field plot data contributed either directly or indirectly to this LANDFIRE National data product. Go to https://landfire.gov/participate_refdata_sub.php for more information regarding contributors of field plot data. LANDFIRE 2014 (lf_1.4.0) used LANDFIRE 2012 (lf_1.3.0) data as a launching point to incorporate disturbance and its severity, both managed and natural, which occurred on the landscape after 2012. Specific examples of disturbance are: fire, vegetation management, wind, and insect and disease. Disturbance data used in the updating is the result of several efforts that include data derived in part from remotely sensed land change methods, Monitoring Trends in Burn Severity (MTBS), and the LANDFIRE events data call. Vegetation growth was modeled where disturbance occurred.

  7. Household Budget Survey 2008 - Greece

    • catalog.ihsn.org
    Updated Mar 29, 2019
    + more versions
    Cite
    Population Statistics and Labour Market Statistics (2019). Household Budget Survey 2008 - Greece [Dataset]. https://catalog.ihsn.org/index.php/catalog/7729
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Hellenic Statistical Authority (http://statistics.gr/)
    General Directorate of Statistical Surveys
    Population Statistics and Labour Market Statistics
    Time period covered
    2008
    Area covered
    Greece
    Description

    Abstract

    The Household Budget Survey (HBS) is a national survey that collects information from a representative sample of households on household composition, members’ employment status and living conditions, focusing mainly on members’ expenditure on goods and services and on their income. The expenditure information collected from households is very detailed. That is, information is not collected by total expenditure category such as "food", "clothing - footwear", "health", etc., but separately for each expenditure: for example, white bread, fresh whole milk, fresh beef, etc.; footwear for men, footwear for women, etc.; services of medical analysis laboratories, pharmaceutical products, etc.

    The main purpose of the HBS is to determine in detail the household expenditure pattern in order to revise the Consumer Price Index. Moreover, the HBS is the most appropriate source in order to:

    • Complete the available statistical data for the estimation of the total private consumption;
    • Study households’ expenditures and their structure in relation to their income and other economic, social and demographic characteristics;
    • Analyze the changes in the living conditions of households in comparison with the previous surveys;
    • Study the relationship between households’ purchases and receipts in kind;
    • Study low income limits in the different socio-economic categories and population groups;
    • Study the changes in the nutritional habits of households.

    Geographic coverage

    National coverage

    Analysis unit

    • Households,
    • Individuals.

    Kind of data

    Sample survey data [ssd]

    Frequency of data collection

    Data collection is continuous, spread across the reference year.

    Sampling procedure

    Two-stage stratified area sampling was applied for the Household Budget Survey 2008. The sample of private households was selected in two stages. The primary units are the areas (one or more unified building blocks) and the ultimate sampling units selected in each sampling area are the households. It is estimated that 4,000 questionnaires will be filled in (a number equal to approximately 1/1000 of the households in the whole Greek territory).
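
    The two-stage design can be sketched as follows. This is only an illustration: the survey documentation here does not state the selection probabilities, so simple random sampling at both stages is an assumption, and the function name, the nested-dictionary sampling frame, and the toy sizes are hypothetical.

```python
import random

def two_stage_sample(frame, areas_per_stratum, households_per_area, seed=0):
    """Draw a two-stage sample from frame = {stratum: {area: [household ids]}}."""
    rng = random.Random(seed)
    sample = []
    for stratum, areas in sorted(frame.items()):
        # Stage 1: draw primary units (areas) within each stratum.
        n_areas = min(areas_per_stratum, len(areas))
        chosen_areas = rng.sample(sorted(areas), n_areas)
        for area in chosen_areas:
            households = areas[area]
            # Stage 2: draw ultimate units (households) within each chosen area.
            k = min(households_per_area, len(households))
            sample.extend((stratum, area, h) for h in rng.sample(households, k))
    return sample

# Hypothetical frame: 2 strata, 3 areas per stratum, 10 households per area.
frame = {
    stratum: {
        f"{stratum}-area{a}": [f"{stratum}-area{a}-hh{h}" for h in range(10)]
        for a in range(3)
    }
    for stratum in ("urban", "rural")
}
selected = two_stage_sample(frame, areas_per_stratum=2, households_per_area=3)
```

    With these toy settings each stratum contributes 2 areas of 3 households, so 12 households are selected in total.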

    Mode of data collection

    Face-to-face [f2f]

  8. Data from: Testing and Estimation of Social Network Dependence With Time to...

    • tandf.figshare.com
    txt
    Updated Feb 15, 2024
    Cite
    Lin Su; Wenbin Lu; Rui Song; Danyang Huang (2024). Testing and Estimation of Social Network Dependence With Time to Event Data [Dataset]. http://doi.org/10.6084/m9.figshare.8132456.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Lin Su; Wenbin Lu; Rui Song; Danyang Huang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Nowadays, events spread rapidly along social networks. We are interested in whether people’s responses to an event are affected by their friends’ characteristics. For example, how soon will a person start playing a game given that his/her friends like it? Studying social network dependence is an emerging research area. In this work, we propose a novel latent spatial autocorrelation Cox model to study social network dependence with time-to-event data. The proposed model introduces a latent indicator to characterize whether a person’s survival time might be affected by his or her friends’ features. We first propose a score-type test for detecting the existence of social network dependence. If it exists, we further develop an EM-type algorithm to estimate the model parameters. The performance of the proposed test and estimators is illustrated by simulation studies and an application to a time-to-event dataset about playing a popular mobile game from one of the largest online social network platforms. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

  9. Data from: Hydrochemical Atlas of the Arctic Ocean

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 5, 2018
    Cite
    Nikiforov, Sergey L; Colony, Roger; Timokhov, Leonid; Arctic and Antarctic Research Institute of the Russian Federal Service for Hydrometeorology and Environmental Monitoring, International Arctic Research Center, University of Alaska, Fairbanks, St. Petersburg (2018). Hydrochemical Atlas of the Arctic Ocean [Dataset]. http://doi.org/10.1594/PANGAEA.691332
    Explore at:
    Dataset updated
    Jan 5, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Nikiforov, Sergey L; Colony, Roger; Timokhov, Leonid; Arctic and Antarctic Research Institute of the Russian Federal Service for Hydrometeorology and Environmental Monitoring, International Arctic Research Center, University of Alaska, Fairbanks, St. Petersburg
    Time period covered
    Apr 16, 1948 - Sep 17, 2000
    Area covered
    Arctic Ocean
    Description

    Introduction: Chemical composition of water determines its physical properties and character of processes proceeding in it: freezing temperature, volume of evaporation, density, color, transparency, filtration capacity, etc. Presence of chemical elements in water solution confers waters special physical properties exerting significant influence on their circulation, creates necessary conditions for development and inhabitance of flora and fauna, and imparts to the ocean waters some chemical features that radically differ them from the land waters (Alekin & Liakhin, 1984). Hydrochemical information helps to determine elements of water circulation, convection depth, makes it easier to distinguish water masses and gives additional knowledge of climatic variability of ocean conditions. Hydrochemical information is a necessary part of biological research. Water chemical composition can be the governing characteristics determining possibility and limits of use of marine objects, both stationary and moving in sea water. Subject of investigation of hydrochemistry is study of dynamics of chemical composition, i.e. processes of its formation and hydrochemical conditions of water bodies (Alekin & Liakhin 1984). The hydrochemical processes in the Arctic Ocean are the least known. Some information on these processes can be obtained in odd publications. A generalizing study of hydrochemical conditions in the Arctic Ocean based on expeditions conducted in the years 1948-1975 has been carried out by Rusanov et al. (1979). The "Atlas of the World Ocean: the Arctic Ocean" contains a special section "Hydrochemistry" (Gorshkov, 1980). Typical vertical profiles, transects and maps for different depths - 0, 100, 300, 500, 1000, 2000, 3000 m are given in this section for the following parameters: dissolved oxygen, phosphate, silicate, pH and alkaline-chlorine coefficient. The maps were constructed using the data of expeditions conducted in the years 1948-1975. 
The illustrations reflect main features of distribution of the hydrochemical elements for multi-year period and represent a static image of hydrochemical conditions. Distribution of the hydrochemical elements on the ocean surface is given for two seasons - winter and summer, for the other depths are given mean annual fields. Aim of the present Atlas is description of hydrochemical conditions in the Arctic Ocean on the basis of a greater body of hydrochemical information for the years 1948-2000 and using the up-to-date methods of analysis and electronic forms of presentation of hydrochemical information. The most wide-spread characteristics determined in water samples were used as hydrochemical indices. They are: dissolved oxygen, phosphate, silicate, pH, total alkalinity, nitrite and nitrate. An important characteristics of water salt composition - "salinity" has been considered in the Oceanographic Atlas of the Arctic Ocean (1997, 1998). Presentation of the hydrochemical characteristics in this Hydrochemical Atlas is wider if compared with that of the former Atlas (Gorshkov, 1980). Maps of climatic distribution of the hydrochemical elements were constructed for all the standard depths, and seasonal variability of the hydrochemical parameters is given not only for the surface, but also for the underlying standard depths up to 400 m and including. Statistical characteristics of the hydrochemical elements are given for the first time. Detailed accuracy estimates of initial data and map construction are also given in the Atlas. Calculated values of mean-root deviations, maximum and minimum values of the parameters demonstrate limits of their variability for the analyzed period of observations. Therefore, not only investigations of chemical statics are summarized in the Atlas, but also some elements of chemical dynamics are demonstrated. 
Digital arrays of the hydrochemical elements, obtained at the nodes of a regular grid, are the new form of presentation in the Atlas. The same grid and the same boxes were used in this Atlas as were used in creating the US-Russian climatic Oceanographic Atlas. This makes it possible to combine the hydrochemical and oceanographic information of these Atlases. The first block of the digital arrays contains climatic characteristics calculated using direct observational data. These climatic characteristics were not calculated in the regions without observations, and the information arrays for these regions have gaps. The other block of climatic information in gridded form was obtained by objective analysis of observational data. The objective analysis procedure allowed us to obtain climatic estimates of the hydrochemical characteristics for the whole water area of the Arctic Ocean, including the regions not covered by observations. Data of the objective analysis can be wide... Visit https://dataone.org/datasets/8f096d0c7e2f5962ce4828cc6ea59572 for complete metadata about this dataset.

  10. Data from: Seed size, seed dispersal traits, and plant dispersion patterns...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Mar 10, 2023
    Cite
    Yvette Ortega; Dean Pearson; Jane Tuthill (2023). Seed size, seed dispersal traits, and plant dispersion patterns for native and introduced grassland plants [Dataset]. http://doi.org/10.5061/dryad.2z34tmpr2
    Available download formats: zip
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    US Forest Service
    University of Montana
    Authors
    Yvette Ortega; Dean Pearson; Jane Tuthill
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Most terrestrial plants disperse by seeds, yet the relationship between seed mass, seed dispersal traits, and plant dispersion is poorly understood. We quantified seed traits for 48 species of native and introduced plants from grasslands of western Montana, USA, to investigate the relationships between seed traits and plant dispersion patterns. Additionally, because the linkage between dispersal traits and dispersion patterns might be stronger for actively dispersing species, we compared these patterns between native and introduced plants. Finally, we evaluated the efficacy of a global trait database, the TRY plant traits database, versus locally collected data for examining these questions. This archive contains species-level data used in analyses, including species metadata (origin, growth form), mean values of measured seed traits (size metrics and type of dispersal structures), two metrics of dispersion (local and broad scales, respectively) derived from grassland surveys in the study region, and information on the seed mass accessed from the TRY traits database. Note that the latter seed mass data could not be included in the archive, but can be acquired directly from the TRY plant traits database (https://www.try-db.org/TryWeb/Home.php).

    Methods

    Our study took place in semi-arid grasslands of the Intermountain Region in western Montana, U.S.A. The native system is dominated primarily by bluebunch wheatgrass (Pseudoroegneria spicata) with other grasses and a great variety of forbs diversifying the system, but it is heavily invaded by exotics. We identified our study species, comprised of 23 native and 25 exotic species, to reflect a range of dispersion patterns by using data from 620 1-m² vegetation plots from 31 grassland sites spread over 20,000 km² of western Montana. Plant dispersion patterns were defined at a local scale by the proportion of plots occupied within a site and at a broad scale by the proportion of sites occupied per species.
For each species, we collected at least 50 seeds from each of 10 plants at each of 3 locations in Missoula and Lake County, Montana in either 2020 or 2021. Collection locations were chosen opportunistically based on species presence and hence differed by species. Although these locations did not align with sites surveyed for species dispersion per se, they were generally drawn from the central portion of the study area. Seeds were stored in a laboratory under ambient conditions until measurements were taken, at which point they were cleaned by hand and sorted based primarily on visual characteristics to remove potentially non-viable seeds. To determine the mean seed mass per species, we weighed a fixed number of samples (three or four) from each of the three locations. The number of seeds weighed per sample was set per species to ensure a total mass >1.5 mg, the minimum reading needed for an accuracy of 2% per the specifications of the balance. For 32 of our 48 species, only 10 seeds were needed to reach this minimum. For remaining species, we increased the number of seeds included per sample in increments of 10 (range 20–150 seeds/sample) until the minimum mass was reached. Seed mass included the entire diaspore (e.g., endosperm, seed coat, awns, and dispersal appendages) to ensure that all species could be treated in the same way (e.g., dispersal appendages such as wings would have been very difficult to remove from small-seeded species). Though the inclusion of dispersal appendages potentially biases seed mass estimates for this subset of species, we note that this bias should be small relative to the large variation in seed mass across species. Indeed, estimates for three exotic species (Lactuca serriola, Taraxacum officinale, and Tragopogon dubius) with pappuses showed that these structures increased seed mass measures by <12%. 
For the remaining measurements, we used a ProgRes C10 camera (Jenoptik, CCD/CMOS) to create images of 20 seeds per species drawn from the 3 sampling locations (n=6 from two locations and n=8 from the third, chosen randomly). We used the images to obtain the following measurements for each seed via ImageJ software (Rasband 1997-2018): seed length (maximum), seed width (maximum), and seed surface area. These seed measurements excluded dispersal structures. Mean values per species for all seed size measurements are included in the species-level dataset archived here. Finally, we inspected seeds to determine whether seeds of each species possessed dispersal structures including pappuses, awns, wings, or plumes. For smaller-seeded species, we accomplished this using the seed images and also checked the literature to assure that dispersal structures were not missed. To enable comparison of empirical seed measures to those available in online trait databases, we used the TRY plant trait database (accessed 22 September – 7 October 2022), a global database integrating ~700 datasets including other major collective databases. This database included seed mass data for 44 of our 48 species but contained insufficient data to evaluate the other seed traits (i.e., length, width, and surface area) we measured (i.e., for only 2-40% of our study species). Importantly, 63% of n=831 seed mass records obtained from the TRY database could not be used in analyses. This is because these contained duplicate data that resulted from the consolidation of many datasets with common sources. See the publication for a full description of our process for identifying duplicate values. Remaining seed mass values from the TRY database were averaged to generate the mean estimate used in analyses. See the archive for sample size information per species.
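    The two dispersion metrics defined in the Methods (proportion of plots occupied within a site, and proportion of sites occupied per species) can be sketched in a few lines. This is an illustration only, not the authors' code: the record layout, the function name, and the choice to average the local metric over occupied sites are assumptions, as is the equal allocation of 20 plots per site (620 plots over 31 sites).

```python
from collections import defaultdict


def dispersion_metrics(occurrences, plots_per_site, n_sites):
    """Compute local- and broad-scale dispersion per species.

    occurrences: iterable of (species, site, plot) presence records
                 (hypothetical layout, not the archive's actual format).
    """
    occupied = defaultdict(set)  # (species, site) -> set of occupied plots
    for species, site, plot in occurrences:
        occupied[(species, site)].add(plot)

    per_site = defaultdict(dict)  # species -> {site: proportion of plots occupied}
    for (species, site), plots in occupied.items():
        per_site[species][site] = len(plots) / plots_per_site

    metrics = {}
    for species, by_site in per_site.items():
        # Assumption: local dispersion is averaged over the sites where the species occurs.
        local = sum(by_site.values()) / len(by_site)
        # Broad dispersion: share of all surveyed sites in which the species occurs.
        broad = len(by_site) / n_sites
        metrics[species] = {"local": local, "broad": broad}
    return metrics


records = [("A", 1, 1), ("A", 1, 2), ("A", 2, 1), ("B", 1, 1)]
result = dispersion_metrics(records, plots_per_site=20, n_sites=31)
```

    Duplicate presence records for the same plot are counted once because occupied plots are stored in a set.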

  11. Employment and Unemployment Survey 2014, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Economic Research Forum (2017). Employment and Unemployment Survey 2014, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/6954
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Department of Statistics
    Economic Research Forum
    Time period covered
    2014
    Area covered
    Jordan
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    The Department of Statistics (DOS) carried out four rounds of the 2014 Employment and Unemployment Survey (EUS) during February, May, August and November 2014. The survey rounds covered a total sample of about fifty-three thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design.

    It is worth mentioning that the DOS employed new technology in data collection and data processing. Data were collected using an electronic questionnaire on a handheld device (PDA) instead of a hard copy.

    The survey's main objectives are:

    • To identify the demographic, social and economic characteristics of the population and manpower.
    • To identify the occupational structure and economic activity of the employed persons, as well as their employment status.
    • To identify the reasons behind the desire of the employed persons to search for a new or additional job.
    • To measure economic activity participation rates (the economically active population divided by the population aged 15 years and over).
    • To identify the different characteristics of the unemployed persons.
    • To measure unemployment rates (the number of unemployed persons divided by the economically active population aged 15 years and over) according to the various characteristics of the unemployed, and the changes that might take place in this regard.
    • To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.
    • To identify the changes over time that might take place regarding the above-mentioned variables.
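    The two rates defined in the objectives above can be written out as a short sketch. The function and parameter names are illustrative, and the sketch assumes the standard labour-force convention that the economically active population is the sum of employed and unemployed persons aged 15 and over; the counts in the usage lines are made up.

```python
def participation_rate(employed, unemployed, population_15_plus):
    """Economically active population as a percentage of the population aged 15+."""
    active = employed + unemployed  # assumption: active = employed + unemployed
    return 100.0 * active / population_15_plus


def unemployment_rate(employed, unemployed):
    """Unemployed persons as a percentage of the economically active population."""
    active = employed + unemployed
    return 100.0 * unemployed / active


# Hypothetical counts, not survey results.
p_rate = participation_rate(employed=880, unemployed=120, population_15_plus=2500)
u_rate = unemployment_rate(employed=880, unemployed=120)
```

    The same functions can be applied within subgroups (for example by sex or governorate) by passing subgroup counts.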

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    The sample is representative at the national level (Kingdom), at the governorate level, and at the level of the three regions (Central, North and South).

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Survey Frame

    The sample of this survey is based on the frame provided by the data of the Population and Housing Census, 2004. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered as a large city. The total number of these cities is 6. Each governorate (except for the 6 large cities) was divided into rural and urban areas. The rest of the urban areas in each governorate was considered as an independent stratum. The same was applied to rural areas where it was considered as an independent stratum. The total number of strata was 30.

    In view of the significant variation in socio-economic characteristics in large cities in particular and in urban areas in general, each stratum of the large cities and each urban stratum was divided into four sub-strata according to the socio-economic characteristics provided by the Population and Housing Census, with the purpose of providing homogeneous strata.

    The frame excludes the population living in remote areas (most of whom are nomads). In addition, the frame does not include collective dwellings, such as hotels, hospitals, work camps, prisons and the like.

    Sample Design

    The sample of this survey was designed using the two-stage stratified cluster sampling method. The main sample was designed in 2009, based on the data of the Population and Housing Census 2004, for carrying out household surveys. The sample is representative at the Kingdom, rural, urban, region and governorate levels. The total sample size for each round was 1336 PSUs (clusters). These units were distributed to urban areas, rural areas and large cities in each governorate according to the weight of persons and households and according to the variance within each stratum. Slight modifications to the number of these units were made to fit a multiple of 8; the number of clusters for four rounds was 53432.

    The main sample consists of 40 replicates, each of which consists of 167 Primary Sampling Units (PSUs). For each round, eight replicates of the main sample were used. The PSUs were ordered within each stratum according to geographic characteristics and then according to socio-economic characteristics in order to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, the PSUs were selected using Probability Proportionate to Size with a systematic selection procedure; the number of households in each PSU (cluster) served as its weight or size. In the second stage, the blocks of the PSUs (clusters) selected in the first stage were updated, and a constant number of households (10 households) was then selected from each PSU (cluster) as the final sampling units, using the random systematic sampling method.
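    The two selection stages described above can be sketched as follows. This is a generic textbook version of probability-proportionate-to-size (PPS) systematic selection followed by systematic selection of households, not the DOS implementation; the function names and the example sizes are hypothetical, and the units are assumed to be already ordered within the stratum.

```python
import random


def pps_systematic(sizes, n, rng=None):
    """First stage: pick n units with probability proportional to size.

    sizes: number of households per PSU, in the stratum's sort order.
    Returns the indices of the selected PSUs. A PSU larger than the
    sampling interval can be hit more than once.
    """
    rng = rng or random.Random()
    step = sum(sizes) / n              # sampling interval on the cumulative size scale
    start = rng.random() * step        # random start in [0, step)
    targets = [start + k * step for k in range(n)]

    chosen, cumulative, t = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while t < n and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    return chosen


def systematic_sample(items, k, rng=None):
    """Second stage: pick k items at a fixed interval from a random start."""
    rng = rng or random.Random()
    step = len(items) / k
    start = rng.random() * step
    last = len(items) - 1
    return [items[min(int(start + j * step), last)] for j in range(k)]


psu_sizes = [50, 120, 80, 200, 150]            # hypothetical household counts
selected_psus = pps_systematic(psu_sizes, 2, random.Random(0))
households = list(range(40))                   # hypothetical household listing for one PSU
selected_households = systematic_sample(households, 10, random.Random(1))
```

    Sorting the units before selection is what gives systematic selection its implicit stratification, which matches the stated aim of ensuring a good spread of the sample.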

    Sampling notes

    It is noteworthy that the sample of the present survey does not represent the non-Jordanian population, because it is based on households living in conventional dwellings. In other words, it does not cover collective households living in collective dwellings. Therefore, the non-Jordanian households covered in the present survey are either private households or collective households living in conventional dwellings. In Jordan, it is well known that a large number of non-Jordanian workers live in groups and spend most of their time at workplaces. Hence, it is unlikely to find them at their residences during the daytime (i.e. the time when the survey data are collected). Furthermore, most of them live in their workplaces, such as workshops, sales stores, guard places, or building sites under construction. Such places are not classified as occupied dwellings for household sampling purposes. For all of these reasons, the coverage of this population would not be complete in household surveys.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The questionnaire was designed electronically on the PDA and revised by the DOS technical staff. It was finalized upon completion of the training program. The questionnaire is divided into main topics, each containing a clear and consistent group of questions, and designed in a way that facilitates the electronic data entry and verification. The questionnaire includes the characteristics of household members in addition to the identification information, which reflects the administrative as well as the statistical divisions of the Kingdom.

    Cleaning operations

    Raw Data

    A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The results of the fieldwork indicated that all sample households were visited. The number of successfully completed interviews was 48436, that is 90.8 percent of the total sample households.

    Among the reasons for unsuccessful interviews (although three callbacks were made), 1.8 percent of the dwellings were closed at the time of the visit.

    The findings also indicate that the response rate is 95.5 percent, based on dividing the number of completed questionnaires by the number of expected completed interviews, that is after excluding the vacant dwellings.
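    The difference between the two percentages reported above comes from the denominator: the first divides completed interviews by all sample households, the second by the households expected to complete an interview after vacant dwellings are excluded. A minimal sketch, with hypothetical function names and made-up counts rather than the survey's own figures:

```python
def completion_share(completed, sample_households):
    """Completed interviews as a percentage of all sample households."""
    return 100.0 * completed / sample_households


def response_rate(completed, sample_households, vacant_dwellings):
    """Completed interviews as a percentage of expected interviews.

    Expected interviews exclude vacant dwellings, as described in the text.
    """
    expected = sample_households - vacant_dwellings
    return 100.0 * completed / expected


# Hypothetical counts for illustration only.
share = completion_share(completed=900, sample_households=1000)
rate = response_rate(completed=900, sample_households=1000, vacant_dwellings=50)
```

    Because vacant dwellings are removed from the denominator, the response rate is never lower than the completion share.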

    More information on the distribution of interviews by region, governorate and visit results is available in table (E) in Page 4 of the annual report provided among the disseminated survey materials under a file named "Jordan 2014- Annual report (English).pdf".

    Sampling error estimates

    Sampling errors were calculated

  12. Customer Shopping Trends Dataset

    • kaggle.com
    Updated Oct 5, 2023
    + more versions
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)

    Structure of the Dataset

    https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  13. Plan Foncier Rural Impact Evaluation 2018 - Benin

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Feb 16, 2021
    Cite
    Thea Hilhorst (2021). Plan Foncier Rural Impact Evaluation 2018 - Benin [Dataset]. https://microdata.worldbank.org/index.php/catalog/3850
    Dataset updated
    Feb 16, 2021
    Dataset provided by
    Daniel Ali Ayalew
    Klaus Deininger
    Thea Hilhorst
    Time period covered
    2018
    Area covered
    Benin
    Description

    Abstract

    The PFR activities to be evaluated at endline consist mainly of the demarcation and registration of land parcels (under customary tenure) as a Titre Foncier or an Attestation de Droit Coutumière. The impact evaluation aims to quantify and analyse the impact of these interventions on productivity and food security, disaggregated by target group and gender.

    The research questions to be answered after the endline data collection are:

    1) Do PFRs (or ADCs) contribute to a perception of greater land tenure security?
    2) Does improved tenure security lead to growth in agricultural investment and/or changes to the management of land?
    3) Do PFRs improve access to land and rights over land among marginalised groups (women, youth and migrants)?
    4) Do PFRs lead to an increased number of land transactions?
    5) Does increased land security address existing constraints on land markets and lead to more efficient allocation of land resources and thereby an increase in productivity?
    6) Do property rights and improved user rights result in better access to credit, possibly allowing for income diversification and thus increasing household welfare?
    7) Do the new arrangements put in place during the implementation of the PFRs facilitate the resolution of land conflicts, or even prevent the emergence of these land conflicts?

    Geographic coverage

    The clusters were spread across the communes of Bembéréké, Sinendé and Kalalé in the north and Tchaourou in the south of the department of Borgou.

    Analysis unit

    • Villages
    • Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The impact evaluation consists of gender and youth disaggregated data collection at base line, before the start of the intervention, in both the treatment and control villages. End line data will be collected at least 2 growing seasons after issuing of documentation to farmers.

    The sample consisted of 2968 households drawn from 26 treatment villages, which were selected for the implementation of a Plan Foncier Rural (PFR, or rural landholding plan), and 27 control villages that did not benefit from a PFR.

    The treatment villages were assigned by the ProPFR team in geographic clusters. The assignment of control villages followed this geographic clustering, also using further village level data with the aim of finding similar villages to maximize comparability. These clusters were spread across the communes of Bembéréké, Sinendé and Kalalé in the north and Tchaourou in the south of the department of Borgou.

    Villages were selected from 11 geographical clusters of villages facing similar issues, allowing easier logistical planning for the rollout of the PFRs.

    Villages selected to be part of the programme had the following characteristics:

    • Bordering or near a classified national forest
    • At high risk of land grabbing
    • The presence of another GIZ-supported SEWOH project
    • Agropastoral areas (in particular the presence of transhumance, i.e. cattle-driving, corridors)

    They should not have any of the following characteristics:

    • Bordering Nigeria, within the band of increased security
    • An MCA intervention with a PFR
    • A history of serious conflict which could block the realisation of a PFR, or where a PFR might reignite past conflicts

    These characteristics alongside the desire of the implementing team to select villages in clusters, for practical reasons presented the first challenge in selecting suitable comparison villages to measure the impact of the ProPFR programme. Clustering meant that villages selected for comparison should be near the clusters to be comparable, but given the typical geography of villages in northern Benin, in that most people live in the village centre rather than spread evenly with sufficient density at the village boundary, and the lack of clearly defined village boundaries, a geographic discontinuity could not be exploited.

    The second challenge in selecting comparison villages arose due to a change in the village definitions in 2013, when Benin changed from 3758 to 5290 villages which is often referred to as the “nouveau découpage”. Some old villages were split but there are no clearly defined village boundaries for the new set of villages. ProPFR selected from among the new villages, so the control villages also needed to be selected from this list. Given that the last census was collected prior to this new definition of villages, no data about the villages existed that could easily be used in matching villages to those selected for the ProPFR.

    Due to this lack of data on the characteristics of the people residing in the villages, Geographical Information Systems (GIS) data were used to match each of the treatment PFR villages to a control village. Villages which were previously included in the MCA’s wave of PFRs were excluded from our study due to the difficulty in separating the effects of the two programs (MCA vs ProPFR). For each PFR village, a buffer of 20km was drawn and the union constructed for each cluster. Within this area, other villages were considered as a potential control village. Of the selection criteria, the only one applicable from GIS data is the proximity to a national forest. Where villages were close to a national forest, we attempted to match it with a control village also close to a national forest. The additional criteria on which villages were matched were the proximity to a main road (as classified by the Open Street Map shapefiles for roads) and the number of buildings in the central agglomeration of a village. Main roads are used as a proxy for access to markets and thereby potentially income levels.

    The size of a village and the amount of land which can be used around it will be influenced by the size of the population as well as the presence of national forests. This strategy is similar to a Coarsened Exact Matching (CEM) strategy (see Blackwell et al, 2009), in which key characteristics are reduced (perhaps from continuous variables) to a small number of categories and matched with one another exactly. In our selection of villages, one control village was selected for each treatment village based on the key characteristics, defined as proximity to national forests (5km) and main roads (1km), and having a similar number of buildings (within 1km of the central point).
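    The matching step described above can be sketched as a coarsen-then-match procedure. This is an illustration in the spirit of Coarsened Exact Matching, not the evaluation team's code: the field names, the building-count bin width, and the first-come assignment of controls are all assumptions. Only the 5 km forest and 1 km road thresholds come from the text.

```python
from collections import defaultdict

BUILDING_BIN = 100  # hypothetical bin width for "similar number of buildings"


def coarsen(village):
    """Reduce a village to the three coarse categories used for matching."""
    return (
        village["forest_km"] <= 5.0,            # close to a national forest
        village["road_km"] <= 1.0,              # close to a main road
        village["buildings"] // BUILDING_BIN,   # building-count category
    )


def match_villages(treated, controls):
    """Pair each treatment village with one unused control in the same stratum.

    Returns {treatment name: control name or None}; None marks a village
    with no exact match on the coarsened characteristics.
    """
    pool = defaultdict(list)
    for control in controls:
        pool[coarsen(control)].append(control["name"])

    pairs = {}
    for village in treated:
        candidates = pool[coarsen(village)]
        pairs[village["name"]] = candidates.pop(0) if candidates else None
    return pairs


# Hypothetical villages for illustration only.
treated = [
    {"name": "T1", "forest_km": 3.0, "road_km": 0.5, "buildings": 120},
    {"name": "T2", "forest_km": 12.0, "road_km": 4.0, "buildings": 40},
]
controls = [
    {"name": "C1", "forest_km": 4.5, "road_km": 0.9, "buildings": 180},
    {"name": "C2", "forest_km": 2.0, "road_km": 3.0, "buildings": 40},
]
pairs = match_villages(treated, controls)
```

    A village left with None would need a fallback rule, such as choosing the nearby village that satisfies as many of the characteristics as possible.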

    For a small number of villages, we faced an issue of common support, meaning there were no exact matches on the key characteristics. In this case other nearby villages were selected which fulfilled as many of these characteristics as possible. Data were collected on a wide range of variables following the theory of change, which states that the improvements in institutions and the PFRs may lead to improved perceived land tenure security and improved access to land for women and young men through the activities carried out by the ProPFR team. This perceived land tenure security is often seen as key to agricultural investments and thereby food security in the long term, as it allows long-term planning. The issuing of official documentation provides collateral for a loan should households wish to borrow and invest in productive activities or smooth consumption.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The Survey comprised two questionnaires namely:

    1. Household Questionnaire: This comprised 14 modules with 7 rosters. Modules include household members, employment and enterprises, durable goods, housing, census of non-agricultural plots, agricultural plots, land donations, land sales, land losses, perceptions on land tenure, participation in PFR, loans, food security, young men and women.

    2. Community (village) questionnaire: The community survey was administered to each village in the form of small group interviews to collect information on the socio-economic characteristics of these villages, local land tenure structures and practices, and local prices of agricultural inputs and production. The questionnaire was organized in 9 modules: characteristics of the survey participants, land tenure, land use, land market, land conflicts, other village structures and interventions, agriculture, PFR, and village chief. The characteristics of the participants were recorded in a separate roster.

    The extensive household survey was first administered to the household head, with additional modules to be answered by the wife of the household head (or the female household head) as well as a young male (defined as an unmarried man, aged 18-35).

    Cleaning operations

    Various consistency checks were performed to ensure data quality, including systematic reports of contradictory answers and of extreme values. Throughout the data collection process, two main issues were reported. The first pertains to the sampling methodology for buildings, which led to the necessary replacement of pre-selected non-housing buildings. Just short of 500 households required replacement. The majority of the buildings replaced were not residential buildings and were therefore not eligible for inclusion in the survey. These were replaced by the next building in the random order of buildings. The number of buildings for which nobody could be found for surveying was very low (23), thanks to the robust replacement protocol.

    The second issue concerns the refusal of the village Sombouan 2 to participate in the survey. Despite several attempts, this village had to be excluded from the survey. The data were also examined for missing information for required variables, and sections. Any problems found were then reported back to the supervisors where the correction was then made.

    Response rate

    The response rate for

  14. Household Survey on Information and Communications Technology, 2014 - West...

    • pcbs.gov.ps
    Updated Jan 28, 2020
    Cite
    Palestinian Central Bureau of Statistics (2020). Household Survey on Information and Communications Technology, 2014 - West Bank and Gaza [Dataset]. https://www.pcbs.gov.ps/PCBS-Metadata-en-v5.2/index.php/catalog/465
    Explore at:
    Dataset updated
    Jan 28, 2020
    Dataset provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Authors
    Palestinian Central Bureau of Statistics
    Time period covered
    2014
    Area covered
    Gaza Strip, West Bank, Gaza
    Description

    Abstract

    As part of PCBS' efforts to provide official Palestinian statistics on the different aspects of life in Palestinian society, and because of the wide spread of computers, the Internet and mobile phones among the Palestinian people and the important role they may play in spreading knowledge and culture and in shaping public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.

    The main objective of this survey is to provide statistical data on Information and Communication Technology in Palestine, in addition to providing data on the following:

    · Prevalence of computers and access to the Internet.
    · Penetration and purpose of technology use.

    Geographic coverage

    Palestine (West Bank and Gaza Strip), type of locality (Urban, Rural, Refugee Camps) and governorate

    Analysis unit

    Household. Person 10 years and over.

    Universe

    All Palestinian households and individuals whose usual place of residence is in Palestine, with a focus on persons aged 10 years and over in 2014.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sampling Frame: The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Preliminary Sampling Units in the process of selecting the survey sample.

    Sample Size: The total sample size of the survey was 7,268 households, of which 6,000 responded.

    Sample Design: The sample is a stratified clustered systematic random sample. The design comprised three phases:

    Phase I: Random sample of 240 enumeration areas.
    Phase II: Selection of 25 households from each enumeration area selected in phase one using systematic random selection.
    Phase III: Selection of an individual (10 years or more) in the field from the selected households; Kish tables were used to ensure indiscriminate selection.

    Sample Strata: Distribution of the sample was stratified by:
    1- Governorate (16 governorates, J1).
    2- Type of locality (urban, rural and camps).
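
    The systematic random selection described for Phase II can be illustrated with a minimal Python sketch. It assumes a fractional sampling interval with a random start, which is one common way to implement systematic selection; the survey documentation does not state which variant PCBS used. The function name, the household identifiers and the 124-household list are hypothetical.

```python
import random


def systematic_sample(units, n, rng=None):
    """Select n units from an ordered list using a random start and a fixed interval."""
    rng = rng or random.Random()
    if n <= 0 or n > len(units):
        raise ValueError("n must be between 1 and len(units)")
    interval = len(units) / n        # fractional sampling interval (>= 1)
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for i in range(n):
        # Position of the i-th selection; clamp to guard against float rounding.
        index = min(int(start + i * interval), len(units) - 1)
        picks.append(units[index])
    return picks


# Hypothetical enumeration area with 124 listed households (the frame's average size).
households = [f"HH-{i:03d}" for i in range(1, 125)]
selected = systematic_sample(households, 25, random.Random(2014))
print(len(selected), selected[:3])
```

    Because the interval is at least one, no household can be selected twice, and the selections are spread evenly across the listing order.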

    Sampling deviation

    -

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The survey questionnaire consists of identification data, quality controls and three main sections:

    Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.

    Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.

    Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.

    Cleaning operations

    Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.

    Data Entry: The data entry process started on 8 May 2014 and ended on 23 June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.

    Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.

    Response rate

    Response rate: 79%

    Sampling error estimates

    There are many aspects of the concept of data quality; this includes the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.

    Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:

    Statistical Errors: Data of this survey may be affected by statistical errors due to the use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.

    Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.

    Non-Statistical Errors: Non-statistical errors are possible at all stages of the project, during data collection or processing. These are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and practical and theoretical training took place during the training course. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refused cases. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.

    The sources of non-statistical errors can be summarized as:

    1. Some of the households were not at home and could not be interviewed, and some households refused to be interviewed.
    2. In unique cases, errors occurred due to the way the questions were asked by interviewers, and respondents misunderstood some of the questions.

  15. Data from: DC3 Miscellaneous NSF/NCAR GV-HIAPER Data

    • data.nasa.gov
    • datasets.ai
    • +1more
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). DC3 Miscellaneous NSF/NCAR GV-HIAPER Data [Dataset]. https://data.nasa.gov/dataset/dc3-miscellaneous-nsf-ncar-gv-hiaper-data-270ca
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    DC3_Miscellaneous_NSF-GV-HIAPER_Data are miscellaneous data collected onboard the DC-8 aircraft during the Deep Convective Clouds and Chemistry (DC3) field campaign. This product features data from the Global Forecast System (GFS) model. Data collection for this product is complete.

    The Deep Convective Clouds and Chemistry (DC3) field campaign sought to understand the dynamical, physical, and lightning processes of deep, mid-latitude continental convective clouds and to define the impact of these clouds on upper tropospheric composition and chemistry. DC3 was conducted from May to June 2012 with a base location of Salina, Kansas. Observations were conducted in northeastern Colorado, west Texas to central Oklahoma, and northern Alabama in order to provide a wide geographic sample of storm types and boundary layer compositions, as well as to sample convection.

    DC3 had two primary science objectives. The first was to investigate storm dynamics and physics, lightning and its production of nitrogen oxides, cloud hydrometeor effects on wet deposition of species, surface emission variability, and chemistry in anvil clouds. Observations related to this objective focused on the early stages of active convection. The second objective was to investigate changes in upper tropospheric chemistry and composition after active convection. Observations related to this objective focused on the 12-48 hours following convection. This objective also served to explore seasonal change of upper tropospheric chemistry.

    In addition to using the NSF/NCAR Gulfstream-V (GV) aircraft, the NASA DC-8 was used during DC3 to provide in-situ measurements of the convective storm inflow and remotely-sensed measurements used for flight planning and column characterization. DC3 utilized ground-based radar networks spread across its observation area to measure the physical and kinematic characteristics of storms. Additional sampling strategies relied on lightning mapping arrays, radiosondes, and precipitation collection. Lastly, DC3 used data collected from various satellite instruments to achieve its goals, focusing on measurements from CALIOP onboard CALIPSO and CPL onboard CloudSat. In addition to providing an extensive set of data related to deep, mid-latitude continental convective clouds and analyzing their impacts on upper tropospheric composition and chemistry, DC3 improved models used to predict convective transport. DC3 improved knowledge of convection and chemistry, and provided information necessary to understanding the processes relating to ozone in the upper troposphere.

  16. Spectral dataset of daylights and surface properties of natural objects...

    • zenodo.org
    bin, csv
    Updated Aug 28, 2024
    Cite
    Takuma Morimoto; Cong Zhang; Kazuho Fukuda; Keiji Uchikawa (2024). Spectral dataset of daylights and surface properties of natural objects measured in Japan [Dataset]. http://doi.org/10.5281/zenodo.5217752
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Takuma Morimoto; Cong Zhang; Kazuho Fukuda; Keiji Uchikawa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Japan
    Description

    This is a spectral dataset of natural objects and daylights collected in Japan.

    We collected 359 natural objects and measured the reflectance of all objects and the transmittance of 75 leaves. We also measured daylights from dawn till dusk on four different days using a white plate placed (i) under the direct sun and (ii) under a cast shadow (in total 359 measurements). We also separately measured daylights at five different locations (including a sports ground, a space between tall buildings and a forest) with minimum time intervals to reveal the influence of surrounding environments on the spectral composition of daylights reaching the ground (in total 118 measurements).

    If you use this dataset in your research, please cite the following publication.

    Morimoto, T., Zhang, C., Fukuda, K., & Uchikawa, K. (2022). Spectral measurement of daylights and surface properties of natural objects in Japan. Optics express, 30(3), 3183. https://doi.org/10.1364/OE.441063

    The dataset contains the following Excel spreadsheets and csv files:

    (A) Surface properties of natural objects

    (A-1) Reflectance_ver1-2.xlsx and .csv

    (A-2) Transmittance_FrontSideUp_ver1-2.xlsx and .csv

    (A-3) Transmittance_BackSideUp_ver1-2.xlsx and .csv

    (B) Daylight measurements

    (B-1) Daylight_TimeLapse_v1-2.xlsx and .csv

    (B-2) Daylight_DifferentLocations_v1-2.xlsx and .csv

    Data description

    (A) Surface properties

    (A-1) Reflectance_ver1-2.xlsx and .csv

    This file contains surface spectral reflectance data (380 - 780 nm, 5 nm step) of 359 natural objects, including 200 flowers, 113 leaves, 23 fruits, 6 vegetables, 8 barks, and 9 stones measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.

    For the analysis presented in the paper, we identified reflectance pairs that have a Pearson’s correlation coefficient across 401 spectral channels of more than 0.999 and removed one of the reflectances from each pair. The column 'Used in analysis' indicates whether or not each sample is used for the analysis (TRUE indicates used and FALSE indicates not used).

    At the time of collection, we noted the scientific names of flowers, leaves and barks from a name board provided by the Tokyo Institute of Technology, where the samples were collected. If not available, we used a smartphone application which automatically identifies the scientific name from an input image (PictureThis - Plant Identifier developed by Glority Global Group Ltd.). The names of 2 flowers and 9 stones whose names could not be identified through either method were left blank.
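
    The near-duplicate filtering described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' code: the description does not say which member of a correlated pair was removed, so the sketch keeps the earlier sample and flags the later one. The function names and the toy four-channel spectra are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for a perfectly flat spectrum


def flag_used(spectra, threshold=0.999):
    """Return one boolean per spectrum, mimicking a 'Used in analysis' column.

    Assumption: when two spectra correlate above the threshold, the earlier
    one is kept and the later one is flagged as not used.
    """
    used = [True] * len(spectra)
    for i in range(len(spectra)):
        if not used[i]:
            continue
        for j in range(i + 1, len(spectra)):
            if used[j] and pearson(spectra[i], spectra[j]) > threshold:
                used[j] = False
    return used


# Toy spectra: the second is a scaled copy of the first, the third is reversed.
a = [0.1, 0.2, 0.4, 0.8]
b = [0.2, 0.4, 0.8, 1.6]
c = [0.8, 0.4, 0.2, 0.1]
print(flag_used([a, b, c]))
```

    Note that Pearson correlation ignores overall scale, so two samples with the same spectral shape but different brightness count as a redundant pair under this criterion.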

    (A-2) Transmittance_FrontSideUp_v1-2.xlsx and .csv

    This file contains surface spectral transmittance data (380 - 780 nm, 5 nm step) for 75 leaves measured by a spectrophotometer (SR-2A, Topcon, Tokyo, Japan). Photos of all samples are included in the .xlsx file.

    For this data, the transmittance was measured with the front-side of leaves up (the light was transmitted from the back side of the leaves). This is the data presented in the associated article.

    (A-3) Transmittance_BackSideUp_v1-2.xlsx and .csv

    Spectral transmittance data of the same leaves presented in (A-2).

    For this data, the transmittance was measured with the back-side of leaves up (the light was transmitted from the front side of the leaves).

    (B) Daylight measurements

    (B-1) Daylight_TimeLapse_ver1-2.xlsx and .csv

    This file contains daylight spectra from sunrise to sunset on four different days (2013/11/20, 2013/12/24, 2014/07/03 and 2014/10/27) measured by a spectrophotometer (SR-LEDW, Topcon, Tokyo, Japan) with a wavelength range from 380 nm to 780 nm with 1 nm step. We measured the reflected light from the white calibration plate placed either under direct sunlight or under a cast shadow.

    The column 'Cloud cover' provides a visual estimate of the percentage of cloud cover across the sky at the time of each measurement. The column 'Red lamp' indicates whether an aircraft warning lamp at the measurement site was on (circle) or off (blank).

    (B-2) Daylight_DifferentLocations_ver1-2.xlsx and .csv

    This file includes daylight spectra measured at five different sites within the Suzukakedai Campus of Tokyo Institute of Technology with minimum time gap on 2014/07/08, using a spectroradiometer (IM-1000, Topcon) from 380 nm to 780 nm with 1 nm step. The instrument was oriented either towards the sun or towards the zenith sky. When the instrument was oriented to the sun, we measured spectra in two ways: (i) one using a black cylinder covering the photodetector and (ii) the other without using a cylinder.

    The column 'Cylinder' indicates whether the black cylinder was used (circle) or not (cross). The column 'Cloud cover' shows the visual estimate of percentage of cloud cover at the time of each measurement. The column 'Sun hidden in clouds' denotes whether the measurement was taken when the sun was covered by clouds (circle) or not (blank).

  17. Labor Force Survey, LFS 2013-2014 - Yemen

    • erfdataportal.com
    Updated Oct 15, 2017
    Cite
    ILO Regional Office for Arab States (2017). Labor Force Survey, LFS 2013-2014 - Yemen [Dataset]. http://www.erfdataportal.com/index.php/catalog/132
    Explore at:
    Dataset updated
    Oct 15, 2017
    Dataset provided by
    International Labour Organization (http://www.ilo.org/)
    Economic Research Forum
    Central Statistical Organization
    Time period covered
    2013 - 2014
    Area covered
    Yemen
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL STATISTICAL ORGANIZATION OF YEMEN (CSO)

    The primary objective of LFS 2013-2014 was to provide current data on the employment and unemployment situation at national and governorate level using the preliminary version of the new standards concerning statistics of work, employment and labour underutilization adopted by the 19th International Conference of Labour Statisticians (Geneva, October 2013).

    ---> The survey was then designed to meet five main measurement objectives as follows:

    1- To provide current data on the number of employed, unemployed, and underemployed, and their demographic and social characteristics, including the size of women's participation in economic activity with a view to future policies in expanding their participation in the labour market.
    2- To collect data on qualifications of the labour force and participation in training programmes of the youth population and other data requirements for improving the performance of employers through knowledge on the levels of skill available to them.
    3- To measure the volume and characteristics of labour migration of Yemenis outside the country.
    4- To provide information on the amount of wages and employment-related income in different occupations, branches of economic activity and sectors of employment.
    5- To collect appropriate data for evaluating the microfinance projects funded through the Social Fund for Development.

    Given the extent and diversity of data requirements, the survey was designed to spread over a one-year period, built around the five objectives of the survey. The core labour force survey was conducted throughout the four quarters of the survey period and incorporated the measurement of income from employment along with the conventional items of data collection. Data on qualifications and participation in training were collected in the third quarter and data on labour migration in the second quarter of the survey programme. Data collection on microfinance was undertaken as a separate survey over the four quarters.

    Geographic coverage

    Survey operations were carried out in all governorates except parts where recent events have disturbed the normal course of economic activity. In these circumstances, special procedures were used for compensation, either through the replacement of those areas with other areas having otherwise similar characteristics in the respective strata or through the adjustment of the sampling weights for missing values. There were 14 such cases, 5 each in quarters 1 and 4, and 2 each in quarters 2 and 3.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The labour force survey covered the civilian non-institutional settled population, excluding certain areas with difficult access or low population densities. In particular, it excluded the nomad population, displaced populations who are homeless, the population living in public housing (boarding, hotels, prisons, hospitals, etc.), and individuals enlisted in the Armed Forces who reside permanently within camps and do not spend most days of the year with their families. Marine crews, expatriates outside the country and other categories of persons in remote islands were likewise excluded.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample design of the labour force survey of Yemen 2013-2014 is a two-stage stratified sample of enumeration areas in the first stage of sampling and a fixed number of sample households at the second stage of sampling. The resulting sample is spread evenly over the four quarters of the survey period.

    Accordingly, the Central Statistical Organization (CSO) has drawn a stratified sample of census enumeration areas recomposed as primary sampling units (PSUs). Sample selection has been made with probability proportional to the number of households as determined in the 2004 population census. In the second stage of sampling, after relisting of the sample enumeration areas, a fixed number of households (16 sample households) are drawn as clusters with equal probability from each sample enumeration area. The strata consist of the urban and rural areas of the 21 governorates in Yemen.

    According to the sample design, urban areas are oversampled and rural areas under-sampled. This is because a relatively larger sample size is required in urban areas where heterogeneity is greater in comparison with rural areas. Also, because the cost of transportation and field operations is relatively greater in rural areas, it is more cost effective to under-sample the rural areas relative to the less costly operations in urban areas. The differential sampling rates are then corrected through the sample weights so that the final results accurately reflect the overall employment pattern.

    The sample selection of the cluster of 16 households in each sample enumeration area was drawn after fresh listing of the totality of the households living in the sample enumeration area at the time of listing. This procedure updates the census information that dates back to 2004. The listing operations are carried out in each quarter before survey interviewing. The updated lists are sent to CSO in Sana'a for data entry and sample selection of households for transmission to the survey team in each area. Instructions were given so that sample households that could not be found in the field or were absent or refused to be interviewed should not be substituted with other households, as this procedure may introduce bias in the results. Instructions were also given that in cases where the number of households in a sample enumeration area was found to be less than the required 16 in a given quarter, all households in the enumeration area should be taken in the sample.

    The total sample size was determined on the basis of the requirement of producing national estimates of the unemployment rate with a 1.5% margin of error, assuming an overall non-response rate of 15% and a design effect of 3. For the determination of the national sample size, the expected unemployment rate was set at 15% and the expected number of sample households to reach one person of working age, 15 years old and over, in the labour force was set at 0.6.
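
    The way sample weights correct for differential sampling rates can be illustrated with a short Python sketch of a two-stage design weight. It assumes the textbook form for a design with probability-proportional-to-size selection at the first stage and equal-probability selection at the second; the survey's actual weighting (including any non-response or calibration adjustments) is described in its report and may differ. The function name and the example figures are hypothetical.

```python
def design_weight(n_psus_in_stratum, psu_households_census,
                  stratum_households_census, households_listed,
                  households_selected=16):
    """Inverse of the overall selection probability of a sample household.

    Stage 1: the PSU is drawn with probability proportional to its census
             household count within the stratum.
    Stage 2: a fixed number of households is drawn with equal probability
             from the fresh listing of the PSU.
    """
    p_stage1 = n_psus_in_stratum * psu_households_census / stratum_households_census
    # If fewer households are listed than required, all of them are taken.
    take = min(households_selected, households_listed)
    p_stage2 = take / households_listed
    return 1.0 / (p_stage1 * p_stage2)


# Hypothetical stratum: 10 sample PSUs, 50,000 census households in total.
# The PSU had 200 households in the census and 250 at the fresh listing.
w = design_weight(10, 200, 50_000, 250)
print(round(w, 3))
```

    A household in an oversampled urban stratum has a higher selection probability and therefore a smaller weight, which is how the estimates are rebalanced towards the true population distribution.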
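
    The parameters above can be combined in a standard sample-size calculation for a proportion, sketched below in Python. Several points are assumptions rather than statements from the survey documentation: a 95% confidence level (z = 1.96), an absolute margin of error of 1.5 percentage points, and the order in which the design effect, the households-per-person factor and the non-response inflation are applied. The figure it produces is therefore illustrative and need not match the sample size actually used.

```python
from math import ceil


def required_households(p, margin, deff, nonresponse,
                        households_per_lf_person, z=1.96):
    """Households needed to estimate a proportion p within +/- margin."""
    # Simple-random-sample size in persons in the labour force.
    n_srs = z ** 2 * p * (1 - p) / margin ** 2
    # Inflate for the clustered design.
    n_persons = n_srs * deff
    # Convert persons in the labour force to sample households.
    n_households = n_persons * households_per_lf_person
    # Inflate for expected non-response.
    return ceil(n_households / (1 - nonresponse))


n = required_households(p=0.15, margin=0.015, deff=3,
                        nonresponse=0.15, households_per_lf_person=0.6)
print(n)
```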

    A more detailed description of the allocation of sample across governorates is provided in the report document available among external resources in English.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire of the Yemen LFS 2013-2014 was designed on the basis of the ILO model LFS questionnaire (version A) and other national LFS questionnaires used in the region. The draft questionnaire was field tested with six households in Sana’a, each member of the field staff interviewing one sample household in his or her area. The experience gained in the field test was reviewed and led to some modifications of the draft questionnaire.

    Apart from the cover page and the back page, the core LFS questionnaire contains 52 questions. There are 11 questions on the social and demographic characteristics of the household members in the household roster. In the individual questionnaire addressed to the working age population 15 years of age or older, there are 3 questions to identify the employed persons and 19 questions on their employment characteristics including time-related underemployment, followed by 8 additional questions on income from employment. The individual questionnaire also includes 5 questions to identify the unemployed and the potential labour force and 5 follow-up questions on unemployment characteristics.

    Cleaning operations

    ----> Raw Data

    Data processing involved data entry, coding, editing and tabulation of the survey results. Data entry was carried out in parallel with the interviewing of sample households. It was conducted at the Central Statistical Organization headquarters in Sana'a, where all data processing operations except tabulation were centralized.

    The supervisory staff of the data entry operations was responsible for editing the questionnaires before actual data entry. Editing at this stage involved review of the questionnaire regarding its filled-in contents including ensuring that there is no missing block of information for household members aged 15 years old and over and correct coding of occupation, branch of economic activity and other variables.

    The data files were further processed at ILO headquarters in Geneva. They were first converted into a single file with 86,778 records and augmented with several fields, in particular, the sampling weights (“weight”) and the key derived variables: employed (E), unemployed (U), time-related underemployment (TRU), potential labour force (PLF) as well as other derived variables such as informal sector employment (IS) and informal employment (IE).
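
    As an illustration of how the derived variables and sampling weights fit together, the following Python sketch computes the four labour underutilization indicators (LU1 to LU4) defined by the 19th ICLS resolution from weighted person records. The record layout is an assumption: each record is taken to be a dictionary with a "weight" field and 0/1 flags named E, U, TRU and PLF, which may not match the actual file structure.

```python
def labour_indicators(records):
    """Weighted LU1-LU4 indicators from person-level records.

    Assumed layout: each record has a 'weight' and optional 0/1 flags
    'E', 'U', 'TRU' (a subset of the employed) and 'PLF'.
    """
    totals = {"E": 0.0, "U": 0.0, "TRU": 0.0, "PLF": 0.0}
    for rec in records:
        for key in totals:
            if rec.get(key):
                totals[key] += rec["weight"]
    labour_force = totals["E"] + totals["U"]
    extended_lf = labour_force + totals["PLF"]
    return {
        "LU1": totals["U"] / labour_force,                    # unemployment rate
        "LU2": (totals["TRU"] + totals["U"]) / labour_force,
        "LU3": (totals["U"] + totals["PLF"]) / extended_lf,
        "LU4": (totals["TRU"] + totals["U"] + totals["PLF"]) / extended_lf,
    }


# Hypothetical records, each representing 100 persons.
people = [
    {"weight": 100, "E": 1},
    {"weight": 100, "E": 1},
    {"weight": 100, "E": 1, "TRU": 1},
    {"weight": 100, "U": 1},
    {"weight": 100, "PLF": 1},
    {"weight": 100},  # outside the labour force
]
print(labour_indicators(people))
```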

    ----> Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated
  18. Global Invasive and Alien Traits and Records (GIATAR) dataset

    • zenodo.org
    zip
    Updated Mar 19, 2025
    Cite
    Ariel Saffer; Thom Worm (2025). Global Invasive and Alien Traits and Records (GIATAR) dataset [Dataset]. http://doi.org/10.5281/zenodo.15042321
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ariel Saffer; Thom Worm
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Time period covered
    Jul 30, 2024
    Description

    Monitoring and managing the global spread of invasive and alien species requires accurate spatiotemporal records of species presence and information about the biological characteristics of species of interest including life cycle information, biotic and abiotic constraints and pathways of spread. The Global Invasive and Alien Traits And Records (GIATAR) dataset provides consolidated dated records of invasive and alien presence at the country-scale combined with a suite of biological information about pests of interest in a standardized, machine-readable format. We provide dated presence records for 46,666 alien taxa in 249 countries constituting 827,300 country-taxon pairs, joined with additional biological information for thousands of taxa. GIATAR is designed to be quickly updateable with future data and easy to integrate into ongoing research on global patterns of alien species movement using scripts provided to query and analyze data.

    This publication includes:

    • GIATAR dataset files (dataset)
    • Functions in Python and R to join tables and query data (query_functions)
    • Tutorials and example queries in Python and R (tutorials)

    For more information, please refer to the publication:

    Saffer, Ariel, Thom Worm, Yu Takeuchi, and Ross Meentemeyer. “GIATAR: A Spatio-Temporal Dataset of Global Invasive and Alien Species and Their Traits.” Scientific Data 11, no. 1 (September 11, 2024): 991. https://doi.org/10.1038/s41597-024-03824-w.
    Changes in this version (March 19, 2025):
    • Removed base folder from folder structure
    • Included additional files used to update the database
    • Latest records as of March 9 - 10, 2025
    • Updated species list from EPPO as of February 26, 2025
    For continuous updates to code, please refer to our Github repository: https://github.com/ncsu-landscape-dynamics/GIATAR-dataset
  19. STEMMUS SCOPE emulator train test example data of 2014

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Jul 3, 2024
    Cite
    Fakhereh (Sarah) Alidoost; Qianqian Han (2024). STEMMUS SCOPE emulator train test example data of 2014 [Dataset]. http://doi.org/10.5281/zenodo.12623257
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fakhereh (Sarah) Alidoost; Qianqian Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    The "csv" file contains land-atmosphere variables and latent heat flux (LEtot) simulated by STEMMUS-SCOPE (soil-plant model), version 1.5.0; see the GitHub repository STEMMUS-SCOPE. The data cover 19 Fluxnet sites for the year 2014 at hourly intervals. For more information, see the EcoExtreML project.


    These data were used as training pairs to develop an emulator with a random forest regression algorithm; the emulator is provided as the "onnx" file. The target variable is latent heat flux (LEtot), and the features are the land-atmosphere variables. For more information about the emulator, see the STEMMUS-SCOPE Emulator GitHub repository.
    The model and data are used in a tutorial on applying an explainability method (for example, Kernel SHAP) with the DIANNA package. For more information, see the Deep Insight and Neural Network Analysis (DIANNA) project.

  20. GAL Hydrochemistry Formations QC for TDS v02 Surfaces

    • researchdata.edu.au
    • devweb.dga.links.com.au
    • +3 more
    Updated Mar 29, 2016
    Cite
    Bioregional Assessment Program (2016). GAL Hydrochemistry Formations QC for TDS v02 Surfaces [Dataset]. https://researchdata.edu.au/gal-hydrochemistry-formations-v02-surfaces/2994238
    Dataset updated
    Mar 29, 2016
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme. The parent datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This dataset contains raster representations of Total Dissolved Solid (TDS) measurement trends in groundwater samples for each hydrogeological formation in the Galilee Basin subregion.

    The dataset also contains supplementary polygon Feature Classes for each formation, to be used in the visualisation of the rasters. For each formation this includes:

    a) A rectangular data extent polygon feature class - created based on the distribution of data points for each formation and used to define the extent of each raster

    b) Data extent mask - further defines the extent of data distribution as well as the spatial extent of the formation, used to visualise the TDS trends for each formation only within the formation boundary and near the spread of point data.

    Purpose

    Provides a visual representation, for use in maps, of TDS measurement trends in groundwater for each hydrogeological formation in the Galilee Basin subregion.

    Dataset History

    The raster layers within this dataset were created using the 'Topo to Raster' interpolation method in ArcGIS. Topo to Raster uses an iterative finite difference interpolation technique. This method is preferred for map and visualisation purposes, especially in sparse data regions, as surface continuity is not compromised at a global level. This results in raster layers with smooth surfaces and trends for any level of data density, and surface continuity between areas of varying density.

    Raster layers and polygon Feature Classes were created from the source point Feature Classes (dataset: GAL Hydrochemistry Formations QC for TDS v02 GIS - GUID: 109a21cd-a167-4320-84be-ab56cfc12cee).

    Formation Data Extent polygons: An arbitrary rectangular polygon was created around the extent of the points contained in each source point Feature Class.

    Formation Data Extent Mask: A hole was clipped from the Formation Data Extent polygon. The eastern boundary of each hole was traced from the equivalent formation polygon found within the Galilee Groundwater Model, Hydrogeological Formation Extents v01 dataset (GUID: 5afbf7f1-1ee0-444b-9f77-dbad8d8de95b), while the western, northern and southern extents were defined by the distribution of point data or the Galilee subregion boundary (Bioregional Assessment areas v03, GUID: 96dbf469-5463-4f4d-8fad-4214c97e5aac).

    Topo to Raster parameters

    Input feature data = respective point feature class from source dataset

    Field = TDS

    Type = Point Elevation

    Output cell size = 0.001

    Output extent = Formation data extent polygon Feature Class

    Smallest z value to be used in interpolation = smallest TDS value of input point Feature Class

    Largest z value to be used in interpolation = largest TDS value of input point Feature Class

    Drainage enforcement = NO_ENFORCE

    Primary type of input data = SPOT

    All other parameters left as default.
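
    Three of the parameters above are derived from the input points rather than fixed: the output extent and the smallest and largest z values. The sketch below shows one way such values could be computed from a list of (x, y, TDS) points in plain Python. It is an illustration only, not the workflow used to build this dataset: the names InterpolationInputs and derive_inputs are hypothetical, the sample coordinates and TDS values are invented, and the extent is taken as the tight bounding box of the points, whereas the dataset describes its extent rectangles as arbitrary.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class InterpolationInputs:
    """Data-driven inputs for an interpolation run (hypothetical container)."""
    extent: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    z_min: float  # smallest TDS value among the input points
    z_max: float  # largest TDS value among the input points


def derive_inputs(points: Iterable[Tuple[float, float, float]]) -> InterpolationInputs:
    """Compute the bounding box and the TDS range of (x, y, tds) points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs, ys, zs = zip(*pts)
    return InterpolationInputs(
        extent=(min(xs), min(ys), max(xs), max(ys)),
        z_min=min(zs),
        z_max=max(zs),
    )


# Invented sample points (x, y, TDS); these are not values from the dataset.
sample = [
    (145.2, -22.1, 350.0),
    (146.0, -23.4, 1200.0),
    (145.7, -21.8, 780.0),
]
inputs = derive_inputs(sample)
print(inputs.extent, inputs.z_min, inputs.z_max)
```

    Clamping the interpolation to the observed TDS range, as the parameter list specifies, keeps the interpolated surface from overshooting the measured values in sparse areas.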

    Dataset Citation

    Bioregional Assessment Programme (XXXX) GAL Hydrochemistry Formations QC for TDS v02 Surfaces. Bioregional Assessment Derived Dataset. Viewed 11 April 2016, http://data.bioregionalassessments.gov.au/dataset/ff165a41-f7f3-4922-870e-6837fd40f228.

    Dataset Ancestors

