100+ datasets found
  1. SPSS demystified : a step-by-step guide to successful data analysis : for SPSS version 18.0

    • workwithdata.com
    Cite
    Work With Data, SPSS demystified : a step-by-step guide to successful data analysis : for SPSS version 18.0 [Dataset]. https://www.workwithdata.com/object/spss-demystified-a-step-by-step-guide-to-successful-data-analysis-for-spss-version-18-0-book-by-ronald-d-yockey-0000
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPSS demystified : a step-by-step guide to successful data analysis : for SPSS version 18.0 is a book. It was written by Ronald D. Yockey and published by Pearson Education in 2011.

  2. Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment

    • researchdata.tuwien.at
    html, pdf, zip
    Updated Feb 23, 2025
    + more versions
    Cite
    Negar Alinaghi; Ioannis Giannopoulos (2025). Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment [Dataset]. http://doi.org/10.48436/m2ha4-t1v92
    Explore at:
    Available download formats: html, zip, pdf
    Dataset updated
    Feb 23, 2025
    Dataset provided by
    TU Wien
    Authors
    Negar Alinaghi; Ioannis Giannopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Folder Structure

    The folder named “submission” contains the following:

    1. “pythonProject”: This folder contains all the Python files and subfolders needed for analysis.
    2. ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

    Setting Up the Environment

    1. Use the ijgis.yml file to create a Python environment (for a conda environment file, this is typically done with "conda env create -f ijgis.yml"). Ensure you activate the environment before running the code.
    2. The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

    Subfolders

    1. Data_4_IJGIS

    • This folder contains the data used for the results reported in the paper.
    • Note: The data analysis described in the paper begins after the synchronization and cleaning of the recorded raw data; the published data are already synchronized and cleaned. Both the cleaned files and the merged files with features extracted from them are provided in this directory. If you want to perform the segmentation and feature extraction yourself, run the respective Python files; otherwise, use the “merged_…csv” files as input for the training.

    2. results_[DateTime] (e.g., results_20240906_15_00_13)

    • This folder will be generated when you run the code and will store the output of each step.
    • The current folder contains results created during code debugging for the submission.
    • When you run the code, a new folder with fresh results will be generated.

    Python Files

    1. helper_functions.py

    • Contains reusable functions used throughout the analysis.
    • Each function includes a description of its purpose and the input parameters required.

    2. create_sanity_plots.py

    • Generates scatter plots like those in Figure 3 of the paper.
    • Although the code has been run for all 309 trials, it can be used to check the sample data provided.
    • Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.
    • Usage: Run this file to create visualizations similar to Figure 3 (a schematic plotting call is sketched below).
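
    As a rough illustration of this kind of sanity plot, the sketch below scatters one raw recording column over time, color-coded by event label. The column names ("timestamp", "event") are assumptions for illustration, not the repository's actual schema.

    import matplotlib.pyplot as plt
    import pandas as pd

    def sanity_plot(df: pd.DataFrame, column: str, out_png: str):
        """Scatter one raw recording column over time, colored by logged event."""
        events = df["event"].astype("category")
        plt.figure(figsize=(12, 4))
        plt.scatter(df["timestamp"], df[column], c=events.cat.codes, s=4)
        plt.xlabel("timestamp (s)")
        plt.ylabel(column)
        plt.savefig(out_png, dpi=150)
        plt.close()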

    3. overlapping_sliding_window_loop.py

    • Implements overlapping sliding window segmentation and generates plots like those in Figure 4 (a minimal segmentation sketch follows this list).
    • Output:
      • Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.
      • Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.
      • A visualization of the segments, similar to Figure 4, will be automatically generated.
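
    A minimal sketch of the overlapping sliding-window step, assuming a pandas DataFrame with a "timestamp" column in seconds; window lengths of 2–10 s with a 1 s step mirror the defaults above, while file and column names are illustrative:

    import pandas as pd

    def segment(df: pd.DataFrame, window_s: float, step_s: float = 1.0):
        """Yield overlapping windows of window_s seconds, advancing by step_s."""
        start, t_end = df["timestamp"].min(), df["timestamp"].max()
        while start + window_s <= t_end:
            yield df[(df["timestamp"] >= start) & (df["timestamp"] < start + window_s)]
            start += step_s

    # Example: 2-10 s windows with a 1 s step, one .csv per segment.
    # gaze = pd.read_csv("merged_gaze.csv")  # hypothetical file name
    # for w in range(2, 11):
    #     for i, seg in enumerate(segment(gaze, window_s=w)):
    #         seg.to_csv(f"Gaze/seg_{w}s_{i}.csv", index=False)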

    4. gaze_features.py & imu_features.py

    • These files compute features as explained in Tables 1 and 2 of the paper, respectively.
    • They process the segmented recordings generated by the overlapping_sliding_window_loop.py.
    • Usage: If you only want to see how the features are calculated, run these files after the sliding-window segmentation to compute the features from the segmented data (an illustrative sketch follows).
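
    The paper's exact feature sets are defined in Tables 1 and 2; as a stand-in, the sketch below computes simple descriptive statistics per segment, which is the general shape of such a feature extractor:

    import pandas as pd

    def basic_features(segment: pd.DataFrame) -> dict:
        """Illustrative per-segment features: summary statistics per numeric column."""
        feats = {}
        for col in segment.select_dtypes("number").columns:
            feats[f"{col}_mean"] = segment[col].mean()
            feats[f"{col}_std"] = segment[col].std()
            feats[f"{col}_min"] = segment[col].min()
            feats[f"{col}_max"] = segment[col].max()
        return feats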

    5. training_prediction.py

    • This file contains the main machine learning analysis of the paper: all the code for training the model, evaluating it, and using it for inference on the “monitoring part”. It covers the following steps:
    a. Data Preparation (corresponding to Section 5.1.1 of the paper)
    • Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.
    • A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the call is commented out; if you want to see visually what has changed compared to the original data, uncomment this line.
    b. Training/Validation/Test Split
    • Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).
    • Make sure that you follow the instructions in the code comments exactly.
    • Output: The split data is saved as .csv files in the results folder.
    c. Machine and Deep Learning Experiments

    This part contains three main code blocks:

    • MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, uncomment it and comment out the XGBoost blocks accordingly.
    • XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results (an illustrative sketch of this variant follows the note below).
    • XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6.

    Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.
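
    For orientation, here is a minimal sketch of the untuned XGBoost variant; the input file name and the "label" column are placeholders, not the repository's actual names:

    import pandas as pd
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Hypothetical feature file produced by the previous steps.
    df = pd.read_csv("merged_features.csv")
    X, y = df.drop(columns=["label"]), df["label"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    model = XGBClassifier(n_estimators=200, eval_metric="logloss")
    model.fit(X_tr, y_tr)
    print(classification_report(y_te, model.predict(X_te)))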

    d. Inference (Monitoring Part)
    • Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.
    • Figure 8 in the paper is generated using this part of the code.

    6. sequence_analysis.py

    • Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.
    • This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.

    Licenses

    The data are licensed under CC BY; the code is licensed under MIT.

  3. Data from: pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer (2023). pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00760.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Prior to statistical analysis of mass spectrometry (MS) data, quality control (QC) of the identified biomolecule peak intensities is imperative for reducing process-based sources of variation and extreme biological outliers. Without this step, statistical results can be biased. Additionally, liquid chromatography–MS proteomics data present inherent challenges due to large amounts of missing data that require special consideration during statistical analysis. While a number of R packages exist to address these challenges individually, there is no single R package that addresses all of them. We present pmartR, an open-source R package, for QC (filtering and normalization), exploratory data analysis (EDA), visualization, and statistical analysis robust to missing data. Example analysis using proteomics data from a mouse study comparing smoke exposure to control demonstrates the core functionality of the package and highlights the capabilities for handling missing data. In particular, using a combined quantitative and qualitative statistical test, 19 proteins whose statistical significance would have been missed by a quantitative test alone were identified. The pmartR package provides a single software tool for QC, EDA, and statistical comparisons of MS data that is robust to missing data and includes numerous visualization capabilities.

  4. Data for: Descending 13 real world steps: A dataset and analysis of stair descent

    • dataverse.harvard.edu
    • dataone.org
    Updated Dec 14, 2021
    Cite
    Astrini Sie; Maxim Karrenbach; Charlie Fisher; Shawn Fisher; Nathaniel Wieck; Callysta Caraballo; Elisabeth Case; David Boe; Brittney Muir; Eric Rombokas (2021). Data for: Descending 13 real world steps: A dataset and analysis of stair descent [Dataset]. http://doi.org/10.7910/DVN/SFZPOK
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Astrini Sie; Maxim Karrenbach; Charlie Fisher; Shawn Fisher; Nathaniel Wieck; Callysta Caraballo; Elisabeth Case; David Boe; Brittney Muir; Eric Rombokas
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    World
    Description

    Stair descent analysis has been typically limited to laboratory staircases of 4 or 5 steps. To date there has been no report of gait parameters during unconstrained stair descent outside of the laboratory, and few motion capture datasets are publicly available. We aim to collect a dataset and perform gait analysis for stair descent outside of the laboratory. We aim to measure basic kinematic and kinetic gait parameters and foot placement behavior. We present a public stair descent dataset from 101 unimpaired participants aged 18-35 on an unconstrained 13-step staircase collected using wearable sensors. The dataset consists of kinematics (full-body joint angle and position), kinetics (plantar normal forces, acceleration), and foot placement for 30,609 steps. This is the first quantitative observation of gait data from a large number (n = 101) of participants descending an unconstrained staircase outside of a laboratory. The dataset is a public resource for understanding typical stair descent.

  5. Data from: A protocol for conducting and presenting results of regression-type analyses

    • zenodo.org
    • search.dataone.org
    • +1more
    bin, txt
    Updated May 29, 2022
    + more versions
    Cite
    Alain F. Zuur; Elena N. Ieno (2022). Data from: A protocol for conducting and presenting results of regression-type analyses [Dataset]. http://doi.org/10.5061/dryad.v4t42
    Explore at:
    Available download formats: txt, bin
    Dataset updated
    May 29, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alain F. Zuur; Elena N. Ieno
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Scientific investigation is of value only insofar as relevant results are obtained and communicated, a task that requires organizing, evaluating, analysing and unambiguously communicating the significance of data. In this context, working with ecological data, reflecting the complexities and interactions of the natural world, can be a challenge. Recent innovations for statistical analysis of multifaceted interrelated data make obtaining more accurate and meaningful results possible, but key decisions of the analyses to use, and which components to present in a scientific paper or report, may be overwhelming. We offer a 10-step protocol to streamline analysis of data that will enhance understanding of the data, the statistical models and the results, and optimize communication with the reader with respect to both the procedure and the outcomes. The protocol takes the investigator from study design and organization of data (formulating relevant questions, visualizing data collection, data exploration, identifying dependency), through conducting analysis (presenting, fitting and validating the model) and presenting output (numerically and visually), to extending the model via simulation. Each step includes procedures to clarify aspects of the data that affect statistical analysis, as well as guidelines for written presentation. Steps are illustrated with examples using data from the literature. Following this protocol will reduce the organization, analysis and presentation of what may be an overwhelming information avalanche into sequential and, more to the point, manageable, steps. It provides guidelines for selecting optimal statistical tools to assess data relevance and significance, for choosing aspects of the analysis to include in a published report and for clearly communicating information.

  6. PATIENT CENTRIC MANAGEMENT ANALYSIS AND FUTURE PROSPECTS IN BIG DATA HEALTHCARE

    • osf.io
    Updated Jul 21, 2023
    Cite
    Krishnachaitanya.Katkam; Dr. Harsh Lohiya (2023). PATIENT CENTRIC MANAGEMENT ANALYSIS AND FUTURE PROSPECTS IN BIG DATA HEALTHCARE [Dataset]. http://doi.org/10.17605/OSF.IO/DF4UQ
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Krishnachaitanya.Katkam; Dr. Harsh Lohiya
    Description

    ABSTRACT Large amounts of data that can be made to yield useful information are collectively called 'big data'. Over the last two decades, big data has attracted special interest for the potential hidden within it. Small and large industries alike generate, store, and analyze big data with the aim of improving the services they provide. In the healthcare industry, big data offers many opportunities, such as managing patient records and hospital inflow and outflow, and biomedical research generates a significant portion of the big data relevant to public healthcare. Proper management and analysis of these data are required to derive meaningful information; with the right tools, seeking a solution in big data becomes like quickly finding a needle in a haystack. The various challenges associated with each step of handling big data can be surmounted by using high-end computing solutions. To improve public health, healthcare providers need to be fully equipped with efficient infrastructure to systematically generate and analyze big data. Efficient management, analysis, and interpretation of big data can change the game by opening new avenues for modern healthcare. Various sectors, including the public sector and healthcare, are issued strong guidance for the betterment of services as well as financial improvement. The ongoing revolution in the healthcare industry can accommodate personalized medicine with strongly integrated therapies. Keywords: Healthcare, Biomedical Research, Big Data Analytics, Internet of Things, Personalized Medicine, Quantum Computing. Cite this Article: Krishnachaitanya.Katkam and Harsh Lohiya, Patient Centric Management Analysis and Future Prospects in Big Data Healthcare, International Journal of Computer Engineering and Technology (IJCET), 13(3), 2022, pp. 76-86.

  7. General Mission Analysis Tool Project

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Dec 6, 2023
    + more versions
    Cite
    (2023). General Mission Analysis Tool Project [Dataset]. https://catalog.data.gov/dataset/general-mission-analysis-tool-project
    Dataset updated
    Dec 6, 2023
    Description

    Overview

    GMAT is a feature-rich system containing high-fidelity space system models, optimization and targeting, built-in scripting and programming infrastructure, and customizable plots, reports and data products to enable flexible analysis and solutions for custom and unique applications. GMAT can be driven from a fully featured, interactive GUI or from a custom script language. Here are some of GMAT’s key features, broken down by feature group.

    Dynamics and Environment Modelling

    • High fidelity dynamics models including harmonic gravity, drag, tides, and relativistic corrections
    • High fidelity spacecraft modeling
    • Formations and constellations
    • Impulsive and finite maneuver modeling and optimization
    • Propulsion system modeling including tanks and thrusters
    • Solar System modeling including high fidelity ephemerides, custom celestial bodies, libration points, and barycenters
    • Rich set of coordinate systems including J2000, ICRF, fixed, rotating, topocentric, and many others
    • SPICE kernel propagation
    • Propagators that naturally synchronize epochs of multiple vehicles and avoid fixed-step integration and interpolation

    Plotting, Reporting and Product Generation

    • Interactive 3-D graphics
    • Customizable data plots and reports
    • Post computation animation
    • CCSDS, SPK, and Code-500 ephemeris generation

    Optimization and Targeting

    • Boundary value targeters
    • Nonlinear, constrained optimization
    • Custom, scriptable cost functions
    • Custom, scriptable nonlinear equality and inequality constraint functions
    • Custom targeter controls and constraints

    Programming Infrastructure

    • User-defined variables, arrays, and strings
    • User-defined equations using MATLAB syntax (e.g., overloaded array operations)
    • Control flow such as If, For, and While loops for custom applications
    • Matlab interface
    • Built in parameters and calculations in multiple coordinate systems

    Interfaces

    • Fully featured, interactive GUI that makes simple analysis quick and easy
    • Custom scripting language that makes complex, custom analysis possible
    • Matlab interface for custom external simulations and calculations
    • File interface for the TCOPS Vector Hold

  8. STEP Skills Measurement Household Survey 2012 (Wave 1) - Colombia

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Apr 8, 2016
    + more versions
    Cite
    World Bank (2016). STEP Skills Measurement Household Survey 2012 (Wave 1) - Colombia [Dataset]. https://microdata.worldbank.org/index.php/catalog/2012
    Dataset updated
    Apr 8, 2016
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2012
    Area covered
    Colombia
    Description

    Abstract

    The STEP (Skills Toward Employment and Productivity) Measurement program is the first-ever initiative to generate internationally comparable data on the skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.

    The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.

    Geographic coverage

    13 major metropolitan areas: Bogotá, Medellín, Cali, Barranquilla, Bucaramanga, Cúcuta, Cartagena, Pasto, Ibagué, Pereira, Manizales, Montería, and Villavicencio.

    Analysis unit

    The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.

    Universe

    The target population for the Colombia STEP survey is all non-institutionalized persons 15 to 64 years old (inclusive) living in private dwellings in urban areas of the country at the time of data collection. This includes all residents except foreign diplomats and non-nationals working for international organizations.

    The following groups are excluded from the sample:

    • Residents of institutions (prisons, hospitals, etc.)
    • Residents of senior homes and hospices
    • Residents of other group dwellings such as college dormitories, halfway homes, workers' quarters, etc.
    • Persons living outside the country at the time of data collection

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A stratified 7-stage sample design was used in Colombia. The stratification variable is city-size category.

    First Stage Sample The primary sample unit (PSU) is a metropolitan area. A sample of 9 metropolitan areas was selected from the 13 metropolitan areas on the sample frame. The metropolitan areas were grouped according to city-size; the five largest metropolitan areas are included in Stratum 1 and the remaining 8 metropolitan areas are included in Stratum 2. The five metropolitan areas in Stratum 1 were selected with certainty; in Stratum 2, four metropolitan areas were selected with probability proportional to size (PPS), where the measure of size was the number of persons aged 15 to 64 in a metropolitan area.

    Second Stage Sample The second stage sample unit is a Section. At the second stage of sample selection, a PPS sample of 267 Sections was selected from the sampled metropolitan areas; the measure of size was the number of persons aged 15 to 64 in a Section. The sample of 267 Sections consisted of 243 initial Sections and 24 reserve Sections to be used in the event of complete non-response at the Section level.

    Third Stage Sample The third stage sample unit is a Block. Within each selected Section, a PPS sample of 4 blocks was selected; the measure of size was the number of persons aged 15 to 64 in a Block. Two sample Blocks were initially activated while the remaining two sample Blocks were reserved for use in cases where there was a refusal to cooperate at the Block level or cases where the block did not belong to the target population (e.g., parks, and commercial and industrial areas).

    Fourth Stage Sample The fourth stage sample unit is a Block Segment. Regarding the Block segmentation strategy, the Colombia document 'FINAL SAMPLING PLAN (ARD-397)' states "According to the 2005 population and housing census conducted by DANE, the average number of dwellings per block in the 13 large cities or metropolitan areas was approximately 42 dwellings. Based on this finding, the defined protocol was to report those cases in which 80 or more dwellings were present in a given block in order to partition block using a random selection algorithm." At the fourth stage of sample selection, 1 Block Segment was selected in each selected Block using a simple random sample (SRS) method.

    Fifth Stage Sample The fifth stage sample unit is a dwelling. At the fifth stage of sample selection, 5582 dwellings were selected from the sampled Blocks/Block Segments using a simple random sample (SRS) method. According to the Colombia document 'FINAL SAMPLING PLAN (ARD-397)', the selection of dwellings within a participant Block "was performed differentially amongst the different socioeconomic strata that the Colombian government uses for the generation of cross-subsidies for public utilities (in this case, the socioeconomic stratum used for the electricity bill was used). Given that it is known from previous survey implementations that refusal rates are highest amongst households of higher socioeconomic status, the number of dwellings to be selected increased with the socioeconomic stratum (1 being the poorest and 6 being the richest) that was most prevalent in a given block".

    Sixth Stage Sample The sixth stage sample unit is a household. At the sixth stage of sample selection, one household was selected in each selected dwelling using an SRS method.

    Seventh Stage Sample The seventh stage sample unit was an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
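
    Several of the stages above use systematic probability-proportional-to-size (PPS) selection. As a generic illustration of that method (not the survey firm's actual implementation, and with made-up unit sizes), a sketch:

    import random

    def systematic_pps(sizes, n):
        """Select n unit indices with probability proportional to size."""
        total = sum(sizes)
        interval = total / n
        start = random.uniform(0, interval)
        points = [start + i * interval for i in range(n)]  # equally spaced hits
        selected, cum = [], 0
        it = iter(points)
        p = next(it)
        for idx, size in enumerate(sizes):
            cum += size  # walk the cumulative size scale
            while p is not None and p <= cum:
                selected.append(idx)
                p = next(it, None)
        return selected

    # Example: select 4 of 8 metropolitan areas PPS by population aged 15-64.
    print(systematic_pps([120, 80, 300, 50, 90, 210, 60, 150], n=4))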

    Sampling methodologies are described for each country in two documents and are provided as external resources: (i) the National Survey Design Planning Report (NSDPR) (ii) the weighting documentation (available for all countries)

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The STEP survey instruments include:

    • The background questionnaire developed by the World Bank (WB) STEP team
    • Reading Literacy Assessment developed by Educational Testing Services (ETS).

    All countries adapted and translated both instruments following the STEP technical standards: two independent translators adapted and translated the STEP background questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.

    The survey instruments were piloted as part of the survey pre-test.

    The background questionnaire covers such topics as respondents' demographic characteristics, dwelling characteristics, education and training, health, employment, job skill requirements, personality, behavior and preferences, language and family background.

    The background questionnaire, the structure of the Reading Literacy Assessment and Reading Literacy Data Codebook are provided in the document "Colombia STEP Skills Measurement Survey Instruments", available in external resources.

    Cleaning operations

    STEP data management process:

    1) Raw data is sent by the survey firm.
    2) The World Bank (WB) STEP team runs data checks on the background questionnaire data. Educational Testing Services (ETS) runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
    3) The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
    4) The WB STEP team and ETS check if the data files are clean. This might require additional iterations with the survey firm.
    5) Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
    6) ETS scales the Reading Literacy Assessment data.
    7) The WB STEP team merges the background questionnaire data with the Reading Literacy Assessment data and computes derived variables.

    Detailed information on data processing in STEP surveys is provided in "STEP Guidelines for Data Processing", available in external resources. The template do-file used by the STEP team to check raw background questionnaire data is provided as an external resource, too.

    Response rate

    An overall response rate of 48% was achieved in the Colombia STEP Survey.

  9. STEP Skills Measurement Household Survey 2012 (Wave 1) - Viet Nam

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    + more versions
    Cite
    STEP Skills Measurement Household Survey 2012 (Wave 1) - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/2018
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2012
    Area covered
    Viet Nam
    Description

    Abstract

    The STEP (Skills Toward Employment and Productivity) Measurement program is the first-ever initiative to generate internationally comparable data on the skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.

    The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.

    Geographic coverage

    The survey covers the urban areas of the two largest cities of Vietnam, Ha Noi and Ho Chi Minh City.

    Analysis unit

    The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.

    Universe

    The STEP target population is the population aged 15 to 64 inclusive, living in urban areas, as defined by each country's statistical office. In Vietnam, the target population comprised all people aged 15-64 living in urban areas in Ha Noi and Ho Chi Minh City (HCM).

    The reasons for selecting these two cities include:

    (i) they are the two biggest cities of Vietnam, so they would have all the urban characteristics needed for the STEP study; and (ii) it is less costly to conduct the STEP survey in these two cities than in all urban areas of Vietnam, given the limited survey budget.

    • The target population is not representative of the national urban population.

    The following are excluded from the sample:

    • Residents of institutions (prisons, hospitals, etc)
    • Residents of senior homes and hospices
    • Residents of other group dwellings such as college dormitories, halfway homes, workers' quarters, etc
    • Persons living outside the country at the time of data collection

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    • The sample of 3405 households was selected from 227 urban Enumeration Areas (EAs) in Ha Noi (107 EAs) and Ho Chi Minh City (120 EAs). From each EA 15 households were selected, so the number of households selected in Ha Noi was 1245 HHs, and in HCM, 2160 HHs.
    • The 2009 Population and Housing Census was used as a sample frame.
    • Regarding PSUs (EAs), the sampling frame is the list of 15% of the total EAs from the 2009 Population Census. Data items on the frame for PSUs include province code, district code, commune code, and EA code; address of the EA; and number of households.
    • Regarding ultimate sampling units (households), the sampling frame is a list of (100) households in each EA. Data items on the frame for ultimate sampling units (households) include the names of heads of households.

    The sample frame includes the list of urban EAs and the count of households for each EA. Changes to the EA list and household list would impact the coverage of the sample frame. In a recent review of Ha Noi, only 3 out of 140 randomly selected EAs (2%) were either new or destroyed. GSO would increase the coverage of the sample frame (>95% as standard) by updating the household list of the selected EAs before selecting households for STEP.

    A detailed description of the sample design is available in section 4 of the NSDPR provided with the metadata. On completion of the household listing operation, GSO will deliver to the World Bank a copy of the lists, and an Excel spreadsheet with the total number of households listed in each of the 227 visited PSUs.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The STEP survey instruments include: (i) a Background Questionnaire developed by the WB STEP team (ii) a Reading Literacy Assessment developed by Educational Testing Services (ETS).

    All countries adapted and translated both instruments following the STEP Technical Standards: two independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator. The WB STEP team and ETS collaborated closely with the survey firms during the process and reviewed the adaptation and translation to Vietnamese (using a back translation).

    • The survey instruments were both piloted as part of the survey pretest.
    • The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.

    Cleaning operations

    STEP Data Management Process:

    1. Raw data is sent by the survey firm.
    2. The WB STEP team runs data checks on the Background Questionnaire data. ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
    3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
    4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
    5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
    6. ETS scales the Reading Literacy Assessment data.
    7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.

    Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document, provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is also provided as an external resource.

    Response rate

    The response rate for Vietnam (urban) was 62%. (See STEP Methodology Note Table 4).

    Sampling error estimates

    A weighting documentation was prepared for each participating country and provides some information on sampling errors. All country weighting documentations are provided as an external resource.

  10. Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  11. STEP Skills Measurement Household Survey 2013 (Wave 2) - Ghana

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 19, 2016
    + more versions
    Cite
    STEP Skills Measurement Household Survey 2013 (Wave 2) - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/2015
    Dataset updated
    Apr 19, 2016
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Time period covered
    2013
    Area covered
    Ghana
    Description

    Abstract

    The STEP (Skills Toward Employment and Productivity) Measurement program is the first-ever initiative to generate internationally comparable data on the skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.

    The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.

    Geographic coverage

    The survey covered the following regions: Western, Central, Greater Accra, Volta, Eastern, Ashanti, Brong Ahafo, Northern, Upper East and Upper West.
    - Areas are classified as urban based on each country's official definition.

    Analysis unit

    The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.

    Universe

    The target population for the Ghana STEP survey comprises all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban areas of the country at the time of data collection. This includes all residents except foreign diplomats and non-nationals working for international organizations. Exclusions: military barracks were excluded from the Ghana target population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Ghana sample design is a four-stage sample design. There was no explicit stratification but the sample was implicitly stratified by Region. [Note: Implicit stratification was achieved by sorting the PSUs (i.e., EACode) by RegnCode and selecting a systematic sample of PSUs.]

    First Stage Sample The primary sample unit (PSU) was a Census Enumeration Area (EA). Each PSU was uniquely defined by the sample frame variables Regncode, and EAcode. The sample frame was sorted by RegnCode to implicitly stratify the sample frame PSUs by region. The sampling objective was to select 250 PSUs, comprised of 200 Initial PSUs and 50 Reserve PSUs. Although 250 PSUs were selected, only 201 PSUs were activated. The PSUs were selected using a systematic probability proportional to size (PPS) sampling method, where the measure of size was the population size (i.e., EAPopn) in a PSU.

    Second Stage Sample The second stage sample unit is a PSU partition. It was considered necessary to partition 'large' PSUs into smaller areas to facilitate the listing process. After the partitioning of the PSUs, the survey firm randomly selected one partition. The selected partition was fully listed for subsequent enumeration in accordance with the field procedures.

    Third Stage Sample The third stage sample unit (SSU) is a household. The sampling objective was to obtain interviews at 15 households within each selected PSU. The households were selected in each PSU using a systematic random method.

    Fourth Stage Sample The fourth stage sample unit was an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.

    Sample Size The Ghana firm's sampling objective was to obtain interviews from 3000 individuals in the urban areas of the country. In order to provide a sufficient sample to allow for a worst-case scenario of a 50% response rate, the number of sampled cases was doubled in each selected PSU. Although 50 extra PSUs were selected for use in case it was impossible to conduct any interviews in one or more initially selected PSUs, only one reserve PSU was activated. Therefore, the Ghana firm conducted the STEP data collection in a total of 201 PSUs.

    Sampling methodologies are described for each country in two documents: (i) the National Survey Design Planning Report (NSDPR) and (ii) the weighting documentation.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The STEP survey instruments include: (i) a Background Questionnaire developed by the WB STEP team (ii) a Reading Literacy Assessment developed by Educational Testing Services (ETS).

    All countries adapted and translated both instruments following the STEP Technical Standards: 2 independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator. The WB STEP team and ETS collaborated closely with the survey firms during the process and reviewed the adaptation and translation (using a back translation). In the case of Ghana, no translation was necessary, but the adaptation process ensured that the English used in the Background Questionnaire and Reading Literacy Assessment closely reflected local use.

    • The survey instruments were both piloted as part of the survey pretest.
    • The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.

    Cleaning operations

    STEP Data Management Process:

    1. Raw data is sent by the survey firm.
    2. The WB STEP team runs data checks on the Background Questionnaire data. ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
    3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
    4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
    5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
    6. ETS scales the Reading Literacy Assessment data.
    7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.

    Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document, provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is also provided as an external resource.

    Response rate

    An overall response rate of 83.2% was achieved in the Ghana STEP Survey. Table 20 of the weighting documentation provides the detailed percentage distribution by final status code.

    Sampling error estimates

    A weighting documentation was prepared for each participating country and provides some information on sampling errors. The weighting documentation is provided as an external resource.

  12. Example data for working with the ASpecD framework

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 16, 2023
    Cite
    Till Biskup (2023). Example data for working with the ASpecD framework [Dataset]. http://doi.org/10.5281/zenodo.8150115
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Till Biskup
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ASpecD is a Python framework for handling spectroscopic data focussing on reproducibility. In short: Each and every processing step applied to your data will be recorded and can be traced back. Additionally, for each representation of your data (e.g., figures, tables) you can easily follow how the data shown have been processed and where they originate from.

    To provide readers with a concrete example of recipe-driven data analysis, this repository contains both a recipe and the data that are analysed, as shown in the publication describing the ASpecD framework:

    • Jara Popp, Till Biskup: ASpecD: A Modular Framework for the Analysis of Spectroscopic Data Focussing on Reproducibility and Good Scientific Practice. Chemistry--Methods 2:e202100097, 2022. doi:10.1002/cmtd.202100097
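
    For orientation, recipes of this kind are plain YAML files. The sketch below is an illustrative assumption about their general shape, loosely following the ASpecD documentation; it is not the deposited recipe, and all values are placeholders:

    format:
      type: ASpecD recipe
      version: '0.2'
    datasets:
      - /path/to/dataset            # placeholder, not the deposited data
    tasks:
      - kind: processing            # each step is recorded for reproducibility
        type: BaselineCorrection    # illustrative task type
      - kind: singleplot
        type: SinglePlotter1D
        properties:
          filename: output.pdf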

  13. Data for Integrated Step Selection Analysis of translocated female greater sage-grouse in the 60 days post-release, North Dakota 2018-2020

    • zenodo.org
    • datadryad.org
    csv
    Updated Jun 5, 2022
    Cite
    Simona Picardi (2022). Data for Integrated Step Selection Analysis of translocated female greater sage-grouse in the 60 days post-release, North Dakota 2018-2020 [Dataset]. http://doi.org/10.5061/dryad.44j0zpcf5
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simona Picardi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The data include used and random available steps at 11-hour resolution generated for 26 female greater sage-grouse in the 60 days post-translocation to North Dakota, with associated environmental predictors and individual information. The code fits individual habitat selection models in an Integrated Step Selection Analysis framework.

    Data used to fit the models described in:

    Picardi, S., Ranc, N., Smith, B.J., Coates, P.S., Mathews, S.R., Dahlgren, D.K. Individual variation in temporal dynamics of post-release habitat selection. Frontiers in Conservation Science (in review)

    Code used to implement the analysis is available on GitHub: https://github.com/picardis/picardi-et-al_2021_sage-grouse_frontiers-in-conservation

  14. Data from: The role of Data Science and AI for predicting the decline of professionals in the recruitment process

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Azevedo, Caio da Silva; Borges, Aline de Fátima Soares (2023). The role of Data Science and AI for predicting the decline of professionals in the recruitment process: augmenting decision-making in human resources management [Dataset]. http://doi.org/10.7910/DVN/OZJCFG
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Azevedo, Caio da Silva; Borges, Aline de Fátima Soares
    Description

    The role of Data Science and AI for predicting the decline of professionals in the recruitment process: augmenting decision-making in human resources management.

    Feature descriptions:

    • Declined: Variable to be predicted, where 0 means the candidate continued in the recruitment process until hiring, and 1 means the candidate declined during the recruitment process.
    • ValueClient: The total amount the client plans to pay the hired candidate. A value of 0 means the client has not yet defined an amount. Values must be greater than or equal to 0.
    • ExtraCost: Extra cost the client has to pay to hire the candidate. Values must be greater than or equal to 0.
    • ValueResources: The salary requested by the candidate. A value of 0 means the candidate has not yet requested an amount, and this value will be negotiated later. Values must be greater than or equal to 0.
    • Net: The difference between ValueClient, yearly taxes, and ValueResources. Negative values mean the amount the client plans to pay has not yet been defined and is still open for negotiation.
    • DaysOnContact: Number of days the candidate spends in the "Contact" step of the recruitment process. Values must be greater than or equal to 0.
    • DaysOnInterview: Number of days the candidate spends in the "Interview" step. Values must be greater than or equal to 0.
    • DaysOnSendCV: Number of days the candidate spends in the "Send CV" step. Values must be greater than or equal to 0.
    • DaysOnReturn: Number of days the candidate spends in the "Return" step. Values must be greater than or equal to 0.
    • DaysOnCSchedule: Number of days the candidate spends in the "C. Schedule" step. Values must be greater than or equal to 0.
    • DaysOnCRealized: Number of days the candidate spends in the "C. Realized" step. Values must be greater than or equal to 0.
    • ProcessDuration: Duration of the entire recruitment process in days. Values must be greater than or equal to 0.

    A minimal model-fitting sketch using these columns follows.
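
    Assuming the dataset is exported to a CSV with exactly the column names above (the file name and model choice here are illustrative, not part of the dataset), a decline-prediction model could be fit like this:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Hypothetical CSV export of the dataset described above.
    df = pd.read_csv("recruitment.csv")
    features = ["ValueClient", "ExtraCost", "ValueResources", "Net",
                "DaysOnContact", "DaysOnInterview", "DaysOnSendCV",
                "DaysOnReturn", "DaysOnCSchedule", "DaysOnCRealized",
                "ProcessDuration"]
    X, y = df[features], df["Declined"]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    print("Held-out AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))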

  15. Easing into Excellent Excel Practices Learning Series / Série d'apprentissages en route vers des excellentes pratiques Excel

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 15, 2023
    Cite
    Julie Marcoux (2023). Easing into Excellent Excel Practices Learning Series / Série d'apprentissages en route vers des excellentes pratiques Excel [Dataset]. http://doi.org/10.5683/SP3/WZYO1F
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Borealis
    Authors
    Julie Marcoux
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    With a step-by-step approach, learn to prepare Excel files, data worksheets, and individual data columns for data analysis; practice conditional formatting and creating pivot tables/charts; go over basic principles of Research Data Management as they might apply to an Excel project. Avec une approche étape par étape, apprenez à préparer pour l’analyse des données des fichiers Excel, des feuilles de calcul de données et des colonnes de données individuelles; pratiquez la mise en forme conditionnelle et la création de tableaux croisés dynamiques ou de graphiques; passez en revue les principes de base de la gestion des données de recherche tels qu’ils pourraient s’appliquer à un projet Excel.

  16. DeepCube: Post-processing and annotated datasets of social media data

    • zenodo.org
    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Cite
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. http://doi.org/10.5281/zenodo.10731637
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis; Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:

    • The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains in total 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the UC2 search criteria defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia, and covers the period from 26 June 2021 to March 2023.
    • The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains in total 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 from Twitter and 45,262 from Instagram), based on the initial version of the UC5 search criteria defined by MURMURATION SAS, focused on regions of Brazil, and covers the period from 26 June 2021 to March 2023.

1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    • The UC2 annotated dataset, containing the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains in total 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform, focused on the region of Somalia, covering the period from 1 January 2010 to 31 December 2022.

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
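    A sketch of that three-step pass is given below, with the three web services reduced to hypothetical interfaces; none of these names come from the DeepCube deliverables.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class PostAnalysis:
        location: str | None            # extracted from the post's text
        concepts: list[str] = field(default_factory=list)  # top-10 image concepts
        sentiment: str = "neutral"      # "positive" | "negative" | "neutral"

    def analyse_post(post, location_svc, concept_svc, sentiment_svc):
        """Run the three preprocessing steps described above on one post."""
        location = location_svc.extract(post["text"])       # step 1: location
        concepts = []
        for image in post.get("images", []):                # step 2: concepts
            concepts.extend(concept_svc.top_concepts(image, k=10))
        sentiment = sentiment_svc.classify(post["text"])    # step 3: sentiment
        return PostAnalysis(location, concepts, sentiment)
    ```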

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA.

INFALIA, being a spin-off of the CERTH institute and a partner in a research EU project, releases this dataset containing Tweet IDs and post-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.

  17. Data from: Equivalence between step selection functions and biased correlated random walks for statistical inference on animal movement

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Mar 3, 2016
    Cite
    Thierry Duchesne; Daniel Fortin; Louis-Paul Rivest (2016). Equivalence between step selection functions and biased correlated random walks for statistical inference on animal movement [Dataset]. http://doi.org/10.5061/dryad.217t3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 3, 2016
    Dataset provided by
    Dryad
    Authors
    Thierry Duchesne; Daniel Fortin; Louis-Paul Rivest
    Time period covered
    2016
    Description

Data for the analysis of bison trails: data used for the directional analysis of bison trails with respect to directional persistence, the target meadow, and the nearest canopy gap. File: Data_Duchesneetal.csv

  18. Data from: Supplementary Material for "Sonification for Exploratory Data Analysis"

    • search.datacite.org
    • pub.uni-bielefeld.de
    Updated Feb 5, 2019
    Cite
    Thomas Hermann (2019). Supplementary Material for "Sonification for Exploratory Data Analysis" [Dataset]. http://doi.org/10.4119/unibi/2920448
    Explore at:
    Dataset updated
    Feb 5, 2019
    Dataset provided by
DataCite (https://www.datacite.org/)
    Bielefeld University
    Authors
    Thomas Hermann
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

Sonification for Exploratory Data Analysis

    #### Chapter 8: Sonification Models

    In Chapter 8 of the thesis, 6 sonification models are presented to give some examples for the framework of Model-Based Sonification developed in Chapter 7. Sonification models determine the rendering of the sonification and the possible interactions; the "model in mind" helps the user to interpret the sound with respect to the data.

    ##### 8.1 Data Sonograms

    Data Sonograms use spherically expanding shock waves to excite linear oscillators, which are represented by point masses in model space.

    * Table 8.2, page 87: Sound examples for Data Sonograms. Files: Iris dataset, started in plot (a) at S0, (b) at S1, (c) at S2; 10d noisy circle dataset, started in plot (c) at S0 (mean), (d) at S1 (edge); 10d Gaussian, plot (d) started at S0; 3 clusters, Example 1; 3 clusters with invisible columns used as output variables, Example 2. Description: Data Sonogram sound examples for synthetic datasets and the Iris dataset. Duration: about 5 s

    ##### 8.2 Particle Trajectory Sonification Model

    This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.

    * Sound example, page 93: PTSM-Ex-1, audification of 1 particle in the potential of phi(x).
    * Sound example, page 93: PTSM-Ex-2, audification of a sequence of 15 particles in the potential of a dataset with 2 clusters.
    * Sound example, page 94: PTSM-Ex-3, audification of 25 particles simultaneously in the potential of a dataset with 2 clusters.
    * Sound example, page 94: PTSM-Ex-4, audification of 25 particles simultaneously in the potential of a dataset with 1 cluster.
    * Sound example, page 95: PTSM-Ex-5, sigma-step sequence for a mixture of three Gaussian clusters.
    * Sound example, page 95: PTSM-Ex-6, sigma-step sequence for a Gaussian cluster.
    * Sound example, page 96: PTSM-Iris-1, sonification for the Iris dataset with 20 particles per step.
    * Sound example, page 96: PTSM-Iris-2, sonification for the Iris dataset with 3 particles per step.
    * Sound example, page 96: PTSM-Tetra-1, sonification for a 4d tetrahedron clusters dataset.

    ##### 8.3 Markov chain Monte Carlo Sonification

    The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p by sound, particularly concerning the modes of p.

    * Sound example, page 105: MCMC-Ex-1, McMC sonification, stabilization of amplitudes.
    * Sound example, page 106: MCMC-Ex-2, trajectory audification for 100 McMC steps in a 3-cluster dataset.
    * McMC Sonification for Cluster Analysis, dataset with three clusters, page 107: Stream 1 (MCMC-Ex-3.1), Stream 2 (MCMC-Ex-3.2), Stream 3 (MCMC-Ex-3.3), Mix (MCMC-Ex-3.4).
    * McMC Sonification for Cluster Analysis, dataset with three clusters, T = 0.002 s, page 107: Stream 1 (MCMC-Ex-4.1), Stream 2 (MCMC-Ex-4.2), Stream 3 (MCMC-Ex-4.3), Mix (MCMC-Ex-4.4).
    * McMC Sonification for Cluster Analysis, density with 6 modes, T = 0.008 s, page 107: Stream 1 (MCMC-Ex-5.1), Stream 2 (MCMC-Ex-5.2), Stream 3 (MCMC-Ex-5.3), Mix (MCMC-Ex-5.4).
    * McMC Sonification for the Iris dataset, page 108: MCMC-Ex-6.1 through MCMC-Ex-6.8.

    ##### 8.4 Principal Curve Sonification

    Principal Curve Sonification represents data by synthesizing the soundscape while a virtual listener moves along the principal curve of the dataset through the model space.

    * Noisy spiral dataset: PCS-Ex-1.1, page 113.
    * Noisy spiral dataset with variance modulation: PCS-Ex-1.2, page 114.
    * 9d tetrahedron cluster dataset (10 clusters): PCS-Ex-2, page 114.
    * Iris dataset, class label used as pitch of auditory grains: PCS-Ex-3, page 114.

    ##### 8.5 Data Crystallization Sonification Model

    * Table 8.6, page 122: Sound examples for Crystallization Sonification for a 5d Gaussian distribution. Files: DCS started at center, in tail, from far outside. Description: DCS for a dataset sampled from N(0, I_5) excited at different locations. Duration: 1.4 s
    * Mixture of 2 Gaussians, page 122: DCS started at point A (DCS-Ex1A); DCS started at point B (DCS-Ex1B).
    * Table 8.7, page 124: Sound examples for DCS on variation of the harmonics factor. Files: h_omega = 1, 2, 3, 4, 5, 6. Description: DCS for a mixture of two Gaussians with varying harmonics factor. Duration: 1.4 s
    * Table 8.8, page 124: Sound examples for DCS on variation of the energy decay time. Files: tau_(1/2) = 0.001, 0.005, 0.01, 0.05, 0.1, 0.2. Description: DCS for a mixture of two Gaussians varying the energy decay time tau_(1/2). Duration: 1.4 s
    * Table 8.9, page 125: Sound examples for DCS on variation of the sonification time. Files: T = 0.2, 0.5, 1, 2, 4, 8. Description: DCS for a mixture of two Gaussians on varying the duration T. Duration: 0.2 s to 8 s
    * Table 8.10, page 125: Sound examples for DCS on variation of model space dimension. Files: selected columns of the dataset: (x0), (x0,x1), (x0,...,x2), (x0,...,x3), (x0,...,x4), (x0,...,x5). Description: DCS for a mixture of two Gaussians varying the dimension. Duration: 1.4 s
    * Table 8.11, page 126: Sound examples for DCS for different excitation locations. Files: starting point C0, C1, C2. Description: DCS for a mixture of three Gaussians in 10d space with different rank(S) = {2, 4, 8}. Duration: 1.9 s
    * Table 8.12, page 126: Sound examples for DCS for the mixture of a 2d distribution and a 5d cluster. Files: condensation nucleus in the (x0,x1)-plane at (-6,0) = C1, (-3,0) = C2, (0,0) = C0. Description: DCS for a mixture of a uniform 2d and a 5d Gaussian. Duration: 2.16 s
    * Table 8.13, page 127: Sound examples for DCS for the cancer dataset. Files: condensation nucleus in the (x0,x1)-plane at benign 1, benign 2, malignant 1, malignant 2. Description: DCS for a mixture of a uniform 2d and a 5d Gaussian. Duration: 2.16 s

    ##### 8.6 Growing Neural Gas Sonification

    * Table 8.14, page 133: Sound examples for GNGS probing. Files: Cluster C0 (2d): a, b, c; Cluster C1 (4d): a, b, c; Cluster C2 (8d): a, b, c. Description: GNGS for a mixture of 3 Gaussians in 10d space. Duration: 1 s
    * Table 8.15, page 134: Sound examples for GNGS for the noisy spiral dataset. Files: (a) GNG with 3 neurons: 1, 2; (b) GNG with 20 neurons: end, middle, inner end; (c) GNG with 45 neurons: outer end, middle, close to inner end, at inner end; (d) GNG with 150 neurons: outer end, in the middle, inner end; (e) GNG with 20 neurons: outer end, in the middle, inner end; (f) GNG with 45 neurons: outer end, in the middle, inner end. Description: GNG probing sonification for the 2d noisy spiral dataset. Duration: 1 s
    * Table 8.16, page 136: Sound examples for GNG Process Monitoring Sonification for different data distributions. Files: noisy spiral with 1 rotation; noisy spiral with 2 rotations; Gaussian in 5d; mixture of 5d and 2d distributions. Description: GNG process sonification examples. Duration: 5 s

    #### Chapter 9: Extensions

    In this chapter, two extensions for Parameter Mapping

  19. Data Management Platform Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 16, 2025
    Cite
    Pro Market Reports (2025). Data Management Platform Market Report [Dataset]. https://www.promarketreports.com/reports/data-management-platform-market-8903
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

The size of the Data Management Platform Market was valued at USD 3.4 billion in 2023 and is projected to reach USD 8.25 billion by 2032, with an expected CAGR of 13.50% during the forecast period. The Data Management Platform (DMP) market is experiencing robust growth, driven by the increasing demand for personalized marketing and data-driven decision-making. Organizations across various industries are leveraging DMPs to collect, analyze, and manage vast amounts of first-party, second-party, and third-party data. These platforms enable businesses to gain actionable insights into customer behavior, preferences, and trends, facilitating targeted advertising and improved customer engagement. The proliferation of digital channels, such as mobile applications, social media, and e-commerce platforms, further fuels the adoption of DMPs, as businesses seek to unify fragmented data sources. Additionally, advancements in artificial intelligence and machine learning are enhancing the analytical capabilities of DMPs, enabling real-time audience segmentation and predictive analytics. However, data privacy regulations and concerns around user consent pose challenges to the market's growth. To address these, vendors are focusing on compliance, transparency, and robust data security measures. As businesses increasingly prioritize data-driven strategies, the DMP market is poised for significant expansion, with opportunities for innovation in integration, scalability, and interoperability to meet evolving organizational needs.

    Recent developments include:

    • March 2022: Oracle Corporation announced Oracle Unity Customer Data Platform, an enterprise-grade data platform that powers next-generation adtech strategies and enables marketers to unify customer data for segmentation and hyper-personalized experiences. Oracle has thereby unified adtech and martech into one unit by designing its marketing and advertising products around first-party data; the improved data management capabilities complement systems of customer record and help marketers gain cost efficiencies.
    • September 2019: Oracle Corporation announced the integration of the BlueKai data management platform (DMP) and ID Graph with its CX Unity customer data platform. This step is aimed at helping marketers tie device-level data about unknown prospects to their customer data and gain insights about marketing and advertising techniques, allowing customers to deliver personalization at a whole new level.
    • March 2023: At Adobe Summit in New Delhi, Adobe announced the launch of Adobe Product Analytics in Adobe Experience Cloud. The tool unifies customer journey insights across marketing and products, so customer experience teams can now look deeply across marketing and product insights for a single customer view.
    • March 2023: At Adobe Summit in New Delhi, Adobe also announced new innovations in Adobe Experience Manager, a leading data management platform. The new release delivers next-generation features that bring speed and ease to content development, publish higher-quality web experiences, and provide AI-powered data insights that help organizations optimize new content for targeted audiences.

    Key drivers for this market: increasing data volumes and complexity; the growing importance of customer data and personalization; adoption of digital marketing channels; the need for data-driven decision-making; and government regulations.

    Potential restraints include: data privacy concerns; the cost and complexity of implementation; a lack of skilled data professionals; data quality issues; and integration challenges with other systems.

    Notable trends are: the rise of the identity graph; adoption of cloud-native platforms; real-time data management; multi-vendor integration; and ethical and sustainable data use.

  20. STEP Skills Measurement Household Survey 2012 (Wave 1) - Sri Lanka

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    • +1more
    Updated Mar 29, 2019
    Cite
    World Bank (2019). STEP Skills Measurement Household Survey 2012 (Wave 1) - Sri Lanka [Dataset]. https://datacatalog.ihsn.org/catalog/4786
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
World Bank (http://worldbank.org/)
    Time period covered
    2012
    Area covered
    Sri Lanka
    Description

    Abstract

    The STEP (Skills Toward Employment and Productivity) Measurement program is the first ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in labor market of low-income countries.

    The uniquely-designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adults Skills instruments. Modules also gather information about family, health and language.

    Geographic coverage

The STEP target population is the urban population aged 15 to 64 inclusive. Sri Lanka sampled both urban and rural areas. Areas are classified as rural or urban based on each country's official definition.

    Analysis unit

The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey, and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team, and compliance with the procedure is carefully monitored during fieldwork.

    Universe

The target population for the Sri Lanka STEP survey comprised all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban and rural areas of Sri Lanka at the time of data collection. The target population excludes:

    • foreign diplomats and non-nationals working for international organizations;
    • people in institutions such as hospitals or prisons;
    • collective dwellings or group quarters;
    • persons living outside the country at the time of data collection, e.g., students at foreign universities;
    • persons who are unable to complete the STEP assessment due to a physical or mental condition, e.g., visual impairment or paralysis.

The sample frame for the selection of first stage sample units was the Census 2011/12.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

The Sri Lanka sample size was 2,989 households. The sample design is a five-stage stratified design; the stratification variable is the urban/rural indicator.

First Stage Sample: The primary sample unit (PSU) is a Grama Niladari (GN) division. The sampling objective was to conduct interviews in 200 GNs, consisting of 80 urban GNs and 120 rural GNs. Because there was some concern that it might not be possible to conduct any interviews in some initially selected GNs (e.g., due to war, conflict, or inaccessibility), the sampling strategy also called for the selection of 60 extra GNs (24 urban and 36 rural) to be held in reserve for such eventualities. Hence, a total of 260 GNs were selected, consisting of 200 'initial' GNs and 60 'reserve' GNs. Two GNs from the initial sample were not accessible, and reserve GNs were used instead; thus a total of 202 GNs were activated for data collection, and interviews were conducted in 200 GNs. The sample frame for the selection of first stage sample units was the list of GNs from the Census 2011/12. Note: the sample of first stage sample units was selected by the Sri Lanka Department of Census & Statistics (DCS) and provided to the World Bank. The DCS selected the GNs with probability proportional to size (PPS), where the measure of size was the number of dwellings in a GN.

Second Stage Sample: The second stage sample unit (SSU) is a GN segment, i.e., a GN block. One GN block was selected from each activated PSU (i.e., GN). According to the Sri Lanka survey firm, each sampled GN was divided into a number of segments (GN blocks) with approximately the same number of households, and one GN block was selected from each sampled GN.

Third Stage Sample: The third stage sample unit is a dwelling. The sampling objective was to obtain interviews at 15 dwellings within each selected SSU.

Fourth Stage Sample: The fourth stage sample unit is a household. The sampling objective was to select one household within each selected third stage dwelling.

Fifth Stage Sample: The fifth stage sample unit is an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.

    Please refer to the Sri Lanka STEP Survey Weighting Procedures Summary for additional information on sampling.
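    The first stage's PPS selection is straightforward to emulate. The sketch below draws GNs with probability proportional to dwelling counts via weighted sampling without replacement; this is one of several possible PPS schemes, and the DCS's exact procedure is not documented here, so treat the code (and the placeholder GN frame) as illustrative.

    ```python
    import numpy as np

    def pps_sample(units, sizes, n, seed=0):
        """Draw n units with probability proportional to size (PPS),
        approximated here by weighted sampling without replacement."""
        rng = np.random.default_rng(seed)
        p = np.asarray(sizes, dtype=float)
        idx = rng.choice(len(units), size=n, replace=False, p=p / p.sum())
        return [units[i] for i in idx]

    # Placeholder frame: 500 urban GNs with made-up dwelling counts.
    rng = np.random.default_rng(1)
    urban_gns = [f"GN-U{i:03d}" for i in range(500)]
    dwellings = rng.integers(100, 900, size=500)

    selected_urban = pps_sample(urban_gns, dwellings, n=80)  # 80 urban GNs
    ```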

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The STEP survey instruments include: (i) A Background Questionnaire developed by the WB STEP team. (ii) A Reading Literacy Assessment developed by Educational Testing Services (ETS).

All countries adapted and translated both instruments following the STEP Technical Standards: two independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.

    • The survey instruments were both piloted as part of the survey pretest.
    • The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.

    Cleaning operations

STEP Data Management Process:
    1. Raw data is sent by the survey firm.
    2. The WB STEP team runs data checks on the Background Questionnaire data; ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
    3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
    4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
    5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
    6. ETS scales the Reading Literacy Assessment data.
    7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
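    Step 7 is essentially a keyed one-to-one merge plus derived variables. A minimal pandas sketch follows, with file names and the key column invented for illustration; the actual STEP file layout may differ.

    ```python
    import pandas as pd

    # Hypothetical file names and key; see the STEP documentation for the
    # actual layout.
    bq = pd.read_csv("background_questionnaire_clean.csv")
    lit = pd.read_csv("reading_literacy_scaled.csv")

    # One respondent per household, so the merge should be one-to-one.
    merged = bq.merge(lit, on="respondent_id", how="left",
                      validate="one_to_one")

    # Example derived variable: age groups covering the 15-64 target range.
    merged["age_group"] = pd.cut(
        merged["age"],
        bins=[14, 24, 34, 44, 54, 64],
        labels=["15-24", "25-34", "35-44", "45-54", "55-64"],
    )
    ```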

Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document, available as an external resource. The template do-file used by the STEP team to check the raw Background Questionnaire data is also provided as an external resource.

    Response rate

    The response rate for Sri Lanka (urban and rural) was 63%. (See STEP Methodology Note Table 4).

    Sampling error estimates

Weighting documentation was prepared for each participating country and provides some information on sampling errors; it is provided as an external resource.
