77 datasets found
  1. Data from: Data transformation: an underestimated tool by inappropriate use

    • scielo.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    João Paulo Ribeiro-Oliveira; Denise Garcia de Santana; Vanderley José Pereira; Carlos Machado dos Santos (2023). Data transformation: an underestimated tool by inappropriate use [Dataset]. http://doi.org/10.6084/m9.figshare.6083840.v1
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO journals
    Authors
    João Paulo Ribeiro-Oliveira; Denise Garcia de Santana; Vanderley José Pereira; Carlos Machado dos Santos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT. There are researchers who do not recommend data transformation, arguing it causes problems in inferences and mischaracterises data sets, which can hinder interpretation. There are other researchers who consider data transformation necessary to meet the assumptions of parametric models. Perhaps the largest group of researchers who make use of data transformation are concerned with experimental accuracy, which provokes the misuse of this tool. Considering this, our paper offers a study about the most frequent situations related to data transformation and how this tool can impact ANOVA assumptions and experimental accuracy. Our database was obtained from measurements of seed physiology and seed technology. The coefficient of variation cannot be used as an indicator of data transformation. Data transformation might violate the assumptions of analysis of variance, invalidating the idea that its use will cause the inferences to fail, even if it does not improve the quality of the analysis. The decision about when to use data transformation is dichotomous, but the criteria for this decision are many. The unit (percentage, day or seedlings per day), the experimental design and the possible robustness of F-statistics to ‘small deviations’ from normality are among the main indicators for the choice of the type of transformation.
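
    The abstract does not say which transformations were studied. As a minimal sketch, assuming two common textbook choices (the angular transformation for percentages and a shifted logarithm for non-negative rates such as seedlings per day), the idea of picking a transformation by unit might look like this. The function names and sample values are hypothetical.

```python
import math


def arcsine_sqrt(percent):
    """Angular (arcsine square-root) transformation of a percentage.

    Takes a value in [0, 100] and returns an angle in radians.
    This is a textbook choice, not necessarily the one used in the paper.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie in [0, 100]")
    return math.asin(math.sqrt(percent / 100.0))


def log_shift(value, shift=1.0):
    """Shifted natural logarithm, log(value + shift), for non-negative data."""
    if value + shift <= 0.0:
        raise ValueError("value + shift must be positive")
    return math.log(value + shift)


# Hypothetical germination percentages for four replicates.
germination = [92.0, 88.0, 97.0, 75.0]
transformed = [arcsine_sqrt(p) for p in germination]
print(transformed)
```

    Whether either transformation is appropriate still has to be checked against the ANOVA assumptions on the transformed scale, which is the point the abstract makes.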

  2. Statistics Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 22, 2024
    Cite
    Dataintelo (2024). Statistics Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-statistics-software-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 22, 2024
    Authors
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Statistics Software Market Outlook



    The global statistics software market size is projected to grow from USD 10.5 billion in 2023 to USD 18.7 billion by 2032, exhibiting a CAGR of 6.5% over the forecast period. The growth of this market is driven by the increasing adoption of data-driven decision-making processes across various industries, the rising need for statistical modeling and analysis tools, and the growing emphasis on advanced analytics to gain competitive advantages. Additionally, the expanding use of artificial intelligence (AI) and machine learning (ML) technologies to enhance the capabilities of statistics software is contributing significantly to market growth.
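
    The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. This is a sketch that assumes nine compounding years (2023 to 2032); the report does not state the period it used.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 10.5 billion in 2023 to USD 18.7 billion in 2032 (figures from the description).
rate = cagr(10.5, 18.7, 2032 - 2023)
print(f"{rate:.1%}")
```

    Under that assumption the result is about 6.6% per year, close to the 6.5% stated in the description.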



    One of the primary growth factors of the statistics software market is the increasing reliance on data analytics and business intelligence tools across different sectors. Organizations are leveraging statistical software to analyze large volumes of data generated through various digital channels, enabling them to make informed decisions and identify new business opportunities. This trend is particularly evident in the healthcare, finance, and retail sectors, where data-driven insights are crucial for improving operational efficiency, customer satisfaction, and overall performance.



    Another key driver for the market is the proliferation of big data and the need for advanced data management solutions. With the exponential growth of data generated by various sources such as social media, IoT devices, and enterprise systems, there is a heightened demand for robust statistical software that can handle complex data sets and perform sophisticated analyses. This has led to increased investments in the development of innovative statistics software solutions that offer enhanced features and capabilities, such as real-time data processing, predictive analytics, and automated reporting.



    The integration of AI and ML technologies into statistics software is also significantly boosting market growth. These technologies enable more accurate and efficient data analysis, allowing organizations to uncover hidden patterns and trends that were previously impossible to detect. AI-powered statistical tools can automate repetitive tasks, reduce human error, and provide deeper insights into data, thereby enhancing the overall decision-making process. As a result, there is a growing adoption of AI-driven statistics software across various industries, further propelling market expansion.



    Regionally, North America is expected to maintain its dominance in the statistics software market, owing to the presence of numerous leading software providers, high adoption of advanced analytics solutions, and substantial investments in research and development. However, the Asia Pacific region is anticipated to witness the highest growth rate over the forecast period, driven by the rapid digital transformation of businesses, increasing awareness of data analytics benefits, and supportive government initiatives promoting technological advancements.



    Component Analysis



    The statistics software market is segmented by component into software and services. The software segment includes various types of statistical analysis tools, ranging from basic data visualization software to advanced predictive analytics platforms. This segment holds the largest market share due to the widespread adoption of software solutions that enable organizations to analyze and interpret data efficiently. The continuous development of innovative features, such as real-time analytics, data mining, and machine learning capabilities, is further driving the demand for statistics software.



    In contrast, the services segment encompasses consulting, implementation, training, and support services provided by software vendors and third-party providers. These services are crucial for organizations to effectively utilize statistical software and maximize its benefits. The growing complexity of data and the need for specialized expertise in data analysis are driving the demand for professional services in the statistics software market. Moreover, as more businesses adopt advanced analytics solutions, the need for ongoing support and training services is expected to increase, contributing to the growth of the services segment.



    The integration of cloud computing with statistics software is also influencing the component-wise growth of this market. Cloud-based solutions offer several advantages, such as scalability, flexibility, and cost-effectiveness, making them an attractive option for organizations of all sizes. As a result, there is a

  3. Learning Disability Services Monthly Statistics - AT: July 2021, MHSDS: May...

    • digital.nhs.uk
    csv, xlsx
    Updated Aug 19, 2021
    Cite
    (2021). Learning Disability Services Monthly Statistics - AT: July 2021, MHSDS: May 2021 Final [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/learning-disability-services-statistics/at-july-2021-mhsds-may-2021-final
    Available download formats: xlsx (1.8 MB), csv (122.9 kB)
    Dataset updated
    Aug 19, 2021
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Jul 1, 2021 - Jul 31, 2021
    Area covered
    England
    Description

    Contains monthly data from the Assuring Transformation dataset. Data is available in Excel or CSV format. PLEASE NOTE: Some updates to the structure and numbering of the data tables and csv were applied from April 2021. This was primarily to group similar table types and content together. Additionally we have increased the amount of tables that have time series data retrospectively updated each month (green tabs). We welcome any feedback on this updated format.

  4. Big data and business analytics revenue worldwide 2015-2022

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data and business analytics (BDA) market was valued at ***** billion U.S. dollars in 2018 and is forecast to grow to ***** billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around ** billion U.S. dollars, and business services will account for the remainder.

    Big data

    High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate **** ZBs of data in 2025.

    Business analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around **** billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud.

  5. Evaluating Functional Diversity: Missing Trait Data and the Importance of...

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello (2023). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. http://doi.org/10.1371/journal.pone.0149270
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
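
    The comparison of plot rankings between full and reduced datasets can be illustrated with a rank correlation. The abstract does not name the statistic used, so the sketch below assumes Spearman's coefficient with average ranks for ties. The FD values are hypothetical, and this is not code from the “traitor” package.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical FD values per plot: full data vs. data with traits removed.
fd_full = [0.42, 0.31, 0.77, 0.55, 0.18]
fd_reduced = [0.40, 0.35, 0.70, 0.33, 0.20]
print(spearman(fd_full, fd_reduced))
```

    A coefficient near 1 would mean the reduced dataset still orders the plots much as the full dataset does.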

  6. Digital Transformation Statistics By Trends, Expenditure, Adoption And...

    • sci-tech-today.com
    Updated Jun 25, 2025
    Cite
    Sci-Tech Today (2025). Digital Transformation Statistics By Trends, Expenditure, Adoption And Predictions [Dataset]. https://www.sci-tech-today.com/stats/digital-transformation-statistics/
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Sci-Tech Today
    License

    https://www.sci-tech-today.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Digital Transformation Statistics: Today, businesses are trying to embrace innovative technologies that are also challenging, as they quickly change the digital environment worldwide. Digital transformation statistics involve integrating these technologies to boost productivity, efficiency, and sustainability in operations.

    This concept emerged during the COVID-19 pandemic, which heralded an avalanche of more agile and intelligent ways of doing business. The main technologies driving this transformation include artificial intelligence (AI), big data, and cloud computing, which have diverse applications across different sectors. A key trend in 2024 is for companies to adopt new technologies to remain competitive in their respective fields of business.

    With a projected $3.7 trillion by the end of this year for the global digital transformation statistics market, it becomes clear that the adoption of cloud computing, automation, and AI has become a major propeller for business growth. As more companies adopt digital strategies, market researchers must understand current trends and statistics that will inform future strategies.

  7. Priorities in digital transformation among businesses Vietnam 2022

    • statista.com
    Updated Jul 5, 2024
    Cite
    Statista (2024). Priorities in digital transformation among businesses Vietnam 2022 [Dataset]. https://www.statista.com/statistics/1368527/vietnam-digital-transformation-priorities/
    Dataset updated
    Jul 5, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Vietnam
    Description

    According to a survey conducted in 2022, the majority of respondents from large domestic and multi-national businesses operating in Vietnam identified the establishment of Business Continuity Plans (BCP) and Disaster Recovery plans (DRP), alongside using data analytics platforms as the leading initiative categories in their digital transformation survey. The same survey revealed that 84 percent of respondents have a cloud migration strategy.

  8. Economic Statistics Transformation Programme: enhanced financial accounts...

    • s3.amazonaws.com
    • gov.uk
    Updated Mar 16, 2020
    Cite
    Office for National Statistics (2020). Economic Statistics Transformation Programme: enhanced financial accounts (UK flow of funds): historical data for the household and NPISH financial categories AF.6, AF.7 and AF.8 assets and liabilities [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/161/1613491.html
    Dataset updated
    Mar 16, 2020
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Office for National Statistics
    Area covered
    United Kingdom
    Description

    Official statistics are produced impartially and free from political influence.

  9. Digital transformation spending worldwide 2017-2027

    • statista.com
    Updated Jun 23, 2025
    Cite
    Statista (2025). Digital transformation spending worldwide 2017-2027 [Dataset]. https://www.statista.com/statistics/870924/worldwide-digital-transformation-market-size/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, spending on digital transformation (DX) is projected to reach *** trillion U.S. dollars. By 2027, global digital transformation spending is forecast to reach *** trillion U.S. dollars.

    What is digital transformation?

    Digital transformation refers to the adoption of digital technology to transform business processes and services from non-digital to digital. This encompasses, among others, moving data to the cloud, using technological devices and tools for communication and collaboration, as well as automating processes.

    What is driving digital transformation?

    Digital transformation growth is due to several contributing factors. Among these was the COVID-19 pandemic, which considerably increased the tempo of digital transformation in organizations around the globe in 2020. Although the pandemic is over, working from home has not only remained but increased among organizations globally, adding to the drive for digital transformation. Other contributing causes include customer demand and the need to be on par with competitors. Overall, utilizing technologies for digital transformation renders organizations more agile in responding to changing markets and enhances innovation, thereby making them more resilient.

  10. ID 2007 Mental Health indicator

    • data.europa.eu
    • data.gov.uk
    html
    Updated Oct 30, 2021
    Cite
    Office for National Statistics (2021). ID 2007 Mental Health indicator [Dataset]. https://data.europa.eu/data/datasets/id_2007_mental_health_indicator
    Available download formats: html
    Dataset updated
    Oct 30, 2021
    Dataset authored and provided by
    Office for National Statistics
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Indices of Deprivation (ID) 2004: Health Deprivation and Disability, measure of adults under 60 suffering from mood or anxiety disorders, based on prescribing, suicides, and health benefits data
    Source: Communities and Local Government (CLG): ID 2007
    Publisher: Neighbourhood Statistics
    Geographies: Lower Layer Super Output Area (LSOA)
    Geographic coverage: England
    Time coverage: 2007
    Type of data: Administrative data (with statistical transformations applied)
    Notes: These data represent a 'standardised and normalised measure' of mental health problems within an area rather than an absolute count or percentage of mental health problems.

  11. ‘California Housing Data (1990)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘California Housing Data (1990)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-data-1990-a0c5/b7389540/?iid=007-628&v=presentation
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Data (1990)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/housing on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Source

    This is the dataset used in this book: https://github.com/ageron/handson-ml/tree/master/datasets/housing to illustrate a sample end-to-end ML project workflow (pipeline). This is a great book - I highly recommend!

    The data is based on California Census in 1990.

    About the Data (from the book):

    "This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors.

    The following is the description from the book author:

    This dataset appeared in a 1997 paper titled Sparse Spatial Autoregressions by Pace, R. Kelley and Ronald Barry, published in the Statistics and Probability Letters journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

    The dataset in this directory is almost identical to the original, with two differences: 207 values were randomly removed from the total_bedrooms column, so we can discuss what to do with missing data. An additional categorical attribute called ocean_proximity was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing."
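
    The two modifications described in the quoted passage (missing values in total_bedrooms and the categorical ocean_proximity attribute) map onto two standard preparation steps. The following is a minimal sketch in plain Python, assuming median imputation and one-hot encoding. The sample values and category labels are hypothetical, and the book's notebooks may handle both steps differently.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Return the sorted category levels and one 0/1 row per input value."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical slice of the two columns mentioned in the description.
total_bedrooms = [129.0, None, 190.0, 235.0]
ocean_proximity = ["NEAR BAY", "INLAND", "ISLAND", "INLAND"]

print(impute_median(total_bedrooms))
levels, encoded = one_hot(ocean_proximity)
print(levels)
print(encoded)
```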

    About the Data (From Luís Torgo page):

    http://www.dcc.fc.up.pt/%7Eltorgo/Regression/cal_housing.html

    This is a dataset obtained from the StatLib repository. Here is the included description:

    "We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area included varies inversely with the population density. We computed distances among the centroids of each block group as measured in latitude and longitude. We excluded all the block groups reporting zero entries for the independent and dependent variables. The final data contained 20,640 observations on 9 variables. The dependent variable is ln(median house value)."

    End-to-End ML Project Steps (Chapter 2 of the book)

    1. Look at the big picture
    2. Get the data
    3. Discover and visualize the data to gain insights
    4. Prepare the data for Machine Learning algorithms
    5. Select a model and train it
    6. Fine-tune your model
    7. Present your solution
    8. Launch, monitor, and maintain your system

    The 10-Step Machine Learning Project Workflow (My Version)

    1. Define business objective
    2. Make sense of the data from a high level
      • data types (number, text, object, etc.)
      • continuous/discrete
      • basic stats (min, max, std, median, etc.) using boxplot
      • frequency via histogram
      • scales and distributions of different features
    3. Create the training and test sets using proper sampling methods, e.g., random vs. stratified
    4. Correlation analysis (pair-wise and attribute combinations)
    5. Data cleaning (missing data, outliers, data errors)
    6. Data transformation via pipelines (categorical text to number using one hot encoding, feature scaling via normalization/standardization, feature combinations)
    7. Train and cross validate different models and select the most promising one (Linear Regression, Decision Tree, and Random Forest were tried in this tutorial)
    8. Fine-tune the model by trying different combinations of hyperparameters
    9. Evaluate the model with best estimators in the test set
    10. Launch, monitor, and refresh the model and system
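
    Step 3 of the workflow above contrasts random and stratified sampling. As a rough sketch of the stratified variant, assuming each row has already been assigned to a stratum (for example an income band), the split could be written as below. The function name, the 20% test fraction and the seed are illustrative choices, not values from the source.

```python
import random
from collections import defaultdict


def stratified_split(rows, strata, test_fraction=0.2, seed=42):
    """Split rows into (train, test), sampling test_fraction within each stratum.

    rows and strata are parallel sequences; strata[i] labels rows[i].
    """
    groups = defaultdict(list)
    for index, label in enumerate(strata):
        groups[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(groups):
        indices = groups[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [rows[i] for i in sorted(train_idx)]
    test = [rows[i] for i in sorted(test_idx)]
    return train, test


# Hypothetical data: 100 rows, half in stratum "low" and half in "high".
rows = list(range(100))
strata = ["low"] * 50 + ["high"] * 50
train, test = stratified_split(rows, strata)
print(len(train), len(test))
```

    Unlike a purely random split, each stratum contributes the same share of rows to the test set, so the test set mirrors the stratum proportions of the full data.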

    --- Original source retains full ownership of the source dataset ---

  12. Household Income and Expenditure Survey 2016 - Liberia

    • microdata.lisgislr.org
    • catalog.ihsn.org
    • +1more
    Updated Oct 17, 2024
    Cite
    Liberia Institute for Statistics and Geo-Information Services (2024). Household Income and Expenditure Survey 2016 - Liberia [Dataset]. https://microdata.lisgislr.org/index.php/catalog/29
    Dataset updated
    Oct 17, 2024
    Dataset authored and provided by
    Liberia Institute for Statistics and Geo-Information Services
    Time period covered
    2016 - 2017
    Area covered
    Liberia
    Description

    Abstract

    The main purpose of the Household Income Expenditure Survey (HIES) 2016 was to offer high quality and nationwide representative household data that provided information on incomes and expenditure in order to update the Consumer Price Index (CPI), improve National Accounts statistics, provide agricultural data and measure poverty as well as other socio-economic indicators. These statistics were urgently required for evidence-based policy making and monitoring of implementation results supported by the Poverty Reduction Strategy (I & II), the AfT and the Liberia National Vision 2030. The survey was implemented by the Liberia Institute of Statistics and Geo-Information Services (LISGIS) over a 12-month period, starting from January 2016 and was completed in January 2017. LISGIS completed a total of 8,350 interviews, thus providing sufficient observations to make the data statistically significant at the county level. The data captured the effects of seasonality, making it the first of its kind in Liberia. Support for the survey was offered by the Government of Liberia, the World Bank, the European Union, the Swedish International Development Cooperation Agency, the United States Agency for International Development and the African Development Bank. The objectives of the 2016 HIES were:

    1. Update the Consumer Price Index (CPI): To obtain a new set of weights for the basket of goods and services that upgrade the Monrovia Consumer Price Index (MCPI) and the National Consumer Price Index (NCPI) and to revise the CPI basket of goods and services in Liberia to reflect the current consumption pattern of residents.
    2. Improve National Accounts Statistics: To get information on annual household expenditure patterns in order to update the household component of the National Accounts.
    3. Measure Poverty: To prepare robust poverty indices that enable the understanding of poverty dynamics across the country and of the factors influencing them.
    4. Improve Agricultural Statistics: To obtain nationally representative and policy relevant agricultural statistics in order to undertake in-depth analysis of agricultural households.
    5. Capture Socio-economic Impact of Ebola Virus Disease (EVD): To obtain a post-EVD dataset which allows for an in-depth analysis of the socioeconomic impact of EVD on households.
    6. Benchmark Agenda for Transformation Indicators: To provide an update on selected socioeconomic indicators used to benchmark the government’s policies embedded within the Agenda for Transformation.
    7. Develop Statistical Capacity: Emphasize capacity building and development of sustainable statistical systems through every stage of the project to produce accurate and timely information about Liberia.

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The original sample design for the HIES exploited two-phased clustered sampling methods, encompassing a nationally representative sample of households in every quarter and was obtained using the 2008 National Housing and Population Census sampling frame. The procedures used for each sampling stage are as follows:
    i. First stage
    Selection of sample EAs. The sample EAs for the 2016 HIES were selected within each stratum systematically with Probability Proportional to Size from the ordered list of EAs in the sampling frame. They are selected separately for each county by urban/rural stratum. The measure of size for each EA was based on the number of households from the sampling frame of EAs based on the 2008 Liberia Census. Within each stratum the EAs were ordered geographically by district, clan and EA codes. This provided implicit geographic stratification of the sampling frame.

    ii. Second stage
    Selection of sample households within a sample EA. A random systematic sample of 10 households was selected from the listing for each sample EA. Using this type of table, the supervisor only has to look up the total number of households listed, and a specific systematic sample of households is identified in the corresponding row of the table.
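
    The two sampling stages described above can be sketched in code. This is an illustration under stated assumptions, not the survey's actual procedure: the first function draws EAs systematically with probability proportional to size from an ordered list of household counts, and the second draws a fixed-interval systematic sample of households from a listing. The function names, household counts and seeds are hypothetical, and the lookup table mentioned in the text is not reproduced.

```python
import random


def pps_systematic(sizes, n_sample, rng):
    """Systematic PPS selection: return indices of the selected units.

    sizes is the ordered list of size measures (e.g. households per EA).
    A unit larger than the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n_sample
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n_sample)]

    selected = []
    cumulative = 0.0
    t = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Select this unit for every target that falls inside its size range.
        while t < n_sample and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected


def systematic_sample(n_listed, n_take, rng):
    """Systematic sample of n_take positions (0-based) from n_listed households."""
    if n_take > n_listed:
        raise ValueError("cannot take more households than are listed")
    interval = n_listed / n_take
    start = rng.random() * interval
    return [int(start + k * interval) for k in range(n_take)]


# Hypothetical stratum with five EAs and their household counts.
ea_households = [120, 80, 200, 50, 150]
chosen_eas = pps_systematic(ea_households, 2, random.Random(0))
households = systematic_sample(57, 10, random.Random(1))
print(chosen_eas, households)
```

    Ordering the EAs geographically before the first-stage draw is what gives the implicit geographic stratification mentioned in the text.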

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    There were three questionnaires administered for this survey:

    1. Household and Individual Questionnaire
    2. Market Price Questionnaire
    3. Agricultural Recall Questionnaire

    Cleaning operations

    The data entry clerk for each team, using data entry software called CSPro, entered data for each household in the field. For each household, an error report was generated on-site, which identified key problems with the data collected (outliers, incorrect entries, inconsistencies with skip patterns, basic filters for age and gender specific questions etc.). The Supervisor along with the Data Entry Clerk and the Enumerator that collected the data reviewed these errors. Callbacks were made to households if necessary to verify information and rectify the errors while in that EA.

    Once the data were collected in each EA, they were sent to LISGIS headquarters for further processing, along with EA reports for each area visited. The HIES Technical Committee converted the data into Stata format, ran several consistency checks to manage overall data quality, prepared reports identifying key problems with the data set, and called the field teams to update them on those problems. Monthly reports summarizing observations from the data received from the field, alongside statistics on data collection status, were prepared and shared with the field teams and LISGIS Management.

  13. Index of Multiple Deprivation (IMD) 2007

    • data.europa.eu
    html
    Office for National Statistics, Index of Multiple Deprivation (IMD) 2007 [Dataset]. https://data.europa.eu/data/datasets/index_of_multiple_deprivation_imd_2007?locale=pt
    Explore at:
    Available download formats: html
    Dataset authored and provided by
    Office for National Statistics
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Index of Multiple Deprivation 2007: Measure of multiple deprivation at small area level, made up of seven domains
    Source: Communities and Local Government (CLG): ID 2007
    Publisher: Neighbourhood Statistics
    Geographies: Lower Layer Super Output Area (LSOA)
    Geographic coverage: England
    Time coverage: 2007 (using data from 2001 to 2005)
    Type of data: Administrative data (with statistical transformations applied)

  14. ID 2004 Combined Road Distance to Services indicator

    • gimi9.com
    • cloud.csiss.gmu.edu
    • +2more
    Updated Jan 21, 2010
    (2010). ID 2004 Combined Road Distance to Services indicator [Dataset]. https://gimi9.com/dataset/uk_id_2004_combined_road_distance_to_services_indicator/
    Explore at:
    Dataset updated
    Jan 21, 2010
    Description

    Road distances to nearest General Practice (GP) premises, primary schools, post offices and supermarket/convenience stores.
    Source: Office of the Deputy Prime Minister (ODPM): ID 2004
    Publisher: Neighbourhood Statistics
    Geographies: Lower Layer Super Output Area (LSOA)
    Geographic coverage: England
    Time coverage: 2004 (using data from 2001 to 2003)
    Type of data: Administrative data (with statistical transformations applied)

  15. ID 2007 Extent

    • cloud.csiss.gmu.edu
    • data.europa.eu
    • +1more
    html
    Updated Dec 20, 2019
    United Kingdom (2019). ID 2007 Extent [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/id_2007_extent
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 20, 2019
    Dataset provided by
    United Kingdom
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    ID 2007 Extent Score: Proportion of a district's population living in the most deprived SOAs in the country
    Source: Communities and Local Government (CLG): ID 2007
    Publisher: Neighbourhood Statistics
    Geographies: Local Authority District (LAD), County/Unitary Authority
    Geographic coverage: England
    Time coverage: 2007
    Type of data: Administrative data (with statistical transformations applied)

  16. ID 2007 Income Scale

    • data.europa.eu
    html
    Office for National Statistics, ID 2007 Income Scale [Dataset]. https://data.europa.eu/data/datasets/id_2007_income_scale
    Explore at:
    Available download formats: html
    Dataset authored and provided by
    Office for National Statistics
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    ID 2007 Income Scale: Number of people income deprived
    Source: Communities and Local Government (CLG): ID 2007
    Publisher: Neighbourhood Statistics
    Geographies: Local Authority District (LAD), County/Unitary Authority
    Geographic coverage: England
    Time coverage: 2007 (using data from 2005 to 2006)
    Type of data: Administrative data (with statistical transformations applied)

  17. ID 2007 Health Domain

    • cloud.csiss.gmu.edu
    • data.europa.eu
    • +1more
    html
    Updated Dec 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    United Kingdom (2019). ID 2007 Health Domain [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/id_2007_health_domain
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 19, 2019
    Dataset provided by
    United Kingdom
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    ID 2007 Health Deprivation and Disability domain (high rates of premature death, poor health or disability)
    Source: Communities and Local Government (CLG): ID 2007
    Publisher: Neighbourhood Statistics
    Geographies: Lower Layer Super Output Area (LSOA)
    Geographic coverage: England
    Time coverage: 2007 (using data from 2001 to 2005)
    Type of data: Administrative data (with statistical transformations applied)

  18. Modular Data-Transformation Modelling with Geospatial Semantic Array...

    • figshare.com
    png
    Updated Jan 18, 2016
    Daniele de Rigo (2016). Modular Data-Transformation Modelling with Geospatial Semantic Array Programming [Dataset]. http://doi.org/10.6084/m9.figshare.842695.v5
    Explore at:
    Available download formats: png
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniele de Rigo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    de Rigo, D., Modular Data-Transformation Modelling with Geospatial Semantic Array Programming. FigShare Digital Science. DOI: 10.6084/m9.figshare.842695

    Modular Data-Transformation Modelling with Geospatial Semantic Array Programming

    Daniele de Rigo

    Summary. Wide-scale transdisciplinary modelling for environment (WSTMe) is a scientific challenge with an increasingly important role in allowing strategic policy-making to be effectively discussed and programmed with the support of robust science [1]. Natural resources such as forests, water and soil, along with climate and human-driven changes, are subject to a network of interactions, whose large scale effects may be significant. WSTMe raises challenging issues when the characteristic heterogeneity of available geospatial information, complexity of systems and multiple sources of uncertainty (including those related to scientific software [2]) may affect the robustness, transparency and comprehensibility of hypotheses and results. In this respect, earth observation and computational science [3,4] are intrinsically linked and expected to deal with such a modular array of transdisciplinary aspects while preserving as much as possible conciseness and a terse semantics [5]. This is desirable in order to better communicate key messages and issues, both among different scientific communities and at the science-policy interface. Geospatial Semantic Array Programming (GeoSemAP) is a new approach [6] for WSTMe that has recently emerged in which a concise integration is introduced among semantics, geospatial tools and the array of data-transformation models (D-TM). WSTMe may often be described as a composition of D‑TMs where the flow of initial and derived/intermediate geo‑data highlights its array-based modular structure and semantics. Transparency (even due to the open science approach) is also a goal, to aid society in clearly understanding and controlling the implications of the technical apparatus on collective environmental decision-making [1–6].

    Caption of the image. Wide-scale transdisciplinary modelling for environment (WSTMe) may often be described as a composition of data-transformation models (D‑TM) where the flow of initial and derived/intermediate geo‑data highlights its array-based modular structure and semantics (Geospatial Semantic Array Programming, GeoSemAP). Sources: [2,6].

    References
    [1] van der Sluijs, J. P., 2005. Uncertainty as a Monster in the Science-Policy Interface: Four Coping Strategies. Water Science & Technology 52 (6), 87-92. http://scholar.google.com/scholar?cluster=3385318353116653032
    [2] de Rigo, D., 2013. Software Uncertainty in Integrated Environmental Modelling: the role of Semantics and Open Science. Geophysical Research Abstracts 15, 13292+. http://scholar.google.com/scholar?cluster=13790404181931852043
    [3] Peng, R. D., 2011. Reproducible Research in Computational Science. Science 334 (6060), 1226-1227. http://scholar.google.com/scholar?cluster=905554772905069177
    [4] Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., Sliz, P., 2012. Shining Light into Black Boxes. Science 336 (6078), 159-160. http://scholar.google.com/scholar?cluster=12575758499484368256
    [5] de Rigo, D., 2012. Semantic Array Programming for Environmental Modelling: Application of the Mastrave Library. In: Seppelt, R., Voinov, A. A., Lange, S., Bankamp, D. (Eds.), International Environmental Modelling and Software Society (iEMSs) 2012 International Congress on Environmental Modelling and Software. Managing Resources of a Limited Planet: Pathways and Visions under Uncertainty, Sixth Biennial Meeting. pp. 1167-1176. http://scholar.google.com/scholar?cluster=6628751141895151391
    [6] de Rigo, D., Corti, P., Caudullo, G., McInerney, D., Di Leo, M., San-Miguel-Ayanz, J., 2013. Toward Open Science at the European Scale: Geospatial Semantic Array Programming for Integrated Environmental Modelling. Geophysical Research Abstracts 15, 13245+. http://scholar.google.com/scholar?cluster=17118262245556811911

  19. Data from: Insights into the Effects of Violating Statistical Assumptions...

    • acs.figshare.com
    xlsx
    Updated Jun 9, 2023
    Amber O. Brown; Peter J. Green; Greta J. Frankham; Barbara H. Stuart; Maiken Ueland (2023). Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical “‑omics” Data with Multiple Explanatory Variables [Dataset]. http://doi.org/10.1021/acsomega.3c01613.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    ACS Publications
    Authors
    Amber O. Brown; Peter J. Green; Greta J. Frankham; Barbara H. Stuart; Maiken Ueland
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Biological volatilome analysis is inherently complex due to the considerable number of compounds (i.e., dimensions) and differences in peak areas by orders of magnitude, between and within compounds found within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques which aid in the selection of compounds that are considered relevant to respective research questions prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods which assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the statistical assumptions of these models related to normality and the presence of multiple explanatory variables which are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous compound dimensionality reduction can impact downstream analyses. It is the aim of this manuscript to assess the impact of single and multivariable statistical models with and without the log transformation to volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across their species distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status. 
    This work determined that the exclusion of relevant multiple explanatory variables from analysis overestimates the effect of Bioregion and the identification of significant compounds. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved through analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
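
    The point about additive versus multiplicative effects can be illustrated with a small Python sketch. The numbers are invented for illustration and are not taken from the dataset; the two factors are hypothetical stand-ins for explanatory variables such as Bioregion and captive status.

```python
import math

# Hypothetical peak area for one compound under two binary factors whose
# effects multiply: factor A doubles the signal, factor B raises it by 50%.
baseline = 100.0
effect_a = 2.0
effect_b = 1.5

raw = {
    (a, b): baseline * (effect_a if a else 1.0) * (effect_b if b else 1.0)
    for a in (0, 1)
    for b in (0, 1)
}


def interaction(cells):
    """Difference of differences across the 2x2 table; zero means the two
    factor effects are purely additive on this scale."""
    return (cells[(1, 1)] - cells[(1, 0)]) - (cells[(0, 1)] - cells[(0, 0)])


logged = {cell: math.log(value) for cell, value in raw.items()}

print(interaction(raw))     # non-zero: effects are not additive on the raw scale
print(interaction(logged))  # approximately zero: additive after the log transform
```

    If the effects were additive on the raw scale instead, the log transformation would introduce an apparent interaction rather than remove one, which is why the scale of the effects matters before transforming.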

  20. Impact of AI on work performance 2023, by skill level

    • statista.com
    • ai-chatbox.pro
    Updated Jun 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bergur Thormundsson (2024). Impact of AI on work performance 2023, by skill level [Dataset]. https://www.statista.com/topics/6778/digital-transformation/
    Explore at:
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Bergur Thormundsson
    Description

    As of 2023, artificial intelligence (AI) has been shown to improve work performance for both lower-skilled and higher-skilled workers. While the improvement gained from the use of AI was higher for lower-skilled workers, with a performance score of 6.06, higher-skilled workers continued to perform better both with and without the technology.
